WebPagetest Forums

Full Version: Why can a website on a CDN be accessed via two different URLs?
Hello everyone,

I've spent several hours today reading this information-packed forum and learning the basics of optimization and CDNs. I run a small forum (about 3000 visits/day), and I'm considering using a CDN. It may be overkill for such a small site, but I'm very curious how much a CDN would improve the site's speed.

I'm confused about why the same content on this site can be accessed via two different URLs:

http://www.webpagetest.org/

http://cdn.webpagetest.org/

My first thought was that this is not the way it should be. Is something misconfigured? Or am I missing something?

Are all sites that use a CDN accessible via two URLs?
CDNs (at least the easier-to-use ones) generally run as a reverse proxy. You tell them the path to the content and they fetch it automatically as needed. The path you provide can be anything you'd like: a custom server, a special directory, or even the root of your site (which is what I did).

Even though it will sort of work, you're not supposed to access the site's pages themselves through the CDN; only the resources get pulled from there when you access the main site. By having the CDN overlay the root of my site, turning the CDN on or off becomes an easy config setting. If I need to turn it off, the site just works, and I don't have to keep two copies of the files or worry about whether something is in the right directory.
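To illustrate the on/off toggle described above, here is a minimal sketch, not the actual WebPagetest configuration. The host names, the `USE_CDN` flag, and the `asset_url` helper are all hypothetical. Because the CDN overlays the site root, the same path is valid on both hosts, so only the base URL changes.

```python
from urllib.parse import urljoin

# Hypothetical settings: placeholder hosts, not the real site's config.
SITE_ROOT = "http://www.example.com/"
CDN_ROOT = "http://cdn.example.com/"
USE_CDN = True  # flip to False to serve every resource from the origin


def asset_url(path, use_cdn=USE_CDN):
    """Build the URL for a static resource on the CDN or on the origin.

    The path is identical in both cases because the CDN mirrors the
    site root; only the host part differs.
    """
    base = CDN_ROOT if use_cdn else SITE_ROOT
    return urljoin(base, path.lstrip("/"))
```

Page templates would call `asset_url("/images/logo.png")` for images, scripts, and stylesheets, while links to the pages themselves keep pointing at the main site.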
Pat, thanks for answering the question the way you did. I'm starting to get a clearer picture of how it works and how I will likely set it up in my situation.

I will definitely stick around this forum for a while. Thanks for creating this great resource of knowledge on the topic of site optimization.
Here is what I do to avoid these duplicate URLs with origin-pull CDNs.

Old setup:

Website : http://www.example.com/
Media : http://www.example.com/media/

New setup:

Website : http://www.example.com/
Media : http://static.example.com/media/
CDN : cdn.example.com with static.example.com as origin server

Then set up the web server (.htaccess, nginx, what have you) to serve only the /media directory on static.example.com, so there is no way for the normal pages to 'leak' to the CDN.
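The rule above would normally live in the web server configuration itself. As a language-neutral illustration, the decision it makes can be sketched in Python. The `status_for` function and the constants are hypothetical names, and the sketch assumes the server selects the rule by the request's Host header.

```python
import posixpath

# Hypothetical names taken from the example setup above.
STATIC_HOST = "static.example.com"
MEDIA_ROOT = "/media"


def status_for(host, path):
    """Return the HTTP status the origin should answer with.

    On the static host, only paths under /media are served; anything
    else gets a 404, so normal pages cannot be pulled by the CDN.
    Other hosts are left unrestricted in this sketch.
    """
    hostname = host.split(":")[0].lower()  # drop an optional port
    if hostname != STATIC_HOST:
        return 200
    # Normalize first so "/media/../index.php" is not mistaken for media.
    normalized = posixpath.normpath(path)
    in_media = normalized == MEDIA_ROOT or normalized.startswith(MEDIA_ROOT + "/")
    return 200 if in_media else 404
```

With this rule, a request for `/index.php` on static.example.com is refused, while `/media/logo.png` is served and can be cached by cdn.example.com.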
One thing we recommend is using a robots.txt for the CDN. If you have a pull CDN and it does not offer a robots.txt override option, this can be done through mod_rewrite on the origin. While the content can still be seen from multiple URLs, this avoids the duplicate-content SEO issues.
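A rough sketch of the idea follows, assuming the origin can recognize requests made on behalf of the CDN by their Host header. Whether that holds depends on the CDN, so treat it as an assumption. The host names and the `robots_txt_for` helper are hypothetical.

```python
# Hypothetical hosts whose content should stay out of search indexes.
CDN_HOSTS = {"cdn.example.com", "static.example.com"}

# Standard robots.txt bodies: the first blocks all crawlers,
# the second (empty Disallow) allows everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def robots_txt_for(host):
    """Pick the robots.txt body to serve for the given Host header."""
    hostname = host.split(":")[0].lower()  # drop an optional port
    return BLOCK_ALL if hostname in CDN_HOSTS else ALLOW_ALL
```

Crawlers that fetch `/robots.txt` through the CDN host then see the blocking version, while the main site stays fully indexable.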