CDN experiences: good and bad, Amazon CloudFront vs CacheFly
I’ve been building, hosting and maintaining a variety of websites, both commercially and personally, for over a decade now. In the early days, this involved a cutting-edge setup, migrating from the original CERN httpd to a new but rapidly spreading webserver called “Apache” (which had just broken through the 40% market share barrier), running on a veritable behemoth of a server named Dux: a Sun Ultra Enterprise 2, packing a pair of UltraSPARC processors at an awe-inspiring 167 MHz and 192 MB of RAM. That is roughly equivalent to my iPhone today, but in those days a system that powerful would be serving dozens of interactive users simultaneously as well as hosting multiple websites.
The first site we installed under Apache was not the main University website, for a variety of reasons; instead, the first site was a mirror of Yale Medical School’s ‘GASNet’ Anesthesiology website for the benefit of UK users. In those days, transatlantic bandwidth was limited and expensive — Janet, the network linking UK universities to each other and to the Internet, had just introduced a system of charging universities for the transatlantic bandwidth, because the single 45 Mbit/sec link was becoming congested — so mirroring popular US websites was quite a useful thing to do.
A decade later, the connections are two or more orders of magnitude faster and Janet has stopped traffic charging again, but mirroring is still important for performance and reliability reasons. Large downloads benefit particularly, so file distribution sites like Kernel.org and SourceForge have geographically distributed mirrors, with the latter also using geolocation to steer users towards downloading from relatively local servers. Large commercial entities also mirror their sites, either setting up a system themselves or outsourcing the work to a CDN (Content Delivery Network) such as Akamai, who distribute the data across many servers and steer requests towards local servers. Akamai are an extremely large, successful and expensive example, with well over 10,000 servers across six continents: users will always get your data very quickly from a local server, wherever they may be and however busy your site may get, but you’ll pay a price tag to match.
None of my projects have ever been in the market for a solution on Akamai’s scale, but I have always liked the idea of CDN hosting, both in terms of web performance — which was, after all, the subject of my final year dissertation as a student — and the improved efficiency of reducing unnecessary backbone traffic. So, when I saw the CacheFly CDN offering a free trial with a cheap package ($15/month), I jumped at the chance, quickly copying some frequently-accessed static content (images and CSS files) up to their servers and pointing my links there.
I was disappointed. At the time, I discovered that my requests from the UK were being routed to servers which appeared to be in Sweden, sending traffic on a bizarre tour of little-used Internet connections across mainland Europe. As a result, fetching my content through the CDN was actually slower than fetching it directly from either of my own servers, even the one in California! (For obvious reasons, British web users visit American websites far more frequently than Swedish ones, so UK ISP networks are designed accordingly: multiple massive pipes across the Atlantic, plus a token amount of capacity which can get your packets to Sweden if you really insist.) Things seem to have improved slightly since then: UK users visiting CacheFly’s own website now seem to be served from Chicago rather than Sweden, which is probably a little faster, but still far from ideal!
Enter Amazon. Having discovered that people were (mis)using their S3 online storage system for CDN-style static hosting, a task it handled adequately despite never being intended or designed to function as a CDN, Amazon took the next logical step and added a genuine CDN offering, CloudFront. Like S3, it is trivial to set up and requires no commitment; unlike S3, it is a genuine CDN. You upload data into an S3 “bucket” as before, but point users to a CloudFront hostname rather than an S3 one: Amazon’s system takes care of copying data from S3 onto CloudFront servers for you as needed, as well as routing each user request to the nearest CloudFront server. In my case, that means London; as for my two servers in San Jose, requests from one are served by a node a mere 2 ms away, and requests from the other by St Louis.
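In practice, “pointing my links” at a CDN amounts to rewriting static-asset URLs (images, CSS and so on) so that they use the CDN hostname instead of the origin server’s. The Python sketch below illustrates the idea. The hostnames `ORIGIN_HOST` and `CDN_HOST` and the list of static file extensions are hypothetical placeholders, and it assumes the S3 bucket behind the CDN holds the same paths as the original site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames, for illustration only.
ORIGIN_HOST = "www.example.org"
CDN_HOST = "d1234abcd.cloudfront.net"

# Assumed set of "static" file types worth serving from the CDN.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".gif", ".jpg")


def cdn_url(url: str) -> str:
    """Point a static asset on the origin host at the CDN; leave anything else alone."""
    parts = urlsplit(url)
    is_static = parts.path.lower().endswith(STATIC_EXTENSIONS)
    if parts.netloc == ORIGIN_HOST and is_static:
        # Keep scheme, path, query and fragment; swap only the hostname.
        return urlunsplit((parts.scheme, CDN_HOST, parts.path, parts.query, parts.fragment))
    return url


if __name__ == "__main__":
    print(cdn_url("http://www.example.org/css/site.css"))
    print(cdn_url("http://www.example.org/index.html"))
```

Only the hostname changes, so dynamic pages keep coming from the origin server while images and stylesheets are fetched from whichever CDN node is nearest to the user.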