Understanding Web Performance Test Results, Part 1
Guest Blogger: Arthur Zey
So you’re comparing two or more CDN solutions and they’ve all offered you website performance test results, extolling the virtues of their particular solutions. But you’re not necessarily an expert in web technologies, let alone CDN technology, so how can you trust what the sales reps are telling you? How do you interpret the results they present?
Over the next few weeks, I’ll be discussing the various issues you should be aware of when analyzing test results. By understanding what exactly is involved in web performance testing, you’ll be on the right path to finding the best CDN solutions to fit your business needs.
There are 7 things you should consider:
1. Backbone versus last mile testing
2. Why the performance monitoring company matters
3. Testing dynamic content versus cacheable content
4. Full-page versus single-object testing
5. Factoring in DNS resolution time
6. Ensuring the delivered size is the same
7. Disproportionate averaging of monitoring agents
In this installment, I’ll be covering the first two points.
Backbone vs. Last Mile Testing
Backbone testing is by far the most common form of testing offered by performance monitoring companies such as Gomez, Keynote, Catchpoint, and Webmetrics. They host their agents on major Internet backbones, thus avoiding inconsistency introduced by ISPs and routers in the so-called “last mile”. Last mile testing, on the other hand, is achieved by running measurement agents on the computers of participating independent individuals.
There is a widespread myth that because last mile testing uses real end users, it better approximates the performance you can expect and is thus the superior testing methodology. While it is true that last mile testing can give a more accurate comparison between performance with a particular CDN vendor and performance without any CDN at all, the broader claim does not hold up.
If you plan to compare two different CDNs, last mile testing has some major pitfalls. Fundamentally, the problem is that it blends together the impact of many different factors in network performance. Imagine that under a backbone test (Figure 1), CDN Alpha takes 4 seconds to deliver some large object, while CDN Beta takes 2 seconds. It can be fairly said that Beta is twice as fast as Alpha. But with last mile testing, let’s say that Alpha takes 8 seconds and Beta takes 7 seconds.
How can you make sense of these results? The last mile appears to have added 4 seconds to Alpha and 5 seconds to Beta, and Beta's advantage has shrunk from a factor of two to roughly 14%. Is this because Alpha has more nodes closer to last mile users than Beta does, and what could the broader negative implications of that kind of architecture be? Is it just random Internet fluctuation that didn't statistically normalize? Was the test configured with enough variation in the selected populations? Was it configured with populations that favored the connectivity of Alpha over Beta?
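To make the arithmetic concrete, here is a minimal sketch in Python, using the hypothetical timings above, that decomposes the two sets of measurements:

```python
# A minimal sketch of the decomposition above, using the hypothetical
# timings for CDN Alpha and CDN Beta (all values in seconds).
backbone = {"Alpha": 4.0, "Beta": 2.0}
last_mile = {"Alpha": 8.0, "Beta": 7.0}

for cdn in backbone:
    # Overhead attributable to everything beyond the backbone measurement.
    overhead = last_mile[cdn] - backbone[cdn]
    print(f"{cdn}: last-mile overhead = {overhead:.1f}s")

# The relative advantage collapses once the last mile is included:
print(f"backbone ratio:  {backbone['Alpha'] / backbone['Beta']:.2f}x")   # 2.00x
print(f"last-mile ratio: {last_mile['Alpha'] / last_mile['Beta']:.2f}x") # 1.14x
```

The point of the sketch is that the "overhead" numbers blend together POP proximity, ISP conditions, and agent selection; the arithmetic alone cannot tell you which factor is responsible.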
Of course, there’s also the problem of extreme variability over time. The same test on the same CDN may have very different results from day to day–and this may be a function of what last mile agents happen to be participating that day, an ISP’s congestion because of some local event, or any number of other factors. Figure 2 illustrates a very common result for two days of last mile testing. Sure, you’ll get an overall average performance, but how do you interpret the spike at 12:00PM the second day for CDN Alpha? Is that bad performance by the CDN or is it bad luck with respect to the last mile agents that were participating? Why didn’t the spike happen the previous day? Why is CDN Alpha doing better the first day, but worse the second?
Compared to backbone testing, then, last mile testing raises many more questions and complicates the analysis. Backbone testing has issues of its own, such as what my colleague Ted Nixon describes as "CDN Bias" in his blog post, Challenges of Performance Testing for China CDN Vendors; I address that issue next.
Why The Performance Monitoring Company Matters
Each CDN has its preferred performance monitoring vendor. So if CDN Alpha provides you with test results from one monitoring company, and CDN Beta provides you with test results from a different one, how do you interpret the results? What if Alpha and Beta both test with the same monitoring company? And more importantly, how do you account for "CDN Bias"?
It’s no secret that CDNs put their POPs as close as possible to monitoring companies’ agents, and in turn, monitoring companies put their agents in locations and on networks where there are the most real end users. So while it’s not “cheating” for CDNs to be guided by agent locations when growing their POP maps, it is important to know what to look out for in situations where different CDNs provide results from different monitoring companies, presumably with each showing performance favoring themselves.
One way to make sense of this data is to look at "connect time." This metric, usually isolated in performance test results, refers to how long it takes the monitoring company's agent to establish a connection to the CDN. By comparing the connect time for a particular agent against other agents (for the same CDN), and cross-referencing that with the connect times for the other CDN, you can get a sense of how much the performance results depend on the proximity of the CDN to the monitoring agent.
If connect times are roughly equivalent across agents and CDNs, then the overall performance is directly comparable.
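Here is a minimal sketch of that cross-check in Python; the agent names and millisecond values are hypothetical, standing in for the per-agent connect times a monitoring report would break out:

```python
from statistics import mean

# Hypothetical per-agent connect times (ms) for two CDNs, as a
# monitoring report might break them out.
connect_ms = {
    "Alpha": {"agent-nyc": 12, "agent-lon": 15, "agent-sgp": 140},
    "Beta":  {"agent-nyc": 14, "agent-lon": 18, "agent-sgp": 35},
}

# Flag agents where one CDN's connect time is wildly out of line with
# the other's; those are the results most likely driven by POP proximity.
for agent in connect_ms["Alpha"]:
    a, b = connect_ms["Alpha"][agent], connect_ms["Beta"][agent]
    flag = "  <-- disparate: check POP proximity" if max(a, b) > 2 * min(a, b) else ""
    print(f"{agent}: Alpha {a}ms vs Beta {b}ms{flag}")

for cdn, agents in connect_ms.items():
    print(f"{cdn} mean connect time: {mean(agents.values()):.0f}ms")
```

In this made-up data, the Singapore agent would be flagged: Alpha's 140ms there suggests its nearest POP is far from that agent, so its overall numbers from that location are not directly comparable with Beta's.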
However, if the connect times are too disparate, then you need to ask yourself: is the worse-performing CDN doing poorly simply because its POP isn't in the same datacenter as the monitoring agent, or because its network isn't very mature? If CDN Alpha has a POP in country C but CDN Beta does not, then it is clear: you can take Alpha's better performance as a direct, legitimate indication of Alpha's superiority in that region.
But if CDNs Alpha and Beta both have POPs in country C, then you can't be as sure how to interpret the results. In that case, you need to take a closer look at more factors. Most importantly, was the comparison done on mostly dynamic objects or mostly cacheable objects? If mostly dynamic (by number of bytes), did one CDN convey the data (the "content download time") much faster than the other, relative to its connect time? If mostly cacheable, did one CDN's cache respond faster (the "first byte time")?
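A hedged sketch of how you might line up those timing phases side by side, again with made-up per-object numbers:

```python
# Hypothetical per-request phase timings (ms). "first_byte" approximates
# cache responsiveness; "content_download" approximates transfer speed.
phases = {
    "Alpha": {"connect": 30, "first_byte": 45, "content_download": 400},
    "Beta":  {"connect": 32, "first_byte": 120, "content_download": 380},
}

for cdn, t in phases.items():
    total = sum(t.values())
    print(f"{cdn}: total {total}ms "
          f"(connect {t['connect']}ms, "
          f"first byte {t['first_byte']}ms, "
          f"download {t['content_download']}ms)")

# With connect times this close, a gap in first-byte time points at cache
# performance; a gap in content-download time points at throughput for
# dynamic content.
```

In this illustration the two CDNs connect equally fast, so Beta's slower first byte time would suggest a cache-performance difference rather than a proximity difference.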
The answers to these questions will allow you to more accurately compare the performance of two CDNs, despite bias introduced by each using their preferred performance monitoring company.
In the next installment, I will talk about how to effectively test both cacheable and dynamic content, as well as how to select between full-page and single-object testing. Stay tuned, and you’ll soon be on the right path to finding the best CDN vendor to fit your business needs.