This week, the 3Q blog takes on the topic of testing – one near and dear to all digital marketers.
You don’t have to be a mathematician to report the results of ad copy testing; AdWords has the tools you need, and a simple Google search can help you find a calculator. But you do need to understand the fundamentals of statistical significance in order to read the test results correctly.
There are three key elements to identifying winners and losers: sample size, statistical significance, and practical significance. (You can get the full breakdown of SEM AdQ, 3Q’s approach to ad copy optimization, in my new whitepaper.)
Your accuracy, or your ability to predict future results based on past performance, increases as your sample size (in our case, impressions) increases. If we see 2 clicks in 10 impressions, we have a small sample and can be far less confident that the 20% CTR is accurate than if we see 200 clicks in 1,000 impressions. This is a key consideration when planning your tests, because accounts or campaigns with low volume are unlikely to produce meaningful results.
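To make the sample-size point concrete, here is a minimal sketch that compares the uncertainty around the same 20% CTR at the two volumes mentioned above. It assumes a simple binomial model with a normal approximation (which is rough at only 10 impressions); the function names are illustrative and not part of any AdWords tool.

```python
import math

def ctr_standard_error(clicks, impressions):
    """Standard error of an observed CTR under a binomial model (assumption)."""
    ctr = clicks / impressions
    return math.sqrt(ctr * (1 - ctr) / impressions)

def ctr_interval(clicks, impressions, z=1.96):
    """Approximate 95% range for the true CTR, clipped to [0, 1]."""
    ctr = clicks / impressions
    margin = z * ctr_standard_error(clicks, impressions)
    return max(0.0, ctr - margin), min(1.0, ctr + margin)

# Same observed CTR (20%), very different sample sizes.
small = ctr_interval(2, 10)
large = ctr_interval(200, 1000)

print("10 impressions:    %.1f%% to %.1f%%" % (small[0] * 100, small[1] * 100))
print("1,000 impressions: %.1f%% to %.1f%%" % (large[0] * 100, large[1] * 100))
```

The range for 10 impressions spans tens of percentage points, while the range for 1,000 impressions is only a few points wide, which is why low-volume tests rarely settle anything.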
Before we dive into statistical significance, it’s important to differentiate between past observations and future predictions. Results from the past are fact – we know exactly what CTR our ads had last month. What we don’t know with certainty is what CTR we’ll achieve in the coming month, even if not a single change occurs. This variation is due to random chance and can be expressed using Standard Error, or visually with a bell curve:
The middle of this curve represents the most likely results, while results on the tail ends are less likely. CTR in the coming month will most likely be 2.9%, and it’s highly unlikely that it will be 3.2% in this example.
To evaluate statistical significance, we’ll take the overlap of two of these bell curves. The larger the overlap, the less likely it is that the current winner will continue to be the winner.
In the example above, the intersection of the two curves is large, so we have low confidence that the current winner will continue to outperform the current loser. You could also say that the overlap represents the likelihood that we’re wrong.
In this second example, the intersection is small, so it’s highly unlikely that the current loser will end up outperforming the current winner. As we covered in the section above on designing the test, we want to set in advance the percent overlap between these two curves that we’re comfortable with. This creates a binary condition: the overlap is either acceptable, and the result is therefore statistically significant, or it is unacceptable, and the result is not statistically significant.
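One rough way to put a number on "the likelihood that we’re wrong" is sketched below. It assumes each ad's CTR follows a bell curve with a binomial standard error and uses an unpooled two-proportion z-test; this is a common approximation, not necessarily the calculation AdWords performs. The click and impression counts, the function names, and the 5% threshold are all hypothetical.

```python
import math

def normal_cdf(x):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chance_winner_is_wrong(clicks_a, imps_a, clicks_b, imps_b):
    """Approximate probability that the current loser is really the better ad."""
    ctr_a = clicks_a / imps_a
    ctr_b = clicks_b / imps_b
    se_a = math.sqrt(ctr_a * (1 - ctr_a) / imps_a)
    se_b = math.sqrt(ctr_b * (1 - ctr_b) / imps_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        # No measurable variation: identical rates are a coin flip.
        return 0.5 if ctr_a == ctr_b else 0.0
    z = abs(ctr_a - ctr_b) / se_diff
    return normal_cdf(-z)

def is_significant(clicks_a, imps_a, clicks_b, imps_b, max_overlap=0.05):
    """Binary condition: is the chance of being wrong within the preset limit?"""
    return chance_winner_is_wrong(clicks_a, imps_a, clicks_b, imps_b) <= max_overlap

# Hypothetical test: 2.9% vs. 3.2% CTR at two different volumes.
print(is_significant(290, 10_000, 320, 10_000))      # lower volume
print(is_significant(2_900, 100_000, 3_200, 100_000))  # ten times the volume
```

The same CTR gap fails the threshold at the lower volume and passes it at the higher one, which ties the overlap idea back to sample size.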
The third and final element when we’re determining if we have a winner in a controlled test is to evaluate whether the results are practically significant. In other words, is the difference in our key metric meaningful and helpful?
In the example above, both results are drawn from the same sample size, and both are statistically significant; however, the second test is, practically speaking, meaningless and the test was effectively a waste of time.
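A practical-significance check can be as simple as comparing the relative lift against a minimum you decide on before the test starts. The sketch below assumes a relative-lift threshold of 5%; that figure, the sample rates, and the function names are hypothetical and should be replaced with whatever difference actually matters for your account.

```python
def relative_lift(control_rate, variant_rate):
    """Relative change of the variant versus the control (0.10 means +10%)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be greater than zero")
    return (variant_rate - control_rate) / control_rate

def is_practically_significant(control_rate, variant_rate, min_relative_lift=0.05):
    """True when the difference is large enough to matter (threshold is an assumption)."""
    return abs(relative_lift(control_rate, variant_rate)) >= min_relative_lift

# Two hypothetical results that could both pass a statistical test at high volume.
print(is_practically_significant(0.0290, 0.0320))  # roughly a 10% relative lift
print(is_practically_significant(0.0290, 0.0291))  # well under a 1% relative lift
```

A result has to clear both bars, statistical and practical, before it is worth acting on.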
Calculating Statistical Significance
When testing ad copy, we recommend using only the statistical significance data provided in the Ad Variations tool to avoid calculation errors. This tool also lets you evaluate statistical significance on metrics beyond CTR and CVR (or CPI). Note that it does not take sample size or practical significance into account, so be sure to apply good judgment and advance planning here. While we want you to rely on tools for ad testing, the fundamentals of statistical significance are important to keep in mind for any A/B testing analysis you may be running, whether it’s a bid test or a landing page test.