
You just launched an AdWords Campaign Experiment to test two ad variations in your top-volume campaign. Great job! But now what?

Actual A/B test implementation is only part of the battle to make data-based improvements in your account. The most important part of testing is often forgotten: test evaluation. Here are a few steps to follow the data correctly.

Make Your Hypothesis

To draw conclusions based on statistics, you must first state your hypothesis. In the example below, we predict that including the free shipping offer will boost conversions over impressions. I use conversions over impressions for ad comparisons because it incorporates both CTR and CVR. We’ll make the call to end the test and move forward with the test creative if we hit 90% confidence.

Control ad:

Memory Foam Pillows
Get a Better Night Sleep With A New
Memory Foam Pillow. Buy Yours Now!

Test ad:

Memory Foam Pillows
Get a Better Night Sleep With A New
Foam Pillow. Get Free Shipping Now!

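The numbers below are made up, just to show why conversions over impressions rolls CTR and CVR into a single number:

```python
# Made-up numbers, purely to illustrate the metric.
impressions = 10_000
clicks = 300       # CTR = 300 / 10,000 = 3%
conversions = 15   # CVR = 15 / 300 = 5%

ctr = clicks / impressions
cvr = conversions / clicks
conv_per_impression = conversions / impressions

# Conversions over impressions is simply CTR x CVR, so a change in
# either rate moves the one number we're testing.
print(f"CTR: {ctr:.2%}  CVR: {cvr:.2%}  Conv/Impr: {conv_per_impression:.3%}")
# CTR: 3.00%  CVR: 5.00%  Conv/Impr: 0.150%
```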

There are two important reasons for this step in A/B testing:

1. You reduce the test from a two-tailed test to a one-tailed test (i.e., you’ve made a directional claim instead of just hypothesizing that the two ads will perform differently).

2. You’ve clearly defined which metrics you’ll compare and defined the confidence level. This avoids the common mistake of looking for any difference once the test has started. If you poke around for differences, chances are (hint, hint – we’re talking statistics), you’ll find one.

Set a Testing Time Frame

Since many factors impact performance, account for them by setting a minimum test window that spans normal account fluctuations, such as day of week. Beyond a time limit, an impression or click volume minimum can also help you avoid making decisions too early. Typically I like to see 100 conversions total before testing for significance. The more data gathered, the more confident you can be that performance gains will continue in the future.
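Here’s a rough sketch of that gate in Python; the 14-day minimum is an assumption for illustration, while the 100-conversion floor is the one suggested above:

```python
from datetime import date

MIN_DAYS = 14            # assumed minimum: two full weekly cycles
MIN_CONVERSIONS = 100    # total conversions before testing significance

def ready_to_evaluate(start: date, today: date, total_conversions: int) -> bool:
    """Only evaluate the test once both the time and volume minimums are met."""
    return (today - start).days >= MIN_DAYS and total_conversions >= MIN_CONVERSIONS

# Example: 10 days in with 80 conversions -> keep the test running.
print(ready_to_evaluate(date(2024, 6, 1), date(2024, 6, 11), 80))  # False
```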

You’ll also need to plan when to bail out and call the test a failure. Your hypothesis will only occasionally be true. The elements being tested will not always impact performance.

How do you know when to cut your losses and move on? Set a time and/or volume maximum for when you’ll end the test if you haven’t reached statistical significance.

Many people are tempted to let a test run longer than necessary. Instead of getting absorbed in a single A/B test, move on to another test that you predict will have a larger impact.

Test for Statistical Significance

Once you’ve hit your time and/or data minimum, it’s time for the calculations. Note that taking the percent difference between the test and control is not testing for statistical significance. Use formulas within Excel to build your own statistical significance calculator, or find one online. (Avinash Kaushik provides a good example here.) If you’re using AdWords Campaign Experiments, a statistical calculator is built in and alerts you to confidence with small arrow icons next to each metric. Be careful in ACE because it will automatically report statistical significance for every metric – don’t get distracted!
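If you’d rather script the check than maintain a spreadsheet, here is a minimal one-tailed two-proportion z-test in Python using only the standard library. It’s a sketch with hypothetical numbers, not a reimplementation of ACE’s built-in calculator:

```python
from math import sqrt, erf

def one_tailed_confidence(conv_a, impr_a, conv_b, impr_b):
    """Confidence that ad B beats ad A on conversions over impressions
    (one-tailed two-proportion z-test)."""
    p_a, p_b = conv_a / impr_a, conv_b / impr_b
    pooled = (conv_a + conv_b) / (impr_a + impr_b)
    se = sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    z = (p_b - p_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF of z

# Hypothetical results: control ad vs. free-shipping test ad.
conf = one_tailed_confidence(conv_a=60, impr_a=50_000, conv_b=80, impr_b=50_000)
print(f"Confidence: {conf:.1%}")  # ~95.5% -- clears the 90% threshold
```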

Only review the metrics included in your hypothesis, and only accept a confidence level that meets your minimum. If you only reach 89% confidence that the difference isn’t due to chance, you haven’t hit your 90% threshold and the data is not statistically significant. If you say “almost statistically significant,” you aren’t speaking the language of statistics.

Statistically Significant Difference vs. Meaningful Difference

Being statistically significant does not mean that an A/B test has made a noticeable impact on performance. You could see a 1000% improvement in conversions over impressions or a 1% improvement, and both could be statistically significant depending on the sample size. Statistical significance does not give insight into the size or impact of A vs. B – it just signifies that any observed difference is less likely to be due to random chance.
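As a made-up illustration, the same 10% relative lift (1.0% to 1.1% conversions over impressions) can fall short of or clear the confidence bar purely because of sample size:

```python
from math import sqrt, erf

def one_tailed_confidence(conv_a, impr_a, conv_b, impr_b):
    # Same one-tailed two-proportion z-test as in the earlier sketch.
    p_a, p_b = conv_a / impr_a, conv_b / impr_b
    pooled = (conv_a + conv_b) / (impr_a + impr_b)
    se = sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    return 0.5 * (1 + erf((p_b - p_a) / se / sqrt(2)))

# 10,000 impressions per ad: ~76% confidence -- not significant.
print(f"{one_tailed_confidence(100, 10_000, 110, 10_000):.1%}")
# 1,000,000 impressions per ad: essentially 100% confidence -- significant, same lift.
print(f"{one_tailed_confidence(10_000, 1_000_000, 11_000, 1_000_000):.1%}")
```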

Now What?

Keep testing!