After last week’s great round-up on the Quality Score debate from Clix, I’ve been noodling on QS again –  and as the single named antagonist, I feel compelled to antagonize a bit more. (Which might make for a fun dynamic on the SMX East Enhanced Campaigns panel I’m sharing with brilliant QS advocate Larry Kim on Oct. 1.)

Seriously, though, I just happened to notice some really odd QS behavior this week. Some caveats: the campaign numbers aren’t statistically significant, and the results have been bounced through some AdWords fluctuations (ahem, Enhanced Campaigns). But since most marketers optimizing less-than-massive accounts are subject to these conditions when analyzing data, I firmly believe the conclusions are relevant.

Let me set the stage:

We have a client whose head terms are extremely position-sensitive and sensitive to competitor actions.  We set up a campaign a little over 6 months ago to serve as a “test environment” and have run a series of various position tests within this campaign. We purposely duplicated a handful of key exact-match head terms within this campaign – each duplicate lives in its own ad group like this:

Test Ad Group [keyword 1]
Control Ad Group [keyword 1]
Test Ad Group [keyword 2]
Control Ad Group [keyword 2]

And so on…

Over these 6 months, we have put the test groups vs. the control groups through a series of tests that look at the best way to approach position for this advertiser. We’ve used different bidding tools and methodologies, and we’ve tested positions that differ by only .10 on average between the 2 groups. These tests are executed in shorter intervals, and in most of the tests we’ve used AdWords experiment tools to split the impressions in an A/B fashion.

(Oh, and just to get this out of the way, this client only has 1 lead gen landing page, and the ad texts are also cloned for the test and control ad groups.)

Earlier in the week, I happened to be looking at an aggregate 6 months of data for this whole campaign at the keyword level and noticed some really weird numbers around QS!

(Screenshot: quality score for a duplicated exact-match keyword, in the same campaign, with the same landing page and same ad text)

So here are the 6 months of data for 1 of the keywords – we have relatively even impressions and clicks for this one, resulting in ~11% better CTR for one variation. Avg Position is only .1 different. The conversion rate is quite different, with one variation ~50% better than the other – but with the same ad text and LP, I’d attribute this to one of two things. Either it has to do with how we were treating position on some earlier test that collided with some seasonality or shifting competitor strategy, or it’s just random.

In either case, it’s not statistically significant enough to be sure. Still, a 2-slot difference in QS? And the higher QS group has double the first page CPC? Huh?
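For anyone who wants to sanity-check a CTR gap like this one, here is a minimal sketch of a two-proportion z-test in Python. The click and impression counts are hypothetical (chosen only to give roughly an 11% relative CTR difference on even impressions); they are not the client’s numbers, and the function name is just an illustration.

```python
import math

def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Return (z, two-sided p-value) for the difference between two CTRs."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both ad groups share one true CTR
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (ctr_a - ctr_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 5.55% vs. 5.00% CTR, i.e. ~11% relative difference
z, p = two_proportion_z_test(clicks_a=111, impressions_a=2000,
                             clicks_b=100, impressions_b=2000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With volumes in this range, the p-value lands well above the usual 0.05 cutoff, which is the “could just be random” situation described above.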

Here’s another one:

(Screenshot: quality score data)

Here we have a much bigger difference in average position, less activity overall, and less parity in the impression/click numbers.  There is a way bigger CTR delta and a way bigger conversion rate delta. Both favor the higher QS. First-page CPC is relatively the same here though.

And one more:

(Screenshot: quality score study)

Higher QS aligned with lower CTR and only slightly higher CVR. CPA is overall higher – certainly not benefiting from CPC discounting on the higher QS.

I’ll also say that within this campaign there are duplicate pairs that DO have the same QS despite wildly different metrics, such as this:

(Screenshot: more quality score)

Better CTR and CVR on one variation but the same QS.

So, what does this all mean? Well, as I mentioned, these are all small, statistically insignificant data sets. Additionally, this data was gathered over 6 months with several forced and varied test intervals we executed. In those 6 months, there have been large shifts in AdWords (Enhanced Campaigns!), shifting competitor landscape for this advertiser, and some major seasonal spikes. All of those factors are layered on top of each other, so this is in no sense “clean.”

However, to be realistic, most advertisers consistently are working with: a) data sets that are way smaller than truly stat sig data even when you aggregate their entire account activity; b) shifting marketplace conditions.
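To put a rough number on “way smaller than truly stat sig,” here is a sketch of the standard sample-size approximation for comparing two proportions. The 5% baseline CTR, the 11% relative lift, and the 5% significance / 80% power settings are all assumptions picked for illustration, not figures from this account.

```python
import math
from statistics import NormalDist

def required_impressions_per_group(baseline_ctr, relative_lift,
                                   alpha=0.05, power=0.8):
    """Approximate impressions needed per ad group to detect a CTR lift
    with a two-sided two-proportion z-test."""
    p1 = baseline_ctr
    p2 = baseline_ctr * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Assumed inputs: 5% baseline CTR, 11% relative lift
n = required_impressions_per_group(baseline_ctr=0.05, relative_lift=0.11)
print(f"~{n:,} impressions per ad group")
```

Under these assumptions, the answer is in the tens of thousands of impressions per ad group for a single keyword, which is more than many accounts see on a head term in any stable stretch of time.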

Most advertisers can also only react to the data that they have and that is reflected in their account. So, what would the proper reaction/optimization here be? It seems strange that there isn’t even enough anecdotal consistency to support a “gut check” optimization.

I will say that the data Larry Kim has gathered by aggregating hundreds of accounts is surely sound and does show strong correlations. I also know that Google is pretty brilliant at algorithmically rewarding things that ultimately make more money for them, and that their data pool is of staggering scale, covering 100% of all AdWords activity. However, I repeat my mantra that for most (perhaps all) advertisers working within the scale of even a single huge account, reacting to QS is the wrong thing to do. Once again, exercise the best practices and ignore the QS column.

4 Comments

  1. Terry Whalen September 24th, 2013

    Susan, you are spot on with your post on QS.

    “However, I repeat my mantra that for most (perhaps all) advertisers working within the scale of even a single huge account, reacting to QS is the wrong thing to do. Once again, exercise the best practices and ignore the QS column.”

    –Terry

    p.s. Larry’s message is a good one for AdWords novices – but for non-novices, QS should be ignored. Additionally, Larry never puts QS analysis into context – the context is that for some sets of keywords, you can never have a good QS if you stay true to your value prop; but those sets of keywords may be nicely profitable for you!

  2. Martin Roettgerding September 25th, 2013

    Susan,
    Full agreement on the conclusion – do what’s best and ignore the QS column.

    Still, I’d like to comment on a few things. First of all, there was a massive change in how Google calculates keyword QS. Since then, keyword QS is made up of three equally important components: predicted CTR, ad relevance, and landing page experience. Check your own keywords’ status information. Your QS 1 keywords should almost always have three “below average” components; your QS 2 keywords have two below average and one average.

    An above average evaluation of landing page experience can yield you a four, even if the rest is below average. However, this won’t do anything to your actual QS (the one that’s used in the ad auction) since the influence of landing page experience is negligible there.

    More on this is here: http://www.ppc-epiphany.com/2013/08/08/what-really-happened-in-the-latest-quality-score-update/

    QS aside, there are some other things we can’t evaluate. Average position is one of these things. CTR is another. Our industry treats these metrics as if they were meaningful, but they hardly are. Both metrics are just averages.

    The third table demonstrates this nicely. The control group has a higher CTR despite being in a lower average position. If keywords and ads are the same, this should never happen.

    This is why I’d expand the conclusion to these metrics, too: ignore them as well. They are byproducts, not KPIs.

  3. Susan Waldes September 25th, 2013

    Martin – great points!
    I frequently talk about how this industry is still more “art” than science. Data like average POS and CTRs can be totally misleading. Even if taken “directionally,” in the sense that x on average is better than y on average, most advertisers do not have stat sig data sets, and optimizations require a healthy dose of “gut checking.”

  4. Bryant Garvin February 19th, 2014

    Susan,

    Just a quick note… One thing you may want to double check for both of those ad groups is to segment them by Network. There is a REALLY good chance the control group in your first example is running mostly on Search Partners…

    Whenever I see data like this that doesn’t quite make sense (especially because so many top level metrics are averages or aggregates), I always try to drill down another layer to see if there might be a logical reason behind the differences.

    Still I love that you dig in, and stand your ground even when you are in the minority ;-)

Susan Waldes
Susan Waldes has worked in the search engine marketing industry since 1999; she is currently the SVP of Client Services at Fivemill Marketing. Susan has handled a multitude of lead generation, branding, and eCommerce clients in her previous roles at ROI Revolution and Rockett Interactive and as an independent SEM consultant. Susan has a BFA from Savannah College of Art and Design. Susan has contributed insights about SEM and client relationships to other highly regarded outlets, including Techipedia.com.