Tracking is a vital part of any PPC analyst’s job. Without tracking, it is almost impossible to accurately judge the profitability of any PPC campaign.
Testing Blog Posts
When The Adwords Sweet Spots Turn Sour…
Thursday, October 18th, 2007
Advert Text, Bidding, Content Network, Google Adwords, Pay Per Action, PPC Campaigns, Testing
I blogged a while back about the sweet spot for your campaign, and how to find it. Basically, you estimate the conversion rate, cost per click and clickthrough rate for each position that your advert can appear in, and calculate how profitable each one is. You should find that one position is more profitable than the ones above or below it, and so this is where you should be putting your advert.
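As a rough sketch of that calculation, the profit at each position can be estimated as the number of clicks multiplied by the margin on each click. Every figure below is invented for illustration, and `profit_by_position` is a hypothetical helper name rather than anything from a real tool.

```python
# Hypothetical forecasts for positions 1 to 5; none of these figures come from a real campaign.
IMPRESSIONS = 10_000          # assumed to be the same for every position
VALUE_PER_CONVERSION = 100.0  # assumed profit per conversion before click costs (£)

forecasts = {
    # position: (clickthrough rate, cost per click in £, conversion rate)
    1: (0.100, 1.80, 0.02),
    2: (0.080, 1.40, 0.02),
    3: (0.060, 1.00, 0.02),
    4: (0.045, 0.80, 0.02),
    5: (0.030, 0.70, 0.02),
}


def profit_by_position(forecasts, impressions, value_per_conversion):
    """Return {position: estimated profit} for each forecast position."""
    profits = {}
    for position, (ctr, cpc, conversion_rate) in forecasts.items():
        clicks = impressions * ctr
        # On average each click earns conversion_rate * value and costs cpc.
        profits[position] = clicks * (conversion_rate * value_per_conversion - cpc)
    return profits


profits = profit_by_position(forecasts, IMPRESSIONS, VALUE_PER_CONVERSION)
sweet_spot = max(profits, key=profits.get)

for position, profit in profits.items():
    print(f"Position {position}: £{profit:,.0f}")
print("Sweet spot:", sweet_spot)
```

With these made-up numbers the top position gets the most clicks but earns very little on each one, so the most profitable position sits in the middle of the range.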
Which is all fine and dandy. But the other day, I was doing some forecasts and my profit curve looked like this:
Clearly, I’d made a mistake! So I went back, and checked my forecasts for the clickthrough rate, the conversion rate and the cost per click. Here they are…
I’ve changed the actual figures, but the result is the same. With a profit per conversion of £300, this gave me an inverted profit curve.
Assuming that the cost per click is higher for higher positions, the conversion rate is lower or the same for higher positions, and the clickthrough rate is higher for higher positions, the profit from each conversion must be higher in lower positions. In my case, the conversion rate was clearly higher the lower my advert appeared. If this effect outweighed the increased number of clicks that I got in a higher position, then it’s possible that I’d predicted more conversions in a lower position than in a higher one. For example, if 5th place generated 5,000 clicks with a 3% conversion rate, and 6th place generated 4,000 clicks with a 4% conversion rate, then 5th place would generate 150 conversions, and 6th would generate 160.
Clearly this is a danger when forecasting, particularly if you extrapolate beyond the range of your data. I can’t accept that you can get more conversions from a lower position in practice unless you have a restrictive budget (which I didn’t), so I looked at my data to see if this was the problem…
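To make the arithmetic concrete, here is a small sketch. The click counts, conversion rates and the £300 figure are the ones used above; the costs per click (£1.20 and £1.00) are invented, and `net_profit_per_conversion` is a hypothetical helper.

```python
GROSS_PROFIT_PER_CONVERSION = 300.0  # £, the figure used in the post


def conversions(clicks, conversion_rate):
    """Expected number of conversions from a given number of clicks."""
    return clicks * conversion_rate


def net_profit_per_conversion(gross_profit, cost_per_click, conversion_rate):
    """Profit left from one conversion after paying for the clicks needed to win it."""
    # On average it takes 1 / conversion_rate clicks to get one conversion,
    # so the click cost of a conversion is cost_per_click / conversion_rate.
    return gross_profit - cost_per_click / conversion_rate


fifth = {"clicks": 5000, "conversion_rate": 0.03, "cpc": 1.20}  # cpc is an assumption
sixth = {"clicks": 4000, "conversion_rate": 0.04, "cpc": 1.00}  # cpc is an assumption

for name, place in (("5th", fifth), ("6th", sixth)):
    won = conversions(place["clicks"], place["conversion_rate"])
    net = net_profit_per_conversion(
        GROSS_PROFIT_PER_CONVERSION, place["cpc"], place["conversion_rate"]
    )
    print(f"{name}: {won:.0f} conversions, £{net:.2f} net profit per conversion")
```

A cheaper click and a higher conversion rate both push the net profit per conversion up, which is why it rises as the advert moves down the page.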
So that’s not the problem. Finally, I looked at the profit per conversion, the number of conversions, and the product of the two (the total profit).
The number of conversions is lower in lower positions, the profit from each is higher, and you get this ‘inverted’ profit curve – a ‘sour spot’. So, the question is whether this is possible in reality, or if it’s just a flaw in the forecasting method. The answer is surprisingly simple once you think about it. If you advertise in a very low position (say, 100), you’ll get almost no conversions, and hence make almost no profit. The true shape of this curve would probably be something like this:
It’s possible that multiplying these two monotonic functions (conversions and profit per conversion) can generate two turning points in your profit curve – a maximum and a minimum. I can accept that this is possible, and graphs of the above shape will have a sweet-spot of either 1st or the local maximum (in the above example, 6th). This raises one final question. In the above example, I looked at the top six positions, saw the sour-spot and understood that I needed to extrapolate further. But if I’d only run the advert in positions 3 to 8, I would have seen a sweet-spot, and thought no more about it. In this case, I’d still (just about) have the correct sweet-spot, but another time, I may have missed out on potential profit. And perhaps I have done. My conclusion is this – extrapolate your data as far as possible, limiting your graph only at your total budget. See if this kind of shape is a possibility, and investigate it.
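A quick way to see how two monotonic curves can multiply into a curve with both a sour spot and a sweet spot is to try it with numbers. The figures below are invented purely to show the shape (they are not my forecasts), and `turning_points` is a hypothetical helper.

```python
# Illustrative forecasts for positions 1 to 8 (invented numbers).
conversions = [100, 80, 70, 65, 62, 60, 40, 20]                    # falls with position
profit_per_conversion = [100, 110, 120, 135, 150, 160, 165, 168]   # rises with position (£)

total_profit = [c * p for c, p in zip(conversions, profit_per_conversion)]


def turning_points(values):
    """Return (local_minima, local_maxima) as 1-based positions, ignoring both ends."""
    minima, maxima = [], []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] < values[i + 1]:
            minima.append(i + 1)
        elif values[i] > values[i - 1] and values[i] > values[i + 1]:
            maxima.append(i + 1)
    return minima, maxima


sour_spots, local_sweet_spots = turning_points(total_profit)

# The overall sweet spot is either 1st place or one of the local maxima.
candidates = [1] + local_sweet_spots
best = max(candidates, key=lambda position: total_profit[position - 1])

print(total_profit)
print("Sour spots:", sour_spots)
print("Local maxima:", local_sweet_spots)
print("Best position:", best)
```

In this made-up data the curve dips at 3rd, peaks again at 6th, and 1st place still beats the 6th-place peak. Anyone who had only tested positions 3 to 8 would have settled on 6th and never seen the better option.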
A:B Advert Testing, A Cautionary Tale
Monday, July 9th, 2007
Adgroups, Advert Text, Google Adwords, Pay Per Action, Testing
The conventional wisdom on PPC adverts on Google is that you should look to improve the click through rate, as it is generally accepted that this is an important attribute in the Quality Score, which determines the amount that you need to bid to get a certain position (or how high up the rankings you appear for your bid, if you prefer). And this is probably true, and isn’t a bad idea. But it’s definitely not a good idea to focus on the click through rate to the exclusion of all else. The click through rate is an indication of how interested people are in your advert, but if your advert does not accurately represent the content of your site, you’ll be enticing traffic that doesn’t convert very well, and may be putting off exactly the people that you should be attracting to your website. This sounds like an easy thing to avoid, but it’s not quite as straightforward as it sounds. Suppose that you are a company that offers free marketing advice via a weekly e-mail that people have to sign up for. Your initial advert may read:
Free Marketing Advice
Get Free Advice From Marketers Inc
Free E-Mails Every Week
MarketersInc.com/Advice
The advert does quite well, and gets conversions occasionally. But you’re concerned that the second line is a fairly weak call to action, so you decide to try something different.
Free Marketing Advice
Get Free Advice Here!
Free E-Mails Every Week
MarketersInc.com/Advice
You run it for a while, and it doubles the click through rate, so within a day or two you bin the old advert and go forward with the new one. Then you look at the third line. It doesn’t really extol the benefits of the e-mails, so you try another line.
Free Marketing Advice
Get Free Advice Here!
Learn The Tricks Of The Trade
MarketersInc.com/Advice
Even better click through rates, so you keep this one.
But the changes in the second line may lead people to believe that there is free information on your website, rather than from a marketing company. Whilst you’ll get more traffic to your site, it’ll be of poorer quality. And the change to the third line reinforces this.
But surely you’ll see a fall-off in the conversion rates, and keep the old adverts? Not if you’re changing your adverts as soon as one appears significantly better than the other, based on click through rates.
Suppose that the campaign above starts out with a click through rate of 3%, then increases to 6% and 8%. At the same time, the conversion rate falls from 10% to 7% to 4%. Finally, assume that the cost per click moves from £0.30 to £0.28 to £0.25. If you accept a 90% level of significance, your results look something like this.
There is no real falloff in the number of conversions, and the difference in conversion rates is nowhere near significant. In fact, to get significant results (even at the 90% level) for the conversion rates, you’d need to wait much longer.
To put that in context, if you were getting 400 impressions per day, the tests for click through rates would take (1 + 3 =) 4 days, whereas the tests for conversion rates would take (38 + 8 =) 46 days. That’s quite a lot longer.
So, what’s the conclusion here? Should you run your campaigns for ten times as long, to confirm that the new advert doesn’t hit your conversion rates? Bear in mind that the changes above are quite extreme; with more typical changes, it’s unlikely that your results will show anything even after waiting ten times as long. So when do you draw the line, and say that the change is too small to matter? Even here, we’ve not taken into account the impact of reducing the cost per click (which will slightly offset a reduced conversion rate), or the impact of increasing the total number of conversions (even at a slightly higher cost per conversion, this could still be a good thing).
Alternatively, should you just ignore the conversion rate, and hope for the best? Or try very hard not to change the meaning of the advert? You only need to write one bad advert to wreck your campaign.
Perhaps the best approach is to physically compare the conversion rates of the adverts that you are dropping with those of their replacements. If the replacements’ rates are lower, then ask the question “have I caused this to happen?” The fewer conversions that you are getting, the harder it’ll be to spot a problem, so monitor the conversion rate, and if it starts to drop, check to see if you’re the cause.
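For anyone who wants to check this sort of thing themselves, here is a minimal sketch of a significance check. It assumes a one-sided, pooled two-proportion z-test with a normal approximation, which is only one common choice and not necessarily what any particular testing tool does. The function name `one_sided_confidence` is hypothetical, and the counts are expected values from the rates above, so they can be fractional.

```python
import math


def one_sided_confidence(successes_a, trials_a, successes_b, trials_b):
    """Confidence that the advert with the higher observed rate really is better.

    Pooled two-proportion z-test using a normal approximation (an assumed method).
    """
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if standard_error == 0:
        return 0.5  # no variation at all, so nothing can be concluded
    z = abs(rate_a - rate_b) / standard_error
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


# One day at 400 impressions, split evenly between the first two adverts.
impressions = 200
clicks_old = impressions * 0.03  # 3% click through rate
clicks_new = impressions * 0.06  # 6% click through rate
sales_old = clicks_old * 0.10    # 10% conversion rate
sales_new = clicks_new * 0.07    # 7% conversion rate

ctr_confidence = one_sided_confidence(clicks_old, impressions, clicks_new, impressions)
conversion_confidence = one_sided_confidence(sales_old, clicks_old, sales_new, clicks_new)

print(f"Click through rate difference: {ctr_confidence:.0%} confident")
print(f"Conversion rate difference:    {conversion_confidence:.0%} confident")
```

Under these assumptions the click through rate difference already clears the 90% level after a single day, while the conversion rate difference is nowhere near it, simply because there are so few clicks to measure it on.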
A:B Advert Testing – Is Statistical Significance Over-Rated?
Friday, June 29th, 2007
Adgroups, Advert Text, Google Adwords, Pay Per Action, Testing
On the face of it, probably a bit of a daft question. How can you be sure that your new advert is better than the old one, if you don’t wait to see if it’s statistically significant? And to an extent, that’s true. If you were to ignore significance completely, the moment somebody clicked through one of your adverts, you’d decide that it was the better advert, and bin the other one. It’s quite possible that you’d select the better advert only 50% of the time, so that for every improvement that you make to your advert, you make another change for the worse, and you don’t get any overall improvement at all.
But there’s a trade-off for statistical significance. Suppose that you have two adverts, one that generates a click-through rate of 5%, and one that generates a click-through rate of 10%. How long should you wait before you are sure the 10% advert really is better? If you get 30 impressions per day, it’ll take four days to be 85% certain (3/60 vs. 6/60 is significant at the 85% level). But if you want to be 95% certain, it’ll take eleven days (8.25/165 vs. 16.5/165 is significant at the 95% level). And to be 99% certain, it’ll take twenty days! So, in the time that it takes to run one test at the 99% level, you can run five tests at the 85% level. Clearly, you can get far quicker improvements in your overall click-through rate, if most of these changes are genuinely for the better.
But what about the risks? You could choose to keep adverts that are, in fact, worse than the existing ones (and you will, 15% of the time; any change to an advert will change the click-through rate, so there are no ‘equally good’ adverts). But I would argue that if an advert appears better at the 85% level, whilst it may be worse, the chances are very small that it’ll be much worse. So, if you run five tests in those twenty days, you’ll probably make one change for the (slightly) worse, and four changes for the better. That’s still an improvement on the one change that you’d make if you were determined to wait until you were 99% certain that you were making the right choice. This is advertising, not a clinical trial!
Of course, this is a bit of an over-simplification. In reality, most of your advert tests will yield a much smaller return than doubling the click-through rate, and a lot of them will not be better than the old advert.
The first point here is quite important: the smaller the difference between the two adverts (increasingly true once you’ve entered an ongoing process of testing), the longer it’ll take to get strong significance, and the less risk there is in taking the wrong option occasionally. For example, if you were getting 30 impressions per day, and had adverts with 5% and 6% click-throughs, you’d get 85% significance after 75 days, but even 95% significance is going to take 193 days, nearly three times as long.
As for the second point, what if the new advert is performing worse than the existing one after a few days? It’s not significant, but, in a mirror of the argument so far, if it is in reality a better advert, is it likely to be much better? Is it worth waiting weeks to see if this advert, that’s probably worse than the existing one, is actually slightly better (remember that the smaller the difference, the longer it’ll take to be sure)? Perhaps the time is better spent writing a new challenger, which may prove itself quickly.
So what level of significance should you use? Personally, I’d say that 85% is probably sufficient, but I can see an argument for 90%. I feel that running a test for three times as long (as an 85% test) to get to 95% is excessive. Yes, you’ll get it wrong less often, but it’ll take a lot longer to generate improvements, and let’s face it, your rivals probably aren’t standing still!
There is, of course, one problem that brings the whole process to a grinding halt. What if the two adverts are producing very similar results? It’s widely acknowledged that a small change to an advert can have a big impact, but more often than not, it has a very small impact. Everything stops until you get significant results, and the more similar the performance of the adverts, the longer it’ll take.
The solution is fairly clear: sooner or later, you’ll have to stop the test. You can either keep the existing advert, since the new advert hasn’t proven itself, or you can take whichever is the better to date, regardless of whether it’s significant or not (this’ll be the better advert more often than not). I’d advocate the second option, although really, it doesn’t make much difference which you choose (since they are performing very similarly). An interesting claim: that under certain circumstances, you should take the advert that is performing better, regardless of whether it’s significant or not!
So what process have we arrived at?
- Decide before you run your new advert how long you are willing to wait for a result. This’ll depend on how long you’ve been testing (as you go on, the chances of finding a quick, big win decrease) and (obviously) how many impressions you are getting.
- Set the advert live, checking regularly for significance. I’d recommend www.splittester.com, but any testing tool will do.
- If, after a few days (longer if you’ve got little traffic), the new advert is worse than the old one, kill it, and write a new advert.
- Once you’ve got 85% significance (or 90%, if you’re of a nervous disposition), keep the better advert.
- If the deadline set in step one is reached without a significant result, keep the better advert, regardless of how small the difference is.
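The waiting times discussed in this post and the five steps above can be sketched in a few lines. This is a rough illustration and not a definitive implementation: it assumes a one-sided, pooled two-proportion z-test on expected clicks and an even split of impressions between the two adverts, and the names `confidence`, `days_to_significance` and `next_action` are all hypothetical.

```python
import math


def confidence(rate_a, rate_b, impressions_each):
    """One-sided confidence that the higher of two click-through rates is genuinely higher.

    Assumes a pooled two-proportion z-test with equal impressions for each advert.
    """
    pooled = (rate_a + rate_b) / 2
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / impressions_each)
    if standard_error == 0:
        return 0.5
    z = abs(rate_a - rate_b) / standard_error
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


def days_to_significance(rate_a, rate_b, impressions_per_day, level, max_days=3650):
    """Smallest whole number of days before the test reaches `level`, or None if it never does."""
    for days in range(1, max_days + 1):
        impressions_each = days * impressions_per_day / 2
        if confidence(rate_a, rate_b, impressions_each) >= level:
            return days
    return None


def next_action(days_run, deadline_days, early_check_days, level_reached, challenger_ahead):
    """Apply the five steps above and say what to do with the new (challenger) advert."""
    better = "keep challenger" if challenger_ahead else "keep existing advert"
    if level_reached:                 # significant result: keep the better advert
        return better
    if days_run >= deadline_days:     # deadline reached: keep whichever is ahead
        return better
    if days_run >= early_check_days and not challenger_ahead:
        return "kill challenger, write a new advert"
    return "keep testing"


print(days_to_significance(0.05, 0.10, 30, 0.85))
print(days_to_significance(0.05, 0.10, 30, 0.95))
print(next_action(days_run=3, deadline_days=30, early_check_days=3,
                  level_reached=False, challenger_ahead=False))
```

Under these assumptions the sketch gives 4 days at the 85% level and 11 days at the 95% level for the 5% versus 10% example, in line with the figures quoted earlier. The `early_check_days` parameter stands in for "a few days" in the third step, so set it higher if you have little traffic.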
Multivariate Testing On PPC Adverts
Friday, June 29th, 2007
Advert Text, Google Adwords, Testing
When most experts talk about testing adverts on Google Adwords, they recommend that you change only one thing in the advert, so that your results aren’t confused by conflicting changes (if one change improves your click through rate, and one makes it worse, then the overall impact could be better or worse, and you’ve no idea what worked and what didn’t).
And that’s all ducky. But it assumes that the lines of your advert work independently of each other. What happens if this isn’t the case?
Consider the two adverts below:
A)
ABC Mortgages
Free Advice And Valuations
Competitive Rates
Suppose that you’ve been running A) for a while, and try a new third line, “Quick And Friendly Service”:
B)
ABC Mortgages
Free Advice And Valuations
Quick And Friendly Service
You get the following results:
A) 10000 impressions, 1000 clicks, click through rate = 10%
B) 10000 impressions, 800 clicks, click through rate = 8%
So you keep advert A)
Then you try it against another advert, with a second line of “Your Local Mortgage Broker”:
C)
ABC Mortgages
Your Local Mortgage Broker
Competitive Rates
You get the following results:
A) 10000 impressions, 1000 clicks, click through rate = 10%
C) 10000 impressions, 800 clicks, click through rate = 8%
Again, you keep advert A).
But have you missed something here? We’ve no idea if advert A) is better than an advert with both changes:
D)
ABC Mortgages
Your Local Mortgage Broker
Quick And Friendly Service
It’s quite possible that the two new lines work well together, as they have a similar tone. This being the case, you could potentially get the following results:
A) 10000 impressions, 1000 clicks, click through rate = 10%
D) 10000 impressions, 1200 clicks, click through rate = 12%
This is quite possible, but you can’t run this test instead of the others, in case one of the lines was better, and the other worse. And if you run it as well as the others, it feels a bit like you’re wasting time. And if you wanted to try varying all three lines, you’d need to run seven different tests, instead of just three.
In principle, the solution is quite simple. There’s an easy way to test your three new adverts in the time of two tests or, if you are changing all three lines, to test all eight adverts in the time it would take to run four tests:
Run all possible combinations of the adverts against each other at once.
The above tests ran for a total of 20,000 impressions each, but how many impressions would it take to get the best of the four options?
Only 40,000 impressions, because you don’t need to waste 50% of your impressions on the control, only 25%. This is the same amount of time that you’d have spent just testing A against B and A against C.
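The combinations and the impression counts can be generated mechanically. The sketch below uses the advert lines from the example above and assumes, as in the example, 10,000 impressions for each advert in any test; the variable names are just for illustration.

```python
from itertools import product

headline = "ABC Mortgages"
second_lines = ["Free Advice And Valuations", "Your Local Mortgage Broker"]
third_lines = ["Competitive Rates", "Quick And Friendly Service"]

IMPRESSIONS_PER_ADVERT = 10_000  # as in the examples above

# Every combination of second and third line: adverts A, B, C and D in that order.
adverts = [(headline, second, third) for second, third in product(second_lines, third_lines)]

# Running everything together needs one batch of impressions per advert.
all_at_once = len(adverts) * IMPRESSIONS_PER_ADVERT

# Separate a:b tests each pit one challenger against the control,
# so every test pays for the control's impressions again.
challengers = len(adverts) - 1
one_at_a_time = challengers * 2 * IMPRESSIONS_PER_ADVERT

for advert in adverts:
    print(" / ".join(advert))
print("All four at once:", all_at_once, "impressions")
print("Three separate a:b tests:", one_at_a_time, "impressions")
print("Share of impressions spent on the control:", 1 / len(adverts))
```

Adding a second option for the headline would double the list to eight adverts, which is where the seven challengers for a three-line test come from.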
So your results would have looked something like:
A) 10000 impressions, 1000 clicks, click through rate = 10%
B) 10000 impressions, 800 clicks, click through rate = 8%
C) 10000 impressions, 800 clicks, click through rate = 8%
D) 10000 impressions, 1200 clicks, click through rate = 12%
In fact, you can save a bit more time by ditching one of the adverts early on, if it’s clearly not going to be the best.
So perhaps testing one change at a time isn’t necessarily the best idea: the stronger the synergy between the lines of your advert, the more likely you are to miss positive changes.
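One way to put a number on that synergy is to compare what advert D actually scored with what it would have scored if the two changes had simply added up. This is only a sketch using the hypothetical results above, and it says nothing about whether the difference is statistically significant.

```python
# (impressions, clicks) for each advert, taken from the hypothetical results above.
results = {
    "A": (10_000, 1_000),  # original second and third lines
    "B": (10_000, 800),    # new third line only
    "C": (10_000, 800),    # new second line only
    "D": (10_000, 1_200),  # both new lines
}


def click_through_rate(advert):
    impressions, clicks = results[advert]
    return clicks / impressions


ctr = {advert: click_through_rate(advert) for advert in results}

effect_of_third_line = ctr["B"] - ctr["A"]   # change from the new third line alone
effect_of_second_line = ctr["C"] - ctr["A"]  # change from the new second line alone

# What D would score if the two changes simply added up.
additive_forecast = ctr["A"] + effect_of_second_line + effect_of_third_line

# Anything beyond that is the synergy (interaction) between the two lines.
synergy = ctr["D"] - additive_forecast

print(f"Additive forecast for D: {additive_forecast:.1%}")
print(f"Observed for D: {ctr['D']:.1%}")
print(f"Synergy: {synergy:+.1%}")
```

Here each change on its own costs two percentage points, so the additive forecast for D is well below the control, yet D beats the control. One-change-at-a-time testing would never have found it.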
However, there is a price to pay for this approach (isn’t there always?). The test is slightly more likely to come up with the wrong answer (it’s slightly less significant), so you’ll probably want to run it a little bit longer to get the same degree of confidence in your results.
In summary, if you think there’s likely to be a strong synergy between two lines of your advert, it’s worth testing all four combinations of the two lines at the same time. If there’s no reason to believe that the two new lines work particularly well together, but poorly individually, then there’s no benefit to running the four options at once, and you should run two a:b tests as usual.
For example, if you have the advert:
Dave’s Confectionary
Buy Our Chocolates!
Free Next Day Delivery
And you want to try a second line of “Get Luxury Chocolates Here!”, and/or a third line of “Gift Wrapping Available”, you have to decide whether these lines are likely to complement each other, but are unremarkable individually. Here, it looks like the lines are pretty much unrelated, with one being an alternative call to action, and the other suggesting a different feature. So two standard (one change) a:b tests would be quicker or more reliable (depending on how long you wait for your results) than running all four possibilities together.
Any comments? Disagree? Let me know…
