5 Reasons Why I Stopped Following A/B Testing Case Studies

I have a confession to make: I have a love/hate relationship with most A/B testing case studies. I love reading case studies that outline the conversion optimization process using A/B testing methods. It has always been a hobby of mine, filling me with inspiration and sparking experimentation ideas in my head. Until one day: the day I realized that the magic of most case studies relies heavily on obscurity. I decided to take a different path and turn a blind eye to most of them. That was the day I stopped following most A/B testing case studies.

There are tons of website A/B testing case studies out there demonstrating massive uplifts in conversion rate and revenue, and they always make it look so easy to replicate. In reality, while many of us enjoy reading and following those case studies, we should all pay more attention to the details, scrutinizing the data, assumptions and methodology involved. In fact, I fear all A/B testing case studies should come with a clear warning sign above them:


Do not assume you will get similar results for your site without first testing it yourself. Here’s a list of the reasons for what I like to refer to as “the A/B testing case study blindness syndrome.”

5 Reasons Why You Shouldn’t Blindly Follow A/B Testing Case Studies

1. What may work for one brand may not work for another.

Generalizing any A/B testing result to the population of all websites based on a single case is a false assumption. Doing so ignores your specific vertical, target audience and brand attributes. Some ideas may work for your site and audience, but most will not replicate that easily.

2. The quality of the tests varies.

I am afraid that most reported case studies do not include the information necessary to evaluate the quality of their methodology, and in reality some of them lack valid statistical inference. When reading a documented A/B test study, you should ask yourself, “What was the full experiment methodology? Were there any data deviations that skewed the outcome of the test? What was the statistical significance? What was the sample size? Did the authors address common pitfalls that threaten the validity of the test?” (Read more about why protecting statistical significance is so important.)
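
If a case study does report visitor and conversion counts, you can sanity-check the claimed significance yourself. Below is a minimal sketch of a two-proportion z-test in Python; the visitor and conversion figures are made-up placeholders rather than numbers from any real study, so substitute whatever the case study actually reports.

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        """Return (z, two-sided p-value) for the difference in conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # Hypothetical reported figures: 5,000 visitors per variation, 150 vs. 180 conversions.
    z, p = two_proportion_z_test(150, 5000, 180, 5000)
    print(f"z = {z:.2f}, p = {p:.3f}")  # if p is above 0.05, the claimed uplift is shaky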

3. The impact is not necessarily sustainable over time.

The results of any controlled A/B test may differ over time; in other words, they may be valid only for the specific period in which the experiment was conducted. To generalize the results to a future audience and prove that the improvements were indeed sustained, we need to run the test long enough and even replicate the experiment every once in a while. In most reported A/B testing case studies, this kind of confirming evidence is noticeably absent, which makes them less reliable in my book.
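
One rough way to check sustainability, assuming you can export per-period visitor and conversion counts for each variation, is to look at the lift week by week rather than only in aggregate. The weekly figures below are invented for illustration.

    # (week, control visitors, control conversions, variant visitors, variant conversions)
    weekly_data = [
        (1, 7000, 210, 7000, 245),
        (2, 7200, 220, 7100, 236),
        (3, 6900, 205, 7000, 212),
        (4, 7100, 215, 7200, 219),
    ]

    for week, n_c, c_c, n_v, c_v in weekly_data:
        lift = (c_v / n_v) / (c_c / n_c) - 1
        print(f"week {week}: lift = {lift:+.1%}")
    # A lift that shrinks week after week hints the improvement may not hold over time.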

4. False assumptions and misinterpretation of results.

Many of us tend to tie certain results to specific behaviors. We look for the “why?” in the equation and forget that the “why” wasn’t tested in the first place. Attributing an experiment’s results to a specific behavioral factor is a natural thing to do, but it can act as a catalyst for false assumptions and misinterpretations. When we conduct a controlled experiment, we’re looking for a statistically significant relationship between a variation and the audience’s reaction to it. Reaching statistically significant results doesn’t mean there’s a definite causal explanation for that reaction, and the underlying “why” is something a traditional A/B test simply doesn’t measure. The tendency to inject our own intuitive causes also obscures the fact that there is always room for random chance, and we end up biasing the results toward the more convenient interpretation.
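
To see how much room random chance really has, here is a minimal simulation sketch (the parameters are arbitrary): it runs a batch of A/A tests, where both variations share the same true conversion rate, and counts how often a naive 5% significance threshold declares a winner anyway.

    import random
    from math import sqrt
    from statistics import NormalDist

    random.seed(42)
    TRUE_RATE, N, TRIALS = 0.03, 4000, 400  # same true rate for both variations
    false_positives = 0

    for _ in range(TRIALS):
        conv_a = sum(random.random() < TRUE_RATE for _ in range(N))
        conv_b = sum(random.random() < TRUE_RATE for _ in range(N))
        p_pool = (conv_a + conv_b) / (2 * N)
        se = sqrt(p_pool * (1 - p_pool) * (2 / N))
        z = (conv_b / N - conv_a / N) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        false_positives += p_value < 0.05

    print(f"{false_positives / TRIALS:.1%} of A/A tests looked 'significant'")  # roughly 5%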

5. Success bias: The experiments that do not work well usually do not get published.

Most A/B testing experiments fail, and there are many reasons for that, from common execution pitfalls to flawed testing hypotheses. In reality, we rarely hear about failed tests, even though they can provide significant insight. Naturally, people prefer to publish their success stories, not their failures. People also tend to simplify their stories, making the process look much easier than it was in real life. The truth is that A/B testing is a long process with many obstacles and surprises. I would like to quote Beck’s lines from “The Golden Age”: “It’s a treacherous road, with a desolated view. There’s distant lights, but here they’re far and few.”
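
To put a number on that success bias, here is a minimal simulation sketch (all figures are invented): every experiment below has zero true effect, yet if only the flashy-looking wins get written up, the “published” subset still reports an impressive average lift.

    import random

    random.seed(7)
    BASE_RATE, N, EXPERIMENTS = 0.03, 3000, 500  # no variation has any real effect
    published_lifts = []

    for _ in range(EXPERIMENTS):
        conv_a = sum(random.random() < BASE_RATE for _ in range(N))
        conv_b = sum(random.random() < BASE_RATE for _ in range(N))
        if conv_a == 0:
            continue
        lift = conv_b / conv_a - 1
        if lift > 0.15:  # only flashy "wins" make it into a case study
            published_lifts.append(lift)

    if published_lifts:
        average = sum(published_lifts) / len(published_lifts)
        print(f"{len(published_lifts)} of {EXPERIMENTS} null experiments got 'published', "
              f"average reported lift {average:+.0%}")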

A/B Test Reliability Signals

I am not saying all A/B testing case studies are vague and useless; there are, naturally, some really good studies among the many over-selling, misleading ones. The reporting is where you can assess the quality of a case study and separate the wheat from the chaff. So my advice would be to keep reading and following them, but also to look for answers to the questions raised in this article. If you can’t find those answers, I suggest you move on to the next one.

Check the validity of the case studies you’re reading:

  • What was the initial null hypothesis?
  • How long did the test run?
  • What was the sample size?
  • Who was the audience of the experiment?
  • What was the A/B testing platform involved?
  • Was there a re-run of the A/B test to verify sustainable results?

I recommend using our Bayesian A/B testing calculator and our test duration and sample size calculator to double-check the data and feel confident in its validity.
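
If you just want a quick back-of-the-envelope check before reaching for those calculators, the classic two-proportion sample-size formula gets you most of the way there. The sketch below is not any particular vendor’s calculator; the baseline rate, expected uplift and daily traffic are assumptions you should replace with your own.

    from math import ceil
    from statistics import NormalDist

    def sample_size_per_variation(baseline, relative_uplift, alpha=0.05, power=0.8):
        """Visitors needed per variation to detect a relative uplift over the baseline rate."""
        p1 = baseline
        p2 = baseline * (1 + relative_uplift)
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
        z_beta = NormalDist().inv_cdf(power)            # desired statistical power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

    n = sample_size_per_variation(baseline=0.03, relative_uplift=0.10)  # detect a 10% lift
    daily_visitors_per_variation = 2_000                                # hypothetical traffic
    print(f"~{n:,} visitors per variation, about {ceil(n / daily_visitors_per_variation)} days")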

To Conclude

Published case studies of website A/B testing are an excellent starting point for getting testing ideas, learning best practices and understanding common use cases. But (and it’s a big but) you should always question the methodology and interpretations of the case studies you read, and never follow them blindly. Be skeptical, and always test ideas and assumptions on your own site and with your own target audience.

  • Elad Rosenheim

    I empathize with no. 5 in particular. A/B tests are like start-ups: most fail silently, but the (perhaps) 1% that succeed make a lot of noise and keep the myth rollin’ on.

  • Ricardo Pietrobon

    Yaniv, not sure if I follow your argument: You say you stopped following cases, and yet all of your arguments say that you should be critical about them. So, this is like saying that I should stop reading scientific papers because I have to be critical of their results. Isn’t being critical a necessary requirement to read them?

    • Good point Ricardo. Let me clarify my argument: I didn’t say I completely stopped reading all case studies; I stopped following most of them, and I argued that you shouldn’t blindly follow them. I tried to provide the necessary tools for criticizing the published data. To me, since many of the published ones tend to be obscure and lack sustainable data, comparing them to scientific papers would be wrong. Go ahead and look for A/B case studies that provide answers to my questions, and see how many you find. That’s my point.

  • No case studies should be blindly followed, not just A/B test case studies. Though I more or less agree with the points you mentioned, I think small businesses can still find actionable insights in such case studies.

    • Thanks for your comment. I agree, good A/B test case studies can inspire and bring excellent testing ideas. We just need to leave the bad ones behind.

  • Hey Yaniv,

    I agree with all 5 of your points; there are far too many A/B testing case studies out there trumpeting a “you can do it too” angle, when in reality what works for one brand may not work for another.

    I think the main problem with these ‘1 easy step’ studies is that they gloss over how much time it took to find that one easy fix. They don’t mention the 5-hour strategy session that identified that one necessary tweak.

  • Well said. Slavishly following the A/B testing path is likely to leave you like a steam engine hissing and hooting, the center of activity and attention, while it hasn’t actually moved from the station yet. Most A/B evangelists are either trying to sell something, or building their credibility before, ahem, ‘monetizing’ it (which means selling you something).

    Or, alternatively, they are in love with statistics and, in common with many HR people, what they are espousing is technically correct, but it doesn’t actually work in the real world.

    • Tom

      David, I don’t think Yaniv is arguing against A/B testing in and of itself. As I interpreted it, he encourages properly-done split testing but discourages taking inspiration from split test results that other people publish. His reasoning is excellent: I’ve definitely seen all the biases he mentions in the wild. It’s easier to get excited by “300% lift from one simple thing!” than it is to get excited by a proper technical experiment report that reveals subtle insights into your audience.

      In my experience, it’s the “magic methodologies” that you don’t get to split test on your site that fail to deliver in the real world. Lots of internet wisdom sounds great but, if you don’t split test, you can’t be sure it works with your particular audience.

      Split testing is a powerful tool. Like any powerful tool it can be used well, or it can be used incorrectly, or even abused. Like any powerful tool, it’s not the right tool for every situation. But when you need a scalpel, don’t use a lump hammer.

  • Iain Moss

    Very good piece, Yaniv. Case studies are good for ideas and inspiration, but ultimately you should be looking at your analytics data to find insights and opportunities that will work for your business.

    Full disclosure: I work for Qubit, who do A/B testing. We believe in looking at your data to find tests that will work for your business. In fact, we’ve put together a video explaining how to approach A/B testing, which agrees with a number of your points. If you’re interested, it’s here: http://www.qubitproducts.com/content/video/would-you-bet-your-job-on-your-ab-testing-results . Would love to get your opinion!
