5 Reasons Why I Stopped Following A/B Testing Case Studies

VP of Global Marketing, Dynamic Yield

I have a confession to make: I have a love/hate relationship with most A/B testing case studies. I love reading case studies that outline the conversion optimization process using A/B testing methods. It has always been a hobby of mine, filling me with inspiration and motivating experimentation ideas inside my blurry head. Until one day. The day I realized that the magic of most case studies relies heavily on obscurity. I then decided to take a different path and turn a blind eye to most case studies. On this day, I stopped following most A/B testing case studies.

There are countless website A/B testing case studies out there that demonstrate massive uplifts in conversion rate and revenue. The results always look so easy to replicate. In reality, while many of us enjoy reading and following those case studies, we should all pay more attention to the details and remember to scrutinize the data, assumptions and methodology involved. In fact, I believe all A/B testing case studies should come with a clear warning sign above them:

Do not assume you will get similar results for your site without first testing it yourself. Here’s a list of the reasons for what I like to refer to as “the A/B testing case study blindness syndrome.”

5 Reasons Why You Shouldn’t Blindly Follow A/B Testing Case Studies

1. What may work for one brand may not work for another.

Generalizing an A/B testing result to all websites on the basis of a single case is a false assumption. By doing so, you ignore your specific vertical, target audience and brand attributes. Some ideas may work for your site and audience, but most will not replicate that easily.

2. The quality of the tests varies.

I am afraid that most reported case studies do not include the information needed to evaluate the quality of their methodology. In reality, some of them lack valid statistical inference. When reading a documented A/B test, you should ask yourself: “What was the full experiment methodology? Were there any anomalies in the data that skewed the outcome of the test? Was the result statistically significant, and at what confidence level? What was the sample size in each variation? Did they address common pitfalls that may threaten the validity of the test?”
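
To make the sample size question concrete, here is a minimal sketch of a two-sided, two-proportion z-test using only the Python standard library. The function name and the visitor and conversion counts are hypothetical, chosen for illustration; they do not come from any published case study, and a real analysis may call for a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variations convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical data: the same 20% relative uplift (5% -> 6%) at two sample sizes.
small = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=60, n_b=1000)
large = two_proportion_z_test(conv_a=500, n_a=10000, conv_b=600, n_b=10000)

for label, (rate_a, rate_b, z, p_value) in (("small", small), ("large", large)):
    print(f"{label}: A={rate_a:.1%} B={rate_b:.1%} z={z:.2f} p={p_value:.4f}")
```

With these made-up numbers, the identical “20% uplift” headline is not significant at the 5% level with 1,000 visitors per variation, but it is with 10,000. That is why a case study that reports the uplift without the sample size tells you very little.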

3. The impact is not necessarily sustainable over time.

The results of any controlled A/B test may change over time. In other words, the results may be valid only for the specific period in which the experiment was conducted. To generalize the results to a future audience and show that the improvements were indeed sustained, we need to run the test long enough and even replicate the experiment every once in a while. Most reported A/B testing case studies offer no evidence that the uplift was sustained, which makes them less reliable in my book.

4. False assumptions and misinterpretation of results.

Many of us tend to tie certain results to specific behaviors. We look for the “why?” in the equation, and we forget that the “why” wasn’t tested in the first place. Attributing an experiment’s results to a specific behavioral factor is a natural thing to do, but it may act as a catalyst for false assumptions and misinterpretations. When we conduct a controlled experiment, we’re testing whether a variation produces a statistically significant difference in a measured response. A significant result tells us that the variation likely changed the outcome; it does not tell us which behavioral mechanism was responsible, which is something a traditional A/B test simply doesn’t measure. The tendency to inject our own intuitive explanations also obscures the fact that there is always room for random chance, and it biases our reading of the results toward the most convenient story.

5. Success bias: The experiments that do not work well usually do not get published.

Most A/B testing experiments fail, and there are many reasons for that, from common execution pitfalls to wrong testing hypotheses. In reality, we rarely hear about failed tests, although they can sometimes provide significant insight. Naturally, people prefer to publish their success stories, not their failures. People also tend to simplify their stories, making the process look much easier than it was in real life. The truth is that A/B testing is a long process with many obstacles and surprises. I would like to quote Beck’s lines from “The Golden Age”: “It’s a treacherous road, with a desolated view. There’s distant lights, but here they’re far and few.”
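
Success bias is easy to demonstrate with a small simulation. The sketch below is an illustration under stated assumptions, not a model of any real publication process: it runs many hypothetical tests in which both variations share the same true conversion rate, then “publishes” only the ones where B significantly beat A. The function name, traffic numbers and base rate are all invented for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_published_uplifts(n_tests=1000, visitors=1000, base_rate=0.05,
                               alpha=0.05, seed=42):
    """Simulate tests with NO true difference and keep only the 'winners'.

    Returns the relative uplifts of the tests that would have been
    reported as significant wins for variation B.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    published = []
    for _ in range(n_tests):
        # Both arms are drawn from the same true conversion rate.
        conv_a = sum(rng.random() < base_rate for _ in range(visitors))
        conv_b = sum(rng.random() < base_rate for _ in range(visitors))
        if conv_a == 0:
            continue  # relative uplift is undefined without control conversions
        rate_a, rate_b = conv_a / visitors, conv_b / visitors
        pooled = (conv_a + conv_b) / (2 * visitors)
        std_err = sqrt(pooled * (1 - pooled) * 2 / visitors)
        z = (rate_b - rate_a) / std_err
        if z > z_crit:  # only "B significantly beat A" gets written up
            published.append((rate_b - rate_a) / rate_a)
    return published


total_tests = 1000
winners = simulate_published_uplifts(n_tests=total_tests)
print(f"{len(winners)} of {total_tests} no-effect tests looked like wins")
if winners:
    print(f"average 'published' uplift: {mean(winners):.0%}")
```

Every “win” in this simulation is pure noise, yet each one would make an impressive headline. If only the winners are written up, a reader has no way to tell them apart from real effects.
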

A/B Tests Reliability Signals

I am not saying all A/B testing case studies are vague and useless. There are, naturally, some really good studies amongst the many over-selling, misleading ones. The reporting is where you can assess the quality of a case study and separate the wheat from the chaff. So, my advice would be to continue reading and following them, but also to look for answers to the questions raised in this article. If you can’t find the answers to these questions, I suggest you move on to the next case study.

Check the validity of the case studies you’re reading:

  • What was the initial null hypothesis?
  • How long did the test run?
  • What was the sample size?
  • Who was the audience of the experiment?
  • What was the A/B testing platform involved?
  • Was there a re-run of the A/B test to verify sustainable results?

I recommend using our Bayesian A/B testing calculator and our test duration and sample size calculator to double-check the data and feel confident about its validity.
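
If you want to sanity-check a case study by hand, the sketch below shows the kind of arithmetic such calculators typically perform. It is not the implementation behind the calculators mentioned above; it assumes a standard two-proportion sample size formula and a Beta(1, 1) prior with Monte Carlo sampling, and every function name and number in it is hypothetical.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(base_rate, relative_uplift, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect the uplift (two-sided test)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        sample_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws


# Hypothetical scenario: 5% baseline conversion, hoping to detect a 20% relative uplift.
needed = sample_size_per_variant(base_rate=0.05, relative_uplift=0.20)
daily_visitors = 1500  # assumed total traffic, split evenly between A and B
days = ceil(2 * needed / daily_visitors)
print(f"visitors per variation: {needed}, estimated duration: {days} days")

# Hypothetical reported result: 50/1000 conversions for A vs. 60/1000 for B.
print(f"P(B beats A): {prob_b_beats_a(50, 1000, 60, 1000):.2f}")
```

Under these assumptions, detecting a 20% relative uplift on a 5% baseline takes on the order of eight thousand visitors per variation. A case study that claims such an uplift from a few hundred visitors deserves a second look.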

To Conclude

Published case studies of website A/B testing are an excellent starting point for getting testing ideas, learning best practices and understanding common use cases. But (and there’s a big but here), you should always scrutinize the methodology and interpretations of the case studies you are reading, and never follow them blindly. Be skeptical and always test ideas and assumptions on your own site and target audience.