Over the past decade or two, our ability to integrate, analyze and manipulate data has vastly improved. Conversion optimization continues to be key to digital strategy, and experimentation has become an essential methodology for companies trying to optimize their sites and maximize performance. However, although 67 percent of companies currently perform A/B testing, many are not satisfied with the results, according to a RedEye and E-consultancy study. As the pressure increases to deliver improved results through site optimization and testing, marketers are constantly pushing the boundaries of existing methodologies, and the need for more complex, dynamic processes has emerged.
To understand the limitations and even push beyond them, let’s examine how traditional A/B testing works and how both science and technology have broadened the scope of optimization abilities.
Traditionally, companies have run randomized A/B experiments with just two variants (the baseline/control and the treatment), looking for a single statistically significant winner. Over time, rather than limiting experiments to two content variations, marketers have started testing multiple segments and targets to get the most out of their experiments. Today, A/B/n testing and its multivariate cousin are ubiquitous. Generally speaking, for quite some time these have been the only methodologies available for testing on websites, and they come with many limitations.
For example, while these approaches offer some level of testing flexibility, they are heavily dependent on finding a single “winning” variation. As a result, classic A/B testing tools are not able to optimize multiple variations simultaneously.
Rethinking these basic testing methodologies, the search for innovation uncovers new approaches to experimentation. Simply put, marketers are challenging the status quo by asking:
- Why do we have to wait for a sufficient sample size and high statistical significance, and then stop the experiment manually, instead of using a mechanism that converges smoothly and automatically toward the best-performing variation?
- How can we automatically leverage huge amounts of data for improved and optimized results?
- How can we help our organizations meet the rising bar of customer expectations for highly relevant content?
- Why do we have to settle for a single “winning” content variation when we should be able to serve multiple, targeted variations, based on the preferences and online behavior of individual customers?
- Why run experiments for maximizing goal conversions instead of directly maximizing online revenue?
- On the technical front, how can we reduce operational and managerial costs and achieve powerful scale and speed?
These are not simple questions to answer. However, after working their way through them, many realize that classic testing methodologies simply cannot provide the solution.
The Bandit Approach
In traditional A/B testing methodologies, traffic is evenly split between two variations (both get 50%). Multi-armed bandits allow you to dynamically allocate traffic to variations that are performing well, while allocating less and less traffic to underperforming variations. Multi-armed bandits are known to produce faster results since there’s no need to wait for a single winning variation.
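This dynamic allocation can be illustrated with a minimal sketch of Thompson sampling, one common multi-armed bandit algorithm. The three conversion rates below are hypothetical stand-ins for variation performance, and the whole loop is a simulation, not a production implementation:

```python
import random

# Hypothetical true conversion rates for three variations
# (unknown to the algorithm; it must discover them from clicks).
TRUE_RATES = [0.04, 0.05, 0.07]

# Beta(1, 1) priors: one (successes, failures) pair per variation.
successes = [1, 1, 1]
failures = [1, 1, 1]

random.seed(42)
for _ in range(10_000):  # each iteration is one visitor
    # Thompson sampling: draw a plausible conversion rate for each
    # variation from its posterior, then show the one that sampled best.
    samples = [random.betavariate(s, f) for s, f in zip(successes, failures)]
    arm = samples.index(max(samples))
    # Simulate whether this visitor converts on the chosen variation.
    if random.random() < TRUE_RATES[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Traffic served per variation (subtracting the two prior pseudo-counts):
traffic = [s + f - 2 for s, f in zip(successes, failures)]
print(traffic)  # most traffic flows to the best-performing variation
```

Note how no one ever "stops the test": the allocation shifts on its own as evidence accumulates, which is exactly the behavior the classic fixed 50/50 split lacks.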
Bandit algorithms go beyond classic A/B/n testing, encompassing a large family of algorithms that tackle different problems, all in pursuit of the best possible results. With the help of a relevant user data stream, multi-armed bandits become contextual. Contextual bandits for website optimization rely on an incoming stream of user context data, either historical or fresh, which can be used to make better algorithmic decisions in real time.
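A contextual bandit can be sketched by keeping separate reward statistics per user context. The sketch below uses a simple epsilon-greedy policy over two hypothetical segments; real systems typically use richer context features and algorithms such as LinUCB, but the core idea of conditioning the decision on context is the same:

```python
import random

# Hypothetical setup: two user contexts and two content variations,
# with different (unknown) conversion rates per context.
TRUE_RATES = {
    "female": [0.06, 0.02],  # variation 0 wins for this segment
    "male":   [0.01, 0.05],  # variation 1 wins for this segment
}

# Separate reward statistics per (context, variation) pair.
counts = {ctx: [0, 0] for ctx in TRUE_RATES}
rewards = {ctx: [0.0, 0.0] for ctx in TRUE_RATES}

def choose(ctx, epsilon=0.1):
    """Epsilon-greedy contextual policy: usually exploit the best-known
    variation for this context, occasionally explore."""
    if random.random() < epsilon or min(counts[ctx]) == 0:
        return random.randrange(2)
    means = [rewards[ctx][a] / counts[ctx][a] for a in range(2)]
    return means.index(max(means))

random.seed(7)
for _ in range(20_000):
    ctx = "female" if random.random() < 0.8 else "male"  # 80/20 traffic mix
    arm = choose(ctx)
    reward = 1.0 if random.random() < TRUE_RATES[ctx][arm] else 0.0
    counts[ctx][arm] += 1
    rewards[ctx][arm] += reward

# Each segment converges to its own best variation -- no single global winner.
best = {ctx: counts[ctx].index(max(counts[ctx])) for ctx in counts}
print(best)
```

Because the statistics are conditioned on context, the policy ends up serving different "winners" to different segments rather than crowning one variation for everyone.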
Here is a visualization showing how the classic A/B/n testing approach would split traffic between three different variations, versus the multi-armed bandit and contextual bandit approaches. It shows how bandit tests can yield better results over time. While the classic approach requires manual intervention once a statistically significant winner is found, bandit algorithms learn as you test, dynamically reallocating traffic across variations. Contextual bandit algorithms change the population exposed to each variation to maximize its performance, so there is no single winning variation, and no single losing one either. At the core of it all is the notion that each person may react differently to different content. For example, a classic A/B test of promotional content on a fashion retail site whose customer base is 80 percent female would declare female-targeted promotions the overall winner and serve them to everyone, even though the remaining 20 percent of customers would respond better to promotions targeted to males.
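The fashion-retail example can be made concrete with some hypothetical conversion rates. Under these assumed numbers, serving the single "winning" variation to everyone caps overall conversion below what per-segment targeting achieves:

```python
# Hypothetical conversion rates for the 80% female / 20% male example:
# each segment converts best on promotions targeted to it.
rates = {
    ("female", "female_promo"): 0.05,
    ("female", "male_promo"):   0.01,
    ("male",   "female_promo"): 0.01,
    ("male",   "male_promo"):   0.05,
}
mix = {"female": 0.8, "male": 0.2}

# Classic A/B winner: the single variation with the best overall rate.
overall = {
    promo: sum(mix[seg] * rates[(seg, promo)] for seg in mix)
    for promo in ("female_promo", "male_promo")
}
# female_promo: 0.8*0.05 + 0.2*0.01 = 0.042
# male_promo:   0.8*0.01 + 0.2*0.05 = 0.018
single_winner = max(overall.values())

# Contextual approach: serve each segment its best variation.
per_segment = sum(
    mix[seg] * max(rates[(seg, p)] for p in ("female_promo", "male_promo"))
    for seg in mix
)
# 0.8*0.05 + 0.2*0.05 = 0.05

print(round(single_winner, 3), round(per_segment, 3))  # 0.042 0.05
```

The gap (4.2 percent versus 5 percent overall conversion) is exactly the value left on the table by a one-size-fits-all winner.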
The term “Dynamic Experiments” represents a new approach for achieving sustainable value more quickly and efficiently. With the limitations of traditional A/B/n testing in mind, it offers a new way to conduct sophisticated experiments. In particular, this approach allows marketers to:
- Dynamically deliver the best performing variation at the user level, ensuring each customer sees the best performing, personalized content variation, instead of a generic, one-size-fits-all (and therefore misleading) winner.
- Directly optimize experiments for maximum revenue-based performance.
- Migrate from a static A/B/n approach to a flexible platform that can automatically fine-tune every element in order to maximize value.
- Gain full control over every piece of data in real time, allowing for better, more effective data-driven decisions.
- Run complicated optimization initiatives with minimal IT involvement, even when setting up complex experiments in ever-changing, dynamic environments.
Armed with the right set of tools and testing models, marketers can harness historical and behavioral data at the level of the individual user, ultimately leading to more successful optimization initiatives. Take a look at our dynamic experiments page to dive deeper into our unique solution.