A/A testing and decision making in experimentation


If you run an A/A test and quickly find that some A’s look better than other A’s, even though you know they are the same, then your confidence in the A/B testing methodology and technology is in question. So the simple approach to an A/A test is just to give you some confidence and to debug the system: maybe there is a problem with the random assignment, but nothing beyond that. You could think of more sophisticated approaches, where the A/A test is not just done as a precursor to the A/B test but also in parallel with A/B tests, for ongoing monitoring. Such systems give you control over what’s going on, and the ability to manage changes in traffic composition and all sorts of other things that can happen and would affect your decision.

For example, there could be some breakdown in the system. You’re running a test and your allocation is being handled by distributed servers; it’s no longer on a single server. One of the servers didn’t work, and in that geographic area everyone was put into this bucket, and everyone in another area was put into that bucket, and the people in that geographic area happened to respond a bit differently than people in other geographic areas. Well, now you’ve biased your experimental results, because all of those visitors went into one bucket. Those are exactly the kinds of problems and bugs that you’d like to be able to filter out of the system, and the A/A test is a very good way to do that. Then, when you do get to comparing different options, drawing conclusions and making business decisions on them, you have confidence that the system is working properly.
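One way to catch the kind of allocation bug described here is to check the bucket split separately for each region. The following is a minimal sketch, not the speakers' method: the region names and counts are made up for illustration, the intended split is assumed to be 50/50, and the cutoff of 0.001 is an arbitrary choice.

```python
import math

def split_p_value(n_a, n_b, expected_share_a=0.5):
    """Two-sided p-value (normal approximation) for the hypothesis that
    visitors were split between the buckets in the intended proportion."""
    n = n_a + n_b
    se = math.sqrt(expected_share_a * (1 - expected_share_a) / n)
    z = (n_a / n - expected_share_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical visitor counts per region: (bucket 1, bucket 2).
counts = {
    "north": (5030, 4970),
    "south": (4985, 5015),
    "west": (9800, 200),   # almost everyone landed in one bucket
}

# Flag regions whose split is very unlikely under a fair 50/50 assignment.
flagged = [region for region, (a, b) in counts.items()
           if split_p_value(a, b) < 0.001]
print(flagged)
```

A region that is flagged this way points to a problem in the assignment mechanism rather than to a real difference between the buckets.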

Another way that it can easily happen, and this was pretty common, is from over-monitoring the experiment. It’s very natural, if you’ve got data coming in, not to wait a week to look at them but to look at them every day, maybe several times a day. Someone who comes into the site, maybe they will respond, maybe they won’t; maybe they’ll buy something, maybe they won’t. There’s gonna be variability in the data that you see. One of your two A buckets is gonna show a slightly better KPI. Is it sufficiently better? If you watch it over time, what’s gonna happen? It’s gonna drift: at one point it might show that this bucket is better, at another that the other bucket is better. So you have to exercise some caution about trying to reach judgements too fast, and that’s one thing that, again, can lead to a situation that, instead of being a confidence builder, drives down your confidence. So be a little bit careful about data peeking, because it’s gonna happen.
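The effect of peeking can be illustrated with a small simulation. This is a sketch under stated assumptions: both buckets share the same true conversion rate (an A/A test), the data are checked with a pooled two-proportion z-test after every batch of visitors, and all the numbers (10 looks, 100 visitors per bucket per look, a 10% conversion rate) are invented for illustration.

```python
import math
import random

def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test p-value for two buckets of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no variation observed yet, nothing to test
    z = (conv_a - conv_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_experiments=1000, looks=10, visitors_per_look=100,
             rate=0.1, alpha=0.05, seed=0):
    """Return (share of A/A tests significant at ANY look,
               share significant at the final look only)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_some_look = False
        for _ in range(looks):
            # Both buckets draw from the same conversion rate.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            p = p_value(conv_a, conv_b, n)
            if p < alpha:
                significant_at_some_look = True
        peeking_hits += significant_at_some_look
        final_hits += p < alpha  # p is the p-value at the last look
    return peeking_hits / n_experiments, final_hits / n_experiments

peeking_rate, final_rate = simulate()
print(f"declared a winner at some look: {peeking_rate:.3f}")
print(f"declared a winner at the end:   {final_rate:.3f}")
```

Since the buckets are identical, every "winner" is a false alarm. Stopping at the first significant look can only raise the false alarm rate compared with a single analysis at the end, because the final look is itself one of the looks.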

In clinical trials, there is something called interim analysis. When you plan a trial, in some cases you plan, and it has to be planned ahead of time, to have an analysis of the data halfway through. This has to be accounted for. You cannot just collect data and decide to look at it if this was not planned ahead of time. What happens in A/B testing or A/A testing is that people might do that without having planned it ahead of time, and that has an effect. People might not be aware of that effect, and it might mislead them.
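One simple way to account for planned looks, offered here as an illustration and not as the procedure the speakers have in mind, is a Bonferroni split: with K planned analyses and an overall significance level α, each analysis is tested at α/K. The symbols K, α and p_k (the p-value at look k) are notation introduced for this sketch. When there is no real difference, the union bound gives:

```latex
\Pr\Bigl(\bigcup_{k=1}^{K} \{p_k < \alpha/K\}\Bigr)
  \;\le\; \sum_{k=1}^{K} \Pr\bigl(p_k < \alpha/K\bigr)
  \;\le\; K \cdot \frac{\alpha}{K}
  \;=\; \alpha
```

So with one interim look and one final look (K = 2) and α = 0.05, each look is tested at 0.025. This is conservative; group-sequential boundaries such as Pocock or O’Brien-Fleming are designed for the same purpose and give up less power.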

Almost everything that we’ve talked about in terms of A/A testing is also relevant to A/B tests. You do expect to see some differences; that’s the reason you’re running the test to begin with. So you’re not gonna be as surprised to see differences in the case of the A/B test. But again, in order to make sure that you’re making the right business decisions, you do wanna make sure that the differences you’re seeing are real ones. If you now make a decision based on those differences, you want to know this is not just an artifact in the data, that it’s something that’s really there, and that you can build a sound financial decision on it, a business decision that’s gonna increase your business activity.

This becomes even more complex when you have several variants. So it’s not A/B, it’s A, B one, B two, B three, B four. So the complexity can be: okay, we have A and we have five variants. What do we want? We want to show which is the best, we want to show that they are better than A, we want to pick the top two. And if you add to this the option to design the alternatives using what I mentioned, designing experiments with combinations of the factors, then you get into even more complexity. So the testing itself is not more difficult to do; it’s the design and the analysis that require a bit more.
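Part of the extra analysis effort comes from the number of comparisons. The sketch below shows how the chance of at least one false alarm grows when each variant is compared with A at the usual 5% level. It assumes the comparisons are independent, which is only an approximation when all variants share the same control, and it uses the Bonferroni threshold as one simple correction; neither choice comes from the speakers.

```python
def familywise_error_rate(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests,
    each run at significance level alpha, when no variant truly differs."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k, alpha=0.05):
    """Per-comparison level that keeps the overall error rate at most alpha."""
    return alpha / k

for k in (1, 2, 5):
    print(f"{k} variant(s): "
          f"chance of a false winner = {familywise_error_rate(k):.3f}, "
          f"Bonferroni per-test level = {bonferroni_threshold(k):.4f}")
```

With five variants and no correction, the chance of declaring at least one false winner is already above 20% under these assumptions.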

Right, some of the early questions that were looked at were things like the question that Laplace studied: what’s the probability that the sun will rise tomorrow? We’ve seen the sun rise for many days in a row, so what’s the probability that it will rise tomorrow? And does it even make sense to talk about a question like that in a probabilistic sense?

Many of these early discussions had this very metaphysical nature to them. In terms of modern statistical data analysis, the use of Bayesian technology and Bayesian thinking has become much, much more established. Part of the reason why there was resistance, and in some cases there is still some resistance, is I think on the grounds of objective versus subjective science. The idea that you’re bringing prior information to a Bayesian analysis can be regarded, and often has been regarded, as expressing subjectivity. So I might have a different prior than Ron does, and we might come to different conclusions from the same data. In data-rich problems, and this is of course the standard environment for A/B testing, this is much less critical, because the data are so abundant that anything you feed in through the prior is essentially going to be washed out by the data. As a result, it doesn’t really make any difference what I thought in advance or what Ron thought in advance. We’re gonna agree with one another in the end, after we’ve seen the data from the experiment.
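The "washing out" of the prior can be seen in a small worked example. This sketch assumes a conversion rate with a Beta prior, so that the posterior mean has a closed form; the two priors and the visitor counts are made up for illustration and do not come from the speakers.

```python
def posterior_mean(prior_a, prior_b, conversions, visitors):
    """Posterior mean of a conversion rate under a Beta(prior_a, prior_b)
    prior after observing `conversions` successes out of `visitors`."""
    return (prior_a + conversions) / (prior_a + prior_b + visitors)

# Two analysts with different prior beliefs (hypothetical values).
optimistic = (20, 80)   # prior mean 0.20
skeptical = (2, 98)     # prior mean 0.02

def gap(visitors, observed_rate=0.1):
    """Difference between the two analysts' posterior means."""
    conversions = round(visitors * observed_rate)
    return (posterior_mean(*optimistic, conversions, visitors)
            - posterior_mean(*skeptical, conversions, visitors))

for visitors in (100, 10_000, 100_000):
    print(f"{visitors:>7} visitors: posterior means differ by {gap(visitors):.5f}")
```

With 100 visitors the two analysts still disagree noticeably; with 100,000 visitors their posterior means are practically the same.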

How does a marketer know if a proposed change to the customer experience will be a successful one? Professors Ron Kenett and David Steinberg of KPA Group sat down with us to discuss the basics of A/A and A/B testing, how testing can help organizations ensure they are making the right business decisions, and how to properly approach testing design and analysis.

Read their full article on A/A testing and decision making in experimentation.
