To see Probability to Be Best in action in a Bayesian A/B test, try our free Bayesian A/B test calculator.
‘Probability to Be Best’ indicates each variation’s long-term probability of outperforming all other live variations, given the data collected since the creation or change of any variation included in the test.
P2BB (Probability to Be Best) is a quantity that the new Dynamic Yield Bayesian stats engine calculates for each variation in online A/B/n testing. This quantity helps the person running the test evaluate the state of the experiment, the performance of the variations, and, most importantly, how well the results so far generalize to future users. In practice, P2BB can be used to declare winners and losers in a running A/B test: if the P2BB of the leading variation rises above a predetermined threshold (say 95%), that variation can be declared the winner.
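The threshold rule can be sketched in a few lines of Python. This is only an illustration: the function name declare_winner and the dictionary input are hypothetical and not part of the Dynamic Yield product, and the 95% default simply mirrors the example threshold mentioned above.

```python
def declare_winner(p2bb, threshold=0.95):
    """Return the leading variation if its P2BB exceeds the threshold, else None.

    p2bb maps a variation name to its Probability to Be Best (0.0 to 1.0).
    Hypothetical helper for illustration only.
    """
    # The leading variation is the one with the highest P2BB.
    leader = max(p2bb, key=p2bb.get)
    # Declare a winner only when the leader is above the chosen threshold.
    return leader if p2bb[leader] > threshold else None
```

For example, declare_winner({"A": 0.97, "B": 0.03}) returns "A", while declare_winner({"A": 0.60, "B": 0.40}) returns None because no variation has crossed the threshold yet.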
What goes into the calculation?
The input depends on the experiment’s KPI and objective. For binary-objective experiments, the input is a numerator and a denominator: for a highest-CTR objective, clicks / impressions; for a goal-conversion objective, conversions / page views, sessions, or users, according to stickiness. Stickiness, also known as Variation Exposure, is the period of time during which the selected variation remains exposed to your visitor.
For revenue-based (non-binary) experiments, the input is the total revenue for each variation, the number of unique goal completions, and the total number of sessions or users.
How is the calculation done?
The calculation of P2BB uses fairly advanced math and statistics and cannot be fully detailed here. At a high level, we quantify what we know and do not know (the uncertainty) about the KPI of each variation. For example, 10 clicks on 100 impressions means that we measure the CTR to be 10%, but with much higher uncertainty than 10,000 clicks on 100,000 impressions. By quantifying, we mean that we build a probability distribution from which we can randomly sample a possible true KPI. We then treat these random KPI samplers as simulations of the real world: each round of draws represents one plausible long-term future state of the KPIs, and we check which variation is best in that simulated future. We repeat this many times (say 300,000 times) and count how many times each variation drew the best randomly sampled KPI. The P2BB of variation ‘A’ is the number of times it won the random KPI draw, divided by the total number of rounds (say a fixed 300,000).
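The sampling procedure described above can be sketched for a binary KPI such as CTR. This is a minimal illustration, not the engine's actual implementation: it assumes each variation's rate follows a Beta distribution with a uniform prior, a common textbook choice that this article does not confirm, and the function name probability_to_be_best is hypothetical.

```python
import random


def probability_to_be_best(counts, rounds=300_000, seed=None):
    """Estimate P2BB for each variation by repeated random sampling.

    counts maps a variation name to (successes, trials), for example
    (clicks, impressions).
    ASSUMPTION: each rate gets a Beta(1 + successes, 1 + failures)
    distribution (uniform prior). The real engine's model may differ.
    """
    rng = random.Random(seed)
    names = list(counts)
    # Build the Beta parameters that encode what we know about each KPI.
    params = []
    for name in names:
        successes, trials = counts[name]
        params.append((1 + successes, 1 + trials - successes))

    wins = [0] * len(names)
    for _ in range(rounds):
        # Draw one plausible "true" KPI per variation for this simulated future.
        draws = [rng.betavariate(a, b) for a, b in params]
        # Credit the variation with the best draw in this round.
        wins[draws.index(max(draws))] += 1

    # P2BB = rounds won / total rounds.
    return {name: w / rounds for name, w in zip(names, wins)}
```

Calling probability_to_be_best({"A": (10, 100), "B": (30, 100)}, rounds=20_000) returns a dictionary whose values sum to 1, with "B" receiving nearly all of the probability. With less data per variation the sampled rates spread out more, so the leader wins fewer rounds and its P2BB is lower.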
When does Dynamic Yield calculate P2BB?
P2BB is calculated and shown for all Manual Allocation tests and experiments with more than one variation (a Control Group, if one is assigned, counts as an additional variation). Experiments with manual traffic allocation must adhere to strict traffic allocation rules. Any edit or significant change made during the test that alters its nature therefore requires starting a new phase (or test version). Each variation must get a fair chance to win, regardless of prior test results, so the P2BB must be reset and recalculated. When test or variation settings are changed mid-test, the system creates a new test version so that traffic allocation among the variations is reset along with the associated metrics, and ‘Probability to Be Best’ calculations begin anew.