How to analyze personalization campaigns by primary audiences
Unlock key insights about your primary audiences with these best practices for analyzing your personalization campaigns.
As you begin segmenting tests by your primary audience groups, you’ll start to unlock valuable insights about what does and does not resonate for certain segments. This data can help you build a complete and nuanced understanding of how to speak to your customers.
Analyzing and recording this data over time will not only strengthen your personalization program, but it will empower all areas of your business and help other departments become more strategic in their interactions with different customer groups. Here, we will share some best practices to keep in mind and walk you through an example of how to analyze your campaigns by primary audiences.
Accurate reporting always begins with accurate data, which you can only achieve if you set up your experiments correctly. We will begin with some testing best practices to ensure your data is valid before you analyze it.
Please note that the following best practices apply only to longer-term experiments in which the goal is to reach statistical significance, or something close to it. Sometimes you will want to run a campaign for only a few days, for example a Black Friday promotion. In these cases, you would use a Multi-Armed Bandit method, in which more traffic is allocated to the winning variation over time. For the purposes of reporting, we will explore best practices for manual traffic allocation, that is, the classic A/B testing approach.
To learn more about how to choose the right traffic allocation method for your tests, you can read our article on the topic here.
Setting up your test to produce valid, meaningful results
Let’s say, for example, you want to test the impact of social proof messaging in the form of star ratings on product listing pages (PLPs). To test your hypothesis and analyze your results accurately, the first thing you need to do is set up your variations and your control group.
In this example, the experience is targeted to all users with two variations: one with the star ratings, and one without (the control). In this instance, we would set up 50/50 traffic allocation. This means 50% of traffic will be sent to the star-rating variation, and the other 50% will be sent to the control.
Allocating 50% of traffic to your new variation and 50% to your control will give you clear insight into how your test variation performs across your primary audiences. An even split reduces room for error and, if the test runs long enough, can provide you with statistically significant results.
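To make the 50/50 split concrete, here is a minimal Python sketch of one common way to assign users to groups: hashing a user ID into a stable bucket. The function name `assign_variation`, the experiment ID, and the hashing scheme are illustrative assumptions. Your experimentation platform handles allocation for you and may do it differently.

```python
import hashlib


def assign_variation(user_id: str, experiment_id: str, variation_share: float = 0.5) -> str:
    """Return "variation" or "control" for a user; the result is stable across visits.

    All names here are hypothetical; this is not any platform's actual API.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 2**32
    return "variation" if bucket < variation_share else "control"


# Check that roughly half of a set of made-up user IDs lands in each group.
groups = [assign_variation(f"user-{i}", "plp-star-ratings") for i in range(10_000)]
share = groups.count("variation") / len(groups)
print(f"Share of users in the star-rating variation: {share:.1%}")
```

Because the assignment depends only on the user and experiment IDs, a returning user always sees the same experience, which keeps the two groups cleanly separated for reporting.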
How to effectively measure the impact of your campaigns
1. Reporting terminology
Most experimentation platforms have built-in analytics to track all relevant metrics and KPIs. But before analyzing an A/B test report, it is important that you understand the following two metrics.
Uplift: The relative difference between the performance of a variation and the performance of a baseline variation (usually the control group). For example, if a variation has a revenue per user of $5, and the control has a revenue per user of $4, the uplift is ($5 - $4) / $4 = 25%.
Probability to Be Best: The chance that a variation will have the best performance in the long term. This is the most actionable metric in the report, used to define the winner of A/B tests. Whereas uplift may vary based on chance for small sample sizes, the probability to be best takes sample size into account (based on the Bayesian approach). The probability to be best is not calculated until there have been 30 conversions or 1,000 samples. To say it simply, the probability to be best answers the question “Which variation is better?” and the uplift answers the question “By how much?”
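The two metrics can be sketched in a few lines of Python. The uplift function follows the definition above directly. The probability-to-be-best function is only one possible Bayesian approach and rests on assumptions: the metric is a simple conversion rate, each rate gets a uniform Beta(1, 1) prior, and the probability is estimated by random sampling. Your platform's actual calculation may differ, and the function names and visitor counts are made up for illustration.

```python
import random


def uplift(variation_value: float, baseline_value: float) -> float:
    """Relative difference between a variation and the baseline ("By how much?")."""
    if baseline_value == 0:
        raise ValueError("baseline_value must be non-zero")
    return (variation_value - baseline_value) / baseline_value


def probability_to_be_best(results, draws=20_000, seed=0):
    """Estimate each variation's chance of having the highest true conversion rate.

    results maps a variation name to (conversions, users).
    Assumption: a Beta(1, 1) prior on each conversion rate, sampled by Monte Carlo.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variation from its posterior.
        sampled = {
            name: rng.betavariate(1 + conversions, 1 + users - conversions)
            for name, (conversions, users) in results.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Revenue per user of $5 vs. $4, the example from the uplift definition.
print(f"Uplift: {uplift(5.0, 4.0):.0%}")  # prints "Uplift: 25%"

# Made-up counts, for illustration only.
made_up_results = {"control": (100, 2000), "star_ratings": (130, 2000)}
print(probability_to_be_best(made_up_results))
```

Running the second function with smaller counts that have the same conversion rates shows why sample size matters: the uplift stays the same, but the probability to be best moves closer to an even split.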
2. Basic Analysis
First, you will want to take a look at how the experiment performed for all users. In our example of star ratings on PLPs, you can see that the test variation produced an uplift across the primary metric we set up for the test (in this case, purchases per user).
As you can see here, the test variation has not yet been declared as the “winner.” Typically, a winner will be declared if the following conditions are met:
- One variation has a probability to be best score above 95% (the threshold can be changed in some platforms using the winner significance level setting).
- The minimum test duration has passed (the default is typically 2 weeks). This is designed to make sure the results are not affected by seasonality.
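The two conditions above can be expressed as a small check. This is a sketch under stated assumptions, not any platform's actual logic: the function name `declare_winner`, its parameters, and the dates and scores in the usage line are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def declare_winner(
    probability_to_be_best: Dict[str, float],
    start_date: date,
    today: date,
    threshold: float = 0.95,
    min_duration: timedelta = timedelta(days=14),
) -> Optional[str]:
    """Return the winning variation's name, or None if no winner can be declared yet."""
    if today - start_date < min_duration:
        return None  # the minimum test duration has not passed
    best = max(probability_to_be_best, key=probability_to_be_best.get)
    if probability_to_be_best[best] > threshold:
        return best
    return None  # no variation is above the threshold yet


# Illustrative scores and dates: three weeks in, but the best score is below 95%.
print(declare_winner({"control": 0.21, "star_ratings": 0.79}, date(2024, 1, 1), date(2024, 1, 22)))
```

The usage line prints `None`: the test has run longer than two weeks, but no variation has crossed the 95% threshold.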
In our example, although the test has not reached statistical significance, there are clear indicators that it is working: the variation is producing uplift and has a 79% probability to be best. This is encouraging information that will prompt us to dive even further into how this test is performing for our primary audiences.
3. Measuring impact across primary audiences
Most experimentation platforms will offer an audience breakdown that shows how the results of a test differ when segmenting the traffic by different audiences. We recommend analyzing how the test performs across your primary audience groups to gain a deeper understanding of each group and to begin building campaigns that better serve its needs.
Given the nature of primary audiences, you will most likely see behavior differences emerge when analyzing the results of a test by audience. In our example, the high-intent group is showing a preference toward the control, that is, the PLP without star ratings.
To make it easier to analyze your results across primary audiences and compare behavior across your segments, we’ve created a Primary Audiences Campaign Analysis Template for you to reference and use.
4. How to use the Primary Audiences Campaign Analysis Template
The Primary Audiences Campaign Analysis Template includes two tabs: an example and a blank template. The example outlines how the spreadsheet works when all data inputs have been filled. The blank template tab is where you can enter your own data to more easily analyze the impact of your campaigns and tests on your primary audiences.
Using the blank template, you will see three categories of inputs. The first input is the primary metric, which will be the subject of each analysis you are running. If you are analyzing the impact of a test on click-through rate across your primary audiences, then you would input “CTR” in this field.
The next input is where you will list your primary audiences. These will be the same across every table. We used intent-based audiences in our example.
The third input is where you’ll record the performance of your campaign against each primary audience, for both your control group and your variation(s). You should be able to pull this information from your experimentation platform.
Once you’ve entered your data, the spreadsheet will calculate the delta and populate a graph to analyze the performance of each primary metric you are measuring across all primary audiences. Looking at these graphs will give you a clear picture of how your primary audiences’ behavior changes depending on the experiment and the primary metric. Equipped with this information, you can then begin to institutionalize your learnings to make site-wide changes that better reflect the needs and preferences of your primary audiences.
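If you prefer to reproduce the delta calculation outside the spreadsheet, the sketch below shows one way to do it in Python. It assumes the delta is the relative change of the variation against the control, which may not match the template's exact formula. The function name and all numbers are placeholders, not results from the example test.

```python
from typing import Dict, Tuple


def audience_deltas(metric_by_audience: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each audience to the relative change of the variation vs. the control.

    metric_by_audience maps an audience name to (control_value, variation_value).
    """
    deltas = {}
    for audience, (control, variation) in metric_by_audience.items():
        if control == 0:
            raise ValueError(f"control value for {audience!r} must be non-zero")
        deltas[audience] = (variation - control) / control
    return deltas


# Placeholder numbers for a metric such as purchases per user; not real results.
placeholder = {
    "Low intent": (0.50, 0.53),
    "Medium intent": (0.80, 0.84),
    "High intent": (1.20, 1.14),
}
for audience, delta in audience_deltas(placeholder).items():
    bar = "#" * round(abs(delta) * 100)  # crude text bar, one "#" per percentage point
    print(f"{audience:<14} {delta:+.1%} {bar}")
```

A positive delta means the variation outperformed the control for that audience, and a negative delta means the control performed better.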
5. Beyond the numbers: understanding the psychology of your results
Looking at the graphs in the Primary Audiences Campaign Analysis Template will help you gain better insight into how the results of your tests signal key information about the psychology and behavior of your primary audiences.
Once you’ve gained an understanding of the behavioral differences between your primary audiences, you can institutionalize these learnings and use them moving forward to inform your marketing campaigns.
- In our example, while purchases per user for Low and Medium intent users increased nearly 6% when they viewed star ratings on PLPs, the High intent audience preferred a cleaner PLP.
- We can also see that the star-rating experience led Low and Medium intent audiences not only to purchase more, but also to browse more.
This data reveals insights about our audiences’ needs. In our example, we can begin to see that Low and Medium intent users appear to need additional social proof on PLPs in order to trust our brand, continue to browse, and ultimately purchase. On the other hand, High intent users appear to trust our brand already and therefore prefer a cleaner PLP that leads to a more streamlined purchase experience.
Once you’ve developed a better understanding of your primary audiences’ behaviors, you can use these insights to inform your next test.
6. Turning insights into action
Once you’ve analyzed your data and drawn conclusions about how the results relate to key behavior differences within your primary audience groups, you can begin to set up your next test to further act on your learnings and deliver a more personalized experience to your users.
In our example, because we found that purchases per user increased only for Low and Medium intent users when viewing star ratings on PLPs, we will segment our test accordingly. Low and Medium intent users will be given the star rating variation and High intent users will be given the variation without star ratings.
This second phase of testing is crucial. It will help you uncover more insights about your audiences (in this case, our Low and Medium intent audiences) while also serving a more personalized and relevant experience to each audience group.
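Expressed as targeting rules, the segmented setup could look like the following sketch. The audience labels, experience names, and the fallback for users who belong to none of the three audiences are assumptions made for illustration. In practice you would configure this targeting in your experimentation platform.

```python
# Hypothetical audience labels and experience names, for illustration only.
EXPERIENCE_BY_AUDIENCE = {
    "low_intent": "plp_with_star_ratings",
    "medium_intent": "plp_with_star_ratings",
    "high_intent": "plp_without_star_ratings",
}


def choose_experience(audience: str) -> str:
    """Return the PLP experience to serve to a user in the given audience."""
    # Assumption: users outside the known audiences get the PLP without star ratings.
    return EXPERIENCE_BY_AUDIENCE.get(audience, "plp_without_star_ratings")


for label in ("low_intent", "medium_intent", "high_intent"):
    print(label, "->", choose_experience(label))
```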
As you continue to gather more learnings, you can repeat this process to continuously learn more about your primary audiences, refine your experiences, and serve the best variation of your site to each primary audience group.