Do you remember your first A/B test? I do. (Nerdy, I know.)
I was thrilled and scared at the same time because I knew I had to use some of what I learned in college to do my job.
There were a few aspects of A/B testing that I remembered – for example, I knew you needed a large enough sample size to run the test, and you had to run the test long enough to get statistically significant results.
But … that's pretty much it. I wasn't sure how big "big enough" was for sample sizes or how long "long enough" was for test durations – and Googling gave me a variety of answers that my college statistics courses definitely hadn't prepared me for.
Turns out I wasn't alone: these are two of the most common A/B testing questions we get from customers. The reason the typical Google-search answers aren't that helpful is that they assume you're A/B testing in an ideal, theoretical world – not in the practical world of marketing.
So I figured I'd do the research to find practical answers for you. By the end of this post, you should know how to find the right sample size and timeframe for your next A/B test. Let's dive in.
A/B test sample size and timeframe
In theory, to determine a winner between Variation A and Variation B, you have to wait until you have enough results to tell whether there is a statistically significant difference between the two.
Depending on your company, your sample size, and how you run the A/B test, getting statistically significant results can take hours, days, or weeks – and all you have to do is hold out until you get them. In theory, you shouldn't limit how long you collect results.
For many A/B tests, waiting isn't a problem. Testing headline copy on a landing page? It's fine to wait a month for results. Same goes for blog CTA creatives – you're playing the long-term lead generation game anyway.
However, certain aspects of marketing demand shorter A/B testing timelines. Take email as an example. With email, waiting for an A/B test to conclude can be problematic for several practical reasons:
1. Every email send has a finite audience.
Unlike a landing page (which keeps reaching new audiences over time), once you've sent an email A/B test, you can't "add" more people to it. So you need to figure out how to squeeze as much juice as possible out of your email A/B test.
This usually means sending the A/B test to the smallest portion of your list needed to produce statistically significant results, picking a winner, and then sending the winning variation on to the rest of the list.
2. If you run an email marketing program, you probably send at least a few emails a week. (In reality, probably a lot more than that.)
If you spend too much time collecting results, you could delay your next email send – which could be worse than sending a winner that isn't statistically significant to a segment of your database.
3. Emails are often designed to be timely.
Your marketing emails are often timed for a reason – whether they support the launch of a new campaign or need to land in your recipients' inboxes at the moment they'd like to receive them. So if you wait for your email test to reach full statistical significance, you may miss that window of freshness and relevance – which could defeat the purpose of sending the email in the first place.
That's why email A/B testing tools have a built-in "timing" setting: if neither result is statistically significant by the end of that window, one variation (which you select in advance) is sent to the rest of your list. That way, you can still A/B test your email while meeting your email marketing schedule and making sure recipients get timely content.
To A/B test email while still optimizing your sends for the best results, you need to account for both sample size and timing.
Next – how to use data to find your sample size and timing.
How to determine the sample size for an A/B test
Now let's walk through how to calculate the sample size and timing you need for your next A/B test.
For our purposes, we'll use email to demonstrate how to determine both. It's important to note, however, that the steps in this list work for any A/B test, not just email.
Let's dive in.
As mentioned above, any email A/B test can only be sent to a limited audience, so you need to maximize what you learn from it. To do that, identify the smallest portion of your overall list needed to get statistically significant results. Here's how to calculate it.
1. Check that your list contains enough contacts to test a sample at all.
To A/B test a sample of your list, you need a decently large list size – at least 1,000 contacts. With fewer than that, the proportion of your list you need to test to get statistically significant results just keeps getting bigger.
For example, to get statistically significant results from a small list, you might need to test 85% or 95% of it. At that point, the untested remainder of your list is so small that you might as well have sent one version of the email to half your list and the other version to the other half, then measured the difference.
Your results may not end up being statistically significant, but at least you'll gain insights while you grow your list past 1,000 contacts. (For tips on growing your email list so you can hit that 1,000-contact threshold, check out this blog post.)
Note for HubSpot customers: 1,000 contacts is also our benchmark for running A/B tests on sample emails. If your selected list has fewer than 1,000 contacts, version A of your test will automatically be sent to half of your list and version B to the other half.
2. Use a sample size calculator.
Next, you'll want to find a sample size calculator – SurveySystem.com has a good free one.
Here's what it looks like when you open it:
3. Enter the confidence level, confidence interval and population of your email into the tool.
Yeah, that's a lot of statistics jargon. Here's what those terms mean for your email:
Population: Your sample represents a larger group of people. That larger group is your population.
In email, your population is the typical number of people on your list your email actually gets delivered to – not the number of people you send it to. To calculate population, look at the last three to five emails sent to this list and average the total number of emails delivered. (Use the average, since the total number of emails delivered varies from send to send.)
Confidence interval: You may have heard this called your "margin of error." Many surveys use it, including political polls. This is the range of results you can expect this A/B test to produce once it's run on the entire population.
For example, if you have an interval of 5 and 60% of your sample opens your variation, you can be confident that between 55% (60 minus 5) and 65% (60 plus 5) of the full list would have opened that email. The larger the interval you choose, the more certain you can be that the population's actual behavior falls within it. At the same time, larger intervals give you less definitive results. It's a trade-off you have to make.
For our purposes, it's not worth going too deep into confidence intervals. If you're just starting out with A/B testing, I'd recommend choosing a smaller interval (for example, around 5).
Confidence level: How confident you can be that your sample results fall within the confidence interval above. The lower the percentage, the less sure you can be of the results. The higher the percentage, the more people you'll need in your sample.
Note for HubSpot customers: The HubSpot email A/B tool automatically uses an 85% confidence level to determine a winner. Since that option isn't available in this calculator, I'd suggest choosing 95%.
Email A/B test example:
Let's imagine we're sending our first A/B test. Our list contains 1,000 people and has a 95% deliverability rate. We want to be 95% confident that our winning email's metrics fall within a 5-point interval of our population's metrics.
Here's what we'd enter into the tool:
- Population: 950
- Confidence level: 95%
- Confidence interval: 5
4. Click "Calculate" to get your sample size.
Ta-da! The calculator spits out your sample size.
In our example, our sample size is: 274.
That's the size each variation needs to be. So if you have one control and one variant for your email send, you need to double that number. If you had one control and two variants, you'd triple it. (And so on.)
5. Depending on your email program, you may need to calculate the sample size as a percentage of the total send.
HubSpot customers, this section is for you. When you run an email A/B test in HubSpot, you choose the percentage of contacts each variation is sent to – not the raw sample size.
To get that percentage, divide your sample size by the total number of contacts in your list. Here's the math using the example numbers above:
274 / 1,000 = 27.4%
This means each variation (your control and your variant) needs to go to roughly 27-28% of your audience – in other words, about 55% of your total list combined.
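If you'd rather script this than use a web calculator, the steps above can be sketched in a few lines. The formula below is the standard sample size calculation with a finite population correction, which reproduces the example numbers; the function name and z-score table are my own, not part of any particular calculator's documentation:

```python
import math

def ab_test_sample_size(population, confidence_level=0.95, interval=5):
    """Sample size per variation, using the standard formula
    with a finite population correction."""
    z_scores = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}
    z = z_scores[confidence_level]
    p = 0.5                 # worst-case response proportion (maximizes n)
    e = interval / 100      # confidence interval expressed as a decimal
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)    # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)

n = ab_test_sample_size(950, confidence_level=0.95, interval=5)
print(n)                          # 274, matching the example above
print(round(n / 1000 * 100, 1))   # 27.4 -> percent of the full list per variation
```

With one control and one variant, doubling that number shows the whole test reaches about 55% of the 1,000-contact list, just as computed above.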
And that's it! You should be ready to determine your timing.
How to choose the right timeframe for your A/B test
To figure out the right timeframe for your A/B test, we'll again use email as the example. This information should still apply, though, regardless of the type of A/B test you're running.
Your timeframe also depends on your company's goals. If you want to launch a redesigned landing page by Q2 2021 and it's currently Q4 2020, you'd want to complete your A/B test by January or February so you can use the results to build the winning page.
Coming back to our email example: you need to figure out how long to run your email A/B test before sending the winning version to the rest of your list.
Figuring out the timing aspect is a little less statistical, but you should definitely use previous data to make better decisions. Here's how you can do it.
If you don't have any time constraints on when to send the winning email to the rest of the list, move on to your analysis.
Find out when your email is opened / clicked (or whatever your success metrics are). Take a look at your past emails to find out.
For example, what percentage of total clicks did past emails get on day one? If you find you get 70% of clicks in the first 24 hours and 5% each day after that, it makes sense to cap your email A/B testing window at 24 hours, because it isn't worth delaying your results just to gather a little extra data.
In this scenario, you'd likely set your window to 24 hours. At the end of those 24 hours, your email program should be able to tell you whether a statistically significant winner was found.
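One way to do that day-one analysis is to take click timestamps from a past send and compute what share arrived within a candidate window. A minimal sketch with made-up data (in practice you'd export the timestamps from your email tool; the send time and click hours here are purely illustrative):

```python
from datetime import datetime, timedelta

# Hypothetical send time and click timestamps from a past email
send_time = datetime(2021, 3, 1, 15, 0)
clicks = [send_time + timedelta(hours=h)
          for h in [1, 2, 2, 3, 5, 8, 12, 20, 30, 50]]

# Share of all clicks that arrived within the first 24 hours
within_24h = sum(1 for c in clicks if c - send_time <= timedelta(hours=24))
share = within_24h / len(clicks)
print(f"{share:.0%} of clicks came in the first 24 hours")  # 80%
```

If that share is high across your last several sends, a 24-hour test window is probably safe; if clicks trickle in over days, consider a longer window.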
What happens next is up to you. If you have a large enough sample size and found a statistically significant winner at the end of the test period, many email marketing programs will automatically and immediately send the winning variation.
If you have a large enough sample size and there's no statistically significant winner at the end of the test period, email marketing tools may also let you automatically send a variation of your choice.
If you have a smaller sample size or are running a 50/50 A/B test, when to send the next email based on the first email's results is entirely up to you.
If you do have time constraints on when the winning email needs to reach the rest of the list, work out how late you can determine the winner without it being too early to call or interfering with other email sends.
For example, if you send an email at 3:00 PM EST for a flash sale that ends at midnight EST, you wouldn't want to pick an A/B test winner at 11:00 PM. Instead, you'd want to send the winning email closer to 6:00 or 7:00 PM – that gives the people who weren't part of the A/B test enough time to act on your email.
And that's about it, folks. Once you've done these calculations and examined your data, you should be in much better shape to run successful A/B tests – ones that are statistically valid and help you move the needle on your goals.