What drives A/B test sample size
A/B test sample size depends mainly on four choices: your confidence level, your target power, your baseline conversion rate, and the minimum detectable effect you care about.
The smaller the lift you need to detect, the more traffic you need. Lower baseline conversion rates also increase the required sample for the same relative lift.
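These four inputs combine in the standard two-proportion sample size formula. The sketch below is illustrative, not the exact method any particular calculator uses; it assumes a two-sided test and a minimum detectable effect expressed as a relative lift, and the function name and defaults are made up for this example.

```python
import math
from statistics import NormalDist

def per_variant_sample_size(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-variant n for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.05 for 5%)
    relative_mde: smallest relative lift worth detecting (e.g. 0.10 for +10%)
    alpha: 1 - confidence level, two-sided (0.05 -> 95% confidence)
    power: probability of detecting a true lift of that size (0.80 -> 80%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 at 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline, +10% relative lift, defaults of 95% confidence and 80% power
print(per_variant_sample_size(0.05, 0.10))
```

Note how the squared effect-size term in the denominator is what makes small lifts so expensive: shrinking the detectable lift shrinks `(p2 - p1) ** 2` quadratically.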
Why this matters
Running an experiment with too little data makes it easier to miss real differences or overreact to random noise. Planning sample size in advance reduces the temptation to stop early based on unstable results.
This page gives you a practical target per variant so you can judge whether a test is realistic before launch.
- Estimate traffic needs before launch
- Set realistic test durations
- Avoid underpowered experiments
- Align teams on what counts as a meaningful lift
How to use the result
The per-variant result tells you roughly how many observations each variant should receive. The total sample size is the combined traffic across both variants.
If the result looks too large for your available traffic, the usual next step is to reconsider the minimum detectable effect, not to run the same test with less data.
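The leverage of the minimum detectable effect is easy to quantify: required sample scales roughly with the inverse square of the effect size. A back-of-the-envelope sketch (the function and numbers are illustrative only):

```python
def sample_size_multiplier(current_mde, proposed_mde):
    """Rough multiplier on required sample size when the minimum
    detectable effect changes (n scales approximately with 1 / MDE^2)."""
    return (current_mde / proposed_mde) ** 2

# Halving the MDE roughly quadruples the traffic you need,
# which is why relaxing the MDE beats shrinking the sample:
print(sample_size_multiplier(0.10, 0.05))  # → 4.0
```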
How to turn the result into a test plan
Once you have a per-variant sample target, compare it with weekly traffic to estimate how long the experiment will need to run. That helps you decide whether the test is realistic before design and engineering work is committed.
Sample size is only one part of experiment quality. Clean tracking, a stable baseline, and a clear stopping rule still matter because a large sample cannot rescue a poorly run test.
- Estimate duration from per-variant traffic, not total site traffic
- Choose the minimum detectable effect before launch
- Keep allocation and tracking stable during the run
- Avoid stopping early when results look temporarily promising
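Turning the per-variant target into a run length can be sketched as below, assuming eligible traffic is split evenly across variants; the figures in the example are made up:

```python
import math

def weeks_to_run(per_variant_target, weekly_eligible_visitors, variants=2):
    """Estimate run length in whole weeks, assuming an even split
    of eligible traffic across all variants."""
    per_variant_weekly = weekly_eligible_visitors / variants
    return math.ceil(per_variant_target / per_variant_weekly)

# e.g. 31,000 per variant with 10,000 eligible visitors/week, two variants
print(weeks_to_run(31_000, 10_000))  # → 7
```

Rounding up to whole weeks is deliberate: running full weeks keeps day-of-week traffic patterns balanced across the experiment.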