Who is this A/B testing guide for?

This guide is for product teams, growth marketers, analysts, and anyone planning experiments who wants to make better decisions about effect size, traffic, and test design.

What should I do after reading this page?

Use the explanation here to choose realistic assumptions, then move to the calculator or related pages to estimate the traffic needed for your experiment.

Why do many teams use 80% power?

Because it is a practical balance between sensitivity and runtime. Higher power reduces the chance of missing a real effect, but it also increases the traffic needed to run the test properly.

Can a low-power test still be useful?

Sometimes, but only if the team understands its limitations. A low-power test is more likely to miss real effects, so a non-significant result should be interpreted with extra caution.

A/B Concept

Statistical Power for A/B Tests

Power is the probability that your test will detect a real effect of the size you care about. In practice, it is one of the clearest ways to think about how likely your experiment is to miss something important.

What power means

If power is 80%, your test is designed to detect the target effect around 80% of the time if that effect truly exists. Lower power means a higher chance of missing real differences.

That makes power a core planning setting rather than an advanced detail.

Why higher power requires more sample

More power means you want greater sensitivity, which usually requires more observations. That tradeoff becomes especially visible when the expected effect is small.

Many teams use 80% as a common standard because it balances rigor and practicality.

How to use it in planning

Power only makes sense together with the minimum detectable effect. A test can be highly powered to detect a large effect and underpowered to detect a small one.

That is why power should never be discussed without the target effect size.

How power affects real experiment decisions

Power matters because an underpowered test can end with no significant result even when a meaningful change is truly present. That often leads teams to wrongly conclude that an idea did not work.

Thinking about power also helps set expectations with stakeholders. A test with limited traffic may still be worth running, but everyone should understand what size of effect it can and cannot detect reliably.

Use power to judge the risk of missing a real effect
Discuss power together with MDE, never by itself
Explain the tradeoff between sensitivity and runtime
Avoid treating non-significant results as proof of no difference

Statistical Power for A/B Tests

What power means

Why higher power requires more sample

How to use it in planning

How power affects real experiment decisions

Related pages for Statistical Power for A/B Tests

Frequently Asked Questions