Statistical Power for A/B Tests
Power is the probability that your test will detect a real effect of the size you care about. In practice, it is one of the clearest ways to think about how likely your experiment is to miss something important.
What power means
If power is 80%, your test is designed to detect the target effect around 80% of the time if that effect truly exists. Lower power means a higher chance of missing real differences.
That makes power a core planning setting rather than an advanced detail.
Why higher power requires more sample
More power means you want greater sensitivity, which usually requires more observations. That tradeoff becomes especially visible when the expected effect is small.
Many teams use 80% as a common standard because it balances rigor and practicality.
How to use it in planning
Power only makes sense together with the minimum detectable effect. A test can be highly powered to detect a large effect and underpowered to detect a small one.
That is why power should never be discussed without the target effect size.
How power affects real experiment decisions
Power matters because an underpowered test can end with no significant result even when a meaningful change is truly present. That often leads teams to wrongly conclude that an idea did not work.
Thinking about power also helps set expectations with stakeholders. A test with limited traffic may still be worth running, but everyone should understand what size of effect it can and cannot detect reliably.
- Use power to judge the risk of missing a real effect
- Discuss power together with MDE, never by itself
- Explain the tradeoff between sensitivity and runtime
- Avoid treating non-significant results as proof of no difference