ABIntuition Busters KDDTalk
ly/ABTestingIntuitionBusters
• The power to detect a 10% relative delta in the prior example was 3%
• With such low power, the False Positive Risk is at least 63%,
so a stat-sig result is more likely to be wrong than right!
• To trust a result, a p-value threshold of 0.002 should be used.
The actual p-value was 0.013 (paper has detailed computations)
• The paper shares another reason: convergence to a normal distribution is faster
when variants are equal. Unequal variants caused material over-estimation of
type-I error on one tail and under-estimation on the other tail
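
The False Positive Risk figure in the bullets above can be roughly reproduced with Bayes' rule. This is a minimal sketch, not the paper's exact computation: the function name is hypothetical, and the significance level (0.05) and the prior probability that the null hypothesis is true (0.5) are assumptions, since neither is stated on this slide. Only the 3% power comes from the slide.

```python
def false_positive_risk(alpha: float, power: float, prior_null: float) -> float:
    """P(null is true | result is stat-sig), via Bayes' rule.

    alpha      -- significance threshold (assumed 0.05 below)
    power      -- probability of detecting a true effect
    prior_null -- prior probability that the null is true (assumed 0.5 below)
    """
    false_positives = alpha * prior_null        # null true, yet stat-sig
    true_positives = power * (1 - prior_null)   # effect real and detected
    return false_positives / (false_positives + true_positives)


# Power of 3% is from the slide; alpha and the prior are assumptions.
fpr = false_positive_risk(alpha=0.05, power=0.03, prior_null=0.5)
print(f"False Positive Risk: {fpr:.1%}")
```

Under these assumptions the risk comes out near the 63% quoted above. Any prior above 0.5 pushes it higher, which is consistent with the "at least" wording.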