This JAMA Guide to Statistics and Methods explains the importance of considering sample size when interpreting study results, how a power analysis can help determine the appropriate sample size, and the potential pitfalls of this approach.
Koegelenberg et al1 reported the results of a randomized clinical trial (RCT) that investigated whether treatment with a nicotine patch in addition to varenicline produced higher rates of smoking abstinence than varenicline alone. The primary results were positive; that is, patients receiving the combination therapy were more likely to achieve continuous abstinence at 12 weeks than patients receiving varenicline alone. The absolute difference in the abstinence rate was estimated to be approximately 14%, which was statistically significant at level α = .05.
These findings differed from the results reported in 2 previous studies2,3 of the same question, which detected no difference in treatments. What explains this difference? One explanation offered by the authors is that the previous studies “…may have been inadequately powered,” which means the sample size in those studies may have been too small to identify a difference between the treatments tested.
Why Is Power Analysis Used?
The sample size in a research investigation should be large enough that differences arising by chance alone are rare, but not larger than necessary, to avoid wasting resources and to limit the exposure of research participants to risks associated with the interventions. In any study, but especially one with a very small sample size, an observed difference in rates may arise by chance alone and thus may not be statistically significant.
In developing the methods for a study, investigators conduct a power analysis to calculate the sample size. The power of a hypothesis test is the probability of obtaining a statistically significant result when there is a true difference between treatments. For example, suppose, as Koegelenberg et al1 did, that the smoking abstinence rate was 45% with varenicline alone and 14 percentage points higher, or 59%, with the combination regimen. Power is the probability that, under these conditions, the trial would detect a difference in rates large enough to be statistically significant at a chosen level α (ie, α is the probability of a type I error, which occurs when a true null hypothesis is rejected).
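As a rough illustration of how such a calculation proceeds (not the trial's actual method, whose details are not given here), the sketch below uses the standard normal approximation for a two-sided two-sample test of proportions with the assumed rates of 45% and 59% and α = .05, and searches for the smallest per-group sample size reaching 80% power:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_proportions(p1, p2, n, z_crit=1.959964):
    """Approximate power of a two-sided two-sample z test comparing
    proportions p1 and p2 with n participants per group."""
    delta = abs(p2 - p1)
    pbar = (p1 + p2) / 2.0
    se0 = sqrt(2.0 * pbar * (1.0 - pbar) / n)            # SE under H0 (pooled)
    se1 = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)    # SE under H1
    return norm_cdf((delta - z_crit * se0) / se1)

# Rates assumed in the design: 45% (varenicline alone) vs 59% (combination)
p1, p2 = 0.45, 0.59
n = 50
while power_two_proportions(p1, p2, n) < 0.80:
    n += 1
print(n)  # roughly 200 participants per group
```

Under these assumptions, a difference of 14 percentage points requires on the order of 200 participants per arm for 80% power; smaller assumed differences would require substantially larger samples.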
Power can also be thought of as the complement of the type II error probability. If we accept a 20% type II error rate for a difference in rates of size d, we are saying that there is a 20% chance of failing to detect a true between-group difference of size d. The complement of this, 0.8 = 1 − 0.2, is the statistical power: when a difference of d exists, there is an 80% chance that our statistical test will detect it, that is, will yield a statistically significant result.