In discussions of whether trials were large enough, you may have heard people refer to the power of the trial as presented in the authors' sample size calculations. Such discussions are complex and confusing. As we illustrate in this chapter, whether a trial or meta-analysis is large enough depends only on the confidence interval (CI).
Hypothesis testing, on which sample size calculations are typically based, involves estimating the probability that results at least as extreme as those observed would have occurred by chance if a null hypothesis, which states that there is no difference between a treatment condition and a control condition, were true. Health researchers and medical educators have increasingly recognized the limitations of hypothesis testing1-5; consequently, an alternative approach, estimation, is becoming more popular.
How Should We Treat Patients with Heart Failure? A Problem in Interpreting Study Results
In a blinded randomized clinical trial of 804 men with heart failure, investigators compared treatment with enalapril (an angiotensin-converting enzyme [ACE] inhibitor) to treatment with a combination of hydralazine and nitrates.6 In the follow-up period, which ranged from 6 months to 5.7 years, 132 of 403 patients (33%) assigned to receive enalapril died, as did 153 of 401 patients (38%) assigned to receive hydralazine and nitrates. The P value associated with the difference in mortality is .11.
Looking at this study as an exercise in hypothesis testing and adopting the usual 5% risk of obtaining a false-positive result, we would conclude that chance remains a plausible explanation for the apparent differences between groups. We would classify this as a negative study (ie, we would conclude that no important difference existed between the treatment and control groups).
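The arithmetic behind these figures can be sketched as follows. This is a minimal illustration that assumes a pooled 2-proportion z test with a 2-sided P value; the trial report may have used a different test, and the function name two_proportion_test is hypothetical. The death counts and group sizes are those given above.

```python
import math

def two_proportion_test(events_a, n_a, events_b, n_b):
    """Pooled 2-proportion z test (an assumed method, not necessarily the trial's).

    Returns the risk in each group and the 2-sided P value for their difference.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    # Under the null hypothesis both groups share one underlying risk,
    # so the standard error is built from the pooled proportion.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_b - risk_a) / standard_error
    # 2-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return risk_a, risk_b, p_value

# Deaths and group sizes reported in the heart failure trial.
risk_enalapril, risk_hydralazine, p_value = two_proportion_test(132, 403, 153, 401)

ALPHA = 0.05  # the usual 5% risk of a false-positive result
print(f"Enalapril mortality:             {risk_enalapril:.0%}")
print(f"Hydralazine-nitrates mortality:  {risk_hydralazine:.0%}")
print(f"2-sided P value:                 {p_value:.2f}")
print(f"Significant at the 5% threshold: {p_value < ALPHA}")
```

Under these assumptions, the calculation reproduces the 33% and 38% mortality figures and a P value of approximately .11, which exceeds the 5% threshold.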
The investigators also conducted an additional analysis that compared the time pattern of the deaths occurring in both groups. This survival analysis, which generally is more sensitive than the test of the difference in proportions (see Chapter 9, Does Treatment Lower Risk? Understanding the Results), had a nonsignificant P value of .08, a result that leads to the same conclusion as the simpler analysis that focused on relative proportions at the end of the study. The authors also tell us that the P value associated with differences in mortality at 2 years (a point predetermined to be a major end point of the trial) was significant at .016.
At this point, one might excuse clinicians who feel a little confused. Ask yourself, is this a positive trial, dictating use of an ACE inhibitor instead of the combination of hydralazine and nitrates, or is it a negative study, showing no difference between the 2 regimens and leaving the choice of drugs open?
Solving the Problem: What Are Confidence Intervals?
How can clinicians deal with the limitations of hypothesis testing and resolve the confusion? The solution involves posing 2 questions: (1) “What is the single value most likely to ...