Investigators may stop randomized clinical trials (RCTs) earlier than planned because of perceived harm of the experimental intervention, because they lose hope of achieving a positive result, or because the sponsor wishes to save money.1 The reason for early stopping that may have the most effect on clinical practice, however, is that investigators note treatment effects that appear unlikely to be due to chance, and that are usually large, which persuade them that the experimental intervention is beneficial. Trials stopped early for apparent benefit, which we will refer to as truncated RCTs (tRCTs), often receive considerable attention. They appear in the most prominent journals and in the popular press,2,3 markedly increasing the likelihood of widespread dissemination and subsequent citation. These trials may, with remarkable rapidity, form the basis of practice guidelines and criteria for quality of medical care, and such recommendations may persist after subsequent studies have refuted the results of the tRCTs. For example, this has been the fate of stopped-early RCTs documenting the effect of tight glucose control with insulin in patients in the intensive care unit, β-blockers in patients undergoing vascular surgery, and activated protein C in sepsis.4
Truncated Randomized Clinical Trials Are at Risk of Overestimating Treatment Effects
Truncated RCTs will, on average, overestimate treatment effects, and this overestimation may be large, particularly when tRCTs have a small number of outcome events. To understand this overestimation, imagine a number of similar RCTs that address a particular research question in which the truth is a small treatment effect. If trials are at low risk of bias, their results will vary only because of chance. Some trials will start and continue near the truth. However, because of imprecision when the sample size is still small, some will reveal apparent harm early on, and some will reveal large overestimates: the latter 2 categories of trials will approach the truth as the data accumulate (Figure 11.3-1).
Figure 11.3-1. Theoretical Distribution of Randomized Clinical Trial Results as Data Accumulate
Truncated RCTs will belong to the group of trials that overestimate because they are at the high end of the random distribution of results. Correspondingly, the non-tRCTs will tend to slightly underestimate the treatment effect. Thus, the overestimation from tRCTs is largely the result of random error. If such studies were to continue to their planned sample sizes, then because of what Pocock and White5 have described as “regression to the truth,” they would still produce overestimates of effect, but those overestimates would be smaller than those seen with early stopping.
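The selection effect described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of any published analysis: the true RR of 0.9, the control event risk of 20%, the interim looks at 200, 400, and 600 patients per group, the planned 800 patients per group, and the fixed stopping threshold of z ≥ 3 are all hypothetical values chosen for illustration, as are the function names.

```python
import math
import random
import statistics


def simulate_trial(rng, true_rr=0.9, control_risk=0.20,
                   looks=(200, 400, 600), planned_n=800, z_stop=3.0):
    """Simulate one two-arm trial; return (stopped_early, observed_rr).

    All default values are hypothetical. n counts patients per arm.
    """
    events_c = events_t = n = 0
    for target in (*looks, planned_n):
        # Enrol patients in each arm up to the next analysis.
        add = target - n
        events_c += sum(rng.random() < control_risk for _ in range(add))
        events_t += sum(rng.random() < control_risk * true_rr
                        for _ in range(add))
        n = target
        p_c, p_t = events_c / n, events_t / n
        # z statistic for the difference in event proportions.
        pooled = (events_c + events_t) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        z = (p_c - p_t) / se if se > 0 else 0.0
        rr = p_t / p_c if p_c > 0 else math.inf
        # Stop for apparent benefit only at an interim look.
        if n < planned_n and z >= z_stop:
            return True, rr
    return False, rr


def run(n_trials=3000, seed=1):
    """Run many identical trials and split them by whether they stopped."""
    rng = random.Random(seed)
    results = [simulate_trial(rng) for _ in range(n_trials)]
    stopped = [rr for early, rr in results if early]
    completed = [rr for early, rr in results if not early]
    return stopped, completed


stopped, completed = run()
print("trials stopped early:", len(stopped))
print("median RR, stopped early:", round(statistics.median(stopped), 2))
print("median RR, completed:", round(statistics.median(completed), 2))
```

Under these assumed settings, only a small fraction of trials cross the boundary, and those that do should report a median RR well below the true value of 0.9, whereas the trials that run to completion cluster close to it.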
As Figure 11.3-1 suggests, large random differences from the true effect are more likely to happen early in a trial when sample sizes are small.5,6 Thus, trials stopped early with extreme stopping boundaries will often produce effect estimates much larger than the truth. The smaller the sample size at the time of an interim look at the data, and in particular the smaller the number of outcome events (see Chapter 12.3, What Determines the Width of the Confidence Interval?), the larger an effect estimate needs to be to qualify for standard stopping rules, and thus the larger the overestimate of effect is likely to be.
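The link between the number of events and the size of effect needed to cross a stopping boundary can be made concrete with a rough calculation. This sketch assumes a time-to-event analysis with 1:1 allocation and uses the common approximation that the standard error of the log hazard ratio is about 2 divided by the square root of the total number of events. The threshold of z = 3 and the function name are hypothetical and do not correspond to any particular trial's stopping rule.

```python
import math


def hr_needed_to_stop(events, z_stop=3.0):
    """Hazard ratio an interim analysis must show to reach z_stop.

    Assumes 1:1 allocation and SE(log HR) ~ 2 / sqrt(total events).
    z_stop is a hypothetical boundary, not a recommended value.
    """
    se_log_hr = 2.0 / math.sqrt(events)
    return math.exp(-z_stop * se_log_hr)


for events in (50, 100, 200, 500):
    print(events, "events: HR must be at most",
          round(hr_needed_to_stop(events), 2))
```

With 100 events, for instance, the observed hazard ratio would need to be about 0.55 or lower to reach this boundary. With fewer events the required effect is larger still, which is why trials stopped after few events are likely to carry the largest overestimates.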
Although statistical simulation can readily reveal how tRCTs will overestimate treatment effects,5 trials in which investigators have looked at the data as they accumulated but refrained from stopping early also provide compelling evidence. For example, investigators conducted a trial that compared 5 vs 4 courses of chemotherapy for acute myeloid leukemia.7 They observed an extremely large treatment effect early in their RCT (Figure 11.3-2). Their results crossed their prespecified stopping boundary. Nevertheless, because they correctly concluded that the effect was too good to be true, they continued recruiting and following up patients. Ultimately, the apparent beneficial effect disappeared, and the final result revealed a weak trend toward harm. Had the investigators adhered to their initial plan to stop early if they saw a sufficiently large effect and published this erroneous result, subsequent leukemia patients would have undergone additional toxic chemotherapy without benefit.
Figure 11.3-2. A Near Miss in a Trial of Chemotherapy for Leukemia
Abbreviations: CI, confidence interval; HR, hazard ratio; P, patients.
Reproduced from Wheatley and Clayton.7 Copyright © 2003, with permission from Elsevier.
Estimates From Truncated Randomized Clinical Trials Are Frequently Too Good to Be True
A systematic review and meta-analysis that compared the treatment effect from tRCTs with that from meta-analyses of RCTs addressing the same research question but not stopped early found the pooled ratio of relative risks (RRs) in tRCTs vs matching nontruncated RCTs was 0.71 (95% confidence interval [CI], 0.65-0.77).3 This implies that, for instance, if the RR from non-tRCTs was 0.8 (a 20% relative risk reduction [RRR]), the RR from the tRCTs would be, on average, approximately 0.57 (a 43% RRR—more than double the apparent benefit). Non-tRCTs with no evidence of benefit (ie, with an RR of 1.0) would, on average, be associated with a 29% RRR in tRCTs that addressed the same research question.
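The arithmetic behind these figures is a simple multiplication of the RR from the non-tRCTs by the pooled ratio of RRs. The short sketch below reproduces it; the function names are illustrative, and the calculation uses only the point estimate of 0.71, ignoring the uncertainty expressed by its confidence interval.

```python
RATIO_OF_RRS = 0.71  # pooled ratio of RRs reported in the text


def expected_truncated_rr(rr_non_truncated, ratio=RATIO_OF_RRS):
    """Average RR expected in tRCTs, given the RR in matching non-tRCTs."""
    return rr_non_truncated * ratio


def rrr_percent(rr):
    """Relative risk reduction, as a percentage, implied by an RR."""
    return (1 - rr) * 100


for rr in (0.8, 1.0):
    truncated = expected_truncated_rr(rr)
    print(f"non-tRCT RR {rr:.2f} (RRR {rrr_percent(rr):.0f}%) -> "
          f"tRCT RR {truncated:.2f} (RRR {rrr_percent(truncated):.0f}%)")
```

An RR of 0.8 maps to approximately 0.57 (a 43% RRR), and an RR of 1.0 maps to 0.71 (a 29% RRR), matching the values given in the text.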
This overestimation could not be explained by differences in methodologic quality (allocation concealment and blinding) or by the presence of a statistical stopping rule, but it was associated with the total number of outcome events.3 As we describe subsequently, the results of the meta-analysis provide guidance regarding numbers of events that provide protection against large overestimates.
Truncated Randomized Clinical Trials May Prevent a Comprehensive Assessment of Treatment Effect
In 32 of 143 tRCTs included in another systematic review, the decision to stop was based on a composite end point (an aggregate of end points of various importance).2 Use of a composite end point compounds the risk of misleading results: the least patient-important component of the composite (eg, angina in a composite of death, myocardial infarction, and angina) may drive the decision to stop early (see Chapter 12.4, Composite End Points). Consequently, few of the events that are most important to patients may accrue.
Even when investigators do not use composite end points, few events are likely to accrue in the end points not driving the decision to stop early for benefit. These end points may include patient-important beneficial events (eg, overall survival rather than progression-free survival8) or adverse events. Lack of adequate safety data as a result of stopping the trial early may in turn affect the perceived and actual risk-benefit ratios (ie, overestimating the benefit and underestimating the risk) of implementing the intervention in clinical practice.9