This JAMA Guide to Statistics and Methods explains the gatekeeping approach to evaluating the statistical significance of secondary outcomes, in which testing stops once the primary outcome or a higher-order secondary outcome is found to be nonsignificant.
Clinical trials characterizing the effects of an experimental therapy rarely have only a single outcome of interest. For example, the CLEAN-TAVI investigators evaluated the benefits of a cerebral embolic protection device for stroke prevention during transcatheter aortic valve implantation.1 The primary end point was the reduction in the number of ischemic lesions observed 2 days after the procedure. The investigators were also interested in 16 secondary end points involving measurement of the number, volume, and timing of cerebral lesions in various brain regions. Statistically comparing a large number of outcomes using the usual significance threshold of .05 is likely to be misleading because there is a high risk of falsely concluding that a significant effect is present when none exists.2 If 17 comparisons are made when there is no true treatment effect, each comparison has a 5% chance of falsely concluding that an observed difference exists, leading to a 58% chance of falsely concluding at least 1 difference exists. The formula 1 − (1 − α)^N can be used to calculate the chance of obtaining at least 1 falsely significant result, when there is no true underlying difference between the groups (in this case α is .05 and N is 17 for the number of tests).
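The calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original article: the function name familywise_error_rate is illustrative, and the formula treats the comparisons as independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least 1 false-positive result across n_tests comparisons.

    Assumes every comparison is run at significance level alpha, that there
    is no true difference between the groups, and that the comparisons are
    independent of one another.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Values from the text: alpha = .05 and N = 17 (1 primary + 16 secondary end points).
fwer = familywise_error_rate(0.05, 17)
print(f"{fwer:.3f}")  # 0.582, ie, about 58%
```

With a single comparison the function returns .05, the nominal threshold; the inflation comes entirely from repeating the test.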
To limit the risk of a false-positive result while still comparing the multiple clinically relevant end points used in the CLEAN-TAVI study, the investigators used a serial gatekeeping approach for statistical testing. This method tests the outcomes in a prespecified order: the next outcome is tested only if the current one is statistically significant, and once an outcome is nonsignificant, no further outcomes are tested. This limits the chance of falsely concluding a difference exists when it does not.
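The testing sequence can be sketched in Python as follows. This is an illustration under stated assumptions, not the CLEAN-TAVI analysis code: the function name serial_gatekeeping is hypothetical, the P values are invented for the example, each test is assumed to use the same threshold of .05, and the outcomes are assumed to be listed in their prespecified order.

```python
from typing import List, Sequence


def serial_gatekeeping(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each outcome in order, whether it is declared significant.

    Outcomes are tested one at a time in the order given. As soon as one
    outcome fails to reach significance, the gate closes and every later
    outcome is reported as not significant without being tested.
    """
    results = []
    gate_open = True
    for p in p_values:
        significant = gate_open and p < alpha
        results.append(significant)
        if not significant:
            gate_open = False  # no further outcomes are tested
    return results


# Hypothetical P values (illustrative only), ordered primary end point first.
example = serial_gatekeeping([0.01, 0.03, 0.20, 0.001])
print(example)  # [True, True, False, False]
```

In this example the fourth outcome has a small P value but is not declared significant, because the third outcome closed the gate before it could be tested.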
Why Is Serial Gatekeeping Used?
Many methods exist for conducting multiple comparisons while keeping the overall trial-level risk of a false-positive error at an acceptable level. The Bonferroni approach3 requires a more stringent criterion for statistical significance (a smaller P value) for each statistical test, but each is interpreted independently of the other comparisons. This approach is often considered to be too conservative, reducing the ability of the trial to detect true benefits when they exist.4 Other methods leverage additional knowledge about the trial design to allow only the comparisons of interest. In the Dunnett method for comparing multiple experimental drug doses against a single control, the number of comparisons is reduced by never comparing experimental drug doses against each other.5 Multiple comparison procedures, including the Hochberg procedure, have been discussed in a prior JAMA Guide to Statistics and Methods.2
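For comparison, the Bonferroni criterion can be sketched in Python. This is a minimal illustration, assuming the standard form of the correction in which the overall significance level is divided equally among the tests; the function names and the P values are hypothetical and do not come from the article.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: overall alpha divided by the number of tests."""
    return alpha / n_tests


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Judge each P value against the same stricter threshold, independently of the others."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# With 17 comparisons at an overall alpha of .05, each test must meet a much smaller threshold.
print(f"{bonferroni_threshold(0.05, 17):.4f}")  # 0.0029

# Hypothetical P values (illustrative only); the per-test threshold here is .05 / 4 = .0125.
print(bonferroni_significant([0.01, 0.03, 0.20, 0.001]))  # [True, False, False, True]
```

Unlike a serial approach, the order of the outcomes does not matter here, but every outcome must clear the stricter threshold, which is why the method is often considered conservative.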
Description of the Method
A serial gatekeeping procedure controls the ...