This JAMA Guide to Statistics and Methods describes the reasons for using cluster randomization in a clinical trial and how to analyze and interpret the results of a trial that used it.
Sometimes a new treatment is best introduced to an entire group of patients rather than to individual patients. Examples include when the new approach requires procedures be followed by multiple members of a health care team or when the new technique is applied to the environment of care (eg, a method for cleaning a hospital room before it is known which patient will be assigned the room). This avoids confusion that could occur if all caregivers had to keep track of which patients were being treated the old way and which were being treated the new way.
One approach to evaluate the efficacy of such treatments—treatments for which the application typically involves changes at the level of the health care practice, hospital unit, or even health care system—is to conduct a cluster randomized trial. In a cluster randomized trial, study participants are randomized in groups or clusters so that all members within a single group are assigned to either the experimental intervention or the control.1,2 This contrasts with the more familiar randomized clinical trial (RCT) in which randomization occurs at the level of the individual participant, and the treatment assigned to one study participant is essentially independent of the treatment assigned to any other. In a cluster randomized trial, the cluster is the unit randomized, whereas in a traditional RCT, the individual study participant is randomized. In both types of trials, however, the outcomes of interest are recorded for each participant individually.
Although there are both theoretical and pragmatic reasons for using cluster randomization in a clinical trial, doing so introduces a fundamental challenge to those analyzing and interpreting the results of the trial: study participants from the same cluster (eg, patients treated within the same medical practice or hospital unit) tend to be more similar to each other than participants from different clusters.2 This nearly universal fact violates a common assumption of most statistical tests, namely, that individual observations are independent of each other. To obtain valid results, a cluster randomized trial must be analyzed using statistical methods that account for the greater similarity between individual participants from the same cluster compared with those from different clusters.2-4
In a JAMA article, Curley et al5 reported the results of the RESTORE trial, a cluster randomized clinical trial evaluating a nurse-implemented, goal-directed sedation protocol for children with acute respiratory failure receiving mechanical ventilation in the intensive care setting, comparing this approach with usual care. The trial evaluated the primary hypothesis that the intervention group—patients treated in intensive care units (ICUs) using the goal-directed sedation protocol—would have a shorter duration of mechanical ventilation. Thirty-one pediatric ICUs, the “clusters,” were randomized to either implement the goal-directed sedation protocol or continue their usual care practices.
Cluster randomization should be used when it would be impractical or impossible to assign and correctly deliver the experimental and control treatments to individual study participants.1,2 Typical situations include the study of interventions that must be implemented by multiple team members, that affect workflow, or that alter the structure of care delivery. As in the RESTORE trial, interventions that involve training multidisciplinary health care teams are practically difficult to conduct using individual-level randomization, as health care practitioners cannot easily unlearn a new way of taking care of patients.
Cluster randomization is often used to reduce the mixing or contamination of treatments in the 2 groups of the trial, as might occur if patients in the control group start to be treated using some of the approaches included in the experimental treatment group, perhaps because the practitioners become habituated to the experimental approach or perceive it to be superior.1,2 For example, consider an injury prevention trial testing the effect of offering bicycle helmets to students in a classroom on the incidence of subsequent head injury. If a conventional RCT were conducted and half of the students in each classroom received helmets, it is likely that some of the other half of students would inform their parents about the ongoing intervention and many of these children might also begin to use bicycle helmets. Contamination is a form of crossover between treatment groups and will generally reduce the observed treatment effect using the usual intent-to-treat analysis.6 Cluster randomization may also be used to reduce potential selection bias. Physicians choosing individual patients from their practice to consent for randomization may tend to enroll patients with specific characteristics (eg, lesser or greater illness severity), reducing the external validity of the trial. Assignment of the treatment group at the practice level, with the application of the assigned treatment to all patients treated within the practice, may minimize this problem.
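The dilution of the treatment effect by contamination, described above, can be illustrated with simple arithmetic. The sketch below is hypothetical: the function name and the risks used are invented for illustration, and it assumes that a contaminated control participant receives the full benefit of the experimental treatment, which will not always hold.

```python
def observed_risk_difference(control_risk: float,
                             treated_risk: float,
                             contamination: float) -> float:
    """Intent-to-treat risk difference (treated minus control) when a
    fraction `contamination` of the control group adopts the intervention.

    Assumption: contaminated controls get the full treatment effect.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    # The control group becomes a mixture of untreated and treated participants.
    mixed_control_risk = ((1.0 - contamination) * control_risk
                          + contamination * treated_risk)
    return treated_risk - mixed_control_risk


if __name__ == "__main__":
    # Hypothetical injury risks: 10% without helmets, 6% with helmets.
    for c in (0.0, 0.25, 0.5):
        diff = observed_risk_difference(0.10, 0.06, c)
        print(f"contamination={c:.2f}  observed risk difference={diff:+.3f}")
```

Under this assumption the observed difference is the true difference multiplied by (1 − contamination), so any contamination shrinks the apparent effect toward zero.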
Using a cluster randomized design also can offer practical advantages. For example, if 2 or more treatments are considered to be within the standard of care, and depending on the risks associated with treatment, streamlined consent procedures or even integration of general and research consents may be used to reduce barriers to participation and ensure a truly representative patient population is enrolled in the trial.1,7
What Are Limitations of Cluster Randomization?
Any time data are clustered, the statistical analysis must use techniques that account for the likeness of cluster members.2,3 Extensions of the more familiar regression models that are appropriate for the analysis of clustered data include generalized estimating equations (as used in the RESTORE trial), mixed linear models, and hierarchical models. While the proper use of these approaches is complex, the informed reader should look for an explicit statement that the analysis method was selected to account for the similarity or correlation of data within each cluster. The intracluster correlation coefficient (ICC) quantifies the likeness within clusters and ranges from 0 to 1, although it is frequently in the 0.02 to 0.1 range.4 A value of 0 means that members of a cluster are no more like one another, with respect to the measured characteristic, than they are like the population at large, so each additional individual contributes the same amount of new information. In contrast, a value of 1 means that each member of the cluster is exactly the same as the others in the cluster, so any participants beyond the first contribute no additional information at all. A larger ICC, representing greater similarity of results within clusters, will decrease the effective sample size of the trial, reducing the precision of estimates of treatment effects and the power of the trial.2 If the ICC is high, the effective sample size will be closer to the number of clusters, and if the ICC is low, the effective sample size will be closer to the total number of individuals in the trial.
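The relationship between the ICC and the effective sample size can be sketched with the standard design effect, 1 + (m − 1) × ICC, where m is the number of participants per cluster. The code below is a minimal illustration that assumes equal cluster sizes; the function names and the trial dimensions (30 clusters of 80 participants) are hypothetical and are not taken from the RESTORE trial.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering, assuming equal cluster sizes."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int,
                          mean_cluster_size: float,
                          icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    total_participants = n_clusters * mean_cluster_size
    return total_participants / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 30 clusters with 80 participants each.
    for icc in (0.0, 0.02, 0.1, 1.0):
        ess = effective_sample_size(30, 80, icc)
        print(f"ICC={icc:<4}  effective sample size is about {ess:.0f}")
```

With an ICC of 0 the effective sample size equals all 2400 participants, and with an ICC of 1 it falls to 30, the number of clusters, matching the two extremes described above.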
It is often impossible to maintain blinding of treatment assignment in a cluster randomized trial, both because of the nature of treatments and because of the number of patients in a given location all receiving the same treatment. It is well known that trials evaluating nonblinded interventions have a greater risk of bias.
Why Did the Authors Use Cluster Randomization in This Particular Study?
The RESTORE trial investigators used cluster randomization because they were introducing a nurse-implemented, goal-directed sedation protocol that required a change in behavior among multiple caregivers within each ICU. A major component of the experimental intervention was educating the critical care personnel regarding the perceived benefits and risks of sedation agents and use patterns relative to others. Had individual-level randomization been used to allocate patients, it is highly likely that the patients randomized to standard care would have received care that was somewhere between the prior standard and the new protocol, because all ICU caregivers would have been informed about the scientific and pharmacological basis for the goal-directed sedation protocol.
How Should Cluster Randomization Findings Be Interpreted in This Particular Study?
As in any clinical trial, randomization may or may not work effectively to create similar groups of patients. In the RESTORE trial, some differences between the intervention groups were observed that might partially explain the negative primary outcome. Specifically, the intervention group had a greater proportion of younger children—a group that is more difficult to sedate.8 The RESTORE trial investigators used randomization in blocks to ensure balance of pediatric ICU sizes between groups; methods exist to balance groups in cluster trials on multiple factors simultaneously.9 Although the RESTORE trial yielded a negative primary outcome, the authors noted some promising secondary outcomes related to clinicians’ perception of patient comfort. However, these assessments were unblinded and thus may be subject to bias.
CAVEATS TO CONSIDER WHEN LOOKING AT A CLUSTER RANDOMIZED TRIAL
When evaluating a cluster randomized trial, the first consideration is whether the use of clustering was well justified. Would it have been possible to use individual-level randomization and maintain fidelity in treatment allocation and administration? What would be the likelihood of contamination? Because far fewer units are randomized, cluster randomization cannot minimize baseline differences between the 2 treatment groups as efficiently as individual-level randomization. The design must be justified for scientific or logistical reasons to accept this trade-off.10
Second, the usual sources of bias should be considered, such as patient knowledge of treatment assignment and unblinded assessments of outcome. Although not specific to cluster randomized trials, these sources of bias tend to be more problematic in them because blinding of treatment assignment is often impossible.
Third, it is important to consider whether the intracluster correlation was appropriately accounted for in the design, analysis, and interpretation of the trial.1,10 During the design, the likely ICC should be considered to ensure the planned sample size is adequate. The analysis should be based on statistical methods that account for clustering, such as generalized estimating equations.
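One common way to account for the likely ICC at the design stage is to inflate the sample size required under individual-level randomization by the design effect and then convert it to a number of clusters. The sketch below assumes equal cluster sizes and an ICC estimated in advance; the function names and all input values are hypothetical.

```python
import math


def inflated_sample_size(n_individual: int,
                         mean_cluster_size: float,
                         icc: float) -> float:
    """Participants needed per group after allowing for clustering.

    n_individual is the per-group size an individually randomized
    trial would need for the same power.
    """
    design_effect = 1.0 + (mean_cluster_size - 1.0) * icc
    return n_individual * design_effect


def clusters_per_group(n_individual: int,
                       mean_cluster_size: float,
                       icc: float) -> int:
    """Whole clusters needed per group, rounding up."""
    n_needed = inflated_sample_size(n_individual, mean_cluster_size, icc)
    return math.ceil(n_needed / mean_cluster_size)


if __name__ == "__main__":
    # Hypothetical inputs: 175 per group if individually randomized,
    # 40 participants per cluster, anticipated ICC of 0.03.
    print(clusters_per_group(175, 40, 0.03))  # 10 clusters per group
    print(clusters_per_group(175, 40, 0.0))   # 5 clusters if no clustering effect
```

In this example a modest ICC of 0.03 doubles the number of clusters required, which is why an ICC that is underestimated during planning can leave a trial underpowered.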
Finally, the interpretation should consider the extent to which the 2 treatment groups contained an adequate number, size, and similarity of clusters and whether any clusters were lost to follow-up.
The following disclosures were reported at the time this original article was first published in JAMA.
Conflict of Interest Disclosures: None reported.
statement: extension to cluster randomised trials. BMJ. 2004;328(7441):702–708. Medline:15031246
RL. Advanced statistics: statistical methods for analyzing cluster and cluster-randomized data. Acad Emerg Med. 2002;9(4):330–341. Medline:11927463
AP. Conditional independence in statistical theory. J R Stat Soc Series B
et al. Targeted versus universal decolonization to prevent ICU infection. N Engl J Med. 2013;368(24):2255–2265. Medline:23718152
et al. Tolerance and withdrawal from prolonged opioid use in critically ill children. Pediatrics. 2010;125(5):e1208–e1225. Medline:20403936
et al. A multilevel intervention to increase community hospital use of alteplase for acute stroke (INSTINCT). Lancet Neurol. 2013;12(2):139–148. Medline:23260188
et al. Impact of CONSORT extension for cluster randomised trials on quality of reporting and study methodology. BMJ. 2011;343:d5886. Medline:21948873