Were Patients Randomized? If Not, Did the Investigators Use an Alternative Design That Minimizes the Risk of Bias?
The observational design of most QI studies may reflect events outside the researchers' control (eg, change caused by a new policy) or the impracticability of randomization (eg, unwillingness to participate in a control group).13 Such observational designs make it difficult to determine whether the QI intervention is responsible for observed changes, and the resulting evidence regarding the effects of the intervention on the outcome therefore warrants only low confidence.22
Nonrandomized designs commonly used in QI studies include before-after studies (with and without concurrent controls), time series designs (interrupted or not), and stepped wedge designs.23 Changes over time in patient populations or changes in practice unrelated to the QI intervention introduce risk of bias in uncontrolled before-after studies,23 which often overestimate the magnitude of benefit.23 For instance, investigators attributed an improvement in surgical outcomes in patients undergoing coronary artery bypass graft surgery in New York State to publicly reporting hospital and surgeon outcomes.24 However, a subsequent investigation found that similar improvements occurred across the United States without any such intervention.25
Controlled before-after designs are infrequently used because of difficulty in identifying a suitable control group. However, even participants who appear well matched (eg, have similar demographic characteristics) at baseline may differ on important unmeasured factors (eg, adherence to the study intervention).
Another infrequently used but potentially even more powerful design involves collecting data at multiple time points before and after the intervention, a design referred to as an interrupted time series. Introducing the intervention at different times in different settings, effectively conducting a series of before-after studies in the context of a single intervention, strengthens this design further. Interrupted time series designs may increase confidence in the causal link between the intervention and outcome, or they may overturn a spurious prior conclusion about a causal link.
For example, a study of patients undergoing thoracic surgery evaluated the effect of a clinical pathway for postoperative management. The authors compared outcomes during the baseline vs postintervention periods and reported significant improvements.26 However, reanalysis revealed a statistically significant preintervention trend, and time series regression techniques revealed no significant differences after the intervention.27
The interrupted time series design does not, however, protect against the effects of other important events that may coincide with the study intervention. With this design, the study periods must be explicitly defined, and statistical techniques may be required to account for autocorrelation (ie, data collected at adjacent time points are likely to be more similar to one another than data collected at widely spaced intervals) to avoid overestimating treatment effects.23 Statistical process control is another common method for analyzing variation in the performance of a process over time. Such variation may include improved performance in response to a QI intervention that, over time, stabilizes at a new, improved level.28
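To illustrate, consider a minimal sketch of segmented time series regression in Python using simulated monthly data; the variable names, the simulated downward trend, and the choice of Newey-West standard errors to handle autocorrelation are illustrative assumptions, not the methods of any study cited here.

```python
import numpy as np
import statsmodels.api as sm

# Simulated monthly outcome rate: a preintervention improvement trend
# is already under way, as in the thoracic surgery reanalysis example.
rng = np.random.default_rng(0)
months = np.arange(24)                       # 12 months before, 12 after
post = (months >= 12).astype(int)            # indicator: intervention period
time_since = np.where(post, months - 12, 0)  # time elapsed after intervention
rate = 20 - 0.4 * months + rng.normal(0, 1, 24)  # secular downward trend only

# Segmented regression: baseline trend, level change, and slope change.
X = sm.add_constant(np.column_stack([months, post, time_since]))
model = sm.OLS(rate, X)

# Newey-West (HAC) standard errors account for autocorrelation between
# adjacent time points, which would otherwise overstate precision.
fit = model.fit(cov_type="HAC", cov_kwds={"maxlags": 3})
print(fit.summary(xname=["intercept", "baseline_trend",
                         "level_change", "slope_change"]))
```

Because the simulated series contains only a secular trend, the level-change and slope-change coefficients should be near zero, mirroring the reanalysis described above, in which accounting for the preintervention trend eliminated the apparent intervention effect.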
Studies using a stepped wedge design introduce the QI intervention to participants sequentially so that, by the end of the study, all participants are exposed to the intervention.29 The order in which the intervention is introduced may be randomized, further reducing risk of bias.
For example, following a national UK recommendation to implement critical care outreach teams (CCOTs), one hospital undertook a stepped wedge trial that evaluated the effects on hospital mortality and length of stay.30 The CCOTs were phased in over 32 weeks across 8 pairs of wards matched on important patient characteristics, with an additional ward pair beginning the intervention at 4-week intervals in a randomly determined order. Within each pair, one ward was randomized to earlier CCOT introduction, with usual care continuing in the other ward until its subsequent CCOT introduction, allowing a matched comparison across the 8 pairs. This study found that the CCOT intervention reduced in-hospital mortality vs usual care (odds ratio, 0.52; 95% confidence interval [CI], 0.32-0.85).
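The logic of a randomized stepped wedge schedule can be sketched in a few lines of Python; the pair labels, 4-week step length, and random seed below are hypothetical and only loosely modeled on the CCOT trial.

```python
import random

# Hypothetical stepped wedge schedule: 8 matched ward pairs cross from
# usual care to the intervention in a randomly determined order, with
# one additional pair starting at each 4-week step.
random.seed(42)
ward_pairs = [f"pair_{i}" for i in range(1, 9)]
random.shuffle(ward_pairs)  # randomize the order of introduction

for step, pair in enumerate(ward_pairs, start=1):
    start_week = 4 * (step - 1)
    print(f"Step {step}: {pair} starts the intervention at week {start_week}; "
          f"earlier pairs remain exposed, later pairs stay on usual care")
```

By the final step every cluster is exposed, so each cluster contributes both control and intervention observations, which is what distinguishes the stepped wedge from a parallel cluster design.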
Some nonrandomized studies—if designed, conducted, and analyzed appropriately—may provide robust results.16,31,32 Statistical methods (eg, regression analysis) to account for confounding variables (ie, prognostic factors that bias results because they are associated with both the QI intervention and the outcome) may strengthen observational studies.32 When RCTs are used in QI, they are often pragmatic designs that evaluate whether the QI intervention is effective among broadly defined patient groups receiving care in real-world settings.23,33
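As a hedged illustration of regression adjustment for confounding, the following Python sketch simulates an observational comparison in which sicker patients are more likely to receive a QI intervention; all variable names and effect sizes are invented for the example.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sketch: a confounder (baseline severity) is associated
# with both exposure to the QI intervention and the outcome.
rng = np.random.default_rng(1)
n = 2000
severity = rng.normal(0, 1, n)                          # confounder
exposed = rng.binomial(1, 1 / (1 + np.exp(-severity)))  # sicker patients more often exposed
logit = -1.0 + 0.8 * severity + 0.0 * exposed           # true intervention effect is null
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Unadjusted model: exposure appears harmful because of confounding.
crude = sm.Logit(outcome, sm.add_constant(exposed)).fit(disp=0)
# Adjusted model: including the confounder removes the spurious effect.
adjusted = sm.Logit(
    outcome, sm.add_constant(np.column_stack([exposed, severity]))
).fit(disp=0)

print("crude odds ratio:   ", np.exp(crude.params[1]))
print("adjusted odds ratio:", np.exp(adjusted.params[1]))
```

Adjustment can only address confounders that were measured; as noted above for controlled before-after designs, unmeasured differences between groups remain a threat to validity.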
For example, a study targeting adult primary care patients with type 2 diabetes randomized 511 patients cared for by 46 clinicians to receive usual care vs shared (patient-clinician) access to a Web-based electronic diabetes tracker that monitored 13 indicators (eg, blood pressure, glycated hemoglobin [HbA1c] level) and provided clinical advice to improve diabetes care.34 This pragmatic RCT was conducted among community-based clinicians to evaluate effectiveness in the setting in which most patients with diabetes receive care. The intervention group had significantly more checks of diabetes indicators than the usual care group at 6 months (difference, 1.27 more checks; 95% CI, 0.79-1.75) and experienced significant improvements in blood pressure and HbA1c levels. However, HbA1c level may be a poor surrogate for patient-important outcomes (randomized trials of intensive therapy to achieve low HbA1c targets have failed to reveal reductions in stroke or cardiovascular death35); improved blood pressure is a more reliable surrogate, although it too may fail to predict reductions in some patient-important outcomes.36
If the Intervention Primarily Targeted Clinicians, Was the Clinician or Clinician Group the Unit of Analysis?
Clinicians working in the same practice, ward, or hospital share a common environment that influences practice and outcomes. Quality improvement investigators must consider this in their analysis. For instance, if investigators randomized hospitals to receive an intervention to improve clinician practice, a spuriously significant result may occur if data on individual clinicians' practice are analyzed without accounting for the clustering of clinicians within hospitals (ie, physicians working in a particular hospital may practice more similarly to one another than to physicians in other hospitals).19 Failure to appropriately address this unit of analysis, or clustering, issue is common in QI studies.37
For example, in an RCT that evaluated the effect of clinical reports encouraging use of peritoneal dialysis among patients with end-stage renal disease, 10 physicians who cared for 152 patients were randomized to the intervention or control group.38 The authors reported that significantly more patients started peritoneal dialysis in the group of physicians randomized to the intervention (P = .04). However, had the correct unit of analysis been used (ie, the 10 physicians rather than the 152 patients), or had statistical methods been used to account for clustering of patient outcomes by physician, this result would have been unlikely to reach statistical significance.37
The typical solution is to randomize clinicians in groups to intervention and control, a design referred to as a cluster randomized trial, with analyses that account for the clustering.
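The consequence of ignoring clustering can be demonstrated with simulated data; in this Python sketch, the physician-level random effect, the sample sizes, and the choice of cluster-robust standard errors are illustrative assumptions rather than a reconstruction of the dialysis study.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sketch: patients clustered within physicians. Ignoring
# clustering understates standard errors and can manufacture
# statistical significance.
rng = np.random.default_rng(2)
n_docs, patients_per_doc = 10, 15
doc_id = np.repeat(np.arange(n_docs), patients_per_doc)
doc_effect = rng.normal(0, 1.0, n_docs)[doc_id]  # shared practice style
treated = np.repeat(rng.binomial(1, 0.5, n_docs), patients_per_doc)
outcome = 0.3 * treated + doc_effect + rng.normal(0, 1, n_docs * patients_per_doc)

X = sm.add_constant(treated)
naive = sm.OLS(outcome, X).fit()  # wrongly treats patients as independent
robust = sm.OLS(outcome, X).fit(cov_type="cluster",
                                cov_kwds={"groups": doc_id})

print(f"naive p-value:     {naive.pvalues[1]:.3f}")
print(f"clustered p-value: {robust.pvalues[1]:.3f}")
```

Mixed-effects models and generalized estimating equations are common alternatives to cluster-robust standard errors; each acknowledges that patients treated by the same clinician are not independent observations.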
Was Data Quality Acceptable?
Although the importance of methods to control data quality is well accepted in clinical research, the same is not true of many QI studies, in which data are often collected as part of routine care, without additional resources or training in research methods.32,39 Deficiencies in data quality can result in high risk of bias, and as a user of QI studies, you need to consider data quality in all study phases (Box 11.7-2).
BOX 11.7-2
Data Quality Control Methods for Quality Improvement (QI)
Were the aims of the QI project clearly stated?
Were appropriate definitions and measurement systems reported for all important data?
Were staff trained, with appropriate quality assurance review, regarding data collection?
Was there appropriate review and reporting of missing and outlier/erroneous data?
Was participant flow (eg, patients, clinicians, and hospitals) through the study explicitly reported (ie, the number initially approached, the number who participated, and the number who dropped out)?
For example, in a prospective multicenter study (7688 patients) that evaluated the effect of implementing a surgical safety checklist on patient complications,40 data collectors at 8 international sites (including resource-poor settings) received training and supervision from local researchers on the identification, classification, and recording of process-of-care measures and complications according to the American College of Surgeons National Surgical Quality Improvement Program (NSQIP). However, this training occurred only at the beginning of the QI study, whereas standard NSQIP training occurs over a 1-year period, suggesting a potential limitation in the training of data collectors. Furthermore, many of the complications evaluated (eg, deep venous thrombosis) require specific diagnostic tests for accurate detection, but the proportion of patients systematically evaluated for complications as part of routine care was not reported. Thus, data quality issues may have influenced the reported association between the surgical safety checklist and subsequent complications.
Given the resource constraints faced in conducting most QI research, missing data are common. Missing data should be explicitly reported because they can bias study results. If the magnitude of missing data and the potential for bias are both low in relation to the number of outcome events,19 it may be appropriate to report the degree of missing data without explicitly addressing these data in the analysis.32 In other situations, investigators may conduct sensitivity analyses to determine the potential effects of loss to follow-up. Results that do not change substantially with sensitivity analyses provide greater confidence.32
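A simple best-case/worst-case bound is one form such a sensitivity analysis can take; the counts in this Python sketch are hypothetical.

```python
# Hypothetical sketch of a sensitivity analysis for missing outcomes:
# bound the observed success rate under best- and worst-case
# assumptions about participants lost to follow-up.
observed_successes = 180
observed_total = 240
missing = 25  # participants with no recorded outcome

observed_rate = observed_successes / observed_total
best_case = (observed_successes + missing) / (observed_total + missing)
worst_case = observed_successes / (observed_total + missing)

print(f"observed success rate: {observed_rate:.1%}")
print(f"range under extreme assumptions: {worst_case:.1%} to {best_case:.1%}")
# If conclusions hold across this range, missing data are unlikely
# to have materially biased the result.
```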
USING THE GUIDE
The use of an uncontrolled before-after design suggests a high risk of bias; an interrupted time series or a stepped wedge design (with randomization) would have had less potential for bias. Quality assurance procedures for data collection were, however, explicitly reported, with very few missing data. Moreover, the authors used regression methods to account for known imbalances between periods.1