If Not, Did the Investigators Demonstrate Similarity in All Known Determinants of Prognosis or Adjust for Differences in the Analysis?
Observational study designs, often used to evaluate a CDSS, carry a substantial risk of bias. One such design, the before-after design, compares outcomes before a technology is implemented with those observed afterward. The validity of this approach is threatened by the possibility that changes over time (called secular trends or temporal trends) in patient mix or in other aspects of health care provision are responsible for changes that investigators may attribute to the CDSS.
Consider the CDSS for management of antibiotic therapy implemented in the late 1980s in the United States that was associated with apparent improvements in the cost-effectiveness of antibiotic ordering over the subsequent 5 years.5 Although this before-after study might appear compelling, changes in the health care system, including the advent of managed care, were occurring during the study period. To control for these secular trends, the investigators compared antibiotic-prescribing practices with those of other US acute care hospitals for the duration of the study. These other hospitals differed in many ways aside from the CDSS, limiting the validity of the comparison. Nevertheless, the addition of a concurrent control group strengthened the study design.
Investigators also may strengthen the before-after design by turning the intervention on and off multiple times, a type of interrupted time series design. For example, investigators used this approach to evaluate whether a CDSS that provided recommendations for venous thromboembolism prevention for surgical patients improved thromboprophylaxis use.6 There were three 10-week intervention periods that alternated with four 10-week control periods, with a 4-week washout between each period. During each intervention period, adherence to practice guidelines improved significantly and then reverted to baseline during each control period.
Although alternating intervention and control periods strengthen a before-after design, random allocation of participants to a concurrent control group remains the strongest study design for evaluating therapeutic or preventive interventions. As part of randomization, the allocation sequence should be concealed from those enrolling participants in the study. Fortunately, randomization has been recognized as an important way to evaluate CDSSs.7
If the Intervention Primarily Targeted Clinicians, Was the Clinician or Clinician Group (Cluster) the Unit of Analysis?
The unit of analysis is a special issue in CDSS evaluation. In most RCTs, the unit of allocation is the patient; most CDSS evaluations, however, target clinician behavior. Hence, investigators may randomize individual clinicians or clinician clusters, such as health care teams, hospital wards, or outpatient practices.8 Unfortunately, investigators using such designs often analyze their data as if they had randomized patients.9,10 This mistake, the unit of analysis error, occurs frequently and can generate spuriously low (ie, significant) P values.11 A unit of analysis error should be suspected if a study of a CDSS does not describe the number and characteristics of clinicians (eg, level of clinical experience or specialization, sex, and duration of EMR use) in each arm of a trial.9-11
To deepen your understanding of the problem, consider a hypothetical example. Imagine a study in which an investigator randomizes 2 teams of clinicians to a CDSS and another 2 teams to standard practice. During the study, each team sees 5000 patients. If the investigator analyzes the data as if patients were individually randomized, the sample size appears very large. However, if there are underlying differences between the teams in patient characteristics or in how patients are managed, such differences—rather than the intervention—might well explain differences in the outcomes. Were this the case, we would need to randomize many teams before patient characteristics or management styles would balance out between groups. At one extreme, each of the 4 teams is very different; under these circumstances, it is as if we are randomizing only 4 individuals, and the sample size is effectively 4. At the other extreme, the teams are identical in all characteristics other than the intervention, in which case the situation is as if we randomized 20 000 individuals, 10 000 to each group.
A statistic called the intraclass correlation coefficient tells us about the correlation of observations (in this case, observations of patients) within clusters. For instance, if one team had a very high proportion of older patients with strokes and uniformly poor outcomes and another team had a high proportion of younger patients with pneumonia and uniformly good outcomes, the intraclass correlation would be high (close to 1.0), and we would be reluctant to attribute differences in outcome to the intervention. On the other hand, if both teams had a similarly broad range of patients with widely varying outcomes, the intraclass correlation would be low (close to 0), and we would be more comfortable attributing differences to the intervention. Thus, if the intraclass correlation is high, the inferences we could make would differ little from those that would be possible if we had randomized only 4 individuals (2 per group), which raises questions about our ability to ensure prognostic similarity of the 2 groups at baseline. If the intraclass correlation is low, the likelihood of prognostic balance at baseline is much greater, and the inferences we can make are similar to those that would be possible had we randomized 10 000 patients to each group.
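The design effect offers a simple way to connect the intraclass correlation to these two extremes. The short sketch below is a minimal illustration, assuming equal cluster sizes and reusing the hypothetical 4 teams of 5000 patients from the example above: it divides the nominal sample size by the design effect, 1 + (m − 1) × ICC, where m is the cluster size.

```python
# A minimal sketch (not from the chapter): effective sample size under
# cluster randomization, assuming equal cluster sizes.

def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Nominal sample size divided by the design effect, 1 + (m - 1) * ICC."""
    total_n = n_clusters * cluster_size
    design_effect = 1 + (cluster_size - 1) * icc
    return total_n / design_effect

# The hypothetical example above: 4 teams (clusters), 5000 patients each.
for icc in (0.0, 0.01, 0.05, 1.0):
    print(f"ICC = {icc:.2f} -> effective N = {effective_sample_size(4, 5000, icc):.1f}")

# ICC = 0.00 -> effective N = 20000.0  (as if all patients were independent)
# ICC = 0.01 -> effective N = 392.2    (even a small ICC shrinks the sample sharply)
# ICC = 0.05 -> effective N = 79.7
# ICC = 1.00 -> effective N = 4.0      (as if only the 4 teams were randomized)
```

Note that with clusters this large, even a very small intraclass correlation leaves an effective sample size far below the nominal 20 000, which is why analyzing clustered data as if patients were individually randomized produces spuriously low P values.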
Obtaining a sufficient sample size and a balance of important prognostic factors between groups can therefore be difficult when randomizing physicians and health care teams. If only a few health care teams are available, investigators can pair them according to their similarities on numerous factors and then randomly allocate the intervention within each matched pair, as sketched below.12-15 A systematic review of 88 RCTs evaluating the effect of CDSSs found that 43 of 88 were cluster randomized trials and that 53 of 88 failed either to use the cluster as the unit of analysis or to adjust for clustering in the analysis (cluster analysis).7
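To illustrate the matched-pair approach, the hypothetical sketch below pairs 4 teams on a single illustrative covariate and then randomizes within each pair. The team labels, the covariate, and the pairing rule are invented for illustration; a real trial would match on numerous prognostic factors at once.

```python
import random

# Matched-pair cluster randomization: a minimal, hypothetical sketch.
teams = {"A": 61.2, "B": 74.8, "C": 59.9, "D": 73.5}  # team -> mean patient age

# Sort teams by the covariate so that adjacent teams form the most similar pairs.
ordered = sorted(teams, key=teams.get)
pairs = [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]

random.seed(42)  # fixed seed so the illustration is reproducible
allocation = {}
for first, second in pairs:
    cdss_team, control_team = random.sample((first, second), 2)  # coin flip within the pair
    allocation[cdss_team] = "CDSS"
    allocation[control_team] = "usual care"

print(allocation)  # eg, {'C': 'CDSS', 'A': 'usual care', 'B': 'CDSS', 'D': 'usual care'}
```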
Were Participants Analyzed in the Groups to Which They Were Randomized?
Clinicians should pay particular attention to one issue regarding randomization. Computer competency varies, and it is common for some clinicians not to use a CDSS or to have technical difficulties accessing it, even when they are assigned to do so and have help available. Consider the following: if some clinicians assigned to a CDSS fail or refuse to use the intervention, should these clinicians be included in the analysis? The answer, counterintuitive to some, is yes (see Chapter 11.4, The Principle of Intention to Treat and Ambiguous Dropouts).
Randomization can best accomplish the goal of balancing groups with respect to both known and unknown determinants of outcome if patients (or clinicians) are analyzed according to the groups to which they are randomized. This is the intention-to-treat principle. Deleting or moving patients after randomization compromises or destroys the balance that randomization is designed to achieve (see Chapter 11.4, The Principle of Intention to Treat and Ambiguous Dropouts).
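To make the distinction concrete, the fragment below contrasts an intention-to-treat grouping with an "as-treated" grouping for a hypothetical CDSS trial; the data and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical records: each clinician's randomized arm, whether the CDSS
# was actually used, and a patient outcome. All values are invented.
df = pd.DataFrame({
    "assigned_arm": ["CDSS", "CDSS", "CDSS", "control", "control", "control"],
    "used_cdss":    [True, False, True, False, False, True],
    "outcome":      [1, 0, 1, 0, 1, 0],
})

# Intention to treat: analyze by randomized assignment, preserving the
# prognostic balance that randomization created.
print(df.groupby("assigned_arm")["outcome"].mean())

# "As treated": regrouping by actual CDSS use discards that balance, because
# clinicians who avoid (or seek out) the CDSS may differ systematically.
print(df.groupby("used_cdss")["outcome"].mean())
```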
Was the Control Group Unaffected by the Clinical Decision Support System?
The extent to which clinicians or patients in the control group have access to all or part of the CDSS intervention creates a problem of potential contamination. When the control group is influenced by the intervention, the observed effect of the CDSS is diluted; contamination may decrease or even eliminate a true intervention effect.
For example, investigators of a clinical trial randomly allocated patients to have changes in their level of mechanical ventilator support directed by a computer protocol or according to clinical judgment.16 Because the same physicians and respiratory therapists using the computer protocol were also managing the care of patients not assigned to the protocol, experience with the protocol may have influenced clinicians' management of the control group, thus reducing the effect of the intervention that investigators might have observed had different groups of clinicians been managing each group of patients.
Cluster randomized trials (ie, trials randomizing groups of physicians) lessen the chance that the intervention will contaminate the control group, as long as the clusters do not interact. Ensuring lack of interaction may be challenging. For example, trials that involve medical trainees are difficult to manage even with entire hospitals as clusters because trainees in many systems rotate through several hospitals.
Recent randomized trials have attempted to use CDSS interventions that encourage shared decision making by ensuring that decision support is available to both clinicians and patients. One such trial, included in the systematic review1 described in our opening clinical scenario, randomized patients rather than clinicians or groups of clinicians.17 The rationale for doing so was that the shared intervention was meant to encourage patients with diabetes to monitor their own progress and receive personalized advice on 13 risk factors between visits to their family physician.17 In this case, contamination still may have occurred and would have reduced the differences between the groups.
Imaginative study designs may help deal with contamination. For instance, in one cluster randomized trial, one group of physicians received computerized guidelines for the management of asthma while another group received guidelines for the management of angina.18 Each group received an intervention, but for a different disease, so each may have been less likely to pay attention to the management area for which it served as a control.
Aside From the Experimental Intervention, Were the Groups Treated Equally?
All CDSS interventions are complex interventions.4 A CDSS may have a positive influence for unintended reasons. For example, some may be based on the use of structured data collection forms (checklist effect) or performance evaluations (audit and feedback effect).19,20 Moreover, a CDSS has multiple components that investigators should describe; ad hoc, unique, locally developed systems are particularly difficult to evaluate without a description of the intervention components. Some have suggested that reports of CDSS evaluations should include figures that show CDSS screenshots, descriptions of CDSS features and functions, and CDSS algorithms and source code.21 Such reporting may be useful for reproducibility and generalizability and for addressing the possibility of cointervention (ie, interventions associated with but separate from the CDSS). For example, consider a hypothetical report of a venous thrombosis CDSS that does not inform the reader that positive ultrasonography reports always triggered a telephone consultation with a thrombosis specialist. This cointervention information would help readers distinguish the effect of the CDSS itself from that of other associated aspects of the intervention.
The results of studies that evaluate interventions aimed at therapy or prevention are more believable if patients, their caregivers, and the study personnel are blinded to the treatment (see Chapter 7, Therapy [Randomized Trials]). Blinding also diminishes the placebo effect, which in the case of CDSSs may include the tendency of clinicians and patients to ascribe undeserved positive or negative attributes to the use of a computer workstation. Although blinding the clinicians and patients may not be possible, study personnel collecting outcome information usually can—and those analyzing the results always can—be blinded to group allocation. Blinding of the outcome assessment is important to prevent subjective interpretations of the collected data that may unduly favor one group over another.20 Lack of blinding can also result in bias if interventions other than the one under scrutiny are differentially applied to the treatment and control groups, particularly if clinicians are permitted to use, at their discretion, effective treatments not included in the study. Investigators can ameliorate concerns regarding lack of blinding by reporting details of the intervention and cointerventions.
Cluster randomized trials that involve unblinded clinicians and patients risk differential loss to follow-up. Once clinicians and patients learn that they are part of the control group, even one offered delayed access to the intervention, a loss of interest and subsequent unwillingness to participate may occur and bias the results of the study.22
Were Outcomes Assessed Uniformly in the Experimental and Control Groups?
In some studies, the computer system may be used as a data collection tool to evaluate the outcome in the CDSS group. Using the information system to log episodes in the treatment group and using a manual system in the non-CDSS group can create a data completeness bias.19 If the computer logs more episodes than the manual system, it may appear that the CDSS group had more events, which could bias the outcome for or against the CDSS group. To prevent this bias, investigators should collect and measure outcomes similarly in both groups.
USING THE GUIDE
As outlined above, the systematic review and meta-analysis addressing whether a CDSS can improve outcomes in patients with diabetes was judged to be credible1 (see Chapter 22, The Process of a Systematic Review and Meta-analysis, and Chapter 23, Understanding and Applying the Results of a Systematic Review and Meta-analysis). It included 15 trials that involved 35 557 patients with all types and severity levels of diabetes. Most of the trials compared a CDSS with usual care, 10 used cluster randomization, and most concealed allocation, but blinding and completeness of follow-up varied. Four of the trials were conducted before the year 2000, so their results may not be generalizable to current information-technology standards. In terms of clinical outcomes, 2 studies examined hospitalization rates and 3 measured quality of life. Process outcomes (checking hemoglobin A1C [HbA1C], blood pressure, or cholesterol) could not be pooled because of heterogeneity. The systematic review authors note an overall lack of high-quality trials in this area: only 1 trial was judged to be at low risk of bias.1
One of the most recent trials in the systematic review was a cluster randomized trial that compared 4 groups.23 One of the intervention groups combined patient coaching (mobile CDSS software, a Web portal, and telephone access to diabetes educators) with primary care clinician decision support that linked the patient-provided data to guidelines.23 The primary outcome was change in HbA1C at 12 months compared with the usual care group. The report provided no documentation about blinding of final outcome collection or assessment. The investigators used mixed-effects modeling to account for within-practice clustering in the analysis (illustrated in the sketch below), and they discuss additional sensitivity analyses examining the effect of missing data. The intervention package is described in some detail in the text, but screenshots are available only in supplementary files, and algorithms and source code are not provided. In addition, although deploying complex interventions that involve information and communication technology often poses significant unforeseen challenges, the authors do not discuss this important issue.
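As a rough illustration of the cluster-adjusted analysis mentioned above, the sketch below fits a linear mixed-effects model with a random intercept for each practice, one common way to account for within-practice clustering. The data file, column names, and model form are hypothetical and are not taken from the trial report.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical one-row-per-patient data with the practice (cluster) each
# patient belongs to; the file and column names are invented, not the trial's.
df = pd.read_csv("trial_data.csv")  # columns: hba1c_change, arm, practice

# A random intercept for each practice accounts for within-practice clustering,
# so the standard error of the treatment effect is not spuriously small.
model = smf.mixedlm("hba1c_change ~ arm", data=df, groups=df["practice"])
result = model.fit()
print(result.summary())  # the arm coefficient estimates the intervention effect
```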
Of 71 physician practices identified, 26 enrolled. Within these 26 practices, 2602 patients were eligible, of whom only 213 enrolled and 163 were included in the analysis, with 62 patients in the intervention group and 56 in the usual care group discussed above. The investigators mention imputation techniques used to deal with the considerable amount of missing data but do not discuss what influence these techniques had on the results. Similarly, despite reminders to have HbA1C measured at the 12-month end point, patients in the intervention group, who were more closely monitored, were likely more apt to have usable HbA1C test results than those in the usual care group.