Clinicians often encounter patients who face potentially harmful exposures to either medical interventions or environmental agents. These circumstances give rise to common questions: Do cell phones increase the risk of brain tumors? Do vasectomies increase the risk of prostate cancer? Do changes in health care policies (eg, activity-based funding) lead to harmful health outcomes? When examining these questions, clinicians and administrators must evaluate the risk of bias, the strength of the association between the assumed cause and the adverse outcome, and the relevance to patients in their practice or domain.
In answering any clinical question, our first goal should be to identify whether there is an existing systematic review of the topic that can provide a summary of the highest-quality available evidence (see the Summarizing the Evidence section). Interpreting such a review requires an understanding of the rules of evidence for individual or primary studies, randomized clinical trials (RCTs), and observational studies. The tests for judging the risk of bias associated with results of observational studies will help you decide whether exposed and control groups (or cases and controls) began and completed the study with sufficient similarities that we can obtain a minimally biased assessment of the influence of exposure on outcome (see Chapter 6, Why Study Results Mislead: Bias and Random Error).
Randomized clinical trials provide less biased estimates of potentially harmful effects than other study designs because randomization is the best way to ensure that groups are balanced with respect to known and unknown determinants of the outcome (see Chapter 7, Therapy [Randomized Trials]). Although investigators conduct RCTs to determine whether therapeutic agents are beneficial, they also should look for harmful effects and may sometimes make surprising discoveries about the adverse effects of the intervention on their primary outcomes (see Chapter 11.2, Surprising Results of Randomized Trials).
There are 4 reasons why RCTs may not be helpful for determining whether a putative harmful agent truly has deleterious effects. First, we may consider it unethical to randomize patients to exposures that might result in harmful effects without benefit (eg, smoking).
Second, we are often concerned about rare and serious adverse effects that may become evident only after tens of thousands of patients have consumed a medication for a period of years. For instance, even a very large RCT failed to detect an association between clopidogrel and thrombotic thrombocytopenic purpura,3 which appeared in a subsequent observational study.4 Randomized clinical trials that address adverse effects may be feasible for adverse event rates as low as 1%,5,6 but the RCTs needed to explore harmful events occurring in fewer than 1 in 100 exposed patients are logistically difficult and often prohibitively expensive because of the huge sample size and lengthy follow-up required. Meta-analyses may be helpful when the event rates are very low.7 However, availability of large-scale evidence on specific harms in systematic reviews is not common. For example, in a report of nearly 2000 systematic reviews, only 25 had large-scale data on 4000 or more randomized participants regarding well-defined harms that might be associated with the interventions under study.8
Third, RCT duration of follow-up is limited, yet not infrequently we are interested in knowing effects years, or even decades, after the exposure (eg, long-term consequences of chemotherapy in childhood).9
Fourth, even when events are sufficiently frequent and occur during a time frame feasible for RCTs to address, study reports often fail to adequately provide information on harm.10
Given that clinicians will not find RCTs to answer most questions about harm, they must understand the alternative strategies used to minimize bias. This requires a familiarity with observational study designs (Table 14-1).
Directions of Inquiry and Key Methodologic Strengths and Weaknesses for Different Study Designs
There are 2 main types of observational studies: cohort and case-control. In a cohort study, investigators identify exposed and nonexposed groups of patients, each a cohort, and then follow them forward in time, monitoring the occurrence of outcomes of interest in an attempt to identify whether there is an association between the exposure and the outcomes. The cohort design is similar to an RCT but without randomization; rather, the determination of whether a patient received the exposure of interest results from the patient's or investigator's preference or from happenstance.
Case-control studies also assess associations between exposures and outcomes. Rare outcomes or those that take a long time to develop can threaten the feasibility not only of RCTs but also of cohort studies. The case-control study provides an alternative design that relies on the initial identification of cases—that is, patients who have already developed the target outcome—and the selection of controls—persons who do not have the outcome of interest. Using case-control designs, investigators assess the relative frequency of previous exposure to the putative harmful agent in the cases and the controls.
For example, in addressing the impact of nonsteroidal anti-inflammatory drugs (NSAIDs) on clinically apparent upper gastrointestinal tract hemorrhage, investigators needed a cohort study to deal with the problem of infrequent events. Bleeding among those taking NSAIDs has been reported to occur approximately 1.5 times per 1000 person-years of exposure, in comparison with 1.0 per 1000 person-years in those not taking NSAIDs.11 Because the event rate in unexposed patients is so low (0.1%), an RCT to study an increase in risk of 50% would require huge numbers of patients (sample size calculations suggested approximately 75000 patients per group) for adequate power to test the hypothesis that NSAIDs cause the additional bleeding.12 Such an RCT would not be feasible, but a cohort study, in which the information comes from a large administrative database, would be possible.
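The arithmetic behind that sample-size estimate can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough check, not the calculation from reference 12; the conventional values assumed here (two-sided α = .05, 80% power) are our own assumptions:

```python
from math import sqrt

def n_per_group(p1, p2):
    """Approximate sample size per group to detect a difference between two
    proportions (normal approximation; two-sided alpha = .05, 80% power)."""
    z_alpha = 1.96    # critical value for two-sided 5% significance
    z_beta = 0.8416   # critical value for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

# 1.0 vs 1.5 bleeding events per 1000 person-years: a 50% relative increase
n = n_per_group(0.001, 0.0015)
print(round(n))  # on the order of 78 000 per group
```

With slightly different power or significance assumptions the figure shifts toward the approximately 75000 per group cited above; under any conventional assumptions, the required numbers make such an RCT infeasible.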
Cohort studies may be prospective or retrospective. In prospective cohort studies, the investigator enrolls patients or participants, starts the follow-up, and waits for the outcomes (events of interest) to occur. Such studies may take many years to complete, and thus they are difficult to conduct. An advantage, however, is that the investigators can plan how to monitor patients and collect data.
In retrospective cohort studies, the data regarding both exposures and outcomes have been previously collected; the investigator obtains the data and determines whether participants with and without the outcome of interest have been exposed to the putative causal agent or agents. These studies are easier to perform because they depend on the availability of data on exposures and outcomes that have already happened. On the other hand, the investigator has less control over the quality and relevance of the available data. In the end, clinicians need not pay too much attention to whether studies are prospective or retrospective but should instead focus on the risk of bias criteria in Box 14-1.
In a Cohort Study, Aside From the Exposure of Interest, Did the Exposed and Control Groups Start and Finish With the Same Risk for the Outcome?
Were Patients Similar for Prognostic Factors That Are Known to Be Associated With the Outcome (or Did Statistical Adjustment Address This Imbalance)?
Cohort studies will yield biased results if the group exposed to the putative harmful agent and the unexposed group begin with differences in baseline characteristics that give them a different prognosis (ie, a different risk of the target outcome) and if the analysis fails to deal with this imbalance. For instance, in the association between NSAIDs and the increased risk of upper gastrointestinal tract bleeding, age may be associated with both exposure to NSAIDs and gastrointestinal bleeding. In other words, because patients taking NSAIDs tend to be older and because older patients are more likely to bleed, this variable makes attribution of an increased risk of bleeding to NSAID exposure problematic. When a variable with prognostic power differs in frequency between the exposed and unexposed cohorts, we refer to the situation as confounding.
There is no reason that patients who self-select (or who are selected by their physicians) for exposure to a potentially harmful agent should be similar to the nonexposed patients with respect to important determinants of the harmful outcome. Indeed, there are many reasons to expect they will not be similar. Physicians are appropriately reluctant to prescribe medications they perceive will put their patients at risk.
In one study, 24.1% of patients who were given a then-new NSAID, ketoprofen, had received peptic ulcer therapy during the previous 2 years compared with 15.7% of the control population.13 The likely reason is that the ketoprofen manufacturer succeeded in persuading clinicians that ketoprofen was less likely to cause gastrointestinal bleeding than other agents. A comparison of ketoprofen to other agents would be subject to the risk of finding a spurious increase in bleeding with the new agent (compared with other therapies) because higher-risk patients would have been receiving the ketoprofen. This bias may be referred to as a selection bias or a bias due to confounding by indication.
The prescription of benzodiazepines to elderly patients provides another example of the way that selective physician prescribing practices can lead to a different distribution of risk in patients receiving particular medications, sometimes referred to as the channeling bias.14 Ray et al15 found an association between long-acting benzodiazepines and risk of falls (relative risk [RR], 2.0; 95% confidence interval [CI], 1.6–2.5) in data from 1977 to 1979 but not in data from 1984 to 1985 (RR, 1.3; 95% CI, 0.9–1.8). The most plausible explanation for the change is that patients at high risk for falls (those with dementia) selectively received these benzodiazepines during the earlier period. Reports of associations between benzodiazepine use and falls led to greater caution, and the apparent association disappeared when physicians began to avoid using benzodiazepines in those at high risk of falling.
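Relative risks such as those reported by Ray et al are computed from cohort counts, with the confidence interval constructed on the log scale. A minimal sketch with hypothetical counts (not the actual study data):

```python
from math import exp, log, sqrt

def relative_risk(a, n1, c, n0):
    """Relative risk and approximate 95% CI (log method) from cohort counts:
    a events among n1 exposed participants, c events among n0 unexposed."""
    rr = (a / n1) / (c / n0)
    se_log = sqrt(1/a - 1/n1 + 1/c - 1/n0)  # standard error of log(RR)
    lo = exp(log(rr) - 1.96 * se_log)
    hi = exp(log(rr) + 1.96 * se_log)
    return rr, lo, hi

# Hypothetical: 40 falls among 400 benzodiazepine users,
# 20 falls among 400 nonusers
rr, lo, hi = relative_risk(40, 400, 20, 400)
print(f"RR {rr:.1f} (95% CI, {lo:.2f}-{hi:.2f})")  # RR 2.0 (95% CI, 1.19-3.36)
```

When the lower bound of the CI falls below 1.0, as in the 1984-1985 data above, chance remains a plausible explanation for the observed association.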
Therefore, investigators must document the characteristics of the exposed and nonexposed participants and either demonstrate their comparability (very unusual in cohort studies) or use statistical techniques to adjust for these differences. Effective adjusted analyses for prognostic factors require the accurate measurement of those prognostic factors. For prospective cohorts, the investigators can take particular care with the quality of this information. For retrospective databases, however, one has to make use of what is available. Large administrative databases, although providing a sample size that may allow ascertainment of rare events, often have limited-quality data concerning relevant patient characteristics, health care encounters, or diagnoses. For example, in a cross-sectional study designed to measure the accuracy of electronic reporting of care practices compared with manual review, electronic reporting significantly underestimated rates of appropriate asthma medication and pneumococcal vaccination and overestimated rates of cholesterol control in patients with diabetes.16
Even if investigators document the comparability of potentially confounding variables in exposed and nonexposed cohorts, and even if they use statistical techniques to adjust for differences, important prognostic factors that the investigators do not know about or have not measured may be unbalanced between the groups and thus may be responsible for differences in outcome. We call this residual confounding.
Returning to our earlier example, it may be that the illnesses that require NSAIDs, rather than the NSAIDs themselves, contribute to the increased risk of bleeding. Thus, the strength of inference from a cohort study will always be less than that of a rigorously conducted RCT.
Were the Circumstances and Methods for Detecting the Outcome Similar?
In cohort studies, ascertainment of outcome is the key issue. For example, investigators have reported a 3-fold increase in the risk of malignant melanoma in individuals who work with radioactive materials. One possible explanation for some of the increased risk might be that physicians, concerned about a possible risk, search more diligently and therefore detect disease that might otherwise go unnoticed (or they may detect disease at an earlier point). This could result in the exposed cohort having an apparent, but spurious, increase in risk—a situation known as surveillance bias.18
The choice of outcome may partially address this problem. In one cohort study, for example, investigators assessed perinatal outcomes among infants of men exposed to lead and organic solvents in the printing industry by studying all of the men who had been members of the printers' unions in Oslo, Norway.19 The investigators used job classification to categorize the fathers as being exposed to lead and organic solvents or not exposed to those substances. Investigators' awareness of whether the fathers had been exposed to the lead or solvents might bias their assessment of the infant's outcome for minor birth defects or defects that required special investigative procedures. On the other hand, an outcome such as preterm birth would be unlikely to increase simply as a result of detection bias (the tendency to look more carefully for an outcome in one of the comparison groups) because prior knowledge of exposure is unlikely to influence whether an infant is considered preterm or not. The study found that exposure was associated with an 8-fold increase in preterm births but no increase in birth defects, so detection bias was not an issue for the results that were obtained in this study.
Was the Follow-up Sufficiently Complete?
As we pointed out in Chapter 7, Therapy (Randomized Trials), loss to follow-up can introduce bias because the patients who are lost may have different outcomes from those patients still available for assessment. This is particularly problematic if there are differences in follow-up between the exposed and nonexposed groups.
For example, in a well-executed study,20 investigators determined the vital status of 1235 of 1261 white men (98%) employed in a chrysotile asbestos textile operation between 1940 and 1975. The RR for lung cancer death over time increased from 1.4 to 18.2 in direct proportion to the cumulative exposure among asbestos workers with at least 15 years since first exposure. In this study, the 2% missing data were unlikely to affect the results, and the loss to follow-up did not threaten the strength of the inference that asbestos exposure caused lung cancer deaths.
Case-control studies are always retrospective in design. The outcomes (events of interest) have already occurred, and participants are assigned to 1 of 2 groups: those with the outcome (cases) and those without it (controls). Investigators then retrospectively ascertain prior exposure to putative causal agents. This design entails inherent risks of bias because exposure data depend on memory and recall or on data originally collected for purposes other than the study at hand.
In a Case-Control Study, Did the Cases and Control Group Have the Same Risk (Chance) for Being Exposed in the Past?
Were Cases and Controls Similar With Respect to the Indication or Circumstances That Would Lead to Exposure (or Did Matching or Statistical Adjustment Address the Imbalance)?
As with cohort studies, case-control studies are susceptible to unmeasured confounding. For instance, in looking at the association between use of β-agonists and mortality among patients with asthma, investigators need to consider—and match or adjust for—previous hospitalization and use of other medications to avoid confounding by disease severity. Patients who use more β-agonists may have more severe asthma, and this severity, rather than β-agonist use, may be responsible for increased mortality. As in cohort studies, however, matching and adjustment cannot eliminate the risk of bias, particularly when exposure varies over time. In other words, matching or adjustment for hospitalization or use of other medications may not adequately capture all of the variability in underlying disease severity in asthma. In addition, the adverse lifestyle behaviors of patients with asthma who use large amounts of β-agonists could be the real explanation for the association.
To further illustrate the concern about unmeasured confounding, consider the example of a case-control study that was designed to assess the association between diethylstilbestrol ingestion by pregnant women and the development of vaginal adenocarcinomas in their daughters many years later.21 An RCT or prospective cohort study designed to test this cause-and-effect relationship would have required at least 20 years from the time when the association was first suspected until the completion of the study. Furthermore, given the infrequency of the disease, an RCT or a cohort study would have required hundreds of thousands of participants. By contrast, using the case-control strategy, the investigators delineated 2 relatively small groups of young women. Those who had the outcome of interest (vaginal adenocarcinoma) were designated as the cases (n = 8), and those who did not experience the outcome were designated as the controls (n = 32). Then, working backward in time, the investigators determined exposure rates to diethylstilbestrol for the 2 groups. They found a significant association between in utero diethylstilbestrol exposure and vaginal adenocarcinoma, and they found their answer without a delay of 20 years and by studying only 40 women.
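The measure of association in a case-control study is the odds ratio: the ratio of the odds of prior exposure among cases to the odds of prior exposure among controls. A sketch with hypothetical counts (the actual exposure counts from the diethylstilbestrol study are not reproduced here):

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d):
    """Odds ratio and Woolf 95% CI from a 2x2 table:
    a/b = exposed/unexposed cases, c/d = exposed/unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log = sqrt(1/a + 1/b + 1/c + 1/d)  # standard error of log(OR)
    lo = exp(log(or_) - 1.96 * se_log)
    hi = exp(log(or_) + 1.96 * se_log)
    return or_, lo, hi

# Hypothetical: 6 of 8 cases exposed vs 4 of 32 controls exposed
or_, lo, hi = odds_ratio(6, 2, 4, 28)
print(f"OR {or_:.0f} (95% CI, {lo:.1f}-{hi:.1f})")  # OR 21 (95% CI, 3.1-142.2)
```

The CI is wide with so few participants, but a lower bound far above 1.0 is what allows a very small case-control study to yield a convincing association.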
An important consideration in this study would be whether the cases could have been exposed to diethylstilbestrol in any special circumstances that would not have affected women in the control group. In this situation, diethylstilbestrol had been prescribed to women at risk for miscarriages or premature births. Could either of these indications be a confounder? Before the introduction of diethylstilbestrol, vaginal adenocarcinoma in young women was uncommon, but miscarriages and premature birth were common. Thus, it would be unlikely that miscarriages and premature births were directly associated with vaginal adenocarcinoma, and in the absence of such an association, neither could be a confounder.
In another study, investigators used a case-control design relying on computer-record linkages between health insurance data and a drug insurance plan to investigate the possible association between use of β-adrenergic agonists and mortality rates in patients with asthma.22 The database for the study included 95% of the population of the province of Saskatchewan, Canada. The investigators selected 129 patients who had experienced a fatal or near-fatal asthma attack to serve as cases and used a matching process to select another 655 patients who also had asthma but who had not had a fatal or near-fatal asthma attack to serve as controls.
The tendency of patients with more severe asthma to use more β-adrenergic medications could create a spurious association between drug use and mortality rate. The investigators attempted to control for the confounding effect of disease severity by measuring the number of hospitalizations in the 24 months before death (for the cases) or before the index date of entry into the study (for the control group) and by using an index of the aggregate use of medications. They found an association between the routine use of large doses of β-adrenergic agonists through metered-dose inhalers and death from asthma (odds ratio [OR], 2.6 per canister of inhaler per month; 95% CI, 1.7–3.9), even after correcting for measures of disease severity.
Were the Circumstances and Methods for Determining Exposure Similar for Cases and Controls?
In case-control studies, ascertainment of the exposure is a key issue. If case patients have a better memory for exposure than control patients, the result will be a spurious association.
For example, a case-control study found a 2-fold increase in risk of hip fracture associated with psychotropic drug use.23 In this study, investigators established drug exposure by examining computerized claim files from the Michigan Medicaid program, a strategy that avoided selective memory of exposure—recall bias—and differential probing of cases and controls by an interviewer—interviewer bias.
Another example was a case-control study that evaluated whether the use of cell phones was associated with an increased risk of motor vehicle crash.24 Suppose the investigators had asked people who had a motor vehicle crash, and control patients who were not in a crash on the same day and time, whether they were using their cell phones around the time of interest. People who were in a crash might have been more likely to recall such use because their memory was heightened by the unfortunate circumstances, which would have led to a spurious association because of differential recall. Alternatively, they might specifically deny the use of a cell phone because of embarrassment or legal concerns, thus obscuring an association. Therefore, the investigators in this study used a computerized database of cell phone use instead of patient recall.24 Moreover, the investigators used each person in a crash as his or her own control: the time of the crash was matched against corresponding times when the same person was driving but no crash occurred (eg, the same time driving to work). This appropriate design established that use of cell phones was associated with an increased risk of a motor vehicle crash.
Not all studies have access to unbiased information on exposure. For instance, in a case-control study of the association between coffee and pancreatic cancer, the patients with cancer may be more motivated to identify possible explanations for their problem and provide a greater recounting of coffee use.25 Also, if the interviewers are not blinded to whether a patient is a case or a control patient, the interviewer may probe deeper for exposure information from cases. In this particular study, there were no objective sources of data regarding exposure. Recall or interviewer bias might have explained the apparent association.
As it happened, another bias provided an even more likely explanation for what turned out to be a spurious association. The investigators chose control patients from the practices of the physicians treating the patients with pancreatic cancer. These control patients had a variety of gastrointestinal problems, some of which were exacerbated by coffee ingestion. The control patients had learned to avoid coffee, which explains the investigators' finding of an association between coffee (which the patients with pancreatic cancer consumed at general population levels) and pancreatic cancer. Subsequent investigations, using more appropriate controls, refuted this association.26
In addition to biased assessment of exposure, random error in exposure ascertainment is also possible. In random misclassification, exposed and unexposed patients are misclassified at similar rates in cases and controls. Such nondifferential misclassification dilutes any association (ie, the true association will be larger than the observed association). Fortunately, unless the misclassification is extreme, the resulting underestimation of the association is unlikely to be important.
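The dilution effect of nondifferential misclassification can be shown with simple arithmetic. In this sketch, both the counts and the 10% misclassification rate are hypothetical; a true odds ratio of 4.0 shrinks toward 1.0:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table (exposed/unexposed cases, then controls)."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, flip):
    """Nondifferential misclassification: the same fraction `flip` of the
    exposed are counted as unexposed, and vice versa, in every group."""
    return (exposed * (1 - flip) + unexposed * flip,
            unexposed * (1 - flip) + exposed * flip)

true_or = odds_ratio(80, 20, 50, 50)   # cases 80/20, controls 50/50 -> 4.0
a, b = misclassify(80, 20, 0.10)       # cases become 74/26
c, d = misclassify(50, 50, 0.10)       # controls remain 50/50
observed_or = odds_ratio(a, b, c, d)   # about 2.85: biased toward the null
print(true_or, round(observed_or, 2))  # 4.0 2.85
```

Because the error rate is identical in cases and controls, the bias is predictably toward the null; differential misclassification, by contrast, can bias the association in either direction.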
What Is the Risk of Bias in Cross-sectional Studies?
Like cohort and case-control studies, the cross-sectional study is an observational design. Like a cohort study, a cross-sectional study is based on an assembled population of exposed and unexposed participants. In the cross-sectional study, however, the exposure and the existing or prevalent outcome are measured at the same point in time; accordingly, the direction of the association may be difficult to determine. Another important limitation is that the outcome, or the threat of experiencing an adverse outcome, may have led patients who would otherwise have been cases to leave the study population, biasing the measure of association toward the null. However, cross-sectional studies are relatively inexpensive and quick to conduct and may be useful for generating and exploring hypotheses that can subsequently be investigated using other observational designs or RCTs.
What Is the Risk of Bias in Case Series and Case Reports?
Case series (descriptions of a series of patients) and case reports (descriptions of individual patients) provide no comparison group, so it is impossible to determine whether the observed outcome would likely have occurred in the absence of the exposure. Although descriptive studies occasionally yield findings dramatic enough to mandate an immediate change in clinician behavior, such situations are rare, and in the absence of evidence from stronger study designs, acting on evidence that warrants only very low confidence can have undesirable consequences. Recall the consequences of case reports of specific birth defects occurring in association with thalidomide exposure.27
Consider the case of the drug Bendectin (a combination of doxylamine, pyridoxine, and dicyclomine used as an antiemetic in pregnancy), whose manufacturer withdrew it from the market as a consequence of case reports suggesting that it was teratogenic.28 Later, although a number of comparative studies reported the drug's relative safety,29 they could not eradicate the prevailing litigious atmosphere—which prevented the manufacturer from reintroducing Bendectin. Thus, many pregnant women who might have benefited from the drug's availability were denied the symptomatic relief it could have offered.
For some interventions, registries of adverse events may provide the best available initial evidence. For example, vaccine registries record adverse events among people who have received a vaccine. These registries may signal problems with a particular adverse event that would be very difficult to capture in prospective studies, whose sample sizes would be too small. Even retrospective studies might be too difficult to conduct if people who receive the vaccine differ substantially from those who do not and if adjustment or matching cannot deal with the differences. In this situation, investigators might conduct a before-after study using the general population before the introduction of the new vaccine. Such comparisons using historical controls are, however, prone to bias because many other factors may have changed during the same period. If the change in the incidence of an adverse event is very large, however, the signal may be real. An example is the clustering of intussusception cases among children receiving a particular type of rotavirus vaccine,30 which resulted in a decision to withdraw the vaccine. The association was subsequently supported by a case-control study.31 Eventually, another type of rotavirus vaccine was developed that did not cause this adverse event.
In general, clinicians should not draw conclusions about relationships from case series, but rather, they should recognize that the results may generate questions, or even hypotheses, that clinical investigators can address with studies that have optimal safeguards against risk of bias. When the immediate risk of exposure outweighs the benefits (and outweighs the risk of stopping an exposure), the clinician may have to make a management decision with less than optimal data.
How Serious Is the Risk of Bias: Summary
Just as for questions of therapeutic effectiveness, clinicians should first look to RCTs to resolve issues of harm. They will often be disappointed in this search and must make use of studies of weaker design. Regardless of the design, however, they should look for an appropriate control population. In cohort studies, the control group should have a similar baseline risk of the outcome, or investigators should have used statistical techniques to adjust for differences. In case-control studies, the cases and the controls should have had a similar opportunity for past exposure, so that if a difference in exposure is observed, one might legitimately conclude that the association could reflect a link between the exposure and the outcome rather than a confounding factor. Here, too, investigators should routinely use matching or statistical adjustment to address differences between cases and controls.
Even when investigators have taken all of the appropriate steps to minimize bias, clinicians should bear in mind that residual differences between groups still may bias the results of observational studies.32 Because evidence, clinician preferences, and patient values and preferences determine the use of interventions in the real world, exposed and unexposed patients are likely to differ in prognostic factors.
USING THE GUIDE
Returning to our earlier discussion, the study that we retrieved investigating the association between soy milk (or soy formula) and the development of peanut allergy used a case-control design.1 Those with peanut allergy (cases) appeared to be similar to controls with respect to the indication or circumstances leading to soy exposure, but there were a few potentially important imbalances. In the peanut allergy group, a family history of peanut allergy and an older sibling with a history of milk intolerance were more common and could have biased the likelihood of a subsequent child's being exposed to soy. To avoid confounding, the investigators conducted an adjusted analysis.
The methods for determining exposure were similar for cases and controls because the data were collected with the interviewers and parents unaware of the hypothesis that related soy exposure to peanut allergy (thus avoiding interviewer bias and perhaps recall bias). With regard to access to soy, all of the children came from the same geographic region, although this does not ensure that cultural and economic factors that might determine soy access were similar in cases and controls. Overall, protection from risk of bias seemed adequate.