Reliance on surrogate end points may be beneficial or harmful. On the one hand, use of the surrogate end point may lead to rapid and appropriate access to new treatments. For example, the US Food and Drug Administration's practice of approving new antiretroviral drugs on the basis of trials using surrogate end points recognized the need for effective therapies for patients with human immunodeficiency virus (HIV) infection. The first generation of protease inhibitors proved effective in RCTs that focused on patient-important outcomes.5 More recent trials of antiretroviral drugs from different classes have found effects on surrogate markers of HIV infection, whereas results from cohort studies suggest an associated reduction in AIDS and AIDS-related morbidity.6
On the other hand, reliance on surrogate end points can be misleading and thus result in excess morbidity and mortality. For instance, flosequinan, milrinone, ibopamine, vesnarinone, and xamoterol all improve surrogate outcomes of hemodynamic function in ambulatory patients with heart failure, but RCTs have found that each of these agents leads to excess mortality (see Chapter 11.2, Surprising Results of Randomized Trials).
How are clinicians to distinguish between valid surrogate markers (those in which a therapy-induced improvement in the surrogate consistently predicts an improvement in a patient-important outcome) and those of questionable validity? The approach to critical appraisal described in this chapter will help clinicians apply the results of studies that use surrogate end points to the management of individual patients.
Crucially, clinicians need to assess more than a single study to decide whether a surrogate end point is adequate. Evaluation may require a systematic review, preferably with a meta-analysis, of observational studies of the association between the surrogate end point and the target end point, along with the much more important review of some or all of the RCTs that have evaluated the treatment effect on both end points. Although most clinicians will not have the time to conduct such an investigation, our guides will allow them to evaluate the arguments of experts, or those of the pharmaceutical industry, for prescribing treatments on the basis of their effect on surrogate end points. The guides presented in Box 13.4-1 bear directly on whether clinicians can trust results from trials that focus on surrogate end points.
BOX 13.4-1
Users' Guide for a Surrogate End Point Trial
Is the surrogate valid?
Is there a strong, independent, consistent association between the surrogate outcome and the patient-important outcome?
Have randomized trials of the same drug class shown that improvement in the surrogate end point has consistently led to improvement in the patient-important outcome?a
Have randomized trials of different drug classes shown that improvement in the surrogate end point has consistently led to improvement in a patient-important outcome?a
What are the results?
How can I apply the results to patient care?
Is There a Strong, Independent, Consistent Association Between the Surrogate Outcome and the Patient-Important Outcome?
To function as a valid substitute for an important target outcome, the surrogate end point must be associated with that target outcome. Often, researchers choose surrogate end points because they have found a correlation between a surrogate outcome and a target outcome in observational studies. Their understanding of the underlying biology gives them confidence that changes in the surrogate will lead to changes in the important outcome. The stronger the association, the more likely it is that a real link exists between the surrogate and the target. The strength of an association is reflected in statistical measures such as the relative risk (RR) or the odds ratio (OR) (see Chapter 9, Does Treatment Lower Risk? Understanding the Results).
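To make the two measures concrete, the following minimal Python sketch computes the RR and the OR from a 2 × 2 table that cross-classifies patients by surrogate status (abnormal vs normal) and by whether the target outcome occurred. The helper name `rr_and_or` and the counts in the usage line are hypothetical, chosen only for illustration; they are not taken from any study cited in this chapter.

```python
from typing import NamedTuple


class Association(NamedTuple):
    relative_risk: float
    odds_ratio: float


def rr_and_or(a: int, b: int, c: int, d: int) -> Association:
    """Hypothetical helper: association measures from a 2 x 2 table.

    a = surrogate abnormal, target event occurred
    b = surrogate abnormal, no target event
    c = surrogate normal, target event occurred
    d = surrogate normal, no target event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive in this simple sketch")
    risk_abnormal = a / (a + b)  # risk of the target outcome when the surrogate is abnormal
    risk_normal = c / (c + d)    # risk of the target outcome when the surrogate is normal
    relative_risk = risk_abnormal / risk_normal
    odds_ratio = (a * d) / (b * c)  # cross-product ratio
    return Association(relative_risk, odds_ratio)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 with an abnormal surrogate vs 10 of 100 with a normal one.
    result = rr_and_or(30, 70, 10, 90)
    print(f"RR = {result.relative_risk:.2f}, OR = {result.odds_ratio:.2f}")
```

With these made-up counts the RR is 3.0 and the OR is about 3.86; the OR exceeds the RR because the outcome is not rare in this example.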
Many biologically plausible surrogates are associated only weakly with patient-important outcomes. For example, measures of respiratory function in patients with chronic lung disease—or conventional exercise tests in patients with heart and lung disease—are correlated weakly with the capacity to undertake activities of daily living.7,8 When correlations are low, the surrogate is likely to be a poor substitute for the target outcome.
In addition to the strength of the association, one's confidence in the validity of the association depends on whether it is consistent across different studies and after adjustment for known confounding variables. For example, ecologic studies such as the Seven Countries Study9 suggested a strong correlation between serum cholesterol levels and coronary heart disease mortality even after adjusting for other predictors such as age, smoking, and systolic blood pressure. When a surrogate is associated with an outcome after adjusting for multiple other potential prognostic factors, the association is an independent association—although that does not necessarily mean it is causal (see Chapter 15.1, Correlation and Regression). Subsequent large observational studies have confirmed the association between cholesterol and coronary disease mortality in individuals from all continents.10
Similarly, cohort studies have consistently revealed that a single measurement of plasma viral load predicts the subsequent risk of AIDS or death in patients with HIV infection.11-17 For example, in one study, the proportions of patients who progressed to AIDS after 5 years in the lowest through the highest quartiles of viral load were 8%, 26%, 49%, and 62%, respectively.17 Moreover, this association retained its predictive power after adjustment for other potential predictors, such as CD4 cell count.11-16 Such strong, consistent, independent associations establish a measure as a potentially useful surrogate.
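The dose-response gradient in this viral load example can be checked with simple arithmetic. The Python sketch below takes the 5-year progression proportions reported above for the lowest through highest quartiles, confirms that risk rises at every step, and computes the unadjusted risk ratio of the highest vs the lowest quartile. The function names are hypothetical, and the calculation does not reproduce the adjusted analyses of the cited studies.

```python
from typing import Sequence

# 5-year proportions progressing to AIDS, lowest to highest viral load quartile (from the text).
PROGRESSION_BY_QUARTILE = [0.08, 0.26, 0.49, 0.62]


def is_monotonic_increasing(values: Sequence[float]) -> bool:
    """True if each quartile's risk is strictly higher than the one below it."""
    return all(lower < higher for lower, higher in zip(values, values[1:]))


def extreme_quartile_risk_ratio(values: Sequence[float]) -> float:
    """Unadjusted risk ratio of the highest vs the lowest quartile."""
    return values[-1] / values[0]


if __name__ == "__main__":
    print("Monotonic gradient:", is_monotonic_increasing(PROGRESSION_BY_QUARTILE))
    ratio = extreme_quartile_risk_ratio(PROGRESSION_BY_QUARTILE)
    print(f"Highest vs lowest quartile RR: {ratio:.2f}")
```

A ratio of about 7.75 between extreme quartiles, together with a stepwise gradient, is the kind of strong association the first criterion asks for.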
USING THE GUIDE
For the patient with type 2 diabetes, the question is whether we can substitute HbA1c for the patient-important outcome of a cardiovascular event (fatal or nonfatal myocardial infarction or stroke). Ideally, establishing the association between HbA1c and cardiovascular events in type 2 diabetes would require a large cohort of patients with type 2 diabetes followed up from the onset of diabetes until, for some patients, the development of a cardiovascular event. Large cohort studies do reveal an association between the extent of blood glucose control and macrovascular complications.18,19 In a meta-analysis of 13 cohort studies, an absolute increase in HbA1c of 1% was consistently associated with an approximately 18% increase in the risk of a cardiovascular event (RR, 1.18; 95% CI, 1.10-1.26).20 Thus, there is good evidence for an independent and consistent association between increases in HbA1c and unfavorable cardiovascular outcomes.
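The RR of 1.18 applies to a 1% absolute difference in HbA1c. If one assumes a log-linear relation (each 1% increment multiplies risk by the same factor), an assumption the text does not state, the estimate can be scaled to larger differences by exponentiation. The Python sketch below applies that assumption to the point estimate and the 95% CI bounds reported above; the function names are hypothetical, and the result illustrates the arithmetic, not a finding of the cited meta-analysis.

```python
from typing import Tuple

# Per-1% (absolute) HbA1c relative risk and 95% CI, as reported in the text.
RR_PER_POINT = 1.18
CI_PER_POINT = (1.10, 1.26)


def scaled_relative_risk(rr_per_point: float, delta_points: float) -> float:
    """RR for a difference of `delta_points`, ASSUMING a log-linear dose-response."""
    return rr_per_point ** delta_points


def scaled_interval(ci: Tuple[float, float], delta_points: float) -> Tuple[float, float]:
    """Scale each CI bound the same way (same log-linear assumption)."""
    if delta_points <= 0:
        raise ValueError("this sketch handles positive differences only")
    low, high = ci
    return low ** delta_points, high ** delta_points


if __name__ == "__main__":
    delta = 2.0
    rr = scaled_relative_risk(RR_PER_POINT, delta)
    low, high = scaled_interval(CI_PER_POINT, delta)
    print(f"Assumed RR for a {delta:g}% difference: {rr:.2f} (95% CI, {low:.2f}-{high:.2f})")
```

Under this assumption a 2% difference corresponds to an RR of about 1.39 (95% CI, 1.21-1.59). Such an extrapolation still describes an association; it says nothing about whether lowering HbA1c with a given drug reduces events.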
Have Randomized Trials of the Same Drug Class Shown That Improvement in the Surrogate End Point Has Consistently Led to Improvement in Patient-Important Outcomes?
Meeting the first criterion (a strong, independent association between the surrogate and the patient-important outcome) is necessary, but it is not sufficient to support reliance on a surrogate outcome. Not only must the surrogate outcome be in the causal pathway of the disease process, but we must also be confident that any change in the surrogate with treatment captures all critical effects on patient-important outcomes.3 This condition will fail if our understanding of the surrogate is limited (eg, the relation is causal in one circumstance but not in the context of the treatment under consideration) or if the treatment either positively or negatively affects morbidity or mortality independent of its effect on the surrogate. Clinical trial history is full of examples of drugs and surgical therapies that had a striking, apparently beneficial effect on a surrogate strongly and independently associated with a patient-important outcome but failed to improve that outcome when tested in RCTs, or indeed made the outcome worse (see Chapter 11.2, Surprising Results of Randomized Trials).
Because surrogates are so seductive, we review 2 additional striking examples.
Higher levels of high-density lipoprotein cholesterol (HDL-C) are strongly, independently, and consistently associated with a lower incidence of myocardial infarction and cardiovascular death. It would seem to follow logically that a drug that increases HDL-C will reduce cardiovascular events. Torcetrapib achieved the desired effect on HDL-C but, despite this favorable effect on the surrogate outcome, increased the number of deaths.21
Class I antiarrhythmic agents22 effectively prevented ventricular ectopic beats that are strongly associated with adverse prognosis in patients with myocardial infarction.23 In this case, the clinical community did not wait for the RCTs, and the drugs were widely used in clinical practice. When finally—with considerable delay—an RCT was launched to evaluate the effect of the drugs on morbidity and mortality, the agents increased mortality.24 Inappropriate reliance on the surrogate end point of suppression of nonlethal arrhythmias is likely to have led to the deaths of thousands of patients.
Before offering an intervention on the basis of its effects on a surrogate outcome, clinicians should look for a consistent association between changes in the surrogate and patient-important outcomes in RCTs. Clinicians are in a stronger position to trust surrogate end points if a new drug belongs to a class in which RCTs have verified a strong association between the surrogate end point and the target outcome for all drugs of that class. For example, several large trials of primary and secondary prevention of coronary heart disease with statins have found reductions in adverse cardiovascular outcomes, although even here the results are not completely consistent (see Chapter 28.4, Understanding Class Effects).25,26 With some hesitation, we may therefore assume a class effect: that a new statin such as rosuvastatin, with similar or even greater LDL-C–lowering potency, may also reduce patient-important outcomes. Even putting aside reservations about the consistency of the association, however, the recent experience in observational studies of a 10-fold increase in severe rhabdomyolysis associated with cerivastatin, another statin that had been approved solely on the basis of its lipid-lowering activity,27 reminds us that reliance on a surrogate for benefit still leaves the issue of toxicity open to serious question.
For 2 reasons, we would be reluctant to generalize these results readily to another class of lipid-lowering agents. First, the biologic association between the surrogate outcome and the patient-important outcome that exists with one class of agents may not exist with another. For example, higher bone density is consistently and independently associated with fewer fractures. Furthermore, increased bone density appears to be an important mechanism of fracture reduction with one class of antiosteoporosis drugs, the bisphosphonates. Sodium fluoride therapy, however, increased bone density but led to an increase, not a decrease, in fractures.28,29 As this example shows, generalizing the relation between bone density and fractures observed with bisphosphonates to another class of drugs would be a serious mistake.30
There is a second reason to hesitate to generalize the association of change in the surrogate outcome and change in the patient-important outcome in one class of drugs to a second class. There may be effects of an agent unrelated to those mediated by the surrogate that influence the patient-important outcome. Consider, for instance, trials that found that a class of anticholesterol agents (fibrates) produced a significant reduction of myocardial infarction but an increased risk of mortality from other causes (gastrointestinal disease) that counteracted this benefit and led to no effect on overall mortality.25
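The fibrate example turns on simple bookkeeping: a benefit on one cause of death can be cancelled by harm through a pathway that the surrogate does not capture. The Python sketch below uses made-up death rates and relative risks (illustrative assumptions, not data from the fibrate trials) to show how a drug can reduce cause-specific deaths while leaving overall mortality unchanged. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeathsPer1000:
    """Hypothetical deaths per 1,000 patients, split by cause."""
    cardiac: float
    other: float

    @property
    def total(self) -> float:
        return self.cardiac + self.other


def apply_drug(baseline: DeathsPer1000, rr_cardiac: float, rr_other: float) -> DeathsPer1000:
    """Apply separate, assumed relative risks to each cause of death."""
    return DeathsPer1000(cardiac=baseline.cardiac * rr_cardiac,
                         other=baseline.other * rr_other)


if __name__ == "__main__":
    control = DeathsPer1000(cardiac=40.0, other=50.0)
    # Illustrative assumption: 25% fewer cardiac deaths, 20% more deaths from other causes.
    treated = apply_drug(control, rr_cardiac=0.75, rr_other=1.2)
    print(f"Cardiac deaths: {control.cardiac:.1f} -> {treated.cardiac:.1f}")
    print(f"Other deaths:   {control.other:.1f} -> {treated.other:.1f}")
    print(f"Total deaths:   {control.total:.1f} -> {treated.total:.1f}")
```

A trial that counted only cardiac deaths (or only the lipid surrogate) would report a clear benefit, while total mortality in this toy example is unchanged at 90 per 1,000.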
This criterion is complicated by various interpretations of the term “drug class.” A manufacturer will naturally argue for a broad definition of “class” when its drug fits in a class of agents with a consistent positive association between surrogate and target end points (such as β-blockers in patients who have sustained a myocardial infarction or ACE inhibitors for preventing progression of proteinuric kidney disease). If substances are related to drugs with known or suspected adverse effects on target events (eg, clofibrate or some cyclooxygenase 2 inhibitors), manufacturers are more likely to argue that the chemical or physiologic connection is not sufficiently close for the new drug to be relegated to the same class as the harmful agent (see Chapter 28.4, Understanding Class Effects).
In any case, if no relevant evidence is available from other drugs in a new class, clinicians must rely on evidence about the association between the surrogate end point and the target outcome from between-class comparisons. Inferences from such evidence will be substantially weaker than those from within-class evidence; conservative, wise practice would delay use of the new drug until evidence of an effect on patient-important outcomes is available.
USING THE GUIDE
Returning to the opening scenario, we have established from observational studies that HbA1c has the characteristics of a potentially reliable surrogate marker for cardiovascular events. However, no RCT comparing GLP-1 agonists with other antidiabetic drugs as an adjunct to metformin has addressed patient-important outcomes. In meta-analyses of RCTs of add-on therapy to metformin in patients with type 2 diabetes and poor HbA1c control, long-acting exenatide and liraglutide led to greater reductions in HbA1c and body weight than add-on therapy with sitagliptin, a DPP-4 inhibitor, or pioglitazone.1,31 As the next step, therefore, you have to examine whether the link between HbA1c and cardiovascular end points holds consistently in other contexts.
Have Randomized Trials of Different Drug Classes Shown That Improvement in the Surrogate End Point Has Consistently Led to Improvement in the Patient-Important Outcome?
When evidence on patient-important outcomes within a new class of drugs—as in our example of GLP-1 agonists—is lacking, we must examine the consistency of the change in a surrogate and a patient-important end point across drug classes.
We have already presented the example of the inconsistent association between changes in bone density and fracture reduction in osteoporosis with bisphosphonates vs sodium fluoride.28 The treatment of heart failure provides a second instructive example. Trials of ACE inhibitors in patients with heart failure have revealed parallel increases in exercise capacity32-35 and reductions in mortality,36 suggesting that clinicians may be able to rely on exercise capacity as a valid surrogate. Milrinone (a phosphodiesterase inhibitor37) and epoprostenol (a prostaglandin38), drugs from different classes, have improved exercise tolerance in patients with symptomatic heart failure. When these drugs were evaluated in RCTs, however, both increased cardiovascular mortality; the increase was statistically significant with milrinone39 and led to early stopping of the epoprostenol trial.40 Thus, improvement in exercise tolerance does not consistently predict reduced mortality, and exercise tolerance is therefore not a valid surrogate.
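The between-class reasoning in this paragraph can be stated as a simple rule: a surrogate passes only if every drug class that improved it also improved the patient-important outcome. The Python sketch below records the 3 heart failure examples above as true/false directions of effect (a deliberate simplification that ignores effect sizes and precision) and applies that rule. The type and function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class TrialResult(NamedTuple):
    """Direction of effect observed in RCTs for one drug (recorded from the text)."""
    drug: str
    drug_class: str
    surrogate_improved: bool
    outcome_improved: bool


def surrogate_consistent(results: Sequence[TrialResult]) -> bool:
    """True only if every drug that improved the surrogate also improved the outcome."""
    return all(r.outcome_improved for r in results if r.surrogate_improved)


HEART_FAILURE_EXERCISE = [
    TrialResult("ACE inhibitors", "ACE inhibitor", True, True),
    TrialResult("milrinone", "phosphodiesterase inhibitor", True, False),
    TrialResult("epoprostenol", "prostaglandin", True, False),
]

if __name__ == "__main__":
    print("Consistent across classes:", surrogate_consistent(HEART_FAILURE_EXERCISE))
    print("ACE inhibitors alone:", surrogate_consistent(HEART_FAILURE_EXERCISE[:1]))
```

Looking only at ACE inhibitors, exercise capacity appears to pass; adding the other 2 classes exposes the inconsistency, which is why a single class cannot validate a surrogate for use across classes.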
Other suggested surrogate end points in patients with heart failure have included ejection fraction, heart rate variability, and markers of autonomic function.41 The dopaminergic agent ibopamine positively influences all 3 surrogate end points, yet an RCT found that the drug increases mortality in patients with heart failure, mainly due to ibopamine-induced tachyarrhythmias.42 Again, the evidence indicates that we cannot rely on ejection fraction, heart rate variability, or markers of autonomic function as trustworthy surrogates.
There is, however, at least one example of surrogates for which both within-class and between-class changes have consistently paralleled changes in patient-important outcomes. Therapy trials in patients with HIV have consistently found that improvement in CD4 cell count and complete suppression of HIV-1 RNA plasma viral load are associated with improved patient-important outcomes (Table 13.4-1). Trials that compare different classes of antiretroviral therapies have found that patients randomized to more potent drug regimens had higher CD4 cell counts and higher rates of suppression of HIV-1 viral load and were less likely to progress to AIDS or death.5,43 Subsequent large cohort studies of new and different antiretroviral drugs have found substantial reductions in AIDS and AIDS-related morbidity.44 Even though there is no guarantee that the next trial using a different class of drugs will reveal the same pattern, these results strengthen our confidence that, for example, a new integrase inhibitor for HIV infection that increases the CD4 cell count and effectively suppresses HIV-1 viral load will reduce AIDS-related morbidity and mortality.
We must bear in mind, however, that convincing evidence of the validity of the surrogate does not obviate concern about initially inapparent long-term drug toxicity. For instance, the first-generation protease inhibitors lopinavir and indinavir appear to be associated with an increased risk of myocardial infarction, an association not found with the use of nonnucleoside reverse transcriptase inhibitors.45
USING THE GUIDE
Returning to our opening clinical scenario, aside from insulin, drugs from 6 different classes that are known to lower HbA1c levels are now available for the treatment of type 2 diabetes.46 Results of RCTs that addressed patient-important end points are available for metformin, the thiazolidinediones rosiglitazone and pioglitazone, and 2 DPP-4 inhibitors, saxagliptin and alogliptin.47,48 These drugs have been compared with different control groups. A systematic review found that monotherapy with metformin was associated with reduced diabetes-related and myocardial infarction–related mortality and with reduced overall mortality.49 In the UK Prospective Diabetes Study, however, early addition of metformin to sulfonylurea therapy was associated with a 96% relative increase in diabetes-related mortality.50 A systematic review of RCTs comparing metformin plus insulin with insulin alone found no improvement in cardiovascular outcomes in type 2 diabetes.51 In another systematic review of RCTs, rosiglitazone reduced HbA1c more than the control did, but it was also associated with an increased risk of myocardial infarction and congestive heart failure.52 In yet another systematic review with different inclusion criteria, both rosiglitazone and pioglitazone were associated with an increased risk of congestive heart failure, suggesting a class effect due to increased fluid retention.53 In 2 large placebo-controlled RCTs, saxagliptin and alogliptin did not reduce the composite of cardiovascular death, myocardial infarction, or ischemic stroke, and saxagliptin was associated with a higher risk of hospitalization for congestive heart failure.47,48 Supporting the results of these studies of specific drugs, a number of trials of tighter vs less tight glucose control in patients with type 2 diabetes, each using different strategies to achieve tight control, found that tight control lowers HbA1c levels with no discernible effect on mortality, cardiovascular mortality, or stroke and only a small relative risk reduction in myocardial infarction. Thus, there is no evidence that reductions in HbA1c consistently lead to fewer cardiovascular events across different antidiabetic drug classes. Indeed, the evidence suggests the contrary.
In Table 13.4-1, we apply our validity criteria to our example of GLP-1 agonists and a number of more recent controversial examples of the use of surrogate end points.
TABLE 13.4-1
Selected Controversial Examples of Applied Validity Criteria for the Critical Evaluation of Studies Using Surrogate End Points
| Types of Intervention | Surrogate End Point | Target End Point | Criterion 1: Is there a strong, independent, consistent association between the surrogate end point and the clinical end point? | Criterion 2: Is there evidence from randomized trials in the same drug class that improvement in the surrogate end point has consistently led to improvement in the target outcome? | Criterion 3: Is there evidence from randomized trials in other drug classes that improvement in the surrogate end point has consistently led to improvement in the target outcome? |
|---|---|---|---|---|---|
| Glucagon-like peptide 1 receptor agonist exenatide2 | HbA1c | Cardiovascular event | Yes18,9 | No1,31 | No50-53 |
| Antilipidemic drug dalcetrapib57 | HDL-C | Cardiovascular event† | Yes58,59 | No57 | No60,61 |
| Antilipidemic drug rosuvastatin62,63 | Cholesterol reduction or LDL-C reduction | Myocardial infarction or death from myocardial infarction | Yes9,64 | Yes25 | No25 |
| Folic acid plus vitamins B6 and B1265 | Homocysteine | Cardiovascular eventb | Yes66-69 | No65,70,71 | No72c |
| Proteinase inhibitora atazanavir73 | HIV-1 viral plasma load | AIDS or death | Yes11-16 | Yes5,43 | Yes74 |
| Proteinase inhibitora atazanavir73 or darunavir75 | CD4 cell count | AIDS or death | Yes11-16 | Yes5,43 | Yes74 |
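The 3 validity questions in Box 13.4-1 can be read as a sequential checklist, and the rows of Table 13.4-1 can be recorded as data against it. The Python sketch below does that, with citations dropped. The class and function names and the summary labels are hypothetical; the decision rule is an illustrative paraphrase of the chapter's reasoning (association first, within-class RCT evidence stronger than between-class evidence), not a scoring system proposed by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurrogateAppraisal:
    """One row of Table 13.4-1; field names are hypothetical."""
    intervention: str
    surrogate: str
    target: str
    strong_association: bool  # Criterion 1
    within_class_rcts: bool   # Criterion 2
    between_class_rcts: bool  # Criterion 3


def summarize(row: SurrogateAppraisal) -> str:
    """Illustrative decision rule paraphrasing the chapter's reasoning."""
    if not row.strong_association:
        return "association not established; surrogate unsuitable"
    if row.within_class_rcts and row.between_class_rcts:
        return "consistent within- and between-class RCT support"
    if row.within_class_rcts:
        return "within-class RCT support only; class effect assumed with caution"
    if row.between_class_rcts:
        return "between-class RCT support only; weaker inference"
    return "not validated in RCTs; await patient-important outcome trials"


TABLE_13_4_1 = [
    SurrogateAppraisal("Exenatide", "HbA1c", "Cardiovascular event", True, False, False),
    SurrogateAppraisal("Dalcetrapib", "HDL-C", "Cardiovascular event", True, False, False),
    SurrogateAppraisal("Rosuvastatin", "Cholesterol or LDL-C reduction",
                       "MI or death from MI", True, True, False),
    SurrogateAppraisal("Folic acid plus vitamins B6 and B12", "Homocysteine",
                       "Cardiovascular event", True, False, False),
    SurrogateAppraisal("Atazanavir", "HIV-1 plasma viral load", "AIDS or death", True, True, True),
    SurrogateAppraisal("Atazanavir or darunavir", "CD4 cell count", "AIDS or death", True, True, True),
]

if __name__ == "__main__":
    for row in TABLE_13_4_1:
        print(f"{row.intervention} ({row.surrogate}): {summarize(row)}")
```

Under this rule, only the HIV surrogates reach the strongest category, rosuvastatin falls into the cautious class-effect category discussed earlier, and the remaining surrogates have not been validated in RCTs.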