# Chapter 9: Does Treatment Lower Risk? Understanding the Results

When clinicians consider the results of clinical trials, they are interested in the association between a treatment and an outcome. This chapter will help you understand and interpret study results related to outcomes that are either present or absent (*dichotomous* or binary) for each patient. Such binary outcomes include death, stroke, myocardial infarction, hospitalization, or disease exacerbations. A guide for teaching the *concepts* in this chapter is also available.^{1}

Table 9-1 is a 2 × 2 table that captures the information for a dichotomous outcome of a clinical trial.

**The 2 × 2 Table**


| Exposure | Outcome: Yes | Outcome: No |
|---|---|---|
| Yes | *a* | *b* |
| No | *c* | *d* |

Risk with exposure = *a*/(*a* + *b*)

Risk without exposure = *c*/(*c* + *d*)

Odds with exposure = *a*/*b*

Odds without exposure = *c*/*d*

Risk difference^{a} = *c*/(*c* + *d*) − *a*/(*a* + *b*)

Number needed to treat = 100 / (risk difference expressed as %)

^{a}Also known as the absolute risk reduction.

For instance, in a *randomized trial* comparing mortality in patients whose bleeding esophageal varices were controlled by endoscopic ligation or by endoscopic sclerotherapy,^{2} 18 of 64 participants assigned to ligation died, as did 29 of 65 patients assigned to sclerotherapy (Table 9-2).

**Results From a Randomized Trial of Endoscopic Sclerotherapy Compared With Endoscopic Ligation for Bleeding Esophageal Varices ^{a}**


| Exposure | Outcome: Death | Outcome: Survival | Total |
|---|---|---|---|
| Ligation | 18 | 46 | 64 |
| Sclerotherapy | 29 | 36 | 65 |

Relative risk = (18/64) / (29/65) = 0.63 or 63%

Relative risk reduction = 1 − 0.63 = 0.37 or 37%

Risk difference = 0.446 − 0.281 = 0.165 or 16.5%

Number needed to treat = 100 / 16.5 = 6

Odds ratio = (18/46) / (29/36) = 0.39 / 0.80 = 0.49 or 49%
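The formulas of Table 9-1 translate directly into code. The sketch below is our own minimal illustration, not any statistics package's API, and the class and property names are hypothetical. It takes the four cells, with the experimental group in the first row (*a*, *b*) and the control group in the second (*c*, *d*), and recomputes the measures reported in Table 9-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a 2 x 2 table: row 1 = experimental (exposed), row 2 = control."""
    a: int  # experimental group, outcome present
    b: int  # experimental group, outcome absent
    c: int  # control group, outcome present
    d: int  # control group, outcome absent

    @property
    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    @property
    def relative_risk(self) -> float:
        return self.risk_exposed / self.risk_unexposed

    @property
    def relative_risk_reduction(self) -> float:
        return 1 - self.relative_risk

    @property
    def risk_difference(self) -> float:
        # Control group risk minus experimental group risk (absolute risk reduction).
        return self.risk_unexposed - self.risk_exposed

    @property
    def number_needed_to_treat(self) -> float:
        # Equivalent to 100 / (risk difference expressed as %).
        return 1 / self.risk_difference

    @property
    def odds_ratio(self) -> float:
        return (self.a / self.b) / (self.c / self.d)


# Table 9-2: ligation (experimental) vs sclerotherapy (control); outcome = death.
varices = TwoByTwo(a=18, b=46, c=29, d=36)
print(f"RR  = {varices.relative_risk:.2f}")
print(f"RRR = {varices.relative_risk_reduction:.2f}")
print(f"RD  = {varices.risk_difference:.3f}")
print(f"NNT = {varices.number_needed_to_treat:.1f}")
print(f"OR  = {varices.odds_ratio:.2f}")
```

Rounded, these values reproduce Table 9-2: RR 0.63, RRR 0.37, RD 0.165, NNT of approximately 6, and OR 0.49.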

The simplest measure of occurrence to understand is the *risk* (or *absolute risk*). We often refer to the risk of the adverse outcome in the *control group* as the *baseline risk,* the *control group risk*, or, occasionally, the *control event rate*.

The risk of dying in the ligation group is 28% (18/64 or [*a*/(*a* + *b*)]), and the risk of dying in the sclerotherapy group is 45% (29/65 or [*c*/(*c* + *d*)]).

One way of comparing 2 risks is by calculating the absolute difference between them. We refer to this difference as the *absolute risk reduction* (ARR) or the *risk difference* (RD). Algebraically, the formula for the RD (the control group risk minus the treatment group risk) is [*c*/(*c* + *d*)] – [*a*/(*a* + *b*)] (Table 9-1). This measure of effect expresses, in absolute rather than relative terms, the proportion of patients who are spared the adverse outcome.

In our example, the RD is 0.446 – 0.281 or 0.165 (ie, an RD of 16.5%).

Another way to compare the risks in the 2 groups is to take their ratio; this is called the *relative risk* or *risk ratio* (RR). The RR tells us the proportion of the original risk (in this case, the risk of death in patients who received sclerotherapy) that is still present when patients receive the *experimental treatment* (in this case, ligation). From our 2 × 2 table, the formula for this calculation is [*a*/(*a* + *b*)]/[*c*/(*c* + *d*)] (Table 9-1).

In our example, the RR of dying after receiving initial ligation vs sclerotherapy is 18/64 (the risk in the ligation group) divided by 29/65 (the risk in the sclerotherapy group) or 0.63. In everyday English, we would say that the risk of death with ligation is approximately two-thirds of that with sclerotherapy.

An alternative relative measure of treatment effectiveness is the *relative risk reduction* (RRR), an estimate of the proportion of baseline risk that is removed by the therapy. It may be calculated as 1 − RR. One also can calculate the RRR by dividing the RD (amount of risk removed) by the absolute risk in the control group (Table 9-1).

In our bleeding varices example, where the RR was 0.63, the RRR is thus 1 − 0.63 (or 16.5% divided by 44.6%, the risk in the sclerotherapy group); either way, it comes to 0.37. In other words, ligation decreases the risk of death by just more than one-third compared with sclerotherapy.
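The two calculations of the RRR necessarily agree, because 1 − RR and the RD divided by the control group risk are algebraically the same quantity. Using the cell labels of Table 9-1, with the rounded varices figures as a check:

$$
1 - \mathrm{RR} = 1 - \frac{a/(a+b)}{c/(c+d)} = \frac{c/(c+d) - a/(a+b)}{c/(c+d)} = \frac{\mathrm{RD}}{c/(c+d)}
$$

$$
1 - 0.63 = 0.37, \qquad \frac{0.165}{0.446} \approx 0.37
$$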

Instead of looking at the risk of an event, we could estimate the odds of having vs not having an event. When considering the effects of therapy, you usually will not go far wrong if you interpret the *odds ratio* (OR) as equivalent to the RR. The exception is when the risk of having an event is very high—for instance, when more than 40% of control patients experience myocardial infarction or death (see Chapter 12.2, Understanding the Results: More About Odds Ratios).
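The sketch below illustrates why the OR and RR diverge only when events are common. It is our own illustration, with a hypothetical RR held fixed at 0.5, and it computes the OR implied at several control group risks.

```python
def odds(p: float) -> float:
    """Convert a risk (probability) to odds."""
    return p / (1 - p)


def odds_ratio_for(control_risk: float, relative_risk: float) -> float:
    """Odds ratio implied by a given control group risk and relative risk."""
    treated_risk = control_risk * relative_risk
    return odds(treated_risk) / odds(control_risk)


# Hold the relative risk fixed (hypothetical value) and vary the control group risk.
RR = 0.5
for control_risk in (0.01, 0.05, 0.10, 0.20, 0.40, 0.60):
    print(f"control risk {control_risk:.0%}: RR = {RR:.2f}, "
          f"OR = {odds_ratio_for(control_risk, RR):.2f}")
```

With these inputs, the OR is essentially 0.50 at a 1% control risk but falls to about 0.29 at 60%. As the control risk rises, the OR therefore moves further from 1 than the RR does.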

Failing to distinguish between the OR and the RR when interpreting randomized trial results will seldom mislead you; you must, however, distinguish between the RR and the RD. The reason is that the RRR is generally far larger than the RD, and presenting results in the form of an RR (or RRR) can convey a misleading message. Furthermore, it is the risk difference in which the patient is ultimately interested. Reducing a patient's risk by 50% sounds impressive. That may, however, represent a reduction in risk from 2% to 1%. The corresponding 1% RD sounds considerably less impressive, yet it conveys the crucial information.

As depicted in Figure 9-1, consider a treatment that is administered to 3 different subpopulations of patients and that, in each case, decreases the risk by one-third (RRR, 0.33; RR, 0.67). When administered to a subpopulation with a 30% risk of dying, treatment reduces the risk to 20%. When administered to a population with a 10% risk of dying, treatment reduces the risk to 6.7%. In the third population, treatment reduces the risk of dying from 1% to 0.67%.

Although treatment reduces the risk of dying by one-third in each population, this piece of information alone does not capture the impact of treatment. What if the treatment under consideration is a toxic cancer chemotherapeutic drug associated with severe adverse effects in 50% of those to whom it is administered? Under these circumstances, most patients in the lowest-risk group in Figure 9-1, whose RD is only 0.3%, would likely decline treatment. In the intermediate population, with an absolute reduction in risk of death of approximately 3%, some might accept the treatment, but many would likely decline. Many in the highest-risk population, with an absolute benefit of 10%, would likely accept the treatment, but some might not.
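The arithmetic behind Figure 9-1 can be reproduced with a small helper that applies a fixed RRR to a given baseline risk. The function name is our own (hypothetical), and the one-third RRR is the value assumed in the figure.

```python
def apply_rrr(baseline_risk: float, rrr: float) -> tuple[float, float]:
    """Return (risk with treatment, risk difference) for a constant RRR."""
    treated_risk = baseline_risk * (1 - rrr)
    return treated_risk, baseline_risk - treated_risk


# The three subpopulations of Figure 9-1, each with an RRR of one-third.
for baseline in (0.30, 0.10, 0.01):
    treated, rd = apply_rrr(baseline, 1 / 3)
    print(f"baseline {baseline:.0%}: treated {treated:.2%}, RD {rd:.2%}")
```

Applied to baseline risks of 30%, 10%, and 1%, the helper gives treated risks of 20%, about 6.7%, and about 0.67%. The corresponding RDs are 10%, about 3.3%, and about 0.33%.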

We suggest that you consider the RRR in light of your patient's baseline risk. For instance, you might expect an RRR of approximately 25% in vascular events with administration of statins in patients with possible cardiovascular disease. You would view this RRR differently for a 40-year-old woman without hypertension, diabetes mellitus, or a history of smoking who has a mildly elevated low-density lipoprotein level (5-year risk of a cardiovascular event of approximately 2%, ARR of approximately 0.5%) than for a 70-year-old woman with hypertension and diabetes who smokes (5-year risk of 30%, ARR of 7.5%). All of this assumes a constant RRR across risk groups; fortunately, a more or less constant RRR is usually the case, and we suggest you make that assumption unless there is evidence that suggests it is incorrect.^{3-5}
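Each ARR in the statin example is simply the woman's 5-year baseline risk multiplied by the assumed RRR of 25%. Taking reciprocals gives the corresponding 5-year NNTs; these NNTs are our own arithmetic from the figures above, not values reported by the trials:

$$
\mathrm{ARR} = \text{baseline risk} \times \mathrm{RRR}
$$

$$
0.02 \times 0.25 = 0.005 \;\Rightarrow\; \mathrm{NNT} = \frac{1}{0.005} = 200, \qquad 0.30 \times 0.25 = 0.075 \;\Rightarrow\; \mathrm{NNT} = \frac{1}{0.075} \approx 13
$$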

The impact of treatment also can be expressed by the number of patients you would need to treat to prevent an adverse event, the *number needed to treat* (NNT).^{6} Table 9-2 indicates that the risk of dying is 28.1% in the ligation group and 44.6% in the sclerotherapy group, an RD of 16.5%. If treating 100 patients results in avoiding 16.5 events, how many patients do we need to treat to avoid 1 event? The answer: 100 divided by 16.5, or approximately 6, is the NNT.

The NNT calculation always implies a given time of *follow-up* (ie, do we need to treat 50 patients for 1 year or 5 years to prevent an event?). When trials with long follow-ups are analyzed by survival methods, there are a variety of ways of calculating the NNT (see the following subsection, Survival Data). These different methods will, however, rarely lead to results with different clinical implications.^{7}

Assuming a constant RRR, the NNT is inversely related to the proportion of patients in the control group who have an adverse event. If the risk of an adverse event doubles (eg, if we deal with patients at a higher risk of death than those included in the clinical trial), the NNT is halved: we need to treat only half as many patients to prevent an adverse event. On the other hand, if the risk decreases by a factor of 4 (patients are younger and have less *comorbidity* than those in the study), we will have to treat 4 times as many people.

The NNT also is inversely related to the RRR. With the same baseline risk, a more effective treatment with twice the RRR will reduce the NNT by half. If the RRR with one treatment is only a quarter of that achieved by an alternative strategy, the NNT will be 4 times greater.
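Both inverse relationships follow from writing the RD in terms of the control group risk (abbreviated CGR here) and the RRR, with all risks expressed as proportions:

$$
\mathrm{RD} = \mathrm{CGR} - \mathrm{CGR} \times \mathrm{RR} = \mathrm{CGR} \times (1 - \mathrm{RR}) = \mathrm{CGR} \times \mathrm{RRR},
\qquad
\mathrm{NNT} = \frac{1}{\mathrm{RD}} = \frac{1}{\mathrm{CGR} \times \mathrm{RRR}}
$$

Doubling either CGR or RRR therefore halves the NNT, and reducing either to a quarter multiplies the NNT by 4. For example, with CGR = 0.04 and RRR = 0.5, NNT = 1/0.02 = 50, as in Table 9-3.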

Table 9-3 presents hypothetical data that illustrate these relationships.

**Association Among the Baseline Risk, Relative Risk Reduction, and Number Needed to Treat ^{a}**


| Control Group Risk | Experimental Group Risk | Relative Risk, % | Relative Risk Reduction, % | Risk Difference, % | Number Needed to Treat |
|---|---|---|---|---|---|
| 0.02 or 2% | 0.01 or 1% | 50 | 50 | 1 | 100 |
| 0.4 or 40% | 0.2 or 20% | 50 | 50 | 20 | 5 |
| 0.04 or 4% | 0.02 or 2% | 50 | 50 | 2 | 50 |
| 0.04 or 4% | 0.03 or 3% | 75 | 25 | 1 | 100 |
| 0.4 or 40% | 0.3 or 30% | 75 | 25 | 10 | 10 |
| 0.01 or 1% | 0.005 or 0.5% | 50 | 50 | 0.5 | 200 |

^{a}Relative risk = experimental group risk/control group risk; relative risk reduction = 1 – relative risk; risk difference = control group risk – experimental group risk; number needed to treat = 100/risk difference in %.
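Table 9-3 can be regenerated from its first 2 columns using the formulas in its footnote. The sketch below does so, working with proportions and converting to percentages at the end; the function and variable names are our own (hypothetical).

```python
# (control group risk, experimental group risk) for each row of Table 9-3
ROWS = [
    (0.02, 0.01),
    (0.40, 0.20),
    (0.04, 0.02),
    (0.04, 0.03),
    (0.40, 0.30),
    (0.01, 0.005),
]


def table_row(control: float, experimental: float) -> dict[str, float]:
    """Derived columns of Table 9-3, following the formulas in its footnote."""
    rr = experimental / control
    rd = control - experimental  # as a proportion
    return {
        "RR %": 100 * rr,
        "RRR %": 100 * (1 - rr),
        "RD %": 100 * rd,
        "NNT": 1 / rd,  # same as 100 / (RD in %)
    }


for control, experimental in ROWS:
    row = table_row(control, experimental)
    print(
        f"{control:6.1%} {experimental:6.1%}  "
        f"RR {row['RR %']:.0f}  RRR {row['RRR %']:.0f}  "
        f"RD {row['RD %']:.1f}  NNT {row['NNT']:.0f}"
    )
```

Comparing the rows shows the relationships described above. Halving the control group risk at a fixed RRR doubles the NNT (rows 3 and 1), and halving the RRR at a fixed control group risk doubles the NNT (rows 2 and 5).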

Clinicians can calculate the *number needed to harm* (NNH) in a similar way. If you expect 5 of 100 patients taking a β-blocker for a year to become fatigued, then 1 of every 20 patients you treat will become tired; the NNH is therefore 20 (100/5).

We have presented all of the measures of association of the treatment with ligation vs sclerotherapy as if they represented the true effect. The results of any experiment, however, represent only an estimate of the truth. The true effect of treatment may be somewhat greater—or less—than what we observed. The *confidence interval* (CI) tells us, within the bounds of plausibility (and assuming a low *risk of bias*), how much greater or smaller the true effect is likely to be (see Chapter 10, Confidence Intervals: Was the Single Study or Meta-analysis Large Enough?).
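Chapter 10 covers how CIs are constructed. As an illustration only, the sketch below applies one common large-sample method, the Wald (normal-approximation) interval for a difference in 2 proportions, to the Table 9-2 counts. Published trials may use other interval methods, so these numbers should not be read as the trial's reported CI. The function name is hypothetical.

```python
from math import sqrt


def risk_difference_ci(events_c: int, n_c: int, events_t: int, n_t: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Wald 95% CI (z = 1.96) for control risk minus treatment risk."""
    p_c = events_c / n_c
    p_t = events_t / n_t
    rd = p_c - p_t
    # Standard error of a difference between 2 independent proportions.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return rd, rd - z * se, rd + z * se


# Sclerotherapy (control): 29/65 deaths; ligation (treatment): 18/64 deaths.
rd, low, high = risk_difference_ci(events_c=29, n_c=65, events_t=18, n_t=64)
print(f"RD = {rd:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

Under this approximation, the RD of 0.165 has a 95% CI running from roughly 0.001 to 0.33. That range corresponds to an NNT anywhere from about 3 to several hundred, which shows how much uncertainty a trial of this size leaves.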

**Survival Data**

Analysis of a 2 × 2 table implies an examination of the data at a specific point in time. This analysis is satisfactory if we are looking for events that occur within relatively short periods and if all patients have the same duration of follow-up. In longer-term studies, however, we are interested not only in the total number of events but also in their timing. For instance, we may focus on whether therapy for patients with a uniformly fatal condition (unresectable lung cancer, for example) delays death.

When the timing of events is important, investigators could present the results in the form of several 2 × 2 tables constructed at different points of time after the study began. For example, Table 9-2 represents the situation after the study was finished. Similar tables could be constructed describing the fate of all patients available for analysis at 1 week, 1 month, 3 months, or whatever interval after enrollment we choose to examine. The analysis of accumulated data that takes into account the timing of events is called *survival analysis*. Do not infer from the name, however, that the analysis is restricted to deaths; in fact, any dichotomous outcome occurring over time will qualify.

The *survival curve* of a group of patients describes their status at different times after a defined starting point.^{8} In Figure 9-2, we show the survival curve from the bleeding varices trial. Because the investigators followed up some patients for a longer time, the survival curve extends beyond the mean follow-up of approximately 10 months. At some point, prediction becomes imprecise because there are few patients remaining to estimate the *probability* of survival. The CIs around the survival curves capture the precision of the estimate.

Even if the true RR, or RRR, is constant throughout follow-up, the play of chance will ensure that the *point estimates* calculated at different times differ. Ideally, then, we would estimate the overall RR by averaging over the entire survival experience, weighting for the number of patients available at each point. Statistical methods allow just such an estimate. The probability of events occurring at any point in each group is referred to as the hazard for that group, and the weighted RR during the entire study duration is known as the *hazard ratio*.
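Trials commonly estimate the hazard ratio with a Cox proportional hazards model, which the sketch below does not implement. As a deliberately simplified stand-in, assume that each group's hazard is constant over time. The maximum-likelihood estimate of that hazard is then the number of events divided by total follow-up time, and the ratio of the 2 estimates is a crude hazard ratio. The function names and all numbers below are hypothetical.

```python
def constant_hazard(events: int, person_time: float) -> float:
    """MLE of a constant (exponential) hazard: events per unit of follow-up time."""
    return events / person_time


def crude_hazard_ratio(events_t: int, time_t: float,
                       events_c: int, time_c: float) -> float:
    """Treatment-group hazard divided by control-group hazard (constant-hazard model)."""
    return constant_hazard(events_t, time_t) / constant_hazard(events_c, time_c)


# Hypothetical data: 12 deaths over 480 patient-months (treatment)
# vs 20 deaths over 400 patient-months (control).
hr = crude_hazard_ratio(events_t=12, time_t=480.0, events_c=20, time_c=400.0)
print(f"crude HR = {hr:.2f}")
```

Because the denominator is total follow-up time, each patient contributes exactly the time for which he or she was observed.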

A major advantage of using survival analysis is the ability to account for differential length of follow-up. In many trials of a fixed duration, some patients are enrolled early and thus have long follow-up and some later with consequently shorter follow-up. Survival analysis takes into account both those with shorter (by a process called *censoring*) and those with longer follow-up, and all contribute to estimates of hazard and the hazard ratio. Patients are censored at the point at which they are no longer being followed up. Appropriate accounting for those with differential length of follow-up is not possible in 2 × 2 tables that deal only with the number of events.
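A standard way to build a survival curve, such as the one in Figure 9-2, that makes use of censored patients is the Kaplan-Meier (product-limit) estimator. The sketch below applies it to hypothetical follow-up data for 6 patients. The function name is our own, and it follows the usual convention that patients censored at a given time still count as at risk at that time.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the outcome occurred at that time, 0 if the patient was censored
    Returns (time, survival probability) at each time at which an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count events and censorings tied at this time.
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t leaves the risk set afterward.
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up (months) for 6 patients; 0 marks censoring.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"month {t}: S = {s:.3f}")
```

For these data, the estimate steps down to 0.833, 0.667, 0.444, and 0 at months 1, 2, 3, and 5. The patients censored at months 2 and 4 contribute to the risk set up to their censoring time and then drop out without being counted as events.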

“Competing risks” is an issue that arises when one event influences the likelihood of another event. The most extreme example is death: if the outcome is stroke, people who die can no longer have a stroke. Competing risks also can arise when there are 2 or more outcome events among living patients (for instance, if a patient has a stroke, the likelihood of a subsequent transient ischemic attack may decrease). Investigators can deal with the problem of competing risks by censoring patients at the time of the “competing” events (death and stroke in the previous examples). The censoring approach, however, has its limitations.^{9}

Specifically, the usual assumption is that the censored events are independent of the main outcome of interest, but in practice this assumption may not be correct. For example, patients who experience myocardial infarction probably have a higher death rate than those without myocardial infarction, which would violate the assumption of independence. Investigators also sometimes use censoring for those *lost to follow-up*. This is much more problematic because censoring assumes that those with shorter follow-up are similar to those with longer follow-up, differing only in length of follow-up. Because loss to follow-up may be associated with a higher or lower likelihood of events (and thus, those lost differ from those who are followed up), the censoring approach does not deal with the risk of bias associated with loss to follow-up.^{9}

As *evidence-based practitioners*, we must decide which measure of association deserves our focus. Does it matter? The answer is yes. The same results, when presented in different ways, may lead to different treatment decisions.^{9-13} For example, Forrow et al^{10} found that clinicians were less inclined to treat patients when trial results were presented as the absolute change in the outcome rather than as the relative change. In a similar study, Naylor et al^{11} found that clinicians rated the effectiveness of an intervention lower when events were presented in absolute terms rather than as an RRR. Moreover, clinicians offered lower effectiveness ratings when they viewed results expressed in terms of NNT than when they saw the same data as RRRs or ARRs. Awareness of this phenomenon may explain the pharmaceutical industry's propensity to present physicians with treatment-associated RRRs.

Patients are as susceptible as clinicians to how results are communicated. In one study, when researchers presented patients with a hypothetical scenario of life-threatening illness, the patients were more likely to choose a treatment described in terms of RRR than in terms of the corresponding ARR.^{14} Other investigators found similar results.^{15,16}

Considering how our interpretations differ with data presentations, we are best advised to consider all of the data (as either a 2 × 2 table or a survival analysis) and then reflect on both the relative and the absolute figures. As you examine the results, you will find that if you can estimate your patient's baseline risk, knowing how well the treatment works—expressed as an RR or RRR—allows you to estimate the patient's risk with treatment. Considering the RD—the difference between the risk with and without treatment—and its reciprocal, the NNT, in an individual patient will be most useful in guiding the treatment decision.

*CMAJ*. 2004;171(4):353–358 (online-1 to online-8). http://www.cmaj.ca/cgi/data/171/4/353/DC1/1. Accessed August 25, 2014. [PubMed: 15313996]

*N Engl J Med*. 1992;326(23):1527–1532. [PubMed: 1579136]

*Stat Med*. 2002;21(11):1575–1600. [PubMed: 12111921]

*Stat Med*. 1998;17(17):1923–1942. [PubMed: 9777687]

*Int J Epidemiol*. 2002;31(1):72–76. [PubMed: 11914297]

*N Engl J Med*. 1988;318(26):1728–1733. [PubMed: 3374545]

*CMAJ*. 2005;172(5):613–615. [PubMed: 15738471]

*Can Med Assoc J*. 1979;121(8):1065–1068, 1071. [PubMed: 543995]

*Am J Med*. 1992;92(2):121–124. [PubMed: 1543193]

*Ann Intern Med*. 1992;117(11):916–921. [PubMed: 1443954]

*J Gen Intern Med*. 1994;9(4):195–201. [PubMed: 8014724]

*N Engl J Med*. 1990;322(16):1162–1164. [PubMed: 2320089]

*Lancet*. 1994;343(8907):1209–1211. [PubMed: 7909875]

*J Gen Intern Med*. 1993;8(10):543–548. [PubMed: 8271086]

*N Engl J Med*. 1982;306(21):1259–1262. [PubMed: 7070445]