Clinicians Often Disagree
Clinicians often disagree in their assessment of patients. When 2 clinicians reach different conclusions regarding the presence of a particular physical sign, the disagreement may arise from different approaches to the examination or from different interpretations of the findings. Similarly, disagreement between repeated applications of a diagnostic test may result from different application of the test or different interpretation of the results.
Researchers also may face difficulties in agreeing on issues such as whether patients meet the eligibility requirements for a randomized trial, whether patients in a trial have experienced the outcome of interest (eg, they may disagree about whether a patient has had a transient ischemic attack or a stroke or about whether a death should be classified as a cardiovascular death), or whether a study meets the eligibility criteria for a systematic review.
Chance Will Always Be Responsible for Some of the Apparent Agreement Between Observers
Any 2 people judging the presence or absence of an attribute will agree some of the time simply by chance. Similarly, even inexperienced and uninformed clinicians may agree on a physical finding on occasion purely as a result of chance. This chance agreement is more likely to occur when the prevalence of a target finding (a physical finding, a disease, an eligibility criterion) is high—occurring, for instance, in more than 80% of a population. When investigators present agreement as raw agreement (or crude agreement)—that is, by simply counting the number of times agreement has occurred—this chance agreement inflates the result and gives a misleading impression of reproducibility.
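To see how a high-prevalence finding inflates raw agreement, consider a minimal simulation (the numbers here are illustrative, not from the chapter): two raters who each record "present" purely at random 90% of the time, sharing no diagnostic skill whatsoever, still agree roughly 82% of the time.

```python
import random

random.seed(0)  # reproducible illustration

# Two raters who each record "present" completely at random with
# probability 0.9, mimicking a finding with ~90% prevalence.
# They apply no skill at all, yet agree most of the time.
n = 100_000
agree = sum(
    (random.random() < 0.9) == (random.random() < 0.9)
    for _ in range(n)
)
raw_agreement = agree / n

# Expected chance agreement: 0.9 * 0.9 + 0.1 * 0.1 = 0.82
print(f"raw agreement by chance alone: {raw_agreement:.2f}")
```

The roughly 82% raw agreement reflects nothing but prevalence, which is why crude agreement overstates reproducibility.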
Alternatives for Dealing with the Problem of Agreement by Chance
This chapter describes approaches to addressing the problem of misleading results based on chance agreement. When we are dealing with categorical data (ie, placing patients in discrete categories, such as mild, moderate, or severe or stage 1, 2, 3, or 4), the most popular approach is chance-corrected agreement, determined statistically with kappa (κ) or weighted κ. Another option is chance-independent agreement, measured with phi (φ). One can use these 3 statistics to measure nonrandom agreement among observers, investigators, or measurements.
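As a sketch of how chance-corrected agreement is computed for categorical data, the following hypothetical 2 × 2 cross-classification of 2 raters (the counts are invented for illustration) yields κ from the observed agreement and the chance agreement implied by each rater's marginal totals.

```python
# Hypothetical 2 x 2 cross-classification of two raters judging a
# finding as present or absent; counts are invented for illustration.
#                 Rater B: present   Rater B: absent
table = [[40, 10],   # Rater A: present
         [10, 40]]   # Rater A: absent

total = sum(sum(row) for row in table)

# Observed agreement: the cells in which the raters concur.
observed = (table[0][0] + table[1][1]) / total

# Chance agreement: for each category, the product of the two
# raters' marginal proportions, summed across categories.
row_margins = [sum(row) / total for row in table]
col_margins = [table[0][j] / total + table[1][j] / total for j in range(2)]
chance = sum(r * c for r, c in zip(row_margins, col_margins))

# Chance-corrected agreement.
kappa = (observed - chance) / (1 - chance)
print(round(kappa, 2))  # -> 0.6
```

Here the raters agree on 80 of 100 patients (observed agreement 0.80), but their marginal totals imply 50% agreement by chance alone, so κ credits them with only 0.30 of the possible 0.50 agreement beyond chance.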
One Solution to Agreement by Chance: Chance-Corrected Agreement or κ
The application of κ removes most of the agreement by chance and informs clinicians of the extent of the possible agreement over and above chance. The total possible agreement on any judgment is always 100%. Figure 19.3-1 depicts a situation in which agreement by chance is 50%, leaving possible agreement above and beyond chance of 50%. As depicted in the figure, the raters have achieved an agreement of 75%. Of this 75%, 50% was achieved by chance alone. Of the remaining possible 50% agreement, the raters have achieved half (75% − 50% = 25%), resulting in a κ value of 0.25/0.50, or 0.50.
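The arithmetic in Figure 19.3-1 can be expressed as a single formula: κ is the agreement achieved beyond chance divided by the maximum possible agreement beyond chance. A minimal sketch, using the figure's values:

```python
def kappa(observed: float, chance: float) -> float:
    """Chance-corrected agreement: the share of the possible
    agreement beyond chance that the raters actually achieved."""
    return (observed - chance) / (1 - chance)

# Values depicted in Figure 19.3-1: observed agreement is 75%,
# of which 50% is expected by chance alone.
print(kappa(0.75, 0.50))  # -> 0.5
```

A κ of 1.0 would mean perfect agreement, and a κ of 0 would mean agreement no better than chance.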