This JAMA Guide to Statistics and Methods characterizes the strengths and limitations of the C statistic as a measure of a risk prediction model’s ability to discriminate between individuals who will and will not experience a future event.
Risk prediction models help clinicians develop personalized treatments for patients. The models generally use variables measured at one time point to estimate the probability of an outcome occurring within a given time in the future. It is essential to assess the performance of a risk prediction model in the setting in which it will be used. This is done by evaluating the model’s discrimination and calibration. Discrimination refers to the ability of the model to separate individuals who develop events from those who do not. In time-to-event settings, discrimination is the ability of the model to predict who will develop an event earlier and who will develop an event later or not at all. Calibration measures how accurately the model’s predictions match overall observed event rates.
In a prospective cohort study, Melgaard et al1 used the C statistic, a global measure of model discrimination, to assess the ability of the CHA2DS2-VASc model to predict ischemic stroke, thromboembolism, or death in patients with heart failure, separately for patients with and without atrial fibrillation (AF).
Why Are C Statistics Used?
The C statistic is the probability that, given 2 individuals (one who experiences the outcome of interest and the other who does not or who experiences it later), the model will yield a higher risk for the first patient than for the second. It is a measure of concordance (hence, the name “C statistic”) between model-based risk estimates and observed events. C statistics measure the ability of a model to rank patients from high to low risk but do not assess the ability of a model to assign accurate probabilities of an event occurring (that is measured by the model’s calibration). C statistics generally range from 0.5 (random concordance) to 1 (perfect concordance).
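The pairwise definition above can be illustrated with a minimal sketch for a binary outcome. The function name `c_statistic`, the example risks, and the example outcomes are hypothetical and are not taken from the study by Melgaard et al. Counting tied risk estimates as half-concordant is a common convention assumed here, and the time-to-event case (in which one patient experiences the event later than the other) is not handled.

```python
from itertools import product


def c_statistic(risks, events):
    """Concordance for a binary outcome (illustrative sketch).

    risks  : predicted risks, one per patient
    events : 1 if the patient experienced the outcome, 0 otherwise
    Returns the share of (event, non-event) pairs in which the patient
    with the event received the higher predicted risk. Tied risks
    count as 0.5 (an assumed convention).
    """
    with_event = [r for r, e in zip(risks, events) if e == 1]
    without_event = [r for r, e in zip(risks, events) if e == 0]
    if not with_event or not without_event:
        raise ValueError("need at least one patient with and one without the event")

    score = 0.0
    for r_event, r_none in product(with_event, without_event):
        if r_event > r_none:
            score += 1.0   # concordant pair
        elif r_event == r_none:
            score += 0.5   # tied pair
    return score / (len(with_event) * len(without_event))


# Hypothetical predicted risks and observed outcomes for 6 patients.
risks = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
events = [0, 0, 1, 0, 1, 1]

c = c_statistic(risks, events)
print(round(c, 3))

# Halving every predicted risk changes the probabilities but not the ranking,
# so the C statistic is unchanged.
c_halved = c_statistic([r / 2 for r in risks], events)
print(c_halved == c)
```

In this hypothetical sample, 8 of the 9 possible pairs are concordant. The second call shows why the C statistic says nothing about calibration: the halved risks are different probabilities, yet they rank the patients identically and therefore yield the same value.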
C statistics can also be thought of as the area under the plot of sensitivity (proportion of people with events whom the model classifies as high risk) vs 1 minus specificity (proportion of people without events whom the model classifies as high risk) for all possible classification thresholds. This plot is called the receiver operating characteristic (ROC) curve, and the C statistic is equal to the area under this curve.2 For example, in the study by Melgaard et al,1 CHA2DS2-VASc scores ranged from a low of 0 (heart failure only) to a high of 5 or higher, depending on the number of comorbidities a patient had. One point on the ROC curve would be when high risk is defined as a CHA2DS...