This JAMA Guide to Statistics and Methods discusses the use of multiple imputation in statistical analyses when data are missing for some participants in a clinical trial.
In a study published in JAMA, Asch et al1 reported results of a cluster randomized clinical trial designed to evaluate the effects of physician financial incentives, patient incentives, or shared physician and patient incentives on low-density lipoprotein cholesterol (LDL-C) levels among patients with high cardiovascular risk. Because 1 or more follow-up LDL-C measurements were missing for approximately 7% of participants, Asch et al used multiple imputation (MI) to analyze their data and concluded that shared financial incentives for physicians and patients, but not incentives to physicians or patients alone, resulted in the patients having lower LDL-C levels. Imputation is the process of replacing missing data with 1 or more specific values so that the statistical analysis can include all participants, not just those with complete data.
Missing data are common in research. In another JAMA Guide to Statistics and Methods, Newgard and Lewis2 reviewed the causes of missing data. (See the chapter Missing Data: How to Best Account for What Is Not Known.) Mechanisms of missingness are divided into 3 classes: (1) missing completely at random, the most restrictive assumption, indicating that whether a data point is missing is completely unrelated to observed and unobserved data; (2) missing at random, a more realistic assumption than missing completely at random, indicating that whether a data point is missing can be explained by the observed data; and (3) missing not at random, meaning that the missingness depends on the unobserved values. Common statistical methods used for handling missing values were also reviewed.2 When data are missing, it is important not to simply exclude cases with missing information (analyses after such exclusion are known as complete case analyses). Single-value imputation methods estimate what each missing value might have been and replace it with a single value in the data set; they include mean imputation, last observation carried forward, and random imputation. These approaches can yield biased results and are suboptimal. Multiple imputation better handles missing data by estimating and replacing missing values many times.
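The contrast between single-value imputation and multiple imputation can be illustrated with a minimal Python sketch. It is not the analysis used by Asch et al: the function names, the simulated LDL-C-like values, the 30% missingness rate, and the simple normal imputation model are all assumptions chosen for illustration, and the data are assumed to be missing completely at random. The sketch shows that mean imputation shrinks the apparent variability of the data, whereas multiple imputation combines the within-imputation and between-imputation variance (Rubin's rules) so that the uncertainty about the missing values is carried into the final estimate.

```python
import random
import statistics


def mean_impute(values):
    """Single-value imputation: replace each None with the observed mean."""
    observed = [v for v in values if v is not None]
    observed_mean = statistics.fmean(observed)
    return [observed_mean if v is None else v for v in values]


def multiply_impute_mean(values, n_imputations=20, seed=0):
    """Estimate a mean by simplified multiple imputation.

    Assumption (illustrative only): missing values are drawn from a normal
    distribution whose parameters are re-estimated from a bootstrap sample
    of the observed data in each imputation. Results are pooled with
    Rubin's rules. Returns (pooled_estimate, total_variance).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    estimates, within_variances = [], []
    for _ in range(n_imputations):
        # Bootstrap the observed data so that uncertainty in the
        # imputation model's parameters is reflected across imputations.
        boot = rng.choices(observed, k=len(observed))
        mu, sd = statistics.fmean(boot), statistics.stdev(boot)
        completed = [rng.gauss(mu, sd) if v is None else v for v in values]
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean in this completed data set.
        within_variances.append(statistics.variance(completed) / n)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(within_variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / n_imputations) * between
    return pooled, total


# Hypothetical data: 200 simulated values, about 30% set to missing at random.
rng = random.Random(42)
full = [rng.gauss(130, 30) for _ in range(200)]
with_missing = [None if rng.random() < 0.3 else v for v in full]
observed_only = [v for v in with_missing if v is not None]

sd_observed = statistics.stdev(observed_only)
sd_mean_imputed = statistics.stdev(mean_impute(with_missing))
pooled_mean, total_variance = multiply_impute_mean(with_missing)

print(f"SD of observed values:       {sd_observed:.1f}")
print(f"SD after mean imputation:    {sd_mean_imputed:.1f}")
print(f"MI pooled mean (SE):         {pooled_mean:.1f} ({total_variance ** 0.5:.2f})")
```

In this sketch the standard deviation after mean imputation is always smaller than that of the observed values, because every filled-in value sits exactly at the mean. This is the false precision that single-value methods can create. In practice, an imputation model would typically also use other observed variables, which this simplified example omits.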
Why Is Multiple Imputation Used?
Multiple imputation fills in missing values by generating plausible numbers derived from distributions of and relationships among observed variables in the data set.3 Multiple imputation differs from single imputation methods because missing data are filled in many times, with many different plausible values estimated for each missing value. Using multiple plausible values provides a quantification of the uncertainty in estimating what the missing values might be, avoiding creating false precision (as can happen with single imputation). Multiple imputation provides accurate estimates of quantities or associations of interest, such as treatment effects in randomized trials, sample means ...