This JAMA Guide to Statistics and Methods characterizes the strengths and limitations of different approaches for modeling missing data in clinical research using the example of a trial that applied several of these techniques.
Missing data are common in clinical research, particularly for variables requiring complex, time-sensitive, resource-intensive, or longitudinal data collection methods. Even seemingly readily available information can be missing. There are many reasons for “missingness,” including missed study visits, patients lost to follow-up, missing information in source documents, lack of availability (eg, laboratory tests that were not performed), and clinical scenarios preventing collection of certain variables (eg, missing coma scale data in sedated patients). Studies are particularly challenging to interpret when primary outcome data are missing. However, many methods commonly used for handling missing values during data analysis can yield biased results, decrease study power, or lead to underestimates of uncertainty, all of which reduce the chance of drawing valid conclusions.
For example, Bakris et al1 evaluated the effect of finerenone on urinary albumin-creatinine ratio (UACR) in patients with diabetic nephropathy in a randomized, phase 2B, dose-finding clinical trial conducted at 148 sites in 23 countries. Given the logistical complexity of the study, it is not surprising that some of the intended data collection could not be completed, resulting in missing outcome data. Bakris et al used several analysis and imputation techniques (ie, methods for replacing missing data with specific values) to assess the effects of different approaches for handling missing data. These methods included complete case analysis (restricting the analysis to patients with observed 90-day UACR values); last observation carried forward (LOCF; typically, using the last recorded data point as the final outcome; Bakris et al1 used the higher of 2 UACR values and, separately, the most recent UACR obtained prior to study discontinuation); baseline observation carried forward (using the baseline UACR value as the outcome UACR value, thereby assuming no treatment effect for that patient); mean value imputation (replacing missing values with the mean of observed UACR values); and random imputation (using randomly selected UACR values to replace missing UACR values).1 Multiple imputation2 was also performed to handle missing values. With the exception of multiple imputation, each of the imputation approaches replaces a missing value with a single number (termed “single” or “simple” imputation) and can threaten the validity of study results.3,4 The authors concluded that finerenone improved the UACR, a result that was consistent regardless of the method used for handling missing data.
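The single imputation approaches described above can be illustrated with a minimal Python sketch. It is not the analysis code used by Bakris et al; the patient records, the field names (`baseline`, `interim`, `day90`), and the function names are hypothetical and were invented for illustration. The sketch also assumes that random imputation draws from the observed 90-day values, which is only one possible reading of that method.

```python
import random
import statistics

# Hypothetical UACR values (mg/g) for 4 patients; None marks a missing value.
# These numbers are invented for illustration and are not trial data.
patients = [
    {"baseline": 300.0, "interim": 250.0, "day90": 200.0},
    {"baseline": 400.0, "interim": 380.0, "day90": None},
    {"baseline": 150.0, "interim": None, "day90": None},
    {"baseline": 500.0, "interim": 450.0, "day90": 400.0},
]


def complete_case(rows):
    """Complete case analysis: keep only observed 90-day values."""
    return [p["day90"] for p in rows if p["day90"] is not None]


def locf(rows):
    """Last observation carried forward: use the most recent recorded value.

    If no postbaseline value exists, the baseline is the last observation.
    """
    outcomes = []
    for p in rows:
        recorded = [p[k] for k in ("baseline", "interim", "day90") if p[k] is not None]
        outcomes.append(recorded[-1])
    return outcomes


def bocf(rows):
    """Baseline observation carried forward: missing outcome = baseline value."""
    return [p["day90"] if p["day90"] is not None else p["baseline"] for p in rows]


def mean_imputation(rows):
    """Replace each missing outcome with the mean of the observed outcomes."""
    mean = statistics.mean(complete_case(rows))
    return [p["day90"] if p["day90"] is not None else mean for p in rows]


def random_imputation(rows, seed=0):
    """Replace each missing outcome with a randomly drawn observed outcome.

    Assumption: values are drawn from the observed 90-day outcomes.
    """
    rng = random.Random(seed)
    observed = complete_case(rows)
    return [p["day90"] if p["day90"] is not None else rng.choice(observed) for p in rows]


if __name__ == "__main__":
    for method in (complete_case, locf, bocf, mean_imputation, random_imputation):
        print(method.__name__, method(patients))
```

In this toy data set, the standard deviation of the mean-imputed outcomes is smaller than that of the observed outcomes alone, which illustrates how filling in a single value can lead to underestimates of uncertainty.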
Why Are These Methods Used?
It is rare for a research investigation not to have any missing data. If patients with missing variables are omitted from an analysis, the effective sample size is reduced and the treatment effect estimate may be incorrect.3 This is known as complete (observed) case analysis ...