This JAMA Guide to Statistics and Methods discusses cardinality matching, a method for finding the largest possible number of matched pairs in an observational data set, with the goal of producing exposed and unexposed groups of study participants that are balanced and representative.
Cardinality matching is a computational method for finding the largest possible number of matched pairs of exposed and unexposed individuals from an observational data set, with specified patterns of baseline characteristics that represent a target population for analysis. In a report in JAMA Network Open, Benjet et al1 used cardinality matching to examine the association of neighborhood exposure to violence during an armed conflict in Nepal with the incidence of major depressive disorder in younger children and older individuals. The authors found that children younger than 11 years at the start of the conflict who were exposed to violence during the conflict were significantly more likely to develop major depressive disorder than matched children in unexposed neighborhoods. In contrast, there was no association between exposure to violence and development of depression among individuals aged 11 years or older.
Why Is Cardinality Matching Used?
In an observational study, treatment assignment is not controlled by the investigator, often resulting in exposed and unexposed groups that are not comparable. This makes evaluating the possible effects of the exposure challenging because differences in outcomes could be due to differences in characteristics other than the exposure. Randomized studies overcome this problem by controlling treatment assignment and producing treated and control groups that are similar on average. In many contexts, such as studying the effects of violence, randomization is infeasible. Thus, an alternative is to analyze observational data in a way that approximates a randomized study as closely as possible.2
An important component of designing observational studies is to ensure that the exposed and unexposed groups being compared are as similar as possible on other characteristics (ie, confounders) that might be related to the exposure and that also might influence outcomes. Matching is a method for constructing comparable, or “balanced,” exposed and unexposed groups from observational data. Often, this is done by first estimating each participant’s propensity score, which is the probability of having the exposure under study given the individual’s measured characteristics.3,4 Then, exposed participants are matched to unexposed participants with similar propensity scores,5,6 often using a nearest-neighbor algorithm that finds the closest matches first and proceeds until the differences in propensity scores become too large for the participants to be considered matched.
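The nearest-neighbor procedure described above can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated (for example, by logistic regression) and uses made-up scores purely for illustration. The function name, the example values, and the caliper of 0.1 (the largest score difference still treated as a match) are hypothetical choices, not values from the study discussed here.

```python
from itertools import product


def greedy_nearest_neighbor_match(exposed_ps, unexposed_ps, caliper=0.1):
    """Pair exposed and unexposed units by propensity score, closest pairs first.

    exposed_ps, unexposed_ps: lists of already-estimated propensity scores.
    caliper: hypothetical maximum score difference allowed for a match.
    Returns a list of (exposed_index, unexposed_index) pairs.
    """
    # Enumerate every candidate pair and sort by score difference,
    # so the closest pairs are considered first.
    candidates = sorted(
        (abs(e - u), i, j)
        for (i, e), (j, u) in product(enumerate(exposed_ps), enumerate(unexposed_ps))
    )

    used_exposed, used_unexposed, pairs = set(), set(), []
    for distance, i, j in candidates:
        if distance > caliper:
            break  # all remaining candidate pairs are even farther apart
        if i in used_exposed or j in used_unexposed:
            continue  # each participant is matched at most once
        pairs.append((i, j))
        used_exposed.add(i)
        used_unexposed.add(j)
    return pairs


# Illustrative (made-up) propensity scores.
exposed = [0.30, 0.55, 0.90]
unexposed = [0.28, 0.33, 0.52, 0.70]

pairs = greedy_nearest_neighbor_match(exposed, unexposed, caliper=0.1)
unmatched = sorted(set(range(len(exposed))) - {i for i, _ in pairs})

print("matched pairs (exposed index, unexposed index):", pairs)
print("unmatched exposed indices:", unmatched)
```

In this toy example, the exposed participant with a score of 0.90 has no unexposed counterpart within the caliper and is left unmatched. Practical implementations differ in details such as matching order, matching with replacement, and how the caliper is scaled.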
After matching, the balance of measured characteristics can be checked transparently through simple figures and tables, and if the characteristics are not balanced, the matches can be modified until they are. However, this method often excludes some exposed participants because they may be too different from any of the unexposed participants to be matched. Once exposed participants are excluded, ...