This JAMA Guide to Statistics and Methods summarizes the characteristics and uses of the National Cancer Database (NCDB) in surgical research.
The National Cancer Database (NCDB) is a joint program of the American College of Surgeons Commission on Cancer (CoC) and the American Cancer Society (Box 10).1 The NCDB is a hospital-based clinical cancer registry established in 1989 that collects data from more than 1500 hospitals in the United States, capturing more than 70% of all newly diagnosed cancers.
BOX 10 Best Practices for Using the National Cancer Database (NCDB)
Ensure the NCDB is the appropriate data set to address the question of interest.
Consult an experienced user of the NCDB early.
Examine all variables and data definitions before beginning the project and define an analytic plan a priori. Keep in mind that variables may change over time.
The strengths of the NCDB are in examining treatment patterns and trends over time across the United States. Focus projects on these research questions.
Perform extensive sensitivity analyses to evaluate and address confounding and selection biases.
Handle missing data thoughtfully when using the NCDB.
In 2013, the American College of Surgeons CoC began making the participant use file (PUF) available to CoC member facilities. This led to exponential growth in the number and breadth of publications using the NCDB. With continued expansion of, and access to, the NCDB, the number of publications is expected to keep increasing.
Given the power of the NCDB, it is of primary importance to ensure that appropriate methods are used during data analysis and reporting. Like all secondary data sets, the NCDB has unique nuances and challenges that may introduce significant confounding and bias into study results. Our objectives are to present an overview of unique data elements in the NCDB and to provide an analytic framework for using the data set in research.
DATA ELEMENT CONSIDERATIONS
The NCDB PUF contains a range of data elements, including patient characteristics and comorbidities, staging data, treatment information, and survival outcomes. Specific variables and definitions can be found elsewhere.2,3 Although describing each variable is beyond our scope, we discuss a few important issues.
The PUF data set contains 2 hospital-specific variables. First, there is an anonymous randomly generated “facility ID.” This variable can be used for hospital-specific calculations, including hospital and procedure volume metrics. It is important to understand how volume calculations may change over the study period. It is common to exclude hospitals if they did not submit at least 1 case to the data set every year of the study to ensure a consistent population of hospitals. A second variable, “facility type,” details the type of cancer center (eg, community, comprehensive, ...