This JAMA Guide to Statistics and Methods summarizes the Surveillance, Epidemiology, and End Results (SEER) data sets, including longitudinal details, included and excluded data, and uses in surgical research.

The Surveillance, Epidemiology, and End Results (SEER) database is a publicly available, federally funded cancer reporting system that represents a collaboration between the US Centers for Disease Control and Prevention, the National Cancer Institute, and regional and state cancer registries.1 SEER data are national, with information from 18 states that represent all regions of the country. In contrast to other commonly used data sets (eg, the National Cancer Data Base), SEER is population-based because local registries report information for all cancer cases within a specific region and/or defined racial/ethnic population. Given that SEER is both a cancer reporting system and a research tool, we aim to present salient aspects of these data, their strengths and limitations for analyses, and important statistical considerations.


Data Sources

SEER data are gathered at the local level. Trained registrars collect data from all clinical settings that diagnose or treat cancer and include patients of all ages, regardless of insurance status. Dates and causes of death come from death certificates, and mortality statistics are calculated using data from the US Census Bureau (Table 3). SEER data capture 28% of the US population; because of the targeted sampling strategy, they include a high proportion of racial/ethnic minorities, foreign-born individuals, and those with income below the federal poverty line.
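As a rough illustration of how registry death counts and census population counts can combine into a mortality statistic, the following minimal sketch computes a crude rate per 100,000 persons. It assumes the census figure serves as the denominator; the function name and the counts are hypothetical and are not SEER data or SEER software.

```python
def crude_rate_per_100k(deaths: int, population: int) -> float:
    """Return a crude mortality rate per 100,000 persons.

    deaths: number of deaths recorded for a region and period (hypothetical).
    population: population at risk for the same region and period,
        assumed here to come from census estimates.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if deaths < 0:
        raise ValueError("deaths cannot be negative")
    # Multiply before dividing to limit floating-point rounding.
    return deaths * 100_000 / population


# Hypothetical counts, for illustration only.
rate = crude_rate_per_100k(deaths=150, population=500_000)
print(f"{rate:.1f} deaths per 100,000")
```

A crude rate of this kind does not adjust for age or other population differences, so comparisons across regions or periods would typically need standardized rates instead.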


Overview of the Surveillance, Epidemiology, and End Results Database

Time Trend Data


