We continued to develop, refine, and evaluate the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT). Working with a summer fellow, Mateo Banegas of the University of Washington in Seattle, we quantified the performance of BCRAT for Latina women in the Women's Health Initiative. We found that BCRAT somewhat underestimated risk, but that recalibration to more recent Surveillance, Epidemiology, and End Results (SEER) rates improved its predictions. Dr. Banegas recently received his doctorate and is pursuing research to develop a new model of absolute breast cancer risk for Latina women.

Breast, endometrial, and ovarian cancers share a hormonal etiology and epidemiologic risk factors. Using data on white, non-Hispanic women over age 50 years from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the AARP-NIH Diet and Health Study, and combining relative risk (RR) estimates with incidence and mortality rates from the SEER registries, we developed models to estimate a woman's absolute risk of developing breast, endometrial, or ovarian cancer over specific intervals. Work to refine and validate the models is ongoing.

We developed models to predict the absolute risk of thyroid cancer following treatment of primary childhood cancers. These models have good discriminatory accuracy, as measured by the area under the ROC curve (AUC), and can yield 20-year risks of up to 7% in the presence of strong risk factors, such as thyroid nodules and neck radiation. The models were validated in independent cohort data.

There is interest in determining whether adding information from single nucleotide polymorphisms (SNPs) can increase the discriminatory accuracy of risk models and their usefulness for screening. We examined whether foreseeable genome-wide association studies will discover enough SNPs with sufficiently strong associations to make substantial further improvements in risk prediction.
We evaluated several criteria for a range of cancers and concluded that the contributions from future SNP discoveries are likely to be modest.

Using data from 1.4 million women undergoing HPV testing and Pap smears at Kaiser Permanente Northern California (KPNC), we calculated absolute risks of precancer and cancer for all possible combinations of HPV and Pap test results over multiple screening visits. These risks are critical inputs for the consensus cervical screening and management guidelines committee meeting in September 2012. We are constructing a bivariate model of the risk of cervical cancer and the chance of clearance of an HPV infection. For each possible screening interval, the model provides the risk of cancer and the chance that HPV will clear naturally without intervention. This information should be useful in developing cervical screening guidelines.

We are estimating absolute mortality risks within lung cancer risk-based strata in the PLCO to assess whether some risk-based subsets of smokers might have a mortality benefit from chest X-ray lung cancer screening. We developed methods and software for fitting and validating a risk prediction model when key exposures are measured on only a subset of a cohort, with application to developing a new model of lung cancer risk that includes combinations of serum inflammatory markers measured on only a subset of the cohort.

We proposed a new binomial regression model for estimating absolute risk (and risk differences) that allows one to include exposures that have logistic or linear effects. We extended this model to estimate absolute risk from population-based case-control studies, with application to estimating the absolute risk of lung cancer in female vs. male smokers.

We proposed and published two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis.
The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of the population at highest risk. The second criterion, the proportion needed to follow-up, PNF(p), is the proportion of the general population at highest risk that one needs to follow so that a proportion p of those destined to become cases will be followed. We recently extended these criteria by integrating PCF(q) and PNF(p) over ranges of q and p. We also developed methods for estimating PCF(q), PNF(p), and their integrated forms, both under the assumption that the risk model is well calibrated and on the basis of empirical data on health outcomes. The latter methods remain valid even when the risk model is not well calibrated, but they yield less precise estimates.

We developed approaches for estimating and performing inference on absolute risk based on representative survey data, such as the National Health and Nutrition Examination Survey (NHANES). Using influence functions, we derived variance estimates that are valid for surveys with weighting and cluster sampling. We also proposed a criterion to quantify the importance of each competing cause in the calculation of the absolute risk of a particular cause.

We developed multiple imputation methods to estimate the absolute risk of death when some individuals were known to have died but their causes of death were unknown. In separate work, we showed how these methods could be used with SEER data, in which only a small proportion of deaths have missing cause-of-death information, to estimate the absolute risk of dying from an incident cancer.
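
As an illustration of the PCF(q) and PNF(p) definitions given above, the following is a minimal Python sketch, not the project's software. It computes both criteria from a list of predicted risks under the assumption that the model is well calibrated, and PCF(q) from observed outcomes. The function names (pcf_model, pnf_model, pcf_empirical), the floor rule for counting the top fraction, and the numerical tolerance are assumptions made for this example only. The published estimators and their variance calculations are not reproduced here.

```python
from typing import Sequence

_EPS = 1e-9  # tolerance for floating-point error (an assumption of this sketch)


def _top_count(n: int, q: float) -> int:
    """Number of people in the top fraction q of a population of size n (floor)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return int(q * n + _EPS)


def pcf_model(risks: Sequence[float], q: float) -> float:
    """PCF(q) assuming calibration: expected cases in a group equal the
    sum of that group's predicted risks."""
    ordered = sorted(risks, reverse=True)  # highest risk first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("predicted risks must sum to a positive value")
    return sum(ordered[:_top_count(len(ordered), q)]) / total


def pnf_model(risks: Sequence[float], p: float) -> float:
    """PNF(p) assuming calibration: smallest top-risk fraction of the
    population whose expected cases reach a share p of all expected cases."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        return 0.0
    ordered = sorted(risks, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("predicted risks must sum to a positive value")
    running = 0.0
    for k, risk in enumerate(ordered, start=1):
        running += risk
        if running >= p * total - _EPS:
            return k / len(ordered)
    return 1.0


def pcf_empirical(risks: Sequence[float], is_case: Sequence[bool], q: float) -> float:
    """PCF(q) from observed outcomes: share of observed cases who rank in
    the top fraction q by predicted risk (ties keep their input order)."""
    if len(risks) != len(is_case):
        raise ValueError("risks and is_case must have the same length")
    ranked = sorted(zip(risks, is_case), key=lambda pair: pair[0], reverse=True)
    n_cases = sum(1 for _, case in ranked if case)
    if n_cases == 0:
        raise ValueError("at least one observed case is required")
    followed = ranked[:_top_count(len(ranked), q)]
    return sum(1 for _, case in followed if case) / n_cases


if __name__ == "__main__":
    # Hypothetical predicted risks for ten people (illustrative values only).
    risks = [0.30, 0.20, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
    print(round(pcf_model(risks, 0.2), 3))  # share of expected cases in the top 20%
    print(round(pnf_model(risks, 0.5), 3))  # fraction to follow to reach half of cases
```

In this made-up population the two highest-risk people carry 0.50 of the total predicted risk of 1.00, so following the top 20% captures about half of the expected cases, and conversely about 20% of the population must be followed to capture half of them. The empirical version replaces expected cases with observed ones. That avoids relying on calibration, at the cost of more sampling variability, which is consistent with the trade-off described above.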

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Cheung, Li C; Pan, Qing; Hyun, Noorie et al. (2017) Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records. Stat Med 36:3583-3595
Katki, Hormuzd A; Kovalchik, Stephanie A; Berg, Christine D et al. (2016) Development and Validation of Risk Models to Select Ever-Smokers for CT Lung Cancer Screening. JAMA 315:2300-11
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2014) Population-based absolute risk estimation with survey data. Lifetime Data Anal 20:252-75
Pfeiffer, Ruth M; Park, Yikyung; Kreimer, Aimée R et al. (2013) Risk prediction for breast, endometrial, and ovarian cancer in white women aged 50 y or older: derivation and validation from population-based cohort studies. PLoS Med 10:e1001492
Pfeiffer, Ruth M (2013) Extensions of criteria for evaluating risk prediction models for public health applications. Biostatistics 14:366-81
Riedl, Regina; Engels, Eric A; Warren, Joan L et al. (2013) Blood transfusions and the subsequent risk of cancers in the United States elderly. Transfusion 53:2198-206
Kovalchik, Stephanie A; Ronckers, Cécile M; Veiga, Lene H S et al. (2013) Absolute risk prediction of second primary thyroid cancer among 5-year survivors of childhood cancer. J Clin Oncol 31:119-27
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2012) Re: Assessment of impact of outmigration on incidence of second primary neoplasms in childhood cancer survivors estimated from SEER data. J Natl Cancer Inst 104:1517-8
Gail, Mitchell H (2011) Personalized estimates of breast cancer risk in clinical practice and public health. Stat Med 30:1090-104
Pfeiffer, R M; Gail, M H (2011) Two criteria for evaluating risk prediction models. Biometrics 67:1057-65
