We continued to develop, refine, and evaluate the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT). Using data from the Asian American Breast Cancer Study, we obtained ethnicity-specific relative risks and attributable risks for Asian American women, and we coupled these with age- and ethnicity-specific breast cancer incidence rates from SEER to produce absolute risks. The model was assessed for validity in independent data from the Women's Health Initiative. This work has been published and incorporated into BCRAT. With a predoctoral Fellow, Elisabetta Petracci, we developed a breast cancer risk model that included modifiable risk factors, such as alcohol consumption, lack of exercise, and body mass index (BMI). We estimated the reductions in absolute risk that could be obtained by reducing these risk factors. With a summer Fellow, Mateo Banegas, a predoctoral student at the University of Washington in Seattle, we quantified the performance of BCRAT for Latina women in the Women's Health Initiative. We found that BCRAT underestimated risk somewhat, but that recalibration to more recent SEER rates improved its predictions.

We analyzed data from the Washington Ashkenazi Study to address whether a woman from a high-risk family known to carry mutations in the BRCA1 or BRCA2 genes has above-average risk of breast cancer even if she is found not to carry a mutation. Because most of the familial correlation in breast cancer risk is not due to BRCA1 or BRCA2 mutations, and because most high-risk families are ascertained because several members are affected, there is reason to believe that such a woman remains at higher risk than the general population, even though her risk is not as high as that of a mutation carrier. Our data and review of the literature indicate that such residual familial risk can affect clinical management.
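
The coupling of relative risks and attributable risks with registry incidence rates described above can be illustrated with a minimal sketch. It is not the BCRAT implementation. It assumes piecewise-constant hazards within one-year age bands, a single attributable risk and a single relative risk applied at all ages, and a competing-mortality hazard; the function name, argument names, and all rates in the usage lines are hypothetical.

```python
import math


def absolute_risk(age_start, age_end, incidence, mortality,
                  attributable_risk, relative_risk):
    """Project absolute risk of disease over [age_start, age_end).

    incidence, mortality: dicts mapping integer age to an annual hazard
    (hypothetical inputs standing in for registry rates).
    attributable_risk: fraction of incidence attributed to the risk factors,
    used to convert the composite incidence rate into a baseline hazard.
    relative_risk: the individual's risk relative to the baseline profile.
    """
    survival = 1.0  # probability of being alive and disease-free so far
    risk = 0.0
    for age in range(age_start, age_end):
        # Baseline hazard = composite incidence * (1 - AR); scale by the RR.
        h_disease = incidence[age] * (1.0 - attributable_risk) * relative_risk
        h_death = mortality[age]
        h_total = h_disease + h_death
        if h_total > 0.0:
            # Chance of leaving the disease-free state during this year,
            # times the share of exits that are due to the disease.
            risk += survival * (h_disease / h_total) * (1.0 - math.exp(-h_total))
        survival *= math.exp(-h_total)
    return risk


# Usage with made-up rates (not SEER values): 5-year risk from age 50.
ages = range(50, 55)
incidence = {a: 0.003 for a in ages}
mortality = {a: 0.004 for a in ages}
five_year_risk = absolute_risk(50, 55, incidence, mortality,
                               attributable_risk=0.5, relative_risk=2.0)
```

Treating death from other causes as a competing risk keeps the projected risk from being overstated for older women; setting the mortality hazard to zero recovers the simple cumulative-incidence formula.
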
There is interest in determining whether adding information from single nucleotide polymorphisms (SNPs) can increase the discriminatory accuracy of risk models and their usefulness for screening. We examined whether foreseeable genome-wide association studies will discover enough SNPs with sufficiently strong associations to make substantial further improvements in risk prediction. We evaluated several criteria for a range of cancers and concluded that the contributions from future SNP discoveries are likely to be modest. A related commentary indicated that other genetic variations, such as copy number variants or rare strong variants, might provide some additional discriminatory power.

Breast, endometrial, and ovarian cancers share a hormonal etiology and epidemiologic risk factors. While several models predict absolute risk of breast cancer, few predict risk of ovarian cancer in the general population, and none predict risk of endometrial cancer. Using data on white, non-Hispanic women over age 50 years from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the NIH-AARP Diet and Health Study, and combining relative risks (RRs) with incidence and mortality rates from the Surveillance, Epidemiology and End Results (SEER) registries, we developed models to estimate a woman's absolute risk of developing breast, endometrial, or ovarian cancer over specific intervals. The models were validated using independent data from the Nurses' Health Study cohort. Risk factors included in all models were parity and hormone replacement therapy use. In addition, the breast cancer model included age at first live birth, age at menopause, family history of breast or ovarian cancer, history of gynecologic surgeries and benign breast disease/breast biopsies, alcohol consumption, and body mass index (BMI); the endometrial model included age at menopause, BMI, smoking, and oral contraceptive (OC) use; and the ovarian model included OC use and family history of breast or ovarian cancer.
All models were well calibrated: the ratio of expected (E) to observed (O) cancers was E/O = 1.03 (95% confidence interval [CI] = 0.99 to 1.07) for breast cancer, E/O = 1.00 (95% CI = 0.91 to 1.11) for endometrial cancer, and E/O = 0.91 (95% CI = 0.81 to 1.01) for ovarian cancer.

We estimated cervical cancer risk from 330,000 women undergoing HPV testing and Pap smears in Kaiser Permanente Northern California (KPNC). We are further estimating absolute risks in KPNC for new combinations of HPV and Pap test results, and over multiple screening visits. We are constructing a model based on HPV testing to help guide diagnostic testing and treatment of women at risk of cervical cancer. We are developing a statistical model that will separate the screening protocol from natural history, which will allow us to estimate risk for any screening protocol.

We estimated absolute risks of lung cancer based on combinations of serum inflammatory markers. We are estimating absolute mortality risks within lung cancer risk-based strata in the PLCO to assess whether some risk-based subsets of smokers might have a mortality benefit from chest x-ray lung cancer screening.

We developed a model to predict which patients with chronic hepatitis C would have a sustained virologic response to interferon/ribavirin treatment, based on the IL28B rs12979860-CC genotype and four clinical predictors.

We proposed and published two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of individuals in the population at highest risk.
The second criterion is the proportion needed to follow-up, PNF(p), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion p of those destined to become cases will be followed.

We developed imputation methods for projecting absolute risk of dying from an incident cancer in SEER when cause-of-death information is missing in some cases. With Arpita Ghosh (Visiting Fellow), we are developing methods and software for estimating and validating risk prediction models when some exposures are observed on only a complex sample of a cohort. We developed an influence-function-based approach to compute the variances of estimates of absolute risk and of functions of absolute risk. We applied this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration, we used an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence-function-based variance estimates for absolute risk and for the criteria were compared with bootstrap variance estimates.
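
The two criteria defined above, PCF(q) and PNF(p), can be illustrated with a simple plug-in calculation on a list of predicted risks. This is a sketch under stated assumptions, not the estimators or software from the published work: it assumes the model is well calibrated, so that each person's predicted risk equals their chance of becoming a case and the expected number of cases in a group is the sum of its risks. The function names `pcf` and `pnf` and the example risks are hypothetical.

```python
def pcf(risks, q):
    """Proportion of cases followed: expected share of all future cases
    found among the fraction q of the population at highest risk."""
    ordered = sorted(risks, reverse=True)
    # Size of the top-q group, rounded to the nearest whole person.
    k = int(round(q * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


def pnf(risks, p):
    """Proportion needed to follow-up: smallest fraction of the population,
    taken from the highest risk downward, expected to contain at least a
    proportion p of all future cases."""
    ordered = sorted(risks, reverse=True)
    target = p * sum(ordered)
    cumulative = 0.0
    for count, risk in enumerate(ordered, start=1):
        cumulative += risk
        # Small tolerance guards against floating-point round-off.
        if cumulative >= target - 1e-12:
            return count / len(ordered)
    return 1.0


# Usage with made-up predicted risks for four people.
example_risks = [0.5, 0.25, 0.125, 0.125]
share_of_cases_in_top_half = pcf(example_risks, 0.5)
fraction_needed_for_three_quarters = pnf(example_risks, 0.75)
```

When every person has the same risk, PCF(q) equals q and PNF(p) equals p, so following a high-risk subgroup offers no advantage; the more the risks are spread out, the larger PCF(q) becomes and the smaller PNF(p) becomes.
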

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Katki, Hormuzd A; Kovalchik, Stephanie A; Berg, Christine D et al. (2016) Development and Validation of Risk Models to Select Ever-Smokers for CT Lung Cancer Screening. JAMA 315:2300-11
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2014) Population-based absolute risk estimation with survey data. Lifetime Data Anal 20:252-75
Pfeiffer, Ruth M; Park, Yikyung; Kreimer, Aimée R et al. (2013) Risk prediction for breast, endometrial, and ovarian cancer in white women aged 50 y or older: derivation and validation from population-based cohort studies. PLoS Med 10:e1001492
Riedl, Regina; Engels, Eric A; Warren, Joan L et al. (2013) Blood transfusions and the subsequent risk of cancers in the United States elderly. Transfusion 53:2198-206
Pfeiffer, Ruth M (2013) Extensions of criteria for evaluating risk prediction models for public health applications. Biostatistics 14:366-81
Kovalchik, Stephanie A; Ronckers, Cécile M; Veiga, Lene H S et al. (2013) Absolute risk prediction of second primary thyroid cancer among 5-year survivors of childhood cancer. J Clin Oncol 31:119-27
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2012) Re: Assessment of impact of outmigration on incidence of second primary neoplasms in childhood cancer survivors estimated from SEER data. J Natl Cancer Inst 104:1517-8
Pfeiffer, R M; Gail, M H (2011) Two criteria for evaluating risk prediction models. Biometrics 67:1057-65
Gail, Mitchell H (2011) Personalized estimates of breast cancer risk in clinical practice and public health. Stat Med 30:1090-104
Gail, Mitchell H; Graubard, Barry; Williamson, David F et al. (2009) Comments on 'Choice of time scale and its effect on significance of predictors in longitudinal studies' by Michael J. Pencina, Martin G. Larson and Ralph B. D'Agostino, Statistics in Medicine 2007; 26:1343-1359. Stat Med 28:1315-7