We continued to develop, refine, and evaluate the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT). The BCRAT was found to underestimate risk for Latina women in the Women's Health Initiative, but recalibration to more recent Surveillance, Epidemiology, and End Results (SEER) rates improved its predictions. Dr. Mateo Banegas, an NCI Cancer Prevention Fellow, is currently developing a new model for absolute invasive breast cancer risk for Latina women. Breast, endometrial, and ovarian cancers share a hormonal etiology and epidemiologic risk factors. We used data on white, non-Hispanic women over age 50 years from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the NIH-AARP Diet and Health Study to compute relative and attributable risks, and we combined these results with incidence and mortality rates from the SEER registries to develop models that estimate a woman's absolute risk of developing breast, endometrial, or ovarian cancer over specific intervals (in press). There is interest in determining whether adding information from single nucleotide polymorphisms (SNPs) can increase the discriminatory accuracy of risk models and their usefulness for screening. We published data showing that very large sample sizes are needed in genome-wide association studies (GWAS) to achieve the full potential discriminatory accuracy inherent in SNPs. We completed research showing that with smaller GWAS samples, one should rarely include more than 100 SNPs in building risk models. Low-dose computed tomography (LDCT) screening in the National Lung Screening Trial (NLST) resulted in a 20% reduction in lung cancer mortality among smokers. We showed that lung cancer risk stratification is useful for deciding who should get LDCT screening, and that the benefit from LDCT is mainly confined to those at highest risk.
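As an illustration of how relative and attributable risks can be combined with registry incidence and competing mortality rates to give an absolute risk over an interval, the following is a minimal sketch under stated assumptions. It uses a standard piecewise-constant hazard form with one-year age bands; the function name, arguments, and all numerical rates are hypothetical and are not taken from the project's models.

```python
import math


def absolute_risk(incidence, mortality, relative_risk, attributable_risk):
    """Probability of developing the disease over len(incidence) years.

    incidence: population incidence rates per person-year, one per year
    mortality: competing-cause mortality rates per person-year, same length
    relative_risk: the woman's relative risk versus the baseline risk group
    attributable_risk: population attributable risk, used to convert the
        population incidence rate into a baseline rate
    Rates are assumed constant within each one-year band (an assumption).
    """
    surviving = 1.0  # probability of being alive and disease-free so far
    risk = 0.0
    for inc, mort in zip(incidence, mortality):
        # Cause-specific hazard for this woman in this year.
        h1 = inc * (1.0 - attributable_risk) * relative_risk
        total = h1 + mort
        if total > 0.0:
            # Probability that the disease occurs first within this year.
            risk += surviving * (h1 / total) * (1.0 - math.exp(-total))
        surviving *= math.exp(-total)
    return risk


# Hypothetical five-year projection with made-up rates.
five_year = absolute_risk(
    incidence=[0.003] * 5,
    mortality=[0.010] * 5,
    relative_risk=2.0,
    attributable_risk=0.4,
)
print(f"5-year absolute risk: {five_year:.4f}")
```

Because competing mortality removes women from the population at risk, the projected absolute risk is lower than it would be if deaths from other causes were ignored.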
We published a paper that calculates the cumulative incidence of lung cancer based on combinations of four serum inflammatory markers measured on a case-control sample of a cohort. Using data from 1.4 million women undergoing HPV testing and Pap smears in Kaiser Permanente Northern California (KPNC), we published an 8-paper monograph detailing calculations of absolute risks of precancerous lesions and cancer for all possible combinations of HPV and Pap test results, and biopsy/treatment results, over multiple screening visits. These risks were the basis for the new consensus cervical screening and management guidelines released in April 2013. We are constructing a bivariate model of the risk of cervical cancer and the chance of clearance of an HPV infection that could be useful in developing future cervical screening guidelines. We extended a binomial regression model for estimating absolute risk (and risk differences) to estimate absolute risk from population-based case-control studies, with application to estimating absolute risk of lung cancer in female vs. male smokers. We previously published two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of individuals in the population at highest risk. The second criterion is the proportion needed to follow-up, PNF(p), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion p of those destined to become cases will be followed. We recently extended these criteria by integrating PCF(q) and PNF(p) over ranges of q and p.
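The two criteria defined above can be illustrated with a minimal sketch, assuming a well-calibrated model so that each person's predicted risk equals the expected number of cases that person contributes. The function names `pcf` and `pnf` and the four example risks are hypothetical; this is a simple plug-in calculation on a sample of risks, not the estimators developed in the project.

```python
import math


def pcf(risks, q):
    """Proportion of expected cases among the fraction q at highest risk."""
    ordered = sorted(risks, reverse=True)
    # Number of people in the top-q group (small tolerance for float error).
    k = math.floor(q * len(ordered) + 1e-9)
    return sum(ordered[:k]) / sum(ordered)


def pnf(risks, p):
    """Smallest highest-risk fraction that captures a proportion p of cases."""
    if p <= 0.0:
        return 0.0
    ordered = sorted(risks, reverse=True)
    target = p * sum(ordered)
    running = 0.0
    for k, risk in enumerate(ordered, start=1):
        running += risk
        if running >= target - 1e-12:
            return k / len(ordered)
    return 1.0


# Hypothetical predicted risks for a population of four people.
example_risks = [0.4, 0.3, 0.2, 0.1]
print(pcf(example_risks, 0.5))
print(pnf(example_risks, 0.9))
```

In this toy population, the half of the population at highest risk accounts for about 70% of expected cases, and three quarters of the population must be followed to capture 90% of them.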
We also developed methods of estimating PCF(q) and PNF(p) and their integrated forms both when the risk model was assumed to be well calibrated, and on the basis of empirical data on health outcomes. The latter methods are valid even when the risk models are not well calibrated, but they yield less precise estimates. We developed approaches for estimating and performing inference on absolute risk based on representative survey data, such as the National Health and Nutrition Examination Survey (NHANES) (in press). Using influence functions, we derived variance estimates that are valid for surveys with weighting and cluster sampling. We also proposed a criterion to quantify the importance of each competing cause in the calculation of the absolute risk of a particular cause. We showed how to assess the clinical value of a biomarker from measurements in cases and controls by estimating the positive predictive value and one minus the negative predictive value, which are respectively the disease risks given a positive and a negative biomarker result.
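The relationship between case-control biomarker measurements and the two disease risks named above can be sketched with Bayes' theorem. This is a minimal illustration that assumes the disease prevalence is known from an external source, since a case-control sample alone does not determine it; the function name, the counts, and the prevalence are all hypothetical.

```python
def predictive_values(pos_cases, n_cases, pos_controls, n_controls, prevalence):
    """Return (PPV, 1 - NPV) from case-control counts and a known prevalence.

    pos_cases / n_cases estimates sensitivity; pos_controls / n_controls
    estimates one minus specificity. The prevalence is an external input.
    """
    sensitivity = pos_cases / n_cases
    false_positive_rate = pos_controls / n_controls
    specificity = 1.0 - false_positive_rate

    # Disease risk given a positive biomarker result.
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + false_positive_rate * (1.0 - prevalence)
    )
    # Disease risk given a negative biomarker result (one minus NPV).
    one_minus_npv = ((1.0 - sensitivity) * prevalence) / (
        (1.0 - sensitivity) * prevalence + specificity * (1.0 - prevalence)
    )
    return ppv, one_minus_npv


# Hypothetical study: 80 of 100 cases and 10 of 100 controls test positive,
# with an assumed disease prevalence of 10%.
ppv, one_minus_npv = predictive_values(80, 100, 10, 100, prevalence=0.10)
print(f"PPV = {ppv:.3f}, 1 - NPV = {one_minus_npv:.3f}")
```

A biomarker is informative to the extent that the risk given a positive result lies well above the prevalence and the risk given a negative result lies well below it.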

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2014) Population-based absolute risk estimation with survey data. Lifetime Data Anal 20:252-75
Pfeiffer, Ruth M (2013) Extensions of criteria for evaluating risk prediction models for public health applications. Biostatistics 14:366-81
Kovalchik, Stephanie A; Pfeiffer, Ruth M (2012) Re: Assessment of impact of outmigration on incidence of second primary neoplasms in childhood cancer survivors estimated from SEER data. J Natl Cancer Inst 104:1517-8
Gail, Mitchell H (2011) Personalized estimates of breast cancer risk in clinical practice and public health. Stat Med 30:1090-104
Pfeiffer, R M; Gail, M H (2011) Two criteria for evaluating risk prediction models. Biometrics 67:1057-65