The potential for using data from current and future genome-wide association studies (GWAS) to improve the performance of disease risk prediction models was investigated. A new mathematical paradigm was developed to characterize the predictive performance of polygenic models in terms of the sample size of the training dataset, the number of underlying susceptibility loci, and the distribution of their effect sizes. The paradigm was then applied to project the performance of risk prediction models for ten complex traits, including cancers. These projections revealed that extremely large future GWAS, with sample sizes an order of magnitude larger than even some of the largest GWAS to date, would be needed to build genetic risk models with substantially improved predictive performance. A new method was developed for assessing gene-environment interactions using data from case-control genome-wide association studies that use publicly available genetic controls. It was shown that, under a set of assumptions, it is possible to characterize joint gene-environment effects from such studies if data on environmental exposures are available from an internal case-control study, even if the controls in that study are not genotyped. New methods were developed for evaluating the association of SNP markers with disease outcomes of an ordinal nature that reflect various stages in the progression of a disease. Two alternative tests, the maximum score test (MAX) and the adaptive P-value combination test (Adapt-P), were proposed with the aim of striking a balance between efficiency and robustness over the possible alternative models by which a SNP might be involved in the various stages. Simulation studies were used to demonstrate that MAX and Adapt-P have the most robust performance among a range of tests under various realistic scenarios.
A permutation-based resampling method was developed for using metabolomic data to test the hypothesis that the effect of an exposure (e.g., smoking) on the risk of a disease (e.g., lung cancer) is mediated through intermediate biomarkers. Extensive simulation studies were used to examine the validity and power of the proposed test. Methods were developed for the analysis of population-based case-control studies with complex sampling designs. Two methods were developed for incorporating the information contained in the sample weights by modeling the sample expectation of the weights conditional on design variables. These methods have higher efficiency and smaller finite-sample bias than the standard estimators that use the original sample weights. The methods were applied to the U.S. Kidney Cancer Case-Control Study to identify risk factors. A linear-expit regression model (LEXPIT) was developed to incorporate linear and nonlinear risk effects when estimating absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, while the exponentiated coefficients of the nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. The method was applied to estimate the absolute five-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus (HPV) test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to an abnormal Pap test in HPV-negative women that was not detected with logistic regression. An R package, blm, was developed to provide free and easy-to-use software for fitting the LEXPIT model.
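
The interpretation of the LEXPIT coefficients described above can be illustrated with a minimal Python sketch. It assumes the risk takes the form risk = x'beta + expit(intercept + z'gamma), which is one way to write a model that combines a linear part with a logistic part. The function names and coefficient values are hypothetical, chosen only for illustration, and the code does not reproduce the interface of the blm package.

```python
import math


def expit(t):
    """Inverse logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def lexpit_risk(x, beta, z, gamma, intercept):
    """Absolute risk under the assumed form x'beta + expit(intercept + z'gamma)."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    nonlinear = intercept + sum(g * zi for g, zi in zip(gamma, z))
    risk = linear + expit(nonlinear)
    if not 0.0 <= risk <= 1.0:
        raise ValueError("parameters imply a risk outside [0, 1]")
    return risk


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients (not estimates from any study).
beta = [0.03]             # linear term: additive effect on absolute risk
gamma = [math.log(2.0)]   # nonlinear term: log odds ratio of 2
intercept = -4.0          # baseline log odds inside the expit part

r_base = lexpit_risk([0], beta, [0], gamma, intercept)
r_linear = lexpit_risk([1], beta, [0], gamma, intercept)
r_nonlinear = lexpit_risk([0], beta, [1], gamma, intercept)

# Changing x by one unit shifts the absolute risk by beta, whatever z is.
risk_difference = r_linear - r_base

# With x = 0 the linear part vanishes, so the odds ratio for z is exp(gamma).
residual_odds_ratio = odds(r_nonlinear) / odds(r_base)

print(f"risk difference for x: {risk_difference:.4f}")
print(f"residual odds ratio for z: {residual_odds_ratio:.4f}")
```

In this sketch the linear coefficient reads directly as a risk difference, while the exponentiated nonlinear coefficient is an odds ratio for the part of the risk that remains after the linear contribution is subtracted.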

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Rosenberg, Philip S; Check, David P; Anderson, William F (2014) A web tool for age-period-cohort analysis of cancer incidence and mortality rates. Cancer Epidemiol Biomarkers Prev 23:2296-302
Gail, Mitchell H (2014) Using absolute risks to assess the risks and benefits of treatment. Thorax 69:604-5
Boca, Simina M; Sinha, Rashmi; Cross, Amanda J et al. (2014) Testing multiple biological mediators simultaneously. Bioinformatics 30:214-20
Ghosh, Arpita; Hartge, Patricia; Kraft, Peter et al. (2014) Leveraging family history in population-based case-control association studies. Genet Epidemiol 38:114-22
Lubin, Jay H; De Stefani, Eduardo; Abnet, Christian C et al. (2014) Mate drinking and esophageal squamous cell carcinoma in South America: pooled results from two large multicenter case-control studies. Cancer Epidemiol Biomarkers Prev 23:107-16
Shi, Jianxin; Yang, Xiaohong R; Caporaso, Neil E et al. (2014) VTET: a variable threshold exact test for identifying disease-associated copy number variations enriched in short genomic regions. Front Genet 5:53
Turesson, Ingemar; Kovalchik, Stephanie A; Pfeiffer, Ruth M et al. (2014) Monoclonal gammopathy of undetermined significance and risk of lymphoid and myeloid malignancies: 728 cases followed up to 30 years in Sweden. Blood 123:338-45
Katki, Hormuzd A; Schiffman, Mark; Castle, Philip E et al. (2013) Five-year risks of CIN 3+ and cervical cancer among women with HPV testing of ASC-US Pap results. J Low Genit Tract Dis 17:S36-42
Wentzensen, Nicolas; Wacholder, Sholom (2013) From differences in means between cases and controls to risk stratification: a business plan for biomarker development. Cancer Discov 3:148-57
Breslow, Rosalind A; Chen, Chiung M; Graubard, Barry I et al. (2013) Diets of drinkers on drinking and nondrinking days: NHANES 2003-2008. Am J Clin Nutr 97:1068-75
