Investigations were conducted into the potential for using data from current and future genome-wide association studies (GWAS) to improve the performance of models for predicting disease risk. A new mathematical paradigm was developed to characterize the predictive performance of polygenic models in terms of the sample size of the training dataset, the number of underlying susceptibility loci, and the distribution of their effect sizes. The paradigm was then applied to project the performance of risk prediction models for ten different complex traits, including cancers. These projections revealed that extremely large future GWAS, with sample sizes an order of magnitude larger than even some of the largest GWAS to date, would be needed to build genetic risk models with substantially improved predictive performance. A new method was developed for assessing gene-environment interactions using data from case-control genome-wide association studies that use publicly available genetic controls. It was shown that, under a set of assumptions, it is possible to characterize joint gene-environment effects from such studies if data on environmental exposures are available from an internal case-control study, even if the controls in that study are not genotyped. New methods were developed for evaluating the association of SNP markers with disease outcomes of an ordinal nature that reflect various stages of disease progression. Two alternative tests, the maximum score test (MAX) and the adaptive P-value combination test (Adapt-P), were proposed with the aim of striking a balance between efficiency and robustness over the possible alternative models by which a SNP might be involved in the various stages. Simulation studies were used to demonstrate that MAX and Adapt-P have the most robust performance among a range of tests under various realistic scenarios.
A permutation-based resampling method was developed for using metabolomic data to test the hypothesis that the effect of an exposure (e.g., smoking) on the risk of a disease (e.g., lung cancer) is mediated through intermediate biomarkers. Extensive simulation studies were used to examine the validity and power of the proposed test. Methods were developed for the analysis of population-based case-control studies with complex sampling designs. Two methods were developed for incorporating the information contained in the sample weights by modeling the sample expectation of the weights conditional on design variables. These methods have higher efficiency and smaller finite-sample bias than the standard estimators that use the original sample weights. The methods were applied to the U.S. Kidney Cancer Case-Control Study to identify risk factors. A project developed a linear-expit regression model (LEXPIT) that incorporates linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, while the exponentiated coefficients of the nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. The method was applied to estimate the absolute five-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus (HPV) test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to an abnormal Pap test in HPV-negative women that was not detected with logistic regression. An R package, blm, was developed to provide free and easy-to-use software for fitting the LEXPIT model.
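As an illustration of how the LEXPIT combines the two kinds of terms, the following minimal Python sketch assumes the model takes the form risk = x'beta + expit(z'gamma), where expit(u) = 1 / (1 + exp(-u)). This form is an assumption chosen to match the description above, not a statement taken from the project itself. The function names and all coefficient values are hypothetical and chosen only for illustration; this is not the blm package and it does no model fitting.

```python
import math


def expit(u):
    """Inverse logit: maps a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def lexpit_risk(x, beta, z, gamma):
    """Absolute risk under an assumed LEXPIT form: x'beta + expit(z'gamma).

    x, beta  : covariates and coefficients of the linear (additive) part
    z, gamma : covariates and coefficients of the expit (nonlinear) part;
               z[0] is taken to be 1 so that gamma[0] acts as an intercept
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    nonlinear = expit(sum(g * zi for g, zi in zip(gamma, z)))
    risk = linear + nonlinear
    if not 0.0 <= risk <= 1.0:
        # A fitted model must constrain risks to [0, 1]; here we only check.
        raise ValueError("risk outside [0, 1] for these inputs")
    return risk


# Hypothetical coefficients (illustrative only, not estimates from any study).
beta = [0.02]         # linear term: e.g. an abnormal screening result
gamma = [-4.0, 1.5]   # expit part: intercept and one binary covariate

risk_unexposed = lexpit_risk([0], beta, [1, 0], gamma)
risk_exposed = lexpit_risk([1], beta, [1, 0], gamma)

# The linear coefficient is the adjusted risk difference.
risk_difference = risk_exposed - risk_unexposed

# The exponentiated expit coefficient is the residual odds ratio.
residual_odds_ratio = math.exp(gamma[1])
```

In this sketch the difference between the exposed and unexposed risks equals the linear coefficient regardless of the values in the expit part, which is what allows the linear terms to be read directly as adjusted risk differences.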

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Gail, Mitchell H; Wu, Jincao; Wang, Molin et al. (2016) Calibration and seasonal adjustment for matched case-control studies of vitamin D and cancer. Stat Med 35:2133-48
Lubin, Jay H; Albanes, Demetrius; Hoppin, Jane A et al. (2016) Greater Coronary Heart Disease Risk With Lower Intensity and Longer Duration Smoking Compared With Higher Intensity and Shorter Duration Smoking: Congruent Results Across Diverse Cohorts. Nicotine Tob Res :
Flegal, Katherine M; Panagiotou, Orestis A; Graubard, Barry I (2015) Estimating population attributable fractions to quantify the health burden of obesity. Ann Epidemiol 25:201-7
Han, Summer S; Rosenberg, Philip S; Ghosh, Arpita et al. (2015) An exposure-weighted score test for genetic associations integrating environmental risk factors. Biometrics 71:596-605
Yao, Wenliang; Li, Zhaohai; Graubard, Barry I (2015) Estimation of ROC curve with complex survey data. Stat Med 34:1293-303
Li, Y; Graubard, B I; Huang, P et al. (2015) Extension of the Peters-Belson method to estimate health disparities among multiple groups using logistic regression with survey data. Stat Med 34:595-612
Panagiotou, Orestis A; Travis, Ruth C; Campa, Daniele et al. (2015) A genome-wide pleiotropy scan for prostate cancer risk. Eur Urol 67:649-57
Gail, Mitchell H (2014) Using absolute risks to assess the risks and benefits of treatment. Thorax 69:604-5
Boca, Simina M; Sinha, Rashmi; Cross, Amanda J et al. (2014) Testing multiple biological mediators simultaneously. Bioinformatics 30:214-20
Ghosh, Arpita; Hartge, Patricia; Kraft, Peter et al. (2014) Leveraging family history in population-based case-control association studies. Genet Epidemiol 38:114-22
