Investigations were conducted into the potential of data from current and future genome-wide association studies (GWAS) to improve the performance of models for predicting disease risk. A new mathematical paradigm was developed to characterize the predictive performance of polygenic models in terms of the sample size of the training dataset, the number of underlying susceptibility loci, and the distribution of their effect sizes. The paradigm was then applied to project the performance of risk prediction models for ten complex traits, including cancers. These projections revealed that extremely large future GWAS, with sample sizes an order of magnitude larger than even some of the largest GWAS to date, would be needed to build genetic risk models with substantially improved predictive performance.
A new method was developed for assessing gene-environment interactions with data from case-control GWAS that use publicly available genetic controls. It was shown that, under a set of assumptions, joint gene-environment effects can be characterized from such studies if data on environmental exposures are available from an internal case-control study, even if the controls in that study are not genotyped.
New methods were developed for evaluating the association of SNP markers with ordinal disease outcomes that reflect successive stages of disease progression. Two alternative tests, the maximum score test (MAX) and the adaptive P-value combination test (Adapt-P), were proposed to strike a balance between efficiency and robustness across the possible models by which a SNP might be involved in the various stages. Simulation studies demonstrated that MAX and Adapt-P had the most robust performance among a range of tests under various realistic scenarios.
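The polygenic projections described above link predictive performance to how much genetic variance a fitted risk score captures. As a minimal sketch, assuming a rare disease and a polygenic log-odds score that is normally distributed in the population with variance sigma2, a standard approximation gives AUC = Phi(sqrt(sigma2 / 2)). The function names and the "captured fraction" input are hypothetical illustrations; the sketch does not reproduce the project's own paradigm, which additionally ties the captured fraction to training sample size, number of loci, and effect-size distribution.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def auc_from_score_variance(sigma2: float) -> float:
    """Approximate AUC of a polygenic risk score (illustrative assumptions).

    Assumes the score S is on the log-odds scale and S ~ N(0, sigma2) in
    controls. For a rare disease, risk is roughly proportional to exp(S),
    so the case distribution is the exponentially tilted normal
    N(sigma2, sigma2). Then
        AUC = P(S_case > S_control) = Phi(sigma2 / sqrt(2 * sigma2))
            = Phi(sqrt(sigma2 / 2)).
    """
    if sigma2 < 0:
        raise ValueError("variance must be non-negative")
    return normal_cdf(math.sqrt(sigma2 / 2.0))


def projected_auc(total_genetic_variance: float, captured_fraction: float) -> float:
    """AUC when a fitted score captures only part of the genetic variance.

    'captured_fraction' is a hypothetical input here; in practice it would
    depend on GWAS sample size, number of loci and effect-size distribution.
    """
    if not 0.0 <= captured_fraction <= 1.0:
        raise ValueError("captured_fraction must lie in [0, 1]")
    return auc_from_score_variance(captured_fraction * total_genetic_variance)


if __name__ == "__main__":
    # Hypothetical total genetic variance of 1.0 on the log-odds scale.
    for frac in (0.1, 0.25, 0.5, 1.0):
        print(f"captured {frac:.2f}: AUC = {projected_auc(1.0, frac):.3f}")
```

The loop illustrates the qualitative point of the projections: AUC rises slowly as a larger share of the genetic variance is captured, so large gains require scores that capture most of it.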
A permutation-based resampling method was developed for using metabolomic data to test the hypothesis that the effect of an exposure (e.g., smoking) on the risk of a disease (e.g., lung cancer) is mediated through intermediate biomarkers. Extensive simulation studies were used to examine the validity and power of the proposed test.
Methods were developed for the analysis of population-based case-control studies with complex sampling designs. Two methods were developed for incorporating the information in the sample weights by modeling the sample expectation of the weights conditional on design variables. These methods have higher efficiency and smaller finite-sample bias than the standard estimators that use the original sample weights. The methods were applied to the U.S. Kidney Cancer Case-Control Study to identify risk factors.
A project developed a linear-expit regression model (LEXPIT) that incorporates linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and the logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, while the exponentiated coefficients of the nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. The method was applied to estimate the absolute five-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus (HPV) test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to an abnormal Pap test in HPV-negative women that was not detected with logistic regression. An R package, blm, was developed to provide free, easy-to-use software for fitting the LEXPIT model.
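To make the structure of the LEXPIT described above concrete, here is a minimal sketch, assuming the model takes the form P(Y = 1 | x, z) = x'beta + expit(z'gamma), with linear covariates x (no intercept) and nonlinear covariates z (including an intercept). This form is an assumption consistent with the description; the function names are hypothetical, and this is not the blm package's code. It shows why a linear coefficient reads as a risk difference, why an exponentiated nonlinear coefficient reads as a residual odds ratio, and the binomial log-likelihood a fitting routine would maximize.

```python
import math
from typing import Sequence


def expit(u: float) -> float:
    """Inverse logit, 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def lexpit_risk(x: Sequence[float], beta: Sequence[float],
                z: Sequence[float], gamma: Sequence[float]) -> float:
    """Risk under an assumed LEXPIT form: x'beta + expit(z'gamma).

    x holds the linear-term covariates (no intercept); z holds the
    nonlinear-term covariates, with z[0] = 1 as the intercept.
    With no linear terms this reduces to logistic regression; with an
    intercept-only z it reduces to a binomial linear model whose
    intercept is expit(gamma[0]).
    """
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    nonlinear = expit(sum(zi * gi for zi, gi in zip(z, gamma)))
    risk = linear + nonlinear
    # A valid risk must stay in (0, 1); a real fitter would constrain
    # the parameters so this holds for every observation.
    if not 0.0 < risk < 1.0:
        raise ValueError(f"risk {risk:.4f} is outside (0, 1)")
    return risk


def bernoulli_loglik(ys: Sequence[int], risks: Sequence[float]) -> float:
    """Binomial log-likelihood that a fitting routine would maximize."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(ys, risks))


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    beta = [0.02]          # linear term for a binary exposure
    gamma = [-3.0, 0.7]    # nonlinear terms: intercept and one confounder

    # Changing the linear exposure from 0 to 1, holding the confounder
    # fixed, changes risk by exactly beta[0]: an adjusted risk difference.
    r1 = lexpit_risk([1.0], beta, [1.0, 1.0], gamma)
    r0 = lexpit_risk([0.0], beta, [1.0, 1.0], gamma)
    print(f"risk difference = {r1 - r0:.3f}")

    # exp(gamma[1]) is the odds ratio of the expit component alone for a
    # one-unit change in the confounder: a residual odds ratio.
    print(f"residual odds ratio = {math.exp(gamma[1]):.3f}")
```

Because the linear part is added on the risk scale, its coefficients keep a risk-difference interpretation regardless of the confounders in the expit part, which is the property the text highlights for epidemiological use.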

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
Division of Cancer Epidemiology and Genetics
Flegal, Katherine M; Graubard, Barry I; Yi, Sang-Wook (2017) Comparative effects of the restriction method in two large observational studies of body mass index and mortality among adults. Eur J Clin Invest 47:415-421
Tomassi, Diego; Forzani, Liliana; Bura, Efstathia et al. (2017) Sufficient dimension reduction for censored predictors. Biometrics 73:220-231
Grill, Sonja; Ankerst, Donna P; Gail, Mitchell H et al. (2017) Comparison of approaches for incorporating new information into existing risk prediction models. Stat Med 36:1134-1156
Boca, Simina M; Pfeiffer, Ruth M; Sampson, Joshua N (2017) Multivariate meta-analysis with an increasing number of parameters. Biom J 59:496-510
Chatterjee, Nilanjan; Chen, Yi-Hau; Maas, Paige et al. (2016) Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources. J Am Stat Assoc 111:107-117
Kant, Ashima K; Graubard, Barry I (2016) A prospective study of water intake and subsequent risk of all-cause mortality in a national cohort. Am J Clin Nutr :
Wang, Lingxiao; Graubard, Barry I; Li, Yan (2016) A composite likelihood approach in testing for Hardy Weinberg Equilibrium using family-based genetic survey data. Stat Med 35:5040-5050
Maas, Paige; Barrdahl, Myrto; Joshi, Amit D et al. (2016) Breast Cancer Risk From Modifiable and Nonmodifiable Risk Factors Among White Women in the United States. JAMA Oncol 2:1295-1302
Espinosa, Pablo; Pfeiffer, Ruth M; García-Casado, Zaida et al. (2016) Risk factors for keratinocyte skin cancer in patients diagnosed with melanoma, a large retrospective study. Eur J Cancer 53:115-24
Zhang, Han; Wu, Colin O; Yang, Yifan et al. (2016) A multi-locus genetic association test for a dichotomous trait and its secondary phenotype. Stat Methods Med Res :

Showing the most recent 10 out of 169 publications