Hierarchical Modeling of Interactions in Genome-Wide and Pathway-Based Association Studies: The overarching goal of this grant is to investigate the use of hierarchical modeling in the study of gene-environment and gene-gene interactions in both genome-wide association studies and pathway-based candidate gene studies. We will apply our methods to data available from two large NIH-supported projects: the Colon CFR and the Children's Health Study. Data from these association studies often follow a natural hierarchical structure, with polymorphisms within gene regions, genes within sub-pathways, and sub-pathways within etiologic networks. By building a statistical model to reflect this natural hierarchy, we aim to better account for the dependencies between factors and to better incorporate our knowledge of the underlying etiology. In this proposal, in addition to evaluating the statistical form and structure of such models, we also aim to gauge the impact of various types of prior information and intermediate measurements on inference. For genome-wide association studies, we will develop analytic approaches for the incorporation of GxE interactions that deal with the multiple testing problem and extend to potentially more efficient two-phase study designs. These methods will also be expanded to test for the interaction of an environmental factor with multiple SNPs. Specifically for pathway-based studies, we aim to explore the feasibility and performance of mechanistic models (e.g., kinetic models) and hierarchical regression models with model selection for genes and environmental factors within sub-pathways and across networks of pathways. We will use prior knowledge in the form of ontologies or expert-based relational databases to help formulate priors for the data analysis.
Furthermore, we will investigate various multistage sampling schemes and their interplay with potential genomics data, including whole-exon expression, whole-genome somatic mutations and potential biomarker measures. Finally, we will compare our methods to various data mining techniques to allow genes to act within multiple pathways. Overall, we aim to develop statistical techniques that make it feasible to detect which genes are involved in disease and, importantly, in which environmental context they act. By identifying both genetic and environmental factors, we will make progress in understanding the underlying mechanism that leads to disease and potentially identify ways in which to both prevent and treat complex diseases.
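
As a rough illustration of the kind of hierarchical regression described above, the following sketch shrinks first-stage SNP effect estimates toward a prior mean determined by pathway membership. It is a minimal example under stated assumptions, not the project's actual method: the function name `hierarchical_shrinkage`, the second-stage design matrix `Z`, and the fixed second-stage variance `tau2` are all hypothetical choices made for illustration.

```python
import numpy as np


def hierarchical_shrinkage(beta_hat, se, Z, tau2):
    """Two-level shrinkage sketch (all names are illustrative assumptions).

    First stage:  beta_hat[j] ~ N(beta[j], se[j]^2)    (per-SNP estimates)
    Second stage: beta[j]     ~ N(Z[j] @ pi, tau2)     (prior from pathway info)

    tau2 is treated as known here for simplicity; in practice it would
    usually be estimated or given its own prior.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    Z = np.asarray(Z, dtype=float)

    # Marginal variance of each first-stage estimate under the two-level model.
    marg_var = se ** 2 + tau2

    # Weighted least squares for the second-stage coefficients pi.
    ZtW = Z.T * (1.0 / marg_var)
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)

    # Posterior mean: a precision-weighted compromise between each SNP's own
    # estimate and the prior mean implied by its pathway membership.
    prior_mean = Z @ pi_hat
    weight = tau2 / marg_var
    post_mean = weight * beta_hat + (1.0 - weight) * prior_mean
    return pi_hat, post_mean


# Hypothetical example: four SNPs, two per pathway (indicator columns in Z).
Z_example = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pi_example, post_example = hierarchical_shrinkage(
    beta_hat=[0.2, 0.4, -0.1, -0.3], se=[0.1, 0.1, 0.1, 0.1],
    Z=Z_example, tau2=0.01,
)
```

In this sketch, SNPs with noisy estimates (large `se` relative to `tau2`) are pulled more strongly toward their pathway's average effect, which is one simple way prior biological knowledge could inform inference.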

Public Health Relevance

Overall, we aim to develop statistical techniques that formally incorporate our biologic knowledge and make it feasible to detect which genes are involved in disease and, importantly, in which environmental context they act. By identifying both genetic and environmental factors, we will make progress in understanding the underlying mechanism that leads to disease and potentially identify ways in which to both prevent and treat complex diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Research Project (R01)
Project #
5R01ES016813-02
Application #
7894921
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Mcallister, Kimberly A
Project Start
2009-07-16
Project End
2012-06-30
Budget Start
2010-07-01
Budget End
2012-06-30
Support Year
2
Fiscal Year
2010
Total Cost
$523,553
Indirect Cost
Name
University of Southern California
Department
Neurosciences
Type
Schools of Medicine
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089
Moss, Lilit C; Gauderman, William J; Lewinger, Juan Pablo et al. (2018) Using Bayes model averaging to leverage both gene main effects and G × E interactions to identify genomic regions in genome-wide association studies. Genet Epidemiol :
Newcombe, Paul J; Conti, David V; Richardson, Sylvia (2016) JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects. Genet Epidemiol 40:188-201
Duan, Lewei; Thomas, Duncan C (2013) A Bayesian Hierarchical Model for Relating Multiple SNPs within Multiple Genes to Disease Risk. Int J Genomics 2013:406217
Quintana, M A; Conti, D V (2013) Integrative variable selection via Bayesian model uncertainty. Stat Med 32:4938-53
Baurley, James W; Conti, David V (2013) A scalable, knowledge-based analysis framework for genetic association studies. BMC Bioinformatics 14:312
Lewinger, Juan Pablo; Morrison, John L; Thomas, Duncan C et al. (2013) Efficient two-step testing of gene-gene interactions in genome-wide association studies. Genet Epidemiol 37:440-51
Liu, Jinghua; Lewinger, Juan Pablo; Gilliland, Frank D et al. (2013) Confounding and heterogeneity in genetic association studies with admixed populations. Am J Epidemiol 177:351-60
Liang, Wei E; Thomas, Duncan C; Conti, David V (2012) Analysis and optimal design for association studies using next-generation sequencing with case-control pools. Genet Epidemiol 36:870-81
Franklin, Meredith; Vora, Hita; Avol, Edward et al. (2012) Predictors of intra-community variation in air quality. J Expo Sci Environ Epidemiol 22:135-47
Quintana, Melanie A; Schumacher, Fredrick R; Casey, Graham et al. (2012) Incorporating prior biologic information for high-dimensional rare variant association studies. Hum Hered 74:184-95

Showing the most recent 10 out of 17 publications