In this application, we propose a systematic program of methodological development for a largely unexplored but practically important problem: personalized inference of the genetic effects of genome variations on complex disease phenotypes, and personalized prediction of clinical outcomes from genome variations. Technological advances and falling costs in genome sequencing, clinical phenotyping, and potentially panomic patient profiling promise to bring us closer to an era of personalized and precision medicine. However, the sound mathematical principles and efficient analytical programs needed to deliver on this promise remain to be developed. We intend to develop next-generation statistical frameworks, algorithms, and software for robust, accurate, and personalizable genetic analysis of complex diseases, including genome-wide association (GWA) mapping, phenome-wide association (PheWA) mapping, and whole-genome prediction (WGP). Toward this goal, we propose the following specific aims: 1) Develop a new framework for association mapping that enables multi-confounder correction and panomic genetic modeling. 2) Transform traditional parametric linear models into arbitrarily expressive nonparametric functional models for enhanced association mapping and whole-genome prediction. 3) Develop a new statistical paradigm for personalized GWA/PheWA and phenotype prediction. 4) Develop a turnkey, cloud-based software platform for personalized genomics, and apply our methods and programs to an in-depth genetic investigation of childhood and adult asthma using the CAMP and SARP datasets, in collaboration with clinicians from the University of Pittsburgh School of Medicine/University of Pittsburgh Medical Center (UPMC) and Penn State Hershey Medical Center (PSMC).
Our proposed methodological innovations depart significantly from conventional technologies and current platforms in clinical genomics. They represent an initial foray into mathematically rigorous and computationally tractable medical genetic inference and prediction in the presence of multiple confounders, rich prior structural knowledge, and the need to capture both shared patterns and individual signatures in complex genetic effects. Our goal is that the resulting new framework will improve the understanding, diagnosis, and treatment of complex human diseases such as asthma, and offer a practical basis for personalized medicine in the Big Data era of genomic medicine.

Public Health Relevance

A fundamental aim of modern medical genetics is to connect variations in clinical phenotypes with variations in the genome, so that one can identify druggable genetic artifacts, predict clinical outcomes, and practice personalized medicine. Existing approaches for the genetic analysis of complex human diseases such as asthma remain inadequate for this aim. Our proposed research focuses on developing mathematically rigorous, computationally tractable, and user-friendly tools (methods and software) for medical genetic inference and clinical prediction in the presence of multiple confounders, rich prior knowledge, and the need to capture both shared patterns and individual signatures in complex genetic effects. Our goal is that the resulting new framework will improve the understanding, diagnosis, and treatment of complex diseases, and offer a practical basis for personalized medicine in the Big Data era of genomic medicine.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM114311-03
Application #: 9344646
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul
Project Start: 2015-08-01
Project End: 2019-06-30
Budget Start: 2017-07-01
Budget End: 2018-06-30
Support Year: 3
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Organization Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Arts and Sciences
DUNS #: 052184116
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Publications

Lengerich, Benjamin J; Aragam, Bryon; Xing, Eric P (2018) Personalized regression enables sample-specific pan-cancer analysis. Bioinformatics 34:i178-i186
Wang, Haohan; Liu, Xiang; Xiao, Yunpeng et al. (2018) Multiplex confounding factor correction for genomic association mapping with squared sparse linear mixed model. Methods 145:33-40
Al-Shedivat, Maruan; Wilson, Andrew Gordon; Saatchi, Yunus et al. (2017) Learning Scalable Deep Kernels with Recurrent Structure. J Mach Learn Res 18:2850-2886
Xu, Min; Chai, Xiaoqi; Muthakana, Hariank et al. (2017) Deep learning-based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms. Bioinformatics 33:i13-i22
Chang, Xiaojun; Yu, Yao-Liang; Yang, Yi et al. (2017) Semantic Pooling for Complex Event Analysis in Untrimmed Videos. IEEE Trans Pattern Anal Mach Intell 39:1617-1632
Lee, Seunghak; Wang, Haohan; Xing, Eric P (2017) Backward genotype-transcript-phenotype association mapping. Methods 129:18-23
Wang, Haohan; Aragam, Bryon; Xing, Eric P (2017) Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:431-438
Lee, Seunghak; Kong, Soonho; Xing, Eric P (2016) A network-driven approach for genome-wide association mapping. Bioinformatics 32:i164-i173
Marchetti-Bowick, Micol; Yin, Junming; Howrylak, Judie A et al. (2016) A time-varying group sparse additive model for genome-wide association studies of dynamic complex traits. Bioinformatics 32:2903-2910
Dubey, Avinava; Reddi, Sashank J; Póczos, Barnabás et al. (2016) Variance Reduction in Stochastic Gradient Langevin Dynamics. Adv Neural Inf Process Syst 29:1154-1162