Novel analytic paradigms allowing for fully integrated interrogation of independent genomics data resources are expected to reveal substantial new knowledge regarding the mechanistic foundations of genetic associations. In this proposal we aim to develop, evaluate and apply sound statistical methods for leveraging and integrating the vast amount of publicly available transcriptome and genomics resources to improve understanding of the mechanistic relationships among genes and regulatory elements associated with complex traits. Ultimately, methods for uncovering the molecular and physiological underpinnings of complex diseases will provide clinically relevant impact toward development of novel prognostic markers and therapeutic targets. The Specific Aims are to:
(1) Develop a likelihood-based framework for integrated analysis of genomic elements, expression profiles and phenotypes. An overarching challenge in this setting is that transcriptomics data, composed of genotypes and expression profiles, and GWA data, composed of genotypes and complex traits, are generally available only for independent cohorts. We propose combining these two data resources and framing the analysis in terms of a missing data problem. The unobserved expression profiles in the GWA data are treated as missing and an expectation-maximization (EM) approach is proposed. Methods for efficient implementation and inference, as well as an alternative Bayesian MCMC approach, are also described.
(2) Extend the methods of Aim 1 for alternative data structures and types. The framework of Aim 1 will be further developed to: (a) account for complex linkage disequilibrium (LD) structures within and across genes; (b) address disparities across genotyping platforms; (c) provide for simultaneous investigation of multiple cell and tissue compartments, multiple isoforms, and multiple genes and regulatory elements; and (d) accommodate time-varying biomarker profiles and time-to-event outcomes.
(3) Apply and evaluate performance of the methods developed in Aims 1 and 2. In addition to fully vetting the proposed methods and comparing them to alternative strategies using extensive simulation studies, we will further unravel and elucidate the mechanisms of gene and regulatory element control of complex traits using multiple publicly available reference transcriptome data resources, repeatedly measured biomarker data arising from the GENE study, and clinical outcomes from the CRIC study (see Section C).
This application launches from an extensive, decade-long and highly productive trans-disciplinary collaboration. Building on a strong research and mentoring record, the proposed research offers novel statistical research addressing pressing challenges in precision medicine.
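
The missing-data framing of Aim 1 can be written out generically. The sketch below is illustrative only and is not taken from the proposal: the notation (G for genotypes, E for expression profiles, Y for the complex trait, parameters alpha and beta, cohort sizes n_1 and n_2) and the assumption that the conditional distribution of E given G is shared by the two cohorts are introduced here for exposition. The first line is the observed-data likelihood, in which the unobserved expression in the GWA cohort is integrated out; the second and third lines are the standard E-step and M-step of an EM algorithm for that likelihood.

```latex
% Assumed notation (hypothetical, for illustration):
%   reference transcriptome cohort: (G_i, E_i), i = 1, ..., n_1
%   GWA cohort:                     (G_j, Y_j), j = 1, ..., n_2, with E_j unobserved
%   theta = (alpha, beta): alpha indexes the model for E given G, beta the model for Y given (E, G)
\begin{align*}
L(\alpha,\beta)
  &= \prod_{i=1}^{n_1} f(E_i \mid G_i;\alpha)\;
     \prod_{j=1}^{n_2} \int f(Y_j \mid e, G_j;\beta)\, f(e \mid G_j;\alpha)\, de \\
Q(\theta \mid \theta^{(t)})
  &= \sum_{i=1}^{n_1} \log f(E_i \mid G_i;\alpha)
   + \sum_{j=1}^{n_2} \mathbb{E}\!\left[\log f(Y_j \mid E_j, G_j;\beta)
   + \log f(E_j \mid G_j;\alpha) \,\middle|\, Y_j, G_j;\theta^{(t)}\right] \\
\theta^{(t+1)}
  &= \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})
\end{align*}
```

Under this formulation the reference cohort informs alpha directly, while the GWA cohort contributes to both alpha and beta through the conditional expectation over the missing expression values.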

Public Health Relevance

The emerging collections of big data in genomic medicine promise unprecedented opportunities to elucidate complex disease etiology and inform clinical management strategies. Using inflammatory stress as our model system, we propose to develop, evaluate and apply new analytic paradigms for integrated analysis of publicly available transcriptome data and data arising from genome-wide association studies, with the goal of improving understanding of the mechanisms of complex diseases. Ultimately, these methods will allow us to derive information from the vast quantities of genomics data for personalized, clinical decisions and thus serve as a central component of precision medicine.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
7R01GM127862-03
Application #
9977640
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2018-04-01
Project End
2022-03-31
Budget Start
2019-09-01
Budget End
2020-03-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Massachusetts General Hospital
Department
Type
DUNS #
073130411
City
Boston
State
MA
Country
United States
Zip Code
02114