Novel analytic paradigms that allow fully integrated interrogation of independent genomics data resources are expected to reveal substantial new knowledge about the mechanistic foundations of genetic associations. In this proposal we aim to develop, evaluate and apply sound statistical methods for leveraging and integrating the vast amount of publicly available transcriptome and genomics resources, to improve understanding of the mechanistic relationships among genes and regulatory elements associated with complex traits. Ultimately, methods for uncovering the molecular and physiological underpinnings of complex diseases will have clinically relevant impact on the development of novel prognostic markers and therapeutic targets. The Specific Aims are to:
(1) Develop a likelihood-based framework for integrated analysis of genomic elements, expression profiles and phenotypes. An overarching challenge in this setting is that transcriptomics data, composed of genotypes and expression profiles, and genome-wide association (GWA) data, composed of genotypes and complex traits, are generally available only for independent cohorts. We propose combining these two data resources and framing the analysis as a missing data problem: the unobserved expression profiles in the GWA data are treated as missing, and an expectation-maximization (EM) approach is proposed. Methods for efficient implementation and inference, as well as an alternative Bayesian MCMC approach, are also described.
(2) Extend the methods of Aim 1 to alternative data structures and types. The framework of Aim 1 will be further developed to: (a) account for complex linkage disequilibrium (LD) structures within and across genes; (b) address disparities across genotyping platforms; (c) allow simultaneous investigation of multiple cell and tissue compartments, multiple isoforms, and multiple genes and regulatory elements; and (d) accommodate time-varying biomarker profiles and time-to-event outcomes.
(3) Apply the methods developed in Aims 1 and 2 and evaluate their performance. In addition to fully vetting the proposed methods and comparing them with alternative strategies in extensive simulation studies, we will further elucidate the mechanisms by which genes and regulatory elements control complex traits. These analyses will use multiple publicly available reference transcriptome resources, repeatedly measured biomarker data from the GENE study, and clinical outcomes from the CRIC study (see Section C). This application grows out of an extensive, decade-long and highly productive transdisciplinary collaboration. Building on a strong research and mentoring record, the proposed work offers novel statistical research that addresses pressing challenges in precision medicine.
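To make the missing-data formulation of Aim 1 concrete, the following is a minimal sketch of an EM algorithm for a single gene and a continuous trait. It is not the proposal's model. It assumes a hypothetical two-stage linear-Gaussian model: expression given genotypes is x | g ~ N(α0 + gα, σ²), and the trait given expression is y | x ~ N(β0 + β1x, τ²). Expression x is observed only in the reference (transcriptome) cohort and treated as missing in the GWA cohort. Under these assumptions the E-step is closed form, because x | g, y is normal. The M-step is then least squares using E[x] and E[x²]. The function name `em_integrate`, the simulated parameter values, and the convergence settings are all illustrative choices.

```python
import numpy as np


def em_integrate(G_ref, x_ref, G_gwa, y_gwa, max_iter=5000, tol=1e-9):
    """EM sketch (hypothetical model, for illustration only):
        x | g ~ N(alpha0 + g @ alpha, s2)   expression given genotypes
        y | x ~ N(beta0 + beta1 * x, t2)    trait given expression
    x is observed in the reference cohort and missing in the GWA cohort."""
    n_ref, n_gwa = len(x_ref), len(y_gwa)
    D_ref = np.column_stack([np.ones(n_ref), G_ref])   # intercept + genotypes
    D_gwa = np.column_stack([np.ones(n_gwa), G_gwa])
    D_all = np.vstack([D_ref, D_gwa])

    # Initialise alpha and s2 from the reference cohort; start with no trait effect.
    alpha, *_ = np.linalg.lstsq(D_ref, x_ref, rcond=None)
    s2 = np.mean((x_ref - D_ref @ alpha) ** 2)
    beta0, beta1, t2 = y_gwa.mean(), 0.0, y_gwa.var()

    def loglik():
        # Observed-data log-likelihood: reference expression, plus y | g with x
        # integrated out, i.e. N(beta0 + beta1 * m, beta1^2 * s2 + t2).
        r = x_ref - D_ref @ alpha
        ll_ref = -0.5 * np.sum(np.log(2 * np.pi * s2) + r**2 / s2)
        var_y = beta1**2 * s2 + t2
        e = y_gwa - beta0 - beta1 * (D_gwa @ alpha)
        ll_gwa = -0.5 * np.sum(np.log(2 * np.pi * var_y) + e**2 / var_y)
        return ll_ref + ll_gwa

    ll_old = ll_new = loglik()
    for _ in range(max_iter):
        # E-step: x_i | g_i, y_i ~ N(mu_i, v), with v shared by all GWA subjects.
        m = D_gwa @ alpha
        v = 1.0 / (1.0 / s2 + beta1**2 / t2)
        mu = v * (m / s2 + beta1 * (y_gwa - beta0) / t2)

        # M-step for (alpha, s2): observed x stacked with imputed E[x];
        # the posterior variance v adds to the expected residual sum of squares.
        x_all = np.concatenate([x_ref, mu])
        alpha, *_ = np.linalg.lstsq(D_all, x_all, rcond=None)
        rss = np.sum((x_all - D_all @ alpha) ** 2)
        s2 = (rss + n_gwa * v) / (n_ref + n_gwa)

        # M-step for (beta0, beta1, t2): normal equations with E[x^2] = mu^2 + v.
        A = np.array([[n_gwa, mu.sum()], [mu.sum(), np.sum(mu**2) + n_gwa * v]])
        b = np.array([y_gwa.sum(), np.sum(y_gwa * mu)])
        beta0, beta1 = np.linalg.solve(A, b)
        t2 = np.mean((y_gwa - beta0 - beta1 * mu) ** 2) + beta1**2 * v

        ll_new = loglik()
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new

    return {"alpha": alpha, "s2": s2, "beta0": beta0,
            "beta1": beta1, "t2": t2, "loglik": ll_new}


# Usage on simulated data (illustrative values): expression is generated for the
# GWA cohort but then discarded, mimicking independent cohorts.
rng = np.random.default_rng(0)
n_ref, n_gwa = 500, 5000
G_ref = rng.binomial(2, 0.3, size=(n_ref, 1)).astype(float)
G_gwa = rng.binomial(2, 0.3, size=(n_gwa, 1)).astype(float)
x_ref = 1.0 * G_ref[:, 0] + rng.normal(0.0, 1.0, n_ref)
x_gwa_hidden = 1.0 * G_gwa[:, 0] + rng.normal(0.0, 1.0, n_gwa)
y_gwa = 0.5 + 0.8 * x_gwa_hidden + rng.normal(0.0, 1.0, n_gwa)

fit = em_integrate(G_ref, x_ref, G_gwa, y_gwa)
print(fit["alpha"], fit["beta1"])
```

The trait effect β1 is identified only jointly. The GWA cohort informs the product of β1 and the genotype effect, and the reference cohort pins down the genotype effect itself. Because most of the information about β1 is missing, EM can converge slowly here. This is one motivation for the efficient-implementation and Bayesian MCMC alternatives mentioned in Aim 1.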

Public Health Relevance

The emerging collections of big data in genomic medicine promise unprecedented opportunities to elucidate complex disease etiology and inform clinical management strategies. Using inflammatory stress as our model system, we propose to develop, evaluate and apply new analytic paradigms for integrated analysis of publicly available transcriptome data and data from genome-wide association studies, with the goal of improving understanding of the mechanisms of complex diseases. Ultimately, these methods will allow us to derive information from the vast quantities of genomics data to support personalized clinical decisions, and will thus serve as a central component of precision medicine.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul
Mount Holyoke College
Biostatistics & Other Math Sci
Schools of Arts and Sciences
South Hadley
United States