Many complex disease syndromes consist of a large number of highly related, rather than independent, clinical phenotypes. Differences between these syndromes arise from the complex interplay of many genomic variations that perturb the function of disease-related genes jointly, in the context of a regulatory network, rather than individually. Thus, unraveling the causal genetic variations and understanding the mechanisms of the resulting cell and tissue transformation requires an analysis that jointly considers the epistatic, pleiotropic, and plastic interactions of elements and modules within and between the genome (G), transcriptome (T), and phenome (P). Most conventional methods focus on associations between each individual marker genotype and each single phenotype; they have limited statistical power and overlook the complex omic structures. We propose a systematic program of methodological development for the largely unexplored but practically important problem of structured associations between the "-omes". Rather than testing each SNP separately for association and then applying a multiple-testing correction, a structured association analysis identifies associations between groups of entities, each with its own sophisticated structure that cannot be ignored, such as blocks of SNPs in high LD, modules of genes in the same pathway, and clusters of phenotypes belonging to a system of clinical descriptors of a disease. We will develop a mathematically rigorous and computationally efficient machine learning platform and software to address the methodological challenges involved in unraveling the interplay between disease-relevant elements in the G, T, and P omes.
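The contrast drawn above, between testing SNPs one at a time and associating whole groups such as LD blocks, can be illustrated with a group-lasso penalty, one standard way to make a regression keep or discard entire blocks of SNPs together. This is a minimal sketch under stated assumptions, not the method proposed in this project: the function names `group_soft_threshold` and `group_lasso` are hypothetical, the block assignments are supplied by hand, and the solver is plain proximal gradient descent (ISTA) on a single quantitative trait.

```python
import numpy as np


def group_soft_threshold(beta, groups, thresh):
    """Proximal operator of thresh * sum_g ||beta_g||_2 (block soft-thresholding).

    beta   : 1-D array of SNP coefficients
    groups : list of index lists; each list is one (hypothetical) LD block
    thresh : threshold, i.e. penalty weight times step size
    """
    out = np.zeros_like(beta, dtype=float)
    for idx in groups:
        block = beta[idx]
        norm = np.linalg.norm(block)
        if norm > thresh:
            # Shrink the whole block toward zero, keeping its direction.
            out[idx] = (1.0 - thresh / norm) * block
        # Otherwise every SNP in the block is zeroed out together.
    return out


def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize 0.5 * ||y - X beta||^2 + lam * sum_g ||beta_g||_2 by ISTA.

    X : (n, p) genotype matrix, y : (n,) quantitative trait.
    """
    p = X.shape[1]
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta
```

Because the penalty acts on the Euclidean norm of each block, an LD block is either retained or dropped as a unit, which mirrors the idea of assessing a block rather than its individual SNPs; a per-SNP l1 penalty would instead select SNPs one at a time.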
Our technical innovations include novel statistical models and algorithms for haplotype inference, recombination hotspot detection, gene network and phenotype network inference, and admixture association mapping, and, most importantly, a family of new structured regression techniques, such as graph-regularized regression and the graph-guided fused lasso and its extensions, that perform functional approximations to the association functions among structural elements in the G, T, and P omes and have provable guarantees of consistency and sparsistency. We envisage that the proposed research will open a new paradigm for association studies of complex diseases that facilitates: 1) intra- and inter-omic integration of data for association mapping and disease gene/pathway discovery; 2) thorough exploration of the internal structures within different omic data, so that cryptic associations too weak to detect in an unstructured analysis can now be inferred; 3) joint statistical inference of the mechanisms and pathways through which variations in DNA flow through molecular networks to produce variations in complex traits, and inference of the condition-specific state of gene function in these networks; and 4) development of faster, automated computational algorithms with greater scalability and robustness for large-scale inter-omic analysis, together with a more convenient software package and user interface. All software tools will be made freely available to the public.
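For concreteness, the graph-guided fused lasso mentioned above is usually written as a multi-trait regression whose penalty combines an l1 term on every SNP-trait coefficient with a fusion term that pulls together the coefficients of traits linked in a phenotype network. The sketch below only evaluates that objective; it assumes the edge weight |r_ml| and the sign convention sign(r_ml), and the function name `gflasso_objective` is hypothetical, not taken from the GenAMap software or any published code.

```python
import numpy as np


def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a graph-guided fused lasso objective (sketch).

    X     : (n, J) genotype matrix
    Y     : (n, K) matrix of K correlated traits
    B     : (J, K) coefficients; column k holds the SNP effects on trait k
    edges : list of (m, l, r_ml) tuples, one per edge of the trait network,
            where r_ml is the correlation between traits m and l
    The edge weight |r_ml| is an assumption; other weightings are possible.
    """
    # Squared-error loss summed over all traits.
    loss = 0.5 * np.sum((Y - X @ B) ** 2)
    # Lasso term: sparsity in every SNP-trait coefficient.
    sparsity = lam * np.sum(np.abs(B))
    # Fusion term: for each edge, pull beta_jm toward sign(r_ml) * beta_jl
    # for every SNP j, weighted by the strength of the trait correlation.
    fusion = 0.0
    for m, l, r in edges:
        fusion += abs(r) * np.sum(np.abs(B[:, m] - np.sign(r) * B[:, l]))
    return loss + sparsity + gamma * fusion
```

With a strongly positive r_ml the fusion term favors identical effects of each SNP on traits m and l, and with a strongly negative r_ml it favors effects of equal size and opposite sign, so a SNP associated with one trait in a correlated cluster tends to be selected for its neighbors as well. Minimizing this nonsmooth objective requires a dedicated solver, which is not shown.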

Public Health Relevance

We propose a systematic program of methodological development for the largely unexplored but practically important problem of structured association mapping between disease-relevant elements in the genome, transcriptome, and phenome. Because many complex diseases involve composite phenotypes that result from intricate perturbations of the molecular networks underlying gene regulation, perturbations that are themselves caused by complex and interdependent genomic variations, structured association analysis at the multi-omic level is necessary; however, it is beyond the reach of conventional methods and requires the methodological innovations we propose. Characterizing such interactions can provide a more comprehensive genetic and molecular view of complex diseases, which may lead to the identification of genes underlying disease processes; in addition, such an approach will allow us to formulate hypotheses about the roles of these genes in disease pathogenesis and to develop improved diagnostic biomarkers for multivariate clinical phenotypes.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul
Carnegie-Mellon University
Schools of Arts and Sciences
United States
Wu, Wei; Bleecker, Eugene; Moore, Wendy et al. (2014) Unsupervised phenotyping of Severe Asthma Research Program participants using expanded lung data. J Allergy Clin Immunol 133:1280-8
Shringarpure, Suyash; Xing, Eric P (2014) Effects of sample selection bias on the accuracy of population structure and ancestry inference. G3 (Bethesda) 4:901-11
Xing, Eric P; Curtis, Ross E; Schoenherr, Georg et al. (2014) GWAS in a box: statistical and visual analytics of structured associations via GenAMap. PLoS One 9:e97524
Parikh, Ankur P; Wu, Wei; Xing, Eric P (2014) Robust reverse engineering of dynamic gene networks under sample size heterogeneity. Pac Symp Biocomput :265-76
Kim, Seyoung; Xing, Eric P (2014) Exploiting genome structure in association analysis. J Comput Biol 21:345-60
Curtis, Ross E; Kim, Seyoung; Woolford Jr, John L et al. (2013) Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules. BMC Genomics 14:196
Curtis, Ross E; Goyal, Anuj; Xing, Eric P (2012) Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system. BMC Genet 13:24
Curtis, Ross E; Yuen, Amos; Song, Le et al. (2011) TVNViewer: an interactive visualization tool for exploring networks that change over time or space. Bioinformatics 27:1880-1
Shringarpure, Suyash; Won, Daegun; Xing, Eric P (2011) StructHDP: automatic inference of number of clusters and population structure from admixed genotype data. Bioinformatics 27:i324-32
Puniyani, Kriti; Kim, Seyoung; Xing, Eric P (2010) Multi-population GWA mapping via multi-task regularized regression. Bioinformatics 26:i208-16

The 10 most recent of the project's 11 publications are listed.