Many complex disease syndromes consist of a large number of highly related, rather than independent, clinical phenotypes. Differences between these syndromes involve the complex interplay of a large number of genomic variations that perturb the function of disease-related genes in the context of a regulatory network, rather than individually. Thus, unraveling the causal genetic variations and understanding the mechanisms of the consequent cell and tissue transformation requires an analysis that jointly considers the epistatic, pleiotropic, and plastic interactions of elements and modules within and between the genome (G), transcriptome (T), and phenome (P). Most conventional methods focus on associations between each individual marker genotype and each single phenotype; they have limited statistical power and overlook the complex omic structures. We propose a systematic methodological development effort for the largely unexplored but practically important problem of structured associations between the "-omes". Rather than testing each SNP separately for association and then applying a multiple-hypothesis-testing correction, a structured association analysis identifies associations between groups of entities, each with its own sophisticated structure that cannot be ignored, such as blocks of SNPs in high linkage disequilibrium (LD), modules of genes in the same pathway, and clusters of phenotypes belonging to a system of clinical descriptors of a disease. We will develop a mathematically rigorous and computationally efficient machine learning platform and software to address the methodological challenges involved in unraveling the interplay between disease-relevant elements in the G, T, and P omes.
Our technical innovations include novel statistical models and algorithms for haplotype inference, recombination hotspot detection, gene network and phenotype network inference, admixture association mapping, and, most importantly, a family of new structured regression techniques, such as graph-regularized regression, the graph-guided fused lasso, and their extensions. These techniques perform functional approximations to the association functions among structural elements in the G, T, and P omes, and have provable guarantees on consistency and sparsistency. We envisage that our proposed research will open a new paradigm for association studies of complex diseases, which facilitates: 1) intra- and inter-omic integration of data for association mapping and disease gene/pathway discovery; 2) thorough exploration of the internal structures within different omic data, so that cryptic associations that are too weak statistically to be detected in an unstructured analysis can now be inferred; 3) joint statistical inference of the mechanisms and pathways by which variations in DNA propagate through molecular networks to produce variations in complex traits, and inference of the condition-specific state of gene function in those networks; and 4) development of faster, automated computational algorithms with greater scalability and robustness for large-scale inter-omic analysis, together with a more convenient software package and user interface. All the software tools will be made freely available to the public.
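To make the idea of a structured regression penalty concrete, the following is a minimal sketch of how an objective in the style of the graph-guided fused lasso could be evaluated. It is an illustration under stated assumptions, not the project's implementation: the function name `gflasso_objective`, the squared-error loss, the edge weight |r| (the absolute phenotype correlation), and the toy data are all assumptions introduced here. The sketch shows the two ingredients such a penalty combines: a lasso term that keeps the SNP-to-phenotype coefficient matrix sparse, and a fusion term that pulls together the coefficients of phenotypes linked in a phenotype network.

```python
import numpy as np


def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a graph-guided fused-lasso style objective (illustrative).

    X     : (n, J) genotype matrix (e.g. minor-allele counts)
    Y     : (n, K) phenotype matrix
    B     : (J, K) regression coefficients, one column per phenotype
    edges : list of (m, l, r) tuples; r is the correlation between
            phenotypes m and l in an assumed phenotype network
    lam   : weight of the sparsity (lasso) penalty
    gamma : weight of the graph-guided fusion penalty
    """
    # Squared-error loss summed over all samples and phenotypes.
    residual = Y - X @ B
    loss = 0.5 * np.sum(residual ** 2)

    # Lasso term: encourages most SNP-phenotype coefficients to be zero.
    lasso = lam * np.sum(np.abs(B))

    # Fusion term: for each edge, penalize differences between the two
    # coefficient columns. The sign of r flips the comparison for
    # negatively correlated phenotypes; |r| is an assumed edge weight.
    fusion = 0.0
    for m, l, r in edges:
        fusion += abs(r) * np.sum(np.abs(B[:, m] - np.sign(r) * B[:, l]))

    return loss + lasso + gamma * fusion


# Toy data: 4 SNPs, 2 positively correlated phenotypes (hypothetical).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(50, 4)).astype(float)

# SNP 1 affects both phenotypes equally.
B_shared = np.zeros((4, 2))
B_shared[1, :] = 1.0
Y = X @ B_shared

# Alternative coefficients in which SNP 1 affects only phenotype 0.
B_split = np.zeros((4, 2))
B_split[1, 0] = 1.0

edges = [(0, 1, 0.9)]
obj_shared = gflasso_objective(X, Y, B_shared, edges, lam=0.1, gamma=1.0)
obj_split = gflasso_objective(X, Y, B_split, edges, lam=0.1, gamma=1.0)
print(obj_shared, obj_split)
```

In this toy setting the shared-effect coefficients incur no fusion penalty, whereas the split coefficients pay both a fit cost and a fusion cost. This is the sense in which such a penalty favors SNPs that act jointly on a cluster of related phenotypes. Minimizing the objective requires a dedicated solver, which is not shown here.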
We propose a systematic methodological development effort for the largely unexplored but practically important problem of structured association mapping between disease-relevant elements in the genome, transcriptome, and phenome. Many complex diseases involve composite phenotypes that are the outcome of intricate perturbations of the molecular networks underlying gene regulation, and these perturbations result from complex and interdependent genome variations. Structured association analysis at the multi-omic level is therefore necessary, but it is beyond the grasp of conventional methods and requires the methodological innovations we propose. Characterizing such interactions can provide a more comprehensive genetic and molecular view of complex diseases, which may lead to the identification of genes underlying disease processes; in addition, such an approach will allow us to formulate hypotheses regarding the roles of these genes in disease pathogenesis, and to develop improved diagnostic biomarkers for multivariate clinical phenotypes.
Lee, Seunghak; Kong, Soonho; Xing, Eric P (2016) A network-driven approach for genome-wide association mapping. Bioinformatics 32:i164-i173
Wang, Xuefeng; Xing, Eric P; Schaid, Daniel J (2015) Kernel methods for large-scale genomic data analysis. Brief Bioinform 16:183-92
Shringarpure, Suyash; Xing, Eric P (2014) Effects of sample selection bias on the accuracy of population structure and ancestry inference. G3 (Bethesda) 4:901-11
Kolar, Mladen; Liu, Han; Xing, Eric P (2014) Graph Estimation From Multi-Attribute Data. J Mach Learn Res 15:1713-1750
Xing, Eric P; Curtis, Ross E; Schoenherr, Georg et al. (2014) GWAS in a box: statistical and visual analytics of structured associations via GenAMap. PLoS One 9:e97524
Parikh, Ankur P; Wu, Wei; Xing, Eric P (2014) Robust reverse engineering of dynamic gene networks under sample size heterogeneity. Pac Symp Biocomput :265-76
Kim, Seyoung; Xing, Eric P (2014) Exploiting genome structure in association analysis. J Comput Biol 21:345-60
Wu, Wei; Bleecker, Eugene; Moore, Wendy et al. (2014) Unsupervised phenotyping of Severe Asthma Research Program participants using expanded lung data. J Allergy Clin Immunol 133:1280-8
Yin, Junming; Ho, Qirong; Xing, Eric P (2013) A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks. Adv Neural Inf Process Syst 2013:422-430
Curtis, Ross E; Kim, Seyoung; Woolford Jr, John L et al. (2013) Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules. BMC Genomics 14:196
Showing the most recent 10 out of 22 publications