In order to understand the genomic architecture and etiology of complex human diseases, great efforts have been expended in the past decades on research involving genome-wide genetic variation, the transcriptome, and other genomic information. To date, rich resources have been generated, and most have been made publicly available after being analyzed for their respective primary goals/hypotheses. Yet our understanding of human disease mechanisms is just beginning, and advancing it will require both the identification of a cadre of genetic and epigenetic risk factors and the integration of key factors into a synergistic system. To best utilize existing data and facilitate research on complex human diseases, the long-term objective of the proposed research is to develop powerful and efficient statistical methods and computational tools for multivariate analyses in two main areas: association studies that integrate genomic and non-genomic information in order to further identify genetic variation for complex diseases; and integrative genomic analyses that jointly analyze genetic variation, the transcriptome, and other information in the genome.
In Aim 1, we propose novel and powerful methods for gene-based association tests, for identification of genetic variation associated with multivariate disease profiles, and for gene-based gene-environment interaction tests.
In Aim 2, we develop regularized methods for the construction and comparison of eQTL networks. The latter can also be used to reveal important genetic variants and regulatory relationships by characterizing the changes in genetic regulatory patterns across different phenotypic or environmental groups. Much of our proposed work is motivated by, and will be applied to, a genetic-genomic study on arsenic toxicity, the Gene-Environment Multi-phenotype Study (GEMS).
In Aim 3, we propose methods tailored to the characteristics of the GEMS data set; we will also test novel scientific hypotheses on this unique and large arsenic toxicity study. Our proposal is cost-effective, as it analyzes existing data from GEMS while providing methods and tools for new research directions. We anticipate that the proposed method development, when applied to and beyond the arsenic toxicity data, would yield valuable insights on clinical trial treatment effects and on disease etiology for several complex diseases/traits, including but not limited to arsenic-related skin cancer, cardiovascular diseases, hormone measures, body mass index, and blood pressure.
A comprehensive portrait of the genomic architecture and etiology of complex diseases would require the modeling of a cadre of genetic and epigenetic risk factors as a synergistic system. The long-term objective of the proposed work is to develop statistical methods and computational tools for integrative multivariate analyses, in order to: 1) further identify genetic variation and recapitalize on existing association/sequencing data by integrating diverse genomic and non-genomic information, and 2) elucidate the interplay among genomic risk factors by constructing eQTL networks. Much of our proposed work is motivated by, and will be applied to, an existing association and integrative genomic data set with 5,354 individuals from the Gene-Environment Multi-phenotype Study (GEMS).
Mertins, Philipp; Mani, D R; Ruggles, Kelly V et al. (2016) Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534:55-62
Petralia, Francesca; Song, Won-Min; Tu, Zhidong et al. (2016) New Method for Joint Network Analysis Reveals Common and Different Coexpression Patterns among Genes and Proteins in Breast Cancer. J Proteome Res 15:743-54
Wang, Jiebiao; Gamazon, Eric R; Pierce, Brandon L et al. (2016) Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx. Am J Hum Genet 98:697-708
Liu, Qianying; Chen, Lin S; Nicolae, Dan L et al. (2016) A unified set-based test with adaptive filtering for gene-environment interaction analyses. Biometrics 72:629-38
Fu, Rong; Wang, Pei; Ma, Weiping et al. (2016) A statistical method for detecting differentially expressed SNVs based on next-generation RNA-seq data. Biometrics :
Petralia, Francesca; Wang, Pei; Yang, Jialiang et al. (2015) Integrative random forest for gene regulatory network inference. Bioinformatics 31:i197-205
Danaher, P; Paul, D; Wang, P (2015) Covariance-based analyses of biological pathways. Biometrika 102:533-544