Modern studies of the genetic architecture underlying human complex traits and diseases generally fall into three association designs: the association between genetic variants and disease, the association between genetic variants and gene expression (e.g., expression quantitative trait loci, eQTL), and the association between gene expression and disease. These studies have produced many promising findings, including thousands of single nucleotide polymorphisms associated with common diseases. While these findings provide valuable insights into the genetic architecture of common diseases and the shared heritability among diseases, what is missing are the mechanisms, including the exact causal variants, the direction of their effects, and the order of events. Filling this gap is the central question that we aim to address through the studies in this proposal. Inspired by recent discoveries that a substantial fraction of disease-associated genetic variants are located in regulatory regions, we combine bioinformatics, statistical genetics, precision medicine, and phenotype and electronic medical record (EMR) data mining to develop novel analytical strategies that maximally leverage regulatory information from both genotype and expression, aiming to predict phenotype from transcriptomic alterations together with DNA variation. We propose the following three major aims. (1) To build a unified genetic model for the prediction of phenotype by combining genetic and transcriptomic associations. Functional and regulatory annotation data generated by ENCODE, FANTOM5, GENCODE, the Epigenomic Roadmap, and GTEx will be incorporated to infer an important endophenotype, the genetically determined expression component, for better prediction of phenotype or disease outcome.
(2) To develop a maximum-likelihood-based link test and a phenotype-specific regulatory network approach to resolve causal genotype-phenotype relationships mediated by gene expression. (3) To extensively evaluate the approaches in schizophrenia and apply them to a broad range of phenotypes using the Vanderbilt biobank (BioVU) genotype data and linked electronic medical data. Building on our previous studies and strong preliminary data, this proposal is timely for studying the genetic architecture of human complex diseases and traits by dissecting the genetic components contributed by the regulatory roles of variants at the gene expression level. It is highly significant because it tackles major limitations of genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies in inferring causality and translational potential in the emerging field of precision medicine. The successful completion of this project will not only advance our understanding of the genetic components of schizophrenia and a broad spectrum of phenotypes and clinical outcomes, but also provide useful methods and tools to the research community for studying the genetic architecture of phenotypes by linking genomic and medical information.
Recent studies have shown that a large portion of the variability in disease risk across a broad spectrum of disease phenotypes can be explained by regulatory variants. Rapid technological advances have helped biomedical investigators generate huge amounts of biological data, including genome-wide DNA variation, tissue-specific gene expression, and electronic medical records. To meet the great challenges of analyzing such large and heterogeneous datasets, in this proposal we combine statistical genetics, bioinformatics, and phenotype data mining to develop novel analytical strategies that maximally leverage information from both genotype and expression, aiming to predict phenotype and disease risk.