Modern studies of the genetic architecture underlying human complex traits and diseases generally fall into three association designs: association between genetic variants and disease, association between genetic variants and gene expression (e.g., expression quantitative trait loci, eQTL), and association between gene expression and disease. These studies have produced many promising findings, including thousands of single nucleotide polymorphisms associated with common diseases. While these findings provide valuable insights into the genetic architecture of common diseases and the heritability shared among them, what is missing are the mechanisms: the exact causal variants, the direction of their effects, and the order of events. Filling these gaps is the foundational goal of the studies in this proposal. Inspired by recent discoveries that a substantial fraction of disease-associated genetic variants are located in regulatory regions, we combine bioinformatics, statistical genetics, precision medicine, and phenotype and electronic medical record (EMR) data mining to develop novel analytical strategies that maximally leverage regulatory information from both genotype and expression, aiming to predict phenotype from transcriptomic alterations together with DNA variation. We propose the following three major aims. (1) To build a unified genetic model for phenotype prediction by combining genetic and transcriptomic associations. Functional and regulatory annotation data generated by ENCODE, FANTOM5, GENCODE, the Epigenomic Roadmap, and GTEx will be incorporated to infer an important endophenotype, the genetically determined expression component, for better prediction of phenotype or disease outcome.
(2) To develop a maximum-likelihood-based link test and a phenotype-specific regulatory network approach to resolve causal genotype-phenotype relationships mediated by gene expression. (3) To extensively evaluate the approaches in schizophrenia and apply them to a broad range of phenotypes using the Vanderbilt biobank (BioVU) genotype data and linked electronic medical data. Building on our previous studies and strong preliminary data, this proposal is timely for studying the genetic architecture of human complex diseases and traits by dissecting the genetic components contributed by the regulatory roles of variants at the gene expression level. It is highly significant because it tackles major limitations of genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies in inferring causality and translational potential in the emerging field of precision medicine. The successful completion of this project will not only advance our understanding of the genetic components of schizophrenia and a broad spectrum of phenotypes and clinical outcomes, but also provide the public with useful methods and tools for studying the genetic architecture of phenotypes by linking genomic and medical information.

Public Health Relevance

Recent studies have shown that regulatory variants explain a large portion of the variability in disease risk across a broad spectrum of disease phenotypes. Rapid technological advances have helped biomedical investigators generate huge amounts of biological data, including genome-wide DNA variation, tissue-specific gene expression, and electronic medical records. To meet the great challenges of analyzing such large and heterogeneous datasets, in this proposal we combine statistical genetics, bioinformatics, and phenotype data mining to develop novel analytical strategies that maximally leverage information from both genotype and expression, aiming to predict phenotype and disease risk.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Ye, Jane
University of Texas Health Science Center Houston
Sch Allied Health Professions
United States
Ma, Liang; Jia, Peilin; Zhao, Zhongming (2018) Splicing QTL of human adipose-related traits. Sci Rep 8:318
Sun, Hua; Kim, Pora; Jia, Peilin et al. (2018) Distinct telomere length and molecular signatures in seminoma and non-seminoma of testicular germ cell tumor. Brief Bioinform :
Jia, Peilin; Chen, Xiangning; Xie, Wei et al. (2018) Mega-analysis of Odds Ratio: A Convergent Method for a Deep Understanding of the Genetic Evidence in Schizophrenia. Schizophr Bull :
Kim, Pora; Jia, Peilin; Zhao, Zhongming (2018) Kinase impact assessment in the landscape of fusion genes that retain kinase domains: a pan-cancer study. Brief Bioinform 19:450-460
Knijnenburg, Theo A; Wang, Linghua; Zimmermann, Michael T et al. (2018) Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep 23:239-254.e6
Jia, Peilin; Chen, Xiangning; Fanous, Ayman H et al. (2018) Convergent roles of de novo mutations and common variants in schizophrenia in tissue-specific and spatiotemporal co-expression network. Transl Psychiatry 8:105
Jiang, Xingwu; Lu, Weiqiang; Shen, Xiaoyang et al. (2018) Repurposing sertraline sensitizes non-small cell lung cancer cells to erlotinib by inducing autophagy. JCI Insight 3:
O'Brien, Timothy D; Jia, Peilin; Caporaso, Neil E et al. (2018) Weak sharing of genetic association signals in three lung cancer subtypes: evidence at the SNP, gene, regulation, and pathway levels. Genome Med 10:16
Zhao, Junfei; Cheng, Feixiong; Jia, Peilin et al. (2018) An integrative functional genomics framework for effective identification of novel regulatory variants in genome-phenome studies. Genome Med 10:7
O'Brien, Timothy D; Jia, Peilin; Aldrich, Melinda C et al. (2018) Lung Cancer: One Disease or Many. Hum Hered 83:65-70
