The development of computational methods for interpreting sequence variants in the non-protein-coding regions of the human genome has lagged behind the ability to generate large volumes of genome-wide association study (GWAS) and whole-genome sequencing (WGS) data. In this project, we will develop innovative computational methods based on rigorous statistical modeling to integrate a large number of heterogeneous genomic data sets from diverse sources to identify non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits. Because enhancers are prevalent in the genome and functionally important, we will focus this proposed research on this specific class of genomic sites. By focusing on enhancers, we are able to develop rigorous statistical methodologies that can be extensively validated via experimental methods. The long-term goal is to accurately predict the sequence variants that confer a phenotypic effect. The objective in this particular application is to develop computational methods that analyze genomic data to identify a set of non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits. While our methods are intended to handle non-coding variants in different classes of sites identified in human genomes, in this application we will focus on phenotypic effects of variants in enhancers, based on our central hypotheses that i) the majority of functionally important, disease- and trait-associated variants in non-coding regions occur within enhancer regions, and ii) these variants not only alter enhancer actions on adjacent coding target genes, but also disrupt regulatory networks of enhancer interactions, leading to changes in broader programs of transcriptional regulation.
These hypotheses have been formulated on the basis of our own preliminary data produced in the 9p21 gene desert, which is linked to specific types of cancer, cardiovascular disease, and type 2 diabetes, and is a locus where we have already made contributions linking GWAS data to a mechanistic understanding of specific enhancer functions. Guided by strong preliminary data, these hypotheses will be tested by pursuing two specific aims: 1) to predict causal enhancer variants by statistical modeling with biological networks; 2) to experimentally validate the computational predictions. The approach is innovative because, unlike other software tools for analyzing sequence variants (e.g., RegulomeDB and FunSeq), it integrates a large number of heterogeneous genomic data sets from diverse sources and incorporates rigorous statistical modeling of biological networks. The proposed research is significant because, by incorporating both genotypic and phenotypic information of genetic diseases and traits, our methods will be able to identify potential functional connections between non-coding variants and phenotypes, and facilitate a targeted analysis of whole-genome sequence data for disease risk assessment.

Public Health Relevance

The proposed research is relevant to public health because the successful completion of the proposed method development will make it possible to identify or substantially narrow the set of non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits, and thus to generate testable hypotheses about the genetic etiology of these diseases and traits. Such methods are also needed for targeted analyses of whole-genome sequence data for disease risk assessment. Thus, the proposed research is relevant to the part of NIH's mission that pertains to developing fundamental knowledge that will help to reduce the burdens of human disability.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG008153-01A1
Application #
9072214
Study Section
Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer
Pazin, Michael J
Project Start
2016-09-12
Project End
2019-07-31
Budget Start
2016-09-12
Budget End
2017-07-31
Support Year
1
Fiscal Year
2016
Total Cost
$832,550
Indirect Cost
$244,550
Name
Albert Einstein College of Medicine, Inc
Department
Type
DUNS #
079783367
City
Bronx
State
NY
Country
United States
Zip Code
10461
Cai, Ying; Lin, Jhih-Rong; Zhang, Quanwei et al. (2018) Epigenetic alterations to Polycomb targets precede malignant transition in a mouse model of breast cancer. Sci Rep 8:5535
Wang, Zhen; Zhang, Quanwei; Zhang, Wen et al. (2018) HEDD: Human Enhancer Disease Database. Nucleic Acids Res 46:D113-D120
Lin, Jhih-Rong; Jaroslawicz, Daniel; Cai, Ying et al. (2018) PGA: post-GWAS analysis for disease gene identification. Bioinformatics 34:1786-1788
Lin, Jhih-Rong; Zhang, Quanwei; Cai, Ying et al. (2017) Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies. PLoS Genet 13:e1007142