Interpreting human enhancer variants with a network-regularized composite model

Zhang, Zhengdong

Abstract

The development of computational methods for interpreting sequence variants in the non-protein-coding regions of the human genome has lagged behind the ability to generate large volumes of genome-wide association study (GWAS) and whole-genome sequencing (WGS) data. In this project, we will develop innovative computational methods based on rigorous statistical modeling to integrate a large number of heterogeneous genomic data sets from diverse sources and identify non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits. Because of their genomic prevalence and functional importance, we will focus this proposed research on the specific class of genomic sites known as enhancers. By focusing on enhancers, we are able to develop rigorous statistical methodologies that can be extensively validated via experimental methods. The long-term goal is to accurately predict the sequence variants that confer a phenotypic effect. The objective in this particular application is to develop computational methods that analyze genomic data to identify a set of non-coding variants that are candidates for affecting organismal function and leading to disease risk or other traits. While our methods are intended to handle non-coding variants in different classes of sites identified in human genomes, in this application we will focus on the phenotypic effects of variants in enhancers. Our central hypotheses are that i) the majority of functionally important, disease- and trait-associated variants in non-coding regions occur within enhancer regions, and ii) these variants not only alter enhancer actions on adjacent coding target genes, but also disrupt regulatory networks of enhancer interactions, leading to changes in broader programs of transcriptional regulation.
These hypotheses have been formulated on the basis of our own preliminary data produced in the 9p21 gene desert, which is linked to specific types of cancer, cardiovascular disease, and type 2 diabetes, and is a locus where we have already made contributions linking GWAS data to a mechanistic understanding of specific enhancer functions. Guided by strong preliminary data, these hypotheses will be tested by pursuing two specific aims: 1) to predict causal enhancer variants by statistical modeling with biological networks; and 2) to experimentally validate the computational predictions. The approach is innovative because, unlike other software tools for analyzing sequence variants (e.g., RegulomeDB and FunSeq), it integrates a large number of heterogeneous genomic data sets from diverse sources and incorporates rigorous statistical modeling of biological networks. The proposed research is significant because, by incorporating both genotypic and phenotypic information on genetic diseases and traits, our methods will be able to identify potential functional connections between non-coding variants and phenotypes and facilitate a targeted analysis of whole-genome sequence data for disease risk assessment.

Public Health Relevance

The proposed research is relevant to public health because the successful completion of the proposed method development will make it possible to identify or substantially narrow the set of non-coding variants that are candidates for affecting organismal function leading to disease risk or other traits, and thus generate testable hypotheses about the genetic etiology of the diseases and traits. Such methods are also needed for targeted analyses of whole-genome sequence data for disease risk assessment. Thus, the proposed research is relevant to the part of NIH's mission that pertains to developing fundamental knowledge that will help to reduce the burdens of human disability.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG008153-01A1
Application #: 9072214
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer: Pazin, Michael J

Project Start: 2016-09-12
Project End: 2019-07-31
Budget Start: 2016-09-12
Budget End: 2017-07-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $832,550
Indirect Cost: $244,550

Institution

Name: Albert Einstein College of Medicine, Inc
DUNS #: 079783367

City: Bronx
State: NY
Country: United States
Zip Code: 10461

Related projects


NIH 2018 R01 HG	Interpreting human enhancer variants with a network-regularized composite model Zhang, Zhengdong D. / Albert Einstein College of Medicine, Inc
NIH 2018 R01 HG	Interpreting human enhancer variants with a network-regularized composite model Zhang, Zhengdong D. / Albert Einstein College of Medicine
NIH 2017 R01 HG	Interpreting human enhancer variants with a network-regularized composite model Zhang, Zhengdong D. / Albert Einstein College of Medicine, Inc
NIH 2016 R01 HG	Interpreting human enhancer variants with a network-regularized composite model Zhang, Zhengdong D. / Albert Einstein College of Medicine, Inc	$832,550

Publications

Wang, Zhen; Zhang, Quanwei; Zhang, Wen et al. (2018) HEDD: Human Enhancer Disease Database. Nucleic Acids Res 46:D113-D120

Lin, Jhih-Rong; Jaroslawicz, Daniel; Cai, Ying et al. (2018) PGA: post-GWAS analysis for disease gene identification. Bioinformatics 34:1786-1788

Cai, Ying; Lin, Jhih-Rong; Zhang, Quanwei et al. (2018) Epigenetic alterations to Polycomb targets precede malignant transition in a mouse model of breast cancer. Sci Rep 8:5535

Lin, Jhih-Rong; Zhang, Quanwei; Cai, Ying et al. (2017) Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies. PLoS Genet 13:e1007142
