Statistical Methods For Annotating Repetitive Genomic Regions Through ENCODE-deri

Bresnick, Emery; Dewey, Colin; Keles, Sunduz

Abstract

The ENCODE projects have generated a wealth of high-quality genomic datasets by applying high-throughput next-generation sequencing (NGS) to create a catalog of functional elements in the human and model organism genomes. Although the NGS technologies embraced by ENCODE enable interrogation of genomes in an unbiased manner, the data analysis efforts of the ENCODE projects have thus far focused on mappable regions of the genomes and thereby have not leveraged these data to their full advantage. A major bottleneck to a comprehensive understanding of data from the ENCODE projects is the lack of statistical and computational methods that can identify functional elements in repetitive regions. We will address this critical impediment in four specific aims by building on our expertise in ChIP-seq and RNA-seq analysis.
In Aim 1, we will develop probabilistic models and accompanying software for utilizing reads that map to multiple locations on the genome (multi-reads) from multiple types of *-seq datasets (ChIP-, DNase-, MeDIP-, and FAIRE-seq). This will enable cataloging of regulatory elements in repetitive regions.
In Aim 2, we will improve the specificity of the discoveries in repetitive regions from our probabilistic models by utilizing multiple related *-seq datasets simultaneously. Specifically, we will devise methods to supervise analysis of ChIP- and RNA-seq datasets by external ChIP-seq datasets. This will facilitate accurate inference for repetitive elements with near-identical sequences, e.g., segmental duplications and long interspersed nuclear elements, and boost the accuracy of gene and isoform quantification with RNA-seq.
In Aim 3, we will focus on identifying co-occupied/enriched regions to infer cell-specific modules of regions/genes and their regulatory profiles. We will also develop a formal differential co-enrichment framework to study cell-specific wiring and interactions of regulatory factors. This will elucidate how interactions among regulatory factors vary across cells/tissues/conditions.
In Aim 4, we will apply our methods from Aims 1-3 to relevant ENCODE data to understand GATA factor functions in hematopoiesis and vascular biology. The GATA system in human and mouse will serve as a training and validation platform for our methods. Statistical and computational resources generated from the project, which will be disseminated as modular and robust software, will help to enhance and maximize the impact of ENCODE-derived data on the biomedical research community.

Public Health Relevance

The ENCODE projects have generated a wealth of high-quality functional genomic datasets by applying high-throughput next-generation sequencing (NGS) to create a catalog of functional elements in the human and model organism genomes. A central limitation to a comprehensive understanding of these ENCODE data from the standpoint of development, differentiation, and disease is the lack of statistical and computational methods that can identify functional elements in repetitive regions of the genomes. In this proposal, we will develop statistical and computational methods that leverage ENCODE-derived data to their full advantage and catalog functional repetitive regions of the genomes.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG007019-03S1
Application #: 9060461
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Gilchrist, Daniel A

Project Start: 2012-09-17
Project End: 2016-06-30
Budget Start: 2015-07-01
Budget End: 2016-06-30
Support Year: 3
Fiscal Year: 2015

Institution

Name: University of Wisconsin Madison
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 161202122

City: Madison
State: WI
Country: United States
Zip Code: 53715

Related projects


NIH 2015 U01 HG	Statistical Methods For Annotating Repetitive Genomic Regions Through ENCODE-deri Bresnick, Emery H.; Dewey, Colin Noel; Keles, Sunduz / University of Wisconsin Madison
NIH 2014 U01 HG	Statistical Methods For Annotating Repetitive Genomic Regions Through ENCODE-deri Bresnick, Emery H.; Dewey, Colin Noel; Keles, Sunduz / University of Wisconsin Madison
NIH 2013 U01 HG	Statistical Methods For Annotating Repetitive Genomic Regions Through ENCODE-deri Keles, Sunduz; Bresnick, Emery H.; Dewey, Colin Noel / University of Wisconsin Madison	$404,900
NIH 2012 U01 HG	Statistical Methods For Annotating Repetitive Genomic Regions Through ENCODE-deri Keles, Sunduz; Bresnick, Emery H.; Dewey, Colin Noel / University of Wisconsin Madison	$382,136

Publications

Mehta, Charu; Johnson, Kirby D; Gao, Xin et al. (2017) Integrating Enhancer Mechanisms to Establish a Hierarchical Blood Development Program. Cell Rep 20:2966-2979

Welch, Rene; Chung, Dongjun; Grass, Jeffrey et al. (2017) Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments. Nucleic Acids Res 45:e145

Shin, Sunyoung; Keleş, Sündüz (2017) Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data. Stat Biosci 9:50-72

Bernstein, Matthew N; Doan, AnHai; Dewey, Colin N (2017) MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bioinformatics 33:2914-2923

Kreimer, Anat; Zeng, Haoyang; Edwards, Matthew D et al. (2017) Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 38:1240-1250

Zhang, Qi; Zeng, Xin; Younkin, Sam et al. (2016) Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection. BMC Bioinformatics 17:96

Tanimura, Nobuyuki; Miller, Eli; Igarashi, Kazuhiko et al. (2016) Mechanism governing heme synthesis reveals a GATA factor/heme circuit that controls differentiation. EMBO Rep 17:249-65

Li, Sisi; Papale, Ligia A; Zhang, Qi et al. (2016) Genome-wide alterations in hippocampal 5-hydroxymethylcytosine links plasticity genes to acute stress. Neurobiol Dis 86:99-108

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J et al. (2016) A Hierarchical Framework for State-Space Matrix Inference and Clustering. Ann Appl Stat 10:1348-1372

Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H et al. (2016) Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq. Genome Res 26:1124-33

Showing the most recent 10 out of 22 publications
