The ENCODE projects have generated a wealth of high-quality genomic datasets by applying high-throughput next-generation sequencing (NGS) to create a catalog of functional elements in the human and model organism genomes. Although the NGS technologies embraced by ENCODE enable unbiased interrogation of genomes, the data analysis efforts of the ENCODE projects have thus far focused on the mappable regions of the genomes and therefore have not leveraged these data to their full advantage. A major bottleneck to a comprehensive understanding of ENCODE data is the lack of statistical and computational methods that can identify functional elements in repetitive regions. We will address this critical impediment in four specific aims by building on our expertise in ChIP-seq and RNA-seq analysis.
In Aim 1, we will develop probabilistic models and accompanying software for utilizing reads that map to multiple locations on the genome (multi-reads) from multiple types of *-seq datasets (ChIP-, DNase-, MeDIP-, and FAIRE-seq). This will enable cataloging of regulatory elements in repetitive regions.
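To make the multi-read idea in Aim 1 concrete, here is a minimal Python sketch of one widely used strategy, expectation-maximization (EM) allocation, in which each multi-read is split across its candidate loci in proportion to the read density those loci currently receive. The bin representation, the function name `allocate_multireads`, and the pseudocount are illustrative assumptions; the abstract does not specify the project's actual models, and this sketch is not presented as them.

```python
from typing import List


def allocate_multireads(read_candidates: List[List[int]], n_bins: int,
                        n_iter: int = 50,
                        pseudocount: float = 1.0) -> List[List[float]]:
    """Hypothetical EM-style fractional allocation of reads to genomic bins.

    read_candidates[i] lists the bin indices that read i aligns to equally
    well; a single entry means a uniquely mapped read. Returns, per read,
    the fraction of that read assigned to each of its candidate bins.
    """
    # Start by splitting every read uniformly across its candidate bins.
    weights = [[1.0 / len(c)] * len(c) for c in read_candidates]
    for _ in range(n_iter):
        # M-step: expected read count per bin (pseudocount keeps bins non-zero).
        counts = [pseudocount] * n_bins
        for cands, w in zip(read_candidates, weights):
            for b, wi in zip(cands, w):
                counts[b] += wi
        # E-step: reassign each read in proportion to the current bin counts.
        weights = []
        for cands in read_candidates:
            total = sum(counts[b] for b in cands)
            weights.append([counts[b] / total for b in cands])
    return weights


if __name__ == "__main__":
    # Three unique reads in bin 0, one in bin 2, one multi-read hitting bins 0 and 1.
    reads = [[0], [0], [0], [2], [0, 1]]
    print(allocate_multireads(reads, n_bins=3)[4])  # about [0.8, 0.2]
```

Because bin 0 is supported by unique reads and bin 1 is not, the multi-read drifts toward bin 0 over the iterations; in this toy input it settles at roughly 80% on bin 0 rather than the naive 50/50 split.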
In Aim 2, we will improve the specificity of discoveries in repetitive regions from our probabilistic models by analyzing multiple related *-seq datasets simultaneously. Specifically, we will devise methods that use external ChIP-seq datasets to supervise the analysis of ChIP-seq and RNA-seq datasets. This will facilitate accurate inference for repetitive elements with near-identical sequences, such as segmental duplications and long interspersed nuclear elements, and will boost the accuracy of gene and isoform quantification with RNA-seq.
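The supervision idea in Aim 2 can be illustrated, under simplifying assumptions, by letting an external ChIP-seq signal act as a prior when a multi-read is split among near-identical copies. In this hypothetical sketch, the function `prior_weighted_allocation`, the per-locus `prior` dictionary, and the multiplicative combination of prior and unique-read counts are all assumptions made for illustration, not the method proposed in this project.

```python
from typing import Dict, List, Sequence


def prior_weighted_allocation(candidates: Sequence[str],
                              unique_counts: Dict[str, float],
                              prior: Dict[str, float],
                              pseudocount: float = 1.0) -> List[float]:
    """Hypothetical split of one multi-read across candidate loci.

    Each locus is scored by (unique-read count + pseudocount) multiplied by a
    prior weight assumed to come from an external ChIP-seq dataset.
    """
    scores = [prior.get(c, 0.0) * (unique_counts.get(c, 0.0) + pseudocount)
              for c in candidates]
    total = sum(scores)
    if total == 0:
        # No prior information for any candidate: fall back to a uniform split.
        return [1.0 / len(candidates)] * len(candidates)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Two copies of a segmental duplication with no distinguishing unique reads;
    # the external signal favours dupA.
    print(prior_weighted_allocation(["dupA", "dupB"], {},
                                    {"dupA": 0.9, "dupB": 0.1}))
```

When the copies are indistinguishable by sequence, the unique-read counts carry no information, so the external prior alone breaks the tie; when unique reads do differ between loci, both sources of evidence contribute.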
In Aim 3, we will identify co-occupied or co-enriched regions to infer cell-specific modules of regions and genes together with their regulatory profiles. We will also develop a formal differential co-enrichment framework to study cell-specific wiring and interactions of regulatory factors. This will elucidate how interactions among regulatory factors vary across cells, tissues, and conditions.
In Aim 4, we will apply our methods from Aims 1-3 to relevant ENCODE data to understand GATA factor functions in hematopoiesis and vascular biology. The GATA system in human and mouse will serve as a training and validation platform for our methods. The statistical and computational resources generated by the project will be disseminated as modular and robust software and will help maximize the impact of ENCODE-derived data on the biomedical research community.

Public Health Relevance

The ENCODE projects have generated a wealth of high-quality functional genomic datasets by applying high-throughput next-generation sequencing (NGS) to create a catalog of functional elements in the human and model organism genomes. A central limitation to a comprehensive understanding of these ENCODE data in the context of development, differentiation, and disease is the lack of statistical and computational methods that can identify functional elements in repetitive regions of the genomes. In this proposal, we will develop statistical and computational methods that leverage ENCODE-derived data to their full advantage and catalog functional repetitive regions of the genomes.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01HG007019-03
Application #
8687990
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Gilchrist, Daniel A
Project Start
2012-09-17
Project End
2015-06-30
Budget Start
2014-07-01
Budget End
2015-06-30
Support Year
3
Fiscal Year
2014
Total Cost
Indirect Cost
Name
University of Wisconsin Madison
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
City
Madison
State
WI
Country
United States
Zip Code
53715
Li, Sisi; Papale, Ligia A; Zhang, Qi et al. (2016) Genome-wide alterations in hippocampal 5-hydroxymethylcytosine links plasticity genes to acute stress. Neurobiol Dis 86:99-108
Papale, Ligia A; Li, Sisi; Madrid, Andy et al. (2016) Sex-specific hippocampal 5-hydroxymethylcytosine is disrupted in response to acute stress. Neurobiol Dis 96:54-66
Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H et al. (2016) Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq. Genome Res 26:1124-33
Tanimura, Nobuyuki; Miller, Eli; Igarashi, Kazuhiko et al. (2016) Mechanism governing heme synthesis reveals a GATA factor/heme circuit that controls differentiation. EMBO Rep 17:249-65
Zhang, Qi; Zeng, Xin; Younkin, Sam et al. (2016) Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection. BMC Bioinformatics 17:96
Johnson, Kirby D; Kong, Guangyao; Gao, Xin et al. (2015) Cis-regulatory mechanisms governing stem and progenitor cell transitions. Sci Adv 1:e1500503
Hewitt, Kyle J; Kim, Duk Hyoung; Devadas, Prithvia et al. (2015) Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome. Mol Cell 59:62-74
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan et al. (2015) Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data. Cancer Inform 13:123-31
Zeng, Xin; Li, Bo; Welch, Rene et al. (2015) Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. PLoS Comput Biol 11:e1004491
Yao, Chen; Chen, Brian H; Joehanes, Roby et al. (2015) Integromic analysis of genetic variation and gene expression identifies networks for cardiovascular disease phenotypes. Circulation 131:536-49

Showing the most recent 10 out of 16 publications