Machine learning methods to impute and annotate epigenomic maps

Noble, William

Abstract

The NIH Roadmap Epigenomics Program has produced reference epigenomic maps derived from a variety of human primary cells and tissues, including pluripotent cell types and in vitro differentiated forms, highly purified primary cells, and a range of fetal and adult tissues. The goal of the proposed project is to develop, validate and apply unsupervised machine learning methods to the joint analysis of these epigenomic maps along with (1) data generated by the NIH ENCODE Consortium, (2) a variety of publicly available data sets that characterize the three-dimensional structure of DNA in the nucleus, and (3) information about evolutionary conservation, represented by cross-species DNA alignments.
The first aim of the project will use data imputation methods to carry out virtual functional genomics experiments. The proposed method is based on techniques developed in the context of recommender systems, but is extended to model dependencies along the genomic axis. By simultaneously analyzing the pattern of biochemical activity across a range of cell types and assay types, the proposed imputation method will accurately predict the results of an assay, such as ChIP-seq for a particular histone modification in a particular cell type, that has not yet been carried out. We will systematically apply this method to Roadmap Epigenomics and ENCODE data, filling in missing experiments in the matrix of cell types and assay types.
The remaining three specific aims extend and apply our existing system for semi-automated genome annotation, Segway, which integrates a wide variety of functional genomics data into a human-interpretable labeling of genomic elements. These analyses will be performed on real data as well as the virtual experiments from Aim 1. We propose a novel, graph-based regularization scheme and show how, using this approach, we can use Segway to perform integrated analysis of data across cell types and integrate 3D genome architecture information from assays such as Hi-C. We also propose a post-processing method to exploit patterns of evolutionary conservation to identify functionally important labels in the resulting annotations. The primary deliverables will include novel software for imputation and annotation, as well as publicly available sets of virtual experiments and genome annotations.
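As a rough illustration of the recommender-system idea behind the first aim, the sketch below fills in missing entries of a small cell type by assay matrix with a generic low-rank factorization fitted by gradient descent. It is not the project's method: the function name `impute_low_rank`, its parameters, and the single-position matrix layout are assumptions made for this example, and it does not model dependencies along the genomic axis.

```python
import numpy as np


def impute_low_rank(matrix, observed, rank=2, steps=5000, lr=0.01, reg=0.01, seed=0):
    """Fill unobserved entries of a (cell types x assays) matrix.

    Hypothetical helper for illustration only.
    matrix   : 2D float array; values where `observed` is False are ignored
    observed : boolean array of the same shape, True where an experiment exists
    Returns the dense low-rank reconstruction (observed and imputed entries).
    """
    rng = np.random.default_rng(seed)
    n_cells, n_assays = matrix.shape
    # One latent vector per cell type and one per assay, as in a recommender system.
    cell_factors = 0.1 * rng.standard_normal((n_cells, rank))
    assay_factors = 0.1 * rng.standard_normal((n_assays, rank))
    for _ in range(steps):
        prediction = cell_factors @ assay_factors.T
        # Only observed experiments contribute to the squared-error loss.
        residual = np.where(observed, prediction - matrix, 0.0)
        grad_cells = residual @ assay_factors + reg * cell_factors
        grad_assays = residual.T @ cell_factors + reg * assay_factors
        cell_factors -= lr * grad_cells
        assay_factors -= lr * grad_assays
    return cell_factors @ assay_factors.T
```

Calling `impute_low_rank(signal, observed, rank=1)` on a matrix with one masked entry returns a dense matrix whose value at the masked position serves as the "virtual experiment". Real signal tracks would add a genomic-position dimension and would need tuning of `rank`, `lr`, and `reg`.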

Public Health Relevance

The NIH has recently expended substantial effort to generate raw data that characterizes the human epigenome across a variety of cell types. This proposal uses machine learning methods to help make sense of this large collection of epigenomic maps, combining the maps with data generated by the NIH ENCODE Consortium, information about the 3D structure of DNA, and information about evolutionary conservation. The project will produce novel computational methods as well as two primary analysis products: virtual experiments for combinations of assays and cell types that have not yet been carried out and annotations that identify various types of biochemical and functional activity along the human genome.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Research Project (R01)
Project #: 1R01ES024917-01
Application #: 8814095
Study Section: Special Emphasis Panel (ZRG1-IMST-R (51))
Program Officer: Chadwick, Lisa

Project Start: 2014-09-10
Project End: 2016-08-31
Budget Start: 2014-09-10
Budget End: 2015-08-31
Support Year: 1
Fiscal Year: 2014
Total Cost: $285,064
Indirect Cost: $85,064

Institution

Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2015 R01 ES	Machine learning methods to impute and annotate epigenomic maps Noble, William Stafford / University of Washington	$282,942
NIH 2014 R01 ES	Machine learning methods to impute and annotate epigenomic maps Noble, William Stafford / University of Washington	$285,064

Publications

Libbrecht, Maxwell W; Ay, Ferhat; Hoffman, Michael M et al. (2015) Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression. Genome Res 25:544-57
