The deluge of genome sequencing and functional genomic data across multiple cellular contexts in healthy and diseased individuals provides a unique opportunity to decipher the regulatory and genetic architecture of diseases and traits. Novel computational methods are required to address fundamental problems in data representation, data integration, learning accurate predictive models from large-scale datasets, and extracting novel biological insights from complex models. We propose novel machine learning frameworks based on deep neural networks, with new interpretation and hypothesis generation engines, capable of integrating a wide variety of key genomic data types to learn: predictive models of chromatin architecture and chromatin state; integrative models of transcription factor binding; determinants of macroscale three-dimensional genome architecture involving long-range chromatin contacts; and the regulatory basis of functional non-coding regulatory variants. Our methods are highly generalizable to several other related problems in regulatory genomics and lay the foundation for a paradigm shift in computational genomics.

Public Health Relevance

We propose a novel class of machine learning methods based on deep neural networks to integrate diverse sources of functional genomics data and discover novel insights into the 1D microscale and 3D macroscale genetic and regulatory architecture of the human genome.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: NIH Director’s New Innovator Awards (DP2)
Project #: 1DP2GM123485-01
Application #: 9169521
Study Section: Special Emphasis Panel (ZRG1-MOSS-C (56)R)
Program Officer: Gregurick, Susan
Project Start: 2016-09-30
Project End: 2021-05-31
Budget Start: 2016-09-30
Budget End: 2021-05-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $2,355,000
Indirect Cost: $855,000

Name: Stanford University
Department: Genetics
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94304

Koh, Pang Wei; Pierson, Emma; Kundaje, Anshul (2017) Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 33:i225-i233