The deluge of genome sequencing and functional genomic data across multiple cellular contexts in healthy and diseased individuals provides a unique opportunity to decipher the regulatory and genetic architecture of diseases and traits. Novel computational methods are required to address fundamental problems in data representation, data integration, learning accurate predictive models from large-scale datasets, and extracting novel biological insights from complex models. We propose novel machine learning frameworks based on deep neural networks, with new interpretation and hypothesis generation engines, capable of integrating a wide variety of key genomic data types to learn predictive models of chromatin architecture and chromatin state; integrative models of transcription factor binding; determinants of macroscale three-dimensional genome architecture involving long-range chromatin contacts; and the regulatory basis of functional non-coding regulatory variants. Our methods are highly generalizable to several other related problems in regulatory genomics and lay the foundation for a paradigm shift in computational genomics.

Public Health Relevance

We propose a novel class of machine learning methods based on deep neural networks to integrate diverse sources of functional genomics data and discover novel insights into the 1D microscale and 3D macroscale genetic and regulatory architecture of the human genome.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: NIH Director’s New Innovator Awards (DP2)
Project #: 1DP2GM123485-01
Application #: 9169521
Study Section: Special Emphasis Panel (ZRG1-MOSS-C (56)R)
Program Officer: Gregurick, Susan
Project Start: 2016-09-30
Project End: 2021-05-31
Budget Start: 2016-09-30
Budget End: 2021-05-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $2,355,000
Indirect Cost: $855,000

Name: Stanford University
Department: Genetics
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94304

Greenside, Peyton; Shimko, Tyler; Fordyce, Polly et al. (2018) Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics 34:i629-i637
Marinov, Georgi K; Kundaje, Anshul (2018) ChIP-ping the branches of the tree: functional genomics and the evolution of eukaryotic gene regulation. Brief Funct Genomics 17:116-137
Yang, Dian; Denny, Sarah K; Greenside, Peyton G et al. (2018) Intertumoral Heterogeneity in SCLC Is Influenced by the Cell Type of Origin. Cancer Discov 8:1316-1331
Greenside, Peyton; Hillenmeyer, Maureen; Kundaje, Anshul (2018) Prediction of protein-ligand interactions from paired protein sequence motifs and ligand substructures. Pac Symp Biocomput 23:20-31
Ursu, Oana; Boley, Nathan; Taranova, Maryna et al. (2018) GenomeDISCO: a concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics 34:2701-2707
Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15:
Koh, Pang Wei; Pierson, Emma; Kundaje, Anshul (2017) Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 33:i225-i233