The deluge of genome sequencing and functional genomic data, collected in multiple cellular contexts across healthy and diseased individuals, provides a unique opportunity to decipher the regulatory and genetic architecture of diseases and traits. Novel computational methods are required to address fundamental problems in data representation, data integration, learning accurate predictive models from large-scale datasets, and extracting novel biological insights from complex models. We propose novel machine learning frameworks based on deep neural networks, with new interpretation and hypothesis generation engines, capable of integrating a wide variety of key genomic data types to learn predictive models of chromatin architecture and chromatin state; integrative models of transcription factor binding; determinants of macroscale three-dimensional genome architecture involving long-range chromatin contacts; and the regulatory basis of functional non-coding regulatory variants. Our methods are highly generalizable to several other related problems in regulatory genomics and lay the foundation for a paradigm shift in computational genomics.
We propose a novel class of machine learning methods based on deep neural networks to integrate diverse sources of functional genomics data and discover novel insights into the 1D microscale and 3D macroscale genetic and regulatory architecture of the human genome.
Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15:
Greenside, Peyton; Shimko, Tyler; Fordyce, Polly et al. (2018) Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics 34:i629-i637
Marinov, Georgi K; Kundaje, Anshul (2018) ChIP-ping the branches of the tree: functional genomics and the evolution of eukaryotic gene regulation. Brief Funct Genomics 17:116-137
Yang, Dian; Denny, Sarah K; Greenside, Peyton G et al. (2018) Intertumoral Heterogeneity in SCLC Is Influenced by the Cell Type of Origin. Cancer Discov 8:1316-1331
Greenside, Peyton; Hillenmeyer, Maureen; Kundaje, Anshul (2018) Prediction of protein-ligand interactions from paired protein sequence motifs and ligand substructures. Pac Symp Biocomput 23:20-31
Ursu, Oana; Boley, Nathan; Taranova, Maryna et al. (2018) GenomeDISCO: a concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics 34:2701-2707
Koh, Pang Wei; Pierson, Emma; Kundaje, Anshul (2017) Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 33:i225-i233