The deluge of genome sequencing and functional genomic data collected in multiple cellular contexts across healthy and diseased individuals provides a unique opportunity to decipher the regulatory and genetic architecture of diseases and traits. Novel computational methods are required to address fundamental problems in data representation, data integration, learning accurate predictive models from large-scale datasets, and extracting novel biological insights from complex models. We propose novel machine learning frameworks based on deep neural networks, with new interpretation and hypothesis generation engines, capable of integrating a wide variety of key genomic data types to learn predictive models of chromatin architecture and chromatin state; integrative models of transcription factor binding; determinants of macroscale three-dimensional genome architecture involving long-range chromatin contacts; and the regulatory basis of functional non-coding regulatory variants. Our methods are generalizable to several other related problems in regulatory genomics and lay the foundation for a paradigm shift in computational genomics.
We propose a novel class of machine learning methods based on deep neural networks to integrate diverse sources of functional genomics data and discover new insights into the 1D microscale and 3D macroscale genetic and regulatory architecture of the human genome.
Koh, Pang Wei; Pierson, Emma; Kundaje, Anshul (2017) Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 33:i225-i233