While accurate annotations of protein-coding regions in the human genome have been available for many years, annotation and interpretation of regulatory sequences have lagged far behind. This is because, in contrast to protein-coding sequences, the "rules" that govern links from genome sequence to regulatory function are fuzzy, complex, and highly context-specific. Our limited understanding of regulatory regions presents a fundamental challenge for the identification and interpretation of disease variation, especially in the context of personal genome interpretation. Work from ENCODE and other groups has started to close this gap through experimental work, including high-resolution maps of regulatory sites in a variety of cell types, and modeling of the cell-type-specific mappings from genome sequence to regulatory function. In this project we will develop a suite of new tools that uses these diverse new data sets to tackle these problems. We will implement and apply powerful new machine learning methods (based on deep learning) to interpret the genomic, context-specific encoding of regulatory information, and to identify genetic variants that impact the encoded information. We will build models using data from a variety of sources including ENCODE, Roadmap Epigenomics, GTEx, and regulatory variation in the HapMap cell lines, as well as from disease cohorts. Validation experiments will be performed using a new high-complexity CRISPR/Cas9 system developed by our team. We will develop software tools and analytical results that can be widely used for genome interpretation, especially in analysis of personal genomes.
By the end of this study we expect to have: (1) developed powerful new computational models for predicting regulatory function in a wide variety of cell types, at unprecedented resolution; (2) implemented novel validation screens in native chromatin at extremely high throughput; and (3) developed new tools for interpreting common and rare regulatory variation, with particular focus on identification of high-impact regulatory mutations in personal genomes. We are committed to timely release of software, data and analyses, and to working with the ENCODE Consortium to increase the impact of data and analyses from all study sites.
The purpose of this project is to develop powerful new computational methods to understand and predict the identity and function of gene regulatory sequences in diverse cell types. We will use these new methods to help us interpret common and rare genetic variation, and to identify variants that may contribute to disease. Outputs from the project will include new methods, software and functional validation data.
Ursu, Oana; Boley, Nathan; Taranova, Maryna et al. (2018) GenomeDISCO: a concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics 34:2701-2707
Liu, Boxiang; Pjanic, Milos; Wang, Ting et al. (2018) Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci. Am J Hum Genet 103:377-388
Li, Yang I; Knowles, David A; Humphrey, Jack et al. (2018) Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet 50:151-158
Yamamoto, Ryo; Wilkinson, Adam C; Ooehara, Jun et al. (2018) Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell 22:600-607.e4
Knowles, David A; Burrows, Courtney K; Blischak, John D et al. (2018) Determining the genetic basis of anthracycline-cardiotoxicity by molecular response QTL mapping in induced cardiomyocytes. Elife 7:
Harpak, Arbel; Lan, Xun; Gao, Ziyue et al. (2017) Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates. Proc Natl Acad Sci U S A 114:12779-12784