While accurate annotations of protein-coding regions in the human genome have been available for many years, annotation and interpretation of regulatory sequences have lagged far behind. This is because, in contrast to protein-coding sequences, the "rules" that govern links from genome sequence to regulatory function are fuzzy, complex, and highly context-specific. Our limited understanding of regulatory regions presents a fundamental challenge for the identification and interpretation of disease variation, especially in the context of personal genome interpretation. ENCODE and other groups have started to close this gap through experimental work, including high-resolution maps of regulatory sites in a variety of cell types, and through modeling of the cell-type-specific mappings from genome sequence to regulatory function. In our funded project so far, we have developed new computational tools to understand gene regulation and how it may be affected by genetic variation, as well as new methods for high-throughput validation. In the Supplement year, we propose to extend this work with additional projects in this area, including work on zinc finger proteins; connections between genetic variation, RNA expression, and GWAS; and, finally, high-throughput CRISPR-based validation experiments.
The purpose of this project is to develop powerful new computational methods to understand and predict the identity and function of gene regulatory sequences in diverse cell types. We will use these new methods to help us interpret common and rare genetic variation, and to identify variants that may contribute to disease. Outputs from the project will include new methods, software, and functional validation data.