Every cell type within an individual shares essentially the same DNA sequence, yet different cell types can have very different functions. Epigenetic marks on the DNA, and on the tails of the histone proteins around which the DNA is wrapped, modulate how that sequence is interpreted in different cell types. Massively parallel sequencing has enabled genome-wide mapping of multiple epigenetic marks across a number of cell types. This research will advance computational approaches for modeling and analyzing epigenomic data along several important new dimensions. One dimension involves developing computational methods for modeling epigenomic changes over time. A number of important biological processes are now being studied by mapping multiple epigenetic marks at multiple time points, raising the need for new computational approaches to model and analyze these data. A second dimension involves computational approaches for predicting the genome-wide signal of an epigenetic mark in cell types in which that mark has not been mapped. This is a key computational problem because it is infeasible to experimentally map every epigenetic mark in every cell type of interest. A third dimension of the project involves computational methods that associate putative distal regulatory elements, identified from epigenomic maps, with the target genes they regulate by integrating multiple data sources that are each partially informative about these interactions.
As part of this CAREER project, Jason Ernst will contribute to the creation of a computational biosciences collaboratory at UCLA, aimed both at increasing the training of biological researchers in computational approaches and at facilitating collaborations between computational and experimental researchers. A new research-focused graduate seminar covering topics in computational epigenomics will be created. Course projects based on this research will also be developed for an undergraduate bioinformatics course. Undergraduate students, under-represented minority students, and women computer science students will be recruited to participate in the research. All instructional material created will be disseminated on the internet. Computational methods developed under this project will be broadly disseminated as open-source software and are expected to be used by a wide range of biological researchers for a number of important biological applications.