Activation of eukaryotic gene transcription involves the coordination of a multitude of transcription factors and cofactors on regulatory DNA sequences such as promoters and enhancers and on the chromatin structure containing these elements. Therefore, identification of these regulatory DNA elements is of utmost importance for understanding gene regulation in both healthy and diseased cells. Previous studies have demonstrated that many characteristic epigenetic modifications occur at regulatory DNA elements, e.g., high levels of histone acetylation at gene promoters and at many enhancers. In addition, it is known that many regulatory elements carry these epigenetic modifications only in specific cell or tissue types or under specific environmental conditions. In recent years, a vast amount of genome-wide histone modification data has been generated using chromatin immunoprecipitation coupled with microarrays (ChIP-chip) or with next-generation sequencing technologies (ChIP-Seq). Currently, there is a pressing need for computational methods to analyze genome-wide histone modification data in order to identify functional DNA elements. The goal of the proposed research is to develop a novel computational method to identify transcriptional regulatory elements on the basis of their epigenetic characteristics. In the field of machine learning, it is well established that meaningful statistical features extracted from raw data can reveal more relevant information and increase the prediction accuracy of a classifier. We hypothesize that by introducing efficient data transformation and feature extraction procedures before classification, we can increase the overall prediction accuracy of our method for identifying transcriptional regulatory elements.
We propose to test the aforementioned hypothesis by pursuing the following three aims: (1) We will adopt well-established measures from signal processing to design and test a set of statistical features that could give us a better representation of signals in histone modification data. (2) We will evaluate the performance of several commonly used statistical classifiers in predicting enhancers. We will then develop a software tool combining the most informative features from Aim 1 with the optimal classifier. (3) We will apply our computational method to predict novel enhancers in mouse embryonic stem cells and human T cells using genome-wide histone modification maps in these two cell types. We will use both computational and experimental approaches to evaluate the accuracy of our predictions. Although in this project we focus on enhancers, the approach we develop can be readily extended to discover other types of functional DNA elements using histone modification data in different organisms and cell types.
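The feature-then-classify idea behind Aims 1 and 2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature set (mean, variance, signal power, peak height, roughness), the nearest-centroid classifier, the function names, and the toy binned read counts are hypothetical and are not the features, classifiers, or data of the proposed method.

```python
import math
from statistics import mean, pvariance


def extract_features(signal):
    """Summarize one window of binned histone-modification signal.

    The five statistics are illustrative choices, not the proposal's feature set.
    """
    n = len(signal)
    power = sum(x * x for x in signal) / n  # mean signal power
    # Mean absolute first difference: a rough measure of how jagged the signal is.
    roughness = sum(abs(b - a) for a, b in zip(signal, signal[1:])) / (n - 1)
    return [mean(signal), pvariance(signal), power, max(signal), roughness]


def train(windows, labels):
    """Fit a nearest-centroid model (a stand-in for the classifiers to be compared)."""
    by_label = {}
    for window, label in zip(windows, labels):
        by_label.setdefault(label, []).append(extract_features(window))
    # One centroid per class: the per-feature mean over that class's windows.
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}


def predict(model, window):
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(window)
    return min(model, key=lambda label: math.dist(features, model[label]))


# Hypothetical toy data: peaked windows versus flat background windows.
enhancer_like = [[1, 2, 8, 9, 8, 2, 1], [0, 3, 9, 10, 7, 3, 1]]
background = [[1, 1, 2, 1, 1, 2, 1], [0, 1, 1, 2, 1, 1, 0]]
model = train(enhancer_like + background,
              ["enhancer", "enhancer", "background", "background"])
```

With this toy model, `predict(model, [1, 2, 7, 9, 9, 2, 0])` assigns a peaked window to the enhancer class. In practice the features would likely need scaling before any distance-based comparison, and the choice of classifier would be made empirically, as Aim 2 describes.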
Transcriptional enhancers play an essential role in establishing tissue- and developmental-stage-specific gene expression patterns, which are central to understanding development, cellular responses to environmental and genetic perturbations, and the molecular basis of many diseases. The proposed research will lead to the development of a novel computational tool to discover enhancer elements using genome-wide chromatin signatures. Successful completion of the project will also uncover novel enhancers in two biomedically important cell types, embryonic stem cells and T lymphocytes, which could generate new insights into the regulatory networks controlling stem cell phenotype and T cell development and activation.
|Teng, Li; Tan, Kai (2012) Finding combinatorial histone code by semi-supervised biclustering. BMC Genomics 13:301|
|Teng, Li; Firpi, Hiram A; Tan, Kai (2011) Enhancers in embryonic stem cells are enriched for transposable elements and genetic variations associated with cancers. Nucleic Acids Res 39:7371-9|
|Ucar, Duygu; Hu, Qingyang; Tan, Kai (2011) Combinatorial chromatin modification patterns in the human genome revealed by subspace clustering. Nucleic Acids Res 39:4063-75|
|Firpi, Hiram A; Ucar, Duygu; Tan, Kai (2010) Discover regulatory DNA elements using chromatin signatures and artificial neural network. Bioinformatics 26:1579-86|