Cell identity genes are a group of functionally linked genes that jointly implement the phenotype of a given cell type. A major constraint on the study of cell identity is the lack of a robust method to define the catalogue of identity genes for a cell type and to identify the master transcription factors that regulate the expression network of cell identity genes and drive cell identity specification. Intrigued by our recent discoveries, we hypothesize that cell identity genes can be identified using epigenetic features that manifest their distinct transcriptional regulation mechanism. We and several other groups discovered that cell identity genes display unique epigenetic features, e.g., broad H3K4me3 (Chen, et al, Nature Genetics, 2015) and super-enhancers. We showed that these features are associated with strong and stable transcription activation signals for cell identity genes in their associated cell type, but not in other cell types. Biologists have recently used super-enhancers or broad H3K4me3 as markers to nominate cell identity genes. However, this approach remains challenging for most biologists, as the required bioinformatics tools are not yet available. Our overall goal in this proposal is to extend the development of our computational epigenetic methods for cell identity gene discovery. Leveraging the early success of our bioinformatics algorithms DANPOS (Chen, et al, Genome Research, 2013) and DANPOS2 (Chen, et al, Nature Genetics, 2015), we will develop a series of new algorithms to (1) define epigenetic features for cell identity genes, (2) customize parameters for ChIP-Seq analysis of epigenetic features, (3) collect known cell identity genes on the basis of a thorough literature search followed by manual inspection, (4) systematically identify unknown cell identity genes, and (5) define master transcription factors that regulate the network of cell identity genes and drive cell identity specification.
As a proof of principle, we will apply our novel methods to study cell identity determinants for endothelial cells (ECs) in collaboration with Drs. John P. Cooke, Longhou Fang, and Qi Cao, three experts in EC biology, angiogenesis, and epigenetics. Successful completion of this study is expected to have a broad positive impact on the study of cell identity determination, transcriptional regulation, and chromatin epigenetics. The scientific community will be able to use the bioinformatics tools developed in this proposal to define histone modification features with improved accuracy, and to systematically predict identity genes and their master transcription factors for given cell types in numerous biological systems or disease models. Our functional assay for new identity genes of ECs will improve mechanistic understanding of endothelial differentiation, development, and phenotypes, and will better guide the discovery of therapeutic targets for the treatment of vascular diseases. Although we focus on histone modification features for EC identity genes, our proposed bioinformatics methods can be easily adapted to investigate many other chromatin marks and gene categories in all cell types.

Public Health Relevance

Our research addresses the fundamental problem of how to identify the genes that determine the identity of a cell type. We develop methods to address this problem through bioinformatics investigation of epigenetic mechanisms that control cell identity. The scientific community will be able to use the bioinformatics software developed in this proposal to define chromatin features for individual genes, and to identify cell identity genes based on their unique chromatin features.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Resat, Haluk
Methodist Hospital Research Institute
United States