The human genome encodes the developmental programs that result in the creation and maintenance of a complex organism with hundreds of tissues and trillions of cells. Precise, cell-type-specific control of gene expression is crucial to these processes. Transcription factor (TF) binding, DNA methylation, and histone modifications at gene regulatory elements play key roles in regulating RNA expression. However, the interplay between these factors is poorly understood for most human tissues, and disruption of these processes can cause birth defects, cancer, and other disease. Recent advances in experimental technology have resulted in the production of thousands of genome-wide profiles of DNA methylation, histone modifications, and TF binding across hundreds of cellular contexts. These data hold the promise of revealing the dynamic genomic changes that drive proper development, but sound statistical and computational methods for integrating and testing hypotheses about these large, complex, and highly interdependent data are needed. Different cellular contexts are related through their differentiation histories, and the goal of this project is to develop analysis tools that leverage these dependencies between developmentally related cell types. This will facilitate the identification of significant changes in DNA and chromatin modifications within developing lineages, and it will highlight when and how these modifications impact gene expression. The approaches developed in this project will enable researchers to address the following biomedically important questions: Which DNA and chromatin modifications drive different transitions in cellular differentiation? Which genomic regions are influenced by these modifications? Which genes are influenced by these dynamic regulatory modifications in different lineages? Software will be developed, tested, and validated on several recent detailed characterizations of blood cell differentiation.
This work will provide the developmental and cancer biology communities with open-source tools for characterizing the genomic basis of normal and abnormal development. In addition, because erroneous patterns of DNA methylation and histone modification are now used as diagnostic hallmarks for specific cancers, this framework may, given the right data, open up avenues toward a better understanding of the biological underpinnings of such biomarkers.

Public Health Relevance

Different regions of the human genome are active in different types of cells, and improper activation of specific regions is often the cause of developmental disorders, cancer, and other disease. Recent dramatic improvements in experimental techniques have led to the collection of thousands of genome-wide activity patterns, but statistical methods to coherently model and test hypotheses about these data still need to be developed. The research proposed in this project will model genome-wide activity profiles in a statistical framework that accounts for interactions and dependencies in profiles from related cell types; the implementation of these models in open-source software will enable researchers to characterize shifts in genomic activity that are associated with the creation of healthy cells, and identify how genomic regulation goes awry in disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
University of Pittsburgh
Anatomy/Cell Biology
Schools of Medicine
United States
Pouyan, Maziyar Baran; Kostka, Dennis (2018) Random forest based similarity learning for single cell RNA sequencing data. Bioinformatics 34:i79-i88
Kostka, Dennis; Holloway, Alisha K; Pollard, Katherine S (2018) Developmental Loci Harbor Clusters of Accelerated Regions That Evolved Independently in Ape Lineages. Mol Biol Evol 35:2034-2045
Phua, Yu Leng; Clugston, Andrew; Chen, Kevin Hong et al. (2018) Small non-coding RNA expression in mouse nephrogenic mesenchymal progenitors. Sci Data 5:180218
Simonti, Corinne N; Pavlicev, Mihaela; Capra, John A (2017) Transposable Element Exaptation into Regulatory Regions Is Rare, Influenced by Evolutionary Age, and Subject to Pleiotropic Constraints. Mol Biol Evol 34:2856-2869
Colbran, Laura L; Chen, Ling; Capra, John A (2017) Short DNA sequence patterns accurately identify broadly active human enhancers. BMC Genomics 18:536