The human genome encodes the developmental programs that result in the creation and maintenance of a complex organism with hundreds of tissues and trillions of cells. Precise, cell-type-specific control of gene expression is crucial to these processes. Transcription factor (TF) binding, DNA methylation, and histone modifications at gene regulatory elements play key roles in regulating RNA expression. However, the interplay between these factors is poorly understood for most human tissues, and disruption of these processes can cause birth defects, cancer, and other diseases. Recent advances in experimental technology have resulted in the production of thousands of genome-wide profiles of DNA methylation, histone modifications, and TF binding across hundreds of cellular contexts. These data hold the promise of revealing the dynamic genomic changes that drive proper development, but sound statistical and computational methods for integrating and testing hypotheses about these large, complex, and highly interdependent data are needed. Different cellular contexts are related through their differentiation histories, and the goal of this project is to develop analysis tools that leverage these dependencies between developmentally related cell types. This will facilitate the identification of significant changes in DNA and chromatin modifications within developing lineages, and it will highlight when and how these modifications impact gene expression. The approaches developed in this project will enable researchers to address the following biomedically important questions: Which DNA and chromatin modifications drive different transitions in cellular differentiation? Which genomic regions are influenced by these modifications? Which genes are influenced by these dynamic regulatory modifications in different lineages? Software will be developed, tested, and validated on several recent detailed characterizations of blood cell differentiation.
This work will provide the developmental and cancer biology communities with open-source tools for characterizing the genomic basis of normal and abnormal development. In addition, aberrant patterns of DNA methylation and histone modification are now used as diagnostic hallmarks for specific cancers; given the right data, this framework may open up avenues towards a better understanding of the biological underpinnings of such biomarkers.
Different regions of the human genome are active in different types of cells, and improper activation of specific regions is often the cause of developmental disorders, cancer, and other disease. Recent dramatic improvements in experimental techniques have led to the collection of thousands of genome-wide activity patterns, but statistical methods to coherently model and test hypotheses about these data still need to be developed. The research proposed in this project will model genome-wide activity profiles in a statistical framework that accounts for interactions and dependencies in profiles from related cell types; the implementation of these models in open-source software will enable researchers to characterize shifts in genomic activity that are associated with the creation of healthy cells, and identify how genomic regulation goes awry in disease.
Pouyan, Maziyar Baran; Kostka, Dennis (2018) Random forest based similarity learning for single cell RNA sequencing data. Bioinformatics 34:i79-i88
Kostka, Dennis; Holloway, Alisha K; Pollard, Katherine S (2018) Developmental Loci Harbor Clusters of Accelerated Regions That Evolved Independently in Ape Lineages. Mol Biol Evol 35:2034-2045
Phua, Yu Leng; Clugston, Andrew; Chen, Kevin Hong et al. (2018) Small non-coding RNA expression in mouse nephrogenic mesenchymal progenitors. Sci Data 5:180218
Simonti, Corinne N; Pavlicev, Mihaela; Capra, John A (2017) Transposable Element Exaptation into Regulatory Regions Is Rare, Influenced by Evolutionary Age, and Subject to Pleiotropic Constraints. Mol Biol Evol 34:2856-2869
Colbran, Laura L; Chen, Ling; Capra, John A (2017) Short DNA sequence patterns accurately identify broadly active human enhancers. BMC Genomics 18:536