The LDSB investigates the organization and activities of developmental regulatory networks using formation of the Drosophila embryonic heart and body wall muscles as a model system. To achieve this objective, we combine contemporary genome-wide experimental and computational approaches with classical genetics and embryology to generate mechanistic hypotheses that we then test at single-cell resolution in the intact organism. The cells comprising the Drosophila heart can be subdivided into two populations: the cardial cells (CCs), which express muscle genes and are contractile, and the pericardial cells (PCs), which are believed to perform nephrocyte functions. To uncover the enhancers, and both the shared and the uniquely discriminating sequence features that characterize these two cardiac cell types, we modified a machine learning approach that we previously applied to identify somatic myoblast subtypes. This strategy enabled us to computationally classify cell type-specific cardiac enhancers by integrating transcription factor (TF) motifs with ChIP data for a core set of conserved cardiogenic TFs, which together define enhancer features that are critical for the cell-specific functions of these regulatory elements. To do so, we first compiled training sets of enhancers with activity in these two cell types. We mapped the ChIP data and TF motifs onto the training set and a set of control sequences, and used linear support vector machines to build separate PC and CC classifiers that discriminate training set sequences from controls. Scanning the entire Drosophila genome with these classifiers identified numerous related candidate enhancers. Large-scale testing of the predicted enhancers revealed that they indeed possess the appropriate cell type-specific activities. We also showed that enhancer predictions improved significantly when relevant ChIP data were included in the machine learning analyses.
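The classifier-building and genome-scanning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the project: the feature values are invented and assumed to be already standardized, the names `score`, `train_linear_svm` and `windows` are hypothetical, and plain full-batch subgradient descent on the hinge loss stands in for whatever linear SVM solver was actually used.

```python
# Toy feature vectors: [TF motif score, ChIP signal], assumed standardized.
# All numbers are illustrative placeholders, not data from the study.
enhancers = [[1.0, 0.8], [0.8, 1.0], [1.0, 1.0]]       # training set, label +1
controls = [[-1.0, -0.8], [-0.8, -1.0], [-1.0, -1.0]]  # control set, label -1


def score(w, b, x):
    """Linear decision value w.x + b; positive means 'enhancer-like'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        sum_yx = [0.0] * d  # sum of y_i * x_i over margin violators
        sum_y = 0           # sum of y_i over margin violators
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1.0:  # hinge term is active
                for j in range(d):
                    sum_yx[j] += yi * xi[j]
                sum_y += yi
        # Step against the subgradient of the regularized objective.
        w = [wi - lr * (lam * wi - sj / n) for wi, sj in zip(w, sum_yx)]
        b += lr * sum_y / n
    return w, b


X = enhancers + controls
y = [1] * len(enhancers) + [-1] * len(controls)
w, b = train_linear_svm(X, y)

# "Genome scan": score candidate windows (hypothetical feature vectors)
# and keep those that fall on the enhancer side of the decision boundary.
windows = {"window_A": [0.9, 0.7], "window_B": [-0.5, -0.9]}
predicted = [name for name, feats in windows.items() if score(w, b, feats) > 0]
print(predicted)
```

In this sketch, a separate call to `train_linear_svm` with a PC training set and another with a CC training set would give the two cell type-specific classifiers; the learned weights in `w` indicate which features each classifier relies on.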
We next used the sequence features identified by the classifiers to reveal motifs critical for cardiac enhancer activity. This approach showed that the Myb motif learned by the classifier is critical for CC activity. Interestingly, clustering the sequence features relevant to the PC and CC classifications revealed features that potentially discriminate between these two cell types. In agreement, cis mutagenesis assays in transgenic reporters revealed that the motif for the Notch signaling pathway TF Suppressor of Hairless (Su(H)), which was enriched in the PC classification and irrelevant to the CC classification, is able to discriminate PC from CC gene regulatory activities. Encouraged by these results, we next asked whether we could model the features that govern enhancer activity at single-cell resolution in the Drosophila heart. The PCs and CCs comprising the Drosophila heart can be further subdivided into individual identities based on differences in morphology, function and gene expression patterns. Recent studies have shown that differential modifications of histone proteins, in vivo TF binding, and the presence of particular TF binding motifs can be used as predictive signatures of the enhancers that govern cell-specific gene expression. Here we used machine learning with all three of these data sets to uncover the chromatin, TF binding and sequence features of enhancers underlying gene expression in individual cardiac cells. In these studies, we first undertook a large-scale validation of Drosophila heart enhancer activities at single-cell resolution in whole embryos. These experiments revealed enhancer activities present in distinct subpopulations of PCs and CCs. We next used these training sets of validated enhancers in a machine learning approach designed to uncover related regulatory elements as well as both the shared and the discriminating sequence and protein features that uniquely characterize these individual heart cells.
This revised classification used TF motifs and ChIP data for both a core set of conserved cardiogenic TFs and histone modifications as potential enhancer features that contribute to cellular specificity. In this way, we successfully discriminated training set sequences from controls, and predicted enhancers that are enriched amongst known heart genes. Furthermore, large-scale testing of classifier-predicted enhancers showed that the scores of the predicted enhancers from the separate classifications can be used to predict cardiac cell subtype enhancer activity. We also used the enhancer predictions from the individual cell-specific classifications to predict expression patterns for an atlas of known cardiac genes, and we applied gene ontology analysis to show that these annotated gene expression patterns can be used to infer the functions of individual heart cells. For example, genes associated with enhancers predicted to be active in the contractile CCs were enriched for myogenic functions, whereas genes associated with pan-PC enhancers were associated with renal system development, consistent with prior evidence that PCs act as nephrocytes in the Drosophila embryo. This analysis also revealed specialized functions for individual cardiac cell subtypes, with gene expression patterns associated with some PC subtypes being enriched for endocrine functions whereas others are enriched for the production of extracellular matrix components. In total, these results document that modeling enhancer activities at single-cell resolution can be used to identify previously uncharacterized organ-specific functions of individual embryonic cells. Finally, we used the features uncovered by the cell-specific classifications to reveal chromatin, TF binding and sequence features that distinguish enhancer activities in distinct subpopulations of heart cells.
For example, we demonstrated that in vivo binding of the T-box family TF Dorsocross discriminates enhancer activity in a subset of cardiac cells, while a particular histone mark (trimethylation of lysine 79 on histone H3) is enriched only amongst enhancers with activity in subsets of cardiac cells. Lastly, hierarchical clustering revealed sequence features that discriminate enhancer activities in individual cardiac cells. We empirically verified this latter result for a series of sequence features using cis mutagenesis in transgenic reporter assays. In total, these results document the utility of computational modeling combined with empirical testing to uncover the enhancers, TF motifs and genes that characterize individual cardiac cell fates, and provide a framework for conducting similar analyses in additional cell types and model organisms.
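The gene ontology analysis mentioned above is commonly carried out as an over-representation test. The abstract does not state which test was used, so the one-sided hypergeometric test below is an assumption, and the function name `hypergeom_enrichment_p` and the gene counts are hypothetical placeholders rather than results from the project.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes associated with predicted enhancers
    k: of those n genes, how many carry the GO term
    """
    upper = min(K, n)
    # Count the ways to draw n genes with at least k annotated ones.
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / comb(N, n)


# Hypothetical toy counts: 10 background genes, 4 with the term,
# 3 genes linked to predicted enhancers, 2 of which carry the term.
p_value = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(p_value)
```

A small p-value would suggest that the term is over-represented among genes linked to a given cell's predicted enhancers. In practice such tests are run across many terms and require multiple-testing correction, which this sketch omits.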

Support Year: 2
Fiscal Year: 2014
Name: U.S. National Heart Lung and Blood Inst
Busser, Brian W; Haimovich, Julian; Huang, Di et al. (2015) Enhancer modeling uncovers transcriptional signatures of individual cardiac cell states in Drosophila. Nucleic Acids Res 43:1726-39
Ahmad, Shaad M; Busser, Brian W; Huang, Di et al. (2014) Machine learning classification of cell-specific cardiac enhancers uncovers developmental subnetworks regulating progenitor cell division and cell fate specification. Development 141:878-88
Gisselbrecht, Stephen S; Barrera, Luis A; Porsch, Martin et al. (2013) Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos. Nat Methods 10:774-80
Busser, Brian W; Huang, Di; Rogacki, Kevin R et al. (2012) Integrative analysis of the zinc finger transcription factor Lame duck in the Drosophila myogenic gene regulatory network. Proc Natl Acad Sci U S A 109:20768-73
Busser, Brian W; Taher, Leila; Kim, Yongsok et al. (2012) A machine learning approach for identifying novel cell type-specific transcriptional regulators of myogenesis. PLoS Genet 8:e1002531