The LDSB investigates the organization and activities of developmental regulatory networks using formation of the Drosophila embryonic heart and body wall muscles as a model system. To achieve this objective, we combine contemporary genome-wide experimental and computational approaches with classical genetics and embryology to generate mechanistic hypotheses that we then test at single-cell resolution in the intact organism.
Machine learning represents a powerful computational approach for identifying shared sequence features of cis-regulatory elements having related cell-specific activities. However, this approach depends on the availability of a sufficiently large set of functionally similar enhancers with which to build and train a classifier. Since this requirement is limiting for Drosophila muscle founder cell (FC) enhancers, we collaborated with Dr. Ivan Ovcharenko's lab to develop an alternative approach in which phylogenetic profiling is used to identify orthologous sequences from other species to expand the requisite training set. Such orthologs were found to be active in Drosophila melanogaster muscle FCs, although extensive evolutionary shuffling of key transcription factor (TF) binding sites occurred among the orthologs. Once trained on the expanded set of FC enhancers, the classifier identified a large number of candidate FC-specific regulatory motifs, a number of which we functionally validated. Moreover, our analysis revealed an extraordinary degree of combinatorial specificity contributed by the TF binding sites found within a large set of validated FC enhancers: of 18 FC enhancers in our analysis, no two elements contained the same set of 12 TF binding site classes. Collectively, these studies establish that TF binding site combinatorics make a major contribution to the diversity and functional complexity of enhancers having related but nonidentical activities in similar cell types within the developing embryo.
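The combinatorial-specificity observation above (no two enhancers sharing the same set of TF binding site classes) can be illustrated with a minimal sketch. It assumes each enhancer is summarized as the set of binding site classes it contains; the enhancer names, class labels, and the function `shared_signatures` are hypothetical placeholders, not data or code from the study.

```python
from itertools import combinations


def shared_signatures(enhancers):
    """Return pairs of enhancer names whose binding site class sets are identical.

    `enhancers` maps an enhancer name to an iterable of TF binding site
    class labels. An empty result means every enhancer has a unique
    combination of classes.
    """
    names = sorted(enhancers)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if frozenset(enhancers[a]) == frozenset(enhancers[b])
    ]


# Hypothetical toy input: labels are placeholders, not real motif classes.
toy_enhancers = {
    "enhancer_A": {"class_01", "class_02", "class_03"},
    "enhancer_B": {"class_01", "class_03"},
    "enhancer_C": {"class_03", "class_02", "class_01"},  # same set as enhancer_A
    "enhancer_D": {"class_04"},
}

duplicates = shared_signatures(toy_enhancers)
```

Applied to a real table of enhancers and motif classes, an empty list from `shared_signatures` would correspond to the situation described above, in which every enhancer carries a distinct combination.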
A diversity of genomic information can be used to study the structure and function of a cell-specific gene regulatory network. Such information includes sets of co-expressed genes, binding site determination of candidate transcription factors (TFs) that may cooperate with each other, co-occurrences of functionally related binding sites found in the vicinity of co-expressed genes, and results from applications of ChIP-seq technology to determine the genome-wide localization of both TF binding sites and histone modification signatures that mark the presence of active cis-regulatory elements in cell types of interest. We therefore extended the above machine learning approach to integrate all of these data types and gain unique insights into the gene regulatory network that governs the genetic programs of Drosophila fusion-competent myoblasts (FCMs). We focused on the zinc finger TF Lame duck (Lmd), a known regulator of FCM identity. Initially, we used genome-wide expression profiling to obtain a set of genes that are misregulated in lmd mutant embryos and that may be direct targets of Lmd. Next, we performed ChIP-seq experiments to localize Lmd binding sites across the genome of purified mesodermal cells, a population that was isolated because it is expected to be enriched for molecular signals specific to FCM genes. Indeed, Lmd binding was over-represented in association with genes having known FCM expression. In addition, Lmd-bound regions were enriched for multiple histone modifications that are generally characteristic of active enhancers, and Lmd was bound to all previously characterized FCM cis-regulatory elements. Protein binding microarrays were next used to assay the complete spectrum of Lmd DNA binding specificities, and a significant enrichment of the Lmd motifs was found within the Lmd ChIP-seq peaks. The functional significance of Lmd binding sites in two known FCM enhancers was also validated in vivo.
Further diversity of the regulatory models that are associated with genes having FCM expression was revealed through the discovery of co-occurrences of Lmd binding together with other known mesodermal TFs, an analysis that identified five unique combinations of TF binding sites that included Lmd. That is, no single combination of established mesodermal TFs accounts for the entire spectrum of FCM gene expression, underscoring a previously unrecognized degree of heterogeneity in FCM gene regulation. In addition, these studies uncovered a feed-forward loop between Lmd and the general mesodermal TF Twist (Twi), with the latter TF also demonstrated to be essential for FCM enhancer activity in vivo. Quantitative ChIP-PCR of chromatin isolated from transgenic lines containing wild-type and Lmd or Twi site mutant versions of an FCM enhancer further demonstrated that the combinatorial occupancy of these two TFs on this enhancer may be associated with protein-protein interactions aided by the closely spaced nature of the element's Lmd and Twi binding sites. Of note, Lmd had a much larger effect on Twi binding than vice versa, consistent with Twi binding preceding and cooperatively facilitating Lmd binding to an FCM enhancer, a hypothesis that is further supported by the known developmental timing of gene expression and loss-of-function analyses of these two TFs. To extend our understanding of the combinatorial complexity of FCM gene regulation, we next used machine learning to identify additional sequence motifs that are over-represented within the genomic regions bound by Lmd. This computational analysis revealed enrichment of binding sites for a number of other TFs relative to an appropriate control, including motifs that are bound by the Forkhead (Fkh) class of DNA binding domain. Finally, this latter finding was functionally validated as being relevant to the activation of a known FCM enhancer.
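To make the notion of motif over-representation "relative to an appropriate control" concrete, the following is a minimal sketch of one common way to score it: a one-sided hypergeometric (Fisher-type) test on counts of bound and control regions that contain a motif. This is an illustrative stand-in, not the machine learning procedure used in the study; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(bound_with_motif, bound_total,
                           control_with_motif, control_total):
    """One-sided p-value for motif enrichment in bound versus control regions.

    Computes P(X >= bound_with_motif) under the null hypothesis that
    motif-containing regions are distributed at random between the
    bound set and the control set (hypergeometric distribution).
    """
    n_regions = bound_total + control_total            # all regions considered
    n_with_motif = bound_with_motif + control_with_motif
    upper = min(n_with_motif, bound_total)             # largest possible overlap
    denominator = comb(n_regions, bound_total)
    numerator = sum(
        comb(n_with_motif, k) * comb(n_regions - n_with_motif, bound_total - k)
        for k in range(bound_with_motif, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts for illustration only: the motif occurs in all 3 bound
# regions and in none of 3 control regions.
p_value = hypergeom_enrichment_p(3, 3, 0, 3)
```

A small p-value indicates that the motif occurs in bound regions more often than expected by chance given the control set. In practice the choice of control regions (for example, matching length and base composition) strongly affects the result, and testing many motifs requires correction for multiple comparisons.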
In summary, the present research strategy involving the integration of genetic, genomic and computational methods revealed an unexpected degree of combinatorial complexity in the molecular mechanisms underlying the cell type-specific regulation of gene expression in a subset of myoblasts in the developing Drosophila embryo.
One of the limiting factors in validating computational predictions such as those described above is the need to use time-consuming and labor-intensive transgenic reporter assays to test the activity of each predicted enhancer. To circumvent this problem, we collaborated with Dr. Martha Bulyk's laboratory (Brigham and Women's Hospital) to develop a novel, high-throughput in vivo assay for identifying putative tissue- and cell-specific transcriptional enhancers. The procedure, which we have named enhancer FACS-seq (eFS), enables highly parallel identification of active, tissue- and cell type-specific enhancers in whole Drosophila embryos starting with a defined library of candidate and control regulatory elements. The eFS approach was validated by traditional transgenic reporter assays of initial positive results, and by the finding that the mesodermal enhancers identified by eFS were enriched in DNA binding motifs corresponding to TFs with known mesodermal functions. Additional eFS experiments revealed that this technique is capable of identifying enhancers with activities not only in whole mesoderm but also in less abundant mesodermal cell types, thereby extending the potential applicability of this method for analyzing Drosophila embryonic transcriptional regulatory networks at the resolution of cellular subpopulations.
Ahmad, Shaad M; Busser, Brian W; Huang, Di et al. (2014) Machine learning classification of cell-specific cardiac enhancers uncovers developmental subnetworks regulating progenitor cell division and cell fate specification. Development 141:878-88