Our project seeks to identify the regulatory elements recognized by the vast majority of transcription factors (TFs) in the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans. In the initial modENCODE project, an experimental pipeline was developed and applied to ~100 TFs in each organism. In this intervening year, we expect to capture data for another 75-100 factors. The present project builds on the advances made by the groups in the initial phase and also combines the production pipelines to increase efficiency and to realize economies of scale. With these improvements, we will generate data sets for another 400 factors from each organism, which, when combined with previous work, will represent the bulk of all transcription factors in these key model organisms. For both organisms, the overall strategy tags transcription factor genes by fusion with an enhanced Green Fluorescent Protein (eGFP) sequence through recombineering of large-insert clones, and introduces the tagged genes into the genome by transgenesis. ChIP-seq using a high-quality anti-GFP antibody is performed on the developmental stage(s) with maximal GFP expression, as guided by available RNA-seq expression data. The aligned sequence reads are analyzed to identify candidate binding sites and likely target genes. We will prioritize TFs with human homologs to maximize the broader utility of the data. We will also perform RNAi of 125 TFs in each organism, followed by RNA-seq, to validate called peaks and their assigned target genes. Finally, we will integrate the information from the different data sets to construct regulatory networks implied by the TF binding site data. We will coordinate with ENCODE projects on human TFs, and our data will provide key in vivo and developmental regulatory information that will be essential to delineate both fundamentally conserved and human-specific properties of TFs.
Insights from the study of the model organisms Drosophila and C. elegans provide the basis for a broad understanding of fundamental processes of animal biology. Because many of their genes have clear relatives in humans, these studies have also led directly to improved understanding of human diseases and in some cases to therapies. Similarly, creating a comprehensive understanding of transcription factor binding sites and building regulatory networks in these key model organisms will create the foundation for understanding human regulatory networks in both health and disease.
Weicksel, Steven E; Mahadav, Assaf; Moyle, Mark et al. (2016) A novel small molecule that disrupts a key event during the oocyte-to-embryo transition in C. elegans. Development 143:3540-3548
Thompson, Owen A; Snoek, L Basten; Nijveen, Harm et al. (2015) Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856. Genetics 200:975-89
Cheng, Chao; Andrews, Erik; Yan, Koon-Kiu et al. (2015) An approach for determining and measuring network hierarchy applied to comparing the phosphorylome and the regulome. Genome Biol 16:63
Wang, Daifeng; Yan, Koon-Kiu; Sisu, Cristina et al. (2015) Loregic: a method to characterize the cooperative logic of regulatory factors. PLoS Comput Biol 11:e1004132
Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu et al. (2014) Comparative analysis of the transcriptome across distant species. Nature 512:445-8
Kasper, Dionna M; Wang, Guilin; Gardner, Kathryn E et al. (2014) The C. elegans SNAPc component SNPC-4 coats piRNA domains and is globally required for piRNA abundance. Dev Cell 31:145-58