Our project seeks to identify the regulatory elements recognized by the vast majority of transcription factors (TFs) in the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans. In the initial modENCODE project, an experimental pipeline was developed and applied to ~100 TFs in each organism. In the intervening year, we expect to capture data for another 75-100 factors. The present project builds on the advances made by the groups in the initial phase and combines their production pipelines to increase efficiency and realize economies of scale. With these improvements, we will generate data sets for another 400 factors from each organism, which, combined with previous work, will represent the bulk of all transcription factors in these key model organisms. For both organisms, the overall strategy tags transcription factor genes by fusing them with an enhanced Green Fluorescent Protein (eGFP) sequence through recombineering of large-insert clones, and introduces the tagged genes into the genome by transgenesis. ChIP-seq using a high-quality anti-GFP antibody is performed on the developmental stage(s) with maximal GFP expression, as guided by available RNA-seq expression data. The aligned sequence reads are analyzed to identify candidate binding sites and likely target genes. We will prioritize TFs with human homologs to maximize the broader utility of the data. We will also perform RNAi of 125 TFs in each organism, followed by RNA-seq, to validate called peaks and their assigned target genes. Finally, we will integrate the different data sets to construct the regulatory networks implied by the TF binding-site data. We will coordinate with ENCODE projects on human TFs, and our data will provide key in vivo and developmental regulatory information essential for delineating both fundamentally conserved and human-specific properties of TFs.
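The abstract does not specify how called peaks are linked to target genes or how the RNAi data are scored. As a minimal illustration only, the sketch below assumes a simple nearest-transcription-start-site (TSS) rule: each peak is assigned to the gene whose TSS is closest on the same chromosome, within a hypothetical 5 kb window. The resulting TF-to-gene edges form a directed regulatory network, and an edge counts as supported if the target gene changed expression after RNAi of that TF. The `Peak`/`Gene` records, the window size, the names `assign_targets` and `validated_fraction`, and the toy coordinates are all hypothetical and are not the project's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """A called ChIP-seq peak for one TF (hypothetical representation)."""
    tf: str
    chrom: str
    summit: int  # summit coordinate on the chromosome


@dataclass(frozen=True)
class Gene:
    """A gene represented only by its transcription start site (TSS)."""
    name: str
    chrom: str
    tss: int


def assign_targets(peaks, genes, max_distance=5_000):
    """Link each peak to the gene with the nearest TSS on the same chromosome.

    Peaks farther than max_distance (an assumed cutoff) from every TSS are
    left unassigned. Returns a network as a dict: TF -> set of target genes.
    """
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene.chrom].append(gene)

    network = defaultdict(set)
    for peak in peaks:
        candidates = genes_by_chrom.get(peak.chrom, [])
        if not candidates:
            continue
        nearest = min(candidates, key=lambda g: abs(g.tss - peak.summit))
        if abs(nearest.tss - peak.summit) <= max_distance:
            network[peak.tf].add(nearest.name)
    return dict(network)


def validated_fraction(network, rnai_responsive):
    """Fraction of each TF's assigned targets that responded to RNAi of that TF.

    rnai_responsive maps TF -> set of differentially expressed genes from
    RNAi followed by RNA-seq (hypothetical input format). TFs without RNAi
    data are skipped.
    """
    result = {}
    for tf, targets in network.items():
        if tf in rnai_responsive and targets:
            hits = targets & rnai_responsive[tf]
            result[tf] = len(hits) / len(targets)
    return result


if __name__ == "__main__":
    # Invented toy data on Drosophila chromosome arms 2L and 3R.
    genes = [Gene("geneA", "2L", 10_000), Gene("geneB", "2L", 50_000),
             Gene("geneC", "3R", 8_000)]
    peaks = [Peak("TF1", "2L", 10_400), Peak("TF1", "2L", 49_000),
             Peak("TF1", "3R", 30_000),  # >5 kb from any TSS: unassigned
             Peak("TF2", "3R", 7_500)]
    net = assign_targets(peaks, genes)
    print(net)  # TF1 -> {geneA, geneB}, TF2 -> {geneC}
    print(validated_fraction(net, {"TF1": {"geneA"}}))  # {'TF1': 0.5}
```

Nearest-TSS assignment is only one common heuristic. Windowed or enhancer-aware assignment would change which edges the RNAi data can confirm.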

Public Health Relevance

Insights from the study of the model organisms Drosophila and C. elegans provide the basis for broad understanding of fundamental processes of animal biology. Because many of their genes have clear relatives in humans, these studies have also led directly to improved understanding of human diseases and in some cases to therapies. Similarly, creating a comprehensive understanding of transcription factor binding sites and building regulatory networks in these key model organisms will create the foundation for understanding human regulatory networks both in health and disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section: Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer: Feingold, Elise A
University of Washington
Schools of Medicine
United States