Our project seeks to complete the catalog of the regulatory elements recognized by the full set of transcription factors (TFs) in the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans. In the initial modENCODE project, an experimental pipeline was developed and applied to ~100 TFs in each organism, leaving approximately 600 TFs to study in each of fly and worm. To achieve this scale-up, the project builds on the advances made by the groups in the initial phase and combines the production pipelines to increase efficiency and realize economies of scale. For both organisms, the overall strategy is to tag transcription factor genes by fusion with an enhanced Green Fluorescent Protein (eGFP) sequence through recombineering of large-insert clones, and to introduce the tagged genes into the genome by transgenesis. ChIP-seq using a high-quality anti-GFP antibody is performed on the developmental stage(s) with maximal GFP expression, with stage selection supplemented by available RNA-seq expression data. The aligned sequence reads are analyzed by PeakSeq and other software to identify candidate binding sites and likely target genes. We will prioritize TFs with human homologs to maximize the broader utility of the data. For 40 TFs in each organism, we will also investigate TF expression in specific subsets of tissues or cells to estimate the specificity and sensitivity of whole-animal ChIP-seq assays. We will also perform RNAi of 100 TFs in each organism, followed by RNA-seq, to validate called peaks and their assigned target genes. Finally, we will integrate the information from the different data sets to construct the regulatory networks implied by the TF binding site data. We will coordinate with ENCODE projects on human TFs, and our data will provide key in vivo and developmental regulatory information that will be essential to delineate both fundamentally conserved and human-specific properties of TFs.
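The abstract does not say how candidate binding sites are mapped to likely target genes. As an illustration only, one common simplification is to assign each peak to the gene with the nearest transcription start site (TSS) within a distance cutoff. The Python sketch below assumes that rule; the function name, the 5 kb cutoff, the input layout, and the gene names are all hypothetical and are not taken from the project or from PeakSeq.

```python
from bisect import bisect_left


def assign_nearest_gene(peaks, genes, max_distance=5000):
    """Assign each peak summit to the gene with the nearest TSS.

    peaks: list of (chrom, summit_position) tuples
    genes: list of (chrom, tss_position, gene_name) tuples
    max_distance: hypothetical cutoff in bp; peaks farther than this
        from every TSS on their chromosome are left unassigned.
    Returns a list of (chrom, summit, gene_name_or_None, distance_or_None).
    """
    # Group TSS positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, tss, name in genes:
        by_chrom.setdefault(chrom, []).append((tss, name))
    for entries in by_chrom.values():
        entries.sort()

    assignments = []
    for chrom, summit in peaks:
        entries = by_chrom.get(chrom, [])
        # Index of the first TSS at or downstream of the summit.
        i = bisect_left(entries, (summit, ""))
        # Only the immediate upstream and downstream TSSs can be nearest.
        candidates = entries[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda e: abs(e[0] - summit), default=None)
        if best is not None and abs(best[0] - summit) <= max_distance:
            assignments.append((chrom, summit, best[1], abs(best[0] - summit)))
        else:
            assignments.append((chrom, summit, None, None))
    return assignments


if __name__ == "__main__":
    # Hypothetical gene names and coordinates, for illustration only.
    genes = [("chrI", 1000, "geneA"), ("chrI", 9000, "geneB")]
    peaks = [("chrI", 1200), ("chrI", 30000)]
    for row in assign_nearest_gene(peaks, genes):
        print(row)
```

Nearest-TSS assignment is a deliberately simple rule that ignores strand, gene structure, and distal regulation. The RNAi and RNA-seq experiments described above are one way to test whether assignments of this kind are meaningful.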

Public Health Relevance

Insights from the study of the model organisms Drosophila and C. elegans provide the basis for broad understanding of fundamental processes of animal biology. Because many of their genes have clear relatives in humans, these studies have also led directly to improved understanding of human diseases and in some cases to therapies. Similarly, creating a comprehensive understanding of transcription factor binding sites and building regulatory networks in these key model organisms will create the foundation for understanding human regulatory networks both in health and disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 1U54HG007002-01
Application #: 8402441
Study Section: Special Emphasis Panel (ZHG1-HGR-M (M1))
Program Officer: Feingold, Elise A
Project Start: 2012-09-21
Project End: 2014-08-31
Budget Start: 2012-09-21
Budget End: 2014-08-31
Support Year: 1
Fiscal Year: 2012
Total Cost: $1,450,000
Indirect Cost: $116,408

Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195

Kasper, Dionna M; Wang, Guilin; Gardner, Kathryn E et al. (2014) The C. elegans SNAPc component SNPC-4 coats piRNA domains and is globally required for piRNA abundance. Dev Cell 31:145-58
Kasper, Dionna M; Gardner, Kathryn E; Reinke, Valerie (2014) Homeland security in the C. elegans germ line: insights into the biogenesis and function of piRNAs. Epigenetics 9:62-74
Sarov, Mihail; Murray, John I; Schanze, Kristin et al. (2012) A genome-scale resource for in vivo tag-based protein function exploration in C. elegans. Cell 150:855-66