The informatics core will continue to deliver high-quality, well-structured datasets with complete metadata, along with comprehensive data analysis, to the community. To achieve this, we have developed bioinformatics pipelines to process and validate our ChIP-seq and RNA-seq data, and we have worked extensively with the ENCODE DCC to curate our metadata so that our data are easily accessible. The ChIP-seq pipeline has been used to call both narrow and broad peaks and to annotate high-occupancy target (HOT) regions and TF binding sites in worm and fly across varying samples and stages. The RNA-seq pipeline has been used to identify genes that are differentially expressed under various conditions, such as different developmental stages and TF mutants, and we will evaluate the TF binding sites associated with these genes. Although these pipelines have been set up and tested thoroughly, we aim to optimize them further; for instance, we are developing a new method to call ChIP-seq peaks using multiple types of controls. To our knowledge, no such peak caller exists.

To integrate and analyze our data, we will develop a mini-encyclopedia with three levels of annotation, similar to the encyclopedia developed through the ENCODE project. The ground level will consist of the gene expression, TF binding, and histone modification data in worm and fly. The middle level will hold functional genomic regions, such as enhancers and HOT regions, identified by the advanced statistical models that we have developed based on our preliminary results. The top level will contain linkages between genes and their regulators, as predicted by our models. The regulators include both cis- and trans-regulatory elements, such as enhancers and TFs. Moreover, the linkages will be integrated to form temporal or spatial networks, and we aim to identify key regulatory factors by comparing the structure of these networks. We will share all of our datasets, analysis results, and worm and fly strains with the community through the appropriate public databases.
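As an illustration only, the network comparison described above could be sketched as follows. This is a minimal sketch, not the project's method: it assumes each network is a list of (regulator, target gene) linkages, and it ranks regulators by how many of their linkages appear in only one of two networks. The function name, the stage labels, and all regulator and gene identifiers are hypothetical placeholders.

```python
from collections import Counter


def rank_rewired_regulators(net_a, net_b):
    """Rank regulators by the number of linkages found in only one network.

    Each network is an iterable of (regulator, target_gene) pairs.
    Returns (regulator, changed_linkage_count) pairs, most changed first.
    """
    # Linkages present in exactly one of the two networks.
    changed = set(net_a) ^ set(net_b)
    counts = Counter(regulator for regulator, _ in changed)
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stage-specific networks (placeholder names, not real data).
embryo = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")]
larva = [("TF_A", "gene1"), ("TF_B", "gene3"),
         ("TF_B", "gene4"), ("TF_B", "gene5")]

ranking = rank_rewired_regulators(embryo, larva)
print(ranking)
```

In this toy input, the regulator whose linkages change most between the two stages is ranked first. A real comparison would likely need to weight linkages by prediction confidence and consider richer structural measures than edge differences.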

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section: Special Emphasis Panel (ZHG1)
University of Washington
United States
Sin, Olga; de Jong, Tristan; Mata-Cabana, Alejandro et al. (2017) Identification of an RNA Polymerase III Regulator Linked to Disease-Associated Protein Aggregation. Mol Cell 65:1096-1108.e6
Cao, Junyue; Packer, Jonathan S; Ramani, Vijay et al. (2017) Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357:661-667
Weicksel, Steven E; Mahadav, Assaf; Moyle, Mark et al. (2016) A novel small molecule that disrupts a key event during the oocyte-to-embryo transition in C. elegans. Development 143:3540-3548
Thompson, Owen A; Snoek, L Basten; Nijveen, Harm et al. (2015) Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856. Genetics 200:975-89
Wang, Daifeng; Yan, Koon-Kiu; Sisu, Cristina et al. (2015) Loregic: a method to characterize the cooperative logic of regulatory factors. PLoS Comput Biol 11:e1004132
Cheng, Chao; Andrews, Erik; Yan, Koon-Kiu et al. (2015) An approach for determining and measuring network hierarchy applied to comparing the phosphorylome and the regulome. Genome Biol 16:63
Kasper, Dionna M; Wang, Guilin; Gardner, Kathryn E et al. (2014) The C. elegans SNAPc component SNPC-4 coats piRNA domains and is globally required for piRNA abundance. Dev Cell 31:145-58
Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu et al. (2014) Comparative analysis of the transcriptome across distant species. Nature 512:445-8