The goal of the ENCODE Project is to provide the scientific community with a complete annotation of the human genome by delineating the DNA sequence features that comprise all genes, including exons, introns, promoters and cis-regulatory sequences. The pilot ENCODE project sought to develop and test a variety of experimental, computational and analytical platforms to determine the best ways to approach this problem by focusing on a defined 1% of the human genome. During this initial phase of ENCODE, the applicants of this proposal developed robust high-throughput methods for detecting and validating functional transcription promoters, DNA methylation patterns, and transcription factor occupancy in the pilot regions, and demonstrated that these approaches can be scaled fully to the entire human genome with high robustness, sensitivity and specificity. These experiences, together with the resulting technology and analysis platforms and an existing, highly productive infrastructure, lead to this response to NHGRI's RFA-HG-07-030. This application presents an ambitious proposal to expand a program to map and functionally annotate cis-regulatory sequences of the human genome. The plan emphasizes full-genome comprehensiveness for three experimental pipelines: 1) a new sequence-based method called ChIPSeq to elucidate more than 600 comprehensive transcription factor-DNA interactomes; 2) a similar new method called MethSeq to determine the methylation status of all the CpG-rich regions in the human genome in more than 1,000 human cell types and cell states; and 3) a high-throughput transfection assay pipeline to measure transcriptional activities of 25,000 human "promoter-plus" proximal cis-regulatory domains, including at least one major promoter for each of the annotated protein-coding genes.
A second major product of the promoter pipeline will be a physical resource of proximate reporter constructs for all human genes, designed to accommodate future fine-structure dissection of the promoter regulatory motifs and testing of long-distance elements. All of the experimental work in this project will be subjected to analysis with appropriate quality metrics. In addition, comparative genomics and other computational analyses will be integrated with the experimental production to help prioritize and shape input to the pipelines and to capture information in forms useful to both biologists and genomicists. These analyses will produce several large-scale deliverables, including ChIP data-driven sequence motif models for each of hundreds of transcription factors, some of which additionally leverage evolutionary conservation.
Marinov, Georgi K; Kundaje, Anshul; Park, Peter J et al. (2014) Large-scale quality analysis of published ChIP-seq data. G3 (Bethesda) 4:209-23
Marinov, Georgi K; Wang, Yun E; Chan, David et al. (2014) Evidence for site-specific occupancy of the mitochondrial genome by nuclear transcription factors. PLoS One 9:e84713
Marinov, Georgi K; Williams, Brian A; McCue, Ken et al. (2014) From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res 24:496-510
Gasper, William C; Marinov, Georgi K; Pauli-Behn, Florencia et al. (2014) Fully automated high-throughput chromatin immunoprecipitation for ChIP-seq: identifying ChIP-quality p300 monoclonal antibodies. Sci Rep 4:5152
Au, Kin Fai; Sebastiano, Vittorio; Afshar, Pegah Tootoonchi et al. (2013) Characterization of the human ESC transcriptome by hybrid sequencing. Proc Natl Acad Sci U S A 110:E4821-30
Gertz, Jason; Savic, Daniel; Varley, Katherine E et al. (2013) Distinct properties of cell-type-specific and shared transcription factor binding sites. Mol Cell 52:25-36
Yang, Fei; Nickols, Nicholas G; Li, Benjamin C et al. (2013) Antitumor activity of a pyrrole-imidazole polyamide. Proc Natl Acad Sci U S A 110:1863-8
Tsumagari, Koji; Baribault, Carl; Terragni, Jolyon et al. (2013) Early de novo DNA methylation and prolonged demethylation in the muscle lineage. Epigenetics 8:317-32
Mortazavi, Ali; Pepke, Shirley; Jansen, Camden et al. (2013) Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps. Genome Res 23:2136-48
Varley, Katherine E; Gertz, Jason; Bowling, Kevin M et al. (2013) Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res 23:555-67