The objective of the Encyclopedia of DNA Elements (ENCODE) Project is to provide a complete inventory of all functional elements in the human genome using high-throughput experiments as well as computational methods. This proposal aims to create the ENCODE Data Analysis Center (EDAC, or the DAC), a multidisciplinary group of leading scientists who will respond to directions from the Analysis Working Group (AWG) of ENCODE and integrate data generated by all groups in the ENCODE Consortium in an unbiased manner. These analyses will substantially augment the value of the ENCODE data by integrating diverse data types. The DAC members (Zhiping Weng, Manolis Kellis, Mark Gerstein, Mark Daly, Roderic Guigo, Shirley Liu, Rafael Irizarry, and William Noble) are leaders in their respective fields of bioinformatics, computational machine learning, algorithm development, and statistical theory and application to genomic data. They have a strong track record of delivering collaborative analysis in the ENCODE and modENCODE Projects, in which this group of researchers was responsible for much of the analyses and the majority of the figures and tables in the ENCODE and modENCODE papers. The proposed DAC will pursue the following seven aims:
Aim 1. To work with the AWG to define and prioritize integrative analyses of ENCODE data;
Aim 2. To provide shared computational guidelines and infrastructure for data processing, common analysis tasks, and data exchange;
Aim 3. To facilitate and carry out data integration for element-specific analyses;
Aim 4. To facilitate and carry out exploratory data analyses across elements;
Aim 5. To facilitate and carry out comparative analyses across human, mouse, fly, and worm;
Aim 6. To facilitate integration with the genome-wide association studies community and disease datasets;
Aim 7. To facilitate the writing of Consortium papers and assist in evaluating ENCODE data.

Public Health Relevance

The Encyclopedia of DNA Elements (ENCODE) Project is a coordinated effort to apply high-throughput, cost-efficient approaches to generate a comprehensive catalog of functional elements in the human genome. This proposal establishes a data analysis center to support, facilitate, and enhance integrative analyses of the ENCODE Consortium, with the ultimate goal of helping the scientific and medical communities interpret the human genome and use it to understand human biology and improve human health.

National Institutes of Health (NIH)
Biotechnology Resource Cooperative Agreements (U41)
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Pazin, Michael J
University of Massachusetts Medical School Worcester
Biostatistics & Other Math Sci
Schools of Medicine
United States
Ay, Ferhat; Bailey, Timothy L; Noble, William Stafford (2014) Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res 24:999-1011
He, Housheng Hansen; Meyer, Clifford A; Hu, Sheng'en Shawn et al. (2014) Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat Methods 11:73-8
Kheradpour, Pouya; Kellis, Manolis (2014) Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 42:2976-87
Zhuang, Jiali; Wang, Jie; Theurkauf, William et al. (2014) TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res 42:6826-38
Varoquaux, Nelle; Ay, Ferhat; Noble, William Stafford et al. (2014) A statistical approach for inferring the 3D structure of the genome. Bioinformatics 30:i26-33
Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu et al. (2014) Comparative analysis of the transcriptome across distant species. Nature 512:445-8
Dong, Xianjun; Weng, Zhiping (2013) The correlation between histone modifications and gene expression. Epigenomics 5:113-6
Khurana, Ekta; Fu, Yao; Colonna, Vincenza et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342:1235587
Wang, Jie; Zhuang, Jiali; Iyer, Sowmya et al. (2013) Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res 41:D171-6
Wang, Su; Sun, Hanfei; Ma, Jian et al. (2013) Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nat Protoc 8:2502-15