The ENCODE Data Analysis Center (EDAC) proposal aims to provide a flexible analysis resource for the ENCODE project. The ENCODE project is a large multi-center project which aims to define all the functional elements in the human genome. This will be achieved using many different experimental techniques coupled with numerous computational techniques. A critical part of delivering this set of functional elements is the integration of data from multiple sources. The EDAC proposal aims to provide this integration. As prescribed by the RFA for this proposal, the precise prioritization of the EDAC's work will be set by an external group, the Analysis Working Group (AWG). Based on previous experience, this analysis will require a variety of techniques. We expect to have to apply sophisticated statistical models to the integration of the data, in particular mitigating the problems of the extensive heterogeneity and correlation of variables on the human genome. We have statistical experts who can use the large size of the human genome, coupled with a limited number of sensible assumptions, to produce statistical techniques which are robust to this considerable heterogeneity. We also expect to apply machine learning techniques to build integration methods combining datasets. These include Bayesian inference methods and the robust computer science technique of Support Vector Machines. Each of these methods has performed well in the ENCODE pilot project, and we expect them to be even more useful in the full ENCODE project. We will also provide quality assurance and summary metrics of genome-wide multiple alignments. This area has a number of complex statistical, algorithmic and engineering issues, which we will solve using state-of-the-art techniques. Overall, we aim to provide deep integration of the ENCODE data, under the direction of the AWG and in tight collaboration with the other members of the ENCODE consortium.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer: Feingold, Elise A
European Molecular Biology Laboratory
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn et al. (2016) Ensembl comparative genomics resources. Database (Oxford) 2016:
Zerbino, Daniel R; Johnson, Nathan; Juetteman, Thomas et al. (2016) Ensembl regulation resources. Database (Oxford) 2016:
Zerbino, Daniel R; Ballinger, Tracy; Paten, Benedict et al. (2016) Representing and decomposing genomic structural variants as balanced integer flows on sequence graphs. BMC Bioinformatics 17:400
Pervouchine, Dmitri D; Djebali, Sarah; Breschi, Alessandra et al. (2015) Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression. Nat Commun 6:5903
Zerbino, Daniel R; Wilder, Steven P; Johnson, Nathan et al. (2015) The ensembl regulatory build. Genome Biol 16:56
Nguyen, Ngan; Hickey, Glenn; Zerbino, Daniel R et al. (2015) Building a pan-genome reference for a population. J Comput Biol 22:387-401
Ho, Joshua W K; Jung, Youngsook L; Liu, Tao et al. (2014) Comparative analysis of metazoan chromatin organization. Nature 512:449-52
Nguyen, Ngan; Hickey, Glenn; Raney, Brian J et al. (2014) Comparative assembly hubs: web-accessible browsers for comparative genomics. Bioinformatics 30:3293-301
Yue, Feng; Cheng, Yong; Breschi, Alessandra et al. (2014) A comparative encyclopedia of DNA elements in the mouse genome. Nature 515:355-64
Earl, Dent; Nguyen, Ngan; Hickey, Glenn et al. (2014) Alignathon: a competitive assessment of whole-genome alignment methods. Genome Res 24:2077-89

Showing the most recent 10 out of 46 publications