The ENCODE Data Analysis Center (EDAC) proposal aims to provide a flexible analysis resource for the ENCODE project. ENCODE is a large multi-center project that aims to define all the functional elements in the human genome, using many different experimental techniques coupled with numerous computational techniques. A critical part of delivering this set of functional elements is the integration of data from multiple sources, and the EDAC proposal aims to provide this integration. As prescribed by the RFA for this proposal, the precise prioritization of the EDAC's work will be set by an external group, the Analysis Working Group (AWG). Based on previous experience, this analysis will require a variety of techniques. We expect to apply sophisticated statistical models to the integration of the data, in particular to mitigate the problems caused by the extensive heterogeneity and correlation of variables across the human genome. We have statistical experts who can use the large size of the human genome, coupled with a limited number of sensible assumptions, to produce statistical techniques that are robust to this considerable heterogeneity. We also expect to apply machine learning techniques to build integration methods that combine datasets. These include Bayesian inference methods and Support Vector Machines, a robust technique from computer science. Each of these methods has performed well in the ENCODE pilot project, and we expect them to be even more useful in the full ENCODE project. We will also provide quality assurance and summary metrics for genome-wide multiple alignments. This area raises a number of complex statistical, algorithmic and engineering issues, which we will address using state-of-the-art techniques. Overall, we aim to provide deep integration of the ENCODE data, under the direction of the AWG and in close collaboration with the other members of the ENCODE consortium.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG004695-04S1
Application #: 8494858
Study Section: Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer: Feingold, Elise A
Project Start: 2008-05-15
Project End: 2012-12-31
Budget Start: 2012-04-01
Budget End: 2012-12-31
Support Year: 4
Fiscal Year: 2012
Total Cost: $371,054
Indirect Cost: $10,761

Name: European Molecular Biology Laboratory
Department:
Type:
DUNS #: 321691735
City: Heidelberg
State:
Country: Germany
Zip Code: 69117

Zerbino, Daniel R; Johnson, Nathan; Juetteman, Thomas et al. (2016) Ensembl regulation resources. Database (Oxford) 2016:
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn et al. (2016) Ensembl comparative genomics resources. Database (Oxford) 2016:
Zerbino, Daniel R; Ballinger, Tracy; Paten, Benedict et al. (2016) Representing and decomposing genomic structural variants as balanced integer flows on sequence graphs. BMC Bioinformatics 17:400
Pervouchine, Dmitri D; Djebali, Sarah; Breschi, Alessandra et al. (2015) Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression. Nat Commun 6:5903
Zerbino, Daniel R; Wilder, Steven P; Johnson, Nathan et al. (2015) The Ensembl regulatory build. Genome Biol 16:56
Nguyen, Ngan; Hickey, Glenn; Zerbino, Daniel R et al. (2015) Building a pan-genome reference for a population. J Comput Biol 22:387-401
Yue, Feng; Cheng, Yong; Breschi, Alessandra et al. (2014) A comparative encyclopedia of DNA elements in the mouse genome. Nature 515:355-64
Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu et al. (2014) Comparative analysis of the transcriptome across distant species. Nature 512:445-8
Earl, Dent; Nguyen, Ngan; Hickey, Glenn et al. (2014) Alignathon: a competitive assessment of whole-genome alignment methods. Genome Res 24:2077-89
Brown, James B; Boley, Nathan; Eisman, Robert et al. (2014) Diversity and dynamics of the Drosophila transcriptome. Nature 512:393-9

Showing the most recent 10 out of 46 publications