A Data Coordinating Center for modENCODE

Stein, Lincoln

Abstract

The modENCODE project is a key sequel to the sequencing of the fly and worm genomes, and will have an enormous impact on our understanding of biological processes in all higher eukaryotes, including humans. In order to manage the diverse, large-scale datasets that will be produced by modENCODE, we propose to create a data coordinating center (DCC) to track the data, integrate it with other information sources, and make it available to the research community in a timely and open fashion. This proposal brings together four groups with highly relevant backgrounds: The Micklem group, through its work on the InterMine system and FlyMine database, has extensive experience in integrating diverse types of data into high-performance data mining systems. The Stein and Lewis groups bring to the project an intimate familiarity with the C. elegans and D. melanogaster genomes, their reagents and research communities, and are well positioned by their work with the WormBase and FlyBase databases to liaise with those MODs. The Kent group is responsible for the DCC for the Human ENCODE pilot project, and has extensive practical knowledge of developing and managing projects of this sort. We will assemble a team of three data managers stationed at CSHL and at Berkeley, who have a background in the bioinformatics of C. elegans and/or D. melanogaster. The managers will liaise with their contacts at the data provider sites to determine data file formats, milestones and quality control procedures for their datasets. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO. Data providers will upload their datasets to a staging server where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will QC the data before approving its transfer to the production database.
Data will be integrated in the production database using InterMine, and from there released to the public on a monthly schedule. Researchers will be able to access the data via the GBrowse genome browser, bulk downloads, and complex queries and reports mediated by InterMine and the BioMart data warehousing system. All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD), human ENCODE, and other sources. Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database. In addition, we will dedicate a significant part of a data manager's effort to transferring data from modENCODE into the MODs during the last year of the project.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG004269-05
Application #: 7825450
Study Section: Special Emphasis Panel (ZHG1-HGR-P (J1))
Program Officer: Good, Peter J

Project Start: 2007-05-04
Project End: 2014-03-31
Budget Start: 2010-04-01
Budget End: 2014-03-31
Support Year: 5
Fiscal Year: 2010
Total Cost: $1,329,443

Institution

Name: Ontario Institute for Cancer Research
DUNS #: 205540219

City: Toronto
State: ON
Country: Canada
Zip Code: M5 0-A3

Related projects


NIH 2012 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$655,051
NIH 2011 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$1,326,619
NIH 2010 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$1,329,443
NIH 2009 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$1,155,276
NIH 2009 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$196,500
NIH 2008 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Cold Spring Harbor Laboratory	$208,462
NIH 2008 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Ontario Institute for Cancer Research	$961,173
NIH 2007 U41 HG	A Data Coordinating Center for modENCODE	Stein, Lincoln D. / Cold Spring Harbor Laboratory	$1,275,000

Publications

Kalderimis, Alex; Lyne, Rachel; Butano, Daniela et al. (2014) InterMine: extensive web services for modern biology. Nucleic Acids Res 42:W468-72

Trinh, Quang M; Jen, Fei-Yang Arthur; Zhou, Ziru et al. (2013) Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE. BMC Genomics 14:494

Kuhn, Robert M; Haussler, David; Kent, W James (2013) The UCSC genome browser and associated tools. Brief Bioinform 14:144-61

Meyer, Laurence R; Zweig, Ann S; Hinrichs, Angie S et al. (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 41:D64-9

Contrino, Sergio; Smith, Richard N; Butano, Daniela et al. (2012) modMine: flexible access to modENCODE data. Nucleic Acids Res 40:D1082-8

Dreszer, Timothy R; Karolchik, Donna; Zweig, Ann S et al. (2012) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 40:D918-23

Fujita, Pauline A; Rhead, Brooke; Zweig, Ann S et al. (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39:D876-82

Washington, Nicole L; Stinson, E O; Perry, Marc D et al. (2011) The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database (Oxford) 2011:bar023

McKay, Sheldon J; Vergara, Ismael A; Stajich, Jason E (2010) Using the Generic Synteny Browser (GBrowse_syn). Curr Protoc Bioinformatics Chapter 9:Unit 9.12

Kuhn, R M; Karolchik, D; Zweig, A S et al. (2009) The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 37:D755-61
