Model and other organism databases are increasingly essential to genome and proteome research. Numerous database projects will produce large volumes of genome data for many species. An unmet need in building these databases is for component tools to manage, view and analyze such data. Two modules are proposed to meet this need: one to improve access to complex genome data through search and retrieval methods, for use by the public and the peer research community; and one to deploy shared registries, or directories, of genome data using science grid technology.

A Genome Information Search and Retrieval module (GISR), built from flexible data- and text-mining software, is an efficient and responsive tool for high-volume genome data of diverse source and content. Because such a module separates search and retrieval from data management, it has distinct advantages over a single database. New projects benefit because they can offer rapid access to a diverse set of genome data while they are still building their data management systems. All genome database projects benefit from efficient access to complex data that can easily be federated with the many related external bioinformatics data sets.

A Data Grid Distribution module (DOD) uses emerging, open-source Science Grid technology to let organism databases provide computable directories of, and access to, their high volumes of data, and conversely to let these projects access other projects' data for integration with their own. Directories for genome data are "broad and shallow" and can join, or federate, the "narrow and deep" detail held in individual databases. Such a module, enabling automated, common-standard use of organism genome data, is an important step toward a data grid usable for the large-scale analyses that bioscientists want to perform.
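The split between a "broad and shallow" directory and "narrow and deep" databases can be illustrated with a minimal Python sketch. It assumes, purely for illustration, two hypothetical classes (`OrganismDatabase`, `GenomeDirectory`), in-memory dictionaries standing in for real databases or grid services, and invented identifiers such as `fly:g1`. It is not the project's GISR or grid implementation. The directory indexes only a summary field from each registered database, answers searches from that thin index, and forwards retrieval of the full record to the database that owns it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OrganismDatabase:
    """Hypothetical 'narrow and deep' source holding full records for one organism."""
    name: str
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def fetch(self, gene_id: str) -> dict[str, str]:
        # Return the complete, detailed record managed by this database.
        return self.records[gene_id]


@dataclass
class GenomeDirectory:
    """Hypothetical 'broad and shallow' registry spanning many databases."""
    # gene_id -> (owning database name, summary symbol); only shallow fields are indexed.
    index: dict[str, tuple[str, str]] = field(default_factory=dict)
    sources: dict[str, OrganismDatabase] = field(default_factory=dict)

    def register(self, db: OrganismDatabase) -> None:
        # Record where each entry lives, copying just its summary symbol.
        self.sources[db.name] = db
        for gene_id, record in db.records.items():
            self.index[gene_id] = (db.name, record["symbol"])

    def search(self, text: str) -> list[str]:
        # Case-insensitive substring search over the shallow index only.
        needle = text.lower()
        return sorted(
            gene_id for gene_id, (_, symbol) in self.index.items()
            if needle in symbol.lower()
        )

    def retrieve(self, gene_id: str) -> dict[str, str]:
        # Federate: find the owning database, then fetch its deep record.
        db_name, _ = self.index[gene_id]
        return {"source": db_name, **self.sources[db_name].fetch(gene_id)}


# Illustrative usage with made-up databases and identifiers.
fly_db = OrganismDatabase("fly_db", {"fly:g1": {"symbol": "white", "chrom": "X"}})
flea_db = OrganismDatabase("flea_db", {"flea:g7": {"symbol": "white-like", "chrom": "scaffold_3"}})

directory = GenomeDirectory()
directory.register(fly_db)
directory.register(flea_db)

print(directory.search("white"))     # ['flea:g7', 'fly:g1']
print(directory.retrieve("fly:g1"))  # {'source': 'fly_db', 'symbol': 'white', 'chrom': 'X'}
```

Keeping the directory shallow means that adding another organism database only requires registering its summary fields; the detailed records stay with, and are maintained by, the source database.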

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG002733-01
Application #: 6594249
Study Section: Special Emphasis Panel (ZHG1-HGR-N (02))
Program Officer: Good, Peter J
Project Start: 2002-09-30
Project End: 2004-08-31
Budget Start: 2002-09-30
Budget End: 2003-08-31
Support Year: 1
Fiscal Year: 2002
Total Cost: $257,820
Indirect Cost:
Organization Name: Indiana University Bloomington
Department: Biology
Organization Type: Schools of Arts and Sciences
DUNS #: 006046700
City: Bloomington
State: IN
Country: United States
Zip Code: 47401

Publications:
Gilbert, Don (2005) Biomolecular interaction network database. Brief Bioinform 6:194-8
Colbourne, John K; Singan, Vasanth R; Gilbert, Don G (2005) wFleaBase: the Daphnia genome database. BMC Bioinformatics 6:45
Gilbert, Don (2004) Bioinformatics software resources. Brief Bioinform 5:300-4
Gilbert, Don (2003) Shopping in the genome market with EnsMart. Brief Bioinform 4:292-6
Gilbert, Don (2003) Protein family alignment annotation. Brief Bioinform 4:192-6