We propose to develop a graphical ad hoc query interface for GenBank. If the project succeeds, genome informaticists would be able to use the tool to express queries against GenBank in any manner they wish. We would develop this interface by extending our current tool, which can already interlink multiple relational genome databases by permitting the formulation of distributed queries. Central to the proposed extension is the development of a limited SQL-like interface on top of GenBank data stored in ASN.1 format. The proposed graphical GenBank query interface would include four salient features: Schema Abstraction, Schema Semantics Explanation, Schema Constituent Finding, and Join Path Discovery. Schema Abstraction aims at tackling the complexity of the ASN.1 specifications by providing the user with both a means of hiding intermediate branches of the ASN.1 schema and a means of collapsing complex subbranches of the schema into a single node within the schema diagram. Schema Semantics Explanation aims at providing users with the meanings (in English) of chosen subcomponents or branches of the GenBank ASN.1 schema. Schema Constituent Finding aims at assisting the user in identifying the portion of an ASN.1 schema relevant to a planned query. Join Path Discovery aims at discovering alternative, semantically meaningful ways of linking multiple subschema diagrams to form a query. The proposed query interface could be an essential tool for computational biologists who not only need to freely access the essential nucleotide sequence database, but also need to tie that database to related public and private genome databases. The interface would be developed as a Java applet and would be downloadable over the Internet.
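As a minimal sketch of the collapsing half of Schema Abstraction described above, the Java fragment below treats a schema diagram as a tree and lets a complex subbranch be drawn as a single node. It is an illustration under stated assumptions, not the project's implementation: the class `SchemaAbstractionSketch`, its methods, and the node names ("entry", "features", "location", and so on) are all hypothetical placeholders and are not taken from the actual GenBank ASN.1 specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of collapsing a schema subbranch into one node.
class SchemaAbstractionSketch {

    /** One node of a schema diagram. Names used here are made-up placeholders. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean collapsed = false; // true: draw this whole subtree as a single node

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Marks the first node with the given name (pre-order) as collapsed. */
    static boolean collapse(Node root, String name) {
        if (root.name.equals(name)) {
            root.collapsed = true;
            return true;
        }
        for (Node child : root.children) {
            if (collapse(child, name)) return true;
        }
        return false;
    }

    /** Counts all nodes strictly below the given node. */
    static int countDescendants(Node node) {
        int total = 0;
        for (Node child : node.children) {
            total += 1 + countDescendants(child);
        }
        return total;
    }

    /** Renders the visible part of the diagram, one indented line per node. */
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(node.name);
        if (node.collapsed && !node.children.isEmpty()) {
            // A collapsed subbranch is shown as one node with a hidden-count hint.
            sb.append(" [+").append(countDescendants(node)).append(" hidden]\n");
            return sb.toString();
        }
        sb.append('\n');
        for (Node child : node.children) sb.append(render(child, depth + 1));
        return sb.toString();
    }

    /** Builds a small, invented schema tree for demonstration purposes only. */
    static Node sampleSchema() {
        Node feature = new Node("feature")
                .add(new Node("location"))
                .add(new Node("qualifier"));
        return new Node("entry")
                .add(new Node("description"))
                .add(new Node("features").add(feature));
    }

    public static void main(String[] args) {
        Node root = sampleSchema();
        collapse(root, "features");
        System.out.print(render(root, 0));
    }
}
```

Keeping the collapsed state as a flag on the node, rather than deleting the subtree, means the full schema remains available when the user expands the node again; hiding intermediate branches would need a separate mechanism that this sketch does not cover.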

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG000772-08
Application #: 6343260
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J
Project Start: 1993-09-28
Project End: 2003-12-31
Budget Start: 2001-01-01
Budget End: 2003-12-31
Support Year: 8
Fiscal Year: 2001
Total Cost: $188,875
Indirect Cost:

Name: University of Connecticut
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #:
City: Storrs-Mansfield
State: CT
Country: United States
Zip Code: 06269
Cheung, K H; Nadkarni, P; Miller, P et al. (1998) Automatic query mapping among genomic databases: a pilot exploration. Proc AMIA Symp :942-6
Cheung, K H; Nadkarni, P M; Shin, D G (1998) A metadata approach to query interoperation between molecular biology databases. Bioinformatics 14:486-97