We propose to develop a model in which users of laboratory genome databases can freely interlink local databases with public genome databases. By interlinking, we mean that users of a laboratory genome database can formulate and issue ad hoc queries that entail cross-database joins between the local database and public genome databases in real time. Specifically, we will study interoperability between the laboratory database DB/12 and two public genome databases, GDB and GSDB. In one proposed scenario, users and developers of DB/12 will be able to formulate and issue join queries across DB/12, GDB and GSDB relatively quickly, even if they are not familiar with the schemas of GDB and GSDB. In another scenario, users and developers of other laboratory genome databases will use our proposed tool to interlink their own databases with the public portion of DB/12, GDB and GSDB. The key component of our proposed model is a graphical ad hoc query interface designed to help users deal with unfamiliar third-party database schemas and thereby ease query formulation. Our specific goal is to enable users to formulate SQL queries graphically within 5-10 minutes despite their unfamiliarity with the third-party database schemas in the federation. Our approach creates and uses metadata describing the schema relationships among the existing genome databases that participate in the federation. This study will clarify (i) what types of meta-level information about database schemas are necessary to make interoperability between genome databases feasible, and (ii) what is the most effective way of organizing and storing such meta-level information for efficient mutual use.
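To make the idea of schema-relationship metadata concrete, the following is a minimal sketch, not the proposed system itself. It assumes that each piece of metadata records which column in one database corresponds to which column in another, and that a chain of such links is enough to assemble a cross-database join. All table and column names (`clone`, `locus_symbol`, `locus`, `sequence`, `accession`) and the `SchemaLink` and `build_join_query` helpers are hypothetical; the actual schemas of DB/12, GDB and GSDB are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaLink:
    """Hypothetical metadata record: a column in one database that
    corresponds to a column in another database."""
    left_db: str
    left_table: str
    left_column: str
    right_db: str
    right_table: str
    right_column: str


# Assumed example links; the real schema relationships are not given in the text.
LINKS = [
    SchemaLink("db12", "clone", "locus_symbol", "gdb", "locus", "symbol"),
    SchemaLink("gdb", "locus", "accession", "gsdb", "sequence", "accession"),
]


def build_join_query(select_columns, links):
    """Assemble a SQL join by following a chain of schema links in order."""
    if not links:
        raise ValueError("at least one schema link is required")
    first = links[0]
    lines = [
        "SELECT " + ", ".join(select_columns),
        f"FROM {first.left_db}.{first.left_table}",
    ]
    for link in links:
        # Each link contributes one JOIN clause with its matching condition.
        lines.append(
            f"JOIN {link.right_db}.{link.right_table} "
            f"ON {link.left_db}.{link.left_table}.{link.left_column} = "
            f"{link.right_db}.{link.right_table}.{link.right_column}"
        )
    return "\n".join(lines)


query = build_join_query(["db12.clone.name", "gsdb.sequence.accession"], LINKS)
print(query)
```

In this sketch the user only picks the columns of interest; the join conditions come from the stored links, which is the role the metadata is meant to play for users unfamiliar with the third-party schemas.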
Cheung, K H; Nadkarni, P; Miller, P et al. (1998) Automatic query mapping among genomic databases: a pilot exploration. Proc AMIA Symp :942-6
Cheung, K H; Nadkarni, P M; Shin, D G (1998) A metadata approach to query interoperation between molecular biology databases. Bioinformatics 14:486-97