For many genetic loci, the rapid accumulation of sequence data from several species, together with detailed functional analyses, has produced a data set so large that it is difficult, if not impossible, for individual investigators to assimilate fully. We propose to develop a prototype database of sequence alignments and experimental results for the beta-like globin gene cluster of mammals. This data repository is intended to help the international community of scientists who study globin genes to plan experiments, to design models and, ultimately, to understand how these genes are regulated. The resource will be developed for the mammalian beta-like globin gene clusters because they have been thoroughly sequenced in several species and extensive functional data are available for them. Moreover, the approaches and software developed for this project will be applicable to other sequence-analysis problems, and the prototype itself could be extended to any locus for which extensive sequence and functional data are available from several species. We recently developed a program that can simultaneously align a few very long sequences, and we then established an e-mail server that delivers any requested portion of the alignment, annotated to indicate highly conserved regions and known sequence features. This proposal seeks support to develop an interactive Globin Gene Server on the Internet that provides access to, and analysis of, both the alignments and the experimental data. We are currently refining a data model and a collection of tools to capture and display the wealth of pertinent experimental data. These tools will let the user view the results of the simultaneous alignment (the conserved sequence blocks) in register with the experimental data, providing an integrated view of the gene cluster in a flexible, interactive format.
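To make the idea of viewing features "in register" with conserved blocks concrete, here is a minimal Python sketch. It is not the alignment program or the e-mail server described above. It assumes a toy gapped alignment held as equal-length strings, defines a conserved column simply as one in which every sequence carries the same non-gap base (a real conserved-block definition would likely use scores and thresholds), and uses hypothetical helper names and data throughout. Features are given as 0-based, half-open coordinates in one reference sequence and are projected onto alignment columns so they can be compared directly with the conserved blocks.

```python
def column_for_position(gapped_row, pos):
    """Map a 0-based ungapped position in one sequence to its alignment column."""
    seen = -1
    for col, ch in enumerate(gapped_row):
        if ch != "-":
            seen += 1
            if seen == pos:
                return col
    raise ValueError("position lies beyond the end of the sequence")


def conserved_columns(rows):
    """Columns in which every sequence carries the same non-gap base (toy definition)."""
    return [col for col in range(len(rows[0]))
            if len({row[col] for row in rows}) == 1 and rows[0][col] != "-"]


def conserved_blocks(rows, min_len=3):
    """Group runs of consecutive conserved columns into half-open (start, end) blocks."""
    blocks, start, prev = [], None, None
    for col in conserved_columns(rows):
        if start is not None and col == prev + 1:
            prev = col  # extend the current run
            continue
        if start is not None and prev - start + 1 >= min_len:
            blocks.append((start, prev + 1))  # close a long-enough run
        start = prev = col  # begin a new run
    if start is not None and prev - start + 1 >= min_len:
        blocks.append((start, prev + 1))
    return blocks


def features_in_register(rows, ref_index, features):
    """Project (name, start, end) features in reference coordinates onto alignment columns."""
    ref = rows[ref_index]
    return [(name, column_for_position(ref, start), column_for_position(ref, end - 1) + 1)
            for name, start, end in features]


# Toy three-species alignment (hypothetical data); '-' marks a gap.
rows = ["ACGTAC-GGA",
        "ACGTACTGGA",
        "ACGAAC-GGA"]
print(conserved_blocks(rows))                            # [(0, 3), (7, 10)]
print(features_in_register(rows, 0, [("site", 6, 9)]))  # [('site', 7, 10)]
```

In this toy case the hypothetical feature "site" at reference positions 6 to 9 lands exactly on the second conserved block once the gap in the reference row is accounted for, which is the kind of correspondence an in-register display is meant to make visible.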
This integrated view of alignments and experimental data in register, which we call electronic genetic analysis, is a unique feature of this database. Future developments will include tools to search the Transcription Factor Database for binding-site consensus sequences that match conserved blocks, to provide alternative alignments, to produce interactive summary views of the database, to support batch queries in classic database query languages, and to detect database inconsistencies automatically. To best serve the needs of our potential user community, this information is being made available via the World-Wide Web, an interactive, graphically oriented environment accessible over the Internet.
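Searching conserved blocks for transcription-factor sites could work roughly as follows. This is a minimal sketch under stated assumptions: the consensus is written in IUPAC nucleotide ambiguity codes and supplied by hand (the code does not read the Transcription Factor Database or its file format), and the block sequence and the GATA-style consensus WGATAR are illustrative only. Both strands are scanned, and minus-strand hits are reported in forward-strand coordinates.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_pattern(consensus):
    """Translate an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in consensus.upper()) + ")")


def find_sites(block, consensus):
    """Return sorted (start, strand) pairs for consensus matches in a conserved block.

    Starts are 0-based positions on the forward strand of the block.
    """
    block = block.upper()
    pattern = consensus_pattern(consensus)
    k = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(block)]
    revcomp = block.translate(COMPLEMENT)[::-1]
    # A hit starting at p on the reverse complement covers forward
    # positions len(block) - p - k through len(block) - p - 1.
    hits += [(len(block) - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)


# Illustrative block and a GATA-style consensus (W = A/T, R = A/G).
block = "CTTATCTGATAAG"
sites = find_sites(block, "WGATAR")
print(sites)  # [(1, '-'), (6, '+')]
```

A regular expression is used here only because IUPAC codes translate directly into character classes; a production search would more likely use scoring matrices and tolerate mismatches.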