To provide a single formal specification for information relevant to biotechnology computing, including scientific literature, nucleic acid sequence data, protein sequence data, genetic and physical maps, chromosomes, genes, the relationships of other scientific knowledge to these entities, and their relationship to normal and disease conditions. To convert a number of important biological databases of diverse content and form into this unifying specification. To develop tools demonstrating the use of this unified view of biological data. A large number of databases were examined, such as GenBank, EMBL, PIR, SWISSPROT, Kabat, MIM, ACEDB, Flybase, EcoSeq, MEDLINE, and others. A single modular data model was constructed that could represent almost all of the data from these sources in a consistent way, and it was formally specified in Abstract Syntax Notation One (ASN.1; ISO 8824, 8825). Parsers were written to read the different data formats of the sources, and software was developed to map the available data elements into the proper places in the unified ASN.1 data model. A software product, Entrez, was developed to production quality; it takes advantage of the unified view of some of the sources (GenBank, PIR, SWISSPROT, and MEDLINE) to allow scientists to explore all these data as a single integrated whole. Entrez and its associated integrated data are distributed to scientists on CD-ROM every two months by NCBI. A client/server version provides high-speed access over the Internet. New databases now allow NCBI to maintain an "up to the minute" view of the diverse data sources and their relationships with each other despite differing content, data formats, and data release cycles. Additional data sources, such as 3-D protein structures, are being mapped to the common specification, and new tools are being developed to use the growing web of connected data. Scientific knowledge continues to evolve, which means the data must evolve as well. This is a project that must continue as long as biochemical research does.
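
The parse-and-map approach described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `UnifiedRecord` fields, the two toy input formats, and the database names are hypothetical and do not reproduce the NCBI ASN.1 specification or the real GenBank, PIR, or SWISSPROT flat-file formats. It only shows the general idea of one parser per source format feeding a single common record type, with shared literature citations acting as links between records.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedRecord:
    """Hypothetical common record; NOT the actual NCBI ASN.1 data model."""
    source: str       # name of the originating database
    accession: str    # identifier within that database
    kind: str         # "nucleotide" or "protein"
    sequence: str
    citations: list = field(default_factory=list)  # literature identifiers


def parse_tagged(text, source, kind):
    """Parse a toy 'TAG value' format with one field per line (assumed format)."""
    fields = {}
    for line in text.strip().splitlines():
        tag, _, value = line.partition(" ")
        fields.setdefault(tag, []).append(value.strip())
    return UnifiedRecord(
        source=source,
        accession=fields["ACC"][0],
        kind=kind,
        sequence="".join(fields.get("SEQ", [])).upper(),
        citations=fields.get("REF", []),
    )


def parse_keyvalue(text, source, kind):
    """Parse a toy 'key=value;key=value' format (assumed format)."""
    pairs = dict(item.split("=", 1) for item in text.strip().split(";") if item)
    refs = [r for r in pairs.get("refs", "").split(",") if r]
    return UnifiedRecord(source, pairs["id"], kind, pairs["seq"].upper(), refs)


def shared_citations(a, b):
    """Return citation identifiers that two records have in common."""
    return sorted(set(a.citations) & set(b.citations))


# Two records from differently formatted toy sources, mapped to one model.
nuc = parse_tagged(
    "ACC X00001\nSEQ atgc\nSEQ ggta\nREF 12345",
    "toy-nucleotide-db", "nucleotide",
)
prot = parse_keyvalue(
    "id=P00001;seq=mk;refs=12345,67890",
    "toy-protein-db", "protein",
)
print(nuc)
print(prot)
print(shared_citations(nuc, prot))
```

Once both records share one structure, a tool can treat them uniformly regardless of their source format, which is the property the abstract attributes to the unified view.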

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000033-03
Application #
3759310
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
3
Fiscal Year
1994
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code