To provide a single formal specification for information relevant to biotechnology computing, including scientific literature, nucleic acid sequence data, protein sequence data, genetic and physical maps, chromosomes, genes, the relationships of other scientific knowledge to these entities, and their relationship to normal and disease conditions. To convert a number of important biological databases of diverse content and form into this unifying specification. To develop tools demonstrating the use of this unified view of biological data.

A large number of databases were examined, such as GenBank, EMBL, PIR, SWISSPROT, Kabat, MIM, ACEDB, Flybase, EcoSeq, MEDLINE, and others. A single modular data model was constructed that could represent almost all of the data from these sources in a consistent way, and it was formally specified in Abstract Syntax Notation One (ASN.1; ISO 8824, 8825). Parsers were written to read the different data formats of the sources, and software was developed to map the available data elements into the proper places in the unified ASN.1 data model. A software product (Entrez) was developed to production quality; it takes advantage of the unified view of some of the sources (GenBank, PIR, SWISSPROT, and MEDLINE) to allow the scientist to explore all these data as a single integrated whole. Entrez and its associated integrated data are distributed to scientists on CD-ROM every two months by NCBI.

Additional data sources are being mapped to the common specification, and new tools are being developed to use the growing web of connected data. New databases are being designed and built to store and track the ASN.1 model data dynamically. This will permit NCBI to maintain an "up to the minute" view of all the supported data sources and their relationships to each other, in spite of the different data release cycles and policies of the sources. Scientific knowledge continues to evolve, which means the data must evolve as well. This project must continue as long as biomedical research does.
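
The parse-and-map step described above can be illustrated with a minimal sketch. It is not the project's software or the actual ASN.1 specification: the `UnifiedRecord` fields, the two toy input formats, and the function names are all hypothetical, chosen only to show how records from differently formatted sources might be mapped into one common structure and then linked through a shared literature citation.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedRecord:
    """Hypothetical common record; not the actual NCBI ASN.1 specification."""
    source: str
    accession: str
    molecule: str
    sequence: str
    citations: list = field(default_factory=list)


def parse_tagged(text, source, molecule):
    """Parse a toy tagged format with 'AC', 'SQ', and 'RF' lines."""
    rec = UnifiedRecord(source=source, accession="", molecule=molecule, sequence="")
    for line in text.strip().splitlines():
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "AC":
            rec.accession = value
        elif tag == "SQ":
            # Sequence lines may contain spaces; normalize to one uppercase string.
            rec.sequence += value.replace(" ", "").upper()
        elif tag == "RF":
            rec.citations.append(value)
    return rec


def parse_key_value(text, source, molecule):
    """Parse a toy 'key=value' format that uses different field names."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return UnifiedRecord(
        source=source,
        accession=fields.get("id", [""])[0],
        molecule=molecule,
        sequence="".join(fields.get("seq", [])).upper(),
        citations=fields.get("cite", []),
    )


def link_by_citation(records):
    """Group (source, accession) pairs by the citation identifiers they share."""
    links = {}
    for rec in records:
        for cit in rec.citations:
            links.setdefault(cit, []).append((rec.source, rec.accession))
    return links


# Two made-up records from two made-up sources that cite the same article.
nucleotide = parse_tagged("AC X00001\nSQ acgt acgt\nRF 1234567", "source-a", "nucleic-acid")
protein = parse_key_value("id=P00001\nseq=mkv\ncite=1234567", "source-b", "protein")
links = link_by_citation([nucleotide, protein])
```

Once both records share one structure, a single function can relate them regardless of their original format, which is the kind of integrated exploration the abstract attributes to the unified view.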

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000033-01
Application #
3845121
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
1
Fiscal Year
1992
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code