Protein and nucleic acid sequence analysis is an important computational approach that biological researchers use to study these sequences. The two basic types of analysis are database sequence retrieval and sequence alignment. Database sequence retrieval is usually based either on sequence similarity or on the presence of keywords. The retrieved sequence entries, which contain all the information known about those sequences, can give researchers a great deal of information on a specific topic. Multiple sequence alignment is used to align a large number of sequences in order to study their detailed similarities; for example, it can be used to classify protein sequences into subfamilies.

We have made good progress on our Genomic Research Information Database System (GRIDS). The system is implemented on a commercial Sybase SQL Server, which will allow users to retrieve different types of information, including structured data, images, and text. At present, we have a prototype system that allows users to retrieve GenBank entries based both on structured (measurable) data, such as year and sequence length, and on free-form text. Typically, free-form text is used to retrieve entries that do or do not contain the query terms, while measurable data is used to retrieve entries whose values are equal to, not equal to, greater than, or less than the query values. We have already developed a web-based query entry form for GRIDS.
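The query model described above (comparison operators on measurable fields, "contains" or "does not contain" on free-form text) can be sketched as a translation into a parameterized SQL WHERE clause. This is a minimal illustration under stated assumptions, not the GRIDS implementation: Python's built-in `sqlite3` module stands in for the Sybase server, and the table `entries`, its columns, the helpers `build_query`, `FieldCondition`, and `TextCondition`, and the toy records are all hypothetical, since the project's schema is not described.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema: entries(accession, year, seq_length, description).
# Operator and field names are whitelisted so user input never becomes raw SQL.
_OPS = {"=": "=", "!=": "<>", ">": ">", "<": "<"}
_NUMERIC_FIELDS = {"year", "seq_length"}


@dataclass
class FieldCondition:
    field: str  # a measurable field, e.g. "year" or "seq_length"
    op: str     # one of "=", "!=", ">", "<"
    value: int


@dataclass
class TextCondition:
    term: str
    present: bool = True  # False means "entry must NOT contain the term"


def _like_pattern(term):
    # Escape LIKE wildcards so the term is matched as a literal substring.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def build_query(field_conds, text_conds):
    """Return (sql, params) selecting accessions that satisfy ALL conditions."""
    clauses, params = [], []
    for c in field_conds:
        if c.field not in _NUMERIC_FIELDS or c.op not in _OPS:
            raise ValueError(f"unsupported condition: {c}")
        clauses.append(f"{c.field} {_OPS[c.op]} ?")
        params.append(c.value)
    for t in text_conds:
        op = "LIKE" if t.present else "NOT LIKE"
        clauses.append(f"description {op} ? ESCAPE '\\'")
        params.append(_like_pattern(t.term))
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT accession FROM entries WHERE {where} ORDER BY accession", params


def demo():
    # Toy in-memory table; records are invented for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (accession TEXT, year INTEGER, "
                 "seq_length INTEGER, description TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", [
        ("E1", 1995, 1200, "human globin mRNA"),
        ("E2", 1996, 450, "mouse kinase gene, partial cds"),
        ("E3", 1997, 3000, "human kinase gene, complete cds"),
    ])
    sql, params = build_query(
        [FieldCondition("year", ">", 1995)],
        [TextCondition("kinase"), TextCondition("partial", present=False)],
    )
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    print(demo())  # → ['E3']
```

In this sketch all conditions are combined with AND, and query values are always passed as bound parameters rather than spliced into the SQL text; a real form-based interface might also need OR groups or full-text indexing, which are not shown here.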

Agency: National Institutes of Health (NIH)
Institute: Center for Information Technology (CIT)
Type: Intramural Research (Z01)
Project #: 1Z01CT000263-01
Application #: 6161681
Study Section: Special Emphasis Panel (CBEL)
Support Year: 1
Fiscal Year: 1997
Organization: Center for Information Technology
Country: United States