DNA sequencing is error prone, and errors in sequence data can reduce the sensitivity of database searches, particularly for distantly related homologs. The reliability of one class of sequence search algorithm, BLAST, has been explored by running it on sequence data with different levels of artificially introduced error. In addition, a version of this algorithm, BLASTX, which translates nucleic acid sequences in all possible reading frames and searches the conceptually translated protein sequences against protein sequence databases, has been used to evaluate search performance on raw cDNA sequence, which is now being deposited in molecular sequence databases as expressed sequence tag (EST) data. Codon utilization information has also been incorporated into the BLASTX algorithm so that coding regions are identified through a combination of sequence alignment and codon usage. This makes the identification of coding regions more robust to data errors in the query or database.
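
As a rough illustration of the two ideas in the abstract, the sketch below translates a nucleotide sequence in all six reading frames and introduces artificial errors at a chosen rate. It is not the BLASTX implementation or the project's actual error model. The function names, the equal mix of substitutions, insertions and deletions, and the use of the standard genetic code are all assumptions made for this example.

```python
import random

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Conceptual translation in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def introduce_errors(seq, rate, rng):
    """Hypothetical error model: each base is hit with probability `rate`,
    and a hit is a substitution, insertion, or deletion with equal chance."""
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)  # base left unchanged
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice("ACGT"))
        # "del": the base is dropped, which shifts the reading frame
    return "".join(out)


if __name__ == "__main__":
    query = "ATGGCCTAA"
    print(six_frame_translations(query)["+1"])  # MA*
    noisy = introduce_errors(query, rate=0.1, rng=random.Random(0))
    print(six_frame_translations(noisy))
```

Insertions and deletions shift the reading frame, so a protein-level match to an error-containing query may be split across several of the six frames. This is one reason translated searches are sensitive to sequencing errors.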

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000030-01
Application #: 3845118
Study Section:
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 1
Fiscal Year: 1992
Total Cost:
Indirect Cost:

Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code: