DNA sequencing is error prone, and errors in sequence data can reduce the sensitivity of database searches, particularly for distantly related homologs. The reliability of one class of sequence search algorithm, BLAST, has been explored using sequence data into which different levels of error were introduced artificially. In addition, BLASTX, a version of this algorithm that translates nucleic acid sequences in all possible reading frames and searches the conceptually translated protein sequences against protein sequence databases, has been used to evaluate search performance on raw cDNA sequence, which is now being deposited in molecular sequence databases as expressed sequence tag (EST) data. Codon utilization information has also been incorporated into the BLASTX algorithm, so that coding regions are identified through a combination of sequence alignment and codon utilization information. This makes the identification of coding regions more robust to data errors in the query or database.
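The conceptual translation step described above can be illustrated with a minimal sketch. It is not the BLASTX implementation: it assumes the standard genetic code, an unambiguous A/C/G/T alphabet, and hypothetical helper names (`translate`, `six_frame_translations`) chosen only for this illustration. It produces the six protein sequences (three reading frames on each strand) that a translated search would then compare against a protein database.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (assumption: no alternative genetic codes are considered).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    frames = {}
    strands = (("+", seq.upper()), ("-", reverse_complement(seq)))
    for strand, strand_seq in strands:
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Because an inserted or deleted base shifts every downstream codon into a different frame, translating all six frames is one reason a translated search can still recover part of a coding region from error-containing single-pass sequence.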