In collaboration with the BioInformatics and Molecular Analysis Section (BIMAS) at the Division of Computer Research and Technology, we are engaged in a critical, quantitative comparison of several computer programs designed for gene sequence assembly and analysis.
The aim of this study was to gain experience and expertise in the use of several different sequence assembly programs, and to evaluate these programs for speed, accuracy, and ease of use. Six sequence assembly packages have been examined: the "Inherit" System (Applied Biosystems); GCG (Genetics Computer Group); Sequencher 2.0 (Gene Codes Co.); GeneWorks (Intelli-Genetics); SeqMan (DNAStar); and AssemblyLING (International Biotechnologies). Inherit makes heavy use of a specialized parallel processing computer capable of scanning and comparing over 15 million characters per second. It is designed primarily for assembling medium to large sequencing projects, for searching the gene and protein databases for homologous sequences, and for rapidly locating genetic motifs such as regulatory elements. Unfortunately, much of the Inherit software is bug-ridden and poorly designed. To evaluate the speed and quality of sequence assembly, the rat multidrug resistance gene sequence was randomly split into 58 overlapping fragments, and from 0 to 15% error was randomly added to different sets of these fragments. The Inherit and GCG programs gave the best final assemblies: at 5% added error, approximately 50 bases were at variance from the final sequence (1% error). The Sequencher and SeqMan programs consistently gave two to three times as much apparent error. However, the Sequencher program has a contig editing system that is far superior to any of the other editors. Two programs gave unacceptable results. At 5% added error, both AssemblyLING and GeneWorks had between 900 and 1000 variant bases (roughly 20% error). GeneWorks frequently misaligned the fragments, even when no error had been added to them.
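
The test-set construction described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study, whose details are not given here. The function names, the fragment length bounds, the restriction to substitution errors (no insertions or deletions), and the position-by-position error count against a known reference are all assumptions made for illustration. The 5000-base reference length is only what the reported figures imply (50 variant bases corresponding to 1% error), and a random sequence stands in for the rat multidrug resistance gene.

```python
import random

BASES = "ACGT"


def split_into_fragments(seq, n_fragments, min_len, max_extra, rng):
    """Cut seq into n_fragments overlapping pieces that together cover it.

    Start positions are evenly spaced; each length is min_len plus a random
    extra of 0..max_extra bases (hypothetical parameters, not from the study).
    Returns a list of (start, fragment) pairs.
    """
    if n_fragments < 2:
        raise ValueError("need at least two fragments")
    step = (len(seq) - min_len) / (n_fragments - 1)
    if min_len <= step + 1:
        raise ValueError("min_len too short for neighbouring fragments to overlap")
    fragments = []
    for i in range(n_fragments):
        start = round(i * step)
        length = min_len + rng.randint(0, max_extra)
        # Slicing clips the fragment at the end of the sequence.
        fragments.append((start, seq[start:start + length]))
    return fragments


def add_substitution_errors(fragment, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in fragment:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def percent_variance(candidate, reference):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length (no indels assumed)")
    mismatches = sum(a != b for a, b in zip(candidate, reference))
    return 100.0 * mismatches / len(reference)


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in reference: a random 5000-base sequence, not the real gene.
    reference = "".join(rng.choice(BASES) for _ in range(5000))
    fragments = split_into_fragments(reference, 58, 200, 100, rng)
    # One test set per added-error level, spanning the 0 to 15% range.
    for rate in (0.0, 0.05, 0.10, 0.15):
        noisy = [add_substitution_errors(f, rate, rng) for _, f in fragments]
        print(f"{rate:.0%} added error: {len(noisy)} fragments ready for assembly")
```

Each noisy fragment set would then be given to an assembly program, and the resulting consensus compared with the reference. Under the same assumptions, a consensus with 50 mismatches over 5000 bases gives `percent_variance` of 1.0, and 900 to 1000 mismatches give 18 to 20.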