Global multiple sequence alignment is the most basic step in the comparative study of molecular sequences. It is also the foundation of numerous subsequent biological analyses, such as phylogenetic reconstruction, gene annotation, and three-dimensional structure prediction. The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment-quality issues in real-life settings. We have recently devised two simple methodologies to identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. With these methods, reliable (anchor) and unreliable (error) segments in alignments can be identified. We also found that most alignment errors are simple errors, i.e., the misplacement of one or a few indels. Existing multiple sequence alignment (MSA) reconstruction methods take a purist approach to alignment: define the most appropriate objective function, heuristically find an MSA that approximately maximizes it, and iteratively refine it using the same scoring scheme. We propose to improve upon existing alignment methods by augmenting them with a utilitarian step: identify possible alignment reconstruction errors and correct the simple errors, which we found in preliminary studies to be the most numerous. This grant application proposes four specific aims: (1) to design a method for increasing the reliability of the alignment in error segments, thereby improving the reliability of the entire alignment; (2) to evaluate the new method in comparison to and in conjunction with existing methods; (3) to implement the new method as a public domain software package; and (4) to revisit cases in which conclusions were based on erroneous alignments and to study the effects of improved alignments on downstream analyses.
We estimate that our work will substantially increase the reliability of alignments and downstream procedures that use alignment as input.
Multiple sequence alignment (MSA) is the first computational step in the comparative analysis of molecular sequences and the foundation of numerous biological studies. MSA reconstruction techniques are used in approximately 20 scientific publications each day; a survey of the literature indicates that MSA is used in fields as disparate as the diagnosis of microbial infections, the detection of mutations in neonatal syndromes, the study of gene expression, tertiary-structure prediction and rational drug design, high-resolution genotyping, and metabolic pathway prediction. Improving the quality of MSAs will have a significant impact on biomedical research and subsequent applications.