Global multiple sequence alignment (MSA) is the most basic step in the comparative study of molecular sequences. It is also the foundation of numerous subsequent biological analyses, such as phylogenetic reconstruction, gene annotation, and three-dimensional structure prediction. The question of MSA quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment-quality issues in real-life settings. We have recently devised two simple methodologies to identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. With these methods, reliable (anchor) and unreliable (error) segments in alignments can be identified. We also found that most alignment errors are simple errors, i.e., the misplacement of one or a few indels. Existing MSA reconstruction methods take the purist approach to alignment: define the most appropriate objective function, heuristically find an MSA that approximately maximizes it, and iteratively refine it using the same scoring scheme. We propose to improve upon existing alignment methods by augmenting them with a utilitarian step: identify possible alignment reconstruction errors and correct the simple errors, which we found in preliminary studies to be the most numerous. This grant application proposes four specific aims: (1) to design a method for increasing the reliability of the alignment in error segments, thereby improving the reliability of the entire alignment; (2) to evaluate the new method in comparison to and in conjunction with existing methods; (3) to implement the new method as a public-domain software package; and (4) to revisit cases in which conclusions were based on erroneous alignments and to study the effects of improved alignments on downstream analyses. 
We expect that our work will substantially increase the reliability of alignments and of downstream procedures that use alignments as input.

Public Health Relevance

Multiple sequence alignment (MSA) is the first computational step in the comparative analysis of molecular sequences and the foundation of numerous biological studies. MSA reconstruction techniques are used in approximately 20 scientific publications each day; a survey of the literature indicates that MSA is used in fields as disparate as the diagnosis of microbial infections, the detection of mutations in neonatal syndromes, the study of gene expression, tertiary-structure prediction and rational drug design, high-resolution genotyping, and metabolic pathway prediction. Improving the quality of MSAs will have a significant impact on biomedical research and subsequent applications.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM010009-02
Application #
7828219
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-07-01
Project End
2012-06-30
Budget Start
2010-07-01
Budget End
2012-06-30
Support Year
2
Fiscal Year
2010
Total Cost
$562,500
Indirect Cost
Name
University of Houston
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
036837920
City
Houston
State
TX
Country
United States
Zip Code
77204
Ezawa, Kiyoshi (2016) General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable? BMC Bioinformatics 17:304
Ezawa, Kiyoshi (2016) General continuous-time Markov model of sequence evolution via insertions/deletions: local alignment probability computation. BMC Bioinformatics 17:397
Ezawa, Kiyoshi; Landan, Giddy; Graur, Dan (2013) Detecting negative selection on recurrent mutations using gene genealogy. BMC Genet 14:37
Bogumil, David; Landan, Giddy; Ilhan, Judith et al. (2012) Chaperones divide yeast proteins into classes of expression level and evolutionary rate. Genome Biol Evol 4:618-25
Ezawa, Kiyoshi; Ikeo, Kazuho; Gojobori, Takashi et al. (2011) Evolutionary patterns of recently emerged animal duplogs. Genome Biol Evol 3:1119-35
Suen, Garret; Teiling, Clotilde; Li, Lewyn et al. (2011) The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet 7:e1002007
Popa, Ovidiu; Hazkani-Covo, Einat; Landan, Giddy et al. (2011) Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res 21:599-609
Cartwright, Reed A; Graur, Dan (2011) The multiple personalities of Watson and Crick strands. Biol Direct 6:7
Cartwright, Reed A; Lartillot, Nicolas; Thorne, Jeffrey L (2011) History can matter: non-Markovian behavior of ancestral lineages. Syst Biol 60:276-90
Cartwright, Reed A (2011) Bards, poets, and cliques: frequency-dependent selection and the evolution of language genes. Bull Math Biol 73:2201-12

Showing the most recent 10 out of 18 publications