Phylogenetic reconstruction is an invaluable tool for studying molecular sequences. Starting from a description of how the characters in the sequences mutate over time, these methods attempt to uncover the sequences' relatedness. Common applications range from describing the evolutionary histories of living organisms in evolutionary biology to estimating genetic distances and constructing protein families in molecular biology and bioinformatics. Standard reconstruction methods rely on sequence alignments that specify which characters in the sequences are homologous, that is, derived from a common ancestor. A fundamental difficulty is that sequence alignments are not directly observed; they are inferred properties of the raw sequence data and must be estimated along with the phylogeny. Current tools handle this inference sequentially. They first determine a sometimes poor estimate of the alignment and then reconstruct the phylogeny as if that alignment were known without error. This project provides practical tools for end-users to infer alignment and phylogeny simultaneously, side-stepping the biases that sequential estimation introduces. The tools assume both a character substitution model and an insertion/deletion (indel) process through which characters are inserted or removed, generating an alignment. These indels also supply information from the data, previously under-utilized, for inferring phylogenies. Major advances make this phylo-alignment framework practical for real-life datasets. The framework draws heavily on hidden Markov models, Bayesian computation and careful integration over model parameters to produce a computationally efficient inference engine. Expert prior knowledge helps inform the indel process. The resulting realistic priors enable Bayes factor tests of whether specific indels are shared by descent or are homoplastic, reducing controversy over the value of indels in phylogenetics. Modeling assumptions better reflect the underlying biology.
Allowing spatial variation in the indel process provides more accurate phylogenies and alignments. The extensions also enable heterogeneity tests to identify evolutionarily interesting sequence regions. Applications of the methods span all time-scales of evolution. At one end they cover billions of years, to infer early branches in the Tree of Life. At the other they cover a matter of months, to describe the diversification of rapidly evolving viruses within infected hosts. This project has a marked impact on many fields across biomedical research. For example, it furnishes mathematical and statistical training in bioinformatics, a discipline that will play a prime role in discovery during the 21st century. Its rigorous phylo-alignment inference tools also deliver improved molecular comparative studies, a more accurate understanding of human evolution and new perspectives from which to battle infectious diseases.
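The central idea above is that the alignment does not have to be fixed before evolutionary parameters are estimated. A hidden Markov model can instead sum over every possible alignment. The Python below is a minimal sketch of that idea and is not the project's software or model. It implements the textbook three-state pair-HMM forward algorithm (Match, gap-in-y, gap-in-x, as in Durbin et al., *Biological Sequence Analysis*). Match emissions come from the Jukes-Cantor (JC69) substitution model with divergence `t`, measured in expected substitutions per site. The function names, the gap parameters `delta`, `epsilon` and `tau`, and the example sequences are hypothetical illustrations. The simple affine gaps here stand in for the richer indel process the abstract describes.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*vals):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(vals)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in vals))


def jc69_joint(a, b, t):
    """Joint probability pi_a * P(a -> b | t) under Jukes-Cantor (JC69)."""
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    return 0.25 * p  # uniform stationary frequency pi_a = 1/4


def pair_hmm_log_likelihood(x, y, t, delta=0.05, epsilon=0.4, tau=0.01):
    """log P(x, y | t) summed over ALL pairwise alignments (forward algorithm).

    Hypothetical parameters: delta = gap open, epsilon = gap extend,
    tau = end probability. Values are illustrative, not estimates.
    """
    n, m = len(x), len(y)
    lq = math.log(0.25)  # background probability of a character opposite a gap
    fM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in match column
    fX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i over a gap
    fY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with y_j over a gap
    fM[0][0] = 0.0  # begin state behaves like Match with probability 1
    l_mm = math.log(1 - 2 * delta - tau)  # Match -> Match
    l_gm = math.log(1 - epsilon - tau)    # gap -> Match
    l_d, l_e = math.log(delta), math.log(epsilon)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                fM[i][j] = math.log(jc69_joint(x[i - 1], y[j - 1], t)) + logsumexp(
                    l_mm + fM[i - 1][j - 1],
                    l_gm + fX[i - 1][j - 1],
                    l_gm + fY[i - 1][j - 1],
                )
            if i > 0:
                fX[i][j] = lq + logsumexp(l_d + fM[i - 1][j], l_e + fX[i - 1][j])
            if j > 0:
                fY[i][j] = lq + logsumexp(l_d + fM[i][j - 1], l_e + fY[i][j - 1])
    # Every alignment path ends with a transition to the End state (prob. tau).
    return math.log(tau) + logsumexp(fM[n][m], fX[n][m], fY[n][m])


if __name__ == "__main__":
    x, y = "ACGTTGCA", "ACGTGCA"  # hypothetical example sequences
    grid = [0.01 * k for k in range(1, 201)]
    # Estimate divergence t by grid search, integrating over alignments.
    t_hat = max(grid, key=lambda t: pair_hmm_log_likelihood(x, y, t))
    print(f"t maximizing the alignment-marginal likelihood: {t_hat:.2f}")
```

The forward recursion sums over alignments instead of maximizing over them. As a result, the estimate of `t` is not conditioned on any single, possibly poor, alignment. A Bayesian treatment would instead place priors on `t` and the indel parameters and sample them jointly.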

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM086887-05
Application #
8302280
Study Section
Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer
Eckstrand, Irene A
Project Start
2008-08-01
Project End
2014-07-31
Budget Start
2012-08-01
Budget End
2014-07-31
Support Year
5
Fiscal Year
2012
Total Cost
$295,220
Indirect Cost
$64,050
Name
University of California Los Angeles
Department
None
Type
Schools of Medicine
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095
Höhna, Sebastian; Heath, Tracy A; Boussau, Bastien et al. (2014) Probabilistic graphical model representation in phylogenetics. Syst Biol 63:753-71
Nunes, Marcio R T; Palacios, Gustavo; Faria, Nuno Rodrigues et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1-3 in Brazil. PLoS Negl Trop Dis 8:e2769
Heath, Tracy A; Huelsenbeck, John P; Stadler, Tanja (2014) The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc Natl Acad Sci U S A 111:E2957-66
Crawford, Forrest W; Minin, Vladimir N; Suchard, Marc A (2014) Estimation for general birth-death processes. J Am Stat Assoc 109:730-747
Bielejec, Filip; Lemey, Philippe; Carvalho, Luiz Max et al. (2014) πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios. BMC Bioinformatics 15:133
Cybis, Gabriela B; Sinsheimer, Janet S; Lemey, Philippe et al. (2013) Graph hierarchies for phylogeography. Philos Trans R Soc Lond B Biol Sci 368:20120206
Gill, Mandev S; Lemey, Philippe; Faria, Nuno R et al. (2013) Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30:713-24
Baele, Guy; Li, Wai Lok Sibon; Drummond, Alexei J et al. (2013) Accurate model selection of relaxed molecular clocks in bayesian phylogenetics. Mol Biol Evol 30:239-43
Crawford, Forrest W; Suchard, Marc A (2013) Diversity, disparity, and evolutionary rate estimation for unresolved Yule trees. Syst Biol 62:439-55
Landis, Michael J; Schraiber, Joshua G; Liang, Mason (2013) Phylogenetic analysis using Lévy processes: finding jumps in the evolution of continuous traits. Syst Biol 62:193-204

Showing the most recent 10 out of 46 publications