Phylogenetic reconstruction is an invaluable tool for studying molecular sequences. Starting from a description of how the characters in the sequences mutate over time, these methods attempt to uncover the sequences' relatedness. Common applications range from describing the evolutionary histories of living organisms in evolutionary biology to estimating genetic distances and constructing protein families in molecular biology and bioinformatics. Standard reconstruction methods rely on sequence alignments that specify which characters in the sequences are homologous, that is, derived from common ancestors. A fundamental difficulty is that sequence alignments are not directly observed; they are inferred properties of the raw sequence data and must be estimated along with the phylogeny. Current tools handle this inference sequentially, first determining a sometimes poor estimate of the alignment and then conditioning on that alignment as if it were true to reconstruct the phylogeny. This project provides practical tools for end-users to infer alignment and phylogeny simultaneously, side-stepping the biases that sequential estimation introduces. The tools assume both a character substitution model and an insertion/deletion (indel) process through which characters are added or removed, generating an alignment. Further, these indels supply previously under-utilized information from the data for inferring phylogenies. Major advances make this phylo-alignment framework useful for real-life datasets. The framework draws heavily on hidden Markov models, Bayesian computation and clever parameter integration to produce a computationally efficient inference engine. Expert prior knowledge helps inform the indel process. From this, realistic priors enable Bayes factor tests to address whether specific indels are shared by descent or are homoplastic, reducing controversy over their value in phylogenetics. Modeling assumptions better reflect the underlying biology. Allowing spatial variation in the indel process provides more accurate phylogenies and alignments. The extensions also provide for heterogeneity tests to identify evolutionarily interesting sequence regions. Examples of the methods span all time-scales of evolution, from billions of years, to infer early branches in the Tree of Life, to a matter of months, to describe the diversification of rapidly evolving viruses within infected hosts. This project markedly impacts many fields across biomedical research. For example, the project furnishes mathematical and statistical training in bioinformatics, which will play a prime role in discovery during the 21st century, and rigorous inference tools employing phylo-alignment deliver improved molecular and comparative studies, a more accurate understanding of human evolution and new perspectives from which to battle infectious diseases.
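The abstract's central idea, treating the alignment as an unobserved quantity to be integrated out rather than fixed in advance, can be illustrated on the smallest possible case. The sketch below is not the project's software or model: it is a textbook-style pairwise hidden Markov model with match, insert and delete states, and every name and parameter value in it (pair_hmm_likelihood, delta, epsilon, tau, match) is a hypothetical choice made for illustration. Passing combine=sum gives the forward algorithm, which sums over every possible alignment of two sequences; passing combine=max scores only the single best alignment, which is roughly what sequential estimation conditions on.

```python
def pair_hmm_likelihood(x, y, delta=0.1, epsilon=0.3, tau=0.05,
                        match=0.85, combine=sum):
    """Probability of sequences x and y under a toy pairwise HMM.

    All parameters are illustrative assumptions, not values from the project:
      delta   - probability of opening a gap from the match state
      epsilon - probability of extending an existing gap
      tau     - probability of ending the alignment
      match   - probability that an aligned pair shows the same character
    combine=sum marginalizes over all alignments (forward algorithm);
    combine=max keeps only the best alignment (Viterbi probability).
    """
    alphabet = "ACGT"
    q = 1.0 / len(alphabet)  # uniform emission for an unaligned character

    def p(a, b):
        # Joint emission probability of an aligned pair; sums to 1 over all pairs.
        return q * (match if a == b else (1.0 - match) / (len(alphabet) - 1))

    n, m = len(x), len(y)
    # M: x[i] aligned to y[j]; X: x[i] against a gap; Y: y[j] against a gap.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # the start state behaves like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                M[i][j] = p(x[i - 1], y[j - 1]) * combine([
                    (1 - 2 * delta - tau) * M[i - 1][j - 1],
                    (1 - epsilon - tau) * X[i - 1][j - 1],
                    (1 - epsilon - tau) * Y[i - 1][j - 1],
                ])
            if i > 0:
                X[i][j] = q * combine([delta * M[i - 1][j],
                                       epsilon * X[i - 1][j]])
            if j > 0:
                Y[i][j] = q * combine([delta * M[i][j - 1],
                                       epsilon * Y[i][j - 1]])

    return tau * combine([M[n][m], X[n][m], Y[n][m]])


if __name__ == "__main__":
    total = pair_hmm_likelihood("ACGT", "AGT")               # all alignments
    best = pair_hmm_likelihood("ACGT", "AGT", combine=max)   # best alignment only
    print(f"summed over alignments: {total:.3e}")
    print(f"best single alignment:  {best:.3e}")
    print(f"share carried by the best alignment: {best / total:.2f}")
```

The share printed on the last line shows how much of the total probability a single alignment accounts for in this toy setting; the remainder is the alignment uncertainty that conditioning on one estimate ignores. The framework described in the abstract extends this kind of marginalization to many sequences related by a tree, which this pairwise sketch does not attempt.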

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM086887-04
Application #: 8116012
Study Section: Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer: Eckstrand, Irene A
Project Start: 2008-08-01
Project End: 2013-07-31
Budget Start: 2011-08-01
Budget End: 2012-07-31
Support Year: 4
Fiscal Year: 2011
Total Cost: $295,390
Indirect Cost:
Name: University of California Los Angeles
Department:
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Gilbert, Princess S; Wu, Jing; Simon, Margaret W et al. (2018) Filtering nucleotide sites by phylogenetic signal to noise ratio increases confidence in the Neoaves phylogeny generated from ultraconserved elements. Mol Phylogenet Evol 126:116-128
vonHoldt, Bridgett M; Shuldiner, Emily; Koch, Ilana Janowitz et al. (2017) Structural variants in genes associated with human Williams-Beuren syndrome underlie stereotypical hypersociability in domestic dogs. Sci Adv 3:e1700398
Lake, James A; Larsen, Joseph; Sarna, Brooke et al. (2015) Rings Reconcile Genotypic and Phenotypic Evolution within the Proteobacteria. Genome Biol Evol 7:3434-42
Vrancken, Bram; Baele, Guy; Vandamme, Anne-Mieke et al. (2015) Disentangling the impact of within-host evolution and transmission dynamics on the tempo of HIV-1 evolution. AIDS 29:1549-56
Höhna, Sebastian; Heath, Tracy A; Boussau, Bastien et al. (2014) Probabilistic graphical model representation in phylogenetics. Syst Biol 63:753-71
Nunes, Marcio R T; Palacios, Gustavo; Faria, Nuno Rodrigues et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1-3 in Brazil. PLoS Negl Trop Dis 8:e2769
Crawford, Forrest W; Minin, Vladimir N; Suchard, Marc A (2014) Estimation for general birth-death processes. J Am Stat Assoc 109:730-747
Heath, Tracy A; Huelsenbeck, John P; Stadler, Tanja (2014) The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc Natl Acad Sci U S A 111:E2957-66
Bielejec, Filip; Lemey, Philippe; Carvalho, Luiz Max et al. (2014) πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios. BMC Bioinformatics 15:133
Doss, Charles R; Suchard, Marc A; Holmes, Ian et al. (2013) Fitting Birth-Death Processes to Panel Data with Applications to Bacterial DNA Fingerprinting. Ann Appl Stat 7:2315-2335

Showing the most recent 10 out of 65 publications