Bayesian Joint Estimation of Alignment and Phylogeny

Suchard, Marc

Abstract

Phylogenetic reconstruction is an invaluable tool for studying molecular sequences. Starting from a description of how the characters in the sequences mutate over time, the methods attempt to uncover the sequences' relatedness. Common applications range from describing the evolutionary histories of living organisms in evolutionary biology to estimating genetic distances and constructing protein families in molecular biology and bioinformatics. Standard reconstruction methods rely on sequence alignments that specify which characters in the sequences are homologous, deriving from common ancestors. A fundamental difficulty is that sequence alignments are not directly observed; they are inferred properties of the raw sequence data and must be estimated along with the phylogeny. Current tools handle this inference sequentially, first determining a sometimes poor estimate of the alignment and then conditioning on the truth of that alignment to reconstruct the phylogeny. This project provides practical tools for end-users to simultaneously infer alignment and phylogeny, side-stepping biases that sequential estimation introduces. The tools assume both a character substitution model and an insertion/deletion (indel) process through which characters are added or removed, generating an alignment. Further, these indels supply previously under-utilized information from the data to infer phylogenies. Major advances make this phylo-alignment framework useful for real-life datasets. The framework draws heavily on hidden Markov models, Bayesian computation and clever parameter integration to produce a computationally efficient inference engine. Expert prior knowledge helps inform the indel process. From this, realistic priors enable Bayes factor tests to address whether specific indels are shared by descent or are homoplastic, reducing controversy over their value in phylogenetics. Modeling assumptions better reflect the underlying biology.
Allowing spatial variation in the indel process provides more accurate phylogenies and alignments. The extensions also provide for heterogeneity tests to identify evolutionarily interesting sequence regions. Examples of the methods span all time-scales of evolution, from billions of years, to infer early branches in the Tree of Life, to a matter of months, to describe the diversification of rapidly evolving viruses within infected hosts. This project markedly impacts many fields across biomedical research. For example, the project furnishes mathematical and statistical training in bioinformatics, which will play a prime role in discovery during the 21st century, and rigorous inference tools employing phylo-alignment deliver improved molecular, comparative studies, a more accurate understanding of human evolution and new perspectives from which to battle infectious diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM086887-04
Application #: 8116012
Study Section: Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer: Eckstrand, Irene A

Project Start: 2008-08-01
Project End: 2013-07-31
Budget Start: 2011-08-01
Budget End: 2012-07-31
Support Year: 4
Fiscal Year: 2011
Total Cost: $295,390
Indirect Cost

Institution

Name: University of California Los Angeles
Department
Type: Schools of Medicine
DUNS #: 092530369

City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Related projects


NIH 2012 R01 GM	Bayesian Joint Estimation of Alignment and Phylogeny Suchard, Marc A. / University of California Los Angeles	$295,220
NIH 2011 R01 GM	Bayesian Joint Estimation of Alignment and Phylogeny Suchard, Marc A. / University of California Los Angeles	$295,390
NIH 2010 R01 GM	Bayesian Joint Estimation of Alignment and Phylogeny Suchard, Marc A. / University of California Los Angeles	$298,325
NIH 2009 R01 GM	Bayesian Joint Estimation of Alignment and Phylogeny Suchard, Marc A. / University of California Los Angeles	$298,715
NIH 2008 R01 GM	Bayesian Joint Estimation of Alignment and Phylogeny Suchard, Marc A. / University of California Los Angeles	$303,219

Publications

Gilbert, Princess S; Wu, Jing; Simon, Margaret W et al. (2018) Filtering nucleotide sites by phylogenetic signal to noise ratio increases confidence in the Neoaves phylogeny generated from ultraconserved elements. Mol Phylogenet Evol 126:116-128

vonHoldt, Bridgett M; Shuldiner, Emily; Koch, Ilana Janowitz et al. (2017) Structural variants in genes associated with human Williams-Beuren syndrome underlie stereotypical hypersociability in domestic dogs. Sci Adv 3:e1700398

Lake, James A; Larsen, Joseph; Sarna, Brooke et al. (2015) Rings Reconcile Genotypic and Phenotypic Evolution within the Proteobacteria. Genome Biol Evol 7:3434-42

Vrancken, Bram; Baele, Guy; Vandamme, Anne-Mieke et al. (2015) Disentangling the impact of within-host evolution and transmission dynamics on the tempo of HIV-1 evolution. AIDS 29:1549-56

Nunes, Marcio R T; Palacios, Gustavo; Faria, Nuno Rodrigues et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1-3 in Brazil. PLoS Negl Trop Dis 8:e2769

Crawford, Forrest W; Minin, Vladimir N; Suchard, Marc A (2014) Estimation for general birth-death processes. J Am Stat Assoc 109:730-747

Heath, Tracy A; Huelsenbeck, John P; Stadler, Tanja (2014) The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc Natl Acad Sci U S A 111:E2957-66

Bielejec, Filip; Lemey, Philippe; Carvalho, Luiz Max et al. (2014) πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios. BMC Bioinformatics 15:133

Höhna, Sebastian; Heath, Tracy A; Boussau, Bastien et al. (2014) Probabilistic graphical model representation in phylogenetics. Syst Biol 63:753-71

Doss, Charles R; Suchard, Marc A; Holmes, Ian et al. (2013) Fitting Birth-Death Processes to Panel Data with Applications to Bacterial DNA Fingerprinting. Ann Appl Stat 7:2315-2335

Showing the most recent 10 out of 65 publications
