III: AF: Medium: Collaborative Research: Enabling Phylogenetic Inference for Modern Data Sets

Matsen, Frederick; Minin, Vladimir

Abstract

Many important subjects in biological and biomedical research require a robust means of phylogenetic tree inference: for models of viral transmission, for gene function inference, and for assessment of genetic diversity in the human microbiome, to name a few. These applications also depend on a rigorous means of assessing tree inference uncertainty; the Bayesian framework provides a principled means of assessing and integrating out this uncertainty. The currently available Bayesian algorithmic tools are not capable of performing inferences on large modern data sets, which also may be continually changing as new sequencing results become available. In particular, state-of-the-art methods are almost exclusively based on random-walk Markov chain Monte Carlo (MCMC) using uniformly selected local moves, even though most of these local moves will substantially worsen even a mediocre tree. Convergence problems with this approach are well documented, and thus current methods are limited to around 1000 sequences, a number much smaller than the size of microbial and immune data sets relevant to modern biomedicine. In addition, all current methods require inference to be started from scratch each time the sequence data changes. The broader impacts of this work will extend in three directions: enabling novel applications of Bayesian phylogenetics, stimulating new areas of computer science research, and attracting new talent to the field.

Applications of phylogenetics, in particular Bayesian phylogenetics, are being significantly held back by computational limitations. High-throughput sequencing technologies can return millions of sequences for studies of the human microbiome, viruses, oceanic microbes, and antibody-making B cells, but these cannot be handled with current methods. The models also need to be more realistic, without assumptions of independent interactions. Understanding the shape of multidimensional phylogenetic likelihood surfaces in detail might help to improve inference of the tree topology. The teams will also investigate when an optimal tree on a taxon set contains the optimal tree on a taxon subset. These investigations will help to expand the approach to phylogenetic inference. The resulting algorithmic insights will be incorporated into publicly available inference packages, with the goal of providing inference on an order of magnitude more taxa than currently possible.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2110182
Program Officer: Sylvia Spengler

Project Start
Project End
Budget Start: 2020-10-01
Budget End: 2021-06-30
Support Year
Fiscal Year: 2021
Total Cost: $122,180
Indirect Cost

Fred Hutchinson Cancer Research Center, Seattle, WA, United States
