Understanding the evolutionary relationships between organisms is fundamental to a wide variety of problems in biology. This project investigates and develops new methods for inferring species relationships from genetic data, using probabilistic models of gene trees conditional on a species tree. Its main goals are (1) to advance the mathematical understanding of these models, with a view toward species tree inference; (2) to develop improved methods for species tree inference by considering new and underutilized data types derived from gene trees, including clades, splits, unrooted gene trees, and ranked gene trees; (3) to validate theoretical, computational, and statistical properties of these new methods; and (4) to produce software for use by empirical biologists. The project will identify gene tree summary statistics on which accurate inference can be based, and will employ these statistics to develop practical methods that can be used in the presence of missing data and under violations of model assumptions. The mathematical, statistical, and computational properties of both new and current methods will be studied to enable comparisons that can guide empirical applications. The model-based, probabilistic approach of this work provides a foundation for enhancing species tree inference from gene tree samples, and thus from genetic sequence data. The project addresses a promising methodological middle ground between computationally intensive full likelihood and Bayesian analyses, which are often infeasible for genomic-scale data sets, and tractable combinatorial methods, which often lack desirable statistical behaviors. The work will advance phylogenetic analysis by deepening knowledge of probabilistic models of gene tree discordance through analysis of the behavior of summary statistics. It will improve the practice of species tree inference by introducing new statistically consistent approaches and by developing theoretical and experimental understanding of the robustness of methods. Further, its use of mathematical techniques from probability, combinatorics, and algebraic statistics, as well as computational experiments employing simulation, will enhance mathematical evolutionary biology more generally.

Public Health Relevance

Inference of species relationships from genetic data is an essential component of biomedical science, for such disparate purposes as providing evolutionary insights, comparing model organisms, and understanding variation in pathogen strains. This project addresses the challenges of estimating species trees from large genomic data sets by providing new theoretical and practical tools.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM117590-01
Application #
9037795
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Janes, Daniel E
Project Start
2015-08-01
Project End
2019-04-30
Budget Start
2015-08-01
Budget End
2016-04-30
Support Year
1
Fiscal Year
2015
Total Cost
Indirect Cost
Name
University of Alaska Fairbanks
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
615245164
City
Fairbanks
State
AK
Country
United States
Zip Code
99775
Baños, Hector (2018) Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol :
Mehta, Rohan S; Rosenberg, Noah A (2018) The probability of reciprocal monophyly of gene lineages in three and four species. Theor Popul Biol :
Degnan, James H (2018) Modeling Hybridization Under the Network Multispecies Coalescent. Syst Biol 67:786-799
Mitchell, Jonathan D; Sumner, Jeremy G; Holland, Barbara R (2018) Distinguishing Between Convergent Evolution and Violation of the Molecular Clock for Three Taxa. Syst Biol 67:905-915
Arbisser, Ilana M; Jewett, Ethan M; Rosenberg, Noah A (2018) On the joint distribution of tree height and tree length under the coalescent. Theor Popul Biol 122:46-56
Allman, Elizabeth S; Degnan, James H; Rhodes, John A (2018) Split Probabilities and Species Tree Inference Under the Multispecies Coalescent Model. Bull Math Biol 80:64-103
Allman, Elizabeth S; Degnan, James H; Rhodes, John A (2018) Species Tree Inference from Gene Splits by Unrooted STAR Methods. IEEE/ACM Trans Comput Biol Bioinform 15:337-342
Wrobel, Tomasz P; Bhargava, Rohit (2018) Infrared Spectroscopic Imaging Advances as an Analytical Technology for Biomedical Sciences. Anal Chem 90:1444-1463
Disanto, Filippo; Rosenberg, Noah A (2017) Enumeration of Ancestral Configurations for Matching Gene Trees and Species Trees. J Comput Biol 24:831-850
Kamneva, Olga K; Syring, John; Liston, Aaron et al. (2017) Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing. BMC Evol Biol 17:180

Showing the most recent 10 out of 20 publications