In this project, modeling and analysis techniques from probability theory will be used to study several important computational problems in the area of phylogenomics, i.e., the integration of genome analysis and systematic studies. Various mechanisms such as hybridization events, lateral gene transfers, gene duplications and losses, and incomplete lineage sorting commonly lead to incongruences between inferred gene genealogies. As a result, one is led to consider forests of gene histories as well as more complex network representations of the evolutionary history of life. The main goals of the research are to improve large-scale likelihood-based gene tree estimation, develop computational methods to assemble species phylogenies from gene histories, and detect network-like signal in molecular data. Drawing on a combination of ideas from discrete probability, algorithms, and mathematical statistics, novel methodologies will be developed that are both statistically accurate and computationally efficient for these challenging inference problems.

Biologists face major statistical and computational challenges in modeling, analyzing, and interpreting the massive genetic datasets produced by next-generation technologies, including genomic variation within populations, whole genomes from multiple species, and environmental samples. In particular, high-throughput sequencing is transforming the reconstruction of the Tree of Life, a fundamental problem in biology that provides insights into evolution, adaptation, and speciation. Through the development, implementation, and broad dissemination of new practical algorithms for phylogenomic studies based on mathematical analysis, this project will help advance the state of knowledge in evolutionary biology and contribute to the many societal benefits of phylogenetic research. Integration of research and education is a major component of this proposal. In addition to providing training for graduate students and postdoctoral researchers, the project will develop new undergraduate and graduate courses and offer research experiences for undergraduates.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1149312
Program Officer: Junping Wang
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2018-08-31
Support Year:
Fiscal Year: 2011
Total Cost: $444,405
Indirect Cost:
Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715