Accurately estimating the timing and mode of gene duplications along the evolutionary history of species can provide invaluable information about the mechanisms by which the genomes of organisms evolved and genes with novel functions arose. The major challenge is the lack of an appropriate modeling framework for gene family evolution. To address this challenge, the investigators develop a probabilistic model for gene family evolution that describes the relationship between the history of a gene family and the history of species while taking into account the effects of different evolutionary mechanisms of gene retention (neofunctionalization, subfunctionalization, and dosage balance). The model involves two stochastic processes: a non-homogeneous birth and death process and a mutation process. The probability distribution of a gene family tree given the species tree is derived from the non-homogeneous birth and death process, in which a hazard function models the loss rate associated with each evolutionary mechanism. The probabilistic model provides a systematic way of handling the estimation error of gene family trees when analyzing gene family data and can therefore estimate gene duplication and loss events along the phylogeny of species more accurately.
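As a rough illustration of how a hazard function can drive a time-dependent loss rate, the sketch below computes the probability that a duplicate gene is still retained at a given age. The Weibull form of the hazard, the parameter names `scale` and `shape`, and all function names are assumptions made for this example; the abstract does not specify the hazard function used in the project.

```python
import math

def weibull_hazard(t, scale, shape):
    """Assumed loss rate h(t) = (shape / scale) * (t / scale)**(shape - 1).

    shape < 1: loss rate declines with the age of the duplicate.
    shape = 1: constant loss rate (homogeneous special case).
    shape > 1: loss rate increases with age.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)

def survival_probability(t, scale, shape):
    """Closed form of exp(-integral_0^t h(s) ds) for the Weibull hazard."""
    return math.exp(-((t / scale) ** shape))

def numeric_survival(t, hazard, steps=10000):
    """Retention probability for an arbitrary hazard, via the midpoint rule."""
    dt = t / steps
    cumulative = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative)

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the three regimes.
    for shape in (0.5, 1.0, 2.0):
        p = survival_probability(1.0, scale=1.0, shape=shape)
        print(f"shape={shape}: P(retained at t=1) = {p:.4f}")
```

`numeric_survival` accepts any hazard function, so a different functional form can be substituted without changing the rest of the sketch.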

Advanced biotechnologies provide a vast amount of genetic data for studies of gene family evolution. The main goal of the project is to develop a probabilistic model for gene family evolution that can be used to analyze gene family data effectively. Gene duplication is the major source of evolutionary novelty and plays an important role in gene family evolution. This project will therefore significantly advance our understanding of the process and biological consequences of gene duplication. Because gene duplication has played a pivotal role in evolution, the outcomes of the project will have significant impacts on, and important applications in, a variety of fields closely related to human health and medicine, including evolutionary biology, phylogenetics, and cancer genetics.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1222745
Program Officer: Mary Ann Horn
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $172,888
Indirect Cost:
Name: University of Georgia
Department:
Type:
DUNS #:
City: Athens
State: GA
Country: United States
Zip Code: 30602