Orthologous genes, or orthologs, are genes in different species that have evolved directly from a common ancestral gene. Genome-scale assignment of orthologs is a fundamental and challenging problem in computational biology, with a wide range of applications in comparative genomics and functional genomics. This project continues the development of the parsimony approach for assigning orthologs between closely related genomes, which essentially attempts to transform one genome into another using the smallest number of genome rearrangement events (reversal, translocation, fusion, and fission) together with gene duplication events. The project addresses three key algorithmic problems: (i) signed reversal distance with duplicates, (ii) signed transposition distance with duplicates, and (iii) minimum common string partition. Efficient solutions to these problems are combined and incorporated into a software system for ortholog assignment called MSOAR. The project encompasses a genome-wide analysis of orthologous (and paralogous) relationships between the human and mouse genomes, both to validate the approach and, more importantly, to address several important questions in evolutionary biology: the characterization of gains and losses of duplicated genes in the two genomes, the elucidation of gene movements in one genome with respect to the other, and the quantification of the different mechanisms of gene duplication.
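To make problem (i) concrete, here is a minimal sketch of what "signed reversal distance with duplicates" measures. It assumes a genome is a single chromosome written as a sequence of signed integers, where the integer names a gene family (repeated values are duplicated genes) and the sign gives the strand, and that both genomes contain the same multiset of gene families. The helper names `reverse_segment` and `reversal_distance` are hypothetical. The breadth-first search is a brute-force illustration of the definition only, not the MSOAR algorithm, and is practical only for toy inputs.

```python
from collections import deque
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a genome is a tuple of signed gene-family IDs.
Genome = Tuple[int, ...]


def reverse_segment(genome: Genome, i: int, j: int) -> Genome:
    """Apply a signed reversal to positions i..j (inclusive): the segment's
    order is reversed and every gene in it switches strand (sign flip)."""
    segment = tuple(-g for g in reversed(genome[i:j + 1]))
    return genome[:i] + segment + genome[j + 1:]


def reversal_distance(source: Sequence[int], target: Sequence[int]) -> Optional[int]:
    """Minimum number of signed reversals turning source into target.

    Brute-force breadth-first search over all reachable gene orders, so the
    cost is exponential in genome length. Returns None when the two genomes
    do not contain the same multiset of gene families (unbalanced input).
    """
    start, goal = tuple(source), tuple(target)
    if sorted(abs(g) for g in start) != sorted(abs(g) for g in goal):
        return None
    seen = {start}
    queue = deque([(start, 0)])
    n = len(start)
    while queue:
        genome, dist = queue.popleft()
        if genome == goal:
            return dist
        # Try every possible reversal (every contiguous segment i..j).
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(genome, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    # Gene family 1 occurs twice; one reversal of the middle block suffices.
    print(reversal_distance([1, 2, 3, 1], [1, -3, -2, 1]))
```

The sketch models only reversals on one chromosome; translocations, fusions, fissions, and the choice of which duplicate copies correspond to each other (the part that makes ortholog assignment hard) are deliberately left out.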

Intellectual merit.

The parsimony approach provides a novel method for genome-wide ortholog assignment that takes into account both gene sequences and gene locations. The algorithmic problems above are new to the literature, and their solutions will likely require novel algorithm design and analysis techniques. The questions concerning gene duplication, and the quantification of duplication mechanisms in model species, are of fundamental importance in evolutionary biology.

Broader impact.

As ortholog assignment is a fundamental problem in comparative genomics and has become routine practice in almost all areas of genomics, MSOAR is expected to find a wide range of applications in biology and genomics. Moreover, the research will provide training opportunities for two computer science graduate students in the interdisciplinary field of computational biology.

Information concerning this NSF project will be provided at the website: http://msoar.cs.ucr.edu/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0711129
Program Officer: Vasant G. Honavar
Budget Start: 2007-09-15
Budget End: 2011-08-31
Fiscal Year: 2007
Total Cost: $260,000
Organization: University of California Riverside
City: Riverside
State: CA
Country: United States
Zip Code: 92521