The proposed project addresses the problem of reconstructing ancestral genomes and the evolutionary history of genomes that may differ in gene content, having evolved through genome rearrangements as well as gene duplications and deletions. It will close the gap between the steadily growing number of sequenced genomes and the inability of existing phylogenetic reconstruction tools to process a diverse variety of genomes.
The new PRUGC software will employ the framework of multiple breakpoint graphs, which will be extended to address new algorithmic challenges arising from genomes with unequal gene content. Because some of these challenges may be hard and have no computationally feasible solutions, the PI will keep the evolutionary model flexible and problem-driven rather than fixing a model in advance and attempting to fit biological problems into it. In the course of PRUGC development, the PI plans to address the following particularly important biological problems (listed in order of increasing complexity): the primate-rodent-carnivore split controversy in mammalian evolution (featuring a relatively small number of duplications); phylogenetic analysis of a diverse variety of yeast genomes, including genomes that have undergone whole-genome duplications; and problems in plant evolution, which is rich in segmental duplications. Solutions to these problems will help to better understand the mechanisms behind chromosome evolution across a variety of genomes. The reconstructed ancestral genomes will provide insights into the functional significance of particular gene orders, help to rigorously estimate the rates of genome rearrangements and gene duplications/deletions in different organisms, and allow testing of hypotheses about their mechanisms and their influence on shaping genomic architectures. The PRUGC software will have a wide range of applications beyond the aforementioned problems. It will be helpful in various phylogenomic studies within projects such as "Tree of Life", "Genome 10K", and "i5k". It will be released both as a standalone open-source tool and as a web-server application readily accessible to biologists.
The project will support research activities in the PI's research lab. In particular, it will help to prepare a new generation of researchers in bioinformatics by providing opportunities for hands-on experience in both computer science and biology. One undergraduate student and two Ph.D. students will be recruited with the support of this project, and the PI will mentor these students and prepare them for careers in academia or industry. The PI will make every effort to help the students gain first-hand experience in biology, including through short-term visits to the PI's local, national, and international collaborators. Such experience will also help the students develop and enhance their ability to communicate with researchers in other areas, an important skill in interdisciplinary research.
The project will also offer an excellent opportunity for computer science students to learn about experimental and theoretical research in the interdisciplinary area of bioinformatics. A detailed explanation of the whole process of multiple genome comparison will fit well within the timeframe of a bioinformatics course. The PI plans to teach this material in the bioinformatics course CSCE 555, offered to undergraduate and graduate students at the University of South Carolina. As a member of the Bioinformatics Education Alliance, which is developing the "Bioinformatics for Biologists" (B4B) textbook, the PI has coordinated with the editors on the preparation of a new chapter module and web-based educational materials for the next edition of B4B. These will expand the current chapter on genome rearrangements and illustrate their applications with a number of biological problems from the PRUGC project.