This project will develop new combinatorial and probabilistic algorithms to unravel the interwoven large-scale genomic changes that have occurred across species during evolution. The main research thrust is to develop new ancestral genome reconstruction algorithms that handle rearrangements, duplications, and large insertions and deletions at different resolutions in a single unified framework. These methods will be applied to the large volume of whole-genome sequence data that has become available, in order to elucidate the detailed history of large-scale genomic operations in mammalian genomes. With the reconstructed history, scientists will be able to explain the large-scale genomic changes and assess their phenotypic impact on any lineage, including the human lineage.

The PI approaches the problem of reconstructing ancestral genomes using a parsimony principle based on breakpoint graphs that are consistent with present-day genomes. When more than two genomes are considered, nearly all of these problems are computationally intractable. The approach focuses on building better synteny blocks between genomes and using these blocks in a hierarchical method to develop new reconstruction algorithms, which are then refined to handle smaller blocks and to deal with incomplete lineage sorting.
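
As a rough illustration of the kind of quantity a breakpoint-based parsimony criterion works with, the sketch below counts breakpoints between two genomes written as signed orders of shared synteny blocks. It is not the project's reconstruction algorithm: the function names `adjacencies` and `breakpoint_count`, the single linear chromosome, and the assumption that both genomes contain exactly the same blocks once each are all simplifications made here for illustration.

```python
def adjacencies(genome):
    """Return the adjacencies of a linear signed block order.

    Assumption: `genome` is a list of nonzero signed integers, one per
    synteny block, with the sign giving the block's orientation. The order
    is framed with 0 and n + 1 so that the chromosome ends are counted too.
    """
    n = len(genome)
    extended = [0] + list(genome) + [n + 1]
    adj = set()
    for a, b in zip(extended, extended[1:]):
        adj.add((a, b))
        # The same adjacency read from the opposite strand.
        adj.add((-b, -a))
    return adj


def breakpoint_count(genome_a, genome_b):
    """Count adjacencies of genome_a that are not present in genome_b."""
    adj_b = adjacencies(genome_b)
    extended = [0] + list(genome_a) + [len(genome_a) + 1]
    return sum(
        1 for a, b in zip(extended, extended[1:]) if (a, b) not in adj_b
    )


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4]
    # Hypothetical descendant: blocks 2 and 3 inverted as one segment.
    descendant = [1, -3, -2, 4]
    print(breakpoint_count(ancestor, descendant))
```

A single inversion disrupts at most two adjacencies, which is why breakpoint counts are a natural starting point for parsimony arguments. Extending such measures to three or more genomes is where the intractability mentioned above arises.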

These new software tools and resources will help shed new light on the extraordinary diversity of mammalian forms and capabilities. In addition, insights from this project will be applied to improve genome assembly methodologies based on next-generation high-throughput DNA sequencing reads. The models and algorithms will also be used to investigate specific genomic regions influenced by large-scale genomic changes, such as complex gene clusters and regions that harbor genome instability in cancer genomes. The project will develop open-source software tools for comparative genomics research, making them accessible to scientists around the world, and its outcomes will be disseminated through a project website. Visualization tools from the research will support public education on genome evolution and make the scientific results more accessible to the general public.

The education components of the PI's CAREER plan are closely integrated with the research program. The educational objectives include developing new bioinformatics courses; training graduate students in the interdisciplinary expertise needed for the post-genomic era and providing them with meaningful international research experience through collaboration; involving undergraduate students in research projects; and participating in the G.A.M.E.S. camp at the University of Illinois to inspire pre-college girls to pursue careers in science and engineering.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Sylvia J. Spengler
University of Illinois Urbana-Champaign
United States