The software MCSCAN, used to align multiple genomes, will be enhanced to help decipher the structure and evolutionary trajectories of eukaryotic genomes and genes, in particular by addressing the consequences of recursive whole-genome duplications. Burgeoning sets of eukaryotic genome sequences provide the foundation for a new spectrum of investigations into the functional and evolutionary consequences of gene and genome duplication, as well as the means to clarify relationships among particular genes. The current software can align only small numbers of genomes, and it does not readily deconvolute the layers of duplicated blocks produced by different genome duplication events; it therefore fails to provide information crucial to understanding the evolutionary trajectories of genomes and gene families. The enhanced software will mitigate these limitations.
The enhanced software will help researchers reconstruct the evolutionary trajectories of genomes and gene families, including the singularly challenging genomes of angiosperms and other taxa that have experienced polyploidization events. In particular, multiple alignment of an expanded number of genomes will be preceded by a multi-way comparison of homologous regions at the DNA level, which will provide a comprehensive view of the layers of homology produced by different duplication events. To reflect the evolutionary trajectories of structural changes, genomes will be input in a stepwise manner, starting with those of the simplest structure. The resulting multiple alignment will depict evolutionary relationships between chromosomal regions from diverse genomes much more accurately, and will be easy for users to visualize and understand. The core of the software will be implemented in C++, while the visualization module will be developed in Python. The multiple and pairwise alignment information will be stored in MySQL or SQLite databases. The software will be adapted to run under multiple operating systems, including MS Windows, UNIX, and Linux. An online service will be developed using the Django Web framework and jQuery (a concise JavaScript library) and added to our NSF-supported PGDD. The software will consist of several independent modules, which other researchers can use freely. A web server, to be constructed alongside the software, will show figures illustrating genome structures, comparisons between different plants, and evolutionary changes inferred to have occurred over millions of years. These intuitive visual resources will benefit researchers seeking to understand the evolution of plants, as well as elementary and middle school students and readers at local libraries. The program will regularly host visitors from other institutions, countries, and the public.
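As an illustration of the planned storage of pairwise alignment information in SQLite, the following is a minimal sketch only. The table names (`pairwise_block`, `anchor_pair`), their columns, and the helper functions are hypothetical and are not taken from MCSCAN or PGDD; the sketch assumes that each pairwise alignment is recorded as a block of matched gene pairs between two chromosomes.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: aligned blocks and their gene pairs."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS pairwise_block (
            block_id INTEGER PRIMARY KEY,
            genome_a TEXT NOT NULL,
            genome_b TEXT NOT NULL,
            chrom_a  TEXT NOT NULL,
            chrom_b  TEXT NOT NULL,
            score    REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS anchor_pair (
            block_id INTEGER NOT NULL REFERENCES pairwise_block(block_id),
            gene_a   TEXT NOT NULL,
            gene_b   TEXT NOT NULL
        );
        """
    )


def add_block(conn, genome_a, genome_b, chrom_a, chrom_b, score, anchors):
    """Store one aligned block and its (gene_a, gene_b) pairs; return its id."""
    cur = conn.execute(
        "INSERT INTO pairwise_block (genome_a, genome_b, chrom_a, chrom_b, score) "
        "VALUES (?, ?, ?, ?, ?)",
        (genome_a, genome_b, chrom_a, chrom_b, score),
    )
    block_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO anchor_pair (block_id, gene_a, gene_b) VALUES (?, ?, ?)",
        [(block_id, gene_a, gene_b) for gene_a, gene_b in anchors],
    )
    conn.commit()
    return block_id


def blocks_between(conn, genome_a, genome_b):
    """List (block_id, chrom_a, chrom_b, number of gene pairs) for a genome pair."""
    rows = conn.execute(
        "SELECT b.block_id, b.chrom_a, b.chrom_b, COUNT(a.gene_a) "
        "FROM pairwise_block AS b "
        "LEFT JOIN anchor_pair AS a ON a.block_id = b.block_id "
        "WHERE b.genome_a = ? AND b.genome_b = ? "
        "GROUP BY b.block_id ORDER BY b.block_id",
        (genome_a, genome_b),
    )
    return rows.fetchall()


if __name__ == "__main__":
    # Placeholder identifiers only; these are not real genomes or genes.
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_block(conn, "genomeA", "genomeB", "chr1", "chr2", 100.0,
              [("geneA1", "geneB1"), ("geneA2", "geneB2")])
    print(blocks_between(conn, "genomeA", "genomeB"))
```

Keeping blocks and gene pairs in separate tables is one possible design; it would let a visualization module list the blocks between two genomes without loading every gene pair.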
The enhanced software and related results in genomic analysis will be reported at academic conferences.