The purpose of this proposal is to re-engineer the source code of the Molecular Evolutionary Genetics Analysis software (MEGA). Over the past 13 years, MEGA has been cited in more than 6,000 research publications spanning a diverse range of biological research disciplines, and it has been downloaded by more than 50,000 unique individuals, including students, educators, and investigators from non-profit and commercial institutions. For MEGA to continue to meet the needs of genomics researchers at the forefront of experimental design, informatics, and discovery, we must address several major technical issues affecting its future prospects. In particular, we plan to address the issues that most significantly affect MEGA's ability to remain usable and to become extensible in response to the rapidly changing face of large-scale comparative genomics analysis. We therefore aim to refactor MEGA's source code in order to document the vast body of code behind the software and to modularize MEGA's computational core as an enabling step toward further re-engineering goals. A modular computational core enables the implementation of a plug-in extensibility mechanism that empowers the user community to evolve and enhance MEGA for their increasingly diverse and rapidly changing needs. It also enables the implementation of an application scripting interface, which can extend MEGA's capabilities to the iterative, multi-dataset analysis protocols that are becoming commonplace in large-scale sequence analysis endeavors. The complexity of large-scale sequence analyses often requires the coordinated use of numerous bioinformatics software tools. Recognizing this need, we plan to augment MEGA's ability to interoperate with diverse data file formats, including integrated support for popular as well as standardized input/output file formats.
During MEGA's long development history, the capabilities of the average consumer-grade computer workstation have evolved significantly. We plan to integrate into MEGA 64-bit computing and SIMD technology, the latter initially developed for computationally intensive multimedia applications, to greatly enhance its performance for large-scale analysis. These core software enhancements will contribute to the longevity of MEGA for use in molecular evolution, bioinformatics, functional genomics, computational biology, and basic biomedicine applications.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-E (51))
Program Officer: Portnoy, Matthew
Arizona State University-Tempe Campus
Other Basic Sciences
Schools of Arts and Sciences
United States
Kumar, Sudhir; Stecher, Glen; Peterson, Daniel et al. (2012) MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics 28:2685-6
Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731-9
Kumar, Sudhir; Nei, Masatoshi; Dudley, Joel et al. (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 9:299-306