Comparative analysis of DNA and amino acid sequences is now routinely employed in tracing the origins, patterns, and evolutionary relationships of homologous sequences. Owing to recent advances in DNA sequencing technologies, datasets now contain increasingly large numbers of sequences that make up families of orthologous (arisen by speciation) and/or paralogous (arisen by gene duplication) sequences from diverse species. The need for biologist-centric tools for evolutionary and functional genomics analysis of these data is therefore growing. We propose to address this need by expanding the scope of the Molecular Evolutionary Genetics Analysis (MEGA) software to the analysis of gene families. This will involve developing new software to streamline the acquisition of large gene family data, making MEGA cross-platform, and implementing efficient heuristics for quickly estimating very large trees and for inferring gene duplication events and divergence times. Sequence lengths in gene family alignments are biologically constrained, unlike in analyses of species history, for which full genomes and multiple genes are often available to improve precision. We therefore plan to evaluate, by computer simulation with biologically realistic parameters, the accuracy of phylogenetic trees produced by these extremely fast but highly heuristic inference algorithms. Insights gained from these efforts will be incorporated into the algorithm designs in MEGA. Overall, the software and research developments will contribute to advances in molecular evolution, bioinformatics, functional genomics, computational biology, and basic biomedicine. As always, MEGA and its source code will be made available free of charge for all uses, including research, education, and training.

Public Health Relevance

Evolutionary bioinformatics is a powerful tool for conducting in silico functional analysis of DNA and protein sequences from genes and genomes of diverse organisms. The proposed software development and fundamental research will lead to an advanced Molecular Evolutionary Genetics Analysis (MEGA) tool for use by biologists in their quest to better understand the evolutionary dynamics of gene families residing in the genomes of humans as well as their evolutionary relatives and pathogens.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Arizona State University-Tempe Campus
Organized Research Units
United States
Stecher, Glen; Liu, Li; Sanderford, Maxwell et al. (2014) MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 30:1305-7
Gray, Vanessa E; Liu, Li; Nirankari, Ronika et al. (2014) Signatures of natural selection on mutations of residues with multiple posttranslational modifications. Mol Biol Evol 31:1641-5
Filipski, Alan; Murillo, Oscar; Freydenzon, Anna et al. (2014) Prospects for building large timetrees using molecular data with incomplete gene coverage among species. Mol Biol Evol 31:2542-50
Kumar, Sudhir; Ye, Jieping; Liu, Li (2014) Reply to: "Proper reporting of predictor performance". Nat Methods 11:781-2
Gilbert, James D J; Acquisti, Claudia; Martinson, Holly M et al. (2013) GRASP [Genomic Resource Access for Stoichioproteomics]: comparative explorations of the atomic content of 12 Drosophila proteomes. BMC Genomics 14:599
Kumar, Sudhir; Filipski, Alan J; Battistuzzi, Fabia U et al. (2012) Statistics and truth in phylogenomics. Mol Biol Evol 29:457-72
Elser, James J; Acquisti, Claudia; Kumar, Sudhir (2011) Stoichiogenomics: the evolutionary ecology of macromolecular elemental composition. Trends Ecol Evol 26:38-44
Battistuzzi, Fabia U; Billing-Ross, Paul; Paliwal, Aditya et al. (2011) Fast and slow implementations of relaxed-clock methods show similar patterns of accuracy in estimating divergence times. Mol Biol Evol 28:2439-42
Gray, Vanessa E; Kumar, Sudhir (2011) Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 28:1565-8
Battistuzzi, Fabia U; Filipski, Alan; Hedges, S Blair et al. (2010) Performance of relaxed-clock methods in estimating evolutionary divergence times and their credibility intervals. Mol Biol Evol 27:1289-300

Showing the most recent 10 out of 23 publications