An enduring impediment to translating genomic advances into biomedical solutions has been the lack of tools and techniques that enable biologists to [a] efficiently leverage the multitude of publicly available genome variation data in their research, and [b] effectively harness the long-term (inter-specific) evolutionary histories of mutant positions to diagnose the functional effects of novel mutations. This need has become more acute with the discovery of unprecedented numbers of novel mutations in personal genomes and population surveys. We therefore propose an integrated research and development project to address it. First, we plan to develop unique, user-friendly, and robust software to investigate human mutations in the context of Long-Term Evolutionary (LTE) patterns on a genomic scale. LTE patterns are revealed by inter-specific comparisons at a position, and they provide sound baseline hypotheses for analyzing the nature of mutations and the frequencies of contemporary variants. The proposed myPEG (Population Evolutionary Genomics) software will contain tools for automated assembly and integration of data from primary genome alignment browsers and mutation databases (e.g., UCSC, 1000 Genomes, dbSNP). myPEG will enable users to conduct integrative analyses across taxonomic scales via its cross-platform WebTop display and analysis framework, which will seamlessly integrate species and population sequence alignments and analyses in both traditional and novel ways. myPEG's approach to software design and development will be biologist-centric: we will emulate, rather than reinvent, biologists' favorite work practices. These software developments will be informed by the proposed fundamental research, which will develop direct applications of macro-evolutionary patterns to the diagnosis of disease-associated mutations (e.g., Mendelian, complex, and somatic-cell mutations) and assess how successfully in silico tools predict their functional effects.
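To make the idea of an LTE pattern at a single position more concrete, the following is a minimal Python sketch. It assumes the inter-specific alignment column has already been extracted into a species-to-residue dictionary; the function names `lte_profile` and `mutant_seen_in_other_species`, the species names, and the choice of "fraction of other species sharing the human residue" as the conservation summary are all illustrative assumptions, not myPEG's actual design.

```python
from collections import Counter
from typing import Optional

GAP_OR_MISSING = {"-", "?"}  # assumed symbols for alignment gaps and missing data


def lte_profile(column: dict, reference_species: str = "human") -> dict:
    """Summarize a hypothetical long-term evolutionary (LTE) pattern at one position.

    `column` maps species name -> residue (amino acid or nucleotide) at a single
    aligned position. Gaps and missing data are ignored.
    """
    residues = {sp: r for sp, r in column.items() if r not in GAP_OR_MISSING}
    ref: Optional[str] = residues.get(reference_species)
    # Residues observed in all non-reference species at this position.
    others = [r for sp, r in residues.items() if sp != reference_species]
    counts = Counter(others)
    n = len(others)
    return {
        "reference_residue": ref,
        "n_species": n,
        # Fraction of other species sharing the reference residue (None if no data).
        "conservation": counts[ref] / n if n else None,
        "observed_residues": sorted(counts),
    }


def mutant_seen_in_other_species(column: dict, mutant: str,
                                 reference_species: str = "human") -> bool:
    """True if the mutant residue occurs at this position in any non-reference species."""
    return any(r == mutant for sp, r in column.items() if sp != reference_species)
```

For a toy column in which human, chimpanzee, and mouse carry R, chicken carries K, and zebrafish has a gap, the profile reports a conservation of 2/3, and a human R-to-K mutation would be flagged as already observed across species while R-to-W would not. Treating "seen in another species" as a baseline hypothesis of tolerance is one simple reading of using LTE patterns as baselines.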
The proposed investigations will reveal similarities and differences in the evolutionary anatomies of disease-associated and other mutations (including population SNPs), as well as in the success rates of all major in silico tools currently used to diagnose the functional effects of novel mutations. These discoveries will form the basis of a decision support system that chooses the best in silico method for a given type of mutation and purpose (type of disease), such that the Reliability of in silico Inference (RoI) is highest. myPEG will contain this decision support system, along with facilities for prototyping and conducting high-throughput iterative analyses of large numbers of mutations. myPEG will run on all major platforms (Windows, Linux, and MacOS), will be usable natively as a plug-in to analysis pipelines on these operating systems, and will be available at no cost (including the source code) to all users, including those in research, education, and training.
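The decision support idea, picking the in silico method with the highest RoI for a given mutation type and disease type, can be sketched as a table lookup followed by an arg-max. In this minimal Python sketch the table layout, the tool names (`tool_A`, `tool_B`, `tool_C`), and every RoI value are hypothetical placeholders, not measured results, and `best_method` and `triage` are illustrative names rather than part of myPEG.

```python
from typing import Optional

# Hypothetical reliability table: (mutation_type, disease_type) -> {tool: RoI}.
# Tool names and RoI values are placeholders for illustration only.
ROI_TABLE = {
    ("missense", "mendelian"): {"tool_A": 0.81, "tool_B": 0.74, "tool_C": 0.77},
    ("missense", "complex"): {"tool_A": 0.58, "tool_B": 0.63, "tool_C": 0.60},
    ("missense", "somatic"): {"tool_A": 0.66, "tool_B": 0.61, "tool_C": 0.70},
}


def best_method(mutation_type: str, disease_type: str,
                table: dict = ROI_TABLE) -> Optional[tuple]:
    """Return (tool, RoI) with the highest RoI for this context, or None if unknown."""
    scores = table.get((mutation_type, disease_type))
    if not scores:
        return None
    tool = max(scores, key=scores.get)
    return tool, scores[tool]


def triage(mutations: list, table: dict = ROI_TABLE) -> dict:
    """Batch version: map each (id, mutation_type, disease_type) to its best method."""
    return {mid: best_method(mt, dt, table) for mid, mt, dt in mutations}
```

Keeping the reliability estimates in a data table rather than in code means the same selection logic could be reused as new benchmark results for each tool and mutation category become available.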

Public Health Relevance

The biological and biomedical research community at large needs user-friendly computational tools to translate the wealth of genomic data into useful information and solutions. Therefore, we will develop biologist-centric software to explore, integrate, analyze, and diagnose human mutation data in the context of the evolutionary history of mutation positions. The proposed technological and research advances will enable scientists to harness the power of genomic datasets in basic biomedicine, personalized medicine, personal genomics, and broader biological research.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (M3))
Program Officer: Ye, Jane
Arizona State University-Tempe Campus
Organized Research Units
United States
Kumar, Sudhir; Liu, Li (2014) No positive selection for G allele in a p53 response element in Europeans. Cell 157:1497-9
Gray, Vanessa E; Liu, Li; Nirankari, Ronika et al. (2014) Signatures of natural selection on mutations of residues with multiple posttranslational modifications. Mol Biol Evol 31:1641-5
Kumar, Sudhir; Ye, Jieping; Liu, Li (2014) Reply to: "Proper reporting of predictor performance". Nat Methods 11:781-2
Liu, Li; Kumar, Sudhir (2013) Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol 30:1252-7
Nevin Gerek, Zeynep; Kumar, Sudhir; Banu Ozkan, Sefika (2013) Structural dynamics flexibility informs function and evolution at a proteome scale. Evol Appl 6:423-33
Champion, Mia D; Gray, Vanessa; Eberhard, Carl et al. (2013) The evolutionary history of amino acid variations mediating increased resistance of S. aureus identifies reversion mutations in metabolic regulators. PLoS One 8:e56466
Dudley, Joel T; Chen, Rong; Sanderford, Maxwell et al. (2012) Evolutionary meta-analysis of association studies reveals ancient constraints affecting disease marker discovery. Mol Biol Evol 29:2087-94
Kumar, Sudhir; Sanderford, Maxwell; Gray, Vanessa E et al. (2012) Evolutionary diagnosis method for variants in personal exomes. Nat Methods 9:855-6
Gray, Vanessa E; Kukurba, Kimberly R; Kumar, Sudhir (2012) Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. Bioinformatics 28:2093-6
Dudley, Joel T; Kim, Yuseob; Liu, Li et al. (2012) Human genomic disease variants: a neutral evolutionary explanation. Genome Res 22:1383-94

Showing the most recent 10 out of 13 publications