An enduring impediment in translating genomic advances into biomedical solutions has been the lack of tools and techniques that enable biologists to [a] efficiently leverage the multitude of publicly available genome variation data in their research endeavors, and [b] effectively harness the long-term (inter-specific) evolutionary histories of mutant positions in diagnosing functional effects of novel mutations. This need has become more acute with the discovery of unprecedented numbers of novel mutations in personal genomes and population surveys. Therefore, we propose an integrated research and development project to address this need. First, we plan to develop unique, user-friendly, and robust software to investigate human mutations in the context of Long-Term Evolutionary (LTE) patterns on a genomic scale; LTE patterns are revealed by inter-specific comparisons at a position, and they provide sound baseline hypotheses for analyzing the nature of mutations and frequencies of contemporary variations. The proposed myPEG (Population Evolutionary Genomics) software will contain tools for automated data assembly and integration from primary genome alignment browsers and mutation databases (e.g., UCSC, 1000 Genomes, dbSNP). myPEG will enable users to conduct integrative analysis across taxonomic scales via its cross-platform WebTop display and analysis framework that will seamlessly integrate species and population sequence alignments and analyses in traditional and novel ways. myPEG's approach to software design and development will be biologist-centric: we will emulate, rather than reinvent, biologists' favorite work practices. These software developments will be informed by the proposed fundamental research to develop direct applications of macro-evolutionary patterns to the diagnosis of mutations associated with disease (e.g., Mendelian, complex, and somatic-cell mutations), and the successes of their computational predictions using in silico tools.
The proposed investigations will reveal similarities and differences in the evolutionary anatomies of disease-associated and other mutations (including population SNPs), as well as in the success rates of all major in silico tools currently used for diagnosing functional effects of novel mutations. These discoveries will form the basis for developing a decision support system to choose the best in silico method for the type of mutation and purpose (type of disease), such that the Reliability of in silico Inference (RoI) is highest. myPEG will contain this decision support system, along with facilities for prototyping and conducting high-throughput iterative analysis of large numbers of mutations. myPEG will run on all major platforms (Windows, Linux, and MacOS), will be usable natively on these operating systems as a plug-in to analysis pipelines, and will be available at no cost (including the source code) to all users, including those in research, education, and training.

Public Health Relevance

The biological and biomedical research community at large needs user-friendly computational tools to translate the wealth of genomic data into useful information and solutions. Therefore, we will develop biologist-centric software to explore, integrate, analyze, and diagnose human mutation data in the context of the evolutionary history of mutation positions. Proposed innovative technological and research advances will enable scientists to harness the power of datasets in basic biomedicine, personalized medicine, personal genomics, and broader biological research.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010834-03
Application #: 8323957
Study Section: Special Emphasis Panel (ZLM1-ZH-C (M3))
Program Officer: Ye, Jane
Project Start: 2010-09-30
Project End: 2014-09-29
Budget Start: 2012-09-30
Budget End: 2014-09-29
Support Year: 3
Fiscal Year: 2012
Total Cost: $358,680
Indirect Cost: $123,480

Name: Arizona State University-Tempe Campus
Department: Genetics
Type: Organized Research Units
DUNS #: 943360412
City: Tempe
State: AZ
Country: United States
Zip Code: 85287

Kumar, Sudhir; Ye, Jieping; Liu, Li (2014) Reply to: "Proper reporting of predictor performance". Nat Methods 11:781-2
Liu, Li; Kumar, Sudhir (2013) Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol 30:1252-7
Champion, Mia D; Gray, Vanessa; Eberhard, Carl et al. (2013) The evolutionary history of amino acid variations mediating increased resistance of S. aureus identifies reversion mutations in metabolic regulators. PLoS One 8:e56466
Dudley, Joel T; Chen, Rong; Sanderford, Maxwell et al. (2012) Evolutionary meta-analysis of association studies reveals ancient constraints affecting disease marker discovery. Mol Biol Evol 29:2087-94
Kumar, Sudhir; Dudley, Joel T; Filipski, Alan et al. (2011) Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet 27:377-86
Gray, Vanessa E; Kumar, Sudhir (2011) Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 28:1565-8