Over 25,000 researchers in the US and over 50,000 in 120 other countries have used the PredictProtein (PP) Internet server to analyze proteins by homology transfer and by de novo predictions of protein structure and function. Here, we propose technical and scientific solutions that will improve the functionality of PP and its extension portal META-PP. Many technical changes will remain hidden from users and are required to increase the maintainability, scalability, and portability of these servers. New Graphical User Interfaces are one proposed solution that will visibly impact the service. The scientific solutions address two related tasks pertaining to the prediction of structure and function. The first is to predict the effect of mutations. We propose the development of novel machine learning-based methods to distinguish between mutations that affect structure, mutations that affect function, and mutations that have no apparent phenotype. Our final method will be applied to the screening of SNP data from our experimental colleagues at Columbia, as well as to the prediction of SNP effects in public databases. The second major task is the identification of natively unstructured regions and their functional classification. Proteins that do not adopt regular structures in isolation are becoming an increasingly important research area; they may provide a key to the evolution of complexity from prokaryotes to eukaryotes. We propose the development of a machine learning-based identification of features specific to this important class of molecules. We also plan to attack the problem from a very different angle by using predictions of interaction densities inside proteins. The resulting novel tools will allow a proteome-wide analysis of the role of these molecules. All methods will be made available through PP.

Public Health Relevance

Information about protein structure adds an entire dimension to protein analysis and genome annotation. This addition is often essential to infer function, even for natively unstructured proteins. The PredictProtein server is unique in its combination and exploitation of evolution, structure, and function; many thousands of theoretical, experimental, and clinical researchers have benefited from this. The long-term goal of the research proposed here is to improve our ability to use the evolutionary record of amino acid substitutions, i.e., to ultimately understand the amino acid "language". The short-term goal is to address two tasks that are closely related to human diseases, namely the distinction between silent and important mutations and the mapping of unstructured proteins onto networks and diseases.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 7R01LM007329-08
Application #: 7842572
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2002-04-01
Project End: 2012-04-30
Budget Start: 2010-05-01
Budget End: 2012-04-30
Support Year: 8
Fiscal Year: 2010
Total Cost: $327,180
Indirect Cost:
Name: Columbia University (N.Y.)
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 049179401
City: New York
State: NY
Country: United States
Zip Code: 10027

Kaján, László; Yachdav, Guy; Vicedo, Esmeralda et al. (2013) Cloud prediction of protein structure and function with PredictProtein for Debian. Biomed Res Int 2013:398968
Schlessinger, Avner; Schaefer, Christian; Vicedo, Esmeralda et al. (2011) Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 21:412-8
Bromberg, Yana; Rost, Burkhard (2009) Correlating protein function and stability through the analysis of single amino acid substitutions. BMC Bioinformatics 10 Suppl 8:S8
Wrzeszczynski, Kazimierz O; Rost, Burkhard (2009) Cell cycle kinases predicted from conserved biophysical properties. Proteins 74:655-68
Kernytsky, Andrew; Rost, Burkhard (2009) Using genetic algorithms to select most predictive protein features. Proteins 75:75-88
Jiang, Guoqian; Chute, Christopher G (2009) Auditing the semantic completeness of SNOMED CT using formal concept analysis. J Am Med Inform Assoc 16:89-102
Bromberg, Yana; Yachdav, Guy; Ofran, Yanay et al. (2009) New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'. Curr Opin Drug Discov Devel 12:408-19
Bromberg, Yana; Overton, John; Vaisse, Christian et al. (2009) In silico mutagenesis: a case study of the melanocortin 4 receptor. FASEB J 23:3059-69
Meszaros, Balint; Simon, Istvan; Dosztanyi, Zsuzsanna (2009) Prediction of protein binding regions in disordered proteins. PLoS Comput Biol 5:e1000376
Bertonati, Claudia; Punta, Marco; Fischer, Markus et al. (2009) Structural genomics reveals EVE as a new ASCH/PUA-related domain. Proteins 75:760-73