One of the fundamental problems in computational biology is predicting a protein's 3D structural class, that is, recognizing its fold from its linear sequence of amino acids. The proposed project aims to develop computational methods and tools for recognizing protein folds. The first specific aim is to build and deliver to the scientific community a web-based, discriminative fold-recognition software engine. This tool will instantiate, for the first time in a user-friendly form, a discriminative fold-recognition algorithm. This type of algorithm has been described and repeatedly validated in the scientific literature over the past 5 years, but no easy-to-use software tool yet exists to bring this technology to the end user. The second specific aim is to improve upon existing fold-recognition algorithms by exploiting the inherently multiclass nature of the problem. Previous approaches have treated each fold class independently, thereby sacrificing statistical power. This project will produce algorithms and software that dramatically improve our ability to recognize, from the primary amino acid sequence, subtle structural similarities among proteins.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BPC-Q (03))
Program Officer: Wehrle, Janna P
Sloan-Kettering Institute for Cancer Research
New York
United States
Qi, Yanjun; Oja, Merja; Weston, Jason et al. (2012) A unified multitask architecture for predicting local protein properties. PLoS One 7:e32235
Melvin, Iain; Weston, Jason; Noble, William Stafford et al. (2011) Detecting remote evolutionary relationships among proteins by large-scale semantic embedding. PLoS Comput Biol 7:e1001047
Agius, Phaedra; Arvey, Aaron; Chang, William et al. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol 6:
Melvin, Iain; Weston, Jason; Leslie, Christina et al. (2009) RANKPROP: a web server for protein remote homology detection. Bioinformatics 25:121-2
Nimrod, Guy; Szilágyi, András; Leslie, Christina et al. (2009) Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J Mol Biol 387:1040-53
Melvin, Iain; Weston, Jason; Leslie, Christina S et al. (2008) Combining classifiers for improved classification of proteins from sequence or structure. BMC Bioinformatics 9:389
Melvin, Iain; Ie, Eugene; Kuang, Rui et al. (2007) SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics 8 Suppl 4:S2
Weston, Jason; Kuang, Rui; Leslie, Christina et al. (2006) Protein ranking by semi-supervised network propagation. BMC Bioinformatics 7 Suppl 1:S10
Kuang, Rui; Weston, Jason; Noble, William Stafford et al. (2005) Motif-based protein ranking by network propagation. Bioinformatics 21:3711-8