This proposal will provide information, new algorithms, and computational tools for predicting proteolytic events. The ultimate goal is to make accurate proteome-wide predictions of the substrates of any given protease. However, our current effort will focus mainly on matrix metalloproteases (MMPs), caspases, and several proprotein convertases (PCs) belonging to the serine protease family, because a vast amount of experimental information on those proteases is already available at the Sanford-Burnham Medical Research Institute. Our approach can be readily extended to any other protease once a statistically significant number of substrates becomes available for deriving a specificity profile. The unique feature of the proposed prediction method is that it combines sequence-based predictions with other factors: structural features of the substrates, cooperative interactions, and co-localization and co-expression of substrates and proteases. We will also include information about SNPs (single nucleotide polymorphisms) and PTMs (posttranslational modifications) affecting residues in the vicinity of the cleavage sites in protein substrates. Both can alter a proteolytic event, either by abolishing cleavage or by creating a new potential cleavage site, and such changes can lead to diseases or syndromes. The proteolytic events, i.e., protease-substrate pairs, will be mapped onto the known regulatory networks. All the information collected and the tools developed will be freely available on the PMAP Web site for use by the biomedical research community. Because proteases usually have more than a dozen substrates, and because the substrates often differ between normal physiology and pathology, the impact of this project could be immense. Rather than identifying protease substrates one by one, our predictions will produce well-annotated sets of substrates that are likely to have biological significance.
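
As a rough illustration of the sequence-based component only, the sketch below derives a position-specific log-odds profile from a few aligned cleavage-site windows, scans a protein sequence with it, and compares a reference window with a variant window to mimic the effect of an SNP near a cleavage site. The example peptides, the uniform background, the pseudocount, and all function names are hypothetical placeholders rather than data or methods from this project; the structural, co-localization, and co-expression factors described above are not modeled here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(sites, pseudocount=1.0):
    """Build a log-odds profile from equal-length, aligned cleavage-site windows.

    Assumes a uniform amino acid background (a simplification).
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all cleavage-site windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_window(profile, window):
    """Sum the per-position log-odds scores for one window (standard residues only)."""
    return sum(column[aa] for column, aa in zip(profile, window))

def scan(profile, sequence):
    """Score every window of the sequence; return (start, score) pairs, best first."""
    width = len(profile)
    hits = [
        (start, score_window(profile, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

def variant_delta(profile, reference_window, variant_window):
    """Score change caused by a substitution; negative values suggest weaker cleavage."""
    return score_window(profile, variant_window) - score_window(profile, reference_window)

if __name__ == "__main__":
    # Toy windows, invented for illustration only.
    toy_sites = ["DEVDG", "DEVDS", "DEVDA", "DQTDG"]
    profile = build_profile(toy_sites)
    print(scan(profile, "MKLDEVDGAAA")[:3])
    print(variant_delta(profile, "DEVDG", "DEVAG"))
```

In practice a profile of this kind would be derived from a much larger set of experimentally verified substrates, and its score would be only one of several inputs to the final prediction.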

Public Health Relevance

Proteolysis is a biological process involving hydrolysis of the peptide bonds in proteins. We propose to design a computational approach for predicting the substrates of proteinases in the human proteome that takes into account accurate amino acid sequence specificity together with structural and biological factors. This computational approach will help detect aberrations in the processing, regulation, and degradation of proteins that lead to diseases or syndromes.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Preusch, Peter C
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Sanford-Burnham Medical Research Institute
La Jolla
United States
Zip Code
Kumar, Sonu; van Raam, Bram J; Salvesen, Guy S et al. (2014) Caspase cleavage sites in the human proteome: CaspDB, a database of predicted substrates. PLoS One 9:e110539
Ratnikov, Boris I; Cieplak, Piotr; Gramatikoff, Kosi et al. (2014) Basis for substrate recognition and distinction by matrix metalloproteinases. Proc Natl Acad Sci U S A 111:E4148-55
Shiryaev, Sergey A; Aleshin, Alexander E; Muranaka, Norihito et al. (2014) Structural and functional diversity of metalloproteinases encoded by the Bacteroides fragilis pathogenicity island. FEBS J 281:2487-502
Belushkin, Alexander A; Vinogradov, Dmitry V; Gelfand, Mikhail S et al. (2014) Sequence-derived structural features driving proteolytic processing. Proteomics 14:42-50
Shiryaev, Sergey A; Chernov, Andrei V; Golubkov, Vladislav S et al. (2013) High-resolution analysis and functional mapping of cleavage sites and substrate proteins of furin in the human proteome. PLoS One 8:e54290