This proposal will provide information, new algorithms, and computational tools for predicting proteolytic events. The ultimate goal is to make accurate proteome-wide predictions of the substrates of any given protease. Our current effort, however, will focus mainly on matrix metalloproteases (MMPs), caspases, and several proprotein convertases (PCs) belonging to the serine protease family, because a vast amount of experimental information on those proteases is already available at the Sanford-Burnham Medical Research Institute. The approach can be readily extended to any other protease once enough substrates are known to derive a statistically sound specificity profile. The distinguishing feature of the proposed prediction method is that it combines sequence-based predictions with other factors: structural features of the substrates, cooperative interactions, and co-localization and co-expression of substrates and proteases. We will also include information about SNPs (single nucleotide polymorphisms) and PTMs (posttranslational modifications) of residues in the vicinity of the cleavage sites in protein substrates. Both can alter a proteolytic event, either by abolishing an existing cleavage site or by creating a new one, and such changes can lead to diseases or syndromes. The proteolytic events (protease-substrate pairs) will be mapped onto the known regulatory networks. All the information collected and the tools developed will be freely available on the PMAP Web site for use by the biomedical research community. Because proteases usually have more than a dozen substrates, and because the substrates often differ between normal physiology and pathology, the impact of this project could be substantial. Rather than identifying protease substrates one by one, our predictions will produce well-annotated sets of substrates that are likely to have biological significance.
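The sequence-based part of such a prediction can be sketched as follows. This is only an illustration, assuming the specificity profile is a log-odds position weight matrix with a uniform background; the proposal does not state its actual scoring algorithm. All function names are hypothetical, and the cleavage-site windows are toy sequences invented for the example, not experimental data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length site windows.

    Assumes a uniform amino acid background (a simplification).
    """
    length = len(sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def best_site(pwm, sequence):
    """Return (score, start_index) of the highest-scoring window."""
    width = len(pwm)
    scored = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Toy training windows (invented for illustration only).
toy_sites = ["DEVDG", "DEVDS", "DQVDG", "DEIDA", "DETDG"]
pwm = build_pwm(toy_sites)

# Hypothetical substrate and a variant with one substituted residue,
# mimicking a SNP that changes a residue inside the predicted site.
wild_type = "MKALDEVDGLLPS"
variant = "MKALDEVAGLLPS"

wt_score, wt_index = best_site(pwm, wild_type)
variant_score = score_window(pwm, variant[wt_index:wt_index + len(pwm)])
print(wt_index, wt_score > variant_score)
```

In this toy case the substitution lowers the score of the window that scored highest in the unmodified sequence, which is the kind of effect the SNP and PTM annotations are meant to capture. A real predictor would add the structural, co-localization, and co-expression factors described above on top of the sequence score.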

Public Health Relevance

Proteolysis is a biological process involving hydrolysis of the peptide bonds in proteins. We propose to design a computational approach for predicting the substrates of proteinases in the human proteome that takes into account accurate amino acid sequence specificity as well as structural and biological factors. This computational approach will help detect aberrations in the processing, regulation, and degradation of proteins that lead to diseases or syndromes.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Preusch, Peter C
Sanford-Burnham Medical Research Institute
La Jolla
United States
Cieplak, Piotr; Strongin, Alex Y (2017) Matrix metalloproteinases - From the cleavage data to the prediction tools and beyond. Biochim Biophys Acta Mol Cell Res 1864:1952-1963
Cieplak, Piotr (2015) Letter to the Editor: Caspase cleavage sites in the human proteome: CaspDB, a database of predicted substrates. Apoptosis 20:421
Remacle, Albert G; Kumar, Sonu; Motamedchaboki, Khatereh et al. (2015) Matrix Metalloproteinase (MMP) Proteolysis of the Extracellular Loop of Voltage-gated Sodium Channels and Potential Alterations in Pain Signaling. J Biol Chem 290:22939-44
Kumar, Sonu; Ratnikov, Boris I; Kazanov, Marat D et al. (2015) CleavPredict: A Platform for Reasoning about Matrix Metalloproteinases Proteolytic Events. PLoS One 10:e0127877
Kukreja, Muskan; Shiryaev, Sergey A; Cieplak, Piotr et al. (2015) High-Throughput Multiplexed Peptide-Centric Profiling Illustrates Both Substrate Cleavage Redundancy and Specificity in the MMP Family. Chem Biol 22:1122-33
Kumar, Sonu; van Raam, Bram J; Salvesen, Guy S et al. (2014) Caspase cleavage sites in the human proteome: CaspDB, a database of predicted substrates. PLoS One 9:e110539
Shiryaev, Sergey A; Aleshin, Alexander E; Muranaka, Norihito et al. (2014) Structural and functional diversity of metalloproteinases encoded by the Bacteroides fragilis pathogenicity island. FEBS J 281:2487-502
Ratnikov, Boris I; Cieplak, Piotr; Gramatikoff, Kosi et al. (2014) Basis for substrate recognition and distinction by matrix metalloproteinases. Proc Natl Acad Sci U S A 111:E4148-55
Belushkin, Alexander A; Vinogradov, Dmitry V; Gelfand, Mikhail S et al. (2014) Sequence-derived structural features driving proteolytic processing. Proteomics 14:42-50
Shiryaev, Sergey A; Remacle, Albert G; Cieplak, Piotr et al. (2014) Peptide Sequence Region That is Essential for the Interactions of the Enterotoxigenic Bacteroides fragilis Metalloproteinase II with E-cadherin. J Proteolysis 1:3-14

Showing the most recent 10 out of 14 publications