Defects in the ubiquitin-proteasome system are implicated in the development of numerous human diseases. Some natural substrates of ubiquitination and degradation can induce malignant transformation if they are not properly removed from the cell. Despite the importance of ubiquitination, precise identification of ubiquitination (Ub) sites (i.e., the acceptor lysine residues to which a ubiquitin molecule is attached) within substrates of ubiquitin ligases remains experimentally challenging. Computational approaches that predict Ub sites from protein sequence offer an attractive alternative to experimental methods. Here, we propose to develop a computational algorithm that predicts Ub sites with high precision. First, we will identify new protein Ub sites using a combination of multidimensional protein identification technology (MudPIT) and mass spectrometry. Different environmental perturbations, such as heat shock, oxidative stress, DNA damage, and nutrient starvation, will be applied to increase coverage of the ubiquitinated proteome. Second, we will use the dataset of new Ub sites to develop a Ub site predictor. We will apply a novel machine learning approach that co-trains two predictors built on different data representations and uses unlabeled data to improve prediction accuracy. To our knowledge, this will be the first predictor of ubiquitination sites. Finally, we will apply the predictor to datasets of cell signaling and cancer-associated proteins to predict new ubiquitination sites and substrates among them. Intrinsic disorder (ID) will be predicted on the same datasets to test the hypothesis that Ub sites occur preferentially within ID regions.
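The abstract names co-training of two predictors with different data representations plus unlabeled data, but gives no algorithmic details. The following is a minimal, hypothetical sketch of generic co-training in the spirit of Blum and Mitchell. It assumes each candidate lysine is described by two numeric feature vectors ("views") and uses a nearest-centroid classifier as a stand-in for whatever predictors the project actually used. The names `NearestCentroid`, `co_train` and `predict` are illustrative only.

```python
from math import dist  # Python 3.8+


class NearestCentroid:
    """Tiny binary classifier (hypothetical stand-in): closer centroid wins."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.c_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.c_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def margin(self, x):
        # Positive when x is closer to the positive centroid;
        # the magnitude serves as a crude confidence score.
        return dist(x, self.c_neg) - dist(x, self.c_pos)


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view_a, view_b), y); unlabeled: list of (view_a, view_b).

    In each round, the classifier for each view labels the unlabeled examples
    it is most confident about, and those examples join the shared labeled
    pool used to train both views. Assumes the initial labeled set contains
    at least one positive and one negative example.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            xs = [pair[view] for pair, _ in labeled]
            ys = [y for _, y in labeled]
            clf = NearestCentroid().fit(xs, ys)
            # Rank remaining unlabeled examples by this view's confidence.
            ranked = sorted(pool, key=lambda p: abs(clf.margin(p[view])), reverse=True)
            for pair in ranked[:per_round]:
                labeled.append((pair, 1 if clf.margin(pair[view]) > 0 else 0))
                pool.remove(pair)
    # Final per-view classifiers trained on the enlarged labeled pool.
    models = []
    for view in (0, 1):
        xs = [pair[view] for pair, _ in labeled]
        ys = [y for _, y in labeled]
        models.append(NearestCentroid().fit(xs, ys))
    return models, labeled


def predict(models, pair):
    """Combine the two views by summing their margins."""
    return 1 if models[0].margin(pair[0]) + models[1].margin(pair[1]) > 0 else 0
```

For example, with two labeled and three unlabeled toy examples, `co_train(labeled, unlabeled, rounds=3)` absorbs all three unlabeled points into the training pool. The key design idea is that each view's confident predictions become extra training data for the other view, which is how unlabeled data can add information.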
Annotated disease-related mutations will be extracted from three public databases (MutDB, SWISS-PROT, and OMIM) and correlated with the predicted ubiquitination sites. The discovery of mutations near Ub sites, or of mutations that directly affect Ub sites, would lay the basis for formulating and testing biologically meaningful hypotheses about their role in cancer and other diseases.

Proteins undergo a wide range of modifications that regulate their activity. One such modification, ubiquitination, has been shown to be involved in various human diseases, including cancer, renal diseases (von Hippel-Lindau disease, Liddle syndrome, ischemic acute renal failure), and several neurodegenerative diseases (Alzheimer's disease, Parkinson's disease, CAG-expansion disorders). The precise ubiquitination sites in proteins are difficult to detect. We propose to develop a computational approach that identifies such sites with high precision. This would help in developing better drugs directed either against ubiquitinated proteins or against specific sites in these proteins.
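The abstract proposes correlating annotated disease mutations with predicted Ub sites, looking both for mutations near a site and for those that hit the site directly. The following is a minimal, hypothetical sketch of that proximity check. It assumes 1-based residue positions on the same protein sequence and treats "proximity" as a fixed window of residues. The `Mutation` record, `mutations_near_sites` and the default window size are illustrative choices, not taken from the grant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    protein: str    # protein identifier (hypothetical format)
    position: int   # 1-based residue index
    wild_type: str  # one-letter amino acid code
    mutant: str


def mutations_near_sites(mutations, ub_sites, window=7):
    """Return (mutation, nearest_site, distance) for every mutation lying
    within `window` residues of a predicted Ub site on the same protein.

    ub_sites maps protein ID -> list of predicted acceptor lysine positions.
    A distance of 0 means the mutation falls on the acceptor lysine itself.
    """
    hits = []
    for m in mutations:
        sites = ub_sites.get(m.protein, [])
        if not sites:
            continue
        nearest = min(sites, key=lambda s: abs(s - m.position))
        d = abs(nearest - m.position)
        if d <= window:
            hits.append((m, nearest, d))
    return hits
```

For example, with predicted sites at K12 and K40, a K40R substitution is reported at distance 0, an S15L substitution is reported at distance 3 from K12, and a G90D substitution is not reported. A fixed window is the simplest definition of proximity. Structural distance would be a natural refinement when structures are available.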

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21CA113711-01A2
Application #
7256169
Study Section
Special Emphasis Panel (ZRG1-BCMB-Q (90))
Program Officer
Couch, Jennifer A
Project Start
2007-08-02
Project End
2009-07-31
Budget Start
2007-08-02
Budget End
2008-07-31
Support Year
1
Fiscal Year
2007
Total Cost
$188,000
Indirect Cost
Name
Rockefeller University
Department
Biostatistics & Other Math Sci
Type
Other Domestic Higher Education
DUNS #
071037113
City
New York
State
NY
Country
United States
Zip Code
10065
Vacic, Vladimir; Iakoucheva, Lilia M; Lonardi, Stefano et al. (2010) Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol 17:55-72
Radivojac, Predrag; Vacic, Vladimir; Haynes, Chad et al. (2010) Identification, analysis, and prediction of protein ubiquitination sites. Proteins 78:365-80
Li, Shuyan; Iakoucheva, Lilia M; Mooney, Sean D et al. (2010) Loss of post-translational modification sites in disease. Pac Symp Biocomput :337-47
Boxem, Mike; Maliga, Zoltan; Klitgord, Niels et al. (2008) A protein domain-based interactome network for C. elegans early embryogenesis. Cell 134:534-45