Indiana University is given a CAREER award for Predrag Radivojac to develop new methods to study protein post-translational modifications (PTMs), i.e., covalent modifications to a protein's structure after its translation. It is now known that there are over 200 PTM types that modulate protein activity in eukaryotic organisms by acting as molecular switches that turn protein function on and off, stabilize the protein, target it for destruction, or determine its cellular location. The objective is to carry out the following research activities: (Aim 1) develop models for automated annotation of PTM sites; (Aim 2) analyze structural and functional properties of PTM sites; and (Aim 3) integrate protein bioinformatics and proteomics to construct a functional map of all major PTMs.

Intellectual Merit. This project introduces a novel framework for integrating protein bioinformatics and proteomics towards a comprehensive understanding of the regulatory and signaling roles of PTMs. In addition to the new biological knowledge, this project will develop a set of tools that will enable accurate and efficient proteomics analysis and also aid experimental biologists in experimental design. Combinatorial explosion is currently the major hindrance in proteomics studies of PTMs and the cellular processes in which they are involved. Thus, the proposed prioritization of the proteomics searches by incorporating PTM prediction will not only make such studies realistic, but also enable identification of many new sites by decreasing the effective database size.

PTMs frequently occur in unstructured protein regions that cannot be easily studied using traditional experimental methods (X-ray crystallography, NMR spectroscopy) and structural bioinformatics.
Developing sequence-based methods to predict PTM sites, integrating them into proteomics searches for verification, and investigating the relationship between PTMs and genetic data may be crucial for understanding the molecular basis of many diseases, thus facilitating drug design and clinical therapy. The far-reaching promise of understanding life at a molecular level and advancing medical research can be fulfilled only through an integration of research and interdisciplinary education. To achieve this goal, the PI will extend existing courses and develop new ones at the graduate and undergraduate levels, will expose students to advanced education, and, through an active, rigorous and enthusiastic learning environment, will train a generation of new experts and researchers who will be capable of making further advances. The teaching material and research results will be disseminated via our web site, by publishing papers in internationally recognized journals and by attending major conferences. The educational plan of this project will ensure participation of underrepresented minority and socioeconomic groups in research by partnering with the Office of Strategic Mentoring (OSM) and Alliances for Graduate and Professoriate Program (AGEP) at Indiana University. Two vital components of this plan are (i) attracting underrepresented students through OSM/AGEP and courses addressing data mining and bioinformatics, and (ii) retaining and promoting underrepresented students by committing funding, advising and mentoring.

Project Report

Post-translational modifications (PTMs) are covalent processing events of a protein after its translation. They include covalent additions of particular chemical groups, lipids, carbohydrates or even entire proteins to amino acid side chains, as well as the enzymatic cleavage of peptide bonds. Post-translational modification is an important mechanism of regulation of protein function that serves to greatly expand an organism’s complexity while maintaining a relatively small genome size. In this multi-year project, we proposed (i) extensive data collection, (ii) developing computational models for automated PTM annotation, (iii) structural and functional characterization of PTM sites, and (iv) integration of protein bioinformatics and mass spectrometry-based proteomics to make advances in processing of PTMs. We thoroughly studied various structural and functional properties of PTMs. From the structural perspective, we demonstrated the ability of several PTMs to impact protein function through structural rearrangements and conformational selection. We also hypothesized, and provided evidence, that PTMs impact protein function through a combinatorial effect, acting as an efficient vehicle that can save a cell several-fold in gene number and speed up its response to environmental change. We extensively studied the relationship between PTMs and disease: we found that mutations of and around PTM sites may be involved in about 5%, and possibly even more, of known instances of human inherited disease and other rare disorders. Especially solid evidence is provided for glycosylation, acetylation, hydroxylation, proteolytic cleavage, and carboxylation. In addition, we found links between phosphorylation and somatic mutations, which led us to hypothesize that disruption of subtle signaling events is one of the signatures of cancer.
On the computational side, we developed algorithms for predicting about twenty different PTMs from protein sequence and structure that can be used to guide biological experiments and also to predict molecular mechanisms of disease. We used these models to improve the prediction of the effect of mutations on protein function and human phenotype. The computational models for prediction of PTMs were further expanded to other types of functional residues, such as catalytic sites, and to protein function as a whole. In one of our collaborative publications, we used protein phosphorylation and its prediction to propose that an organizing principle in the logic of integrated biological networks favors the regulation of regulatory proteins by the specific regulation they conduct (in other words, kinases are functional regulators regulated by phosphorylation). We have also studied the evolution of protein function and provided evidence that challenged one of the important assumptions related to it: that orthologous proteins are more likely to share the same function than paralogous proteins. By studying human and mouse proteins and their functional annotation, we found that even though orthologous proteins are undeniably useful for transferring protein function from one organism to another, paralogs in the same organism are more likely to have the same function than orthologs at the same sequence identity levels. This discovery led us to hypothesize that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act. Finally, we also used prediction of phosphorylation sites to improve the identification of phosphorylated peptides using tandem mass spectrometry technology. We believe this work serves as a proof of principle that computational PTM site predictors will be useful in the identification of new PTM sites.
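As a rough illustration of why site prediction can help a phosphopeptide search, the sketch below counts how many modified forms of a single peptide a search would have to consider, first with every serine, threonine and tyrosine treated as a candidate site and then with only the sites that pass a predictor threshold. The peptide, the per-site scores, the threshold and all function names are hypothetical and chosen for illustration only; this is not the project's predictor or its search pipeline.

```python
from itertools import combinations
from math import comb


def candidate_sites(peptide, residues="STY"):
    """Return 0-based positions of residues that could carry the modification."""
    return [i for i, aa in enumerate(peptide) if aa in residues]


def count_modified_forms(n_sites, max_mods):
    """Number of ways to place 0..max_mods modifications on n_sites positions."""
    return sum(comb(n_sites, k) for k in range(max_mods + 1))


def prioritized_sites(peptide, scores, threshold, residues="STY"):
    """Keep only candidate sites whose (hypothetical) predictor score passes the threshold."""
    return [i for i in candidate_sites(peptide, residues)
            if scores.get(i, 0.0) >= threshold]


def enumerate_forms(sites, max_mods):
    """Yield each modified form as a tuple of modified positions."""
    for k in range(max_mods + 1):
        yield from combinations(sites, k)


# Hypothetical peptide and made-up per-site scores (position -> score).
peptide = "ASTSYKTPSR"
scores = {1: 0.12, 2: 0.08, 3: 0.91, 4: 0.30, 6: 0.22, 8: 0.77}

all_sites = candidate_sites(peptide)
kept_sites = prioritized_sites(peptide, scores, threshold=0.5)

print("all candidate sites:", all_sites)
print("forms to search (up to 3 mods):", count_modified_forms(len(all_sites), 3))
print("sites kept by predictor:", kept_sites)
print("forms to search after pruning:", count_modified_forms(len(kept_sites), 3))
```

With these made-up numbers the count drops from 42 forms to 4 for one short peptide. The same reduction applied across a whole proteome is what is meant by shrinking the effective database size, at the cost of missing any true site that the predictor scores below the threshold.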
Overall, we believe we contributed to a better understanding of post-translational modifications and protein function as a whole, and that our efforts and results will have a positive impact not only on biology but also on the development of new personalized treatments in precision medicine. Computationally, we developed statistical inference algorithms (e.g., graphlet kernels) that can be applied in domains other than computational biology. For most of its duration, this grant supported two graduate students (one male and one female), both of whom earned their Ph.D. degrees at Indiana University and moved on to post-doctoral and industry positions. In addition, we funded several undergraduate students and helped two female students (one minority) pursue graduate school education. We received several travel awards for the students to present their papers at high-caliber scientific meetings and in 2013 received an Outstanding Student Paper award at the most prestigious computational biology conference, Intelligent Systems for Molecular Biology (ISMB).

National Science Foundation (NSF)
Division of Biological Infrastructure (DBI)
Program Officer: Anne Haake
Indiana University
United States