The gap between the number of proteins with known sequence and the number with well-studied function (the sequence-function gap) is growing daily. One well-defined, coarse-grained aspect of function is the native subcellular localization of a protein, which has a central role in the Gene Ontology (GO) hierarchy. Many detailed and high-throughput experiments annotate localization. Where experiments do not reach, homology-based and de novo prediction methods can supply annotations. Here, we propose to develop a comprehensive system that combines experimental resources with data mining techniques and novel prediction methods, with the objective of annotating localization for entirely sequenced eukaryotes at unprecedented detail and accuracy. First, we propose to gather all available data and all relevant methods to build a comprehensive localization atlas for human and Arabidopsis. Second, we plan to develop novel methods tailored specifically to the proteins for which no reliable annotations remain after the first step. We expect these methods to focus on predicting the particular type of membrane into which an integral membrane protein is inserted, and the native localization for minor eukaryotic compartments (ER, Golgi, lysosome). Third, we propose to implement specific improvements over today's motif-based methods for secreted and nuclear proteins, and to extend de novo predictions for the major compartments. An important objective will be to maintain high levels of performance for splice variants and for sequence fragments. Overall, the project will require the analysis of existing biological databases, the development of novel methods, and the combination of existing ones; it will generate novel information available through internet servers, standalone programs, and databases.

Public Health Relevance

The annotations generated by our system will aid the design of detailed and high-throughput experimental studies. In particular, localization may become increasingly relevant as one essential feature used to infer networks of interactions. The ultimate goal of our project is to generate an atlas that maps all proteins in a cell. Eventually, this atlas will constitute a 4D map; it will localize proteins in their 3D cellular environments and resolve the coarse-grained dynamics of the system, e.g. "expression on ribosomes, bind importin, transport into nucleus, bind DNA, bind exportin, export out of nucleus; next cell cycle". The components proposed here constitute one crucial building block toward such a 4D map of a cell.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-MSFD-N (01))
Program Officer: Remington, Karin A
Columbia University (N.Y.)
Schools of Medicine
New York
United States
Rastogi, Shruti; Rost, Burkhard (2011) LocDB: experimental annotations of localization for Homo sapiens and Arabidopsis thaliana. Nucleic Acids Res 39:D230-4
Schaefer, Christian; Schlessinger, Avner; Rost, Burkhard (2010) Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be. Bioinformatics 26:625-31
Rastogi, Shruti; Rost, Burkhard (2010) Bioinformatics predictions of localization and targeting. Methods Mol Biol 619:285-305
Bromberg, Yana; Yachdav, Guy; Ofran, Yanay et al. (2009) New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'. Curr Opin Drug Discov Devel 12:408-19
Bertonati, Claudia; Punta, Marco; Fischer, Markus et al. (2009) Structural genomics reveals EVE as a new ASCH/PUA-related domain. Proteins 75:760-73
Wrzeszczynski, Kazimierz O; Rost, Burkhard (2009) Cell cycle kinases predicted from conserved biophysical properties. Proteins 74:655-68
Kernytsky, Andrew; Rost, Burkhard (2009) Using genetic algorithms to select most predictive protein features. Proteins 75:75-88
Lippi, Marco; Passerini, Andrea; Punta, Marco et al. (2008) MetalDetector: a web server for predicting metal-binding sites and disulfide bridges in proteins from sequence. Bioinformatics 24:2094-5
Ofran, Yanay; Schlessinger, Avner; Rost, Burkhard (2008) Automated identification of complementarity determining regions (CDRs) reveals peculiar characteristics of CDRs and B cell epitopes. J Immunol 181:6230-5