Cataloging the subcellular and suborganellar proteomes of sequenced genomes

Guda, Chittibabu

Abstract

Precise knowledge of the subcellular localization of proteins is essential in systems biology research because most cellular processes are spatially constrained in the cell. This spatial context is needed to better understand the roles of proteins involved in intracellular cross-talk and in the cell signaling associated with disease pathways that span subcellular boundaries. Experimentally determined localizations are available for only about 1% of the proteins in the UniProt database. Computational methods can complement experimental efforts by determining the localization of many proteins whose localization is unknown. However, existing computational methods have limited scope and applicability, and hence are not suitable for proteome-wide prediction of localizations. Moreover, the reliability of their predictions is questionable owing to the lack of experimental validation. In this project, we propose the development of a comprehensive system that will enable us to create accurate and comprehensive catalogs of the subcellular and suborganellar proteomes of all sequenced genomes of animal species. This system is based on our recently published computational method, ngLOC, which uses 'n-gram' peptides (fixed-length subsequences of proteins) to build accurate Bayesian models for classifying proteins into subcellular and suborganellar classes. Additionally, ngLOC is well suited to proteome-wide prediction and to predicting proteins localized to multiple organelles. Building on the ngLOC approach, we propose to develop a new method that uses advanced computational concepts such as semi-supervised learning, hierarchical Bayesian classification and ensemble approaches, and that implements substitution matrices to compare n-gram homology. All of these methods have proven successful in other domains and hence are expected to substantially improve the accuracy of our method.
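To make the n-gram idea concrete, the following is a minimal Python sketch of classifying a protein sequence from its peptide n-grams with a Bayesian model. It is an illustration under stated assumptions, not ngLOC's actual implementation: it assumes a multinomial naive Bayes model with Laplace smoothing and n = 3, and the class `NgramBayes`, the function `ngrams`, the class labels and the training sequences are all hypothetical toy values.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n=3):
    """Return all overlapping fixed-length subsequences (n-grams) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NgramBayes:
    """Hypothetical multinomial naive Bayes over peptide n-grams (illustration only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha  # Laplace smoothing pseudocount (assumed value)

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)      # number of training proteins per class
        self.gram_counts = defaultdict(Counter)  # n-gram frequencies per class
        vocab = set()
        for seq, label in zip(seqs, labels):
            grams = ngrams(seq, self.n)
            self.gram_counts[label].update(grams)
            vocab.update(grams)
        self.vocab_size = len(vocab)
        self.total = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def log_scores(self, seq):
        """Log prior plus smoothed log likelihood of each n-gram, per class."""
        n_train = sum(self.class_counts.values())
        scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_train)
            denom = self.total[c] + self.alpha * self.vocab_size
            for g in ngrams(seq, self.n):
                score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy, made-up training data with two hypothetical classes.
train = ["MLRALLRRLSAR", "MRLLRSALRRLA", "PKKKRKVEDPKK", "GPKKKRKVKKRE"]
labels = ["mito", "mito", "nuc", "nuc"]
model = NgramBayes(n=3).fit(train, labels)
print(model.predict("MALRRLLSRA"))  # → mito
print(model.predict("APKKKRKVG"))   # → nuc
```

Because the model returns a log-score per class, one possible way (again an assumption, not the source's method) to flag proteins with multiple localizations is to report every class whose score lies within some margin of the best one, rather than only the top class.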
A set of 400 human proteins whose localizations are predicted by our new method will be experimentally tested in normal and cancer cell lines of human origin by expressing them as GFP fusions and visualizing them under a confocal microscope. This step will allow us to determine the prediction accuracy of our method at each score threshold for each organelle. Using the optimal score thresholds, proteome-wide predictions will be carried out, and detailed catalogs of experimentally known and predicted subcellular and suborganellar proteomes will be generated for all sequenced genomes of animal species. Additionally, a standalone software package implementing the improved method will be developed and released to the research community under the General Public License (GPL). An online web server will be developed to make predictions and to provide access to the cataloged data and the software produced in this project. In summary, the proposed system will deliver a 'gold-standard' dataset of experimentally established localizations, a novel prediction methodology, experimental validation of predicted localizations, and a public web server for making predictions and accessing the datasets and software developed in this project. These resources will be valuable to the biomedical research community in advancing many facets of systems biology research.
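The per-organelle threshold analysis described above can be sketched as follows. This sketch assumes that each experimentally tested prediction is recorded as a (score, confirmed) pair and that the "optimal" threshold is the lowest one whose precision reaches a chosen target; the source does not specify the criterion, and the function names `precision_at_thresholds` and `pick_threshold` and the example numbers are hypothetical.

```python
def precision_at_thresholds(results, thresholds):
    """For one organelle, map each threshold to (precision, number of predictions kept).

    results: list of (score, confirmed) pairs, where confirmed is True if the
    predicted localization was verified experimentally (assumed data layout).
    """
    table = {}
    for t in thresholds:
        kept = [confirmed for score, confirmed in results if score >= t]
        precision = sum(kept) / len(kept) if kept else float("nan")
        table[t] = (precision, len(kept))
    return table


def pick_threshold(table, target_precision):
    """Return the lowest threshold meeting the target precision, or None."""
    qualifying = [t for t, (p, n) in table.items() if n > 0 and p >= target_precision]
    return min(qualifying) if qualifying else None


# Hypothetical validation outcomes for a single organelle.
nucleus_results = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
                   (0.60, True), (0.50, False), (0.40, False)]
table = precision_at_thresholds(nucleus_results, [0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
print(pick_threshold(table, 0.9))  # → 0.8
```

Choosing the lowest qualifying threshold keeps as many predictions as possible in the catalog. Because precision need not rise monotonically with the threshold (in the toy data, 0.7 gives lower precision than 0.6), a stricter variant could require every higher threshold to meet the target as well.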

Public Health Relevance

Proteins are synthesized in the cytoplasm of a cell but are destined to localize to specific subcellular compartments to carry out their intended functions. A number of human diseases are caused by the mislocalization of proteins to unintended subcellular locations, which interferes with vital cellular processes. The current project proposes a comprehensive system that combines computational and experimental approaches to accurately determine the subcellular localization of proteins and to generate detailed catalogs of subcellular proteomes for all sequenced genomes of animal species. The outcomes of this project will help advance our understanding of protein localization and function and, consequently, of the causative factors of many human diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM086533-02
Application #: 7918788
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Remington, Karin A

Project Start: 2009-09-01
Project End: 2010-09-02
Budget Start: 2010-09-01
Budget End: 2010-09-02
Support Year: 2
Fiscal Year: 2010
Total Cost: $1
Indirect Cost

Institution

Name: State University of New York at Albany
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 152652822

City: Albany
State: NY
Country: United States
Zip Code: 12222

Related projects


NIH 2013 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / University of Nebraska Medical Center | $175,564
NIH 2012 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / University of Nebraska Medical Center | $218,318
NIH 2011 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / University of Nebraska Medical Center | $218,318
NIH 2010 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / State University of New York at Albany | $1
NIH 2010 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / University of Nebraska Medical Center | $147,015
NIH 2009 R01 GM | Cataloging the subcellular and suborganellar proteomes of sequenced genomes | Guda, Chittibabu / State University of New York at Albany | $151,500

Publications

Negi, Simarjeet K; Guda, Chittibabu (2017) Global gene expression profiling of healthy human brain and its application in studying neurological disorders. Sci Rep 7:897

Vural, Suleyman; Wang, Xiaosheng; Guda, Chittibabu (2016) Classification of breast cancer patients using somatic mutation profiles and machine learning approaches. BMC Syst Biol 10 Suppl 3:62

Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M et al. (2015) LocSigDB: a database of protein localization signals. Database (Oxford) 2015:

Shen, Ru; Wang, Xiaosheng; Guda, Chittibabu (2015) Discovering distinct functional modules of specific cancer types using protein-protein interaction networks. Biomed Res Int 2015:146365

Mohammed, Akram; Guda, Chittibabu (2015) Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism. BMC Genomics 16 Suppl 7:S16

Chaturvedi, Nagendra K; Mir, Riyaz A; Band, Vimla et al. (2014) Experimental validation of predicted subcellular localizations of human proteins. BMC Res Notes 7:912

Shen, Ru; Guda, Chittibabu (2014) Applied graph-mining algorithms to study biomolecular interaction networks. Biomed Res Int 2014:439476

Srinivasan, Satish M; Guda, Chittibabu (2013) MetaID: a novel method for identification and quantification of metagenomic samples. BMC Genomics 14 Suppl 8:S4

Srinivasan, Satish M; Vural, Suleyman; King, Brian R et al. (2013) Mining for class-specific motifs in protein sequence classification. BMC Bioinformatics 14:96

King, Brian R; Vural, Suleyman; Pandey, Sanjit et al. (2012) ngLOC: software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes. BMC Res Notes 5:351

Showing the most recent 10 out of 12 publications
