Protein function prediction by statistical phylogenomics

Brenner, Steven

Abstract

Genome and metagenome projects have revealed the genetic sequence of millions of proteins, whose biological interpretation requires understanding of their function. One of the most successful approaches for predicting proteins' functions is the integration of all available functional data with evolutionary relationships in a reconciled phylogenetic tree. This method, known as phylogenomics, has been heralded as highly accurate and conceptually elegant, but its application has been limited by its exquisite dependency upon painstaking analyses by domain experts. We will enhance, assess, and apply a statistical method for predicting protein function using phylogenomic principles. Our approach, known as SIFTER (Statistical Inference of Function Through Evolutionary Relationships), presently exists as a prototype. In this proposal, we will enhance the core algorithms to take account of domain architecture, to become more consistently statistical in their approach, and to accommodate a larger range of possible functions for proteins. We will improve the key internal parameters of the molecular evolution model, and improve interpretability of the results. We will make the program capable of accepting more typical protein sequences for analysis, and of using a wider range of information (including database annotations and sequence and structure motifs) as evidence of function. Ultimately, SIFTER will be capable of incorporating other function prediction approaches within its phylogenetic context. The performance of SIFTER will be rigorously assessed using well-studied families. We will collaborate with major protein databases to deploy SIFTER for medium-scale application in protein annotation. Experimental validation will be essential to truly test SIFTER's performance and, coincidentally, enrich our biological understanding of several protein families. We will use SIFTER to make an optimal selection of Nudix proteins for experimental characterization.
In addition to assaying these proteins, we will also make blind predictions of molecular function of proteins being characterized by structural genomics centers, and we will then biochemically characterize promising candidate proteins provided to us. The completed SIFTER system should provide a significant improvement over current approaches for protein function prediction, of direct relevance to nearly all molecular biologists. The significance of this work for public health is clear and immediate: it unlocks protein function information encoded in genome sequences. These methods will allow understanding of proteins implicated in disease and necessary for health, in humans as well as model organisms. Application of SIFTER will also permit detailed understanding of pathogens' and commensal microbiota's proteins. These methods will be a foundation for the further study of any protein identified through genome projects.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM071749-01A2
Application #: 7264715
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (90))
Program Officer: Li, Jerry

Project Start: 2007-05-01
Project End: 2011-04-30
Budget Start: 2007-05-01
Budget End: 2008-04-30
Support Year: 1
Fiscal Year: 2007
Total Cost: $288,800
Indirect Cost

Institution

Name: University of California Berkeley
Department: Other Basic Sciences
Type: Schools of Earth Sciences/Natur
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2010 R01 GM	Protein function prediction by statistical phylogenomics Brenner, Steven E. / University of California Berkeley	$285,912
NIH 2009 R01 GM	Protein function prediction by statistical phylogenomics Brenner, Steven E. / University of California Berkeley	$288,800
NIH 2009 R01 GM	Protein function prediction by statistical phylogenomics Brenner, Steven E. / University of California Berkeley	$266,184
NIH 2008 R01 GM	Protein function prediction by statistical phylogenomics Brenner, Steven E. / University of California Berkeley	$288,800
NIH 2007 R01 GM	Protein function prediction by statistical phylogenomics Brenner, Steven E. / University of California Berkeley	$288,800

Publications

Srouji, John R; Xu, Anting; Park, Annsea et al. (2017) The evolution of function within the Nudix homology clan. Proteins 85:775-811

Karačić, Zrinka; Vukelić, Bojana; Ho, Gabrielle H et al. (2017) A novel plant enzyme with dual activity: an atypical Nudix hydrolase and a dipeptidyl peptidase III. Biol Chem 398:101-112

Nguyen, Vi N; Park, Annsea; Xu, Anting et al. (2016) Substrate specificity characterization for eight putative nudix hydrolases. Evaluation of criteria for substrate identification within the Nudix family. Proteins 84:1810-1822

Jiang, Yuxiang; Oron, Tal Ronnen; Clark, Wyatt T et al. (2016) An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol 17:184

Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E (2015) SIFTER search: a web server for accurate phylogeny-based protein function prediction. Nucleic Acids Res 43:W141-7

Listgarten, Jennifer; Stegle, Oliver; Morris, Quaid et al. (2014) PERSONALIZED MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS THERAPY. Pac Symp Biocomput 19:224-228

Muratore, Kathryn E; Engelhardt, Barbara E; Srouji, John R et al. (2013) Molecular function prediction for a family exhibiting evolutionary tendencies toward substrate specificity swapping: recurrence of tyrosine aminotransferase activity in the Iα subfamily. Proteins 81:1593-609

Xu, Anting; Desai, Anna M; Brenner, Steven E et al. (2013) A continuous fluorescence assay for the characterization of Nudix hydrolases. Anal Biochem 437:178-84

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen et al. (2013) A large-scale evaluation of computational protein function prediction. Nat Methods 10:221-7

Stegle, Oliver; Brenner, Steven E; Morris, Quaid et al. (2013) PERSONALIZED MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS COMPUTED THERAPY. Pac Symp Biocomput 18:171-174

Showing the most recent 10 out of 16 publications
