Genome and metagenome projects have revealed the genetic sequence of millions of proteins, whose biological interpretation requires understanding of their function. One of the most successful approaches for predicting proteins' functions is the integration of all available functional data with evolutionary relationships in a reconciled phylogenetic tree. This method, known as phylogenomics, has been heralded as highly accurate and conceptually elegant, but its application has been limited by its exquisite dependency upon painstaking analyses by domain experts. We will enhance, assess, and apply a statistical method for predicting protein function using phylogenomic principles. Our approach, known as SIFTER (Statistical Inference of Function Through Evolutionary Relationships), presently exists as a prototype. In this proposal, we will enhance the core algorithms to take account of domain architecture, to become more consistently statistical in their approach, and to accommodate a larger range of possible functions for proteins. We will improve the key internal parameters of the molecular evolution model, and improve interpretability of the results. We will make the program capable of accepting more typical protein sequences for analysis, and of using a wider range of information (including database annotations and sequence and structure motifs) as evidence of function. Ultimately, SIFTER will be capable of incorporating other function prediction approaches within its phylogenetic context. The performance of SIFTER will be rigorously assessed using well-studied families. We will collaborate with major protein databases to deploy SIFTER for medium-scale application in protein annotation. Experimental validation will be essential to truly test SIFTER's performance and, at the same time, to enrich our biological understanding of several protein families. We will use SIFTER to make an optimal selection of Nudix proteins for experimental characterization.
In addition to assaying these proteins, we will make blind predictions of the molecular function of proteins being characterized by structural genomics centers, and we will then biochemically characterize promising candidate proteins provided to us. The completed SIFTER system should provide a significant improvement over current approaches for protein function prediction, of direct relevance to nearly all molecular biologists. This work has clear and immediate significance for public health: it unlocks protein function information encoded in genome sequences. These methods will allow understanding of proteins implicated in disease and necessary for health, in humans as well as model organisms. Application of SIFTER will also permit detailed understanding of the proteins of pathogens and commensal microbiota. These methods will be a foundation for the further study of any protein identified through genome projects.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM071749-03
Application #
7595887
Study Section
Special Emphasis Panel (ZRG1-BCMB-Q (90))
Program Officer
Brazhnik, Paul
Project Start
2007-05-01
Project End
2011-04-30
Budget Start
2009-05-01
Budget End
2010-04-30
Support Year
3
Fiscal Year
2009
Total Cost
$288,800
Indirect Cost
Name
University of California Berkeley
Department
Other Basic Sciences
Type
Schools of Earth Sciences/Natur
DUNS #
124726725
City
Berkeley
State
CA
Country
United States
Zip Code
94704
Srouji, John R; Xu, Anting; Park, Annsea et al. (2017) The evolution of function within the Nudix homology clan. Proteins 85:775-811
Karačić, Zrinka; Vukelić, Bojana; Ho, Gabrielle H et al. (2017) A novel plant enzyme with dual activity: an atypical Nudix hydrolase and a dipeptidyl peptidase III. Biol Chem 398:101-112
Nguyen, Vi N; Park, Annsea; Xu, Anting et al. (2016) Substrate specificity characterization for eight putative nudix hydrolases. Evaluation of criteria for substrate identification within the Nudix family. Proteins 84:1810-1822
Jiang, Yuxiang; Oron, Tal Ronnen; Clark, Wyatt T et al. (2016) An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol 17:184
Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E (2015) SIFTER search: a web server for accurate phylogeny-based protein function prediction. Nucleic Acids Res 43:W141-7
Listgarten, Jennifer; Stegle, Oliver; Morris, Quaid et al. (2014) Personalized medicine: from genotypes and molecular phenotypes towards therapy. Pac Symp Biocomput 19:224-228
Muratore, Kathryn E; Engelhardt, Barbara E; Srouji, John R et al. (2013) Molecular function prediction for a family exhibiting evolutionary tendencies toward substrate specificity swapping: recurrence of tyrosine aminotransferase activity in the Iα subfamily. Proteins 81:1593-609
Xu, Anting; Desai, Anna M; Brenner, Steven E et al. (2013) A continuous fluorescence assay for the characterization of Nudix hydrolases. Anal Biochem 437:178-84
Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen et al. (2013) A large-scale evaluation of computational protein function prediction. Nat Methods 10:221-7
Stegle, Oliver; Brenner, Steven E; Morris, Quaid et al. (2013) Personalized medicine: from genotypes and molecular phenotypes towards computed therapy. Pac Symp Biocomput 18:171-174

Showing the most recent 10 out of 16 publications