With the sequencing of the first multicellular eukaryotic genome, Caenorhabditis elegans, completed in 1998, the Drosophila melanogaster genome in 2000, the human genome in 2001, and the mouse genome nearing completion, investigators in animal genomics face new challenges in high-throughput analysis of the proteins encoded by these genes. Biologists increasingly rely on computational methods for protein function prediction, both for first-pass annotation and to prioritize wet-bench experiments. However, most of these methods do not provide enough information to support an informed prediction of specific protein function, and some introduce systematic errors, particularly those that predict function by homology based on simple pairwise sequence comparison. It has become clear that phylogenomic analysis (function inference based on phylogenetic analysis of a protein in the context of its family members) is critical for accurate functional annotation. While phylogenomic analysis has been applied to a number of protein families, a large-scale phylogenomic analysis of proteins in animal genomes has not yet been made available to scientists in the public sector. The work outlined in this proposal is designed to address this need and to complement existing tools. All proteins from animal genomes will be clustered into families based on global sequence similarity, and homologs will be gathered from other organisms. For each group, a multiple sequence alignment, a phylogenetic tree, and subfamily classifications will be produced. Hidden Markov models will be generated to provide high-throughput classification ability: one for each protein family and one for each subfamily identified.
A web server will be created to enable investigators in both the private and public sectors to submit sequences for classification against these hidden Markov models, and a graphical user interface will display the correlation of changes in protein sequence with changes in structure and function.
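The two-level classification described above (family first, then subfamily) can be illustrated with a minimal sketch. This is not the project's implementation: it swaps profile hidden Markov models for much simpler ungapped log-odds profiles, and the family names, subfamily names, and sequences in `FAMILIES` are invented toy data. It assumes equal-length sequences over the 20 standard amino acids.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

# Hypothetical toy data: family -> subfamily -> ungapped aligned sequences.
FAMILIES = {
    "kinase_like": {"sub_a": ["ACDEF", "ACDEY"], "sub_b": ["ACKEF", "ACKEW"]},
    "protease_like": {"sub_c": ["GHIKL", "GHIKM"], "sub_d": ["GHLKL", "GHLRL"]},
}


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build per-column log-odds scores against a uniform background.

    A simplified stand-in for a profile HMM: no insert or delete states.
    """
    background = 1.0 / len(ALPHABET)
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the column log-odds; sequences of the wrong length score -inf."""
    if len(seq) != len(profile):
        return float("-inf")
    return sum(column[aa] for column, aa in zip(profile, seq))


def classify(seq, families):
    """Pick the best-scoring family, then the best subfamily within it."""
    family_profiles = {
        name: build_profile([s for seqs in subs.values() for s in seqs])
        for name, subs in families.items()
    }
    best_family = max(family_profiles, key=lambda n: score(family_profiles[n], seq))
    sub_profiles = {
        name: build_profile(seqs) for name, seqs in families[best_family].items()
    }
    best_sub = max(sub_profiles, key=lambda n: score(sub_profiles[n], seq))
    return best_family, best_sub


print(classify("ACKEF", FAMILIES))
```

Scoring against the family-level model first narrows the search, so the subfamily models only need to separate closely related sequences. A real pipeline would use gapped profile HMMs built from a multiple sequence alignment rather than this fixed-length scoring.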
Sankararaman, Sriram; Sha, Fei; Kirsch, Jack F et al. (2010) Active site prediction using evolutionary and structural information. Bioinformatics 26:617-24
Alterovitz, Ron; Arvey, Aaron; Sankararaman, Sriram et al. (2009) ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinformatics 10:197
Datta, Ruchira S; Meacham, Christopher; Samad, Bushra et al. (2009) Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res 37:W84-9
Sankararaman, Sriram; Kolaczkowski, Bryan; Sjolander, Kimmen (2009) INTREPID: a web server for prediction of functionally important residues by evolutionary analysis. Nucleic Acids Res 37:W390-5
Glanville, Jake Gunn; Kirshner, Dan; Krishnamurthy, Nandini et al. (2007) Berkeley Phylogenomics Group web servers: resources for structural phylogenomic analysis. Nucleic Acids Res 35:W27-32
Brown, Duncan P; Krishnamurthy, Nandini; Sjolander, Kimmen (2007) Automated protein subfamily identification and classification. PLoS Comput Biol 3:e160
Krishnamurthy, Nandini; Brown, Duncan; Sjolander, Kimmen (2007) FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function. BMC Evol Biol 7 Suppl 1:S12
Brown, Duncan; Krishnamurthy, Nandini; Dale, Joseph M et al. (2005) Subfamily HMMs in functional genomics. Pac Symp Biocomput :322-33
Krishnamurthy, Nandini; Sjolander, Kimmen (2005) Phylogenomic inference of protein molecular function. Curr Protoc Bioinformatics Chapter 6:Unit 6.9
Sjolander, Kimmen (2004) Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics 20:170-9