High-throughput phylogenomic analysis of animal proteins

Sjolander, Kimmen

Abstract

With the completion of the sequencing of the first multicellular eukaryotic genome, Caenorhabditis elegans, in 1998, the Drosophila melanogaster genome in 2000, the human genome in 2001, and the pending completion of the mouse genome, investigators in animal genomics are facing new challenges in high-throughput analysis of the proteins encoded by these genes. Biologists increasingly rely on computational methods for protein function prediction, both for first-pass annotation and to prioritize wet-bench experiments. However, most of these methods do not provide sufficient information to enable informed prediction of specific protein function, and some of them produce systematic errors, particularly those that predict function by homology based on simple pairwise sequence comparison. It has become clear that phylogenomic analysis (function inference based on phylogenetic analysis of a protein in the context of its family members) is critical for accurate functional annotation. While phylogenomic analysis has been applied to a number of protein families, a large-scale phylogenomic analysis of proteins in animal genomes has not yet been made available to scientists in the public sector. The work outlined in this proposal is designed to address this need, and to be complementary to existing tools. All proteins from animal genomes will be clustered into families based on global sequence similarity, and homologs will be gathered from other organisms. For each group, a multiple sequence alignment, phylogenetic tree, and subfamily classifications will be produced. Hidden Markov models will be generated to provide high-throughput classification ability, one for each protein family, and one for each subfamily identified. A web server will be created to enable investigators in both the private and public sectors to submit sequences for classification against these hidden Markov models, and a graphical user interface will display the correlation of changes in protein sequence with changes in structure and function.
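
The clustering step described in the abstract (grouping proteins into families by global sequence similarity) can be illustrated with a small sketch. This is not the project's actual pipeline: the function names `similarity` and `cluster_families`, the 0.7 threshold, and the use of Python's `difflib` ratio as a crude stand-in for a global alignment score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for a global alignment score, in the range [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_families(seqs: dict[str, str], threshold: float = 0.7) -> list[set[str]]:
    """Single-linkage clustering: two sequences share a family if they are
    connected by a chain of pairs whose similarity meets the threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    families: dict[str, set[str]] = {}
    for name in names:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Hypothetical toy sequences, not real proteins.
toy = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSRA",
    "c": "GGGGGGGGGGPPPPPPPPPP",
}
families = cluster_families(toy)
```

In a real setting the similarity function would come from an alignment tool, and each resulting family would then be aligned, used to build a tree, and summarized by hidden Markov models as the abstract describes.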

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG002769-01
Application #: 6597199
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J

Project Start: 2003-09-30
Project End: 2008-06-30
Budget Start: 2003-09-30
Budget End: 2004-06-30
Support Year: 1
Fiscal Year: 2003
Total Cost: $355,018
Indirect Cost

Institution

Name: University of California Berkeley
Department: Biomedical Engineering
Type: Schools of Engineering
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2007 R01 HG	High-throughput phylogenomic analysis of animal proteins	Sjolander, Kimmen / University of California Berkeley	$349,600
NIH 2006 R01 HG	High-throughput phylogenomic analysis of animal proteins	Sjolander, Kimmen / University of California Berkeley	$358,341
NIH 2005 R01 HG	High-throughput phylogenomic analysis of animal proteins	Sjolander, Kimmen / University of California Berkeley	$365,223
NIH 2004 R01 HG	High-throughput phylogenomic analysis of animal proteins	Sjolander, Kimmen / University of California Berkeley	$364,352
NIH 2003 R01 HG	High-throughput phylogenomic analysis of animal proteins	Sjolander, Kimmen / University of California Berkeley	$355,018

Publications

Sankararaman, Sriram; Sha, Fei; Kirsch, Jack F et al. (2010) Active site prediction using evolutionary and structural information. Bioinformatics 26:617-24

Alterovitz, Ron; Arvey, Aaron; Sankararaman, Sriram et al. (2009) ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinformatics 10:197

Datta, Ruchira S; Meacham, Christopher; Samad, Bushra et al. (2009) Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res 37:W84-9

Sankararaman, Sriram; Kolaczkowski, Bryan; Sjolander, Kimmen (2009) INTREPID: a web server for prediction of functionally important residues by evolutionary analysis. Nucleic Acids Res 37:W390-5

Krishnamurthy, Nandini; Brown, Duncan; Sjolander, Kimmen (2007) FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function. BMC Evol Biol 7 Suppl 1:S12

Glanville, Jake Gunn; Kirshner, Dan; Krishnamurthy, Nandini et al. (2007) Berkeley Phylogenomics Group web servers: resources for structural phylogenomic analysis. Nucleic Acids Res 35:W27-32

Brown, Duncan P; Krishnamurthy, Nandini; Sjolander, Kimmen (2007) Automated protein subfamily identification and classification. PLoS Comput Biol 3:e160

Brown, Duncan; Krishnamurthy, Nandini; Dale, Joseph M et al. (2005) Subfamily hmms in functional genomics. Pac Symp Biocomput :322-33

Krishnamurthy, Nandini; Sjolander, Kimmen (2005) Phylogenomic inference of protein molecular function. Curr Protoc Bioinformatics Chapter 6:Unit 6.9

Sjolander, Kimmen (2004) Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics 20:170-9
