The Berkeley Phylogenomics Group, headed by Dr Kimmen Sjolander, has joined forces with top investigators in microbiology, computational structural biology, and genomics to create an online phylogenomic encyclopedia of microbial gene families. This resource will enable biologists to predict the function, biological process and 3D structure of millions of proteins encoded in microbial genomes.

When a gene is sequenced, the work has only just begun. Genome functional annotation and analysis start where gene sequencing efforts end. The next (and much more challenging) questions involve understanding the function of the protein the gene encodes: What biological processes or pathways does this gene (or protein) participate in? What is its role in these processes or pathways? What is the protein's 3D structure?

Phylogenomics, the study of genomes in an evolutionary framework, is the most powerful approach to date for answering these questions. It is also extremely challenging technically and computationally, and requires expertise in myriad bioinformatics tasks. These requirements have severely limited the application of phylogenomic approaches to predicting gene function, despite overwhelming evidence that standard approaches to functional annotation are prone to serious systematic error and that phylogenomic inference can prevent these errors.

The resource to be developed under this grant, the PhyloFacts Microbial Encyclopedia, will address these problems by providing pre-computed phylogenomic analyses of millions of microbial genes. The PhyloFacts Microbial Encyclopedia will incorporate powerful new bioinformatics methods to reconstruct the evolutionary histories of these ancient gene families, predict protein structure, molecular function and cellular localization, and link genes to metabolic networks and signalling pathways.

The PhyloFacts microbial phylogenomic encyclopedia will contain tens of thousands of phylogenetic trees for microbial gene families. These data will help biologists understand the biological mechanisms underlying the evolution of microbial genomes, assist in the identification of horizontal gene transfer events, and provide a framework for understanding how gene families evolve following gene duplication, domain shuffling, and gene fusion and fission events. The pathway discovery and analysis modules of the PhyloFacts resource will help biologists discover novel aspects of microbial biochemistry, metabolism, development and cellular biology. All data will be provided on the web, together with interactive graphical user interfaces that enable biologists to view protein structures, manipulate and annotate phylogenetic trees, and collaborate in the functional annotation of protein families in their area of expertise. New sequences generated by genome sequencing projects will be classified into families and subfamilies using a database of hidden Markov models, statistical models representing the preferred amino acids at each position in the consensus structure of these macromolecules. This online collaboratory for the microbial scientific community will provide a foundation for scientists working on different microbial species and gene families to share their expertise and thus advance the pace of biological discovery.
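
The idea of classifying a new sequence against position-specific statistical models can be illustrated with a deliberately simplified sketch. It is not the PhyloFacts method or a full profile hidden Markov model: it keeps only per-column amino-acid preferences (no insert or delete states), assumes the query is already aligned to each family, and uses made-up family names, probabilities, a uniform background, and a fixed floor probability, all of which are assumptions for illustration only.

```python
import math

# Hypothetical toy profiles (assumption): one dict of preferred amino acids
# and their probabilities per alignment column, for two made-up families.
PROFILES = {
    "family_A": [
        {"M": 0.9, "L": 0.1},
        {"K": 0.7, "R": 0.3},
        {"V": 0.6, "I": 0.4},
    ],
    "family_B": [
        {"M": 0.8, "V": 0.2},
        {"D": 0.5, "E": 0.5},
        {"G": 0.9, "A": 0.1},
    ],
}

BACKGROUND = 1.0 / 20  # assumption: uniform background over 20 amino acids
FLOOR = 0.01           # assumption: probability for residues unseen at a column


def log_odds_score(sequence, profile):
    """Sum per-position log-odds of the sequence under one family profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence must be pre-aligned to the profile length")
    score = 0.0
    for residue, column in zip(sequence, profile):
        p = column.get(residue, FLOOR)
        score += math.log(p / BACKGROUND)
    return score


def classify(sequence, profiles=PROFILES):
    """Return the best-scoring family name and the scores for all families."""
    scores = {name: log_odds_score(sequence, prof)
              for name, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    family, scores = classify("MKV")
    print(family, scores)
```

A real profile hidden Markov model additionally models insertions and deletions relative to the consensus and is trained from a multiple sequence alignment; the sketch only shows how position-specific preferences turn into a family assignment.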

Agency: National Science Foundation (NSF)
Institute: Division of Molecular and Cellular Biosciences (MCB)
Type: Standard Grant (Standard)
Application #: 0732065
Program Officer: Gregory W. Warr
Project Start:
Project End:
Budget Start: 2007-11-01
Budget End: 2011-10-31
Support Year:
Fiscal Year: 2007
Total Cost: $1,899,499
Indirect Cost:

Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704