This EAGER project to Brian Palenik, UC San Diego, is for tool development to identify previously unknown, putative genes in metagenomic sequence data. Metagenomic sequences represent the full complement of microbial genomes in a community and are difficult to analyze because of their complexity. Even the genomes of cultured organisms are difficult to annotate, as typically 50% of the genome sequence can contain unknown or novel genes. Proteomics uses mass spectrometry to analyze peptides derived from the proteolysis of samples. This analysis determines the presence or absence of specific proteins in the original sample. In some forms, proteomics can use genomic databases as a starting point for peptide and protein identification. More recently, a form of "reverse proteomics" is being tested in which mass spectrometry data are used to improve whole-genome annotation by demonstrating the presence of proteins not predicted in the initial genome annotation. This technology is now beginning to show promise in the metaproteomic analysis of complex environmental samples. This type of community-level analysis can lead to new information about not only the biodiversity but also the functioning of ecosystems. However, the coupling of mass spectrometry with genome and metagenome data to analyze environmental samples still requires tool development. This development is best done when the metagenome and metaproteome sampling can be coordinated to optimize and test the tools. The PI is developing several tools for analyzing metaproteomes using metagenomic data obtained from the same site. The tools include computational procedures to discover new open reading frames and genes, methods to identify which species are present without DNA sequencing, and an approach for analyzing post-translational modifications in Synechococcus metagenomes.
This aspect of the work is higher risk than the other aims but is important because it will allow the team to evaluate which genes in the metagenome are active. This project represents a novel interdisciplinary collaboration between a marine microbiologist and a mass spectrometrist, and it is timely because it takes advantage of ongoing metagenome sequencing funded by another NSF proposal. If successful, these tools will allow for the identification of activated genes within communities. Further, novel genes will be identified in the metagenomes, along with the products of these genes. The research team will focus on marine cyanobacterial communities as the test metagenome. The PI has already sequenced the genomes of two marine cyanobacteria of the genus Synechococcus, and he will use these as reference genomes for the metagenomic/metaproteomic analyses. Synechococcus assemblages in the ocean are not very diverse, and these populations can be enriched from seawater to produce nearly pure samples for proteomic analysis; Synechococcus metagenomes are therefore among the best metagenomes for these tests. The PI is overseeing the training of two postdocs under this award.

Agency: National Science Foundation (NSF)
Institute: Emerging Frontiers (EF)
Type: Standard Grant (Standard)
Application #: 0938190
Program Officer: Karen C. Cone
Project Start:
Project End:
Budget Start: 2009-08-01
Budget End: 2011-07-31
Support Year:
Fiscal Year: 2009
Total Cost: $87,596
Indirect Cost:
Name: University of California-San Diego Scripps Inst of Oceanography
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093