Intellectual Merit: Identifying the coding regions (genes) in an organism's genome is generally incomplete when it relies primarily on DNA sequence and lacks experimental verification. Retrospectively checking genome annotations against proteomics data improves their quality and completeness. Unfortunately, proteogenomically improved gene models are rarely incorporated into public databases. This project seeks to build a proteogenomics software pipeline that will enable and improve primary genome annotation. The pipeline will initially be used to re-annotate 30 prokaryotic genomes from six representative phyla: Euryarchaeota, Cyanobacteria, Actinobacteria, Firmicutes, Deinococcus-Thermus and Proteobacteria. The project anticipates establishing tens of thousands of validated gene models or verified protein maturation events. As the target genomes are improved, the corrections can be propagated to an estimated 300 additional highly homologous genomes. To ensure broad public accessibility, all findings will be incorporated into RefSeq and GenBank.

Broader Impact: The project will support education and training by developing a proteogenomics curriculum for use in university bioinformatics and genomics courses and in science workshops. In addition, two high school teachers will be mentored in curriculum development.

Agency: National Science Foundation (NSF)
Institute: Emerging Frontiers (EF)
Type: Standard Grant (Standard)
Application #: 0949047
Program Officer: Gregory W. Warr
Budget Start: 2009-09-01
Budget End: 2011-02-28
Fiscal Year: 2009
Total Cost: $707,345
Name: J. Craig Venter Institute, Inc.
City: Rockville
State: MD
Country: United States
Zip Code: 20850