Intellectual Merit: Identifying the coding regions, or genes, in an organism's genome is generally inadequate when it is based primarily on DNA sequence and lacks experimental verification. Retrospective analysis of genome annotations with proteomics data improves their quality and completeness. Unfortunately, proteogenomically improved gene models are rarely incorporated into the public knowledge base. This project seeks to build a proteogenomics software pipeline that will enable and improve primary genome annotation. The pipeline will initially be used to re-annotate 30 prokaryotic genomes from six representative phyla: Euryarchaeota, Cyanobacteria, Actinobacteria, Firmicutes, Deinococcus-Thermus and Proteobacteria. The project is expected to establish tens of thousands of validated gene models or verified protein maturation events. As the target genomes are improved, the corrections can be propagated to an estimated 300 additional, highly homologous genomes. To ensure broad public accessibility, all findings will be incorporated into RefSeq and GenBank.

Broader Impact: The project will facilitate education and training through the development of a proteogenomics curriculum to be used in bioinformatics and genomics courses at universities and science workshops. Additionally, two high school teachers will be mentored in curriculum development.

Agency: National Science Foundation (NSF)
Institute: Emerging Frontiers (EF)
Type: Standard Grant (Standard)
Application #: 1118732
Program Officer: Gregory W. Warr
Budget Start: 2010-11-01
Budget End: 2012-08-31
Fiscal Year: 2011
Total Cost: $469,988
Name: Battelle Memorial Institute
City: Richland
State: WA
Country: United States
Zip Code: 99354