The number of available genome sequences from the bacterial family Enterobacteriaceae is reaching a threshold where comparative genomics can drive hypotheses and experiments. In this project we have selected genomes for sequencing based on their pathogenicity and their taxonomic position. These sequences will help us understand these and other related pathogens by defining their differences and similarities in gene content. (1) The genome sequence of S. enterica serovar Paratyphi A (SPA), already sampled to 97 percent coverage, will be completed and annotated. SPA is the second most prevalent cause of typhoid and, like S. enterica serovar Typhi (STY), is restricted to humans. STY is undergoing genome degradation, perhaps associated with its recent adaptation to a narrow host range; we will determine whether SPA is undergoing similar degradation. Klebsiella pneumoniae is a major opportunistic pathogen. We have sequenced this genome to 8-fold coverage; it will be closed, finished and annotated. (2) Cost-effective four-fold sampling (97 percent coverage) will be performed for four genomes: a biotype of S. enterica Paratyphi B (SPB), which is the third most prevalent cause of typhoid and is host-adapted to humans; S. enterica Arizonae (SAR), the most distantly related S. enterica that regularly causes disease in humans; and Citrobacter koseri (CKO) and Enterobacter cloacae (ECL), both of which are opportunistic pathogens representing the unsequenced genera within or adjacent to the E. coli/Salmonella/Klebsiella clade. Web-based analysis tools that take into account the incomplete nature of the samples will be used to present these data in comparison to other related genomes. Finally, (3) we have amplified and arrayed the complete open reading frames of nearly every CDS in S. enterica subspecies 1, serovar Typhimurium LT2 (STM).
This resource will be supplemented with new putative CDSs, not found in STM, as these sequences become available from STY, SPA, SPB, and other serovars of S. enterica. Thus, we will develop an array that can be used in a wide variety of Salmonella, both sequenced and unsequenced, for analysis of expression and of genome content.