Microbial communities are the primary drivers of the global biogeochemical cycles that maintain the nutrient balance of ecosystems and ultimately shape overall ecosystem function. Today, we appreciate the critical role microbes play in the biochemical processing of nutrient elements, yet our understanding of how the structure of microbial communities influences the suite of biogeochemical processes within a given nutrient cycle remains rudimentary. Our limited understanding of structure-function relationships between microbial communities and biogeochemical cycles is due in large part to technological limitations in characterizing the ecology of microbial communities. Until recently, microbial ecologists simply had no means of unambiguously characterizing the richness and evenness of species within a microbial community, or the prevailing biochemical capabilities of its constituent microbial populations. Using high-throughput DNA sequencing and three methodological approaches (shotgun metagenomics, PCR amplicon sequencing, and genomics), microbial ecologists are beginning to unveil the inner workings of microbial communities and connect genetic details with biogeochemical processes. While this work holds great promise for the advancement of microbial ecology, currently available high-throughput sequencing technologies are not ideally suited to the high sample-throughput demands of ecosystem science, to the small genome size of bacteria and viruses, or to their genetic novelty. Moreover, in some cases the failure to adequately ground-truth the application of next-generation DNA sequencers to environmental DNA samples has resulted in biased data and erroneous scientific conclusions.
This "high-risk, high-reward" research seeks to explore and test the use of a new next-generation DNA sequencer, the PacBio RS, which has several attributes that may make it better suited to the specific needs of microbial ecology research and which has the potential to be highly transformative to this geoscience discipline. A series of controlled and carefully replicated experiments will test the use of PacBio sequencing for shotgun metagenomics, 16S PCR amplicon sequencing, and single-cell genome sequencing. This project will leverage existing datasets from other high-throughput sequencing platforms (e.g., Illumina and 454) to directly compare the performance of PacBio in each of these application areas. Through an NSF Major Research Instrumentation award to the University of Delaware, the PIs will have access to one of the few PacBio RS instruments available at an academic institution. Ultimately, these investigations will constrain the experimental error within PacBio sequencing and serve as an initial demonstration of the utility of the instrument for microbial ecology research. The Broader Impacts of this proposal include an effort to understand and constrain the sources of error and other biases within PacBio sequencing, and to make technical recommendations that will shape the optimal use of the instrument within microbial science. In the course of this work, the PIs will mentor a Ph.D. graduate student and a postdoctoral researcher, and provide open access to all project data and findings.

Project Report

Overview. Next-generation DNA sequencers introduced over the past seven years have given environmental researchers unprecedented access to DNA sequence data from unknown microorganisms. By and large, advances in DNA sequencing have come from providing increasingly cheap, yet shorter, sequence reads. This emphasis on cheap sequencing of short DNA fragments has been driven primarily by the demand for human genome re-sequencing, but it poses significant limitations when applied to the analysis of complex microbial communities from environmental samples. PacBio RS, a third-generation DNA sequencing technology, departs from this trend: its reads average 2,500 bp in length and are obtained from single template DNA molecules. In theory, this should significantly improve the value of genomic sequences obtained from environmental microbial communities. The central objective of this project was to determine whether the perceived advantages of PacBio translate into practice when applied to ‘real world’ research scenarios in marine microbial ecology, including: a) examination of microbial diversity using polymerase chain reaction (PCR) amplicon sequencing; b) bacterial genome sequencing, including single-cell genomic analysis; and c) analysis of the genetic composition of microbial communities using shotgun metagenomics.
Intellectual merit. At Bigelow Laboratory, our work has focused on PacBio sequencing applications in microbial single-cell genomics, a cutting-edge research technology pioneered by Bigelow scientists. By creating benchmark data sets from individual cells of previously sequenced strains of the marine cyanobacterium Prochlorococcus, we were able to efficiently evaluate various genome sequencing and assembly strategies. We tested PacBio sequencing using various DNA insert sizes and library preparation techniques, alone and in combination with Illumina sequencing (currently the predominant short-read sequencing technology). We determined that the success of genome recovery from individual cells can be predicted from the speed of single-cell whole genome amplification, a finding that will significantly improve data quality and reduce costs in this rapidly growing research area. We then applied PacBio-only, Illumina-only, and PacBio-Illumina hybrid sequencing approaches to single amplified genomes (SAGs) of ten uncultured marine bacteria and archaea in the first single-cell-genomics-based analysis of virus-host interactions in their natural environment. In this pilot study we identified a number of novel virus-host systems, which span three virus families: Myoviridae, Siphoviridae, and Podoviridae. We found these infections in diverse members of surface ocean bacterioplankton, including Verrucomicrobia, Marinimicrobia (SAR406), Proteobacteria, and Bacteroidetes. Unlike prior cultivation-independent research methods, this new approach offers a combination of several important benefits: a) the ability to identify viruses as well as their hosts; b) recovery of near-complete genomes; and c) discrimination between lytic and lysogenic interactions. Thus, our results so far demonstrate that PacBio technology offers significant new opportunities to marine microbiologists. The paucity of suitable bioinformatics tools is currently the main bottleneck to the broader application of PacBio sequencing technology in marine microbial ecology, and our current work focuses specifically on addressing this challenge.
Broader impacts. This project has provided significant training and professional development experiences for two undergraduate students, three graduate students, two postdoctoral scientists, and a bioinformatician. They have been involved in various aspects of laboratory work, computational analyses, result presentations, and manuscript preparation. In the process, they gained hands-on experience with new research areas and technologies, as well as experience in project management and mentorship roles. By ground-truthing the use of PacBio sequencing technology for research problems in marine microbial ecology, we expect that our findings will have a significant impact on the broader field of microbial ecology and other life sciences.

Agency: National Science Foundation (NSF)
Institute: Division of Ocean Sciences (OCE)
Type: Standard Grant (Standard)
Application #: 1148017
Program Officer: David L. Garrison
Budget Start: 2011-09-15
Budget End: 2013-08-31
Fiscal Year: 2011
Total Cost: $98,918
Organization: Bigelow Laboratory for Ocean Sciences
City: East Boothbay
State: ME
Country: United States
Zip Code: 04544