Breakthrough technology that enables rapid sequencing of whole genomes from single cells will be used in this project to unlock secrets of life's early evolution and genealogy. Prior to this breakthrough, many of life's earliest diverging lineages could not be analyzed because they cannot be grown in laboratory cultures. This unculturable "Microbial Dark Matter" (MDM) within the groups Bacteria and Archaea constitutes the majority of biological species diversity and biological mass on Earth, and it can now be brought to light by sequencing single amplified genomes (SAGs). In this unprecedented research project, many new genomes, encoding the blueprints for disparate organisms, will be analyzed to illuminate (1) the genealogy of these ancient microbial species, (2) the relative timing of their origins, and (3) the role that horizontal gene transfer among distant relatives may have played in the origins of new species and novel ways of living. Further, extensive MDM sampling will be conducted in unexplored subterranean environments around the world, enhancing the discovery and understanding of life forms that are new to science.
A team of collaborating scientists will analyze over 100 MDM-rich field samples collected from multiple study sites, including Precambrian shield environments of the Kaapvaal Craton of South Africa, a Precambrian metamorphic complex accessed via the Sanford Underground Research Facility in South Dakota, and fault-controlled springs associated with Tertiary volcanic areas and Cambrian-aged sedimentary rocks of the Death Valley Regional Flow System in Nevada. Extensive contextual information (environmental and biological metadata) will also be incorporated in these sampling efforts. The research tasks begin with the collection of field samples and metadata from sites known or suspected to contain a high proportion of MDM. The microbial community composition of the collected samples will then be surveyed by sequencing their small subunit (SSU) rRNA gene iTags. Based on the iTag data, the 20 samples that best meet the objectives of this project will be selected, and 12,600 SAGs from them will be sequenced. The SAGs will then be identified by their SSU rRNA genes, and the 800 MDM SAGs that best meet the objectives of this project will be further annotated, aligned, and combined with various publicly available data sets for use in detailed statistical phylogenetic reconstruction of a comprehensive genealogy for Bacteria and Archaea. The genealogical inferences, in combination with other data layers important to understanding MDM evolution (including geochemical, geospatial, microbial community composition, and microbial physiology data), will be compiled and used to address the project's general evolutionary questions, as noted above. Analyses will also address the potential for the deep subsurface environments of Earth to serve as a repository for the early evolutionary history of Bacteria and Archaea. This project leverages an existing major award to the researchers through the DOE Joint Genome Institute to cover sequencing costs.
The project has a rich education and outreach component geared toward research experiences for undergraduate and graduate students, postdoctoral researchers, and high school teachers. Engaging outreach activities and media designed for the general public are also planned. Further, the laboratory and computational tools and the massive genomic data sets produced by this project will provide a major resource to the broad research community and, potentially, to the biotechnology industry.