Microbial population genetic research has been crucial for understanding pathogen dynamics, virulence, host specificity, and many other topics; in many cases uncovering unexpected and transformative biological processes. However, conventional population genetic analyses are limited by the quantity of sequence data from each sample. The temporal, spatial, and evolutionary resolution of techniques that rely on single gene sequences or multi-locus sequence typing are often insufficient to study biological processes on fine scales, precisely the scales at which many evolutionary and mechanistic process occur. Population genomics offers a vast quantity of sequence information for inferring evolutionary and ecological processes on very fine spatial and temporal scales, inferences that are critical to understanding and eventually controlling many infectious diseases. The promise of population genomics is tempered, however, by difficulties in isolating and preparing microbes for next-generation sequencing. We have developed the selective whole genome amplification (SWGA) technology to sequence microbial genomes from complex biological specimens without relying on labor-intensive laboratory culture, even if the focal microbial genome constitutes only a miniscule fraction of the natural sample. The primary hindrance to popular adoption of SWGA for microbial genomic studies is not its effectiveness in producing samples suitable for next-generation sequencing but in the upfront investment needed to develop an effective protocol to amplify the genome of a specific microbial species. Identifying an SWGA protocol that consistently results in selective and even amplification across the target genome is currently hindered by computationally-inefficient software that can evaluate a very limited set of the potentially effective solutions. Further, this software uses marginally-effective optimality criteria as there is currently only a limited understanding of the true criteria that result in highly-selective and even amplification of a target genome. As a result, SWGA protocol development is currently costly in both time and resources. A primary goal of the proposed research is to identify the criteria that result in optimal SWGA by analyzing next- generation sequencing data with advanced machine learning techniques. These optimality criteria will be integrated into a freely-available, computationally-efficient swga development program that will reduce the upfront investment in SWGA protocol development, thus allowing researchers to address medically- and biologically-important questions in any microbial species. In the near term, this project will also generate effective SWGA protocols for four microbial species which can be used immediately to address fundamental questions in evolutionary biology, disease progression, and emerging infectious disease dynamics. From a global disease perspective, this work is imperative as the majority of microbial species cannot easily be cultured and are in danger of becoming bystanders in the genomics revolution that is currently elucidating evolutionary processes and molecular mechanisms in cultivable microbial species.

Public Health Relevance

Addressing many of the major outstanding questions about pathogen evolution will require analyses of populations of microbial genomes. Although population genomic studies would provide the analytical resolution to investigate evolutionary and mechanistic processes on fine spatial and temporal scales ? precisely the scales at which these processes occur ? microbial population genomic research is currently hindered by the practicalities of obtaining sufficient quantities of genomes to analyze. We propose to develop an innovative, cost-effective, practical, and publically-available technology to collect sufficient quantities of microbial genomic DNA necessary for next-generation microbial genome sequencing.

National Institute of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Exploratory/Developmental Grants (R21)
Project #
Application #
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Gezmu, Misrak
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Pennsylvania
Schools of Arts and Sciences
United States
Zip Code