Deep sequencing of microbial populations is a potentially powerful probe of their diversity and their dynamical evolutionary history. Unfortunately, the ability to analyze deep sequencing data and to infer information in the presence of errors has lagged far behind DNA sequencing capabilities. The difficulties are particularly pronounced for investigating the fine-scale diversity in a population of closely related microbes. The investigator and his colleagues will develop new algorithms for extracting reliable information on the genomic diversity of microbial populations and will analyze the short-term dynamical evolutionary processes that can generate such diversity. The algorithms will be based on modeling the processes that produce errors and biases in deep sequencing data, primarily the PCR amplification of the DNA and the sequencing itself. But to make useful inferences about fine-scale diversity, a far better understanding is needed of the evolutionary dynamics of large microbial populations. Thus a spectrum of evolutionary scenarios will be modeled and analyzed, focusing on the diversity and the clues it may give to the evolutionary history. The algorithms developed for disentangling the diversity from errors will then be refined and adapted to use the expectations from the evolutionary modeling as prior information and thereby distinguish between different scenarios. This will include developing optimized strategies for depth, breadth, and timing of DNA sequencing.

Evolution of animals is usually very slow, but bacteria and viruses evolve extremely fast, and this evolution leads to major threats to humans. For example, the evolution of usually innocuous bacteria within children with cystic fibrosis is what eventually leads to their premature death, and, on a global scale, evolution of influenza is what causes new epidemics. Better understanding and observation of the evolution of pathogens are sorely needed. In the laboratory, bacteria and viruses also evolve, and this evolution can be directed. Although artificial evolution can lead to many benefits, such as bacteria that eat pollutants, it can also be used for nefarious purposes. A crucial capability, for example in investigating the anthrax attacks ten years ago, is determining the evolutionary history from samples: when, where, and how they evolved. Fortunately, DNA sequencing has become so inexpensive that one can not only sequence many individual bacteria or viruses, but also sequence whole populations. This enables direct observation of the evolution of a population, of the spectrum of differences among individuals that provides the variation on which natural selection acts, and of clues to the evolutionary history. But DNA sequencing produces many errors, which make extraction of the useful information exceedingly difficult. This project will develop new algorithms for disentangling the actual DNA sequences from the errors. In parallel, sophisticated mathematical modeling will be used to explore various possible evolutionary histories and the resulting sequence variations. These will be put together to develop strategies for optimal use of DNA sequencing for inferring key aspects of the evolution of bacterial and viral populations, and for understanding and predicting their consequences.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Leland M. Jameson
Stanford University
Palo Alto
United States