The human body is host to a diverse array of microorganisms. Collectively known as the human microbiota, our microbial cells are a significant factor in human health, not only through causing disease but also by promoting wellness. The microbiota must be understood to fight antibiotic resistance, to stop the rise of autoimmune diseases, and to deliver precision medicine. Microbes also carry information about our ancestry and environmental exposures. Metagenomic sequencing allows researchers to explore which microbes are present and what they are doing. But the field has only recently begun to link differences in metagenomic data to specific microbes, because strain-level analysis is computationally and statistically challenging. The goal of this project is to develop efficient and accurate computational methods for studying the human microbiome at the strain level, so that the full extent of variation in our microbial cells and its association with our biology can be elucidated. The novel associations discovered could be used to identify microbial biomarkers for diagnosis and personalized treatments or to design microbiome-targeted drugs, prebiotics, and probiotics. The tools developed will also be useful for characterizing strain-level variation in microbes from environments such as soils and oceans. The investigators will use the cyberinfrastructure and discoveries from this project in graduate teaching and for outreach and communication through public media.
This project will create new statistical methods, models, and software for microbiome research that enable characterization of gene copy number and single nucleotide variants of the microbial strains in shotgun metagenomes. The investigators hypothesize that strain-level analysis will reveal cryptic diversity and associations between host and microbes that are missed by approaches that ignore differences in gene content amongst strains. The researchers have discovered massive differences in gene content (<50% shared genes) between strains of common human-associated bacteria from different people, which no doubt have functional consequences. Microbial species therefore are not sufficient biomarkers for precision medicine and evolutionary studies of the microbiome, because a person's strain may not harbor the pathways (e.g., for pathogenicity or drug metabolism) identified in another strain of the same species. With novel tools for quantifying microbiome genomic diversity, thousands of publicly available metagenomes will be analyzed to comprehensively assess the global population structure of human-associated microbes. By comparing this biogeography across species with different functional capabilities and correlating it with human traits and demography, the aim is to discover how microbes adapt to and affect diverse human hosts.