Improved Analysis of Metagenomes through the application of Read-Sized Profile HMMs to Marker Gene Subsequences

Selengut, Jeremy

Abstract

The study of the human microbiome, with its multitudes of host-associated organisms, holds great promise for increasing our understanding of human health and disease. Because metagenomic sequence data are fragmented and unlinked from genome-of-origin information, the particular challenge of metagenomics is how to provide reliable functional annotation and taxonomic assignment. Here we address these issues by leveraging existing profile hidden Markov models (HMMs) of functionally characterized gene families. Instead of relying on fragment matches to full-length genes or gene models, we will determine which segments of gene models are capable of high-quality annotations of function and origin, and focus on those. By this approach, the portions of the gene models that have low sequence conservation or variable insertion/gap length (tending towards low recall), or that are composed of sequence shared among multiple gene families and functions (tending towards low precision), are systematically eliminated, increasing the overall signal-to-noise ratio. The high-quality segments of the models ("mini" HMMs) will be our analytical tools. Using these methods we hope to provide a robust approach that frees metagenomics from the limitations of assembly-first strategies, and thereby provides access to information about the numerous low-abundance species in complex biological samples. We will use bacterial single-copy genes as taxonomic markers, and will produce a database of these genes from high-quality genomes. We expect to identify ~80 suitable marker genes, determined for several thousand genomes. For each of these genes, we will produce a corresponding reference phylogenetic tree. In the course of producing these resources, the existing models (TIGRFAMs and Pfam HMMs) will be updated based on the current set of reference genomes and a constant, state-of-the-art construction process. These resources, and any software we produce, will be made available through our public website.
With these methods and resources, we will obtain taxonomic profiles, investigate genes of interest and devise methods for linking those genes to the taxa in the profile. We will utilize real and synthetic metagenomes to perform validation of the methods, and establish statistical confidence metrics for our results.
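
The segment-selection idea in the abstract can be illustrated with a small sketch. This is not the project's actual procedure, which the abstract does not specify. It assumes, hypothetically, that each match column of a full-length profile HMM has already been given a single quality score (for example, a conservation measure penalized for variable insertions or cross-family similarity), and it picks non-overlapping read-sized windows whose mean score clears a threshold. The function name `select_mini_windows` and its parameters are illustrative only.

```python
from typing import List, Tuple


def select_mini_windows(
    column_scores: List[float],
    window: int,
    min_mean: float,
) -> List[Tuple[int, int]]:
    """Pick non-overlapping [start, end) windows of model columns.

    column_scores: hypothetical per-column quality scores (higher is better).
    window: window length in model columns (roughly one read's worth).
    min_mean: minimum mean score a window must reach to be kept.
    Windows are chosen greedily, best mean first, ties broken leftmost.
    """
    n = len(column_scores)
    if window <= 0 or window > n:
        return []

    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for score in column_scores:
        prefix.append(prefix[-1] + score)

    candidates = []
    for start in range(n - window + 1):
        mean = (prefix[start + window] - prefix[start]) / window
        if mean >= min_mean:
            candidates.append((mean, start))

    # Highest mean first; on ties prefer the leftmost window.
    candidates.sort(key=lambda c: (-c[0], c[1]))

    taken: List[Tuple[int, int]] = []
    for _mean, start in candidates:
        end = start + window
        # Keep the window only if it overlaps nothing already chosen.
        if all(end <= s or start >= e for s, e in taken):
            taken.append((start, end))
    return sorted(taken)


if __name__ == "__main__":
    # Toy scores: two well-conserved stretches separated by weak columns.
    scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.8, 0.8, 0.8, 0.1]
    print(select_mini_windows(scores, window=3, min_mean=0.7))
```

With the toy scores above, the two well-conserved stretches (columns 2 to 4 and 6 to 8) are kept and the weakly conserved columns between them are discarded, mirroring the elimination of low-recall and low-precision regions described in the abstract.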

Public Health Relevance

Metagenomes consist of short sequence fragments that are disconnected from information about their genome of origin. Methods proposed here attempt to overcome the limitations of the fragmentary nature of the data by identifying reliable short fragment-sized markers of genes as detection, annotation and taxonomic placement tools. Based on profile hidden Markov models (HMMs), these short markers are called "mini" HMMs (mHMMs).

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21AI123929-01A1
Application #: 9181272
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brown, Liliana L

Project Start: 2016-06-02
Project End: 2018-05-31
Budget Start: 2016-06-02
Budget End: 2017-05-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $223,399
Indirect Cost: $73,399

Institution

Name: University of Maryland College Park
Department: Biostatistics & Other Math Sci
Type: Schools of Earth Sciences/Natur
DUNS #: 790934285

City: College Park
State: MD
Country: United States
Zip Code: 20742

Related projects


NIH 2017 R21 AI	Improved Analysis of Metagenomes through the application of Read-Sized Profile HMMs to Marker Gene Subsequences Selengut, Jeremy / University of Maryland College Park	$185,215
NIH 2016 R21 AI	Improved Analysis of Metagenomes through the application of Read-Sized Profile HMMs to Marker Gene Subsequences Selengut, Jeremy / University of Maryland College Park	$223,399
