The goal of this proposal is to develop a new, flexible framework for metagenomic assembly. The framework will provide novel algorithms for de novo and comparative assembly of metagenomic data, as well as tools for extracting and analyzing genomic variation. In addition, a novel approach will be developed for reconstructing segments of a genome containing a specific 16S rRNA sequence. At the core of this framework are two packages developed in the PI's lab: AMOS, an open-source, flexible, and modular assembly package, and Bambus 2, a genomic scaffolder designed specifically for metagenomic data. A novel model-based testing framework will be closely integrated with the proposed development to ensure the correctness of the software. Tests at several levels will run every time the code changes, nightly, and at other regular intervals, so that errors are discovered and corrected promptly.

Public Health Relevance

Metagenomic studies are beginning to elucidate the roles microbes play in human health and disease. This proposal will enhance future metagenomic studies by providing full-featured, efficient, and robust algorithms and tools for reconstructing genomes and metagenomes from high-throughput sequencing data.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Yao, Alison Q
University of Maryland College Park
Biostatistics & Other Math Sci
Schools of Arts and Sciences
College Park
United States
Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter et al. (2017) Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 14:1063-1071
Ghurye, Jay; Pop, Mihai; Koren, Sergey et al. (2017) Scaffolding of long read assemblies using long range contact information. BMC Genomics 18:527
Olson, Nathan D; Treangen, Todd J; Hill, Christopher M et al. (2017) Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform
Almeida, Mathieu; Pop, Mihai; Le Chatelier, Emmanuelle et al. (2016) Capturing the most wanted taxa through cross-sample correlations. ISME J 10:2459-67
Pop, Mihai; Paulson, Joseph N; Chakraborty, Subhra et al. (2016) Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment. BMC Genomics 17:440
Rashid, Mahamud-Ur; Almeida, Mathieu; Azman, Andrew S et al. (2016) Comparison of inferred relatedness based on multilocus variable-number tandem-repeat analysis and whole genome sequencing of Vibrio cholerae O1. FEMS Microbiol Lett 363
Morris, Alison; Paulson, Joseph N; Talukder, Hisham et al. (2016) Longitudinal analysis of the lung microbiota of cynomolgous macaques during long-term SHIV infection. Microbiome 4:38
Mendelowitz, Lee M; Schwartz, David C; Pop, Mihai (2016) Maligner: a fast ordered restriction map aligner. Bioinformatics 32:1016-22
Simpson, Jared T; Pop, Mihai (2015) The Theory and Practice of Genome Sequence Assembly. Annu Rev Genomics Hum Genet 16:153-72
Pop, Mihai; Salzberg, Steven L (2015) Use and mis-use of supplementary material in science publications. BMC Bioinformatics 16:237

The list above shows the 10 most recent of the project's 16 publications.