Understanding the behavior of microbes plays a crucial role in environmental monitoring, renewable energy, agriculture, and medicine. Metagenomics sequences the genomes of all microbes residing in an ecological niche as a whole, allowing for a comprehensive and unbiased investigation into the community’s function and taxonomy. However, the bioinformatic analysis of the resulting sequence data remains challenging due to its high volume and complexity. The main goal of this Faculty Early Career Development (CAREER) project is to develop efficient and accurate computational methods for the analysis of metagenomic data and to develop the corresponding bioinformatics sections for the University of Kansas’s STEM outreach programs. The project will provide a new bioinformatic infrastructure for metagenomic sequencing data analysis. It will also stimulate interest in STEM and improve the retention of the younger generation and underrepresented minorities in STEM.

Reconstructing complete microbial genomes from fragmented metagenomic sequencing data (i.e., metagenome assembly) is a fundamental step towards understanding the function and taxonomy of the metagenome. Existing assembly algorithms are primarily based on two types of graph data structures, namely the string graph (which extends the overlap graph) and the de Bruijn graph. The string graph has higher accuracy but lower sensitivity and contiguity than the de Bruijn graph. Specific Aim 1 of this project will improve the sensitivity and contiguity of the string graph approach through an earlier consideration of the paired-end information and an adaptive overlap length threshold that accounts for the uneven coverage of metagenomic data. Specific Aim 2 of this project will leverage the read connectivity information from the assembly graph to improve the discovery, classification, and quantification of functional genes. The software products and scientific findings resulting from this project will be disseminated to the public for free through https://cbb.ittc.ku.edu.
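
To make the two graph data structures concrete, the following is a minimal Python sketch and not the project's software. It builds a de Bruijn graph from k-mers and a naive overlap graph from suffix-prefix overlaps between reads. All function names are hypothetical, the overlap search is a brute-force illustration (a real string graph also removes contained reads and transitive edges), and `adaptive_min_overlap` is an assumed toy rule that only illustrates the idea of relaxing the overlap threshold where coverage is low.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_edges(reads, min_overlap):
    """Connect read i to read j when their overlap is at least min_overlap."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                length = suffix_prefix_overlap(a, b)
                if length >= min_overlap:
                    edges[(i, j)] = length
    return edges


def adaptive_min_overlap(coverage, high=6, low=4, coverage_cutoff=10):
    """Assumed toy rule: accept shorter overlaps for low-coverage regions."""
    return low if coverage < coverage_cutoff else high


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
strict = overlap_edges(reads, adaptive_min_overlap(coverage=50))
relaxed = overlap_edges(reads, adaptive_min_overlap(coverage=3))
```

In this toy input, the strict threshold yields no edges, while the relaxed threshold links the three reads into a single path. That is the kind of sensitivity gain an adaptive threshold aims for in low-abundance genomes, at the cost of admitting more spurious overlaps if the threshold is relaxed too far.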

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Biological Infrastructure (DBI)
Program Officer: Jean Gao
University of Kansas
United States