The current COVID-19 pandemic demands an urgent response, including improved bioinformatics tools to help the community analyze relevant datasets from infected patients and their environments. This project will update the popular QIIME pipeline for microbiome analysis, which is very widely used by researchers studying bacteria and archaea in the microbiome, and optimize it for researchers studying viruses. This will bring features such as data provenance tracking and reproducibility of analysis workflows to the viral research community. Such features enhance the reliability of bioinformatics work in a rapidly paced research environment, such as the emergency response to a pandemic, where processing errors are more likely.
The intellectual merit of this work will be to move studies of viral communities from a nonphylogenetic to a phylogenetic basis, to accelerate time-to-result, and to make QIIME 2 far more useful to the increasing number of researchers moving from bacterial community analysis to viral community analysis in response to the COVID-19 pandemic. The specific enhancements to be implemented are to build a reference database of viral sequences from diverse genome, metagenome, and metatranscriptome sources; to enhance storage and compute capacity to resolve limitations posed by the large-scale datasets generated for SARS-CoV-2; to extend computational pipelines to accommodate the recombination and the lack of recognizable common phylogenetic tree roots characteristic of viruses; and to support genome assembly from reads recruiting to viral databases. Results will be disseminated as new QIIME 2 plugins for broad distribution to the community, along with the development of new educational materials and new workshop modules.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.