Viruses are the most abundant biological entities on Earth and are keystone components of environments and microbiomes, yet their contributions have mostly been overlooked. Understanding viruses is critical to the study and application of microbiomes in diverse fields such as agriculture, medicine, biotechnology, ecosystem science, oceanography, and biogeochemistry. Despite this recognized importance, computational tools for studying viruses lag behind comparable tools for other microbes such as bacteria. To that end, the goal of this project is to develop new machine learning-based approaches for the study of viruses and their ecology. The project has the potential to transform the study of microbiomes and the field of viral ecology by maximizing the information gained from viruses and elucidating their roles in nature. The current SARS-CoV-2 pandemic is projected to vastly increase student interest in STEM, and specifically in virology, in the coming years. In parallel, there is an increasing demand for a workforce adept in bioinformatics and data science approaches in biology. The project aims to advance the development of a talented, educated, and skilled workforce and to increase the participation of underrepresented minorities and first-generation college students in STEM. The project will also increase literacy in virology and data science across K-12 and undergraduate education through teacher-training workshops and course-based undergraduate research experiences (CUREs) at the interface of bioinformatics, data science, and virology.
This project will develop algorithms and bioinformatic tools to enable the study of uncultivated viruses from mixed communities with little to no bias (bacterial/archaeal/eukaryotic, DNA/RNA, lytic/lysogenic). The goals of the project are to develop new genome and protein databases and machine learning approaches to identify viruses; network-based frameworks for reference-free prediction of viral hosts; and network and statistical approaches for determining viral taxonomy and estimating genome completion. These methods will be validated using simulated and real-world metagenomics and metatranscriptomics data and formalized through the development and release of open access databases and software. The approaches will be applied to study the viral ecology of deep-sea hydrothermal ecosystems and the role of viral infections in nutrient cycling in the oceans. Additionally, the project will enable investigation of fundamental questions in viral ecology governing the roles of viruses in diverse microbiomes and environments such as soils, human health, freshwater, and marine systems. Two CUREs will be developed, in virus cultivation and bioinformatics respectively, using novel interactive education approaches including blended learning, and are expected to reach in excess of 1000 students. A teacher-training workshop, “Viruses in nature”, will be conducted to train K-12 biology teachers (especially from rural and underrepresented communities). The workshop will develop lesson plans and hands-on laboratory activities that integrate concepts of virology and bioinformatics into teaching units on biology. For more information visit: https://github.com/AnantharamanLab/NSF_CAREER.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.