Analysis of the RNA sequence (i.e., the genome) of SARS-CoV-2, the pathogen that causes COVID-19, could lead to better diagnostics, vaccines, and treatments for the disease. Although many research groups are studying the SARS-CoV-2 genome, they typically focus on a particular analytical method (e.g., nucleotide conservation analysis) or on one or another element in the genome. They therefore do not consider how different analyses may complement one another, or how interactions among the SARS-CoV-2 genomic elements themselves and/or with other factors (e.g., human genes and proteins or drugs) could shed light on how to treat the disease. The project takes an integrated approach to identifying important elements in the SARS-CoV-2 genome, including elements that might be hidden from more superficial analyses, and to characterizing interactions within the SARS-CoV-2 genome as well as interactions involving other factors, such as human genes, drugs, and other viruses. To this end, a wide variety of complementary analysis techniques (e.g., evolutionary conservation analysis, polymorphism functional effect evaluation, RNA secondary structure prediction, and phylogenetic signal analysis, to name a few) will be used to identify and characterize features of the SARS-CoV-2 genome. The SARS-CoV-2 genome will be compared to the genomes of other closely and distantly related pathogens and organisms to identify potentially hidden or novel elements that might be diagnostic or drug targets. Large-scale database searches and computational drug matching analysis will be used to identify not only regions of the SARS-CoV-2 genome that might be modulated or impacted by drugs, but also candidate drugs. The comprehensive and integrated quantitative nature of the research will create opportunities for multidisciplinary training and education.
The scope of the project is broad by design and amounts to as comprehensive an annotation of the SARS-CoV-2 genome as possible, especially with respect to genomically guided interactions of SARS-CoV-2 genomic elements among themselves and with other factors, such as other viruses; human genes, proteins, and elements such as microRNAs; and drugs and therapeutic constructs such as antisense oligonucleotides (ASOs). To pursue the research, an analysis pipeline and workflow were designed to carry out the various analyses and integrate their results. The pipeline starts with comparative analyses of all available SARS-CoV-2 genomes (currently >10,000) and of closely and distantly related species, such as other viruses and SARS-CoV-2 infection hosts such as bats and humans (amounting to billions of species), together with function prediction tools, to identify and characterize likely functional elements. The identification of unique features in the SARS-CoV-2 genome could reveal diagnostic targets. Both structure prediction and phylogenetic signal analyses are then pursued on any elements identified in the SARS-CoV-2 genome. Predicted structures are subjected to in silico drug and therapeutic construct (e.g., ASO) binding and modulation studies. The likely functional effects of polymorphisms on these structures are also assessed. Phylogenetic signal analyses can reveal phenotypes that variant forms of a functional element can influence (e.g., viral load or infectivity). Finally, identified elements are considered in network analyses to determine which other elements they interact with and which human (and other) host genes and proteins they could influence. A website describing and disseminating the analysis results and data generated as part of the research will be available at www.tgen.org.
This RAPID award is made by the Infrastructure Innovation for Biological Research (IIBR Informatics) Program in the Division of Biological Infrastructure, using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.