The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, "all-hands-on-deck" event for the scientific community. This pandemic is also the first in which real-time genomic data are available, e.g., via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral investigations, including COVID-19: lack of reproducibility, rigor, and sharing of data and analytics. Only about 10% of the published genomes have quality metrics, primary data (read files), or any level of detail on the analytics, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. Among the key deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, and high-quality annotated variation data.

Public Health Relevance

There are currently >10,500 complete genomic sequences of SARS-CoV-2 in GISAID [1] and NCBI, and >60% of genome positions have a variant/mutation in at least one genome. The vast majority of these variants are likely noise (sequencing errors in the consensus) or neutral (carrying little cost or benefit to the virus), but a few, perhaps very few, might be the result of adaptation to human populations or other selective forces (e.g., drugs). The goal of this supplement is to develop robust strategies for detection of such variants and for assessment of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic).

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Research Project (R01)
Project #
3R01AI134384-04S1
Application #
10148893
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gezmu, Misrak
Project Start
2020-07-09
Project End
2022-05-31
Budget Start
2020-07-09
Budget End
2021-05-31
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Pennsylvania State University
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
003403953
City
University Park
State
PA
Country
United States
Zip Code
16802
Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea et al. (2018) Community-Driven Data Analysis Training for Biology. Cell Syst 6:752-758.e1
Frost, Simon D W; Magalis, Brittany Rife; Kosakovsky Pond, Sergei L (2018) Neutral Theory and Rapidly Evolving Viral Pathogens. Mol Biol Evol 35:1348-1354
Shank, Stephen D; Weaver, Steven; Kosakovsky Pond, Sergei L (2018) phylotree.js - a JavaScript library for application development and interactive data visualization in phylogenetics. BMC Bioinformatics 19:276
Grüning, Björn; Chilton, John; Köster, Johannes et al. (2018) Practical Computational Reproducibility in the Life Sciences. Cell Syst 6:631-635
Nekrutenko, Anton; Team, Galaxy; Goecks, Jeremy et al. (2018) Biology Needs Evolutionary Software Tools: Let's Build Them Right. Mol Biol Evol 35:1372-1375
Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J et al. (2018) HIV-TRACE (TRAnsmission Cluster Engine): a Tool for Large Scale Molecular Epidemiology of HIV-1 and Other Rapidly Evolving Pathogens. Mol Biol Evol 35:1812-1819