The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, "all-hands-on-deck" event for the scientific community. This pandemic is also the first in which real-time genomic data are available, e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet the same avoidable shortcomings recur early in viral investigations, including those of COVID-19: lack of reproducibility, lack of rigor, and lack of data and analytic sharing. Only about 10% of the published genomes have quality metrics, primary data (read files), or any level of detail on analytics, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetics and phylogeography) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. Key deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, together with high-quality annotated variation data.
There are currently >10,500 complete genomic sequences of SARS-CoV-2 in GISAID [1] and NCBI, and >60% of positions have a variant/mutation in at least one genome. The vast majority of these variants are likely noise (sequencing errors in the consensus) or neutral (carrying little cost or benefit to the virus), but a small number might be the result of adaptation to human populations or of other selective forces (e.g. drugs). The goal of this supplement is to develop robust strategies for detecting such variants and for assessing intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetics and phylogeography).
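To make the "positions with a variant in at least one genome" statistic concrete, the following is a minimal sketch of how such a fraction could be computed from consensus genomes already aligned to a reference. It is an illustration only, not the workflow proposed in this supplement: the function name `variable_site_fraction`, the choice to ignore `N` and gap characters, and the toy sequences are all assumptions made for the example.

```python
def variable_site_fraction(reference, genomes, ignore="N-"):
    """Fraction of reference positions where at least one genome differs.

    Hypothetical helper: assumes every genome is already aligned to the
    reference (same length) and that ambiguous bases/gaps should not
    count as variants.
    """
    if any(len(g) != len(reference) for g in genomes):
        raise ValueError("all genomes must be aligned to the reference length")

    variable = 0
    for i, ref_base in enumerate(reference):
        # A site is variable if any genome carries a different,
        # non-ignored character at this position.
        if any(g[i] != ref_base and g[i] not in ignore for g in genomes):
            variable += 1
    return variable / len(reference)


# Toy data (invented for illustration, not real SARS-CoV-2 sequence).
reference = "ACGTACGTAC"
genomes = [
    "ACGTACGTAC",  # identical to the reference
    "ACGAACGTAC",  # substitution at index 3
    "ACGTACNTAC",  # ambiguous base at index 6, ignored
    "TCGTACGTAC",  # substitution at index 0
]
fraction = variable_site_fraction(reference, genomes)
print(f"{fraction:.0%} of positions vary in at least one genome")
```

A count like this treats a sequencing error in a single consensus genome the same as a genuinely circulating variant, which is why a high raw fraction says little on its own and why quality metrics and primary read data matter for separating noise from signal.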