Tuning big data analysis infrastructure for HIV research

Nekrutenko, Anton; Pond, Sergei; Schatz, Michael

Abstract

The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, "all-hands-on-deck" event for the scientific community. This pandemic is also the first in which real-time genomic data are available, e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral investigations, including COVID-19: lack of reproducibility, rigor, and data/analytic sharing. Only about 10% of the published genomes have quality metrics, primary data (read files), or any detail on the analytics used, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. The key deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, and high-quality annotated variation data.

Public Health Relevance

There are currently >10,500 complete genomic sequences of SARS-CoV-2 in GISAID [1] and NCBI, and >60% of positions have a variant/mutation in at least one genome. The vast majority of these variants are likely noise (sequencing errors in the consensus) or neutral (carrying little cost or benefit to the virus), but a very small number might be the result of adaptation to human populations or other selective forces (e.g. drugs). The goal of this supplement is to develop robust strategies for detection of such variants and for assessment of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic).
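As an illustration of the kind of statistic quoted above (the share of positions with a variant in at least one genome), the following is a minimal sketch, not the project's actual workflow. It assumes the genomes have already been aligned to a reference so that all strings have equal length, and it treats `N` and `-` as uninformative rather than as variants. The function name `variable_site_fraction` and the toy sequences are hypothetical.

```python
from typing import Sequence


def variable_site_fraction(aligned: Sequence[str], reference: str) -> float:
    """Return the fraction of reference positions at which at least one
    aligned genome carries a base different from the reference.

    Assumption: every genome is pre-aligned to the reference, so all
    strings have the same length. 'N' and '-' are skipped as uninformative.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    if any(len(genome) != len(reference) for genome in aligned):
        raise ValueError("all genomes must match the reference length")

    uninformative = {"N", "-"}
    variable = 0
    for i, ref_base in enumerate(reference.upper()):
        for genome in aligned:
            base = genome[i].upper()
            if base not in uninformative and base != ref_base:
                variable += 1
                break  # one differing genome is enough for this position
    return variable / len(reference)


# Toy example: positions 2 and 5 differ from the reference in some genome;
# the 'N' at position 0 is ignored.
toy_reference = "ACGTAC"
toy_genomes = ["ACGTAC", "ACTTAC", "NCGTAA"]
toy_fraction = variable_site_fraction(toy_genomes, toy_reference)
```

A count like this says nothing about whether a variant is noise, neutral, or adaptive; it only measures how many positions would need that assessment.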

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 3R01AI134384-04S1
Application #: 10148893
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gezmu, Misrak

Project Start: 2020-07-09
Project End: 2022-05-31
Budget Start: 2020-07-09
Budget End: 2021-05-31
Support Year: 4
Fiscal Year: 2020

Institution

Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953

City: University Park
State: PA
Country: United States
Zip Code: 16802

Related projects


NIH 2020 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2020 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Schatz, Michael / Pennsylvania State University
NIH 2019 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2018 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2017 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University

Publications

Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea et al. (2018) Community-Driven Data Analysis Training for Biology. Cell Syst 6:752-758.e1

Frost, Simon D W; Magalis, Brittany Rife; Kosakovsky Pond, Sergei L (2018) Neutral Theory and Rapidly Evolving Viral Pathogens. Mol Biol Evol 35:1348-1354

Shank, Stephen D; Weaver, Steven; Kosakovsky Pond, Sergei L (2018) phylotree.js - a JavaScript library for application development and interactive data visualization in phylogenetics. BMC Bioinformatics 19:276

Grüning, Björn; Chilton, John; Köster, Johannes et al. (2018) Practical Computational Reproducibility in the Life Sciences. Cell Syst 6:631-635

Nekrutenko, Anton; Galaxy Team; Goecks, Jeremy et al. (2018) Biology Needs Evolutionary Software Tools: Let's Build Them Right. Mol Biol Evol 35:1372-1375

Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J et al. (2018) HIV-TRACE (TRAnsmission Cluster Engine): a Tool for Large Scale Molecular Epidemiology of HIV-1 and Other Rapidly Evolving Pathogens. Mol Biol Evol 35:1812-1819
