Tuning big data analysis infrastructure for HIV research

Nekrutenko, Anton; Pond, Sergei; Taylor, James

Abstract

Big data analytics in HIV/AIDS research is critically underdeveloped. The decreasing cost of sequencing has stimulated the development of novel software tools and analysis frameworks. The bulk of these efforts has been driven by expansive (and well-funded) collaborative projects such as the 1000 Genomes Project, ENCODE, modENCODE, GTEx, the Human Microbiome Project, The Cancer Genome Atlas, and others. While these projects hardened many aspects of next-generation sequencing (NGS) data analysis and manipulation and established standards for data representation (e.g., the BAM, VCF, and CRAM formats), they faced a set of challenges markedly distinct from those faced by HIV researchers: long, stable genomes with few mutations (human) versus short, variable genomes with many mutations (HIV). Consequently, the development of HIV-specific tools and applications for NGS has largely been the domain of individual labs independently designing sensible but ad hoc and disaggregated solutions to common problems. The result is a fragmented field largely without accepted standards, with gaps between available solutions and the needs of end users. The current practice of writing "full-stack" custom in-house solutions for NGS analyses is neither scalable nor maintainable, largely fails to leverage developments from other domains of NGS data analysis, and hampers the adoption of this transformative technology in HIV research.
The specific aims of this proposal address practical aspects of HIV/AIDS-related NGS analysis by assembling proven and newly developed tools and modules into a series of "data to answer" workflows, and by creating a publicly available and accessible turnkey solution suitable for a large proportion of HIV/AIDS researchers who need to perform routine and bespoke analyses of NGS data.

Public Health Relevance

This proposal brings together three research groups (Penn State, Temple, and Johns Hopkins) with combined expertise and accomplishments in comprehensive sequence-based HIV/AIDS research, open source high-throughput genomic tools and framework development, reproducible scientific computation, and data visualization. Jointly, we will develop a single-point, publicly accessible informatics and analytical "data-to-answer" framework enabling HIV researchers to process, store, and retrieve NGS data, and to translate these data into actionable results.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 1R01AI134384-01
Application #: 9411424
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gezmu, Misrak

Project Start: 2017-06-26
Project End: 2022-05-31
Budget Start: 2017-06-26
Budget End: 2018-05-31
Support Year: 1
Fiscal Year: 2017
Total Cost
Indirect Cost

Institution

Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953

City: University Park
State: PA
Country: United States
Zip Code: 16802

Related projects


NIH 2020 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2020 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Schatz, Michael / Pennsylvania State University
NIH 2019 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2018 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University
NIH 2017 R01 AI	Tuning big data analysis infrastructure for HIV research Nekrutenko, Anton; Pond, Sergei L Kosakovsky; Taylor, James Peter / Pennsylvania State University

Publications

Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea et al. (2018) Community-Driven Data Analysis Training for Biology. Cell Syst 6:752-758.e1

Frost, Simon D W; Magalis, Brittany Rife; Kosakovsky Pond, Sergei L (2018) Neutral Theory and Rapidly Evolving Viral Pathogens. Mol Biol Evol 35:1348-1354

Shank, Stephen D; Weaver, Steven; Kosakovsky Pond, Sergei L (2018) phylotree.js - a JavaScript library for application development and interactive data visualization in phylogenetics. BMC Bioinformatics 19:276

Grüning, Björn; Chilton, John; Köster, Johannes et al. (2018) Practical Computational Reproducibility in the Life Sciences. Cell Syst 6:631-635

Nekrutenko, Anton; Galaxy Team; Goecks, Jeremy et al. (2018) Biology Needs Evolutionary Software Tools: Let's Build Them Right. Mol Biol Evol 35:1372-1375

Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J et al. (2018) HIV-TRACE (TRAnsmission Cluster Engine): a Tool for Large Scale Molecular Epidemiology of HIV-1 and Other Rapidly Evolving Pathogens. Mol Biol Evol 35:1812-1819
