Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)

Carey, Vincent; Goecks, Jeremy; Leek, Jeffrey; Morgan, Martin; Nekrutenko, Anton; Schatz, Michael; Waldron, Levi

Abstract

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) will power the next generation of computational genomic research. We will develop the AnVIL environment using the leading national-scale cyberinfrastructure as the foundation supporting the most widely used analysis environments and frameworks vetted by NHGRI researchers. Our user-centered solution for data access, analysis, and visualization will enable investigators across all levels of expertise to fully utilize genomic datasets using environments they are already familiar with, leveraging well-engineered and optimized scientific computing infrastructure for greater efficiency and lower costs.
Aim 1. Engineer the AnVIL Data and Compute Platform. We will leverage the TACC Science Cloud and the Agave Science-As-A-Service platform to deploy a cloud-based environment supporting the data storage, access, and compute needs of the NHGRI research community.
Aim 2. Develop APIs for Data and Compute Access. To maximize the domain-wide impact of AnVIL, we will draw on community efforts and our own collective experience supporting diverse genomic analyses to define access standards and to design and implement AnVIL APIs.
Aim 3. Build an AnVIL metaportal integrating widely used analysis platforms. We will create a single metaportal residing within TACC's Science Cloud providing a unified view of users' data and activities, provenance and billing, and access to several of the most widely used workbenches for genomic research. These workbenches include Bioconductor, Galaxy, the Genome Modeling System, Jupyter, and RStudio. The metaportal will also provide access to the most popular genomic visualization tools.
Aim 4. Develop novel data aggregation, indexing, and query schemes to increase analysis efficiency and reduce cost. We will build approaches, including indexing and pre-computation of key statistics, to make better use of existing (e.g., TCGA, GTEx) and future large datasets with the goal of increasing data utility and decreasing the cost of posing scientific queries against massive datasets.
Aim 5. Develop training and outreach infrastructure and materials. We will build support for training directly in the AnVIL platform, including tight coupling to MOOC-style courses, self-directed training materials, and support materials for conducting online and in-person training workshops.
Aim 6. Engage in effective project governance and assessment. We will establish a leadership and management structure involving key stakeholders from NHGRI, including program staff and the NHGRI-appointed Data Steering Committee and External Advisory Committee.
The key innovation of this work is our leveraging of existing hardware, software, and human resources to create a practical and pragmatic solution to the challenge of building the AnVIL.

Public Health Relevance

Project Narrative: The goal of this project is to create a cloud-based computational analysis and visualization workspace for genomic research. The research enabled by this workspace will accelerate our understanding of the genetic components of human health and disease and progress towards precision genomic medicine.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 5U24HG010263-03
Application #: 9974560
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Di Francesco, Valentina

Project Start: 2018-09-21
Project End: 2023-06-30
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 3
Fiscal Year: 2020

Institution

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21205

Related projects


NIH 2020 U24 HG	Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) Carey, Vincent James; Goecks, Jeremy; Leek, Jeffrey T.; Morgan, Martin T.; Nekrutenko, Anton; Schatz, Michael; Waldron, Levi David / Johns Hopkins University
NIH 2020 U24 HG	Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) Goecks, Jeremy; Morgan, Martin T.; Schatz, Michael / Johns Hopkins University
NIH 2019 U24 HG	Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) Carey, Vincent James; Goecks, Jeremy; Leek, Jeffrey T.; Morgan, Martin T.; Nekrutenko, Anton; Schatz, Michael; Taylor, James Peter; Waldron, Levi David / Johns Hopkins University
NIH 2018 U24 HG	Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) Carey, Vincent James; Goecks, Jeremy; Leek, Jeffrey T.; Morgan, Martin T.; Nekrutenko, Anton; Schatz, Michael; Taylor, James Peter; Waldron, Levi David / Johns Hopkins University
