The AnVIL Data Ecosystem

Abecasis, Goncalo; Carroll, Robert; Denny, Joshua; Grossman, Robert; Hall, Ira; Hall, Jennifer; Haussler, David; Paten, Benedict; Philippakis, Anthony

Abstract

In this proposal, we bring together a unified team with a strong track record of developing secure and scalable software systems to support flagship scientific efforts, such as the All of Us Research Program, the Genomic Data Commons (GDC), and the Human Cell Atlas (HCA). Our group will leverage these experiences, and the software developed for them, to create an ecosystem of applications that will both serve the needs of the AnVIL and interoperate with other NIH data resources. We will accomplish this through the following Aims:

Aim 1 (Software Engineering): Leverage existing software capabilities to create tools for storing, sharing, and analyzing AnVIL datasets at unlimited scale. During the past five years, our groups have created a suite of modular and open source software capabilities that address key needs in genomic data science. We will leverage these existing capabilities and extend them in novel directions to address AnVIL-specific scientific goals relating to human genetics and functional genomics.

Aim 2 (Data Engineering): Curate data and metadata resources so that they are easily accessible. The AnVIL will not only be a suite of software services, but also a vast repository of genotypic and phenotypic information. For this resource to be usable by the community, it must be organized, curated, and made accessible. We will accomplish this by processing genomic datasets using a consistent set of best-practices pipelines, and mapping phenotypes to a common data model.

Aim 3 (Operations): Stand up and support a data environment for the AnVIL community, and integrate it with other NIH resources as part of a federated NIH-wide genomic data commons. The modular components of Aim 1 are critical building blocks, but they alone are not enough to meet the needs of the AnVIL; they must also be stood up as services and integrated into a coherent entity, which we call a "data environment." We propose to create an AnVIL data environment that will enable researchers to access datasets in a secure, compliant, and facile manner.

The guiding principle of these efforts is that progress in genomic science will happen most rapidly if there is a diversity of solutions created by a plurality of groups. Towards that end, our approach to engineering the software components of Aim 1, curating the datasets of Aim 2, and operating the software services of Aim 3 is to catalyze an ecosystem of activity around the AnVIL. Our proposal focuses not only on creating and operating software services ourselves, but also on incorporating third-party solutions. We propose to accomplish this by architecting the AnVIL data environment according to the following principles: (i) modularity, (ii) openness, (iii) community engagement, (iv) standardization, and (v) interoperability.

Public Health Relevance

Project Narrative: In this proposal, we bring together a unified team with a strong track record of developing secure and scalable software systems to support flagship scientific efforts, such as the All of Us Research Program, the Genomic Data Commons (GDC), and the Human Cell Atlas (HCA). Our group will leverage these experiences, and the software developed for them, to create an ecosystem of cloud-based applications that will enable the NHGRI to store, share, and analyze datasets at unlimited scale. Importantly, this architecture will interoperate with other key NIH data environments as part of a federated genomic data commons.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 1U24HG010262-01
Application #: 9598187
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Wiley, Kenneth L

Project Start: 2018-09-19
Project End: 2023-06-30
Budget Start: 2018-09-19
Budget End: 2019-06-30
Support Year: 1
Fiscal Year: 2018

Institution

Name: Broad Institute, Inc.
DUNS #: 623544785

City: Cambridge
State: MA
Country: United States

Related projects


NIH 2020 U24 HG	The AnVIL Data Ecosystem Carroll, Robert J.; Grossman, Robert L.; Hall, Ira M.; Hall, Jennifer L.; Haussler, David H.; Paten, Benedict; Philippakis, Anthony / Broad Institute, Inc.
NIH 2019 U24 HG	The AnVIL Data Ecosystem Carroll, Robert J.; Grossman, Robert L.; Hall, Ira M.; Hall, Jennifer L.; Haussler, David H.; Paten, Benedict; Philippakis, Anthony / Broad Institute, Inc.
NIH 2018 U24 HG	The AnVIL Data Ecosystem Abecasis, Goncalo; Carroll, Robert J.; Denny, Joshua Charles; Grossman, Robert L.; Hall, Ira M.; Hall, Jennifer L.; Haussler, David H.; Paten, Benedict; Philippakis, Anthony / Broad Institute, Inc.
