The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) powers the next generation of computational genomics research using cloud-scale data and compute resources. The platform is built on a set of established components, including the Terra computing platform and Dockstore for standards-based sharing of containerized tools and workflows. It also provides multiple entry points for data access and analysis: batch workflows with Terra, notebook environments such as Jupyter and RStudio, and Bioconductor packages for building analyses on top of AnVIL APIs and services. Galaxy instances for interactive analysis will be offered soon. By providing a unified environment for data management and compute, AnVIL eliminates the need for data movement, supports controlled access to, and monitoring of, sensitive data, and provides elastic, shared computing resources that researchers can acquire as needed.
NIH-sponsored biomedical research is increasingly moving to cloud-based data storage and analysis systems, with major cloud portals established for GTEx, Kids First, TOPMed, TCGA, and several other major initiatives. However, using these systems together is a challenge. The individual data portals enable researchers to browse and query their own data but have limited functionality for sharing data or user registrations across portals or with cloud-based workspaces such as Terra and Galaxy. The recently established NIH Cloud Platform Interoperability (NCPI) effort aims to address these issues by implementing key interoperability technologies across multiple NIH institutes. Under this project, we will work with the NCPI working groups to define the use cases and standards for interoperability, and we will implement three major technologies recommended by the NCPI within the Galaxy and R/Bioconductor components of AnVIL.
First, we will implement the NIH Researcher Auth Service (RAS) to provide a common mechanism for researchers to establish their identity and access the data they are authorized to use across Terra and Galaxy. Second, we will implement the Global Alliance for Genomics and Health (GA4GH) Data Repository Service (DRS) so that data consumers, including workflow systems, can access data objects in a single, standard way regardless of where they are stored and how they are managed. Finally, we will develop initial support in AnVIL for the Fast Healthcare Interoperability Resources (FHIR) standard. This standard describes data formats, elements, and an API for exchanging electronic health records (EHRs), with the goal of ensuring that these records are available, discoverable, and understandable as patients move around the healthcare ecosystem. FHIR support in AnVIL will make data from eMERGE and related projects easier for users to access once the data are ingested into AnVIL.
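As a rough illustration of the "single, standard way" of access that DRS is meant to provide, the sketch below maps a hostname-based DRS URI to the HTTPS endpoint a client would query for the object's metadata. It assumes the `/ga4gh/drs/v1/objects/{object_id}` path convention from the GA4GH DRS v1 specification. The function name `drs_uri_to_url`, the host `drs.example.org`, and the object ID are hypothetical, and the sketch does not represent the AnVIL, Galaxy, or Bioconductor implementation.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its HTTPS metadata endpoint.

    Assumes the GA4GH DRS v1 path convention:
        drs://<hostname>/<object_id>
        -> https://<hostname>/ga4gh/drs/v1/objects/<object_id>
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    if not parsed.netloc:
        raise ValueError(f"DRS URI has no hostname: {drs_uri!r}")

    # The object ID is everything after the hostname, without the leading slash.
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object ID: {drs_uri!r}")

    # Percent-encode the ID so it is safe to use as a single path segment.
    encoded_id = quote(object_id, safe="")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{encoded_id}"


if __name__ == "__main__":
    # Hypothetical host and object ID, for illustration only.
    print(drs_uri_to_url("drs://drs.example.org/abc123"))
```

A workflow system would then issue an authenticated GET request to the resulting URL and read the access methods from the returned object metadata. The storage location stays hidden behind the same interface whichever repository holds the data.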
Biomedical research is increasingly moving towards cloud-based systems to browse and analyze large quantities of diverse molecular and phenotypic data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) powers the next generation of computational genomics research using cloud-scale data and compute resources. Under this project, we will enhance and extend the Galaxy and R/Bioconductor components of AnVIL to increase interoperability with other cloud portals by implementing support for the NIH Researcher Auth Service (RAS) and the Global Alliance for Genomics and Health (GA4GH) Data Repository Service (DRS), and by developing initial support for the Fast Healthcare Interoperability Resources (FHIR) standard.