The storage, sharing, and analysis of individual-level genomic, environmental, and linked phenotypic and/or health outcome data pose profound technical and logistical challenges for precision medicine research. Accordingly, new cloud-based computing and storage platforms are being developed to support facile data processing and cloud-based analysis commons. However, the implications of these mechanisms for responsible data governance are currently unexamined. To promote responsible and trustworthy data governance (i.e., decision-making about how biomedical data are stored, accessed, and used by researchers, as well as communicated to research participants), we will conduct an in-depth qualitative analysis of the policies, practices, and procedures associated with three emerging cloud-based precision medicine platforms: the BioData Catalyst or BDCatalyst (NHLBI), the Analysis, Visualization, and Informatics Lab-space or AnVIL (NHGRI), and the Research Hub (All of Us Research Program). The immediate goal of this exploratory investigation will be to examine how the control and management of genomic and linked clinical data stored on these platforms differ from those in earlier data sharing efforts. We will also explore what research stakeholders, including platform developers, investigators (data contributors as well as data users), institutional officials, and funders, regard as the most relevant governance tradeoffs associated with the new approaches. In particular, we will solicit views on mechanisms employed to protect participant data, ensure that research uses are aligned with informed consent, and make well-validated results available to interested participants. Draft recommendations based on these observations will be shared with the research community and form the basis for subsequent research, which will introduce these new platforms and the research practices they enable to diverse research participants for feedback and critical reflection.
To achieve these research objectives, we will pursue the following Aims: (1) Characterize current approaches to the storage, sharing, and analysis of large-scale genomic and linked data enabled by emerging cloud-based analysis platforms; (2) Explore stakeholder views on current and proposed approaches to the governance of genomic cloud-based analysis platforms; and (3) Propose and vet recommendations for the trustworthy governance of genomic and linked environmental and phenotypic data in the context of cloud-based platforms. The proposed investigation will generate novel, timely, and detailed information about the storage, access, and intended use of large-scale genomic and linked clinical data held in emerging cloud-based data storage and analysis platforms. These data will provide a robust basis from which to identify best practices for trustworthy data governance and will enhance transparency about a key precision medicine research tool.
Precision medicine research increasingly relies on cloud-based platforms for the storage, sharing, and analysis of individual-level genomic, environmental, and linked health outcome data. To promote responsible and trustworthy data governance (i.e., decision-making about how biomedical data are stored, accessed, and used by researchers, as well as communicated to research participants), we will conduct an exploratory investigation of the policies, practices, and procedures associated with three such platforms: the BioData Catalyst (NHLBI), the Analysis, Visualization, and Informatics Lab-space or AnVIL (NHGRI), and the Research Hub (All of Us Research Program).