Advances in genomics and data analytics create new opportunities for accurate risk prediction and personalized medical treatment for even rare cancers via large-scale data federation across institutions. Yet cancer research is often stymied by a lack of appropriate tools to streamline the transfer and sharing of clinical patient data. Globus services permit secure data transfer, synchronization, and sharing in distributed environments at large scale. We propose here to extend these services so that they can work securely with protected human data. The extended services will allow federation of clinical patient data for accurate cancer risk prediction, personalized treatment, and any other cancer research area. Globus is widely used: it has over 15,000 users and more than 8,000 accessible storage systems, including at most leading US universities and many sites overseas, and it has transferred more than 165 petabytes and 25 billion files. Adoption of Globus by biomedical researchers has been rapid and is accelerating. Biomedical researchers at ~30 universities, government agencies, and sequencing centers have relied on Globus for streamlined data transfer and sharing. Our "Globus Genomics" (GG) integrated Galaxy-Globus-cloud genomics analysis system has been used by more than 300 researchers across multiple biomedical research domains, including cancer, at over 25 institutions to analyze over 10,000 samples. We will develop a HIPAA Enablement Toolkit that will enable Globus and other software-as-a-service providers (including GG) to manage protected data securely (Aim 1.1). We will extend Globus security features by implementing file name encryption and by encrypting data with user-supplied keys, and demonstrate that these new features can be used by GG and other services to enable elastic, secure, high-performance cancer genomics data analysis (Aim 1.2). 
We will integrate Globus with major cloud platforms by developing uniform storage system interfaces (Aim 2.1), engineering high-speed transfers (Aim 2.2), and implementing search, replication, and synchronization (Aim 2.3) on AWS, Google, Microsoft, and OpenStack-based clouds, so that cancer researchers can transfer and share data securely and easily among these and other (e.g., local) computing and storage platforms. The resulting tools will be applicable to any cancer type across the cancer research spectrum. We will validate and disseminate these new technologies first within existing and emerging breast (Aim 3.1), blood (Aim 3.2), and pancreatic (Aim 3.3) cancer research networks and then more broadly with collaborators across the cancer research continuum (Aim 3.4). We will work closely with collaborators and users to ensure that we meet the needs of a broad cross-section of the cancer research community that requires transfer, sharing, and analysis of large, human data sets. We will use extensive community outreach through multiple channels to widely disseminate our technologies.

Public Health Relevance

Title: Building protected data sharing networks to advance cancer risk assessment and treatment. Project Narrative: Progress in cancer research increasingly depends on sharing human research data (clinical, genome, phenome, imaging, cancer registry, and survey) across institutions. Yet the safeguards required to protect data confidentiality, availability, and integrity frequently hinder research. Researchers are often denied electronic access to remote data, reduced to shipping portable electronic media, and/or forced to struggle with inadequate local computing environments. To address these obstacles to cancer research, we propose to extend the powerful Globus services to enable cancer researchers to appropriately and securely transfer, manage, and share identified human data in distributed environments, and to appropriately and securely analyze such data on public cloud platforms.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
5U24CA209996-03
Application #
9749062
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Divi, Rao L
Project Start
2017-08-01
Project End
2022-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Chicago
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
005421136
City
Chicago
State
IL
Country
United States
Zip Code
60637