Data sharing is essential to modern biomedical data science. Access to a large amount of genomic and clinical data can help us better understand human genetics and its impact on health and disease. However, the sensitive nature of biomedical information presents a key bottleneck in data sharing and collection efforts, limiting the utility of these data for science. The goal of this project is to leverage cutting-edge advances in cryptography and information theory to develop innovative computational frameworks for privacy-preserving sharing and analysis of biomedical data. We will draw upon our recent success in developing secure pipelines for collaborative biomedical analyses to address the imminent need to share sensitive data securely and at scale. Practical adoption of existing privacy-preserving techniques in biomedicine has thus far been largely limited due to two major pitfalls, which this project overcomes with novel technical advances. First, emerging cryptographic data sharing frameworks, which promise to enable collaborative analysis pipelines that securely combine data across multiple institutions with theoretical privacy guarantees, are too costly to support complex and large-scale computations required in biomedical analyses. In this project, we will build upon recent advances in cryptography (e.g., secure distributed computation, pseudorandom correlation, zero-knowledge proofs) to significantly enhance the scalability and security of cryptographic biomedical data sharing pipelines. Second, existing approaches that locally transform data to protect sensitive information before sharing (e.g. de-identification techniques) either offer insufficient levels of protection or require excessive perturbation in order to ensure privacy. We will draw upon recent tools from information theory to develop effective local privacy protection methods that achieve superior utility-privacy tradeoffs on a range of biomedical data including genomes, transcriptomes, and medical images by directly exploiting the latent correlation structure of the data. To promote the use of our privacy techniques, we will create production-grade software of our tools and publicly release them. We will also actively participate in international standard-setting organizations in genomics, e.g. GA4GH and ICDA, to incorporate our insights into community guidelines for biomedical privacy. Successful completion of these aims will result in computational methods and software tools that open the door to secure sharing and analysis of massive sets of sensitive genomic and clinical data. Our long-term goal is to broadly enable data sharing and collaboration efforts in biomedicine, thus empowering researchers to better understand the molecular basis of human health and to drive translation of new biological insights to the clinic.

Public Health Relevance

Rapidly-growing volume of biomedical datasets around the world promises to enable unprecedented insights into human health and disease. However, increasing concerns for individual privacy severely limited the extent of data sharing in the field. This project draws upon cutting-edge tools from cryptography and information theory to develop effective privacy- preserving methods for collecting, sharing, and analyzing sensitive biomedical data to empower advances in genomics and medicine.

Agency
National Institute of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Type
Early Independence Award (DP5)
Project #
1DP5OD029574-01
Application #
10017554
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Miller, Becky
Project Start
2020-09-10
Project End
2025-08-31
Budget Start
2020-09-10
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Broad Institute, Inc.
Department
Type
DUNS #
623544785
City
Cambridge
State
MA
Country
United States
Zip Code
02142