2 3 High-throughput sequencing, biomedical imaging, and electronic health record technologies are 4 generating health-related datasets of unprecedented scale. Integrative analysis of these 5 resources promises to reveal new biology and drive personal and precision medicine. Yet, the 6 sensitive nature of these data often requires that they be kept in isolated silos, limiting their 7 usefulness to science. The goal of this project is to develop innovative privacy-preserving 8 algorithms to enable data sharing and drive genomic medicine. Crucially, we will draw upon our 9 past success in secure genome analysis and algorithmic expertise in computational biology to 10 address the imminent need to perform complex integrative analyses securely and at scale. 11 Current privacy-preserving tools are prohibitively too costly to perform the complex 12 calculations required in genomic analysis. We previously leveraged the highly structured nature 13 of biological data and novel optimization strategies to implement efficient pipelines for secure 14 genome-wide association studies (GWAS) and drug interaction predictions which scaled to 15 millions of samples. In this project, we will further exploit the unique properties of biomedical data 16 to: (i) develop secure integrative analysis methods for genomic medicine; (ii) develop an easy-to- 17 use programming environment with advanced automated optimizations to facilitate the adoption 18 of privacy-preserving analyses; and (iii) promote the use of our privacy techniques to gain novel 19 biological insights through large-scale collaborative genetic studies of multi-ethnic cohorts. 20 With co-I?s Amarasinghe (MIT) and Cho (Broad Institute), we aim to apply these tools to 21 realize the first multi-institution, multi-national secure genetic studies with our partners at the 22 Swiss Personalized Health Network, UK Biobank, Finnish FinnGen, All of Us, NIH NCBI, Broad 23 and Barcelona Supercomputing Center (Letters of Support). We will also use our privacy- 24 preserving approaches to study genomic origins of polygenic traits for disease as well as 25 neuroimaging and other clinical phenotypes. We will continue to actively integrate our methods 26 into community standards (MPEG-G, GA4GH). 27 Successful completion of these aims will result in computational methods and open-source, 28 easy-to-use, production-grade implementations that open the door to secure integration and 29 analysis of massive sets of sensitive genomic and clinical data. With input from our collaborations, 30 we will build these tools and apply them to better understand the molecular causes of human 31 health and its translation to the clinic.

Public Health Relevance

Combining genomic and health-related data from millions of patients will empower the development of clinically relevant measures of human health and disease risks. However, this task requires securely sharing sensitive data at an immense scale beyond what existing cryptographic platforms can achieve. Here we develop novel computational methods to enable biomedical data integration, analysis, and interpretation in a privacy-preserving and highly scalable manner.

Agency
National Institute of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG010959-01A1
Application #
9998648
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sofia, Heidi J
Project Start
2020-09-18
Project End
2024-06-30
Budget Start
2020-09-18
Budget End
2021-06-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Massachusetts Institute of Technology
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
001425594
City
Cambridge
State
MA
Country
United States
Zip Code
02142