With the rapid growth in the volume of data (e.g., human genomic data) collected in biomedical research, data protection, in particular the protection of patients' privacy in secondary uses of these data, has recently attracted much attention. Today, the vast majority of sensitive biomedical data, including individual human genomic data and their associated health metadata, are shared only through controlled-access databases (e.g., dbGaP), and biomedical researchers are required to sign a user agreement before gaining access to these data. Security research has already produced a suite of techniques that serve the general purpose of privacy-preserving computation; applied directly, however, they consume too many resources to be practical for real-world biomedical applications. An alternative is the hardware-assisted Trusted Execution Environment (TEE), versions of which have been developed or are being developed by hardware vendors (Intel, AMD, ARM) and the open-source research community. A prominent example is Intel's Software Guard Extensions (SGX), which is available as a feature in Intel's mainstream CPUs (e.g., Skylake and Kaby Lake). In this project, we plan to explore potential applications of TEE to two popular genome computation tasks involving sensitive biomedical data: genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS). For GWAS, a secondary research user may collect genomic sequences (in encrypted form) from individuals with (cases) or without (controls) a disease phenotype from multiple data owners, on which association tests or advanced GWAS algorithms can be conducted within the SGX enclave. Similarly, for PheWAS, a user may collect phenotype data from individuals whose genomes contain (cases) or do not contain (controls) one or more specific variations.
We will address two issues when developing these approaches: 1) we will customize GWAS/PheWAS algorithms for efficient execution in the TEE with limited resources (e.g., memory, I/O, etc.), and 2) we will develop new genome computing outsourcing and data sharing platforms using the SGX techniques, and further understand and mitigate their potential side-channel risks with regard to GWAS/PheWAS computing tasks. The proposed research will lead to a practical solution for secure GWAS and PheWAS in three application scenarios: 1) secure outsourcing: a research institution collects matched genomic and phenotypic data from a large cohort of case and control individuals, and outsources the storage of these data and potentially repeated GWAS and PheWAS computation to a public or commercial cloud; 2) secure collaboration: a consortium of researchers across multiple institutions attempts to collaborate on a large GWAS/PheWAS study using the data collected by each participating institution; and 3) secure data sharing: researchers want to share their data with the broad biomedical research community so that potential data users may conduct a secondary GWAS/PheWAS analysis.
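
To make the kind of computation concrete, the following is a minimal sketch of a single-variant case-control association test of the sort that could be run on decrypted genotypes inside an enclave. It is an illustration only, not the project's algorithm: it assumes genotypes are coded as minor-allele counts (0, 1, or 2), uses a standard allelic chi-square test with one degree of freedom, and the function name `allelic_chi_square` is hypothetical. It contains no SGX-specific code.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 degree of freedom) for one variant.

    Assumption for this sketch: each genotype is the number of
    minor alleles carried by one individual (0, 1, or 2).
    Returns (chi-square statistic, p-value).
    """
    # Count minor (alt) and major (ref) alleles in each group.
    case_alt = sum(case_genotypes)
    case_ref = 2 * len(case_genotypes) - case_alt
    ctrl_alt = sum(control_genotypes)
    ctrl_ref = 2 * len(control_genotypes) - ctrl_alt

    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    # A zero margin means the table carries no association signal.
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    cases = [2, 2, 1, 1]
    controls = [0, 0, 1, 0]
    stat, p = allelic_chi_square(cases, controls)
    print(f"chi2={stat:.4f}, p={p:.4f}")
```

Because the test needs only four allele counts per variant, each variant can be processed in a streaming fashion with a small, fixed memory footprint, which is relevant to the limited enclave resources mentioned above.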

Public Health Relevance

We propose to develop efficient algorithms for genome-wide and phenome-wide association studies on genomic and phenotype data using hardware-supported Trusted Execution Environment (TEE) technology. The proposed research will allow a secondary analysis to be conducted on encrypted genomic and phenotype data within an enclave designed to be resilient to attacks from its host operating system, and will thus significantly improve the protection of patients' private data. The proposed research will lead to broad, responsible sharing of human genomic data for improving human health.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG010798-02
Application #
9993637
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sofia, Heidi J
Project Start
2019-08-09
Project End
2023-05-31
Budget Start
2020-06-01
Budget End
2021-05-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Indiana University Bloomington
Department
Miscellaneous
Type
Schools of Arts and Sciences
DUNS #
006046700
City
Bloomington
State
IN
Country
United States
Zip Code
47401