In the age of precision medicine, genomic data are being integrated with other healthcare data to support personalized and calibrated clinical decision-making. Genomic sequence data are too large to be stored in electronic health record (EHR) systems and must be stored separately. While cloud computing offers a cost-efficient and scalable platform, outsourcing genomic data raises challenging privacy and security concerns. The common perception is that ease of access to remote data and protection of privacy are at odds with each other. We propose a new genomics archiving and communications system (GACS) that meets both requirements by using state-of-the-art homomorphic encryption algorithms and a matrix representation of data and queries. In this system, variants are represented as vectors that are homomorphically encrypted by a client and stored on the GACS server. When analysis is required, a query is generated in the form of a matrix. This matrix is encrypted (or can remain in plaintext, depending on the task) and sent to the GACS server. The server computes on the encrypted data, produces an encrypted result, and returns it to the client, who holds the secret key needed to decrypt it. The GACS server cannot decrypt the data or the encrypted queries, so privacy and security are maintained on the server. Preliminary results show that, after decryption, the results are the same as those obtained by computing on plaintext. In this project, we will implement the GACS software modules and demonstrate the use of the system with examples from three use cases: pharmacogenomics, clinical trial eligibility, and analysis of disease risk. We will measure execution speed and memory consumption in all three use cases.
Deployed as a cloud-hosted service, a GACS can reduce the computational burden on healthcare facilities and give small facilities the same genomic analysis capability available to larger hospitals. In addition, clinical decision support (CDS) can be deployed on the GACS. As clinical guidelines evolve in response to new discoveries linking genetic variants to diseases and medicines, healthcare facilities can stay in compliance with the guidelines.
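The plaintext-query workflow described above can be illustrated with a minimal sketch. The abstract does not name the encryption scheme, so this example substitutes a toy Paillier cryptosystem, which is additively homomorphic and therefore enough to let a server apply a plaintext query matrix to an encrypted variant vector. The key size, the function names (`encrypt`, `decrypt`, `server_apply_query`), and the example genotype values are all assumptions made for illustration. The sketch does not cover the encrypted-query case, which needs a scheme that can also multiply ciphertexts.

```python
import math
import secrets

# Toy Paillier key material (an assumption for illustration, NOT the GACS scheme).
# Two known Mersenne primes; far too small for any real security.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public modulus
N_SQ = N * N
assert math.gcd(N, (P - 1) * (Q - 1)) == 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # secret: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                 # secret: LAM^-1 mod N


def encrypt(m):
    """Client side: encrypt a non-negative integer m < N with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) % N_SQ) * pow(r, N, N_SQ) % N_SQ


def decrypt(c):
    """Client side: recover the plaintext using the secret values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N


def server_apply_query(query_matrix, enc_variants):
    """Server side: compute Enc(M @ v) from a plaintext matrix M and Enc(v).

    Uses only the public modulus. Multiplying ciphertexts adds plaintexts,
    and raising a ciphertext to a power k multiplies its plaintext by k.
    """
    results = []
    for row in query_matrix:
        acc = 1  # trivial (unrandomized) encryption of 0
        for weight, c in zip(row, enc_variants):
            acc = acc * pow(c, weight, N_SQ) % N_SQ
        results.append(acc)
    return results


# Hypothetical data: alternate-allele counts at four variant sites.
variants = [0, 1, 2, 1]
# Hypothetical query: each row is one weighted sum over the sites.
query = [[1, 0, 0, 0],
         [0, 1, 1, 0],
         [2, 0, 1, 3]]

enc_variants = [encrypt(v) for v in variants]             # client encrypts, uploads
enc_result = server_apply_query(query, enc_variants)      # server never sees plaintext
decrypted = [decrypt(c) for c in enc_result]              # client decrypts
plaintext = [sum(w * v for w, v in zip(row, variants)) for row in query]
print(decrypted == plaintext)
```

The final comparison mirrors the preliminary finding reported in the abstract: the decrypted output matches the result of the same computation on plaintext. A production system would use a vetted library and proper key sizes rather than hand-written arithmetic.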

Public Health Relevance

The use of genomic data in clinical decision-making is rapidly increasing. Because genomic sequence data are large, they cannot be stored easily in electronic health record systems. Furthermore, because genomic data are highly sensitive, they must be protected both in storage and during analysis. We propose a new genomics archiving and communications system (GACS) that gives clinical systems easy access to the data while providing strong privacy protection. The system is based on state-of-the-art encryption algorithms. Genome data are encrypted and stored in the GACS, and they are analyzed while remaining encrypted. The GACS learns neither the data nor the analysis questions, so privacy is maintained on the GACS server. We will test the new system on three use cases: pharmacogenomics, clinical trial eligibility, and gene analysis for disease risk.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Technology Transfer (STTR) Grants - Phase I (R41)
Project #: 1R41HG010978-01
Application #: 9906292
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Sofia, Heidi J
Project Start: 2019-09-09
Project End: 2020-08-31
Budget Start: 2019-09-09
Budget End: 2020-08-31
Support Year: 1
Fiscal Year: 2019
Total Cost:
Indirect Cost:
Name: Elimu Informatics, Inc.
Department:
Type:
DUNS #:
City: Richmond
State: CA
Country: United States
Zip Code: 94801