Recent studies and advisory reports to the government have pointed out that sharing information with appropriate privacy protection to enable health research is one of the most critical challenges of our time. Current approaches based on de-identification or the release of microdata (i.e., original records) are subject to various re-identification and disclosure risks and do not provide sufficient protection for patients. A complementary approach is to release statistical macrodata (i.e., derived statistics), which can also be used to construct synthetic data that mimic the original data. Differential privacy has emerged in recent years as one of the strongest provable privacy guarantees for statistical data release. However, it remains a challenge to efficiently and effectively release a dataset (consistent with a set of statistics) that ensures differential privacy while guaranteeing data utility for targeted applications. Applying differential privacy to health data presents additional challenges due to the high dimensionality, high correlation, and cross-institution distribution of health datasets that support cross-sectional, longitudinal, and cross-institutional studies. The absence of practical data sharing software with rigorous privacy guarantees has made data providers uncomfortable with sharing data, and the resulting lack of datasets has severely hindered medical and informatics research in general. We propose to collaborate with the NCBC iDASH (integrating Data for Analysis, Anonymization, and Sharing) center to develop a Statistical Health informAtion RElease (SHARE) toolkit with differential privacy.
The specific aims are: 1) develop and evaluate novel methods for releasing statistical health data with differential privacy that address the high dimensionality, self-correlation, and cross-institution distribution of the data; 2) use the SHARE toolkit for clinical dataset construction and use case studies with the Emory Analytical Information Warehouse (AIW) and the UCSD Clinical Data Warehouse for Research (CDWR), and demonstrate its utility for cohort discovery queries and a hospital readmission study; and 3) deploy the SHARE toolkit at Emory and iDASH as well as at the Atlanta Clinical & Translational Science Institute (ACTSI) and the UCSD Clinical and Translational Research Institute (CTRI). The techniques and software tools envisioned by SHARE will facilitate health information sharing for health research and have a direct impact on predictive health and translational medicine as well as informatics practice.

Public Health Relevance

Collaborating with iDASH, a National Center for Biomedical Computing for 'integrating Data for Analysis, Anonymization and Sharing', we will develop innovative and practical techniques and software for protecting the privacy of human subjects in the large-scale analysis of clinical and public health data. These techniques and software tools will help overcome barriers to data access and ultimately accelerate public health research.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM114612-03
Application #: 9252489
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Marcus, Stephen
Project Start: 2015-04-01
Project End: 2018-03-31
Budget Start: 2017-04-01
Budget End: 2018-03-31
Support Year: 3
Fiscal Year: 2017
Total Cost: $302,642
Indirect Cost: $73,452

Name: Emory University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 066469933
City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Sadat, Md Nazmus; Aziz, Md Momin Al; Mohammed, Noman et al. (2018) SAFETY: Secure gwAs in Federated Environment Through a hYbrid solution. IEEE/ACM Trans Comput Biol Bioinform :
Bonomi, Luca; Jiang, Xiaoqian (2018) Patient ranking with temporally annotated data. J Biomed Inform 78:43-53
Miotto, Riccardo; Wang, Fei; Wang, Shuang et al. (2018) Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 19:1236-1246
Sadat, Md Nazmus; Jiang, Xiaoqian; Aziz, Md Momin Al et al. (2018) Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation. JMIR Med Inform 6:e14
Vaidya, Jaideep; Shafiq, Basit; Asani, Muazzam et al. (2017) A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis. AMIA Annu Symp Proc 2017:1695-1704
Wang, Meng; Ji, Zhanglong; Wang, Shuang et al. (2017) Mechanisms to protect the privacy of families when using the transmission disequilibrium test in genome-wide association studies. Bioinformatics 33:3716-3725
Sun, Xiaobo; Pittard, William S; Xu, Tianlei et al. (2017) Omicseq: a web-based search engine for exploring omics datasets. Nucleic Acids Res 45:W445-W452
Cao, Yang; Yoshikawa, Masatoshi; Xiao, Yonghui et al. (2017) Quantifying Differential Privacy under Temporal Correlations. Proc Int Conf Data Eng 2017:821-832
Raisaro, Jean Louis; Tramèr, Florian; Ji, Zhanglong et al. (2017) Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks. J Am Med Inform Assoc 24:799-805
Li, Haoran; Xiong, Li; Ji, Zhanglong et al. (2017) Partitioning-based mechanisms under personalized differential privacy. Adv Knowl Discov Data Min 10234:615-627

Showing the most recent 10 out of 20 publications