The biomedical community is in the midst of a genomics revolution with the potential to personalize healthcare services to a patient's genome. To capitalize on recent genomics programs, scientists have initiated research to discover relationships between an individual's genomic variations and clinical phenotype. Most of the gathering and analysis of person-specific records has been localized to particular investigators or institutions; however, scientists need to share data collections to strengthen the statistical power of association tests, to allow others to verify their analyses, and to comply with policy requirements. To facilitate this process, various organizations around the world are investing significantly in databanks that consolidate patient-specific records from disparate investigators. The availability of such databanks for widespread use is contingent on protecting the anonymity of the individuals to whom the shared records correspond. Although policy and technical approaches to biomedical record privacy exist, they are ill-suited to environments that consolidate records from multiple organizations. In particular, various investigations demonstrate that simply de-identifying person-specific biomedical records leaves centralized records vulnerable to "re-identification" through public resources. The overarching goal of our research is to develop a novel data protection model for centralized person-specific biomedical records based on formal privacy and security methods. Our solution will be composed of a suite of technologies, each of which addresses a challenge in the construction and use of biomedical databanks.
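To make the re-identification risk concrete, the following minimal Python sketch shows a linkage attack on hypothetical toy data (every name, ZIP code, date, and diagnosis is invented for illustration). It joins "de-identified" records to an identified public list on the quasi-identifiers ZIP code, birth date, and sex, and treats a record as re-identified when its combination matches exactly one person. The choice of quasi-identifiers and the function name `reidentify` are assumptions, not a description of any specific databank or of this project's methods.

```python
# Hypothetical toy data; not drawn from any real source.
# "De-identified" research records: names removed, quasi-identifiers retained.
deidentified = [
    {"zip": "37201", "birth_date": "1965-07-31", "sex": "F", "diagnosis": "type 2 diabetes"},
    {"zip": "37201", "birth_date": "1980-02-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "37203", "birth_date": "1972-11-02", "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical identified public resource (e.g., a registry-style list).
public = [
    {"name": "Alice Smith", "zip": "37201", "birth_date": "1965-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "37201", "birth_date": "1980-02-14", "sex": "M"},
    {"name": "Carol White", "zip": "37203", "birth_date": "1972-11-02", "sex": "F"},
    {"name": "Dana Green", "zip": "37203", "birth_date": "1972-11-02", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def reidentify(records, public_records, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records matching exactly one public entry."""
    index = {}
    for person in public_records:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for rec in records:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match links the record to a name
            matches.append((candidates[0], rec["diagnosis"]))
    return matches


print(reidentify(deidentified, public))
# [('Alice Smith', 'type 2 diabetes'), ('Bob Jones', 'asthma')]
```

On this toy data the first two records are re-identified, while the third is not, because its quasi-identifier combination is shared by two people in the public list; that ambiguity is the intuition behind generalization-based protections such as k-anonymity.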
These technologies will be developed in three specific aims: (1) build a tool to integrate research participants' biomedical records from disparate organizations without compromising participants' anonymity, (2) construct methods to securely collect, store, and analyze biomedical data without revealing individual records, and (3) detect and prevent policy violations that can arise as a consequence of investigators' queries to the databank. Our methods will be implemented in software that shields scientists and administrators from the technical details of unfamiliar privacy and security protocols. The final product will be software that enables disparate data holders to submit information to a centralized biomedical databank, scientists to analyze the stored records, and administrators to monitor use of the system for privacy violations. The software will be modular and configurable, enabling users to choose the protection features most appropriate to their environment. To demonstrate the applicability of our methodology, this research will specifically address a real-world data privacy challenge that is a bottleneck for multi-institutional genome-wide association studies, but the resulting models and software will be reusable in other centralized databanking environments. We believe that by managing biomedical records through formal privacy protection mechanisms, databanks based on our model will be able to support research with greater throughput than the current status quo.
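Aim (1) falls under privacy-preserving record linkage. As a hedged illustration, the Python sketch below shows one widely used technique: encoding name bigrams into a Bloom filter with a keyed hash (HMAC-SHA256) and comparing encodings with the Dice coefficient. It is not necessarily the method developed in this project. The function names, the filter size `m`, the hash count `k`, and the assumption that data holders share a secret key the databank never sees are all hypothetical.

```python
import hashlib
import hmac


def bigrams(value):
    """Character bigrams of a lower-cased, padded string, e.g. 'ab' -> {'_a', 'ab', 'b_'}."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, key, m=1000, k=20):
    """Return the set of bit positions set in an m-bit Bloom filter (hypothetical m, k)."""
    positions = set()
    for gram in bigrams(value):
        for i in range(k):
            # Keyed hash: only holders of `key` can reproduce or probe the encoding.
            digest = hmac.new(key, f"{i}|{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % m)
    return frozenset(positions)


def dice(a, b):
    """Dice coefficient between two Bloom filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical secret shared by the data holders but withheld from the databank.
KEY = b"shared-secret-known-only-to-data-holders"

site_a = bloom_encode("Jonathan Smith", KEY)
site_b = bloom_encode("Jonathon Smith", KEY)  # spelling variant of the same person
site_c = bloom_encode("Maria Garcia", KEY)

print(round(dice(site_a, site_b), 2), round(dice(site_a, site_c), 2))
```

Because similar names share most of their bigrams, their encodings overlap heavily, so the databank can match records despite spelling variation without ever seeing the names themselves.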

Public Health Relevance

To ensure wide-scale sharing of person-specific biomedical records for research purposes, it is necessary to build technologies that uphold the anonymity of the corresponding individuals without diminishing the usability of the records. In this research, we will focus on developing technologies to support privacy in an emerging phenomenon: biomedical databanks that centralize records from disparate data providers. The goals of this project are to develop techniques, implemented in open-source software, that a) merge biomedical records on the same subjects without revealing the subjects' identities, b) collect, store, and analyze biomedical records in a secure manner and without revealing individual records, and c) detect and mitigate policy violations while investigators interact with records stored in the databank.
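Goal b) can be served by several cryptographic techniques. As one hedged illustration, the Python sketch below uses additive secret sharing so that a pooled count (for example, the number of participants carrying a given allele at each of several sites) can be computed without any single server seeing an individual site's count. The modulus, the three-server setup, the assumption that the servers do not collude, and the function names are hypothetical choices, and the whole protocol is simulated in one process rather than across real servers.

```python
import secrets

# Hypothetical parameters: a large prime modulus and three non-colluding servers.
MODULUS = 2**61 - 1
N_SERVERS = 3


def share(value, n_parties=N_SERVERS, modulus=MODULUS):
    """Split `value` into n additive shares; any n-1 of them are uniformly random."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(site_values, n_servers=N_SERVERS, modulus=MODULUS):
    """Simulate each site sending one share to each server; only the total is rebuilt."""
    server_totals = [0] * n_servers
    for value in site_values:
        for j, s in enumerate(share(value, n_servers, modulus)):
            # Server j only ever sees share j from each site.
            server_totals[j] = (server_totals[j] + s) % modulus
    # Combining the servers' partial sums reveals the pooled total, not per-site values.
    return sum(server_totals) % modulus


# Hypothetical per-site counts.
print(secure_sum([120, 87, 43]))  # 250
```

Reconstruction is exact only while the true total is smaller than the modulus; with a modulus near 2^61 that constraint is far above any realistic participant count.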

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM009989-06
Application #
8714051
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-09-01
Project End
2015-08-31
Budget Start
2014-09-01
Budget End
2015-08-31
Support Year
6
Fiscal Year
2014
Name
Vanderbilt University Medical Center
Department
Internal Medicine/Medicine
Type
Schools of Medicine
City
Nashville
State
TN
Country
United States
Zip Code
37212
Heatherly, Raymond; Rasmussen, Luke V; Peissig, Peggy L et al. (2016) A multi-institution evaluation of clinical profile anonymization. J Am Med Inform Assoc 23:e131-7
Kho, Abel N; Cashy, John P; Jackson, Kathryn L et al. (2015) Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc 22:1072-80
Barth-Jones, Daniel; El Emam, Khaled; Bambauer, Jane et al. (2015) Assessing data intrusion threats. Science 348:194-5
El Emam, Khaled; Rodgers, Sam; Malin, Bradley (2015) Anonymising and sharing individual patient data. BMJ 350:h1139
Xia, Weiyi; Heatherly, Raymond; Ding, Xiaofeng et al. (2015) R-U policy frontiers for health data de-identification. J Am Med Inform Assoc 22:1029-41
Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi et al. (2015) A game theoretic framework for analyzing re-identification risk. PLoS One 10:e0120592
Naveed, Muhammad; Ayday, Erman; Clayton, Ellen W et al. (2015) Privacy in the Genomic Era. ACM Comput Surv 48:
Xie, Wei; Kantarcioglu, Murat; Bush, William S et al. (2014) SecureMA: protecting participant privacy in genetic association meta-analysis. Bioinformatics 30:3334-41
Durham, Elizabeth Ashley; Kantarcioglu, Murat; Xue, Yuan et al. (2014) Composite Bloom Filters for Secure Record Linkage. IEEE Trans Knowl Data Eng 26:2956-2968
Li, Muqun; Carrell, David; Aberdeen, John et al. (2014) De-identification of clinical narratives through writing complexity measures. Int J Med Inform 83:750-67

Showing the most recent 10 out of 39 publications