The biomedical community is in the midst of a genomics revolution with the potential to personalize healthcare services to a patient's genome. To capitalize on recent genomics programs, scientists have initiated research to discover relationships between an individual's genomic variations and clinical phenotype. Most of the gathering and analysis of person-specific records has been localized to particular investigators or institutions; however, scientists need to share data collections to strengthen the statistical power of association tests, to allow others the opportunity to verify their analyses, and to comply with policy requirements. To facilitate this process, various organizations around the world are investing significantly in databanks to consolidate patient-specific records from disparate investigators. The availability of such databanks for widespread use is contingent on protecting the anonymity of the individuals to whom the shared records correspond. Though policy and technical approaches for biomedical records privacy exist, they are inappropriate for environments that consolidate records from multiple organizations. In particular, various investigations demonstrate that the simple de-identification of person-specific biomedical records leaves centralized records vulnerable to "re-identification" through public resources. The overarching goal of our research is to develop a novel data protection model for centralized person-specific biomedical records based on formal privacy and security methods. Our solution will be composed of a suite of technologies, each of which addresses a challenge for the construction and use of biomedical databanks.
These technologies will be developed in three specific aims: (1) build a tool to integrate research participants' biomedical records from disparate organizations without compromising participants' anonymity, (2) construct methods to securely collect, store, and analyze biomedical data without revealing individual records, and (3) detect and prevent policy violations that can arise as a consequence of investigators' queries to the databank. Our methods will be implemented in software that shields scientists and administrators from handling the technical details of unfamiliar privacy and security protocols. The final product will be software that enables disparate data holders to submit information to a centralized biomedical databank, scientists to analyze the stored records, and administrators to monitor the use of the system for privacy violations. The software will be designed in a modular and configurable manner, enabling users to choose the protection features most appropriate to their environment. To demonstrate the applicability of our methodology, this research will specifically address a real-world data privacy challenge that is a bottleneck for multi-institutional genome-wide association studies, but the resulting models and software will be reusable for other centralized databanking environments. We believe that by managing biomedical records through formal privacy protection mechanisms, databanks based on our model will be able to support research with greater throughput than the status quo.
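
As an illustration of the first aim, the sketch below shows one generic way that records on the same participant could be joined without exchanging names: each data provider replaces identifiers with a keyed hash before submission, and the databank joins on the resulting tokens. This is a minimal sketch under stated assumptions, not the project's actual linkage protocol. The function names, the choice of identifier fields, the shared key, and the sample values are all hypothetical.

```python
import hashlib
import hmac


def linkage_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Return a keyed hash of normalized identifiers (field choice is hypothetical)."""
    # Normalize so that trivial formatting differences do not break the match.
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(site_a: dict, site_b: dict) -> dict:
    """Join two token-keyed record sets on the tokens they have in common."""
    return {token: (site_a[token], site_b[token]) for token in site_a.keys() & site_b.keys()}


# Illustrative data only: two providers hold different facts about one participant.
key = b"secret-shared-by-data-providers"  # assumption: providers agree on a key out of band
site_a = {linkage_token(key, "Ada", "Lovelace", "1815-12-10"): {"genotype": "rs0000:AG"}}
site_b = {linkage_token(key, " ada", "LOVELACE ", "1815-12-10"): {"phenotype": "case"}}

linked = link_records(site_a, site_b)  # the databank sees tokens, never names
```

Keyed hashing of this kind only matches identifiers that agree exactly after normalization, and it offers no protection if the key is disclosed, so it should be read as a starting point rather than a complete protection model.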

Public Health Relevance

To ensure wide-scale sharing of person-specific biomedical records for research purposes, it is necessary to build technologies that uphold the anonymity of the corresponding individuals without diminishing the usability of the records. In this research, we will focus on developing technologies to support privacy in an emerging phenomenon: biomedical databanks that centralize records from disparate data providers. The goals of this project are to develop techniques, implemented in open-source software, that a) merge biomedical records on the same subjects without revealing the subjects' identities, b) collect, store, and analyze biomedical records in a secure manner and without revealing individual records, and c) detect and mitigate policy violations while investigators interact with records stored in the databank.
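
Goals b) and c) can be made concrete with a small example of a guarded aggregate query: the databank answers only with counts, withholds counts that fall below a minimum cell size, and logs every request so that an administrator can review it. This is a hedged sketch of a generic disclosure-control technique, not the method developed in this project. The threshold value, the function name, and the record fields are assumptions made for illustration.

```python
from typing import Callable, Optional

MIN_CELL_SIZE = 5  # hypothetical policy threshold, not taken from the project


def guarded_count(records: list, predicate: Callable[[dict], bool],
                  investigator: str, audit_log: list) -> Optional[int]:
    """Return an aggregate count, or None when the result is too small to release."""
    count = sum(1 for record in records if predicate(record))
    allowed = count >= MIN_CELL_SIZE
    # Log every query, released or not, so administrators can review usage.
    audit_log.append({"investigator": investigator, "allowed": allowed})
    return count if allowed else None


# Illustrative data only: six cases, two of which carry a variant of interest.
records = [{"status": "case", "variant": i < 2} for i in range(6)]
audit_log = []

broad = guarded_count(records, lambda r: r["status"] == "case", "analyst-1", audit_log)
narrow = guarded_count(records, lambda r: r["variant"], "analyst-1", audit_log)
```

Here the broad query is answered and the narrow one is withheld and flagged in the log. A threshold alone does not stop an investigator from combining several permitted queries to isolate an individual, which is why monitoring of query sequences is listed as a separate goal.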

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009989-05
Application #: 8528721
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2009-09-01
Project End: 2015-08-31
Budget Start: 2013-09-01
Budget End: 2014-08-31
Support Year: 5
Fiscal Year: 2013
Total Cost: $241,696
Indirect Cost: $60,765

Name: Vanderbilt University Medical Center
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 004413456
City: Nashville
State: TN
Country: United States
Zip Code: 37212

Prasser, Fabian; Gaupp, James; Wan, Zhiyu et al. (2017) An Open Source Tool for Game Theoretic Health Data De-Identification. AMIA Annu Symp Proc 2017:1430-1439
Li, Bo; Vorobeychik, Yevgeniy; Li, Muqun et al. (2017) Scalable Iterative Classification for Sanitizing Large-Scale Datasets. IEEE Trans Knowl Data Eng 29:698-711
Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi et al. (2017) Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach. Am J Hum Genet 100:316-322
Yuan, Jiawei; Malin, Bradley; Modave, François et al. (2017) Towards a privacy preserving cohort discovery framework for clinical research networks. J Biomed Inform 66:42-51
Heatherly, Raymond; Rasmussen, Luke V; Peissig, Peggy L et al. (2016) A multi-institution evaluation of clinical profile anonymization. J Am Med Inform Assoc 23:e131-7
Li, Muqun; Carrell, David; Aberdeen, John et al. (2016) Optimizing annotation resources for natural language de-identification via a game theoretic framework. J Biomed Inform 61:97-109
Kho, Abel N; Cashy, John P; Jackson, Kathryn L et al. (2015) Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc 22:1072-80
Xia, Weiyi; Heatherly, Raymond; Ding, Xiaofeng et al. (2015) R-U policy frontiers for health data de-identification. J Am Med Inform Assoc 22:1029-41
Naveed, Muhammad; Ayday, Erman; Clayton, Ellen W et al. (2015) Privacy in the Genomic Era. ACM Comput Surv 48:
El Emam, Khaled; Rodgers, Sam; Malin, Bradley (2015) Anonymising and sharing individual patient data. BMJ 350:h1139

Showing the most recent 10 out of 44 publications