Technologies to Enable Privacy in Biomedical Databanks

Malin, Bradley

Abstract

The biomedical community is in the midst of a genomics revolution with the potential to tailor healthcare services to a patient's genome. To capitalize on recent genomics programs, scientists have initiated research to discover relationships between an individual's genomic variations and clinical phenotype. Most of the gathering and analysis of person-specific records has been localized to particular investigators or institutions; however, scientists need to share data collections to strengthen the statistical power of association tests, to allow others the opportunity to verify their analyses, and to comply with policy requirements. To facilitate this process, various organizations around the world are investing significantly in databanks that consolidate patient-specific records from disparate investigators. The availability of such databanks for widespread use is contingent on protecting the anonymity of the individuals to whom the shared records correspond. Though policy and technical approaches for biomedical records privacy exist, they are inappropriate for environments that consolidate records from multiple organizations. In particular, various investigations demonstrate that the simple de-identification of person-specific biomedical records leaves centralized records vulnerable to "re-identification" through public resources. The overarching goal of our research is to develop a novel data protection model for centralized person-specific biomedical records based on formal privacy and security methods. Our solution will be composed of a suite of technologies, each of which addresses a challenge in the construction and use of biomedical databanks.
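To make the re-identification risk concrete, the sketch below shows a generic linkage attack: a table stripped of names but still carrying demographic quasi-identifiers (ZIP code, birth date, sex) is joined against a public, identified list. All records, field names, and the `reidentify` helper are hypothetical and for illustration only; this is not the project's method, just a minimal demonstration of why removing explicit identifiers alone may not suffice.

```python
# Hypothetical "de-identified" research records: names removed, but
# quasi-identifiers (ZIP code, birth date, sex) retained with a diagnosis.
deidentified = [
    {"zip": "37212", "dob": "1965-03-14", "sex": "F", "diagnosis": "type 2 diabetes"},
    {"zip": "37203", "dob": "1980-11-02", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public resource (e.g., a voter-style list) that pairs names
# with the same demographic fields.
public_list = [
    {"name": "Alice Example", "zip": "37212", "dob": "1965-03-14", "sex": "F"},
    {"name": "Bob Example", "zip": "37215", "dob": "1972-07-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def reidentify(records, resource, keys=QUASI_IDENTIFIERS):
    """Join two tables on quasi-identifiers and return name -> diagnosis for
    every record whose quasi-identifier combination matches exactly one
    entry in the public resource."""
    index = {}
    for person in resource:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for rec in records:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[candidates[0]] = rec["diagnosis"]
    return matches


print(reidentify(deidentified, public_list))
# {'Alice Example': 'type 2 diabetes'}
```

Only exact, unique matches are counted here; a real attacker could also exploit partial matches, which is one reason formal protection models are preferred over ad hoc field removal.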
These technologies will be developed in three specific aims: (1) build a tool to integrate research participants' biomedical records from disparate organizations without compromising participants' anonymity, (2) construct methods to securely collect, store, and analyze biomedical data without revealing individual records, and (3) detect and prevent policy violations that can arise as a consequence of investigators' queries to the databank. Our methods will be implemented in software that shields scientists and administrators from handling the technical details of unfamiliar privacy and security protocols. The final product will be software that enables disparate data holders to submit information to a centralized biomedical databank, scientists to analyze the stored records, and administrators to monitor use of the system for privacy violations. The software will be designed in a modular and configurable manner, enabling users to pick and choose the protection features most appropriate to their environment. To demonstrate the applicability of our methodology, this research will specifically address a real-world data privacy challenge that is a bottleneck for multi-institutional genome-wide association studies, but the resulting models and software will be reusable in other centralized databanking environments. We believe that by managing biomedical records through formal privacy protection mechanisms, databanks based on our model will be able to support research with greater throughput than the current status quo.
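As one illustration of the idea behind aim (1), integrating records without revealing identities, the sketch below uses a common building block of privacy-preserving record linkage: each data holder normalizes a participant's identifiers and replaces them with a keyed hash (HMAC-SHA256) under a secret shared among the data holders but withheld from the databank. The databank then links records by token equality without seeing names. The field choice, the `normalize`/`encode` helpers, and the key value are all hypothetical assumptions, not the project's actual protocol.

```python
import hashlib
import hmac
import unicodedata


def normalize(first, last, dob):
    """Canonicalize identifiers (strip accents, lowercase, drop whitespace)
    so trivial formatting differences between sites still match."""
    text = f"{first}|{last}|{dob}"
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return "".join(text.lower().split())


def encode(first, last, dob, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of the normalized identifiers."""
    msg = normalize(first, last, dob).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


# Hypothetical secret agreed on by the data holders and NOT given to the databank.
SHARED_KEY = b"hypothetical-secret-agreed-by-sites"

# Each site submits only tokens plus its own payload (hypothetical values).
site_a = {encode("José", "Garcia", "1965-03-14", SHARED_KEY): {"genotype_id": "A-17"}}
site_b = {encode("Jose", "GARCIA ", "1965-03-14", SHARED_KEY): {"phenotype": "T2D"}}

# The databank sees only opaque tokens and links on token equality.
linked = {tok: {**site_a[tok], **site_b[tok]} for tok in site_a.keys() & site_b.keys()}
print(len(linked))  # 1
```

Exact-match keyed hashing fails on typographical errors in the underlying identifiers; encodings that tolerate approximate matches (for example, Bloom-filter encodings of name fragments) are a common alternative. The key must also stay out of the databank's hands, since anyone holding it could hash candidate identities and test them against the tokens.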

Public Health Relevance

To ensure wide-scale sharing of person-specific biomedical records for research purposes, it is necessary to build technologies that uphold the anonymity of the corresponding individuals without diminishing the usability of the records. In this research, we will focus on developing technologies to support privacy in an emerging phenomenon: biomedical databanks that centralize records from disparate data providers. The goals of this project are to develop techniques, implemented in open-source software, that a) merge biomedical records on the same subjects without revealing the subjects' identities, b) collect, store, and analyze biomedical records in a secure manner and without revealing individual records, and c) detect and mitigate policy violations while investigators interact with records stored in the databank.
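Goal b), analyzing records without revealing individual contributions, can be illustrated with additive secret sharing, a standard secure-computation technique. In the hypothetical sketch below, each site splits its minor-allele count for a single variant into random shares and sends one share to each of several servers, assumed not to collude. Each server sums only the shares it holds, and combining the servers' partial sums reveals just the aggregate count. The modulus, site names, and counts are illustrative assumptions; this is not presented as the project's design.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this (illustrative) prime


def share(value, n_parties):
    """Split an integer into n random additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % MODULUS


# Hypothetical per-site minor-allele counts for one variant.
site_counts = {"site_1": 120, "site_2": 85, "site_3": 42}
n_servers = 3

# Each site sends one share of its count to each (assumed non-colluding) server.
server_inbox = [[] for _ in range(n_servers)]
for count in site_counts.values():
    for server, s in enumerate(share(count, n_servers)):
        server_inbox[server].append(s)

# Each server sums only its own shares; a single share is uniformly random.
partial_sums = [sum(inbox) % MODULUS for inbox in server_inbox]

# Combining the partial sums yields only the aggregate, not any site's count.
total = reconstruct(partial_sums)
print(total)  # 247
```

The guarantee rests on the non-collusion assumption: if all servers pooled their shares of one site's submission, that site's individual count would be revealed.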

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009989-05
Application #: 8528721
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 2009-09-01
Project End: 2015-08-31
Budget Start: 2013-09-01
Budget End: 2014-08-31
Support Year: 5
Fiscal Year: 2013
Total Cost: $241,696
Indirect Cost: $60,765

Institution

Name: Vanderbilt University Medical Center
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 004413456

City: Nashville
State: TN
Country: United States
Zip Code: 37212

Related projects


NIH 2014 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2013 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$241,696
NIH 2012 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$255,061
NIH 2011 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$261,469
NIH 2010 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$272,423
NIH 2010 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$1,860
NIH 2009 R01 LM	Technologies to Enable Privacy in Biomedical Databanks Malin, Bradley A. / Vanderbilt University Medical Center	$286,964

Publications

Prasser, Fabian; Gaupp, James; Wan, Zhiyu et al. (2017) An Open Source Tool for Game Theoretic Health Data De-Identification. AMIA Annu Symp Proc 2017:1430-1439

Li, Bo; Vorobeychik, Yevgeniy; Li, Muqun et al. (2017) Scalable Iterative Classification for Sanitizing Large-Scale Datasets. IEEE Trans Knowl Data Eng 29:698-711

Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi et al. (2017) Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach. Am J Hum Genet 100:316-322

Yuan, Jiawei; Malin, Bradley; Modave, François et al. (2017) Towards a privacy preserving cohort discovery framework for clinical research networks. J Biomed Inform 66:42-51

Heatherly, Raymond; Rasmussen, Luke V; Peissig, Peggy L et al. (2016) A multi-institution evaluation of clinical profile anonymization. J Am Med Inform Assoc 23:e131-7

Li, Muqun; Carrell, David; Aberdeen, John et al. (2016) Optimizing annotation resources for natural language de-identification via a game theoretic framework. J Biomed Inform 61:97-109

Kho, Abel N; Cashy, John P; Jackson, Kathryn L et al. (2015) Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc 22:1072-80

Xia, Weiyi; Heatherly, Raymond; Ding, Xiaofeng et al. (2015) R-U policy frontiers for health data de-identification. J Am Med Inform Assoc 22:1029-41

Naveed, Muhammad; Ayday, Erman; Clayton, Ellen W et al. (2015) Privacy in the Genomic Era. ACM Comput Surv 48:

El Emam, Khaled; Rodgers, Sam; Malin, Bradley (2015) Anonymising and sharing individual patient data. BMJ 350:h1139

Showing the most recent 10 out of 44 publications
