The biomedical community is in the midst of a genomics revolution with the potential to personalize healthcare services to a patient's genome. To capitalize on recent genomics programs, scientists have initiated research to discover relationships between an individual's genomic variations and clinical phenotype. Most of the gathering and analysis of person-specific records has been localized to particular investigators or institutions; however, scientists need to share data collections to strengthen the statistical power of association tests, to allow others the opportunity to verify their analyses, and to comply with policy requirements. To facilitate this process, various organizations around the world are investing significantly in databanks to consolidate patient-specific records from disparate investigators. The availability of such databanks for widespread use is contingent on protecting the anonymity of the individuals to whom the shared records correspond. Though policy and technical approaches for biomedical records privacy exist, they are inappropriate for environments that consolidate records from multiple organizations. In particular, various investigations demonstrate that the simple de-identification of person-specific biomedical records leaves centralized records vulnerable to "re-identification" through public resources. The overarching goal of our research is to develop a novel data protection model for centralized person-specific biomedical records based on formal privacy and security methods. Our solution will be composed of a suite of technologies, each of which addresses a challenge for the construction and use of biomedical databanks.
These technologies will be developed in three specific aims: (1) build a tool to integrate research participants' biomedical records from disparate organizations without compromising participants' anonymity, (2) construct methods to securely collect, store, and analyze biomedical data without revealing individual records, and (3) detect and prevent policy violations that can arise as a consequence of investigators' queries to the databank. Our methods will be implemented in software that shields scientists and administrators from handling the technical details of unfamiliar privacy and security protocols. The final product will be software that enables disparate data holders to submit information to a centralized biomedical databank, scientists to analyze the stored records, and administrators to monitor the use of the system for privacy violations. The software will be designed in a modular and configurable manner, thus enabling users to choose which protection features are most appropriate to their environment. To demonstrate the applicability of our methodology, this research will specifically address a real-world data privacy challenge that is a bottleneck for multi-institutional genome-wide association studies, but the resulting models and software will be reusable for other centralized databanking environments. We believe that by managing biomedical records through formal privacy protection mechanisms, databanks based on our model will be able to support research with greater throughput than the status quo.

Public Health Relevance

To ensure wide-scale sharing of person-specific biomedical records for research purposes, it is necessary to build technologies that uphold the anonymity of the corresponding individuals without diminishing the usability of the records. In this research, we will focus on developing technologies to support privacy in an emerging phenomenon: biomedical databanks that centralize records from disparate data providers. The goals of this project are to develop techniques, implemented in open-source software, that a) merge biomedical records on the same subjects without revealing the subjects' identities, b) collect, store, and analyze biomedical records in a secure manner and without revealing individual records, and c) detect and mitigate policy violations while investigators interact with records stored in the databank.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Vanderbilt University Medical Center
Internal Medicine/Medicine
Schools of Medicine
United States
Heatherly, Raymond; Denny, Joshua C; Haines, Jonathan L et al. (2014) Size matters: how population size influences genotype-phenotype association studies in anonymized data. J Biomed Inform 52:243-50
Durham, Elizabeth Ashley; Kantarcioglu, Murat; Xue, Yuan et al. (2014) Composite Bloom Filters for Secure Record Linkage. IEEE Trans Knowl Data Eng 26:2956-2968
Li, Muqun; Carrell, David; Aberdeen, John et al. (2014) De-identification of clinical narratives through writing complexity measures. Int J Med Inform 83:750-67
Kuzu, Mehmet; Kantarcioglu, Murat; Durham, Elizabeth Ashley et al. (2013) A practical approach to achieve private medical record linkage in light of public resources. J Am Med Inform Assoc 20:285-92
Heatherly, Raymond D; Loukides, Grigorios; Denny, Joshua C et al. (2013) Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS One 8:e53875
Altman, Russ B; Clayton, Ellen Wright; Kohane, Isaac S et al. (2013) Data re-identification: societal safeguards. Science 339:1032-3
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk et al. (2013) A secure distributed logistic regression protocol for the detection of rare adverse drug events. J Am Med Inform Assoc 20:453-61
Tamersoy, Acar; Loukides, Grigorios; Nergiz, Mehmet Ercan et al. (2012) Anonymization of longitudinal electronic medical records. IEEE Trans Inf Technol Biomed 16:413-23
Canim, Mustafa; Kantarcioglu, Murat; Malin, Bradley (2012) Secure management of biomedical data with cryptographic hardware. IEEE Trans Inf Technol Biomed 16:166-75
Malin, Bradley; Loukides, Grigorios; Benitez, Kathleen et al. (2011) Identifiability in biobanks: models, measures, and mitigation strategies. Hum Genet 130:383-92

Showing the most recent 10 out of 16 publications