A Risk Management Framework for Identifiability in Genomics Research

Malin, Bradley

Abstract

When the Human Genome Project was completed almost ten years ago, it cost millions of dollars to sequence an individual's genome. Yet the evolution of high-throughput sequencing and computational tools has been swift, and it will soon be possible to genotype anyone for a nominal price. The ability to generate genomic data coincides with the adoption of electronic health records, setting the stage for large-scale personalized medicine research, the results of which can improve the efficiency, effectiveness, and safety of healthcare delivery. To ease barriers to population-based research, genomic and clinical data are often made available under a de-identified designation defined by various policies and regulations. However, there is a growing perception that de-identification is a fallacy and that biomedical data can be re-identified with relative ease. This argument, which is partially based on our own studies, forms the core of calls for legislative and regulatory modifications in the literature and court cases. Most notably, a recent Advance Notice of Proposed Rulemaking (ANPRM) asks whether biospecimens, as well as derived genomic data, should be redefined as inherently identifiable. Such labeling would require changes to the Common Rule and HIPAA Privacy Rule and could influence the availability of genomic data for research. It is clear that only a small amount of genomic data is necessary to uniquely distinguish an individual, even in the context of aggregated statistics. At the same time, it must be recognized that "distinguishable" is not equivalent to "identifiable": that re-identification is possible does not imply that it is probable. Identifiability concerns should not be trivialized, but there is currently no sound basis for reasoning about such risks, which limits the ability to make informed policy decisions.
There are many factors associated with identifiability, including the information shared with genomic data (e.g., clinical, demographic), with whom it is shared, what other sources of data exist, and the relevant legal landscape. A limiting factor of prior studies in genomic identifiability is that they consider these factors in isolation, which provides an incomplete picture. To fill this void, the overarching objective of our research is to engineer a foundation, rooted in ethical, legal, and computational formalisms, that provides a basis for reasoning about, and managing, genomic data identifiability risks. This foundation will be realized through three specific aims: (1) build a protocol for modeling the extent to which sharing genomic data can substantiate re-identification concerns, (2) design and evaluate practical measures of genomic identifiability for risk assessment protocols, and (3) develop a strategy that supplies options to mitigate genomic data identification risks. We envision several notable outcomes from this project. First, this work will yield guidelines and risk assessment strategies that can be employed by genomic data managers and policy makers to inform their decisions regarding identifiability. Second, we will perform an evaluation of our framework with a real, large de-identified database of clinical and genomic data to provide tangible and pragmatic results.
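
The distinction the abstract draws between "distinguishable" and "identifiable" can be illustrated with a toy calculation. The following Python sketch is not the project's risk measure; the function names, the genotype encoding (minor-allele counts at three hypothetical SNPs), and both datasets are assumptions made up purely for illustration. It counts how many released records are unique within the release, and how many of those could also be tied to exactly one record in a separate, named reference source.

```python
from collections import Counter


def distinguishable_fraction(profiles):
    """Fraction of released records whose profile is unique in the release."""
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / len(profiles)


def linkable_fraction(profiles, named_reference):
    """Fraction of released records that are unique in the release AND match
    exactly one profile in a (hypothetical) named reference source."""
    counts = Counter(profiles)
    ref_counts = Counter(named_reference)
    hits = sum(1 for p in profiles if counts[p] == 1 and ref_counts[p] == 1)
    return hits / len(profiles)


# Toy data (assumption): each tuple holds minor-allele counts at three SNPs.
released = [(0, 1, 2), (0, 1, 2), (1, 1, 0), (2, 0, 1), (0, 0, 0)]
# Toy named source (assumption): only one released person also appears here.
reference = [(1, 1, 0), (2, 2, 2)]

print(distinguishable_fraction(released))      # share of unique records
print(linkable_fraction(released, reference))  # share that is also linkable
```

In this toy setting three of the five released records are unique, but only one of them can be linked to a named record, which is the gap between possible and probable re-identification that the abstract describes.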

Public Health Relevance

The protective nature of de-identification has been criticized, and there are growing calls to relabel all genomic data as inherently identifiable. However, there are no reasoning tools to help genomic data managers and policy makers assess identifiability or determine which protections, technical or legal, should be invoked to mitigate risks. The goals of this research project are to develop an interdisciplinary framework to a) model genomic data re-identification risks, b) measure the risks given computational and socio-legal constraints, and c) assist in determining which data protection strategies are most appropriate for specific data sharing scenarios.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006844-02
Application #: 8548389
Study Section: Special Emphasis Panel (SEIR)
Program Officer: McEwen, Jean

Project Start: 2012-09-21
Project End: 2016-06-30
Budget Start: 2013-07-01
Budget End: 2014-06-30
Support Year: 2
Fiscal Year: 2013
Total Cost: $334,251
Indirect Cost: $92,185

Institution

Name: Vanderbilt University Medical Center
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 004413456

City: Nashville
State: TN
Country: United States
Zip Code: 37212

Related projects


NIH 2019 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2018 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2017 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2016 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center	$268,434
NIH 2015 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2015 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center
NIH 2014 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center	$343,000
NIH 2013 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center	$334,251
NIH 2012 R01 HG	A Risk Management Framework for Identifiability in Genomics Research Malin, Bradley A. / Vanderbilt University Medical Center	$395,184

Publications

Xia, Weiyi; Wan, Zhiyu; Yin, Zhijun et al. (2018) It's all in the timing: calibrating temporal penalties for biomedical data sharing. J Am Med Inform Assoc 25:25-31

Wan, Zhiyu; Vorobeychik, Yevgeniy; Kantarcioglu, Murat et al. (2017) Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services. BMC Med Genomics 10:39

Wang, Shuang; Jiang, Xiaoqian; Tang, Haixu et al. (2017) A community effort to protect genomic data sharing, collaboration and outsourcing. NPJ Genom Med 2:33

Prasser, Fabian; Gaupp, James; Wan, Zhiyu et al. (2017) An Open Source Tool for Game Theoretic Health Data De-Identification. AMIA Annu Symp Proc 2017:1430-1439

Li, Bo; Vorobeychik, Yevgeniy; Li, Muqun et al. (2017) Scalable Iterative Classification for Sanitizing Large-Scale Datasets. IEEE Trans Knowl Data Eng 29:698-711

Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi et al. (2017) Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach. Am J Hum Genet 100:316-322

Yuan, Jiawei; Malin, Bradley; Modave, François et al. (2017) Towards a privacy preserving cohort discovery framework for clinical research networks. J Biomed Inform 66:42-51

Heatherly, Raymond; Rasmussen, Luke V; Peissig, Peggy L et al. (2016) A multi-institution evaluation of clinical profile anonymization. J Am Med Inform Assoc 23:e131-7

Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi et al. (2015) A game theoretic framework for analyzing re-identification risk. PLoS One 10:e0120592

Xia, Weiyi; Heatherly, Raymond; Ding, Xiaofeng et al. (2015) R-U policy frontiers for health data de-identification. J Am Med Inform Assoc 22:1029-41

Showing the most recent 10 out of 18 publications
