Validation of geographic masking techniques for location privacy protection

Zandbergen, Paul

Abstract

When locations of individual-level health data are released (in the form of paper or digital maps), these individuals could be re-identified through reverse geocoding. Spatial data can therefore not be released unless the locations have been modified, for example using aggregation or geographic masking. Geographic masking techniques apply transformations or perturbations to prevent the re-identification of individuals through reverse geocoding. Despite substantial attention by the research community in recent years, there is at present very limited confidence in the ability of geographic masking techniques to reliably protect individual privacy while still providing masked datasets that are representative of the original for the purpose of spatial pattern analysis. The proposed research will examine the trade-off between the need for individual privacy protection and the benefits of having individual-level health data for spatial analysis. Both existing and newly developed geographic masking techniques will be optimized within a framework of spatial k-anonymity. The research goal is to determine the effectiveness and reliability of geographic masking techniques in protecting individual privacy.
The specific research aims are: 1) determine the degree of privacy protection provided by geographic masking techniques, including both existing techniques and newly developed ones; 2) determine how the degree of privacy protection of each geographic masking technique varies with masking parameters, population density and the amount of supplementary information provided; 3) determine how robust the degree of privacy protection of each geographic masking technique is when the masking algorithm is disclosed as metadata and/or when multiple versions of the masked data are released; and 4) determine how each geographic masking technique affects the robustness of spatial analytic methods applied to the masked data. These aims will be accomplished by empirically validating the performance of a range of geographic masking techniques. High-resolution datasets from 12 counties in the US will be used to generate sample data of varying sizes. Spatial k-anonymity will be determined using an n-th nearest neighbor analysis of the masked data, which yields an empirical estimate of the probability of discovery. Simulation modeling will be employed to determine the robustness of the masked data to the disclosure of the masking algorithm and the disclosure of multiple versions of the data. Artificial clusters will be introduced to examine the effect of masking on spatial analytic methods. The final practical outcome of the research will be a set of "best practices" for using geographic masking techniques.
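
The nearest neighbor idea behind spatial k-anonymity can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the donut-style random perturbation, the function names `donut_mask` and `spatial_k`, the synthetic uniformly distributed population, the displacement range, and the definition of k as the number of candidate locations at least as close to the masked point as the true location is.

```python
import math
import random


def donut_mask(point, r_min, r_max, rng):
    """Displace a point by a random distance in [r_min, r_max] in a random direction.

    Hypothetical masking function used only for this illustration.
    """
    x, y = point
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling the squared radius uniformly spreads points evenly over the annulus area.
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return (x + radius * math.cos(angle), y + radius * math.sin(angle))


def spatial_k(original, masked, population):
    """Count candidate locations at least as close to the masked point as the original.

    Assumes `original` is itself a member of `population`, so the result is >= 1.
    A larger count means more candidates an attacker would have to consider.
    """
    d_true = math.dist(masked, original)
    return sum(1 for p in population if math.dist(masked, p) <= d_true)


# Synthetic stand-in for residential locations in a 10 km x 10 km area (metres).
rng = random.Random(42)
population = [(rng.uniform(0, 10_000), rng.uniform(0, 10_000)) for _ in range(2000)]

# Draw hypothetical case locations from the population and mask them.
cases = rng.sample(population, 50)
R_MIN, R_MAX = 100.0, 500.0  # assumed displacement range in metres
masked = [donut_mask(c, R_MIN, R_MAX, rng) for c in cases]

# Empirical k for each case, and the share of cases falling below a target k.
ks = [spatial_k(c, m, population) for c, m in zip(cases, masked)]
K_TARGET = 5  # assumed anonymity threshold
share_below = sum(k < K_TARGET for k in ks) / len(ks)

print("minimum k:", min(ks))
print("median k:", sorted(ks)[len(ks) // 2])
print("share of cases with k <", K_TARGET, ":", share_below)
```

Repeating this calculation across masking parameters and population densities would give the kind of empirical comparison the aims describe, although the actual study design may differ from this sketch.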

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21ES019666-01
Application #: 8027840
Study Section: Special Emphasis Panel (ZRG1-AARR-F (52))
Program Officer: Dilworth, Caroline H

Project Start: 2011-01-01
Project End: 2012-12-31
Budget Start: 2011-01-01
Budget End: 2011-12-31
Support Year: 1
Fiscal Year: 2011
Total Cost: $74,981
Indirect Cost

Institution

Name: University of New Mexico
Department: Miscellaneous
Type: Schools of Arts and Sciences
DUNS #: 868853094

City: Albuquerque
State: NM
Country: United States
Zip Code: 87131

Related projects


NIH 2012 R21 ES	Validation of geographic masking techniques for location privacy protection Zandbergen, Paul Adrianus / University of New Mexico	$71,187
NIH 2011 R21 ES	Validation of geographic masking techniques for location privacy protection Zandbergen, Paul Adrianus / University of New Mexico	$74,981

Publications

Zandbergen, Paul A (2014) Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data. Adv Med 2014:567049

Zandbergen, P A; Hart, T C; Lenzer, K E et al. (2012) Error propagation models to examine the effects of geocoding quality on spatial analysis of individual-level datasets. Spat Spatiotemporal Epidemiol 3:69-82
