When the locations of individual-level health data are released, whether as paper or digital maps, the individuals concerned could be re-identified through reverse geocoding. Spatial data can therefore not be released unless the locations have been modified, for example through aggregation or geographic masking. Geographic masking techniques apply transformations or perturbations to the locations to prevent the re-identification of individuals through reverse geocoding. Despite substantial attention from the research community in recent years, there is at present very limited confidence that geographic masking techniques can reliably protect individual privacy while still producing masked datasets that are representative of the original for the purpose of spatial pattern analysis. The proposed research will examine the trade-off between the need for individual privacy protection and the benefits of having individual-level health data for spatial analysis. Both existing and newly developed geographic masking techniques will be optimized within a framework of spatial k-anonymity. The research goal is to determine the effectiveness and reliability of geographic masking techniques in protecting individual privacy.
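As a rough illustration of what a perturbation-based mask can look like, the sketch below displaces each point by a random distance in a random direction, keeping the displacement within an annulus (an approach sometimes called donut masking). It is not one of the specific techniques evaluated in this project, which the abstract does not name. The function name `donut_mask`, the parameters `r_min` and `r_max`, and the use of planar coordinates in metres are all assumptions made for the example.

```python
import math
import random


def donut_mask(points, r_min, r_max, seed=None):
    """Displace each (x, y) point by a random offset inside an annulus.

    Hypothetical helper for illustration only. Coordinates are assumed
    to be planar (e.g. projected metres), not latitude/longitude.
    """
    if r_min < 0 or r_max < r_min:
        raise ValueError("require 0 <= r_min <= r_max")
    rng = random.Random(seed)
    masked = []
    for x, y in points:
        # Random direction of displacement.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Sampling the squared radius uniformly spreads displaced points
        # evenly over the area of the annulus.
        radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        masked.append((x + radius * math.cos(angle),
                       y + radius * math.sin(angle)))
    return masked


original = [(0.0, 0.0), (100.0, 50.0)]
masked = donut_mask(original, r_min=50.0, r_max=200.0, seed=1)
```

The minimum radius keeps a masked point from landing on or very near its true location, while the maximum radius bounds how far the spatial pattern is distorted. This is the privacy versus analytic utility trade-off described above, expressed through two masking parameters.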
The specific research aims are: 1) determine the degree of privacy protection provided by geographic masking techniques, including both existing techniques and newly developed ones; 2) determine how the degree of privacy protection of each geographic masking technique varies with masking parameters, population density and the amount of supplementary information provided; 3) determine how robust the degree of privacy protection of each geographic masking technique is when the masking algorithm is disclosed as metadata and/or when multiple versions of the masked data are released; and 4) determine how each geographic masking technique affects the robustness of spatial analytic methods applied to the masked data. These aims will be accomplished by empirically validating the performance of a range of geographic masking techniques. High-resolution datasets from 12 counties in the US will be used to generate sample data of varying sizes. Spatial k-anonymity will be determined using an n-th nearest neighbor analysis of the masked data, which yields an empirical estimate of the probability of discovery. Simulation modeling will be employed to determine the robustness of the masked data to the disclosure of the masking algorithm and to the release of multiple versions of the data. Artificial clusters will be introduced to examine the effect of masking on spatial analytic methods. The final practical outcome of the research will be a set of "best practices" for using geographic masking techniques.
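The nearest neighbor estimate of spatial k-anonymity can be sketched as follows. The abstract does not give the exact procedure, so this assumes one common reading: for each masked point, k is the rank of the true location among all candidate locations (for example, residential addresses) ordered by distance from the masked point, and the probability of discovery is approximated as 1/k. The names `spatial_k` and `discovery_probability` are hypothetical.

```python
import math


def spatial_k(original, masked, candidates):
    """Rank of the true location among candidates, by distance from the mask.

    Assumed definition for illustration: k counts the candidate locations
    that lie at least as close to the masked point as the true location does.
    """
    d_true = math.dist(masked, original)
    k = sum(1 for c in candidates if math.dist(masked, c) <= d_true)
    # If the true location is not itself among the candidates, k could be 0;
    # clamp to 1 so that the probability estimate stays defined.
    return max(1, k)


def discovery_probability(original, masked, candidates):
    """Empirical estimate of the chance of picking the true location."""
    return 1.0 / spatial_k(original, masked, candidates)


# Four candidate residences on a line; the true one is at (2, 0).
candidates = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
k = spatial_k(original=(2.0, 0.0), masked=(0.5, 0.0), candidates=candidates)
```

In this toy case three candidates lie within the distance from the masked point to the true location, so k is 3 and the estimated probability of discovery is one in three. The brute-force scan is quadratic across a full dataset; a spatial index would be a natural replacement at county scale.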
Zandbergen, Paul A (2014) Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data. Adv Med 2014:567049
Zandbergen, P A; Hart, T C; Lenzer, K E et al. (2012) Error propagation models to examine the effects of geocoding quality on spatial analysis of individual-level datasets. Spat Spatiotemporal Epidemiol 3:69-82