When the locations of individual-level health data are released (as paper or digital maps), the individuals concerned could be re-identified through reverse geocoding. Such spatial data therefore cannot be released unless the locations have been modified, for example through aggregation or geographic masking. Geographic masking techniques apply transformations or perturbations to the locations to prevent the re-identification of individuals through reverse geocoding. Despite substantial attention from the research community in recent years, there is at present very limited confidence that geographic masking techniques can reliably protect individual privacy while still producing masked datasets that are representative of the original data for the purpose of spatial pattern analysis. The proposed research will examine the trade-off between the need to protect individual privacy and the benefits of having individual-level health data for spatial analysis. Both existing and newly developed geographic masking techniques will be optimized within a framework of spatial k-anonymity. The research goal is to determine the effectiveness and reliability of geographic masking techniques in protecting individual privacy.
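As an illustration of what a geographic masking technique does, the sketch below implements donut masking, one commonly described perturbation method. The abstract does not say which techniques the project evaluates, so the choice of technique, the function name `donut_mask`, and the parameters `r_min`/`r_max` are all hypothetical. Each point is moved in a random direction by a distance between a minimum radius (so no point stays close to its true location) and a maximum radius (which bounds the spatial error). Coordinates are assumed to be planar, in metres.

```python
import math
import random

def donut_mask(points, r_min, r_max, rng=None):
    """Hypothetical donut-masking sketch (not the project's implementation).

    Moves each (x, y) point in a uniformly random direction by a distance
    between r_min and r_max. Coordinates are assumed to be planar (e.g. metres
    in a projected coordinate system), not latitude/longitude.
    """
    if not 0 <= r_min < r_max:
        raise ValueError("require 0 <= r_min < r_max")
    rng = rng or random.Random()
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Square-root transform spreads displaced points uniformly over the
        # annulus area rather than clustering them near the inner radius.
        u = rng.random()
        d = math.sqrt(r_min ** 2 + u * (r_max ** 2 - r_min ** 2))
        masked.append((x + d * math.cos(angle), y + d * math.sin(angle)))
    return masked

# Example: mask three hypothetical residence locations with a 50-200 m donut.
example = donut_mask([(0.0, 0.0), (120.0, 40.0), (-35.0, 310.0)], 50.0, 200.0,
                     random.Random(7))
```

Drawing the distance uniformly between the two radii would also be a defensible choice, but it places proportionally more masked points near the inner ring; the square-root form used here spreads them evenly over the area of the donut.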
The specific research aims are to: 1) determine the degree of privacy protection provided by geographic masking techniques, including both existing techniques and newly developed ones; 2) determine how the degree of privacy protection of each geographic masking technique varies with masking parameters, population density and the amount of supplementary information provided; 3) determine how robust the degree of privacy protection of each geographic masking technique is when the masking algorithm is disclosed as metadata and/or when multiple versions of the masked data are released; and 4) determine how each geographic masking technique affects the robustness of spatial analytic methods applied to the masked data. These aims will be accomplished by empirically validating the performance of a range of geographic masking techniques. High-resolution datasets from 12 US counties will be used to generate sample data of varying sizes. Spatial k-anonymity will be determined using an n-th nearest neighbor analysis of the masked data, which effectively provides an empirical estimate of the probability of discovery. Simulation modeling will be employed to determine the robustness of the masked data to the disclosure of the masking algorithm and to the disclosure of multiple versions of the data. Artificial clusters will be introduced into the data to examine the effect of masking on spatial analytic methods. The final practical outcome of the research will be a set of "best practices" for using geographic masking techniques.
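The n-th nearest neighbor estimate of spatial k-anonymity, and the multiple-release risk named in aim 3, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: k is taken to be the number of candidate addresses at least as close to the masked location as the true address (so k = 1 means the true address is the nearest neighbor), the probability of discovery is approximated as 1/k, the masking step is a simple random displacement, and the function names and the synthetic 10 m address grid are hypothetical.

```python
import math
import random

def spatial_k(true_pt, masked_pt, addresses):
    """Spatial k-anonymity of one masked point: the rank of the true location
    among all candidate addresses ordered by distance from the masked point.
    k = 1 means the true address is the nearest one (worst case).
    Assumes `addresses` contains `true_pt`."""
    d_true = math.dist(masked_pt, true_pt)
    return sum(1 for a in addresses if math.dist(masked_pt, a) <= d_true)

def discovery_probability(k):
    """Simplifying assumption: the attacker guesses uniformly among the k
    addresses at least as close to the masked point as the true one."""
    return 1.0 / k

def random_displace(pt, r_max, rng):
    """Hypothetical mask: random direction, distance uniform in [0, r_max]."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    d = rng.uniform(0.0, r_max)
    return (pt[0] + d * math.cos(angle), pt[1] + d * math.sin(angle))

def averaging_attack(true_pt, r_max, n_releases, rng):
    """Mask the same point independently n times and return the centroid an
    attacker could compute if all masked versions were released."""
    versions = [random_displace(true_pt, r_max, rng) for _ in range(n_releases)]
    xs, ys = zip(*versions)
    return (sum(xs) / n_releases, sum(ys) / n_releases)

if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic address grid (10 m spacing) standing in for real address data
    addresses = [(10.0 * i, 10.0 * j) for i in range(60) for j in range(60)]
    true_pt = (300.0, 300.0)
    for n in (1, 5, 50):
        guess = averaging_attack(true_pt, 100.0, n, rng)
        k = spatial_k(true_pt, guess, addresses)
        print(f"{n:>3} releases: k = {k}, p(discovery) ~ {discovery_probability(k):.3f}")
```

Because every independent release is displaced around the same true location, the centroid of many releases tends to drift back toward it, so k typically shrinks as more versions are released. This is the kind of erosion that simulation modeling of multiple releases would need to quantify.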

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21ES019666-02
Application #
8204866
Study Section
Special Emphasis Panel (ZRG1-AARR-F (52))
Program Officer
Dilworth, Caroline H
Project Start
2011-01-01
Project End
2012-12-31
Budget Start
2012-01-01
Budget End
2012-12-31
Support Year
2
Fiscal Year
2012
Total Cost
$71,187
Indirect Cost
$22,552
Organization Name
University of New Mexico
Department
Miscellaneous
Organization Type
Schools of Arts and Sciences
DUNS #
868853094
City
Albuquerque
State
NM
Country
United States
Zip Code
87131