Addressing Disclosure Risk of Contextualized Microdata in Survey Design

Elliott, Michael; Brown, Daniel; Leicht, Kevin; Raghunathan, Trivellore; Witkowski, Kristine

Abstract

This project seeks to increase the availability of detailed research data about a person's neighborhood and individual characteristics, behaviors, and health outcomes, information that is crucial for research on critical national issues such as health disparities. However, a delicate balance must be struck between providing easy access to these data and protecting the anonymity of study participants. Responding to the rising demand for contextualized microdata, large national surveys typically collect meticulous information about their subjects' personal and geographic attributes. When data are prepared for public-use files, however, much of this important detail is either suppressed or coarsened to protect the anonymity of respondents. These limitations reduce opportunities for important scientific research and impose costly burdens on producers and distributors, who must implement restrictive data use agreements. Little is known about how the ability to protect a respondent's identity (i.e., disclosure risk) is affected by releasing microdata files that contain the contextual attributes of the counties, tracts, block groups, and 1/2-mile geographic areas surrounding each subject. Among factors that are determined at the outset of a study, it is not known how the disclosure risk of contextualized microdata is affected by varying levels of sensitive information, or by different sampling designs and analytical purposes. Among factors that are usually addressed after data collection, when research files are prepared for dissemination, it is not known to what extent disclosure risk and the scientific value of the data are affected by the selection of different variables for release or by the application of various statistical techniques to limit disclosure. With a priori knowledge of these determinants, data producers will be able to anticipate how many and which respondents are at risk of disclosure, and adapt their data collection methods to protect them. 
Such adjustments will preserve and enhance the utility of the data for broad dissemination. In addition, factors that affect data collection efficiency can then be measured, allowing survey costs associated with modifying sampling designs to meet disclosure goals to be estimated. Hence, this project seeks to incorporate disclosure risk into the conceptual and empirical frameworks used to evaluate survey designs. To do so, we first develop and validate models that predict the composition of survey data under different sampling designs. Next, we develop the measures and methods for assessing disclosure risk, analytical utility, and disclosure-related survey costs that are best suited to evaluating sampling and database designs. Lastly, we conduct simulations to obtain estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.
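The abstract does not say which disclosure risk measures the project uses. As a minimal, hypothetical illustration of why attaching contextual attributes can put more respondents at risk, the sketch below counts sample uniques: respondents whose combination of quasi-identifiers occurs only once in the file, a common and simple proxy for disclosure risk. The field names (`age_group`, `sex`, `tract_poverty`), the toy records, and the helper `sample_uniques` are all invented for illustration; this is not the project's method.

```python
from collections import Counter

# Hypothetical toy file: two individual attributes plus one contextual
# (tract-level) attribute. Field names and values are invented.
records = [
    {"age_group": "30-39", "sex": "F", "tract_poverty": "high"},
    {"age_group": "30-39", "sex": "F", "tract_poverty": "low"},
    {"age_group": "40-49", "sex": "M", "tract_poverty": "low"},
    {"age_group": "40-49", "sex": "M", "tract_poverty": "low"},
    {"age_group": "30-39", "sex": "M", "tract_poverty": "high"},
]


def sample_uniques(records, quasi_identifiers):
    """Return indices of records whose quasi-identifier combination occurs once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] == 1]


# Individual attributes only, then with the contextual attribute attached.
individual_only = sample_uniques(records, ["age_group", "sex"])
with_context = sample_uniques(records, ["age_group", "sex", "tract_poverty"])
print(individual_only)  # [4]
print(with_context)     # [0, 1, 4]
```

In this toy file, attaching the tract-level attribute turns one sample unique into three. A fuller assessment would also consider whether a sample unique is likely to be unique in the population, which depends on the sampling design.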

Public Health Relevance

Our project will increase the value and availability of scientific data by developing ways to assess, at the earliest stages of research, the risks of disclosing confidential information about study subjects. Detailed data about people's neighborhoods, characteristics, behaviors, and health are essential for informing policy and advancing science. But a balance must be struck between providing easy access to such data and protecting confidential information. By evaluating such disclosure risks in the design phase of research, we will enhance investments in data collection and increase the value and availability of data on detailed subpopulations and their environments.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Research Project (R01)
Project #: 1R01HD067184-01A1
Application #: 8234528
Study Section: Special Emphasis Panel (ZRG1-HDM-Q (54))
Program Officer: Bures, Regina M

Project Start: 2012-01-16
Project End: 2016-11-30
Budget Start: 2012-01-16
Budget End: 2012-11-30
Support Year: 1
Fiscal Year: 2012
Total Cost: $598,260
Indirect Cost: $193,517

Institution

Name: University of Michigan Ann Arbor
Department
Type: Organized Research Units
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects


NIH 2017 R01 HD | Addressing Disclosure Risk of Contextualized Microdata in Survey Design | Elliott, Michael R.; Brown, Daniel G.; Raghunathan, Trivellore E.; Witkowski, Kristine Marie | University of Michigan Ann Arbor | $444,503
NIH 2016 R01 HD | Addressing Disclosure Risk of Contextualized Microdata in Survey Design | Elliott, Michael R.; Brown, Daniel G.; Leicht, Kevin; Raghunathan, Trivellore E.; Witkowski, Kristine Marie | University of Michigan Ann Arbor | $444,272
NIH 2015 R01 HD | Addressing Disclosure Risk of Contextualized Microdata in Survey Design | Elliott, Michael R.; Brown, Daniel G.; Leicht, Kevin; Raghunathan, Trivellore E.; Witkowski, Kristine Marie | University of Michigan Ann Arbor | $446,122
NIH 2013 R01 HD | Addressing Disclosure Risk of Contextualized Microdata in Survey Design | Elliott, Michael R.; Brown, Daniel G.; Leicht, Kevin; Raghunathan, Trivellore E.; Witkowski, Kristine Marie | University of Michigan Ann Arbor | $553,314
NIH 2012 R01 HD | Addressing Disclosure Risk of Contextualized Microdata in Survey Design | Elliott, Michael R.; Brown, Daniel G.; Leicht, Kevin; Raghunathan, Trivellore E.; Witkowski, Kristine Marie | University of Michigan Ann Arbor | $598,260