This project seeks to increase the availability of detailed research data about a person's neighborhood and individual characteristics, behaviors, and health outcomes, information that is crucial for research on critical national issues, such as health disparities. However, a delicate balance must be struck between providing easy access to these data and protecting the anonymity of study participants. Responding to the rising demand for contextualized microdata, large national surveys typically collect meticulous information about their subjects' personal and geographic attributes. When data are prepared for public-use files, however, much of this important detail is either suppressed or coarsened to protect the anonymity of respondents. These limitations reduce opportunities for important scientific research and impose costly burdens on producers and distributors who must implement restrictive data use agreements. Little is known about how the ability to protect a respondent's identity (i.e., disclosure risk) is affected by releasing microdata files that contain the contextual attributes of counties, tracts, block groups, and 1/2-mile geographic areas surrounding each subject. Considering factors that are determined at the outset of a study, it is not known how the disclosure risk of contextualized microdata is affected by varying levels of sensitive information, or by different sampling designs and analytical purposes. Turning to factors that are usually addressed after data collection, when research files are prepared for dissemination, it is not known to what extent disclosure risk and the scientific value of data are affected by the selection of different variables for release or the application of various statistical techniques to limit disclosure. With a priori knowledge of these determinants, data producers will be able to anticipate how many and which respondents are at risk of disclosure, and adapt their data collection methods to protect them.
Such adjustments will preserve and enhance the utility of the data for broad dissemination. Also, factors that affect data collection efficiencies can then be measured, allowing for the estimation of survey costs associated with modifying sampling designs to meet disclosure goals. Hence, this project seeks to incorporate disclosure risk into the conceptual and empirical frameworks used in the evaluation of survey designs. In so doing, we first develop and validate models that predict the composition of survey data under different sampling designs. Next, we develop measures and methods for assessing disclosure risk, analytical utility, and survey costs that are best suited for evaluating sampling and database designs. Lastly, we conduct simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.

Public Health Relevance

Our project will increase the value and availability of scientific data by developing ways to assess, at the earliest stages of research, the risks of disclosing confidential information about study subjects. Detailed data about people's neighborhoods, characteristics, behaviors, and health are essential for informing policy and advancing science. But a balance must be struck between providing easy access to such data and protecting confidential information. By evaluating such disclosure risks in the design phase of research, we will enhance investments in data collection and increase the value and availability of data on detailed subpopulations and their environments.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Research Project (R01)
Project #
4R01HD067184-04
Application #
8991247
Study Section
Special Emphasis Panel (ZRG1-HDM-Q (54))
Program Officer
Bures, Regina M
Project Start
2012-01-16
Project End
2017-11-30
Budget Start
2015-12-01
Budget End
2016-11-30
Support Year
4
Fiscal Year
2016
Total Cost
$444,272
Indirect Cost
$136,552
Name
University of Michigan Ann Arbor
Department
Type
Organized Research Units
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109