Demand for small area estimates is growing rapidly among researchers, analysts, decision-makers, and community planners, who use these data to advance knowledge on issues affecting communities and the lives of their residents. Statistical agencies regularly collect survey data from small geographic areas but are often prevented from publicly releasing these data in microdata form because of the confidentiality risks associated with releasing small area identifiers. The main objective of this proposal is to develop a methodology for generating fully synthetic micro-level datasets that permit valid estimation of small area statistics while protecting the confidentiality of respondents' data. The proposed methodology will use well-known Bayesian hierarchical modeling techniques to generate simulated (or imputed) values based on an assumed prediction model for a commonly used set of variables found in public-use datasets. The modeling approach will account for the different levels of variation occurring at the county and state levels, so that the synthetic data produce valid inferences for both levels of geography. Parametric and nonparametric modeling strategies will be considered, and small area inferences based on the actual data and the synthetic data will be compared to evaluate the utility of the synthetic data methodology. In addition, this research will address two real-world data complexities typically ignored in synthetic data applications: 1) generating synthetic data for household- and individual-level attributes while maintaining the within-household composition structure; and 2) accounting for complex sample design features (e.g., unequal probabilities of selection, stratification, and clustering). The proposed research will break new ground by testing an alternative method of disseminating public-use data that is suitable for small area estimation while enhancing confidentiality protection.
If the proposed methods prove successful, the current practice of requiring data users to access small area data within restricted data center facilities may be avoided. By releasing synthetic microdata with small area identifiers, statistical agencies will enable users to perform customized small area analyses for levels of geography that are not currently permitted without restricted data access. This innovation may help meet the growing demand for small area estimates and increase the volume of small area estimates produced using microdata from statistical agencies.
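
To make the idea of model-based synthetic data concrete, the following is a minimal sketch, not the methodology proposed above. It assumes a toy normal-normal hierarchical model in which each county mean is drawn from a common state-level distribution, and it treats the state mean `mu`, the between-county variance `tau2`, and the within-county variance `sigma2` as known. All function names, county labels, and numeric values are hypothetical. For each county it draws a county mean from its posterior and then draws synthetic records around that mean, keeping the county identifier.

```python
import random


def county_posterior(values, mu, tau2, sigma2):
    """Posterior mean and variance of one county mean under the assumed
    model y_ij ~ N(theta_j, sigma2), theta_j ~ N(mu, tau2), with mu, tau2,
    and sigma2 treated as known (a simplifying assumption)."""
    n = len(values)
    precision = n / sigma2 + 1.0 / tau2
    post_mean = (sum(values) / sigma2 + mu / tau2) / precision
    return post_mean, 1.0 / precision


def synthesize(observed, mu, tau2, sigma2, rng):
    """Replace every observed record with a posterior predictive draw,
    preserving county identifiers and county sample sizes."""
    synthetic = {}
    for county, values in observed.items():
        post_mean, post_var = county_posterior(values, mu, tau2, sigma2)
        # Draw a plausible county mean, then synthetic records around it.
        theta = rng.gauss(post_mean, post_var ** 0.5)
        synthetic[county] = [rng.gauss(theta, sigma2 ** 0.5) for _ in values]
    return synthetic


# Hypothetical observed data: county label -> survey values.
observed = {
    "county_A": [52.0, 48.5, 50.2, 51.1],
    "county_B": [61.3, 59.8],
    "county_C": [45.0],
}
rng = random.Random(0)
synthetic = synthesize(observed, mu=50.0, tau2=25.0, sigma2=4.0, rng=rng)
```

In this sketch, counties with few observations are pulled more strongly toward the state mean, which is how the hierarchical structure borrows strength across areas. A realistic application would also estimate the variance components, model multiple variables jointly, and account for the household structure and sample design features described above.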

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 0918942
Program Officer: Cheryl L. Eavey
Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2011-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $6,000
Indirect Cost:
Name: University of Michigan Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109