A primary mission of many federal statistical agencies is to disseminate data to the public for secondary analysis. However, dissemination is increasingly challenging due to risks of unintended confidentiality breaches, nonresponse and faulty data, and the costs of mounting surveys that collect many detailed attributes. The Triangle Census Research Network (TCRN) will develop broadly applicable methodologies that will transform and improve data dissemination practice in the federal statistical system.

In particular, the TCRN will advance methodologies and tools for disseminating public use data with high quality and acceptable risks of confidentiality breaches by developing theory and methodology for releasing multiply imputed, synthetic datasets based on flexible, nonparametric Bayesian models built specifically for high-dimensional data with longitudinal and multi-level aspects. TCRN will develop approaches for including survey weights in redacted data that can improve statistical estimation without leading to confidentiality disclosures. The project also will develop the framework for computer systems that provide secondary analysts with feedback on the quality of inferences from redacted data, and it will develop theory and methodology for creating synthetic contingency tables based on fusions of linear programming and Bayesian modeling.

The TCRN will improve methodology and practice for handling missing and faulty data by developing frameworks for simultaneous imputation of missing data and editing of faulty data by integrating paradigms from statistics and operations research. The project also will develop nonparametric Bayesian methodology for multiple imputation of missing data in high dimensions with longitudinal and multi-level aspects.

Finally, to enhance agencies' abilities to integrate information from multiple sources, the TCRN will develop methods that agencies and secondary analysts can use to properly account for uncertainty in inferences in imperfect record linkage settings, as well as to pass on that uncertainty in public use data products via multiply imputed datasets. TCRN also will develop statistical approaches to combining information from multiple data sources that do not depend on record linkage.

The methodological developments of the TCRN will transform the way statistical agencies handle data dissemination with regard to statistical disclosure limitation, missing data, and integrating information. These developments will offer federal agencies options for releasing data products with increased utility, leading to advances in science and improved policy making. The TCRN will apply the methodologies to major Census Bureau data products, thereby improving the hundreds of secondary analyses of these datasets. The interdisciplinary team of the TCRN will use these data products to answer questions in aging, economics, and social welfare that have important implications for policy making. As an integral part of the research, the TCRN will involve and offer educational opportunities to postdoctoral fellows, graduate students, and statisticians at federal agencies, thus developing and training future leaders in data dissemination research and practice. This activity is supported by the NSF-Census Research Network funding opportunity.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1131897
Program Officer: Cheryl L. Eavey
Budget Start: 2011-10-01
Budget End: 2016-09-30
Fiscal Year: 2011
Total Cost: $3,519,355
Name: Duke University
City: Durham
State: NC
Country: United States
Zip Code: 27705