This project will investigate the feasibility and merit of applying bootstrapping and similar resampling strategies to the analysis of relatively large census and survey microdata files. While bootstrapping has generally been applied most fruitfully to small-sample research designs, new technology now allows resampling and bootstrapping to be applied effectively to much larger data sets than those previously analyzed with these techniques. In particular, we will focus on two aims: (1) determining confidence intervals for frequency counts, percentages, and summary statistics for basic multivariate analyses from large census and survey data files; and (2) assessing the potential for resampling techniques to assist in masking sensitive information extracted from data sets in which confidentiality of the respondents (disclosure avoidance) is an important concern and where minimal perturbation of the data is desired. A computational tool that uses an existing parallel high-performance computing environment and is optimized for resampling will be created to facilitate the implementation and testing of resampling techniques such as bootstrapping on data sets of 10,000-50,000 records.
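As an illustration of the first aim, the following is a minimal sketch of a percentile bootstrap confidence interval for a percentage. It is not the project's software: the function name `bootstrap_percent_ci`, its parameters, and the synthetic 10,000-record input are assumptions made for the example, and the percentile method is only one of several ways to form a bootstrap interval.

```python
import random


def bootstrap_percent_ci(flags, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the percentage of 1s in `flags`.

    Hypothetical helper for illustration; `flags` holds one 0/1 value per record.
    """
    rng = random.Random(seed)
    n = len(flags)
    point = 100.0 * sum(flags) / n  # percentage in the original file

    # Resample n records with replacement and recompute the percentage each time.
    estimates = sorted(
        100.0 * sum(rng.choices(flags, k=n)) / n for _ in range(n_resamples)
    )

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled values.
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))
    return point, estimates[lo_idx], estimates[hi_idx]


# Synthetic stand-in for a microdata extract (assumed, not real survey data):
# 10,000 records, 3,000 of which have the characteristic of interest.
flags = [1] * 3000 + [0] * 7000
point, lower, upper = bootstrap_percent_ci(flags)
print(f"{point:.1f}% (95% CI {lower:.2f}% to {upper:.2f}%)")
```

Each resample is independent of the others, which is why this kind of computation is a natural fit for a parallel computing environment: the resamples can be split across processors and only the resulting percentages need to be collected and sorted.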

Proposed Commercial Applications

Incorporating resampling into our own information system, PDQ-Explore, will increase its value to data users in the fields of social science, health care, community services, and commercial information. Licensing the software to other information providers who need confidence intervals will broaden our customer base. Protecting confidentiality will allow us to tap more data sources and make more data sets available to more users, in research, education, government, and commerce.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #
5R44HD037311-03
Application #
6388051
Study Section
Special Emphasis Panel (ZRG1-SNEM-4 (02))
Program Officer
Casper, Lynne M
Project Start
1999-01-07
Project End
2003-07-31
Budget Start
2001-08-01
Budget End
2003-07-31
Support Year
3
Fiscal Year
2001
Total Cost
$350,380
Indirect Cost
Name
Public Data Queries, Inc.
Department
Type
DUNS #
City
Chelsea
State
MI
Country
United States
Zip Code
48118