Applying Resampling Techniques to Large Data Sets

Anderson, Albert

Abstract

This project will investigate the feasibility and merit of applying bootstrapping and similar resampling strategies to the analysis of relatively large census and survey microdata files. Although bootstrapping has generally been applied most fruitfully to small-sample research designs, new technology now allows resampling and bootstrapping to be applied effectively to much larger data sets than have previously been analyzed with these techniques. In particular, we will focus on two aims: (1) determining confidence intervals for frequency counts, percentages, and summary statistics for basic multivariate analyses from large census and survey data files; and (2) assessing the potential for resampling techniques to assist in masking sensitive information extracted from data sets in which confidentiality of the respondents (disclosure avoidance) is an important concern and where minimal perturbation of the data is desired. A computational tool that uses an existing parallel high-performance computing environment and is optimized for resampling will be created to facilitate the implementation and testing of resampling techniques such as bootstrapping on data sets of 10,000-50,000 records.
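As an illustration of the first aim, the following is a minimal sketch of a percentile bootstrap confidence interval for a percentage computed from survey records. It is not the project's actual tool: the function name bootstrap_percentage_ci, the synthetic records list, and the choice of the percentile method are all assumptions made for this example, and the sketch runs serially rather than on a parallel computing environment.

```python
import random


def bootstrap_percentage_ci(flags, n_resamples=200, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a percentage.

    flags is a list of 0/1 indicators, one per record (hypothetical input).
    Returns (lower, upper) bounds in percentage points.
    """
    rng = random.Random(seed)
    n = len(flags)
    estimates = []
    for _ in range(n_resamples):
        # Draw n records with replacement and recompute the percentage.
        resample = rng.choices(flags, k=n)
        estimates.append(100.0 * sum(resample) / n)
    estimates.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic stand-in for a microdata file: 10,000 records, 30% with the trait.
records = [1] * 3000 + [0] * 7000
point = 100.0 * sum(records) / len(records)
low, high = bootstrap_percentage_ci(records)
print(f"estimate {point:.1f}%, 95% interval ({low:.2f}%, {high:.2f}%)")
```

Each resample is independent of the others, which is why this kind of computation lends itself to a parallel environment of the sort the abstract mentions.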

Proposed Commercial Applications

Incorporating resampling into our own information system, PDQ-Explore, will increase its value to data users in the fields of social science, health care, community services, and commercial information. Licensing the software to other information providers who need confidence intervals will broaden our customer base. Protecting confidentiality will allow us to tap more data sources and make more data sets available to more users, in research, education, government, and commerce.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 2R44HD037311-02
Application #: 6140490
Study Section: Special Emphasis Panel (ZRG1-SNEM-4 (02))
Program Officer: Casper, Lynne M

Project Start: 1999-01-07
Project End: 2002-07-31
Budget Start: 2000-08-10
Budget End: 2001-07-31
Support Year: 2
Fiscal Year: 2000
Total Cost: $365,749

Institution

Name: Public Data Queries, Inc.

City: Chelsea
State: MI
Country: United States
Zip Code: 48118

Related projects


NIH 2001 R44 HD	Applying Resampling Techniques to Large Data Sets Anderson, Albert F. / Public Data Queries, Inc.	$350,380
NIH 2000 R44 HD	Applying Resampling Techniques to Large Data Sets Anderson, Albert F. / Public Data Queries, Inc.	$365,749
