Large, incomplete datasets create major challenges for statistical prediction in research. This project will develop a data curing service that is able to manage large, incomplete, and diverse datasets, and would provide uncertainty measures for the cured data. The project identifies and collaborates with several communities where this data service is central to scientific research, including civil engineering, building science, urban energy, and social science.

The effort creates a parallel data curing service, provides uncertainty measures for the cured data, and develops supplementary imputing algorithms. The team develops a data curing platform with imputation for incomplete, heterogeneous data; robust machine learning (ML) and statistical predictions would be established by developing an easy-to-use, general-purpose, large data-friendly imputation program. The focus is on a novel combination of three established imputation methods: two-level finite mixture model-based imputation (FMMI), fractional hot deck imputation (FHDI), and Gaussian mixture model-based imputation (GMMI), for which parallel implementations in R would also be provided.

This award by the NSF Office of Advanced Cyberinfrastructure is jointly funded by the Established Program to Stimulate Competitive Research (EPSCoR).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Advanced CyberInfrastructure (ACI)
Type
Standard Grant (Standard)
Application #
1931380
Program Officer
Amy Walton
Project Start
Project End
Budget Start
2019-09-01
Budget End
2022-08-31
Support Year
Fiscal Year
2019
Total Cost
$592,386
Indirect Cost
Name
Iowa State University
Department
Type
DUNS #
City
Ames
State
IA
Country
United States
Zip Code
50011