Large, incomplete datasets create major challenges for statistical prediction in research. This project will develop a data curation service that manages large, incomplete, and heterogeneous datasets and provides uncertainty measures for the curated data. The project identifies and collaborates with several communities in which such a data service is central to scientific research, including civil engineering, building science, urban energy, and social science.
The effort creates a parallel data curation service, provides uncertainty measures for the curated data, and develops supplementary imputation algorithms. The team is developing a data curation platform with imputation for incomplete, heterogeneous data; robust machine learning (ML) and statistical predictions will be enabled by an easy-to-use, general-purpose imputation program that scales to large data. The focus is on a novel combination of three established imputation methods: two-level finite mixture model-based imputation (FMMI), fractional hot deck imputation (FHDI), and Gaussian mixture model-based imputation (GMMI), with parallel implementations in R provided for each.
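To make the hot deck idea concrete, the toy sketch below imputes each missing value from several observed "donor" values, each carrying a fractional weight. This is a minimal single-variable illustration in Python rather than the project's R implementations; the function name and setup are assumptions for illustration, and real FHDI additionally forms imputation cells and preserves joint distributions.

```python
import numpy as np

def fractional_hot_deck(y, m=3, seed=0):
    """Toy fractional hot deck imputation for one variable.

    Each missing entry receives m donor draws from the observed
    values, each with fractional weight 1/m; the stored imputed
    value is the weighted (here, simple) mean of the donors.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    donors = y[~np.isnan(y)]          # observed values act as donors
    imputed = y.copy()
    for i in np.flatnonzero(np.isnan(y)):
        draws = rng.choice(donors, size=m, replace=True)
        imputed[i] = draws.mean()     # fractional weights of 1/m each
    return imputed
```

Because each recipient keeps several donors rather than a single draw, fractional imputation reduces the extra variance that single-donor hot deck methods introduce, while still imputing only observed, plausible values.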
This award by the NSF Office of Advanced Cyberinfrastructure is jointly funded by the Established Program to Stimulate Competitive Research (EPSCoR).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.