Machine learning currently offers one of the most cost-effective approaches to building predictive models from data. However, practical applications of machine learning must cope with sparse, noisy, and uncertain data. Against this background, this project aims to: 1) introduce a minimax framework for pattern learning that unifies various regularization models, and characterize a variety of data uncertainties (e.g., incomplete data, local nonrigid displacements, lighting variations) and the corresponding regularizations in terms of their intrinsic properties (e.g., sparsity, locality, robustness); 2) establish new feature subset selection methods and quantify group effects as well as the confidence levels/intervals of selecting or discarding features; 3) construct new methods of sparse grouping representation that are resilient to labeling errors by exploiting effective regularizations and their properties. The project explicitly models various classes of data uncertainty (distortions) within a minimax framework, optimizes the pattern learning process against the worst distortion(s) in a given class, and exploits regularization properties (e.g., sparsity, robustness).
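
As a rough illustration of how optimizing against the worst distortion in a class can give rise to a regularizer, the sketch below works through one textbook case. It is not the project's method: the linear classifier, the hinge loss, the choice of an l-infinity ball of radius eps as the distortion class, and the function names are all assumptions made for this example. Under those assumptions, the worst-case hinge loss has a closed form in which an l1 penalty on the weights appears, and the code checks that form against a brute-force search over the corners of the ball.

```python
import itertools


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def worst_case_hinge(w, x, y, eps):
    """Closed-form worst-case hinge loss (hypothetical helper).

    Assumed distortion class: every feature of x may be shifted by at
    most eps (an l-infinity ball). The worst distortion lowers the
    margin y * w.(x + delta) by exactly eps * ||w||_1.
    """
    l1_norm = sum(abs(wi) for wi in w)
    margin = y * dot(w, x) - eps * l1_norm
    return max(0.0, 1.0 - margin)


def brute_force_hinge(w, x, y, eps):
    """Same quantity by enumerating the corners of the l-infinity ball.

    The hinge loss is convex in delta, so its maximum over the ball is
    attained at a corner, where each coordinate is +eps or -eps.
    """
    worst = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(w)):
        distorted = [xi + eps * s for xi, s in zip(x, signs)]
        loss = max(0.0, 1.0 - y * dot(w, distorted))
        worst = max(worst, loss)
    return worst


if __name__ == "__main__":
    # Illustrative values only; they do not come from the project.
    w = [0.5, -1.0, 0.0]
    x = [1.0, 0.2, 3.0]
    print(worst_case_hinge(w, x, 1, 0.1))
    print(brute_force_hinge(w, x, 1, 0.1))
```

In this special case, minimizing the worst-case loss is the same as minimizing a hinge loss whose margin is penalized by eps times the l1 norm of the weights, which is one concrete way a distortion class can be tied to a sparsity-inducing regularization.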

Anticipated results of the project include: (1) New models and methods for accounting for various classes of distortions and for finding optimal solutions under the worst distortion(s) over a given class; (2) New methods for selecting features with confidence analysis and for learning predictive models resilient to labeling errors; (3) Rigorous evaluation of the resulting methods on real-world data sets.

The new machine learning algorithms resulting from this research find applications in many areas that rely on predictive modeling from large data sets (e.g., medical analysis, earthquake modeling). All of the software tools developed in this project will be made available to the scientific community, educators, and students. The project offers enhanced research-based training opportunities for graduate and undergraduate students, as well as outreach to K-12 students and efforts aimed at broadening the participation of under-represented groups in Computer Science research and education.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1218712
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2017-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $254,661
Indirect Cost:
Name: Southern Illinois University at Carbondale
Department:
Type:
DUNS #:
City: Carbondale
State: IL
Country: United States
Zip Code: 62901