Machine learning currently offers one of the most cost-effective approaches to building predictive models from data. However, practical applications of machine learning have to cope with sparsity, noise, and uncertainty in the data. Against this background, this project aims to: (1) Introduce a minimax framework for pattern learning that unifies various regularization models, and characterize various kinds of data uncertainty (e.g. incomplete data, local nonrigid displacements, lighting variations) and the corresponding regularizations in terms of their intrinsic properties (e.g. sparsity, locality, robustness); (2) Establish new feature subset selection methods and quantify group effects as well as the confidence levels/intervals of selecting/discarding features; (3) Construct new methods of sparse grouping representation that are resilient to labeling errors by exploiting effective regularizations and their properties. The project aims to explicitly model various classes of data uncertainty (distortions) within a minimax framework, to optimize the pattern learning process against the worst distortion(s) in a given class, and to exploit regularization properties (e.g. sparsity, robustness).
Anticipated results of the project include: (1) New models and methods that account for various classes of distortions and find optimal solutions under the worst distortion(s) over a given class; (2) New methods for selecting features with confidence analysis and for learning predictive models resilient to labeling errors; (3) Rigorous evaluation of the resulting methods on real-world data sets.
The new machine learning algorithms resulting from this research find applications in many areas that rely on predictive modeling from large data sets (e.g. medical analysis, earthquake modeling). All of the software tools developed in this project will be made available to the scientific community, educators, and students. The project offers enhanced research-based training opportunities for graduate and undergraduate students, as well as outreach to K-12 students and efforts aimed at broadening the participation of under-represented groups in Computer Science research and education.