Machine Learning algorithms are ubiquitous in computer science with important applications to data-mining, classification, and ranking. These algorithms are typically applied to data sets that contain a sizable fraction of noisy training examples. This project focuses on developing learning algorithms that can succeed in the presence of noisy data sets that have been corrupted in a potentially adversarial or malicious manner. Algorithms that can tolerate these types of worst-case noise are critical for the depolyment of complex machine learning systems, as real-world data sets (for example, data related to spam detection) are often noisy in unpredictable ways. Previous work on learning in the presence of noise focused on models with strong assumptions on how the noise is applied (e.g., independently for each data point).

The intellectual merit of this project lies in understanding the computational complexity of optimization problems associated with learning in worst-case noise models. More specifically, the project will design algorithms that can find a classifier whose error is competitive with the best function from a large class of concepts. In order to design these algorithms, the project will prove new structural results on how well classes of Boolean functions can be approximated with respect to a variety of well-studied probability distributions. Additionally, the project will explore hardness results for learning functions with respect to adversarial noise via reductions to notoriously difficult problems in cryptography and computational complexity.

The broader impact of this project is the potential to realize more powerful classification tools in a variety of application areas in the sciences such as computational biology (e.g., protein detection) and linguistics (e.g., text categorization). Additionally, the PI will develop a new graduate course that furthers the relationship between computational and statistical methods in machine learning theory.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1018829
Program Officer
Jack S. Snoeyink
Project Start
Project End
Budget Start
2011-09-01
Budget End
2016-09-30
Support Year
Fiscal Year
2010
Total Cost
$499,864
Indirect Cost
Name
University of Texas Austin
Department
Type
DUNS #
City
Austin
State
TX
Country
United States
Zip Code
78759