Algorithms for learning to classify data have important applications in almost every area of computer science including data mining, computer vision, compiler design, operating system design, speech recognition, computational biology, computational game theory, computational neuroscience, and traditional algorithm design. A common, simplifying assumption in learning theory is that labeled data can be classified by a halfspace in many dimensions. The intellectual merit of this research involves understanding the computational complexity of fundamental halfspace-based learning tasks. The investigators focus on the following three challenges: 1) develop algorithms for learning a halfspace in the presence of noise; 2) efficiently learn intersections of halfspaces; 3) prove hardness results for learning various halfspace-based concept classes. In terms of broader impact, as alluded to above, this research gives new tools for practitioners in a variety of fields including computational biology, economics, and statistics.

The research relies heavily on new techniques from approximation theory and harmonic analysis to give provably efficient algorithms for learning halfspaces (also known as linear threshold functions) in malicious noise models with respect to many natural distributions. A recent polynomial-regression algorithm for agnostically learning halfspaces, due to the principal investigator and his colleagues, generalizes previous work in Fourier-based learning. The investigators study other applications of these techniques to discover new algorithms for learning intersections of halfspaces and, more generally, arbitrary convex sets with respect to natural distributions. In addition, the investigators apply properties of new lattice-based cryptosystems to show the intractability of learning intersections of halfspaces in distribution-free models.
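
As a rough illustration of the polynomial-regression idea (not the exact algorithm analyzed in the published work, which uses L1 regression and a carefully chosen degree), the sketch below expands each example into all monomials up to a fixed degree, fits a polynomial to the +/-1 labels by ordinary least squares, and classifies new points by the sign of that polynomial. All function names and parameters are illustrative.

```python
# A minimal sketch of the low-degree polynomial-regression approach to
# agnostically learning halfspaces.  Plain least squares is used here for
# simplicity (the published algorithm analyzes L1 regression); names and
# parameters are illustrative, not the authors' implementation.
from itertools import combinations_with_replacement
import numpy as np

def monomial_features(X, degree):
    """Map each row of X to the values of all monomials of degree <= degree."""
    n, d = X.shape
    cols = [np.ones(n)]                               # degree-0 monomial
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))  # e.g. x1*x3*x3
    return np.column_stack(cols)

def poly_regression_classifier(X, y, degree=2):
    """Fit a low-degree polynomial p to +/-1 labels y; predict sign(p(x))."""
    Phi = monomial_features(X, degree)
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda Xnew: np.sign(monomial_features(Xnew, degree) @ coeffs)

# Usage on synthetic data: clf = poly_regression_classifier(X, y, degree=3)
```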

Project Report

This award addresses fundamental theoretical problems underlying the field of machine learning and its relationship to computer science. Algorithms for making predictions based on large data sets are now ubiquitous in science and have many applications. For example, the Netflix challenge, which asked for a system to predict a user's future movie choices, relied heavily on machine-learning algorithms. Surprisingly, our detailed understanding of why and how these machine-learning systems work is lacking. As a further example, many machine learning problems that are known to be computationally intractable in the worst case do not seem difficult to solve in practice. Why is this so? Can we devise a mathematical model that explains this phenomenon and also leads us to new, interesting algorithms and/or intractability results? One major thrust of this project has been showing that the underlying geometry and distribution of a machine learning problem can dictate its complexity.

Perhaps the most basic of all machine learning problems is the problem of learning a halfspace. Imagine dividing space into two halves via a giant plane (in many dimensions this would be a hyperplane). The plane (or hyperplane) is unknown to a learner. Instead, the learner is given points in space and told whether they are in one half (a positive example) or the other half (a negative example). Given many of these points chosen randomly from some distribution, a learner can run an algorithm and identify the dividing plane (or hyperplane). Once this divider is found, the learner can label future points on its own. Many algorithms are known for this problem, and they date back to work done in the 1950s by researchers (Block, Novikoff, and Rosenblatt in particular). Since then, these algorithms have been used inside far more complicated learning systems. Fundamental questions about this simple problem remain open, however. For example, what if the learner is given points whose labels are noisy (so some points are labeled incorrectly)? What if space is not divided perfectly by a single plane, but instead is divided up by multiple planes? It turns out that very little is known about these questions.

The main agenda in this award was to solve these fundamental problems about halfspace-based learning. The PI is pleased to report that much progress has been made. In particular, this project has produced: 1) the first provably correct algorithms for learning intersections of halfspaces with respect to many natural distributions (in fact, new results are given for learning arbitrary convex sets with respect to Gaussian distributions); 2) algorithms that succeed even when the labels are noisy; and 3) new hardness results that explain why many of these problems are hard in the worst case. More generally, this award develops an entire theory of learning in the agnostic setting, a challenging model of learning where an adversary can corrupt labeled examples arbitrarily.
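
As an illustration of the classical setting described above, the sketch below implements a perceptron-style update of the kind pioneered in that early work: scan the labeled points, and whenever a point falls on the wrong side of the current candidate hyperplane, tilt the hyperplane toward it. The code and its names are illustrative only; it assumes noiseless, linearly separable data, precisely the assumptions this project's results move beyond.

```python
# A minimal sketch of a perceptron-style algorithm for learning a halfspace
# through the origin from +/-1 labeled points.  Names and data are
# illustrative; this assumes the examples are linearly separable.
import numpy as np

def perceptron(points, labels, max_passes=100):
    """Find a weight vector w with sign(w . x) matching each +/-1 label."""
    w = np.zeros(points.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * np.dot(w, x) <= 0:      # point is on the wrong side
                w += y * x                 # tilt the hyperplane toward it
                mistakes += 1
        if mistakes == 0:                  # all points correctly labeled
            break
    return w

# Synthetic example: points labeled by an unknown halfspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = np.sign(X @ true_w)
w = perceptron(X, y)
print(np.mean(np.sign(X @ w) == y))       # fraction labeled correctly
```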

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0643829
Program Officer: Dmitry Maslov
Project Start:
Project End:
Budget Start: 2007-02-01
Budget End: 2014-01-31
Support Year:
Fiscal Year: 2006
Total Cost: $400,000
Indirect Cost:
Name: University of Texas Austin
Department:
Type:
DUNS #:
City: Austin
State: TX
Country: United States
Zip Code: 78712