Machine Learning Theory is concerned with designing algorithms with provable guarantees for learning from data, and understanding what types of guarantees are inherently achievable. This turns out to have a number of important connections to key problems in Algorithmic Game Theory, Database Privacy, and Clustering. This project intends to further develop and expand these connections in order to address fundamental questions in all three areas.

A major thrust of Algorithmic Game Theory has been quantifying the inefficiency of self-interested behavior via the notion of "Price of Anarchy". However, the assumption that users behave according to a Nash equilibrium can be overly optimistic, especially in large or information-limited settings (even for a centralized controller, these equilibria can be computationally hard to find). Instead, the simple guarantees achieved by online learning algorithms can directly model a minimal notion of self-interested behavior, and they make sense even in large, information-poor arenas. This work will develop an alternative approach to analyzing systems of self-interested entities based on this idea.
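As an illustrative sketch of the kind of online learning guarantee invoked above (not part of the award text; the function name and parameters here are our own), the classic multiplicative-weights algorithm achieves low regret by downweighting actions in proportion to the loss they incur each round:

```python
def multiplicative_weights(losses, eta=0.1):
    """No-regret learning via multiplicative weights.

    losses: a list of rounds, each round a list of losses in [0, 1],
    one entry per action. Returns the normalized weight (probability)
    placed on each action after all rounds.
    """
    n = len(losses[0])
    weights = [1.0] * n
    for round_losses in losses:
        # Penalize each action multiplicatively by its loss this round.
        weights = [w * (1.0 - eta * l) for w, l in zip(weights, round_losses)]
    total = sum(weights)
    return [w / total for w in weights]

# Two actions: action 0 always incurs loss 1, action 1 always loss 0.
# Over 50 rounds, nearly all probability mass shifts to action 1.
probs = multiplicative_weights([[1.0, 0.0]] * 50)
```

A player running such an algorithm needs no knowledge of other players' utilities or of any equilibrium structure, which is why guarantees of this form make sense even in large, information-poor settings.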

In the area of Database Privacy, it is clear that protecting the privacy of individuals represented in a database while still providing useful statistics is a challenging task. Unfortunately, while positive results have been achieved for interactive query-answering mechanisms, most results on actually releasing a database satisfying the stringent condition known as differential privacy have been negative. Learning Theory, however, suggests a possible approach: rather than attempting to preserve all possible statistics, one can instead define interesting classes of statistics (much like concept classes in PAC learning) and ask when one can approximately preserve all statistics in such a class while still satisfying privacy. We have been able to produce computationally inefficient mechanisms that apply to a broad collection of such classes, and one main thrust of this work is to develop computationally tractable procedures.
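For concreteness (this sketch is ours, not part of the award text), the standard building block for interactive query answering under differential privacy is the Laplace mechanism, which perturbs a numeric query answer with noise scaled to the query's sensitivity:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return an epsilon-differentially private estimate of true_value.

    Adds noise drawn from Laplace(sensitivity / epsilon), the standard
    mechanism for a single numeric query; sensitivity is the maximum
    change in the query answer from altering one database record.
    """
    scale = sensitivity / epsilon
    # Sample Laplace noise by inverse transform from Uniform(-0.5, 0.5).
    u = random.random() - 0.5
    noise = -scale * (1.0 if u >= 0 else -1.0) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Privately release a count of 100 (sensitivity 1) at epsilon = 1.
private_count = laplace_mechanism(100.0, 1.0, 1.0)
```

Answering an entire class of statistics with a single released database, as discussed above, is a much stronger requirement than answering one query this way, which is what makes the class-based formulation interesting.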

Finally, while many different algorithms have been developed for data clustering, the theory of what information about the data suffices to cluster it accurately remains poorly understood. Theoretical models either make strong statistical assumptions (such as data generated by a mixture of Gaussians) or else aim to optimize graph-based objectives that may not be directly related to the error rate. In this work we aim to create an analog of the PAC learning model for clustering that directly addresses this issue. Recent work has made initial progress in this direction, and this project aims to develop the approach more fully.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0830540
Program Officer: Balasubramanian Kalyanasundaram
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $368,000
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213