At the core of many of the tasks we need computers to solve lie a collection of important, though difficult, optimization problems. These problems can be hard to solve optimally, and as a result they have typically been attacked through two distinct approaches. The first is to construct algorithms with provable worst-case approximation guarantees. This approach has the advantage of provable guarantees but the disadvantage that these guarantees can be fairly poor. The second approach is to develop natural heuristics and to test them on benchmark problems. This approach has the advantage of producing techniques that can work well in practical settings similar to the benchmarks, but the disadvantage that the conditions needed for good performance are often not well understood. This project aims to bridge the gap between these approaches: to develop new theoretical foundations for the analysis of non-worst-case guarantees, as well as new tools for the design of formally-justified heuristics.

Specifically, this project will pursue four main directions. The first is the analysis of implicit assumptions in problem formulations. Often the objective function used in an optimization problem is a proxy for a goal that cannot be directly measured by the algorithm, and thus the instance already needs to have the property that these two are related. This relation is often not explicitly stated and yet can potentially provide usable structure an algorithm can use. The second is analysis of natural conditions on inputs due to how they are constructed. For example, if certain parts of the instance given to the algorithm are the result of noisy measurements, then the algorithm can be flexible to their exact values. The third is development of fast methods to test the "niceness" of an instance, which can then be used in the selection of an appropriate algorithm for that instance. Finally, the last direction is the development of algorithms that provide a smooth transition between their performance on stable and unstable instances with applications to privacy-preserving data analysis.

Project Report

This project developed new techniques to advance the state-of-the-art for difficult optimization problems involving both analyzing data and helping to preserve the privacy of individuals whose data is being analyzed. One advance developed in this project is an algorithm for answering sensitive questions about social networks in ways that rigorously preserve the privacy of the individuals involved, yet are able to return fairly accurate answers when the network satisfies appropriate stability properties. In general, for the types of questions we consider, it is not possible to give accurate answers while also adhering to a strict condition known as differential-privacy, because the answers could be sensitive to small changes in the network. In this work, we allow the analyst (who is asking the question) to specify conditions that he or she believes the network satisfies and under which the answers would not be so sensitive. While our algorithm can neither confirm nor deny whether these conditions hold (because doing so might violate the privacy guarantees) it is able to produce answers such that if the analyst is correct, then the answers are indeed accurate. Our work also developed new privacy guarantees for a widely-used class of "dimension-reduction" methods. These methods are typically used to make a large dataset smaller and more manageable, but we show that they also enjoy strong privacy-preservation properties. Another advance in this project was the development of algorithms for quickly estimating the complexity of certain machine learning tasks, in order to guide how data is collected and the internal representation of data used by machine learning algorithms. This project also developed a new understanding of how relations among different machine learning tasks can be used to develop algorithms that achieve high accuracy from just a small amount of human-labeled data. Finally, new algorithms for clustering data as well as new methods for estimating the accuracy of learned rules were developed in this work.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1116892
Program Officer
Balasubramanian Kalyanasundaram
Project Start
Project End
Budget Start
2011-09-01
Budget End
2014-08-31
Support Year
Fiscal Year
2011
Total Cost
$358,000
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213