Machine learning is a broad discipline with important application domains including computer vision, robotics, sustainability, and bio-surveillance. Its past success was heavily influenced by mathematical foundations developed for the core problem of generalizing from labeled data. However, with the variety of applications of machine learning across science, engineering, and computing in the age of Big Data, re-examining the underlying foundations of the field has become imperative. This project aims to substantially advance the field of machine learning by developing foundations and algorithms for a number of important modern learning paradigms. These include interactive learning, where the algorithm and the domain expert engage in a two-way dialogue to facilitate more accurate learning from less data than the classic approach of passively observing labeled data; distributed learning, where a large dataset is distributed across multiple servers and the challenge lies in learning with limited communication; and multi-task learning, where the goal is to solve multiple related learning problems from less data by taking advantage of relationships among the learning tasks. The project also aims to develop new connections between machine learning and property testing, a flourishing area of theoretical computer science. In addition to answering fundamental questions in each of these directions, the project will highlight and leverage synergies between these topics.

More specifically, the key research directions of this project are: (1) Developing mathematical foundations for interactive learning by analyzing new forms of interaction between the learning algorithm and the domain expert that could lead to fast and efficient learning of difficult tasks by wisely exploiting the capabilities of domain experts. (2) Developing new algorithms for distributed learning, an important modern scenario where data is distributed among several locations. This project will develop protocols that trade off the various types of resources involved in such settings (computation, communication, and domain expertise). (3) Developing new algorithms with provable guarantees for learning multiple related tasks from limited amounts of labeled data and massive amounts of unlabeled data by wisely exploiting explicitly known or latent relationships between the given tasks. (4) Developing mathematical foundations for property testing, where the goal is to determine quickly whether there exists a low-error rule of a desired form, using significantly less data than is needed to actually find the rule itself. This project will specifically focus on active and distributed scenarios, with the goal of using testing as a way to improve the efficiency of learning itself.
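
To make direction (1) concrete, the sketch below contrasts passive and interactive learning on a deliberately simple task: learning a one-dimensional threshold rule from noise-free labels. It is only an illustration under stated assumptions, not an algorithm from this project. The function names (`passive_learn`, `active_learn`, `label_fn`) are hypothetical, and the setting assumes the expert answers every label query correctly and that a perfect threshold exists.

```python
import random


def passive_learn(points, label_fn):
    """Ask the expert to label every point, then return the smallest
    positively labeled point as the learned threshold."""
    queries = 0
    positives = []
    for x in points:
        queries += 1
        if label_fn(x) == 1:
            positives.append(x)
    threshold = min(positives) if positives else float("inf")
    return threshold, queries


def active_learn(points, label_fn):
    """Binary-search the sorted unlabeled points, asking the expert for a
    label only when it is needed to narrow down the threshold."""
    pts = sorted(points)
    lo, hi = 0, len(pts)  # invariant: pts[:lo] are negative, pts[hi:] are positive
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label_fn(pts[mid]) == 1:
            hi = mid          # threshold is at or before pts[mid]
        else:
            lo = mid + 1      # threshold is strictly after pts[mid]
    threshold = pts[lo] if lo < len(pts) else float("inf")
    return threshold, queries


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(1000)]

    # Hypothetical expert: labels a point 1 exactly when it is >= 0.37.
    def expert(x):
        return 1 if x >= 0.37 else 0

    print("passive:", passive_learn(data, expert))
    print("active: ", active_learn(data, expert))
```

Both learners return the same threshold, but the interactive one issues a number of label queries that grows logarithmically in the number of unlabeled points rather than linearly. The gap narrows or disappears in noisier or less structured settings, which this toy example does not model.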

Broader impacts include mentoring women in CS and actively organizing workshops and seminars in the interdisciplinary area.

Budget Start: 2014-08-01
Budget End: 2017-07-31
Fiscal Year: 2014
Total Cost: $400,000
Institution: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213