We are entering an Industrial Revolution in the production of information. While in the past data were "handmade" by typing on keyboards, today data are increasingly manufactured by machines: sensors, cameras, software logs, etc. When harnessed in a timely manner, these data can have significant positive impact in many contexts, including early warning and rapid response in natural disasters, air quality monitoring, and improved Internet security. To provide useful information in these contexts, computers in multiple locations must coordinate over networks, because the data are both widely distributed and massive, and cannot be "warehoused" at a single location in a timely manner. Worse, sensor data are typically "noisy" or erroneous in various ways, so statistical methods must be employed to convert the raw "evidence" data into probabilistically reliable information.

In this project we develop new techniques to integrate statistical inference methods from AI with overlay network algorithms developed for peer-to-peer and wireless settings. We design new overlay network algorithms customized for distributed inference. We also develop network-aware inference algorithms that can trade off inference approximation quality for communication efficiency and robustness to network failure. Finally, we explore the use of a high-level declarative language for programming both the networking and inference logic. The high-level language enables us to investigate compilation techniques that co-optimize the inference and overlay network tasks for maximal utility. We prototype and evaluate our ideas via open-source implementations deployed on testbeds such as Emulab and PlanetLab. Software and research papers are disseminated at http://declarativity.net.
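The trade-off between approximation quality and communication cost mentioned above can be illustrated with a small sketch. The code below is not the project's algorithm; it uses standard randomized pairwise gossip averaging as a stand-in, on a hypothetical 16-node ring overlay with made-up sensor readings. The names gossip_average, ring, and max_error are illustrative assumptions. Each exchange costs one round trip between neighbors, and spending more exchanges moves every node's local estimate closer to the network-wide mean.

```python
import random


def ring(n):
    """Hypothetical overlay: node i is linked to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def gossip_average(values, neighbors, exchanges, seed=0):
    """Randomized pairwise gossip (a standard technique, used here as a stand-in).

    On each exchange one random node and one of its overlay neighbors
    replace their estimates with the average of the two. The sum of all
    estimates is preserved, so the estimates drift toward the global mean.
    """
    rng = random.Random(seed)
    est = list(values)
    for _ in range(exchanges):
        i = rng.randrange(len(est))
        j = rng.choice(neighbors[i])
        est[i] = est[j] = (est[i] + est[j]) / 2.0
    return est


def max_error(estimates, target):
    """Worst-case deviation of any node's estimate from the target value."""
    return max(abs(e - target) for e in estimates)


# Made-up noisy readings around 20.0 (assumed Gaussian noise, for illustration).
_rng = random.Random(42)
readings = [20.0 + _rng.gauss(0.0, 2.0) for _ in range(16)]
true_mean = sum(readings) / len(readings)
overlay = ring(len(readings))

# More exchanges means more messages but a tighter approximation.
for budget in (10, 100, 2000):
    est = gossip_average(readings, overlay, budget)
    print(budget, "exchanges -> max error", max_error(est, true_mean))
```

The communication budget (the exchanges argument) is the knob here: a small budget gives a rough answer cheaply, while a large budget approaches the exact mean. Because no node acts as a central collector, the loss of a single node does not stop the remaining nodes from continuing to exchange estimates.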

Project Report

The focus of this project is on developing methods for analyzing data efficiently using multicore machines and large-scale computer clusters. Such machines have become the dominant computer architectures, but programming them is challenging: one must both design distributed algorithms and deal with parallel-systems issues such as data races, robustness, scheduling, and optimization of communication. Our declarative approach has the potential to simplify both the design of distributed algorithms and the software system that supports them, relieving algorithm designers of the need to focus on these lower-level challenges.

As part of this effort, we started the GraphLab project for large-scale machine learning. This has led to a major open-source software effort with tens of thousands of downloads. We have held two GraphLab workshops in the last couple of years: the first, in 2012, had 318 people in attendance, and the second, in 2013, had 570. We have also developed Bloom, a programming language designed for cloud computing and other distributed systems. Bloom removes traditional mismatches between distributed software and platforms, enabling powerful coding and code analysis without resorting to exotic syntax. Bloom was recognized by MIT Technology Review magazine as one of the TR10 award winners in 2010: the 10 technologies most likely to "change our world".

We also had a very successful education plan. For example, PI Guestrin's Machine Learning class was the most popular graduate class at CMU, with about 120 registered students. The class has greatly benefited from class projects derived from data collected by the PI and other researchers, and from data given to the PI by industry collaborators. Guestrin's class slides have been requested by a number of other instructors worldwide and have been incorporated into their classes. We used Bloom as the core vehicle to teach a new hands-on undergraduate distributed systems course at UC Berkeley in 2012 and 2013 (http://programthecloud.github.io). Class time blended traditional lectures with live authoring and analysis of Bloom code for a wide variety of tricky distributed systems protocols that would have been too complex and unwieldy to present in traditional languages.

One important aspect of this project is support for students from under-represented groups in science. We had undergraduates working on this and related projects, supported through CMU's IFYRE program, which helps expose undergraduates from under-represented minorities to research. We have also hosted an undergraduate from Caltech through a program for introducing female students to undergraduate research. In this project, the PIs co-advised two PhD students, one based at each institution. Both completed their PhDs in the last year: one is now an assistant professor at UMass Amherst, the other a member of staff at Facebook. In addition, one of Guestrin's PhD students is now at Berkeley as a postdoc with Hellerstein and others.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0803690
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $450,000
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704