There is rising demand to scale application performance by distributing computation across many machines in a data center. Writing efficient and robust parallel programs in this setting is difficult because programmers must minimize communication overhead while tolerating machine failures.

This project investigates a new data-centric parallel programming model, called Piccolo, that can simplify the construction of in-memory data-center applications such as PageRank computation and neural network training.

In-memory applications can hold all their intermediate state in the aggregate memory of many machines and benefit from sharing that state between machines during computation. Traditionally, these applications have been built using low-level, communication-centric primitives such as MPI, resulting in significant programming complexity. The recently popular MapReduce and Dryad frameworks also fit these applications poorly, because their data-flow programming model lacks support for shared state.

Unlike data-flow models, Piccolo explicitly supports sharing mutable, distributed state via a key/value table interface. Piccolo makes sharing efficient by optimizing for locality of access to shared tables and by automatically resolving write-write conflicts with user-defined accumulation functions. As a result, Piccolo is easy to program, enables applications that do not fit MapReduce, and achieves good scalable performance.
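To make the accumulation idea concrete, here is a minimal single-process sketch of a Piccolo-style shared table. The class name `AccumTable` and its methods are illustrative assumptions for this note, not Piccolo's actual API; the point is that concurrent writes to the same key are merged by a user-supplied function (here, addition) rather than treated as conflicts, which is exactly what a scatter step of PageRank needs.

```python
import operator

class AccumTable:
    """Hypothetical key/value table where writes to the same key are
    merged with a user-defined accumulation function instead of
    producing a write-write conflict."""

    def __init__(self, accumulate, default):
        self.accumulate = accumulate  # merges an update into the stored value
        self.default = default        # value assumed for keys never written
        self.data = {}

    def update(self, key, value):
        # Because the accumulator is commutative and associative,
        # updates from different workers can arrive in any order.
        self.data[key] = self.accumulate(self.data.get(key, self.default), value)

    def get(self, key):
        return self.data.get(key, self.default)

# One PageRank iteration over a toy 3-page graph, using a sum
# accumulator to merge rank contributions aimed at the same page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {p: 1.0 / len(links) for p in links}
d = 0.85  # damping factor

next_ranks = AccumTable(operator.add, default=0.0)
for page, outs in links.items():          # each loop body could run on a
    for dest in outs:                     # different worker; the table merges
        next_ranks.update(dest, d * ranks[page] / len(outs))
for page in links:
    next_ranks.update(page, (1 - d) / len(links))
```

The ranks after the iteration still sum to 1, since the accumulator merges every scattered contribution exactly once regardless of arrival order.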

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 1065114
Program Officer: Marilyn McClure
Project Start:
Project End:
Budget Start: 2011-07-01
Budget End: 2015-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $330,171
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139