With the recent dominance of computers with many parallel cores, and the widespread use of large data centers, the need to supply high-level, simple and general approaches to developing parallel codes for these machines has become critical. There are many challenges to effectively developing such parallel codes, but certainly a principle difficulty is dealing with communication costs - or when looked at from the other side, taking advantage of locality. Unfortunately this challenge has only become more difficult on modern parallel machines that have many forms of locality - network latency and bandwidth, shared and distributed caches, partitioned memories, and secondary storage in a variety of organizations.

To address this problem this project is developing an approach for programmers to understand locality in their parallel code without needing to know any details of the particular parallel machine they are using. In the approach programmers express the full dynamic parallelism of their algorithm without describing how it is mapped onto processors, and are given a simple high-level model for analyzing locality. The research is based on the conjecture that locality should be viewed by the programmer as a property of the algorithm or code and not the machine. To ensure that real machines can take proper advantage of the locality analyzed in the model, the research is developing scheduling approaches that map the "algorithm locality" onto various forms of machine locality, including shared caches, distributed caches, trees of caches, and distributed memory machines. The results of the research include both theoretical bounds for such schedulers on specific machine organizations, and experimental validation on a set of benchmark applications.

Project Report

Because of the high cost of communication in parallel machines locality has always played a key role in the performance of parallel applications. However, programming for locality has proven to be very difficult---having the programmer control the layout, synchronization and scheduling of the computation and data to deal with locality can be extremely cumbersome, dominating the time required to develop, analyze, verify, maintain, and update codes. The outcome of the research were We developed cost models for easily understanding the locality in parallel algorithms without having to understand the details of any particular machine. The key model is what we call the Parallel Cache Model. It assigns a cost to computations described in a high-level nested parallel programming model, but captures both "spacial" and "temporal" locality. We developed a variety of algorithms in the model. We showed that some standard algorithms are already efficient in the model (e.g. matrix multiply) but also developed some new algorithms (e.g. a variant of sample sort). We developed theoretically efficient schedulers that map the high level Parallel Cache model onto a variety of parallel machine configurations with hierarchical memory organizations. We were able to show some strong bounds on how the high-level costs map onto the low-level costs. We implemented a scheduler that incorporates many of the theoretical ideas and experimented with a variety of algorithms using the scheduler. Our results show that indeed the cost model does capture locality well, and that the schedulers improve performance over standard schedulers (e.g. work-stealing schedulers). As part of the work we published several papers on the topic, and developed a scheduler that we have made public. The work has lead to newer ongoing work on incorporating automatic memory management into the model and scheduler. We also developed course material on "cache efficient parallel algorithms" that has been added to a graduate course on applied algorithms. We have presented the work at a variety of venues.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1018188
Program Officer
Almadena Chtchelkanova
Project Start
Project End
Budget Start
2010-08-01
Budget End
2014-01-31
Support Year
Fiscal Year
2010
Total Cost
$449,055
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213