Modern high-performance computers have many processors that must work together to complete incoming tasks. These processors cannot communicate efficiently, may run at different speeds, and can break down without warning. This research will implement a new method that allows multiple unreliable processors to complete tasks without communicating with each other directly. The processors have no prior knowledge of when tasks will arrive and must handle them as they come. Dividing work in this setting is a fundamental problem in high-performance computing, and this research has the potential to improve the performance of real systems. The work will be conducted at the National University of Singapore in collaboration with Professor Seth Gilbert, one of the inventors of the method, who will provide invaluable guidance on its implementation.

This data structure, the dynamic to-do tree, is the first task-mapping data structure for asynchronous processors and online tasks that achieves theoretical guarantees (within a log^3 m factor of optimal, where m is the maximum number of concurrent tasks). The goal will be to verify that the dynamic to-do tree also achieves good practical performance. The data structure will be compared with the current state of the art, with the aim of demonstrating a significant improvement. These results may have broader implications for less restricted versions of the problem (e.g., when the tasks are given offline) and for related problems such as mutual exclusion and distributed clocks. This NSF EAPSI award is funded in collaboration with the National Research Foundation of Singapore.

Project Report

Initially, the main focus of our project was to implement a dynamic to-do tree. This data structure has many potential applications, such as task mapping in highly parallel systems. My initial meetings with Seth focused on how we should implement the data structure---what parts should be simulated, what parts should be run explicitly, and how we could achieve the best performance. Ultimately, we decided that the most efficient way to implement the data structure would be to use transactional memory (a toy sketch of the task-claiming idea appears below). However, we did not have access to hardware transactional memory at the beginning of the summer, so we decided to simulate our results with software transactional memory simulators. Unfortunately, it quickly became clear that the overheads of software transactional memory are too significant and would alter our results. Meanwhile, the applications of the to-do tree already have many well-optimized (though fundamentally different) solutions---to obtain decent results, the to-do tree would therefore have to be optimized for a particular system. For the remainder of the summer we chose to focus on our other projects, which showed more potential for short-term results.

Very-large-scale integration (VLSI) layouts determine how transistors and wires are laid out on a computer chip. Placing well-connected transistors close to each other and minimizing the average wire length can greatly improve performance. Generally these chips have very strict rules about how wires may be laid out---for example, it may be that only two wires can cross at a given point, or that wires can only cross at right angles. This topic has been studied extensively from a computational perspective; however, to our knowledge, we are the first to study the case when these layouts can change over time. I collaborated with Seth Gilbert locally, and with Shikha Singh, Michael Bender, and Manoj Gupta remotely, to work on dynamic (changing over time) VLSI layouts. We used weight-balance techniques, combined with standard amortization arguments, to achieve improved running times for adding wires, and excellent running times for adding or deleting transistors. Our methods are very general: the performance guarantees apply to the number of operations used in designing the chip, the number of wires that must be moved on the chip, and the area of the chip that must be altered. Thus, the data structure is useful in many contexts, whether a company wants to save money on chip design or reduce the number of changes to the chip itself.

Bloom filters are a ubiquitous data structure with applications in databases, cloud computing, and big data. In short, they massively compress a data set so that each member can be stored using only a handful of bits. They can answer membership queries efficiently, but they have a small, bounded false-positive rate, which can be set arbitrarily with only a small increase in the required space (a textbook sketch appears below). My second project in Singapore was to extend Bloom filters to handle range queries. Unfortunately, shortly after I arrived, a colleague informed me of a result showing that this is essentially impossible: any data structure achieving Bloom filter-like properties (in particular, the bounded false-positive guarantee) for range queries requires nearly as much space as storing the original data set. Thus, we modified the problem: if a small range Bloom filter is impossible with the classical guarantees, is it possible to create a relaxed data structure that is still useful in practice?
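To make the task-claiming idea from the first paragraph concrete, here is a toy sketch. This is not the published dynamic to-do tree: the names ToyToDoTree and claim are invented for illustration, and a coarse lock stands in for the hardware or software transaction that would make the claim step atomic in a real implementation.

```python
import threading

class ToyToDoTree:
    """Toy task-claiming tree: each internal node counts the unfinished
    tasks in its subtree, so a processor can descend toward remaining
    work instead of scanning the whole task list."""

    def __init__(self, tasks):
        assert tasks, "sketch assumes a non-empty task list"
        self.n = len(tasks)
        self.tasks = tasks
        # Segment-tree layout: node 1 is the root, node i's children are
        # 2i and 2i+1, and the leaves occupy indices n..2n-1.
        self.counts = [0] * (2 * self.n)
        for i in range(self.n):
            self.counts[self.n + i] = 1
        for i in range(self.n - 1, 0, -1):
            self.counts[i] = self.counts[2 * i] + self.counts[2 * i + 1]
        self.lock = threading.Lock()  # stand-in for a transaction

    def claim(self):
        """Atomically claim one unfinished task; return None if all are done."""
        with self.lock:  # the real design would run this as one transaction
            if self.counts[1] == 0:
                return None
            i = 1
            while i < self.n:  # descend toward a subtree with work left
                i = 2 * i if self.counts[2 * i] > 0 else 2 * i + 1
            node = i
            while node >= 1:   # decrement the counts on the root path
                self.counts[node] -= 1
                node //= 2
            return self.tasks[i - self.n]
```

A worker thread would then simply loop `while (task := tree.claim()) is not None: run(task)`, with no direct communication between workers.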
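As background for the Bloom filter discussion, here is a textbook sketch of the classical structure. The SHA-256-based hashing and the parameter choices are illustrative, not anything specific to this project; the standard sizing rule is roughly m = -n ln p / (ln 2)^2 bits for n elements and false-positive rate p.

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item):
        # True means "possibly present"; False is always correct.
        return all((self.bits[idx // 8] >> (idx % 8)) & 1
                   for idx in self._indexes(item))
```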
We focused our attention on databases. Generally, a Bloom filter is stored in a cache of size M. When the database receives a membership query, it does a quick lookup in the cache; it only does a full database search if the element is in the Bloom filter. Note that this greatly improves the running time without hurting accuracy: if there is a false positive in the Bloom filter, it will be detected by the database search. We decided to relax the guarantee on the number of false positives while retaining the guarantee that there cannot be false negatives. A data structure under our parameters can therefore greatly improve the running time of range membership queries to databases. The only issue is that this improvement is not parameterized as it is in Bloom filters. Using a previous result on the performance of online algorithms, we were able to develop a data structure that does as well as any range Bloom filter that does not change over time. In other words, our data structure changes as queries are made, and its final performance is as good as that of any data structure that can see all queries ahead of time but cannot change while it processes them. We are extending this result to provide stronger guarantees and to handle database elements with different storage sizes.
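To illustrate the cache-then-database flow described above (not the competitive construction itself), here is a minimal sketch. NaiveRangeFilter, its grid parameter, and range_query are invented names; the only property the toy filter shares with the real data structure is the essential one: it may return false positives but never false negatives.

```python
class NaiveRangeFilter:
    """Invented baseline: summarizes integer keys by coarse grid bucket.
    May report ranges that contain no keys (false positives), but never
    misses a range that does contain one (no false negatives)."""

    def __init__(self, keys, grid=1024):
        self.grid = grid
        self.buckets = {k // grid for k in keys}  # small in-cache summary

    def might_intersect(self, lo, hi):
        # Any key in [lo, hi] must land in one of these buckets.
        return any(b in self.buckets
                   for b in range(lo // self.grid, hi // self.grid + 1))

def range_query(db, filt, lo, hi):
    """Lookup flow from the text: consult the small in-cache filter first,
    and pay for a full database search only when it answers 'maybe'."""
    if not filt.might_intersect(lo, hi):
        return []  # safe to skip the search: the filter has no false negatives
    return sorted(k for k in db if lo <= k <= hi)  # resolves any false positive
```

The adaptive data structure described above would additionally update its summary as queries arrive, so that over a query sequence it performs as well as the best fixed filter chosen in hindsight.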

Agency: National Science Foundation (NSF)
Application #: 1414973
Program Officer: Anne Emig
Budget Start: 2014-06-01
Budget End: 2015-05-31
Fiscal Year: 2014
Total Cost: $5,070