To take full advantage of the parallelism offered by a multicore machine, one must write parallel code. Writing parallel code is difficult. Even when the code is correct, there are numerous performance pitfalls. For example, an unrecognized data hotspot can force all threads to serialize their accesses to it, dramatically reducing throughput.
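The hotspot effect can be made concrete with a small sketch. It is purely illustrative and not part of the project's framework: the function name `run_with_per_key_locks` and the skewed key list are assumptions chosen for this example. Each distinct key gets its own lock, which looks fine-grained, yet when the data is skewed most updates still queue on a single lock.

```python
import threading
from collections import Counter

def run_with_per_key_locks(keys, num_threads=4):
    """Count keys with one lock per distinct key.

    Returns (counts, acquisitions), where acquisitions records how many
    times each key's lock was taken. Hypothetical helper for illustration.
    """
    locks = {k: threading.Lock() for k in set(keys)}
    counts = {k: 0 for k in locks}
    acquisitions = Counter()
    stats_lock = threading.Lock()  # protects the bookkeeping only

    def worker(chunk):
        local_acq = Counter()
        for k in chunk:
            with locks[k]:  # every update to key k funnels through this lock
                counts[k] += 1
            local_acq[k] += 1
        with stats_lock:
            acquisitions.update(local_acq)

    # Deal the keys round-robin to the threads.
    chunks = [keys[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts, acquisitions

# Skewed input (an assumption): one key accounts for 900 of 1000 updates.
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]
counts, acquisitions = run_with_per_key_locks(keys)
hot_share = acquisitions["hot"] / sum(acquisitions.values())
print(f"share of lock acquisitions on the hottest key: {hot_share:.0%}")
```

With this input, nine of every ten lock acquisitions target the same lock, so adding threads mostly adds waiting. In standard CPython builds the global interpreter lock already prevents the threads from running bytecode in parallel, so the sketch shows where the contention arises rather than measuring its cost.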
This project aims to provide a generic framework for performing certain kinds of operations in parallel. The framework supplies infrastructure that executes those operations scalably across the available threads of a multicore machine, automatically responding to hotspots and other performance hazards. The goal is not to squeeze the last drop of performance out of a particular platform. Rather, the planned system should let a programmer without detailed knowledge of concurrent and parallel programming develop code that uses a multicore machine efficiently.
The project involves the development of algorithms and data structures designed for the efficient parallel execution of generic code fragments. The primary focus is on data-intensive operations of the kind typically found in an in-memory database engine. Critical research questions include how to design generic multi-threaded operators that can be applied to a range of computations, how to avoid cache thrashing, and how to implement the framework in a way that works on a variety of hardware platforms. Throughput is expected to improve by an order of magnitude relative to naive solutions that suffer from contention. The project aims to achieve performance close to that of hand-tailored, expert-written parallel code, with far less coding effort.
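As a rough illustration of what a generic operator of this kind might look like to the programmer, the sketch below implements a grouped aggregation in which the caller supplies only small sequential functions. It is a minimal sketch under stated assumptions, not the project's design: the name `parallel_aggregate` and its parameters are hypothetical, and the contention-avoidance strategy shown (a private table per worker, merged at the end) is one well-known technique rather than a method the project is known to use.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_aggregate(rows, key_fn, init, update, merge, num_workers=4):
    """Group rows by key_fn and fold each group (hypothetical operator).

    init()           -> fresh accumulator for a new key
    update(acc, row) -> accumulator after absorbing one row
    merge(a, b)      -> combination of two accumulators for the same key
    """
    def aggregate_chunk(chunk):
        table = {}  # private to this worker, so no locks are needed
        for row in chunk:
            k = key_fn(row)
            table[k] = update(table[k] if k in table else init(), row)
        return table

    # Deal the rows round-robin to the workers.
    chunks = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))

    # Single-threaded merge of the per-worker tables.
    result = {}
    for table in partials:
        for k, acc in table.items():
            result[k] = merge(result[k], acc) if k in result else acc
    return result

# Example: total order value per customer (made-up data).
orders = [("alice", 30), ("bob", 5), ("alice", 12), ("carol", 7), ("alice", 1)]
totals = parallel_aggregate(
    orders,
    key_fn=lambda row: row[0],
    init=lambda: 0,
    update=lambda acc, row: acc + row[1],
    merge=lambda a, b: a + b,
)
print(totals)
```

The caller writes no threading code, and a heavily repeated key such as "alice" causes no lock contention because each worker updates only its own table. In standard CPython builds the global interpreter lock keeps these threads from running in parallel, so the sketch conveys the shape of the interface and the absence of shared writes, not the speedup a native implementation would target.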
This project has immediate applications in both commercial and public-domain database systems, where performance improvements would enhance the experience of database system users and reduce the hardware and energy required for a given level of performance.
Programmability improvements would allow programmers without expertise in parallel programming to effectively use multicore machines. The project also provides the focus for an advanced-level course on database system implementation for multicore machines. The software infrastructure will be made available for research use by others.