Future processor chips are expected to have hundreds or even thousands of processor cores. To take advantage of this massive computing power, programmers need to parallelize their applications. Parallel programming, however, is notoriously difficult. Almost all concurrent software systems in production today contain bugs, and those bugs cost billions of dollars. To address this challenge, this research is developing parallel runtime mechanisms that could allow even buggy software to run correctly in a production system.
The fundamental problem with current parallel programming models is that they expose an unbounded number of thread interleavings to the parallel runtime system, and most of the interleavings that can occur in a production system remain untested. This research is exploring two directions to prevent incorrect interleavings from manifesting in a production run. The first approach uses a sampling-based, low-overhead data race detector to detect incorrect interleavings, which the runtime then avoids. The second approach constrains the thread interleavings of a production run to a set of tested interleavings, which could provide comprehensive immunity against most types of concurrency bugs. Software tools developed as part of this research will help software developers and researchers. Students will also use these productivity tools in their course projects.