How to popularize parallel programming is one of the Computing Research Association's Grand Research Challenges for the systems community. Leveraging the full potential of multicore systems would put us back on a path of exponential growth in usable performance, as well as yield significant power savings. Facilitating correct and efficient execution of multithreaded programs would have a transformative effect on the IT industry, since it attacks a problem at the heart of the programmability issues in multiprocessor systems. With ubiquitous multicores and emerging parallel programs, the IT industry now faces far harder reliability problems.
Concurrency errors are hard to understand, are typically non-deterministic, and manifest themselves well past the point of their occurrence. Moreover, they have major implications for programming language semantics. Recent work on support for concurrency debugging has made good progress, but has often focused on best-effort techniques for bug detection with probabilistic guarantees. This research takes a direct approach to the problem: making concurrency errors fail-stop by delivering an exception before the error manifests itself. In other words, the system detects that a concurrency error is about to happen and raises an exception before the erroneous code is allowed to execute. The investigators call this mechanism concurrency exceptions. Concurrency exceptions allow concurrency errors to be handled as conveniently as a division by zero or a segmentation fault.
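To make the fail-stop model concrete, here is a minimal sketch, in Java, of how application code might handle a concurrency exception the same way it handles other fail-stop errors such as ArithmeticException. The DataRaceException type and the runtime support that would raise it are assumptions made for illustration; no shipping JVM provides them.

    // Assumed exception type: in a real system the runtime, not user
    // code, would define it and raise it before the racy access runs.
    class DataRaceException extends RuntimeException {
        DataRaceException(String msg) { super(msg); }
    }

    class Account {
        private long balance = 0;

        // Racy by design: two threads calling deposit() concurrently
        // conflict on 'balance' without synchronization.
        void deposit(long amount) {
            try {
                balance += amount;
            } catch (DataRaceException e) {
                // Fail-stop recovery: the racy access never executed,
                // so it is safe to retry under a lock.
                synchronized (this) {
                    balance += amount;
                }
            }
        }
    }

Nothing raises the exception in today's JVMs, of course; the point of the sketch is that fail-stop semantics let recovery code sit next to the racy access, exactly as with a division by zero.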
This project aims at making parallel programming much easier by exploring a new way of dealing with concurrency errors. It explores questions with deep implications for multicore systems and how their software is written. What are the canonical concurrency error conditions? How can we detect these conditions with zero performance cost and low complexity? Given that concurrency bugs involve multiple threads, which thread should get the exception? What are the global state guarantees offered by the system when the exception is delivered? What are possible recovery actions for concurrency exceptions? And finally, what are the systems implications of concurrency exceptions; in particular, how should the semantics of correct thread interaction be expressed at the programming-language level?

Year four highlights were:

(1) Mapping Low-Level Data Races onto High-Level Data Races. We explained why low-level data-race detectors are, out of the box, only useful for low-level programs, and how they can miss data races and report false data races in high-level programs. To bring the benefits of low-level data-race detection to high-level languages, we designed low-level abstractable race detection (LARD), an extension of the interface between low-level data-race detectors and runtime systems that enables precise high-level data-race detection to be implemented using low-level data-race detection support. We implemented working, fully precise data-race exception support for Java, using LARD to couple a low-level race detector with a modified Java virtual machine. We evaluated the precision of our detector and of several naive low-level data-race detection implementations for Java, showing that unmodified precise low-level data-race detectors exhibit large numbers of missed races and false races in practice. This effort, ongoing from Year 3, is now concluded, with an ASPLOS paper published and a PhD defense completed.

(2) FIB: Fast Instrumentation via Bias. The goal of instrumentation bias is to remove most of the cost of check-access atomicity in many cases by exploiting the same observations and techniques as biased locking. Beyond the typical benefit of well-applied biased locking, we can enable additional optimization opportunities and avoid extra storage for an explicit lock. To improve the efficacy of biasing instrumentation, we use adaptive information and profiling. This effort, ongoing from Year 3, is now being wrapped up for conference paper submission.

(3) Last Writer Slices and Communication Traps. We designed efficient system support for collecting last writer slices in executions of shared-memory concurrent programs. Last writer slices are an abstraction of a program's execution that dynamically tracks memory updates; they provide provenance information for values in memory that can help with debugging. Building on last writer slices, we developed communication traps (CTraps), low-overhead system support for monitoring and interposing on dynamic memory dependences that correspond to communication between threads. CTraps provides an extensible framework that exposes inter-thread communication events to CTraps applications, which can implement analyses that monitor and react to inter-thread communication. We fully implemented last writer slicing and CTraps, along with a variant of CTraps that trades some precision for performance, and we showed that CTraps is useful by implementing two applications from prior work; a simplified sketch of the last-writer bookkeeping appears below.
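The following Java sketch is an illustration only, not the project's implementation, which works at the systems level: it shows the kind of per-location bookkeeping that last writer tracking implies and where a CTraps-style hook would fire. The LastWriterShadow class, the word-granularity location ids, and the Runnable hook are assumptions made for the example.

    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: a shadow map records, per memory location,
    // which thread wrote it last.
    class LastWriterShadow {
        // Location id (e.g., a word address) -> id of the last writer thread.
        private final ConcurrentHashMap<Long, Long> lastWriter =
            new ConcurrentHashMap<>();

        // Called on every instrumented write.
        void onWrite(long location) {
            lastWriter.put(location, Thread.currentThread().getId());
        }

        // Called on every instrumented read; fires the trap when the value
        // being read was produced by a different thread, i.e., on an
        // inter-thread communication event, which is where a CTraps
        // application would hook in.
        void onRead(long location, Runnable communicationTrap) {
            Long writer = lastWriter.get(location);
            if (writer != null && writer != Thread.currentThread().getId()) {
                communicationTrap.run();
            }
        }
    }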
We showed that last writer slices help with debugging through case studies of real-world bugs, and we evaluated the performance and precision of our designs on a set of server programs and standard benchmarks. Our results show that CTraps imposes overheads low enough for use in production systems (0-15%) in many important use cases. This effort, ongoing from Year 3, is now being prepared for submission.

This was the last year of the grant after the extension, and we can say without a doubt that we finished everything we promised in the proposal and also went beyond it with last writer slices and LARD.

Publication: Luis Ceze and Dan Grossman (2014). LARD: Catching Program-Level Races with Low-Level Abstractable Race Detection. In Architectural Support for Programming Languages and Operating Systems (ASPLOS).