Computer technology is rapidly permeating all spheres of society. A computer system that affects the lives of thousands or millions of people creates a massive community of users, each with an interest in the correct behavior of that system. Widespread interconnectivity means that we now have the ability to tap the potential of that community.
This work confronts the challenge of diagnosing and mitigating concurrency bugs. A suite of novel instrumentation schemes will be developed for monitoring thread interleaving patterns. Coupled with previously developed statistical debugging models, these schemes let developers identify the bad thread interleavings that constitute root causes of program failure. A new approach to coordinated cross-thread random sampling keeps overheads low while still providing ample data for diagnosis. Static analysis will further reduce the instrumentation load. Prior statistical debugging work stopped at diagnosis; this project will go further and develop a speculative locking strategy, guided by the statistical models, to avoid and thereby mitigate the effects of a variety of concurrency bugs.
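As a rough illustration of how a statistical debugging model might rank interleaving predicates, the sketch below scores each predicate by how much more often runs fail when the predicate held than when it was merely sampled. This is a minimal sketch under stated assumptions, not the project's actual model: the Run record, the increase_scores function, and the predicate names are all hypothetical, and the scoring rule is the generic "failure rate minus context rate" form used in earlier statistical debugging work.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One sampled execution report (hypothetical format)."""
    failed: bool    # did this run end in failure?
    observed: set   # predicates sampled at least once in this run
    held: set       # predicates observed to be true at least once


def increase_scores(runs):
    """Score each predicate as Failure(P) - Context(P).

    Failure(P): fraction of runs where P held that failed.
    Context(P): fraction of runs where P was sampled that failed.
    A high positive score suggests P is a failure predictor.
    """
    predicates = set().union(*(r.observed for r in runs))
    scores = {}
    for p in predicates:
        f_held = sum(1 for r in runs if r.failed and p in r.held)
        s_held = sum(1 for r in runs if not r.failed and p in r.held)
        f_obs = sum(1 for r in runs if r.failed and p in r.observed)
        s_obs = sum(1 for r in runs if not r.failed and p in r.observed)
        if f_held + s_held == 0 or f_obs + s_obs == 0:
            continue  # never true or never sampled: no evidence either way
        failure = f_held / (f_held + s_held)
        context = f_obs / (f_obs + s_obs)
        scores[p] = failure - context
    return scores


if __name__ == "__main__":
    # Hypothetical predicates: "the previous access to shared variable x
    # (or y) came from a different thread".
    both = {"remote@x", "remote@y"}
    runs = [
        Run(failed=True, observed=both, held={"remote@x"}),
        Run(failed=True, observed=both, held={"remote@x", "remote@y"}),
        Run(failed=False, observed=both, held={"remote@y"}),
        Run(failed=False, observed=both, held=set()),
    ]
    ranked = sorted(increase_scores(runs).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name}: {score:+.2f}")
```

In this toy data the predicate on x holds only in failing runs and so ranks first, while the predicate on y holds equally often in failing and successful runs and scores zero. A real deployment would also need confidence intervals to cope with sparse sampling.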