Identifying bugs in software remains a challenging but essential problem. One particularly difficult task is ensuring the integrity of large-scale data structures stored in memory. Existing bug-finding techniques, such as static analysis of the source code, have not been effective for this task, especially for complex and highly dynamic software such as web applications.
This project explores a new technique for checking data structures dynamically as the program executes. Dynamic checking is effective and precise, but must be efficient in order to avoid significantly slowing program execution. The key idea in this work is to piggyback checking on the garbage collector, which already periodically visits all data structures in the program. An efficient and precise tool for detecting data structure errors could be widely deployed to improve the reliability of critical software infrastructure.
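As a rough illustration of this idea (the names below are hypothetical and not taken from the project's implementation), the sketch shows a simplified tracing loop that invokes a user-supplied check on every reference it follows. The collector already visits every live object once per collection, so the check adds only a small, constant amount of extra work per object.

    import java.util.*;

    // A per-object hook that a checker supplies to the collector's trace.
    interface HeapCheck {
        void onVisit(Object from, Object to);   // called for each reference edge followed
    }

    final class TracingChecker {
        private final Set<Object> marked =
            Collections.newSetFromMap(new IdentityHashMap<>());

        // Simplified mark loop: the collector visits every live object anyway,
        // so the piggybacked check adds little extra traversal cost.
        void trace(List<Object> roots, HeapCheck check) {
            Deque<Object> worklist = new ArrayDeque<>(roots);
            while (!worklist.isEmpty()) {
                Object obj = worklist.pop();
                if (!marked.add(obj)) continue;            // already visited
                for (Object target : referencesOf(obj)) {  // hypothetical helper
                    if (target == null) continue;
                    check.onVisit(obj, target);            // piggybacked check
                    worklist.push(target);
                }
            }
        }

        // Placeholder: a real collector reads the object's reference fields directly.
        private List<Object> referencesOf(Object obj) {
            return Collections.emptyList();
        }
    }

In the actual systems the hook lives inside the collector itself, so no separate traversal of the heap is needed.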
The project consists of three specific avenues of research. The first involves developing a declarative language for expressing dynamic data structure properties, building on existing techniques from static analysis and verification. The second investigates the class of properties that can be checked during a single pass of the garbage collector. The third builds on the machinery of concurrent garbage collection, allowing heap checks to proceed concurrently with the application on available extra CPU cores.
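For illustration only (the project's specification language is not reproduced here), one example of a property that fits in a single collector pass is disjointness: no object should be reachable from two different root sets, such as a free list and an active list. The sketch below, with hypothetical names, checks this by recording which root set first reached each object during the trace and flagging any object reached from two of them.

    import java.util.*;

    // As the collector traces from each named root set, it calls onVisit for every
    // object it reaches; an object reached from two different root sets violates
    // the disjointness property and is reported.
    final class DisjointnessCheck {
        private final Map<Object, String> reachedFrom = new IdentityHashMap<>();

        void onVisit(String rootName, Object obj) {
            String previous = reachedFrom.putIfAbsent(obj, rootName);
            if (previous != null && !previous.equals(rootName)) {
                throw new AssertionError(
                    "object reachable from both " + previous + " and " + rootName);
            }
        }
    }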
The overall goal of this project is to develop a family of new tools that help programmers find and diagnose software errors. Bugs continue to be a huge problem, with major consequences for society as we increasingly depend on software to run everything from commerce to cars. Over the last 10 to 15 years, computer science researchers have developed a wide array of tools and techniques that help programmers build better software. The problem is so difficult, though, that no single approach is likely to solve it entirely.

Existing tools have focused on finding bugs by analyzing the source code. While valuable, these tools are not good at finding problems in a program's data: they do not involve running the program, so there is no actual data to check. We have developed three techniques that excel at finding problems in data structures. Our tools consist of small pieces of extra code added to existing programs to monitor their behavior. Programmers use a kind of mathematical logic to describe concisely how the data should look; the tool code runs alongside the program code and checks that data integrity is preserved.

One of our central contributions is a way to make program monitoring very cheap (almost no extra cost) by piggybacking on a component already present in many systems: the garbage collector. Ordinarily, verifying the integrity of a program's data is very expensive, especially for large programs such as server applications, and can slow them down significantly. We observed that garbage collectors already do much of this work: they visit all the data in a program looking for unused memory that can be recycled. Our program monitors reside inside the garbage collector and perform a small amount of extra work on each data item the collector inspects.

Over the course of the project we developed three approaches to program monitoring using the garbage collector. The first, called GC Assertions, provides a simple and efficient way for programmers to specify and check basic program properties. The second introduces a rich language for expressing more complex program properties, along with a way to translate this language into efficient checking code. The third uses parallelism to speed up program monitoring, taking advantage of the extra processing power available on many of today's chips; it performs checking while the program is running, without interrupting it at all. The key challenge in this work is coordinating the monitor code with the program code so that they do not interfere with each other.

All three systems were implemented and tested on real programs: widely used open-source applications and systems. We believe these ideas could be readily adopted in commercial systems and improve software reliability significantly at low cost.
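As a concrete illustration of the GC Assertions style (the GCAssert API below is hypothetical, used only for this sketch), a programmer who removes an entry from a cache can assert that nothing else still references it; the claim is verified at the next collection, when the collector can see every remaining reference.

    import java.util.*;

    // Hypothetical interface to the collector: the call merely records the claim,
    // and the collector verifies it during its next full traversal of the heap.
    final class GCAssert {
        static void assertDead(Object obj) { /* recorded here, checked by the next GC */ }
    }

    final class Cache<K, V> {
        private final Map<K, V> entries = new HashMap<>();

        V remove(K key) {
            V victim = entries.remove(key);
            if (victim != null) {
                // Fails at the next collection if anything else still references the
                // removed entry, pointing directly at a likely memory leak.
                GCAssert.assertDead(victim);
            }
            return victim;
        }
    }

Because the collector must find every reference anyway to decide what memory can be recycled, verifying such an assertion costs almost nothing extra.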