In just a decade, dynamic invariant inference has emerged as one of the most promising directions in program analysis, with a variety of applications. An invariant inference system observes a program during test execution and filters a large number of candidate invariants (i.e., suspected relations between program data), finally reporting only those that hold with high confidence. However, inferred invariants are not always true (they depend on the quality of a test suite), and the few really useful invariants discovered are often accompanied by many more true but trivial and irrelevant facts. This work improves the quality of discovered invariants by ensuring their consistency with facts that are known statically. For instance, even though the invariants describing the behavior of two functions f1 and f2 may be unknown, we may know that any valid input for f1 is also valid for f2. This fact can be incorporated in the inference process to eliminate inconsistent invariants. More generally, the work explores techniques for expressing, discovering, and employing such consistency constraints to improve the quality of produced invariants, from type information and other sources including static analysis and user-supplied annotation.
The work will impact many aspects of software engineering, including scientific and industrial uses. Concrete benefits will be in the form of publications, usable software (released under an academic open-source license), software prototypes, and educational activities and resources (enhancement of a textbook and current courses, internships for high school students).
Software is as vital to the 21st century as steel was to the 20th. It is fundamental to advances in everything from automobiles to cell phones to drug discovery to emergency response coordination, and leadership in software development is key to economic competitiveness. The complexity of software is bounded only by human capacity to understand and control it. Software developers work always near the edge of their ability to tame complexity. A primary goal of much research in software engineering and programming languages is to provide software developers better technical and intellectual tools for ensuring that the complex software structures they construct and assemble conform to the design intent of the developers. Usually this involves both synthetic approaches (partial automation of software construction from specifications of intent) and analytic approaches (automated and partially automated checks of software structure and behavior relative to specified intent, and diagnosis of deviations). Foundational theory of computation tells us that there can never be a complete and infallible solution to the general problem, so advances must leverage and magnify human ingenuity rather than replacing it. Among the levers we can develop are dynamic analysis tools that allow the software developer to sift through enormous volumes of information gathered from program execution to find important facts about those executions. These facts may be used to diagnose a programming error ("bug") or to assess and plan a potential improvement or extension to the software. Dynamic invariant detection is a technique for producing observations about relations among internal structures maintained by software. In essence it is simple: Automatically generate a large number of conjectures, then eliminate all of the conjectures that are contradicted by actual observations over several test executions of the software. The conjectures that remain after many checks are "likely invariants", i.e., facts that seem to hold for all the observed executions. Dynamic invariant detection is potentially useful to support a variety of typical software development tasks, including development of good test suites (by noting unintended uniformity in a weak test suite) and assessment and planning of program enhancements. Unfortunately, dynamic invariant detection in its current form produces far too many putative facts about a program, most of which are true but few of which are really useful. The developer who attempts to use dynamic invariant detection to gain a better understanding of software suffers a "needle in a haystack" problem. Our project produces better needles in much smaller haystacks, by making use of explicit abstractions provided by developers. Some of those abstractions are already expressed in some way in the program, typically through organization of the program's data structures as a hierarchical collection of classes. Additional hints about the design intent of the developer --- e.g., that a particular structure is meant to represent a sequence of items in a particular order --- permits us to extract fewer but much richer description of observed "likely invariants" in terms of those abstractions. We have constructed a prototype software tool to demonstrate and evaluation this approach to dynamic invariant detection. Our Alembic system allows a developer to specify the abstractions important to him or her, at whatever level of detail seems most useful. Alembic creates a new, instrumented version of the program to be analyzed, gathers observations during program execution, and produces a set of likely invariants for the programmer. Applying Alembic to software from widely used libraries, we have shown that it can radically reduce the volume and improve the quality of information presented to the developer, at a very modest cost in effort to specify simple abstractions.