The goal of the proposed project is to help programmers create correct, efficient, reliable, and secure software systems by providing improved static-analysis techniques that can (i) verify properties of a program's behavior, and (ii) find potential bugs and security vulnerabilities in programs. The project will concentrate on the analysis of machine code: the objectives are to enhance the scientific basis for static analysis of machine code, to create prototype tools that apply the techniques developed in tools for analyzing safety properties of machine-code programs, and to measure how well the tools perform.
This grant supported research in the field of program analysis, with the aim of helping programmers create correct, efficient, reliable, and secure software systems. In particular, the research carried out under the grant focused on the difficult and important problem of analyzing stripped machine code (i.e., neither source code nor symbol-table/debugging information is available to help the analyzer).The number of articles during the last several years about security issues---both in the popular press and in the research literature---are one measure of the importance of machine-code analysis. The applications of the work include verification that machine-code programs operate as desired, and the identification of bugs and security vulnerabilities at the machine-code level, via combinations of static analysis, dynamic analysis, and symbolic execution. During the course of the project, the main activities have been as follows: 1. Development of a system, called TSL (for Transformer Specification Language), that makes it easier to implement analyses for machine code. TSL provides a platform for specifying the semantics of machine-code instruction sets and generating dynamic, symbolic, and static analyses from the semantic specification. 2. Exploration of how TSL could be used to create implementations of the basic primitives used in symbolic program analysis: forward symbolic evaluation, pre-image computation, and symbolic composition. 3. Exploration of how TSL could be used to specify the transition function for cache memory. 4. The use of TSL to develop a tool, called BCE (Botnet Command Extractor), for automatically extracting botnet-command information from bot executables. 5. The use of TSL to develop a tool, called McVeto (Machine Code VErification Tool), which uses both dynamic analysis and symbolic analysis to determine whether a certain program point of interest is reachable. 6. Several investigations of fundamental issues that arise in program analyzers. Static-analysis techniques explore the program's behavior for all possible inputs. The technical challenge that one faces is ``How can one obtain information about the possible states that a program reaches during execution, but without actually running the program on specific inputs?'' To make this feasible, the program is run ``in the aggregate'' -- i.e., on descriptors that represent collections of many states. One of the lines of investigation concerned the development of families of descriptors suitable for use with bit-vector arithmetic (as opposed to integer arithmetic), particularly for representing inequalities. Another line of investigation concerned what we call 'symbolic abstract interpretation', which harnesses the power of modern constraint solvers to perform fundamental symbolic-reasoning operations for different families of descriptors. An overarching theme of the work has been that we have considered a number of issues that have been ignored in previous work on program analysis -- and cause previous techniques to be unsound when applied to machine code -- and shown how comparable techniques can be developed that support sound machine-code analysis.