Interprocedural dataflow analysis plays a central role in tools for software maintenance, testing, verification, and optimization. Modern software has characteristics that cannot be handled by traditional approaches: it typically uses multiple distributed components, and it often employs dynamic mechanisms that are hard to analyze statically. Existing analyses fail in the presence of such features, making it hard to provide sophisticated tool support for real-world software systems. In turn, this reduces programmer productivity and leads to lower software quality. This project focuses on three challenges posed by modern software: reusable components, such as standard libraries; distributed software; and run-time adaptation through dynamic class loading and reflection. This effort is a significant step towards building powerful software tools that are truly usable and useful in the software industry.
The theoretical foundations of dataflow analysis are generalized to achieve precision and scalability in the presence of reusable components. Widely used analyses (e.g., points-to analysis, MOD/REF analysis, constant propagation, and object naming) are adapted to distributed component-based systems. The analyses are systematically generalized to handle dynamic language features. Dissemination is achieved through open-source analysis implementations and two program understanding tools. The broader impacts of the project include (1) research infrastructure that provides scalable off-the-shelf implementations of several fundamental static analyses, (2) tools that supply high-quality support for program understanding, which will improve productivity and software quality, and (3) integration of the research with education, which will increase students' proficiency in current methods and tools for software development and accelerate their career progress.
This project produced new theoretical techniques and analysis algorithms for improving the correctness and performance of software systems. In modern object-oriented software, automated software analysis is difficult because of the scale of the systems and the challenging features of programming languages. Our work addresses this problem with a compositional approach in which software components are analyzed separately and the results are combined to determine properties of the entire system. We developed theoretical approaches for designing such analyses and demonstrated them through a large number of new analysis algorithms. The conclusion from this work is that the generality, scalability, and precision of software analysis can be improved significantly with the help of the proposed approach. Our work showed experimentally that the proposed techniques can lead to dramatic cost reductions for many commonly used analyses. Based on these techniques, we developed novel analyses for potential execution inefficiencies. This work showed, for the first time, that performance problems in object-oriented software can be detected effectively in advance, without running the software. We also presented a new and more effective way to handle dynamic features of programming languages; this work significantly extended the scope of what software analysis algorithms can currently achieve in the presence of such features. Our techniques can serve as a basis for future software analyses developed by other researchers, which would contribute to the research expertise in this field. These analyses are often part of the infrastructure of analysis frameworks and software tools, and reducing their cost can significantly benefit other researchers as well as tool users such as software developers and software testers.
Overall, the work makes it feasible to analyze real-world software systems with increased precision and reduced cost, which adds to the toolset of software developers and testers. This ultimately leads to more robust software infrastructure for the enterprise and consumer markets, with higher quality and lower development costs for many software products.
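As an illustration of the compositional idea, in which a reusable component is analyzed once and its results are reused when analyzing client code, the following is a minimal sketch of a summary-based MOD analysis. It is not the project's implementation: the toy program representation, the function name summarize, and the example components are all hypothetical and chosen only to make the idea concrete.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation (an assumption for illustration only):
# a function is (formal parameter names, statements), where a statement is
# ("write", var) for a write through var, or ("call", callee, [argument names]).
Function = Tuple[List[str], List[tuple]]
Summary = Dict[str, Set[int]]  # function name -> indices of formals it may modify


def summarize(component: Dict[str, Function], external: Summary) -> Summary:
    """Compute MOD summaries for one component, reusing summaries of others."""
    summary: Summary = {name: set() for name in component}
    changed = True
    while changed:  # iterate to a fixed point; summaries only grow
        changed = False
        for name, (params, body) in component.items():
            modified: Set[str] = set()
            for stmt in body:
                if stmt[0] == "write":
                    modified.add(stmt[1])
                elif stmt[0] == "call":
                    _, callee, args = stmt
                    if callee in summary:        # callee in the same component
                        callee_mod = summary[callee]
                    elif callee in external:     # callee summarized earlier
                        callee_mod = external[callee]
                    else:                        # unknown callee: be conservative
                        callee_mod = set(range(len(args)))
                    modified.update(args[i] for i in callee_mod)
            indices = {i for i, p in enumerate(params) if p in modified}
            if indices != summary[name]:
                summary[name] = indices
                changed = True
    return summary


# The library is analyzed once, independently of any client.
library = {
    "fill": (["buf"], [("write", "buf")]),
    "copy": (["dst", "src"], [("call", "fill", ["dst"])]),
}
lib_summary = summarize(library, {})

# The client is analyzed against the library summary, not the library code.
client = {"use": (["a", "b"], [("call", "copy", ["a", "b"])])}
client_summary = summarize(client, lib_summary)
```

In this sketch the library bodies are never revisited while the client is analyzed; only the precomputed summary is consulted, which is where the cost reduction of a compositional analysis would come from.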