Developing and maintaining software is a key challenge in computer science, with software failures costing up to one half of one percent of US GDP each year. Most code is retained and evolved rather than created from scratch, and professional software developers spend over three-fourths of their time trying to understand existing code. Understandability and documentation have become key components of software quality, yet they remain poorly understood by both researchers and practitioners. In a future where the software engineering focus shifts from implementation to design and composition concerns, program understandability will become even more important. This research develops tools and techniques for mechanically generating documentation to help make programs easier to understand.
The research follows the insight that modern analysis techniques can form rich descriptive models of programs that are both precise and succinct. Human-readable documentation can then be synthesized from such models. The approach applies to large programs across multiple application domains. The research focuses on documenting how code should be used correctly, a critical aspect in an era of components-off-the-shelf development, as well as documenting how code has changed and evolved over time, a key part of software maintenance. The research leverages program analysis techniques, machine learning, and textual synthesis, with results disseminated through academic publication; the education, training, and mentoring of students; and freely available, open-source tools.