The construction of a software system produces several different artifacts, including requirements and code. Requirements, written in natural language, stipulate the system functionality; code then implements and tests the specified functionality. To ensure that a system has been properly implemented and tested, software engineers attempt to match and link requirements to code (and other artifacts) in a process known as software traceability. Unfortunately, the traceability process can be both difficult and time-consuming, because the underlying system is complex and modern development practices tend to prioritize implemented functionality over traceability. This project will develop novel techniques for automating the software traceability process by predicting accurate links for developers and explaining why these predictions were made. The proposed techniques will allow software engineers to establish and manage software traceability more efficiently and effectively, ultimately leading to a better understanding of a given system and more robust guarantees that it is functioning as intended. The project will also produce and disseminate educational materials on best practices for requirements engineering and program comprehension. We expect these materials to be integrated into existing computer literacy courses at all levels of education. In addition, the project will focus on recruiting and retaining computer science students from traditionally underrepresented groups.
The project is centered on three specific goals. First, it will develop novel techniques that combine (i) orthogonal measures of the textual similarity of software artifacts, (ii) developer feedback, and (iii) transitive links that exist between artifacts, in order to predict accurate trace links between software artifacts. This component will adapt and build upon techniques from machine learning, information retrieval, and statistical modeling. Second, it will develop a method for using evolutionary software histories to improve trace-link quality. This evolutionary component of the automated traceability system will adapt recent advancements in dynamic statistical-modeling techniques. Finally, the project will leverage causal inference and intelligent agents to help explain predicted trace links and to support developers in the trace-link evaluation process. The automated techniques developed during the course of this project will be thoroughly validated with industry partners and are expected to become a powerful tool for developers in establishing and managing trace links for software systems.
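As a rough illustration of the first goal, the sketch below combines two textual similarity measures, developer feedback, and a transitive link into one score for a candidate trace link. It is not the project's method: the fixed weighted sum stands in for the machine learning and statistical models the abstract refers to, and every name here (`jaccard`, `cosine`, `transitive_score`, `link_score`, the weights, and the artifact identifiers) is a hypothetical choice made for the example.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately naive assumption)."""
    return text.lower().split()


def jaccard(a, b):
    """Set overlap of the two token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def transitive_score(src, dst, known):
    """Best two-hop path src -> mid -> dst, scored as the product of hops."""
    best = 0.0
    for (a, mid), s1 in known.items():
        if a != src:
            continue
        s2 = known.get((mid, dst))
        if s2 is not None:
            best = max(best, s1 * s2)
    return best


def link_score(req, code, text, known, feedback, weights=(0.4, 0.4, 0.2)):
    """Score a candidate link; explicit developer feedback overrides the model."""
    verdict = feedback.get((req, code))
    if verdict is not None:
        return 1.0 if verdict else 0.0
    w_j, w_c, w_t = weights  # hypothetical weights, not tuned on any data
    return (w_j * jaccard(text[req], text[code])
            + w_c * cosine(text[req], text[code])
            + w_t * transitive_score(req, code, known))


if __name__ == "__main__":
    text = {
        "R1": "user login with password",  # a requirement
        "C1": "login password check",      # a code artifact
        "C2": "render chart",              # an unrelated code artifact
    }
    # Existing links through a hypothetical test artifact T1.
    known = {("R1", "T1"): 0.9, ("T1", "C1"): 0.8}
    for code in ("C1", "C2"):
        print(code, round(link_score("R1", code, text, known, {}), 3))
```

In this toy setup the requirement shares vocabulary with `C1` and is also connected to it through `T1`, so `C1` scores higher than `C2`. Treating feedback as a hard override is only one possible design; a learned model could instead use it as a training signal.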
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.