As technology advances, software plays an ever-increasing role in our daily lives. Defects are a persistent and troublesome fact of software development, and they can have serious consequences, ranging from significant financial losses for businesses to loss of life. This has motivated rich areas of research that study previously found defects and their fixes in order to prevent or detect future defects. This research spans many disciplines, from security to software engineering, and sub-fields such as program analysis, verification, software testing, and automated program repair. Central to these fields is the availability of a large number of real-world defects for study and evaluation. Recently, the BugSwarm infrastructure was developed to automatically create a continuously growing dataset of reproducible real-world failures and fixes. BugSwarm mines pairs of failures and fixes from open-source GitHub projects that use the continuous integration service Travis CI. The novelty of BugSwarm lies in the automated generation of scripts to compile and test the code, and in the use of Docker images that provide the environment required to reproduce each failure and its corresponding fix. The BugSwarm infrastructure has produced the BugSwarm dataset, which currently includes over 3,000 reproducible failures and fixes from projects written in Java or Python.

This research will enhance the BugSwarm infrastructure and dataset so that they can grow at the scale and in the directions required by the research community, and so that the infrastructure has the long-term sustainability it needs to continue evolving. Specifically, the research aims to address five challenges in making the dataset larger, more robust, more diverse, and more useful: (1) devise novel techniques to increase the reproduction rate of failures and fixes mined from open-source projects, (2) incorporate user feedback to guide the mining of failures and fixes of interest to the research community, (3) provide a richer classification schema for the dataset, (4) build support for additional programming languages such as JavaScript, and (5) provide a tool ecosystem to facilitate use of the BugSwarm dataset. As part of this effort, workshops and tutorials will be organized to gather feedback from the community and to share BugSwarm resources. Going forward, BugSwarm will not only facilitate experimentation but also help software engineering researchers avoid duplicating one another's work.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 2016735
Program Officer: Sol Greenspan
Budget Start: 2020-10-01
Budget End: 2023-09-30
Fiscal Year: 2020
Total Cost: $1,470,431
Organization Name: University of California Davis
City: Davis
State: CA
Country: United States
Zip Code: 95618