When developing or deploying software, programmers must ensure that any external libraries or services the application needs in order to run are properly installed and configured. The process of preparing computing infrastructure to execute an application correctly is referred to as software configuration. Currently, programmers often perform software configuration manually, which can result in errors and poor maintainability. Improper software configuration can cost businesses billions of dollars, lead to unexpected service downtime, and cause failures of critical infrastructure and loss of data. Unfortunately, the skills required for proper software configuration can be orthogonal to those required for software development, so few programmers are trained in them.

This project will develop techniques to automatically perform the software configuration necessary to run an arbitrary application. The project will investigate two main research tasks. The first task will develop an approach for automatically inferring a Dockerfile, a configuration script for the Docker container system, that is capable of executing an application. The approach will use automatic code analysis of existing software libraries to build an offline knowledge base from which the dependencies between them can be recovered. The approach will augment the knowledge base with rules learned by mining existing Dockerfiles, configuration scripts, and developer resources such as Stack Overflow. Further, the approach will apply minimization techniques to environment specifications extracted from this knowledge base to arrive at a minimal set of application dependencies. The second task will develop a system that detects when configuration scripts are incompatible with code, such as after a dependency upgrade, and uses search-based techniques to automatically repair these configuration scripts. In addition, transfer learning will be used to guide successful inferences and repairs. Finally, these approaches will be applied in two applications: detecting when code snippets in community resources are incompatible with an API version, and building repair bots that can automatically create a pull request to repair configuration scripts.
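
The dependency-minimization step described above can be illustrated with a small sketch. This is not the project's method, only one simple way such a step could work: a greedy pass that drops each candidate package in turn and keeps the removal whenever the application still runs. The names `minimize_dependencies`, `render_dockerfile`, and the `runs_ok` oracle are hypothetical, and the choice of a pip-based Python image is an assumption made purely for illustration. In practice the oracle would build and run a container; here it is left as a caller-supplied function.

```python
from typing import Callable, FrozenSet, Iterable, List


def minimize_dependencies(
    candidates: Iterable[str],
    runs_ok: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Greedily shrink a candidate dependency list (hypothetical helper).

    `runs_ok(env)` is an assumed oracle that reports whether the application
    executes successfully when exactly the packages in `env` are installed.
    """
    kept = list(candidates)
    if not runs_ok(frozenset(kept)):
        raise ValueError("the full candidate set does not run the application")

    # Try removing each package once; keep the removal if the app still runs.
    for dep in list(kept):
        trial = [d for d in kept if d != dep]
        if runs_ok(frozenset(trial)):
            kept = trial
    return kept


def render_dockerfile(base_image: str, packages: List[str]) -> str:
    """Render a minimal Dockerfile for a pip-based image (illustrative only)."""
    lines = [f"FROM {base_image}"]
    if packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Simulated oracle: pretend the application only needs these two packages.
    required = {"numpy", "requests"}
    minimal = minimize_dependencies(
        ["numpy", "flask", "requests", "pytest"],
        lambda env: required <= env,
    )
    print(render_dockerfile("python:3.11-slim", minimal))
```

A single greedy pass needs one oracle call per candidate plus one initial check. If the oracle is monotone (adding a package never breaks a working environment), the result is 1-minimal: removing any remaining package makes the application fail. Real environments with version conflicts may violate that assumption, which is one reason more elaborate minimization techniques exist.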

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-10-01
Budget End: 2021-09-30
Fiscal Year: 2018
Total Cost: $349,675
Name: North Carolina State University Raleigh
City: Raleigh
State: NC
Country: United States
Zip Code: 27695