Software analytics distills large quantities of low-value data into smaller sets of higher-value data that yield important insights for improving software quality. It is essential for software effort estimation, risk management, defect prediction, project resource management, and many other tasks. Software analytics is also a complex, time-consuming process. Recent research has tried to alleviate this cost through intelligent optimizers that make better use of existing computational resources. The space of possible optimization options is very large and spans multiple layers: all possible settings for algorithms, compilers, and execution-time options. To complicate matters, many competing goals could be used to guide that tuning, e.g., reducing CPU usage while increasing the predictive power of the learned model. Existing research has mainly focused on limited optimizers that explore just a few options, mostly at one level, while trying to improve just one or two goals, leaving the large potential of optimization untapped.
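
To make the idea of competing goals concrete, the following is a minimal sketch of Pareto dominance over two goals: CPU time (to be minimized) and predictive accuracy (to be maximized). It is not the project's method. The `Trial` type, the configuration names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    """One evaluated configuration (hypothetical illustration data)."""
    name: str           # label for a tuning configuration
    cpu_seconds: float  # goal to minimize
    accuracy: float     # goal to maximize


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on both goals and strictly better on one."""
    no_worse = a.cpu_seconds <= b.cpu_seconds and a.accuracy >= b.accuracy
    better = a.cpu_seconds < b.cpu_seconds or a.accuracy > b.accuracy
    return no_worse and better


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep only the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Made-up measurements for four hypothetical configurations.
trials = [
    Trial("A", 10.0, 0.80),
    Trial("B", 25.0, 0.85),
    Trial("C", 30.0, 0.82),
    Trial("D", 12.0, 0.78),
]
front = pareto_front(trials)
print([t.name for t in front])
```

In this made-up data, configuration C is dominated by B and D is dominated by A, so only A and B remain as trade-offs. An optimizer that tracks a single goal would report just one of them.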

This research proposes to advance the state of the art to holistic scalable intelligent optimization for software analytics (SHASA). SHASA tunes all levels of options for multiple optimization objectives at the same time. It achieves this ambitious goal by developing a set of novel techniques that efficiently handle the tremendous tuning space. These techniques take advantage of the synergies among all those options and goals by exploiting relevancy filtering (to quickly dispose of unhelpful options), locality of inference (to enable faster updates to outdated tunings), and redundancy reduction (to shrink the search space for better tunings). This research will produce algorithms and tools that are demonstrably more useful and efficient for software analytics research. These techniques are generalizable beyond software analytics to computational science and engineering at large. An important broader impact is minimizing CPU and memory usage, ultimately reducing energy consumption in data centers as data analytics computations grow significantly in scale and become more computationally demanding.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 1703487
Program Officer: Sol Greenspan
Project Start:
Project End:
Budget Start: 2017-07-01
Budget End: 2021-06-30
Support Year:
Fiscal Year: 2017
Total Cost: $898,349
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695