Most modern microprocessors support some form of vector operations, which apply the same operation to a small vector of arguments simultaneously. Studies have shown that using these instructions can improve the performance of many scientific codes by a factor of 2 or more. Unfortunately, state-of-the-art autovectorization falls far short of this potential, achieving improvements of only 20-30% on the same codes.
Although studies have shown that current autovectorizing compilers do not identify all opportunities for vectorization, little is known about why they fail to do so. The PI will study the problem of mapping high-level, idealized vector code onto the idiosyncratic vector instructions found on real hardware. The PI plans to use the Spiral code generation and autotuning system to generate a large set of test cases for evaluating how well existing autovectorizing compilers manage such mappings. By identifying the program transformations required to generate vectorized code for real hardware, this research will make it possible to develop better autovectorizing compilers. Such compilers will improve the performance of applications ranging from multimedia software to scientific computing.