Although the speed and performance of high-end computers have increased dramatically over the last decade, the ease of programming such parallel computers has not progressed. The time and effort required to develop and debug scientific software have become the bottleneck in many areas of science and engineering. The difficulty of developing high-performance software is recognized as one of the most significant challenges today in the effective use of large-scale computers.
The Cactus framework for science applications has been developed over the last several years to simulate physical systems in many fields of science, such as black holes and neutron stars in general relativity. As in other software frameworks, applications are built from separately developed and tested components. The project SDCI HPC Improvement: Cactus Tools for Application Level Performance and Correctness Analysis (ALPACA) will provide high-level tools to allow developers and end-users to examine and validate the correctness of an application, and aid them in measuring and improving its performance in production environments. These tools will be components themselves, built into the application and interacting with it. The developed software will also help render applications tolerant against partial system failures, which is becoming a pressing need with tomorrow's architectures consisting of hundreds of thousands of nodes.
In contrast to existing debuggers and profilers, the ALPACA tools will work at a much higher level: at the level of the physical equations and their discretisations which are implemented by the application, not at the level of individual lines of code or variables. It is not enough for only the main kernels to be correct and show good scalability; the overall application, which may contain many smaller modules, must perform. Our integrative effort will lead to well-tested and highly efficient applications which are developed on a shorter time scale and execute more reliably. By providing interactive debugging abilities at the application level in production environments, and by allowing interactive experimentation at the algorithmic level on large HPC systems, the ALPACA tools will significantly reduce the time and effort required to take the steps from using isolated application components on single workstations to performing large-scale HPC calculations.
The ALPACA tools will be developed in close conjunction with scientists from several scientific communities, ensuring their direct usefulness and applicability to real-world problems.
Intellectual Merit: The issues addressed by ALPACA are critically important for the success of HPC systems at all scales, and ALPACA tools will be highly valuable for algorithm development, performance analysis, and software engineering in many fields of science. The LSU group has been a leader in developing application level HPC tools, including the Cactus framework, in developing algorithms for adaptive scalability in HPC and distributed HPC environments, and in developing applications themselves; it is a world leader in applying HPC as a tool to solve Einstein's equations. The ALPACA project simultaneously addresses problems in physics and computational science, and will provide scientists with radically improved tools to help them bring their problems to the machine.
Broader Impacts: ALPACA fundamentally involves several application areas important to NSF, and will impact many others. ALPACA has the potential to make a huge contribution to computational science by providing a software infrastructure that enables developers and users to create scalable applications and to use them in a correct and efficient manner. Through these tools, we expect groups to concentrate more on physics and numerics, and less on computational details in an ever more complex computing environment. At the same time, many other communities are emerging to solve complex problems, and our ideas and techniques, designed to help HPC software development in any discipline, will have an impact across many of these projects. The ALPACA tools will thus naturally spread out into the communities. This proposal includes a training workshop and the training of a postdoc and a graduate student.