Extremely large-scale systems pose a new challenge to application designers. Current software development techniques do not scale well on these systems, either in execution efficiency or, more importantly, in the amount of time the programmer spends writing, debugging, and tuning the software. To realize extreme-scale computing, we must increase programmer productivity. To that end, we require three advances in the programming paradigm for these systems. First, the application programmer must interact with the development environment at a level higher than individual processes or execution threads, and tools must support these interaction modes and more abstract views of the application. Second, system monitoring functions must provide feedback to the application programmer on overall system performance. Monitoring an extreme-scale system must include some degree of automation and must be able to infer overall performance from a small set of monitoring points; the resulting feedback must be compressed to highlight performance issues at the high level of abstraction the programmer requires. Finally, many low-level optimization decisions must be automated by a new generation of compiler optimizations that target global program behavior and are intimately integrated with the monitoring system. Collectively, we describe these advances as autonomic performance optimization.
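As a concrete illustration of the kind of feedback compression we have in mind, the following minimal sketch aggregates hypothetical per-process profile data into a short hotspot summary with a load-imbalance indicator. The data layout and the summarize function are illustrative assumptions of our own, not part of any existing tool.

    """Sketch: compressing per-process profile data into a high-level summary.

    Hypothetical illustration only: each process reports a {region: seconds}
    mapping, and the summary keeps just the top hotspots plus a simple
    load-imbalance indicator (worst process time / mean process time).
    """
    from collections import defaultdict

    def summarize(profiles, top_k=3):
        """profiles: list (one entry per process) of {region: seconds} dicts."""
        totals = defaultdict(float)   # total time per region across processes
        peaks = defaultdict(float)    # worst single-process time per region
        for proc in profiles:
            for region, secs in proc.items():
                totals[region] += secs
                peaks[region] = max(peaks[region], secs)
        n = len(profiles)
        report = []
        for region in sorted(totals, key=totals.get, reverse=True)[:top_k]:
            mean = totals[region] / n
            # imbalance > 1 means some process spends far longer than average
            report.append((region, totals[region], peaks[region] / mean))
        return report

    # Tiny synthetic example: 4 processes, 3 instrumented regions.
    profiles = [
        {"solve": 10.2, "exchange": 1.1, "io": 0.3},
        {"solve": 10.0, "exchange": 1.0, "io": 0.2},
        {"solve": 10.1, "exchange": 4.9, "io": 0.2},  # one slow communicator
        {"solve": 10.3, "exchange": 1.2, "io": 0.3},
    ]
    for region, total, imbalance in summarize(profiles):
        print(f"{region:10s} total={total:6.1f}s imbalance={imbalance:4.2f}")

At extreme scale such a summary would be computed in the monitoring system itself, so that the programmer sees a handful of ranked issues rather than thousands of raw per-thread profiles.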
Our proposed research addresses these requirements by developing new tools, and extending current ones, to manage large software projects. We will extend our prior work on the TAU framework of performance instrumentation and analysis tools to the scale of high-end computing (HEC) applications. To do this, we will incorporate a new framework for monitoring representative "skeletons": simplified models that match the execution profile of the full application and can therefore inform the programmer about total system performance. Performance modeling of a skeleton is achieved by placing profile monitors at strategic points in the system. We will use advanced machine learning techniques both to determine the placement of these monitor points and to synthesize the resulting large volume of performance information into a form suitable for the application designer. Finally, we will automate critical low-level design decisions by feeding profile data directly to the compiler and dynamic code translator. The optimizations we develop will target data layout, data duplication throughout the system, and dynamic data movement. Optimizing data management will decrease the average access latency of memory references, reducing congestion on the inter-processor and inter-cluster networks while freeing the programmer from detailed data placement decisions.
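To make the monitor-placement idea concrete, the sketch below uses a simple greedy feature-selection procedure: given hypothetical measurements from many skeleton runs at candidate probe points, it keeps the few probes whose readings best predict the full application's run time under a linear model. The synthetic data, the place_monitors function, and the linear model are illustrative assumptions only; the machine learning techniques actually employed are a subject of the proposed research.

    """Sketch: choosing a small set of monitor points by greedy selection.

    Hypothetical illustration: rows of X are skeleton runs, columns are
    candidate probe points, and y is the measured full-application run time.
    We greedily keep the probes that minimize least-squares prediction error.
    """
    import numpy as np

    def place_monitors(X, y, budget):
        """Pick `budget` columns of X that best predict y under a linear fit."""
        chosen = []
        remaining = list(range(X.shape[1]))
        for _ in range(budget):
            def residual(cols):
                A = X[:, cols]
                coef, *_ = np.linalg.lstsq(A, y, rcond=None)
                return np.linalg.norm(A @ coef - y)
            best = min(remaining, key=lambda j: residual(chosen + [j]))
            chosen.append(best)
            remaining.remove(best)
        return chosen

    # Synthetic data: 50 skeleton runs, 20 candidate probes; the run time
    # depends mainly on probes 3 and 7, which the greedy search recovers.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))
    y = 4.0 * X[:, 3] + 2.0 * X[:, 7] + 0.1 * rng.normal(size=50)
    print("selected monitor points:", place_monitors(X, y, budget=2))

In the proposed system the same selection criterion would be driven by measured skeleton profiles rather than synthetic data, and the fitted model would then serve as the compressed performance predictor reported back to the programmer.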
The intellectual merit of this proposal lies in the new paradigm of autonomic performance optimization as a framework for integrating performance methods and tools for HEC systems. The broader impact is both technical and societal. Ultimately, we strive to enhance the computational tool infrastructure used to solve Grand Challenge scientific computing problems. We believe, however, that this must be done in concert with the evolution of large-scale computing toward introspective, autonomic platforms and systems. Our work will enable practitioners to build efficient, scalable applications more easily, to solve very large and complex problems, and to do so more quickly than is currently feasible. The resulting increase in the productivity of application developers will not only advance scientific applications important to our national infrastructure, but will also open HEC to important economic and societal applications where computing is advancing science and technology.