The project will develop an integrated approach to improving communication performance in clusters. Cluster computing has become a common, cost-effective means of parallel computing. Although adding more CPUs increases a cluster's maximum processing power, real applications often cannot use very large numbers of CPUs efficiently because they do not scale. In regular codes, the main impediment to scalability is communication overhead, which grows as the number of CPUs increases. Most optimization methods proposed to reduce this overhead target specialized hardware or programming languages, require specialized knowledge from the domain scientist, or are not sufficient to provide a comprehensive solution on their own; they also do not adequately address the challenges posed by the layers of communication software between the sender and receiver processes. The potential for improved overall performance in these applications remains largely untapped because of (1) the need for knowledge of the context of the communication operations in order to fully exploit the sophisticated network technology, and (2) the low-level nature of the programming needed within the application program to achieve that potential. In particular, performance can often be improved by increasing the use of lightweight asynchronous communication. Unfortunately, programming with asynchronous communication is difficult and error-prone, even for the most experienced programmers.

The project will pursue a vertically integrated approach, in which a set of optimizations in the compiler, network, and operating system can enable legacy parallel applications to scale to a much larger number of CPUs, even if they were written without any knowledge of our techniques. An experimental prototype and preliminary experiments with real scientific applications show that significant performance improvements are possible with a vertically integrated approach, where knowledge of the context of communication operations is joined with knowledge of the network and cluster details to provide a fine-grained strategy for overlapping communication and computation. Based on these promising initial results, the overall goal of the proposed research is to create a means for scalable cluster computing by enabling integrated knowledge and cooperation among the source optimizer, operating system, and network technology of the cluster, without relying on the programmer to learn the low-level details of the cluster communication system.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Application #
0509170
Program Officer
Mohamed G. Gouda
Project Start
Project End
Budget Start
2005-08-01
Budget End
2009-07-31
Support Year
Fiscal Year
2005
Total Cost
$450,000
Indirect Cost
Name
University of Delaware
Department
Type
DUNS #
City
Newark
State
DE
Country
United States
Zip Code
19716