Dongarra, Jack J. Plank, James S. University of Tennessee
Enabling Technology for High-Performance Heterogeneous Clusters Focusing on Grid Middleware, Fault Tolerance and Sparse Matrix Computations
This research instrumentation enables research projects in:
- Harnessing Cluster Resources for Distributed Scientific Computing,
- Fast and Portable Checkpointing in Clusters and Clusters-of-Clusters, and
- Tools for Large-Scale Sparse Matrix Applications on Clusters.
To support these projects, this award contributes to the purchase of a 32-node high-performance cluster, switches, workstations, and an interface to existing clusters and the visualization lab at the University of Tennessee.
High-performance, low-latency clusters assembled from commodity computers and interconnects are clearly the cost-effective alternative to parallel supercomputers. But the software environment on such clusters is primitive at best; there is an obvious need for tools for application development and cluster management. The three projects in this proposal address this need. Their common goal is the development of enabling technology for advanced scientific computing applications on large-scale clusters and heterogeneous clusters-of-clusters. The proposed instrumentation is a high-performance networked cluster of workstations. This cluster will be connected to two small clusters already available in the department to provide a sizable, heterogeneous "cluster-of-clusters" with visualization capabilities. This cluster-of-clusters platform will be used for research on algorithm, software, and tool development in the three projects.