The concept of a Computational Power Grid has emerged as a way of capturing the vision of a network computing system that provides broad access, not only to massive information resources, but to massive computational resources as well. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. NetSolve is a software environment for network computing that addresses the complexities of such systems.

NetSolve uses a modular, client-agent-server architecture. Moreover, it is designed to be highly composable in that it readily permits new resources to be added. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explores design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

This work will focus on issues of fault-tolerance and computation migration in the NetSolve environment. NetSolve's composable architecture and dynamic access to heterogeneous resources make it ideal for investigating a variety of approaches to fault-tolerance, and then for deploying any valuable results that ensue. The research on NetSolve aims to improve NetSolve's design by exploring the following techniques:

- Fault-tolerance and migration between resources. This is termed inter-server fault-tolerance, in which a computation is moved off a server that fails or is not progressing fast enough.
- Introduction of storage servers to store checkpointed state, so that if a given server fails, the NetSolve agent can send its state to a new server.
- Data logistics for improved performance through techniques that manage user data, for example to eliminate redundant transmissions of large data sets across the network.
- Dynamic loading of new software components from network repositories to satisfy user requests.
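The checkpoint-and-migrate idea in the first two items can be illustrated with a small simulation. This is only a sketch under stated assumptions, not NetSolve's actual interface: the names `StorageServer`, `ComputeServer`, `agent_submit`, and `ServerFailure` are hypothetical, the "computation" is a toy running sum, and a server failure is simulated by raising an exception at a chosen step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class ServerFailure(Exception):
    """Raised by a simulated compute server that dies mid-computation."""


@dataclass
class StorageServer:
    """Hypothetical storage server holding the latest checkpoint per job."""
    checkpoints: Dict[str, dict] = field(default_factory=dict)

    def save(self, job_id: str, state: dict) -> None:
        # Store a copy so later mutation on the compute server cannot corrupt it.
        self.checkpoints[job_id] = dict(state)

    def load(self, job_id: str) -> Optional[dict]:
        saved = self.checkpoints.get(job_id)
        return dict(saved) if saved is not None else None


@dataclass
class ComputeServer:
    """Hypothetical compute server; fail_at_step simulates a crash."""
    name: str
    fail_at_step: Optional[int] = None
    steps_run: int = 0

    def run(self, job_id: str, total_steps: int, state: dict,
            storage: StorageServer) -> int:
        # Toy computation: sum the integers 0 .. total_steps - 1,
        # checkpointing to the storage server after every step.
        while state["step"] < total_steps:
            if self.fail_at_step is not None and state["step"] == self.fail_at_step:
                raise ServerFailure(self.name)
            state["acc"] += state["step"]
            state["step"] += 1
            self.steps_run += 1
            storage.save(job_id, state)
        return state["acc"]


def agent_submit(job_id: str, total_steps: int, servers: List[ComputeServer],
                 storage: StorageServer) -> Tuple[int, str]:
    """Hypothetical agent: try servers in order, resuming from the last checkpoint."""
    for server in servers:
        state = storage.load(job_id) or {"step": 0, "acc": 0}
        try:
            return server.run(job_id, total_steps, state, storage), server.name
        except ServerFailure:
            continue  # migrate: the next server restarts from the saved state
    raise RuntimeError("no server could complete the job")


if __name__ == "__main__":
    storage = StorageServer()
    servers = [ComputeServer("a", fail_at_step=3), ComputeServer("b")]
    print(agent_submit("job1", 10, servers, storage))
```

In this sketch the second server resumes from the checkpointed step rather than recomputing from the beginning, which is the benefit the storage servers are meant to provide. Migrating off a server that is merely slow, rather than failed, would need a progress monitor and is not shown here.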

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 9876895
Program Officer: Xiaodong Zhang
Project Start:
Project End:
Budget Start: 1999-05-01
Budget End: 2002-04-30
Support Year:
Fiscal Year: 1998
Total Cost: $398,101
Indirect Cost:
Name: University of Tennessee Knoxville
Department:
Type:
DUNS #:
City: Knoxville
State: TN
Country: United States
Zip Code: 37996