This research project focuses on two important areas in tolerating resource dynamism: architecture-independent checkpointing (AIC) and primitives for high availability. Architecture-independent checkpointing is the ability to save the execution state of a program in a machine-independent manner so that it may be restored on a different processing platform. This could be a different machine or, for a parallel program, a different parallel processing environment. The PI is attacking AIC from four angles: application-specific AIC in ScaLAPACK, semi-transparent to transparent AIC for C and Fortran programs, AIC of Java applications, and AIC of High Performance Fortran applications. The result will be a collection of methods for, and implementations of, AIC for a variety of widely used computing tools. The second portion of this research project is to design and implement primitives for high availability in three currently popular tools for distributed computing: ScaLAPACK, MPI, and Java. Currently, each offers only limited support for high availability: events such as failures, processor revocation and subsequent return to availability, network partitions, and planned shutdowns are not accommodated. The PI plans to address these deficiencies. The result will be more robust tools for distributed computing, plus methodologies that may be incorporated into the design of future tools.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
9703390
Program Officer
Mukesh Singhal
Project Start
Project End
Budget Start
1997-08-15
Budget End
2001-07-31
Support Year
Fiscal Year
1997
Total Cost
$205,000
Indirect Cost
Name
University of Tennessee Knoxville
Department
Type
DUNS #
City
Knoxville
State
TN
Country
United States
Zip Code
37996