A Decentralized and Rule-Based Approach to Data Dependency Analysis and Failure Recovery in a Service-Oriented Environment

PI: Susan D. Urban

The objective of this research is to develop a decentralized approach to data dependency analysis and failure recovery among concurrently executing processes in a loosely-coupled service-oriented environment. The approach involves monitoring externalized data changes of individual service executions. Peer-to-peer, decentralized communication among process execution agents is then used to discover data dependencies among concurrently executing processes that may lead to data inconsistencies during the recovery of a failed process. Process interference rules of dependent processes are used to test user-defined semantic conditions to determine if 1) critical data conditions have been affected by the recovery of a failed process and 2) recovery procedures should be invoked for dependent processes. The research includes the development of a methodology for using process interference rules. The correctness and efficiency of decentralized data dependency analysis and rule-based recovery procedures are also demonstrated for concurrent processes in the context of a service composition model that supports compensation, contingency, rollback, and retry techniques. This research provides a new way of thinking about traditional transaction recoverability concepts, providing a dynamic approach to discovering data dependencies and responding to failures in a manner that guarantees user-defined correctness conditions for concurrent processes that execute without isolation guarantees.

Project Report

The advent of Web Services and Service-Oriented Computing has significantly changed software development practices and data access patterns for distributed computing environments, creating the ability to develop processes that are composed of distributed service executions. Service-oriented computing, however, also poses new challenges for software design and execution environments, especially with respect to failure recovery and semantic correctness in the context of concurrent process execution. Our research is innovative in that we provide a new paradigm for service execution that supports the dynamic discovery of data dependencies, with rule-based techniques for testing user-defined semantic conditions for correct execution.
In particular, this research has extended an abstract execution model for establishing user-defined correctness and recovery in a service composition environment. The service composition model defines a hierarchical service composition structure, where a service is composed of atomic and/or composite groups. The model provides multi-level protection against service execution failure by using compensation and contingency at different composition granularity levels. The model is enhanced with the concept of assurance points (APs), integration rules, invariant rules, and application exception rules. APs serve as logical and physical checkpoints for user-defined consistency checking, invoking integration rules that check pre- and post-conditions at different points in the execution process. Invariants provide a stronger way of monitoring constraints, guaranteeing that a condition holds for a specific duration of execution as defined by starting and ending assurance points.
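The relationship between assurance points, integration rules, and invariants described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the class names (`AssurancePoint`, `Invariant`, `Process`), the `reach` and `update` methods, and the stock/order example are all hypothetical, and conditions are assumed to be simple predicates over an in-memory state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]
Condition = Callable[[State], bool]


@dataclass
class AssurancePoint:
    """Hypothetical AP: a named checkpoint with optional pre/post conditions."""
    name: str
    pre: Optional[Condition] = None
    post: Optional[Condition] = None


@dataclass
class Invariant:
    """Hypothetical invariant: must hold between a starting and an ending AP."""
    condition: Condition
    start_ap: str
    end_ap: str


class Process:
    def __init__(self, state: State, invariants: List[Invariant]):
        self.state = dict(state)
        self.invariants = invariants
        self.active: List[Invariant] = []
        self.violations: List[str] = []

    def update(self, key: str, value: float) -> None:
        """Apply a data change, then re-check every currently active invariant."""
        self.state[key] = value
        for inv in self.active:
            if not inv.condition(self.state):
                self.violations.append(f"invariant({inv.start_ap}-{inv.end_ap})")

    def reach(self, ap: AssurancePoint) -> bool:
        """Run the AP's condition checks; return True when all of them hold."""
        ok = True
        if ap.post is not None and not ap.post(self.state):
            self.violations.append(f"post@{ap.name}")
            ok = False
        if ap.pre is not None and not ap.pre(self.state):
            self.violations.append(f"pre@{ap.name}")
            ok = False
        # Stop monitoring invariants that end here, start those that begin here.
        self.active = [i for i in self.active if i.end_ap != ap.name]
        self.active += [i for i in self.invariants if i.start_ap == ap.name]
        return ok


def demo() -> List[str]:
    inv = Invariant(lambda s: s["stock"] >= 0, "AP1", "AP2")
    p = Process({"stock": 5, "ordered": 0}, [inv])
    p.reach(AssurancePoint("AP1", pre=lambda s: s["stock"] > 0))
    p.update("stock", -1)   # violates the invariant while it is monitored
    p.update("stock", 2)
    p.update("ordered", 3)
    p.reach(AssurancePoint("AP2", post=lambda s: s["ordered"] > 0))
    p.update("stock", -1)   # no longer monitored after the ending AP
    return p.violations
```

In this sketch the pre- and post-conditions are tested only when the process reaches an AP, whereas the invariant is re-tested on every data change between its starting and ending APs, which is the sense in which invariants are the stronger form of monitoring.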
Application exception rules extend integration rules with a case-based structure that is used to respond variably to events and exceptions that interrupt the execution of a process, allowing a process to determine recovery actions depending on the state of the process execution. A unique aspect of APs is that they provide intermediate rollback points when failures occur, thus allowing a process to be compensated to a specific AP for the purpose of rechecking pre-conditions before retry attempts. APs also support a dynamic backward recovery process, known as cascaded contingency, for hierarchically nested processes in an attempt to recover to a previous AP that can be used to invoke contingent procedures or alternate execution paths for failure of a nested process. As a result, the assurance point approach provides flexibility with respect to the combined use of backward and forward recovery options. Figure 1 illustrates the use of APs, integration rules, invariant rules, and application exception rules for process P1.
This research also involved the investigation of decentralized data dependency analysis in support of process recovery procedures. In processes composed of Web Services, interleaved access to data between service executions of concurrent processes can potentially cause data inconsistency problems. If a process fails, data items modified by the recovery of the failed process may affect other processes that are concurrently executing and have accessed the same data items. The results of this research present a decentralized approach to analyzing data dependencies among concurrently executing processes in a service-oriented environment. The decentralized approach is an extension of past research with Delta-Enabled Grid Services (DEGS), which provides a technique for analyzing data changes captured from service execution to determine process dependencies.
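The idea of deriving process dependencies from captured data changes can be sketched as follows. This is an illustrative simplification rather than the DEGS algorithm itself: the `Delta` record, the `dependents_of` function, and the assumption of a single, totally ordered log of accesses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Delta:
    """Hypothetical record of one data access captured from a service execution."""
    seq: int       # position in an assumed total order of accesses
    process: str   # process that performed the access
    item: str      # data item that was accessed
    op: str        # "r" for read, "w" for write


def dependents_of(failed: str, log: List[Delta]) -> Dict[str, Set[str]]:
    """Map each dependent process to the items it accessed after `failed` wrote them."""
    written: Set[str] = set()
    result: Dict[str, Set[str]] = {}
    for d in sorted(log, key=lambda d: d.seq):
        if d.process == failed:
            if d.op == "w":
                written.add(d.item)
        elif d.item in written:
            # Another process read or wrote an item the failed process had changed.
            result.setdefault(d.process, set()).add(d.item)
    return result


def demo() -> Dict[str, Set[str]]:
    log = [
        Delta(1, "P1", "x", "w"),
        Delta(2, "P2", "x", "r"),
        Delta(3, "P3", "y", "w"),  # before P1 writes y, so no dependency
        Delta(4, "P1", "y", "w"),
        Delta(5, "P3", "z", "r"),
        Delta(6, "P2", "y", "w"),
    ]
    return dependents_of("P1", log)
```

Here `demo()` reports that only P2 depends on P1, through items x and y. If P1 fails and its changes are compensated, P2 is the process whose semantic conditions would need to be rechecked.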
Process Execution Agents (PEXAs) have been defined that control the execution of processes and maintain local information about data changes. Process execution histories are then enhanced with control information that allows the construction of data dependency graphs to be distributed among multiple PEXAs by sharing data dependency information. Research results include the full integration of the decentralized data dependency analysis algorithm with the AP service composition and recovery model, demonstrating and evaluating how application exception rules can be used to communicate between PEXAs about data dependencies and to propagate recovery actions during rollback, retry, or cascaded contingency. Figure 1 illustrates communication among three PEXAs, with P1 executing at PEXA1 and all three PEXAs communicating about data dependencies for coordinating recovery actions when failures occur.
Petri Nets have been used to define the semantics of the assurance point approach to service composition and recovery with integration rules. The Petri Net formalization includes the semantics of AP recovery procedures in the context of flow groups that support parallel execution within a process and also defines recovery semantics in the context of if-then-else and looping control structures. A YAWL (Yet Another Workflow Language) specification of the AP Model was also developed, providing a way to demonstrate the soundness properties of the AP Model.
The results of this research provide a dynamic and intelligent approach to monitoring failures, detecting dependencies, and responding to failures and exceptional conditions in a manner that guarantees some degree of correctness for process execution in a service-oriented environment. This research has built on foundational concepts related to data dependency and recoverability of transactions to demonstrate a new, decentralized way of thinking about the problem in the context of service executions.
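The peer-to-peer exchange among PEXAs described in this report can be pictured with a small in-memory sketch. It assumes, hypothetically, that each agent keeps only its own access history, that timestamps are comparable across agents, and that peers can be queried by direct method call instead of over a network. The `Pexa` class and its methods are illustrative names, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Access = Tuple[int, str, str, str]  # (timestamp, process, item, "r" or "w")


@dataclass
class Pexa:
    """Hypothetical process execution agent holding only local history."""
    name: str
    history: List[Access] = field(default_factory=list)
    peers: List["Pexa"] = field(default_factory=list)

    def record(self, ts: int, process: str, item: str, op: str) -> None:
        self.history.append((ts, process, item, op))

    def writes_of(self, process: str) -> Dict[str, int]:
        """Earliest local write timestamp per item for one process."""
        writes: Dict[str, int] = {}
        for ts, proc, item, op in self.history:
            if proc == process and op == "w":
                writes[item] = min(ts, writes.get(item, ts))
        return writes

    def query(self, writes: Dict[str, int], failed: str) -> Dict[str, Set[str]]:
        """Answer a peer: which local processes accessed these items after the writes?"""
        hits: Dict[str, Set[str]] = {}
        for ts, proc, item, _op in self.history:
            if proc != failed and item in writes and ts > writes[item]:
                hits.setdefault(proc, set()).add(item)
        return hits

    def recover(self, failed: str) -> Dict[str, Set[str]]:
        """Collect dependents of a failed local process from this agent and its peers."""
        writes = self.writes_of(failed)
        dependents: Dict[str, Set[str]] = {}
        for agent in [self, *self.peers]:
            for proc, items in agent.query(writes, failed).items():
                dependents.setdefault(proc, set()).update(items)
        return dependents


def demo() -> Dict[str, Set[str]]:
    pexa1, pexa2, pexa3 = Pexa("PEXA1"), Pexa("PEXA2"), Pexa("PEXA3")
    pexa1.peers = [pexa2, pexa3]
    pexa1.record(1, "P1", "x", "w")
    pexa2.record(2, "P2", "x", "r")
    pexa3.record(3, "P3", "y", "w")  # precedes P1's write to y
    pexa1.record(4, "P1", "y", "w")
    pexa3.record(5, "P3", "z", "r")
    return pexa1.recover("P1")
```

No agent holds a global log in this sketch: PEXA1 sends only the failed process's write set, and each peer answers from its own history. A real deployment would also need to propagate recovery actions to the dependents that are found, which is omitted here.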

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0820152
Program Officer: Sol J. Greenspan
Budget Start: 2008-06-15
Budget End: 2012-05-31
Fiscal Year: 2008
Total Cost: $354,581
Name: Texas Tech University
City: Lubbock
State: TX
Country: United States
Zip Code: 79409