Self-Managing Resource Allocation in Unsupervised Distributed Systems

Recent years have seen growing deployment of distributed computing infrastructures, such as Grids, PlanetLab, @home, and peer-to-peer systems, that run a variety of Web, commercial, and scientific applications. Many of these infrastructures are unsupervised: they consist of a large number of loosely connected nodes that contribute computational and storage resources but are not centrally managed. Such unsupervised infrastructures are characterized by uncertainty in resource availability caused by failures, varying load conditions, and node churn, which places an undue burden on application writers and system administrators who must deploy and run applications successfully. This project is developing a self-managing resource allocation framework that hides infrastructure uncertainties and dynamics from applications while transparently adapting to changing conditions within the infrastructure. As part of this framework, the project is developing techniques for: (i) predictable resource aggregation, to provide resource guarantees to applications in the presence of dynamic loads and changing resource availability; (ii) reliability-aware resource management, to provide desired levels of reliability and availability; and (iii) system inference and prediction, to enable decentralized inference of global system conditions for proactive response to dynamic infrastructure changes. These techniques rely on cooperation and redundancy among nodes in the infrastructure to provide scalability and decentralization. The proposed research will have significant impact on distributed computing by enabling effective deployment of large-scale scientific and commercial applications on resource-rich but unreliable infrastructures.

Project Report

Processors, the computational heart of computer systems, have been changing rapidly over the past five years. With single-threaded performance increasing much more slowly than it did in the 1990s and early 2000s, manufacturers have been increasing core counts. But making use of these increasing core counts has been an engineering struggle. CNS-0644205 is a CAREER grant intended to address software support for increasing core counts by developing the technology of memory transactions.

Memory transactions allow the programmer to specify that certain data structure updates must complete as a unit, without partial results visible to other computational threads in the system. For example, a memory transaction would allow a programmer to add an item to a queue, which is an operation that requires several memory updates. Other threads would be able to observe only the legal before state and the legal after state, i.e., the state without and then with the new item. Threads could not observe the queue state as it is being updated. Transactions in database systems have been a historically successful programming abstraction, so memory transactions should help make multicore programming easier. Transactions should improve performance, reduce implementation complexity, and, most importantly, reduce the conceptual complexity of implementing system services.

The project accomplished its goals via the following major findings. Our work on TxLinux establishes how a modern operating system (OS) could use memory transactions, and how it must support memory transactions with modified scheduling policies. Our work on dependence-aware transactions demonstrates a provably safe optimization that allows more memory transactions to safely commit. TxOS is a version of Linux that provides transactions as part of its system call interface. It shows how system transactions provide a safe and efficient concurrency API and how they can be provided at modest performance cost. We further demonstrate the utility of TxOS by adapting multiple server applications to use it, for example, an IMAP mail server and a Byzantine fault tolerant library.
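
The queue example described in this report can be illustrated with a minimal Python sketch. Python has no memory transactions, so the sketch substitutes a single global lock for the transaction; a real transactional memory system would instead detect conflicts and retry. The names `atomic`, `Queue`, and `run_demo` are hypothetical and are not part of TxLinux or TxOS. The sketch only shows the visibility guarantee: an observer thread sees the queue either without or with the new item, never partway through an update.

```python
import threading
from contextlib import contextmanager

# Assumption: one global re-entrant lock stands in for a memory transaction.
# Real transactional memory would track conflicts and retry, not serialize.
_txn_lock = threading.RLock()


@contextmanager
def atomic():
    """Run the enclosed block as one indivisible unit (lock-based emulation)."""
    with _txn_lock:
        yield


class Queue:
    """Queue whose enqueue needs two memory updates: the item list and a count."""

    def __init__(self):
        self.items = []
        self.count = 0

    def enqueue(self, item):
        with atomic():
            self.items.append(item)  # first memory update
            self.count += 1          # second memory update

    def snapshot(self):
        # Readers also run inside the emulated transaction, so they can see
        # only a legal before state or a legal after state.
        with atomic():
            return len(self.items), self.count


def run_demo(n_items=1000):
    """Enqueue n_items while an observer thread checks queue consistency."""
    q = Queue()
    inconsistent = []
    done = threading.Event()

    def observer():
        while not done.is_set():
            length, count = q.snapshot()
            if length != count:
                inconsistent.append((length, count))

    t = threading.Thread(target=observer)
    t.start()
    for i in range(n_items):
        q.enqueue(i)
    done.set()
    t.join()
    return q.count, inconsistent
```

Calling `run_demo()` returns the final item count together with the list of inconsistent observations, which stays empty because every read and every update runs inside `atomic()`. Removing the `with atomic():` line from `enqueue` would allow the observer to catch the list and the count out of step, which is the partial state that memory transactions are meant to hide.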

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Application #
0644205
Program Officer
D. Helen Gill
Project Start
Project End
Budget Start
2007-01-01
Budget End
2012-12-31
Support Year
Fiscal Year
2006
Total Cost
$400,000
Indirect Cost
Name
University of Texas Austin
Department
Type
DUNS #
City
Austin
State
TX
Country
United States
Zip Code
78712