CAREER: Mechanisms for Resource Sharing in Collaborative High-End Computing Platforms

Fisher, Nathan; Shi, Weisong

Abstract

The investigator conjectures that the fundamental problem in resource sharing is providing a dependable mechanism for resource trading. Given such a trading mechanism, several high-level resource management services can readily be built, such as service level agreements, optimal resource co-allocation, advance reservation, and dynamic adaptation and reconfiguration. This research consists of four components: (1) an adaptive, personalized trust model named aPET, to be employed by individual peers for deriving trustworthiness; (2) a trust-based economic model, M-CUBE, that expresses resources efficiently and thereby provides a foundation for resource allocation and management; (3) efficient resource allocation schemes across multiple sites, together with an investigation of how the trustworthiness provided by aPET affects resource allocation; and (4) application of the proposed trust and economic models to provide efficient system management. The research is expected to serve as a fundamental component and has the potential to be deployed and adopted by several high-end computing communities that require collaboration among geographically distributed resources.

Project Report

Federated sharing of geographically dispersed pools of high-end computing resources under coordinated control is widely recognized as a promising paradigm for building and executing next-generation distributed high-performance applications. However, the autonomous, heterogeneous, and decentralized nature of participating peers across multiple administrative domains introduces two challenges. The first is effective decentralized resource management, which includes giving peers incentives to provide good service, adapting resource sharing to different situations, and allocating resources optimally. The second is efficient system management, which includes self-defensive system management and user-friendly management interfaces. Efficient system management makes the system easy to maintain and use and reduces management cost, which is extremely important for large-scale systems.

Intellectual Merit: In this project, we first proposed a solution that consists of two models: M-CUBE, a Multiple CUrrency Based Economic model, which serves as the decentralized trading scheme, and aPET, an adaptive PErsonalized Trust model, which supplies the trustworthiness of peers in support of M-CUBE. The M-CUBE model provides a general and flexible substrate that supports most high-level resource management services, such as resource co-allocation, quality of service (QoS) control, advance reservation, and scheduling algorithms. aPET builds on our previous work on PET, which derives trustworthiness from a reputation evaluation and a risk evaluation. Reputation is the cumulative assessment of a peer's long-term behavior, while risk is the opinion of its short-term behavior. Two kinds of knowledge are used to derive the reputation: interaction-derived experience (local knowledge) and recommendations (the knowledge of other peers).
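As a rough illustration of the trust derivation described above, the following Python sketch combines local experience and recommendations into a reputation score, then combines reputation with a short-term risk estimate into a trustworthiness value. The report does not give the actual PET or aPET formulas, so the linear weighting, the [0, 1] score range, the default weights, and all names (`PeerRecord`, `reputation`, `trustworthiness`, `w_local`, `w_reputation`) are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class PeerRecord:
    """Hypothetical per-peer knowledge; every score is assumed to lie in [0, 1]."""
    local_experience: float  # derived from our own interactions with the peer
    recommendation: float    # aggregated opinion reported by other peers
    risk: float              # short-term risk estimate; higher means riskier


def reputation(rec: PeerRecord, w_local: float = 0.8) -> float:
    """Assumed linear blend of local experience and recommendations."""
    if not 0.0 <= w_local <= 1.0:
        raise ValueError("w_local must be in [0, 1]")
    return w_local * rec.local_experience + (1.0 - w_local) * rec.recommendation


def trustworthiness(rec: PeerRecord, w_reputation: float = 0.5,
                    w_local: float = 0.8) -> float:
    """Assumed linear blend of long-term reputation and (1 - short-term risk)."""
    if not 0.0 <= w_reputation <= 1.0:
        raise ValueError("w_reputation must be in [0, 1]")
    rep = reputation(rec, w_local)
    return w_reputation * rep + (1.0 - w_reputation) * (1.0 - rec.risk)


if __name__ == "__main__":
    peer = PeerRecord(local_experience=1.0, recommendation=0.5, risk=0.0)
    print(round(trustworthiness(peer), 3))
```

In this sketch, a larger `w_local` relies more on first-hand experience, while a smaller value leans on what other peers report; how such weights should be set in practice is not specified here.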
Selecting the weights of these two parts is environment-specific and reflects a trade-off between reliability and efficiency. With the help of trust management and the merits of an economic institution, M-CUBE provides a novel self-policing and quality-aware framework for sharing heterogeneous resources, and is a flexible, universal infrastructure for building high-level resource management services. M-CUBE is built on a currency-based mechanism; its distinguishing feature is that each peer has its own currency. We then applied the trust model to resource scheduling in cloud computing through reputation-based scheduling. We proposed OPERA, an open reputation model with two important novelties: a vector representation of reputation and a just-in-time feature that reflects the real-time system status. To demonstrate the effectiveness of OPERA, we integrated it into the Hadoop scheduler. The experimental results showed that OPERA enables the scheduler to select appropriate nodes, which reduced the number of re-executed tasks by up to 59% and the execution time of Hadoop’s jobs by up to 32% in the presence of failures and heavy workload. This improvement, in turn, can improve the energy efficiency of the whole system and of the network by 16.17% and 53.32%, respectively, for the sort application. Finally, we introduced RESCUE, an energy-aware scheduler for heterogeneous cloud environments. RESCUE ranks the nodes in the cloud by their Application Specific Energy Efficiency (ASEE), which captures the correlation between the various hardware and software and the energy efficiency. Using ASEE, RESCUE can assign the whole workload to the most energy-efficient machine while maintaining the same performance.
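The ranking step described above can be sketched in a few lines of Python. The report does not define how ASEE is computed, so this example assumes a simple stand-in (work completed per joule for a given application on a given node), and the names `asee`, `rank_nodes`, and `plan_placement` are hypothetical. The sketch also ignores the performance constraint and only shows the ordering and the "sleep the rest" policy.

```python
from typing import Dict, List, Tuple


def asee(work_units: float, energy_joules: float) -> float:
    """Assumed stand-in for ASEE: application work completed per joule."""
    if energy_joules <= 0:
        raise ValueError("energy_joules must be positive")
    return work_units / energy_joules


def rank_nodes(profiles: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order node names from most to least energy efficient.

    `profiles` maps a node name to (work_units, energy_joules) measured
    for one application on that node.
    """
    return sorted(profiles, key=lambda node: asee(*profiles[node]), reverse=True)


def plan_placement(profiles: Dict[str, Tuple[float, float]]) -> Dict[str, object]:
    """Run the workload on the top-ranked node and mark the others for sleep."""
    ranked = rank_nodes(profiles)
    return {"run_on": ranked[0], "sleep": ranked[1:]}


if __name__ == "__main__":
    measured = {"a": (100.0, 50.0), "b": (100.0, 20.0), "c": (100.0, 40.0)}
    print(plan_placement(measured))
```

Because the efficiency is measured per application, the same set of nodes could rank differently for different workloads, which is the point of making the metric application-specific.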
We evaluated RESCUE on a private cloud and found that, with an aggressive control policy that puts idle machines into ‘sleep mode’, RESCUE-with-sleep can further reduce the total energy consumption by 51.5% for BS Seeker, 48.8% for Matrix Stressmark, and 21.3% for TPC-W.

Broader Impacts: The research outcomes of this project have been published in top journals, including IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Services Computing, IEEE Transactions on Dependable and Secure Computing, IEEE Internet Computing, Journal of Parallel and Distributed Computing, and Sustainable Computing, and at premier conferences such as IEEE IPDPS. Two software packages, RatSim and CloudAligner, were developed and released on the project web site. CloudAligner, a MapReduce-based tool for sequence mapping released in 2010, has been downloaded more than 1200 times from around the world, including China, India, Spain, Korea, France, Australia, Brazil, Estonia, and the United States. The downloads have been used not only for research and industry purposes but also in academic courses. Three Ph.D. students and four master's students involved in this project graduated during the project period. Five undergraduate students, including three African American students and one female student, were involved in the research activities of this project with support from the NSF Research Experiences for Undergraduates (REU) supplemental program and a Wayne State undergraduate research grant.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0643521
Program Officer: Almadena Y. Chtchelkanova

Project Start
Project End
Budget Start: 2007-04-15
Budget End: 2014-03-31
Support Year
Fiscal Year: 2006
Total Cost: $465,770
Indirect Cost

Institution

Wayne State University, Detroit, MI, United States
