On-demand, service-oriented cloud computing infrastructures continue to increase in popularity with organizations. Three observations motivate us to investigate running high-throughput, data-intensive tasks as background workloads on these cloud infrastructures. First, the rapid growth in hardware parallelism leaves more residue resources to be exploited. Second, the "incremental power usage" of piggybacking a secondary background workload onto the foreground workload to utilize those residue resources is relatively low. Third, the advances in GPGPU (General-Purpose GPU) processing enable a novel coupling of concurrent workloads.

This project will explore a new computing model of offering cloud services on active nodes that are serving on-demand utility computing users. We plan to (1) assess the efficacy of resource sharing between foreground and background workloads and investigate the relationship between their resource usage patterns and the benefit and cost of their mixed execution; (2) develop scheduling and load management middleware that performs dynamic background workload distribution considering the energy-performance tradeoff; and (3) exploit the use of GPGPUs for cloud services on active nodes that are running foreground workloads mainly on the CPUs.

Our research will explore a revolutionary change in the use of cloud computing and may influence cloud hosting organizations' future resource configuration and planning, leading to greener clouds. The research will be closely integrated with education-oriented cloud platforms at NCSU. The PIs will also leverage their established services and connections to increase the participation of women and minority students and to promote students' interactions with industry partners.

Project Report

Project outcomes:

1) Intellectual merit: This project explored a new operational model for cloud computing in which background batch jobs are scheduled to utilize the residue resources left by foreground interactive services. We designed and implemented new application behavioral learning and resource management techniques to achieve high performance assurance and low energy consumption in cloud computing infrastructures. Specifically, we developed pattern-driven application consolidation and prediction-driven elastic resource scaling to achieve efficient resource sharing in multi-tenant cloud systems. Our experiments show that our techniques can reduce the service level objective (SLO) violation rate by orders of magnitude and save 8-10% in energy cost. This research has generated more than 20 publications in conferences and journals. Several news outlets (e.g., The Register) also reported on our research.

2) Broader impact: We collaborated with various industry partners, such as Google, during the project. We tested our systems not only with in-house experiments but also with real production system trace data collected on NCSU's Virtual Computing Lab and cluster trace data provided by Google. Our research provides useful insights for cloud service providers seeking to improve their infrastructure efficiency. The project supported 11 PhD students, including two female PhD students, and two undergraduate students. The project enabled students to acquire the skills needed to obtain permanent positions or internships at leading industry labs and companies (e.g., IBM T.J. Watson Research, NEC Labs, Amazon, Oak Ridge National Lab).
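To make the idea of scheduling background jobs onto residue resources more concrete, the following is a minimal illustrative sketch, not the project's actual implementation. It assumes a single abstract resource (for example, CPU cores), predicts foreground demand as the peak of a short sliding window plus a safety padding, and greedily admits background jobs into whatever capacity remains. The names `ResidualCapacityEstimator` and `admit_background_jobs`, the window size, and the padding factor are all hypothetical choices made for illustration.

```python
from collections import deque


class ResidualCapacityEstimator:
    """Hypothetical helper: predicts foreground demand and reports what is left over."""

    def __init__(self, capacity, window=5, padding=0.1):
        self.capacity = capacity             # total resource units on the node
        self.padding = padding               # safety margin added to the prediction
        self.samples = deque(maxlen=window)  # most recent foreground usage samples

    def observe(self, foreground_usage):
        """Record one foreground usage measurement."""
        self.samples.append(foreground_usage)

    def predict_foreground(self):
        """Predict next-interval foreground demand as padded recent peak (an assumption)."""
        if not self.samples:
            return self.capacity  # no data yet: conservatively assume the node is busy
        peak = max(self.samples)
        return min(self.capacity, peak * (1 + self.padding))

    def residual(self):
        """Capacity expected to be left for background work."""
        return max(0.0, self.capacity - self.predict_foreground())


def admit_background_jobs(estimator, job_demands):
    """Greedily admit background jobs, in order, that fit in the residual capacity."""
    budget = estimator.residual()
    admitted = []
    for demand in job_demands:
        if demand <= budget:
            admitted.append(demand)
            budget -= demand
    return admitted


if __name__ == "__main__":
    est = ResidualCapacityEstimator(capacity=16, window=3, padding=0.25)
    for usage in (4, 8, 6):
        est.observe(usage)
    print(est.residual(), admit_background_jobs(est, [4, 3, 2]))
```

A larger padding protects the foreground service at the cost of leaving more capacity idle; a real system would tune this tradeoff against measured SLO violations and energy use.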

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0915861
Program Officer: M. Mimi McClure
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-12-31
Support Year:
Fiscal Year: 2009
Total Cost: $320,000
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695