Due to the increasing demands for computing and storage, energy consumption, heat generation, and cooling requirements have become critical concerns, both because of growing costs and because of their environmental and societal impacts. Thus, thermal awareness, that is, knowledge of the local unevenness in heat generation and extraction rates and hence of the heat imbalance at various points inside a datacenter, is essential to maximize energy and cooling efficiency and to minimize server failure rates. The objectives of this research are to acquire knowledge about the heat imbalance in different regions inside a datacenter and to enable thermal-aware self-configuration and self-optimization of its computing resources. These objectives aim to increase energy and cooling efficiency and to decrease equipment failure rates, so as to minimize both the environmental impact and the Total Cost of Ownership (TCO) of datacenters. Specifically, the project focuses on designing autonomic adaptive sampling solutions that enable heterogeneous sensors (thermal cameras, scalar temperature and humidity sensors, and airflow meters) to self-organize into a multi-tier sensing infrastructure. It also studies proactive, Quality of Service (QoS)-aware, heat-imbalance-based solutions for Virtual Machine (VM) consolidation and cooling system optimization in a virtualized air-cooled datacenter. The project will also train computer-literate undergraduate and graduate researchers with a comprehensive knowledge of complex optimization problems in the energy-efficient design and management of large datacenters. The PI will create new teaching modules on distributed sensing, provide opportunities for student exchange programs, leverage existing minority student outreach networks at Rutgers, and incorporate team-teaching approaches.

Project Report

Datacenters are a growing component of society’s information technology (IT) infrastructure, enabling services related to health, banking, commerce, defense, education, and entertainment. Due to the increasing demands for computing and storage, energy consumption, heat generation, and cooling requirements have become critical concerns in datacenters. This is both because of growing operating costs (power and cooling) and because of their environmental and societal impacts. Many current datacenters do not follow a sustainable model of energy consumption growth, because the rate at which computing resources are added exceeds the available and planned power capacities. We have inferred that one of the fundamental problems in existing datacenters is the local unevenness in heat generation and heat extraction rates, which may lead to CPU temperature increases, thermal hotspots, and thermal runaway. The former can be attributed to the non-uniform distribution of workloads among servers and to the heterogeneity of computing hardware. The latter can be attributed to non-ideal air circulation, which depends on the layout of server racks inside the datacenter and on the placement of computer room air conditioning (CRAC) unit fans and air vents. When the heat generation and extraction rates in a region differ, the difference, which we call heat imbalance, accumulates over time and changes the local temperature. A large negative heat imbalance in a particular region results in energy-inefficient overcooling, i.e., a significant decrease in temperature. Conversely, a large positive heat imbalance in a particular region leads to a significant temperature rise. This may increase the risk of equipment overheating and hence the chance of server system failures due to operation in an unsafe temperature range.
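The link between heat imbalance and temperature described above can be illustrated with a minimal sketch. It treats one region of the datacenter as a single lumped thermal mass of heat capacity C (J/K), so that its temperature changes at the rate (P_gen - P_ext)/C. The lumped model, the function `simulate_region`, and all numbers are illustrative assumptions, not the project's thermal model.

```python
from typing import List


def simulate_region(
    t0_c: float,
    heat_gen_w: List[float],
    heat_ext_w: List[float],
    capacity_j_per_k: float,
    dt_s: float,
) -> List[float]:
    """Temperature trace of one region under a hypothetical lumped model.

    Assumption for illustration: the region is a single thermal mass of heat
    capacity C, so dT/dt = (P_gen - P_ext) / C, integrated with forward Euler
    steps of length dt_s. P_gen - P_ext is the heat imbalance.
    """
    if len(heat_gen_w) != len(heat_ext_w):
        raise ValueError("generation and extraction traces must have equal length")
    temps = [t0_c]
    for p_gen_w, p_ext_w in zip(heat_gen_w, heat_ext_w):
        imbalance_w = p_gen_w - p_ext_w  # > 0: heat accumulates; < 0: overcooling
        temps.append(temps[-1] + imbalance_w * dt_s / capacity_j_per_k)
    return temps


# 10 kW generated, 8 kW extracted: a +2 kW imbalance warms the region by 1 K per 100 s step.
hot = simulate_region(25.0, [10_000.0] * 10, [8_000.0] * 10, 200_000.0, 100.0)
print(hot[-1])   # 35.0
# 10 kW generated, 12 kW extracted: a -2 kW imbalance overcools the region.
cold = simulate_region(25.0, [10_000.0] * 10, [12_000.0] * 10, 200_000.0, 100.0)
print(cold[-1])  # 15.0
```

With C = 200 kJ/K, a 2 kW imbalance sustained for about 17 minutes shifts the temperature by 10 K. The sign and magnitude of the imbalance therefore indicate where a region's temperature is heading before the shift is measured.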
Thus, thermal awareness, that is, knowledge of the heat imbalance in different regions inside a datacenter, is essential to maximize energy and cooling efficiency and to minimize server system failure rates. Autonomic datacenter management, which includes thermal- and energy-aware resource provisioning, cooling system optimization, and anomaly detection, can help minimize both the environmental impact and the total cost of ownership (TCO) of datacenters, making them energy-efficient and green. Autonomic datacenter management solutions require continuous processing and analysis of real-time feedback. Modern blade servers are equipped with a number of internal sensors that provide information about the operating temperatures and utilization of server subsystems. Specifically, the project focuses on designing autonomic adaptive sampling solutions that enable heterogeneous sensors (thermal cameras, scalar temperature and humidity sensors, and airflow meters) to self-organize into a multi-tier sensing infrastructure. It also studies proactive, QoS-aware, heat-imbalance-based solutions for Virtual Machine (VM) consolidation and cooling system optimization in a virtualized air-cooled datacenter. The resulting intelligent system leverages measurements collected and shared across the three federated CAC sites. It is able to i) collect heterogeneous measurement data, ii) process the data to generate information (e.g., the heat produced by CPUs and conducted/radiated in the datacenter, as well as the location of hotspots), and iii) acquire knowledge (e.g., the estimated heat to be extracted by the Air Conditioning (AC) system). This knowledge can be used to design closed-loop controllers that optimize i) the AC compressor duty cycle, which controls the temperature of the cold air, and ii) the fan speeds, which control the air circulation.
These controllers will use heat-imbalance estimations as feedback input rather than temperature values, which would make the AC system ‘reactive’. Heat-imbalance estimations allow the controllers to ‘predict’ temperature increases and make the system more energy efficient.
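The difference between the two feedback strategies can be illustrated with a minimal sketch on a lumped single-region thermal model. The model and all names and numbers below are assumptions. The `reactive` controller sets the extracted heat from the temperature error alone. The `imbalance_based` controller first cancels an estimate of the heat being generated, so that the predicted heat imbalance is zero before any temperature rise occurs, and keeps only a small temperature correction. For simplicity, the sketch assumes the heat-generation estimate is exact and ignores compressor and fan actuation details. It is not the project's controller design.

```python
from typing import Callable, List

T_SET_C = 25.0        # hypothetical temperature setpoint (°C)
KP_W_PER_K = 2_000.0  # hypothetical proportional gain (W of extraction per K of error)
BASE_EXT_W = 8_000.0  # hypothetical baseline extraction of the reactive controller (W)


def run_controller(
    heat_gen_w: List[float],
    controller: Callable[[float, float], float],
    t0_c: float = T_SET_C,
    capacity_j_per_k: float = 200_000.0,
    dt_s: float = 10.0,
) -> List[float]:
    """Simulate one lumped region whose heat extraction is set by `controller`."""
    temps = [t0_c]
    for p_gen_w in heat_gen_w:
        t_c = temps[-1]
        p_ext_w = controller(t_c, p_gen_w)  # commanded heat extraction (W)
        imbalance_w = p_gen_w - p_ext_w     # heat imbalance seen by the region
        temps.append(t_c + imbalance_w * dt_s / capacity_j_per_k)
    return temps


def reactive(temp_c: float, _p_gen_w: float) -> float:
    """Temperature-only feedback: reacts only after the region has heated up."""
    return max(0.0, BASE_EXT_W + KP_W_PER_K * (temp_c - T_SET_C))


def imbalance_based(temp_c: float, p_gen_w: float) -> float:
    """Cancel the estimated heat generation (zero predicted imbalance),
    plus a small temperature correction. Assumes p_gen_w is estimated exactly."""
    return max(0.0, p_gen_w + KP_W_PER_K * (temp_c - T_SET_C))


# A 4 kW load step after 20 control intervals.
gen = [8_000.0] * 20 + [12_000.0] * 200
print(max(run_controller(gen, reactive)))         # settles just below 27 °C
print(max(run_controller(gen, imbalance_based)))  # 25.0
```

Under this load step, the reactive controller lets the region settle about 2 °C above the setpoint, because a proportional-only controller leaves a steady-state error. The imbalance-based controller keeps the region at the setpoint, because it raises extraction at the moment generation rises.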

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1117263
Program Officer: M. Mimi McClure
Budget Start: 2011-08-15
Budget End: 2013-07-31
Fiscal Year: 2011
Total Cost: $120,000
Name: Rutgers University
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901