Today, a growing number of high-performance embedded applications, such as avionics and flight control, space shuttle systems, vehicles, and instrumentation in medical and emergency facilities, demand greatly increased computation capabilities from processors. Meanwhile, semiconductor manufacturing technologies keep scaling processors to smaller feature sizes. As a result, power density in processors is becoming increasingly high. Because of this high power density, processors are prone to overheating, which affects not only the reliability but also the performance, power, and cost of embedded systems. Thermal management has therefore become a prominent issue in system design. At the same time, high-performance embedded applications demand increasingly stringent timing guarantees. As high-performance embedded systems become more and more thermally constrained, the issue of how to provide timing guarantees under the constraints of thermal behavior and thermal control mechanisms must be addressed.

The objective of this NSF CAREER research project is to provide timing-guaranteed services for real-time applications while maintaining safe temperature levels for processors in high-performance embedded systems. The project focuses on the development of timing-aware dynamic thermal management methodology, algorithm design and analysis, and system implementation in a variety of problem domains. The project seeks to significantly advance real-time system design by furthering the understanding of the fundamental thermally-constrained and timing-constrained problems in high-performance embedded systems. Furthermore, this research provides an important foundation for addressing external environmental effects (such as thermal environments) on real-time systems. The impacts of this project also extend to academia through graduate and undergraduate research, curriculum development, industrial collaboration, and community outreach.

Project Report

The objective of this research project is to provide timing-guaranteed services for real-time applications while maintaining safe temperature levels for processors in high-performance embedded systems. The project focuses on the development of timing-aware dynamic thermal management methodology, algorithm design and analysis, and system implementation in a variety of problem domains. This research has furthered the current understanding of the fundamental thermally-constrained and timing-constrained problems in computer systems. It has produced the following major outcomes and findings:

1. We have developed both reactive and proactive timing-aware dynamic thermal management schemes. They extend traditional dynamic thermal management to handle both timing and thermal constraints. The reactive scheme is simple to implement, while the proactive scheme can achieve higher resource utilization.

2. An extended study on multi-core systems has helped us better understand how heat transfer among cores affects the design of timing-aware dynamic thermal management schemes. Compared with single-core systems, multi-core systems present more challenges in handling both timing and thermal constraints. Compared with a naive load-balancing scheme, our proposed scheme significantly reduces the peak temperature of multi-core processors.

3. We have gained a better understanding of how energy-efficient design can be achieved under both thermal and timing constraints, and of how to reduce the effect of leakage power in energy-aware and thermal-aware timing-sensitive systems.

4. We investigated the relationship between temperature and reliability in multi-core systems. We found that minimizing peak temperature in multi-core systems does not necessarily optimize system lifetime.

5. We have studied the fundamental trade-off between two major power management schemes: Dynamic Voltage/Frequency Scaling (DVFS) and Dynamic Power Management (DPM) with clock gating. DPM is efficient in reducing leakage power, while DVFS introduces less overhead. We have obtained the best system configuration using a combination of both schemes.

6. We have conducted a thorough study on providing soft timing guarantees for stochastic arrival patterns under a thermal constraint. We developed power-saving schemes for different service level agreements (SLAs) on different system platforms. The SLA can be either the mean response time or a percentile of the response time, and the platform can be either single-tier or multi-tier.

This project has provided the PI with opportunities to supervise a research scientist, two graduate students, and an undergraduate student. None of them had worked on thermal-aware design before joining this project. Through the PI's direct mentoring, they were able to delve into the key issues, identify research topics, and investigate potential solutions. The project was also introduced into both undergraduate and graduate courses. Through lectures, students were exposed to pioneering work in thermal-aware system design, and some of them showed a strong interest in this area. This research also serves as an important stepping stone for addressing environmental effects on computer system design (such as the thermal effect and the energy effect in our research). Since both overheating and energy consumption are prominent environmental concerns, our research findings have great potential to benefit society at large.
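To make the reactive scheme mentioned in finding 1 more concrete, the following is a minimal illustrative sketch, not the project's actual algorithm. It assumes a single-node lumped RC thermal model and a two-speed processor with hysteresis thresholds. The function name `simulate_reactive_dtm` and every parameter value (thermal resistance, capacitance, power levels, trip and resume temperatures, the 0.5 throttled speed factor) are hypothetical and chosen only for illustration.

```python
def simulate_reactive_dtm(steps, dt=0.01, t_amb=45.0, r=0.8, c=0.5,
                          p_high=60.0, p_low=20.0,
                          t_trip=85.0, t_resume=80.0):
    """Simulate a reactive thermal controller on a lumped RC thermal model.

    All values are illustrative assumptions: temperatures in deg C,
    r in deg C per watt, c in joules per deg C, power in watts, dt in seconds.
    """
    temp = t_amb          # start at ambient temperature
    throttled = False     # start at full speed
    temps, speeds = [], []
    for _ in range(steps):
        # Reactive rule with hysteresis: throttle once the trip point is
        # reached, and return to full speed only after cooling to t_resume.
        if temp >= t_trip:
            throttled = True
        elif temp <= t_resume:
            throttled = False
        power = p_low if throttled else p_high
        # Forward-Euler step of C * dT/dt = P - (T - T_amb) / R.
        temp += dt * (power - (temp - t_amb) / r) / c
        temps.append(temp)
        # Assumed speed factor: throttled mode runs at half speed.
        speeds.append(0.5 if throttled else 1.0)
    return temps, speeds


if __name__ == "__main__":
    temps, speeds = simulate_reactive_dtm(5000)
    print("peak temperature:", max(temps))
    print("average speed factor:", sum(speeds) / len(speeds))
```

In this sketch, the long-run average speed factor is what a timing analysis would need to bound: a real-time task set can only be guaranteed if its demand fits within the processing capacity that remains after throttling. A proactive scheme would instead plan speed changes ahead of time rather than waiting for the trip point.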

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0746906
Program Officer: M. Mimi McClure
Project Start:
Project End:
Budget Start: 2008-05-01
Budget End: 2014-04-30
Support Year:
Fiscal Year: 2007
Total Cost: $400,000
Indirect Cost:
Name: University of Michigan Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109