In today's computing systems, workload schedulers target the best performance but are unaware of the thermal and power realities of the system. Similarly, cooling subsystem controllers take only thermal sensor data as input and are thus oblivious to workload scheduling and power management decisions. Even though these subsystems all share a single computing infrastructure, their operation is optimized separately, resulting in inefficiencies.
This proposal goes well beyond previous work, which addressed thermal problems in CPUs separately from memory and largely neglected the rest of the system, to solutions that account for the complex interplay between CPUs, hardware accelerators such as GPUs, memory, and hard disks, together with their related cooling subsystems. The PIs propose to develop joint control policies for such systems and to quantify their respective benefits and disadvantages. The project plans to study and design control policies for various cooling implementations, using both fans and liquid cooling systems (e.g., micro-channels vs. channels in a heat sink with an external pump). The project will also test ideas on computing systems available in a modular data center container obtained at UCSD as part of the recently awarded NSF MRI (GreenLight) grant.
Graduate and undergraduate students will be involved in various parts of the proposed research and will help connect this work with other NSF-sponsored projects. The research results, tools, and coursework materials developed will be distributed freely to the engineering community at large. In addition, the PI has created a new program affiliated with the Computer Science and Engineering department at UCSD that aims to ensure the seamless transfer of ideas, funds, and people between academic and industrial settings.