This project's goal is to acquire and develop an instrumented datacenter testbed spanning the three sites of the NSF Center for Autonomic Computing (CAC): the University of Florida (UF), the University of Arizona (UA), and Rutgers, the State University of New Jersey (RU). Datacenters are a growing component of society's IT infrastructure, including services related to health, banking, commerce, defense, education, and entertainment. Annual energy and administration costs of today's datacenters amount to billions of dollars; high energy consumption also translates into excessive heat dissipation, which, in turn, increases cooling costs and server failure rates. The proposed testbed will enable a fundamental understanding of the operation of datacenters and the autonomic control and management of their resources and services. The design of the underlying infrastructure reflects the natural heterogeneity, dynamism, and distribution of real-world datacenters, and includes embedded instrumentation at all levels, including the platform, virtualization, middleware, and application layers. Its scale and geographical distribution enable studies of challenges faced by datacenter applications, services, middleware, and architectures related to both "scale-up" (increases in the capacity of individual servers) and "scale-out" (increases in the number of servers in the system). This testbed will enable fundamental and far-reaching research focused on cross-layer autonomics for managing and optimizing large-scale datacenters. The participating sites will contribute complementary expertise: UA at the resource level, UF at the virtualization layer, and RU in the area of services and applications. The collaboration among the university sites will bring coherence to ongoing separate research efforts and have a transformative impact on the modeling, formulation, and solution of datacenter management problems, which have so far been considered mostly in terms of individual layers. The testbed will also provide critical infrastructure for education at multiple levels: it will give students hands-on experience through course projects, enable the development of new advanced multi-university and cross-disciplinary courses, and support multi-site group projects focused on end-to-end autonomics. Students from underrepresented groups will be actively involved in the research, and their participation will be increased through ongoing collaborations with minority institutions. Even broader community participation will result from an evolving partnership with recently proposed industry cloud initiatives.
Datacenters are a growing component of society's IT infrastructure, including services related to health, banking, commerce, defense, education, and entertainment. Annual energy and administration costs of today's datacenters amount to billions of dollars; this tremendous energy consumption also translates into excessive heat dissipation, which, in turn, increases cooling costs and server failure rates. In this collaborative project, the partner universities have assembled a testbed that enables experiments toward a fundamental understanding of the operation of datacenters and clouds, as well as the autonomic control and management of their resources and services.

Because real-world datacenters and their equipment differ widely in performance and energy consumption, management solutions must account for the natural heterogeneity, dynamism, and distribution of these systems, which makes them challenging to design. "Scaling up" and "scaling out" such solutions across applications, services, middleware, and architectures is further complicated by the geographical distribution of the sites involved. The collaboration among the university sites, as part of the NSF Cloud and Autonomic Computing (CAC) Center, brought coherence to previously separate research efforts and had a transformative impact on the modeling, formulation, and solution of datacenter management problems, which had so far been considered mostly in terms of individual layers.

This project has enabled the acquisition of significant computing resources at all three university sites to serve as a testbed for research on cross-layer autonomics. The computational resources of the testbed have been deployed and interconnected in the configuration described in more detail below. The testbed includes computing, storage, and networking resources, as well as a heterogeneous sensing infrastructure (thermal cameras, temperature sensors, and airflow sensors) that monitors environmental conditions in real time.
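The sensing infrastructure feeds real-time environmental data to the management layers described below. As one example of how such data can be used, the sketch below estimates the heat extracted by the room's cooling (CRAC) unit from airflow and supply/return temperature readings and compares it with the heat generated by the servers, approximated by their power draw. The function names, constants, and sample readings are illustrative assumptions, not part of the testbed software.

```python
# Minimal sketch (illustrative assumptions only): estimate the heat a CRAC unit
# extracts from measured airflow and supply/return air temperatures, and compare
# it with the heat generated by the servers, approximated by their power draw.

AIR_DENSITY = 1.2           # kg/m^3, approximate density of air at room conditions
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K), approximate specific heat of air


def heat_extracted_watts(airflow_m3_per_s, return_temp_c, supply_temp_c):
    """Heat removed by the CRAC unit: mass flow * specific heat * temperature drop."""
    mass_flow_kg_per_s = AIR_DENSITY * airflow_m3_per_s
    return mass_flow_kg_per_s * AIR_SPECIFIC_HEAT * (return_temp_c - supply_temp_c)


def heat_imbalance_watts(it_power_watts, airflow_m3_per_s, return_temp_c, supply_temp_c):
    """Positive values mean more heat is generated than extracted (risk of hotspots)."""
    extracted = heat_extracted_watts(airflow_m3_per_s, return_temp_c, supply_temp_c)
    return it_power_watts - extracted


if __name__ == "__main__":
    # Hypothetical readings: 100 kW of IT load, 8 m^3/s of CRAC airflow,
    # 28 C return air and 18 C supply air.
    imbalance = heat_imbalance_watts(100_000.0, 8.0, 28.0, 18.0)
    print(f"Estimated heat imbalance: {imbalance / 1000.0:.1f} kW")
```

This derived quantity, heat imbalance, is the signal the environment layer described below uses when deciding how to drive the CRAC unit.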
In the first year of the project, each university site surveyed options for the major server computing resources, worked with vendors to obtain discounted quotes, and selected an equipment supplier through a competitive process. The computing resources were chosen to be homogeneous from an architectural standpoint (64-bit x86 servers) but configured differently at each site, so that experiments can capture the behavior of heterogeneous systems. In the second and third years of the project, the equipment was configured, deployed, and interconnected, and it has been used in a variety of research experiments and emulation campaigns.

The testbed allows experiments on a cross-layer architecture composed of four layers (see Figure 1 below) that belong to different abstract components with different responsibilities but share common objectives:

• The workload or application layer uses online clustering techniques together with workload profiling and characterization to group jobs that have similar requirements. It efficiently characterizes dynamic, rather than generic, classes of resource requirements that can be used for proactive Virtual Machine (VM) provisioning. Job groups are provisioned with VM classes, with the goal of reducing over-provisioning, energy consumption, and cost (a minimal sketch of this clustering step appears at the end of this section).

• The virtualization layer instantiates and configures VMs following the classes obtained via the clustering techniques. VMs can be migrated if necessary, potentially driven by interactions with other layers (e.g., reacting to hotspots). VM configurations are defined using models for workload characterization based on the workload's requirements.

• The physical resource layer performs resource provisioning, which consists of mapping VMs to specific resources. We consider physical nodes whose subsystems can be individually configured or even disabled. Physical resource configurations are defined for each specific request set using appropriate models, and dynamic reconfigurations are allowed in order to optimize energy efficiency.

• The environment layer profiles temperature and heat, and controls the Computer Room Air Conditioning (CRAC) unit based on estimates of heat imbalance, that is, the difference between the heat generated and the heat extracted. This layer detects, localizes, characterizes, and tracks thermal hot spots using scalar sensors (e.g., temperature and humidity) as well as thermal cameras and airflow meters to control the CRAC unit. It uses as inputs: i) the type of workload, ii) its intensity, iii) the physical resources used, and iv) the workload scheduling policy.

These layers work together proactively to enable efficient datacenter management. The three university sites contributed complementary expertise: the University of Arizona at the resource level, the University of Florida at the virtualization layer and in cross-layer information systems and resource discovery, and Rutgers in the area of services, applications, and environmental monitoring.
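As a concrete illustration of the workload-layer clustering referenced in the list above, the sketch below groups jobs by their CPU and memory demands with k-means and sizes one VM class per cluster. The job trace, the two features, and the fixed number of classes are assumptions made for illustration; they are not the project's actual implementation.

```python
# Minimal sketch (illustrative assumptions only): derive VM classes by grouping
# jobs with similar resource requirements, as the workload/application layer does.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical job profiles: (peak CPU cores, peak memory in GB) per job.
jobs = np.array([
    [1.0, 2.0], [1.2, 2.5], [0.8, 1.5],       # small jobs
    [4.0, 8.0], [3.5, 7.0], [4.5, 9.0],       # medium jobs
    [8.0, 32.0], [7.5, 30.0], [8.5, 36.0],    # large, memory-heavy jobs
])

# Group the jobs into a small number of classes; the choice of 3 is arbitrary here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(jobs)

# Size each VM class to the per-cluster peak demand, so every job assigned to a
# class fits while avoiding a single worst-case VM size for all jobs.
for class_id in range(kmeans.n_clusters):
    members = jobs[kmeans.labels_ == class_id]
    peak_cpu, peak_mem = members.max(axis=0)
    print(f"VM class {class_id}: {int(np.ceil(peak_cpu))} vCPUs, "
          f"{int(np.ceil(peak_mem))} GB RAM")

# kmeans.labels_ maps each job to its class; this mapping could drive proactive
# provisioning by starting VMs of the matching class before similar jobs arrive.
print("job -> VM class:", kmeans.labels_.tolist())
```

The workload layer described above performs this kind of clustering online, over arriving jobs, rather than on a static batch as in this sketch.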