As the number of cores per chip scales into the thousands by the middle of the next decade, a fundamentally different set of constraints governs software design, including the design of the operating system. Recent research has already demonstrated problems with scaling monolithic OS designs: in a monolithic OS, OS code executes in the kernel on the same core that makes the OS service request, which leads to significant performance degradation and severe scalability problems. Efforts to improve the scalability of these services have been difficult and only marginally successful. The primary question facing OS research over the next decade is: how can operating system services be designed so that they scale to hundreds or thousands of cores? To answer this question, the research community needs to rethink operating system design from the ground up in light of current and future multicore architectures.
One solution is a factored operating system that scales with the increasing number of cores. A factored operating system (called "fos") factors the system services of a monolithic OS out into a set of individual services. fos further factors and parallelizes each system service into an Internet-style collection, or fleet, of cooperating servers that are distributed across the multicore chip and bound to specific cores. To maintain good performance and efficient utilization in the face of varying system resources and application demand, these fleets must be elastic. The PIs propose an elastic version of fos (dubbed "e-fos") that provides mechanisms for scaling system services up or down at run time. The primary goal of e-fos is to scale to a large number of cores while meeting varying demand for resources and services; that is, to discover and evaluate how a factored operating system can leverage elasticity to maintain performance and efficiency. e-fos demonstrates elastic fleets and mechanisms for managing their elasticity on future multicores. These techniques replace the outdated static OS components that currently prevent contemporary monolithic OSs from scaling to hundreds or thousands of cores. For further information, see the project webpage: http://groups.csail.mit.edu/carbon/fos.
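To make the notion of run-time scaling concrete, the following is a minimal sketch of a threshold-based grow/shrink policy for a fleet. It is illustrative only: the names fleet_t, spawn_server, and retire_server, as well as the utilization thresholds, are assumptions made for this sketch and are not the actual fos/e-fos interfaces, which would measure demand from message queues and coordinate with the scheduler and naming service.

    /*
     * Minimal sketch of a threshold-based elasticity policy for a service
     * fleet. The names and thresholds below are illustrative assumptions,
     * not the fos/e-fos implementation.
     */
    #include <stdio.h>

    typedef struct {
        int servers;      /* servers currently in the fleet              */
        int min_servers;  /* never shrink below this                     */
        int max_servers;  /* never grow beyond the cores we were given   */
    } fleet_t;

    /* Placeholder actions: a real system would start or stop a server
     * process on a spare core and update the naming service.            */
    static void spawn_server(fleet_t *f)  { f->servers++; }
    static void retire_server(fleet_t *f) { f->servers--; }

    /* One step of the control loop: grow when average utilization is
     * high, shrink when it is low, otherwise leave the fleet alone.     */
    static void rebalance(fleet_t *f, double utilization)
    {
        if (utilization > 0.75 && f->servers < f->max_servers)
            spawn_server(f);
        else if (utilization < 0.25 && f->servers > f->min_servers)
            retire_server(f);
    }

    int main(void)
    {
        fleet_t fs = { .servers = 2, .min_servers = 1, .max_servers = 8 };
        double demand[] = { 0.30, 0.80, 0.90, 0.85, 0.40, 0.10, 0.05 };

        for (unsigned i = 0; i < sizeof demand / sizeof demand[0]; i++) {
            rebalance(&fs, demand[i]);
            printf("utilization %.2f -> %d servers\n", demand[i], fs.servers);
        }
        return 0;
    }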
Current computer operating systems were designed in an era when computation was the most critical resource in a machine. In the multicore era, the landscape has fundamentally changed. With the expected exponential increase in the number of cores per processor, the question is no longer how to cope with limited resources, but how to make the most of the abundant computation available. This project investigated some of the design decisions and challenges that will be encountered as operating systems evolve for this new computing landscape. Specifically, we sought to answer the following high-level questions: Can an operating system service be effectively implemented using a set (or "fleet") of cooperating server processes running on different cores within a large multicore processor? How should that fleet adapt to changes in workload or availability of resources?

Our initial explorations were made in the context of the experimental "fos" operating system. We designed a new naming service that applications use to find a particular service they need. This layer of indirection between applications and services allows the fleets of servers to dynamically grow, shrink, migrate, or reconfigure themselves as needed; we call this ability to adapt "elasticity." We prototyped an elastic filesystem in fos and showed that it responded well to changes in demand using only simple heuristics to grow and shrink the filesystem fleet. Finally, we designed an elastic networking service for fos by refactoring and partitioning the work it performs in novel ways: it distributes the workload across a parallel fleet of servers that can grow and shrink as required. More details can be found in the academic papers published under the fos project.

In the second half of this project, we turned our attention to applying some of the lessons we had learned to a more mainstream operating system. We implemented a new networking service (called Pika) within the Linux operating system with greater scalability and elasticity than the default network stack. Pika employs dedicated cores for running the network stack, distinct from the cores that run applications. Furthermore, Pika splits the network stack into conceptual component servers based on functionality. The design of Pika allows each of these components either to run as a stand-alone server or to be combined with other components into a composite server (see the Pika architecture images); an illustrative sketch of this composition appears below. An instance of Pika consists of one or more servers, each of which may encapsulate one or more of these components. This gives Pika the flexibility to run on both small and large machines and to adapt to changes in load by adding or removing servers.

We evaluated Pika on a modern server machine with four 10-core Intel Xeon processors and compared it to the stock Linux networking stack. Pika significantly outperforms stock Linux when it is allowed to run on large numbers of cores, and it achieves near-optimal scaling as the number of cores increases, showing that it is well equipped to handle the highly parallel machines of the future. In addition, its performance is competitive with Linux when both are restricted to small numbers of cores (simulating a less parallel machine). This shows that Pika's elastic design handles a wide range of situations gracefully.
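As an illustration of the stand-alone versus combined server idea described above, here is a minimal sketch in which each conceptual component is a flag and a server instance encapsulates any subset of them. The component names and the struct server layout are assumptions made for this sketch; the actual Pika components and interfaces are described in MIT-CSAIL-TR-2014-002.

    /*
     * Minimal sketch of the "stand-alone vs. combined" server idea,
     * using hypothetical component names rather than Pika's actual ones.
     */
    #include <stdio.h>

    /* Each conceptual piece of the network stack is a component flag.   */
    enum component {
        COMP_LINK      = 1 << 0,   /* talks to the NIC                   */
        COMP_TRANSPORT = 1 << 1,   /* TCP/UDP protocol processing        */
        COMP_CONN_MGR  = 1 << 2,   /* accepts and hands out connections  */
    };

    /* A server instance encapsulates one or more components and is bound
     * to a dedicated core, separate from application cores.             */
    struct server {
        int core;
        unsigned components;       /* bitwise OR of enum component       */
    };

    static void describe(const struct server *s)
    {
        printf("core %d runs:%s%s%s\n", s->core,
               (s->components & COMP_LINK)      ? " link"      : "",
               (s->components & COMP_TRANSPORT) ? " transport" : "",
               (s->components & COMP_CONN_MGR)  ? " conn-mgr"  : "");
    }

    int main(void)
    {
        /* Small machine: one composite server holds the whole stack.    */
        struct server small = { .core = 0,
            .components = COMP_LINK | COMP_TRANSPORT | COMP_CONN_MGR };

        /* Larger machine: the same components split across servers.     */
        struct server split[] = {
            { .core = 0, .components = COMP_LINK },
            { .core = 1, .components = COMP_TRANSPORT },
            { .core = 2, .components = COMP_CONN_MGR },
        };

        describe(&small);
        for (unsigned i = 0; i < 3; i++)
            describe(&split[i]);
        return 0;
    }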
One of the keys to Pika's excellent performance is its load-balancing algorithm. We evaluated several algorithms and found that a Tournament scheme with a tariff of 2 had the best ability to maintain short service times when faced with highly asymmetric application response times. Our results show that globally coordinated load balancing is crucial for effective use of parallel resources. Finally, we evaluated both split and combined configurations of Pika and found that the combined approach is better on current high-performance cores. However, our results indicate that the split approach could be better for future manycore systems, or for other system services (besides networking) that might require larger working sets within a server. In conclusion, Pika achieves good performance, scalability, and load balance on some of the largest servers available today for both uniform and skewed workloads. We hope that our techniques and results can inform the design of other system services for multikernels that must maintain shared state on future multicore processors. For additional details, please see MIT CSAIL Technical Report MIT-CSAIL-TR-2014-002.
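The following is a minimal sketch of one plausible reading of a tariff rule in such a load balancer: a new connection stays with its current server unless another server's load is lower by more than the tariff. The load metric, the exhaustive scan, and the constant are illustrative assumptions made for this sketch; the actual Tournament algorithm and its evaluation are specified in the technical report cited above.

    /*
     * Minimal sketch of a tariff-based hand-off rule, as one plausible
     * reading of "Tournament with a tariff of 2"; the real algorithm is
     * specified in MIT-CSAIL-TR-2014-002.
     */
    #include <stdio.h>

    #define TARIFF 2   /* another server must be lighter by more than this */

    /* Pick where to place a new connection: stay with the local server
     * unless some other server's queue is shorter by more than TARIFF.  */
    static int place(const int load[], int nservers, int local)
    {
        int best = local;
        for (int i = 0; i < nservers; i++)
            if (load[i] + TARIFF < load[best])
                best = i;
        return best;
    }

    int main(void)
    {
        int load[] = { 5, 4, 1, 6 };   /* outstanding requests per server */

        /* Server 1 is only slightly lighter than server 0, so the tariff
         * keeps the connection local; server 2 is lighter by more than 2,
         * so it wins the comparison and receives the connection.         */
        printf("placed on server %d\n", place(load, 4, 0));
        return 0;
    }

Under this reading, the tariff biases placement toward locality and keeps connections from ping-ponging between servers whose loads differ only slightly.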