This EAGER project undertakes a highly exploratory investigation into the key elements needed to specify the characteristics of an operating system (OS) in a way that permits an architectural model to be created that interacts fully with a suite of simulation tools.

The suite of tools, CoGenT (CoGeneration of Tools), includes specification languages that allow researchers to express novel instruction sets and micro-architectures, along with infrastructure for the automatic generation of corresponding functional and timing co-simulators, compilers, linkers, loaders, debuggers, assemblers, disassemblers, and a fully integrated instrumentation facility to enable meaningful experimentation within this new design space. CoGenT's ability to automatically generate a functional simulator from a specification, along with other related elements, will be released this year.

This EAGER addresses the need, when simulating complex architectures, to specify OS support not merely as a set of external calls but as a specific model that integrates with the rest of the architecture. Current architectures rely on the services and policies of the operating system, and the operating system itself must evolve with the radical shifts in architecture and applications anticipated in the next decade.

With this project, the team develops an approach that enables simultaneous research into novel hardware and software paradigms, with great flexibility and without the heretofore prohibitive cost of manually building a complete hardware and software simulation infrastructure with a tailored OS implementation. Traditional system simulation approaches either ignore the OS's impact on performance or resort to costly and inflexible full-system simulation in which an actual OS implementation is executed directly. The former yields unrealistic results, and the latter does not permit the kind of exploration needed for transformative paradigm shifts.

The goal of this project is to extend the relatively recent approach of functional and timing co-simulation for hardware architectures into "pseudo-full system simulation", where the OS becomes a first-class element in the simulation modeling and instrumentation framework. Simulating an OS model derived from a specification will also enable sensitivity and significance analyses, often neglected in current simulation-based research even though they are essential to understanding the real impact of new approaches.

Project Report

As computer systems evolve and increase in complexity, understanding how their performance varies becomes both more complex and more important. Small differences in behavior can cascade into significant variations in performance. One aspect of system behavior that has not been studied in a modern context is the interaction between the operating system and the internal state that a computer maintains for enhanced performance. For example, the computer keeps recently accessed data in a faster memory called a cache, and it records the behavior of decision points in program code to enable prediction of future decisions, so that instructions can be fetched from memory before they are needed. Most of this stored information is meant to accelerate user programs, but when a program calls upon the services of the operating system, some of the information is inadvertently displaced. As a result, when the operating system returns control to the user program, the program slows down for a subsequent period until the information is restored. The goal of our research was to characterize the cost of this side effect.

We began with the expectation that the internal performance analysis hardware (performance counters) provided by microprocessors would enable us to gather this information. But after months of trying to access them in different ways, we found that the access process itself disturbed the measurements so much that they were meaningless, and there was no way to work around the effect. We thus turned to a software simulation of a microprocessor called MARSSx86, which simulates the Intel instruction set and models the internal state of a particular AMD processor. Its timing has been verified against real hardware to within a few percent.

At first we found initial results that seemed reasonable, but further experiments showed inexplicable behavior. The ensuing year and a half was spent working with the MARSSx86 developers to fix numerous bugs and problems in the simulator, previously unknown to them, that were causing it to produce inconsistent results. We developed a suite of tests that enabled us both to isolate the problems and to validate our own results once the simulator had been fixed.

We modified MARSSx86 to save the internal state before each system call. We could then compare that state with the state after the call, and restore the original state to compare actual performance with performance in the absence of any disturbance (a minimal sketch of this measurement idea appears below). We analyzed a set of benchmark applications to identify the most frequently used system calls, and then measured their impact both in isolation and in the context of applications. We also tested multiple versions of operating systems.

Our analysis found that the primary instruction cache could suffer up to a 38% performance penalty due to disruption, and a portion of the branch prediction unit could suffer up to 32%. On average, both of these units showed an 8% variation in performance from being disturbed. Other units were impacted less. Switching from version 9 of the Ubuntu Linux operating system to version 12, we found average variations on the order of 25% across all state, and as much as a 500% difference in the branch predictor state.

We have shown that operating system calls have a significant effect on the internal state of user code in a processor. While the average impact is modest, it cannot be neglected, because these variations are comparable to the performance improvements often cited in computer systems research. Thus, it is possible that some of those reports are biased by this underlying variation.
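To make the save/compare/restore methodology above concrete, the following is a minimal user-space sketch of the underlying measurement idea, not the project's actual MARSSx86 instrumentation: it times a loop over a cache-resident buffer, forces a real system call, and times the same loop again, so that the slowdown of the second run approximates the cost of the processor state the call displaced. The buffer size and the choice of SYS_getpid as the example call are illustrative assumptions.

/* Sketch only: approximate syscall-induced state disruption from timing.
 * Assumes an x86 processor and a Linux environment. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#define BUF_SIZE (16 * 1024)   /* assumed small enough to stay in the L1 data cache */
static volatile unsigned char buf[BUF_SIZE];

/* Read the x86 time-stamp counter.
 * (Serialization with cpuid/rdtscp is omitted for brevity.) */
static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Walk the buffer and report how many cycles the walk took. */
static uint64_t timed_walk(void) {
    uint64_t start = rdtsc();
    for (int i = 0; i < BUF_SIZE; i++)
        buf[i]++;
    return rdtsc() - start;
}

int main(void) {
    timed_walk();                    /* warm the caches and branch predictors */
    uint64_t before = timed_walk();  /* baseline: undisturbed state */

    /* Force an actual kernel entry; SYS_getpid is an arbitrary example here.
     * The study instead measured the calls used most by real benchmarks. */
    syscall(SYS_getpid);

    uint64_t after = timed_walk();   /* first run after the call */
    printf("cycles before call: %llu, after: %llu (%+.1f%%)\n",
           (unsigned long long)before, (unsigned long long)after,
           100.0 * ((double)after - (double)before) / (double)before);
    return 0;
}

In the project itself, this comparison ran inside the simulator, where the displaced state could be inspected and restored exactly rather than inferred from timing.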
Intellectual Merit: We are the first to develop a thorough analysis of the performance side effects that operating system calls cause by disrupting internal processor state. Doing so was far more challenging than we anticipated, which may be why it had not been done before. We show that the effect is significant and that it varies greatly with the operating system. We also found that the effect was greatest in two particular internal units, which indicates that further research should focus on reducing the impact on those units. Concrete products of the work include a paper presenting our results at the ISPASS 2013 conference, improvements to MARSSx86, a state save-and-restore facility for MARSSx86 that handles complex nested system calls with multiple threads of execution, and a suite of microbenchmarks for validating the simulator. Students working on the project earned two MS degrees, and we supported multiple undergraduates through internships.

Broader Impact: As part of our work, we helped the developers of MARSSx86 refine their simulator into a precise and accurate research tool, which will improve the quality of research results across the community. Our infrastructure and results can also serve as a guide for operating system and processor developers as they work to improve the overall performance of computer systems.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 0950410
Program Officer: Krishna Kant
Budget Start: 2009-09-15
Budget End: 2012-08-31
Fiscal Year: 2009
Total Cost: $216,000
Name: University of Massachusetts Amherst
City: Amherst
State: MA
Country: United States
Zip Code: 01003