Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures.
In this project, the researchers are developing a reliability subsystem for commodity operating systems. Their approach is to improve OS reliability by isolating the OS from driver failures. Rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, their goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, they isolate drivers within lightweight protection domains inside the kernel address space. Drivers run largely unchanged in kernel mode, but are prevented from corrupting the kernel by both software and hardware means. The system also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.
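The two ideas in this paragraph, containing a driver fault at the domain boundary and tracking the driver's kernel resources so they can be reclaimed on recovery, can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the researchers' implementation: the names `ResourceTracker`, `IsolatedDriver`, `kernel_pool`, `kfree` and `buggy_driver` are hypothetical, a Python exception stands in for a driver fault, and the hardware memory protection that the real system relies on is not modeled at all.

```python
class ResourceTracker:
    """Hypothetical model: records the kernel objects a driver currently holds."""

    def __init__(self):
        self._held = {}  # handle -> callback that releases the handle

    def acquire(self, handle, release):
        # Remember how to give this object back to the kernel.
        self._held[handle] = release

    def release_all(self):
        # Reclaim everything the driver still holds, most recent first.
        freed = []
        while self._held:
            handle, release = self._held.popitem()  # dict.popitem() is LIFO
            release(handle)
            freed.append(handle)
        return freed


class IsolatedDriver:
    """Hypothetical protection domain: faults are contained, not propagated."""

    def __init__(self, name):
        self.name = name
        self.tracker = ResourceTracker()
        self.failed = False
        self.recovered = []

    def call(self, fn, *args):
        # Every entry into driver code goes through this boundary.
        try:
            return fn(self.tracker, *args)
        except Exception:
            # Assumption: an exception stands in for a driver fault.
            self.failed = True
            self.recovered = self.tracker.release_all()
            return None


# Toy "kernel" state and a deliberately faulty driver routine.
kernel_pool = set()


def kfree(handle):
    kernel_pool.discard(handle)


def buggy_driver(tracker):
    for handle in ("buf0", "buf1"):
        kernel_pool.add(handle)
        tracker.acquire(handle, kfree)
    raise RuntimeError("simulated driver bug")


drv = IsolatedDriver("demo")
result = drv.call(buggy_driver)
```

In this model the caller survives the fault and the tracker returns both buffers to the pool, which is the property the tracking is meant to provide: recovery does not have to search for what the failed driver left behind.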
To show the viability of their approach, they are implementing a system called "Nooks" in the Linux operating system and using it to fault-isolate several device drivers. Their initial results show that Nooks substantially improves operating system reliability, catching and quickly recovering from 99% of the faults that would otherwise crash the system. Although Nooks was designed for drivers, its techniques generalize to other kernel extensions as well. The researchers are demonstrating this by isolating a kernel-mode file system and an in-kernel Internet service.
Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and commodity hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.