Exokernel: An Operating System Architecture for Application-level Resource Management

Original paper by Dawson R. Engler, M. Frans Kaashoek and James O’Toole Jr.: https://dl.acm.org/doi/10.1145/224057.224076

In this paper, the authors present an alternative to traditional operating systems and report the results of their experiments with a new operating system architecture called the exokernel: a small kernel that securely exports all hardware resources through a low-level interface, allowing applications to extend, specialize or even replace important components usually handled by the kernel, such as IPC (interprocess communication) and virtual memory. The authors argue that traditional operating systems rely on the kernel to manage all hardware resources and to provide fixed interfaces between applications and physical resources, which significantly limits both the performance and the implementation freedom of applications. This traditional approach, as the authors state at the beginning of the paper, denies applications the advantages of domain-specific optimizations and discourages changes to the implementations of the abstractions provided by the OS. As an example of these limitations, database implementers may struggle to emulate random-access record storage on top of file systems, since traditional operating systems hide the details of their file system implementations, as well as information such as page faults and timer interrupts. From this example, the authors conclude that traditional operating systems, whose centralized resource management provides abstractions that cannot be specialized, extended or replaced, hurt application performance, because no single way of abstracting physical resources is best for all kinds of applications. This is the main motivation for proposing the exokernel architecture.

Applications know better than the operating system what the goals of their resource management decisions are, so they should have as much control as possible over the abstractions used to interact with physical resources. The paper focuses on demonstrating that an exokernel architecture can effectively address the limitations of the traditional operating system approach. The architecture consists of a thin kernel veneer that securely multiplexes and exports physical resources, while the so-called library operating systems, which run at the application level, consume this low-level interface, implement higher-level abstractions and can define special-purpose implementations that best meet the performance and functionality goals of each application.

The main challenge in the exokernel design, as described in the paper, is to give library operating systems the maximum possible freedom in managing physical resources while still protecting them from each other. To achieve this goal, an exokernel separates protection from management by performing three tasks: tracking ownership of resources, ensuring protection by guarding all resource usage, and revoking access to resources. Three techniques are presented to carry out these tasks. First, secure bindings, through which a lib OS can securely bind to a machine resource. Second, visible revocation, which allows a lib OS to participate in a resource revocation protocol. Third, an abort protocol, used by the exokernel to break the secure bindings of a given lib OS “by force”. It is interesting to note that, under these design principles, an exokernel should avoid resource management wherever possible and manage resources only to the extent required by protection, since the authors strongly believe that distributed, application-specific resource management is the best way to build efficient, flexible systems.
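
The split between checking authorization and using a resource can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Aegis interface: the class `ToyExokernel`, its methods `allocate`, `bind` and `access`, and the use of random tokens as capabilities are all hypothetical names chosen for illustration.

```python
import secrets


class ToyExokernel:
    """Toy model: tracks ownership and guards binding points, nothing more."""

    def __init__(self, num_pages):
        self.free_pages = set(range(num_pages))
        self.owner_cap = {}    # page -> capability recorded at allocation
        self.bindings = set()  # (env_id, page) pairs that passed the check

    def allocate(self, page):
        """Hand a free page to the caller and return its capability."""
        if page not in self.free_pages:
            raise PermissionError("page is not free")
        self.free_pages.remove(page)
        cap = secrets.token_hex(8)  # unguessable token standing in for a capability
        self.owner_cap[page] = cap
        return cap

    def bind(self, env_id, page, cap):
        """Bind time: authorization is checked once, here."""
        if page not in self.owner_cap or self.owner_cap[page] != cap:
            raise PermissionError("capability does not match")
        self.bindings.add((env_id, page))

    def access(self, env_id, page):
        """Access time: only a lookup of an existing binding."""
        return (env_id, page) in self.bindings


if __name__ == "__main__":
    kernel = ToyExokernel(num_pages=4)
    cap = kernel.allocate(2)
    kernel.bind("libos-a", 2, cap)
    print(kernel.access("libos-a", 2))  # libos-a holds a binding
    print(kernel.access("libos-b", 2))  # libos-b never bound the page
```

Note that the toy kernel never decides which page a library operating system should use or what it should store there; it only records ownership and refuses bindings that lack the matching capability.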

A secure binding is a protection mechanism that decouples authorization from the actual use of a resource: authorization is checked once, when the binding is created, so that later uses need only a cheap check. Secure bindings are implemented using hardware mechanisms, software caching, and downloading application code into the kernel. The TLB (translation lookaside buffer) of the processor is an example of a hardware mechanism: to ensure protection, the exokernel guards every access to a physical memory page by checking the memory capabilities whenever a library operating system attempts to enter a new virtual-to-physical mapping into the TLB. Once resources are bound to applications, the way to reclaim them and break their secure bindings is through revocation. Traditional operating systems perform revocation invisibly, meaning that applications have no involvement in the deallocation of resources. This form of revocation has lower latency than visible revocation, but its disadvantage is that applications (library operating systems) cannot guide deallocation and have no knowledge that resources are scarce. An exokernel, on the other hand, uses visible revocation for most resources, which lets a library operating system react, for example by saving only the required processor state when the processor is revoked. The authors thus view revocation as a dialogue between the exokernel and the library operating system. Still, the exokernel must be able to take resources from a library operating system that fails to respond satisfactorily to revocation requests. When an exokernel takes a resource from a library operating system, it records this in a “repossession vector” so that the application can decide later how to deal with the loss. This process of taking resources from a lib OS “by force” is the abort protocol.
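
The revocation dialogue and its fallback can be sketched as follows. This is a simplified model and not the protocol as implemented in Aegis: the `LibOS` class, the `on_revoke` callback, the `revoke` function and the policy of giving up the lowest-numbered pages are hypothetical, and the list `repossessed` merely stands in for the repossession vector.

```python
class LibOS:
    """Toy library operating system that owns a set of physical pages."""

    def __init__(self, name, cooperative=True):
        self.name = name
        self.cooperative = cooperative
        self.pages = set()
        self.repossessed = []  # stand-in for the repossession vector

    def on_revoke(self, count):
        """Visible revocation: the library OS chooses which pages to give up."""
        if not self.cooperative:
            return []  # ignores the request
        victims = sorted(self.pages)[:count]  # hypothetical policy: lowest pages first
        self.pages.difference_update(victims)
        return victims


def revoke(libos, count):
    """Ask first; take by force (abort protocol) only to cover the shortfall."""
    released = list(libos.on_revoke(count))
    shortfall = count - len(released)
    forced = sorted(libos.pages)[:shortfall] if shortfall > 0 else []
    for page in forced:
        libos.pages.discard(page)
        libos.repossessed.append(page)  # record what was taken for the owner
    return released, forced


if __name__ == "__main__":
    stubborn = LibOS("stubborn", cooperative=False)
    stubborn.pages = {5, 6, 7}
    print(revoke(stubborn, 2))   # nothing released voluntarily, two pages forced
    print(stubborn.repossessed)  # the owner can inspect what it lost
```

A cooperative library operating system never sees a forced repossession in this model, which mirrors the idea that the abort protocol is a last resort rather than the normal path.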

After covering these aspects of the exokernel design, the paper, starting in section 4, describes the implementation of Aegis, the prototype exokernel developed by the authors, and of ExOS, a library operating system, together with its extensions. The experiments aim to demonstrate four main things: exokernels can be very efficient, secure multiplexing of hardware can be implemented efficiently, traditional OS abstractions can be implemented efficiently at the application level, and applications can create special-purpose implementations of these abstractions. The paper presents many comparisons between Aegis and Ultrix, a monolithic UNIX-based operating system, showing that such systems carry considerable overhead that specialized implementations can remove.

The key components of the Aegis implementation are: processor time slices, processor environments, exceptions, protected control transfers, and dynamic packet filters. Aegis represents the CPU as a linear vector in which each element corresponds to a time slice, with slices partitioned at the clock granularity. A slice's position in the vector can be used to meet deadlines and to trade off latency against throughput, giving applications a notable degree of control over context switching. A processor environment in Aegis is a structure that stores the information needed to deliver events to applications, and Aegis can deliver four kinds of events: exceptions, interrupts, protected control transfers, and address translations. Aegis dispatches all hardware exceptions to applications, which are better placed to decide how to handle them; this drastically reduces kernel intervention and lets applications resume immediately after processing an exception. Aegis also provides protected control transfers as a substrate for efficient implementations of inter-process communication (IPC). These transfers can be synchronous or asynchronous; they are atomic and never overwrite any application-visible register, so applications can construct their own IPC abstractions according to their particular needs. To determine which application should receive an incoming network message, Aegis relies on packet filters, which provide extensible demultiplexing inside the kernel. The main difference from traditional approaches is that Aegis compiles each packet filter to executable code at runtime (dynamic code generation) when the filter is installed into the kernel, which eliminates interpretation overhead and makes filtering faster.
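
The difference between interpreting a packet filter and compiling it at installation time can be shown with a loose analogy. Aegis generates machine code; the sketch below generates Python source instead, so it only illustrates the idea of specializing a filter once rather than re-reading its description for every packet. The filter format (a list of byte offset and expected value pairs) and the function names `interpret`, `compile_filter` and `demultiplex` are assumptions made for this example.

```python
def interpret(filter_spec, packet):
    """Baseline: walk the filter description again for every packet."""
    return all(len(packet) > off and packet[off] == val for off, val in filter_spec)


def compile_filter(filter_spec):
    """Install time: generate a function specialized to this one filter."""
    if not filter_spec:
        return lambda packet: True
    min_len = max(off for off, _ in filter_spec) + 1
    checks = " and ".join(
        f"packet[{int(off)}] == {int(val)}" for off, val in filter_spec
    )
    source = f"def match(packet):\n    return len(packet) >= {min_len} and {checks}\n"
    namespace = {}
    exec(compile(source, "<packet-filter>", "exec"), namespace)
    return namespace["match"]


def demultiplex(installed, packet):
    """Return the owner of the first installed filter that accepts the packet."""
    for owner, match in installed:
        if match(packet):
            return owner
    return None


if __name__ == "__main__":
    # Illustrative filter: byte 0 is 0x45 and byte 9 is 17 (the position of
    # the protocol field in a raw IPv4 header, where 17 denotes UDP).
    udp_filter = compile_filter([(0, 0x45), (9, 17)])
    packet = bytes([0x45] + [0] * 8 + [17] + [0] * 10)
    print(demultiplex([("udp-app", udp_filter)], packet))
```

The compiled function contains only constant comparisons, so the per-packet work no longer includes looping over or decoding the filter description.
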
In summary, the authors attribute the good performance of Aegis to the combination of these implementation choices, as they explain in detail in section five of the paper.
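
Of the mechanisms summarized above, the time-slice vector is the easiest to sketch. The model below assumes the vector is cycled through in round-robin order, one slot per clock tick; the class `SliceVector`, its methods and the environment names are hypothetical and chosen only for illustration.

```python
class SliceVector:
    """Toy CPU model: a linear vector of time slices cycled round robin."""

    def __init__(self, num_slices):
        self.slots = [None] * num_slices  # slot -> owning environment, or None
        self.cursor = 0

    def allocate(self, env, position):
        """Claim the slice at a chosen position in the vector."""
        if self.slots[position] is not None:
            raise ValueError("slice already owned")
        self.slots[position] = env

    def tick(self):
        """Advance one clock tick and return the environment that runs, if any."""
        env = self.slots[self.cursor]
        self.cursor = (self.cursor + 1) % len(self.slots)
        return env


if __name__ == "__main__":
    cpu = SliceVector(6)
    # A latency-sensitive environment spreads its slices out...
    for position in (0, 3):
        cpu.allocate("interactive", position)
    # ...while a throughput-oriented one takes adjacent slices.
    for position in (1, 2):
        cpu.allocate("batch", position)
    print([cpu.tick() for _ in range(6)])
```

Both environments own two slices, but the interactive one never waits more than three ticks to run, while the batch one gets two consecutive slices and pays for fewer context switches. This is the sense in which position in the vector trades latency against throughput.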

After covering the main implementation details of Aegis, the paper dedicates the remaining sections to demonstrating that basic system abstractions can be implemented efficiently at the application level in ExOS, the authors' library operating system. The authors highlight the implementations of IPC, virtual memory, and remote communication. The application-level virtual memory has two limitations: it does not handle swapping, and its page tables are implemented as a linear vector. Even so, the benchmarks comparing ExOS with Ultrix show that the exokernel-based system performs well on most memory operations, the exceptions being the operations that handle read-protect cases (prot100 and unprot100). Building on the Ultrix comparison, the authors demonstrate that a library operating system working above the exokernel interface can indeed be extensible enough to fulfill application-specific requirements.
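
A page table kept as a linear vector at the application level might look like the sketch below. It is not the ExOS data structure: the class `LinearPageTable`, the `translate` helper and the 4096-byte page size are assumptions made for illustration.

```python
class LinearPageTable:
    """Application-level page table: one slot per virtual page number."""

    def __init__(self, num_virtual_pages):
        self.entries = [None] * num_virtual_pages  # vpn -> (pfn, writable)

    def map(self, vpn, pfn, writable=False):
        self.entries[vpn] = (pfn, writable)

    def lookup(self, vpn):
        if not 0 <= vpn < len(self.entries):
            return None
        return self.entries[vpn]


def translate(table, vaddr, write=False, page_size=4096):
    """Resolve a virtual address the way a library OS miss handler might."""
    vpn, offset = divmod(vaddr, page_size)
    entry = table.lookup(vpn)
    if entry is None:
        raise LookupError("unmapped page")
    pfn, writable = entry
    if write and not writable:
        raise PermissionError("write to read-only page")
    return pfn * page_size + offset


if __name__ == "__main__":
    table = LinearPageTable(num_virtual_pages=8)
    table.map(vpn=2, pfn=5)
    print(hex(translate(table, 2 * 4096 + 12)))
```

A linear vector makes lookups a single index operation, at the cost of memory proportional to the size of the virtual address space. Because the table lives in the library, an application with a sparse address space could replace it with a different structure without changing the kernel.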

Like microkernels, exokernels are designed to increase extensibility, as the authors point out at the end of the paper. Unlike traditional microkernels, an exokernel pushes the kernel interface much closer to the hardware, which considerably increases its flexibility. The related work discussed in the paper shows that the pursuit of greater flexibility through the separation of kernel policy and mechanism has a long history, for example in Lampson’s description of CAL-TSS and in Brinch Hansen’s microkernel paper “The nucleus of a multiprogramming system” from 1970. In the conclusion, the authors report that most of their assumptions about the advantages and feasibility of exokernels were confirmed by the Aegis and ExOS implementations: exokernel primitives are fast, so Aegis performs better than or comparably to modern high-performance implementations of exceptions and control transfer primitives; secure multiplexing of hardware resources was implemented efficiently; and traditional operating system abstractions were successfully implemented at the application level, demonstrating that applications can create special-purpose implementations of abstractions by modifying a library. Based on these results, the authors close the paper by stating that the exokernel architecture is a viable structure for high-performance, highly extensible operating systems.