Xen and the Art of Virtualization

03 Jan 2021

Original paper from 2003, by Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt and Andrew Warfield: https://dl.acm.org/doi/10.1145/1165389.945462

Many systems have used virtualization to subdivide the resources of a computer. This paper presents Xen, a virtual machine monitor that allows multiple operating systems to share hardware safely without sacrificing performance. The authors identify three main challenges in achieving this: isolating virtual machines from each other, supporting a good variety of operating systems, and keeping the performance overhead that virtualization usually introduces small.

Xen’s approach is to make small modifications to each guest OS so that it can communicate directly with the VMM (virtual machine monitor) through what the authors call hypercalls. This strategy is named paravirtualization. It was chosen mainly because support for full virtualization was never part of the x86 architecture: certain privileged instructions fail silently when run with insufficient privilege, rather than producing the convenient traps that a VMM needs in order to intervene. The paravirtualized interface is factored into three broad aspects: memory management, CPU, and device I/O.

As the authors highlight in the paper, virtualizing memory would be easier if the architecture provided a software-managed TLB (translation lookaside buffer), but that’s not the case for x86. To work around this, guest OSes in Xen are responsible for allocating and managing the hardware page tables, with minimal Xen involvement to ensure safety and isolation. Each time a guest OS requires a new page table, it initializes a page from its own memory reservation and registers it with Xen. At that point the OS relinquishes direct write privileges to the page-table memory, and every subsequent update must be validated by Xen. This lets Xen ensure that a guest OS only maps pages that it owns.

Virtualizing the CPU also has implications for the guest OS, since inserting a hypervisor below the OS violates the usual assumption that the OS is the most privileged entity in the system. One interesting aspect of CPU virtualization in Xen is the way it improves system-call performance by allowing each guest OS to register a “fast exception handler” that is accessed directly by the processor. Similar to hardware interrupts, Xen also supports an event-delivery mechanism that sends notifications to each guest OS domain about hardware device operations.
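The page-table scheme above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyHypervisor`, its methods, and the frame numbers are hypothetical names invented for illustration and do not correspond to Xen’s real hypercall interface. The idea it shows is that a guest registers a page-table frame, then submits a batch of updates, and the hypervisor applies them only if every frame involved belongs to that guest. Rejecting the whole batch on the first invalid entry is a simplification chosen for this sketch.

```python
class ToyHypervisor:
    """Toy model of validated page-table updates (not Xen's real API)."""

    def __init__(self):
        self.owner = {}        # machine frame number -> owning domain id
        self.page_tables = {}  # registered page-table frame -> {index: mapped frame}

    def give_frame(self, domain, frame):
        """Assign a machine frame to a domain's memory reservation."""
        self.owner[frame] = domain

    def register_page_table(self, domain, frame):
        """Register one of the domain's own frames as a page table."""
        if self.owner.get(frame) != domain:
            raise PermissionError(f"frame {frame} is not owned by {domain}")
        self.page_tables[frame] = {}

    def update_page_table(self, domain, updates):
        """Validate, then apply, a batch of (pt_frame, index, target_frame)."""
        # Validate the whole batch first so a bad entry leaves nothing applied.
        for pt_frame, _index, target in updates:
            if pt_frame not in self.page_tables:
                raise PermissionError(f"frame {pt_frame} is not a registered page table")
            if self.owner.get(pt_frame) != domain:
                raise PermissionError(f"page table {pt_frame} is not owned by {domain}")
            if self.owner.get(target) != domain:
                raise PermissionError(f"frame {target} is not owned by {domain}")
        for pt_frame, index, target in updates:
            self.page_tables[pt_frame][index] = target
        return len(updates)


def demo():
    """Run one accepted batch and one rejected update for a hypothetical domain."""
    xen = ToyHypervisor()
    for frame in (10, 11, 12):
        xen.give_frame("domA", frame)
    xen.give_frame("domB", 20)

    xen.register_page_table("domA", 10)
    applied = xen.update_page_table("domA", [(10, 0, 11), (10, 1, 12)])

    try:
        # domA tries to map a frame that belongs to domB.
        xen.update_page_table("domA", [(10, 2, 20)])
        rejected = False
    except PermissionError:
        rejected = True
    return applied, rejected, xen.page_tables[10]
```

Calling `demo()` applies the two valid mappings in a single batch and refuses the attempt to map another domain’s frame. Submitting several updates in one call also hints at why batching helps: the cost of entering the hypervisor is paid once per batch rather than once per update.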

The paper goes on to describe the design of each virtualized subsystem in detail, with the goal of separating policy from mechanism wherever possible. As a result, the VMM itself (the hypervisor) provides only basic control operations, while complex policy decisions such as admission control are made by management software that runs on a guest OS. One interesting aspect of physical memory virtualization in Xen is that, while most operating systems assume that memory comprises at most a few large contiguous regions, Xen does not guarantee to allocate contiguous regions of memory. Mapping physical addresses to hardware addresses is the responsibility of the guest OSes, which typically create for themselves the illusion of contiguous physical memory even though their underlying allocation of hardware memory is sparse.
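A minimal sketch of that illusion, assuming 4 KiB pages and a simple per-guest lookup table: the guest numbers its physical frames 0, 1, 2, ... and translates each one to whichever hardware frame it was actually given. The class `GuestMemory`, the method `to_machine`, and the frame numbers are hypothetical and only serve the illustration.

```python
PAGE_SIZE = 4096  # 4 KiB pages, the standard x86 page size


class GuestMemory:
    """Toy physical-to-hardware address translation kept by a guest OS."""

    def __init__(self, machine_frames):
        # Index = contiguous guest-physical frame number,
        # value = the (possibly scattered) hardware frame backing it.
        self.p2m = list(machine_frames)

    def to_machine(self, physical_addr):
        """Translate a guest-physical address to a hardware address."""
        pfn, offset = divmod(physical_addr, PAGE_SIZE)
        if not 0 <= pfn < len(self.p2m):
            raise IndexError(f"physical address {physical_addr:#x} is out of range")
        return self.p2m[pfn] * PAGE_SIZE + offset


# The guest sees three contiguous frames (0, 1, 2) although the
# hardware frames behind them (7, 42, 3) are sparse and unordered.
memory = GuestMemory([7, 42, 3])
first_byte = memory.to_machine(0)
inside_second_frame = memory.to_machine(PAGE_SIZE + 5)
```

Here `first_byte` lands in hardware frame 7 and `inside_second_frame` lands five bytes into hardware frame 42, while the guest itself only ever reasons about a flat range of addresses starting at zero.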

The benchmarks compare Xen against other virtualization technologies (VMware Workstation and User-mode Linux), measuring total system throughput when multiple applications execute concurrently. The authors also compare performance isolation and assess the total overhead of running large numbers of operating systems on the same hardware. They present macro benchmarks covering CPU-intensive, filesystem, network, process, and basic VMM operations, as well as micro benchmarks focusing on Xen-specific virtualization techniques such as batched hypercalls and zero-copy data transfer. XenoLinux is the guest OS used in the benchmarks. Overall, the results show that XenoLinux running on Xen performs practically on par with the baseline native Linux system, which suggests that the careful design decisions behind Xen’s paravirtualization model paid off.