MIT’s Exokernel
CS-450-1: Operating Systems
Fall 2003
Presented by
Victoria Barrow
Kyle Safford
Sean Sommers
Table of Contents:.
I. Introduction………………………………………………………………1
II. Exokernel Design………………………………………………………...1
Secure Binding……………………………………………………………….2
Visible Revocation…………………………………………………………...2
The Abort Protocol…………………………………………………………...3
III. Aegis Exokernel………………………………………………………….3
Introduction…………………………………………………………………..3
Scheduling……………………………………………………………………3
Processor Environments..…………………………………………………….4
Costs for Processor and System calls………………………………………...4
Exceptions……………………………………………………………………4
Address Translations………………………………………………………....5
Protected Control Transfers………………………………………………….5
Aegis Overview……………………………………………………………...6
IV. Xok Exokernel…………………………………………………………...6
V. ExOS LibOS……………………………………………………………..7
Application Performance…………………………………………………….8
Overall………………………………………………………………………..8
VI. Conclusion………………………………………………………………..9
I. Introduction
An operating system is a program that defines an interface between application programs and computer hardware. Most of today’s operating systems overly complicate this interface by hiding information about machine resources behind its own set of high-level abstractions such as files, processes, interprocess communication, and address spaces. Since the implementations of these abstractions are hardcoded and therefore cannot be changed or replaced by “untreated” application programs, applications and other parts of the operating system must be made to work with and around them. Such a rigid basis makes an operating system very difficult to improve once written, since changing its implementation of existing abstractions would be near impossible. There is the possibility for applications to add new abstractions, but this could only be done by awkward emulation on top of the abstractions already hardcoded. With these issues in mind it would seem that not only are today’s operating systems overly complicated, but they are inefficient and inflexible as well. The exokernel operating system architecture addresses these problems by minimizing embedded functionality of the operating systems kernel and allowing application-level management of underlying hardware resources.
The main goal of the exokernel is to separate protection from management, or in other words protect resources but hand over management to applications. Since most applications do not need customized resource management it will have to be provided for them through operating system libraries that hide low-level resources behind traditional abstractions. The application can just link to the library operating system (LibOS) of choice and it should be able to function just as it would on a typical system. LibOSes are unprivileged, and consequently can be modified or replaced at any time. Due to this freedom, new implementations of LibOSes are easily incorporated just by relinking application executables. (Engler et al., 1995) This paper will provide an in-depth look at the exokernel and explore its advantages and disadvantages when compared to a traditional operating system.
II. Exokernel Design
In order to provide library operating systems with the maximum amount of control over the management of machine resources, the design of the exokernel had to drastically stray from traditional methods and be so bold as to “expose hardware”. One of the core features of the exokernel architecture is that it provides primitives at the lowest possible level required for security. By exposing hardware at such a low level, applications can access all physical resources as directly as possible. The exokernel is designed to securely export hardware DMA capabilities and privileged instructions. Most importantly it exports machine resources such as the CPU, physical memory, disk blocks, the translation look-aside buffer (TLB), and context identifiers (Application). The design of the exokernel avoids resource management as much as it safely can.
Applications are very involved in allocation. The library operating systems are allowed to request specific physical resources. Physical names are used whenever possible because they not only carry useful information, but they also save time by cutting out the cost of translating virtual names. Exokernels expose revocation policies to LibOSes and let the applications decide which instance of a resource to relinquish. (Kaashoek et al, 1995)
Though the exokernel delegates management to LibOSes, it must still protect these operating system libraries from each other. In order words, one error in a library operating system should not affect any other LibOSes. (Engler et al., 1995) Such protection and yet freedom requires that an exokernel do a few essential tasks. These tasks include tracking the ownership of resources, providing protection by guarding all resource usage or binding points, and revoking access to resources. In order to perform these tasks, an exokernel uses the following techniques: secure bindings, visible revocation, and abort protocol.
Secure Bindings:
The secure binding is a lightweight binding of an application to a resource. Enforcing a secure binding is fairly fast since protection checks are done using simple operations that the kernel (or hardware) can implement quickly. (Engler et al., 1995) Basically, the exokernel performs authorization when a resource is first requested by an application. This request period is referred to as bind time. Once authorization has occurred, binding can be used again and again during access time without authorization.
There are three techniques involved in implementing secure bindings: hardware mechanisms, software caching, and downloading application code.
With hardware support, secure bindings can use very simple protection operations thereby making later operations more efficient since they do not need to resort to high-level authorization information.
Secure bindings can be held in a large software cache. (Engler et al., 1995) In hardware, the cache monitors addresses of subsequent reads to see if the required data is already in the cache. If it is (a cache hit) then it is returned immediately. If the data is not cached (a cache miss) then it is fetched from main memory and also saved in the cache for further use. The cache is built from faster memory chips than main memory so a cache hit takes much less time to complete than a normal memory access. Just like memory cache, the software cache acts as a holding area for frequently used secure bindings within the software. (Engler et al., 1995)
Downloading code into the kernel is another way to implement secure bindings. Downloading code into the kernel allows an application thread of control to be immediately executed on kernel events. In addition to implementing secure bindings, downloading code improves performance. The advantages are it eliminates kernel crossings and the downloaded code can be readily bounded. (Engler et al., 1995)
Visible Revocation:
There needs to be a way to reclaim resources and break secure bindings after resources have been bound to an application. There are two forms of revocation, one in which revocation is visible to applications and one where it is invisible. Most operating systems perform revocation invisibly in which deallocating resources is done without application involvement or knowledge. As for the exokernel, visible revocation is used for most resources. Visible revocation allows an operating system library to specify which resources it wants deallocated. Since the exokernel uses physical resource names whenever possible, an exokernel is required to expose each revocation to the relevant LibOS so that operating system library can reposition its physical names (Engler et al., 1995). Therefore, visible revocation allows an operating system library to modify its conduct based on available resources. LibOSes should arrange resource lists so that the physical resources can be deallocated swiftly.
The Abort Protocol:
Sometimes library operating systems can fail to respond in a suitable manner to revocation requests. In these cases an exokernel must be able to forcefully take resources from any uncooperative LibOSes. In some sense there are two parts to visible revocation. The first stage is the exokernel “asking” the LibOSes to deallocate a resource and the second stage is the exokernel “demanding” it return the resource within a certain amount of time. The abort protocol defines the actions to take when application fails to respond in time. Essentially, if an application fails to comply with revocation protocol, then all of its existing secure bindings are broken by the exokernel. Once the bindings are broken, the exokernel informs the operating system library. A repossession vector is used to record all the forced losses of resources. Each taken resource is registered in the vector and the LibOS receives a “repossession” exception so that it can update any mappings that were using the resource (Engler et al., 1995). If the resource had a state, the exokernel can write the state into another memory resource. There is some concern that the exokernel might accidentally seize some of a LibOS’s critical resources, such as the physical memory used to store vital bootstrap information. To protect the LibOSes, each library is guaranteed a small number of resources that will not be repossessed, such as five to ten physical memory pages. (Engler et al., 1995)
III. Aegis Exokernel
Intro:
One of the exokernels that was chosen for experimenting with the MIT Exokernel model was Aegis. For the tests, Aegis was implemented on each of the DEC2100, DEC3100, and DEC5000 series platforms. The Aegis exokernel represents the CPU as a linear vector, where each element of the vector is a time slice. These time slices “are partitioned at the clock granularity and can be allocated in a manner similar to physical memory.” (Engler et al., 1995)
The beginning and end of time slices are denoted by timer interrupts. These timer interrupts are also delivered in a manner similar to exceptions. That is, “a register is saved in the ‘interrupt save area,’ the exception program counter is loaded, and Aegis jumps to user-specified interrupt handling code with interrupts re-enabled.” (Engler et al., 1995) General-purpose context switching such as saving and restoring live registers, and releasing locks, is done through the application’s handlers, which gives applications a large amount of control over context switching.
Scheduling:
Scheduling in Aegis “is done ‘round robin’ by cycling through the time slices.” (Engler et al., 1995) Unlike typical operating systems, CPU scheduling on the Aegis exokernel is concerned with scheduling the different Library OSes that are being used and not the "processes" and "threads" that each contain. For the most part, individual processes and threads are managed based on the Library OSes they are running under. Position, a crucial property of this representation, encodes an ordering and an approximate upper bound on when the time slice will be run. The position property has two main uses, to meet deadlines, and to trade off latency for throughput. “Fairness is achieved by bounding the time an application takes to save its context” (Engler et al., 1995); meaning, each subsequent timer interrupt is recorded in an excess time counter (for each excess time slice consumed, a subsequent time slice is forfeited).
Processor Environments:
Information needed to deliver events to applications is stored in an Aegis processor environment. Processor environments contain four contexts required to support the four kinds of events delivered by Aegis: exceptions, interrupts, protected control transfers, and address translations. These event-handling contexts are required to define a process, and each context depends on the others for validity. The four context environments are as follows:
Exception context: contains a program counter for where to jump to and a pointer to physical memory for saving registers.
Interrupt context: includes program counters and register-save regions. However, it is a bit different in the case of timer interrupts. In this case, it specifies separate program counters for start-time-slice and end-time-slice cases.
Protected Entry context: specifies program counters for synchronous and asynchronous protected control transfers from other applications. Access control is managed by the application itself.
Addressing context: consists of a set of guaranteed mappings. It also includes an address space identifier, a status register, and a tag used to hash into the Aegis software TLB (translation look-aside buffer). (Engler et al., 1995)
Costs for Procedure & System calls:
The null procedure and system call costs (in microseconds) for Aegis are compared to those of the Ultrix (a monolithic OS) in Table 4. As shown in Table 4, the costs for null procedures are for the most part equivalent for both systems. The real difference in costs is shown in the system calls tests. While the Aegis has two system call paths to be measured, those that do not require a stack and those that do, both are still approximately an order of magnitude faster then that of the Ultrix. This tells us that the base cost for demultiplexing system calls in the Ultrix is significantly higher.
Exceptions:
In Aegis, all hardware exceptions, except for system calls, are dispatched to the applications. In order to do this, Aegis performs the following steps. It saves the three scratch registers into an agreed upon area using physical addresses. “It loads the exception program counter, the last virtual address that failed to have a valid translation, and the cause of the exception.” (Engler et al., 1995) Finally, “it uses the cause of the exception to perform an indirect jump to an application-specified program counter value, where it resumes execution with the appropriate permissions set.” (Engler et al., 1995)
“After processing an exception the application can resume execution without even entering the kernel.” (Engler et al., 1995) In order to do this, all registers that are saved must be in user-accessible memory locations. This is so applications can return from their own exceptions. Due to the low-level nature of Aegis, it is extremely efficient at dispatching exceptions. Table 5 shows exception dispatch times for the following exceptions in microseconds: unaligned pointer accesses (unalign), arithmetic overflow (overflow), attempted use of the floating point co-processor when it is disabled (coproc) and access to protected pages (prot). The times for unalign and coproc are not available for the Ultrix system, because unalign is not available, and coproc is not accessible by applications, and therefore cannot be utilized. However, in each of the cases, Aegis’s “dispatch times are approximately two orders of magnitudes faster then Ultrix.” (Engler et al., 1995)
Address Translations:
There are two problems in supporting application-level virtual memory. These problems are, a system must provide bootstrapping for the virtual naming system, and the system must support virtual memory efficiency. Through the use of a small number of guaranteed mappings, Aegis provides simple bootstrapping. In order to implement guaranteed mappings efficiently, the virtual address space of an application is partitioned into two segments: normal application data, and code.
Translation look-aside buffers must have a fast refill time in order to support virtual memory more efficiently at the application-level (LibOSes). For this purpose, “Aegis caches TLB entries in the kernel by overlaying the hardware TLB with a large software TLB (STLB).” (Engler et al., 1995) STLB is directly mapped and resides in unmapped physical memory. Aegis handles TLB misses by first checking to see if the required mapping is in the STLB. If it is, then Aegis installs it and resumes execution. If not, the miss is forwarded to the application. In Aegis, an STLB “hit” takes approximately one to two microseconds. However, to perform “an upcall to application level on a TLB miss, followed by a system call to install a new mapping is at least three to six microseconds” (Engler et al., 1995) longer, thus less efficient.
