Mac OS X: A Modern UNIX-Based Operating System

Nathan Henkel

Stephen Ayers

Dan Hodos

David Der

William Hunt

Table of Contents

Overview of Mac OS X

Data Structures and Abstractions

Thread Management

Mutual Exclusion and Synchronization

Interobject Communication

Memory Management

File System and File Management

Conclusion

Overview

In the early 1990s, Apple Computer, Inc. began planning a next-generation operating system (OS) that would be stable, portable, and easy to use, and that would outperform the existing Mac OS (Glaser, 2001). In February 1994, Apple announced the beginning of Copland, its first attempt to create a modern OS. In August 1996, Apple killed the Copland project because “Copland sucked tremendous money, resources, and time out of Apple and ultimately delivered … nothing. After years of promises and untold millions (billions?) spent on the project, Apple shelved its attempts at building its own next-generation operating system and started looking outside the company” (Garfinkel & Mahoney, 2002).

Eventually, Apple decided to purchase NeXT, Inc., the company started by Apple co-founder Steve Jobs, for $430 million (Mesa, 1998). The acquisition allowed Apple to use NeXT’s operating system, NeXTSTEP, as the basis for its new operating system. The new project was code-named Rhapsody and, after extensive integration work, was released to the public as Mac OS X in March 2001. “Mac OS X is an industrial-strength, modern operating system engineered for reliability, stability, scalability, and performance” (Apple, 2003). The new OS introduced a number of new features, including preemptive multitasking, protected memory, advanced memory management, and symmetric multiprocessing (Glaser & Reynolds, 2003).

Mac OS X is built using the Darwin kernel, an open-source UNIX-based core built on top of a Mach 3.0 microkernel along with 4.4BSD core services. On top of Darwin are the Quartz graphics system, OpenGL, and QuickTime. Quartz provides a window server as well as a graphics-rendering library that uses PDF (Portable Document Format) as its imaging model. OpenGL is the industry standard for high-end 2D and 3D graphics. QuickTime is a technology designed by Apple for the storage, manipulation, enhancement, and streaming of multimedia such as video or animation. On top of these graphics engines are a number of different programming APIs, namely Cocoa, Carbon, Classic, and Java; all of these APIs provide a way for the graphics engine to interact with Aqua, the user interface developed specifically for Mac OS X (Apple 3, 2003).

Apple has released three versions of Mac OS X since its initial release in March 2001: 10.1 in September 2001, 10.2 (codenamed Jaguar) in August 2002, and most recently 10.3 (codenamed Panther) in October 2003. Every iteration of the operating system introduced performance enhancements, increased stability, and new features to Apple computers; these innovations, as well as the ease with which applications can be created for Mac OS X, had spurred over five million users to adopt Mac OS X as of January 2003 (Apple 2, 2003). The advancements made in the operating system, coupled with new hardware debuts such as the iPod, Power Macintosh G5, and iMac, have combined to pull Apple out of the financial trouble it encountered in the early 1990s.

Mac OS X is designed to run exclusively on Apple's hardware. There is a good reason for this: Apple is primarily a hardware company, and its revenues from hardware greatly exceed its revenues from software. Mac OS X and many other software applications designed by Apple are intended to increase the marketability of Apple hardware (which is why many quality application programs from Apple are provided free of charge). Apple computer hardware includes servers, professional and consumer grade desktop systems, and professional and consumer grade laptop systems, and spans both 32-bit and 64-bit architectures. These are the only hardware environments on which Mac OS X is designed to run, allowing Apple a great deal of control over the integration of the hardware and operating system. For example, because of Mac OS X's powerful and simple multiprocessing support, Apple nearly always includes dual-processor configurations in its lineup. Mac OS X is designed as a multiuser system and can easily be deployed over a network. In Mac OS X, Apple has attempted to preserve the ease of use found in prior versions of the Mac OS, maintain its market base with creative professionals, and expand into new markets.

In this paper, we will discuss several of the technical aspects of Mac OS X, including data structures and abstractions, thread management, mutual exclusion and synchronization, interobject communication, memory management, and the file system. We have confined our discussion to elements found in the Mac OS X kernel environment. Although there is much that is interesting about the Macintosh APIs and Aqua (the user interface), a discussion of these elements is beyond the scope of this paper.

Data Structures and Abstractions

Because Mach possesses a fundamentally object-oriented design, it is important to understand the abstractions and data structures defined in Mach before proceeding with a discussion of the Macintosh operating system. It will also be useful to explain some of the data structures specific to Mac OS X.

The fundamental unit of resource ownership in Mach is the task. A task is similar, but not identical, to a process. The distinction is most apparent in the management of threads. Although a thread "belongs" to a particular task, and can only be accessed through that task's ports, for purposes of execution Mach does not care whether two threads come from the same task or from distinct tasks. Because Mach has no traditional process, the thread is its basic and sole unit of execution. Mac OS X does, however, define a process in the BSD part of the kernel to represent a running program. In this sense, a process represents a Mach task and some number of Mach threads. The notion of a process is important for purposes of controlling execution from a high level. Threads and thread scheduling are covered in more detail in a later section of this paper. (Rashid et al. 1987, Apple KP 2003)

Each task consists of a virtual address space (4 GB in OS X), some number of ports, and some number of threads. A port is the channel of communication to any object. Ports are kernel-protected so that any object sending a message to a port must have the appropriate port right (permission). Port rights are passed by inheritance or granted by the kernel. Port rights are also discussed in greater detail in a later section. (Apple KP 2003, Silberschatz et al. 2 2003)

A final Mach abstraction is the memory object. Although several memory objects may be within the address space of a particular task, the task does not own these memory objects. Memory objects are owned by memory managers, which may be either user-level or kernel-level. More on memory objects and memory managers is covered in the section on Memory Management. (Apple KP 2003)

Managing Threads

It is worth noting once again that although a task is in some ways comparable to a process, there are differences sufficient to make the distinction important. A task is a collection of resources. As such, it does not have a "state" as a process does. In Mac OS X, three states are defined for threads: Ready, Stopped, and Running. These correspond to the traditional process states of Ready, Waiting, and Running, respectively. (Apple KP 2003)

A similarity between tasks and processes is that both are very expensive to create or destroy. Mach threads are, like threads in any other operating system, far less expensive to create or destroy. (Apple KP 2003)

Although Mac OS X does define different kinds of threads, at the lowest level, every thread is a Mach thread (a kernel-level thread). This structure allows for great flexibility in thread scheduling, as we shall see. On top of Mach threads, the first level of thread definition is the pthread (POSIX thread). Every Mach thread running in user space has a pthread layered on top of it. A pthread can in turn support an NS (Cocoa) thread, a Carbon MP (multiprocessing) task (no relation to a Mach "task"), or a cooperative thread. (Apple 2001)

The advantage of having all threads built on top of generic Mach threads is that Mach sees no type difference among the various threads on the system. The task that owns a thread is irrelevant to the scheduler. They are all, simply, Mach threads. As such, Mach always follows its scheduling policies to select threads for execution. Because Mach imposes no per-type policy of its own, control can be exercised at a higher level, which is where the various thread definitions come in. For example, with cooperative threads, the owning process (not task) has a synchronization token (a Mach message) which it passes among the various cooperative threads belonging to it. Any thread without the token will block, so Mach will select the thread with the token over all those without it. Eventually the thread will yield the token to the Carbon Thread Manager, which will pass the token to the next cooperative thread. This is but one example of the many types of manipulation that can affect the scheduling decisions that Mach actually makes. (Apple KP 2003, Apple 2001)
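The token-passing idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: it uses Python's standard threading module rather than Mach messages or the Carbon Thread Manager, the token is modeled as one semaphore per thread, and the function name run_cooperative is hypothetical.

```python
import threading

def run_cooperative(num_threads, steps_per_thread):
    """Toy model: preemptive threads behave cooperatively by passing one token.

    Assumption: a per-thread semaphore stands in for the Mach message that
    acts as the synchronization token in the paper's description.
    """
    # Releasing gates[i] hands the token to thread i.
    gates = [threading.Semaphore(0) for _ in range(num_threads)]
    trace = []  # (thread index, step) in the order the work actually ran

    def worker(index):
        for step in range(steps_per_thread):
            gates[index].acquire()       # block until this thread holds the token
            trace.append((index, step))  # "run" while holding the token
            # Yield: stand-in for the thread manager passing the token on.
            gates[(index + 1) % num_threads].release()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    gates[0].release()                   # give the token to the first thread
    for t in threads:
        t.join()
    return trace
```

Although the underlying scheduler is free to pick any thread, only the token holder is ever runnable, so the threads execute in strict round-robin order regardless of the scheduler's own decisions.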

Mach makes available several scheduling policies. Three of these are recommended on OS X: the Thread Standard Policy (a system-defined "fair" policy), the Thread Time Constraint Policy (a soft real-time scheduling policy), and the Thread Precedence Policy (a priority policy). All these policies use preemption according to time quanta. The values for the time quanta vary with the scheduling policy. By mixing different Mach messages with the various policies, a great deal of flexibility is achieved in how thread scheduling will be handled. In Mac OS X, the kernel is not solely responsible for scheduling. (Apple KP 2003)

Mutual Exclusion and Synchronization

Mac OS X uses a traditional counting semaphore to achieve mutual exclusion. The semaphore is owned by a particular Mach task, and every semaphore must be assigned to some task when it is created (this is done via a parameter in the semaphore_create function). This task is also responsible for destroying the semaphore. Mac OS X defines three policies for semaphores: FIFO (first-in first-out), Fixed Priority, and Prepost.

Under the FIFO policy, waiting threads are released in the order in which they began waiting. Fixed Priority reorders the wait queue according to thread priority policies. Prepost prohibits the semaphore_signal function from incrementing the counter when no threads are in the queue. This creates a condition where threads must always wait until they are signaled. (Apple KP 2003)
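The difference between the FIFO and Fixed Priority policies can be illustrated with a small single-threaded simulation in Python. This is a toy model, not the Mach semaphore interface: the class ToySemaphore, its method names, and the numeric priorities are all hypothetical, and the Prepost policy is deliberately left out.

```python
from collections import deque

class ToySemaphore:
    """Toy counting semaphore that records which waiter each signal wakes."""

    def __init__(self, count, policy="fifo"):
        if policy not in ("fifo", "priority"):
            raise ValueError("policy must be 'fifo' or 'priority'")
        self.count = count
        self.policy = policy
        self.waiters = deque()  # (name, priority) pairs in arrival order

    def wait(self, name, priority=0):
        """Return True if acquired at once; otherwise queue the caller."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiters.append((name, priority))
        return False

    def signal(self):
        """Wake one waiter and return its name, or bump the count."""
        if not self.waiters:
            self.count += 1
            return None
        if self.policy == "fifo":
            name, _ = self.waiters.popleft()       # oldest waiter first
            return name
        # Highest priority first; max() keeps the earliest arrival on ties.
        best = max(self.waiters, key=lambda w: w[1])
        self.waiters.remove(best)
        return best[0]
```

With three waiters queued in the order b (priority 1), c (priority 5), d (priority 5), the FIFO policy wakes b, c, d, while the priority policy wakes c, d, b.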

Mac OS X also makes use of spinlocks, mutex locks, and read-write locks. Because preemption in Mac OS X is based on time quanta, a thread spinning on a lock that is not released will keep spinning until its time quantum expires. For this reason, there is usually no performance advantage gained by using spinlocks to avoid context switches, and spinlocks are recommended only where mutexes cannot be used. A thread attempts to acquire a mutex when it is about to enter its critical section; if the mutex is already held, the thread blocks, yielding the remainder of its time quantum. In general, this is preferable to a spinlock, but its use is limited to contexts where blocking is allowed. In contexts where blocking is not allowed (interrupt handlers, for example), mutex locks cannot be used. Read-write locks are locks that are not always exclusive. A read-write lock is used where multiple threads can safely read shared data at the same time, but only one thread can write at a time. As long as all the threads holding the lock are performing reads, any number of threads may hold a read-write lock. However, a thread that intends to write must block until all threads performing reads release their locks. (Apple KP 2003)
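The read-write rule described above (many readers or one writer) can be sketched with Python's standard threading.Condition. This is a minimal illustration, not the kernel's lock implementation; the class name ReadWriteLock and its methods are hypothetical, and this simple version makes no attempt to prevent a steady stream of readers from starving a writer.

```python
import threading

class ReadWriteLock:
    """Minimal read-write lock: any number of readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently reading
        self._writing = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writing:            # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            # A writer blocks until there are no readers and no other writer.
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()         # wake waiting readers and writers
```

Two calls to acquire_read from different readers both return immediately, whereas a thread calling acquire_write stays blocked until every reader has called release_read.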

Interobject Communication

As Mach does not implement processes, "interprocess communication" would be a misnomer. Moreover, it would be an inadequate description since all Mach objects interact in the same way. In Mach, every object has some number of ports, and certain port rights. Communication from one object is passed in a message to the port of another object. An object can only send a message to a port for which it possesses the appropriate port right. Each port allows some subset of the possible operations or communications, so that an object can only perform those operations that correspond to the ports it can access. Mach implements a highly streamlined message service. There are only three system calls needed for message transmission: msg_send (send a message), msg_receive (receive a message), and msg_rpc (remote procedure call, which sends a message and waits for a response). Messages from a single sender are queued according to a first-in, first-out scheme. Traditionally, implementations of message passing schemes have suffered from poor performance due to the frequent copy operations required (copying the message from the sender to the receiver, then copying the response from that object back to the original sender). Mach avoids this problem in most cases by mapping the memory containing the message into the receiver's address space (no copy is made). (Silberschatz et al. 2 2003, Apple KP 2003, Silberschatz et al. 2003)
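The interaction of ports, port rights, and the three message calls can be modeled in a short Python sketch. Everything here is a simplification under stated assumptions: the Port class is hypothetical, tasks are represented by plain strings, the functions merely borrow the names msg_send, msg_receive, and msg_rpc from the text and do not reproduce their real signatures, and the serve callback stands in for the receiving task running on its own.

```python
from collections import deque

class Port:
    """Toy port: one receiver, a set of permitted senders, a FIFO queue."""

    def __init__(self, receiver):
        self.receiver = receiver   # task holding the receive right
        self.senders = set()       # tasks holding a send right
        self.messages = deque()    # queued (sender, body) pairs

    def grant_send(self, task):
        self.senders.add(task)

def msg_send(task, port, body):
    """Queue a message; fails if the task lacks a send right."""
    if task not in port.senders:
        raise PermissionError(f"{task} has no send right for this port")
    # Only a reference to body is queued, loosely echoing "no copy is made".
    port.messages.append((task, body))

def msg_receive(task, port):
    """Dequeue the oldest message, or return None if the queue is empty."""
    if task != port.receiver:
        raise PermissionError(f"{task} has no receive right for this port")
    return port.messages.popleft() if port.messages else None

def msg_rpc(task, port, reply_port, body, serve):
    """Send a request, let serve() play the receiver, then read the reply."""
    msg_send(task, port, body)
    serve()
    return msg_receive(task, reply_port)
```

In this model, a client holding a send right to a server's port and a receive right on its own reply port can complete a round trip with a single msg_rpc call, while a task with no send right is refused by msg_send.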

Memory Management

In Mac OS X, memory management is handled almost exclusively by or through the Mach portion of the kernel. Mach implements a sparse virtual memory scheme. Each process in Mac OS X is allocated a four-gigabyte virtual address space, most of which is usually empty. This is intended to minimize the detrimental effects of fragmentation on the available memory space of a process. If the virtual space for an application is sufficiently large, it becomes unlikely that any allocation will be too large to fit in any of the available free regions. (Apple KP 2003, Silberschatz et al. 2 2003)
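A sparse address space can be pictured as a table that only holds entries for the pages actually in use. The Python sketch below is a toy model under stated assumptions: the 4 KB page size is assumed for illustration and is not taken from the text, the class SparseAddressSpace is hypothetical, and reads of untouched addresses simply return zero rather than faulting as a real system might.

```python
PAGE_SIZE = 4096                  # assumed page size, for illustration only
ADDRESS_SPACE = 4 * 1024 ** 3     # the 4 GB virtual address space

class SparseAddressSpace:
    """Toy sparse address space: backing storage exists only for used pages."""

    def __init__(self):
        self.pages = {}           # page number -> bytearray of PAGE_SIZE bytes

    def _locate(self, address):
        if not 0 <= address < ADDRESS_SPACE:
            raise ValueError("address outside the virtual address space")
        return divmod(address, PAGE_SIZE)   # (page number, offset in page)

    def write(self, address, value):
        page, offset = self._locate(address)
        if page not in self.pages:
            self.pages[page] = bytearray(PAGE_SIZE)  # allocate on first use
        self.pages[page][offset] = value

    def read(self, address):
        page, offset = self._locate(address)
        backing = self.pages.get(page)
        return backing[offset] if backing is not None else 0
```

Under the assumed page size, the address space contains 1,048,576 possible pages, yet writing one byte at each end of it allocates backing storage for only two of them.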