Paged VMM

CS 201 Computer Systems Programming, Chapter 15

“Virtual Memory Management Via Demand Paging”

Herbert G. Mayer, PSU CS

Status: 8/1/2014

Paged Virtual Memory Management (VMM) (11/26/2013)

With the advent of 64-bit architectures, paged Virtual Memory Management (VMM) experienced a revival in the 1990s. Originally, the driving principle for developing VMM systems (paged or segmented, or both on Multics [4]) was the need for more addressable program memory than was typically available in physical memory. Scarceness of physical memory was originally caused by the high cost of memory. The early technology of core memories carried a high price tag due to the tedious manual labor involved. Now the days of high cost per byte of memory are long gone, but the days of insufficient physical memory have returned, with larger data sets supported by 64-bit addresses.

Paged VMM is based on the idea that some memory areas can be relocated out to disk, while other, re-used areas can be moved back from disk into memory on demand. Disk space abounds, while main memory is more constrained. This relocation in and out, called swapping, can be handled transparently, thus imposing no additional burden on the application programmer. The system must detect situations in which a logical address references an object that is on disk, and must then perform the hidden swap-in automatically.

Paged VMM trades speed for address range. The loss in speed is caused by the logical-to-physical mapping and by the slow disk accesses. A typical disk access can be hundreds of thousands to millions of times more expensive, in number of cycles, than a memory access. However, if virtual memory mapping allows large programs to run, albeit slowly, that previously could not execute at all due to their memory demands, then the trade-off seems worth the slow speed. The real trade-off is enabling memory-hungry programs to run slowly vs. not executing at all.

Synopsis

  • High-Level Steps of Paged Virtual Memory Management
  • High-Level Mapping Steps (Two-Level Mapping)
  • Definitions
  • History of Paging
  • Goals and Methods of Paging
  • An Unrealistic Paging Scheme
  • A Realistic Paging Scheme for 32-bit Architecture
  • Typical PD and PT Entries
  • Page Replacement
  • Belady Anomaly
  • Bibliography

High-Level Steps of Paged Virtual Memory Management

  • Some instruction references a logical address
  • VMM determines whether the logical address maps onto a resident page
  • If yes, the memory access can complete. In a system with a data cache, such an access is fast; see also special purpose TLB cache for VMM.
  • If the memory access is a store operation (AKA write), that fact is recorded
  • If access does not address a resident page:
  • the missing page is found on disk and made available in memory (swap-in); or else:
  • the missing page is created for the first time ever, generally with initialization for system pages and no initialization for user pages; yet even for user pages a frame must be found first
  • Making a new page available requires finding memory space the size of a page, aligned on a page-size boundary; such a space is commonly named a page frame
  • If such space can be allocated from unused memory (usually during initial program execution), that page frame is now reserved, i.e. removed from, available memory space
  • If no page frame is available, some currently resident page must be swapped out; the freed space can then be reused for the new page
  • Note the preference for locating unmodified pages, i.e. pages with no writes: no swap-out needed!
  • Also note the preference: best to swap out a page from the very same process that is requesting a new page frame
  • Should the page to be replaced be dirty (i.e. modified), it must first be written to disk
  • Otherwise a copy already exists on disk and the swap-out operation is empty
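
The steps above can be sketched in Python. This is a minimal sketch; all names (resident, free_frames, on_disk) are illustrative, not from any real OS:

```python
# Minimal sketch of the page-fault handling steps above; all names
# (resident, free_frames, on_disk) are illustrative, not from any real OS.

resident = {}          # page number -> {"frame": int, "dirty": bool}
on_disk = set()        # pages whose current content is on disk
free_frames = [0, 1]   # available physical page frames

def touch(page, is_store):
    """Access a page; on a page fault, find or free a frame for it."""
    if page not in resident:                       # page fault
        if free_frames:
            frame = free_frames.pop()              # unused frame available
        else:
            # Prefer evicting a clean (unmodified) page: no swap-out needed.
            victim = min(resident, key=lambda p: resident[p]["dirty"])
            info = resident.pop(victim)
            if info["dirty"]:
                on_disk.add(victim)                # dirty: write it to disk
            frame = info["frame"]
        # Swap-in / first-time creation of page content is not modeled here;
        # only the frame bookkeeping is shown.
        resident[page] = {"frame": frame, "dirty": False}
    if is_store:
        resident[page]["dirty"] = True             # record the write (dirty bit)
```

For example, after touching pages 7, 8 (a write), and 9 with only two frames, the clean page 7 is evicted without a swap-out, and its frame is reused for page 9.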

High-Level Mapping Steps (Two-Level Mapping) for 32-bit architecture

  • Some instruction references the logical address (la) of some piece of data
  • Processor finds the datum in the L1 (or lower-level) data cache, and the operation is done;
  • else: the method for VMM mapping starts as follows:
  • Processor finds the start address of Page Directory (PD)
  • Logical address is partitioned into three bit fields, Page Directory (PD) Index, Page Table (PT) Index, and user-page Page Offset
  • The PD finds the PT, the PT finds the user Page Frame (actual page address)
  • Typical entries are 32-bits long, 4 bytes on byte-addressable architecture
  • The entry in the PD is found by adding the PD Index, left-shifted by 2 (AKA multiplied by 4), to the start address of the PD; this yields the address of the fitting PD slot; this slot holds the PT address
  • Add the PT Index, left-shifted by 2, to the PT address found previously; this yields the address of the PT entry, which holds the Page Address
  • Add the Page Offset to the previously found Page Address; this yields the actual user byte address
  • Along the way there may have been 3 page faults, with swap-outs and swap-ins
  • During book-keeping (e.g. searching for clean pages, locating the LRU page among all candidates) many more memory accesses can result
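
The partition of a 32-bit logical address into the three bit fields can be sketched as follows (field widths 10/10/12, matching the realistic scheme described later):

```python
def split(la):
    """Split a 32-bit logical address into PD index, PT index, page offset."""
    pd_index = (la >> 22) & 0x3FF   # top 10 bits: Page Directory index
    pt_index = (la >> 12) & 0x3FF   # middle 10 bits: Page Table index
    offset   =  la        & 0xFFF   # low 12 bits: byte offset on a 4 kB page
    return pd_index, pt_index, offset
```

For example, split(0x00403ABC) yields PD index 1, PT index 3, and offset 0xABC.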

Definitions

Alignment:

Attribute of some memory address A, stating that A must be evenly divisible by some power of two, the alignment number. For example, a word-aligned address A on some 4-byte, 32-bit architecture means that address A is evenly divisible by 4, or equivalently that the rightmost 2 address bits of A are both 0.
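
As a quick illustrative sketch of this test (the function name is hypothetical):

```python
def aligned(addr, power):
    """True iff addr is evenly divisible by 2**power, i.e. its low bits are 0."""
    return (addr & ((1 << power) - 1)) == 0

# Word alignment on a 4-byte architecture: the rightmost 2 bits must be 0,
# so aligned(addr, 2) is the word-alignment test.
```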

Demand Paging:

Policy that allocates a page in physical memory only if an address on that page is actually referenced (demanded) in the executing program. The virtual memory manager (VMM), part of the operating system, is responsible for the logical-to-physical page address mapping.

Dirty Bit:

Data structure (single-bit suffices) that tells whether the associated page was written after its last swap-in (or creation). Synonym M for modified bit, or W for written bit.

Global Page:

A page that is used in more than one program; typically found in a multi-programming environment with shared pages.

Logical Address:

Address as defined by the architecture; it is the address as seen by the programmer or compiler. Synonyms: virtual address, and on Intel architectures, linear address. Antonym: physical address.

Overlay:

Before the advent of VMM, programmers had to manually relocate information out of memory, to free space for the next data. This reuse of the same data area was called overlaying memory.

Page:

A portion of logical addressing space of a particular size and alignment. The start address of a page is an integral multiple of the page size, thus it is also page-size aligned. A logical page is placed into a physical page frame. Synonym: user page. Another technology: segment.

Page Directory:

A VMM data structure that lists the addresses of all Page Tables. Typically this directory consumes an integral number of page frames as well, e.g. 1 page. In addition to page table addresses, each entry in the Page Directory also contains information about presence, access rights, written-to status, etc. of the Page Tables.

Page Directory Base Register (pdbr):

Hardware register and VMM resource (machine register) that holds the address of the Page Directory page.

Page Fault:

A logical address references a page that is not resident. Consequently, space must be found for the referenced page, and that page must either be created, or be swapped in if it existed before.

Page Frame:

A portion of physical memory that is aligned and fixed in size, able to hold one page of content. It starts at a boundary evenly divisible by the page size. Total physical memory is an integral multiple of the page size. Logical memory is an integral multiple of the page size by definition on a binary system, where the page size is a power of 2.

Page Table:

A VMM data structure that lists the addresses of user Pages. Typically each page table consumes an integral number of page frames, e.g. 1. In addition to page addresses, each entry also contains information about presence, access rights, dirty, global, etc. similar to Page Directory entries.

Physical Memory:

Main memory actually available physically on a processor. Antonym: Logical memory.

Present Bit:

Single-bit VMM data structure that tells whether the associated page is resident or swapped out onto disk.

Resident:

Attribute of a page (or of any memory object) referenced by executing code: an object physically in memory is called resident; if it is not in memory, then it is non-resident.

Swap:

Either swap-in or swap-out.

Swap-in:

Transfer of a page of information from secondary storage to primary storage, i.e. a page frame in memory; AKA the transfer from disk to physical memory.

Swap-out:

Transfer of a page of information from primary to secondary storage; from physical memory to disk.

Thrashing:

Excessive amount of swapping. When this happens, performance is severely degraded. This is an indicator that the working set is too small.

Translation Look-Aside Buffer:

Special-purpose VMM cache for storing recently used Page Directory and Page Table entries. AKA TLB.

Virtual Contiguity:

Memory management policy that separates physical from logical memory. In particular, virtual contiguity creates the impression that two addresses n and n+1 are adjacent in memory, while in reality they are an arbitrary number of physical locations apart from one another. Usually, they are an integral positive or negative multiple of page-sizes apart.

Virtual Memory:

Memory management policy that separates physical from logical memory. In particular, virtual memory can create the impression that a larger amount of memory is addressable than is really available on the target.

Working Set:

The number of allocated physical page frames needed to guarantee that the program executes without thrashing. The working set is unique to each piece of software; moreover, it varies with the input data of a particular execution of such a piece of SW.

History of Paging

  • Invented in the 1960s at the University of Manchester for Atlas Computer
  • Used commercially in KDF-9 computer [5] of English Electric Co.
  • In fact, the KDF-9 was one of the major architectural milestones in computer history, aside from the awesome von Neumann and Atanasoff machines; it incorporated the first cache and VMM, had a hardware display for stack frame addresses, etc.
  • In the late 1960s to early 1980s, memory was expensive, processors were expensive and getting fast, and programs grew large. Insufficient memories were common and were one aspect of the growing Software Crisis of the late 1970s
  • 16-bit minicomputers and 32-bit mainframes became common; also 18-bit address architectures with 60-bit words (CDC and Cyber) were used
  • Paging grew increasingly popular: fast execution was gladly traded for a large address range, at the cost of slower performance
  • By early to mid 1980s, memories became cheaper, faster
  • By the late 1980s, memories had become cheaper yet, the address range remained 32-bit, and large physical memories became possible and available
  • Several supercomputers were designed with operating systems that provided no memory virtualization at all, no memory mapping. For example, Cray systems were built without VMM. The Intel Hypercube NX® operating system had no virtual memory management
  • Just when VMM was falling into disfavor, the addressing limitation of 32 bits started to constrain real programs. In the 1990s, 64-bit architectures became common-place, rather than an exception
  • Intermediate steps between evolving architectural generations: the Harris 3-byte system with 24-bit addresses, not quite making the jump to 32 bits. The Pentium Pro® with 36 bits in extended addressing mode, not quite yet making the jump to 64-bit addresses. Even the earliest Itanium® family processors had only 44 physical address bits, not quite making the jump to the full physical 64-bit address space.
  • By the end of the 1990s, 64-bit addresses were common, 64-bit integer arithmetic was generally performed in hardware, no longer via slow library extensions
  • Pentium Pro has 4kB pages by default, 4MB pages if page size extension (pse) bit is set in normal addressing mode
  • However, if the physical address extension (pae) bit and pse bit are set, the default page size changes from 4kB to 2 MB

Goals and Methods of Paging (VMM)

  • Make full logical (virtual) address space available, even if smaller physical memory installed
  • Perform the mapping transparently. Thus, if at a later time the target computer receives a larger physical memory, the same program will run unchanged on the upgraded computer, but somewhat faster
  • Map logical onto physical address, even if this costs some performance
  • Implement the mapping in a way that the overhead is small in relation to the total program execution
  • A necessary requirement is a sufficiently large working set, but also:
  • This is accomplished by caching page directory and page table entries
  • Or by placing the complete page directory into a special cache; rarely done

An Unrealistic Paging Scheme

  • On a 32-bit architecture, byte-addressable, 4-byte words, 4kB page size
  • Single-level paging mechanism; later we’ll cover multi-level mapping
  • With a 4kB page, the rightmost 12 bits of each page address are all 0, i.e. implied
  • The Page Table thus has 2^20 entries (32-12 = 20 index bits), each entry typically 4 bytes
  • Page offset of 12 bits identifies each byte on a 4kB page
  • With each Page Table entry consuming 4 bytes, this results in a page table of 4 MB
  • This is not a good paging scheme, since a 4 MB overhead is already consumed from the scarce resource, physical memory. Note that Page Tables should be resident as long as some entries point to resident user pages
  • Thus the overhead may exceed the total available resource, which is memory
  • Moreover, almost all entries are empty; their associated pages do not exist yet; the pointers are null!

  • Problem: The one data structure that should be resident is too large, consuming much or even most of the physical memory that is so scarce, or was so scarce in the 1970s, when VMM was productized
  • So: break the table overhead into smaller units!
  • Disadvantage of additional mapping: more memory accesses
  • Advantage of more mapping: Avoiding large 4MB Page Table
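
The size arithmetic above can be checked directly (32-bit addresses, 4 kB pages, 4-byte entries):

```python
# Single-level page table size for the unrealistic scheme above.
address_bits = 32
page_bits    = 12                      # 4 kB page -> 12 offset bits
entry_bytes  = 4

entries     = 2 ** (address_bits - page_bits)  # 2^20 page-table entries
table_bytes = entries * entry_bytes            # size of the one big table

assert entries == 1 << 20              # 1,048,576 entries
assert table_bytes == 4 * 1024 * 1024  # 4 MB of always-resident overhead
```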

A Realistic Paging Scheme for 32-bit Architecture

  • Again assuming a 32-bit architecture, byte-addressable, 4-byte words, 4kB page size
  • Two-level paging mechanism, consisting of Page Directory and Page Tables in addition to user pages, with pdbr
  • With a 4kB page frame size, again the rightmost 12 bits of any page address are 0, i.e. implied
  • All user pages, page tables, and the page directory can fit into identically sized page frames!

Mechanism to determine a physical address:

  • Start with some HW register, such as a pdbr, that points to the Page Directory
  • Or else implement the Page Directory as a special-purpose cache
  • Or else place the PD into an a-priori known memory location
  • But there must be a way to locate the PD, and best without a memory access
  • Design principle: have user pages, Page Tables, and Page Directory look the same, for example, all should consume one page frame each
  • Every logical address is broken into three parts:
  • two indices, generally 10 bits on a 32-bit architecture with 4 kB page frames
  • one offset, generally 12 bits, to locate any offset on a 4 kB page
  • total adds up to 32 bits
  • The Page Directory Index is in fact just an index; to find the actual entry in the PD, first shift left by 2 (<< 2, or multiply by 4), and then add this number to the start address of the PD; thus a PD element (a slot) can be found
  • Similarly, the Page Table Index is an index; to find the entry in a page table, left-shift it by two (<< 2, or multiply by 4) and add the result to the start address of the PT found in the previous step; thus an entry in the PT is found
  • PT entry holds the address of the user page, yet the rightmost 12 bits are all implied, are all 0, hence don’t need to be stored in the page table
  • The rightmost 12 bits of a logical address define the page offset within the found physical user page
  • Since by assumption, page frames are 4 kB in size, 12 bits suffice to identify any byte in such a user page
  • Once the address of the user page is found, add the offset to that address; the final byte address (physical address) is thus identified
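
The mechanism above can be sketched in Python; physical memory is modeled as a dict (address to 32-bit word), and all concrete addresses are illustrative assumptions, not from any real system:

```python
# Sketch of the two-level lookup; "mem" models physical memory as a dict
# (addr -> word). All concrete addresses here are illustrative only.
mem = {}
PDBR = 0x1000                       # pdbr: start address of the Page Directory
mem[PDBR + (1 << 2)] = 0x2000       # PD slot 1 holds the address of a PT
mem[0x2000 + (3 << 2)] = 0x9A000    # PT slot 3 holds a user page address

def translate(la):
    """Map a 32-bit logical address to a physical byte address."""
    pd_index = (la >> 22) & 0x3FF   # top 10 bits
    pt_index = (la >> 12) & 0x3FF   # middle 10 bits
    offset   =  la        & 0xFFF   # low 12 bits
    pt_base   = mem[PDBR + (pd_index << 2)]     # memory access 1: PD entry
    page_base = mem[pt_base + (pt_index << 2)]  # memory access 2: PT entry
    return page_base + offset       # memory access 3 then reads the user byte
```

Here translate(0x00403ABC) walks PD slot 1 and PT slot 3 and yields 0x9AABC, illustrating the up-to-3 memory accesses counted below.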

  • Disadvantage: Multiple memory accesses, up to 3; can be worse if the page-replacement algorithm also has to search through memory
  • In fact, it can become far worse, since any of these 3 accesses could cause page faults, resulting in 3 swap-ins, with disk accesses and many memory references; it is a good design principle never to swap out the page directory, and rarely, if ever, the page tables!
  • Some of these 3 accesses may cause swap-out, if a page frame has to be located, making matters even worse
  • Performance loss could be tremendous; i.e. many decimal orders of magnitude slower than a single memory access
  • Thus some of these data structures should be cached
  • In all high-performance architectures –e.g. Intel Pentium Pro®– a Translation Look-Aside Buffer (TLB) is designed as a special purpose cache of PD or PT entries
  • It is also possible to cache the complete PD, since it is of contained size (4 kB)
  • Conclusion: VMM via paging works only if locality is good; put another way, paged VMM works well only if the working set is large enough; else there will be thrashing

Typical PD and PT Entries

  • Entries in the PD and PT need to explicitly store only 20 bits of a logical 32-bit address; the lower 12 bits of a 32-bit address are implied, they are all 0
  • The other 12 bits (beyond the 20) in any page directory or page table entry can be used for bookkeeping information; for example:
  • P-Bit, AKA Present bit, indicating: is the referenced page present (i.e. resident)?
  • R/W/E bit: Can the referenced page be read, written, executed, all?
  • User/Supervisor bit, OS-dependent: is page reserved for privileged code?
  • Typically the P-Bit is positioned for quick, easy access: rightmost bit in 4-byte word
  • Some information is unique to PD or PT
  • For example, the P-Bit is not needed in PD entries if Page Tables must be permanently present
  • On systems with varying page sizes, page size information must be recorded
  • Example:
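
As an illustrative sketch (the bit layout below follows the description above, with the P-bit rightmost and an assumed writable bit next to it; it is not guaranteed to match any particular CPU):

```python
def make_entry(frame_addr, present, writable):
    """Pack a 4 kB-aligned frame address plus bookkeeping bits into 32 bits."""
    assert frame_addr & 0xFFF == 0   # low 12 bits of the address are implied 0
    return frame_addr | (writable << 1) | present  # P-bit: rightmost bit

def decode_entry(entry):
    """Unpack a 32-bit PD/PT entry into address and bookkeeping bits."""
    frame_addr = entry & 0xFFFFF000  # upper 20 bits: the frame address
    present    = entry & 1           # rightmost bit: P-bit, for quick access
    writable   = (entry >> 1) & 1    # assumed position of the write bit
    return frame_addr, present, writable
```

For instance, packing frame address 0x9A000 with both bits set yields entry 0x9A003, and decoding recovers the original fields.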