CSC/ECE 506: Architecture of Parallel Computers

Problem Set 3

Due Wednesday, October 12, 2016

There are 70 points on this problem set.

Problem 1. (20 points) Consider a virtual-memory system that can address a total of 232 bytes. Physical memory is only 16 MB but hard disk space is “unlimited.” Assume that page is 4 KB in size. To reduce access time the system uses a translation lookaside buffer (TLB). Suppose the memory system has the characteristics shown in the table below. The TLB and cache miss rates indicate how often the requested entry is not found. The main memory miss rate indicates how often page faults occur.

Memory / Access Time (cycles) / Miss Rate
TLB / 1 / 0.05%
Cache / 1 / 5%
Main memory / 100 / 0.0003%
Disk / 1,000,000 / 0%

(a) (8 points) What is the average memory access time of the virtual memory system before and after adding the TLB? Assume that the page table is always resident in physical memory and is never held in the data cache.

(b) (5 points) What is the average access time if it is a virtually mapped physically tagged cache after adding the TLB?

(c) (7 points) What is the size of the TLB, in bits, if it has 64 entries? Each entry contains the following: page-frame number, virtual page number, and a valid bit. Show your work clearly.

Problem 2. (15 points) The following is proposed as an alternate definition for SC.

•Each process issues memory requests in program order.

•After a read or write is issued, the issuing process waits for the operation to complete before issuing its next operation.

•Before a processor Pj can return a value written by another processor Pi, all operations that were performed with respect to Pi before it issued the store must also be performed with respect to Pj.

Are these conditions indeed sufficient to guarantee SC execution? If so, explain why. If not, construct a counterexample, and say why the conditions listed in Lecture 12 are indeed sufficient in that case. Hint: Think about how these conditions are different from the ones in Lecture 12.

Problem 3.(20 points)Below you will find two code segments. Determine what results are possible and what results are not, under sequential consistency. If a result is possible, list the execution sequence that will produce it. If a result is not possible, prove it is impossible. Assume all variables are initialized to 0.

(a) Processor 1 Processor 2

1ax = 1 2az = x
1by = x 2bx = y

(b) Processor 1 Processor 2
1ax = y 2ax = 1
1bz = 2 2b y = x

Problem 4. (15 points) Suppose a processor writes a block that is being shared by many processors (thus invalidating the block in their caches). If the line is subsequently re-read by the other processors, each will miss on the line. Researchers have suggested a read-broadcast scheme, in which if one processor reads the line, all other processors with invalid copies of the line read it into their second-level caches as well. Do you think this is a good extension of the protocol? Give two reasons to support your choice and at least one that tends to rebut it.

–1–