Spring 2008 CDA5155 Homework 3

Date assigned: Mar. 25, 2008

Due date: Apr. 4, 2008 (11:59 pm)

Due date for EDGE: Apr. 11, 2008 (11:59 pm)

Total Points: 100 pts

(Important: if the exercise is from case studies of textbook, please read the case study description carefully before solving problem)

Exercise 1 (5/5/5/10 pts)

Refer to Figure C.24 for TLB.

Assume caches with the following parameters:

Cache size: 1MB; Line size: 64B;

Page size: 8 KB

Address (virtual) width: 38 bits

Address (physical) width: 32 bits

TLB (translate fixed 8-Kbytes pages): 512 entries, 8-way set-associative

Cache topology: a. direct-mapped, b. 8-way, c. fully-associative

Answer the following questions:

(a) Compute the number of bits of the tags fields and index fields for each of the three cache topologies.

(b) Compute the width of virtual address tag and the width of the translated physical page address in TLB.

(c) Assume the valid entry percentage of TLB is 95%, while 90% for cache, what is the ratio of the coverage between TLB and cache, in terms of the total size of memory blocks that can be records in TLB and in cache.

(d) Does fully-associative cache always perform better than direct-mapped cache? If the answer is no, give an example.

Exercise 2 (10/15 pts)

The following diagram shows a snooping-bus based cache-coherent system with 2 processors. Each processor is equipped with a single-level cache. Each cache has 4 lines with 2-way set-associative design. Assume that cache lines A, B, C, and D are mapped to the set 0 and lines E, F, G, and H are mapped to set 1. The diagram and assumptions are used for answering following questions.

a. Uniprocessor cache misses can be classified into 3Cs. Describe the definitions of these three Cs. In addition, fill in cache content as well as hits or miss (including the type of misses) in the following table, assuming the requests are issued from left to right and the LRU replacement policy is used.

·  Compulsory using 1 to represent

·  Capacity using 2 to represent

·  Conflict using 3 to represent

·  The left block in the cache set it MRU block, while the right block is LRU block.

Proc. / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1 / P1
Addr. / A / B / C / A / B / C / E / F / G / C / B / A / G / E
Cache
Set 0 / A / B, A
Cache
Set 1 / -
Hit/Miss / M / M
Miss Type? / 1 / 1

b. In a multiprocessor environment, another type of cache misses (called coherence miss) may be encountered due to maintaining cache coherence using the MSI 3-state invalidation-based protocol in figure 4.6. Again, describe the definition of the coherence miss. Also, fill in the cache content as well as hit/miss (including the type of misses) in the following table. Note that the request type can either be a read or a write. Note also that a compulsory miss is encountered when the line is first accessed (either read or write) in each processor. A write that hits a shared copy is considered a hit, but an invalidation of other copies must be performed.

·  Coherence miss using 4 to represent

Proc. / P1 / P1 / P1 / P2 / P1 / P2 / P2 / P1 / P1 / P2 / P1 / P1
Req.
type / Read / Read / Read / Write / Read / Read / Write / Read / Read / Write / Read / Read
Addr. / A / B / C / C / C / C / B / D / B / B / C / B
Cache
P1/Set 0 / A / B,A / C,B
Cache
P1/Set 1 / -
Cache
P2/Set 0 / -
Cache
P2/Set 1 / -
Hit/
Miss? / M / M / M
Miss Type? / 1 / 1 / 1

Exercise 3 (10/10/5 pts)

Answer the following questions about directory-based coherence protocol and the memory consistency in a shared memory multiple processor system.

a. Assume the cache lines contains A1=1 and A2=2 are uncached initially. A1 and A2 are different block and are mapped to the same cache frame in each cache. Consider the following sequence of memory operations:

P1: Read A1

P2: Write A1=3

P2: Write A2=4

P1: Read A2

Simulate the cache coherence activities similar to the example given in the lecture using the following table (Additional spaces may be given for each memory operation).

The original directory-based coherence protocol is given in Figure 4.21 and 4.22 in the textbook and commands are in Figure 4.20.

P1 / P2 / Bus / Directory / Memory
Step / State / Addr / Value / State / Addr / Value / Action / Proc / Addr / Value / Addr / State / {Pro} / Value
P1: Read
A1
P2: A1=1
P2: A2=2
P1: Read A2

b. Now assuming that P1 and P2 independently execute the following instructions:

(A1 and A2 are in different blocks.)

P1: Read A2 P2: Write A1=10

Read A1 Write A2=20

Both A1 and A2 are shared in both caches of P1 and P2 with the initial value as in part a. Write all combinations of A1 and A2 values that are read by P1 without violating the sequential consistency. Also, write all combinations of A1 and A2 values that are read by P1 with a violation of sequential consistency.

c. Comparing the performance of snooping-bus protocol and directory-based protocol, which one is better? Why? (You don’t need to simulate a specific program, just answer by 2-3 sentences)

Exercise 4 (25 pts)

Solve Exercise 4.3 the book (H & P, 4th Edition, Page 267)

Please draw your optimized 4-state protocol diagrams with transitions only.