Fall 2016 CDA 5155 Homework 3

Date-assigned: Nov 3rd, 2016

Due dates: on-campus: Nov. 19, 2016; EDGE: Nov. 20, 2016

You are not allowed to take or give help in completing this assignment. Submit a typewritten PDF or Microsoft Word version on the Canvas website before the deadline. Scanned handwritten submissions will NOT be accepted. The necessary calculation procedure or explanation MUST be provided in your answer.

Total Points: 100 pts

  1. For the purpose of this exercise, assume a 512-byte cache with 64-byte blocks and a 2 KB main memory. We can regard the memory as an array of 64-byte blocks: M0, M1, …, M31. Figure B.30 sketches the memory blocks that can reside in different cache blocks if the cache were fully associative.
  a. [10] <B.1> Show the contents of the table if the cache is organized as a direct-mapped cache.
  b. [10] <B.1> Repeat part (a) with the cache organized as a four-way set associative cache.
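The mapping asked for in parts (a) and (b) comes down to modular arithmetic on block numbers. The sketch below is our own illustration of the index calculation for this cache geometry, not the answer table itself:

```python
# Block-to-frame mapping arithmetic for a 512 B cache with 64 B blocks
# (8 frames) and a 2 KB memory (32 blocks M0..M31).
CACHE_FRAMES = 512 // 64      # 8 cache block frames
MEM_BLOCKS = 2048 // 64       # 32 memory blocks

def direct_mapped_frame(block):
    """Direct-mapped: memory block i can live only in frame i mod (#frames)."""
    return block % CACHE_FRAMES

def four_way_set(block):
    """4-way set associative: 8 frames / 4 ways = 2 sets; block i maps to set i mod 2."""
    num_sets = CACHE_FRAMES // 4
    return block % num_sets

for i in range(MEM_BLOCKS):
    print(f"M{i}: frame {direct_mapped_frame(i)} (direct-mapped), set {four_way_set(i)} (4-way)")
```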
  2. The LRU replacement policy is based on the assumption that if address A1 was accessed less recently than address A2 in the past, then A2 will be accessed again before A1 in the future. Hence, A2 is given priority over A1. Discuss how this assumption fails to hold when a loop larger than the instruction cache is being continuously executed. For example, consider a fully associative 128-byte instruction cache with a 4-byte block (every block can hold exactly one instruction). The cache uses an LRU replacement policy.
  a. [10] <B.3> What is the asymptotic instruction miss rate for a 64-byte loop with a large number of iterations?
  b. [10] <B.3> Repeat part (a) for loop sizes of 192 bytes and 320 bytes.
  c. [5] <B.3> If the cache replacement policy is changed to most recently used (MRU) (replace the most recently accessed cache line), which of the three cases above (64-, 192-, or 320-byte loops) would benefit from this policy?
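One way to sanity-check a hand analysis of these parts is a small cache simulation. The harness below is our own sketch (the function name and iteration count are arbitrary choices); it models the fully associative 128-byte, 4-byte-block LRU instruction cache running a cyclic loop:

```python
from collections import OrderedDict

def lru_miss_rate(loop_bytes, cache_bytes=128, block_bytes=4, iters=1000):
    """Simulate a fully associative LRU instruction cache on a cyclic loop.

    Each 4-byte instruction occupies one block; the loop runs `iters` times
    and the overall miss rate is returned (it approaches the asymptotic
    rate for large `iters`).
    """
    lines = cache_bytes // block_bytes          # 32 cache lines
    n = loop_bytes // block_bytes               # instructions in the loop
    cache = OrderedDict()                       # keys kept in LRU order (oldest first)
    misses = accesses = 0
    for _ in range(iters):
        for pc in range(n):
            accesses += 1
            if pc in cache:
                cache.move_to_end(pc)           # hit: mark most recently used
            else:
                misses += 1
                if len(cache) == lines:
                    cache.popitem(last=False)   # evict the least recently used line
                cache[pc] = True
    return misses / accesses

for size in (64, 192, 320):
    print(f"{size}-byte loop: miss rate ~ {lru_miss_rate(size):.3f}")
```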
  3. To access data from a typical DRAM, we first have to activate the appropriate row. Assume that this brings an entire page of size 8 KB to the row buffer. Then we select a particular column from the row buffer. If subsequent accesses to DRAM are to the same page, then we can skip the activation step; otherwise, we have to close the current page and precharge the bitlines for the next activation. Another popular DRAM policy is to proactively close a page and precharge the bitlines as soon as an access is over. Assume that every read or write to DRAM is of size 64 bytes and that the DDR bus latency (Data out in Figure 2.30) for sending 512 bits is Tddr.
  a. [10] <2.3> Assuming DDR2-667, if it takes five cycles to precharge, five cycles to activate, and four cycles to read a column, for what value of the row buffer hit rate (r) will you choose one policy over the other to get the best access time? Assume that every access to DRAM is separated by enough time to finish a random new access.
  b. [5] <2.3> If 10% of the total accesses to DRAM happen back to back or contiguously without any time gap, how will your decision change?
  c. [10] <2.3> Calculate the difference in average DRAM energy per access between the two policies using the row buffer hit rate calculated above. Assume that precharging requires 2 nJ, activation requires 4 nJ, and 100 pJ/bit is required to read from or write to the row buffer.
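For the access-time comparison, the choice reduces to comparing expected cycles per access under the two policies; Tddr is common to both and cancels. The cost split below is an assumption you should justify in your answer (open-page pays precharge + activate + read on a row-buffer miss but only the column read on a hit; close-page always pays activate + read because the bitlines were precharged proactively):

```python
PRECHARGE, ACTIVATE, READ = 5, 5, 4   # cycles, as given in the problem

def open_page_cycles(r):
    """Expected cycles per access under the open-page policy with row-buffer hit rate r."""
    return r * READ + (1 - r) * (PRECHARGE + ACTIVATE + READ)

def close_page_cycles():
    """Close-page policy: every access re-activates an already-precharged row."""
    return ACTIVATE + READ

# Setting open = close and solving for r gives r = PRECHARGE / (PRECHARGE + ACTIVATE).
break_even = PRECHARGE / (PRECHARGE + ACTIVATE)
print(f"break-even hit rate r = {break_even:.2f}")
```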

  4. [10] <5.2> Many snooping coherence protocols have additional states, state transitions, or bus transactions to reduce the overhead of maintaining cache coherency. In Implementation 1 of Figure 5.36, misses incur fewer stall cycles when they are supplied by a cache than when they are supplied by memory. Some coherence protocols try to improve performance by increasing the frequency of this case. A common protocol optimization is to introduce an Owned state (usually denoted O). The Owned state behaves like the Shared state in that nodes may only read Owned blocks, but it behaves like the Modified state in that nodes must supply data on other nodes' read and write misses to Owned blocks. A read miss to a block in either the Modified or Owned state supplies data to the requesting node and transitions the block to the Owned state. A write miss to a block in either the Modified or Owned state supplies data to the requesting node and transitions the block to the Invalid state. This optimized MOSI protocol updates memory only when a node replaces a block in the Modified or Owned state. Draw new protocol diagrams with the additional state and transitions.
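Before drawing the full diagrams, it can help to tabulate the transitions described above. The sketch below is our own simplified view from one cache block's perspective; the event names are invented, and a complete diagram must also show CPU read/write transitions and the bus transactions involved:

```python
# Simplified MOSI transition table for snoop-side events on one cache block.
# M = Modified, O = Owned, S = Shared, I = Invalid.
MOSI = {
    # (current state, observed event): (next state, action)
    ("M", "other_read_miss"):  ("O", "supply data to requester"),
    ("M", "other_write_miss"): ("I", "supply data to requester"),
    ("O", "other_read_miss"):  ("O", "supply data to requester"),
    ("O", "other_write_miss"): ("I", "supply data to requester"),
    ("M", "replace"):          ("I", "write block back to memory"),
    ("O", "replace"):          ("I", "write block back to memory"),
    ("S", "other_write_miss"): ("I", "no data supplied"),
}

def next_state(state, event):
    """Look up the next state; events not in the table leave the block unchanged."""
    return MOSI.get((state, event), (state, "no change"))[0]
```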
  5. For the following code sequences and the timing parameters for Implementation 1 in Figure 5.36, compute the total stall cycles for the base MSI protocol and the optimized MOSI protocol described in the previous problem. Assume that state transitions that do not require bus transactions incur no additional stall cycles.

a. [5] <5.2> P0: read 110
             P3: read 110
             P0: read 110

b. [5] <5.2> P1: read 120
             P3: read 120
             P0: read 120

c. [5] <5.2> P0: write 120 <-- 80
             P3: read 120
             P0: read 120

d. [5] <5.2> P0: write 108 <-- 88
             P3: read 108
             P0: write 108 <-- 98

  6. Exercise 5.7 in the textbook.
  7. Exercise 5.9 in the textbook.