CISC662 - COMPUTER SYSTEMS: ARCHITECTURE - Fall 2009

Assignment 3

IMPORTANT: Write your name at the beginning of the manuscript containing your summary and answers. Name the .pdf file with your name and the content of the file, e.g., homework1_mtaufer.pdf. Send the file to the instructor per e-mail before the deadline.

PART 1

Part C - Essay on Limits of Instruction Level Parallelism

After reading chapter 3 carefully and critically, write an essay that summarizes the limits of instruction level parallelism.

Help in composing your essay:

These questions help you to capture the major contributions of the chapter. Make sure that your essay answers this question in a concise and clear way.

  • State as accurately as possible the authors’ purpose for writing the chapter. What is the chapter trying to accomplish? What is the chapter central aim?
  • Identify the key questions in the mind of the authors when they wrote the chapter. What questions is the chapter raising? What questions is the chapter addressing?
  • Identify the facts, observations, and/or data the author is using to support their conclusions. Be quantitative. What information is the chapter using in coming to that conclusion? What experience has the chapter to support its claim? What information does the chapter need to settle the question? What is the authors taking for granted?
  • Identify the key conclusions the authors present in the chapter. How did the authors reach this conclusion?Is there another way to interpret the information?
  • What consequences are likely to follow if people take the author’s reasoning seriously? Trace the implications and consequences that follow from your reasoning. Search for negative as well as positive implications. Consider all possible consequence.

Constraints:

Your essay CANNOT be longer than 4 pages. You cannot use fonts smaller than 11pt and have margins smaller than 1 inch, top, bottom and sides. Please note that if you do NOTfollow these two roles, the essay will not be graded.

Evaluation:

The essay will be evaluated based on these criteria

Criteria / Great / Needs Work / Poor
Well planned beginning and ending / 5 / 3 / 1
Engaging, interesting verbal style / 5 / 3 / 1
Correct message conveyed with good detail / 5 / 3 / 1
Good use of data, tables, lists / 5 / 3 / 1
Convey understanding of the topic / 5 / 3 / 1
Total scores:

PART 2

IMPORTANT NOTES: Write clearly, be concise, and report only the information asked. Additional information (not required) will be considered as the indication that you have not correctly understood the question and therefore will penalize your final score.

SUGGESTION: If you are planning to take the preliminary exam, try to answer these questions with the book close. Take the time it will take your to answer the questions.

Question 1: Dynamic branch prediction

Several processors use dynamic scheduling whereby hardware rearranges the instructions execution to reduce stalls. Frequent branches and jumps demand that we attach potential stalls from control dependences. Hardware mechanisms can be used to dynamically predict branches and jumps, i.e., allow the processor to resolve outcome of a branch early, though preventing control dependences.

A. Briefly describe the 1-bit and 2-bit prediction schemas and outline their major performance shortcoming.

B. Briefly describe the correlating predictor schema or two-level predictor schema. Explain the meaning of the values “2” and “4” in a (2,4) predictor. In other words, what does “2” represent? And what does “4” represent?

C. Advanced mechanisms for dynamic branch prediction include branch–target buffer also called branch-target cache. This mechanism can provide 0-cycle unconditional branches and sometimes 0-cycle conditional branches. Briefly explain this technique and explain why it can provide 0-cycle unconditional branches.

D. Suppose we have a pipelined architecture that has 7 stages and on a particular benchmark, it has the following dynamic instruction mix.

Instruction TypeDynamic FrequencyCycle Count

------

LOADS24% 7

STORES15% 7

ALU37% 7

BRANCHES (taken)19% 7

BRANCHES (not taken) 5% 7

Suppose we are using dynamic branch prediction and that each misprediction costs a 2-cycle penalty. Assume there are no additional hazards. What branch prediction accuracy rate (in %) is needed in order to achieve a throughput of 0.90 instructions per cycle? (Show work)

Question 2: Memory Hierarchies

Memory blocks are mapped to caches in various ways. The cache configuration or structure determines the mapping. The memory address of the block and the cache configuration determine the placement of the block in the cache. There are three basic cache configurations: direct mapped (one-way set associative), set associative, and fully associative (n-way set associative).

A. Compare and contrast the three cache configurations in terms of:

a. Hardware resources - which configuration(s) need more resources and what are those resources?

b. Access time - which configuration(s) require more access time and why?)

c. Potential hit rate - which configuration(s) have (potentially) better hit rates and why? Is this true for all workloads?

B. Assume a direct-mapped cache, a two-way set-associative cache, and a fully-associative cache, each with storage for (the tags and data associated with) 8 blocks total. For the memory block with block address 20 (in decimal base), indicate all possible locations in the cache to which this block may be mapped by drawing a picture of each cache and marking an “X” in the targeted cache block.

C. Give short definitions of hit time, miss rate, miss penalty. What is the formula for the average memory access time for a three-level cache?

D. Indicate one technique that is used by modern computer systems to reduce (a) cache time hit; (b) cache miss rates; (c) cache miss penalties. For each technique, briefly explain why they can be effective.

Question 3: Memory Performance

The average memory access time formula (given below) provides us with a framework to present cache optimizations and techniques for improving cache performance.

Average memory access time = hit time + miss rate * miss penalty

where:

Hit time is the time to hit in the cache

Miss rate is the fraction of access that is not in the cache

Miss penalty is the additional time to serve the miss

A. Briefly explain why a smaller and simpler cache can reduce the hit time. Make sure that in your explanation you address both the impact of a smaller cache and the impact of a simpler cache.

B. Briefly explain why larger block sizes and higher associatively can decrease miss rate. Make sure that in your explanation you address both the impact of larger block sizes cache and the impact of higher associatively.

C. Briefly explain why a multi-level cache can reduce miss penalty.

D. Explain why a fully associative schema for block placement increases the hit time and therefore the vast majority of processor caches today are direct mapped, two-way set associative, or four-way set associative.

1