Homework 2 (Part 1)

Elec/Comp 425

Fall 2009

Homework 2 (Part 1)

Please use this Word document together with additional spreadsheets and charts. Print and submit hard copies. Delete problem statements to save space.

Assume the classic 5-stage pipeline for this problem. All memory accesses take 1 cycle, the clock rate is 1GHz, and the initial value of R3 equals the initial value of R2 + 400. Answer (a) through (c) with respect to the following code segment.

Loop:LDR1, 0(R2)| temp = A[index]

DADDIR1, R1, #1| temp++

SD0(R2), R1| A[index] = temp

DADDIR2, R2, #4| index += 4

DSUBR4, R3, R2|

BNEZR4, Loop| Loop if R4 not zero

Assume (i) No forwarding (ii) Branch resolved in the ID stage (iii) Branch resolved by flushing the pipeline if necessary. Complete the schedule in the Excel spreadsheet Q1, showing the schedule for the first 7 instructions of the loop. How many cycles does the program take to complete? What is the MIPS rating achieved for this code?

Cycle count:

MIPS:

Repeat part (a) assuming (i) normal forwarding hardware for data dependencies (ii) Branch target address computed in ID and resolved in EX (iii) Branch predicted as Not Taken.

Cycle count:

MIPS:

Repeat part (a) assuming 1 branch delay slot (b) normal data forwarding hardware (c) instruction reordering (changing operands if necessary, to execute in the minimum time). Show the instruction sequence and schedule in the spreadsheet.

Cycle count:

MIPS:

Loop:

A:L.DF2, 0(R1)| Load X[i]

B:MULT.D F4, F2, F0| a * X[i]

C:L.DF6, 0(R2)| Load Y[i]

D:ADD.DF6, F4, F6| A * X[i] + Y[i]

E:S.DF6, 0(R2)| Store Y[i]

DADDIR1, R1, #8| Increment X index

DADDIR2, R2, #8| Increment Y index

BNER1, R4, Loop| Iterate if limit not reached

Assume:

FP MUL is a 6-stage fully-pipelined unit
FP ADD is a 4-cycle non-pipelined unit
Full forwarding hardware
All LD and SD instructions always require 1 MEM cycle.
Branches are predicted as always taken and the target address is computed in the ID stage

Q2. a)Show the execution schedule of two iterations of the above code on a MIPS FP pipeline using the attached chart in Excel Spreadsheet Q2. (The pipeline structure is similar to Slide 1 of Lecture 9/25.)

b)How many cycles are required per iteration?

Q3. a) Show the execution schedule of two iterations of the above code on a Scoreboard architecture pipeline using the attached chart in Excel Spreadsheet Q3. Recall there is no data forwarding in the Scoreboard. However an update of a destination register and its read by a waiting instruction can occur in the same cycle.

(The pipeline structure is similar to Slide 1 of Lecture 10/2.)

For integer operations (like integer ADD, SUB etc) assume there is one EX stage of 1 cycle. That is, the integer pipeline consists of three stages of 1 cycle each: R, EX, and WB. Assume that a write to an integer register and one to an FP register can occur in the same cycle. Assume that the branch is resolved in the ID stage. That is registers are read and compared, and target address is computed in the ID stage.

b)How many cycles are required per iteration?

c) Describe the data flow graph created by the Scoreboard algorithm at different times by showing the edges into and out of the various registers following the event indicated. Make sure to differentiate green from red edges in your diagram. Use the provided template: Dataflow3. A through E refer to the first 5 loop instructions during iteration 1 ).