CES 524 Advanced Computer Architecture Spring 2009

HW # 2 solutions

1) Calculate the average CPI for each machine M1 and M2.

a.  Average CPI for M1 and M2:

For M1: 0.6*1 + 0.3*2 + 0.1*4 = 0.6 + 0.6 + 0.4 = 1.6

For M2: 0.6*2 + 0.3*3 + 0.1*4 = 1.2 + 0.9 + 0.4 = 2.5

b.  Calculate the average MIPS ratings for each machine, M1 and M2:

MIPS rating is equivalent to Instructions-Per-Seconds (in millions)
= no of cycles per sec/ avg # of cycles per instruction

= clock-rate/avg CPI

For M1: 80 x 106/1.6 = 50 MIPS

For M2: MIPS = 100 x 106/2.5 = 40 MIPS

c. Which machine has a smaller MIPS rating ? Which individual instruction class CPI do you need to change, and by how much, to have this machine have the same or better performance as the machine with the higher MIPS rating (you can change the CPI for one of the instruction classes on the slower machine) ?

M2 has the lower MIPS rating. It can be checked that by reducing the CPI of class instructions from 2 to 1, M2 can have a higher MIPS rating.

2. Suppose you have a machine that executes a program consisting of 50% floating point multiply, 20% floating point divide, and the remaining 30% are from other instructions.

(a) Management wants the machine to run four times faster. You can make the divide run at most 3 times faster and the multiply run at most 8 times faster. Can you meet management’s goal by making one improvement, and which one?

Solution: With the improvement in multiplication, the maximum speed-up we could possibly get is 2 since speed-up <= 1/f = 2

Similarly, we the improvement in division, the max speed up we can get is at most 1/f = 1/0.7 < 2

Thus in both cases, we can never get a speed-up of 4.

(b) With both improvements implemented, we get a speed-up of

s = 1/(f1 + f2/ p2 + f3 / p3) ~ 2.33

3. Problem 3.16

(a) The area of the circle = p d2/4

The area of each chip = u2

Total number of chips (including broken ones) =

The number of broken ones ~

The reason is that each broken one roughly covers of the circumference and since the length of the circumference is p d, the number of broken chips is approximately .

Thus the claim follows.

4.  Problem 5.13 (a) and (c)

(solution due to Muhammad Lutfi)

(a)

main:

la $s1, array # s1 = &A[0]

la $s2, length # s2 = &length

lw $s2,0($s2) # s2 = length

lw $t0,0($s1) # t0 = A[0] = max

addi $t1,$zero,0 # t1 = i = 0

loop:

add $t1,$t1,1 # i = i + 1

beq $t1,$s2,done # if (i==length) then done

add $t2,$t1,$t1 # t2 = 2*i

add $t2,$t2,$t2 # t2 = 4*i

add $t2,$s1,$t2 # t2 = &A[0] + 4*i

lw $t3,0($t2) # t3 = A[i]

slt $t4,$t0,$t3 # t4 = (max < A[i] ? 1 : 0)

beq $t4,$zero,loop # if (max >= A[i]) goto loop

addi $t0,$t3,0 # max = A[i]

j loop

done:

addi $v0,$zero,4 # print(answer)

la $a0,answer

syscall

addi $a0,$t0,0 # print(max)

li $v0,1

syscall

addi $v0,$zero,4 # print(newln)

la $a0,newln

syscall

(c)

main:

la $s1, array # s1 = &A[0]

la $s2, length # s2 = &length

lw $s2,0($s2) # s2 = length

lw $t0,0($s1) # t0 = A[0] = max

andi $t0,$t0,0x0ff # t0 contains the LSB of A[0]

addi $t1,$zero,0 # t1 = i = 0

addi $t5,$zero,0 # t5 = idx_of_max = 0

loop:

add $t1,$t1,1 # i = i + 1

beq $t1,$s2,done # if (i==length) then done

add $t2,$t1,$t1 # t2 = 2*i

add $t2,$t2,$t2 # t2 = 4*i

add $t2,$s1,$t2 # t2 = &A[0] + 4*i

lw $t3,0($t2) # t3 = A[i]

andi $t3,$t3,0x0ff # t3 = LSB of A[i]

bge $t0,$t3,loop # goto loop if max >= A[i]

addi $t0,$t3,0 # max = A[i]

andi $t0,$t0,0x0ff # max = LSB of A[i]

addi $t5,$t1,0 # t5 = current idx

j loop

done:

addi $v0,$zero,4 # print(answer)

la $a0,answer

syscall

addi $a0,$t5,0 # print(idx_max)

li $v0,1

syscall

addi $v0,$zero,4 # print(newln)

la $a0,newln

syscall

exit: