Final exam

Computer Systems Architecture

27 January 2012

6 questions, 100 points

Please show the details of your solutions. If a question is unclear, state your assumptions.

Question 1: Given the following assembly code:

int func1(int x,int y){

pushl %ebp

movl %esp,%ebp

movl 8(%ebp), %eax

cmpl 12(%ebp), %eax

jl .L2

movl 8(%ebp),%eax

jmp .L3

.L2:

movl 12(%ebp), %eax

.L3:

popl %ebp

ret

}

Parameters x and y are stored at offsets 8 and 12, respectively, from the address in register %ebp. The function leaves its return value in register %eax. Write the original C function. (20 points)

Question 2: For the following code, what is the final value of register %eax? (20 points)

pushl %ebp

movl %esp, %ebp

movl 12(%ebp), %eax

movl 8(%ebp), %edx

movl %eax, %ecx

sall %cl, %edx

movl 16(%ebp), %eax

movl (%eax), %eax

leal (%edx,%eax), %eax

popl %ebp

ret

Assume that the registers hold the following values before this code starts:

%ebp 0xbffff398

%esp 0xbffff36c

and that memory contains:

0xbffff370: 0x00000064

0xbffff374: 0x00000005

0xbffff378: 0xbffff384

0xbffff37C: 0xbffff380

0xbffff380: 0x00000009

0xbffff384: 0x0000000a
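
As a reminder of the instruction semantics involved, the following is a minimal C sketch of what `sall %cl, %edx`, `movl (%eax), %eax`, and `leal (%edx,%eax), %eax` do. The helper names (`sall`, `leal_base_index`, `shift_then_add`) are hypothetical and chosen only for illustration; the sketch assumes 32-bit registers and does not model the stack frame of the question.

```c
#include <stdint.h>

/* sall %cl, %edx: shift left; for 32-bit operands x86 uses only the
   low 5 bits of the count held in %cl. */
static uint32_t sall(uint32_t value, uint32_t count)
{
    return value << (count & 31);
}

/* leal (%edx,%eax), %eax: computes base + index as plain arithmetic;
   no memory is accessed. */
static uint32_t leal_base_index(uint32_t base, uint32_t index)
{
    return base + index;
}

/* Hypothetical helper combining the three steps. */
uint32_t shift_then_add(uint32_t x, uint32_t n, const uint32_t *p)
{
    uint32_t edx = sall(x, n);          /* sall %cl, %edx            */
    uint32_t eax = *p;                  /* movl (%eax), %eax         */
    return leal_base_index(edx, eax);   /* leal (%edx,%eax), %eax    */
}
```

To apply it to the question, first work out which memory words `8(%ebp)`, `12(%ebp)`, and `16(%ebp)` refer to after the `pushl` and `movl %esp, %ebp` have executed.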

Question 3: Assume that for each function call, the compiler grows the stack by the size of the local variables plus the return address only. (10 points)

For the following three functions, what does the final stack look like, assuming that the current frame belongs to function c and that register %esp holds the address 0x8000 before function a is called?

double c(int x, int *y, char w){

double z;

z = x+*y+w; /* you are in this frame */

return z;

}

double b(int x, int *y, char w, char z){

return c(x,y,w) + z;

}

double a(int x, int *y){

char w,z;

z = 3;

w = 4;

return b(x,y,w,z);

}

int main(){

int x,y;

double z;

x = 1; y= 2;

z=a(x,&y);

}

Question 4: Assume a 5-stage microprocessor (Fetch, Decode, Execute, Memory, and Writeback). Calculate the total number of cycles needed to execute the instructions below on:

1) A system without pipelining

2) A pipelined system

This is an in-order machine with a single ALU that performs all arithmetic operations. A multiply instruction takes 3 cycles (the multiplier is pipelined), a divide instruction takes 5 cycles (the divider is not pipelined), and every other instruction takes 1 cycle. (20 points)

  1. MUL R1, R2, R3
  2. DIV R4, R1, R6
  3. LD R9, [R4]
  4. MUL R2, R1, R3
  5. LD R3, [R2]
  6. MUL R4, R3, R6
  7. LD R5, [R4]
  8. MUL R7, R8, R9
  9. DIV R4, R10, R4
  10. ADD R8, R9, R11
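
For the non-pipelined case, one possible way to set up the count is sketched below in C. It rests on assumptions the question does not state: that the quoted latencies apply to the Execute stage only, that the other four stages take one cycle each, and that an instruction starts only after the previous one has finished. The function name `unpipelined_cycles` and the constant `OTHER_STAGES` are hypothetical.

```c
#include <stddef.h>

/* Assumption: Fetch, Decode, Memory, and Writeback take 1 cycle each. */
#define OTHER_STAGES 4

/* Sums the cycles of a strictly sequential (non-pipelined) execution.
   exec_latency[i] is the Execute-stage latency of instruction i,
   e.g. 3 for MUL, 5 for DIV, 1 for anything else. */
unsigned unpipelined_cycles(const unsigned *exec_latency, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += OTHER_STAGES + exec_latency[i];
    return total;
}
```

The pipelined case cannot be reduced to a sum like this, because data dependences between instructions and the non-pipelined divider introduce stalls that depend on instruction order. If you interpret the latencies differently, state that assumption in your answer.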

Question 5: Given the following load pattern, calculate the total number of cache misses for a 2-set cache (2-way set-associative) with a block size of 16 bytes (memory is byte-addressable). The cache uses a true LRU replacement algorithm. (20 points)

0xAA00

0xAA04

0xAA10

0xAA20

0xAA30

0xAA14

0xAA05

0xAA16

0xAA40

0xAA50

0xAA06

0xAA17

0xAA67

0xAA78

0xAA09

0xAA80

0xAA7A
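
A hand trace for a question like this can be cross-checked with a small simulator. The C sketch below assumes the cache has two sets of two ways each and starts empty; the names `count_misses`, `cache_line`, `NUM_SETS`, `NUM_WAYS`, and `BLOCK_SIZE` are hypothetical. If you read "2-set cache" differently, change the constants accordingly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SETS   2    /* assumption: two sets             */
#define NUM_WAYS   2    /* two ways per set                 */
#define BLOCK_SIZE 16   /* bytes per block                  */

typedef struct {
    int      valid;
    uint32_t tag;
    size_t   last_used;  /* 0 = never used, otherwise access number */
} cache_line;

/* Counts misses for a trace of byte addresses under true LRU. */
size_t count_misses(const uint32_t *addrs, size_t n)
{
    cache_line cache[NUM_SETS][NUM_WAYS];
    size_t misses = 0;

    memset(cache, 0, sizeof cache);          /* cold (empty) cache */

    for (size_t i = 0; i < n; i++) {
        uint32_t block = addrs[i] / BLOCK_SIZE;
        cache_line *set = cache[block % NUM_SETS];
        uint32_t tag = block / NUM_SETS;
        int hit = -1, victim = 0;

        for (int w = 0; w < NUM_WAYS; w++) {
            if (set[w].valid && set[w].tag == tag)
                hit = w;
            /* Smallest timestamp is the LRU way; empty ways have 0. */
            if (set[w].last_used < set[victim].last_used)
                victim = w;
        }
        if (hit < 0) {                       /* miss: fill the victim way */
            misses++;
            hit = victim;
            set[hit].valid = 1;
            set[hit].tag = tag;
        }
        set[hit].last_used = i + 1;          /* mark as most recently used */
    }
    return misses;
}
```

Timestamps are used instead of a linked list because, with only two ways per set, a linear scan for the smallest timestamp is the simplest faithful model of true LRU.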

Question 6: Consider the following page-reference addresses:

0xAAA00

0xAAB00

0xA0A00

0xA1B00

0xA2A00

0xA3A00

0xAAC00

0xA4A00

0xA5A00

0xA1FFF

0xA2AFF

0xA3BFF

0xA4A00

0xA6000

0xA7000

How many page faults would occur under a true-LRU replacement algorithm, assuming six frames, a page size of 4 KB, and a cache size of 32 bytes? (10 points)
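
A hand count of page faults can be checked with a short simulation. The C sketch below assumes that all frames start empty and that the page number is the address divided by the page size; the cache size plays no role in it. The names `count_page_faults` and `MAX_FRAMES` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 64   /* hypothetical upper bound for this sketch */

/* Counts page faults under true LRU.
   The caller must pass 1 <= num_frames <= MAX_FRAMES and page_size > 0. */
size_t count_page_faults(const uint32_t *addrs, size_t n,
                         size_t num_frames, uint32_t page_size)
{
    uint32_t page[MAX_FRAMES];       /* page held by each frame       */
    size_t   last_used[MAX_FRAMES];  /* index of the latest reference */
    size_t   used = 0, faults = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t vpn = addrs[i] / page_size;
        size_t slot = used;          /* "not resident" sentinel */
        size_t lru = 0;

        for (size_t f = 0; f < used; f++) {
            if (page[f] == vpn) { slot = f; break; }
            if (last_used[f] < last_used[lru])
                lru = f;
        }
        if (slot == used) {          /* page fault */
            faults++;
            slot = (used < num_frames) ? used++ : lru;
            page[slot] = vpn;
        }
        last_used[slot] = i;         /* record this reference */
    }
    return faults;
}
```

For this question the call would take the reference list above with `num_frames` set to 6 and `page_size` set to 4096.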