Date:

QuizforChapter4TheProcessor

Notallquestionsareofequaldifficulty.Pleasereviewtheentirequizfirstandthen budgetyourtimecarefully.

Name:Course:

SolutionsinRED

1.[6points] FortheMIPSdatapath shownbelow, severallinesaremarkedwith“X”.Foreachone:

•Describein wordsthenegativeconsequenceofcuttingthislinerelativetotheworking, unmodifiedprocessor.

•Provideasnippetofcodethat willfail

•Provideasnippetofcodethat willstillwork

(1)Cannot writetoregisterfile. Thismeansthat R-typeandanyinstructionwithwritebacktoregister filewillfail. Anexampleof codesnippetthat wouldfailis:

add $s1, $s2, $s3

Anexampleofacodesnippetthat willnotfailis:

sw $s1, 0($s2)

(2)Forwardingofthefirstoperandfails.Anexampleof codesnippetthat wouldfailis:

add $s1, $t0, $t1 add $s1, $s1, $s1

Anexampleofcodesnippetthat willnotfailis:

add $s1, $t0, $t1

add $s1, $t2, $s1 # Here the second operand is forwarded correctly

(3) Jumping toabranchtargetdoesnot work. Exampleof codethat fails:

addi $s1, $zero, 2 addi $s2, $zero, 2 beq $s1, $s2, exit

Codethat willstillwork:

addi $s1, $zero, 10 addi $s2, $zero, 20 beq $s1, $s2, exit

2.[3points]Considerthefollowingassemblylanguagecode:

I0: ADD R4 = R1 + R0; I1: SUB R9 = R3 - R4; I2: ADD R4 = R5 + R6;

I3: LDW R2 = MEM[R3 + 100];

I4: LDW R2 = MEM[R2 + 0]; I5: STW MEM[R4 + 100] = R2; I6: AND R2 = R2 & R1;

I7: BEQ R9 == R1, Target;

I8: AND R9 = R9 & R1;

Considerapipelinewithforwarding,hazarddetection,and1 delayslotfor branches.The

pipelineisthetypical5-stageIF,ID,EX,MEM,WBMIPSdesign.Fortheabove code,complete the

pipelinediagrambelow(instructionson theleft, cycleson top)for thecode. InsertthecharactersIF,

ID,EX,MEM,WBfor eachinstructionintheboxes. Assumethat theretwo levelsof bypassing,that

thesecondhalfofthedecodestageperformsareadof sourceregisters,andthatthefirsthalfof the

write-backstagewritestotheregisterfile.

Labelalldatastalls(Draw anXinthebox).Labelall dataforwardsthattheforwardingunit detects (arrowbetweenthestageshandingoff thedataandthestagesreceivingthedata). Whatisthefinal executiontime ofthecode?

T0 / T1 / T2 / T3 / T4 / T5 / T6 / T7 / T8 / T9 / T10 / T11 / T12 / T13 / T14 / T15 / T16 / T17
I0
I1
I2
I3
I4
I5
I6
I7
I8
T0 / T1 / T2 / T3 / T4 / T5 / T6 / T7 / T8 / T9 / T10 / T11 / T12 / T13 / T14 / T15 / T16 / T17
I0 / IF / ID / EX / ME M / WB
I1 / IF / ID / EX / ME M / WB
I2 / IF / ID / EX / ME M / WB
I3 / IF / ID / EX / ME M / WB
I4 / IF / ID / X / EX / ME M / WB
I5 / IF / X / ID / EX / ME M / WB
I6 / X / IF / ID / EX / ME M / WB
I7 / IF / ID / EX / ME M / WB
I8 / IF / ID / EX / ME M / WB

Thefinalexecutiontimeofthecodeis13cycles.

3.[5points] Structural,dataandcontrolhazardstypicallyrequireaprocessorpipelinetostall.Listed belowareaseriesof optimizationtechniquesimplementedina compilerora processorpipeline designedtoreduceor eliminatestallsdue tothesehazards.Foreachofthefollowingoptimization techniques,statewhichpipelinehazardsitaddressesandhowit addressesit.Some optimization techniquesmayaddressmore thanonehazard,sobesuretoincludeexplanationsforalladdressed hazards.

(a)BranchPrediction

Itaddressescontrol hazardsbyguessingtheoutcome ofabranchinstructionandthenspeculatively executestheinstructionson onesideofthebranchtokeepthepipelinemoving.Predictionscanbe madeinhardwareor insoftwarebythecompiler.

(b)InstructionScheduling

Itaddressesstructuralhazardsanddatahazards.Itaddressesdatahazardsbyeithermoving instructionsthat arenotdependenton aninstruction,sayA,beforesomeinstructionsthatdependon Aandthus avoidingthestallthat would haveoccurredotherwise.Itaddressesstructuralhazardsby makingsureinstructionsthat usefunctionalunitsthat havelimitednumber ofinstancesarebe scheduledfarapartfrom eachotherandthereisno unnecessarystalldue tothis. Itcanbe donein hardware(superscalarprocessor)or staticallybythecompiler

(c)delayslots

Itaddressescontrol hazards.Ithelpstoavoidastallthatwould resultdue branchtargetidentification duringthedecodestagebyschedulingtheexecutionofsomeotherinstructionwhichanywayhasto executeirrespectiveofthebranchcondition.

(d)increasingavailabilityoffunctional units(ALUs,addersetc)

Ithelpstoavoidstructuralhazards.It ispossibletorun multiple instructionsof thesametypeat the sametime ifwehavereplicatedfunctionalunits

(e)caches

Itaddressesdatahazards.In particular, cacheshelptoreducememorylatencyandhencereducethe load-uselatencywhichinturn reducethestalldurationandimprovesexecutiontime(bymaintaining pipelinesteadystate).

7.[10points]Thisisathree-partquestionaboutcriticalpath calculation.Considerasimplesingle- cycleimplementationofMIPSISA.Theoperationtimesfor themajorfunctionalcomponents forthis machineareasfollows:

ComponentLatency
ALU / 10ns
Adder / 8ns
ALUControlUnit / 2ns
Shifter / 3ns
ControlUnit/ROM / 4ns
Sign/zeroextender / 3ns
2-1MUX / 2ns
Memory(read/write)(instructionordata) / 15ns
PCRegister(readaction) / 1ns
PCRegister(writeaction) / 1ns
Registerfile(readaction) / 7ns
Registerfile(writeaction) / 5ns
Logic(1ormorelevelsofgates) / 1ns

BelowisacopyoftheMIPSsingle-cycledatapathdesign.Inthisimplementation theclockcycleis determinedbythelongestpossiblepath in themachine.Thecriticalpathsfor thedifferentinstruction typesthat needtobe consideredare: R-format, Load-word, andstore-word.Allinstructionshavethe sameinstructionfetchanddecodesteps.Thebasicregistertransferoftheinstructionsare:

Fetch/Decode: Instruction <- IMEM[PC];

R-type: R[rd] <- R[rs] op R[rt]; PC <- PC + 4;

load: R[rt] <- DMEM[ R[rs] + signext(offset)]; PC <- PC +4;

store: DMEM[ R[rs] + signext(offset)] <- R[Rt]; PC <- PC +4;

(PartA)

Inthetablebelow,indicatethecomponentsthat determinethecriticalpath for therespective instruction,intheorderthatthecriticalpath occurs.Ifacomponent isused, butnot partof thecritical pathof theinstruction(iehappensinparallelwithanothercomponent),it shouldnot be inthetable. Theregisterfileisusedforreadingandfor writing;it willappeartwiceforsomeinstructions.All instructionbeginbyreadingthePCregisterwithalatencyof2ns.

InstructionHardwareElementsUsedByInstruction
Type
R-Format / Read
PC / Instruction
Memory / Control / Read regs / ALU
control / Mux / ALU / Mux / Mux / Write
regs
Load / Read
PC / Instruction
Memory / Control / Read
Reg / ALU
control / Mux / ALU / Data
Memory / Mux / Mux / Write
regs / Sign
extend
Store / Read
PC / Instruction
Memory / Control / Read
Regs / ALU
control / Mux / ALU / Data
Memory

(PartB)

Placethelatenciesof thecomponents that you havedecidedforthecriticalpathofeachinstructionin thetable below. Compute thesum ofeachofthecomponent latenciesfor eachinstruction.

InstructionTypeHardwareLatenciesForRespectiveElementsTotal
R-Format / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10 ns / 2ns / 2 ns / 5 ns
Load / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10 ns / 15 ns / 2ns / 2 ns / 5 ns / 3 ns
Store / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10ns / 15 ns

(PartC)

Usethetotal latencycolumn toderivethefollowingcriticalpath information:

•Giventhedatapath latenciesabove,whichinstructiondeterminestheoverallmachinecritical path (latency)?

•Whatwillbe theresultantclockcycletime ofthemachinebasedonthecriticalpath instruction?

•Whatfrequencywillthemachinerun?

Theloadinstructiondeterminestheoverallmachinecriticalpath latencysinceit hasthemaximum latency.

The critical path for the load instruction is
PC + Instmem+ ReadRegs+ Mux + ALU +DadaMem+Mux+WriteRegs
= 1 +15+7+2+10+15+2+5 = 57 ns

Theclockcycletime should be longenoughtopermit thecriticalpathinstructiontocomplete. So it should be atleast57 ns

Thefrequencyof themachinewould be (1/57)ns=17.5 MHz