Date:
QuizforChapter4TheProcessor
Notallquestionsareofequaldifficulty.Pleasereviewtheentirequizfirstandthen budgetyourtimecarefully.
Name:Course:
SolutionsinRED
1.[6points] FortheMIPSdatapath shownbelow, severallinesaremarkedwith“X”.Foreachone:
•Describein wordsthenegativeconsequenceofcuttingthislinerelativetotheworking, unmodifiedprocessor.
•Provideasnippetofcodethat willfail
•Provideasnippetofcodethat willstillwork
(1)Cannot writetoregisterfile. Thismeansthat R-typeandanyinstructionwithwritebacktoregister filewillfail. Anexampleof codesnippetthat wouldfailis:
add $s1, $s2, $s3
Anexampleofacodesnippetthat willnotfailis:
sw $s1, 0($s2)
(2)Forwardingofthefirstoperandfails.Anexampleof codesnippetthat wouldfailis:
add $s1, $t0, $t1 add $s1, $s1, $s1
Anexampleofcodesnippetthat willnotfailis:
add $s1, $t0, $t1
add $s1, $t2, $s1 # Here the second operand is forwarded correctly
(3) Jumping toabranchtargetdoesnot work. Exampleof codethat fails:
addi $s1, $zero, 2 addi $s2, $zero, 2 beq $s1, $s2, exit
Codethat willstillwork:
addi $s1, $zero, 10 addi $s2, $zero, 20 beq $s1, $s2, exit
2.[3points]Considerthefollowingassemblylanguagecode:
I0: ADD R4 = R1 + R0; I1: SUB R9 = R3 - R4; I2: ADD R4 = R5 + R6;
I3: LDW R2 = MEM[R3 + 100];
I4: LDW R2 = MEM[R2 + 0]; I5: STW MEM[R4 + 100] = R2; I6: AND R2 = R2 & R1;
I7: BEQ R9 == R1, Target;
I8: AND R9 = R9 & R1;
Considerapipelinewithforwarding,hazarddetection,and1 delayslotfor branches.The
pipelineisthetypical5-stageIF,ID,EX,MEM,WBMIPSdesign.Fortheabove code,complete the
pipelinediagrambelow(instructionson theleft, cycleson top)for thecode. InsertthecharactersIF,
ID,EX,MEM,WBfor eachinstructionintheboxes. Assumethat theretwo levelsof bypassing,that
thesecondhalfofthedecodestageperformsareadof sourceregisters,andthatthefirsthalfof the
write-backstagewritestotheregisterfile.
Labelalldatastalls(Draw anXinthebox).Labelall dataforwardsthattheforwardingunit detects (arrowbetweenthestageshandingoff thedataandthestagesreceivingthedata). Whatisthefinal executiontime ofthecode?
T0 / T1 / T2 / T3 / T4 / T5 / T6 / T7 / T8 / T9 / T10 / T11 / T12 / T13 / T14 / T15 / T16 / T17I0
I1
I2
I3
I4
I5
I6
I7
I8
T0 / T1 / T2 / T3 / T4 / T5 / T6 / T7 / T8 / T9 / T10 / T11 / T12 / T13 / T14 / T15 / T16 / T17
I0 / IF / ID / EX / ME M / WB
I1 / IF / ID / EX / ME M / WB
I2 / IF / ID / EX / ME M / WB
I3 / IF / ID / EX / ME M / WB
I4 / IF / ID / X / EX / ME M / WB
I5 / IF / X / ID / EX / ME M / WB
I6 / X / IF / ID / EX / ME M / WB
I7 / IF / ID / EX / ME M / WB
I8 / IF / ID / EX / ME M / WB
Thefinalexecutiontimeofthecodeis13cycles.
3.[5points] Structural,dataandcontrolhazardstypicallyrequireaprocessorpipelinetostall.Listed belowareaseriesof optimizationtechniquesimplementedina compilerora processorpipeline designedtoreduceor eliminatestallsdue tothesehazards.Foreachofthefollowingoptimization techniques,statewhichpipelinehazardsitaddressesandhowit addressesit.Some optimization techniquesmayaddressmore thanonehazard,sobesuretoincludeexplanationsforalladdressed hazards.
(a)BranchPrediction
Itaddressescontrol hazardsbyguessingtheoutcome ofabranchinstructionandthenspeculatively executestheinstructionson onesideofthebranchtokeepthepipelinemoving.Predictionscanbe madeinhardwareor insoftwarebythecompiler.
(b)InstructionScheduling
Itaddressesstructuralhazardsanddatahazards.Itaddressesdatahazardsbyeithermoving instructionsthat arenotdependenton aninstruction,sayA,beforesomeinstructionsthatdependon Aandthus avoidingthestallthat would haveoccurredotherwise.Itaddressesstructuralhazardsby makingsureinstructionsthat usefunctionalunitsthat havelimitednumber ofinstancesarebe scheduledfarapartfrom eachotherandthereisno unnecessarystalldue tothis. Itcanbe donein hardware(superscalarprocessor)or staticallybythecompiler
(c)delayslots
Itaddressescontrol hazards.Ithelpstoavoidastallthatwould resultdue branchtargetidentification duringthedecodestagebyschedulingtheexecutionofsomeotherinstructionwhichanywayhasto executeirrespectiveofthebranchcondition.
(d)increasingavailabilityoffunctional units(ALUs,addersetc)
Ithelpstoavoidstructuralhazards.It ispossibletorun multiple instructionsof thesametypeat the sametime ifwehavereplicatedfunctionalunits
(e)caches
Itaddressesdatahazards.In particular, cacheshelptoreducememorylatencyandhencereducethe load-uselatencywhichinturn reducethestalldurationandimprovesexecutiontime(bymaintaining pipelinesteadystate).
7.[10points]Thisisathree-partquestionaboutcriticalpath calculation.Considerasimplesingle- cycleimplementationofMIPSISA.Theoperationtimesfor themajorfunctionalcomponents forthis machineareasfollows:
ComponentLatencyALU / 10ns
Adder / 8ns
ALUControlUnit / 2ns
Shifter / 3ns
ControlUnit/ROM / 4ns
Sign/zeroextender / 3ns
2-1MUX / 2ns
Memory(read/write)(instructionordata) / 15ns
PCRegister(readaction) / 1ns
PCRegister(writeaction) / 1ns
Registerfile(readaction) / 7ns
Registerfile(writeaction) / 5ns
Logic(1ormorelevelsofgates) / 1ns
BelowisacopyoftheMIPSsingle-cycledatapathdesign.Inthisimplementation theclockcycleis determinedbythelongestpossiblepath in themachine.Thecriticalpathsfor thedifferentinstruction typesthat needtobe consideredare: R-format, Load-word, andstore-word.Allinstructionshavethe sameinstructionfetchanddecodesteps.Thebasicregistertransferoftheinstructionsare:
Fetch/Decode: Instruction <- IMEM[PC];
R-type: R[rd] <- R[rs] op R[rt]; PC <- PC + 4;
load: R[rt] <- DMEM[ R[rs] + signext(offset)]; PC <- PC +4;
store: DMEM[ R[rs] + signext(offset)] <- R[Rt]; PC <- PC +4;
(PartA)
Inthetablebelow,indicatethecomponentsthat determinethecriticalpath for therespective instruction,intheorderthatthecriticalpath occurs.Ifacomponent isused, butnot partof thecritical pathof theinstruction(iehappensinparallelwithanothercomponent),it shouldnot be inthetable. Theregisterfileisusedforreadingandfor writing;it willappeartwiceforsomeinstructions.All instructionbeginbyreadingthePCregisterwithalatencyof2ns.
InstructionHardwareElementsUsedByInstructionType
R-Format / Read
PC / Instruction
Memory / Control / Read regs / ALU
control / Mux / ALU / Mux / Mux / Write
regs
Load / Read
PC / Instruction
Memory / Control / Read
Reg / ALU
control / Mux / ALU / Data
Memory / Mux / Mux / Write
regs / Sign
extend
Store / Read
PC / Instruction
Memory / Control / Read
Regs / ALU
control / Mux / ALU / Data
Memory
(PartB)
Placethelatenciesof thecomponents that you havedecidedforthecriticalpathofeachinstructionin thetable below. Compute thesum ofeachofthecomponent latenciesfor eachinstruction.
InstructionTypeHardwareLatenciesForRespectiveElementsTotalR-Format / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10 ns / 2ns / 2 ns / 5 ns
Load / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10 ns / 15 ns / 2ns / 2 ns / 5 ns / 3 ns
Store / 1ns / 15 ns / 4ns / 7ns / 2ns / 2 ns / 10ns / 15 ns
(PartC)
Usethetotal latencycolumn toderivethefollowingcriticalpath information:
•Giventhedatapath latenciesabove,whichinstructiondeterminestheoverallmachinecritical path (latency)?
•Whatwillbe theresultantclockcycletime ofthemachinebasedonthecriticalpath instruction?
•Whatfrequencywillthemachinerun?
Theloadinstructiondeterminestheoverallmachinecriticalpath latencysinceit hasthemaximum latency.
The critical path for the load instruction is
PC + Instmem+ ReadRegs+ Mux + ALU +DadaMem+Mux+WriteRegs
= 1 +15+7+2+10+15+2+5 = 57 ns
Theclockcycletime should be longenoughtopermit thecriticalpathinstructiontocomplete. So it should be atleast57 ns
Thefrequencyof themachinewould be (1/57)ns=17.5 MHz