Computer Organization and Architecture: Themes and Variations, 1st Edition Alan Clements
Code Fragments
I have extracted most of the fragments of ARM code from chapters 3 and 4 and have provided copies in this document. Most fragments include a line or two of the text preceding them in order to help students locate them in the text. I have put the first few words of each text fragment in enlarged bold font to indicate the beginning of each new fragment.
The purpose of this document is to enable students to embed code in their own notes and to add any further comments or explanations.
If you have any comments or suggestions or wish to report errors, please contact me at .
The following fragment of code demonstrates a conditional branch.
SUBS r5,r5,#1 ;Subtract 1 from r5
BEQ onZero ;IF zero then goto the line labeled ‘onZero’
notZero ADD r1,r2,r3 ;ELSE continue from here
.
.
.
onZero SUB r1,r2,r3 ;Here’s where we end up if we take the branch
We can translate this into ARM code using the subset of ARM instructions defined earlier in the panel, as in the following code.
LDR r0,P ;Load r0 with the contents of memory location P
LDR r1,Q ;Load r1 with the contents of memory location Q
SUBS r2,r0,r1 ;Subtract the contents of Q from P to get X = P-Q
BPL THEN ;IF X ≥ 0 then execute the ‘THEN’ part
ELSE ADD r0,r0,#20 ;ELSE Add 20 to the contents of r0 to get P+20
B EXIT ;Skip past ‘THEN’ part to ‘EXIT’
THEN ADD r0,r0,#5 ;Add 5 to r0 to get P+5
EXIT STR r0,X ;Store r0 in memory location X
STOP
P DCD 12 ;These three lines reserve memory space for
Q DCD 9 ;the three operands P, Q, X. The memory
X DCD ;locations are 36, 40, and 44, respectively.
This sequence of assembly-language instructions can be expressed in RTL notation, as follows:
LDR r0,P ;[r0] ← [P]
LDR r1,Q ;[r1] ← [Q]
SUBS r2,r0,r1 ;[r2] ← [r0] - [r1]
BPL THEN ;IF [r2] ≥ 0 [PC] ← THEN
ELSE ADD r0,r0,#20 ;[r0] ← [r0] + 20
B EXIT ;[PC] ← EXIT
THEN ADD r0,r0,#5 ;[r0] ← [r0] + 5
EXIT STR r0,X ;[X] ← [r0]
Case 1: P = 12, Q = 9, and the branch is taken (control is transferred to the branch target address);
Case 2: P = 12, Q = 14, and the branch is not taken (control is transferred to PC+4).
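The two cases can be checked with a short C model of the fragment. This is only a sketch: the function name select_x is hypothetical, C int variables stand in for the memory locations P and Q, and the test x >= 0 matches BPL only when the subtraction does not overflow.

```c
/* Hypothetical C model of the IF...THEN...ELSE fragment above.
   Returns the value that the ARM code stores in location X. */
int select_x(int p, int q)
{
    int x = p - q;      /* SUBS r2,r0,r1 : form P - Q and set the flags */
    if (x >= 0) {       /* BPL THEN      : branch taken when the result is not negative */
        return p + 5;   /* THEN part     : P + 5 */
    }
    return p + 20;      /* ELSE part     : P + 20 */
}
```

With P = 12 and Q = 9 the branch is taken and the result is 17; with P = 12 and Q = 14 the branch is not taken and the result is 32.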
Let’s look at another example of the use of conditional branching in the mechanization of a loop that calculates 1 + 2 + 3 + … + 20. In this case a counter is incremented from 1 to 20. On the final pass, the count becomes 21. The operation CMP r0,#21 compares the counter value in r0 with the literal 21 by subtraction. The next operation, BNE Next, branches back to the instruction labeled ‘Next’ unless the result of the comparison was zero. On the 20th iteration the result is zero, so the branch is not taken and the loop is exited.
MOV r0,#1 ;Put 1 in register r0 (the counter)
MOV r1,#0 ;Put 0 in register r1 (the sum)
Next ADD r1,r1,r0 ;REPEAT: Add the current count to the sum
ADD r0,r0,#1 ; Add 1 to the counter
CMP r0,#21 ; Have we added all 20 numbers?
BNE Next ;UNTIL we have made 20 iterations
STOP ;If we have THEN stop
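The loop can be mirrored in C to confirm the final total. This is a minimal sketch; the function name sum_1_to_20 is hypothetical and the local variables stand in for registers r0 and r1.

```c
/* Hypothetical C model of the REPEAT...UNTIL loop above. */
int sum_1_to_20(void)
{
    int count = 1;              /* r0: the counter */
    int sum = 0;                /* r1: the running sum */
    do {
        sum += count;           /* ADD r1,r1,r0 */
        count += 1;             /* ADD r0,r0,#1 */
    } while (count != 21);      /* CMP r0,#21 followed by BNE Next */
    return sum;
}
```

The body executes 20 times, and the function returns 210, which is 20 × 21 / 2.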
We’ll use the ADD instruction to add together the four values in registers r2, r3, r4, and r5. This code is typical of RISC processors like the ARM.
ADD r1,r2,r3 ;r1 = r2 + r3
ADD r1,r1,r4 ;r1 = r1 + r4
ADD r1,r1,r5 ;r1 = r1 + r5 = r2 + r3 + r4 + r5
You have already seen fragments of ARM assembly language and now we introduce some of the features that enable you to write programs that will run in an ARM environment. ARM instructions are written in the form
Label Op-code operand1, operand2, operand3 ;comment
e.g., Test_5 ADD r0,r1,r2 ;calculate TotalTime = Time + NewTime
MOV r7, #5 ;Load loop counter with 5
BEQ Test_5 ;IF zero THEN goto Test_5
The Label field is a user-defined label that can be used by other instructions to refer to that line; for example, by a conditional branch. Note that it doesn’t matter whether there are one or more spaces after the commas in argument lists; you can write operand1,operand2 or operand1, operand2.
Let’s look at a simple fragment of ARM code. Suppose we wish to generate the sum of the cubes of numbers from 1 to 10. We can use the multiply and accumulate instruction as follows:
MOV r0,#0 ;clear total in r0
MOV r1,#10 ;FOR i = 1 to 10 (count down)
Next MUL r2,r1,r1 ; square number
MLA r0,r2,r1,r0 ; cube number and add to total
SUBS r1,r1,#1 ; decrement counter (set condition flags)
BNE Next ;END FOR (branch back on count not zero)
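The same calculation can be sketched in C to check the expected total. The function name sum_of_cubes is hypothetical, and uint32_t values stand in for the 32-bit registers. Unlike the ARM loop, which always executes its body at least once, this version simply returns 0 when n is 0.

```c
#include <stdint.h>

/* Hypothetical C model of the sum-of-cubes loop, counting down from n to 1. */
uint32_t sum_of_cubes(uint32_t n)
{
    uint32_t total = 0;                  /* MOV r0,#0 */
    for (uint32_t i = n; i != 0; i--) {  /* r1 counts down; SUBS and BNE close the loop */
        uint32_t square = i * i;         /* MUL r2,r1,r1 */
        total = square * i + total;      /* MLA r0,r2,r1,r0 */
    }
    return total;
}
```

For n = 10 the total is 3025, which equals (1 + 2 + … + 10) squared.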
We begin with a program that can be executed on an ARM computer or a PC with an ARM cross-development system. The following fragment of code demonstrates the structure of the simple program we described above that forms the cubes of the first ten integers. The assembler directives (AREA, ENTRY, and END), which appear in blue in the text, are not executable ARM code.
AREA ARMtest, CODE, READONLY
ENTRY
MOV r0,#0 ;clear total in r0
MOV r1,#10 ;FOR i = 1 to 10
Next MUL r2,r1,r1 ; square number
MLA r0,r2,r1,r0 ; cube number and add to total
SUBS r1,r1,#1 ; decrement loop count
BNE Next ;END FOR
END
The following fragment of ARM code provides a demonstration of storage allocation and the use of the ALIGN directive.
Stop B Stop ;infinite loop!
AREA Directives, CODE, READONLY
ENTRY
MOV r6,#XX ;load r6 with 5 (i.e., XX)
LDR r7,P1 ;load r7 with the contents of location P1
ADD r5,r6,r7 ;just a dummy instruction
MOV r0, #0x18 ;angel_SWIreason_ReportException
LDR r1, =0x20026 ;ADP_Stopped_ApplicationExit
SVC #0x123456 ;ARM semihosting (formerly SWI)
XX EQU 5 ;equate XX to 5
P1 DCD 0x12345678 ;store hex 32-bit value 0x12345678
P3 DCB 25 ;store the byte 25 in memory
YY DCB 'A' ;store byte whose ASCII character is A in memory
Tx2 DCW 12342 ;store the 16-bit value 12342 in memory
ALIGN ;ensure code is on a 32-bit word boundary
Strg1 = "Hello"
Strg2 = "X2", &0C, &0A
Z3 DCW 0xABCD
END
The following code fragment demonstrates the use of the ADR pseudoinstruction.
ADR r1,MyArray ;set up r1 to point to MyArray
.
LDR r3,[r1] ;read an element using the pointer
. .
MyArray DCD 0x12345678
Let’s look at how pseudoinstructions are treated by the ARM development system. Consider the following code fragment. This is just dummy code intended to illustrate a point; it doesn’t have any purpose.
AREA ConstPool, CODE, READONLY
ENTRY
LDR r0,=0x12345678 ;load r0 with a 32-bit constant
ADR r1,Table ;load r1 with the address of Table
ADR r2,Table1 ;load r2 with the address of Table1
LDR r3,=0xAAAAAAAA ;load r3 with a 32-bit constant
LDR r4,P3 ;what does this do?
Table DCD 0xABCDDCBA ;dummy data
Table1 DCD 0xFFFFFFFF
P3 DCD 0x22222222
The compare instruction CMP r0,r1 evaluates [r0] – [r1] and then updates the status bits accordingly. A special case of the comparison instruction is TST (test), which performs a comparison with zero, since ARM lacks an explicit compare-with-zero instruction. We look at this instruction in more detail later. Consider the following example:
CMP r1,r2 ;is r1 = r2?
BEQ DoThis ;if equal then goto DoThis
ADD r1,r1,#1 ;else add 1 to r1
.
.
DoThis SUB r1,r1,#1 ;subtract 1 from r1
For example, the ARM assembly code that multiplies 121 by 96 is
MOV r0,#121 ;load r0 with 121
MOV r1,#96 ;load r1 with 96
MUL r2,r0,r1 ;r2 = r0 x r1
The following code fragment shows how the multiply and accumulate instruction is used to form the inner product between two n-component vectors Vector1 and Vector2.
MOV r4,#n ;r4 is the loop counter
MOV r3,#0 ;clear the inner product
ADR r5,Vector1 ;r5 points to vector 1
ADR r6,Vector2 ;r6 points to vector 2
Loop LDR r0,[r5], #4 ;REPEAT read a component of A and update the pointer
LDR r1,[r6], #4 ; get the second element in the pair
MLA r3,r0,r1,r3 ; add new product term to the total (r3 = r3 + r0·r1)
SUBS r4,r4,#1 ; decrement the loop counter (and remember to set the CCR)
BNE Loop ;UNTIL all done
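For comparison, the inner product loop might be written in C as follows. This is a sketch under stated assumptions: the name inner_product and its parameters are hypothetical, the components are taken to be 32-bit signed integers, and the products are assumed not to overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the multiply-and-accumulate loop.
   v1 and v2 play the roles of r5 and r6; n is the loop count held in r4. */
int32_t inner_product(const int32_t *v1, const int32_t *v2, size_t n)
{
    int32_t total = 0;                  /* MOV r3,#0 */
    while (n != 0) {
        total += (*v1++) * (*v2++);     /* two post-indexed loads, then MLA */
        n--;                            /* SUBS r4,r4,#1 */
    }
    return total;
}
```

The post-increment on each pointer corresponds to the post-indexed loads LDR r0,[r5],#4 and LDR r1,[r6],#4, which read a component and then advance the pointer by one word.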
A typical application of logical operations might be to merge groups of bits, an operation that is commonly used to pack more than one variable into a register or memory location. Suppose that register r0 contains the 8 bits bbbbbbxx, register r1 contains the bits bbbyyybb and register r2 contains the bits zzzbbbbb, where x, y, and z represent the bits of desired fields and the b’s are unwanted bits. We wish to pack these bits to get the final value zzzyyyxx. We can achieve this by:
AND r0,r0,#2_00000011 ;Mask r0 to two bits xx
AND r1,r1,#2_00011100 ;Mask r1 to three bits yyy
AND r2,r2,#2_11100000 ;Mask r2 to three bits zzz
ORR r0,r0,r1 ;Merge r1 and r0 to get 000yyyxx
ORR r0,r0,r2 ;Merge r2 and r0 to get zzzyyyxx
A typical application of logical shifting is to extract a bit pattern from within a word. Suppose we have an 8-bit string bxxxxbbb, where the xs represent the bits to be extracted and the bs denote don’t-care values. We can extract and right-justify the required field, as follows (note that this code is for illustration and is not ARM code).
LSR r0,r0,#3 ;Shift r0 three places right to get 000bxxxx
AND r0,r0,#2_00001111 ;Mask out unwanted bits to get 0000xxxx
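The shift-then-mask idiom looks the same in C. This sketch uses a hypothetical function name, extract_field, and an 8-bit input to match the pattern bxxxxbbb.

```c
#include <stdint.h>

/* Hypothetical C model: extract and right-justify the field xxxx from bxxxxbbb. */
uint8_t extract_field(uint8_t value)
{
    uint8_t shifted = (uint8_t)(value >> 3);   /* 000bxxxx */
    return (uint8_t)(shifted & 0x0F);          /* 0000xxxx */
}
```

For example, 0xD5 is 11010101; its field xxxx is 1010, so the result is 0x0A.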
ARM’s unconditional branch instruction has the form B target, where target denotes the branch target address (BTA, the address of the next instruction to be executed). The following fragment of code demonstrates how the unconditional branch is used.
.. do this ;Some code
.. then that ;Some other code
B Next ;Now skip past the next instructions
.. ;…the code being skipped past
.. ;…the code being skipped past
Next .. ;Target address for the branch, denoted by label Next
ARM’s conditional branches are similar to those of other RISC and CISC processors. They consist of a mnemonic Bcc and a target address, where the subscript cc defines one of 16 conditions that must be satisfied for the branch to be taken and the target address is the location in the code where execution continues if the branch is taken. A typical example of conditional behavior in a high-level language is given by the following construct.
if (X == Y) Y = Y + 1;
else Y = Y + 2;
CMP r1,r2 ;assume r1 contains y and r2 contains x: compare them
BNE plus2 ;if not equal then branch to the else part
ADD r1,r1,#1 ;if equal fall through to here and add one to y
B leave ;now skip past the else part
plus2 ADD r1,r1,#2 ;ELSE part add 2 to y
leave … ;continue from here
The FOR loop
MOV r0,#10 ;set up the loop counter
Loop code ... ;body of the loop
SUBS r0,r0,#1 ;decrement loop counter and set status flags
BNE Loop ;continue until count zero; branch on not zero
Post loop ... ;fall through on zero count
The WHILE loop
Loop CMP r0,#0 ;perform test at start of loop
BEQ WhileExit ;exit on test true
code ... ;body of the loop
B Loop ;Repeat WHILE true
WhileExit Post loop ... ;fall through on zero count
The UNTIL loop
Loop code ... ;body of the loop
CMP r0,#0 ;perform test at end of loop
BNE Loop ;Repeat UNTIL true
Post loop ... ;fall through on zero count
ARM’s conditional execution mode makes it easy to implement conditional operations in a high-level language. Consider the following fragment of C code.
if (P == Q) X = P - Y;
If we assume that r1 contains P, r2 contains Q, r3 contains X, and r4 contains Y, then we can write
CMP r1,r2 ;compare P == Q
SUBEQ r3,r1,r4 ;if (P == Q) then r3 = r1 - r4
Now consider a more complicated example of a C construct with a compound predicate:
if ((a == b) && (c == d)) e++;
CMP r0,r1 ;compare a == b
CMPEQ r2,r3 ;if a == b then test c == d
ADDEQ r4,r4,#1 ;if a == b AND c == d THEN increment e
Without conditional execution, we might write
CMP r0,r1 ;compare a == b
BNE Exit ;exit if a != b
CMP r2,r3 ;compare c == d
BNE Exit ;exit if c != d
ADD r4,r4,#1 ;else increment e
Exit
Consider:
if (a == b) e = e + 4;
if (a < b) e = e + 7;
if (a > b) e = e + 12;
CMP r0,r1 ;compare a == b
ADDEQ r4,r4,#4 ;if a == b then e = e + 4
ADDLT r4,r4,#7 ;if a < b then e = e + 7
ADDGT r4,r4,#12 ;if a > b then e = e + 12
Once again, using conventional non-conditional execution, we would have to write something like the following to implement this algorithm.
CMP r0,r1 ;compare a == b
BNE Test1 ;not equal try next test
ADD r4,r4,#4 ;a == b so e = e+4
B ExitAll ;now leave
Test1 BLT Test2 ;if a < b then
ADD r4,r4,#12 ;if we are here a > b so e = e + 12
B ExitAll ;now leave
Test2 ADD r4,r4,#7 ;if we are here a < b so e = e + 7
ExitAll
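The three outcomes can be checked against a small C model. This is only a sketch with a hypothetical function name, update_e; it follows the branching version above, in which a < b adds 7 and a > b adds 12, so exactly one of the three additions is performed.

```c
/* Hypothetical C model of the three-way test.
   Exactly one of the three conditions holds for any pair (a, b). */
int update_e(int a, int b, int e)
{
    if (a == b) e = e + 4;    /* equal:        e = e + 4  */
    if (a < b)  e = e + 7;    /* less than:    e = e + 7  */
    if (a > b)  e = e + 12;   /* greater than: e = e + 12 */
    return e;
}
```

Because the three conditions are mutually exclusive, a single CMP can drive all three conditionally executed additions without the flags being changed in between.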
Literal addressing is used by high-level language (HLL) constructs that specify a constant rather than a variable, such as:
IF I > 25 THEN J = K + 12 ,
where the constants 12 and 25 can be specified by literal addressing. We can express this as:
;assume I is in r0, J in r1, and K in r2
CMP r0,#25 ;Compare I with the value 25
BLE Exit ;IF I ≤ 25 THEN exit
ADD r1,r2,#12 ; ELSE add 12 to K
Exit ;...
We can simplify the code by using conditional execution as follows.
CMP r0,#25 ;Compare I with the value 25
ADDGT r1,r2,#12 ;IF I > 25 THEN J = K + 12
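A C rendering of the construct IF I > 25 THEN J = K + 12 makes the boundary case explicit. The function name update_j is hypothetical; the parameters stand in for registers r0, r1, and r2.

```c
/* Hypothetical C model of IF I > 25 THEN J = K + 12.
   Returns the new value of J; J is unchanged when I <= 25. */
int update_j(int i, int j, int k)
{
    if (i > 25) {       /* CMP r0,#25 followed by a 'greater than' condition */
        j = k + 12;     /* the literals 25 and 12 are immediate operands */
    }
    return j;
}
```

Note that I = 25 must leave J unchanged, which corresponds to the signed greater-than condition (GT) rather than greater-than-or-equal (GE).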
Consider the following example where a table of seven entries represents the days of the week. D1 represents Monday, D2 represents Tuesday, and so on. If Di is day i, then Di+1 represents the next day. In order to move from one day to the next, all we need do is increment index i. This is why we need variable addresses.
ADR r0,Week ;r0 points to array Week
ADD r0,r0,r1, LSL #2 ;r0 now points at the day whose value is in r1
LDR r2,[r0] ;read the data for this day into r2
Week DCD ;data for day 1
DCD ;data for day 2
.
DCD ;data for day 7
Consider the following fragment of C code:
for (i = 0; i < 21; i++)
{
j[i] = j[i] + 10;
}
The values 0, 21, and 10 in this program are constants specified via immediate addressing during compilation. We can translate the above high-level code into ARM assembly language as follows.
MOV r0,#0 ; Set counter i in r0 to initial value zero
ADR r8,j ; Index register r8 points to array j (pseudoinstruction)
Loop LDR r1,[r8] ; REPEAT Get j[i]
ADD r1,r1,#10 ; Add 10 to j[i]
STR r1,[r8],#4 ; Save j[i] and advance the pointer to the next element
ADD r0,r0,#1 ; Increment loop counter i
CMP r0,#21 ; Compare loop counter with terminal value + 1
BNE Loop ; UNTIL i = 21
Note that we have counted up from 0. Had we loaded r0 with 21, we could have used SUBS r0,r0,#1 to decrement the counter, followed by BNE Loop. This saves an instruction because the CMP is no longer needed.
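The pointer-based form of the loop can be sketched in C as follows. The function name add_ten_to_each and its parameters are hypothetical. The pointer p plays the role of the index register r8 and must advance by one element on each pass so that successive elements of the array are updated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the array-update loop using a moving pointer. */
void add_ten_to_each(int32_t *j, size_t n)
{
    int32_t *p = j;                     /* ADR r8,j : point at the start of the array */
    for (size_t i = 0; i != n; i++) {   /* counter in r0, compared with the terminal value */
        int32_t value = *p;             /* load the current element */
        value += 10;                    /* add the literal 10 */
        *p = value;                     /* store it back */
        p++;                            /* step to the next 32-bit element (4 bytes) */
    }
}
```

Calling add_ten_to_each(j, 21) corresponds to the loop for (i = 0; i < 21; i++) j[i] = j[i] + 10.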
Let’s look at a simple but typical example of offset addressing. The following fragment of code demonstrates the use of offsets to implement array access. Because the offset is a constant, it cannot be changed at runtime.
Sun EQU 0 ;offsets for days of the week
Mon EQU 4
Tue EQU 8
.
Sat EQU 24
ADR r0,Week ;r0 points to array Week
LDR r2,[r0,#Tue] ;read the data for Tuesday into r2
Week DCD ;data for day 1 (Sunday)
DCD ;data for day 2 (Monday)
DCD ;data for day 3 (Tuesday)
DCD ;data for day 4 (Wednesday)
DCD ;data for day 5 (Thursday)
DCD ;data for day 6 (Friday)
DCD ;data for day 7 (Saturday)
Consider the following example of the addition of two arrays.
Len EQU 8 ;let’s make the arrays 8 words long
ADR r0,A - 4 ;register r0 points 4 bytes before array A
ADR r1,B - 4 ;register r1 points 4 bytes before array B
ADR r2,C - 4 ;register r2 points 4 bytes before array C
MOV r5,#Len ;use register r5 as a loop counter
Loop LDR r3,[r0,#4]! ;get element of A (pre-index: add 4 to r0, then load)
LDR r4,[r1,#4]! ;get element of B
ADD r3,r3,r4 ;add two elements
STR r3,[r2,#4]! ;store the sum in C
SUBS r5,r5,#1 ;test for end of loop
BNE Loop ;repeat until all done
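In C the same element-by-element addition is usually written with an index. This is a sketch with a hypothetical function name, add_arrays. The ARM version starts each pointer 4 bytes before its array because pre-indexed addressing with writeback, [r0,#4]!, adds 4 to the pointer before each access; the C version indexes from 0 instead, because forming a pointer before the start of an array is not valid C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the array addition: c[i] = a[i] + b[i]. */
void add_arrays(const int32_t *a, const int32_t *b, int32_t *c, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c[i] = a[i] + b[i];     /* two loads, an ADD, and a store per element */
    }
}
```

With len = 8 this matches the loop count loaded into r5 by MOV r5,#Len.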
Memory access operations have a conditional execution field, bits 31-28 of the op-code, and can be conditionally executed like other ARM instructions. This facility makes it possible to write code like
;if (a == b) then x = p else x = q
CMP r1,r2 ;if (a == b)
LDREQ r3,[r4] ;then x = p
LDRNE r3,[r5] ;else x = q
Let’s look at a simple example of the use of a subroutine. Suppose that you wanted to evaluate the function if x > 0 then x = 16x + 1 else x = 32x several times in a program. Assuming that the parameter x is in register r0, we can write the following subroutine.
Func1 CMP r0,#0 ;test for x > 0
MOVGT r0,r0, LSL #4 ;if x > 0 x = 16x
ADDGT r0,r0,#1 ;if x > 0 then x = 16x + 1
MOVLT r0,r0, LSL #5 ;ELSE if x < 0 THEN x = 32x
MOV pc,lr ;return by restoring saved PC
We’ve made use of conditional execution here. The only things needed to turn a block of code into a subroutine are an entry point (the label ‘Func1’) and a return (the MOV pc,lr); the subroutine is called with BL. Consider the following.
LDR r0,[r4] ; get P
BL Func1 ; P = (if P > 0 then 16P + 1 else 32P)
STR r0,[r4] ; save P
.
. some code
.
LDR r0,[r5,#20] ; get Q
BL Func1 ; Q = (if Q > 0 then 16Q + 1 else 32Q)
STR r0,[r5,#20] ; save Q
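The subroutine corresponds to an ordinary C function. This is a minimal sketch; the name func1 mirrors the label Func1, and the multiplications stand in for the shifts LSL #4 and LSL #5 (results are assumed to fit in 32 bits).

```c
#include <stdint.h>

/* Hypothetical C model of Func1: if x > 0 then 16x + 1 else 32x. */
int32_t func1(int32_t x)
{
    if (x > 0) {
        return 16 * x + 1;   /* MOVGT r0,r0,LSL #4 then ADDGT r0,r0,#1 */
    }
    return 32 * x;           /* MOVLT r0,r0,LSL #5; x = 0 stays 0 either way */
}
```

The ARM code uses the LT condition for the second case, which skips x = 0; the result is the same because 32 × 0 = 0.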
Because the branch with link instruction can be conditionally executed, ARM provides a full set of conditional subroutine calls, for example:
CMP r9,r4 ;if r9 < r4
BLLT ABC ;then call subroutine ABC
Suppose we wish to obtain the absolute value of a signed integer; that is, if x < 0 then x = - x. This fragment of code uses the TEQ instruction and a reverse subtract operation.
TEQ r0,#0 ;compare r0 with zero
RSBMI r0,r0,#0 ;if negative then 0 – r0 (note use of reverse subtract)
Suppose the data we wish to re-order, 0xABCDEFGH, is in r0 and r1 is a working register. The following code (taken from ARM literature) implements this operation, which generates the new sequence 0xGHEFCDAB (i.e., the bytes have been reversed but not the nibbles in the bytes). The comment fields for each of these operations show what’s happening to the data; a pair such as AE denotes the exclusive OR of nibbles A and E.
EOR r1,r0,r0, ROR #16 ; AE, BF, CG, DH, EA, FB, GC, HD
BIC r1,r1, #0x00FF0000 ; AE, BF, 0, 0, EA, FB, GC, HD
MOV r0,r0,ROR #8 ; G, H, A, B, C, D, E, F
EOR r0,r0,r1, LSR #8 ; r1 after LSR #8 is 0, 0, AE, BF, 0, 0, EA, FB
; G, H, A⊕AE, B⊕BF, C, D, E⊕EA, F⊕FB
; G, H, E, F, C, D, A, B
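The four-instruction sequence can be traced in C. This is a sketch: the helper ror32 and the function reverse_bytes are hypothetical names, with ror32 standing in for the ROR operand shift.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits; valid for n in the range 1..31. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Hypothetical C model of the byte-reversal sequence. */
uint32_t reverse_bytes(uint32_t r0)
{
    uint32_t r1 = r0 ^ ror32(r0, 16);   /* EOR r1,r0,r0,ROR #16  */
    r1 &= ~UINT32_C(0x00FF0000);        /* BIC r1,r1,#0x00FF0000 */
    r0 = ror32(r0, 8);                  /* MOV r0,r0,ROR #8      */
    r0 ^= r1 >> 8;                      /* EOR r0,r0,r1,LSR #8   */
    return r0;
}
```

For the input 0x12345678 the intermediate value of r1 is 0x4400444C after the BIC, r0 becomes 0x78123456 after the rotate, and the final exclusive OR yields 0x78563412.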
The ARM’s ability to shift an operand before using it in an addition or subtraction provides a convenient way to multiply by 2^n – 1 or 2^n + 1. Consider the following fragment of code that exploits both this feature and conditional execution.
;IF x > y THEN p = (2^n + 1)·q
; ELSE IF (x = y) p = 2^n·q
; ELSE p = (2^n – 1)·q
CMP r2,r3 ;Compare x and y
ADDGT r4,r1,r1, LSL #n ;IF > calculate p = q·(2^n + 1)
MOVEQ r4,r1, LSL #n ;IF = calculate p = q·2^n
RSBLT r4,r1,r1, LSL #n ;IF < calculate p = q·(2^n - 1)
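The three cases reduce to a shift combined with an add or a subtract. The following C sketch uses a hypothetical function name, scale_q, and assumes that n is less than 32 and that q is treated as an unsigned 32-bit value.

```c
#include <stdint.h>

/* Hypothetical C model: p = q(2^n + 1), q(2^n), or q(2^n - 1),
   selected by comparing x with y. Assumes n < 32. */
uint32_t scale_q(int32_t x, int32_t y, uint32_t q, unsigned n)
{
    if (x > y) {
        return q + (q << n);    /* ADD with shifted operand: q + q·2^n */
    }
    if (x == y) {
        return q << n;          /* MOV with shifted operand: q·2^n */
    }
    return (q << n) - q;        /* RSB with shifted operand: q·2^n - q */
}
```

With q = 3 and n = 2 the three cases give 15, 12, and 9 respectively.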
In this example we’ll convert to lower-case text. Bit 5 of an ASCII character is zero for upper-case letters, and one for lower-case letters. It is easy to detect upper-case letters because they are contiguous, beginning with ‘A’ and ending with ‘Z’. Assuming the character to convert is in r0 and the remaining bits of r0 are all clear, we can write
CMP r0,#'A' ;Are we in the range of capitals?
RSBGES r1,r0,#'Z' ;Check less than or equal to Z if greater than or equal to A. Update flags
ORRGE r0,r0,#0x0020 ;If A to Z then set bit 5 to force lower-case
The first instruction checks whether the character is ‘A’ or greater. If it is, the second line checks that the character is less than or equal to ‘Z’. Note that this test is performed only if the character in r0 is greater than or equal to ‘A’ and that we are using reverse subtraction because we wish to test whether ‘Z’ – char is non-negative. The mnemonic is “if greater than or equal to then reverse subtract and update the status bits on the result”. Finally, if we are in range, the conditional ORR instruction is executed and an upper- to lower-case conversion is performed.
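The range test and the bit-5 trick can be written in C as follows. This is a sketch with a hypothetical function name, to_lower_ascii; it assumes an ASCII character code with all higher bits clear, as in the text.

```c
#include <stdint.h>

/* Hypothetical C model: force bit 5 to 1 for characters in the range 'A'..'Z'. */
uint32_t to_lower_ascii(uint32_t ch)
{
    if (ch >= 'A' && ch <= 'Z') {   /* CMP then conditional reverse subtract */
        ch |= 0x20;                 /* conditional ORR sets bit 5 */
    }
    return ch;
}
```

Characters just outside the range, such as ‘@’ (one below ‘A’) and ‘[’ (one above ‘Z’), are returned unchanged, as are letters that are already lower-case.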
Consider the switch statement in a high-level language. For example:
switch (i) {
case 0: do action 0; break;
case 1: do action 1; break;
.
.
case n: do action n; break;
default: exception
}
ADR r1, Case ;load r1 with the address of the jump table
CMP r0,#maxCase ;better see if the switch variable is in range
ADDLE pc,r1,r0, LSL #2 ;if OK then jump to the appropriate case
;default exception handler here
.
.
Case B case0 ;from the case table jump to the actual code
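The jump-table idea can be sketched in C with an array of function pointers. All of the names here (case_handler, case0 to case2, default_case, dispatch) are hypothetical, and the handlers simply return distinct values so that the selected case can be observed. The ARM version uses a table of branch instructions rather than a table of addresses, but the indexing step is the same: the switch variable is scaled by the size of a table entry and used as an offset.

```c
#include <stddef.h>

typedef int (*case_handler)(void);

/* Hypothetical handlers; each stands for one 'do action' block. */
static int case0(void) { return 100; }
static int case1(void) { return 101; }
static int case2(void) { return 102; }
static int default_case(void) { return -1; }   /* the default exception handler */

/* Hypothetical C model of the range check followed by a table-driven jump. */
int dispatch(unsigned i)
{
    static const case_handler table[] = { case0, case1, case2 };
    const size_t max_case = sizeof table / sizeof table[0] - 1;

    if (i <= max_case) {        /* CMP r0,#maxCase : is the switch variable in range? */
        return table[i]();      /* index the table and transfer control */
    }
    return default_case();      /* out of range: fall through to the default */
}
```

Here the index is unsigned, so a negative switch variable cannot slip past the range check; this is a design choice of the sketch rather than a statement about the ARM fragment.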