Computer Organization and Architecture: Themes and Variations, 1st Edition Alan Clements

Code Fragments

I have extracted most of the fragments of ARM code from chapters 3 and 4 and have provided copies in this document. Most fragments include a line or two of the text preceding them in order to help students locate them in the text. I have put the first few words of each text fragment in enlarged bold font to indicate the beginning of each new fragment.

The purpose of this document is to enable students to embed code in their own notes and to add any further comments or explanations.

If you have any comments or suggestions or wish to report errors, please contact me at .

The following fragment of code demonstrates a conditional branch.

SUBS r5,r5,#1;Subtract 1 from r5

BEQ onZero;IF zero then goto the line labeled ‘onZero’

notZero ADD r1,r2,r3;ELSE continue from here

onZero SUB r1,r2,r3;Here’s where we end up if we take the branch

We can translate this into ARM code, using the subset of ARM instructions defined earlier in the panel, as follows.

LDR r0,P;Load r0 with the contents of memory location P

LDR r1,Q;Load r1 with the contents of memory location Q

SUBS r2,r0,r1;Subtract the contents of Q from P to get X = P-Q

BPL THEN ;IF X ≥ 0 then execute the ‘THEN’ part

ELSE ADD r0,r0,#20;ELSE Add 20 to the contents of r0 to get P+20

B EXIT;Skip past ‘THEN’ part to ‘EXIT’

THEN ADD r0,r0,#5 ;Add 5 to r0 to get P+5

EXIT STR r0,X ;Store r0 in memory location X

STOP

P DCD 12 ;These three lines reserve memory space for

Q DCD 9 ;the three operands P, Q, X. The memory

X DCD ;locations are 36, 40, and 44, respectively.

This sequence of assembly-language instructions can be expressed in RTL notation, as follows:

        LDR   r0,P         ;[r0] ← [P]
        LDR   r1,Q         ;[r1] ← [Q]
        SUBS  r2,r0,r1     ;[r2] ← [r0] - [r1]
        BPL   THEN         ;IF [r2] ≥ 0 [PC] ← THEN
ELSE    ADD   r0,r0,#20    ;[r0] ← [r0] + 20
        B     EXIT         ;[PC] ← EXIT
THEN    ADD   r0,r0,#5     ;[r0] ← [r0] + 5
EXIT    STR   r0,X         ;[X] ← [r0]

Case 1: P = 12, Q = 9, and the branch is taken (control is transferred to the branch target address);

Case 2: P = 12, Q = 14, and the branch is not taken (control is transferred to PC+4).
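
The two cases can be checked against a small C model of the fragment. This is only a sketch: the function name then_else is hypothetical, and the test diff >= 0 matches BPL (which looks at the N flag) only as long as the subtraction does not overflow.

```c
#include <stdint.h>

/* Hypothetical model of the IF-THEN-ELSE fragment.
   Returns the value that the fragment stores in X. */
int32_t then_else(int32_t p, int32_t q)
{
    int32_t diff = p - q;   /* SUBS r2,r0,r1 : X = P - Q            */
    if (diff >= 0) {        /* BPL THEN      : branch if not negative */
        return p + 5;       /* THEN ADD r0,r0,#5                    */
    }
    return p + 20;          /* ELSE ADD r0,r0,#20                   */
}
```

With P = 12 and Q = 9 the THEN path is followed; with P = 12 and Q = 14 the ELSE path is followed.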

Let’s look at another example of the use of conditional branching in the mechanization of a loop that calculates 1 + 2 + 3 + … + 20. In this case a counter is incremented from 1 to 20. On the final pass, the count becomes 21. The operation CMP r0,#21 compares the counter value in r0 with the literal 21 by subtraction. The next operation, BNE Next, branches back to the instruction labeled ‘Next’ unless the result of the comparison was zero. On the 20th iteration the counter reaches 21, the result is zero, the branch is not taken, and the loop exits.

MOV r0,#1 ;Put 1 in register r0 (the counter)

MOV r1,#0 ;Put 0 in register r1 (the sum)

Next ADD r1,r1,r0 ;REPEAT:Add the current count to the sum

ADD r0,r0,#1; Add 1 to the counter

CMP r0,#21; Have we added all 20 numbers?

BNE Next;UNTIL we have made 20 iterations

STOP;If we have THEN stop
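
As a cross-check, the same REPEAT...UNTIL structure can be sketched in C. The function name sum_until and its limit parameter (21 in the fragment above) are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical model of the loop: adds 1 + 2 + ... + (limit - 1). */
int sum_until(int limit)
{
    assert(limit > 1);           /* the body always runs at least once */
    int counter = 1;             /* r0: the counter                    */
    int sum = 0;                 /* r1: the sum                        */
    do {
        sum += counter;          /* ADD r1,r1,r0                       */
        counter += 1;            /* ADD r0,r0,#1                       */
    } while (counter != limit);  /* CMP r0,#21 then BNE Next           */
    return sum;
}
```

Calling sum_until(21) follows the fragment exactly and yields 210, the sum of the first 20 integers.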

We’ll use the ADD instruction to add together the four values in registers r2, r3, r4, and r5. This code is typical of RISC processors like the ARM.

ADD r1,r2,r3 ;r1 = r2 + r3

ADD r1,r1,r4 ;r1 = r1 + r4

ADD r1,r1,r5 ;r1 = r1 + r5 = r2 + r3 + r4 + r5

You have already seen fragments of ARM assembly language and now we introduce some of the features that enable you to write programs that will run in an ARM environment. ARM instructions are written in the form

Label Op-code operand1, operand2, operand3 ;comment

e.g., Test_5 ADD r0,r1,r2 ;calculate TotalTime = Time + NewTime

MOV r7, #5 ;Load loop counter with 5

BEQ Test_5 ;IF zero THEN goto Test_5

The Label field is a user-defined label that can be used by other instructions to refer to that line; for example, by a conditional branch. Note that it doesn’t matter whether there are one or more spaces after the commas in argument lists; you can write operand1,operand2 or operand1, operand2.

Let’s look at a simple fragment of ARM code. Suppose we wish to generate the sum of the cubes of numbers from 1 to 10. We can use the multiply and accumulate instruction as follows:

MOV r0,#0 ;clear total in r0

MOV r1,#10 ;FOR i = 1 to 10 (count down)

Next MUL r2,r1,r1 ; square number

MLA r0,r2,r1,r0 ; cube number and add to total

SUBS r1,r1,#1 ; decrement counter (set condition flags)

BNE Next ;END FOR (branch back on count not zero)
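
The count-down loop and the two multiplications can be modeled in C as follows. This is a sketch under stated assumptions: the function name sum_of_cubes and the parameter n (10 in the fragment) are hypothetical, and unsigned arithmetic is used so that the model wraps modulo 2^32 as the registers do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: sums i*i*i for i = n down to 1. */
uint32_t sum_of_cubes(uint32_t n)
{
    assert(n >= 1);                  /* the body always runs at least once */
    uint32_t total = 0;              /* MOV r0,#0                          */
    uint32_t i = n;                  /* MOV r1,#10                         */
    do {
        uint32_t square = i * i;     /* MUL r2,r1,r1                       */
        total = square * i + total;  /* MLA r0,r2,r1,r0                    */
        i -= 1;                      /* SUBS r1,r1,#1                      */
    } while (i != 0);                /* BNE Next                           */
    return total;
}
```

For n = 10 the model returns 3025, which equals (1 + 2 + … + 10) squared.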

We begin with a program that can be executed on an ARM computer or a PC with an ARM cross-development system. The following fragment of code demonstrates the structure of the simple program we described above that sums the cubes of the first ten integers. The lines AREA, ENTRY, and END (shown in blue in the text) are assembler directives rather than executable ARM code.

AREA ARMtest, CODE, READONLY

ENTRY

MOV r0,#0 ;clear total in r0

MOV r1,#10 ;FOR i = 1 to 10

Next MUL r2,r1,r1 ; square number

MLA r0,r2,r1,r0 ; cube number and add to total

SUBS r1,r1,#1 ; decrement loop count

BNE Next ;END FOR

END

The following fragment of ARM code provides a demonstration of storage allocation and the use of the ALIGN directive.

Stop B Stop ;infinite loop!

AREA Directives, CODE, READONLY

ENTRY

MOV r6,#XX ;load r6 with 5 (i.e., XX)

LDR r7,P1 ;load r7 with the contents of location P1

ADD r5,r6,r7 ;just a dummy instruction

MOV r0, #0x18 ;angel_SWIreason_ReportException

LDR r1, =0x20026 ;ADP_Stopped_ApplicationExit

SVC #0x123456 ;ARM semihosting (formerly SWI)

XX EQU 5 ;equate XX to 5

P1 DCD 0x12345678 ;store the 32-bit hex value 12345678

P3 DCB 25 ;store the byte 25 in memory

YY DCB 'A' ;store byte whose ASCII character is A in memory

Tx2 DCW 12342 ;store the 16-bit value 12342 in memory

ALIGN ;ensure code is on a 32-bit word boundary

Strg1 = "Hello"

Strg2 = "X2", &0C, &0A

Z3 DCW 0xABCD

END

The following code fragment demonstrates the use of the ADR pseudoinstruction.

ADR r1,MyArray;set up r1 to point to MyArray

LDR r3,[r1];read an element using the pointer

. .

MyArray DCD 0x12345678

Let’s look at how pseudoinstructions are treated by the ARM development system. Consider the following code fragment. This is just dummy code intended to illustrate a point; it doesn’t have any purpose.

AREA ConstPool, CODE, READONLY

ENTRY

LDR r0,=0x12345678;load r0 with a 32-bit constant

ADR r1,Table;load r1 with the address of Table

ADR r2,Table1;load r2 with the address of Table1

LDR r3, = 0xAAAAAAAA;load r3 with a 32-bit constant

LDR r4,P3;what does this do?

Table DCD 0xABCDDCBA;dummy data

Table1 DCD 0xFFFFFFFF

P3 DCD 0x22222222

The compare instruction CMP r0,r1 evaluates [r0] - [r1] and then updates the status bits accordingly. A special case of the comparison instruction is TST (test), which performs a comparison with zero, since ARM lacks an explicit compare-with-zero instruction. We look at this instruction in more detail later. Consider the following example:

CMP r1,r2 ;is r1 = r2?

BEQ DoThis ;if equal then goto DoThis

ADD r1,r1,#1 ;else add 1 to r1

DoThis SUB r1,r1,#1 ;subtract 1 from r1

For example, the ARM assembly code that multiplies 121 by 96 is

MOV r0,#121;load r0 with 121

MOV r1,#96;load r1 with 96

MUL r2,r0,r1;r2 = r0 x r1

The following code fragment shows how the multiply and accumulate instruction is used to form the inner product between two n-component vectors Vector1 and Vector2.

MOV r4,#n ;r4 is the loop counter

MOV r3,#0 ;clear the inner product

ADR r5,Vector1 ;r5 points to vector 1

ADR r6,Vector2 ;r6 points to vector 2

Loop LDR r0,[r5], #4 ;REPEAT read a component of A and update the pointer

LDR r1,[r6], #4 ; get the second element in the pair

MLA r3,r0,r1,r3 ; add new product term to the total (r3 = r3 + r0·r1)

SUBS r4,r4,#1 ; decrement the loop counter (and remember to set the CCR)

BNE Loop ;UNTIL all done
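
The post-indexed loads and the multiply and accumulate step correspond closely to pointer arithmetic in C. The sketch below is illustrative only; the function name inner_product and its parameters are assumptions, and it expects products small enough not to overflow a 32-bit signed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the inner-product loop for two n-component vectors. */
int32_t inner_product(const int32_t *v1, const int32_t *v2, unsigned n)
{
    assert(n >= 1);          /* the loop body always runs at least once */
    int32_t acc = 0;         /* MOV r3,#0                               */
    do {
        int32_t a = *v1++;   /* LDR r0,[r5],#4 : read, then advance     */
        int32_t b = *v2++;   /* LDR r1,[r6],#4                          */
        acc = a * b + acc;   /* MLA r3,r0,r1,r3                         */
        n -= 1;              /* SUBS r4,r4,#1                           */
    } while (n != 0);        /* BNE Loop                                */
    return acc;
}
```

For the vectors (1, 2, 3) and (4, 5, 6) the result is 4 + 10 + 18 = 32.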

A typical application of logical operations might be to merge groups of bits, an operation that is commonly used to pack more than one variable into a register or memory location. Suppose that register r0 contains the 8 bits bbbbbbxx, register r1 contains the bits bbbyyybb and register r2 contains the bits zzzbbbbb, where x, y, and z represent the bits of desired fields and the b’s are unwanted bits. We wish to pack these bits to get the final value zzzyyyxx. We can achieve this by:

AND r0,r0,#2_00000011;Mask r0 to two bits xx

AND r1,r1,#2_00011100;Mask r1 to three bits yyy

AND r2,r2,#2_11100000;Mask r2 to three bits zzz

ORR r0,r0,r1 ;Merge r1 and r0 to get 000yyyxx

ORR r0,r0,r2 ;Merge r2 and r0 to get zzzyyyxx
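
The mask-and-merge sequence maps directly onto the C operators & and |. The following sketch assumes 8-bit values, as in the description; the function name pack_fields is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: packs xx, yyy and zzz into one byte zzzyyyxx. */
uint8_t pack_fields(uint8_t r0, uint8_t r1, uint8_t r2)
{
    r0 &= 0x03;                       /* keep bits 1-0 (xx)  */
    r1 &= 0x1C;                       /* keep bits 4-2 (yyy) */
    r2 &= 0xE0;                       /* keep bits 7-5 (zzz) */
    return (uint8_t)(r0 | r1 | r2);   /* merge the three fields */
}
```

For example, the inputs 0xFE, 0xEB, and 0x5F contribute 0x02, 0x08, and 0x40, respectively, giving 0x4A.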

A typical application of logical shifting is to extract a bit pattern from within a word. Suppose we have an 8-bit string bxxxxbbb, where the xs represent the bits to be extracted and the bs denote don’t-care values. We can extract and right-justify the required field as follows (note that this code is for illustration and is not ARM code).

LSR r0,r0,#3 ;Shift r0 three places right to get 000bxxxx

AND r0,r0,#2_00001111;Mask out unwanted bits to get 0000xxxx
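
The same shift-then-mask idiom can be written in C. This is a minimal sketch for an 8-bit value; the function name extract_field is an assumption.

```c
#include <stdint.h>

/* Hypothetical model: extracts the xxxx field from bxxxxbbb. */
uint8_t extract_field(uint8_t value)
{
    value = (uint8_t)(value >> 3);    /* 000bxxxx: right-justify the field */
    return (uint8_t)(value & 0x0F);   /* 0000xxxx: mask off the stray b    */
}
```

For the input 0xAF (binary 10101111) the field xxxx is 0101, so the result is 0x05.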

ARM’s unconditional branch instruction has the form B target, where target denotes the branch target address (BTA, the address of the next instruction to be executed). The following fragment of code demonstrates how the unconditional branch is used.

.. do this ;Some code

.. then that ;Some other code

B Next ;Now skip past the next instructions

.. ;…the code being skipped past

Next .. ;Target address for the branch, denoted by label Next

ARM’s conditional branches are similar to those of other RISC and CISC processors. They consist of a mnemonic Bcc and a target address, where the subscript cc defines one of 16 conditions that must be satisfied for the branch to be taken and the target address is the location in the code where execution continues if the branch is taken. A typical example of conditional behavior in a high-level language is given by the following construct.

IF (X == Y) THEN Y = Y + 1

ELSE Y = Y + 2

CMP r1,r2;assume r1 contains y and r2 contains x: compare them

BNE plus2;if not equal then branch to the else part

ADD r1,r1,#1;if equal fall through to here and add one to y

B leave;now skip past the else part

plus2 ADD r1,r1,#2 ;ELSE part add 2 to y

leave …;continue from here

The FOR loop

MOV r0,#10;set up the loop counter

Loop code ...;body of the loop

SUBS r0,r0,#1;decrement loop counter and set status flags

BNE Loop;continue until count zero–branch on not zero

Post loop ...;fall through on zero count

The WHILE loop

Loop CMP r0,#0 ;perform test at start of loop

BEQ WhileExit;exit on test true

code ...;body of the loop

B Loop;Repeat WHILE true

WhileExit Post loop ...;fall through on zero count

The UNTIL loop

Loop code ...;body of the loop

CMP r0,#0 ;perform test at end of loop

BNE Loop ;Repeat UNTIL true

Post loop ...;fall through on zero count

ARM’s conditional execution mode makes it easy to implement conditional operations in a high-level language. Consider the following fragment of C code.

if (P == Q) X = P - Y;

If we assume that r1 contains P, r2 contains Q, r3 contains X, and r4 contains Y, then we can write

CMP r1,r2 ;compare P == Q

SUBEQ r3,r1,r4 ;if (P == Q) then r3 = r1 - r4

Now consider a more complicated example of a C construct with a compound predicate:

if ((a == b) & (c == d)) e++;

CMP r0,r1 ;compare a == b

CMPEQ r2,r3 ;if a == b then test c == d

ADDEQ r4,r4,#1 ;if a == b AND c == d THEN increment e

Without conditional execution, we might write

CMP r0,r1 ;compare a == b

BNE Exit ;exit if a != b

CMP r2,r3 ;compare c == d

BNE Exit ;exit if c != d

ADD r4,r4,#1 ;else increment e

Exit

Consider:

if (a == b) e = e + 4;

if (a < b) e = e + 7;

if (a > b) e = e + 12;

CMP r0,r1 ;compare a == b

ADDEQ r4,r4,#4 ;if a == b then e = e + 4

ADDLT r4,r4,#7 ;if a < b then e = e + 7

ADDGT r4,r4,#12 ;if a > b then e = e + 12

Once again, using conventional non-conditional execution, we would have to write something like the following to implement this algorithm.

CMP r0,r1 ;compare a == b

BNE Test1 ;not equal try next test

ADD r4,r4,#4 ;a == b so e = e+4

B ExitAll ;now leave

Test1 BLT Test2 ;if a < b then

ADD r4,r4,#12 ;if we are here a > b so e = e + 12

B ExitAll ;now leave

Test2 ADD r4,r4,#7 ;if we are here a < b so e = e + 7

ExitAll
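
The branching version above has three mutually exclusive outcomes, which can be sketched in C as follows. The function name update_e is hypothetical; a, b, and e stand for the values held in r0, r1, and r4.

```c
/* Hypothetical model of the three-way test: returns the new value of e. */
int update_e(int a, int b, int e)
{
    if (a == b) {
        return e + 4;    /* fall through after BNE Test1 is not taken */
    }
    if (a < b) {
        return e + 7;    /* Test2: reached via BLT                    */
    }
    return e + 12;       /* remaining case: a > b                     */
}
```

Exactly one of the three additions is performed on each call, which is the behavior the sequence of branches enforces.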

Literal addressing is used by high-level language (HLL) constructs that specify a constant rather than a variable, such as:

IF I > 25 THEN J = K + 12 ,

where the constants 12 and 25 can be specified by literal addressing. We can express this as:

;assume I is in r0, J in r1, and K in r2

CMP r0,#25;Compare I with the value 25

BLE Exit;IF I ≤ 25 THEN exit

ADD r1,r2,#12; ELSE add 12 to K

Exit;...

We can simplify the code by using conditional execution as follows.

CMP r0,#25;Compare I with the value 25

ADDGT r1,r2,#12 ;IF I > 25 THEN J = K + 12

Consider the following example where a table of seven entries represents the days of the week. D1 represents Monday, D2 represents Tuesday, and so on. If Di is day i, then Di+1 represents the next day. In order to move from one day to the next, all we need do is increment the index i. This is why we need variable addresses.

ADR r0,Week ;r0 points to array Week

ADD r0,r0,r1, LSL #2 ;r0 now points at the day whose value is in r1

LDR r2,[r0];read the data for this day into r2

Week DCD ;data for day 1

DCD ;data for day 2

DCD ;data for day 7

Consider the following fragment of C code:

for (i = 0; i < 21; i++)

{

j[i] = j[i] + 10;

}

The values 0, 21, and 10 in this program are constants specified via immediate addressing during compilation. We can translate the above high-level code into ARM assembly language as follows.

MOV r0,#0 ; Set counter i in r0 to initial value zero

ADR r8,j ; Index register r8 points to array j (pseudoinstruction)

Loop LDR r1,[r8] ; REPEAT Get j[i]

ADD r1,r1,#10 ; Add 10 to j[i]

STR r1,[r8],#4 ; Save j[i] and advance r8 to the next element

ADD r0,r0,#1 ; Increment loop counter i

CMP r0,#21 ; Compare loop counter with terminal value + 1

BNE Loop ; UNTIL i = 21

Note that we have counted up from 0. Had we loaded r0 with 21 and counted down, we could have used SUBS r0,r0,#1 to decrement the counter, followed by BNE Loop, and saved an instruction (the CMP).
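
The count-down alternative can be sketched in C with a pointer that walks the array and a counter that runs down to zero. The function name add_ten_countdown and its parameters are assumptions for illustration; count would be 21 for the loop above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: adds 10 to each of the first 'count' elements of j. */
void add_ten_countdown(int32_t *j, unsigned count)
{
    assert(count >= 1);     /* the body always runs at least once         */
    do {
        *j = *j + 10;       /* load the element, add 10, store it back    */
        j += 1;             /* step the pointer to the next element       */
        count -= 1;         /* SUBS r0,r0,#1                              */
    } while (count != 0);   /* BNE Loop: no separate CMP is needed        */
}
```

The decrement itself sets the condition that ends the loop, which is why the count-down form needs one instruction fewer per pass.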

Let’s look at a simple but typical example of offset addressing. The following fragment of code demonstrates the use of offsets to implement array access. Because the offset is a constant, it cannot be changed at runtime.

Sun EQU 0;offsets for days of the week

Mon EQU 4

Tue EQU 8

Sat EQU 24

ADR r0,Week ;r0 points to array Week

LDR r2,[r0,#Tue];read the data for Tuesday into r2

Week DCD ;data for day 1 (Sunday)

DCD ;data for day 2 (Monday)

DCD ;data for day 3 (Tuesday)

DCD ;data for day 4 (Wednesday)

DCD ;data for day 5 (Thursday)

DCD ;data for day 6 (Friday)

DCD ;data for day 7 (Saturday)

Consider the following example of the addition of two arrays.

Len EQU 8;let’s make the arrays 8 words long

ADR r0,A - 4;register r0 points at array A

ADR r1,B - 4;register r1 points at array B

ADR r2,C - 4;register r2 points at array C

MOV r5,#Len;use register r5 as a loop counter

Loop LDR r3,[r0,#4]!;get element of A

LDR r4,[r1,#4]!;get element of B

ADD r3,r3,r4;add two elements

STR r3,[r2,#4]!;store the sum in C

SUBS r5,r5,#1;test for end of loop

BNE Loop;repeat until all done
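
The effect of starting each pointer one word before its array and using pre-indexed addressing with writeback can be sketched in C. This is a simplified model: a single index starting at -1 stands in for the three separately updated pointers, and the function name add_arrays is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: c[k] = a[k] + b[k] for k = 0 .. len-1. */
void add_arrays(const int32_t *a, const int32_t *b, int32_t *c, int len)
{
    assert(len >= 1);               /* the body always runs at least once   */
    int idx = -1;                   /* plays the role of the "A - 4" start  */
    do {
        idx += 1;                   /* [rX,#4]! : step first, then access   */
        c[idx] = a[idx] + b[idx];   /* LDR, LDR, ADD, STR                   */
        len -= 1;                   /* SUBS r5,r5,#1                        */
    } while (len != 0);             /* BNE Loop                             */
}
```

Because the step happens before the access, the first pass touches element 0 even though the starting position is one element before the array.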

Memory access operations have a conditional execution field, bits 31-28 of the op-code, and can be conditionally executed like other ARM instructions. This facility makes it possible to write code like

;if (a == b) then x = p else x = q

CMP r1,r2;if (a == b)

LDREQ r3,[r4];then x = p

LDRNE r3,[r5];else x = q

Let’s look at a simple example of the use of a subroutine. Suppose that you wanted to evaluate the function if x > 0 then x = 16x + 1 else x = 32x several times in a program. Assuming that the parameter x is in register r0, we can write the following subroutine.

Func1 CMP r0,#0;test for x > 0

MOVGT r0,r0, LSL #4 ;if x > 0 x = 16x

ADDGT r0,r0,#1 ;if x > 0 then x = 16x + 1

MOVLT r0,r0, LSL #5 ;ELSE if x < 0 THEN x = 32x

MOV pc,lr ;return by restoring saved PC

We’ve made use of conditional execution here. The only things needed to turn a block of code into a subroutine are an entry point (the label ‘Func1’) and a return (the MOV pc,lr); the subroutine is called with a BL instruction. Consider the following.

LDR r0,[r4] ; get P

BL Func1 ; P = (if P > 0 then 16P + 1 else 32P)

STR r0,[r4] ; save P

. some code

LDR r0,[r5,#20] ; get Q

BL Func1 ; Q = (if Q > 0 then 16Q + 1 else 32Q)

STR r0,[r5,#20] ; save Q
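
The subroutine and its two calls have a direct counterpart in a C function. The sketch below is illustrative; the name func1 is an assumption, and multiplication is used instead of a left shift because shifting a negative value is not defined in C.

```c
#include <stdint.h>

/* Hypothetical model of Func1: x > 0 gives 16x + 1, x < 0 gives 32x,
   and x == 0 is left unchanged (which also equals 32x). */
int32_t func1(int32_t x)
{
    if (x > 0) {
        x = x * 16;   /* MOVGT r0,r0, LSL #4 */
        x = x + 1;    /* ADDGT r0,r0,#1      */
    } else if (x < 0) {
        x = x * 32;   /* MOVLT r0,r0, LSL #5 */
    }
    return x;         /* MOV pc,lr           */
}
```

A call such as p = func1(p) corresponds to the load, BL, and store sequence used for P and Q.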

Because the branch with link instruction can be conditionally executed, ARM provides a full set of conditional subroutine calls, for example:

CMP r9,r4 ;if r9 < r4

BLLT ABC ;then call subroutine ABC

Suppose we wish to obtain the absolute value of a signed integer; that is, if x < 0 then x = -x. This fragment of code uses the TEQ instruction and a reverse subtract operation.

TEQ r0,#0 ;compare r0 with zero

RSBMI r0,r0,#0 ;if negative then 0 - r0 (note use of reverse subtract)
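
A C model of the two instructions is given below as a sketch. The function name abs_rsb is hypothetical, and the most negative 32-bit value is excluded because its negation cannot be represented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns the absolute value of x. */
int32_t abs_rsb(int32_t x)
{
    assert(x != INT32_MIN);   /* 0 - INT32_MIN does not fit in 32 bits */
    if (x < 0) {              /* TEQ r0,#0 sets N when r0 is negative  */
        x = 0 - x;            /* RSBMI r0,r0,#0 : reverse subtract     */
    }
    return x;
}
```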

Suppose the data we wish to re-order, 0xABCDEFGH, is in r0 and r1 is a working register. The following code (taken from ARM literature) implements this operation, which generates the new sequence 0xGHEFCDAB (i.e., the bytes have been reversed but not the nibbles within the bytes). The comment fields for each of these operations show what’s happening to the data.

EOR r1,r0,r0, ROR #16 ; A⊕E, B⊕F, C⊕G, D⊕H, E⊕A, F⊕B, G⊕C, H⊕D

BIC r1,r1, #0x00FF0000 ; A⊕E, B⊕F, 0, 0, E⊕A, F⊕B, G⊕C, H⊕D

MOV r0,r0,ROR #8 ; G, H, A, B, C, D, E, F

EOR r0,r0,r1, LSR #8 ; r1 after LSR #8 is 0, 0, A⊕E, B⊕F, 0, 0, E⊕A, F⊕B

; G, H, A⊕A⊕E, B⊕B⊕F, C, D, E⊕E⊕A, F⊕F⊕B

; G, H, E, F, C, D, A, B
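
The four instructions can be traced step by step with a C model. This is a sketch only: the helper ror32 and the function byte_reverse are hypothetical names, and ror32 is written for rotation amounts from 1 to 31.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n places (1 <= n <= 31). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Hypothetical model of the four-instruction byte reversal. */
uint32_t byte_reverse(uint32_t r0)
{
    uint32_t r1 = r0 ^ ror32(r0, 16);   /* EOR r1,r0,r0, ROR #16  */
    r1 &= 0xFF00FFFFu;                  /* BIC r1,r1,#0x00FF0000  */
    r0 = ror32(r0, 8);                  /* MOV r0,r0, ROR #8      */
    r0 ^= r1 >> 8;                      /* EOR r0,r0,r1, LSR #8   */
    return r0;
}
```

For the input 0x12345678 the intermediate values are 0x444C444C, 0x4400444C, and 0x78123456, and the final result is 0x78563412.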

The ARM’s ability to shift an operand before using it in an addition or subtraction provides a convenient way to multiply by 2^n - 1 or 2^n + 1. Consider the following fragment of code that exploits both this feature and conditional execution.

;IF x > y THEN p = (2^n + 1)·q

; ELSE IF (x = y) p = 2^n·q

; ELSE p = (2^n - 1)·q

CMP r2,r3 ;Compare x and y

ADDGT r4,r1,r1, LSL #n ;IF > calculate p = q·(2^n + 1)

MOVEQ r4,r1, LSL #n ;IF = calculate p = q·2^n

RSBLT r4,r1,r1, LSL #n ;IF < calculate p = q·(2^n - 1)
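
The arithmetic behind the three conditional instructions can be checked with a short C model. The function name scale_q is an assumption; x and y are compared as signed values (as GT and LT do), while q is unsigned so that the shifts wrap modulo 2^32 as a register would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns p for given x, y, q and shift count n. */
uint32_t scale_q(int32_t x, int32_t y, uint32_t q, unsigned n)
{
    assert(n < 32);           /* shift counts of 32 or more are undefined in C */
    if (x > y) {
        return q + (q << n);  /* ADDGT: q·(2^n + 1) */
    }
    if (x == y) {
        return q << n;        /* MOVEQ: q·2^n       */
    }
    return (q << n) - q;      /* RSBLT: q·(2^n - 1) */
}
```

With q = 3 and n = 4 the three cases give 51, 48, and 45, that is, 3 multiplied by 17, 16, and 15.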

In this example we’ll convert to lower-case text. Bit 5 of an ASCII character is zero for upper-case letters, and one for lower-case letters. It is easy to detect upper-case letters because they are contiguous, beginning with ‘A’ and ending with ‘Z’. Assuming the character to convert is in r0 and the remaining bits of r0 are all clear, we can write

CMP r0,#'A' ;Are we in the range of capitals?

RSBGES r1,r0,#'Z' ;If A or greater, check not greater than Z. Update flags

ORRGE r0,r0,#0x0020 ;If A to Z then set bit 5 to force lower-case

The first instruction checks whether the character is ‘A’ or greater. If it is, the second line checks that the character is not greater than ‘Z’. Note that this test is performed only if the character in r0 is ‘A’ or greater, and that we are using reverse subtraction because we wish to test whether ‘Z’ - char is zero or positive. The mnemonic reads “if greater than or equal to, then reverse subtract and update the status bits on the result”. Finally, if we are in range, the conditional ORR instruction is executed and an upper- to lower-case conversion is performed.
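
The range test followed by setting bit 5 can be sketched in C as follows. The function name to_lower_ascii is hypothetical, and the model assumes ASCII character codes.

```c
/* Hypothetical model: converts an upper-case ASCII letter to lower case
   and leaves every other character unchanged. */
int to_lower_ascii(int ch)
{
    if (ch >= 'A') {         /* CMP r0,#'A'                       */
        if ('Z' - ch >= 0) { /* RSBGES r1,r0,#'Z' : 'Z' - char    */
            ch |= 0x20;      /* ORRGE r0,r0,#0x0020 : set bit 5   */
        }
    }
    return ch;
}
```

Characters just outside the range, such as ‘@’ (one below ‘A’) and ‘[’ (one above ‘Z’), pass through unchanged.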

Consider the switch statement in a high-level language. For example:

switch (i) {

case 0: do action; break;

case 1: do action 1; break;

case n: do action n; break;

default: exception

}

ADR r1, Case ;load r1 with the address of the jump table

CMP r0,#maxCase ;better see if the switch variable is in range

ADDLE pc,r1,r0, LSL #2 ;if OK then jump to the appropriate case

;default exception handler here

Case B case0 ;from the case table jump to the actual code
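
The jump-table idea can be illustrated in C with an array of function pointers. Everything named here (action0 to action2, MAX_CASE, case_table, dispatch) is hypothetical. Unlike the fragment, which tests only the upper limit, this sketch also rejects negative values of the switch variable.

```c
/* Hypothetical case handlers; each returns its own case number. */
static int action0(void) { return 0; }
static int action1(void) { return 1; }
static int action2(void) { return 2; }

#define MAX_CASE 2   /* plays the role of maxCase */

/* The jump table: one entry per case, indexed by the switch variable. */
static int (*const case_table[MAX_CASE + 1])(void) = {
    action0, action1, action2
};

/* Returns the handler's result, or -1 for the default (out-of-range) case. */
int dispatch(int i)
{
    if (i >= 0 && i <= MAX_CASE) {   /* range check, as CMP r0,#maxCase */
        return case_table[i]();      /* indexed jump through the table  */
    }
    return -1;                       /* default exception handler       */
}
```

Indexing the table replaces a chain of comparisons, so every in-range case is reached in the same number of steps.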