Blackfin Reference Sheet, developed 1st December 2003, Smithmr, UCalgary

Blackfin (ADSP-BFXXX) Reference
V1.3, developed 1st Nov. 2004

PROGRAMMING MODEL

R0 to R7 Data registers R0, R1, R2 volatile
P0 to P5 Pointer registers P0, P1 volatile
FP Frame pointer SP Stack pointer
A0, A1 Accumulator registers LC0, LC1 Loop counters
DSP REGISTERS – ALL VOLATILE
I0 to I3 index registers (Ireg) M0 to M3 modify registers (Mreg)
B0 to B3 base registers (Breg) L0 to L3 length registers (Lreg)
Breg holds the start of a circular buffer of length Lreg, accessed through index register Ireg with post-increment (modify) register Mreg
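The circular-buffer update described above can be modelled in C. This is only a sketch of the addressing rule, not Blackfin code: the dag_regs type and dag_post_modify name are hypothetical, and it assumes that the modify value is no larger in magnitude than the buffer length and that a length of 0 means plain linear addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one Ireg / Mreg / Breg / Lreg set. */
typedef struct {
    uint32_t i;  /* Ireg: current address                       */
    int32_t  m;  /* Mreg: signed post-modify amount in bytes    */
    uint32_t b;  /* Breg: start address of the circular buffer  */
    uint32_t l;  /* Lreg: buffer length in bytes (0 = linear)   */
} dag_regs;

/* Model of an access through [Ireg ++ Mreg]:
   returns the address used, then updates Ireg with wrap-around. */
uint32_t dag_post_modify(dag_regs *d)
{
    uint32_t used = d->i;
    int64_t next = (int64_t)d->i + d->m;

    if (d->l != 0) {
        /* Assumption: |m| <= l, so one correction is enough. */
        assert(d->m <= (int64_t)d->l && -(int64_t)d->m <= (int64_t)d->l);
        if (next >= (int64_t)d->b + d->l) {
            next -= d->l;      /* ran off the end: wrap to the start */
        } else if (next < (int64_t)d->b) {
            next += d->l;      /* ran off the start: wrap to the end */
        }
    }
    d->i = (uint32_t)next;
    return used;
}
```

With a 12-byte buffer at 0x1000 and a modify value of 4, successive accesses use 0x1000, 0x1004, 0x1008 and then 0x1000 again.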
NOTATION CONVENTION
imm signed immediate uimm unsigned immediate
imm3 -4 to +3 uimm3 0 to 7
reg Any register R0 to R7, P0 to P5
dreg Any data register R0 to R7 Preg Any pointer register P0 to P5
statbit: AZ, AN, AC0, AC1, V, VS, AV0, AV0S, AV1, AV1S, AQ
reg_lo low part of register (R0.L) reg_hi high part of register (P0.H)

PARAMETER PASSING EXAMPLE

#define INPAR4_ON_STACK 20 // NOT IN R3
#define INPAR3_SPACE_ON_STACK 16 // In R2
#define INPAR2_SPACE_ON_STACK 12 // In R1
#define INPAR1_SPACE_ON_STACK 8 // In R0
#define RETS_LOCATION_ON_STACK 4
#define OLD_FP_LOCATION_ON_STACK 0 // Relative to FP
#define SAVED_P3 20 // Relative to SP
#define SAVED_P4 16
#define OUTPAR4_ON_STACK 12 // NOT IN R3
#define OUTPAR3_SPACE_ON_STACK 8 // In R2
#define OUTPAR2_SPACE_ON_STACK 4 // In R1
#define OUTPAR1_SPACE_ON_STACK 0 // In R0
.section program;
.extern _Somewhere; .extern _Subroutine;
.global _Foo; // void Foo(INPAR1, INPAR2, INPAR3, INPAR4)
_Foo: LINK 24; // 16 bytes for outgoing parameters + 8 bytes for 2 saved registers
[SP + SAVED_P4] = P4; // Save non-volatile registers on the stack
P4.L = _Somewhere; // Point to memory location _Somewhere
P4.H = _Somewhere; // Reference resolved by linker since .extern
[FP + INPAR1_SPACE_ON_STACK] = R0; // Save for later
[FP + INPAR3_SPACE_ON_STACK] = R2; // Save for later (INPAR3 arrives in R2)
R0 = [FP + INPAR4_ON_STACK]; // OUTPAR4 = INPAR4
[SP + OUTPAR4_ON_STACK] = R0;
R2 = 0xFFFF (X); // Sign extend OUTPAR3 value
// R1 = R1; // OUTPAR2 = INPAR2
R0 = 0xFFFF (Z); // Zero extend OUTPAR1 value
CALL _Subroutine; // Subroutine(0xFFFF, INPAR2, 0xFFFF, INPAR4)
W[P4] = R0; // Store return value as 16-bit
P4.L = lo(FIO_FLAG_D); P4.H = hi(FIO_FLAG_D); // Constant from
// <defsBF533.h> requires hi/lo macros
P4 = [SP + SAVED_P4]; // Restore non-volatile register
UNLINK;
RTS;
// Also see this alternative return sequence (not clear why it is used):
// P0 = [FP + 4]; // Get RETS
// UNLINK;
// JUMP (P0);

PROGRAM FLOW INSTRUCTIONS

JUMP User_Label PC replaced by address of User_Label
JUMP (Preg) PC replaced by value in P-register
IF CC JUMP User_Label if CC = 1, PC replaced by address of User_Label
IF !CC JUMP User_Label if CC = 0, PC replaced by address of User_Label
IF CC JUMP User_Label (bp) and IF !CC JUMP User_Label (bp) are versions where the branch is predicted to be taken. Correctly predicting branches improves pipeline performance.
CALL User_Label PC replaced by address of User_Label, address of next instruction -> RETS
CALL (Preg) PC replaced by value in P-register, address of next instruction -> RETS

RTS return from subroutine (RETS) RTI return from interrupt (RETI)
RTX return from exception (RETX) RTN return from NMI (RETN)
RTE return from emulation (RETE) Return register used is shown in brackets

Loop loop_name loopcounter;
Loop_begin loop_name; 1st instr. Loop_end loop_name; last instruction

Lsetup(Label_1stinstruction, Label_last) loopcounter;

Can use Loopcounter, Loopcounter = Preg or Loopcounter = Preg >> 1

LTn, LBn, LCn (Loop_Top, Loop_Bottom, Loop_Counter) can be set directly

LOAD / STORE INSTRUCTIONS

reg_lo = uimm16; reg_hi = uimm16; half-word loads
reg = uimm16 (Z); zero extended to 32 bits
reg = imm16 (X); sign extended to 32 bits (also imm7 version)

Loading 32 bit values

reg.L = uimm32 & 0xFFFF; reg.H = (uimm32 >> 16) & 0xFFFF;
BUT .IMPORT value; reg.L = value; reg.H = value; (half-word correct)
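The half-word split used when loading a 32-bit constant can be checked on a host machine. The following C sketch only mirrors the arithmetic of the reg.L / reg.H lines above; the helper names lo16, hi16 and join16 are hypothetical and are not the hi/lo macros from the Blackfin header files.

```c
#include <assert.h>
#include <stdint.h>

/* Value that would be loaded into reg.L: the low 16 bits. */
uint16_t lo16(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Value that would be loaded into reg.H: the high 16 bits. */
uint16_t hi16(uint32_t value)
{
    return (uint16_t)((value >> 16) & 0xFFFFu);
}

/* Recombine the two halves, as the register does after both loads. */
uint32_t join16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

For example, 0x12345678 splits into a high half of 0x1234 and a low half of 0x5678, and joining them gives back the original value.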
Preg = [ indirect_address ]; [ indirect_address ] = Preg;
where indirect address is Preg, Preg++, Preg--, Preg + offset, Preg - offset, FP - offset. Offsets are multiples of 4
Dreg = [ indirect_address ]; [ indirect_address ] = Dreg; where indirect address is Preg, Preg++, Preg--, Preg + small / large offset, Preg - large offset, FP - offset, Preg ++ Preg, Ireg, Ireg++, Ireg--, Ireg ++ Mreg
Dreg = W [ indirect address ] (Z); zero-extend half word fetch
Dreg = W [ indirect address ] (X); sign-extend half word fetch
Dreg = B [ indirect_address ] (Z); Dreg = B [ indirect_address ] (X);
where indirect address is Preg, Preg++, Preg--, Preg + offset, Preg - offset
W (16-bit) access only: Preg ++ Preg is also allowed, and offsets are multiples of 2
Dreg_lo = W [ indirect_address ]; Dreg_hi = W [ indirect_address ];
W [ indirect_address ] = Dreg_lo; W [ indirect_address ] = Dreg_hi;
where indirect address is Ireg, Ireg++, Ireg--, Preg, Preg ++ Preg
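The difference between the (Z) and (X) options on half-word loads can be illustrated in C. This is a host-side sketch of the extension rule only; the function names zext16 and sext16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (Z): zero extend a 16-bit value to 32 bits. */
uint32_t zext16(uint16_t half)
{
    return (uint32_t)half;
}

/* (X): sign extend a 16-bit value to 32 bits.
   Bit 15 is copied into bits 31 to 16. */
int32_t sext16(uint16_t half)
{
    int32_t wide = (int32_t)half;
    if (half & 0x8000u) {
        wide -= 0x10000;  /* negative value: subtract 2^16 */
    }
    return wide;
}
```

A half word of 0xFFFF becomes 0x0000FFFF (65535) with zero extension and 0xFFFFFFFF (-1) with sign extension, which is why the two 0xFFFF arguments in the parameter passing example are different values.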

COMPARE INSTRUCTIONS

CC = Operand_1 == Operand_2;

CC = Operand_1 <= Operand_2; signed compare

CC = Operand_1 <= Operand_2 (UI); unsigned compare

CC = Operand_1 < Operand_2; signed compare
CC = Operand_1 < Operand_2 (UI); unsigned compare

Compare Data Registers -- Not parallel (16-bit)

Operand_1: Dreg. Operand_2: Dreg or small constant, where the small constant is imm3 (signed compare) or uimm3 (unsigned compare)

Compare Pointer Registers -- Not parallel (16-bit)

Operand_1: Preg. Operand_2: Preg or small constant, where the small constant is imm3 (signed compare) or uimm3 (unsigned compare)

Compare Accumulator Registers -- Not parallel (16-bit)

Operand_1: A0. Operand_2: A1. Always signed compares
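The same bit pattern can give opposite CC results for signed and (UI) compares. The C sketch below models the two "less than" compares on 32-bit register contents; the function names are hypothetical and the code is a host-side illustration, not Blackfin code.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 32-bit register pattern as a signed value. */
static int64_t as_signed(uint32_t bits)
{
    int64_t value = (int64_t)bits;
    if (bits & 0x80000000u) {
        value -= 0x100000000LL;  /* negative: subtract 2^32 */
    }
    return value;
}

/* Models CC = Operand_1 < Operand_2; (signed compare) */
int cc_less_signed(uint32_t op1, uint32_t op2)
{
    return as_signed(op1) < as_signed(op2);
}

/* Models CC = Operand_1 < Operand_2 (UI); (unsigned compare) */
int cc_less_unsigned(uint32_t op1, uint32_t op2)
{
    return op1 < op2;
}
```

With Operand_1 = 0xFFFFFFFF and Operand_2 = 1, the signed compare sets CC (since -1 < 1) while the unsigned compare clears it.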

MOVE CC INSTRUCTIONS

Dest OP CC; Dest is Dreg or statbit
CC OP Source; Source is Dreg or statbit
OP is =, |=, &=, ^= e.g. R0 |= CC; Note: for CC = Dreg, CC = 1 if Dreg != 0
NEGATE CC INSTRUCTIONS

CC = ! CC;

MOVE INSTRUCTIONS

genreg = genreg ; genreg = dagreg ; dagreg = genreg ;
dagreg = dagreg ; genreg = USP ; USP = genreg ;
Dreg = sysreg ;/* sysreg to 32-bit D-register */
sysreg = Dreg ;/* 32-bit D-register to sysreg */
sysreg = Preg ;/* 32-bit P-register to sysreg */ sysreg = USP;
A0 = A1 ; /* move 40-bit Accumulator value */
A1 = A0 ; /* move 40-bit Accumulator value */
A0 = Dreg ; /* 32-bit D-register to 40-bit A0, sign extended */
A1 = Dreg ; /* 32-bit D-register to 40-bit A1, sign extended */
Accumulator to D-register Move:
Dreg_even = A0 (opt_mode) ;/* move 32-bit A0.W to even Dreg */
Dreg_odd = A1 (opt_mode) ;/* move 32-bit A1.W to odd Dreg */
Dreg_even = A0, Dreg_odd = A1 (opt_mode) ;
/* move both Accumulators to a register pair */
Dreg_odd = A1, Dreg_even = A0 (opt_mode) ;
/* move both Accumulators to a register pair */
IF CC DPreg = DPreg ; /* move if CC = 1 */ DPreg is Dreg, Preg, SP, FP
IF ! CC DPreg = DPreg ; /* move if CC = 0 */ DPreg is Dreg, Preg, SP, FP
Dreg = Dreg_lo (Z) ; Dreg = Dreg_lo (X) ;
Dreg = Dreg.B (Z); Dreg = Dreg.B (X); lowest 8 bits
Acc.X = Dreg_lo; Least significant 8-bits moved
Dreg_lo = Acc.X; 8 bits moved, sign extended
Acc.L = Dreg_lo; Least significant 16-bits moved
Dreg_lo = Acc.L; 16 bits moved
Acc.H = Dreg_hi; Most significant 16-bits moved
Dreg_hi = Acc.H; 16 bits moved
Accumulator to Half D-register Move supports the following options
Signed fraction format (default).
Unsigned fraction format (saturated) (FU).
Signed and unsigned integer formats (IS) (IU).
Signed fraction with truncation (T),
Signed fraction with scaling and rounding (S2RND),
Signed integer with scaling (ISS2),
Signed integer with high word extract (IH) MORE INFO TO BE ADDED

STACK INSTRUCTIONS -- SP points to the last used location

[ -- SP] = allreg; allreg = [SP ++];
[ -- SP] = ( R7 : Dreglimit, P5 : Preglimit ); or the Dreg or Preg range on its own
LINK uimm (Manual says minimum value is 8, but LINK 0 and LINK 4 seem OK)
Saves RETS and FP on stack, copies SP into FP and then decrements SP by uimm
UNLINK causes FP -> SP, then Mem[SP++] -> FP, Mem[SP++] -> RETS
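The LINK / UNLINK sequence above can be traced with a small C model of the stack. This is a sketch under stated assumptions: the cpu_model type, the 256-byte stack and the function names are hypothetical, and addresses are byte offsets into a host array rather than real Blackfin addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: a small stack plus the three registers involved. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t sp;    /* byte offset into mem */
    uint32_t fp;
    uint32_t rets;
} cpu_model;

static void push(cpu_model *c, uint32_t value)   /* [--SP] = value */
{
    assert(c->sp >= 4 && c->sp % 4 == 0);
    c->sp -= 4;
    c->mem[c->sp / 4] = value;
}

static uint32_t pop(cpu_model *c)                /* value = [SP++] */
{
    assert(c->sp + 4 <= STACK_BYTES && c->sp % 4 == 0);
    uint32_t value = c->mem[c->sp / 4];
    c->sp += 4;
    return value;
}

/* LINK frame_bytes: save RETS and FP, FP = SP, then reserve the frame. */
void link_frame(cpu_model *c, uint32_t frame_bytes)
{
    push(c, c->rets);
    push(c, c->fp);
    c->fp = c->sp;
    assert(c->sp >= frame_bytes);
    c->sp -= frame_bytes;
}

/* UNLINK: SP = FP, then restore FP and RETS. */
void unlink_frame(cpu_model *c)
{
    c->sp = c->fp;
    c->fp = pop(c);
    c->rets = pop(c);
}
```

After link_frame the old FP sits at FP + 0 and RETS at FP + 4, matching OLD_FP_LOCATION_ON_STACK and RETS_LOCATION_ON_STACK in the parameter passing example.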

LOGICAL INSTRUCTIONS

Dreg = Dreg1 LOGICAL_OP Dreg2; LOGICAL_OP - &, |, ^
Dreg = ~Dreg1; complement
Also BXOR and BXORSHIFT -- more later

BIT INSTRUCTIONS

BitInstruction (Dreg, bit_position); where bit_position is 0 to 31
BitInstruction is BITCLR (clear), BITSET (set), BITTGL (toggle)
CC = BITTST (Dreg, bit_position); Bit test, CC = 1 if the bit is set
CC = !BITTST (Dreg, bit_position); Inverted bit test, CC = 1 if the bit is clear
R0 = R1.B (X); R0 = R1.B (Z); // Extract a byte value and sign or zero extend it
// CAN'T DO MATH ON A BYTE VALUE DIRECTLY
Dreg = DEPOSIT ( backgroundDreg, foregroundDreg ) ;
Dreg = DEPOSIT ( Dreg, Dreg ) (X) ;/* sign-extended */
Foreground format: bits 31 to 16, pattern to be moved,
bits 15 to 8, position in backgroundDreg where last (right) bit is moved
bits 7 to 0, length (number of low-order pattern bits from bits 31 to 16) to be moved
R7 = DEPOSIT(R4, R3);
R4=0b11111111111111111111111111111111
R3=0b0000000000000000 00000111 00000011
R7=0b11111111111111111111110001111111
R7 = DEPOSIT(R4, R3) (X) ;/* sign-extended*/
R4=0b11111111111111111111111111111111
R3=0b0101101001011010 00000111 00000011
R7=0b00000000000000000000000101111111
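The two DEPOSIT examples above can be reproduced with a C model of the bit-field insert. This is a sketch, not a definitive description of the hardware: the function name deposit is hypothetical, only the low 5 bits of the position and length fields are used, and a length of 0 is assumed to leave the background unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Dreg = DEPOSIT (background, foreground), optionally (X). */
uint32_t deposit(uint32_t background, uint32_t foreground, int sign_extend)
{
    unsigned len = foreground & 0x1Fu;          /* field length          */
    unsigned pos = (foreground >> 8) & 0x1Fu;   /* position of right bit */
    uint32_t pattern = foreground >> 16;        /* bits to be moved      */

    if (len == 0) {
        return background;                      /* assumption, see above */
    }

    uint32_t low_mask = (1u << len) - 1u;
    uint32_t mask = (uint32_t)((uint64_t)low_mask << pos);
    uint32_t field = (uint32_t)((uint64_t)(pattern & low_mask) << pos);
    uint32_t result = (background & ~mask) | field;

    unsigned top = pos + len;                   /* first bit above field */
    if (sign_extend && top < 32) {
        uint32_t upper = 0xFFFFFFFFu << top;
        if ((result >> (top - 1)) & 1u) {
            result |= upper;                    /* field MSB is 1 */
        } else {
            result &= ~upper;                   /* field MSB is 0 */
        }
    }
    return result;
}
```

Calling deposit(0xFFFFFFFF, 0x00000703, 0) gives 0xFFFFFC7F and deposit(0xFFFFFFFF, 0x5A5A0703, 1) gives 0x0000017F, the same bit patterns as the R7 values listed above.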
Dreg = EXTRACT ( sceneDreg, patternDreg_lo ) (Z) ;
Dreg = EXTRACT (Dreg, Dreg_lo ) (X) ;/* sign-extended (b)*/
PatternDreg_lo format: bits 15 to 8, position in sceneDreg of the last (right) bit extracted
bits 7 to 0, length of the field to be extracted from sceneDreg
R7 = EXTRACT (R4, R3.L) (Z) ;/* zero-extended*/
R4=0b10100101101001011100001110101010
R3=0bxxxxxxxxxxxxxxxx00000111 00000100
R7=0b00000000000000000000000000000111
R7 = EXTRACT (R4, R3.L) (X) ;/* sign-extended*/
R4=0b10100101101001011100001110101010
R3=0bxxxxxxxxxxxxxxxx00000111 00000100
R7=0b00000000000000000000000000000111
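The EXTRACT examples above can be checked the same way. The C sketch below models the bit-field extract under the same assumptions as before: the function name extract is hypothetical, only the low 5 bits of the position and length fields are used, and a length of 0 is assumed to return 0.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Dreg = EXTRACT (scene, pattern_lo) (Z) or (X). */
uint32_t extract(uint32_t scene, uint16_t pattern_lo, int sign_extend)
{
    unsigned len = pattern_lo & 0x1Fu;          /* field length          */
    unsigned pos = (pattern_lo >> 8) & 0x1Fu;   /* position of right bit */

    if (len == 0) {
        return 0;                               /* assumption, see above */
    }

    uint32_t low_mask = (1u << len) - 1u;
    uint32_t field = (scene >> pos) & low_mask;

    /* (X): copy the top bit of the field into all higher bits. */
    if (sign_extend && ((field >> (len - 1)) & 1u)) {
        field |= ~low_mask;
    }
    return field;
}
```

For R4 = 0xA5A5C3AA and R3.L = 0x0704 (position 7, length 4) both the (Z) and (X) versions return 7, because the top bit of the extracted field 0b0111 is 0.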
BITMUX ( Dreg , Dreg , A0 ) (ASR) ;/* shift right, LSB is shifted out */
BITMUX ( Dreg , Dreg , A0 ) (ASL) ;/* shift left, MSB is shifted out */
In the Shift Right version, the processor performs the following sequence.
1. Right shift Accumulator A0 by one bit. Right shift the LSB of source_1 into the MSB of the Accumulator.
2. Right shift Accumulator A0 by one bit. Right shift the LSB of source_0 into the MSB of the Accumulator.
In the Shift Left version, the processor performs the following sequence.
1. Left shift Accumulator A0 by one bit. Left shift the MSB of source_0 into the LSB of the Accumulator.
2. Left shift Accumulator A0 by one bit. Left shift the MSB of source_1 into the LSB of the Accumulator.

Dreg_lo = ONES Dreg; returns the number of bits set in Dreg
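For comparison, the count returned by ONES can be computed in C with a simple loop. This is a host-side sketch of the result only; the function name ones is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits that are set in a 32-bit value. */
unsigned ones(uint32_t value)
{
    unsigned count = 0;
    while (value != 0) {
        count += value & 1u;  /* add the lowest bit */
        value >>= 1;          /* move to the next bit */
    }
    return count;
}
```

The result always fits in a half register, since it ranges from 0 to 32.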

SHIFT / ROTATE INSTRUCTIONS

dest_pntr = (dest_pntr + src_reg) << 1; Down shift not allowed
dest_pntr = (dest_pntr + src_reg) << 2;
dest_reg = (dest_reg + src_reg) << 1;
dest_reg = (dest_reg + src_reg) << 2;
dest_pntr = adder_pntr + ( src_pntr << 1 );
dest_pntr = adder_pntr + ( src_pntr << 2 );
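The combined add and shift forms above are convenient for array indexing, where a shift by 1 or 2 scales an index to 16-bit or 32-bit elements. The C sketch below models the two forms; the function names are hypothetical and the only shift amounts accepted are the 1 and 2 listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Models dest = (dest + src) << shift; with shift of 1 or 2. */
uint32_t add_then_shift(uint32_t dest, uint32_t src, unsigned shift)
{
    assert(shift == 1 || shift == 2);
    return (dest + src) << shift;
}

/* Models dest = adder + (src << shift); with shift of 1 or 2. */
uint32_t shift_then_add(uint32_t adder, uint32_t src, unsigned shift)
{
    assert(shift == 1 || shift == 2);
    return adder + (src << shift);
}
```

For example, shift_then_add(0x1000, 3, 2) gives 0x100C, the address of element 3 in an array of 32-bit values that starts at 0x1000.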
ARITHMETIC SHIFT ASHIFT or >>>
dest_reg >>>= shift_magnitude;
dest_reg = src_reg >>> shift_magnitude (opt_sat);
dest_reg = src_reg << shift_magnitude (S);
accumulator = accumulator >>> shift_magnitude;
dest_reg = ASHIFT src_reg BY shift_magnitude (opt_sat);
accumulator = ASHIFT accumulator BY shift_magnitude;
LOGICAL SHIFT LSHIFT or >> and <<
dest_pntr = src_pntr >> 1; dest_pntr = src_pntr << 1;
dest_pntr = src_pntr >> 2; dest_pntr = src_pntr << 2;
dest_reg >>= shift_magnitude; dest_reg <<= shift_magnitude;
dest_reg = src_reg >> shift_magnitude;
dest_reg = src_reg << shift_magnitude;
dest_reg = LSHIFT src_reg BY shift_magnitude;
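The difference between an arithmetic and a logical right shift shows up only when the top bit is set. The C sketch below models both on a 32-bit register pattern; the function names are hypothetical, and the shift amount is limited to 0 to 31 for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: zeros are shifted in at the top. */
uint32_t logical_shift_right(uint32_t bits, unsigned amount)
{
    assert(amount < 32);
    return bits >> amount;
}

/* Arithmetic right shift: copies of the sign bit are shifted in. */
uint32_t arithmetic_shift_right(uint32_t bits, unsigned amount)
{
    assert(amount < 32);
    uint32_t result = bits >> amount;
    if ((bits & 0x80000000u) && amount > 0) {
        result |= ~(0xFFFFFFFFu >> amount);  /* fill vacated bits with 1 */
    }
    return result;
}
```

Shifting 0x80000000 right by 4 gives 0x08000000 logically and 0xF8000000 arithmetically, so only the arithmetic form keeps a negative number negative.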
ROTATE
dest_reg = ROT src_reg BY rotate_magnitude;
accumulator_new = ROT accumulator_old BY rotate_magnitude;

PARALLEL OPERATION EXAMPLES

32-bit ALU/MAC instruction || 16-bit instruction || 16-bit instruction ;
saa (r1:0, r3:2) || r0=[i0++] || r2=[i1++] ;
mnop || r1 = [i0++] || r3 = [i1++] ;
r7.h=r7.l=sign(r2.h)*r3.h + sign(r2.l)*r3.l || i0+=m3 || r0=[i0] ;
NOTE: If two parallel memory operations, only one can involve a Preg
NOTE: If two parallel memory operations, then only one can be a write
EXTERNAL EVENT MANAGEMENT
NOP 16-bit NOP MNOP 32-bit NOP e.g. MNOP || NOP || NOP ;
IDLE; CSYNC; (core sync), SSYNC; (system sync),
CLI Dreg (clear interrupts and save the old interrupt mask to Dreg),
STI Dreg (set interrupts from Dreg),
RAISE uimm4 (force interrupt, effectively a software interrupt of any interrupt)
EXCPT uimm4 (force exception, effectively a software interrupt of any exception)
TESTSET (Preg): The Test and Set Byte (Atomic) instruction loads an indirectly addressed memory byte, tests whether it is zero, then sets the most significant bit of the memory byte without affecting any other bits. If the byte is originally zero, the instruction sets the CC bit. If the byte is originally nonzero, the instruction clears the CC bit.
The sequence of this memory transaction is atomic, meaning it cannot be broken into by interrupts, unlike the separate sequence
Read memory into R0, test R0, if it is zero then set R0 = 1, store R0 back to memory.
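The effect of TESTSET on memory and on CC can be modelled with a C11 atomic operation. This is a host-side sketch of the behaviour described above, not Blackfin code; the function name testset and the use of atomic_uchar as the lock byte are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Model of TESTSET (Preg):
   atomically set the most significant bit of the byte and
   return the CC value (1 if the byte was originally zero). */
int testset(atomic_uchar *byte)
{
    assert(byte != NULL);
    unsigned char original = atomic_fetch_or(byte, 0x80);
    return original == 0;
}
```

Used as a simple lock, the caller that sees a return value of 1 has acquired the lock, and any later caller sees 0 until the byte is written back to zero.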

ARITHMETIC INSTRUCTIONS

dest_reg = ABS src_reg; dest_reg = src_reg_1 + src_reg_2;
NOTE: dest_reg.LorH = src_reg1.LorH + src_reg2.LorH (mode); mode = (NS) or (S)
// Arithmetic is non-saturating (NS) or saturating (S); normal math is NS
NOTE: dest_reg = src_reg_1 +|- src_reg_2; H + H and L - L operations both done
// Can also do + | +, + | -, - | +, - | -
Dreg_lo_hi = Dreg + Dreg (RND20) ; Dreg_lo_hi = Dreg - Dreg (RND20) ;
STEP 1: downshift operands by 4, STEP 2: perform operation, STEP 3: round and use top 16 bits (fractional number)
Dreg_lo_hi = Dreg + Dreg (RND12) ; Dreg_lo_hi = Dreg - Dreg (RND12) ;
STEP 1: upshift operands by 4, STEP 2: perform operation, STEP 3: round and use top 16 bits
Dreg = MAX ( Dreg , Dreg ) ; Dreg = MIN ( Dreg , Dreg ) ;
Preg -= Preg ; Ireg -= Mreg ;
Preg += Preg (BREV) ; Ireg += Mreg (opt_brev) ;
dest_reg = src_reg_0 * src_reg_1 (opt_mode) (16 bit mult)
Dreg *= Dreg ; (32 bit mult)
accumulator = src_reg_0 * src_reg_1 (opt_mode)
accumulator += src_reg_0 * src_reg_1 (opt_mode)
accumulator -= src_reg_0 * src_reg_1 (opt_mode)
dest_reg_half = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
dest_reg = - src_reg; dest_accumulator = - src_accumulator
dest_reg = src_reg (RND) (32 bit to 16 bit round and saturate)
accumulator = accumulator (S)
dest_reg = SIGNBITS sample_register
dest_reg = src_reg_1 - src_reg_2; Ireg -= 2 ; Ireg -= 4 ;
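The (NS) and (S) modes of the 16-bit add noted earlier in this section differ only when the result overflows. The C sketch below models a signed 16-bit add in both modes; the function name add16 is hypothetical and the code is a host-side illustration of wrap-around versus saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16-bit add.
   saturate = 0 models (NS): the result wraps around.
   saturate = 1 models (S):  the result is clamped to the 16-bit range. */
int16_t add16(int16_t a, int16_t b, int saturate)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact result */

    if (saturate) {
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        return (int16_t)sum;
    }

    /* Non-saturating: keep the low 16 bits and reinterpret as signed. */
    uint16_t low = (uint16_t)sum;
    if (low & 0x8000u) {
        return (int16_t)((int32_t)low - 0x10000);
    }
    return (int16_t)low;
}
```

Adding 1 to 0x7FFF gives 0x7FFF (32767) in saturating mode but wraps to 0x8000 (-32768) in non-saturating mode.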
VIDEO PIXEL INSTRUCTIONS
ALIGN8, ALIGN16, ALIGN24, DISALGNEXCPT, BYTEOP3P (Dual 16-Bit Add / Clip), Dual 16-Bit Accumulator Extraction with Addition, BYTEOP16P (Quad 8-Bit Add), BYTEOP1P (Quad 8-Bit Average – Byte), BYTEOP2P (Quad 8-Bit Average – Half-Word), BYTEPACK (Quad 8-Bit Pack), BYTEOP16M (Quad 8-Bit Subtract), SAA (Quad 8-Bit Subtract-Absolute-Accumulate), BYTEUNPACK (Quad 8-Bit Unpack)

VECTOR INSTRUCTIONS basically two 16-bit operations

Add on Sign, VIT_MAX (Compare-Select), Vector Arithmetic Shift, Vector Logical Shift, Vector MIN, Vector Multiply, Vector Multiply and Multiply-Accumulate, Vector Negate (Two’s Complement), Vector PACK, Vector SEARCH
Example Vector Add / Subtract
dest = src_reg_0 +|+ src_reg_1;
Example Vector MAX
dest_reg = MAX ( src_reg_0, src_reg_1 ) (V)
Example Vector ABS
dest_reg = ABS source_reg (V)

Programmable flags (PF) registers
Note that FIO_FLAG_D bits are set during edge-triggered interrupts and must be cleared

NOTE: The following have a similar format
FIO_MASKA_C (Clear – W1C)
FIO_MASKA_T (Toggle – W1T)
There are also FIO_MASKB registers with the same functionality
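The write-one-to-clear (W1C) and write-one-to-toggle (W1T) behaviour named above can be modelled in C. This is a sketch of the register semantics only, not real memory-mapped register access: the flag_reg type and the function names are hypothetical, and the write-one-to-set variant is included as an assumption by analogy with the clear and toggle forms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 16-bit flag or mask register. */
typedef struct {
    uint16_t data;
} flag_reg;

/* Write-one-to-clear: each 1 bit written clears that bit. */
void write_clear(flag_reg *reg, uint16_t value)
{
    assert(reg != 0);
    reg->data = (uint16_t)(reg->data & (uint16_t)~value);
}

/* Write-one-to-set (assumed variant): each 1 bit written sets that bit. */
void write_set(flag_reg *reg, uint16_t value)
{
    assert(reg != 0);
    reg->data = (uint16_t)(reg->data | value);
}

/* Write-one-to-toggle: each 1 bit written inverts that bit. */
void write_toggle(flag_reg *reg, uint16_t value)
{
    assert(reg != 0);
    reg->data = (uint16_t)(reg->data ^ value);
}
```

The point of these forms is that a single write changes only the bits written as 1, so no read-modify-write sequence is needed to change one flag.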
WATCH-DOG TIMER

INTERRUPT CONTROL

IPEN has same format as ILAT but is read only

CORE TIMER

SPI transmit and receive registers

SPI INTERFACE

TIMER0, TIMER1, TIMER2
All three timers have equivalent registers

There is also an equivalent Timer disable register (write one to clear)
EVENT TABLE