ENCM415 Blackfin ADSP-BF533 Assignment 3, 2007
Handling 64-bit operations on a Blackfin
Only 1 report needed from you and your laboratory partner
Due Tuesday 16th October, 10 p.m.
(25% penalty if late up to 1 day, zero marks after that as answers will be posted)

The following letter was received in the Dean’s Office from Santa Claus early in September

Santa’s House

North Pole,

Canada H0H0H0

Dear ENCM415 student,

I have tried to become a “technology-aware” Santa Claus. I went down the chimney of a number of Grade 12 students and placed ADSP-BF533 evaluation boards into their stockings; but nothing has gone right. 

They claim that when they use the evaluation board for 1 second or 5 seconds, they find that the processor performs well – giving over 300 MHz clock rates. BUT when they use the board for 10 seconds, the processor gets tired and slows down to only 6 MHz. To prove this fact, they have sent me some of their test files and the screen capture below

Unless I can find out (A) why they got such strange results, and (B) fix the problem so I can show them that the processor does not get tired, my reputation is ruined. Can you help?

Father Christmas

P.S. I know you guys are working hard. I have told your instructor that I will be putting coal in his stocking for next Christmas unless he lets you use the same assembly language functions you develop for this assignment during Lab. 2 to cut down your workload!

Assignment 3 details

If you go to the ENCM415 Assignment 3 web page, you will be able to download many prepared files into your ENCM415Lab Assign3 directory to cut down on the amount of typing you need to do to complete this assignment. You will need to generate an Assignment3.prj to run the code provided and the code you develop.

The grade 12 students sent Santa the following “main.cpp” file

#include "../Lab1/Lab1.h"

#include "../Assign3/Assignment3Functions.h"

int main( ) {

Initialize_GPIOFlagsASM( );// Set up the GPIO interface

if (CheckCableError( ) == true) return(1); // Check Cable

InitFlashASM( );// Set up the Flash memory

InitFlashPortASM( ); // Set up Flash LED port

puts("\nYou will need to adjust the constant COUNT_1_SECOND");

puts("in file \"Assignment3Functions.h\" to get the lights to");

puts("flash every second -- otherwise speed calculation is incorrect");

FlashLEDLightsEverySecondFor5Seconds( );

puts("\n");

CalculateBlackfinSpeed(1); // Calculate Blackfin speed over 1 second

CalculateBlackfinSpeed(2);// over 2 seconds

CalculateBlackfinSpeed(5);// over 5 seconds

CalculateBlackfinSpeed(10);// over 10 seconds

CalculateBlackfinSpeed(1); // Calculate Blackfin speed over 1 second again

}

If you look at the provided files you will see that the students have built a function

void Count1Second(void) {

unsigned int count;

for (count = 0; count < COUNT_1_SECOND; count++)

count = count;

}

that simply “spins the processor” for (roughly) 1 second, provided YOU set the constant COUNT_1_SECOND correctly. To set this constant, the grade 12 students used some Lab. 1 code from the web to flash the LEDs using the Count1Second( ) function

void FlashLEDLightsEverySecondFor5Seconds(void) {

int count = 0;// Increment LED roughly every 1 second for 5 seconds

// Need to adjust constant COUNT_1_SECOND

for (count = 0; count < 5; count++) { // to make the light flash time reasonably accurate

WriteFlashLEDASM(count);

Count1Second( );

}

}

Later you will be able to use that routine, and your wristwatch, to find the best value for the constant COUNT_1_SECOND.
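One way to turn a wristwatch measurement into a better constant is to scale your trial value by the ratio of the intended time to the measured time. The sketch below assumes the run time of the spin loop is proportional to COUNT_1_SECOND. The helper name RescaleCount and the sample numbers are hypothetical and are not part of the provided files.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the provided files): rescale a trial value of
// COUNT_1_SECOND after timing the LED test with a wristwatch.
// Assumption: loop time is proportional to the loop count.
std::uint32_t RescaleCount(std::uint32_t trialCount,
                           std::uint32_t intendedSeconds,
                           std::uint32_t measuredSeconds) {
    assert(measuredSeconds > 0);  // a zero measurement would divide by zero
    // Use a 64-bit intermediate so trialCount * intendedSeconds cannot wrap.
    std::uint64_t scaled =
        static_cast<std::uint64_t>(trialCount) * intendedSeconds;
    return static_cast<std::uint32_t>(scaled / measuredSeconds);
}
```

With made-up numbers: if a trial value of 1000000 made the 5-flash test last 20 seconds, RescaleCount(1000000, 5, 20) suggests trying 250000 next. Repeat the measurement after each change, since the 15% accuracy target only needs a couple of rounds.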

To measure how fast the processor was “clocking” the students called this routine

void CalculateBlackfinSpeed(int numSeconds) { // Number of seconds to run the speed test

unsigned long long int firstCount, secondCount, difference; // Location to store the 64-bit CYCLES register value

StopCycleCounterASM();

ResetCycleCounterASM();

StartCycleCounterASM();

firstCount = ReadCycleCounterASM( ); // Find starting time

for (int count = 0; count < numSeconds; count++)

Count1Second( );

StopCycleCounterASM(); // Stop system counter

secondCount = ReadCycleCounterASM( ); // Find finishing time

difference = secondCount - firstCount; // Find total number of cycles

printf("If constant COUNT_1_SECOND properly adjusted\n");

printf("so that lights were flashing every 1 second in earlier test\n");

printf("Then %llu Blackfin Cycles in %d seconds\n", difference, numSeconds);

printf("Blackfin operating at %llu MHz\n\n", difference / 1000000LL / numSeconds);

}
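The speed printed by the last line of this routine is plain integer arithmetic: the cycle count is divided by one million and then by the number of seconds. A minimal sketch of that calculation on its own, using a hypothetical helper name CyclesToMHz and made-up cycle counts:

```cpp
#include <cassert>

// Hypothetical helper mirroring the printf expression
//     difference / 1000000LL / numSeconds
// used in CalculateBlackfinSpeed().
unsigned long long CyclesToMHz(unsigned long long cycles, int numSeconds) {
    assert(numSeconds > 0);  // avoid dividing by zero
    // Both divisions are integer divisions, so fractions of a MHz are dropped.
    return cycles / 1000000ULL / static_cast<unsigned long long>(numSeconds);
}
```

For example, 1500000000 cycles counted over 5 seconds gives CyclesToMHz(1500000000ULL, 5) == 300. The result is only as good as the cycle count and the calibration of COUNT_1_SECOND that feed it.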

You will need to add the file SystemCycleReader.asm to your Assign3 directory. This file already contains the code for void StopCycleCounterASM(void) and void StartCycleCounterASM(void) -- routines that stop and start the Blackfin CYCLES system register (background clock) that is used to time how long the Blackfin takes to execute code. Later in this assignment you will have to complete the functions
unsigned long long int ReadCycleCounterASM(void);
unsigned long long int ResetCycleCounterASM(void);

In the meantime, simply construct stubs for these functions – placing zeros in registers R1 and R0 inside the stubs. As we will see later, the Blackfin uses the 32-bit register R1 as the high 32 bits of a 64-bit unsigned long long value and uses the 32-bit register R0 as the low 32 bits of that value when returning unsigned long long ints from functions. Check your ENCM369 notes to see why this is necessary; the same usage of 2 registers for unsigned long long ints happens with the MIPS processors.
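The high/low split described above can be modelled in C++ before you touch any assembly code. This is only an illustrative sketch: the struct RegisterPair and the helpers Split64 and Join64 are hypothetical names, with the field high standing in for R1 and the field low standing in for R0.

```cpp
#include <cstdint>

// Model of the register pair used for a 64-bit return value.
struct RegisterPair {
    std::uint32_t high;  // upper 32 bits (plays the role of R1)
    std::uint32_t low;   // lower 32 bits (plays the role of R0)
};

// Break a 64-bit value into its two 32-bit halves.
RegisterPair Split64(std::uint64_t value) {
    RegisterPair pair;
    pair.high = static_cast<std::uint32_t>(value >> 32);
    pair.low = static_cast<std::uint32_t>(value & 0xFFFFFFFFULL);
    return pair;
}

// Rebuild the 64-bit value from the two halves.
std::uint64_t Join64(RegisterPair pair) {
    return (static_cast<std::uint64_t>(pair.high) << 32) | pair.low;
}
```

For example, Split64(0x0002000000040000ULL) yields high = 0x00020000 and low = 0x00040000, and Join64 of that pair gives back the original value. A stub that places zero in both registers corresponds to returning the pair {0, 0}.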

If you now build the Assign3 project, with the stubs added, the code will work to activate the lights.

Q1) A) What value of COUNT_1_SECOND is needed to have the lights flash every second? Accuracy of around 15% is okay. Express the value in hex and decimal. 3 marks

B) Get a letter from another group to indicate that you have built the Assign3 project and have the lights flashing every 1 second. 5 marks

Q2) Load the Assign3 executable onto the processor. Now click on the Count1Second.cpp file to bring it into the editor window. Right click in the window and select the option “Mixed”. You should now see both the C++ code and the assembly code.

Capture a screen shot of the code and place it in your report. 2 marks

Answer the following questions based on your screen shot

A) What is the memory location (in section program) for the start of the Count1Second( ) function?

B) What is the memory location (in section program) for the end of the Count1Second( ) function?

C) What register is being used for the count variable?

D) What is the memory location (in section program) where the test count < COUNT_1_SECOND is made?

E) What is the memory location (in section program) where the operation count++ is performed?

5 marks

Q3. We can now ask the compiler to generate optimized code.

Switch the compiler setting from “debug (non-optimized)” to “release (optimized)” – see pictures below

Rerun your code and watch the lights – explain what you see. Was the speed improvement about what you expected? 2 marks

In order to be able to see both the C++ code and the assembly code of an optimized piece of code, you must modify the project options to “generate debug information” – see picture below

Once you have set the project options, rebuild the “release” version of the code.

Load the Assign3 executable onto the processor. Now click on the Count1Second.cpp file to bring it into the editor window. Right click in the window and select the option “Mixed”. You should now see both the C++ code and the assembly code.

Capture a screen shot of the code and place it in your report. 2 marks

Answer the following questions based on your screen shot

A) What is the memory location (in section program) for the start of the Count1Second( ) function?

B) What is the memory location (in section program) for the end of the Count1Second( ) function?

C) What register is being used for the count variable?

D) What is the memory location (in section program) where the test count < COUNT_1_SECOND is made?

E) What is the memory location (in section program) where the operation count++ is performed?

2 marks

BONUS – You can add ONE keyword to the Count1Second.cpp code and restore (most of) the code to this function in an optimized format. What is that keyword, where does it go, why does it work, and what optimizations are present in the code? TO GET THIS BONUS, YOU MUST WORK OUT THE ANSWER YOURSELF. DON’T ASK ANYBODY ELSE, INCLUDING T.A.S AND INSTRUCTOR.

HINT: The keyword has already been mentioned in class
6 marks

SWITCH BACK TO DEBUG MODE AND REBUILD THE CODE

We now need to build the functions

unsigned long long int ReadCycleCounterASM(void); // Return the 64-bit value of the Blackfin CYCLES register

unsigned long long int ResetCycleCounterASM(void); // Reset the 64-bit value of the Blackfin CYCLES register to zero and return the old value (before reset) of the Blackfin CYCLES register

I’ll give you a few hints about handling 64 bit registers on the Blackfin – it is very similar to what you have been taught for the MIPS

Let’s add two 64-bit values together in C++

unsigned long long int first = 0x0002 0000 0004 0000; // spaces present for ease of understanding
unsigned long long int second = 0x0005 0000 0001 0000;
unsigned long long int result = first + second;

Answer expected is 0x0007 0000 0005 0000

The Blackfin assembly language (done in pseudo code to avoid writing all the hi( ) and lo( ) operations) is like this

unsigned long long int first = 0x0002 0000 0004 0000; // spaces present for ease of understanding
R0 = 0x0004 0000 // Use this pair of registers for first
R1 = 0x0002 0000
unsigned long long int second = 0x0005 0000 0001 0000;
R2 = 0x0001 0000 // Use this pair of registers for second
R3 = 0x0005 0000
unsigned long long int result = first + second;
R4 = R2 + R0; // Use this pair of registers for result
R5 = R3 + R1; // We need to worry about “carries” if R2 + R0 overflows.
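The “carries” remark can be checked in C++ by doing the same addition on two 32-bit halves. This is a sketch of the idea only, not the Blackfin instruction sequence: the struct Pair64 and the function Add64 are hypothetical names, and the carry is detected by noticing that an unsigned 32-bit sum that wrapped around is smaller than either operand.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, like a pair of data registers.
struct Pair64 {
    std::uint32_t high;  // upper 32 bits
    std::uint32_t low;   // lower 32 bits
};

// Add two 64-bit values one 32-bit half at a time.
Pair64 Add64(Pair64 first, Pair64 second) {
    Pair64 result;
    result.low = first.low + second.low;  // unsigned add wraps modulo 2^32
    // If the low sum wrapped, it is smaller than first.low: carry out of bit 31.
    std::uint32_t carry = (result.low < first.low) ? 1u : 0u;
    result.high = first.high + second.high + carry;
    return result;
}
```

With the values used above, Add64 on {0x00020000, 0x00040000} and {0x00050000, 0x00010000} gives {0x00070000, 0x00050000} with no carry. Adding {0, 0xFFFFFFFF} and {0, 1} gives {1, 0}, which is the case where ignoring the carry would produce the wrong high half.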

Using this information we can write the assembly code section to “reset” the 64-bit Blackfin CYCLES register.

BIG HINT for the rest of the assignment – read the hardware and software manuals about the CYCLES register – about a page in each booklet – before completing the writing of the code

#include "macros.h"
#include "defsBF533.h"

// Manual says that CYCLES2 is the name of the high 32-bits of the 64-bit CYCLES system register
// and CYCLES is the name of the low 32-bits of the 64-bit CYCLES system register

// CYCLES, CYCLES2, P0, P1, STAT are all system registers, not memory mapped registers (Quiz hint – what is the difference between system registers and memory mapped registers when used in code). Most system registers are NOT like data or pointer registers – you cannot put a constant into a system register unless that system register is a data or pointer register. You must first put that value into a data register and then TRANSFER the value into the system register

// unsigned long long int zeroULL = 0x0000 0000 0000 0000

#define zeroHighBits_R1 R1
#define zeroLowBits_R0 R0
zeroHighBits_R1 = 0;
zeroLowBits_R0 = 0;

// CyclesRegister = zeroULL
// Zero high 32bits of CYCLES register then Zero low 32 bits of CYCLES register
CYCLES2 = zeroHighBits_R1;
CYCLES = zeroLowBits_R0;

Q4) Now build the functions
THESE WILL BE NEEDED IN LAB. 2 SO COMPLETE BEFORE COMING INTO THE LAB.

unsigned long long int ReadCycleCounterASM(void); // Return the 64-bit value of the Blackfin CYCLES register. High 32 bits of CYCLES returned in the R1 register, low 32 bits of CYCLES returned in the R0 register

unsigned long long int ResetCycleCounterASM(void); // Reset the 64-bit value of the Blackfin CYCLES register to zero and return the old value (before reset) of the Blackfin CYCLES register. High 32 bits of CYCLES returned in the R1 register, low 32 bits of CYCLES returned in the R0 register

add them to the Assignment 3 code.

Provide the screen of these documented assembly language functions following the required format for ENCM415 routines after they have assembled and linked correctly. 10 marks.

Q5) DON’T SPEND TOO LONG TRYING TO GET FULL MARKS ON THIS QUESTION.

Run your code and measure the speed of the Blackfin processor after running the code for 1 second, 5 seconds and 10 seconds. Does your code behave like the grade 12 students who wrote to Santa?

Write a short report to Santa answering the two questions he asked in his letter.

REMINDER: Unless I can find out (A) why they got such strange results, and (B) fix the problem so I can show them that the processor does not get tired, my reputation is ruined. Maximum 1 page 10 marks

There is a lot of documentation about how to use the CYCLES register in assembly code available from the Analog Devices website, although, as with most manufacturers’ websites, it is not easy to find.

HINT: The answers to Santa’s two questions can be found in the hardware and software manuals. This assignment shows the importance of reading those manuals, especially when you, or somebody else, finds that the processor does not work as expected.

Q5) Question from 2005 Final – Hand in answer on another sheet 10 marks

Design a documented Blackfin assembly code subroutine
void WaitForSignal(int *leftAudioChannel, long int threshold)

which will continually monitor the left audio channel and only return from this subroutine when there is a signal (higher than a certain noise threshold level) present on the left audio channel. Follow the coding conventions established for this course. This code should not make use of memory locations declared in other files.

To make life easier for the markers (and increase your chances of partial marks) please try to match your assembly code up (left side of the page) with the documentation (right side of the page).

The answer layout should look something like this

Blackfin assembly code / Documentation

Q6) Question from 2006 Final – PRACTICE – DO NOT HAND IN

In this question, you are going to demonstrate how to move values from one memory location to another memory location. Marks are given for making each loop efficient (You don’t have to have the most efficient solution – but your answer should be reasonably efficient). Use the Blackfin coding conventions established for this course

BLACKFIN ASSEMBLY CODE / C++ CODE
long int first[200];
void MemoryMove(long int size,
long int *second, long int *third) {
// TRANSLATE THIS LOOP USING A
// SOFTWARE LOOP 2.5 marks
for (int count = 0; count < size; count++) {
second[count] = first[count];
}
// TRANSLATE THIS LOOP USING A
// HARDWARE LOOP 2.5 marks
for (int count = 0; count < size; count++) {
third[count] = first[count];
}
}

B) Provide an estimate (% wise) of how much faster your hardware loop will run compared to your software loop. EXPLAIN how you obtained that estimate.