ENCM515 Final TAKE HOME QUIZ
To be done on an individual basis
You are not allowed to request T.A.
or other people’s assistance

For deadlines – see ENCM515 Web Pages
Probable completion time -- 3 hours
Suggested quit time -- 5 hours

In ENCM515, I used to have individual laboratory quizzes. Each student had 1 hour to do a laboratory exercise; highly stressful for both sides. Biggest problem was – if something did not work, I had to have a mechanism in place to prove that it was the student and not the equipment or the software being used. I gave up on that approach and now provide a take-home quiz.

The idea is for you to demonstrate knowledge gained during the course. There are several “paper” exercises and one laboratory exercise.

Q1. About 20 to 40 minutes to answer

Look at Listing 1. This code was part of an exercise where I “wanted” to demonstrate how C++ coding practices changed the speed of code execution. The question to be answered – did I actually prove anything?

I want you to complete this exercise “without doing any coding using VDSP”. Imagine you have been asked the question on the final.

A)  Write down the prototypes for these four functions.

Imagine you are explaining your answers to a third-year student who has completed ENCM415 (familiar with embedded system concepts but not with the TigerSHARC architecture). It is acceptable to cut-and-paste a picture of the TigerSHARC architecture into your answer to assist in the explanation.

B)  Write out a short program (main( ) ) that calls all these functions. Syntax is important, but don’t use VDSP to check your result – an occasional MINOR typo is irrelevant.
Each function inputs an array, processes it (performs complex conjugate), and outputs the processed array. Draw diagrams demonstrating the location of the array elements (specified in main) in memory – meaning what components of the arrays are stored where.

C)  Explain why, in principle, you would expect MakeConjugate( ) to be slower in execution than MakeConjugateFaster( ).

D)  Very philosophical question:- This code was generated when I first was trying to demonstrate TigerSHARC optimization. I naively believed that each function would be faster than the previous. Explain why, to somebody unfamiliar with TigerSHARC code optimization, this seemed at the time the right answer. What differences in the TigerSHARC architectures did I believe I was taking advantage of to gain speed?

E)  I asked the previous question because I believe that many people who are unfamiliar with the TigerSHARC architecture would believe the same thing. It might even be true if you compiled the code using the compiler in “non-optimized” mode (debug).
Using a resource management diagram (example shown below), calculate the theoretical best speed of each of these functions when processing N. Note, I am asking you to “calculate the theoretical best speed” of these functions, not code them – the coding exercise is done in question 3. Explain your answer.

Other / Adder / Multiplier / J-Bus, J-IALU / K-Bus, K-IALU
Set loop counter
Move j_inpar into K register

Q2. About 15 to 30 minutes to answer

In the TigerSHARC boot sequence there is the following sequence of instructions:
CACMDALL = CACMD_EN;; // enable cache - broadcast

rds;; // reduce interrupt to subroutine level

The concept of “enabling the cache” is something that has obvious meaning, even if I don’t understand the syntax; but what does “broadcast” mean, who is doing it, to whom and why?

Also, the next instruction – I think I understand the difference between interrupts and subroutines. Subroutines are called “from a specific location in the code”, save certain registers (not all), return using RTS or an equivalent instruction, and then continue with the instructions following the subroutine call:

Some code              Subroutine
Call subroutine        Save preserved registers
Continue here          Code
                       Recover preserved registers
                       RTS

whereas an interrupt is “hardware generated”, can occur and return “anywhere” within your code, requires saving “all registers” used and returns using RTI or equivalent.

Question is – why would you want to “reduce interrupt to subroutine level”, how does it happen, and what part of the program does the code return to with the RTS?
Listing 1

#include <math.h>

float MakeConjugate(float *in_re, float *in_im,
                    float *out_re, float *out_im, int number) {
    int count;

    for (count = 0; count < number; count++) {
        *out_re = *in_re;
        *out_im = -*in_im;
        out_re++; out_im++;
        in_re++; in_im++;
    } // NEW -- returning a parameter

    return(count); // Actually returning (float) count
}

float MakeConjugateFaster(float *in_re, float *in_im,
                          float pm *out_re, float pm *out_im, int number) {
    int count;

    for (count = 0; count < number; count++) {
        *out_re = *in_re;
        *out_im = -*in_im;
        out_re++; out_im++;
        in_re++; in_im++;
    } // NEW -- returning a parameter

    return(count); // Actually returning (float) count
}

float MakeConjugateFasterStill(float *in_re, float pm *in_im,
                               float pm *out_re, float *out_im, int number) {
    int count;

    for (count = 0; count < number; count++) {
        *out_re = *in_re;
        *out_im = -*in_im;
        out_re++; out_im++;
        in_re++; in_im++;
    } // NEW -- returning a parameter

    return(count); // Actually returning (float) count
}

float MakeConjugateFastestl(float *in_re, float pm *in_im,
                            float pm *out_re, float *out_im, int number) {
    int count;

    for (count = 0; count < number; count++) {
        *out_re++ = *in_re++;
        *out_im++ = -*in_im++;
    } // NEW -- returning a parameter

    return(count); // Actually returning (float) count
}
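
If you want to confirm what the functions in Listing 1 compute away from the TigerSHARC, the following is a minimal host-side sketch in standard C. It is an illustration only: the name conjugate_split is hypothetical, the pm qualifier is dropped because it is a compiler extension rather than standard C, and the return type is int instead of float. It says nothing about execution speed on the TigerSHARC.

```c
/* Hypothetical host-side helper, not part of Listing 1.
 * Conjugates `number` complex values stored as separate real and
 * imaginary arrays. Returns the number of elements processed. */
int conjugate_split(const float *in_re, const float *in_im,
                    float *out_re, float *out_im, int number)
{
    int count;

    for (count = 0; count < number; count++) {
        out_re[count] = in_re[count];    /* real part is copied unchanged */
        out_im[count] = -in_im[count];   /* imaginary part changes sign   */
    }
    return count;
}
```

Calling it with in_re = {1, 2, 3} and in_im = {4, -5, 0} leaves the real parts untouched and produces imaginary parts of -4, 5 and 0 (negative zero compares equal to zero).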


Q3. Based on ENCM515 Final exam 2001. Original question planned for 25 minutes – this one will take around 50 minutes

The figure below shows a 1 stage IIR filter.

A)  Explain what the terms A2D and D2A are meant to represent

B)  Draw the IIR filter when there are two equivalent IIR stages.

C)  Explain why it is totally pointless to optimize the IIR filter code unless the A2D and D2A are “always ready” (as compared to – processor must “wait till A2D is ready”)

D)  Explain why the coding (or possibly the hardware configuration) will be different (could be different) for a 12-bit A2D compared to when using a 24-bit A2D. The answer should demonstrate that you understand why there is an issue here.

E)  Explain why the following “pseudo code” will provide the wrong answer for a 1 stage IIR filter (there are about three reasons)
Take value after first delay – multiply by 0.9 – place result in register X
Read value from A2D and place in register Y
add X and Y to produce Z
Take value after second delay – multiply by 0.00255 – place result in register W
Subtract Z from W to produce V
Write V to D2A

F)  What is the theoretical best speed for a 2 stage IIR filter of this format on the TigerSHARC? Use a resource allocation chart.

G)  Now provide the true code for the optimized TigerSHARC 2 stage IIR filter.
NOTE: To reduce the complexity of the problem and the time you have to spend, I will accept a solution where we assume a TigerSHARC processor variant with a three stage pipeline so that values are “immediately” available and the Y registers are not used. However, if you want to go the whole hog – please do so.
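
When checking an optimized version in parts F and G, it can help to have a plain C reference to compare outputs against. The sketch below is one possible reference, written under loud assumptions: the quiz figure is not reproduced here, so the direct form II structure, the type iir_stage, the function names and the coefficient names are all hypothetical and may not match the filter in the figure.

```c
/* One second-order IIR section in direct form II.
 * ASSUMPTION: structure and coefficient names are illustrative only;
 * the filter in the quiz figure may be arranged differently. */
typedef struct {
    float b0, b1, b2;   /* feed-forward coefficients            */
    float a1, a2;       /* feedback coefficients                */
    float d1, d2;       /* delay-line state: w[n-1] and w[n-2]  */
} iir_stage;

/* Process one input sample through one stage. */
float iir_stage_step(iir_stage *s, float x)
{
    /* w[n] = x[n] - a1*w[n-1] - a2*w[n-2] */
    float w = x - s->a1 * s->d1 - s->a2 * s->d2;
    /* y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2] */
    float y = s->b0 * w + s->b1 * s->d1 + s->b2 * s->d2;

    /* Update the delay line only after both sums have used the old values. */
    s->d2 = s->d1;
    s->d1 = w;
    return y;
}

/* Process one input sample through a cascade of stages:
 * the output of each stage is the input of the next. */
float iir_cascade_step(iir_stage *stages, int nstages, float x)
{
    int i;

    for (i = 0; i < nstages; i++)
        x = iir_stage_step(&stages[i], x);
    return x;
}
```

A two-stage filter is then an array of two iir_stage entries passed to iir_cascade_step once per sample read from the A2D.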


Q4) After Lab.4 – every TigerSHARC will have the audio program blasted into the boot loader – not what I want for next year. I want to have each TigerSHARC reblasted with a sort of “blink” program so that students can see when VDSP recaptures control of the TigerSHARC processor.

Minimum requirements – 20 minutes to 30 minutes if you have done Lab. 4

You must have a way to demonstrate that it is “your code” that is present in the loader, rather than the code left over by the previous student.

New project built – include only the files needed for the project – Provide source code

You may modify the .ldf file from another multi-processor project (recommended approach)

Processor A blinking lights

Processor B independently blinking lights

Light patterns

Surname A – F -- Processor A both on, both off, Processor B one on, then the other on.

Surname G -- M -- Processor B both on, both off, Processor A one on, then the other on.

Surname N – Q -- Processor A both on, both off, Processor B both on, both off.

Surname R – Z -- Processor A one on, then the other on, Processor B one on, then the other on.

What I would prefer

A)  Synchronize the lights. The TigerSHARCs have memory they can share (broadcast).

Processor A waits a while – then writes a value (message) to memory location A, then writes a 1 (semaphore) to memory location B. Processor A waits a while then checks semaphore (make sure it does “wait a while” otherwise you might end up with a really tight loop). If semaphore has been cleared to 0, then Processor A writes a new message.

Processor B waits a while – checks the semaphore. If semaphore is 1, then processor B displays its light pattern and then sets semaphore to 0 (message received). I don’t believe that the system will block – if it does – increase all “wait a while” loops.
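
The handshake described above can be sketched in portable C as two polling functions, one per processor, sharing a small structure. This is only a model for reasoning about the protocol: the names shared_mem, processor_a_poll and processor_b_poll are hypothetical, the shared memory is an ordinary struct rather than real TigerSHARC shared memory, the “wait a while” delays are left to the caller, and the sketch assumes the semaphore starts at 0.

```c
/* Model of the two shared locations (ASSUMPTION: on the board these would
 * be two memory locations visible to both processors). */
typedef struct {
    volatile int message;    /* "memory location A"                      */
    volatile int semaphore;  /* "memory location B": 1 = message waiting */
} shared_mem;

/* One polling pass of Processor A.
 * Returns 1 if a new message was posted, 0 if B has not consumed the
 * previous one yet. */
int processor_a_poll(shared_mem *m, int next_message)
{
    if (m->semaphore != 0)
        return 0;
    m->message = next_message;  /* write the message first ...  */
    m->semaphore = 1;           /* ... then raise the semaphore */
    return 1;
}

/* One polling pass of Processor B.
 * Returns 1 if a message was consumed (stored in *received), else 0. */
int processor_b_poll(shared_mem *m, int *received)
{
    if (m->semaphore != 1)
        return 0;
    *received = m->message;     /* this is where the lights would change */
    m->semaphore = 0;           /* message received                      */
    return 1;
}
```

Writing the message before raising the semaphore, and reading it before clearing the semaphore, is what keeps Processor B from displaying a half-updated message in this model.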

What I would really prefer

Light patterns as above until you press one of the buttons. Then, for a short period of time, the processor LEDs flash out your name in Morse code – shorts are shown by 1 LED on processor A (leftmost light), longs are shown by 4 LEDs on (both processors).


Possible Pseudo code

Main

    Char name[ ] = “My name”

    For each character in name
        display character
        wait a while
    end for

    Repeat

End main

Display character( )

    Array of pointers to character string = { “A”, “SL”, “B”, “LSSS”, etc. } where L is long flash and S is short flash

    Find the location of the Morse code character string =
    2 * (ascii value of character – ascii value of ‘A’) + 1 ( I think)

    Set pt to the start of the string at that location

    While *pt != end of string (‘\0’)

        if *pt == ‘L’
            All lights on both processors
            Wait time * 2
        End if

        if *pt == ‘S’
            Light in processor A only
            Wait time
        End if

        Pt++;

    End while

End display character
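
The lookup step of the pseudo code can be sketched in standard C as follows. This is one possible arrangement, not the required solution: it swaps the interleaved letter/pattern array for a table indexed directly by letter, the names morse_table, morse_pattern and morse_duration are hypothetical, and the LED and wait calls are replaced by a count of “wait time” units because the board's LED interface is not shown here. The patterns are the standard International Morse codes written as S (short) and L (long).

```c
#include <ctype.h>
#include <stddef.h>

/* International Morse code for A..Z: 'S' = short flash, 'L' = long flash. */
static const char *const morse_table[26] = {
    "SL",   "LSSS", "LSLS", "LSS",  "S",    "SSLS", "LLS",  "SSSS", "SS",
    "SLLL", "LSL",  "SLSS", "LL",   "LS",   "LLL",  "SLLS", "LLSL", "SLS",
    "SSS",  "L",    "SSL",  "SSSL", "SLL",  "LSSL", "LSLL", "LLSS"
};

/* Returns the flash pattern for a letter (either case),
 * or NULL if the character is not a letter. */
const char *morse_pattern(char c)
{
    int u = toupper((unsigned char)c);

    if (u < 'A' || u > 'Z')
        return NULL;
    return morse_table[u - 'A'];
}

/* Total time to display one character, in units of the short "wait time",
 * using the timing from the pseudo code: short = 1 unit, long = 2 units.
 * On the board, each loop pass would switch the LEDs and then wait. */
int morse_duration(char c)
{
    const char *pt = morse_pattern(c);
    int units = 0;

    if (pt == NULL)
        return 0;
    while (*pt != '\0') {               /* stop at the end of the string */
        units += (*pt == 'L') ? 2 : 1;
        pt++;
    }
    return units;
}
```

Indexing the table by letter avoids the 2 * (character – ‘A’) + 1 arithmetic, at the cost of not storing the letter next to its pattern.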


Special questions based on issues arising from talks. Check frequently, others may be added.

Screen captures of the modified slides can be inserted directly into your answer sheet. You may answer as many or as few of these questions as you like.

Q5) Just for Chad – Prepare (no more than 3) slides to explain the difference between VDSP and VDK.

Q5) Just for Baradi – Redo your simple example “hello world” program slides to better demonstrate the parallelism of the cell processor for this programming model