BBB - Working with the PRU-ICSS/PRUSSv2
(Disclaimer - this is some experimentation, so the information here may not be the best way of doing things.
Also, please read the comments below, which have recent information as people discover things)
17th March 2013 - Anything old is now in purple. Notes have been updated to reflect recent builds.
What is it?
The BeagleBone Black's TI chip (XAM3359AZCZ revision 2) contains the main processor (ARM) along with a number of other modules (see this diagram from the AM335x datasheet).
Although the ARM Cortex-A8 processor portion of the chip is powerful, the nature of Linux means that real-time control of high-speed external hardware may often still not be easily possible. The TI chip improves the situation by providing two additional CPUs (known as PRU-ICSS or PRUSSv2, I’ll call it PRU for short) on the same silicon. It means that separate software can be run on them, to offload hardware interfacing and processing of low-level protocols.
The chip has been likened to having Arduino-type capability on the same chip, but actually the additional CPUs run at a far higher speed (200MHz), which in many cases means that external logic devices, CPLDs or FPGAs may not be necessary.
Generally, having to program more than one processor is inconvenient and means that a protocol is needed between the processors. This is greatly simplified on the TI chip, because (1) the code for the PRUs can be downloaded from the main processor, and (2) shared memory can be used for communication.
Where would it be useful?
For low-speed comms, conventional I2C or similar protocols can be used, and there is no need to use a PRU. For high-speed comms the PRU may be extremely useful because it can service the hardware with no interruptions due to Linux context switching, and it places no overhead on the main ARM processor. Here are some examples that should be feasible:
- interfacing to a fast ADC (e.g. analog capture),
- CCD or a CMOS camera
- LED or LCD display
- analog video generation (video encoder)
- custom PWM or other custom or non-standard protocols
- motor control with feedback
As far as I can tell, it is even possible to clock in parallel data from an external clock.
How to use it?
Currently, it is not straightforward, but certainly not difficult. The main difficulty is finding complete examples on the web. The information here has been gleaned from a lot of web searching and experimenting.
These are the main steps:
1. Get the PRU system enabled on the BBB board
2. Get the PRU assembler installed on the BBB (code for the PRUs is written in assembler currently, until someone creates a C compiler for it)
3. Write the code. PRU applications are in two halves which can communicate with each other through memory addressing:
(a) The assembler code that is assembled into a .bin machine file to run on the PRU, and
(b) Some C code that will run on the main processor, i.e. on top of Linux. This C code is responsible for downloading the assembled code into the PRU
4. Configure the Linux device tree to enable any pins for input/output
5. Run the program
What is the assembler code like?
It’s not bad. It’s easier than some common assembler languages, such as those for PIC or other 8-bit processors, because there are a large number of registers (all 32-bit), the instructions are orthogonal, and bit and byte referencing for manipulations is extremely good. There are not many instructions (about 45), and I’ve only used a few so far, which suits me fine (usually I don’t want to invest a lot of time learning assembler for an awkward processor; that is not the case here, and the PRU instructions seem easy to use).
Is it worth the effort?
I think it is, because it becomes possible to control hardware at high speed (say 50MHz). Each instruction takes 5nsec to execute on the PRUs (200MHz clock, one cycle per instruction), and there is no varying latency due to the Linux kernel.
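To put numbers on that, here is a minimal C sketch of the arithmetic (ordinary host-side C, not PRU code). It assumes the 200MHz clock and one instruction per cycle stated above; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* PRU core clock, from the text: 200MHz, one instruction per cycle. */
#define PRU_CLOCK_HZ 200000000u

/* Duration of one PRU instruction in nanoseconds. */
uint32_t pru_cycle_ns(void)
{
    return 1000000000u / PRU_CLOCK_HZ;
}

/* How many PRU instructions fit into one period of an external
 * signal running at signal_hz (hypothetical helper). */
uint32_t pru_instructions_per_period(uint32_t signal_hz)
{
    assert(signal_hz > 0 && signal_hz <= PRU_CLOCK_HZ);
    return PRU_CLOCK_HZ / signal_hz;
}
```

For a 50MHz signal this gives only 4 instructions per period, so any PRU code that has to keep up with every cycle of such a signal must be very short.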
What are the difficult bits?
Mainly, it is the device tree related stuff. Hopefully this can change or become simplified in the future. On a typical microcontroller, inputs/outputs are set using particular registers that reside in part of the memory map of a device. With the current software running on the BBB, the user is prevented from directly modifying such hardware registers from within conventional C code as far as I can tell. With the current method, a ‘device tree’ is used; it is a text file that is compacted into a binary file which is read when the system is booted. The file tells the system which pins are inputs/outputs. Device tree modifications are also used to enable the PRU system.
What is the device tree?
See a number of posts (e.g. post 109) here. Selsinork has created some useful examples of it, such as using it to switch off an LED which flashes by default on the BBB.
The device tree resides in the /boot folder on the BBB, and it is a binary file that is not understandable (example snippet below). It has a .dtb or a .dtbo filename:
There is a program on the BBB called dtc that can be used to convert to a text readable form or vice-versa. Here is a snippet of what the text form looks like (usually a .dts suffix):
Working with the device tree means converting the existing binary .dtb file at /boot into a text source .dts file, making some changes and then converting back into the binary format, and then rebooting the BBB for the changes to take effect.
Here is the procedure to convert the binary file into a text file:
cd /boot
cp am335x-boneblack.dtb am335x-boneblack.dtb_orig
dtc -I dtb -O dts am335x-boneblack.dtb > am335x-boneblack.dts_orig
After any modifications have been made (maybe copy the file first so that you have a backup), it can be converted back into binary form using:
dtc -I dts -O dtb am335x-boneblack.dts_pru > am335x-boneblack.dtb_pru
cp am335x-boneblack.dtb_pru am335x-boneblack.dtb
These commands will be used for a couple of the steps below.
Step 0: Get the BBB ready in general for any development
Although it is possible to compile up code on an x86 Linux server, the BBB is fast enough that there is no need to cross-compile. Usually I tend to write the code on a Windows or Linux machine, and then use SFTP to transfer the files across to the BBB and compile there. But, sometimes vi still gets used on the BBB. The Angstrom Linux has a few defaults that may not suit everyone. I’ll place them in a separate post; some people may not want to do them, or may have better suggestions. By the way, I used the bash shell for everything, not the default sh, just in case that makes a difference to how environment variables are set.
Anyway, it is advisable to upgrade the software to the latest. This requires a 4GB minimum microSD card that can be programmed from a PC.
Step 1: Get the PRU system enabled on the BBB board
The information in color here is now historic. Today it is possible to enable the PRU using a "dts fragment file" also known as ".dtbo" method. So, skip the colored bit.
By default, lsmod reveals that uio_pruss is not installed, so you have to type the following to install this module:
modprobe uio_pruss
The device tree needs updating to enable the PRUs. Once you have got a text version of am335x-boneblack.dtb (using the method described earlier under "What is the device tree?") then edit it and make the following changes:
Search for pruss@4a300000 and then under it change
status = "disabled";
to
status = "okay";
(note: the correct procedure is not to do the above in the .dtb file but rather in a .dtbo fragment file as explained in the comments below, but there is currently a bug that makes PRU enablement unreliable via the .dtbo) - Note 2 - the bug appears fixed in recent images, so no need to do this in the .dtb file. Just do it in the .dtbo file instead. However, just make the led0 changes here, to disable the flashing.
While you are at it, make Selsinork’s LED change to disable the flashing USR0 LED (the LED on the far end of the board). It is extremely useful to disable it so that you can try to control it from the PRU as an experiment. This is what needs to be done:
The LEDs are controlled by this part:
gpio-leds {
compatible = "gpio-leds";
pinctrl-names = "default";
pinctrl-0 = <0x3>;
led0 {
label = "beaglebone:green:usr0";
gpios = <0x5 0x15 0x0>;
linux,default-trigger = "heartbeat";
default-state = "off";
};
led1 {..etc
So, we can change led0 (aka USER0 aka USR0) by changing the line from:
linux,default-trigger = "heartbeat";
to:
linux,default-trigger = "none";
Ignore the colored bit, it is no longer necessary.
There is one more change for now, but I’ll explain it later. For now, you may wish to make this change too, to run some example code later. If you don’t make the change now, no problem; the device tree can always be updated again at a later date and then the board rebooted.
Search for a line that says
pinctrl-single,pins = <0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;
and change it to:
pinctrl-single,pins = <0x030 0x06 0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;
Note: Don't do the pinctrl modification in recent releases. Just use the .dtbo file method because it works. The information above will be deleted soon, because it does not apply to recent releases.
Save the file, and then convert into the .dtb file as mentioned earlier, and then the board can be rebooted. Now the annoying flashing LED has stopped flashing and can be used for our debug purposes.
Read Step 4 now, to see how to create the fragment file. The ordering is a little back-to-front because in the past, the .dtbo method did not work to enable the PRU. Today it works.
Step 2: Get the PRU assembler installed on the BBB
This is straightforward. Find the file am335x_pru_package-master.zip from the Internet and save it onto the BBB and unzip to a folder.
Type the following:
export CROSS_COMPILE=
go to pru_sw/app_loader/interface and type:
make
then go to pru_sw/utils
mv pasm pasm_linuxintel
cd pasm_source
source ./linux_build
Go to pru_sw/example_apps
make clean
make
The steps above will have created the assembler, and also some demo programs. As mentioned earlier, PRU applications are in two halves:
(a) the hand-written assembler code that got assembled into a .bin file to run on the PRU, and
(b) some C code that will run on the main processor.
The latter is responsible for two things:
- Uploading the assembled binary file into the PRU, and
- interacting with the PRU to pass/fetch information.
The source code for (b) resides in the example_apps/xxx folder, and when compiled it creates a .o file in the obj folder which we can link with libprussdrv.a to create our executable.
The assembler code for (a) resides in the same example_apps/xxx folder as a .p and a .hp file but when assembled, the .bin file sits in example_apps/bin
The commands earlier will have created the .bin file for (a), and the .o file for (b).
The .o file can be linked into an executable using (say):
cd example_apps/PRU_memAccess_DDR_PRUsharedRAM/obj
gcc PRU_memAccess_DDR_PRUsharedRAM.o -L../../../app_loader/lib -lprussdrv -lpthread -o mytest.out
With both the executable (mytest.out) and the .bin file in the same folder (you will have to move the .bin file manually), the executable can now be run:
./mytest.out
This is the output:
INFO: Starting PRU_memAccess_DDR_PRUsharedRAM example.
AM33XX
INFO: Initializing example.
INFO: Executing example.
File ./PRU_memAccess_DDR_PRUsharedRAM.bin open passed
INFO: Waiting for HALT command.
INFO: PRU completed transfer.
Example executed succesfully.
The folders for the example are a bit messy. To make life easier you may as well copy the library and header files to system folders:
Go to pru_sw/app_loader/lib
cp libprussdrv.a /usr/lib/.
cd ../include
cp *.h /usr/include
cd ../../utils
cp pasm /usr/bin
Now you could use a simple makefile to create your own code in any folder. That is what will be done in Step 3.
An explanation regarding the file suffixes: .p means an assembler source file, and .hp is like an include file. The assembler treats a .hp file exactly the same as a .p file, so technically you could put everything into a single .p file if desired. The assembler is simple to use, and no linker is required.
Step 3a: Write the assembler code to run on the PRU
Here I just reused some of the example code (from the PRU_memAccess_DDR_PRUsharedRAM example in Step 2), and modified it so that it flashes the USR0 LED (the LED on the far end of the board).
Here is the entire .p file, I called it prucode.p:
// prucode.p
.origin 0
.entrypoint START
#include "prucode.hp"
#define GPIO1 0x4804c000
#define GPIO_CLEARDATAOUT 0x190
#define GPIO_SETDATAOUT 0x194
START:
// Enable OCP master port
LBCO r0, CONST_PRUCFG, 4, 4
CLR r0, r0, 4 // Clear SYSCFG[STANDBY_INIT] to enable OCP master port
SBCO r0, CONST_PRUCFG, 4, 4
// Configure the programmable pointer register for PRU0 by setting c28_pointer[15:0]
// field to 0x0120. This will make C28 point to 0x00012000 (PRU shared RAM).
MOV r0, 0x00000120
MOV r1, CTPPR_0
ST32 r0, r1
// Configure the programmable pointer register for PRU0 by setting c31_pointer[15:0]
// field to 0x0010. This will make C31 point to 0x80001000 (DDR memory).
MOV r0, 0x00100000
MOV r1, CTPPR_1
ST32 r0, r1
//Load values from external DDR Memory into Registers R0/R1/R2
LBCO r0, CONST_DDR, 0, 12
//Store values read from the DDR memory into PRU shared RAM
SBCO r0, CONST_PRUSHAREDRAM, 0, 12
// test GP output
MOV r1, 10 // loop 10 times
LOOP:
MOV r2, 1<<21
MOV r3, GPIO1 | GPIO_SETDATAOUT
SBBO r2, r3, 0, 4
MOV r0, 0x00f00000
DEL1:
SUB r0, r0, 1
QBNE DEL1, r0, 0
MOV r2, 1<<21
MOV r3, GPIO1 | GPIO_CLEARDATAOUT
SBBO r2, r3, 0, 4
MOV r0, 0x00f00000
DEL2:
SUB r0, r0, 1
QBNE DEL2, r0, 0
SUB r1, r1, 1
QBNE LOOP, r1, 0
// Send notification to Host for program completion
MOV r31.b0, PRU0_ARM_INTERRUPT+16
// Halt the processor
HALT
As mentioned, this is derived from the example .p file. The only major difference is around where it says “// test GP output”. All the code before it is an example that shows memory access from the PRU and the ARM processor, as run earlier in Step 2. It is useful since it shows how to communicate between the processors.
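The two "programmable pointer" comments in the listing can be checked with a little arithmetic. The sketch below is plain C for the host, not PRU code. It assumes the address forms implied by those comments (C28 resolves to 0x00nnnn00 and C31 to 0x80nnnn00, where nnnn is the 16-bit pointer field) and, judging by the MOV r0, 0x00100000 line, that the c31 field sits in the upper half of the CTPPR_1 register. The function names are hypothetical.

```c
#include <stdint.h>

/* C28 resolves to 0x00nnnn00, where nnnn is the c28_pointer field. */
uint32_t c28_address(uint16_t c28_pointer)
{
    return (uint32_t)c28_pointer << 8;
}

/* C31 resolves to 0x80nnnn00, where nnnn is the c31_pointer field. */
uint32_t c31_address(uint16_t c31_pointer)
{
    return 0x80000000u | ((uint32_t)c31_pointer << 8);
}

/* Register value to write to CTPPR_1 for a given c31_pointer field,
 * assuming the field occupies bits 31:16 as the listing suggests. */
uint32_t ctppr1_value(uint16_t c31_pointer)
{
    return (uint32_t)c31_pointer << 16;
}
```

With these assumptions, a field of 0x0120 gives 0x00012000 for C28, and a field of 0x0010 gives 0x80001000 for C31 with a register value of 0x00100000, which matches the comments and constants in the listing.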
The new code is intended to flash the USR0 LED ten times, and here is a description of what the assembler code is doing:
R0: Used to store a large number for use as a delay
R1: used to store 10 as a loop counter
R2: This stores the value 1<<21, which is shorthand for bit 21 set and all other bits clear. The USR0..3 LEDs are GPIO1_21..24 as can be seen in the schematic (see image below), so if we want to control USR0 then bit 21 gets manipulated
R3: This stores the address of the GPIO1 set or clear register. The GPIO1 base address can be found in the am335x tech reference manual (4000 pages); clicking on the blue text at that location in the PDF goes to the chapter which shows the individual register offsets for GPIO1. The code writes to these registers to set or clear GPIO1 pin 21 in this example.
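The values that end up in r2 and r3 can be worked out on any machine with a few lines of C. This is only a sketch to check the arithmetic: the three constants are copied from the #define lines in prucode.p, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the #define lines in prucode.p. */
#define GPIO1              0x4804c000u
#define GPIO_CLEARDATAOUT  0x190u
#define GPIO_SETDATAOUT    0x194u

/* Bit mask for LED USRn (n = 0..3); the text maps these to GPIO1_21..24. */
uint32_t usr_led_mask(unsigned led)
{
    assert(led <= 3);
    return UINT32_C(1) << (21 + led);
}

/* Full address of a GPIO1 register, built the same way as in the
 * assembler source (base ORed with the register offset). */
uint32_t gpio1_register(uint32_t offset)
{
    return GPIO1 | offset;
}
```

For USR0 the mask is 0x00200000, and the set and clear registers come out at 0x4804c194 and 0x4804c190 respectively.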
Some example assembler syntax:
MOV r1, 10 – This moves 10 into register 1 (i.e. r1 <- 10)
SBBO r2, r3, 0, 4 – SBBO stands for ‘store byte burst’ and this moves from registers to an address; in this case the contents of r2 into the address at r3. The zero means no offset, and the 4 means copy 4 bytes (each register is 32 bits).
You can see the code has a main outer loop (LOOP), and a couple of inner delay loops (DEL1 and DEL2).
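As a rough check on how long the LED stays on and off, the delay can be estimated in C. This sketch assumes each pass of a delay loop costs exactly two instructions (SUB and QBNE) at 5nsec each, and it ignores the MOV and SBBO instructions around the loops, so treat the result as an approximation. The names are hypothetical.

```c
#include <stdint.h>

#define PRU_CYCLE_NS          5u   /* 200MHz clock, one cycle per instruction */
#define INSTRUCTIONS_PER_PASS 2u   /* SUB + QBNE */

/* Approximate duration of one DEL1/DEL2 style loop, in nanoseconds. */
uint64_t delay_loop_ns(uint32_t count)
{
    return (uint64_t)count * INSTRUCTIONS_PER_PASS * PRU_CYCLE_NS;
}

/* Approximate total run time: each blink is one "on" delay plus one
 * "off" delay. */
uint64_t blink_total_ns(uint32_t count, uint32_t blinks)
{
    return 2u * delay_loop_ns(count) * blinks;
}
```

With the count of 0x00f00000 used in the listing, each delay is about 157 msec, so the ten flashes should take a little over 3 seconds in total.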
The prucode.hp file is the same as the .hp file that is in the example folder (I just renamed to prucode.hp); it contains some definitions and some useful macros.
The .p and .hp code gets assembled into a single .bin file. I used a makefile for this, and didn’t assemble it until I had completed step 3b.
Step 3b: Write the C code to run on the ARM processor
The C code is nothing special. It is probably advisable to just re-use the example code. I reused the .c file from the PRU_memAccess_DDR_PRUsharedRAM example and just renamed it to mytest.c
The C code makes use of some library functions that initialize the PRU and download the .bin file into the PRU. It is fairly obvious what is occurring by inspecting the example code.
Step 3c: Create the .bin file for the PRU, and the ARM executable
I’m no Makefile expert but what worked for me is attached. You can just type ‘make’ and it will build the .bin file (prucode.bin) and the executable (mytest).
You can actually run the program now and it will work, without steps 4 and 5.
./mytest
This will flash the end-most LED on the board 10 times.
Step 4: Configure the Linux device tree to enable any pins for input/output
Although the example worked, actually this is just because we used it to toggle a pin that was already configured for use as a GPIO output (USR0 LED). Ordinarily some work needs to be done with the device tree.
There is also another important point: Most pins on the TI chip can be used for multiple purposes, depending on how they are configured. It is known as Pin Multiplexing (pinmux) and it seems complex because there are so many modes (up to 8, numbered 0 to 7) that each pin can be used for.
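The numbers in a pinctrl-single,pins line (such as the one shown in Step 1) come in pairs: a register offset followed by a configuration value, and the mode is packed into the low bits of that value. The C sketch below decodes such a value. The bit layout is my reading of the AM335x pad control registers (bits 2:0 mode, bit 3 pull disable, bit 4 pull-up select, bit 5 receiver enable) rather than something taken from this post, so check it against the tech reference manual. The struct and function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one pad configuration value (hypothetical type). */
struct pad_config {
    unsigned mode;       /* bits 2:0, mux mode 0..7 */
    bool pull_enabled;   /* bit 3 is a pull DISABLE flag, so 0 means enabled */
    bool pull_up;        /* bit 4: 1 = pull-up, 0 = pull-down */
    bool input_enabled;  /* bit 5: receiver enable */
};

/* Split a pinmux value such as 0x06, 0x7 or 0x17 into its fields. */
struct pad_config decode_pad(uint32_t value)
{
    struct pad_config cfg;
    cfg.mode = value & 0x7u;
    cfg.pull_enabled = (value & 0x08u) == 0;
    cfg.pull_up = (value & 0x10u) != 0;
    cfg.input_enabled = (value & 0x20u) != 0;
    return cfg;
}
```

Under that layout, 0x06 decodes as mode 6 with the receiver off (an output), while 0x7 and 0x17 are both mode 7 and differ only in pull-down versus pull-up.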
There is a very useful table that Selsinork created. It is in post #5.
By inspecting this table, you can see which pins on the chip make it out to the headers on the BBB, and the names of the functions in each mode.
This is important to know, because although the PRU runs at a high speed (200MHz), it cannot control pins in the standard GPIO mode at such a high speed. The pins need to be set to a different mode so that the PRU can directly control them. Then, the PRU can control the pins at the high speed (5nsec, i.e. a single clock cycle). The direct mode is known as ‘PRU GPI’ or ‘PRU GPO’ mode (for input and output respectively).