A DSP VIRTUAL MACHINE
*DSPVM Group, TeNeT Group, IIT-Madras,
Chennai-600036.
Contact Address:
Dr Hema A Murthy,
Asst. Professor,
Dept. of Comp Science & Engg,
IIT Madras,
Chennai, 600036.
India.
e-mail:
Abstract
Modern embedded systems use DSP processors extensively. Because each DSP processor has its own architecture and assembly language, it is very difficult to port an application written in the assembly language of one processor to that of another. We address this porting problem by designing a processor-independent virtual architecture and an associated language. In this paper we describe this virtual architecture and language.
I. Introduction
Embedded systems use DSP processors extensively today. DSP applications require specialized operations, for example vector operations, circular addressing, bit-reversed addressing and so on, and DSP processors are designed to enable fast computation of such algorithms. Although translators are available from C code to target processor code, most DSP algorithms are written in the assembly language of the DSP processor to ensure efficient code. Code written this way is efficient, but it cannot be ported across different DSP processors: to move an application from one processor to another, the code has to be rewritten. The alternative is to write the code in a high-level language like C and use a translator, but the generated code is then likely to be inefficient, since it rarely exploits the DSP support provided by the processor.
What is required is a mechanism for porting algorithms (or applications) between DSP processors without compromising efficiency.
Fig.1 Architecture of the DSP Virtual Machine (DVM)
In this paper we provide a solution to this problem. The key objective is to provide a single language interface across several (if not all) DSP processors, while efficiently utilizing the special capabilities of DSP processors, and of the target processor in particular. The architecture of the DSP Virtual Machine (DVM) is shown in Fig.1. This paper describes a language called the Virtual Machine Language (VML) that runs on this virtual DSP architecture. VML is translated to target processor code by a translator, as shown in Fig.1.
The primary focus of the work discussed in this paper is to allow processor-independent (platform-independent) code to be written, while still generating optimized, DSP-processor-specific code.
One existing solution to platform independence is the Java approach: the Java compiler creates byte code from the Java source, which can then be executed on any Java Virtual Machine [1]. VML is similarly platform independent and can be executed on any DSP processor. In place of the JVM byte code, an intermediate parse tree is generated. For each instruction, the translator then checks whether that instruction is available on the target processor; if an instruction is not directly available, a DVM plugin expressed in terms of the instructions the processor does provide is included.
The key difference is that, in the DVM, importance is given to the hardware configuration of the target processor.
*The DSPVM Group currently consists of the following people: Dr. Hema A Murthy, T.A.Gonsalves, Rolland Enoch, G.Sirisha, Valli Kannu, M.Mahesh, K.Geetha, Saravan, Raghu Kishore.
The VML has been defined so that high-level VML instructions have plugins expressed in terms of low-level VML instructions. The language provides an instruction set that is inherently powerful and allows concurrent operations in more than one functional unit at a time.
In Section II, we describe the design of the DSP virtual machine, and in Section III we discuss the design of the Virtual Machine Language. In Section IV we describe the translator from VML to target processor. To test the VML, a simulator was developed (in C language); this is discussed in Section V. Finally we present the conclusion in Section VI.
II. DVM Architecture
The following design was arrived at after a thorough study of DSP processors from the TI, Motorola and Analog Devices families [2,3,4]. From this study it was observed that all of these DSP processors have the following units: ALU (Arithmetic Logic Unit), MAC (Multiply and Accumulate unit), SU (Shifter Unit), AGU (Address Generation Unit), program control unit, memory unit and I/O interfaces. The architecture of the DVM is a superset of all three families; the primary reason for this is that the DVM should also be able to support future DSP processors.
The DVM is divided into the following functional units. Computational units: these include Multiply and Accumulate (MAC) units, ALUs, Shifters, Address Generation Units, and the Program Sequencer.
Memory: the memory may include more than one block, and blocks can be accessed simultaneously. Typically program and data memory blocks are present; optionally, a cache memory may be available.
I/O interfaces: These include the input and output units like Interrupt Control Units, Host Port Interfaces, Direct Memory Access Controllers, Serial Ports and so on.
The architecture assumes that there can be more than one functional unit of any kind. Each functional unit is treated as a resource. The program sequencer controls the overall functioning of the DVM; it takes care of the resource allocation and functional unit invocation.
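As an illustration of this resource view (the type and field names below are hypothetical, not taken from the DVM implementation), the functional units could be modelled in C roughly as follows, with the program sequencer allocating a free unit of the requested kind:

typedef enum { FU_ALU, FU_MAC, FU_SHIFTER, FU_AGU, FU_SEQUENCER } fu_kind;

typedef struct {
    fu_kind kind;   /* which class of functional unit this is            */
    int     id;     /* instance number: several units of a kind may exist */
    int     busy;   /* set when the sequencer allocates the unit          */
} functional_unit;

typedef struct {
    functional_unit *units;    /* all computational units                 */
    int              n_units;
    /* memory blocks and I/O interfaces would follow here */
} dvm;

/* The program sequencer allocates a free unit of the requested kind,
 * or reports that none is available in the current cycle.              */
static functional_unit *allocate_unit(dvm *m, fu_kind kind)
{
    for (int i = 0; i < m->n_units; i++)
        if (m->units[i].kind == kind && !m->units[i].busy) {
            m->units[i].busy = 1;
            return &m->units[i];
        }
    return NULL;   /* caller must stall or schedule differently */
}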
Fig.2 shows the organization of the functional units of the DVM. The attributes of each functional unit are described below:
Computational Units: ALU: the ALU implements a wide range of arithmetic and logical operations. Further, both fixed-point and floating-point operands are supported.
Fig.2 Organization of the functional units in the DVM
MAC: The MAC unit supports M*N bit multiplication with K bit addition.
Shifter Unit: The shifter unit supports left shifts, right shifts, shifts through the carry flag, prescaling, postscaling and normalization.
Address Generation Unit: The addressing modes supported in the DVM include direct addressing, direct offset addressing, indirect addressing, immediate addressing, modulo addressing and bit-reversed addressing. Further, the address can remain unchanged, or be pre-updated or post-updated (a sketch of the modulo and bit-reverse address computations is given after this list).
Program Sequencer Unit: This unit controls the flow of control in a program, ensuring that flags and registers are appropriately updated after the execution of each instruction.
Memory Unit: The memory in the DVM is organized in blocks. Each block can be configured in terms of size, access and so on.
I/O Interfaces: The current architecture of the DVM only supports transfers to or from memory. A simple file handler is provided which enables transfer of data between a file and memory.
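To make the modulo and bit-reverse addressing modes concrete, the C fragment below sketches how an address generation unit typically computes the next address; the function names and parameters are illustrative, but the arithmetic itself is the standard circular-buffer and FFT-reordering computation.

#include <stdint.h>

/* Modulo (circular) addressing: after an update of `step`, the address
 * wraps around within a buffer of length `len` starting at `base`.     */
static uint32_t modulo_next(uint32_t addr, uint32_t base,
                            uint32_t len, int32_t step)
{
    int64_t off = (int64_t)addr - base + step;
    off %= (int64_t)len;
    if (off < 0)
        off += len;
    return base + (uint32_t)off;
}

/* Bit-reversed addressing: used for FFT data reordering.  The index is
 * reversed over `bits` bits (e.g. bits = 10 for a 1024-point FFT).     */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t rev = 0;
    for (unsigned i = 0; i < bits; i++) {
        rev = (rev << 1) | (index & 1u);
        index >>= 1;
    }
    return rev;
}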
For the complete set of instructions, refer to http://www.tenet.res.in/Donlab/Dsp/index.html.
III. VM Language
The design of the VML is based on a set of desirable features, some of which are identified below.
1. The VML should be sufficiently comprehensive so that any DSP application can be developed.
2. It should be possible to optimize DSP code written in VML itself.
3. Application development should be easier than assembly language programming, with support for high-level language constructs.
4. The machine language generated by the translator should be near-optimal: the overhead should not exceed 1.25 to 1.5 times that of hand-assembled code.
5. The VML should be able to exploit the architectural features of the DVM (addressing modes, parallel operation of functional units).
6. VML should support data types that are typical of DSP applications, such as vectors and circular buffers.
To realize some of these desirable features, the language was developed in layers, with sophistication increasing from L0 to Ln. The block diagram is given in Fig.3.
Fig.3 Architecture of VML
Higher level instructions are also mapped onto a series of lower level instructions. The application writer is expected to write optimal code for the application using the higher level instructions. When an application is ported to a specific target, the target processor may or may not support all the high level instructions of the VML; this information is available in the processor-specific instruction configuration file. If a level-N instruction is not available on the target processor, it is automatically translated in terms of level L0 to L(N-1) instructions, and this is repeated at every level. The current VML supports L0, L1 and L2 instructions. At each level the instructions have been grouped according to their modes of operation.
At the L0 level, support is provided for arithmetic, logic, bit manipulation, program control and similar instructions. Control structures like DO UNTIL and DO FOREVER are also available to give the user block-repeat style execution.
At the L1 level, the instructions are multifunction instructions. Floating-point instructions are included at this level because not all processors support floating-point operations. Instructions with one or all memory operands are also included in the L1 instruction set. Currently only simple I/O is supported.
Vector operations would otherwise require multiple MAC instructions, whereas using L2 instructions a vector operation can be performed with a single VML instruction.
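As a hedged illustration of this lowering, the C fragment below emits an L0/L1 multiply-accumulate loop for a hypothetical L2 vector dot-product instruction (here called VDOT). The mnemonics and addressing syntax follow the style of Fig. 4.1; the VDOT name, the register choices, the use of the addressed operand as a load source, and the placement of the BREAK AZ exit test are all assumptions, since the exact DO/BREAK semantics are not reproduced here.

#include <stdio.h>

/* Expand a hypothetical L2 instruction
 *     VDOT AD0, AD1, R2, #n
 * (dot product of two n-element vectors addressed through AD0 and AD1,
 * result accumulated in R2) into lower-level VML instructions.          */
static void expand_vdot(FILE *out, int n)
{
    fprintf(out, "MOVE #0, R2;\n");                          /* clear accumulator      */
    fprintf(out, "MOVE #%d, R7;\n", n);                      /* element counter        */
    fprintf(out, "MOVE #1, R8;\n");                          /* decrement constant     */
    fprintf(out, "DO LAB_VDOT FOREVER;\n");
    fprintf(out, "MOVE 1:(AD0:OF0)[LINEAR PRE_INR], R0;\n"); /* load x[i]              */
    fprintf(out, "MOVE 1:(AD1:OF1)[LINEAR PRE_INR], R1;\n"); /* load y[i]              */
    fprintf(out, "MULT R0, R1, R3;\n");                      /* x[i] * y[i]            */
    fprintf(out, "ADD R3, R2, R2;\n");                       /* accumulate into R2     */
    fprintf(out, "SUB R8, R7, R7;\n");                       /* counter--, sets AZ at 0 */
    fprintf(out, "LAB_VDOT: BREAK AZ;\n");                   /* leave loop when done   */
}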
Translation to the target involves two phases. In the first phase, the VML code is parsed, the parse tree is generated, and the VML is optimized. In the second phase, the VML code is translated to the target. This is again done in steps. First, plugins in terms of VML are inserted for instructions not available on the target, and register mapping is performed. Next the VML code is optimized: dead code is removed, unreachable code is removed, and loop-invariant instructions are moved out of loops. Finally, the VML code is translated to that of the target.
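As a sketch of the unreachable-code removal step, the C fragment below marks reachable instructions by following the control flow from the entry point and discards the rest, as happens for the unreachable stretch in Fig. 4.1 (the now-redundant JUMP preceding it is removed as well). The instruction record used here is a hypothetical simplification, far smaller than the translator's actual parse tree; after compaction, branch targets would also have to be remapped, which is omitted.

/* Hypothetical, simplified instruction record used only for this sketch. */
typedef struct {
    int is_jump;        /* unconditional JUMP                          */
    int is_cond_branch; /* conditional transfer (IF ..., BREAK ...)    */
    int target;         /* index of the jump/branch target             */
    int reachable;      /* filled in by mark_reachable()               */
} instr;

/* Depth-first marking of instructions reachable from the entry point. */
static void mark_reachable(instr *code, int n, int pc)
{
    while (pc >= 0 && pc < n && !code[pc].reachable) {
        code[pc].reachable = 1;
        if (code[pc].is_cond_branch)
            mark_reachable(code, n, code[pc].target); /* taken path   */
        if (code[pc].is_jump) {
            pc = code[pc].target;                     /* only path    */
            continue;
        }
        pc++;                                         /* fall through */
    }
}

/* Compact the program in place, keeping only reachable instructions;
 * returns the new program length.                                     */
static int remove_unreachable(instr *code, int n)
{
    mark_reachable(code, n, 0);
    int k = 0;
    for (int i = 0; i < n; i++)
        if (code[i].reachable)
            code[k++] = code[i];
    return k;
}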
VML-to-target translation uses configuration files, which describe both the hardware characteristics and the instruction support available on the target.
The original code, written in VML before optimization, is given in Fig. 4.1.
1. MOVEA input, AD0;
2. MOVE #1, OF0;
3. JUMP LAB1;
4. MOVE #2, R0;
5. MOVE #3, R1;
6. ADD R0, R1, R2;
7. LAB1: MOVE #50, R0;
8. MOVE #51, R1;
9. SUB R1, R0, R2;
10. IF AZ ADD R0, R1, R2;
11. DO LAB2 FOREVER;
12. MULT R0, R1, R5;
13. MOVE #64, R0;
14. ADD R0, R5, R2;
15. MOVE #50, R7;
16. MOVE #1, R8;
17. ADD R7, R8, R9;
18. BREAK AC;
19. LAB2: MOVE R5, 1:(AD0:OF0)[LINEAR PRE_INR];
20. MOVE #0, R2;
Fig 4.1 The Original Code Before Optimization
Using control-flow analysis [5], instructions 3, 4, 5 and 6, which are unreachable, are removed from the program. Using data-flow analysis and the DU and UD chains [5] of the operands used in the program, instructions 15, 16 and 17, which are loop invariant, are moved outside the loop; the same analysis then shows that these hoisted instructions, together with instruction 20, are dead code, and they are removed from the program. Using the graph colouring algorithm [5] together with the DU and UD chains, register allocation is performed by mapping R0 to R4, R1 to R5, R2 to R3, R3 to R3, R7 to R2, R8 to R3, R9 to R3, R5 to R3 and R2 to R5 in the program. The final VML code after optimization is given in Fig. 4.2.
1. MOVEA input, AD0;
2. MOVE #1, OF0;
3. ADD R0, R1, R2;
4. LAB1: MOVE #50, R4;
5. MOVE #51, R5;
6. SUB R5, R4, R3;
7. IF AZ ADD R4, R5, R3;
8. DO LAB2 FOREVER;
9. MULT R4, R5, R3;
10. MOVE #64, R4;
11. ADD R4, R3, R5;
12. BREAK AC;
13. LAB2: MOVE R3, 1:(AD0:OF0)[LINEAR PRE_INR];
Fig 4.2 The Final Code After Optimization
Hence the length of the program is reduced from 20 instructions to 13, and the number of registers used is reduced from 7 to 4. Optimizing the original VML code thus increases the speed of execution and makes better use of the resources available on the target processor.
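A minimal sketch of the register re-mapping step is given below, assuming an interference graph has already been built from the DU and UD chains. The greedy colouring shown is one common simplification of the graph-colouring allocators described in [5], not the translator's exact algorithm, and the MAX_VREGS/MAX_PREGS limits are illustrative; spilling is not handled.

#define MAX_VREGS 32   /* virtual registers used in the VML program          */
#define MAX_PREGS 16   /* physical registers of the target (from the config) */

/* interferes[i][j] is non-zero when virtual registers i and j are live
 * at the same time; colour[i] receives a physical register number.
 * Returns the number of physical registers used, or -1 if more than
 * k (k <= MAX_PREGS) registers would be needed.                        */
static int colour_registers(int n, const int interferes[][MAX_VREGS],
                            int colour[], int k)
{
    int used = 0;
    for (int i = 0; i < n; i++) {
        int taken[MAX_PREGS] = {0};
        for (int j = 0; j < i; j++)
            if (interferes[i][j] && colour[j] >= 0)
                taken[colour[j]] = 1;          /* neighbour's colour is unusable */
        colour[i] = -1;
        for (int c = 0; c < k; c++)
            if (!taken[c]) { colour[i] = c; break; }
        if (colour[i] < 0)
            return -1;                         /* would need to spill            */
        if (colour[i] + 1 > used)
            used = colour[i] + 1;
    }
    return used;
}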
IV. Translator for DVM
The configuration files give information about the constraints of the processor, namely the number of registers, condition-code support and so on. Further, they also provide a mapping for every instruction in VML. If a particular instruction is not available, a keyword is provided to indicate this. This information is used for inserting plugins.
An added advantage of these instructions is that they can operate on operands of variable precision. A small extract of the configuration file is given in Appendix A.
To accommodate processors that do not support all the VML instructions, plugins in terms of VML instructions are made available. This instruction replacement is done in the parse tree before translation. The process of replacing instructions that are not available on the target by their corresponding routines is referred to as VML retargeting.
The list of instructions inserted into the program in place of the higher level instruction becomes part of the same code segment; such a list of instructions is called a plugin.
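To illustrate how the configuration file drives this retargeting, the C fragment below looks up a VML mnemonic in an instruction-mapping table and, when no native form exists, substitutes the corresponding plugin sequence. The table layout, field names and fallback behaviour are assumptions based on the description above, not the actual configuration format shown in Appendix A.

#include <stdio.h>
#include <string.h>

/* Hypothetical entry of the instruction-mapping part of the configuration
 * file: one VML mnemonic maps either to a target instruction template or,
 * if unavailable, to a plugin (a sequence of lower-level VML instructions). */
typedef struct {
    const char  *vml_mnemonic;   /* e.g. "MULT"                       */
    const char  *target_form;    /* target template, or NULL          */
    const char **plugin;         /* NULL-terminated VML sequence      */
} insn_mapping;

/* Emit either the native target instruction or its VML plugin, which
 * would itself be translated recursively by the real translator.       */
static void retarget(FILE *out, const insn_mapping *map, size_t n,
                     const char *mnemonic)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(map[i].vml_mnemonic, mnemonic) != 0)
            continue;
        if (map[i].target_form) {
            fprintf(out, "%s\n", map[i].target_form);
        } else {
            for (const char **p = map[i].plugin; *p; p++)
                fprintf(out, "%s\n", *p);   /* spliced into the same code segment */
        }
        return;
    }
    fprintf(out, "; unknown VML instruction %s\n", mnemonic);
}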