1 Introduction to the RISC Processor

Prof. Dr. B. Schwarz

Major development advances in the field of computer architectures [2, 3, 5, 11]:

·  Family concept, introduced by IBM with System/360 (1964) and DEC with the PDP-8 (1965):

A common hardware architecture with different performance characteristics:

clock frequency, data bus width, size of addressable memory.

·  Microprogrammed control unit (suggested by Wilkes in 1951, first realised in System/360):

Processing a machine instruction as a sequence of microinstructions simplified the design

of the control unit for increasing instruction word lengths.

·  Cache memory: Fast buffer for the reuse of frequently occurring data and/or instructions. Introduced in 1968 with the Model 85 of the IBM System/360 family.

·  Pipelining: Concurrent processing of several instructions in different stages increases instruction throughput. The Control Data machine CDC 6600 laid the roots of RISC in 1964.

·  Multiple processors: Symmetric (shared memory) multiprocessors (SMP), nonuniform memory access (NUMA) systems with physically distributed memory.

·  Reduced Instruction Set Computer (RISC) (1980 Patterson / 1981 Hennessy): Instruction set, CPU architecture and control unit design derived from studies of the dynamic frequency of instructions and variables in typical running programs.

Motivations for the design of Complex Instruction Set Computers (CISC) [2, 3, 5]:

·  Two principal reasons: simplified compilers and increased performance.

At the same time, more high-level languages (HLLs) with powerful expressions were being developed in order to reduce software costs, and the intention was to design hardware architectures with better support for HLLs.

·  Simplified compilers:

Ø  Complex machine instruction sets with more specialised instructions and many addressing modes come closer to HLL expressions, and therefore each HLL statement can be realised with a shorter sequence of machine instructions.

Ø  But experience showed that the process of translation tends to become more complicated with a larger number of sophisticated instructions. Optimisation for code reduction and pipelining was shown to be less efficient.

·  Smaller machine programs:

Ø  With decreasing semiconductor memory costs, this was no longer a driving force.

Ø  On the contrary, fewer instructions can be encoded with a shorter operation code, and therefore fewer bytes have to be transferred from main memory.

Ø  Quantitative studies have shown that code size is reduced only by a small amount (Table 11).

Table 11: Code size of CISC instruction set relative to RISC I [2]

·  Faster machine program processing:

Ø  At first glance it seems as if an HLL statement that is translated directly to a single machine instruction can be processed faster than a sequence of simple instructions.

Ø  But the control unit becomes more complicated and the microcode storage has to be enlarged. As a result, no reduction of processing cycles is achieved.

1.1  Motivation Derived from Code Analysis

·  In the course of cache memory design research aimed at improving von Neumann computer architectures, several dynamic measurements of running programs were performed. The frequency of events involving memory interaction was a main target of these studies [2].

Table 12: Relative dynamic frequency of HLL operations [2].

·  The evaluation of event frequencies in running programs was driven by three questions:

Ø  Which operations have to be supported by the processor?

Ø  Which operands determine the memory organisation and addressing modes?

Ø  Which optimisation goals for control unit sequencing and pipelining are of major importance?

Operations:

·  Program activity is dominated by simple variable assignments, which correspond to move machine instructions. Conditional statements represent another large share. Because IF and LOOP are implemented with conditional branches, the sequence control mechanisms are of interest.

·  For an evaluation of the actual processor activities, several HLL programs were compiled for three CISC processors: VAX, PDP-11 and M68000. The dynamic occurrence of machine instructions and memory accesses was investigated (Table 13).

Table 13: Weighted relative dynamic frequency of HLL operations

[2, PATT82a].

·  To obtain columns 4 and 5, each value in columns 2 and 3 is multiplied by the relative number of machine instructions produced by the compiler.

Ø  Conditional branches and procedure calls make up the largest part of machine language code, and therefore the relationship between instruction set design and code sequencing is of interest.

·  The 6th and 7th columns are derived by multiplying the frequency of occurrence of each statement by the relative number of memory references caused by each statement.

Ø  Procedure calls and returns are the most time-consuming operations in HLL programs. Parameter passing and register saving to main memory take more time than simple register accesses.
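The weighting scheme described above can be sketched in a few lines of Python. Note that all numbers below are invented placeholders for illustration only, not the measured values from Table 13 or [2, PATT82a]:

```python
# Sketch of the weighting scheme: HLL statement frequencies are
# multiplied by the machine instructions (or memory references)
# each statement expands into, then renormalised to percent.
# ALL numbers are hypothetical placeholders, NOT measured data.
hll_frequency = {"assign": 0.45, "if": 0.25, "call": 0.15, "loop": 0.05}

# Hypothetical weights: machine instructions and memory references
# generated per occurrence of each HLL statement.
instructions_per_stmt = {"assign": 1, "if": 2, "call": 12, "loop": 5}
mem_refs_per_stmt = {"assign": 2, "if": 2, "call": 22, "loop": 10}

def weighted(freq, weight):
    """Multiply each frequency by its weight and renormalise to percent."""
    raw = {k: freq[k] * weight[k] for k in freq}
    total = sum(raw.values())
    return {k: 100 * v / total for k, v in raw.items()}

by_instructions = weighted(hll_frequency, instructions_per_stmt)
by_mem_refs = weighted(hll_frequency, mem_refs_per_stmt)
```

Even with these made-up weights the qualitative effect of the study is visible: the cheap but frequent assignments lose weight, while calls with their many memory references dominate the weighted columns.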

Operands:

·  The majority of occurring operands are simple variables, and 80% of them are local to functions [2, PATT82a]. Array indices and pointers for structure access are of the same kind.

Table 14: Dynamic percentage of HLL operands [2, PATT82a].


Procedure calls:

·  The number of parameters, the number of local variables and the depth of procedure nesting determine the amount of memory transfers related to calls and returns.

Ø  Quantitative studies by Tanenbaum [2, TANE78] have shown that 98% of the reviewed procedure calls pass fewer than 6 parameters and 92% of the checked calls use fewer than 6 local variables.

Ø  The call-return behaviour of a program is depicted in Fig. 11. Each call is represented by a line moving down to the right and each return by a line moving up to the right. An interval covering a nesting depth of 5 is marked as a grey window. Only a sequence of 6 calls causes the window's position to shift.

Ø  Further investigations showed that with a window depth of 8, only 1% of the calls make the window's position shift up or down.

Ø  This characteristic of operand usage is stated as locality of reference.

Fig. 11: Example call-return behaviour of a program [2, PATT85].
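The window behaviour sketched in Fig. 11 can be simulated directly. This is a minimal sketch; the call-return trace below is invented for illustration, not taken from the measurements:

```python
# Sketch of the register-window behaviour: a window covers `depth`
# consecutive nesting levels, and only a call or return that leaves
# the window forces it to shift. The trace is an invented example.
def window_shifts(trace, depth):
    """Count how often a window of `depth` nesting levels must shift.

    trace: sequence of +1 (procedure call) and -1 (return).
    """
    level = 0
    low = 0                        # shallowest level covered by the window
    shifts = 0
    for step in trace:
        level += step
        if level >= low + depth:   # call went below the window
            low = level - depth + 1
            shifts += 1
        elif level < low:          # return went above the window
            low = level
            shifts += 1
    return shifts

# Invented trace: calls and returns, maximum nesting depth 4.
trace = [+1, +1, -1, +1, +1, +1, -1, -1, +1, -1, -1, -1]
```

With a window depth of 5 this trace never shifts the window, while a depth of 2 forces several shifts, illustrating the locality of reference stated above.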

First Implications:

·  Designing a large instruction set closely related to complex HLL statements and expressions is not very effective. HLLs should rather be supported where time-consuming operations have to be processed, i.e. where memory accesses can be avoided.

·  A large number of internal general purpose registers (GPR) supports local handling of variables in order to reduce the amount of external memory transfers.

·  Careful organisation of pipeline sequencing is necessary because of the typically large share of conditional branches and procedure calls. Otherwise too many instructions are fed into the processor's pipeline stages without being operated on.

·  Up to this point, the requirements for a simplified and reduced instruction set cannot be derived from the named study results.

Characteristics of Reduced Instruction Set Architectures:

1.  A simple instruction set layout with a uniform instruction word length of typically 32 bits.

A small number of different combinations of opcode, operand fields and addresses. Register fields with a minimum width of 5 bits (2⁵ = 32 addressable registers).

2.  Logic and arithmetic operations only with operands located in registers (register-to-register operations). Only load and store instructions access external memory (load/store operations). These instructions contain only one operand with a reference to memory.

3.  Only a few addressing modes: register indirect, displacement, immediate and PC-relative. No memory indirect addressing modes, in order to avoid too many slow external memory transfers.
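The load/store discipline of point 2 can be made concrete with a tiny interpreter sketch. The mnemonics, addresses and register names below are invented for illustration:

```python
# Hypothetical sketch of the load/store discipline: the HLL statement
#     a = b + c
# cannot be a single memory-to-memory instruction on a RISC; it becomes
# two loads, one register-register ALU operation and one store.
memory = {0x100: 7, 0x104: 35, 0x108: 0}   # b, c, a (invented addresses)

def run(program, memory):
    """Tiny interpreter for a three-register load/store machine."""
    regs = {"r1": 0, "r2": 0, "r3": 0}
    for op, *args in program:
        if op == "lw":                     # load word: reg <- Mem(addr)
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "add":                  # ALU: register operands only
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "sw":                   # store word: Mem(addr) <- reg
            reg, addr = args
            memory[addr] = regs[reg]
    return memory

program = [("lw", "r1", 0x100),            # r1 <- b
           ("lw", "r2", 0x104),            # r2 <- c
           ("add", "r3", "r1", "r2"),      # r3 <- r1 + r2
           ("sw", "r3", 0x108)]            # a <- r3
run(program, memory)
```

Note that `add` never touches `memory`: every memory access is isolated in a load or store with exactly one memory operand, as stated above.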

Advantages:

·  The complexity of the instruction decoder component of the control unit will be reduced.

Ø  No long sequences of microcode processing steps are necessary.

Ø  Less chip area is occupied by the simple combinational logic of the control unit.

·  Branch processing within the pipeline becomes more efficient.

Fig. 12: Instruction layout for MIPS R processor family [2, 3, 5, 11].

·  I-type

Loads and stores with register indirect and displacement addressing:

rt ← Mem(rs + imm)

ALU operations with immediate addressing:

rt ← rs op imm

Conditional branches with PC-relative addressing:

rs == 0: PC ← PC + imm

rs != rt: PC ← PC + imm

·  R-type

Register-register ALU operations:

rd ← rs func rt
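The bit-level layout behind these formats can be sketched as a small encoder/decoder, assuming the standard MIPS field widths (6-bit opcode, three 5-bit register fields, 16-bit immediate, 5-bit shift amount and 6-bit function field):

```python
# Sketch of the 32-bit MIPS instruction layouts of Fig. 12:
# I-type: opcode(6) | rs(5) | rt(5) | immediate(16)
# R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
def encode_i(opcode, rs, rt, imm):
    """Pack an I-type instruction into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack an R-type instruction into one 32-bit word."""
    return ((opcode << 26) | (rs << 21) | (rt << 16) |
            (rd << 11) | (shamt << 6) | funct)

def decode_fields(word):
    """Split a 32-bit word into the fields shared by both formats."""
    return {"opcode": word >> 26,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "imm": word & 0xFFFF}

# lw r8, 4(r29): opcode 0x23 is the MIPS lw opcode;
# rt <- Mem(rs + imm) with rs = 29, rt = 8, imm = 4.
word = encode_i(0x23, 29, 8, 4)
```

Because the opcode, rs and rt fields sit at the same bit positions in both formats, the decoder can extract them before knowing whether the word is I-type or R-type, which keeps the decoding logic simple.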

1.2  Processor Structure

Essential characteristics of a RISC architecture: CPU general purpose registers and a control unit realised with combinational logic hardware.

·  User-visible registers: Support the software engineer in optimising the locality of variables through register allocation, in order to minimise external memory transfers.

Ø  GPRs Ri: Contain operands for all ALU operations and addresses for displacement addressing with offsets from the immediate field.

Ø  Stack pointer: A special register that points to the top of the stack and is modified implicitly by procedure calls and returns. Register saving is supported with push and pop instructions, which likewise address the stack implicitly via the stack pointer.

Ø  Index registers: Automatic increment for address calculations in loops.

·  Control and status registers: Their contents control CPU operations but cannot be manipulated directly.

Ø  Program counter: Holds the address of the next instruction in main memory to be processed.

Ø  Instruction register: Contains the instruction just loaded.

Ø  Memory buffer registers: Intermediate storage of effective addresses for load/store data and data itself.

Ø  Condition code register: Several bits which characterise the results of ALU operations.

Fig. 13: Basic elements of a von Neumann RISC processor architecture.

Not depicted:

·  CCR and index register

·  CPU data and address bus

·  Separation registers for decoupling the pipeline stages
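The interplay of the control registers named above (PC, instruction register, memory buffer register) during an instruction fetch can be sketched as follows; the class and its word-addressed memory are illustrative simplifications, not a model of a real MIPS implementation:

```python
# Minimal sketch of the fetch step using the control registers
# described above. Word-addressed instruction memory for simplicity;
# the two instruction words are arbitrary example values.
class CPU:
    def __init__(self, program):
        self.pc = 0            # program counter: next instruction address
        self.ir = 0            # instruction register: current instruction
        self.mbr = 0           # memory buffer register: last memory datum
        self.mem = program     # instruction memory (list of 32-bit words)

    def fetch(self):
        """Load the instruction the PC points at and advance the PC."""
        self.mbr = self.mem[self.pc]   # memory word arrives in the MBR
        self.ir = self.mbr             # and is latched into the IR
        self.pc += 1                   # word-addressed: step by one
        return self.ir

cpu = CPU([0x8FA80004, 0x01095020])
```

In the pipelined datapath of Fig. 13 this fetch step forms the first stage; the separation registers mentioned above would then pass the IR contents on to the decode stage.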
