Jeffrey Dwoskin & Kevin Green

VLSI Design Project Report

Fall 2001 – Spring 2002

Project Description

We have designed a microcontroller chip based on the RTL design and instruction set of an AMD 2910. Our chip is the sequencer component of the microcontroller, which processes instructions from the microprogram ROM and determines the address of the next microprogram instruction. The controller has a 4-input multiplexor that selects the next instruction address from the direct address input, the microprogram counter, the register/counter, or the 4-word stack.

The direct address is an input to the chip which is used to initialize the controller, to start execution of a new instruction with an address from the mapping PROM, or to supply the target of a branching instruction from the microprogram ROM.

The microprogram counter consists of an incrementer followed by a 12-bit register. The incrementer takes the current microprogram address as input and adds one to advance to the next instruction. This new address is then stored in the register.

The address register/counter is used for looping over a set of instructions. It does this by first loading a count from the direct input, and then decrementing after each loop iteration. When the count reaches zero, a signal is sent that can be used to stop executing the loop. It can alternatively be used to store an address for a conditional jump instruction.

The 4-word, 12-bit stack stores return addresses for subroutine calls. Since it holds 4 words, subroutine calls can be nested up to 4 levels deep.
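
The selection among the four address sources can be sketched in a few lines of Python. This is only a behavioral illustration, not the chip's logic: the function names and the select encoding (0 to 3) are assumptions made for the example.

```python
MASK12 = 0xFFF  # addresses are 12 bits wide

def next_address(select, direct, pc, reg, stack_top):
    """Model the 4-input next-address mux.

    The select encoding (0..3) is hypothetical, chosen only for this sketch.
    """
    sources = (direct, pc, reg, stack_top)
    return sources[select] & MASK12

def step_pc(current_address):
    """Incrementer followed by the PC register: store address + 1, wrapping at 12 bits."""
    return (current_address + 1) & MASK12

# Sequential execution: the mux picks the PC, then the PC advances past that address.
pc = 0x010
addr = next_address(1, direct=0x200, pc=pc, reg=0x0AA, stack_top=0x055)
pc = step_pc(addr)
print(hex(addr), hex(pc))
```

Selecting source 0 instead would model a branch to the direct address, after which the counter continues from the branch target plus one.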

During each instruction, the microcode ROM also provides the control signals for the rest of the micro-controlled CPU. The combination of our design, the microprogram ROM, and the mapping PROM make up the control unit for the CPU. The advantage of the microprogrammed control unit over a standard state machine design of a control unit is that by changing the program in the microprogram ROM and mapping PROM, it can be made to emulate the instruction set of any standard microprocessor.

We also designed a bit-sliced 8-bit ALU which resides on a separate chip. Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition/subtraction and all logic functions, including AND, OR, XOR, and their complements. The second input can also be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.

Testing Plan

We plan to use a sequential ATPG tool, SEST, to find all the test vectors for our chip. The fault coverage should be very high because there is a global reset line that will initialize all flip-flops, and our chip is designed in such a way as to give access through the pins to all of the components directly by setting the correct control signals. Moreover, there is an absence of cycles among the flip-flops, which will allow the sequential ATPG to fully test the circuit. We could then use an ATE to apply the test vectors.

The first step would be to netlist our design in the rutmod format suitable for input into SEST, which would then generate the test vectors. However, our design is in Cadence, which does not support the rutmod format, so we would have to write the netlist by hand. That is not feasible considering we have thousands of transistors. Therefore, we will have to wait for a tool to convert our netlist from Cadence into rutmod. As stated above, we believe that our design is well suited to testing because all components have high observability and controllability.

Criticism of the CAD Tools

Overall we found the CAD tools straightforward and easy to use, but a few things need improvement. At first we designed one portion of our project in Synopsys, but we were unable to convert the resulting hardware from Synopsys into Cadence. Synopsys also does not provide enough detail about the interconnection between components for us to convert the design manually. For example, when it chose to use a JK flip-flop, it showed a box with 4 or more inputs and the wires going into it, but it did not tell us which wires went to which inputs of the flip-flop. More importantly, we could not get Synopsys to restrict its design to the components that we chose. For example, we wanted it to use inverting logic instead of non-inverting logic and only D flip-flops instead of JK flip-flops, but we were unable to build a new library or restrict the standard G-tech library to accomplish this. We instead had to design the component by hand using K-maps.

As for Cadence, we were unable to simulate extracted layouts. For most of the fall semester, many of the components of Cadence did not work for the AMI process. This included the simulator, extractor, and LVS, which made it difficult to test and verify our design as we went along. The design rule checker (DRC) was also not working for most of the time we were laying out our design. This made the layout time-consuming and more error-prone and set us back a couple of months. It also meant that we didn’t know to follow some of the more obscure rules from the printed design rules.

We had some other problems with the Affirma analog simulator. First, it is very slow and difficult to work with. Many of the settings have to be set to the same values repeatedly, when the tool should remember them. It could also use a much simpler interface, especially for entering the stimuli. For any circuit with more than 2 or 3 input pins, setting the stimuli correctly is very tedious. We’ve had problems simulating designs with a hierarchy, especially when it doesn’t identify global sources deeper in the hierarchy. There is a problem with the model libraries as well: although we're using the AMI06 tech library, it is still using tsmc25 in the netlist, and we can't determine where it's getting this information.

Timing/Critical Path Analysis

Data from AMI C5N Process:

Sampled from:

Sheet resistance:

metal1/2: 0.09 ohm/sq

metal3: 0.06 ohm/sq

poly: 22 ohm/sq

m1/2 contact: 0.7 - 0.85 ohm

Capacitance (aF = 10^-18 F; area values in aF/µm^2, fringe values in aF/µm):

               m1    m2    m3
area (sub)     31    17    10
area (m1)            32    13
area (m2)                  36
fringe (sub)   76    59    39
fringe (m1)          56    35
fringe (m2)                51

Wire Delay:

Longest path: assume worst case all metal 1

pc output to next addr mux: 1550µm + 3 contacts

at minimum width = 0.9µm

1550µm/0.9µm = 1722.2 squares long

Resistance = 0.09ohm/sq * 1722.2 sq + 3 * 0.85ohm = 155 ohm + 2.55 ohm = 157.55 ohm

Capacitance:

area (sub) = 31aF/µm^2 * 1550µm * 0.9µm = 43245 aF

fringe (sub) = 76aF/µm * 1550µm = 117800 aF

total cap: 161045 aF = 0.161045 pF

RC = Tdelay = 25.373 ps = 0.025373 ns

Conclusion: wire delay is insignificant

rough approx (min width wire, assuming 1µm wide)

RC = 25.373ps / 1550µm = 0.01637 ps / µm
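
The wire delay arithmetic above can be reproduced with a short Python sketch. The constants are the metal 1 values quoted from the process data; the function name and its parameters are assumptions made for the example, and the lumped product R x C is used as the delay estimate just as in the hand calculation.

```python
R_SHEET_M1 = 0.09      # ohm/sq, metal 1
R_CONTACT = 0.85       # ohm, worst-case m1/m2 contact
C_AREA_M1 = 31e-18     # F/um^2, metal 1 over substrate
C_FRINGE_M1 = 76e-18   # F/um, metal 1 over substrate

def wire_rc(length_um, width_um, contacts):
    """Return (resistance in ohm, capacitance in F, lumped RC delay in s)."""
    squares = length_um / width_um
    resistance = R_SHEET_M1 * squares + contacts * R_CONTACT
    capacitance = C_AREA_M1 * length_um * width_um + C_FRINGE_M1 * length_um
    return resistance, capacitance, resistance * capacitance

# Longest wire: PC output to next address mux, 1550 um at minimum width, 3 contacts.
r, c, delay = wire_rc(1550, 0.9, 3)
print(f"R = {r:.2f} ohm, C = {c * 1e12:.4f} pF, RC = {delay * 1e12:.2f} ps")
```

Calling wire_rc with the other wire lengths listed in the path analysis gives smaller values still, which is why those wires are ignored.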

Component Delay:

Simulated to find delays:

DFF w/clr - .5 ns

2to1 mux - .675 ns

4to1 mux - .900 ns

inc/dec 12 bit - 2.55ns

condition mux (4to1 + 2to1 mux) - 1.575 ns

Calculated from components that were simulated:

bus enable - .325 ns

control unit - 1.775 ns

stack control -- longest path through two 2-to-1 muxes - 1.35 ns

Critical Path Analysis:

There are 8 major paths in our chip that we are considering. They are the 4 inputs into the next address mux, and the paths that drive the components in each of those paths.

Stack path output:

675um from control unit to stack control – ignore

delay for signal thru mux -- .900 ns

1150um from stack dff output to next addr mux input -- ignore

Total: 0.900 ns

Stack input:

delay thru stack control -- .900 ns

or 620um from pc to stack inputs -- ignore

delay to load registers in stack -- .5 ns

Total: 1.4 ns

Program counter Input:

700um from next addr mux output to incrementor -- ignore

delay thru incrementer -- 2.55 ns

delay to set register -- .5 ns

Total: 3.05ns

Program counter output:

1550um from PC output to next addr mux -- 0.025373ns

delay thru next addr mux -- .9ns

Total: 0.925ns

Addr/reg load input:

1100um from input reg to addr/reg mux -- ignore

or 750um control signals from control unit to load/dec -- ignore

delay thru 2to1 mux – 0.675ns

400um to other 2to1 mux -- ignore

delay thru 2to1 mux – 0.675ns

delay to set register – 0.5ns

Total: 1.85ns

Addr/reg decrement input:

750um control signals from control unit to load/dec --ignore

delay thru decrementer -- 2.55ns

delay thru 2to1 mux -- .675ns

400um to other 2to1 mux -- ignore

delay thru 2to1 mux -- .675ns

delay to set register -- .5ns

Total: 4.4ns

Addr/reg output

825um from register output to next addr mux -- ignore

delay thru next addr mux -- .9ns

Total: 0.9ns

Control unit input:

575um from pads into condition mux -- ignore

delay thru condition mux (8to1) -- 1.575 ns

650um from condition mux to control unit -- ignore

delay thru control unit -- 1.775 ns

Total: 3.35ns

Conclusion: The address register’s decrementer input results in the longest delay of 4.4ns. This must occur during half of the clock cycle, which makes our minimum clock period 8.8ns and our maximum clock frequency:

1/8.8ns = 113.6 MHz

In order to be safe, we’ll say our maximum clock rate is 100 MHz.
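
As a cross-check, the path totals can be summed from the component delays in a few lines of Python. This is a sketch: the dictionary keys are labels invented for the example, and the stack control delay is taken as the 0.9 ns used in the stack input listing.

```python
# Component delays in ns, as listed under "Component Delay".
DFF, MUX2, MUX4, INC_DEC = 0.5, 0.675, 0.9, 2.55
COND_MUX, CONTROL = 1.575, 1.775
PC_WIRE = 0.025373  # the only wire delay that was not ignored

paths = {
    "stack output": MUX4,
    "stack input": 0.9 + DFF,  # stack control delay as used in the path listing
    "pc input": INC_DEC + DFF,
    "pc output": PC_WIRE + MUX4,
    "addr/reg load": MUX2 + MUX2 + DFF,
    "addr/reg decrement": INC_DEC + MUX2 + MUX2 + DFF,
    "addr/reg output": MUX4,
    "control unit input": COND_MUX + CONTROL,
}

critical = max(paths, key=paths.get)
period_ns = 2 * paths[critical]   # the path must settle in half a clock cycle
f_max_mhz = 1000.0 / period_ns
print(critical, round(paths[critical], 3), round(f_max_mhz, 1))
```

The decrement path stays the longest by more than 1 ns, so the 100 MHz figure does not depend on small changes to the other paths.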

Transistor Count

Control Unit: 336

8to1 Mux: 36

4to1 Mux – 12-bit: 192

Addr Reg/Dec – 12-bit: 496

Stack: 1358

2 x 12-bit registers: 504

2 x bus enable: 48

5 x clock inverters: 24

Incrementor: 108

Total: 3102

Power Dissipation

Estimate chip as 3102 transistors/2 = 1551 inverters

Gate Capacitance:

Gate cap on an inverter: 1008.1 aF

Total gate capacitance Cg = 1008.1 * 1551 = 1.56pF

Diffusion Capacitance:

Cd = Cja x a x b + Cjp x (2a + 2b)

a = 1.5µm, b = 1.2µm

P-trans: 5 x 10^-4 pF x 1.8 + 4 x 10^-4 pF x 5.4 = 0.00306pF

N-trans: 3 x 10^-4 pF x 1.8 + 4 x 10^-4 pF x 5.4 = 0.0027pF

Total Cd = (0.00306 + 0.0027)pF x 1551 inverters = 8.93pF

Interconnect Capacitance:

Total length of major interconnects in Metal 1: 21,505µm

Total length of major interconnects in Metal 2: 53,660µm

Metal 1: 21505µm x 0.9µm x 31aF/µm^2 + 21505µm x 76aF/µm = 2.2pF

Metal 2: 53660µm x 0.9µm x 17aF/µm^2 + 53660µm x 59aF/µm = 3.98pF

Total Interconnect Capacitance = 6.18pF

Total Load Capacitance: CL = 1.56pF + 8.93pF + 6.18pF = 16.67pF

Power = CL x VDD^2 x f = 16.67pF x (5V)^2 x 100MHz = 41.675mW

Metal Migration: I = 41.675mW / 5V = 8.335mA

Width = 8.335mA / (0.5mA/µm) = 16.67µm in total, or 8.33µm per power line. We made them 12µm wide to be safe.
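
The power estimate can be recomputed end to end with the Python sketch below. All names are assumptions made for the example. Because it carries unrounded intermediate values, it comes out at roughly 16.7 pF and 41.8 mW, slightly above the hand-rounded figures.

```python
AF = 1e-18  # one attofarad in farads
PF = 1e-12  # one picofarad in farads

inverters = 3102 // 2                      # two transistors per inverter
c_gate = 1008.1 * AF * inverters

# Diffusion: Cja * (a * b) + Cjp * (2a + 2b), with a = 1.5 um and b = 1.2 um
a, b = 1.5, 1.2
c_diff_p = (5e-4 * a * b + 4e-4 * (2 * a + 2 * b)) * PF
c_diff_n = (3e-4 * a * b + 4e-4 * (2 * a + 2 * b)) * PF
c_diff = (c_diff_p + c_diff_n) * inverters

def metal_cap(length_um, area_af, fringe_af, width_um=0.9):
    """Area plus fringe capacitance of a wire over substrate, in farads."""
    return (length_um * width_um * area_af + length_um * fringe_af) * AF

c_wire = metal_cap(21505, 31, 76) + metal_cap(53660, 17, 59)
c_load = c_gate + c_diff + c_wire

vdd, freq = 5.0, 100e6
power_w = c_load * vdd ** 2 * freq
current_ma = power_w / vdd * 1e3
print(f"CL = {c_load / PF:.2f} pF, P = {power_w * 1e3:.1f} mW, I = {current_ma:.2f} mA")
```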

Address Register/Decrementer

Usage

The address register/decrementer is used primarily for execution of loops. First the register is loaded with an initial count from the direct input. On each clock period the value stored is decremented and compared to zero. When the value reaches zero a signal is sent to the condition mux so execution of the loop is completed. The address register/decrementer can also be used to store an address to jump back to during conditional branches.

Components

12-bit register w/ clear

12-bit decrementer

2 x 12-bit 2-to-1 muxes

zero detector

The address register/decrementer is composed of the four components listed above. The register is loaded from a mux which selects from either its previous value or the other mux. The other mux selects between the output of the decrementer and the direct input. The decrementer takes its input from the current value held in the registers and subtracts one from this value. All the components are 12 bits wide. The zero detector signals when the current value of the register is all zeros. The register also has a 12-bit output to the next address mux, which is used for the branching operations.
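
A behavioral sketch in Python may make the two-mux arrangement easier to follow. The class and the control signal names (hold, load) are assumptions made for the example; the real control signals come from the control unit.

```python
MASK12 = 0xFFF

class AddrRegister:
    """12-bit register fed by two 2-to-1 muxes, a decrementer, and a zero detector."""

    def __init__(self):
        self.value = 0  # state after clear

    def clock(self, hold, load, direct=0):
        # Inner mux: direct input (load) or decrementer output.
        inner = direct & MASK12 if load else (self.value - 1) & MASK12
        # Outer mux: keep the previous value (hold) or take the inner mux.
        self.value = self.value if hold else inner

    @property
    def zero(self):
        """Zero detector: true when every bit of the register is 0."""
        return self.value == 0

reg = AddrRegister()
reg.clock(hold=False, load=True, direct=3)   # load a loop count of 3
iterations = 0
while not reg.zero:
    iterations += 1
    reg.clock(hold=False, load=False)        # decrement once per pass
print(iterations)  # prints 3
```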

Bit-sliced 8-bit ALU

Usage

Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition/subtraction and all logic functions, including AND, OR, XOR, and their complements. The second input can also be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.

Components

The 8-bit ALU is a ripple carry configuration of 8 1-bit ALUs.
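
The ripple carry arrangement, and the way chips cascade, can be illustrated with a Python sketch that models only the add/subtract function. The function names and the invert_b control are assumptions made for the example; the logic functions and the shifter are left out.

```python
def alu_bit(a, b, carry_in, invert_b):
    """One ALU slice in add mode: a full adder with an optional inverter on b."""
    b ^= invert_b
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

def alu8(a, b, carry_in=0, invert_b=0):
    """Ripple the carry through eight 1-bit slices; returns (result, carry_out)."""
    carry = carry_in
    result = 0
    for i in range(8):
        bit, carry = alu_bit((a >> i) & 1, (b >> i) & 1, carry, invert_b)
        result |= bit << i
    return result, carry

def add16(a, b):
    """Cascade two 8-bit chips: the low chip's carry out feeds the high chip."""
    low, carry = alu8(a & 0xFF, b & 0xFF)
    high, carry = alu8(a >> 8, b >> 8, carry)
    return (high << 8) | low, carry

print(alu8(100, 27))                          # addition
print(alu8(100, 27, carry_in=1, invert_b=1))  # subtraction as a + ~b + 1
print(add16(0x01FF, 0x0001))
```

Subtraction reuses the adder by inverting the second input and setting the carry in, which is one reason an invertible second input is convenient.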

Bus Enable

Usage

Used to connect to the I/O bus for selecting between the direct address input and the next address output. It is controlled by the level of the clock so that when the clock is high we read the direct input and when the clock is low the next address output lines are set.

Components

Three-state buffer

The bus enable is a series of twelve three-state buffers.

Clock Inverter

Usage

The clock inverter is used whenever we need a signal and its complement as control signals. It is not used only for the clock; we use it all over the chip. It generates two signals that are complements of each other with no overlap.

Components

We used a transmission gate designed to have the same delay as a strong inverter. The two are placed in parallel and given the same input, so the two outputs are 180 degrees out of phase.

Condition Multiplexer

Usage

The 8-to-1 condition mux selects between various external signals which are used to determine the way an instruction is executed, for example which path of a branch is taken. By using the mux, the decision can be made based on:

- The sign of the ALU output

- Whether the ALU output equals zero

- Whether the ALU output overflowed

- The shift out bit from the ALU

- The carry out bit from the ALU

- The interrupt signal

- Always true (1)

- Always false (0)
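
A minimal Python sketch of the selection is shown below. The ordering of the eight sources and the flag names are assumptions made for the example, since only the list of sources is given here, not their select codes.

```python
# Hypothetical select encoding; the eight sources are listed above without an order.
CONDITIONS = ("sign", "zero", "overflow", "shift_out", "carry_out",
              "interrupt", "true", "false")

def condition_mux(select, flags):
    """Return the condition bit chosen by a 3-bit select value."""
    name = CONDITIONS[select & 0b111]
    if name == "true":
        return 1
    if name == "false":
        return 0
    return flags[name]

flags = {"sign": 0, "zero": 1, "overflow": 0,
         "shift_out": 0, "carry_out": 1, "interrupt": 0}
branch_taken = condition_mux(1, flags)  # branch if the ALU result was zero
print(branch_taken)  # prints 1
```

The constant inputs let the same conditional branch instruction act as an unconditional branch or as a no-branch without a separate opcode.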

Control Unit

Usage

Decodes the instruction given as input to the chip, along with the address zero and condition signals, to generate control signals for all the components on chip. The address zero signal comes from the address register/counter and is used for loop control. The condition signal comes from the condition mux, which selects from external signals coming from other parts of the CPU.

Components

The control unit is made up of random logic. We used the instruction set from Mick & Brick and used K-maps to design the logic schematic.

12-bit Decrementer

Usage

The 12-bit decrementer is used in the address register/counter to decrement the current value by one.

Components

dec first bit

dec last bit

dec two bit

The decrementer is a ripple carry decrementer. It begins with a one-bit decrementer, which is basically an inverter, followed by 5 dec two bit components. The two-bit component uses alternating logic for speed. The decrementer is completed with a one-bit decrementer at its tail, which is a single XOR.
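
The ripple structure can be sketched at the bit level in Python. This models only the logic function; the function name is an assumption, and the alternating-polarity logic inside the two-bit stages is an implementation detail that the sketch does not reproduce.

```python
def decrement12(value):
    """Bit-level ripple decrement of a 12-bit value."""
    bits = [(value >> i) & 1 for i in range(12)]
    out = [bits[0] ^ 1]            # first bit: an inverter
    borrow = bits[0] ^ 1           # borrow out of bit 0 when it was 0
    for i in range(1, 11):         # ten middle bits = five two-bit stages
        out.append(bits[i] ^ borrow)
        borrow &= bits[i] ^ 1      # borrow keeps rippling through zeros
    out.append(bits[11] ^ borrow)  # last bit: a single XOR, no borrow out
    return sum(bit << i for i, bit in enumerate(out))

print(decrement12(8), hex(decrement12(0)))  # prints 7 0xfff
```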

12-bit Incrementer

Usage

The 12-bit incrementer is used in the microprogram counter to increment the current address by one.

Components

inc first bit

inc last bit

inc two bit

The incrementer is a ripple carry incrementer. It begins with a one-bit incrementer, which is basically an inverter, followed by 5 inc two bit components. The two-bit component uses alternating logic for speed. The incrementer is completed with a one-bit incrementer at its tail, which is a single XOR.

Next Address Multiplexor

Usage

The next address multiplexor is a 12-bit, 4 to 1 multiplexor which selects the source of the next address output. It selects from:

- Direct Input

- Stack

- Address Register/Counter

- Program Counter

12-bit Register w/ clear

Usage

The 12-bit register is used as a component of the stack and the address register, and also to hold the input/output signals.

Components

12 DFF w/ clear

The 12-bit register is composed of 12 master/slave DFFs that have a clear line. The clock and clear lines are shared among the twelve bits, but the bits are otherwise independent.

Stack 4 x 12

Usage

This LIFO stack holds four twelve-bit words. It is used to hold return addresses while making a subroutine call or conditional jump.

Components

stack 4x1

stack control

reg2 w/ clear

decoder

The stack is composed of 12 four-word, 1-bit units and a component to generate the correct address to load or read when given a push or pop instruction. The two registers always hold the next write address and read address, which are two bits each. They are set by a stack control unit whenever a push or pop instruction has been issued. The decoder then activates the correct word position in each 4x1 unit. The 4x1 unit has a mux to activate the correct word position for output based on the read address.
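
One way the two address registers could behave is sketched in Python below. The pointer update rules are an assumption made for the example, since the exact behavior of the stack control unit is not spelled out here. The sketch returns addresses in last-in, first-out order, which is what nested subroutine returns need.

```python
class Stack4x12:
    """Four 12-bit words addressed by 2-bit write and read pointers."""

    def __init__(self):
        self.words = [0, 0, 0, 0]
        self.write_addr = 0      # next word to be written
        self.read_addr = 3       # word a pop would read (wraps modulo 4)

    def push(self, address):
        self.words[self.write_addr] = address & 0xFFF
        self.read_addr = self.write_addr
        self.write_addr = (self.write_addr + 1) & 0b11

    def pop(self):
        address = self.words[self.read_addr]
        self.write_addr = self.read_addr
        self.read_addr = (self.read_addr - 1) & 0b11
        return address

stack = Stack4x12()
for return_address in (0x101, 0x202, 0x303):
    stack.push(return_address)
print([hex(stack.pop()) for _ in range(3)])  # ['0x303', '0x202', '0x101']
```

Because the pointers wrap at two bits, a fifth nested push in this sketch would overwrite the oldest return address.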