Jeffrey Dwoskin & Kevin Green
VLSI Design Project Report
Fall 2001 – Spring 2002
Project Description
We have designed a microcontroller chip based on the RTL design and instruction set of an AMD 2910. Our chip is the sequencer component of the microcontroller, which processes instructions from the microprogram ROM and determines the address of the next microprogram instruction. The controller has a 4-input multiplexor that selects the next instruction address from the direct address input, the microprogram counter, the register/counter, or the 4-word stack.
The direct address is an input to the chip which is used to initialize the controller, start execution of a new instruction with an address from the mapping PROM, or supply the target of branching instructions from the microprogram ROM.
The microprogram counter consists of an incrementer followed by a 12-bit register. The incrementer takes the current microprogram address as input and adds one to advance to the next instruction. This new address is then stored in the register.
The address register/counter is used for looping over a set of instructions. It does this by first loading a count from the direct input, and then decrementing after each loop iteration. When the count reaches zero, a signal is sent that can be used to stop executing the loop. It can alternatively be used to store an address for a conditional jump instruction.
The 4-word, 12-bit stack stores return addresses for subroutine calls. Since it holds 4 words, subroutine calls can be nested up to 4 levels deep.
During each instruction, the microcode ROM also provides the control signals for the rest of the micro-controlled CPU. The combination of our design, the microprogram ROM, and the mapping PROM make up the control unit for the CPU. The advantage of the microprogrammed control unit over a standard state machine design of a control unit is that by changing the program in the microprogram ROM and mapping PROM, it can be made to emulate the instruction set of any standard microprocessor.
We also designed a bit-sliced 8-bit ALU which resides on a separate chip. Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition/subtraction and all logic functions, including AND, OR, XOR, and their complements. Also, the second input can be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.
Testing Plan
We plan to use a sequential ATPG tool, SEST, to find all the test vectors for our chip. The fault coverage should be very high because there is a global reset line that will initialize all flip-flops, and our chip is designed in such a way as to give access through the pins to all of the components directly by setting the correct control signals. Moreover, there are no cycles among the flip-flops, which will allow the sequential ATPG to fully test the circuit. We could then use an ATE to apply the test vectors.
The first step would be to netlist our design into the rutmod format suitable for input into SEST, which would then generate the test vectors. However, our design is in Cadence, which does not support the rutmod format, so we would have to write the netlist by hand. This is not feasible considering we have thousands of transistors. Therefore, we will have to wait for a tool to convert our netlist from Cadence into rutmod. As stated above, we believe that our design is well suited for testability because all components have high observability and controllability.
Criticism of the CAD Tools
Overall we found the CAD tools straightforward and easy to use, but a few things need improvement. At first we designed one portion of our project in Synopsys, but we were unable to convert the resulting hardware from Synopsys into Cadence. Synopsys also does not provide enough detail about the interconnection between components for us to convert the design manually. For example, when it chose to use a JK flip-flop, it showed a box with 4 or more inputs and the wires going into it, but it did not tell us which wires went to which inputs of the flip-flop. More importantly, we could not get Synopsys to restrict its design to use only the components that we chose. For example, we wanted it to use inverting logic instead of non-inverting logic and only D flip-flops instead of JK flip-flops, but we were unable to build a new library or restrict the standard GTECH library to accomplish this. We instead had to design the component by hand using K-maps.
As for Cadence, we were unable to simulate extracted layouts. For most of the fall semester, many of the components of Cadence did not work for the AMI process, including the simulator, extractor, and LVS. This made it difficult to test and verify our design as we went along. The design rule checker (DRC) was also not working for most of the time we were laying out our design. This made the layout time-consuming and more error-prone, and set us back a couple of months. It also meant that we didn’t know to follow some of the more obscure rules from the printed design rules.
We had some other problems with the Affirma analog simulator. First, it is very slow and difficult to work with. We have to set many of the settings to the same values repeatedly, when the tool should remember them. It could also use a much simpler interface, especially for entering the stimuli: for any circuit with more than 2 or 3 input pins, setting the stimuli correctly is very tedious. We’ve had problems simulating designs with a hierarchy, especially when it doesn’t identify global sources deeper in the hierarchy. There is a problem with the model libraries as well: although we’re using the AMI06 tech library, it is still using tsmc25 in the netlist, and we can’t determine where it’s getting this information.
Timing/Critical Path Analysis
Data from AMI C5N Process:
Sampled from:
Sheet resistance:
metal1/2: 0.09 ohm/sq
metal3: 0.06 ohm/sq
poly: 22 ohm/sq
m1/2 contact: 0.7 - 0.85 ohm
Capacitance: (aF = 10^-18 F; area in aF/µm^2, fringe in aF/µm)
               m1   m2   m3
area (sub)     31   17   10
area (m1)           32   13
area (m2)                36
fringe (sub)   76   59   39
fringe (m1)         56   35
fringe (m2)              51
Wire Delay:
Longest path: assume worst case all metal 1
pc output to next addr mux: 1550µm + 3 contacts
at minimum width = 0.9µm
1550µm/0.9µm = 1722.2 squares long
Resistance = 0.09ohm/sq * 1722.2 sq + 3 * 0.85ohm = 155 ohm + 2.55 ohm = 157.55 ohm
Capacitance:
area (sub) = 31aF/µm^2 * 1550µm * 0.9µm = 43245 aF
fringe (sub) = 76aF/µm * 1550µm = 117800 aF
total cap: 161045 aF = 0.161045 pF
RC = Tdelay = 25.373 ps = 0.025373 ns
Conclusion: wire delay is insignificant
rough approx (min width wire, assuming 1µm wide)
RC = 25.373ps / 1550µm = 0.01637 ps / µm
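The wire-delay arithmetic above can be reproduced with a short script. This is only a sketch of the same lumped-RC estimate; the function name `wire_delay_ps` and the constant names are ours, and the numbers are the metal 1 values from the process data above.

```python
# Lumped-RC estimate for a metal-1 wire, using the process values quoted above.
SHEET_RES_M1 = 0.09     # ohm/sq, metal 1
CONTACT_RES = 0.85      # ohm, worst-case m1/m2 contact
AREA_CAP_M1 = 31.0      # aF/um^2, metal 1 to substrate
FRINGE_CAP_M1 = 76.0    # aF/um, metal 1 to substrate


def wire_delay_ps(length_um, width_um=0.9, contacts=0):
    """Return (R in ohm, C in pF, RC in ps) for a metal-1 wire."""
    squares = length_um / width_um
    resistance = SHEET_RES_M1 * squares + contacts * CONTACT_RES
    cap_af = AREA_CAP_M1 * length_um * width_um + FRINGE_CAP_M1 * length_um
    cap_pf = cap_af * 1e-6                          # 1 pF = 1e6 aF
    return resistance, cap_pf, resistance * cap_pf  # ohm * pF = ps


r, c, t = wire_delay_ps(1550, contacts=3)
print(f"R = {r:.2f} ohm, C = {c:.6f} pF, RC = {t:.3f} ps")
# prints: R = 157.55 ohm, C = 0.161045 pF, RC = 25.373 ps
```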
Component Delay:
Simulated to find delays:
DFF w/clr - .5 ns
2to1 mux - .675 ns
4to1 mux - .900 ns
inc/dec 12 bit - 2.55ns
condition mux (4to1 + 2to1 mux) 1.575 ns
Calculated from components that were simulated:
bus enable - .325 ns
control unit - 1.775 ns
stack control -- longest path through two 2-to-1 muxes - 1.35 ns
Critical Path Analysis:
There are 8 major paths in our chip that we are considering. They are the 4 inputs into the next address mux, and the paths that drive the components in each of those paths.
Stack path output:
675um from control unit to stack control – ignore
delay for signal thru mux -- .900 ns
1150um from stack dff output to next addr mux input -- ignore
Total: 0.900 ns
Stack input:
delay thru stack control -- .900 ns
or 620um from pc to stack inputs -- ignore
delay to load registers in stack -- .5 ns
Total: 1.4 ns
Program counter Input:
700um from next addr mux output to incrementor -- ignore
delay thru incrementer -- 2.55 ns
delay to set register -- .5 ns
Total: 3.05ns
Program counter output:
1550um from PC output to next addr mux -- 0.025373ns
delay thru next addr mux -- .9ns
Total: 0.925ns
Addr/reg load input:
1100um from input reg to addr/reg mux -- ignore
or 750um control signals from control unit to load/dec -- ignore
delay thru 2to1 mux – 0.675ns
400um to other 2to1 mux -- ignore
delay thru 2to1 mux – 0.675ns
delay to set register – 0.5ns
Total: 1.85ns
Addr/reg decrement input:
750um control signals from control unit to load/dec --ignore
delay thru decrementer -- 2.55ns
delay thru 2to1 mux -- .675ns
400um to other 2to1 mux -- ignore
delay thru 2to1 mux -- .675ns
delay to set register -- .5ns
Total: 4.4ns
Addr/reg output:
825um from register output to next addr mux -- ignore
delay thru next addr mux -- .9ns
Total: 0.9ns
Control unit input:
575um from pads into condition mux -- ignore
delay thru condition mux (8to1) -- 1.575 ns
650um from condition mux to control unit -- ignore
delay thru control unit -- 1.775 ns
Total: 3.35ns
Conclusion: The address register’s decrementer input results in the longest delay of 4.4ns. This must occur during half of the clock cycle, which makes our minimum clock period 2 x 4.4ns = 8.8ns and our maximum clock frequency:
1/8.8ns = 113.6 MHz
In order to be safe, we’ll say our maximum clock rate is 100 MHz.
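The path sums can be checked with a few lines of Python. This is a sketch only: the dictionary keys and variable names are ours, the stage delays are the simulated values listed under Component Delay, and all wire delays except the PC output wire are ignored as in the analysis above.

```python
# Component delays in ns, as listed under "Component Delay".
DFF, MUX2, MUX4, INC_DEC, COND_MUX, CONTROL = 0.5, 0.675, 0.9, 2.55, 1.575, 1.775
PC_WIRE = 0.025373   # the one wire delay that is not ignored

paths = {
    "stack output": [MUX4],
    "stack input": [0.9, DFF],   # .900 ns is the stack control figure used in this path
    "program counter input": [INC_DEC, DFF],
    "program counter output": [PC_WIRE, MUX4],
    "addr/reg load input": [MUX2, MUX2, DFF],
    "addr/reg decrement input": [INC_DEC, MUX2, MUX2, DFF],
    "addr/reg output": [MUX4],
    "control unit input": [COND_MUX, CONTROL],
}

totals = {name: sum(stages) for name, stages in paths.items()}
critical = max(totals, key=totals.get)
period_ns = 2 * totals[critical]     # the path must fit in half a clock cycle
f_max_mhz = 1e3 / period_ns

print(f"{critical}: {totals[critical]:.2f} ns, f_max = {f_max_mhz:.1f} MHz")
# prints: addr/reg decrement input: 4.40 ns, f_max = 113.6 MHz
```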
Transistor Count
Control Unit: 336
8to1 Mux: 36
4to1 Mux, 12-bit: 192
Addr Reg/Dec, 12-bit: 496
Stack: 1358
2 x 12-bit registers: 504
2 x bus enable: 48
5 x clock inverters: 24
Incrementor: 108
Total: 3102
Power Dissipation
Estimate chip as 3102 transistors/2 = 1551 inverters
Gate Capacitance:
Gate cap on an inverter: 1008.1 aF
Total gate capacitance Cg = 1008.1 * 1551 = 1.56pF
Diffusion Capacitance:
Cd = Cja x a x b + Cjp x (2a + 2b)
a = 1.5µm, b = 1.2µm
P-trans: 5x10^-4 pF x 1.8 + 4x10^-4 pF x 5.4 = 0.00306pF
N-trans: 3x10^-4 pF x 1.8 + 4x10^-4 pF x 5.4 = 0.0027pF
Total Cd = (0.00306 + 0.0027)pF x 1551 inverters = 8.93pF
Interconnect Capacitance:
Total length of major interconnects in Metal 1: 21,505µm
Total length of major interconnects in Metal 2: 53,660µm
Metal 1: 21505µm x 0.9µm x 31aF/µm^2 + 21505µm x 76aF/µm = 2.2pF
Metal 2: 53660µm x 0.9µm x 17aF/µm^2 + 53660µm x 59aF/µm = 3.98pF
Total Interconnect Capacitance = 6.18pF
Total Load Capacitance: CL = 1.56pF + 8.93pF + 6.18pF = 16.67pF
Power = CL x VDD^2 x f = 16.67pF x (5V)^2 x 100MHz = 41.675mW
Metal Migration: I = 41.675mW / 5V = 8.3mA
Width = 8.3mA / 0.5mA/µm = 16.67µm total, or 8.33µm per power line. We made them 12µm wide to be safe.
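The same estimate can be scripted as a check. This is a sketch with our own variable names; because it carries unrounded intermediate values, it lands slightly above the hand calculation (about 16.72 pF and 41.8 mW rather than 16.67 pF and 41.675 mW), which does not change the conclusion about the power line width.

```python
AF_TO_PF = 1e-6

inverters = 3102 // 2                         # chip approximated as inverters
c_gate = 1008.1 * AF_TO_PF * inverters        # pF
c_diff = (0.00306 + 0.0027) * inverters       # pF, P and N diffusion per inverter


def wire_cap_pf(length_um, area_af, fringe_af, width_um=0.9):
    """Area plus fringe capacitance to substrate for a minimum-width wire."""
    return (length_um * width_um * area_af + length_um * fringe_af) * AF_TO_PF


c_wire = wire_cap_pf(21505, 31, 76) + wire_cap_pf(53660, 17, 59)   # metal 1 + metal 2
c_load = c_gate + c_diff + c_wire

vdd, f_hz = 5.0, 100e6
power_mw = c_load * 1e-12 * vdd**2 * f_hz * 1e3   # P = CL * VDD^2 * f
current_ma = power_mw / vdd
width_um = current_ma / 0.5                       # 0.5 mA/um metal-migration limit

print(f"CL = {c_load:.2f} pF, P = {power_mw:.1f} mW, "
      f"I = {current_ma:.2f} mA, width per line = {width_um / 2:.2f} um")
```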
Address Register/Decrementer
Usage
The address register/decrementer is used primarily for execution of loops. First the register is loaded with an initial count from the direct input. On each clock period the value stored is decremented and compared to zero. When the value reaches zero a signal is sent to the condition mux so execution of the loop is completed. The address register/decrementer can also be used to store an address to jump back to during conditional branches.
Components
12-bit register w/ clear
12-bit decrementer
2x12 2 to 1 muxes
zero detector
The address register/decrementer is composed of the four components listed above. The register is loaded from a mux which selects from either its previous value or the other mux. The other mux selects between the output of the decrementer and the direct input. The decrementer takes its input from the current value held in the registers and subtracts one from this value. All the components are 12 bits wide. Of course the zero detector signals when the current value of the register is all zeros. The register also has a 12-bit output to the next address mux, which is used for the branching operations.
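A behavioral sketch of this datapath, for illustration only. The class name `AddrRegDec` and the control signal names `hold` and `load` are hypothetical stand-ins for the two mux selects; the real control encoding comes from the control unit and is not modeled here.

```python
MASK = 0xFFF  # 12-bit datapath


class AddrRegDec:
    """Register fed by two 2-to-1 muxes, a decrementer, and a zero detector."""

    def __init__(self):
        self.value = 0            # state after clear

    @property
    def zero(self):
        return self.value == 0    # zero detector output

    def clock(self, hold, load, direct=0):
        # Inner mux: direct input or decrementer output.
        source = (direct & MASK) if load else ((self.value - 1) & MASK)
        # Outer mux: previous value or the inner mux.
        self.value = self.value if hold else source
        return self.value


reg = AddrRegDec()
reg.clock(hold=False, load=True, direct=3)   # load a loop count of 3
while not reg.zero:
    reg.clock(hold=False, load=False)        # decrement once per iteration
```

After the loop the zero detector is asserted, which is the signal the condition mux would see to end the loop.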
Bit-sliced 8-bit ALU
Usage
Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition/subtraction and all logic functions, including AND, OR, XOR, and their complements. Also, the second input can be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.
Components
The 8-bit ALU is a ripple carry configuration of 8 1-bit ALUs.
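The ripple carry organization and the cascading of chips can be sketched as follows. This is an illustrative model, not the chip's actual control encoding: the operation names, the `invert_b` flag, and the choice to pass the carry through unchanged on logic operations are all assumptions, and the output shifter and complemented outputs are left out.

```python
def alu_slice(a, b, carry_in, invert_b, op):
    """One bit position: returns (result bit, carry out)."""
    b ^= invert_b                      # optional inversion of the second input
    if op == "add":
        total = a + b + carry_in
        return total & 1, total >> 1
    logic = {"and": a & b, "or": a | b, "xor": a ^ b}
    return logic[op], carry_in         # assumption: logic ops pass carry through


def alu8(a, b, op, invert_b=0, carry_in=0):
    """Eight 1-bit slices with the carry rippling from bit 0 to bit 7."""
    result, carry = 0, carry_in
    for i in range(8):
        bit, carry = alu_slice((a >> i) & 1, (b >> i) & 1, carry, invert_b, op)
        result |= bit << i
    return result, carry


def alu16_add(a, b):
    """Two cascaded 8-bit chips: carry out of the low chip feeds the high chip."""
    low, carry = alu8(a & 0xFF, b & 0xFF, "add")
    high, carry = alu8(a >> 8, b >> 8, "add", carry_in=carry)
    return (high << 8) | low, carry


print(alu8(100, 58, "add"))                          # (158, 0)
print(alu8(100, 58, "add", invert_b=1, carry_in=1))  # A - B as A + NOT B + 1: (42, 1)
print(alu8(0b1100, 0b1010, "and", invert_b=1))       # A AND (NOT B): (4, 0)
```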
Bus Enable
Usage
Used to connect to the I/O bus for selecting between the direct address input and the next address output. It is controlled by the level of the clock, so that when the clock is high we read the direct input and when the clock is low the next address output lines are set.
Components
Three-state buffer
The bus enable is a series of twelve three-state buffers.
Clock Inverter
Usage
The clock inverter is used whenever we need a signal and its complement as control signals. It is not used just for the clock; we use it all over the chip. It generates two signals that are complements of each other with no overlap of the signals.
Components
We used a transmission gate, designed to have the same delay as a strong inverter, placed in parallel with that inverter and driven by the same input. The two outputs are 180 degrees out of phase.
Condition Multiplexer
Usage
The 8 to 1 condition mux selects between various external signals which are used to determine the way an instruction is executed, for example which path of a branch is taken. By using the mux, the decision can be made based on:
- The sign of the ALU output
- Whether the ALU output equals zero
- Whether the ALU output overflowed
- The shift out bit from the ALU
- The carry out bit from the ALU
- The interrupt signal
- Always true (1)
- Always false (0)
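As a sketch, the 8 to 1 selection can be modeled as two 4-to-1 muxes followed by a 2-to-1 mux, matching the mux stages used for the condition mux in the timing analysis. The select encoding and the input ordering below are assumptions made for illustration.

```python
def mux2(sel, d0, d1):
    return d1 if sel else d0


def mux4(sel, inputs):
    return inputs[sel & 3]


def condition_mux(sel, inputs):
    """8-to-1 mux: low two select bits pick within each group, top bit picks the group."""
    low = mux4(sel, inputs[0:4])
    high = mux4(sel, inputs[4:8])
    return mux2((sel >> 2) & 1, low, high)


# Hypothetical ordering: sign, zero, overflow, shift out, carry out, interrupt, 1, 0
signals = [0, 1, 0, 0, 1, 0, 1, 0]
print(condition_mux(1, signals))   # selects the "ALU output equals zero" input: 1
```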
Control Unit
Usage
Decodes the instruction given as input to the chip, along with the address zero and condition signals, to generate control signals for all the components on chip. The address zero signal comes from the address register/counter and is used for loop control. The condition signal comes from the condition mux, which selects from external signals coming from other parts of the CPU.
Components
The control unit is made up of random logic. We used the instruction set from Mick & Brick and used K-maps to design the logic schematic.
12-bit Decrementer
Usage
The 12-bit decrementer is used in the address register/counter to decrement the current value by one.
Components
dec first bit
dec last bit
dec two bit
The decrementer is a ripple carry decrementer composed of a one-bit decrementer, which is basically an inverter, followed by 5 dec two bit components. The two-bit component uses alternating logic for speed. The decrementer is completed with a one-bit decrementer at its tail, which is a single XOR.
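A bit-level sketch of the ripple scheme, assuming a plain borrow chain. It shows why the first bit reduces to an inverter (the borrow into bit 0 is always 1) and why the last bit needs only an XOR (no borrow out is required). The alternating-polarity two-bit cells are a circuit-level speed optimization and are not modeled; the function name `dec12` is ours.

```python
def dec12(value):
    """Subtract one from a 12-bit value by rippling a borrow through the bits."""
    borrow = 1                           # subtracting one = borrow into bit 0
    result = 0
    for i in range(12):
        bit = (value >> i) & 1
        result |= (bit ^ borrow) << i    # bit 0: bit XOR 1, i.e. an inverter
        borrow &= bit ^ 1                # borrow continues only through zero bits
    return result


print(dec12(0x100))   # 255
print(dec12(0))       # 4095, wraps around in 12 bits
```

An incrementer follows the same pattern with a carry that continues only through one bits.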
12-bit Incrementer
Usage
The 12-bit incrementer is used in the microprogram counter to increment the current address by one.
Components
inc first bit
inc last bit
inc two bit
The incrementer is a ripple carry incrementer composed of a one-bit incrementer, which is basically an inverter, followed by 5 inc two bit components. The two-bit component uses alternating logic for speed. The incrementer is completed with a one-bit incrementer at its tail, which is a single XOR.
Next Address Multiplexor
Usage
The next address multiplexor is a 12-bit, 4 to 1 multiplexor which selects the source of the next address output. It selects from:
- Direct Input
- Stack
- Address Register/Counter
- Program Counter
12-bit Register w/ clear
Usage
The 12-bit register is used as a component of the stack and the address register, and also to hold the input/output signals.
Components
12 DFF w/ clear
The 12-bit register is composed of 12 master/slave DFFs that have a clear line. The clock and clear lines are shared among the twelve bits, but the bits are otherwise independent.
Stack 4 x 12
Usage
This LIFO stack holds four twelve-bit words. It is used to hold return addresses while making a subroutine call or conditional jump.
Components
stack 4x1
stack control
reg2 w/ clear
decoder
The stack is composed of twelve 4-word, 1-bit units, and a component to generate the correct address to load or read when given a push or pop instruction. The two registers always hold the next write address and read address, which are two bits each. They are set by a stack control unit whenever a push or pop instruction has been issued. The decoder then activates the correct word position in each 4x1 unit. The 4x1 unit has a mux to activate the correct word position for output based on the read address.
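A behavioral sketch of the two-pointer scheme described above. The class and method names are ours, and the reset values and the wrap-around behavior on overflow are assumptions, since neither is specified here.

```python
class Stack4x12:
    """Four 12-bit words addressed by 2-bit write and read address registers."""

    def __init__(self):
        self.words = [0] * 4
        self.write_addr = 0     # assumed reset value: next free word
        self.read_addr = 3      # assumed reset value: one below write_addr

    def push(self, value):
        self.words[self.write_addr] = value & 0xFFF   # decoder enables one word
        self.read_addr = self.write_addr
        self.write_addr = (self.write_addr + 1) & 3   # wraps after 4 pushes

    def top(self):
        return self.words[self.read_addr]             # output mux on read address

    def pop(self):
        value = self.top()
        self.write_addr = self.read_addr
        self.read_addr = (self.read_addr - 1) & 3
        return value


stack = Stack4x12()
for return_addr in (0x100, 0x200, 0x300):   # three nested subroutine calls
    stack.push(return_addr)
print([hex(stack.pop()) for _ in range(3)])  # ['0x300', '0x200', '0x100']
```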