1. INTRODUCTION
This project describes the design and implementation, from the ground up, of a simple 4-bit up/down counter using CMOSIS5 technology.
1.1 Detail Description
An up/down counter was chosen in order to understand the issues usually associated with clock distribution and power dissipation in static CMOS circuits. An up/down counter has a good balance between combinatorial and sequential logic building blocks. A 4-bit counter is a fairly small circuit by VLSI standards. Because the circuit is a counter, different levels of optimization can be achieved both at the logic/algorithmic level and at the architecture level. Different schemes for generating the next count will be simulated. Only the most efficient circuit design will be chosen for layout, however, as time does not permit trying all the available schemes.
Possible areas of investigation:
· Performance
§ Power / Frequency relationship
§ Area / Power relationship
§ Circuit Implementation / Power / Speed relationship
· Clock distribution
· Relative block placement
· Transistor sizing / Speed relationship
· Voltage variation / Power / Speed relationship
While there are many parameters and data to be gathered from such an experiment, we will focus on the main aspects of good design and implementation methodology in general. The project will hence aim at a system with minimum power dissipation, maximum operating frequency and a good clocking strategy.
1.2 System Requirement
The circuit should behave as a generic 4-bit up/down counter. Such a counter usually has three inputs: an asynchronous clear, a clock and an up/down control signal. The result is obtained on a 4-bit output which comes from a register. On each rising clock edge, the circuit will count up if and only if the up/down signal is at logic high and will count down otherwise. The count will roll over from “1111” to “0000” when incrementing and from “0000” to “1111” when decrementing.
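The required behavior can be captured in a short behavioral model. The following is a minimal sketch, assuming the count is held as an integer from 0 to 15; the function name `next_count` and its parameter names are illustrative and do not come from the design itself. The asynchronous clear is modeled simply as an override of the clocked behavior.

```python
def next_count(count, up, clear=False):
    """Return the 4-bit count after one rising clock edge.

    count: current value, 0..15
    up:    True counts up, False counts down
    clear: asynchronous clear; modeled here as forcing the count to 0
    """
    if clear:
        return 0
    step = 1 if up else -1
    # Masking with 0xF gives the roll-over in both directions.
    return (count + step) & 0xF


# Roll-over in both directions.
rolled_up = next_count(0b1111, up=True)
rolled_down = next_count(0b0000, up=False)
print(format(rolled_up, "04b"), format(rolled_down, "04b"))
```

A model of this kind can also serve as the reference against which test-bench results are compared.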
The circuit should be implemented in 0.6um technology. Transistors should be sized to give symmetric rise and fall times.
Test-benches are to be designed and used to cover all probable situations.
The block diagram, state transition table and state diagram for the design are represented below.
Fig. 1 Graphical system description
1.3 Design Analysis
The state transition table gives us the next-state equations, from which we can write the resultant logic functions driving the register. All minterms from the four columns under “next state” are entered into Karnaugh maps (K-maps) so as to eliminate any redundant terms.
Fig. 2 System Karnaugh-Map
Finally, the compacted next state equations are obtained. The resulting equations describe next states for the system.
The A, B, C and D terms are now replaced by their Q0, Q1, Q2, and Q3 terms that represent the current state. After some more minimizations that involve grouping the UD terms and collapsing terms into XOR functions, we get the final solution for the next state.
The above table contains the main equations that will be implemented as CMOS combinatorial logic. The equations also contain two recursive terms. If we ever wanted to increase the width of the counter, we can see clearly which terms need to be added or changed.
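The sketch below shows how a set of minimized next-state equations can be checked exhaustively against the counting behavior. The equations used here are an assumption: they are one XOR-based formulation with two recursive chains (one for counting up, one for counting down) and are not necessarily the equations derived in this report. The names `next_state`, `up1`..`up3` and `dn1`..`dn3` are illustrative.

```python
def next_state(ud, q):
    """Next state of a 4-bit up/down counter from gate-level equations.

    ud: 1 to count up, 0 to count down
    q:  current state as an integer (bit 0 is Q0)
    """
    q0, q1, q2, q3 = [(q >> i) & 1 for i in range(4)]
    nud = 1 - ud
    # Recursive "all lower bits are 1" chain, used when counting up.
    up1 = ud & q0
    up2 = up1 & q1
    up3 = up2 & q2
    # Recursive "all lower bits are 0" chain, used when counting down.
    dn1 = nud & (1 - q0)
    dn2 = dn1 & (1 - q1)
    dn3 = dn2 & (1 - q2)
    # Each bit toggles when its active chain term is high.
    d0 = 1 - q0
    d1 = q1 ^ up1 ^ dn1
    d2 = q2 ^ up2 ^ dn2
    d3 = q3 ^ up3 ^ dn3
    return d0 | (d1 << 1) | (d2 << 2) | (d3 << 3)


def equations_match_counter():
    """Compare the equations with (q + 1) or (q - 1) modulo 16 for all 32 cases."""
    return all(
        next_state(ud, q) == (q + (1 if ud else -1)) % 16
        for ud in (0, 1)
        for q in range(16)
    )


print(equations_match_counter())
```

Extending the counter by one bit in this formulation adds one term to each chain and one more XOR, which illustrates the kind of regularity the recursive terms provide.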
The gates with the largest fan-in are the XOR and the OR, with a maximum of three inputs. The maximum fan-out for the design is three loads.
GATE / QUANTITY / COMPLEXITY (TRANSISTORS) / TOTAL (TRANSISTORS)
3-INPUT XOR / 3 / 24 / 72
3-INPUT OR / 2 / 24 / 48
2-INPUT AND / 6 / 6 / 36
D FLIP-FLOP / 4 / 32 / 128
INVERTER / 1 / 2 / 2
TOTAL / 286
As shown above, the complexity of the circuit, excluding input/output buffers, is 286 transistors, placing the design in the Small Scale Integration (SSI) category.
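The transistor budget can be re-derived from the per-gate figures in the table. This is a small sketch for checking the arithmetic; the dictionary layout and variable names are illustrative only.

```python
# (quantity, transistors per gate), taken from the table above
gates = {
    "3-input XOR": (3, 24),
    "3-input OR": (2, 24),
    "2-input AND": (6, 6),
    "D flip-flop": (4, 32),
    "Inverter": (1, 2),
}

# Transistors contributed by each gate type
per_gate_total = {name: qty * size for name, (qty, size) in gates.items()}
total_transistors = sum(per_gate_total.values())

for name, count in per_gate_total.items():
    print(f"{name}: {count}")
print("Total:", total_transistors)
```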
2. DIGITAL LOGIC SYNTHESIS
The circuit below maps the logic functions obtained earlier onto discrete CMOS gates. It consists of five types of elements: NAND, OR, XOR and NOT gates, and a 4-bit register. There are three levels of combinatorial delay from the output of the register back to its input. This, along with the interconnect delay, sets the maximum frequency of operation.
Fig. 3 Proposed up-down counter circuit
Details about the type of power supply used are hidden. For our process technology, a 3.3V power supply is used.
2.1 NAND Gate Transformation
Static CMOS logic should be implemented using NAND gates only, as this minimizes transistor count, power dissipation and delay. Before implementing the design, we check whether any parts of the circuit can be implemented using NAND gates only. A quick look at the circuit reveals that the structures performing the sum-of-products (SOP) operation can be implemented efficiently using NAND gates.
The following transformation applies:
The resulting logic is implemented using two- or three-input NAND gates with a maximum delay of two logic levels. The proposed circuit has characteristics similar to those of a conventional SOP implemented using OR and AND gates.
Fig. 4 NAND gate reduction
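The equivalence between a two-level AND-OR structure and a NAND-NAND structure follows from De Morgan's theorem, for example ab + cd = NAND(NAND(a, b), NAND(c, d)). The sketch below checks this identity exhaustively for a four-variable example; the particular product terms are an assumption chosen for illustration and are not the terms of the counter itself.

```python
from itertools import product


def nand(*inputs):
    """NAND of any number of 0/1 inputs."""
    return 0 if all(inputs) else 1


def sop_and_or(a, b, c, d):
    """Conventional sum of products: two AND gates feeding an OR gate."""
    return (a & b) | (c & d)


def sop_nand_nand(a, b, c, d):
    """Same function using NAND gates only, still two logic levels."""
    return nand(nand(a, b), nand(c, d))


# Compare both forms over all 16 input combinations.
equivalent = all(
    sop_and_or(*bits) == sop_nand_nand(*bits)
    for bits in product((0, 1), repeat=4)
)
print(equivalent)
```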
2.2 Basic Elements
The system consists of five main logic elements. Each one is built using CMOS technology and as such, implements the logic function either in a pull-up or pull-down structure with the exception of the XOR which propagates the right logic level.
· Inverter Gate (NOT)
The logic function for a NOT gate is $f = \overline{x}$, where x is the input. The function will be implemented using CMOS technology.
Fig. 5 CMOS Inverter
· 2-input NAND (NAND2)
The logic function for a NAND2 is $f = \overline{a \cdot b}$, where a and b are the inputs. The logic function will be implemented using CMOS technology.
Fig. 6 NAND2 gate
· 3-input NAND (NAND3)
The logic function for a NAND3 is $f = \overline{a \cdot b \cdot c}$, where a, b and c are the inputs. The logic function will be implemented using CMOS technology.
Fig. 7 NAND3 gate
2.3 Complex Elements
2.3.1 3-input Exclusive-Or (EXOR3)
The logic function for an XOR3 is $f = a \oplus b \oplus c$, where a, b and c are the inputs. The function will be implemented using CMOS transmission gates.
Fig. 8 Exclusive-Or 3-inputs PTL gate
2.3.2 Data Flip-Flop (DFF)
Implementing a register requires a different approach from generic combinatorial logic. Registers can hold a logic state indefinitely and capture their inputs only on the rising or falling edge of a clock, as required by the design (rising edge for our purpose).
Fig. 9 Data Flip-Flop gate level implementation
The scheme used for the register is based on three SR latches. The first latch memorizes the information at its input whenever the clock input is low. On the next transition to a high level, the content of the first latch is transferred to the upper latch. The outputs of both latches drive the set and reset inputs of a third latch. This flip-flop has the advantage over the conventional transmission-gate flip-flop of being built from basic NAND gates, hence saving area. As a result, the design is more modular and compact. The availability of both true and complementary outputs facilitates system integration and reduces the number of logic gates by one.
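The edge-triggered behavior of a three-latch flip-flop can be illustrated with a small gate-level simulation. The sketch below assumes the classic six-NAND positive-edge-triggered structure and omits the asynchronous clear; the node names `n1` to `n4` and the class name are illustrative and are not taken from Fig. 9. Gates are re-evaluated in a fixed order until no node changes.

```python
def nand(*inputs):
    """NAND of any number of 0/1 inputs."""
    return 0 if all(inputs) else 1


class SixNandDFF:
    """Positive-edge-triggered D flip-flop made of three NAND SR latches."""

    def __init__(self):
        # n1..n4 belong to the two input latches; q/qb form the output latch.
        self.n1, self.n2, self.n3, self.n4 = 0, 1, 1, 1
        self.q, self.qb = 0, 1

    def _nodes(self):
        return (self.n1, self.n2, self.n3, self.n4, self.q, self.qb)

    def apply(self, d, clk):
        """Apply the inputs and re-evaluate the gates until they settle."""
        for _ in range(10):
            before = self._nodes()
            self.n4 = nand(self.n3, d)
            self.n1 = nand(self.n4, self.n2)
            self.n2 = nand(self.n1, clk)           # active-low set
            self.n3 = nand(self.n2, clk, self.n4)  # active-low reset
            self.q = nand(self.n2, self.qb)
            self.qb = nand(self.n3, self.q)
            if self._nodes() == before:
                break
        return self.q


ff = SixNandDFF()
ff.apply(d=1, clk=0)           # D is set up while the clock is low
captured = ff.apply(d=1, clk=1)  # rising edge captures D
held = ff.apply(d=0, clk=1)      # D changes while the clock is high
print(captured, held)
```

In this model the output changes only on the low-to-high clock transition, while changes on D during either clock phase leave Q untouched.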
2.3.3 4-bit Register (REG4)
A register consists of a set of D flip-flops that share a common clock and reset. Together, they store a word of data. The behavior of the register hence depends directly on that of its primitive element. Because the clock is shared among all flip-flops, it will need to be buffered at some stage.
Fig. 10 4-bit register block
3. BASIC INVERTER MODEL
The static and dynamic characteristics of the gates used in the design are based on a two-transistor inverter model. It is designed to have approximately equal rise and fall times at the output. The load driven by the inverter is assumed to be a capacitance of 25N fF, where N is the fan-out. The PMOS transistor is therefore first made four times wider than the NMOS transistor in order to compensate for its smaller K’ parameter. Both transistors are then scaled by a common multiplying factor in order to supply the extra current needed to drive large loads.
3.1 Inverter Static Characteristics:
VDD / 3.3V
VSS / 0V
VIL (max) / 0.5V
VIH (min) / 2.4V
VOL (max) / 0.3V
VOH (min) / 3.0V
NML / 0.2V
NMH / 0.6V
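The two noise margins follow from the other entries using the usual definitions NML = VIL(max) - VOL(max) and NMH = VOH(min) - VIH(min). The sketch below reproduces them; the function name and the rounding to three decimals are illustrative choices.

```python
def noise_margins(vil, vih, vol, voh):
    """Return (NML, NMH) in volts from the input/output logic levels."""
    nml = round(vil - vol, 3)  # low-level noise margin
    nmh = round(voh - vih, 3)  # high-level noise margin
    return nml, nmh


nml, nmh = noise_margins(vil=0.5, vih=2.4, vol=0.3, voh=3.0)
print(f"NML = {nml} V, NMH = {nmh} V")
```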
We begin with a minimum-sized inverter. For CMOSIS5 technology, the minimum transistor length is 0.6um and the minimum width is 0.9um. We then size the PMOS and NMOS transistors so that they have the same transconductance value.
, based on CMOSIS5 V2.1 model
The ratio of transistors for CMOSIS5 technology is:
$\frac{W_p}{W_n} = \frac{K'_n}{K'_p} \approx 4$, rounded to the nearest whole number
The NMOS transistor is drawn with a length of 0.6um and a width of 2.4um. Correspondingly, the PMOS transistor has a drawn length of 0.6um and a width of 9.6um.
3.2 Maximum Transistor Current
The maximum saturation PMOS current is obtained for VIL and VOH:
Similarly, the maximum saturation NMOS current is obtained for VIH and VOL:
3.3 Inverter Gate Capacitance
In order to get an approximate propagation delay for the inverter, we first evaluate the Cgp and Cgn. We proceed to get the total area of both transistors and multiply that value with Cox to get the total gate capacitance. The value gives an indication of how much capacitance is placed on a node by the input of an inverter.
Area of NMOS transistor = 0.6 x 2.4 = 1.44 um^2
Area of PMOS transistor = 0.6 x 2.4 x 4 = 5.76 um^2
Assuming Cox to be 3.59fF/um^2,
Gate capacitance of NMOS = 1.44 x 3.59fF = 5.17fF
Gate capacitance of PMOS = 5.76 x 3.59fF = 20.68fF
Total gate capacitance = 25.85fF (taken as 25fF)
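The gate capacitance estimate can be recomputed directly from the drawn dimensions. The sketch below assumes Cox is expressed per unit area (fF per square micrometer) and uses the drawn sizes quoted in this section; the constant and function names are illustrative.

```python
LENGTH_UM = 0.6         # drawn channel length
WN_UM = 2.4             # NMOS drawn width
PN_WIDTH_RATIO = 4      # PMOS width relative to NMOS width
COX_FF_PER_UM2 = 3.59   # assumed gate oxide capacitance per unit area


def gate_capacitance_ff(width_um, length_um=LENGTH_UM, cox=COX_FF_PER_UM2):
    """Gate capacitance in fF, taken as gate area times Cox."""
    return width_um * length_um * cox


cgn = gate_capacitance_ff(WN_UM)
cgp = gate_capacitance_ff(WN_UM * PN_WIDTH_RATIO)
c_in_total = cgn + cgp

print(f"Cgn = {cgn:.2f} fF, Cgp = {cgp:.2f} fF, total = {c_in_total:.2f} fF")
```

Keeping the dimensions as parameters makes it straightforward to repeat the estimate for the wider transistors used in the NAND gates.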
3.4 Inverter Dynamic Characteristic:
The performance of the system depends on the switching frequency allowed by each component and hence on the propagation delay of the simple inverter model. The delay also depends on the capacitive load being driven. For our calculations, we will first assume an output loaded with a fan-out of one (25fF).
The gate delay is obtained by:
$t_{pd} = \frac{t_{phl} + t_{plh}}{2}$, where tphl and tplh are the high-to-low and low-to-high propagation delays at the output of the inverter.
The low-to-high propagation delay (tplh) of a signal at the output of an inverter driving a single load (fan-out of 1) is obtained from the equation:
Similarly, the high-to-low propagation delay (tphl) at the output of an inverter driving a single load is obtained:
,
Similarly, the rise time and fall time for the inverter are obtained from the following equations:
, where K 3.5
,
3.5 Comparative Delay
Parameter / Calculated (ps) / Simulated (ps)
tplh / 15.4 / 15.7
tphl / 13.2 / 13.8
tpd / 14.3 / 14.7
tr / 34 / 32
tf / 33 / 31
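The comparison can be summarized numerically. The sketch below assumes that tpd is the mean of tplh and tphl, which the tabulated values are consistent with, and reports the deviation of each calculated value from simulation. The helper names are illustrative.

```python
# Values in picoseconds, taken from the comparison table
calculated = {"tplh": 15.4, "tphl": 13.2, "tr": 34.0, "tf": 33.0}
simulated = {"tplh": 15.7, "tphl": 13.8, "tr": 32.0, "tf": 31.0}


def average_delay(delays):
    """Gate delay as the mean of the two propagation delays."""
    return (delays["tplh"] + delays["tphl"]) / 2


def percent_deviation(calc, sim):
    """Deviation of the calculated value from simulation, in percent."""
    return 100.0 * (calc - sim) / sim


tpd_calculated = average_delay(calculated)
tpd_simulated = average_delay(simulated)
deviations = {
    name: percent_deviation(calculated[name], simulated[name])
    for name in calculated
}

print(f"tpd calculated = {tpd_calculated:.2f} ps, simulated = {tpd_simulated:.2f} ps")
for name, dev in deviations.items():
    print(f"{name}: {dev:+.1f} %")
```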
3.6 Power Consumption
An important aspect of ASIC design is the reduction of power consumption to a minimum. The energy used in a circuit can either do useful work or be wasted as heat and as leakage current through the substrate. The latter can be reduced by employing some good design practices:
· The algorithm is reduced so as to use the minimum number of Boolean functions
· Transistors are drawn to the minimum sizes (some special cases exist)
· Interconnect is reduced to a minimum
· Number of metal layers used is limited to two or less
Knowledge of the system power consumption is critical to designing the system properly. For the interconnect, we need to know beforehand approximately how much supply current will be distributed to the logic gates. This helps in sizing the power rails properly.
For a single inverter sized to the smallest length and width, we can calculate, for our process, the amount of energy used up in the switching process.
The power is usually distributed among three components:
, where
,
,
,
Estimated total power per inverter with a 25fF load = 84uA x 3.3V = 0.28mW at 125MHz.
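The per-inverter estimate can be reproduced as supply current times supply voltage. For comparison, the sketch below also evaluates the textbook switching component C x VDD^2 x f for the same load and frequency. That expression is an assumption brought in for illustration and covers only one of the three components; it is not necessarily the formula used to obtain the figure above.

```python
VDD = 3.3           # supply voltage, V
I_SUPPLY = 84e-6    # average supply current quoted above, A
C_LOAD = 25e-15     # unit fan-out load, F
F_CLK = 125e6       # switching frequency, Hz


def total_power_w(current_a, vdd_v):
    """Total power drawn from the supply, P = I x V."""
    return current_a * vdd_v


def switching_power_w(c_load_f, vdd_v, freq_hz):
    """Textbook dynamic (switching) power, P = C x VDD^2 x f."""
    return c_load_f * vdd_v ** 2 * freq_hz


p_total = total_power_w(I_SUPPLY, VDD)
p_switching = switching_power_w(C_LOAD, VDD, F_CLK)

print(f"Total: {p_total * 1e3:.3f} mW")
print(f"Switching component only: {p_switching * 1e3:.3f} mW")
```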
Basic gate (50MHz) / Equivalent inverter number / Approximate power (mW)
NAND2 / 2 / 0.5
NAND3 / 3 / 0.75
DFF / 16 / 4
EXOR3 / 18 / 4.5
SYSTEM / 286 / 30.5
3.7 NAND2 Gate Design
Static analysis:
DC performance of the NAND2 gate is based on the basic inverter model with similar operating voltages.
Dynamic analysis:
For the dynamic analysis, equal tr and tf are needed, and therefore the transistors are sized according to the inverter model. Transistors in series are treated differently: for a 2-input gate, each series NMOS transistor is made twice as wide as in the basic inverter model. The transistors are otherwise implemented using the minimum NMOS width and length, with the proper multiplication factor for the PMOS.
The output loading capacitance used to model the gate delay is representative of a unit fan-out (25fF). This includes parasitic capacitances of the driver transistors as well as the gate capacitances of the next stage transistors.
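The series-sizing rule can be written as a small helper. This is a sketch under the common equal-drive assumption: the N series NMOS transistors of an N-input NAND are each made N times wider than the inverter NMOS, while the parallel PMOS transistors keep the inverter width. The function name and the default widths (taken from the inverter model) are illustrative.

```python
def nand_widths_um(n_inputs, wn_inverter=2.4, wp_inverter=9.6):
    """Return drawn widths (um) for an n-input static CMOS NAND gate.

    Series NMOS devices are widened by the stack height so the pull-down
    path matches the inverter; parallel PMOS devices are left unchanged.
    """
    return {
        "wn": n_inputs * wn_inverter,
        "wp": wp_inverter,
    }


nand2 = nand_widths_um(2)
nand3 = nand_widths_um(3)
print(nand2, nand3)
```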