Improved Brent & Kung Adder
A. H. MAAMAR G. Russell
Computer Engineering Department E. E. and Computer Engineering Department
HIE University of Newcastle upon Tyne
P.O.Box 38645, BeniWalid NE1 7RU , Newcastle
Libya UK
Abstract:This paper describes a modification to the Brent & Kung (M-B&K) adder which not only increases the speed of operation but also reduces the amount of the hardware for its implementation. This modified adder is being incorporated into a 32-bit RISC processor which has a concurrent error detection capability.
Keywords: Binary adder, Carry Lokahead Adder,CSA, M-B&K adder, vlsi
1
1 Introduction
The design of a n-bit binary parallel adder for a VLSI circuit is trade-off between speed of operation and hardware complexity (chip area). When speed is not the concern, a binary Carry Ripple Adder (CRA) is the best choice, because it is the least expensive, its fan-in and fan-out are very modest and remain constant as n increases, which makes it very regular; hence ideal for VLSI implementation. When speed is much more important than chip area, the Carry Look Ahead (CLA) adder may be the right choice. In theory the CLA is very fast as it can generate the sum of any two n-bit numbers in only 4 gate delay levels, but in practice this is not the case; as the number of bits increases the number of gates increases, the fan-in (n+1) increases ( the Ci carry signal is the OR of i+1 product terms) and fan-out also increases, this limits the size of the largest practical CLA. Four-bits is the optimum size of a CLA block; a n-bit CLA adder is constructed from 4-bit cascaded CLAs with carry ripple between the stages; however, the speed of the adder will suffer from the carry ripple between the stages. Another way to speed up the addition time is the use of Carry Select Adders (CSA), where both CRA and CLA may be used, but the drawback of CSA is the hardware complexity ( about double that of the CRA or the CLA). In CSA adders, with the exception of the first block, two adders are used for addition in each block, one adder with Cin=0 and the other adder with Cin=1, the two results for all the blocks become available about the same time, the Cout of the first block will select the right result of the second and so on.
Brent & Kung [1] presented an adder which increases the speed of the addition time compared with cascaded CLAs, increased regularity for VLSI , and reduced chip area compared to the CSA.
This paper demonstrates how to modify Brent&Kung adder to improve the addition delay time, and also the gate count used to build the adder. This modification will decrease the gate delays levels from 18 to 12 to generate the sum of two 32bit numbers.
2 Construction of M-B&K Adder
Let pi = xi yi, gi= xi . yi, are the carry propagate and the carry generate signals respectively, for bit position i generated at gate level 1, and are the block carry propagate and block carry generate respectively, for bit position i generated at gate level t. The carry signals are generated as shown below :
Step1: Generate all the carry propagate and carry generate signals for all bit positions, this step takes 1 gate delay.
Step2: Generate , and where i =8,12,16,20,24, and 28. This step takes 2 gate delays.
Step 3 : Generate carry signals and , block carry propagate and block carry generate for bit positions i =20, 28, this step takes 2 gate delays.
Step 4 : Generate and , the step takes 2 gate delays.
Step 5: Generate and , the step takes 2 gate delays.
Step 6: Generate all remaining carries not produced in the previous steps, from Figure(1) it can be seen that the generation of the carry signals at this step is straight forward, the step takes 2 gate delays.
Step7: The sum of the two numbers can now be obtained, since all of the carry signals are generated, the sum at bit position ‘i’ is given by . The step takes 1 gate delay, and the total gate delay count is 12 gate levels.
Five types of cells are used to implement the carry generator algorithm to build the 32-bit adder, namely:
Cell-1: This cell is the same as that used to produce C4 in any 4-bit CLA, it receives 4 carry propagate signals , 4 carry generate signals, and the input carry Cin. This cell is used only once to generate C4.
Cell-2: Generates one block carry propagate signal (P), one block carry generate signal (G), from 4-carry propagate signals and 4-carry generate signals, this cell is used 6 times.
Cell-3: Generates one block carry propagate, and one block carry generate signal, from 2-carry propagate, and 2-carry generate input signals.
Cell-4: This cell is the same as that used to generate C2 in any 4-bit CLA, it receives two block propagate signals, two block generate signals, and 1-Carry signal from previous generated carries. The cell used to produce C12 only.
Cell-5: This cell is the same as that used to produce C1 in a 4-bit CLA, it produces an output carry from one propagate, one generate, and one output carry. This cell is the most widely used cell in the adder, it is used to produce all carries except C4, and C12.
All the cells used in the design have a delay of 2 gate levels, assuming that the 2-input gate has the same gate delay as 5-input OR gate, ( simulation result shows that the difference is very small so it can be neglected). The connections of the cells used to build the 32-bit carry generation circuit as shown in Figure(1), where it can be seen that the maximum fan-out is 5, which is the same as the maximum fan-out in a 4bit-CLA.
3 Results
Table(1) shows the comparison of the worst case delays for adders of 8, 16, 24, and 32bits, using three different designs ( CLA, B&K, and the M-B&K design presented in this paper). From the table it is clear that the proposed design is the fastest, about 50% faster then the CLA, and about 18% faster then the B&K adder if 32bits are processed.
Table(2) shows the gate count for 8 different sizes of adders using the three methods. The maximum gate size is 5-input OR gate, and 5-input AND gate. The table shows the proposed adder not only reduces the addition delays, but also reduces the gate count compared with the CLA and B&K adders. The overall comparisons of the addition time of the adders is shown in Figure(2).
4 Conclusion
From the results above the M-B&K adder has a distinct advantage not only in terms of speed of performance, but also regarding gate count when compared to the CLA and B&K adder.
1
1
Adder / 8-bits / 16-bits / 24-bits / 32-bitsCLA / 11.547 / 21.311 / 31.075 / 40.592
B&K / 12.374 / 17.950 / 20.776 / 23.525
M-B&K / 9.224 / 12.811 / 15.676 / 18.387
Table(1) Carry Generation Delays
Adder / 8-bits / 16-bits / 24-bits / 32-bitsCLA / 52 / 104 / 156 / 208
B&K / 52 / 110 / 171 / 232
M-B&K / 55 / 104 / 170 / 221
Table(2) Gate Count
p32,g32 3 5 Cout
p31,g31 3 5 C31
p30,g30 3 5 C30
p29,g29 5 C29
p28,g28 2 3 5 C28
p27,g27 3 5 C27
p26,g26 3 5 C26
p25,g25 5 C25
p24,g24 2 5 C24
p23,g23 3 5 C23
p22,g22 5 C22
p21,g21 5 C21
p20,g20 2 3 5 C20
p19,g19 3 5 C19
p18,g18 5 C18
p17,g17 5 C17
p16,g16 2 5 C16
p15,g15 5 C15
p14,g14 5 C14
p13,g13 5 C13
p12,g12 2 4 C12
p11,g11 5 C11
p10,g10 5 C10
p9,g9 5 C9
p8,g8 2 5 C8
p7,g7 5 C7
p6,g6 5 C6
p5,g5 5 C5
p4,g4 1 C4
p3,g3 5 C3
p2,g2 5 C2
p1,g1 5 C1
Cin Cin
pi,gi Ci
Fig(1) Connections of the cells of the 32-bit M-B&K Adder
1
References:
[1] R.P. Brent and H. T. Kung .” A Regular Layout of Parallel Adders”, IEEE Trans. Computer, vol.C-31, March 1982, pp260-264.
[2] E.E.Swartzlander, Application Specific Processors,. Kluwer Academic Publishers, 1997
[3] N.Weste and K.Eshraghian, Principles of CMOS VLSI Design: A systems Perspective,. MA: Addision-Wesley,1988.
1
Fig (2) Comparison of addition Time
1