Improved Brent & Kung Adder

A. H. MAAMAR G. Russell

Computer Engineering Department E. E. and Computer Engineering Department

HIE University of Newcastle upon Tyne

P.O.Box 38645, BeniWalid NE1 7RU , Newcastle

Libya UK

Abstract:This paper describes a modification to the Brent & Kung (M-B&K) adder which not only increases the speed of operation but also reduces the amount of the hardware for its implementation. This modified adder is being incorporated into a 32-bit RISC processor which has a concurrent error detection capability.

Keywords: Binary adder, Carry Lokahead Adder,CSA, M-B&K adder, vlsi

1

1 Introduction

The design of a n-bit binary parallel adder for a VLSI circuit is trade-off between speed of operation and hardware complexity (chip area). When speed is not the concern, a binary Carry Ripple Adder (CRA) is the best choice, because it is the least expensive, its fan-in and fan-out are very modest and remain constant as n increases, which makes it very regular; hence ideal for VLSI implementation. When speed is much more important than chip area, the Carry Look Ahead (CLA) adder may be the right choice. In theory the CLA is very fast as it can generate the sum of any two n-bit numbers in only 4 gate delay levels, but in practice this is not the case; as the number of bits increases the number of gates increases, the fan-in (n+1) increases ( the Ci carry signal is the OR of i+1 product terms) and fan-out also increases, this limits the size of the largest practical CLA. Four-bits is the optimum size of a CLA block; a n-bit CLA adder is constructed from 4-bit cascaded CLAs with carry ripple between the stages; however, the speed of the adder will suffer from the carry ripple between the stages. Another way to speed up the addition time is the use of Carry Select Adders (CSA), where both CRA and CLA may be used, but the drawback of CSA is the hardware complexity ( about double that of the CRA or the CLA). In CSA adders, with the exception of the first block, two adders are used for addition in each block, one adder with Cin=0 and the other adder with Cin=1, the two results for all the blocks become available about the same time, the Cout of the first block will select the right result of the second and so on.

Brent & Kung [1] presented an adder which increases the speed of the addition time compared with cascaded CLAs, increased regularity for VLSI , and reduced chip area compared to the CSA.

This paper demonstrates how to modify Brent&Kung adder to improve the addition delay time, and also the gate count used to build the adder. This modification will decrease the gate delays levels from 18 to 12 to generate the sum of two 32bit numbers.

2 Construction of M-B&K Adder

Let pi = xi yi, gi= xi . yi, are the carry propagate and the carry generate signals respectively, for bit position i generated at gate level 1, and are the block carry propagate and block carry generate respectively, for bit position i generated at gate level t. The carry signals are generated as shown below :

Step1: Generate all the carry propagate and carry generate signals for all bit positions, this step takes 1 gate delay.

Step2: Generate , and where i =8,12,16,20,24, and 28. This step takes 2 gate delays.

Step 3 : Generate carry signals and , block carry propagate and block carry generate for bit positions i =20, 28, this step takes 2 gate delays.

Step 4 : Generate and , the step takes 2 gate delays.

Step 5: Generate and , the step takes 2 gate delays.

Step 6: Generate all remaining carries not produced in the previous steps, from Figure(1) it can be seen that the generation of the carry signals at this step is straight forward, the step takes 2 gate delays.

Step7: The sum of the two numbers can now be obtained, since all of the carry signals are generated, the sum at bit position ‘i’ is given by . The step takes 1 gate delay, and the total gate delay count is 12 gate levels.

Five types of cells are used to implement the carry generator algorithm to build the 32-bit adder, namely:

Cell-1: This cell is the same as that used to produce C4 in any 4-bit CLA, it receives 4 carry propagate signals , 4 carry generate signals, and the input carry Cin. This cell is used only once to generate C4.

Cell-2: Generates one block carry propagate signal (P), one block carry generate signal (G), from 4-carry propagate signals and 4-carry generate signals, this cell is used 6 times.

Cell-3: Generates one block carry propagate, and one block carry generate signal, from 2-carry propagate, and 2-carry generate input signals.

Cell-4: This cell is the same as that used to generate C2 in any 4-bit CLA, it receives two block propagate signals, two block generate signals, and 1-Carry signal from previous generated carries. The cell used to produce C12 only.

Cell-5: This cell is the same as that used to produce C1 in a 4-bit CLA, it produces an output carry from one propagate, one generate, and one output carry. This cell is the most widely used cell in the adder, it is used to produce all carries except C4, and C12.

All the cells used in the design have a delay of 2 gate levels, assuming that the 2-input gate has the same gate delay as 5-input OR gate, ( simulation result shows that the difference is very small so it can be neglected). The connections of the cells used to build the 32-bit carry generation circuit as shown in Figure(1), where it can be seen that the maximum fan-out is 5, which is the same as the maximum fan-out in a 4bit-CLA.

3 Results

Table(1) shows the comparison of the worst case delays for adders of 8, 16, 24, and 32bits, using three different designs ( CLA, B&K, and the M-B&K design presented in this paper). From the table it is clear that the proposed design is the fastest, about 50% faster then the CLA, and about 18% faster then the B&K adder if 32bits are processed.

Table(2) shows the gate count for 8 different sizes of adders using the three methods. The maximum gate size is 5-input OR gate, and 5-input AND gate. The table shows the proposed adder not only reduces the addition delays, but also reduces the gate count compared with the CLA and B&K adders. The overall comparisons of the addition time of the adders is shown in Figure(2).

4 Conclusion

From the results above the M-B&K adder has a distinct advantage not only in terms of speed of performance, but also regarding gate count when compared to the CLA and B&K adder.

1

1

Adder / 8-bits / 16-bits / 24-bits / 32-bits
CLA / 11.547 / 21.311 / 31.075 / 40.592
B&K / 12.374 / 17.950 / 20.776 / 23.525
M-B&K / 9.224 / 12.811 / 15.676 / 18.387

Table(1) Carry Generation Delays

Adder / 8-bits / 16-bits / 24-bits / 32-bits
CLA / 52 / 104 / 156 / 208
B&K / 52 / 110 / 171 / 232
M-B&K / 55 / 104 / 170 / 221

Table(2) Gate Count

p32,g32 3 5 Cout

p31,g31 3 5 C31

p30,g30 3 5 C30

p29,g29 5 C29

p28,g28 2 3 5 C28

p27,g27 3 5 C27

p26,g26 3 5 C26

p25,g25 5 C25

p24,g24 2 5 C24

p23,g23 3 5 C23

p22,g22 5 C22

p21,g21 5 C21

p20,g20 2 3 5 C20

p19,g19 3 5 C19

p18,g18 5 C18

p17,g17 5 C17

p16,g16 2 5 C16

p15,g15 5 C15

p14,g14 5 C14

p13,g13 5 C13

p12,g12 2 4 C12

p11,g11 5 C11

p10,g10 5 C10

p9,g9 5 C9

p8,g8 2 5 C8

p7,g7 5 C7

p6,g6 5 C6

p5,g5 5 C5

p4,g4 1 C4

p3,g3 5 C3

p2,g2 5 C2

p1,g1 5 C1

Cin Cin

pi,gi Ci

Fig(1) Connections of the cells of the 32-bit M-B&K Adder

1

References:

[1] R.P. Brent and H. T. Kung .” A Regular Layout of Parallel Adders”, IEEE Trans. Computer, vol.C-31, March 1982, pp260-264.

[2] E.E.Swartzlander, Application Specific Processors,. Kluwer Academic Publishers, 1997

[3] N.Weste and K.Eshraghian, Principles of CMOS VLSI Design: A systems Perspective,. MA: Addision-Wesley,1988.

1

Fig (2) Comparison of addition Time

1