### American Journal of Circuits, Systems and Signal Processing

Vol. 1, No. 3, 2015, pp. 69-85

http://www.aiscience.org/journal/ajcssp



# High Level Modelling and Relative Comparison of Different Full Adder Structures

### Negin Mahani\*

Computer Engineering Faculty, Shahid Bahonar University, Zarand High Education Centre, Kerman, Iran

#### **Abstract**

Comparing different full adders is very significant for VLSI design as they are essential components in almost all digital circuits. The science has yield the benchmarking work laborious as often distinct implementation techniques and technologies have been used in the design. Additionally, the design characteristics which are selected for performance analysis are not consistent. This paper shows the results of comparing four adder structures by implementing them all with the same technology and the same level of abstraction.

### **Keywords**

Carry-Select Adder, Carry-Look-Ahead Adder, Carry-Skip Adder, Brent-Kung Adder

Received: May 19, 2015 / Accepted: June 21, 2015 / Published online: July 13, 2015

@ 2015 The Authors. Published by American Institute of Science. This Open Access article is under the CC BY-NC license. http://creativecommons.org/licenses/by-nc/4.0/

### 1. Introduction

A one-bit full adder is a very important leaf cell in the design of Application Specific Integrated Circuits. This paper gives guidelines for high level modelling in Verilog. We have modelled four different full adders in high level with Verilog and then we have made some comparison on them based on their area, delay and etc. We also have simulated and synthesized the adders. So in this paper we have introduced some basic concepts of description levels of HDLs like Verilog and also we have talked about simulation and synthesis.

Here we have modelled different full adders like Carry-Look-Ahead adder, Carry-Skip adder, Carry-Select adder and Brent-Kung adder using high level descriptions. This document consists of some parts as they are described below: The structures of the full adders are described in section 2, while the implementation, modelling and simulation methodology is explained in section 3. Next in section 4, the results are summarized. Finally, conclusions are drawn in section 5.

Here are the structures of different full adders such as: Carry-Look-Ahead adder, Carry-Select adder, Brent-Kung adder and also Carry-Skip adder.

A. CARRY-LOOKAHEAD ADDER (CLA)

CLA can generate all the carries in parallel. As following by applying the equations in Fig.1 part (a) recursively, all the  $C_{i+l}s$  can be generated based on  $G_i$ ,  $P_i$  and  $C_0$ .

$$\begin{split} &C_1 = G_0 + C_0 P_0 \\ &C_2 = G_1 + C_1 P_1 \\ &= G_1 + G_0 P_1 + C_0 P_0 \ P_1 \\ &C_3 = G_2 + C_2 P_2 \\ &= G_2 + G_1 P_2 + G_0 P_1 P_2 + C_0 P_0 \ P_1 P_2 ..... \\ &C_{i+1} = G_i + G_{i-1} P_i + G_{i-2} P_{i-1} P_i + ... + G_0 P_1 P_2 ... P_i + C_0 P_0 \\ &P_1 P_2 ... P_i ..... \\ &C_n = G_{n-1} + G_{n-2} P_{n-1} + ... + G_0 P_1 P_2 ... P_{n-1} + C_0 P_0 \ P_1 P_2 ... P_{n-1}. \end{split}$$

<sup>2.</sup> Descriptions of Four Different Full Adders

<sup>\*</sup> Corresponding author E-mail address: Negin.Mahani@uk.ac.ir

The circuit to generate  $C_{i+1}$ s is called Carry-Look-Ahead unit shown in part (b) of Fig.1.The final sum S can be computed once  $C_i$ s, i = 1, 2...N, are known. Part(c) in Fig.1 shows the structure of summation. One can see that in Equation above, the number of terms in the OR function is as big as n + 1, and in the last term the number of valuables in the AND function is also n + 1. Fan-in and fan-out parameters could be a problem in Carry-Look-Ahead adder, as the number of bits (n) increases. On the other hand, sequential generation of each Ci+l is too slow. Instead, block Carry-Look-Ahead adder (BCLA) can be adopted in which groups of carries are generated in parallel [1][6].

#### B. CARRY-SELECT ADDER

A Carry-Select adder is divided into parts that each of which accomplish two additions in concurrent, one assuming a carry-in of zero, the other a carry-in of one, except for the least significant. The 16-bit carry-select adder of Fig.2 is divided into sectors of lengths 1, 2, 3, 4, and 6. The 4-bit sector of Fig.2 (b) illustrates the common principle. Within the sector, there are two 4-bit ripple- carry adders take the same data inputs but different carry-ins. The upper adder has a carry-in of zero; the lower adder a carry-in of one. The real carry-in from the prior sector chooses one of the two adders. Comparing to a Ripple-Carry adder instead of having to ripple through four full adders, the carry now only has to pass through an individual multiplexer [7].



### (a) Carry Generate/Propagate Unit



(b) Carry-Lookahead Unit



# (c) Summation Unit

Fig. 1. Carry-Look-Ahead adder [10].



(a). 16-bit Carry-Select adder



Fig. 2. Carry-Select adder.

### C. BRENT-KUNG ADDER

The following picture shows a Brent-Kung full adder (Fig. 3.)[11]. The Brent-Kung adder has a gate level depth of  $O(log_2(n))$ . Adding two n-bit numbers is performed as adding

in parallel the lower halves of the addends, the lower halves with input carry 0, and the upper halves with input carry 1, then selecting the appropriate upper half sum based on the output carry from the lower half sum. Thus the gate level

depth of the n-bit adder is equal to the depth of the half-width adder plus the depth of a mux. The following is the formula that is the basis of this adder:

$$(g, p) = (g_m, p_m) \cdot (g_1, p_1) = (g_m + p_m \cdot g_1, p_m \cdot p_1)$$
 [2]

### D. CARRY-SKIP ADDER

To speed-up operation, propagation is skipped to position i without waiting for rippling. Operation time varies according to operands as in carry-complete addition. To implement Carry-Skip adder, stages are divided into blocks (Fig. 4.)[3].

Carry-Skip logic is added to each block to detect when carry-in the block can be passed directly to the next block. Define carry transfer:  $t_i = a_i + b_i$  carry skipping can be detected

for a block size of m as follows (carry propagates through all stages):

$$T_i \bullet T_{j+1} \dots T_{j+m-1} = 1 \ (= (a_i + b_i) \bullet (a_{j+1} + b_{j+1}) \dots)$$

This method takes into account both propagated and generated carries. Block size in carry-skip adder is very important (Fig.5.) Worst case operation time takes place when:

- Carry is generated in the first block
- Carry skips intermediate stages
- Carry is killed in the last block [4][8].



Fig. 3. Brent-Kung adder [3].



Fig. 4. Carry skipping.



Fig. 5. Carry-Skip logic.

| Α        | 0001 | .0000 | (9999 |       |       | (0000 | (0001 |
|----------|------|-------|-------|-------|-------|-------|-------|
| В        | 0001 | 6666  |       | (6667 | (6666 | (0000 | (0001 |
| C_IN     | 0    |       |       |       |       |       |       |
| SUM      | 0002 | 6666  | Xffff | (0000 | Xffff | (0000 | (0002 |
| C_OUT    | St0  |       |       |       |       |       |       |
| GEN_OUT  | St0  |       |       |       |       |       |       |
| PROP_OUT | StO  |       |       |       |       |       |       |
|          |      |       |       |       |       |       |       |

Fig. 6. Simulation result of Carry-Look-Ahead adder.



Fig. 7. Simulation result of Carry-Select adder.



Fig. 8. Simulation result of Brent-Kung adder.

| Ά     | 0001 | 0000 | (9999 |       |       | (0000 | (0001 |
|-------|------|------|-------|-------|-------|-------|-------|
| В     | 0001 | 6666 |       | (6667 | (6666 | (0000 | (0001 |
| 'C_IN | 0    |      |       |       |       |       |       |
| SUM   | 0002 | 6666 | Xffff | (0000 | Xffff | (0000 | (0002 |
| C_OUT | StO  |      |       |       |       |       |       |

Fig. 9. Simulation result of Carry-Skip adder.

## 3. Methodology

We have implemented our modules in Verilog. The modules are named as belonged to a special kind of adder such as: Carry-Look-Ahead adder, Carry-Skip adder, Carry-Select

adder and Brent-Kung adder using high level descriptions. Then by scripting we have simulated them and after all we have used Synplify and Leonardo to synthesize Brent-Kung adder. We have made optimization based on area/delay or both of them. At last we have made comparisons on them. In this

work we have used scripting for simulation with Modelsim (appendix1). The results of simulation and synthesis have come in the result section. Interesting observations could be

gotten of them. Appendix lalso has some parts of report files of the synthesis.



Fig. 10. Brent-Kung adder synthesis result by Synplify RTL view.

# 4. Results of Simulations and Synthesis Processes

The results of the simulations and synthesis are shown in Fig.6 to Fig.13. The result of Modelsim simulation of Carry-Look-Ahead adder is depicted in Fig. 6. Resulted wave forms of Modelsim simulation of Carry-Select, Brent-Kung and Carry-Skip adders are illustrated in Fig.7, Fig.8 and Fig.9 respectively.

Brent-Kung adder synthesis result by Synplify RTL view is in Fig.10. Brent-Kung adder synthesis result by Synplify technology view is in Fig.11. Brent-Kung adder synthesis result by Leonardo is in Fig.12. Brent-Kung adder synthesis result \_critical path\_ is in Fig.13.

# 5. Summary and Conclusion

In this paper we have showed how to implement codes in high level and also we have got familiar with modularity coding. We have used Modelsim for simulation and verification and also we have shown that we can use scripting for simulations. Most of the adders discussed in this paper are applicable to general purpose designs, with a few exceptions. The first exception is the Carry-Skip adder, which is the slowest adder for all bit sizes. It has also larger area requirements and higher active capacitance than the Ripple adder or the Transmission Gate adder for all bit sizes. It may always be replaced with either one of these adder structures. The second adder structure which can be always replaced is the Carry-Look-Ahead adder. But based on previous works for 8 to 32 bit circuits the Conditional-Sum adder has better results

than the Carry-Look-Ahead adder. For the 4-bit adder this adder structure can be replaced with the Transmission-Gate adder or with the Ripple adder, because both of these structures have better results [5].

Brent-Kung adder requires  $2(\log N - 1)$  stages. In Brent-Kung adder using binary tree for carry propagation leads to logarithmic delay. Its area is twice as large as ripple adder and layout of the cells is very compact. Once carry signals are ready, sum bits derived in constant time. It is good for wide adders. Carry-Look-Ahead adder calculates carries in advance. Limited fan-in for NAND gate is less than or equal to 5. It is impractical for Ci with i >4 and need 2-level CLA with block size k. CLA compared to Ripple-Carry adder is Faster, but delay is still linear and has larger area. The limitation is that it cannot go beyond 4 bits of look-ahead and large p, g fan-out

slows down carry generation. Carry-Select adder calculates two cases simultaneously. Sum computed in one step after the intermediate carry signals are ready. Area overhead is about an additional carry path and a multiplexer (not the whole adder) and about 30% more than a Ripple-Carry. Its delay is sub-linear [9]. According to the simulation and synthesis results, the adder topology which has the best compromise between area, delay and power dissipation is carry look-ahead adder and it is suitable for high performance and low-power circuits. So the fastest adder is Carry-Select with the penalty of area. Carry-Skip adder improves on the delay of a Ripple-Carry adder with little effort compared to other adders. Carry-Select adder is one of the fastest adders to perform arithmetic operations. From the structure of CSL adder there is a scope for reducing the area and delay.



Fig. 11. Brent-Kung adder synthesis result by Synplify technology view.



Fig. 12. Brent-Kung adder synthesis result by Leonardo.



Fig. 13. Brent-Kung adder synthesis result \_critical path.

| Ap | pe | nd | ix | 1 |
|----|----|----|----|---|
|----|----|----|----|---|

Sunplify synthesis reports of Brent-Kung adder

Brent-Kung plg file of synplify

@P: Worst Slack: -6.187

@P: System - Estimated Frequency: 61.8 MHz

@P: System - Requested Frequency: 100.0 MHz

@P: System - Estimated Period: 16.187

@P: System - Requested Period: 10.000

@P: System - Slack : -6.187

@P: Total Area: 103.0

Rev8(brent kung.htm)

Worst Path Information

Path information for path number 1:

Requested Period: 10.000 10.000

= Required time:

- Propagation time: 16.187

= Slack (critical): -6.187

Number of logic level(s): 12

B[15:0] / B[0] Starting point:

Ending point: Sum[15:0] / Sum[11]

The start point is clocked by System [rising]

The end point is clocked by System [rising]

Total path delay (propagation time + setup) of 16.187 is

2.867(17.7%) logic and 13.320(82.3%) route.

Brent kung.areasrr

Report for cell Brent\_Kung\_ADDER.verilog

Cell usage:

cell count count\*area area **IB33** 33 0.0 0.0

OB33PH 17 0.0 0.0

**TOTAL** 132 103.0

Leonardo synthesis reports of Brent-Kung adder

Auto optimization area

Cell: Brent Kung16 View: INTERFACE Library: work

\*\*\*\*\*\*\*\*\*\*\*\*\*\*

Cell Library References Total Area

AN21 CUB 15 x 9 gates

AND2 CUB 16 x 10 gates

EN1 CUB 15 x 12 gates

EO1 **CUB** 1 x 1 1 gates

IN2 CUB 32 x 0 10 gates

OA21 CUB 16 x 12 gates

ON21 CUB 1 x 1 1 gates

Number of ports: 50

Number of nets: 129

96 Number of instances:

Number of references to this view:

Total accumulated area:

54 Number of gates:

Number of accumulated instances:

Delay

Critical Path Report

Critical path #1, (path slack = 1.8):

| NAME          | GATE | ARRIVAL      | LOAD |
|---------------|------|--------------|------|
|               |      |              |      |
| A(0)/         | 0.00 | 0.00 up      | 0.11 |
| ipg16_ix1/Q   | AND2 | 0.29 0.29 up | 0.11 |
| ipg16_ix359/Q | IN2  | 0.13 0.41 dn | 0.05 |
| ipg16_ix7/Q   | OA21 | 0.40 0.81 dn | 0.14 |

| ix431/Q                | AN21 0.29 1      | .10 up 0.14 | slack 1.84                           |
|------------------------|------------------|-------------|--------------------------------------|
| ix3/Q                  | IN2 0.17 1.2°    | 7 dn 0.06   |                                      |
| ix452/Q                | AN21 0.30 1      | .57 up 0.14 | Delayoptimization                    |
| ix7/Q                  | IN2 0.17 1.74    | 1 dn 0.06   | delay                                |
| ix426/Q                | AN21 0.30 2      | .04 up 0.14 | data required time 10.00             |
| ix11/Q                 | IN2 0.17 2.2     | 1 dn 0.06   | data arrival time 7.99               |
| ix457/Q                | AN21 0.32 2      | .53 up 0.16 |                                      |
| ix15/Q                 | IN2 0.17 2.7     | 0 dn 0.06   | slack 2.01                           |
| ix421/Q                | AN21 0.30 3      | .01 up 0.14 | area                                 |
| ix19/Q                 | IN2 0.17 3.1     | 7 dn 0.06   | Cell Library References Total Area   |
| ix466/Q                | AN21 0.32 3      | .49 up 0.16 |                                      |
| ix23/Q                 | IN2 0.17 3.6     | 7 dn 0.06   | AN21 CUB 31 x 1 19 gates             |
| ix416/Q                | AN21 0.30 3      | .97 up 0.14 | AND2 CUB 16 x 1 10 gates             |
| ix27/Q                 | IN2 0.17 4.1     | 4 dn 0.06   | EN1 CUB 15 x 1 12 gates              |
| ix475/Q                | AN21 0.32 4      | .46 up 0.16 | EO1 CUB 1 x 1 1 gates                |
| ix31/Q                 | IN2 0.17 4.6     | 3 dn 0.06   | IN2 CUB 48 x 0 15 gates              |
| ix411/Q                | AN21 0.32 4      | .95 up 0.16 | ON21 CUB 1 x 1 1 gates               |
| ix35/Q                 | IN2 0.17 5.1     | 3 dn 0.06   |                                      |
| ix484/Q                | AN21 0.30 5      | .43 up 0.14 | Number of ports: 50                  |
| ix39/Q                 | IN2 0.17 5.6     | 0 dn 0.06   | Number of nets: 145                  |
| ix406/Q                | AN21 0.30 5      | .90 up 0.14 | Number of instances: 112             |
| ix43/Q                 | IN2 0.17 6.0     | 7 dn 0.06   | Number of references to this view: 0 |
| ix493/Q                | AN21 0.32 6      | .39 up 0.16 |                                      |
| ix47/Q                 | IN2 0.17 6.5     | 6 dn 0.06   | Total accumulated area:              |
| ix401/Q                | AN21 0.32 6      | .88 up 0.16 | Number of gates: 57                  |
| ix51/Q                 | IN2 0.17 7.0     | 5 dn 0.06   | Number of accumulated instances: 112 |
| ix502/Q                | AN21 0.30 7      | .36 up 0.14 | area optimization                    |
| ix55/Q                 | IN2 0.17 7.5     | 3 dn 0.06   | area                                 |
| ix396/Q                | AN21 0.30 7      | .83 up 0.14 | Cell Library References Total Area   |
| ix95/Q                 | EN1 0.33 8.      | 16 up 0.02  |                                      |
| S(15)/                 | 0.00 8.16 u      | p 0.00      | AN21 CUB 15 x 1 9 gates              |
| data arrival time      | 8.16             |             | AND2 CUB 16 x 1 10 gates             |
|                        |                  |             | EN1 CUB 15 x 1 12 gates              |
| data required time (de | fault specified) | 10.00       | EO1 CUB 1 x 1 1 gates                |
|                        |                  |             | IN2 CUB 31 x 0 10 gates              |
| data required time     | 10.              | 00          | OA21 CUB 16 x 1 12 gates             |
| data arrival time      | 8.16             |             | ON21 CUB 1 x 1 1 gates               |

-----

```
Number of ports:
                               50
                                                                 }
Number of nets:
                               128
Number of instances:
                                 95
                                                                set PrefMain(font) {
Number of references to this view:
                                                                  Courier 10 roman normal
Total accumulated area:
Number of gates:
                                54
                                                                # compilation
Number of accumulated instances:
                                                                vlib work
                                       95
Delay
                                                                 foreach {library file list} $library file list {
                                                                  foreach file $file list {
data required time
                                       10.00
                                                                   if [regexp {.vhdl?$} $file] {
data arrival time
                                                                    vcom -93 $file
                                      8.16
                                                                   } else {
                                                                    vlog $file
slack
                                  1.84
                                                                   }
                                                                  }
One sample script of these four adders
                                                                 }
Brent-Kung adder
.tcl file
                                                                # simulation
puts {
                                                                eval vsim $top_level
 Compile and Simulate script for divider
 Provided by Mahdi N. Bojnordi
                                                                # If waves are required
                                                                if [llength $wave_patterns] {
}
                                                                  noview wave
                                                                  foreach pattern $wave patterns {
cd ../sim/pre syn
                                                                   add wave $pattern
set library_file_list {
 design_library {../../model/Verilog/Brent Kung16.v}
                                                                  configure wave -signalnamewidth 1
 test_library {../../test/ Brent_Kung16_test.v }
                                                                  foreach {radix signals} $wave_radices {
                                                                   foreach signal $signals {
                                                                    catch {property wave -radix $radix $signal}
set top level work.brent test
                                                                   }
                                                                  }
set wave patterns {
 /*
                                                                # run the simulation
set wave_radices {
                                                                run -all
 hexadecimal {A,B,Cin,Sum,Cout}
```

```
puts {
 job finished.
                                                                 // Make instances to calculate carries for each 4-bit part
                                                                 carry c( part_gen, part_prop, c_in, part_carry );
                                                                 assign c_out = part_carry[3];
Verilog source codes for adders
**********
                                                                 // make 4-bit adders to do additions
Carry-Look-Ahead adder with test bench
                                                                 assign sum[3:0] = a[3:0] + b[3:0] + c_in;
                                                                 assign sum[7:4] = a[7:4] + b[7:4] + part carry[0];
module cla 16(a, b, c in, sum, c out, gen out, prop out);
                                                                 assign sum[11:8] = a[11:8] + b[11:8] + part carry[1];
input [15:0] a, b;
                    // numbers to add
                                                                 assign sum[15:12] = a[15:12] + b[15:12] + part_carry[2];
input c in;
                                                                 endmodule
                  // carry in
output [15:0] sum;
                      // sum
                    // carry of 16-bit addition
                                                                 module carry(gens in, props in, c in, carries);
output c_out;
                                                                 input [3:0] gens_in; // generate for each of 4 parts
output gen_out;
                     // generate of 16-bit addition
output prop out;
                     // propagate of 16-bit addition
                                                                 input [3:0] props in; // propagate for each of 4 parts
                                                                 input c in;
                                                                                   // carry in
wire [3:0] part carry; // calculated carry out of
                                                                  output [3:0] carries; // carry out for each of 4 parts
              //each 4-bit part
wire [3:0] part_gen; // generate of each 4-bit part
                                                                 function [3:0] get carries;
wire [3:0] part_prop; // propagate of each 4-bit part
                                                                 input [3:0] gens_in, props_in;
       // Make instances to calculate generates and
                                                                 input c in;
propagates.
                                                                 reg [3:0] carries;
       // First 4 instances calculate generate and propagate
                                                                 integer i;
       // for each 4-bit part. Last instance calculates
       // generate and propagate for all 16 bits.
                                                                 begin
                                                                   for (i = 0; i \le 3; i = i + 1)
gen_prop gp0( (a[3:0] & b[3:0]), (a[3:0] | b[3:0]),part_gen[0],
                                                                     if (i == 0)
part prop[0]);
                                                                       carries[i] = gens in[i] | props in[i] & c in;
                                                                     else
gen prop gp1( (a[7:4] & b[7:4]), (a[7:4] | b[7:4]),part gen[1],
                                                                       carries[i] = gens_in[i] | props_in[i] &
part prop[1]);
                                                                                             carries[i-1];
                                                                   get carries = carries;
                                                  (a[11:8]
gen_prop
            gp2(
                    (a[11:8]
                                &
                                     b[11:8]),
b[11:8]),part_gen[2], part_prop[2]);
                                                                 end
                                                                 endfunction
gen_prop gp3( (a[15:12] & b[15:12]), (a[15:12] |
b[15:12]),part_gen[3], part_prop[3]);
                                                                 assign carries = get_carries(gens_in, props_in, c_in);
                                                                 endmodule
gen prop gp(part gen, part prop, gen out, prop out);
```

```
module gen prop(gens in, props in, gen out, prop out);
                                                           trireg PROP OUT;
                                                                                // propagate of 16-bit addition
input [3:0] gens in; // generate for each of 4 parts
                                                           cla 16 adder1 (A, B, C IN, SUM, C OUT, GEN OUT,
                                                           PROP OUT);
input [3:0] props in; // propagate for each of 4 parts
output gen_out, prop_out;
                                                           initial
                                                           begin
function [1:0] get gen and prop;
                                                              $monitor("%0d SUM = %b A = %b B = %b C IN = %b
input [3:0] gens_in, props_in;
                                                           C OUT = \%b, GEN OUT = \%b, PROP OUT = \%b",
reg prop, gen;
                                                                         $time, SUM, A, B, C IN, C OUT, GEN OUT,
integer i;
                                                           PROP_OUT);
begin
                                                             for (i = 0; i \le 3; i = i + 1) begin
                                                             B = 16'b0110011001100110;
   if (i == 0) begin
                                                             C IN = 0;
     gen = gens_in[i];
                                                             #10 A = 16'b1001100110011001;
     prop = props in[i];
                                                             #10 B = 16'b0110011001100111;
   end else begin
                                                             #10 C_IN = 0; B = 16'b0110011001100110;
     gen = gens in[i] | props in[i] & gen;
                                                             #10 A = 16'b0000000000000000; B =
                                                           16'b00000000000000000;
     prop = props in[i] & prop;
                                                             #10 C IN = 0; A = 16'b0000000000000001; B =
   end
                                                           16'b00000000000000001;
 end
                                                             #10 $finish;
                                                           end
get_gen_and_prop = {gen, prop};
                                                           endmodule
end
endfunction
                                                           Brent-Kung adder with test bench
assign {gen_out, prop_out} = get_gen_and_prop(gens_in,
                                                           module black (pg, pg0, pgo);
props in);
endmodule
                                                              input [1:0] pg, pg0;
                                                             output [1:0] pgo;
/////simulation
module cla test;
                                                             assign pgo[1] = pg[1] & pg0[1];
                                                             assign pgo[0] = (pg0[0] \& pg[1]) | pg[0];
                  // numbers to add
reg [15:0] A, B;
reg C IN;
                 // carry in
                                                           endmodule
trireg [15:0] SUM; // sum
                                                           trireg C OUT;
                   // carry of 16-bit addition
                                                           module gray (pg, pg0, pgo);
trireg GEN OUT;
                     // generate of 16-bit addition
```

```
assign pg1 = \{(A[1] \land B[1]), (A[1] \& B[1])\};
  input [1:0] pg;
                                                                   assign pg0 = \{(A[0] \land B[0]), (A[0] \& B[0])\};
  input pg0;
                                                                 endmodule
  output pgo;
                                                                 assign pgo = (pg0 \& pg[1]) | pg[0];
                                                                 module Brent Kung16 (A, B, Cin, S, Cout);
                                                                    input [15:0] A, B;
endmodule
                                                                   input Cin;
output [15:0] S;
module xor16 (A, B, S);
                                                                   output Cout;
  input [15:0] A, B;
                                                                   // First generate the propigate and generate signals for each
                                                                 bit
  output [15:0] S;
                                                                    wire [1:0] r1c16, r1c15, r1c14, r1c13, r1c12, r1c11, r1c10,
                                                                 r1c9;
  assign S = A \wedge B;
                                                                   wire [1:0] r1c8, r1c7, r1c6, r1c5, r1c4, r1c3, r1c2, r1c1;
endmodule
                                                                   pg16
ipg16(.A(A), .B(B), .pg15(r1c16),.pg14(r1c15),.pg13(r1c14),
module pg16 (A, B, pg15, pg14, pg13, pg12, pg11, pg10, pg9,
                                                                      .pg12(r1c13),.pg11(r1c12),.pg10(r1c11),.pg9(r1c10),.pg
pg8, pg7, pg6, pg5, pg4, pg3, pg2, pg1, pg0);
                                                                 8(r1c9),
  input [15:0] A, B;
                                                                      .pg7(r1c8),.pg6(r1c7),.pg5(r1c6),.pg4(r1c5),.pg3(r1c4),
  output [1:0] pg15, pg14, pg13, pg12, pg11, pg10, pg9, pg8,
                                                                      .pg2(r1c3),.pg1(r1c2),.pg0(r1c1));
pg7, pg6, pg5, pg4, pg3, pg2, pg1, pg0;
                                                                   // First row
  assign pg15 = \{(A[15] \land B[15]), (A[15] \& B[15])\};
                                                                    wire [1:0] r2c15, r2c13, r2c11, r2c9, r2c7, r2c5, r2c3;
  assign pg14 = \{(A[14] \land B[14]), (A[14] \& B[14])\};
                                                                    wire r2c1;
  assign pg13 = \{(A[13] \land B[13]), (A[13] \& B[13])\};
  assign pg12 = \{(A[12] \land B[12]), (A[12] \& B[12])\};
                                                                   black ir1c15(.pg(r1c15), .pg0(r1c14), .pgo(r2c15));
  assign pg11 = \{(A[11] \land B[11]), (A[11] \& B[11])\};
                                                                   black ir1c13(.pg(r1c13), .pg0(r1c12), .pgo(r2c13));
  assign pg10 = \{(A[10] \land B[10]), (A[10] \& B[10])\};
                                                                   black ir1c11(.pg(r1c11), .pg0(r1c10), .pgo(r2c11));
  assign pg9 = \{(A[9] \land B[9]), (A[9] \& B[9])\};
                                                                   black ir1c9(.pg(r1c9), .pg0(r1c8), .pgo(r2c9));
  assign pg8 = \{(A[8] \land B[8]), (A[8] \& B[8])\};
                                                                   black ir1c7(.pg(r1c7), .pg0(r1c6), .pgo(r2c7));
  assign pg7 = \{(A[7] \land B[7]), (A[7] \& B[7])\};
                                                                   black ir1c5(.pg(r1c5), .pg0(r1c4), .pgo(r2c5));
  assign pg6 = \{(A[6] \land B[6]), (A[6] \& B[6])\};
                                                                   black ir1c3(.pg(r1c3), .pg0(r1c2), .pgo(r2c3));
  assign pg5 = \{(A[5] \land B[5]), (A[5] \& B[5])\};
                                                                   gray ir1c1(.pg(r1c1), .pg0(Cin), .pgo(r2c1));
  assign pg4 = \{(A[4] \land B[4]), (A[4] \& B[4])\};
  assign pg3 = \{(A[3] \land B[3]), (A[3] \& B[3])\};
                                                                   // Second row
  assign pg2 = \{(A[2] \land B[2]), (A[2] \& B[2])\};
                                                                    wire [1:0] r3c15, r3c11, r3c7;
```

```
wire r3c3;
                                                               // Finaly produce the sum
                                                               xor16
                                                             ixor16(.A({r5c15,r7c14,r6c13,r7c12,r5c11,r7c10,r6c9,r7c8,r4
black ir2c15(.pg(r2c15), .pg0(r2c13), .pgo(r3c15));
                                                             c7,r7c6,
black ir2c11(.pg(r2c11), .pg0(r2c9), .pgo(r3c11));
black ir2c7(.pg(r2c7), .pg0(r2c5), .pgo(r3c7));
                                                             r6c5, r7c4, r3c3, r7c2, r2c1, Cin), .B({r1c16[1], r1c15[1], r1c14[
gray ir2c3(.pg(r2c3), .pg0(r2c1), .pgo(r3c3));
                                                             r1c13[1],r1c12[1],r1c11[1],r1c10[1],r1c9[1],r1c8[1],r1c7[1],r
// Third row
                                                             1c6[1],
wire [1:0] r4c15;
                                                                  r1c5[1],r1c4[1],r1c3[1],r1c2[1],r1c1[1]}), .S(S));
wire r4c7;
                                                               // Generate Cout
black ir3c15(.pg(r3c15), .pg0(r3c11), .pgo(r4c15));
                                                               gray gcout(.pg(r1c16), .pg0(r5c15), .pgo(Cout));
gray ir3c7(.pg(r3c7), .pg0(r3c3), .pgo(r4c7));
                                                             endmodule
// Fourth row
                                                             wire r5c15, r5c11;
                                                             module brent test;
gray ir4c15(.pg(r4c15), .pg0(r4c7), .pgo(r5c15));
                                                                  reg [15:0] a,b;
                                                                                   // numbers to add
gray ir6c11(.pg(r3c11), .pg0(r4c7), .pgo(r5c11));
                                                                  reg cin; // carry in
                                                                  wire[15:0] s;
                                                                                   //sum
// Fifth row
                                                                  wire cout; //cout
wire r6c13, r6c9, r6c5;
                                                                  Brent Kung16 bk16(a, b, cin, s, cout);
gray ir5c13(.pg(r2c13), .pg0(r5c11), .pgo(r6c13));
gray ir5c9(.pg(r2c9), .pg0(r4c7), .pgo(r6c9));
                                                                  initial
gray ir5c5(.pg(r2c5), .pg0(r3c3), .pgo(r6c5));
                                                                    begin
                                                                       a=16'b1010101010101010;
// Sixth row
                                                                       b=16'b0101010101010101;
wire r7c14, r7c12, r7c10, r7c8, r7c6, r7c4, r7c2;
                                                                       cin=1'b1;
gray ir6c14(.pg(r1c14), .pg0(r6c13), .pgo(r7c14));
                                                                       #10 b=16'b1010101010101010;
gray ir6c12(.pg(r1c12), .pg0(r5c11), .pgo(r7c12));
                                                                       #20 a=16'b0101010101010101:
gray ir6c10(.pg(r1c10), .pg0(r6c9), .pgo(r7c10));
                                                                       #30 cin=1'b0;
gray ir6c8(.pg(r1c8), .pg0(r4c7), .pgo(r7c8));
                                                                    end
gray ir6c6(.pg(r1c6), .pg0(r6c5), .pgo(r7c6));
                                                             endmodule[8]
gray ir6c4(.pg(r1c4), .pg0(r3c3), .pgo(r7c4));
gray ir6c2(.pg(r1c2), .pg0(r2c1), .pgo(r7c2));
                                                             Carry-Select adder
```

```
wire [3:0] w4;
module fulladder(s,cout,a,b,cin);
                                                                       wire [4:0] w7;
 input a,b,cin;
                                                                       wire [4:0] w8;
 output s,cout;
                                                                       wire [4:0] w9;
                                                                       wire [4:0] w10;
 assign s = ((a \land b) \land cin);
                                                                       wire [4:0] w11;
 assign cout = ((a \& b) | (cin \& (a \land b)));
                                                                       wire [8:0] w12;
                                                                       wire cout1;
endmodule
                                                                       wire w5;
wire w6;
module adder4(s, cout, a, b, cin);
   input [3:0] a;
   input [3:0] b;
                                                                adder4 a1(s[3:0], cout1, a[3:0], b[3:0], cin);
   input cin;
                                                                adder4 a2(w1[3:0], w1[4], a[7:4], b[7:4], c0);
   output [3:0] s;
                                                                adder4 a3(w2[3:0], w2[4], a[7:4], b[7:4], c1);
                                                                adder4 a4(w3[3:0], w5, a[11:8], b[11:8], c0);
   output cout;
                                                                adder4 a5(w4[3:0], w6, a[11:8], b[11:8], c1);
  fulladder fulladder0(s[0],cout0, a[0], b[0], cin);
                                                                adder4 a6(w7[3:0], w7[4], a[15:12], b[15:12], c0);
  fulladder1(s[1],cout1, a[1], b[1], cout0);
                                                                adder4 a7(w8[3:0], w8[4], a[15:12], b[15:12], c1);
  fulladder fulladder2(s[2],cout2, a[2], b[2], cout1);
  fulladder fulladder3(s[3],cout, a[3], b[3], cout2);
                                                              assign w9[4:0] = cout1 ? \{w2[4:0]\} : (~ cout1) ?
                                                               {w1[4:0]} : 5'b00000;
                                                              assign w10[4:0] = w6 ? \{w8[4:0]\} : (\sim w6) ? \{w7[4:0]\} :
endmodule
                                                              5'b00000;
assign w11[4:0]= w5 ? \{w8[4:0]\} : (\sim w5) ? \{w7[4:0]\} :
module csa (s, cout, a, b, cin, c0, c1);
                                                              5'b00000;
        input [15:0] a;
                                                                assign x0 [8:4] = w10 [4:0];
        input [15:0] b;
                                                                assign x0 [3:0] = w3 [3:0];
        input cin,c0,c1;
                                                                assign x1[8:4] = w11[4:0];
        output [15:0] s;
                                                                assign x1[3:0] = w4[3:0];
        output cout;
        wire [8:0] x0;
                                                               assign w12[8:0] = w9[4] ? {big in 1}: (\sim w9[4]) ? {x0}:
                                                               9'b0 0000 0000;
        wire [8:0] x1;
        wire [4:0] w1;
                                                                assign s[7:4] = w9[3:0];
        wire [4:0] w2;
                                                                assign s[15:8] = w12[7:0];
        wire [3:0] w3;
```

```
assign cout = w12[8];
                                                            output sum;
                                                            output po;
 endmodule
                                                            input a;
  module test_csa;
                                                            input b;
                                                            input c;
            reg [15:0] a;
            reg [15:0] b;
                                                            assign sum=a^b^c;
            reg cin,c0,c1;
                                                            assign po=a|b;
            wire [15:0] s;
                                                            assign carry=a&b|c&po;
            wire cout;
                                                          endmodule
                                                          module csblock(cout,Sum,A,B,cin);
           csa csa1 (s, cout, a, b, cin, c0,c1);
            initial
                                                            output cout;
            begin
                                                            output [3:0] Sum;
              a=16'b1010101010101010;
        b=16'b0101010101010101;
                                                            input [3:0] A;
        cin=1'b1;
                                                            input [3:0] B;
                                                            input cin;
              c0 = 1'b0;
                                                            wire [3:0] P,C;
              c1 = 1'b1;
                                                          padder a0 (C[0],Sum[0],P[0],A[0],B[0],cin);
              #10 b=16'b1010101010101010;
                                                          padder a1 (C[1],Sum[1],P[1],A[1],B[1],C[0]);
        #20 a=16'b0101010101010101;
                                                          padder a2 (C[2],Sum[2],P[2],A[2],B[2],C[1]);
                                                          padder a3 (C[3],Sum[3],P[3],A[3],B[3],C[2]);
        #30 cin=1'b0;
                                                          assign cout=C[3]|(cin&P[0]&P[1]&P[2]&P[3]);
            end
                                                          endmodule
      endmodule[9]
                                                          module csa(cout,Sum,A,B,cin);
Carry-Skip adder
                                                            output cout;
module padder(carry,sum,po,a,b,c);
                                                            output [15:0] Sum;
                                                            input [15:0] A;
 output carry;
                                                            input [15:0] B;
```

```
input cin;
 wire [3:0] carries;
csblock b0 (carries[0],Sum[3:0],A[3:0],B[3:0],cin);
csblock b1 (carries[1],Sum[7:4],A[7:4],B[7:4],carries[0]);
csblock b2 (carries[2],Sum[11:8],A[11:8],B[11:8],carries[1]);
csblock b3 (cout,Sum[15:12],A[15:12],B[15:12],carries[2]);
endmodule
module csa test;
            reg [15:0] A;
            reg [15:0] B;
            reg cin;
            wire [15:0] Sum;
            wire cout;
            csa csa1 (cout,Sum,A,B,cin);
             initial
              begin
                   a=16'b1010101010101010;
              b=16'b0101010101010101;
              cin=1'b1;
```

c0 = 1'b0;

c1 = 1'b1;

```
#10 b=16'b101010101010101010;

#20 a=16'b01010101010101010;

#30 cin=1'b0;

end

endmodule[10]
```

### References

- [1] B. Gilchrist et al., "Fast Carry Logic for Digital Computers," IRE Trans. EC-4, pp. 133-136, 1955.
- [2] B. Gilchrist, J. H. Pomerene and S. Y. Wong, "Fast Carry Logic for Digital Computers", IRE Trans. Elec. Comp., Vol. EC-4, no 4, pp. 133-136, 1955.
- [3] J. F. Kruy, "A Fast Conditional Sum Adder Using Carry Bypass Logic," AFIPS Con5 Proceedings, Vol. 27, FJCC, pp. 695-703, 1965
- [4] M. Lehman, and N. Burla, "Skip Techniques for High-speed Carry-Propagation in Binary Arithmetic Units," IRE Trans. EC-10, No. 4, pp. 691-698, 1961.
- [5] H. Ling, "High-speed Binary Parallel Adder," IEEE Trans. Comp., EC-15, No.5, pp. 799-802. 1966.
- [6] Mi Lu John, "Arithmetic and logic in computer systems" Wiley & Sons, Ap - Computers - 246 Pages Pp 61-63, 2005
- [7] Sarika A. Parate, Prof- R. N. Mandavgane, "Review of delay and power efficient Carry-Select adder using pipeline", International Engineering Journal For Research & Development, Volume 2 Issue 1, pp. 16-22, 2012.
- [8] Chan, P.K. Schlag, M.D.F.; Thomborson, C.D.; Oklobdzija, V.G.," Delay optimization of carry-skip adders and block carry-lookahead adders" Computer Arithmetic, 1991. Proceedings., 10th IEEE Symposium, pp 154 - 164 Grenoble 1991
- [9] EE 5324–VLSI Design II Kia Bazargan University of Minnesota Part II: Adders Pp 68, retrieved from http://www.ece.umn.edu/users/kia/Courses/EE5324/index.html adders. pdf, 2014.
- [10] High Speed Adder, retrieved from http://l3.elfak.ni.ac.rs/viewvc/int\_sab\_two\_level\_cl/doc/backg round/HighSpeedAdder.pdf?revision=1.1, 2014.
- [11] Brent-Kung Adder, retrieved from :http://en. academic. ru/dic.nsf/enwiki/1292234, 2014.