High speed modified carry save adder using a structure of multiplexers

Received Jun 4, 2020; Revised Aug 20, 2020; Accepted Nov 4, 2020

Adders are at the heart of the data path circuits of any processor in digital computer and signal processing systems. Advances in technology continue to support the efficient design of binary adders for high speed applications. In this paper, a fast and area-efficient modified carry save adder (CSA) is presented. A multiplexer based design of the full adder is proposed to implement the structure of the CSA, and this full adder is employed in all stages of the traditional CSA. Modifying the design of the full adder in the CSA reduces the complexity and area of the design, which in turn reduces the delay time. The VHDL implementations of the CSA designs (the proposed version, the traditional CSA, and the modified CSAs presented in the literature) are simulated using the Quartus II synthesis software tool with the Altera FPGA EP2C5T144C6 device (Cyclone II). Simulation results for the 64-bit adder designs demonstrate average improvements of 17.75%, 1.60%, and 8.81% in worst case time, thermal power dissipation, and number of FPGA logic elements, respectively.


INTRODUCTION
Binary addition is a basic and important operation in digital computer systems, and adder circuits are used in implementing most other arithmetic operations. It is becoming increasingly important to enhance the performance of data path units to meet the growing demands of high performance processors. Because the adder unit is used in most arithmetic operations, there is a strong need for adder designs with low power, high speed, and small area [1][2][3][4][5][6][7]. Adders are primarily used as basic units in implementing digital signal processing (DSP) systems for applications such as the design of analog to digital converters (ADCs) and digital filters [8][9][10][11][12][13]. Different techniques are used in the design of a multi-bit binary adder, for example the ripple carry adder (RCA), carry increment adder (CIA), carry skip adder (CSkA), carry look ahead adder (CLA), carry select adder (CSlA), carry save adder (CSA), and carry bypass adder (CBA) [14][15][16][17][18].
The most basic algorithm for multi-bit binary addition is the RCA. The RCA is the simplest, but not the best, way to build a multi-bit binary adder: it cascades N full adders to add two N-bit numbers. Because the carry ripples through all N cascaded full adders, the accumulated delay of the RCA grows as the number of bits N becomes large. The CSA, on the other hand, is one of the best architectures for solving the delay problem associated with the RCA. The CSA is a powerful architecture for fast multi-operand addition. CSAs are primarily used with array multipliers to accumulate the partial products [19][20][21][22][23]. The major goal of this work is to provide a fast and area-efficient modified implementation of the CSA. The rest of the paper is organized as follows: section 2 deals with the related work. Section 3 describes the implementation of the proposed CSA. Finally, sections 4 and 5 present the simulation results and the conclusion.
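The rippling of the carry through N cascaded full adders can be illustrated with a minimal Python sketch. It is a behavioral model only, not the VHDL used in this work; the function names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) and the LSB-first bit-list representation are assumptions made for illustration.

```python
def full_adder(x, y, cin):
    """Behavioral 1-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two N-bit numbers given as LSB-first bit lists.

    The carry produced at bit i is consumed at bit i + 1, so it passes
    through all N full adders in sequence. This serial dependency is the
    source of the accumulated delay for large N.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = cin
    for x, y in zip(a_bits, b_bits):
        s, carry = full_adder(x, y, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Hypothetical helper: integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(200, 8), to_bits(100, 8))` returns the low 8 bits of the sum together with the carry out of the most significant full adder.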

RELATED WORK

Traditional CSA
A traditional CSA uses a ladder of full adder units to build the structure of addition. Unlike the basic structure of the RCA, the CSA saves the carry for the next level of addition, which reduces the rippling of the carry [24]. The CSA is fast but not area and power efficient, because it uses a large number of full adders, including those of the RCA used in its final stage. The weakness of the CSA lies in this final stage, which is basically designed with an RCA. Figure 1 shows the general structure of the four-operand N-bit traditional CSA. The basic implementation of the 1-bit full adder used in the structure of the traditional CSA is based on the sum and carry equations of the basic full adder (equations 1 and 2).

Figure 1. The general structure of four operands N-bit traditional CSA
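The carry-save principle described above can be sketched in Python as follows. The sketch assumes the textbook full adder relations Sum = X ⊕ Y ⊕ Cin and Carry = XY + XCin + YCin; the paper's own equations 1 and 2 are not reproduced in this text, so this form is an assumption. The function names and the use of non-negative Python integers as bit vectors are also illustrative choices, not the structure of the VHDL design.

```python
def carry_save_stage(a, b, c):
    """One carry-save level applied to three operands.

    Every bit position is an independent full adder: the sum bit is
    a ^ b ^ c and the carry bit is the majority of (a, b, c). The carry
    vector is shifted left by one so that it is added at the next higher
    bit position in the following level, instead of rippling sideways.
    """
    partial_sum = a ^ b ^ c
    saved_carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, saved_carry


def ripple_carry_add(a, b, width):
    """Final stage: bit-serial ripple carry addition over `width` bits."""
    result, carry = 0, 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        result |= s << i
    return result | (carry << width)


def csa_add4(w, x, y, z, n):
    """Add four n-bit operands with two carry-save levels and a final RCA."""
    s1, c1 = carry_save_stage(w, x, y)    # level 1: three operands -> two
    s2, c2 = carry_save_stage(s1, c1, z)  # level 2: fold in the fourth operand
    return ripple_carry_add(s2, c2, n + 2)  # four n-bit operands need n + 2 bits
```

Only the last call involves carry propagation, which matches the observation that the final RCA stage is the weak point of the traditional structure.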

CSA using CLA
To solve the delay caused by the RCA in the final stage of the traditional CSA, a modified version of the CSA was presented in [24]. The modification uses a CLA to implement the final stage of the traditional CSA. All the other full adders keep the same implementation used in the traditional CSA, as given previously in equations 1 and 2. The full adders in the last stage are modified to fit the implementation of the CLA according to the CLA equations. The general structure of the modified CSA using the CLA is shown in Figure 2. The use of the CLA aims to speed up the addition. The modified CSA in [24] is faster than the traditional CSA, but it is still not an area-efficient implementation.
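For reference, the carry look ahead relations can be written in their textbook form as below. This is a sketch under the assumption that [24] uses the standard generate and propagate formulation; the symbols G_i, P_i, C_i, and S_i are notation introduced here, with X_i and Y_i denoting the two bits entering position i of the final stage.

```latex
\begin{align}
G_i     &= X_i \, Y_i            && \text{(carry generate)} \\
P_i     &= X_i \oplus Y_i        && \text{(carry propagate)} \\
C_{i+1} &= G_i + P_i \, C_i      && \text{(carry into position } i+1\text{)} \\
S_i     &= P_i \oplus C_i        && \text{(sum bit)}
\end{align}
```

Expanding the recurrence, for example C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0, shows that each carry depends only on the inputs and C_0, so the carries need not wait for one another as they do in the RCA.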

CSA using modified full adder
The authors in [25] use the same structure as the traditional CSA with a modified design of the full adder. The suggested full adder is based on two 4-to-1 multiplexers. The design aims to reduce the delay time of the traditional CSA and to provide an area-efficient implementation. The multiplexer based full adder used in [25] is shown in Figure 3. Building the full adders of the CSA from multiplexers contributes to speeding up the addition. The resulting CSA is faster than the traditional CSA, but it is still not area and power efficient.
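A full adder built from two 4-to-1 multiplexers can be modeled behaviorally as in the Python sketch below. The exact wiring of Figure 3 is not recoverable from this text, so the sketch assumes one common textbook arrangement: X and Y drive the select lines, and the data inputs are taken from Cin, its complement, and the constants 0 and 1. The names `mux4` and `full_adder_mux4` are hypothetical.

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """4-to-1 multiplexer: returns the data input indexed by (s1, s0)."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def full_adder_mux4(x, y, cin):
    """Full adder from two 4-to-1 multiplexers (assumed wiring).

    X and Y are the select lines of both multiplexers.
    Sum   selects among (Cin, not Cin, not Cin, Cin).
    Carry selects among (0, Cin, Cin, 1).
    """
    not_cin = cin ^ 1
    s = mux4(cin, not_cin, not_cin, cin, x, y)
    cout = mux4(0, cin, cin, 1, x, y)
    return s, cout
```

Checking all eight input combinations against x + y + cin is enough to confirm that this wiring realizes a full adder.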

PROPOSED WORK
The full adder is clearly the basic design unit in the structure of the traditional CSA. Accordingly, the proposed CSA design relies on a modified structure of the full adder. The suggested full adder aims to reduce the power, the area, and the delay time of the CSA design. A new sum and carry generating architecture for the full adder is built from three 2-to-1 multiplexers. Based on the basic truth table of the 1-bit full adder shown in Table 1, the Sum and Carry outputs can be implemented as follows. The term (X ⊕ Y) can be implemented with one 2-to-1 multiplexer. Sum can then be generated as the output of a 2-to-1 multiplexer that uses (X ⊕ Y) as its selector, and Carry can likewise be generated as the output of a 2-to-1 multiplexer that uses (X ⊕ Y) as its selector.

The general block diagram of the 1-bit multiplexer based full adder using three 2-to-1 multiplexers is shown in Figure 4. Based on this architecture, the proposed implementation of the CSA replaces every full adder unit in the traditional CSA structure with the full adder shown in Figure 4.
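The three-multiplexer full adder can be checked with a short behavioral sketch in Python. The data-input wiring of Figure 4 is not recoverable from this text, so the sketch assumes one wiring that is consistent with the description: the first multiplexer forms X ⊕ Y by selecting between Y and its complement with X, and the other two multiplexers use X ⊕ Y as their selector. The names `mux2` and `full_adder_mux2` are hypothetical, and the complements of Y and Cin are assumed to be available as inputs.

```python
def mux2(d0, d1, sel):
    """2-to-1 multiplexer: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0


def full_adder_mux2(x, y, cin):
    """Full adder from three 2-to-1 multiplexers (assumed wiring)."""
    not_y = y ^ 1
    not_cin = cin ^ 1
    # MUX 1: X selects Y or its complement, which yields X xor Y.
    x_xor_y = mux2(y, not_y, x)
    # MUX 2: when X xor Y is 0 the sum equals Cin, otherwise its complement.
    s = mux2(cin, not_cin, x_xor_y)
    # MUX 3: when X xor Y is 0 both operands agree, so the carry equals X;
    # otherwise exactly one operand is 1 and the carry equals Cin.
    cout = mux2(x, cin, x_xor_y)
    return s, cout
```

Because the sum and carry multiplexers share the same selector, the X ⊕ Y term is computed once and reused, which is the kind of sharing that a reduced-area design can exploit.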

SIMULATION RESULTS
This study compares different design structures for the CSA. Four-operand adders of two different sizes, 8-bit and 64-bit, are simulated for each of the implemented designs (the traditional CSA, the modified CSA of [24], the modified CSA of [25], and the proposed version of the CSA) to analyze performance in terms of time delay, power dissipation, and number of logic elements.
To keep the measurements comparable, all the designs were implemented in VHDL and simulated with the Quartus II synthesis software tool. The Altera FPGA EP2C5T144C6 device (Cyclone II) was selected, and the worst case time, the thermal power dissipation reports, and the number of logic elements were used to evaluate the performance of all the designs. The simulation results for the 8-bit and 64-bit CSA design structures discussed in this paper are shown in Table 2. The designs are compared below in terms of worst case time, thermal power dissipation, and number of logic elements.

Figure 5 shows the comparison of the worst case time results, which reveals that the proposed CSA design offers better performance than the others. Compared to the existing designs, the 8-bit proposed version of the CSA has around 6.17% improvement in worst case time, and the 64-bit proposed version has around 17.75% improvement in worst case time.

Total thermal power dissipation
The power dissipation results of all the designs are shown in Figure 6. The proposed CSA dissipates slightly less power than the other available designs. The 8-bit proposed version of the CSA has around 0.06% improvement in the total thermal power dissipation, and the 64-bit proposed version has around 1.60% improvement in the total thermal power dissipation.

Figure 7 shows the total number of logic element resources used in the selected FPGA device. The results reveal that the proposed CSA uses fewer logic elements than the other available designs. The 8-bit proposed version of the CSA has around 4.17% improvement in the total number of logic elements, and the 64-bit proposed version has around 8.81% improvement in the total number of logic elements.

CONCLUSION
In this paper we have presented a new structure for implementing a high speed and area-efficient CSA. A multiplexer based full adder with three 2-to-1 multiplexers is used to implement the proposed structure of the CSA. The aim of the proposed design is to reduce the complexity and area of the implementation and thereby reduce the delay time. A detailed comparison of worst case time, thermal power dissipation, and number of logic elements between the proposed implementation and the related designs has been carried out. Simulations of the VHDL designs in the Quartus II synthesis software tool demonstrate that the proposed version of the CSA is faster than the existing designs. Simulation results for the 64-bit adder designs demonstrate average improvements of 17.75%, 1.60%, and 8.81% in worst case time, thermal power dissipation, and number of FPGA logic elements, respectively. The best results were achieved with the multiplexer based full adder design, which gives a fast and reduced-area implementation.

Int J Elec & Comp Eng, ISSN: 2088-8708 (Ahmed Salah Hameed)