# **Approximate XOR/XNOR-based Adders for Inexact Computing**

Zhixi Yang, Ajaypat Jain, Jinghang Liang, Jie Han and Fabrizio Lombardi

Abstract—Power dissipation has become a significant issue for integrated circuit design in nanometric CMOS technology. To reduce power consumption, approximate implementations of a circuit have been considered as a potential solution for applications in which strict exactness is not required. In inexact computing, power reduction is achieved through the relaxation of the often demanding requirement of accuracy. In this paper, new approximate adders are proposed for low-power imprecise applications. These adders are based on XOR/XNOR gates with multiplexers implemented by pass transistors. The proposed approximate XOR/XNOR-based adders (AXAs) are evaluated and compared with an accurate full adder with respect to energy consumption, delay, area and power delay product (PDP). The metric of error distance is used to evaluate the reliability of the approximate designs. Simulation by Cadence's Spectre in a TSMC 65 nm process has shown that the proposed designs consume less power and have better performance (such as a lower propagation delay) than the accurate XOR/XNOR-based adder, while the error distance remains similar to or better than that of other approximate adder designs.

*Index Terms*—Approximate computing, Inexact computing, Approximate adders, Error distance, Low power, Adders

#### I. INTRODUCTION

In highly integrated nanoscale designs, reliability issues resulting from PVT (process, voltage and temperature) variations, aging effects and soft errors have become major impediments to leveraging the benefits of device scaling; moreover, leakage and static power are significant contributors to the high power consumption encountered at such high density. A potential solution to lower power dissipation is to employ approximate circuit designs [1].

Commonly used multimedia applications have digital signal processing (DSP) blocks as their core. Most of these DSP blocks implement algorithms in which the ultimate output is either an image or a video for human presentation and analysis. For example, the limited perception of human vision allows the outputs of these algorithms to be numerically approximate rather than accurate [2]. The relaxation of numerical exactness provides some freedom to perform imprecise or approximate computation. The development of imprecise but simplified arithmetic units can provide an extra layer of power saving over conventional low-power design techniques such as using a lower supply voltage.

As basic building blocks in many digital circuits, adders have been investigated for approximate implementations. These include approximate mirror adders (AMAs) [3] and the lower part OR adder (LOA) [4]. This paper proposes three new approximate adders (AXAs); they are based on area- and power-efficient designs using XOR and XNOR gates with multiplexers implemented by pass transistors. A reduction in logic complexity is accomplished at transistor level by removing some of the transistors required in the accurate adder design. Additionally, the node capacitances and thus dynamic power are reduced to lower the power/energy consumption of the proposed circuits. In this paper, delay, energy consumption, area and power-delay product are measured for comparing the different designs with an accurate adder. Also, the metric of error distance [5] is used to compare the proposed designs with other approximate adders. Extensive simulation results are provided to show the effectiveness of the proposed designs.

This paper is organized as follows. Section II presents a brief review. Section III presents the three new AXAs, followed by a comparative study in Section IV. Section V concludes the paper.

#### II. REVIEW

In this section, two approximate full adder designs are reviewed. The first is the *approximate mirror adder* (AMA); this design is obtained from a logic reduction at transistor level from the mirror adder (MA), a widely used implementation of an accurate full adder. Three approximate mirror adders have been presented in [3] by removing some transistors and attaining a lower power dissipation and circuit complexity. This results in a faster charging/discharging process of the node capacitance, thus incurring a shorter delay.

The second approximate adder is referred to as the *lower part OR adder* (LOA) [4]. In the LOA, OR gates are used to approximately compute the less significant bits (referred to as the lower part) of the sum. An additional AND gate is used to generate the carry-in for the more significant bits when both inputs to the most significant bit adder in the lower part are "1." Most carries are ignored in the lower part module of the LOA, which results in a loss of precision.
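The LOA behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the function name `loa_add` and the parameter `lower_bits` are hypothetical and not taken from [4], and the model works on Python integers rather than on gates.

```python
def loa_add(a: int, b: int, lower_bits: int) -> int:
    """Behavioral sketch of a lower part OR adder for non-negative integers."""
    if lower_bits <= 0:
        return a + b  # no approximate part: plain exact addition
    mask = (1 << lower_bits) - 1
    # Lower part: a bitwise OR replaces the adder cells, so carries
    # generated inside the lower part are dropped.
    low = (a | b) & mask
    # A single AND of the most significant bits of the lower part
    # produces the carry-in for the exact upper part.
    msb = lower_bits - 1
    carry_in = (a >> msb) & (b >> msb) & 1
    high = (a >> lower_bits) + (b >> lower_bits) + carry_in
    return (high << lower_bits) | low


if __name__ == "__main__":
    for a, b in [(5, 3), (6, 2), (4, 8)]:
        approx = loa_add(a, b, lower_bits=2)
        print(a, b, approx, abs(approx - (a + b)))
```

With two lower bits, the pair (5, 3) loses a carry inside the lower part, while (4, 8) has no active lower bits and is therefore added exactly.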

For these approximate designs, a metric must be used to assess the approximation with respect to the correct (exact) result; the so-called error distance has been proposed in [5] as a figure of merit for inexact computing. For a given input, the *error distance* (ED) is defined as the arithmetic distance between an inexact output *a* and the correct output *b* as:

$$ED(a, b) = |a - b| = \left|\sum_i a[i] \cdot 2^i - \sum_j b[j] \cdot 2^j\right|, \tag{1}$$

where *i* and *j* are the indices for the bits in *a* and *b*, respectively. For example, the two erroneous values "01" and "00" have an ED of 1 and 2, respectively, with respect to the correct (exact) value "10." As this paper primarily deals with single bit adders, the metric of *total error distance* (TED) is defined as the sum of the EDs for all the inputs of a full adder. The TED is then used to assess the effectiveness of the proposed designs.
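As a minimal sketch of these two metrics, the following Python functions compute the ED of two bit strings and the TED over a list of outputs. The helper names and the "stuck Sum" output list are hypothetical illustrations, not designs from this paper; outputs are written as two-bit strings "Cout Sum" for the inputs 000 to 111.

```python
def error_distance(approx_bits: str, exact_bits: str) -> int:
    """ED of equation (1) for two bit strings written MSB first."""
    return abs(int(approx_bits, 2) - int(exact_bits, 2))


def total_error_distance(approx_outputs, exact_outputs) -> int:
    """TED: sum of the EDs over all input combinations."""
    return sum(error_distance(a, e) for a, e in zip(approx_outputs, exact_outputs))


# Exact full adder outputs (Cout, Sum) for inputs X Y Cin = 000 ... 111.
EXACT = ["00", "01", "01", "10", "01", "10", "10", "11"]
# Hypothetical approximate adder whose Sum output is stuck at 0.
STUCK_SUM = ["00", "00", "00", "10", "00", "10", "10", "10"]

if __name__ == "__main__":
    print(error_distance("01", "10"), error_distance("00", "10"))
    print(total_error_distance(STUCK_SUM, EXACT))
```

The first line of output reproduces the example in the text (EDs of 1 and 2 with respect to "10").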

<sup>\*</sup>Research supported in part by NSERC and MITACS in Canada.

Z. Yang, J. Liang and J. Han are with the Department of Electrical and Computer Engineering at the University of Alberta, Edmonton, AB, Canada T6G 2V4. (tel/fax: +1-780-412-1361/1811; e-mail: jhan8@ualberta.ca).

A. Jain is with Birla Institute of Technology & Science, Pilani, India and was a summer intern at the University of Alberta, Edmonton, AB, Canada.

F. Lombardi is with the Department of Electrical and Computer Engineering at Northeastern University, Boston, MA 02115, USA. (e-mail: lombardi@ece.neu.edu)

#### III. PROPOSED APPROXIMATE ADDERS

In this section, an accurate adder on which the approximate adders are developed is first introduced, followed by the approximate designs.

#### 3.1 Accurate XOR/XNOR-based Adders

The proposed approximate XOR/XNOR-based adder 1 (i.e., AXA1) is based on the 10-transistor full adder in [6], while AXA2 and AXA3 are based on the accurate design in [7]. As shown in Fig. 1, the adder in [7] is based on four-transistor (4T) XNOR gates; the total number of transistors in this adder is 10. X, Y and  $C_{in}$  are inputs; I is an internal signal.



Fig. 1. Accurate full adder with 10 transistors [7].

#### 3.2 Approximate XOR-based Adder 1 (AXA1)

Fig. 2 shows the first approximate adder. In this design, the XOR operation is achieved by an inverter and two pass transistors connected to *X* and *Y* respectively. When *Y* is "1", $I = \overline{X}$; otherwise, $I = X$; i.e., $I = X \oplus Y$. Both *Sum* and $C_{out}$ are accurate for 4 out of the total 8 input combinations. The total error distance achieved with this design is 4, as shown in the truth table of Table 1. The transistor count for this design is 8. The functions of *Sum* and $C_{out}$ are given by:

$$Sum = C_{in}, \tag{2}$$

$$C_{out} = \overline{(X \oplus Y)C_{in} + \overline{X}\,\overline{Y}}. \tag{3}$$



Fig. 2. Approximate XOR-based Adder 1 (AXA1).
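The logic-level behavior of AXA1 can be checked with a short script. This is a behavioral sketch only: it models equations (2) and (3) with ideal logic values and ignores all transistor-level effects; the function names are hypothetical.

```python
from itertools import product


def mux_xor(x: int, y: int) -> int:
    """Pass-transistor selection: Y picks the inverted or the plain X."""
    return (1 - x) if y else x


def axa1(x: int, y: int, cin: int) -> tuple[int, int]:
    """Return (Cout, Sum) following equations (2) and (3)."""
    i = mux_xor(x, y)                                  # internal signal I
    cout = 1 - ((i & cin) | ((1 - x) & (1 - y)))       # equation (3)
    return cout, cin                                   # equation (2): Sum = Cin


def exact(x: int, y: int, cin: int) -> tuple[int, int]:
    """Accurate full adder, returned as (Cout, Sum)."""
    total = x + y + cin
    return total >> 1, total & 1


rows = list(product((0, 1), repeat=3))
sum_ok = sum(axa1(*r)[1] == exact(*r)[1] for r in rows)
cout_ok = sum(axa1(*r)[0] == exact(*r)[0] for r in rows)
ted = sum(abs(2 * axa1(*r)[0] + axa1(*r)[1] - sum(r)) for r in rows)

if __name__ == "__main__":
    print(sum_ok, cout_ok, ted)  # prints: 4 4 4
```

The counts agree with the text: *Sum* and $C_{out}$ are each correct for 4 of the 8 inputs, and the total error distance is 4.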

#### 3.3 Approximate XNOR-based Adder 2 (AXA2)

The design in Fig. 3 implements an approximate adder with 6 transistors; it consists of a 4-transistor XNOR gate and a pass transistor block. *Sum* is accurate for 4 out of the 8 input combinations, while  $C_{out}$  is accurate for all input combinations. The total error distance for this design is also 4, as shown in Table 1. The functions of *Sum* and  $C_{out}$  are given by:

$$Sum = \overline{(X \oplus Y)},\tag{4}$$

$$C_{out} = (X \oplus Y)C_{in} + XY. \tag{5}$$

For the *Sum* and  $C_{out}$  signals, some transitions do not have a full swing; this is due to the threshold voltage drop in some of the pass transistors utilized in the design.



Fig. 3. Approximate XNOR-based Adder 2 (AXA2).
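That $C_{out}$ in (5) is accurate for every input can be verified with a short Boolean derivation. The steps below are a sketch using only the expansion of the XOR and the absorption identity $A + \overline{A}B = A + B$; they are not part of the original analysis.

$$\begin{aligned}
(X \oplus Y)C_{in} + XY &= X\overline{Y}C_{in} + \overline{X}YC_{in} + XY \\
&= X(Y + \overline{Y}C_{in}) + \overline{X}YC_{in} \\
&= XY + XC_{in} + \overline{X}YC_{in} \\
&= XY + XC_{in} + YC_{in}.
\end{aligned}$$

The last line is the majority function of $X$, $Y$ and $C_{in}$, i.e., the carry of an accurate full adder. Since only *Sum* (of weight $2^0$) can be wrong in AXA2, each erroneous input combination contributes an ED of exactly 1, which is consistent with a total error distance of 4 for 4 erroneous *Sum* values.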

#### 3.4 Approximate XNOR-based Adder 3 (AXA3)

This design in Fig. 4 is an extension of AXA2; it uses 2 more transistors in a pass transistor configuration for a better accuracy of *Sum*. In total, there are 8 transistors, 4 of which are utilized in the XNOR gate. *Sum* is accurate for 6 out of the total 8 input combinations, while  $C_{out}$  is accurate for all possible configurations; the total error distance achieved by this design is 2, as shown in the truth table of Table 1.



Fig. 4. Approximate XNOR-based Adder 3 (AXA3).

The functions of *Sum* and  $C_{out}$  are given by:

$$Sum = \overline{(X \oplus Y)}C_{in},\tag{6}$$

$$C_{out} = (X \oplus Y)C_{in} + XY. \tag{7}$$

Table 1 shows a summary of the truth table and error distance of the AXAs, in comparison with the accurate design.
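The entries of Table 1 can be regenerated from equations (2) to (7). The script below is a logic-level sketch with hypothetical function names; it assumes ideal logic values and does not model the reduced voltage swing of the pass transistors.

```python
from itertools import product


def axa1(x, y, cin):
    """(Cout, Sum) from equations (2) and (3)."""
    return 1 - (((x ^ y) & cin) | ((1 - x) & (1 - y))), cin


def axa2(x, y, cin):
    """(Cout, Sum) from equations (4) and (5)."""
    return ((x ^ y) & cin) | (x & y), 1 - (x ^ y)


def axa3(x, y, cin):
    """(Cout, Sum) from equations (6) and (7)."""
    return ((x ^ y) & cin) | (x & y), (1 - (x ^ y)) & cin


def truth_table(adder):
    """List of ((X, Y, Cin), Cout, Sum, ED) for all eight input combinations."""
    rows = []
    for x, y, cin in product((0, 1), repeat=3):
        cout, s = adder(x, y, cin)
        ed = abs((2 * cout + s) - (x + y + cin))  # exact value is X + Y + Cin
        rows.append(((x, y, cin), cout, s, ed))
    return rows


ADDERS = {"AXA1": axa1, "AXA2": axa2, "AXA3": axa3}
ted = {name: sum(row[3] for row in truth_table(f)) for name, f in ADDERS.items()}

if __name__ == "__main__":
    for name, f in ADDERS.items():
        print(name, "TED =", ted[name])
        for inputs, cout, s, ed in truth_table(f):
            print("  ", *inputs, "->", cout, s, "ED", ed)
```

Running it yields total error distances of 4, 4 and 2 for AXA1, AXA2 and AXA3, matching the values quoted in Sections 3.2 to 3.4.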

#### IV. SIMULATION COMPARISON

A comparison of each approximate design and the accurate full adder is pursued with respect to energy consumption, delay, area and power delay product (PDP).

Simulation is performed using Cadence's Spectre in a TSMC 65 nm process, for which 1.0 V is used as the standard supply voltage (Vdd). A load of four inverters is utilized, but its energy consumption is not counted in the evaluation of the adders. Inputs are provided by independent voltage sources. Since some inputs drive the outputs, the energy provided by the input signals is included in the simulation results.

| X | Y | Cin | Accurate Cout | Accurate Sum | AXA1 Cout | AXA1 Sum | AXA1 ED | AXA2 Cout | AXA2 Sum | AXA2 ED | AXA3 Cout | AXA3 Sum | AXA3 ED |
|---|---|-----|---------------|--------------|-----------|----------|---------|-----------|----------|---------|-----------|----------|---------|
| 0 | 0 | 0   | 0             | 0            | 0         | 0        | 0       | 0         | 1        | 1       | 0         | 0        | 0       |
| 0 | 0 | 1   | 0             | 1            | 0         | 1        | 0       | 0         | 1        | 0       | 0         | 1        | 0       |
| 0 | 1 | 0   | 0             | 1            | 1         | 0        | 1       | 0         | 0        | 1       | 0         | 0        | 1       |
| 0 | 1 | 1   | 1             | 0            | 0         | 1        | 1       | 1         | 0        | 0       | 1         | 0        | 0       |
| 1 | 0 | 0   | 0             | 1            | 1         | 0        | 1       | 0         | 0        | 1       | 0         | 0        | 1       |
| 1 | 0 | 1   | 1             | 0            | 0         | 1        | 1       | 1         | 0        | 0       | 1         | 0        | 0       |
| 1 | 1 | 0   | 1             | 0            | 1         | 0        | 0       | 1         | 1        | 1       | 1         | 0        | 0       |
| 1 | 1 | 1   | 1             | 1            | 1         | 1        | 0       | 1         | 1        | 0       | 1         | 1        | 0       |

Table 1. TRUTH TABLE AND ERROR DISTANCE (ED) OF AXAs

Since dynamic power consumption is significantly larger than static power, the energy consumptions (ECs) of AXA1, AXA2, AXA3 and the accurate adder (ACA) in [7] due to dynamic transitions are shown in Fig. 5. The EC is computed by integrating the product of the current from a voltage source and the supply/input voltage during the transition time interval. All 64 possible transitions between input combinations are considered; if an input combination is followed by itself, then no transition occurs, so the dynamic energy is considered to be zero. As can be seen, ACA has the largest energy consumption among all adders. AXA1 has the second-largest energy consumption, while AXA3 shows a rather significant reduction in energy compared to the adder in [7].
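The energy computation and the enumeration of the 64 transitions described above can be sketched as follows. This is an illustration under stated assumptions: the sampled waveform in the example is synthetic and not simulation data from this paper, the trapezoidal rule is one possible choice of numerical integration, and `transition_energy` is a hypothetical helper name.

```python
from itertools import product


def transition_energy(times, currents, voltages):
    """Trapezoidal estimate of E = integral of v(t) * i(t) dt for one source."""
    energy = 0.0
    for k in range(1, len(times)):
        p_prev = voltages[k - 1] * currents[k - 1]
        p_curr = voltages[k] * currents[k]
        energy += 0.5 * (p_prev + p_curr) * (times[k] - times[k - 1])
    return energy


# All ordered pairs of input combinations (X, Y, Cin): 8 x 8 = 64 transitions.
inputs = list(product((0, 1), repeat=3))
transitions = list(product(inputs, repeat=2))
# Pairs in which the input actually changes; the remaining 8 pairs repeat the
# same combination and are assigned zero dynamic energy.
switching = [(a, b) for a, b in transitions if a != b]

if __name__ == "__main__":
    # Synthetic triangular current pulse (amperes) at a constant 1.0 V supply.
    t = [0.0, 1e-9, 2e-9]
    i = [0.0, 2e-6, 0.0]
    v = [1.0, 1.0, 1.0]
    print(len(transitions), len(switching), transition_energy(t, i, v))
```

In a full evaluation, the energies from the supply and from each input source would be summed for every switching transition.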



In Fig. 5, AXA1 has a few values of EC that are higher than those of the accurate design. This occurs because the output of the pass transistor connected to $C_{in}$ is also connected to an inverter. The gate signal of this pass transistor is connected to the output of the other pass transistors (i.e., the internal signal *I*). Because *I* is too weak to pull $\overline{C_{out}}$ up/down to Vdd/GND quickly and with a full swing, the inverter draws a relatively large short-circuit (leakage) current. Thus, the EC is relatively large.

Fig. 5 also shows that the EC of AXA2 is higher than that of AXA3, even though AXA2 has only 6 transistors while AXA3 has 8. While more transistors generally result in a higher energy consumption, the EC is also a function of the load capacitance. In AXA2, the *Sum* signal is an output connected to the load of four inverters and two pass transistors, while in AXA3, the signal *I* (the equivalent of *Sum* in AXA2) is connected to only 4 pass transistors, i.e., to a smaller capacitance. Therefore, less power/energy is consumed.

For AXA1, there is no delay for *Sum* because *Sum* is directly connected to $C_{in}$. For the other designs, AXA3 has a longer *Sum* delay than the accurate adder, whereas AXA2 has a shorter delay. However, the average delay of *Sum* is significantly shorter than the delay of $C_{out}$ for all of the adders. The delay of $C_{out}$ is shown in Fig. 6 for all adders (in log scale). Consider as an example AXA2 and the ACA in [7]. If the input combination is 010 followed by 110, then *Sum* should be 1, ideally at Vdd; however, due to the threshold voltage drop of the pass transistors, the voltage for *Sum* is reduced to 700 to 800 mV (i.e., $Vdd - V_{tn}$, where $V_{tn}$ is the threshold voltage drop) at a Vdd of 1 V. In AXA2, *Sum* shares the same signal as the gate of the next-stage pass transistor, which is therefore driven at a reduced signal strength. $C_{out}$ does not show a fast transition due to the weakness in pulling the NMOS to Vdd, thus causing a considerable $C_{out}$ delay. The *Sum* delay can be explained using a similar argument. As shown in Fig. 6, ACA and AXA2 have almost the same $C_{out}$ delay, while AXA1 has a significantly lower delay. AXA1 is the fastest adder in terms of the $C_{out}$ delay. When the input combination changes while *Sum* or $C_{out}$ remains the same (for instance, 001 followed by 010), the delay is zero, so these transitions are ignored in the comparison.



The power delay product (PDP) is calculated using the  $C_{out}$  delay and the dynamic power to evaluate the performance of these circuits, as shown in Fig. 7. As can be seen, all AXAs show a better PDP than the ACA, while AXA1 has the best performance in terms of PDP due to its short delay in carry propagation.



Fig. 7. Power delay product (PDP) for the ACA [7] and AXAs
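As a rough cross-check of the PDP ranking, the product can be recomputed from the dynamic power and $C_{out}$ delay values listed in Table 2. This is a sketch under an assumption: it treats the Table 2 entries as representative single values, whereas Fig. 7 may have been obtained per transition, so the absolute numbers need not match the figure. The dictionary and function names are hypothetical.

```python
# Values copied from Table 2: dynamic power in microwatts, Cout delay in picoseconds.
DYNAMIC_POWER_UW = {"AXA1": 3.872, "AXA2": 3.448, "AXA3": 3.234, "ACA": 4.657}
COUT_DELAY_PS = {"AXA1": 60.7, "AXA2": 254.9, "AXA3": 159.7, "ACA": 253.9}


def pdp_attojoule(power_uw: float, delay_ps: float) -> float:
    """Power delay product; 1 uW x 1 ps = 1e-18 J = 1 aJ."""
    return power_uw * delay_ps


pdp = {name: pdp_attojoule(DYNAMIC_POWER_UW[name], COUT_DELAY_PS[name])
       for name in DYNAMIC_POWER_UW}

if __name__ == "__main__":
    for name, value in sorted(pdp.items(), key=lambda item: item[1]):
        print(f"{name}: {value:.1f} aJ")
```

Under this assumption the ordering agrees with the text: every AXA has a lower PDP than the ACA, and AXA1 has the lowest.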

Fig. 8 shows the layouts of the accurate and approximate adders. It can be seen that the complexity of the layout is dominated by the number of transistors in each design. However, the area of AXA1 is larger than that of AXA3, although they have the same number of transistors. This occurs because for AXA1, the four NMOS transistors cannot be placed in the same diffusion. Although the NMOS transistors of AXA3 do not share a single diffusion either, its diffusion is split into fewer separate pieces. The NMOS transistors in AXA1 must therefore be spread over more separate diffusion regions, resulting in a larger area.



Fig. 8. Layouts of the ACA [7] and AXAs.

Table 2 shows the performance summary of AXA1, AXA2, AXA3 and the accurate adder [7]. AXA1 has the best performance in carry propagation delay with relatively large static power dissipation. AXA2 has the least area and small dynamic power dissipation; however, its delay is rather large. AXA3 has the best performance in terms of dynamic power dissipation with moderate static power consumption, area and propagation delay.

The error distance (ED) and transistor count of the proposed designs are compared with AMAs and LOA. As shown in Table 3, a similar or better total error distance is achieved by the proposed designs using a significantly smaller number of transistors. This comparison is favorable for the LOA, because the ED of the LOA depends on the number of lower bits. Therefore, its ED increases with the number of lower bits and is significantly lower for single bit addition.

#### V. CONCLUSION

This paper presents the design and evaluation of new approximate adders based on four-transistor XOR/XNOR gates. Comparison with an accurate adder shows that by trading off a very small level of accuracy, significant savings in transistor count and power can be obtained. The evaluation of these designs also shows a better propagation delay than an accurate adder (except for the carry-out delay of AXA2 and the *Sum* delay of AXA3); hence, when cascaded for long word addition, these adders are faster. In summary, AXA1 has the best performance (i.e., the shortest delay), AXA2 uses the smallest area and AXA3 is the most power-efficient design with the smallest total error distance. Overall, AXA1 has the lowest power-delay product (PDP).
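The cascading of single-bit cells for long word addition mentioned above can be sketched at the logic level as a ripple-carry chain with an interchangeable cell. This is an illustrative model only: the ripple-carry structure, the function names and the 4-bit width are assumptions, and no multi-bit results are reported in this paper.

```python
def exact_cell(x: int, y: int, cin: int) -> tuple[int, int]:
    """Accurate full adder cell, returned as (Cout, Sum)."""
    total = x + y + cin
    return total >> 1, total & 1


def axa3_cell(x: int, y: int, cin: int) -> tuple[int, int]:
    """AXA3 cell from equations (6) and (7), returned as (Cout, Sum)."""
    return ((x ^ y) & cin) | (x & y), (1 - (x ^ y)) & cin


def ripple_add(a: int, b: int, width: int, cell) -> int:
    """Chain `width` single-bit cells; the carry of each feeds the next."""
    carry, result = 0, 0
    for bit in range(width):
        carry, s = cell((a >> bit) & 1, (b >> bit) & 1, carry)
        result |= s << bit
    return result | (carry << width)


if __name__ == "__main__":
    for a, b in [(3, 3), (1, 0), (5, 10)]:
        approx = ripple_add(a, b, 4, axa3_cell)
        print(a, b, approx, abs(approx - (a + b)))
```

Because $C_{out}$ of AXA3 is accurate, the carry chain itself stays exact in this model and errors appear only in individual *Sum* bits.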

As the use of pass transistors causes the outputs not to have a full voltage swing, which leads to a reduction in noise margin, the utilization of additional drivers may be needed at the output of these adders. Nevertheless, the proposed approximate adders are viable alternatives to existing designs for applications in which a lower accuracy can be tolerated with improvements in other metrics (such as power dissipation and transistor count) for nanoscale implementation. Future work will include the investigation of the application of these adders in other arithmetic circuits such as multipliers [8]. A detailed analysis of the reliability of the proposed approximate adders will be considered by using techniques such as those in [9] and [10].

Table 2. PERFORMANCE COMPARISON OF AXA1, AXA2 AND AXA3 WITH THE ACCURATE ADDER (ACA)

| Metric                         | AXA1  | AXA2  | AXA3  | ACA [7] | Improvement (%) AXA1 | Improvement (%) AXA2 | Improvement (%) AXA3 |
|--------------------------------|-------|-------|-------|---------|----------------------|----------------------|----------------------|
| Transistor count               | 8     | 6     | 8     | 10      | 20.00                | 40.00                | 20.00                |
| Static power (nW)              | 72.82 | 19.33 | 30.77 | 56.02   | -29.99               | 65.45                | 45.07                |
| Dynamic power (µW)             | 3.872 | 3.448 | 3.234 | 4.657   | 15.22                | 25.07                | 30.57                |
| Delay for Sum (ps)             | 0     | 20.16 | 61.82 | 35.98   | 100.0                | 43.96                | -71.8                |
| Delay for C<sub>out</sub> (ps) | 60.7  | 254.9 | 159.7 | 253.9   | 76.09                | -0.39                | 37.09                |

Table 3. COMPARISON OF TRANSISTOR COUNT AND ERROR DISTANCE FOR VARIOUS APPROXIMATE ADDERS

| Design   | Transistor Count | Total Error<br>Distance |
|----------|------------------|-------------------------|
| LOA [4]  | 8                | 4                       |
| AMA1 [3] | 16               | 3                       |
| AMA2 [3] | 14               | 3                       |
| AMA3 [3] | 11               | 4                       |
| AXA1     | 8                | 4                       |
| AXA2     | 6                | 4                       |
| AXA3     | 8                | 2                       |

#### REFERENCES

- [1] J. Han and M. Orshansky, "Approximate computing: an emerging paradigm for energy-efficient design," in ETS'13, May 2013.
- [2] R. Hegde and N.R. Shanbhag, "Soft digital signal processing," IEEE Trans. VLSI Syst., vol. 9, no. 6, pp. 813–823, 2001.
- [3] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," ISLPED 2011, Aug. 1-3, 2011.
- [4] H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, April 2010.
- [5] J. Liang, J. Han and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Computers, 2013, in press. Advance access at IEEEXplore.
- [6] H.A. Mahmoud and M.A. Bayoumi, "A 10-transistor low-power high-speed full adder cell," ISCAS'99, vol. 1, pp. 43-46, 1999.
- [7] J.-F. Lin, Y.-T. Hwang, M.-H. Sheu and C.-C. Ho, "A novel high-speed and energy efficient 10-transistor full adder design," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 54, no. 5, May 2007.
- [8] C. Liu, J. Han and F. Lombardi, "A high-performance approximate multiplier with configurable partial error recovery," technical report, University of Alberta, 2013.
- [9] J. Han, H. Chen, J. Liang, P. Zhu, Z. Yang and F. Lombardi, "A stochastic computational approach for accurate and efficient reliability evaluation," IEEE Trans. Computers, 2013, in press. Advance access at IEEEXplore.
- [10] H. Chen, J. Han and F. Lombardi, "A transistor-level stochastic approach for evaluating the reliability of digital nanometric CMOS circuits," DFT 2011, Vancouver, BC, Canada, pp. 60-67, 2011.