# Logic-in-Memory (LiM) with a Non-Volatile Programmable Metallization Cell (PMC)

Pilin Junsangsri, Student IEEE, Jie Han, Member IEEE and Fabrizio Lombardi, Fellow IEEE

Abstract-This paper introduces two new cells for Logic-in-Memory (LiM) operation. The first novelty of these cells is a resistive RAM configuration that utilizes a Programmable Metallization Cell (PMC) as the non-volatile element. CMOS transistors and ambipolar transistors are used as the processing and control elements for the logic operations of the LiM cells. The first cell employs both ambipolar transistors and MOSFETs in its logic circuit (7T2A1P), while the second LiM cell uses only MOSFETs (9T1P) to implement logic functions such as AND, OR and XOR. The proposed cells operate in voltage mode, unlike previous designs in which a LiM cell operates in current mode. Extensive HSPICE simulation results are provided to evaluate these cells; the comparison shows that the two proposed cells outperform previous LiM cells in metrics such as logic operation delay, PDP, circuit complexity, write time and output swing.

*Index Terms*—Logic-in-Memory, Emerging Technology, Programmable Metallization Cell (PMC), HSPICE, Non-volatile Memory.<sup>\*</sup>

#### I. INTRODUCTION

Data transfer between memory and processing units is a major concern in today's high-performance digital systems [1]. The speed gap between memory and processors, together with the communication delay between them, severely limits the computational efficiency of many data-intensive applications [2]. This issue is further compounded by the drastic increase in static power dissipation in nanoscale CMOS [1] [3] and by the length of global interconnections in advanced VLSI circuits [3]. A different type of memory, known as Logic-in-Memory (LiM), has been proposed to alleviate these problems. LiM moves part of the computation directly into the memory array, while retaining compatibility with external chips through a traditional memory interface. This type of memory design also supports programming environments in which computational operations are hidden behind the memory abstraction. In a LiM scheme, the embedded logic and the memory core are integrated at the cell level, thereby reducing the data transfer between memory and processor and improving performance [2]. LiM is valuable for power-critical and data-dominated applications. LiM also accommodates irregular and unpredictable memory access patterns (which do not always allow the memory hierarchy to be exploited effectively) in which a substantial amount of data transfer must be avoided [2]. The functions of the embedded logic of LiM are usually very limited, but they can be very efficient for some applications (such as image processing).

This paper introduces two LiM cells that utilize a resistive element for non-volatile storage [3][4]. A Resistive RAM (RRAM) consisting of a transistor and a Programmable Metallization Cell (PMC) is used as the non-volatile circuit, while either a hybrid circuit (made of MOSFETs and ambipolar transistors) or a fully CMOS-based circuit is added to implement the logic functions (such as AND, OR and XOR). The proposed cells operate in voltage mode, unlike previous works [3] in which the LiM cell operates in current mode. Extensive HSPICE simulation results are provided to evaluate these cells and the cell of [3]; the comparison shows that the two proposed cells outperform [3] in many metrics such as logic operation delay, PDP, circuit complexity, write time and output swing.

#### II. REVIEW

This section reviews the technology and state-of-the-art works as relevant to the proposed LiM cells.

#### A. Programmable Metallization Cell (PMC)

The Programmable Metallization Cell (PMC), also known as Conductive Bridging Random Access Memory (CBRAM), is a resistive switching non-volatile element based on the migration of metal ions through a solid electrolyte and the subsequent formation and dissolution of a metallic *conductive filament* (CF) connecting the two electrodes [5].



Fig. 1. Switching processes of a PMC a) the CF vertically grows prior to set process, b) the CF laterally dissolves prior to reset process

The set (OFF to ON state transition) and the reset (ON to OFF state transition) processes of a PMC device are shown in Figure 1.

P. Junsangsri and F. Lombardi are with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; Email: lombardi@ece.neu.edu. J. Han is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2V4; Email: jhan8@ualberta.ca.

- Under a positive bias, the top active electrode is oxidized, and the fast metal ions (Ag<sup>+</sup> or Cu<sup>2+</sup>) drift toward the bottom electrode, where they form the CF. The CF then grows vertically until it reaches the top electrode, at which point the set occurs. Following the set, the CF grows laterally and its diameter keeps increasing, because more metal ions are present around it [6].
- For the reset process, a negative voltage bias is applied across the PMC (Figure 1b) and the CF tends to dissolve laterally, because the lateral electric field is enhanced at the top of the CF [6]. The reset is completed when the diameter of the conductive filament shrinks to zero at the top electrode. After the reset, the CF dissolves vertically and its height keeps decreasing.

Hence, the switching process of a PMC has a *transition point* that occurs whenever the tip of the CF touches, or separates from, the top electrode. The PMC resistance depends on the CF height (h), which determines the OFF-state resistance ($R_{off}$), and on the CF radius (r), which determines the ON-state resistance ($R_{on}$). The OFF state occurs when the tip of the conductive filament is separated from the top electrode; in this case, h is less than the film thickness of the solid electrolyte, i.e., the height of the PMC (L). For a given h, the OFF-state resistance ($R_{off}$) is given by the sum of two resistances in series [6] as

$$R_{off} = \frac{\rho_{on} h + \rho_{off} (L - h)}{A} \tag{1}$$

where  $\rho_{on}$  is the CF resistivity,  $\rho_{off}$  is the non-conducting solid-electrolyte resistivity, L is the film thickness of the solid electrolyte and A is the area at the bottom of the CF (on the assumption that it is cylindrical before the set process).

The ON state of a PMC occurs when the tip of the CF touches the top electrode; the ON-state resistance ($R_{on}$) then depends on the CF radius (r). Since the conductive filament is conical, the ON-state resistance of a PMC is given by

$$R_{\rm on} = \rho_{\rm on} L / (\pi r R) \tag{2}$$

where r is the CF radius at the top electrode and R is the radius at the bottom of the CF.
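Equations (1) and (2) can be evaluated directly once the geometry and the resistivities are known. The sketch below is a minimal Python transcription, assuming all quantities are given in consistent SI units; the function names are hypothetical, and since $\rho_{on}$ and $\rho_{off}$ are not given numerically in this section they are left as parameters rather than filled in.

```python
import math


def pmc_off_resistance(h: float, L: float, A: float, rho_on: float, rho_off: float) -> float:
    """Eq. (1): a CF segment of height h in series with the remaining (L - h)
    of non-conducting electrolyte, both with cross-sectional area A."""
    if not 0.0 <= h < L:
        # h == L would mean the CF touches the top electrode (ON state)
        raise ValueError("OFF state requires 0 <= h < L")
    if A <= 0.0:
        raise ValueError("area must be positive")
    return (rho_on * h + rho_off * (L - h)) / A


def pmc_on_resistance(r: float, R: float, L: float, rho_on: float) -> float:
    """Eq. (2): conical (truncated-cone) CF of length L with radius r at the
    top electrode and radius R at the bottom."""
    if r <= 0.0 or R <= 0.0:
        raise ValueError("radii must be positive")
    return rho_on * L / (math.pi * r * R)
```

As a sanity check, setting r = R in Eq. (2) recovers the resistance of a cylinder, $\rho_{on} L / (\pi R^2)$, and setting h = 0 in Eq. (1) leaves only the electrolyte term $\rho_{off} L / A$.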

A significant advantage of a PMC is its very large resistance range compared with other resistive element technologies (such as the MTJ and the memristor) [7][8]; however, programming a PMC requires a voltage that is usually larger than the supply voltage of nanoscale CMOS (this voltage is denoted as V<sub>dh</sub>). Another advantage of a PMC is its relatively small size compared with other non-volatile resistive elements, such as the MTJ and PCM.

#### B. Ambipolar Transistors

Unlike a traditional (unipolar silicon) CMOS device, whose behavior (either p-type or n-type) is fixed at fabrication, an ambipolar device can be switched from p-type to n-type operation (or vice versa) by changing the gate bias [9][10]. Ambipolar conduction is characterized by the superposition of electron and hole currents; this behavior has been experimentally reported in different emerging technologies. An ambipolar transistor can be used to control the direction of the current based on the voltage at the so-called *polarity gate*. A 4-terminal ambipolar transistor (Double Gate MOSFET, or DG-FET) is utilized in this paper. The second gate (referred to as the Polarity Gate, PG) controls its polarity, i.e., when PG is set to logic '0', the ambipolar transistor behaves like an NMOS; when PG is set to logic '1', it behaves like a PMOS [11]. The symbol and the modes of operation of the ambipolar transistor used in this paper are shown in Figure 2.



Fig. 2. Ambipolar transistor, a) Symbol, b) Characteristics

To the best knowledge of the authors, no HSPICE-compatible model that fully simulates the behavior of an ambipolar transistor is available in the technical literature. [11] has presented a model of an ambipolar transistor consisting of a PMOS and an NMOS; the operation of the ambipolar transistor is verified at a functional, macroscopic level by utilizing a circuit that has the same switching characteristics as its physical implementation. In this paper, the model of Figure 3 is utilized at the macroscopic level for simulating the characteristics of an ambipolar transistor; it uses two ideal switches and two MOSFETs.



Fig. 3. Model of ambipolar transistor

The behavior of the ambipolar transistor is determined by the voltage at its polarity gate. If the voltage at node PG is GND, switch Sw1 is ON and Sw2 is OFF, so the ambipolar transistor behaves as an NMOS. If the voltage at the polarity gate is $V_{DD}$, switches Sw1 and Sw2 are OFF and ON, respectively, so the ambipolar transistor behaves as a PMOS. Several ambipolar-based gates (NAND and NOR) have been proposed in [11]; their performance (delay and power dissipation) has been shown to be superior to that of their CMOS counterparts [11].
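The switch-level behavior described above can be captured by a small Boolean model. The sketch below is a hypothetical Python helper, not the HSPICE model of Figure 3: it treats Sw1, Sw2 and both MOSFETs as ideal switches and ignores threshold-voltage drops. It makes explicit that the device conducts exactly when the gate and polarity-gate values differ, which is the basis of the XOR behavior exploited later in the LiM cells.

```python
def ambipolar_conducts(gate: int, pg: int) -> bool:
    """Idealized switch-level model of the ambipolar transistor of Fig. 3.

    pg == 0: Sw1 ON, Sw2 OFF -> acts as an NMOS (conducts when gate == 1).
    pg == 1: Sw1 OFF, Sw2 ON -> acts as a PMOS (conducts when gate == 0).
    """
    if gate not in (0, 1) or pg not in (0, 1):
        raise ValueError("gate and pg must be logic 0 or 1")
    sw1_on = pg == 0   # selects the NMOS branch
    sw2_on = pg == 1   # selects the PMOS branch
    nmos_on = gate == 1
    pmos_on = gate == 0
    return (sw1_on and nmos_on) or (sw2_on and pmos_on)


# Enumerate all gate/PG combinations; the result matches gate XOR pg.
for g in (0, 1):
    for p in (0, 1):
        print(g, p, ambipolar_conducts(g, p))
```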



Fig. 4. a) Input Images b) Output Images when AND, OR, XOR operations between the two input images and the inverse operation of input 1 are executed

#### III. LOGIC-IN-MEMORY

Logic-in-Memory (LiM) is a processing paradigm that exploits the large volume of storage found in today's computing systems for performance improvements in specific applications. An application suitable for LiM is image processing: the pixels of an image are stored in memory, and data from another image (which could also be stored in memory) is then provided as input for processing.

Figure 4 shows two input images, together with the output images obtained by processing the two input images on a pixel basis with different logic operations (AND, OR, XOR, and NOT). The advantage of LiM is that processing is performed locally in memory, thus avoiding the delay of moving data between memory and processor. However, only limited processing capabilities can be provided in each memory cell, so applications based on SIMD (single instruction, multiple data) computation are best suited to LiM.
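The pixel-wise processing of Figure 4 is data-parallel: the same operation is applied independently to every pixel pair, which is why it maps onto memory cells that each combine one stored bit with one input bit. The sketch below illustrates this on two tiny binary images; the image contents are made up for illustration and the function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # binary image: every pixel is 0 or 1


def pixelwise(op: Callable[[int, int], int], img1: Image, img2: Image) -> Image:
    """Apply a two-input bit operation to every pixel pair (one 'cell' per pixel)."""
    if len(img1) != len(img2) or any(len(r1) != len(r2) for r1, r2 in zip(img1, img2)):
        raise ValueError("images must have the same shape")
    return [[op(p, q) for p, q in zip(r1, r2)] for r1, r2 in zip(img1, img2)]


def invert(img: Image) -> Image:
    """Pixel-wise NOT of a single image."""
    return [[1 - p for p in row] for row in img]


stored = [[1, 1, 0], [0, 1, 0]]    # image held in memory (illustrative values)
incoming = [[1, 0, 0], [1, 1, 1]]  # image supplied as input (illustrative values)

and_img = pixelwise(lambda p, q: p & q, stored, incoming)  # [[1, 0, 0], [0, 1, 0]]
or_img = pixelwise(lambda p, q: p | q, stored, incoming)   # [[1, 1, 0], [1, 1, 1]]
xor_img = pixelwise(lambda p, q: p ^ q, stored, incoming)  # [[0, 1, 0], [1, 0, 1]]
not_img = invert(stored)                                   # [[0, 0, 1], [1, 0, 1]]
```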

LiM has also been analyzed for non-volatile memories, such as those utilizing magnetic tunnel junctions (MTJs) [3]; non-volatile memories can then be utilized together with CMOS-based gates for LiM.



Fig. 5. General Structure of MTJ-based LiM cell of [3]

Figure 5 shows the general structure of the MTJ-based LiM cell of [3]; it consists of three parts: a cross-coupled keeper (CCK), a logic-circuit tree and a dynamic current source (DCS). The CCK generates the complementary binary outputs (z and z') according to a magnitude comparison between two current signals ($I_{Z}$ and $I_{Z'}$). The current difference is sensed precisely by using a feedback circuit. The DCS makes it possible to cut off the steady current from V<sub>DD</sub> to GND, thus resulting in low power dissipation. Logic circuits are realized by programming the configuration of the logic-circuit tree [3]; 14 transistors, 2 MTJ devices and a capacitor are required to implement a two-input AND or a two-input OR gate. These two gates are obtained by changing the wired-connection points of the logic-circuit tree. [3] has shown that LiM cells can be used to implement a full adder; however, the logic operations of [3] are fixed, resulting in considerable circuit complexity (as measured by the number of required transistors). [12] has presented a two-input LUT using LiM to address these concerns; this circuit requires 16 CMOS transistors, 4 MTJ devices, 1 reference resistor and a capacitor. Flexibility in logic operations is therefore improved, but the issue of circuit complexity remains. [13] has presented a synchronous non-volatile logic gate design based on resistive switching memories; the principles of this LiM circuit are similar to [3]. However, the LiM circuit of [13] requires more transistors than [3]: 38 CMOS transistors and 4 resistive devices are required in [13] to implement a full adder, compared with 32 transistors, 4 MTJs and 2 capacitors in [3].

#### IV. PROPOSED PMC-BASED LOGIC-IN-MEMORY

Next, two PMC-based LiM cells are proposed; the programmable metallization cell (PMC) is used as the non-volatile storage element, while CMOS transistors (as well as ambipolar transistors) are used as control/processing elements. These cells operate in voltage mode, unlike the current-mode operation of previous LiM schemes [3] [12].



Fig. 6. General structure of the proposed (PMC-based) LiM cell

Figure 6 presents the general structure of the proposed PMC-based LiM cell. The memory is a Resistive RAM (RRAM) that consists of a transistor and a programmable metallization cell (PMC), i.e., a 1T1P cell. The voltage at node D corresponds to the data stored in the PMC, while its complement (DN) is generated by an inverter. The logic circuit of the LiM cell is then designed using two different schemes. In the first scheme, ambipolar transistors are employed to implement some of the logic functions for LiM. The second scheme is CMOS-based and implements the AND/OR/XOR/Inverter (AOXI) functions as the logic circuit of Figure 6.

Throughout this manuscript, the proposed cells are simulated using HSPICE, while the model of [14] is employed for the PMC. The resistance range of the PMC is $30\,k\Omega$ to $100\,M\Omega$. The largest values of the CF height (L) and CF radius (R) of the PMC are 1.5nm and 25.2nm, respectively, while the threshold CF height (h<sub>th</sub>) and radius (r<sub>th</sub>) of the PMC [14] are selected as 1.45nm and 0.225nm, respectively. With these parameters, the OFF-state resistance of the PMC is $99.958\,M\Omega$, while its ON-state resistance is $30.063\,k\Omega$. Unless otherwise specified, a 32nm CMOS feature size is assumed (with a supply voltage of 0.9V).

#### 1) Write Operation

The write operation of the LiM cell starts by setting the voltages at BL and Ctrl2, with the voltage at WL at $V_{DD}$. The PMC switches once the required voltage difference is established across it. To reduce the write time of the PMC, a higher supply voltage is needed; in this paper, the supply voltage used for writing in the simulation is 3.3V, and the write '1' (write '0') time is 4.8ps (21.081ps). The write time of this LiM cell is therefore 21.081ps (corresponding to the worst case).



Fig. 7. First proposed (ambipolar-based) LiM cell



Fig. 8. Write time vs supply voltage of the proposed PMC-based LiM

Figure 8 shows the write time of the proposed PMC-based LiM versus the supply voltage. As the supply voltage increases, the voltage difference across the PMC increases, so the write time is reduced.





Fig. 9. Voltage at node D in the read operation for a '1' as data stored in the PMC

Figure 9 shows the voltage at D of the proposed cell when a '1' is stored in the PMC. As the PMC resistance in state '1' ('0') is low (high), the voltage at D rises to $V_{DD}$ (remains at GND). The read delay is 20.43ps.

#### A. Ambipolar-based LiM

Figure 7 shows the first proposed LiM cell, in which two ambipolar transistors are utilized together with MOSFETs. In addition to the two ambipolar transistors, 7 MOSFETs and 1 PMC are required in the cell of Figure 7, i.e., it is a 7T2A1P cell. The LiM cell operates as follows. The data stored as PMC resistance is read as a voltage at node D by setting the voltages at lines WL and Ctrl2 to GND and $V_{DD}$, respectively. If a '0' ('1') is stored in the cell, the voltage at D is at GND ($V_{DD}$). The input data are applied as voltages at nodes XA and XO, and the voltage at node OUT ($V_{OUT}$) is precharged to $V_{DD}$ prior to any logic operation. Next, the simulation results for the cell of Figure 7 are presented.

1) AND Function: For the AND operation, the voltages at XCont and XO are always set to GND (0V), so transistors MXOR and ML2 are OFF and ON, respectively. The input (as voltage at XA) is then ANDed with the stored data (voltage at D). The only case in which OUT remains at its precharged value ($V_{DD}$) is when both D and XA are at $V_{DD}$. In this case, DN and XAB are at GND and transistors ML1 and ML3 are OFF. As transistor MXOR is also OFF, there is no direct path between the match line (OUT) and GND, so OUT retains its value.

Figure 10 shows the voltage at D, the precharge voltage and the output voltage when a '1' is stored in the PMC and a '0' is provided as input data. A PMOS transistor precharges OUT to $V_{DD}$ while the RRAM is read, before the logic operation starts. After the data stored in the RRAM has been read (at 20ps), the gate voltage of the precharge transistor (i.e., the voltage at node Pre) is set to $V_{DD}$, turning it OFF. OUT is thus isolated from the precharge supply, and the AND operation is then evaluated according to the stored and input data.



Fig. 10. AND operation between a '1' stored in the PMC and '0' as input data

TABLE I. PERFORMANCE OF PROPOSED LIM CELL WHEN OPERATING THE AND FUNCTION

| D       | XA | OUT    | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---------|----|--------|------------|------------|----------------------------|
| 0       | 0  | 0      | 43.471     | 12.793     | 5.5613                     |
| 0       | 1  | 0      | 32.207     | 11.437     | 3.6835                     |
| 1       | 0  | 0      | 37.066     | 17.634     | 6.5363                     |
| 1       | 1  | 1      | 20.02      | 19.201     | 3.8439                     |
| Average |    |        | 33.191     | 15.266     | 4.906                      |

Table I shows the delay, power dissipation and power-delay product (PDP) of the proposed LiM cell when the AND operation is executed. Note that the delay is measured from the start of the PMC read operation until the output voltage reaches a stable state. The worst-case delay is 43.471ps and occurs when both the stored and input data values are '0'.
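The PDP column follows from the delay and power columns, but the units need care: ps multiplied by µW gives $10^{-18}$ J. The sketch below recomputes the PDP values and the Average row of Table I from the tabulated delay and power; small differences in the last digit can arise because the tabulated delay and power are themselves rounded. The names are illustrative.

```python
# (D, XA, OUT, delay in ps, power in uW) for each row of Table I
TABLE_I = [
    (0, 0, 0, 43.471, 12.793),
    (0, 1, 0, 32.207, 11.437),
    (1, 0, 0, 37.066, 17.634),
    (1, 1, 1, 20.02, 19.201),
]


def pdp_e16(delay_ps: float, power_uw: float) -> float:
    """Power-delay product in units of 1e-16 J.

    ps * uW = 1e-12 s * 1e-6 W = 1e-18 J, so dividing by 100 gives 1e-16 J.
    """
    return delay_ps * power_uw / 100.0


def mean(xs):
    return sum(xs) / len(xs)


pdps = [pdp_e16(delay, power) for (_, _, _, delay, power) in TABLE_I]
avg_delay = mean([row[3] for row in TABLE_I])
avg_power = mean([row[4] for row in TABLE_I])
avg_pdp = mean(pdps)  # mean of the per-case PDPs, as in the "Average" row
print(pdps)
print(avg_delay, avg_power, avg_pdp)
```

Note that the Average PDP (about 4.906) is the mean of the per-case PDPs, not the product of the average delay and the average power (which would give about 5.07).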

2) OR Function: For the OR operation, the voltages at XA and XCont are at $V_{DD}$ and GND, respectively, so transistors ML1 and MXOR are OFF. The only condition for which OUT is discharged to GND is when both the stored and input data are '0'. In this case, the voltages at DN and XOB are at $V_{DD}$, so transistors ML2 and ML3 are ON and OUT is discharged to GND.

TABLE II. PERFORMANCE OF PROPOSED LIM CELL WHEN OPERATING THE OR FUNCTION

| D | XO      | OUT | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---|---------|-----|------------|------------|----------------------------|
| 0 | 0       | 0   | 32.207     | 11.437     | 3.6835                     |
| 0 | 1       | 1   | 20.05      | 0.38001    | 0.076192                   |
| 1 | 0       | 1   | 20.02      | 19.201     | 3.8439                     |
| 1 | 1       | 1   | 20.03      | 12.029     | 2.4094                     |
|   | Average |     | 23.077     | 10.762     | 2.503                      |

Table II shows the delay, power dissipation and power delay product (PDP) when the OR operation is executed; the worst case delay is 32.207ps.

3) *XOR Function:* For the XOR operation, the voltages at XCont and XOB are at $V_{DD}$ and GND, respectively, so transistors MXOR and ML2 are ON and OFF, respectively. As mentioned previously, the behavior of an ambipolar transistor is controlled by the voltage at its polarity gate: if this voltage is $V_{DD}$ (GND), the ambipolar transistor behaves as a PMOS (NMOS). Hence, an ambipolar transistor can operate as an XOR gate [11]. However, when an ambipolar transistor operates as a PMOS with its gate at GND, it cannot fully discharge OUT to GND, because a voltage drop remains across the transistor. A second ambipolar transistor, behaving as an NMOS, is used to address this problem, so that OUT is fully discharged to GND during the discharging process.

TABLE III. PERFORMANCE OF PROPOSED LIM CELL WHEN OPERATING THE XOR FUNCTION

| D       | XA | OUT    | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---------|----|--------|------------|------------|----------------------------|
| 0       | 0  | 0      | 40.273     | 12.437     | 5.009                      |
| 0       | 1  | 1      | 20.05      | 0.39056    | 0.078308                   |
| 1       | 0  | 1      | 20.05      | 17.215     | 3.4517                     |
| 1       | 1  | 0      | 36.095     | 14.546     | 5.2505                     |
| Average |    |        | 29.117     | 11.147     | 3.447                      |

Table III shows the delay, power dissipation and power delay product (PDP) for the XOR operation. The worst case delay is 40.273 ps.
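Functionally, each mode of the 7T2A1P cell is a precharge-evaluate (dynamic) operation: OUT is precharged to 1 and is pulled to 0 only when a discharge path conducts. The sketch below is a Boolean-level model under that assumption; the discharge conditions are taken from the descriptions above (and, for XOR, from the OUT column of Table III) rather than from a transistor netlist, and the function names are hypothetical. For OR, the input bit is the one applied at XO.

```python
def precharge_evaluate(discharge: bool) -> int:
    """OUT is precharged to 1 (VDD); it falls to 0 only if a pull-down path conducts."""
    out = 1          # precharge phase
    if discharge:
        out = 0      # evaluation phase: OUT discharged to GND
    return out


def ambipolar_lim(mode: str, d: int, x: int) -> int:
    """Boolean-level model of the 7T2A1P cell; d = stored bit at D, x = input bit."""
    dn = 1 - d  # complement of the stored bit (node DN)
    xb = 1 - x  # complemented input (node XAB or XOB)
    if mode == "AND":   # XCont = XO = GND, input at XA: discharge if DN or XAB is high
        return precharge_evaluate(dn == 1 or xb == 1)
    if mode == "OR":    # XA = VDD, XCont = GND, input at XO: discharge only if DN and XOB are high
        return precharge_evaluate(dn == 1 and xb == 1)
    if mode == "XOR":   # XCont = VDD, XOB = GND, input at XA: discharge when D equals XA
        return precharge_evaluate(d == x)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("AND", "OR", "XOR"):
    print(mode, [ambipolar_lim(mode, d, x) for d in (0, 1) for x in (0, 1)])
```

The printed lists follow the row order (D, input) = 00, 01, 10, 11 of Tables I to III and reproduce their OUT columns.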

#### B. CMOS-based LiM

This section presents the second proposed LiM cell; this cell still utilizes a PMC as a non-volatile storage element, while only CMOS transistors are used as control and processing elements.



Fig. 11. Second proposed (CMOS-based) LiM cell (9T1P)

Figure 11 shows the proposed cell that implements the AND/OR/XOR/Inverter (AOXI) logic function. This cell requires 9 MOSFETs and 1 PMC, i.e., it is 9T1P. The data stored in the PMC is read by setting the voltages at lines WL and Ctrl2 to GND and $V_{DD}$, respectively; if this data is '0' ('1'), the voltage at D is at GND ($V_{DD}$). As in the first proposed design, the voltage at OUT ($V_{OUT}$) is precharged to $V_{DD}$ prior to any logic operation. The input voltages are provided at XA, XO, Cinv and ContX, such that the AND/OR/XOR/Inverter function between the stored and input data is generated.

1) AND Function: For the AND operation, the voltages at Cinv, ContX, and XO are always at GND (0V). Transistors Minv and ML5 are OFF while transistor ML2 is ON. The AND operation depends on the value of the input data given by the voltage at XA.

TABLE IV. PERFORMANCE OF PROPOSED CMOS-BASED LIM CELL FOR AND FUNCTION

| D | XA      | OUT | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---|---------|-----|------------|------------|----------------------------|
| 0 | 0       | 0   | 29.94      | 14.411     | 4.3148                     |
| 0 | 1       | 0   | 32.67      | 11.234     | 3.6701                     |
| 1 | 0       | 0   | 34.942     | 16.779     | 5.8629                     |
| 1 | 1       | 1   | 20.03      | 16.664     | 3.3379                     |
|   | Average |     | 29.3955    | 14.772     | 4.29643                    |

The voltage at OUT remains at its precharged value ($V_{DD}$) only when D and XA are at $V_{DD}$ (DN and XAB are at GND): transistors ML1 and ML3 are OFF and, as transistor ML5 is also OFF, there is no direct path between OUT and GND, so OUT retains its value. For the other input combinations, transistor ML2 is ON and, depending on the voltages at DN and XAB, transistor ML1 or ML3 is ON, creating a direct path between OUT and GND, so the output voltage is discharged to GND. The simulation results (Table IV) show that the worst-case delay of this cell is 34.942ps, lower than the 43.471ps of the first proposed cell.

2) OR Function: For the OR operation, the voltages at Cinv and ContX are always at GND (0V), while the voltage at XA is at $V_{DD}$. Transistors Minv, ML3, and ML5 are OFF, while the input signal is provided at XO.

TABLE V. PERFORMANCE OF PROPOSED CMOS-BASED LIM CELL FOR OR FUNCTION

| D       | XO | OUT    | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---------|----|--------|------------|------------|----------------------------|
| 0       | 0  | 0      | 32.67      | 11.234     | 3.6701                     |
| 0       | 1  | 1      | 20.05      | 0.4428     | 0.088781                   |
| 1       | 0  | 1      | 20.03      | 16.664     | 3.3379                     |
| 1       | 1  | 1      | 20.03      | 9.4527     | 1.8934                     |
| Average |    |        | 23.195     | 9.448      | 2.24755                    |

Table V shows the simulation results for the OR operation. $V_{OUT}$ is discharged to GND if and only if the voltages at D and XO are at GND; otherwise, no path between OUT and GND exists and the output retains its precharged value. The worst-case delay of the proposed LiM cell for the OR function is 32.67ps.

3) *XOR Function:* For the XOR operation, the voltages at XA and Cinv are at $V_{DD}$ and GND, respectively, so transistors Minv and ML3 are OFF. The voltage at ContX is the same as the voltage at XO.

TABLE VI. PERFORMANCE OF PROPOSED CMOS-BASED LIM CELL FOR XOR FUNCTION

| D | XO      | OUT | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---|---------|-----|------------|------------|----------------------------|
| 0 | 0       | 0   | 32.670     | 11.234     | 3.6701                     |
| 0 | 1       | 1   | 20.05      | 0.44755    | 0.089734                   |
| 1 | 0       | 1   | 20.03      | 16.664     | 3.3379                     |
| 1 | 1       | 0   | 33.090     | 12.764     | 4.2236                     |
|   | Average |     | 26.46      | 10.2774    | 2.83033                    |

Table VI shows the simulation results; the worst-case delay is 33.09ps. The operation of transistors ML1, ML2, ML4, and ML5 depends on the input signal and its comparison with the data stored in the cell.

4) Inverter: The proposed cell also implements the inverse (NOT) of the stored data; transistor Minv is provided for this purpose. Transistors ML2 and ML5 are OFF, while transistor Minv is ON; this is achieved by setting the voltages at XO and Cinv to $V_{DD}$ (and ContX to GND).

 
TABLE VII. PERFORMANCE OF PROPOSED CMOS-BASED LIM CELL FOR INVERSE FUNCTION

| D       | OUT | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---------|-----|------------|------------|----------------------------|
| 0       | 1   | 20.05      | 0.40612    | 0.081427                   |
| 1       | 0   | 35.820     | 10.057     | 3.6024                     |
| Average |     | 27.935     | 5.23156    | 1.8419                     |

Table VII shows the delay, power dissipation and PDP for the inverter function. The worst case delay occurs when a '1' is stored and is given by 35.82ps.
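The four modes of the 9T1P cell differ only in how the control lines are driven. The sketch below gathers the settings stated above into a lookup table and pairs each mode with the Boolean function expected at OUT. In it, "in" marks a line driven by the input bit, lines not mentioned in the text for a mode (such as XA in the inverter mode) are simply omitted, and all names are hypothetical.

```python
from typing import Dict, Union

Level = Union[int, str]  # 1 = VDD, 0 = GND, "in" = driven by the input bit

# Control-line settings per mode, as described in the text for the 9T1P cell.
MODES: Dict[str, Dict[str, Level]] = {
    "AND": {"Cinv": 0, "ContX": 0, "XO": 0, "XA": "in"},
    "OR":  {"Cinv": 0, "ContX": 0, "XA": 1, "XO": "in"},
    "XOR": {"Cinv": 0, "XA": 1, "XO": "in", "ContX": "in"},  # ContX tracks XO
    "INV": {"Cinv": 1, "ContX": 0, "XO": 1},                 # Minv ON
}

# Boolean function expected at OUT (d = stored bit, x = input bit).
FUNCTIONS = {
    "AND": lambda d, x: d & x,
    "OR":  lambda d, x: d | x,
    "XOR": lambda d, x: d ^ x,
    "INV": lambda d, x: 1 - d,  # input bit ignored
}


def drive_lines(mode: str, x: int) -> Dict[str, int]:
    """Resolve the control-line levels for one operation with input bit x."""
    return {line: (x if level == "in" else int(level)) for line, level in MODES[mode].items()}


def cmos_lim(mode: str, d: int, x: int = 0) -> int:
    """Expected OUT of the 9T1P cell for the given mode."""
    return FUNCTIONS[mode](d, x)


print(drive_lines("XOR", 1))  # {'Cinv': 0, 'XA': 1, 'XO': 1, 'ContX': 1}
```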

#### V. FULL ADDER EVALUATION

In this section, the proposed LiM cells are utilized to design a full adder.

$$Sum = A \oplus B \oplus C_{in} \tag{3}$$

$$C_{out} = (A \cdot B) + [C_{in} \cdot (A \oplus B)] \tag{4}$$

(3) and (4) give the logic functions of a full adder; A and B are the input (one-bit) operands,  $C_{in}$  is the carry-in input, Sum is the sum output and  $C_{out}$  is the carry-out bit.
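Equations (3) and (4) can be checked exhaustively against binary addition, since a full adder must satisfy $A + B + C_{in} = 2\,C_{out} + Sum$. The sketch below does this in Python; the function name is illustrative, and the loop order was chosen to match the row order of Tables VIII and IX.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Return (cout, s) using Eqs. (3) and (4)."""
    s = a ^ b ^ cin                   # Eq. (3)
    cout = (a & b) | (cin & (a ^ b))  # Eq. (4)
    return cout, s


for cin in (0, 1):
    for b in (0, 1):
        for a in (0, 1):
            cout, s = full_adder(a, b, cin)
            assert a + b + cin == 2 * cout + s  # arithmetic check of the truth table
            print(a, b, cin, "->", cout, s)
```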

#### A. Ambipolar-based LiM Cell

Four of the proposed ambipolar-based LiM cells (shown in Figure 7) are required to design a full adder.



Figure 12. Full adder using the proposed ambipolar-based LiM cell

Since the input data of a cell is applied in complemented form and the outputs of cells A and C are used as inputs to cells B and D, respectively, the outputs of cells A and C must be inverted. As shown in Figure 12, Sum is calculated by cells A and B, while $C_{out}$ is calculated by cells C and D. Cell A generates the XNOR of the input bits A and B by setting the voltages at XCont and XOB to V<sub>DD</sub> and GND. This output voltage is connected to the input XAB of cell B, which executes the XOR of A, B, and C<sub>in</sub> with XCont and XOB set to V<sub>DD</sub> and GND and C<sub>in</sub> provided as the voltage at D. Two cells in series are also required to generate C<sub>out</sub>: as shown in Figure 12, the output of cell C is connected to XOB of cell D. The operation of the full adder is controlled by the voltages at D, XAB, XCont and XOB of each cell, as shown in Figure 12.
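The Sum path can be traced at the Boolean level to see why the output of cell A must be inverted. The sketch below assumes that cell A stores A at D and receives B as its input (the text does not state which operand is stored), models each cell in XOR mode as OUT = D XOR XA as in Table III, and treats node XAB as the complement of XA. The function names are hypothetical, and the $C_{out}$ path through cells C and D is not modeled because its internal settings are not detailed here.

```python
def xor_mode_cell(d: int, xa: int) -> int:
    """A LiM cell in XOR mode: OUT = D XOR XA (cf. Table III)."""
    return d ^ xa


def sum_path(a: int, b: int, cin: int) -> int:
    # Cell A (assumed: A stored at D, B applied as input); its output is inverted -> XNOR(A, B)
    cell_a_out = 1 - xor_mode_cell(a, b)
    # The XNOR drives XAB of cell B; XAB is the complement of XA, so XA = A XOR B
    xa_of_b = 1 - cell_a_out
    # Cell B holds Cin at D, producing A XOR B XOR Cin, i.e. Sum in Eq. (3)
    return xor_mode_cell(cin, xa_of_b)


assert all(sum_path(a, b, c) == a ^ b ^ c for a in (0, 1) for b in (0, 1) for c in (0, 1))
```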

Since cells B and D are connected in series to cells A and C respectively (Figure 12), simulation must take into account these two steps.



Figure 13. Voltages at Pre1, Pre2, C<sub>out</sub> and Sum when A, B, and C<sub>in</sub> are in states '1', '1', and '0' respectively

Figure 13 shows the voltages at nodes Pre1, Pre2,  $C_{out}$  and Sum when the inputs A, B, and  $C_{in}$  are '1', '1', and '0' respectively. Pre1 is connected to cells A and C, while Pre2 is connected to cells B and D (Ctrl2 of all cells is at  $V_{DD}$ ).

TABLE VIII. PERFORMANCE OF FULL ADDER WHEN IMPLEMENTED USING THE PROPOSED AMBIPOLAR-BASED LIM CELLS

| A | B | Cin | Cout | Sum | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---|---|-----|------|-----|------------|------------|---------------------------|
| 0 | 0 | 0   | 0    | 0   | 59.947     | 33.448     | 20.051                    |
| 1 | 0 | 0   | 0    | 1   | 58.194     | 40.620     | 23.639                    |
| 0 | 1 | 0   | 0    | 1   | 51.593     | 35.522     | 18.327                    |
| 1 | 1 | 0   | 1    | 0   | 59.136     | 34.118     | 20.176                    |
| 0 | 0 | 1   | 0    | 1   | 60.010     | 25.263     | 15.161                    |
| 1 | 0 | 1   | 1    | 0   | 54.743     | 44.776     | 24.512                    |
| 0 | 1 | 1   | 1    | 0   | 54.751     | 39.177     | 21.450                    |
| 1 | 1 | 1   | 1    | 1   | 40.04      | 39.811     | 15.940                    |
| Average |   |     |      |     | 54.8018    | 36.5919    | 19.907                    |

Table VIII shows the delay, power dissipation and power delay product (PDP) of a full adder implemented using the proposed LiM cells. The worst case delay is 60.01ps.

#### B. CMOS-Based LiM Cell

Next, the proposed CMOS-based LiM cells (9T1P cells) are connected as a full adder (Figure 14). In the full adder, lines WL, BL and Cinv are held at GND, while the voltages at Ctrl2 and Pre are controlled to read each cell and to precharge its output.



Figure 15. Voltages at Ctrl2 and Pre of cells A, B, C, and D of full adder

Figure 15 shows the timing diagram of the full adder of Figure 14. Table IX shows the delay, power dissipation and PDP of the full adder when using the proposed cells; the worst-case delay of the full adder with four 9T1P LiM cells (55.774ps) is smaller than with four 7T2A1P LiM cells (60.01ps); however, this design requires more transistors, i.e., 41 transistors and 4 PMCs.

TABLE IX. METRICS OF THE FULL ADDER CELL WHEN IMPLEMENTED USING THE PROPOSED CMOS-BASED LIM CELL

| A | B | Cin | Cout | Sum | Delay (ps) | Power (µW) | PDP (*10 <sup>-16</sup> J) |
|---|---|-----|------|-----|------------|------------|---------------------------|
| 0 | 0 | 0   | 0    | 0   | 54.511     | 44.269     | 24.132                    |
| 1 | 0 | 0   | 0    | 1   | 55.774     | 41.442     | 23.114                    |
| 0 | 1 | 0   | 0    | 1   | 53.507     | 27.092     | 14.496                    |
| 1 | 1 | 0   | 1    | 0   | 55.034     | 44.598     | 24.544                    |
| 0 | 0 | 1   | 0    | 1   | 52.214     | 45.431     | 23.721                    |
| 1 | 0 | 1   | 1    | 0   | 52.340     | 43.918     | 22.987                    |
| 0 | 1 | 1   | 1    | 0   | 52.151     | 27.830     | 14.513                    |
| 1 | 1 | 1   | 1    | 1   | 40.03      | 59.876     | 23.969                    |
| Average |   |     |      |     | 51.945     | 41.807     | 21.4345                   |
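Each PDP entry of Table IX is the product of the delay and power entries in the same row; since 1 ps × 1 µW = 10<sup>-18</sup> J, the product is divided by 100 to express it in units of 10<sup>-16</sup> J. As a consistency check (not part of the authors' HSPICE flow), the minimal Python sketch below recomputes every PDP entry, verifies the Cout and Sum columns against the majority and three-input XOR functions, and extracts the column averages and per-metric worst cases. The helper names `pdp_1e16` and `check_table` are illustrative.

```python
# Table IX rows: (A, B, Cin, Cout, Sum, delay_ps, power_uW, reported PDP in 1e-16 J)
ROWS = [
    (0, 0, 0, 0, 0, 54.511, 44.269, 24.132),
    (1, 0, 0, 0, 1, 55.774, 41.442, 23.114),
    (0, 1, 0, 0, 1, 53.507, 27.092, 14.496),
    (1, 1, 0, 1, 0, 55.034, 44.598, 24.544),
    (0, 0, 1, 0, 1, 52.214, 45.431, 23.721),
    (1, 0, 1, 1, 0, 52.340, 43.918, 22.987),
    (0, 1, 1, 1, 0, 52.151, 27.830, 14.513),
    (1, 1, 1, 1, 1, 40.03, 59.876, 23.969),
]


def pdp_1e16(delay_ps: float, power_uw: float) -> float:
    """PDP in units of 1e-16 J (1 ps * 1 uW = 1e-18 J, hence the division by 100)."""
    return delay_ps * power_uw / 100.0


def check_table(rows):
    """Validate each row and return (averages, worst cases) of delay, power, PDP."""
    for a, b, cin, cout, s, delay, power, pdp in rows:
        assert s == a ^ b ^ cin, "Sum must be the 3-input XOR"
        assert cout == int(a + b + cin >= 2), "Cout must be the majority function"
        # The reported PDP should equal delay * power up to table rounding.
        assert abs(pdp_1e16(delay, power) - pdp) < 0.01
    n = len(rows)
    averages = tuple(sum(r[i] for r in rows) / n for i in (5, 6, 7))
    worst = tuple(max(r[i] for r in rows) for i in (5, 6, 7))
    return averages, worst


averages, worst = check_table(ROWS)
print("averages (ps, uW, 1e-16 J):", [round(v, 4) for v in averages])
print("worst cases (ps, uW, 1e-16 J):", worst)
```

The averages agree with the Average row of Table IX. The per-metric worst cases (55.774 ps, 59.876 µW and 24.544×10<sup>-16</sup> J) come from different input combinations and are the values listed for the CMOS-based full adder in Table XIII.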

#### VI. COMPARISON

In this section, the proposed cells are compared with the LiM cell of [3]; the HSPICE macromodel of [15] is used to simulate the electrical characteristics of the MTJ. The worst-case delay, power dissipation, PDP, write time, circuit complexity and area are considered for the three logic functions (AND, OR and XOR) and for the full adder. Only the area of the CMOS transistors is reported, because the other components of the LiM cells (ambipolar transistors, PMCs, MTJs and capacitors) are commonly stacked [16]. These components occupy a smaller area than CMOS transistors, so stacking them does not cause density issues [16] [17].

TABLE X. AND FUNCTION COMPARISON

| Performance Metric         | Ambipolar-based | CMOS-based | [3]      |
|----------------------------|-----------------|------------|----------|
| Delay (ps)                 | 43.471          | 34.942     | 81.365   |
| Power (µW)                 | 19.201          | 16.779     | 10.688   |
| PDP (×10<sup>-16</sup> J)  | 6.5363                 | 5.8629         | 8.6237                 |
| Write time                 | 21.081 ps              | 21.081 ps      | 2 ns                   |
| Circuit Complexity         | 7 CMOS + 2 AMB + 1 PMC | 9 CMOS + 1 PMC | 14 CMOS + 2 MTJs + 1 C |
| CMOS Area ($\lambda^2$)    | 3422.22                | 4122.96        | 5614.815               |
| Full Swing Output          | Yes                    | Yes            | No                     |

TABLE XI. OR FUNCTION COMPARISON

| Performance Metric         | Ambipolar-based | CMOS-based | [3]      |
|----------------------------|-----------------|------------|----------|
| Delay (ps)                 | 32.207          | 32.67      | 78.125   |
| Power (µW)                 | 19.201          | 16.664     | 10.649   |
| PDP (×10<sup>-16</sup> J)  | 3.8439                 | 3.6701         | 8.2596                 |
| Write time                 | 21.081 ps              | 21.081 ps      | 2 ns                   |
| Circuit Complexity         | 7 CMOS + 2 AMB + 1 PMC | 9 CMOS + 1 PMC | 14 CMOS + 2 MTJs + 1 C |
| CMOS Area ($\lambda^2$)    | 3422.22                | 4122.96        | 5614.815               |
| Full Swing Output          | Yes                    | Yes            | No                     |

TABLE XII. XOR FUNCTION COMPARISON

| Performance Metric         | Ambipolar-based | CMOS-based | [3]      |
|----------------------------|-----------------|------------|----------|
| Delay (ps)                 | 40.273          | 33.090     | 78.445   |
| Power (µW)                 | 17.215          | 16.664     | 10.644   |
| PDP (×10<sup>-16</sup> J)  | 5.2505                 | 4.2236         | 8.3215                 |
| Write time                 | 21.081 ps              | 21.081 ps      | 2 ns                   |
| Circuit Complexity         | 7 CMOS + 2 AMB + 1 PMC | 9 CMOS + 1 PMC | 14 CMOS + 2 MTJs + 1 C |
| CMOS Area ($\lambda^2$)    | 3422.22                | 4122.96        | 5614.815               |
| Full Swing Output          | Yes                    | Yes            | No                     |

Tables X-XII compare the three LiM cells; the proposed cells are superior to [3] in most figures of merit. Their advantages are a lower delay, a lower PDP, a higher switching speed of the resistive element (a write time of 21.081 ps versus 2 ns), a reduced circuit complexity, a smaller CMOS area and a full output voltage swing. The LiM cell of [3] has the lowest power dissipation.
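The comparison in the preceding paragraph can be quantified directly from Tables X-XII. The Python sketch below (the dictionary layout and the names `percent_change` and `compare_with_ref3` are illustrative) expresses the delay, power and PDP of each proposed cell as a percentage change relative to [3] and computes the ratio of write times. For all three functions, the delay and PDP of both proposed cells are lower than those of [3] while the power is higher, and the write time is about 95 times shorter.

```python
# Single-cell values from Tables X-XII:
# (delay in ps, power in uW, PDP in 1e-16 J); "ref3" denotes the cell of [3].
TABLES = {
    "AND": {"ambipolar": (43.471, 19.201, 6.5363),
            "cmos": (34.942, 16.779, 5.8629),
            "ref3": (81.365, 10.688, 8.6237)},
    "OR": {"ambipolar": (32.207, 19.201, 3.8439),
           "cmos": (32.67, 16.664, 3.6701),
           "ref3": (78.125, 10.649, 8.2596)},
    "XOR": {"ambipolar": (40.273, 17.215, 5.2505),
            "cmos": (33.090, 16.664, 4.2236),
            "ref3": (78.445, 10.644, 8.3215)},
}
WRITE_TIME_PS = {"ambipolar": 21.081, "cmos": 21.081, "ref3": 2000.0}


def percent_change(value: float, reference: float) -> float:
    """Relative change in percent; negative means smaller than the reference."""
    return 100.0 * (value - reference) / reference


def compare_with_ref3(tables):
    """Map (function, cell) to the (delay, power, PDP) percent changes vs [3]."""
    result = {}
    for func, cells in tables.items():
        ref = cells["ref3"]
        for name in ("ambipolar", "cmos"):
            result[(func, name)] = tuple(
                percent_change(v, r) for v, r in zip(cells[name], ref))
    return result


changes = compare_with_ref3(TABLES)
for (func, name), (d, p, pdp) in changes.items():
    print(f"{func:3s} {name:9s} delay {d:+6.1f}%  power {p:+6.1f}%  PDP {pdp:+6.1f}%")
print(f"write-time ratio [3]/proposed: {WRITE_TIME_PS['ref3'] / WRITE_TIME_PS['cmos']:.1f}")
```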

TABLE XIII. FULL ADDER COMPARISON

| Performance Metric         | Ambipolar-based | CMOS-based | [3]       |
|----------------------------|-----------------|------------|-----------|
| Delay (ps)                 | 60.01           | 55.774     | 92.894    |
| Power (µW)                 | 44.776          | 59.876     | 17.573    |
| PDP (×10<sup>-16</sup> J)  | 24.512                  | 24.544          | 16.148                 |
| Write time                 | 21.081 ps               | 21.081 ps       | 2 ns                   |
| Circuit Complexity         | 28 CMOS + 8 AMB + 4 PMC | 41 CMOS + 4 PMC | 32 CMOS + 4 MTJs + 2 C |
| CMOS Area ($\lambda^2$)    | 13,688.88               | 19,568.38       | 15,644.44              |
| Full Swing Output          | Yes                     | Yes             | No                     |

Table XIII compares full adders built from four of the proposed PMC-based LiM cells with the MTJ-based full adder of [3]. The proposed designs improve both delay and write time over [3], and their outputs have a full voltage swing. However, their power dissipation and PDP are worse, owing to their larger circuit complexity compared with [3]. The CMOS-based LiM cell outperforms the ambipolar-based cell when the individual logic functions are considered; for the full adder, however, the ambipolar-based design uses fewer transistors than the CMOS-based design (36 versus 41). Therefore, the circuit complexity, CMOS area and power dissipation of the ambipolar-based full adder are better than those of the CMOS-based full adder.
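The transistor counts behind this conclusion can be tallied from Tables X-XIII. The Python sketch below (the `Design` class and the `excess_over_cells` helper are hypothetical) compares each full adder with four copies of its single cell. The ambipolar-based full adder is exactly four 7T2A1P cells (28 CMOS and 8 ambipolar transistors, 4 PMCs), and its CMOS area is exactly four times the single-cell area; the CMOS-based full adder contains five CMOS transistors and 3076.54 λ² of CMOS area beyond four 9T1P cells, whose role is not detailed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Device counts and CMOS-only area (stacked devices are not counted)."""
    cmos: int          # CMOS transistors
    ambipolar: int     # ambipolar transistors
    pmc: int           # programmable metallization cells
    cmos_area: float   # CMOS area in lambda^2

    @property
    def transistors(self) -> int:
        return self.cmos + self.ambipolar


# Single cells from Tables X-XII and full adders from Table XIII.
CELL = {"ambipolar": Design(7, 2, 1, 3422.22), "cmos": Design(9, 0, 1, 4122.96)}
FULL_ADDER = {"ambipolar": Design(28, 8, 4, 13688.88),
              "cmos": Design(41, 0, 4, 19568.38)}
CELLS_PER_ADDER = 4  # each full adder is built from four LiM cells


def excess_over_cells(name: str):
    """Devices and CMOS area in the full adder beyond four copies of the cell."""
    cell, fa = CELL[name], FULL_ADDER[name]
    n = CELLS_PER_ADDER
    return (fa.cmos - n * cell.cmos,
            fa.ambipolar - n * cell.ambipolar,
            fa.pmc - n * cell.pmc,
            round(fa.cmos_area - n * cell.cmos_area, 2))


for name in ("ambipolar", "cmos"):
    print(name, FULL_ADDER[name].transistors, "transistors; excess:",
          excess_over_cells(name))
```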

#### VII. CONCLUSION

Logic-in-Memory (LiM) is a processing paradigm that exploits the large volume of storage found in today's computing systems to improve the performance of specific computational applications. This paper has proposed two novel designs for a non-volatile LiM cell; in both, a resistive RAM (RRAM) element consisting of a transistor and a Programmable Metallization Cell (PMC) is utilized for storage. The first cell employs ambipolar transistors and CMOS in its logic circuit (7T2A1P), while the second uses only MOSFETs (9T1P) to implement logic functions such as AND, OR and XOR. Table XIV ranks these cells and the current-mode cell of [3] according to different circuit-level figures of merit.

TABLE XIV. RANKING OF NONVOLATILE LOGIC IN MEMORY CELLS

| Performance<br>Metric  | Ambipolar-based<br>(7T2A1P) | CMOS-based<br>(9T1P) | [3] |
|------------------------|-----------------------------|----------------------|-----|
| Delay                  | 2                           | 1                    | 3   |
| Power dissipation      | 3                           | 2                    | 1   |
| PDP                    | 2                           | 1                    | 3   |
| Write time             | 1                           | 1                    | 3   |
| Area                   | 1                           | 2                    | 3   |
| Full Swing<br>Output   | 1                           | 1                    | 3   |
| Logic<br>Functionality | 1                           | 1                    | 3   |
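The ranks of the quantitative metrics in Table XIV can be reproduced from the single-cell AND values of Table X, assuming that a lower value ranks better and that tied cells share the best rank (standard competition ranking). Both this assumption and the choice of the AND function are ours, and the Python sketch below (with the hypothetical helper `competition_rank`) is illustrative only; the qualitative rows (full swing output and logic functionality) are not covered.

```python
# Single-cell values from Table X (AND function); lower is better for each metric.
METRICS = {
    "Delay": {"ambipolar": 43.471, "cmos": 34.942, "ref3": 81.365},              # ps
    "Power dissipation": {"ambipolar": 19.201, "cmos": 16.779, "ref3": 10.688},  # uW
    "PDP": {"ambipolar": 6.5363, "cmos": 5.8629, "ref3": 8.6237},                # 1e-16 J
    "Write time": {"ambipolar": 21.081, "cmos": 21.081, "ref3": 2000.0},         # ps
    "Area": {"ambipolar": 3422.22, "cmos": 4122.96, "ref3": 5614.815},           # lambda^2
}


def competition_rank(values: dict[str, float]) -> dict[str, int]:
    """Rank = 1 + number of strictly lower (better) values.

    Tied entries share the same rank: two equal best values both get 1
    and the next value gets 3.
    """
    return {name: 1 + sum(other < v for other in values.values())
            for name, v in values.items()}


ranks = {metric: competition_rank(vals) for metric, vals in METRICS.items()}
for metric, r in ranks.items():
    print(f"{metric:18s} ambipolar={r['ambipolar']} cmos={r['cmos']} [3]={r['ref3']}")
```

With the OR values of Table XI instead, the delay order of the two proposed cells would flip (32.207 ps for the ambipolar-based cell versus 32.67 ps for the CMOS-based cell), so the delay rank depends on the function considered.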

As shown in Table XIV, the cell of [3] has the lowest power dissipation; however, its logic functionality is mostly fixed at manufacturing. The proposed ambipolar-based LiM cell improves over [3] in this respect, because its logic function can be varied through the input voltages. This flexibility comes at the cost of lower performance than the proposed CMOS-based cell under most metrics: the circuit complexity introduced by the ambipolar transistors slightly degrades a few figures of merit, such as delay and PDP. Overall, the proposed CMOS-based cell achieves the best performance in all metrics except power dissipation and area. The best area is achieved by the ambipolar-based cell, because it uses only seven CMOS transistors and its non-CMOS components (ambipolar transistors and PMC) are stacked, so they do not contribute to the CMOS area.

#### REFERENCES

- [1] T. Hanyu, "Challenge of MTJ/MOS-Hybrid Logic-in-Memory Architecture for Nonvolatile VLSI Processor," ISCAS'13, Beijing, pp. 117-120, May 2013.
- [2] Q. Zhu, K. Vaidyanathan, O. Shacham, M. Horowitz, L. Pileggi, F. Franchetti, "Design Automation Framework for Application-Specific Logic in Memory Block," 23<sup>rd</sup> IEEE ASAP, pp. 125-132, July 2012.
- [3] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, T. Endoh, H. Ohno, T. Hanyu, "MTJ-Based Nonvolatile Logic-in-Memory Circuit, Future Prospects and Issues," DATE'09, pp. 433-435, April 2009.
- [4] W. Zhao, G. Prenat, J.-O. Klein, B. Dieny, C. Chappert, D. Ravelosona, "Emerging Hybrid Logic Circuits based on Non-Volatile Magnetic Memories," NEWCAS'11, Paris, France, pp. 1-4, June 2013.
- [5] M. N. Kozicki, M. Park, M. Mitkova, "Nanoscale memory elements based on solid state electrolytes," *IEEE Trans. Nanotechnology*, vol. 4, no. 3, pp. 331-338, May 2005.
- [6] S. Yu, H.-S. P. Wong, "Compact Modeling of Conducting-Bridge Random-Access Memory (CBRAM)," *IEEE Trans. Electron Devices*, vol. 58, no. 5, May 2011.
- [7] H. Akinaga, H. Shima, "Resistive Random Access Memory (ReRAM) Based on Metal Oxides," *Proceedings of the IEEE*, pp. 2237-2251, 2010.
- [8] S. Yamamoto, Y. Shuto, S. Sugahara, "Nonvolatile SRAM (NV-SRAM) Using Functional MOSFET Merged with Resistive Switching Devices," *IEEE CICC 2009*, pp. 531-534, 2009.
- [9] S.-M. Koo, Q. Li, M. D. Edelstein, C. A. Richter, E. M. Vogel, "Enhanced Channel Modulation in Dual-gated Silicon Nanowire Transistors," *Nano Letters*, vol. 5, no. 12, pp. 2519-2523, 2005.
- [10] Y.-M. Lin, J. Appenzeller, J. Knoch, P. Avouris, "High-performance Carbon Nanotube field-effect Transistor with Tunable Polarities," *IEEE Trans. Nanotechnology*, vol. 4, pp. 481-489, 2005.
- [11] M. H. B. Jamaa, K. Mohanram, G. D. Micheli, "An Efficient Gate Library for Ambipolar CNTFET Logic," *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, vol. 30, no. 2, Feb. 2011.
- [12] D. Suzuki, T. Endoh, T. Hanyu, "TMR-Logic-Based LUT for Quickly Wake-up FPGA," 51<sup>st</sup> MWSCAS, pp. 326-329, 2008.
- [13] W. Zhao, M. Moreau, E. Deng, Y. Zhang, J.-M. Portal, J.-O. Klein, M. Bocquet, H. Aziza, D. Deleruyelle, C. Muller, D. Querlioz, N. B. Romdhane, D. Ravelosona, C. Chappert, "Synchronous Non-Volatile Logic Gate Design Based on Resistive Switching Memories," *IEEE Trans. Circuits and Systems*, vol. 61, no. 2, Feb. 2014.
- [14] P. Junsangsri, J. Han, F. Lombardi, "HSPICE Macromodel of a Programmable Metallization Cell (PMC) and its Application to Memory Design," Proc. ACM/IEEE Symposium on Nano Architectures, Paris, pp. 45-50, 2014.
- [15] S. S. Mukherjee, S. K. Kurinec, "A Stable SPICE Macro-Model for Magnetic Tunnel Junctions for Applications in Memory and Logic Circuits," *IEEE Trans. Magnetics*, vol. 45, no. 9, pp. 3260-3268, Sept. 2009.
- [16] P. Junsangsri, F. Lombardi, "Design of a Hybrid Memory Cell Using Memristance and Ambipolarity," *IEEE Trans. Nanotechnology*, vol. 12, no. 1, pp. 71-80, 2013.
- [17] W. Wei, K. Namba, J. Han, F. Lombardi, "Design of a Non-Volatile 7T1R SRAM Cell for Instant-on Operation," *IEEE Trans. Nanotechnology*, vol. 13, no. 5, pp. 905-916, 2014.