# Design and Comparative Evaluation of a PCM-based CAM (Content Addressable Memory) Cell

Pilin Junsangsri, Student IEEE, Jie Han, Senior Member IEEE and Fabrizio Lombardi, Fellow IEEE

Abstract-This paper presents the design of a Content Addressable Memory (CAM) cell. This cell utilizes a Phase Change Memory (PCM) as storage element and an ambipolar transistor for data comparison; the operation of the ambipolar transistor is controlled by voltage at the polarity gate. A memory core consisting of a CMOS transistor and a PCM is employed (1T1P). For the search operation, the data in the 1T1P memory core is read and its value is established using a differential sense amplifier. The proposed CAM cell is simulated and compared with other non-volatile CAM cells using emerging technologies (such as MTJ and memristor). The simulation results show that as the proposed CAM cell operates on a voltage basis, it offers significant advantages in terms of power delay product (PDP) for the search operation and reduced circuit complexity (in terms of lower transistor and storage element counts) compared with other designs found in the technical literature.

*Index Terms*—Content Addressable Memory (CAM), Phase Change Memory (PCM), Emerging Technology.

## I. INTRODUCTION

A Content Addressable Memory (CAM) implements a lookup-table function using dedicated comparison circuitry, usually within a single clock cycle. A CAM compares the input search data against a table of stored data; the address of the matching data (if any) is then returned [1]. CAMs have been used in a variety of applications that require a fast search capability, such as parametric curve extraction [2] and image coding [3]; however, the utilization of a CAM comes at the cost of increased circuit complexity and power consumption. Emerging memory devices (such as the memristor and the magnetic tunneling junction (MTJ)) have been utilized as storage elements to improve the performance of a CAM cell. Even though these devices are usually slower than CMOS-based memories, lower power dissipation and non-volatile data retention are achieved. The phase change memory (PCM) is a non-volatile memory technology that has high density, fast switching time and excellent data retention capabilities [4]. It is used in this paper as a non-volatile storage device.

A CAM cell design is proposed in this paper. This cell uses a single PCM as non-volatile storage element and a CMOS transistor as control element in the memory core (i.e. 1T1P) while operating on a voltage basis. For the search operation, the data in the 1T1P memory core is read and its value is established using a differential sense amplifier. An ambipolar transistor is employed in the circuit for comparing the stored with the search data; the advantage of ambipolarity is that operations are controlled by the voltage at the polarity gate and can be implemented by a single device (such as a CNTFET) [5]. The proposed CAM cell is simulated and compared with other CAM cells found in the technical literature; the simulation results show that the proposed CAM cell offers significant advantages in terms of search time, power dissipation and reduced transistor count compared to other non-volatile memories.

## II. REVIEW AND PRELIMINARIES

This section briefly reviews a few items of relevance to the proposed design.

*Phase Change Memory (PCM):* The phase change memory is regarded as one of the most promising emerging technologies for non-volatile memory design. It has a high density, good speed, excellent scaling capabilities, and compatibility with a CMOS process. Data storage in a PCM is due to the phase transformation of a chalcogenide alloy (e.g. Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub>, GST) that exhibits amorphous and crystalline phases. In the amorphous phase, the resistance of the PCM is high (commonly referred to as the reset state); in the crystalline phase, its resistance is low (commonly referred to as the set state) [4]. To write data in a PCM cell, a pulse with high amplitude is used to switch the resistive element to the amorphous phase (i.e. to the reset state); a longer pulse with low amplitude is used to crystallize the resistive element (i.e. to the set state) [4].

*Ambipolar Transistor:* Different from a traditional (unipolar silicon CMOS) device whose behavior (either p-type or n-type) is determined at fabrication, an ambipolar device can be operated in a switched mode (from p-type to n-type, or vice versa) by changing the gate bias [6]. This behavior has been experimentally reported in different emerging technologies such as carbon nanotubes [5] and silicon nanowires [6]. The direction of the current and the device behavior are controlled by the voltage at the polarity gate (PG). When PG is set to logic '0' ('1'), the ambipolar transistor behaves as an NMOS (PMOS) [7].

*Existing CAM Design:* CAM designs using emerging technologies and CMOS have been extensively analyzed in the technical literature. By comparison, a CMOS-based volatile CAM employs an SRAM (6 transistors) as storage core and 4 transistors for the comparison operation, i.e. the number of transistors is large, thus incurring a high power dissipation. A CAM using memristors has been presented in [9]; this design employs 2 memristors as storage elements and 7 transistors as control elements. The number of transistors and the power dissipation of the CAM of [9] are less than for a CMOS-based CAM; however, the write/read times of this non-volatile CAM are substantially higher than those of a volatile CMOS-based CAM. Moreover, the voltage drop across a memristor during the search operation slightly changes the value of its memristance, so a refresh operation is required. The design of [8] uses a PCM due to its excellent data retention capabilities; however, in this design the size of the transistor must be adjusted until the resistances in the ON and OFF states are close in value to the PCM resistances of states '1' and '0' respectively. Moreover, the output of its search operation [8] is given by a match line current (I<sub>ML</sub>). The current of the match line is very small and V<sub>ML</sub> is limited to 0.4V, so a high-performance current sense amplifier is needed for correct operation.

P. Junsangsri and F. Lombardi are with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; {lombardi@cce.neu.edu, junsangsri.p@husky.neu.edu}; J. Han is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada {jhan8@ualberta.ca}.

## III. PRINCIPLES AND DESIGN OF PROPOSED CELL

In this section, the basic principles of the proposed cell are initially presented.

### A. Basic Memory Core (1T1P)

As shown in Figure 1a, the basic memory core consists of a PCM as storage element and a MOSFET as control element, i.e. this is a 1T1P memory core. The write and read operations of this 1T1P memory core are established by controlling the voltages at the bitline (BL) and the word line (WL).



Figure 1. a) The proposed 1T1P core, b) Differential sense amplifier [10][11]

1) Write Operation: To write data into the memory core, the write voltage is obtained as input from BL, while WL is used as selection line. When the word line voltage ( $V_{WL}$ ) is connected to  $V_{DD}$ , the transistor M1 is ON. So, the write voltage of the PCM is the same as at BL; moreover, there is a voltage drop across the PCM. The 1T1P core can be written based on the value of  $V_{BL}$ . State '0' corresponds to the amorphous phase of the PCM (high resistance value) while state '1' corresponds to the crystalline phase (low resistance value).

2) Read Operation: Initially, the bitline is precharged to the  $V_{read}$  value; as the word line is at  $V_{DD}$ , M1 is ON. So the PCM receives  $V_{read}$  from BL; the data stored in the core is found by checking the value of  $V_{BL}$ . If a '1' (low PCM resistance) is stored in the 1T1P core,  $V_{BL}$  is easily pulled to GND. However, if a '0' is stored, the value of  $V_{BL}$  is higher than for state '1'. Therefore, the data stored in the memory core is correctly read. The read voltage ( $V_{read}$ ) of the memory core is limited to the holding voltage  $(V_h)$  because the holding voltage is the lowest threshold voltage of the PCM; therefore, a change from the OFF to the ON state never occurs during a read operation.
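The read behavior described above can be illustrated with a first-order sketch in which the precharged bitline discharges through M1 and the PCM as a simple RC circuit. This is not the HSPICE model used in the paper: the read voltage, the ON resistance of M1 and the bitline capacitance below are hypothetical values chosen only for illustration, while the two PCM resistances are the default values of the resistance range used in the simulations.

```python
import math

# All component values below are illustrative assumptions, not values from the paper,
# except the two PCM resistances (default range used in the simulations).
V_READ = 0.3      # assumed read voltage (V), taken equal to the holding voltage V_h
R_ON = 1e3        # assumed ON resistance of M1 (ohm)
C_BL = 20e-15     # assumed bitline capacitance (F)
R_SET = 7e3       # PCM resistance for state '1' (ohm)
R_RESET = 200e3   # PCM resistance for state '0' (ohm)


def bitline_voltage(t, r_pcm):
    """First-order RC discharge of the precharged bitline through M1 and the PCM."""
    tau = (R_ON + r_pcm) * C_BL
    return V_READ * math.exp(-t / tau)


def bitline_gap(t):
    """Difference between the state '0' and state '1' bitline voltages at time t."""
    return bitline_voltage(t, R_RESET) - bitline_voltage(t, R_SET)


def read_time(step=1e-12, t_max=5e-9):
    """Earliest sampled time at which the gap reaches V_READ / 2, or None."""
    for k in range(int(t_max / step) + 1):
        t = k * step
        if bitline_gap(t) >= V_READ / 2:
            return t
    return None


t_read = read_time()
print(f"read time of the toy model: {t_read * 1e9:.3f} ns")
```

With these assumed values, the low-resistance state discharges the bitline much faster than the high-resistance state, which is the separation that the read operation relies on; the absolute numbers depend entirely on the assumed parasitics.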

The proposed cell consists of a few circuits (in addition to the core) that are analyzed and discussed next.

### B. Differential Sense Amplifier

At the designated read time, a differential sense amplifier is required for changing  $V_{BL}$  to the two-valued voltage (i.e. GND or  $V_{DD}$ ) corresponding to the state stored in the 1T1P core. Figure 1b shows the differential sense amplifier of [10]; the difference in values is found by comparing  $V_{BL}$  with the threshold voltage of the differential sense amplifier ( $V_{ths}$ ), then inverters are employed to drive the voltage difference to the output ( $V_{out}$ ). If a '0' is stored in the 1T1P core,  $V_{BL}$  is high; if  $V_{BL}$  is higher than  $V_{ths}$ , then the voltage at node out is at GND. If a '1' is stored in the 1T1P core,  $V_{BL}$  is less than  $V_{ths}$ , so the voltage at node out is  $V_{DD}$ .
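The decision made by the sense amplifier can be summarized by a purely behavioral sketch. The function below ignores all analog effects, and the threshold and bitline voltages used in the example calls are hypothetical values, not taken from the simulations.

```python
V_DD = 0.9  # supply voltage (V) used in the simulations


def sense_amplifier(v_bl, v_ths):
    """Behavioral model: return (v_out, stored_bit) for a given bitline voltage.

    A bitline voltage above the amplifier threshold means a '0' is stored
    (high PCM resistance), so the output node is driven to GND; otherwise a
    '1' is stored and the output node is driven to V_DD.
    """
    if v_bl > v_ths:
        return 0.0, 0
    return V_DD, 1


# Hypothetical voltages, for illustration only.
print(sense_amplifier(v_bl=0.25, v_ths=0.15))  # bitline stays high: state '0'
print(sense_amplifier(v_bl=0.05, v_ths=0.15))  # bitline pulled low: state '1'
```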

### C. Comparator Circuit

After the data stored in the 1T1P memory core is processed by the differential sense amplifier to a two-value voltage (GND and  $V_{DD}$ ), a circuit is used to compare the stored data with the search voltage. The outcome is generated by using the match line voltage ( $V_{ML}$ ). Two types of comparator circuit are investigated:



Figure 2. a) CMOS-based CAM comparator circuit, b) Ambipolar-based CAM comparator circuit

*CMOS-Based Comparator Circuit:* Figure 2a shows a CMOS-based comparator circuit.  $V_{ML}$  is precharged to  $V_{DD}$  prior to the search operation; then the stored voltage is provided at node out (while the search voltage is at node search). If there is a match with the stored data,  $V_{ML}$  retains its value, else it is discharged.

*Ambipolar-Based Comparator Circuit:* Figure 2b shows an ambipolar-based comparator circuit;  $V_{ML}$  is precharged to  $V_{DD}$  prior to the search operation. The stored voltage (i.e. the voltage at node out) is connected to the gate of the ambipolar transistor (AMB); the search voltage is connected to the polarity gate of the ambipolar transistor. So for a search '0' ('1') operation,  $V_{search}$  is at GND ( $V_{DD}$ ) and AMB behaves as an NMOS (PMOS); if there is a match with the stored data,  $V_{ML}$  retains its value, else it is discharged.
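The ambipolar-based comparison can be checked with a switch-level sketch. It treats the ambipolar transistor as an ideal switch (no threshold drop, no timing), which is a simplifying assumption; the function names are illustrative and do not come from the paper.

```python
def ambipolar_conducts(gate, polarity_gate):
    """Ideal switch view of the ambipolar transistor.

    polarity_gate == 0: NMOS-like behavior, conducts when gate is '1'.
    polarity_gate == 1: PMOS-like behavior, conducts when gate is '0'.
    """
    if polarity_gate == 0:
        return gate == 1
    return gate == 0


def match_line_after_search(stored_bit, search_bit):
    """Return 1 if the precharged match line keeps its value, 0 if discharged.

    The stored bit (node out) drives the gate and the search bit drives
    the polarity gate, as in the ambipolar-based comparator circuit.
    """
    discharged = ambipolar_conducts(gate=stored_bit, polarity_gate=search_bit)
    return 0 if discharged else 1


for stored in (0, 1):
    for search in (0, 1):
        ml = match_line_after_search(stored, search)
        print(f"stored={stored} search={search} -> ML={'high' if ml else 'discharged'}")
```

Under this idealization, the match line is discharged exactly when the stored and search bits differ, so a single device realizes the comparison.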

## IV. SIMULATION RESULTS

HSPICE is used as the simulation tool. The model of [4] is employed for the PCM; its (default) resistance range is  $7k\Omega - 200k\Omega$ . The model of [11] is used for the ambipolar transistor; the transistor sizes are adjusted to generate symmetric conduction between the PMOS and NMOS behaviors. Simulation is performed at a 32nm CMOS feature size and a supply voltage of 0.9V. The performance of the proposed cell is given by the performance of its different circuits; the read time of the proposed CAM cell is taken as the time at which the bitline voltage difference between the two states is  $V_{\rm h}/2$ .

*Comparator Circuit of CAM:* The performance of the two comparison circuits presented previously is compared. Delay is initially considered.

| State | Stored Voltage (V)     | Search Voltage (V) | Search time (ns), CMOS | Search time (ns), Ambipolar |
|-------|------------------------|--------------------|------------------------|-----------------------------|
| 0     | GND (0V)               | 0                  | N/A                    | N/A                         |
| 0     | GND (0V)               | 1                  | 0.74                   | 0.731                       |
| 1     | V<sub>DD</sub> (0.9V) | 0                  | 0.74                   | 0.238                       |
| 1     | V<sub>DD</sub> (0.9V) | 1                  | N/A                    | N/A                         |

Table 1. Search time of the CMOS and ambipolar-based CAM comparator circuits at a supply voltage  $(V_{DD})$  of 0.9V

Table 1 presents the search time for the CMOS-based (Figure 2a) and ambipolar-based (Figure 2b) comparator circuits (the search time is defined as the time taken for  $V_{ML}$  to discharge to a value less than  $V_{DD}/2$ ). The ambipolar-based comparison circuit is overall better than the CMOS-based circuit, especially for the search '0' operation. For the search '1' operation, the ambipolar transistor behaves as a PMOS; so if a '0' is stored in the 1T1P core, a mismatch outcome is generated (and  $V_{ML}$  is discharged). However, the match line voltage is not fully discharged to 0V due to the threshold voltage drop across the ambipolar transistor, so the search '1' operation is slower than the search '0' operation.

*Delay:* The delays of all circuits in the CAM cell contribute to the total delay for the search operation; the results are shown in Table 2. The comparison circuit accounts for the largest value; if a SB-CNTFET [5] is utilized as equivalent in operation to the macromodeled ambipolar transistor [11], the delay of the comparison circuit can be significantly reduced, because the inverter delay of a SB-CNTFET at a diameter of 1nm has been shown to be nearly 1ps [5].

| Circuit                           | Delay (ns) |
|-----------------------------------|------------|
| 1T1P Memory Core                  | 0.294      |
| Differential Sense Amplifier [10] | 0.067      |
| Comparison Circuit                | 0.731      |
| Total Delay                       | 1.092      |

Table 2. Delay of proposed CAM cell for a search operation at default values
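As a quick arithmetic check of Table 2, the short sketch below sums the three circuit delays and reports the share contributed by each circuit; the dictionary keys are just labels chosen for this example.

```python
# Delays (ns) copied from Table 2.
delays_ns = {
    "1T1P memory core": 0.294,
    "differential sense amplifier": 0.067,
    "comparison circuit": 0.731,
}

total_ns = sum(delays_ns.values())
share = {name: d / total_ns for name, d in delays_ns.items()}

print(f"total search delay: {total_ns:.3f} ns")
for name, fraction in share.items():
    print(f"{name}: {100 * fraction:.1f}% of the total")
```

The sum reproduces the total delay of 1.092ns, with the comparison circuit contributing roughly two thirds of it.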

*Power Dissipation:* The power dissipation of each circuit in the proposed cell is found next. For the 1T1P core, the power dissipation for the write operation is not considered because this operation is seldom executed in practical applications (such as routers) and is well known to be dominated by the high voltage of the resistive element [11]. Table 3 shows the average power dissipation, average miss delay, and power delay product (PDP) of each circuit in the proposed CAM cell; state '1' consumes more power than '0', because the bitline voltage for state '1' ( $7k\Omega$ ) is discharged to GND. The macromodel of the ambipolar transistor is used for finding the average power dissipation of the comparison circuit; this is a very pessimistic value, because the power dissipation in Table 3 accounts mostly for the 10 transistors used in this macromodel [11] rather than the power dissipation of a fabricated device (using for example a single CNTFET [5]). So the average power dissipation and the PDP of both comparator circuits implemented by a single CNTFET should be even lower than the values obtained by the macromodel of the ambipolar transistor [11].

| Circuit                      | State/outcome | Average Power (µW) | Average Miss Delay (ns) | PDP (fJ) |
|------------------------------|---------------|--------------------|-------------------------|----------|
| 1T1P (CAM)                   | 0             | 2.38               | 0.294                   | 0.6998   |
| 1T1P (CAM)                   | 1             | 4.269              | 0.294                   | 2.9642   |
| Differential Sense Amplifier | N/A           | 22.3939            | 0.067                   | 1.5004   |
| Comparator (CAM)             | mismatch      | 43.728             | 0.731                   | 31.965   |

Table 3. Average power dissipation, average miss delay and power delay product of each circuit in the proposed CAM cell
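The units in Table 3 combine directly: a power in µW multiplied by a delay in ns gives an energy in fJ. The sketch below applies this to the comparator row and then adds the PDP column for a stored '1' followed by a mismatch; the helper name `pdp_fj` is introduced here only for illustration.

```python
def pdp_fj(power_uw, delay_ns):
    """Power delay product in fJ (1 uW x 1 ns = 1e-6 W x 1e-9 s = 1e-15 J)."""
    return power_uw * delay_ns


# Comparator row of Table 3.
comparator_pdp = pdp_fj(43.728, 0.731)
print(f"comparator PDP: {comparator_pdp:.3f} fJ")

# PDP column of Table 3 (fJ): read '1', sense amplifier, comparator mismatch.
search_pdp_fj = 2.9642 + 1.5004 + 31.965
print(f"PDP of a search operation: {search_pdp_fj:.4f} fJ")
```

The sum of the three PDP entries equals the 36.4296fJ reported for the search operation of the proposed cell in Table 6.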

*CMOS Feature Size:* Previously, the CMOS feature size of the proposed CAM design has been fixed to its default value, i.e. 32nm. Next the design is assessed when HP (high performance) PTMs are utilized at the lower feature sizes of 22 and 16nm. Table 4 presents the delay of the proposed CAM cell for the search operation; the delay of the proposed CAM cell decreases considerably when reducing the CMOS feature size.

| Circuit delay (ns)           | 16nm  | 22nm  | 32nm  | 16nm  | 22nm  | 32nm  |
|------------------------------|-------|-------|-------|-------|-------|-------|
| 1T1P Memory Cell             | 0.338 | 0.309 | 0.294 | 0.247 | 0.265 | 0.294 |
| Differential Sense Amplifier | 0.039 | 0.053 | 0.067 | 0.024 | 0.045 | 0.067 |
| Comparator                   | 0.394 | 0.5   | 0.731 | 0.208 | 0.32  | 0.731 |
| Total Delay                  | 0.771 | 0.862 | 1.092 | 0.479 | 0.63  | 1.092 |
| Voltage Supply (V)           | 0.7   | 0.8   | 0.9   | 0.9   | 0.9   | 0.9   |

Table 4. Delay of proposed CAM cell for a search operation when both CMOS feature size and supply voltage are changed
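To quantify the trend in Table 4, the following sketch computes the relative reduction of the total search delay with respect to the 32nm baseline, for both the scaled supply voltage and the fixed 0.9V supply. The variable names are illustrative.

```python
def reduction_pct(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


BASELINE_32NM_NS = 1.092  # total delay at 32nm, 0.9V (Table 4)

# Total delays (ns) from Table 4, keyed by feature size.
scaled_supply = {"16nm": 0.771, "22nm": 0.862}   # 0.7V and 0.8V respectively
fixed_supply = {"16nm": 0.479, "22nm": 0.63}     # 0.9V in both cases

for label, totals in (("scaled supply", scaled_supply), ("fixed 0.9V supply", fixed_supply)):
    for node, total in totals.items():
        pct = reduction_pct(BASELINE_32NM_NS, total)
        print(f"{label}, {node}: {pct:.1f}% lower total delay than at 32nm")
```

At 16nm this gives a reduction of about 29% when the supply voltage is scaled to 0.7V and about 56% when it is kept at 0.9V.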

*PCM Resistance Range:* Next, the effect of the PCM resistance range is assessed with respect to the write/read times as well as the PDP of the 1T1P core. The resistance for state '0' is varied from  $100k\Omega$  to  $300k\Omega$  (the resistance for state '1' is kept constant at  $7k\Omega$ ).

| PCM Resistance Range        | Write Time (ns) | Read Time (ns) | PDP (fJ) |
|-----------------------------|-----------------|----------------|----------|
| $7k\Omega - 100k\Omega$ | ….81            | 0.318          | 3.0869   |
| $7k\Omega - 200k\Omega$ | …0.34           | 0.294          | 2.9642   |
| $7k\Omega - 300k\Omega$ | …1.78           | 0.279          | 2.9237   |

Table 5. 1T1P core performance under different PCM resistance ranges

Table 5 shows the worst case of the write time, in which the data in the memory core is changed from state '0' to state '1'. The read time and PDP are reported for the read '1' operation of the memory core (1T1P). The write time of the 1T1P core changes depending on the PCM resistance range. The read time is larger for a smaller PCM resistance range: the bitline voltage difference between states '0' and '1' is then small, so the read time increases. The same effect is observed for the PDP.

*1T1P Cores/Bitline:* Next, the number of 1T1P cores connected by a single bitline is considered at a read time of 0.294ns as a measure for array implementation. Figure 3 shows that the bitline voltage of state '1'  $(7k\Omega)$  increases when the number of connected 1T1P cores is increased (at a read time of 0.294ns).



Figure 3. Bitline voltage vs number of 1T1P cores per bitline

For state '0' ( $200k\Omega$ ), the bitline voltage is almost constant; its value is close to V<sub>h</sub>. The difference between the bitline voltages of states '0' and '1' is still large when the number of 1T1P cores connected to the same bitline is increased, i.e. the read operation can still be executed correctly.

## V. COMPARISON

In this section, the proposed CAM cell is compared with different schemes found in the technical literature. All non-volatile cells (such as the proposed cell and the cells in [8, 9, 12]) can be implemented using stacking, thereby placing the non-volatile elements on a different plane than the MOSFETs.

| Measure                      | Proposed PCM | MTJ NAND [12] | MTJ NOR [12] | [8]    | MCAM [9] | CMOS    |
|------------------------------|--------------|---------------|--------------|--------|----------|---------|
| Write time (ns)              | 199.34       | 1.5           | 1.5          | 199.34 | 145      | 0.045   |
| Search time (ns)             | 1.092        | 0.576         | 1.044        | 1.326  | 1.3035   | 0.589   |
| PDP of search operation (fJ) | 36.4296      | 52.367        | 79.763       | 46.689 | 15.448   | 14.1285 |
| Number of transistors/core   | 1            | 6             | 5            | 1      | 7        | 10      |
| Number of devices/core       | 1            | 2             | 2            | 1      | 2        | 0       |
| Nonvolatile capability       | Yes          | Yes           | Yes          | Yes    | Yes      | No      |
| Refresh operation            | No           | No            | No           | No     | Yes      | No      |

Table 6. Comparison of proposed CAM cell, CMOS-based CAM cell, CAM cell of [8], memristor-based CAM of [9], and MTJ-based CAM cells of [12]

Table 6 shows that the proposed nonvolatile CAM cell requires the least number of transistors compared with other designs found in the technical literature (all at default values for a 32nm feature size). The design of [8] is also 1T1P, like the proposed scheme, thus incurring the same write time (slower than other CAM cells due to the slow crystallization rate of the PCM). However, the search time of the proposed CAM cell is shorter than that of [8] due to the use of a voltage rather than a current sense amplifier. Also, an adjustment of the search voltage (as a function of the match line current I<sub>ML</sub>) is required for [8], so the transistor of the 1T1P cell of [8] must be adjusted until the resistances in the ON and OFF states are close to the PCM values of states '1' and '0'. The proposed CAM cell does not require such an adjustment, because it operates in a voltage rather than a current mode [8]. As for the PDP of the search operation, the proposed cell is better than [8] and the MTJ-based CAM cells. Table 6 also shows the performance of the memristor-based CAM (MCAM) cell of [9] and a CMOS-based CAM cell for comparative purposes. The MCAM requires a large number of transistors; moreover, a refresh operation must be executed after the search operation. The CMOS-based CAM cell is volatile, hence no further comparison is made with the other cells.

## VI. CONCLUSION

This paper has proposed a novel design of a non-volatile CAM cell; this design utilizes a single phase change memory (PCM) as storage element. Compared with other PCM-based cells [8], the proposed cell operates on a voltage basis, hence making the search operation considerably simpler. A further novelty is the comparator circuit; the proposed circuit is designed by using an ambipolar transistor and has been shown to be superior to its CMOS-based counterpart in terms of delay and circuit complexity. An assessment and comparison of the proposed cell with other CAM cells [8, 9, 12] have been presented. Simulation results have been obtained using HSPICE at nanoscales for the cells of [8, 9, 12]. The results of this paper show that the proposed cell offers advantages in terms of circuit complexity, nonvolatile operation, data retention capability, as well as power delay product (PDP). Features for scalability in array implementation (such as bitline voltage and search time) are also very good. The proposed cell is therefore viable in nonvolatile applications in which circuit complexity and PDP requirements are stringent.

## REFERENCES

- [1] K. Pagiamtzis, A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 3, March 2006
- [2] M. Meribout, T. Ogura, and M. Nakanishi, "On using the CAM concept for parametric curve extraction," *IEEE Trans. Image Process.*, vol. 9, no.12, pp. 2126 – 2130, Dec. 2000
- [3] S. Panchanathan and M. Goldberg, "A content-addressable memory architecture for image coding using vector quantization," *IEEE Trans. Signal Process.*, vol. 39, no. 9, pp. 2066–2078, Sep. 1991
- [4] P. Junsangsri, J. Han, F. Lombardi, "Macromodeling a Phase Change Memory (PCM) Cell by HSPICE," *Proc. Nanoarch 12*, pp. 77 - 84, Amsterdam, Netherlands, July 2012
- [5] H.S.P. Wong, Jie Deng, A. Hazeghi, T. Krishnamohan, G. C. Wan "Carbon Nanotube Transistor Circuits – Models and Tools for Design and Performance Optimization" ICCAD'06 Nov 2006 pp. 651-654
- [6] S.-M. Koo, Q. Li, M. D. Edelstein, C. A. Richter, E. M. Vogel, "Enhanced Channel Modulation in Dual-gated Silicon Nanowire Transistors," *Nano Letters*, Vol. 5, No. 12, pp. 2519 – 2523, 2005.
- [7] M. H. B.Jamaa *et al.*, "Novel Library of Logic Gates with Ambipolar CNTFETs: Opportunities for Multi-Level Logic Synthesis," in *DATE2009.*, 2009, pp. 622 – 627
- [8] B. Rajendran, R. W. Cheek, L. A. Lastra, M. M. Franceschini, M. J. Breitwisch, A. G. Schrott, Jing Li, R. K. Montoye, L. Chang, Chung Lam, "Demonstration of CAM and TCAM using Phase Change Devices," *3<sup>rd</sup> IEEE International Memory Workshop (IMW)*, pp. 1-4, May 2011
- [9] K. Eshraghian, K.R. Cho, O. Kavehei, S.K Kang, D. Abbott, S.M. Steve Kang, "Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines" *IEEE Transactions on VLSI Systems*, vol. 19, no. 8, pp. 1407 - 1417, 2011
- [10] R. J. Baker "CMOS, Circuit Design, Layout, and Simulation" IEEE Press Series on Microelectronic Systems 2<sup>nd</sup> Edition 2011
- [11] P. Junsangsri, F. Lombardi, "A Ternary Content Addressable Cell using a single Phase Change Memory (PCM)" *Proc. ACM/IEEE Great Lakes Symposium on VLSI*, pp. 259 - 264, Pittsburgh, May 2015.
- [12] K. Chen, J. Han, F. Lombardi "Design and Evaluation of two MTJ-Based Content Addressable Non-Volatile Memory Cells," *Proc. IEEE NANO Conference* 13<sup>th</sup>, pp. 707 – 712, Beijing, China, August, 2013