# Design of Static Single-Phase Flip-Flops for Energy-Efficient Near-Threshold Voltage Operation

Sajjad Hossian Bappy<sup>\*</sup>, Peiyi Zhao<sup>†</sup>, and Jie Han<sup>\*</sup>

\*Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. Emails: sajjadho@ualberta.ca, jhan8@ualberta.ca

<sup>†</sup>Fowler School of Engineering, Chapman University, Orange, CA, USA. Email: zhao@chapman.edu


Abstract-Flip-flops (FFs) and clock distribution networks, two core parts of the clocking system, are becoming increasingly important in chip design. Together, they consume 60% of the overall dynamic power because of the constant clock transitions and the large number of clock transistors in each FF; FFs and latches alone dissipate 50% of the overall dynamic power. They also have a significant impact on the performance, robustness, and size of the circuit in the near-threshold voltage (NTV) region. Thus, static and energy-efficient FFs are required in sequential circuits for NTV operation. In this paper, we propose two static contention-free single-phase negative edge-triggered flip-flops: a Low Transistor Count FF (LTCFF) and an Ultra-Low Power FF (ULPFF). Both designs reduce the number of clock transistors to just four. In the LTCFF, the transistor count is reduced to 16 by using a merging and sharing approach, while the static behavior of the FF is preserved. The ULPFF, consisting of only 22 transistors, is extended from the LTCFF by eliminating redundant internal transitions to ensure ultra-low power operation. Designed in a 65-nm technology using Cadence Virtuoso, the proposed LTCFF and ULPFF achieve reductions of 68.77% (or 67.05%) and 74.22% (or 70.58%), respectively, in the power-delay product (PDP) compared to the widely used transmission gate flip-flop (TGFF) at a supply voltage of 1 V and a 1 GHz clock frequency (or 0.4 V and 25 MHz) with 10% data activity. The designs are simulated at supply voltages from 0.4 V to 1 V and across process corners to verify good performance in near-threshold operation and the absence of floating nodes for any input combination.

Index Terms-Flip-flop, static flip-flop, contention-free

## I. INTRODUCTION

The clock system is one of the core components in modern digital systems, including graphics and artificial intelligence processors, and consumes about 60% of the total power in a chip, whereas the FFs and latches account for 84% of the clock power [1], or 50% of the total power in a chip. In recent years, battery life and power consumption in digital components have become significant performance measures. Hence, low-power and energy-efficient FFs have become critical in modern circuit design.

FF design has improved steadily alongside technological advances, driven by the requirements of modern applications [2]–[10]. One such requirement is performance in near-threshold voltage (NTV) operation. NTV operation offers a way to achieve high energy efficiency because the supply voltage is close to the transistor's threshold voltage. However, to achieve this, sequential circuits need to fulfill certain requirements, including fully static operation, fully contention-free transitions (no conflicting transistors attempt to drive the same node simultaneously), and the elimination of redundant clock transitions and redundant transistors, while keeping the area as small as possible [11]. Clock power efficiency is also a key requirement in modern FF design. Recent literature has emphasized reducing redundant clock transistors to minimize the clock load [4]–[9]. Significant energy is wasted by redundant clock transitions, which highlights the need for internal and external clock gating to reduce these transitions and thereby the dynamic power consumption [10]. However, meeting all of these requirements involves several trade-offs.

First of all, the static operation of the FF provides reliability and robustness [9]. Also, the internal or external clock gating reduces dynamic power consumption [10]. However, additional transistors need to be added to incorporate static operation and a redundancy-free clock transition. These added transistors can create a higher capacitive load that will slow down the operation of the FF. Additionally, using extra transistors makes the circuit bulky. On the other hand, in the FFs proposed in [4], [5] transistor count is significantly reduced, leading to dynamic behavior and contention, which could degrade performance in NTV operation. Another important factor is the weaker drive strength caused by the reduction of the transistors, which leads to slower charge and discharge of the internal nodes in the FF, subsequently increasing the setup and hold time. Some of the recent designs [6]–[9] reduce the power consumption by decreasing the transistor count but they perform poorly in NTV. Therefore, a balanced approach to designing an FF is required to maintain contention-free and static operation as well as a low transistor count.

Considering the trade-offs between power efficiency, delay, and area, we propose two single-phase negative edge-triggered FFs, a low transistor count FF (LTCFF) and an ultra-low power FF (ULPFF), for NTV operation. To the best of our knowledge, the LTCFF achieves the lowest transistor count among single-phase edge-triggered FFs that successfully meet the requirements to operate in NTV at 0.4 V. It is shown that the ULPFF attains the lowest overall power consumption among all the considered FFs without any redundant transitions of the major signals.

The rest of the paper is organized as follows: Section II briefly reviews the background and state-of-the-art single-edge-triggered (SET) FFs. Section III outlines the proposed methodologies for the FF designs. Section IV presents the simulation results, and Section V concludes the paper.

## II. STATE-OF-THE-ART SET FFS

Several factors determine the power consumption, including the supply voltage (V), frequency (f), data activity ratio ( $\alpha$ ), capacitance (C), and the short-circuit current [12]. The total power (P) of a circuit is the sum of three major components, given by:

$$P = P_{dynamic} + P_{shortcircuit} + P_{leakage}. \tag{1}$$

The largest portion of the power consumption is dynamic power, which is due to the switching activity of the transistors:

$$P_{dynamic} = \alpha \, C \, V^2 \, f. \tag{2}$$

Short-circuit power is incurred when the pull-up and pull-down transistors are both conducting at the same time. Leakage power arises from small currents that flow even when the circuit remains idle. As shown in (2), the dynamic power depends quadratically on the supply voltage, suggesting that reducing the supply voltage is an effective way to lower dynamic power. The idea of NTV computing comes from this observation. However, in an NTV or sub-threshold voltage region, the threshold voltage is close to the supply voltage, which leads to an exponential increase in the sub-threshold leakage current due to process variation. Hence, leakage becomes the main source of power consumption in the circuits [12]. Additionally, the design should be contention-free, with all the major nodes remaining static.
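To make the quadratic dependence in (2) concrete, the following minimal Python sketch evaluates the dynamic power at the two operating points used later in the paper (1 V/1 GHz and 0.4 V/25 MHz). The switched capacitance `C_SWITCHED` is a hypothetical value chosen only for illustration and is not taken from the paper; the function name `dynamic_power` is likewise an assumption.

```python
def dynamic_power(alpha, c_farads, vdd, freq_hz):
    """Dynamic switching power from Eq. (2): alpha * C * V^2 * f, in watts."""
    return alpha * c_farads * vdd ** 2 * freq_hz


# Hypothetical switched capacitance (assumption, NOT a value from the paper).
C_SWITCHED = 10e-15  # 10 fF
ALPHA = 0.10         # 10% data activity, as in the paper's comparisons

p_nominal = dynamic_power(ALPHA, C_SWITCHED, vdd=1.0, freq_hz=1e9)
p_ntv = dynamic_power(ALPHA, C_SWITCHED, vdd=0.4, freq_hz=25e6)

voltage_only_ratio = (0.4 / 1.0) ** 2  # effect of the V^2 term alone
combined_ratio = p_ntv / p_nominal     # V^2 and f scaled together

print(f"V^2 scaling alone: {voltage_only_ratio:.3f}x")
print(f"V^2 and f scaling: {combined_ratio:.4f}x")
```

Because the capacitance and activity cancel in the ratios, the choice of `C_SWITCHED` does not affect them. The sketch covers only the dynamic term of (1); short-circuit and leakage power are not modeled.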

The transmission gate flip-flop (TGFF) [13] is widely used in digital circuits. A TGFF contains several internal signals that toggle even if the input is not changing, creating excessive dynamic power consumption. In addition, the clock load is high because the dual-phase clock of the TGFF toggles even when the input remains unchanged.

To address these challenges, different types of FFs have been proposed in the last decade to reduce the dynamic power and the clock load of the conventional static TGFF. In [3], an Adaptive Coupling FF (ACFF) is introduced, where an adaptive coupling technique uses a differential latch structure with pass transistors to obtain true single-phase clock (TSPC) operation. However, the data contention problem in the ACFF makes it unreliable in NTV operation. Transistors with the contention problem need to be properly sized, making the circuit bulky. The topologically compressed FF (TCFF) [4] is regarded as one of the first designs to eliminate redundant transistors in the FF. In the TCFF, transistor merging and sharing methods are used to eliminate redundant transistors. The transistor count of the TCFF is thereby reduced to 21, with 3 clocked transistors. However, the TCFF suffers from contention issues when the clock signal is low and the input D signal transitions from low to high.

The data activity ratio of most FFs is now between 10% and 25% [4]. For a large portion of the time, the FF remains in standby mode, in which the D input and Q output do not change. One approach to reduce power is to minimize the clock transitions when the input of the FF is not changing. In the single-phase static contention-free FF (SC2FF) [2] and the 18-transistor true single-phase clocked FF (18TSPC) [5], redundant clock transitions are partially eliminated. The SC2FF has 5 clock transistors, fewer than the TGFF, thus reducing the clock load. Although the 18TSPC [5] uses 4 clock transistors with low power consumption at high data activities, the transitions of its internal signals hamper its power efficiency at low data activities. In the three-clock-transistor true single-phase clocked FF (3CTSPC) proposed in [9], ultra-low power is achieved by reducing the clock transistors to only three. Similarly, the FF in [8], known as the very low-power FF (VLFF), achieves low power consumption by minimizing the clocked transistor count to two. However, these FFs are vulnerable to different process corners in an NTV below 0.6 V. The redundancy eliminated FF (REFF) proposed in [10] eliminates redundant clock transitions, achieving an ultra-low-power, NTV-compatible design. However, the circuit overhead is large, which slows down the operation of the FF. Therefore, a balanced solution for the design of FFs is needed in terms of power efficiency, circuit area, and reliability for NTV operation.

## III. PROPOSED FLIP-FLOP DESIGNS

### A. A Low Transistor Count Flip-Flop (LTCFF)

In the first proposed design, we aim to reduce the transistor count in order to keep both the delay and the power consumption small. The design process starts from the Boolean function of a negative-edge-triggered primary-secondary FF. Fig. 1(a) illustrates a multiplexer-based negative edge-triggered primary-secondary FF design according to the Boolean expressions below:

$$MID = D \cdot clk + MID_{pre} \cdot \overline{clk}. \tag{3}$$

$$Q = MID \cdot \overline{clk} + Q_{pre} \cdot clk. \tag{4}$$

Here, *MID* is the output of the primary latch. *MID*<sub>pre</sub> and $Q_{pre}$ are the values of *MID* and *Q* stored in the previous clock cycle. The FF operates with a negative edge-triggered clock. When the clock signal is logic 1, the primary latch updates the *MID* signal from input *D*, and the secondary latch keeps its value from the previous clock cycle. When the clock signal is logic 0, the primary latch holds the *MID* value, whereas the secondary latch updates the *Q* output signal. A MUX-based FF can be implemented with CMOS logic gates operating with a single-phase clock, as shown in Fig. 1(b). In Fig. 1(b), two sets of compound AND-OR-INVERTER (AOI) gates are used: M1, M2, and M5, M6. M3 and M4 function as basic NOR gates, and I1 and I2 are two inverters. Each of the compound gates consists of 6 transistors, while the NOR gates and inverters are built with 4 and 2 transistors, respectively. Therefore, there are 24 transistors in the FF, including 8 clock transistors in M1, M3, M4, and M5. The intermediate signals are denoted as A1-A7 for reference in the later designs.

Fig. 1. (a) A MUX-based FF, (b) A logic representation of the MUX-based FF, (c) A truth table for the major signals in the FF in (b), (d) A design after gate-level reduction. PL: Primary Latch, SL: Secondary Latch.

Fig. 2. (a) A transistor-level circuit of Fig. 1(d), (b) Inverter reduction and transistor merging, (c) Proposed LTCFF, (d) Proposed ULPFF.
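As a quick sanity check of the negative edge-triggered behavior described by (3) and (4), the following minimal Python sketch evaluates the two expressions over a short waveform. It is a purely behavioral model on 0/1 integers; the function names `ff_step` and `simulate` and the example waveforms are illustrative assumptions, not part of the proposed design.

```python
def ff_step(d, clk, mid_pre, q_pre):
    """One evaluation of Eqs. (3) and (4) on 0/1 integers."""
    nclk = 1 - clk
    mid = (d & clk) | (mid_pre & nclk)  # Eq. (3): primary latch
    q = (mid & nclk) | (q_pre & clk)    # Eq. (4): secondary latch
    return mid, q


def simulate(d_wave, clk_wave, mid0=0, q0=0):
    """Apply ff_step sample by sample and record Q."""
    mid, q = mid0, q0
    q_trace = []
    for d, clk in zip(d_wave, clk_wave):
        mid, q = ff_step(d, clk, mid, q)
        q_trace.append(q)
    return q_trace


# Example waveform (assumed): one sample per clock phase.
clk_wave = [1, 0, 1, 0, 1, 0]
d_wave = [1, 1, 0, 0, 1, 0]
q_trace = simulate(d_wave, clk_wave)
print(q_trace)  # [0, 1, 1, 0, 0, 1]
```

In the last sample, D changes to 0 while the clock is already low, yet Q takes the value 1 that was captured while the clock was high. This is the expected behavior of a negative edge-triggered FF: Q follows the value of D present just before the falling edge.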

1) *Gate Level Reduction:* The LTCFF aims to remove redundant transistors while maintaining the FF's functionality. The clock transistors are the prime candidates for the reduction. Fig. 1(c) shows the truth table for the operation of the circuit. The primary (or secondary) latch is transparent when the clock is logic 1 (or logic 0) and opaque when the clock is logic 0 (or logic 1). For the gate-level reduction, we look for gates whose output signals are identical for every input combination. As shown in Fig. 1(c), signals A1 and A4, i.e., the outputs of gates M3 and M4, are identical and can be merged into one. When the clock is logic high, A1 and A4 are logic 0, and both become $\overline{A3}$ when the clock transitions to low. So, M4 can be eliminated from the FF design. Since A1 and A4 are the same signal, A1 can be connected directly to the input of M6. Fig. 1(d) shows the gate-level merged design. In this gate-merging process, the elimination of gate M4 removes a total of 4 transistors, 2 of which are clock transistors.

2) *Transistor Level Reduction:* Fig. 2(a) shows the transistor-level schematic diagram corresponding to Fig. 1(d). After the gate-level reduction, the total transistor count is 20 with 6 clock transistors (P1, P4, P7, N1, N4, and N7). The initial transistor-level reduction starts with optimizing the inverters. As shown in Fig. 2(a), Q and A7 are both equal to $\overline{A6}$. Therefore, Q and A7 can be considered as the same signal, so one of the inverters (highlighted in red in Fig. 2(a)) can be removed. This is denoted as the first phase of reduction, as illustrated in Fig. 2(b).

In Fig. 2(b), the clock transistors identified for potential reduction are highlighted in blue. P4 and P7 are clock transistors connected in parallel to the transistors controlled by D and Q. P4 and P7 can be merged into a single transistor, as shown in Fig. 2(c). In Fig. 2(c), the P4 transistor is shared between the primary and secondary latches and can be considered a bridge between them. Similarly, the clock transistors N4 and N7 in Fig. 2(b) can be merged into a single clock transistor, N4 in Fig. 2(c), as they are both connected to the ground. The total transistor count is now 16 with 4 clock transistors (P1, P4, N1, and N4). The sizing of the transistors here is straightforward: all the PMOS transistors are 2x the size of the NMOS ones.
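The transistor bookkeeping of the reduction steps above can be tallied with a short Python sketch. The per-gate sizes and the counts of 24, 20, and 16 transistors come from the text; the intermediate count of 18 after the inverter removal is not stated explicitly and follows from the 2-transistor inverter size. The variable names are illustrative.

```python
# Transistors per gate type, as stated for the MUX-based FF of Fig. 1(b).
GATE_TRANSISTORS = {"AOI": 6, "NOR": 4, "INV": 2}

# Two compound AOI gates, two NOR gates, two inverters.
start_total = (2 * GATE_TRANSISTORS["AOI"]
               + 2 * GATE_TRANSISTORS["NOR"]
               + 2 * GATE_TRANSISTORS["INV"])
start_clock = 8  # clock transistors in M1, M3, M4, and M5

# (step, change in total transistors, change in clock transistors)
steps = [
    ("eliminate NOR gate M4 (A4 merged into A1)", -4, -2),
    ("remove the redundant inverter", -2, 0),
    ("merge P4/P7 and N4/N7", -2, -2),
]

history = [("MUX-based FF, Fig. 1(b)", start_total, start_clock)]
total, clock = start_total, start_clock
for label, d_total, d_clock in steps:
    total += d_total
    clock += d_clock
    history.append((label, total, clock))

for label, total_count, clock_count in history:
    print(f"{label:45s} total={total_count:2d} clock={clock_count}")
```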

### B. An Ultra-Low Power Flip-Flop (ULPFF)

The proposed ULPFF is an extension of the LTCFF. As can be seen from Fig. 2(c), the LTCFF has a redundant internal transition: the A1 signal toggles with the clock when D = 1. In this case, A1 becomes $\overline{A3}$ when the clock is logic low and transitions to 0 when the clock is logic high. As a result, the FF consumes extra dynamic switching power when D stays at logic 1 for a long time. To overcome this issue, we modify the function that produces A1. In the LTCFF, the A1 signal comes from a NOR gate with the clock and A3 signals as inputs. To prevent undesired toggling when D = 1 and Q = 1, this function is extended with conditions on D and Q that gate the clock input. The updated Boolean expression for A1 is provided below:

$$A1 = \overline{(A6 + Db) \cdot clk + A3}. \tag{5}$$

Here, Db and A6 are the inverted D and Q signals, respectively. This modified function ensures that the A1 signal will not change with the clock when D = 1 and Q = 1. To do that, we need to add two PMOS transistors, two NMOS transistors, and an inverter to invert the D signal. In Fig. 2(d), A6 is connected to the gate of P9 and N9, whereas Db works as a gate signal for P10 and N10. The transistor count is now increased to 22 with 4 clock transistors. However, the 6 extra transistors ensure no internal transition when the D input and Q output are not changing, which leads to an ultra-low power operation. To prevent any contention issues between A1 and A3, transistors P1, P2, P3, and P5 are resized. P1 and P2 are resized as 3x while P3 and P5 are scaled down from 2x to 1x of the NMOS transistors. The circuit will be slower compared to the LTCFF. However, the power consumption is much lower compared to the LTCFF.
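The effect of (5) can be checked by enumerating all input combinations and comparing it with the NOR function used for A1 in the LTCFF. The following Python sketch is a truth-table check only; the function names `a1_ltcff` and `a1_ulpff` are illustrative assumptions.

```python
from itertools import product


def a1_ltcff(clk, a3):
    """A1 in the LTCFF: NOR of the clock and A3."""
    return 1 - (clk | a3)


def a1_ulpff(clk, a3, d, q):
    """A1 in the ULPFF according to Eq. (5), with Db = not D and A6 = not Q."""
    db, a6 = 1 - d, 1 - q
    return 1 - (((a6 | db) & clk) | a3)


# Input combinations (clk, A3, D, Q) where the two functions disagree.
differences = [
    (clk, a3, d, q)
    for clk, a3, d, q in product((0, 1), repeat=4)
    if a1_ltcff(clk, a3) != a1_ulpff(clk, a3, d, q)
]
print(differences)  # [(1, 0, 1, 1)]
```

The two functions differ only for clk = 1, A3 = 0, D = 1, Q = 1. In that case the NOR gate pulls A1 low during the high clock phase, whereas (5) keeps A1 high, which is exactly the redundant transition that the ULPFF removes.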

### C. Operation of the Proposed FFs

The operation of the proposed LTCFF is slightly different from the conventional FFs. Let the internal signals A3 and Q be initialized as logic 1. When the clock signal is high (clk = 1) and D is low (D = 0), the N1 transistor in the primary latch will switch on, which makes A1 logic 0. Since both D and A1 are logic 0, transistors P3 and P5 will be turned on, and that makes A3 switch to logic 1. In the secondary latch, Q will hold its previous value. We have already discussed that A6 is the inverted signal of Q. Since Q is initialized as logic 1, A6 will be logic 0. As both Q and clk are 1, N4 and N6 will be turned on, and A6 will remain at logic 0. If Q is initialized as logic 0 (A6=1), transistors P6 and P8 will be on, causing A6 to hold logic 1.

During the falling edge of the clock, A3 will hold its previous value of logic 1, which keeps A1 at logic 0. Q will change from its previous state. Since Q was stored as logic 1 previously, P6 will be off. A6 will be charged to 1 as transistors P3, P4, and P8 are on because D, clock, and A1 are all logic 0, causing Q to change its value from 1 to 0. If the previous Q value is logic 0, it will remain at logic 0 as P6 and P8 are both on, and A6 will be logic 1, which means that Q keeps its value at logic 0.

Now, let us consider that Q and A3 are initialized as logic 0. When D=1 and clock=1, both A1 and A3 are logic 0. In the secondary latch, Q will hold its value. If Q was previously stored as logic 0, A6 will be logic 1 since P6 and P8 are both on, which makes Q hold its previous value. Similarly, if Q is logic 1, A6 will be logic 0. Since both N4 and N6 are on, Q will remain at logic 1. Upon the falling edge of the clock, both A3 and the clock are logic 0, causing A1 to change its value from 0 to 1. Since A1 is logic 1, A3 retains its value, and N8 is turned on in the secondary latch. A6 is logic 0, which makes Q become logic 1.

The ULPFF exhibits a slight variation from the LTCFF. The operational behavior of the ULPFF remains identical to LTCFF when the input D is at logic 0. However, when the D and clock inputs are logic-high, A3 is pulled down by transistors N3 and N4. In the secondary latch, Q retains its previous value when the clock is at logic high.

After A3 transitions to logic 0, A1 attains a logic high state when the clock becomes logic low. As A1 becomes logic high, the transistor N8 is activated, resulting in A6 becoming logic 0. At the output, if Q was previously logic 0, it toggles to logic high. Once Q becomes logic high and A6 becomes logic low, A1 retains its logic high state through the following clock transitions, as A6 and Db keep transistors P9 and P10 active. A1 keeps its value unless the D input transitions to logic 0. With this mechanism, we prevent the unnecessary toggling of A1 with the clock when D and Q are both logic 1.
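The operation described in this subsection can be reproduced with a small gate-level behavioral model in Python. This is only a sketch under stated assumptions: A1 follows the NOR function (LTCFF) or (5) (ULPFF) and Q is the inverse of A6, as given in the text, whereas the AOI expressions used for A3 and A6 are inferred from the textual description of the transistors and are not taken from the schematic. The function names `settle` and `run`, the initial state, and the input pattern are illustrative. Timing, sizing, and contention are not modeled.

```python
def settle(d, clk, state, ulpff=False, max_iters=8):
    """Iterate the gate equations until the node values stop changing."""
    a1, a3, a6, q = state
    for _ in range(max_iters):
        prev = (a1, a3, a6, q)
        if ulpff:
            a1 = 1 - (((a6 | (1 - d)) & clk) | a3)  # Eq. (5)
        else:
            a1 = 1 - (clk | a3)                     # NOR(clk, A3)
        a3 = 1 - ((d & clk) | a1)  # assumed AOI form, primary latch
        a6 = 1 - ((q & clk) | a1)  # assumed AOI form, secondary latch
        q = 1 - a6                 # Q is the inverse of A6
        if (a1, a3, a6, q) == prev:
            return prev
    raise RuntimeError("node values did not settle")


def run(d_wave, ulpff=False):
    """Apply one D value per clock cycle (clock high, then low)."""
    state = (0, 1, 1, 0)  # A1, A3, A6, Q: a consistent state with Q = 0
    q_after_fall, a1_toggles = [], 0
    for d in d_wave:
        for clk in (1, 0):
            new_state = settle(d, clk, state, ulpff)
            a1_toggles += int(new_state[0] != state[0])
            state = new_state
        q_after_fall.append(state[3])
    return q_after_fall, a1_toggles


d_wave = [1, 1, 1, 1, 0, 0]
q_ltcff, toggles_ltcff = run(d_wave, ulpff=False)
q_ulpff, toggles_ulpff = run(d_wave, ulpff=True)
print(q_ltcff, toggles_ltcff)  # [1, 1, 1, 1, 0, 0] 8
print(q_ulpff, toggles_ulpff)  # [1, 1, 1, 1, 0, 0] 2
```

Under these assumptions, both models capture D at the falling clock edge, but A1 toggles 8 times in the LTCFF model and only 2 times in the ULPFF model for the same input pattern, which illustrates the redundant A1 transitions while D and Q are both logic 1.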

## IV. SIMULATION RESULTS

All simulations were performed in a 65-nm TSMC technology using Cadence Virtuoso. The PMOS-to-NMOS size ratio of the FFs is kept at 2:1 unless there are special conditions. To emulate realistic driving conditions, the inputs (D and Clk) are driven through two chained inverters in the testbench. In addition, the output node is connected to four parallel inverters to create a fanout of 4. The supply voltage was varied from 1 V to 0.4 V to measure the circuit's performance in the NTV. The clock frequency is 1 GHz for supply voltages from 1 V down to 0.8 V. For 0.6 V, we considered 50 MHz as the optimal frequency, and it is reduced to 25 MHz for 0.4 V. The temperature is set to 27 degrees Celsius for most of the simulations. The experiments assess whether the circuits work well at the nominal supply voltage of 1 V as well as at NTVs.

Fig. 3 shows the transient analysis of the proposed LTCFF and ULPFF. As can be seen in Fig. 3(a), there is no voltage degradation or loss in the transient analysis at 1 V. At a supply voltage of 0.4 V and a clock frequency decreased to 25 MHz, Fig. 3(b) shows that there is still no degradation or voltage loss, although the setup and hold times increase at the NTV. Both designs achieve contention-free, static, and reliable operation without any internal node conflicts during switching. This enhances their robustness and makes them well suited to NTV operation.

Table I compares the LTCFF and ULPFF with state-of-the-art FF designs. The table highlights metrics such as the transistor count, the number of clock transistors, timing analysis, and power consumption at 10% data activity at the 1 GHz/1 V/TT and 25 MHz/0.4 V/TT corners. For a fair comparison in the NTV, we chose the frequency within the 25 MHz range.

Fig. 3. Transient analysis of the FFs at (a) 1 GHz / 1 V/ TT corner and (b) 25 MHz / 0.4 V/ TT corner.

TABLE I

OVERALL COMPARISON OF ALL CONSIDERED FLIP-FLOPS.

| FF designs | TGFF | SC2FF [2] | ACFF [3] | TCFF [4] | 18TSPC [5] | 3CTSPC [9] | VLFF [8] | LLTFF [7] | REFF [10] | Proposed LTCFF | Proposed ULPFF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Transistor counts | 24 | 28 | 22 | 21 | 18 | 21 | 19 | 16 | 25 | 16 | 22 |
| Clock transistors | 12 | 8 | 4 | 3 | 4 | 3 | 2 | 4 | 4 | 4 | 4 |
| Contention free | Yes | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Voltage 1 V, Clock frequency 1 GHz, (TT, 27°C) | | | | | | | | | | | |
| clk to Q delay (CQ) (ps) | 65.4 | 63.43 | 60.24 | 85.15 | 68.14 | 91.33 | 63.6 | 68.48 | 71.55 | 59.48 | 73.04 |
| D to Q delay (DQ) (ps) | 86.47 | 99.32 | 132.79 | 148.51 | 112.14 | 131.52 | 134.35 | 92.69 | 110.5 | 80.48 | 99.04 |
| Setup time (ps) | 21.07 | 35.89 | 72.55 | 63.36 | 44 | 40.19 | 70.75 | 24.21 | 38.5 | 21 | 26 |
| Hold time (ps) | -12.85 | -32.4 | -32.16 | -36.74 | 37 | 21.03 | 29.56 | 11.62 | -28.95 | 17 | 20 |
| Average Power (µW) (DA=10%) | 17.68 | 7.93 | 6.51 | 4.37 | 7.32 | 4.57 | 5.78 | 7.59 | 5.11 | 6.07 | 4.08 |
| $PDP_{CQ}$ (aJ) (DA=10%) | 1156 | 503 | 392 | 372 | 499 | 417 | 368 | 520 | 366 | 361 | 298 |
| Voltage 0.4 V, Clock frequency 25 MHz, (TT, 27°C) | | | | | | | | | | | |
| clk to Q delay (CQ) (ps) | 1340 | 3060 | 1800 | 1830 | 1540 | 3250 | 3820 | 1990 | 1890 | 1530 | 1980 |
| D to Q delay (DQ) (ps) | 2020 | 4050 | 3796 | 2790 | 2410 | 4210 | 4810 | 2730 | 2850 | 1960 | 2560 |
| Setup time (ps) | 680 | 990 | 1996 | 960 | 870 | 960 | 990 | 740 | 960 | 430 | 580 |
| Hold time (ps) | -460 | 100 | -650 | -690 | 300 | -200 | 830 | 770 | -250 | 210 | 240 |
| Average Power (nW) (DA=10%) | 126.7 | 53.74 | 35.4 | 41.03 | 53.32 | 27.03 | 36.05 | 44.52 | 30.13 | 36.5 | 25.05 |
| $PDP_{CQ}$ (aJ) (DA=10%) | 170 | 164 | 64 | 75 | 82 | 87 | 138 | 89 | 57 | 56 | 50 |

The LTCFF and the low-voltage low-power 16-transistor FF (LLTFF) achieve the lowest transistor count, indicating minimal area usage and the potential for the lowest fabrication cost. The LTCFF provides the lowest clock-to-Q delay at 1 V and the second lowest at 0.4 V (after the TGFF), making it a strong choice for speed-critical applications in NTV. The ULPFF has a higher clock-to-Q delay and a higher transistor count, but it still outperforms several other FF architectures, such as the TCFF and 3CTSPC. The setup time of the LTCFF is the lowest. At 1 V, the ULPFF requires a slightly higher setup time than the TGFF and LLTFF; however, it performs better than the other FFs. The LTCFF achieves the lowest D-to-Q delay overall, while the ULPFF attains a reasonable D-to-Q delay.
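When reading the timing rows of Table I, it helps to note that the listed D-to-Q delays appear to equal the sum of the clock-to-Q delay and the setup time. The paper does not state this definition explicitly, so the following Python sketch should be read as an observation on the tabulated 1 V values rather than as the authors' measurement procedure.

```python
# (clk-to-Q ps, setup ps, D-to-Q ps) at 1 V / 1 GHz / TT, copied from Table I.
timing_1v = {
    "TGFF": (65.40, 21.07, 86.47),
    "LLTFF": (68.48, 24.21, 92.69),
    "LTCFF": (59.48, 21.00, 80.48),
    "ULPFF": (73.04, 26.00, 99.04),
}

# Difference between the tabulated D-to-Q delay and (clk-to-Q + setup).
residuals = {
    name: round(dq - (cq + setup), 2)
    for name, (cq, setup, dq) in timing_1v.items()
}

for name, residual in residuals.items():
    print(f"{name:6s} DQ - (CQ + setup) = {residual:+.2f} ps")
```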

The ULPFF achieves the lowest average power as it does not have any redundant internal signal transitions, making it power-efficient at low data activities. Although the ULPFF has a higher clock-to-Q delay, it achieves the lowest PDP among all the FFs (see $PDP_{CQ}$ in Table I). The simulation results in Table I show that, at a 1 V supply voltage and 1 GHz clock frequency, the LTCFF and ULPFF reduce the PDP by 68.77% and 74.22%, respectively, compared to the widely used TGFF.
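The PDP figures quoted above can be reproduced from the average power and clock-to-Q delay rows of Table I. The following Python sketch uses the 1 V/1 GHz values at 10% data activity; rounding the PDP to whole attojoules before computing the reduction is an assumption made here to match the tabulated values.

```python
# (average power in uW, clk-to-Q delay in ps) at 1 V / 1 GHz, DA = 10%, from Table I.
table_1v = {
    "TGFF": (17.68, 65.40),
    "LTCFF": (6.07, 59.48),
    "ULPFF": (4.08, 73.04),
}


def pdp_aj(power_uw, delay_ps):
    """Power-delay product in attojoules: 1 uW * 1 ps = 1e-18 J = 1 aJ."""
    return power_uw * delay_ps


pdp = {name: round(pdp_aj(p, t)) for name, (p, t) in table_1v.items()}
reduction = {
    name: 100 * (1 - pdp[name] / pdp["TGFF"])
    for name in ("LTCFF", "ULPFF")
}

print(pdp)  # {'TGFF': 1156, 'LTCFF': 361, 'ULPFF': 298}
for name, value in reduction.items():
    print(f"{name}: {value:.2f}% lower PDP than the TGFF")
```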

At NTV (0.4 V), conventional FFs exhibit significant degradation. However, the proposed FFs maintain a robust performance. The ULPFF has a significantly higher clock-to-Q delay; however, its power is the lowest among all the FFs. Therefore, ULPFF achieves the lowest PDP among all the FFs. The LTCFF reaches a moderate clock-to-Q delay and average power consumption. Both designs achieve the lowest PDPs and outperform the TGFF by 67.05% and 70.58%, respectively.

Fig. 4 compares the average power of the FFs under different data activities (1%, 10%, 25%, 50%). The ULPFF consumes the lowest power among all the FFs at low data activities. As the ULPFF uses an inverted signal (Db), the power consumption increases at high data activities. On the other hand, the LTCFF requires some extra power due to the redundant internal transitions of the A1 signal even when both D and Q are not changing. The impact of that redundant transition on power is higher at low data activities than at high data activities. Thus, the LTCFF is more power-efficient as the data activity increases. Fig. 5 shows the PDP comparison of the FFs at the 50 MHz/0.6 V/TT corner. Both the ULPFF and LTCFF demonstrate significantly lower PDP values compared to the conventional FFs. To ensure robustness, the PDP of the FFs is analyzed across different process corners: Typical-Typical (TT), Slow-Slow (SS), Fast-Slow (FS), Slow-Fast (SF), and Fast-Fast (FF) in Fig. 6. The proposed FFs demonstrate low sensitivity to process variations, making them more reliable across different conditions.

Fig. 4. Comparison of the average power consumption of various FFs for different data activities at 50 MHz/ 0.6 V/ TT corner.

Fig. 5. Comparison of the power delay product (PDP) of the various FFs at 50 MHz/ 0.6 V/ TT corner.

## V. CONCLUSION

Two static contention-free FFs are proposed with different design objectives. The LTCFF achieves a low transistor count of only 16 after a reduction of redundant transistors via a sharing and merging process. The ULPFF provides ultra-low power operation at low data activities by eliminating redundant internal transitions of major signals. Both designs require only 4 clock transistors, which keeps the clock load small. At a clock frequency of 1 GHz, 10% data activity, and a supply voltage of 1 V, the LTCFF and ULPFF achieve a significant reduction in PDP compared to the conventional TGFF. The ULPFF reaches the lowest PDP at different supply voltages among all the considered FFs. Both proposed designs also maintain the advantage of low PDPs at different process corners and supply voltages, making them reliable for voltage scaling in NTV operation.

Fig. 6. Comparison of the power delay product of the FFs for different process corners at 50 MHz / 0.6 V.

## REFERENCES

- [1] S. Hsu, A. Agarwal, S. Realov, M. Anders, G. Chen, M. Kar, R. Kumar, H. Sumbul, P. Knag, H. Kaul *et al.*, "Low-clock-power digital standard cell IPs for high-performance graphics/AI processors in 10nm CMOS," in 2020 IEEE Symposium on VLSI Circuits. IEEE, 2020, pp. 1–2.

- [2] Y. Kim, W. Jung, I. Lee, Q. Dong, M. Henry, D. Sylvester, and D. Blaauw, "27.8 A static contention-free single-phase-clocked 24T flip-flop in 45nm for low-power applications," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE, 2014, pp. 466–467.
- [3] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40nm CMOS," in 2011 IEEE International Solid-State Circuits Conference. IEEE, 2011, pp. 338–340.
- [4] N. Kawai, S. Takayama, J. Masumi, N. Kikuchi, Y. Itoh, K. Ogawa, A. Ugawa, H. Suzuki, and Y. Tanaka, "A fully static topologically-compressed 21-transistor flip-flop with 75% power saving," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2526–2533, 2014.
- [5] Y. Cai, A. Savanth, P. Prabhat, J. Myers, A. S. Weddell, and T. J. Kazmierski, "Ultra-low power 18-transistor fully static contention-free single-phase clocked flip-flop in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 2, pp. 550–559, 2018.
- [6] A. Khorami, M. Sachdev, and M. Sharifkhani, "A contention-free, static, single-phase flip-flop for low data activity applications," in 2019 32nd IEEE International System-on-Chip Conference (SOCC). IEEE, 2019, pp. 11–16.
- [7] J.-F. Lin, Z.-J. Hong, J.-T. Wu, X.-Y. Tung, C.-H. Yang, and Y.-C. Yen, "Low-voltage and low-power true-single-phase 16-transistor flip-flop design," *Sensors*, vol. 22, no. 15, p. 5696, 2022.
- [8] Y. Maheshwari and M. Sachdev, "VLFF-a very low-power flip-flop with only two clock transistors," in 2023 IEEE 36th International System-on-Chip Conference (SOCC). IEEE, 2023, pp. 1–6.
- [9] Y. K. Maheshwari and M. Sachdev, "A power-efficient, single-phase, contention-free flip-flop with only three clock transistors," *Microelectronics Journal*, vol. 152, p. 106390, 2024.
- [10] G. Shin, E. Lee, J. Lee, Y. Lee, and Y. Lee, "An ultra-low-power fully-static contention-free flip-flop with complete redundant clock transition and transistor elimination," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 10, pp. 3039–3048, 2021.
- [11] Z. Wang, P. Zhao, T. Springer, C. Zhu, J. Mau, A. Wells, Y. Xia, and L. Wang, "Low-power redundant-transition-free TSPC dual-edge-triggering flip-flop using single-transistor-clocked buffer," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 31, no. 5, pp. 706–710, 2023.
- [12] P. Zhao, Z. Wang, and G. Hang, "Power optimization for VLSI circuits and systems," in 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology. IEEE, 2010, pp. 639–642.
- [13] G. Gerosa, S. Gary, C. Dietz, D. Pham, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito, T. Ngo, S. Litch *et al.*, "A 2.2 W, 80 MHz superscalar RISC microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 12, pp. 1440–1454, 1994.