# An Ultra-Low-Power Non-Uniform Derivative-Based Sampling Scheme With Tunable Accuracy

Mohammad Elmi, Martin Lee, and Kambiz Moez, Senior Member, IEEE

Abstract-This paper presents an ultra-low-power non-uniform sampling scheme based on a derivative-dependent algorithm. The scheme maintains accuracy comparable to other non-uniform sampling schemes with less complexity and lower power consumption. In this method, a change in the signal derivative that exceeds a threshold is used to identify high signal activity, so that the significant points of the signal are retained. The scheme is implemented with simple building blocks that calculate the change in an approximate real-time derivative and compare it with a tunable threshold. This threshold can be adjusted to obtain the desired Compression Factor (CF) and Post-Reconstruction Signal-to-Noise plus Distortion Ratio (PR-SNDR) for different signal types. The proposed Derivative-Dependent Sampling (DDS) system was fabricated in TSMC's 0.13-$\mu$m CMOS technology and tested with real-world biomedical signals. It consumes a maximum power of 155 nW while achieving a CF of more than 6 for an Electrocardiography (ECG) signal. Adding the proposed DDS block to a data acquisition and processing system enables non-uniform sampling, which can reduce the power dissipation of the entire system.

*Index Terms*—Analog signal processing, derivative-dependent sampling, non-uniform sampling, low-power data acquisition.

# I. INTRODUCTION

**P**OWER-efficient data acquisition and processing are essential for the development of low-power electronic circuits and systems, including sensors/actuators, wireline/wireless communication systems, radars, and data processors [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. The power consumption of a data acquisition system depends mostly on the data sampling/processing rate and the required bit resolution (the number of bits that represents each sample). A higher data rate or resolution demands more power. The trade-off between power consumption and data rate/resolution cannot be avoided in a static system. However, power consumption can be significantly reduced by dynamically adjusting the data acquisition and processing parameters while preserving the required data accuracy. Recent research has presented novel dynamic data acquisition techniques to break this trade-off [4], [5], [9], [11], [12], [13], [14], [15], [16], [17], [18], [19].

Manuscript received 7 February 2023; revised 24 March 2023; accepted 16 April 2023. Date of publication 26 April 2023; date of current version 28 June 2023. This article was recommended by Associate Editor A. Worapishet. (*Corresponding author: Mohammad Elmi.*)

The authors are with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2R3, Canada (e-mail: elmi.m@ualberta.ca; mklee@ualberta.ca; kambiz@ualberta.ca).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2023.-fileno-.

Digital Object Identifier 10.1109/TCSI.2023.3268611

Fig. 1. An arbitrary signal sampled by (a) a uniform sampling scheme, and (b) an example of an NUS scheme (LC method).

Most conventional data acquisition methods sample the signal uniformly at a fixed sampling interval, regardless of its shape and characteristics, and quantize each sample (Fig. 1(a)). In such uniform, signal-independent sampling methods, the sampling frequency must be above the Nyquist rate to retain all signal information [20]. The uniform scheme is blind to the signal properties: the sampling interval does not change, regardless of whether the signal is in an active or inactive region. The signal can then be reconstructed from the collected data simply by linear interpolation between sampling points. Increasing the sampling frequency reduces the reconstruction error, but at the cost of higher power consumption. Data compression techniques using complex algorithms such as Compressed Sensing (CS) can be applied to reduce the amount of recorded data and the associated power consumption [20], [21], [22]. However, these techniques do not reduce the power consumed by data acquisition itself, because the original data are still sampled at a fixed rate.

In contrast to uniform sampling techniques, a Non-Uniform Sampling (NUS) scheme takes the input signal characteristics into account. As an example of an NUS technique, Fig. 1(b) shows a signal reconstructed with the Level-Crossing (LC) method. In this method, the signal is sampled only when its amplitude change exceeds a predefined level [18], [23], [24]. In active regions, where the signal amplitude changes quickly, the signal is sampled frequently. In inactive regions, where the amplitude is approximately constant, no samples are taken. Overall, the number of samples is reduced relative to uniform sampling by a factor greater than one, known as the CF, which leads to significant power savings. This improvement in power consumption comes at the cost of lower accuracy in the reconstructed signal, which can be acceptable for many signals with low-frequency content. Several approaches have been suggested to reconstruct the signal with higher accuracy than conventional LC: increasing the number of levels (i.e., reducing the quantization step), using adaptive levels [17], or using clockless level crossing [18], [19]. Other NUS methods implement more complex mathematical relations at the analog circuit level. For example, the Slope-Dependent Sampling (SDS) technique [12] compares slopes computed from three sampled/stored points of the input signal, realizing a defined mathematical relation in analog circuitry. This solution produces less error than LC. However, it requires three successive sampling stages and power-hungry building blocks with complex designs.

1549-8328 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 2. Signal acquisition systems with (a) typical NUS, and (b) proposed DDS sampling scheme.

In this paper, we propose an NUS technique that detects changes in the derivative of the input signal exceeding an adjustable threshold. The proposed technique does not require storing multiple sample points or additional auxiliary circuits. As a result, it can be implemented with a power-efficient circuit configuration and algorithm while achieving accuracy comparable to prior methods. The proposed technique is implemented as a CMOS integrated circuit and tested in a mixed-signal system with various types of signals, such as ECG and Photoplethysmography (PPG), to demonstrate its efficacy. The rest of the paper is organized as follows. Prior NUS algorithms are introduced in Section II, and the proposed technique is introduced and compared with them in Section III. Section IV discusses the circuit implementation of the proposed method and describes the techniques used to reduce system power consumption. Section V presents and compares the experimental results of the system in various applications. Section VI concludes the paper.

# **II. PRIOR NUS TECHNIQUES**

In contrast to a data acquisition and processing system with uniform sampling, an NUS system reduces the number of sampled points, which helps reduce the total system power consumption. The circuit that implements the NUS scheme, the NUS block, determines the most valuable points of the input signal to sample and retain for further processing. An NUS system should achieve the highest possible CF while keeping the deviation of the reconstructed signal from the original signal acceptably low. Most of today's NUS schemes are implemented as part of the data acquisition chain, as shown in Fig. 2(a) [4], [10], [11], [12], [17]. As a result, the presence of the NUS block in the signal path can degrade the input Signal-to-Noise Ratio (SNR). The NUS system proposed in [12] uses the SDS scheme, in which the Analog-to-Digital Converter (ADC) receives a sample of the input signal after three successive sample-and-hold circuits. Therefore, any sampling offset and additional noise introduced by these stages degrade the SNR. NUS systems using the LC scheme employ several analog pre-processing blocks (scalers, multipliers, subtractors, etc.) before conversion to the digital domain [16], [17], which also degrade the SNR. Moreover, the LC scheme generally detects the reference level corresponding to the input signal level with analog comparators, which are potentially vulnerable to offsets. The resolution of an LC system is limited by the number of these reference levels. Since increasing the number of reference levels requires more power, increasing the resolution of an LC system may not lead to lower overall power consumption despite the use of an NUS scheme. In this work, we construct an analog pre-processing NUS block that is entirely separated from the signal chain (Fig. 2(b)), which preserves the signal-to-noise performance. In addition to controlling the sampling instants of the ADC, the proposed block can control other blocks of the acquisition system.

As discussed, the NUS block generates a trigger/clock signal whenever a significant event occurs in the input signal. However, the definition of a significant event varies among NUS methods. Two NUS schemes from the prior state of the art are discussed below:

1) Level-Crossing (LC) Method: In the LC method, multiple reference levels are defined, and the interval between two consecutive levels is the quantization step, as shown in Fig. 3(a). The signal is sampled, quantized, and stored when it crosses one of the reference levels, which represents a significant event. For example, the arbitrary signal depicted in Fig. 1(b) (black curve) is non-uniformly sampled when it crosses the reference levels $L_1$ to $L_{11}$. The retained sampled points (red squares) are then used to reconstruct the signal (blue curve). The reconstructed signal tracks the input signal while sampling points are dropped in less-active regions, which results in significant power savings. The LC technique can be implemented with minimal complexity using ultra-low-power circuits.
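To make the LC rule concrete, the Python sketch below models a clocked LC sampler with uniformly spaced reference levels of step `delta`. A sample is retained whenever the input moves into a different quantization band than the last retained sample, and the retained quantized value is held (zero-order hold) for reconstruction. The function names `lc_sample` and `zoh_reconstruct`, the uniform level grid, and the absence of comparator hysteresis are simplifying assumptions for illustration. They are not details of the circuits in the cited works.

```python
import math


def lc_sample(x, delta):
    """Clocked level-crossing sketch (hypothetical helper).

    x: uniformly clocked input samples; delta: quantization step between
    reference levels. A sample is retained whenever x[k] lies in a different
    quantization band than the last retained sample.
    Returns a list of (index, quantized_value) pairs.
    """
    last_band = math.floor(x[0] / delta)
    retained = [(0, last_band * delta)]        # the first sample is always kept
    for k in range(1, len(x)):
        band = math.floor(x[k] / delta)
        if band != last_band:                  # at least one reference level was crossed
            retained.append((k, band * delta))
            last_band = band
    return retained


def zoh_reconstruct(retained, n):
    """Zero-order hold: each retained value is held until the next retained sample."""
    y = [0.0] * n
    for j, (k, level) in enumerate(retained):
        end = retained[j + 1][0] if j + 1 < len(retained) else n
        for t in range(k, end):
            y[t] = level
    return y


x = [0.0, 0.1, 0.3, 0.3, 0.6, 0.55, 0.1]
kept = lc_sample(x, 0.25)
print([k for k, _ in kept])           # [0, 2, 4, 6]
print(zoh_reconstruct(kept, len(x)))  # [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.0]
```

Only samples that move the signal into a new band are kept. Everything in between is represented by the held value, which is the source of the staircase-shaped error seen in the LC reconstructions.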

2) Slope-Dependent Sampling (SDS) Method: In the SDS method proposed in [12], significant events are identified by the difference in the slopes between sampled points rather than by the signal level itself. As shown in Fig. 3(b), V[m], V[n-1], and V[n] are the last retained point, the second-to-last sampled point, and the last sampled point, respectively. These three points are used to find the slopes $S_1$ and $S_2$.

Fig. 3. Scheme algorithm for (a) the LC, and (b) the SDS [12] techniques.

Fig. 4. Overall block diagram of the SDS system [12] to implement Eq. (1).

The difference between the two slopes, $|S_1 - S_2|$, is compared to a threshold value ($\varepsilon$). If the difference exceeds the threshold, V[n-1] is retained. The significant-event condition can be expressed as

$$\left|\frac{V[n] - V[n-1]}{T_s} - \frac{V[n-1] - V[m]}{(n-m-1) \times T_s}\right| \ge \varepsilon, \quad (1)$$

where $T_s$ is the sampling period. The scheme is implemented as an analog block next to an ADC, as shown in Fig. 4. It generates a trigger signal that enables the ADC whenever an event occurs. Other sampling points are dropped, and the ADC and the rest of the system are turned off during these intervals. To produce V[m], V[n-1], and V[n] in Eq. (1), three successive sample-and-hold circuits are used, and the last of these is clocked by the ADC trigger output. Two subtractors produce V[n] - V[n-1] and V[n-1] - V[m]. A binary resistor bank controlled by a clock provides the analog division in the second term of Eq. (1). A third subtractor, followed by amplifier stages and a comparator, realizes the final form of Eq. (1), ignoring constant terms such as the gain values. The overall structure is more complex than the LC method and requires a larger number of analog blocks.
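A behavioral sketch of the SDS decision rule in Eq. (1) is given below in Python. It assumes an ideal, noise-free sampled input `v` and treats the first sample as the initial retained point V[m]. `sds_sample` is a hypothetical name, and the sketch ignores the gain constants and analog non-idealities of the circuit in Fig. 4.

```python
def sds_sample(v, ts, eps):
    """Sketch of the SDS significant-event test of Eq. (1).

    v: uniformly clocked samples; ts: sampling period T_s; eps: threshold.
    Returns the indices of retained samples (index 0 is assumed retained).
    """
    retained = [0]
    for n in range(2, len(v)):
        m = retained[-1]                                  # last retained point V[m]
        s_recent = (v[n] - v[n - 1]) / ts                 # slope of the last two samples
        s_span = (v[n - 1] - v[m]) / ((n - m - 1) * ts)  # slope from V[m] to V[n-1]
        if abs(s_recent - s_span) >= eps:
            retained.append(n - 1)                        # Eq. (1) retains V[n-1]
    return retained


print(sds_sample([0, 1, 2, 3, 3, 3, 3], ts=1.0, eps=0.5))  # [0, 3]
```

For a ramp that suddenly flattens, only the corner at index 3 is retained after the initial point, because the two slopes agree everywhere else. The division by $(n-m-1)T_s$ is the term that the circuit in Fig. 4 realizes with the clocked resistor bank.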

Fig. 5 depicts the LC and SDS sampling schemes applied to an arbitrary input signal. The reconstructed signal and the error (the deviation of the reconstructed signal from the input) are shown for each method. For reconstruction, the points retained by the SDS technique are simply connected with straight lines (first-order interpolation). In the LC technique, the sampled level is held until the next significant change. No advanced reconstruction technique, such as higher-order interpolation or derivative level crossing [25], [26], is applied with either method. The CF of each method is calculated as the ratio of the number of sampling points in a uniform sampling method to the number of stored sampling points.

Fig. 5. Applying the LC and the SDS techniques to an arbitrary signal.

For a fair comparison, the resolution parameters (the number of reference levels in LC and $\varepsilon$ in SDS) are chosen to obtain a CF of 2 in both methods. The Root-Mean-Square Deviation (RMSD) of the error signal and the PR-SNDR are calculated to compare the performance of the two schemes. These parameters can be defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} x_{e}^{2}[k]}, \quad \text{and}$$

$$\text{PR-SNDR} = 10 \log_{10} \frac{\mathrm{Power}\left(x_{i} - \mathrm{mean}(x_{i})\right)}{\mathrm{Power}(x_{e})}, \quad (2)$$

where $x_i$ and $x_e$ are the input and error values, respectively, and $N$ is the number of samples. In the LC case, the RMSD and PR-SNDR are 0.018 and 31.4 dB, respectively. Increasing the number of reference levels can reduce the RMSD, but the CF will decrease. The RMSD of the SDS technique (0.0048) is remarkably lower than that of the LC technique because of its improved detection of significant events. Its PR-SNDR (42.7 dB) is also higher than that of the LC method (31.4 dB). The SDS method achieves higher accuracy (lower RMSD and higher PR-SNDR) than the LC method. However, its complexity, sensitivity to process and offset variations, and power consumption are noticeably higher.
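The metrics in Eq. (2) and the CF definition can be computed as in the Python sketch below. It assumes that Power(·) denotes the mean-square value of a sequence and that the error is the reconstructed signal minus the input. The function names are illustrative.

```python
import math


def rmsd(x_in, x_rec):
    """Root-mean-square of the error x_e = x_rec - x_in, as in Eq. (2)."""
    err = [r - i for i, r in zip(x_in, x_rec)]
    return math.sqrt(sum(e * e for e in err) / len(err))


def pr_sndr_db(x_in, x_rec):
    """PR-SNDR in dB: mean-square power of the mean-removed input divided by
    the mean-square power of the error (Power() assumed to be mean square)."""
    n = len(x_in)
    mu = sum(x_in) / n
    p_sig = sum((v - mu) ** 2 for v in x_in) / n
    p_err = sum((r - i) ** 2 for i, r in zip(x_in, x_rec)) / n
    return 10.0 * math.log10(p_sig / p_err)


def compression_factor(n_uniform, n_retained):
    """CF: samples taken by uniform sampling over samples retained by the NUS block."""
    return n_uniform / n_retained


x_in = [1.0, -1.0, 1.0, -1.0]
x_rec = [0.9, -0.9, 0.9, -0.9]
print(rmsd(x_in, x_rec))              # about 0.1
print(pr_sndr_db(x_in, x_rec))        # about 20 dB
print(compression_factor(1000, 160))  # 6.25
```

Subtracting the input mean before computing the signal power keeps a DC offset in the input from inflating the PR-SNDR.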

In general, in a clocked NUS scheme, the NUS block should be clocked at a rate equal to or higher than the Nyquist rate [12]. Otherwise, a significant event that should trigger the ADC might be missed. However, the ADC is triggered only when a significant event is detected. As a result, the NUS block dynamically adjusts the ADC sampling rate according to signal activity. The overall sampling rate can therefore be lower than the Nyquist rate, saving power as long as the desired accuracy is maintained after signal reconstruction. State-of-the-art NUS schemes therefore try to preserve accuracy while increasing the CF as much as possible by improving significant-event detection. This may increase the complexity and power consumption of the NUS block that implements the advanced detection mechanism.



Fig. 6. Implementation of the proposed DDS scheme (a) in a conceptual case by using a differentiator, and (b) in a practical case using sample-and-hold blocks and a subtractor.

# **III. PROPOSED SAMPLING SCHEME**

As discussed earlier, decision-making in previously proposed schemes suffers from trade-offs between accuracy, power consumption, and complexity. In this paper, we propose a scheme that achieves significantly higher accuracy than the LC method. It can also be implemented with less complexity and lower power dissipation than the SDS technique.

# A. Proposed Derivative-Dependent Sampling (DDS) Scheme

As depicted in Fig. 6(a), in the DDS method the difference between the instantaneous derivative of the signal ($D_2$) and the last retained derivative ($D_1$) is compared to a threshold value ($\varepsilon$). When this difference exceeds the threshold (*i.e.*, $|D_2 - D_1| > \varepsilon$), a significant event is detected. The point corresponding to the instantaneous derivative is then converted by the ADC, and $D_2$ is stored. To implement an ideal DDS system, the input signal is first applied to a differentiator that produces the instantaneous derivative of the input signal at its output. The sample-and-hold circuit (S&H<sub>m</sub>) retains the derivative at the last retained input sample. A unity-gain subtractor then produces the difference between the current derivative of the input signal and this last retained derivative. This difference is compared to the threshold $\varepsilon$ using a comparator. If the difference between the derivatives is greater than $\varepsilon$, an ADC trigger signal is generated to enable the ADC and the sample-and-hold circuit.

An analog differentiator together with an analog comparator dissipates a considerable amount of power. We therefore propose to approximate the differentiator by the level change between two closely spaced samples. In the structure shown in Fig. 6(b), the input is sampled by two sample-and-hold circuits (S&H<sub>1</sub> and S&H<sub>2</sub>). These are enabled by non-overlapping enable signals, En(t) and $En(t+t_d)$, where $t_d$ is a predefined constant delay. The frequency of the enable signals equals the main clock frequency that the ADC would use under uniform sampling. A subtractor then produces the difference between the outputs of S&H<sub>1</sub> and S&H<sub>2</sub>. This difference is approximately proportional to the derivative of the signal, provided that $t_d$ is small compared with the time scale of the signal changes. S&H<sub>m</sub> stores the output of the first subtractor at the last retained sampling point, V[m]. The structure shown in Fig. 6(b) can therefore be described mathematically as

$$G_{S1}G_{S2}|(V[n+t_d]-V[n])-(V[m+t_d]-V[m])| \ge \varepsilon, \quad (3)$$

where $G_{S1}$ and $G_{S2}$ are the voltage gains of the two subtractors. V[n] and $V[n+t_d]$ are the last sampling point and its corresponding delayed point, respectively, and V[m] and $V[m+t_d]$ are the last retained sampling point and its corresponding delayed point, respectively. Dividing both sides of Eq. (3) by $G_{S1}G_{S2}t_d$, we have

$$\left|\frac{V\left[n+t_{d}\right]-V\left[n\right]}{t_{d}}-\frac{V\left[m+t_{d}\right]-V\left[m\right]}{t_{d}}\right| \ge \frac{\varepsilon}{G_{S1}G_{S2}t_{d}}. \quad (4)$$

The left-hand side of Eq. (4) represents the difference between the current derivative and the last retained derivative of the signal, i.e., $|D_2 - D_1|$. If the condition in Eq. (4) is satisfied, a significant event is detected. The system then generates an enable signal that triggers the ADC to convert the last sampling point.
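The following Python sketch models the DDS decision of Eqs. (3) and (4) at the behavioral level. The input is modeled as a Python function of time `f`, an assumption made so that the delayed sample at $t + t_d$ can be evaluated. The derivative is the two-point estimate formed from S&H<sub>1</sub> and S&H<sub>2</sub>. `derivative_threshold` refers the comparator threshold $\varepsilon$ to the derivative domain, as on the right-hand side of Eq. (4). All names and parameter values are hypothetical, and circuit offsets, noise, and droop are ignored.

```python
def derivative_threshold(eps, g_s1, g_s2, t_d):
    """Right-hand side of Eq. (4): comparator threshold referred to the derivative."""
    return eps / (g_s1 * g_s2 * t_d)


def dds_sample(f, t_clk, t_d, n_clk, eps_d):
    """Behavioral sketch of the DDS significant-event test of Eq. (4).

    f:      input signal as a function of time (illustrative assumption)
    t_clk:  master-clock period; t_d: delay between the S&H1 and S&H2 enables
    n_clk:  number of clock cycles to simulate
    eps_d:  threshold on |D2 - D1|, e.g. from derivative_threshold()
    Returns the clock indices of retained samples (index 0 is assumed retained).
    """
    def deriv(n):
        t = n * t_clk
        return (f(t + t_d) - f(t)) / t_d    # two-point estimate from S&H1 and S&H2

    retained = [0]
    d_held = deriv(0)                       # D1: derivative held by S&H_m
    for n in range(1, n_clk):
        d_now = deriv(n)                    # D2: current derivative estimate
        if abs(d_now - d_held) >= eps_d:    # condition of Eq. (4)
            retained.append(n)              # the ADC converts this sample
            d_held = d_now                  # S&H_m now holds D2
    return retained


def ramp_then_flat(t):
    """Unit-slope ramp that saturates at 5 for t >= 5."""
    return min(t, 5.0)


eps_d = derivative_threshold(eps=0.25, g_s1=1.0, g_s2=1.0, t_d=0.5)
print(dds_sample(ramp_then_flat, t_clk=1.0, t_d=0.5, n_clk=10, eps_d=eps_d))  # [0, 5]
```

The ramp is captured by its starting point and its corner. For this piecewise-linear input, those two points are all that first-order interpolation needs.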

As an example, when the proposed DDS is applied to a sawtooth signal, it selects only the sampling points at the edges of the signal, where the change in the derivative is large. Since the signal is reconstructed by first-order (linear) interpolation of the retained points, a large CF with a very low RMSD can be obtained for this type of signal. The DDS method needs only two sampling points to reconstruct a ramp. The LC method needs several, because a ramp crosses several reference levels. In the next subsection, the proposed DDS technique is applied to other types of signals to evaluate its performance against prior NUS methods.
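The sawtooth argument can be checked numerically with the Python sketch below. For brevity, it works directly on a sequence of samples and approximates the derivative by the first difference of consecutive samples. This is an assumption, equivalent to taking $t_d$ equal to one sample period rather than a short delay within the clock period. The test signal, a slow ramp with a two-sample falling edge, and all function names are illustrative.

```python
import math


def dds_first_difference(x, eps):
    """DDS rule with D[n] = x[n+1] - x[n]; returns retained indices."""
    d = [b - a for a, b in zip(x, x[1:])]
    retained, held = [0], d[0]
    for n in range(1, len(d)):
        if abs(d[n] - held) >= eps:        # derivative changed significantly
            retained.append(n)
            held = d[n]
    return retained


def lc_indices(x, delta):
    """Clocked LC: keep a sample whenever the quantization band changes."""
    retained, last = [0], math.floor(x[0] / delta)
    for n in range(1, len(x)):
        band = math.floor(x[n] / delta)
        if band != last:
            retained.append(n)
            last = band
    return retained


def linear_reconstruct(x, kept):
    """First-order interpolation between retained samples (indices kept[0]..kept[-1])."""
    y = []
    for a, b in zip(kept, kept[1:]):
        for n in range(a, b):
            y.append(x[a] + (x[b] - x[a]) * (n - a) / (b - a))
    y.append(x[kept[-1]])
    return y


period = [0, 1, 2, 3, 4, 5, 6, 7, 8, 4]     # slow ramp, then a two-sample falling edge
x = period * 3
dds = dds_first_difference(x, eps=2)
lc = lc_indices(x, delta=1)
print(dds, len(lc))                          # [0, 8, 10, 18, 20, 28] 30
print(linear_reconstruct(x, dds) == x[:29])  # True
```

Here the DDS rule keeps only the six corner samples, and linear interpolation recovers every sample up to the last retained one. The LC rule with a step of 1 keeps all 30 samples (18 with a step of 2), because each ramp crosses many reference levels.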

# B. Comparison With Other NUS Methods

Fig. 7 summarizes the process of detecting significant points in the proposed DDS, SDS, and clocked LC techniques applied to the same input signal. In the proposed DDS technique, the system tracks the derivative of the input signal to find a significant change in it. In the SDS technique, the system subtracts the slope between the last two sampled points from the slope between the last retained point and the second-to-last sampled point. Three sampling points, three subtractions, and an analog division are needed to calculate these slopes [12]. This makes the SDS implementation more complex than the DDS and LC methods, because several analog building blocks are required. In contrast, the clocked LC scheme can be implemented with minimal complexity. The system simply compares the signal level to fixed or adaptive reference levels to detect a change in the voltage level status. In this example, with the specified setup, the DDS, SDS, and clocked LC systems retain 5, 3, and 9 points, respectively.

Fig. 7. Detection of significant events in the DDS, SDS, and clocked LC schemes.

To compare the proposed DDS technique with the SDS and LC techniques, all three are applied to a sinusoidal signal and a real-world ECG signal. They are evaluated using the RMSD, with similar CFs for a fair comparison. Fig. 8 shows the reconstructed and error signals of the three NUS techniques, along with their RMSD and CF, for the two input signals. For the sinusoidal signal, the RMSD is 0.008, 0.007, and 0.046 for the proposed DDS, SDS, and LC techniques, respectively. The reconstructed signals for the proposed DDS and SDS schemes are generated by connecting the retained sampled points using first-order linear interpolation. For the LC method, a zero-order hold is used, in which the reconstructed level remains constant until the signal crosses another level. The zero-order hold is chosen because it produces a higher PR-SNDR than first-order linear interpolation. As in Fig. 5, no post-processing reconstruction method is applied to the reconstructed signals of any of the three methods, which keeps the comparison fair. In the DDS and SDS techniques, the retained sampling points in the rising and falling parts of the sinusoidal signal are asymmetric. This is mostly because the detection process is clock-based and depends on the initial phase and the initially sampled points. In the DDS scheme, for example, the derivatives of subsequent points are compared to the derivative at the first retained sampling point. Therefore, changing the first retained point changes the set of subsequently retained sampling points, and consequently the CF and PR-SNDR may change slightly.

For the ECG signal, the RMSD is 0.0058, 0.0054, and 0.0095 for the proposed DDS, SDS, and LC techniques, respectively. In both scenarios, the LC technique performs worse than the other two NUS techniques despite having a lower CF. Increasing the number of reference levels in the LC method can decrease the error, but at the cost of a lower CF. The proposed DDS method achieves performance similar to that of the SDS method at the same CF, while its detection process is significantly simpler. This saves power and makes the DDS less sensitive to non-idealities and design variations. Note that the above analysis assumes that the input signal is isolated from the NUS systems. In practice, the SDS and LC techniques are expected to degrade the SNR, as explained for Fig. 2(a).

Figs. 9(a) to 9(c) show the Power Spectral Density (PSD) of the reconstructed ECG signals plotted in Figs. 8(d) to 8(f) for the proposed DDS, SDS, and LC schemes, each along with the uniform sampling scheme. There is no significant difference among the PSDs of the methods at frequencies below 20 Hz. At higher frequencies, the PSD of the signal reconstructed by the LC method deviates more from that of uniform sampling. The PSDs of the proposed DDS and SDS schemes, in contrast, follow the PSD of uniform sampling. The PSDs of the error signals, along with their linear regressions, are shown in Fig. 9(d). The PSD of the LC error signal has smaller values at frequencies close to DC (i.e., smaller low-frequency content), but it shows larger errors at higher frequencies. The error PSDs of the proposed DDS and SDS schemes are in the same range, with the SDS scheme showing marginally smaller frequency components. Figs. 9(e) and 9(f) show the PSDs of the reconstructed sinusoidal signals plotted in Figs. 8(a) to 8(c) and of the corresponding error signals, for the proposed DDS, SDS, LC, and uniform sampling schemes. Similar observations can be made for these signals. As expected, the PSDs of the reconstructed signals of all methods peak at 100 Hz, the fundamental frequency of the single-tone sinusoidal signal. There is no significant difference among them at frequencies below 200 Hz. Above 200 Hz, the PSD of the signal reconstructed by the LC method deviates more from the PSD of uniform sampling, especially at the harmonics of the fundamental frequency. This is confirmed by the PSD of the corresponding error signal. The PSDs of the reconstructed sinusoidal and error signals of the SDS and DDS methods behave similarly across the spectrum, with slight differences. The RMSD values shown in Figs. 8(a) to 8(c) and the linear regressions depicted in Fig. 9(f) also confirm these observations.
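The paper does not state which PSD estimator was used for Fig. 9. As an assumption, the Python sketch below computes a plain one-sided periodogram with a direct DFT, which is adequate for short records. For long recordings, an FFT-based estimator such as Welch's method (for example, `scipy.signal.welch`) would be the more practical choice. Applying the function to the input, reconstructed, and error sequences yields spectra of the kind compared in Fig. 9.

```python
import cmath
import math


def periodogram(x, fs):
    """One-sided periodogram (power per Hz) computed with a direct DFT.

    x: real-valued samples; fs: sampling rate in Hz.
    Returns (freqs, psd) for bins k = 0 .. len(x)//2.
    """
    n = len(x)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(xk) ** 2 / (fs * n)
        if 0 < k < n / 2:      # fold the negative-frequency bin into the one-sided estimate
            p *= 2.0
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd


fs = 1000.0
tone = [math.sin(2 * math.pi * 100.0 * t / fs) for t in range(100)]  # 100 Hz tone
freqs, psd = periodogram(tone, fs)
print(freqs[psd.index(max(psd))])   # 100.0
```

As a sanity check, the spectrum peaks at 100 Hz, and integrating the PSD over frequency (the sum times `fs / len(tone)`) returns the mean-square value of the tone, 0.5.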

Several modifications to the LC scheme, mostly clockless ones, have also been proposed. For example, the adaptive LC scheme introduced in [18] and [19] adjusts the comparator thresholds (window) to use a larger quantization step in the fast-moving parts of the signal. As a result, the scheme samples the signal less frequently in these parts. Implementing the adaptive LC technique requires additional blocks for analog signal pre-processing and a feedback control loop. This increases the structural complexity and power consumption of the NUS block compared to the conventional LC. The derivative LC proposed in [26] applies LC sampling to the derivative of the signal (rather than the signal itself) at the transmitter side. It then sends the sampled data to the receiver as zero-order-hold data and finally integrates them at the receiver side to obtain a first-order reconstruction of the input. As an advantage, the reconstruction error and the power consumption of the reconstruction process at the receiver are expected to be reduced, which is useful when the receiver has a limited power budget. However, the required clockless analog differentiators and integrators draw static power. This makes the NUS block less energy efficient and more complex than the conventional LC. The DDS scheme proposed in this work tracks changes in the derivative of the signal by comparing the current derivative with the previously stored derivative. This scheme further reduces the sampling rate when the slope of the signal does not change significantly. The result is a higher CF and lower system power consumption than the LC method and its variants. Moreover, the DDS implements the derivative function with sample-and-hold circuits, which consume no static power other than transistor leakage. The DDS block can also be implemented separately from the ADC, triggering it at significant events without degrading the SNR.

Fig. 8. A comparison between NUS methods by applying (a) proposed DDS, (b) SDS, and (c) LC techniques on a 100 Hz sinusoidal signal, and by applying (d) proposed DDS, (e) SDS, and (f) LC techniques on a real ECG signal.

Fig. 9. PSD of the reconstructed ECG signals in Fig. 8(d)-(f) using (a) the proposed DDS, (b) the SDS, and (c) the LC methods along with the uniform sampling method, (d) PSD of error signals after reconstruction of the ECG signals, (e) PSD of the reconstructed sinusoidal signals in Fig. 8(a)-(c) using the proposed DDS, the SDS, and the LC methods along with the uniform sampling method, and (f) PSD of error signals after reconstruction of sinusoidal signals.

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on September 15,2023 at 22:41:29 UTC from IEEE Xplore. Restrictions apply.

Fig. 10. An example of the worst-case scenario of the error in (a) the proposed DDS, (b) SDS [12], and (c) LC techniques.

Although all the above NUS techniques are signal-dependent, we can discuss scenarios in which the maximum error occurs in each method, as illustrated in Fig. 10. For the DDS case, consider the input signal shown in Fig. 10(a). Here, the last retained derivative, $D_0$, is calculated at V[0]. The input signal changes direction as it is sampled at V[1] to V[5]. However, the differences between $D_0$ and the derivatives calculated at these points, $D_1$ to $D_5$, may come close to the threshold value but do not exceed it. As a result, these sampled points are dropped. Only at sample point V[6] does the difference between its derivative, $D_6$, and the last retained derivative, $D_0$, exceed the threshold value, so the system retains V[6]. The signal reconstructed from the retained sample points is shown in blue, and $Err_{max}$ denotes the maximum observed error. In the worst-case scenario, we can assume that V[0] and V[6] equal the lowest and highest possible voltage levels, e.g., 0 and VDD, and that $D_0 \approx \varepsilon$ and $D_4 \approx 2\varepsilon$. Therefore, the maximum error occurs at V[4] in this scenario. Assuming p and q are the numbers of silent clock cycles in the flat segment and in the rising segment, respectively, the maximum error can be calculated as

$$\frac{Err_{max}}{VDD} \approx \frac{p \times T}{(p+q) \times T} \quad \text{or} \quad Err_{max} \approx \frac{p}{p+q}\, VDD, \quad (5)$$

where T is the clock period. Accordingly, the maximum error approaches VDD by increasing p while keeping q constant.
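Eq. (5) follows from the geometry of Fig. 10(a). The reconstruction joins $V[0] \approx 0$ and $V[6] \approx VDD$ with a straight line spanning $(p+q)T$, while the true signal remains near 0 for the first $pT$. At the end of the flat segment, the line is therefore already at roughly $\frac{p}{p+q}VDD$. The Python sketch below simply evaluates Eq. (5) with VDD normalized to 1 (an assumption) to show how the bound grows with p for fixed q. The function name is hypothetical.

```python
def dds_worst_case_error(p, q, vdd):
    """Eq. (5): approximate worst-case DDS reconstruction error.

    p: silent clock cycles along the flat segment
    q: silent clock cycles along the rising segment
    The straight line from 0 (at t = 0) to vdd (at t = (p + q) T) is already at
    p / (p + q) * vdd when the true signal only begins to rise at t = p T.
    """
    return p / (p + q) * vdd


VDD = 1.0  # normalized supply (illustrative assumption)
for p in (1, 4, 16, 64):
    print(p, round(dds_worst_case_error(p, q=2, vdd=VDD), 3))
```

With q fixed at 2 cycles, the bound rises from one third of VDD at p = 1 toward VDD as p grows. This is the behavior that motivates limiting the number of silent clock cycles.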

Similarly, a maximum error approaching VDD can occur with the SDS technique. The worst-case scenario for this technique is discussed in [12] and presented in Fig. 10(b). In this scenario, the signal direction deviates in each clock cycle such that two slopes stay within $\varepsilon$ of each other: the slope between the last retained point and the second-to-last sampling point, $S_{0\,i-1}$, and the slope between the last two sampling points, $S_{i-1\,i}$. Thus,

$$|S_{0\,i-1} - S_{i-1\,i}| < \varepsilon \quad \text{for} \quad i = 2, 3, 4, 5. \quad (6)$$

Consequently, the scheme drops the sampling points V[1] to V[4] in the silent clock cycles. Assuming that, when V[6] is sampled, we have

$$|S_{0\,5} - S_{5\,6}| \ge \varepsilon, \quad (7)$$

the system retains the sampling point V[5]. As with the DDS technique, the maximum error shown in Fig. 10(b) approaches VDD as the number of silent clock cycles increases.

In contrast to the DDS and SDS techniques, the maximum error in the LC method equals the quantization step, as shown in Fig. 10(c). This error can be reduced by increasing the number of reference levels (i.e., reducing the quantization step), but this reduces the CF. Although the maximum error is smaller than in the other methods, the overall error power, or RMSD, is greater for this method with real-world signals, as shown in Fig. 8.

The worst-case scenario described for DDS is very unlikely to occur with a real-world signal. This is in contrast to the worst-case scenario for the SDS method, whose signal shape resembles a PPG or a low-frequency sinusoidal signal. Furthermore, since the polarity of the derivative changes with a change in the direction of the signal, any peaks present in the input signal are retained through reconstruction in the DDS method, unlike in the SDS and LC methods. There are several ways to limit this error. One solution is to reduce the threshold value. Another is to set a limit on the number of silent clock cycles, as suggested in [12], which can be simply implemented by a parallel low-power counter and an OR gate (shown in Fig. 11(a)). When the number of silent/dropped clock cycles reaches a predetermined limit, $2^N$ in this example, the ADC is enabled and the system retains the sampling point, whether or not a significant event has occurred. This technique is especially helpful when no significant event occurs for a long period of time, and it prevents errors caused by long hold times in the sample-and-hold circuits. An example of this scenario is a saturated signal (shown in Fig. 11(b)), possibly caused by a large analog front-end gain. In the inactive/saturated part of this input signal example, the system operates at the minimum sampling rate defined by the controllable counter. Although this might be a concern in some applications, in most of the real-world signal cases we considered, the number of silent/dropped sampling points did not reach the limit, especially when N and the master clock frequency were selected properly.

Fig. 11. (a) Limiting the number of silent/dropped clock cycles by using a counter in parallel to the DDS block to set  $2^N$  as the limit number, and (b) applying to an example scenario of a saturated signal.
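The counter-based limit described above can be modeled at the clock-cycle level. This is a minimal sketch with hypothetical names. It assumes the counter forces a trigger on the cycle after it has counted $2^N$ consecutive silent cycles, and that any trigger resets it. The exact off-by-one behavior depends on the counter design.

```python
def or_with_silent_counter(events, n_bits):
    """Combine significant-event flags with a silent-cycle counter (behavioral sketch).

    events: one boolean per master-clock cycle (True = significant event detected).
    Returns one boolean ADC-trigger flag per cycle.
    """
    limit = 2 ** n_bits
    silent = 0  # consecutive cycles without an ADC trigger
    triggers = []
    for event in events:
        forced = silent >= limit                 # counter output: silent-cycle limit reached
        trigger = event or forced                # the OR gate of Fig. 11(a)
        triggers.append(trigger)
        silent = 0 if trigger else silent + 1    # any trigger resets the counter
    return triggers


# A saturated (flat) input produces no events, so only the counter fires (N = 2).
print(or_with_silent_counter([False] * 10, n_bits=2))
```

For a flat, saturated input, the trigger still fires periodically. This corresponds to the minimum sampling rate set by the counter in Fig. 11(b).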

To summarize the above discussion, the accuracy, complexity, number of building blocks, and power consumption of the above NUS schemes depend mostly on the complexity of the equation implemented at the circuit level to detect significant events. Therefore, among the LC, SDS, and DDS schemes, the LC method is expected to have the lowest complexity and power consumption but possibly the worst accuracy (especially for higher-frequency content). The SDS, by contrast, has the highest complexity and power consumption, with possibly the best accuracy. The proposed DDS offers accuracy comparable to the SDS with considerably lower complexity and power consumption. Note that there is no significant difference in the response time (decision delay) between the DDS, SDS, and clocked LC techniques, since it depends mostly on the master clock frequency. The response time of a clockless LC system, on the other hand, depends mainly on the supported bandwidth of the system, which scales directly with its power consumption. In the DDS, SDS, and LC systems, the part responsible for detecting significant events is clocked by a predetermined, fixed master clock, so its power consumption does not vary with the signal activity. The remaining part of the system is activated only when significant events are detected, to produce the output trigger signals. Its power consumption scales with the input signal activity, but it accounts for only a very small percentage of the total power consumption. Therefore, the total power consumption of these NUS systems does not vary noticeably with the signal activity.

Fig. 12. (a) Implementation of proposed DDS system at the circuit level, and (b) System response to an arbitrary input signal.

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on September 15,2023 at 22:41:29 UTC from IEEE Xplore. Restrictions apply.

# IV. CIRCUIT IMPLEMENTATION OF PROPOSED SCHEME

As described in Section III-A, if the time delay between the two sampled points is constant and small enough, the derivative of the signal can be approximated by dividing the change in the signal level by this short time delay. Therefore, the circuit implementation does not need to compute the actual derivative. Instead, a combination of sample-and-hold circuits and subtractors can find and compare the last derivative and the retained derivative of the input signal. Fig. 12(a) illustrates the proposed circuit implementation of the DDS system. The structure incorporates a clock manager that generates six clock signals from an input master clock, $CK_{in}$, with a frequency of $f_s$. It generates $CK_1$ at the rising edge of the master clock, in addition to $CK_2$, which is a delayed, non-overlapping version of $CK_1$. A third clock signal, $CK_T$, is generated for the subtractor and window comparator amplifiers. It spans both $CK_1$ and $CK_2$, with a rising edge before the rising edge of $CK_1$ and a falling edge after the falling edge of $CK_2$. The clock signal $CK_T$ ensures that the amplifiers are turned on only when the system needs to be active. A control voltage, $V_{TUNE}$, is used to tune the pulse width of $CK_1$ and $CK_2$. It is typically set to the smallest practical value that can guarantee proper sampling (proper settling time) with minimal power overhead. The low duty cycle of the generated clock signals improves the overall power efficiency of the system.

The input signal is sampled at the first stage, $V_0$, using a sample-and-hold circuit driven by $CK_2$. The following subtractors, $S_1$ and $S_2$, take and amplify the difference between the input signal and $V_0$ to find the approximate derivative of the input signal. The outputs of the subtractors are passed to two sample-and-hold circuits. One is driven by $CK_1$, and the other is driven by $CK_A$, which is produced by ANDing $CK_1$ with the system output, the ADC trigger. Therefore, $V_1$ is updated with the last approximate derivative at every rising edge of $CK_1$. Node $V_2$, in contrast, holds the last retained approximate derivative until the next significant event occurs. Similarly, subtractor $S_3$ takes and amplifies the difference between the last derivative and the retained derivative of the input, and passes it to $V_3$ on the rising edge of $CK_2$. At the final stage, a window comparator compares $V_3$ with a threshold value of $\varepsilon$. The output of the comparator, the ADC trigger, is high when $V_3$ is greater than $\varepsilon+$ or less than $\varepsilon-$. Note that node $V_3$ is DC biased at $V_{REF}$. Therefore, $\varepsilon+ = V_{REF} + \varepsilon$ and $\varepsilon- = V_{REF} - \varepsilon$, where $\varepsilon$ is the threshold value. The output of the comparator goes low at the rising edge of the next cycle, when the feedback $CK_A$ enables the second sample-and-hold circuit. At that point $V_2$ becomes equal to $V_1$, and $V_3$ returns to approximately $V_{REF}$. Although the performance of the structure depends on the signal type, the CF can be controlled by adjusting the threshold values.
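The node behavior described above can be summarized in a cycle-level behavioral model. This sketch ignores settling, noise, and charge sharing. For simplicity, it assumes ideal stages that share one hypothetical gain `gain`, with $V_1$, $V_2$, and $V_3$ all biased at $V_{REF}$. It also models the one-cycle feedback: $CK_A = CK_1 \wedge$ trigger copies $V_1$ into $V_2$ on the cycle after a trigger. The function name is illustrative only.

```python
def dds_trigger_cycles(v, gain, v_ref, eps_plus, eps_minus):
    """Cycle-level behavioral sketch of the DDS signal chain in Fig. 12(a).

    v: input samples, one per master-clock cycle.
    Returns the cycle indices at which the ADC trigger is high.
    """
    v2 = v_ref          # retained-derivative node, starts at its DC bias
    trigger = False     # comparator output from the previous cycle
    fired = []
    for n in range(1, len(v)):
        v1 = v_ref + gain * (v[n] - v[n - 1])  # last approximate derivative (S1/S2 path)
        if trigger:                             # CK_A = CK_1 AND trigger updates V_2
            v2 = v1
        v3 = v_ref + gain * (v1 - v2)           # S3 output, biased at V_REF
        trigger = v3 > eps_plus or v3 < eps_minus  # window comparator
        if trigger:
            fired.append(n)
    return fired


# Saw-tooth-like input; symmetric window with eps = 0.5 around V_REF = 0.5.
print(dds_trigger_cycles([0, 0, 1, 2, 3, 2, 1], gain=1.0, v_ref=0.5, eps_plus=1.0, eps_minus=0.0))
```

The trigger fires only where the slope changes (cycles 2 and 5) and drops on the following cycle, once $V_2$ has been refreshed. Passing unequal distances for `eps_plus` and `eps_minus` around `v_ref` models the unbalanced thresholds mentioned in design consideration (e) below.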

Fig. 12(b) shows the comparator input ($V_3$) in response to an arbitrary input signal. As depicted, the ADC trigger switches from low to high when $V_3$ reaches the $\varepsilon+$ or $\varepsilon-$ threshold. The decision delay, a one-clock-cycle delay in the detection of significant events, is also shown in this figure. In this example, significant events occur at the edges of the input signal, so the ADC trigger output goes high one clock cycle later. Note that the ADC trigger signal can be ANDed with the master clock or its inverted version, if needed, before being applied to the system ADC.

Some important design considerations for the implementation are as follows:

(a) Using two independent subtractors at the first stage instead of a single shared subtractor helps to avoid the undesired charge sharing between  $C_1$  and  $C_2$ . This reduces the likelihood of false alarms in the detection process.

(b) A larger transmission gate provides a smaller on-resistance, which is desirable for reducing the settling time. However, it also increases the parasitic capacitance, which can impair proper sampling and increases the power consumption of the clock manager. The size of these transistors is optimized considering this trade-off.

(c) The capacitors in the sample-and-hold circuits should be properly sized. A small capacitor results in a significant offset due to charge sharing or leakage, while a large capacitor increases the settling time or the power dissipation.




Fig. 13. Circuit implementation of (a) the subtractor, (b) the proposed current reuse comparator, and (c) the proposed clock generator.

(d) A significant event is defined by the threshold value ($\varepsilon$), the subtractor voltage gain ($G_S$), and the time difference ($t_d$), as shown in Eq. (4). The delay $t_d$ is fixed to reduce complexity, but the voltage gain of the subtractors can be controlled by tuning the large resistors, $R_1$ to $R_3$, at the outputs of the amplifiers. Note that $V_{REF}$ is AC-grounded by large off-chip capacitors.

(e)  $\varepsilon$  + and  $\varepsilon$  - can be unbalanced when there is an offset in  $V_{REF}$  or in cases where we prefer to have a different threshold value for ascending and descending parts of the input.

(f) Suppose $CK_1$ were applied to the first sampling circuit and $CK_2$ to the next sampling stage after the subtractor. The small delay between $CK_1$ and $CK_2$, *i.e.*, a small $t_d$ in Eq. (4), would then necessitate a higher voltage gain in the subtractor stage. This would increase the sensitivity of significant-event detection to noise. Moreover, without a delay for the second subtraction stage, $V_3$ would be generated before $V_1$ and $V_2$ settle. To avoid these difficulties, $CK_2$ is applied to the transmission gate of the first sampling circuit and $CK_1$ is applied to the output of the subtractor. As mentioned above, this delays the decision by one clock cycle. The delay can be compensated by delivering the input signal to the ADC with the same delay, or it can be ignored for low-frequency signals.

The transistor-level design of the main building blocks of the system is discussed in the following subsections.

# A. Subtractor

Three subtractors are used in the first and second stages of the DDS system, where they take and amplify the difference between their inputs. Since this difference is expected to be small, the subtractor requires an accurate, high-gain amplifier with a wide input common-mode range and a high common-mode rejection ratio (CMRR). To meet all of these criteria, we have chosen the circuit topology shown in Fig. 13(a). The circuit consumes around 1.44 $\mu$A in regular operation from a supply voltage of 1 V, and its output voltage can be expressed as

$$V_{OUT} = \frac{g_{m_{P1,P2}}}{g_{ds_{P3}} + g_{ds_{N3}} + \frac{1}{R_{out}}} \times \left(V^+ - V^-\right), \tag{8}$$

where $g_{m_{P1,P2}}$ is the transconductance of transistors $M_{P1}$ and $M_{P2}$, and $g_{ds_{N3}}$ and $g_{ds_{P3}}$ are the drain-source conductances of $M_{N3}$ and $M_{P3}$, respectively. $R_{out}$ is the variable resistor placed at the output, i.e., resistors $R_1$ to $R_3$ in Fig. 12(a).
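Eq. (8) shows how tuning $R_{out}$ sets the subtractor gain. The short calculation below evaluates the gain factor. The small-signal values are hypothetical and chosen only for illustration; they are not taken from the fabricated design.

```python
import math


def subtractor_gain(gm, gds_p3, gds_n3, r_out):
    """Small-signal voltage gain of the subtractor from Eq. (8), in V/V."""
    return gm / (gds_p3 + gds_n3 + 1.0 / r_out)


# Hypothetical small-signal values, for illustration only.
gm = 10e-6                   # 10 uS transconductance of M_P1,P2
gds_p3 = gds_n3 = 50e-9      # 50 nS drain-source conductances of M_P3 and M_N3
for r_out in (10e6, 100e6):  # tuning R_out (R_1 to R_3) changes the gain
    a_v = subtractor_gain(gm, gds_p3, gds_n3, r_out)
    print(f"R_out = {r_out / 1e6:.0f} MOhm: gain = {a_v:.1f} V/V ({20 * math.log10(a_v):.1f} dB)")
```

With these assumed values, a larger $R_{out}$ raises the gain toward the limit $g_m/(g_{ds_{P3}} + g_{ds_{N3}})$ set by the transistor output conductances.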

To build a power-efficient system, an enable signal, connected to $CK_T$, is applied to $M_{EN1}$ to $M_{EN3}$ to turn them on only when necessary. The widths of these transistors must balance low on-resistance against low parasitic capacitance. Low on-resistance avoids a significant voltage drop from the supply, and low parasitic capacitance avoids a long charging time before the circuit is ready for operation. As discussed earlier, the rising edge of the enable signal $CK_T$ should arrive before the rising edges of $CK_1$ and $CK_2$. This allows the parasitic capacitances to charge and prepares the amplifiers for subtraction. The greater the width of $M_{EN1}$ to $M_{EN3}$, the smaller the on-resistance and the larger the parasitic capacitance, and therefore the larger the required time difference between $CK_T$ and $CK_1$. In this design, the on-resistance of $M_{EN1}$-$M_{EN3}$ is set to around 100 $\Omega$.

# B. Comparator

Fig. 13(b) shows the circuit implementation of the amplifiers used in the comparator design. The proposed current-reuse structure is used for the first stage to save power ($I_{total,comp} \approx 306$ nA) and to obtain a high voltage gain ($\approx 36$ dB). An NMOS differential pair, $MN_{1,2}$, and a PMOS differential pair, $MP_{1,2}$, are used in the first stage to build the current-reuse structure. The cascode transistors, $MN_{3,4}$ and $MP_{3,4}$, provide a greater output resistance and, therefore, a higher gain for the first stage. The input range decreases when multiple stacked transistors are used. However, this effect is negligible, as the reference levels, $\varepsilon+$ and $\varepsilon-$, are normally chosen to be close to half of the supply voltage. The first-stage output is fed to a buffer inverter to further improve the gain. The output transistors of both stages ($MN_{3-5}$ and $MP_{3-5}$) are sized to reduce the parasitic capacitance, which reduces power consumption and makes the comparator faster. The parasitic capacitance of the input is negligible, since the comparator input is connected to either a constant voltage ($\varepsilon+$ or $\varepsilon-$) or the sample-and-hold capacitor ($C_3$).

The threshold values $\varepsilon+$ and $\varepsilon-$ can be fixed or adaptively adjusted through a feedback-controlled loop. In the adaptive LC schemes proposed in [18] and [19], for example, a self-calibration circuit adjusts the reference levels to use larger quantization steps for fast-moving parts of the signal and finer quantization steps during segments of low activity. This may come at the cost of additional analog signal preprocessing implemented by several analog building blocks, which may add complexity and power consumption. A similar strategy can be applied to the DDS system to obtain an adaptive threshold value based on signal characteristics, e.g., reducing $\varepsilon$ for an ECG signal with a smaller amplitude. A calibration procedure through a digital signal processor (DSP) or another analog implementation can also be used to achieve a targeted CF and, correspondingly, a targeted accuracy. One simple analog calibration calculates the CF over a certain period (e.g., $2^M$ clock cycles) using a counter that counts the number of ADC triggers, and then adjusts the resolution threshold, $\varepsilon$, accordingly.

Fig. 14. Tuning voltage  $V_{TUNE}$  versus the pulse-width of  $CK_1$  and the overall power consumption of the clock manager.
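The counter-based calibration described above can be expressed as a simple control update. This is a behavioral sketch of the control logic only, not a circuit. It assumes a hypothetical bang-bang rule: widen the window by a fixed `step` when the measured CF is below the target, and narrow it when the CF is above. The names `calibrate_threshold`, `step`, and `eps_min` are illustrative.

```python
def calibrate_threshold(trigger_count, m, eps, target_cf, step, eps_min=0.0):
    """One update of a simple CF-tracking threshold calibration (behavioral sketch).

    trigger_count: number of ADC triggers counted over 2**m master-clock cycles.
    Returns the updated threshold.
    """
    cycles = 2 ** m
    measured_cf = cycles / max(trigger_count, 1)  # guard against zero triggers
    if measured_cf < target_cf:
        eps += step                       # too many triggers: widen the window
    elif measured_cf > target_cf:
        eps = max(eps - step, eps_min)    # too few triggers: narrow the window
    return eps


# 256 triggers in 1024 cycles gives CF = 4, below a target of 6, so eps grows.
print(calibrate_threshold(trigger_count=256, m=10, eps=0.1, target_cf=6, step=0.01))
```

A fixed step keeps the logic minimal, but it settles into a small limit cycle around the target. A proportional update would converge more smoothly at the cost of extra arithmetic.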

# C. Clock Generator

The implementation of the proposed clock manager and its generated timing signals are shown in Fig. 13(c). Two types of delay circuits are used in the structure, as follows:

(a) A fixed delay is achieved by a simple on-chip R-C circuit followed by an inverter. This delay box is mainly used to provide a predetermined delay between  $CK_T$  and  $CK_1$ , as well as  $CK_2$  and  $CK_T$ . It is also used at the last stage of  $CK_2$  generation to guarantee that  $CK_1$  and  $CK_2$  do not overlap.

(b) A tunable delay is produced by a transmission gate driving an on-chip capacitor. As the on-resistance of the transmission gate switch can be controlled by the gate voltage, $V_{TUNE}$, the RC delay can be varied accordingly. This delay provides the tunable pulse width of $CK_1$ and $CK_2$, and it is duplicated to provide the width required for the charging time of the sample-and-hold capacitors.

As shown in Fig. 13(c), the master clock is ANDed with its delayed version to generate $CK_1$. The same blocks are used successively for $CK_2$, except for an additional fixed delay at the final stage that keeps $CK_1$ and $CK_2$ non-overlapping. Moreover, ANDing the master clock with its delayed version after multiple delay blocks generates $CK_T$ with a pulse width equal to the sum of the pulse widths of $CK_1$ and $CK_2$.

As discussed, the pulse width of $CK_1$ (and $CK_2$) is independent of frequency and is tuned to the smallest value that provides proper settling time for all building blocks of the system. However, this smallest pulse width may change with the signal type and with process variations, so it may need to be calibrated. In the proposed clock generator shown in Fig. 13(c), the delay blocks can be controlled by $V_{TUNE}$, which adjusts the on-resistance of the switches to change the pulse width of $CK_1$ and $CK_2$. Fig. 14 shows how the pulse width and the power of the clock generator change with $V_{TUNE}$ for a 1 kHz master clock. An optimal range of $V_{TUNE}$ is found to be between 0.35 V and 0.5 V. In this range, the clock generator consumes less than 35 nW while providing a pulse width between 2.5 $\mu$s and 5 $\mu$s for proper operation of the system.

#### V. EXPERIMENTAL RESULTS

The proposed DDS system is implemented in TSMC's 130 nm CMOS technology. The fabricated circuit occupies a die area of 0.04 mm<sup>2</sup>, as shown in Fig. 15(a). To compare the performance of the proposed DDS scheme with prior state-of-the-art schemes, the test bench shown in Fig. 15(b) has been used. The input analog signal is applied to analog buffers whose outputs are connected to two identical external 12-bit Successive Approximation Register (SAR) ADCs. One of the ADCs operates with the uniform sampling scheme, and the other is triggered by the proposed DDS building block. Note that the DDS technique does not depend on the type of ADC or its characteristics (e.g., the resolution). The master clock generated by the microcontroller is applied to both the DDS system and the ADC that uses the uniform sampling scheme. As the system has been tested with various signals, the master clock frequency and the comparator references, $\varepsilon+$ and $\varepsilon-$, can be tuned to any desired values. This allows a higher CF (higher power saving) or a higher PR-SNDR (higher accuracy). The master clock frequency, $\varepsilon+$, and $\varepsilon-$ are not fixed for a given input signal.

# A. Test Procedure

The system has been tested using various ideal and real-world signals. Fig. 16 depicts the measured system response to ideal saw-tooth and sinusoidal signals. Each case shows the reconstructed signal using the uniform scheme (black), the reconstructed signal with the proposed DDS scheme (blue), the output ADC trigger signal (light blue), and the error signal (red). The first-order linear interpolation used for reconstruction may not suit applications with a tight power budget on the receiver side, as it may require intensive computation. However, the main purpose of the proposed DDS system is to reduce the number of data points at the transmitter side, and thereby the overall power consumption of the data acquisition system. Therefore, no power limit is considered on the receiver side in the experimental results. Note that no other post-processing reconstruction method has been applied to the reconstructed signals, to provide a fair comparison with the other state-of-the-art schemes. As shown in Fig. 16(a), the system generates its output, the trigger signal, one clock cycle after the edge of the saw-tooth signal where the derivative of the signal changes. With an input clock of 1 kHz, the DDS system achieves a PR-SNDR of 11.1 dB with a compression factor of 22. The input signal to the ADC can be delayed to compensate for the error due to the decision delay. Fig. 16(b) shows the system response in this scenario: with the same setup and compression factor, the PR-SNDR rises to 29.7 dB. For input signals with higher-frequency content, the effect of this compensation would be more noticeable. This compensating delay can be simply implemented by a two-stage sample-and-hold circuit or a buffer delay. However, neither is used in the following measurements, in order to isolate the pre-signal processing from the ADC. Fig. 16(c) presents the measured system response to an ideal 20 Hz sinusoidal signal, where the system achieves a PR-SNDR of 21.4 dB (CF = 6.1).

Fig. 17 shows the response of the proposed system to various real-world biomedical signals with both high and low CFs, such as ECG, PPG, and Electroencephalogram (EEG). No post-processing methods have been used for reconstruction. The effects of noise and sampling rate are also shown for the ECG signal. In this figure, the ADC trigger signal pulses are shown separately in a subplot below the main plot. In Fig. 17(a) and Fig. 17(b), an ECG signal with a heart rate of 60 bpm is applied to the system with a 1 kHz master clock. PR-SNDRs of 19.1 dB and 26.6 dB are obtained for CFs of 9.7 and 6.7, respectively. The comparator window can also be narrowed to reduce the CF for a better PR-SNDR. Fig. 17(c) shows that the system is functional at different sampling frequencies: with a modified 250 Hz master clock, the system achieves an 18.7 dB PR-SNDR with a CF of 4.9. Modifying the sampling frequency may change the number of stored data points (the number of times the ADC is turned on), as well as the CF. For example, in the case of Fig. 17(c), with a 250 Hz master clock and a CF of about 5, the ADC turns on half as often as in Fig. 17(a), with a 1 kHz master clock and a CF of about 10.

Fig. 15. (a) Die micrograph of the DDS block and (b) diagram of test bench.

Fig. 16. System measured responses to ideal case signals, (a) saw-tooth signal, (b) saw-tooth signal with compensated delay, and (c) sinusoidal signal.

Fig. 17. Experimental results of the DDS system, (a) ECG signal with high CF setup and a 1 kHz clock, (b) ECG signal with low CF setup and a 1 kHz clock, (c) ECG signal with 250 Hz clock, (d) Noisy ECG signal with high CF setup and a 1 kHz clock, (e) PPG signal with high CF setup, (f) PPG signal with low CF setup, (g) EEG signal with high CF setup, and (h) EEG signal with low CF setup.

Fig. 18. Monte Carlo simulation of the CF and PR-SNDR obtained when sampling a noisy ECG signal with the proposed DDS system, used to investigate the effects of PVT variations.

Fig. 19. DDS system power dissipation vs. clock frequencies.
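The "half as often" comparison above follows from simple arithmetic. Assuming CF is the ratio of uniformly sampled points to retained points, the average ADC activation rate is the master clock frequency divided by the CF. The sketch below applies this to the values reported for Fig. 17(c) (250 Hz, CF = 4.9) and Fig. 17(a) (1 kHz, CF = 9.7). The function name is illustrative.

```python
def adc_trigger_rate(f_clk_hz, cf):
    """Average ADC activation rate (Hz) for master clock f_clk and compression factor CF."""
    return f_clk_hz / cf


rate_c = adc_trigger_rate(250, 4.9)   # Fig. 17(c)
rate_a = adc_trigger_rate(1000, 9.7)  # Fig. 17(a)
print(f"{rate_c:.1f} Hz vs {rate_a:.1f} Hz, ratio = {rate_c / rate_a:.2f}")
```

With the rounded values of 5 and 10 used in the text, the ratio is exactly one half.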

To evaluate the effect of input noise, a non-periodic noisy ECG signal is applied to the system in Fig. 17(d). The noisy signal combined with a high CF of 12.7 represents a suboptimal scenario for the DDS system, for which a PR-SNDR of around 15.4 dB is measured. Because of the noise, the system falsely detects significant events, especially in the flat segment between the T-wave and the P-wave. However, the system is still able to properly detect the active parts of the signal. Although the effects of noise are shown here for an ECG signal, they are typically reduced by a filter stage in the analog front-end. Various approaches, such as blanking and linear interpolation [27], can also be used for noise and artifact cancellation in bio-signals.

Fig. 17(e) and Fig. 17(f) show the response of the system to a PPG signal (1 kHz clock frequency) using a high and a low CF, respectively. PR-SNDRs of 21.2 dB and 31.2 dB are measured for CFs of 32.6 and 10.5, respectively. A high CF and PR-SNDR are obtained for a PPG signal because of its low-frequency content combined with a small number of edges, similar to a saw-tooth signal. Fig. 17(g) and Fig. 17(h) show the high- and low-CF reconstructions of an EEG signal (1 kHz clock frequency). For this signal, PR-SNDRs of 15.7 dB and 19.3 dB have been measured for CFs of 9.1 and 5.8, respectively. Due to the presence of sharp edges in an EEG signal, the decision delay has a larger effect on EEG signals than on PPG and ECG signals.

TABLE I POWER DISTRIBUTION OF THE SYSTEM WITH A 1 kHz MASTER CLOCK

|              | Clock Generator (nW) | Subtractors (nW) | Comparators (nW) | Logic Gates (nW) | Total Power (nW) |
|--------------|----------------------|------------------|------------------|------------------|------------------|
| Scenario I   | 28.1                 | 41.5             | 6.7              | 0.8              | 77.2             |
| Scenario II  | 54.1                 | 79.4             | 11.2             | 1.4              | 146.2            |
| Scenario III | 54.1                 | 79.2             | 9.1              | 0.7              | 143.2            |
| Percentage   | ~36-38%              | ~52-55%          | ~6-9%            | ~1%              | ~100%            |

Scenario I. Maximum output activity, V<sub>TUNE</sub>=0.5V

Scenario II. Maximum output activity, V<sub>TUNE</sub>=0.35V

Scenario III. ECG input signal, CF $\approx$ 6, V<sub>TUNE</sub>=0.35V
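To make the reported metrics concrete, the sketch below computes a CF and a PR-SNDR from a uniformly sampled record and the indices retained by an NUS block. It reconstructs the signal with the first-order linear interpolation described in the test procedure. It assumes that CF is the number of uniform samples divided by the number of retained samples, and that PR-SNDR is the ratio of signal energy to reconstruction-error energy in dB, without mean removal. The paper's exact definitions, given earlier, may differ in detail, and the function names are hypothetical.

```python
import numpy as np


def reconstruct_linear(x, retained):
    """First-order (linear) interpolation between retained samples (indices must be increasing)."""
    n = np.arange(len(x))
    return np.interp(n, retained, np.asarray(x, dtype=float)[retained])


def cf_and_pr_sndr(x, retained):
    """Compression factor and post-reconstruction SNDR in dB, under the stated assumptions."""
    x = np.asarray(x, dtype=float)
    xr = reconstruct_linear(x, retained)
    cf = len(x) / len(retained)
    err_energy = np.sum((x - xr) ** 2)
    if err_energy == 0.0:
        return cf, float("inf")  # perfect reconstruction
    return cf, 10.0 * np.log10(np.sum(x ** 2) / err_energy)


# A triangle wave retained only at its corners is reconstructed exactly.
print(cf_and_pr_sndr([0, 1, 2, 1, 0], [0, 2, 4]))
```

Because linear interpolation is exact between retained corners, piecewise-linear inputs such as a saw-tooth suffer only from the decision delay. This matches the large PR-SNDR improvement seen with delay compensation in Fig. 16(b).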

# B. PVT Variation Effects

Fig. 18 shows Monte Carlo simulation results for the CF and PR-SNDR over 1000 random iterations, used to investigate the performance of the designed DDS system in the presence of PVT variations. A noisy ECG signal is selected as the input, and the resolution values are set to achieve a relatively high CF (at a 1 kHz master clock). This choice reflects the expectation that the Monte Carlo results vary more when the DDS system retains fewer sample points. The mean values achieved for CF and PR-SNDR are 6.28 and 28.058 dB, respectively, and the standard deviations of the fitted normal distributions are approximately 0.12 and 0.01 dB, respectively. The achieved CF and PR-SNDR are close to the nominal values in most iterations, indicating that the designed system is sufficiently robust to PVT variations.

## C. Power Consumption

The implemented DDS system shown in Fig. 12(a) can be divided into four main building blocks: (1) three subtractors, (2) two comparators, (3) a clock generator, and (4) several digital logic gates. The total power consumption of the system can be expressed as

$$P_{total} = P_{Subt} + P_{Comp} + P_{Clk} + P_{Logic}, \tag{9}$$

where $P_{Subt}$, $P_{Comp}$, $P_{Clk}$, and $P_{Logic}$ are the power dissipation in the subtractors, comparators, clock manager, and digital logic gates, respectively. Note that the first three terms scale directly with the pulse width of $CK_T$ and the master clock frequency, $f_{CK_{in}}$. Therefore, a higher clock frequency or a longer enabling time results in higher power dissipation. Except for $P_{Clk}$, all terms depend on the input signal type, which also affects the output switching activity.
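As a worked instance of Eq. (9), the short script below sums the simulated Scenario III block powers listed in Table I and reports each block's share. The block powers come from the table; the script itself is only an arithmetic illustration. The sum of the rounded entries is 143.1 nW, which differs from the tabulated 143.2 nW total only by rounding.

```python
# Scenario III block powers from Table I (nW).
blocks_nw = {"P_Clk": 54.1, "P_Subt": 79.2, "P_Comp": 9.1, "P_Logic": 0.7}
p_total = sum(blocks_nw.values())  # Eq. (9)
for name, p in blocks_nw.items():
    print(f"{name}: {p:5.1f} nW ({100 * p / p_total:4.1f}%)")
print(f"P_total = {p_total:.1f} nW")
```

The subtractors and the clock generator together account for more than 90% of the total. This is consistent with the percentage row of Table I and with the emphasis on duty-cycling these blocks through $CK_T$.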

Fig. 19 shows the measured and simulated power consumption of the DDS system at its maximum activity (CF = 1) for different input clock frequencies, $f_{CK_{in}}$. As expected, static leakage power dominates at clock frequencies below 100 Hz. As the total power varies with the pulse width of the enable signal ($CK_T$), we report the simulated power dissipation for $V_{TUNE}$ equal to 0.35 V and 0.5 V. The measured power dissipation is obtained with $V_{TUNE}$ set to 0.35 V, except at a clock frequency of 100 kHz, where $V_{TUNE}$ is set to 0.5 V because of the shorter clock period. Note that the ADC power is not included in the graph, as it depends on the ADC used.

TABLE II PERFORMANCE COMPARISON WITH THE PRIOR STATE OF THE ART

|                                      | This Work                                           | [12]                                                 | [25]                                            | [28]                                         | [29]                                     | [13]                                       | [14]                                                     |
|--------------------------------------|-----------------------------------------------------|------------------------------------------------------|-------------------------------------------------|----------------------------------------------|------------------------------------------|--------------------------------------------|----------------------------------------------------------|
| Topology                             | DDS+ADC                                             | SDS+ADC                                              | Clockless LC<br>+ADC                            | Clocked LC<br>+ADC                           | CS-AFE<br>(RMPI)                         | Data CS                                    | Adaptive Rate                                            |
| Sampling Scheme                      | Nonuniform                                          | Nonuniform                                           | Nonuniform                                      | Nonuniform                                   | Uniform                                  | Uniform                                    | Nonuniform                                               |
| Implementation                       | Analog                                              | Analog                                               | Analog                                          | Analog                                       | Analog                                   | Digital                                    | Analog                                                   |
| Technique Isolation<br>from Sampling | Yes                                                 | No                                                   | No                                              | No                                           | N/A                                      | N/A                                        | No                                                       |
| Input Signal Type                    | All                                                 | All                                                  | All                                             | All                                          | Bio-Signals                              | Bio-Signals                                | ECG                                                      |
| ADC Resolution Bits                  | 12<sup>a</sup>                                      | 12<sup>a</sup>                                       | 6                                               | 8                                            | 10                                       | 10                                         | 11                                                       |
| CF                                   | Widely<br>Tunable                                   | Widely<br>Tunable                                    | N/A<sup>b</sup>                                 | N/A                                          | Tunable                                  | Fixed (2.38)                               | Tunable<br>(<7.3)                                        |
| Voltage Supply (V)                   | 1                                                   | 2                                                    | 0.8                                             | 1.8-2.4                                      | 0.9                                      | 1                                          | 2                                                        |
| Power Consumption                    | 578 nW<sup>c</sup><br>for ECG @CF≈6<br>(1 kHz Clock) | 1.7 µW<sup>c</sup><br>for ECG @CF≈6<br>(1 kHz Clock) | 313-582 nW<br>for 5 Hz-5 kHz<br>Input Sinewave | 0.6-1.7 µW<br>for ECG<br>(32 kHz Clock)     | 1.8 µW<br>for ECG<br>(2 kHz S.R.)        | 170 µW<br>for ECG<br>(256 Hz S.R.)         | 8.4 µW<br>for ECG<br>(64 Hz-1024 Hz S.R.)<sup>d</sup>    |
| CMOS Technology                      | 0.13-µm                                             | 0.18-µm                                              | 0.18-µm                                         | 0.35-µm                                      | 0.13-µm                                  | 65-nm                                      | 0.5-µm                                                   |

<sup>a</sup> The test bench ADC resolution bits (this varies with the selected ADC attached to the NUS block).

<sup>b</sup> Not applicable for clockless LC.

<sup>c</sup> The reported power consumption includes both the NUS block and the test bench ADC (this varies with the selected ADC's power and CF).

<sup>d</sup> Adaptive sampling rate up to 1024 Hz.

Table I shows the power distribution of the DDS system in three simulated scenarios. The first two scenarios consider maximum activity, where the CF is equal to 1. In the last scenario, an ECG signal is applied to the system with a CF of around 6. In Scenario II, with $V_{TUNE}$ equal to 0.35 V, the total power roughly doubles compared to Scenario I because of the larger pulse width. Even so, the percentage of power consumed by each block does not change significantly. A comparison between Scenarios II and III also shows that the CF has a negligible effect on the power dissipation of the blocks, since these blocks have to be on at the time of each decision. The only exception is the dynamic logic gates. A counter may also be required to count the distance between two stored sampling points. A typical counter design may add around 10 nA of current (1 kHz master clock) to the total current dissipation.

Table II compares the performance of the implemented DDS technique with recent state-of-the-art uniform and non-uniform sampling techniques. Unlike the other methods, the DDS block is entirely isolated from the sampling path. Its effect on the signal-to-noise ratio during the decision-making process is therefore minimized, and it can be used with any available ADC. The DDS method is not limited to specific signal types, unlike [13], [14], and [29]. Its CF and PR-SNDR are also widely tunable by adjusting the reference thresholds, the voltage gain, and/or the clock frequency. The accuracy of the different methods presented in Table II cannot be easily compared, since it is inherently signal dependent. An ECG signal similar to the one analyzed in Fig. 17 has been applied to the SDS method in [12]. The reported PR-SNDR of 28.7 dB with a CF of 6.1 is comparable to our measured PR-SNDR of 26.6 dB with a CF of 6.7 for the proposed DDS method. For PPG signals, moreover, the DDS method achieves higher CFs than the SDS method while also providing a better PR-SNDR.

As discussed, using the implemented DDS technique in a data acquisition system allows the system to wake up only when a significant event occurs; the dynamic power is therefore reduced by shortening the operating time. The Power Saving Factor (PSF) of a data acquisition system using an NUS block can be defined as [12]

$$PSF = \left(1 - \frac{\text{System Power w/ NUS}}{\text{System Power w/o NUS}}\right) \times 100\%. \tag{10}$$

If the system power is dominated by dynamic power and the power consumption of the NUS block is negligible compared to that of the system, the maximum PSF is $(1 - 1/CF) \times 100\%$; e.g., with a typical CF of 6, the maximum PSF is 83.3%. As a practical example, the Texas Instruments CC2650 RF microcontroller, commonly used in low-power wireless sensor systems, consumes 1.7 mW standalone; with the DDS block, its power can be reduced by a PSF of 81%. On its own, the DDS system consumes only 155 nW, one-eighth of the power consumed by the SDS system in [12]. Owing to the low complexity of the DDS technique, the area of the fabricated circuit ($0.04\ \text{mm}^2$) is among the smallest of the works reported in Table II. It is comparable to the areas of the fabricated LC systems in [25] and [28], and it is one-third of the chip area of the SDS system [12].
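The PSF bound quoted above can be checked numerically. The following is a minimal sketch of Eq. (10) under an idealized model, which is an assumption: all system power is dynamic and scales as 1/CF, and the NUS block adds a constant power. The 1.7 mW and 155 nW figures are taken from the text; the helper names `psf_percent` and `ideal_power_with_nus` are hypothetical.

```python
def psf_percent(p_without_nus: float, p_with_nus: float) -> float:
    """Power Saving Factor, Eq. (10), in percent."""
    return (1.0 - p_with_nus / p_without_nus) * 100.0


def ideal_power_with_nus(p_system: float, cf: float, p_nus: float = 0.0) -> float:
    """Idealized model (assumption): system power is purely dynamic and the
    system is active for only 1/CF of the uniform-sampling workload; the
    NUS block adds its own constant power."""
    return p_system / cf + p_nus


if __name__ == "__main__":
    p_mcu = 1.7e-3  # CC2650 standalone power from the text, W
    p_dds = 155e-9  # maximum DDS power from the text, W
    print(psf_percent(p_mcu, ideal_power_with_nus(p_mcu, 6.0)))         # ~83.33
    print(psf_percent(p_mcu, ideal_power_with_nus(p_mcu, 6.0, p_dds)))  # ~83.32
```

The idealized bound is slightly above the 81% quoted for the CC2650, which is consistent with part of a real microcontroller's power not scaling with CF; the 155 nW DDS overhead itself lowers the PSF by only about 0.01 percentage points.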

#### VI. CONCLUSION

An ultra-low-power non-uniform sampling scheme using a derivative-based algorithm is proposed in this paper. The derivative-based algorithm maintains accuracy comparable to that of other sampling schemes, yet it can be implemented with simple analog blocks, which reduces its complexity and, thereby, its power consumption compared to other schemes. Several techniques have been used to further reduce the power and improve the tunability of the system. The reference threshold and voltage gain of the proposed system can be tuned to achieve a desired PR-SNDR or CF. The proposed system is placed next to, but isolated from, the ADC in an acquisition system and enables the ADC only when necessary, reducing the total power dissipation of the system. It is fabricated in TSMC's 0.13 $\mu$m CMOS technology and tested with real-world and ideal signals. The DDS system consumes at most 155 nW. By adding the proposed DDS system to a data acquisition chain, the power dissipation of the entire system can be significantly reduced.

#### ACKNOWLEDGMENT

The authors would like to thank Ehsan Hadizadeh for his great assistance during the project.

#### REFERENCES

- [1] T.-S. Chen, H.-C. Kuo, and A.-Y. Wu, "A 232–1996-kS/s robust compressive sensing reconstruction engine for real-time physiological signals monitoring," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 307–317, Jan. 2019.
- [2] L. Ye et al., "The challenges and emerging technologies for low-power artificial intelligence IoT systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 12, pp. 4821–4834, Dec. 2021.
- [3] P. Zhai, Z. Zhu, X. Zhou, Y. Cai, F. Zhang, and Q. Li, "An on-chip power-supply noise analyzer with compressed sensing and enhanced quantization," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 302–311, Jan. 2022.
- [4] T.-F. Wu and M. S.-W. Chen, "A noise-shaped VCO-based nonuniform sampling ADC with phase-domain level crossing," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 623–635, Mar. 2019.
- [5] T.-F. Wu, C.-R. Ho, and M. S.-W. Chen, "A flash-based non-uniform sampling ADC with hybrid quantization enabling digital anti-aliasing filter," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2335–2349, Sep. 2017.
- [6] M. Zulqarnain et al., "A flexible ECG patch compatible with NFC RF communication," *Npj Flexible Electron.*, vol. 4, no. 1, p. 13, Jul. 2020.
- [7] J. Ding, Y. Tang, L. Zhang, F. Yan, X. Gu, and R. Wu, "A novel frontend design for bioelectrical signal wearable acquisition," *IEEE Sensors J.*, vol. 19, no. 18, pp. 8009–8018, Sep. 2019.
- [8] C.-Y. Chou, K.-C. Hsu, B.-H. Cho, K.-C. Chen, and A.-Y.-A. Wu, "Low-complexity on-demand reconstruction for compressively sensed problematic signals," *IEEE Trans. Signal Process.*, vol. 68, pp. 4094–4107, 2020.
- [9] T. Moy et al., "An EEG acquisition and biomarker-extraction system using low-noise-amplifier and compressive-sensing circuits based on flexible, thin-film electronics," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 309–321, Jan. 2017.
- [10] J. Xiang, Y. Dong, X. Xue, and H. Xiong, "Electronics of a wearable ECG with level crossing sampling and human body communication," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 1, pp. 68–79, Feb. 2019.
- [11] I. Zhou et al., "Compressed level crossing sampling for ultra-low power IoT devices," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 9, pp. 2495–2507, Sep. 2017.
- [12] E. H. Hafshejani, M. Elmi, N. TaheriNejad, A. Fotowat-Ahmady, and S. Mirabbasi, "A low-power signal-dependent sampling technique: Analysis, implementation, and applications," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 12, pp. 4334–4347, Dec. 2020.
- [13] E. Chua and W.-C. Fang, "Mixed bio-signal lossless data compressor for portable brain-heart monitoring systems," *IEEE Trans. Consum. Electron.*, vol. 57, no. 1, pp. 267–273, Feb. 2011.
- [14] R. F. Yazicioglu, S. Kim, T. Torfs, H. Kim, and C. Van Hoof, "A 30 μW analog signal processor ASIC for portable biopotential signal monitoring," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 209–223, Jan. 2011.
- [15] W. Tang et al., "Continuous time level crossing sampling ADC for biopotential recording systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 6, pp. 1407–1418, Jun. 2013.
- [16] Y. Hou, K. Yousef, M. Atef, G. Wang, and Y. Lian, "A 1-to-1-kHz, 4.2-to-544-nW, multi-level comparator based level-crossing ADC for IoT applications," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 65, no. 10, pp. 1390–1394, Oct. 2018.
- [17] H. Wang, F. Schembari, M. Miśkowicz, and R. B. Staszewski, "An adaptive-resolution quasi-level-crossing-sampling ADC based on residue quantization in 28-nm CMOS," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 8, pp. 178–181, Aug. 2018.
- [18] C. Weltin-Wu and Y. Tsividis, "An event-driven clockless level-crossing ADC with signal-dependent adaptive resolution," *IEEE J. Solid-State Circuits*, vol. 48, no. 9, pp. 2180–2190, Sep. 2013.
- [19] M. Kurchuk and Y. Tsividis, "Signal-dependent variable-resolution clockless A/D conversion with application to continuous-time digital signal processing," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 5, pp. 982–991, May 2010.
- [20] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," *IEEE Signal Process. Mag.*, vol. 25, no. 2, pp. 21–30, Mar. 2008.
- [21] D. L. Donoho, "Compressed sensing," *IEEE Trans. Inf. Theory*, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
- [22] W. Guo, Y. Kim, A. H. Tewfik, and N. Sun, "A fully passive compressive sensing SAR ADC for low-power wireless sensors," *IEEE J. Solid-State Circuits*, vol. 52, no. 8, pp. 2154–2167, Aug. 2017.
- [23] J. W. Mark and T. Todd, "A nonuniform sampling approach to data compression," *IEEE Trans. Commun.*, vol. COM-29, no. 1, pp. 24–32, Jan. 1981.
- [24] Y. Tsividis, "Event-driven data acquisition and digital signal processing—A tutorial," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 8, pp. 577–581, Aug. 2010.
- [25] Y. Li, D. Zhao, and W. A. Serdijn, "A sub-microwatt asynchronous level-crossing ADC for biomedical applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 7, no. 2, pp. 149–157, Apr. 2013.
- [26] P. Martínez-Nuevo, S. Patil, and Y. Tsividis, "Derivative level-crossing sampling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 62, no. 1, pp. 11–15, Jan. 2015.
- [27] F. Tala, M. Bandali, and B. C. Johnson, "Automated distributed element model generation for neural interface co-design," in *Proc. IEEE 63rd Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Aug. 2020, pp. 917–920.
- [28] T. Marisa et al., "Pseudo asynchronous level crossing ADC for ECG signal acquisition," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 2, pp. 267–278, Apr. 2017.
- [29] D. Gangopadhyay, E. G. Allstot, A. M. R. Dixon, K. Natarajan, S. Gupta, and D. J. Allstot, "Compressed sensing analog front-end for bio-sensor applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 426–438, Feb. 2014.



Mohammad Elmi received the B.Sc. degree in electrical engineering from the Babol Noshirvani University of Technology, Mazandaran, Iran, in 2013, and the M.Sc. degree in electrical engineering-analog electronics from Shahid Beheshti University, Tehran, Iran, in 2015. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada. His main research interests include analog/RF integrated circuits design, low-power biomedical circuits design, and ultra-low-power sensor systems.





Kambiz Moez (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Tehran, Tehran, Iran, in 1999, and the M.Sc. and Ph.D. degrees from the University of Waterloo, Waterloo, ON, Canada, in 2002 and 2006, respectively. Since 2007, he has been with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, where he is currently a Full Professor. His current research interests include the analysis and design of radio frequency CMOS integrated circuits and systems. He is a registered Professional Engineer in Alberta. He is currently serving as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS and *IET Electronics Letters*.