Special Session Paper: An Efficient Hardware Design for Cerebellar Models using Approximate Circuits

Honglan Jiang  
University of Alberta  
Edmonton, AB, Canada  
honglan@ualberta.ca

Leibo Liu  
Institute of Microelectronics  
Tsinghua University  
Beijing, China

Jie Han  
University of Alberta  
Edmonton, AB, Canada  
jhan8@ualberta.ca

ABSTRACT
The superior controllability of the cerebellum has motivated extensive interest in the development of computational cerebellar models. Many models have been applied to motor control and image stabilization in robots. Because they are often computationally complex, cerebellar models have rarely been implemented in dedicated hardware. Here, we propose an efficient hardware design for cerebellar models using approximate circuits with a small area and low power. Leveraging the inherent error tolerance of the cerebellum, approximate adders and multipliers are carefully evaluated for implementation in an adaptive filter based cerebellar model to achieve a good tradeoff between accuracy and hardware usage. A saccade system, whose vestibulo-ocular reflex (VOR) is controlled by the cerebellum, is simulated to show the applicability and effectiveness of the proposed design. Simulation results show that the approximate cerebellar circuit achieves an accuracy similar to that of an exact implementation, while saving 37.3% in area and 29.7% in power.

CCS CONCEPTS
• Hardware → Application specific integrated circuits;

KEYWORDS
Cerebellar model, approximate circuits, multiplier, adder

1 INTRODUCTION
The superior human ability to accurately control complex movements, which is due to the cerebellum, has attracted considerable attention. Many computational models have been proposed to explain and mimic the cerebellar function for signal processing and motor control applications. However, little has been done on implementing cerebellar models in hardware due to their high complexity. Meanwhile, approximate computing has emerged as a means of energy-efficient and high-performance processing at the cost of some loss in accuracy [6]. As the cerebellum and its models are inherently error-tolerant, approximate computing circuits are utilized in the hardware implementation of a cerebellar model in this paper.

Many cerebellar models have been proposed, including the perceptron based model [1, 13], the continuous spatio-temporal model [3] and the higher-order lead-lag compensator model [7]. However, the most widely used cerebellar model is based on the adaptive filter [5] due to its low complexity and high structural resemblance to the cerebellum. Therefore, an efficient hardware design using approximate arithmetic circuits is proposed for the adaptive filter based cerebellar model.

2 CEREBELLAR MODEL DESIGN
Fig. 1 shows a connection network of cerebellar cells [8], where the Purkinje cell (PC), granule cell (GC), Golgi cell (Go), mossy fibre (MF) and climbing fibre (CF) are key elements for the cerebellum. In the adaptive filter based cerebellar model, the GC and Go are combined and simplified to a tap-delay line [11]. The output of the PC is given by

\[ z(t) = \sum_{i=0}^{N-1} w_i(t) \cdot x_i(t), \tag{1} \]

where \( w_i(t) \) is the synaptic weight between the \( i \)th parallel fibre (PF) and the PC, \( x_i(t) = u(t - iT) \) is the delayed input, \( T \) is the constant delay of the Go-GC system, and \( N \) is the number of synapses. The synaptic weights are updated by the error signal carried by the CF according to the least mean square (LMS) algorithm. The LMS algorithm is formulated as

\[ w_i(t + T) = w_i(t) + \mu \cdot e(t) \cdot x_i(t), \quad i = 0, 1, \ldots, N - 1, \tag{2} \]

where \( \mu \) is the step size, and \( e(t) = d(t) - z(t) \) is the error between the desired signal \( d(t) \) and the PC output.
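As an illustration, the weighted sum and the LMS update above can be sketched in floating-point Python. This is a minimal sketch and not the hardware design: the function name `cerebellar_lms`, the default parameter values and the use of NumPy are assumptions made for the example, and one loop iteration stands for one delay period \( T \).

```python
import numpy as np

def cerebellar_lms(u, d, n_taps=4, mu=2.0 ** -3):
    """Tap-delay-line adaptive filter trained with LMS (illustrative helper)."""
    w = np.zeros(n_taps)  # synaptic weights w_i
    x = np.zeros(n_taps)  # delayed inputs, x[i] holds u(t - i*T)
    errors = []
    for u_t, d_t in zip(u, d):
        # Shift the tap-delay line by one period T and insert the new sample.
        x = np.roll(x, 1)
        x[0] = u_t
        z = float(np.dot(w, x))  # PC output: weighted sum of delayed inputs
        e = d_t - z              # error signal carried by the CF
        w = w + mu * e * x       # LMS update, used from the next period on
        errors.append(e)
    return w, np.array(errors)
```

For example, when `d` is produced by filtering `u` with a short FIR response, the returned weights approach that response as long as the step size is small enough for the input power.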

Figure 1: A connection network of cerebellar cells.

As per (1) and (2), 3\( N \) multipliers and 2\( N \) adders are required for the circuit implementation of the model. As the number of PF-PC synapses is very large (around 200,000) [13], \( N \) can be very large and the cerebellar model requires a large number of multipliers and adders. Taking advantage of the error tolerance of the cerebellar model, approximate adders and multipliers are used in the proposed design to reduce its hardware cost.

The weighted sum operation computing \( z(t) \) indicates that the average errors of the approximate multiplier and adder determine its accuracy. Therefore, the approximate radix-8 Booth multiplier (ABM2) [9] and the lower-part-OR adder (LOA) [12] with low average errors and low power-delay products (PDPs) are chosen according to the comparative evaluation in [10]. ABM2 is redesigned as an \( n \times n \) fixed-width multiplier with \( n \) approximated bits (with error compensation) in the recoding adder and \( (n - 1) \) truncated least significant bits (LSBs) of partial products. For an \( n \)-bit LOA, \( k \) LSBs are added by using OR gates, and an AND gate is used to generate a carry-in signal for the more significant bits processed by an \((n - k)\)-bit exact adder. It is referred to as LOA-\(k\).
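The behavior of LOA-\(k\) can be sketched as a bit-level Python model. This is a minimal sketch for unsigned operands. It assumes that the AND gate takes the most significant bits of the two lower parts as its inputs, as in the commonly described LOA structure; the text above only states that an AND gate generates the carry-in. The function name `loa_add` is hypothetical.

```python
def loa_add(a, b, n=8, k=3):
    """Approximate n-bit unsigned addition modeled after LOA-k (illustrative)."""
    mask_n = (1 << n) - 1
    a &= mask_n
    b &= mask_n
    if k == 0:
        return a + b  # no approximated bits: plain exact addition
    low_mask = (1 << k) - 1
    # The k LSBs are produced by bitwise OR instead of full adders.
    low = (a | b) & low_mask
    # Assumed carry-in: AND of bit (k - 1) of both operands.
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    # The upper (n - k) bits use an exact adder with the approximate carry-in.
    high = (a >> k) + (b >> k) + carry_in
    return (high << k) | low  # the result includes the carry-out bit
```

For instance, `loa_add(6, 6, n=8, k=3)` returns 14 rather than the exact sum 12, while operands whose lower \(k\) bits are all zero are added exactly.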

3 EVALUATION

The cerebellum plays a key role in the control of eye movement in the saccade system; this involuntary eye movement is referred to as the vestibulo-ocular reflex (VOR). The VOR stabilizes a visual stimulus on the center of the retina (fovea) for clear vision when the head moves [2]. Fig. 2 shows a simplified model of the VOR, where the cerebellum works as a forward model to predict the eye plant output and to compensate the movement command indirectly. In the saccade system, the head movements are sensed by the vestibular system consisting of the semicircular canals and the otolith organs [14]. As a first study, only the horizontal head velocity sensed by the horizontal canal is considered as the input. The horizontal canal is modeled as a high-pass filter, \(V(s) = \frac{s}{s + 1/T_c}\), where \(T_c = 6\,\mathrm{s}\) [14]. The brainstem acts as a control center that receives the sensory information and the compensation signals from the cerebellum. It then generates commands to drive the eye muscles for movement. The transfer functions of the brainstem and the eye plant are given by \(B(s) = G_d + \frac{G_i}{s + 1/T_i}\) and \(P(s) = \frac{s(s + 1/T_z)}{(s + 1/T_1)(s + 1/T_2)}\), respectively, where \(G_d = 1\), \(G_i = 5.05\), \(T_i = 500\,\mathrm{ms}\), \(T_1 = 370\,\mathrm{ms}\), \(T_2 = 57\,\mathrm{ms}\) and \(T_z = 200\,\mathrm{ms}\) [4].
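To make the canal model concrete, a first-order high-pass filter with time constant \(T_c = 6\) s can be simulated in discrete time as follows. This is a minimal sketch only: the backward-difference discretization, the sampling period `dt` and the function name `canal_highpass` are assumptions made for the example and are not taken from the MATLAB model used in the evaluation.

```python
def canal_highpass(head_velocity, dt=0.001, t_c=6.0):
    """Discrete first-order high-pass filter with time constant t_c (seconds)."""
    # Backward-difference approximation of a high-pass filter:
    # y[n] = alpha * (y[n-1] + x[n] - x[n-1]), alpha = t_c / (t_c + dt)
    alpha = t_c / (t_c + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for v in head_velocity:
        prev_out = alpha * (prev_out + v - prev_in)
        prev_in = v
        out.append(prev_out)
    return out
```

With a constant head velocity, the simulated canal output decays towards zero with a time constant of about \(T_c\), which reflects the high-pass behavior of the canal.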

Figure 2: A simplified model of the VOR.

To evaluate the accuracy of the approximate cerebellar model, the saccade system in Fig. 2 is implemented in MATLAB. The cerebellar model is implemented in an \(n\)-bit fixed-point format consisting of 1 sign bit and \((n - 1)\) fractional bits. Fig. 3 shows the retinal slip (error signal) during a 5s training, where the constant delay \(T\) is 1ms, \(N\) is 128, and the step size \(\mu\) is set to \(2^{-8}\) (to simplify the multiplication to a shift operation). Fig. 3 shows that the accurate 20-bit fixed-point cerebellar model produces the lowest stable retinal slip, and the 18-bit implementation produces a slightly higher one, while the retinal slip of the 16-bit implementation does not converge.
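The number format and the shift-based step size can be illustrated with a few Python helpers. This is a minimal sketch: round-to-nearest quantization and saturation on overflow are assumptions made for the example (the text does not state the rounding or overflow behavior), and the helper names are hypothetical.

```python
def to_fixed(value, n=20):
    """Quantize a real value in [-1, 1) to an n-bit integer code:
    1 sign bit and (n - 1) fractional bits."""
    scale = 1 << (n - 1)
    q = int(round(value * scale))          # assumed: round to nearest
    return max(-scale, min(scale - 1, q))  # assumed: saturate on overflow

def to_real(q, n=20):
    """Convert an n-bit fixed-point code back to a real value."""
    return q / (1 << (n - 1))

def scale_by_step(q, shift=8):
    """Multiply by mu = 2**-shift with an arithmetic right shift."""
    return q >> shift
```

For example, 0.5 is encoded as `1 << 18` in the 20-bit format, and scaling it by \(\mu = 2^{-8}\) reduces the code to `1 << 10` without using a multiplier.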

Two 20-bit fixed-point approximate models, AP(6, 2) and AP(8, 2), are considered. In AP(6, 2), all multiplications are implemented by ABM2s, LOA-6’s are used for the adder tree that accumulates the products in (1), and LOA-2’s are used to update the synaptic weights because (2) is more sensitive to errors. For AP(8, 2), the only difference is that LOA-8 is used for the adder tree. The simulation results in Fig. 3 show that AP(6, 2) generates a retinal slip similar to that of the accurate 18-bit implementation, while the retinal slip of AP(8, 2) converges to a slightly larger value.

The circuit characteristics of the cerebellar designs that result in similar retinal slips are shown in Table 1. The critical path delay, area and power dissipation are obtained by synthesis using the Synopsys Design Compiler in an STM 28 nm CMOS technology; the power is estimated at a clock frequency of 125 MHz. With a similar accuracy, AP(6, 2) is faster by 17.3%, occupies a smaller area by 37.3% and consumes less power by 29.7% than the accurate 18-bit design. As a result, a saving of 41.9% in PDP is obtained by using approximate multipliers and adders in the adaptive filter based cerebellar model.
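As a quick consistency check, the PDP saving follows from the delay and power figures when "faster by 17.3%" is read as a 17.3% reduction in the critical path delay (an interpretation that is not stated explicitly here):

\[ \frac{\mathrm{PDP}_{\mathrm{approx}}}{\mathrm{PDP}_{\mathrm{exact}}} = (1 - 0.173) \times (1 - 0.297) \approx 0.581, \]

which corresponds to a PDP saving of approximately 41.9%.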

4 CONCLUSION

This paper proposes an efficient hardware design for the adaptive filter based cerebellar model using approximate multipliers and approximate adders. The simulation results show that the approximate cerebellar model AP(6, 2) achieves an accuracy similar to that of the exact 18-bit design while being more efficient in hardware, with a PDP reduction of 41.9%.

REFERENCES