FIR Filtering With Wide Data

Marc-Julien Objois, Catherine Single, Charlena Fong, and Mariya Shterngartz

Introduction

It is sometimes necessary to build FIR filters that operate on wide data. Implementing such a filter with registers and one multiplier per tap quickly becomes very expensive. For instance, a 20-bit lpm_mult requires 1259 logic units on a FLEX10K device. In addition, each tap of the filter needs a 20-bit register to store the incoming data, and each filter coefficient needs another 20 bits. Since each stored bit requires a logic unit, a multi-tap filter can become extremely large.

However, the same filter can be implemented with a single lpm_mult and SRAM storage.

Implementation

The lpm_mult megafunction is most efficient when used with a pipelining value of 4. On a FLEX10K20RC-4, it can then be clocked at approximately 24 MHz. That full speed is not needed here, so the 25 MHz master clock can simply be divided by two (giving 12.5 MHz) to clock the lpm_mult. The main blocks and the general flow of data are presented in Fig. 1.

Fig. 1: Crucial Elements
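
As a minimal sketch of this arrangement, the VHDL below divides the master clock by two and uses the result to clock a pipelined lpm_mult. The entity name mult_stage, the port names h_in, n_in, clk_mult and product, and the choice of signed arithmetic are assumptions made for illustration. The lpm_mult generics and ports are written as they appear in the usual lpm.lpm_components declaration and should be checked against the megafunction documentation for the tool version in use.

  library ieee;
  use ieee.std_logic_1164.all;
  library lpm;
  use lpm.lpm_components.all;

  entity mult_stage is                       -- hypothetical name
    generic (N : positive := 20);            -- data width
    port (
      clk_master : in  std_logic;                          -- 25 MHz master clock
      h_in       : in  std_logic_vector(N-1 downto 0);     -- filter coefficient
      n_in       : in  std_logic_vector(N-1 downto 0);     -- wave data value
      clk_mult   : out std_logic;                          -- divided clock
      product    : out std_logic_vector(2*N-1 downto 0)    -- 2*N-bit result
    );
  end mult_stage;

  architecture rtl of mult_stage is
    signal clk_half : std_logic := '0';
  begin
    -- Toggle on every rising edge of the master clock: divide by two.
    process (clk_master)
    begin
      if rising_edge(clk_master) then
        clk_half <= not clk_half;
      end if;
    end process;

    clk_mult <= clk_half;

    -- Single multiplier, pipelined over 4 stages.
    -- Assumption: signed operands; LPM_WIDTHS is set only because some
    -- tool versions require it even when the sum port is unused.
    mult : lpm_mult
      generic map (
        LPM_WIDTHA         => N,
        LPM_WIDTHB         => N,
        LPM_WIDTHS         => 1,
        LPM_WIDTHP         => 2*N,
        LPM_REPRESENTATION => "SIGNED",
        LPM_PIPELINE       => 4
      )
      port map (
        dataa  => h_in,
        datab  => n_in,
        clock  => clk_half,
        result => product
      );
  end rtl;

With a pipelining value of 4, a product appears at the result port four divided-clock cycles after its operands are presented, which is why the accumulator must run behind the process that feeds the multiplier.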

The trick is not to wait for lpm_mult to finish one multiplication before feeding it new values; the pipeline should be kept full. This is easily accomplished with a counter or two. One process feeds each wave data value and its corresponding coefficient into the multiplier, while a delayed counter starts adding values to an initially cleared accumulator as soon as data begins to emerge from the lpm_mult. The coefficients of the FIR filter will be referred to as h, and the values of the wave data will be referred to as n.

Fig. 2: Order / Timing
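
The feed counter and the delayed accumulation described above might be sketched as follows. This is an illustration under stated assumptions, not the original design: the entity name fir_mac_control, the start and done handshake, the TAPS and LATENCY generics, and the use of ieee.numeric_std are all hypothetical. It also assumes that the SRAM presents h and n in the same clock cycle in which tap_index selects them; any extra read latency would have to be added to LATENCY, and the alignment should be confirmed in simulation.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity fir_mac_control is                  -- hypothetical name
    generic (
      N       : positive := 20;              -- data width
      TAPS    : positive := 16;              -- number of taps (example value)
      LATENCY : integer range 2 to 16 := 4   -- multiplier pipeline depth
    );
    port (
      clk       : in  std_logic;             -- same clock as the lpm_mult
      start     : in  std_logic;             -- pulse: begin a new output sample
      product   : in  std_logic_vector(2*N-1 downto 0);  -- lpm_mult result
      tap_index : out integer range 0 to TAPS-1;         -- selects h and n
      adder_sum : out std_logic_vector(2*N-1 downto 0);
      done      : out std_logic              -- one cycle: adder_sum is complete
    );
  end fir_mac_control;

  architecture rtl of fir_mac_control is
    signal feed_count : integer range 0 to TAPS-1 := 0;
    signal feeding    : std_logic := '0';
    signal valid_pipe : std_logic_vector(LATENCY-1 downto 0) := (others => '0');
    signal acc        : signed(2*N-1 downto 0) := (others => '0');
    signal acc_count  : integer range 0 to TAPS-1 := 0;
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        done <= '0';
        if start = '1' then
          -- New output sample: restart the feed and clear the accumulator.
          feed_count <= 0;
          feeding    <= '1';
          valid_pipe <= (others => '0');
          acc        <= (others => '0');
          acc_count  <= 0;
        else
          -- Feed counter: one h/n pair enters the multiplier per clock.
          if feeding = '1' then
            if feed_count = TAPS-1 then
              feeding <= '0';
            else
              feed_count <= feed_count + 1;
            end if;
          end if;

          -- Delay line: a '1' emerges LATENCY clocks after a pair was fed.
          valid_pipe <= valid_pipe(LATENCY-2 downto 0) & feeding;

          -- Delayed accumulation of the products leaving the multiplier.
          if valid_pipe(LATENCY-1) = '1' then
            acc <= acc + signed(product);
            if acc_count = TAPS-1 then
              done <= '1';
            else
              acc_count <= acc_count + 1;
            end if;
          end if;
        end if;
      end if;
    end process;

    tap_index <= feed_count;
    adder_sum <= std_logic_vector(acc);
  end rtl;

Because a new pair is fed on every clock, one output sample takes roughly TAPS + LATENCY clocks rather than TAPS * LATENCY, which is the benefit of not waiting for each multiplication to finish.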

Note that the output of the multiplier is twice as wide as the input data. This precision is needed all the way to the end, so the adder must also support twice the data width. At the beginning of each cycle, the adder is reset to 0. Once the lpm_mult has delivered its last product, the adder contains the correct filtered result. It is then necessary to bring this value back down to the proper size (the data width). Take the width of the adder to be 2*N, the master data width to be N, and the adder's output to be adder_sum. The output of the adder can then be mapped to the proper output value by keeping the sign bit and the N-1 least significant bits of the sum, which assumes that the filtered result fits in N bits:

  signal truncated_sum : std_logic_vector(N-1 downto 0);
    .
    .
    .
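  -- keep the sign bit of the wide sum
  truncated_sum(N-1) <= adder_sum(2*N-1);
  -- keep the N-1 least significant bits (slice widths must match)
  truncated_sum(N-2 downto 0) <= adder_sum(N-2 downto 0);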