EE 552 2003w   2003-1-31
 

Lab 4: Fast Arithmetic, Pipelining, Packages, Interfacing


Please read the requirements for lab assignments.

In this lab, you will obtain greater arithmetic speed through the use of alternative number representations and pipelining.
 

Exercise

Area Measurement

Compile any design, such as the adder you used in the exercise of Lab 3.  To find out how many of the chip's logic cells this adder uses, double-click the "rpt" icon below the fitter, then search (^F) for "Total logic".  You will find a result that looks like:
Total logic cells used:                         7/1152   (  0%)
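If you measure area for many designs, a small script can extract this number from the report text. The following is a minimal Python sketch. It assumes the report line has exactly the format shown above, and the function name parse_total_logic_cells is hypothetical. Other tool versions may lay out the report differently.

```python
import re

# Matches report lines such as:
#   Total logic cells used:                         7/1152   (  0%)
# and captures the cells used, the cells available, and the percentage.
_TOTAL_LOGIC_RE = re.compile(
    r"Total logic cells used:\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\s*\)"
)


def parse_total_logic_cells(report_text):
    """Return (used, available, percent) from report text, or None if absent."""
    match = _TOTAL_LOGIC_RE.search(report_text)
    if match is None:
        return None
    used, available, percent = (int(g) for g in match.groups())
    return used, available, percent


if __name__ == "__main__":
    sample = "Total logic cells used:                         7/1152   (  0%)"
    print(parse_total_logic_cells(sample))  # (7, 1152, 0)
```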
 

Lab

 

Part A: Growth of Delay

  1. Measure the minimum clock period for counters of at least four different sizes (e.g. 32, 64, 128, 256, 512 bits) and use these measurements to estimate the clock period as a function of the number of bits in the counter.  To conserve IO pins, direct only the 8 most significant bits to the output.  (Hand in only the calculation and the resulting function.)
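One way to turn the measurements into a function is a least-squares line fit. The following is a minimal Python sketch that assumes the clock period grows roughly linearly with width, T(n) = a + b*n, as you would expect from a carry chain. The function name fit_linear is hypothetical. It uses plain Python rather than a numerics library so that it is self-contained.

```python
def fit_linear(bits, periods_ns):
    """Least-squares fit of T(n) = a + b*n.

    bits       -- counter widths, e.g. [32, 64, 128, 256]
    periods_ns -- measured minimum clock periods in ns, same order
    Returns (a, b): intercept in ns and slope in ns per bit.
    """
    if len(bits) != len(periods_ns) or len(bits) < 2:
        raise ValueError("need at least two (bits, period) pairs of equal length")
    n = len(bits)
    mean_x = sum(bits) / n
    mean_y = sum(periods_ns) / n
    sxx = sum((x - mean_x) ** 2 for x in bits)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bits, periods_ns))
    if sxx == 0:
        raise ValueError("counter widths must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

Call fit_linear with your own measured periods. Then compare the fitted line against each measured point. If the residuals are large, a linear model may not describe your device well.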

Part B: Reducing Critical Paths

Carry-save arithmetic does not fully propagate carries; instead, it stores results in a redundant format in which each bit of the result is represented by 2 bits.  VHDL code for a carry-save adder is provided as an example.
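To see why no long carry chain is needed, here is a behavioural Python model of one carry-save (3:2) adder stage. It is a sketch, not a copy of the provided VHDL. It assumes three inputs named adden1, adden2 and adden3. It also assumes sum_part1 is the bitwise sum and sum_part2 holds the shifted carries; the provided code may assign these roles differently.

```python
def carry_save_add(adden1, adden2, adden3, width):
    """Model one carry-save (3:2) adder on width-bit unsigned values.

    Returns (sum_part1, sum_part2) such that
    sum_part1 + sum_part2 == adden1 + adden2 + adden3  (mod 2**width).
    Every bit position is an independent full adder, so no carry
    travels further than one bit position.
    """
    mask = (1 << width) - 1
    # Sum bit of each full adder: XOR of the three input bits.
    sum_part1 = (adden1 ^ adden2 ^ adden3) & mask
    # Carry bit of each full adder (majority of the three bits),
    # shifted one position to the left.
    carries = (adden1 & adden2) | (adden1 & adden3) | (adden2 & adden3)
    sum_part2 = (carries << 1) & mask
    return sum_part1, sum_part2


# Example: 7 + 3 + 1 = 11 is stored redundantly as 5 + 6.
print(carry_save_add(0b0111, 0b0011, 0b0001, 4))  # (5, 6)
```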

Design a carry-save counter.  If you were using a simple binary adder, you would tie one adder input to "00001" and connect the output through a register to the other adder input.  With the carry-save adder, instead connect sum_part1 and sum_part2 through registers to adden2 and adden3.  To avoid running out of pins, you may want to send only the most significant bits "off-chip".
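The counter structure described above can be checked with a cycle-by-cycle Python simulation. This is a minimal behavioural sketch, not synthesizable code. It assumes the remaining adder input (presumably adden1) is the constant 1 and that both registers reset to 0. The names carry_save_counter and binary_value are hypothetical.

```python
def carry_save_counter(width, cycles):
    """Simulate a carry-save counter for `cycles` clock edges.

    Two registers hold the count in redundant form.  On each edge they are
    loaded with the carry-save sum of the constant 1 and their own previous
    contents.  Returns the (reg1, reg2) values after each edge.
    """
    mask = (1 << width) - 1
    reg1, reg2 = 0, 0  # assumed reset state: both registers cleared
    history = []
    for _ in range(cycles):
        one = 1  # constant adder input, "00...01"
        new1 = (one ^ reg1 ^ reg2) & mask                                  # sum bits
        new2 = (((one & reg1) | (one & reg2) | (reg1 & reg2)) << 1) & mask  # carries
        reg1, reg2 = new1, new2  # both registers load on the same clock edge
        history.append((reg1, reg2))
    return history


def binary_value(reg1, reg2, width):
    """Resolve the redundant form with one ordinary carry-propagate addition."""
    return (reg1 + reg2) & ((1 << width) - 1)


if __name__ == "__main__":
    for reg1, reg2 in carry_save_counter(4, 5):
        print(reg1, reg2, binary_value(reg1, reg2, 4))
```

Neither register alone holds the binary count. Converting back to binary needs a full carry-propagate addition, which in this model happens outside the clocked loop.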

  1. Again, express the minimum clock period as a function of the number of bits in the counter.
A note for the curious: this carry-save technique is also used in fast multipliers.
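The following is a rough Python sketch of how a multiplier might use carry-save arithmetic. The partial-product rows are reduced three-to-two by carry-save steps until only two rows remain. A single carry-propagate addition then produces the product. This is a behavioural illustration of the general idea with hypothetical function names. It does not describe how lpm_mult is implemented.

```python
def csa(x, y, z):
    """One carry-save (3:2) step on unbounded ints: returns (sum, carry)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def carry_save_multiply(a, b, width=8):
    """Multiply two unsigned width-bit numbers with a carry-save reduction tree."""
    # One shifted copy of `a` for every 1 bit of `b` (the partial products).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    while len(rows) > 2:
        reduced = []
        # Each group of three rows becomes two rows, with no carry chain.
        while len(rows) >= 3:
            s, c = csa(rows.pop(), rows.pop(), rows.pop())
            reduced.extend([s, c])
        reduced.extend(rows)  # up to two leftover rows pass through unchanged
        rows = reduced
    # Only this final addition propagates carries across the whole result.
    return sum(rows)
```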
 

Part C: Configuring FPGAs/CPLDs

This is the last lab period to get Part C checked off from the previous week.
Please include the requirements in this lab report.
 

Part D: Pipelines

Incorporate an unsigned 8-by-8 multiplier (16-bit result) into a design using lpm_mult.

Use lpm_mult; read the Maxplus2 online documentation for details.  Do not use any EABs (embedded array blocks, i.e. on-chip memory) or the "*" operator in this section.  Additional documentation, for the curious, is available at http://www.edif.org/lpmweb/
Include the library with:

library lpm;
use lpm.lpm_components.all;

Your design should include lpm_mult with pipeline registers (D flip-flops) before and after it.  The data inputs and outputs of your multiplier must be registered so that registered-performance timing analysis can report on your entire circuit.  Measure the total number of logic cells in the design.  Throughput is the maximum clock frequency (MHz).  We define latency as the number of pipeline register stages, not counting the first one (equivalently, the number of pipeline stages in lpm_mult plus one), multiplied by the minimum clock period.
 

  1. With the pipeline generic parameter (LPM_PIPELINE) set to 0 and to 4 (1 and 5 pipeline stages), compare the size, throughput, and latency of the two variants of the design.
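The throughput and latency definitions above reduce to simple arithmetic, as shown in this short Python sketch. The parameter lpm_pipeline stands for the pipeline generic of lpm_mult. The clock periods used below are placeholders, not measurements, and the function name pipeline_metrics is hypothetical.

```python
def pipeline_metrics(lpm_pipeline, min_clock_period_ns):
    """Return (stages, throughput_mhz, latency_ns) using this lab's definitions.

    stages     = pipeline stages in lpm_mult + 1
                 (register stages, not counting the first one)
    throughput = maximum clock frequency = 1 / minimum clock period
    latency    = stages * minimum clock period
    """
    if lpm_pipeline < 0 or min_clock_period_ns <= 0:
        raise ValueError("pipeline depth must be >= 0 and clock period > 0")
    stages = lpm_pipeline + 1
    throughput_mhz = 1000.0 / min_clock_period_ns  # 1/ns is GHz, so x1000 for MHz
    latency_ns = stages * min_clock_period_ns
    return stages, throughput_mhz, latency_ns


if __name__ == "__main__":
    # Placeholder periods; substitute your own timing-analysis results.
    print(pipeline_metrics(0, 20.0))  # (1, 50.0, 20.0)
    print(pipeline_metrics(4, 10.0))  # (5, 100.0, 50.0)
```

A deeper pipeline usually shortens the clock period and so raises throughput. Latency can still grow, because the product must now pass through more register stages.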

Part E: Runtimes for Behavioural vs. Post-layout Simulation

Using a stopwatch or the unix "time" command, compare how long it takes in Mentor Graphics vs. Maxplus2 to:
  1. compile the following code
  2. simulate the following code for 1000 cycles (or some smaller number if you have a slow workstation)
If you're not convinced that behavioural simulations can save time, try increasing counterWidth.
If it runs too slowly on your workstation, decrease counterWidth.  Either way, if you change counterWidth, also change the second (size) parameter of conv_unsigned() to match.

squares.vhd
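For steps that can be started from the command line, the unix "time" command is enough. If you want several runs averaged, the small Python wrapper below is an alternative. It measures wall-clock time only. The tool command lines are not given in this handout, so the example times a placeholder command. Interactive GUI steps still need a stopwatch. The function name time_command is hypothetical.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=1):
    """Run a command `repeats` times and return the mean wall-clock time in seconds.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        total += time.perf_counter() - start
    return total / repeats


if __name__ == "__main__":
    # Placeholder command: substitute each tool's compile or simulate command.
    print(f"{time_command([sys.executable, '-c', 'pass'], repeats=3):.3f} s")
```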