
ELECTRICAL ENGINEERING 552

PROJECT REPORT

IMPLEMENTATION OF THE Z- (or DEPTH-) BUFFER ALGORITHM

Shazia Mardhani 227387

Ray Still 354146

Abstract

Our objective was to create an IC that determines which parts of various triangles are visible when many triangles, at various heights and of different colours, are present. The IC would be given, in memory, the three-dimensional coordinates of each triangle and would return to memory the visible colour of each pixel. This is done with the "Z-Buffer or Depth-Buffer Algorithm," which is discussed in more detail later. Although this was our original goal, various constraints forced us to simplify the design at several points. These simplifications and constraints are also discussed later.

I.C. Data Sheet

The simplified pinout diagram shown above uses only 12 pins of the 160 pins available to the user on the XC4010.

Clock (Pin B17) The global clock needed by any synchronous design. In our system this signal came from another FPGA containing a counter that divided a 16 MHz clock down to a more human-readable frequency (around 1 Hz or less). Pin choice was dictated by two factors: the needed connection to the other FPGA, and the need for the low-skew routing resources available at this pin.
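A quick check of the divider ratio, under our assumption (the counter width is not recorded in this report) that the slow clock is taken from bit 23 of a free-running binary counter. Bit n of such a counter completes one full period every 2^(n+1) input cycles, so:

```latex
f_{\text{out}} = \frac{f_{\text{in}}}{2^{n+1}}
             = \frac{16\,000\,000\ \text{Hz}}{2^{24}}
             = \frac{16\,000\,000}{16\,777\,216}\ \text{Hz}
             \approx 0.95\ \text{Hz}
```

This falls in the "around 1 Hz or less" range; each additional counter bit halves the frequency again.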

Reset (Pin V5) An asynchronous system reset. In our tests this signal came from a switch, which again left no choice of pin.

Done (Pin E16) A signal to the user indicating that the task is complete. In a more advanced system, this could be part of a handshaking scheme that lets multiple systems work together. Our system drove one of the two attached LEDs with this signal, which drastically limited our choice of pins.

Pace (Pin E17) This signal tells the observer when each new pixel is displayed on the output bus. A more advanced system may also need it for coordination. It drives the other of the two LEDs (the first is attached to Done), which determined the pin number.

Test_out (Pins C2 A7 A4 A5 A6 A8 A9 A10; most significant bit to least significant bit order) Although its name dates from an intermediate version of the code, this is the output bus. Its value is displayed on 8 LEDs, from which the user can read it off. These eight LEDs are attached to the second FPGA, so these signals must be routed to the second FPGA too, which limits pin choice.

The second FPGA (which is not shown) contains only a clock divider and routing for the output bus.

Design Changes

The original version of our design was quite large, but because of size constraints and difficulty it was cut back several times. All versions used the same basic algorithm; the only things that changed were the width of the numbers handled, the source of the raw data, and the destination of the processed data.

At the outset, we expected to use the DRAM of the microprocessor (the 68306), but we had problems accessing it directly.

When the Rapid-Prototyping Board was built, only a few DRAM address lines were connected directly to the FPGAs, and only some of the needed control lines were accessible. With external connecting cables these address lines and control signals could have been used, but the cables were not readily available, and even then there would have been a lot to figure out to make things work. We therefore decided to reduce the picture size, the colour gradient (i.e. the number of shades of grey available) and the depth accuracy so that the smaller, but easier to use, SRAM would suffice. The triangle-defining constants and the processed data could be passed through the 68306; this passing was also cut, mostly because of its difficulty. The final version has all constants hard-wired and sends all processed data to the on-board LEDs.
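Reducing all three quantities matters because the buffer memory grows as their product. For an image of W × H pixels with b_c bits of grey level and b_z bits of depth per pixel, the colour and Z buffers together need:

```latex
M = W \cdot H \cdot (b_c + b_z)\ \text{bits}
```

For a hypothetical 32 × 32 image with 4-bit grey levels and 4-bit depth (illustrative values only, not the figures of our design), this is 32 · 32 · 8 = 8192 bits, or 1 KB; halving both image dimensions cuts it by a further factor of four.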

Another problem arose because the algorithm we are using requires multiplication. Our function-evaluation block, fnew, was first written as:

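library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;  -- assumed: unsigned "*" and "+" on std_logic_vector

entity fnew is
  port (a, b, c, x, y : in  std_logic_vector(15 downto 0);
        fvalue        : out std_logic_vector(31 downto 0));
end fnew;

architecture behaviour of fnew is
  signal temp : std_logic_vector(31 downto 0);
begin
  -- c goes in the upper half of temp; the lower half is zero
  temp(31 downto 16) <= c;
  temp(15 downto 0)  <= (others => '0');

  -- Two 16 x 16 multipliers feeding 32-bit adders. Concurrent assignments
  -- leave no sensitivity list that could miss temp.
  fvalue <= a * x + b * y + temp;
end behaviour;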

(a,b,c,x,y are 16 bit standard logic vectors, fvalue is 32 bits)
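Written as an equation, and assuming every vector is read as an unsigned integer (the number format is not stated in this report), fnew computes the value below. Placing c in bits 31 to 16 of temp multiplies it by 2^16, and each 16 × 16 product is already 32 bits wide:

```latex
f_{\text{value}} = \bigl(a\,x + b\,y + 2^{16}\,c\bigr) \bmod 2^{32},
\qquad a, b, c, x, y \in [0,\; 2^{16} - 1]
```

The modulo reflects that a 32-bit result keeps only the low 32 bits; with all inputs at their maximum, the full sum would need 34 bits.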

Looking at this simple code, imagine our surprise when, upon compilation, we discovered that it required 590 configurable logic blocks (CLBs), about 1.5 FPGAs' worth, and we used four copies of it! This was by far the largest part of the design; at that point we could nearly neglect everything else when estimating the size of the complete project. We ended up cutting a, b, c, x, and y down to 5 bits each, and fvalue down to 10 bits. (Remember that the amount of hardware needed for a multiplier varies as the product of the numbers of bits in the two operands.)
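Taking the rule in the parenthesis at face value gives a rough size estimate for the reduced block. Cutting both multiplier operands from 16 to 5 bits shrinks each multiplier by about:

```latex
\frac{16 \times 16}{5 \times 5} = \frac{256}{25} \approx 10.2,
\qquad
590\ \text{CLBs} \times \frac{25}{256} \approx 58\ \text{CLBs per copy}
```

This is only an approximation: the adders shrink roughly linearly with width rather than with the product of widths, so the true figure is probably somewhat higher. Even so, four copies at roughly 58 CLBs each (about 230 CLBs) fit within the 400 CLBs of an XC4010, whereas four 590-CLB copies would need nearly six devices.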

One other difficulty that led us to cut out the use of memory was the use of inout pins. The data buses to both the 68306 and the SRAM are bidirectional: data going to the device and data coming from it share the same wires. Synopsys (the compiler) needs to know which direction the signals are going so that it can insert the right kind of buffers, and we had serious problems telling it that a bus was to be treated as both input and output. We would have recognized this problem sooner, but we had a fairly limited understanding of buffering in general and of inout buffering in particular. The problem was partially, but not completely, solved. The answer lies in using a special primitive that Synopsys recognizes by name. (More details can be found in our CAD tool documentation.)
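For comparison, the portable way to describe a bidirectional pin in VHDL is a tri-state driver on an inout port, from which synthesis tools generally infer a bidirectional I/O buffer. The sketch below is a generic pattern, not the Synopsys-specific primitive mentioned above, and the entity and port names (bidir_bus, oe, data_out, data_in, bus_io) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit bidirectional data-bus interface (generic pattern,
-- not the vendor primitive referred to in the text).
entity bidir_bus is
  port (
    oe       : in    std_logic;                     -- '1' = FPGA drives the bus
    data_out : in    std_logic_vector(7 downto 0);  -- value to write
    data_in  : out   std_logic_vector(7 downto 0);  -- value read from the pins
    bus_io   : inout std_logic_vector(7 downto 0)   -- shared data wires
  );
end bidir_bus;

architecture rtl of bidir_bus is
begin
  -- Tri-state driver: release the pins (high impedance) when not writing
  bus_io  <= data_out when oe = '1' else (others => 'Z');

  -- The input path always samples whatever is on the wires
  data_in <= bus_io;
end rtl;
```

When oe is '1' the FPGA drives the bus; when it is '0' the pins float so that the SRAM or the 68306 can drive them, and data_in still reads the value on the wires.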

Algorithm

"The z-buffer or depth-buffer image-precision algorithm, … is one of the simplest visible-surface algorithms" (Computer Graphics: Principles and Practice, Foley et al.). In summary, the algorithm is:

Before the first shape, set every Z-buffer entry to the farthest possible depth and every colour-buffer entry to the background colour. Then scan through the entire plane that is being represented. At each point covered by the current shape, compare the depth (Z coordinate, hence the alternate name) of the shape at that point with the Z coordinate stored in memory (the Z buffer). If the current point is behind the stored one, go on to the next point. If it is in front, replace the Z-buffer entry with the current point's depth, and replace the colour stored in memory (the colour buffer) with the colour (in our case, a greyscale value) of the current point. When finished, repeat for the next shape until there are no more shapes.
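The steps above can be sketched as a behavioural VHDL package, for simulation only and not the synthesized design. To keep it short we hypothetically assume axis-aligned rectangles of constant depth instead of triangles, a larger depth value meaning nearer the viewer, a 4 × 4 image, and integer grey levels with 0 as the background. The names zbuf_pkg, render, shape_t and frame_t are our own.

```vhdl
-- Behavioural Z-buffer sketch (simulation only). Hypothetical simplifications:
-- rectangles of constant depth, larger depth = nearer, grey 0 = background.
package zbuf_pkg is
  constant W          : positive := 4;            -- image width in pixels
  constant H          : positive := 4;            -- image height in pixels
  constant FARTHEST   : integer  := integer'low;  -- initial Z-buffer value
  constant BACKGROUND : natural  := 0;            -- initial colour value

  -- Indexed (y, x)
  type frame_t is array (0 to H - 1, 0 to W - 1) of integer;

  type shape_t is record
    x0, y0, x1, y1 : natural;  -- inclusive corner coordinates
    depth          : integer;  -- constant depth of the whole shape
    grey           : natural;  -- grey level written where visible
  end record;
  type shape_list_t is array (natural range <>) of shape_t;

  -- Returns the final colour buffer after all shapes are processed
  function render (shapes : shape_list_t) return frame_t;
end package zbuf_pkg;

package body zbuf_pkg is
  function render (shapes : shape_list_t) return frame_t is
    variable zbuf   : frame_t := (others => (others => FARTHEST));
    variable colour : frame_t := (others => (others => BACKGROUND));
  begin
    for s in shapes'range loop          -- one shape at a time
      for y in 0 to H - 1 loop          -- scan the whole plane
        for x in 0 to W - 1 loop
          -- Only points covered by the current shape are candidates
          if x >= shapes(s).x0 and x <= shapes(s).x1 and
             y >= shapes(s).y0 and y <= shapes(s).y1 then
            -- In front of the stored point: update both buffers
            if shapes(s).depth > zbuf(y, x) then
              zbuf(y, x)   := shapes(s).depth;
              colour(y, x) := shapes(s).grey;
            end if;                     -- otherwise hidden: skip it
          end if;
        end loop;
      end loop;
    end loop;
    return colour;
  end function render;
end package body zbuf_pkg;
```

Because the comparison is strict, a later shape at exactly the same depth as a stored point does not overwrite it; apart from such ties, the final colour buffer does not depend on the order in which the shapes are drawn.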

Verification

Verification is difficult because the project is not really finished: there is no way to get the output into the form of a picture. Verification would consist of confirming that the picture given by the final colour buffer matches one worked out by hand. Because of the long processing cycle and the large data set, there is no practical alternative.

Conclusion

We have discovered that synthesis of VHDL code into hardware is very powerful, if somewhat finicky. The learning curve was steep and time-consuming, but very rewarding.

This project is really the beginning of a crude graphics card. We have discovered that one key to making such a system work is a large digital system, much larger than the one we had available: it needs large multipliers, wide buses, and fast hardware. The Xilinx XC4010 is simply not up to the job.

One way to keep the size down and the speed up might be to use the 68306 for multiplication.