EE 552 Final Report: Interactive Audio Manipulation Processor

Todd Carter, tcarter@ualberta.ca

Daniel Ross, dross@ualberta.ca

Stephen Tang, satang@ee.ualberta.ca

 

April 5, 2000

Declaration of Original Content

The project and the contents of this report are entirely the original work of the authors except as listed in the References section and as follows:

 

 

 

Signed:

 

Todd Carter

 

Daniel Ross

 

Stephen Tang

 

 

Abstract

This document describes the design of the interactive Audio Manipulation Processor (iAMP) project for Electrical Engineering 552. The iAMP allows a user to position a monophonic audio source in the two-dimensional plane of a four-speaker audio system using a graphical user interface. The iAMP system consists of four speakers, ideally placed in the four corners of a room, a VGA monitor and joystick, an audio amplifier capable of amplifying the four channels, and a digital system of our design with four analog RCA output jacks and one analog RCA input jack. An audio source, such as a compact disc player or computer, is connected to the audio input. The four audio outputs feed through the audio amplifier to each of the four speakers.

The iAMP in its entirety is too large a design to fit in the supplied Altera EPF10K20 device. Hence, a scaled-down design without RAM was implemented.

Table of Contents

Declaration of Original Content
Abstract
Table of Contents
Achievements
Description of Operation
Overview
Description of Audio Operation
Video Handling
Interface with top-level VGA entity
Timing and count signals
Constructing the onscreen image
RAM Driver
Joystick Driver
iAMP Data Sheet
Total Logic Cell Usage
Total logic cell usage of all components
Total logic cell usage of the final build
Results of Experiments
General Comments
Multiplier Design: Speed versus Size
Audio_Manipulator: Echo and Sample Length versus Size
Video and UI Testing
Optimizing Code Size of Audio Coordinate Converter
version 1: lookup table comparisons
version 2: counters and comparators
version 3: a combination of methods
References
Data Sheets for Parts Used in the iAMP
Design Verification
ac_gain_apply.scf
ac_get_samples.scf
adc_port.scf
audio_controller.scf
audio_manipulator.scf
dac_port.scf
echo_delay.scf
Gain_mult.scf
incident_gain.scf
AudioCoords
Design Hierarchy
VHDL Design
Test Benches
RAM Interface Verification
Overview of RAM Queue
Overview of Handshaking Model
The RAM Interface Test Benches
Schematics

 

Achievements

In the end, the iAMP design was too large to fit onto the EPF10K20 device on the Altera UP1 board. It was necessary to scale down the design by removing the RAM driver and the RAM-related parts of the ADC-RAM-audiopath interface entity.

The video output operates properly, on the whole, as does the joystick input. The DAC driver sends valid data to the DAC. As well, the handshaking between the RAM, ADC, and audio control path works correctly. The audio manipulation code successfully handles the task of splitting an input signal among the audio outputs of the system.

Description of Operation

Overview

The interactive Audio Manipulation Processor system block diagram is as follows:

 

 

 

The input signal is produced by any analog audio source, e.g. a CD player, computer, radio, or turntable. The input signal is connected to the analog input of an Analog-to-Digital Converter (ADC), which converts the analog signal into a stream of two's-complement 24-bit digital samples, of which only the 8 most significant bits are used. The digital samples of the audio source are then multiplied by a gain between 0 and 1 for each of the four speaker output streams, depending on the X and Y locations currently selected by the user. These samples are then sent to Digital-to-Analog Converters (DACs) that convert the digital signals into analog signals, which are then amplified and reproduced at the speakers. The effect of this process is to create sound directionality using the differences in intensity between the outputs of the four speakers. For a discussion of how humans perceive sound, see Appendix A of the Resource Requirements document.

The user interface consists of a joystick and a VGA monitor. The center of the screen features an icon denoting it as the center of the square bounded by the four speakers. A sprite cursor appears onscreen, controlled by the joystick. Pushing the joystick will cause the cursor to move accordingly, which adjusts the X and Y location of the output signal. Moving the cursor to the top-right corner of the screen, for example, will pan the sound to the front-right speaker of the room. In a similar fashion, moving the cursor to the center of the screen will result in an equal sound level from each speaker, thereby placing the location of the sound at the center of the room. This requires hardware that redraws the screen regularly through the 5-pin VGA connector, as well as a joystick driver that polls the joystick periodically for input.

 

Description of Audio Operation

The operation of the audio is split into three parts: the analog-to-digital converter interface, the audio processor and the digital-to-analog converter interface.

The analog-to-digital interface receives sample data from the Crystal CS5360 ADC and places it into RAM. The captured data consists of 24-bit two's-complement integers. We reduce each sample to 8 bits and pass the value to the RAM interface, which places the sample on the sample queue. The Audio Manipulator sequentially asks for eight samples from the RAM interface, providing a 5-bit delay value and setting the "Sample_Req" signal high. The RAM interface multiplies the delay by a coefficient (e.g. 64) and retrieves sample S(0 - 64*delay), where sample S(0) is the current sample. All eight samples are placed into a register bank for pipelining. (As described below, the RAM feature was removed due to the constraints on the available space on the EPF10K20 FPGA on the UP1 board. However, the audio system is not aware of this and processes data as described here.)

All eight samples are then multiplied by a gain value and placed into a second register bank. The four 'incident' samples and the four 'echo' samples are summed pairwise, producing four resultant samples, one for each speaker. These are then passed to the DACs, which produce the corresponding audio signals.

A block diagram of the audio path is shown below:

 

The basic sound algorithm is as follows:

The echo delays and the gain coefficients are calculated based on the current X and Y location of the cursor. Gain is calculated from the inverse square of the distance between the location designated by the cursor and the speaker: near the speaker the gain is high, and far from the speaker the sound is much lower. (Since we calculate our gain as a value between 0 and 1, it would more accurately be called a loss coefficient.) The squared-distance calculations are complex and would have consumed an unacceptably large portion of the FPGA on the UP1, so we instead designed a simplified first-order equation using a 16x16 location grid:

(Note that the values are divided by 16, so the gains range from 15/16 down to 1/16. The top-left corner is the corner nearest the speaker.)

Gain = 15 - (MAX(X,Y) - 1)

Echo delay similarly requires a large amount of calculation. Our calculations are based on the radial distance from the speaker. Assuming that the speakers are set in the corners of the room (assumed square) and that the sound comes from the location that the cursor marks, our delay is the 'echo' produced when the sound waves reach that speaker and 'reflect' back towards the listener. If the sound is near one speaker, the sound will reflect off that corner sooner than off the opposite corner. This also required a significant amount of calculation (Pythagoras' theorem, in particular), so we found, using the least-squares method, the following linearization that most closely matches the time delay:

(These values are in time-delay units; they are multiplied by a factor such as 64 or 256 to obtain an appropriate delay in samples.)

Delay = X + Y - 2

 

These tables are similar for all four speakers. The gain coefficients for the echoed signals were simply half of the incident gain for the same speaker.
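As a concrete illustration of these two formulas, the following VHDL sketch computes the incident gain and echo delay for the speaker nearest the top-left corner of the grid. This is a minimal sketch only: it assumes 4-bit coordinates in the range 1 to 15, and the entity name, port names, and widths are our own illustrations rather than the actual contents of incident_gain.vhd or echo_delay.vhd.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity corner_coeffs is
  port (
    x, y  : in  unsigned(3 downto 0);  -- cursor position on the audio grid, 1 .. 15
    gain  : out unsigned(3 downto 0);  -- incident gain, interpreted as gain/16
    delay : out unsigned(4 downto 0)   -- echo delay in time-delay units
  );
end entity corner_coeffs;

architecture sketch of corner_coeffs is
begin
  process (x, y)
    variable m : unsigned(3 downto 0);
  begin
    -- Gain = 15 - (MAX(x, y) - 1): 15/16 beside the speaker,
    -- falling off linearly to 1/16 at the far corner.
    if x > y then
      m := x;
    else
      m := y;
    end if;
    gain <= 15 - (m - 1);

    -- Delay = x + y - 2: no echo delay beside the speaker, growing with
    -- distance; the audio sample controller later scales this by 64 or 256.
    delay <= resize(x, 5) + resize(y, 5) - 2;
  end process;
end architecture sketch;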

 

Video Handling

For details on the video drivers, see the code listings for ui.vhd, video.vhd, ColourChooser.vhd, syncgen.vhd, and count_xy.vhd. Much of the logic for the video can be attributed to references [6] and [7].

Interface with top-level VGA entity

The following figure depicts the top-level user interface entity, ui, and its relationship with the top-level VGA entity, video. Also shown is the top-level entity of the entire system, main.

The signals vga_red, vga_green, vga_blue, vga_h_sync, and vga_v_sync are the video signals required for the VGA monitor. They are output from the ui entity and are mapped to the VGA connector on the UP1 board (see page 13 of [1]) via the top-level entity, main. Essentially, the ui entity routes these five signals from the video entity to the external environment. It is the video entity that actually generates these signals. In order to do this, it requires a 25.175 MHz clock, fed in via the signal clock, and information regarding the current location of the joystick-controlled cursor and the status of the joystick button. These data are provided by the signals cursorX, cursorY, and button. The last input signal is a reset, which is not currently used in the design but is retained for future use.

Timing and count signals

The internal structure of the video entity is shown below.

The syncgen component (which is taken from Reference [6]) takes the input 25.175 MHz clock and uses it to generate horizontal and vertical synchronization signals. It also produces a Boolean signal, vclock, that indicates when the vertical synchronization signal is active, but this signal is not used anywhere.

The clock and the synchronization signals are used by count_xy (from Reference [7]) to determine the x-y coordinates of the pixel that is currently being drawn. This information is then passed to the ColourChooser component. The ColourChooser component uses the signals cursorX, cursorY, and button to determine what colour the current pixel should take on, and then outputs the appropriate values for the vga_red, vga_green, vga_blue signals.

Constructing the onscreen image

The onscreen image will be somewhat like that shown in the following figure.

Objects are represented by blocks of colour. The image is constructed by colouring pixels in order of priority. The joystick/sound source cursor has the highest priority – the colour of its icon will always take priority over any other part of the onscreen image. The background, the "you are here" marker, and the speaker markers are of the lowest priority. Pixels will only take on the colours of these elements when there is no cursor on top of them.

ColourChooser works by comparing the X and Y coordinates of the current pixel in the VGA scan to the X and Y coordinates of the various sprites in the system. When a match is found, the pixel is coloured according to the colour definition of the sprite. These operations are performed on the fly, so no memory for holding frame buffers is required.
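The following VHDL fragment sketches this priority scheme for just the cursor sprite and the center marker, assuming 8x8-pixel sprites, one-bit colour channels, and made-up coordinate constants; the entity, signal names, and colour choices are illustrative and are not taken from ColourChooser.vhd.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity colour_priority is
  port (
    pixel_x, pixel_y   : in  unsigned(9 downto 0);  -- current position of the VGA scan
    cursor_x, cursor_y : in  unsigned(9 downto 0);  -- joystick-controlled cursor sprite
    red, green, blue   : out std_logic
  );
end entity colour_priority;

architecture sketch of colour_priority is
  constant SPRITE_SIZE : natural := 8;    -- assumed sprite size in pixels
  constant CENTER_X    : natural := 320;  -- assumed "you are here" marker location
  constant CENTER_Y    : natural := 240;
begin
  process (pixel_x, pixel_y, cursor_x, cursor_y)
  begin
    -- Highest priority: the cursor sprite always wins.
    if pixel_x >= cursor_x and pixel_x < cursor_x + SPRITE_SIZE and
       pixel_y >= cursor_y and pixel_y < cursor_y + SPRITE_SIZE then
      red <= '1'; green <= '1'; blue <= '1';          -- white cursor
    -- Lower priority: the center marker is drawn only where the cursor is not.
    elsif pixel_x >= CENTER_X and pixel_x < CENTER_X + SPRITE_SIZE and
          pixel_y >= CENTER_Y and pixel_y < CENTER_Y + SPRITE_SIZE then
      red <= '1'; green <= '0'; blue <= '0';          -- red center marker
    else
      red <= '0'; green <= '0'; blue <= '1';          -- blue background
    end if;
  end process;
end architecture sketch;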

RAM Driver

All the information necessary for creating a RAM driver came from the RAM data sheet [14]. The data buses, both internal and external, are bidirectional. The bidirectionality of the buses was implemented using an entity obtained from Altera [11]. Using a state machine made controlling the timing of the driver interface much simpler.

Each state of the machine corresponds to the point on the timing diagram at which the various signals become valid. Information passed to and from the driver is made more reliable by the handshaking implemented in the design. See the RAM test bench section for more details.

Joystick Driver

The following text is adapted from Stephen’s final report for EE 582 (reference [10]).

Due to delays in getting a working mouse driver and due to the large code size of the mouse driver in progress, we chose to forgo use of a mouse as an input device and selected a joystick interface instead.

The joystick used is an Atari joystick originally used on 8-bit Atari game consoles. The connector for this joystick has five outputs (up, down, left, right, and fire button) and one ground pin. The four joystick directions and the fire button are all connected to ground by normally open switches, as shown in the figure below. Thus, to keep the joystick outputs normally high, the five outputs were attached to pull-up resistors of 150 Ω each. When the player presses the button on the joystick or pushes the stick, the corresponding output pin is pulled low. When one of the directional inputs is low for more than a certain amount of time (chosen experimentally to give the best-feeling response), the X and Y coordinates of the onscreen cursor are updated accordingly.
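A minimal sketch of this hold-time test for a single direction is shown below. The 25.175 MHz clock is the only value taken from the report; the threshold constant, entity name, and signal names are illustrative guesses rather than the contents of joystick.vhd.

library ieee;
use ieee.std_logic_1164.all;

entity joy_direction is
  port (
    clock   : in  std_logic;   -- 25.175 MHz system clock
    right_n : in  std_logic;   -- active-low "right" input from the joystick
    step    : out std_logic    -- one-cycle pulse: move the cursor one step right
  );
end entity joy_direction;

architecture sketch of joy_direction is
  -- Roughly 1/50 s at 25.175 MHz; the real threshold was tuned by feel.
  constant HOLD_COUNT : natural := 500_000;
  signal count : natural range 0 to HOLD_COUNT := 0;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      step <= '0';
      if right_n = '0' then              -- switch closed, pin pulled low
        if count = HOLD_COUNT then
          step  <= '1';                  -- held long enough: advance the cursor
          count <= 0;
        else
          count <= count + 1;
        end if;
      else
        count <= 0;                      -- stick released: start over
      end if;
    end if;
  end process;
end architecture sketch;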

The Atari joystick uses a female DB9 connector, the pinout of which is shown below (from reference [9]). The diagram shows the connector when held facing you.

 

pin #    function
1        up
2        down
3        left
4        right
5        unused
6        button
7        unused
8        ground
9        unused

 

 

iAMP Data Sheet

Chip used = EPF10K20RC240-4 on UP1

logic cell usage = 100%

EAB usage = 0

speed rating = 25.175 MHz

 

Connection                 Purpose                                      Connector       Pin Number
UP1 board to VGA display                                                VGA connector   5
line-in ADC to UP1         s_data, s_clock, m_clock, lr_clock, reset    Flex *          *, *, *, *, *
UP1 to line-out DAC0       s_data0, s_clock0, m_clock0, lr_clock0       Flex A          17, 19, 22, 21
UP1 to line-out DAC1       s_data1, s_clock1, m_clock1, lr_clock1       Flex *          29, 27, 26, 25
UP1 to external RAM        address[0..19], data[0..7], r_w_, oe_, cs_   Flex *          *, *, *, *, *
UP1 to joystick            Left, Right, Up, Down, Button                Flex *          *, *, *, *, *

 

 

Total Logic Cell Usage

Total logic cell usage of all components

The following table summarizes the chip resources consumed by each part of the design. Most of these figures are based on isolated unit tests of each component, since many components had not yet been integrated in the main entity. Table entries that describe the resource usage of two or more integrated components are discussed following the table. Elements listed after the "Totals" row were not counted toward the totals, since they overlap with the elements above the "Totals" row. They are listed only to give an idea of how much space is devoted to subcomponents of the design.

Note that the entire iAMP system could not fit into the space available on the EPF10K20.

Entity              Logic cells    % logic cells    I/O external
audio manipulator   *              32               0
adc_port            50             4                5
dac_port            134            11               4
dac_port            134            11               4
dac_interface       *              <13>             *
ram_port                           11               29
ram_interface       *              23               *
Main                *              23               14
Totals              *              128              *
Audiocoord          67             5                0
Joystick                           8                5
Ui                                 29               14

 

Total logic cell usage of the final build

The complete functionality of the iAMP could not be realized on the 10K20, so we elected to remove the ram driver entirely and scale down the ram_interface component. The new ram_interface2 component does not actually interface with RAM, but rather passes the most recent samples to the audio manipulator regardless of whether the audio manipulator wants a delayed sample or not. In this way, the audio manipulator code did not have to be revised. The following table shows the logic cell usage breakdown for the final build that was implemented on the 10K20.

Entity                           Logic cells    % logic cells    I/O external
audio manipulator                *              32               0
adc_port                         50             4                5
dac_port                         134            11               4
dac_port                         134            11               4
dac_interface                    *              <13>             *
adc_interface (ram_interface2)   *              6                *
Main                             *              23               14
Totals                           1154           100              *
Audiocoord                       67             5                0
Joystick                                        8                5
Ui                                              29               14

 

 

 

Results of Experiments

General Comments

Because of all the features we initially wanted to implement, many "experiments" were conducted to reduce the size of the VHDL components. In most cases, the module in question was ultimately cut; features such as RAM and echo had to be removed entirely. The size of the remaining modules was reduced tremendously by decreasing the sample width from 24 to 8 bits. This allowed us to include the features we considered essential, such as the ADC interface, the two DAC interfaces, and graphical volume control. Several of the modules we cut away could be projects in themselves. The mouse driver, later replaced by joystick control, would have required a great deal of work to shrink to a size suitable for this project; despite the many hours of design, debugging, and testing that went into it, it could not be reduced enough to be included.

Multiplier Design: Speed versus Size

In the design of the multiplier (multiplier.vhd), we decided to use the sequential add/shift multiplication routine described on page 4-27 of Dr. X. Sun's EE 480 Class Notes [8]. This method uses a series of registers and an adder controlled by a finite-state machine. This design, we suspected, would save space in the FPGA at the cost of speed. After designing and compiling a 16-bit x 16-bit multiplier, we found that the multiplier had taken 10% of the total logic cells in the FLEX 10K20 FPGA. On simulation, we found that we could use a minimum clock period of 60 ns, or roughly 16.7 MHz. However, since the multiplier requires a number of clock periods to complete each multiplication (16 shifts plus one add for each '1' in the second operand), we found that the routine could take from 1.3 to 2.0 microseconds to obtain a result.
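To make the technique concrete, here is a minimal sketch of a sequential add/shift multiplier in the same spirit. It assumes unsigned operands and performs the conditional add and the shift in the same cycle, whereas the routine described above spends an extra cycle per add; the entity and signal names are ours, and this is not the actual multiplier.vhd.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity seq_mult is
  port (
    clock   : in  std_logic;
    start   : in  std_logic;                 -- pulse to begin a multiplication
    a, b    : in  unsigned(15 downto 0);
    product : out unsigned(31 downto 0);     -- valid when done = '1'
    done    : out std_logic
  );
end entity seq_mult;

architecture sketch of seq_mult is
  type state_t is (idle, run);
  signal state  : state_t := idle;
  signal acc    : unsigned(31 downto 0);
  signal mcand  : unsigned(31 downto 0);     -- multiplicand, shifted left each step
  signal mplier : unsigned(15 downto 0);     -- multiplier, shifted right each step
  signal count  : natural range 0 to 15;
begin
  product <= acc;

  process (clock)
  begin
    if rising_edge(clock) then
      done <= '0';
      case state is
        when idle =>
          if start = '1' then                -- latch operands, clear the accumulator
            acc    <= (others => '0');
            mcand  <= resize(a, 32);
            mplier <= b;
            count  <= 0;
            state  <= run;
          end if;
        when run =>
          if mplier(0) = '1' then            -- add a shifted copy of a for this bit
            acc <= acc + mcand;
          end if;
          mcand  <= shift_left(mcand, 1);
          mplier <= shift_right(mplier, 1);
          if count = 15 then                 -- all 16 bits of b have been examined
            done  <= '1';
            state <= idle;
          else
            count <= count + 1;
          end if;
      end case;
    end if;
  end process;
end architecture sketch;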

We decided to compare this state-machine based technique with a more direct method: we simply used the "*" operator and let MAX+plus II perform the design for us. The results were what we expected, i.e. much larger but much faster. This multiplier took 56% of the logic cells, and the maximum delay between input and output was 89 nanoseconds, much quicker than our sequential multiplier.

While the "*" operator method is almost 20 times faster, it uses up over half the chip, which is unacceptable for our design. We therefore concluded that our sequential multiplier design was acceptable for our audio processor.

(The sample clock period is 1/44,100 s, or about 22.7 microseconds, so even the slower multiplier completes well within one sample period.)

ADDENDUM: We later realized that we did not need such a general multiplier; this circuit was meant for 'mixing', and the audio mixing process is in fact a summation. The only multiplication actually required applies the gain: a sample is multiplied by a 4-bit gain value and then right-shifted by 4 bits to divide by 16. This was built out of adders, resulting in a small circuit.
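The sketch below shows one way such an adder-only gain stage can be built, assuming the 8-bit two's-complement samples used in the final design and a 4-bit gain; the entity and signal names are illustrative and may not match gain_mult.vhd.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gain_scale is
  port (
    sample : in  signed(7 downto 0);      -- two's-complement audio sample
    gain   : in  unsigned(3 downto 0);    -- 0 .. 15, interpreted as gain/16
    scaled : out signed(7 downto 0)       -- sample * gain / 16
  );
end entity gain_scale;

architecture sketch of gain_scale is
begin
  process (sample, gain)
    variable s   : signed(11 downto 0);
    variable acc : signed(11 downto 0);
  begin
    -- Multiply by summing shifted copies of the sample, one per gain bit.
    s   := resize(sample, 12);
    acc := (others => '0');
    if gain(0) = '1' then acc := acc + s;                end if;
    if gain(1) = '1' then acc := acc + shift_left(s, 1); end if;
    if gain(2) = '1' then acc := acc + shift_left(s, 2); end if;
    if gain(3) = '1' then acc := acc + shift_left(s, 3); end if;
    -- Divide by 16 by discarding the four least significant bits.
    scaled <= acc(11 downto 4);
  end process;
end architecture sketch;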

Audio_Manipulator: Echo and Sample Length versus Size

Initially, we were going to use 24-bit samples with full echo (using all eight samples). Once the entire audio_manipulator.vhd entity was compiled, we found that it used up 1314 logic cells, roughly 200 cells more than the 10K20 has available! A close examination of the circuit revealed that most of the bulk was made up of 24-bit registers. Thus, two options were available to reduce the size of our design. The first option was to reduce the sample length to 16 or 8 bits, although the sound quality would decrease. The second option was to remove the four 'echo' signals, thus removing half the registers and eliminating the need for the summers and some of the controller logic. The following chart shows the size of the design, with and without echo, for different sample lengths.

# Logic Cells Used    WITH ECHO      WITHOUT ECHO
24 Bits               1314 (114%)    790 (68%)
16 Bits               946 (82%)      582 (50%)
8 Bits                579 (50%)      373 (32%)

 

As we can see, the without-echo version is almost 40% smaller than the equivalent with-echo version. Each 8-bit reduction in sample length saves roughly 32 percentage points of the chip in the with-echo case and 18 percentage points in the without-echo case. Regardless of the option chosen, this portion of the design remains significant in all versions.

Video and UI Testing

The testing of the video code was very straightforward. After compiling the code, the generated SOF file was downloaded to the UP1 board and a VGA monitor was connected to the board. The code was debugged until the video on the monitor appeared as desired. It was noted that the same code would produce slightly different results on different monitors, that is, the image would be shifted with respect to the edges of the screen. On one of the three monitors used, red, green, and blue artifacts that were less than a pixel in size appeared onscreen. Neither of the other two monitors did this.

Due to the simplicity of the joystick code, no test waveforms were made. Rather, the code was tested immediately with the hardware by observing the motion of the onscreen cursor in response to the joystick inputs.

Optimizing Code Size of Audio Coordinate Converter

The audio coordinate converter (file AudioCoords.vhd) takes the VGA display coordinates of the onscreen cursor, which occupies a 640 by 480 pixel space, and converts them to the smaller coordinate space that is required by the audio manipulator. This space is 15 by 15 squares.

The algorithm chops the 640 by 480 pixel space into a 15-by-15 grid of blocks, determines which block the cursor is located in, and then outputs the new coordinate values to the audio manipulator. 480 divided by 15 is 32, which, being an integer, is a very easy number to work with. However, 640/15 = 42.667, and non-integral numbers are not easy to deal with in a digital system.

version 1: lookup table comparisons

Our first approach was to use huge 15-condition if-else statements to downsample the X-Y coordinates of the sound source. If the Y coordinate fell between 0 and (32-1), the sound source was assigned a new Y coordinate of 1. If the Y coordinate fell between 32 and (64-1), the source was assigned a new Y value of 2, and so on. The X coordinate was slightly more difficult to deal with: we computed a series of multiples of 42.667 and placed the rounded-down values into the lookup table. The first fruits of our labour are shown in the code listing audiocoords.vhd.first_version, which is on the following page.

This approach consumed 71 logic cells.

version 2: counters and comparators

The previous method was inelegant, so we chose to try a different technique. This time, we chose to use counters that incremented in steps of 32 (for the Y coordinate) or steps of 42.667 (for the X coordinate). A variable would hold the previous value of the counter, and the coordinate value would be compared to the new and old counter values in order to determine what the downsampled coordinate should be.

The astute reader will notice that we could not really count increments of 42.667. Rather, we implemented logic that would set the counter value to the closest whole number to the ideal value.

The results of this approach are shown in the listing audiocoords.vhd.second_version. This time, 75 logic cells were used. This was not very surprising, considering the logic that had to be implemented to make the X counter work.

version 3: a combination of methods

The counter method seemed very clean when used for the Y coordinate, but not for the X coordinate. What if we tried to use the table lookup method for the X coordinate but continued to use the counter method for the Y coordinate? Listing 3, our final version of the code, implements this idea. Indeed, this method seemed to be optimal as only 67 logic cells were consumed.
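For illustration, the following combinational sketch implements this hybrid idea. The X thresholds are the rounded-down multiples of 42.667, and the Y block is taken directly from the upper bits of the pixel coordinate; the entity, signal names, and exact threshold values are illustrative and may differ slightly from Listing 3 (AudioCoords.vhd).

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity coord_map is
  port (
    pixel_x : in  unsigned(9 downto 0);   -- cursor X on the screen, 0 .. 639
    pixel_y : in  unsigned(8 downto 0);   -- cursor Y on the screen, 0 .. 479
    grid_x  : out unsigned(3 downto 0);   -- audio grid X, 1 .. 15
    grid_y  : out unsigned(3 downto 0)    -- audio grid Y, 1 .. 15
  );
end entity coord_map;

architecture sketch of coord_map is
  type thresh_array is array (1 to 14) of natural;
  -- Rounded-down multiples of 640/15 = 42.667.
  constant X_THRESH : thresh_array :=
    (42, 85, 128, 170, 213, 256, 298, 341, 384, 426, 469, 512, 554, 597);
begin
  process (pixel_x, pixel_y)
    variable nx : natural range 1 to 15;
  begin
    -- X: table lookup, as in version 1.
    nx := 1;
    for i in X_THRESH'range loop
      if pixel_x >= X_THRESH(i) then
        nx := i + 1;
      end if;
    end loop;
    grid_x <= to_unsigned(nx, 4);

    -- Y: 480/15 = 32, so the block number is simply the upper four bits plus one.
    grid_y <= pixel_y(8 downto 5) + 1;
  end process;
end architecture sketch;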

 

References

[1] Altera Corporation, University Program Design Laboratory Package User Guide, version 1, San Jose, CA, August 1997.

[2] S.Howard Bartley, Introduction to Perception, Harper & Row Publishers, Inc., New York, N.Y., 1980. p.303.

[6] Jean's HDL/UP1-board page, http://www.jps.net/kyunghi/up1board.htm, last visited March 2, 2000.

[7] H. Luan, B. Liu, and A. Chan, EE552 Application Note, http://www.ee.ualberta.ca/~elliott/ee552/studentAppNotes/1998_w/dicerace_video_display/, last visited March 2, 2000.

[8] X. Sun, EE 480 Class Notes, p.4-27.

[9] Karl Heller and Zube, Atari 2600/7800 FAQ v 12.0, http://www.gamesdomain.com/faqdir/atari_2600_5200_7800.txt, last visited March 15, 2000.

[10] Sydney Tang, Stephen Tang, and Su-tarn Lim, EE 582 Report: M*CORE Bomberman, Edmonton, AB, April 12, 1999.

[11] Altera Corporation, VHDL: Bidirectional Bus, http://www.altera.com/html/atlas/examples/vhdl/v_bidir.html, last visited April 5, 2000.

[12] Cheung, Cheng, Kwan, and Li, Bi-directional Tristate Bus Controller, http://www.ee.ualberta.ca/~elliott/ee552/studentAppNotes/2000_w/interfacing/bi_direction_bus/bidir_bus.html, last visited April 5, 2000.

[13] Gunthrope, O’Reilly, and Lewis, Testbenches using File I/O under VHDL, http://www.ee.ualberta.ca/~elliott/studentAppNotes/2000_w/vhdl/File_IO_Testbenching/testbench.html, last visited April 5, 2000.

[14] Attached Data Sheets

 

Data Sheets for Parts Used in the iAMP

 CRYSTAL CS4390 D/A converter

CRYSTAL CS5360 A/D converter

Cypress CYM1465 512Kx8 SRAM

Design Verification

Index to Test Waveforms

Note: No waveforms are available for the joystick and video components since these were tested directly on the UP1 board. Also, no tests for the ram_port, adc_port, and dac_port components were possible since a bug in MAX+plus II causes "inout" signals to be converted into two separate signals – an "in" signal and an "out" signal. The MAX+plus II simulator considers these two split signals to be completely independent, so it cannot be used to reliably simulate the above-named components.

ac_gain_apply.scf

ac_get_samples.scf

adc_port.scf

audio_controller.scf

audio_manipulator.scf

dac_port.scf

echo_delay.scf

Gain_mult.scf

incident_gain.scf

audiocoords.scf

 

ac_gain_apply.scf

This waveform shows two clocks: the system clock (~12.5 MHz) and the sample clock (normally 44.1 kHz, but shortened here for display purposes). When sample_clock goes high, reg2_enable will rise and fall, then sample_select will increment, up to the set limit (in this case, three). When sample_clock goes low, the system returns to its original state.

 

ac_get_samples.scf

This waveform shows two clocks: the system clock (~12.5 MHz) and the sample clock (normally 44.1 kHz, but shortened here for display purposes). When sample_clock goes high, sample_req will go high and a delay will be selected (reg0_load provides the MUX control code that selects among the outputs of echo_delay). When data_ready goes high, indicating that the data is ready, reg0_enable will momentarily rise. reg0_load will increment after data_ready returns to a low value.

adc_port.scf

The period of clock is 40 ns, the m_clock is 160 ns, the s_clock is 40 us, and the lr_clock is 1.92 ms. S_data is clocked in MSB first for 24 consecutive bits. We are only using one input of the ADC, the left channel; therefore we use only the data clocked in while lr_clock is high. After the data is collected we have 1.92 ms before the next rising edge of lr_clock, which is plenty of time for the simple handshaking used to verify the output of adc_port.vhd.

 

audio_controller.scf

This simulation shows a number of different things. Since this entity contains echo_delay, incident_gain, ac_gain_apply, and ac_get_samples, we see the interaction of each of their signals. When sample_clock goes high, sample_req goes high and four samples are received by the audio path (controlled with reg0_enable and reg0_gain). We now have a delay associated with each sample; these delays are marked on the waveform.

 

audio_manipulator.scf

This is the entire audio path. Samples are requested using the sample_req signal and acknowledged with the data_ready signal (for each clock pulse, the input sample is identical). After three sample_clocks, output starts coming out. Since the X and Y locations are in the center (8,8), each sample has been retrieved at delay 14, and all output samples are exactly one-half of the input (this is intentional). We use an incrementing input sample to show the pipelining occurring through the audio path: as each input sample rises by 2222 (base 16), our outputs rise by half as much. This is simply a demonstration of the audio path in operation.

 

dac_port.scf

The period of clock is 40 ns, the m_clock is 160 ns, the s_clock is 40 us, and the lr_clock is 1.92 ms. S_data is clocked out MSB first for 24 consecutive bits. We are using both outputs of the DAC; therefore, all data must be clocked out before lr_clock changes state. After the data is transferred we have 160 us before the changing edge of lr_clock, which is plenty of time for the simple handshaking used to verify the input of dac_port.vhd.

 

echo_delay.scf

This takes in X and Y values and produces four outputs, T00, T01, T10, and T11, all five-bit unsigned integers. We can see here that it is, more or less, combinational logic.

 

Gain_mult.scf

This is a multiplier: Z = A*B/16. Here, A is 1225689, and B varies between 0 and 15. The output varies accordingly.

 

incident_gain.scf

This takes in X and Y values and produces four outputs, A00, A01, A10, and A11, all five-bit unsigned integers. We can see here that it is, more or less, combinational logic.

 

AudioCoords

The UI components define the onscreen location of the sound source in terms of X and Y coordinates in a 640x480-pixel grid. However, the audio manipulator can only handle the sound source location in terms of a 15x15 grid, with a coordinate of (7,7) being at the center of the speaker array, (0,0) at the front-left speaker, (15,15) at the back-right speaker, and so on. The AudioCoords component maps the screen coordinates onto this 15x15 grid. For more detail on how this was achieved, refer to the section "Optimizing Code Size of Audio Coordinate Converter".

Consider the Y coordinates. 480/15 = 32. Therefore, a pixel Y coordinate of 0 to 31 is mapped to 1 on the audio grid, and pixel Y coordinate of 32 to 63 is mapped to 2 on the audio grid. Things are more complicated for the X coordinates, since 640/15 is not an integer. The mapping scheme for the X coordinates is given in the code file AudioCoords.vhd.

Design Hierarchy

The overall design hierarchy of the project is described in the following figure. Arrows indicate the direction of signals between components.

An alternative representation of this design hierarchy is given by the following tree diagram of VHDL entities. An entity that is pointed to by an arrow is a component of the entity that the arrow originates from.

VHDL Design

These source files are available here.

ac_get_samples.vhd

Part of audio_controller.vhd. This is a state-machine for sequentially retrieving samples from the audio sample controller. Simulated, no known bugs.

ac_gain_apply.vhd

Part of audio_controller.vhd. This is a state-machine for sequentially multiplying each sample by its respective gain coefficient.

adc_port.vhd

Interface to ADC. Compiled, no errors.

and2.vhd

A 2-input AND gate. Simulated, no known bugs.

AudioCoords.vhd

Takes the VGA coordinates of the sound source, which is a 640x480-point space, and converts them to a smaller set of coordinates in a 15x15-point space. Simulated, no known bugs.

audio_controller.vhd

This is the controller for the audio path contained in audio_manipulator.vhd. It calculates gain and delay values for each speaker based on X and Y coordinates and sequentially retrieves samples from the audio sample controller and multiplies each sample by its respective gain coefficient. Simulated, no known bugs.

audio_manipulator.vhd

This is the audio path. Consisting of the parallel audio path and the controller, this retrieves samples from the audio sample controller and applies gain and delay, outputting four samples to the DAC controller. Simulated, no known bugs.

ColourChooser.vhd

Determines the colour of the pixel at the current X-Y coordinates of the VGA scan. Simulated with no known errors.

constants.vhd

Contains constants common to various parts of the design. Simulated, no known errors.

count_xy.vhd

Provides two counts that are used to time the horizontal and vertical sync signals to the VGA monitor. Taken from reference [7]. Simulated, no known errors.

dac_interface.vhd

Takes the four 8-bit samples from the audio path and passes them to the DACs. Not compiled.

dac_port.vhd

Interface to DAC. Compiled, no errors

echo_delay.vhd

Calculates delay information based on X and Y (the delays are 5-bit unsigned integers, which the audio sample controller multiplies by 128 to determine how many samples back it must retrieve the sample from). Simulated, no known bugs.

gain_mult.vhd

Multiplies a sample by the gain (4-bit integer, 1-15) and divides by 16. Simulated, no known bugs.

GenerateSlowClocks

Creates clock waveforms slower than the 25.175 MHz system clock. Simulated, no known bugs.

Incident_gain.vhd

Calculates gain information based on the X and Y coordinates. Simulated, no known bugs.

joystick.vhd

The joystick driver software, used as a user input device in lieu of the mouse driver. Simulated and tested, no known errors.

mymux.vhd

4-input MUX of variable width. Simulated, no known bugs.

myregister.vhd

n-bit register. Simulated, no known bugs.

myselector2.vhd

ram_interface.vhd, ram_interface2.vhd

ram_interface was the state machine responsible for performing the handshaking and sample-passing between the ADC, audio manipulator, and RAM. After we determined that the RAM code was using too much space, we produced ram_interface2.vhd, which is actually nothing more than a direct pipe between the ADC and the audio manipulator. ram_interface.vhd: simulated, no known errors; ram_interface2.vhd: compiled, no known bugs

ram_port.vhd

Interface to external SRAM. Compiled, no errors.

syncgen.vhd

Takes the counter signals from count_xy in order to generate the synchronization signals for the VGA monitor. Taken from reference [6]. Simulated, no known errors.

ui.vhd

The top-level user interface entity that ties the mouse/joystick interface with the graphics and audio components. Simulated, no known errors.

video.vhd

The top-level video interface entity. Simulated and tested, no known errors.

Test Benches

RAM Interface Verification

As was mentioned earlier in the report, the final system did not use any RAM. However, a pair of test benches for the RAM interface was developed before this decision was made, and they are presented here. Preceding the test bench descriptions is a discussion of the system under test, the ram_interface entity.

Overview of RAM Queue

The entire purpose of having RAM in the system is to provide a memory of past audio samples from the ADC. This memory would allow the system to handle effects such as echo, reverberation, and binaural phase difference.

The RAM is treated as a cyclic queue. New samples are written to consecutive addresses toward higher memory. When the last available address is written to, the address pointer loops back to the first address and the cycle repeats. The address that is next to be written to is pointed to by a variable called CurAddress.

Data is read from the address CurAddress – Delay – 1, where Delay is a five-bit value supplied by the audio manipulator. When the audio manipulator wants the most recent sample, it sets Delay to 0. Thus, address CurAddress – 1 is read. If the audio manipulator wanted the fourth sample back, the address read would be CurAddress – 5. The pointer DelayedAddress points at the next address to be read from. The queue logic can deal with delays that put DelayedAddress back past address 0.
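The address arithmetic can be sketched as follows, assuming a 19-bit address space for the 512K x 8 SRAM and treating the pointers as entity ports for clarity; the names, widths, and the absence of any delay-scaling factor are our simplifications rather than the actual ram_interface.vhd.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity queue_addr is
  port (
    cur_address     : in  unsigned(18 downto 0);   -- next address to be written (queue head)
    delay           : in  unsigned(4 downto 0);    -- how far back to read, from the audio manipulator
    delayed_address : out unsigned(18 downto 0)    -- next address to be read
  );
end entity queue_addr;

architecture sketch of queue_addr is
  constant RAM_SIZE : natural := 524288;           -- 512K x 8 SRAM
begin
  process (cur_address, delay)
    variable back : natural;
  begin
    -- Read (Delay + 1) entries behind the head: Delay = 0 gives the most
    -- recent sample at CurAddress - 1, wrapping past address 0 if needed.
    back := to_integer(delay) + 1;
    if to_integer(cur_address) >= back then
      delayed_address <= cur_address - back;
    else
      delayed_address <= to_unsigned(to_integer(cur_address) + RAM_SIZE - back, 19);
    end if;
  end process;
end architecture sketch;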

Overview of Handshaking Model

The following diagram depicts the handshaking model for the ram_interface entity and the three other entities that it interacts with. Only the handshaking signals are shown. The internal names by which the entities call these signals are shown inside each entity box. The signal names outside of the entity boxes are the names used by the ram interface test bench entities.

A summary of the functions of these signals is given below:

ram_interface connection

handshaking description

adc_port

adc_port asserts ack_out when it has a new sample available

ram_interface requests that ram_port write the new sample to RAM. When ram_port verifies that this has been done, ram_interface sends SampleAccepted

on seeing the SampleAccepted signal, adc_port de-asserts ack_out

audio_manipulator

when audio_manipulator is ready to obtain another sample, it asserts Sample_Req and at the same time supplies a value for Delay, which indicates how far back from the front of the RAM queue it wants the sample to be fetched.

once ram_interface retrieves the requested sample from RAM and asserts it on the data bus between it and audio_manipulator, it asserts SampleSentToAudio

After audio_manipulator clocks in the sample, it de-asserts Sample_Req

SampleSentToAudio goes low when Sample_Req goes low

ram_port (read case)

On receipt of a SampleRequest from the audio_manipulator, ram_interface notes the value of Delay and then calculates the sample offset from the head of the RAM queue. ram_interface then asserts the desired address on the RAM address bus, sets the read-write signal low, and asserts RamRequest.

On seeing RamRequest, ram_port fetches the appropriate sample from RAM. Once the sample has been retrieved, ram_port sends RamDone.

When ram_interface sees RamDone, it de-asserts RamRequest

ram_port (write case)

On receipt of SampleReady from adc_port, ram_interface sets the read-write signal high, asserts the address of the head of the RAM queue onto the RAM address bus, and asserts RamRequest.

On seeing RamRequest, ram_port copies the waiting sample into the requested address. When the sample has been written, ram_port sends RamDone.

When ram_interface sees RamDone, it deasserts RamRequest
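As an illustration of this protocol, the sketch below implements just the write-side handshake (SampleReady, then RamRequest, then RamDone, then SampleAccepted) as a small state machine. The state names and port names simply paraphrase the summary above and are not taken from ram_interface.vhd.

library ieee;
use ieee.std_logic_1164.all;

entity write_handshake is
  port (
    clock           : in  std_logic;
    sample_ready    : in  std_logic;   -- from adc_port (its ack_out)
    ram_done        : in  std_logic;   -- from ram_port
    ram_request     : out std_logic;   -- to ram_port
    ram_rw          : out std_logic;   -- '1' = write in this sketch
    sample_accepted : out std_logic    -- back to adc_port
  );
end entity write_handshake;

architecture sketch of write_handshake is
  type state_t is (idle, request_write, wait_release);
  signal state : state_t := idle;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      case state is
        when idle =>
          ram_request     <= '0';
          sample_accepted <= '0';
          if sample_ready = '1' then        -- a new sample is waiting at adc_port
            ram_rw      <= '1';             -- select write mode
            ram_request <= '1';             -- ask ram_port to store the sample
            state       <= request_write;
          end if;
        when request_write =>
          if ram_done = '1' then            -- ram_port has written the byte
            ram_request     <= '0';
            sample_accepted <= '1';         -- tell adc_port the sample was taken
            state           <= wait_release;
          end if;
        when wait_release =>
          if sample_ready = '0' then        -- adc_port has seen SampleAccepted
            sample_accepted <= '0';
            state           <= idle;
          end if;
      end case;
    end if;
  end process;
end architecture sketch;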

 

The RAM Interface Test Benches

The test benches evaluate ram_interface by "stubbing" all of the entities that ram_interface interacts with. To this end, they contain three state-machine processes -- AdcStub, RamStub, and AudioStub -- that behave in the manner expected by the RAM interface. AudioStub does not actually do anything with the data that it receives, and AdcStub extracts sample values as 24-bit std_logic_vectors from a binary disk file (an application note describing how to access binary files in VHDL should be available in the next few days; the basic methodology is similar to the ASCII text file testbenching described in reference [13]). RamStub always waits for a fixed period of time before asserting RamDone, and it reads/writes sample data to a 4-byte array.

Test Bench 1: Handshaking and Queue Handling

The first test bench is defined in the file ram_bench.vhd. This bench used a constant value for Delay in order to make the handshaking and data transfer behaviour more apparent.

Waveform TB1 depicts the results of the test bench. Areas of note are labeled on the waveform, and these labeled portions are explained below. The TB2 series of waveforms provide a more detailed view of the information in waveform TB1.

A

This part of the waveform describes the first data write sequence. Initially, the ADC stub indicates by asserting SampleReady that a new sample having the value 0x30F0F0 is available. The RAM interface pulls the RamRW line low and asserts RamRequest, which indicates that it wishes to write data to RAM. Note that the first RAM address, 0x00000, has been selected, and the most significant byte of the sample, 0x30, has been put on the RAM data bus, RamData.

After a few clock cycles, the RAM stub sets RamDone, and transfers the byte on the data bus into the RAM array, TheRam. ram_interface sees the RamDone signal and de-asserts RamRequest and sends SampleAccepted to the ADC stub.

(Recall that only the most significant 8 bits of the 24-bit audio sample are used by the iAMP)

B

Here we consider the first read sequence. After sending SampleAccepted, the ram_interface "notices", for the first time, the SampleRequest signal from the audio manipulator stub. Note that, while ram_interface was busy in the write mode of its state machine, it did not respond to SampleRequest.

The high SampleRequest value causes ram_interface to force RamRW high, indicating that it wishes to read data. RamAddress points to the address of the just-written sample, i.e. 0x00000. Note that RamData is zeroed since the RAM interface is not driving the bus.

When RamDone goes high, the data in address 0x00000 appears on the RAM data bus. The RAM interface puts this value on the SampleFromRam data bus and asserts SampleSentToAudio. The high SampleSentToAudio causes SampleRequest to go low.

C

Here we see the second write sequence. Since address 0x00000 is now full, the ram interface chooses the next address, 0x00001, to store the next sample in. Indeed, the most significant byte of the latest sample is pushed into the RAM.

D

We see here the result of running past the maximum allowed RAM address. The address pointer simply loops back to the starting address and the first sample is then overwritten.

Test Bench 2: Delay Handling

The second test bench is in the file ram_bench2.vhd. ram_bench2.vhd is so similar to ram_bench.vhd that the source code has not been included. Rather, commented-out code is included in ram_bench.vhd to indicate how the second test bench differs from the first. The only real difference is that the second bench cycles the Delay signal through the values 0 to 3. Waveform TB3 is an excerpt of the results obtained with this bench.

A

The first case is much like what we saw in waveform TB1, since Delay = 0. The CurrentAddress pointer is pointing at address 0x00000, which is where the next sample will be written. However, we are reading data, so we must read from the preceding address, which is 0x00003 – remember that the RAM queue is circular!

B

This is another case where the circular nature of the RAM queue comes into play. For a delay of zero, address 0x00000 would be read from. However, the delay is 1, so we read 0x00003, the location 2 addresses back from the head of the queue.

C

This example is very simple. We read from address 0x00003 – 2 – 1.

D

Here we see a delay whose size is just short of the total number of addresses. The effect is that the sample is read from the same address that the next sample will be written to.

E

We have come full circle – once again the delay is zero.

 

Schematics