SPATIAL AUDIO TECHNIQUES
Concepts and Implementation
Student Application Notes
Daniel Ross, EE 552
March 7, 2000

TABLE OF CONTENTS
1.0 Introduction
2.0 The Human Auditory Mechanism
2.2 Localization of Sound
2.3 Cues for Localization
2.4 The First Class of Cues
2.4.1 Binaural Phase Difference
2.4.2 Binaural Intensity Difference
2.4.3 Outer Ear Response
2.4.4 Shoulder Echoes
2.5 The Second Class of Cues
3.0 Methods of Creating Spatial Audio
Bibliography

1.0 Introduction

This document discusses the theory of stereophonic and spatial audio and describes in detail methods of creating spatial audio effects. Spatial audio is an effect created through the use of speakers and signal-processing techniques such that a sound appears to the listener to originate from a specific location. By taking advantage of the weaknesses and limitations of the human auditory mechanism, a number of complex techniques have been developed over the years that either explicitly or implicitly locate sound sources.
2.0 The Human Auditory Mechanism

The human auditory mechanism is a complex mechanical system that enables a human to perceive a relatively large range of frequencies and to draw relatively accurate conclusions about the surrounding audio landscape. However, the auditory mechanism has a number of weaknesses and limitations that allow us to create audio effects that 'fool' the brain into interpreting sounds differently from what is truly there.

2.2 Localization of Sound

The perception of space is the primary limitation of the human ear. The auditory mechanism does not include the facilities that would enable it to take a role in initiating the perception of space as a domain (1). Sounds produced by external sources may seem to originate within the body rather than outside it. For example, someone wearing headphones will not interpret the sounds as coming from specific directions, but as an overall sound field from a non-specific direction (1). As a result, the localization of sounds requires a number of direct and indirect cues.

A number of characteristics of localization depend on the fact that humans have two ears. The first characteristic is that a listener almost never confuses an acoustic source on the right with one on the left: along the axis through the ears (i.e., right to left) there is a complete ability to locate sounds. The second is that a subject may mistake sounds coming from the front or above for sounds from behind. The third is that when the source is off to one side of the median plane, the subject's confusion is describable by means of a "cone of confusion" whose apex lies at the center of the head; any position on the cone may be confused with any other position on it (1). The second and third characteristics imply that the auditory mechanism is easily confused by the location of sounds in the plane orthogonal to the axis through the ears, and therefore that at least one additional cue is required for humans to distinguish sounds in front from sounds behind.

Figure 1. Cone of Confusion in auditory localization. Within the area bounded by the cone, two acoustic sources may be confused. (Source: E.B. Newman, "Hearing," in E.G. Boring, H. Langfeld, and H.P. Weld (eds.), Foundations of Psychology. New York: Wiley, 1948, Fig. 165.)
2.3 Cues for Localization

There are two classes of cues used for locating audio sources. The first class consists of direct physical phenomena that appear at the ears as a direct result of the human body in the environment. The second class consists of phenomena produced as a result of the environment. These cues are used by the human mind to interpret and place sound (2).

2.4 The First Class of Cues

The first class of cues comprises the following: binaural phase difference, binaural intensity difference, outer ear response and shoulder echoes.

2.4.1 Binaural Phase Difference

Acoustic stimuli originating from a common source to one side or the other of the median plane do not reach both ears at the same time, and when they do arrive, the waveforms at the two ears are out of phase. The higher the frequency, the less time is involved in the various phases of a given wave, so high frequencies provide no basis for utilizing phase differences (1). Binaural phase difference is effective only for pure tonal sources below roughly 1500 Hz, so for most applications this effect is rarely utilized, although reverberation poses certain applications of the principles involved.

2.4.2 Binaural Intensity Difference

The most common technique, this factor aids the individual in localizing sound by the difference in the intensity of the signal at the two ears. In general, the sound appears to originate from the direction of higher intensity. This phenomenon has been tested experimentally, particularly by Stewart and Hovda, who used tuning forks in tandem through tubes (1). They found that an intensity ratio of 10 to 1 was needed to shift the apparent direction of the sound source 45 degrees. This ratio is much larger than the one resulting from a physical acoustic source at a 45-degree angle, which suggests that intensity is not the only cue used for normal accuracy in experiencing sound (1).

2.4.3 Outer Ear Response

The outer ear plays a major role in allowing humans to place sounds: due to its non-radial shape, it attenuates signals as a function of their direction (2). Listeners recognize static produced in front of them versus from behind by its difference in character. In a test performed by Stevens and Newman (1936), subjects could more easily identify the source of static in open air; static heard while facing the source sounded like a "shhh" versus an "sss" when not directly facing it, and the subjects also felt that sounds from the front were louder than sounds from behind (1). This implies that the auditory mechanism uses the outer ear response as a tool for identifying sound locations.

2.4.4 Shoulder Echoes

The shoulders are attenuating surfaces in close proximity to the ears. As such, they play a role in producing attenuated echoes as a function of the direction of origin. However, this effect has not been widely studied, and little or no information can be found on the subject.
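The two binaural cues above can be made concrete with a short calculation. The Python sketch below uses Woodworth's spherical-head approximation to estimate the interaural time difference for a distant source; the head radius and the formula itself are textbook assumptions on our part, not values from this document.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C
HEAD_RADIUS = 0.0875    # m; a commonly used average (our assumption)

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the difference in
    arrival time at the two ears for a distant source at the given
    azimuth (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def interaural_phase_difference_deg(azimuth_deg, freq_hz):
    """Unwrapped phase difference in degrees for a pure tone. Beyond
    360 degrees the wave has cycled a full period between the ears
    and the cue becomes ambiguous; for a source at 90 degrees this
    happens just above 1500 Hz, consistent with the limit quoted
    in the text."""
    return 360.0 * freq_hz * interaural_time_difference(azimuth_deg)

print(interaural_phase_difference_deg(90, 1500))  # ~354 degrees: nearly a full cycle
```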
2.5 The Second Class of Cues

The second class of cues comprises the following: head motion, vision and reverberation.

Head motion plays an important role in determining the location of sounds. By interpreting the changes in intensity, frequency response and phase difference produced by movement of the head, one is able to locate sounds more accurately. This provides the brain with a 'benchmark' to compare against, using the perceived differences to produce a more complete sound landscape.

Vision and hearing go hand in hand in understanding the world around a person, and vision plays an extremely large role in the interpretation of sounds. The brain interprets viewed objects as symbolic information and identifies each object with certain qualities, i.e. behavior and sound. For example, if we see a duck, we expect it to act like a duck and quack like a duck. As well, our visual understanding of our environment produces certain expectations and assumptions gained through continued experience of the world around us. If the duck is in front and to the right of us, we expect to hear the duck quack from that direction rather than from behind us. This expected correlation of our senses, conditioned from birth, influences our perception of sound (1). For example, if the sound of a duck is produced directly in front of a listener while the listener sees a duck in front and to the right, the listener will expect the sound to come from the duck's location; even if the sound is not precisely from that direction, the brain will reject the non-correlating information and locate the sound source at the duck's location.

Reverberation provides an important cue for perceiving the distance of a sound from the listener. Figure 2 below depicts reverberatory wavefronts reaching the listener from a sound source in a room. The figure suggests that if the source were placed closer to the ear, the major portion of the energy reaching the ear would reach it directly, rather than in a series of delayed wavefronts at random delays and intensities; only a small amount of energy would reach the ear after being reflected back and forth between the walls. As a result, a basis for locating a sound can be attributed to its reverberation characteristic rather than wholly to its reduced energy content. Therefore, the perceived distance of a sound is based on the complexity of the reverberated wavefronts.

Figure 2. Reverberatory wavefronts reaching the listener from a sound source in a room.
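As a rough illustration of this distance cue, the sketch below (Python with NumPy) scales only the direct path with distance while leaving the reflected energy constant, so the direct-to-reverberant ratio falls as the source recedes. The echo delays and gains are invented for illustration and are not taken from any measured room.

```python
import numpy as np

def distance_cue(dry, distance_m, fs=44100):
    """Apply a crude distance cue: the direct path falls off as
    1/distance, while the diffuse reverberant field stays roughly
    constant, so the direct-to-reverberant ratio drops as the
    source moves away. The early-reflection pattern below is a
    hypothetical stand-in for real room reflections."""
    echoes = [(17, 0.5), (23, 0.45), (31, 0.4), (47, 0.3), (61, 0.25)]  # (ms, gain)
    out = np.zeros(len(dry) + fs // 10)
    direct_gain = 1.0 / max(distance_m, 1.0)   # 1/r law, clamped at 1 m
    out[: len(dry)] += direct_gain * dry       # direct path scales with distance
    for ms, gain in echoes:
        d = int(fs * ms / 1000)
        out[d : d + len(dry)] += gain * dry    # reverberant part: distance-independent
    return out
```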
3.0 Methods of Creating Spatial Audio

Now that we have discussed the cues used by the human auditory mechanism, we can apply these techniques to our system. First, we describe our sound environment. Our target system is a square room with a width w and a length l. We are using four speakers, where the output of each is controlled independently. The gain for each speaker is calibrated such that the output signals sent to each speaker by our audio processor are at equal amplitudes. Each speaker is equidistant from the center of the room, shown in Figure 3 as a chair.

Figure 3. Speaker Layout

In this situation, there are a number of extreme positions. To move the sound locus (marked by an "X" on the diagram) to one of the four corners, we simply limit the output to that corner's speaker. To place the sound locus at the center of one of the four sides, we activate only the two speakers on that side. At the center, we produce a reinforced direct signal, as the four speakers, equidistant from the listener and all in phase, produce outputs of equal intensity. While there are many reverberation signals reflecting off the walls, the reinforced wavefronts from all directions envelop the listener in sound, and the sound is no longer externalized. This provides a binaural intensity difference to locate the sound, produced by the gain controls for each speaker. A gain law reproducing these extremes is sketched after this section's discussion.

In each of the cases described above, a binaural phase difference manifests itself due to the location of the speakers: if a single speaker is activated, a phase difference exists because one ear is closer to the speaker than the other, while from the front or rear, with both speakers activated and in phase, the wavefronts reach both ears at the same time. Reverberations off the walls also assist the listener in placing the sound.

Is this enough to have the listener locate a sound? Since the rear speakers are physically located behind the listener, the directional attenuation of the outer ears is present. The reverberation cues are present, as echoes are produced off the walls. The effect is similar for the front speakers. The combination of multiple cues provides enough information for the brain to place the sound in the desired direction along the front-rear axis. As far as left-right location is concerned, the human auditory mechanism has no difficulty detecting the location of sound along the left-right axis, so there should be no problems in this regard.

In our study of this topic, we were surprised to find that in explicit "surround" systems (such as this setup), as well as in implicit "surround" systems (such as Dolby Pro-Logic), sound positioning is almost entirely created electronically through intensity differences, letting the recording and the positions of the speakers create the phase differences and reverberations. The only systems that take fuller advantage of this information are computer systems where 3-D sound is synthesized using only a pair of speakers, and extremely complex film and movie houses (such as IMAX), where the geometry of the room is constant.
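The intensity-difference positioning described above can be written as a simple gain law. The bilinear weighting in this Python sketch is our own illustrative choice; the text only fixes the behaviour at the corners, the wall centers and the room center, all of which this law reproduces.

```python
import math

def quad_gains(x, y):
    """Per-speaker gains for a sound position (x, y) in a unit square
    room, with (0, 0) the front-left corner and (1, 1) the rear-right.
    A corner position drives only that corner's speaker, the middle of
    a wall drives that wall's pair, and the room center drives all
    four equally, matching the extreme cases described in the text.
    Gains are normalized for constant total power."""
    w = {
        "front_left":  (1 - x) * (1 - y),
        "front_right": x * (1 - y),
        "rear_left":   (1 - x) * y,
        "rear_right":  x * y,
    }
    norm = math.sqrt(sum(g * g for g in w.values()))
    return {name: g / norm for name, g in w.items()}

print(quad_gains(0.5, 0.0))  # front pair at ~0.707 each, rear speakers silent
```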
(By implicit and explicit I refer to whether or not the sound outputs are encoded into the input signal. In Dolby Pro-Logic and the like, the output signals are encoded as signal differences between the two channels, where a 90-degree phase shift is performed (in order to create a delay between the front and rear speakers, giving a 'spacious' feeling to the sound) and the surround signal is band-pass filtered (3). For our system, we simply take the input(s) and place the signal directly through intensity differences and echoes.)

(Side note: in our EE552 project, we have the advantage of a visual aid. The sound location is 'chosen' using a mouse cursor on a VGA screen, where the user clicks on an overhead view of the room. This creates a subconscious expectation of where the sound will come from, and the spatial effect is reinforced by the user's expectations. Of course, if the user has no expectations of the sound location (i.e. he or she closes their eyes before the location is chosen), then perhaps this effect is nullified. However, it's all part of the illusion!)
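Returning to the matrix encoding described above, here is a simplified reconstruction in Python with SciPy. The -3 dB center mix, the 90-degree shift via a Hilbert transform, and the roughly 100 Hz to 7 kHz surround band-pass are commonly quoted figures for Dolby-Surround-style encoders, not specifics taken from this document; a real encoder applies opposite 45-degree shifts rather than a single shifted copy.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfilt

def matrix_encode(left, right, center, surround, fs=44100):
    """Sketch of a Dolby-Surround-style matrix encoder in the spirit
    of (3): center is mixed equally into both channels at -3 dB, and
    the surround channel is band-limited, 90-degree phase shifted,
    and added with opposite signs so a decoder can recover it from
    the difference between the two channels."""
    sos = butter(4, [100, 7000], btype="bandpass", fs=fs, output="sos")
    s = sosfilt(sos, surround)          # band-limit the surround channel
    s90 = np.imag(hilbert(s))           # 90-degree phase-shifted copy
    lt = left + 0.7071 * center - 0.7071 * s90
    rt = right + 0.7071 * center + 0.7071 * s90
    return lt, rt
```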
Bibliography

(1) S. Howard Bartley. Introduction to Perception. New York: Harper & Row, 1980.
(2) Mohamed Alkanhal & Deepak Turaga. "3D Audio Techniques and Applications." Internet. URL: http://www.ece.cmu.edu/~ee899/project/deepak_sem/index.htm
(3) Roger Dressler. "Dolby Pro Logic Surround Decoder Principles of Operation." Internet. URL: http://www.dolby.com/tech/whtppr.html