Captain. USAF - DTIC · Sad Related Transfer Function Approximation Using Neural Netv THESIS John K. Mlillhouse Captain. USAF AFIT/GE/ENG/94D-21 J, DEPARTMENT OF THE AIR FORCE AIR

Sad Related Transfer Function Approximation Using Neural Netv

THESIS

John K. MlillhouseCaptain. USAF

AFIT/GE/ENG/94D-21 J,

DEPARTMENT OF THE AIR FORCE

AIR UNIVERSITY

AIR FORCE INSTITUTE OF TECHNOLOGY

Wright-Patterson Air Force Base, OhioDiSTFRBUTIGfN STATEMENT A

Accroved for public release: I

AFIT/GE/ENG/94D-21

Head Related Transfer Function Approximation Using Neural Networks

THESIS

John K. Millhouse Accesion ForCaptain, USAF NTIS CRA&I

DTIC TAB|Unaninounrced El

AFIT/GE/ENG/94D-21 J Uistiicon - -......................

By------------------------------------

Distribut0on I

AvUi bII~ •Codes

i Av aii a ndJ! or>.Dist SpOcial

Approved for public release; distribution unlimited

AFIT/GE/ENG/94D-21

Head Related Transfer Function Approximation Using Neural

Networks

THESIS

Presented to the Faculty of the Graduate School of Engineering

of the Air Force Institute of Technology

Air University

In Partial Fulfillment of the

Requirements for the Degree of

Master of Science (Electrical Engineering)

John K. Millhouse, B.S. Electrical Engineering

Captain, USAF

December, 1994

Approved for public release; distribution unlimited

Acknowledgements

In thanking the many people who helped me, I will start at the very top. No

one deserves more credit for this work than my Lord and Savior-Jesus Christ.

My thesis committee, composed of Dr. Steve Rogers, Dr. Mark Oxley, Bar-

bara McQuiston, Dr. Dennis Ruck, and Dr. Matthew Kabrisky were the source of

ideas, and technical expertise for the research in this thesis. A number of students

were very helpful in teaching me better ways to use the Sun Workstations in the

Signal Processing Lab. In particular Capt Brian Smith was extremely helpful with

answering my questions regarding the use of ESPS and 3D sound in general. If not

for the contributions of each of these individuals, my work load would have been

even heavier. Finally, I would like to thank my wife, Audra, for her support and

patience throughout this research effort.

John K. Millhouse

ii

Table of Contents

Page

Acknowledgements ........................................ ii

List of Figures . . .......... ............................... vii

List of Tables ......... ................................... xii

Abstract .......... ...................................... xiii

1. Introduction ......................................... 1

1.1 Background ............................. .... 1

1.2 Problem ........ ............................ 4

1.3 Definitions ..................................... 4

1.4 Research Objectives ............................ 6

1.5 Scope ......... ............................. 6

1.6 Approach ........ ........................... 7

1.6.1 Synthesizing 3D Cues ..................... 7

1.6.2 Artificial Neural Networks ................... 10

1.6.3 LNKmap. ............................. 13

1.7 Thesis Outline ........ ........................ 14

II. Background ............. ............................. 16

2.1 Introduction ........ .......................... 16

2.2 Graphical Display ....... ...................... 17

2.3 Graphical and Auditory Displays ................... 18

2.4 Acuity of Sound Localization I ..................... 21

2.4.1 Interaural Time Difference ................. 22

iii

Page

2.4.2 Interaural Intensity Difference ............... 22

2.4.3 Experiment 1 ....... .................... 23

2.4.4 Experiment 2 ....... .................... 24

2.4.5 Discussion ............................. 25

2.5 Acuity of Sound Localization II .................... 25

2.5.1 Experiment ....... ..................... 25

2.5.2 Results ............................... 25

2.6 Headphone Localization of Speech ..... ............. 26

2.7 Learning LNKmap with a simple function ............. 27

2.7.1 Linear function ...... ................... 27

2.7.2 Non-linear functions ...... ................ 29

2.8 Learning LNKmap with a Cosine function ............. 33

2.8.1 Cosine Revisited ....... .................. 34

2.8.2 Conclusion ............................. 36

2.9 Learning LNKmap with a Two Input Cosine function . . . 38

2.9.1 RBF Architecture ........................ 39

2.9.2 MLP Architecture ........................ 40

2.9.3 Conclusion ............................. 44

2.10 Conclusion ........ .......................... 45

III. Methodology ......... ............................... 46

3.1 Introduction ................................. 46

3.2 Learning HRTF Data for Zero Elevation .............. 46

3.2.1 MLP Architecture ........................ 46

3.2.2 RBF Architecture ........................ 47

3.2.3 Conclusion ............................. 49

3.3 Experiment 1: With and Without HRTF .............. 49

3.3.1 Objective ....... ...................... 49

iv

Page

3.3.2 Method ....... ....................... 49

3.3.3 The Model ............................. 50

3A Experiment 2: AAMRL HRTF and RBF HRTF ...... ... 54

3.4.1 Objective ....... ...................... 54

3.4.2 Method ....... ....................... 55

3.4.3 The Model ............................. 56

3.5 Conclusion ........ .......................... 56

IV. Experiment Results .................................... 57

4.1 Introduction .................................. 57

4.2 Types of Error ........ ........................ 57

4.2.1 Algebraic Error ...... ................... 57

4.2.2 Absolute Error ......................... 57

4.2.3 Reversals ....... ...................... 57

4.2.4 Default to 900. ...... .................... .... 58

4.3 Experiment One Results ......................... 58

4.3.1 Further examination of results ............... 66

4.3.2 Summary of Experiment One Results ....... ... 72

4.4 Experiment Two Results ......................... 72

4.4.1 Further examination of results ............... 77

4.4.2 Summary of Experiment Two Results ....... ... 81

V. Conclusions and Recommendations ......................... 84

5.1 Summary ........ ........................... 84

5.2 Conclusions .................................. 84

5.3 Recommendations for Future Research ............... 85

Appendix A. Experiment 1 Data: With and Without HRTF .......... 86

v

Page

Appendix B. Experiment 2 Data: AAMRL HRTF and RBF HRTF. . . 101

Appendix C. Matlab M-files ....... ........................ 116

C.1 angleresponse.m ....................... 116

C.2 anova.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

C .3 drc.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

C.4 expm atl.m .......................... 122

C.5 expm at2.m .......................... 123

C .6 idrc.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

C.7 LRsphere.m ................................. 125

C.8 makesphere.m ................................ 125

C.9 mapsurf.m .................................. 126

C.10 mapeval.m ........ .......................... 127

C.11 mypolar.m .................................. 127

C.12 plotresult.m ................................. 130

C.13 plothist.m .................................. 131

C.14 readall.m ........ ........................... 133

C.15 removerev.m ........ ......................... 135

C.16 sortdata.m ........ .......................... 136

Bibliography ......... ................................... 137

vi

List of Figures

Figure Page

1. Typical presentation forms of data ......................... 1

2. Block diagram of a typical signal processing transfer function . ... 7

3. HRTF transforms sound to aid the brain in location perception . . . 7

4. Elevation, ¢ measured from the ground plane to a sound source ... 8

5. Azimuth, 0 of sound source to directly in front of face ............ 9

6. Distant, d of sound source to center of head ................... 9

7. Geodesic sphere with sound sources at multiple locations .......... 10

8. Multilayer-Perceptron (MLP) neural network with two hidden layers

10 nodes each ......... .............................. 11

9. Radial-Basis-Function (RBF) neural network .................. 12

10. Training function y = cos(27rx) ........................... 13

11. Training function z = cos(27r(x 2 - y2)) ..... ................. 14

12. 272 speaker locations around the head ...................... 16

13. HRTF samples from 0 to 1000 ....... ..................... 17

14. Left ear HRTF data around the head for a frequency of 5623 Hz. .. 18

15. Left ear HRTF data around the left side of head for frequencies 119

Hz to 19,953 Hz ........ ............................. 19

16. Zero elevation HRTF data for the left ear. Frequency and Response

shown in logarithm scale ........ ........................ 20

17. A 3 layer net, 1 input node, 1 hidden node and 1 output node notated

1:1:1 .............................................. 28

18. y = x (solid) and MLP (1:1:1) output (dash) after 100 epochs . . 29

19. y = x (solid) and MLP (1:5:1) output (dash) after 1000 epochs . . 30

20. A 4 layer net, 1 input node, 5 hidden nodes,1 hidden node and 1 output

node notated 1:5:1:1 ........ ........................... 30

21. y = x (solid) and MLP (1:10:1) output (dash) after 1000 epochs . 31

vii

Figure Page

22. y = x (solid) and MLP (1:10:1) output (dash) after 300 epochs . . . 31

23. y z -x 2 (solid) and MLP (1:10:1) output (dash) ............... 32

24. x • -7y 2 (solid) and MLP (1:10:1) output (dash) ............... 33

25. Training function y = cos(x), where x ranged for 0 to 27r with a step

size of 0.01 for a total of 629 points ....... .................. 35

26. Dynamic Range Compression of y = cos(x) to values between 0 and 1 35

27. y = cos(x) (solid) and RBF (1:1:1) output (dash) with 1 center in the

K-means clustering algorithm ............................. 36

28. y = cos(x) (solid) and RBF (1:2:1) output (dash) with 2 centers in the

K-means clustering algorithm ............................. 37

29. y = cos(x) (solid) and RBF (1:9:1, 1:32:1) output (dash) with 9 and

32 centers in the K-means clustering algorithm ................. 37

30. y = cos(x) (solid) and MLP (1:10:10:1) output (dash) after 15, 100

and 220 epochs ........ ............................. -38

31. z = cos(27r(x 2 - y 2)) where x and y ranged from 0 to 1 ........ ... 38

32. z = cos(27r(x 2 - y 2)) where x and y ranged from -1 to 1 ........ .. 39

33. Output of RBF with 8, 64 and 128 centers in the K-means clustering

algorithm ......... ................................. 40

34. RMS error of RBF with 8, 64 and 128 centers in the K-means clustering

algorithm ......... ................................. 40

35. Difference between outputs (left) and RMS errors (right) using 64. and

128 centers in the K-means clustering algorithm ............... 41

36. Symmetric sigmoid used for non-linearity function y = +ex - 1 . . . 41

37. Output of MLP after 220 and 440 epochs with architecture of 2:10:1 42

38. RMS error of MLP after 220 and 440 epochs with architecture of 2:10:1 42

39. Output of MLP after 20, 50 and 100 epochs with architecture of 2:64:1 43

40. RMS error of MLP after 20, 50 and 100 epochs with architecture of

2:64:1 ......... ................................... 43

41. Output of MLP after 20, 220 and 440 epochs with architecture of

2:10:10:1 ......... ................................. 44

viii

Figure Page

42. RMS error of MLP after 20, 220 and 440 epochs with architecture of

2:10:10:1 ......... ................................. 44

43. Zero elevation HRTF data for the left ear. Frequency and Response

shown in logarithm scale ........ ........................ 46

44. Zero elevation MLP (2:50:50:1) after 7235 epochs. Frequency and Re-

sponse shown in logarithm scale ....... .................... 47

45. Zero elevation RBF with 256 centers. Frequency and Response shown

in logarithm scale ........ ............................ 48

46. Zero elevation RMS err for the MLP (left) and RBF (right) ..... ... 48

47. Display provided on a computer screen for selection of the perceived

location ......... .................................. 50

48. Algebraic Error, positive values indicate a clockwise difference and

negative values indicate a counter-clockwise difference ............ 58

49. Reversal, a symmetric flip about the horizontal axis through the ears 59

50. Example of a subject with back-front reversals with HRTF treatment

(left) and a different subject with front-back reversals without HRTF

treatment (right). Notation: x indicates actual location, o indicate

perceived location. The line between the x and o shows the pairwise

groupings. The gradual spacing toward the center of the head is for

clarity only and is not an indication of perceived distance ....... .. 59

51. Example of subject whose perceived locations defaulted to 90' and

2700. Notation same as above ....... ..................... 60

52. Azimuth, 0 of sound source to directly in front of face ............ 61

53. Absolute mean error and standard deviation for sounds with the HRTF

(solid) and without the HRTF (dash). The error bars show the stan-

dard deviation of the mean error ........................... 67

54. Algebraic mean error and standard deviation for sounds with the

HRTF (solid) and without the HRTF (dash). The error bars show

the standard deviation of the mean error ...................... 67

55. Mean and standard deviation for sounds with the HRTF (left) and

without the HRTF (right). The arcs show the standard deviation of

the mean (shown as a *) ................................ 68

ix

Figure Page

56. Polar Histogram of all responses for sounds with the HRTF (solid) and

without the HRTF (dash) ............................... 69

57. Polar Histogram of all responses correct within one button for sounds

with the HRTF (solid) and without the HRTF (dash) ......... ... 71

58. Polar Histogram of reversal corrected responses correct within one but-

ton for sounds with the HRTF (solid) and without the HRTF (dash). 71

59. Absolute mean error and standard deviation for sounds with the AAMRL

HRTF (solid) and RBF HRTF (dash). The error bars show the stan-

dard deviation of the mean error ........................... 77

60. Algebraic mean error and standard deviation for sounds with the

AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show

the standard deviation of the mean error ...................... 78

61. Mean and standard deviation for sounds with the AAMRL HRTF (left)

and RBF HRTF (right). The arcs show the standard deviation of the

mean (shown as a *) ................................... 79

62. Polar Histogram of all responses for sounds with the AAMRL HRTF

(solid) and RBF HRTF (dash) ............................ 80

63. Polar Histogram of all responses correct within one button for sounds

with the AAMRL HRTF (solid) and RBF HRTF (dash) ....... ... 81

64. Polar Histogram of reversal corrected responses correct within one but-

ton for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash) 82

65. All subjects' responses for actual locations, 100, 320, 450, 58', 690 and

820 shown left to right, top to bottom respectively. Notation: x and

solid line indicates with HRTF, o and dashed line indicate without

HRTF. The arcs show the standard deviation of the samples. The *

and + indicates the mean with and without HRTF respectively. . 97

66. All subjects' responses for actual locations, 980, 1110, 1230, 1350, 1480

and 1690 shown left to right, top to bottom respectively. Notation is

same as above ....................................... 98


and 262' shown left to right, top to bottom respectively. Notation is

same as above ....................................... 99

x

Figure Page

68. All subjects' responses for actual locations, 2780, 2910, 3030 , 3150, 3280


same as above ....................................... 100

69. All subjects' responses for actual locations, 10', 320, 450, 58', 69' and

82' shown left to right, top to bottom respectively. Notation: x and

solid line indicates with AAMRL HRTF, o and dashed line indicate

RBF HRTF. The arcs show the standard deviation of the samples.

The * and + indicates the mean AAMRL and RBF HRTF respectively. 112

70. All subjects' responses for actual locations, 98', 111', 123', 135', 148'


same as above ....................................... 113



same as above ....................................... 114

72. All subjects' responses for actual locations, 278', 291', 303', 3150, 3280

and 3490 shown left to right, top to bottom respectively. Notation is

same as above ....................................... 115

xi

List of TablesTable Page

1. Simple linear function ........ .......................... 28

2. Simple nonlinear function ............................... 32

3. Analysis of Variance for "Reduced three-way model" ............. 55

4. Experiment 1 factors ........ .......................... 60

5. Levels of "Azimuth" ................................... 61

6. Analysis of Variance for "Response" ........................ 62

7. Analysis of Variance for "Response" with reversal corrections . . .. 64

8. Without reversal corrections (nor) ...... ................... 70

9. With reversal corrections (rev) ........................... 70

10. Summary of Experiment One ............................. 72

11. Analysis of Variance for "Response" ........................ 73

12. Analysis of Variance for "Response" with reversal corrections . . .. 75

13. Without reversal corrections (nor) ...... ................... 80

14. With reversal corrections (rev) ........................... 80

15. Summary of Experiment Two ....... ..................... 82

16. Three dimensional graphic commands. Copyright (c) 1984-93 by The

MathWorks, Inc ..................................... 116

xii

AFIT/GE/ENG/94D-21

Abstract

This thesis determines whether an artificial neural network (ANN) can ap-

proximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head

related transfer functions (HRTF) data obtained from research at AAMRL during

the fall of 1988. In order to test this hypothesis, two separate tests are performed.

The first test determines whether HRTF lends any support in sound localiza-

tion when compared to no HRTF (Interaural Time Delay only). Depending on the

statistical method used, we can conclude that the HRTF does provide an advantage

in localization accuracy over simple ITD. There is, however, a statistically significant

interaction between the location of the sound and whether the HRTF or no HRTF

is used. When this interaction is removed using the alternate F-Value, the statistics

give the result of equal means for the filters and azimuth. This means that at certain

angles of azimuth, the HRTF either provides no advantage at all or hinders local-

ization capabilities. Adding in the corrections for reversals changes the results to

where the means of the azimuth are not statistically equal. The reversal corrections

will inherently reduce the error results. However, comparing the number of reversals

indicates an advantage of using the HRTF over no HRTF.

The second test determines whether HRTF and ANN lend the same amount of

imformation in sound localization when compared to each other. Without consider-

ing reversals, we can conclude that the RBF HRTF provides a statistical advantage

in localization accuracy over the AAMRL HRTF from which they were derived.

However, with reversal corrections included, an advantage is not indicated, and two

filters are statistically equal. Also with reversal corrections included, there is a sta-

tistically significant interaction between the location of the sound and whether the

AAMRL HRTF or ANN HRTF is used. As discussed in the first experiment, this

means that at certain angles of azimuth the HRTF either provides no advantage at

xiii

all or hinders localization capabilities. However comparing the number of reversals

does not indicate a large difference between the AAMRL HRTF and the RBF HRTF.

The final conclusion is that the AAMRL HRTFs can be approximated by an

artificial neural network for azimuth positions at zero degrees elevation and still have

good results.

xiv

Head Related Transfer Function Approximation Using Neural

Networks

1. Introduction

1.1 Background

The visualization of multivariate data is a complex area. There are a number of

ways to help view the data. The conventional approach for the presentation of data

is to present it in tabular or graphical form. Common graphical techniques work fine

for three dimensions or less. For example, figure 1 shows some typical presentation

forms of data. In each case, the presentation utilizes only the human sense of sight.

3D plot Sphere

Plot of sine wave Polar plot1 .901

0.5

00

-0.5 21

-1 ' 2700 5 10

Figure 1. Typical presentation forms of data

The other four senses (auditory, touch, taste and smell) are not exploited. The

presentations in figure 1, as depicted, sound the same. Except for the thickness of

the ink on the page, they feel the same. They also taste the same and smell the same.

Of these four unused senses, the auditory sense has the most practical potential (6).

For example, in a study of sleeping subjects, the subjects responded faster to an

auditory alarm than to the presence of heat or the smell of smoke (10, 22). The

fastest, simple reaction time come from electric shock (130 ms), followed by audio

and tactile (140 ms) and finally visual (180 ms) (22). Sleeping is a case where a

visual display is not appropriate. The following types of circumstances are possible

cases where an auditory display would be preferable to a visual display: (22)

"* When the origin of the signal is itself a sound

"* When the message is simple and short

"* When the message will not be referred to later

"* When the message deals with events in time

"* When warnings are sent or when the message calls for immediate action

"* When continuously changing information of some type is presented, such as

aircraft, radio range, or flight-path information

"* When the visual system is overburdened

"* When speech channels are fully employed (in which case auditory signals such

as tones should be clearly detectable from the speech)

"* When illumination limits use of vision

"* When the receiver moves from one place to another

"* When a verbal response is required

Many systems make use of different audio and visual displays. Sound displays

are not limited to one dimension (monophonic) or two dimensions (stereophonic).

Three dimensional (binaural) sound is where the listener hears the direction and

2

distance of the sound. Begault and Wenzel list some advantages and possible appli-

cation of binaural systems: (3)

"* monitor and identify sources of information from all possible locations

"* improve intelligibility of sources in noise

"* enhance segregation of multiple sources of speech

"* improve air traffic control displays

"* applications in cockpit communications

"* telepresence applications such as teleconferencing

"* shared electronic workspaces

"* telerobotic control

A potential application listed above is in the cockpit (3, 8, 22). Although

sound is already an important tool in the cockpit-only 1D sound is used. The pilot

perceives the sound as coming from his or her earphones or from inside his or her

head. In 3D sound, the pilot perceives the sound as coming not from the earphones or

inside the head, but from a location outside the head at a certain azimuth, elevation

and distance. By using 3D sound, the position of the pilot's wingman can then be

encoded into three dimensions so that when the pilot hears the wingman speak he

or she perceives the wingman's voice as coming from the wingman's relative position

to that of the pilot. In the same way, targets and threats can be presented to the

pilot. Active and non active threats could be distinguished by different pitches. The

distance that the sound is perceived from the head could indicate the urgency of

the data. Immediate action information could be placed close to the head while less

urgent information could be placed farther away. Begault, in an experiment involving

commercial pilots and collision warning systems, found that air crews presented with

3D auditory data acquired targets approximately 2.2 seconds faster than air crews

that were presented only 1D auditory data (2).

3

In another example, data from an Army battlefield simulation was encoded

into sound creating "battlefield songs" of the data (6). For example, in a song with

only three variables: rhythm, tempo and pitch; rhythm may indicate the combatants

of the battle, tempo may indicate the number of troops moving toward the front and

pitch may indicate the number of troops already at the front. Using this scenario,

a battlefield song of one rhythm that is increasing in tempo and pitch and another

rhythm that has steady tempo and pitch would indicate one combatant of the battle

had continuous reinforcements; while the other combatant did not get reinforcements

and was losing troops at the front. In the multidimensional case, listeners had

difficulty hearing the two sides of the battle independently; however, 3D sound may

help separate the songs better by separating the songs to different locations outside

the head (3).

1.2 Problem

This thesis determines whether an artificial neural network (ANN) can approx-

imate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head re-

lated transfer functions (HRTF) data obtained from research at AAMRL during the

fall of 1988 (12). In order to test this hypothesis, two separate tests are performed.

The first test determines whether HRTF lends any support in sound localization

when compared to no HRTF (Interaural Time Delay only). The second test deter-

mines whether AAMRL HRTF and ANN HRTF lend the same amount of support

in sound localization when compared to each other.

1.3 Definitions

This section provides definitions of key terms that will be used in this the-

sis. (25)

Binaural sound is sound that arises from two separate audio signals-one at

each ear. In a natural environment, humans hear binaural sound. The signal that

4

arrives at the left ear is characteristically different from the signal at the right ear.

The brain uses the differences between the two signals to determine the direction

and distance of the source. Binaural sound is usually recorded with a manikin as

opposed to stereo sounds which are recorded with without a manikin. The left ear

signal is recorded separately from the the right ear signal. When sound is played

through headphones, the sound is binaural if the signal sent to the left headphone is

separate from the signal sent to the right (23:3).

Binaural mixing console is an electronic device that contains filters which con-

vert a monophonic signal to a binaural signal corresponding to a given distance and

direction (15:208).

Telepresence is a human's perception of being present in a natural environment

while actually being present in an artificial or virtual environment.

Virtual audio is synthetically produced audio signals which enable a listener

to achieve auditory telepresence.

Extracranialized sound is sound that is presented to a listener in the form of a

binaural signal through headphones and is perceived as coming from some distance

from the listener (i.e. outside the head).

Pinna(e) is(are) the human outer ear(s). The design of each person's pinnae

is unique and each set of pinnae will uniquely filter sounds. In fact, the human head

and ears form an antenna system characteristic to the individual (24). Experiments

have shown that spectral shaping by the pinnae is dependent upon direction and cues

provided by the pinnae. These are critical in extracranializing the sound (30:84).

Head related transfer function (HRTF) is the transfer function which accounts

for the filtering effects of the pinnae. Because the filtering of sound by the pinnae is

dependent upon direction, there is a different HRTF for each angle of azimuth and

elevation. HRTF's will differ for each persons pinnae; however, these differences are

5

relatively small. Accurate localization of sound sources can be made using "someone

else's pinnae" by presenting sounds filters with another persons HRTF's (31).

Auditory Localization Cues are cues that change the sound as a function of

sound source and receiver location. There are four primary types of cues, 1) Monaural

Temporal Cues, 2) Monaural Spectral Cues, 3) Binaural Temporal Cues, and 4)

Binaural Spectral Cues. Spectral cues affect the spectrum of the sound that reaches

the ear, while temporal cues affect only the arrival time of the sound signal (31)

1.4 Research Objectives

In order to solve the stated problem the following research objective will be

met:

" Modify AFIT algorithms that have been used successfully on placing sounds in

azimuth and elevation. The algorithms that create 3D sound with the AAMRL

HRTF will be changed to use the ANN HRTF instead. This will allow the

creation of tests to compare the ANN HRTF with the AAMRL HRTF.

" Statistically check the advantage of HRTF and ITD versus ITD only. The

results of the first experiment will provide a significantly large data base to

analyze.

"* Statistically check the difference between AAMRL HRTF data and ANN HRTF

data. The results of the second experiment will provide a significantly large

data base to analyze.

"* Investigate the program LNKmap as a neural network tool for function ap-

proximation.

1.5 Scope

This thesis will process and analyze experimental data to obtain results of the

two experiments. The differences in sound localization accuracies from the HRTF

6

versus no HRTF experiment and the AAMRL HRTF versus ANN HRTF will both be

analyzed by the analysis of variance method (14). To limit the amount of data to be

processed only the 24 azimuth location at zero elevation will be analyzed. This will

facilitate the Neural Network learning time and the amount of data to be analyzed.

1.6 Approach

1.6.1 Synthesizing 3D Cues. Humans use auditory localization cues to

spatially localize aound. Three primary characterists of the auditory signals are

attenuation, time of arrival (TOA) and HRTF. As the sound travels to the body,

obstacles such as walls, furniture or even air attenuate the sound source. If the head

is not directly facing the sound wave, the relative distance from each ear to the sound

source will be different. This gives the brain the TOA cue. Once the sound reaches

the body, the head, the torso, the nose and the outer ear or pinna filter the sound

wave. This filtering function is known as the HRTF. The HRTF is analogous to an

analog signal processing filter (figure 2). As in a analog filter where an input signal is

Input signal -. H Output signal

Figure 2. Block diagram of a typical signal processing transfer function

transformed by a system function, H to an output signal, the HRTF transformation

of the sound appears to facilitate the brain in localization perception (figure 3).

The factors which affect. the HRTF include distance (d), elevation (q), azimuth (0),

HRTFInput Head D Output

Sound Source Torso Inner Ear and Brain

Pinnae

Figure 3. HRTF transforms sound to aid the brain in location perception

frequency (w), torso and head size. The frequency of the sound plays an important

-7

role in determining the HRTF. Although the HRTF varies for frequency and for each

point in space, the torso and head size can be generalized from person to person (3).

For example, the HRTF for a sound located at 300 elevation, 200 azimuth, 10 feet

and 440 Hz will be different from the HRTF for a sound located at 300 elevation,

200 azimuth, 10 feet and 1000 Hz. However, the sound transformed by the same

HRTF will generally be perceived as coming from the same direction by people with

varying size heads and torsos.

The elevation (0) of the sound is measured from the ground plane through the

ears to a sound source (figure 4). The azimuth (0) of the sound is measured from

0

Figure 4. Elevation, 0 measured from the ground plane to a sound source

directly in front of the face counter-clockwise to the sound source (figure 5). The

distance (d) of the sound is measured from the center of the head to the sound source

(figure 6).

All of these parameters can be measured in a sphere (figure 7) to determine

the HRTF for any location and frequency. To measure the AAMRL HRTF, an

anatomically correct manikin bust is positioned at the center of the anechoic chamber

sphere with microphones placed inside the ear canals. Pure sine waves produced by

the speakers are recorded with the ear microphones and used to determine the HRTF.

The sine wave frequency is held constant until that particular HRTF is determined

and then it is incremented for the next HRTF sample. The speaker location contains

8

0 0

Figure 5. Azimuth, 9 of sound source to directly in front of face

0 0

Figure 6. Distant, d of sound source to center of head

9

Figure 7. Geodesic sphere with sound sources at multiple locations(23:8)

the azimuth, elevation and distance information. Smith gives the location in azimuth

and elevation of 272 speakers used to determine the HRTF by the above method

(25). Once all the HRTFs are calculated, they can be used to filter any sound and

its reflections. When attenuation and TOA are combined with the filtered sound,

any recorded sound can be replayed to the listener through headphones to give the

impression of a sound located at a distinct direction from outside the head.

1.6.2 Artificial Neural Networks. Current models of producing 3D sound

from the HRTF require lengthy computations that cannot be accomplished in real

time with a general purpose computer (26). Although two special purpose computers

have been developed, the DIRAD and Convolvotron, this thesis will investigate the

use of artificial neural networks to approximate the HRTF (23). Along with approx-

imating the HRTFs, the networks will provide easy interpolation between the known

HRTF data points. They may also provide insight into the underlying function of the

HRTF. In other words, the networks may show that there is a less complex function

for the HRTF than what is represented by the set of AAMRL HRTF data.

10

1.6.2.1 Multilayer Perceptron. Two types of artificial neural networks

will be studied. The first is the Multilayer Perceptron (MLP). It has been shown by

Cybenko that the MLP can approximate any arbitrary function (20). Figure 8 shows

a three layer MLP (3:10:10:1). The input layer has three nodes corresponding to

Azimuth

Elevation -•

Frequency

Figure 8. Multilayer-Perceptron (MLP) neural network with two hidden layers 10nodes each

azimuth, elevation and frequency. The next two layers or hidden layers have 10 nodes

each called hidden nodes. The last layer or output layer has one node corresponding

to the HRTF. The number of nodes in the hidden layers are arbitrary, however,

Neti, Young and Schneider found that a MLP network with less than 10 nodes

in the hidden layers was sufficient for output direction from input sound (HRTF)

(16). Although this thesis will reverse the inputs and outputs, input direction and

output HRTF, it seems reasonable to start the investigation with a similar network

architecture because it will basically be an inverse transformation of the Neti, Young

and Schneider MLP.

11

The MLP will be trained with HRTF data from Smith's thesis (25). The data

contain 25,296 HRTF taken from 272 speaker locations at 93 different frequency. The

speaker locations range from 0' to 3600 in azimuth and -90o to 900 in elevation. The

frequencies ranges from 100 Hz to 20,000 Hz. A draw back of the MLP is the time

required to train the network. Three layers of weights have to be trained. Therefore,

the second network to be studied is the Radial-Basis-Function (RBF) artificial neural

network.

1.6.2.2 Radial-Basis-Function. RBF networks are broad enough for

universal approximation (19). The RBF is a two layer network. Figure 9 shows a

RBF network with 8 nodes in the hidden layer. Again, the input layer has three

Centers

Azimuth

Elevation HRTF

Frequency

Figure 9. Radial-Basis-Function (RBF) neural network

nodes corresponding to azimuth, elevation and frequency. The next layer or hidden

layer has 8 nodes corresponding to the centroids of the clustering algorithm. The

last layer or output layer has one node corresponding to the HRTF output. The

RBF will be trained with the same data as the MLP above.

12

1.6.3 LNKmap. The computer program LNKmap allows the user to train a

number of different artificial neural networks on a SUN workstation. Both the MLP

and RBF networks are available on this system. Since the HRTF is a complicated

function to approximate, bugs in the LNKmap program would be hard to spot.

Therefore, relatively simple functions were tested first to verify the inner workings

of the LNKmap program. For example, a one input, one output function like y =

cos(27rx) (figure 10) and a two input, one output function like z = cos(27r(x 2 - y2))

(figure 11) were trained and analyzed. From the training of these simple functions,

0.8

0.6

0.4

0.2.5_

-0.2

-0.4

-0.6

--0.8

0 1 2 3 4 5 6 7x-axis

Figure 10. Training function y = cos(27rx)

insight into the workings of LNKmap was gained. If LNKmap could not have been

trained on one of these simple functions it would not have been able to be trained a

complicated function like the HRTF. If this was the case, other programs would have

been investigated to train the networks. Once a program was successfully trained

on the simple functions, the process of training on the HRTF data was began.

13

0.5. ,;ii•

01-0.,5

0.8 ..

0.6 0.80.4 0.6

0.2 0.2 0.4

y-axis 0 0 x-axis

Figure 11. Training function z = cos(21r(x 2 - y2))

1.7 Thesis Outline

Chapter one is an introduction to this thesis. It presents a brief background

of auditory displays which leads into the problem statement for this thesis. After

that it defines key terms and research objectives. Finally it presents an approach to

synthesizing 3D cues and creating articficial neural networks.

Chapter two continues the discussion of visualizing data that was introduced

in chapter one. It also contains a review of past research in sound localization.

Specifically, the work of Oldfield and Parker, and Begault and Wenzel is presented

in great detail because of their direct correlation with this thesis. This chapter

finally presents a historical record of the learning of the program LNKmap and the

background needed to train an ANN with the HRTF data.

The previous chapters explored the background of 3D sound and ANN. Chapter

three will combine these two concepts to train a ANN to approximate the 3D sound

HRTF. This chapter presents the methodology used in training a MLP and RBF

14

ANN. It also presents the objectives and methodology of two experiments. Finally,

it presents a detailed analysis of variance model used to analyze the experimental

results.

Chapter four presents the results of the two experiments conducted during

this thesis work. Before presenting the results of each experiment a discussion of the

types of errors analyzed in the experiments is presented.

Chapter five presents the summary, conclusions and recommendations for fu-

ture research.

15

II. Background

2.1 Introduction

The visualization of the data is important to help determine its characteristics.

Viewing the HRTF data is a classic case of how to visualize multidimensional data.

The HRTF data consists of 25,296 data points. Broken down, this consists of 272

azimuth and elevation locations: 0' to 350' in azimuth and -82' to 820 in elevation.

The azimuths and elevations are spaced so that they are approximately 150 apart over

the whole sphere (see figure 12). For each azimuth and elevation pair, 93 frequencies

".......... ........

.• -- .0" ............................

_-0 -- ---------_6-1

• - ,' 0 a •. ' .

----------- I............ ..................... - - . . .04 0

. . . .. ... .. .. . ........... O. ----------------- ..... O ................ .0 .. . .------.. ! ..... .. . .S-. . 0 0. -.

o~o • o- o• ! o . .... o.-. .... .- o .. ..40:

0 - o 0 0 0 0:9

00 0 000 0..0

"0 .o0: 0

0,0 0 0 0

O,09 0 .0 O _ 0... :... ..... .... •.'-..0 0,OL00

-. - 0 ---- - - - -

Figure 12. 272 speaker locations around the head

ranging from 100 to 19,952 Hz are used allowing the HRTF data to consist of four

axes of data: azimuth, elevation, frequency and response. Figure 13 shows the first

1000 HRTF samples broken down into the four axes. Each line represents the value

of the axis for that sample. For example, the 5 0 0 th sample approximately has a

response of 0.05, an elevation of 70', an azimuth of 1000 and a frequency of 1000 Hz.

16

10, 1

10

102~ ~ ~ ~ ~ ~ ~ ~ ~~ ~~~. ....... •,.........' ....... ;.......•

10 2

.............. : Azimuth

Elevation----------Frequency

100- Response

10'1

10.2

0 100 200 300 400 500 600 700 800 900 1000sample

Figure 13. HRTF samples from 0 to 1000

2.2 Graphical Display

Matlab is a powerful tool to help visualize the data. One approach to visual-

izing 4 dimensions is to set the first three dimensions to the x, y, and z-axis and the

4th dimension to a set color. Matlab has a number of commands to do this.

By using the sphere command, a sphere around the head can be color coded

with the HRTF value. The grey scale level indicates the magnitude of the HRTF.

This display uses only the azimuth, elevation and response data. A separate sphere

is needed for each frequency. Figure 14 shows left ear HRTF data around the head

for a frequency of 5623 Hz. Note the lighter shades of grey on the left side of the

head versus the right side of the head. This is because the left ear has a direct path

from the sound source. Figure 15 shows most of the frequencies for the left side of

the head only. The frequency increases from 119 Hz to 19,953 Hz. These two figures

show graphically that the HRTF magnitude varies both as a function of location and

frequency.

Similar to above where each plot consisted of a single frequency, figure 16

consists of a single elevation. The zero elevation data consists of 24 azimuth points

17

Right side of head Left side of head

Figure 14. Left ear HRTF data around the head for a frequency of 5623 Hz.

with 93 frequencies each. For a total of 2232 data points. The response has a number

of peaks and ridges with a high point localized slightly in front of the left ear at 3000

Hz. Both this view and the spherical view above have a limitations in the amount of

data that can be presented graphically. The following sections present a discussion

of the presentation of data.

2.3 Graphical and Auditory Displays

Graphical and auditory presentations are similar. When data varies with time,

graphical presentation such as animation can provide a powerful analysis tool of

graphical data (6). In the same sense, sound must be presented in time to be heard.

If the length of the sound is too short its pitch, tempo, rhythm, and other attributes

cannot be distinguished. Graphical symbols, known as icons, convey information

in a small amount of space. On many computers with graphical interfaces, icons

represent files, directories or functions. For example, a picture of a sheet of paper can

represent a text document or a picture of a trash can represent a delete file function.

"Earcons," the audio counterpart of icons, similarly convey information (4). The

18

119 126 133 141 150 158 168 178 188 200 211 224 237 251 266

282 299 316 335 355 376 398 422 447 473 501 531 562 596 631

668 708 750 794 841 891 944 1000 1059 1122 1188 1259 1334 1413 1496

1585 1679 1778 1884 1995 2113 2239 2371 2512 2661 2818 2985 3162 3350 3548

3758 3981 4217 4467 4732 5012 5309 5623 5957 6310 6683 7079 7499 7943 8414

0 9 1 1iO01 0 0OOs0 008913 9441 10000 10592 11220 11885 12589 13335 14125 14962 15849 16788 17783 18836 19953

Figure 15. Left ear HRTF data around the left side of head for frequencies 119 Hzto 19,953 Hz

19

10

10

10

10-

Frequency

Figure 16. Zero elevation HRTF data for the left ear. Frequency and Responseshown in logarithm scale

20

beep of the computer when it displays a warning message is an earcon. The two

tones of a door bell is an earcon conveying the message that someone is at the door.

In graphical presentation, color is often used as the fourth dimension when presenting

4D data. Sound resolution is similar to color resolution (6). Both the frequency of

pitch and the frequency of color are logarithmic. (blue 435.8 nm, green 546.1 nm and

red 700 nm) (9) Octaves provide expression of logarithmic variance (6). On the other

hand, sound is different from a graphic display. As discussed above, the graphical

presentation of multivariate data is commonly limited to three or four dimensions. In

contrast, the human ear can discriminate between any two of 400,000 different sounds

presented in rapid succession. In addition, most people can identify 49 different

sounds at one time (6). The different sounds can be used to represent different

dimensions of data. For example, pitch, volume, duration, attack, fundamental wave

shape and fifth harmonic wave shape can be used to represent 6 different dimensions.

Yeung used nine to twenty dimensions of sound (6). Matthew used a combination

of auditory and visual presentations of data (6). Matthew had five dimensions, x

and y visually and frequency, timbre and amplitude phonically. Smith, Bergeron

and Grinstein also combined auditory and visual presentations of data (26). In their

scheme, the auditory data representation is triggered by the position of the mouse

on the visual display. Most of the past research has been with 1D and 2D sound.

Although Bly did not use 3D sound in her thesis, she suggests using 3D sound for

the visualization of three dimensional data (6). "Several sound characteristics were

not used and these deserve attention in the future. Certainly location is an easily

detectable attribute of sound." (6:40)

2.4 Acuity of Sound Localization I

In an effort to investigate the acuity of sound localization, Oldfield and Parker

conducted a study of 8 subjects who judge the apparent spatial location of white

noise through speakers over a range -40' to +400 in elevation and 00 to 180' in

21

azimuth (17). There are two major binaural cues: Interaural Time Difference (ITD)

and Interaural Intensity Difference (lID).

2.4.1 Interaural Time Difference.

"* Interaural Time Difference is a function of the distance between the two ears,

the angle of incidence of the incoming sound, and the frequency of the sound (25).

"* ITD is frequency independent below approximately 500 Hz and above approx-

imately 3000 Hz; furthermore, between these frequencies ITD decreases as fre-

quency increases. For azimuth angles of incidence less than 60 degrees left or

right of the front of the listener (0 degrees), the minimum ITD occurs between

1400 and 1600 Hz. (25, 11:165-166).

"• The most significant changes in ITD occur at angles of incidence between 0

and 30 degrees, consistent with the fact that humans perceive a sound best

when the source is directly in front of the listener (25, 1:700).

2.4.2 Interaural Intensity Difference.

"* Interaural Intensity Difference is a function of the distance between the two

ears, the angle of incidence of the incoming sound, and the frequency of the

sound. IID can also be affected by size and shape of the torso, head, and ears.

The torso affects primarily the low frequencies (25).

"• The most significant changes in IID occur at angles of incidence between 0 and

30 degrees, consistent with the fact that humans perceive a sound best when

the source is directly in front of the listener (25, 1:700).

"* Improvement in localization capabilities of humans at frequencies above 3000

Hz is a result of improvement in the IID cue. (25, 11:166).

"* At frequencies above 8000 Hz, IID cues increase human capabilities to localize

in elevation as well as in azimuth (25, 13:101).

22

From these two cues, there is a range of positions which could account for

the same pattern of interaural differences. In a simplified model of the head, these

positions form a cone. Watkins calls it the "cone of confusion" (29). Oldfield and

Parker presented a second set of cues which they describe as spectral modifications

by the pinna and head. These spectral modifications are what Begault and Wenzel

call the head related transfer function (HRTF) (3). Past studies show evidence of

non-binaural cues (HRTFs) (17). These studies use sound sources in the median

sagittal plane. The median sagittal plane divides the body into left and right sides.

Other studies blocked the pinnae or low pass filtered the sound sources and confirmed

that localization ability is lost. Few studies before 1984 use sound sources outside

the median sagittal plane (17).

2.4.3 Experiment 1. The Oldfield and Parker experiment was conducted

in an anechoic chamber for frequencies down to 250 Hz. The apparatus for the

experiment consisted of a boom for positioning the sound source, a photographic

recording system and pointing gun. The subject's head was held stationary. The

importance of head motion in human auditory localization cannot be overstated.

Without head motion, a person's head and ears can still act as a directional antenna;

however, sounds located at 0 degrees azimuth and various elevations (i.e. on the

medial sagittal plane) can be extremely difficult to localize. Front/back and up/down

confusions result when head motion is not allowed; this occurs because the spectral

and temporal cues are very similar at both ears (31, 5, 25). The implementation

of head motion into a binaural room simulation requires that a real-time system be

used (23). For this reason, head motion was not implemented into the non-real-time

simulation done for this thesis.

The sound was white noise which the subject could activate through a hand-

held switch. The subject used a pointing gun to "shoot" the sound. The position

was recorded with the photographic recording system. The advantage of pointing is

that the response is not limited to preset locations. The disadvantage of pointing is

23

the limitation of the accuracy with which the hand can be positioned. Fitts found

that blind-positioning movements are most accurate in the dead-ahead, center and

lower tier positions and least accurate at the side and upper tier positions (22, 7) To

avoid pointing inaccuracies in this thesis, the sound locations were fixed and subjects

were required to pick preset locations.

2.4.4 Experiment 2. To validate the pointing technique, Oldfield and

Parker conducted a second experiment to the measure the motor skills of the sub-

jects. The apparatus and procedures were the same as experiment 1 except the

subjects were not blindfolded and the chin brace was loosely fitted. The subjects

were allowed to see the speaker position before the sound was presented. Next the

room was completely darken and the sound presented for the subject to "shoot."

They concluded that the biases in the pointing response did not affect the overall

outcome of the results. Reversals were never observed in the visual task (17).

2.4.4.1 Results. Azimuth error and elevation error were measured.

The absolute and algebraic form of these were computed. Absolute error measured

general acuity and algebraic error measured directional bias. For algebraic error,

positions perceived higher and behind were termed positive errors (actual position

minus perceived position). Front/back reversal were analyzed separately: "a straight

subtraction of the actual and perceived sound position has not been regarded by re-

searchers as an honest representation of error (17, 28:587)." This thesis also analyzed

reversals.

There were few front/back reversals. Most occurred in the upper back quad-

rant. One exception to this was the region near 90' azimuth where the reversals were

uniform over elevation. Another kind of error was what Oldfield and Parker called

"defaults to 90"'. Like reversals the defaults occurred in the upper back quadrant.

Defaults tended to boost the error in the upper back quadrant, whereas reversals

and non-reversals had the same amount of error.

24

2.4.5 Discussion. The results support the model of the cone-of-confusion

in the case of elevation error biases effecting azimuth error biases. Both azimuth

and elevation discrimination is less accurate behind the head. Oldfield and Parker

suggest that sound comings from behind the head do not directionally interact with

the convolutions of the pinna. Front/back discrimination derives from the pinna

when the head is held still. The higher number of reversal in the back region is

consistent with a reduction in pinna cues. Binaural differences cannot be separated

from pinna modification because they are in the signal before any binaural differences

can be processed by the brain.

2.5 Acuity of Sound Localization H

Oldfield and Parker also conducted a study of 4 subjects who judged the ap-

parent spatial location of white noise through speakers over a range -40' to +400

in elevation and 0' to 180' in azimuth (18). In this experiment the pinna of the

subjects were blocked to simulate the effect of localization without the pinna. This

experiment has a direct correlation with the first experiment conducted in this thesis

where the hypothesis of no HRTF is similar to no pinna. The implication is that

the determination of azimuth and elevation depends on more than just the binaural

differences from the sound source. An additional cue is the spectral modification

made by the pinna.

2.5.1 Experiment. The data collection apparatus was the same as described

above in part I of the study (17). Each subject was fitted with individually cast

molds for the pinna. An access hole remained to the auditory canal. Subjects were

presented 171 different speaker locations while blindfolded and head help stationary.

2.5.2 Results. The Oldfield and Parker study presented the relationship

between azimuth error and elevation error. This thesis presents only azimuth error.

With the pinna blocked the overall absolute elevation error was almost double that

25

of azimuth error (11.90 and 21.9' respectively). The largest differences between

absolute azimuth and elevation errors occurred in the front quadrant. The absolute

azimuth error was largest between 120' and 160' azimuth position. Azimuth error

is reasonably uniform and elevation error is less uniform with larger errors near the

extremes. The mean algebraic elevation and mean azimuth errors were generally

positive in the front and negative in the back. The subjects perceived the sounds in

the front pulled higher and toward the ears. The subjects also perceived the sounds

in the back pulled lower and toward the ears. Filling the convolutions of the pinna

has the effect of displacing elevation judgements toward 00.

A comparison of the results of the pinna filled versus unfilled (normal) showed

that there was a increase in elevation error in the pinna-filled case (filled 21.90 versus

normal 8.40). The azimuth error also increase in the pinna-filled case, but not as

great (filled 11.90 versus normal 9.30). The mean absolute azimuth error for the

filled-pinna was generally lower than the normal pinna except to azimuth positions

1400 to 160'. The error did not change significantly by azimuth position, however it

did change significantly by elevation position.

A number of subjects made front/back reversals. Reversals accounted for 26%

of the responses: 31% were front/back reversals and 21% were back/front reversals.

These results are compared with the findings in this thesis in chapter 4.

2.6 Headphone Localization of Speech

A study of 11 inexperienced subjects who judge the apparent spatial location

of headphone-presented speech stimuli filtered with non-individualized head-related

transfer functions (HRTFs) was conducted by Begault and Wenzel. (3). This is in

contrast with the Oldfield and Parker study who used experienced subjects listening

to white noise from an actual sound source in three dimensional space.

HRTFs are "the listener-specific, direction-dependent acoustic effects imposed

on an incoming signal by the pinnae (3:361-362)." It should be noted that not only

26

the pinnae, but the head and torso effect the incoming signal also. Note also that

this definition says "listener-specific". Although HRTF are "listener-specific", they

generalize to other people. The main thrust of the Begault and Wenzel study is

to investigate how non-individualized HRTFs work. The HRTFs used are from the

pinnae of a good localizer (or as Dr Rogers' would say a person with a "golden ear").

The study decides against simple averages of HRTFs to avoid possibly elimination

of distinctive spectral features. This thesis used AAMRL HRTFs (12).

Eleven adults were used in the Begault and Wenzel study. They were screened

by oral questions about hearing loss, recent exposure to loud noises and work en-

vironment. There is no apparent relation between audiometric measurements and

binaural performance (3). Therefore this thesis will forego the audiometric measure-

ment of subjects.

The sounds used in the study were from a set of 45 one- or two-syllable words

representing a particular phoneme. The sounds were filtered with the HRTFs to

simulate sounds coming from 0' elevation and 0', ±30', ±-60', ±90', ±120', ±150'

and 180' azimuth. Similarly, this thesis used the male voice utterance "seventeen"

presented at 0' elevation and 24 azimuth locations. Begault and Wenzel conclude

that most listeners can obtain useful directional information from speech without

requiring individual HRTFs. Taking this result a step further, this thesis investigated

the hypothesis of obtaining useful directional information from speech filtered with

a neural network trained HRTF.

2.7 Learning LNKmap with a simple function

2.7.1 Linear function. During the spring of 1994, the program LNKnet

was gaining popularity at AFIT as a neural network classifying tool. LNKnet's

companion program for function approximation (LNKmap) had not yet been fully

investigated. To begin the exploration of LNKmap, a simple function i.e. y = x was

inputed into LNKmap.

27

Since LNKmap was similar to LNKnet the file structure was assumed to be

the same. Therefore the input data file should be in two columns with the output

values in the first column and the input values in the second column. A small file

was used starting with 10 points between zero and one (see table 1). Since the

y x.1 .1.2 .2.3 .3.4 .4.5 .5.6 .6.7 .7.8 .8.9 .9

1.0 1.0

Table 1. Simple linear function

function is linear, a linear output node function was used. Most network users call

the "output node function" a "non-linearity function". A common starting point is

a 3 layer multi-layer perceptron (MLP), 1 input node, 1 hidden node and 1 output

node notated 1:1:1 (figure 17). Some researchers refer to this architecture as a 2

InPut Node Hidden Node Output Node

Figure 17. A 3 layer net, 1 input node, 1 hidden node and 1 output node notated1:1:1.

layer net referring to the number of weight layers. This thesis will try to use the

colon notation where ever a confusion may exist. Many classifiers present the data

in random order, however for function approximation it is best to present the data

in order (21). With the training set described above the net learned the function in

about 100 epochs.

28

0.9-

0.8-

0.7

M 0.6

0.4-

0.3

0.2 - original-- netoutput

0 ,1 0:2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

x-axis

Figure 18. y = x (solid) and MLP (1:1:1) output (dash) after 100 epochs

Next a standard sigmoid on the same data was tried while also increasing the

number of hidden nodes to 5 (1:5:1). The net learned the function in about 1000

epochs (see figure 19). In an effort to get a lower error a second hidden layer with

one node (1:5:1:1) was added. This architecture is shown in figure 20. After a 1000

epoch, the net had not learned the function. More nodes to the second hidden layer

(1:5:5:1) were added. After another 1000 epoch, the net had not learned the function.

Adding more nodes seems to slow the learning process therefore one hidden

layer was used in subsequent trials. With an architecture of 1:10:1 and a standard

sigmoid output node function, the net learned the function (see figure 21). With the

same architecture and using the symmetric sigmoid output node function, the net

learned the function with an even lower error (see figure 22).

2.7.2 Non-linear functions. At this point, some nonlinear data was tried.

The 10 point data was modified to be low near the end points and higher in the

middle (y - -X2, see table 2 and figure 23). The LNKmap plot of the data

approximated the function x - -y 2 (see figure 24). This indicated that the columns

of the data file were reversed. This discovery uncovered that LNKmap needed the

29

0.9-

0.8-

0.7--

0.5-6 /

0.4-

0.3

0.2 - - oi a0. b.- "I•"0

.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

x-axis

Figure 19. y x (solid) and MLP (1:5:1) output (dash) after 1000 epochs

Input Node Hidden Node Hidden Node Output Node

Figure 20. A 4 layer net, 1 input node, 5 hidden nodes,1 hidden node and 1 outputnode notated 1:5:1:1.

30

0.8 -

0.7-

. 0.6/

'~0.5-

0.4-.

0.3-/

0.32-- original

0.2 -netoutput

1 0.2 0.3 0.4 015 0.6 0.7 018 0.9 1

x-axis


0.9

0.8

0.7

0.6-

0.5 -

0.4-

0.3-

0.2 -,- oiia

0.1 , -- netoutput

l. 1 012 013 014 0'5 016 017 0.8 0.9x-axls


31

y x

.1 .1

.4 .2

.6 .3

.9 .41.0 .51.0 .6.8 .7.5 .8.3 .9.1 1.0

Table 2. Simple nonlinear function

0.9-

0.8

0.7

•0.6-- -

0.5-

0.4

0.3-

0.2-- oginal0.2

- - netoutput

1 0.2 0'3 0'4 0.5 0.6 0.7 0.8 0.9

x-axis

Figure 23. y - -x 2 (solid) and MLP (1:10:1) output (dash)

32

0.9

0.8

0.7-

S0.6 - - -_

08.5

0.4

0.3

.2 - - netoutput0.2

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

x-axis

Figure 24. x _ -y 2 (solid) and MLP (1:10:1) output (dash)

data presented in a ASCII matrix where the first columns were the input parameters

and the last columns were the output parameters. This is contrary to LNKnet where

the first columns are the class. A correction to the columns was made and training

of a number of net configuration was resumed. The net could only match a straight

line to the data which contributed to the low number of data points.

2.8 Learning LNKmap with a Cosine function

After determining the correct file format, a trigonometric function was tested.

The function to be trained was y = cos(27rx), where x ranged from 0 to 1 with

a step size of 1 or 1 for a total of 629 points. Using the same methodology

as before, starting with a net architecture of 1:1:1 and linear output node function

many configurations were tried settling on two hidden layers with 10 nodes each

(1:10:10:1). Through all these trials, the current experiment was always continued

and randomly presented the data. Although, as mentioned above, the data should

be presented in order, the strategy at this point in the investigation of LNKmap was

to follow the same procedures as LNKnet. The results of the training was that the

net learned the first half of the cosine curve and then proceeded asymptotically to a

33

line parallel to the x axis for the second half of the curve. The approximate number

of epochs at this point was 10,000 to 20,000.

At this point it was concluded that random presentation of the data was not

working and the data was presented in the order of the input file (x increasing from

zero to one). The net learned the function.

In order to duplicate the results, Capt Smith used the same cosine training

file. He trained the net for 3000 epochs using random presentation of the data. The

net learned the first half as above. Next he continued to train the net to present the

data in order. The net learned the whole cosine function. Unfortunately, plots of

this data are not available because the LNKmap program has since been changed as

discussed below.

2.8.1 Cosine Revisited. From early training attempts with LNKmap, it

was determined that LNKmap did not correctly train on data with more than one

input. The author of LNKmap was contacted and a fix was obtained. After this fix,

the cosine function was retrained using both a RBF and a MLP architecture. This

time the function trained was y = cos(x), where x ranged for 0 to 27r with a step

size of 0.01 for a total of 629 points (see figure 25). Before training, the data was

normalized to values between 0 and 1 using a Dynamic Range Compression (DRC)

algorithm (see figure 26). Given data in a matrix where each column represents a

variable, the DRC subtracts the minimum value for each column from each value

in the column and divides by the maximum value of the difference for each column.

The Matlab code can be seen in appendix C.

2.8.1.1 RBF Architecture. The first architecture used for training

was a Radial Basis Function (RBF) using a K-means clustering algorithm. The RBF

was trained using 1, 2, 8, 9 and 32 centers in the K-means clustering. Figure 27 shows

the result with 1 center in the K-means clustering. Figure 28 shows the result with

2 centers in the K-means clustering. Since the cosine function in the range 0 to 27r

34

0.8

0.6

0.4

0.2

0-

-0.2

-0.4

-0.6

-0.8

0 1 2 3 4 5 6 7x-axis

Figure 25. Training function y = cos(x), where x ranged for 0 to 2ir with a stepsize of 0.01 for a total of 629 points

0.9

0.8

0.7

0.6-

0.5

0.4

0.3

0.2

0.1

C0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

x-axis

Figure 26. Dynamic Range Compression of y = cos(x) to values between 0 and 1

35

is essentially unimodal only one clustering center is needed. Forcing more than one

center increases the error. The LNKmap clustering algorithm does not place centers

in the same place. A different clustering algorithm may have placed the centers in

the same place. Figure 29 shows that adding more centers will reduce the error, but

at the expense of a larger network.

1.5

1 \ /

0.5/

0 /

-0.5 • ,, [- original

- - netoutput

-10.5

0 1 2 3 4 5 6 7x-axis

Figure 27. y = cos(x) (solid) and RBF (1:1:1) output (dash) with 1 center in theK-means clustering algorithm

2.8.1.2 MLP Architecture. Next a MLP architecture of 1:10:10:1

was chosen to match the best results from the previous cosine training (before the

software fix). The MLP'learned the cosine in only 220 epochs. Figure 30 shows the

results after training for 15, 100 and 220 epochs respectively.

2.8.2 Conclusion. It has not been determined why the network needed the

data presented first randomly and then in order to learn the function. Presenting the

data only randomly or only in order did not produce satisfactory results. The obvious

answer is that a bug was in the program. After the fix the program performed as

36

0.8-

0.6

0.40.2 "'\,

-0.2

-0.4-

-0.6 - original

-- netoutput-0.8

0 1 2 3 4 5 6 7x-axis

Figure 28. y = cos(x) (solid) and RBF (1:2:1) output (dash) with 2 centers in theK-means clustering algorithm

1.5 1.5,

l i II

0.5 .

9 0 0 • O

I It 1 1% 1

1 I -11 I: 1 7 ,I

-15ýSo -1.56"05 3 4 5 150 1 2 4 5 7

x.-a•d x-axis

Figure 29. y =cos(x) (solid) and RBF (1:9:1, 1:32:1) output (dash) with 9 and 32

centers in the K-means clustering algorithm

37

S/ /0.6 -,• 0.61 0.6

0.4 , 0.4

0.2 - 02 / 0

0 10-02 -02 -0.5

-0.4 -0.4

-0.8 ..8S

0 1 2 3 4 5 6 7 0 I 2 3 4 5 6 7 '0 2 3 4 5 6 7X-fi i.-0i

Figure 30. y = cos(x) (solid) and MLP (1:10:10:1) output (dash) after 15, 100 and220 epochs

expected. Meaning when the data was presented in order the network learned the

function faster. Regardless, the major effort of this investigation was to learn how

to use LNKmap as a function approximation tool. This task was accomplished.

2.9 Learning LNKmap with a Two Input Cosine function

The next function tested was the "bird function," z = cos(27r(x 2 - y2)) where

x and y ranged from 0 to 1 for a total of 10201 training points (see figure 31). This

01"- 0.5:"1::";

function was chosen because of its simplicity and the output values were relatively

38

low between -1 and 1 so no normalization was required. Expanding the axes of the

bird function actually shows the whole function to be a symmetric cross shape (see

figure 32).

0.5.

X ,

0 .5"

-0.5

0 ~ 0.-0.5 -0.5

y-1xis -1 -

y-axisx-axis

Figure 32. z = cos(27r(x 2 - y2)) where x and y ranged from -1 to 1.

2.9.1 RBF Architecture. Two network architectures were tested with the

bird data. The first was a Radial Basis Function (RBF) using a K-means clustering

algorithm. The RBF was trained using 8, 64 and 128 centers in the K-means clus-

tering (see figure 33). Power of two centers above 128 (i.e. 256, 512, etc...) caused

an error in the program. The best the LNKmap RBF algorithm could learn the

bird data is shown in -figure 33. The placement of the clustering centers can be seen

as bumps in the bird's back. The RMS error for the training using 8, 64 and 128

centers is shown in figure 34. The highest error was seen near the edge of the graphs

where the function was constrained. Increasing the number of centers in this case

did not reduce the error. The mean error for the 64 centers case was 0.1229 while

the mean error for the 128 centers case was 0.1273. Figures 33 and 34 do not show

39

much difference between the 64 center and 128 center cases. To examine the effect of

doubling the number of clustering centers from 64 to 128, the difference between the

two outputs is shown in figure 35. The average difference was 9.3888 x 10-5. The

difference between the two RMS errors is shown in figure 35. The average difference

of the RMS error was 0.0044.

0.5

-441-l 1 -1

0.8 0 .8 oo°06 Ci Ci 0.8 oi 0.8

0 0 0.0 0.4 0.6 .0. 0.4 0. 0 CA

y-Mu 0 0 L*od y-aM 0 0 -my.-3,o 0 0 x-ali

Figure 33. Output of RBF with 8, 64 and 128 centers in the K-means clusteringalgorithm

20.

0, 0. 0 00 0

00.4 .4 O T 0.4 0.0 . 0.4 0.6-0. 4 0- . 0 -a0. 2 0.2

y-_u 0 0 x-rd y.-4xi 0 0 X-Ax y.-Uxi 0 0

Figure 34. RMS error of RBF with 8, 64 and 128 centers in the K-means clusteringalgorithm

2.9.2 MLP Architecture. The second network architecture used to train on

the bird function was a Multi-Layer Perceptron (MLP). Three MLP architectures

were chosen to train the bird data. For each architecture, the non-linearity function

used was the symmetric sigmoid (see figure 36). The first architecture had one hidden

40

022

0, 0.11

-42

-0.3 4 -0.0

0.8 0.8

0.6 0.8 0.0 0.8040.6 O.4A 0.6

0 0.4 040.2 022 0.2

y-axis 0 0 x-,ax y-Wcs 0 0 Xaiy-r x-aaia

Figure 35. Difference between outputs (left) and RMS errors (right) using 64 and128 centers in the K-means clustering algorithm

S. .............. ......... i...... ..... .. •.

-• .i ... .. . . .............. ...... ... ... ........ ..... i . .... .... . . .. . .. .

Ma2..... .... .. ...... ... ....... . .. .i. .......i ........ .

-a4 . ....

-a•_ •. ........... • :. ......... , ........... ........... i ....... ..... ..... .......,

-• . .. .... ... ...:, . ... ... .... i .. . .... ...... ...... .....: .......... . . .......... . . ...

-6 - ~ -2 0 2 4 a

Figure 36. Symmetric sigmoid used for non-linearity function V 2 _11+e4

41

layer of 10 nodes (2:10:1). Figure 37 shows the progress of learning for 220 and 440

epochs respectively. The RMS error is shown in figure 38. The second architecture

0.5 0. 8

0. 6 . 0. 6 0.8

0.4 0 .,0. 0 .4

0.2 0.2 040.2 0,2 0.

y- is 0 0 x-axis y-axis 0 0 x-ixis

Figure 37. Output of MLP after 220 and 440 epochs with architecture of 2:10:1

4,

0: 0

0.0 0.8 0.6 0.80.4 0.4 0.4

02 0.2 0,2 0.2

y-axis 0 0 x-ax• y..•S 0 0 x-8A•

Figure 38. RMS error of MLP after 220 and 440 epochs with architecture of 2:10:1

had one hidden layer of 64 nodes (2:64:1). This architecture was chosen to compare

with the 64 center RBF. Figure 39 shows the progress of learning for 20, 50 and 100

epochs respectively. The RMS error is shown in figure 40. Comparing figures 33 and

39 it is clear that the RBF outperformed this architecture in both speed and error

even though the architectures have basically the same number of nodes. The third

42

,5 05.

-0051,r

08

0. 2 0.2 020yo 0 0 X.-Am • 0 xz 0 0 x-ayi

Figure 39. Output of MLP after 20, 50 and 100 epochs with architecture of 2:64:1

3.5,4

2.5,

0.DAJ 0.6 0.8 0.6 0,8040.6 0.4 0.8 0.4 .

O 0,4 0,4 0.40.2 '0.2 0.2 0.2 012 .2

y- m~i 0 0 ,-i y.-0~i 0 0 X-X"yLk 0 0 X-I~is

Figure 40. RMS error of MLP after 20, 50 and 100 epochs with architecture of2:64:1

43

architecture had two hidden layers of 10 nodes each (2:10:10:1). Figure 41 shows

the progress of learning for 20, 220 and 440 epochs respectively. The RMS error is

shown in figure 42.

11 1.1

05 05

0 6 02 000.6 0,8 1 A, 0,8 0. 0

0,4 .5 0. a,6 0A4 0.

-,2 0.4 0.4

-le 0 0 x- y0 0 a a .m•y 0 0 x-r

Figure 41. Output of MLP after 20, 220 and 440 epochs with architecture of2:10:10:1

Z61 2, 21

20,

0.5 05 .

020 1:0IO0.0. 0.8 0.8 ,!0.

050. ' 0. , '. 8 0.6 .4 0.4 04 0A 0. 0.4

020.4. 0.' 1.2 0.1 0.4

y-lmi 0 0 -• -x 0 0 -z -- il 0 0 x-x

Figure 42. RMS error of MLP after 20, 220 and 440 epochs with architecture of

2:10:10:1

2.9.3 Conclusion. Comparing figures 33 to 41 with figure 31 (the original

bird data), it is clear from these results that the MLP was able to learn the bird

data better than the RBF. However it is important to note that this result does not

generalize to other data and the different architectures produced alternate results.

The main purpose of the bird data tests was to verify that the LNKmap program

could approximate a function with more than one input.

44

2.10 Conclusion

This chapter explored the background of both actual and simulated 3D sound.

It showed that natural errors occur (i.e. reversals) with both actual 3D and simu-

lated 3D sound sources. The head, pinna and HRTF are important for 3D sound

localization. It also showed that non-individualized HRTFs can be successfully used

with speech.

The chapter also explored the use of LNKmap as a tool for function approxi-

mation. LNKmap was able to successfully train a number of different neural network

architectures from simple linear and non-linear functions to two input trigonometric

functions.

45

III. Methodology

3.1 Introduction

The previous chapters explored the background of 3D sound and neural net-

works. This chapter will combine the two concepts to train a neural network to

approximate the 3D sound HRTF.

3.2 Learning HRTF Data for Zero Elevation

3.2.1 MLP Architecture. Having shown that the LNKmap program works

correctly for multiple input data, the next step is to map the Head Related Transfer

Function (HRTF) data. The HRTF data consists of 25,296 data points. Broken

down that is 272 azimuth and elevation locations: 100 to 3500 in azimuth and -90'

to 900 in elevation. For each azimuth and elevation pair, 93 frequencies ranging from

100 to 20,000 Hz are used. To simplify the training, only the zero elevation is used in

this thesis. The zero elevation data consists of 24 azimuth points with 93 frequencies

each. For a total of 2232 data points (see figure 43). Since the MLP worked the

1002

10"

102

10, 3 00

10010' 400 AzImuth

Frequency

Figure 43. Zero elevation HRTF data for the left ear. Frequency and Responseshown in logarithm scale

46

best with the bird data, the zero elevation was first trained with a MLP with an

architecture of 2:50:50:1. The output node function was a symmetric sigmoid. The

weights were updated after each trial (no batch). Figure 44 shows the results after

7235 epochs. The MLP extremely smoothes the HRTF data. An experiment needs

100-

102 100

10' 200 '010' 10d•300

10' 400 Azimuth

Frequency

Figure 44. Zero elevation MLP (2:50:50:1) after 7235 epochs. Frequency and Re-sponse shown in logarithm scale

to be constructed to determine if the jaggedness of the actual HRTF data is needed

to perceive 3D sound. This is not addressed in this thesis.

3.2.2 RBF Architecture. The RBF was tested with the zero elevation data

next. A K-means clustering algorithm with 256 centers was used. Compared with

the MLP the RBF trained a lot faster and results were even better from a visual

standpoint. Figure 45 shows the results of training with the RBF. Blank areas in

the figure are where the RBF output was negative or zero (logarithmic scale requires

positive values). Since negative HRTFs are not possible, they will be zeroed for filter

design. The RBF is able to more closely match the jaggedness of the actual HRTF

data. Figure 46 show the RMS error for the MLP and RBF networks respectively.

A close inspection of figure 46 reveals that the maximum magnitude for RBF error

(0.2707) was actually higher than the MLP error (0.1573). However the MLP had

47

10".

10*.

0

10, 100

10, •300

10' 400 Azimuth

Frequency

Figure 45. Zero elevation RBF with 256 centers. Frequency and Response shownin logarithm scale

0.2-02.

0.15 0.25

10.1 102.1

0.1,0.05

0 0.05 0

0 100 100

lto0 3 0010 WOO 4300

lie le 101

10o 4 okih 400 4lAzitbFrequ / Fneqy

Figure 46. Zero elevation RMS err for the MLP (left) and RBF (right)

48

more error overall. The sum of the MLP and RBF error was 7.3518 and 5.8528

respectively.

3.2.3 Conclusion. It would be premature to generalize that the RBF works

better than the MLP for jagged data like the HRTF data as compared to smooth

data like the bird data. The MLP and RBF had two different architectures. The

number of weights used by the MLP used was 100 weights (50 weights per hidden

node) while the RBF used 512 weights (256 centers by K-means and 256 weights).

That is a ratio of 5 to 1. However the RBF does have the desirable feature of speed.

3.3 Experiment 1: With and Without HRTF

In the process of manipulating and filtering sound sources with the HRTF an

obvious question arose, "Whether or not the HRTF data helped in 3D localization?"

To answer this question experiment one was conceived.

3.3.1 Objective. Determine whether or not the utilization of AAMRL's

HRTF data for filter design provides a specific advantage over the use of only ITD's

in generation of a binaural signal. The filters were designed using the program ESPS

by Entropic Research lab. The design was an FIR filter using the weighted mean

square error criterion. The weights corresponded to the AAMRL HRTF for the 93

frequencies at the test location. A separate filter was designed for each ear and for

each location tested.

3.3.2 Method. Twenty subjects were presented 48 binaural sounds through

a set of headphones. -In this forced choice experiment, the subjects were asked to

select the location from which they perceived the sound. Subjects choose from a

display provided on a computer screen by moving the mouse to place the cursor over

the perceived location and clicking the button (see figure 47). The display had only

24 possible azimuths to choose from, and the elevation angle is always zero. These

49

Figure 47. Display provided on a computer screen for selection of the perceivedlocation

24 locations corresponded to the 24 possible locations of the computer generated

binaural signals. The signal used in each case was either digitally filtered and de-

layed appropriately for a given angle of azimuth, or simply delayed appropriately to

account for the angle. In either case, the original signal was the speech pattern for

a male utterance of the word "seventeen."

Subjects were asked to close their eyes during the data collection phase of this

experiment. Interaction effects between the subjects and the two fixed factors (angle

and method of signal generation) in this experiment are assumed to be zero. The

angle at which the signal is designed to come from was called factor B, and has

24 levels (one for each azimuth). Factor A represented the method by which the

binaural signal was generated, and has 2 levels (one for HRTF combined with ITD

and one for ITD only). Factor C represented the subjects.

3.3.3 The Model. The model used in experiment one is an analysis of

variance (ANOVA) model for a reduced three-factor experiment (14). It is assumed

that the a • b . c treatment combinations represent random samples of size n, where

50

a = 2, the number of filters; b = 24, the number of azimuths; c = 20, the number of

subjects; and n = 1, the sample size. The model is

Xijkl = g + ai + /3+ + 7 + (c4)ij + (Q'Y)ik + (WY)ik + eJ (1)

where

j = 1, 2,...,Ib

k 1, 2, ... ,I c

1) 2= , ,...,In

The XiJkL are assumed to be normally distributed and the following restrictions

apply:

Zc = 0, 0, E 7k =o0 (2)i 3 k

E(a)ij = 0, E(/)ij = 0 (3)i 3

E(a)ik = 0, E(a-y)ik = 0 (4)i k

E(ZY)ik = 0, E(!3')ik = 0 (5)j kc

Here ai, Oj, and 7k represent the main effects due to factors filter, azimuth and

subject respectively. -The terms (a 3 )ij, (a-y)ik, and (!3 y)jk represent interaction

between the factors. Let

Nijk. = mean for the (ijk)th treatment combination

•ij.. = the average of the population means for those

populations receiving the ith level of A and

51

the jth level of B= Aijk.

(6)k=1 C

-t=... the average of the population means for those

population receiving the ith level of Ab c i k(

7EE=bcZ . (7)j=1 k=1

1,t the average of the population means for all

populations under considerationa b c

E- E.- ike (8)i=ij=ik=1 abc

(9)

The other mean terms are defined in a similar manner:

b A 2i3k. (10)

j='

a/A.jk. = .= /-Jka (11)

a C

a- -(12)" '-"= = ac

k. = E E aIijk. (13)i=1 j=1

(14)

The main effects and interaction effects are defined as follows:

-i = Ai...- A (15)

,j = t. 3...- tI (16)

Ilk = A..k.--A (17)

(ao)ij = Itij.. - fti... - /t.j.. + IL (18)

(QY)ik = /tik.. - /i... - /..k. + A (19)

52

(,3 7)jk = Ajk.. - A.j..- A..k. + A (20)

The totals for the three-way ANOVA are shown next:

Ti...= sum of all responses from experimental units receiving

level i of A and B; where x is the absolute difference

of the actual location and perceived location of the soundb c n

= E E E Xijkl (21)j=1 k=1 1=1

Ti5..= sum of all responses from experiment units receiving

level i of A and j of BC nl

= E E Xijk (22)k=1 1=1

The other total terms are defined in a similar manner:

a c n

Tj.. = E E E Xikl (23)i=1 k=1 1=1

a b n

T.k. = E E E Xijkl (24)i=1 j=1 1=1

b n

Ti.k. = E E XiZkl (25)j=1 1=1

a n

Tjk. = Z Xjkl (26)i=1 1=1

n

Tjk•. = Xijkl (27)

a b c n

T... = E E E E Xijkl (28)i=1 j=1k=1 I=1

The sums of squares needed in the three-way ANOVA are:

ssA = T?.. T. (29)SSA bcn abcn

53

bT 2 T.2...(0Bs = 3+ (30),

j=1 acn abcn

C T2 k T2C = . .... (31)

k=1 abn abcna b T . a b .. + ... (32)

SSAB =EE -+i=1 j=1 cn i=1 bcn .= acn abcn

c __ i. T2!. .SSAC = bn - a - n a bc (33)

i= 1 k= 1 b i --1 b---1 a n a bcn

SSBC = c _ "2 .. . __.. (34)

3=lk=1 an a acn k- abn abcn

a b c n T2

SStotal = k - ab c. c n (35)i=1 i=1 k---1 1=1

SSE = SStotal - SSA - SSB - SSc - SSAB - SSAC - SSBC (36)

Finally an ANOVA table as shown in table 3 is set up to test the null (H0 )

and alternative (H1) hypothesis:

Ho1 ): There is no interactions of the factors A and B means;that is, (ao3 )ij = 0.

Ho2): There is no interactions of the factors A and C means;that is, (ay)ik =-

Ho3): There is no interactions of the factors B and C means;that is, (0Y)i)k = 0.

H•4): Factor A means are equal; that is, ai = 0.H•5): Factor B means are equal; that is, O3 = 0.

Ho): Factor C means are equal; that is, yk = 0.

H•'-6): The means are not equal.

3.4 Experiment 2: AAMRL HRTF and RBF HRTF

3.4.1 Objective. Determine whether or not the utilization of AAMRL's

HRTF data for filter design is statistically different from the utilization of RBF

neural network data for filter design in generation of a binaural signal.

54

Source of Degrees of Sum of Mean CalculatedVariation Freedom Squares (SS) square (MS) F Value

A a- 1 SSA SSA/(a-- 1) MSA/MSE

B b- 1 SSB SSB/(b- 1) MSB/MSE

C c- 1 SSC SSc/(c- 1) MSC/MSE

AB (a - 1)(b - 1) SSAB SSAB/(a - 1)(b - 1) MSAB/MSE

AC (a - 1)(c - 1) SSAC SSAc/(a - 1)(c - 1) MSAC/MSEBC (b - 1)(c - 1) SSBC SSBc/(b - 1)(c - 1) MSBC/MSE

Error (E) DFE = subtraction SSE SSE/DFETotal abcn - 1 S~tota1

Table 3. Analysis of Variance for "Reduced three-way model"

3.4.2 Method. Twenty subjects were presented 48 binaural sounds through

a set of headphones. In this forced choice experiment, the subjects were asked to

select the location from which they perceived the sound to come from. Subjects

chose from a display provided on a computer screen by moving the mouse to place

the cursor over the perceived location and clicking the button (see figure 47).

The display had only 24 possible azimuths to choose from, and the elevation

angle is always zero. These 24 locations corresponded to the 24 possible locations

of the computer generated binaural signals. The signal used in each case are either

digitally filtered with the AAMRL HRTF data or RBF HRTF data and delayed

appropriately for a given angle of azimuth. The filters were designed using the

program ESPS by Entropic Research lab. The design was an FIR filter using the

weighted mean square error criterion. The weights corresponded to the AAMRL

HRTF or RBF HRTF for the 93 frequencies at the test location. A separate filter

was designed for each ear and for each location tested. In either case, the original

signal is the speech pattern for a male utterance of the word "seventeen."

Subjects were asked to close their eyes during the data collection phase of this

experiment. Interaction effects between the subjects and the two fixed factors (angle

and method of signal generation) in this experiment are assumed to be zero. The

angle at which the signals are designed to come from are called factor B, and has

55

24 levels (one for each azimuth). Factor A represented the method by which the

binaural signal was generated, and has 2 levels (one for AAMRL HRTF and one for

RBF HRTF). Factor C represented the subjects.

3.4.3 The Model. The model used in experiment two is the same as

described above in experiment one.

3.5 Conclusion

This chapter presented the methodology for training a neural network to ap-

proximate the HRTF for zero elevation. It also presented two experiments to verify

the methodology. The results of the training and experiments will be presented in

the following chapter.

56

IV. Experiment Results

4.1 Introduction

This chapter presents the results of the two experiments conducted during this

thesis work. Before presenting the results of each experiment a discussion of the

types of errors analyzed in the experiments is presented.

4.2 Types of Error

4.2.1 Algebraic Error. Algebraic error is measured by taking the difference

between the actual location of the sound source and the subjects perceived location.

The difference is calculated so that the results is always between or equal to ±1800 .

Also the sign of the result indicates the direction the subjects response was from the

actual location. Positive values indicate a clockwise difference and negative values

indicate a counter-clockwise difference (see figure 48). For example, if the actual

location is 3490 and the subject responded with 100 the algebraic error is calculated

as +21'. On the other hand, if the actual location is 100 and the subject responded

with 3490 the algebraic error is calculated as -21'.

4.2.2 Absolute Error. Absolute error is measured by taking the absolute

value of the difference between the actual location of the sound source and the

subjects perceived location of the sound source. The difference is calculated so that

the result is always less than or equal to 1800. For example, if the actual location

is 349' and the subject responded with 100 the absolute error is calculated as 210 as

opposed to 3390.

4.2.3 Reversals. Reversals are a phenomenon where the sound is perceived

flipped front to back or back to front from the actual location. Note this is not

a 1800 shift, but a symmetric flip about the horizontal axis though the ears (see

figure 49). Figure 50 shows actual occurrence of this phenomena from experiment

57

Perceived location

Actual location

Actual location -/

• Perceived location

Figure 48. Algebraic Error, positive values indicate a clockwise difference and neg-ative values indicate a counter-clockwise difference

one. The left image shows a subject with back-front reversals with HRTF treatment.

The right image shows a different subject with front-back reversals without HRTF

treatment. The notation has the following meaning. The x symbol indicates actual

location while the o symbol indicates perceived location. The line between the x

and o shows the pairwise groupings. The gradual spacing toward the center of the

head is for clarity" only and is not an indication of perceived distance.

4.2.4 Default to 90'. Default to 90o is a phenomena where the perceived

location defaults to 90' or 270' no matter where the actual location is. This phe-

nomena has been observed by other researchers (17). Figure 51 is an example of a

subject whose perceived locations defaulted to 901 and 270'.

4.3 Experiment One Results

As stated previously, the primary objective of experiment one was to determine

whether or not the Head Related Transfer Function (HRTF) provides any advantage

to a listener attempting to identify the azimuth of a computer generated binaural

58

Actual location

0 0

00C

Perceived location

Figure 49. Reversal, a symmetric flip about the horizontal axis through the ears

10 349 10 3493 ~ = -?8 3 - ~ -. 8

'15 45155' 03 5/ 03

(5

5 15 40"

6' '91 6' '91

8 78 8 78

9 62 9& .6211'

.4 9 11

,O 49

'0 ' 123712 37 12 237

13 • 25 13 251, 4 _12

14 _,- '"212

169 191 169 191

Figure 50. Example of a subject with back-front reversals with HRTF treatment(left) and a different subject with front-back reversals without HRTFtreatment (right). Notation: x indicates actual location, o indicateperceived location. The line between the x and o shows the pairwisegroupings. The gradual spacing toward the center of the head is forclarity only and is not an indication of perceived distance.

59

10 349

4- 155' 03

6' 91

8 78

9 62

11~ 4912 37

13 2514 -•___ - f-212

169 191

Figure 51. Example of subject whose perceived locations defaulted to 900 and 2700.Notation same as above

signal. More specifically, does the use of the HRTF in generating binaural signals

provide a listener with a signal that can be more accurately located in azimuth than

a binaural signal generated using only interaural time delay (ITD)? The HRTF's

used in this experiment were developed at Armstrong Aerospace Medical Research

Laboratories (AAMRL), Wright-Patterson AFB, OH (12).

Data collected for 20 subjects was used to determine the answer to the question

posed above. Calculations in this statistical analysis of the data are performed using

Matlab. The data used is found in appendix A. The Matlab code is found in appendix

C. This experiment is a reduced three factor experiment. Table 4 summarizes the

factors in this experiment.

Factor Levels

Filter 2Subject 20Azimuth 24

Table 4. Experiment 1 factors

Where the levels of "Subject" represent each of the twenty individuals taking

part in the experiment, and the levels of "Filter" are 1 and 2 (1 corresponds to the

60

use of the HRTF and 2 corresponds to no use of HRTF). The levels of "Azimuth"

are 1, 2, 3, ... , 24 and correspond to the angles shown in table 5. The angle of

Level Angle Level Angle1 100 13 19102 320 14 21203 450 15 2250

4 580 16 23705 690 17 24906 820 18 26207 980 19 27808 1110 20 29109 1230 21 303010 1350 22 315011 1480 23 328012 1690 24 3490

Table 5. Levels of "Azimuth"

azimuth is measured as shown in figure 52. Using the data set shown in appendix A

Figure 52. Azimuth, 0 of sound source to directly in front of face

with "Response" representing the magnitude of the difference between the subjects'

perceived angle and the true location, the following Analysis of Variance (ANOVA)

(table 6) was obtained.

61

Source of Degrees of Sum of Mean Calculated F at

Variation Freedom Squares (SS) square (MS) F Value 1% level

Filter (A) 1 60554.26 60554.26 57.44 6.85,6.637.47 7.88

Azimuth (B) 23 416834.70 18123.25 17.19 2.03,1.792.23 2.78,2.70

Subject (C) 19 48451.13 2550.06 2.42 2.19,1.88

A * B 23 186224.72 8096.73 7.68 2.03,1.79A * C 19 23134.95 1217.63 1.15 2.19,1.88B * C 437 564293.18 1291.29 1.22 1.53,1.00

Error 437 460732.48 1054.31Total 959 1760225.42

Table 6. Analysis of Variance for "Response"

Recall, the following model for this experiment:

Xi 3kl = pL + Oaj + /3j + -1k + (a~3)i + (ay)ik + (/ 3 ')ik + 6ijkl

Interaction effects between filters and azimuth must be tested. The null and alternate

hypotheses are H'1) (no interactions between filters and azimuth) and H1 (the means

are not equal). The test statistic is Fcac = (Mean Square Interaction) + (Mean

Square Random error) = 7.68 (as seen in table 6). Testing at an significance level of

1% (a = .01)1

Fcarc = 7.68 > F.99,23 ,437 P 2.03,1.79.

Hence, H11) can be rejected. Since the hypothesis HT.) for the Filter * Azimuth

can be rejected, it can be concluded that the interactions are significantly large.

Differences in the factors are then significant only if they are large compared with

the interactions. For this reason, many statisticians recommend that HO4 ) and H05 )

be tested by using the F ratios MSA/MSAB and MSB/MSAB rather than those

given in table 3 (27). These alternate F values are the second values shown in

62

table 6. Testing Ho4 ) at an significance level of 1%,

Fcaiait = 7.47 < F. 9 9 ,1,23 = 7.88.

Testing Ho5 ) at an significance level of 1%,

Fralcai, = 2.23 < F.99,23,23 z 2.78, 2.70.

Therefore the means of the filters are statistically equal and the means of the az-

imuths are statistically equal.

Next completing the same tests for Filter * Subject interactions. Testing at an

significance level of 1% (a = .01),

Fcaic = 1.15 < F.99 ,19 ,4 37 - 2.19,1.88

shows that H(2) cannot be rejected. It can be concluded that there is no Filter *

Subject interactions and H(4 ) and H(6) can be tested. Testing H(4) at an significance

level of 1%,

FcaIc = 57.44 > F.99 ,1 ,4 37 - 6.63.

Testing H(6) at an significance level of 1%,

FcaIc = 2.42 > F.99 ,19 ,48 0 ; 1.91.

Therefore the means of the Filters nor the means of the Subjects are statistically

equal.

Finally completing the same tests for Azimuth * Subject interactions. Testing

at an significance level of 1% (a = .01),

Fca.c = 1.22 > F.99 ,4 37 ,4 37 - 1.53, 1.00

63

shows that HO3) cannot be rejected. Testing HO5 ) at an significance level of 1%,

Fcatc = 17.19 > F.99,23,437 - 2.03,1.79.

Testing HO6 ) at an significance level of 1%,

Fcac = 2.42 > F.99,19,437 • 2.19,1.88.

Therefore the means of the azimuth nor the means of the subjects are statistically

equal.

Using the reversal corrections, the following Analysis of Variance (ANOVA)

(table 7) was obtained. If the reversal location error was smaller than the sub-

jects' response then the reversal angle was used in table 7. The number of reversals

corrected was 432 out of 960 or 45%.

Source of Degrees of Sum of Mean Calculated F atVariation Freedom Squares (SS) square (MS) F Value 1% level

Filter (A) 1 1811.7 1811.77 7.45 6.85,6.633.28 8.18

Azimuth (B) 23 31188.33 1356.01 5.58 2.03,1.79

Subject (C) 19 21444.68 1128.67 4.64 2.19,1.882.04 3.15,3.00

A * B 23 7804.46 339.32 1.40 2.03,1.79A * C 19 10498.02 552.53 2.27 2.19,1.88B * C 437 137136.70 313.81 1.29 1.53,1.00Error 437 106269.46 243.18Total 959 316153.40

Table 7. Analysis of Variance for "Response" with reversal corrections

From table 7 the calculated F values indicate significant interaction between

Filter * Subject. The other interactions Filter * Azimuth and Azimuth * Subject

indicate no significant interaction. Comparing within factor variance, the means

64

of the filters are statistically equal; the means of the azimuths not are statistically

equal; and the means of the subjects are statistically equal.

Now, the main objective of this experiment is addressed. Is there a difference

in the mean response for HRTF versus non-HRTF binaural signals? To determine

the answer to this question, a pairwise comparison of the means for the two levels of

the variable "Filter" is conducted. The following formula is used to determine the

confidence intervals for the differences of two population means:

S-- (37)X2 - X1 ± z10'92-91 = X2 X 1 ±- zc , 2 [0 N- (7

N2 N,

where,

X1 is the mean of the with HRTF filter error.X 2 is the mean of the without HRTF filter error.oCl is the standard deviation of the with HRTF filter error.C2 is the standard deviation of the without HRTF filter error.z, is the 99% confidence coefficient (z, = 2.58).N1 is the size of the with HRTF filter error.N 2 is the size of the without HRTF filter error.

And since the point estimates for the means at each level are:

X, = 35.73

X2 = 51.61

X2-X1 = 15.88

We can state with 99 percent confidence that

8.87 < X 2 - X 1 < 22.90

65

And since the point estimates for the means at each level with reversal corrections

are:

X1 = 20.12

X2 = 17.37

X2 - X1 = -2.75


-5.76 < X 2 - X 1 < 0.27

The confidence interval shows a 8.87' to 22.90' advantage of using the HRTF versus

no HRTF. After reversal correction the advantage is reduced to 0.27' to a disadvan-

tage of 5.76'. It should be obvious that the correction for reversals will reduce the

error. The algorithm used to include reversals chooses the smaller error. The mean

with HRTF error reduced from 35.73' to 20.12' and the mean without HRTF error

reduced from 51.61' to 17.37'. The interaction of reversals is examined further in

the next section.

4.3.1 Further examination of results. The above results show mixed results

of the statistical advantage in localization accuracy over simply ITD with significant

interaction between the location of the sound and whether the HRTF is used.

Figure 53 shows the absolute mean error and standard deviation for sounds

with the HRTF (solid) and without the HRTF (dash). The error bars (vertical lines)

show the standard deviation of the mean error. This figure reveals that the without

HRTF filter had greater error for presentation in front of the head when compared to

behind the head. Figure 54 shows the algebraic mean error and standard deviation

for sounds with the HRTF (solid) and without the HRTF (dash). Again the error

bars show the standard deviation of the mean error. The algebraic error gives a

66

180 ...

~160 - .. ... ... .I. .

i~120 - ........ ......

~ 1 '\I 7T

100 .......

80I T1)

60 .. ' .. . .. . .. . .

T7 T-T l-rI

~20 I ' 'j.-. . .'

0............. ...... . -7.. .. I.....

-20L-0 90 180 270 360

azimuth angle In degrees

Figure 53. Absolute mean error and standard deviation for sounds with the HRTF(solid) and without the HRTF (dash). The error bars show the standarddeviation of the mean error.

150-f ......

T I

so -.. .. . . . . . . l. .

-so ...... .. ..... . . .

. -100 .. .. ... ... ... .-

-1 5 0 .. ... ........ ..... ..... ..... .........

0 90 180 270 360azimuth angle In degrees

Figure 54. Algebraic mean error and standard deviation for sounds with the HRTF(solid) and without the HRTF (dash). The error bars show the standarddeviation of the mean error.

67

direction to the error. As can be seen from the figure, the sign of the error indicates

a general pulling of the perceived sound toward the ears. Figure 55 is perhaps a

better view of-the data. In this figure the,* shows the mean location of the perceived

1 0 349 10 349

'32 ,.... ...... ..... ,3832 328

82 278 82

111/ ,,/ . 1 1".;"2

123".."27 12 "'.23

135"". 25135 '. '225

18 ".148 212

169 191 169 191

Figure 55. Mean and standard deviation for sounds with the HRTF (left) and with-out the HRTF (right). The arcs show the standard deviation of the mean(shown as a,.

sounds and the arcs show the standard deviation. As can be seen, the mean location

tends to be pulled near the 9Q0 and 2700 locations. Figure 55 also shows how the

without HRTF treatment is perceived more behind the head than in front of the head.

This parallels the results of Oldfield and Parker where the pinna-filled perceptions

were pulled toward the ears (18).

To further present the data in more detail, appendix A has figures 65, 66, 67

and 68 depicting all •he subjects responses for each actual location. The notation

used in these figures is as follows: x and solid line indicates with HRTF, o and

dashed line indicates without HRTF. The arcs show the standard deviation of the

samples. The * and + indicate the mean with and without HRTF respectively.

These are the actual responses. No corrections have been made concerning reversal.

68

In an effort to take into account reversals, polar histograms of the data are

presented next. The polar histogram shows the frequency of subjects' responses for

a specific angle of azimuth. Figure 56 shows the polar histogram of all responses for

sounds with the HRTF (solid) and without the HRTF (dash). The dotted circle in-

dicates the number of times the sounds were actual at that location. Notice subjects

10 349

32 ,"'.............. 328 J4 .,' " '.315

/ 3.

1"36 .. s. / I I\ ,,.%, 225

..... Actual 14 1 'I -- "\212

-with HRTF 1" ',19 1SI

- - without HRTF

Figure 56. Polar Histogram of all responses for sounds with the HRTF (solid) andwithout the HRTF (dash).

heard sound at some locations more times than the sound was actual there. Recall

that each subject heard 48 sounds, one with HRTF and one without HRTF for each

of the 24 azimuth locations for a total of 960 sounds. This figure again indicates that

subjects tended not to perceive sounds in front or behind the head. However subjects

perceive less without the HRTF sounds in front of the head and more without the

HRTF sound behind the head than the with HRTF sound respectively.

Tables 8 and 9 show statistics for the with HRTF and without HRTF filters.

Without taking into account reversals corrections 41.25% and 34.58% of the with

HRTF and without HRTF responses respectively were correct within one button

(±23°). Recall the button layout is shown in figure 47. Figure 57 is a polar histogram

of all responses correct within one button. This figure shows without HRTF correct

69

with HRTF Without HRTFNumber of samples 480 480rmserr 0.4500 5.9583stderr 50.7709 70.8760correct - exact 16.88% 14.37%

correct - within one button (230) 41.25% 34.58%

Table 8. Without reversal corrections (nor)

with HRTF Without HRTFNumber of samples 480 480rmserr 0.9375 0.5458stderr 27.6543 24.8911Number of reversals 176 259correct - exact 24.70% 28.75%correct - within one button (230) 57.29% 66.87%

Table 9. With reversal corrections (rev)

responses are greater behind the head while the with HRTF correct response are

more evenly distributed front and back.

Next taking into account reversals corrections 57.29% and 66.87% of the with

HRTF and without HRTF responses respectively were correct within one button

(±230). Figure 58 is a polar histogram of all responses with reversal corrections that

are correct within one button. This figure shows without HRTF correct responses

and the with HRTF correct responses are more evenly distributed front and back.

The percentage of reversals with the HRTF and without the HRTF was 36.67%

and 53.96% respectively. Oldfield and Parker found few reversals with the pinna

unfilled and 26% with the pinna filled (17, 18). The relative increase in the number

of reversals indicates that the HRTF helps in reducing the number of reversals. The

differences between Oldfield and Parker and this thesis also indicate that there may

be cues in the actual sound that are lost in the filtered sound. Note: Oldfield and

Parker used white noise while this thesis used filtered speech.

70

10 349

32 ... 328

45 .315

58 303

69 * 291

82 278

"98 262

S\ • \,,4 •249

123 ". ~ •.237

135 225

S..... Actual 148 ............ 212

- with HRTF correct 169 191

- - without HRTF correct

Figure 57. Polar Histogram of all responses correct within one button for soundswith the HRTF (solid) and without the HRTF (dash).

10 349

32 .... . . . .. 32845 ."'•' - • "'""....315

58 303

69 ' "' 291

8 • . 278

98 2.. . 262

123",. 23

-with HRTIFcorrect 169 191

-- without HRTIF correct

Figure 58. Polar Histogram of reversal corrected responses correct within one but-ton for sounds with the HRTF (solid) and without the HRTF (dash).

71

4.3.2 Summary of Experiment One Results. Table 10 shows a summary of

the ANOVA hypotheses acceptances and rejections. An "accept" means the means

Source of No Reversals ReversalsVariation Hypothesis Hypothesis

Filter (A) reject rejectaccept accept

Azimuth (B) reject rejectaccept

Subject (C) reject rejectaccept

A * B reject accept

A * C accept rejectB * C accept accept

Table 10. Summary of Experiment One

are statistically equal, while a "reject" mean the difference between the means is

statistically large. Depending on the statistical method used, we can conclude that

the HRTF does provide a statistical advantage in localization accuracy over simply

ITD. There is however a statistically significant interaction between the location

of the sound and whether the HRTF is used. When this interaction is removed

using the alternate F-Value, the statistics give mixed results of equal means for the

filters and azimuth. This may mean that at certain angles of azimuth, the HRTF

either provides no advantage at all or hinders localization capabilities. Adding in

the calculations for reversals changes the results to where the means of the azimuth

are not statistically equal. The reversal calculations will inherently reduce the error

results. However comparing the number of reversals indicates an advantage of using

the HRTF over no HRTF.

4.4 Experiment Two Results

The primary objective of experiment two was to determine if a neural network

could be trained to learn the HRTF.

72


Filter (A) 1 7578.07 7578.07 8.58 6.85,6.63

Azimuth (B) 23 287455.55 12498.07 14.15 2.03,1.798.39 2.03,1.79

Subject (C) 19 39938.99 2102.05 2.38 2.19,1.881.41 2.19,1.88


Table 11. Analysis of Variance for "Response"

Recall, the following model for this experiment:

Xj = IL + c*i + /3 + -1k + (ao3 )ij + (a-y)ik + (07i3~ k + 6ijkl

Interaction effects between filters and azimuth must be tested. The null and alternate

hypotheses are H'1) (no interactions between filters and azimuth) and H1 (The means

are not equal). The test statistic is Fcac = (Mean Square Interaction) + (Mean

Square Random error) = 1.34 (as seen in table 11). Testing at an significance level

of 1% (a = .01),

EcaIc = 1.34 < F.99,23,437 • 2.03,1.79.

Hence, H•1) cannot be rejected. it can be concluded that there is no Filter * Azimuth

interactions and HL4) and H(5) can be tested. Testing HO4 ) at an significance level

of 1%,

Fca.c = 8.58 > F 99,1,437 " 6.85, 6.63.

Testing HO5) at an significance level of 1%,

Fcaic = 14.15 > F.99,23,437 - 2.03, 1.79.

73

Therefore the means of the filters nor the means of the azimuths are statistically

equal.

Next completing the same tests for Filter * Subject interactions. Testing at an

significance level of 1% (a = .01),

Fcaic = 1.14 < F. 99,19 ,437 ;: 2.19, 1.88

shows that HA2) cannot be rejected. It can be concluded that there is no Filter

* Subject interactions and HO4 ) and HO6 ) can then be tested. Testing HL4 ) at an

significance level of 1%, Fcaic = 8.58 > F(.99, 1,437) ; 6.85, 6.63. Testing H(6) at

an significance level of 1%,

Fea, -= 2.38 > F.99,19 ,437 - 2.19,1.88.

Therefore the means of the filters nor the means of the subjects are statistically

equal.

Finally, completing the same tests for Azimuth * Subject interactions. Testing

at an significance level of 1% (a = .01),

Fcaic = 1.69 > F.99,437,437 - 1.53, 1.00

shows that H(3) can be rejected. Differences in the factors are then significant only

if they are large compared with the interactions. Again the alternate F values are

used. These alternate F values are the second values shown in table 6. Testing H(5)

at an significance level of 1%,

Fcalcalt = 8.39 > F.99,23 ,43 7 ; 2.03, 1.79.

74

Testing H(6) at an significance level of 1%,

Fcalcaft = 1.41 < F. 9 9 , 1 9 ,4 3 7 • 2.19, 1.88.

Therefore the means of the azimuths are not statistically equal, however, the means

of the subjects are statistically equal.

Using the reversal corrections, the following Analysis of Variance (ANOVA)

(table 12) was obtained. If the reversal location error was smaller than the subjects'

response then the reversal angle was used in table 12. The number of reversal

corrections was 351 out of 960 or 37%.


Filter (A) 1 1249.46 1249.46 9.23 6.85,6.633.94 7.88

Azimuth (B) 23 16741.34 727.88 5.38 2.03,1.792.29 2.78,2.70

Subject (C) 19 6288.65 330.98 2.45 2.19,1.88


Table 12. Analysis of Variance for "Response" with reversal corrections

From table 12 the calculated F values indicate significant interaction between

Filter * Azimuth. The other interactions Filter * Subject and Azimuth * Subject

indicate no significant interaction. Comparing the within factor variance the means

of filters are statistically equal; the means of the azimuths are statistically equal;

and the means of the subjects are not statistically equal.

Now, the main objective of this experiment are addressed. Is there a difference

in the mean response for AAMRL HRTF versus RBF HRTF binaural signals? To

75

determine the answer to this question, a pairwise comparison of the means for the

two levels of the variable "Filter" are conducted. The following formula is used to

determine the confidence intervals for the differences of two population means:

S_-± (38)X2 - X1 ± Zcrx,-X, = X2 - X1 ± Z, -102 (38)

N2 N,

where,

X, is the mean of the AAMRL HRTF filter error.X 2 is the mean of the RBF HRTF filter error.a, is the standard deviation of the AAMRL HRTF filter error.

U 2 is the standard deviation of the RBF HRTF filter error.z, is the 99% confidence coefficient (z, = 2.58).NI is the size of the AAMRL HRTF filter error.N 2 is the size of the RBF HRTF filter error.

And since the point estimates for the means at each level are:

X1 = 38.28

X2 = 32.66

X2 - X1 = -5.62


-12.01 < X 2 - X1 < 0.77

And since the point estimates for the means at each level with reversal corrections

are:

X1 = 17.18

X2 = 14.90

X2 - X• = -2.28

76


-4.58 < X?2 - X 1 < 0.02

The confidence interval shows a 0.770 advantage to a 12.01' disadvantage of using

the AAMRL HRTF versus RBF HRTF. After reversal correction the advantage is

reduced to 0.02' to a disadvantage of 4.58'. Again the correction for reversals reduced

the error. The mean AAMRL HRTF error reduced from 38.28' to 17.18' and the

mean RBF HRTF error reduced from 32.66' to 14.90'. The interaction of reversals

is examined further in the next section.

4.4.1 Further examination of results. Figure 53 shows the absolute mean

error and standard deviation for sounds with the AAMRL HRTF (solid) and RBF

HRTF (dash). The error bars show the standard deviation of the mean error. This

2 0 0 '.......... ..................... ...... .. . .....

1 5 0 . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . .... . . . . . ...

1 1 - ..................... .......... ........ ........ . .....

I,I T " T

50 - T ..... T .

0 90 180 270 360"azimuth angle in degrees

Figure 59. Absolute- mean error and standard deviation for sounds with theAAMRL HRTF (solid) and RBF HRTF (dash). The error bars showthe standard deviation of the mean error.

figure reveals that AAMRL HRTF had slightly greater error than the RBF HRTF. A

comparison of the standard deviations reveals no significant trends. Figure 60 shows

the algebraic mean error and standard deviation for sounds with the AAMRL HRTF

77

(solid) and RBF HRTF (dash). The error bars show the standard deviation of the

mean error. The algebraic error gives a direction to the error. As can be seen from

"1 0 _ . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

8'100 ' . :i

.8 II I

-C I : I\

. . . . .. . . . . . .- . . . . . .

"T IS 50 - .. .....

I, I II-,II II "

1 -0 .. ... .... . ... ..... .. ..... .. ........

150

0 90 180 270 360azimuth angle In degrees

Figure 60. Algebraic mean error and standard deviation for sounds with theAAMRL HRTF (solid) and RBF HRTF (dash). The error bars showthe standard deviation of the mean error.

the figure, the sign of the error indicates a general pulling of the perceived sound

toward the ears. Figure 55 is perhaps a better view of the data. In this figure the

* shows the mean location of the perceived sounds and the arcs show the standard

deviation. As can be seen, the mean location tends to be pulled near the 90' and

2700 locations. This parallels the results of experiment one and Oldfield and Parker

where the pinna-filled perceptions were pulled toward the ears (18). Begault and

Wenzle also reported subjects had a pattern of pulling toward the vertical-lateral

plane (3).

To present the data in more detail, appendix B has figures 69, 70, 71 and 72

depicting all the subjects' responses for each actual location. The notation used in

these figures is as follows: x and solid line indicates AAMRL HRTF, o and dashed

line indicate RBF HRTF. The arcs show the standard deviation of the samples. The

* and + indicates the mean AAMRL and RBF HRTF respectively. These are the

actual responses. No corrections have been made concerning reversal.

78

10 349 10 349

32 . 328 3 2

45 315 45 315458

58 •,".303 58 .. 303

69

82 278 82 278

98 262 98 262

11124 il 24123"..;"

123 237 123" , 237

135 225 135 225

148 148 212. ... 21.48 " ....... . ...... 21

169 191 169 191

Figure 61. Mean and standard deviation for sounds with the AAMRL HRTF (left)and RBF HRTF (right). The arcs show the standard deviation of themean (shown as a *).

In an effort to take into account reversals, polar histograms of the data are

presented next. The polar histogram shows the frequency of subjects' responses for

a specific angle of azimuth. Figure 62 shows the polar histogram of all responses for

sounds with the AAMRL HRTF (solid) and the RBF HRTF (dash). The AAMRL

HRTFs are label "Actual HRTF" in the figure legend. The dotted circle indicates the

number times the sounds was actual at that location. Again the responses between

the AAMRL HRTF and RBF HRTF are similar with the larger number of responses

occurring on the sides.

Tables 13 and 14. show statistics for the AAMRL HRTF and RBF HRTF

filters. Without taking into account reversals corrections 40.62% and 51.88% of

the AAMRL HRTF and RBF HRTF responses respectively were correct within one

button (±23'). Figure 63 is a polar histogram of all responses correct within one

button. This figure shows a even distribution between filters. The front and back

distribution may slightly favor the back.

79

10 349

98

123' / ..

135 ' 225

............... .. .............

-Acua HTF169 191

- -RB HRT.."

Figure 62. Polar Histogram of all responses for sounds with the AAMRL HRTF

(solid) and RBF HRTF (dash).

AAMRL HRTF RBF HRTFNumber of samples 480 480

rmserr 1.6354 0.0521

stderr 54.2080 50.4625correct - exact 15.42% 22.08%


Table 13. Without reversal corrections (nor)

AAMRL HRTF RBF HRTFNumber of samples 480 480

rmserr 0.1146 0.3312stderr 22.5449 19.9285

Number of reversals 188 163correct - exact 25.21% 30.63%


Table 14. With reversal corrections (rev)

80

10 349

32 328

45 .315

58 ..303

69 / ,'291

82 278

98 262

"i" ' .249

123'-". -,.237

135 226

Actual 148 . 212

- Actual HRTF correct 169 191

- - RBF HRTF correct

Figure 63. Polar Histogram of all responses correct within one button for soundswith the AAMRL HRTF (solid) and RBF HRTF (dash).

Next, taking into account reversals corrections 62.71% and 70.42% of the

AAMRL HRTF and RBF HRTF responses respectively were correct within one but-

ton (±230 ). Figure 64 is a polar histogram of all responses with reversal corrections

that are correct within one button. This figure shows there are more AAMRL HRTF

correct responses and the RBF HRTF correct responses and they are more evenly

distributed front and back. The percentage of reversals with the AAMRL HRTF and

the RBF HRTF was 39.17% and 33.96% respectively. Begault and Wenzle reported

the mean number of reversals over all the locations was 29% (3). The results of

experiment one showed the AAMRL HRTF reversals to be 36.67%.

4•4.2 Summary of Experiment Two Results. Table 15 shows a summary of

the ANOVA hypotheses acceptances and rejections. An "accept" means the means

are statistically equal, while a "reject" mean the difference between the means is

statistically large. Without considering reversals, we can conclude that the RBF

HRTF provides a statistical advantage in localization accuracy over the AAMRL

HRTF from which they were derived. However with reversals correction include a

81

10 34932 .. . . . . . . . . . .. . . .. 328

45 .. 5...

. t 148..2 303

82 "278

111 "2 249

123 2'.37•"

135 ". I"225

.1 8.......tual ... . ..........1 212

- Actual HRTF correct 169 191

- - RBF HRTF correct

Figure 64. Polar Histogram of reversal corrected responses correct within one but-ton for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash).

Source of No Reversals ReversalsVariation Hypothesis Hypothesis

Filter (A) reject rejectaccept

Azimuth (B) reject rejectreject accept

Subject (C) reject acceptaccept

A* B accept rejectA * C accept acceptB * C reject accept

Table 15. Summary of Experiment Two

82

statistical advantage is not indicated and the means of the two filters are statisti-

cally equal. There is also a statistically significant interaction between the location

of the sound and the filter. As discussed in experiment one, this may mean that

at certain angles of azimuth, the HRTF either provides no advantage at all or hin-

ders localization capabilities. The reversal calculations will inherently reduce the

error results. However, comparing the number of reversals does not indicate a large

difference between the AAMRL HRTF and the RBF HRTF.

83

V. Conclusions and Recommendations

5.1 Summary

This thesis determined whether an artificial neural network (ANN) can approx-

imate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head re-

lated transfer functions (HRTF) data. In order to test this hypothesis, two separate

tests were performed. The first test determined whether HRTF lends any support in

sound localization when compared to no HRTF (Interaural Time Delay only). The

second test determined whether HRTF and ANN lend the same amount of support

in sound localization when compared to each other.

The following research objectives were met:

"* Modified AFIT algorithms that have been used successfully on placing sounds

in azimuth. The algorithms that create 3D sound with the AAMRL HRTF

were changed to used the ANN HRTF instead. This allowed the creation of

tests to compare the ANN HRTF with the AAMRL HRTF.

"* Statistically checked to see advantage of HRTF and ITD versus ITD only.

"* Statistically checked to see difference between AAMRL HRTF data and ANN

HRTF data

"* Investigated the program LNKmap as a neural network tool for function ap-

proximation.

5.2 Conclusions

The final conclusion is that the AAMRL HRTFs can be approximated by

an artificial neural network for azimuth positions at zero degrees elevation. The

point estimates for the difference in the AAMRL and RBF HRTF's mean error was

only 5.62' and 2.28' without and with reversal corrections respectively. The point

84

estimates indicate that the RBF HRTF had slightly lower error than the AAMRL

HRTF. However the 99% confidence interval of the mean error shows no advantage

one way or the other.

To show that the HRTF was in fact contributing to sound localization, experi-

ment one was conducted. The results indicate an advantage in localization accuracy

with HRTF. There is a statistically significant interaction between the location of

the sound and whether the HRTF is used. This result may indicate that for the 00

elevations ITD has a strong influence. Comparing the numbers of reversals clearly

indicates an advantage of using the HRTF over no HRTF. It is important to note that

the 3D sounds presented in the experiments were the direct paths only. Smith showed

an improvement in results when attenuated reflections were also included (25).

5.3 Recommendations for Future Research

The obvious next step is to expand the neural network training the other

elevations of the AAMRL data. The AAMRL data has HRTF responses for elevation

±90'. Oldfield and Parker found that the acuity of sound localization varied with

azimuth and elevation (17). The ANN HRTF would have to be tested in these

non-zero degrees elevation regions.

The AAMRL HRTF 0' elevation data in this thesis was spaced approximately

230 apart. Future research could investigate the interpolations accuracy of the ANN

against the AAMRL HRTF data spaced 1' apart at 0' elevation.

Replacing the AAMRL HRTF samples by an artificial neural network approx-

imator may also provide insight of underlining functions within the HRTF. Since

subjects where able to localize with the smoothed ANN HRTF, simpler filter designs

may be implemented.

Implementing the ANN HRTF in hardware may provide an alternate approach

to real-time implementation other than the DIRAD and Convolvotron.

85

Appendix A. Experiment 1 Data: With and Without HRTF

86

With Without With Without

Actual HRTF HRTF Actual HRTF HRTF

100 1480 1690 100 100 1910

320 320 1480 320 1480 1690

450 1230 1480 450 1480 1690

580 580 1480 580 820 1480

690 690 820 690 450 980

820 580 1110 820 690 1350

980 820 1230 980 690 1110

1110 820 820 1110 980 1230

1230 1350 1350 1230 1350 1480

1350 690 1350 1350 1110 1690

1480 1480 1230 1480 1110 1350

1690 1480 1690 1690 1690 1690

1910 1910 1690 1910 2120 1910

2120 2490 2120 2120 2250 2120

2250 2620 2120 2250 2250 2250

2370 2620 2910 2370 3030 2250

2490 2780 2780 2490 2780 3150

2620 2370 2780 2620 2620 2620

2780 3030 2490 2780 2620 3030

2910 3150 2370 2910 2780 2490

3Q3 0 2910 2490 3030 2910 2910

3150 2910ý 2120 3150 3490 2120

3280 2780 2490 3280 2250 2780

3490 3490 1910 3490 3280 1910

87



100 320 1910 100 100 1910

320 450 1350 320 450 1690

450 820 1480 450 320 1350

580 450 1480 580 450 1350

690 690 1350 690 580 1350

820 450 820 820 450 1350

980 450 980 980 690 980

1110 820 980 1110 690 1110

1230 820 1480 1230 820 1480

1350 820 1230 1350 820 1480

1480 1350 1690 1480 1480 1480

1690 1690 1690 1690 1910 1690

1910 3490 1910 1910 2120 1910

2120 2120 2120 2120 2780 2120

2250 2250 2620 2250 2780 2250

2370 2780 2490 2370 2780 2490

2490 2780 2910 2490 3030 2780

2620 3150 2620 2620 2780 2910

2780 3280 2780 2780 2910 2780

2910 3150 2910 2910 3030 2250

3Q30 3150 2620 3030 3150 2490

3150 3280 2250 3150 3150 2250

3280 3030 2620 3280 3030 2370

3490 3490 00 3490 3490 1910

88



100 820 1480 100 3490 1690

320 820 1480 320 1230 3150

450 100 1230 450 690 1480

580 320 980 580 580 580

690 690 1110 690 450 580

820 2780 690 820 820 1350

980 450 690 980 2910 690

1110 980 820 1110 1110 1230

1230 00 1230 1230 980 1480

1350 1110 1230 1350 1110 690

1480 820 00 1480 320 1480

1690 1110 1480 1690 1690 320

1910 1480 1690 1910 1910 3490

2120 2370 2780 2120 2490 2120

2250 00 3030 2250 3030 2120

2370 3150 2910 2370 2780 1910

2490 3030 3030 2490 2780 2780

2620 2620 2490 2620 3150 2370

2780 2910 2250 2780 3150 2120

2910 2780 3280 2910 3030 2910

3030 2780 2120 3030 00 2250

3150 2916 2120 3150 3280 1910

3280 3280 2120 3280 2910 1480

3490 1910 1690 3490 3280 100

89



100 320 450 100 320 1690

320 320 1230 320 820 1480

450 690 1110 450 690 1230

580 820 980 580 580 1690

690 580 820 690 580 1230

820 320 980 820 690 1230

980 580 820 980 820 1230

1110 450 690 1110 980 820

1230 820 820 1230 580 1110

1350 980 980 1350 980 1480

1480 450 1110 1480 1350 980

1690 320 1690 1690 1480 1910

1910 1910 1910 1910 580 1910

2120 2910 3030 2120 2370 1910

2250 2910 3280 2250 2620 2370

2370 2780 3150 2370 2490 3030

2490 3030 2490 2490 2620 2910

2620 2780 2780 2620 2910 2780

2780 3030 2780 2780 2780 2780

2910 3150 2780 2910 2910 2490

3Q30 3150 2910 303O 3150 2910

3150 2910 3030 3150 2370 2370

3280 2620 2490 3280 2780 2620

3490 3030 2120 3490 1910 2120

90



100 320 1910 100 320 1690

320 1230 1690 320 820 980

450 320 1350 450 580 980

580 450 1480 580 1910 980

690 450 1110 690 580 980

820 580 1350 820 690 1230

980 690 820 980 580 980

1110 820 1230 1110 1230 690

1230 820 1350 1230 580 980

1350 1350 1480 1350 980 980

1480 1350 1690 1480 580 980

1690 100 1690 1690 1110 1230

1910 2120 1910 1910 1910 3490

2120 2370 2250 2120 2370 2250

2250 2370 2250 2250 2620 00

2370 3030 2370 2370 2620 2370

2490 2780 2620 2490 3030 2490

2620 2780 2490 2620 2910 2490

2780 2910 2780 2780 3030 2620

2910 2910 2250 2910 2370 3150

3Q30 3030 2620 3030 3030 2620

3150 2370 2620 3150 2780 2910

3280 3030 2250 3280 3490 2250

3490 3490 1910 3490 2490 2120

91



100 3280 1690 100 580 2620

320 690 1480 320 820 1110

450 1230 1690 450 690 1110

580 820 1230 580 690 1110

690 690 980 690 980 980

820 820 1350 820 980 820

980 980 1110 980 820 980

1110 690 450 1110 980 980

1230 820 1480 1230 690 980

1350 580 1230 1350 690 980

1480 580 690 1480 690 00

1690 1690 1910 1690 3490 3490

1910 2780 1910 1910 2910 2490

2120 2780 2120 2120 2780 2780

2250 2910 1910 2250 2620 2910

2370 2780 3490 2370 2490 2780

2490 2910 3030 2490 2780 2780

2620 3280 3150 2620 2780 2620

2780 3030 2620 2780 2780 2620

2910 3030 2370 2910 2910 2780

3Q30 2910 2620 3030 2620 2620

3150 303• 1910 3150 2780 2780

3280 303 0 2490 3280 2780 2780

3490 3030 2370 3490 2620 2780

92



100 1690 1690 100 1350 1910

320 580 1910 320 690 1480

450 580 580 450 450 1230

580 690 580 580 450 1480

690 690 1110 690 690 580

820 820 580 820 820 580

980 980 980 980 580 980

1110 580 690 1110 1110 690

1230 820 450 1230 820 580

1350 580 580 1350 690 1690

1480 1110 320 1480 1110 1110

1690 320 100 1690 1480 1910

1910 320 100 1910 2490 1910

2120 3280 3490 2120 2370 1910

2250 3030 3030 2250 2780 2490

2370 2780 2620 2370 2910 2490

2490 3030 2910 2490 2910 2910

2620 2910 2490 2620 2780 2910

2780 2780 2910 2780 2910 2910

2910 2910 2620 2910 2780 2620

3Q30 2910 2490 3030 3030 2780

3150 2780 3030 3150 3030 2250

3280 3030 2370 3280 2910 2780

3490 1910 3280 3490 3030 2120

93



100 100 1690 100 580 1690

320 580 1690 320 980 1350

450 450 320 450 1230 580

580 450 580 580 1350 1350

690 690 450 690 690 690

820 820 580 820 690 1480

980 820 00 980 980 980

1110 1110 690 1110 450 1350

1230 980 450 1230 980 580

1350 980 1690 1350 820 320

1480 1350 1110 1480 00 690

1690 1690 1690 1690 1480 1480

1910 2120 1910 1910 3030 2490

2120 2370 1910 2120 2620 2370

2250 2780 3280 2250 2910 2370

2370 3030 2490 2370 2490 2490

2490 2620 2910 2490 2780 2620

2620 2780 2910 2620 2120 2250

2780 2910 2910 2780 2620 2370

2910 303 0 2910 2910 2910 2250

3Q30 3150 2490 3030 2250 2370

3150 3150 2120 3150 3150 3030

3280 3280 2370 3280 2620 2370

3490 3490 3490 3490 2910 2370

94



100 1110 1910 100 100 2120

320 690 1230 320 1110 1910

450 1230 1110 450 450 1690

580 1110 1110 580 450 320

690 820 980 690 820 1480

820 690 820 820 1350 1690

980 820 820 980 980 2780

1110 820 690 1110 1480 100

1230 820 980 1230 1110 1690

1350 820 1480 1350 1350 100

1480 1230 1230 1480 580 2620

1690 1480 1350 1690 100 100

1910 2620 1910 1910 1910 3490

2120 2370 2370 2120 2370 2250

2250 2490 2780 2250 2120 2120

2370 2490 2490 2370 2250 2910

2490 2620 2780 2490 2490 2370

2620 2620 2780 2620 3030 3150

2780 2620 2780 2780 2780 2490

2910 2620 2490 2910 2250 3150

3030 3030 2490 3030 2780 2490

3150 2916 2490 3150 2780 2120

3280 2620 2250 3280 2780 2490

3490 2910 1690 3490 1910 3490

95



100 1480 1690 100 3280 3280

320 580 1480 320 580 820

450 820 1350 450 820 690

580 980 1110 580 820 580

690 580 1350 690 820 820

820 580 1110 820 820 690

980 690 820 980 690 980

1110 690 1350 1110 980 580

1230 690 1230 1230 690 820

1350 820 1480 1350 820 320

1480 1350 1110 1480 1110 1230

1690 1690 1480 1690 580 3490

1910 2120 1910 1910 2910 3030

2120 2780 1910 2120 2780 3030

2250 2780 2370 2250 2780 2780

2370 2250 2370 2370 2780 2910

2490 2780 2370 2490 2780 2620

2620 303 0 2250 2620 2780 2490

2780 2910 2490 2780 2780 2490

2910 2910 2780 2910 2780 2780

303 0 2910 2370 3030 2780 2490

3150 2620 2620 3150 2780 2780

3280 2490 2120 3280 2910 2780

3490 1910 1910 3490 3030 2780

96

10 349 10 349

32 ---- -- 328 32 328--

41.-%315

148 ,303 5 303

9 291 169 9 291

24 ill\ 0 3249

1327123 /237

135 25 1 '5 3225

14 -22148 " --- 212

189 191 169 191

10 349 10 349

3 328 32 " 328

45 ,315 4542

\8303 69,/ %303

9/ 1291 9/ 1291

821 'ý 2 93 '278

98% 12, 98% 1262

11 2349 1 1 t249

123% 237 123 /237

Fiue6.Alsbjcs epne fo cua oatos 10/3° 5,5°9 n

135% 225 1353% /225

HRT. Th -ac shwtestnaddeito f h apls h

148 - -"212 148 .=-- 212

189 191

10 349 10 349

32 - .328 3 2

58/ %3W 8 0

69\ 9

82!Z8

9827

123% 237 123237

135% 225 13 225

148 212--- 14 212

Figure 65. All subjects' responses for actual locations, 100, 320, 450, 58', 690 and82' shown left to right, top to bottom respectively. Notation: x andsolid line indicates with HRTF, o and dashed line indicate withoutHRTF. The arcs show the standard deviation of the samples. The*and + indicates the mean with and without HRTF respectively.

97

10 349 10 349

32 .,• - .-.- - - ,- 328

32 ., ". .. " .328

5445

45 / %.,315

58 303 58303

69 291 69291

82 271 82 '278

311

S262

Ill\ 49 Il\ /249I /2

13\/1*'237 123 2, 37

148 '... - 212 148 '.. 212

1 59 191 169 191

10 349 10 349

32.. 328 32 -- .. 328

45 %'.,315 45 -1 .,315

583M3 58/ P303

629 69 ~2918 271 82 1 " 278

98 -26: 98 2e2

19 191

169 191

10 349

10 349

1238 1237 123% 23

139' /225 135' /225

,5. ,5,.-

.. ,,

1498 . = . 212 148' 212-

169

191

169

191

Figu1re,.. / 66,l'ub e t 're p n e f r a t all tons 98', 11 ',12 ', 3 ', 8

10 9 349 1

10 9 349

and 1692 shown left to right,

top to bottom respectively.

Notation is

same as above.

98

10 349 10 349

232328 32.' .328456 '.315 45/ ,i6 % 315

58 I '303 585 / %.303

69, .291 99, / '291

82 1 278 82 278

98 ' 122 98 1262

I

Il'

1 131'. • /'3

il\/249 /1249

123\' # 237 123 237

135s 1 25 135' 225

148 48 %,2.4 . 212

169 191 189 191

10 349 10 349

32 % ~~~328 2 --- 2

45 - %315 45 /` I .- 315

83 /303 58 I303

6 \ 291/ / .291

921 278 821 278

11 \ 249 1249

123, \ '237 123 247

135% /`225 135 1 225

148 !-- 212 148 -. -. 212

199 191 1o 1 I2I

10 349 10 349

32 .-328 2 ....- 328

45,- %%315 45/ '.315/

,t 5\.o

/

" \•

59/ ~' 303 59/ 5303

99/ 9 9/ '291

9•2 278 82 278

I

I

19 262 99% 1262I

I

ill\\

NO1

24924

123 '. " / 237 123 \ / 237

135 % /'225 135 N 225*%

/

'.

148•--,-..-...• 212 1489_-.,212

169 191 169 191

Figure 67. All subjects' responses for actual locations, 1910, 2120, 2250, 2370, 2490and 262' shown left to right, top to bottom respectively. Notation issame as above.

99

10 349 10 349

32 . '. 328 32".,s' ' . 328

45 /%.,315 45 -%315.,,.8 \303 59/ 303

69/ 291 69/ 291/ I

821 '278 82' 276

\)\\,123\ \" / 237 123 /237

135 "'225 135' '225

148' o'212 148'. 212

169 191 169 191

10 349 10 349

32 -- . . 328 32 - - -- - M32

45 0- %.315 45/ %-315

//8 \303 58/ 303

69/216 291

2 1 278 82 '279I I

98 1262 98 % 1262

111'\ 249 111 249

123 \ 237 123' /2371\,, ,, , ) ;, 1 .3 \ ,/ 3

135 225 135/ 22513 15

148' .S - 212 148' .212

169 191 169 191

10 349 10 349

32 •.• •-- 328 32 32

45. %/315 45w. \! 315/,,/ /31

so 303 56/ 0

1/ 291 691 21

Fiue68.Alsbet'rsoss9r cullctos/7° 9° 303°, 915,38

821 279 821 276

968 1262 96% '262

ill\ 249 ill\ p249

12'.\ 237 123' /237

135' N"226 135' /225

148 . 212 148' 212-

169 191 169 191

Figure 68. All subjects' responses for actual locations, 2780, 2910, 303 0, 3150, 3280and 349' shown left to right, top to bottom respectively. Notation issame as above.

100

Appendix B. Experiment 2 Data: AAMRL HRTF and RBF HRTF

101

AAMRL RBF AAMRL RBF


100 1110 1480 100 100 100

320 690 580 320 820 580

450 690 820 450 690 1110

580 450 580 580 690 820

690 580 450 690 820 690

820 580 980 820 1110 820

980 450 980 980 980 820

1110 820 690 1110 1110 1110

1230 980 980 1230 980 1230

1350 1110 1230 1350 690 1350

1480 1480 1350 1480 100 980

1690 1690 1690 1690 320 100

1910 1910 2120 1910 2370 2120

2120 2620 2490 2120 2490 2120

2250 3030 2780 2250 2620 2490

2370 2620 2620 2370 2490 2490

2490 2910 2910 2490 2370 2620

2620 303 0 3150 2620 2780 2620

2780 2910 3030 2780 2620 2620

2910 2910 2910 2910 2620 2780

303 0 2620 2620 3030 2780 2620

3150 2490 2490 3150 2490 2490

3280 3280 3280 3280 2120 2620

3490 2250 1910 3490 2370 3030

102

AAMRL RBF AAMRL RBF


100 - 100 1910 100 100 3490

320 690 320 320 450 320

450 1480 1230 450 320 1110

580 1350 980 580 320 580

690 580 580 690 690 580

820 690 1110 820 690 450

980 450 690 980 690 820

1110 820 580 1110 1110 450

1230 820 1230 1230 1350 1230

1350 820 1480 1350 820 1480

1480 1480 1480 1480 1230 1480

1690 1910 1690 1690 3490 1690

1910 2780 2120 1910 2250 1910

2120 2120 2120 2120 1910 2250

2250 2780 2120 2250 2780 2370

2370 2490 2490 2370 2910 2620

2490 2490 2620 2490 2910 2620

2620 2910 3030 2620 2780 2780

2780 3030 3030 2780 2780 3030

2910 3030 2620 2910 2910 3030

3030 3150 2620 3030 3150 3030

3150 2910 2910 3150 3280 303 0

3280 2250 3030 3280 2910 3150

3490 1910 2120 3490 3280 3490

103

AAMRL RBF AAMRL RBF


100 1690 320 100 1690 1910

320 1230 100 320 1110 320

450 1690 580 450 1110 320

580 580 980 580 450 450

690 980 690 690 450 690

820 820 690 820 690 820

980 1110 820 980 580 580

1110 980 980 1110 980 820

1230 980 690 1230 820 1110

1350 1350 1480 1350 1230 1690

1480 1350 1480 1480 1350 1350

1690 1690 1690 1690 1690 1690

1910 3030 2120 1910 2120 1910

2120 2120 2120 2120 2120 2120

2250 2250 2910 2250 2490 2490

2370 2370 2370 2370 2910 2620

2490 2780 2780 2490 2490 2780

2620 2780 2910 2620 2780 2780

2780 2780 2910 2780 2910 2780

2910 3030 2780 2910 2780 3150

3030 2910 2490 3030 3150 3030

3150 2910 3280 3150 3150 3280

3280 2250 2370 3280 3280 3280

3490 1910 2120 3490 3490 3280

104

AAMRL RBF AAMRL RBF


100 100 100 100 100 1910

320 1690 320 320 1480 450

450 450 690 450 1690 580

580 580 690 580 1480 320

690 580 820 690 320 580

820 820 690 820 820 820

980 690 690 980 820 450

1110 690 580 1110 820 820

1230 980 1110 1230 1690 1350

1350 690 1350 1350 320 320

1480 320 1480 1480 1110 820

1690 1690 1480 1690 1910 100

1910 2120 2120 1910 1910 3490

2120 2490 2620 2120 2620 2620

2250 2620 2490 2250 3030 2370

2370 2780 2620 2370 2780 2780

2490 2910 2780 2490 2250 2620

2620 2620 2910 2620 2370 2490

2780 3030 2780 2780 3150 2910

2910 2910 2780 2910 2780 2780

3030 2910 2780 3030 2780 2780

3150 2910 3030 3150 3280 3030

3280 2370 2780 3280 2910 2910

3490 2490 2490 3490 2490 3490

105

AAMRL RBF AAMRL RBF


100 1690 3280 100 1480 1480

320 450 100 320 580 580

450 320 320 450 320 1110

580 450 1350 580 320 690

690 320 320 690 980 690

820 580 580 820 450 320

980 690 450 980 690 820

1110 580 820 1110 690 1480

1230 580 450 1230 980 1110

1350 450 1480 1350 1230 1480

1480 1110 690 1480 1230 1230

1690 100 1910 1690 1480 1690

1910 2490 1910 1910 2490 1690

2120 2490 2910 2120 2370 2120

2250 2370 3150 2250 2370 3280

2370 3030 2620 2370 2490 2120

2490 3150 2490 2490 2620 2910

2620 2490 2780 2620 2370 2120

2780 3030 2780 2780 2620 3030

2910 2620 3150 2910 3030 3030

3Q30 3030 2780 3030 2490 2620

3150 29i1 3150 3150 3150 2620

3280 303 0 2910 3280 2120 2780

3490 3150 2120 3490 3150 2120

106

AAMRL RBF AAMRL RBF


100 580 1690 100 100 1690

320 1230 1350 320 450 320

450 580 980 450 1230 1480

580 580 450 580 820 1110

690 980 690 690 1230 690

820 580 690 820 1230 1110

980 450 1110 980 1110 1110

1110 1110 820 1110 980 1110

1230 980 580 1230 690 1350

1350 1480 1480 1350 1110 1350

1480 1230 1350 1480 1350 1480

1690 690 100 1690 1480 320

1910 1910 2120 1910 2490 2120

2120 2250 1910 2120 2910 2370

2250 2910 2490 2250 2910 3030

2370 2370 2620 2370 3150 3150

2490 2620 2910 2490 2370 3150

2620 2910 2780 2620 3150 3"50

2780 2250 2370 2780 3150 3030

2910 2370 3030 2910 3030 3150

3030 3150 3030 3030 3150 3150

3150 3030 2490 3150 3280 3030

3280 2780 3030 3280 2120 3280

3490 3280 3280 3490 2910 3280

107

AAMRL RBF AAMRL RBF


100 3490 100 100 1690 3280

320 1480 1480 320 1480 100

450 1230 580 450 1350 580

580 1110 820 580 1110 1230

690 580 690 690 690 580

820 690 1110 820 690 580

980 1230 1230 980 580 1110

1110 690 1230 1110 580 820

1230 690 1230 1230 1110 820

1350 1350 1350 1350 1230 580

1480 1480 1480 1480 100 580

1690 1690 1480 1690 2120 1910

1910 2250 1910 1910 2490 2120

2120 2370 2120 2120 2370 3030

2250 3150 2370 2250 2490 2780

2370 3150 2490 2370 2910 2620

2490 3030 2370 2490 2370 2780

2620 3030 2910 2620 3030 2910

2780 3280 3030 2780 2780 2780

2910 3030 3150 2910 2910 2780

303 0 2370 3150 303 0 303 0 3150

3150 30 0 3150 3150 2910 3030

3280 2250 3280 3280 2490 3150

3490 2120 3150 3490 3030 2120

108

AAMRL RBF AAMRL RBF


100 450 2370 100 1690 1480

320 320 580 320 1110 1230

450 580 450 450 980 820

580 1110 580 580 820 690

690 1110 690 690 820 820

820 980 690 820 580 580

980 820 580 980 1230 820

1110 980 1110 1110 690 690

1230 820 980 1230 820 980

1350 1110 580 1350 980 1230

1480 580 450 1480 1350 1110

1690 1910 1690 1690 1690 1690

1910 2120 2120 1910 1910 2120

2120 2370 2370 2120 2370 2250

2250 2370 2490 2250 2370 2370

2370 2370 2490 2370 2780 2370

2490 2490 2490 2490 2620 2780

2620 2910 2910 2620 2780 2620

2780 2910 2620 2780 2780 2780

2910 2780 3030 2910 2370 2910

3030 2490 2490 3030 00 2910

3150 3030 2780 3150 2120 2620

3280 3030 2370 3280 2250 2370

3490 2250 2370 3490 3280 1910

109

AAMRL RBF AAMRL RBF


100 1350 1690 100 1690 1480

320 1110 980 320 1350 580

450 1230 580 450 980 1230

580 820 1230 580 820 690

690 580 820 690 450 580

820 820 820 820 450 820

980 690 820 980 690 580

1110 690 1110 1110 690 580

1230 820 980 1230 1110 980

1350 580 1480 1350 1110 1350

1480 1350 1480 1480 1350 1480

1690 1480 1480 1690 1480 1690

1910 2490 1910 1910 2120 1910

2120 2120 2250 2120 2250 2250

2250 2910 2620 2250 2780 2620

2370 2780 2370 2370 2780 2910

2490 2620 2490 2490 2620 2620

2620 2780 2780 2620 3150 3030

2780 3030 2620 2780 3150 3030

2910 2910 2490 2910 3030 2910

3030 2910 3030 3030 3030 2780

3150 2620 3030 3150 2620 2620

3280 2910 3150 3280 3030 2620

3490 3150 2120 3490 2910 1910

110

AAMRL RBF AAMRL RBF


100 100 100 100 1480 100

320 450 320 320 580 820

450 320 320 450 690 980

580 690 320 580 690 580

690 450 320 690 690 980

820 450 820 820 820 1110

980 580 450 980 980 580

1110 450 450 1110 1110 820

1230 690 450 1230 690 1230

1350 580 320 1350 1110 690

1480 320 320 1480 690 1110

1690 100 100 1690 1230 1230

1910 100 100 1910 100 2120

2120 3150 3280 2120 2370 2250

2250 3150 2910 2250 2490 2370

2370 3150 3150 2370 2780 2490

2490 2780 3030 2490 2490 2780

2620 2910 3150 2620 2910 2780

2780 3030 303 0 2780 2780 2620

2910 2780 3030 2910 2490 2910

3030 3150 2780 3030 2620 3030

3150 3150 3030 3150 2490 2620

3280 3280 3280 3280 2370 3030

3490 3280 3490 3490 2370 2370

111

10 349 10 349

2332 - - - -28 2- 1 32

45`/S ,315 45/ %315

S58 \303

69 291 69291

82 1 .278 82 I2781 198 1 262 98 |1262

11/1249Ill\//249

123 123 237

135 "225 13 225

"3N9 -212

199 191 169 191

10 349 10 349

3,2 - - -- - '- 328 32 . -- - - .-- 328

545/0 % 11315 45,1- %.315

8 \303 0/ \303

69 291 89 291

82 278 82 278

98 1262 1,

9 35 %39 225 135322

•148 " , " 212 148 -'. ., " 212

189 191 189 191

10 349 10 349

a2 ,.•.,3128 32 328"" , .41,45 • %%315 45/ %315

58 / \303 58 f-/ %303

69 \291 691 291

/ /

82 l•••278 82 [••278

981262 18262

Il\29111 \' 249

123 /237 123 /237

135 225 135% /225

148 - - - - - - - 212 1489 -" , -__ - - - 212

199 191 169 191

Figure 69. All subjects' responses for actual locations, 10', 32', 459 , 58', 69' and

82' shown left to right, top to bottom respectively. Notation: x andsolid line indicates with AAMRL HRTF, o and dashed line indicate RBFHRTF. The arcs show the standard deviation of the samples. The,* and+ indicates the mean AAMRL and RBF HRTF respectively.

112

10 349 10 349

32 328 32 - - -... - .-328

4 5 0 %31 5 4 5 -1 3 1 5

58303 / %303

9 ~291 8/ ~291

278 8278

88 1262 98- 1262

1 /1249 Il\14

123 237 123\ /237

135% .225 135% /225

1 4 8 -- 2 1 2 1 4 8 '- o' 2 1 2

169 191 169 191

10 349 10 349

32 328 32 - .328

45 \ %315 45 N 0 ' 3 15.,

,3° 5 /8 %303 58, 303

69 I28l291 69 291

82 '278 82 278

98 1262 988 1 262

Ill 24 Il\249123\, 237 123% / /237

135%. 225 135' 225

148'. 212 148 '..12

169 191 169 191

10 349 10 349

32 ,328 32 - .328

45 / ' .315 45 %.-,315

5 303 58/ %303

19 291 9 1291

821 278 821 278

Fig re70 Al s2ec s8rs on e fo c u l o ai ns 8 ,11 ° 2 1 35,848

I''

3

'

98 24288 1242

Il\%/24 ill 24123 /237 123Q /237


same as above.

113

10 349 10 349

- 32 -- 328 32 - .328

45/ %.315 45 .p , %315

58/ 303 58/ / 3'303/

/ / •9 #291 89/ / \ 291

82 % 278 82 278

3I

I

98 1 262 98

1 262\I

/ 11

1 1 1 \2 4 9 i l l \ / 2 4 9

123. \/237 12323 / '237

/'.

/ d23

135 /225 135'. 225

148 . 212 148 * '. -'- 212

169 191 189 191

10 349 10 349

32 328 32- .328

45 ' ., 315 45 -%315

58•0 / /. <3 58 5 3',03

69 2/ ,1/ 92 9 1I I

82//19 82/8212 278 83

98% ,2,2 8 /282

,249 I 249

93/ l6 98

/

123'. \ / 237 123'. \ / 237/\S.

\ /

135 \0 225 135 % / 225S.

S

1 48 212 148 '212

169 191 188 191

10 349 10 349

32 328 32 - .3284 5 ' % 3 1 5 4 5 N 3 1 5

821 1278 82 '1278

I 2

98% V262 98% 1283

1il 249 ill %

249

123'\ . ",/,237 123'. \ /237135 %%,,

/225 135 %%b

/225

148 212- 148 .=------ 212

* 169 191 168 191

Figure 71. All subjects' responses for actual locations, 1910, 2120, 2250, 2370, 2490and 262' shown left to right, top to bottom respectively. Notation issame as above.

114

10 349 10 349

- 32 - -- . ",,,328 32 - .... -'.-- 328

45 // -' % % 315 45 - % ,315

"5 / / / 3 o 5 "58 / 3 5/ 03

69/ 291 89/29I

I821 278 82 27

98 , . 1262 982 '262

1 1 •" I2 4 9

ill /24 111% 412% 237 123~ 237

123 \3

135% 9225 135% 225

14 21 - 48 . --- 1

169 191 169 191

10 349 10 349

32 -, 328 32 - - -- - - 328

4 5 % 3 1 5 4 5 / % 3 1 5

58/ 3 58/ 303

69/ 291 9 291

82 278 82 278

98 1262 98 '2622 419 1 1 1 %

2 4 9

I I

1237 123 237a /

a

135 N . 225 135 ."225

148 ' -.- - - - - 212 148 . - - - - . 212

169 191 169 191

10 349 10 349

• -- '-'--S 3 .. **-. 32

3 2" .3 2 8 3 2 . -

45 %,.%315 45 -31

58/ 58 /

69/ %291 69/29

62 .278 278

98 262 98 1262

I\249 i% \249

2237 123 • /237123 \3

135 225 135% /225

148 . - - 212 148 - s - 212

169 191 169 191

Figure 72. All subjects' responses for actual locations, 2780, 2910, 3030, 3150, 3280and 3490 shown left to right, top to bottom respectively. Notation issame as above.

115

Appendix C. Matlab M-files

A sampling of the graphic commands in Matlab is listed in table 16.

Line and area fill commands. 3-D objects.plot3 Plot lines and points in 3-D. cylinder Generate cylinder.fill3 Draw filled 3-D polygons in 3-D. sphere Generate sphere.

comet3 3-D comet-like trajectories. Volume visualization.Contour and other 2-D plot slice Volumetric visualization plots.

of 3-D data. Graph appearance.contour Contour plot. view 3-D graph viewpoint

contour3 3-D contour plot. specification.clabel Contour plot elevation labels. viewmtx View transformation matrices.

contourc Contour plot computation. hidden Mesh hidden line removal mode.pcolor Pseudocolor (checkerboard) plot. shading Color shading mode.quiver Quiver plot. axis Axis scaling and appearance.

Surface and mesh plots. caxis Pseudocolor axis scaling.mesh 3-D mesh surface. colormap Color look-up table.

meshc Combination mesh/contour plot. Graph annotation.

meshz 3-D Mesh with zero plane. text Text annotation.surf 3-D shaded surface. title Graph title.

surfc Combination surf/contour plot. xlabel X-axis label.surfl 3-D shaded surface with lighting. ylabel Y-axis label.

waterfall Waterfall plot. zlabel Z-axis label for 3-D plots.

Table 16. Three dimensional graphic commands. Copyright (c) 1984-93 by TheMathWorks, Inc.

C.1 angleresponse.m

function [meanerr, stdall] =angleresponse (allsubjects);% ANGLERESPONSE Determines the mean error and standard deviation from file% allsubjects for each angle of azimuth in experiments one and two.% Written by Capt John K. Millhouse

[m,n]=size(allsubjects);% correct for bad responsesfor i=1:m,

for j=l:n,if allsubjects(i,j)==O,

allsubjects(i,j)=allsubjects(i,1);endif abs(allsubjects(i,j)-allsubjects(i,l))<180,

116

errall(i,j)=allsubjects(i,j)-allsubjects(i,l);else

errall(i,j)=allsubjects(i,j)-allsubjects(i,l)-360;end -

endendaz=errall(1:24,1);%. calculate mean and std and plot resultsfor i=1:24,

meanerr(i,1:3)=mea~n(errall(i:24:480,1:3));stdall(i,1:3)=(std(errall(i:24:480,1:3)));thetal~zeros (40,1);thetal(1:2:40)=allsubjects(i:24:480,2)*pi/180;rho lzeros (40, 1);rhol(1:2 :40) =ones (20 ,1);theta2=zeros (40,1);theta2(1:2:40)=allsubjects(i:24:480,3)*pi/180;rho2=zeros (40,1);rho2(1:2:40)=.95*ones(20,1);mypolar(thetal,rhol, 'g-')

hold onmypolar(theta2,rho2, 'r:')mypolar(allsubjects(i:24:480,2)*pi/180,ones(20, 1) ,Igmypolar(allsubjects(i:24:480,3)*pi/180, .95*ones(20,1), 'ro')mypolar([0 (allsubjects(i,1)+meanerr(i,2))*pi/180] ..

[0 1.05],'g*')mypolar([0 (allsubjects(i,1)+meanerr(i,3))*pi/180],.

[0 1.05],'r*')hold offtitle(['azimuth ,.

nium2str(allsubjects(i,1)),', ,.

nuna2str(meanerr(i,2)),', I,.

nuni2str(meanerr(i,3))])pause

end

C.2 anova.m

% get data %%%%%%%%%%%%%%%%7.%%.%%%%%%%%%%%allsubj ects=allsubjects2;trial=l;index=O;actual=allsubjects(:,4);with~allsubjects(: ,2);without=allsubjects(: ,3);

117

% correct for bad responses... .make mean of other subjects%without (107)=122 .78;%without (275)=122 .78;la=length(actual);for i=1:la

index~index+1;if index==25, index=1; endif with(i)0=, with(i)=sum(with(index:24:480) )/19;endif without(i)==O, without(i)=sum(without(index:24:480))/19;end

endfor i1l:la

if with(i)<180,rwith(i)=180-with(i);

elserwith(i)=540-with(i);

endif without(i)<180,

rwithout (i)=180-without Ci);else

rwithout Ci) =540-without Ci);end

end

witherr=zeros (480,1);withouterr~zeros (480, 1);rwitherr=zeros (480,1);rwithouterr=zeros (480,1);for i1l:la,

witherr Ci) =actual Ci) -with(i);withouterr(i)=actual(i)-without Ci);rwitherr(i)=actual (i)-rwith(i);rwithouterr(i)=actual(i)-rwithout Ci);if abs(witherr(i)).>180,

witherr(i)=abs(360 - abs(witherr(i)));else

witherr (i)'=abs (witherr Ci));endif abs Cwithouterr Ci)) >180,

withouterr~i)=abs (360 - abs Cwithouterr~i)));else

withouterr~i)=abs Cwithouterr~i));end

118

if abs(rwitherr(i))>180,rwitherr(i)=abs(360 - abs(rwitherr(i)));

elserwitherr(i)=abs (rwitherr(i));

endif abs(rwithouterr(i) )>180,

rwithouterr(i)=abs(360 - abs(rwithouterr(i)));else

rwithouterr (i)=abs (rwithouterr (i));end

end%correct f or reversalrevcount=O;for i1l:la,

if witherr(i)>rwitherr(i),witherr (i) =rwitherr (i);revcount~revcount+1;

endif withouterr(i)>rwithouterr(i),

withouterr(i)=rwithouterr (i);revcount=revcount+l;

endend

% start anova %%%%%%%%%%%%%%%X=[witherr withouterr];a=2;b=24;c=20;n=1;% get totals %%%%%%%%%%%%%%%%.%%/*.*./*./*/*./*./'./%%%%%%%.*/*%%%%%%

Ti~zeros(a,1);Tj=zeros(b,1);Tk~zeros(c,1);Tij~zeros(a,b);Tij2=zeros(a,b);Tik=zeros(a,c);Tjk~zeros(b,c);for i=l:afor j1l:bfor k=1:c

Ti(i)=Ti(i)+getx(X,i,j ,k);Tj (j)=Tj (j)+getx(X,i,j ,k);Tk(k)=Tk(k)+getx(X,i,j ,k);Tij (i,j)=Tij (i,j)+getx(X,i,j ,k);

119

Tik(i,k)=Tik(i,k)-9getx(X,i,j ,k);

Tjk(j ,k)=Tjk(j ,k)+getx(X,i,j ,k);

end

end

end

Tijk=X;

T~sum(sum(X));

% get sum of squares %Y%%%%%%%.Y%%%%%7%%%%%%%%%%SSa=O; SSb=O; SSc=O;

SSab=O; SSac=O; SSbc=O;

SSt=O; SSe=O;

for i1l:a

end

for j=1:b

SSb=SSb-i Tj(j).-2/(a*c);

endfor k1i:c

SSc=SSc+ Tk(k).-2/(a*b);end

for i1l:afor j1l:b

SSab=SSab+ Tij(i,j).-2/(c);

end

end

for i=1:a

for k=1:c

SSac=SSac+ Tik(i,k).-2/(b);

endend

for j1l:b

for k1l:c

SSbc=SSbc+ Tjk(j,k).-2/(a);

end

endSSab=SSab -SSa -SSb.+ T.-2/(a*b*c);

SSac=SSac -SSa -SSc + T.^2/(a*b*c);SSbc=SSbc -SSb -SSc + T.-2/(a*b*c);

SSa=SSa- T.-2/(a*b*c);

SSb=SSb- T.-2/(a*b*c);

SSc=SSc- T.^21(a*b*c);for i=l:a

for j=1:b

for k1l:c

SSt=SSt+ getx(X,i,j,k).-2;

120

endendendSSt=SSt- T.-2/(a*b*c);SHe= SSt - SSa - SSb - SSc -SSab -SSac -SSbc;

%mean squareMSa=SSa/(a-i);MSb=SSb/ (b-i);MSc=SSc/(c-i);MSab=SSab/( (a-i) *(b-i) )MSac=SSac/( (a-i) *(c-i) )MSbc=SSbc/( (b-i) *(c-i) )DFe=(a*b*c-MSe=SSe/DFe;

% F valueFa=MSa/MSe;Fb=MSb/MSe;Fc=MSc/MSe;Fab=MSab/MSe;Fac=MSac/MSe;Fbc=MSbc/MSe;

%. displaytable= [I a-i SSa MSa Fa

2 b-i SSb MSb Fb3 c-i SSc MSc Fci2 (a-i)*(b-i) SSab HSab Fabi3 (a-i)*(c-i) SSac MSac Fac23 (b-i)*(c-i) SSbc MSbc Fbco DFe SSe MSe 0o (a*b*c -i) SSt 0 0]

C. 3 drc.mr

function [range,minvalue,maxvaluel~drc(matrix);

%h Dynamic Range Compression (drc)% return range between 0 and i of input matrix columns%. Written by Capt John K. Millhouse

[m n] =size (matrix);

temp-matrix- (ones (in,i)*min (matrix));

121

range=temp./(ones(m,1)*max(temp));minvalue--min (matrix);

maxvalue--max (temp);

C.4 expmatl.mn

% Setup for experiment one% Creates a script file called 'run-.experiment'% Written by Capt John K. Millhouse

expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237

249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200

9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900

26200 27800 29100 30300 31500 32800 34900];

% randomly place sounds

row=randperm(48);

for i=1:48,

snd(i)=expaz(row(i));

temp=[PO' int2str(snd(i))];

if snd(i)< 100tsnd(i,1:3)=temp(1:3);

elseif snd(i) < 1000tsnd(i,1:3)=temp(2:4);


else

tsnd(i,1:3)=temp(2:4);

end

end

fid = fopen('run-.experiment' ,lw

fprintf(fid, '#!/bin/csh %s\n',' Y);

fprintf(fid, 'makecompass &%s\n',' ');

fprintf(fid, 'introduction %s\n',' ');

fprintf(fid,'set ans = $< %s\n',' 1);

for i=1:48,

if snd(i)<1000

texl['s32cplay -f40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'];

tex2=['s32cplay -f 40000 . ./actualfilters/rl' int2str(snd(i)) '.d

echo ''Aactual I tsnd(i,:) "I >> results.out %s\n'];

else

122

texl=[Ls32cplay -f40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d %s\n'];

tex2=Ii's32cplay -f40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d

echo ''Bactual ' tsnd(i,:) ''I >> results.out %s\n'];

end

fprintf(iid,texl,,' ');

fprintf(fid,'sleep 1 %s\n',' ');

fprintf(fid,tex2,' ');

fprintf(fid,'clear Ys\n',' ');

fprintf(fid,'echo "'Select location and press return'" %s\n', );

fprintf(fid,'set ans =$< %s\n',' ');

end

end

fprintf(fid,'s32cplay -f 40000 thankyou.d %s\n',' ');

fclose(fid);

C.5 expmat2m

% Setup for experiment two% Creates a script file called 'run-..experiment'% Written by Capt John K. Millhouse

expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237

249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200

9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900

26200 27800 29100 30300 31500 32800 34900];

% randomly place soundsrow=randperm(48);

for i=1:48,

snd(i)=expaz(row(i));

temp=['0' int2str(snd(i))];

if snd(i)< 100



elseif snd(i)' < 10000tsnd(i,1:3)=temp(1:3);

else


end

end

123

fid = fopen('run-.experiment' ,'w');fprintf(fid, '#!/bin/csh %s\n',' ');fprintf(fid, 'makecompass &Ys\n',' ');fprintf(fid, 'introduction %s\n',' ');fprintf(fid,'set axis = $< %s\n',' ');

for i=1:48,

if snd(i)<1000

texl=['echo ''Aactual ' tsnd(i,:) "'' >> results.out;

s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'.J;

tex2=['s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'1;

else

texl=['echo ''Bactual I tsnd(i,:) ''I >> results.out;

s32cplay -f 40000 . ./netfilters/rl' int2str(snd(i)/100) '.d %s\n'];tex2=['s32cplay -f 40000 ../netfilters/rl' int2str(snd(i)/100) '.d %s\n'];

endfprintf(fid,texl,' ');fprintf(fid,'sleep 1 %s\n',' ');fprintf(fid,tex2,' ');fprintf(fid,'clear %s\n', );

fprintf(fid,'echo ''Select location and press return'' %s\n',' )fprintf(fid,'set axis = $< %s\n',' ');

end

end

fprintf(fid,'s32cplay -f40000 thankyou.d %s\n',' ');

fclose(fid);

C. 6 idrc.mr

function [matrix]=idrc(range,minvalue,maxvalue);

% inverse Dynamic Range Compression (idrc)% return matrix to original values% Written by Capt John K. Millhouse

[m n]=size(range);

temp~range.*(ones(m,l)*maxvalue);matrix~temp+(ones Cm, 1)*minvalue);

124

C.7 LRsphere.m

% Create left and right spheres of HRTF data% Written by Capt John K. Milihousek=1;for i=1:2:90,freq-i;[x y z c]=makesphere(features,freq,16);

subplot(1,2,1);surf(x,y,z,c);caxis([0, 0.7915]);view(180,0);shading interp,axis offtitle (nixn2str (features Ci, 4)))text(.1,1,0,'(&) <-- Right ear')xlabel('right side of head');

subplot (1,2, 2);surf(x,y,z,c);caxis([0, 0.7915]);view(0,0);shading interpaxis offtext(-.1,-1,0,'(&) <-- Left ear')text(-1.8,0,0,'<-- Nose -- >')

xlabel('left side of head');title(C'Hz ' )

colormapC jet)k=k+l;end

0.8 makesphere. m

function Ex y ,z, c]ft makesphere(features,freq,size)% freq range 1 to 93%. Create sphere with color corresponding to HRTF value% Written by Capt John K. Millhousek=0;clear az elev respfor i~freq:93: 25296,

k~k+1;az(k)=features(i,2);elev(k)=features(i,3);

125

resp(k)=features(i,5);endc--mapsurf(az,elev,resp,size);Ex y z]=sphere(size);

C.9 mapsurf~m

function [c] = mapsurf(az,elev,value,spheresize)% maps the HRTF value to the az and elev% locations on a sphere% Written by Capt John K. Millhouse

n~length(az);1=0;

c =zeros(spheresize+1,spheresize+1);v =c;

for I =1:n,a fix(az(I)/360*(spheresize+1))+1;e =spheresize+1 - fix((elev(I)+90)/180*(spheresize+l));if e>spheresize+1

e=spheresize+1;endif e<1

e=1;endif a>spheresize-I-

a=spheresize+1;endif a<1

a=1;endv(e ,a)=v(e ,a)+1;c(e,a)=c(e,a)+value(I);

end

for I = :spheresize+1,for J = 1:spheresize+l,

if v(I,J)==0else

c(I,J) = c(I,J)/v(I,J);end

end

126

end

C.10 mapeval.m

function [actual, netout, error]=mapeval(x, y, data);% Written by Capt John K. Millhouse% Converts LNKmap output to Matlab matrix suitable for% display% data in form [number actual netout error]

nx=length(x);ny=length(y);actual=zeros(nx,ny);netout=zeros(nx,ny);error=zeros(nx,ny);k=O;

for i=1:nx,for j=l:ny,

k=k+l;actual(i,j)=data(k,2);netout(i,j)=data(k,3);error(i,j)=data(k,4);

endendactual=actual';netout=netout';error=error';

C.11 mypolar.m

function hpol = mypolar(theta,rho,line.style)%POLAR Polar coordinate plot.% POLAR(THETA, RHO) makes a plot using polar coordinates of% the angle THETA, in radians, versus the radius RHO.

% POLAR(THETA,RHO,S) uses the linestyle specified in string S.

% See PLOT for a description of legal linestyles.

% See also PLOT, LOGLOG, SEMILOGX, SEMILOGY.

% Copyright (c) 1984-93 by The MathWorks, Inc.% Modified by Capt John K. Millhouse to display% head figure and test locations.

if nargin < 1error('Requires 2 or 3 input arguments.')

elseif nargin == 2

127

if isstr(rho)line-style = rho;rho = theta;

[mr,nr] = size(rho);if mr == 1

theta = 1:nr;else

th = (1:mr)';theta = th(:,ones(1,nr));

endelse

line-style = 'auto';end

elseif nargin == 1line-style = 'auto';rho = theta;

[mr,nr] = size(rho);if mr == 1

theta = 1:nr;else

th = (1:mr)';theta = th(:,ones(1,nr));

endendif isstr(theta) I isstr(rho)

error('Input arguments must be numeric.');

endif any(size(theta) -= size(rho))

error('THETA and RHO must be the same size.');

end

% get hold state

cax= newplot;next = lower(get(cax,'NextPlot'));hold-state = ishold;

% get x-axis text color so grid is in same colortc = get(cax,'xcolor');

% only do grids if hold is off

if -hold-state

% make a radial gridhold on;

hhh=plot([O max(theta(:))],I[o max(abs(rho(:)))]);

128

v = [get(cax,'xlim') get(cax,'ylim')];ticks = length(get(cax,'ytick'));delete (hhh);

% check radial limits and ticksrmin = 0; rmax = v(4); rticks = ticks-i;if rticks > 5 % see if we can reduce the number

if rem(rticks,2) == 0rticks = rticks/2;

elseif rem(rticks,3) == 0rticks = rticks/3;

endend

%. define a circleth = 0:pi/50:2*pi;xunit = cos(th);yunit = sin(th);

%now really force points on x/y axes to lie on them exactlyinds = [1: (length(th)-1)/4:length(th)];xiumits(inds(2:2:4)) = zeros(2,I);yunits(inds(1:2:5)) = zeros(3,1);ynose=[.24; .3; .24]*rmax;xnose=[.05; 0; -.05]*rmax;year=[.05; .05; -.05; -.05]*rmax;xear=[.24; .3; .3; .24]*rmax;

rinc = (rmax-rmin)/rticks;i=rmax;plot (xunit*i,yumit*i,':','color' ,tc, 'linewidth' ,1);i=rmax/4;plot(xunit*i,yiumit*i, '-','color' ,tc, 'linewidth' ,1);plot(xnose,ynose, '-','color' ,tc, 'linewidth' ,1);plot(xear,year, '-','color' ,tc, 'linewidth' 4);plot(-xear,year, '-','color' ,tc, 'linewidth' ,1);

% plot spokesth =[10 32 45 58 69 82 98 111 123 135 148 169 191

212 225 237 249 262 278 291 303 315 328 349] *pi/l8O;cst =-sin(th);

snt =cos(th);

cs = zeros(1,24); cst];sn =[zeros(1,24); snt];%plot(rmax*cs,rmax*sn,': ','color' ,tc, 'linewidth' 4);

129

% annotate spokes in degrees

rt = 1.1*rmax;for i = 1:max(size(th))

text(rt*cst(i),rt*snt(i),int2str(th(i)*180/pi),'horizontalalignment','center')

end

% set viewto 2-Dview(O,90);

% set axis limitsaxis(rmax*[-1 1 -1.1 1.1]);

end

% transform data to Cartesian coordinates.xx = rho.*(-sin(theta));yy = rho.*cos(theta);

% plot data on top of gridif strcmp(line.style,'auto')

q = plot(xx,yy);else

q = plot(xx,yy,line-style);end

if nargout > 0hpol = q;

endif -hold-state

axis('equal');axis('off');end

% reset hold stateif -hold-state, set(cax,'NextPlot',next); end

C.12 plotresult.m

function [h]=plotresult(result,rquestion);

% plot results of a subjects response% Written by Capt John K. Millhouseif nargin < 2

rquestion = 'nor';endrhoslO=ones(1,360);rhos20=ones(1,360);

130

for i=1:length(result),thetal=result (i, 2) *pi/180;theta2=(result (i 33)) *pi/180;lt='yo'; -

if rquestion'=rev',if abs(result(i,4))>abs(result(i,6))

theta2=(result (i, 5)) *pi/180;

end r+'end

index=round(theta2*180/pi)+1;if result(i,1)=10

rhoslO(index)=rhosl0(index)- .05;

rholO~rhoslO(index);subplot (1,2, 1)mypolar(thetal,1, 'gx')hold onmypolar (theta2, rho10, lt)mypolar( [thetal; theta2], [1; rholO] ,)g

title('With HRTF')else

rhos20(index)=rhos2O(index)- .05;rho20=rhos20 (index);subplot(1,2,2)mypolar(thetail , 1 mx')hold onmypolar(theta2,rho20 ,lt)mypolar( Ithetal; theta2] ,[1; rho20] ,)i-'))title('Withbut HRTF')

endend

for i=1:2subplot (1,2, i)hold offend

C.13 plot hi st.m

function [correct lOtheta, correct20theta] =plothist (testall ,testlO,test2O,rquestion,correctdeg);

% Written by Capt John K. Millhouse% Plots Polar histogram of data of correct responses%. testall is all test responses

131

% testlO is test responses for filterl% test20 is test responses for filter2% rquestion is 'rev' reverses or 'nor' no reverses% correctdeg is the number of degrees off an still correct

if nargin<5correctdeg=23;

end%%%%%%% plot histogramsactualtheta=testall(:,2)*pi/180;testtheta=testall(:,3)*pi/180;testwrevtheta=[testlO(:,3)*pi/180; test20(:,3)*pi/180];testlOrevtheta=[testlO(:,3)*pi/180];test20revtheta=[test20(:,3)*pi/180];testlOtheta=testall(l:length(testlO),3)*pi/180;test2Otheta=testall(length(testlO)+l:length(testall),3)*pi/180;for i=l:length(testlO),

if abs(testlO(i,4))<correctdeg,correctlOtheta=[correctlOtheta; testlO(i,2)*pi/180];

endendfor i=l:length(test20),

if abs(test20(i,4))<correctdeg,correct2Otheta=[correct2Otheta; test20(i,2)*pi/180J;

endendbins=180;

% number of bins in the histogram% if the number of bins is >90 then offset the bins for displayif bins>90

correctlOtheta=correctlOtheta+(pi/90);correct20theta=correct2Otheta-(pi/90);testlOtheta=testlOtheta+(pi/90);test20theta=test2Otheta-(pi/90);

end[tact,ract]=rose(actualtheta,bins);[tclO,rclO]=rose(correctlOtheta,bins);[tc20,rc2O]=rose(coirect2Otheta,bins);

[tlO ,rlO] =rose (test lOtheta,bins);[t20,r20] =rose (test2Otheta,bins);

figuremypolar(tact,ract/2,'w:')hold onmypolar(tlO,rlO,'w-.')

132

mypolar(t20,r20, 'w-')title(['Histogram of all responses'])legend('w:','Actual','w-.','with HRTF','w-','without HRTF')hold offfiguremypolar(tact,ract/2, 'w:')hold onmypolar(tclO,rclO, 'w-.')

mypolar(tc2O,rc2O, 'w-')title(['Histograin of correct responses I I(' rquestion ')'])legend('w:','Actual','w-.','with HRTF correct','w-','without HRTF correct')

hold off

C. 14 readall. m

function [rmserr,stderr,testall,revall,testlO,test2O]=readall(rquestion, subjects);

%%%%%%%%%% %%%%../ %%%%%%%/W.././/%Y%%%%%%%.%%%%%%%%%%%%%%%X%%%. Read all the test subjects data and save to a %% file named alisubjects%% by J. Millhouse%

%folder= ['meanHRTFresults'];folder= ['results'];

if nargin < 1rquestion = 'nor'

endif nargin<2

if length(folder)=15% meanxHRTFresults %%%%%.%%%%%%%.%%%%%X%%%%%%%%Y%%X%%Y..%%%%%%%%%%%%subjects=['bsmit~hl.out , 'bsmith2.out' ,'jmillhousel.out',....

'jmillhouse2.out' ,'bmcql.out' , bmcq2.out',...'ceisenbiesl.out' ,'ceisenbies2.out , 'gharrupl.out',...'~lmyersi.out' , lmyers2.out' , rdenneryl.out',....'sstewartl.out' , twilsonl.out' , twilson2.out',..'amillhousel.out , 'bobl.out' , jrmillhousel.out',....'djenningsl.out' ,'rjreidi.out'];

else% results %%%%%%%%%%%%%%XX%%%%%%%X%%%%%%%%%%%%%Y%%%Y%%Y/%%%%%%subjects=['ksmithl.out' , rdenneryl.out' ,'jmillhousel.out',....

133

'jmillhouse2.out' , bsmithi.out' , twilsonl.out',...'gharrupl.out , 'jrmillhousel.out' ,'jrmillhouse2.out',...'lmyers2.out' , lmyers3.out' , ddouglasl.out',...

- ddouglas2.out' , rmillhousel .out' , rmillhouse2.out',...'ceisenbiesl.out , 'ceisenbies2.out' , djenningsl.outI,....'srogersl.out' ,'amillhousel.out'];

end

endtestall=EJ;testall2=[];revall=[];start=1;finish=1;for i=4: length(subj ects)

j=i-3;if subjects(j :i)==3.out'

finish~i;na~me=subjects(start :finish-4)[test rev]=readresult([folder '/1 name 'ot)%f igure%plotresult (test, rquestion);start~finish+1;testall=[testall; test];revall=trevall; rev];

%%%%%%%%%%%%%.%%%%%X%%%%%%%%%X%%X7..%%%%%%%.Y%%%%%%% make output formatevalCEname '=zeros(24,3);I]);f=sortdata(testC: ,1:3) ,2);

c=0;oldaz=O;for a=1:length(f)

if f(a1l)==1O,b=2;

elseb=3;

endaz=f(a,2);if oldaz<az c~c+l; endoldaz~az;eval([name '(c,1)=f(a,2);']);eval(Ename '(c,b)=f(a,3);']);

endeval(['testall2=[testall2; I name 1')

end

134

endeval(['save ' folder '/allsubjects testall2 -ascii -tabs'])testall=sortdata(testall, 1);revall=sum(revall);

testi 10=[;test20=[];test=testall;for i=l:length(testall)%%%%%%%%%%%%%%7Y/%/%/%7%%%%%%%%%%%%%%%%%%%//Y%%%%////%%%%% get reversalsif rquestion'=rev',if abs(testall(i,4))>abs(testall(i,6))

testCi, 4) =testCi, 6);test(i,6)=test(i,3);test(i,3)=test(i,5);test(i,5)=test(i,6);test(i,6)=999;

endend

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% split into twoif testall(i,1)==10

testlo=[testlo; test(i,:)];else

test20=[test2O; test(i,:)];end

end

avgerr=[mean(testlO(: ,4)) mean(test20(: ,4))];rmserr=sqrt(avgerr. -2);stderr=[std(testIO(: ,4)) std(test20(: ,4))];

Y%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%W/%%%%Y% plot histograms%plothist (testall, test10, test2 , rquestion);

C.15 eoeemfunction [y,z]=removerev(x);% Written by Capt John K. Hillhouse

[m,n]=size(x);

for i=i:m,if x(i,n)<999,

y~ly; x~i,:)];else

135

z=[z; x(i,:)];end

end

C.16 sortdata.m

function [ Y J=sortdata(filel, column)

En,m]=size(filel);[X,I]=sort(filei);for j1l:n,

for k=1:m,if k=column

Y j , k)=X Q , column);else

Y(j ,k)=filel(I(j ,column) ,k);end

endend

136

Bibliography

1. Abbagnaro, Louis A., et al. "Measurements of Diffraction and Interaural Delayof a Progressive Sound Wave Caused by the Human Head," Journal of theAcoustical Society of America, 58:693-700 (September 1975).

2. Begault, Durand R. "Head-Up Auditory Display for Traffic Collision AvoidanceSystem Advisories: A Preliminary Investigation," The Journal of the HumanFactors and Ergonomics society, 35(4):707-717 (December 1993).

3. Begault, Durand R. and Elizabeth M Wenzel. "Headphone Localizationof Speech," The Journal of the Human Factors and Ergonomics society,35(2):361-376 (June 1993).

4. Blattner, Mecra M., et al. "Earcons and Icons: Their Structure and CommonDesign Principles," Human-Computer Interaction, 4:11-44 (1989).

5. Blauert, Jens. Spatial Hearing. MIT Press, 1983.

6. Bly, Sara. Sound and Computer Information Presentation. Thesis, ComputingScience Group University of California, Davis, Davis, CA, March 1982.

7. Fitts, P. Psychological research on equipment design. A study of location dis-crimination ability 19, Columbus, Ohio: Ohio State University: Army Air Force,Aviation Psychology Program., 1947.

8. Furness, T. "The Super Cockpit and its human factors challenges." Proceedingsof the Human Factors Society 30th Annual Meeting. 48-52. Santa Monica, CA:Human Factors Society, 1986.

9. Gonzalez, Rafael C. and Richard E. Woods. Digital Image Processing. Addison-Wesley, 1992.

10. Kahn, M. J. "Human awakening and subsequent identification of fire-relatedcues." Proceedings of the Human Factors Society 27th Annual Meeting. 806-810. Santa Monica, CA: Human Factors Society, 1983.

11. Kuhn, George F. "Model for the Interaural Time Differences in the AzimuthalPlane," Journal of the Acoustical Society of America, 62:157-167 (July 1977).

12. McKinley, Richard L. Concept and Design of an Auditory Localization CueSynthesizer. MS thesis, Air Force Institute of Technology (AU), 1988.

13. Middlebrooks, John C. "Directional Sensitivity of Sound-Pressure Levels in theHuman Ear Canal," Journal of the Acoustical Society of America, 86:89-108(July 1989).

14. Milton, J. S. and Jesse C. Arnold. Introduction to Probability and Statistics(second Edition). McGraw-Hill, 1990.

15. Moller, Henrik. "Fundamentals of Binaural Technology," Applied Acoustics,36:171-218 (March 1992).

137

16. Neti, Chalapathy, et al. "Neural Network Models of Sound Localization Basedon Directional Filtering by the Pinna," Journal of the Acoustical Society ofAmerica, 3140-3156 (December 1992).

17. Oldfield, Simon R. and Simon P. A. Parker. "Acuity of Sound Localisation:A Topgraphy of Auditory Space. I. Normal Hearing Conditions," Perception,13:581-600 (1984).

18. Oldfield, Simon R. and Simon P. A. Parker. "Acuity of Sound Localisation: atopography of auditoy space. II. Pinna cues absent," Perception, 13:601-617(1984).

19. Park, Jooyoung and Irwin W. Sandberg. "Approximation and Radial-Basis-Function Networks," Neural Computation, 5(2):305-316 (March 1993).

20. Rogers, Steven K., et al. An Introduction to Biological and Artificial NeuralNetworks. Wright-Patterson AFB OH, 45433: Air force Institute of Technology(AU), October 1990.

21. Ruck, Dennis W. From a conversation, May 1994.

22. Sanders, Mark S. and Ernest J. McCormick. Human Factors in Engineeringand Design (seventh Edition). McGraw-Hill, 1993.

23. Scarborough, Captain Eric L. Enhancement of Audio Localization Cue Synthe-sis by Adding Environmental and Visual Cues. MS thesis, Air Force Instituteof Technology (AU), 1992.

24. Shaw, Edgar A.G. "Acoustical Characteristics of the Human External Ear."Conference on Binaural and Spatial Hearing. 1993.

25. Smith, Brain A. Binaural Room Simulation. MS thesis, School of Engineering,Air force Institute of Technology (AU), Wright-Patterson AFB OH, December1993.

26. Smith, Stuart, et al. "Stereophonic and Surface Sound Generation for Ex-ploratory Data Analysis." Chi '90 proceedings. 125-132. April 1990.

27. Spiegel, Murray R. Schaum's Outline of Theory and Problems of Statistics(second Edition). Schaum's Outline Series, McGraw-Hill, 1994.

28. Stevens, S. S. and E. B. Newman. "The localization of actual sources of sound,"American Journal of Psychology, 48:297-306 (1936).

29. Watkins, A. J. "Psychoacoustical aspects of synthesized vertical locale cues,"Journal of the Acoustical Society of America, 63:1152-1165 (1978).

30. Wenzel, Elizabeth M. "Localization in Virtual Acoustic Displays," Presence, I(1992).

31. Wightman, Frederic L. and Doris J. Kistler. "Factors Affecting Relative Impor-tance of Sound Localization Cues." Conference on Binaural and Spatial Hearing.1993.

138

Captain John K. Millhouse was born on 11 October 1966 in Milwaukee, Wis-

consin. He graduated from Centerville High School, Centerville Ohio in 1985 and

attended Ohio University from which he received a Bachelor of Science degree in

electrical engineering in June 1989. He was commissioned though the Air Force Re-

serve Officer Training Corp (AFROTC) at Ohio University and assign to the Range

and Air Base Systems Program Office, Air Combat Maneuvering Instrumentation

(ACMI) Directorate, Eglin AFB, Florida. He entered the Masters Program in the

School of Engineering, Air Force Instituted of Technology, in June 1993.

He is married to Audra L. Millhouse (Baker) from Grafton, Ohio. They have

one son James Jerome Millhouse.

139

SForm Approved

REPORT DOCUMENTATION PAGE No. 0704-0188c- - r or :-. .ec,.oh f ,tcrma.on is est,,matea :o average 1 hour e -esporse. ecoudi9 the t ,me for ee in sru os searching ex ~sng aata sources,

.ather ane •n o•lt he eq te ja:a needed, and cormpletir'g 3ha red ie- d :re *2Ilectnon of PformatlOn. Send comments reoac nq this burden estirate or any Dtier asoect of tes

zuje-_ on Of tn-a:'or. ncludng suggestions for reducing tins ourden t o -islh,ngton Headauarters Services, Directorate o- information Operations and Reocrts. 1215 Jefferson

Da,!s H~h vSuit.e 12C4, 4,,hnctci iA 22202-4302, and to tie Dffi2e f ManQaemeet and Budget, Pýnerwork Reducton Project (0704-0138), Vashinton DC 20503

1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE 3 REPORT TYPE AND DATES COVEREDecember 1994 aster's Thesis

4. TiTLE AND SUBTITLE 5. FUNDING NUMBERS

[Iead Related Transfer Function Approximation Using Neural Networks

6. AUTHOR(S)

rohn Kindell Millhouse

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATIONI. REPORT NUMBER

Air Force Institute of Technology, WPAFB OH 45433-6583 REOT/ NUME RAFIT/GE/ENG/94D-21

9. SPONSORING 'MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSORING / MONITORINGAGENCY REPORT NUMBER

None

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION!/AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE

Distribution Unlimited

13. ABSTRACT (Maximum 200 words)

This thesis determines whether an artificial neural network (ANN) can approximate the Armstrong Aerospace Medical

Research Laboratories (AAMRL) head related transfer functions (HRTF) data obtained from research at AAMRL

during the fall of 1988. The first test determines whether HRTF lends any support in sound localization when

compared to no HRTF (Interaural Time Delay only). There is a statistically significant interaction between the

location of the sound and whether the HRTF or no HRTF is used. When this interaction is removed using the

alternate F-Value, the statistics give the result of equal means for the filters and azimuth. This means that at

certain angles of azimuth, the HRTF either provides no advantage at all or hinders localization capabilities. Adding

in the corrections for reversals changes the results to where the means of the azimuth are not statistically equal. The

reversal corrections will inherently reduce the error results. However, comparing the number of reversals indicates

an advantage of using the HRTF over no HRTF. The second test determines whether HRTF and ANN lend the

same amount of imformation in sound localization when compared to each other. With reversal corrections included,a statistical advantage is not indicated, and the means of the two filters are statistically equal. Also there is a

statistically significant interaction between the location of the sound and whether the AAM RL HRTF or ANN

HRTF is used. Comparing the number of reversals does not indicate a large difference between the AAMRL HRTF

and the RBF HRTF.

14. SUBJECT TERMS 15. NUMBER OF PAGES

3D sound, HRTF, Neural Networks 15516. PRICE CODE

17. SECURITY CLASSIFICATION 18. SECURITY CLASSIFICATION 19. SECURITY CLASSIFICATION 20. LIMITATION OF ABSTRACTOF REPORT1 OF THIS PAGE OF ABSTRACT

UNCLASSIFIED UNCLASSIFIED IUNCLASSIFIED UL

NJSN 7540-01-280-5500 Standard Form 298 (Rev 2-89)Prescribed by 4NS Std Z39-18298-102

Documents

Captain. USAF - DTIC · Sad Related Transfer Function Approximation Using Neural Netv THESIS John K. Mlillhouse Captain. USAF AFIT/GE/ENG/94D-21 J, DEPARTMENT OF THE AIR FORCE AIR