Head Related Transfer Function Approximation Using Neural Networks
THESIS
John K. Millhouse, Captain, USAF
AFIT/GE/ENG/94D-21
DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY
Wright-Patterson Air Force Base, Ohio
DISTRIBUTION STATEMENT A
Approved for public release; distribution unlimited
AFIT/GE/ENG/94D-21
Head Related Transfer Function Approximation Using Neural
Networks
THESIS
Presented to the Faculty of the Graduate School of Engineering
of the Air Force Institute of Technology
Air University
In Partial Fulfillment of the
Requirements for the Degree of
Master of Science (Electrical Engineering)
John K. Millhouse, B.S. Electrical Engineering
Captain, USAF
December, 1994
Approved for public release; distribution unlimited
Acknowledgements
In thanking the many people who helped me, I will start at the very top. No
one deserves more credit for this work than my Lord and Savior, Jesus Christ.
My thesis committee, composed of Dr. Steve Rogers, Dr. Mark Oxley, Barbara
McQuiston, Dr. Dennis Ruck, and Dr. Matthew Kabrisky, was the source of
ideas and technical expertise for the research in this thesis. A number of students
were very helpful in teaching me better ways to use the Sun Workstations in the
Signal Processing Lab. In particular, Capt Brian Smith was extremely helpful with
answering my questions regarding the use of ESPS and 3D sound in general. If not
for the contributions of each of these individuals, my work load would have been
even heavier. Finally, I would like to thank my wife, Audra, for her support and
patience throughout this research effort.
John K. Millhouse
Table of Contents
Acknowledgements
List of Figures
List of Tables
Abstract
I. Introduction
    1.1 Background
    1.2 Problem
    1.3 Definitions
    1.4 Research Objectives
    1.5 Scope
    1.6 Approach
        1.6.1 Synthesizing 3D Cues
        1.6.2 Artificial Neural Networks
        1.6.3 LNKmap
    1.7 Thesis Outline
II. Background
    2.1 Introduction
    2.2 Graphical Display
    2.3 Graphical and Auditory Displays
    2.4 Acuity of Sound Localization I
        2.4.1 Interaural Time Difference
        2.4.2 Interaural Intensity Difference
        2.4.3 Experiment 1
        2.4.4 Experiment 2
        2.4.5 Discussion
    2.5 Acuity of Sound Localization II
        2.5.1 Experiment
        2.5.2 Results
    2.6 Headphone Localization of Speech
    2.7 Learning LNKmap with a simple function
        2.7.1 Linear function
        2.7.2 Non-linear functions
    2.8 Learning LNKmap with a Cosine function
        2.8.1 Cosine Revisited
        2.8.2 Conclusion
    2.9 Learning LNKmap with a Two Input Cosine function
        2.9.1 RBF Architecture
        2.9.2 MLP Architecture
        2.9.3 Conclusion
    2.10 Conclusion
III. Methodology
    3.1 Introduction
    3.2 Learning HRTF Data for Zero Elevation
        3.2.1 MLP Architecture
        3.2.2 RBF Architecture
        3.2.3 Conclusion
    3.3 Experiment 1: With and Without HRTF
        3.3.1 Objective
        3.3.2 Method
        3.3.3 The Model
    3.4 Experiment 2: AAMRL HRTF and RBF HRTF
        3.4.1 Objective
        3.4.2 Method
        3.4.3 The Model
    3.5 Conclusion
IV. Experiment Results
    4.1 Introduction
    4.2 Types of Error
        4.2.1 Algebraic Error
        4.2.2 Absolute Error
        4.2.3 Reversals
        4.2.4 Default to 90°
    4.3 Experiment One Results
        4.3.1 Further examination of results
        4.3.2 Summary of Experiment One Results
    4.4 Experiment Two Results
        4.4.1 Further examination of results
        4.4.2 Summary of Experiment Two Results
V. Conclusions and Recommendations
    5.1 Summary
    5.2 Conclusions
    5.3 Recommendations for Future Research
Appendix A. Experiment 1 Data: With and Without HRTF
Appendix B. Experiment 2 Data: AAMRL HRTF and RBF HRTF
Appendix C. Matlab M-files
    C.1 angleresponse.m
    C.2 anova.m
    C.3 drc.m
    C.4 expmat1.m
    C.5 expmat2.m
    C.6 idrc.m
    C.7 LRsphere.m
    C.8 makesphere.m
    C.9 mapsurf.m
    C.10 mapeval.m
    C.11 mypolar.m
    C.12 plotresult.m
    C.13 plothist.m
    C.14 readall.m
    C.15 removerev.m
    C.16 sortdata.m
Bibliography
List of Figures
1. Typical presentation forms of data
2. Block diagram of a typical signal processing transfer function
3. HRTF transforms sound to aid the brain in location perception
4. Elevation, $\phi$, measured from the ground plane to a sound source
5. Azimuth, $\theta$, of sound source to directly in front of face
6. Distance, d, of sound source to center of head
7. Geodesic sphere with sound sources at multiple locations
8. Multilayer-Perceptron (MLP) neural network with two hidden layers, 10 nodes each
9. Radial-Basis-Function (RBF) neural network
10. Training function $y = \cos(2\pi x)$
11. Training function $z = \cos(2\pi(x^2 - y^2))$
12. 272 speaker locations around the head
13. HRTF samples from 0 to 1000
14. Left ear HRTF data around the head for a frequency of 5623 Hz
15. Left ear HRTF data around the left side of head for frequencies 119 Hz to 19,953 Hz
16. Zero elevation HRTF data for the left ear. Frequency and Response shown in logarithm scale
17. A 3 layer net, 1 input node, 1 hidden node and 1 output node notated 1:1:1
18. y = x (solid) and MLP (1:1:1) output (dash) after 100 epochs
19. y = x (solid) and MLP (1:5:1) output (dash) after 1000 epochs
20. A 4 layer net, 1 input node, 5 hidden nodes, 1 hidden node and 1 output node notated 1:5:1:1
21. y = x (solid) and MLP (1:10:1) output (dash) after 1000 epochs
22. y = x (solid) and MLP (1:10:1) output (dash) after 300 epochs
23. y z -x 2 (solid) and MLP (1:10:1) output (dash)
24. x • -7y 2 (solid) and MLP (1:10:1) output (dash)
25. Training function $y = \cos(x)$, where x ranged from 0 to $2\pi$ with a step size of 0.01 for a total of 629 points
26. Dynamic Range Compression of $y = \cos(x)$ to values between 0 and 1
27. $y = \cos(x)$ (solid) and RBF (1:1:1) output (dash) with 1 center in the K-means clustering algorithm
28. $y = \cos(x)$ (solid) and RBF (1:2:1) output (dash) with 2 centers in the K-means clustering algorithm
29. $y = \cos(x)$ (solid) and RBF (1:9:1, 1:32:1) output (dash) with 9 and 32 centers in the K-means clustering algorithm
30. $y = \cos(x)$ (solid) and MLP (1:10:10:1) output (dash) after 15, 100 and 220 epochs
31. $z = \cos(2\pi(x^2 - y^2))$ where x and y ranged from 0 to 1
32. $z = \cos(2\pi(x^2 - y^2))$ where x and y ranged from -1 to 1
33. Output of RBF with 8, 64 and 128 centers in the K-means clustering algorithm
34. RMS error of RBF with 8, 64 and 128 centers in the K-means clustering algorithm
35. Difference between outputs (left) and RMS errors (right) using 64 and 128 centers in the K-means clustering algorithm
36. Symmetric sigmoid used for non-linearity function $y = \frac{2}{1 + e^{-x}} - 1$
37. Output of MLP after 220 and 440 epochs with architecture of 2:10:1
38. RMS error of MLP after 220 and 440 epochs with architecture of 2:10:1
39. Output of MLP after 20, 50 and 100 epochs with architecture of 2:64:1
40. RMS error of MLP after 20, 50 and 100 epochs with architecture of 2:64:1
41. Output of MLP after 20, 220 and 440 epochs with architecture of 2:10:10:1
42. RMS error of MLP after 20, 220 and 440 epochs with architecture of 2:10:10:1
43. Zero elevation HRTF data for the left ear. Frequency and Response shown in logarithm scale
44. Zero elevation MLP (2:50:50:1) after 7235 epochs. Frequency and Response shown in logarithm scale
45. Zero elevation RBF with 256 centers. Frequency and Response shown in logarithm scale
46. Zero elevation RMS error for the MLP (left) and RBF (right)
47. Display provided on a computer screen for selection of the perceived location
48. Algebraic Error, positive values indicate a clockwise difference and negative values indicate a counter-clockwise difference
49. Reversal, a symmetric flip about the horizontal axis through the ears
50. Example of a subject with back-front reversals with HRTF treatment (left) and a different subject with front-back reversals without HRTF treatment (right). Notation: x indicates actual location, o indicates perceived location. The line between the x and o shows the pairwise groupings. The gradual spacing toward the center of the head is for clarity only and is not an indication of perceived distance
51. Example of subject whose perceived locations defaulted to 90° and 270°. Notation same as above
52. Azimuth, $\theta$, of sound source to directly in front of face
53. Absolute mean error and standard deviation for sounds with the HRTF (solid) and without the HRTF (dash). The error bars show the standard deviation of the mean error
54. Algebraic mean error and standard deviation for sounds with the HRTF (solid) and without the HRTF (dash). The error bars show the standard deviation of the mean error
55. Mean and standard deviation for sounds with the HRTF (left) and without the HRTF (right). The arcs show the standard deviation of the mean (shown as a *)
56. Polar Histogram of all responses for sounds with the HRTF (solid) and without the HRTF (dash)
57. Polar Histogram of all responses correct within one button for sounds with the HRTF (solid) and without the HRTF (dash)
58. Polar Histogram of reversal corrected responses correct within one button for sounds with the HRTF (solid) and without the HRTF (dash)
59. Absolute mean error and standard deviation for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show the standard deviation of the mean error
60. Algebraic mean error and standard deviation for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show the standard deviation of the mean error
61. Mean and standard deviation for sounds with the AAMRL HRTF (left) and RBF HRTF (right). The arcs show the standard deviation of the mean (shown as a *)
62. Polar Histogram of all responses for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash)
63. Polar Histogram of all responses correct within one button for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash)
64. Polar Histogram of reversal corrected responses correct within one button for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash)
65. All subjects' responses for actual locations 10°, 32°, 45°, 58°, 69° and 82°, shown left to right, top to bottom respectively. Notation: x and solid line indicate with HRTF, o and dashed line indicate without HRTF. The arcs show the standard deviation of the samples. The * and + indicate the mean with and without HRTF respectively
66. All subjects' responses for actual locations 98°, 111°, 123°, 135°, 148° and 169°, shown left to right, top to bottom respectively. Notation is same as above
67. All subjects' responses for actual locations 191°, 212°, 225°, 237°, 249° and 262°, shown left to right, top to bottom respectively. Notation is same as above
68. All subjects' responses for actual locations 278°, 291°, 303°, 315°, 328° and 349°, shown left to right, top to bottom respectively. Notation is same as above
69. All subjects' responses for actual locations 10°, 32°, 45°, 58°, 69° and 82°, shown left to right, top to bottom respectively. Notation: x and solid line indicate AAMRL HRTF, o and dashed line indicate RBF HRTF. The arcs show the standard deviation of the samples. The * and + indicate the mean for the AAMRL and RBF HRTF respectively
70. All subjects' responses for actual locations 98°, 111°, 123°, 135°, 148° and 169°, shown left to right, top to bottom respectively. Notation is same as above
71. All subjects' responses for actual locations 191°, 212°, 225°, 237°, 249° and 262°, shown left to right, top to bottom respectively. Notation is same as above
72. All subjects' responses for actual locations 278°, 291°, 303°, 315°, 328° and 349°, shown left to right, top to bottom respectively. Notation is same as above
List of Tables
1. Simple linear function
2. Simple nonlinear function
3. Analysis of Variance for "Reduced three-way model"
4. Experiment 1 factors
5. Levels of "Azimuth"
6. Analysis of Variance for "Response"
7. Analysis of Variance for "Response" with reversal corrections
8. Without reversal corrections (nor)
9. With reversal corrections (rev)
10. Summary of Experiment One
11. Analysis of Variance for "Response"
12. Analysis of Variance for "Response" with reversal corrections
13. Without reversal corrections (nor)
14. With reversal corrections (rev)
15. Summary of Experiment Two
16. Three dimensional graphic commands. Copyright (c) 1984-93 by The MathWorks, Inc
AFIT/GE/ENG/94D-21
Abstract
This thesis determines whether an artificial neural network (ANN) can
approximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head
related transfer functions (HRTF) data obtained from research at AAMRL during
the fall of 1988. In order to test this hypothesis, two separate tests are performed.
The first test determines whether HRTF lends any support in sound
localization when compared to no HRTF (Interaural Time Delay (ITD) only).
Depending on the statistical method used, we can conclude that the HRTF does
provide an advantage in localization accuracy over simple ITD. There is, however,
a statistically significant interaction between the location of the sound and
whether the HRTF or no HRTF is used. When this interaction is removed using the
alternate F-Value, the statistics give the result of equal means for the filters
and azimuth. This means that at certain angles of azimuth, the HRTF either
provides no advantage at all or hinders localization capabilities. Adding in the
corrections for reversals changes the results so that the means of the azimuth
are not statistically equal. The reversal corrections will inherently reduce the
error results. However, comparing the number of reversals indicates an advantage
of using the HRTF over no HRTF.
The second test determines whether the AAMRL HRTF and the ANN HRTF lend the
same amount of information in sound localization when compared to each other.
Without considering reversals, we can conclude that the RBF HRTF provides a
statistical advantage in localization accuracy over the AAMRL HRTF from which it
was derived. However, with reversal corrections included, an advantage is not
indicated, and the two filters are statistically equal. Also with reversal
corrections included, there is a statistically significant interaction between
the location of the sound and whether the AAMRL HRTF or ANN HRTF is used. As
discussed in the first experiment, this means that at certain angles of azimuth
the HRTF either provides no advantage at all or hinders localization
capabilities. However, comparing the number of reversals does not indicate a
large difference between the AAMRL HRTF and the RBF HRTF.
The final conclusion is that the AAMRL HRTFs can be approximated by an
artificial neural network for azimuth positions at zero degrees elevation and still have
good results.
Head Related Transfer Function Approximation Using Neural
Networks
I. Introduction
1.1 Background
The visualization of multivariate data is a complex area. There are a number of
ways to help view the data. The conventional approach for the presentation of data
is to present it in tabular or graphical form. Common graphical techniques work fine
for three dimensions or less. For example, figure 1 shows some typical presentation
forms of data. In each case, the presentation utilizes only the human sense of sight.
[Figure 1 panels: 3D plot; Sphere; Plot of sine wave; Polar plot]
Figure 1. Typical presentation forms of data
The other four senses (auditory, touch, taste and smell) are not exploited. The
presentations in figure 1, as depicted, sound the same. Except for the thickness of
the ink on the page, they feel the same. They also taste the same and smell the same.
Of these four unused senses, the auditory sense has the most practical potential (6).
For example, in a study of sleeping subjects, the subjects responded faster to an
auditory alarm than to the presence of heat or the smell of smoke (10, 22). The
fastest simple reaction times come from electric shock (130 ms), followed by audio
and tactile (140 ms) and finally visual (180 ms) (22). Sleeping is a case where a
visual display is not appropriate. The following types of circumstances are possible
cases where an auditory display would be preferable to a visual display: (22)
* When the origin of the signal is itself a sound
* When the message is simple and short
* When the message will not be referred to later
* When the message deals with events in time
* When warnings are sent or when the message calls for immediate action
* When continuously changing information of some type is presented, such as
  aircraft, radio range, or flight-path information
* When the visual system is overburdened
* When speech channels are fully employed (in which case auditory signals such
  as tones should be clearly detectable from the speech)
* When illumination limits use of vision
* When the receiver moves from one place to another
* When a verbal response is required
Many systems make use of different audio and visual displays. Sound displays
are not limited to one dimension (monophonic) or two dimensions (stereophonic).
Three dimensional (binaural) sound is where the listener hears the direction and
distance of the sound. Begault and Wenzel list some advantages and possible
applications of binaural systems: (3)
* monitor and identify sources of information from all possible locations
* improve intelligibility of sources in noise
* enhance segregation of multiple sources of speech
* improve air traffic control displays
* applications in cockpit communications
* telepresence applications such as teleconferencing
* shared electronic workspaces
* telerobotic control
A potential application listed above is in the cockpit (3, 8, 22). Although
sound is already an important tool in the cockpit, only 1D sound is used. The pilot
perceives the sound as coming from his or her earphones or from inside his or her
head. In 3D sound, the pilot perceives the sound as coming not from the earphones or
inside the head, but from a location outside the head at a certain azimuth, elevation
and distance. By using 3D sound, the position of the pilot's wingman can then be
encoded into three dimensions so that when the pilot hears the wingman speak he
or she perceives the wingman's voice as coming from the wingman's relative position
to that of the pilot. In the same way, targets and threats can be presented to the
pilot. Active and non active threats could be distinguished by different pitches. The
distance that the sound is perceived from the head could indicate the urgency of
the data. Immediate action information could be placed close to the head while less
urgent information could be placed farther away. Begault, in an experiment involving
commercial pilots and collision warning systems, found that air crews presented with
3D auditory data acquired targets approximately 2.2 seconds faster than air crews
that were presented only 1D auditory data (2).
In another example, data from an Army battlefield simulation was encoded
into sound creating "battlefield songs" of the data (6). For example, in a song with
only three variables: rhythm, tempo and pitch; rhythm may indicate the combatants
of the battle, tempo may indicate the number of troops moving toward the front and
pitch may indicate the number of troops already at the front. Using this scenario,
a battlefield song of one rhythm that is increasing in tempo and pitch and another
rhythm that has steady tempo and pitch would indicate one combatant of the battle
had continuous reinforcements; while the other combatant did not get reinforcements
and was losing troops at the front. In the multidimensional case, listeners had
difficulty hearing the two sides of the battle independently; however, 3D sound may
help separate the songs better by separating the songs to different locations outside
the head (3).
1.2 Problem
This thesis determines whether an artificial neural network (ANN) can
approximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head
related transfer functions (HRTF) data obtained from research at AAMRL during the
fall of 1988 (12). In order to test this hypothesis, two separate tests are performed.
The first test determines whether HRTF lends any support in sound localization
when compared to no HRTF (Interaural Time Delay only). The second test
determines whether AAMRL HRTF and ANN HRTF lend the same amount of support
in sound localization when compared to each other.
1.3 Definitions
This section provides definitions of key terms that will be used in this
thesis (25).
Binaural sound is sound that arises from two separate audio signals, one at
each ear. In a natural environment, humans hear binaural sound. The signal that
arrives at the left ear is characteristically different from the signal at the right ear.
The brain uses the differences between the two signals to determine the direction
and distance of the source. Binaural sound is usually recorded with a manikin, as
opposed to stereo sound, which is recorded without a manikin. The left ear
signal is recorded separately from the right ear signal. When sound is played
through headphones, the sound is binaural if the signal sent to the left headphone is
separate from the signal sent to the right (23:3).
Binaural mixing console is an electronic device that contains filters which
convert a monophonic signal to a binaural signal corresponding to a given distance
and direction (15:208).
Telepresence is a human's perception of being present in a natural environment
while actually being present in an artificial or virtual environment.
Virtual audio is synthetically produced audio signals which enable a listener
to achieve auditory telepresence.
Extracranialized sound is sound that is presented to a listener in the form of a
binaural signal through headphones and is perceived as coming from some distance
from the listener (i.e. outside the head).
Pinna(e) is(are) the human outer ear(s). The design of each person's pinnae
is unique and each set of pinnae will uniquely filter sounds. In fact, the human head
and ears form an antenna system characteristic to the individual (24). Experiments
have shown that spectral shaping by the pinnae is dependent upon direction and that
the cues provided by the pinnae are critical in extracranializing the sound (30:84).
Head related transfer function (HRTF) is the transfer function which accounts
for the filtering effects of the pinnae. Because the filtering of sound by the pinnae is
dependent upon direction, there is a different HRTF for each angle of azimuth and
elevation. HRTFs will differ for each person's pinnae; however, these differences are
relatively small. Accurate localization of sound sources can be made using "someone
else's pinnae" by presenting sounds filtered with another person's HRTFs (31).
Auditory Localization Cues are cues that change the sound as a function of
sound source and receiver location. There are four primary types of cues: 1) Monaural
Temporal Cues, 2) Monaural Spectral Cues, 3) Binaural Temporal Cues, and 4)
Binaural Spectral Cues. Spectral cues affect the spectrum of the sound that reaches
the ear, while temporal cues affect only the arrival time of the sound signal (31).
1.4 Research Objectives
In order to solve the stated problem, the following research objectives will be
met:
* Modify AFIT algorithms that have been used successfully to place sounds in
  azimuth and elevation. The algorithms that create 3D sound with the AAMRL
  HRTF will be changed to use the ANN HRTF instead. This will allow the
  creation of tests to compare the ANN HRTF with the AAMRL HRTF.
* Statistically check the advantage of HRTF and ITD versus ITD only. The
  results of the first experiment will provide a significantly large database to
  analyze.
* Statistically check the difference between AAMRL HRTF data and ANN HRTF
  data. The results of the second experiment will provide a significantly large
  database to analyze.
* Investigate the program LNKmap as a neural network tool for function
  approximation.
1.5 Scope
This thesis will process and analyze experimental data to obtain results of the
two experiments. The differences in sound localization accuracies from the HRTF
versus no HRTF experiment and the AAMRL HRTF versus ANN HRTF experiment will
both be analyzed by the analysis of variance method (14). To limit the amount of
data to be processed, only the 24 azimuth locations at zero elevation will be
analyzed. This will reduce the neural network learning time and the amount of
data to be analyzed.
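As a rough illustration of the analysis of variance idea mentioned in this section, the following Python sketch computes a one-way F statistic for two groups of localization errors. This is only a simplified sketch under stated assumptions: the experiments in this thesis use a multi-factor model, and the function name `one_way_anova_f` and all sample values are hypothetical, invented for illustration.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total number of samples
    means = [sum(g) / len(g) for g in groups]        # per-group means
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the samples about their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical absolute localization errors in degrees (not thesis data).
errors_with_hrtf = [5.0, 10.0, 15.0]
errors_without_hrtf = [20.0, 25.0, 30.0]

f_value = one_way_anova_f([errors_with_hrtf, errors_without_hrtf])
print(f"F = {f_value:.2f}")
```

A large F value suggests that the group means differ by more than the scatter within each group would explain. Judging significance still requires comparing F against the F distribution with (k - 1, N - k) degrees of freedom.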
1.6 Approach
1.6.1 Synthesizing 3D Cues. Humans use auditory localization cues to
spatially localize sound. Three primary characteristics of the auditory signals are
attenuation, time of arrival (TOA) and HRTF. As the sound travels to the body,
obstacles such as walls, furniture or even air attenuate the sound. If the head
is not directly facing the sound wave, the relative distance from each ear to the sound
source will be different. This gives the brain the TOA cue. Once the sound reaches
the body, the head, the torso, the nose and the outer ear or pinna filter the sound
wave. This filtering function is known as the HRTF. The HRTF is analogous to an
analog signal processing filter (figure 2). As in an analog filter, where an input
signal is transformed by a system function, H, to an output signal, the HRTF
transformation of the sound appears to facilitate the brain in localization
perception (figure 3).
[Figure 2 diagram: Input signal -> H -> Output signal]
Figure 2. Block diagram of a typical signal processing transfer function
[Figure 3 diagram: Input (Sound Source) -> HRTF (Head, Torso, Pinnae) -> Output (Inner Ear and Brain)]
Figure 3. HRTF transforms sound to aid the brain in location perception
The factors which affect the HRTF include distance (d), elevation ($\phi$),
azimuth ($\theta$), frequency ($\omega$), torso and head size. The frequency of
the sound plays an important role in determining the HRTF. Although the HRTF
varies for frequency and for each point in space, the torso and head size can be
generalized from person to person (3). For example, the HRTF for a sound located
at 30° elevation, 20° azimuth, 10 feet and 440 Hz will be different from the HRTF
for a sound located at 30° elevation, 20° azimuth, 10 feet and 1000 Hz. However,
the sound transformed by the same HRTF will generally be perceived as coming from
the same direction by people with varying size heads and torsos.
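The filter analogy can be made concrete with a short sketch. Filtering a signal with a transfer function H is equivalent to convolving the signal with the corresponding impulse response. The Python example below assumes short, made-up impulse responses for each ear and applies them to a monophonic signal. The names `fir_filter`, `h_left` and `h_right` and all coefficient values are hypothetical; they are not AAMRL HRTF data, and this is not the processing chain used in the thesis.

```python
def fir_filter(signal, impulse_response):
    """Convolve a signal with an impulse response (direct-form FIR filter)."""
    n_out = len(signal) + len(impulse_response) - 1
    out = [0.0] * n_out
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h      # each input sample excites a scaled copy of h
    return out


# Hypothetical 3-tap impulse responses for one direction (illustrative only).
h_left = [0.6, 0.3, 0.1]
h_right = [0.2, 0.5, 0.3]

mono = [1.0, 0.0, -1.0, 0.5]         # monophonic input signal

left = fir_filter(mono, h_left)      # signal presented to the left ear
right = fir_filter(mono, h_right)    # signal presented to the right ear
print(left)
print(right)
```

Because the two ears receive differently filtered copies of the same input, the pair (`left`, `right`) forms a binaural signal in the sense defined in section 1.3.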
The elevation ($\phi$) of the sound is measured from the ground plane through the
ears to a sound source (figure 4). The azimuth ($\theta$) of the sound is measured
from directly in front of the face counter-clockwise to the sound source (figure 5).
The distance (d) of the sound is measured from the center of the head to the sound
source (figure 6).
Figure 4. Elevation, $\phi$, measured from the ground plane to a sound source
All of these parameters can be measured in a sphere (figure 7) to determine
the HRTF for any location and frequency. To measure the AAMRL HRTF, an
anatomically correct manikin bust is positioned at the center of the anechoic chamber
sphere with microphones placed inside the ear canals. Pure sine waves produced by
the speakers are recorded with the ear microphones and used to determine the HRTF.
The sine wave frequency is held constant until that particular HRTF is determined
and then it is incremented for the next HRTF sample. The speaker location contains
Figure 5. Azimuth, θ, of sound source relative to directly in front of the face
Figure 6. Distance, d, of sound source to center of head
Figure 7. Geodesic sphere with sound sources at multiple locations (23:8)
the azimuth, elevation and distance information. Smith gives the location in azimuth
and elevation of 272 speakers used to determine the HRTF by the above method
(25). Once all the HRTFs are calculated, they can be used to filter any sound and
its reflections. When attenuation and TOA are combined with the filtered sound,
any recorded sound can be replayed to the listener through headphones to give the
impression of a sound located at a distinct direction from outside the head.
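The rendering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: it assumes the HRTF for one ear and one direction is available as a short time-domain impulse response (hrir), that attenuation is a single scalar gain, and that the TOA is a whole number of samples. The names convolve, render_ear, gain and delay_samples are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_ear(signal, hrir, gain, delay_samples):
    """Filter a mono signal for one ear, then apply attenuation and a TOA delay.

    hrir          -- assumed impulse-response form of the HRTF for this ear
    gain          -- assumed scalar attenuation (1.0 means no loss)
    delay_samples -- assumed arrival delay expressed in whole samples
    """
    filtered = convolve(signal, hrir)
    attenuated = [gain * v for v in filtered]
    return [0.0] * delay_samples + attenuated


# A unit impulse makes the three effects easy to see in the output.
left = render_ear([1.0, 0.0, 0.0], hrir=[0.5, 0.25], gain=0.5, delay_samples=2)
```

Running the same function with a different impulse response, gain and delay for the right ear would give the second headphone channel.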
1.6.2 Artificial Neural Networks. Current models of producing 3D sound
from the HRTF require lengthy computations that cannot be accomplished in real
time with a general purpose computer (26). Although two special purpose computers
have been developed, the DIRAD and Convolvotron, this thesis will investigate the
use of artificial neural networks to approximate the HRTF (23). Along with approx-
imating the HRTFs, the networks will provide easy interpolation between the known
HRTF data points. They may also provide insight into the underlying function of the
HRTF. In other words, the networks may show that there is a less complex function
for the HRTF than what is represented by the set of AAMRL HRTF data.
1.6.2.1 Multilayer Perceptron. Two types of artificial neural networks
will be studied. The first is the Multilayer Perceptron (MLP). Cybenko has shown
that an MLP can approximate any continuous function to arbitrary accuracy (20). Figure 8 shows
a three layer MLP (3:10:10:1). The input layer has three nodes corresponding to
[Network diagram: inputs Azimuth, Elevation and Frequency; two hidden layers; one output]
Figure 8. Multilayer Perceptron (MLP) neural network with two hidden layers of 10 nodes each
azimuth, elevation and frequency. The next two layers, the hidden layers, have 10 nodes
each, called hidden nodes. The last layer, the output layer, has one node corresponding
to the HRTF. The number of nodes in the hidden layers is arbitrary; however,
Neti, Young and Schneider found that an MLP network with fewer than 10 nodes
in the hidden layers was sufficient to output direction from input sound (HRTF)
(16). Although this thesis will reverse the inputs and outputs (input direction and
output HRTF), it seems reasonable to start the investigation with a similar network
architecture because it will essentially be an inverse transformation of the Neti, Young
and Schneider MLP.
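To make the 3:10:10:1 notation concrete, the following sketch builds an untrained network of that shape and pushes one input through it. It is only an illustration: the sigmoid hidden nodes, the linear output node, the random initial weights and all function names are assumptions made here, not details of the networks trained in this thesis.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random values; weights[j][i] feeds node j."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(layers, inputs):
    """Propagate inputs through sigmoid hidden layers and a linear output layer."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [sigmoid(s) for s in sums]
    return activations


rng = random.Random(0)            # fixed seed so the sketch is repeatable
sizes = [3, 10, 10, 1]            # the 3:10:10:1 architecture
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Inputs are assumed to be already scaled to [0, 1]: azimuth, elevation, frequency.
hrtf_estimate = forward(layers, [0.25, 0.5, 0.1])
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

With bias terms included, this shape has 40 + 110 + 11 = 161 adjustable parameters spread over three layers of weights.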
The MLP will be trained with HRTF data from Smith's thesis (25). The data
contain 25,296 HRTF samples taken from 272 speaker locations at 93 different frequencies. The
speaker locations range from 0° to 360° in azimuth and -90° to 90° in elevation. The
frequencies range from 100 Hz to 20,000 Hz. A drawback of the MLP is the time
required to train the network: three layers of weights have to be trained. Therefore,
the second network to be studied is the Radial-Basis-Function (RBF) artificial neural
network.
1.6.2.2 Radial-Basis-Function. RBF networks are broad enough for
universal approximation (19). The RBF is a two layer network. Figure 9 shows an
RBF network with 8 nodes in the hidden layer. Again, the input layer has three
[Network diagram: inputs Azimuth, Elevation and Frequency; hidden layer of centers; output HRTF]
Figure 9. Radial-Basis-Function (RBF) neural network
nodes corresponding to azimuth, elevation and frequency. The next layer or hidden
layer has 8 nodes corresponding to the centroids of the clustering algorithm. The
last layer or output layer has one node corresponding to the HRTF output. The
RBF will be trained with the same data as the MLP above.
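The structure just described can be sketched as a weighted sum of basis functions placed on the centers. The Gaussian basis shape, the common width, the placeholder weights and the regular layout of the eight centers are all assumptions made for illustration; in the thesis the centers come from a clustering algorithm.

```python
import math


def rbf_output(x, centers, width, weights, bias):
    """Weighted sum of Gaussian basis functions centered on the given centers."""
    total = bias
    for center, weight in zip(centers, weights):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += weight * math.exp(-dist_sq / (2.0 * width ** 2))
    return total


# Eight hypothetical centers in the scaled (azimuth, elevation, frequency) cube.
centers = [(a, e, f) for a in (0.25, 0.75) for e in (0.25, 0.75) for f in (0.25, 0.75)]
weights = [0.1 * (k + 1) for k in range(len(centers))]   # placeholder weights

at_first_center = rbf_output(centers[0], centers, width=0.1, weights=weights, bias=0.0)
```

Only the output weights and bias enter linearly, which is one reason a network of this kind can be quicker to train than the MLP once the centers are fixed.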
1.6.3 LNKmap. The computer program LNKmap allows the user to train a
number of different artificial neural networks on a SUN workstation. Both the MLP
and RBF networks are available on this system. Since the HRTF is a complicated
function to approximate, bugs in the LNKmap program would be hard to spot.
Therefore, relatively simple functions were tested first to verify the inner workings
of the LNKmap program. For example, a one input, one output function like y =
cos(2πx) (figure 10) and a two input, one output function like z = cos(2π(x² - y²))
(figure 11) were trained and analyzed. From the training of these simple functions,
Figure 10. Training function y = cos(2πx)
insight into the workings of LNKmap was gained. If LNKmap could not be
trained on one of these simple functions, it would not be able to be trained on a
complicated function like the HRTF. If this had been the case, other programs would have
been investigated to train the networks. Once the program was successfully trained
on the simple functions, the process of training on the HRTF data was begun.
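The two test functions are easy to tabulate. The sketch below generates small training sets for each; the grid size of 11 points per axis and the helper names are assumptions chosen for illustration, not the sampling used in this thesis.

```python
import math


def cosine_1d(x):
    """One input, one output test function y = cos(2*pi*x)."""
    return math.cos(2.0 * math.pi * x)


def cosine_2d(x, y):
    """Two input, one output test function z = cos(2*pi*(x^2 - y^2))."""
    return math.cos(2.0 * math.pi * (x * x - y * y))


def grid(n):
    """n evenly spaced values from 0 to 1 inclusive (n is an assumed grid size)."""
    return [k / (n - 1) for k in range(n)]


# Each row holds the inputs first and the target output last.
rows_1d = [(x, cosine_1d(x)) for x in grid(11)]
rows_2d = [(x, y, cosine_2d(x, y)) for x in grid(11) for y in grid(11)]
```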
Figure 11. Training function z = cos(2π(x² - y²))
1.7 Thesis Outline
Chapter one is an introduction to this thesis. It presents a brief background
of auditory displays which leads into the problem statement for this thesis. After
that it defines key terms and research objectives. Finally it presents an approach to
synthesizing 3D cues and creating artificial neural networks.
Chapter two continues the discussion of visualizing data that was introduced
in chapter one. It also contains a review of past research in sound localization.
Specifically, the work of Oldfield and Parker, and Begault and Wenzel is presented
in great detail because of their direct correlation with this thesis. This chapter
finally presents a historical record of the learning of the program LNKmap and the
background needed to train an ANN with the HRTF data.
The previous chapters explored the background of 3D sound and ANNs. Chapter
three will combine these two concepts to train an ANN to approximate the 3D sound
HRTF. This chapter presents the methodology used in training an MLP and RBF
ANN. It also presents the objectives and methodology of two experiments. Finally,
it presents a detailed analysis of variance model used to analyze the experimental
results.
Chapter four presents the results of the two experiments conducted during
this thesis work. Before presenting the results of each experiment a discussion of the
types of errors analyzed in the experiments is presented.
Chapter five presents the summary, conclusions and recommendations for fu-
ture research.
II. Background
2.1 Introduction
The visualization of the data is important to help determine its characteristics.
Viewing the HRTF data is a classic case of how to visualize multidimensional data.
The HRTF data consist of 25,296 data points. Broken down, this consists of 272
azimuth and elevation locations: 0° to 350° in azimuth and -82° to 82° in elevation.
The azimuths and elevations are spaced so that they are approximately 15° apart over
the whole sphere (see figure 12). For each azimuth and elevation pair, 93 frequencies
Figure 12. 272 speaker locations around the head
ranging from 100 to 19,952 Hz are used, so the HRTF data consist of four
axes of data: azimuth, elevation, frequency and response. Figure 13 shows the first
1000 HRTF samples broken down into the four axes. Each line represents the value
of the axis for that sample. For example, the 500th sample has approximately a
response of 0.05, an elevation of 70°, an azimuth of 100° and a frequency of 1000 Hz.
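The frequency axis can be reproduced approximately with a logarithmic grid. The sketch below assumes 40 steps per decade starting at 100 Hz; this spacing is inferred here from the frequency values quoted in this chapter and is not stated in the thesis, and the function name is hypothetical.

```python
def log_frequency_grid(start_hz=100.0, steps_per_decade=40, count=93):
    """Return `count` frequencies spaced evenly on a logarithmic axis."""
    return [start_hz * 10.0 ** (k / steps_per_decade) for k in range(count)]


frequencies = log_frequency_grid()
n_locations = 272                       # speaker locations on the sphere
n_samples = n_locations * len(frequencies)
```

Under this assumption the grid runs from 100 Hz to roughly 19,953 Hz, and 272 locations times 93 frequencies gives the 25,296 samples quoted above.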
[Plot: Azimuth, Elevation, Frequency and Response on a logarithmic axis versus sample number]
Figure 13. HRTF samples from 0 to 1000
2.2 Graphical Display
Matlab is a powerful tool to help visualize the data. One approach to visual-
izing 4 dimensions is to set the first three dimensions to the x, y, and z-axis and the
4th dimension to a set color. Matlab has a number of commands to do this.
By using the sphere command, a sphere around the head can be color coded
with the HRTF value. The grey scale level indicates the magnitude of the HRTF.
This display uses only the azimuth, elevation and response data. A separate sphere
is needed for each frequency. Figure 14 shows left ear HRTF data around the head
for a frequency of 5623 Hz. Note the lighter shades of grey on the left side of the
head versus the right side of the head. This is because the left ear has a direct path
from the sound source. Figure 15 shows most of the frequencies for the left side of
the head only. The frequency increases from 119 Hz to 19,953 Hz. These two figures
show graphically that the HRTF magnitude varies both as a function of location and
frequency.
Similar to above where each plot consisted of a single frequency, figure 16
consists of a single elevation. The zero elevation data consists of 24 azimuth points
[Panels: right side of head, left side of head]
Figure 14. Left ear HRTF data around the head for a frequency of 5623 Hz.
with 93 frequencies each, for a total of 2232 data points. The response has a number
of peaks and ridges with a high point localized slightly in front of the left ear at 3000
Hz. Both this view and the spherical view above have limitations in the amount of
data that can be presented graphically. The following sections present a discussion
of the presentation of data.
2.3 Graphical and Auditory Displays
Graphical and auditory presentations are similar. When data varies with time,
graphical presentation such as animation can provide a powerful analysis tool of
graphical data (6). In the same sense, sound must be presented in time to be heard.
If the length of the sound is too short its pitch, tempo, rhythm, and other attributes
cannot be distinguished. Graphical symbols, known as icons, convey information
in a small amount of space. On many computers with graphical interfaces, icons
represent files, directories or functions. For example, a picture of a sheet of paper can
represent a text document, or a picture of a trash can can represent a delete file function.
"Earcons," the audio counterpart of icons, similarly convey information (4). The
[Panel frequencies in Hz, one sphere per frequency:
119 126 133 141 150 158 168 178 188 200 211 224 237 251 266
282 299 316 335 355 376 398 422 447 473 501 531 562 596 631
668 708 750 794 841 891 944 1000 1059 1122 1188 1259 1334 1413 1496
1585 1679 1778 1884 1995 2113 2239 2371 2512 2661 2818 2985 3162 3350 3548
3758 3981 4217 4467 4732 5012 5309 5623 5957 6310 6683 7079 7499 7943 8414
8913 9441 10000 10592 11220 11885 12589 13335 14125 14962 15849 16788 17783 18836 19953]
Figure 15. Left ear HRTF data around the left side of head for frequencies 119 Hz to 19,953 Hz
Figure 16. Zero elevation HRTF data for the left ear. Frequency and Response shown in logarithmic scale
beep of the computer when it displays a warning message is an earcon. The two
tones of a door bell are an earcon conveying the message that someone is at the door.
In graphical presentation, color is often used as the fourth dimension when presenting
4D data. Sound resolution is similar to color resolution (6). Both the frequency of
pitch and the frequency of color are logarithmic (blue 435.8 nm, green 546.1 nm and
red 700 nm) (9). Octaves provide expression of logarithmic variance (6). On the other
hand, sound is different from a graphic display. As discussed above, the graphical
presentation of multivariate data is commonly limited to three or four dimensions. In
contrast, the human ear can discriminate between any two of 400,000 different sounds
presented in rapid succession. In addition, most people can identify 49 different
sounds at one time (6). The different sounds can be used to represent different
dimensions of data. For example, pitch, volume, duration, attack, fundamental wave
shape and fifth harmonic wave shape can be used to represent 6 different dimensions.
Yeung used nine to twenty dimensions of sound (6). Matthew used a combination
of auditory and visual presentations of data (6). Matthew had five dimensions, x
and y visually and frequency, timbre and amplitude phonically. Smith, Bergeron
and Grinstein also combined auditory and visual presentations of data (26). In their
scheme, the auditory data representation is triggered by the position of the mouse
on the visual display. Most of the past research has been with 1D and 2D sound.
Although Bly did not use 3D sound in her thesis, she suggests using 3D sound for
the visualization of three dimensional data (6). "Several sound characteristics were
not used and these deserve attention in the future. Certainly location is an easily
detectable attribute of sound." (6:40)
2.4 Acuity of Sound Localization I
In an effort to investigate the acuity of sound localization, Oldfield and Parker
conducted a study of 8 subjects who judged the apparent spatial location of white
noise through speakers over a range of -40° to +40° in elevation and 0° to 180° in
azimuth (17). There are two major binaural cues: Interaural Time Difference (ITD)
and Interaural Intensity Difference (IID).
2.4.1 Interaural Time Difference.
• Interaural Time Difference is a function of the distance between the two ears,
the angle of incidence of the incoming sound, and the frequency of the sound (25).
• ITD is frequency independent below approximately 500 Hz and above approximately
3000 Hz; furthermore, between these frequencies ITD decreases as frequency
increases. For azimuth angles of incidence less than 60 degrees left or
right of the front of the listener (0 degrees), the minimum ITD occurs between
1400 and 1600 Hz (25, 11:165-166).
• The most significant changes in ITD occur at angles of incidence between 0
and 30 degrees, consistent with the fact that humans perceive a sound best
when the source is directly in front of the listener (25, 1:700).
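As a rough illustration of how ITD depends on ear spacing and angle of incidence, the sketch below uses the spherical-head (Woodworth) approximation, ITD = (r/c)(θ + sin θ). This model is not taken from the sources cited in the list above, it ignores the frequency dependence noted there, and the head radius and speed of sound are assumed values.

```python
import math

HEAD_RADIUS_M = 0.0875     # assumed average head radius
SPEED_OF_SOUND = 343.0     # assumed speed of sound in air, m/s


def itd_seconds(azimuth_deg, radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Spherical-head ITD for a source 0 to 90 degrees off the front."""
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


itd_table = {az: itd_seconds(az) for az in (0, 30, 60, 90)}
```

In this simplified model the ITD changes more between 0 and 30 degrees than between 60 and 90 degrees, which agrees qualitatively with the last item in the list.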
2.4.2 Interaural Intensity Difference.
• Interaural Intensity Difference is a function of the distance between the two
ears, the angle of incidence of the incoming sound, and the frequency of the
sound. IID can also be affected by size and shape of the torso, head, and ears.
The torso affects primarily the low frequencies (25).
• The most significant changes in IID occur at angles of incidence between 0 and
30 degrees, consistent with the fact that humans perceive a sound best when
the source is directly in front of the listener (25, 1:700).
• Improvement in localization capabilities of humans at frequencies above 3000
Hz is a result of improvement in the IID cue (25, 11:166).
• At frequencies above 8000 Hz, IID cues increase human capabilities to localize
in elevation as well as in azimuth (25, 13:101).
From these two cues, there is a range of positions which could account for
the same pattern of interaural differences. In a simplified model of the head, these
positions form a cone. Watkins calls it the "cone of confusion" (29). Oldfield and
Parker presented a second set of cues which they describe as spectral modifications
by the pinna and head. These spectral modifications are what Begault and Wenzel
call the head related transfer function (HRTF) (3). Past studies show evidence of
non-binaural cues (HRTFs) (17). These studies use sound sources in the median
sagittal plane. The median sagittal plane divides the body into left and right sides.
Other studies blocked the pinnae or low pass filtered the sound sources and confirmed
that localization ability is lost. Few studies before 1984 use sound sources outside
the median sagittal plane (17).
2.4.3 Experiment 1. The Oldfield and Parker experiment was conducted
in an anechoic chamber for frequencies down to 250 Hz. The apparatus for the
experiment consisted of a boom for positioning the sound source, a photographic
recording system and pointing gun. The subject's head was held stationary. The
importance of head motion in human auditory localization cannot be overstated.
Without head motion, a person's head and ears can still act as a directional antenna;
however, sounds located at 0 degrees azimuth and various elevations (i.e. on the
median sagittal plane) can be extremely difficult to localize. Front/back and up/down
confusions result when head motion is not allowed; this occurs because the spectral
and temporal cues are very similar at both ears (31, 5, 25). The implementation
of head motion into a binaural room simulation requires that a real-time system be
used (23). For this reason, head motion was not implemented into the non-real-time
simulation done for this thesis.
The sound was white noise which the subject could activate through a hand-
held switch. The subject used a pointing gun to "shoot" the sound. The position
was recorded with the photographic recording system. The advantage of pointing is
that the response is not limited to preset locations. The disadvantage of pointing is
the limitation of the accuracy with which the hand can be positioned. Fitts found
that blind-positioning movements are most accurate in the dead-ahead, center and
lower tier positions and least accurate at the side and upper tier positions (22, 7). To
avoid pointing inaccuracies in this thesis, the sound locations were fixed and subjects
were required to pick preset locations.
2.4.4 Experiment 2. To validate the pointing technique, Oldfield and
Parker conducted a second experiment to measure the motor skills of the subjects.
The apparatus and procedures were the same as in experiment 1 except that the
subjects were not blindfolded and the chin brace was loosely fitted. The subjects
were allowed to see the speaker position before the sound was presented. Next the
room was completely darkened and the sound presented for the subject to "shoot."
They concluded that the biases in the pointing response did not affect the overall
outcome of the results. Reversals were never observed in the visual task (17).
2.4.4.1 Results. Azimuth error and elevation error were measured.
The absolute and algebraic forms of these were computed. Absolute error measured
general acuity and algebraic error measured directional bias. For algebraic error,
positions perceived higher and behind were termed positive errors (actual position
minus perceived position). Front/back reversals were analyzed separately: "a straight
subtraction of the actual and perceived sound position has not been regarded by
researchers as an honest representation of error (17, 28:587)." This thesis also analyzed
reversals.
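The two error measures and the separate treatment of reversals can be sketched as follows. The sign convention follows the parenthetical definition above (actual minus perceived). The rule used to flag a reversal, namely that the response lies closer to the front/back mirror image of the target than to the target itself, is an assumption made for illustration and is not necessarily the criterion used by Oldfield and Parker or in this thesis.

```python
def wrap180(angle_deg):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def algebraic_error(actual_deg, perceived_deg):
    """Signed error, actual minus perceived, which exposes directional bias."""
    return wrap180(actual_deg - perceived_deg)


def absolute_error(actual_deg, perceived_deg):
    """Unsigned error, which measures general acuity."""
    return abs(algebraic_error(actual_deg, perceived_deg))


def mirror_front_back(azimuth_deg):
    """Reflect an azimuth about the axis through the ears (front <-> back)."""
    return (180.0 - azimuth_deg) % 360.0


def is_reversal(actual_deg, perceived_deg):
    """Assumed rule: a response is a reversal if it lies closer to the mirrored target."""
    mirrored = mirror_front_back(actual_deg)
    return absolute_error(mirrored, perceived_deg) < absolute_error(actual_deg, perceived_deg)


trials = [(30.0, 40.0), (30.0, 150.0), (350.0, 10.0)]   # (actual, perceived) azimuths
kept = [(a, p) for a, p in trials if not is_reversal(a, p)]
mean_absolute = sum(absolute_error(a, p) for a, p in kept) / len(kept)
```

In this made-up example the second trial is flagged as a reversal and excluded, so the mean absolute error is computed over the remaining two trials only.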
There were few front/back reversals. Most occurred in the upper back quadrant.
One exception to this was the region near 90° azimuth where the reversals were
uniform over elevation. Another kind of error was what Oldfield and Parker called
"defaults to 90°". Like reversals, the defaults occurred in the upper back quadrant.
Defaults tended to boost the error in the upper back quadrant, whereas reversals
and non-reversals had the same amount of error.
2.4.5 Discussion. The results support the model of the cone-of-confusion
in the case of elevation error biases affecting azimuth error biases. Both azimuth
and elevation discrimination are less accurate behind the head. Oldfield and Parker
suggest that sounds coming from behind the head do not directionally interact with
the convolutions of the pinna. Front/back discrimination derives from the pinna
when the head is held still. The higher number of reversals in the back region is
consistent with a reduction in pinna cues. Binaural differences cannot be separated
from pinna modifications because the modifications are already in the signal before
any binaural differences can be processed by the brain.
2.5 Acuity of Sound Localization II
Oldfield and Parker also conducted a study of 4 subjects who judged the apparent
spatial location of white noise through speakers over a range of -40° to +40°
in elevation and 0° to 180° in azimuth (18). In this experiment the pinnae of the
subjects were blocked to simulate the effect of localization without the pinna. This
experiment has a direct correlation with the first experiment conducted in this thesis
where the hypothesis of no HRTF is similar to no pinna. The implication is that
the determination of azimuth and elevation depends on more than just the binaural
differences from the sound source. An additional cue is the spectral modification
made by the pinna.
2.5.1 Experiment. The data collection apparatus was the same as described
above in part I of the study (17). Each subject was fitted with individually cast
molds for the pinna. An access hole remained to the auditory canal. Subjects were
presented 171 different speaker locations while blindfolded, with the head held stationary.
2.5.2 Results. The Oldfield and Parker study presented the relationship
between azimuth error and elevation error. This thesis presents only azimuth error.
With the pinna blocked the overall absolute elevation error was almost double that
of azimuth error (21.9° and 11.9° respectively). The largest differences between
absolute azimuth and elevation errors occurred in the front quadrant. The absolute
azimuth error was largest between the 120° and 160° azimuth positions. Azimuth error
is reasonably uniform and elevation error is less uniform with larger errors near the
extremes. The mean algebraic elevation and mean azimuth errors were generally
positive in the front and negative in the back. The subjects perceived the sounds in
the front pulled higher and toward the ears. The subjects also perceived the sounds
in the back pulled lower and toward the ears. Filling the convolutions of the pinna
has the effect of displacing elevation judgements toward 0°.
A comparison of the results of the pinna filled versus unfilled (normal) showed
that there was an increase in elevation error in the pinna-filled case (filled 21.9° versus
normal 8.4°). The azimuth error also increased in the pinna-filled case, but not as
much (filled 11.9° versus normal 9.3°). The mean absolute azimuth error for the
filled pinna was generally lower than for the normal pinna except at azimuth positions
140° to 160°. The error did not change significantly by azimuth position; however, it
did change significantly by elevation position.
A number of subjects made front/back reversals. Reversals accounted for 26%
of the responses: 31% were front/back reversals and 21% were back/front reversals.
These results are compared with the findings in this thesis in chapter 4.
2.6 Headphone Localization of Speech
A study of 11 inexperienced subjects who judged the apparent spatial location
of headphone-presented speech stimuli filtered with non-individualized head-related
transfer functions (HRTFs) was conducted by Begault and Wenzel (3). This is in
contrast with the Oldfield and Parker study, which used experienced subjects listening
to white noise from an actual sound source in three dimensional space.
HRTFs are "the listener-specific, direction-dependent acoustic effects imposed
on an incoming signal by the pinnae (3:361-362)." It should be noted that not only
the pinnae, but the head and torso affect the incoming signal also. Note also that
this definition says "listener-specific". Although HRTFs are "listener-specific", they
generalize to other people. The main thrust of the Begault and Wenzel study is
to investigate how non-individualized HRTFs work. The HRTFs used are from the
pinnae of a good localizer (or, as Dr. Rogers would say, a person with a "golden ear").
The study decides against simple averages of HRTFs to avoid possible elimination
of distinctive spectral features. This thesis used AAMRL HRTFs (12).
Eleven adults were used in the Begault and Wenzel study. They were screened
by oral questions about hearing loss, recent exposure to loud noises and work en-
vironment. There is no apparent relation between audiometric measurements and
binaural performance (3). Therefore this thesis will forego the audiometric measure-
ment of subjects.
The sounds used in the study were from a set of 45 one- or two-syllable words
representing a particular phoneme. The sounds were filtered with the HRTFs to
simulate sounds coming from 0° elevation and 0°, ±30°, ±60°, ±90°, ±120°, ±150°
and 180° azimuth. Similarly, this thesis used the male voice utterance "seventeen"
presented at 0° elevation and 24 azimuth locations. Begault and Wenzel conclude
that most listeners can obtain useful directional information from speech without
requiring individual HRTFs. Taking this result a step further, this thesis investigated
the hypothesis of obtaining useful directional information from speech filtered with
a neural network trained HRTF.
2.7 Learning LNKmap with a simple function
2.7.1 Linear function. During the spring of 1994, the program LNKnet
was gaining popularity at AFIT as a neural network classifying tool. LNKnet's
companion program for function approximation (LNKmap) had not yet been fully
investigated. To begin the exploration of LNKmap, a simple function, i.e. y = x, was
input into LNKmap.
Since LNKmap was similar to LNKnet the file structure was assumed to be
the same. Therefore the input data file should be in two columns with the output
values in the first column and the input values in the second column. A small file
was used starting with 10 points between zero and one (see table 1). Since the
 y     x
 .1    .1
 .2    .2
 .3    .3
 .4    .4
 .5    .5
 .6    .6
 .7    .7
 .8    .8
 .9    .9
1.0   1.0
Table 1. Simple linear function
function is linear, a linear output node function was used. Most network users call
the "output node function" a "non-linearity function". A common starting point is
a 3 layer multi-layer perceptron (MLP), 1 input node, 1 hidden node and 1 output
node notated 1:1:1 (figure 17). Some researchers refer to this architecture as a 2
[Network diagram: Input Node -> Hidden Node -> Output Node]
Figure 17. A 3 layer net, 1 input node, 1 hidden node and 1 output node, notated 1:1:1.
layer net, referring to the number of weight layers. This thesis will use the
colon notation wherever confusion may exist. Many classifiers present the data
in random order, however for function approximation it is best to present the data
in order (21). With the training set described above the net learned the function in
about 100 epochs.
Figure 18. y = x (solid) and MLP (1:1:1) output (dash) after 100 epochs
Next a standard sigmoid on the same data was tried while also increasing the
number of hidden nodes to 5 (1:5:1). The net learned the function in about 1000
epochs (see figure 19). In an effort to get a lower error, a second hidden layer with
one node (1:5:1:1) was added. This architecture is shown in figure 20. After 1000
epochs, the net had not learned the function. More nodes were added to the second
hidden layer (1:5:5:1). After another 1000 epochs, the net had not learned the function.
Adding more nodes seemed to slow the learning process; therefore, one hidden
layer was used in subsequent trials. With an architecture of 1:10:1 and a standard
sigmoid output node function, the net learned the function (see figure 21). With the
same architecture and using the symmetric sigmoid output node function, the net
learned the function with an even lower error (see figure 22).
2.7.2 Non-linear functions. At this point, some nonlinear data was tried.
The 10 point data was modified to be low near the end points and higher in the
middle (y ~ -x², see table 2 and figure 23). The LNKmap plot of the data
approximated the function x ~ -y² (see figure 24). This indicated that the columns
of the data file were reversed. This discovery revealed that LNKmap needed the
Figure 19. y = x (solid) and MLP (1:5:1) output (dash) after 1000 epochs
[Network diagram: Input Node -> Hidden Nodes -> Hidden Node -> Output Node]
Figure 20. A 4 layer net, 1 input node, 5 hidden nodes, 1 hidden node and 1 output node, notated 1:5:1:1.
Figure 21. y = x (solid) and MLP (1:10:1) output (dash) after 1000 epochs
Figure 22. y = x (solid) and MLP (1:10:1) output (dash) after 300 epochs
 y     x
 .1    .1
 .4    .2
 .6    .3
 .9    .4
1.0    .5
1.0    .6
 .8    .7
 .5    .8
 .3    .9
 .1   1.0
Table 2. Simple nonlinear function
Figure 23. y ~ -x² (solid) and MLP (1:10:1) output (dash)
Figure 24. x ~ -y² (solid) and MLP (1:10:1) output (dash)
data presented in an ASCII matrix where the first columns were the input parameters
and the last columns were the output parameters. This is contrary to LNKnet, where
the first column is the class. A correction to the columns was made and training
of a number of net configurations was resumed. The net could only match a straight
line to the data, which was attributed to the low number of data points.
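The column order that LNKmap expects can be illustrated with a short sketch that formats a training set as text rows, inputs first and outputs last. The function name, the space separator and the four-decimal formatting are assumptions made here; the thesis specifies only that the file is an ASCII matrix with input columns followed by output columns.

```python
def format_rows(inputs, outputs):
    """Build text lines with input columns first and output columns last."""
    lines = []
    for in_row, out_row in zip(inputs, outputs):
        values = list(in_row) + list(out_row)
        lines.append(" ".join(f"{v:.4f}" for v in values))
    return lines


xs = [k / 10 for k in range(1, 11)]          # 0.1, 0.2, ..., 1.0
inputs = [[x] for x in xs]
outputs = [[x] for x in xs]                  # y = x, the linear test function
lines = format_rows(inputs, outputs)
text = "\n".join(lines) + "\n"               # ready to be written to a data file
```

For a network with several inputs, such as azimuth, elevation and frequency, each inner list of inputs would simply hold three values instead of one.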
2.8 Learning LNKmap with a Cosine function
After determining the correct file format, a trigonometric function was tested.
The function to be trained was y = cos(2πx), where x ranged from 0 to 1 with
a step size of 1 or 1 for a total of 629 points. Using the same methodology
as before, starting with a net architecture of 1:1:1 and a linear output node function,
many configurations were tried, settling on two hidden layers with 10 nodes each
(1:10:10:1). Throughout these trials, the current experiment was always continued
and the data were presented in random order. Although, as mentioned above, the data should
be presented in order, the strategy at this point in the investigation of LNKmap was
to follow the same procedures as LNKnet. The result of the training was that the
net learned the first half of the cosine curve and then proceeded asymptotically to a
line parallel to the x axis for the second half of the curve. The approximate number
of epochs at this point was 10,000 to 20,000.
At this point it was concluded that random presentation of the data was not
working and the data was presented in the order of the input file (x increasing from
zero to one). The net learned the function.
In order to duplicate the results, Capt Smith used the same cosine training
file. He trained the net for 3000 epochs using random presentation of the data. The
net learned the first half as above. Next he continued training the net with the
data presented in order. The net learned the whole cosine function. Unfortunately, plots of
this data are not available because the LNKmap program has since been changed as
discussed below.
2.8.1 Cosine Revisited. From early training attempts with LNKmap, it
was determined that LNKmap did not correctly train on data with more than one
input. The author of LNKmap was contacted and a fix was obtained. After this fix,
the cosine function was retrained using both an RBF and an MLP architecture. This
time the function trained was y = cos(x), where x ranged from 0 to 2π with a step
size of 0.01 for a total of 629 points (see figure 25). Before training, the data was
normalized to values between 0 and 1 using a Dynamic Range Compression (DRC)
algorithm (see figure 26). Given data in a matrix where each column represents a
variable, the DRC subtracts the minimum value for each column from each value
in the column and divides by the maximum value of the difference for each column.
The Matlab code can be seen in appendix C.
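The column-wise scaling described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Matlab code of appendix C; the function name `dynamic_range_compress` and the handling of constant columns are assumptions made for the example.

```python
import math


def dynamic_range_compress(rows):
    """Scale each column of a list of equal-length rows to the range [0, 1].

    For every column, subtract the column minimum and divide by the
    maximum of the resulting differences (max - min).
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    spans = [max(col) - lo for col, lo in zip(columns, mins)]
    # Assumption: a constant column (zero span) is mapped to 0.0
    # instead of dividing by zero.
    return [
        [(v - lo) / span if span else 0.0
         for v, lo, span in zip(row, mins, spans)]
        for row in rows
    ]


# y = cos(x) sampled with a step of 0.01 for 629 points, as in the text.
data = [[i * 0.01, math.cos(i * 0.01)] for i in range(629)]
scaled = dynamic_range_compress(data)
```

After scaling, both the x column and the cos(x) column span exactly 0 to 1, which matches the compressed curve described for figure 26.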
2.8.1.1 RBF Architecture. The first architecture used for training
was a Radial Basis Function (RBF) using a K-means clustering algorithm. The RBF
was trained using 1, 2, 8, 9 and 32 centers in the K-means clustering. Figure 27 shows
the result with 1 center in the K-means clustering. Figure 28 shows the result with
2 centers in the K-means clustering. Since the cosine function in the range 0 to 2π
is essentially unimodal, only one clustering center is needed. Forcing more than one
center increases the error. The LNKmap clustering algorithm does not place centers
in the same place. A different clustering algorithm may have placed the centers in
the same place. Figure 29 shows that adding more centers will reduce the error, but
at the expense of a larger network.
Figure 25. Training function y = cos(x), where x ranged from 0 to 2π with a step size of 0.01 for a total of 629 points
Figure 26. Dynamic Range Compression of y = cos(x) to values between 0 and 1
Figure 27. y = cos(x) (solid) and RBF (1:1:1) output (dash) with 1 center in the K-means clustering algorithm
2.8.1.2 MLP Architecture. Next, an MLP architecture of 1:10:10:1
was chosen to match the best results from the previous cosine training (before the
software fix). The MLP learned the cosine in only 220 epochs. Figure 30 shows the
results after training for 15, 100 and 220 epochs respectively.
Figure 28. y = cos(x) (solid) and RBF (1:2:1) output (dash) with 2 centers in the K-means clustering algorithm
Figure 29. y = cos(x) (solid) and RBF (1:9:1, 1:32:1) output (dash) with 9 and 32 centers in the K-means clustering algorithm
Figure 30. y = cos(x) (solid) and MLP (1:10:10:1) output (dash) after 15, 100 and 220 epochs
2.8.2 Conclusion. It has not been determined why the network needed the
data presented first randomly and then in order to learn the function. Presenting the
data only randomly or only in order did not produce satisfactory results. The obvious
answer is that a bug was in the program. After the fix the program performed as
expected: when the data was presented in order, the network learned the
function faster. Regardless, the major effort of this investigation was to learn how
to use LNKmap as a function approximation tool. This task was accomplished.
2.9 Learning LNKmap with a Two Input Cosine function
The next function tested was the "bird function," z = cos(2π(x^2 - y^2)), where
x and y ranged from 0 to 1 for a total of 10201 training points (see figure 31). This
function was chosen because of its simplicity and because the output values lie
between -1 and 1, so no normalization was required. Expanding the axes of the
bird function actually shows the whole function to be a symmetric cross shape (see
figure 32).
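As a rough illustration of how such a training set could be generated, the sketch below samples the bird function on a regular grid. The 101 by 101 grid (step 0.01) is an assumption inferred from the 10201 point total; the names `bird` and `make_grid` are hypothetical.

```python
import math


def bird(x, y):
    """The "bird function" z = cos(2*pi*(x^2 - y^2))."""
    return math.cos(2 * math.pi * (x * x - y * y))


def make_grid(n_per_axis=101):
    """Return (x, y, z) triples on a regular grid over [0, 1] x [0, 1].

    Assumption: 101 samples per axis, since 101 * 101 = 10201.
    """
    step = 1.0 / (n_per_axis - 1)
    return [
        (i * step, j * step, bird(i * step, j * step))
        for i in range(n_per_axis)
        for j in range(n_per_axis)
    ]


points = make_grid()
```

Because the cosine is bounded, every z value already lies between -1 and 1, which is why no normalization step is needed before training.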
Figure 32. z = cos(2π(x^2 - y^2)), where x and y ranged from -1 to 1.
2.9.1 RBF Architecture. Two network architectures were tested with the
bird data. The first was a Radial Basis Function (RBF) using a K-means clustering
algorithm. The RBF was trained using 8, 64 and 128 centers in the K-means
clustering (see figure 33). Numbers of centers that are powers of two above 128
(i.e., 256, 512, etc.) caused an error in the program. The best the LNKmap RBF
algorithm could learn the bird data is shown in figure 33. The placement of the
clustering centers can be seen as bumps in the bird's back. The RMS error for the
training using 8, 64 and 128 centers is shown in figure 34. The highest error was
seen near the edge of the graphs where the function was constrained. Increasing the
number of centers in this case did not reduce the error. The mean error for the 64
centers case was 0.1229 while the mean error for the 128 centers case was 0.1273.
Figures 33 and 34 do not show much difference between the 64 center and 128 center
cases. To examine the effect of doubling the number of clustering centers from 64 to
128, the difference between the two outputs is shown in figure 35. The average
difference was 9.3888 × 10^-5. The difference between the two RMS errors is also
shown in figure 35. The average difference of the RMS error was 0.0044.
Figure 33. Output of RBF with 8, 64 and 128 centers in the K-means clustering algorithm
Figure 34. RMS error of RBF with 8, 64 and 128 centers in the K-means clustering algorithm
2.9.2 MLP Architecture. The second network architecture used to train on
the bird function was a Multi-Layer Perceptron (MLP). Three MLP architectures
were chosen to train the bird data. For each architecture, the non-linearity function
used was the symmetric sigmoid (see figure 36). The first architecture had one hidden
layer of 10 nodes (2:10:1). Figure 37 shows the progress of learning for 220 and 440
epochs respectively. The RMS error is shown in figure 38. The second architecture
had one hidden layer of 64 nodes (2:64:1). This architecture was chosen to compare
with the 64 center RBF. Figure 39 shows the progress of learning for 20, 50 and 100
epochs respectively. The RMS error is shown in figure 40. Comparing figures 33 and
39, it is clear that the RBF outperformed this architecture in both speed and error
even though the architectures have basically the same number of nodes. The third
architecture had two hidden layers of 10 nodes each (2:10:10:1). Figure 41 shows
the progress of learning for 20, 220 and 440 epochs respectively. The RMS error is
shown in figure 42.
Figure 35. Difference between outputs (left) and RMS errors (right) using 64 and 128 centers in the K-means clustering algorithm
Figure 36. Symmetric sigmoid used for non-linearity function, y = 2/(1 + e^(-x)) - 1
Figure 37. Output of MLP after 220 and 440 epochs with architecture of 2:10:1
Figure 38. RMS error of MLP after 220 and 440 epochs with architecture of 2:10:1
Figure 39. Output of MLP after 20, 50 and 100 epochs with architecture of 2:64:1
Figure 40. RMS error of MLP after 20, 50 and 100 epochs with architecture of 2:64:1
Figure 41. Output of MLP after 20, 220 and 440 epochs with architecture of 2:10:10:1
Figure 42. RMS error of MLP after 20, 220 and 440 epochs with architecture of 2:10:10:1
2.9.3 Conclusion. Comparing figures 33 to 41 with figure 31 (the original
bird data), it is clear from these results that the MLP was able to learn the bird
data better than the RBF. However, it is important to note that this result does not
generalize to other data, and the different architectures produced different results.
The main purpose of the bird data tests was to verify that the LNKmap program
could approximate a function with more than one input.
2.10 Conclusion
This chapter explored the background of both actual and simulated 3D sound.
It showed that natural errors occur (i.e. reversals) with both actual 3D and simu-
lated 3D sound sources. The head, pinna and HRTF are important for 3D sound
localization. It also showed that non-individualized HRTFs can be successfully used
with speech.
The chapter also explored the use of LNKmap as a tool for function approxi-
mation. LNKmap was able to successfully train a number of different neural network
architectures from simple linear and non-linear functions to two input trigonometric
functions.
III. Methodology
3.1 Introduction
The previous chapters explored the background of 3D sound and neural net-
works. This chapter will combine the two concepts to train a neural network to
approximate the 3D sound HRTF.
3.2 Learning HRTF Data for Zero Elevation
3.2.1 MLP Architecture. Having shown that the LNKmap program works
correctly for multiple input data, the next step is to map the Head Related Transfer
Function (HRTF) data. The HRTF data consists of 25,296 data points: 272 azimuth
and elevation locations (10° to 350° in azimuth and -90° to 90° in elevation), with
93 frequencies ranging from 100 to 20,000 Hz for each azimuth and elevation pair.
To simplify the training, only the zero elevation is used in this thesis. The zero
elevation data consists of 24 azimuth points with 93 frequencies each, for a total of
2232 data points (see figure 43). Since the MLP worked the best with the bird data,
the zero elevation was first trained with an MLP with an architecture of 2:50:50:1.
The output node function was a symmetric sigmoid. The weights were updated after
each trial (no batch). Figure 44 shows the results after 7235 epochs. The MLP
smooths the HRTF data heavily. An experiment needs to be constructed to determine
if the jaggedness of the actual HRTF data is needed to perceive 3D sound. This is
not addressed in this thesis.
Figure 43. Zero elevation HRTF data for the left ear. Frequency and Response shown in logarithm scale
Figure 44. Zero elevation MLP (2:50:50:1) after 7235 epochs. Frequency and Response shown in logarithm scale
3.2.2 RBF Architecture. The RBF was tested with the zero elevation data
next. A K-means clustering algorithm with 256 centers was used. Compared with
the MLP, the RBF trained much faster and the results were better from a visual
standpoint. Figure 45 shows the results of training with the RBF. Blank areas in
the figure are where the RBF output was negative or zero (a logarithmic scale
requires positive values). Since negative HRTFs are not possible, they will be zeroed
for filter design. The RBF is able to match the jaggedness of the actual HRTF data
more closely. Figure 46 shows the RMS error for the MLP and RBF networks
respectively. A close inspection of figure 46 reveals that the maximum magnitude of
the RBF error (0.2707) was actually higher than that of the MLP error (0.1573).
However, the MLP had more error overall. The sums of the MLP and RBF errors
were 7.3518 and 5.8528 respectively.
Figure 45. Zero elevation RBF with 256 centers. Frequency and Response shown in logarithm scale
Figure 46. Zero elevation RMS error for the MLP (left) and RBF (right)
3.2.3 Conclusion. It would be premature to generalize that the RBF works
better than the MLP for jagged data like the HRTF data as compared to smooth
data like the bird data. The MLP and RBF had two different architectures. The
MLP used 100 weights (50 weights per hidden node) while the RBF used 512
weights (256 centers by K-means and 256 weights). That is a ratio of roughly
5 to 1. However, the RBF does have the desirable feature of speed.
3.3 Experiment 1: With and Without HRTF
In the process of manipulating and filtering sound sources with the HRTF, an
obvious question arose: does the HRTF data help in 3D localization?
To answer this question, experiment one was conceived.
3.3.1 Objective. Determine whether or not the utilization of AAMRL's
HRTF data for filter design provides a specific advantage over the use of only ITDs
in the generation of a binaural signal. The filters were designed using the program
ESPS by Entropic Research Lab. The design was an FIR filter using the weighted mean
square error criterion. The weights corresponded to the AAMRL HRTF for the 93
frequencies at the test location. A separate filter was designed for each ear and for
each location tested.
3.3.2 Method. Twenty subjects were presented 48 binaural sounds through
a set of headphones. In this forced choice experiment, the subjects were asked to
select the location from which they perceived the sound. Subjects chose from a
display provided on a computer screen by moving the mouse to place the cursor over
the perceived location and clicking the button (see figure 47). The display had only
24 possible azimuths to choose from, and the elevation angle was always zero. These
24 locations corresponded to the 24 possible locations of the computer generated
binaural signals. The signal used in each case was either digitally filtered and
delayed appropriately for a given angle of azimuth, or simply delayed appropriately to
account for the angle. In either case, the original signal was the speech pattern for
a male utterance of the word "seventeen."
Figure 47. Display provided on a computer screen for selection of the perceived location
Subjects were asked to close their eyes during the data collection phase of this
experiment. Interaction effects between the subjects and the two fixed factors (angle
and method of signal generation) in this experiment are assumed to be zero. The
angle from which the signal was designed to come was called factor B, and had
24 levels (one for each azimuth). Factor A represented the method by which the
binaural signal was generated, and had 2 levels (one for HRTF combined with ITD
and one for ITD only). Factor C represented the subjects.
3.3.3 The Model. The model used in experiment one is an analysis of
variance (ANOVA) model for a reduced three-factor experiment (14). It is assumed
that the a · b · c treatment combinations represent random samples of size n, where
a = 2, the number of filters; b = 24, the number of azimuths; c = 20, the number of
subjects; and n = 1, the sample size. The model is

\[ X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijkl} \tag{1} \]

where

\[ i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, c; \quad l = 1, 2, \ldots, n \]

The $X_{ijkl}$ are assumed to be normally distributed and the following restrictions
apply:

\[ \sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0, \quad \sum_k \gamma_k = 0 \tag{2} \]
\[ \sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0 \tag{3} \]
\[ \sum_i (\alpha\gamma)_{ik} = 0, \quad \sum_k (\alpha\gamma)_{ik} = 0 \tag{4} \]
\[ \sum_j (\beta\gamma)_{jk} = 0, \quad \sum_k (\beta\gamma)_{jk} = 0 \tag{5} \]

Here $\alpha_i$, $\beta_j$, and $\gamma_k$ represent the main effects due to factors filter, azimuth and
subject respectively. The terms $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, and $(\beta\gamma)_{jk}$ represent interaction
between the factors. Let
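$\mu_{ijk\cdot}$ = mean for the (ijk)th treatment combination

$\mu_{ij\cdot\cdot}$ = the average of the population means for those
populations receiving the ith level of A and the jth level of B

\[ \mu_{ij\cdot\cdot} = \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{c} \tag{6} \]

$\mu_{i\cdot\cdot\cdot}$ = the average of the population means for those
populations receiving the ith level of A

\[ \mu_{i\cdot\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{bc} \tag{7} \]

$\mu$ = the average of the population means for all
populations under consideration

\[ \mu = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{abc} \tag{8} \]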
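The other mean terms are defined in a similar manner:

\[ \mu_{i\cdot k\cdot} = \sum_{j=1}^{b} \frac{\mu_{ijk\cdot}}{b} \tag{10} \]
\[ \mu_{\cdot jk\cdot} = \sum_{i=1}^{a} \frac{\mu_{ijk\cdot}}{a} \tag{11} \]
\[ \mu_{\cdot j\cdot\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{ac} \tag{12} \]
\[ \mu_{\cdot\cdot k\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{\mu_{ijk\cdot}}{ab} \tag{13} \]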
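The main effects and interaction effects are defined as follows:

\[ \alpha_i = \mu_{i\cdot\cdot\cdot} - \mu \tag{15} \]
\[ \beta_j = \mu_{\cdot j\cdot\cdot} - \mu \tag{16} \]
\[ \gamma_k = \mu_{\cdot\cdot k\cdot} - \mu \tag{17} \]
\[ (\alpha\beta)_{ij} = \mu_{ij\cdot\cdot} - \mu_{i\cdot\cdot\cdot} - \mu_{\cdot j\cdot\cdot} + \mu \tag{18} \]
\[ (\alpha\gamma)_{ik} = \mu_{i\cdot k\cdot} - \mu_{i\cdot\cdot\cdot} - \mu_{\cdot\cdot k\cdot} + \mu \tag{19} \]
\[ (\beta\gamma)_{jk} = \mu_{\cdot jk\cdot} - \mu_{\cdot j\cdot\cdot} - \mu_{\cdot\cdot k\cdot} + \mu \tag{20} \]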
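The totals for the three-way ANOVA are shown next:

$T_{i\cdot\cdot\cdot}$ = sum of all responses from experimental units receiving
level i of A, where X is the absolute difference
between the actual location and the perceived location of the sound

\[ T_{i\cdot\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} X_{ijkl} \tag{21} \]

$T_{ij\cdot\cdot}$ = sum of all responses from experimental units receiving
level i of A and level j of B

\[ T_{ij\cdot\cdot} = \sum_{k=1}^{c} \sum_{l=1}^{n} X_{ijkl} \tag{22} \]

The other total terms are defined in a similar manner:

\[ T_{\cdot j\cdot\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{c} \sum_{l=1}^{n} X_{ijkl} \tag{23} \]
\[ T_{\cdot\cdot k\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{n} X_{ijkl} \tag{24} \]
\[ T_{i\cdot k\cdot} = \sum_{j=1}^{b} \sum_{l=1}^{n} X_{ijkl} \tag{25} \]
\[ T_{\cdot jk\cdot} = \sum_{i=1}^{a} \sum_{l=1}^{n} X_{ijkl} \tag{26} \]
\[ T_{ijk\cdot} = \sum_{l=1}^{n} X_{ijkl} \tag{27} \]
\[ T_{\cdot\cdot\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} X_{ijkl} \tag{28} \]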
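The sums of squares needed in the three-way ANOVA are:

\[ SS_A = \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{29} \]
\[ SS_B = \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{30} \]
\[ SS_C = \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{31} \]
\[ SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{T_{ij\cdot\cdot}^2}{cn} - \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{32} \]
\[ SS_{AC} = \sum_{i=1}^{a} \sum_{k=1}^{c} \frac{T_{i\cdot k\cdot}^2}{bn} - \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{33} \]
\[ SS_{BC} = \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{T_{\cdot jk\cdot}^2}{an} - \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} - \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{34} \]
\[ SS_{total} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} X_{ijkl}^2 - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{35} \]
\[ SS_E = SS_{total} - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC} \tag{36} \]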
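The sums of squares above can be computed directly from the marginal totals. The following is a minimal Python sketch for the n = 1 case used in this experiment, not the Matlab code of appendix C; the function name `three_way_ss` and the nested-list data layout `x[i][j][k]` are assumptions made for the example.

```python
def three_way_ss(x):
    """Sums of squares for a reduced three-way layout with n = 1.

    x[i][j][k] holds the single response for level i of A,
    level j of B and level k of C.
    """
    a, b, c = len(x), len(x[0]), len(x[0][0])
    n = 1
    ia, jb, kc = range(a), range(b), range(c)

    def sq(values):
        return sum(v * v for v in values)

    grand = sum(x[i][j][k] for i in ia for j in jb for k in kc)
    cf = grand ** 2 / (a * b * c * n)  # the T^2 / (abcn) term

    # One-way and two-way marginal totals.
    t_i = [sum(x[i][j][k] for j in jb for k in kc) for i in ia]
    t_j = [sum(x[i][j][k] for i in ia for k in kc) for j in jb]
    t_k = [sum(x[i][j][k] for i in ia for j in jb) for k in kc]
    t_ij = [sum(x[i][j][k] for k in kc) for i in ia for j in jb]
    t_ik = [sum(x[i][j][k] for j in jb) for i in ia for k in kc]
    t_jk = [sum(x[i][j][k] for i in ia) for j in jb for k in kc]

    s_a = sq(t_i) / (b * c * n)
    s_b = sq(t_j) / (a * c * n)
    s_c = sq(t_k) / (a * b * n)

    ss = {
        "A": s_a - cf,
        "B": s_b - cf,
        "C": s_c - cf,
        "AB": sq(t_ij) / (c * n) - s_a - s_b + cf,
        "AC": sq(t_ik) / (b * n) - s_a - s_c + cf,
        "BC": sq(t_jk) / (a * n) - s_b - s_c + cf,
        "total": sq(x[i][j][k] for i in ia for j in jb for k in kc) - cf,
    }
    ss["E"] = ss["total"] - sum(ss[key] for key in ("A", "B", "C", "AB", "AC", "BC"))
    return ss


# Purely additive toy data: every interaction and the error should vanish.
toy = [[[10.0 * i + j + 0.5 * k for k in range(2)] for j in range(3)] for i in range(2)]
toy_ss = three_way_ss(toy)
```

The toy data set is purely additive, so it serves as a sanity check: the interaction and error sums of squares should come out as zero up to rounding.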
Finally, an ANOVA table as shown in table 3 is set up to test the null ($H_0$)
and alternative ($H_1$) hypotheses:
$H_0^{(1)}$: There is no interaction of the factors A and B; that is, $(\alpha\beta)_{ij} = 0$.
$H_0^{(2)}$: There is no interaction of the factors A and C; that is, $(\alpha\gamma)_{ik} = 0$.
$H_0^{(3)}$: There is no interaction of the factors B and C; that is, $(\beta\gamma)_{jk} = 0$.
$H_0^{(4)}$: Factor A means are equal; that is, $\alpha_i = 0$.
$H_0^{(5)}$: Factor B means are equal; that is, $\beta_j = 0$.
$H_0^{(6)}$: Factor C means are equal; that is, $\gamma_k = 0$.
$H_1^{(1-6)}$: The means are not equal.
3.4 Experiment 2: AAMRL HRTF and RBF HRTF
3.4.1 Objective. Determine whether or not the utilization of AAMRL's
HRTF data for filter design is statistically different from the utilization of RBF
neural network data for filter design in generation of a binaural signal.
Source of    Degrees of            Sum of         Mean                      Calculated
Variation    Freedom               Squares (SS)   Square (MS)               F Value
A            a - 1                 SSA            SSA/(a - 1)               MSA/MSE
B            b - 1                 SSB            SSB/(b - 1)               MSB/MSE
C            c - 1                 SSC            SSC/(c - 1)               MSC/MSE
AB           (a - 1)(b - 1)        SSAB           SSAB/[(a - 1)(b - 1)]     MSAB/MSE
AC           (a - 1)(c - 1)        SSAC           SSAC/[(a - 1)(c - 1)]     MSAC/MSE
BC           (b - 1)(c - 1)        SSBC           SSBC/[(b - 1)(c - 1)]     MSBC/MSE
Error (E)    DFE = by subtraction  SSE            SSE/DFE
Total        abcn - 1              SStotal
Table 3. Analysis of Variance for "Reduced three-way model"
3.4.2 Method. Twenty subjects were presented 48 binaural sounds through
a set of headphones. In this forced choice experiment, the subjects were asked to
select the location from which they perceived the sound to come. Subjects
chose from a display provided on a computer screen by moving the mouse to place
the cursor over the perceived location and clicking the button (see figure 47).
The display had only 24 possible azimuths to choose from, and the elevation
angle was always zero. These 24 locations corresponded to the 24 possible locations
of the computer generated binaural signals. The signal used in each case was
digitally filtered with either the AAMRL HRTF data or the RBF HRTF data and
delayed appropriately for a given angle of azimuth. The filters were designed using
the program ESPS by Entropic Research Lab. The design was an FIR filter using the
weighted mean square error criterion. The weights corresponded to the AAMRL
HRTF or RBF HRTF for the 93 frequencies at the test location. A separate filter
was designed for each ear and for each location tested. In either case, the original
signal was the speech pattern for a male utterance of the word "seventeen."
Subjects were asked to close their eyes during the data collection phase of this
experiment. Interaction effects between the subjects and the two fixed factors (angle
and method of signal generation) in this experiment are assumed to be zero. The
angle from which the signal was designed to come was called factor B, and had
24 levels (one for each azimuth). Factor A represented the method by which the
binaural signal was generated, and had 2 levels (one for AAMRL HRTF and one for
RBF HRTF). Factor C represented the subjects.
3.4.3 The Model. The model used in experiment two is the same as
described above in experiment one.
3.5 Conclusion
This chapter presented the methodology for training a neural network to ap-
proximate the HRTF for zero elevation. It also presented two experiments to verify
the methodology. The results of the training and experiments will be presented in
the following chapter.
IV. Experiment Results
4.1 Introduction
This chapter presents the results of the two experiments conducted during this
thesis work. Before presenting the results of each experiment a discussion of the
types of errors analyzed in the experiments is presented.
4.2 Types of Error
4.2.1 Algebraic Error. Algebraic error is measured by taking the difference
between the actual location of the sound source and the subject's perceived location.
The difference is calculated so that the result is always between -180° and +180°
inclusive. The sign of the result indicates the direction of the subject's response
from the actual location. Positive values indicate a clockwise difference and negative
values indicate a counter-clockwise difference (see figure 48). For example, if the
actual location is 349° and the subject responded with 10°, the algebraic error is
calculated as +21°. On the other hand, if the actual location is 10° and the subject
responded with 349°, the algebraic error is calculated as -21°.
4.2.2 Absolute Error. Absolute error is measured by taking the absolute
value of the difference between the actual location of the sound source and the
subject's perceived location of the sound source. The difference is calculated so that
the result is always less than or equal to 180°. For example, if the actual location
is 349° and the subject responded with 10°, the absolute error is calculated as 21° as
opposed to 339°.
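Both error measures amount to wrapping an angular difference into a half-circle range. The sketch below is one way to do this in Python, assuming that azimuth increases clockwise so that "perceived minus actual" is positive for a clockwise error, as in the examples above; the function names are hypothetical.

```python
def algebraic_error(actual_deg, perceived_deg):
    """Signed angular error in degrees, wrapped into [-180, 180).

    Assumes azimuth increases clockwise, so a positive value means the
    response was clockwise of the actual location. A difference of
    exactly 180 degrees is reported as -180 by this wrapping.
    """
    return (perceived_deg - actual_deg + 180.0) % 360.0 - 180.0


def absolute_error(actual_deg, perceived_deg):
    """Magnitude of the angular error, never more than 180 degrees."""
    return abs(algebraic_error(actual_deg, perceived_deg))


# The two worked examples from the text.
example_cw = algebraic_error(349, 10)
example_ccw = algebraic_error(10, 349)
```

The wrap-around matters only when the actual and perceived locations straddle 0°; for all other pairs the result equals the plain difference.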
Figure 48. Algebraic Error, positive values indicate a clockwise difference and negative values indicate a counter-clockwise difference
4.2.3 Reversals. Reversals are a phenomenon where the sound is perceived
flipped front to back or back to front from the actual location. Note this is not
a 180° shift, but a symmetric flip about the horizontal axis through the ears (see
figure 49). Figure 50 shows an actual occurrence of this phenomenon from experiment
one. The left image shows a subject with back-front reversals with HRTF treatment.
The right image shows a different subject with front-back reversals without HRTF
treatment. The notation has the following meaning. The x symbol indicates actual
location while the o symbol indicates perceived location. The line between the x
and o shows the pairwise groupings. The gradual spacing toward the center of the
head is for clarity only and is not an indication of perceived distance.
4.2.4 Default to 90°. Default to 90° is a phenomenon where the perceived
location defaults to 90° or 270° no matter where the actual location is. This
phenomenon has been observed by other researchers (17). Figure 51 is an example of a
subject whose perceived locations defaulted to 90° and 270°.
Figure 49. Reversal, a symmetric flip about the horizontal axis through the ears
Figure 50. Example of a subject with back-front reversals with HRTF treatment (left) and a different subject with front-back reversals without HRTF treatment (right). Notation: x indicates actual location, o indicates perceived location. The line between the x and o shows the pairwise groupings. The gradual spacing toward the center of the head is for clarity only and is not an indication of perceived distance.
Figure 51. Example of subject whose perceived locations defaulted to 90° and 270°. Notation same as above
4.3 Experiment One Results
As stated previously, the primary objective of experiment one was to determine
whether or not the Head Related Transfer Function (HRTF) provides any advantage
to a listener attempting to identify the azimuth of a computer generated binaural
signal. More specifically, does the use of the HRTF in generating binaural signals
provide a listener with a signal that can be more accurately located in azimuth than
a binaural signal generated using only interaural time delay (ITD)? The HRTFs
used in this experiment were developed at Armstrong Aerospace Medical Research
Laboratories (AAMRL), Wright-Patterson AFB, OH (12).
Data collected for 20 subjects was used to determine the answer to the question
posed above. Calculations in this statistical analysis of the data are performed using
Matlab. The data used is found in appendix A. The Matlab code is found in appendix
C. This experiment is a reduced three factor experiment. Table 4 summarizes the
factors in this experiment.
Factor     Levels
Filter     2
Subject    20
Azimuth    24
Table 4. Experiment 1 factors
Here the levels of "Subject" represent each of the twenty individuals taking
part in the experiment, and the levels of "Filter" are 1 and 2 (1 corresponds to the
use of the HRTF and 2 corresponds to no use of HRTF). The levels of "Azimuth"
are 1, 2, 3, ..., 24 and correspond to the angles shown in table 5.
Level   Angle     Level   Angle
1       10°       13      191°
2       32°       14      212°
3       45°       15      225°
4       58°       16      237°
5       69°       17      249°
6       82°       18      262°
7       98°       19      278°
8       111°      20      291°
9       123°      21      303°
10      135°      22      315°
11      148°      23      328°
12      169°      24      349°
Table 5. Levels of "Azimuth"
The angle of azimuth is measured as shown in figure 52.
Figure 52. Azimuth, θ, of sound source relative to directly in front of face
Using the data set shown in appendix A
with "Response" representing the magnitude of the difference between the subjects'
perceived angle and the true location, the following Analysis of Variance (ANOVA)
(table 6) was obtained.
Source of      Degrees of   Sum of         Mean          Calculated   F at
Variation      Freedom      Squares (SS)   Square (MS)   F Value      1% level
Filter (A)     1            60554.26       60554.26      57.44        6.85, 6.63
                                                         7.47         7.88
Azimuth (B)    23           416834.70      18123.25      17.19        2.03, 1.79
                                                         2.23         2.78, 2.70
Subject (C)    19           48451.13       2550.06       2.42         2.19, 1.88
A * B          23           186224.72      8096.73       7.68         2.03, 1.79
A * C          19           23134.95       1217.63       1.15         2.19, 1.88
B * C          437          564293.18      1291.29       1.22         1.53, 1.00
Error          437          460732.48      1054.31
Total          959          1760225.42
Table 6. Analysis of Variance for "Response"
Recall the following model for this experiment:

\[ X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijkl} \]

Interaction effects between filters and azimuth must be tested. The null and alternate
hypotheses are $H_0^{(1)}$ (no interactions between filters and azimuth) and $H_1$ (the means
are not equal). The test statistic is $F_{calc}$ = (Mean Square Interaction) / (Mean
Square Random Error) = 7.68 (as seen in table 6). Testing at a significance level of
1% ($\alpha$ = .01),

\[ F_{calc} = 7.68 > F_{.99,23,437} \approx 2.03, 1.79. \]

Hence, $H_0^{(1)}$ can be rejected. Since the hypothesis $H_0^{(1)}$ for the Filter * Azimuth
interaction can be rejected, it can be concluded that the interactions are significantly
large. Differences in the factors are then significant only if they are large compared
with the interactions. For this reason, many statisticians recommend that $H_0^{(4)}$ and
$H_0^{(5)}$ be tested by using the F ratios MSA/MSAB and MSB/MSAB rather than those
given in table 3 (27). These alternate F values are the second values shown in
table 6. Testing $H_0^{(4)}$ at a significance level of 1%,

\[ F_{calc} = 7.47 < F_{.99,1,23} = 7.88. \]

Testing $H_0^{(5)}$ at a significance level of 1%,

\[ F_{calc} = 2.23 < F_{.99,23,23} \approx 2.78, 2.70. \]

Therefore the means of the filters are statistically equal and the means of the
azimuths are statistically equal.
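The F ratios quoted in this analysis follow from the mean squares in table 6 by simple division. As a quick arithmetic check, the Python sketch below recomputes both the standard ratios (against the error mean square) and the alternate ratios (against the A * B mean square); the dictionary layout and the name `f_ratio` are assumptions made for the example, and small differences in the last digit are expected because the table values are rounded.

```python
# Mean squares copied from table 6.
mean_squares = {
    "A": 60554.26,
    "B": 18123.25,
    "C": 2550.06,
    "AB": 8096.73,
    "AC": 1217.63,
    "BC": 1291.29,
    "E": 1054.31,
}


def f_ratio(ms, numerator, denominator="E"):
    """F statistic as the ratio of two mean squares."""
    return ms[numerator] / ms[denominator]


# Standard ratios, each source tested against the error mean square.
f_values = {name: f_ratio(mean_squares, name) for name in mean_squares if name != "E"}

# Alternate ratios for the main effects, tested against the A * B interaction.
f_a_alt = f_ratio(mean_squares, "A", "AB")
f_b_alt = f_ratio(mean_squares, "B", "AB")
```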
Next, the same tests are completed for Filter * Subject interactions. Testing at a
significance level of 1% ($\alpha$ = .01),

\[ F_{calc} = 1.15 < F_{.99,19,437} \approx 2.19, 1.88 \]

shows that $H_0^{(2)}$ cannot be rejected. It can be concluded that there are no Filter *
Subject interactions and $H_0^{(4)}$ and $H_0^{(6)}$ can be tested. Testing $H_0^{(4)}$ at a significance
level of 1%,

\[ F_{calc} = 57.44 > F_{.99,1,437} \approx 6.63. \]

Testing $H_0^{(6)}$ at a significance level of 1%,

\[ F_{calc} = 2.42 > F_{.99,19,480} \approx 1.91. \]

Therefore neither the means of the filters nor the means of the subjects are
statistically equal.
Finally, the same tests are completed for Azimuth * Subject interactions. Testing
at a significance level of 1% ($\alpha$ = .01),

\[ F_{calc} = 1.22 > F_{.99,437,437} \approx 1.53, 1.00 \]

shows that $H_0^{(3)}$ cannot be rejected. Testing $H_0^{(5)}$ at a significance level of 1%,

\[ F_{calc} = 17.19 > F_{.99,23,437} \approx 2.03, 1.79. \]

Testing $H_0^{(6)}$ at a significance level of 1%,

\[ F_{calc} = 2.42 > F_{.99,19,437} \approx 2.19, 1.88. \]

Therefore neither the means of the azimuths nor the means of the subjects are
statistically equal.
Using the reversal corrections, the following Analysis of Variance (ANOVA)
(table 7) was obtained. If the error measured from the reversed location was smaller
than the error of the subject's response, the reversal angle was used in table 7. The
number of reversals corrected was 432 out of 960 or 45%.
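One way to picture the reversal correction is sketched below in Python. It assumes that 0° is directly in front of the listener, so a front-back flip about the axis through the ears maps an azimuth θ to 180° - θ, and it keeps whichever error is smaller. The function names are hypothetical, and the exact procedure used for table 7 may differ in detail.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two azimuths, in degrees."""
    return abs((b_deg - a_deg + 180.0) % 360.0 - 180.0)


def front_back_mirror(azimuth_deg):
    """Flip an azimuth about the axis through the ears (90 to 270 degrees).

    Assumes 0 degrees is directly in front of the listener.
    """
    return (180.0 - azimuth_deg) % 360.0


def reversal_corrected_error(actual_deg, perceived_deg):
    """Return (error, was_reversal), keeping the smaller of the two errors."""
    direct = angular_distance(actual_deg, perceived_deg)
    mirrored = angular_distance(actual_deg, front_back_mirror(perceived_deg))
    return min(direct, mirrored), mirrored < direct


# A response at 169 degrees to a source at 10 degrees is a near-perfect reversal.
corrected, flagged = reversal_corrected_error(10, 169)
```

Because the correction always keeps the smaller of the two errors, it can only lower the mean error, which is consistent with the reductions reported later in this section.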
Source of      Degrees of   Sum of         Mean          Calculated   F at
Variation      Freedom      Squares (SS)   Square (MS)   F Value      1% level
Filter (A)     1            1811.77        1811.77       7.45         6.85, 6.63
                                                         3.28         8.18
Azimuth (B)    23           31188.33       1356.01       5.58         2.03, 1.79
Subject (C)    19           21444.68       1128.67       4.64         2.19, 1.88
                                                         2.04         3.15, 3.00
A * B          23           7804.46        339.32        1.40         2.03, 1.79
A * C          19           10498.02       552.53        2.27         2.19, 1.88
B * C          437          137136.70      313.81        1.29         1.53, 1.00
Error          437          106269.46      243.18
Total          959          316153.40
Table 7. Analysis of Variance for "Response" with reversal corrections
From table 7, the calculated F values indicate significant interaction between
Filter * Subject. The other interactions, Filter * Azimuth and Azimuth * Subject,
indicate no significant interaction. Comparing within factor variance, the means
of the filters are statistically equal; the means of the azimuths are not statistically
equal; and the means of the subjects are statistically equal.
Now, the main objective of this experiment is addressed. Is there a difference
in the mean response for HRTF versus non-HRTF binaural signals? To determine
the answer to this question, a pairwise comparison of the means for the two levels of
the variable "Filter" is conducted. The following formula is used to determine the
confidence intervals for the differences of two population means:
S-- (37)X2 - X1 ± z10'92-91 = X2 X 1 ±- zc , 2 [0 N- (7
N2 N,
where,
X1 is the mean of the with HRTF filter error.X 2 is the mean of the without HRTF filter error.oCl is the standard deviation of the with HRTF filter error.C2 is the standard deviation of the without HRTF filter error.z, is the 99% confidence coefficient (z, = 2.58).N1 is the size of the with HRTF filter error.N 2 is the size of the without HRTF filter error.
And since the point estimates for the means at each level are:
X1 = 35.73
X2 = 51.61
X2-X1 = 15.88
We can state with 99 percent confidence that
8.87 < X 2 - X 1 < 22.90
And since the point estimates for the means at each level with reversal corrections
are:
X1 = 20.12
X2 = 17.37
X2 - X1 = -2.75
We can state with 99 percent confidence that
-5.76 < X 2 - X 1 < 0.27
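The interval calculation in equation 37 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the thesis: the function name `mean_difference_ci` and the numeric inputs are hypothetical, and the standard deviations that produced the intervals quoted above are not restated here.

```python
import math

def mean_difference_ci(mean1, mean2, sd1, sd2, n1, n2, z=2.58):
    """Large-sample confidence interval for (mean2 - mean1).

    z = 2.58 is the 99% confidence coefficient used in the text.
    """
    diff = mean2 - mean1
    # Standard error of the difference of two independent sample means.
    std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    half_width = z * std_err
    return diff - half_width, diff + half_width

# Hypothetical inputs chosen only to exercise the formula (not thesis data).
low, high = mean_difference_ci(mean1=30.0, mean2=40.0,
                               sd1=10.0, sd2=10.0, n1=100, n2=100)
print(f"{low:.2f} < X2 - X1 < {high:.2f}")
```

If the interval contains zero, the two means cannot be distinguished at the chosen confidence level, which is how the intervals in this section are read.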
The confidence interval shows an 8.87° to 22.90° advantage of using the HRTF versus
no HRTF. After reversal correction the interval runs from an advantage of 0.27° to a
disadvantage of 5.76°. It should be obvious that the correction for reversals will reduce
the error, because the algorithm used to include reversals chooses the smaller error. The
mean with HRTF error reduced from 35.73° to 20.12° and the mean without HRTF error
reduced from 51.61° to 17.37°. The interaction of reversals is examined further in
the next section.
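The reversal correction described above (keep whichever of the raw response and its front-back mirror image has the smaller error) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: azimuth is taken to run from 0° to 360° with the ears at 90° and 270°, so a front-back reversal is modeled as a reflection about the 90°-270° axis. The function names are hypothetical and are not taken from the thesis software.

```python
def angular_error(actual_deg, response_deg):
    """Signed smallest angle (response - actual), in the range (-180, 180]."""
    diff = (response_deg - actual_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff

def mirror_front_back(angle_deg):
    """Reflect an azimuth about the interaural (90-270 degree) axis (assumed model)."""
    return (180.0 - angle_deg) % 360.0

def reversal_corrected_error(actual_deg, response_deg):
    """Return (error, was_reversal), keeping the smaller-magnitude error."""
    raw = angular_error(actual_deg, response_deg)
    mirrored = angular_error(actual_deg, mirror_front_back(response_deg))
    if abs(mirrored) < abs(raw):
        return mirrored, True
    return raw, False

# A sound at 10 degrees heard at 169 degrees is treated as a reversal of 11 degrees.
print(reversal_corrected_error(10, 169))
# A sound at 10 degrees heard at 32 degrees is left uncorrected.
print(reversal_corrected_error(10, 32))
```

Because the smaller of the two errors is always kept, this kind of correction can only lower the mean error, which is why the corrected and uncorrected statistics are reported side by side.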
4.3.1 Further examination of results. The above results give mixed evidence
of a statistical advantage in localization accuracy for the HRTF over simply ITD, with
significant interaction between the location of the sound and whether the HRTF is used.
Figure 53 shows the absolute mean error and standard deviation for sounds
with the HRTF (solid) and without the HRTF (dash). The error bars (vertical lines)
show the standard deviation of the mean error. This figure reveals that the without
HRTF filter had greater error for presentation in front of the head when compared to
behind the head. Figure 54 shows the algebraic mean error and standard deviation
for sounds with the HRTF (solid) and without the HRTF (dash). Again the error
bars show the standard deviation of the mean error. The algebraic error gives a
direction to the error. As can be seen from the figure, the sign of the error indicates
a general pulling of the perceived sound toward the ears. Figure 55 is perhaps a
better view of the data. In this figure the * shows the mean location of the perceived
sounds and the arcs show the standard deviation. As can be seen, the mean location
tends to be pulled near the 90° and 270° locations. Figure 55 also shows how the
without HRTF treatment is perceived more behind the head than in front of the head.
This parallels the results of Oldfield and Parker where the pinna-filled perceptions
were pulled toward the ears (18).

Figure 53. Absolute mean error and standard deviation for sounds with the HRTF
(solid) and without the HRTF (dash). The error bars show the standard
deviation of the mean error. (Horizontal axis: azimuth angle in degrees.)

Figure 54. Algebraic mean error and standard deviation for sounds with the HRTF
(solid) and without the HRTF (dash). The error bars show the standard
deviation of the mean error. (Horizontal axis: azimuth angle in degrees.)

Figure 55. Mean and standard deviation for sounds with the HRTF (left) and without
the HRTF (right). The arcs show the standard deviation of the mean
(shown as a *).
To present the data in more detail, appendix A has figures 65, 66, 67
and 68 depicting all the subjects' responses for each actual location. The notation
used in these figures is as follows: x and solid line indicate with HRTF, o and
dashed line indicate without HRTF. The arcs show the standard deviation of the
samples. The * and + indicate the mean with and without HRTF respectively.
These are the actual responses. No corrections have been made concerning reversal.
In an effort to take into account reversals, polar histograms of the data are
presented next. The polar histogram shows the frequency of subjects' responses for
a specific angle of azimuth. Figure 56 shows the polar histogram of all responses for
sounds with the HRTF (solid) and without the HRTF (dash). The dotted circle
indicates the number of times a sound was actually presented at that location. Notice
that subjects heard sound at some locations more times than a sound was actually
presented there. Recall that each subject heard 48 sounds, one with HRTF and one
without HRTF for each of the 24 azimuth locations, for a total of 960 sounds. This
figure again indicates that subjects tended not to perceive sounds in front of or behind
the head. However, subjects perceived fewer of the without HRTF sounds in front of the
head, and more of them behind the head, than the with HRTF sounds.

Figure 56. Polar histogram of all responses for sounds with the HRTF (solid) and
without the HRTF (dash). The dotted circle shows the actual presentations.
Tables 8 and 9 show statistics for the with HRTF and without HRTF filters.
Without taking into account reversals corrections 41.25% and 34.58% of the with
HRTF and without HRTF responses respectively were correct within one button
(±23°). Recall the button layout is shown in figure 47. Figure 57 is a polar histogram
of all responses correct within one button. This figure shows that the without HRTF
correct responses are greater behind the head, while the with HRTF correct responses
are more evenly distributed front and back.

                                      With HRTF   Without HRTF
Number of samples                     480         480
rmserr                                0.4500      5.9583
stderr                                50.7709     70.8760
correct - exact                       16.88%      14.37%
correct - within one button (23°)     41.25%      34.58%

Table 8. Without reversal corrections (nor)

                                      With HRTF   Without HRTF
Number of samples                     480         480
rmserr                                0.9375      0.5458
stderr                                27.6543     24.8911
Number of reversals                   176         259
correct - exact                       24.70%      28.75%
correct - within one button (23°)     57.29%      66.87%

Table 9. With reversal corrections (rev)
Next, taking into account reversal corrections, 57.29% and 66.87% of the with
HRTF and without HRTF responses respectively were correct within one button
(±23°). Figure 58 is a polar histogram of all responses with reversal corrections that
are correct within one button. This figure shows that the without HRTF correct responses
and the with HRTF correct responses are more evenly distributed front and back.
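The "exact" and "within one button" percentages and the polar histogram counts can be computed along the following lines. This is a sketch under assumptions: the 24 azimuths are the actual locations listed in appendix A, "within one button" is read as the response being the presented button or one of its two neighbors on the ring, every response is assumed to be one of the 24 buttons, and the trial list is made up for illustration.

```python
from collections import Counter

# The 24 azimuth locations (degrees) listed in appendix A.
BUTTONS = [10, 32, 45, 58, 69, 82, 98, 111, 123, 135, 148, 169,
           191, 212, 225, 237, 249, 262, 278, 291, 303, 315, 328, 349]

def button_distance(actual, response):
    """Number of button steps between two locations, going around the ring."""
    d = abs(BUTTONS.index(actual) - BUTTONS.index(response))
    return min(d, len(BUTTONS) - d)

def percent_correct(trials):
    """Return (% exact, % within one button) for (actual, response) pairs."""
    n = len(trials)
    exact = sum(1 for a, r in trials if button_distance(a, r) == 0)
    near = sum(1 for a, r in trials if button_distance(a, r) <= 1)
    return 100.0 * exact / n, 100.0 * near / n

def polar_histogram(trials):
    """Count of responses at each button, in BUTTONS order."""
    counts = Counter(r for _, r in trials)
    return [counts.get(b, 0) for b in BUTTONS]

# Hypothetical (actual, response) pairs, not thesis data.
trials = [(10, 148), (32, 32), (45, 58), (349, 10)]
print(percent_correct(trials))
print(polar_histogram(trials))
```

Treating the buttons as a ring makes 349° and 10° neighbors, which matters for sounds presented near the front of the head.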
The percentage of reversals with the HRTF and without the HRTF was 36.67%
and 53.96% respectively. Oldfield and Parker found few reversals with the pinna
unfilled and 26% with the pinna filled (17, 18). The relative increase in the number
of reversals indicates that the HRTF helps in reducing the number of reversals. The
differences between Oldfield and Parker and this thesis also indicate that there may
be cues in the actual sound that are lost in the filtered sound. Note: Oldfield and
Parker used white noise while this thesis used filtered speech.
Figure 57. Polar histogram of all responses correct within one button for sounds
with the HRTF (solid) and without the HRTF (dash). The dotted circle shows
the actual presentations.

Figure 58. Polar histogram of reversal corrected responses correct within one button
for sounds with the HRTF (solid) and without the HRTF (dash).
4.3.2 Summary of Experiment One Results. Table 10 shows a summary of
the ANOVA hypotheses acceptances and rejections.

Source of      No Reversals   Reversals
Variation      Hypothesis     Hypothesis
Filter (A)     reject         reject
               accept         accept
Azimuth (B)    reject         reject
               accept
Subject (C)    reject         reject
               -              accept
A * B          reject         accept
A * C          accept         reject
B * C          accept         accept

Table 10. Summary of Experiment One

An "accept" means the means
are statistically equal, while a "reject" means the difference between the means is
statistically large. Depending on the statistical method used, we can conclude that
the HRTF does provide a statistical advantage in localization accuracy over simply
ITD. There is however a statistically significant interaction between the location
of the sound and whether the HRTF is used. When this interaction is removed
using the alternate F-Value, the statistics give mixed results of equal means for the
filters and azimuth. This may mean that at certain angles of azimuth, the HRTF
either provides no advantage at all or hinders localization capabilities. Adding in
the calculations for reversals changes the results to where the means of the azimuth
are not statistically equal. The reversal calculations will inherently reduce the error
results. However comparing the number of reversals indicates an advantage of using
the HRTF over no HRTF.
4.4 Experiment Two Results
The primary objective of experiment two was to determine if a neural network
could be trained to learn the HRTF.
Source of     Degrees of  Sum of        Mean         Calculated  F at
Variation     Freedom     Squares (SS)  Square (MS)  F Value     1% level
Filter (A)    1           7578.07       7578.07      8.58        6.85, 6.63
Azimuth (B)   23          287455.55     12498.07     14.15       2.03, 1.79
                                                     8.39        2.03, 1.79
Subject (C)   19          39938.99      2102.05      2.38        2.19, 1.88
                                                     1.41        2.19, 1.88
A * B         23          27216.78      1183.34      1.34        2.03, 1.79
A * C         19          19117.75      1006.20      1.14        2.19, 1.88
B * C         437         650785.39     1489.21      1.69        1.53, 1.00
Error         437         385933.34     883.14
Total         959         1418025.87

Table 11. Analysis of Variance for "Response"
Recall, the following model for this experiment:
$$X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijkl}$$
Interaction effects between filters and azimuth must be tested. The null and alternate
hypotheses are $H_0^{(1)}$ (no interactions between filters and azimuth) and $H_1$ (the means
are not equal). The test statistic is $F_{calc}$ = (Mean Square Interaction) ÷ (Mean
Square Random Error) = 1.34 (as seen in table 11). Testing at a significance level
of 1% ($\alpha = .01$),
$$F_{calc} = 1.34 < F_{.99,23,437} \approx 2.03, 1.79.$$
Hence, $H_0^{(1)}$ cannot be rejected. It can be concluded that there are no Filter * Azimuth
interactions, and $H_0^{(4)}$ and $H_0^{(5)}$ can be tested. Testing $H_0^{(4)}$ at a significance level
of 1%,
$$F_{calc} = 8.58 > F_{.99,1,437} \approx 6.85, 6.63.$$
Testing $H_0^{(5)}$ at a significance level of 1%,
$$F_{calc} = 14.15 > F_{.99,23,437} \approx 2.03, 1.79.$$
Therefore neither the means of the filters nor the means of the azimuths are statistically
equal.
Next, the same tests are completed for Filter * Subject interactions. Testing at a
significance level of 1% ($\alpha = .01$),
$$F_{calc} = 1.14 < F_{.99,19,437} \approx 2.19, 1.88$$
shows that $H_0^{(2)}$ cannot be rejected. It can be concluded that there are no Filter
* Subject interactions, and $H_0^{(4)}$ and $H_0^{(6)}$ can then be tested. Testing $H_0^{(4)}$ at a
significance level of 1%, $F_{calc} = 8.58 > F_{.99,1,437} \approx 6.85, 6.63$. Testing $H_0^{(6)}$ at
a significance level of 1%,
$$F_{calc} = 2.38 > F_{.99,19,437} \approx 2.19, 1.88.$$
Therefore neither the means of the filters nor the means of the subjects are statistically
equal.
Finally, the same tests are completed for Azimuth * Subject interactions. Testing
at a significance level of 1% ($\alpha = .01$),
$$F_{calc} = 1.69 > F_{.99,437,437} \approx 1.53, 1.00$$
shows that $H_0^{(3)}$ can be rejected. Differences in the factors are then significant only
if they are large compared with the interactions. Again the alternate F values are
used. These alternate F values are the second values shown in table 11. Testing $H_0^{(5)}$
at a significance level of 1%,
$$F_{calc,alt} = 8.39 > F_{.99,23,437} \approx 2.03, 1.79.$$
Testing $H_0^{(6)}$ at a significance level of 1%,
$$F_{calc,alt} = 1.41 < F_{.99,19,437} \approx 2.19, 1.88.$$
Therefore the means of the azimuths are not statistically equal; however, the means
of the subjects are statistically equal.
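The F values quoted in these tests can be reproduced from the mean squares in table 11. The sketch below divides each mean square by the error mean square. The alternate values (8.39 and 1.41) are numerically consistent with dividing by the Azimuth * Subject mean square instead; that reading is inferred here from the tabulated numbers and should be treated as an assumption. The dictionary and function names are hypothetical.

```python
# Mean squares copied from table 11.
MEAN_SQUARES = {
    "A": 7578.07,      # Filter
    "B": 12498.07,     # Azimuth
    "C": 2102.05,      # Subject
    "A*B": 1183.34,
    "A*C": 1006.20,
    "B*C": 1489.21,
    "Error": 883.14,
}

def f_ratio(effect, denominator="Error"):
    """F statistic: mean square of the effect over the chosen denominator."""
    return MEAN_SQUARES[effect] / MEAN_SQUARES[denominator]

def reject_null(f_calc, f_critical):
    """The null hypothesis is rejected when F exceeds the critical value."""
    return f_calc > f_critical

print(round(f_ratio("A*B"), 2))        # interaction test against random error
print(round(f_ratio("B", "B*C"), 2))   # alternate F for azimuth (assumed denominator)
print(reject_null(f_ratio("A*B"), 2.03))
```

The critical values (for example 2.03 for 23 and 437 degrees of freedom) are taken from the table as given rather than computed.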
Using the reversal corrections, the following Analysis of Variance (ANOVA)
(table 12) was obtained. If the reversal location error was smaller than the subjects'
response then the reversal angle was used in table 12. The number of reversal
corrections was 351 out of 960 or 37%.
Source of     Degrees of  Sum of        Mean         Calculated  F at
Variation     Freedom     Squares (SS)  Square (MS)  F Value     1% level
Filter (A)    1           1249.46       1249.46      9.23        6.85, 6.63
                                                     3.94        7.88
Azimuth (B)   23          16741.34      727.88       5.38        2.03, 1.79
                                                     2.29        2.78, 2.70
Subject (C)   19          6288.65       330.98       2.45        2.19, 1.88
A * B         23          7303.92       317.56       2.35        2.03, 1.79
A * C         19          3282.8        172.78       1.28        2.19, 1.88
B * C         437         89632.50      205.11       1.52        1.53, 1.00
Error         437         59127.73      135.30
Total         959         183626.43

Table 12. Analysis of Variance for "Response" with reversal corrections
From table 12 the calculated F values indicate significant interaction between
Filter * Azimuth. The other interactions Filter * Subject and Azimuth * Subject
indicate no significant interaction. Comparing the within factor variance the means
of filters are statistically equal; the means of the azimuths are statistically equal;
and the means of the subjects are not statistically equal.
Now, the main objective of this experiment is addressed. Is there a difference
in the mean response for AAMRL HRTF versus RBF HRTF binaural signals? To
determine the answer to this question, a pairwise comparison of the means for the
two levels of the variable "Filter" is conducted. The following formula is used to
determine the confidence intervals for the differences of two population means:
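$$\bar{X}_2 - \bar{X}_1 \pm z_c\,\sigma_{\bar{X}_2 - \bar{X}_1} = \bar{X}_2 - \bar{X}_1 \pm z_c\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (38)$$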
where,
$\bar{X}_1$ is the mean of the AAMRL HRTF filter error.
$\bar{X}_2$ is the mean of the RBF HRTF filter error.
$\sigma_1$ is the standard deviation of the AAMRL HRTF filter error.
$\sigma_2$ is the standard deviation of the RBF HRTF filter error.
$z_c$ is the 99% confidence coefficient ($z_c = 2.58$).
$N_1$ is the size of the AAMRL HRTF filter error.
$N_2$ is the size of the RBF HRTF filter error.
And since the point estimates for the means at each level are:
X1 = 38.28
X2 = 32.66
X2 - X1 = -5.62
We can state with 99 percent confidence that
-12.01 < X 2 - X1 < 0.77
And since the point estimates for the means at each level with reversal corrections
are:
X1 = 17.18
X2 = 14.90
X2 - X1 = -2.28
We can state with 99 percent confidence that
-4.58 < X2 - X1 < 0.02
The confidence interval runs from a 0.77° advantage to a 12.01° disadvantage of using
the AAMRL HRTF versus the RBF HRTF. After reversal correction the interval runs from
an advantage of 0.02° to a disadvantage of 4.58°. Again the correction for reversals reduced
the error. The mean AAMRL HRTF error reduced from 38.28° to 17.18° and the
mean RBF HRTF error reduced from 32.66° to 14.90°. The interaction of reversals
is examined further in the next section.
4.4.1 Further examination of results. Figure 59 shows the absolute mean
error and standard deviation for sounds with the AAMRL HRTF (solid) and RBF
HRTF (dash). The error bars show the standard deviation of the mean error. This
figure reveals that the AAMRL HRTF had slightly greater error than the RBF HRTF. A
comparison of the standard deviations reveals no significant trends. Figure 60 shows
the algebraic mean error and standard deviation for sounds with the AAMRL HRTF
(solid) and RBF HRTF (dash). The error bars show the standard deviation of the
mean error. The algebraic error gives a direction to the error. As can be seen from
the figure, the sign of the error indicates a general pulling of the perceived sound
toward the ears. Figure 61 is perhaps a better view of the data. In this figure the
* shows the mean location of the perceived sounds and the arcs show the standard
deviation. As can be seen, the mean location tends to be pulled near the 90° and
270° locations. This parallels the results of experiment one and Oldfield and Parker
where the pinna-filled perceptions were pulled toward the ears (18). Begault and
Wenzel also reported subjects had a pattern of pulling toward the vertical-lateral
plane (3).

Figure 59. Absolute mean error and standard deviation for sounds with the
AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show
the standard deviation of the mean error. (Horizontal axis: azimuth
angle in degrees.)

Figure 60. Algebraic mean error and standard deviation for sounds with the
AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show
the standard deviation of the mean error. (Horizontal axis: azimuth
angle in degrees.)
To present the data in more detail, appendix B has figures 69, 70, 71 and 72
depicting all the subjects' responses for each actual location. The notation used in
these figures is as follows: x and solid line indicate AAMRL HRTF, o and dashed
line indicate RBF HRTF. The arcs show the standard deviation of the samples. The
* and + indicate the mean AAMRL and RBF HRTF respectively. These are the
actual responses. No corrections have been made concerning reversal.
Figure 61. Mean and standard deviation for sounds with the AAMRL HRTF (left)
and RBF HRTF (right). The arcs show the standard deviation of the
mean (shown as a *).
In an effort to take into account reversals, polar histograms of the data are
presented next. The polar histogram shows the frequency of subjects' responses for
a specific angle of azimuth. Figure 62 shows the polar histogram of all responses for
sounds with the AAMRL HRTF (solid) and the RBF HRTF (dash). The AAMRL
HRTFs are labeled "Actual HRTF" in the figure legend. The dotted circle indicates the
number of times a sound was actually presented at that location. Again the responses for
the AAMRL HRTF and RBF HRTF are similar, with the larger number of responses
occurring on the sides.
Tables 13 and 14 show statistics for the AAMRL HRTF and RBF HRTF
filters. Without taking into account reversal corrections, 40.62% and 51.88% of
the AAMRL HRTF and RBF HRTF responses respectively were correct within one
button (±23°). Figure 63 is a polar histogram of all responses correct within one
button. This figure shows an even distribution between filters. The front and back
distribution may slightly favor the back.
Figure 62. Polar histogram of all responses for sounds with the AAMRL HRTF
(solid) and RBF HRTF (dash).
                                      AAMRL HRTF  RBF HRTF
Number of samples                     480         480
rmserr                                1.6354      0.0521
stderr                                54.2080     50.4625
correct - exact                       15.42%      22.08%
correct - within one button (23°)     40.62%      51.88%

Table 13. Without reversal corrections (nor)

                                      AAMRL HRTF  RBF HRTF
Number of samples                     480         480
rmserr                                0.1146      0.3312
stderr                                22.5449     19.9285
Number of reversals                   188         163
correct - exact                       25.21%      30.63%
correct - within one button (23°)     62.71%      70.42%

Table 14. With reversal corrections (rev)
Figure 63. Polar histogram of all responses correct within one button for sounds
with the AAMRL HRTF (solid) and RBF HRTF (dash).
Next, taking into account reversal corrections, 62.71% and 70.42% of the
AAMRL HRTF and RBF HRTF responses respectively were correct within one button
(±23°). Figure 64 is a polar histogram of all responses with reversal corrections
that are correct within one button. This figure shows that there are more correct
responses for both the AAMRL HRTF and the RBF HRTF, and that they are more evenly
distributed front and back. The percentage of reversals with the AAMRL HRTF and
the RBF HRTF was 39.17% and 33.96% respectively. Begault and Wenzel reported
the mean number of reversals over all the locations was 29% (3). The results of
experiment one showed the AAMRL HRTF reversals to be 36.67%.
4.4.2 Summary of Experiment Two Results. Table 15 shows a summary of
the ANOVA hypotheses acceptances and rejections. An "accept" means the means
are statistically equal, while a "reject" means the difference between the means is
statistically large. Without considering reversals, we can conclude that the RBF
HRTF provides a statistical advantage in localization accuracy over the AAMRL
HRTF from which it was derived. However, with reversal corrections included, a
statistical advantage is not indicated and the means of the two filters are statistically
equal. There is also a statistically significant interaction between the location
of the sound and the filter. As discussed in experiment one, this may mean that
at certain angles of azimuth, the HRTF either provides no advantage at all or hinders
localization capabilities. The reversal calculations will inherently reduce the
error results. However, comparing the number of reversals does not indicate a large
difference between the AAMRL HRTF and the RBF HRTF.

Figure 64. Polar histogram of reversal corrected responses correct within one button
for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash).

Source of      No Reversals   Reversals
Variation      Hypothesis     Hypothesis
Filter (A)     reject         reject
               -              accept
Azimuth (B)    reject         reject
               reject         accept
Subject (C)    reject         accept
               accept
A * B          accept         reject
A * C          accept         accept
B * C          reject         accept

Table 15. Summary of Experiment Two
V. Conclusions and Recommendations
5.1 Summary
This thesis determined whether an artificial neural network (ANN) can approx-
imate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head re-
lated transfer functions (HRTF) data. In order to test this hypothesis, two separate
tests were performed. The first test determined whether HRTF lends any support in
sound localization when compared to no HRTF (Interaural Time Delay only). The
second test determined whether HRTF and ANN lend the same amount of support
in sound localization when compared to each other.
The following research objectives were met:
• Modified AFIT algorithms that have been used successfully on placing sounds
in azimuth. The algorithms that create 3D sound with the AAMRL HRTF
were changed to use the ANN HRTF instead. This allowed the creation of
tests to compare the ANN HRTF with the AAMRL HRTF.
• Statistically checked for an advantage of HRTF and ITD versus ITD only.
• Statistically checked for a difference between AAMRL HRTF data and ANN
HRTF data.
• Investigated the program LNKmap as a neural network tool for function
approximation.
5.2 Conclusions
The final conclusion is that the AAMRL HRTFs can be approximated by
an artificial neural network for azimuth positions at zero degrees elevation. The
point estimates for the difference in the AAMRL and RBF HRTF's mean error were
only 5.62° and 2.28° without and with reversal corrections respectively. The point
estimates indicate that the RBF HRTF had slightly lower error than the AAMRL
HRTF. However the 99% confidence interval of the mean error shows no advantage
one way or the other.
To show that the HRTF was in fact contributing to sound localization, experi-
ment one was conducted. The results indicate an advantage in localization accuracy
with HRTF. There is a statistically significant interaction between the location of
the sound and whether the HRTF is used. This result may indicate that for the 0°
elevation ITD has a strong influence. Comparing the numbers of reversals clearly
indicates an advantage of using the HRTF over no HRTF. It is important to note that
the 3D sounds presented in the experiments were the direct paths only. Smith showed
an improvement in results when attenuated reflections were also included (25).
5.3 Recommendations for Future Research
The obvious next step is to expand the neural network training to the other
elevations of the AAMRL data. The AAMRL data has HRTF responses for elevation
±90°. Oldfield and Parker found that the acuity of sound localization varied with
azimuth and elevation (17). The ANN HRTF would have to be tested in these
non-zero degrees elevation regions.
The AAMRL HRTF 0° elevation data in this thesis was spaced approximately
23° apart. Future research could investigate the interpolation accuracy of the ANN
against the AAMRL HRTF data spaced 1° apart at 0° elevation.
Replacing the AAMRL HRTF samples by an artificial neural network approximator
may also provide insight into underlying functions within the HRTF. Since
subjects were able to localize with the smoothed ANN HRTF, simpler filter designs
may be implemented.
Implementing the ANN HRTF in hardware may provide an alternate approach
to real-time implementation other than the DIRAD and Convolvotron.
Appendix A. Experiment 1 Data: With and Without HRTF
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     148°    169°        10°     10°     191°
32°     32°     148°        32°     148°    169°
45°     123°    148°        45°     148°    169°
58°     58°     148°        58°     82°     148°
69°     69°     82°         69°     45°     98°
82°     58°     111°        82°     69°     135°
98°     82°     123°        98°     69°     111°
111°    82°     82°         111°    98°     123°
123°    135°    135°        123°    135°    148°
135°    69°     135°        135°    111°    169°
148°    148°    123°        148°    111°    135°
169°    148°    169°        169°    169°    169°
191°    191°    169°        191°    212°    191°
212°    249°    212°        212°    225°    212°
225°    262°    212°        225°    225°    225°
237°    262°    291°        237°    303°    225°
249°    278°    278°        249°    278°    315°
262°    237°    278°        262°    262°    262°
278°    303°    249°        278°    262°    303°
291°    315°    237°        291°    278°    249°
303°    291°    249°        303°    291°    291°
315°    291°    212°        315°    349°    212°
328°    278°    249°        328°    225°    278°
349°    349°    191°        349°    328°    191°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     32°     191°        10°     10°     191°
32°     45°     135°        32°     45°     169°
45°     82°     148°        45°     32°     135°
58°     45°     148°        58°     45°     135°
69°     69°     135°        69°     58°     135°
82°     45°     82°         82°     45°     135°
98°     45°     98°         98°     69°     98°
111°    82°     98°         111°    69°     111°
123°    82°     148°        123°    82°     148°
135°    82°     123°        135°    82°     148°
148°    135°    169°        148°    148°    148°
169°    169°    169°        169°    191°    169°
191°    349°    191°        191°    212°    191°
212°    212°    212°        212°    278°    212°
225°    225°    262°        225°    278°    225°
237°    278°    249°        237°    278°    249°
249°    278°    291°        249°    303°    278°
262°    315°    262°        262°    278°    291°
278°    328°    278°        278°    291°    278°
291°    315°    291°        291°    303°    225°
303°    315°    262°        303°    315°    249°
315°    328°    225°        315°    315°    225°
328°    303°    262°        328°    303°    237°
349°    349°    0°          349°    349°    191°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     82°     148°        10°     349°    169°
32°     82°     148°        32°     123°    315°
45°     10°     123°        45°     69°     148°
58°     32°     98°         58°     58°     58°
69°     69°     111°        69°     45°     58°
82°     278°    69°         82°     82°     135°
98°     45°     69°         98°     291°    69°
111°    98°     82°         111°    111°    123°
123°    0°      123°        123°    98°     148°
135°    111°    123°        135°    111°    69°
148°    82°     0°          148°    32°     148°
169°    111°    148°        169°    169°    32°
191°    148°    169°        191°    191°    349°
212°    237°    278°        212°    249°    212°
225°    0°      303°        225°    303°    212°
237°    315°    291°        237°    278°    191°
249°    303°    303°        249°    278°    278°
262°    262°    249°        262°    315°    237°
278°    291°    225°        278°    315°    212°
291°    278°    328°        291°    303°    291°
303°    278°    212°        303°    0°      225°
315°    291°    212°        315°    328°    191°
328°    328°    212°        328°    291°    148°
349°    191°    169°        349°    328°    10°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     32°     45°         10°     32°     169°
32°     32°     123°        32°     82°     148°
45°     69°     111°        45°     69°     123°
58°     82°     98°         58°     58°     169°
69°     58°     82°         69°     58°     123°
82°     32°     98°         82°     69°     123°
98°     58°     82°         98°     82°     123°
111°    45°     69°         111°    98°     82°
123°    82°     82°         123°    58°     111°
135°    98°     98°         135°    98°     148°
148°    45°     111°        148°    135°    98°
169°    32°     169°        169°    148°    191°
191°    191°    191°        191°    58°     191°
212°    291°    303°        212°    237°    191°
225°    291°    328°        225°    262°    237°
237°    278°    315°        237°    249°    303°
249°    303°    249°        249°    262°    291°
262°    278°    278°        262°    291°    278°
278°    303°    278°        278°    278°    278°
291°    315°    278°        291°    291°    249°
303°    315°    291°        303°    315°    291°
315°    291°    303°        315°    237°    237°
328°    262°    249°        328°    278°    262°
349°    303°    212°        349°    191°    212°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     32°     191°        10°     32°     169°
32°     123°    169°        32°     82°     98°
45°     32°     135°        45°     58°     98°
58°     45°     148°        58°     191°    98°
69°     45°     111°        69°     58°     98°
82°     58°     135°        82°     69°     123°
98°     69°     82°         98°     58°     98°
111°    82°     123°        111°    123°    69°
123°    82°     135°        123°    58°     98°
135°    135°    148°        135°    98°     98°
148°    135°    169°        148°    58°     98°
169°    10°     169°        169°    111°    123°
191°    212°    191°        191°    191°    349°
212°    237°    225°        212°    237°    225°
225°    237°    225°        225°    262°    0°
237°    303°    237°        237°    262°    237°
249°    278°    262°        249°    303°    249°
262°    278°    249°        262°    291°    249°
278°    291°    278°        278°    303°    262°
291°    291°    225°        291°    237°    315°
303°    303°    262°        303°    303°    262°
315°    237°    262°        315°    278°    291°
328°    303°    225°        328°    349°    225°
349°    349°    191°        349°    249°    212°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     328°    169°        10°     58°     262°
32°     69°     148°        32°     82°     111°
45°     123°    169°        45°     69°     111°
58°     82°     123°        58°     69°     111°
69°     69°     98°         69°     98°     98°
82°     82°     135°        82°     98°     82°
98°     98°     111°        98°     82°     98°
111°    69°     45°         111°    98°     98°
123°    82°     148°        123°    69°     98°
135°    58°     123°        135°    69°     98°
148°    58°     69°         148°    69°     0°
169°    169°    191°        169°    349°    349°
191°    278°    191°        191°    291°    249°
212°    278°    212°        212°    278°    278°
225°    291°    191°        225°    262°    291°
237°    278°    349°        237°    249°    278°
249°    291°    303°        249°    278°    278°
262°    328°    315°        262°    278°    262°
278°    303°    262°        278°    278°    262°
291°    303°    237°        291°    291°    278°
303°    291°    262°        303°    262°    262°
315°    303°    191°        315°    278°    278°
328°    303°    249°        328°    278°    278°
349°    303°    237°        349°    262°    278°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     169°    169°        10°     135°    191°
32°     58°     191°        32°     69°     148°
45°     58°     58°         45°     45°     123°
58°     69°     58°         58°     45°     148°
69°     69°     111°        69°     69°     58°
82°     82°     58°         82°     82°     58°
98°     98°     98°         98°     58°     98°
111°    58°     69°         111°    111°    69°
123°    82°     45°         123°    82°     58°
135°    58°     58°         135°    69°     169°
148°    111°    32°         148°    111°    111°
169°    32°     10°         169°    148°    191°
191°    32°     10°         191°    249°    191°
212°    328°    349°        212°    237°    191°
225°    303°    303°        225°    278°    249°
237°    278°    262°        237°    291°    249°
249°    303°    291°        249°    291°    291°
262°    291°    249°        262°    278°    291°
278°    278°    291°        278°    291°    291°
291°    291°    262°        291°    278°    262°
303°    291°    249°        303°    303°    278°
315°    278°    303°        315°    303°    225°
328°    303°    237°        328°    291°    278°
349°    191°    328°        349°    303°    212°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     10°     169°        10°     58°     169°
32°     58°     169°        32°     98°     135°
45°     45°     32°         45°     123°    58°
58°     45°     58°         58°     135°    135°
69°     69°     45°         69°     69°     69°
82°     82°     58°         82°     69°     148°
98°     82°     0°          98°     98°     98°
111°    111°    69°         111°    45°     135°
123°    98°     45°         123°    98°     58°
135°    98°     169°        135°    82°     32°
148°    135°    111°        148°    0°      69°
169°    169°    169°        169°    148°    148°
191°    212°    191°        191°    303°    249°
212°    237°    191°        212°    262°    237°
225°    278°    328°        225°    291°    237°
237°    303°    249°        237°    249°    249°
249°    262°    291°        249°    278°    262°
262°    278°    291°        262°    212°    225°
278°    291°    291°        278°    262°    237°
291°    303°    291°        291°    291°    225°
303°    315°    249°        303°    225°    237°
315°    315°    212°        315°    315°    303°
328°    328°    237°        328°    262°    237°
349°    349°    349°        349°    291°    237°
        With    Without             With    Without
Actual  HRTF    HRTF        Actual  HRTF    HRTF
10°     111°    191°        10°     10°     212°
32°     69°     123°        32°     111°    191°
45°     123°    111°        45°     45°     169°
58°     111°    111°        58°     45°     32°
69°     82°     98°         69°     82°     148°
82°     69°     82°         82°     135°    169°
98°     82°     82°         98°     98°     278°
111°    82°     69°         111°    148°    10°
123°    82°     98°         123°    111°    169°
135°    82°     148°        135°    135°    10°
148°    123°    123°        148°    58°     262°
169°    148°    135°        169°    10°     10°
191°    262°    191°        191°    191°    349°
212°    237°    237°        212°    237°    225°
225°    249°    278°        225°    212°    212°
237°    249°    249°        237°    225°    291°
249°    262°    278°        249°    249°    237°
262°    262°    278°        262°    303°    315°
278°    262°    278°        278°    278°    249°
291°    262°    249°        291°    225°    315°
303°    303°    249°        303°    278°    249°
315°    291°    249°        315°    278°    212°
328°    262°    225°        328°    278°    249°
349°    291°    169°        349°    191°    349°
With Without With Without
Actual HRTF HRTF Actual HRTF HRTF
100 1480 1690 100 3280 3280
320 580 1480 320 580 820
450 820 1350 450 820 690
580 980 1110 580 820 580
690 580 1350 690 820 820
820 580 1110 820 820 690
980 690 820 980 690 980
1110 690 1350 1110 980 580
1230 690 1230 1230 690 820
1350 820 1480 1350 820 320
1480 1350 1110 1480 1110 1230
1690 1690 1480 1690 580 3490
1910 2120 1910 1910 2910 3030
2120 2780 1910 2120 2780 3030
2250 2780 2370 2250 2780 2780
2370 2250 2370 2370 2780 2910
2490 2780 2370 2490 2780 2620
2620 3030 2250 2620 2780 2490
2780 2910 2490 2780 2780 2490
2910 2910 2780 2910 2780 2780
3030 2910 2370 3030 2780 2490
3150 2620 2620 3150 2780 2780
3280 2490 2120 3280 2910 2780
3490 1910 1910 3490 3030 2780
Figure 65. All subjects' responses for actual locations 10°, 32°, 45°, 58°, 69° and 82°, shown left to right, top to bottom respectively. Notation: x and solid line indicate with HRTF; o and dashed line indicate without HRTF. The arcs show the standard deviation of the samples. The * and + indicate the mean with and without HRTF respectively.
Figure 66. All subjects' responses for actual locations 98°, 111°, 123°, 135°, 148° and 169°, shown left to right, top to bottom respectively. Notation is same as above.
Figure 67. All subjects' responses for actual locations 191°, 212°, 225°, 237°, 249° and 262°, shown left to right, top to bottom respectively. Notation is same as above.
Figure 68. All subjects' responses for actual locations 278°, 291°, 303°, 315°, 328° and 349°, shown left to right, top to bottom respectively. Notation is same as above.
Appendix B. Experiment 2 Data: AAMRL HRTF and RBF HRTF
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 1110 1480 100 100 100
320 690 580 320 820 580
450 690 820 450 690 1110
580 450 580 580 690 820
690 580 450 690 820 690
820 580 980 820 1110 820
980 450 980 980 980 820
1110 820 690 1110 1110 1110
1230 980 980 1230 980 1230
1350 1110 1230 1350 690 1350
1480 1480 1350 1480 100 980
1690 1690 1690 1690 320 100
1910 1910 2120 1910 2370 2120
2120 2620 2490 2120 2490 2120
2250 3030 2780 2250 2620 2490
2370 2620 2620 2370 2490 2490
2490 2910 2910 2490 2370 2620
2620 3030 3150 2620 2780 2620
2780 2910 3030 2780 2620 2620
2910 2910 2910 2910 2620 2780
3030 2620 2620 3030 2780 2620
3150 2490 2490 3150 2490 2490
3280 3280 3280 3280 2120 2620
3490 2250 1910 3490 2370 3030
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 100 1910 100 100 3490
320 690 320 320 450 320
450 1480 1230 450 320 1110
580 1350 980 580 320 580
690 580 580 690 690 580
820 690 1110 820 690 450
980 450 690 980 690 820
1110 820 580 1110 1110 450
1230 820 1230 1230 1350 1230
1350 820 1480 1350 820 1480
1480 1480 1480 1480 1230 1480
1690 1910 1690 1690 3490 1690
1910 2780 2120 1910 2250 1910
2120 2120 2120 2120 1910 2250
2250 2780 2120 2250 2780 2370
2370 2490 2490 2370 2910 2620
2490 2490 2620 2490 2910 2620
2620 2910 3030 2620 2780 2780
2780 3030 3030 2780 2780 3030
2910 3030 2620 2910 2910 3030
3030 3150 2620 3030 3150 3030
3150 2910 2910 3150 3280 3030
3280 2250 3030 3280 2910 3150
3490 1910 2120 3490 3280 3490
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 1690 320 100 1690 1910
320 1230 100 320 1110 320
450 1690 580 450 1110 320
580 580 980 580 450 450
690 980 690 690 450 690
820 820 690 820 690 820
980 1110 820 980 580 580
1110 980 980 1110 980 820
1230 980 690 1230 820 1110
1350 1350 1480 1350 1230 1690
1480 1350 1480 1480 1350 1350
1690 1690 1690 1690 1690 1690
1910 3030 2120 1910 2120 1910
2120 2120 2120 2120 2120 2120
2250 2250 2910 2250 2490 2490
2370 2370 2370 2370 2910 2620
2490 2780 2780 2490 2490 2780
2620 2780 2910 2620 2780 2780
2780 2780 2910 2780 2910 2780
2910 3030 2780 2910 2780 3150
3030 2910 2490 3030 3150 3030
3150 2910 3280 3150 3150 3280
3280 2250 2370 3280 3280 3280
3490 1910 2120 3490 3490 3280
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 100 100 100 100 1910
320 1690 320 320 1480 450
450 450 690 450 1690 580
580 580 690 580 1480 320
690 580 820 690 320 580
820 820 690 820 820 820
980 690 690 980 820 450
1110 690 580 1110 820 820
1230 980 1110 1230 1690 1350
1350 690 1350 1350 320 320
1480 320 1480 1480 1110 820
1690 1690 1480 1690 1910 100
1910 2120 2120 1910 1910 3490
2120 2490 2620 2120 2620 2620
2250 2620 2490 2250 3030 2370
2370 2780 2620 2370 2780 2780
2490 2910 2780 2490 2250 2620
2620 2620 2910 2620 2370 2490
2780 3030 2780 2780 3150 2910
2910 2910 2780 2910 2780 2780
3030 2910 2780 3030 2780 2780
3150 2910 3030 3150 3280 3030
3280 2370 2780 3280 2910 2910
3490 2490 2490 3490 2490 3490
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 1690 3280 100 1480 1480
320 450 100 320 580 580
450 320 320 450 320 1110
580 450 1350 580 320 690
690 320 320 690 980 690
820 580 580 820 450 320
980 690 450 980 690 820
1110 580 820 1110 690 1480
1230 580 450 1230 980 1110
1350 450 1480 1350 1230 1480
1480 1110 690 1480 1230 1230
1690 100 1910 1690 1480 1690
1910 2490 1910 1910 2490 1690
2120 2490 2910 2120 2370 2120
2250 2370 3150 2250 2370 3280
2370 3030 2620 2370 2490 2120
2490 3150 2490 2490 2620 2910
2620 2490 2780 2620 2370 2120
2780 3030 2780 2780 2620 3030
2910 2620 3150 2910 3030 3030
3030 3030 2780 3030 2490 2620
3150 2910 3150 3150 3150 2620
3280 3030 2910 3280 2120 2780
3490 3150 2120 3490 3150 2120
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 580 1690 100 100 1690
320 1230 1350 320 450 320
450 580 980 450 1230 1480
580 580 450 580 820 1110
690 980 690 690 1230 690
820 580 690 820 1230 1110
980 450 1110 980 1110 1110
1110 1110 820 1110 980 1110
1230 980 580 1230 690 1350
1350 1480 1480 1350 1110 1350
1480 1230 1350 1480 1350 1480
1690 690 100 1690 1480 320
1910 1910 2120 1910 2490 2120
2120 2250 1910 2120 2910 2370
2250 2910 2490 2250 2910 3030
2370 2370 2620 2370 3150 3150
2490 2620 2910 2490 2370 3150
2620 2910 2780 2620 3150 3150
2780 2250 2370 2780 3150 3030
2910 2370 3030 2910 3030 3150
3030 3150 3030 3030 3150 3150
3150 3030 2490 3150 3280 3030
3280 2780 3030 3280 2120 3280
3490 3280 3280 3490 2910 3280
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 3490 100 100 1690 3280
320 1480 1480 320 1480 100
450 1230 580 450 1350 580
580 1110 820 580 1110 1230
690 580 690 690 690 580
820 690 1110 820 690 580
980 1230 1230 980 580 1110
1110 690 1230 1110 580 820
1230 690 1230 1230 1110 820
1350 1350 1350 1350 1230 580
1480 1480 1480 1480 100 580
1690 1690 1480 1690 2120 1910
1910 2250 1910 1910 2490 2120
2120 2370 2120 2120 2370 3030
2250 3150 2370 2250 2490 2780
2370 3150 2490 2370 2910 2620
2490 3030 2370 2490 2370 2780
2620 3030 2910 2620 3030 2910
2780 3280 3030 2780 2780 2780
2910 3030 3150 2910 2910 2780
3030 2370 3150 3030 3030 3150
3150 3030 3150 3150 2910 3030
3280 2250 3280 3280 2490 3150
3490 2120 3150 3490 3030 2120
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 450 2370 100 1690 1480
320 320 580 320 1110 1230
450 580 450 450 980 820
580 1110 580 580 820 690
690 1110 690 690 820 820
820 980 690 820 580 580
980 820 580 980 1230 820
1110 980 1110 1110 690 690
1230 820 980 1230 820 980
1350 1110 580 1350 980 1230
1480 580 450 1480 1350 1110
1690 1910 1690 1690 1690 1690
1910 2120 2120 1910 1910 2120
2120 2370 2370 2120 2370 2250
2250 2370 2490 2250 2370 2370
2370 2370 2490 2370 2780 2370
2490 2490 2490 2490 2620 2780
2620 2910 2910 2620 2780 2620
2780 2910 2620 2780 2780 2780
2910 2780 3030 2910 2370 2910
3030 2490 2490 3030 00 2910
3150 3030 2780 3150 2120 2620
3280 3030 2370 3280 2250 2370
3490 2250 2370 3490 3280 1910
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 1350 1690 100 1690 1480
320 1110 980 320 1350 580
450 1230 580 450 980 1230
580 820 1230 580 820 690
690 580 820 690 450 580
820 820 820 820 450 820
980 690 820 980 690 580
1110 690 1110 1110 690 580
1230 820 980 1230 1110 980
1350 580 1480 1350 1110 1350
1480 1350 1480 1480 1350 1480
1690 1480 1480 1690 1480 1690
1910 2490 1910 1910 2120 1910
2120 2120 2250 2120 2250 2250
2250 2910 2620 2250 2780 2620
2370 2780 2370 2370 2780 2910
2490 2620 2490 2490 2620 2620
2620 2780 2780 2620 3150 3030
2780 3030 2620 2780 3150 3030
2910 2910 2490 2910 3030 2910
3030 2910 3030 3030 3030 2780
3150 2620 3030 3150 2620 2620
3280 2910 3150 3280 3030 2620
3490 3150 2120 3490 2910 1910
AAMRL RBF AAMRL RBF
Actual HRTF HRTF Actual HRTF HRTF
100 100 100 100 1480 100
320 450 320 320 580 820
450 320 320 450 690 980
580 690 320 580 690 580
690 450 320 690 690 980
820 450 820 820 820 1110
980 580 450 980 980 580
1110 450 450 1110 1110 820
1230 690 450 1230 690 1230
1350 580 320 1350 1110 690
1480 320 320 1480 690 1110
1690 100 100 1690 1230 1230
1910 100 100 1910 100 2120
2120 3150 3280 2120 2370 2250
2250 3150 2910 2250 2490 2370
2370 3150 3150 2370 2780 2490
2490 2780 3030 2490 2490 2780
2620 2910 3150 2620 2910 2780
2780 3030 3030 2780 2780 2620
2910 2780 3030 2910 2490 2910
3030 3150 2780 3030 2620 3030
3150 3150 3030 3150 2490 2620
3280 3280 3280 3280 2370 3030
3490 3280 3490 3490 2370 2370
Figure 69. All subjects' responses for actual locations 10°, 32°, 45°, 58°, 69° and 82°, shown left to right, top to bottom respectively. Notation: x and solid line indicate AAMRL HRTF; o and dashed line indicate RBF HRTF. The arcs show the standard deviation of the samples. The * and + indicate the mean for the AAMRL and RBF HRTF respectively.
Figure 70. All subjects' responses for actual locations 98°, 111°, 123°, 135°, 148° and 169°, shown left to right, top to bottom respectively. Notation is same as above.
Figure 71. All subjects' responses for actual locations 191°, 212°, 225°, 237°, 249° and 262°, shown left to right, top to bottom respectively. Notation is same as above.
Figure 72. All subjects' responses for actual locations 278°, 291°, 303°, 315°, 328° and 349°, shown left to right, top to bottom respectively. Notation is same as above.
Appendix C. Matlab M-files
A sampling of the graphic commands in Matlab is listed in Table 16.

Line and area fill commands.
    plot3       Plot lines and points in 3-D.
    fill3       Draw filled 3-D polygons in 3-D.
    comet3      3-D comet-like trajectories.
Contour and other 2-D plots of 3-D data.
    contour     Contour plot.
    contour3    3-D contour plot.
    clabel      Contour plot elevation labels.
    contourc    Contour plot computation.
    pcolor      Pseudocolor (checkerboard) plot.
    quiver      Quiver plot.
Surface and mesh plots.
    mesh        3-D mesh surface.
    meshc       Combination mesh/contour plot.
    meshz       3-D mesh with zero plane.
    surf        3-D shaded surface.
    surfc       Combination surf/contour plot.
    surfl       3-D shaded surface with lighting.
    waterfall   Waterfall plot.
3-D objects.
    cylinder    Generate cylinder.
    sphere      Generate sphere.
Volume visualization.
    slice       Volumetric visualization plots.
Graph appearance.
    view        3-D graph viewpoint specification.
    viewmtx     View transformation matrices.
    hidden      Mesh hidden line removal mode.
    shading     Color shading mode.
    axis        Axis scaling and appearance.
    caxis       Pseudocolor axis scaling.
    colormap    Color look-up table.
Graph annotation.
    text        Text annotation.
    title       Graph title.
    xlabel      X-axis label.
    ylabel      Y-axis label.
    zlabel      Z-axis label for 3-D plots.

Table 16. Three dimensional graphic commands. Copyright (c) 1984-93 by The MathWorks, Inc.
C.1 angleresponse.m
function [meanerr, stdall]=angleresponse(allsubjects);
% ANGLERESPONSE Determines the mean error and standard deviation from file
% allsubjects for each angle of azimuth in experiments one and two.
% Written by Capt John K. Millhouse

[m,n]=size(allsubjects);
% correct for bad responses
for i=1:m,
  for j=1:n,
    if allsubjects(i,j)==0,
      allsubjects(i,j)=allsubjects(i,1);
    end
    % differences of +180 or more are wrapped by -360;
    % differences of -180 or less are not wrapped by this branch
    if abs(allsubjects(i,j)-allsubjects(i,1))<180,
      errall(i,j)=allsubjects(i,j)-allsubjects(i,1);
    else
      errall(i,j)=allsubjects(i,j)-allsubjects(i,1)-360;
    end
  end
end
az=errall(1:24,1);
% calculate mean and std and plot results
for i=1:24,
  meanerr(i,1:3)=mean(errall(i:24:480,1:3));
  stdall(i,1:3)=(std(errall(i:24:480,1:3)));
  theta1=zeros(40,1);
  theta1(1:2:40)=allsubjects(i:24:480,2)*pi/180;
  rho1=zeros(40,1);
  rho1(1:2:40)=ones(20,1);
  theta2=zeros(40,1);
  theta2(1:2:40)=allsubjects(i:24:480,3)*pi/180;
  rho2=zeros(40,1);
  rho2(1:2:40)=.95*ones(20,1);
  mypolar(theta1,rho1,'g-')
  hold on
  mypolar(theta2,rho2,'r:')
  mypolar(allsubjects(i:24:480,2)*pi/180,ones(20,1),'gx')
  mypolar(allsubjects(i:24:480,3)*pi/180,.95*ones(20,1),'ro')
  mypolar([0 (allsubjects(i,1)+meanerr(i,2))*pi/180],...
          [0 1.05],'g*')
  mypolar([0 (allsubjects(i,1)+meanerr(i,3))*pi/180],...
          [0 1.05],'r*')
  hold off
  title(['azimuth ',...
         num2str(allsubjects(i,1)),', ',...
         num2str(meanerr(i,2)),', ',...
         num2str(meanerr(i,3))])
  pause
end
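
The listing above wraps only differences of +180° or more. As an illustration, and not as the method used for the results in this thesis, a symmetric wrap into [-180, 180) could be sketched as follows. The vectors actual and response are made-up sample values in degrees.

```matlab
% Sketch only: wrap signed azimuth errors into [-180, 180).
% 'actual' and 'response' are hypothetical sample values, in degrees.
actual   = [10 349 191 10];
response = [349 10 169 32];
rawerr   = response - actual;              % unwrapped difference
werr     = mod(rawerr + 180, 360) - 180;   % wrapped difference
disp([actual; response; werr]);
```

With this wrap a response of 349° to a source at 10° counts as an error of -21° rather than +339°.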
C.2 anova.m
% get data %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
allsubjects=allsubjects2;
trial=1;
index=0;
actual=allsubjects(:,1);
with=allsubjects(:,2);
without=allsubjects(:,3);
% correct for bad responses....make mean of other subjects
%without(107)=122.78;
%without(275)=122.78;
la=length(actual);
for i=1:la
  index=index+1;
  if index==25, index=1; end
  if with(i)==0, with(i)=sum(with(index:24:480))/19; end
  if without(i)==0, without(i)=sum(without(index:24:480))/19; end
end
% mirror each response about the interaural axis (front-back reversal)
for i=1:la
  if with(i)<180,
    rwith(i)=180-with(i);
  else
    rwith(i)=540-with(i);
  end
  if without(i)<180,
    rwithout(i)=180-without(i);
  else
    rwithout(i)=540-without(i);
  end
end

witherr=zeros(480,1);
withouterr=zeros(480,1);
rwitherr=zeros(480,1);
rwithouterr=zeros(480,1);
for i=1:la,
  witherr(i)=actual(i)-with(i);
  withouterr(i)=actual(i)-without(i);
  rwitherr(i)=actual(i)-rwith(i);
  rwithouterr(i)=actual(i)-rwithout(i);
  if abs(witherr(i))>180,
    witherr(i)=abs(360 - abs(witherr(i)));
  else
    witherr(i)=abs(witherr(i));
  end
  if abs(withouterr(i))>180,
    withouterr(i)=abs(360 - abs(withouterr(i)));
  else
    withouterr(i)=abs(withouterr(i));
  end
  if abs(rwitherr(i))>180,
    rwitherr(i)=abs(360 - abs(rwitherr(i)));
  else
    rwitherr(i)=abs(rwitherr(i));
  end
  if abs(rwithouterr(i))>180,
    rwithouterr(i)=abs(360 - abs(rwithouterr(i)));
  else
    rwithouterr(i)=abs(rwithouterr(i));
  end
end
% correct for reversal
revcount=0;
for i=1:la,
  if witherr(i)>rwitherr(i),
    witherr(i)=rwitherr(i);
    revcount=revcount+1;
  end
  if withouterr(i)>rwithouterr(i),
    withouterr(i)=rwithouterr(i);
    revcount=revcount+1;
  end
end

% start anova %%%%%%%%%%%%%%%
X=[witherr withouterr];
a=2;
b=24;
c=20;
n=1;
% get totals %%%%%%%%%%%%%%%%
Ti=zeros(a,1);
Tj=zeros(b,1);
Tk=zeros(c,1);
Tij=zeros(a,b);
Tij2=zeros(a,b);
Tik=zeros(a,c);
Tjk=zeros(b,c);
for i=1:a
  for j=1:b
    for k=1:c
      Ti(i)=Ti(i)+getx(X,i,j,k);
      Tj(j)=Tj(j)+getx(X,i,j,k);
      Tk(k)=Tk(k)+getx(X,i,j,k);
      Tij(i,j)=Tij(i,j)+getx(X,i,j,k);
      Tik(i,k)=Tik(i,k)+getx(X,i,j,k);
      Tjk(j,k)=Tjk(j,k)+getx(X,i,j,k);
    end
  end
end
Tijk=X;
T=sum(sum(X));
% get sum of squares %%%%%%%%%%%%%%%%
SSa=0; SSb=0; SSc=0;
SSab=0; SSac=0; SSbc=0;
SSt=0; SSe=0;
for i=1:a
  % loop body missing in the source; restored by symmetry with SSb and SSc
  SSa=SSa+ Ti(i).^2/(b*c);
end
for j=1:b
  SSb=SSb+ Tj(j).^2/(a*c);
end
for k=1:c
  SSc=SSc+ Tk(k).^2/(a*b);
end
for i=1:a
  for j=1:b
    SSab=SSab+ Tij(i,j).^2/(c);
  end
end
for i=1:a
  for k=1:c
    SSac=SSac+ Tik(i,k).^2/(b);
  end
end
for j=1:b
  for k=1:c
    SSbc=SSbc+ Tjk(j,k).^2/(a);
  end
end
SSab=SSab -SSa -SSb + T.^2/(a*b*c);
SSac=SSac -SSa -SSc + T.^2/(a*b*c);
SSbc=SSbc -SSb -SSc + T.^2/(a*b*c);
SSa=SSa- T.^2/(a*b*c);
SSb=SSb- T.^2/(a*b*c);
SSc=SSc- T.^2/(a*b*c);
for i=1:a
  for j=1:b
    for k=1:c
      SSt=SSt+ getx(X,i,j,k).^2;
    end
  end
end
SSt=SSt- T.^2/(a*b*c);
SSe= SSt - SSa - SSb - SSc -SSab -SSac -SSbc;
% mean square
MSa=SSa/(a-1);
MSb=SSb/(b-1);
MSc=SSc/(c-1);
MSab=SSab/((a-1)*(b-1))
MSac=SSac/((a-1)*(c-1))
MSbc=SSbc/((b-1)*(c-1))
% error degrees of freedom: the source line is cut off after "(a*b*c-";
% the remainder is reconstructed as total d.o.f. minus the main-effect
% and two-way interaction d.o.f.
DFe=(a*b*c-1)-(a-1)-(b-1)-(c-1)-(a-1)*(b-1)-(a-1)*(c-1)-(b-1)*(c-1);
MSe=SSe/DFe;
% F value
Fa=MSa/MSe;
Fb=MSb/MSe;
Fc=MSc/MSe;
Fab=MSab/MSe;
Fac=MSac/MSe;
Fbc=MSbc/MSe;
% display
table=[1 a-1 SSa MSa Fa
       2 b-1 SSb MSb Fb
       3 c-1 SSc MSc Fc
       12 (a-1)*(b-1) SSab MSab Fab
       13 (a-1)*(c-1) SSac MSac Fac
       23 (b-1)*(c-1) SSbc MSbc Fbc
       0 DFe SSe MSe 0
       0 (a*b*c-1) SSt 0 0]
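
The helper getx called in anova.m is not listed in this appendix. A hypothetical stand-in is sketched below. It assumes that X is 480-by-2 with rows stacked as 20 subject blocks of 24 angles each, which matches the i:24:480 indexing used elsewhere in these files; the actual helper may differ.

```matlab
% Hypothetical stand-in for getx (the original is not listed in this appendix).
% Assumption: X is (b*c)-by-a, rows stacked as c subject blocks of b angles.
b = 24;  c = 20;                           % angles, subjects
getx = @(X,i,j,k) X((k-1)*b + j, i);       % treatment i, angle j, subject k
X  = reshape(1:(2*b*c), b*c, 2);           % dummy data: each entry is its linear index
v1 = getx(X, 1, 1, 1);                     % first row, first column
v2 = getx(X, 2, 24, 20);                   % last row, second column
disp([v1 v2]);
```

Under that layout the triple loop in anova.m visits every entry of X exactly once.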
C.3 drc.m
function [range,minvalue,maxvalue]=drc(matrix);
% Dynamic Range Compression (drc)
% return range between 0 and 1 of input matrix columns
% Written by Capt John K. Millhouse

[m n]=size(matrix);
temp=matrix-(ones(m,1)*min(matrix));
range=temp./(ones(m,1)*max(temp));
minvalue=min(matrix);
maxvalue=max(temp);
C.4 expmat1.m
% Setup for experiment one
% Creates a script file called 'run_experiment'
% Written by Capt John K. Millhouse

% codes below 1000 are azimuths for the first filter set;
% codes of 1000 and above are azimuth*100 for the second filter set
expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237 ...
       249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200 ...
       9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900 ...
       26200 27800 29100 30300 31500 32800 34900];
% randomly place sounds
row=randperm(48);
for i=1:48,
  snd(i)=expaz(row(i));
  temp=['0' int2str(snd(i))];
  % tsnd holds the azimuth as a three-character, zero-padded label
  if snd(i) < 100
    tsnd(i,1:3)=temp(1:3);
  elseif snd(i) < 1000
    tsnd(i,1:3)=temp(2:4);
  elseif snd(i) < 10000
    tsnd(i,1:3)=temp(1:3);
  else
    tsnd(i,1:3)=temp(2:4);
  end
end
fid = fopen('run_experiment','w');
fprintf(fid,'#!/bin/csh %s\n',' ');
fprintf(fid,'makecompass &%s\n',' ');
fprintf(fid,'introduction %s\n',' ');
fprintf(fid,'set ans = $< %s\n',' ');
for i=1:48,
  if snd(i)<1000
    tex1=['s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'];
    tex2=['s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d;' ...
          ' echo ''Aactual ' tsnd(i,:) ''' >> results.out %s\n'];
  else
    tex1=['s32cplay -f 40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d %s\n'];
    tex2=['s32cplay -f 40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d;' ...
          ' echo ''Bactual ' tsnd(i,:) ''' >> results.out %s\n'];
  end
  fprintf(fid,tex1,' ');
  fprintf(fid,'sleep 1 %s\n',' ');
  fprintf(fid,tex2,' ');
  fprintf(fid,'clear %s\n',' ');
  fprintf(fid,'echo ''Select location and press return'' %s\n',' ');
  fprintf(fid,'set ans = $< %s\n',' ');
end
fprintf(fid,'s32cplay -f 40000 thankyou.d %s\n',' ');
fclose(fid);
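
The stimulus codes in expaz carry both the azimuth and the filter set. As a compact illustration of the branch logic above, the decoding can be sketched as follows. The names isB, az and labels are introduced here only for the example, and sprintf is used in place of the original string slicing.

```matlab
% Sketch: decode a few of the stimulus codes stored in expaz.
codes = [10 349 1000 34900];               % sample entries from expaz
isB   = codes >= 1000;                     % second filter set is stored as azimuth*100
az    = codes;
az(isB) = codes(isB)/100;                  % recover the azimuth in degrees
labels = cell(1, numel(codes));
for n = 1:numel(codes)
  labels{n} = sprintf('%03d', az(n));      % three-character zero-padded label
end
disp(labels);
```

Each label equals the three characters that the if/elseif chain stores in the corresponding row of tsnd.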
C.5 expmat2.m
% Setup for experiment two
% Creates a script file called 'run_experiment'
% Written by Capt John K. Millhouse

expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237 ...
       249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200 ...
       9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900 ...
       26200 27800 29100 30300 31500 32800 34900];
% randomly place sounds
row=randperm(48);
for i=1:48,
  snd(i)=expaz(row(i));
  temp=['0' int2str(snd(i))];
  if snd(i) < 100
    tsnd(i,1:3)=temp(1:3);
  elseif snd(i) < 1000
    tsnd(i,1:3)=temp(2:4);
  elseif snd(i) < 10000
    tsnd(i,1:3)=temp(1:3);
  else
    tsnd(i,1:3)=temp(2:4);
  end
end
fid = fopen('run_experiment','w');
fprintf(fid,'#!/bin/csh %s\n',' ');
fprintf(fid,'makecompass &%s\n',' ');
fprintf(fid,'introduction %s\n',' ');
fprintf(fid,'set axis = $< %s\n',' ');
for i=1:48,
  if snd(i)<1000
    tex1=['echo ''Aactual ' tsnd(i,:) ''' >> results.out;' ...
          ' s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'];
    tex2=['s32cplay -f 40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'];
  else
    tex1=['echo ''Bactual ' tsnd(i,:) ''' >> results.out;' ...
          ' s32cplay -f 40000 ../netfilters/rl' int2str(snd(i)/100) '.d %s\n'];
    tex2=['s32cplay -f 40000 ../netfilters/rl' int2str(snd(i)/100) '.d %s\n'];
  end
  fprintf(fid,tex1,' ');
  fprintf(fid,'sleep 1 %s\n',' ');
  fprintf(fid,tex2,' ');
  fprintf(fid,'clear %s\n',' ');
  fprintf(fid,'echo ''Select location and press return'' %s\n',' ');
  fprintf(fid,'set axis = $< %s\n',' ');
end
fprintf(fid,'s32cplay -f 40000 thankyou.d %s\n',' ');
fclose(fid);
C.6 idrc.m
function [matrix]=idrc(range,minvalue,maxvalue);
% inverse Dynamic Range Compression (idrc)
% return matrix to original values
% Written by Capt John K. Millhouse

[m n]=size(range);
temp=range.*(ones(m,1)*maxvalue);
matrix=temp+(ones(m,1)*minvalue);
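
The scaling in drc.m and its inverse in idrc.m can be checked with a small round trip. The sketch below repeats the same arithmetic inline, so it runs without either file on the path. The matrix M is a made-up example.

```matlab
% Sketch: column-wise min-max scaling and its inverse, written inline.
M = [1 10; 3 20; 5 40];                    % hypothetical 3-by-2 input
m = size(M,1);
minvalue = min(M);                         % per-column minimum
temp     = M - ones(m,1)*minvalue;         % shift each column to start at 0
maxvalue = max(temp);                      % per-column span
range    = temp ./ (ones(m,1)*maxvalue);   % each column scaled to [0,1]
back     = range .* (ones(m,1)*maxvalue) + ones(m,1)*minvalue;
disp(range);
disp(back);
```

Note that a constant column gives a span of zero, so the division would produce NaN for that column.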
C.7 LRsphere.m
% Create left and right spheres of HRTF data
% Written by Capt John K. Millhouse
k=1;
for i=1:2:90,
  freq=i;
  [x y z c]=makesphere(features,freq,16);
  subplot(1,2,1);
  surf(x,y,z,c);
  caxis([0, 0.7915]);
  view(180,0);
  shading interp,
  axis off
  title(num2str(features(i,4)))
  text(.1,1,0,'(&) <-- Right ear')
  xlabel('right side of head');
  subplot(1,2,2);
  surf(x,y,z,c);
  caxis([0, 0.7915]);
  view(0,0);
  shading interp
  axis off
  text(-.1,-1,0,'(&) <-- Left ear')
  text(-1.8,0,0,'<-- Nose -->')
  xlabel('left side of head');
  title('Hz ')
  colormap(jet)
  k=k+1;
end
C.8 makesphere.m
function [x,y,z,c] = makesphere(features,freq,size)
% freq range 1 to 93
% Create sphere with color corresponding to HRTF value
% Written by Capt John K. Millhouse
k=0;
clear az elev resp
for i=freq:93:25296,
  k=k+1;
  az(k)=features(i,2);
  elev(k)=features(i,3);
  resp(k)=features(i,5);
end
c=mapsurf(az,elev,resp,size);
[x y z]=sphere(size);
C.9 mapsurf.m
function [c] = mapsurf(az,elev,value,spheresize)
% maps the HRTF value to the az and elev
% locations on a sphere
% Written by Capt John K. Millhouse

n=length(az);
I=0;
c=zeros(spheresize+1,spheresize+1);
v=c;
% accumulate values and counts per grid cell
for I=1:n,
  a=fix(az(I)/360*(spheresize+1))+1;
  e=spheresize+1 - fix((elev(I)+90)/180*(spheresize+1));
  if e>spheresize+1
    e=spheresize+1;
  end
  if e<1
    e=1;
  end
  if a>spheresize+1
    a=spheresize+1;
  end
  if a<1
    a=1;
  end
  v(e,a)=v(e,a)+1;
  c(e,a)=c(e,a)+value(I);
end
% average the cells that received at least one sample
for I=1:spheresize+1,
  for J=1:spheresize+1,
    if v(I,J)==0
    else
      c(I,J)=c(I,J)/v(I,J);
    end
  end
end
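
The cell-index arithmetic in mapsurf.m can be checked on a few directions. The sketch below uses the same formulas in vectorised form for a sphere size of 16, as called from LRsphere.m. The sample azimuths and elevations are made up.

```matlab
% Sketch of the cell-index arithmetic in mapsurf for spheresize = 16.
spheresize = 16;
az   = [0 90 359];                         % hypothetical azimuths, degrees
elev = [-90 0 90];                         % hypothetical elevations, degrees
a = fix(az/360*(spheresize+1)) + 1;                      % column index
e = spheresize+1 - fix((elev+90)/180*(spheresize+1));    % row index
a = min(max(a,1), spheresize+1);           % clamp to the grid, as the if-blocks do
e = min(max(e,1), spheresize+1);
disp([a; e]);
```

An elevation of +90° would give row 0 before clamping, which is why the lower clamp is needed.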
C.10 mapeval.m
function [actual, netout, error]=mapeval(x, y, data);
% Written by Capt John K. Millhouse
% Converts LNKmap output to Matlab matrix suitable for
% display
% data in form [number actual netout error]

nx=length(x);
ny=length(y);
actual=zeros(nx,ny);
netout=zeros(nx,ny);
error=zeros(nx,ny);
k=0;
for i=1:nx,
  for j=1:ny,
    k=k+1;
    actual(i,j)=data(k,2);
    netout(i,j)=data(k,3);
    error(i,j)=data(k,4);
  end
end
actual=actual';
netout=netout';
error=error';
C.11 mypolar.m
function hpol = mypolar(theta,rho,line_style)
%POLAR Polar coordinate plot.
% POLAR(THETA, RHO) makes a plot using polar coordinates of
% the angle THETA, in radians, versus the radius RHO.
% POLAR(THETA,RHO,S) uses the linestyle specified in string S.
% See PLOT for a description of legal linestyles.
% See also PLOT, LOGLOG, SEMILOGX, SEMILOGY.
% Copyright (c) 1984-93 by The MathWorks, Inc.
% Modified by Capt John K. Millhouse to display
% head figure and test locations.

if nargin < 1
  error('Requires 2 or 3 input arguments.')
elseif nargin == 2
  if isstr(rho)
    line_style = rho;
    rho = theta;
    [mr,nr] = size(rho);
    if mr == 1
      theta = 1:nr;
    else
      th = (1:mr)';
      theta = th(:,ones(1,nr));
    end
  else
    line_style = 'auto';
  end
elseif nargin == 1
  line_style = 'auto';
  rho = theta;
  [mr,nr] = size(rho);
  if mr == 1
    theta = 1:nr;
  else
    th = (1:mr)';
    theta = th(:,ones(1,nr));
  end
end
if isstr(theta) | isstr(rho)
  error('Input arguments must be numeric.');
end
if any(size(theta) ~= size(rho))
  error('THETA and RHO must be the same size.');
end

% get hold state
cax = newplot;
next = lower(get(cax,'NextPlot'));
hold_state = ishold;

% get x-axis text color so grid is in same color
tc = get(cax,'xcolor');

% only do grids if hold is off
if ~hold_state

  % make a radial grid
  hold on;
  hhh=plot([0 max(theta(:))],[0 max(abs(rho(:)))]);
  v = [get(cax,'xlim') get(cax,'ylim')];
  ticks = length(get(cax,'ytick'));
  delete(hhh);

  % check radial limits and ticks
  rmin = 0; rmax = v(4); rticks = ticks-1;
  if rticks > 5 % see if we can reduce the number
    if rem(rticks,2) == 0
      rticks = rticks/2;
    elseif rem(rticks,3) == 0
      rticks = rticks/3;
    end
  end

  % define a circle
  th = 0:pi/50:2*pi;
  xunit = cos(th);
  yunit = sin(th);
  % now really force points on x/y axes to lie on them exactly
  inds = [1:(length(th)-1)/4:length(th)];
  xunits(inds(2:2:4)) = zeros(2,1);
  yunits(inds(1:2:5)) = zeros(3,1);

  % outline of nose and ears for the head figure
  ynose=[.24; .3; .24]*rmax;
  xnose=[.05; 0; -.05]*rmax;
  year=[.05; .05; -.05; -.05]*rmax;
  xear=[.24; .3; .3; .24]*rmax;

  rinc = (rmax-rmin)/rticks;
  i=rmax;
  plot(xunit*i,yunit*i,':','color',tc,'linewidth',1);
  i=rmax/4;
  plot(xunit*i,yunit*i,'-','color',tc,'linewidth',1);
  plot(xnose,ynose,'-','color',tc,'linewidth',1);
  plot(xear,year,'-','color',tc,'linewidth',1);
  plot(-xear,year,'-','color',tc,'linewidth',1);

  % plot spokes
  th = [10 32 45 58 69 82 98 111 123 135 148 169 191 ...
        212 225 237 249 262 278 291 303 315 328 349]*pi/180;
  cst = -sin(th);
  snt = cos(th);
  cs = [zeros(1,24); cst];
  sn = [zeros(1,24); snt];
  %plot(rmax*cs,rmax*sn,':','color',tc,'linewidth',1);

  % annotate spokes in degrees
  rt = 1.1*rmax;
  for i = 1:max(size(th))
    text(rt*cst(i),rt*snt(i),int2str(th(i)*180/pi),'horizontalalignment','center')
  end

  % set view to 2-D
  view(0,90);

  % set axis limits
  axis(rmax*[-1 1 -1.1 1.1]);
end

% transform data to Cartesian coordinates.
xx = rho.*(-sin(theta));
yy = rho.*cos(theta);

% plot data on top of grid
if strcmp(line_style,'auto')
  q = plot(xx,yy);
else
  q = plot(xx,yy,line_style);
end
if nargout > 0
  hpol = q;
end
if ~hold_state
  axis('equal');axis('off');
end

% reset hold state
if ~hold_state, set(cax,'NextPlot',next); end
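
The transform at the end of mypolar.m differs from the standard polar plot. A short numerical check of where the cardinal azimuths land, using the same two lines and unit radius, can be sketched as:

```matlab
% Sketch: the coordinate convention used by mypolar (unit radius).
thetadeg = [0 90 180 270];                 % azimuth in degrees
theta = thetadeg*pi/180;
rho = ones(size(theta));
xx = rho.*(-sin(theta));                   % same transform as in mypolar
yy = rho.*cos(theta);
disp(round([xx; yy]));
```

An azimuth of 0° therefore lies at the top of the plot, where the nose is drawn, and 90° lies on the negative x axis.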
C.12 plotresult.m
function [h]=plotresult(result,rquestion);
% plot results of a subjects response
% Written by Capt John K. Millhouse
if nargin < 2
  rquestion = 'nor';
end
rhos10=ones(1,360);
rhos20=ones(1,360);
for i=1:length(result),
  theta1=result(i,2)*pi/180;
  theta2=(result(i,3))*pi/180;
  lt='yo';
  if rquestion=='rev',
    if abs(result(i,4))>abs(result(i,6))
      theta2=(result(i,5))*pi/180;
      lt='r+';
    end
  end
  index=round(theta2*180/pi)+1;
  if result(i,1)==10
    % step repeated responses at the same angle inward by 0.05
    rhos10(index)=rhos10(index)-.05;
    rho10=rhos10(index);
    subplot(1,2,1)
    mypolar(theta1,1,'gx')
    hold on
    mypolar(theta2,rho10,lt)
    mypolar([theta1; theta2],[1; rho10],'g-')
    title('With HRTF')
  else
    rhos20(index)=rhos20(index)-.05;
    rho20=rhos20(index);
    subplot(1,2,2)
    mypolar(theta1,1,'mx')
    hold on
    mypolar(theta2,rho20,lt)
    mypolar([theta1; theta2],[1; rho20],'m-')
    title('Without HRTF')
  end
end
for i=1:2
  subplot(1,2,i)
  hold off
end
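
With the 'rev' option, plotresult.m plots an alternative response when its error is smaller. Assuming that alternative is the front-back mirror image computed in anova.m (180 - theta below 180°, 540 - theta otherwise), the reflection can be sketched as follows. The sample angles are made up.

```matlab
% Sketch: mirror responses about the interaural axis, as in anova.m.
resp  = [10 100 191 349];                  % hypothetical responses, degrees
rresp = zeros(size(resp));
lo = resp < 180;
rresp(lo)  = 180 - resp(lo);               % right hemisphere: 180 - theta
rresp(~lo) = 540 - resp(~lo);              % left hemisphere: 540 - theta
disp([resp; rresp]);
```

A response at 10° in front maps to 170° behind, and 349° maps to 191°, so each side of the head is preserved.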
C.13 plothist.m
function [correct10theta, correct20theta]=plothist(testall,test10,test20,rquestion,correctdeg);
% Written by Capt John K. Millhouse
% Plots Polar histogram of data of correct responses
% testall is all test responses
% test10 is test responses for filter1
% test20 is test responses for filter2
% rquestion is 'rev' reverses or 'nor' no reverses
% correctdeg is the number of degrees off and still correct

if nargin<5
  correctdeg=23;
end
%%%%%%% plot histograms
actualtheta=testall(:,2)*pi/180;
testtheta=testall(:,3)*pi/180;
testwrevtheta=[test10(:,3)*pi/180; test20(:,3)*pi/180];
test10revtheta=[test10(:,3)*pi/180];
test20revtheta=[test20(:,3)*pi/180];
test10theta=testall(1:length(test10),3)*pi/180;
test20theta=testall(length(test10)+1:length(testall),3)*pi/180;
% initialise the outputs so the concatenations below are defined
correct10theta=[];
correct20theta=[];
for i=1:length(test10),
  if abs(test10(i,4))<correctdeg,
    correct10theta=[correct10theta; test10(i,2)*pi/180];
  end
end
for i=1:length(test20),
  if abs(test20(i,4))<correctdeg,
    correct20theta=[correct20theta; test20(i,2)*pi/180];
  end
end
bins=180;
% number of bins in the histogram
% if the number of bins is >90 then offset the bins for display
if bins>90
  correct10theta=correct10theta+(pi/90);
  correct20theta=correct20theta-(pi/90);
  test10theta=test10theta+(pi/90);
  test20theta=test20theta-(pi/90);
end
[tact,ract]=rose(actualtheta,bins);
[tc10,rc10]=rose(correct10theta,bins);
[tc20,rc20]=rose(correct20theta,bins);
[t10,r10]=rose(test10theta,bins);
[t20,r20]=rose(test20theta,bins);
figure
mypolar(tact,ract/2,'w:')
hold on
mypolar(t10,r10,'w-.')
mypolar(t20,r20,'w-')
title(['Histogram of all responses'])
legend('w:','Actual','w-.','with HRTF','w-','without HRTF')
hold off
figure
mypolar(tact,ract/2,'w:')
hold on
mypolar(tc10,rc10,'w-.')
mypolar(tc20,rc20,'w-')
title(['Histogram of correct responses (' rquestion ')'])
legend('w:','Actual','w-.','with HRTF correct','w-','without HRTF correct')
hold off
C.14 readall.m
function [rmserr,stderr,testall,revall,test10,test20]=readall(rquestion,subjects);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read all the test subjects data and save to a
% file named allsubjects
% by J. Millhouse
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%folder=['meanHRTFresults'];
folder=['results'];
if nargin < 1
  rquestion = 'nor'
end
if nargin<2
  if length(folder)==15
    % meanHRTFresults %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    subjects=['bsmith1.out','bsmith2.out','jmillhouse1.out',...
      'jmillhouse2.out','bmcq1.out','bmcq2.out',...
      'ceisenbies1.out','ceisenbies2.out','gharrup1.out',...
      'lmyers1.out','lmyers2.out','rdennery1.out',...
      'sstewart1.out','twilson1.out','twilson2.out',...
      'amillhouse1.out','bob1.out','jrmillhouse1.out',...
      'djennings1.out','rjreid1.out'];
  else
    % results %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    subjects=['ksmith1.out','rdennery1.out','jmillhouse1.out',...
      'jmillhouse2.out','bsmith1.out','twilson1.out',...
      'gharrup1.out','jrmillhouse1.out','jrmillhouse2.out',...
      'lmyers2.out','lmyers3.out','ddouglas1.out',...
      'ddouglas2.out','rmillhouse1.out','rmillhouse2.out',...
      'ceisenbies1.out','ceisenbies2.out','djennings1.out',...
      'srogers1.out','amillhouse1.out'];
  end
end
testall=[];
testall2=[];
revall=[];
start=1;
finish=1;
% scan the concatenated file names; each name ends in '.out'
for i=4:length(subjects)
  j=i-3;
  if subjects(j:i)=='.out'
    finish=i;
    name=subjects(start:finish-4)
    [test rev]=readresult([folder '/' name '.out']);
    %figure
    %plotresult(test,rquestion);
    start=finish+1;
    testall=[testall; test];
    revall=[revall; rev];
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% make output format
    eval([name '=zeros(24,3);']);
    f=sortdata(test(:,1:3),2);
    c=0;
    oldaz=0;
    for a=1:length(f)
      if f(a,1)==10,
        b=2;
      else
        b=3;
      end
      az=f(a,2);
      if oldaz<az c=c+1; end
      oldaz=az;
      eval([name '(c,1)=f(a,2);']);
      eval([name '(c,b)=f(a,3);']);
    end
    eval(['testall2=[testall2; ' name '];'])
  end
end
eval(['save ' folder '/allsubjects testall2 -ascii -tabs'])
testall=sortdata(testall,1);
revall=sum(revall);
test10=[];
test20=[];
test=testall;
for i=1:length(testall)
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% get reversals
  if rquestion=='rev',
    if abs(testall(i,4))>abs(testall(i,6))
      test(i,4)=test(i,6);
      test(i,6)=test(i,3);
      test(i,3)=test(i,5);
      test(i,5)=test(i,6);
      test(i,6)=999;
    end
  end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% split into two
  if testall(i,1)==10
    test10=[test10; test(i,:)];
  else
    test20=[test20; test(i,:)];
  end
end
avgerr=[mean(test10(:,4)) mean(test20(:,4))];
% as written, this equals abs(avgerr), the magnitude of the mean error
rmserr=sqrt(avgerr.^2);
stderr=[std(test10(:,4)) std(test20(:,4))];
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% plot histograms
%plothist(testall,test10,test20,rquestion);
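A note on the error statistics in readall.m: rmserr is formed from the per-filter mean error, so it is the magnitude of that mean. The sketch below uses made-up error values (not thesis data) to contrast that quantity with a conventional root-mean-square error. The names err, mean_mag, and rms_val are illustrative assumptions and do not appear in the thesis code.

```matlab
% Illustrative angular errors in degrees (made-up values, not thesis data)
err = [3; -4; 5; -2];

% Quantity formed in readall.m: magnitude of the mean error
mean_mag = sqrt(mean(err).^2);

% Conventional root-mean-square error, shown only for comparison
rms_val = sqrt(mean(err.^2));

fprintf('abs(mean) = %.4f, RMS = %.4f\n', mean_mag, rms_val);
```

Signed errors cancel in the first quantity but not in the second, so the two can differ considerably when responses scatter on both sides of the target.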
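C.15 removerev.m

function [y,z]=removerev(x);
% Written by Capt John K. Millhouse
% y collects the rows of x whose last column is below 999
% z collects the remaining rows
[m,n]=size(x);
y=[];
z=[];
for i=1:m,
  if x(i,n)<999,
    y=[y; x(i,:)];
  else
    z=[z; x(i,:)];
  end
end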
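The row split performed by removerev can also be written with logical indexing. This is a minimal sketch on a made-up matrix, assuming only what the listing shows: rows whose last column is below 999 go to y, and all others go to z. The variable keep and the sample values are hypothetical.

```matlab
% Made-up 3-row matrix; the last column holds 999 on flagged rows
x = [10  30  35   5  145  115;
     20  60 120  60   60  999;
     10  90  80 -10  100  999];

keep = x(:, end) < 999;   % logical mask: last column below the 999 flag
y = x(keep, :);           % rows that removerev would append to y
z = x(~keep, :);          % rows that removerev would append to z
```

Row order within y and z is preserved, as in the loop version.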
C.16 sortdata.m
function [Y]=sortdata(file1,column)
% Sorts the rows of file1 in ascending order of the given column
[n,m]=size(file1);
[X,I]=sort(file1);
for j=1:n,
  for k=1:m,
    if k==column
      Y(j,k)=X(j,column);
    else
      Y(j,k)=file1(I(j,column),k);
    end
  end
end
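For reference, the row reordering that sortdata performs can be sketched with a single index vector. The matrix below is made up, and the column meanings (filter code, azimuth, response) are inferred from how readall.m uses the data, so treat them as assumptions.

```matlab
% Made-up matrix: column 1 = filter code, column 2 = azimuth, column 3 = response
A = [20  90  85;
     10  30  40;
     10  60  55];

column = 2;                        % sort rows by the azimuth column
[~, order] = sort(A(:, column));   % permutation that sorts the key column
B = A(order, :);                   % reorder whole rows with that permutation

disp(B)
```

The built-in sortrows(A, column) gives the same result for this input, which avoids the element-by-element loop.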
Bibliography
1. Abbagnaro, Louis A., et al. "Measurements of Diffraction and Interaural Delay of a Progressive Sound Wave Caused by the Human Head," Journal of the Acoustical Society of America, 58:693-700 (September 1975).
2. Begault, Durand R. "Head-Up Auditory Display for Traffic Collision Avoidance System Advisories: A Preliminary Investigation," The Journal of the Human Factors and Ergonomics Society, 35(4):707-717 (December 1993).
3. Begault, Durand R. and Elizabeth M. Wenzel. "Headphone Localization of Speech," The Journal of the Human Factors and Ergonomics Society, 35(2):361-376 (June 1993).
4. Blattner, Meera M., et al. "Earcons and Icons: Their Structure and Common Design Principles," Human-Computer Interaction, 4:11-44 (1989).
5. Blauert, Jens. Spatial Hearing. MIT Press, 1983.
6. Bly, Sara. Sound and Computer Information Presentation. Thesis, Computing Science Group, University of California, Davis, Davis, CA, March 1982.
7. Fitts, P. Psychological research on equipment design. A study of location discrimination ability 19, Columbus, Ohio: Ohio State University: Army Air Force, Aviation Psychology Program, 1947.
8. Furness, T. "The Super Cockpit and its human factors challenges." Proceedings of the Human Factors Society 30th Annual Meeting. 48-52. Santa Monica, CA: Human Factors Society, 1986.
9. Gonzalez, Rafael C. and Richard E. Woods. Digital Image Processing. Addison-Wesley, 1992.
10. Kahn, M. J. "Human awakening and subsequent identification of fire-related cues." Proceedings of the Human Factors Society 27th Annual Meeting. 806-810. Santa Monica, CA: Human Factors Society, 1983.
11. Kuhn, George F. "Model for the Interaural Time Differences in the Azimuthal Plane," Journal of the Acoustical Society of America, 62:157-167 (July 1977).
12. McKinley, Richard L. Concept and Design of an Auditory Localization Cue Synthesizer. MS thesis, Air Force Institute of Technology (AU), 1988.
13. Middlebrooks, John C. "Directional Sensitivity of Sound-Pressure Levels in the Human Ear Canal," Journal of the Acoustical Society of America, 86:89-108 (July 1989).
14. Milton, J. S. and Jesse C. Arnold. Introduction to Probability and Statistics (Second Edition). McGraw-Hill, 1990.
15. Moller, Henrik. "Fundamentals of Binaural Technology," Applied Acoustics, 36:171-218 (March 1992).
16. Neti, Chalapathy, et al. "Neural Network Models of Sound Localization Based on Directional Filtering by the Pinna," Journal of the Acoustical Society of America, 3140-3156 (December 1992).
17. Oldfield, Simon R. and Simon P. A. Parker. "Acuity of Sound Localisation: A Topography of Auditory Space. I. Normal Hearing Conditions," Perception, 13:581-600 (1984).
18. Oldfield, Simon R. and Simon P. A. Parker. "Acuity of Sound Localisation: A Topography of Auditory Space. II. Pinna Cues Absent," Perception, 13:601-617 (1984).
19. Park, Jooyoung and Irwin W. Sandberg. "Approximation and Radial-Basis-Function Networks," Neural Computation, 5(2):305-316 (March 1993).
20. Rogers, Steven K., et al. An Introduction to Biological and Artificial Neural Networks. Wright-Patterson AFB OH, 45433: Air Force Institute of Technology (AU), October 1990.
21. Ruck, Dennis W. From a conversation, May 1994.
22. Sanders, Mark S. and Ernest J. McCormick. Human Factors in Engineering and Design (Seventh Edition). McGraw-Hill, 1993.
23. Scarborough, Captain Eric L. Enhancement of Audio Localization Cue Synthesis by Adding Environmental and Visual Cues. MS thesis, Air Force Institute of Technology (AU), 1992.
24. Shaw, Edgar A. G. "Acoustical Characteristics of the Human External Ear." Conference on Binaural and Spatial Hearing. 1993.
25. Smith, Brian A. Binaural Room Simulation. MS thesis, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB OH, December 1993.
26. Smith, Stuart, et al. "Stereophonic and Surface Sound Generation for Exploratory Data Analysis." CHI '90 Proceedings. 125-132. April 1990.
27. Spiegel, Murray R. Schaum's Outline of Theory and Problems of Statistics (Second Edition). Schaum's Outline Series, McGraw-Hill, 1994.
28. Stevens, S. S. and E. B. Newman. "The localization of actual sources of sound," American Journal of Psychology, 48:297-306 (1936).
29. Watkins, A. J. "Psychoacoustical aspects of synthesized vertical locale cues," Journal of the Acoustical Society of America, 63:1152-1165 (1978).
30. Wenzel, Elizabeth M. "Localization in Virtual Acoustic Displays," Presence, 1 (1992).
31. Wightman, Frederic L. and Doris J. Kistler. "Factors Affecting Relative Importance of Sound Localization Cues." Conference on Binaural and Spatial Hearing. 1993.
Captain John K. Millhouse was born on 11 October 1966 in Milwaukee, Wisconsin.
He graduated from Centerville High School, Centerville, Ohio, in 1985 and
attended Ohio University, from which he received a Bachelor of Science degree in
electrical engineering in June 1989. He was commissioned through the Air Force
Reserve Officer Training Corps (AFROTC) at Ohio University and assigned to the Range
and Air Base Systems Program Office, Air Combat Maneuvering Instrumentation
(ACMI) Directorate, Eglin AFB, Florida. He entered the Masters Program in the
School of Engineering, Air Force Institute of Technology, in June 1993.
He is married to Audra L. Millhouse (Baker) from Grafton, Ohio. They have
one son, James Jerome Millhouse.
REPORT DOCUMENTATION PAGE
Form Approved, OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.
1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: December 1994
3. REPORT TYPE AND DATES COVERED: Master's Thesis
4. TITLE AND SUBTITLE: Head Related Transfer Function Approximation Using Neural Networks
5. FUNDING NUMBERS
6. AUTHOR(S): John Kindell Millhouse
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, WPAFB OH 45433-6583
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT/GE/ENG/94D-21
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): None
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Distribution Unlimited
12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words)
This thesis determines whether an artificial neural network (ANN) can approximate the Armstrong Aerospace Medical
Research Laboratories (AAMRL) head related transfer functions (HRTF) data obtained from research at AAMRL
during the fall of 1988. The first test determines whether HRTF lends any support in sound localization when
compared to no HRTF (Interaural Time Delay only). There is a statistically significant interaction between the
location of the sound and whether the HRTF or no HRTF is used. When this interaction is removed using the
alternate F-Value, the statistics give the result of equal means for the filters and azimuth. This means that at
certain angles of azimuth, the HRTF either provides no advantage at all or hinders localization capabilities. Adding
in the corrections for reversals changes the results to where the means of the azimuth are not statistically equal. The
reversal corrections will inherently reduce the error results. However, comparing the number of reversals indicates
an advantage of using the HRTF over no HRTF. The second test determines whether HRTF and ANN lend the
same amount of information in sound localization when compared to each other. With reversal corrections included,
a statistical advantage is not indicated, and the means of the two filters are statistically equal. Also there is a
statistically significant interaction between the location of the sound and whether the AAMRL HRTF or ANN
HRTF is used. Comparing the number of reversals does not indicate a large difference between the AAMRL HRTF
and the RBF HRTF.
14. SUBJECT TERMS: 3D sound, HRTF, Neural Networks
15. NUMBER OF PAGES: 155
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: UNCLASSIFIED
18. SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED
19. SECURITY CLASSIFICATION OF ABSTRACT: UNCLASSIFIED
20. LIMITATION OF ABSTRACT: UL
NSN 7540-01-280-5500. Standard Form 298 (Rev. 2-89), Prescribed by ANSI Std. Z39-18, 298-102