On the manifolds of spatial hearing

Vikas C. Raykar and Ramani Duraiswami
University of Maryland, College Park
NIPS 2006 Workshop on Novel Applications of Dimensionality Reduction
December 9, 2006

Human spatial hearing
  How are humans able to judge the direction of a sound source?
  Why do we have two ears?
  Why is the pinna shaped the way it is?

Plan of the talk
  Human spatial hearing
  Perceptual manifolds
  Exploratory studies
  Applications

How do humans localize a sound source?
  Primary cues
    Interaural Time Difference (ITD)
    Interaural Level Difference (ILD)
    These explain localization only in the horizontal plane.
    All points on one half of a hyperboloid of revolution have the same ITD and ILD (the cone of confusion).
  Other cues
    Pinna shape gives elevation cues at higher frequencies.
    Torso and head give elevation cues at lower frequencies.
  [Figure: source, head, left ear, right ear]
  An intricate system to model completely: head, torso, and pinna.

Head Related Transfer Function (HRTF)
  Spectral filtering caused by the head, torso, and pinna.
  HRIR: head related impulse response.
  The HRIR can be measured experimentally for all elevations and azimuths.
  Convolve the source signal with the measured HRIR to create virtual audio.

Sample HRIR and HRTF
  Source directly in front of your right ear.

CIPIC Database
  Public-domain HRIR database
  HRIRs sampled at 1250 points around the head
  45 subjects
  Anthropometry measurements
  V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, and Carlos Avendano, "The CIPIC HRTF database," in WASPAA '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics), Mohonk Mountain House, New Paltz, NY, Oct. 2001.

Interaural polar coordinate system
  Azimuth
  Elevation
  Implications of the origin not being exactly at the center of the head.

Plan of the talk
  Human spatial hearing
  Perceptual manifolds
  Exploratory studies
  Applications

Manifold representation
  An HRIR of N samples can be considered a point in N-dimensional space.
  As the elevation is varied smoothly, the points trace out a one-dimensional manifold in that N-dimensional space.
  If we can unfold this low-dimensional manifold, we have a good perceptual representation of the signal.
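
The point-in-N-dimensional-space view can be illustrated with a minimal sketch. It does not use measured data: `toy_hrir` is a hypothetical stand-in (a direct pulse plus one reflection whose delay grows with elevation), and the sampling rate, pulse width, and delay range are assumptions chosen only for illustration. Each column of `X` is one 200-sample response, so neighbouring elevations should land close together while the two ends of the curve lie far apart.

```python
import numpy as np

def toy_hrir(elevation_deg, n_samples=200, fs=44100.0):
    """Hypothetical stand-in for a measured HRIR (not CIPIC data):
    a direct pulse plus one reflection whose delay grows with elevation."""
    t = np.arange(n_samples) / fs

    def pulse(t0):
        # Gaussian pulse centred at t0 seconds (width is an assumption).
        return np.exp(-0.5 * ((t - t0) / 1e-4) ** 2)

    # Reflection delay rises linearly from 0.2 ms to 0.6 ms over the range.
    delay = 2e-4 + 4e-4 * (elevation_deg + 45.0) / 275.0
    return pulse(5e-4) + 0.5 * pulse(5e-4 + delay)

# 50 elevations, assumed evenly spaced, giving a 200 x 50 data matrix.
elevations = np.linspace(-45.0, 230.0, 50)
X = np.stack([toy_hrir(e) for e in elevations], axis=1)

# Euclidean distance between responses at neighbouring elevations.
step = np.linalg.norm(np.diff(X, axis=1), axis=0)
# Straight-line distance between the first and last response.
chord = np.linalg.norm(X[:, -1] - X[:, 0])
# Length of the polyline traced through all 50 points.
path_length = step.sum()

print("data matrix shape:", X.shape)
print("largest neighbour step:", step.max())
print("end-to-end chord:", chord)
print("length along the curve:", path_length)
```

Small steps between neighbours together with a curve length that exceeds the end-to-end chord are what a smooth, bent one-dimensional curve in 200-dimensional space would look like.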

Perceptual manifolds
  Exploratory studies using
    PCA
    LLE
    Isomap
    MVU
  A few applications
    Perceptual distance metric
    Interpolation
    Customization

Our data matrix
  Elevation manifold: points in a d-dimensional space (d = 257 for the HRTF, d = 200 for the HRIR).
  Data matrix: 200 x 50 for the HRIR, 257 x 50 for the HRTF.
  Subject 10.
  We will concentrate only on azimuth zero to begin with.
  Use the HRIR.

Dimensionality reduction methods
  We used the following four methods:
    Principal Component Analysis (PCA)
    Locally Linear Embedding (LLE)
    Isomap
    Maximum Variance Unfolding (MVU)
  We expect
    the manifold to have an intrinsic dimensionality of 1;
    the first embedded component to be monotonic with elevation.

HRTF elevation manifold (optional slide)
  [Figures: PCA; Isomap (K=3); Isomap (K=2); LLE (K=3); LLE (K=2); MVU]

HRIR elevation manifold
  [Figures: PCA; Isomap; LLE; MVU]

Complete manifold
  Azimuth -45:5:45
  Elevation -45:5:230

We expect
  The manifold to have an intrinsic dimensionality of 2.
  The first two embedded components to show a grid-like structure.

Complete manifold (optional slide)
  [Figures: PCA; LLE (K=4); LMVU (K=4); Isomap (K=4)]

HRIR manifold
  [Figures: PCA; Isomap; Isomap]
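
The elevation-manifold experiment can be sketched on synthetic data, under stated assumptions. `toy_hrir` is a hypothetical stand-in for a measured response, not CIPIC data, and `pca_embed`, `isomap_embed`, and `is_monotonic` are illustrative helpers written from scratch in NumPy rather than the code used for the talk. Only PCA and Isomap are shown; LLE and MVU are omitted. The neighbourhood size K=3 matches one of the settings named in the figures.

```python
import numpy as np

def toy_hrir(elevation_deg, n_samples=200, fs=44100.0):
    """Hypothetical stand-in for a measured HRIR: a direct pulse plus
    one reflection whose delay grows with elevation."""
    t = np.arange(n_samples) / fs

    def pulse(t0):
        return np.exp(-0.5 * ((t - t0) / 1e-4) ** 2)

    delay = 2e-4 + 4e-4 * (elevation_deg + 45.0) / 275.0
    return pulse(5e-4) + 0.5 * pulse(5e-4 + delay)

def pca_embed(points, d):
    """Project the rows of `points` onto their top-d principal axes."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:d].T

def isomap_embed(points, k, d):
    """Isomap sketch: k-NN graph, shortest paths, classical MDS.
    Assumes the k-NN graph is connected."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Keep only edges to each point's k nearest neighbours (column 0 is self).
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    graph[rows, nearest.ravel()] = dist[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    # Floyd-Warshall: geodesic distance = shortest path through the graph.
    for m in range(n):
        graph = np.minimum(graph, graph[:, [m]] + graph[[m], :])
    # Classical MDS on the geodesic distances.
    centering = np.eye(n) - 1.0 / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    steps = np.diff(values)
    return bool(np.all(steps > 0) or np.all(steps < 0))

elevations = np.linspace(-45.0, 230.0, 50)              # assumed grid
points = np.stack([toy_hrir(e) for e in elevations])    # one response per row
pca_1d = pca_embed(points, 1)[:, 0]
iso_1d = isomap_embed(points, k=3, d=1)[:, 0]

print("PCA first component monotonic with elevation:", is_monotonic(pca_1d))
print("Isomap first component monotonic with elevation:", is_monotonic(iso_1d))
```

The monotonicity check corresponds to the stated expectation that the first embedded component should vary monotonically with elevation. Results on this toy model say nothing about how the methods behave on measured HRIRs.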

Data representation -- manifold properties
  LLE, MVU: numerical problems

Plan of the talk
  Human spatial hearing
  Perceptual manifolds
  Exploratory studies
  Applications

Problem 1: Interpolation
  HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
  For a smooth virtual audio system we need to interpolate HRTFs.
  HRTF measurement is a tedious and time-consuming process.
    It normally takes an hour.
    The subject must remain immobile.

Some preliminary results

Problem 2: Distance metric
  How to compare any two given HRTFs?
    Perceptually inspired metric
    Psychoacoustical tests
    Squared log-magnitude error
  It is tough to decide which aspects of a given signal are perceptually relevant.
  Use geodesic distance.

  How do we compare any two given HRIRs, i.e., how do we formulate a distance metric in the space of HRIRs? The distance metric has to be perceptually inspired. The ultimate justification, however, is psychoacoustical testing. In the absence of a good perceptual error metric, the most commonly used one is the squared log-magnitude error of the spectra of the HRIRs. It is tough to decide which aspects of a given signal are perceptually relevant. For our case of HRIRs at different elevation angles, the obvious perceptual information to extract is the elevation of the source. A natural measure of distance is therefore the distance along the extracted one-dimensional manifold.

Distance on the manifold

Problem 3: Customization
  If an HRTF measured for one person is used for a different person, elevation perception is very poor.
  Each person's ear shape and anatomy are unique.
  Each person's localizing capabilities are tuned to the shape of their ears and their anatomy.
  This is a big bottleneck for the commercialization of spatial audio.

Style vs. content

Anthropometric measurements
  Can we relate the anthropometric measurements to some characteristics (?) of the manifold?

Problem 4: Microphone calibration

Thank you! Questions?