Download pdf - SOUND PROPAGATION OVERVIEW - UMIACSramani/hrtf/duda_umd00_slides.pdf · umd00_overview.ai OVERVIEW • Physics of sound • Acoustic cues for sound localization • Azimuth • Elevation

jh_propagation.ai

SOUND PROPAGATION

listenersound source

speed c

x

lwavelength

c = f l

frequency f (Hz)

AN INTRODUCTION TOHUMAN SPATIAL HEARING

Richard O. DudaCIPIC Interface Laboratory

UC Davis

http://phosphor.cipic.ucdavis.edu

October 12, 2000

��umd00_title.ai

umd00_overview.ai

OVERVIEW

•Physics of sound

•Acoustic cues for sound localization •Azimuth •Elevation •Range

•Head-related transfer functions (HRTFs)

•Approaches to synthesizing spatial sound

•Opportunities and challenges

jh_paths.ai

MULTIPATH PROPAGATION

Reflection

Refraction

Scattering

umd00_axiom_1.ai

AXIOM I

The sound pressure at the twoear drums is a sufficient stimulus.

Producing the same sound pressure willproduce the same auditory perception.

•Bone conduction•Adaptation•Conflicting visual cues•Conflicting expectations

Caveats:

umd00_axiom_2.ai

AXIOM II

Exact reproduction of the sound pressureis not necessary for producing the sameauditory perception.

The limitations of neural responsesallow different (and simpler) stimulito produce the same response.

•Bandwidth (20 Hz to 20 kHz) •Amplitude (1-dB resolution) •Monaural phase (2-ms resolution) •Latency (10-ms resolution) •Spectral fine structure(critical bands, Q = 8)

Examples:

umd00_axiom_3.ai

AXIOM III

Although it is not necessary to reproduceall of the cues exactly, conflicting cuesdegrade perception.

Key engineering challenge -- find themost cost-effective approximation.

ubc_vp_coords.ai

VERTICAL-POLARCOORDINATES

qf

Plane o

f

consta

nt

azimuth

r

Cone ofconstantelevation

MedianPlane

Sound source

Horizontal plane

q

ubc_ip_coords.ai

INTERAURAL-POLARCOORDINATES

f

q f

Plane of constant elevation

rInteraural axis

Cone ofconstantazimuth

MedianPlaneHorizontal plane

Sound source

jh_azimuth_cues.ai

AZIMUTH CUES

sound source

q

•ITD (Interaural Time Difference)

•ILD (Interaural Level Difference)

WOODWORTH'S FORMULA

ubc_delay.ai

Contralateral Ear Ipsilateral Ear

Sound Source

a q

qa

aq

q

a sin q

DTips =- a sin q

c

ITD = a q + sin qc

DTcon =a qc

ARRIVAL TIME

ubc_delay_curve.ai

Rayleigh's solution (20% rise time)

Woodworth's formula

Angle of Incidence (deg)

Arr

iva

l tim

e

(ms)

0 50 100 150 200 250 300 350 400

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

-0.4

jh_elevation_cues.ai

ELEVATION CUES

soundsource

f

•Pinna reflections and resonances

•Torso and shoulder reflections

umd00_torso_refl1.ai

TORSO REFLECTIONsoundsource

f

h

soundsource

fmin

ffmin 90o

DTT

2hc

|H(f)|

f12DTT

32DTT

52DTT

72DTT ubc_pinna_nomenclature.ai

THE PINNA

Cavum concha

Cymba concha

Helix

Crus helias

Triangular fossaScaphoid fossa

LobuleIntertragal incisure

Antihelix

External auditory meatusTragus

Antitragus

ubc_pinna_modes.ai

PINNA PHENOMENA

Pinna reflections (Batteau)

Pinna resonances (Shaw)

+ +

+

++

PINNAE

ubc_pinnae.ai jh_elevation_cues.ai

RANGE CUES

•Loudness (for familiar sources)

•Excess ILD (for close sources)

•Direct/reverberant (for distant sources)

sound source

soundsource

umd00_dynamnic_cues1.ai

HEAD-MOTION CUES ANDFRONT/BACK CONFUSION

?

?

umd00_dynamnic_cues2.ai

HEAD-MOTION CUES ANDELEVATION MAGNITUDE

aa

aa

aa

f

ITD = a2ac

ITD = a cos f2acITD = 0

umd00_other_cues.ai

OTHER CUES

•Visual cues •Synchronized motion •Absence

•Knowledge of source

•Knowedge of environment

jh_ff.ai

FREE-FIELD RADIATION FROM ASPHERICAL SOURCE

X(f) = Fourier transform of source pressureXff(f)= Free-field pressure at head center

Xff = Hff X

Hff(f)= e- j k r , k = r0r

Inverse range Propagation delay

2 p fc

Sound Source

X(f)

r0

r0

r

Xff(f)

ubc_HRTF_def.ai

THE HEAD-RELATEDTRANSFER FUNCTION

X(f) = Fourier transform of source pressureXL(f)= Fourier transform of left ear pressureXR(f)= Fourier transform of right ear pressureXff(f)= Free-field pressure at the origin

XL(f)= HL(f) Xff(f) XR(f)= HR(f) Xff(f)

HR(f)

Sound Source

X(f)

XR(f)

XL(f)

HL(f)

ubc_HRIR_def.ai

THE HEAD-RELATEDIMPULSE RESPONSE

hR(t)

Sound Source

d(t)

xR(t)

xL(t)

hL(t)

xL(t) = Left ear pressurexR(t) = Right ear pressurexff(t) = Free-field pressure at the origin

xL(t) = hL(t) xff(t-t) dt xR(t) = hR(t) xff(t-t) dt- 8

8

- 88

HRIR SOUND SYNTHESIS

jh_synthesis.ai

xR(t)xL(t)

Convolver Convolver

Head-RelatedImpulse Responses

Sound SignalhL(t) hR(t)

Azimuth q Elevation f Range r

VirtualSource

x(t)

jh_structural_model.ai

A STRUCTURAL MODEL

VirtualSource

xR(t)xL(t)

x(t)

+ +

Head Torso Room Head Torso Room

Pinna Pinna

Sound Signal

COMPUTING HRTFs BYBOUNDARY ELEMENT METHODS

•Digitize with a 3-D scanner

•Solve wave equation numerically

ubc_bem.ai

* See Kahana et al.

THE KEMARACOUSTIC MANIKIN

ubc_kemar.ai

f

q

Interaural

Axis

Elevation

Azim

uth

umd00_hoop.ai

ACOUSTICHRTF MEASUREMENT

jh_kemar_hrir_m45.ai

KEMAR HRIR

Azimuth = -45o, Elevation = 0o

0 0.5 1 1.5 2

Left ear

Right ear

Time (ms)

jh_kemar_hrtf_m45.ai

KEMAR HRTF

Azimuth = -45o, Elevation = 0o

Frequency (kHz)

Re

spo

nse

(d

B)

-30

-20

-10

0

10

20

30

0.1 1 1020.2 20

Left ear

Right ear

ubc_ke_freq.ai

RIGHT-EAR HRTF FOR KEMAR(Horizontal Plane)

100 1000 10000Frequency (Hz)

FRONT

Re

spo

nse

(d

B)

-25

-20

-15

-10

-5

0

5

10

15

20

AZIMUTH = 0o

AZIMUTH = 90o

AZIMUTH = -90o

100 1000 10000

Re

spo

nse

(d

B)

BACK

-25

-20

-15

-10

-5

0

5

10

15

20

Frequency (Hz)

AZIMUTH = 90o

AZIMUTH = 180o

AZIMUTH = 270o

ubc_ke_np_freq.ai

HRTF FOR KEMAR, NO PINNA(Horizontal Plane)

100 1000 10000

-25

-20

-15

-10

-5

0

5

10

Frequency (Hz)

Re

spo

nse

(d

B)

FRONTAZIMUTH = 90oAZIMUTH = 0o

AZIMUTH = -90o

BACK

Frequency (Hz)

Re

spo

nse

(d

B)

100 1000 10000

-25

-20

-15

-10

-5

0

5

10AZIMUTH = 90o

AZIMUTH = 180o

AZIMUTH = 270o

umd00_full_HRTF.ai

HRTF ELEVATION DEPENDENCE

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

Elevation (deg)0 100 200

-15

-10

-5

0

5

10

15

dB

umd00_HRTF_nopinna.ai

HRTF WITHOUT PINNA

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dBElevation (deg)0 100 200

umd00_pinplane.ai

A PINNA ON A PLANE

umd00_HRTF_pinna.ai

HRTF FOR ISOLATED PINNA

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dBElevation (deg)0 100 200

-15

-10

-5

0

5

10

15

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

Elevation (deg)0 100 200

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dB

Full HRTF

Head and torso

Pinna

umd00_HRTF_contributions.ai

CONTRIBUTIONS TO THE HRTF

jh_structural_model.ai

A STRUCTURAL MODEL

VirtualSource

xR(t)xL(t)

x(t)

+ +

Head Torso Room Head Torso Room

Pinna Pinna

Sound Signal

ubc_sphere_model.ai

THE SPHERICAL-HEAD MODEL

VirtualSource

q

xR(t)xL(t)

x(t)

DTL(q)

HHsL(q)

DTR(q)

HHsR(q)

jh_sphere_assess.ai

ASSESSING THESPHERICAL HEAD MODEL

•Only one parameter -- easily customized

•Well focused

•Good left/right position

•No up/done control -- image elevated

•With a head tracker: • Moderately externalized • Little front/back confusion

•Without a head tracker: • Internalized • Usually seems to be in back

jh_torso_reflections.ai

ELLIPSOIDAL-TORSO MODEL

soundsource

f

HeadModel

HeadModel

rT

DTT

rT

DTT

= torso reflection coefficient

= torso reflection delay

jh_ellipsoid_assess.ai

ASSESSING THEELLIPSOIDAL TORSO MODEL

•Five parameters; still easily customized

•Provides an elevation cue • Significant below 3 kHz • Ineffective in median plane

•Only one component of a full model

jh_structural_model_2.ai

STRUCTURAL HRTF MODEL

HeadModel

HeadModel

TorsoModel

PinnaModel

DTH(q)

HHS

(q)

Head Model

rT

DTT(q,f)

Torso Model

jh_structural_model_3.ai

SIMPLIFIED PINNA MODEL

kP(f)

DTP(f)

Fixed-poleresonator

kP(f)

DTP(f)

Fixed-poleresonator

umd00_systems.ai

SPATIAL SOUND SYSTEMS

Multichannel

Two-channel: headphones

Two-channel: crosstalk-canceled loud speakers

umd00_systems2.ai

MULTICHANNEL SYSTEMS

Pros •Works with a large audience •No customization needed •Conceptually simple

Cons •Speakers must be distant •Many channels needed for full 3-D •Space consuming, expensive

umd00_systems3.ai

TWO-CHANNEL: HEADPHONES

Pros •Can reproduce full 3-D with only 2 channels •Private and non-interfering •Conceptually simple

Cons •Uncomfortable for extended use •Clumsy for a large audience •Requires customization for full 3-D •Difficult to achieve frontal externalization

xL(t) xR(t)

umd00_systems4.ai

TWO-CHANNEL: CROSSTALK-CANCELED

LOUD SPEAKERS

Pros •Can reproduce full 3-D with only 2 channels •Unencumbered listening

Cons •Small "sweet spot" •Cannot be used with a large audience •Requires customization for full 3-D •Difficult to get near or rear locations

xL(t) xR(t)

Inverse HRTFs

umd00_customization.ai

APPROACHES TOCUSTOMIZATION

•Measure exact HRTF for each person •Acoustic •Computational

•Nearest-neighbor •Trial and error •Anthropometry

•Scale a standard HRTF •Global •Pinna/head/torso components

•Use an adaptive model •Match to anthropometry •Match to exact HRTF

umd00_problems.ai

CHALLENGESAND

OPPORTUNITIES

•Frequency range (combining partial HRTFs)

•Elevation perception •Front/back confusion •Low elevations

•Range perception •Headphones: externalization • Median plane • Frontal •Speakers: back locations

•Transducers •Headphone compensation •Loudspeaker "sweet spot"

•Latency in dynamic systems

•Room acoustics