Representation of Timbre in the Auditory System
Shihab A. Shamma
Center for Auditory and Acoustic Research, Institute for Systems Research
Electrical and Computer Engineering, University of Maryland, College Park


Page 2:

Musical Spectrograms

[Figure: panels for a violin note (vibrato) and a piano note. Spectrogram axes: time (ms), 200 to 1800, against frequency ticks from 125 to 2000.]

Page 3:

Anatomy of the Auditory System

[Diagram: the pathway from sound through the early auditory stages, midbrain nuclei, collicular stages, and central auditory stages. Labeled nuclei: AVCN, PVCN, DCN, TB, LL, NLL, IC, MGB.]

Attributes of Complex Sounds
Location: ILD, ITD, spectral cues; spatial maps
Timbre: the auditory spectrum
Pitch: harmonic templates; computing pitch

Page 4:

Early Auditory Processing Stages

Analysis: cochlear filters (eardrum, cochlea, basilar membrane filters)
Transduction: hair cells (hair cell stages)
Reduction: lateral inhibition (lateral inhibitory network)

[Figure: block diagram of the three stages, drawn against log f axes, followed by the resulting auditory spectrogram. Spectrogram axes: time (ms), 100 to 1000, against frequency ticks from 125 to 2000.]
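As a rough illustration of the reduction stage named on this slide, the sketch below applies a first difference across neighboring frequency channels followed by half-wave rectification. That particular form of lateral inhibition is an assumption, not something the slide states, and the function name and toy profile are hypothetical. The analysis and transduction stages are not modeled here.

```python
def lateral_inhibition(profile):
    """Sharpen a profile of channel activity along the tonotopic axis.

    Assumed form (not taken from the slides): each channel is inhibited by
    its lower-frequency neighbor, and negative results are clipped to zero.
    """
    sharpened = []
    for lower, upper in zip(profile, profile[1:]):
        sharpened.append(max(upper - lower, 0.0))
    return sharpened


# Toy activity across six channels ordered from low to high frequency.
broad_profile = [0.0, 1.0, 3.0, 3.0, 1.0, 0.0]
edges = lateral_inhibition(broad_profile)
print(edges)
```

In this toy case only the rising edge of the broad profile survives, which is one sense in which such a stage reduces the representation.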

Page 5:

Auditory-Nerve Response Patterns to a Single Tone

[Figure: response patterns across characteristic frequencies from 250 to 4000 over a 60 ms window, with the average response shown alongside.]

Page 6:

Auditory-Nerve Response Patterns to Two-Tone Stimulus

[Figure: response patterns across characteristic frequencies from 250 to 4000 over a 60 ms window, with the average response shown alongside.]

Page 7:

[Figure: two panels for the utterance labeled /r i t a w a y/. Axes: time (ms), to 500, against frequency ticks from 250 to 4000.]

Page 8:

[Figure with three parts labeled Cochlear Analysis, Auditory-Nerve Responses, and Lateral Inhibition. Labels: sound (harmonic series); basilar membrane vibrations; hair cells along the tonotopic axis; auditory-nerve fibers; estimated stimulus spectrum. Panels A, B, C and A', B', C' plot time (msec), over windows of 60 and 500, against the characteristic frequency (CF) axis, 250 to 4000 Hz.]

Page 9:

[Figure: spectrogram panels labeled Normal, Down-Shift, Compress, and Dilate. Axes: time (ms), 100 to 1000, against frequency ticks from 125 to 2000.]

Page 11:

Awake Set-up

Awake ferret with head restraint in cylindrical holder

Page 12:

Spike Sorting

The raw neural trace typically contained multiple distinct waveforms (usually representing 1-4 neurons), which were sorted off-line.

[Figure: sorted spike waveforms (horizontal axis 0 to 120) and the responses of two units from recording 10e, Unit 1 and Unit 2. Response axes: time (ms), 50 to 150, against frequency ticks from 2000 to 16000.]

Waveforms were sorted in a semi-automatic procedure. First, a PCA-based algorithm was used to pre-sort the spikes. Then a MATLAB-based program was used to refine the classification.
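The PCA-based pre-sort is not described further on the slide. A minimal sketch of the idea, assuming only two units and a split on the sign of the first principal-component score, could look like the following. The function names, the power-iteration approach, and the toy waveforms are all hypothetical and are not the procedure used for the recordings.

```python
import math


def first_pc_scores(waveforms, iterations=200):
    """Project spike waveforms onto their first principal component.

    Uses power iteration on the covariance of the mean-centered waveforms.
    """
    n = len(waveforms)
    d = len(waveforms[0])
    mean = [sum(w[i] for w in waveforms) / n for i in range(d)]
    centered = [[w[i] - mean[i] for i in range(d)] for w in waveforms]

    def project(v):
        return [sum(c[i] * v[i] for i in range(d)) for c in centered]

    v = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iterations):
        scores = project(v)
        # Multiply by X^T X without forming the covariance matrix.
        new_v = [sum(scores[k] * centered[k][i] for k in range(n)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in new_v))
        if norm == 0.0:
            break
        v = [x / norm for x in new_v]
    return project(v)


def presort(waveforms):
    """Assumed two-unit pre-sort: split on the sign of the first PC score."""
    return [1 if s > 0 else 2 for s in first_pc_scores(waveforms)]


# Hypothetical spikes from two units with different shapes plus small jitter.
unit_a = [[0.0, 1.0, 4.0, 1.0, 0.0], [0.1, 1.1, 3.9, 0.9, 0.0], [0.0, 0.9, 4.1, 1.0, 0.1]]
unit_b = [[0.0, -1.0, -2.0, -1.0, 0.0], [0.1, -0.9, -2.1, -1.0, 0.0], [0.0, -1.1, -1.9, -0.9, 0.1]]
labels = presort(unit_a + unit_b)
print(labels)
```

A refinement step like the one the slide mentions would then inspect and correct these provisional labels; that step is not sketched here.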

Page 13:

Three envelopes of modulation:
Slow (< 30 Hz)
Intermediate (< 500 Hz)
Fast (< 4 kHz)

[Figure: panels for the utterance /come/ /home/ /right/ /away/, including a spectrogram with time (ms), 100 to 1000, against frequency ticks from 125 to 2000.]

Page 14:

Decomposing a Spectrogram into Dynamic Ripples

[Figure: a spectrogram (time (ms), 100 to 1000, against frequency ticks from 125 to 2000) next to a moving ripple with modulation depth ∆A and w = 4 Hz, drawn over time (0 to 250 ms) and frequency (1 to 16 kHz). A further panel has axes labeled rate (Hz) and t (ms).]
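To make the notion of a dynamic ripple concrete, the sketch below samples a sinusoidal spectro-temporal envelope on a time-by-log-frequency grid. The sinusoidal form, the default modulation depth, and the grid sizes are assumptions chosen for illustration; the slide itself only shows ∆A, w = 4 Hz, and the axes. Function and parameter names are hypothetical.

```python
import math


def ripple_envelope(t, x, w, omega, delta_a=0.9, phase=0.0):
    """Envelope of a moving ripple at time t (s) and position x (octaves).

    w is the temporal frequency in Hz, omega the ripple frequency in
    cycles/octave, delta_a the modulation depth (linear, between 0 and 1).
    """
    return 1.0 + delta_a * math.sin(2.0 * math.pi * (w * t + omega * x) + phase)


def ripple_spectrogram(w, omega, duration=0.25, n_frames=50, octaves=4.0, n_channels=32):
    """Sample the envelope on a time-by-log-frequency grid (rows are time frames)."""
    return [
        [ripple_envelope(n * duration / n_frames, k * octaves / n_channels, w, omega)
         for k in range(n_channels)]
        for n in range(n_frames)
    ]


# One 4 Hz ripple at 0.4 cycles/octave over 250 ms and 4 octaves (1 to 16 kHz).
grid = ripple_spectrogram(w=4.0, omega=0.4)
print(len(grid), len(grid[0]))
```

With this sign convention, a peak of the envelope drifts toward lower frequencies over time when w and omega share the same sign, and toward higher frequencies when their signs differ.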

Page 15:

Responses to Moving Ripples

[Figure: responses to ripples with ripple frequency Ω = 0.4 cycles/octave and temporal frequency w = 4 to 32 Hz, 30 sweeps per w, at 55 dB. Axes: time (ms) against temporal frequency (Hz).]

Page 16:

[Figure, panels A and B: responses against time (ms) as a function of w (Hz) at Ω = 0.8 cyc/oct, and as a function of Ω (cyc/oct) at w = 12 Hz; the transfer function magnitude |TF(w, Ω)|, plotted from -w to w and from 0 to Ω; and the spectro-temporal response field STRF(t, x), plotted over t (ms) from 0 to T and frequency x from 0 to X. The two are linked by a Fourier transform magnitude, |F{ }|.]

Page 17:

Examples of Different STRF Shapes

[Figure: six STRF panels, each with axis ticks at 0.125 and 4.]

Page 18:

Spectro-Temporal Response Fields

Page 19:

Multiscale Cortical Representation of a Spectrogram

[Figure: panels A and C. Axes: frequency and rate (Hz), with ticks from 0.25 to 8.]

Page 20:

Scale-Rate Decomposition

Reconstruction
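One simple way to picture a scale-rate decomposition is as a two-dimensional Fourier analysis of the spectrogram, with one axis indexing temporal modulation (rate) and the other spectral modulation (scale). The sketch below uses a plain 2D discrete Fourier transform as a stand-in; this is an assumption made for illustration and not necessarily the decomposition behind the slides. The function name and the toy input are hypothetical.

```python
import cmath
import math


def rate_scale_magnitude(spectrogram):
    """Magnitude of the 2D discrete Fourier transform of a spectrogram.

    spectrogram[n][k] is the value at time frame n and frequency channel k.
    Entry [u][v] of the result is the energy at u cycles per time window
    (rate) and v cycles per frequency span (scale).
    """
    n_frames = len(spectrogram)
    n_channels = len(spectrogram[0])
    result = []
    for u in range(n_frames):
        row = []
        for v in range(n_channels):
            total = 0j
            for n in range(n_frames):
                for k in range(n_channels):
                    angle = -2.0 * math.pi * (u * n / n_frames + v * k / n_channels)
                    total += spectrogram[n][k] * cmath.exp(1j * angle)
            row.append(abs(total))
        result.append(row)
    return result


# Toy spectrogram: one ripple with 1 cycle along time and 2 along frequency.
N = 8
toy = [[math.cos(2.0 * math.pi * (1 * n / N + 2 * k / N)) for k in range(N)] for n in range(N)]
magnitude = rate_scale_magnitude(toy)
print(round(magnitude[1][2], 6), round(magnitude[1][6], 6))
```

The toy ripple puts its energy at index [1][2] and its mirror image [7][6], and none at [1][6], so the two directions of ripple motion land in different quadrants. Because the transform is invertible, keeping the complex coefficients rather than only their magnitudes would allow the spectrogram to be reconstructed.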

Page 21:

MUSICAL TIMBRE

Page 22:

Musical Spectrograms

[Figure: panels for a violin note (vibrato) and a piano note. Spectrogram axes: time (ms), 200 to 1800, against frequency ticks from 125 to 2000.]

Page 23:

Patterns of Musical Timbre

[Figure: four panels, Violin (vibrato), Piano, Oboe, and Clarinet. Horizontal axis: rate (Hz), with negative and positive halves from 1 to 32; vertical axis ticks from 0.25 to 8.]

Page 24:

Timbre Metric for Some Musical Instruments (TSVQ)

Page 25:

Timbre Metric for Musical Instruments

[Figure: two matrices over the same set of instruments, one from the model (exp1c-model-rs-mf) and one from subjects (exp1c-subjects; subjects 1-24). Instruments: Guitar, Harp, Violin Pizz., Violin Bowed, Bass, Synth A, Synth B, Oboe, Clarinet, Flute, Horn, Trumpet. Additional labels: spectral cues, temporal cues, spectro-temporal cues.]

Page 26:

Mapping musical instruments

A Melody with the Trumpar

[Figure: spectrograms labeled Guitar, Trumpet, Trumpar, and ACE Chord, with time (ms) against frequency (Hz), and four further panels with horizontal ticks from 1 to 16 (negative and positive) and vertical ticks from 0.50 to 8.00.]

Page 27:

Speech Analysis & Assessment of Intelligibility

Page 28:

Three envelopes of modulation:
Slow (< 30 Hz)
Intermediate (< 500 Hz)
Fast (< 4 kHz)

[Figure: panels for the utterance /come/ /home/ /right/ /away/, including a spectrogram with time (ms), 100 to 1000, against frequency ticks from 125 to 2000.]

Page 30:

Human versus Ferret Sensitivity to Spectrotemporal Modulations

Page 33:

Auditory Scene Analysis & Pitch Extraction

Page 34:

Relevance to Auditory Scene Analysis: Streaming and grouping

[Figure: multiscale representation, panels A and C. Axes: frequency and rate (Hz), with ticks from 0.25 to 8.]

Working Hypotheses

Streaming: Any consistently isolated feature in the multiscale representation can be streamed, e.g.,
- spectral patterns (tones or average vocal tract spectra)
- repetitive temporal dynamics (modulated noise or sinusoidal FM tones)
- transients as segmenters

Grouping:
- Harmonicity and its linearly interpolated extensions (pitch extraction and segregation, regular patterns)
- Shared dynamics (common onsets and modulations)

Page 35:

Cortical Representation of Harmonic & Shifted Spectra

[Figure: three panels labeled Auditory Spectrum, Multiscale Representation, and Reduced Representation. Axes include time (ms), 100 to 1000, against frequency (Hz), 250 to 4000, and scale, 0.5 to 8.0, against frequency, 250 to 4000.]

Shifted spectra are also grouped, although they are inharmonic.

Page 36:

Computing Pitch
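The deck mentions harmonic templates as the route to pitch without giving details. As a minimal sketch of template matching in general, and not of the model presented here, the code below scores candidate fundamentals by how completely their harmonic series is filled by observed spectral peaks and how many peaks they explain. The scoring rule, the 3% tolerance, the 1000 Hz template limit, and all names are assumptions.

```python
def template_score(f0, peaks, f_max=1000.0, tolerance=0.03):
    """Score how well a harmonic template at f0 matches spectral peaks (Hz)."""
    harmonics = []
    k = 1
    while k * f0 <= f_max:
        harmonics.append(k * f0)
        k += 1
    if not harmonics or not peaks:
        return 0.0

    def close(a, b):
        return abs(a - b) <= tolerance * b

    filled = sum(1 for h in harmonics if any(close(p, h) for p in peaks))
    explained = sum(1 for p in peaks if any(close(p, h) for h in harmonics))
    # Reward templates that are well filled and that explain every peak.
    return (filled / len(harmonics)) * (explained / len(peaks))


def estimate_pitch(peaks, candidates):
    """Return the candidate fundamental with the highest template score."""
    return max(candidates, key=lambda f0: template_score(f0, peaks))


# Harmonics 3 to 5 of 200 Hz; the fundamental itself is absent.
peaks = [600.0, 800.0, 1000.0]
candidates = [float(f) for f in range(80, 501, 10)]
print(estimate_pitch(peaks, candidates))
```

The example uses a missing-fundamental stimulus because it shows why a template is needed: no peak sits at 200 Hz, yet the 200 Hz template is the one that accounts for all three peaks with the fewest empty slots.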

Page 37:

Pitch Estimates

[Figure: two panels, pre-cortical processing and post-cortical processing, each with frequency ticks from 125 to 2000.]

Page 38:

Estimating Pitch; Extracting Pitch Streams

[Figure: panels A and B, including pitch tracks and panels labeled Female, Male, Male+Female, and Extracted Female. Axes: time (s), to 10, against frequency ticks at .125 and 2.]

Page 39:

Voice Morphing

Page 40:

Manipulating Temporal and Spectral Modulations

[Figure: four spectrograms labeled Normal, Temporally smeared, Spectrally smeared, and Temporally sharpened. Axes: time (ms), 100 to 1000, against frequency ticks from 125 to 2000.]
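How the smeared versions were produced is not stated. As a crude stand-in, smearing can be pictured as a moving average along one axis of the spectrogram, which attenuates fast modulations along that axis. The sketch below makes that assumption explicit; the function name, window handling, and toy data are hypothetical, and sharpening is not covered.

```python
def smear(spectrogram, width, along_time=True):
    """Moving average over `width` neighbors (odd) along time or frequency.

    spectrogram[k][n] is channel k at time frame n. Windows are truncated
    at the edges rather than padded.
    """
    half = width // 2
    if not along_time:
        # Transpose, smooth along the former channel axis, transpose back.
        columns = [list(col) for col in zip(*spectrogram)]
        smoothed = smear(columns, width, along_time=True)
        return [list(row) for row in zip(*smoothed)]
    result = []
    for row in spectrogram:
        out = []
        for n in range(len(row)):
            window = row[max(0, n - half):n + half + 1]
            out.append(sum(window) / len(window))
        result.append(out)
    return result


# A single brief event in one channel of a toy spectrogram.
toy = [[0.0, 0.0, 3.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0]]
print(smear(toy, 3, along_time=True)[0])
print([row[2] for row in smear(toy, 3, along_time=False)])
```

Averaging along time spreads the event over neighboring frames within its channel, while averaging along frequency spreads it into neighboring channels within its frame.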

Page 41:

Morphing Voices

[Figure: three spectrograms labeled Female, Oboe, and Female Oboe. Axes: time (ms) against frequency ticks from 125 to 2000.]

Page 42:

Auditory Speech and Music Processing
Tai Chi, Mounya El-Hilali, Powen Ru

Cortical Physiology and Auditory Computations
Didier Depireux, Jonathan Fritz, David Klein, Jonathan Simon

Acknowledgment

Supported by:
MURI # N00014-97-1-0501 from the Office of Naval Research
# NIDCD T32 DC00046-01 from the NIDCD
# NSFD CD8803012 from the National Science Foundation