Cortical Encoding of Auditory Objects at the Cocktail Party


Jonathan Z. Simon

University of Maryland

Computational Audition Workshop, June 2013

Introduction

• Auditory Objects

• Magnetoencephalography (MEG)

• Neural Representations of Auditory Objects in Cortex: Decoding

• Neural Representations of Auditory Objects in Cortex: Encoding

Auditory Objects

• What is an auditory object?

• perceptual construct (not neural, not acoustic)

• commonalities with visual objects

• several potential formal definitions

Auditory Object Definition

• Griffiths & Warren definition:

• corresponds with something in the sensory world

• object information is separate from that of the rest of the sensory world

• abstracted: object information generalizes over particular sensory experiences

Auditory Objects at the Cocktail Party

[painting: Alex Katz, The Cocktail Party]

Ding & Simon, PNAS (2012)

Experiments

Magnetoencephalography (MEG)

[diagram: MEG vs. EEG recording geometry — current flow through tissue, CSF, skull, and scalp; orientation of the magnetic dipolar field relative to the recording surface; projection onto the sensors]

• Measures spatially synchronized cortical activity

• Direct electrophysiological measurement (not hemodynamic; real-time)

• Fine temporal resolution (~1 ms); moderate spatial resolution (~1 cm)

• No unique solution for a distributed source

Photo by Fritz Goro


Phase-Locking in MEG to Slow Temporal Modulations

[figure: AM at 3 Hz elicits a 3 Hz phase-locked response; response spectrum (subject R0747) shows peaks at 3 Hz and 6 Hz over a 0–10 Hz frequency axis]

MEG activity is precisely phase-locked to slow temporal modulations of sound.

Ding & Simon, J Neurophysiol (2009); Wang et al., J Neurophysiol (2012)

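To make the phase-locking analysis concrete, here is a minimal Python sketch (not the studies' actual pipeline; all signal parameters are assumptions) of how a response spectrum like the one above is obtained: averaging over trials retains only the phase-locked response, whose spectrum then peaks at the modulation frequency and its harmonic.

```python
# Minimal sketch (illustrative assumptions throughout): a phase-locked
# response spectrum from trial-averaged data.
import numpy as np

fs = 200.0                      # sample rate (Hz), assumed
n_trials, dur = 50, 10.0        # trial count and duration (s), assumed
t = np.arange(0, dur, 1 / fs)

# Synthetic "MEG channel": a response phase-locked to 3 Hz AM, plus a
# 6 Hz harmonic, buried in trial-varying noise.
rng = np.random.default_rng(0)
trials = (np.sin(2 * np.pi * 3 * t) + 0.4 * np.sin(2 * np.pi * 6 * t)
          + 2.0 * rng.standard_normal((n_trials, t.size)))

# Averaging across trials keeps only the phase-locked part;
# non-phase-locked noise cancels.
evoked = trials.mean(axis=0)

# Magnitude spectrum of the trial average: peaks at 3 Hz and 6 Hz.
freqs = np.fft.rfftfreq(t.size, 1 / fs)
spectrum = np.abs(np.fft.rfft(evoked)) / t.size

for f0 in (3.0, 6.0):
    k = np.argmin(np.abs(freqs - f0))
    print(f"{freqs[k]:.1f} Hz: {spectrum[k]:.3f}")
```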

MEG Responses to Speech Modulations

[schematic: auditory model → Spectro-Temporal Response Function (STRF) → MEG responses predicted by the STRF model (up to ~10 Hz)]

Ding & Simon, J Neurophysiol (2012)
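As a concrete illustration of the forward model, the sketch below estimates an STRF by ridge regression from a time-lagged spectrogram to a single MEG response component. This is a stand-in, not the papers' estimator (Ding & Simon used a boosting algorithm); all shapes and the regularization constant are assumptions.

```python
# Minimal sketch of forward STRF estimation by ridge regression.
# Shapes and parameters are illustrative assumptions.
import numpy as np

fs = 100                     # feature/response sample rate (Hz), assumed
n_freq, n_time = 32, 6000    # spectrogram: 32 bands x 60 s, assumed
n_lags = 25                  # lags 0..240 ms at 100 Hz

rng = np.random.default_rng(1)
spec = rng.standard_normal((n_freq, n_time))   # stand-in for a real spectrogram
resp = rng.standard_normal(n_time)             # stand-in for one MEG component

# Lagged design matrix X: row t holds spec[:, t - lag] for
# lag = 0..n_lags-1, flattened over (lag, frequency).
X = np.zeros((n_time, n_lags * n_freq))
for lag in range(n_lags):
    X[lag:, lag * n_freq:(lag + 1) * n_freq] = spec[:, :n_time - lag].T

# Ridge solution: w = (X'X + lambda*I)^-1 X'y
lam = 1e3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ resp)
strf = w.reshape(n_lags, n_freq).T    # frequency x lag, as in STRF plots

# Predicted response and its fit to the measured one
pred = X @ w
print("prediction r =", np.corrcoef(pred, resp)[0, 1])
```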

Neural Reconstruction of Speech Envelope

[schematic: MEG responses → decoder → speech envelope; trace: stimulus speech envelope vs. reconstructed stimulus speech envelope, 2 s scale bar]

Reconstruction accuracy is comparable to single-unit & ECoG recordings (up to ~10 Hz).

Ding & Simon, J Neurophysiol (2012); Zion-Golumbic et al., Neuron (in press)

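The decoding direction can be sketched the same way: a linear backward model maps time-lagged MEG channels onto the speech envelope, and reconstruction accuracy is scored by correlation. Again a minimal sketch with assumed channel counts, lags, and regularization, not the published pipeline.

```python
# Minimal sketch of envelope reconstruction (backward model).
# Channel counts, lags, and regularization are assumptions.
import numpy as np

fs = 100                    # sample rate (Hz), assumed
n_ch, n_time = 50, 6000     # MEG channels x samples, assumed
n_lags = 30                 # lags 0..290 ms, assumed

rng = np.random.default_rng(2)
meg = rng.standard_normal((n_ch, n_time))   # stand-in for MEG data
env = rng.standard_normal(n_time)           # stand-in for speech envelope

# Lagged design matrix: the decoder reconstructs env[t] from each
# channel's activity over a window of lags after t.
X = np.zeros((n_time, n_lags * n_ch))
for lag in range(n_lags):
    X[:n_time - lag, lag * n_ch:(lag + 1) * n_ch] = meg[:, lag:].T

lam = 1e4
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)
recon = X @ w

# Reconstruction accuracy is quantified as the correlation between
# the reconstructed and actual envelopes.
print("reconstruction r =", np.corrcoef(recon, env)[0, 1])
```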

Speech Stream as an Auditory Object

• corresponds with something in the sensory world

• information is separate from that of the rest of the sensory world (e.g., other speech streams or noise)

• abstracted: object information generalizes over particular sensory experiences (e.g., different sound mixtures)

Neural Representation of an Auditory Object

• neural representation is of something in the sensory world

• when other sounds are mixed in, the neural representation is of the auditory object, not of the entire acoustic scene

• neural representation is invariant under broad changes in the specific acoustics

Unselective vs. Selective Neural Encoding

Stream-Specific Representation

[figure: speech envelopes reconstructed from MEG while attending to speaker 1 vs. attending to speaker 2, overlaid on the attended speech envelopes; grand average over subjects and a representative subject. Identical stimuli!]


Single Trial Speech Reconstruction

Overall Speech Reconstruction

[bar plot: correlation (0–0.2) of the attended-speech reconstruction and the background reconstruction with the attended speech and the background]

Distinct neural representations for different speech streams.
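A minimal sketch of how such reconstructions support single-trial attention decoding: correlate the envelope reconstructed from one trial's MEG with each speaker's envelope and pick the better match. The function names and synthetic data here are illustrative assumptions.

```python
# Minimal sketch of single-trial attention decoding from a
# reconstructed envelope (illustrative names and data).
import numpy as np

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def decode_attention(recon_env, env_spk1, env_spk2):
    """Return (attended speaker 1 or 2, r with spk1, r with spk2)."""
    r1, r2 = corr(recon_env, env_spk1), corr(recon_env, env_spk2)
    return (1, r1, r2) if r1 > r2 else (2, r1, r2)

# Toy check with synthetic envelopes: the reconstruction is a noisy
# copy of speaker 1, so decoding should return speaker 1.
rng = np.random.default_rng(3)
e1, e2 = rng.standard_normal(1000), rng.standard_normal(1000)
recon = e1 + 0.8 * rng.standard_normal(1000)
print(decode_attention(recon, e1, e2))   # -> (1, ...)
```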

Invariance Under Acoustic Changes

Stream-Based Gain Control?

[plot: reconstruction correlation (0.1–0.2) vs. speaker relative intensity (−8, −5, 0, 5, 8 dB) for attended and background streams]

Gain-Control Models

[plot: predicted correlation vs. speaker relative intensity under an object-based and a stimulus-based gain-control model]

Neural Results

[plot: measured correlation (0.1–0.2) vs. speaker relative intensity (−8 to 8 dB) for attended and background streams]


• Stream-based, not stimulus-based.

• The neural representation is invariant to acoustic changes.
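A toy numerical contrast of the two hypotheses (purely illustrative, not the paper's model): under stimulus-based gain the reconstruction correlation should track the attended stream's share of acoustic power in the mixture, while under object-based gain it should stay roughly flat across relative intensity. The mapping from gain to correlation below is an assumption made only to show the qualitative shapes.

```python
# Toy illustration of the two gain-control hypotheses' qualitative
# predictions. The gain-to-correlation mapping is assumed, not derived.
import numpy as np

rel_db = np.array([-8.0, -5.0, 0.0, 5.0, 8.0])   # attended re: background

def stimulus_based(rel_db, r0=0.15):
    # Gain follows the stream's share of acoustic power in the mixture.
    p = 10 ** (rel_db / 10)
    return r0 * 2 * p / (p + 1)          # normalized so r(0 dB) = r0

def object_based(rel_db, r0=0.15):
    # Gain is normalized per stream: correlation ~flat across intensity.
    return np.full_like(rel_db, r0)

for db, rs, ro in zip(rel_db, stimulus_based(rel_db), object_based(rel_db)):
    print(f"{db:+.0f} dB  stimulus-based r={rs:.3f}  object-based r={ro:.3f}")
```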

Neural Representation of an Auditory Object

✓ neural representation is of something in the sensory world

✓ when other sounds are mixed in, the neural representation is of the auditory object, not of the entire acoustic scene

✓ neural representation is invariant under broad changes in the specific acoustics

Forward STRF Model

Spectro-Temporal Response Function (STRF)

STRF Results

[figure: STRFs for background and attended speech; frequency axis 0.2–3 kHz, time axis 0–200 ms; TRF traces for attended vs. unattended speech]

• STRF separable in time and frequency

• Dominant carriers: 300 Hz – 2 kHz

• M50_STRF: positive peak

• M100_STRF: negative peak

• M100_STRF strongly modulated by attention, but not M50_STRF

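The separability claim can be made concrete: a fully separable STRF is the outer product of a spectral profile and a temporal response function (TRF), hence rank 1, so the fraction of power in its first singular component serves as a separability index. The sketch below builds a synthetic separable STRF with M50-like and M100-like TRF peaks and checks this; all shapes are assumptions.

```python
# Minimal sketch: SVD-based separability check of an STRF, with a
# synthetic stand-in STRF (illustrative shapes throughout).
import numpy as np

n_freq, n_lags = 32, 50          # frequency bands x time lags, assumed
freq = np.linspace(0, 1, n_freq)
lags_ms = np.linspace(0, 245, n_lags)

# A separable toy STRF: (spectral profile) x (TRF) with a positive
# peak near 50 ms and a negative peak near 100 ms.
spectral = np.exp(-0.5 * ((freq - 0.4) / 0.2) ** 2)
trf = (np.exp(-0.5 * ((lags_ms - 50) / 15) ** 2)
       - 1.3 * np.exp(-0.5 * ((lags_ms - 100) / 20) ** 2))
strf = np.outer(spectral, trf)

# Separability index: share of power in the first singular component
# (1.0 means perfectly separable in time and frequency).
s = np.linalg.svd(strf, compute_uv=False)
print("separability =", (s[0] ** 2 / np.sum(s ** 2)).round(3))

# TRF peak latencies: M50-like positive and M100-like negative peaks.
print("positive peak at", lags_ms[np.argmax(trf)], "ms")
print("negative peak at", lags_ms[np.argmin(trf)], "ms")
```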

Neural Sources

[source map: left and right hemispheres, anterior/posterior/medial axes; M50_STRF, M100_STRF, and M100 source locations; 5 mm scale bar]

• M100_STRF source near (the same as?) the M100 source: planum temporale (PT)

• M50_STRF source is anterior and medial to M100 (the same as M50?): Heschl's gyrus (HG)

Cortical Object-Processing Hierarchy

[TRF plots, time axis 0–400 ms. Attentional Modulation: attended vs. background. Influence of Relative Intensity: clean, 8 dB, 5 dB, 0 dB, −5 dB, −8 dB]

• M100_STRF strongly modulated by attention, but not M50_STRF.

• M100_STRF invariant against acoustic changes.

• Objects are well represented neurally at 100 ms latency, but not at 50 ms.

Not Just Speech: Competing Tone Streams

Xiang et al., J Neuroscience (2010); Elhilali et al., PLoS Biology (2009)

Tone Stream in Masker Cloud

[schematic: time–frequency plot of a repeating tone stream embedded in a cloud of masker tones]
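For concreteness, here is a minimal sketch of how a tone-stream-in-masker-cloud stimulus can be synthesized: a fixed-frequency target tone repeats at a regular rate while masker tones with random frequencies and onsets fill each period. All frequencies, rates, and durations are illustrative assumptions, not the published stimulus parameters.

```python
# Minimal sketch of a tone-stream-in-masker-cloud stimulus
# (all parameters are illustrative assumptions).
import numpy as np

fs = 16000
tone_dur, gap = 0.075, 0.125          # 75 ms tones every 200 ms, assumed
n_reps, f_target = 15, 1000.0         # repeating 1 kHz target, assumed

def tone(f, dur, fs):
    t = np.arange(int(dur * fs)) / fs
    ramp = np.minimum(1, np.minimum(t, t[::-1]) / 0.005)  # 5 ms on/off ramps
    return ramp * np.sin(2 * np.pi * f * t)

period = int((tone_dur + gap) * fs)
stim = np.zeros(n_reps * period)

rng = np.random.default_rng(4)
for i in range(n_reps):
    # Target: fixed frequency, regular rhythm -> tends to stream out.
    stim[i * period:i * period + int(tone_dur * fs)] += tone(f_target, tone_dur, fs)
    # Maskers: random frequencies and onsets within each period.
    for _ in range(3):
        f = 10 ** rng.uniform(np.log10(300), np.log10(8000))
        onset = i * period + rng.integers(0, period - int(tone_dur * fs))
        stim[onset:onset + int(tone_dur * fs)] += 0.7 * tone(f, tone_dur, fs)

stim /= np.abs(stim).max()            # normalize to avoid clipping
```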

Competing Tone Streams

[schematic: time–frequency plot of two competing tone streams]

Summary

• Cortical representations of speech found here:

✓ consistent with being neural representations of auditory (perceptual) objects

✓ meet the three formal criteria for auditory objects

• Object representation fully formed by 100 ms latency (PT), but not by 50 ms (HG)

• Not special to speech

Acknowledgements

Funding: NIH R01 DC 008342

Collaborators: Catherine Carr, Alain de Cheveigné, Didier Depireux, Mounya Elhilali, Jonathan Fritz, Cindy Moss, David Poeppel, Shihab Shamma

Past Postdocs: Dan Hertz, Yadong Wang

Grad Students: Francisco Cervantes, Marisel Villafane Delgado, Kim Drnec, Krishna Puvvada

Past Grad Students: Nayef Ahmar, Claudia Bonin, Maria Chait, Nai Ding, Victor Grau-Serrat, Ling Ma, Raul Rodriguez, Juanjuan Xiang, Kai Sum Li, Jiachen Zhuo

Undergraduate Students: Abdulaziz Al-Turki, Nicholas Asendorf, Sonja Bohr, Elizabeth Camenga, Corinne Cameron, Julien Dagenais, Katya Dombrowski, Kevin Hogan, Kevin Kahn, Andrea Shome, Madeleine Varmer, Ben Walsh

Collaborators' Students: Murat Aytekin, Julian Jenkins, David Klein, Huan Luo

Thank You

Reconstruction of Same-Sex Speech

[scatter plot, Single-Trial Decoding Results: reconstruction correlation with Speaker One vs. Speaker Two (axes 0–0.4), marked by attended speaker; bar plot: correlation (0.1–0.2) of the attended-speech reconstruction with attended vs. background speech]

Speech in Noise: Stimuli

Speech in Noise: Results
