1
The Cocktail Party Problem Acknowledgement The authors thank Krishna Puvvada for interesting discussions. This project was supported by NIH grant 1R01AG036424. Forward Model: Attention Model: Inverse Model: Spk2 (Female) Spk2 Spk1 Attended Spk2 Attended MEG Spk1 MEG (Male) Spk1 x 10 -5 0 125 250 375 500 -6 -4 -2 0 2 4 6 8 50 t (ms) M50 TRF M100 TRF Sink Source 50ft/Step 0 1 Our Contribution: 1) Parsimonious use of covariates 2) High temporal resolusion ~ Seconds 3) Scalability Existing Techniques: 1) Full spectrotemporal features as covariates 2) Low temporal Resolution ~ Minutes Objectives Convex, but highly non-linear and coupled in time. Efficient solution: two nested EM algorithms * ** **** *** Time (s) 10 20 30 40 50 0 0.5 1 10 dB 60 10 20 30 40 50 60 −5 0 5 10 20 30 40 50 60 −5 0 5 10 20 30 40 50 60 0 0.5 1 (x 10 ) −3 (x 10 ) −3 Time (s) Spk2 Attended Spk2 Attended MEG Spk2 MEG Spk1 10 20 30 40 50 0 0.5 1 0 dB −10 dB −20 dB 60 References Speech Segregation: Identifying and tracking a target speaker, corrupted by acoustic interference 1 . Neural Activity at the Cortical level: Strongly modulated by low-frequency temporal modula- tions (envelope) of the attended target speaker 2 . Magnetic field map of the auditory MEG component (DSS) 3 1 Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the aoustical society of America, 25(5), 975-979. 2 Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859. 3 de Cheveigne, A., & Simon, J. Z. (2008). Denoising based on spatial filtering. Journal of neuroscience methods, 171(2), 331-339. Speech envelope Temporal receptive field Noise Auditory MEG attending to Spk2 attending to Spk1 MAP Estimate: Maximize » » » » Outer EM iteration : E-Step: Compute * M-Step: Update and ** Inner EM iteration : E-Step: Compute *** M-Step: Update **** end end MEG Spk2 MEG Spk1 Spk2 Attended Spk2 Attended MEG Spk2 MEG Spk1 Time(s) Time(s) MEG Spk2 MEG Spk1 MEG Spk2 MEG Spk1 Spk1 Attended 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 0 0.5 1 10 20 30 40 50 0 0.5 1 60 60 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 0 0.5 1 10 20 30 40 50 0 0.5 1 60 60 Sbj 1 Sbj 2 Classifier Sbj 1 Sbj 2 Classifier 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 0 0.5 1 10 20 30 40 50 0 0.5 1 60 60 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 60 −0.02 0 0.02 10 20 30 40 50 0 0.5 1 10 20 30 40 50 0 0.5 1 60 60 Sbj 1 Sbj 2 Classifier Sbj 1 Sbj 2 Classifier 1 0 1 0 Spk1 Attended Time(s) Time(s) Simulation Resaults von Mises Distribution The Inverse Solution 1 0 1 0 Spk1 Attended Spk1 Attended 1 0 1 0 The Inverse Problem: Estimating , given the observed data from trials. Spk1 Attended A Sate-Space Model for Decoding Auditory Attentional Modulation from MEG in a Competing-Speaker Environment Sahar Akram 1, 2 , Jonathan Z. Simon 1, 2, 3 , Shihab Shamma 1, 2 , Behtash Babadi 1, 2 1 Department of Electrical and Computer Engineering, 2 Institute for Systems Reasearch, 3 Department of Biology, University of Maryland, College Park, MD [email protected], [email protected], [email protected], [email protected] Application to Real Data 1 0 Spk2 Attended 0 \pi/2 \pi 0 0.5 1 1.5 2 2.5 3 κ = 0.5 κ = 1 κ = 2 κ = 4 κ = 8 θ (radian) “von Mises” density π/2 π The Proposed Model Decoupled in time with tractable non-linear operations Task Task Task Task Task Task Task

A Sate-Space Model for Decoding Auditory Attentional ... Akram-Bebadi... · MEG Spk2 MEG Spk1 10 20 30 40 50 0 0.5 1 0 dB −10 dB −20 dB 60 References Speech Segregation: Identifying

Embed Size (px)

Citation preview

Page 1: A Sate-Space Model for Decoding Auditory Attentional ... Akram-Bebadi... · MEG Spk2 MEG Spk1 10 20 30 40 50 0 0.5 1 0 dB −10 dB −20 dB 60 References Speech Segregation: Identifying

The Cocktail Party Problem

AcknowledgementThe authors thank Krishna Puvvada for interesting discussions. This project was supported by NIH grant 1R01AG036424.

Forward Model:

Attention Model:

Inverse Model:

Spk2(Female)

Spk2

Spk1

Attended

Spk2 Attended

MEG

Spk1

MEG

(Male)Spk1

x 10-5

0 125 250 375 500-6-4-202468

50t (ms)

M50TRF

M100TRF

Sink Source

50ft/Step

0

1

Our Contribution:1) Parsimonious use of covariates2) High temporal resolusion

~ Seconds3) Scalability

Existing Techniques:1) Full spectrotemporal features as

covariates2) Low temporal Resolution

~ Minutes

Objectives

Convex, but highly non-linear and coupled in time.Efficient solution: two nested EM algorithms

*

**

****

***

Time (s)10 20 30 40 500

0.5

1

10 dB

60

10 20 30 40 50 60−5

0

5

10 20 30 40 50 60−5

0

5

10 20 30 40 50 600

0.5

1

(x 10 )−3

(x 10 )−3

Time (s)

Spk2 Attended Spk2 Attended

MEGSpk2

MEGSpk1

10 20 30 40 500

0.5

1

0 dB−10 dB−20 dB

60

References

Speech Segregation: Identifying and tracking a target speaker, corrupted by acoustic interference 1 .

Neural Activity at the Cortical level:Strongly modulated by low-frequency temporal modula-tions (envelope) of the attended target speaker 2 .

Magnetic field map of the auditory MEG component (DSS) 3

1 Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the aoustical society of America, 25(5), 975-979.

2 Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.

3 de Cheveigne, A., & Simon, J. Z. (2008). Denoising based on spatial filtering. Journal of neuroscience methods, 171(2), 331-339.

Speech envelope

Temporal receptive field

NoiseAuditory MEG

attending to Spk2attending to Spk1

MAP Estimate: Maximize

»

»

»»

Outer EM iteration :

E-Step: Compute *

M-Step: Update and ** Inner EM iteration : E-Step: Compute

***

M-Step: Update ****

end end

MEGSpk2

MEGSpk1

Spk2 Attended

Spk2 AttendedMEGSpk2

MEGSpk1

Time(s) Time(s)

MEGSpk2

MEGSpk1

MEGSpk2

MEGSpk1

Spk1 Attended

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 500

0.5

1

10 20 30 40 500

0.5

1

60

60

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 500

0.5

1

10 20 30 40 500

0.5

1

60

60

Sbj 1Sbj 2Classifier

Sbj 1Sbj 2Classifier

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 500

0.5

1

10 20 30 40 500

0.5

1

60

60

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 50 60−0.02

0

0.02

10 20 30 40 500

0.5

1

10 20 30 40 500

0.5

1

60

60

Sbj 1Sbj 2Classifier

Sbj 1Sbj 2Classifier

1

0

1

0Spk1 Attended

Time(s) Time(s)

Simulation Resaults

von Mises Distribution

The Inverse Solution1

0

1

0Spk1 Attended Spk1 Attended

1

0

1

0

The Inverse Problem: Estimating , given the observed data from trials.

Spk1 Attended

A Sate-Space Model for Decoding Auditory Attentional Modulation from MEG in a Competing-Speaker EnvironmentSahar Akram1, 2, Jonathan Z. Simon1, 2, 3, Shihab Shamma1, 2, Behtash Babadi1, 2

1Department of Electrical and Computer Engineering, 2 Institute for Systems Reasearch, 3 Department of Biology, University of Maryland, College Park, [email protected], [email protected], [email protected], [email protected]

Application to Real Data1

0

Spk2 Attended

0 \pi/2 \pi0

0.5

1

1.5

2

2.5

3

κ = 0.5κ = 1κ = 2κ = 4κ = 8

θ(radian)

“von

Mis

es”

dens

ity

π/2 π

The Proposed Model

Decoupled in time with tractable non-linear operations

Task Task

Task Task

Task Task

Task