SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De Lee

Emad M. Grais, Hakan Erdogan

17th International Conference on Digital Signal Processing, 2011


Outline

INTRODUCTION

NON-NEGATIVE MATRIX FACTORIZATION

SIGNAL SEPARATION AND MASKING

EXPERIMENTS AND DISCUSSION

CONCLUSION


Introduction

There are two main stages in this work:
– Training stage
– Separation stage

NMF is used together with different types of masks to improve the separation process:
– The separation process becomes faster
– NMF needs fewer iterations


Introduction

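Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t): x(t) = s(t) + m(t)
– Let X(t,f), S(t,f) and M(t,f) be the STFTs of x(t), s(t) and m(t), so that

$$|X(t,f)|\,e^{j\phi_X(t,f)} = |S(t,f)|\,e^{j\phi_S(t,f)} + |M(t,f)|\,e^{j\phi_M(t,f)}$$

– Assume the sources have the same phase angle as the mixed signal, $\phi_S(t,f) = \phi_M(t,f) = \phi_X(t,f)$, so that the magnitudes add:

$$|X(t,f)| = |S(t,f)| + |M(t,f)|$$

– In matrix form, with X, S and M denoting the magnitude spectrograms:

$$X = S + M$$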
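The same-phase assumption is what makes the magnitudes additive. The short NumPy check below is a hypothetical illustration on random frames, not part of the original method: it compares |FFT(s + m)| with |FFT(s)| + |FFT(m)|. The two agree exactly when m is a scaled copy of s (every bin shares the same phase); for independent signals only the triangle inequality |X| ≤ |S| + |M| holds.

```python
import numpy as np

def magnitudes(s_frame, m_frame):
    """Return |FFT(s + m)| and |FFT(s)| + |FFT(m)| for one frame."""
    X = np.abs(np.fft.rfft(s_frame + m_frame))
    S = np.abs(np.fft.rfft(s_frame))
    M = np.abs(np.fft.rfft(m_frame))
    return X, S + M

rng = np.random.default_rng(0)
s = rng.standard_normal(512)

# Case 1: m is a scaled copy of s, so S and M have the same phase in every bin.
X_same, sum_same = magnitudes(s, 0.5 * s)
print(np.allclose(X_same, sum_same))  # True: |X| = |S| + |M|

# Case 2: independent sources; phases differ, so the assumption is only an approximation.
m = rng.standard_normal(512)
X_diff, sum_diff = magnitudes(s, m)
print(np.all(X_diff <= sum_diff + 1e-9))  # True: triangle inequality
```

The gap in case 2 is the modelling error that the method accepts in exchange for working purely with nonnegative magnitude spectrograms.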


Non-negative Matrix Factorization

Non-negative matrix factorization algorithm

$$V_{n \times m} \approx B_{n \times d}\, W_{d \times m}$$

Minimization problem

$$\min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \ge 0$$

Different cost functions C of NMF
– Euclidean distance
– KL divergence


Non-negative Matrix Factorization

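Euclidean distance cost function

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left(V_{i,j} - (BW)_{i,j}\right)^2$$

KL divergence cost function

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left( V_{i,j} \log\frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$

Multiplicative update algorithm (KL divergence cost)

$$W \leftarrow W \otimes \frac{B^T \left(\frac{V}{BW}\right)}{B^T \mathbf{1}}, \qquad B \leftarrow B \otimes \frac{\left(\frac{V}{BW}\right) W^T}{\mathbf{1}\, W^T}$$

where $\otimes$ is element-wise multiplication, the divisions are element-wise, and $\mathbf{1}$ is a matrix of ones of the same size as V.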
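The two cost functions and the multiplicative updates can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the random initialisation, the iteration count, the function names and the small constant `EPS` that guards against division by zero and log(0) are all assumptions.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero and log(0)

def euclidean_cost(V, BW):
    """Sum over i, j of (V_ij - (BW)_ij)^2."""
    return np.sum((V - BW) ** 2)

def kl_cost(V, BW):
    """Sum over i, j of V_ij log(V_ij / (BW)_ij) - V_ij + (BW)_ij."""
    return np.sum(V * np.log((V + EPS) / (BW + EPS)) - V + BW)

def nmf_kl(V, d, n_iter=200, seed=0):
    """Factor a nonnegative n x m matrix V as B (n x d) @ W (d x m) with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    B = rng.random((n, d)) + EPS
    W = rng.random((d, m)) + EPS
    ones = np.ones_like(V)  # the matrix of ones, same size as V
    for _ in range(n_iter):
        W *= (B.T @ (V / (B @ W + EPS))) / (B.T @ ones + EPS)
        B *= ((V / (B @ W + EPS)) @ W.T) / (ones @ W.T + EPS)
    return B, W

rng = np.random.default_rng(1)
V = rng.random((20, 30))
B1, W1 = nmf_kl(V, d=5, n_iter=1)
B, W = nmf_kl(V, d=5, n_iter=200)
print(kl_cost(V, B1 @ W1) > kl_cost(V, B @ W))  # True: more iterations, lower KL cost
```

Because every factor in the updates is nonnegative, B and W stay nonnegative without any explicit projection, which is why the multiplicative form is convenient here.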


Non-negative Matrix Factorization

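In the training stage, NMF is applied to the magnitude spectrograms of the speech and music training data, giving one basis matrix per source:

$$S_{train} \approx B_{speech} W_{speech}, \qquad M_{train} \approx B_{music} W_{music}$$

A larger number of basis vectors gives:
– Lower approximation error
– A redundant set of basis vectors
– More computation time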
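The training stage can be sketched as two independent KL-NMF runs, one per source, keeping only the basis matrices. This is a hedged sketch: the stand-in random spectrograms, the basis count of 32 per source and the iteration count are hypothetical choices, not values from the slides; the 257 rows assume a 512-point FFT.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero

def nmf_kl(V, d, n_iter=100, seed=0):
    """KL multiplicative-update NMF: V (n x m) ~ B (n x d) @ W (d x m)."""
    rng = np.random.default_rng(seed)
    B = rng.random((V.shape[0], d)) + EPS
    W = rng.random((d, V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        W *= (B.T @ (V / (B @ W + EPS))) / (B.T @ ones + EPS)
        B *= ((V / (B @ W + EPS)) @ W.T) / (ones @ W.T + EPS)
    return B, W

def train_bases(S_train, M_train, d_speech, d_music, n_iter=100):
    """Learn B_speech and B_music separately; the weight matrices are discarded."""
    B_speech, _ = nmf_kl(S_train, d_speech, n_iter)
    B_music, _ = nmf_kl(M_train, d_music, n_iter)
    return B_speech, B_music

# Hypothetical stand-ins for the training magnitude spectrograms (257 frequency bins).
rng = np.random.default_rng(0)
S_train = rng.random((257, 200))
M_train = rng.random((257, 150))
B_speech, B_music = train_bases(S_train, M_train, d_speech=32, d_music=32)
print(B_speech.shape, B_music.shape)  # (257, 32) (257, 32)
```

The `d_speech` and `d_music` arguments are where the trade-off listed above shows up: larger values fit the training data more closely but cost more computation at both training and separation time.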


Signal Separation and Masking

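NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal, with the trained speech and music bases placed side by side:

$$X \approx [B_{speech}\ B_{music}]\, W, \qquad W = \begin{bmatrix} W_S \\ W_M \end{bmatrix}$$

The initial spectrogram estimates for the speech and music signals are then calculated as follows:

$$\tilde S = B_{speech} W_S, \qquad \tilde M = B_{music} W_M$$

where $W_S$ and $W_M$ are the submatrices of W whose rows correspond to the speech and music basis vectors, respectively.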
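A minimal sketch of this step, assuming (as is usual in this supervised setup) that the stacked basis matrix is held fixed and only W is updated with the KL multiplicative rule. The function name `separate_nmf`, the iteration count and the synthetic test data are hypothetical.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero

def separate_nmf(X, B_speech, B_music, n_iter=50, seed=0):
    """Return the initial estimates (S_tilde, M_tilde) from a mixture magnitude spectrogram X."""
    B = np.hstack([B_speech, B_music])      # [B_speech B_music], kept fixed
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + EPS
    denom = B.T @ np.ones_like(X) + EPS     # B^T 1 is constant because B is fixed
    for _ in range(n_iter):
        W *= (B.T @ (X / (B @ W + EPS))) / denom
    d_s = B_speech.shape[1]
    W_S, W_M = W[:d_s], W[d_s:]             # rows of W for the speech and music bases
    return B_speech @ W_S, B_music @ W_M

# Hypothetical bases and a synthetic mixture built from them.
rng = np.random.default_rng(0)
B_speech = rng.random((257, 20))
B_music = rng.random((257, 20))
X = B_speech @ rng.random((20, 50)) + B_music @ rng.random((20, 50))

S0, M0 = separate_nmf(X, B_speech, B_music, n_iter=0)   # random W, no updates
S_tilde, M_tilde = separate_nmf(X, B_speech, B_music, n_iter=100)
err0 = np.linalg.norm(X - (S0 + M0)) / np.linalg.norm(X)
err = np.linalg.norm(X - (S_tilde + M_tilde)) / np.linalg.norm(X)
print(err < err0)  # True: the updates improve the fit of S_tilde + M_tilde to X
```

Keeping B fixed means each iteration only costs two matrix products with B, which is one reason a small `n_iter` is attractive when a mask is applied afterwards.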


Signal Separation and Masking

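The initial estimated spectrograms $\tilde S$ and $\tilde M$ are used to build a mask as follows:

$$H = \frac{\tilde S^p}{\tilde S^p + \tilde M^p}$$

where the powers and the division are element-wise and p is a parameter of the mask.

Source signal reconstruction:

$$\hat S = H \otimes X, \qquad \hat M = (\mathbf{1} - H) \otimes X$$

where $\mathbf{1}$ is a matrix of ones and $\otimes$ is element-wise multiplication.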
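The mask and the reconstruction are a few element-wise NumPy operations. In this sketch the helper names and the small constant `EPS` (which sets H to 0 where both estimates are zero) are assumptions; the direct power form is adequate for moderate p.

```python
import numpy as np

EPS = 1e-12  # assumption: avoids 0/0 where both estimates vanish

def spectral_mask(S_tilde, M_tilde, p):
    """H = S~^p / (S~^p + M~^p), element-wise."""
    Sp = np.power(S_tilde, p)
    Mp = np.power(M_tilde, p)
    return Sp / (Sp + Mp + EPS)

def apply_mask(X, H):
    """S_hat = H * X and M_hat = (1 - H) * X, element-wise."""
    return H * X, (1.0 - H) * X

X = np.array([[1.0, 2.0], [3.0, 4.0]])
S_tilde = np.array([[1.0, 1.0], [3.0, 0.0]])
M_tilde = np.array([[1.0, 3.0], [1.0, 4.0]])
H = spectral_mask(S_tilde, M_tilde, p=1)
S_hat, M_hat = apply_mask(X, H)
print(np.round(H, 3))  # [[0.5 0.25] [0.75 0.]] as a 2 x 2 array
```

Since H and 1 - H sum to one, S_hat + M_hat reproduces X exactly: the mask only redistributes the mixture energy between the two sources.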


Signal Separation and Masking

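Two specific values of p correspond to special masks:
– Wiener filter (soft mask), p = 2:

$$H_{Wiener} = \frac{\tilde S^2}{\tilde S^2 + \tilde M^2}$$

– Hard mask (p → ∞):

$$H_{hard} = \operatorname{round}\left(\frac{\tilde S^2}{\tilde S^2 + \tilde M^2}\right)$$

so each entry of $H_{hard}$ is 1 where $\tilde S > \tilde M$ and 0 where $\tilde S < \tilde M$.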
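The two special cases can be checked numerically. This sketch (hypothetical helper names and test values) writes the general mask as 1 / (1 + (M~/S~)^p), which is algebraically equal to the definition, confirms that p = 2 gives the Wiener mask, and shows that a large p approaches the hard mask. Note that `np.round` rounds 0.5 to 0, so ties where S~ = M~ go to the music side.

```python
import numpy as np

def wiener_mask(S_tilde, M_tilde):
    """Soft mask: the p = 2 member of the family."""
    return S_tilde**2 / (S_tilde**2 + M_tilde**2)

def hard_mask(S_tilde, M_tilde):
    """Hard mask: rounded p = 2 mask (1 where S~ > M~, 0 where S~ < M~)."""
    return np.round(wiener_mask(S_tilde, M_tilde))

def power_mask(S_tilde, M_tilde, p):
    """General mask S~^p / (S~^p + M~^p), rewritten as 1 / (1 + (M~/S~)^p); assumes S~ > 0."""
    return 1.0 / (1.0 + (M_tilde / S_tilde) ** p)

S_tilde = np.array([0.5, 0.9, 1.1, 4.0])
M_tilde = np.ones(4)
print(hard_mask(S_tilde, M_tilde))  # [0. 0. 1. 1.]
print(np.allclose(power_mask(S_tilde, M_tilde, 2), wiener_mask(S_tilde, M_tilde)))  # True
print(np.allclose(power_mask(S_tilde, M_tilde, 200), hard_mask(S_tilde, M_tilde), atol=1e-6))  # True
```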


Signal Separation and Masking

Figure: the value of the mask versus the linear ratio $\tilde S / \tilde M$ for different values of p
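Dividing the numerator and denominator of the mask by $\tilde M^p$ shows why such curves depend only on the linear ratio $r = \tilde S / \tilde M$, and how p controls how quickly the mask saturates. The worked values at r = 2 below are simple arithmetic, not results from the paper.

```latex
H = \frac{\tilde S^p}{\tilde S^p + \tilde M^p} = \frac{r^p}{1 + r^p},
\qquad r = \frac{\tilde S}{\tilde M}

% Every curve passes through H = 1/2 at r = 1, whatever p is.
% At r = 2:  p = 1 \Rightarrow H = 2/3,\quad p = 2 \Rightarrow H = 4/5,\quad p = 4 \Rightarrow H = 16/17
% As p \to \infty:  H \to 0 \text{ for } r < 1,\quad H \to 1 \text{ for } r > 1 \quad (\text{hard mask})
```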


Experiments and Discussion

Simulation
– 16 kHz sampling rate
– Speech
  • Training speech data: 540 short utterances
  • Testing speech data: 20 utterances
– Music
  • 38 pieces for training
  • 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
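The analysis settings above translate into a magnitude spectrogram with 257 frequency bins per frame. This is a sketch under one stated assumption: the hop size is not given on the slides, so 256 samples (50% overlap) is a hypothetical choice, as is the 440 Hz test tone.

```python
import numpy as np

FS = 16000   # sampling rate (from the setup)
N_WIN = 512  # Hamming window length (from the setup)
N_FFT = 512  # FFT size (from the setup)
HOP = 256    # assumption: 50% overlap; the hop size is not stated

def magnitude_and_phase(x):
    """Return |STFT| and phase, each shaped (N_FFT // 2 + 1, n_frames)."""
    window = np.hamming(N_WIN)
    n_frames = 1 + (len(x) - N_WIN) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_WIN] * window for i in range(n_frames)], axis=1)
    spec = np.fft.rfft(frames, n=N_FFT, axis=0)
    return np.abs(spec), np.angle(spec)

# One second of a 440 Hz tone as a stand-in signal.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 440 * t)
mag, phase = magnitude_and_phase(x)
print(mag.shape)  # (257, 61)
```

The phase is returned as well because, under the same-phase assumption, separated magnitudes would be combined with the mixture phase before an inverse STFT.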


Experiments and Discussion

Performance measurement of the separation


Conclusion

The family of masks has a parameter, p, that controls the saturation level of the mask.

The proposed algorithm gives better separation results and speeds up the separation process, since NMF needs fewer iterations.