SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De,Lee

Emad M. Grais, Hakan Erdogan

17th International Conference on Digital Signal Processing, 2011

Outline

INTRODUCTION

NON-NEGATIVE MATRIX FACTORIZATION

SIGNAL SEPARATION AND MASKING

EXPERIMENTS AND DISCUSSION

CONCLUSION

Introduction

There are two main stages in this work
– Training stage
– Separation stage

NMF is used with different types of masks to improve the separation process
– The separation process becomes faster
– NMF can be run with fewer iterations

Introduction

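Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t), so that x(t) = s(t) + m(t)

– In the STFT domain, where X(t, f) is the STFT of x(t)

$$X(t,f) = S(t,f) + M(t,f)$$

$$|X(t,f)| e^{j\phi_X(t,f)} = |S(t,f)| e^{j\phi_S(t,f)} + |M(t,f)| e^{j\phi_M(t,f)}$$

– Assume the sources have the same phase angle as the mixed signal, so that the magnitudes add

$$|X(t,f)| = |S(t,f)| + |M(t,f)|$$

In matrix form, with X, S and M denoting the magnitude spectrograms

X = S + M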
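The relation between the complex STFTs and the magnitude spectrograms can be checked numerically. The following is a minimal sketch, assuming a hand-written NumPy STFT with a 512-point Hamming window and a hypothetical hop of 256 samples (the slides do not state the hop size); the two sinusoids are stand-ins for speech and music, not real data. It shows that the complex STFTs add exactly, while the magnitudes only satisfy the triangle inequality unless the phases coincide, which is why the equal-phase assumption is needed.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hamming-windowed STFT, returned as (frequency bins x frames)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 220.0 * t)              # stand-in "speech" source
m = 0.5 * np.sin(2 * np.pi * 880.0 * t + 1.0)  # stand-in "music" source
x = s + m                                      # observed mixture

S_c, M_c, X_c = stft(s), stft(m), stft(x)
X_mag, X_phase = np.abs(X_c), np.angle(X_c)    # magnitude and phase of the mixture

# Gap between |S| + |M| and |X|; it is zero only where the phases agree.
gap = np.abs(S_c) + np.abs(M_c) - X_mag
```

The mixture phase `X_phase` is the quantity that is later reused when the separated magnitude spectrograms are converted back to complex spectra.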

Non-negative Matrix Factorization

Non-negative matrix factorization algorithm

$$V_{[n \times m]} \approx B_{[n \times d]} W_{[d \times m]}$$

Minimization problem

$$\min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \geq 0$$

Different cost functions C of NMF
– Euclidean distance
– KL divergence


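Euclidean distance cost function

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$$

KL divergence cost function

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$

Multiplicative Update Algorithm

$$B \leftarrow B \otimes \frac{\frac{V}{BW} W^T}{\mathbf{1} W^T}$$

$$W \leftarrow W \otimes \frac{B^T \frac{V}{BW}}{B^T \mathbf{1}}$$

where $\mathbf{1}$ is a matrix of ones with the same size as V, $\otimes$ is element-wise multiplication, and all divisions are element-wise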
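The multiplicative updates for the KL divergence cost can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `kl_divergence` and `nmf_kl`, the random initialization, the small constant `eps` that guards against division by zero, and the synthetic input matrix are all choices made here.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence: sum(V * log(V / V_hat) - V + V_hat)."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

def nmf_kl(V, d, n_iter=100, seed=0, eps=1e-12):
    """Factorize V (n x m) as B (n x d) times W (d x m) with non-negative factors."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    B = rng.random((n, d)) + eps
    W = rng.random((d, m)) + eps
    ones = np.ones_like(V)
    costs = [kl_divergence(V, B @ W)]
    for _ in range(n_iter):
        # B <- B * ((V / BW) W^T) / (1 W^T), all element-wise
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        # W <- W * (B^T (V / BW)) / (B^T 1), all element-wise
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
        costs.append(kl_divergence(V, B @ W))
    return B, W, costs

# Synthetic non-negative data with an exact rank-4 structure.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))
B, W, costs = nmf_kl(V, d=4, n_iter=200)
```

Because the updates only multiply by non-negative ratios, B and W stay non-negative as long as they are initialized that way, so the constraint needs no separate projection step.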


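In the training stage, the magnitude spectrograms of the speech and music training data, $S_{train}$ and $M_{train}$, are factorized by NMF

$$S_{train} \approx B_{speech} W_{speech}$$

$$M_{train} \approx B_{music} W_{music}$$

Larger number of basis vectors
– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time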

Signal Separation and Masking

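NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal, with the trained bases kept fixed

$$X \approx [B_{speech} \; B_{music}] W$$

The initial spectrogram estimates for the speech and music signals are respectively calculated as follows

$$\tilde{S} = B_{speech} W_S$$

$$\tilde{M} = B_{music} W_M$$

where $W_S$ and $W_M$ are the submatrices of W that correspond to the speech and music bases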
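The separation-stage decomposition can be sketched as NMF in which only the weights are updated. The example below is a sketch under assumptions: it uses the KL multiplicative update for W, random matrices as stand-ins for the trained speech and music bases, and a hypothetical helper name `fit_weights`; the basis sizes 8 and 6 are arbitrary.

```python
import numpy as np

def fit_weights(X, B, n_iter=200, seed=0, eps=1e-12):
    """KL multiplicative updates for W only; the basis matrix B stays fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    return W

rng = np.random.default_rng(2)
n_bins, n_frames, d_speech, d_music = 257, 40, 8, 6
B_speech = rng.random((n_bins, d_speech))   # stand-in for trained speech bases
B_music = rng.random((n_bins, d_music))     # stand-in for trained music bases
X = (B_speech @ rng.random((d_speech, n_frames))
     + B_music @ rng.random((d_music, n_frames)))  # synthetic mixture magnitudes

B = np.hstack([B_speech, B_music])          # [B_speech  B_music]
W = fit_weights(X, B)
W_S, W_M = W[:d_speech], W[d_speech:]       # rows matching each set of bases
S_tilde = B_speech @ W_S                    # initial speech estimate
M_tilde = B_music @ W_M                     # initial music estimate
rel_err = np.linalg.norm(X - B @ W) / np.linalg.norm(X)
```

Splitting W by rows works because the columns of the stacked basis matrix are ordered speech first, then music, so the first `d_speech` rows of W weight the speech bases.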


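Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows

$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$

Source signals reconstruction

$$\hat{S} = H \otimes X$$

$$\hat{M} = (\mathbf{1} - H) \otimes X$$

where $\mathbf{1}$ is a matrix of ones, $\otimes$ is element-wise multiplication, and the powers and the division in H are element-wise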
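Building the mask and applying it to the mixture is a purely element-wise operation. The sketch below assumes random arrays in place of real spectrogram estimates, a hypothetical function name `spectral_mask`, and a small `eps` to avoid division by zero; the last line reuses the mixture phase, following the equal-phase assumption made in the problem formulation.

```python
import numpy as np

def spectral_mask(S_tilde, M_tilde, p, eps=1e-12):
    """H = S~^p / (S~^p + M~^p), computed element-wise."""
    Sp, Mp = S_tilde ** p, M_tilde ** p
    return Sp / (Sp + Mp + eps)

rng = np.random.default_rng(3)
shape = (257, 10)
S_tilde = rng.random(shape)                  # stand-in initial speech estimate
M_tilde = rng.random(shape)                  # stand-in initial music estimate
X = rng.random(shape)                        # stand-in mixture magnitude spectrogram
X_phase = rng.uniform(-np.pi, np.pi, shape)  # stand-in mixture phase

H = spectral_mask(S_tilde, M_tilde, p=2)
S_hat = H * X                                # estimated speech magnitudes
M_hat = (1.0 - H) * X                        # estimated music magnitudes
S_complex = S_hat * np.exp(1j * X_phase)     # complex spectrum with the mixture phase
```

Since H and 1 - H sum to one in every time-frequency cell, the two estimates always add back up to the mixture spectrogram X, whatever the value of p.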


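Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2

$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$

– Hard mask, the limit p → ∞

$$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$$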


The value of the mask versus the linear ratio for different values of p
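The curves described by this caption can be reproduced numerically. The sketch below assumes that the "linear ratio" means r = S̃ / (S̃ + M̃), that is, the mask for p = 1; under that assumption the mask can be rewritten as H = r^p / (r^p + (1 - r)^p). The function name `mask_from_ratio` and the chosen values of p are illustrative.

```python
import numpy as np

def mask_from_ratio(r, p):
    """Mask value H as a function of the linear ratio r = S / (S + M)."""
    return r ** p / (r ** p + (1.0 - r) ** p)

r = np.linspace(0.05, 0.95, 19)              # linear ratio axis
curves = {p: mask_from_ratio(r, p) for p in (1, 2, 4, 50)}

# Away from r = 0.5, a large p approaches the hard (rounded) mask.
off_center = np.abs(r - 0.5) > 1e-6
hard = np.round(curves[2][off_center])
max_gap = np.max(np.abs(curves[50][off_center] - hard))
```

With p = 1 the mask equals the linear ratio itself, p = 2 gives the Wiener filter, and increasing p pushes the curve toward a step at r = 0.5, which is the saturation behaviour controlled by the parameter.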

Experiments and Discussion

Simulation
– 16 kHz sampling rate
– Speech
• Training speech data: 540 short utterances
• Testing speech data: 20 utterances
– Music
• 38 pieces for training
• 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
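These settings fix the time and frequency resolution of the spectrograms. The short calculation below is a sketch that assumes only the non-redundant half of the real FFT is kept; the hop size is not given in the slides and is therefore left out.

```python
fs = 16000       # sampling rate in Hz
n_fft = 512      # Hamming window length and FFT size in samples

window_ms = 1000.0 * n_fft / fs   # duration of one analysis frame
n_bins = n_fft // 2 + 1           # non-negative frequency bins of a real FFT
bin_hz = fs / n_fft               # spacing between adjacent frequency bins
```

Under that assumption each frame covers 32 ms of audio, and every spectrogram column, and hence every NMF basis vector, has 257 entries spaced 31.25 Hz apart.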

Experiments and Discussion

Performance measurement of the separation




Conclusion

The family of masks has a parameter that controls the saturation level

The proposed algorithm gives better results and helps speed up the separation process
