SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De Lee

Emad M. Grais, Hakan Erdogan

17th International Conference on Digital Signal Processing, 2011

Outline

INTRODUCTION

NON-NEGATIVE MATRIX FACTORIZATION

SIGNAL SEPARATION AND MASKING

EXPERIMENTS AND DISCUSSION

CONCLUSION

Introduction

There are two main stages in this work
– Training stage
– Separation stage

NMF is combined with different types of masks to improve the separation process
– The separation process becomes faster
– NMF needs fewer iterations

Introduction

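Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)

$$X(t,f) = S(t,f) + M(t,f)$$

$$|X(t,f)|\, e^{j\phi_X(t,f)} = |S(t,f)|\, e^{j\phi_S(t,f)} + |M(t,f)|\, e^{j\phi_M(t,f)}$$

where X(t, f) is the STFT of x(t)

– Assume the sources have the same phase angle as the mixed signal

$$|X(t,f)| = |S(t,f)| + |M(t,f)|$$

In matrix form, with X, S and M the magnitude spectrograms:

X = S + M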
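The role of the shared-phase assumption can be checked numerically. The sketch below uses made-up magnitudes and phases (all values are hypothetical, not data from the presentation) to show that magnitudes add exactly when both sources share one phase, while in the general case only the triangle inequality holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source magnitudes on a small time-frequency grid.
S_mag = rng.uniform(0.1, 1.0, size=(2, 3))
M_mag = rng.uniform(0.1, 1.0, size=(2, 3))

# Shared phase: the assumption made in the problem formulation.
phase = rng.uniform(-np.pi, np.pi, size=(2, 3))
X_same = S_mag * np.exp(1j * phase) + M_mag * np.exp(1j * phase)

# Independent phases: the general case, where only |X| <= |S| + |M| holds.
phase_m = rng.uniform(-np.pi, np.pi, size=(2, 3))
X_diff = S_mag * np.exp(1j * phase) + M_mag * np.exp(1j * phase_m)

additive_same = np.allclose(np.abs(X_same), S_mag + M_mag)
bounded_diff = bool(np.all(np.abs(X_diff) <= S_mag + M_mag + 1e-12))
```

With real speech and music the phases differ, so X = S + M on magnitudes is an approximation rather than an identity.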

Non-negative Matrix Factorization

Non-negative matrix factorization algorithm

$$V_{[n \times m]} \approx B_{[n \times d]}\, W_{[d \times m]}$$

Minimization problem

$$\min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \geq 0$$

Different cost functions C of NMF
– Euclidean distance
– KL divergence

Non-negative Matrix Factorization

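Euclidean distance cost function

$$\min_{B,W} C(V, BW), \qquad C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$$

KL divergence cost function

$$\min_{B,W} C(V, BW), \qquad C(V, BW) = \sum_{i,j} \left( V_{i,j} \log\frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$

Multiplicative Update Algorithm

$$B \leftarrow B \otimes \frac{\frac{V}{BW}\, W^T}{\mathbf{1}\, W^T}$$

$$W \leftarrow W \otimes \frac{B^T\, \frac{V}{BW}}{B^T\, \mathbf{1}}$$

where 1 is a matrix of ones with the same size as V, ⊗ is element-wise multiplication, and the divisions are element-wise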
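A minimal NumPy sketch of multiplicative updates for the KL divergence cost, assuming the standard Lee-Seung form of the update rules. The function names `nmf_kl` and `kl_divergence`, the random initialization, the iteration count and the small constant `eps` are illustrative choices, not details taken from the presentation.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between non-negative matrices V and V_hat."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, d, n_iter=100, seed=0, eps=1e-12):
    """Factorize V (n x m) as B (n x d) @ W (d x m) with non-negative factors."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Positive random initialization (an assumption of this sketch).
    B = rng.uniform(0.1, 1.0, size=(n, d))
    W = rng.uniform(0.1, 1.0, size=(d, m))
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # B <- B * ((V / BW) W^T) / (1 W^T), element-wise
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        # W <- W * (B^T (V / BW)) / (B^T 1), element-wise
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W
```

Because every update multiplies by a non-negative ratio, B and W stay non-negative without any explicit projection step.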

Non-negative Matrix Factorization

The training magnitude spectrograms S_Train and M_Train are factorized by NMF

$$S_{Train} \approx B_{speech}\, W_{speech}$$

$$M_{Train} \approx B_{music}\, W_{music}$$

Larger number of basis vectors
– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time

Signal Separation and Masking

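NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal

$$X \approx [B_{speech} \;\; B_{music}]\, W$$

The initial spectrogram estimates for the speech and music signals are calculated, respectively, as follows

$$\tilde{S} = B_{speech}\, W_S$$

$$\tilde{M} = B_{music}\, W_M$$

where W_S and W_M are the submatrices of W that correspond to the speech and music bases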
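The decomposition step can be sketched as follows, assuming the trained bases are held fixed and only the weights W are updated with the KL multiplicative rule. The function name `decompose_with_fixed_bases`, the iteration count and `eps` are hypothetical and chosen for illustration.

```python
import numpy as np

def decompose_with_fixed_bases(X, B_speech, B_music, n_iter=50, seed=0, eps=1e-12):
    """Return initial estimates (S_tilde, M_tilde) of the two magnitude spectrograms."""
    # Stack the trained bases side by side; they are not updated here.
    B = np.hstack([B_speech, B_music])
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(B.shape[1], X.shape[1]))
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # KL multiplicative update for W only.
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    # Split W into the rows that weight the speech and the music bases.
    d_s = B_speech.shape[1]
    W_S, W_M = W[:d_s], W[d_s:]
    return B_speech @ W_S, B_music @ W_M
```

Updating only W is cheaper than a full factorization, and the number of iterations is the quantity that the masking stage is meant to reduce.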

Signal Separation and Masking

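Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows

$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$

Source signals reconstruction

$$\hat{S} = H \otimes X$$

$$\hat{M} = (\mathbf{1} - H) \otimes X$$

where 1 is a matrix of ones, ⊗ is element-wise multiplication, and the powers and division in H are element-wise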
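A short sketch of the masking step, assuming all powers and divisions are element-wise. The function name `apply_mask` and the guard constant `eps` are illustrative additions, not part of the presentation.

```python
import numpy as np

def apply_mask(X, S_tilde, M_tilde, p=2.0, eps=1e-12):
    """Build the mask H from the initial estimates and apply it to the mixture X."""
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    # eps guards against division by zero where both estimates vanish.
    H = Sp / (Sp + Mp + eps)
    S_hat = H * X            # speech estimate
    M_hat = (1.0 - H) * X    # music estimate
    return S_hat, M_hat, H
```

By construction the two estimates always add up to the mixture spectrogram X, whatever the quality of the initial NMF estimates.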

Signal Separation and Masking

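Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2

$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$

– Hard mask, p = ∞

$$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$$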
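Why the hard mask arises as a limiting case can be seen from a short derivation. It is written per element, with the mask defined as the ratio of the p-th power of the speech estimate to the sum of the p-th powers of both estimates. The treatment of ties, where both estimates are equal, is an observation of this sketch, not something stated in the presentation.

```latex
H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}
  = \frac{1}{1 + \left( \tilde{M} / \tilde{S} \right)^p},
  \qquad \tilde{S} > 0 .

% p = 2 gives the Wiener-type soft mask directly.

\lim_{p \to \infty} H =
\begin{cases}
  1,           & \tilde{S} > \tilde{M}, \\
  \tfrac{1}{2}, & \tilde{S} = \tilde{M}, \\
  0,           & \tilde{S} < \tilde{M}.
\end{cases}

% Since \tilde{S}^2 / (\tilde{S}^2 + \tilde{M}^2) > 1/2 exactly when
% \tilde{S} > \tilde{M}, rounding the Wiener mask gives the same 0/1 values
% away from ties.
```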

Signal Separation and Masking

The value of the mask versus the linear ratio for different values of p
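The curves in such a plot can be reproduced with a few lines, assuming that "linear ratio" means r = S̃ / (S̃ + M̃), which is the mask for p = 1. Dividing the numerator and denominator of the mask by (S̃ + M̃)^p then gives H = r^p / (r^p + (1 − r)^p). The function name `mask_from_ratio` and the chosen values of p are illustrative.

```python
import numpy as np

def mask_from_ratio(r, p):
    """Mask value as a function of the linear ratio r = S / (S + M), 0 <= r <= 1."""
    r = np.asarray(r, dtype=float)
    return r ** p / (r ** p + (1.0 - r) ** p)

ratios = np.linspace(0.0, 1.0, 11)
# One curve per exponent; larger p pushes the mask toward 0 or 1 (more saturation).
curves = {p: mask_from_ratio(ratios, p) for p in (1, 2, 4, 10)}
```

Every curve passes through (0.5, 0.5), and the curves become steeper around that point as p grows.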

Experiments and Discussion

Simulation
– 16 kHz sampling rate
– Speech
  • Training speech data: 540 short utterances
  • Testing speech data: 20 utterances
– Music
  • 38 pieces for training
  • 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
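The analysis settings above can be turned into a magnitude spectrogram with plain NumPy. The hop size is not given in the presentation, so the value of 256 samples (50% overlap) is an assumption of this sketch, as are the function name `magnitude_spectrogram` and the synthetic 1 kHz test tone.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Return (magnitude, phase) of the STFT; rows are frequency bins, columns are frames."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames (one frame per column).
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    spectrum = np.fft.rfft(frames, n=n_fft, axis=0)
    return np.abs(spectrum), np.angle(spectrum)

fs = 16000                         # sampling rate from the experiment setup
t = np.arange(fs) / fs             # one second of samples
x = np.sin(2 * np.pi * 1000 * t)   # synthetic 1 kHz tone (not experiment data)
X_mag, X_phase = magnitude_spectrogram(x)
```

The magnitude matrix is what NMF factorizes, and the phase of the mixture is kept so that masked spectrograms can be converted back to time-domain signals.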

Experiments and Discussion

Performance measurement of the separation

Conclusion

The family of masks has a parameter p that controls the saturation level

Combining NMF with the masks gives better separation results and speeds up the separation process, because NMF needs fewer iterations