39
Mingzi Li Department of Electrical Engineering Supervised by: Prof. Israel Cohen November 2013 Multisensory speech enhancement in noisy environments using bone-conducted and air-conducted microphones

Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Mingzi Li

Department of Electrical Engineering

Supervised by: Prof. Israel Cohen

November 2013

Multisensory speech enhancement

in noisy environments using

bone-conducted and air-conducted microphones

Page 2: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Outline

Introduction

Review of existing methods

Probabilistic approach

Geometric extension approach

Summary

Page 3: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Introduction • Speech enhancement

• Multi-sensory speech enhancement

• Bone-conducted microphone

• Research objectives

Page 4: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Speech enhancement

Introduction

Mobile phones VoIP

Teleconferencing

Hearing

aids

Speech

recognition

Improve speech

quality

Page 5: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Multi-sensory speech enhancement

Audio-visual speech processing (G. Potamianos et al,2004)

Air and throat microphones (M. Graciarena et al,2003)

Ear plug (O. M. Strand et al, 2003)

Stethoscope device (P. Heracleous et al, 2003)

Aliph’s Jawbone headsets

Electromagnetic motion sensor (GEMS) (G. C. Burnett,1999)

Physiological microphone (P-Mic) (M. V. Scanlon,1998)

Electroglottograph (EGG) (M. Rothenberg,1992)

Bone-Conducted Microphone (T. Yanagisawa et al, 1975)

Introduction

Page 6: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Bone-conducted microphone

Introduction

Page 7: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Conducting path (K. Kondo et al,2006)

Conducting path of AC and BC microphones

• Bone conducted:

• Less noise & low frequency

• Air conducted:

• More noise & complete frequency

,

t t t t

t t t t t t

t

t t

t

t

t

Y X V U AC

B H X GV W BC

X Clean

H G Transfer function

V Noise

U AC sensor Noise

W BC sensor Noise

Model

Introduction

Page 8: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Signal and spectrogram (A. Subramanya et al,2008)

Waveforms and spectrograms of the signals captured by the ABC microphone. The first row

shows the signal captured by the air microphone and the second row shows the signal

captured by the bone microphone.

Introduction

Page 9: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Research objectives

BC microphone as a dominant sensor:

• Geometric harmonics method

• Laplacian pyramid method

Compare the proposed methods with an existing method

Introduction

Page 10: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Review of existing methods • BC microphone as a supplementary sensor

• BC microphone as a dominant sensor

Page 11: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Methods

BC m as a supplementary sensor BC m as a dominant sensor

BC m for voice activity detection Equalization: IDFT; DFT; LMS

BC m for pitch detection Analysis and synthesize: LP; LSF (Neural Network)

BC m for low frequency enhancement Probabilistic: ML; MMSE (DBN)

Review of existing methods

Page 12: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Voice activity detection (M.Zhu et al,2007)

Review of existing methods

Speech detection using bone sensors: The top figure illustrates the speech signal captured

by the bone sensor when two people are talking at the same time. The middle figure shows the signal

captured by the regular microphone and bottom figure presents the detection result.

Page 13: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Pitch detection (M. S. Rahman et al, 2010)

Review of existing methods

Left: Pitch tracking of air-conducted speech in noiseless condition. Center: Speech

spectrogram. Right: Pitch tracking of bone-conducted speech in noiseless condition. The

experiments have been conducted on four speeches.

Page 14: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Pitch detection (Cont.)

Review of existing methods

Pitch contours estimated from speech when corrupted by noise. a) pitch contours estimated

from air-conducted speech, b) pitch contours estimated from bone-conducted speech.

Page 15: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

BC m for low frequency enhancement (M. S. Rahman,2011)

Block diagram when BC Speech is used for low frequency enhancement.

Page 16: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Equalization

Review of existing methods

IDFT (T. Shimamura et al,2005)

DFT (K. Kondo,2006)

Least Mean Square (LMS) filter (T. Shimamura,2006)

Page 17: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Analysis and synthesize

Review of existing methods

Linear prediction (LP) filter (T. T. Vu,2006)

Line spectral frequency (LSF) filter (T. T. Vu,2008)

Page 18: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Probabilistic

Review of existing methods

Maximum likelihood estimation (MLE) (Z.Liu et al, 2004)

MMSE estimator (A. Subramanya et al, 2005)

Dynamic Bayesian Network (DBN) (A. Subramanya et al,2008)

,

,

( , ) ( , , , )

( , , , ) ( , , ) ( , )

t t t t t t t t

s m

t t t t t t t t t t t t

s m

p X Y B p X S s M m Y B

p X Y B S s M m p M m Y B S s p S s Y B

2

2

, , / 2

, , , / 2

NV

n l W

Y l k X l kR

B l k H l k X l k

1

0

( ) , , ,ˆ ,

, , , , , ( )t

P T l t Y l k B l kX l k

E X l k Y l k B l k T l t

Page 19: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Probabilistic approach • Model

• Network description

• Transfer function & leakage factor

• MMSE estimator

• Result

Page 20: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Model

Air conducted(AC):

Bone conducted(BC):

Background noise:

AC Sensor noise:

BC Sensor noise:

Optimal linear mapping:

Leakage of noise:

Probabilistic approach

t t t tY X V U

t t t t t tB H X GV W

2~ (0, )t vV N

2~ (0, )t uU N

2~ (0, )t wW N

tH

tG

Page 21: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Assumption

Feature: Magnitude-normalized complex spectra

Training: Speech model(k-means)

Method: dynamic Bayesian network (DBN)

( , , , , , , , , )

( , , ) ( , , ) ( ) ( , )

( , ) ( ) ( ) ( ) ( )

t t t t t t t t t

t t t t t t t t t t t t t

t t t t t t

p Y B X X V S M U W

p Y X V U p B X V W p X X p X M S

p M S p S p V p U p W

tt

t

XX

X

1 1, , ,..., , ..., , , 1 ,

, log log , 1

Tf f N N

i j i j i j i j

f f f f

i j i j

d X X d X X d X X d X X i j T

d X X X X f N

Probabilistic approach

Page 22: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Network description

Dynamic Bayesian network

Speech/non-speech

Mixture index

Match the clean speech

Normalized speech

Optimal linear mapping

Leakage noise

Probabilistic approach

Page 23: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Transfer function & leakage factor

Transfer function (non-speech)

Leakage factor (speech)

2

2 2 2 22 2 2 2 2 2 *

2 *

( ) ( ) 4

2

v v v

v

v t w t v t w t v w t tt N t N t N

t

v t tt N

B Y B Y B YG

B Y

2

2 2 2 22 2 2 2 2 2 *

2 *

( ' ) ( ' ) 4 ( ' )

2 ( ' )

s s s

s

v t w t v t w t v w t tt N t N t N

t t

v t tt N

B Y B Y B YH G

B Y

Probabilistic approach

Page 24: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

MMSE estimator & result

Estimator

Result

ˆ( , ) ( 0 , ) ( , , 0, 0)

( 0 , ) ( , , 1) ( , , 1, )

t t t t t t t t t t t t

t t t t t t t t t t t t

m

X E X Y B p S Y B E X Y B S M

p S Y B p M m Y B S E X Y B S M m

Probabilistic approach

Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering

speaker

Page 25: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Geometric extension approach • Model

• Nyström extension method

• Geometric harmonics

• Laplacian pyramid estimation

• Result

Page 26: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Model

Train: ( Mapping from concatenation of noisy AC and BC

speech to clean speech.)

Test: ( Extension of the mapping from concatenation of noisy

AC and BC speech to clean speech.)

512 256:t f R R

t t

t

YYB X

B

Geometric extension approach

512 256*

:* *

*

f R Rt

t t

t

YYB X

B

Page 27: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Nyström extension method (C.T.H. Baker, 1977)

Goal: extend relevant “information” about a large dataset in a

high dimensional space.

Method: find a low-rank approximation to a symmetric,

positive semi-definite kernel.

In essence: only use partial information about the kernel to

solve a simpler eigenvalue problem, and then to extend the

solution using complete knowledge of the kernel.

Geometric extension approach

Page 28: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Nyström extension method (C.T.H. Baker, 1977)

Eigen function approximation

Nyström extension

Geometric extension approach

Scheme of learning functions

(N. Rabin, 2012)

Page 29: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Geometric harmonics (GH) (R.R. Coifman, 2006)

Definition

Example

Gaussian extension

Harmonic extension

Wavelet extension

Geometric extension approach

Page 30: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Geometric harmonics (GH) (R.R. Coifman, 2006)

Eigenvector approximation

Extension

Geometric extension approach

Page 31: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Comments of GH

Need to tune the parameters

Extension of the function is not the original function but the

projection of the function.

The extension range has relation to the complexity of the

function.

, .l

Geometric extension approach

Page 32: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Laplacian pyramid (LP) (Burt and Adelson,1983)

Geometric extension approach

Page 33: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Laplacian pyramid (LP) (N. Rabin, 2012)

Algorithm: Kernel:

Iteration:

Estimation:

21

0 0 0 0 0exp( / ) ( ) ( , )i j i i jW x x K q x w x x

0 01

1

1 0 1

210

1

( ) ( , ) ( )

( ) ( ) ( ) ( ) ( ) ( )

exp( / ) ( ) ( , )2

( ) ( , ) ( )

n

k i k ii

l

k k k l k k i ki

l i j l l i l i jl

n

l k l i k l ii

s x k x x f x

d x f x s x d x f x s x

W x x K q x w x x

s x k x x d x

( ) ( )k l klf x s x

Geometric extension approach

Approximation with

current kernel and

residual

Residual

previous-current

Start with a

coarse kernel

Stop after multiple

iterations

Page 34: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Comments of LP

Kernel method

Improve(Iterate) Diffusion

Residual

LP

• Statistic analysis

Geometric extension approach

0

1

ˆ ( ) ( , ) ( )n

k i k i

i

f x k x x f x

1ˆ ˆ ˆ( ) ( ) ( ) ( )l k l k l k l kf x f x K f x f x

1 0ˆ ˆ ˆ( ) ( ) ( ) ( )l k l k k l kf x f x K f x f x

1 0ˆ ˆ ˆ( ) ( ) ( )l k l k l kf x f x K I f x

2

21 1

0

1 1 1 1

2

0

1 1

2 2 2

0

1

ˆ ( ) ( )

( , ) ( ) ... ( , ) ( ) ( ) ( )

( , ) ... ( , )

( , ) ... ( ,

k k

n n l l

i k i i l i k i i i i i k s

i i i i

n n

i k i l i k i s

i i

n

i k i l i k

i

MSE E f x f x

E k x x f x n k x x f x n s x s x e

E k x x n k x x n e

E k x x n k x x

2 2

1

22 2 2 2 2

1 1 1

)

( 2 ln( ( , )))( , )

ln 2

n

i s

i

ln L nl i k

l i k i s i s

i l i

n e

Ei k x xE k x x n e e

1

1 1( ) ( , ) ( ) ( ) ( )

ˆ ( ) ( ) ( ) ( ) ( ) ( )

n l

l k l i k i i i i l ki i

k k l k k l k k s

E s x E k x x f x n s x s x

Bias E f x f x E s x f x s x f x e

Model : ( ) ( )k k ky x f x n

Page 35: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Result (GH)

Geometric extension approach

Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering

speaker

Page 36: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Result (LP)

Geometric extension approach

Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering

speaker

Page 37: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Comparison of Log Spectral Distortion

Summary

Page 38: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Conclusion

Probabilistic approach scheme improves the quality of

reconstructed speech.

Geometric harmonics can not describe the map very well.

Laplacian pyramid method enable further noise reduction,

but at the cost of distortion for the reconstructed speech.

Summary

Page 39: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates

Future research

Geometric harmonics in a multi-scale manner.

Find the relation between iteration number and noise level

for Laplacian pyramids.

Further processing needs to reduce distortions of

reconstructed speech in geometric methods.

Summary