Mingzi Li
Department of Electrical Engineering
Supervised by: Prof. Israel Cohen
November 2013
Multisensory speech enhancement
in noisy environments using
bone-conducted and air-conducted microphones
Outline
Introduction
Review of existing methods
Probabilistic approach
Geometric extension approach
Summary
Introduction
• Speech enhancement
• Multi-sensory speech enhancement
• Bone-conducted microphone
• Research objectives
Speech enhancement
Applications: mobile phones, VoIP, teleconferencing, hearing aids, speech recognition
Goal: improve speech quality
Multi-sensory speech enhancement
Audio-visual speech processing (G. Potamianos et al,2004)
Air and throat microphones (M. Graciarena et al,2003)
Ear plug (O. M. Strand et al, 2003)
Stethoscope device (P. Heracleous et al, 2003)
Aliph’s Jawbone headsets
Electromagnetic motion sensor (GEMS) (G. C. Burnett,1999)
Physiological microphone (P-Mic) (M. V. Scanlon,1998)
Electroglottograph (EGG) (M. Rothenberg,1992)
Bone-Conducted Microphone (T. Yanagisawa et al, 1975)
Bone-conducted microphone
Conducting path (K. Kondo et al,2006)
Conducting path of AC and BC microphones
• Bone conducted: less noise, but mainly low-frequency content
• Air conducted: more noise, but the complete frequency range
Model
$$Y_t = X_t + V_t + U_t \quad \text{(AC)}$$
$$B_t = H_t X_t + G_t V_t + W_t \quad \text{(BC)}$$
• $X_t$: clean speech
• $H_t$, $G_t$: transfer functions
• $V_t$: noise
• $U_t$: AC sensor noise
• $W_t$: BC sensor noise
Signal and spectrogram (A. Subramanya et al,2008)
Waveforms and spectrograms of the signals captured by the ABC microphone. The first row
shows the signal captured by the air microphone and the second row shows the signal
captured by the bone microphone.
Research objectives
BC microphone as a dominant sensor:
• Geometric harmonics method
• Laplacian pyramid method
Compare the proposed methods with an existing method
Review of existing methods
• BC microphone as a supplementary sensor
• BC microphone as a dominant sensor
Methods
BC microphone as a supplementary sensor | BC microphone as a dominant sensor
Voice activity detection | Equalization: IDFT; DFT; LMS
Pitch detection | Analysis and synthesis: LP; LSF (neural network)
Low-frequency enhancement | Probabilistic: ML; MMSE (DBN)
Voice activity detection (M.Zhu et al,2007)
Speech detection using bone sensors: The top figure illustrates the speech signal captured
by the bone sensor when two people are talking at the same time. The middle figure shows the signal
captured by the regular microphone and bottom figure presents the detection result.
Pitch detection (M. S. Rahman et al, 2010)
Left: Pitch tracking of air-conducted speech in noiseless conditions. Center: Speech
spectrogram. Right: Pitch tracking of bone-conducted speech in noiseless conditions. The
experiments were conducted on four speech samples.
Pitch detection (Cont.)
Pitch contours estimated from speech when corrupted by noise. a) pitch contours estimated
from air-conducted speech, b) pitch contours estimated from bone-conducted speech.
BC microphone for low-frequency enhancement (M. S. Rahman, 2011)
Block diagram when BC Speech is used for low frequency enhancement.
Equalization
IDFT (T. Shimamura et al,2005)
DFT (K. Kondo,2006)
Least Mean Square (LMS) filter (T. Shimamura,2006)
Analysis and synthesis
Linear prediction (LP) filter (T. T. Vu,2006)
Line spectral frequency (LSF) filter (T. T. Vu,2008)
Probabilistic
Maximum likelihood estimation (MLE) (Z.Liu et al, 2004)
MMSE estimator (A. Subramanya et al, 2005)
Dynamic Bayesian Network (DBN) (A. Subramanya et al,2008)
$$p(X_t \mid Y_t, B_t) = \sum_{s,m} p(X_t, S_t = s, M_t = m \mid Y_t, B_t)$$
$$= \sum_{s,m} p(X_t \mid Y_t, B_t, S_t = s, M_t = m)\, p(M_t = m \mid Y_t, B_t, S_t = s)\, p(S_t = s \mid Y_t, B_t)$$
$$R = \sum_{l} \left( \frac{|Y(l,k) - X(l,k)|^2}{2\sigma_V^2} + \frac{|B(l,k) - H(l,k)\,X(l,k)|^2}{2\sigma_W^2} \right)$$
$$\hat{X}(l,k) = \sum_{t=0}^{1} P\big(T(l) = t \mid Y(l,k), B(l,k)\big)\, E\big[X(l,k) \mid Y(l,k), B(l,k), T(l) = t\big]$$
Probabilistic approach
• Model
• Network description
• Transfer function & leakage factor
• MMSE estimator
• Result
Model
Air conducted (AC): $Y_t = X_t + V_t + U_t$
Bone conducted (BC): $B_t = H_t X_t + G_t V_t + W_t$
Background noise: $V_t \sim \mathcal{N}(0, \sigma_v^2)$
AC sensor noise: $U_t \sim \mathcal{N}(0, \sigma_u^2)$
BC sensor noise: $W_t \sim \mathcal{N}(0, \sigma_w^2)$
Optimal linear mapping: $H_t$
Leakage of noise: $G_t$
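As a rough illustration of this observation model, the sketch below draws one noisy AC/BC frame from a given clean spectrum. The function name `simulate_frame`, the scalar values chosen for `h` and `g`, and the use of circular complex Gaussian noise are assumptions made for this example only; they are not taken from the slides.

```python
import numpy as np

def simulate_frame(x, h, g, sigma_v, sigma_u, sigma_w, rng):
    """Draw one (Y_t, B_t) pair for a clean complex spectrum x (illustrative)."""
    def complex_noise(sigma):
        # Circular complex Gaussian noise with total variance sigma**2 per bin.
        real = rng.standard_normal(x.shape)
        imag = rng.standard_normal(x.shape)
        return sigma / np.sqrt(2.0) * (real + 1j * imag)

    v = complex_noise(sigma_v)   # background noise V_t
    u = complex_noise(sigma_u)   # AC sensor noise U_t
    w = complex_noise(sigma_w)   # BC sensor noise W_t
    y = x + v + u                # Y_t = X_t + V_t + U_t
    b = h * x + g * v + w        # B_t = H_t X_t + G_t V_t + W_t
    return y, b

rng = np.random.default_rng(0)
x = np.ones(256, dtype=complex)  # hypothetical clean spectrum
y, b = simulate_frame(x, h=0.5, g=0.1, sigma_v=1.0,
                      sigma_u=0.05, sigma_w=0.05, rng=rng)
```

With a small leakage factor, the BC channel carries far less background noise than the AC channel, at the price of the clean speech being filtered by `h`.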
Assumption
Feature: Magnitude-normalized complex spectra
Training: Speech model(k-means)
Method: dynamic Bayesian network (DBN)
$$p(Y_t, B_t, X_t, \tilde{X}_t, V_t, S_t, M_t, U_t, W_t) = p(Y_t \mid X_t, V_t, U_t)\, p(B_t \mid X_t, V_t, W_t)\, p(X_t \mid \tilde{X}_t)\, p(\tilde{X}_t \mid M_t, S_t)\, p(M_t \mid S_t)\, p(S_t)\, p(V_t)\, p(U_t)\, p(W_t)$$
$$\tilde{X}_t = \frac{X_t}{\lVert X_t \rVert}$$
$$d(X_i, X_j) = \left[ d(X_i^1, X_j^1), \ldots, d(X_i^f, X_j^f), \ldots, d(X_i^N, X_j^N) \right]^T, \quad 1 \le i, j \le T$$
$$d(X_i^f, X_j^f) = \log X_i^f - \log X_j^f, \quad 1 \le f \le N$$
Network description
Dynamic Bayesian network
Speech/non-speech
Mixture index
Match the clean speech
Normalized speech
Optimal linear mapping
Leakage noise
Transfer function & leakage factor
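Leakage factor (estimated over the non-speech frames $N_v$):
$$G_t = \frac{\sum_{t \in N_v} \left( \sigma_v^2 |B_t|^2 - \sigma_w^2 |Y_t|^2 \right) \pm \sqrt{\left( \sum_{t \in N_v} \left( \sigma_v^2 |B_t|^2 - \sigma_w^2 |Y_t|^2 \right) \right)^2 + 4 \sigma_v^2 \sigma_w^2 \left| \sum_{t \in N_v} B_t^* Y_t \right|^2}}{2 \sigma_v^2 \sum_{t \in N_v} B_t^* Y_t}$$
Transfer function (estimated over the speech frames $N_s$):
$$H_t = G_t + \frac{\sum_{t \in N_s} \left( \sigma_v^2 |B'_t|^2 - \sigma_w^2 |Y_t|^2 \right) \pm \sqrt{\left( \sum_{t \in N_s} \left( \sigma_v^2 |B'_t|^2 - \sigma_w^2 |Y_t|^2 \right) \right)^2 + 4 \sigma_v^2 \sigma_w^2 \left| \sum_{t \in N_s} (B'_t)^* Y_t \right|^2}}{2 \sigma_v^2 \sum_{t \in N_s} (B'_t)^* Y_t}$$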
MMSE estimator & result
Estimator
$$\hat{X}_t = E[X_t \mid Y_t, B_t] = p(S_t = 0 \mid Y_t, B_t)\, E[X_t \mid Y_t, B_t, S_t = 0, M_t = 0] + p(S_t = 1 \mid Y_t, B_t) \sum_{m} p(M_t = m \mid Y_t, B_t, S_t = 1)\, E[X_t \mid Y_t, B_t, S_t = 1, M_t = m]$$
Result
Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering
speaker
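The MMSE estimator of this section is a posterior-weighted average of conditional means: one term for the non-speech state and a mixture-weighted sum for the speech state. A minimal sketch of that combination step follows, assuming the posteriors and conditional means have already been computed elsewhere; the function `mmse_combine` and its argument names are hypothetical.

```python
import numpy as np

def mmse_combine(p_speech, p_mix, x_silence, x_mix):
    """Combine conditional means into one estimate (illustrative).

    p_speech  : p(S_t = 1 | Y_t, B_t), a scalar in [0, 1]
    p_mix     : p(M_t = m | Y_t, B_t, S_t = 1), shape (M,), sums to 1
    x_silence : E[X_t | Y_t, B_t, S_t = 0, M_t = 0], shape (K,)
    x_mix     : E[X_t | Y_t, B_t, S_t = 1, M_t = m], shape (M, K)
    """
    p_mix = np.asarray(p_mix, dtype=float)
    x_mix = np.asarray(x_mix)
    # Sum over mixture components, weighted by their posteriors.
    speech_part = np.tensordot(p_mix, x_mix, axes=(0, 0))
    return (1.0 - p_speech) * np.asarray(x_silence) + p_speech * speech_part

x_hat = mmse_combine(
    p_speech=0.8,
    p_mix=[0.25, 0.75],
    x_silence=[0.0, 0.0],
    x_mix=[[1.0, 2.0], [3.0, 4.0]],
)
```

Here the speech-state term is 0.25·[1, 2] + 0.75·[3, 4] = [2.5, 3.5], so the combined estimate is 0.8·[2.5, 3.5] = [2.0, 2.8].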
Geometric extension approach
• Model
• Nyström extension method
• Geometric harmonics
• Laplacian pyramid estimation
• Result
Model
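Train: mapping from the concatenation of noisy AC and BC speech to clean speech.
$$f : \mathbb{R}^{512} \to \mathbb{R}^{256}, \qquad YB_t = \begin{bmatrix} Y_t \\ B_t \end{bmatrix} \mapsto X_t$$
Test: extension of the mapping from the concatenation of noisy AC and BC speech to clean speech.
$$f^* : \mathbb{R}^{512} \to \mathbb{R}^{256}, \qquad YB_t^* = \begin{bmatrix} Y_t^* \\ B_t^* \end{bmatrix} \mapsto X_t^*$$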
Nyström extension method (C.T.H. Baker, 1977)
Goal: extend relevant “information” about a large dataset in a
high dimensional space.
Method: find a low-rank approximation to a symmetric,
positive semi-definite kernel.
In essence: only use partial information about the kernel to
solve a simpler eigenvalue problem, and then to extend the
solution using complete knowledge of the kernel.
Eigen function approximation
Nyström extension
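In their standard textbook form, these two steps can be written as below. This is a sketch: the kernel $k$, the training points $x_1, \dots, x_n$, and the eigenpairs $(\lambda_l, \phi_l)$ are notation assumed here, not taken from the slides. The eigenfunctions are first approximated on the training set through the kernel matrix eigenproblem, and each one is then extended to a new point $x$ through the kernel.

```latex
% Eigenfunction approximation on the training points
\sum_{j=1}^{n} k(x_i, x_j)\, \phi_l(x_j) = \lambda_l\, \phi_l(x_i), \qquad i = 1, \dots, n

% Nystrom extension to a new point x
\bar{\phi}_l(x) = \frac{1}{\lambda_l} \sum_{j=1}^{n} k(x, x_j)\, \phi_l(x_j), \qquad \lambda_l \neq 0
```

Because of the division by $\lambda_l$, the extension is numerically stable only for eigenvalues that are not too small.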
Scheme of learning functions
(N. Rabin, 2012)
Geometric harmonics (GH) (R.R. Coifman, 2006)
Definition
Example
Gaussian extension
Harmonic extension
Wavelet extension
Eigenvector approximation
Extension
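A minimal sketch of how a function known on training points might be extended with geometric harmonics, assuming a Gaussian kernel and a relative eigenvalue cutoff. The names `gaussian_kernel`, `geometric_harmonics_extend`, `eps`, and `delta`, and the toy data, are choices made for this example rather than the configuration used in the presentation.

```python
import numpy as np

def gaussian_kernel(a, b, eps):
    """k(x, y) = exp(-||x - y||^2 / eps) for all row pairs of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / eps)

def geometric_harmonics_extend(x_train, f_train, x_new, eps, delta=1e-3):
    """Extend f from x_train to x_new using well-conditioned eigenvectors."""
    k_train = gaussian_kernel(x_train, x_train, eps)
    lam, phi = np.linalg.eigh(k_train)      # eigenvalues in ascending order
    keep = lam >= delta * lam[-1]           # drop ill-conditioned modes
    lam, phi = lam[keep], phi[:, keep]
    coeffs = phi.T @ f_train                # projection coefficients <f, phi_l>
    # Nystrom extension of every retained eigenvector to the new points.
    phi_new = gaussian_kernel(x_new, x_train, eps) @ phi / lam
    return phi_new @ coeffs

x_train = np.linspace(0.0, 1.0, 20)[:, None]
f_train = np.sin(2.0 * np.pi * x_train[:, 0])
x_new = np.array([[0.33], [0.71]])
f_new = geometric_harmonics_extend(x_train, f_train, x_new, eps=0.1)
```

Evaluated at the training points themselves, this returns the projection of `f_train` onto the retained eigenvectors rather than `f_train` itself. Both `eps` and `delta` have to be tuned for the data at hand.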
Comments on GH
The parameters need to be tuned.
The extension is not the original function but the projection of the function.
The range over which the function can be extended depends on the complexity of the function.
Laplacian pyramid (LP) (Burt and Adelson, 1983)
Laplacian pyramid (LP) (N. Rabin, 2012)
Algorithm
Kernel:
$$W_0 = \exp\left( -\lVert x_i - x_j \rVert^2 / \sigma_0 \right), \qquad K_0 = q_0^{-1}(x_i)\, w_0(x_i, x_j)$$
$$s_0(x_k) = \sum_{i=1}^{n} k_0(x_i, x_k)\, f(x_i)$$
Iteration:
$$d_1(x_k) = f(x_k) - s_0(x_k), \qquad d_l(x_k) = f(x_k) - \sum_{i=0}^{l-1} s_i(x_k)$$
$$W_l = \exp\left( -\lVert x_i - x_j \rVert^2 / (\sigma_0 / 2^l) \right), \qquad K_l = q_l^{-1}(x_i)\, w_l(x_i, x_j)$$
$$s_l(x_k) = \sum_{i=1}^{n} k_l(x_i, x_k)\, d_l(x_i)$$
Estimation:
$$f(x_k) \approx \sum_{l} s_l(x_k)$$
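The Laplacian pyramid iteration can be sketched in a few lines: smooth with a coarse kernel, then repeatedly smooth the remaining residual with a kernel whose scale is halved at each level. In the sketch below the kernel weights are normalized to sum to one over the training points for each evaluation point, which is one reading of the $q_l^{-1}$ normalization and is an assumption of this example; the function names and toy data are also hypothetical.

```python
import numpy as np

def _normalized_kernel(a, b, sigma):
    """Gaussian weights from rows of a to rows of b; each row sums to one."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / sigma)
    # Guard against an all-zero row when sigma is very small.
    return w / np.maximum(w.sum(axis=1, keepdims=True), np.finfo(float).tiny)

def laplacian_pyramid(x_train, f_train, x_new, sigma0, levels):
    """Multi-scale estimate of f at x_new from samples (x_train, f_train)."""
    approx_train = np.zeros_like(f_train, dtype=float)
    approx_new = 0.0
    for level in range(levels):
        sigma = sigma0 / 2 ** level              # finer kernel at each level
        residual = f_train - approx_train        # d_l on the training points
        # s_l evaluated at the new points and at the training points.
        approx_new = approx_new + _normalized_kernel(x_new, x_train, sigma) @ residual
        approx_train = approx_train + _normalized_kernel(x_train, x_train, sigma) @ residual
    return approx_new

x_train = np.linspace(0.0, 1.0, 50)[:, None]
f_train = np.sin(2.0 * np.pi * x_train[:, 0])
x_new = np.array([[0.33], [0.71]])
f_new = laplacian_pyramid(x_train, f_train, x_new, sigma0=1.0, levels=8)
```

The number of levels acts as the stopping rule. With noisy training values, running more levels fits the samples more closely, so the level count trades smoothing against fidelity.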
Scheme of the LP algorithm
• Start with a coarse kernel
• Approximation with the current kernel and the residual
• Residual: previous − current
• Stop after multiple iterations
Comments on LP
• Kernel method
• Improve (iterate): diffusion, residual
• LP
• Statistical analysis
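Model: $y(x_k) = f(x_k) + n_k$
$$\hat{f}_0(x_k) = \sum_{i=1}^{n} k_0(x_i, x_k)\, f(x_i)$$
$$\hat{f}_l(x_k) = \hat{f}_{l-1}(x_k) + K_l \left( f(x_k) - \hat{f}_{l-1}(x_k) \right)$$
With a fixed kernel $K_0$ at every level:
$$\hat{f}_l(x_k) = \hat{f}_{l-1}(x_k) + K_0 \left( f(x_k) - \hat{f}_{l-1}(x_k) \right)$$
$$\hat{f}_l(x_k) = \hat{f}_0(x_k) - (K_0 - I)\, \hat{f}_{l-1}(x_k)$$
Expectation of each level:
$$E[s_l(x_k)] = E\left[ \sum_{i=1}^{n} k_l(x_i, x_k) \left( f(x_i) + n_i - \sum_{j=0}^{l-1} s_j(x_i) \right) \right] = s_l(x_k)$$
Bias:
$$\text{Bias} = E[\hat{f}(x_k)] - f(x_k) = E\left[ \sum_{l} s_l(x_k) \right] - f(x_k) = \sum_{l} s_l(x_k) - f(x_k) = e_s$$
Mean squared error, for zero-mean, uncorrelated noise:
$$\text{MSE} = E\left[ \left( \hat{f}(x_k) - f(x_k) \right)^2 \right]$$
$$= E\left[ \left( \sum_{i=1}^{n} k_0(x_i, x_k) \left( f(x_i) + n_i \right) + \cdots + \sum_{i=1}^{n} k_l(x_i, x_k) \left( f(x_i) + n_i - \sum_{j=0}^{l-1} s_j(x_i) \right) - \sum_{j=0}^{l} s_j(x_k) + e_s \right)^2 \right]$$
$$= E\left[ \left( \sum_{i=1}^{n} k_0(x_i, x_k)\, n_i + \cdots + \sum_{i=1}^{n} k_l(x_i, x_k)\, n_i + e_s \right)^2 \right]$$
$$= E\left[ \left( \sum_{i=1}^{n} \left( k_0(x_i, x_k) + \cdots + k_l(x_i, x_k) \right) n_i \right)^2 \right] + e_s^2$$
$$= E\left[ \sum_{i=1}^{n} \left( \sum_{j=0}^{l} k_j(x_i, x_k) \right)^2 n_i^2 \right] + e_s^2$$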
Result (GH)
Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering
speaker
Result (LP)
Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering
speaker
Comparison of Log Spectral Distortion
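For reference, one common definition of log spectral distortion (LSD) is the root-mean-square difference between the log power spectra of the clean and reconstructed signals within each frame, averaged over frames. The sketch below follows that definition; the slides do not state the exact variant used for the comparison, so the averaging order, the dB scaling, and the `floor` parameter are assumptions.

```python
import numpy as np

def log_spectral_distortion(clean_mag, est_mag, floor=1e-10):
    """LSD in dB for magnitude spectrograms of shape (frames, bins)."""
    # Floor the magnitudes so that log10 never sees zero.
    clean_db = 20.0 * np.log10(np.maximum(clean_mag, floor))
    est_db = 20.0 * np.log10(np.maximum(est_mag, floor))
    # RMS over frequency bins per frame, then mean over frames.
    per_frame = np.sqrt(np.mean((clean_db - est_db) ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal((5, 8))) + 0.1   # hypothetical spectrogram
lsd_same = log_spectral_distortion(clean, clean)
lsd_gain = log_spectral_distortion(clean, 10.0 * clean)
```

Identical spectrograms give an LSD of zero, and a uniform amplitude gain of 10 gives 20 dB, which is a quick sanity check on the scaling.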
Summary
Conclusion
The probabilistic approach improves the quality of the reconstructed speech.
Geometric harmonics cannot describe the mapping well.
The Laplacian pyramid method enables further noise reduction, but at the cost of distortion in the reconstructed speech.
Future research
Apply geometric harmonics in a multi-scale manner.
Find the relation between the iteration number and the noise level for Laplacian pyramids.
Further processing is needed to reduce the distortion of the reconstructed speech in the geometric methods.