
SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Page 1: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT

S. Patil, S. Srinivasan, S. Prasad, R. Irwin, G. Lazarou and J. Picone

Intelligent Electronic Systems, Center for Advanced Vehicular Systems

Mississippi State University

URL: http://www.cavs.msstate.edu/hse/ies/publications/conferences/ieee_secon/2006/state_space_algorithms/

SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT

Page 2: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Recursive State-Space Filters

Signal model: models the state x(n) as the output of a linear system excited by white noise v1(n):

x(n+1) = F(n+1, n) x(n) + v1(n)

Observation model: relates the observable output y(n) of the system to the state x(n):

y(n) = H(n) x(n) + v2(n)

The filtering problem is posed as follows: using the set of observed data y(1), ..., y(n), find for each n ≥ 1 the minimum mean-square error (MMSE) estimate of the state vector x(n+1), and then estimate the clean signal s(n) = H(n) x(n) from the observation model y(n) = s(n) + v2(n).
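As a concrete illustration of the signal and observation models above, the short sketch below simulates a toy linear state space in numpy; the dimensions, matrices and noise variances are invented for illustration and are not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    F = np.array([[0.9, 0.1],        # state-transition matrix F(n+1, n)
                  [0.0, 0.8]])
    H = np.array([[1.0, 0.0]])       # measurement matrix H(n)
    q, r = 0.01, 0.1                 # process / measurement noise variances

    x = np.zeros(2)
    observations = []
    for n in range(200):
        v1 = rng.normal(0.0, np.sqrt(q), size=2)   # white process noise v1(n)
        v2 = rng.normal(0.0, np.sqrt(r))           # measurement noise v2(n)
        x = F @ x + v1                             # signal model
        y = H @ x + v2                             # observation model
        observations.append(y.item())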

Page 3: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Auto-Regressive Model for Speech

• It is common to model speech data as being generated by an AR process:

• This framework can be used in a conventional state-space setup as follows:

s(n) = Σ_{i=1}^{p} a_i s(n-i) + u(n)

where a_1, ..., a_p are the AR coefficients, p is the AR order, and u(n) is the driving/process noise.

State vector (filtered estimates):

X_k = [ŝ_{k-p+1}, ..., ŝ_{k-1}, ŝ_k]^T

State-evolution function, with F the companion matrix of the AR coefficients:

X_{k+1} = F X_k + U_k,   F = [ 0    1    0   ...  0
                               0    0    1   ...  0
                               ...
                               0    0    0   ...  1
                               a_p  a_{p-1}  ...  a_1 ]

Measurement function, where Y_k are the noisy observations and V_k is the measurement noise:

Y_k = H X_k + V_k,   H = [0, 0, ..., 1]
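A minimal numpy sketch of this construction; the AR(2) coefficients below are invented purely for illustration.

    import numpy as np

    def ar_state_space(a):
        # Build companion-form F and measurement row H for AR coefficients a = [a_1, ..., a_p]
        p = len(a)
        F = np.zeros((p, p))
        F[:-1, 1:] = np.eye(p - 1)     # shift rows: each state element advances one sample
        F[-1, :] = a[::-1]             # last row holds [a_p, ..., a_1]
        H = np.zeros((1, p))
        H[0, -1] = 1.0                 # the observation sees the newest sample s_k
        return F, H

    F, H = ar_state_space([1.3, -0.6])   # illustrative AR(2) coefficients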

Page 4: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Kalman Filters

• Linear recursive filters where a Gaussian random (state) vector is propagated through a linear state space model.

• Objective: to find a ‘filtered’ estimate of the state vector X_k at time k, represented as a linear combination of the measurements up to time k, such that the following quadratic cost function is minimized:

• P, Q and R represent the covariance matrices for the state-estimation error, process noise and measurement noise.

J_k = E[ (X_k - X̂_k)^T M (X_k - X̂_k) ]

where M is a symmetric nonnegative definite weighting matrix.

Prediction:

X̂_k^- = F X̂_{k-1}

P_k^- = F P_{k-1} F^T + Q

Estimation / Filtering:

K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}

X̂_k = X̂_k^- + K_k (Y_k - H X̂_k^-)

P_k = (I - K_k H) P_k^-
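A compact implementation of this predict/update cycle; the AR(2)-style system matrices and noise levels below are invented so the sketch runs end to end, and the observation stream is a random stand-in rather than real speech.

    import numpy as np

    def kalman_step(x_hat, P, y, F, H, Q, R):
        # One prediction + estimation step of the linear Kalman filter
        x_pred = F @ x_hat
        P_pred = F @ P @ F.T + Q
        S = H @ P_pred @ H.T + R                       # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain
        x_new = x_pred + K @ (y - H @ x_pred)
        P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
        return x_new, P_new

    F = np.array([[0.0, 1.0], [-0.6, 1.3]])            # companion form of an illustrative AR(2)
    H = np.array([[0.0, 1.0]])
    Q = np.diag([0.0, 1e-2])                           # process noise drives only the newest sample
    R = np.array([[1e-1]])

    x_hat, P = np.zeros(2), np.eye(2)
    rng = np.random.default_rng(1)
    for y in rng.normal(size=100):                     # stand-in for a frame of noisy observations
        x_hat, P = kalman_step(x_hat, P, np.array([y]), F, H, Q, R)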

Page 5: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Nonlinear Recursive State-Space Filters

Nonlinear State Estimation:

x_k = F(x_{k-1}, v_k)

y_k = H(x_k, n_k)

where F and H are (nonlinear) vector functions.

Predicting the state vector:   x̂_k^- = E[ F(x̂_{k-1}, v_k) ]

Predicting the observation:    ŷ_k^- = E[ H(x̂_k^-, n_k) ]

Kalman gain:                   K_k = P_{x_k y_k} P_{y_k y_k}^{-1}

A posteriori estimate:         x̂_k = x̂_k^- + K_k (y_k - ŷ_k^-),   where y_k - ŷ_k^- is the innovation

• When propagating a GRV through a linear model (Kalman filtering), it is easy to compute the expectation E[·].

• If the same GRV passes through a nonlinear model, the resulting estimate may no longer be Gaussian, and the calculation of E[·] is no longer easy.

• The calculation of E[·] in the recursive filtering process therefore requires an estimate of the pdf of the propagating random vector.
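To make the last point concrete, the sketch below brute-force approximates E[F(x, v)] for an arbitrary scalar nonlinearity by sampling the assumed Gaussian state; the expensive density estimate implied by this Monte Carlo stand-in is exactly what the EKF and UKF replace with cheaper approximations. The nonlinearity and moments are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(2)

    def F(x, v):
        # Illustrative scalar nonlinearity: E[F(x, v)] is not F(E[x], E[v])
        return np.sin(x) + 0.5 * x**2 + v

    x_mean, x_var = 1.0, 0.5                   # assumed Gaussian state estimate
    q = 0.01                                   # process-noise variance

    xs = rng.normal(x_mean, np.sqrt(x_var), size=100_000)
    vs = rng.normal(0.0, np.sqrt(q), size=100_000)

    mc_mean = np.mean(F(xs, vs))               # Monte Carlo estimate of E[F(x, v)]
    point = F(x_mean, 0.0)                     # propagating only the mean
    print(mc_mean, point)                      # the two differ because F is nonlinear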

Page 6: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Unscented Kalman Filters

• Extended Kalman Filters (EKFs) avoid this by linearizing the state-space equations and reducing the estimate equations to point estimates:

• The covariance matrices (P) are computed by linearizing the dynamics

• The state distributions are approximated by a GRV and propagated analytically through a ‘first-order linearization’ of the nonlinear system

If the assumption of local linearity is violated, the resulting filter will be unstable

The linearization process itself involves computations of Jacobians and Hessians (which are nontrivial)

Point estimates (expectations dropped):

x̂_k^- = F(x̂_{k-1}, v̄)

ŷ_k^- = H(x̂_k^-, n̄)

K_k = P_{x_k y_k} P_{y_k y_k}^{-1}

First-order linearization of the dynamics:

x_{k+1} ≈ A x_k + B v_k

y_k ≈ C x_k + D n_k
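A scalar EKF sketch under such a first-order linearization; the model, its hand-derived Jacobians, the noise levels and the observation values are all invented to show where A = dF/dx and C = dH/dx enter the covariance and gain computations.

    import numpy as np

    # Illustrative scalar nonlinear model (not from the paper)
    f = lambda x: 0.5 * x + 2.0 * np.sin(x)     # state transition F(x)
    h = lambda x: x ** 2                        # measurement H(x)
    df = lambda x: 0.5 + 2.0 * np.cos(x)        # Jacobian A = dF/dx
    dh = lambda x: 2.0 * x                      # Jacobian C = dH/dx
    Q, R = 1e-2, 1e-1

    def ekf_step(x_hat, P, y):
        # Prediction through the point estimate and the linearized dynamics
        x_pred = f(x_hat)
        A = df(x_hat)
        P_pred = A * P * A + Q
        # Update using the linearized measurement
        C = dh(x_pred)
        S = C * P_pred * C + R                  # innovation covariance
        K = P_pred * C / S                      # Kalman gain
        return x_pred + K * (y - h(x_pred)), (1.0 - K * C) * P_pred

    x_hat, P = 0.1, 1.0
    for y in [0.2, 0.5, 0.4]:                   # made-up observations
        x_hat, P = ekf_step(x_hat, P, y)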

Page 7: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


The Unscented Transform

Problem statement: calculation of the statistics of a random variable which has undergone a nonlinear transformation

Approach: It is easier to approximate a distribution than it is to approximate an arbitrary nonlinear transformation.

Consider a nonlinearity y = g(x):

• The Unscented transform chooses (deterministically) a set of ‘sigma’ points in the original (x) space which when mapped to the transformed (y) space will be capable of generating accurate sample means and covariance matrices in the transformed space

• An accurate estimation of these allows us to use the Kalman filter in the usual setup, with the expectation values being replaced by sample means and covariance matrices as appropriate


Page 8: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Unscented Kalman Filters (UKFs)

• Instead of propagating a GRV through the “linearized” system, the UKF technique focuses on an efficient estimation of the statistics of the random vector being propagated through a nonlinear function

• To compute the statistics of y = g(x), we ‘cleverly’ sample this function using 2L+1 deterministically chosen ‘sigma’ points about the mean of x:

χ_0 = x̄

χ_i = x̄ + ( √((L + λ) P_x) )_i,        i = 1, 2, ..., L

χ_i = x̄ - ( √((L + λ) P_x) )_{i-L},    i = L + 1, ..., 2L

Sigma points: ( √((L + λ) P_x) )_i denotes the i-th column of the matrix square root, and λ is a scaling parameter.

Page 9: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Unscented Kalman Filters (UKFs)

It can be shown that, with these as sample points, the following ‘weighted’ sample mean and covariance closely match the true statistics of y:

• We use this set of equations to predict the states and observations in the nonlinear state-estimation model. The expected values E[ ] and the covariance matrices are replaced with these sample ‘weighted’ versions.

y_i = g(χ_i),   i = 0, 1, ..., 2L

ȳ ≈ Σ_{i=0}^{2L} W_i^(m) y_i

P_y ≈ Σ_{i=0}^{2L} W_i^(c) (y_i - ȳ)(y_i - ȳ)^T

with weights

W_0^(m) = λ / (L + λ)

W_0^(c) = λ / (L + λ) + (1 - α² + β)

W_i^(m) = W_i^(c) = 1 / (2(L + λ)),   i = 1, 2, ..., 2L
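A self-contained sketch of the unscented transform described on the last two slides; the test nonlinearity is arbitrary, and λ is expressed through the conventional α, β, κ scaling parameters (the values below are illustrative choices, not taken from the paper).

    import numpy as np

    def unscented_transform(g, x_mean, P_x, alpha=0.1, beta=2.0, kappa=0.0):
        # Propagate the mean and covariance of x through y = g(x) with 2L+1 sigma points
        L = len(x_mean)
        lam = alpha ** 2 * (L + kappa) - L
        S = np.linalg.cholesky((L + lam) * P_x)          # matrix square root
        sigma = np.vstack([x_mean,
                           x_mean + S.T,                 # mean + i-th column of the square root
                           x_mean - S.T])                # 2L+1 points in total
        Wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
        Wc = Wm.copy()
        Wm[0] = lam / (L + lam)
        Wc[0] = lam / (L + lam) + (1.0 - alpha ** 2 + beta)
        Y = np.array([g(s) for s in sigma])              # y_i = g(chi_i)
        y_mean = Wm @ Y                                  # weighted sample mean
        d = Y - y_mean
        P_y = (Wc[:, None] * d).T @ d                    # weighted sample covariance
        return y_mean, P_y

    g = lambda x: np.array([np.sin(x[0]) + x[1] ** 2])   # illustrative nonlinearity
    y_mean, P_y = unscented_transform(g, np.array([0.5, 1.0]), np.diag([0.1, 0.2]))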

Page 10: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Particle Filters

• Increasingly employed for sequential state estimation in systems with a nonlinear state-space model

• These do not restrict the statistics of the propagating random vector to a Gaussian distribution

• These are based on sequential Monte Carlo techniques

Processing chain (starting from the observed signal): represent the pdfs using particles → update these particles with each observation → use these ‘learned’ pdfs for state estimation → use the state estimates for prediction / filtering / smoothing.

Page 11: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Particle Filters

• Probability distributions are represented in terms of randomly picked samples (particles)

• These particles are then recursively updated using Bayesian estimation procedures

• Direct generation of particles for arbitrary distributions is difficult

• Particles are instead generated from another distribution, the importance density function

Importance Density Function: chosen such that it facilitates easy drawing of samples

• The accuracy of density estimation using particles, and hence of state estimation and filtering, improves as the number of particles increases

• 100 particles were used for density estimation in this setup
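A minimal bootstrap (sequential importance resampling) sketch in the spirit of this description, using the state-transition prior as the importance density and 100 particles; the scalar nonlinear model, noise variances and observation stream are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 100                                       # number of particles, as in the experiments
    f = lambda x: 0.9 * x + np.sin(x)             # illustrative nonlinear state transition
    q, r = 0.05, 0.1                              # process / measurement noise variances

    def particle_filter_step(particles, weights, y):
        # Draw from the importance density (here: the state-transition prior)
        particles = f(particles) + rng.normal(0.0, np.sqrt(q), size=N)
        # Reweight by the measurement likelihood p(y | x), Gaussian about each particle
        weights = weights * np.exp(-0.5 * (y - particles) ** 2 / r)
        weights /= weights.sum()
        # Resample to counter weight degeneracy
        idx = rng.choice(N, size=N, p=weights)
        return particles[idx], np.full(N, 1.0 / N)

    particles = rng.normal(0.0, 1.0, size=N)
    weights = np.full(N, 1.0 / N)
    for y in rng.normal(size=50):                 # stand-in for noisy observations
        particles, weights = particle_filter_step(particles, weights, y)
        x_hat = np.mean(particles)                # MMSE estimate of the state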

Page 12: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Experimental Setup

• A comparative analysis of the performance of three state-space sequential filtering algorithms is presented

• Three test signals were used:

• An Auto-Regressive process with known driving parameters

• A single tone sinusoid

• Sample data from an acoustic utterance

• An Auto-Regressive model (order=10) was used to represent these signals.

• Filtering was performed on a per-frame basis, to ensure that stationarity is not violated.

• The AR coefficients are learned using the Yule-Walker equations.

• Filtering in each frame is initialized with the last 10 samples of the previously filtered frame.
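A sketch of the per-frame Yule-Walker estimation described above, with order 10 as in the experiments; the frame length, the biased autocorrelation estimate and the plain Toeplitz solve are assumptions made for illustration (scipy provides the solver).

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def yule_walker_ar(frame, order=10):
        # Estimate AR coefficients [a_1, ..., a_p] for one frame via the Yule-Walker equations
        frame = frame - np.mean(frame)
        r = np.array([np.dot(frame[:len(frame) - k], frame[k:])    # autocorrelation r(0..p)
                      for k in range(order + 1)]) / len(frame)
        return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])   # solve R a = r

    rng = np.random.default_rng(4)
    signal = rng.normal(size=4000)               # stand-in for one utterance
    frame_len = 400                              # assumed frame length
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        a = yule_walker_ar(signal[start:start + frame_len])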

Page 13: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Experimental results

Output MSE as a function of input SNR (dB):

Input SNR (dB) |      AR process      |         Sine         |        Speech
               |   KF     PF    UKF   |   KF     PF    UKF   |   KF     PF    UKF
     -10.0     | 0.0293 0.0182 0.0064 | 0.0514 0.5607 0.1369 | 0.0194 0.0185 0.0079
      -5.0     | 0.0128 0.0012 0.0052 | 0.0150 0.5408 0.0832 | 0.0065 0.0060 0.0051
       0.0     | 0.0039 0.0051 0.0038 | 0.0053 0.1911 0.0366 | 0.0022 0.0019 0.0023
       5.0     | 0.0015 0.0020 0.0026 | 0.0017 0.0882 0.0244 | 0.0007 0.0006 0.0011
      10.0     | 0.0006 0.0007 0.0020 | 0.0006 0.0363 0.0207 | 0.0002 0.0002 0.0008
      15.0     | 0.0001 0.0003 0.0017 | 0.0002 0.0129 0.0195 | 0.0001 0.0001 0.0007
      20.0     | 0.0006 0.0005 0.0016 | 0.0001 0.0050 0.0206 | 0.0001 0.0001 0.0007

KF performs better at high SNRs

UKF performs better at low SNRs

Page 14: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Conclusions and Future Work

• A comparison of three sequential state-space filtering algorithms

• Performance studied as a function of SNR:

At high SNRs, KF consistently outperforms UKF and PF

At low SNRs, UKF and PF perform better than KF

• This experimental design is ready for applications in speech enhancement and robust speech recognition.

• The code is publicly available and suitable for use in large-scale speech enhancement / recognition applications.

• This work can be extended to a nonlinear AR model using a perceptron to learn the nonlinear vector functions.

• This will allow modeling of nonlinear state-space functions and measurement functions.

• Another attractive model for speech recognition applications could be a linear AR model and a nonlinear observation model.

Page 15: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


Resources

• Pattern Recognition Applet: compare popular linear and nonlinear algorithms on standard or custom data sets

• Speech Recognition Toolkits: a state-of-the-art ASR toolkit for testing the efficacy of these algorithms on recognition tasks

• Foundation Classes: generic C++ implementations of many popular statistical modeling approaches

Page 16: SEQUENTIAL STATE-SPACE FILTERS FOR SPEECH ENHANCEMENT


References

• L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.

• M. Gabrea, “Robust Adaptive Kalman Filtering-Based Speech Enhancement Algorithm,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. I-301–I-304, May 2004.

• E. Wan and R. van der Merwe, “The Unscented Kalman Filter for Nonlinear Estimation,” Proceedings of the IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium (AS-SPCC), pp. 153-158, October 2000.

• G. Welch and G. Bishop, “An Introduction to the Kalman Filter,” Technical Report TR 95-041, Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, April 2004.

• S. Haykin and E. Moulines, “From Kalman to Particle Filters,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, PA, USA, March 2005.

• R. van der Merwe, N. de Freitas, A. Doucet, and E. Wan, “The Unscented Particle Filter,” Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Cambridge, U.K., August 2000.

• P. Djuric, J. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. Bugallo, and J. Miguez, “Particle Filtering,” IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19-38, September 2003.

• M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, February 2002.