
HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR


Schedule

VAD for noise suppression & frame-dropping
  • Long-Term Spectral Divergence
  • Subband OS-based detector

Non-linear feature normalization
  • Histogram equalization
  • OS-based equalization
  • Segmental implementation


VAD (1)

VAD: motivation
  • To get an estimate of the background noise for
      • Wiener filter design
      • Spectral subtraction
  • To discard non-speech frames

[Block diagram with the blocks: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, RECOGNIZER]


VAD (2)

Our approach
  • Use of rather long time spans (~100 ms) instead of instantaneous measures
      • Increases discrimination
  • Use of a statistical model in the log-FBE domain
      • Smoother estimates
  • Use of a feedback decision coupled with noise suppression
      • The VAD works on less noisy speech
  • Use of order statistics
      • More robust estimation


Long-Term Spectral Divergence (1)

J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication 42 (2004) 271–287

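Long-term spectral envelope over a window of 2N+1 frames:
\[ \mathrm{LTSE}(k,t) = \max_{-N \le n \le N} X(k,t+n) \]

Long-term spectral divergence between the envelope and the noise spectrum:
\[ \mathrm{LTSD}(t) = 10 \log_{10} \left( \frac{1}{NFFT} \sum_{k=0}^{NFFT-1} \frac{\mathrm{LTSE}^2(k,t)}{N^2(k,t)} \right) \]

Noise spectrum update (\alpha is the smoothing factor):
\[ N(k,t) = \begin{cases} N(k,t-1) & \text{speech} \\ \alpha N(k,t-1) + (1-\alpha) N_K(k,t) & \text{non-speech} \end{cases} \]
\[ N_K(k,t) = \frac{1}{2K+1} \sum_{n=-K}^{K} X(k,t+n) \]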
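The LTSD computation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the clipping of the window at the start and end of the utterance, and the example values are assumptions made here for illustration. Noise magnitudes are assumed to be strictly positive.

```python
import math

def ltse(spectra, t, order):
    """Long-term spectral envelope: per-bin maximum over frames t-order..t+order."""
    # Assumption: the window is clipped at the utterance boundaries.
    lo, hi = max(0, t - order), min(len(spectra) - 1, t + order)
    bins = len(spectra[t])
    return [max(spectra[n][k] for n in range(lo, hi + 1)) for k in range(bins)]

def ltsd(spectra, noise, t, order):
    """Long-term spectral divergence (in dB) between the envelope and the noise."""
    envelope = ltse(spectra, t, order)
    ratio = sum(e * e / (n * n) for e, n in zip(envelope, noise)) / len(noise)
    return 10.0 * math.log10(ratio)

def update_noise(noise, spectra, t, k_span, alpha):
    """Noise update for a non-speech frame: smooth towards the local average."""
    lo, hi = max(0, t - k_span), min(len(spectra) - 1, t + k_span)
    count = hi - lo + 1
    average = [sum(spectra[n][k] for n in range(lo, hi + 1)) / count
               for k in range(len(noise))]
    return [alpha * old + (1.0 - alpha) * new for old, new in zip(noise, average)]

# Toy magnitude spectra: five frames, two bins, one high-energy frame.
frames = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0], [1.0, 1.0], [1.0, 1.0]]
noise = [1.0, 1.0]
print(ltsd(frames, noise, 0, 1))  # window holds only noise-level frames
print(ltsd(frames, noise, 1, 1))  # window already reaches the high-energy frame
```

A frame would be labelled as speech when the LTSD exceeds a decision threshold; the threshold is left out here because its value is not given on the slide.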


Long-Term Spectral Divergence (2)


Long-Term Spectral Divergence (3)


Long-Term Spectral Divergence (4)


Long-Term Spectral Divergence (5)


Long-Term Spectral Divergence (7)

Recognition experiments with AURORA 2 and 3


Long-Term Spectral Divergence (6)


Subband OSF VAD (1)

J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)

  • The decision is based on an averaged QSNR, defined as an inter-quantile difference
  • Feedback structure: the VAD operates over the noise-reduced signal


Subband OSF VAD (2)

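E(k,t): log-energy at time t in band k

Temporal buffer:
\[ \{ E(k,t-N), \ldots, E(k,t), \ldots, E(k,t+N) \} \]

Order statistics:
\[ E_{(1)}(k,t) \le \cdots \le E_{(r)}(k,t) \le \cdots \le E_{(2N+1)}(k,t) \]

Quantile of order p:
\[ Q_p(k,t) = (1-f)\,E_{(s)}(k,t) + f\,E_{(s+1)}(k,t), \qquad s = \lfloor 2pN \rfloor, \quad f = 2pN - s \]

Noise level, updated for non-speech frames (\alpha is the smoothing factor):
\[ E_N(k,t) = \alpha E_N(k,t-1) + (1-\alpha)\,Q_{0.5}(k,t) \]

Quantile SNR:
\[ QSNR(k,t) = Q_p(k,t) - E_N(k,t), \qquad p = 0.9 \]

Decision on the average over the K subbands (\eta is the decision threshold):
\[ SNR(t) = \frac{1}{K} \sum_{k=1}^{K} QSNR(k,t), \qquad SNR(t) > \eta \Rightarrow \text{speech} \]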
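The quantile-based decision can be sketched in a few lines of Python. This is an illustrative sketch, not the published implementation. The function names are hypothetical, and the sorted buffer is indexed from zero so that p = 0.5 returns the median of the 2N+1 values, which is an assumption about the indexing convention behind s and f.

```python
import math

def quantile(buffer, p):
    """Sampling quantile Q_p of a buffer of 2N+1 log-energies (0 <= p <= 1)."""
    ordered = sorted(buffer)
    last = len(ordered) - 1            # 2N for a buffer of 2N+1 values
    position = p * last                # 2pN in the notation of the slide
    s = math.floor(position)
    f = position - s
    upper = ordered[min(s + 1, last)]  # guard for p = 1
    return (1.0 - f) * ordered[s] + f * upper

def average_qsnr(buffers, noise_levels, p=0.9):
    """SNR(t): mean over subbands of Q_p(k,t) - E_N(k,t)."""
    qsnr = [quantile(buf, p) - e_n for buf, e_n in zip(buffers, noise_levels)]
    return sum(qsnr) / len(qsnr)

def update_noise_levels(buffers, noise_levels, alpha):
    """Non-speech update: smooth each noise level towards the buffer median."""
    return [alpha * e_n + (1.0 - alpha) * quantile(buf, 0.5)
            for buf, e_n in zip(buffers, noise_levels)]

# Two subbands, N = 2 (five log-energy values per buffer).
buffers = [[1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0]]
noise_levels = [1.0, 1.0]
print(average_qsnr(buffers, noise_levels))
```

Using a high quantile for the signal and the median for the noise keeps both estimates insensitive to a few outlying frames in the buffer, which is the robustness argument made for order statistics.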


Subband OSF VAD (3)


Subband OSF VAD (4)


Subband OSF VAD (5)


Accurate VAD

Open topics
  • New alternatives to improve the performance
      • New decision criteria based on OS filters
      • Already used for edge detection in images
  • Computational efficiency
      • Development of computationally efficient algorithms


Feature normalization

Objective
  • Transform features to remove undesired variability

Linear techniques
  • CMS: cepstral mean subtraction
      • Removes the effect of linear channel distortion
  • CMVN: cepstral mean and variance normalization
      • Extension of CMS to deal with the variance reduction caused by additive noise
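As a minimal sketch of the two linear techniques, the Python functions below normalize the trajectory of a single cepstral coefficient over one utterance. The function names and the per-utterance estimation of the statistics are assumptions made for illustration; CMVN as written also assumes the trajectory is not constant, since it divides by the standard deviation.

```python
import statistics

def cms(trajectory):
    """Cepstral mean subtraction: remove the utterance mean."""
    mean = statistics.fmean(trajectory)
    return [c - mean for c in trajectory]

def cmvn(trajectory):
    """Cepstral mean and variance normalization: zero mean, unit variance."""
    mean = statistics.fmean(trajectory)
    std = statistics.pstdev(trajectory)
    return [(c - mean) / std for c in trajectory]

clean = [1.0, 2.0, 4.0]
shifted = [c + 3.0 for c in clean]       # additive bias (linear channel)
scaled = [2.0 * c + 5.0 for c in clean]  # gain plus bias

print(cms(clean), cms(shifted))    # both trajectories lose the bias
print(cmvn(clean), cmvn(scaled))   # both trajectories lose gain and bias
```

CMS removes a constant offset only, while CMVN additionally removes a positive gain; neither changes the shape of the feature distribution.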


Feature normalization

Non-linear feature distortion
  • Environment effects are non-linear for MFCC features
  • They can hardly be removed with linear techniques
  • Not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher-order moments of the distribution)

Non-linear extensions
  • CDF-matching approaches (HEQ and related)
  • Have proven more effective than linear ones
  • Normalize more than just the first two moments of the probability distributions


CDF-matching based equalization

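The main idea
  • Transform the features to match a given PDF
  • In the one-dimensional case, CDF-matching gives the solution

\[ x \sim p_X(x) \;\longrightarrow\; \hat{x} = T[x] \sim \phi(\hat{x}) \]
\[ C_X(x) = \int_{-\infty}^{x} p_X(u)\,du = \int_{-\infty}^{\hat{x}} \phi(u)\,du = \Phi(\hat{x}) \]
\[ \hat{x} = T[x] = \Phi^{-1}(C_X(x)) \]

where \phi and \Phi are the PDF and CDF of the reference distribution.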
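A minimal sketch of CDF matching in Python, assuming a standard Gaussian reference and a histogram estimate of the feature CDF. The function name, the number of bins and the evaluation of the CDF at bin centres are illustrative choices, not details taken from the slides.

```python
from statistics import NormalDist

def equalize_histogram(samples, bins=20, reference=NormalDist()):
    """Map each sample through the inverse reference CDF of its histogram CDF."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0      # guard against a constant sequence

    def bin_of(x):
        return min(int((x - lo) / width), bins - 1)

    counts = [0] * bins
    for x in samples:
        counts[bin_of(x)] += 1

    # Cumulative probability at the centre of each bin: C_X(x).
    cdf, below = [], 0
    for c in counts:
        cdf.append((below + 0.5 * c) / len(samples))
        below += c

    # x_hat = inverse reference CDF of C_X(x).
    return [reference.inv_cdf(cdf[bin_of(x)]) for x in samples]

print(equalize_histogram([0.0, 1.0, 2.0, 3.0], bins=4))
```

Samples that fall into the same bin receive the same equalized value, so the bin count trades resolution against the reliability of the CDF estimate.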


Equalization and robust classifiers

\[ y = \log\left( \exp(x + h) + \exp(n) \right), \qquad h = 0.8, \quad n = 3.5 \]


Invariance

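  • CMS is invariant to additive bias
  • CMVN is invariant to linear transformations
  • Equalization to a reference distribution is invariant to any invertible transformation (including non-linear ones)

A general transformation:
\[ y = G(x) \]
Equalization of x and of y to the same reference distribution with CDF \Phi:
\[ \hat{x} = T_X(x) = \Phi^{-1}(C_X(x)) \]
\[ \hat{y} = T_Y(y) = \Phi^{-1}(C_Y(y)) \]
If G() is invertible, then
\[ C_Y(G(x)) = C_X(x) \]
and therefore
\[ \hat{y} = \Phi^{-1}(C_Y(G(x))) = \Phi^{-1}(C_X(x)) = \hat{x} \]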
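The invariance property can be checked numerically with a small Python sketch. The rank-based CDF estimate, the example values and the particular transformation G are assumptions chosen for illustration. The sketch assumes distinct sample values and a monotonically increasing G, which preserves the ordering of the samples and hence their empirical CDF values.

```python
import math
from statistics import NormalDist

def equalize(samples):
    """Map each sample to the Gaussian quantile of its empirical CDF value."""
    n = len(samples)
    rank = {x: r for r, x in enumerate(sorted(samples), start=1)}
    return [NormalDist().inv_cdf((rank[x] - 0.5) / n) for x in samples]

x = [0.3, -1.2, 2.5, 0.9, -0.4]
# Hypothetical increasing, non-linear transformation y = G(x).
y = [math.exp(0.8 * v) + 3.5 for v in x]

print(equalize(x) == equalize(y))  # True: the ranks are unchanged by G
```

A linear normalization such as CMVN would not give identical outputs for x and y here, because G changes the shape of the distribution and not only its mean and variance.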


HEQ for robust speech recognition (1)

A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, Histogram equalization of speech representation for robust speech recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)

  • Each component of the MFCC vector is transformed to a Gaussian reference
  • Cumulative distributions are estimated using histograms
  • Performance is compared with CMS, CMVN and model-based feature compensation (VTS)
  • Combination with VTS


HEQ for robust speech recognition (2)


HEQ for robust speech recognition (3)


HEQ for robust speech recognition (4)


HEQ for robust speech recognition (5)


Segmental HEQ (1)

J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition, IEEE Signal Processing Letters, 11(5), May 2004

A segmental implementation of HEQ for non-stationary noise

A temporal buffer is used for the histogram estimation instead of the full sentence

The algorithmic delay is T frames

\[ X_t = \{ x_{t-T}, \ldots, x_t, \ldots, x_{t+T} \} \]


Segmental HEQ (2)


OSEQ: An efficient implementation (1)

A very computationally efficient algorithm based on Order Statistics

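Temporal buffer:
\[ X_t = \{ x_{t-T}, \ldots, x_t, \ldots, x_{t+T} \} \]

Order statistics:
\[ x_{(1)} \le \cdots \le x_{(r)} \le \cdots \le x_{(2T+1)} \]

CDF estimation:
\[ \hat{C}_X(x_{(r)}) = \frac{r - 0.5}{2T+1} \]

Lookup table (\Phi is the CDF of the reference distribution):
\[ G[r] = \Phi^{-1}\left( \frac{r - 0.5}{2T+1} \right), \qquad r = 1, \ldots, 2T+1 \]

Transformation, with r(x_t) the rank of x_t within the buffer:
\[ \hat{x}_t = \Phi^{-1}(\hat{C}_X(x_t)) = \Phi^{-1}\left( \frac{r(x_t) - 0.5}{2T+1} \right) = G[r(x_t)] \]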
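The order-statistics equalization can be sketched in Python as follows. This is a hedged illustration rather than the published algorithm: the function name is hypothetical, a standard Gaussian is assumed as the reference, frames without a complete buffer at the start and end of the sequence are simply skipped, and ties are resolved by taking the highest rank.

```python
from statistics import NormalDist

def oseq(features, T):
    """Segmental order-statistics equalization of one feature trajectory."""
    size = 2 * T + 1
    reference = NormalDist()
    # Lookup table G[r] for r = 1..2T+1 (index 0 is unused).
    table = [None] + [reference.inv_cdf((r - 0.5) / size)
                      for r in range(1, size + 1)]
    equalized = []
    for t in range(T, len(features) - T):
        window = features[t - T:t + T + 1]
        # Rank of x_t in the buffer: number of values not greater than x_t.
        r = sum(1 for value in window if value <= features[t])
        equalized.append(table[r])
    return equalized

print(oseq([1.0, 3.0, 2.0, 5.0, 4.0], 1))
```

Because the table is computed once, the per-frame work reduces to finding the rank of the central value in the buffer, with no histogram to build or invert.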


OSEQ: An efficient implementation (2)


Feature normalization

Open topics
  • Reference distribution
      • Clean speech / Gaussian / Others?
  • Dynamic features normalization (Δ and ΔΔ)
      • After, before or simultaneously [Obuchi, Stern, EUSP’03]
  • Progressive normalization
      • Not all MFCC are equally affected, and they do not have equal discriminative power [de Wet, …, ICASSP’03]
      • Lower order moments normalization [Hsu, Lee, ICASSP’04]
  • Parametric techniques
      • Current approaches are non-parametric [Haverinen, Kiss, EUSP’03]
  • New applications
      • Speaker independence and adaptation
      • Multi-stream normalization


Combination of techniques

Development of a combined robust front-end
  • An accurate VAD
      • For noise parameter estimation
  • A noise reduction technique
      • Spectral subtraction or Wiener filter
      • Statistical feature compensation
  • A frame-dropping algorithm
      • To discard non-speech frames
  • A feature normalization block
      • For residual non-linear distortion compensation


VAD (1)

Development of a combined robust front-end

[Block diagram with the blocks: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, FEATURE EQUALIZATION, RECOGNIZER]
