A Comparative Study of Feature-Domain Error Concealment Techniques for Distributed Speech Recognition

Robust2004 workshop, Norwich, UK, 30/08/2004

Zheng-Hua Tan, Børge Lindberg and Paul Dalsgaard
{zt, bli, pd}@kom.aau.dk
Department of Communication Technology, Aalborg University, Denmark



Agenda

• Feature-domain EC techniques
  – repetition
  – linear interpolation
  – subvector concealment

• Speech recognition experiments

• Comparative study
  – MFCC features
  – Euclidean and DP distances
  – HMM state durations


Motivation

Why do this work?

• A variety of EC techniques for DSR exist
  – A survey

• Repetition vs. interpolation
  – Which is better?

• What makes an EC technique good for recognition?


EC techniques

Two classes of EC techniques

• Client based EC
  – e.g. retransmission and forward error correction (FEC)

• Server based EC (the redundancy in the transmitted signal is exploited)
  – in the model-domain
    • Weighted Viterbi, missing feature theory
  – in the feature-domain
    • Insertion based techniques: splicing, substitution, repetition
    • Interpolation based techniques: linear interpolation
    • Subvector concealment


Subvector concealment

• Observation 1: conventional EC schemes share a common characteristic - conducting EC at the vector level

• Observation 2: within erroneous vectors, a substantial number of subvectors are often error-free

Subvector based EC


Subvector concealment (cont.)

• The ETSI-DSR standard
  – Feature-pair and SVQ: the n’th vector is

    $$V^n = [c_1^n, c_2^n, \ldots, c_{11}^n, c_{12}^n, c_0^n, \log E^n]^T \quad \text{(feature-pairs)}$$

    $$V^n = [[S_0^n]^T, \ldots, [S_5^n]^T, [S_6^n]^T]^T \quad \text{(subvectors)}$$

  – Frame-pair: $[V^n, V^{n+1}]$
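As a rough illustration of the feature-pair grouping, the sketch below splits one 14-element vector into its seven two-element subvectors. The function name and the plain-list representation are assumptions made for the example; the element order c1, ..., c12, c0, log E is taken from the vector definition on this slide.

```python
def split_into_subvectors(vector):
    """Group a 14-element feature vector into 7 feature-pair subvectors S_0..S_6.

    Assumed element order: [c1, c2, ..., c12, c0, logE], so that
    S_0 = (c1, c2) and S_6 = (c0, logE).
    """
    if len(vector) != 14:
        raise ValueError("expected 14 features per vector")
    return [tuple(vector[i:i + 2]) for i in range(0, 14, 2)]
```

With this layout, a transmission error that hits only one quantised feature-pair affects a single subvector, which is what makes concealment below the vector level possible.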


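Subvector concealment (cont.)

• Buffering matrix

$$
A = [V^{A}, V^{A_1}, V^{A_2}, \ldots, V^{A_{2N-1}}, V^{A_{2N}}, V^{B}] =
\begin{bmatrix}
S_0^{A} & S_0^{A_1} & S_0^{A_2} & \cdots & S_0^{A_{2N-1}} & S_0^{A_{2N}} & S_0^{B} \\
S_1^{A} & S_1^{A_1} & S_1^{A_2} & \cdots & S_1^{A_{2N-1}} & S_1^{A_{2N}} & S_1^{B} \\
S_2^{A} & S_2^{A_1} & S_2^{A_2} & \cdots & S_2^{A_{2N-1}} & S_2^{A_{2N}} & S_2^{B} \\
S_3^{A} & S_3^{A_1} & S_3^{A_2} & \cdots & S_3^{A_{2N-1}} & S_3^{A_{2N}} & S_3^{B} \\
S_4^{A} & S_4^{A_1} & S_4^{A_2} & \cdots & S_4^{A_{2N-1}} & S_4^{A_{2N}} & S_4^{B} \\
S_5^{A} & S_5^{A_1} & S_5^{A_2} & \cdots & S_5^{A_{2N-1}} & S_5^{A_{2N}} & S_5^{B} \\
S_6^{A} & S_6^{A_1} & S_6^{A_2} & \cdots & S_6^{A_{2N-1}} & S_6^{A_{2N}} & S_6^{B}
\end{bmatrix}
$$

• Consistency test

$$
(d(S_j^{n+1}(0) - S_j^{n}(0)) > T_j(0)) \;\text{OR}\; (d(S_j^{n+1}(1) - S_j^{n}(1)) > T_j(1))
$$

e.g. for the feature-pair $(c_1, c_2)$:

$$
(d(c_1^{A_2} - c_1^{A_1}) > T_{c_1}) \;\text{OR}\; (d(c_2^{A_2} - c_2^{A_1}) > T_{c_2})
$$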
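The consistency test can be sketched per subvector as follows. This is a minimal illustration, not the standard's reference code: `d` is assumed to be the absolute difference, the thresholds passed in are hypothetical values chosen by the caller, and the function names are made up for the example.

```python
def subvector_is_consistent(s_n, s_n1, t_j):
    """Test one feature-pair subvector of a frame-pair.

    s_n, s_n1: the two-element subvector S_j in frames n and n+1.
    t_j:       the two thresholds (T_j(0), T_j(1)); hypothetical values.
    The pair is inconsistent if either element changes by more than its threshold.
    """
    d0 = abs(s_n1[0] - s_n[0])
    d1 = abs(s_n1[1] - s_n[1])
    return not (d0 > t_j[0] or d1 > t_j[1])


def frame_pair_consistency(v_n, v_n1, thresholds):
    """Return one flag per subvector: 1 for consistent, 0 for inconsistent."""
    return [
        1 if subvector_is_consistent(a, b, t) else 0
        for a, b, t in zip(v_n, v_n1, thresholds)
    ]
```

Applied to every frame-pair in the buffer, the resulting flags form one pair of columns of a 0/1 matrix with one row per subvector.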


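Subvector concealment (cont.)

Consistency matrix and subvector concealment

$$A = [V^{A}, V^{A_1}, V^{A_2}, V^{A_3}, V^{A_4}, V^{A_5}, V^{A_6}, V^{A_7}, V^{A_8}, V^{B}]$$

$$
C =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}
\begin{matrix} S_0 \\ S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{matrix}
$$

1 for consistent, 0 for inconsistent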
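One way a consistency matrix could drive concealment is sketched below. The slide does not spell out the replacement rule, so this example assumes that every inconsistent subvector is replaced by the nearest consistent subvector in the same row (ties go to the earlier frame). The function name and the buffer layout (a list of vectors, each a list of subvectors) are likewise assumptions.

```python
def conceal_subvectors(buffer, consistency):
    """Replace inconsistent subvectors with the nearest consistent one in the same row.

    buffer:      list of vectors (columns of A); each vector is a list of subvectors.
    consistency: matrix C with one row per subvector index and one column per
                 vector; 1 marks a consistent subvector, 0 an inconsistent one.
    Assumption: nearest-neighbour substitution along the time axis.
    """
    n_cols = len(buffer)
    out = [list(vector) for vector in buffer]
    for j, row in enumerate(consistency):
        good = [t for t in range(n_cols) if row[t] == 1]
        if not good:
            continue  # nothing reliable in this row; leave it untouched
        for t in range(n_cols):
            if row[t] == 0:
                nearest = min(good, key=lambda g: (abs(g - t), g))
                out[t][j] = buffer[nearest][j]
    return out
```

Unlike vector-level repetition, only the flagged entries change; subvectors marked 1 inside an otherwise erroneous vector are kept as received.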



Recognition experiments

• Two tasks: Danish digits and city names

• The HTK based reference recogniser

• Realistic GSM error patterns (EP), by carrier-to-interference (C/I) ratio:
  – EP1: 10 dB
  – EP2: 7 dB
  – EP3: 4 dB


Recognition experiments (cont.)

The %WER for three EC techniques

[Figure: bar charts of %WER for Repetition, Interpolation and Subvector under EP1, EP2 and EP3. (a) Danish digits, %WER axis 0 to 12; (b) city names, %WER axis 20 to 45.]



Comparative study - MFCC features

• Random transmission errors with a BER of 2% are used.

• The original error-free MFCC features are directly compared with the features corrupted with errors but concealed either
  – by repetition
  – by interpolation
  – by subvector concealment
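A random-error channel of this kind can be simulated by flipping each transmitted bit independently with probability equal to the BER. The sketch below is only an assumption about how such errors might be generated; the function name and the bit-list representation are hypothetical, and the slides do not describe the actual error generator.

```python
import random


def inject_bit_errors(bits, ber, rng):
    """Flip each bit independently with probability `ber` (e.g. 0.02 for 2%).

    bits: sequence of 0/1 integers.
    rng:  a random.Random instance, passed in so that runs are reproducible.
    """
    return [b ^ 1 if rng.random() < ber else b for b in bits]


# Example: corrupt 10000 zero bits at a BER of 2% with a fixed seed.
corrupted = inject_bit_errors([0] * 10000, 0.02, random.Random(0))
observed_ber = sum(corrupted) / len(corrupted)
```

The observed error rate of a finite bit stream only approximates the nominal BER, which is why the example measures it rather than assuming exactly 2%.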


Comparative study - MFCC features (cont.)

• MFCC c0

• Two observations


Comparative study - MFCC features (cont.)

• Interpolation: straight line
  – constant value segment
  – zero value segment


Comparative study - MFCC features (cont.)

• Repetition generated feature curves display similar shapes even though there are some displacements along the time axis as compared to the iMFCC feature.

• However, the DP embedded in the Viterbi algorithm makes this displacement relatively irrelevant.


Comparative study - DP distances

– The Euclidean and DP distances between c0 of the error-free MFCC and c0 of the MFCC generated by different EC techniques for the word “et”

[Figure: bar chart of Euclidean and DP distances (axis 0 to 5) for Repetition, Interpolation and Subvector.]

– General expectation: interpolation performs better
  • Signal reconstruction vs. speech recognition
  • Euclidean distance vs. DP distance
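The difference between the two distances can be illustrated with a toy c0 track. In the sketch below the DP distance is approximated by plain dynamic time warping with an absolute-difference local cost; this is an assumption for illustration, since the slides do not define the exact DP recursion. The function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Frame-by-frame Euclidean distance between two equal-length tracks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dp_distance(x, y):
    """Dynamic time warping distance with |a - b| as the local cost.

    Assumption: symmetric steps (insert, delete, match) and no path
    constraints, as a stand-in for the DP alignment inside Viterbi decoding.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            table[i][j] = cost + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[n][m]


original = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0]  # same shape, displaced by one frame
```

For these two tracks the Euclidean distance is 2.0 while the DP distance is 0.0: a curve that keeps its shape but is displaced in time is penalised frame by frame, yet costs nothing once the alignment may warp the time axis.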


Comparative study - DP distances (cont.)

Over 328 testing utterances:

• Number of smaller distances

• Subvector EC always gives the smallest value for both distances.

[Figure: bar chart of the number of smaller distances (axis 0 to 300), Euclidean and DP, for Repetition and Interpolation.]


Comparative study - HMM state durations

• Viterbi decoding tracks the HMM state alignment

• The average state durations

• Two facts are observed:
  – repetition vs. interpolation
  – subvector vs. error-free

[Figure: bar chart of average state duration (axis 0 to 6) for Repetition, Interpolation, Subvector and Error-free.]
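Given a frame-level state alignment from Viterbi decoding, an average state duration can be computed by averaging the lengths of runs of identical state labels. The sketch below assumes the alignment is available as a plain list with one state label per frame; the function name and this representation are hypothetical, and the slides do not say how the durations were extracted.

```python
from itertools import groupby


def average_state_duration(alignment):
    """Mean number of consecutive frames spent in each visited state.

    alignment: one state label per frame, e.g. taken from a Viterbi alignment.
    Each run of identical consecutive labels counts as one state visit.
    """
    runs = [len(list(group)) for _, group in groupby(alignment)]
    return sum(runs) / len(runs) if runs else 0.0
```

For the alignment `[1, 1, 1, 2, 2, 3, 3, 3, 3]` the runs have lengths 3, 2 and 4, giving an average duration of 3.0 frames.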


Summary

• Three different EC techniques compared
  – the simple repetition technique is as good as or even better than linear interpolation
  – subvector concealment performs best

• Comparative study
  – MFCC features
  – Euclidean and DP distances
  – HMM state durations


Thanks!