17.0 Distributed Speech Recognition and Wireless Environment

References:

1. “Quantization of Cepstral Parameters for Speech Recognition over the World Wide Web,” IEEE Journal on Selected Areas in Communications, Jan. 1999

2. “Exploiting Temporal Correlation of Speech for Error Robust and Bandwidth Flexible Distributed Speech Recognition,” IEEE Trans. on Audio, Speech and Language Processing, May 2007

3. “Robust speech recognition over mobile and IP networks in burst-like packet loss,” IEEE Trans. on Audio, Speech and Language Processing, Jan. 2006

4. “An Integrated Solution for Error Concealment in DSR Systems over Wireless Channels,” Interspeech 2006

5. “A Unified Probabilistic Approach to Error Concealment for Distributed Speech Recognition,” Interspeech 2005


Distributed Speech Recognition (DSR) and Wireless Environment

• An Example Partition of Speech Recognition Processes into Client/Server
– compressed and encoded feature parameters transmitted in packets

[Figure: Client/Server Structure. Clients connect to the Server over the Wireless Network. Client side: Input Speech → Front-end Signal Processing → Feature Vectors. Server side: Linguistic Decoding and Search Algorithm → Output Sentence, using Acoustic Models (Acoustic Model Training from Speech Corpora), Lexicon (from Lexical Knowledge-base) and Language Model (Language Model Construction from Text Corpora; Grammar).]


Possible Models for Distributed Speech Recognition (DSR) under Wireless Environment

• Problems with Wireless Networks
– limited/dynamic bandwidth, low/time-varying bit rates
– higher/time-varying error rates, random/bursty errors

• Client-Only Model
[Figure: Hand-held device (Client): Speech Signal → Feature Extraction → Recognition → recognition results → Wireless Network → Server: Applications]
– speech recognition independent of wireless environment
– limitation by computation requirements for hand-held device

• Client-Server Model
[Figure: Hand-held device (Client): Speech Signal → Feature Extraction → Feature Compression → compressed feature parameters → Wireless Network → Server: Feature Recovery → Recognition → Applications]
– proper division of computation requirements on client/server
– bandwidth saving
– not compatible with existing wireless voice communications
– original speech can’t be recovered from MFCC

• Server-Only Model
[Figure: Hand-held device (Client): Speech Signal → Speech Encoder (for voice communications) → Wireless Network → Server, with two alternatives. (A) Speech Decoder → Recovered speech → Feature Extraction → Recognition → Applications. (B) Feature Extraction directly from the encoded parameters → Recognition → Applications]
– compatible with existing voice communications
– seriously degraded recognition accuracy (A)
– need to find recognition-efficient feature parameters out of perceptually efficient feature parameters (B)


Client-Server Model

• Split Vector Quantization for Feature Parameters (as an example)
[Figure: the feature parameters C1 C2 ... C12, E are split into sub-vectors; the figure shows six V.Q. blocks and one Scalar Quantization block (with optimized bit allocation)]
– delta parameters evaluated at server
– bit rate minimized
– computation requirements minimized
– recognition accuracy degradation minimized with acoustic models trained by quantized parameters (matched condition)

• Error Protection
– different error correction/detection schemes applied to V.Q. bit patterns for different parameters or different bits for scalar quantization
– important bits well protected while extra bit rate for error protection minimized
– correct identification of errors very helpful but at higher cost
– compatibility with existing wireless networking platform needed

• Error Concealment Examples
– extrapolation: an estimate \hat{x}_t is formed from the L preceding vectors x_{t-k}, k = 1, ..., L (x_t: feature vector at time index t)
– interpolation also possible
– performed with those sub-vectors with errors (if identifiable) only
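
To make the split vector quantization idea on this slide concrete, here is a minimal sketch under several assumptions that are not stated on the slide: the twelve cepstral parameters are grouped into the pairs (C1,C2), ..., (C11,C12), the energy E is handled by a one-dimensional codebook standing in for scalar quantization, codebooks are trained with plain k-means, and the codebook sizes (16 and 32) and the random training data are arbitrary illustrative choices. All function names are hypothetical.

```python
import numpy as np

def train_codebook(data, n_codewords, n_iter=20, seed=0):
    """Train one codebook with plain k-means (an assumed training method)."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), n_codewords, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dist.argmin(axis=1)
        for j in range(n_codewords):
            members = data[nearest == j]
            if len(members):  # leave empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook

def split_vq_encode(frames, splits, codebooks):
    """Client side: one codeword index per sub-vector per frame."""
    indices = np.empty((len(frames), len(splits)), dtype=np.int64)
    for s, (cols, cb) in enumerate(zip(splits, codebooks)):
        sub = frames[:, cols]
        dist = ((sub[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        indices[:, s] = dist.argmin(axis=1)
    return indices

def split_vq_decode(indices, splits, codebooks, dim):
    """Server side: look each index up in its codebook."""
    frames = np.empty((len(indices), dim))
    for s, (cols, cb) in enumerate(zip(splits, codebooks)):
        frames[:, cols] = cb[indices[:, s]]
    return frames

# Synthetic stand-in for 500 frames of [C1..C12, E]; not real speech features.
rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 13))

# Assumed grouping: six 2-dim sub-vectors plus E on its own.
splits = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12]]
sizes = [16, 16, 16, 16, 16, 16, 32]  # arbitrary codebook sizes
codebooks = [train_codebook(frames[:, c], n) for c, n in zip(splits, sizes)]

indices = split_vq_encode(frames, splits, codebooks)
recovered = split_vq_decode(indices, splits, codebooks, dim=13)
bits_per_frame = sum(int(np.log2(len(cb))) for cb in codebooks)
```

With these arbitrary sizes each frame costs 6 × 4 + 5 = 29 bits. Only the indices cross the network; the server reconstructs the static features from them and can then evaluate the delta parameters locally.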
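
The extrapolation example for error concealment can be sketched as follows. This is only an illustration under a stated assumption: the estimate is taken to be the plain average of the L preceding sub-vectors, which may differ from the exact weighting intended on the slide. The function name, the `bad` error-flag matrix and the toy data are hypothetical. Concealment is applied only to the sub-vectors flagged as erroneous, as the slide suggests.

```python
import numpy as np

def conceal_by_extrapolation(frames, bad, splits, L=3):
    """Replace flagged sub-vectors with the mean of the L preceding ones.

    frames: (T, D) decoded feature vectors
    bad:    (T, S) boolean, True where sub-vector s of frame t is in error
    splits: list of S column-index lists, one per sub-vector
    """
    out = frames.copy()
    for t in range(len(out)):
        for s, cols in enumerate(splits):
            if not bad[t, s]:
                continue
            start = max(0, t - L)
            if start == t:
                continue  # no history at t = 0: keep the received value
            # The history may contain frames that were themselves concealed.
            out[t, cols] = out[start:t][:, cols].mean(axis=0)
    return out

# Toy data: 6 frames of 4 parameters, split into two 2-dim sub-vectors.
clean = np.arange(24, dtype=float).reshape(6, 4)
splits = [[0, 1], [2, 3]]

received = clean.copy()
received[3, [2, 3]] = 999.0          # corrupt one sub-vector of frame 3
bad = np.zeros((6, 2), dtype=bool)
bad[3, 1] = True                     # assume the error was detected

concealed = conceal_by_extrapolation(received, bad, splits, L=2)
```

In this toy case the second sub-vector of frame 3 is replaced by the average of the same sub-vector in frames 1 and 2, while the first sub-vector of frame 3 is left untouched because it was not flagged.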