A Comparative Study on the Effect of Different Codecs on Speech Recognition Accuracy Using Various Acoustic Modeling Techniques

Srinivasa Raghavan 1, Nisha Meenakshi G 1, Sanjeev Kumar Mittal 1, Chiranjeevi Yarra 1, Anupam Mandal 2, K.R. Prasanna Kumar 2, Prasanta Kumar Ghosh 1

1 SPIRE LAB, Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India

2 Center for AI and Robotics, Bangalore, Karnataka, India



x February 2017


Introduction

Section 1

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion


Introduction

Focus

A Comparative Study on the Effect of Different Codecs on Speech Recognition Accuracy Using Various Acoustic Modeling Techniques.


Introduction

Speech Coding & Automatic Speech Recognition (ASR)

Note

1 The channel effect is not considered.

2 The effect of the language model is not considered.

[Figure: block diagram built up in three steps (Speech Coding, Speech Recognition, ASR with Codec Distorted Input). Encoder, channel and decoder (codec types: waveform, parametric, hybrid) are followed by feature extraction and an acoustic model (AMT: GMM-HMM, SGMM, DNN) plus language model, which outputs the recognized phoneme.]


Introduction

Common Speech Coders

Codec    Type        Bandwidth  Bit-rate (kbps)
G.711A   Waveform    Narrow     64
MELP     Parametric  Narrow     2.4
AMR-NB   Hybrid      Narrow     4.40
AMR-WB   Hybrid      Wide       23.85
G.728    Hybrid      Narrow     16
G.729A   Hybrid      Narrow     8
G.729B   Hybrid      Narrow     8
PCM      Waveform    Narrow     128
ADPCM    Waveform    Wide       32
GSM-8k   Hybrid      Narrow     13
SPEEX    Hybrid      Wide       27.8

VoIP: G.711A, G.729A, G.729B, SPEEX, ADPCM

Mobile: G.728, MELP, GSM-8k

Wireless: AMR-WB, AMR-NB


Introduction

Common Speech Coding Strategies

[Figure: Single Encoding-Decoding. Speech passes through one codec (CODEC 1, CODEC 2, ..., or CODEC N).]

[Figure: Tandem Encoding-Decoding. Speech passes through several codecs in sequence (for example CODEC 1, then CODEC 2, then CODEC 3).]
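The difference between single and tandem encoding-decoding can be sketched as function composition: a single pass applies one codec, a tandem pass applies a chain of codecs in order. The two "codecs" below are hypothetical stand-ins, not the codecs studied here: one is a continuous A-law compander with 8-bit quantization (a simplification of G.711A, not bit-exact), the other a plain 6-bit uniform quantizer. All function names are assumptions made for illustration.

```python
import math

A = 87.6  # A-law compression parameter used by G.711


def alaw_compress(x):
    """Compand a sample in [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def alaw_expand(y):
    """Invert alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantize(v, bits):
    """Uniform quantizer on [-1, 1] with a step of 2 ** -(bits - 1)."""
    levels = 2 ** (bits - 1)
    return max(-1.0, min(1.0, round(v * levels) / levels))


def toy_companding_codec(samples):
    """Hypothetical codec: A-law compand, 8-bit quantize, expand."""
    return [alaw_expand(quantize(alaw_compress(s), 8)) for s in samples]


def toy_uniform_codec(samples):
    """Hypothetical codec: plain 6-bit uniform quantization."""
    return [quantize(s, 6) for s in samples]


def encode_decode(samples, chain):
    """Pass the samples through each codec of the chain in order."""
    for codec in chain:
        samples = codec(samples)
    return samples


# 10 ms of a 440 Hz tone sampled at 8 kHz.
signal = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(80)]
single = encode_decode(signal, [toy_companding_codec])
tandem = encode_decode(signal, [toy_companding_codec, toy_uniform_codec])
```

Treating a chain as a list keeps the single case (a chain of length one) and the tandem case on the same code path, so the same evaluation can be run on both.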


Introduction

Problem statement

[Figure: data (clean or distorted due to codecs) goes through feature extraction and an acoustic model (trained with an AMT) to give the recognized phoneme.]

Which codec-trained acoustic model performs well for different types of input speech (coded or clean PCM) across different AMTs, i.e. is robust to codec-induced distortions?


Introduction

Problem statement

[Figure: Single Encoding-Decoding. Input speech, clean or distorted by a codec C, goes through feature extraction and an acoustic model to give the recognized phoneme. The data on which the acoustic model is built is marked "?".]

[Figure: Tandem Encoding-Decoding. The same pipeline with input distorted by a chain of codecs C1 ... CN. The data on which the acoustic model is built is marked "?".]


Introduction

Key Finding 1

G.711A

[Figure: the single and tandem encoding-decoding block diagrams from the problem statement, annotated with G.711A.]


Introduction

Key Finding 2

Cocktail Acoustic Model

[Figure: tandem encoding-decoding block diagram (input distorted by codecs C1 ... CN), alongside a cocktail acoustic model built on data from codecs C1, C2, C3, ..., CN.]


Previous Works

Section 2


Previous Works

Existing Literature

Single Encoding-Decoding

1 Lower recognition accuracy for low bit-rate codecs [Euler et al. (1994), Lilly et al. (1996)].

2 Study of speech recognition with GSM codecs [Kim et al. (2000), H.-G. Hirsch (2002)].

3 ASR under noisy conditions using G.729, G.723.1 and GSM codecs [Grande et al. (2001)].

Tandem Encoding-Decoding

1 Greater impact on ASR performance for low bit-rate codecs [Lilly et al. (1996)].

2 Study of ASR performance under an unknown tandem scenario [Salonidis et al. (1998)].

Compensation Strategies

1 Enhancement of the decoded speech, robust feature extraction [Dufour et al. (1996)].

2 Adaptation of acoustic models [Mokbel et al. (1997), Salonidis et al. (1998), Srinivasamurthy et al. (2001)].


Experiments

Section 3


Experiments

AMTs and Codecs

List of codecs

Codec    Type        Bandwidth  Bit-rate (kbps)
G.711A   Waveform    Narrow     64
MELP     Parametric  Narrow     2.4
AMR-NB   Hybrid      Narrow     4.40
AMR-WB   Hybrid      Wide       23.85
G.728    Hybrid      Narrow     16
G.729A   Hybrid      Narrow     8
G.729B   Hybrid      Narrow     8
PCM      Waveform    Narrow     128
ADPCM    Waveform    Wide       32
GSM-8k   Hybrid      Narrow     13
SPEEX    Hybrid      Wide       27.8

Details

1 Kaldi toolkit [Povey et al. (2011)].

2 ASR metric: Phoneme Error Rate (PER).

3 Codec sources: ITU-T standards, SoX, SPEEX.

4 0-gram language model.

Acoustic Modeling Techniques (AMT)

1 Monophone-based GMM-HMM (MONO)

2 Context-dependent triphone-based GMM-HMM (CD-TRI)

3 Subspace Gaussian mixture models with boosted Maximum Mutual Information (SGMM)

4 DNN with DBN pretraining (DNN-DP)

5 DNN with state-level MBR (DNN-DP-sMBR)
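The PER metric listed under Details is conventionally computed as the number of substitutions, deletions and insertions needed to turn the recognized phoneme sequence into the reference, divided by the number of reference phonemes. The following is a minimal sketch of that definition, not the scoring script used in the experiments; the function names and the example phoneme strings are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER in percent over paired reference and hypothesis utterances."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference contains no phonemes")
    return 100.0 * errors / total


# Made-up example: one substitution (ae -> ah) and one deletion (t).
reference = [["sil", "k", "ae", "t", "sil"]]
hypothesis = [["sil", "k", "ah", "sil"]]
per = phoneme_error_rate(reference, hypothesis)  # 2 errors / 5 phonemes
```

Errors are pooled over all utterances before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.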

Experiments

Datasets

TIMIT database. Sampling rate: 8 kHz.

Training set: 462 speakers with 3696 utterances.

Development set: 50 speakers with 400 utterances.

Test set: 24 speakers with 192 utterances.

8 acoustic models using single encoding-decoding.

6 tandem test databases:

1 ADPCM→GSM-8k→SPEEX

2 ADPCM→SPEEX→GSM-8k

3 GSM-8k→ADPCM→SPEEX

4 GSM-8k→SPEEX→ADPCM

5 SPEEX→ADPCM→GSM-8k

6 SPEEX→GSM-8k→ADPCM

[Figure: the list of codecs table and the single and tandem encoding-decoding diagrams, shown again with the codecs used for each dataset highlighted.]
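The six tandem test databases listed for this slide are exactly the orderings of the three codecs ADPCM, GSM-8k and SPEEX. A short sketch of how such a list could be enumerated follows; the function name is an assumption, and the snippet only builds the labels, it does not run any codec.

```python
from itertools import permutations

TANDEM_CODECS = ("ADPCM", "GSM-8k", "SPEEX")


def tandem_chains(codecs):
    """Return every ordering of the codecs as an arrow-joined label."""
    return ["→".join(order) for order in permutations(codecs)]


chains = tandem_chains(TANDEM_CODECS)
for index, chain in enumerate(chains, start=1):
    print(index, chain)
```

With three codecs there are 3! = 6 orderings, which matches the number of tandem test databases.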


Experiments

Overview of Experiments: Single Encoding Decoding

[Figure: experiment flow, built up in steps. 8 trained acoustic models, 8 development sets and 8 test sets, each labeled PCM, G.711A, AMR-WB, G.728, G.729A, G.729B, AMR-NB, MELP. Step 1: find the top acoustic models from the 8. Step 2: evaluate the performance of the selected top acoustic models. The last step shows 6 trained acoustic models (G.711A, AMR-WB, G.728, G.729A, G.729B, cocktail) and 6 blind test sets (ADPCM→GSM-8k→SPEEX, ADPCM→SPEEX→GSM-8k, GSM-8k→ADPCM→SPEEX, GSM-8k→SPEEX→ADPCM, SPEEX→ADPCM→GSM-8k, SPEEX→GSM-8k→ADPCM).]


Experiments

Overview of Experiments: Tandem Encoding Decoding

[Figure: evaluate the performance of the selected top acoustic models (marked "?") plus the cocktail model on 6 blind test sets: ADPCM→GSM-8k→SPEEX, ADPCM→SPEEX→GSM-8k, GSM-8k→ADPCM→SPEEX, GSM-8k→SPEEX→ADPCM, SPEEX→ADPCM→GSM-8k, SPEEX→GSM-8k→ADPCM.]

Results

Section 4


Results

Single Encoding Decoding

[Figure: Single Encoding-Decoding. Input speech, clean or distorted by a codec C, goes through feature extraction and an acoustic model to give the recognized phoneme. The data on which the acoustic model is built is marked "?".]

Question

What are the best acoustic models across all the AMTs for various coded speech?

8 candidate models: G.711A, MELP, AMR-NB, AMR-WB, G.728, G.729A, G.729B, PCM.

8 development and 8 test datasets.

Results

Single Encoding Decoding: Choice of Top codecs

[Figure: experiment flow with 8 trained acoustic models, 8 development sets and 8 test sets (PCM, G.711A, AMR-WB, G.728, G.729A, G.729B, AMR-NB, MELP). Highlighted step: find the top acoustic models from the 8.]

Results

Single Encoding Decoding: Choice of Top codecs

[Figure: PER (%) (y-axis, 15 to 50) for each AMT (x-axis: MONO, CD-TRI, SGMM, DNN-DP, DNN-DP-sMBR). Legend: AMR-NB, AMR-WB, PCM, G.711A, G.728, G.729A, G.729B, MELP, MATCHED. Caption: The average (standard deviation) PER (%) for 8 acoustic models and 5 AMTs across the development sets.]

Results

PER decreases with the improvements in the AMTs.

Matched condition performs best across all the AMTs.


Results

Single Encoding Decoding: Choice of Top codecs

[Figure: histogram. x-axis: codec (AMR-WB, PCM, G.711A, G.728, G.729A, G.729B); y-axis: count (ticks 0, 2, 4). Caption: Histogram of top four ranked codecs across different AMTs.]

Results

The top-ranked codecs are higher bit-rate codecs.

Most of them are narrowband codecs.
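A histogram of top-ranked codecs like the one on this slide can be built by ranking the acoustic models by average PER within each AMT and counting how often each model lands in the top k. The sketch below assumes that procedure; the function name, the model and AMT labels, and all PER values are made-up placeholders, not results from this study.

```python
from collections import Counter


def top_k_counts(per_by_amt, k=4):
    """Count how often each model is among the k lowest-PER models.

    per_by_amt maps an AMT name to a dict of {model name: average PER}.
    """
    counts = Counter()
    for per_by_model in per_by_amt.values():
        ranked = sorted(per_by_model, key=per_by_model.get)  # lowest PER first
        counts.update(ranked[:k])
    return counts


# Placeholder numbers for illustration only.
placeholder_per = {
    "amt-1": {"model-a": 30.0, "model-b": 28.0, "model-c": 35.0},
    "amt-2": {"model-a": 22.0, "model-b": 25.0, "model-c": 21.0},
}
counts = top_k_counts(placeholder_per, k=2)
```

With k=4 and five AMTs, as on the slide, the largest possible count for any one model would be 5.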


Results

Single Encoding Decoding: Performance of top codecs

[Figure: experiment flow with 8 trained acoustic models, 8 development sets and 8 test sets (PCM, G.711A, AMR-WB, G.728, G.729A, G.729B, AMR-NB, MELP). Highlighted step: evaluate the performance of the selected top acoustic models (G.711A, AMR-WB, G.728, G.729A, G.729B).]

Results

Single Encoding Decoding: Performance of top codecs

[Figure: The average (standard deviation) PER (%) for the top 5 acoustic models (along with PCM and Mixed) and 5 AMTs across the test sets. x-axis: AMT (MONO, CD-TRI, SGMM, DNN-DP, DNN-DP-sMBR); y-axis: PER (%); legend: PCM, G.729A, AMR-WB, G.728, G.729B, G.711A, MATCHED.]

Results

PER decreases as the AMTs improve.

Least PER for the G.711A-based acoustic model.
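The slides report PER without restating how it is computed. A minimal sketch follows, assuming the common definition: the Levenshtein (edit) distance between the reference and recognized phoneme sequences, divided by the reference length. The phoneme labels and function names are illustrative, and the study's own scoring tool may differ in details such as phone mapping.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Illustrative sequences: one substitution (ae -> ah) and one deletion (final sil).
reference = ["sil", "k", "ae", "t", "sil"]
hypothesis = ["sil", "k", "ah", "t"]
per = phoneme_error_rate(reference, hypothesis)
print(f"PER = {per:.1f}%")
```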

Results

Tandem Encoding Decoding

[Figure: Tandem Encoding-Decoding. Speech, clean or distorted due to codecs (C1, C2, C3 in tandem), passes through feature extraction to a recognized phoneme sequence; the data on which the acoustic model is built is marked "?".]

Question

How do the top five acoustic models perform across all the AMTs for tandem-coded speech?

6 candidate models: G.711A, AMR-WB, G.728, G.729A, G.729B, Cocktail.

6 blind test sets: combinations of ADPCM, GSM-8k, SPEEX.
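Three codecs applied one after another admit 3! = 6 orderings, which matches the number of blind test sets. The sketch below enumerates those orderings and chains encode-decode stages, assuming each test set corresponds to one ordering. The stub codecs are hypothetical stand-ins that only record which codec was applied; a real setup would call actual ADPCM, GSM, and Speex encoders and decoders.

```python
from itertools import permutations

CODECS = ("ADPCM", "GSM-8k", "SPEEX")


def make_stub_codec(name):
    """Hypothetical stand-in for one encode-decode pass; it only logs the codec name."""
    def encode_decode(signal):
        samples, history = signal
        return samples, history + (name,)
    return encode_decode


def tandem(signal, codec_chain):
    """Apply each encode-decode stage in order, feeding the output to the next stage."""
    for codec in codec_chain:
        signal = codec(signal)
    return signal


stubs = {name: make_stub_codec(name) for name in CODECS}
topologies = list(permutations(CODECS))  # every ordering of the three codecs

clean = ([0.0, 0.1, -0.1], ())  # (samples, codecs applied so far)
coded = {order: tandem(clean, [stubs[name] for name in order])
         for order in topologies}

for order in topologies:
    print(" -> ".join(order))
```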

SPIRE LAB, IISc, Bangalore 25

Results

Tandem Encoding Decoding: Performance of top codecs

[Figure: Evaluation setup for tandem encoding-decoding. Trained acoustic models: the selected top acoustic models (G.711A, AMR-WB, G.728, G.729A, G.729B) plus the cocktail model, evaluated on 6 blind test sets. Tandem orders shown: ADPCM → GSM-8k → SPEEX; GSM-8k → ADPCM → SPEEX; GSM-8k → SPEEX → ADPCM; ADPCM → SPEEX → GSM-8k; SPEEX → ADPCM → GSM-8k; SPEEX → GSM-8k → ADPCM.]

Results

Tandem Encoding Decoding: Performance of top codecs

[Figure: The average (standard deviation) PER (%) for 6 acoustic models and 5 AMTs across six blind test sets. x-axis: AMT (MONO, CD-TRI, SGMM, DNN-DP, DNN-DP-sMBR); y-axis: PER (%); legend: AMR-WB, G.729B, G.728, G.729A, G.711A, COCKTAIL, MATCHED.]

Results

PER decreases as the AMTs improve.

Least PER for the G.711A-based acoustic model.

The cocktail acoustic model is comparable to the matched condition.
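Each bar in the figure summarizes one acoustic model and one AMT as an average (standard deviation) over the test sets. A small sketch of that summary follows. The PER values are placeholders, not results from the study, and the use of the sample standard deviation is an assumption, since the slides do not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_per(per_by_test_set):
    """Return (average, sample standard deviation) of PER across test sets."""
    values = list(per_by_test_set.values())
    return mean(values), stdev(values)


# Placeholder PER (%) values for one acoustic model and one AMT.
per_by_test_set = {"set-1": 30.0, "set-2": 32.0, "set-3": 34.0}

avg, sd = summarize_per(per_by_test_set)
print(f"{avg:.1f} ({sd:.1f})")
```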

Conclusion

Section 5

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Conclusion

Key Finding 1

G.711A: narrowband, high bit-rate codec.

[Figure: Block diagrams of Single Encoding-Decoding (codec C) and Tandem Encoding-Decoding (codecs C1 to CN). Speech, clean or distorted due to codecs, passes through feature extraction to a recognized phoneme sequence; the data on which the acoustic model is built is marked "?".]

Conclusion

Key Finding 2

[Figure: Tandem Encoding-Decoding (codecs C1, C2, C3). Speech, clean or distorted due to codecs, passes through feature extraction to a recognized phoneme sequence; the data on which the acoustic model is built is marked "?". In the second panel, the acoustic model is a cocktail acoustic model built on data from C1, C2, C3, ..., CN.]

Cocktail Acoustic Model
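The figure describes the cocktail acoustic model as built on data from several codec conditions. The sketch below assumes this means pooling the training utterances of every known condition into one training list. The condition labels, utterance IDs, and helper name are hypothetical, and the slides do not state the mixing proportions or whether the pooled data is shuffled.

```python
import random


def build_cocktail_training_list(utterances_by_condition, seed=0):
    """Pool (condition, utterance_id) pairs from all conditions into one shuffled list."""
    pooled = [(condition, utt_id)
              for condition, utt_ids in sorted(utterances_by_condition.items())
              for utt_id in utt_ids]
    # A fixed seed keeps the pooled order reproducible across runs.
    random.Random(seed).shuffle(pooled)
    return pooled


# Hypothetical conditions C1..C3, each holding a coded copy of the same utterances.
utterances_by_condition = {
    "C1": ["utt001", "utt002"],
    "C2": ["utt001", "utt002"],
    "C3": ["utt001", "utt002"],
}

cocktail = build_cocktail_training_list(utterances_by_condition)
print(len(cocktail), "training items from", len(utterances_by_condition), "conditions")
```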

Conclusion

Summary

Conclusions

1 Studied the effect of codec-induced distortion on ASR performance.

2 G.711A, a narrowband high bit rate codec, results in the best ASR accuracy.

3 If the pool of tandem topologies is known a priori, a cocktail acoustic model could be used.

Future works

1 Effectiveness of the best performing models along with language models.

2 Compensation of the codec-induced distortions to aid ASR.

Conclusion

THANK YOU
