A Comparative Study on the Effect of Different Codecs on ... · A Comparative Study on the E ect of...

A Comparative Study on the Effect of Different Codecson Speech Recognition Accuracy Using Various Acoustic

Modeling Techniques

Srinivasa Raghavan1, Nisha Meenakshi G1, Sanjeev Kumar Mittal1,Chiranjeevi Yarra1, Anupam Mandal2, K.R. Prasanna Kumar2,

Prasanta Kumar Ghosh1

1SPIRE LAB, Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India,2Center for AI and Robotics, Bangalore, Karnataka, India

x February 2017

SPIRE LAB, IISc, Bangalore 1

Introduction

Section 1

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Introduction

A Comparative Study on the Effect of Different Codecs on SpeechRecognition Accuracy Using Various Acoustic Modeling Techniques.

Introduction

Speech Coding & Automatic Speech Recognition (ASR)

1 The Channel Effect is not considered.

2 Effect of Language Model is not considered.

FEATUREEXTRACTION

ACOUSTIC MODELING TECHNIQUES (AMT)

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZEDPHONEME

ASR with Codec Distorted Input

ENCODER DECODERCHANNEL

CODEC TYPES:WAVEFORM

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Recognition

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Coding

Introduction

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZEDPHONEME

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Recognition

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Coding

Introduction

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZEDPHONEME

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Recognition

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Coding

Introduction

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZEDPHONEME

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Recognition

PARAMETRICHYBRID

FEATUREEXTRACTION

GMM-HMMSGMM

ACOUSTIC MODEL

+LANGUAGE

RECOGNIZED PHONEME

Speech Coding

Introduction

Common Speech Coders

Codec Type Band-width

Bit-rate(kbps)

G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k

Hybrid Narrow 13

SPEEX Hybrid Wide 27.8

VoIPG.711AG.729AG.729BSPEEXADPCM

MobileG.728MELP

GSM-8k

WirelessAMR-WBAMR-NB

Speech Coding

Introduction

Common Speech Coders

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

VoIPG.711AG.729AG.729BSPEEXADPCM

MobileG.728MELP

GSM-8k

WirelessAMR-WBAMR-NB

Speech Coding

Introduction

Common Speech Coding Strategies

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1 CODEC 2 CODEC 3

Single Encoding-Decoding

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Tandem Encoding-Decoding

Introduction

Common Speech Coding Strategies

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Introduction

Problem statement

FEATUREEXTRACTION

ACOUSTIC MODEL

RECOGNIZEDPHONEME

DATA AMT

CLEAN/ DISTORTED

DUE TO CODECS

Problem statement

What is that specific codec trained acoustic model, that performs wellfor different types of input speech (coded or clean PCM) across differentAMTs? Robust to codec induced distortions.

Introduction

Problem statement

FEATUREEXTRACTION

ACOUSTIC MODEL

RECOGNIZEDPHONEME

DATA AMT

CLEAN/ DISTORTED

DUE TO CODECS

Problem statement

What is that specific codec trained acoustic model, that performs wellfor different types of input speech (coded or clean PCM) across differentAMTs? Robust to codec induced distortions.

Introduction

Problem statement

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

ACOUSTIC MODEL BUILT ON

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Introduction

Problem statement

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Introduction

Key Finding 1

G.711A

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Introduction

Key Finding 1

G.711A

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Introduction

Key Finding 2

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

COCKTAIL ACOUSTIC MODEL

Cocktail Acoustic Model

Introduction

Key Finding 2

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Previous Works

Section 2

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Previous Works

Existing Literature

1 Lower recognition for low bit-rate codecs [Euler et al. (1994), Lilly et al. (1996)].

2 Study of speech recognition with GSM codecs [Kim et al. (2000), H.-G. Hirsch (2002)].

3 ASR under noisy conditions using G.729, G.723.1 and GSM codecs [Grande et al. (2001)]

1 Impact on ASR performance more for low bit-rate codecs [Lilly et al. (1996)].

2 Study of ASR performance under unkown Tandem scenario [Salonidis et al. (1998)].

Compensation Strategies

1 Enhancement of the decoded speech, robust feature extraction [Dufour et al. (1996)]

2 Adaptation of acoustic models [Mokbel et al.. (1997), Salonidis et al. (1998),Srinivasamurthy et al. (2001)]

Experiments

Section 3

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Experiments

AMTs and Codecs

List of codecs

Bit-Rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Details1 Kaldi toolkit [Povey et al. (2011)].

2 ASR metric: Phoneme Error Rate (PER)

3 Codecs source: IT-UT standards, SoX,SPEEX.

4 0-gram language model.

Acoustic Modeling Techniques (AMT)

1 Monophone based GMM-HMM (MONO)

2 Context-dependent triphone basedGMM-HMM (CD-TRI)

3 The Subspace Gaussian models withboosted Maximum Mutual Information(SGMM)

4 DNN with DBN Pretraining (DNN-DP)

5 DNN with state-level MBR(DNN-DP-sMBR)

Experiments

Datasets

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

8 acoustic models using singleencoding-decoding.

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

TIMIT database. Sampling rate:8kHz.

Training set: 462 speakers with3696 utterances.

Development Set: 50 speakers with400 utterances.

Test Set: 24 speakers with 192utterances.

Experiments

Datasets

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Experiments

Datasets

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Experiments

Datasets

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Experiments

Datasets

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

Bit-rate(kbps)

Hybrid Narrow 4.40

AMR-WB

Hybrid Wide 23.85

Hybrid Narrow 13

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

CODEC 1

CODEC 1CODEC N

CODEC 2

CODEC 1

CODEC 2

CODEC 2 CODEC N

Experiments

Overview of Experiments: Single Encoding Decoding

G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

EVALUATE THE PERFORMANCE OF THE SELECTED TOP ACOUSTIC MODELS

? ? ? ? ?TOP ACOUSTIC MODELS

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

✓FIND THE TOP ACOUSTIC MODELS FROM 8

? ? ? ? ?TOP ACOUSTIC

MODELS

CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

G.711A AMR-WB G.728 G.729A G.729B COCKTAIL

G.711A AMR-WB G.728 G.729A G.729B AMR-NB

ADPCMGSM-8KSPEEX

GSM-8KADPCMSPEEX

GSM-8KSPEEXADPCM

ADPCMSPEEX

GSM-8K

SPEEXADPCMGSM-8K

SPEEXGSM-8KADPCM

6TRAINED

ACOUSTIC MODELS

6 BLINDTEST SETS

Experiments

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

MODELS

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

ADPCMGSM-8KSPEEX

GSM-8KADPCMSPEEX

GSM-8KSPEEXADPCM

ADPCMSPEEX

GSM-8K

SPEEXADPCMGSM-8K

SPEEXGSM-8KADPCM

6TRAINED

ACOUSTIC MODELS

6 BLINDTEST SETS

Experiments

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

MODELS

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

ADPCMGSM-8KSPEEX

GSM-8KADPCMSPEEX

GSM-8KSPEEXADPCM

ADPCMSPEEX

GSM-8K

SPEEXADPCMGSM-8K

SPEEXGSM-8KADPCM

6TRAINED

ACOUSTIC MODELS

6 BLINDTEST SETS

Experiments

Overview of Experiments: Tandem Encoding Decoding

COCKTAIL

ADPCMGSM-8KSPEEX

GSM-8KADPCMSPEEX

GSM-8KSPEEXADPCM

ADPCMSPEEX

GSM-8K

SPEEXADPCMGSM-8K

SPEEXGSM-8KADPCM

TRAINED ACOUSTIC MODELS

6 BLINDTEST SETS✓

EVALUATE THE PERFORMANCE OF THE SELECTED TOP

ACOUSTIC MODELS+COCKTAIL

? ? ? ? ?

Results

Section 4

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Results

Single Encoding Decoding

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Question

What are the best acoustic models across all the AMTs for variouscoded speech?

8 Candidate Models: G.711A, MELP, AMR-NB, AMR-WB, G.728,G.729A, G.729B, PCM.

8 development and 8 test datasets.

Results

Single Encoding Decoding: Choice of Top codecs

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

MODELS

Results

MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR15

AMR−NB AMR−WB PCM G.711A G.728 G.729A G.729B MELP MATCHED

The average (standard deviation) PER (%) for 8 acoustic models and 5 AMTs across the

development sets.

ResultsPER decreases with the improvements in the AMTs.

Matched condition performs best across all the AMTs.

Results

AMR−NB AMR−WB PCM G.711A G.728 G.729A G.729B MELP MATCHED

The average (standard deviation) PER (%) for 8 acoustic models and 5 AMTs across the

development sets.

Matched condition performs best across all the AMTs.

Results

AMR−WB PCM G.711A G.728 G.729A G.729B0

Histogram of top four ranked codecs across different AMTs.

Results

Higher bit rate codecs.

Most of them are narrowband codecs.

Results

AMR−WB PCM G.711A G.728 G.729A G.729B0

Histogram of top four ranked codecs across different AMTs.

Results

Higher bit rate codecs.

Most of them are narrowband codecs.

Results

Single Encoding Decoding: Performance of top codecs

8TRAINED

ACOUSTIC MODELS

8DEVELOPMENT

8TEST SETS

G.711A AMR-WB G.728 G.729A G.729BTOP ACOUSTIC MODELS

Results

PCM G.729A AMR−WB G.728 G.729B G.711A MATCHED

The average (standard deviation) PER (%) for the top 5 acoustic models (along with PCM and

Mixed) and 5 AMTs across the test sets

Least PER for G.711A based acoustic model.

Results

PCM G.729A AMR−WB G.728 G.729B G.711A MATCHED

The average (standard deviation) PER (%) for the top 5 acoustic models (along with PCM and

Mixed) and 5 AMTs across the test sets

Results

Tandem Encoding Decoding

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Question

How do the top five acoustic models perform across all the AMTs fortandem coded speech?

6 Candidate models: G.711A, AMR-WB, G.728, G.729A, G.729B,Cocktail.

6 blind test sets: Combinations of ADPCM, GSM-8k, SPEEX.

Results

Tandem Encoding Decoding: Performance of top codecs

COCKTAIL

ADPCMGSM-8KSPEEX

GSM-8KADPCMSPEEX

GSM-8KSPEEXADPCM

ADPCMSPEEX

GSM-8K

SPEEXADPCMGSM-8K

SPEEXGSM-8KADPCM

TRAINED ACOUSTIC MODELS

6 BLINDTEST SETS✓

EVALUATE THE PERFORMANCE OF THE SELECTED TOP

ACOUSTIC MODELS+COCKTAIL

G.711A AMR-WB G.728 G.729A G.729B

Results

AMR−WB G.729B G.728 G.729A G.711A COCKTAIL MATCHED

The average (standard deviation) PER (%) for 6 acoustic models and 5 AMTs across six blind

test sets

Cocktail acoustic model is comparable to the matched condition.

Results

AMR−WB G.729B G.728 G.729A G.711A COCKTAIL MATCHED

The average (standard deviation) PER (%) for 6 acoustic models and 5 AMTs across six blind

test sets

Cocktail acoustic model is comparable to the matched condition.

Conclusion

Section 5

1 Introduction

2 Previous Works

3 Experiments

4 Results

5 Conclusion

Conclusion

Key Finding 1

NarrowbandHigh bit-rate

G.711A

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Conclusion

Key Finding 1

NarrowbandHigh bit-rate

G.711A

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Conclusion

Key Finding 2

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Conclusion

Key Finding 2

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

FEATUREEXTRACTION

RECOGNIZEDPHONEME

CLEAN/ DISTORTED

DUE TO CODECS

Conclusion

Summary

Conclusions

1 Studied the codec induced distortion on the ASR performance.

2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.

3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.

Future works

1 Effectiveness of the best performing models along with languagemodels.

2 Compensation of the codec induced distortions to aid ASR.

Conclusion

Summary

Conclusions

Future works

Conclusion

Summary

Conclusions

Future works

Conclusion

Summary

Conclusions

Future works

Conclusion

Summary

Conclusions

Future works

Conclusion

THANK YOU

A Comparative Study on the Effect of Different Codecs on ... · A Comparative Study on the E ect of...

Documents

VIDEO CODECS

EBU Evaluations of Multichannel Audio Codecs · EBU Evaluations of Multichannel Audio Codecs Phase 3 . ... Evaluations of Multichannel Audio Codecs ... Moonriver_Mancini :

LMR Codecs - GitHub Pages · 2020. 3. 20. · Some LMR Standards and their codecs Standard Past, Present and Future Codecs P25 IMBE, AMBE+2 TETRA ACELP, AMR 4.75(opt); Future TBD,

Multimedia Codecs on i

Stereo IP AUDIO codecs for SSL / STL

Negociacion de Codecs en Asterisk

A Primer on Codecs for Moving Image and Sound Archives · 2019-10-17 · A Primer on Codecs for Moving Image and Sound Archives & 10 Recommendations for Codec Selection and Management

Wavelet Transform Based Image Compression CODECS

VN-Matrix 325 Codecs • Setup Guide - extron.com€¦ · VN-Matrix ® 325 Codecs • Setup Guide. The Extron VN-Matrix 325 encoders/decoders (codecs) distribute digital SD, HD-SDI,

CODECS CODECS Versatile Systems from Barix and BSW

Amr Codecs

Msu Lossless Codecs Comparison 2007 Eng

JPEG 2000 Image Codecs Comparison...• It is hard to define a leader on high compression. ACDSee is the best on low compression rates. JPEG 2000 IMAGE CODECS COMPARISON CS MSU GRAPHICS&MEDIA

Data Formats and Codecs

ETSI Codecs

MPEG-4 Video Codecs Comparison - compression.rucompression.ru/video/codec_comparison/pdf/MSU_MPEG4_Comparison_eng.pdfMPEG-4 Video Codecs Comparison Project head: Dmitriy Vatolin Testing,

Multimedia Codecs on i - NXP Semiconductors · Comprehensive suite of optimized codecs (~40+ Audio/Video/Image Codecs) Highly optimized software that is coded by Freescale processor

A Primer on Codecs for Moving Image and Sound Archives€¦ · A Primer on Codecs for Moving Image and Sound Archives & 10 Recommendations for Codec Selection and Management by chris

Fast JPEG Codec on the GPU - Nvidiadeveloper.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S... · JPEG Codecs: GPU vs. CPU Performance summary for the fastest JPEG codecs Accusoft

Audio Codecs