View
0
Download
0
Category
Preview:
Citation preview
A Comparative Study on the Effect of Different Codecson Speech Recognition Accuracy Using Various Acoustic
Modeling Techniques
Srinivasa Raghavan1, Nisha Meenakshi G1, Sanjeev Kumar Mittal1,Chiranjeevi Yarra1, Anupam Mandal2, K.R. Prasanna Kumar2,
Prasanta Kumar Ghosh1
1SPIRE LAB, Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India,2Center for AI and Robotics, Bangalore, Karnataka, India
x February 2017
SPIRE LAB, IISc, Bangalore 1
Introduction
Section 1
1 Introduction
2 Previous Works
3 Experiments
4 Results
5 Conclusion
SPIRE LAB, IISc, Bangalore 2
Introduction
Focus
A Comparative Study on the Effect of Different Codecs on SpeechRecognition Accuracy Using Various Acoustic Modeling Techniques.
SPIRE LAB, IISc, Bangalore 3
Introduction
Speech Coding & Automatic Speech Recognition (ASR)
SPIRE LAB, IISc, Bangalore 4
Note
1 The Channel Effect is not considered.
2 Effect of Language Model is not considered.
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZEDPHONEME
ASR with Codec Distorted Input
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Recognition
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Coding
Introduction
Speech Coding & Automatic Speech Recognition (ASR)
SPIRE LAB, IISc, Bangalore 4
Note
1 The Channel Effect is not considered.
2 Effect of Language Model is not considered.
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZEDPHONEME
ASR with Codec Distorted Input
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Recognition
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Coding
Introduction
Speech Coding & Automatic Speech Recognition (ASR)
SPIRE LAB, IISc, Bangalore 4
Note
1 The Channel Effect is not considered.
2 Effect of Language Model is not considered.
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZEDPHONEME
ASR with Codec Distorted Input
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Recognition
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Coding
Introduction
Speech Coding & Automatic Speech Recognition (ASR)
SPIRE LAB, IISc, Bangalore 4
Note
1 The Channel Effect is not considered.
2 Effect of Language Model is not considered.
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZEDPHONEME
ASR with Codec Distorted Input
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Recognition
ENCODER DECODERCHANNEL
CODEC TYPES:WAVEFORM
PARAMETRICHYBRID
FEATUREEXTRACTION
ACOUSTIC MODELING TECHNIQUES (AMT)
GMM-HMMSGMM
DNN
ACOUSTIC MODEL
+LANGUAGE
MODEL
RECOGNIZED PHONEME
Speech Coding
Introduction
Common Speech Coders
SPIRE LAB, IISc, Bangalore 5
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
VoIPG.711AG.729AG.729BSPEEXADPCM
MobileG.728MELP
GSM-8k
WirelessAMR-WBAMR-NB
Speech Coding
Introduction
Common Speech Coders
SPIRE LAB, IISc, Bangalore 5
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
VoIPG.711AG.729AG.729BSPEEXADPCM
MobileG.728MELP
GSM-8k
WirelessAMR-WBAMR-NB
Speech Coding
Introduction
Common Speech Coding Strategies
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
Single Encoding-Decoding
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
Tandem Encoding-Decoding
SPIRE LAB, IISc, Bangalore 6
Introduction
Common Speech Coding Strategies
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
Single Encoding-Decoding
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
Tandem Encoding-Decoding
SPIRE LAB, IISc, Bangalore 6
Introduction
Problem statement
FEATUREEXTRACTION
ACOUSTIC MODEL
RECOGNIZEDPHONEME
DATA AMT
CLEAN/ DISTORTED
DUE TO CODECS
Problem statement
What is that specific codec trained acoustic model, that performs wellfor different types of input speech (coded or clean PCM) across differentAMTs? Robust to codec induced distortions.
SPIRE LAB, IISc, Bangalore 7
Introduction
Problem statement
FEATUREEXTRACTION
ACOUSTIC MODEL
RECOGNIZEDPHONEME
DATA AMT
CLEAN/ DISTORTED
DUE TO CODECS
Problem statement
What is that specific codec trained acoustic model, that performs wellfor different types of input speech (coded or clean PCM) across differentAMTs? Robust to codec induced distortions.
SPIRE LAB, IISc, Bangalore 7
Introduction
Problem statement
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
SPIRE LAB, IISc, Bangalore 8
Introduction
Problem statement
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
SPIRE LAB, IISc, Bangalore 8
Introduction
Key Finding 1
SPIRE LAB, IISc, Bangalore 9
G.711A
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
Introduction
Key Finding 1
SPIRE LAB, IISc, Bangalore 9
G.711A
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
Introduction
Key Finding 2
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
C1
CN
C3
C2
CN
COCKTAIL ACOUSTIC MODEL
Cocktail Acoustic Model
SPIRE LAB, IISc, Bangalore 10
Introduction
Key Finding 2
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
C1
CN
C3
C2
CN
COCKTAIL ACOUSTIC MODEL
Cocktail Acoustic Model
SPIRE LAB, IISc, Bangalore 10
Previous Works
Section 2
1 Introduction
2 Previous Works
3 Experiments
4 Results
5 Conclusion
SPIRE LAB, IISc, Bangalore 11
Previous Works
Existing Literature
Single Encoding-Decoding
1 Lower recognition for low bit-rate codecs [Euler et al. (1994), Lilly et al. (1996)].
2 Study of speech recognition with GSM codecs [Kim et al. (2000), H.-G. Hirsch (2002)].
3 ASR under noisy conditions using G.729, G.723.1 and GSM codecs [Grande et al. (2001)]
Tandem Encoding-Decoding
1 Impact on ASR performance more for low bit-rate codecs [Lilly et al. (1996)].
2 Study of ASR performance under unkown Tandem scenario [Salonidis et al. (1998)].
Compensation Strategies
1 Enhancement of the decoded speech, robust feature extraction [Dufour et al. (1996)]
2 Adaptation of acoustic models [Mokbel et al.. (1997), Salonidis et al. (1998),Srinivasamurthy et al. (2001)]
SPIRE LAB, IISc, Bangalore 12
Experiments
Section 3
1 Introduction
2 Previous Works
3 Experiments
4 Results
5 Conclusion
SPIRE LAB, IISc, Bangalore 13
Experiments
AMTs and Codecs
SPIRE LAB, IISc, Bangalore 14
List of codecs
Codec Type Band-width
Bit-Rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Details1 Kaldi toolkit [Povey et al. (2011)].
2 ASR metric: Phoneme Error Rate (PER)
3 Codecs source: IT-UT standards, SoX,SPEEX.
4 0-gram language model.
Acoustic Modeling Techniques (AMT)
1 Monophone based GMM-HMM (MONO)
2 Context-dependent triphone basedGMM-HMM (CD-TRI)
3 The Subspace Gaussian models withboosted Maximum Mutual Information(SGMM)
4 DNN with DBN Pretraining (DNN-DP)
5 DNN with state-level MBR(DNN-DP-sMBR)
Experiments
Datasets
SPIRE LAB, IISc, Bangalore 15
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
8 acoustic models using singleencoding-decoding.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
TIMIT database. Sampling rate:8kHz.
Training set: 462 speakers with3696 utterances.
Development Set: 50 speakers with400 utterances.
Test Set: 24 speakers with 192utterances.
Experiments
Datasets
SPIRE LAB, IISc, Bangalore 15
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
8 acoustic models using singleencoding-decoding.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
TIMIT database. Sampling rate:8kHz.
Training set: 462 speakers with3696 utterances.
Development Set: 50 speakers with400 utterances.
Test Set: 24 speakers with 192utterances.
Experiments
Datasets
SPIRE LAB, IISc, Bangalore 15
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
8 acoustic models using singleencoding-decoding.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
TIMIT database. Sampling rate:8kHz.
Training set: 462 speakers with3696 utterances.
Development Set: 50 speakers with400 utterances.
Test Set: 24 speakers with 192utterances.
Experiments
Datasets
SPIRE LAB, IISc, Bangalore 15
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
8 acoustic models using singleencoding-decoding.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
TIMIT database. Sampling rate:8kHz.
Training set: 462 speakers with3696 utterances.
Development Set: 50 speakers with400 utterances.
Test Set: 24 speakers with 192utterances.
Experiments
Datasets
SPIRE LAB, IISc, Bangalore 15
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
Codec Type Band-width
Bit-rate(kbps)
G.711A Waveform Narrow 64MELP Parametric Narrow 2.4AMR-NB
Hybrid Narrow 4.40
AMR-WB
Hybrid Wide 23.85
G.728 Hybrid Narrow 16G.729A Hybrid Narrow 8G.729B Hybrid Narrow 8PCM Waveform Narrow 128ADPCM Waveform Wide 32GSM-8k
Hybrid Narrow 13
SPEEX Hybrid Wide 27.8
6 Tandem test databases: 1) ADPCM→GSM-8k→SPEEX, 2) ADPCM→SPEEX→GSM-8k,3) GSM-8k→ADPCM→SPEEX, 4) GSM-8k→SPEEX→ADPCM, 5)SPEEX→ADPCM→GSM-8k, 6) SPEEX→GSM-8k→ADPCM.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
8 acoustic models using singleencoding-decoding.
CODEC 1
CODEC 1CODEC N
CODEC 2
CODEC 1
CODEC 2
CODEC 2 CODEC N
CODEC 1 CODEC 2 CODEC 3
TIMIT database. Sampling rate:8kHz.
Training set: 462 speakers with3696 utterances.
Development Set: 50 speakers with400 utterances.
Test Set: 24 speakers with 192utterances.
Experiments
Overview of Experiments: Single Encoding Decoding
SPIRE LAB, IISc, Bangalore 16
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
✓
8DEVELOPMENT
SETS
8TEST SETS
EVALUATE THE PERFORMANCE OF THE SELECTED TOP ACOUSTIC MODELS
? ? ? ? ?TOP ACOUSTIC MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
✓FIND THE TOP ACOUSTIC MODELS FROM 8
? ? ? ? ?TOP ACOUSTIC
MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
G.711A AMR-WB G.728 G.729A G.729B COCKTAIL
G.711A AMR-WB G.728 G.729A G.729B AMR-NB
ADPCMGSM-8KSPEEX
GSM-8KADPCMSPEEX
GSM-8KSPEEXADPCM
ADPCMSPEEX
GSM-8K
SPEEXADPCMGSM-8K
SPEEXGSM-8KADPCM
6TRAINED
ACOUSTIC MODELS
6 BLINDTEST SETS
PCM
PCM
PCM
Experiments
Overview of Experiments: Single Encoding Decoding
SPIRE LAB, IISc, Bangalore 16
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
✓
8DEVELOPMENT
SETS
8TEST SETS
EVALUATE THE PERFORMANCE OF THE SELECTED TOP ACOUSTIC MODELS
? ? ? ? ?TOP ACOUSTIC MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
✓FIND THE TOP ACOUSTIC MODELS FROM 8
? ? ? ? ?TOP ACOUSTIC
MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
G.711A AMR-WB G.728 G.729A G.729B COCKTAIL
G.711A AMR-WB G.728 G.729A G.729B AMR-NB
ADPCMGSM-8KSPEEX
GSM-8KADPCMSPEEX
GSM-8KSPEEXADPCM
ADPCMSPEEX
GSM-8K
SPEEXADPCMGSM-8K
SPEEXGSM-8KADPCM
6TRAINED
ACOUSTIC MODELS
6 BLINDTEST SETS
PCM
PCM
PCM
Experiments
Overview of Experiments: Single Encoding Decoding
SPIRE LAB, IISc, Bangalore 16
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
✓
8DEVELOPMENT
SETS
8TEST SETS
EVALUATE THE PERFORMANCE OF THE SELECTED TOP ACOUSTIC MODELS
? ? ? ? ?TOP ACOUSTIC MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
✓FIND THE TOP ACOUSTIC MODELS FROM 8
? ? ? ? ?TOP ACOUSTIC
MODELS
PCM
PCM
PCM
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1 CODEC 1
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
G.711A AMR-WB G.728 G.729A G.729B COCKTAIL
G.711A AMR-WB G.728 G.729A G.729B AMR-NB
ADPCMGSM-8KSPEEX
GSM-8KADPCMSPEEX
GSM-8KSPEEXADPCM
ADPCMSPEEX
GSM-8K
SPEEXADPCMGSM-8K
SPEEXGSM-8KADPCM
6TRAINED
ACOUSTIC MODELS
6 BLINDTEST SETS
PCM
PCM
PCM
Experiments
Overview of Experiments: Tandem Encoding Decoding
SPIRE LAB, IISc, Bangalore 17
COCKTAIL
ADPCMGSM-8KSPEEX
GSM-8KADPCMSPEEX
GSM-8KSPEEXADPCM
ADPCMSPEEX
GSM-8K
SPEEXADPCMGSM-8K
SPEEXGSM-8KADPCM
TRAINED ACOUSTIC MODELS
6 BLINDTEST SETS✓
EVALUATE THE PERFORMANCE OF THE SELECTED TOP
ACOUSTIC MODELS+COCKTAIL
MODEL
? ? ? ? ?
Results
Section 4
1 Introduction
2 Previous Works
3 Experiments
4 Results
5 Conclusion
SPIRE LAB, IISc, Bangalore 18
Results
Single Encoding Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
Question
What are the best acoustic models across all the AMTs for variouscoded speech?
8 Candidate Models: G.711A, MELP, AMR-NB, AMR-WB, G.728,G.729A, G.729B, PCM.
8 development and 8 test datasets.
SPIRE LAB, IISc, Bangalore 19
Results
Single Encoding Decoding: Choice of Top codecs
SPIRE LAB, IISc, Bangalore 20
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
8DEVELOPMENT
SETS
8TEST SETS
✓FIND THE TOP ACOUSTIC MODELS FROM 8
? ? ? ? ?TOP ACOUSTIC
MODELS
PCM
PCM
PCM
Results
Single Encoding Decoding: Choice of Top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR15
20
25
30
35
40
45
50P
E R
(%
)
AMR−NB AMR−WB PCM G.711A G.728 G.729A G.729B MELP MATCHED
The average (standard deviation) PER (%) for 8 acoustic models and 5 AMTs across the
development sets.
ResultsPER decreases with the improvements in the AMTs.
Matched condition performs best across all the AMTs.
SPIRE LAB, IISc, Bangalore 21
Results
Single Encoding Decoding: Choice of Top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR15
20
25
30
35
40
45
50P
E R
(%
)
AMR−NB AMR−WB PCM G.711A G.728 G.729A G.729B MELP MATCHED
The average (standard deviation) PER (%) for 8 acoustic models and 5 AMTs across the
development sets.
ResultsPER decreases with the improvements in the AMTs.
Matched condition performs best across all the AMTs.
SPIRE LAB, IISc, Bangalore 21
Results
Single Encoding Decoding: Choice of Top codecs
AMR−WB PCM G.711A G.728 G.729A G.729B0
2
4
Codec
Count
Histogram of top four ranked codecs across different AMTs.
Results
Higher bit rate codecs.
Most of them are narrowband codecs.
SPIRE LAB, IISc, Bangalore 22
Results
Single Encoding Decoding: Choice of Top codecs
AMR−WB PCM G.711A G.728 G.729A G.729B0
2
4
Codec
Count
Histogram of top four ranked codecs across different AMTs.
Results
Higher bit rate codecs.
Most of them are narrowband codecs.
SPIRE LAB, IISc, Bangalore 22
Results
Single Encoding Decoding: Performance of top codecs
SPIRE LAB, IISc, Bangalore 23
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
G.711A AMR-WB G.728 G.729A G.729B AMR-NB MELP
8TRAINED
ACOUSTIC MODELS
✓
8DEVELOPMENT
SETS
8TEST SETS
EVALUATE THE PERFORMANCE OF THE SELECTED TOP ACOUSTIC MODELS
G.711A AMR-WB G.728 G.729A G.729BTOP ACOUSTIC MODELS
PCM
PCM
PCM
Results
Single Encoding Decoding: Performance of top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR15
20
25
30
35
40
45
50P
E R
(%
)
PCM G.729A AMR−WB G.728 G.729B G.711A MATCHED
The average (standard deviation) PER (%) for the top 5 acoustic models (along with PCM and
Mixed) and 5 AMTs across the test sets
ResultsPER decreases with the improvements in the AMTs.
Least PER for G.711A based acoustic model.
SPIRE LAB, IISc, Bangalore 24
Results
Single Encoding Decoding: Performance of top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR15
20
25
30
35
40
45
50P
E R
(%
)
PCM G.729A AMR−WB G.728 G.729B G.711A MATCHED
The average (standard deviation) PER (%) for the top 5 acoustic models (along with PCM and
Mixed) and 5 AMTs across the test sets
ResultsPER decreases with the improvements in the AMTs.
Least PER for G.711A based acoustic model.
SPIRE LAB, IISc, Bangalore 24
Results
Tandem Encoding Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
2C
3
Tandem Encoding-Decoding
Question
How do the top five acoustic models perform across all the AMTs fortandem coded speech?
6 Candidate models: G.711A, AMR-WB, G.728, G.729A, G.729B,Cocktail.
6 blind test sets: Combinations of ADPCM, GSM-8k, SPEEX.
SPIRE LAB, IISc, Bangalore 25
Results
Tandem Encoding Decoding: Performance of top codecs
SPIRE LAB, IISc, Bangalore 26
COCKTAIL
ADPCMGSM-8KSPEEX
GSM-8KADPCMSPEEX
GSM-8KSPEEXADPCM
ADPCMSPEEX
GSM-8K
SPEEXADPCMGSM-8K
SPEEXGSM-8KADPCM
TRAINED ACOUSTIC MODELS
6 BLINDTEST SETS✓
EVALUATE THE PERFORMANCE OF THE SELECTED TOP
ACOUSTIC MODELS+COCKTAIL
MODEL
G.711A AMR-WB G.728 G.729A G.729B
Results
Tandem Encoding Decoding: Performance of top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR20
25
30
35
40
45
50
55
60
P E
R (
%)
AMR−WB G.729B G.728 G.729A G.711A COCKTAIL MATCHED
The average (standard deviation) PER (%) for 6 acoustic models and 5 AMTs across six blind
test sets
ResultsPER decreases with the improvements in the AMTs.
Least PER for G.711A based acoustic model.
Cocktail acoustic model is comparable to the matched condition.
SPIRE LAB, IISc, Bangalore 27
Results
Tandem Encoding Decoding: Performance of top codecs
MONO CD−TRI SGMM DNN−DP DNN−DP−sMBR20
25
30
35
40
45
50
55
60
P E
R (
%)
AMR−WB G.729B G.728 G.729A G.711A COCKTAIL MATCHED
The average (standard deviation) PER (%) for 6 acoustic models and 5 AMTs across six blind
test sets
ResultsPER decreases with the improvements in the AMTs.
Least PER for G.711A based acoustic model.
Cocktail acoustic model is comparable to the matched condition.
SPIRE LAB, IISc, Bangalore 27
Conclusion
Section 5
1 Introduction
2 Previous Works
3 Experiments
4 Results
5 Conclusion
SPIRE LAB, IISc, Bangalore 28
Conclusion
Key Finding 1
SPIRE LAB, IISc, Bangalore 29
NarrowbandHigh bit-rate
codec
G.711A
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
Conclusion
Key Finding 1
SPIRE LAB, IISc, Bangalore 29
NarrowbandHigh bit-rate
codec
G.711A
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
N
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C
ACOUSTIC MODEL BUILT ON
DATA
?
Single Encoding-Decoding
Conclusion
Key Finding 2
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
2C
3
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
ACOUSTIC MODEL BUILT ON
DATA
C1
C3
C2
CN
COCKTAIL ACOUSTIC MODEL
C1
C2
C3
Cocktail Acoustic Model
SPIRE LAB, IISc, Bangalore 30
Conclusion
Key Finding 2
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
C1
ACOUSTIC MODEL BUILT ON
DATA
?C
2C
3
Tandem Encoding-Decoding
FEATUREEXTRACTION
RECOGNIZEDPHONEME
CLEAN/ DISTORTED
DUE TO CODECS
ACOUSTIC MODEL BUILT ON
DATA
C1
C3
C2
CN
COCKTAIL ACOUSTIC MODEL
C1
C2
C3
Cocktail Acoustic Model
SPIRE LAB, IISc, Bangalore 30
Conclusion
Summary
Conclusions
1 Studied the codec induced distortion on the ASR performance.
2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.
3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.
Future works
1 Effectiveness of the best performing models along with languagemodels.
2 Compensation of the codec induced distortions to aid ASR.
SPIRE LAB, IISc, Bangalore 31
Conclusion
Summary
Conclusions
1 Studied the codec induced distortion on the ASR performance.
2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.
3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.
Future works
1 Effectiveness of the best performing models along with languagemodels.
2 Compensation of the codec induced distortions to aid ASR.
SPIRE LAB, IISc, Bangalore 31
Conclusion
Summary
Conclusions
1 Studied the codec induced distortion on the ASR performance.
2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.
3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.
Future works
1 Effectiveness of the best performing models along with languagemodels.
2 Compensation of the codec induced distortions to aid ASR.
SPIRE LAB, IISc, Bangalore 31
Conclusion
Summary
Conclusions
1 Studied the codec induced distortion on the ASR performance.
2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.
3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.
Future works
1 Effectiveness of the best performing models along with languagemodels.
2 Compensation of the codec induced distortions to aid ASR.
SPIRE LAB, IISc, Bangalore 31
Conclusion
Summary
Conclusions
1 Studied the codec induced distortion on the ASR performance.
2 G.711A, a narrowband high bit rate codec, results in the bestASR accuracy.
3 If the pool of tandem topologies are known a priori, cocktailacoustic model could be used.
Future works
1 Effectiveness of the best performing models along with languagemodels.
2 Compensation of the codec induced distortions to aid ASR.
SPIRE LAB, IISc, Bangalore 31
Conclusion
THANK YOU
SPIRE LAB, IISc, Bangalore 32
Recommended