CS626-460: Lecture 34

Pronunciation Scoring for Language Learners Using a Phone Recognition System

Presented by K.L. Srinivas (M.Tech 2nd year)
Guided by Prof. Preeti Rao (Elect. Dept)

Department of Electrical Engineering, IIT Bombay, Mumbai, India


Introduction

Pronunciation refers to the manner in which a particular word of a language is uttered.

Motivation

• Accurate pronunciation (articulation) is a vital component of the language acquisition process.
• The fluency of a non-native speaker of a language can be judged by pronunciation and prosody.
• Learners often do not have access to a classroom environment.

Subjective evaluation

Word spoken: Kaleidoscopic (audio samples: Speaker 1, Speaker 2)

Problem statement

• Develop a computer-based automatic pronunciation scoring system.
• Assess how close the language learner's pronunciation is to that of a reference speaker (already stored in the system).
• Provide the language learner with a pronunciation score and feedback.

A brief on Automatic Speech Recognition

Introduction

• Automatic speech recognition (ASR) is the process of converting an acoustic speech signal into a sequence of words.
• The goal is to get a computer to understand spoken language.
• Approaches to ASR:
    • Template matching
    • Knowledge-based (rule-based) approach
    • Statistical approach (machine learning)

Statistical approach:

• Collect a large corpus of transcribed speech recordings.
• Train the computer to learn the correspondence between the recordings and their transcriptions (machine learning).
• At run time, apply statistical processes to search through the space of all possible solutions and pick the statistically most likely one.

Speech recognition toolkits:

• Sphinx and HTK are two widely accepted and widely used speech recognition toolkits.
    • CMU Sphinx: Carnegie Mellon University (CMU)
    • HTK: Cambridge University
• Both frameworks are used for developing, training and testing speech models from existing speech corpus data.
• Both use hidden Markov modeling (HMM) techniques.

MFCC feature vector:

• Mel-frequency cepstral coefficients (MFCCs) are a popular choice of feature.
• Frame size: 25 ms
• Hop size: 10 ms
• 39 features per 10 ms frame:
    • Absolute: log frame energy (1) and MFCCs (12)
    • Delta: first-order derivatives of the 13 absolute coefficients
    • Delta-delta: second-order derivatives of the 13 absolute coefficients
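As a rough check of the numbers on this slide, the sketch below counts the frames produced by a 25 ms window with a 10 ms hop and derives the 39-dimensional feature size. The 16 kHz sampling rate, the one-second signal and the function name `frame_count` are assumptions made for illustration only; the slides do not state them. The sketch also assumes that only full frames are kept (no padding at the edges).

```python
def frame_count(num_samples: int, sample_rate: int,
                frame_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Number of full analysis frames, assuming no padding at the signal edges."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    hop_len = int(round(sample_rate * hop_ms / 1000.0))      # samples per hop
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop_len


# 13 absolute coefficients: 1 log frame energy + 12 MFCCs
N_ABSOLUTE = 1 + 12
# absolute + delta + delta-delta coefficients
FEATURE_DIM = 3 * N_ABSOLUTE

if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate, not given in the slides
    n_frames = frame_count(sample_rate, sample_rate)  # one second of audio
    print(n_frames, "frames of", FEATURE_DIM, "features each")
```

With these assumed settings one second of speech yields a little under 100 feature vectors, which is why frame indices in decoder output can be converted to time using the 10 ms hop.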

Sphinx 3:

[Figure: block diagrams of Sphinx 3 training and of testing / decoding]

Decoder output:

Recognition hypothesis:
• Gives the single best recognition result for each utterance processed.
• It is a linear word sequence, with the time segmentation and scores of each word.

Output format:
<word> <start frame> <end frame> <AScr> <LM Score> <AScore + LM Score> <Ascale>
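To make the output format concrete, here is a minimal parsing sketch. It assumes that each segment is one whitespace-separated line with exactly the seven fields listed on the slide, in that order, and that the word is a single token; the actual decoder output may differ in layout. The class name, the function names and the sample line (including all of its numbers) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HypSegment:
    word: str
    start_frame: int
    end_frame: int
    ascr: float      # acoustic score (AScr)
    lm_score: float  # language model score
    total: float     # acoustic score + language model score
    ascale: float    # Ascale field, kept as a plain number


def parse_segment(line: str) -> HypSegment:
    """Parse one segment line, assuming the seven-field layout from the slide."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    word, start, end, ascr, lm, total, ascale = fields
    return HypSegment(word, int(start), int(end),
                      float(ascr), float(lm), float(total), float(ascale))


def duration_ms(seg: HypSegment, hop_ms: float = 10.0) -> float:
    """Segment duration, using the 10 ms hop size from the MFCC slide."""
    return (seg.end_frame - seg.start_frame + 1) * hop_ms


if __name__ == "__main__":
    sample = "school 12 47 -250000 -30000 -280000 3.0"  # made-up values
    seg = parse_segment(sample)
    print(seg.word, duration_ms(seg), "ms")
```

The frame indices are what the later scoring steps need: they give the duration of each segment, and the acoustic score of the segment can be normalized by that duration.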

Automatic Speech Recognition for non-native speech

• Characteristics of non-native speech:
    • Phone substitutions: S in the word ‘she’ pronounced as s
    • Phonotactic constraints: the stop cluster sk in ‘school’ pronounced as iskUl
• Use of a language model masks the non-nativeness during recognition.
• The accuracy of state-of-the-art phone recognition systems is as low as 50%-70%.
• Traditional ASR techniques cannot be used for non-native speech.
• Phone recognition has to be carried out in constrained mode.

Back to pronunciation scoring

Pronunciation Scoring System

[Block diagram. Inputs: input speech signal; canonical transcription of the utterance. Blocks: Generation of Pronunciation Variants; Constrained Phone Decoder; Variant Selection; Boundary Refinement and Prosodic Analysis. Outputs: Prosody Score; Articulation Score; Pronunciation Score.]

Pronunciation Variants

Word: Fundamentals

Canonical form: SIL f aa n d aa m ee n’ clt t aa l s SIL
Variant_1:      SIL f aa n d a m ee n’ clt t aa l s SIL
Variant_2:      SIL f aa n d ee m ee n’ clt t aa l s SIL

Challenges
• No ready database of speakers of Indian English.
• Multiple L1s among Indian speakers pose further challenges.
• Native Hindi and native English databases are available.

Constrained Phone Decoding

HMM-based recognizers:
• HTK 3.4
• Sphinx 3

Decoding

[Block diagram: Input speech utterance -> Extraction of MFCC feature vectors -> Decoder in forced alignment mode (also taking the acoustic models from training and the variants as input) -> Aligned phone sequence with likelihood for each variant]

Variant Selection

[Block diagram: Aligned phone sequence with likelihood for each variant -> Select variant with the highest likelihood -> Visual feedback and articulation score]

• Strik and Cucchiarini (2000): pronunciation variations and modeling.
• Goronzy, Rapp and Kompe (2004): non-native pronunciation variations and their generation (native English speakers speaking German).
• Wesenick and Schiel (1994), Wesenick (1996): generation of rules for German pronunciation variations.
• Franco et al. (1997), Franco et al. (2000): a paradigm for automatic assessment of pronunciation quality.
• Witt and Young (2000): presented a likelihood-based goodness of pronunciation scheme.
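The variant selection step on this slide (keep the variant whose forced alignment has the highest likelihood) can be sketched as below. The function name and the log-likelihood values are hypothetical; in practice the likelihoods would come from the constrained decoder.

```python
from typing import Dict, Tuple


def select_variant(variant_loglik: Dict[str, float]) -> Tuple[str, float]:
    """Return the name and log-likelihood of the most likely variant."""
    if not variant_loglik:
        raise ValueError("no variants to choose from")
    best = max(variant_loglik, key=variant_loglik.get)
    return best, variant_loglik[best]


if __name__ == "__main__":
    # Made-up forced-alignment log-likelihoods for the word "Fundamentals"
    scores = {
        "Canonical": -51230.0,
        "Variant_1": -50980.0,
        "Variant_2": -50110.0,
    }
    name, loglik = select_variant(scores)
    print(name, loglik)
```

Because all variants are aligned against the same utterance, their total log-likelihoods are directly comparable and no length normalization is applied in this sketch.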

Databases

• TIMIT database
    • 630 speakers of 8 major dialects of American English.
    • Each speaking 10 phonetically rich sentences.
• TIFR database
    • 100 native speakers of Hindi.
    • Each speaking 10 phonetically rich sentences.
• Indian English database (testing)
    • 30 Indian college students, each speaking the 2 common sentences from the TIMIT database.

Training

• 47-class TIMIT models: the entire phone set from TIMIT.
• 52-class union models: the entire TIMIT phone set (47 phones) plus 5 additional phones from the TIFR phone set, making a total of 52 phones.
• 48-class union models: the entire TIFR Hindi phone set (36 phones) plus 12 phones from TIMIT.

Experiments and Evaluation

The focus of this work is to investigate the effect of selecting the phone models from one of the 47-class, 52-class union and 48-class union phone model sets.

• Method I: the number of instances in which the surface transcription is within the top N decoded sequences in terms of likelihood score.
• Method II: the edit distance between the most likely phone sequence and the surface transcription, in terms of %correct and %accuracy.
• Method III: normalized likelihood error. A value of "0" for this measure indicates the best achievable performance.

Performance of Methods I and II

Tabulation of Method I and Method II evaluation for HTK 3.4 and Sphinx 3. For Method I the columns give the count of cases with the reference transcription in the top 1 and top 5.

                          HTK 3.4                          Sphinx 3
Decoder   # of unique   Method I       Method II         Method I       Method II
models    variants      Top 1  Top 5   %Corr   %Acc      Top 1  Top 5   %Corr   %Acc

SA1
47class     636           5      7     82.4    79.4        2      6     83.8    80.2
52class    1263           6      9     81.8    80.0        1      6     82.5    78.4
48class     763          21     24     96.2    94.6       12     17     92.2    89.1

SA2
47class    1026           6     11     88.0    85.8        5      9     88.8    86.4
52class    1026           7     13     88.6    87.0        5     10     85.9    83.8
48class    1026          16     20     92.8    92.2        7      9     89.4    87.7
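The %Corr and %Acc figures of Method II can be illustrated with a small edit-distance sketch. It uses the common definitions %Corr = 100 (N - D - S) / N and %Acc = 100 (N - D - S - I) / N, where N is the number of reference phones and S, D, I are substitutions, deletions and insertions (these are the definitions used by HTK's scoring tool). The alignment below uses unit costs for all three error types, which is an assumption; scoring tools may weight them differently. Function names are hypothetical, and the example reuses the 'school' pronounced as iskUl case from the earlier slide.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) of one
    minimum edit distance alignment with unit costs."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the end to count each type of operation
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def percent_correct_accuracy(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (%Corr, %Acc) of the hypothesis against the reference."""
    hits, subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * hits / n, 100.0 * (hits - ins) / n


if __name__ == "__main__":
    reference = ["s", "k", "U", "l"]
    decoded = ["i", "s", "k", "U", "l"]
    print(percent_correct_accuracy(reference, decoded))
```

The example shows why both numbers are reported: an inserted phone leaves %Corr untouched but lowers %Acc.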

Performance of Method III

[Figure: distribution of the likelihood scores across the 60 utterances]

The 48-phone class has the average likelihood error closest to zero of the three phone sets.

Articulation scoring methods:

• The articulation score indicates how close the language learner's pronunciation is to the pronunciation of a native speaker of the target language.
• It detects phoneme-level mispronunciation and the extent to which a phoneme has been mispronounced.
• The algorithm uses speech models derived from a speech database of native speakers.
• It uses forced alignment tools in the background to get acoustic scores (a quantitative measure of the acoustic fit for a particular speech segment).
• Two methods were investigated:
    • GOP (Goodness of Pronunciation) score [2].
    • Method by Sunil K. Gupta [9].

GOP scoring method:

• The score is the confidence with which a particular phone has been recognized; it is called the Goodness of Pronunciation (GOP) score.
• The GOP score of phone $p$ is the log posterior probability of $p$ given its acoustic segment $O^{(p)}$, normalized by the number of frames $NF(p)$ in the segment:

$$\mathrm{GOP}(p) = \frac{\log P\left(p \mid O^{(p)}\right)}{NF(p)}$$

$$\mathrm{GOP}(p) = \frac{1}{NF(p)} \log \frac{P\left(O^{(p)} \mid p\right) P(p)}{\max_{q \in Q} P\left(O^{(p)} \mid q\right) P(q)}$$

where $Q$ is the set of all phone models.

GOP scoring method (cont.):

Dropping the phone priors (that is, taking them to be equal) leaves a difference of two duration-normalized log-likelihoods:

$$\mathrm{GOP}(p) = \frac{\log P\left(O^{(p)} \mid p\right)}{NF(p)} - \frac{\log \max_{q \in Q} P\left(O^{(p)} \mid q\right)}{NF(p)} = l_p - l_g$$

[Figure: block diagram for articulation scoring]
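A minimal sketch of the GOP computation in its $l_p - l_g$ form follows. It assumes the forced-alignment log-likelihood of the expected phone and the best free-decoding log-likelihood over the same segment are already available; the function name, the example values and the rejection threshold are hypothetical.

```python
def gop_score(forced_loglik: float, free_loglik: float, num_frames: int) -> float:
    """GOP(p) = l_p - l_g, both log-likelihoods normalized by the frame count.

    forced_loglik: log P(O | p) for the expected phone p
    free_loglik:   max over phones q of log P(O | q) on the same segment
    """
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    l_p = forced_loglik / num_frames
    l_g = free_loglik / num_frames
    return l_p - l_g


if __name__ == "__main__":
    score = gop_score(forced_loglik=-620.0, free_loglik=-600.0, num_frames=10)
    threshold = -1.5  # hypothetical rejection threshold
    print(score, "mispronounced" if score < threshold else "accepted")
```

Since the maximum over all phones includes the expected phone itself, the score in this form is never positive; values near zero mean the expected phone fits the segment about as well as the best competing phone.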

Method by Sunil K. Gupta:

• Shortcomings of the GOP score:
    • Threshold selection was based on the subjective ratings of human judges.
    • It does not provide a quantitative measure of the extent of mispronunciation.
    • The free decoder is not accurate enough, leading to alignment errors.
• In this method two speech models have to be prepared:
    • 48-class phone models (36 TIFR Hindi + 12 TIMIT English)
    • Garbage model (all phonemes of the speech data combined into one speech model)

Garbage model:

• A single speech model combining all the phonemes of the speech data.
• The entire speech corpus is trained with a garbage transcription.

Methodology (cont.):

• The utterance is force-aligned with the reference transcription using Sphinx3_align and the 48-class phone models.
    • Each phoneme of the transcription gets its own acoustic score.
    • These log-likelihood scores are duration-normalized, giving $l_q$ for phoneme $q_i$.
• Similarly, the utterance is force-aligned with the garbage transcription using the garbage model, giving $l_g$ for $q_i$.
• The difference between these two likelihoods is the likelihood score of the current phoneme:

$$d = l_q - l_g \quad \text{for } q_i$$

• This difference score $d$ is used to obtain the phone articulation score from a lookup score table (explained in the next slide).

Formation of score table:

• For each utterance an "in-grammar" and an "out-grammar" transcription are formed:
    • In-grammar: the transcription conforms to the target acoustic waveform.
    • Out-grammar: the transcription is a random phrase from the training database that does not conform to the target acoustic waveform.
• The in-grammar and out-grammar transcriptions are force-aligned to obtain log-likelihood scores:
    • In-grammar: $l_q^i$ and $l_g^i$ for $q_i$, with $d^i = l_q^i - l_g^i$
    • Out-grammar: $l_q^o$ and $l_g^o$ for $q_i$, with $d^o = l_q^o - l_g^o$

Score table (cont.):

• Using all the in-grammar points and out-grammar points, a pdf is formed for each phoneme:

$$f_i = N(\mu_i, \sigma_i), \qquad f_o = N(\mu_o, \sigma_o)$$

• These probability density functions are used to build the score table (shown in the results section).

Results:

• [Figure: histograms and Gaussian pdfs (approximating the data points) for both in-grammar and out-grammar, phoneme "aa"]
• [Figure: histograms and Gaussian pdfs (approximating the data points) for both in-grammar and out-grammar, phoneme "ee"]
• [Figure: combined pdf of in-grammar and out-grammar for "aa"]
• [Figure: combined pdf of in-grammar and out-grammar for "ee"]

Results (score table):

• The calculations and table below are for phoneme "aa".
• $f$ denotes a probability density function.
• $\mu_i$ and $\mu_o$ are the in-grammar and out-grammar means respectively.
• The two densities are compared through

$$h(x) = \log f_i(x) - \log f_o(x)$$

• At the in-grammar and out-grammar means:

$$h(\mu_i) = \log f_i(\mu_i) - \log f_o(\mu_i) = 1.242$$

$$h(\mu_o) = \log f_i(\mu_o) - \log f_o(\mu_o) = -0.272$$

• The score table for phoneme "aa" is on the next slide.
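The function $h(x) = \log f_i(x) - \log f_o(x)$ can be evaluated with a few lines of code once Gaussians have been fitted to the in-grammar and out-grammar difference scores. The sketch below treats $\sigma$ as a standard deviation and uses maximum-likelihood estimates, both of which are assumptions (the slides only write $N(\mu, \sigma)$). All function names and sample points are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_gaussian(points: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and standard deviation of the points."""
    n = len(points)
    mu = sum(points) / n
    var = sum((p - mu) ** 2 for p in points) / n
    return mu, math.sqrt(var)


def log_gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Log of the normal density with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))


def h(x: float, in_params: Tuple[float, float],
      out_params: Tuple[float, float]) -> float:
    """h(x) = log f_i(x) - log f_o(x) for in-grammar and out-grammar Gaussians."""
    return log_gaussian_pdf(x, *in_params) - log_gaussian_pdf(x, *out_params)


if __name__ == "__main__":
    # Made-up difference scores d for one phoneme
    in_points = [-1.0, 0.0, 1.0, 0.5, -0.5]
    out_points = [-4.0, -3.0, -2.0, -3.5, -2.5]
    f_in = fit_gaussian(in_points)
    f_out = fit_gaussian(out_points)
    print("h(mu_i) =", h(f_in[0], f_in, f_out))
    print("h(mu_o) =", h(f_out[0], f_in, f_out))
```

As on the slide, $h$ is positive at the in-grammar mean and negative at the out-grammar mean whenever the two distributions are reasonably separated; those two values anchor the score table.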

Score table (phoneme "aa"):

D (threshold expression)              h(x)       Score
h(µ_i)                                1.242      100
h(µ_i) · log(90/10) / log(10)         1.185       90
h(µ_i) · log(80/20) / log(10)         0.748       80
h(µ_i) · log(70/30) / log(10)         0.457       70
h(µ_i) · log(60/40) / log(10)         0.219       60
h(µ_i) · log(50/50) / log(10) = 0     0           50
h(µ_o) · log(60/40) / log(10)        -0.048       40
h(µ_o) · log(70/30) / log(10)        -0.1001      30
h(µ_o) · log(80/20) / log(10)        -0.164       20
h(µ_o) · log(90/10) / log(10)        -0.259       10
h(µ_o)                               -0.272        0
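The thresholds in the score table follow one pattern: for a score s above 50 the threshold is h(µ_i) · log(s / (100 - s)) / log(10), and for the mirrored score 100 - s it is h(µ_o) times the same factor. The sketch below rebuilds the table from the two values given for phoneme "aa" and matches the slide's numbers to about three decimals. The lookup rule (take the highest score whose threshold does not exceed the observed value) is an assumption, since the slides do not spell out how the table is read; function names are hypothetical.

```python
import math
from typing import Dict


def score_thresholds(h_mu_in: float, h_mu_out: float) -> Dict[int, float]:
    """Map each score (0, 10, ..., 100) to its h(x) threshold."""
    table = {100: h_mu_in, 50: 0.0, 0: h_mu_out}
    for s in (60, 70, 80, 90):
        factor = math.log(s / (100 - s)) / math.log(10)
        table[s] = h_mu_in * factor          # upper half, scaled from h(mu_i)
        table[100 - s] = h_mu_out * factor   # lower half, scaled from h(mu_o)
    return table


def lookup_score(h_value: float, table: Dict[int, float]) -> int:
    """Assumed lookup rule: highest score whose threshold is <= h_value."""
    eligible = [s for s, threshold in table.items() if h_value >= threshold]
    return max(eligible) if eligible else 0


if __name__ == "__main__":
    table = score_thresholds(1.242, -0.272)  # values for phoneme "aa"
    for s in sorted(table, reverse=True):
        print(f"{s:3d}  {table[s]: .4f}")
    print(lookup_score(0.8, table))
```

The log(s / (100 - s)) / log(10) factor is 0 at s = 50 and close to 1 at s = 90, so the table interpolates between "equally likely under both densities" and "as good as the typical in-grammar (or out-grammar) point".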

Result (Speaker 1: Fundamentals):

         Correct pronun.        Incorrect pronun.
Phone    d          Score       d          Score
h        -8647                  -90232
aa       -14397     60%         -2990      80%
n        -99.5                  -8224
d        -23238                 -27363
aa       -15280     60%         -40304     0%
m        -756                   -813
ee       -19837                 -19767
n’       -7571                  -5941
SI       -12570                 -5250
t        -2023                  -6451
aa       -8659.5    80%         -10531     70%
l        -10920                 -2014

Result (Speaker 2: Fundamentals):

         Correct pronun.        Incorrect pronun.
Phone    d          Score       d          Score
h        370                    -9969
aa       -10269     70%         -10499     70%
n        -9675                  -4115
d        -25162                 -17556
aa       -2295      80%         -46778     0%
m        -4100                  -11144
ee       -24043                 -34685
n’       4284                   -5253
SI       -4595                  -10971
t        -10534                 -5146
aa       -14271     50%         -7271      80%
l        -10917                 -18839

Duration scoring:

• The duration score provides feedback on the normalized relative duration difference between the language learner's speech and the reference speaker's speech.
• It indicates whether a particular syllable is stressed or not.
• If $L_i$ and $R_i$ are the respective durations corresponding to phoneme $q_i$, then an utterance consisting of $N$ phones can be denoted by:

$$L = (L_1, L_2, \ldots, L_N) \quad \text{for language learner speech}$$

$$R = (R_1, R_2, \ldots, R_N) \quad \text{for reference speaker speech}$$

• The normalized durations are given by:

$$\hat{L}_i = \frac{L_i}{\sum_{j=1}^{N} L_j} \quad \text{and} \quad \hat{R}_i = \frac{R_i}{\sum_{j=1}^{N} R_j}$$

Duration scoring (cont.):

• The overall duration score is given by:

$$D = \max\left(0,\; 1 - \sum_{i=1}^{N} \left| \hat{L}_i - \hat{R}_i \right| \right)$$

• The maximum duration score is 1 and the minimum is 0.
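The duration score can be sketched in a few lines. The code normalizes each speaker's phone durations by their total and subtracts the summed absolute differences from 1, clipping at 0; the absolute value is assumed here, since without it the normalized differences would always sum to zero. Function names and the example durations are hypothetical.

```python
from typing import List, Sequence


def normalize(durations: Sequence[float]) -> List[float]:
    """Divide each duration by the total duration of the utterance."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return [d / total for d in durations]


def duration_score(learner: Sequence[float], reference: Sequence[float]) -> float:
    """D = max(0, 1 - sum_i |L_hat_i - R_hat_i|) over aligned phones."""
    if len(learner) != len(reference):
        raise ValueError("both utterances must contain the same phones")
    l_hat = normalize(learner)
    r_hat = normalize(reference)
    return max(0.0, 1.0 - sum(abs(l - r) for l, r in zip(l_hat, r_hat)))


if __name__ == "__main__":
    # Made-up phone durations in frames for a two-phone utterance
    print(duration_score([1, 1], [3, 1]))
```

Because the durations are normalized first, a learner who speaks uniformly slower or faster than the reference still receives the maximum score; only changes in the relative timing of phones lower it.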

Duration scoring (results):

• Speaker_1 was taken as the reference and duration scores were calculated for the other speakers.
• [Figure: Speaker_1]
• [Figure: Speaker_1 vs Speaker_2] Duration score = 0.573 (low due to differences in the durations of ‘f’, ‘a’ and ‘s’).
• [Figure: Speaker_1 vs Speaker_3] Duration score = 0.485 (low due to differences in the durations of ‘f’, ‘a’ and ‘s’).

Feedback, Articulation and Duration Score

Speaker 1: canonical transcription (reference speaker)
SIL f aa n d aa m ee n’ clt t aa l s SIL

Speaker 2: transcription
SIL f aa n d ee m ee n’ clt t aa l s SIL

Feedback
SIL f aa n d ee m ee n’ clt t aa l s SIL
Articulation Score: 72%    Duration Score: 0.573

References

1) Strik, H., Neri, A., and Cucchiarini, C. 2008. Speech technology for language tutoring. In Proceedings of LangTech (Rome, Italy, February 28-29, 2008).

2) Witt, S., and Young, S. 2000. Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication. Vol. 30, pp. 95-108.

3) Franco, H., et al. 2000. Automatic scoring of pronunciation quality. Speech Communication. Vol. 30, pp. 83-93.

4) Kawai, G., and Hirose, K. 1998. A method for measuring the intelligibility and non-nativeness of phone quality in foreign language pronunciation training. In Proceedings of ICSLP-98 (Sydney, Australia, November 30 - December 4, 1998), pp. 1823-1826.

5) Goronzy, S., Rapp, S., and Kompe, R. 2004. Generating non-native pronunciation variants for lexicon adaptation. Speech Communication. Vol. 42, pp. 109-123.

6) Lee, K.F. 1988. Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system. Ph.D. dissertation, Comput. Sci. Dep., Carnegie Mellon University.

7) Young, S., et al. 2006. The HTK Book v3. Cambridge University.

8) Samudravijaya, K., Rawat, K.D., and Rao, P.V.S. 1998. Design of Phonetically Rich Sentences for Hindi Speech Database. J. Ac. Soc. Ind. Vol. XXVI, December 1998, pp. 466-471.

9) Gupta, S.K., Lu, Z., and Zhao, F. 2007. Automatic Pronunciation Scoring for Language Learning. U.S. Patent 7,219,059, May 15, 2007.