Introduction to Language Models for ASR


Page 1: Introduction to Language Models for ASR

Samudravijaya K (chief@tifr.res.in)
Tata Institute of Fundamental Research
04-JAN-2006

Page 2: Outline

Introduction to language models
    CSR: acoustic and language models + search
    Hierarchy of units: knowledge sources
Lexical models
    Pronunciation dictionary: list / lexical tree
    Pronunciation variability
Generation of word hypotheses
    Acoustic model: phone hypotheses -> lexical model: word hypotheses
    Word hypothesis (lattice) -> DAG
    Search for the most likely word sequence
Syntax and semantics
    Linguistically motivated models
    Statistical grammars
    Robust estimation of probabilities
    Measures of the quality of language models

Page 3: ASR: A Block Diagram

[Block diagram: speech signal -> Feature Extraction -> Acoustic Matching against the acoustic model (acoustic domain) -> symbol sequence -> Language Matching against the language model (symbolic domain) -> sentence hypothesis; training and testing paths are marked separately.]

Pages 4-6: Knowledge sources

Phone sequence / phone hypothesis lattice ==> sentence hypothesis

Lexicon:     man  vs.  mna
Syntax:      "Some man brought the apple."  vs.  "Apple the brought man some."
Semantics:   "Time flies like an arrow."  vs.  "Fruit flies like a banana."
Pragmatics:  "Turn left for the nearest chemist."

Page 7: Combining Acoustic and Language Models

Let Y be the acoustic feature sequence and W a word sequence. We want

    W* = argmax_W P(W|Y)

Bayes' rule: P(W|Y) = P(Y|W) P(W) / P(Y), so

    W* = argmax_W P(Y|W) P(W) / P(Y) = argmax_W P(Y|W) P(W)

since P(Y) does not depend on W.

CSR: acoustic model, language model and hypothesis search.
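
As a toy illustration of this noisy-channel combination (my sketch, not from the slides): given a few candidate transcriptions with acoustic and language-model log scores, pick the one maximising log P(Y|W) + log P(W). The hypotheses and scores below are made up.

    # Hypothetical candidates: (words, log P(Y|W) from the acoustic model,
    #                                  log P(W) from the language model)
    candidates = [
        ("some man brought the apple", -120.3, -18.2),
        ("sum man brought the apple",  -119.8, -25.9),
        ("some man bought the apple",  -123.1, -17.5),
    ]

    # W* = argmax_W P(Y|W) P(W); sum the scores in log space to avoid underflow.
    best = max(candidates, key=lambda c: c[1] + c[2])
    print("best hypothesis:", best[0])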

Page 8: Hierarchy of Units

    W* = argmax_W P(Y|W) P(W)

[Figure: the hierarchy of units, from acoustic features through phones and words up to the sentence, with the corresponding knowledge sources.]

Source: "State of the Art in ASR (and beyond)", Steve Young.

Pages 9-10: Pronunciation dictionary

* Represents a word as a sequence of units of recognition.
* Pronunciation rules can be used.
* Manual verification is necessary.

    aage      aa vbg g e
    aaja      aa vbj j
    aba       a vbb b
    abbAsa    a vbb b b aa s
    abhI      a vbb bh ii

Multiple pronunciations:

    vij~nAna  v i vbj j n aa n
    vij~nAna  v i vbg g y aa n
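
A minimal sketch (mine, not from the slides) of holding such a dictionary in memory, mapping each word to all of its pronunciation variants; the one-entry-per-line "word phone phone ..." file format is an assumption.

    from collections import defaultdict

    def load_lexicon(path):
        """Map each word to a list of its pronunciations (phone tuples)."""
        lexicon = defaultdict(list)
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                word, phones = fields[0], tuple(fields[1:])
                lexicon[word].append(phones)  # a word may have several variants
        return lexicon

    # e.g. lexicon["vij~nAna"] -> [('v','i','vbj','j','n','aa','n'),
    #                              ('v','i','vbg','g','y','aa','n')]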

Pages 11-16: Examples of pronunciation variability

Feature spreading in coalescence: /ae n t/ -> /ae t/, where /ae/ is nasalised.

Assimilation causing changes in place of articulation:
/n/ -> /m/ before a labial stop, as in "input", "can be", "grampa".

Asynchronous articulation errors causing stop insertions:
warm[p]th, ten[t]th, on[t]ce, leng[k]th.

Fast speech: "probably" -> "probly".

r-insertion in vowel-vowel transitions: stir [r]up, director [r]of.

Context-dependent deletion: nex[t] week.

Source: "State of the Art in ASR (and beyond)", Steve Young.

Pages 17-19: Representation of a word as a phone net

[Figure: a phone net merging pronunciation variants such as
    e clk k a clt t i s
    e clk k a clt t I s
    e clk clt t i s
    e clk clt t I s
into one network, with optional arcs for the shared portions and a choice between i and I.]

* "Probabilities" of pronunciations can be estimated.
* Many pronunciations -> higher word confusion -> performance degradation.
* Dialect and accent vary (native/non-native speakers).
* Hence: seek a dynamic, speaker-specific pronunciation dictionary.

Pages 20-21: Generation of word hypotheses

Generation of word hypotheses can result in
* a single sequence of words,
* a collection of the n-best word sequences, or
* a lattice of partially overlapping word hypotheses.

Goal: find the path with the least cost (the most likely word sequence).

Acoustic evidence -> word lattice -> DAG

Pages 22-23: Probabilities of phones at various time instants

[Figure: phone probability curves over time for the word "one", with traces for p(sil), w, ax, oh, ah and n.]

Page 24: Lattice of phone hypotheses -> lattice of word hypotheses

[Figure.]

Page 25: Word hypotheses at various time instants

[Figure: word hypotheses over time for the utterance "Take Fidelity's case as an example".]

Source: "Efficient Algorithms for Speech Recognition", M. K. Ravishankar, PhD thesis CMU-CS-96-143.

Pages 26-27: Word Lattice as a Directed Acyclic Graph

[Figure: a word lattice drawn as a DAG.]

Source: http://www.cs.washington.edu/research/jair/volume5/helzerman96a-html/node7.html

Page 28: Search for the most likely utterance

Goal: find the path with the least cost === the most likely word sequence.

Associate a cost with each edge of the DAG:

    cost = -(acoustic evidence + language evidence)

Given a graph with N nodes and E edges, the least-cost path can be found in time proportional to N + E.
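
A minimal sketch (mine, not from the slides) of that O(N + E) computation: visit nodes in topological order and relax each outgoing edge once. The adjacency-list lattice representation is an assumption.

    from collections import defaultdict, deque

    def least_cost_path(edges, start, goal):
        """edges: node -> list of (next_node, cost) pairs of a DAG."""
        indeg = defaultdict(int)                 # in-degree of every node
        for u in edges:
            for v, _ in edges[u]:
                indeg[v] += 1
        order, queue = [], deque(n for n in edges if indeg[n] == 0)
        while queue:                             # Kahn's topological sort
            u = queue.popleft()
            order.append(u)
            for v, _ in edges.get(u, ()):
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        dist, back = {start: 0.0}, {}
        for u in order:                          # relax edges in topological order
            if u not in dist:
                continue
            for v, cost in edges.get(u, ()):
                if dist[u] + cost < dist.get(v, float("inf")):
                    dist[v], back[v] = dist[u] + cost, u
        path = [goal]                            # walk backpointers to recover path
        while path[-1] != start:
            path.append(back[path[-1]])
        return list(reversed(path)), dist[goal]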

Page 29: Context Free Grammar

A commonly used mathematical structure for modelling the constituent structure of natural languages.

* Rules: S -> NP + VP
* Terminal symbols: the vocabulary (words of the language)
* Non-terminal symbols: NP, ...

A CFG can serve as a sentence generator or as a parser.

Page 30: An example

[Figure: an example CFG derivation.]

A stochastic CFG assigns probabilities to the various transitions.

Pages 31-32: BNF grammar for a task domain

A BNF grammar is useful for ASR in a specific task domain.
Task: a voice-operated interface for phone dialling; dial a number, or call a person by name.

Example sentences:
    Dial three three two six five four
    Dial nine zero four one oh nine
    Phone Woodland
    Call Steve Young

    $digit = ONE | TWO | THREE | FOUR | FIVE |
             SIX | SEVEN | EIGHT | NINE | OH | ZERO;
    $name = [ JOOP ] JANSEN |
            [ JULIAN ] ODELL |
            [ DAVE ] OLLASON |
            [ PHIL ] WOODLAND |
            [ STEVE ] YOUNG;
    DIAL <$digit> | (PHONE|CALL) $name
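
To make the grammar concrete, here is a small sentence sampler (mine, not from the slides), assuming HTK-style conventions in which [X] is optional, <X> repeats one or more times, and | separates alternatives.

    import random

    DIGITS = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "oh", "zero"]
    NAMES = [("joop", "jansen"), ("julian", "odell"), ("dave", "ollason"),
             ("phil", "woodland"), ("steve", "young")]

    def sample_sentence():
        if random.random() < 0.5:
            # DIAL <$digit>: "dial" followed by one or more digits.
            n = random.randint(1, 7)
            return "dial " + " ".join(random.choice(DIGITS) for _ in range(n))
        first, last = random.choice(NAMES)
        verb = random.choice(["phone", "call"])
        name = f"{first} {last}" if random.random() < 0.5 else last  # [first] optional
        return f"{verb} {name}"

    for _ in range(3):
        print(sample_sentence())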

Page 33: Probability of a word sequence

Let W denote the word sequence w1, w2, ..., wi. By the chain rule,

    p(W) = p(w1) * p(w2|w1) * p(w3|w1,w2) * ... * p(wi | wi-1, wi-2, ..., w1)

This is not practical due to the "unlimited history": too many parameters even for a short W.

Markovian assumption:
* Disregard "too old" history.
* k-th order Markov approximation: remember only the k previous words.
* Assume stationarity.
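
Under a first-order Markov (bigram) approximation the chain rule collapses to a product of bigram probabilities. A sketch (mine): the <s>/</s> sentence markers and the bigram_prob callable are assumptions, and zero probabilities are not handled.

    import math

    def sentence_logprob(words, bigram_prob):
        """log p(W) ~= log p(w1|<s>) + sum_i log p(w_i | w_{i-1})."""
        logp, prev = 0.0, "<s>"            # assumed sentence-start symbol
        for w in words + ["</s>"]:         # assumed sentence-end symbol
            logp += math.log(bigram_prob(prev, w))
            prev = w
        return logp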


Pages 35-37: Parameter Estimation

Maximum likelihood estimation: relative frequencies, using counts C(.) from the training data.

unigram:

    p(w) = C(w) / N,   where N is the total number of word tokens

bigram:

    p(wn | wn-1) = C(wn-1, wn) / sum_w C(wn-1, w) = C(wn-1, wn) / C(wn-1)

n-gram:

    p(wn | w1, w2, ..., wn-1) = C(w1, w2, ..., wn-1, wn) / C(w1, w2, ..., wn-1)
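
A sketch (mine, not from the slides) of maximum-likelihood bigram estimation from a tokenised corpus; padding each sentence with <s>/</s> markers is an assumption the slides do not spell out.

    from collections import Counter

    def train_bigram(sentences):
        """MLE bigram probabilities p(w|v) = C(v, w) / C(v)."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            unigrams.update(tokens[:-1])              # context counts C(v)
            bigrams.update(zip(tokens[:-1], tokens[1:]))
        return {(v, w): c / unigrams[v] for (v, w), c in bigrams.items()}

    probs = train_bigram([["some", "man", "brought", "the", "apple"]])
    print(probs[("some", "man")])  # 1.0 in this one-sentence toy corpus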

Pages 38-39: The data sparsity problem

Example: a 1000-word-vocabulary corpus, divided into a training set of 1,500,000 words and a test set of 300,000 words.

Observation: 23% of the trigrams occurring in the test data never occurred in the training subset!

A similar observation holds for a 38 million word newspaper corpus.

Robust parameter estimation is needed.
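
The 23% figure is easy to reproduce on any corpus; this sketch (mine) measures the fraction of test-set trigram occurrences that are absent from the training set.

    def unseen_trigram_rate(train_tokens, test_tokens):
        """Fraction of test trigram tokens never seen in training."""
        def trigrams(toks):
            return list(zip(toks, toks[1:], toks[2:]))
        seen = set(trigrams(train_tokens))
        test = trigrams(test_tokens)
        return sum(t not in seen for t in test) / len(test)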

Pages 40-42: Equivalence classes

Cluster "similar" words into groups and use the group instead of the actual word. Let Ci be the class that word wi is assigned to. Then

    p(w3 | w1, w2) = p(w3 | C3) p(C3 | w1, w2)
    p(w3 | w1, w2) = p(w3 | C1, C2)

Manual clustering:
* Linguistic categories: parts of speech.
* Semantic categories in a narrow discourse domain (ATIS).
In general, manual clustering does not improve much. Decision tree methodology can be used as well.

Automatic class-map construction:
Maximise the likelihood of the training text given the class model by making iterative controlled changes to an initial class map.
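
As an illustration (mine) of the first factorisation above, a class-based bigram predicts the word from its class and the class from the preceding class; the tiny class map and probabilities are hypothetical.

    # Hypothetical class map and toy distributions.
    word2class = {"monday": "DAY", "tuesday": "DAY", "flew": "VERB"}
    p_word_given_class = {("monday", "DAY"): 0.5, ("tuesday", "DAY"): 0.5,
                          ("flew", "VERB"): 1.0}
    p_class_given_class = {("VERB", "DAY"): 0.3}

    def class_bigram(prev_word, word):
        """p(w_n | w_{n-1}) ~= p(w_n | C_n) * p(C_n | C_{n-1})."""
        c_prev, c = word2class[prev_word], word2class[word]
        return (p_word_given_class[(word, c)]
                * p_class_given_class.get((c_prev, c), 0.0))

    print(class_bigram("flew", "monday"))  # 0.5 * 0.3 = 0.15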

Pages 43-46: Eliminating zero probabilities: smoothing

From the same training data, derive revised n-grams such that no n-gram is zero.

Discounting: take away some counts from "high-count" words and distribute them among "zero/low-count" words.

Add-one discounting:

    p(wn | wn-1) = (C(wn-1, wn) + 1) / (C(wn-1) + |V|)

Problem: too much probability mass can flow to zero-count words. Let C(wn-1, wn) = 2, C(wn-1) = 10 and |V| = 100. The estimate falls from

    2/10 = 0.2    to    (2 + 1)/(10 + 100) = 3/110 ~ 0.03

Pages 47-48: Add less than one

    p(wn | wn-1) = (C(wn-1, wn) + δ) / (C(wn-1) + δ|V|)

With C(wn-1, wn) = 2, C(wn-1) = 10 and |V| = 100:

    δ = 1:     2/10 -> (2 + 1)/(10 + 100)   = 3/110
    δ = 0.1:   2/10 -> (2 + 0.1)/(10 + 10)  = 2.1/20
    δ = 0.01:  2/10 -> (2 + 0.01)/(10 + 1)  = 2.01/11
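
A direct sketch (mine) of the add-δ estimate, reproducing the numbers above.

    def add_delta_prob(bigram_count, context_count, vocab_size, delta=0.1):
        """Additive smoothing: (C(v,w) + d) / (C(v) + d * |V|)."""
        return (bigram_count + delta) / (context_count + delta * vocab_size)

    print(add_delta_prob(2, 10, 100, delta=1))     # 3/110   ~ 0.027
    print(add_delta_prob(2, 10, 100, delta=0.1))   # 2.1/20  = 0.105
    print(add_delta_prob(2, 10, 100, delta=0.01))  # 2.01/11 ~ 0.183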

Page 49: Good-Turing discounting

Let Nc denote the number of bigrams that occurred c times in the corpus. For bigrams that never occurred, the revised count is

    c* = N1 / N0

In general,

    c* = (c + 1) N(c+1) / Nc

* Proper normalisation is needed.
* Suitable for estimation from large data.
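
A sketch (mine) of the revised-count formula; real implementations also smooth the Nc curve and renormalise, which this toy version omits.

    from collections import Counter

    def good_turing_counts(bigram_counts, num_possible_bigrams):
        """Revised counts c* = (c + 1) * N_{c+1} / N_c per observed c."""
        n = Counter(bigram_counts.values())   # N_c: bigram types seen c times
        n[0] = num_possible_bigrams - len(bigram_counts)  # unseen types N_0
        # Note: c* is 0 at the largest observed c (N_{c+1} = 0 there),
        # one reason the N_c curve is smoothed in practice.
        return {c: (c + 1) * n.get(c + 1, 0) / n[c] for c in n if n[c] > 0}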

Pages 50-52: Using the n-gram "hierarchy": combining frequencies

Linear interpolation of n-grams:

    p^(w3 | w1, w2) = λ1 p(w3 | w1, w2) + λ2 p(w3 | w2) + λ3 p(w3)

with λi > 0 and sum_i λi = 1.

Deleted interpolation: estimate the λi using held-out (or deleted) data.

Detailed version: make the λi a function of the context, λi(w1, w2). Where the context counts make the trigram estimate more accurate, increase λ1 (the weight of the better-estimated trigram).
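
A sketch (mine) of the interpolated estimate; the component models are passed in as callables, and the λ values shown are placeholders to be tuned on held-out data.

    def interpolated_prob(w1, w2, w3, p_tri, p_bi, p_uni,
                          lambdas=(0.6, 0.3, 0.1)):
        """p^(w3|w1,w2) = l1 p(w3|w1,w2) + l2 p(w3|w2) + l3 p(w3)."""
        l1, l2, l3 = lambdas
        assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # weights must sum to one
        return l1 * p_tri(w1, w2, w3) + l2 * p_bi(w2, w3) + l3 * p_uni(w3)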

Page 53: Using the n-gram "hierarchy": back off if needed

Linear interpolation:

    p^(w3 | w1, w2) = λ1 p(w3 | w1, w2) + λ2 p(w3 | w2) + λ3 p(w3)

Backoff: if the trigram count > 0, use the trigram with no interpolation; back off to the bigram otherwise.

We "back off" to a lower-order n-gram only if we have zero evidence for the higher-order n-gram. This is a non-linear method of combining counts.

Page 54: Backoff grammar

An algorithm for computing a backoff trigram grammar is:

    if (trigramCount > 0) {
        // no change in trigramProb
    } else if (bigramCount > 0) {
        trigramProb = a1 * bigramProb;
    } else {
        trigramProb = a2 * unigramProb;
    }

Pages 55-56: Adaptive models

Cross-domain adaptation:
Create, at run time, a dynamic n-gram Pcache(w | history) based on the partial document seen so far, and interpolate it with the static model:

    Pdynamic(w | h) = λ Pcache(w | h) + (1 - λ) Pstatic(w | h)

Optimise λ on held-out data. Cache LMs have been shown to yield lower perplexity and recognition errors.

Intra-domain adaptation:
* Cluster heterogeneous training data along dimensions of variability (e.g., topic).
* Identify the topic of the test data.
* Dynamically build the LM from the relevant subset of the training corpus.
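
A sketch (mine) of the cache idea at the unigram level: count the words decoded so far and interpolate the cache distribution with the static model; λ = 0.1 is an arbitrary placeholder.

    from collections import Counter

    class CacheLM:
        """Unigram cache interpolated with a static model (toy sketch)."""
        def __init__(self, p_static, lam=0.1):
            self.p_static, self.lam = p_static, lam
            self.cache = Counter()

        def observe(self, word):
            self.cache[word] += 1  # update with each newly decoded word

        def prob(self, word):
            total = sum(self.cache.values())
            p_cache = self.cache[word] / total if total else 0.0
            return self.lam * p_cache + (1 - self.lam) * self.p_static(word)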

Page 57: Entropy: a measure of information

Let a random variable X have probability function p(x). The entropy of X is

    H(X) = - sum_x p(x) log2 p(x)

The quantity 2^H is called the perplexity; it can be thought of as the weighted average number of choices a random variable has to make: the average branching factor.
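
A sketch (mine) of both quantities for a discrete distribution.

    import math

    def entropy(probs):
        """H(X) = -sum p(x) log2 p(x), skipping zero-probability outcomes."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    uniform = [0.25] * 4
    print(entropy(uniform))       # 2.0 bits
    print(2 ** entropy(uniform))  # perplexity 4.0: four equally likely choices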

Page 58: Lower cross entropy = more accurate language model

We have the entropy of X as

    H(X) = - sum_x p(x) log2 p(x)

Cross entropy is useful for comparing different probabilistic (language) models. If we do not know the actual probability distribution p, we can use a model m of p. The cross entropy of m on p is defined by

    H(p, m) = lim_{n -> infinity} - (1/n) sum_{w1..wn} p(w1, ..., wn) log m(w1, ..., wn)

H(p, m) is an upper bound on the true entropy H(p).

Page 59: Quality of language models: an illustration

Training data: WSJ0 corpus (20,000-word vocabulary)
Test data: 1.5 million words

    n-gram     Perplexity
    -------    ----------
    Unigram    962
    Bigram     170
    Trigram    109

Page 60: A short list of relevant books

1. "Mathematical Models for Speech Technology", Stephen Levinson, ISBN 0-470-84407-8, Wiley, March 2005.
2. "Mathematical Foundations of Speech and Language Processing", M. Johnson, S. Khudanpur, M. Ostendorf and R. Rosenfeld (editors), IMA Volumes in Mathematics and Its Applications, Vol. 138, Springer-Verlag, New York, Jan 2004.
3. "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition", Daniel Jurafsky and J. H. Martin, ISBN 8178085941, Pearson Education Asia, 2000. Price Rs. 425.
4. "Foundations of Statistical Natural Language Processing", Christopher D. Manning and Hinrich Schütze, The MIT Press, 1999.
5. "Statistical Methods for Speech Recognition", Frederick Jelinek, The MIT Press, 1997.

Page 61: Other references

1. "HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing". http://www.research.att.com/~dtur/NAACL04-Workshop/
2. "Machine Learning in Speech and Language Technologies", guest editors Pascale Fung and Dan Roth; Machine Learning, Springer Science+Business Media B.V., ISSN 0885-6125 (paper), 1573-0565 (online), Volume 60, Numbers 1-3, September 2005.
3. "Survey of the State of the Art in Human Language Technology". http://cslu.cse.ogi.edu/HLTsurvey
4. "Two decades of statistical language modeling: Where do we go from here?", R. Rosenfeld, Proc. IEEE, 88(8), 2000. http://www.cs.cmu.edu/~roni/papers/survey-slm-IEEE-PROC-0004.pdf
5. "Spoken Language Understanding: An Introduction to the Statistical Framework", Y. Wang, L. Deng and A. Acero, IEEE Signal Processing Magazine, Vol. 22, No. 5, September 2005.