Topic-independent Speaking-Style Transformation of Language Model for Spontaneous Speech Recognition
Yuya Akita, Tatsuya Kawahara
Introduction
• Spoken style vs. written style
  – Combination of document and spontaneous corpora
• Irrelevant linguistic expressions
  – Model transformation
    • Simulated spoken-style text by randomly inserting fillers
    • Weighted finite-state transducer framework (?)
    • Statistical machine translation framework
• Problem with model transformation methods
  – Small corpus, data sparseness
  – One solution: POS tags
Statistical Transformation of Language model
• Posterior:
  $P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$
  – X: source language model (document style)
  – Y: target language model (spoken style)
• So,
  $P(Y) = P(X)\,\frac{P(Y \mid X)}{P(X \mid Y)}$
  – $P(X \mid Y)$ and $P(Y \mid X)$ are the transformation models
• Transformation models can be estimated using a parallel corpus
  – n-gram count: $N_{LM}(y) = N_{LM}(x)\,\frac{P(y \mid x)}{P(x \mid y)}$ (rescaling sketch below)
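A minimal Python sketch of this count rescaling, assuming toy transformation tables; all names and values are illustrative, not the authors' implementation:

```python
# Sketch of the count transformation N_LM(y) = N_LM(x) * P(y|x) / P(x|y).
# The tables below are invented toy values, not estimates from real corpora.

def transform_counts(doc_counts, p_y_given_x, p_x_given_y):
    """Rescale document-style n-gram counts into spoken-style counts."""
    spoken_counts = {}
    for ngram_x, count in doc_counts.items():
        for ngram_y, p_fwd in p_y_given_x.get(ngram_x, {}).items():
            p_bwd = p_x_given_y.get(ngram_y, {}).get(ngram_x, 0.0)
            if p_bwd > 0.0:  # skip pairs with no reverse-direction evidence
                spoken_counts[ngram_y] = (spoken_counts.get(ngram_y, 0.0)
                                          + count * p_fwd / p_bwd)
    return spoken_counts

doc_counts = {("it", "is"): 120.0}
p_y_given_x = {("it", "is"): {("it", "is", "uh"): 0.3}}
p_x_given_y = {("it", "is", "uh"): {("it", "is"): 0.9}}
print(transform_counts(doc_counts, p_y_given_x, p_x_given_y))  # ≈ {('it', 'is', 'uh'): 40.0}
```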
Statistical Transformation of Language model (cont.)
• Data sparseness problem for the parallel corpus
  – POS information
    • Linear interpolation
    • Maximum entropy
Training
• Use aligned corpus
  – Word-based transformation probability: $P_{word}(y \mid x) = \frac{N(x, y)}{N(x)}$
  – POS-based transformation probability: $P_{POS}(y_{POS} \mid x_{POS}) = \frac{N(x_{POS}, y_{POS})}{N(x_{POS})}$
  – The reverse models $P_{word}(x \mid y)$ and $P_{POS}(x \mid y)$ are estimated accordingly (estimation sketch below)
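A sketch of estimating the word-based probabilities from alignment pairs; the pairs and romanized tokens are invented for illustration, and the POS-based model is the same computation over tag pairs:

```python
from collections import Counter, defaultdict

def estimate_transform_probs(aligned_pairs):
    """P(y|x) = N(x, y) / N(x) from (document_word, spoken_word) alignments."""
    pairs = list(aligned_pairs)
    pair_counts = Counter(pairs)
    source_counts = Counter(x for x, _ in pairs)
    probs = defaultdict(dict)
    for (x, y), n_xy in pair_counts.items():
        probs[x][y] = n_xy / source_counts[x]
    return dict(probs)

pairs = [("dearu", "desu"), ("dearu", "desu"), ("dearu", "da")]
print(estimate_transform_probs(pairs))  # {'dearu': {'desu': 0.667, 'da': 0.333}}
```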
Training (cont.)
• Back-off scheme:
  $P(y \mid x) = \begin{cases} P_{word}(y \mid x) & \text{if } (x, y) \text{ exists} \\ P_{POS}(y_{POS} \mid x_{POS}) & \text{else, if } (x_{POS}, y_{POS}) \text{ exists} \end{cases}$
• Linear interpolation scheme:
  $P(y \mid x) = \lambda\, P_{word}(y \mid x) + (1 - \lambda)\, P_{POS}(y \mid x)$
• Maximum entropy scheme:
  $P(y \mid x) = \frac{1}{Z}\exp\left(\sum_i \lambda_i f_i(x, y)\right)$
  – The ME model is applied to every n-gram entry of the document-style model
  – A spoken-style n-gram is generated if the transformation probability is larger than a threshold
(sketches of the first two schemes follow below)
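A sketch of the back-off and linear-interpolation combinations, assuming hypothetical lookup tables; the ME scheme would additionally require trained feature weights, omitted here:

```python
LAMBDA = 0.7  # interpolation weight; a tuned value in practice

def backoff_prob(x, y, p_word, p_pos, pos_of):
    """Back-off: prefer the word-based probability when the pair was observed."""
    if y in p_word.get(x, {}):
        return p_word[x][y]
    return p_pos.get(pos_of[x], {}).get(pos_of[y], 0.0)

def interpolated_prob(x, y, p_word, p_pos, pos_of):
    """Linear interpolation of word-based and POS-based probabilities."""
    pw = p_word.get(x, {}).get(y, 0.0)
    pp = p_pos.get(pos_of[x], {}).get(pos_of[y], 0.0)
    return LAMBDA * pw + (1.0 - LAMBDA) * pp

p_word = {"dearu": {"desu": 0.67}}
p_pos = {"AUX": {"AUX": 0.5}}
pos_of = {"dearu": "AUX", "desu": "AUX", "da": "AUX"}
print(backoff_prob("dearu", "da", p_word, p_pos, pos_of))         # falls back to POS: 0.5
print(interpolated_prob("dearu", "desu", p_word, p_pos, pos_of))  # 0.7*0.67 + 0.3*0.5 = 0.619
```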
Experiments
• Training corpora:
  – Baseline corpus: National Congress of Japan, 71M words
  – Parallel corpus: Budget Committee in 2003, 666K words
  – Corpus of Spontaneous Japanese (CSJ), 2.9M words
• Test corpus:
  – Another Budget Committee meeting in 2003, 63K words
Experiments (cont.)
• Evaluation of the generality of the transformation model
• LM results (table omitted)
Conclusions
• Proposed a novel statistical transformation model approach
Non-stationary n-gram model
Concept
• Probability of a sentence
  – n-gram LM:
    $P(s) = \prod_{i=1}^{n} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$
• Actually,
    $P(s) = \prod_{i=1}^{n} P(w_{i,pl_i} \mid w_{1,pl_1}, w_{2,pl_2}, \ldots, w_{i-1,pl_{i-1}}) \approx \prod_{i=1}^{n} P(w_{i,pl_i} \mid w_{i-n+1,pl_{i-n+1}}, \ldots, w_{i-1,pl_{i-1}})$
  – $pl_i$ denotes the position label of word $w_i$
• Long-distance and word-position information is lost when the Markov assumption is applied
Concept (cont.)
• $P(s) = \prod_{i=1}^{n} P(w_{i,pl_i} \mid w_{i-n+1,pl_{i-n+1}}, \ldots, w_{i-1,pl_{i-1}}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}, t)$
  – $t$ is the position of $w_i$ in the sentence (scoring sketch below)
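A sketch of scoring a sentence with such a position-dependent bigram; the probability table is a toy assumption:

```python
import math

def sentence_logprob(words, ns_bigram, floor=1e-10):
    """Sum log P(w_i | w_{i-1}, t) over the sentence, t = word position."""
    logp, prev = 0.0, "<s>"
    for t, w in enumerate(words, start=1):
        logp += math.log(ns_bigram.get((prev, w, t), floor))  # floor unseen events
        prev = w
    return logp

toy = {("<s>", "well", 1): 0.4, ("well", "yes", 2): 0.5}
print(sentence_logprob(["well", "yes"], toy))  # log(0.4) + log(0.5)
```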
Training (cont.)
• ML estimation (estimation sketch below):
  $p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}, t) = \frac{C(w_{i-n+1}, \ldots, w_i, t)}{C(w_{i-n+1}, \ldots, w_{i-1}, t)}$
• Smoothing
  – Use lower order
  – Use small bins
  – Transform with smoothed normal n-gram
• Combination
  – Linear interpolation
  – Back-off
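A sketch of this ML estimate over a toy corpus, for the bigram case (n = 2):

```python
from collections import Counter

def ml_ns_bigram(sentences):
    """p(w_i | w_{i-1}, t) = C(w_{i-1}, w_i, t) / C(w_{i-1}, t)."""
    joint, context = Counter(), Counter()
    for sent in sentences:
        prev = "<s>"
        for t, w in enumerate(sent, start=1):
            joint[(prev, w, t)] += 1
            context[(prev, t)] += 1
            prev = w
    return {key: c / context[(key[0], key[2])] for key, c in joint.items()}

corpus = [["well", "yes"], ["well", "no"]]
print(ml_ns_bigram(corpus))  # P(yes | well, 2) = 0.5, P(no | well, 2) = 0.5
```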
Smoothing with lower order (cont.)
• Additive smoothing:
  $P(w_i \mid w_{i-1}, t) = \frac{C(w_{i-1}, w_i, t) + 1}{C(w_{i-1}, t) + V}$
• Back-off smoothing:
  $P(w_i \mid w_{i-1}, t) = \begin{cases} P_{GT}(w_i \mid w_{i-1}, t) & \text{if } C(w_{i-1}, w_i, t) > 0 \\ \alpha\, P(w_i \mid t) & \text{otherwise} \end{cases}$
• Linear interpolation:
  $\hat{P}(w_i \mid w_{i-1}, t) = \lambda_t\, P(w_i \mid w_{i-1}, t) + (1 - \lambda_t)\, P(w_i \mid t)$
(sketches of the additive and interpolation estimates follow below)
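Sketches of the additive and lower-order interpolation estimates; the Good-Turing back-off variant needs discounted counts and is omitted. V and the lambda value are assumptions:

```python
def additive_prob(prev, word, t, joint, context, vocab_size):
    """(C(prev, word, t) + 1) / (C(prev, t) + V), add-one smoothing."""
    return (joint.get((prev, word, t), 0) + 1) / (context.get((prev, t), 0) + vocab_size)

def interpolate_lower_order(prev, word, t, ns_bigram, ns_unigram, lam=0.8):
    """lambda_t * P(w | prev, t) + (1 - lambda_t) * P(w | t)."""
    return (lam * ns_bigram.get((prev, word, t), 0.0)
            + (1.0 - lam) * ns_unigram.get((word, t), 0.0))

joint, context = {("well", "yes", 2): 1}, {("well", 2): 2}
print(additive_prob("well", "yes", 2, joint, context, vocab_size=10))  # 2/12 ≈ 0.167
```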
Smoothing with small bins (k=1) (cont.)
• Back-off smoothing:
  $P(w_i \mid w_{i-1}, t) = \begin{cases} P_{GT}(w_i \mid w_{i-1}, t) & \text{if } C(w_{i-1}, w_i, t) > 0 \\ \tilde{P}(w_i \mid w_{i-1}) & \text{otherwise} \end{cases}$
• Linear interpolation:
  $\hat{P}(w_i \mid w_{i-1}, t) = \lambda_t\, P(w_i \mid w_{i-1}, t) + (1 - \lambda_t)\, \tilde{P}(w_i \mid w_{i-1})$, where $\tilde{P}(w_i \mid w_{i-1}) = \lambda\, P(w_i \mid w_{i-1}) + (1 - \lambda)\, P(w_i)$
• Hybrid smoothing: the same interpolation, with $\tilde{P}(w_i \mid w_{i-1}) = P_{GT}(w_i \mid w_{i-1})$ (hybrid sketch below)
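A sketch of the hybrid scheme; the smoothed_bigram table stands in for a Good-Turing-discounted ordinary bigram, and all values are illustrative:

```python
def hybrid_prob(prev, word, t, ns_bigram, smoothed_bigram, lam_t=0.6):
    """Interpolate the non-stationary bigram with a position-independent
    smoothed bigram (standing in for the Good-Turing-discounted model)."""
    return (lam_t * ns_bigram.get((prev, word, t), 0.0)
            + (1.0 - lam_t) * smoothed_bigram.get((prev, word), 0.0))

ns = {("well", "yes", 2): 0.5}
smoothed = {("well", "yes"): 0.3}
print(hybrid_prob("well", "yes", 2, ns, smoothed))  # 0.6*0.5 + 0.4*0.3 = 0.42
```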
Transformation with smoothed n-gram
• Novel method
  – As |t − Mean(w)| decreases, the word is more important at that position
  – Var(w) is used to balance |t − Mean(w)| for active words
  – Active word: a word that can appear at any position in a sentence
• Back-off smoothing & linear interpolation:
  $P(w_i \mid w_{i-1}, t) = \frac{1}{Z}\, e^{-\frac{(t - Mean(w_i))^2}{Var(w_i)}}\, P_{SMOOTHED}(w_i \mid w_{i-1})$
(position-weighting sketch below)
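A sketch of this position weighting, renormalizing over a toy vocabulary; the means and variances would come from the training-data positions of each word:

```python
import math

def position_weighted(prev, t, smoothed_bigram, mean, var, vocab):
    """(1/Z) * exp(-(t - Mean(w))^2 / Var(w)) * P_smoothed(w | prev)."""
    scores = {w: math.exp(-(t - mean[w]) ** 2 / var[w])
                 * smoothed_bigram.get((prev, w), 0.0)
              for w in vocab}
    z = sum(scores.values())  # normalizer over the vocabulary
    return {w: s / z for w, s in scores.items()} if z > 0 else scores

mean, var = {"yes": 2.0, "no": 4.0}, {"yes": 1.0, "no": 2.0}
smoothed = {("well", "yes"): 0.3, ("well", "no"): 0.2}
print(position_weighted("well", 2, smoothed, mean, var, ["yes", "no"]))
```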
Experiments
• Observation: marginal positions vs. middle positions
Experiments (cont.)
• NS bigram results (table omitted)
Experiments (cont.)
• Comparison of the three smoothing techniques
Experiments (cont.)
• Error rate with different bins
Conclusions
• The traditional n-gram model is enhanced by relaxing its stationarity assumption and exploiting word-position information in language modeling
Two-way Poisson Mixture model
Essentials
• Poisson distribution:
  $P(n \mid \lambda) = \frac{\lambda^{n} e^{-\lambda}}{n!}$
• Poisson mixture model:
  $P(X = x \mid Y = k) = \sum_{r=1}^{R_k} \pi_{kr} \prod_{j=1}^{p} \phi(x_j \mid \lambda_{krj})$, where $\phi(x_j \mid \lambda_{krj}) = \frac{\lambda_{krj}^{x_j}\, e^{-\lambda_{krj}}}{x_j!}$
(Diagram: class k mixes Poisson components 1 … R_k with weights π_k1, π_k2, …, π_kR_k over the count vector x_1, …, x_p.)
Document x ~ multivariate Poisson, dim = p (lexicon size)
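A sketch of the class-conditional log-likelihood under the mixture, computed in log space to avoid underflow; all parameters are invented toy values:

```python
import math

def log_poisson(x, lam):
    """Log of the Poisson pmf lambda^x * e^(-lambda) / x!."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_class_likelihood(x, weights, lambdas):
    """log P(x | class k) = log sum_r pi_kr * prod_j Poisson(x_j; lambda_krj)."""
    comp = [math.log(pi) + sum(log_poisson(xj, lj) for xj, lj in zip(x, lams))
            for pi, lams in zip(weights, lambdas)]
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))  # log-sum-exp

x = [2, 0, 1]                                  # word counts, lexicon of p = 3
weights = [0.6, 0.4]                           # pi_k1, pi_k2 for class k
lambdas = [[1.5, 0.2, 0.7], [0.5, 0.1, 1.2]]   # lambda_krj per component r
print(log_class_likelihood(x, weights, lambdas))
```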
* Word clustering: reduce the Poisson dimension => two-way mixtures (parameter-tying sketch below)
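A sketch of the two-way idea as parameter tying: words in the same cluster share one Poisson rate per component, so the p per-word rates reduce to one rate per cluster. The cluster assignment here is a toy stand-in for the learned word clustering:

```python
def clustered_lambdas(cluster_rates, word_to_cluster):
    """Expand per-cluster rates into per-word rates by tying parameters."""
    return [cluster_rates[c] for c in word_to_cluster]

# Lexicon of 4 words collapsed to 2 clusters: words 0 and 2 share a rate,
# as do words 1 and 3; the result is usable as a `lambdas` row above.
print(clustered_lambdas([1.5, 0.2], word_to_cluster=[0, 1, 0, 1]))
# [1.5, 0.2, 1.5, 0.2]
```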