Relevance Language Modeling for Speech Recognition
Kuan-Yu Chen and Berlin Chen
National Taiwan Normal University, Taipei, Taiwan
ICASSP 2011
2014/1/17  Reporter: 陳思澄
Outline
• Introduction
• Basic Relevance Model (RM)
• Topic-based Relevance Model (TRM)
• Modeling Pairwise Word Association
• Experiments
• Conclusion
Introduction
• In relevance modeling (RM) for information retrieval (IR), each query is assumed to be associated with an unknown relevance class $R$, and documents that are relevant to the information need expressed in the query are samples drawn from $R$.
• When RM is applied to language modeling in speech recognition, we can conceptually regard the search history $H$ as a query and each of its immediately succeeding words $w$ as a document, and estimate a relevance model $P_{RM}(H, w)$ for modeling the relationship between $H$ and $w$.

[Figure: a query is associated with the latent relevance class $R$, from which the relevant documents are drawn; analogously, the history $H$ and an upcoming word $w$ are related through $P_{RM}(H, w)$.]
Basic Relevance Model
• The task of language modeling in speech recognition can be interpreted as calculating the conditional probability $P(w \mid H)$.
• $H$ is a search history, usually expressed as a word sequence $H = h_1, h_2, \ldots, h_L$, and $w$ is one of its possible immediately succeeding words.
• Because the relevance class $R_H$ of each search history is not known in advance, a local-feedback-like procedure can be used to obtain a set of relevant documents for estimating the joint probability $P_{RM}(H, w)$.
Basic Relevance Model
• The joint probability of observing $H$ together with $w$ is:

$$P_{RM}(H, w) = \sum_{m=1}^{M} P(D_m)\, P(h_1, \ldots, h_L, w \mid D_m) = \sum_{m=1}^{M} P(D_m)\, P(w \mid D_m) \prod_{l=1}^{L} P(h_l \mid D_m)$$

• where $P(D_m)$ is the probability that we would randomly select the document $D_m$, and $P(h_1, \ldots, h_L, w \mid D_m)$ is the joint probability of simultaneously observing $H$ and $w$ in $D_m$.
• Bag-of-words assumption: the words are conditionally independent given $D_m$, and their order is of no importance.
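As a concrete illustration, here is a minimal sketch of the joint probability under the bag-of-words assumption. The document priors and per-document unigram dictionaries used below are hypothetical representations introduced for this example; the slides give only the formula.

```python
from math import prod

def rm_joint(history, word, docs):
    """P_RM(H, w) = sum_m P(D_m) * P(w|D_m) * prod_l P(h_l|D_m).

    `docs` is a list of (prior, unigram) pairs: `prior` stands in for
    P(D_m) and `unigram` is a dict mapping a word to P(word|D_m).
    Both representations are assumptions made for this sketch.
    """
    return sum(
        prior * unigram.get(word, 0.0)
        * prod(unigram.get(h, 0.0) for h in history)
        for prior, unigram in docs
    )

# Toy usage with two hypothetical retrieved documents:
docs = [(0.5, {"taiwan": 0.20, "normal": 0.10, "university": 0.15}),
        (0.5, {"taiwan": 0.05, "normal": 0.02, "university": 0.30})]
print(rm_joint(["taiwan", "normal"], "university", docs))
```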
Basic Relevance Model
• The conditional probability is obtained by normalizing the joint probability:

$$P_{RM}(w \mid H) = \frac{P_{RM}(H, w)}{P_{RM}(H)} = \frac{\sum_{m=1}^{M} P(D_m)\, P(w \mid D_m) \prod_{l=1}^{L} P(h_l \mid D_m)}{\sum_{m=1}^{M} P(D_m) \prod_{l=1}^{L} P(h_l \mid D_m)}$$

• The background n-gram language model trained on a large general corpus can provide the generic constraint information of lexical regularities, and is linearly interpolated with RM for adaptation:

$$P_{Adapt}(w \mid H) = \lambda\, P_{RM}(w \mid H) + (1 - \lambda)\, P_{BG}(w \mid h_{L-1}, h_L)$$
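Continuing the sketch above (reusing `rm_joint`), the normalization and the interpolation might look as follows. Here `p_bg` is a hypothetical callable standing in for the background trigram probability $P_{BG}(w \mid h_{L-1}, h_L)$, and the weight `lam` would be tuned empirically.

```python
from math import prod

def rm_conditional(history, word, docs):
    """P_RM(w|H) = P_RM(H, w) / P_RM(H), where
    P_RM(H) = sum_m P(D_m) * prod_l P(h_l|D_m)."""
    denom = sum(prior * prod(unigram.get(h, 0.0) for h in history)
                for prior, unigram in docs)
    return rm_joint(history, word, docs) / denom if denom > 0.0 else 0.0

def adapted_prob(history, word, docs, p_bg, lam=0.5):
    """P_Adapt(w|H): linear interpolation of RM with the background
    trigram; `p_bg(word, context)` is an assumed interface returning
    P_BG(w | h_{L-1}, h_L)."""
    return (lam * rm_conditional(history, word, docs)
            + (1.0 - lam) * p_bg(word, tuple(history[-2:])))
```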
Topic-based Relevance Model
• TRM makes a step forward to incorporate latent topic information into RM modeling.
• The relevant documents of each search history are assumed to share a same set of latent topic variables $\{T_1, T_2, \ldots, T_K\}$ describing the "word-document" co-occurrence characteristics:

$$P(w \mid D_m) = \sum_{k=1}^{K} P(w \mid T_k)\, P(T_k \mid D_m)$$
Topic-based Relevance Model
• TRM can be represented by:

$$P_{TRM}(H, w) = \sum_{m=1}^{M} P(D_m) \sum_{k=1}^{K} P(T_k \mid D_m)\, P(w \mid T_k) \prod_{l=1}^{L} P(h_l \mid T_k)$$

(All words of a document are assumed to come from the same topic.)
• This follows from plugging the topic decomposition $P(w \mid D_m) = \sum_{k=1}^{K} P(w \mid T_k)\, P(T_k \mid D_m)$ into the RM formulation; the conditional probability $P_{TRM}(w \mid H) = P_{TRM}(H, w) / P_{TRM}(H)$ is then obtained by normalization, just as for RM.
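A corresponding sketch of the TRM joint probability. The per-document topic mixtures (`topic_mix`) and per-topic unigrams (`topic_unigrams`) are hypothetical stand-ins for distributions the model would estimate from data; only the formula comes from the slides.

```python
from math import prod

def trm_joint(history, word, docs, topic_unigrams):
    """P_TRM(H, w) = sum_m P(D_m) * sum_k P(T_k|D_m) * P(w|T_k)
                     * prod_l P(h_l|T_k).

    `docs`: list of (prior, topic_mix) pairs, where topic_mix[k]
    stands in for P(T_k|D_m); `topic_unigrams[k]` is a dict mapping
    a word to P(word|T_k). Both are assumed representations.
    """
    return sum(
        prior * sum(
            topic_mix[k] * unigram.get(word, 0.0)
            * prod(unigram.get(h, 0.0) for h in history)
            for k, unigram in enumerate(topic_unigrams)
        )
        for prior, topic_mix in docs
    )
```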
Modeling Pairwise Word Association
• Instead of using RM to model the association between an entire search history and a newly decoded word, we can also use RM to render the pairwise word association between a word $h_l$ in the history and a newly decoded word $w$:

$$P_{PRM}(h_l, w) = \sum_{m=1}^{M} P(D_m)\, P(h_l \mid D_m)\, P(w \mid D_m), \qquad P_{PRM}(w \mid h_l) = \frac{P_{PRM}(h_l, w)}{P_{PRM}(h_l)}$$
Modeling Pairwise Word Association
• A "composite" conditional probability for the search history $H$ to predict $w$ can be obtained by linearly combining $P_{PRM}(w \mid h_l)$ of all words $h_l$ in the history:

$$P_{PRM}(w \mid H) = \sum_{l=1}^{L} \alpha_l\, P_{PRM}(w \mid h_l)$$

• where the values of the nonnegative weighting coefficients $\alpha_l$ are empirically set to decay exponentially.
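A sketch of the pairwise model and its composite combination, under the same document representation as before. The slides say only that the weights decay exponentially, so the geometric scheme below (most recent history word weighted most, weights renormalized to sum to one) is an illustrative assumption.

```python
def prm_pair(h, word, docs):
    """P_PRM(h, w) = sum_m P(D_m) * P(h|D_m) * P(w|D_m)."""
    return sum(prior * unigram.get(h, 0.0) * unigram.get(word, 0.0)
               for prior, unigram in docs)

def prm_composite(history, word, docs, decay=0.8):
    """P_PRM(w|H) = sum_l alpha_l * P_PRM(w|h_l), with exponentially
    decaying, normalized weights alpha_l (an assumed scheme)."""
    L = len(history)
    weights = [decay ** (L - 1 - l) for l in range(L)]  # newest word heaviest
    total = sum(weights)
    score = 0.0
    for l, h in enumerate(history):
        # P_PRM(h_l) = sum_m P(D_m) * P(h_l|D_m)
        marginal = sum(prior * unigram.get(h, 0.0) for prior, unigram in docs)
        if marginal > 0.0:
            score += (weights[l] / total) * prm_pair(h, word, docs) / marginal
    return score
```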
Modeling Pairwise Word Association
• By the same token, a set of latent topics $\{T_1, T_2, \ldots, T_K\}$ can be used to describe the word-word co-occurrence relationships in a relevant document, and the pairwise word association between a history word $h_l$ and the decoded word $w$ is thus modeled by:

$$P_{TPRM}(h_l, w) = \sum_{m=1}^{M} P(D_m) \sum_{k=1}^{K} P(T_k \mid D_m)\, P(h_l \mid T_k)\, P(w \mid T_k)$$
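For completeness, the TPRM pair probability under the same assumed representations as `trm_joint` above:

```python
def tprm_pair(h, word, docs, topic_unigrams):
    """P_TPRM(h, w) = sum_m P(D_m) * sum_k P(T_k|D_m) * P(h|T_k) * P(w|T_k)."""
    return sum(
        prior * sum(
            topic_mix[k] * unigram.get(h, 0.0) * unigram.get(word, 0.0)
            for k, unigram in enumerate(topic_unigrams)
        )
        for prior, topic_mix in docs
    )
```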
Experimental Setup
• Speech corpus: 196 hours (MATBN)
• Vocabulary size: 72 thousand words
• The trigram language model was estimated from a background text corpus consisting of 170 million Chinese characters.
• The baseline rescoring procedure with the background trigram language model results in a character error rate (CER) of 20.08% on the test set.
Experiments
• 1. We assess the effectiveness of RM and PRM with respect to different numbers of retrieved documents being used to approximate the relevance class.
• 2. We measure the goodness of RM and PRM when a set of latent topics is additionally employed to describe the word-word co-occurrence relationships in a relevant document; the resulting models are TRM and TPRM.
• 3. We compare the proposed methods with several well-practiced language model adaptation methods.
Experiments

Results of RM and PRM (CER, %):

Document No.    RM      PRM
8               19.40   19.25
16              19.40   19.26
32              19.42   19.23
64              19.29   19.54
128             19.35   19.44

• This reveals that only a small subset of relevant documents retrieved from the contemporaneous corpus is sufficient for dynamic language model adaptation.
• PRM shows its superiority over RM in almost all adaptation settings.
Experiments

Results of TRM and TPRM (CER, %):

Topic No.    TRM     TPRM
Uniform Priors:
16           19.25   19.13
32           19.27   19.15
64           19.31   19.14
128          19.30   19.23
Dirichlet Priors:
16           19.26   19.27
32           19.35   19.14
64           19.30   19.11
128          19.17   19.24

• Simply assuming that the model parameters are uniformly distributed tends to perform slightly worse than the Dirichlet prior assumption at their respective best settings.
Experiments
• These results (shown in the table below) are at the same performance level as that obtained by TPRM.
• On the other hand, TBLM has its best CER of 19.32%, for which the corresponding number of trigger pairs was determined using the development set.
• Our proposed methods seem to be good surrogates for the existing language model adaptation methods in terms of CER reduction.

Results of PLSA, LDA, WTM and WVM (CER, %):

Topic No.    PLSA    LDA     WTM     WVM
16           19.21   19.29   19.02   19.09
32           19.22   19.30   18.98   18.95
64           19.17   19.28   19.01   19.00
128          19.15   19.15   18.89   19.00
Conclusion
• We study a novel use of relevance information for dynamic language model adaptation in speech recognition.
• Our methods not only inherit the merits of several existing techniques but also provide a flexible yet systematic way to render the lexical and topical relationships between a search history and an upcoming word.
• Empirical results on large vocabulary continuous speech recognition seem to demonstrate the utility of the presented models.
• These methods can also be used to expand query models for spoken document retrieval (SDR) tasks.