
Page 1:

Relevance Language Modeling For Speech Recognition
Kuan-Yu Chen and Berlin Chen
National Taiwan Normal University, Taipei, Taiwan
ICASSP 2011

2014/1/17  Reporter: 陳思澄

Page 2:

Outline

• Introduction
• Basic Relevance Model (RM)
• Topic-based Relevance Model
• Modeling Pairwise Word Association
• Experiments
• Conclusion

Page 3:

Introduction

• In relevance modeling for information retrieval (IR), each query is assumed to be associated with an unknown relevance class $R$, and documents that are relevant to the information need expressed in the query are samples drawn from $R$.

• When RM is applied to language modeling in speech recognition, we can conceptually regard the search history $H$ as a query and each of its immediately succeeding words $w$ as a document, and estimate a relevance model $P_{\mathrm{RM}}(H, w)$ for modeling the relationship between $H$ and $w$.

(Figure: a query, its relevance class $R$, and the relevant documents drawn from $R$.)

Page 4:

Basic Relevance Model

• The task of language modeling in speech recognition can be interpreted as calculating the conditional probability $P(w \mid H)$.

• $H$ is a search history, usually expressed as a word sequence $H = h_1, h_2, \ldots, h_L$, and $w$ is one of its possible immediately succeeding words.

• Because the relevance class $R_H$ of each search history is not known in advance, a local feedback-like procedure can be used to obtain a set of relevant documents to estimate the joint probability $P_{\mathrm{RM}}(H, w)$.

Page 5:

Basic Relevance Model

• The joint probability of observing $H$ together with $w$ is:

$$P_{\mathrm{RM}}(H, w) = \sum_{m=1}^{M} P(D_m)\, P(h_1, \ldots, h_L, w \mid D_m) = \sum_{m=1}^{M} P(D_m)\, P(w \mid D_m) \prod_{l=1}^{L} P(h_l \mid D_m)$$

• where $P(D_m)$ is the probability that we would randomly select $D_m$, and $P(h_1, \ldots, h_L, w \mid D_m)$ is the joint probability of simultaneously observing $H$ and $w$ in $D_m$.

• Bag-of-words assumption: the words are assumed to be conditionally independent given $D_m$, and their order is of no importance.
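As a minimal illustrative sketch (not from the paper), the joint probability above can be computed directly once each retrieved document $D_m$ is represented as a unigram distribution; the toy documents, the uniform document prior, and the absence of smoothing are all simplifying assumptions. In practice $P(w \mid D_m)$ would be smoothed so unseen words do not zero out the product.

```python
# Sketch of P_RM(H, w) under the bag-of-words assumption.
# Each document model is a dict mapping word -> P(word | D_m).

def p_rm_joint(history, w, doc_models, doc_priors):
    """P_RM(H, w) = sum_m P(D_m) * P(w | D_m) * prod_l P(h_l | D_m)."""
    total = 0.0
    for p_dm, model in zip(doc_priors, doc_models):
        term = p_dm * model.get(w, 0.0)
        for h in history:                       # product over the history words
            term *= model.get(h, 0.0)
        total += term
    return total

# Toy usage: M = 2 retrieved documents with a uniform prior P(D_m) = 1/M.
docs = [{"stock": 0.4, "market": 0.4, "rose": 0.2},
        {"stock": 0.2, "price": 0.5, "fell": 0.3}]
print(p_rm_joint(["stock", "market"], "rose", docs, [0.5, 0.5]))
```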

Page 6:

Basic Relevance Model

• The conditional probability:

$$P_{\mathrm{RM}}(w \mid H) = \frac{P_{\mathrm{RM}}(H, w)}{P_{\mathrm{RM}}(H)} = \frac{\sum_{m=1}^{M} P(D_m)\, P(w \mid D_m) \prod_{l=1}^{L} P(h_l \mid D_m)}{\sum_{m=1}^{M} P(D_m) \prod_{l=1}^{L} P(h_l \mid D_m)}$$

• The background n-gram language model trained on a large general corpus can provide the generic constraint information of lexical regularities, and is linearly interpolated with RM:

$$P_{\mathrm{Adapt}}(w \mid H) = \lambda\, P_{\mathrm{RM}}(w \mid H) + (1 - \lambda)\, P_{\mathrm{BG}}(w \mid h_{L-1}, h_L)$$
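Continuing the sketch, $P_{\mathrm{RM}}(w \mid H)$ normalizes the joint by $P_{\mathrm{RM}}(H)$, and the adapted score interpolates it with the background trigram. Here `p_bg` is a placeholder for a real background LM lookup and the λ value is an assumed, untuned weight:

```python
def p_rm_cond(history, w, doc_models, doc_priors):
    """P_RM(w | H) = P_RM(H, w) / P_RM(H)."""
    num = den = 0.0
    for p_dm, model in zip(doc_priors, doc_models):
        hist = p_dm
        for h in history:
            hist *= model.get(h, 0.0)           # P(D_m) * prod_l P(h_l | D_m)
        den += hist
        num += hist * model.get(w, 0.0)
    return num / den if den > 0.0 else 0.0

def p_adapt(history, w, doc_models, doc_priors, p_bg, lam=0.5):
    """P_Adapt(w|H) = lam * P_RM(w|H) + (1 - lam) * P_BG(w | h_{L-1}, h_L)."""
    return (lam * p_rm_cond(history, w, doc_models, doc_priors)
            + (1.0 - lam) * p_bg(w, tuple(history[-2:])))
```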

Page 7:

Topic-based Relevance Model

• TRM makes a step forward to incorporate latent topic information into RM modeling.

• The relevant documents of each search history are assumed to share the same set of latent topic variables $T_1, T_2, \ldots, T_K$, describing the "word-document" co-occurrence characteristics:

$$P(w \mid D_m) = \sum_{k=1}^{K} P(w \mid T_k)\, P(T_k \mid D_m)$$
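A small sketch of the topic-smoothed document model above; in the paper the topic distributions come from offline training on the corpus, whereas the arrays here are toy values:

```python
def p_w_given_doc(w, topic_mix, topic_models):
    """P(w | D_m) = sum_k P(w | T_k) * P(T_k | D_m)."""
    return sum(p_tk * topic.get(w, 0.0)
               for p_tk, topic in zip(topic_mix, topic_models))

# Toy usage with K = 2 latent topics for one document D_m.
topics = [{"stock": 0.5, "market": 0.5}, {"rain": 0.7, "storm": 0.3}]
print(p_w_given_doc("stock", [0.9, 0.1], topics))   # 0.9 * 0.5 = 0.45
```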

Page 8:

Topic-based Relevance Model

• TRM can be represented by:

$$P_{\mathrm{TRM}}(H, w) = \sum_{m=1}^{M} P(D_m) \sum_{k=1}^{K} P(T_k \mid D_m)\, P(w \mid T_k) \prod_{l=1}^{L} P(h_l \mid T_k)$$

(All words of the document are assumed to come from the same topic.)

• The conditional probability $P_{\mathrm{TRM}}(w \mid H) = P_{\mathrm{TRM}}(H, w) / P_{\mathrm{TRM}}(H)$ is then obtained in the same way as for RM.
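A sketch of the TRM joint probability, reusing the topic-conditional word models from the previous sketch; the variable names and toy representation are illustrative assumptions:

```python
def p_trm_joint(history, w, doc_priors, doc_topic_mixes, topic_models):
    """P_TRM(H, w) =
       sum_m P(D_m) sum_k P(T_k|D_m) P(w|T_k) prod_l P(h_l|T_k)."""
    total = 0.0
    for p_dm, topic_mix in zip(doc_priors, doc_topic_mixes):
        inner = 0.0
        for p_tk, topic in zip(topic_mix, topic_models):
            term = p_tk * topic.get(w, 0.0)
            for h in history:               # all history words share topic T_k
                term *= topic.get(h, 0.0)
            inner += term
        total += p_dm * inner
    return total
```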

Page 9:

Modeling Pairwise Word Association

• Instead of using RM to model the association between an entire search history and a newly decoded word, we can also use RM to render the pairwise word association between a word $h_l$ in the history and a newly decoded word $w$:

$$P_{\mathrm{PRM}}(h_l, w) = \sum_{m=1}^{M} P(D_m)\, P(h_l \mid D_m)\, P(w \mid D_m)$$

• The corresponding conditional probability is $P_{\mathrm{PRM}}(w \mid h_l) = P_{\mathrm{PRM}}(h_l, w) / P_{\mathrm{PRM}}(h_l)$.
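The pairwise model reuses the same document unigram representation as the RM sketch; a minimal sketch of $P_{\mathrm{PRM}}(w \mid h_l)$:

```python
def p_prm_cond(h_l, w, doc_models, doc_priors):
    """P_PRM(w | h_l) = P_PRM(h_l, w) / P_PRM(h_l)."""
    num = den = 0.0
    for p_dm, model in zip(doc_priors, doc_models):
        p_h = p_dm * model.get(h_l, 0.0)    # P(D_m) * P(h_l | D_m)
        den += p_h
        num += p_h * model.get(w, 0.0)      # joint also needs P(w | D_m)
    return num / den if den > 0.0 else 0.0
```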

Page 10:

Modeling Pairwise Word Association

• A "composite" conditional probability for the search history $H$ to predict $w$ can be obtained by linearly combining $P_{\mathrm{PRM}}(w \mid h_l)$ of all words in the history:

$$P_{\mathrm{PRM}}(w \mid H) = \sum_{l=1}^{L} \alpha_l\, P_{\mathrm{PRM}}(w \mid h_l)$$

• where the values of the nonnegative weighting coefficients $\alpha_l$ are empirically set to decay exponentially.
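A sketch of the composite probability, reusing `p_prm_cond` from the earlier sketch. The paper only says the weights decay exponentially, so the decay rate and the choice to weight the most recent history word highest are assumptions here:

```python
def p_prm_composite(history, w, doc_models, doc_priors, decay=0.7):
    """P_PRM(w | H) = sum_l alpha_l * P_PRM(w | h_l)."""
    L = len(history)
    raw = [decay ** (L - 1 - l) for l in range(L)]  # largest for most recent h_l (assumed)
    alphas = [r / sum(raw) for r in raw]            # nonnegative, normalized to sum to 1
    return sum(a * p_prm_cond(h, w, doc_models, doc_priors)
               for a, h in zip(alphas, history))
```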

Page 11:

Modeling Pairwise Word Association

• By the same token, a set of latent topics $T_1, T_2, \ldots, T_K$ can be used to describe the word-word co-occurrence relationships in a relevant document $D_m$; the pairwise word association between a history word $h_l$ and the decoded word $w$ is thus modeled by:

$$P_{\mathrm{TPRM}}(h_l, w) = \sum_{m=1}^{M} P(D_m) \sum_{k=1}^{K} P(T_k \mid D_m)\, P(h_l \mid T_k)\, P(w \mid T_k)$$
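Correspondingly, a sketch of the TPRM pairwise joint probability under the same toy topic representation used above:

```python
def p_tprm_joint(h_l, w, doc_priors, doc_topic_mixes, topic_models):
    """P_TPRM(h_l, w) = sum_m P(D_m) sum_k P(T_k|D_m) P(h_l|T_k) P(w|T_k)."""
    total = 0.0
    for p_dm, topic_mix in zip(doc_priors, doc_topic_mixes):
        total += p_dm * sum(p_tk * topic.get(h_l, 0.0) * topic.get(w, 0.0)
                            for p_tk, topic in zip(topic_mix, topic_models))
    return total
```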

Page 12:

Experimental setup

• Speech corpus: 196 hours (MATBN).
• Vocabulary size: 72 thousand words.
• A trigram language model was estimated from a background text corpus consisting of 170 million Chinese characters.
• The baseline rescoring procedure with the background trigram language model results in a character error rate (CER) of 20.08% on the test set.

Experiments

1. We assess the effectiveness of RM and PRM with respect to different numbers of retrieved documents being used to approximate the relevance class.
2. We measure the goodness of RM and PRM when a set of latent topics is additionally employed to describe the word-word co-occurrence relationships in a relevant document; the resulting models are TRM and TPRM.
3. We compare the proposed methods with several well-practiced language model adaptation methods.

Page 13:

Experiments

Results of RM and PRM (in CER (%)):

Document No.    RM      PRM
8               19.40   19.25
16              19.40   19.26
32              19.42   19.23
64              19.29   19.54
128             19.35   19.44

• This reveals that only a small subset of relevant documents retrieved from the contemporaneous corpus is sufficient for dynamic language model adaptation.

• PRM shows its superiority over RM for most adaptation settings.

Page 14:

Experiments

Results of TRM and TPRM (in CER (%)):

                    Topic No.   TRM     TPRM
Uniform priors      16          19.25   19.13
                    32          19.27   19.15
                    64          19.31   19.14
                    128         19.30   19.23
Dirichlet priors    16          19.26   19.27
                    32          19.35   19.14
                    64          19.30   19.11
                    128         19.17   19.24

• Simply assuming that the model parameters are uniformly distributed tends to perform slightly worse than the Dirichlet prior assumption at their respective best settings.

Page 15:

Experiments

• These results are at the same performance level as that obtained by TPRM.
• On the other hand, TBLM has its best CER of 19.32%, for which the corresponding number of trigger pairs was determined using the development set.
• Our proposed methods seem to be good surrogates for the existing language model adaptation methods, in terms of CER reduction.

Results of other adaptation methods (in CER (%)):

Topic No.   PLSA    LDA     WTM     WVM
16          19.21   19.29   19.02   19.09
32          19.22   19.30   18.98   18.95
64          19.17   19.28   19.01   19.00
128         19.15   19.15   18.89   19.00

Page 16:

Conclusion

• We study a novel use of relevance information for dynamic language model adaptation in speech recognition.
• Our methods not only inherit the merits of several existing techniques but also provide a flexible yet systematic way to render the lexical and topical relationships between a search history and an upcoming word.
• Empirical results on large vocabulary continuous speech recognition seem to demonstrate the utility of the presented models.
• These methods can also be used to expand query models for spoken document retrieval (SDR) tasks.