Language Modeling for Information Retrieval

Manoj Kumar Chinnakotla

KReSIT, IIT Bombay

Language Technologies for the Web

Mar 2006

Outline

1 Introduction

2 The Classical Approach

3 The Language Modeling Approach

4 Smoothing Techniques

5 Relation with Classical Approach


Probabilistic Models

The Central Problem in IR


Is this Document Relevant?


Probabilistic models capture the uncertainties in the problem well

Example

Is this term relevant?

Is this document relevant?

The Random Variables

Relevance (R): $R \in \{0, 1\}$

Documents (D): $D \in \{D_1, D_2, \ldots, D_N\}$

Query (Q): $Q \in \{\text{All Possible Queries}\}$

A Term ($A_i$): $A_i \in \{0, 1\}$


The Ranking Function

Rank documents based on Posterior Probability of Relevance

$\mathrm{Score}(D, Q) = P(R = 1 \mid D, Q)$ (1)

Ranking by the following log-odds ratio is equivalent

$\mathrm{Score}(D, Q) = \log \frac{P(R = 1 \mid D, Q)}{P(R = 0 \mid D, Q)}$ (2)
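The equivalence holds because the log-odds is a strictly increasing function of the probability, so it cannot reorder documents. A minimal sketch of this, assuming made-up posterior values for four hypothetical documents:

```python
import math

def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical values of P(R=1 | D, Q); these numbers are illustrative only.
posteriors = {"D1": 0.10, "D2": 0.75, "D3": 0.40, "D4": 0.90}

# Rank once by the raw posterior (equation 1) and once by the log-odds (equation 2).
rank_by_probability = sorted(posteriors, key=posteriors.get, reverse=True)
rank_by_log_odds = sorted(
    posteriors, key=lambda d: log_odds(posteriors[d]), reverse=True
)

print(rank_by_probability)
print(rank_by_log_odds)
```

Both lists come out in the same order, which is all that a ranking function needs to preserve.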


Probabilistic Ranking Principle

Due to Robertson [6]

Central theorem for Probabilistic IR

Theorem

Ranking documents using the log-odds ratio of the posterior probability of relevance is optimal with respect to various retrieval measures (like Average Precision).


The Classical Approach

Due to Robertson-Sparck Jones [7]

Generative Model of Relevance

Rank documents based on the following log-odds ratio

$\mathrm{Score}(D, Q) = \log \frac{P(D \mid Q, R = 1)}{P(D \mid Q, R = 0)}$ (3)

For a query $Q$, most of the collection $C$ is irrelevant

$P(\cdot \mid Q, R = 0) \approx P(\cdot \mid Q, C)$ (4)
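The step from the posterior log-odds in (2) to the generative form in (3) is an application of Bayes' rule. The term that is dropped does not depend on $D$, so it cannot change the ranking:

```latex
\begin{align*}
\log \frac{P(R=1 \mid D, Q)}{P(R=0 \mid D, Q)}
  &= \log \frac{P(D \mid Q, R=1)\, P(R=1 \mid Q)}{P(D \mid Q, R=0)\, P(R=0 \mid Q)} \\
  &= \log \frac{P(D \mid Q, R=1)}{P(D \mid Q, R=0)}
     + \underbrace{\log \frac{P(R=1 \mid Q)}{P(R=0 \mid Q)}}_{\text{independent of } D}
\end{align*}
```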


Binary Independence Retrieval Model

Assuming term independence, we have

$\mathrm{Score}(D, Q) = \sum_{A_i \in D} \log \frac{P(A_i \mid Q, R = 1)}{P(A_i \mid Q, C)}$ (5)

Popularly known as the “Binary Independence Retrieval (BIR)” Model

Need to estimate the Relevance Distribution $P(\cdot \mid Q, R = 1)$
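Equation (5) can be sketched in a few lines, assuming the two term distributions are already available as dictionaries. The function name `bir_score`, the probability values, and the choice to skip terms without an estimate are all assumptions made for illustration:

```python
import math

def bir_score(doc_terms, p_rel, p_coll):
    """Sum log(P(A_i|Q,R=1) / P(A_i|Q,C)) over the terms present in the document."""
    score = 0.0
    for term in set(doc_terms):
        # Assumption: a term with no estimate contributes nothing (ratio of 1).
        if term in p_rel and term in p_coll:
            score += math.log(p_rel[term] / p_coll[term])
    return score

# Hypothetical estimates of P(A_i | Q, R=1) and P(A_i | Q, C).
p_rel = {"language": 0.8, "model": 0.6, "retrieval": 0.7}
p_coll = {"language": 0.2, "model": 0.3, "retrieval": 0.1}

docs = {
    "D1": ["language", "model", "retrieval"],
    "D2": ["language", "cooking"],
}
scores = {name: bir_score(terms, p_rel, p_coll) for name, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Terms that are much more likely under relevance than in the collection push the score up, which is why D1 outranks D2 here.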


Are we back to the Original Problem?

Estimating $P(\cdot \mid Q, R = 1)$ is equivalent to solving the original problem!

Challenge - No sample relevant documents available initially

Current Approaches

Choose some initial estimates for $P(w \mid Q, R = 1)$

Iteratively assume the top $k$ documents retrieved are relevant

Update estimates

Accuracy depends on initial separation achieved
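The iterative procedure can be sketched as follows. This is only an illustration under stated assumptions: documents are sets of terms, the initial estimate is a constant 0.5, and the add-0.5 smoothing in both estimators is a choice made here, not something the slides prescribe. All names are hypothetical.

```python
import math

def estimate_collection(docs):
    """P(A_i | Q, C): smoothed fraction of documents that contain each term."""
    n = len(docs)
    vocab = set().union(*docs.values())
    return {t: (sum(t in d for d in docs.values()) + 0.5) / (n + 1.0) for t in vocab}

def score(doc, query, p_rel, p_coll):
    """BIR-style score restricted to query terms present in the document."""
    return sum(math.log(p_rel[t] / p_coll[t]) for t in query if t in doc)

def pseudo_feedback(docs, query, k=2, rounds=3, init=0.5):
    p_coll = estimate_collection(docs)
    p_rel = {t: init for t in query}          # step 1: initial estimates
    ranked = list(docs)
    for _ in range(rounds):
        ranked = sorted(
            docs, key=lambda name: score(docs[name], query, p_rel, p_coll), reverse=True
        )
        top = ranked[:k]                      # step 2: assume the top k are relevant
        p_rel = {                             # step 3: update the estimates
            t: (sum(t in docs[name] for name in top) + 0.5) / (k + 1.0) for t in query
        }
    return ranked, p_rel

docs = {
    "D1": {"language", "model", "retrieval"},
    "D2": {"language", "model", "smoothing"},
    "D3": {"cooking", "recipes"},
    "D4": {"retrieval", "evaluation"},
    "D5": {"cooking", "pasta"},
    "D6": {"travel", "guide"},
}
ranked, p_rel = pseudo_feedback(docs, ["language", "retrieval"])
print(ranked, p_rel)
```

In the first round D2 and D4 tie, and the tie is broken by input order. Whichever document lands in the top $k$ then shapes every later estimate, which is a small instance of the dependence on initial separation.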


The Language Modeling Approach

Basic Idea (Ponte and Croft [5])

Assuming document $D$ is relevant, what is the likelihood of the user choosing the current query $Q$ to retrieve $D$?

Model the language of each document as a distribution over words (Unigram)

Individual document distributions $P(w \mid D)$ are called “Language Models”

Rank documents based on the posterior probability of the document given the query

$P(D \mid Q) \propto \underbrace{P(Q \mid D)}_{\text{Query Likelihood}} \cdot \overbrace{P(D)}^{\text{Document Prior}}$ (6)


A Shift in Paradigm

Some Immediate Benefits

Allows integration of document importance through the Document Prior $P(D)$

The Document Prior could be estimated from link analysis algorithms (PageRank, HITS)

Ease of Estimation - Document size is usually larger than the query

Document Language Models $P(w \mid D)$ could be pre-computed at index time

Assuming uniform document priors, the Query Likelihood Ranking Function is given by

$\mathrm{Score}(D, Q) = \prod_{w \in Q} P(w \mid D)$ (7)
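A minimal sketch of query likelihood ranking, taking the product in (7) over the query words and, as an assumption at this point, using plain relative frequencies for $P(w \mid D)$. The function names and the toy document are hypothetical:

```python
from collections import Counter

def mle_model(doc_tokens):
    """Unigram document model from relative frequencies."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def query_likelihood(query_tokens, doc_model):
    """Product of P(w | D) over the query words; unseen words get probability 0."""
    score = 1.0
    for w in query_tokens:
        score *= doc_model.get(w, 0.0)
    return score

doc = "language models rank documents by query likelihood".split()
model = mle_model(doc)

seen = query_likelihood(["language", "models"], model)       # both words occur in doc
unseen = query_likelihood(["language", "smoothing"], model)  # "smoothing" does not
print(seen, unseen)
```

A single query word missing from the document drives the whole product to zero, regardless of how well the other words match.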


Smoothing Techniques

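Motivation

The Maximum Likelihood Estimator (MLE) for $P(w \mid D)$ is given by

$P_{ml}(w \mid D) = \frac{c(w, D)}{\sum_{w' \in D} c(w', D)}$ (8)

where $c(w, D)$ is the count of word $w$ in document $D$

Since document length is limited, the MLE $P_{ml}$

Assigns zero probability to words not observed in $D$

Has high variance

Solution - Smooth the MLE using the collection model $P(w \mid C)$

Example

Jelinek-Mercer Smoothing

$P_\lambda(w \mid D) = \lambda P_{ml}(w \mid D) + (1 - \lambda) P(w \mid C)$ (9)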
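The interpolation in (9) can be sketched as below, scoring in log space to avoid underflow on long queries. The value $\lambda = 0.5$, the function name, and the two toy documents are assumptions for illustration. The sketch also assumes every query word occurs somewhere in the collection, since otherwise the smoothed probability is still zero.

```python
import math
from collections import Counter

def jm_log_likelihood(query_tokens, doc_tokens, coll_counts, coll_total, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothed document probabilities."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for w in query_tokens:
        p_ml = doc_counts[w] / doc_len          # MLE, zero for unseen words
        p_coll = coll_counts[w] / coll_total    # collection model P(w | C)
        total += math.log(lam * p_ml + (1.0 - lam) * p_coll)
    return total

docs = {
    "D1": "language models for retrieval".split(),
    "D2": "smoothing language models".split(),
}
coll_counts = Counter(w for tokens in docs.values() for w in tokens)
coll_total = sum(coll_counts.values())

query = ["smoothing", "retrieval"]
scores = {
    name: jm_log_likelihood(query, tokens, coll_counts, coll_total)
    for name, tokens in docs.items()
}
print(scores)
```

Each document is missing one of the two query words, yet both receive a finite score because the collection model supplies the missing probability mass.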


Modeling of Relevance

Relation with Classical Approach

Figure: Two different factorizations of the same joint $P(D, Q \mid R)$

Two Approaches Equivalent

LM Approach makes additional assumptions

Justification for Assumptions

For a given document $D$, a language model is actually a model of the queries to which the document is relevant.


Where is Relevance?

Notion of relevance assumed implicitly in the model

This is a problem while handling “Relevance Feedback”

Solution - Query Models or Relevance Models [3, 4]

Relevance Model or Query Model - Distribution encoding the information need

Assume the query $Q$ to be a sample from the Relevance Model $\theta_R$

New Ranking Function - Divergence Based

$\mathrm{Score}(D) = KL(D \,\|\, \theta_R) = \sum_{w} P(w \mid D) \cdot \log \frac{P(w \mid D)}{P(w \mid \theta_R)}$ (10)
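A minimal sketch of the divergence in (10), assuming both models are dictionaries over the same vocabulary and that $P(w \mid \theta_R) > 0$ wherever $P(w \mid D) > 0$ (in practice smoothing would ensure this). Since (10) is a divergence, the sketch treats a smaller value as a closer match and ranks in ascending order; that reading, the function name, and the numbers are assumptions made here.

```python
import math

def kl_divergence(p_doc, p_rel):
    """KL(D || theta_R) = sum_w P(w|D) * log(P(w|D) / P(w|theta_R))."""
    total = 0.0
    for w, p in p_doc.items():
        if p > 0.0:  # terms with zero document probability contribute nothing
            total += p * math.log(p / p_rel[w])
    return total

# Hypothetical relevance model and two document language models.
theta_r = {"language": 0.5, "model": 0.3, "cooking": 0.2}
doc_models = {
    "D1": {"language": 0.5, "model": 0.3, "cooking": 0.2},
    "D2": {"language": 0.1, "model": 0.1, "cooking": 0.8},
}

divergences = {name: kl_divergence(m, theta_r) for name, m in doc_models.items()}
ranking = sorted(divergences, key=divergences.get)  # ascending: closest model first
print(ranking, divergences)
```

D1 matches the relevance model exactly and has zero divergence, while D2 concentrates its mass on a term the relevance model considers unlikely.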


Implications for the LM Approach

Problem of Retrieval $\Rightarrow$ Estimating Two Distributions

Relevance Model $P(w \mid \theta_R)$

Document Language Models $P(w \mid D)$

Offers natural way to incorporate “Relevance Feedback”

Given relevant documents, update the Relevance Model $\theta_R$
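The slides do not say how the update is performed. One simple possibility, shown here purely as an assumption, is to interpolate the current relevance model with the average of the language models of the documents judged relevant. The function name and the weight `alpha` are hypothetical.

```python
def update_relevance_model(theta_r, relevant_doc_models, alpha=0.5):
    """Mix the current relevance model with the mean of the relevant document models.

    Assumes at least one relevant document and that every input is a
    probability distribution, so the result also sums to 1.
    """
    vocab = set(theta_r)
    for model in relevant_doc_models:
        vocab.update(model)
    n = len(relevant_doc_models)
    updated = {}
    for w in vocab:
        feedback = sum(model.get(w, 0.0) for model in relevant_doc_models) / n
        updated[w] = (1.0 - alpha) * theta_r.get(w, 0.0) + alpha * feedback
    return updated

# Hypothetical query-derived relevance model and two relevant document models.
theta_r = {"language": 0.5, "retrieval": 0.5}
relevant = [
    {"language": 0.5, "model": 0.5},
    {"retrieval": 0.25, "model": 0.25, "smoothing": 0.5},
]
updated = update_relevance_model(theta_r, relevant)
print(updated)
```

After the update, words such as "model" that never appeared in the query receive probability mass, so documents using them can score better under a divergence-based ranking.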


References

[1] CROFT, W. B., AND LAFFERTY, J. Language Modeling for Information Retrieval. Kluwer Academic Publishers, 2003.

[2] LAFFERTY, J., AND ZHAI, C. Probabilistic Relevance Models Based on Document and Query Generation. In Language Modeling for Information Retrieval (2003), vol. 13, Kluwer International Series on IR, pp. 1–10.

[3] LAFFERTY, J., AND ZHAI, C. Document Language Models, Query Models, and Risk Minimization for Information Retrieval. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 111–119.

[4] LAVRENKO, V., AND CROFT, W. B. Relevance Based Language Models. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 120–127.

[5] PONTE, J. M., AND CROFT, W. B. A Language Modeling Approach to Information Retrieval. In SIGIR '98: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (1998), pp. 275–281.

[6] ROBERTSON, S. E. The Probability Ranking Principle in IR. Readings in Information Retrieval (1997), 281–286.

[7] ROBERTSON, S. E., AND SPARCK JONES, K. Relevance Weighting of Search Terms. Journal of the American Society for Information Science 27 (1976), 129–146.

[8] BAEZA-YATES, R., AND RIBEIRO-NETO, B. Modern Information Retrieval. Pearson Education, 2005.


Thank You