
Two-stage Language Models for Information Retrieval


Page 1: Two-stage Language Models for Information Retrieval

Two-stage Language Models for

Information Retrieval

ChengXiang Zhai*, John Lafferty

School of Computer Science

Carnegie Mellon University

*New Address

Department of Computer Science

University of Illinois, Urbana-Champaign

Page 2: Two-stage Language Models for Information Retrieval

Motivation

• Retrieval parameters are needed to

– model different user preferences

– customize a retrieval model according to different queries and documents

• So far, parameters have been set through empirical experimentation

• Can we set parameters automatically?

Page 3: Two-stage Language Models for Information Retrieval

Parameters in Traditional Models

• EXTERNAL to the model, hard to interpret

– Most parameters are introduced heuristically to implement our “intuition”

– As a result, there is no principled way to quantify them

• Set through empirical experiments

– Lots of experimentation

– Optimality for new queries is not guaranteed

Page 4: Two-stage Language Models for Information Retrieval

Example of Parameter Tuning (Okapi)

“k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).”

(Robertson et al. 1999)
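To make the quoted constants concrete, here is a minimal BM25 sketch in Python. It is not from the slides: the count dictionaries, the IDF dictionary, and the document statistics are illustrative assumptions.

```python
def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, idf,
               k1=1.2, b=0.75, k3=7.0):
    """Okapi BM25 score of one document for one query.

    query_tf / doc_tf: {term: count}; idf: {term: inverse doc frequency}.
    k1, b, k3 are the externally tuned constants quoted above.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        # document-side saturation; b controls length normalization
        dnorm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
        doc_part = (k1 + 1) * tf / (dnorm + tf)
        # query-side saturation; as k3 grows this tends to qtf
        query_part = (k3 + 1) * qtf / (k3 + qtf)
        score += idf.get(term, 0.0) * doc_part * query_part
    return score
```

The slide's point stands: nothing inside this formula says how to pick k1, b, or k3 for a new query or collection.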

Page 5: Two-stage Language Models for Information Retrieval

The Way to Automatic Tuning ...

• Parameters must be PART of the model!

– Query modeling (explain difference in query)

– Document modeling (explain difference in doc)

• Decouple the influence of the query on parameter setting from that of the documents

– To achieve stable setting of parameters

– To pre-compute query-independent parameters

Page 6: Two-stage Language Models for Information Retrieval

The Rest of the Talk

• Risk Minimization Retrieval Framework

• Two-stage Language Models

• Two-stage Dirichlet-Mixture Smoothing

• Parameter Estimation

Page 7: Two-stage Language Models for Information Retrieval

The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)

[Diagram: documents are mapped to document language models (DOC MODELING), the query is mapped to a query language model (QUERY MODELING), and the user is modeled by a loss function (USER MODELING); the retrieval decision is the one that minimizes the expected loss.]

Page 8: Two-stage Language Models for Information Retrieval

Parameter Setting in Risk Minimization

[Diagram: the same framework annotated with its parameters. Query model parameters are estimated from the query, doc model parameters are estimated from the documents, and user model parameters are set manually.]

Page 9: Two-stage Language Models for Information Retrieval

Two-stage Language Models

Query $q$ $\rightarrow$ query language model $\theta_Q$: $p(q \mid \theta_Q, U)$

Doc $d$ $\rightarrow$ document language model $\theta_D$: $p(d \mid \theta_D, S)$

Loss function:

$$l(\theta_Q, \theta_D) = \begin{cases} 0 & \text{if } \Delta(\theta_Q, \theta_D) \le \epsilon \\ c & \text{otherwise} \end{cases}$$

Risk ranking formula:

$$R(d, q) \overset{rank}{\propto} p(q \mid \hat{\theta}_D, U)$$

Stage 1: estimate $\hat{\theta}_D$ from $d$. Smoothing!

Stage 2: compute the query likelihood $p(q \mid \hat{\theta}_D, U)$.
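As a concrete reading of the ranking formula, here is a minimal Python sketch; it is an illustration for this writeup, not code from the paper. Stage 1 is assumed to have produced a smoothed document model, and stage 2 scores the query against a two-component mixture of that model and a user background model.

```python
import math

def risk_rank_score(query_terms, p_doc, p_user, lam=0.5):
    """log p(q | theta_hat_D, U): stage-2 query likelihood under a
    mixture of the stage-1 smoothed document model and the user model.

    p_doc(w):  stage-1 smoothed p(w | theta_hat_D)
    p_user(w): user/query background p(w | U)
    lam:       query-noise weight (illustrative default)
    """
    return sum(
        math.log((1 - lam) * p_doc(w) + lam * p_user(w))
        for w in query_terms
    )
```

Documents are then ranked by this score, just as in one-stage query-likelihood retrieval, except that the smoothing now has two separately interpretable stages.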

Page 10: Two-stage Language Models for Information Retrieval

Sensitivity in Traditional (“one-stage”) Smoothing

[Figure: sensitivity of retrieval performance to the one-stage smoothing parameter, shown separately for keyword queries and for verbose (sentence-like) queries.]

Page 11: Two-stage Language Models for Information Retrieval

The Need for Two-stage Smoothing (I):

Accurate Estimation of the Doc Model

Document: a text mining paper (500 words)

Language model p(w|d):

text         10/500 = 0.02
mining        3/500 = 0.006
association   1/500 = 0.002
algorithm     2/500 = 0.004
...
data          0/500 = 0

Query = “data mining algorithms”

p(q|d) = p(“data”|d) p(“mining”|d) p(“algorithms”|d) = 0 × 0.006 × 0.004 = 0!

So what should they be?

P(“data”|d) = ?

P(“unicorn”|d) = ?
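The zero-probability problem in a few lines of Python, using the counts from the example above (the query token is matched to the document's "algorithm" count, as on the slide):

```python
from math import prod

# unsmoothed maximum-likelihood document model from the slide
counts = {"text": 10, "mining": 3, "association": 1, "algorithm": 2}
doc_len = 500

def p_ml(w):
    return counts.get(w, 0) / doc_len  # unseen words get probability 0

query = ["data", "mining", "algorithm"]
print(prod(p_ml(w) for w in query))  # 0.0: one unseen word zeroes the product
```

Stage-1 smoothing exists precisely to give an unseen but plausible word like "data" a small nonzero probability (and "unicorn" an even smaller one).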

Page 12: Two-stage Language Models for Information Retrieval

The Need for Two-stage Smoothing (II):

Explanation of Noise in the Query

Query = “the algorithms for data mining”

      the    algorithms  for    data   mining
d1:   0.04   0.001       0.02   0.002  0.003
d2:   0.02   0.001       0.01   0.003  0.004

p(“algorithms”|d1) = p(“algorithms”|d2)

p(“data”|d1) < p(“data”|d2)

p(“mining”|d1) < p(“mining”|d2)

But p(q|d1) > p(q|d2)!

We should make p(“the”|d) and p(“for”|d) vary less across documents, so that the content words decide the ranking.
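The reversal is easy to check with the numbers above. The background model p_U and the weight lam below are illustrative assumptions; the slide only says the common words should be made less different across documents.

```python
from math import prod

query = ["the", "algorithms", "for", "data", "mining"]
d1 = {"the": 0.04, "algorithms": 0.001, "for": 0.02, "data": 0.002, "mining": 0.003}
d2 = {"the": 0.02, "algorithms": 0.001, "for": 0.01, "data": 0.003, "mining": 0.004}

def likelihood(model):
    return prod(model[w] for w in query)

print(likelihood(d1) > likelihood(d2))  # True: d1 wins only on "the" and "for"

# Interpolating with a shared background model damps common-word differences.
lam = 0.9
p_U = {"the": 0.05, "algorithms": 0.001, "for": 0.03, "data": 0.002, "mining": 0.002}

def smoothed(model):
    return prod((1 - lam) * model[w] + lam * p_U[w] for w in query)

print(smoothed(d1) < smoothed(d2))  # True: now d2, matching the content words, wins
```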

Page 13: Two-stage Language Models for Information Retrieval

Two-stage Dirichlet-Mixture Smoothing

$$p(w \mid d) = (1-\lambda)\,\frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu} + \lambda\, p(w \mid U)$$

Stage-1 smoothing (μ):

– Explains unseen words

– Dirichlet prior

– Adds pseudo counts

Stage-2 smoothing (λ):

– Explains noise in the query

– 2-component mixture

– Linear interpolation
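A minimal sketch of the formula in Python; the collection model p_C and user model p_U are passed in as functions and are assumptions of this illustration.

```python
def two_stage_prob(w, doc_counts, doc_len, p_C, p_U, mu, lam):
    """Two-stage Dirichlet-mixture smoothed p(w|d).

    Stage 1: Dirichlet-prior smoothing toward the collection model p(w|C),
             with pseudo-count mass mu.
    Stage 2: linear interpolation with the user background model p(w|U),
             with query-noise weight lam.
    """
    stage1 = (doc_counts.get(w, 0) + mu * p_C(w)) / (doc_len + mu)
    return (1 - lam) * stage1 + lam * p_U(w)
```

Note how both parameters stay interpretable: mu acts like a document sample-size prior, lam like the fraction of noise in the query.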

Page 14: Two-stage Language Models for Information Retrieval

Estimating μ using Leave-one-out

Each word $w_i$ is left out of its document in turn and predicted from the rest, giving $p(w_i \mid d - w_i)$. The leave-one-out log-likelihood of the collection C is

$$\ell_{-1}(\mu \mid C) = \sum_{i=1}^{N} \sum_{w \in V} c(w, d_i)\, \log \frac{c(w, d_i) - 1 + \mu\, p(w \mid C)}{|d_i| - 1 + \mu}$$

Maximum likelihood estimator (computed with Newton's method):

$$\hat{\mu} = \arg\max_{\mu} \ell_{-1}(\mu \mid C)$$
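A sketch of the Newton iteration for the μ estimate. The derivative expressions follow directly from the log-likelihood above; the starting point, iteration count, and positivity clamp are assumptions of this illustration.

```python
def loo_derivs(mu, docs, p_C):
    """First and second derivatives of l_{-1}(mu | C) w.r.t. mu.

    docs: list of {word: count} dictionaries, one per document
    p_C:  {word: collection probability p(w|C)}
    """
    g = h = 0.0
    for counts in docs:
        d_len = sum(counts.values())
        for w, c in counts.items():
            a = c - 1 + mu * p_C[w]   # numerator inside the log
            b = d_len - 1 + mu        # denominator inside the log
            g += c * (p_C[w] / a - 1.0 / b)
            h += c * (1.0 / b**2 - (p_C[w] / a) ** 2)
    return g, h

def estimate_mu(docs, p_C, mu=1.0, iters=20):
    """Newton iterations for mu_hat = argmax l_{-1}(mu | C)."""
    for _ in range(iters):
        g, h = loo_derivs(mu, docs, p_C)
        if h == 0.0:
            break
        mu = max(mu - g / h, 1e-6)  # keep mu positive
    return mu
```

Because the log-likelihood involves only collection-level statistics, the μ estimate can be computed offline, independently of any query.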

Page 15: Two-stage Language Models for Information Retrieval

Estimating λ using a Mixture Model

The query $q = (q_1, \ldots, q_m)$ is modeled as generated from a mixture over the N stage-1 smoothed document models, each interpolated with the user background model:

$$p(q \mid \lambda, U) = \sum_{i=1}^{N} \alpha_i \prod_{j=1}^{m} \left( (1-\lambda)\, p(q_j \mid \hat{\theta}_{d_i}) + \lambda\, p(q_j \mid U) \right)$$

$$\hat{\lambda} = \arg\max_{\lambda} p(q \mid \lambda, U)$$

Maximum likelihood estimator, computed with the Expectation-Maximization (EM) algorithm: simultaneously adjust λ and α_1, ..., α_N to maximize the query likelihood.
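A hedged EM sketch for the λ estimate. The E/M updates below are a standard EM derivation for the mixture above, worked out for this writeup rather than copied from the paper, and the initialization and iteration count are assumptions. (For long queries the products should be computed in log space; this sketch skips that.)

```python
from math import prod

def estimate_lambda(query, doc_models, p_U, lam=0.5, iters=50):
    """EM for lambda_hat = argmax_lambda p(q | lambda, U).

    query:      list of query words q_1..q_m
    doc_models: list of N dicts, the stage-1 smoothed p(w | theta_hat_d_i)
    p_U:        dict, user background model p(w | U)
    """
    N, m = len(doc_models), len(query)
    alpha = [1.0 / N] * N
    for _ in range(iters):
        # E-step: mixture probabilities per (document, query word)
        mix = [[(1 - lam) * d[w] + lam * p_U[w] for w in query] for d in doc_models]
        doc_lik = [alpha[i] * prod(mix[i]) for i in range(N)]
        total = sum(doc_lik)
        gamma = [lik / total for lik in doc_lik]  # document responsibilities
        # expected number of query words drawn from the background model
        bg = sum(
            gamma[i] * sum(lam * p_U[w] / mix[i][j] for j, w in enumerate(query))
            for i in range(N)
        )
        # M-step: re-estimate the mixing weights
        lam = bg / m
        alpha = gamma
    return lam
```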

Page 16: Two-stage Language Models for Information Retrieval

Effectiveness of Parameter Estimation

• Five databases

– News articles (AP, WSJ, ZIFF, FBIS, FT, LA)

– Government documents (Federal Register)

– Web pages

• Four types of queries

– Long vs. short

– Verbose (sentence-like) vs. keyword

• Results: Automatic 2-stage ≈ Optimal 1-stage

Page 17: Two-stage Language Models for Information Retrieval

Automatic 2-stage results ≈ Optimal 1-stage results

Average precision (3 DBs × 4 query types, 150 topics). Query codes: S/L = short/long, K/V = keyword/verbose.

Collection   Query   Optimal-JM   Optimal-Dir   Auto-2stage
AP88-89      SK      20.3%        23.0%         22.2%*
AP88-89      LK      36.8%        37.6%         37.4%
AP88-89      SV      18.8%        20.9%         20.4%
AP88-89      LV      28.8%        29.8%         29.2%
WSJ87-92     SK      19.4%        22.3%         21.8%*
WSJ87-92     LK      34.8%        35.3%         35.8%
WSJ87-92     SV      17.2%        19.6%         19.9%
WSJ87-92     LV      27.7%        28.2%         28.8%*
ZIFF1-2      SK      17.9%        21.5%         20.0%
ZIFF1-2      LK      32.6%        32.6%         32.2%
ZIFF1-2      SV      15.6%        18.5%         18.1%
ZIFF1-2      LV      26.7%        27.9%         27.9%*

Page 18: Two-stage Language Models for Information Retrieval

Automatic 2-stage results ≈ Optimal 1-stage results

Average precision (2 large DBs × 2 query types, 50 topics):

Collection   Topics/Query    Optimal-JM   Optimal-Dir   Auto-2stage
Disk4&5-CR   351-400 title   0.167        0.186         0.182
Disk4&5-CR   351-400 long    0.222        0.224         0.230
Disk4&5-CR   401-450 title   0.239        0.256         0.257
Disk4&5-CR   401-450 long    0.265        0.260         0.268
Web          401-450 title   0.243        0.294         0.278*
Web          401-450 long    0.259        0.275         0.284

Page 19: Two-stage Language Models for Information Retrieval

Conclusions

• Two-stage language models

– Direct modeling of both queries and documents

– Parameters are part of a probabilistic model

– Parameters can be estimated using standard estimation techniques

• Two-stage Dirichlet-Mixture smoothing

– Involves two meaningful parameters (i.e., document sample size μ and query noise λ)

– Achieves very good performance by setting the smoothing parameters automatically

• It is possible to set parameters automatically!

Page 20: Two-stage Language Models for Information Retrieval

Future Work

• Optimality analysis in the two-stage parameter space

• Offline vs. online estimation

• Alternative estimation methods

• Parameter estimation for more sophisticated language models (e.g., with feedback)

Page 21: Two-stage Language Models for Information Retrieval

Thank you!