Two-stage Language Models for
Information Retrieval
ChengXiang Zhai*, John Lafferty
School of Computer Science
Carnegie Mellon University
*New Address
Department of Computer Science
University of Illinois, Urbana-Champaign
Motivation
• Retrieval parameters are needed to
– model different user preferences
– customize a retrieval model according to different queries and documents
• So far, parameters have been set through empirical experimentation
• Can we set parameters automatically?
Parameters in Traditional Models
• EXTERNAL to the model, hard to interpret
– Most parameters are introduced heuristically to implement our “intuition”
– As a result, no principles to quantify them
• Set through empirical experiments
– Lots of experimentation
– Optimality for new queries is not guaranteed
Example of Parameter Tuning (Okapi)
(Robertson et al. 1999)
“k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).”
The Way to Automatic Tuning ...
• Parameters must be PART of the model!
– Query modeling (explain difference in query)
– Document modeling (explain difference in doc)
• Decouple the influence of the query on parameter setting from that of the documents
– To achieve stable setting of parameters
– To pre-compute query-independent parameters
The Rest of the Talk
• Risk minimization retrieval framework
• Two-stage language models
• Two-stage Dirichlet-Mixture smoothing
• Parameter estimation
The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)
• Documents → Document Language Models (DOC MODELING)
• Query → Query Language Model (QUERY MODELING)
• User → Loss Function (USER MODELING)
• Retrieval Decision
Parameter Setting in Risk Minimization
• Query → Query Language Model: query model parameters (estimated)
• Documents → Document Language Models: doc model parameters (estimated)
• User → Loss Function: user model parameters (set manually)
Two-stage Language Models
• Query q → Query Language Model θ_Q: p(q | θ_Q, U)
• Doc d → Document Language Model θ_D: p(θ_D | d, S)
• Loss Function:
  l(θ_Q, θ_D) = 0 if Δ(θ_Q, θ_D) ≤ ε, c otherwise
• Risk ranking formula:
  R(d, q) ∝ p(q | θ̂_D, U)
  – Stage 1: estimate θ̂_D from the document d (and collection S)
  – Stage 2: compute p(q | θ̂_D, U)
  Smoothing!
Sensitivity in Traditional (“one-stage”) Smoothing
[Plot: retrieval precision vs. smoothing parameter, for keyword queries and verbose (sentence-like) queries]
The Need of Two-stage Smoothing (I)
Accurate Estimation of Doc Model
Document: a 500-word text mining paper
Language Model P(w|d) (maximum likelihood):
  text        10/500 = 0.02
  mining       3/500 = 0.006
  association  1/500 = 0.002
  algorithm    2/500 = 0.004
  data         0/500 = 0
Query = “data mining algorithms”
p(q|d) = p(“data”|d) p(“mining”|d) p(“algorithms”|d) = 0 × 0.006 × 0.004 = 0!
Is that reasonable? What should P(“data”|d) be? And P(“unicorn”|d)?
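The zero-probability problem can be reproduced in a few lines, using the hypothetical counts from the slide above:

```python
from math import prod

# Hypothetical counts for the 500-word "text mining paper" above.
counts = {"text": 10, "mining": 3, "association": 1, "algorithm": 2}
doc_len = 500

def p_ml(w):
    """Unsmoothed maximum-likelihood estimate c(w,d)/|d|."""
    return counts.get(w, 0) / doc_len

query = ["data", "mining", "algorithm"]
p_q = prod(p_ml(w) for w in query)  # one unseen word zeroes the product
```

A single query word absent from the document drives the whole query likelihood to zero, no matter how well the other words match.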
The Need of Two-stage Smoothing (II)
Explanation of Noise in Query
Query = “the algorithms for data mining”

        the    algorithms  for    data   mining
  d1:   0.04   0.001       0.02   0.002  0.003
  d2:   0.02   0.001       0.01   0.003  0.004

p(“algorithms”|d1) = p(“algorithms”|d2)
p(“data”|d1) < p(“data”|d2)
p(“mining”|d1) < p(“mining”|d2)
But p(q|d1) > p(q|d2)!
We should make p(“the”) and p(“for”) less different across documents.
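A quick check of the arithmetic, with the per-word probabilities taken from the table above:

```python
from math import prod

# Per-word probabilities for query "the algorithms for data mining"
d1 = [0.04, 0.001, 0.02, 0.002, 0.003]
d2 = [0.02, 0.001, 0.01, 0.003, 0.004]

p_q_d1 = prod(d1)  # dominated by the common words "the" and "for"
p_q_d2 = prod(d2)
# d1 scores higher overall even though d2 matches the content words better
```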
Two-stage Dirichlet-Mixture Smoothing

P(w|d) = (1 − λ) · (c(w,d) + μ·p(w|C)) / (|d| + μ) + λ·p(w|U)

Stage-1 Smoothing (μ):
  – Explains unseen words
  – Dirichlet prior: adds μ pseudo counts from the collection model p(w|C)
Stage-2 Smoothing (λ):
  – Explains noise in the query
  – 2-component mixture: linear interpolation with the user background model p(w|U)
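As a sketch, the combined estimate can be written directly from the formula; the defaults μ = 2000 and λ = 0.1 below are illustrative values, not the estimated ones:

```python
def p_two_stage(w, doc_counts, doc_len, p_C, p_U, mu=2000.0, lam=0.1):
    """Two-stage Dirichlet-mixture smoothed P(w|d).

    Stage 1: Dirichlet prior -- add mu pseudo counts from the
             collection model p_C to explain unseen words.
    Stage 2: linear interpolation with the user background model
             p_U (weight lam) to explain noise in the query.
    """
    p_dir = (doc_counts.get(w, 0) + mu * p_C.get(w, 0.0)) / (doc_len + mu)
    return (1 - lam) * p_dir + lam * p_U.get(w, 0.0)
```

Ranking a document then just multiplies (or log-sums) this quantity over the query words; no word ever gets probability zero as long as p(w|C) covers the vocabulary.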
Estimating μ using Leave-one-out

Leave-one-out: hold out each word occurrence w_i and predict it from the rest of the document, P(w_i | d − w_i).

Log-likelihood:
  L_{-1}(μ | C) = Σ_{i=1}^{N} Σ_{w∈V} c(w, d_i) · log( (c(w, d_i) − 1 + μ·p(w|C)) / (|d_i| − 1 + μ) )

Maximum Likelihood Estimator:
  μ̂ = argmax_μ L_{-1}(μ | C)

Solved with Newton’s Method.
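A runnable sketch of the stage-1 estimation; the slide solves the argmax with Newton’s method, but for illustration a simple grid search over μ performs the same maximization (the collection below is made up):

```python
from math import log

def loo_loglik(docs, p_C, mu):
    """Leave-one-out log-likelihood L_{-1}(mu|C):
    sum over docs d and words w of
    c(w,d) * log((c(w,d) - 1 + mu*p(w|C)) / (|d| - 1 + mu))."""
    total = 0.0
    for counts in docs:
        dlen = sum(counts.values())
        for w, c in counts.items():
            total += c * log((c - 1 + mu * p_C[w]) / (dlen - 1 + mu))
    return total

def estimate_mu(docs, p_C, grid=None):
    """mu-hat = argmax_mu L_{-1}(mu|C); a coarse grid search stands in
    for the Newton iteration used in the talk."""
    grid = grid or [0.1 * 1.3 ** k for k in range(40)]
    return max(grid, key=lambda mu: loo_loglik(docs, p_C, mu))
```

Note that μ̂ depends only on the collection, so it can be computed offline, before any query arrives.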
Estimating λ using a Mixture Model

Treat the query q = q_1 … q_m as a sample from a mixture over the N documents:

  p(q | λ, U) = Π_{j=1}^{m} Σ_{i=1}^{N} α_i · ( (1 − λ)·p(q_j | θ̂_{d_i}) + λ·p(q_j | U) )

Maximum Likelihood Estimator:
  λ̂ = argmax_λ p(q | λ, U)

Solved with the Expectation-Maximization (EM) algorithm: simultaneously adjust λ and α_1, …, α_N to maximize the query likelihood.
Stage-1: Dirichlet-smoothed document models p(w|d_1), …, p(w|d_N)
Stage-2: mixture components (1 − λ)·p(w|d_i) + λ·p(w|U), i = 1, …, N
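A minimal EM sketch for the stage-2 estimation, assuming the stage-1 smoothed document models are given as word-probability dictionaries (the function name and data layout are illustrative):

```python
def em_lambda(query, doc_models, p_U, iters=50):
    """EM for lambda-hat = argmax_lambda p(q|lambda,U), where
    p(q|lambda,U) = prod_j sum_i alpha_i*((1-lam)*p(q_j|d_i) + lam*p(q_j|U)).
    Simultaneously adjusts lam and the document weights alpha_1..alpha_N."""
    N, m = len(doc_models), len(query)
    lam, alpha = 0.5, [1.0 / N] * N
    for _ in range(iters):
        lam_acc, alpha_acc = 0.0, [0.0] * N
        for w in query:
            # E-step: joint posterior over (document i, doc-vs-background)
            joint = [(alpha[i] * (1 - lam) * doc_models[i].get(w, 0.0),
                      alpha[i] * lam * p_U.get(w, 0.0)) for i in range(N)]
            z = sum(a + b for a, b in joint)
            for i, (a, b) in enumerate(joint):
                alpha_acc[i] += (a + b) / z
                lam_acc += b / z
        # M-step: lam = expected fraction of query words from the background
        lam = lam_acc / m
        alpha = [x / m for x in alpha_acc]
    return lam, alpha
```

Unlike μ̂, the noise weight λ̂ is query-dependent, so this estimation runs online, once per query.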
Effectiveness of Parameter Estimation
• Five databases
– News articles (AP, WSJ, ZIFF, FBIS, FT, LA)
– Government documents (Federal Register)
– Web pages
• Four types of queries
– Long vs. short
– Verbose (sentence-like) vs. keyword
• Results: Automatic 2-stage ≈ Optimal 1-stage
Collection  Query  Optimal-JM  Optimal-Dir  Auto-2stage
AP88-89     SK     20.3%       23.0%        22.2%*
AP88-89     LK     36.8%       37.6%        37.4%
AP88-89     SV     18.8%       20.9%        20.4%
AP88-89     LV     28.8%       29.8%        29.2%
WSJ87-92    SK     19.4%       22.3%        21.8%*
WSJ87-92    LK     34.8%       35.3%        35.8%
WSJ87-92    SV     17.2%       19.6%        19.9%
WSJ87-92    LV     27.7%       28.2%        28.8%*
ZIFF1-2     SK     17.9%       21.5%        20.0%
ZIFF1-2     LK     32.6%       32.6%        32.2%
ZIFF1-2     SV     15.6%       18.5%        18.1%
ZIFF1-2     LV     26.7%       27.9%        27.9%*
Automatic 2-stage results ≈ Optimal 1-stage results
Average precision (3 DB’s + 4 query types, 150 topics)
Automatic 2-stage results ≈ Optimal 1-stage results
Average precision ( 2 large DB’s + 2 query types, 50 topics)
Collection   Query          Optimal-JM  Optimal-Dir  Auto-2stage
Disk4&5-CR   351-400 title  0.167       0.186        0.182
Disk4&5-CR   351-400 long   0.222       0.224        0.230
Disk4&5-CR   401-450 title  0.239       0.256        0.257
Disk4&5-CR   401-450 long   0.265       0.260        0.268
Web          401-450 title  0.243       0.294        0.278*
Web          401-450 long   0.259       0.275        0.284
Conclusions
• Two-stage language models
– Direct modeling of both queries and documents
– Parameters are part of a probabilistic model
– Parameters can be estimated using standard estimation techniques
• Two-stage Dirichlet-Mixture smoothing
– Involves two meaningful parameters (i.e., document sample size μ and query noise λ)
– Achieves very good performance through automatically setting smoothing parameters
• It is possible to set parameters automatically!
Future Work
• Optimality analysis in the two-stage parameter space
• Offline vs. online estimation
• Alternative estimation methods
• Parameter estimation for more sophisticated language models (e.g., with feedback)
Thank you!