
Page 1: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

On Statistical Analysis and Optimization of Information Retrieval

Effectiveness Metrics

Jun Wang

Joint work with Jianhan Zhu

Department of Computer Science

University College London

[email protected]

Page 2: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Motivation

IR Models

Calculate (relevance) scores for individual documents

Probabilistic Indexing

BM25

Language Models

The Binary Independence Relevance Model

Page 3: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Motivation

A general definition:

m(a rank order | “true” relevance of documents)
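As a hedged formalization (using the $a_i$/$r_i$ notation that appears on the later slides, not notation taken from this slide itself), an IR effectiveness metric can be written as a function

$$
m(a_1,\dots,a_N \mid r_1,\dots,r_N) \;\in\; \mathbb{R},
$$

mapping a rank order $a_1,\dots,a_N$ and the (“true”) relevance states $r_1,\dots,r_N$ of the documents to a single score; MAP, NDCG, MRR and Precision at N are all instances of such an $m$.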

Page 4: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Motivation

We have different rank preferences and thus different IR metrics.

IR Models ... ? ... NDCG, MRR, MAP

Something is missing in between.

Page 5: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Motivation

The fundamental question:

What is the underlying generative retrieval process?

Page 6: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Outline

• What is happening right now
• The statistical retrieval process
• Text retrieval experiments

Page 7: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

What is happening right now (1)?

• Still focusing on (relevance) scores, but with acknowledgement of the final rank context

– The “less is more” model [Chen&Karger 2006] extended the relevance model

– it assumes the previously retrieved documents are non-relevant when calculating the relevance of documents for the current rank position,

– which is equivalent to maximizing the Reciprocal Rank measure

Page 8: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

What is happening right now (2)?

• Still focusing on (relevance) scores, but with acknowledgement of the final rank context

– In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]

Page 9: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

What is happening right now (3)?

• Focusing on IR metrics and ranking
– bypass the step of estimating the relevance states of individual documents
– construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]

• However, not all IR metrics necessarily summarize the (training) data well; thus, the training data may not be fully explored [Yilmaz&Robertson 2009]

Page 10: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

A “balanced” view of the retrieval process

– First, understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance;
– dependency between documents is considered.
– Secondly, the rank preference is specified by a given IR metric.
– The rank decision making is stochastic due to the uncertainty about relevance.
– As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.

Page 11: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

The statistical document ranking process

$$
a = \arg\max_{a} E(m \mid q)
  = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
$$

where $p(r_1,\dots,r_N \mid q)$ is the joint probability of relevance given the query, and the IR metric $m$ takes as input (1) a rank order $a_1,\dots,a_N$ and (2) the relevance of the documents $r_1,\dots,r_N$.
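To make the decision rule concrete, here is a minimal Python sketch (not the authors' implementation): it represents the joint relevance distribution by sampled relevance vectors, estimates $E(m \mid q)$ for each candidate rank order, and returns the order with the highest expected metric. The metric, the document count, and the toy sampling scheme are all illustrative assumptions.

```python
import itertools
import numpy as np

def precision_at_n(ranking, rel, n=2):
    """Example IR metric m(a_1..a_N | r_1..r_N): Precision at n."""
    return sum(rel[d] for d in ranking[:n]) / n

def optimal_ranking(metric, rel_samples):
    """Return the rank order maximizing the expected metric, where the joint
    distribution p(r_1..r_N | q) is represented by rows of rel_samples."""
    n_docs = rel_samples.shape[1]
    best, best_value = None, -np.inf
    # Brute-force enumeration of rank orders; only sensible for small N.
    for ranking in itertools.permutations(range(n_docs)):
        # E(m | q) is approximated by averaging m over the relevance samples
        value = np.mean([metric(ranking, rel) for rel in rel_samples])
        if value > best_value:
            best, best_value = ranking, value
    return best, best_value

# Toy, correlated binary-relevance samples for 3 documents (illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(5000, 1))                     # shared factor -> correlation
probs = 1 / (1 + np.exp(-(latent + [0.5, 0.0, -0.5])))  # per-document marginals
samples = (rng.random((5000, 3)) < probs).astype(float)

print(optimal_ranking(precision_at_n, samples))
```

Enumerating all permutations is only feasible for a handful of documents; it is used here purely to illustrate the expectation-maximizing decision rule.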

Page 12: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

The Optimal Ranker

For a fixed IR metric $m$ and a rank order $a_1,\dots,a_N$, with the uncertainty about relevance captured by $p(r_1,\dots,r_N \mid q)$:

OUTPUT: the estimated performance score

$$
E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
$$

Page 13: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Now the question is how to calculate the expected IR metric under the joint probability of relevance, if we predefine the IR metric $m(a_1,\dots,a_N \mid r_1,\dots,r_N)$:

$$
E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
$$

Page 14: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank)

• Certain assumptions are needed
• The joint distribution of relevance $p(r_1,\dots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\dots,E(r_N \mid q)$ and the covariances $\operatorname{cov}(r_i, r_j \mid q)$
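A minimal sketch of this summarization step, assuming (as in the earlier illustration) that the joint distribution is represented by sampled relevance vectors:

```python
import numpy as np

def summarize_joint_relevance(rel_samples):
    """Summarize p(r_1..r_N | q) by its marginal means E(r_i | q)
    and covariances cov(r_i, r_j | q)."""
    means = rel_samples.mean(axis=0)         # E(r_1 | q), ..., E(r_N | q)
    cov = np.cov(rel_samples, rowvar=False)  # cov(r_i, r_j | q)
    return means, cov
```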

Page 15: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Some of the results

• Expected Average Precision:

• Expected Reciprocal Rank (two documents):

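The slide's formulas are not reproduced in this transcript. As an illustration of the kind of expression involved (derived directly from the definitions, not copied from the slide), the expected Reciprocal Rank for two documents ranked $(d_1, d_2)$ with binary relevance is

$$
\begin{aligned}
E[\mathrm{RR} \mid q]
  &= 1 \cdot p(r_1 = 1 \mid q) + \tfrac{1}{2}\, p(r_1 = 0,\, r_2 = 1 \mid q) \\
  &= E(r_1 \mid q) + \tfrac{1}{2}\bigl(E(r_2 \mid q) - \operatorname{cov}(r_1, r_2 \mid q) - E(r_1 \mid q)\,E(r_2 \mid q)\bigr),
\end{aligned}
$$

which depends only on the marginal means and the covariance, i.e. exactly the summary introduced on the previous slide.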

Page 16: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Properties of IR metrics under uncertainty

Page 17: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

But can this analysis be used in practice?

• The key question is how to obtain the joint probability of relevance
– Marginal means $E(r_1 \mid q),\dots,E(r_N \mid q)$:
  - Click-through data
  - Current IR models: relevance models, language models
– Covariance of relevance $\operatorname{cov}(r_i, r_j \mid q)$:
  - Use the documents’ score correlation to estimate the relevance correlation.
  - It is query-independent: we approximate it by sampling queries and calculating the correlation between the documents’ ranking scores (see the sketch below).
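A minimal sketch of this covariance/correlation approximation, where `score(query, doc)` is a hypothetical scoring function of an existing IR model and `queries` is the sampled query set (both are placeholders, not part of the original slides):

```python
import numpy as np

def score_correlation(score, queries, docs):
    """Approximate the query-independent relevance correlation between documents
    by the correlation of their ranking scores across a sample of queries."""
    # scores[i, j] = ranking score of document docs[j] for sampled query queries[i]
    scores = np.array([[score(q, d) for d in docs] for q in queries])
    return np.corrcoef(scores, rowvar=False)  # N x N correlation matrix
```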

Page 18: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

TREC evaluation

Page 19: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

No free lunch

Page 20: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

The same idea can be applied to evaluation too.

For a fixed IR metric $m$: input an IR model (which produces the rank order $a_1,\dots,a_N$) and relevance judgments; the uncertainty about relevance is captured by $p(r_1,\dots,r_N \mid q)$.

Output: the estimated performance score $E(m \mid q)$.