29
Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation Diego Moll´ a 1 Iman Amini 2 David Martinez 3 1 Macquarie University, Sydney, Australia 2 NICTA and RMIT, Melbourne, Australia 3 University of Melbourne, Melbourne, Australia GEAR’14, 11 July 2014

Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Embed Size (px)

DESCRIPTION

Slides of the paper presented at SIGIR GEAR 2014 (https://sites.google.com/site/sigirgear/)

Citation preview

Page 1: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Document Distance for the Automated Expansionof Relevance Judgements for Information Retrieval

Evaluation

Diego Molla1 Iman Amini2 David Martinez3

1Macquarie University,Sydney, Australia

2NICTA and RMIT,Melbourne, Australia

3University of Melbourne,Melbourne, Australia

GEAR’14, 11 July 2014

Page 2: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 2/26

Page 3: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 3/26

Page 4: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evidence Based Medicine

http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/

GEAR 2014 Diego Molla, Iman Amini, David Martinez 4/26

Page 5: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Journal of Family Practice’s “Clinical Inquiries”

GEAR 2014 Diego Molla, Iman Amini, David Martinez 5/26

Page 6: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Can We Use the List of References for Evaluation?

GEAR 2014 Diego Molla, Iman Amini, David Martinez 6/26

Page 7: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 7/26

Page 8: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Expanding the Qrels

Buttcher et al. (2007) and Martinez et al. (2008) use MLmethods. ⇒ But we don’t have negative judgements.

Sakai & Lin (2010) use most popular documents retrieved byseveral systems. ⇒ But we don’t have several systems.

GEAR 2014 Diego Molla, Iman Amini, David Martinez 8/26

Page 9: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 9/26

Page 10: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 10/26

Page 11: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

The Cluster Hypothesis

Cluster Hypothesis (Rijsbergen, 1979)

“Closely associated documents tend to be relevant to the samerequests”

The cluster hypothesis has been used to improve the qualityof retrieval systems.

It has not been used to improve the quality of the evaluationof retrieval systems.

Our distance metrics

tf.idf of all words after lowercasing and removing stop words,then top 200 PCA components.

Distance metric: d(x , y) = 1− cos(x , y).

GEAR 2014 Diego Molla, Iman Amini, David Martinez 11/26

Page 12: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Computing Distances and Binning

1. Gather closest distances

q1 q2qrel11 qrel12 qrel21 qrel22

d1 0.89 0.75 0.04 0.35d2 0.12 0.43 0.84 0.45d3 0.34 0.56 0.75 0.92

=⇒

0.75 d1 q1 no0.04 d1 q2 yes0.12 d2 q1 yes0.45 d2 q2 no0.34 d3 q1 yes0.75 d3 q2 no

2. Sort distances and bin

0.04 d1 q2 yes0.12 d2 q1 yes

0.34 d3 q1 yes0.45 d2 q2 no

0.75 d1 q1 no0.75 d3 q2 no

3. Compute bin ratio ofrelevance

1.0

0.5

0.0

GEAR 2014 Diego Molla, Iman Amini, David Martinez 12/26

Page 13: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Computing Distances and Binning

1. Gather closest distances

q1 q2qrel11 qrel12 qrel21 qrel22

d1 0.89 0.75 0.04 0.35d2 0.12 0.43 0.84 0.45d3 0.34 0.56 0.75 0.92

=⇒

0.75 d1 q1 no0.04 d1 q2 yes0.12 d2 q1 yes0.45 d2 q2 no0.34 d3 q1 yes0.75 d3 q2 no

2. Sort distances and bin

0.04 d1 q2 yes0.12 d2 q1 yes

0.34 d3 q1 yes0.45 d2 q2 no

0.75 d1 q1 no0.75 d3 q2 no

3. Compute bin ratio ofrelevance

1.0

0.5

0.0

GEAR 2014 Diego Molla, Iman Amini, David Martinez 12/26

Page 14: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Computing Distances and Binning

1. Gather closest distances

q1 q2qrel11 qrel12 qrel21 qrel22

d1 0.89 0.75 0.04 0.35d2 0.12 0.43 0.84 0.45d3 0.34 0.56 0.75 0.92

=⇒

0.75 d1 q1 no0.04 d1 q2 yes0.12 d2 q1 yes0.45 d2 q2 no0.34 d3 q1 yes0.75 d3 q2 no

2. Sort distances and bin

0.04 d1 q2 yes0.12 d2 q1 yes

0.34 d3 q1 yes0.45 d2 q2 no

0.75 d1 q1 no0.75 d3 q2 no

3. Compute bin ratio ofrelevance

1.0

0.5

0.0

GEAR 2014 Diego Molla, Iman Amini, David Martinez 12/26

Page 15: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 13/26

Page 16: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Data

The OHSUMED collection

Documents from MEDLINE.

63 queries, average of 50.97qrels per query.

Only positive qrels.

Original TREC-9 runs notavailable.

How we used it

16 distinct runs of Terrier.

Pooled top 100 docs per run.

All docs not in set of qrelsmarked as negative qrels.

The TREC-8 collection

News documents.

qrels pooled the top 100documents retrieved by theTREC-8 ad-hoc runs.

50 queries, average of 1,736qrels (94.56 positive).

Positive and negative qrels.

How we used it

All 116 runs from the TREC-8ad-hoc track.

Only first 100 qrels per query.

GEAR 2014 Diego Molla, Iman Amini, David Martinez 14/26

Page 17: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Distance vs. Relevance within the qrels

GEAR 2014 Diego Molla, Iman Amini, David Martinez 15/26

Page 18: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 16/26

Page 19: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 17/26

Page 20: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Adding Pseudo-qrels for Evaluation

1. Gather closest distances

q1 q2qrel11 qrel12 qrel21 qrel22

d1 0.89 0.75 0.04 0.35d2 0.12 0.43 0.84 0.45d3 0.34 0.56 0.75 0.92

=⇒

0.75 d1 q10.04 d1 q20.12 d2 q10.45 d2 q20.34 d3 q10.75 d3 q2

2. Sort distances

0.04 d1 q20.12 d2 q10.34 d3 q10.45 d2 q20.75 d1 q10.75 d3 q2

3. Set threshold

0.04 d1 q20.12 d2 q10.34 d3 q10.45 d2 q20.75 d1 q10.75 d3 q2

GEAR 2014 Diego Molla, Iman Amini, David Martinez 18/26

Page 21: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Contents

1 BackgroundThe ScenarioRelated Work

2 Document Distance to Expand QrelsEvaluation of Information RetrievalData Sets

3 Evaluation ResultsAdding Pseudo-qrelsEvaluation

GEAR 2014 Diego Molla, Iman Amini, David Martinez 19/26

Page 22: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evaluation Method

Method1 Oracle: Evaluate the sample runs using all the available qrels.

2 Select a subset of qrels and expand it with pseudo qrels.

3 Compute Kendall’s Tau between Oracle and (2).

OHSUMED runs

16 algorithms fromTerrier 3.5.

TREC runs

All 116 runs fromTREC-8 ad-hoc track.

GEAR 2014 Diego Molla, Iman Amini, David Martinez 20/26

Page 23: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evaluation Results — OHSUMED

GEAR 2014 Diego Molla, Iman Amini, David Martinez 21/26

Page 24: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evaluation Results — TREC8

GEAR 2014 Diego Molla, Iman Amini, David Martinez 22/26

Page 25: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evaluation Results — TREC8 (zoom)

GEAR 2014 Diego Molla, Iman Amini, David Martinez 23/26

Page 26: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Evaluation Results — TREC8 (different initial qrels)

GEAR 2014 Diego Molla, Iman Amini, David Martinez 24/26

Page 27: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Conclusions

Conclusions

Small evaluation improvement when adding documents similarto existing qrels.

The approach can be used when there are no negative qrels.

Further Work

Try other thresholds.

Try other distance metrics.

Double-check with human judgements.

GEAR 2014 Diego Molla, Iman Amini, David Martinez 25/26

Page 28: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Conclusions

Conclusions

Small evaluation improvement when adding documents similarto existing qrels.

The approach can be used when there are no negative qrels.

Further Work

Try other thresholds.

Try other distance metrics.

Double-check with human judgements.

GEAR 2014 Diego Molla, Iman Amini, David Martinez 25/26

Page 29: Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation

Background Document Distance to Expand Qrels Evaluation Results

Thank You

Questions?

Further information about our research:http://web.science.mq.edu.au/~diego/medicalnlp/

Diego Iman David

GEAR 2014 Diego Molla, Iman Amini, David Martinez 26/26