
Current Developments in Information Retrieval Evaluation

Thomas Mandl
Information Science, University of Hildesheim
[email protected]

Tutorial @ ECIR, Toulouse, 6th Apr. 2009

Who am I?

• Assistant Professor at University of Hildesheim
• Studies at University of Regensburg, Germany, and University of Illinois at UC, USA
• PhD on Neural Networks in IR from University of Hildesheim
• Postdoc Thesis (Habilitation) 2006 on Quality in Web IR from University of Hildesheim
• Research on IR
  – Participant at CLEF since 2002
  – Track Coordinator at CLEF since 2006


• Which system is better?

• Management approach?

Different Query Types – Different Evaluation

• Navigational
  – In search of the homepage of company X
• Informational
  – Yellow-Pages queries
  – Question answering
  – Ad-hoc (searching everything concerning topic X)

„There must be some fundamental understanding of what it means to be good and what it means to be better“ (Bollmann/Cherniavsky 1983, 3)

[Diagram: the IR process – an author creates documents (objects); indexing turns the document corpus into an object–attribute matrix; an information seeker formulates a query, which is indexed into a query representation; a similarity calculation between query and document representations produces the result documents; evaluation looks at this whole process.]


Rough Outline

• Cranfield
• Metrics
• Topics
• Users

Overview

• Cranfield Paradigm
  – Introduction
  – Validity
• Evaluation Metrics
  – Binary relevance
  – Multi-level relevance
• Evaluation Initiatives
• Topic-Specific Analysis
  – Results
  – Optimization
• User Studies
• Bonus: Site Search Evaluation, Hands-on Activities

PART 1: Perspectives on the Cranfield Paradigm

Why evaluation?

• IR systems: numerous components, models and approaches
• It is not possible to predict the effectiveness for a certain collection
• No general preference for a model or a certain component has been proven
• The evaluation of effectiveness is crucial
• A holistic evaluation of retrieval processes is difficult
• Success and satisfaction of the users should be the ideal benchmark

Why evaluation?

• User satisfaction
  – Retrieved documents help to satisfy the user's information need
  – User interface
  – System reaction time
  – Adaptivity
• User-oriented evaluation is very complex and difficult
  – individual and subjective influences
• Therefore mostly evaluation of retrieval systems
  – User as „constant“
  – Replaced by prototypical users (experts)
  – Cranfield paradigm of evaluation

Recall and Precision

Recall = (number of relevant documents retrieved) / (number of relevant documents)

Precision = (number of relevant documents retrieved) / (number of documents retrieved)

• „The ability of the retrieval system to uncover relevant documents is known as the recall power of the system“ (Lancaster 1968, 55)
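
To make the two measures concrete, here is a minimal set-based sketch in Python (binary relevance assumed; function and variable names are illustrative, not from any particular toolkit):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query (binary relevance)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)               # relevant documents that were found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 3 of the 5 retrieved documents are relevant,
# but only 3 of the 6 relevant documents were found.
print(precision_recall(["d1", "d2", "d3", "d4", "d5"],
                       ["d1", "d3", "d5", "d7", "d8", "d9"]))   # (0.6, 0.5)
```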


• Which retrieval model is the basis for Recall and Precision?

Examples

CLEF year   Task type      Topic language   Number of runs   Correlation
2000        Multilingual   English          21                0.26
2001        Bilingual      German            9                0.44
2001        Multilingual   German            5                0.19
2001        Bilingual      English           3                0.20
2001        Multilingual   English          17               -0.34
2002        Bilingual      German            4                0.33
2002        Multilingual   German            4                0.43
2002        Bilingual      English          51                0.40
2002        Monolingual    German           21                0.45
2002        Monolingual    Spanish          28                0.21
2003        Monolingual    German           30                0.37
2003        Monolingual    Spanish          38                0.39
2003        Monolingual    English          11                0.16
2002        Multilingual   English          32                0.29
2003        Bilingual      German           24                0.21
2003        Bilingual      English           8                0.41
2003        Multilingual   English          74                0.31



Determine „measuring points“: typically precision at recall levels of 0.1, 0.2, 0.3, ... The arithmetic mean of these precision values gives the Average Precision (AP).
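
A minimal sketch of this computation for a single ranked list (binary judgments and at least one relevant document assumed); it uses interpolated precision at the recall levels 0.1 to 1.0, which is only one of several AP variants in use, so the numbers can differ from other implementations:

```python
def interpolated_ap(ranking, relevant, levels=tuple(i / 10 for i in range(1, 11))):
    """Mean of interpolated precision at fixed recall levels (0.1 ... 1.0)."""
    relevant = set(relevant)
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))   # (recall, precision)
    precisions = []
    for level in levels:
        # interpolated precision: best precision at any recall >= this level
        candidates = [p for r, p in points if r >= level]
        precisions.append(max(candidates) if candidates else 0.0)
    return sum(precisions) / len(precisions)

# Relevant: d1, d3, d5; the system ranks them at positions 1, 3 and 5.
print(round(interpolated_ap(["d3", "d9", "d1", "d8", "d5", "d2"],
                            ["d1", "d3", "d5"]), 2))   # 0.74
```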

Overview

• Cranfield Paradigm
  – Introduction
  – Validity
• Evaluation Metrics
  – Binary relevance
  – Multi-level relevance
• Evaluation Initiatives
• View on single queries
  – Analysis
  – Topic-specific Optimisation
• User Studies

Relevance

• Situational relevance describes the (actual) utility of documents with respect to the information need
  – hardly possible to capture in practice
  – rather a theoretical construct
• Pertinence is the utility perceived by the user with respect to her/his information need

cf. Fuhr 2003

Relevance

• Objective relevance is the relation between the information need and the document, as judged by one or several neutral observers
  – Common basis of system evaluation!
  – How objective can this be?
• System relevance is the relevance of the document with respect to the formal query as estimated by the system (i.e. the similarity), commonly called the retrieval value or Retrieval Status Value (RSV)

cf. Fuhr 2003


Estimation of the recall

• Precision is directly evident for every user of an IR system
• Recall, however, is neither evident for the user nor is it possible to determine it precisely with reasonable effort
  – The number of relevant documents is unknown
  – This is especially problematic for information needs which aim at a high recall (e.g. patent novelty search)

cf. Fuhr 2003

Estimation of the recall

• Pooling method (retrieval with several systems)
  – Apply several IR systems to the same set of documents and merge the results of the different systems
  – Usually there is strong overlap between the answer sets of the different systems, so the assessment effort does not grow linearly with the number of analysed systems

cf. Fuhr 2003

Relevant / not relevant

• Binary relevance decisions are often criticised
• New metrics for multi-level relevance are being discussed
  – Binary judgments prevail
  – They often lead to similar results
  – More later

Evaluation

Cranfield paradigm of evaluation in Information Retrieval
• Find objective evaluation standards for a comparison of systems
• Keep the conditions for the comparison constant
• Systems work with the same document corpus, the same information needs and the same relevance judgments
• Abstraction from usage situation and context

Evaluation

Cranfield paradigm of evaluation in Information Retrieval
• Objective relevance is judged by a neutral observer
• Relation between the expressed information need and the document
• No individual and subjective relevance assessment in a situational context
• Currently the basis of all evaluation initiatives in Information Retrieval (TREC, CLEF, NTCIR, INEX, ...)

TREC: Text Retrieval Conference

• „TREC is a new ballgame for IR research and development“ (Sparck Jones 1994)
• Evaluation initiative of the National Institute of Standards and Technology (NIST) in the USA
• 1992: TREC-1 (Proceedings 1993)


Cross-Language Evaluation Forum (CLEF)

• EU funding: DELOS NoE for Digital Libraries; Mandl et al. @ CLEF 2003–2006
• Research on evaluation
• Research on cross- and multilingual Information Retrieval systems
• System development
• Test environment
• Benchmarks

Example Topic

<top>
  <num>10.2452/89-GC</num>
  <title>Trade fairs in Lower Saxony</title>
  <desc>Documents reporting about industrial or cultural fairs in Lower Saxony.</desc>
  <narr>Relevant documents should contain information about trade or industrial fairs which take place in the German federal state of Lower Saxony, i.e. name, type and place of the fair. The capital of Lower Saxony is Hanover. Other cities include Braunschweig, Osnabrück, Oldenburg and Göttingen.</narr>
</top>

Objectives of evaluation initiatives

• Find consistent evaluation standards for retrieval systems (standardisation)
• Provide comparison between different systems
• Advance the further development of IR systems
• Consider the needs of the community
• Advance the evaluation methodology

Procedure

• Test basis
  – objects (documents, ...)
  – queries (topics)
    • relevant information needs for potential users
    • consistent weighting
• Time frame
  – Release of topics
  – Submission of results
  – Publication of results

Document collection

• Representative for a real-world task
  – Large
  – Diverse
• Often used: news agency and newspaper collections

Relevance Judgment

• Abstraction from the individual user and his context
• Consistent evaluation
• Objective jurors, who are not in the user's situation
• Objective conclusions about the relation between the content of topic and document


Pooling Method

1. Jurors create topics
2. Systems provide their top 1000 documents for every topic
3. All documents found by at least one system are pooled
4. Relevance assessment of the pool by the jurors

(Ellen Voorhees – CLEF 2001 Workshop; a minimal sketch of the pooling step follows below)
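
The sketch below shows only the idea of pool construction; the run format and the pool depth are illustrative assumptions, not the exact CLEF or TREC procedure:

```python
def build_pool(runs, pool_depth=100):
    """Union of the top pool_depth documents of every run, per topic.

    runs       -- list of dicts {topic_id: [doc_id, ...]} in rank order
    pool_depth -- how many documents per run enter the pool (illustrative value)
    """
    pool = {}
    for run in runs:
        for topic, ranked_docs in run.items():
            pool.setdefault(topic, set()).update(ranked_docs[:pool_depth])
    return pool   # each pool[topic] is then judged by the assessors

run_a = {"10.2452/89-GC": ["d1", "d2", "d3"]}
run_b = {"10.2452/89-GC": ["d2", "d4", "d5"]}
print(build_pool([run_a, run_b], pool_depth=2))
# pool for the topic: {'d1', 'd2', 'd4'} – overlap keeps it smaller than runs × depth
```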

Procedure

• Intellectual evaluation
  – relevant or not relevant
• Statistical analysis


Overview

• Cranfield Paradigm
  – Introduction
  – Validity
• Evaluation Metrics
  – Binary relevance
  – Multi-level relevance
• Evaluation Initiatives
• Topic-Specific Analysis
  – Results
  – Optimization
• User Studies

How reliable is the evaluation according to the Cranfield paradigm?


GeoCLEF Monolingual English

Bilingual: 76% with respect to monolingual

Relevance Assessment

Indirect information:

• „foreign aid in Sub-Saharan Africa“ – Is a document on the kidnapping of an aid worker relevant?
• „natural disasters in the Western USA“ – Is a document on the insurance costs caused by a natural disaster relevant?

Interrater Reliability

• Isn't relevance a rather subjective concept?
• Is there actually consistency/agreement if several jurors evaluate the same set of documents?
• Wouldn't this lead to totally different results?
• Asian approach?

Comparison

• Several sets of assessments can be created by independent jurors
• Several rankings of systems are created
  – using alternative sets of relevance assessments
• How strongly do they differ?
• How to compare rankings?

Comparison of rankings

• Rank correlation coefficients (see the sketch below)
  – Number of position changes (swaps)
  – Kendall's tau
  – Spearman coefficient
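
A small sketch that relates the two ideas: it counts pairwise swaps between two orderings of the same systems and converts them into Kendall's tau (no ties assumed; the system names are invented):

```python
from itertools import combinations

def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two orderings of the same systems (no ties assumed)."""
    pos_a = {system: i for i, system in enumerate(ranking_a)}
    pos_b = {system: i for i, system in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1      # pair ordered the same way in both rankings
        else:
            discordant += 1      # a "swap" between the two rankings
    return (concordant - discordant) / (concordant + discordant)

# Two assessors lead to slightly different system orderings: one swapped pair.
print(kendall_tau(["S1", "S2", "S3", "S4"], ["S1", "S3", "S2", "S4"]))  # ≈ 0.67
```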

Subjectivity of Jurors

• The topic developer is the primary juror
• For TREC-4, the documents in the pool were assessed several times
  – Assessment by the primary jurors
  – 200 relevant (as far as available) + 200 randomly chosen documents form a new pool
  – Two more jurors assess the new pool

Buckley & Voorhees 2005, http://doi.acm.org/10.1145/290941.291017


Subjectivity of Jurors

• Results
  – Overlap between 30% and 40% (see the sketch of the overlap measure below)
  – Example, Topic 219: no document was classified as relevant by all three jurors
  – But: less than 3% of the documents that were initially rated as not relevant were rated as relevant afterwards

Buckley & Voorhees 2005, http://doi.acm.org/10.1145/290941.291017
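
Overlap figures like the ones above can be computed as the size of the intersection of two assessors' relevant sets divided by the size of their union; a minimal sketch with invented judgment sets:

```python
def overlap(relevant_a, relevant_b):
    """Overlap of two assessors' sets of relevant documents: |A ∩ B| / |A ∪ B|."""
    a, b = set(relevant_a), set(relevant_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

assessor_1 = {"d1", "d2", "d3", "d7"}
assessor_2 = {"d2", "d3", "d8", "d9", "d10"}
print(overlap(assessor_1, assessor_2))   # 2 shared out of 7 distinct ≈ 0.29
```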

Changes of absolute values

Correlation between Rankings

• Rankings are rather robust

Subjectivity of Jurors

• Does the result depend on the jurors who evaluate the relevance of a document?
  – Jurors do indeed judge differently
  – The absolute values for the system performance change
  – This does not have a large impact on the order of the systems
  – The comparison between the systems turns out very similar, independent of the jurors

Buckley & Voorhees 2005

Expertise Level of Juror

• What is the effect of the knowledge of the juror?
• Experiment with three different groups of jurors
  – Gold: topic developers, task experts
  – Silver: task experts
  – Bronze: non-experts
• Data: Enterprise Track at TREC
  – Three relevance levels

(Bailey et al. 2008)

Expertise Level of Juror

• Typical level of agreement
  – Between 18% and 58%
  – E.g. 24% of the documents judged highly relevant by Bronze were judged irrelevant by Gold jurors
• Effect on system orderings
  – 0.96 and 0.94 correlation between Gold and Silver (for two measures)
  – Only 0.73 and 0.66 between Gold and Bronze
• Task knowledge is important!

(Bailey et al. 2008)


Cranfield

• Test basis
  – Objects (documents, ...)
    • Sufficient?
  – Queries (topics)
    • Sufficient queries?
    • Large or small differences?
  – Consistent relevance assessment
    • Sufficient judgments?
    • Differences between jurors
  – Systems
    • What if more systems had participated? → Pooling method

Questions and Doubts

• Queries (topics)
  – Sufficient queries?
  – Large or small differences?
• Relevance assessment
  – Sufficient number of judgments?
• What if ...
  – fewer topics were available?
  – fewer relevance judgments were available?
• Analysis

Variability

• OBSERVATION: The difference in performance between the systems is far smaller than the difference between the topics

Example: GeoCLEF 2006

[Figure: Variance of the systems in GeoCLEF 2006 (Mono DE, Mono EN, Mono ES, Mono PT)] (Mandl 2008)

Example: GeoCLEF 2006

[Figure: Variance of the topics in GeoCLEF 2006 (Mono DE, Mono EN, Mono ES, Mono PT)] (Mandl 2008)

Significance tests

• Descriptive statistics: description of distributions, e.g. mean and variance
• Inferential statistics: evaluation of results
• Question: Do the statistical values differ significantly from each other?


Result of significance tests

• Either: systems A and B do not differ significantly from each other
• Or: with 95% probability there is a difference, and system A is better than system B

A sketch of such a test on per-topic scores follows below.
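
The sketch assumes per-topic average precision scores for two systems over the same ten topics (numbers invented) and that scipy is available:

```python
from scipy import stats

# Average precision per topic for two systems on the same ten topics (invented numbers)
ap_system_a = [0.31, 0.05, 0.62, 0.18, 0.44, 0.09, 0.55, 0.27, 0.12, 0.38]
ap_system_b = [0.25, 0.07, 0.51, 0.20, 0.35, 0.04, 0.49, 0.22, 0.10, 0.30]

t_res = stats.ttest_rel(ap_system_a, ap_system_b)   # paired t-test
w_res = stats.wilcoxon(ap_system_a, ap_system_b)    # non-parametric alternative

print(f"t-test p = {t_res.pvalue:.3f}, Wilcoxon p = {w_res.pvalue:.3f}")
# p < 0.05: reject "no difference" at the 95% level;
# otherwise the observed difference may just reflect topic variability.
```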

GeoCLEF Monolingual English

Bilingual: 76% with respect to monolingual (Mandl et al. 2009)

[Further result figures: Mandl et al. 2009; Harman 2005, http://www.haifa.il.ibm.com/sigir05-qp/index.html]

Variance of systems

[Figure: Average Precision of the five best results for ten topics at GeoCLEF 2006 (monolingual German)] (Mandl 2008)

Swap Rate

• Starting point: the original ranking of the systems
  – Leave out topics
  – Create a ranking on the remaining topics
  – How often is a „worse“ system ahead of a better one?
  – This gives an error rate (see the sketch below)
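
A rough sketch of the idea; the names and the simple random resampling scheme are illustrative, not the exact protocol of Buckley & Voorhees or Sanderson & Zobel:

```python
import random

def swap_rate(per_topic_scores, subset_size, trials=1000, seed=0):
    """Estimate how often a pair of systems changes order on a topic subset.

    per_topic_scores -- {system: [score on topic 0, score on topic 1, ...]}
    """
    rng = random.Random(seed)
    systems = list(per_topic_scores)
    n_topics = len(next(iter(per_topic_scores.values())))
    full_mean = {s: sum(v) / n_topics for s, v in per_topic_scores.items()}
    swaps = comparisons = 0
    for _ in range(trials):
        topics = rng.sample(range(n_topics), subset_size)
        sub_mean = {s: sum(per_topic_scores[s][t] for t in topics) / subset_size
                    for s in systems}
        for i, a in enumerate(systems):
            for b in systems[i + 1:]:
                comparisons += 1
                # a swap: the subset orders the pair opposite to the full topic set
                if (full_mean[a] - full_mean[b]) * (sub_mean[a] - sub_mean[b]) < 0:
                    swaps += 1
    return swaps / comparisons

scores = {"S1": [0.4, 0.1, 0.5, 0.3], "S2": [0.2, 0.3, 0.4, 0.1]}
print(swap_rate(scores, subset_size=2, trials=200))  # estimated swap rate (error rate)
```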


[Figure omitted] (Buckley & Voorhees 2000)

Smaller Topic Sets

[Figure omitted] (Sanderson & Zobel 2005)

Number of topics

• Do the common 50 topics suffice to compare the systems?
• A certain difference has to exist between two systems to show statistically that one system is better than the other
  – With 50 topics the required difference is below 5%
  – Partly even clearly below 5% (absolute)

Sanderson & Zobel 2005

Sub-Set Analysis

• Select n out of m topics
  – Many combinations are possible
  – Repeat the selection
  – Create several rankings for each subset size
  – Calculate the average correlation and show the corridor (see the sketch below)
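
A sketch of this procedure, reusing the kendall_tau helper sketched in the rank-correlation example above and the same invented per-topic score layout:

```python
import random

def subset_correlations(per_topic_scores, subset_size, repeats=50, seed=0):
    """Average Kendall correlation of subset-based rankings with the full ranking."""
    rng = random.Random(seed)
    n_topics = len(next(iter(per_topic_scores.values())))

    def ranking(topics):
        # rank systems by their mean score over the given topics
        means = {s: sum(v[t] for t in topics) / len(topics)
                 for s, v in per_topic_scores.items()}
        return sorted(means, key=means.get, reverse=True)

    full_ranking = ranking(range(n_topics))
    taus = [kendall_tau(full_ranking,
                        ranking(rng.sample(range(n_topics), subset_size)))
            for _ in range(repeats)]
    return sum(taus) / len(taus), min(taus), max(taus)   # mean and the "corridor"

scores = {"S1": [0.4, 0.1, 0.5, 0.3], "S2": [0.2, 0.3, 0.4, 0.1],
          "S3": [0.1, 0.2, 0.2, 0.2]}
print(subset_correlations(scores, subset_size=2, repeats=20))
```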

Example Correlation

[Figure: Average correlation of the rankings for smaller subsets of topics with the ranking for all topics; Monolingual Spanish, GeoCLEF 2006] (Mandl 2008)

• Statistical tests overestimate the error rate
  – compared to a swap-rate analysis
• Tendency:
  – more topics
  – fewer relevance judgments per topic
  – → higher reliability of the system comparison

Sanderson & Zobel 2005


Is Pooling sufficient?

• Do the systems find all relevant documents?
• Or are there many relevant documents beyond the assessed pools that are never found?
  – The pool has to have a certain quality
  – Single systems do not contribute very much
  – Leaving out single systems essentially does not change the ranking of the other systems
  – The comparison remains robust

for TREC: Zobel (1998)
for CLEF: Braschler (2003)

Is Pooling sufficient?

• Incompleteness ...
  – More topics are more important than an exhaustive assessment of the pools
  – Higher statistical reliability with fewer judgments per topic

Sanderson & Zobel 2005
Buckley & Voorhees 2004

Objection: Variability

• The difference in performance between the systems is essentially smaller than between the topics
  – Robust performance over all queries is more important than a high average performance
  – "Difficult" queries should receive a higher weight in the evaluation
  – → problem of robustness

Buckley & Voorhees 2005, Mandl 2006

Context

• Is it possible to transfer the results of experiments to other situations?
  – No

Buckley & Voorhees 2005

Recent Developments

• Million Query Track at TREC (Carterette et al. 2008)
• Suggestion for targeted relevance judgments (Moffat, Webber & Zobel 2007)