Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation


David E. Losada

Javier Parapar, Álvaro Barreiro

ACM SAC, 2016

Evaluation is crucial

to compare retrieval algorithms, design new search solutions, ...

information retrieval evaluation: 3 main ingredients

docs

queries

relevance judgements

relevance assessments are incomplete

search system 1, search system 2, search system 3, ..., search system n each return a ranking of docs by estimated relevance (a run), e.g. 1. WSJ13, 2. WSJ17, ..., 100. AP567, 101. AP555, ...

only the docs above the pool depth (here, the top 100 of each run) go into the pool; docs below it (rank 101 onwards, e.g. AP555) are never considered

the pooled docs (WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ...) are passed to human assessors

finding relevant docs is the key

The most productive use of assessors' time is judging relevant docs (Sanderson & Zobel, 2005)

Effective adjudication methods

Give priority to pooled docs that are potentially relevant

Can significantly reduce the number of judgements required to identify a given number of relevant docs

But most existing methods are ad hoc...

Our main idea...

Cast doc adjudication as a reinforcement learning problem

Doc judging is an iterative process where we learn as judgements come in

Doc adjudication as a reinforcement learning problem

Initially we know nothing about the quality of the runs


As judgements come in...

And we can adapt and allocate more docs for judgement from the most promising runs

Multi-armed bandits

several slot machines, each with an unknown probability of giving a prize

at each step we play a machine and observe the reward

exploration vs exploitation

exploitation: exploits current knowledge, spends no time sampling inferior actions, maximizes expected reward on the next action

exploration: tries uncertain actions, gets more info about expected payoffs, may produce greater total reward in the long run

allocation methods: choose the next action (play) based on past plays and obtained rewards; they implement different ways to trade between exploration and exploitation

Multi-armed bandits for ordering judgements

machines = runs

play a machine = select a run and get its next (unjudged) doc

(binary) reward = relevance/non-relevance of the selected doc
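To make the mapping concrete, here is a minimal sketch of the adjudication loop (not the authors' R implementation); `runs`, `is_relevant`, `budget` and `select_run` are hypothetical names, with `is_relevant` standing in for the human assessor and `select_run` being whichever allocation policy is plugged in:

```python
def bandit_judging_loop(runs, is_relevant, budget, select_run):
    """Bandit-driven adjudication loop (illustrative sketch).

    runs: dict run_id -> ranked list of pooled doc ids
    is_relevant: callable doc_id -> bool (stands in for the human assessor)
    budget: total number of judgements to perform
    select_run: allocation policy, callable (stats, runs, judged) -> run_id
    """
    judged = {}                                           # doc_id -> bool (the growing judgements)
    stats = {r: {"plays": 0, "rewards": 0} for r in runs}
    for _ in range(budget):
        run_id = select_run(stats, runs, judged)
        # next still-unjudged doc of the selected run, if any
        doc = next((d for d in runs[run_id] if d not in judged), None)
        if doc is None:
            continue                                      # run exhausted (simplification)
        rel = is_relevant(doc)                            # play the machine, observe the reward
        judged[doc] = rel
        stats[run_id]["plays"] += 1
        stats[run_id]["rewards"] += int(rel)
    return judged
```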

Allocation methods tested

random

ϵn-greedy

with prob 1-ϵ plays the machine with the highest avg reward

with prob ϵ plays a random machine

prob of exploration (ϵ) decreases with the num. of plays
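A possible ϵn-greedy policy compatible with the loop sketched earlier; the decay schedule shown is the classic ϵn = min(1, cK/(d²n)) from Auer et al.'s ϵn-greedy, and the constants c and d are illustrative assumptions (the exact schedule used in the paper may differ):

```python
import random

def epsilon_n_greedy_select(stats, runs, judged, c=0.5, d=0.1):
    """epsilon_n-greedy allocation (sketch): the exploration probability
    decays with the total number of plays n.
    runs and judged are unused; kept to match the loop's policy signature."""
    n = sum(s["plays"] for s in stats.values()) + 1
    k = len(stats)
    eps = min(1.0, c * k / (d * d * n))
    if random.random() < eps:
        return random.choice(list(stats))                 # explore: random machine
    # exploit: machine (run) with the highest average reward so far
    def avg_reward(r):
        s = stats[r]
        return s["rewards"] / s["plays"] if s["plays"] else 0.0
    return max(stats, key=avg_reward)
```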

Upper Confidence Bound (UCB)

computes upper confidence bounds for avg rewards

conf. intervals get narrower with the number of plays

selects the machine with the highest optimistic estimate
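And a UCB-style policy in the same spirit: average reward plus a confidence width that narrows as a run is played more; using exactly the UCB1 bound is an assumption on my part:

```python
import math

def ucb_select(stats, runs, judged):
    """UCB-style allocation (sketch): pick the run with the highest
    optimistic estimate (mean reward + confidence width)."""
    # play every machine once before relying on the bound
    for r, s in stats.items():
        if s["plays"] == 0:
            return r
    n = sum(s["plays"] for s in stats.values())
    def upper_bound(r):
        s = stats[r]
        return s["rewards"] / s["plays"] + math.sqrt(2 * math.log(n) / s["plays"])
    return max(stats, key=upper_bound)
```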

Allocation methods tested: Bayesian bandits

prior probability of giving a relevant doc: Uniform(0,1) for every run (or, equivalently, Beta(α,β) with α,β=1)

evidence (O ∈ {0,1}) is Bernoulli (or, equivalently, Binomial(1,p))

posterior probability of giving a relevant doc: Beta(α+O, β+1−O) (Beta is the conjugate prior for the Binomial)
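The conjugate update itself is a one-liner; `o` is the binary relevance observation for the doc just judged:

```python
def beta_update(alpha, beta, o):
    """Beta-Bernoulli conjugate update: observing o in {0, 1} turns
    Beta(alpha, beta) into Beta(alpha + o, beta + 1 - o)."""
    return alpha + o, beta + 1 - o
```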

Allocation methods tested: Bayesian bandits

we iteratively update our estimations using Bayes' rule as judgements come in

two strategies to select the next machine:

Bayesian Learning Automaton (BLA): draws a sample from each posterior distribution and selects the machine yielding the highest sample

MaxMean (MM): selects the machine with the highest expectation of the posterior distribution
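Minimal sketches of the two selection rules, assuming `posteriors` maps each run to its current (α, β) pair:

```python
import random

def bla_select(posteriors):
    """BLA (sketch): draw one sample from each run's Beta posterior and
    select the run yielding the highest sample."""
    return max(posteriors, key=lambda r: random.betavariate(*posteriors[r]))

def maxmean_select(posteriors):
    """MaxMean (sketch): select the run with the highest posterior mean,
    alpha / (alpha + beta)."""
    return max(posteriors, key=lambda r: posteriors[r][0] / sum(posteriors[r]))
```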

test different document adjudication strategies in terms of how quickly they find the relevant docs in the pool

experiments

# of relevant docs found at different numbers of judgements performed
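A small helper illustrating how such curves are measured, assuming `qrels` holds the complete judgements for the pool (hypothetical names):

```python
def relevant_found_curve(judging_order, qrels):
    """After each judgement, count how many relevant docs have been found.
    judging_order: docs in the order a strategy judges them.
    qrels: dict doc_id -> bool over the whole pool."""
    curve, found = [], 0
    for doc in judging_order:
        found += int(qrels.get(doc, False))
        curve.append(found)          # curve[k-1] = # relevant docs after k judgements
    return curve
```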

experiments: data

experiments: baselines

DocId: sorts the pooled docs by doc id (judging order: AP111, AP567, AP881, CR93E, FT967, WSJ13, ...)

experiments: baselines

Rank: rank #1 docs from all runs go first, then rank #2 docs, ... (judging order: WSJ13, FT941, ZF207, WSJ17, CR93E, AP881, ...)
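Illustrative sketches of the two static baselines, assuming `pool` is the set of pooled doc ids and `runs` maps each run to its ranked list of doc ids:

```python
def docid_order(pool):
    """DocId baseline (sketch): judge the pooled docs in doc-id order."""
    return sorted(pool)

def rank_order(runs):
    """Rank baseline (sketch): all rank-1 docs from every run first, then
    all rank-2 docs, and so on, skipping docs already emitted."""
    order, seen = [], set()
    depth = max(len(docs) for docs in runs.values())
    for pos in range(depth):
        for docs in runs.values():
            if pos < len(docs) and docs[pos] not in seen:
                seen.add(docs[pos])
                order.append(docs[pos])
    return order
```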

experiments: baselines

MoveToFront (MTF) (Cormack et al. 98)

starts with uniform priorities for all runs (e.g. max priority = 100)

selects a random run from those with max priority

extracts & judges docs from the selected run (e.g. WSJ13, CR93E, AP111, ...), staying in the run until a non-relevant doc is found

when a non-relevant doc is found, the run's priority is decreased (e.g. 100 → 99) and we jump again to another max-priority run
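A compact sketch of MTF as described above; tie-breaking and the exact priority bookkeeping follow the slide rather than Cormack et al.'s original implementation:

```python
import random

def move_to_front(runs, is_relevant, budget, max_priority=100):
    """MoveToFront adjudication (sketch): judge docs from a max-priority run
    until a non-relevant doc appears, then demote that run and jump."""
    priority = {r: max_priority for r in runs}
    position = {r: 0 for r in runs}             # next unread rank in each run
    judged = {}
    while budget > 0:
        live = [r for r in runs if position[r] < len(runs[r])]
        if not live:
            break
        top = max(priority[r] for r in live)
        run = random.choice([r for r in live if priority[r] == top])
        while budget > 0 and position[run] < len(runs[run]):
            doc = runs[run][position[run]]
            position[run] += 1
            if doc in judged:
                continue                        # already judged via another run
            rel = is_relevant(doc)
            judged[doc] = rel
            budget -= 1
            if not rel:
                priority[run] -= 1              # demote the run and jump elsewhere
                break
    return judged
```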

experiments: baselines

Moffat et al.'s method (A) (Moffat et al 2007)

based on rank-biased precision (RBP): sums a rank-dependent score for each doc over all runs that retrieve it (e.g. 0.20 for rank 1, 0.16 for rank 2, 0.13 for rank 3, ...; WSJ13: 0.20 + 0.16 + 0.20 + ...)

all docs are ranked by decreasing accumulated score, and the ranked list defines the order in which docs are judged
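A sketch of method A's scoring; the RBP persistence p = 0.8 is an assumption chosen because it reproduces the 0.20 / 0.16 / 0.13 contributions shown above:

```python
def rbp_order(runs, p=0.8):
    """Moffat et al.'s method A (sketch): each doc accumulates a
    rank-biased-precision-style contribution (1 - p) * p**(rank - 1) from
    every run retrieving it; docs are judged by decreasing accumulated score."""
    score = {}
    for docs in runs.values():
        for rank, doc in enumerate(docs, start=1):
            score[doc] = score.get(doc, 0.0) + (1 - p) * p ** (rank - 1)
    return sorted(score, key=score.get, reverse=True)
```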

experiments: baselines

Moffat et al.'s method (B) (Moffat et al 2007)

evolution of method A: considers not only the docs' rank-dependent contributions but also the runs' residuals; promotes the selection of docs from runs with many unjudged docs

Moffat et al.'s method (C) (Moffat et al 2007)

evolution of method B: considers the rank-dependent contributions and the residuals, and additionally promotes the selection of docs from effective runs

experiments: baselines

MTF: best performing baseline

experiments: MTF vs bandit-based models

Random: weakest approach

BLA/UCB/ϵn-greedy are suboptimal

(a sophisticated exploration/exploitation trade-off is not needed)

MTF and MM: best performing methods

improved bandit-based models

MTF: forgets quickly about past rewards (a single non-relevant doc triggers a jump)

non-stationary bandit-based solutions:

not all historical rewards count the same

MM-NS and BLA-NS: non-stationary variants of MM and BLA

stationary bandits

prior: Beta(α,β), α,β=1

relevant docs add 1 to α; non-relevant docs add 1 to β

after n iterations: Beta(α_n, β_n), with

α_n = 1 + jrel_s
β_n = 1 + jret_s − jrel_s

jrel_s: # judged relevant docs (retrieved by run s)
jret_s: # judged docs (retrieved by run s)

all judged docs count the same

non-stationary bandits

prior: Beta(α,β), α,β=1

each time a doc d from run s is judged, the counts are updated with a decay rate:

jrel_s = rate * jrel_s + rel_d
jret_s = rate * jret_s + 1

(rel_d is 1 if the judged doc is relevant, 0 otherwise)

after n iterations: Beta(α_n, β_n), with

α_n = 1 + jrel_s
β_n = 1 + jret_s − jrel_s

rate > 1: early relevant docs weigh more
rate < 1: late relevant docs weigh more
rate = 0: only the last judged doc counts (BLA-NS, MM-NS)
rate = 1: stationary version
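A sketch of the non-stationary counter update and the posterior it implies; rate = 1 recovers the stationary bandit above, rate < 1 discounts old judgements:

```python
def ns_update(counts, run, rel, rate=1.0):
    """Update (jrel_s, jret_s) for `run` after judging one of its docs.
    rel is the binary relevance of the judged doc."""
    jrel, jret = counts[run]
    counts[run] = (rate * jrel + (1 if rel else 0), rate * jret + 1)
    return counts

def beta_posterior(counts, run):
    """Beta(alpha_n, beta_n) implied by the current counters:
    alpha_n = 1 + jrel_s, beta_n = 1 + jret_s - jrel_s."""
    jrel, jret = counts[run]
    return 1 + jrel, 1 + jret - jrel
```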

experiments: improved bandit-based models

conclusions

multi-armed bandits: a formal & effective framework for doc adjudication in pooling-based evaluation

it's not good to increasingly reduce exploration (UCB, ϵn-greedy)

it's good to react quickly to non-relevant docs (non-stationary variants)

future work

query-related variabilities

hierarchical bandits

stopping criteria

metasearch

reproduce our experiments & test new ideas! http://tec.citius.usc.es/ir/code/pooling_bandits.html

(our R code, instructions, etc)


Acknowledgements:

MsSaraKelly, picture pg 1 (modified), CC BY 2.0. Sanofi Pasteur, picture pg 2 (modified), CC BY-NC-ND 2.0. pedrik, pictures pgs 3-5, CC BY 2.0. Christa Lohman, picture pg 3 (left), CC BY-NC-ND 2.0. Chris, picture pg 4 (tag cloud), CC BY 2.0. Daniel Horacio Agostini, picture pg 5 (right), CC BY-NC-ND 2.0. ScaarAT, picture pg 14, CC BY-NC-ND 2.0. Sebastien Wiertz, picture pg 15 (modified), CC BY 2.0. Willard, picture pg 16 (modified), CC BY-NC-ND 2.0. Jose Luis Cernadas Iglesias, picture pg 17 (modified), CC BY 2.0. Michelle Bender, picture pg 25 (left), CC BY-NC-ND 2.0. Robert Levy, picture pg 25 (right), CC BY-NC-ND 2.0. Simply Swim UK, picture pg 37, CC BY-SA 2.0. Sarah J. Poe, picture pg 55, CC BY-ND 2.0. Kate Brady, picture pg 58, CC BY 2.0. August Brill, picture pg 59, CC BY 2.0.

This work was supported by the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under research projects TIN2012-33867 and TIN2015-64282-R.
