Beat the Mean Bandit ICML 2011 Yisong Yue Carnegie Mellon University Joint work with Thorsten Joachims (Cornell University)

Page 1: Beat the Mean Bandit

Beat the Mean Bandit

ICML 2011

Yisong Yue Carnegie Mellon University

Joint work with Thorsten Joachims (Cornell University)

Page 2: Beat the Mean Bandit

Optimizing Information Retrieval Systems

• Increasingly reliant on user feedback
  – E.g., clicks on search results

• Online learning is a popular modeling tool
  – Especially partial-information (bandit) settings

• Our focus: learning from relative preferences
  – Motivated by recent work on interleaved retrieval evaluation (example following)

Page 3: Beat the Mean Bandit

Team Draft Interleaving (Comparison Oracle for Search)

Ranking A
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Valley Wineries - Plan your wine...
   www.napavalley.com/wineries
3. Napa Valley College
   www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley
   www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine
   www.napavintners.com
6. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley

Ranking B
1. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging...
   www.napavalley.com
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
5. NapaValley.org
   www.napavalley.org
6. The Napa Valley Marathon
   www.napavalleymarathon.org

Presented Ranking
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine...
   www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
6. Napa Valley College
   www.napavalley.edu/homex.asp
7. NapaValley.org
   www.napavalley.org

[Radlinski et al. 2008]

Page 4: Beat the Mean Bandit

Team Draft Interleaving (Comparison Oracle for Search)

Ranking A, Ranking B, and the Presented Ranking are the same as on the previous slide. Two results in the Presented Ranking are marked "Click".

B wins!

[Radlinski et al. 2008]
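The slides show only an example of team-draft interleaving, so the following is a minimal sketch of the usual description of the method rather than the authors' code. It assumes that the team with fewer picks chooses next, that ties are broken by a coin flip, and that each click is credited to the team that contributed the clicked result. The function names `team_draft_interleave` and `winner` are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    """Return (presented, team); team[i] is 'A' or 'B' for presented[i]."""
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    presented, team = [], []
    while len(presented) < length:
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] != picks["B"]:
            side = "A" if picks["A"] < picks["B"] else "B"
        else:
            side = rng.choice(["A", "B"])
        # That team contributes its highest-ranked result not yet shown.
        candidate = next((d for d in rankings[side] if d not in presented), None)
        if candidate is None:
            # This team has nothing new left; let the other team pick instead.
            side = "B" if side == "A" else "A"
            candidate = next((d for d in rankings[side] if d not in presented), None)
            if candidate is None:
                break  # both rankings are exhausted
        presented.append(candidate)
        team.append(side)
        picks[side] += 1
    return presented, team


def winner(team, clicked_positions):
    """Credit each click to the contributing team and compare the totals."""
    credit = {"A": 0, "B": 0}
    for position in clicked_positions:
        credit[team[position]] += 1
    if credit["A"] == credit["B"]:
        return "tie"
    return "A" if credit["A"] > credit["B"] else "B"
```

For instance, if both clicked results were contributed by ranking B, `winner` returns "B", which is the outcome the slide labels "B wins!".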

Page 5: Beat the Mean Bandit

Interleave A vs B

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            0
B wins vs…    0   0   0   0            1
C wins vs…    0   0   0   0            0

Page 6: Beat the Mean Bandit

Interleave A vs C

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            1
B wins vs…    0   0   0   0            1
C wins vs…    1   0   0   1            0

Page 7: Beat the Mean Bandit

Interleave B vs C

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            1
B wins vs…    0   1   0   1            1
C wins vs…    1   0   0   1            1

Page 8: Beat the Mean Bandit

Interleave A vs B

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            2
B wins vs…    0   2   0   2            1
C wins vs…    1   0   0   1            1

Page 9: Beat the Mean Bandit

Outline

• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]

• Modeling transitivity violation
  – E.g., (A >> B) AND (B >> C) IMPLIES (A >> C) ??
  – Not done in previous work

Page 10: Beat the Mean Bandit

Outline

• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]

• Modeling transitivity violation
  – E.g., (A >> B) AND (B >> C) IMPLIES (A >> C) ??
  – Not done in previous work

• Algorithm: Beat-the-Mean

• Empirical Validation

Page 11: Beat the Mean Bandit

Dueling Bandits Problem

• Given K bandits b1, …, bK

• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions

[Yue et al. 2009]

Page 12: Beat the Mean Bandit

Dueling Bandits Problem

• Given K bandits b1, …, bK

• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions

• Cost function (regret):

\[ R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right] \]

• (b_t, b_t') are the two bandits chosen
• b* is the overall best one
• (% users who prefer best bandit over chosen ones)

[Yue et al. 2009]

Page 13: Beat the Mean Bandit

Example Pairwise Preferences

       A       B       C       D       E       F
A      0       0.05    0.05    0.04    0.11    0.11
B     -0.05    0       0.05    0.06    0.08    0.10
C     -0.05   -0.05    0       0.04    0.01    0.06
D     -0.04   -0.04   -0.04    0       0.04    0.00
E     -0.11   -0.08   -0.01   -0.04    0       0.01
F     -0.11   -0.10   -0.06   -0.00   -0.01    0

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Page 14: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on the previous slide.)

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Compare E & F:
• P(A > E) = 0.61
• P(A > F) = 0.61
• Incurred Regret = 0.22

\[ R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right] \]

Page 15: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Compare B & C:
• P(A > B) = 0.55
• P(A > C) = 0.55
• Incurred Regret = 0.10

\[ R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right] \]

Page 16: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Compare A & A:
• P(A > A) = 0.50
• P(A > A) = 0.50
• Incurred Regret = 0.00

\[ R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right] \]

Interleaving shows ranking produced by A.
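The three worked comparisons can be checked numerically. The sketch below copies the example preference matrix and evaluates the per-duel regret term P(b* > b_t) + P(b* > b_t') − 1 with A as the best bandit; the names `EPS`, `p_prefers`, `duel_regret`, and `total_regret` are illustrative and do not come from the slides.

```python
BANDITS = "ABCDEF"

# EPS[i][j] = Pr(row i > col j) - 0.5, copied from the example matrix.
EPS = [
    [0.00, 0.05, 0.05, 0.04, 0.11, 0.11],
    [-0.05, 0.00, 0.05, 0.06, 0.08, 0.10],
    [-0.05, -0.05, 0.00, 0.04, 0.01, 0.06],
    [-0.04, -0.04, -0.04, 0.00, 0.04, 0.00],
    [-0.11, -0.08, -0.01, -0.04, 0.00, 0.01],
    [-0.11, -0.10, -0.06, -0.00, -0.01, 0.00],
]


def p_prefers(i, j):
    """Probability that bandit i is preferred over bandit j."""
    return 0.5 + EPS[BANDITS.index(i)][BANDITS.index(j)]


def duel_regret(best, first, second):
    """Regret of one duel: P(best > first) + P(best > second) - 1."""
    return p_prefers(best, first) + p_prefers(best, second) - 1.0


def total_regret(best, duels):
    """Sum of per-duel regret over a sequence of (first, second) pairs."""
    return sum(duel_regret(best, a, b) for a, b in duels)


for pair in [("E", "F"), ("B", "C"), ("A", "A")]:
    print(pair, round(duel_regret("A", *pair), 2))
```

Dueling the best bandit against itself is the only choice with zero regret, which is why the regret definition rewards converging on A.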

Page 17: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

Violation in internal consistency!
For strong stochastic transitivity:
• A > D should be at least 0.06

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Page 18: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

Violation in internal consistency!
For strong stochastic transitivity:
• C > E should be at least 0.04

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Page 19: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

Violation in internal consistency!
For strong stochastic transitivity:
• D > F should be at least 0.04

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Page 20: Beat the Mean Bandit

Modeling Assumptions

• P(bi > bj) = ½ + εij

• Let b1 be the best overall bandit

• Relaxed Stochastic Transitivity
  – For three bandits b1 > bj > bk: \( \gamma \, \epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\} \)
  – γ ≥ 1 (γ = 1 for strong transitivity **)
  – Relaxed internal consistency property

• Stochastic Triangle Inequality
  – For three bandits b1 > bj > bk: \( \epsilon_{1k} \le \epsilon_{1j} + \epsilon_{jk} \)
  – Diminishing returns property

(** γ = 1 required in previous work, and required to apply for all bandit triplets)

Page 21: Beat the Mean Bandit

Example Pairwise Preferences

(Same preference matrix as on page 13.)

γ = 1.5

\[ \gamma \, \epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\} \]

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org
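The value γ = 1.5 can be reproduced from the example matrix. The sketch below assumes the bandits are ordered A > B > ... > F with A as b1, and searches for the smallest γ ≥ 1 such that γ·ε_1k ≥ max(ε_1j, ε_jk) for every pair j, k ranked below A with j ahead of k. The function name `smallest_gamma` is hypothetical.

```python
# EPS[i][j] = Pr(row i > col j) - 0.5, copied from the example matrix (A..F).
EPS = [
    [0.00, 0.05, 0.05, 0.04, 0.11, 0.11],
    [-0.05, 0.00, 0.05, 0.06, 0.08, 0.10],
    [-0.05, -0.05, 0.00, 0.04, 0.01, 0.06],
    [-0.04, -0.04, -0.04, 0.00, 0.04, 0.00],
    [-0.11, -0.08, -0.01, -0.04, 0.00, 0.01],
    [-0.11, -0.10, -0.06, -0.00, -0.01, 0.00],
]


def smallest_gamma(eps, order):
    """Smallest gamma >= 1 satisfying relaxed stochastic transitivity.

    order lists bandit indices from best to worst; order[0] plays the role of b1.
    """
    best = order[0]
    gamma = 1.0
    for pos_j in range(1, len(order)):
        for pos_k in range(pos_j + 1, len(order)):
            j, k = order[pos_j], order[pos_k]
            needed = max(eps[best][j], eps[j][k])
            if eps[best][k] > 0:
                gamma = max(gamma, needed / eps[best][k])
    return gamma


print(round(smallest_gamma(EPS, list(range(6))), 2))
```

The binding triplet is (A, B, D): ε_AB = 0.05 and ε_BD = 0.06, while ε_AD is only 0.04, so γ must be at least 0.06 / 0.04 = 1.5.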

Page 22: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
B    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
D    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
E    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 23: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide: no comparisons recorded yet.)

Annotation: Comparison Results

Page 24: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide: no comparisons recorded yet.)

Annotation: Mean Score & Confidence Interval

Page 25: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide: no comparisons recorded yet.)

Annotation: A’s performance vs rest

Page 26: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide: no comparisons recorded yet.)

Annotation: A’s mean performance

Page 27: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
D    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
E    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 28: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/1     0/0     0.00   1       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
D    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
E    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 29: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/1     0/0     0.00   1       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     1/1     1.00   1       0.00          1.00
D    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
E    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 30: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/1     0/0     0.00   1       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     1/1     1.00   1       0.00          1.00
D    0/0     0/0     0/1     0/0     0/0     0/0     0.00   1       0.00          1.00
E    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 31: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/1     0/0     0.00   1       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     1/1     1.00   1       0.00          1.00
D    0/0     0/0     0/1     0/0     0/0     0/0     0.00   1       0.00          1.00
E    0/1     0/0     0/0     0/0     0/0     0/0     0.00   1       0.00          1.00
F    0/0     0/0     0/0     0/0     0/0     0/0     --     0       0.00          1.00

Page 32: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    0/0     1/1     0/0     0/0     0/0     0/0     1.00   1       0.00          1.00
B    0/0     0/0     0/0     0/0     0/1     0/0     0.00   1       0.00          1.00
C    0/0     0/0     0/0     0/0     0/0     1/1     1.00   1       0.00          1.00
D    0/0     0/0     0/1     0/0     0/0     0/0     0.00   1       0.00          1.00
E    0/1     0/0     0/0     0/0     0/0     0/0     0.00   1       0.00          1.00
F    0/0     0/0     0/1     0/0     0/0     0/0     0.00   1       0.00          1.00

Page 33: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    13/25   16/24   11/22   16/28   20/30   13/21   0.59   150     0.49          0.69
B    14/30   15/30   13/19   15/20   17/26   20/25   0.63   150     0.53          0.73
C    12/28   10/22   13/23   15/28   20/24   13/25   0.55   150     0.45          0.65
D    9/20    15/28   10/21   11/23   15/28   15/30   0.50   150     0.40          0.60
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    11/29   4/25    10/18   12/25   14/30   13/23   0.43   150     0.33          0.53

Page 34: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide.)

B dominates E! (B’s lower bound greater than E’s upper bound)

Page 35: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    13/25   16/24   11/22   16/28   20/30   13/21   0.58   120     0.49          0.67
B    14/30   15/30   13/19   15/20   15/26   20/25   0.62   124     0.51          0.73
C    12/28   10/22   13/23   15/28   20/24   13/25   0.50   126     0.39          0.61
D    9/20    15/28   10/21   11/23   15/28   15/30   0.49   122     0.38          0.60
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    11/29   4/25    10/18   12/25   14/30   13/23   0.42   120     0.31          0.53
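The totals on this slide (120, 124, 126, 122, 120) are what one gets by dropping every comparison against the removed bandit E from the 150-comparison table. The sketch below checks that reading by recomputing the mean scores from the wins/total counts; the names `COUNTS`, `mean_score`, and `dominates` are hypothetical, and the confidence bounds are taken from the slide rather than recomputed.

```python
# COUNTS[row][col] = (wins, total) of row against col, from the table
# in which every bandit has 150 comparisons.
COUNTS = {
    "A": {"A": (13, 25), "B": (16, 24), "C": (11, 22), "D": (16, 28), "E": (20, 30), "F": (13, 21)},
    "B": {"A": (14, 30), "B": (15, 30), "C": (13, 19), "D": (15, 20), "E": (17, 26), "F": (20, 25)},
    "C": {"A": (12, 28), "B": (10, 22), "C": (13, 23), "D": (15, 28), "E": (20, 24), "F": (13, 25)},
    "D": {"A": (9, 20), "B": (15, 28), "C": (10, 21), "D": (11, 23), "E": (15, 28), "F": (15, 30)},
    "E": {"A": (8, 24), "B": (11, 25), "C": (6, 22), "D": (14, 29), "E": (14, 31), "F": (10, 19)},
    "F": {"A": (11, 29), "B": (4, 25), "C": (10, 18), "D": (12, 25), "E": (14, 30), "F": (13, 23)},
}


def mean_score(row, active):
    """Win rate and comparison count of `row`, using only active opponents."""
    wins = sum(COUNTS[row][col][0] for col in active)
    total = sum(COUNTS[row][col][1] for col in active)
    return wins / total, total


def dominates(lower_bound, upper_bound):
    """A bandit is dominated when another's lower bound exceeds its upper bound."""
    return lower_bound > upper_bound


print(dominates(0.53, 0.52))  # B's lower bound vs E's upper bound, from the slide
for bandit in "ABCDF":
    score, n = mean_score(bandit, "ABCDF")  # E has been removed
    print(bandit, round(score, 3), n)
```

Discarding the removed bandit's column keeps the remaining scores comparable, since every active bandit is then measured against the same set of opponents.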

Page 36: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    13/25   17/25   11/22   16/28   20/30   13/21   0.58   121     0.49          0.67
B    14/30   15/30   13/19   15/20   15/26   20/25   0.62   124     0.51          0.73
C    12/28   10/22   13/23   15/28   20/24   13/25   0.50   126     0.39          0.61
D    9/20    15/28   10/21   11/23   15/28   15/30   0.49   122     0.38          0.60
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    11/29   4/25    10/18   12/25   14/30   13/23   0.42   120     0.31          0.53

Page 37: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    15/30   19/29   14/28   18/33   23/30   15/25   0.56   145     0.46          0.66
B    15/33   17/34   15/24   20/27   15/26   23/27   0.62   145     0.52          0.72
C    13/31   11/28   14/29   15/30   20/24   16/27   0.48   145     0.38          0.68
D    11/26   17/31   12/26   14/29   15/28   17/33   0.49   145     0.39          0.59
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    12/32   7/30    13/26   13/28   14/30   15/29   0.41   145     0.31          0.51

Page 38: Beat the Mean Bandit

Beat-the-Mean

(Table unchanged from the previous slide.)

B dominates F! (B’s lower bound greater than F’s upper bound)

Page 39: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    15/30   19/29   14/28   18/33   23/30   15/25   0.55   120     0.43          0.67
B    15/33   17/34   15/24   20/27   15/26   23/27   0.56   118     0.44          0.68
C    13/31   11/28   14/29   15/30   20/24   16/27   0.45   118     0.33          0.57
D    11/26   17/31   12/26   14/29   15/28   17/33   0.48   112     0.36          0.60
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    12/32   7/30    13/26   13/28   14/30   15/29   0.41   145     0.31          0.51

Page 40: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    41/80   44/75   38/70   42/75   23/30   15/25   0.55   300     0.48          0.62
B    31/69   38/78   47/78   51/75   15/26   23/27   0.56   300     0.49          0.63
C    33/77   31/77   35/70   39/76   20/24   16/27   0.46   300     0.49          0.53
D    30/76   27/77   35/74   35/73   15/28   17/33   0.42   300     0.35          0.49
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    12/32   7/30    13/26   13/28   14/30   15/29   0.41   145     0.31          0.51

B dominates D! (B’s lower bound greater than D’s upper bound)

Page 41: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    41/80   44/75   38/70   42/75   23/30   15/25   0.55   225     0.46          0.64
B    31/69   38/78   47/78   51/75   15/26   23/27   0.52   225     0.43          0.61
C    33/77   31/77   35/70   39/76   20/24   16/27   0.33   225     0.24          0.42
D    30/76   27/77   35/74   35/73   15/28   17/33   0.42   300     0.35          0.49
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    12/32   7/30    13/26   13/28   14/30   15/29   0.41   145     0.31          0.51

A dominates C! (A’s lower bound greater than C’s upper bound)

Page 42: Beat the Mean Bandit

Beat-the-Mean (each cell is wins/total for the row bandit against the column bandit)

     A       B       C       D       E       F       Mean   Total   Lower Bound   Upper Bound
A    41/80   44/75   38/70   42/75   23/30   15/25   0.51   80      0.38          0.64
B    31/69   38/78   47/78   51/75   15/26   23/27   0.52   147     0.45          0.49
C    33/77   31/77   35/70   39/76   20/24   16/27   0.33   225     0.24          0.42
D    30/76   27/77   35/74   35/73   15/28   17/33   0.42   300     0.35          0.49
E    8/24    11/25   6/22    14/29   14/31   10/19   0.42   150     0.32          0.52
F    12/32   7/30    13/26   13/28   14/30   15/29   0.41   145     0.31          0.51

Eventually… A is the last bandit remaining. A is declared best bandit!
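The walkthrough above can be condensed into a short simulation. This is a sketch under stated assumptions, not the authors' implementation: the opponent is drawn uniformly from the active set (which may include the bandit itself), the bandit with the fewest comparisons plays next, and the confidence radius is a generic Hoeffding-style term scaled by the hypothetical parameter `radius_scale` rather than the constant used in the paper.

```python
import math
import random


def beat_the_mean(pref, horizon, radius_scale=1.0, seed=0):
    """Return the last remaining bandit, or the empirical leader at the horizon.

    pref[i][j] is the simulated probability that bandit i beats bandit j.
    """
    rng = random.Random(seed)
    k = len(pref)
    active = list(range(k))
    wins = [[0] * k for _ in range(k)]   # wins[i][j]: wins of i against j
    plays = [[0] * k for _ in range(k)]  # plays[i][j]: comparisons of i against j

    def score(i):
        # Mean and count use only comparisons against still-active bandits.
        n = sum(plays[i][j] for j in active)
        w = sum(wins[i][j] for j in active)
        return (w / n if n else 0.5), n

    def radius(n):
        # ASSUMPTION: Hoeffding-style radius, not the constant from the paper.
        return radius_scale * math.sqrt(math.log(horizon) / n) if n else 1.0

    for _ in range(horizon):
        if len(active) == 1:
            break
        b = min(active, key=lambda i: score(i)[1])  # fewest comparisons so far
        opponent = rng.choice(active)               # a draw from the "mean bandit"
        plays[b][opponent] += 1
        if rng.random() < pref[b][opponent]:
            wins[b][opponent] += 1
        stats = {i: score(i) for i in active}
        worst = min(active, key=lambda i: stats[i][0])
        upper_worst = stats[worst][0] + radius(stats[worst][1])
        best_lower = max(m - radius(n) for m, n in stats.values())
        if upper_worst < best_lower:
            active.remove(worst)                    # dominated: drop it
    return max(active, key=lambda i: score(i)[0])


pref = [[0.5, 0.9, 0.9], [0.1, 0.5, 0.9], [0.1, 0.1, 0.5]]
print(beat_the_mean(pref, horizon=20000))
```

Because `score` only sums over active opponents, removing a bandit automatically discards the comparisons against it, mirroring how the totals shrink in the tables after E, F, D, and C are eliminated.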

Page 43: Beat the Mean Bandit

Regret Guarantee

• Playing against mean bandit calibrates preference scores
  – Estimates of (active) bandits directly comparable
  – One estimate per active bandit = linear number of estimates

Page 44: Beat the Mean Bandit

Regret Guarantee

• Playing against mean bandit calibrates preference scores
  – Estimates of (active) bandits directly comparable
  – One estimate per active bandit = linear number of estimates

• We can bound comparisons needed to remove worst bandit
  – Varies smoothly with transitivity parameter γ
  – High probability bound

• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ

Page 45: Beat the Mean Bandit

Regret Guarantee

• Playing against mean bandit calibrates preference scores
  – Estimates of (active) bandits directly comparable
  – One estimate per active bandit = linear number of estimates

• We can bound comparisons needed to remove worst bandit
  – Varies smoothly with transitivity parameter γ
  – High probability bound

• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ

• Thus, we can bound the total regret with high probability:

\[ R_T = O\left( \gamma^7 K \log T \right) \]

  – γ is typically close to 1

We also have a similar PAC guarantee.

Page 46: Beat the Mean Bandit

Regret Guarantee

• Playing against mean bandit calibrates preference scores
  – Estimates of (active) bandits directly comparable
  – One estimate per active bandit = linear number of estimates

• We can bound comparisons needed to remove worst bandit
  – Varies smoothly with transitivity parameter γ
  – High probability bound

• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ

• Thus, we can bound the total regret with high probability:

\[ R_T = O\left( \gamma^7 K \log T \right) \]

  – γ is typically close to 1

We also have a similar PAC guarantee.

Not possible with previous approaches!

Page 47: Beat the Mean Bandit

• Simulation experiment where γ = 1.3
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]

• Beat-the-Mean maintains linear regret guarantee
• Interleaved Filter suffers quadratic regret in the worst case

Page 48: Beat the Mean Bandit

• Simulation experiment where γ = 1 (original DB setting)
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]

• Beat-the-Mean has high probability bound
• Beat-the-Mean exhibits significantly lower variance

Page 49: Beat the Mean Bandit

Conclusions

• Online learning approach using pairwise feedback
  – Well-suited for optimizing information retrieval systems from user feedback
  – Models violations in preference transitivity

• Algorithm: Beat-the-Mean
  – Regret linear in #bandits and logarithmic in #iterations
  – Degrades smoothly with transitivity violation
  – Stronger guarantees than previous work
  – Empirically supported