Beat the Mean Bandit
ICML 2011
Yisong Yue Carnegie Mellon University
Joint work with Thorsten Joachims (Cornell University)
Optimizing Information Retrieval Systems
• Increasingly reliant on user feedback
  – E.g., clicks on search results
• Online learning is a popular modeling tool
  – Especially partial-information (bandit) settings
• Our focus: learning from relative preferences
  – Motivated by recent work on interleaved retrieval evaluation (example following)
Team Draft Interleaving (Comparison Oracle for Search)
Ranking A
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
Ranking B
1. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley – The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)
Presented Ranking (interleaved from A and B)
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
6. Napa Valley College (www.napavalley.edu/homex.asp)
7. NapaValley.org (www.napavalley.org)
[Radlinski et al. 2008]
The user clicks two of the presented results. Each click is credited to the team (A or B) that contributed the clicked result; B collects more clicks, so B wins!
[Radlinski et al. 2008]
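The interleaving step above can be sketched in Python. This is a minimal sketch assuming the Team Draft procedure of Radlinski et al. (2008): the team with fewer picks adds its highest-ranked result not yet shown, ties are broken by a coin flip, and each click is credited to the team that contributed the clicked result. The function names, the short document IDs (hypothetical stand-ins for the Napa Valley results) and the clicks are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence, Set, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    length: int,
    coin_favors_a: Optional[Callable[[], bool]] = None,
) -> Tuple[List[str], Set[str], Set[str]]:
    """Return the presented list plus the documents credited to each team."""
    coin = coin_favors_a or (lambda: random.random() < 0.5)
    shown: List[str] = []
    team_a: Set[str] = set()
    team_b: Set[str] = set()

    def next_unshown(ranking: Sequence[str]) -> Optional[str]:
        # Highest-ranked document from this ranking that is not yet presented.
        return next((d for d in ranking if d not in shown), None)

    while len(shown) < length:
        doc_a, doc_b = next_unshown(ranking_a), next_unshown(ranking_b)
        if doc_a is None or doc_b is None:
            break  # one ranking is exhausted
        # The team with fewer picks goes next; a tie is broken by the coin.
        a_picks = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and coin()
        )
        if a_picks:
            shown.append(doc_a)
            team_a.add(doc_a)
        else:
            shown.append(doc_b)
            team_b.add(doc_b)
    return shown, team_a, team_b


def credit_clicks(clicks: Sequence[str], team_a: Set[str], team_b: Set[str]) -> str:
    """Each click counts for the team that contributed the clicked document."""
    a = sum(doc in team_a for doc in clicks)
    b = sum(doc in team_b for doc in clicks)
    return "A" if a > b else "B" if b > a else "tie"


# Hypothetical short IDs for the two rankings above.
RANKING_A = ["napavalley.com", "wineries", "college", "ivebeenthere", "napavintners", "wikipedia"]
RANKING_B = ["wikipedia", "napavalley.com", "story", "hotels", "napavalley.org", "marathon"]

_flips = iter([True, False, False, False])  # scripted coin: A, B, B, B on tied turns
presented, team_a, team_b = team_draft_interleave(RANKING_A, RANKING_B, 7, lambda: next(_flips))
winner = credit_clicks(["wikipedia", "story"], team_a, team_b)  # hypothetical clicks
print(presented, winner)
```

With the scripted coin flips the sketch reproduces the presented ranking shown above; in practice the coin is random, so each impression gets a different interleaving.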
Team Draft Interleaving (Comparison Oracle for Search)
Each interleaving experiment yields one pairwise win, tallied in a table (row = winner, column = loser):
Interleave A vs B (A wins):
              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            0
B wins vs…    0   0   0   0            1
C wins vs…    0   0   0   0            0
Interleave A vs C (C wins):
              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            1
B wins vs…    0   0   0   0            1
C wins vs…    1   0   0   1            0
Interleave B vs C (B wins):
              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            1
B wins vs…    0   0   1   1            1
C wins vs…    1   0   0   1            1
Interleave A vs B (B wins):
              A   B   C   Total wins   Total losses
A wins vs…    0   1   0   1            2
B wins vs…    1   0   1   2            1
C wins vs…    1   0   0   1            1
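The bookkeeping is simple enough to sketch: each comparison adds one to the winner's row at the loser's column, row sums give total wins, and column sums give total losses. A minimal Python sketch with the four outcomes read off the tallies above; `tally` and `totals` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

BANDITS = ["A", "B", "C"]


def tally(outcomes: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Build wins[row][col] = number of times `row` beat `col`."""
    wins = {r: {c: 0 for c in BANDITS} for r in BANDITS}
    for winner, loser in outcomes:
        wins[winner][loser] += 1
    return wins


def totals(wins: Dict[str, Dict[str, int]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Row sums are total wins; column sums are total losses."""
    total_wins = {r: sum(wins[r].values()) for r in BANDITS}
    total_losses = {c: sum(wins[r][c] for r in BANDITS) for c in BANDITS}
    return total_wins, total_losses


# (winner, loser) for the four interleaving experiments, in order.
table = tally([("A", "B"), ("C", "A"), ("B", "C"), ("B", "A")])
total_wins, total_losses = totals(table)
for r in BANDITS:
    row = [table[r][c] for c in BANDITS]
    print(f"{r} wins vs…", row, total_wins[r], total_losses[r])
```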
Outline
• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]
• Modeling transitivity violation
  – E.g., do (A >> B) AND (B >> C) imply (A >> C)?
  – Not done in previous work
• Algorithm: Beat-the-Mean
• Empirical Validation
Dueling Bandits Problem
• Given K bandits b1, …, bK
• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions
[Yue et al. 2009]
• Cost function (regret):
  $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$
  – $(b_t, b_t')$ are the two bandits chosen at iteration t
  – $b^*$ is the overall best bandit
  – Regret reflects the % of users who prefer the best bandit over the chosen ones
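To make the regret definition concrete, here is a minimal Python sketch of the dueling-bandits interaction loop. It assumes a known preference matrix `prefs[i][j]` = P(b_i > b_j) (hypothetical; in practice it is unknown to the learner) and an arbitrary pair-selection policy `choose_pair`. It also shows that regret depends only on which pair is played, not on who wins the duel.

```python
import random
from typing import Callable, List, Sequence, Tuple


def run_duels(
    prefs: Sequence[Sequence[float]],
    best: int,
    choose_pair: Callable[[int], Tuple[int, int]],
    horizon: int,
    rng: random.Random,
) -> Tuple[float, List[bool]]:
    """prefs[i][j] = P(b_i > b_j). Returns (R_T, duel outcomes)."""
    regret = 0.0
    outcomes: List[bool] = []
    for t in range(horizon):
        i, j = choose_pair(t)
        # Noisy comparison oracle: b_i wins with probability P(b_i > b_j).
        outcomes.append(rng.random() < prefs[i][j])
        # Per-iteration regret P(b* > b_t) + P(b* > b_t') - 1 ignores the outcome.
        regret += prefs[best][i] + prefs[best][j] - 1.0
    return regret, outcomes


# Hypothetical 3-bandit instance in which bandit 0 is b*.
PREFS = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.3, 0.4, 0.5]]
rng = random.Random(0)
print(run_duels(PREFS, 0, lambda t: (0, 0), 10, rng)[0])  # always playing b* vs b*
print(run_duels(PREFS, 0, lambda t: (1, 2), 10, rng)[0])  # 10 * (0.6 + 0.7 - 1)
```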
Example Pairwise Preferences
      A      B      C      D      E      F
A     0      0.05   0.05   0.04   0.11   0.11
B     -0.05  0      0.05   0.06   0.08   0.10
C     -0.05  -0.05  0      0.04   0.01   0.06
D     -0.04  -0.04  -0.04  0      0.04   0.00
E     -0.11  -0.08  -0.01  -0.04  0      0.01
F     -0.11  -0.10  -0.06  -0.00  -0.01  0
• Values are Pr(row > col) - 0.5
• Derived from interleaving experiments on http://arXiv.org
Compare E & F:
• P(A > E) = 0.61
• P(A > F) = 0.61
• Incurred regret = 0.61 + 0.61 - 1 = 0.22
Compare B & C:
• P(A > B) = 0.55
• P(A > C) = 0.55
• Incurred regret = 0.55 + 0.55 - 1 = 0.10
Compare A & A:
• P(A > A) = 0.50
• P(A > A) = 0.50
• Incurred regret = 0.50 + 0.50 - 1 = 0.00
Interleaving A with itself simply shows the ranking produced by A.
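The three worked examples can be checked numerically. A short Python sketch, with the matrix values copied from the example table and A taken as b*; `pref` and `step_regret` are illustrative helper names.

```python
NAMES = "ABCDEF"
# EPS[i][j] = Pr(row i beats column j) - 0.5, copied from the example matrix.
EPS = [
    [0.00, 0.05, 0.05, 0.04, 0.11, 0.11],
    [-0.05, 0.00, 0.05, 0.06, 0.08, 0.10],
    [-0.05, -0.05, 0.00, 0.04, 0.01, 0.06],
    [-0.04, -0.04, -0.04, 0.00, 0.04, 0.00],
    [-0.11, -0.08, -0.01, -0.04, 0.00, 0.01],
    [-0.11, -0.10, -0.06, -0.00, -0.01, 0.00],
]


def pref(i: str, j: str) -> float:
    """P(i > j) = 1/2 + eps_ij."""
    return 0.5 + EPS[NAMES.index(i)][NAMES.index(j)]


def step_regret(i: str, j: str, best: str = "A") -> float:
    """Regret of comparing i and j: P(b* > i) + P(b* > j) - 1."""
    return pref(best, i) + pref(best, j) - 1.0


for pair in [("E", "F"), ("B", "C"), ("A", "A")]:
    print(pair, round(step_regret(*pair), 2))

# The cheapest possible comparison: the best bandit against itself.
cheapest = min((step_regret(i, j), i, j) for i in NAMES for j in NAMES)
print(cheapest)
```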
Violation in internal consistency! For strong stochastic transitivity:
• A > D should be at least 0.06, since A > B = 0.05 and B > D = 0.06; the observed value is only 0.04
Violation in internal consistency! For strong stochastic transitivity:
• C > E should be at least 0.04, since C > D = 0.04 and D > E = 0.04; the observed value is only 0.01
Violation in internal consistency! For strong stochastic transitivity:
• D > F should be at least 0.04, since D > E = 0.04 and E > F = 0.01; the observed value is only 0.00
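The consistency check can be automated. A Python sketch, assuming the bandits are ordered A > B > … > F as in the table: for every pair (i, k) it computes the smallest value strong stochastic transitivity allows, the maximum over intermediate j of max(ε_ij, ε_jk), and lists the pairs that fall short. Helper names are illustrative.

```python
from typing import List, Optional, Tuple

NAMES = "ABCDEF"
# EPS[i][j] = Pr(row i beats column j) - 0.5, copied from the example matrix.
EPS = [
    [0.00, 0.05, 0.05, 0.04, 0.11, 0.11],
    [-0.05, 0.00, 0.05, 0.06, 0.08, 0.10],
    [-0.05, -0.05, 0.00, 0.04, 0.01, 0.06],
    [-0.04, -0.04, -0.04, 0.00, 0.04, 0.00],
    [-0.11, -0.08, -0.01, -0.04, 0.00, 0.01],
    [-0.11, -0.10, -0.06, -0.00, -0.01, 0.00],
]


def sst_required(i: int, k: int) -> Optional[float]:
    """Smallest eps_ik allowed by strong stochastic transitivity for i > j > k."""
    return max((max(EPS[i][j], EPS[j][k]) for j in range(i + 1, k)), default=None)


def sst_violations(tol: float = 1e-9) -> List[Tuple[str, str, float, float]]:
    found = []
    for i in range(len(NAMES)):
        for k in range(i + 2, len(NAMES)):  # at least one bandit j lies between
            required = sst_required(i, k)
            if EPS[i][k] < required - tol:
                found.append((NAMES[i], NAMES[k], required, EPS[i][k]))
    return found


for a, b, required, actual in sst_violations():
    print(f"{a} > {b}: should be at least {required:.2f}, observed {actual:.2f}")
```

On this matrix the check reports exactly the three violations highlighted above.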
Modeling Assumptions
• $P(b_i > b_j) = 1/2 + \epsilon_{ij}$
• Let $b_1$ be the best overall bandit
• Relaxed Stochastic Transitivity
  – For three bandits $b_1 > b_j > b_k$: $\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$
  – γ ≥ 1 (γ = 1 for strong transitivity **)
  – Relaxed internal consistency property
• Stochastic Triangle Inequality
  – For three bandits $b_1 > b_j > b_k$: $\epsilon_{1k} \le \epsilon_{1j} + \epsilon_{jk}$
  – Diminishing returns property
(** γ = 1 required in previous work, and required to apply for all bandit triplets)
Example Pairwise Preferences: γ = 1.5
• On the example matrix (with $b_1$ = A), relaxed stochastic transitivity $\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$ holds with γ = 1.5
• The tightest triplet is A > B > D: max(0.05, 0.06) = 0.06 = 1.5 × 0.04
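The value γ = 1.5 can be recovered from the matrix. Under the relaxed assumption only triplets headed by b_1 = A are checked, so the smallest admissible γ is the largest ratio max(ε_1j, ε_jk) / ε_1k over A > b_j > b_k, floored at 1. A minimal Python sketch, assuming A beats every other bandit with ε > 0 (true here); `relaxed_gamma` is an illustrative name.

```python
from typing import List

# EPS[i][j] = Pr(row i beats column j) - 0.5 for A..F, copied from the example matrix.
EPS = [
    [0.00, 0.05, 0.05, 0.04, 0.11, 0.11],
    [-0.05, 0.00, 0.05, 0.06, 0.08, 0.10],
    [-0.05, -0.05, 0.00, 0.04, 0.01, 0.06],
    [-0.04, -0.04, -0.04, 0.00, 0.04, 0.00],
    [-0.11, -0.08, -0.01, -0.04, 0.00, 0.01],
    [-0.11, -0.10, -0.06, -0.00, -0.01, 0.00],
]


def relaxed_gamma(eps: List[List[float]]) -> float:
    """Smallest gamma >= 1 with gamma * eps[0][k] >= max(eps[0][j], eps[j][k])
    for every triplet b_1 > b_j > b_k (index 0 is the best bandit)."""
    ratios = [
        max(eps[0][j], eps[j][k]) / eps[0][k]
        for k in range(2, len(eps))
        for j in range(1, k)
    ]
    return max([1.0] + ratios)


print(round(relaxed_gamma(EPS), 2))
```

Triplets such as C > D > E, which violate strong transitivity, do not constrain γ here because they do not involve A.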
Beat-the-Mean
Each cell shows wins/total for the row bandit's comparisons against the column bandit; Mean is the row's overall win rate over Total comparisons, and Lower/Upper bound its confidence interval.
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
B     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
C     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
D     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
E     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
F     0/0    0/0    0/0    0/0    0/0    0/0    --     0      0.00   1.00
• Comparison results: the wins/total cells (e.g., row A is A's performance vs. the rest)
• Mean score & confidence interval: the Mean, Lower and Upper columns (e.g., A's mean performance)
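Each row's Mean and interval can be computed from its wins and totals. A minimal Python sketch, assuming a Hoeffding-style radius sqrt(ln(1/δ) / (2n)) with δ = 0.05 as a stand-in (the radius used by Beat-the-Mean itself is not reproduced here), clipped to [0, 1] so that a bandit with no comparisons gets the vacuous interval 0.00 to 1.00 shown in the table. `row_summary` is an illustrative name.

```python
import math
from typing import Optional, Tuple


def row_summary(wins: int, total: int, delta: float = 0.05) -> Tuple[Optional[float], float, float]:
    """Mean win rate and a confidence interval clipped to [0, 1].

    The Hoeffding-style radius is an illustrative assumption.
    """
    if total == 0:
        return None, 0.0, 1.0  # no data yet: mean undefined, interval vacuous
    mean = wins / total
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * total))
    return mean, max(0.0, mean - radius), min(1.0, mean + radius)


for wins, total in [(0, 0), (1, 1), (89, 150)]:
    mean, lo, hi = row_summary(wins, total)
    shown = "--" if mean is None else f"{mean:.2f}"
    print(f"{wins}/{total}: mean {shown}, interval [{lo:.2f}, {hi:.2f}]")
```

With δ = 0.05 this radius is about 0.10 at 150 comparisons, the width that appears later in the walkthrough; treat the choice as illustrative.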
Each bandit in turn plays one comparison: A vs B (A wins), B vs E (B loses), C vs F (C wins), D vs C (D loses), E vs A (E loses), F vs C (F loses).
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     0/0    1/1    0/0    0/0    0/0    0/0    1.00   1      0.00   1.00
B     0/0    0/0    0/0    0/0    0/1    0/0    0.00   1      0.00   1.00
C     0/0    0/0    0/0    0/0    0/0    1/1    1.00   1      0.00   1.00
D     0/0    0/0    0/1    0/0    0/0    0/0    0.00   1      0.00   1.00
E     0/1    0/0    0/0    0/0    0/0    0/0    0.00   1      0.00   1.00
F     0/0    0/0    0/1    0/0    0/0    0/0    0.00   1      0.00   1.00
After 150 comparisons per bandit:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     13/25  16/24  11/22  16/28  20/30  13/21  0.59   150    0.49   0.69
B     14/30  15/30  13/19  15/20  17/26  20/25  0.63   150    0.53   0.73
C     12/28  10/22  13/23  15/28  20/24  13/25  0.55   150    0.45   0.65
D     9/20   15/28  10/21  11/23  15/28  15/30  0.50   150    0.40   0.60
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52
F     11/29  4/25   10/18  12/25  14/30  13/23  0.43   150    0.33   0.53
B dominates E! (B's lower bound, 0.53, is greater than E's upper bound, 0.52.)
E is removed: comparisons against E no longer count toward the remaining bandits' scores (e.g., A's total drops from 150 to 120 once its 30 comparisons against E are discarded).
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     13/25  16/24  11/22  16/28  20/30  13/21  0.58   120    0.49   0.67
B     14/30  15/30  13/19  15/20  17/26  20/25  0.62   124    0.51   0.73
C     12/28  10/22  13/23  15/28  20/24  13/25  0.50   126    0.39   0.61
D     9/20   15/28  10/21  11/23  15/28  15/30  0.49   122    0.38   0.60
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     11/29  4/25   10/18  12/25  14/30  13/23  0.42   120    0.31   0.53
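The domination test and the removal step can be reproduced from the 150-comparison table. A Python sketch using those cell values, the half-width of 0.10 shown there, and hypothetical helpers `score` and `dominates`; removing a bandit simply drops its column from everyone else's sums.

```python
from typing import Tuple

NAMES = "ABCDEF"
# (wins, total) of the row bandit against each column bandit after 150
# comparisons per bandit, copied from the walkthrough table.
CELLS = {
    "A": [(13, 25), (16, 24), (11, 22), (16, 28), (20, 30), (13, 21)],
    "B": [(14, 30), (15, 30), (13, 19), (15, 20), (17, 26), (20, 25)],
    "C": [(12, 28), (10, 22), (13, 23), (15, 28), (20, 24), (13, 25)],
    "D": [(9, 20), (15, 28), (10, 21), (11, 23), (15, 28), (15, 30)],
    "E": [(8, 24), (11, 25), (6, 22), (14, 29), (14, 31), (10, 19)],
    "F": [(11, 29), (4, 25), (10, 18), (12, 25), (14, 30), (13, 23)],
}


def score(row: str, active: str) -> Tuple[int, int]:
    """Wins and comparisons of `row`, counting only opponents still active."""
    cells = [CELLS[row][NAMES.index(c)] for c in active]
    return sum(w for w, _ in cells), sum(n for _, n in cells)


def dominates(better: str, worse: str, active: str, half_width: float) -> bool:
    """True if `better`'s lower bound exceeds `worse`'s upper bound."""
    wb, nb = score(better, active)
    ww, nw = score(worse, active)
    return wb / nb - half_width > ww / nw + half_width


active = "ABCDEF"
print(dominates("B", "E", active, half_width=0.10))
active = active.replace("E", "")  # remove E: its column drops out of every sum
for b in active:
    print(b, score(b, active))
```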
Comparisons continue among the active bandits; for example, A beats B once more, so A's record against B becomes 17/25 and A's mean stays at 0.58 over 121 comparisons.
After 145 comparisons per active bandit:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     15/30  19/29  14/28  18/33  20/30  15/25  0.56   145    0.46   0.66
B     15/33  17/34  15/24  20/27  17/26  23/27  0.62   145    0.52   0.72
C     13/31  11/28  14/29  15/30  20/24  16/27  0.48   145    0.38   0.58
D     11/26  17/31  12/26  14/29  15/28  17/33  0.49   145    0.39   0.59
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     12/32  7/30   13/26  13/28  14/30  15/29  0.41   145    0.31   0.51
B dominates F! (B's lower bound, 0.52, is greater than F's upper bound, 0.51.)
F is removed; comparisons against F no longer count:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     15/30  19/29  14/28  18/33  20/30  15/25  0.55   120    0.43   0.67
B     15/33  17/34  15/24  20/27  17/26  23/27  0.56   118    0.44   0.68
C     13/31  11/28  14/29  15/30  20/24  16/27  0.45   118    0.33   0.57
D     11/26  17/31  12/26  14/29  15/28  17/33  0.48   112    0.36   0.60
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     12/32  7/30   13/26  13/28  14/30  15/29  0.41   145    0.31   0.51   (removed)
After 300 comparisons per active bandit:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     41/80  44/75  38/70  42/75  20/30  15/25  0.55   300    0.48   0.62
B     31/69  38/78  47/78  51/75  17/26  23/27  0.56   300    0.49   0.63
C     33/77  31/77  35/70  39/76  20/24  16/27  0.46   300    0.39   0.53
D     30/76  27/77  35/74  35/73  15/28  17/33  0.42   300    0.35   0.49
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     12/32  7/30   13/26  13/28  14/30  15/29  0.41   145    0.31   0.51   (removed)
B dominates D! (B's lower bound is greater than D's upper bound.)
D is removed:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     41/80  44/75  38/70  42/75  20/30  15/25  0.55   225    0.46   0.64
B     31/69  38/78  47/78  51/75  17/26  23/27  0.52   225    0.43   0.61
C     33/77  31/77  35/70  39/76  20/24  16/27  0.33   225    0.24   0.42
D     30/76  27/77  35/74  35/73  15/28  17/33  0.42   300    0.35   0.49   (removed)
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     12/32  7/30   13/26  13/28  14/30  15/29  0.41   145    0.31   0.51   (removed)
A dominates C! (A's lower bound, 0.46, is greater than C's upper bound, 0.42.)
C is removed:
      A      B      C      D      E      F      Mean   Total  Lower  Upper
A     41/80  44/75  38/70  42/75  20/30  15/25  0.51   80     0.38   0.64
B     31/69  38/78  47/78  51/75  17/26  23/27  0.52   147    0.45   0.49
C     33/77  31/77  35/70  39/76  20/24  16/27  0.33   225    0.24   0.42   (removed)
D     30/76  27/77  35/74  35/73  15/28  17/33  0.42   300    0.35   0.49   (removed)
E     8/24   11/25  6/22   14/29  14/31  10/19  0.42   150    0.32   0.52   (removed)
F     12/32  7/30   13/26  13/28  14/30  15/29  0.41   145    0.31   0.51   (removed)
Eventually… A is the last bandit remaining and is declared the best bandit!
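The walkthrough can be condensed into a compact Python sketch of the Beat-the-Mean loop: play the active bandit with the fewest counted comparisons against an opponent drawn uniformly from the active set (itself included, which is why the diagonal cells are nonzero), and remove the lowest-scoring bandit once the best mean's lower bound exceeds its upper bound, discarding every comparison against it. The Hoeffding-style radius, δ, the tie-breaking and the `duel` callback are illustrative assumptions, not the paper's exact confidence radius or constants.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def beat_the_mean(
    k: int,
    duel: Callable[[int, int], bool],
    horizon: int,
    delta: float = 0.05,
    rng: Optional[random.Random] = None,
) -> int:
    """Sketch of the Beat-the-Mean loop; returns the index of the bandit kept.

    duel(i, j) should return True when bandit i wins a comparison against j.
    """
    rng = rng or random.Random(0)
    active: List[int] = list(range(k))
    wins = [[0] * k for _ in range(k)]   # wins[i][j]: times i beat j
    plays = [[0] * k for _ in range(k)]  # plays[i][j]: times i was played against j

    def record(i: int) -> Tuple[int, int]:
        # Only comparisons against still-active bandits count toward i's score.
        return (sum(wins[i][j] for j in active), sum(plays[i][j] for j in active))

    for _ in range(horizon):
        if len(active) == 1:
            break
        # Play the active bandit with the fewest counted comparisons against an
        # opponent drawn uniformly from the active set (itself included).
        b = min(active, key=lambda i: record(i)[1])
        opponent = rng.choice(active)
        wins[b][opponent] += int(duel(b, opponent))
        plays[b][opponent] += 1

        n_star = min(record(i)[1] for i in active)
        if n_star == 0:
            continue
        # Assumed Hoeffding-style radius based on the least-sampled bandit.
        radius = math.sqrt(math.log(1.0 / delta) / (2.0 * n_star))
        means = {i: record(i)[0] / record(i)[1] for i in active}
        # Lowest mean is the removal candidate; ties go to the higher index.
        worst = min(active, key=lambda i: (means[i], -i))
        if max(means.values()) - radius > means[worst] + radius:
            # Dropping it from `active` also drops every comparison against it.
            active.remove(worst)

    def final_mean(i: int) -> float:
        w, n = record(i)
        return w / n if n else 0.5

    return max(active, key=final_mean)


if __name__ == "__main__":
    # Hypothetical 4-bandit instance: P(b_i beats b_j) = 0.5 + 0.1 * (j - i).
    sim = random.Random(1)
    print(beat_the_mean(4, lambda i, j: sim.random() < 0.5 + 0.1 * (j - i), 20000))
```

Because each bandit is scored against the same moving "mean" opponent, one running estimate per active bandit suffices, mirroring the linear bookkeeping in the tables above.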
Regret Guarantee
• Playing against the mean bandit calibrates preference scores
  – Estimates of (active) bandits are directly comparable
  – One estimate per active bandit = linear number of estimates
• We can bound the comparisons needed to remove the worst bandit
  – Varies smoothly with transitivity parameter γ
  – High-probability bound
• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ
• Thus, we can bound the total regret with high probability:
  $R_T = O(\gamma^7 K \log T)$
  – γ is typically close to 1
We also have a similar PAC guarantee.
Not possible with previous approaches!
• Simulation experiment where γ = 1.3
  – Light = Beat-the-Mean
  – Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean maintains its linear (in #bandits) regret guarantee
• Interleaved Filter suffers quadratic regret in the worst case
• Simulation experiment where γ = 1 (original dueling bandits setting)
  – Light = Beat-the-Mean
  – Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean has a high-probability bound
• Beat-the-Mean exhibits significantly lower variance
Conclusions
• Online learning approach using pairwise feedback
  – Well-suited for optimizing information retrieval systems from user feedback
  – Models violations in preference transitivity
• Algorithm: Beat-the-Mean
  – Regret linear in #bandits and logarithmic in #iterations
  – Degrades smoothly with transitivity violation
  – Stronger guarantees than previous work
  – Empirically supported