27
Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models

Bayesian inference for Plackett -Luce ranking models

  • Upload
    monte

  • View
    30

  • Download
    0

Embed Size (px)

DESCRIPTION

Bayesian inference for Packet-Lube ranking models. Bayesian inference for Plackett -Luce ranking models. John Guiver, Edward Snelson MSRC. Distributions over orderings. Many problems in ML/IR concern ranked lists of items - PowerPoint PPT Presentation

Citation preview

Page 1: Bayesian inference for  Plackett -Luce ranking models

Bayesian inference for Plackett-Luce ranking

modelsJohn Guiver, Edward Snelson

MSRC

Bayesian inference for Packet-Lube ranking models

Page 2: Bayesian inference for  Plackett -Luce ranking models

Distributions over orderings• Many problems in ML/IR concern ranked

lists of items• Data in the form of multiple independent

orderings of a set of K items• How to characterize such a set of

orderings?• Need to learn a parameterized probability

model over orderings

Page 3: Bayesian inference for  Plackett -Luce ranking models

Notation Items and rank positions are each indexed from ሼ1,⋯,𝐾ሽ≡ ℤ𝐾 A ranking 𝜌:ℤ𝐾 →ℤ𝐾 is a permutation which maps

item indices to position indices. 𝜌𝑖 is the rank position of item 𝑖

An ‘ordering’ 𝜔:ℤ𝐾 →ℤ𝐾 is a permutation which maps position indices to item indices. 𝜔𝑘 is the item whose rank position is 𝑘

Any ranking has a corresponding ordering, and vice versa, so that: 𝜔𝜌𝑖 = 𝑖 and 𝜌𝜔𝑘 = 𝑘

Page 4: Bayesian inference for  Plackett -Luce ranking models

Distributions• Ranking distributions are defined over the domain of

all K! rankings (or orderings)• A fully parameterised distribution would have a

probability for each possible ranking which sum to 1.– E.g. For three items:

𝑃3! = ሼ𝑝123,𝑝132,𝑝213,𝑝231,𝑝312,𝑝321ሽ, σ𝑝𝑖𝑗𝑘 = 1, 𝑝𝑖𝑗𝑘 > 0 • A ranking distribution is a point in this simplex• A model is a parameterised family within the

simplex

Page 5: Bayesian inference for  Plackett -Luce ranking models

Plackett-Luce: vase interpretation

vb

vr

vg

Probability:

bgr

r

vvvv

bg

g

vvv

b

b

vv

Page 6: Bayesian inference for  Plackett -Luce ranking models

Plackett-Luce model• PL likelihood for a single complete

ordering:

Page 7: Bayesian inference for  Plackett -Luce ranking models

obgr

g

vvvvv

bg

g

vvv

b

b

vv

Partial orderingsTop N

obr

b

vvvv

Bradley-Terry model for case of pairs

Plackett-Luce: vase interpretation

Page 8: Bayesian inference for  Plackett -Luce ranking models

Luce’s Choice Axiom

Page 9: Bayesian inference for  Plackett -Luce ranking models

Gumbel Thurstonian modelEach item represented by a score distribution on the real line.

Marginal matrixProbability of an item in a position

Page 10: Bayesian inference for  Plackett -Luce ranking models

Thurstonian Models, and Yellott’s Theorem

• Assume a Thurstonian Model with each score having identical distributions except for their means. Then:– The score distributions give rise to a Plackett-Luce model

if and only the scores are distributed according to a Gumbel distribution (Yellott)

• Result depends on some nice properties of the Gumbel distribution:𝐶𝐷𝐹: 𝒢ሺ𝑥a0𝜇,𝛽ሻ= 𝑒−𝑧 𝑤ℎ𝑒𝑟𝑒 𝑧ሺ𝑥ሻ= 𝑒−ሺ𝑥−𝜇ሻ𝛽 , 𝑃𝐷𝐹: ℊሺ𝑥a0𝜇,𝛽ሻ= 𝑧𝛽𝑒−𝑧

න ℊሺ𝑥a0𝜇,𝛽ሻ𝒢ሺ𝑥a0𝜇′,𝛽ሻ𝑑𝑥𝑡−∞ = 𝒢൫𝑡ห𝜇+ 𝛽𝑙𝑛൫1 + 𝑎ሺ𝜇,𝜇′ሻ൯,𝛽൯

൫1 + 𝑎ሺ𝜇,𝜇′ሻ൯

Page 11: Bayesian inference for  Plackett -Luce ranking models

Maximum likelihood estimation

• Hunter (2004) describes minorize/maximize (MM) algorithm to find MLE

• Can over-fit with sparse data (especially incomplete rankings)

• Strong assumption for convergence:– “in every possible partition of the items into two

nonempty subsets, some item in the second set ranks higher than some item in the first set at least once in the data”

Page 12: Bayesian inference for  Plackett -Luce ranking models

Bayesian inference: factor graph

vA vDvB vC vE

BAE

DE

Gamma priors

EBA

B

vvvv

EA

A

vvv

Page 13: Bayesian inference for  Plackett -Luce ranking models

Fully factored approximation• Posterior over P-L parameters, given

N orderings :

• Approximate as fully factorised product of Gammas:

Page 14: Bayesian inference for  Plackett -Luce ranking models

Expectation Propagation [Minka 2001]

Page 15: Bayesian inference for  Plackett -Luce ranking models

Alpha-divergenceKullback-Leibler (KL) divergence

Let p,q be two distributions (don’t need to be normalised)

Alpha-divergence ( is any real number)

Page 16: Bayesian inference for  Plackett -Luce ranking models

16

Alpha-divergence – special casesSimilarity measures between two distributions(p is the truth, and q an approximation)

α

Page 17: Bayesian inference for  Plackett -Luce ranking models

17

Minimum alpha-divergenceq is Gaussian, minimizes D(p||q)

= -∞ = 0 = 0.5 = 1 = ∞

Page 18: Bayesian inference for  Plackett -Luce ranking models

18

Structure of alpha space

0 1

zeroforcing

inclusive (zeroavoiding)

MFBP,EP

Page 19: Bayesian inference for  Plackett -Luce ranking models

Bayesian inference: factor graph

vA vDvB vC vE

BAE

DE

Gamma priors

EA

A

vvv

)()()1()(

)()(

:following eProject th

1

1

AEAEEA

EEA

AEA

vGamvGamvvdvvGam

vGamvv

vdvvGam

Page 20: Bayesian inference for  Plackett -Luce ranking models

Inferring known parameters

Page 21: Bayesian inference for  Plackett -Luce ranking models

Ranking NASCAR drivers

Page 22: Bayesian inference for  Plackett -Luce ranking models

Posterior rank distributionsMLEEP

Driver rank : 1 .... 83

Page 23: Bayesian inference for  Plackett -Luce ranking models

Conclusions and future work• We have given an efficient Bayesian

treatment for P-L models using Power EP• Advantage of Bayesian approach is:

– Avoid over-fitting on sparse data– Gives uncertainty information on the parameters– Gives estimation of model evidence

• Future work:– Mixture models– Feature-based ranking models

Page 25: Bayesian inference for  Plackett -Luce ranking models

Ranking movie genres

Page 26: Bayesian inference for  Plackett -Luce ranking models

Incomplete orderings• Internally consistent:

– “the probability of a particular ordering does not depend on the subset from which the items are assumed to be drawn”

• Likelihood for an incomplete ordering (only a few items or top-S items are ranked) simple:– only include factors for those items that

are actually ranked in datum n

Page 27: Bayesian inference for  Plackett -Luce ranking models

α = -1 power makes this tractable

Power EP for Plackett-Luce• A choice of α = -1 leads to a particularly nice

simplification for the P-L likelihood• An example of the type of calculation in the

EP updates, with a factor connecting two items A, E:

• Sum of Gammas can be projected back onto single Gamma

)(

)()1()(

)()(

1

1

A

EAEEA

EEA

AEA

vGam

vGamvvdvvGam

vGamvv

vdvvGam