Carnegie Mellon Jonathan Huang Carlos Guestrin Carnegie Mellon University ICML 2010 Haifa, Israel Learning Hierarchical Riffle Independent Groupings from Rankings


Page 1: Learning Hierarchical Riffle Independent Groupings from Rankings

Jonathan Huang, Carlos Guestrin
Carnegie Mellon University
ICML 2010, Haifa, Israel

Page 2: American Psychological Association Elections

Each ballot in election data is a ranked list of candidates [Diaconis, 89]
- 5738 full ballots (1980 election)
- 5 candidates: William Bevan, Ira Iscoe, Charles Kiesler, Max Siegle, Logan Wright

[Figure: probability of each of the 120 rankings (x-axis: rankings, y-axis: probability)]

[Figure: first-order matrix, Prob(candidate i was ranked j) (axes: candidates, ranks)]

Candidate 3 has the most 1st place votes and many last place votes
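A minimal sketch of how a first-order matrix like the one on this slide could be computed from full ballots. The function name `first_order_matrix`, the 0-based candidate ids, and the toy ballots are assumptions made for illustration; they are not the APA data.

```python
def first_order_matrix(ballots, n):
    """Return M with M[i][j] = fraction of ballots ranking candidate i in place j.

    Each ballot is a tuple of candidate ids (0..n-1), best first.
    """
    counts = [[0] * n for _ in range(n)]
    for ballot in ballots:
        for place, candidate in enumerate(ballot):
            counts[candidate][place] += 1
    total = len(ballots)
    return [[c / total for c in row] for row in counts]


# Hypothetical toy ballots: candidate 2 gets many first AND many last places.
ballots = [(2, 0, 1), (2, 1, 0), (0, 1, 2), (1, 0, 2)]
M = first_order_matrix(ballots, 3)
for i, row in enumerate(M):
    print(f"candidate {i}: {row}")
```

Every row and every column of the matrix sums to 1, because each ballot places each candidate exactly once and fills each place exactly once.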

Page 3: Factorial Possibilities

n items/candidates means n! rankings:

n     n!            Memory required to store n! doubles
5     120           <4 kilobytes
15    1.31x10^12    1729 petabytes (!!)

(Not to mention sample complexity issues…)

Possible learning biases for taming complexity:
- Parametric? (e.g., Mallows, Plackett-Luce, …)
- Sparsity? [Reid79, Jagabathula08]
- Independence/Graphical models? [Huang09]

[Figure: probability of each of the 120 rankings (x-axis: rankings, y-axis: probability)]

Page 4: Full independence on rankings

Graphical model for joint ranking of 6 items (one node per item, e.g., "Rank of item C")

Mutual exclusivity leads to fully connected model!

[Figure: fully connected graph on items A, B, C, D, E, F vs. graph with {A,B,C} and {D,E,F} disconnected]

{A,B,C}, {D,E,F} independent:
- Ranks of {A,B,C} are a permutation of {1,2,3}
- Ranks of {D,E,F} are a permutation of {4,5,6}

Page 5: First-order independence condition

[Figure: first-order matrices, Prob(candidate i was ranked j) (axes: candidates, ranks), "Not independent" vs. "Independent", each shown with its graphical model on items A to F]

Sparsity: any ranking putting A, B, or C in ranks 4, 5, or 6 has zero probability!

But… such sparsity is unlikely to exist in real data

Page 6: Drawing independent fruits/veggies

Veggies in ranks {1,2}, Fruits in ranks {3,4} (veggies always better than fruits)

Draw veggie rankings, fruit rankings independently:
Veggies: Artichoke > Broccoli
Fruits: Cherry > Dates

Form joint ranking of veggies and fruits (slots: Veggie > Veggie > Fruit > Fruit), e.g.:
Broccoli > Artichoke > Cherry > Dates
Artichoke > Broccoli > Dates > Cherry

Full independence: fruit/veggie positions fixed ahead of time!

Page 7: Riffled independence

Riffled independence model [Huang, Guestrin, 2009]:
Draw interleaving of veggie/fruit positions (according to a distribution)

Veggies: Artichoke > Broccoli
Fruits: Cherry > Dates

[Figure: several candidate interleavings of Veggie/Fruit slots; the drawn interleaving is filled with the veggie and fruit rankings to form a joint ranking of Artichoke, Broccoli, Cherry, Dates]
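The generative story on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names `draw` and `riffle_sample`, the "V"/"F" slot encoding, and all probabilities below are assumptions, not values from the talk.

```python
import random


def draw(dist, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def riffle_sample(veggie_ranking, fruit_ranking, interleaving):
    """Fill the slots of an interleaving ('V' or 'F' per rank) in order."""
    veggies, fruits = iter(veggie_ranking), iter(fruit_ranking)
    return [next(veggies) if slot == "V" else next(fruits) for slot in interleaving]


rng = random.Random(0)

# Hypothetical distributions over relative rankings within each group.
veggie_dist = {("Artichoke", "Broccoli"): 0.7, ("Broccoli", "Artichoke"): 0.3}
fruit_dist = {("Cherry", "Dates"): 0.6, ("Dates", "Cherry"): 0.4}

# Hypothetical interleaving distribution over the 6 ways to merge 2 + 2 slots.
interleave_dist = {"VVFF": 0.4, "VFVF": 0.2, "VFFV": 0.1,
                   "FVVF": 0.1, "FVFV": 0.1, "FFVV": 0.1}

joint = riffle_sample(draw(veggie_dist, rng), draw(fruit_dist, rng),
                      draw(interleave_dist, rng))
print(" > ".join(joint))
```

Full independence is the special case in which the interleaving distribution puts all of its mass on a single pattern such as "VVFF".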

Page 8: Riffle Shuffles

Riffle shuffle (dovetail shuffle):
- Cut deck into two piles.
- Interleave piles.

Riffle shuffles correspond to distributions over interleavings (the interleaving distribution)

Page 9: American Psych. Assoc. Election (1980)

[Figure: probability of each of the 120 permutations (x-axis: permutations, y-axis: probability)]

Minimize: KL(empirical || riffle indep. approx.)

Best KL split: {12345} → {1345}, {2}

Candidate 3 fully independent vs. Candidate 2 riffle independent

Page 10: Irish House of Parliament election 2002

64,081 votes, 14 candidates

Two main parties:
- Fianna Fail (FF)
- Fine Gael (FG)

Minor parties:
- Independents (I)
- Green Party (GP)
- Christian Solidarity (CS)
- Labour (L)
- Sinn Fein (SF)

[Gormley, Murphy, 2006]

[Figure: map of Ireland]

Page 11: Approximating the Irish Election

n=14 candidates

Major parties riffle independent of minor parties?

[Figure: "True" first order marginals vs. riffle independent approx., Prob(candidate i was ranked j); candidates ordered FF, FF, FF, FG, FG, FG, I, I, I, I, GP, CS, SF, L]

Sinn Fein, Christian Solidarity marginals not well captured by a single riffle independent split!

Page 12: Back to Fruits and Vegetables

banana, apple, orange, broccoli, carrot, lettuce → {banana, apple, orange}, {broccoli, carrot, lettuce}

candy, cookies → ??

Need to factor out {candy, cookies} first! (Sinn Fein, Christian Solidarity)

Page 13: Hierarchical Decompositions

All food items {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies}
→ Healthy food {banana, apple, orange, broccoli, carrot, lettuce}, Junk food {candy, cookies}

Healthy food → Fruits {banana, apple, orange}, Vegetables {broccoli, carrot, lettuce}

Fruits, Vegetables marginally riffle independent

Page 14: Generative process for hierarchy

Interleave healthy foods with junk food
  Interleave fruits/vegetables
    Rank fruits
    Rank vegetables
  Rank junk food

Hierarchical Riffle Independent Models
- Encode intuitive independence constraints
- Can be learned with lower sample complexity
- Have interpretable parameters/structure
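The recursive nature of this generative process can be sketched as follows. The nested-tuple encoding of the hierarchy and the function name `sample_hierarchy` are assumptions for illustration, and uniform distributions stand in for the learned ranking and interleaving parameters.

```python
import random


def sample_hierarchy(node, rng):
    """Sample a ranking from a hierarchy.

    A leaf is a list of items; an internal node is a pair (left, right).
    Assumption: rankings within a leaf and interleavings are uniform.
    """
    if isinstance(node, list):
        ranking = node[:]
        rng.shuffle(ranking)  # rank the items of a leaf
        return ranking
    left, right = node
    a = sample_hierarchy(left, rng)
    b = sample_hierarchy(right, rng)
    total = len(a) + len(b)
    # Choose which slots of the merged ranking are taken by items of `a`.
    a_slots = set(rng.sample(range(total), len(a)))
    it_a, it_b = iter(a), iter(b)
    return [next(it_a) if s in a_slots else next(it_b) for s in range(total)]


fruits = ["banana", "apple", "orange"]
veggies = ["broccoli", "carrot", "lettuce"]
junk = ["candy", "cookies"]

# ((fruits, vegetables) = healthy food) interleaved with junk food.
hierarchy = ((fruits, veggies), junk)
ranking = sample_hierarchy(hierarchy, random.Random(1))
print(" > ".join(ranking))
```

Interleaving never changes the relative order inside either argument, so the relative ranking of the fruits drawn at the leaf survives both merges.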

Page 15: Contributions

Structured Representation: Introduction of a hierarchical model based on riffled independence factorizations

Structure Learning Objectives: An objective function for learning model structure from ranking data

Structure Learning Algorithms: Efficient algorithms for optimizing the proposed objective

Page 16: Learning a Hierarchy

Exponentially many hierarchies, each encoding a distinct set of independence assumptions

[Figure: example hierarchies over {12345}, listed level by level]
{12345}; {1345} {2}; {13} {45}
{12345}; {1234} {5}; {234} {1}; {2} {34}
{12345}; {245} {13}; {24} {5}
{12345}; {2345} {1}; {34} {25}
{12345}; {1345} {3}; {1} {452}
{12345}; {1234} {5}; {234} {1}; {24} {3}
{12345}; {243} {15}; {34} {2}

Problem statement: given i.i.d. ranked data, learn the hierarchy that generated the data

Page 17: Learning a Hierarchy

Our Approach: top-down partitioning of item set X={1,…,n}
- Binary splits at each stage

{12345} → {1234}, {5}
{1234} → {234}, {1}

Page 18: Hierarchical Decomposition

Minimize: KL(empirical || hierarchical model)

Best KL hierarchy:
{12345} → {1345}, {2}
{1345} → {13}, {45}

- {13}: Research psychologists
- {45}: Clinical psychologists
- {2}: Community psychologists

KL(true, best hierarchy) = 6.76e-02, TV(true, best hierarchy) = 1.44e-01

[Figure: probability of each of the 120 permutations (x-axis: permutations, y-axis: probability)]

[Figure: "True" first order vs. learned first order matrices (axes: candidates 1 to 5, ranks 1 to 5; color scale 0 to 0.25)]

Page 19: KL-based objective function

KL Objective: Minimize KL(true distribution || riffle independent approx)

Algorithm:
- For each binary partitioning of the item set [A,B]:
  - Estimate parameters (ranking probabilities for A and B, and interleaving probabilities)
  - Compute log likelihood of data
- Return maximum likelihood partition

Need to search over exponentially many subsets!

If hierarchical structure of A or B is unknown, might not have enough samples to learn parameters!
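The brute-force algorithm above can be sketched as follows, assuming maximum-likelihood (empirical frequency) parameter estimates and no smoothing. The names `split_log_likelihood` and `best_split` and the toy ballots are hypothetical; the ballots were constructed so that item 2 is riffle independent of {0, 1}.

```python
from collections import Counter
from itertools import combinations
from math import log


def split_log_likelihood(ballots, A):
    """Log-likelihood of the ballots under a riffle independent model
    for the split (A, rest), using empirical-frequency parameters."""
    A = set(A)
    rel_a, rel_b, inter = Counter(), Counter(), Counter()
    keys = []
    for ballot in ballots:
        ra = tuple(x for x in ballot if x in A)       # relative ranking of A
        rb = tuple(x for x in ballot if x not in A)   # relative ranking of B
        tau = tuple(x in A for x in ballot)           # interleaving pattern
        rel_a[ra] += 1
        rel_b[rb] += 1
        inter[tau] += 1
        keys.append((ra, rb, tau))
    m = len(ballots)
    return sum(log(rel_a[ra] / m) + log(rel_b[rb] / m) + log(inter[tau] / m)
               for ra, rb, tau in keys)


def best_split(ballots, items):
    """Try all 2^(n-1) - 1 binary splits; return (log-likelihood, A)."""
    items = sorted(items)
    first, rest = items[0], items[1:]
    best = None
    # A always contains `first`, so (A, B) and (B, A) are not both visited;
    # sizes stop short of len(rest) so that B is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            A = (first,) + extra
            ll = split_log_likelihood(ballots, A)
            if best is None or ll > best[0]:
                best = (ll, A)
    return best


# Hypothetical ballots: order of (0, 1) is independent of where 2 is placed.
ballots = ([(2, 0, 1)] * 6 + [(2, 1, 0)] * 2 + [(0, 2, 1)] * 3 +
           [(1, 2, 0)] * 1 + [(0, 1, 2)] * 3 + [(1, 0, 2)] * 1)
ll, A = best_split(ballots, [0, 1, 2])
print(A, ll)
```

The loop makes the slide's two caveats concrete: the number of candidate splits doubles with every added item, and each split needs enough samples to estimate full distributions over the rankings of A and of B.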

Page 20: Finding fully independent sets by clustering

- Compute pairwise mutual informations
- Partition resulting graph

[Figure: graph partitioned into sets A and B]

Why this won't work (for riffled independence):
Pairs of candidates on opposite sides of the split can be strongly correlated in general (If I vote up Democrat, I am likely to vote down Republican)

Pairwise measures unlikely to be effective for detecting riffled independence!

Page 21: Higher order measure of riffled independence

Key insight: Riffled independence means: absolute rankings in A not informative about relative rankings in B

Idea: measure mutual information between singleton rankings and pairwise rankings, e.g., between the preference over Democrat i and the relative preference over Republicans j & k

If i, (j,k) lie on opposite sides, Mutual information=0
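A sketch of how this mutual information could be estimated from ballots, using plain empirical frequencies. The function name `triplet_mi` and the toy ballots are assumptions; the ballots are built so that the place of item 2 is independent of the relative order of items 0 and 1.

```python
from collections import Counter
from math import log


def triplet_mi(ballots, i, j, k):
    """Empirical mutual information between the rank of item i and the
    binary event 'item j is ranked above item k'."""
    joint, rank_i, j_above_k = Counter(), Counter(), Counter()
    for ballot in ballots:
        r = ballot.index(i)
        b = ballot.index(j) < ballot.index(k)
        joint[(r, b)] += 1
        rank_i[r] += 1
        j_above_k[b] += 1
    m = len(ballots)
    # sum over observed cells of p(r, b) * log( p(r, b) / (p(r) * p(b)) )
    return sum(c / m * log(c * m / (rank_i[r] * j_above_k[b]))
               for (r, b), c in joint.items())


# Hypothetical ballots: item 2 is riffle independent of {0, 1}.
ballots = ([(2, 0, 1)] * 6 + [(2, 1, 0)] * 2 + [(0, 2, 1)] * 3 +
           [(1, 2, 0)] * 1 + [(0, 1, 2)] * 3 + [(1, 0, 2)] * 1)

across = triplet_mi(ballots, 2, 0, 1)  # i and (j, k) on opposite sides
mixed = triplet_mi(ballots, 0, 1, 2)   # triplet does not respect the split
print(across, mixed)
```

On these ballots the first value is zero and the second is strictly positive, which is the behavior the slide relies on.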

Page 22: Tripletwise objective function

Objective function (want to minimize): the mutual information summed over triplets (i, j, k) with i on one side of the split and j, k on the other

[Figure: item sets A and B; a triplet with all items in set A plays no role in the objective]

Tripletwise measure: no longer obviously a graph cut problem…
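One way to write such an objective, consistent with the mutual information measure of the previous slide. The notation is assumed here rather than taken from the slide: σ(i) denotes the rank of item i, and I(· ; ·) is mutual information.

```latex
\[
\mathcal{F}(A, B) \;=\;
\sum_{i \in A} \; \sum_{\{j,k\} \subset B} I\bigl(\sigma(i)\,;\,\sigma(j) < \sigma(k)\bigr)
\;+\;
\sum_{i \in B} \; \sum_{\{j,k\} \subset A} I\bigl(\sigma(i)\,;\,\sigma(j) < \sigma(k)\bigr)
\]
```

Triplets whose three items all lie in A (or all in B) appear in neither sum, and every term vanishes when A and B are riffle independent.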

Page 23: Efficient Splitting: Anchors heuristic

Given two elements of A, we can decide whether rest of elements are in A or B:

[Figure: anchor elements a1, a2 in A; the mutual information with the anchor pair is large for elements of A and small for elements of B]

Page 24: Efficient Splitting: Anchors heuristic

In practice, anchor elements a1, a2 are unknown!

Theorem: Anchors heuristic recovers riffle independent split with high probability given polynomial samples (under certain strong connectivity assumptions)
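A rough sketch of how unknown anchors might be handled: try every pair as anchors, assign the items whose mutual information with the pair is smallest to B, and keep the candidate split with the smallest tripletwise score. This is an assumed reading of the heuristic, not the authors' exact procedure; `anchors_split`, `objective`, the caller-supplied `size_b`, and the toy ballots are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def triplet_mi(ballots, i, j, k):
    """Empirical MI between the rank of i and the event 'j above k'."""
    joint, rank_i, j_above_k = Counter(), Counter(), Counter()
    for ballot in ballots:
        r, b = ballot.index(i), ballot.index(j) < ballot.index(k)
        joint[(r, b)] += 1
        rank_i[r] += 1
        j_above_k[b] += 1
    m = len(ballots)
    return sum(c / m * log(c * m / (rank_i[r] * j_above_k[b]))
               for (r, b), c in joint.items())


def objective(ballots, A, B):
    """Sum of triplet MIs with i on one side and the pair (j, k) on the other."""
    total = 0.0
    for side, other in ((A, B), (B, A)):
        for i in side:
            for j, k in combinations(other, 2):
                total += triplet_mi(ballots, i, j, k)
    return total


def anchors_split(ballots, items, size_b):
    """Assumed variant: size_b (the size of B) is given by the caller."""
    best = None
    for a1, a2 in combinations(items, 2):
        others = [x for x in items if x not in (a1, a2)]
        # Small MI with the anchor pair suggests the item is on the other side.
        others.sort(key=lambda x: triplet_mi(ballots, x, a1, a2))
        B = others[:size_b]
        A = [a1, a2] + others[size_b:]
        score = objective(ballots, A, B)
        if best is None or score < best[0]:
            best = (score, sorted(A), sorted(B))
    return best


# Hypothetical ballots: {0, 1, 2} ranked as 0>1>2 or 2>1>0, with item 3
# inserted at each of the four positions equally often.
ballots = [(3, 0, 1, 2), (0, 3, 1, 2), (0, 1, 3, 2), (0, 1, 2, 3),
           (3, 2, 1, 0), (2, 3, 1, 0), (2, 1, 3, 0), (2, 1, 0, 3)]
score, A, B = anchors_split(ballots, [0, 1, 2, 3], size_b=1)
print(A, B, score)
```

Only pairs of items are enumerated, so the number of candidate splits grows quadratically in the number of items instead of exponentially.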

Page 25: Structure learning with synthetic data

16 items, 4 items in each leaf

[Figure: log-likelihood (y-axis, -6300 to -5400) vs. log10(# samples) (x-axis, 1 to 5); curves: true structure known, learned structure, random 1-chain (with learned parameters)]

Page 26: Anchors on Irish election data

Irish Election hierarchy (first four splits):
{1,2,3,4,5,6,7,8,9,10,11,12,13,14} → {12} (Sinn Fein), {1,2,3,4,5,6,7,8,9,10,11,13,14}
{1,2,3,4,5,6,7,8,9,10,11,13,14} → {11} (Christian Solidarity), {1,2,3,4,5,6,7,8,9,10,13,14}
{1,2,3,4,5,6,7,8,9,10,13,14} → {1,4,13} (Fianna Fail), {2,3,5,6,7,8,9,10,14}
{2,3,5,6,7,8,9,10,14} → {2,5,6} (Fine Gael), {3,7,8,9,10,14} (Independents, Labour, Green)

Running time:
- Brute force optimization: 70.2s
- Anchors method: 2.3s

[Figure: "True" first order vs. learned first order matrices (axes: candidates 1 to 14, ranks 1 to 14; color scale 0 to 0.25)]

Page 27: Irish log likelihood comparison

[Figure: log-likelihood (y-axis, -5000 to -4500) vs. # samples (x-axis, logarithmic scale, 50 to 2000); curves: Optimized 1-chain, Optimized Hierarchy]

Page 28: Bootstrapped substructures: Irish data

[Figure: success rate (y-axis, 0 to 1) vs. # samples (x-axis, log scale, 76 to 2001); curves: Major party {FF, FG} leaves recovered, Top partition recovered, All leaves recovered, Full tree recovered]

Page 29: Riffled Independence…

- is a natural notion of independence for rankings
- can be exploited for efficient inference, low sample complexity
- approximately holds in many real datasets

Hierarchical riffled independence
- captures more structure in data
- structure can be learned efficiently:
  - related to clustering and to graphical model structure learning
  - efficient algorithm & polynomial sample complexity result

Acknowledgements: Brendan Murphy and Claire Gormley provided the Irish voting datasets. Discussions with Marina Meila provided important initial ideas upon which this work is based.

Page 30: Sushi ranking

Dataset: 5000 preference rankings of 10 types of sushi

Types:
1. Ebi (shrimp)
2. Anago (sea eel)
3. Maguro (tuna)
4. Ika (squid)
5. Uni (sea urchin)
6. Sake (salmon roe)
7. Tamago (egg)
8. Toro (fatty tuna)
9. Tekka-maki (tuna roll)
10. Kappa-maki (cucumber roll)

[Figure: first-order matrix, Prob(sushi i was ranked j) (axes: sushi, ranks)]

Fatty tuna (Toro) is a favorite!
No one likes cucumber roll!

Page 31: Sushi hierarchy

{1,2,3,4,5,6,7,8,9,10} → {2} (sea eel), {1,3,4,5,6,7,8,9,10}
{1,3,4,5,6,7,8,9,10} → {4} (squid), {1,3,5,6,7,8,9,10}
{1,3,5,6,7,8,9,10} → {5,6} (sea urchin, salmon roe), {1,3,7,8,9,10}
{1,3,7,8,9,10} → {1} (shrimp), {3,7,8,9,10}
{3,7,8,9,10} → {3,8,9} (tuna, fatty tuna, tuna roll), {7,10} (egg, cucumber roll)