Combinatorial Laplacian and Rank Aggregation

Yuan Yao

Stanford University

ICIAM, Zurich, July 16–20, 2007

Joint work with Lek-Heng Lim

Outline

1 Two Motivating Examples

2 Reflections on Ranking
  • Ordinal vs. Cardinal
  • Global, Local, vs. Pairwise

3 Discrete Exterior Calculus and Combinatorial Laplacian
  • Discrete Exterior Calculus
  • Combinatorial Laplacian Operator

4 Hodge Theory
  • Cyclicity of Pairwise Rankings
  • Consistency of Pairwise Rankings

5 Conclusions and Future Work

Two Motivating Examples

Example I: Customer-Product Rating

Example (Customer-Product Rating)

An m-by-n customer-product rating matrix $X \in \mathbb{R}^{m \times n}$

X typically contains lots of missing values (say ≥ 90%).

The first-order statistic, the mean score of each product, may be unreliable because:

• most customers rate only a very small portion of the products

• different products may have different raters, so mean scores involve noise due to arbitrary individual rating scales

From 1st Order to 2nd Order: Pairwise Rankings

The arithmetic mean of the score difference between products $i$ and $j$ over all customers who have rated both of them,

$$g_{ij} = \frac{\sum_k (X_{kj} - X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}},$$

is translation invariant.

If all the scores are positive, the geometric mean of the score ratio over all customers who have rated both $i$ and $j$,

$$g_{ij} = \left( \prod_k \frac{X_{kj}}{X_{ki}} \right)^{1 / \#\{k : X_{ki}, X_{kj} \text{ exist}\}},$$

is scale invariant.
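The arithmetic-mean formula above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: missing ratings are encoded as NaN, products are indexed from 0, and the function name `pairwise_mean_difference` and the toy matrix are hypothetical, not taken from the talk.

```python
import numpy as np

def pairwise_mean_difference(X):
    """g[i, j] = mean over k of (X[k, j] - X[k, i]), using only the
    customers k who rated both i and j; NaN if no such customer exists."""
    X = np.asarray(X, dtype=float)
    rated = ~np.isnan(X)                 # rated[k, i]: customer k rated product i
    Z = np.where(rated, X, 0.0)          # zero-fill so missing entries drop out of sums
    R = rated.astype(float)
    counts = R.T @ R                     # counts[i, j] = #{k : X_ki, X_kj exist}
    sums = R.T @ Z - Z.T @ R             # sum of (X_kj - X_ki) over common raters
    with np.errstate(invalid="ignore", divide="ignore"):
        g = np.where(counts > 0, sums / counts, np.nan)
    return g, counts

# Toy 3-customer, 3-product matrix (assumed data; NaN = not rated).
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])
g, counts = pairwise_mean_difference(X)

# Translation invariance: shift each customer's scale by a different constant.
shift = np.array([[10.0], [-3.0], [7.0]])
g_shifted, _ = pairwise_mean_difference(X + shift)
```

In this toy matrix every pair of products has exactly one common rater, so `g` has no NaN entries; it is skew-symmetric, and `g_shifted` coincides with `g`.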

More invariant

Define the pairwise ranking $g_{ij}$ as the probability that product $j$ is preferred to $i$ in excess of a purely random choice,

$$g_{ij} = \Pr\{k : X_{kj} > X_{ki}\} - \frac{1}{2}.$$

This is invariant under monotone transformations of the scores.
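A minimal sketch of this probability-based ranking follows. It assumes the probability is estimated by the empirical fraction of customers, among those who rated both products, who scored $j$ strictly higher than $i$. The function name and data are hypothetical, and missing ratings are again encoded as NaN.

```python
import numpy as np

def pairwise_preference_excess(X):
    """g[i, j] = (fraction of common raters with X[k, j] > X[k, i]) - 1/2."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    g = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.any():
                g[i, j] = np.mean(X[both, j] > X[both, i]) - 0.5
    return g

# Three customers rating two products (assumed data, no ties, no missing values).
X = np.array([[1.0, 2.0],
              [2.0, 5.0],
              [3.0, 1.0]])
g = pairwise_preference_excess(X)

# A strictly increasing transformation of the scores leaves g unchanged.
g_exp = pairwise_preference_excess(np.exp(X))
```

With the strict inequality taken literally, ties count as "not preferred", so $g_{ij} = -g_{ji}$ holds only when there are no ties (diagonal entries come out as $-1/2$ and should be ignored); how ties are handled is a design choice not specified above.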

Example II: Pure Exchange Economics

Example (Pairwise ranking in exchange market)

$n$ goods $V = \{1, \dots, n\}$ in an exchange market, with an exchange rate matrix $A$ such that

$$1 \text{ unit of } i = a_{ij} \text{ units of } j, \quad a_{ij} > 0,$$

which is a reciprocal matrix, i.e. $a_{ij} = 1/a_{ji}$.

Ideally, a product triple $(i, j, k)$ is called triangular arbitrage-free if $a_{ij} a_{jk} = a_{ik}$.

Money (universal equivalent): does there exist a universal equivalent with pricing function $p : V \to \mathbb{R}_+$ such that

$$a_{ij} = p_j / p_i?$$

From Pairwise to Global

Under the logarithmic map $g_{ij} = \log a_{ij}$, we have an equivalent theory:

• being triangular arbitrage-free is equivalent to

$$g_{ij} + g_{jk} + g_{ki} = 0$$

• a universal equivalent is a global ranking $f : V \to \mathbb{R}$ ($f_i = \log p_i$) such that

$$g_{ij} = f_j - f_i =: (\delta_0 f)(i, j)$$

Here
• Global ranking ⇔ universal equivalent (price)
• Pairwise ranking ⇔ exchange rates
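The correspondence can be checked numerically on a toy market. The sketch below assumes a hypothetical price vector `p` for three goods (0-indexed) and builds the exchange rates from it; none of the numbers come from the talk.

```python
import numpy as np

p = np.array([1.0, 2.0, 8.0])        # hypothetical prices of three goods
A = p[None, :] / p[:, None]          # exchange rates a_ij = p_j / p_i
g = np.log(A)                        # pairwise ranking g_ij = log a_ij
f = np.log(p)                        # global ranking f_i = log p_i

# Reciprocity a_ij = 1 / a_ji becomes skew-symmetry g_ij = -g_ji.
is_skew = np.allclose(g, -g.T)

# g is the "gradient" of f: g_ij = f_j - f_i.
is_gradient = np.allclose(g, f[None, :] - f[:, None])

# Triangular arbitrage-free condition on the triple (0, 1, 2).
curl_012 = g[0, 1] + g[1, 2] + g[2, 0]

# Perturbing a single rate (and its reciprocal) creates triangular arbitrage.
A_bad = A.copy()
A_bad[0, 1] *= 1.1
A_bad[1, 0] = 1.0 / A_bad[0, 1]
g_bad = np.log(A_bad)
curl_bad = g_bad[0, 1] + g_bad[1, 2] + g_bad[2, 0]   # equals log(1.1) up to rounding
```

Rates derived from prices are automatically arbitrage-free, while the perturbed matrix is still reciprocal but has a nonzero sum around the triangle.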

Observations

Both examples

• contain cardinal information

• involve pairwise comparisons

How important are these two features?

Reflections on Ranking

Ordinal vs. Cardinal

Ordinal Rank Aggregation

Problem: given a set of partial/total orders $\{\preceq_i : i = 1, \dots, n\}$ on a common set $V$, find

$$(\preceq_1, \dots, \preceq_n) \mapsto \preceq^*,$$

as a partial order on $V$, satisfying a certain optimality condition.

Examples:
• voting
• Social Choice Theory

Notes:
• Impossibility Theorems (Arrow et al.)
• Hardness of solving (NP-hard for Kemeny optimality, etc.)

Cardinal Rank Aggregation

Problem: given a set of functions $f_i : V \to \mathbb{R}$ ($i = 1, \dots, n$), find

$$(f_1, \dots, f_n) \mapsto f^*$$

as a function on $V$, satisfying a certain optimality condition.

Examples:
• customer-product rating, e.g. Amazon, Netflix
• stochastic choice with $f$ as probability distributions on $V$, e.g. Google search, cardinal utility in Economics

Notes:
• relaxations leave room for ‘possibility’
• ordinal rankings are induced from cardinal rankings, but with information loss

Global, Local, vs. Pairwise

Global, Local, and Pairwise Rankings

• Global ranking: a function on $V$, $f : V \to \mathbb{R}$
• Local (partial) ranking: the restriction of a global ranking to a subset $U$, $f' : U \to \mathbb{R}$
• Pairwise ranking: $g : V \times V \to \mathbb{R}$ (with $g_{ij} = -g_{ji}$)

Note: pairwise rankings are simply skew-symmetric matrices, i.e. elements of $\mathfrak{so}(n)$, or certain equivalence classes of them. We may also view pairwise rankings as weighted digraphs.

Why Pairwise Ranking?

• The human mind cannot make reliable preference judgements on even moderately large sets (psychology studies suggest no more than 7 ± 2 items)

• But humans can make pairwise comparisons more easily and accurately

• Pairwise ranking arises naturally in tournaments, exchange Economics, etc.

• Pairwise ranking may reduce the bias caused by the arbitrariness of rating scales

• Pairwise ranking may contain more information than global ranking (to be seen soon)!

Discrete Exterior Calculus and Combinatorial Laplacian

Our Main Theme

Below we outline an approach to analyzing

• cardinal, and

• pairwise

rankings from the perspective of discrete exterior calculus.

Briefly, Hodge Theory yields an orthogonal decomposition of pairwise rankings:

Pairwise = Global + Consistent Cyclic + Inconsistent Cyclic

Discrete Exterior Calculus

Simplicial Complex of Products

Let $V = \{1, \dots, n\}$ be the set of products or alternatives to be ranked. Construct a simplicial complex $K$:

• 0-simplices $K_0$: $V$

• 1-simplices $K_1$: edges $\{i, j\}$ such that a comparison (i.e. pairwise ranking) between $i$ and $j$ exists

• 2-simplices $K_2$: triangles $\{i, j, k\}$ such that
  • every edge exists in $K_1$
  • possibly further consistency requirements are met, such as being triangular arbitrage-free

Note: it suffices here to construct $K$ up to dimension 2!
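The construction can be sketched as follows, under the assumption that $K_2$ consists of every triangle whose three edges are present (no additional consistency filter). Vertices are 0-indexed here and `build_complex` is a hypothetical helper name.

```python
from itertools import combinations

def build_complex(n, compared_pairs):
    """Return (K0, K1, K2) for n alternatives and a list of compared pairs.
    Edges are stored as sorted tuples (i, j) with i < j, triangles as
    sorted tuples (i, j, k) whose three edges all belong to K1."""
    K0 = list(range(n))
    K1 = sorted({tuple(sorted(e)) for e in compared_pairs})
    edge_set = set(K1)
    K2 = [t for t in combinations(K0, 3)
          if all(e in edge_set for e in combinations(t, 2))]
    return K0, K1, K2

# Four alternatives; 0, 1, 2 are mutually compared, 3 is compared only with 2.
K0, K1, K2 = build_complex(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
```

In this example the only 2-simplex is the triangle on vertices 0, 1, 2; the edge to vertex 3 belongs to no triangle.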

Cochains

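• $k$-cochains $C^k(K, \mathbb{R})$: the vector space of alternating $(k+1)$-tensors associated with the $k$-simplices $K_k$,

$$\{u : V^{k+1} \to \mathbb{R},\ u_{i_{\sigma(0)}, \dots, i_{\sigma(k)}} = \operatorname{sign}(\sigma)\, u_{i_0, \dots, i_k}\}$$

for $(i_0, \dots, i_k) \in K_k$, where $\sigma \in S_{k+1}$ is a permutation of $(0, \dots, k)$.

• Inner product in $C^k(K, \mathbb{R})$: standard Euclidean

In particular,
• global ranking: 0-cochains $f \in C^0(K, \mathbb{R}) \cong \mathbb{R}^n$
• pairwise ranking: 1-cochains $g \in C^1(K, \mathbb{R})$, $g_{ij} = -g_{ji}$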

Coboundary Maps

• $k$-dimensional coboundary maps $\delta_k : C^k(K, \mathbb{R}) \to C^{k+1}(K, \mathbb{R})$ are defined as the alternating difference operator

$$(\delta_k u)(i_0, \dots, i_{k+1}) = \sum_{j=0}^{k+1} (-1)^{j}\, u(i_0, \dots, i_{j-1}, i_{j+1}, \dots, i_{k+1})$$

• $\delta_k$ plays the role of differentiation

• $\delta_{k+1} \circ \delta_k = 0$

In particular,
• $(\delta_0 f)(i, j) = f_j - f_i$ is the gradient of the global ranking $f$
• $(\delta_1 g)(i, j, k) = g_{ij} + g_{jk} + g_{ki}$ is the curl of the pairwise ranking $g$
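For a small complex, $\delta_0$ and $\delta_1$ can be written out as matrices. The sketch below assumes 0-indexed vertices, edges stored as $(i, j)$ with $i < j$ and triangles as $(i, j, k)$ with $i < j < k$, so a 1-cochain is a vector of values $g_{ij}$ on those oriented edges; `coboundary_matrices` is a hypothetical helper.

```python
import numpy as np

def coboundary_matrices(n, K1, K2):
    """Matrices of delta_0 (gradient) and delta_1 (curl) in the edge basis K1."""
    edge_index = {e: r for r, e in enumerate(K1)}
    d0 = np.zeros((len(K1), n))
    for r, (i, j) in enumerate(K1):
        d0[r, i], d0[r, j] = -1.0, 1.0        # (delta_0 f)(i, j) = f_j - f_i
    d1 = np.zeros((len(K2), len(K1)))
    for r, (i, j, k) in enumerate(K2):
        d1[r, edge_index[(i, j)]] = 1.0       # + g_ij
        d1[r, edge_index[(j, k)]] = 1.0       # + g_jk
        d1[r, edge_index[(i, k)]] = -1.0      # + g_ki = - g_ik
    return d0, d1

K1 = [(0, 1), (0, 2), (1, 2), (2, 3)]
K2 = [(0, 1, 2)]
d0, d1 = coboundary_matrices(4, K1, K2)

f = np.array([3.0, 1.0, 4.0, 1.0])            # an arbitrary global ranking
grad_f = d0 @ f                               # induced pairwise ranking on the edges
curl_of_grad = d1 @ grad_f                    # vanishes since delta_1 delta_0 = 0
```

The matrix product `d1 @ d0` is identically zero here, which is the discrete form of curl ∘ grad = 0.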

A View from Discrete Exterior Calculus

We have the following cochain complex

$$C^0(K, \mathbb{R}) \xrightarrow{\ \delta_0\ } C^1(K, \mathbb{R}) \xrightarrow{\ \delta_1\ } C^2(K, \mathbb{R}),$$

in other words,

$$\text{Global} \xrightarrow{\ \text{grad}\ } \text{Pairwise} \xrightarrow{\ \text{curl}\ } \text{Triplewise}$$

and curl ∘ grad(Global Rankings) = 0.

• Pairwise rankings = alternating 2-tensors = skew-symmetric matrices = log of Saaty’s reciprocal matrices

• Triplewise rankings = alternating 3-tensors

See also: Douglas Arnold’s talk on Tuesday

What does it tell us?

$$\text{Global} \xrightarrow{\ \text{grad}\ } \text{Pairwise} \xrightarrow{\ \text{curl}\ } \text{Triplewise}$$

• grad(Global) (i.e. $\operatorname{im}(\delta_0)$): the proper subset of pairwise rankings induced from global rankings

• curl(Pairwise) (i.e. $\operatorname{im}(\delta_1)$): measures the consistency/triangular arbitrage on the triangle $\{i, j, k\}$,

$$(\delta_1 g)(i, j, k) = g_{ij} + g_{jk} + g_{ki}$$

• ker(curl) (i.e. $\ker(\delta_1)$): the consistent, curl-free, triangular arbitrage-free pairwise rankings

• in particular, curl ∘ grad(Global) = 0 (i.e. $\delta_1 \circ \delta_0 = 0$) says that global rankings are consistent/curl-free

Reverse direction: conjugate operators

$$\text{Global} \xleftarrow{\ \text{grad}^* (= -\text{div})\ } \text{Pairwise} \xleftarrow{\ \text{curl}^*\ } \text{Triplewise}$$

• grad∗: $\delta_0^T$ under the Euclidean inner product, gives the total inflow-outflow difference at each vertex (negative divergence),

$$(\delta_0^T g)(i) = \sum_{*} g_{*i} - \sum_{*} g_{i*}$$

  • $\ker(\delta_0^T)$, being divergence-free, is cyclic (interior/boundary)

• curl∗: $\delta_1^T$, gives interior cyclic pairwise rankings along triangles in $K_2$, which are inconsistent

Combinatorial Laplacian Operator

Combinatorial Laplacian

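Define the $k$-dimensional combinatorial Laplacian $\Delta_k : C^k \to C^k$ by

$$\Delta_k = \delta_{k-1} \delta_{k-1}^T + \delta_k^T \delta_k, \quad k > 0$$

• $k = 0$: $\Delta_0 = \delta_0^T \delta_0$ is the well-known graph Laplacian

• $k = 1$: $\Delta_1 = \delta_0 \delta_0^T + \delta_1^T \delta_1 = \text{curl}^* \circ \text{curl} - \text{grad} \circ \text{div}$

Important Properties:
• $\Delta_k$ is positive semi-definite
• $\ker(\Delta_k) = \ker(\delta_{k-1}^T) \cap \ker(\delta_k)$: the $k$-harmonics, whose dimension equals the $k$-th Betti number
• Hodge Decomposition Theorem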
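These properties can be checked on the smallest interesting case, a triangle on vertices 0, 1, 2, taken once without and once with its 2-simplex. The matrices below are an illustrative assumption, using the edge order (0,1), (0,2), (1,2) and the orientation conventions $(\delta_0 f)(i,j) = f_j - f_i$ and $(\delta_1 g)(i,j,k) = g_{ij} + g_{jk} + g_{ki}$.

```python
import numpy as np

# Gradient matrix for edges (0,1), (0,2), (1,2).
d0 = np.array([[-1.0, 1.0, 0.0],
               [-1.0, 0.0, 1.0],
               [0.0, -1.0, 1.0]])
d1_hollow = np.zeros((0, 3))                 # no 2-simplices: the triangle is a "hole"
d1_filled = np.array([[1.0, -1.0, 1.0]])     # curl on (0,1,2): g01 - g02 + g12

def laplacian_1(d0, d1):
    """Delta_1 = delta_0 delta_0^T + delta_1^T delta_1."""
    return d0 @ d0.T + d1.T @ d1

def nullity(M):
    """Dimension of the kernel of a square matrix."""
    return M.shape[0] - np.linalg.matrix_rank(M)

L0 = d0.T @ d0                               # graph Laplacian of the 3-cycle
L1_hollow = laplacian_1(d0, d1_hollow)
L1_filled = laplacian_1(d0, d1_filled)

betti0 = nullity(L0)                         # number of connected components
betti1_hollow = nullity(L1_hollow)           # one independent loop
betti1_filled = nullity(L1_filled)           # the loop is filled in
min_eig = np.linalg.eigvalsh(L1_filled).min()
```

The hollow triangle has a one-dimensional space of harmonic 1-cochains (the cyclic ranking around the loop), while adding the 2-simplex removes it.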

Hodge Theory

Hodge Decomposition Theorem

Theorem

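The space of pairwise rankings, $C^1(K, \mathbb{R})$, admits an orthogonal decomposition into three components,

$$C^1(K, \mathbb{R}) = \operatorname{im}(\delta_0) \oplus H_1 \oplus \operatorname{im}(\delta_1^T)$$

where

$$H_1 = \ker(\delta_1) \cap \ker(\delta_0^T) = \ker(\Delta_1).$$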
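One way to compute the three components numerically is by two least-squares projections, which is valid because $\operatorname{im}(\delta_0)$ and $\operatorname{im}(\delta_1^T)$ are orthogonal. The sketch below is an illustration, not the authors' algorithm; the complex (four vertices, five edges, one filled triangle, one unfilled cycle) and the values of `g` are assumptions.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # oriented i -> j with i < j
triangles = [(0, 1, 2)]                            # the cycle 1-2-3 is left unfilled

d0 = np.zeros((len(edges), 4))
for r, (i, j) in enumerate(edges):
    d0[r, i], d0[r, j] = -1.0, 1.0                 # gradient: f_j - f_i
idx = {e: r for r, e in enumerate(edges)}
d1 = np.zeros((len(triangles), len(edges)))
for r, (i, j, k) in enumerate(triangles):
    d1[r, idx[(i, j)]] = d1[r, idx[(j, k)]] = 1.0  # curl: g_ij + g_jk - g_ik
    d1[r, idx[(i, k)]] = -1.0

def hodge_decompose(g, d0, d1):
    """Split g into im(d0), harmonic, and im(d1^T) parts."""
    f, *_ = np.linalg.lstsq(d0, g, rcond=None)     # best-fitting global ranking
    g_grad = d0 @ f
    phi, *_ = np.linalg.lstsq(d1.T, g, rcond=None)
    g_curl = d1.T @ phi                            # inconsistent cyclic part
    g_harm = g - g_grad - g_curl                   # consistent cyclic part
    return g_grad, g_harm, g_curl

g = np.array([1.0, 2.0, 0.5, -1.0, 3.0])           # arbitrary pairwise ranking
g_grad, g_harm, g_curl = hodge_decompose(g, d0, d1)
```

The remainder `g_harm` is both divergence-free and curl-free, so it lives on the unfilled cycle, and the three parts are mutually orthogonal.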

Hodge Decomposition Illustration

Figure: Hodge Decomposition for Pairwise Rankings

An Example from Jester Dataset

Figure: Hodge Decomposition for a pairwise ranking on four Jester jokes (No. 1-4): $g_1$ gives a global ranking (order: 1 > 2 > 3 > 4), which accounts for 90% of the total norm; $g_2$ is the consistent cyclic part on triangles {{1,2,3}, {1,2,4}}, with 7% of the norm; and $g_3$ is the inconsistent cyclic part.

Cyclicity of Pairwise Rankings

Acyclic-Cyclic Decomposition

Corollary

Every pairwise ranking admits a unique orthogonal decomposition,

$$g = \operatorname{proj}_{\operatorname{im}(\delta_0)} g + \operatorname{proj}_{\ker(\delta_0^T)} g$$

i.e. Pairwise = grad(Global) + Cyclic

Note: the pairwise rankings induced from global rankings form exactly the acyclic component, i.e. the orthogonal complement of the cyclic pairwise rankings.

Consistency of Pairwise Rankings

Consistency

Definition

A pairwise ranking $g$ is consistent on a triangle (2-simplex) $(i, j, k)$ if $g_{ij} + g_{jk} + g_{ki} = 0$, in other words, $(\delta_1 g)(i, j, k) = 0$.

Note:

Consistency depends on the triangles (2-simplices), so for a pairwise ranking $g$, $|\operatorname{curl}(g)(i, j, k)|$ measures the curl distribution over the triangles (2-simplices) in $K_2$.

Figure: Curl distribution of Jester dataset (histogram; horizontal axis from −0.2 to 0.15, vertical axis in units of 10^4)

Consistent Decomposition

Corollary

1 A consistent pairwise ranking $g$ associated with $K$ has a unique orthogonal decomposition

$$g = \operatorname{proj}_{\operatorname{im}(\delta_0)} g + \operatorname{proj}_{H_1} g = \text{grad(Global)} + \text{Harmonic},$$

where the harmonic part is cyclic on the “holes” of the complex $K$.

2 Every consistent pairwise ranking on a contractible $K$ is induced from a global ranking.

Note: (2) rephrases the famous theorem in exchange Economics: being triangular arbitrage-free implies being arbitrage-free and the existence of a universal equivalent.
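The role of the "holes" can be seen on a single triangle. In the sketch below (illustrative assumptions throughout), `g_consistent` sums to zero around the triangle, so it is consistent when the triangle is a 2-simplex of a contractible $K$ and is recovered exactly from a global ranking. `g_cyclic` is consistent only in the vacuous sense that applies when the triangle is left unfilled, and there it is purely harmonic.

```python
import numpy as np

# Gradient matrix for edges (0,1), (0,2), (1,2) of a triangle on vertices 0, 1, 2.
d0 = np.array([[-1.0, 1.0, 0.0],
               [-1.0, 0.0, 1.0],
               [0.0, -1.0, 1.0]])

def fit_global_ranking(g):
    """Least-squares global ranking f (mean zero) and the norm of g - grad f."""
    f, *_ = np.linalg.lstsq(d0, g, rcond=None)
    f = f - f.mean()                       # rankings are defined up to a constant
    return f, np.linalg.norm(g - d0 @ f)

# g01 = 1, g02 = 3, g12 = 2, so g01 + g12 + g20 = 1 + 2 - 3 = 0.
g_consistent = np.array([1.0, 3.0, 2.0])
f_c, res_c = fit_global_ranking(g_consistent)

# g01 = g12 = g20 = 1 (so g02 = -1): a pure cycle 0 -> 1 -> 2 -> 0.
g_cyclic = np.array([1.0, -1.0, 1.0])
f_h, res_h = fit_global_ranking(g_cyclic)
```

For `g_consistent` the residual is zero and the scores order the items as 0 < 1 < 2; for `g_cyclic` the best global ranking is constant and the whole of `g_cyclic` remains as residual.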

Conclusions and Future Work

Conclusions

• Hodge Theory provides an orthogonal decomposition for pairwise rankings

• This decomposition helps to characterize the cyclicity and (triangular) consistency of pairwise rankings

Future

• Comparisons with other spectral methods
  • Fourier Analysis on symmetric groups (Diaconis)
  • Markov Chain based methods (PageRank, etc.) as graph Laplacians

Design new algorithms

Applications on large scale data sets, e.g. Netflix dataset.

Acknowledgements

Gunnar Carlsson (Stanford)

Persi Diaconis (Stanford)

Nick Eriksson (Stanford)

Fei Han (UCB)

Susan Holmes (Stanford)

Xiaoye Jiang (Stanford)

Ming Ma (UCB and Beijing Institute of Technology)

Michael Mahoney (Yahoo! Research)

Steve Smale (TTI-U Chicago and UCB)

Shmuel Weinberger (U Chicago)