Ranking systems


The PageRank Axioms

Based on Ranking Systems: The PageRank Axioms, by Alon Altman and Moshe Tennenholtz.

Presented by Aron Matskin

Judge and be prepared to be judged. – Ayn Rand

Rabbi Shimon says: There are three crowns: the crown of Torah, the crown of priesthood, and the crown of kingship; and the crown of a good name rises above them all.

– Pirkei Avot

Talking Points

Ranking and reputation in general
Connections to the Internet world
PageRank web ranking system
PageRank representation theorem

Ranking: What

Abilities
Choices
Reputation
Quality
Quality of information
Popularity
Good looks
What not?

Ranking: How

Voting
Reputation systems
Peer review
Performance reviews
Sporting competition
Intuitive or ad-hoc

Ranking Systems’ Properties

Ad-hoc or systematic
Centralized or distributed
Feedback or indicator-based
Peer, “second-party”, or third-party
Update period
Volatility
Other?

Agents Ranking Themselves

Community reputation
Professional associations
Peer review
Performance reviews (in part)
Web page ranking

Ranking: Problems and Issues

Eliciting information
Information aggregation
Information distribution
Truthfulness
Strategic considerations
  Fear of retribution / expectation of kick-backs
  Coalition formation
Agent identification (pseudonym problem)

Need analysis!

Ranking Systems: Analysis

Empirical
  Because theories often lack

Theoretical
  Because theoreticians need to eat, too
  Provides valuable insight

Social Choice Theory

Two approaches:

Normative – from properties to implementations. Example: Arrow’s Impossibility Theorem

Descriptive – from implementation to properties. The Holy Grail: representation theorems (uniqueness results)

PageRank Method

A method for computing a popularity (or importance) ranking for every web page based on the graph of the web.

Has applications in search, browsing, and traffic estimation.

PageRank: Intuition

Internet pages form a directed graph

A node’s popularity measure is a positive real number; a higher number represents higher popularity. Let’s call it the node’s weight

A node’s weight is distributed equally among the nodes it links to

We look for a stationary solution: the sum of weights a page receives from its backlinks is equal to its weight

[Figure: example graph on pages a, b, c with stationary weights a = 2, b = 2, c = 1.]

PageRank as Random Walk

Suppose you land on a random page and proceed by clicking on hyper-links uniformly randomly

Then the (normalized) rank of a page is the probability of visiting it
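The random-walk reading can be checked with a small simulation. The sketch below is illustrative only: it assumes the three-page example graph with links a→b, a→c, b→a and c→b (an edge list read off the example matrix in these slides), and the names `LINKS` and `visit_frequencies` are hypothetical. It counts how often a long uniform random walk visits each page.

```python
import random

# Assumed example graph: each page maps to the pages it links to.
LINKS = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}

def visit_frequencies(links, steps=100_000, seed=0):
    """Estimate the fraction of time a uniform random surfer spends on each page."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    page = rng.choice(sorted(links))    # land on a random page
    visits = {p: 0 for p in links}
    for _ in range(steps):
        visits[page] += 1
        page = rng.choice(links[page])  # follow one outgoing link uniformly at random
    return {p: count / steps for p, count in visits.items()}

freq = visit_frequencies(LINKS)
```

For this graph the frequencies should approach the normalized weights 2/5, 2/5 and 1/5 for a, b and c, up to sampling noise.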

PageRank: Some Math

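Represent the graph as a matrix (rows and columns ordered a, b, c):

\[
A_G = \begin{pmatrix}
0 & 1 & 0 \\
\tfrac12 & 0 & 1 \\
\tfrac12 & 0 & 0
\end{pmatrix}
\]

[Figure: the example graph on pages a, b, c.]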

PageRank: Some Math

Find a solution of the equation:

$A_G r = r$

Under the assumption that the graph is strongly connected, there is only one normalized solution. The real PageRank algorithm does not rely on this assumption; it uses workarounds for graphs that are not strongly connected.

The solution r is the rank vector.
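As a worked instance, take the example matrix over pages a, b, c, which corresponds to the links a→b, a→c, b→a and c→b. Written out row by row, the equation reads as follows; the component names r_a, r_b, r_c are introduced here only for illustration.

```latex
\begin{aligned}
r_a &= r_b \\
r_b &= \tfrac12 r_a + r_c \\
r_c &= \tfrac12 r_a
\end{aligned}
\quad\Longrightarrow\quad
(r_a, r_b, r_c) \propto (2, 2, 1)
```

Normalizing so that the entries sum to 1 gives (2/5, 2/5, 1/5), in agreement with the weights a = 2, b = 2, c = 1 of the intuition slide.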

Calculating PageRank

Take any non-zero vector $r_0$ with non-negative entries

Let $r_{i+1} = A_G r_i$

Then the sequence $r_k$ converges to r

Since the Internet graph is an expander, the convergence is very fast: O(log n) steps to reach given precision
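The iteration can be sketched in a few lines of plain Python. This is a minimal illustration, not the production PageRank algorithm: it assumes the three-page example matrix from these slides, assumes the graph is aperiodic as well as strongly connected (true for the example), and the function name `power_iteration` and its tolerance parameters are hypothetical.

```python
def power_iteration(matrix, tol=1e-12, max_iter=1000):
    """Iterate r <- A r, renormalizing to sum 1, until successive vectors differ by less than tol."""
    n = len(matrix)
    r = [1.0 / n] * n                    # a non-negative, non-zero starting vector
    for _ in range(max_iter):
        nxt = [sum(matrix[i][j] * r[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]   # keep the vector normalized
        if max(abs(x - y) for x, y in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r

# Example matrix from the slides, rows and columns ordered a, b, c.
A_G = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
    [0.5, 0.0, 0.0],
]

rank = power_iteration(A_G)
```

Here `rank` approximates the normalized solution (0.4, 0.4, 0.2), that is, the weights 2, 2, 1 divided by their sum.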

PageRank: The Good News

Intuitive
Relatively easy to calculate
Hard to manipulate
Great for common case searches
May be used to assess quality of information (assuming popularity ≈ trust)

PageRank: The Bad News

PageRank is proprietary to
Webmasters can’t manipulate it, but can
Every change in the algorithm is good for someone and is bad for someone else
Popular become more popular
Popularity ≠ quality of information

The Representation Theorem

We next present a set of axioms (i.e. properties) for ranking procedures.

Some of the axioms are more intuitive than others, but all are satisfied by PageRank.

We then show that PageRank is the only ranking algorithm that satisfies the axioms.

We try to be informal, but convincing.

Ranking Systems Defined

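A ranking system F is a functional that maps every finite strongly connected directed graph (SCDG) G=(V,E) into a reflexive, transitive, and complete binary relation ≤ on V. Ties between distinct vertices are allowed (the examples write them as a = b), so ≤ is not required to be anti-symmetric.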

Ranking Systems: Example

MyRank ranks vertices in G in ascending order of the number of incoming links

[Figure: the example graph G on pages a, b, c.]

MyRank(G): c = a < b

PageRank(G): c < a = b
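The two orderings can be reproduced with a short sketch. It assumes that G is the three-page graph with links a→b, a→c, b→a and c→b (read off the example matrix in these slides) and takes the PageRank weights 2, 2, 1 as given; the names `EDGES`, `my_rank` and `as_ordering` are hypothetical.

```python
from collections import Counter

# Assumed example graph as (source, target) links.
EDGES = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]

def my_rank(edges):
    """Score each vertex by its number of incoming links."""
    nodes = {v for edge in edges for v in edge}
    indegree = Counter(target for _, target in edges)
    return {v: indegree[v] for v in nodes}

def as_ordering(scores):
    """Render scores as an ascending ordering such as 'a = c < b'."""
    groups = {}
    for v, s in scores.items():
        groups.setdefault(s, []).append(v)
    return " < ".join(" = ".join(sorted(groups[s])) for s in sorted(groups))

myrank_order = as_ordering(my_rank(EDGES))
pagerank_order = as_ordering({"a": 2, "b": 2, "c": 1})  # weights taken from the slides
```

The in-degree count ties a with c below b, while the stationary weights tie a with b above c, so the two systems disagree on the same graph.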

Axiom 1: Isomorphism (ISO)

F satisfies ISO iff it is independent of vertex names.

Consequence: symmetric vertices have the same rank.

[Figure: example graph on vertices a, b, e, f, g, h, i, j, in which e = f = g = h = i = j and a = b.]

Axiom 2: Self Edge (SE)

Node v has a self-edge (v,v) in G’, but does not in G. Otherwise G and G’ are identical. Let ≤ be the ranking of G and ≤’ the ranking of G’. F satisfies SE iff for all u,w ≠ v: (u ≤ v ⇒ u <’ v) and (u ≤ w ⇔ u ≤’ w)

PageRank satisfies SE: Suppose v has k outgoing edges in G. Let (r_1,…,r_v,…,r_N) be the rank vector of G; then (r_1,…,r_v(1 + 1/k),…,r_N) is an unnormalized rank vector of G’. Only the weight of v grows, and all other weights stay the same.
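The self-edge claim can be checked exactly on the three-page example. The sketch below assumes the links a→b, a→c, b→a, c→b with unnormalized weights a = 2, b = 2, c = 1, and it scales the weight of v by (1 + 1/k), i.e. r_v + r_v/k, which is the update that keeps the stationarity equations balanced. The helper `is_stationary` is a hypothetical name.

```python
from fractions import Fraction

def is_stationary(edges, rank):
    """Check that every vertex receives exactly its own weight from its backlinks."""
    outdeg = {}
    for src, _ in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
    received = {v: Fraction(0) for v in rank}
    for src, dst in edges:
        received[dst] += Fraction(rank[src]) / outdeg[src]
    return all(received[v] == rank[v] for v in rank)

G = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]   # assumed example graph
r = {"a": Fraction(2), "b": Fraction(2), "c": Fraction(1)}

v, k = "a", 2                              # a has k = 2 outgoing edges in G
G_prime = G + [(v, v)]                     # G' adds the self-edge (a, a)
r_prime = dict(r)
r_prime[v] = r[v] * (1 + Fraction(1, k))   # only the weight of v changes

ok_G = is_stationary(G, r)
ok_G_prime = is_stationary(G_prime, r_prime)
```

Both checks hold: the weight of a rises from 2 to 3 while b and c keep their weights, so a now ranks strictly above b and the relative order of b and c is unchanged.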

Axiom 3: Vote by Committee (VBC)

[Figure: two graphs on pages a, b, c, before and after the transformation.]

1. In the example page a links only to b and c, but there may be more successors of a

2. Incoming links of a and all other links of the successors of a remain the same

Axiom 4: Collapsing (COL)

[Figure: graph with pages a and b, and the collapsed graph with b only.]

1. The sets of predecessors of a and b are disjoint

2. Pages a and b must not link to each other or have self-links

3. The sets of successors of a and b coincide

Axiom 5: Proxy (PRO)

1. All predecessors of x have the same rank

2. |P(x)| = |S(x)|

3. x is the only successor of each of its predecessors

[Figure: proxy node x; equality signs mark nodes of equal rank.]

Useful Properties: DEL

1. |P(b)|=|S(b)|=1

2. There is no direct edge between a and c

3. a and c are otherwise unrestricted

[Figure: graph on a, b, c, d, and the graph on a, c, d after b is deleted.]

DEL: Proof

[Figure sequence over seven slides: graphs on a, b, c, d transformed in turn by VBC, VBC, ISO and PRO, PRO, PRO, VBC, VBC.]

DEL for Self-Edge

It can also be shown that DEL holds for self-edges.

[Figure: node a before and after the transformation.]

Useful Properties: DELETE

1. Nodes in P(x) have no other outgoing edges

2. x has no other edges

[Figure: node x and its neighbors; equality signs mark nodes of equal rank.]

DELETE: Proof

[Figure sequence over two slides: the graph around x, with an auxiliary node y, transformed by COL and then by PRO.]

Useful Properties: DUPLICATE

1. All successors of a are duplicated the same number of times

2. There are no edges from S(a) to S(a)

[Figure: graphs on a, b, c, d before and after duplication.]

DUPLICATE: Proof

[Figure sequence over six slides: graphs on a, b, c, d transformed in turn by VBC, VBC, COL, ISO and PRO, COL⁻¹, VBC⁻¹.]

The Representation Theorem Proof

Given a SCDG G=(V,E) and a,b in V, we eliminate all other nodes in G while preserving the relative ranking of a and b

In the resulting graph G’ the relative ranking of a and b given by the axioms can be uniquely determined. Therefore the axioms rank any SCDG uniquely

It follows that all ranking systems satisfying the axioms coincide

Proof by Example on b and d

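[Figure: example graph on nodes a, b, c, d.]

Matrix of the graph (rows and columns ordered a, b, c, d):

\[
A_G = \begin{pmatrix}
\tfrac13 & 0 & 0 & \tfrac12 \\
\tfrac13 & 0 & 0 & \tfrac12 \\
\tfrac13 & 0 & 0 & 0 \\
0 & 1 & 1 & 0
\end{pmatrix}
\]

Rank vector: a = 3, b = 3, c = 1, d = 4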
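The numbers on this slide can be verified with exact arithmetic. The sketch below assumes the 4×4 matrix and the rank vector 3, 3, 1, 4 as read from the slide (rows and columns ordered a, b, c, d); the helpers `apply` and `compare` are hypothetical names. It checks that the vector is a fixed point of the matrix and prints the pairwise comparisons that the proof by example is expected to reach.

```python
from fractions import Fraction as F

NODES = ["a", "b", "c", "d"]

# Matrix from the slide, rows and columns ordered a, b, c, d.
A = [
    [F(1, 3), 0, 0, F(1, 2)],
    [F(1, 3), 0, 0, F(1, 2)],
    [F(1, 3), 0, 0, 0],
    [0,       1, 1, 0],
]
r = [3, 3, 1, 4]   # candidate rank vector from the slide

def apply(matrix, vector):
    """Return the matrix-vector product as a list."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def compare(u, w, ranks):
    """Describe the relative ranking of two named nodes."""
    ru, rw = ranks[NODES.index(u)], ranks[NODES.index(w)]
    sign = "<" if ru < rw else ">" if ru > rw else "="
    return f"{u} {sign} {w}"

is_fixed_point = apply(A, r) == r
results = [compare("b", "d", r), compare("a", "d", r), compare("a", "b", r)]
```

The vector is a fixed point, and the comparisons b < d, a < d and a = b are the ones the axiomatic reduction should recover for these pairs.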

Step 1: Insert Nodes

[Figure: the graph on a, b, c, d before and after inserting nodes.]

By DEL the relative ranking is preserved

Step 2: Choose Node to Remove

[Figure: graph on a, b, c, d.]

Step 3: Remove “self-edges”

[Figure: graph on a, b, c, d.]

Step 4: Duplicate Predecessors

[Figure: graph on a, b, c, d.]

Step 5: DELETE the Node

[Figure: graph on b, c, d.]

Step 6: DELETE the Extras

There still are nodes to delete: back to Step 2

[Figure: graph on b, c, d.]

Step 2: Choose Node to Remove

Steps 3,4 - no changes

[Figure: graph on b, c, d.]

Step 5: DELETE the Node

[Figure: graph on b, d.]

Step 6: DELETE the Extras

No original nodes to remove: proceed to Step 7

[Figure: graph on b, d.]

Step 7: Balance by Duplication

[Figure: graph on b, d.]

This is our G’

Step 8: Equalize by Reverse DEL

[Figure: graph on b, d.]

By ISO, b = d. By DEL and SE, in G’: b < d.

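Example for a and d

[Figure: two graphs on a, b, c, d.]

After Removal of c

[Figure: graph on a, b, d.]

Duplicate Predecessors of b

[Figure: graph on a, b, d.]

DELETE b

[Figure: graph on a, d.]

DELETE Extras

[Figure: graph on a, d.]

Before Balancing

[Figure: graph on a, d.]

After Balancing

[Figure: graph on a, d.]

Conclusion: a < d.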

What about a and b?

[Figure sequence over six slides: the graph is reduced from nodes a, b, d to nodes a, b.]

Conclusion: a = b.

Concluding Remarks

‘Representation theorems isolate the “essence” of particular ranking systems, and provide means for the evaluation (and potential comparison) of such systems’ – Altman & Tennenholtz

The End