Link Counts: GOOGLE Page Rank engine needs speedup (adapted from G. Golub et al)


Page 1: Link Counts

Link Counts

GOOGLE Page Rank engine needs speedup

[Figure: link graph with the nodes Yahoo!, CNN, DB Pub Server, CS361, Taher’s Home Page, and Sep’s Home Page. One home page is labeled "Linked by 2 Important Pages", the other "Linked by 2 Unimportant pages".]

adapted from G. Golub et al

Page 2: Link Counts

Definition of PageRank

The importance of a page is given by the importance of the pages that link to it.

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

$x_i$: importance of page i
$B_i$: pages j that link to page i
$N_j$: number of outlinks from page j
$x_j$: importance of page j

Page 3: Link Counts

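Definition of PageRank

[Figure: example link graph over Yahoo!, CNN, DB Pub Server, Taher, and Sep, annotated with the link weights 1/2, 1/2, 1, 1 and the values 0.1, 0.1, 0.1, 0.05, and 0.25.]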

Page 4: Link Counts

PageRank Diagram

Initialize all nodes to rank $x_i^{(0)} = \frac{1}{n}$

[Figure: three-node graph; every node has rank 0.333.]

Page 5: Link Counts

PageRank Diagram

Propagate ranks across links (multiplying by link weights)

[Figure: three-node graph; the links carry the propagated values 0.167, 0.167, 0.333, and 0.333.]

Page 6: Link Counts

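PageRank Diagram

$$x_i^{(1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(0)}$$

[Figure: three-node graph; the node ranks after the first update are 0.333, 0.5, and 0.167.]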

Page 7: Link Counts

PageRank Diagram

[Figure: three-node graph; the links carry the propagated values 0.167, 0.167, 0.5, and 0.167.]

Page 8: Link Counts

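PageRank Diagram

$$x_i^{(2)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(1)}$$

[Figure: three-node graph; the node ranks after the second update are 0.5, 0.333, and 0.167.]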

Page 9: Link Counts

PageRank Diagram

After a while…

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

[Figure: three-node graph; the node ranks have settled at 0.4, 0.4, and 0.2.]
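As a quick check of the settled values, assume the three nodes are a, b, and c with links a→b, a→c, b→c, and c→a. This topology is an assumption: the slides show only the numbers, but it reproduces each of them. With $N_a = 2$ and $N_b = N_c = 1$, the ranks $x_a = 0.4$, $x_b = 0.2$, $x_c = 0.4$ satisfy the definition:

```latex
\begin{aligned}
x_a &= \tfrac{1}{N_c}\, x_c = 0.4 \\
x_b &= \tfrac{1}{N_a}\, x_a = \tfrac{1}{2}(0.4) = 0.2 \\
x_c &= \tfrac{1}{N_a}\, x_a + \tfrac{1}{N_b}\, x_b = 0.2 + 0.2 = 0.4
\end{aligned}
```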

Page 10: Link Counts

Computing PageRank

Initialize:

$$x_i^{(0)} = \frac{1}{n}$$

Repeat until convergence:

$$x_i^{(k+1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(k)}$$

$x_i$: importance of page i
$B_i$: pages j that link to page i
$N_j$: number of outlinks from page j
$x_j$: importance of page j
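The initialize-and-repeat loop can be sketched in a few lines of Python. This is a minimal sketch, not the engine's actual implementation: the function name `pagerank_iterate`, the tolerance, and the three-node graph are all assumptions made for illustration. The graph (a→b, a→c, b→c, c→a) is chosen because it reproduces the numbers on the diagram slides. The sketch also assumes every page has at least one outlink.

```python
def pagerank_iterate(out_links, tol=1e-10, max_iter=1000):
    """Repeat x_i <- sum over j in B_i of x_j / N_j until the ranks stop changing.

    out_links maps each page to the list of pages it links to.
    Assumes every page has at least one outlink (N_j > 0).
    """
    nodes = list(out_links)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}            # x_i^(0) = 1/n
    for k in range(max_iter):
        new_x = {v: 0.0 for v in nodes}
        for j, targets in out_links.items():
            share = x[j] / len(targets)        # x_j / N_j
            for i in targets:                  # j is in B_i for each target i
                new_x[i] += share
        change = sum(abs(new_x[v] - x[v]) for v in nodes)
        x = new_x
        if change < tol:                       # converged
            return x, k + 1
    return x, max_iter


# Hypothetical graph, picked to match the values on the diagram slides.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, iterations = pagerank_iterate(graph)
print({v: round(r, 3) for v, r in ranks.items()}, "after", iterations, "iterations")
```

Calling `pagerank_iterate(graph, max_iter=1)` returns the ranks after a single update, which can be compared with the first-update diagram.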

Page 11: Link Counts

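Matrix Notation

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

[Figure: the sum drawn as a matrix-vector product. A row of $P^T$ with the entries 0 .2 0 .3 0 0 .1 .4 0 .1 is highlighted next to the vector $x$, whose entries include .1, .3, .2, .3, .1, .1, .2.]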

Page 12: Link Counts

Matrix Notation

Find x that satisfies:

$$x = P^T x$$

[Figure: the same matrix-vector picture, with the row 0 .2 0 .3 0 0 .1 .4 0 .1 of $P^T$ and the vector $x$ on both sides of the equals sign.]

Page 13: Link Counts

Power Method

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = P^T x^{(k)}$$
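The same iteration in matrix form might look like the sketch below. It assumes that $P$ is the row-normalized link matrix, with $P_{ji} = 1/N_j$ when page j links to page i; the slides do not spell this out, but it makes $x = P^T x$ agree with the sum in the definition. The helper names and the three-node matrix are hypothetical.

```python
def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def power_method(M, tol=1e-10, max_iter=1000):
    """Iterate x <- M x from the uniform vector until the L1 change is below tol."""
    n = len(M)
    x = [1.0 / n] * n                      # x^(0) = (1/n, ..., 1/n)^T
    for _ in range(max_iter):
        new_x = mat_vec(M, x)              # x^(k+1) = M x^(k)
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Assumed example: pages a, b, c with links a->b, a->c, b->c, c->a.
# Row j of P holds 1/N_j in the columns of the pages that j links to.
P = [
    [0.0, 0.5, 0.5],   # a links to b and c
    [0.0, 0.0, 1.0],   # b links to c
    [1.0, 0.0, 0.0],   # c links to a
]
P_T = [list(col) for col in zip(*P)]       # transpose of P
x = power_method(P_T)
print([round(v, 3) for v in x])
```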

Page 14: Link Counts

A side note

PageRank doesn’t actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$.

So the PageRank problem is really:

Find x that satisfies: $x = Ax$

not:

Find x that satisfies: $x = P^T x$

Page 15: Link Counts

Power Method

And the algorithm is really . . .

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = A x^{(k)}$$
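A sketch of the iteration with $A = cP^T + (1-c)E^T$ follows. The slides do not define $c$ or $E$, so two assumptions are made here and should be read as such: $E$ is taken to be the n x n matrix whose entries are all $1/n$, and $c = 0.85$ is used as a default. The function names and the three-node $P$ are likewise hypothetical.

```python
def build_A(P, c=0.85):
    """Return A = c * P^T + (1 - c) * E^T as a list of rows.

    Assumption: E is the n x n matrix with every entry 1/n (so E^T = E).
    Assumption: c = 0.85 is only an illustrative default.
    """
    n = len(P)
    return [[c * P[j][i] + (1.0 - c) / n for j in range(n)] for i in range(n)]


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def power_method(M, tol=1e-10, max_iter=1000):
    """Iterate x <- M x from the uniform vector until the L1 change is below tol."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new_x = mat_vec(M, x)
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Assumed example graph: a->b, a->c, b->c, c->a (row j of P holds 1/N_j).
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
A = build_A(P, c=0.85)
x = power_method(A)
print([round(v, 4) for v in x])
```

With `c=1.0` the matrix reduces to $P^T$, so the result falls back to the ranks of the plain iteration.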

Page 16: Link Counts

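Power Method

Express $x^{(0)}$ in terms of eigenvectors of A

[Figure: bar chart of the coefficients of $x^{(0)}$ on $u_1, \dots, u_5$: 1, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$.]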

Page 17: Link Counts

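Power Method

[Figure: bar chart of the coefficients of $x^{(1)}$ on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2$, $\alpha_3\lambda_3$, $\alpha_4\lambda_4$, $\alpha_5\lambda_5$.]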

Page 18: Link Counts

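Power Method

[Figure: bar chart of the coefficients of $x^{(2)}$ on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2^2$, $\alpha_3\lambda_3^2$, $\alpha_4\lambda_4^2$, $\alpha_5\lambda_5^2$.]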

Page 19: Link Counts

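Power Method

[Figure: bar chart of the coefficients of $x^{(k)}$ on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2^k$, $\alpha_3\lambda_3^k$, $\alpha_4\lambda_4^k$, $\alpha_5\lambda_5^k$.]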

Page 20: Link Counts

Power Method

[Figure: bar chart of the coefficients of $x^{(\infty)}$ on $u_1, \dots, u_5$: only the coefficient 1 on $u_1$ remains.]

Page 21: Link Counts

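Why does it work?

Imagine our n x n matrix A has n linearly independent eigenvectors $u_i$.

$$A u_i = \lambda_i u_i$$

Then, you can write any n-dimensional vector as a linear combination of the eigenvectors of A.

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

[Figure: bar chart of the coefficients of $x^{(0)}$ on $u_1, \dots, u_5$: 1, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$.]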

Page 22: Link Counts

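Why does it work?

From the last slide:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

To get the first iterate, multiply $x^{(0)}$ by A.

$$
\begin{aligned}
x^{(1)} &= A x^{(0)} \\
        &= A u_1 + \alpha_2 A u_2 + \dots + \alpha_n A u_n \\
        &= \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n
\end{aligned}
$$

First eigenvalue is 1.

$$\lambda_1 = 1; \quad 1 > \lambda_2 \ge \dots$$

Therefore:

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$

$\lambda_2, \dots, \lambda_n$: all less than 1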

Page 23: Link Counts

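Power Method

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

[Figure: coefficients on $u_1, \dots, u_5$: 1, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$.]

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$

[Figure: coefficients on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2$, $\alpha_3\lambda_3$, $\alpha_4\lambda_4$, $\alpha_5\lambda_5$.]

$$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \dots + \alpha_n \lambda_n^2 u_n$$

[Figure: coefficients on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2^2$, $\alpha_3\lambda_3^2$, $\alpha_4\lambda_4^2$, $\alpha_5\lambda_5^2$.]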

Page 24: Link Counts

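Convergence

$$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \dots + \alpha_n \lambda_n^k u_n$$

[Figure: coefficients on $u_1, \dots, u_5$: 1, $\alpha_2\lambda_2^k$, $\alpha_3\lambda_3^k$, $\alpha_4\lambda_4^k$, $\alpha_5\lambda_5^k$.]

The smaller $\lambda_2$, the faster the convergence of the Power Method.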
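The decay of the $u_2$ coefficient can be watched directly on a small case. The sketch below uses a hypothetical 2 x 2 matrix $A$ whose columns sum to 1; it is not taken from the slides. For this matrix the eigenvalues are $1$ and $\lambda_2 = 1 - a - b$, with $u_2 = (1, -1)^T$, so the distance of $x^{(k)}$ from $u_1$ should equal $\alpha_2 \lambda_2^k$ at every step.

```python
def coefficient_decay(a, b, steps):
    """Track the u2 coefficient of x^(k) for A = [[1-a, b], [a, 1-b]].

    Returns a list of (measured, predicted) pairs, where
    predicted = alpha2 * lambda2**k.
    """
    A = [[1.0 - a, b],
         [a, 1.0 - b]]                      # columns sum to 1
    lam2 = 1.0 - a - b                      # second eigenvalue, u2 = (1, -1)
    u1 = [b / (a + b), a / (a + b)]         # eigenvector for eigenvalue 1
    x = [0.5, 0.5]                          # x^(0) = (1/n, ..., 1/n)^T
    alpha2 = x[0] - u1[0]                   # x^(0) = u1 + alpha2 * u2
    pairs = []
    for k in range(1, steps + 1):
        x = [A[0][0] * x[0] + A[0][1] * x[1],
             A[1][0] * x[0] + A[1][1] * x[1]]   # x^(k) = A x^(k-1)
        pairs.append((x[0] - u1[0], alpha2 * lam2 ** k))
    return pairs


# Two illustrative choices: lambda2 = 0.6 versus lambda2 = 0.2.
slow = coefficient_decay(a=0.3, b=0.1, steps=10)
fast = coefficient_decay(a=0.6, b=0.2, steps=10)
for k, ((m_slow, _), (m_fast, _)) in enumerate(zip(slow, fast), start=1):
    print(k, abs(m_slow), abs(m_fast))
```

In both runs $\alpha_2 = 0.25$, so the only difference is $\lambda_2$, and the run with the smaller $\lambda_2$ closes in on $u_1$ in fewer steps.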