Roshnika Fernando

PAGERANK

WHY PAGERANK?

The internet is a global system of networks linking to smaller networks.

This system keeps growing, so there must be a way to sort through all the information available.

PageRank is the algorithm used by the search engine Google to sort through internet webpages

A webpage’s rank determines the order in which it appears when a keyword search is performed on Google

Fun Fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages

POPULARITY CONTEST

Rank, at its simplest, is the probability that a webpage will be visited

Sum of rank of all pages is 1

Rank of linked pages affects rank of page

Initially, rank = 1/(total # of pages available) ≈ 0 for internet

DETERMINING RANK

Let P be an n × n stochastic matrix, where n is the total number of pages and $p_{i,j}$ is the probability of going to webpage i from webpage j.

$$p_{i,j} = \frac{\text{\# of links to page } i \text{ from page } j}{\text{\# of links on page } j}$$

Note: i and j are positive integers

Note: There are around 25 billion $p_{i,j}$ combinations on the internet
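As a minimal sketch of how such a matrix could be assembled, the snippet below builds P from a table of outgoing links. It assumes the column convention used later in the deck (each column of P sums to 1, so column j holds the probabilities of leaving page j). The function name `transition_matrix`, the 0-based page numbering, and the 3-page web are hypothetical and chosen only for illustration.

```python
def transition_matrix(links, n):
    """Build an n x n column-stochastic matrix P from outgoing links.

    links[j] lists the pages that page j links to (0-based indices).
    P[i][j] = (# of links to page i from page j) / (# of links on page j).
    """
    P = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            P[i][j] += 1.0 / len(targets)
    return P


# Hypothetical 3-page web: page 0 links to pages 1 and 2,
# page 1 links to page 2, and page 2 links back to page 0.
links = {0: [1, 2], 1: [2], 2: [0]}
P = transition_matrix(links, 3)

for row in P:
    print(row)
```

Every page in this toy web has at least one outgoing link, so each column of the result sums to 1.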

LONG TERM PROBABILITY

After a very long time, what is the probability that web surfers will be at a certain website?

Let $x$ be the stationary distribution vector, where $x_k$ is the probability of being at state $k$.

Since stochastic matrices have eigenvalue λ = 1,

$$Px = \lambda x = x \quad\Longrightarrow\quad (P - I)x = 0$$

Solve for $x$ to determine the long-term probability of being at each webpage (aka the rank)
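One way to picture the "very long time" limit is to apply P to a starting distribution over and over until it stops changing (power iteration). This is a sketch of that idea, not the method stated on the slides. It assumes a column-stochastic P for which the iteration converges, and the 3-page matrix and the name `stationary_distribution` are hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate x with Px = x by repeatedly applying P (power iteration)."""
    n = len(P)
    x = [1.0 / n] * n  # initial rank = 1 / (total # of pages)
    for _ in range(max_iter):
        new_x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x  # best estimate if the tolerance was not reached


# Hypothetical 3-page web, column j = probabilities of leaving page j.
P = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
x = stationary_distribution(P)
print([round(v, 3) for v in x])
```

For this toy matrix the iteration settles at ranks of 0.4, 0.2 and 0.4, which sum to 1.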

SMALL SCALE EXAMPLE

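7 pages linked to one another

$$P = \begin{bmatrix}
0 & 1 & 0.5 & 0 & 0.333 & 0.5 & 0 \\
0.2 & 0 & 0.5 & 0.5 & 0 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0.333 & 0 & 0 \\
0.2 & 0 & 0 & 0.5 & 0 & 0.5 & 1 \\
0 & 0 & 0 & 0 & 0.333 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}, \quad
\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$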

LINEAR PROGRAM

Solve for the vector $x$ using $(P - I)x = 0$, with $P$ the 7 × 7 matrix of the small-scale example, to obtain the PageRank:

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 0.304 \\ 0.166 \\ 0.141 \\ 0.105 \\ 0.179 \\ 0.045 \\ 0.061 \end{bmatrix}$$

The vector $x$ is the eigenvector of $P$ for eigenvalue λ = 1
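The system $(P - I)x = 0$ on its own only fixes $x$ up to a scale factor, so a solver also needs the condition that the ranks sum to 1. The sketch below shows one way to do this by hand with Gauss-Jordan elimination on exact fractions. It assumes the stationary vector is unique, and the function name `solve_pagerank` and the 3-page matrix are hypothetical.

```python
from fractions import Fraction


def solve_pagerank(P):
    """Solve (P - I)x = 0 together with sum(x) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Augmented rows of (P - I | 0).
    A = [
        [Fraction(P[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
        for i in range(n)
    ]
    # The rows of P - I add up to the zero row (columns of P sum to 1),
    # so the last row is redundant and is replaced by sum(x) = 1.
    A[n - 1] = [Fraction(1)] * (n + 1)
    for col in range(n):
        # Assumes a unique solution, so a nonzero pivot always exists.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical 3-page web, column j = probabilities of leaving page j.
P = [
    [0, 0, 1],
    [Fraction(1, 2), 0, 0],
    [Fraction(1, 2), 1, 0],
]
x = solve_pagerank(P)
print([str(v) for v in x])
```

For this toy web the exact ranks are 2/5, 1/5 and 2/5.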

SMALL SCALE SOLUTION

•As t → ∞

•$p_{i,j}$ given

•PageRank:

x1 = .304

x2 = .166

x3 = .141

x4 = .105

x5 = .179

x6 = .045

x7 = .061

SENSITIVITY ANALYSIS

What if a page has no links? What happens to the probability matrix P?

P is stochastic, meaning each of its columns must sum to 1.

If page j has no links leading out, column j would be all zeros, so its probability is distributed evenly over all n rows (n being the total number of pages), $p_{i,j} = 1/n$ for every i, so that

$$\sum_{i=1}^{n} p_{i,j} = 1$$

This assumes that when someone reaches a dead end, the page he or she goes to next is chosen entirely at random
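The dead-end rule above can be sketched in a few lines: any all-zero column is replaced by a uniform column with entries 1/n. The function name `fix_dangling_columns` and the 3-page matrix, in which page 2 has no outgoing links, are hypothetical.

```python
def fix_dangling_columns(P):
    """Return a copy of P in which every all-zero column is set to 1/n."""
    n = len(P)
    fixed = [row[:] for row in P]
    for j in range(n):
        if all(P[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = 1.0 / n  # jump to any page with equal probability
    return fixed


# Hypothetical 3-page web: page 2 is a dead end, so column 2 is all zeros.
P = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
fixed = fix_dangling_columns(P)

for row in fixed:
    print(row)
```

After the fix every column sums to 1 again, so the matrix is stochastic and the stationary vector can be computed as before.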

PROBABILITY AND RANK

The stationary distribution vector x contains the rank of each webpage, which determines the order in which it appears when a keyword search is performed

This rank is the probability that a person will be at each of the billions of pages available online.

This takes several powerful computers to compute.


QUESTIONS?

CITATIONS

Austin, David. "How Google Finds Your Needle in the Web's Haystack." AMS.org. American Mathematical Society. Web. 09 Nov. 2009. <http://www.ams.org/featurecolumn/archive/pagerank.html>.

"PageRank." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/PageRank#False_or_spoofed_PageRank>.

Photograph. PageRanks-Example. Wikipedia, 8 July 2009. Web. 09 Nov. 2009. <http://upload.wikimedia.org/wikipedia/commons/f/fb/PageRanks-Example.svg>.

"Stochastic matrix." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/Stochastic_matrix>.