
PageRank and Markov Chain


DESCRIPTION

A brief introduction to the methodology used by PageRank to rank the webpages.


Page 1: PageRank and Markov Chain

Markov Chains as methodology used by PageRank to rank the Web Pages on Internet.

Sergio S. Guirreri - www.guirreri.host22.com

Google Technology User Group (GTUG) of Palermo.

5th March 2010

Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Internet.5th March 2010 1 / 14


Overview

1 Concepts on Markov Chains.

2 The idea of the PageRank algorithm.

3 The PageRank algorithm.

4 Solving the PageRank algorithm.

5 Conclusions.

6 Bibliography.

7 Internet web sites.


Concepts on Markov Chains.

Stochastic Process and Markov Chains.

Consider the stochastic process

$$\{X_n;\ n = 0, 1, 2, \dots\}$$

with values in a set $E$, called the state space, whose elements are called the states of the process.

We assume that the set $E$ is finite or countable.

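Definition. A Markov chain is a stochastic process $X_n$ with the following property:

$$\Pr\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} = \Pr\{X_{n+1} = j \mid X_n = i\} = p_{ij}(n),$$

where $E$ is the state space and $j, i, i_{n-1}, \dots, i_0 \in E$, $n \in \mathbb{N}$. In the time-homogeneous case, where $p_{ij}(n) = p_{ij}$ does not depend on $n$, the transition probability matrix $P$ of the process $X_n$ is composed of the entries $p_{ij}$, $\forall i, j \in E$; each of its rows sums to 1.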
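To make the definition concrete, here is a minimal Python sketch. It assumes a hypothetical three-state, time-homogeneous chain whose matrix `P` follows the row convention above (row $i$ is the current state). The sketch simulates the chain, where each step looks only at the current state. It then recovers $p_{ij}$ by counting the observed $i \to j$ transitions. The helper names are illustrative only.

```python
import random

# Hypothetical time-homogeneous chain on E = {0, 1, 2}.
# Row i holds p_ij = Prob{X_{n+1} = j | X_n = i}, so every row sums to 1.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]


def is_row_stochastic(matrix, tol=1e-12):
    """True if every row is a probability distribution over the next state."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )


def simulate_chain(P, x0, steps, rng):
    """Return X_0, ..., X_steps; each move depends only on the current state."""
    states = range(len(P))
    path = [x0]
    for _ in range(steps):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path


def estimate_transitions(path, n_states):
    """Estimate p_ij as (# of i -> j moves) / (# of moves out of i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(path, path[1:]):
        counts[i][j] += 1
    estimates = []
    for row in counts:
        total = sum(row)
        estimates.append([c / total if total else 0.0 for c in row])
    return estimates


path = simulate_chain(P, x0=0, steps=100_000, rng=random.Random(42))
P_hat = estimate_transitions(path, len(P))
```

Because each step depends only on the current state, the transitions counted from state $i$ are independent draws from row $i$. Their empirical frequencies therefore approach $p_{ij}$.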


The idea of the PageRank algorithm.

PageRank’s idea. The idea behind the PageRank algorithm is similar to the idea of the impact factor, the index used to rank journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)].

PageRank: the impact factor of the Internet. The impact factor of a journal is defined as the average number of citations per recently published paper in that journal. By regarding each web page as a journal, this idea was extended to measure the importance of a web page in the PageRank algorithm.


Elements of the PageRank.

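To illustrate the PageRank algorithm, I define the following quantities [Ching and Ng(2006)]:

- let $N$ be the total number of web pages in the web;
- let $k$ be the number of outgoing links of web page $j$;
- let $Q$ be the so-called hyperlink matrix, with elements

$$Q_{ij} = \begin{cases} 1/k & \text{if web page } i \text{ is an outgoing link of web page } j, \\ 0 & \text{otherwise,} \end{cases} \qquad Q_{ii} > 0 \ \ \forall i. \tag{1}$$

The hyperlink matrix $Q$ can be regarded as the transition probability matrix of a Markov chain: a surfer on the net is a random walker, and the web pages are the states of the Markov chain. Note that $Q_{ij}$ is the probability of moving from page $j$ to page $i$. Here each column of $Q$ therefore sums to 1, which is the transpose of the row convention $p_{ij}$ used in the definition of a Markov chain.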
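As a hedged illustration of equation (1), the sketch below builds $Q$ for a small, hypothetical four-page web. The web is given as adjacency lists, where `out_links[j]` lists the pages that page $j$ points to. The condition $Q_{ii} > 0$ suggests two assumptions, which the sketch makes: every page also links to itself, and that self-link counts among its $k$ outgoing links. Exact `Fraction` arithmetic lets the column sums be checked exactly.

```python
from fractions import Fraction


def hyperlink_matrix(out_links, n):
    """Build Q with Q[i][j] = 1/k if page j links to page i, 0 otherwise.

    A self-link j -> j is added to every page (so Q_jj > 0) and counted
    among the k outgoing links of page j.
    """
    Q = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        targets = set(out_links.get(j, [])) | {j}
        k = len(targets)
        for i in targets:
            Q[i][j] = Fraction(1, k)
    return Q


# Hypothetical web: page 3 links to page 2, but no other page links to page 3.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
Q = hyperlink_matrix(links, 4)

# Column j is the distribution of the surfer's next page when on page j.
column_sums = [sum(Q[i][j] for i in range(4)) for j in range(4)]
```

In this example, page 3 can be left but never re-entered, so this $Q$ is reducible. That is exactly the situation that the modified matrix of equation (2) addresses.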


The PageRank algorithm.

The PageRank with irreducible Markov Chain.

Assuming that the Markov chain is irreducible (a) and aperiodic (b), the steady-state probability distribution $(p_1, p_2, \dots, p_N)^T$ of the states (web pages) exists and is unique.

(a) A Markov chain is irreducible if all states communicate with each other.
(b) A chain is periodic if there exists $k > 1$ such that the interval between two visits to some state $s$ is always a multiple of $k$. A chain is therefore aperiodic if no such $k$ exists, i.e. its period is $k = 1$.

The PageRank

Each $p_i$ is the long-run proportion of time that the surfer spends visiting web page $i$. The higher the value of $p_i$, the more important web page $i$ is. The PageRank of web page $i$ is then defined as $p_i$.
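The "proportion of time" reading of $p_i$ can be checked with a short simulation. This is a sketch under several assumptions. The four-page web below is hypothetical, and every page includes a self-link, which keeps the chain aperiodic. The cycle 0 → 1 → 2 → 3 → 0 makes the chain irreducible. For this particular graph, solving $p = Qp$ by hand gives $(0.3, 0.2, 0.3, 0.2)^T$, and the surfer's visit frequencies should approach those values.

```python
import random

# Hypothetical irreducible web: page j -> pages it links to. Self-links are
# included, matching the Q_ii > 0 convention (this also makes the chain aperiodic).
out_links = {0: [0, 1, 2], 1: [1, 2], 2: [0, 2, 3], 3: [0, 3]}


def visit_frequencies(out_links, steps, start=0, seed=0):
    """Random surfer: from page j, follow one of its k links uniformly (prob 1/k)."""
    rng = random.Random(seed)
    counts = {page: 0 for page in out_links}
    page = start
    for _ in range(steps):
        page = rng.choice(out_links[page])
        counts[page] += 1
    return {page: c / steps for page, c in counts.items()}


freq = visit_frequencies(out_links, steps=300_000)
```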


The PageRank with reducible Markov Chain

Since the matrix $Q$ may be reducible, the following matrix $P$ is considered instead, to ensure that the steady-state probability distribution exists and is unique:

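$$P = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \tag{2}$$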

where $0 < \alpha < 1$; the most popular values of $\alpha$ are $0.85$ and $1 - 1/N$.

Interpretation of PageRank. The idea behind the PageRank (2) is that, in a network of $N$ web pages, each web page has an inherent importance of $(1 - \alpha)/N$. If a page $P_i$ has importance $p_i$, then it contributes an importance of $\alpha p_i$, which is shared among the web pages it points to.
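Here is a minimal sketch of equation (2). It assumes a hypothetical four-page web in which page 3 receives no links from other pages, so $Q$ alone is reducible. Adding $(1-\alpha)/N$ to every entry makes all entries of $P$ strictly positive. The chain is then irreducible and aperiodic, while each column still sums to 1. Column $j$ reads as follows: with probability $\alpha$ the surfer follows one of the $k$ links of page $j$; otherwise, the surfer jumps to a page chosen uniformly at random.

```python
def google_matrix(out_links, n, alpha=0.85):
    """Return P = alpha * Q + (1 - alpha)/N * (matrix of ones), as in (2).

    out_links[j] lists the pages that page j points to (self-links included),
    so Q[i][j] = 1/k for each target i of page j, where k = len(out_links[j]).
    """
    P = [[(1 - alpha) / n for _ in range(n)] for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            P[i][j] += alpha / len(targets)
    return P


# Hypothetical web with self-links; no other page points to page 3.
out_links = {0: [0, 1, 2], 1: [1, 2], 2: [0, 2], 3: [2, 3]}
N = 4
P = google_matrix(out_links, N, alpha=0.85)
```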


The PageRank with reducible Markov Chain

Solving the following linear system of equations, subject to the normalization constraint, gives the importance $p_i$ of each web page $P_i$:

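$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{3}$$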

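Since $\sum_{i=1}^{N} p_i = 1$, the constant vector in (3) can be written as $\frac{1-\alpha}{N}(1, \dots, 1)^T = \frac{1-\alpha}{N}\,\mathbf{1}\mathbf{1}^T (p_1, \dots, p_N)^T$, where $\mathbf{1}\mathbf{1}^T$ is the all-ones matrix in (2). Equation (3) can therefore be rewritten as

$$(p_1, p_2, \dots, p_N)^T = P\,(p_1, p_2, \dots, p_N)^T,$$

i.e. the PageRank vector is the eigenvector of $P$ associated with the eigenvalue 1.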
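Because (3) is linear in $p$, a tiny web can also be solved directly as $(I - \alpha Q)\,p = \frac{1-\alpha}{N}(1, \dots, 1)^T$. Summing the rows shows that the normalization comes for free. Each column of $Q$ sums to 1, so $\sum_i p_i = \alpha \sum_i p_i + (1 - \alpha)$, and hence $\sum_i p_i = 1$. The sketch below works through a hypothetical example using exact `Fraction` arithmetic and plain Gauss-Jordan elimination; this is not how large-scale PageRank is computed. It checks that the solution satisfies $p = Pp$. For this graph, page 3 is reached only through random jumps and its own self-link, so row 3 of (3) gives $p_3 = 3/46$ exactly.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact Fraction arithmetic."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical web: out_links[j] = pages that page j points to (self-links included).
out_links = {0: [0, 1, 2], 1: [1, 2], 2: [0, 2], 3: [2, 3]}
N = 4
alpha = Fraction(85, 100)

# Q[i][j] = 1/k if page j links to page i, as in equation (1).
Q = [[Fraction(0)] * N for _ in range(N)]
for j, targets in out_links.items():
    for i in targets:
        Q[i][j] = Fraction(1, len(targets))

# Equation (3) rearranged: (I - alpha Q) p = (1 - alpha)/N * (1, ..., 1)^T.
A = [[(1 if i == j else 0) - alpha * Q[i][j] for j in range(N)] for i in range(N)]
b = [(1 - alpha) / N] * N
p = solve(A, b)

# Check against equation (2): P = alpha Q + (1 - alpha)/N * (matrix of ones).
P = [[alpha * Q[i][j] + (1 - alpha) / N for j in range(N)] for i in range(N)]
Pp = [sum(P[i][j] * p[j] for j in range(N)) for i in range(N)]
```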


Solving the PageRank algorithm.

The power method.

The power method is an iterative method for computing the dominant eigenvalue of a matrix and its corresponding eigenvector.

Given an $n \times n$ matrix $A$, the hypotheses of the power method are:

- there is a single dominant eigenvalue, i.e. the eigenvalues can be sorted as

$$|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|;$$

- there is a linearly independent set of $n$ eigenvectors

$$\{u^{(1)}, u^{(2)}, \dots, u^{(n)}\}$$

such that $A u^{(i)} = \lambda_i u^{(i)}$, $i = 1, \dots, n$.
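Under these hypotheses, the method only needs repeated matrix-vector products. The following is a minimal, hypothetical Python sketch. It uses dense lists, rescales to unit 2-norm at every step, and estimates the eigenvalue with the Rayleigh quotient. It also makes two further assumptions. First, $\lambda_1 > 0$, so that the normalized iterates settle to a fixed vector rather than alternating in sign. Second, the starting vector has a nonzero component $a_1$ along $u^{(1)}$. The symmetric matrix below has eigenvalues 3 and 1, so the sketch should return $\lambda_1 \approx 3$ and $u^{(1)} \propto (1, 1)^T$.

```python
import math


def mat_vec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def normalize(x):
    """Rescale x to unit Euclidean norm (avoids overflow/underflow of A^k x)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def power_method(A, x0, tol=1e-12, max_iter=1000):
    """Approximate the dominant eigenpair (lambda_1, u^(1)) of A.

    Assumes |lambda_1| > |lambda_2|, lambda_1 > 0, and a_1 != 0 for x0.
    """
    x = normalize(x0)
    for _ in range(max_iter):
        y = normalize(mat_vec(A, x))
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    lam = sum(a * b for a, b in zip(x, mat_vec(A, x)))  # Rayleigh quotient, |x| = 1
    return lam, x


A = [[2.0, 1.0],
     [1.0, 2.0]]
lam, u = power_method(A, [1.0, 0.0])
```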


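The power method. The initial vector $x^{(0)}$ can be written as

$$x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \dots + a_n u^{(n)}.$$

Applying the matrix $A$ to the initial vector $k$ times gives

$$A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \dots + a_n A^k u^{(n)} = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \dots + a_n \lambda_n^k u^{(n)}.$$

Dividing by $\lambda_1^k$:

$$\frac{A^k x^{(0)}}{\lambda_1^k} = a_1 u^{(1)} + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u^{(2)} + \dots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u^{(n)}.$$

Since $\frac{|\lambda_i|}{|\lambda_1|} < 1$ for $i = 2, \dots, n$, we have $\lim_{k \to \infty} \frac{|\lambda_i|^k}{|\lambda_1|^k} = 0$. Therefore, provided that $a_1 \neq 0$,

$$A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)}$$

for large $k$: the iterates align with the dominant eigenvector $u^{(1)}$.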
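The derivation can be checked exactly on a small case. The sketch below uses a hypothetical $2 \times 2$ matrix with known eigenpairs: $\lambda_1 = 5$ with $u^{(1)} = (1, 1)^T$, and $\lambda_2 = 2$ with $u^{(2)} = (1, -2)^T$. The starting vector is $x^{(0)} = (1, 0)^T = \tfrac{2}{3}u^{(1)} + \tfrac{1}{3}u^{(2)}$. With exact `Fraction` arithmetic, $A^k x^{(0)}/\lambda_1^k$ matches $a_1 u^{(1)} + a_2 (\lambda_2/\lambda_1)^k u^{(2)}$ term by term.

```python
from fractions import Fraction

# Hypothetical 2x2 example with known eigenpairs:
# A u1 = 5 u1 with u1 = (1, 1), and A u2 = 2 u2 with u2 = (1, -2).
A = [[Fraction(4), Fraction(1)],
     [Fraction(2), Fraction(3)]]
lam1, lam2 = Fraction(5), Fraction(2)
u1 = (Fraction(1), Fraction(1))
u2 = (Fraction(1), Fraction(-2))
x0 = [Fraction(1), Fraction(0)]      # x0 = (2/3) u1 + (1/3) u2
a1, a2 = Fraction(2, 3), Fraction(1, 3)


def mat_vec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def scaled_iterate(k):
    """Return A^k x0 / lambda_1^k, computed by k matrix-vector products."""
    x = x0
    for _ in range(k):
        x = mat_vec(A, x)
    return [v / lam1 ** k for v in x]


def predicted(k):
    """Right-hand side of the derivation: a1 u1 + a2 (lambda_2/lambda_1)^k u2."""
    r = (lam2 / lam1) ** k
    return [a1 * c1 + a2 * r * c2 for c1, c2 in zip(u1, u2)]
```

The gap between the scaled iterate and $a_1 u^{(1)}$ shrinks by the factor $|\lambda_2/\lambda_1| = 2/5$ at every iteration.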


Conclusions.

The power method and PageRank.

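Results. The matrix $P$ of the PageRank algorithm is a stochastic matrix, therefore its largest eigenvalue is 1. Since all entries of $P$ are positive, this eigenvalue is also simple and strictly larger in modulus than all the others (Perron's theorem). This is exactly the dominant-eigenvalue hypothesis of the power method.

The convergence rate of the power method depends on the ratio $|\lambda_2| / |\lambda_1|$. It has been shown by [Haveliwala and Kamvar(2003)] that the second largest eigenvalue of $P$ satisfies

$$|\lambda_2| \le \alpha, \qquad 0 \le \alpha \le 1.$$

Since $\lambda_1 = 1$, the convergence rate depends on $\alpha$. The most popular value for $\alpha$ is 0.85. With this value, the power method has been reported to converge in about 50 iterations on a web data set of over 80 million pages.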
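Here is a minimal sketch of the power method applied to PageRank, under stated assumptions. The four-page web is hypothetical, with self-links and a page 3 that no other page links to, and $\alpha = 0.85$. The sketch applies $P$ without forming it, using $Pp = \alpha Q p + \frac{1-\alpha}{N}(1, \dots, 1)^T$, which holds because the iterates sum to 1. Consecutive iterates differ by a vector $d$ for which the uniform term cancels, so $Pd = \alpha Q d$. As a result, each step shrinks the $L_1$ change by at least the factor $\alpha$, which is one concrete sense in which the convergence rate depends on $\alpha$. For this graph, equation (3) gives $p_3 = 3/46$ exactly.

```python
def pagerank_power(out_links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method for p = P p, with P = alpha Q + (1 - alpha)/N * (ones).

    P is never formed: (P p)_i = alpha (Q p)_i + (1 - alpha)/N, valid because
    every iterate sums to 1. Returns p and the L1 change of each iteration.
    """
    p = [1.0 / n] * n                      # uniform starting vector
    changes = []
    for _ in range(max_iter):
        new_p = [(1 - alpha) / n] * n      # random-jump term
        for j, targets in out_links.items():
            share = alpha * p[j] / len(targets)   # alpha * Q_ij * p_j
            for i in targets:
                new_p[i] += share
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        changes.append(change)
        p = new_p
        if change < tol:
            break
    return p, changes


# Hypothetical web: out_links[j] = pages that page j points to (self-links included).
out_links = {0: [0, 1, 2], 1: [1, 2], 2: [0, 2], 3: [2, 3]}
p, changes = pagerank_power(out_links, 4, alpha=0.85)
```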


Many thanks to GTUG Palermo, and see you at the next meeting!


Bibliography.


Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.

Ching, W. and Ng, M. (2006). Markov Chains: Models, Algorithms and Applications. Springer Science + Business Media, Inc.

Haveliwala, T. and Kamvar, S. (2003). The second eigenvalue of the Google matrix. Technical report, Stanford University.

Langville, A., Meyer, C., and Fernández, P. (2008). Google’s PageRank and beyond: the science of search engine rankings. The Mathematical Intelligencer, 30(1), 68–69.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.


Internet web sites.


Jon Atle Gulla (2007) - From Google Search to Semantic Exploration - Norwegian University of Science and Technology - www.slideshare.net/sveino/semantics-and-search?type=presentation

Steven Levy (2010) - Exclusive: How Google’s Algorithm Rules the Web - Wired Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/

Ann Smarty (2009) - Let’s Try to Find All 200 Parameters in Google Algorithm - Search Engine Journal - www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/
