Google PageRank Algorithm
By: Danny Lin
Table of Contents
Google Search
History / What is Page Rank?
Page Rank Algorithm
Inbound/Outbound Links
Dangling Nodes
Constraints
Calculating your page rank
How to maximize your page rank score
Loopholes
Neat stuff
Google Search
Google search using PageRank:
1) Crawl the web and locate all publicly accessible webpages
2) Index the data from step 1 to allow for efficient searches for keywords or phrases
3) Rate the importance of each page in the database – using PageRank
4) Return results in descending order of importance with respect to the search query
Google’s Original Architectural Design
Source: http://infolab.stanford.edu/~backrub/over.gif
History
• Page Rank was conceptualized by Sergey Brin and Lawrence Page and discussed in their 1998 paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine (http://infolab.stanford.edu/~backrub/google.html)
• Used to rank the importance of web pages
Source: https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/PageRank-hi-res.png/1280px-PageRank-hi-res.png
Page Rank Algorithm
PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(Tn) - The importance of page Tn.
C(Tn) - The number of outgoing links for page Tn.
PR(Tn)/C(Tn) - The calculated importance passed to page A from page Tn.
d - The damping factor (usually set to 0.85).
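As a rough illustration of the formula above, here is a minimal Python sketch that applies it repeatedly to a tiny made-up web. The function name `pagerank`, the `links` dictionary, the starting value of 1 per page, and the fixed iteration count are all assumptions for this example; they are not part of the original slides. The sketch also assumes every page has at least one outbound link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) repeatedly.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outbound link.
    """
    pr = {page: 1.0 for page in links}  # assumed starting guess: 1 per page
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            inbound = sum(pr[t] / len(links[t])
                          for t in links if page in links[t])
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example)
print(ranks)
```

In this example page C collects links from both A and B, so it should end up with the highest score, while B, which only receives half of A's vote, ends up lowest.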
Inbound/Outbound Links
With respect to page A:
Inbound links – links that point towards page A
Outbound links – links within page A pointing towards other pages
Dangling Nodes
A dangling node is a page that does not have any outbound links.
Issue: They act as sinks that drain importance from the web.
Solution: Assume that the dangling node has a link to every other page, so the next page is selected at random. This creates a stochastic matrix; all entries are nonnegative and the sum of each column is equal to 1.
Source: http://www.webworkshop.net/images/pr1.gif
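The dangling-node fix can be sketched in Python as follows. The helper name `build_stochastic_matrix` and the four-page example are hypothetical. The sketch follows the slide literally by spreading a dangling page's weight over every other page (1/(n-1) each), and it assumes there are at least two pages; some treatments spread the weight over all n pages instead.

```python
def build_stochastic_matrix(links, pages):
    """Build a column-stochastic link matrix S.

    S[i][j] is the probability of moving from page j to page i,
    so every column sums to 1. Assumes len(pages) >= 2.
    """
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    S = [[0.0] * n for _ in range(n)]
    for page in pages:
        j = index[page]
        targets = links.get(page, [])
        if targets:
            # Split this page's vote evenly across its outbound links.
            for t in targets:
                S[index[t]][j] += 1.0 / len(targets)
        else:
            # Dangling node: pretend it links to every other page.
            for i in range(n):
                if i != j:
                    S[i][j] = 1.0 / (n - 1)
    return S


# Hypothetical web in which D has no outbound links.
pages = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
S = build_stochastic_matrix(links, pages)
column_sums = [sum(S[i][j] for i in range(len(pages))) for j in range(len(pages))]
print(column_sums)
```

Without the `else` branch, column D would be all zeros and the matrix would no longer be stochastic.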
Constraints
Must be primitive, i.e. for some n, S^n has all positive entries; then λ1 = 1 and |λ2| < 1
Must be stochastic, i.e. all entries are nonnegative and the sum of each column is equal to 1.
Must be irreducible, i.e. you should not be able to perform row/column permutations such that you end up with a block upper-triangular form. The nodes must be strongly connected.
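These constraints can be checked mechanically on a small matrix. The sketch below is illustrative only: the function names are made up, and the primitivity test relies on Wielandt's bound, which says that a primitive n×n matrix has all positive entries by the power (n-1)^2 + 1. A primitive matrix is always irreducible, so passing both checks covers all three constraints.

```python
def is_column_stochastic(S, tol=1e-9):
    """True if all entries are nonnegative and every column sums to 1."""
    n = len(S)
    nonnegative = all(x >= 0 for row in S for x in row)
    sums_ok = all(abs(sum(S[i][j] for i in range(n)) - 1.0) <= tol
                  for j in range(n))
    return nonnegative and sums_ok


def is_primitive(S):
    """True if some power of S has all positive entries."""
    n = len(S)
    # Only the zero/nonzero pattern matters, so work with 0/1 entries.
    P = [[1 if x > 0 else 0 for x in row] for row in S]
    power = P  # pattern of S^1
    for _ in range((n - 1) ** 2 + 1):  # checks S^1 up to S^((n-1)^2 + 1)
        if all(x > 0 for row in power for x in row):
            return True
        power = [[1 if any(power[i][k] and P[k][j] for k in range(n)) else 0
                  for j in range(n)]
                 for i in range(n)]
    return False


# Two pages that only link to each other: stochastic but not primitive.
two_cycle = [[0.0, 1.0],
             [1.0, 0.0]]
# Same pages, but the first page also links to itself: primitive.
with_self_link = [[0.5, 1.0],
                  [0.5, 0.0]]
print(is_column_stochastic(two_cycle), is_primitive(two_cycle))
print(is_column_stochastic(with_self_link), is_primitive(with_self_link))
```

The two-page cycle shows why primitivity matters: its powers alternate between two patterns forever, so repeated multiplication never settles for most starting vectors.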
Calculating your page rank
“Page Rank can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix (probability distribution) of the web”
Algorithm to calculate the normalized probability distribution:
1) Multiply the stochastic matrix, S, by an arbitrary starting probability vector, i1, to get a new vector, i2
2) Repeat step 1 with each new vector until i(n-1) ≈ i(n); the vector it converges to is the principal eigenvector
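The two steps above are the power iteration method, and a minimal Python sketch might look like the following. The function name, the uniform starting vector, the tolerance, and the three-page matrix are assumptions chosen for illustration. The example matrix is assumed to meet the constraints (stochastic and primitive), and damping is left out to keep the focus on the iteration itself.

```python
def power_iteration(S, tol=1e-10, max_iter=1000):
    """Repeatedly multiply S by a probability vector until it stops changing."""
    n = len(S)
    v = [1.0 / n] * n  # assumed starting vector: uniform probabilities
    for _ in range(max_iter):
        # Matrix-vector product: new_v = S * v
        new_v = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Stop once successive vectors agree (approximately).
        if sum(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical web: A links to B and C, B links to C, C links to A.
# Column j holds the outbound probabilities of page j (order: A, B, C).
S = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
ranks = power_iteration(S)
print(ranks)  # approximately [0.4, 0.2, 0.4]
```

Solving S·v = v by hand for this matrix gives v = (0.4, 0.2, 0.4), which is the vector the iteration approaches.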
LINEAR ALGEBRA TIME!!! Page Rank calculation time!
How to maximize your page rank score
• Internal linking – having links to other pages within your website (hierarchical or fully meshed)
• Good and plentiful content, e.g. a news website
• Provide a useful service or product, e.g. phpBB, an online bulletin board system
Loopholes
SEO (Search Engine Optimization): optimizing webpages to increase traffic flow, conversions, and revenue ($$)
An issue that arose from this: the selling of links from high PR pages
Source: http://www.bloggingcage.com/wp-content/uploads/2015/07/pr8links.png
Neat stuff
Overview of a Google search (1-2 minutes):
http://www.google.com/insidesearch/howsearchworks/thestory/index.html
How search has evolved (6 minutes):
https://www.youtube.com/watch?v=mTBShTwCnD4
Changes to Google’s search algorithm:
https://moz.com/google-algorithm-change
References
Content
http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
http://www.ams.org/samplings/feature-column/fcarc-pagerank
http://infolab.stanford.edu/~backrub/google.html
http://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf
http://www.google.com/insidesearch/howsearchworks/thestory/index.html
Images
https://lh4.googleusercontent.com/-vAlbgOEKiNI/TtkBZvZLnDI/AAAAAAAAMrw/ooZ1Thuutmw/w1034-h587-no/OriginalGooglePage.PNG
http://infolab.stanford.edu/~backrub/over.gif
https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/PageRank-hi-res.png/1280px-PageRank-hi-res.png
http://www.webworkshop.net/images/pr1.gif
http://www.bloggingcage.com/wp-content/uploads/2015/07/pr8links.png
Questions?
Source: https://lh4.googleusercontent.com/-vAlbgOEKiNI/TtkBZvZLnDI/AAAAAAAAMrw/ooZ1Thuutmw/w1034-h587-no/OriginalGooglePage.PNG