Optimizing Search Engines: A Mathematical Point of View, by Swapnil Kotwal


DESCRIPTION

Optimizing Search Engines (A Mathematical Point of View). The following topics are covered: - A basic introduction to search engine optimization. - Introduction to the Google and Bing webmaster tools. - Use of the Google Toolbar to see the PageRank of each page (how important the Google search engine considers each page). - The PageRank algorithm (the main focus of this talk). - How it is useful for real SEO, and the practical implementation of SEO. - Google bombs.

Page 1: Optimizing search engines

Optimizing Search Engines

A Mathematical Point of View

By Swapnil Kotwal

Page 2: Optimizing search engines

Introduction to SEO

Search engine optimization (SEO): a site can gain search visibility in two ways

Pay per click (paid listings).

The search engine's "natural" or unpaid ("organic") search results.

Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a search engine's "natural" or unpaid ("organic") search results. In general, the earlier (or higher ranked on the search results page) and the more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users (Wikipedia).

Page 3: Optimizing search engines

Optimizing search engines: "organic" search results

Factors that affect the search visibility of your web content:-

Title:- The title tag should be set properly. Search engines don't use the title tag 100% of the time; occasionally, Google pulls the title from the anchor text of a link to that page. Make sure the words in your title match the words used in links to the page.

Snippet:- The snippet is the description of the page that appears beneath the title. Google may pull this from the page's meta description tag. Put relevant sentences there so that the snippet matches likely search queries.

Bolding :-Google bolds the query words anywhere they appear in the search result.

Cached link :- If the page is down or loading slowly, a searcher can still get to the information via the cache. If the page is accidentally deleted, the webmaster can retrieve the data from the cache to recreate the page. Also, the cache shows when the page was crawled.

Page 4: Optimizing search engines

Optimizing search engines: "organic" search results

Metadata:- Other items in a search result include the URL, the page size, etc.

Sitelinks:- Additional links to pages within the same site, shown beneath the main result. They help searchers jump straight to the relevant part of the site.
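To see what a search engine has to work with for the title and the snippet, the two tags mentioned above can be read out of a page. The following is a minimal sketch using Python's standard `html.parser`; the sample page and the class name `SnippetParser` are made up for illustration, and real search engines may choose other text for the title or snippet.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text between <title> and </title> is kept.
        if self._in_title:
            self.title += data


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>PageRank explained</title>
<meta name="description" content="A short mathematical introduction to PageRank.">
</head><body><p>Hello</p></body></html>
"""

parser = SnippetParser()
parser.feed(SAMPLE_PAGE)
title = parser.title.strip()
description = parser.description
print("Title:  ", title)
print("Snippet:", description)
```

If `description` comes back as `None`, the page has no meta description and the search engine has to build the snippet from the page text itself.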

Introduction to webmaster tools (Google and Bing search engines)

Want to keep unwanted pages out of the search results for your web content? Disallow those links in robots.txt, which you can check through the webmaster tools.

All of these operations can be done through HTML meta tags and the webmaster tools provided by the search engines.
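As a rough illustration of how robots.txt rules are read, the sketch below parses a small rule set with Python's standard `urllib.robotparser` and asks whether a crawler may fetch two URLs. The rules, the crawler name `ExampleBot` and the `example.com` URLs are assumptions chosen for the example, not rules from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep two directories away from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up crawler name; it falls under the "*" rules.
private_ok = rules.can_fetch("ExampleBot", "https://example.com/private/notes.html")
index_ok = rules.can_fetch("ExampleBot", "https://example.com/index.html")

print("may crawl /private/notes.html:", private_ok)
print("may crawl /index.html:", index_ok)
```

Note that robots.txt only asks well-behaved crawlers not to fetch a path; it is a request, not access control.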

Page 5: Optimizing search engines

Google PageRank Algorithm

Agenda :-

Fact.

Understanding Page Rank Algorithm.

Analysis.

Case Discussion.

Practical implementation.

Page 6: Optimizing search engines

Agenda :-

Fact.

Understanding Page Rank Algorithm.

Simple calculation of PageRank.

Analysis of PageRank Algorithm.

Case Discussion.

Practical Implementation.

References.

Page 7: Optimizing search engines

Fact

Developed by Larry Page and Sergey Brin at Stanford University in 1996; first published in 1998.

Trademark of Google.

Patented by Stanford University.

Backbone of Google search technology.

Page 8: Optimizing search engines

Understanding the Simple PR Algorithm

Every inbound link increases the weight of a page.

PageRank is based on the number of pages that link to that page.

As a probability, the highest possible PageRank is 1, but in the real world it is shown as a number between 0 and 10 (using a logarithmic scale).

Hence, it helps decide the appropriate SERP (search engine results page) listing.

Calculated from the nature and number of backlinks.

Indicated on Google toolbar.

Page 9: Optimizing search engines

Definition of PageRank Algorithm.

Assume a small universe of four web pages: A, B, C and D. If the only links were from pages B, C and D to page A, each of them would pass its whole PageRank to A:

PR(A) = PR(B) + PR(C) + PR(D)

Now suppose instead that page B has links to pages C and A, page C has a link to page A, and page D has links to all three pages. Each page splits its PageRank equally among its outbound links, so PR is:-

PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3

Let L() denote the number of outbound links of a page; then

PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D), and the general summation is:

Page 10: Optimizing search engines

Understanding PageRank

$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

i.e. the PageRank value for a page u depends on the PageRank value of each page v contained in the set B_u (the set of all pages linking to page u), divided by the number L(v) of links from page v.
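The four-page example can be checked with a few lines of Python. This is a minimal sketch, assuming every page starts with an equal rank of 1/4 (a starting value chosen here for illustration); `simple_pagerank` is a hypothetical helper name.

```python
from fractions import Fraction

# Outbound links of each page: B -> C, A; C -> A; D -> A, B, C; A links nowhere.
links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}

# Assumed starting point: every page holds an equal share of the total rank.
rank = {page: Fraction(1, 4) for page in links}


def simple_pagerank(target, links, rank):
    """PR(target) = sum of PR(v) / L(v) over all pages v that link to target."""
    return sum(rank[v] / len(outs) for v, outs in links.items() if target in outs)


pr_a = simple_pagerank("A", links, rank)
print(pr_a, float(pr_a))  # 1/8 + 1/4 + 1/12 = 11/24
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation PR(B)/2 + PR(C)/1 + PR(D)/3.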

Introduction to the damping factor (by Sergey Brin) (d = 0.85):-

The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is the damping factor d, which is generally set to 0.85.

So the generalized PageRank algorithm, for a set of N pages, is:-

$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in B_{p_i}} \frac{PR(p_j)}{L(p_j)}$$
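The damped formula can be applied repeatedly until the values settle. The sketch below is one simple way to do that, not Google's implementation: the three-page graph is hypothetical, the function name `pagerank` is chosen here, and the code assumes every page has at least one outbound link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)) over pages q linking to p.

    Assumes every page in `links` has at least one outbound link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank


# Hypothetical graph: A -> B, C; B -> C; C -> A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
print("sum:", round(sum(ranks.values()), 4))
```

With the (1 - d)/N form the ranks behave like a probability distribution and sum to 1. Page C collects links from both other pages, so it ends up with the highest rank.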

Page 11: Optimizing search engines

Understanding PageRank

A Simple Example:- Consider a small universe (a set of N pages) where we have only two web pages, A and B, each linking to the other. The example uses the unnormalized form of the formula, PR = (1 – d) + d ∑ PR(v)/L(v), in which the PageRanks sum to N rather than to 1.

Guess 1st :- Say the initial PageRank of each page is 1.0 and d = 0.85

PR(A) = (1 – d) + d(PR(B)/1) and PR(B) = (1 – d) + d(PR(A)/1). We get,

PR(A) = 0.15 + 0.85 * 1 = 1 and PR(B) = 0.15 + 0.85 * 1 = 1

Guess 2nd :- Say the initial PageRank of each page is 40 and d = 0.85

PR(A) = (1 – d) + d(PR(B)/1) and PR(B) = (1 – d) + d(PR(A)/1). We get,

Page 12: Optimizing search engines

Understanding PageRank

First Calculation:-

PR(A) = 0.15 + 0.85 * 40 = 34.15,

PR(B) = 0.15 + 0.85 * 34.15 = 29.1775

Second Calculation:-

PR(A) = 0.15 + 0.85 * 29.1775 = 24.950875

PR(B) = 0.15 + 0.85 * 24.950875 = 21.35824375 and so on …

On the Kth Calculation:- When the values stop changing, the sum of the PageRanks of all pages equals the number of pages in the set (here 2), and those values are the PageRanks of the pages.

The average PageRank converges to 1, whatever the initial guess.
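The second guess can be followed through to the end with a short loop. This is a minimal sketch of the same two-page calculation, updating PR(A) first and then using the new PR(A) for PR(B), as in the hand calculation; the number of steps (200) is an arbitrary choice that is more than enough for the values to settle.

```python
d = 0.85
pr_a = pr_b = 40.0  # deliberately bad initial guess

history = []
for step in range(200):
    pr_a = (1 - d) + d * pr_b  # A is linked only from B, and L(B) = 1
    pr_b = (1 - d) + d * pr_a  # B is linked only from A, and L(A) = 1
    history.append((pr_a, pr_b))

for step, (a, b) in enumerate(history[:3], start=1):
    print(f"calculation {step}: PR(A) = {a:.6f}, PR(B) = {b:.6f}")
print(f"after {len(history)} calculations: PR(A) = {pr_a:.6f}, PR(B) = {pr_b:.6f}")
```

The first calculation gives PR(A) = 34.15 and PR(B) = 29.1775, and both values keep shrinking towards 1, so the sum approaches 2, the number of pages.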

Page 13: Optimizing search engines

Linear system of equations

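Assume that in a small set 'x' we have pages 1, 2, 3, 4, where page 1 links to pages 2, 3 and 4; page 2 links to pages 3 and 4; page 3 links to page 1; and page 4 links to pages 1 and 3. Then the transition matrix will be

$$A = \begin{pmatrix} 0 & 0 & 1 & \tfrac{1}{2} \\ \tfrac{1}{3} & 0 & 0 & 0 \\ \tfrac{1}{3} & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{3} & \tfrac{1}{2} & 0 & 0 \end{pmatrix}$$

Please note some observations here (column j shows what page j donates, row i shows what page i gains):-

Page 1:- donates 1/3 + 1/3 + 1/3 = 1 and gains 1 + 1/2 = 1.5 importance.

Page 2:- donates 1/2 + 1/2 = 1 and gains 1/3 = 0.33 importance.

Page 3:- donates 1 and gains 1/3 + 1/2 + 1/2 = 1.33 importance.

Page 4:- donates 1/2 + 1/2 = 1 and gains 1/3 + 1/2 = 0.83 importance.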
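The donated and gained amounts can be read off the transition matrix as column sums and row sums. The sketch below builds the matrix from a link structure and computes both. The link structure (page 1 links to 2, 3, 4; page 2 to 3, 4; page 3 to 1; page 4 to 1, 3) is an assumption, chosen because it is consistent with the 1/31 normalization used later in this deck.

```python
from fractions import Fraction

# Assumed link structure for the four-page example.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
pages = sorted(links)

# A[i][j] is the share of page j's importance that flows to page i:
# 1 / (number of outbound links of j) if j links to i, otherwise 0.
A = [
    [Fraction(1, len(links[j])) if i in links[j] else Fraction(0) for j in pages]
    for i in pages
]

# Column sums: what each page donates. Row sums: what each page gains.
donates = {j: sum(A[r][c] for r in range(len(pages))) for c, j in enumerate(pages)}
gains = {i: sum(A[r]) for r, i in enumerate(pages)}

for p in pages:
    print(f"Page {p}: donates {donates[p]}, gains {gains[p]} (= {float(gains[p]):.2f})")
```

Every page that has outbound links donates exactly 1 in total, which is why each column of the matrix sums to 1.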

Page 14: Optimizing search engines

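Solving Linear Equation:-

Arrange the linear system of equations A x = x, where x = (x1, x2, x3, x4) holds the importance of pages 1 to 4:

$$\begin{aligned} x_1 &= x_3 + \tfrac{1}{2} x_4 \\ x_2 &= \tfrac{1}{3} x_1 \\ x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 + \tfrac{1}{2} x_4 \\ x_4 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 \end{aligned}$$

Solving this system by the substitution method (substitute the value of x2 into the equation for x4, and then x2 and x4 into the equation for x3), we get,

$$x_2 = \tfrac{1}{3} x_1, \qquad x_4 = \tfrac{1}{2} x_1, \qquad x_3 = \tfrac{3}{4} x_1$$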

Page 15: Optimizing search engines

Solving Linear Equation:-

We get that the eigenvectors corresponding to the eigenvalue 1 are of the form

$$x = c \begin{pmatrix} 12 \\ 4 \\ 9 \\ 6 \end{pmatrix}$$

Here the value of x1 is not fixed, so take c = x1/12 as a free constant and choose it so that we get the eigenvector whose entries sum to 1.
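The claim that the vector (12, 4, 9, 6) is unchanged by the transition matrix can be checked exactly. The sketch below assumes the four-page matrix of the Cornell lecture example (page 1 links to 2, 3, 4; page 2 to 3, 4; page 3 to 1; page 4 to 1, 3) and uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction as F

# Assumed transition matrix of the four-page example (each column sums to 1).
A = [
    [F(0),    F(0),    F(1), F(1, 2)],
    [F(1, 3), F(0),    F(0), F(0)],
    [F(1, 3), F(1, 2), F(0), F(1, 2)],
    [F(1, 3), F(1, 2), F(0), F(0)],
]

v = [F(12), F(4), F(9), F(6)]

# Matrix-vector product A v; for an eigenvector with eigenvalue 1 it equals v.
Av = [sum(a * x for a, x in zip(row, v)) for row in A]
print("A v =", [str(x) for x in Av])

# Rescale so the entries sum to 1 (the constant is 1/31).
normalized = [x / sum(v) for x in v]
print("normalized:", [str(x) for x in normalized])
```

Any multiple of this vector is also an eigenvector, which is why a normalization (entries summing to 1) is needed to pin down one answer.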

Page 16: Optimizing search engines

Solving Linear Equation:-

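We can choose the constant as 1/31, since 12 + 4 + 9 + 6 = 31,

so that the sum of the PageRanks is

PR(x) = 12/31 + 4/31 + 9/31 + 6/31 ≈ 0.387 + 0.129 + 0.290 + 0.194 = 1

(The PageRanks form a probability distribution, so they sum to 1 and no single PageRank exceeds 1.)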
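Instead of solving the equations by hand, the same values can be approached by repeatedly multiplying a starting vector by the transition matrix (power iteration). This is a minimal sketch under the same assumed four-page link structure (page 1 links to 2, 3, 4; page 2 to 3, 4; page 3 to 1; page 4 to 1, 3); no damping factor is used, and 100 iterations is an arbitrary but ample choice for this small matrix.

```python
# Assumed transition matrix of the four-page example (each column sums to 1).
A = [
    [0.0,   0.0, 1.0, 0.5],
    [1 / 3, 0.0, 0.0, 0.0],
    [1 / 3, 0.5, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
]

rank = [0.25, 0.25, 0.25, 0.25]  # start with equal importance for every page

for _ in range(100):
    # One step: every page hands its importance to the pages it links to.
    rank = [sum(a * x for a, x in zip(row, rank)) for row in A]

expected = [12 / 31, 4 / 31, 9 / 31, 6 / 31]
for page, (got, want) in enumerate(zip(rank, expected), start=1):
    print(f"page {page}: iterated {got:.4f}, exact {want:.4f}")
```

Because every column of the matrix sums to 1, each step keeps the total at 1, so no rescaling is needed along the way.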

Page 17: Optimizing search engines

How does PR help you?

How is it useful to me?

Getting many links to your web content can increase your search visibility, and a link from a highly ranked page to your page improves your position in the search results.

Page 18: Optimizing search engines

Google Bomb :-

The term Google bomb refers to creating large numbers of links that cause a web page to rank highly for searches on unrelated or off-topic keyword phrases, often for comical or satirical purposes.

Example of Google bomb:- Search For “completely wrong” in Google.

Page 19: Optimizing search engines

References:-

I would like to thank Dr. Vinayak Joshi, Department of Mathematics, University of Pune, who introduced me to this algorithm and motivated me to deliver a session in 2009.

Wikipedia, PageRank: http://en.wikipedia.org/wiki/PageRank

Department of Mathematics, Cornell University, Lectures 3 and 6.

Linear Algebra by Vivek Sahai and Vikas Bist.

Page 20: Optimizing search engines

Questions?


Thanks !