Authoritative Sources in a Hyperlinked Environment
Jon M. Kleinberg, ACM-SIAM Symposium, 1998
Krishna Venkateswaran


1


Basic Idea
R is grown into a set S so that S contains a rich supply of authoritative pages. Any page that is pointed to by a page in R is included in S, along with up to d pages that point to each page in R.

[Figure: the root set R, containing the top t results, nested inside the base set S generated by the algorithm.]

S is used to determine the hubs and authorities.

2


Get a set of results for a query string from a text-based search engine.

Take the top t results and put them in a set R.

Start with S equal to R. For every page p in R:
◦ Add to S all the pages that p points to.
◦ Add to S at most d pages that point to p.

Result returned: the base set S, from which we compute the top authorities and hubs.
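The construction of S can be sketched in Python. This is a minimal sketch, assuming the text-search results arrive as a ranked list of URLs and that the link structure is available through two hypothetical dictionaries, `out_links` (page to the pages it points to) and `in_links` (page to the pages pointing to it); the function name `build_base_set` is also illustrative. The slides do not say which d in-linking pages to keep, so the sketch simply takes the first d listed.

```python
from typing import Dict, List, Set


def build_base_set(
    ranked_results: List[str],
    out_links: Dict[str, List[str]],
    in_links: Dict[str, List[str]],
    t: int,
    d: int,
) -> Set[str]:
    """Grow the root set R (top-t results) into the base set S."""
    root = ranked_results[:t]      # R: the top t results of the text search
    base: Set[str] = set(root)     # S starts out equal to R
    for p in root:
        # Add every page that p points to.
        base.update(out_links.get(p, []))
        # Add at most d pages that point to p (here: the first d listed).
        base.update(in_links.get(p, [])[:d])
    return base
```

Capping the in-links at d matters because a popular page may be pointed to by a very large number of pages, while its out-links are usually few.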

3


Heuristics
To determine which pages and links to add to the set S:

Heuristic 1: Avoid navigational links.
◦ Transverse links: links between pages with different domain names.
◦ Intrinsic links (navigational links): links between pages within the same domain.
◦ Delete all intrinsic links.

Heuristic 2: Avoid mass endorsements.
◦ Mass endorsement: a large number of pages in one domain pointing to a single page.
◦ Example: "This site is designed by …" and a link.
◦ Eliminate this by setting a parameter m and allowing at most m pages from a single domain to point to any given page.
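One possible way to apply both heuristics to a list of links, assuming each link is a (source URL, target URL) pair and treating the host name returned by `urllib.parse.urlparse` as the page's domain (a simplification). The names `filter_links` and `domain` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlparse

Link = Tuple[str, str]  # (source URL, target URL)


def domain(url: str) -> str:
    # Assumption: the host name stands in for the page's "domain".
    return urlparse(url).netloc.lower()


def filter_links(links: List[Link], m: int) -> List[Link]:
    """Drop intrinsic links (Heuristic 1) and cap mass endorsements (Heuristic 2)."""
    kept: List[Link] = []
    # (source domain, target URL) -> links from that domain kept so far
    endorsements: Dict[Tuple[str, str], int] = defaultdict(int)
    for src, dst in dict.fromkeys(links):  # de-duplicate, preserving order
        src_domain = domain(src)
        if src_domain == domain(dst):
            continue  # Heuristic 1: intrinsic (navigational) link
        key = (src_domain, dst)
        if endorsements[key] >= m:
            continue  # Heuristic 2: m pages from this domain already point here
        endorsements[key] += 1
        kept.append((src, dst))
    return kept
```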

4


Authorities are extracted from the overall collection of pages through an analysis of the link structure of G, the graph of links among the pages of the base set S.

A good hub points to many good authorities, and a good authority is pointed to by many good hubs.

[Figure: hubs pointing to authorities, next to an unrelated page of large in-degree.]
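Once S is fixed, G can be stored as an adjacency list restricted to S. This sketch assumes G keeps exactly those links whose source and target both lie in S (after the heuristics have been applied); the helper name `induced_graph` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def induced_graph(
    base_set: Set[str], links: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Adjacency list of G: keep only links whose endpoints are both in S."""
    graph: Dict[str, List[str]] = {p: [] for p in sorted(base_set)}
    for src, dst in links:
        if src in base_set and dst in base_set and dst not in graph[src]:
            graph[src].append(dst)  # record each link once
    return graph
```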

5


Basic Idea
Each page p has a non-negative authority weight and a non-negative hub weight.

If p points to many pages with large authority weights, then p has a large hub weight.

If p is pointed to by many pages with large hub weights, then p has a large authority weight.

Pages with larger authority weights are better authorities; pages with larger hub weights are better hubs.

6


I operation:
◦ Authority weight of a page = sum of the hub weights of all pages pointing to that page.

O operation:
◦ Hub weight of a page = sum of the authority weights of all pages that this page points to.

I and O reinforce each other.

Normalization: the authority weights and the hub weights are each divided by a constant so that, for each kind of weight, the squares of the weights sum to 1.
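The I and O operations and the normalization step can be written as three small functions. This is a sketch under the assumption that the graph is a dictionary mapping each page to the pages it points to, with every target also present as a key (as it is when G is restricted to S); the function names are illustrative.

```python
import math
from typing import Dict, List

Graph = Dict[str, List[str]]  # page -> pages it points to


def i_operation(graph: Graph, hub: Dict[str, float]) -> Dict[str, float]:
    """Authority weight of p = sum of hub weights of the pages q with q -> p."""
    auth = {p: 0.0 for p in graph}
    for q, targets in graph.items():
        for p in targets:
            auth[p] += hub[q]
    return auth


def o_operation(graph: Graph, auth: Dict[str, float]) -> Dict[str, float]:
    """Hub weight of p = sum of authority weights of the pages q with p -> q."""
    return {p: sum((auth[q] for q in graph[p]), 0.0) for p in graph}


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Divide by the Euclidean norm so that the squares of the weights sum to 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return dict(weights)  # all-zero weights: nothing to scale
    return {p: w / norm for p, w in weights.items()}
```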

7


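Contd...

[Figure: in Operation I, pages q1, q2, q3 point to page p; in Operation O, page p points to pages q1, q2, q3.]

Operation I: $x[p] = \sum_{q \,:\, q \to p} y[q]$, i.e. the sum of all y[q] over the pages q that point to p.

Operation O: $y[p] = \sum_{q \,:\, p \to q} x[q]$, i.e. the sum of all x[q] over the pages q that p points to.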

Deciding when to stop the reinforcement process:
1) Apply the I and O operations alternately until a fixed point is reached.
2) Choose a parameter k and iterate the process exactly k times.

8
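Option 1 (iterate until a fixed point) might look like this in matrix form with NumPy. The sketch assumes an adjacency matrix `A` with `A[i, j] = 1` when page i points to page j, so the I operation becomes `A.T @ y` and the O operation becomes `A @ x`. Starting all weights at 1, treating "fixed point" as "no weight changes by more than `tol`", and capping the loop at `max_iter` are assumptions, not details from the slides; the sketch also assumes G has at least one link, since otherwise normalization would divide by zero.

```python
import numpy as np


def iterate_to_fixed_point(A: np.ndarray, tol: float = 1e-10, max_iter: int = 1000):
    """Alternate I and O until the weights stop changing (or max_iter is hit)."""
    n = A.shape[0]
    x = np.ones(n)  # authority weights
    y = np.ones(n)  # hub weights
    for step in range(1, max_iter + 1):
        x_new = A.T @ y        # I operation: x[p] = sum of y[q] over q -> p
        y_new = A @ x_new      # O operation: y[p] = sum of x[q] over p -> q
        x_new /= np.linalg.norm(x_new)  # squares of x sum to 1
        y_new /= np.linalg.norm(y_new)  # squares of y sum to 1
        change = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if change < tol:
            return x, y, step  # (approximate) fixed point reached
    return x, y, max_iter
```

Option 2 corresponds to dropping the tolerance test and always running a fixed number of steps.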


Given the set of pages in the form of a graph, choose an integer value for the parameter k, the number of iterations. Repeat the following process k times:
◦ Apply the I operation to every page to update its authority weight.
◦ Apply the O operation to every page to update its hub weight.
◦ Normalize both the authority weights and the hub weights.

Return the graph with the new authority weight and hub weight of each page.
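A minimal pure-Python sketch of the Iterate procedure with a fixed k. The graph format (page to list of pages it points to, every target also a key) and the function names are assumptions; starting every weight at 1 follows Kleinberg's paper but is not stated on this slide.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # page -> pages it points to (all within G)


def _normalize(w: Dict[str, float]) -> Dict[str, float]:
    # Divide by the Euclidean norm so the squares of the weights sum to 1.
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {p: v / norm for p, v in w.items()} if norm else dict(w)


def iterate(graph: Graph, k: int) -> Tuple[Dict[str, float], Dict[str, float]]:
    auth = {p: 1.0 for p in graph}  # x: authority weights, initially 1
    hub = {p: 1.0 for p in graph}   # y: hub weights, initially 1
    for _ in range(k):
        # I operation: x[p] = sum of y[q] over pages q that point to p
        new_auth = {p: 0.0 for p in graph}
        for q, targets in graph.items():
            for p in targets:
                new_auth[p] += hub[q]
        # O operation (using the new authority weights):
        # y[p] = sum of x[q] over pages q that p points to
        new_hub = {p: sum((new_auth[q] for q in graph[p]), 0.0) for p in graph}
        # Normalize both kinds of weight
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return auth, hub
```

Because O is linear, it makes no difference whether it is applied to the authority weights before or after they are normalized: the hub weights are normalized afterwards anyway.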

9


Observations
The top authorities and hubs are the pages with the c largest x values (authority weights) and the c largest y values (hub weights), respectively, in the graph returned by the Iterate algorithm.

The Iterate procedure converges to fixed points x* and y* as k increases arbitrarily.
◦ Proved using principal eigenvectors: x* is the principal eigenvector of A^T A and y* that of A A^T, where A is the adjacency matrix of G.

The Iterate algorithm yields a densely linked collection of pages that is rich in relevant pages.
◦ The most relevant collection of pages is the densest part of the graph.
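The eigenvector characterization can be checked numerically. The NumPy sketch below takes x* and y* to be the unit principal eigenvectors of A^T A and A A^T (A being the adjacency matrix of G, with `A[i, j] = 1` when page i points to page j) and then reads off the top c authorities or hubs. It assumes the largest eigenvalue is unique, so the principal eigenvector is well defined up to sign; the helper names `principal_eigenvector` and `top_c` are hypothetical.

```python
import numpy as np


def principal_eigenvector(M: np.ndarray) -> np.ndarray:
    """Unit eigenvector of the largest eigenvalue of symmetric M, sign made non-negative."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigenvectors[:, -1]
    return v if v.sum() >= 0 else -v


def top_c(pages, weights: np.ndarray, c: int):
    """The c pages with the largest weights, highest first."""
    order = np.argsort(-weights, kind="stable")[:c]
    return [(pages[i], float(weights[i])) for i in order]
```

For example, with `x_star = principal_eigenvector(A.T @ A)`, calling `top_c(pages, x_star, c)` lists the top c authorities; the same call on `principal_eigenvector(A @ A.T)` lists the top c hubs.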

10


Results

(java) Authorities
.328 http://www.gamelan.com/ Gamelan
.251 http://java.sun.com/ JavaSoft Home Page
.190 http://www.digitalfocus.com/digitalfocus/faq/howdoi.html The Java Developer: HowDoI
.190 http://lightyear.ncsa.uiuc.edu/srp/java/javabooks.html The Java Book

("search engines") Authorities
.346 http://www.yahoo.com/ Yahoo!
.291 http://www.excite.com/ Excite
.231 http://www.lycos.com/ Lycos Home Page
.231 http://www.altavista.digital.com/ AltaVista: Main Page

(Gates) Authorities
.643 http://www.roadahead.com/ Bill Gates: The Road Ahead
.458 http://www.microsoft.com/ Welcome to Microsoft
.440 http://www.microsoft.com/corpinfo/bill-g.htm

It was observed that www.roadahead.com was the only one of these authorities that was present in the root set R initially.

This supports the algorithm, because many of the pages it returns do not contain the search query themselves.

11