Upload
dominic-horton
View
213
Download
0
Embed Size (px)
Citation preview
CS155b: E-Commerce
Lecture 16: April 10, 2001
WWW Searching and Google
WWW Digraph
• More than 1 Billion Nodes (Pages)
• Average Degree (links/Page) is 5-15. (Hard to Compute!)
• Massive, Distributed, Explicit Digraph
(Not Like Call Graphs)
“Hot” Research Area
• Graph Representation
• Duplicate Elimination
• Clustering
• Ranking Query Results
http://theory.stanford.edu/~focs98/tutorials.html
A. Broder & M. Henzinger
“Abundance” Problem
http://simon.cs.cornell.edu/home/kleinber/ ….. kleinber.html
• Given a query find:– Good Content (“Authorities”)– Good Sources of Links (“Hubs”)
• Mutually Reinforcing
• Simple (Core) Algorithm
A
H
T = {n Pages}, A = {Links}
Xp >0, p T non-negative “Authority Weights”
Yp >0, p T non-negative “Hub Weights”
I operation Update Authority Weights
Xp Yq
O operation Update Hub Weights
Yp Xq
Normalize: X2 = Y2 = 1
(q,p) A
(p,q) A
pp T p T
p
Core Algorithm
Z (1,1,…,1)
X Y Z
Repeat until Convergence
Apply I /* Update Authority weights */
Apply O /* Update Hub Weights */
Normalize
Return Limit (X*, Y*)
Convergence of (Xi, Yi) = (OI)i(Z,Z)
A = n x n “Adjacency Matrix”
Rewrite I and O:
X ATY ; Y AX
Xi = (ATA) i-1 ATZ ; Yi = (AAT)iZ
AAT Symm., Non-negative and Z = (1,1,…, 1)
X* = lim Xi = 1(ATA)
Y* = lim Yi = 1 (AAT)i
i
88
Whole Algorithm (k,d,c)
q Search Engine |S| < k
Base Set T:(In S, S , S) and <d links/page
Remove “Internal Links”Run Core Algorithm on TFrom Result (X,Y), Select
C pages with max X* valuesC pages with max Y* values
Examples (k= 200, d=5)q = censorship + net
www.EFF.orgwww.EFF.org/BlueRib.html
www.CDT.orgwww.VTW.orgwww.ACLU.prg
q = Gateswww.roadahead.comwww.microsoft.comwww.ms.com/corpinfo/bill-g.html
[Compares well with Yahoo, Galaxy, etc.]
Approach to “Massiveness”:Throw Out Most of G!!
• Non-principal Eigenvectors correspond to
“Non-principal Communities”
• Open (?):
Objective Performance Criteria
Dependence on Search Engine
Nondeterministic Choice of S and T
Google History
• Founded in 1998 by Larry Page and Sergey Brin, two Stanford Ph.D. candidates.
• Privately held company, whose backers include Kleiner Perkins Caufield & Byers and Sequoia Capital.
• Continues to win top awards for Search Engines. Computer Scientists love it!!!
Major Partners
• Yahoo!• Palm• Nextel• Netscape• Cisco Systems• Virgin Net• Netease.com• RedHat• Virgilio• Washingtonpost.com
Business Model
• The company delivers services through its own web site at www.google.com and by licensing its search technology to commercial sites
• Advertising: – Premium Sponsorship – Purchase a keyword
– AdWords – Manage your Ad text
I’d like to buy a Keyword
The advertiser’s text-based ad will appear at the top of a Google results page whenever the keyword they have purchased is included in a user's search.
The ads appear adjacent to, but are distinguished from, the results listings.
Category purchase
• Google uses a classification system to create an ongoing "Virtual Directory" of categories an advertiser can purchase.
• Advertisers can select the categories most appropriate to their business and Google will match the most relevant category ads to each user's search.
• The advantage of this approach is that it covers a broader audience that might be missed through the purchase of keywords alone.