8 — Search Engines
From Code to Product gidgreen.com/course
The Google bomb
From Code to Product Lecture 8 — Search Engines — Slide 2 gidgreen.com/course
Feeling lucky?
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
History of web search
• 1994: WebCrawler, first full text
• 1994: Yahoo, directory then portal
• 1995: AltaVista, first big index
• 1997: Google, link citation analysis
• 2000–2004: Yahoo uses Google
• 2000: Baidu, now leader in China
• 2006: Microsoft Live Search
• 2009: Bing, switched to by Yahoo
Importance of search
• US: 17.1B “core searches” in April 2012 – 55 per US citizen [comScore]
• 92% of online US adults use search – 96% of college graduates – 98% with income $75k+
• 70–80% ignore paid ads on right – (but only 10% ignore ads on top)
• 80% of sessions begin with a search
Sources: comScore, Pew Internet Report, User Centric, PC Magazine, http://www.searchenginejournal.com/24-eye-popping-seo-statistics/42665/
Googling google
Search as traffic source
Global market share
• Google, 81.73%
• Yahoo, 6.42%
• Baidu, 5.65%
• Bing, 4.15%
• Other, 2.05%
Source: May 2012 figures from http://www.netmarketshare.com/
USA market share
• Google, 76.57%
• Bing, 10.46%
• Yahoo, 9.83%
• AOL, 1.47%
• Ask, 1.33%
Source: May 2012 figures from http://www.netmarketshare.com/
China market share
• Baidu, 78.50%
• Google, 16.60%
• Sougou, 2.80%
• SoSo, 1.40%
• Others, 0.70%
Source: http://chineseseoshifu.com/china-search-engine-market-share/
Also: Japan, Czech Republic, South Korea, Russia,
Search engine results page
Where do people look?
Where do people click?
http://www.seomoz.org/blog/mission-imposserpble-establishing-clickthrough-rates
Black-hat vs white-hat
Black-hat SEO | White-hat SEO
Tricking Google | Working with Google
Hidden keywords | Prominent keywords
Cloaking for search | Structured for search
Content scraping | Unique content
Link spam and farms | Attracting links
Short-lived boost (maybe) | Long-term results
Google’s recommendations
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
How search works
• Crawling – Finding content to index
• Indexing – Preparing content for search
• Searching – Showing results to user
Basic crawling
• Create an empty URL queue (“frontier”)
• Add one good URL, e.g. wikipedia.org
• Repeat:
  – Select random URL from queue
  – Retrieve content for that URL
  – Add links in content to queue
  – Keep track to prevent repeat visits
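The crawl loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_WEB`, `toy_fetch_links` and the `.example` URLs are hypothetical stand-ins for real HTTP fetching and link parsing, and the `max_pages` limit is an added safety valve that is not part of the steps on the slide.

```python
import random

# Hypothetical in-memory "web": URL -> URLs that page links to.
# A real crawler would fetch each page over HTTP and parse out its links.
TOY_WEB = {
    "wikipedia.org": ["a.example", "b.example"],
    "a.example": ["b.example", "wikipedia.org"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}


def toy_fetch_links(url):
    """Stand-in for 'retrieve content and extract its links'."""
    return TOY_WEB.get(url, [])


def crawl(seed, fetch_links, max_pages=100, rng=None):
    """Return the URLs visited, in crawl order."""
    rng = rng or random.Random()
    frontier = [seed]   # the URL queue ("frontier")
    seen = {seed}       # every URL ever queued, to prevent repeat visits
    visited = []
    while frontier and len(visited) < max_pages:
        # Select a random URL from the queue and remove it.
        url = frontier.pop(rng.randrange(len(frontier)))
        visited.append(url)
        # Retrieve the content and queue any links not seen before.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    print(crawl("wikipedia.org", toy_fetch_links, rng=random.Random(0)))
```

Tracking URLs when they are queued, rather than when they are fetched, keeps the same URL from sitting in the frontier twice.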
Crawling issues
• Link prioritization
• Duplicate content
  – Print versions, sorting
• Infinite loops
  – Database-driven sites
• Revisiting pages
• Site overloading
• Parallelization
Indexing
Inverted index
https://developer.apple.com/library/mac/#documentation/userexperience/Conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
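As a rough illustration of an inverted index, the sketch below maps each word to the documents (and word positions) where it occurs, then answers an AND query by intersecting the document sets. The function names, the sample `DOCS` and the simple lowercase tokenizer are assumptions made for this example; production indexes also store the extra signals covered on the next slide.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus: document id -> text.
DOCS = {
    1: "Uber List Manager",
    2: "The world's leading list management software",
    3: "Mailing list software",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search_all(index, query):
    """Return ids of documents containing every query word (AND query)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


INDEX = build_inverted_index(DOCS)
```

Because the positions are stored too, the same structure could be extended to proximity or phrase matching.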
Other indexed information
• Page metadata
• More about words
  – Prominence
  – Position
  – Frequency
• Links between pages
  – Including anchor text
• Images, etc…
Other formats
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35287
Forms? Javascript?
Stemming, proximity, ANDs
Recent Google changes
• Aug 2012: sometimes 7 results
• May 2012: knowledge graph
• Jan 2012: top heavy ads penalty
• Nov 2011: rewarding freshness
• Feb 2011: hitting content farms
• Dec 2010: social media signals
• Dec 2009: real-time search
http://www.seomoz.org/google-algorithm-change
Google web history (2005–2009)
Search + your world (2012)
http://www.ubergizmo.com/2012/01/google-now-searches-your-world/
Keyword research
But: consider also long tail (referrer logs)
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
HTTP protocol
GET /wiki/Hypertext_Transfer_Protocol HTTP/1.1
Host: en.wikipedia.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8) AppleWebKit/534.50.2 (KHTML, like Gecko) Version/5.0.6 Safari/533.22.3
Referer: http://www.rexswain.com/httpview.html
Connection: close

HTTP/1.0 200 OK
Date: Sun, 17 Jun 2012 06:05:03 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Last-Modified: Sat, 16 Jun 2012 03:14:24 GMT
Content-Length: 164814
Content-Type: text/html; charset=UTF-8
Connection: close
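To make the shape of the exchange above concrete, here is a small sketch that builds a GET request as text and parses the status line and headers of a raw response. It does no networking, and the helper names `build_get_request` and `parse_response_head` are hypothetical, written for this illustration rather than taken from any library.

```python
def build_get_request(host, path, headers=None):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; a blank line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response_head(raw):
    """Split a raw response into (version, status code, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


# A shortened copy of the response shown on the slide.
RAW_RESPONSE = "\r\n".join([
    "HTTP/1.0 200 OK",
    "Date: Sun, 17 Jun 2012 06:05:03 GMT",
    "Content-Length: 164814",
    "Content-Type: text/html; charset=UTF-8",
    "",
    "<html>...</html>",
])
```

A crawler looks at the status code to decide what to do with the page, and can use headers such as Last-Modified when deciding whether a revisit is needed.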
Page structure
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <title>Uber List Manager</title>
</head>
<body>
  <h1>Uber List Manager</h1>
  <p>The world's leading &amp; best priced list management software.</p>
</body>
</html>
Key <HEAD> elements
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <link rel="stylesheet" type="text/css" href="styles.css">
  <script type="text/javascript" src="script.js"></script>
  <title>Uber List Manager</title>
  <meta name="description" content="An excellent and well priced list management program.">
  <meta name="keywords" content="lists, list manager, uber, mailing software">
</head>
Key <BODY> elements
<body>
  <h1>Uber List Manager</h1>
  <p>The world's leading &amp; best priced list management software.</p>
  <h2>Features</h2>
  <h2>Customer stories</h2>
  <img src="images/ulm.jpeg" width="320" height="240" alt="Screenshot" title="ULM in action">
  <form action="form.php" method="post">
    <input type="submit" value="Submit">
  </form>
  <iframe src="iframe.html" width="300" height="300"></iframe>
</body>
External file summary
index.html
<html lang="en"> <head> <title> Uber List... </title> </head> <body> ... </body>
</html>
styles.css
body {background-color:yellow;}
h1 {font-size:24px;}
a:hover {text-decoration: underline;}
script.js
document.onkeydown=okd;
window.onbeforeunload=obl;
if (t>=0)
d=new Date();
iframe.html
<html> <head>
</head> <body> ... </body>
</html>
ulm.jpeg
Links
Click <a href="more-information.html">here</a> for ULM benefits and pricing.
Click for <a href="more-information.html">ULM benefits and pricing</a>.
Click for <a href="more-information.html" title="ULM benefits and pricing">more about ULM</a>.
Better than <a href="http://slowlists.com/" rel="nofollow">our competitors</a>.
<a href="pricing.html"><img src="dollar-bill.jpeg" alt="Pricing"></a>
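To show what an indexer might pull out of links like these, the sketch below uses Python's standard `html.parser` to collect each link's target, anchor text and nofollow flag, falling back to the image ALT text for image links. The `LinkExtractor` class and `extract_links` helper are hypothetical names for this illustration, and how a real search engine weighs these fields is not something the code claims to reproduce.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href, anchor text and nofollow flag for each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links, in document order
        self._current = None  # the link being read while inside <a>...</a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            rel = (attrs.get("rel") or "").lower().split()
            self._current = {
                "href": attrs["href"],
                "text": "",
                "nofollow": "nofollow" in rel,
            }
        elif tag == "img" and self._current is not None:
            # An image link has no text of its own, so use its ALT text.
            self._current["text"] += attrs.get("alt") or ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the anchor text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Run against the examples on this slide, the first link yields only the anchor text "here", which is why the second form, with descriptive anchor text, is preferable.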
Internal targets
<a href="#history">2 History</a>
.
.
.
<h2 id="history">History</h2>
<a name="history"></a><h2>History</h2>
Rich snippets
<span itemprop="reviewCount">808</span>
<span itemprop="streetAddress">204 E 43rd St</span>
<span id="bizPhone" itemprop="telephone">(212) 972-1001</span>
<span itemprop="priceRange">$$$$</span>
<meta itemprop="ratingValue" content="4.5">
HTML5
<article> <section> <hgroup> <aside> <header> <footer> <nav>
http://www.michaelcropper.co.uk/2012/03/html5-seo-best-practices-1032.html
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
Page title
No more than ~70 characters / 10 words
URLs
SEO Cheat Sheet: Anatomy of a URL (©2009 SEOmoz, www.seomoz.org)
SEO-friendly URL: http://store.example.com/topics/subtopic/descriptive-product-name#top
(1) Protocol (2) Subdomain (3) Domain (4) Top-Level Domain (5) Folders / Paths (6) Page (7) Named Anchor
Old dynamic URL: http://www.example.com/index.php?product=1234&sort=price&print=1
(1) Protocol (2) Subdomain (3) Domain (4) Top-Level Domain (5) Page / File Name (6) File Extension (7) CGI Parameters
Popular TLDs [2]: .com (commercial), .net (infrastructure), .org (non-profit), .edu (schools), .info (informational), .biz (small business), .name (personal sites)
Popular ccTLDs [*]: .cn (China), .de (Germany), .uk (United Kingdom), .nl (Netherlands), .eu (European Union), .ru (Russian Federation), .ar (Argentina)
Popular extensions: .htm (static HTML), .html (static HTML), .php (PHP code), .asp (ASP code), .aspx (ASP.NET), .cfm (ColdFusion), .jsp (Java code)
Keyword priority [1], observed Google priority of keyword placement: (1) Domain (2) Subdomain (3) Folder (4) Path/Page
SEO tips for URLs
• Use subdomains carefully. They may be treated as separate entities, splitting domain authority.
• Separate path & page keywords with hyphens ("-").
• Anchors may help engines understand page structure.
• Keyword effectiveness in URLs decreases as URL length and keyword position increase. [1]
[1] SEOmoz correlational data (2009)
[2] Verisign domain report (2009)
[*] ccTLD = Country Code TLD
http://www.seomoz.org/blog/seo-cheat-sheet-anatomy-of-a-url
URLs: good vs bad
Bad: www.really-cheap-great-mailing-list-manager.info
Good: www.mailingmanager.com
Bad: googleblog.blogspot.com/view?post_id=3982098&section_id=231
Good: googleblog.blogspot.com/2012/04/introducing-google-drive.html
Bad: amazon.com/store/products/books/computing/internet/seo/Eric+Edge/The%20Art%20Of%20SEO/details
Good: amazon.com/The-Art-SEO-Eric-Edge
Meta descriptions
Used for display but not for ranking
Length: 150~160 characters
Avoid duplication across many pages
Keyword density
Formatting
and <b>good value</b>
and <span style="font-weight:bold;">good value</span>
and <span class="emboldened">good value</span>
and <em>good value</em>
and <strong>good value</strong>
<font size="+2">Features</font>
<big>Features</big>
<p style="font-size:24px;">Features</p>
<h2>Features</h2>
<h2 style="font-size:24px;">Features</h2>
Freshness and speed
• Freshness determined by:
  – Date the page appeared
  – Frequency of content change
  – Amount of content change
  – Rate of new incoming links
Javascript and Flash
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
Links from external sites
• From high ranking sites
  – Hard to manipulate
• From .edu or .gov
  – No commercial motivation
• From topic-related sites
• From many sites
  – Diversity of subject
  – Different ownership / IP block
Links on external pages
• All-important anchor text
  – First appearance counts
  – Diversity of anchors
• Higher on linking page
• From core text content
  – Not navigation/footers
  – Image ALT text weaker
• Page has other good links
Power of anchors
Titles and URLs in anchors
Wikipedia, the free encyclopedia — 450 saves
Visit Wikipedia for more information
Recent referrers: en.wikipedia.org
http://en.wikipedia.org/wiki/Main_Page
Attracting links
• (Directories e.g. dmoz)
• Inbound marketing
  – Great on-site content
  – Post articles elsewhere
  – Request reviews
• Viral marketing
  – Banners + widgets
  – Social network sharing
Link bait
Domain information
User monitoring
Duplicate content
• Other sites stealing your content
• www.domain.com vs domain.com
• domain.com/ vs domain.com/index.html
• Printer-friendly versions
• URL parameters
robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/

User-agent: BadBot
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
Or in <HEAD> of page: <meta name="robots" content="noindex">
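As a sketch of how a well-behaved crawler might apply rules like the robots.txt example above, Python's standard `urllib.robotparser` can parse the file and answer "may this user agent fetch this URL?". The bot name `GoodBot` and the example page URLs are assumptions for illustration; the rules are parsed from a string here rather than downloaded from a site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the slide, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


RULES = load_rules(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, url in [
        ("GoodBot", "http://www.example.com/index.html"),
        ("GoodBot", "http://www.example.com/private/report.html"),
        ("BadBot", "http://www.example.com/index.html"),
    ]:
        print(agent, url, RULES.can_fetch(agent, url))
```

robots.txt is advisory: it only restrains crawlers that choose to check it before fetching.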
XML sitemaps
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
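For illustration, a sitemap like the one above can be read with Python's standard `xml.etree.ElementTree`. The `parse_sitemap` helper is a hypothetical name for this sketch; the namespace URI is the one declared in the sitemap itself.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <urlset>; element lookups must include it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
"""


def parse_sitemap(xml_bytes):
    """Return one dict per <url> entry, keyed by child tag name."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {}
        for child in url:
            # Tags look like "{namespace}loc"; keep only the local name.
            name = child.tag.split("}", 1)[1]
            entry[name] = (child.text or "").strip()
        entries.append(entry)
    return entries
```

Values are left as strings here; a crawler would treat fields such as `lastmod` and `changefreq` as hints for scheduling revisits.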
Redirects and rel=canonical
http://www.seomoz.org/blog/an-seos-guide-to-http-status-codes
Putting it all together
http://www.seomoz.org/article/search-ranking-factors#predictions
Lecture 8
• Introduction • How search works • Anatomy of HTML • On-page factors • Off-page factors • PageRank
A random walk
A
B
C
D
E
Probability distribution
http://en.wikipedia.org/wiki/File:PageRanks-Example.svg
The maths
http://en.wikipedia.org/wiki/PageRank
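The random-walk idea can be sketched as a simple iterative calculation: each page repeatedly passes its score along its outgoing links, with a damping factor modelling a surfer who sometimes jumps to a random page instead. This is the textbook simplified form, not Google's production algorithm. The five-page graph below reuses the labels A to E, but its edges are invented for illustration, and the damping value 0.85 and the iteration count are assumed, conventional choices.

```python
# Hypothetical link graph: page -> pages it links to.
# Every link target must also appear as a key.
LINKS = {
    "A": ["B"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A", "B"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Return {page: score}; the scores sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a base share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items()):
        print(page, round(score, 3))
```

In this toy graph A ends up with the highest score because most pages link to it, and B comes a close second because A passes its entire score to B.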
PageRank sculpting?
http://www.seomoz.org/blog/google-says-yes-you-can-still-sculpt-pagerank-no-you-cant-do-it-with-nofollow
PageRank in reality
• Domain authority signals
• Nofollow links are clicked by people
• Internal vs external links
• Paid link and link farm detection
• Removed from toolbar in 2009
https://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking#pagerank
Tools