White Hat Cloaking – Six Practical Applications
Presented by Hamlet Batista
Page 2
Why white hat cloaking?
- “Good” vs “bad” cloaking is all about your intention
- Always weigh the risks versus the rewards of cloaking
- Ask permission, or just don’t call it cloaking! (Cloaking vs “IP delivery”)
Page 3
Crash course in white hat cloaking
1. When to cloak?
2. Practical scenarios where good cloaking makes sense (and alternatives)
3. How do we cloak?
4. How can cloaking be detected?
5. Risks and next steps
Page 4
When is it practical to cloak?
- Content accessibility: search-unfriendly content management systems, rich media sites, content behind forms
- Membership sites: free and paid content
- Site structure improvements: alternative to PageRank sculpting via “nofollow”
- Geolocation/IP delivery
- Multivariate testing
Page 5
Practical scenario #1
Proprietary website management systems that are not search-engine friendly

Regular users see:
- URLs with many dynamic parameters
- URLs with session IDs
- URLs with canonicalization issues
- Missing titles and meta descriptions

Search engine robot sees:
- Search-engine-friendly URLs
- URLs without session IDs
- URLs with a consistent naming convention
- Automatically generated titles and meta descriptions
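As a sketch of what such a system might do, here is a minimal Python example that strips session-style parameters so robots see one canonical URL. The parameter names are hypothetical; a real deployment would match its own CMS's parameters.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

# Hypothetical session-style parameters that carry no content value
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def clean_url_for_robots(url: str) -> str:
    """Strip session-style parameters so robots see one canonical URL."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))
```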
Page 6
Practical scenario #2
Sites built completely in Flash, Silverlight or any other rich media technology

Search engine robot sees:
- A text representation of all graphical (image) elements
- A text representation of all motion (video) elements
- A text transcription of all audio in the rich media content
Page 7
Practical scenario #3
Membership sites

Search users see:
- Snippets of premium content on the SERPs
- When they land on the site, they are faced with a registration form

Members see:
- The same content search engine robots see
Page 8
Practical scenario #4
Sites requiring massive site structure changes to improve index penetration

Regular users follow a link structure designed for ease of navigation.

Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content.
Page 9
Practical scenario #5
Sites using geolocation technology
Regular users see: content tailored to their geographical location and/or the user’s language.

Search engine robot sees: the same content consistently.
Page 10
Practical scenario #6
Split testing organic search landing pages
Each regular user sees: one of the content experiment alternatives.

Search engine robot sees: the same content consistently.
Page 11
How do we cloak?
Search robot detection:
- By HTTP user agent
- By IP address
- By HTTP cookie test
- By JavaScript/CSS test
- By DNS double check
- By visitor behavior
- By combining all the techniques

Content delivery:
- Presenting the equivalent of the inaccessible content to robots
- Presenting the search-engine-friendly content to robots
- Presenting the content behind forms to robots

Cloaking is performed with a web server script or module.
Page 12
Robot detection by HTTP user agent
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
A very simple robot detection technique
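A minimal sketch of this check in Python, assuming a hand-picked list of robot signatures. The user agent is trivially spoofable, so treat this only as a first pass.

```python
# Assumed signature list; real crawlers publish their user-agent strings.
ROBOT_SIGNATURES = ("googlebot", "bingbot", "slurp")

def looks_like_robot(user_agent: str) -> bool:
    """Flag a request whose User-Agent header matches a known robot."""
    ua = user_agent.lower()
    return any(sig in ua for sig in ROBOT_SIGNATURES)
```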
Page 13
Robot detection by HTTP cookie test
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "Missing cookie info"
Another simple robot detection technique, but weaker
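A sketch of the cookie test in Python, assuming a hypothetical `visitor_token` cookie set on every response. It is weak on its own: a request that never returns the cookie may come from a robot, but it may also be a first-time human visitor.

```python
def probably_robot_by_cookie(request_headers: dict) -> bool:
    """Treat a visitor that never returns our tracking cookie as a possible
    robot. 'visitor_token' is a hypothetical cookie name."""
    cookies = request_headers.get("Cookie", "")
    return "visitor_token=" not in cookies
```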
HTML Code
<div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
The CSS code is straightforward: it replaces the contents of the h1 tag in the header with an image.
CSS Code
/* CSS image replacement */
#header h1 { margin: 0; padding: 0; }
#header h1 a {
  display: block;
  padding: 150px 0 0 0;
  background: url(path to image) top right no-repeat;
  overflow: hidden;
  font-size: 1px;
  line-height: 1px;
  height: 0px !important;
  height /**/: 150px;
}
Page 14
Robot detection by JavaScript/CSS test
DHTML Content
Another option for robot detection
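One way the JavaScript test could be wired up server-side, sketched in Python with illustrative names: the page embeds a script that requests a beacon URL, and sessions that never hit the beacon are treated as possible robots, since most crawlers do not execute JavaScript.

```python
# Sessions whose embedded <script> has fetched our beacon URL.
# All names here are illustrative, not a real framework's API.
beacon_seen: set = set()

def record_beacon(session_id: str) -> None:
    """Called by the handler for the beacon URL the page's script requests."""
    beacon_seen.add(session_id)

def possible_robot(session_id: str) -> bool:
    """A session that never executed the beacon script may be a robot."""
    return session_id not in beacon_seen
```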
Page 15
Robot detection by IP address
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
A more robust robot detection technique
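A sketch in Python using the standard `ipaddress` module. The network range shown is only a sample; any real list of robot IPs must be kept current.

```python
import ipaddress

# Assumed sample of Googlebot address space; real lists change over time.
KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def is_known_robot_ip(remote_addr: str) -> bool:
    """Check the requesting IP against a list of known robot networks."""
    ip = ipaddress.ip_address(remote_addr)
    return any(ip in net for net in KNOWN_ROBOT_NETWORKS)
```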
Page 16
Robot detection by double DNS check
Search robot HTTP request comes from 66.249.66.1:

$ nslookup 66.249.66.1
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1

$ nslookup crawl-66-249-66-1.googlebot.com
Non-authoritative answer:
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1
A more robust robot detection technique
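The double check can be sketched in Python with the standard `socket` module: reverse-resolve the IP, require a Google hostname, then forward-resolve the hostname back to the same IP. The resolver functions are injectable so the logic can be exercised without network access.

```python
import socket

def verify_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Double DNS check: reverse lookup, hostname validation, forward lookup."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False  # reverse DNS is spoofable; the suffix must match
    try:
        return forward(hostname) == ip
    except OSError:
        return False
```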
Page 17
Robot detection by visitor behavior
Robots differ substantially from regular users when visiting a website
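One behavioral signal is raw request rate. A minimal sliding-window sketch in Python; the window and threshold are assumptions to tune per site.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10      # assumed observation window
MAX_HUMAN_REQUESTS = 20  # assumed threshold; tune per site

_history = defaultdict(deque)

def suspicious_rate(visitor_id: str, timestamp: float) -> bool:
    """Flag visitors whose request rate within the window looks robot-like."""
    q = _history[visitor_id]
    q.append(timestamp)
    while q and q[0] < timestamp - WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_HUMAN_REQUESTS
```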
Page 18
Combining the best of all techniques
- Maintain a cache with a list of known search robots to reduce the number of verification attempts
- Label as a possible robot any visitor with suspicious behavior
- Label as a robot anything that identifies itself as such
- Confirm it is a robot by doing a double DNS check; also confirm suspect robots
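The combination above might be wired together like this sketch, where the helper callables stand in for the techniques on the previous slides and a cache of confirmed robot IPs avoids repeated DNS lookups.

```python
confirmed_robots: set = set()  # cache of IPs verified by double DNS check

def classify(ip, user_agent, claims_robot, behaves_suspiciously, dns_verifies):
    """Return 'robot', 'suspect', or 'human' for one request.
    The three callables are placeholders for the earlier detection steps."""
    if ip in confirmed_robots:
        return "robot"
    if claims_robot(user_agent) or behaves_suspiciously(ip):
        # Only suspects pay the cost of the double DNS check
        if dns_verifies(ip):
            confirmed_robots.add(ip)
            return "robot"
        return "suspect"
    return "human"
```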
Page 19
Clever cloaking detection
A clever detection technique is to check the caches at the newest datacenters:
- IP-based detection techniques rely on an up-to-date list of robot IPs
- Search engines change IPs on a regular basis
- It is possible to identify those new IPs and check the cache
Page 20
Risks of cloaking
Search engines do not want to accept any type of cloaking.

Survival tips:
- The safest way to cloak is to ask for permission from each of the search engines that you care about
- Refer to it as “IP delivery”

Google’s own definition:
“Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.”
http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
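Google's md5sum/diff suggestion can be reproduced with a short Python helper that fingerprints the two response bodies, assuming you fetch the same URL once as Googlebot and once as a normal browser; identical hashes mean identical content.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Hash a response body so the robot and user versions can be compared."""
    return hashlib.md5(body).hexdigest()
```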
Page 21
Next Steps
Make sure clients understand the risks/rewards of implementing white hat cloaking.

More information and how to get started:
- How Google defines IP delivery, geolocation and cloaking: http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
- First Click Free: http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
- Good Cloaking, Evil Cloaking and Detection: http://searchengineland.com/070301-065358.php
- YADAC: Yet Another Debate About Cloaking Happens Again: http://searchengineland.com/070304-231603.php
- Cloaking is OK Says Google: http://blog.venture-skills.co.uk/2007/07/06/cloaking-is-ok-says-google/
- Advanced Cloaking Technique: How to feed password-protected content to search engine spiders: http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/
Blog: http://hamletbatista.com
LinkedIn: http://www.linkedin.com/in/hamletbatista
Facebook: http://www.facebook.com/people/Hamlet_Batista/613808617
Twitter: http://twitter.com/hamletbatista
E-mail: [email protected]
Page 22
I would be happy to help. Feel free to contact me.