White Hat Cloaking – Six Practical Applications Presented by Hamlet Batista

Page 1: White Hat Cloaking – Six Practical Applications

White Hat Cloaking – Six Practical Applications

Presented by Hamlet Batista

Page 2: White Hat Cloaking – Six Practical Applications

Why white hat cloaking?

“Good” vs “bad” cloaking is all about your intention

Always weigh the risks versus the rewards of cloaking

Ask permission, or just don’t call it cloaking!

Cloaking vs “IP delivery”

Page 3: White Hat Cloaking – Six Practical Applications

Crash course in white hat cloaking

1. When to cloak?
2. How do we cloak?
3. Practical scenarios where good cloaking makes sense
   - Practical scenarios and alternatives
4. How can cloaking be detected?
5. Risks and next steps

Page 4: White Hat Cloaking – Six Practical Applications

When is it practical to cloak?

Content accessibility
- Search unfriendly Content Management Systems
- Rich media sites
- Content behind forms

Membership sites
- Free and paid content

Site structure improvements
- Alternative to PR sculpting via “no-follow”

Geolocation/IP delivery

Multivariate testing

Page 5: White Hat Cloaking – Six Practical Applications

Practical scenario #1

Proprietary website management systems that are not search-engine friendly

Regular users see
- URLs with many dynamic parameters
- URLs with session IDs
- URLs with canonicalization issues
- Missing titles and meta descriptions

Search engine robot sees
- Search engine friendly URLs
- URLs without session IDs
- URLs with a consistent naming convention
- Automatically generated titles and meta descriptions
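The robot-facing URL rules in this scenario (no session IDs, one consistent naming convention) could be sketched roughly as follows. This is only an illustration: the function name `canonical_url` and the parameter names in `SESSION_PARAMS` are assumptions, not something taken from the presentation or from any particular CMS.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to whatever the CMS emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url: str) -> str:
    """Return a crawl-friendly form of url: no session IDs, stable parameter order."""
    parts = urlsplit(url)
    # Drop session parameters and keep everything else.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sort so the same page always maps to the same URL.
    params.sort()
    # Treat "/shop/" and "/shop" as the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(params), ""))
```

For example, `canonical_url("http://Example.com/shop/?b=2&PHPSESSID=abc&a=1")` gives `http://example.com/shop?a=1&b=2`.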

Page 6: White Hat Cloaking – Six Practical Applications

Practical scenario #2

Sites built completely in Flash, Silverlight or any other rich media technology

Search engine robot sees
- A text representation of all graphical (images) elements
- A text representation of all motion (video) elements
- A text transcription of all audio in the rich media content

Page 7: White Hat Cloaking – Six Practical Applications

Practical scenario #3

Membership sites

Search users see
- Snippets of premium content on the SERPs
- When they land on the site they are faced with a registration form

Members see
- The same content search engine robots see

Page 8: White Hat Cloaking – Six Practical Applications

Practical scenario #4

Sites requiring massive site structure changes to improve index penetration

Regular users follow a link structure designed for ease of navigation

[Diagram: Step 1, Step 2, Step 3, Step 4, Step 5]

Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content

[Diagram: Step 1, Step 3, Step 2, Step 5, Step 4]

Page 9: White Hat Cloaking – Six Practical Applications

Practical scenario #5

Sites using geolocation technology

Regular users see
- Content tailored to their geographical location and/or user’s language

Search engine robot sees
- The same content consistently

Page 10: White Hat Cloaking – Six Practical Applications

Practical scenario #6

Split testing organic search landing pages

Each regular user sees
- One of the content experiment alternatives

Search engine robot sees
- The same content consistently
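One way to picture this scenario: each regular visitor is assigned one experiment alternative, while robots always receive the same page. The sketch below assumes a stable visitor identifier (for example a cookie value) and uses made-up variant names. Hashing the identifier is an assumption made here so that a returning visitor keeps the same variant; the presentation does not say how assignment is done.

```python
import hashlib

# Hypothetical experiment arms; the first entry is what robots always get.
VARIANTS = ["control", "variant_a", "variant_b"]


def pick_variant(visitor_id: str, is_robot: bool) -> str:
    """Robots consistently get the control page; users get a stable variant."""
    if is_robot:
        return VARIANTS[0]
    # Hash the visitor ID so the same visitor always lands in the same arm.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(VARIANTS)
    return VARIANTS[bucket]
```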

Page 11: White Hat Cloaking – Six Practical Applications

How do we cloak?

Cloaking is performed with a web server script or module

Search robot detection
- By HTTP user agent
- By IP address
- By HTTP cookie test
- By JavaScript/CSS test
- By DNS double check
- By visitor behavior
- By combining all the techniques

Content delivery
- Presenting the equivalent of the inaccessible content to robots
- Presenting the search-engine friendly content to robots
- Presenting the content behind forms to robots

Page 12: White Hat Cloaking – Six Practical Applications

Robot detection by HTTP user agent

Search robot HTTP request

66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

A very simple robot detection technique
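A minimal sketch of this check: look for a known crawler token in the User-Agent header, such as the `Googlebot/2.1` string in the log line above. The token list and the function name are illustrative assumptions. Because the header is supplied by the client, any visitor can send it, so a match only means the visitor claims to be a robot.

```python
import re

# Illustrative crawler tokens; a real list would need regular maintenance.
ROBOT_UA = re.compile(r"googlebot|slurp|msnbot", re.IGNORECASE)


def claims_to_be_robot(user_agent: str) -> bool:
    """True if the User-Agent header contains a known crawler token."""
    return bool(ROBOT_UA.search(user_agent or ""))
```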

Page 13: White Hat Cloaking – Six Practical Applications

Robot detection by HTTP cookie test

Search robot HTTP request

66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "Missing cookie info"

Another simple robot detection technique, but weaker
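The cookie test could look roughly like this: the site sets a cookie on the first response and checks whether later requests send it back. The cookie name `visitor_check` and both function names are hypothetical. The test is weak, as the slide says, because browsers with cookies disabled look like robots and a robot can choose to return cookies.

```python
from http.cookies import SimpleCookie

TEST_COOKIE = "visitor_check"  # hypothetical cookie name set on the first response


def returned_test_cookie(cookie_header: str) -> bool:
    """True if the request's Cookie header carries the test cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return TEST_COOKIE in cookie


def possible_robot(cookie_header: str, is_first_request: bool) -> bool:
    """Flag visitors that never send the test cookie back."""
    # A first request cannot carry the cookie yet, so it proves nothing.
    if is_first_request:
        return False
    return not returned_test_cookie(cookie_header)
```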

Page 14: White Hat Cloaking – Six Practical Applications

Robot detection by JavaScript/CSS test

Another option for robot detection

DHTML Content

HTML Code

<div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>

and the CSS code is pretty straightforward: it swaps out anything in the h1 tag in the header with an image

CSS Code

/* CSS Image replacement */
#header h1 {margin:0; padding:0;}
#header h1 a {
  display: block;
  padding: 150px 0 0 0;
  background: url(path to image) top right no-repeat;
  overflow: hidden;
  font-size: 1px;
  line-height: 1px;
  height: 0px !important;
  height /**/:150px;
}

Page 15: White Hat Cloaking – Six Practical Applications

Robot detection by IP address

Search robot HTTP request

66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

A more robust robot detection technique
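As a rough illustration, the IP check compares the requesting address (the first field of the log line, `66.249.66.1`) against a maintained list of crawler networks. The single network below is an assumption chosen only so the example address matches. A real list has to be kept up to date.

```python
import ipaddress

# Illustrative list only; crawler address ranges change over time.
KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]


def ip_in_known_robot_range(remote_addr: str) -> bool:
    """True if remote_addr falls inside one of the listed crawler networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address, so it cannot match any range.
        return False
    return any(addr in network for network in KNOWN_ROBOT_NETWORKS)
```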

Page 16: White Hat Cloaking – Six Practical Applications

Robot detection by double DNS check

Search robot HTTP request

nslookup 66.249.66.1

Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1

nslookup crawl-66-249-66-1.googlebot.com

Non-authoritative answer:
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1

A more robust robot detection technique
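The two lookups above can be sketched as a reverse lookup followed by a forward lookup that must return the original address. This is a sketch under assumptions: the function name and `TRUSTED_SUFFIXES` are made up for illustration. The resolver functions are passed in as parameters so the logic can be exercised without network access; by default they are the standard `socket` calls.

```python
import socket

# Illustrative; add one suffix per crawler domain you decide to trust.
TRUSTED_SUFFIXES = (".googlebot.com",)


def double_dns_check(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the host name, then forward-resolve it back."""
    try:
        # Step 1: IP address -> host name (reverse lookup).
        hostname = reverse(ip)[0]
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # Step 2: host name -> IP addresses (forward lookup) must include ip.
        return ip in forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

The forward lookup matters because whoever controls the reverse DNS zone for an address can make it return any host name. Only the owner of `googlebot.com` controls what that name resolves to.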

Page 17: White Hat Cloaking – Six Practical Applications

Robot detection by visitor behavior

Robots differ substantially from regular users when visiting a website
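The slide does not list which behaviors to watch, so the signals below are assumptions chosen for illustration: fetching `robots.txt`, requesting many pages without any images, CSS or JavaScript, and an unusually high request rate. The thresholds are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class VisitStats:
    pages_requested: int = 0
    assets_requested: int = 0        # images, CSS, JavaScript
    fetched_robots_txt: bool = False
    seconds_on_site: float = 0.0


def looks_like_robot(stats: VisitStats, max_pages_per_minute: float = 30.0) -> bool:
    """Heuristic only: flags a visit as suspicious, it does not confirm a robot."""
    if stats.fetched_robots_txt:
        return True
    # Several pages but not a single supporting asset is unusual for a browser.
    if stats.pages_requested >= 5 and stats.assets_requested == 0:
        return True
    # Avoid dividing by zero for very short visits.
    minutes = max(stats.seconds_on_site, 1.0) / 60.0
    return stats.pages_requested / minutes > max_pages_per_minute
```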

Page 18: White Hat Cloaking – Six Practical Applications

Combining the best of all techniques

Maintain a cache with a list of known search robots to reduce the number of verification attempts

Label as a possible robot any visitor with suspicious behavior

Label as a robot anything that identifies as such

Confirm it is a robot by doing a double DNS check. Also confirm suspect robots
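The four steps above might fit together as in the sketch below. The class name, the method signature and the `verify` callback are all hypothetical. `verify` stands in for a double DNS check, and the cache here never expires, which a real deployment would likely need to revisit because search engines change IPs.

```python
from typing import Callable, Dict


class RobotClassifier:
    """Cache verification results so each address is verified at most once."""

    def __init__(self, verify: Callable[[str], bool]):
        self._verify = verify              # e.g. a double DNS check
        self._cache: Dict[str, bool] = {}  # ip -> result of the verification

    def is_robot(self, ip: str, claims_robot: bool, suspicious: bool) -> bool:
        # Known addresses are answered from the cache.
        if ip in self._cache:
            return self._cache[ip]
        # Neither self-identified nor suspicious: treat as a regular user.
        if not (claims_robot or suspicious):
            return False
        # Confirm both self-identified and suspect robots, then remember the result.
        verified = self._verify(ip)
        self._cache[ip] = verified
        return verified
```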

Page 19: White Hat Cloaking – Six Practical Applications

Clever cloaking detection

A clever detection technique is to check the caches at the newest datacenters

IP-based detection techniques rely on an up-to-date list of robot IPs

Search engines change IPs on a regular basis

It is possible to identify those new IPs and check the cache

Page 20: White Hat Cloaking – Six Practical Applications

Risks of cloaking

Search engines do not want to accept any type of cloaking

Survival tips
- The safest way to cloak is to ask for permission from each of the search engines that you care about
- Refer to it as IP delivery.

Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.

http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
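The comparison described in the quote can be approximated in a few lines. This sketch assumes you already have the two response bodies as bytes, one as served to a robot and one as served to a regular user. It uses SHA-256 in place of the MD5 that `md5sum` computes, and falls back to a line diff when the hashes differ. The function names are illustrative.

```python
import difflib
import hashlib


def digest(body: bytes) -> str:
    """Hex digest of a response body (SHA-256 here rather than MD5)."""
    return hashlib.sha256(body).hexdigest()


def compare_versions(robot_body: bytes, user_body: bytes) -> list:
    """Return [] when both bodies are identical, else unified-diff lines."""
    if digest(robot_body) == digest(user_body):
        return []
    return list(difflib.unified_diff(
        user_body.decode("utf-8", "replace").splitlines(),
        robot_body.decode("utf-8", "replace").splitlines(),
        fromfile="user", tofile="robot", lineterm=""))
```

An empty result means the robot and the user received byte-identical content. Any other result lists the lines that differ.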

Page 21: White Hat Cloaking – Six Practical Applications

Next Steps

Make sure clients understand the risks/rewards of implementing white hat cloaking

More information and how to get started
- How Google defines IP delivery, geolocation and cloaking http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
- First Click Free http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
- Good Cloaking, Evil Cloaking and Detection http://searchengineland.com/070301-065358.php
- YADAC: Yet Another Debate About Cloaking Happens Again http://searchengineland.com/070304-231603.php
- Cloaking is OK Says Google http://blog.venture-skills.co.uk/2007/07/06/cloaking-is-ok-says-google/
- Advanced Cloaking Technique: How to feed password-protected content to search engine spiders http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/

Page 22: White Hat Cloaking – Six Practical Applications

Blog http://hamletbatista.com

LinkedIn http://www.linkedin.com/in/hamletbatista

Facebook http://www.facebook.com/people/Hamlet_Batista/613808617

Twitter http://twitter.com/hamletbatista

E-mail [email protected]

I would be happy to help.

Feel free to contact me