Making Data from Google Webmaster Tools, Bing and SEOmoz Actionable

DESCRIPTION

A presentation from SMX Melbourne 2012 on how to make data from Google Webmaster Tools, Bing and SEOmoz actionable. It draws on log file analysis and external crawl tools to reinforce what the major tool providers report.

MAKING DATA FROM GWMT &

BING WMT ACTIONABLE

Richard Baxter, Founder, SEOgadget

...............................................................................................................................................................................................

So this is how my presentation started out.

...............................................................................................................................................................................................

I ASKED @DBSEO: HE USES GWMT FOR

• Links to your site

• Internal links

• URL parameters

• Crawl errors

• Crawl stats & Index Status

• Fetch as Google

• Sitemaps

• HTML Improvements

• Settings – geo-targeting

The most useful comment: areas that lead to further “investigation or are used as part of another process”

...............................................................................................................................................................................................

Features =

BORING presentation

...............................................................................................................................................................................................

Here is what I do to make all that data actionable

...............................................................................................................................................................................................

#1 Find nasty indexed duplicate

Parameters of horrible-ness

...............................................................................................................................................................................................

This is probably *my* most useful report in GWMT: the URL Parameters report shows the parameters Google found while crawling the site

...............................................................................................................................................................................................

See how useful that is? A quick route to *all* the duplicates. Almost.
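
One way to turn that parameter list into a to-do list is to group crawled URLs by what is left once the query string is stripped. The following is a minimal Python sketch, assuming you already have a list of URLs from a crawl or log export; the sample URLs and parameter names are made up for illustration and are not from the report.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qsl


def parameter_report(urls):
    """Count how often each query parameter appears and group URLs by host and path."""
    param_counts = Counter()
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # keep_blank_values so "?sort=" still counts as a parameter
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            param_counts[name] += 1
        variants[(parts.netloc, parts.path)].add(url)
    # A path reachable under more than one URL is a duplicate candidate
    duplicates = {key: sorted(group) for key, group in variants.items() if len(group) > 1}
    return param_counts, duplicates


# Hypothetical sample data
urls = [
    "http://www.example.com/hotels?sort=price",
    "http://www.example.com/hotels?sort=name&page=2",
    "http://www.example.com/hotels",
    "http://www.example.com/about",
]
param_counts, duplicates = parameter_report(urls)
print(param_counts.most_common())
print(duplicates)
```

Parameters with the highest counts are usually the first ones worth checking against the URL Parameters report.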

...............................................................................................................................................................................................

#2 Deciding if I need to do some log file analysis

...............................................................................................................................................................................................

This is a simple, *very* high-level log file analyser. It’s cool, but not directly actionable

...............................................................................................................................................................................................

Yeah, there’s something wrong but what?! Let’s do some log file analysis…

...............................................................................................................................................................................................

WHAT A BASIC LOG LOOKS LIKE (FOR NORMAL PEOPLE)

Log file entry for Googlebot crawling from 10.230.15.234

Request IP Address: 10.230.15.234

Timestamp: [19/May/2012:10:10:18 +0100]

Request Type: GET

Request URL: /all-about/Gainsborough%20Hotel

Protocol: HTTP/1.1

Header Response: 200

Bytes Transferred: 4 53

Referrer: Often blank, but NOT ALWAYS!

UA: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
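
The fields above can be pulled out of a raw log line with a regular expression. This is a sketch only: it assumes the server writes the common Apache/Nginx "combined" log format, which the slides do not state, and the sample line is rebuilt from the slide's fields with an illustrative byte count.

```python
import re

# "Combined" log format; an assumption, check your own server configuration
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Rebuilt from the slide's fields; the byte count (1234) is illustrative only
sample = (
    '10.230.15.234 - - [19/May/2012:10:10:18 +0100] '
    '"GET /all-about/Gainsborough%20Hotel HTTP/1.1" 200 1234 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_line(sample)
print(entry["url"], entry["status"], "Googlebot" in entry["user_agent"])
```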

...............................................................................................................................................................................................

Data extracted from server logs, frequently occurring request URIs categorised into buckets…
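
Bucketing can be as simple as a list of substring rules applied to each request URI. A rough Python sketch follows; the rules, the bucket names and the sample URIs are hypothetical and would need adapting to your own URL structure.

```python
from collections import Counter

# Hypothetical rules: (substring to look for, bucket name); the first match wins
BUCKET_RULES = [
    ("service=ajax", "Ajax"),
    (".jpg", "Image"),
    (".png", "Image"),
    ("/iframe/", "iFrame"),
    ("/widget/", "Widget"),
]


def bucket_for(uri):
    """Return the bucket name for one request URI, or "Page" if no rule matches."""
    lowered = uri.lower()
    for needle, bucket in BUCKET_RULES:
        if needle in lowered:
            return bucket
    return "Page"


def count_buckets(uris):
    """Count how many request URIs fall into each bucket."""
    return Counter(bucket_for(uri) for uri in uris)


# Hypothetical request URIs
uris = [
    "/news/uk-news/some-story-130523?service=ajax&item=Articles",
    "/news/uk-news/some-story-130523",
    "/images/logo.png",
    "/iframe/comments?id=7",
    "/widget/weather",
    "/photos/header.JPG",
]
bucket_counts = count_buckets(uris)
print(bucket_counts.most_common())
```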

...............................................................................................................................................................................................

THIS IS THE TYPE OF DATA YOU CAN GET

[Chart: totals per bucket, axis 0 to 120,000. Series: Count of Request URL, Count of Ajax, Count of Image, Count of iFrame, Count of Widgetame]

...............................................................................................................................................................................................

BAD GOOGLEBOT

Data extracted from server logs, frequently occurring request URIs categorised into buckets…

{ "ajaxSrc" : "http://www.domain.co.uk/news/uk-news/massive-increase-in-ritalin-prescriptions-for-hyperactive-130523?service=ajax&item=Articles"}

...............................................................................................................................................................................................

40,000 blank pages crawled EVERY day by Googlebot, and do they tell you?

Nope.

...............................................................................................................................................................................................

AND SERVER HEADERS YOU’RE SERVING

[Chart: total requests per response code, axis 0 to 120,000. Categories: 301, 404, 200, 500, (blank), 302, 503]

OH: I’m serving Googlebot with 100,000 301 redirects a day?! Thanks for letting me know, GWMT…
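
A tally like this takes only a few lines once the log is parsed. The sketch below assumes each log entry is already a dict with `status` and `user_agent` keys (hypothetical names), and it trusts the user agent string, which anyone can fake.

```python
from collections import Counter


def status_counts(entries, agent_token="Googlebot"):
    """Count response codes for requests whose user agent contains agent_token."""
    counts = Counter()
    for entry in entries:
        if agent_token in entry["user_agent"]:
            # Entries with no recorded status are grouped under "(blank)"
            counts[entry["status"] or "(blank)"] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical parsed entries
entries = [
    {"status": "301", "user_agent": GOOGLEBOT_UA},
    {"status": "301", "user_agent": GOOGLEBOT_UA},
    {"status": "200", "user_agent": GOOGLEBOT_UA},
    {"status": "", "user_agent": GOOGLEBOT_UA},
    {"status": "404", "user_agent": "Mozilla/5.0 (Windows NT 6.1)"},
]
googlebot_statuses = status_counts(entries)
print(googlebot_statuses.most_common())
```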

...............................................................................................................................................................................................

#3 Deal with 404 errors at scale*

*(Reduce by about 60% dynamically)

...............................................................................................................................................................................................

LARGE VOLUMES OF 404 ERRORS

Sigh. Anyone fancy working through these one by one – first 1,000 will be easy,

thanks GWMT!

...............................................................................................................................................................................................

FIX LOTS OF 404 ERRORS WITH LEVENSHTEIN DISTANCE

Article by Russ: http://mz.cm/TjRokj

Gunnertech’s WP plugin: http://bit.ly/Q3LQao

https://seogadget.co.uk/excel-for-seo-mast3rclass-wour-nzz-w3bin4r/

becomes

https://seogadget.co.uk/excel-for-seo-masterclass-our-next-webinar/

How good is that?!

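// $score is assumed to be the Levenshtein distance to the closest known URL, and $correct that URL
if ($score < 20) {
    // Close enough: send a permanent redirect to the best match
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: $correct");
    exit;
} else {
    // Too different: fall through to the normal 404 handling
    return;
}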
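
The PHP fragment above only shows the redirect decision. As a hedged illustration of the matching step, here is a Python sketch that scores a requested path against a list of known paths and suggests a redirect target. The function names and the list of known paths are assumptions; the threshold of 20 mirrors the `$score<20` test on the slide.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def suggest_redirect(requested_path, known_paths, max_score=20):
    """Return the closest known path, or None if nothing scores under max_score."""
    if not known_paths:
        return None
    best = min(known_paths, key=lambda path: levenshtein(requested_path, path))
    return best if levenshtein(requested_path, best) < max_score else None


# Paths taken from URLs that appear in this deck
known_paths = [
    "/excel-for-seo-masterclass-our-next-webinar/",
    "/comparing-link-data-tools/",
    "/mozscape",
]
suggestion = suggest_redirect("/excel-for-seo-mast3rclass-wour-nzz-w3bin4r/", known_paths)
print(suggestion)
```

A fixed threshold of 20 is generous for short URLs, so it may be worth scaling it with the length of the requested path before redirecting automatically.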

...............................................................................................................................................................................................

Don’t you HATE that you can only get 1,000 site errors from the web front end of GWMT?

I do…

...............................................................................................................................................................................................

#4 Problem solved

Getting WMT Data with Xampp

...............................................................................................................................................................................................

Install XAMPP to C:\xampp\ from http://www.apachefriends.org/en/xampp-windows.html

...............................................................................................................................................................................................

Edit php.ini in here

Your programs go here

...............................................................................................................................................................................................

REMOVE THE ; FROM EXTENSION=PHP_CURL.DLL

Edit Line 990 in c:\xampp\php\php.ini

...............................................................................................................................................................................................

CHANGE MAX_EXECUTION_TIME TO 90

For *very* slow hotel internet only

Now check it works
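
If you would rather confirm both php.ini edits than trust line numbers, a small script can check them. This is a sketch under assumptions: it reads the file contents as plain text, treats lines starting with `;` as commented out, and only looks at the two settings named on these slides.

```python
import re


def check_php_ini(text):
    """Return (curl_enabled, max_execution_time) for the given php.ini contents."""
    curl_enabled = False
    max_execution_time = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # blank line, or a setting that is still commented out
        match = re.match(r"([\w.]+)\s*=\s*(.*)", line)
        if not match:
            continue  # section headers such as [PHP]
        key = match.group(1).lower()
        value = match.group(2).split(";")[0].strip()  # drop any trailing comment
        if key == "extension" and value.lower() == "php_curl.dll":
            curl_enabled = True
        elif key == "max_execution_time":
            max_execution_time = int(value)
    return curl_enabled, max_execution_time


# Illustrative fragments, not a full php.ini
before = "[PHP]\n;extension=php_curl.dll\nmax_execution_time=30\n"
after = "[PHP]\nextension=php_curl.dll\nmax_execution_time=90\n"
print(check_php_ini(before))
print(check_php_ini(after))
```

In practice you would pass in the contents of c:\xampp\php\php.ini rather than the inline strings.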

...............................................................................................................................................................................................

Yep, that works

...............................................................................................................................................................................................

NOW CREATE A FOLDER STRUCTURE

Inside C:\xampp\htdocs create a my-programs folder

...............................................................................................................................................................................................

\WMT-FETCH-DATA.PHP

Via mz.cm/JrMpXV thanks to @markginsberg for the intro – full documentation from Google can be found here: bit.ly/PCrAfv

...............................................................................................................................................................................................

OOPS – DON’T FORGET TO RENAME GWTDATA.PHP

Simple file rename required

...............................................................................................................................................................................................

AND, YOU'RE DONE

FILES! Precious, awesome data.

...............................................................................................................................................................................................

#5 Check if those errors are still errors

...............................................................................................................................................................................................

IS THIS STILL AN ERROR?

Use SEO Tools for Excel via: http://nielsbosma.se/projects/seotools/

IIS executes links in JS – be warned
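
The same re-check can be scripted outside Excel. Below is a minimal Python sketch using only the standard library; `fetch_status` and `still_broken` are illustrative names, and the demo at the bottom uses a stand-in fetcher with made-up URLs so it runs without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """HEAD-request a URL and return the HTTP status, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def still_broken(urls, fetch=fetch_status):
    """Split URLs into (still erroring, now fine) lists of (url, status) pairs."""
    broken, fixed = [], []
    for url in urls:
        status = fetch(url)
        target = broken if status is None or status >= 400 else fixed
        target.append((url, status))
    return broken, fixed


# Offline demo with a stand-in fetcher (hypothetical URLs and statuses)
fake_statuses = {
    "http://www.example.com/old-page": 404,
    "http://www.example.com/fixed-page": 200,
}
broken, fixed = still_broken(fake_statuses, fetch=fake_statuses.get)
print(broken)
print(fixed)
```

Note that `urlopen` follows redirects, so the status reported is that of the final URL, and some servers answer HEAD requests differently from GET.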

...............................................................................................................................................................................................

#6 Identify your linked-to error pages

...............................................................................................................................................................................................

IDENTIFY ERROR PAGES WITH LINKS

Use our Mozscape API extension for Excel to get the ACTUAL linked pages…

https://seogadget.co.uk/mozscape

...............................................................................................................................................................................................

#7 Do a proper link analysis by combining GWMT, Majestic + SEOmoz

...............................................................................................................................................................................................

DO A FULL SITE LINK ANALYSIS

GWMT has (by far) the most diverse link data, but not all of it!

https://seogadget.co.uk/comparing-link-data-tools/
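
Combining the exports is mostly a de-duplication exercise. The sketch below assumes each tool's export has already been reduced to a plain list of linking URLs (the sample data is invented), normalises them crudely, and counts unique links and unique referring hostnames.

```python
from urllib.parse import urlsplit


def referring_domain(url):
    """Hostname without a leading "www."; a crude stand-in for the registered domain."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def combine_link_exports(exports):
    """Merge {tool: [linking URLs]} into (url -> set of tools, set of referring domains)."""
    seen_by = {}
    for tool, urls in exports.items():
        for url in urls:
            normalised = url.strip().rstrip("/")  # crude normalisation, an assumption
            seen_by.setdefault(normalised, set()).add(tool)
    domains = {referring_domain(url) for url in seen_by}
    return seen_by, domains


# Hypothetical exports
exports = {
    "GWMT": ["http://www.example.org/page-a", "http://blog.example.net/post/"],
    "Majestic": ["http://www.example.org/page-a", "http://example.com/links"],
    "OSE": ["http://example.org/page-b"],
}
links, domains = combine_link_exports(exports)
print(len(links), "unique links from", len(domains), "referring domains")
```

Keeping the set of tools per link also shows which links only one provider reports.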

...............................................................................................................................................................................................

PASTE YOUR COMBINED LINK DATA INTO CLEANUP

tools.seogadget.co.uk – link clean-up and contact or use our api:

tools.seogadget.co.uk/use_api/

...............................................................................................................................................................................................

#8 Use the SEOmoz Pro Crawler

It’s excellent

...............................................................................................................................................................................................

Hey boss, check out my badass error fixing code skills….

...............................................................................................................................................................................................

URL

Long URL

Overly-Dynamic URL

4XX (Client Error)

5XX (Server Error)

301 (Permanent Redirect)

Temporary Redirect

Title Missing or Empty

Meta Refresh

Title Element Too Short

Title Element Too Long (> 70 Characters)

Duplicate Page Content

Duplicate Page Title

Too Many On-Page Links

Missing Meta Description Tag

Meta-robots Nofollow

Blocked by X-robots

Blocked by meta-robots

Rel Canonical

Search Engine blocked by robots.txt

http_status_code

x_robots_tag_header

content_type_header

location_header

title

link_count

meta_description_tag

meta_robots_tag

meta_refresh_tag

rel_canonical_tag

duplicate_page_content

duplicate_title

time_crawled

blocking_all_user_agents

blocking_google

blocking_yahoo

blocking_bing

referrer

SEOmoz’s deep crawl export contains over 30 different flags and data points

including x-robots and user agent blocks. Nice – pro.seomoz.org
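
With that many columns, a short script helps pull out the rows worth fixing first. This sketch assumes the export is a CSV whose header row uses the column names listed above; the three sample rows and the way values are filled in are hypothetical.

```python
import csv
import io

# Hypothetical sample using a handful of the export's column names
SAMPLE_EXPORT = """URL,http_status_code,title,rel_canonical_tag,duplicate_page_content
http://www.example.com/,200,Home,http://www.example.com/,
http://www.example.com/old,404,,,
http://www.example.com/a?x=1,200,Page A,,http://www.example.com/a
"""


def rows_needing_attention(csv_text):
    """Return (url, [problems]) for rows with an error status, no title or duplicate content."""
    issues = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = []
        status = row["http_status_code"]
        if status.startswith(("4", "5")):
            problems.append("error " + status)
        if not row["title"].strip():
            problems.append("missing title")
        if row["duplicate_page_content"].strip():
            problems.append("duplicate content")
        if problems:
            issues.append((row["URL"], problems))
    return issues


issues = rows_needing_attention(SAMPLE_EXPORT)
for url, problems in issues:
    print(url, problems)
```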

...............................................................................................................................................................................................

#9 Use Bing…

...............................................................................................................................................................................................

[Chart: #Links Reported and #UNIQUE RDs per tool, axis 0 to 50,000. Tools: MAJESTIC HISTORIC, GWMT, MAJESTIC FRESH, OSE, BING, SEARCHMETRICS, aHrefs]

LINK DIVERSITY VIA EXPORT NOT GREAT

Because the export data is limited, only about 25% of the links Bing reports are available to us

...............................................................................................................................................................................................

LINK ANALYSIS CAN FILTER BY ANCHOR AND LINK TYPE

This is pretty cool, great for detecting over-optimised anchor text

...............................................................................................................................................................................................

MARKUP VALIDATOR DOESN'T SPOT ARTICLE SCHEMA

Sigh – this is not as actionable and awesome as Google’s Rich Snippet Testing Tool. It can’t see Twitter Cards yet, either.

...............................................................................................................................................................................................

SEO ANALYZER IS AWESOME

This is why you should be using Bing Webmaster Tools!

...............................................................................................................................................................................................

REPORTS AND DATA

Similar to GWMT’s index status

...............................................................................................................................................................................................

INDEX EXPLORER

This is a supremely useful tool – check out the ? subfolder, which lists all of the query parameters getting indexed by Bing. Nice.

Richard Baxter, Founder, SEOgadget

Twitter: @richardbaxter

Blog: seogadget.co.uk

Email: richard@seogadget.co.uk

THANK YOU
