A presentation from SMX Melbourne 2012 on how to make data from Google Webmaster Tools, Bing and SEOmoz actionable. It references log file analysis and external crawl tools to reinforce the learnings from the major tool providers.
MAKING DATA FROM GWMT & BING WMT ACTIONABLE
Richard Baxter, Founder, SEOgadget
...............................................................................................................................................................................................
So this is how my presentation started out.
...............................................................................................................................................................................................
I ASKED @DBSEO WHAT HE USES GWMT FOR
• Links to your site
• Internal links
• URL parameters
• Crawl errors
• Crawl stats & Index Status
• Fetch as Google
• Sitemaps
• HTML Improvements
• Settings – geo-targeting
The most useful comment: “Areas that lead to further investigation or are used as part of another process”
...............................................................................................................................................................................................
Features = BORING presentation
...............................................................................................................................................................................................
Here is what I do to make all that data actionable
...............................................................................................................................................................................................
#1 Find nasty indexed duplicate parameters of horrible-ness
...............................................................................................................................................................................................
This is probably *my* most useful report in GWMT: the URL Parameter report shows the parameters found via Google’s crawl.
...............................................................................................................................................................................................
See how useful that is? A quick route to *all* the duplicates. Almost.
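The same question (which parameters are creating duplicate URLs?) can be asked of any URL list you hold yourself, such as a crawl export or server log. A minimal PHP sketch, assuming the URLs are already in an array: it groups URLs by path with the query string removed, and counts how often each parameter name appears. The function names are hypothetical and not part of GWMT or any tool mentioned here.

```php
<?php
// Sketch: spot parameter-driven duplicate URLs in a list of crawled URLs
// (e.g. from a crawl export or server log). Function names are hypothetical.

/**
 * Group URLs by scheme + host + path (query string dropped) and return only
 * the paths reached by more than one distinct URL.
 */
function findParameterDuplicates(array $urls): array
{
    $groups = [];
    foreach ($urls as $url) {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['host'])) {
            continue; // skip malformed or relative URLs
        }
        $key = ($parts['scheme'] ?? 'http') . '://' . $parts['host'] . ($parts['path'] ?? '/');
        $groups[$key][$url] = true; // keyed by URL so repeats count once
    }
    $duplicates = [];
    foreach ($groups as $key => $variants) {
        if (count($variants) > 1) {
            $duplicates[$key] = array_keys($variants);
        }
    }
    return $duplicates;
}

/** Count how many URLs use each query parameter name, most common first. */
function countParameters(array $urls): array
{
    $counts = [];
    foreach ($urls as $url) {
        $query = parse_url($url, PHP_URL_QUERY);
        if (!is_string($query) || $query === '') {
            continue;
        }
        parse_str($query, $params); // note: rewrites "." and " " in names to "_"
        foreach (array_keys($params) as $name) {
            $counts[$name] = ($counts[$name] ?? 0) + 1;
        }
    }
    arsort($counts);
    return $counts;
}
```

Paths with several variants are the duplicate candidates, and the parameter counts suggest which parameters to investigate (or configure in GWMT) first.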
...............................................................................................................................................................................................
#2 Deciding if I need to do some log file analysis
...............................................................................................................................................................................................
This is a simple, *very* high-level log file analyser. It’s cool, but not directly actionable.
...............................................................................................................................................................................................
Yeah, there’s something wrong but what?! Let’s do some log file analysis…
...............................................................................................................................................................................................
WHAT A BASIC LOG LOOKS LIKE (FOR NORMAL PEOPLE)
Log file entry for Googlebot crawling from 10.230.15.234
Request IP Address: 10.230.15.234
Timestamp: [19/May/2012:10:10:18+0100]
Request Type: GET
Request URL: /all-about/Gainsborough%20Hotel
Protocol: HTTP/1.1
Header Response: 200
Bytes Transferred: 4 53
Referrer: Often blank, but NOT ALWAYS!
UA: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
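The same fields can be pulled out of a raw log line in code. A minimal sketch, assuming the Apache/Nginx "combined" log format; the function names are hypothetical, and the sample line is modelled on the entry above with an illustrative byte count. Matching "Googlebot" in the user agent is only a first filter, because anyone can send that string: confirming a real Googlebot takes a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname.

```php
<?php
// Sketch: parse one Apache/Nginx "combined" format log line into the fields
// described above. Function names are hypothetical.

function parseCombinedLogLine(string $line): ?array
{
    // ip, ident, user, [timestamp], "method url protocol", status, bytes, "referrer", "user agent"
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // not a combined-format line
    }
    return [
        'ip'         => $m[1],
        'timestamp'  => $m[2],
        'method'     => $m[3],
        'url'        => $m[4],
        'protocol'   => $m[5],
        'status'     => (int) $m[6],
        'bytes'      => $m[7] === '-' ? 0 : (int) $m[7],
        'referrer'   => $m[8] === '-' ? '' : $m[8], // often blank, but not always
        'user_agent' => $m[9],
    ];
}

function isGooglebot(array $entry): bool
{
    // User-agent match only: a first filter, not proof (the UA can be spoofed)
    return stripos($entry['user_agent'], 'Googlebot') !== false;
}

// Hypothetical sample line modelled on the entry above (byte count is illustrative)
$line = '10.230.15.234 - - [19/May/2012:10:10:18 +0100] "GET /all-about/Gainsborough%20Hotel HTTP/1.1" 200 1024 "-" '
      . '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';
$entry = parseCombinedLogLine($line);
```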
...............................................................................................................................................................................................
Data extracted from server logs: frequently occurring request URIs, categorised into buckets…
...............................................................................................................................................................................................
THIS IS THE TYPE OF DATA YOU CAN GET
[Chart: requests from the server logs by bucket: total request URLs, Ajax, image, iFrame and widget (y-axis 0 to 120,000)]
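Bucketing request URIs like this can be sketched as an ordered list of URL patterns, where the first match wins. The patterns below are hypothetical (only the `service=ajax` parameter comes from the example URL on the next slide); replace them with whatever distinguishes Ajax, image, iFrame and widget requests on your own site.

```php
<?php
// Sketch: assign each requested URL to a bucket and count the buckets.
// Hypothetical rules, checked in order; the first matching pattern wins.
const BUCKET_RULES = [
    'ajax'   => '/[?&]service=ajax(&|$)/i',
    'image'  => '/\.(jpe?g|png|gif|webp)(\?|$)/i',
    'iframe' => '/iframe/i',
    'widget' => '/widget/i',
];

function bucketFor(string $url): string
{
    foreach (BUCKET_RULES as $bucket => $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return $bucket;
        }
    }
    return 'other'; // no rule matched
}

function bucketCounts(array $urls): array
{
    $counts = ['total' => 0];
    foreach ($urls as $url) {
        $bucket = bucketFor($url);
        $counts[$bucket] = ($counts[$bucket] ?? 0) + 1;
        $counts['total']++;
    }
    return $counts;
}
```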
...............................................................................................................................................................................................
BAD GOOGLEBOT
{ "ajaxSrc" : "http://www.domain.co.uk/news/uk-news/massive-increase-in-ritalin-prescriptions-for-hyperactive-130523?service=ajax&item=Articles"}
...............................................................................................................................................................................................
40,000 blank pages crawled EVERY day by Googlebot, and do they tell you? Nope.
...............................................................................................................................................................................................
AND SERVER HEADERS YOU’RE SERVING
[Chart: total requests by response status code: 301, 404, 200, 500, (blank), 302, 503 (y-axis 0 to 120,000)]
OH: I’m serving Googlebot 100,000 301 redirects a day?! Thanks for letting me know, GWMT…
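A breakdown like this is a simple tally once the log lines are matched. A minimal sketch, assuming combined-format logs and identifying Googlebot by user agent only; the function name is hypothetical.

```php
<?php
// Sketch: tally response status codes served to Googlebot, per day,
// from Apache/Nginx "combined" format log lines.

function googlebotStatusByDay(iterable $lines): array
{
    // Captures: 1 = day (e.g. 19/May/2012), 2 = status code, 3 = user agent
    $pattern = '/\[(\d{2}\/\w{3}\/\d{4}):[^\]]*\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    $tally = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m) !== 1) {
            continue; // not a combined-format line
        }
        if (stripos($m[3], 'Googlebot') === false) {
            continue; // not Googlebot by user agent (UA strings can be spoofed)
        }
        $day = $m[1];
        $status = $m[2];
        $tally[$day][$status] = ($tally[$day][$status] ?? 0) + 1;
    }
    return $tally;
}
```

For example, `googlebotStatusByDay(file('access.log', FILE_IGNORE_NEW_LINES))` with a hypothetical log path; for very large logs, pass a `SplFileObject` instead so lines are read one at a time.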
...............................................................................................................................................................................................
#3 Deal with 404 errors at scale*
*(Reduce by about 60% dynamically)
...............................................................................................................................................................................................
LARGE VOLUMES OF 404 ERRORS
Sigh. Anyone fancy working through these one by one? The first 1,000 will be easy, thanks GWMT!
...............................................................................................................................................................................................
FIX LOTS OF 404 ERRORS WITH LEVENSHTEIN DISTANCE
Article by Russ: http://mz.cm/TjRokj Gunnertech’s WP plugin: http://bit.ly/Q3LQao
https://seogadget.co.uk/excel-for-seo-mast3rclass-wour-nzz-w3bin4r/
becomes
https://seogadget.co.uk/excel-for-seo-masterclass-our-next-webinar/
How good is that?!
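// $score: Levenshtein distance between the requested URL and the closest valid URL, $correct
if ($score < 20) {
    // Close enough: permanently redirect to the matching URL
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: $correct");
    exit;
} else {
    // Too different to guess safely: fall through to the normal 404 page
    return;
}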
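The lookup that supplies `$score` and `$correct` can be sketched with PHP's built-in `levenshtein()` function. This is not the code from Russ's article or the Gunnertech plugin, just a minimal illustration; `$knownUrls` (every valid path on the site, for example from your sitemap) and the function name are assumptions.

```php
<?php
// Sketch: find the known URL closest (by Levenshtein distance) to a missing one.
// $knownUrls is hypothetical: e.g. every valid path from your sitemap or CMS.

function closestUrl(string $requested, array $knownUrls): ?array
{
    $best = null;
    $bestScore = PHP_INT_MAX;
    foreach ($knownUrls as $candidate) {
        $score = levenshtein($requested, $candidate);
        if ($score < 0) {
            continue; // PHP < 8.0 returns -1 when an argument exceeds 255 characters
        }
        if ($score < $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }
    return $best === null ? null : ['url' => $best, 'score' => $bestScore];
}

$match = closestUrl(
    '/excel-for-seo-mast3rclass-wour-nzz-w3bin4r/',
    ['/excel-for-seo-masterclass-our-next-webinar/', '/about/', '/contact/']
);
```

The slide's example paths are only a handful of edits apart, well under the threshold of 20 in the redirect snippet. Comparing against every known URL is linear in the number of URLs for each 404, so on a large site you would want to cache results.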
...............................................................................................................................................................................................
Don’t you HATE that you can only get 1,000 site errors from the web front end of GWMT?
I do…
...............................................................................................................................................................................................
#4 Problem solved
Getting WMT Data with XAMPP
...............................................................................................................................................................................................
Install XAMPP to C:\xampp\ from http://www.apachefriends.org/en/xampp-windows.html
...............................................................................................................................................................................................
Edit php.ini in here. Your programs go here.
...............................................................................................................................................................................................
REMOVE THE ; FROM extension=php_curl.dll
Edit line 990 in c:\xampp\php\php.ini
...............................................................................................................................................................................................
CHANGE max_execution_time TO 90
(For *very* slow hotel internet only)
Now check it works
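One way to confirm both php.ini changes took effect is a tiny test page. A minimal sketch; the file name check-setup.php is hypothetical. Open it through Apache in the browser rather than running it from the command line, because PHP's command-line interpreter ignores the php.ini value of max_execution_time and always reports 0.

```php
<?php
// check-setup.php (hypothetical name): save under C:\xampp\htdocs\ and open
// http://localhost/check-setup.php in a browser, not from the command line.

function setupReport(): array
{
    return [
        'curl_loaded'        => extension_loaded('curl'),
        'max_execution_time' => (int) ini_get('max_execution_time'),
        'php_ini'            => php_ini_loaded_file(), // path of the php.ini in use, or false
    ];
}

$report = setupReport();
echo '<pre>';
echo $report['curl_loaded'] ? "cURL: loaded\n" : "cURL: NOT loaded (check extension=php_curl.dll)\n";
echo 'max_execution_time: ' . $report['max_execution_time'] . "\n";
echo 'php.ini in use: ' . ($report['php_ini'] ?: 'none found') . "\n";
echo '</pre>';
```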
...............................................................................................................................................................................................
Yep, that works
...............................................................................................................................................................................................
NOW CREATE A FOLDER STRUCTURE
Inside C:\xampp\htdocs create a my-programs folder
...............................................................................................................................................................................................
IN MY-PROGRAMS
1. DOWNLOAD THIS FILE & SAVE: http://php-webmaster-tools-downloads.googlecode.com/files/gwtdata.v2.php
2. Create a sub folder called \csv
...............................................................................................................................................................................................
\WMT-FETCH-DATA.PHP
Via mz.cm/JrMpXV, thanks to @markginsberg for the intro. Full documentation from Google can be found here: bit.ly/PCrAfv
...............................................................................................................................................................................................
OOPS: DON’T FORGET TO RENAME GWTDATA.V2.PHP TO GWTDATA.PHP
A simple file rename is required
...............................................................................................................................................................................................
AND YOU'RE DONE
FILES! Precious, awesome data.
...............................................................................................................................................................................................
#5 Check if those errors are still errors
...............................................................................................................................................................................................
IS THIS STILL AN ERROR?
Use SEO Tools for Excel via: http://nielsbosma.se/projects/seotools/
IIS Executes links
IN JS – be warned
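If you would rather script the re-check than use Excel, PHP's cURL extension (enabled in php.ini earlier) can fetch the current status code of each URL. A minimal sketch that swaps PHP in for SEO Tools for Excel; the function names and user-agent string are hypothetical. Like most simple checkers it does not execute JavaScript, and some servers answer HEAD requests differently from GET, so spot-check anything surprising.

```php
<?php
// Sketch: re-check reported error URLs with PHP's cURL extension.

function fetchStatus(string $url, int $timeout = 15): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD-style request: headers only
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // report the first response (e.g. 301), not the final target
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; error-recheck)', // hypothetical UA
    ]);
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if the request failed
    curl_close($ch);
    return $status;
}

function isStillError(int $status): bool
{
    return $status === 0 || $status >= 400;
}

/** Returns [url => status] for URLs that still fail. */
function recheck(array $urls): array
{
    $stillBroken = [];
    foreach ($urls as $url) {
        $status = fetchStatus($url);
        if (isStillError($status)) {
            $stillBroken[$url] = $status;
        }
    }
    return $stillBroken;
}
```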
...............................................................................................................................................................................................
#6 Identify your linked-to error pages
...............................................................................................................................................................................................
IDENTIFY ERROR PAGES WITH LINKS
Use our Mozscape API extension for Excel to get the ACTUAL linked pages…
https://seogadget.co.uk/mozscape
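With link data in hand, finding the linked-to error pages is a join between two lists. A minimal sketch, assuming you have already turned the link export into an array of URL => linking root domains; that mapping, the trailing-slash normalisation and the function names are assumptions, not the output format of the Mozscape extension.

```php
<?php
// Sketch: keep only error URLs that have links, most-linked first.

function normaliseUrl(string $url): string
{
    return rtrim(trim($url), '/'); // hypothetical normalisation: ignore trailing slash
}

/** $linkCounts: [url => number of linking root domains] (hypothetical shape). */
function linkedErrorPages(array $errorUrls, array $linkCounts): array
{
    $counts = [];
    foreach ($linkCounts as $url => $domains) {
        $counts[normaliseUrl($url)] = (int) $domains;
    }
    $linked = [];
    foreach ($errorUrls as $url) {
        $key = normaliseUrl($url);
        if (($counts[$key] ?? 0) > 0) {
            $linked[$url] = $counts[$key];
        }
    }
    arsort($linked); // fix or redirect the most-linked error pages first
    return $linked;
}
```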
...............................................................................................................................................................................................
#7 Do a proper link analysis by combining GWMT, Majestic + SEOmoz
...............................................................................................................................................................................................
DO A FULL SITE LINK ANALYSIS
GWMT has (by far) the most diverse link data, but not all of it!
https://seogadget.co.uk/comparing-link-data-tools/
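Combining the exports largely comes down to de-duplicating linking pages and counting unique linking domains. A minimal sketch, assuming each tool's export has already been reduced to a flat list of linking-page URLs (column names differ per tool); the function name is hypothetical. It counts hosts with a leading "www." removed, which only approximates referring root domains: a proper count needs the Public Suffix List so that, say, blog.example.co.uk and example.co.uk merge.

```php
<?php
// Sketch: merge linking-page URLs from several link tools and count unique
// linking hosts. Assumes each export is already a flat list of linking URLs.

function mergeLinkSources(array $sources): array
{
    $pages = []; // linking URL => [source => true]
    foreach ($sources as $source => $urls) {
        foreach ($urls as $url) {
            $url = trim($url);
            if ($url !== '') {
                $pages[$url][$source] = true;
            }
        }
    }

    $hosts = [];
    foreach (array_keys($pages) as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host)) {
            // Approximation of a referring domain: lowercase, leading "www." removed
            $hosts[preg_replace('/^www\./', '', strtolower($host))] = true;
        }
    }

    return [
        'unique_pages' => count($pages),
        'unique_hosts' => count($hosts),
        'pages'        => array_map('array_keys', $pages), // URL => sources that reported it
    ];
}
```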
...............................................................................................................................................................................................
PASTE YOUR COMBINED LINK DATA INTO CLEANUP
tools.seogadget.co.uk: link clean-up and contact, or use our API: tools.seogadget.co.uk/use_api/
...............................................................................................................................................................................................
#8 Use the SEOmoz Pro Crawler
It’s excellent
...............................................................................................................................................................................................
Hey boss, check out my badass error fixing code skills….
...............................................................................................................................................................................................
Flags: URL, Long URL, Overly-Dynamic URL, 4XX (Client Error), 5XX (Server Error), 301 (Permanent Redirect), Temporary Redirect, Title Missing or Empty, Meta Refresh, Title Element Too Short, Title Element Too Long (> 70 Characters), Duplicate Page Content, Duplicate Page Title, Too Many On-Page Links, Missing Meta Description Tag, Meta-robots Nofollow, Blocked by X-robots, Blocked by meta-robots, Rel Canonical, Search Engine blocked by robots.txt
Data points: http_status_code, x_robots_tag_header, content_type_header, location_header, title, link_count, meta_description_tag, meta_robots_tag, meta_refresh_tag, rel_canonical_tag, duplicate_page_content, duplicate_title, time_crawled, blocking_all_user_agents, blocking_google, blocking_yahoo, blocking_bing, referrer
SEOmoz’s deep crawl export contains over 30 different flags and data points, including x-robots and user-agent blocks. Nice: pro.seomoz.org
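Once exported, the crawl CSV is easy to filter in code. A minimal sketch using the http_status_code and x_robots_tag_header columns listed above; the url column name, the exact value formats and the function names are assumptions, so check them against a real export.

```php
<?php
// Sketch: pull problem rows (4xx/5xx or X-Robots-Tag noindex) out of a crawl CSV export.

/** @param resource $handle an open, readable CSV stream with a header row */
function readCrawlCsv($handle): array
{
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false) {
        return [];
    }
    $rows = [];
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($fields) !== count($header)) {
            continue; // skip blank or malformed lines
        }
        $rows[] = array_combine($header, $fields);
    }
    return $rows;
}

function problemRows(array $rows): array
{
    return array_values(array_filter($rows, function (array $row): bool {
        $status = (int) ($row['http_status_code'] ?? 0);
        $xRobots = strtolower($row['x_robots_tag_header'] ?? '');
        return $status >= 400 || strpos($xRobots, 'noindex') !== false;
    }));
}
```

For a real file, open it with `fopen('crawl.csv', 'r')` (hypothetical file name) and pass the handle to `readCrawlCsv()`.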
...............................................................................................................................................................................................
#9 Use Bing…
...............................................................................................................................................................................................
[Chart: #Links Reported and #Unique RDs (referring domains) per tool: GWMT, Majestic Historic, Majestic Fresh, OSE, Bing, Searchmetrics, aHrefs (y-axis 0 to 50,000)]
LINK DIVERSITY VIA EXPORT NOT GREAT
Because the export data is limited, only about 25% of the links Bing reports are available to us.
...............................................................................................................................................................................................
LINK ANALYSIS CAN FILTER BY ANCHOR AND LINK TYPE
This is pretty cool, and great for detecting over-optimised anchor text.
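The same check can be run on any anchor text export: compute each anchor's share of all links and flag the heavy hitters. A minimal sketch; the function names, the default threshold and the list of ignored (brand or URL) anchors are hypothetical and depend on the site.

```php
<?php
// Sketch: anchor text share from a link export, to spot over-optimised anchors.

function anchorShares(array $anchors): array
{
    $counts = [];
    foreach ($anchors as $anchor) {
        // Normalise whitespace and case so near-identical anchors group together
        $key = strtolower(trim(preg_replace('/\s+/', ' ', (string) $anchor)));
        $key = $key === '' ? '(empty)' : $key;
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts);
    $total = array_sum($counts);
    $shares = [];
    foreach ($counts as $anchor => $n) {
        $shares[$anchor] = $n / $total; // fraction of all links using this anchor
    }
    return $shares;
}

/** Anchors whose share exceeds $threshold, skipping (lowercase) brand or URL anchors in $ignore. */
function flagOverOptimised(array $shares, float $threshold = 0.10, array $ignore = []): array
{
    $flagged = [];
    foreach ($shares as $anchor => $share) {
        if ($share > $threshold && !in_array((string) $anchor, $ignore, true)) {
            $flagged[$anchor] = $share;
        }
    }
    return $flagged;
}
```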
...............................................................................................................................................................................................
MARKUP VALIDATOR DOESN'T SPOT ARTICLE SCHEMA
Sigh: this is not as actionable and awesome as Google’s Rich Snippet Testing Tool. It can’t see Twitter Cards yet, either.
...............................................................................................................................................................................................
SEO ANALYZER IS AWESOME
This is why you should be using Bing Webmaster Tools!
...............................................................................................................................................................................................
REPORTS AND DATA
Similar to GWMT’s index status
...............................................................................................................................................................................................
INDEX EXPLORER
This is a supremely useful tool – check out the ? Subfolder – all of the query
parameters getting indexed by Bing. Nice.
Richard Baxter, Founder, SEOgadget
Twitter: @richardbaxter
Blog: seogadget.co.uk
Email: richard@seogadget.co.uk
THANK YOU