Originally presented at SXSW, March 13, 2011, on a panel with Fred Beecher and Austin Govella. Modified and updated for Web 2.0 Expo talk, October 12, 2011; UX Web Summit, September 26, 2012; Webdagene, September 10, 2013.
Site Search Analytics in a Nutshell
Louis Rosenfeld
@louisrosenfeld
Webdagene, 10 September 2013
Hello, my name is Lou www.louisrosenfeld.com |
www.rosenfeldmedia.com
Let's look at the data
No, let's look at the real data
Critical elements: IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
What are users searching?
How often are users failing?
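As a rough illustration of pulling those critical elements out of log lines like the ones on this slide, here is a minimal Python sketch. It assumes that the four numbers after the quoted request are status, bytes, result count, and elapsed seconds; that reading fits the sample (0 results for the misspelled query, 146 for the corrected one) but is not documented in the talk. The function name and the shortened sample URLs are illustrative only.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout: the numbers after the quoted request are
# status, bytes, result count, and elapsed seconds.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<results>\d+) (?P<seconds>[\d.]+)$'
)

def parse_search_log_line(line):
    """Return IP, timestamp, query, and result count, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    params = parse_qs(urlsplit(match["url"]).query)
    return {
        "ip": match["ip"],
        "timestamp": match["timestamp"],
        "query": params.get("q", [""])[0],  # "+" is decoded to a space
        "results": int(match["results"]),
    }

# The two requests from the slide, with most URL parameters left out for brevity.
sample_lines = [
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] '
    '"GET /search?access=p&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02',
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] '
    '"GET /search?access=p&q=license+plate&spell=1&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16',
]

rows = [parse_search_log_line(line) for line in sample_lines]

# "How often are users failing?": share of searches that returned nothing.
failure_rate = sum(row["results"] == 0 for row in rows) / len(rows)

for row in rows:
    print(row["timestamp"], repr(row["query"]), row["results"])
print("zero-result rate:", failure_rate)
```

Treating a zero-result search as a failure is only one possible definition; a search that returns many irrelevant results would not be caught by it.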
SSA is semantically rich data, and...
Queries sorted by frequency
...what users want--in their own words
A little goes a long way
A handful of queries/tasks/ways to navigate/features/documents meet the needs of your most important audiences
Not all queries are distributed equally
Nor do they diminish gradually
The 80/20 rule isn't quite accurate
(and the tail is quite long)
The Long Tail is much longer than you'd suspect
The Zipf Distribution, textually
Some things you can do with SSA
1. Make it harder to get lost in deep content
2. Make search smarter
3. Reduce jargon
4. Learn how your audiences differ
5. Know when to publish what
6. Own and enjoy your failures
7. Avoid disaster
8. Predict the future
#1 Make it harder to get lost
Start with basic SSA data: queries and query frequency
Percent: volume of search activity for a unique query during a particular time period
Cumulative Percent: running sum of percentages
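The percent and cumulative percent columns described above can be computed along these lines. This is a minimal sketch: the function name and the sample queries are made up for illustration, and lowercasing queries before counting is an assumption about how "unique query" is defined.

```python
from collections import Counter

def query_frequency_report(queries):
    """Rank queries by count, with percent and cumulative percent."""
    # Assumption: queries differing only in case or outer spaces are the same.
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    report = []
    cumulative = 0.0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        percent = 100.0 * count / total
        cumulative += percent  # running sum of percentages
        report.append((rank, query, count, round(percent, 2), round(cumulative, 2)))
    return report

# Hypothetical query log for one time period.
sample = ["campus map"] * 5 + ["library hours"] * 3 + ["parking"] * 2

for rank, query, count, percent, cumulative in query_frequency_report(sample):
    print(rank, query, count, percent, cumulative)
```

Reading down the cumulative column shows how few distinct queries account for a given share of all search activity.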
Tease out common content types
Took an hour to...
Analyze top 50 queries (20% of all search activity)
Ask and iterate: what kind of content would users be looking for when they searched these terms?
Add cumulative percentages
Result: prioritized list of potential content types
#1) application: 11.77%
#2) reference: 10.5%
#3) instructions: 8.6%
#4) main/navigation pages: 5.91%
#5) contact info: 5.79%
#6) news/announcements: 4.27%
Clear content types lead to better contextual navigation
artist descriptions, album reviews, album pages, artist bios, discography, TV listings
#2 Make search smarter
Clear content types improve search performance
Content objects related to products
Raw search results
Contextualizing advanced features
Session data suggest progression and context
Search session patterns:
1. solar energy 2. how solar energy works
1. solar energy 2. energy
1. solar energy 2. solar energy charts
1. solar energy 2. explain solar energy
1. solar energy 2. solar energy news
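Patterns like these can be recovered from query logs by grouping each visitor's queries into sessions and counting which query directly follows which. The sketch below makes two assumptions that are not from the talk: an IP address stands in for a visitor, and a 30-minute gap ends a session. The function names and sample events are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, not from the talk

def build_sessions(events, gap=SESSION_GAP):
    """Group (ip, time, query) events into per-visitor query sessions."""
    sessions = []
    last_seen = {}  # ip -> (time of last query, that visitor's open session)
    for ip, when, query in sorted(events, key=lambda e: e[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous[0] > gap:
            session = []  # first query, or too long since the last one
            sessions.append(session)
        else:
            session = previous[1]
        session.append(query)
        last_seen[ip] = (when, session)
    return sessions

def refinement_pairs(sessions):
    """Count how often one query is directly followed by a different one."""
    pairs = Counter()
    for session in sessions:
        for first, second in zip(session, session[1:]):
            if first != second:
                pairs[(first, second)] += 1
    return pairs

# Hypothetical events: two visitors, one of whom returns three hours later.
start = datetime(2013, 9, 10, 9, 0)
events = [
    ("10.0.0.1", start, "solar energy"),
    ("10.0.0.2", start + timedelta(minutes=1), "solar energy"),
    ("10.0.0.1", start + timedelta(minutes=2), "how solar energy works"),
    ("10.0.0.2", start + timedelta(minutes=3), "solar energy charts"),
    ("10.0.0.1", start + timedelta(hours=3), "solar energy news"),
]

sessions = build_sessions(events)
for (first, second), count in refinement_pairs(sessions).most_common():
    print(f"1. {first} 2. {second} ({count})")
```

Shared IP addresses (offices, campuses) will merge different people into one session, so cookie or login identifiers are preferable where the logs have them.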
Recognizing proper nouns, dates, and unique ID#s
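One simple way a search system might recognize such queries is pattern matching plus a lookup list. The sketch below is illustrative only: the date and ID patterns and the `known_names` list are assumptions, and real ID formats vary by site. The ID pattern is loosely modeled on the SKU format shown later in this deck.

```python
import re

# Assumed formats: 10/07/2006, 10-07-2006, or 2006-07-10.
DATE_PATTERN = re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$|^\d{4}-\d{2}-\d{2}$")
# Assumed format: optional "#", four or more digits, optional "-SUFFIX".
ID_PATTERN = re.compile(r"^#?\s*\d{4,}(-[0-9A-Z]+)?$", re.IGNORECASE)

def classify_query(query, known_names=frozenset()):
    """Label a query as 'date', 'id', 'proper noun', or 'other'."""
    text = query.strip()
    if DATE_PATTERN.match(text):
        return "date"
    if ID_PATTERN.match(text):
        return "id"
    if text.lower() in known_names:  # hypothetical list of people, companies
        return "proper noun"
    return "other"

names = frozenset({"vanguard", "open university"})
for q in ["2013-09-10", "# 39072-2AH1", "Vanguard", "license plate"]:
    print(q, "->", classify_query(q, names))
```

Once a query is recognized as a date or an ID, the search system can route it to a lookup rather than a full-text search.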
#3 Reduce jargon
Saving the brand by killing jargon at a community college
Jargon related to online education: FlexEd, COD, College on Demand
Marketing's solution: expensive campaign to educate public (via posters, brochures)
The Numbers (from SSA):
query rank   query
#22          online*
#101         COD
#259         College on Demand
#389         FlexTrack
*"online" part of 213 queries
Result: content relabeled, money saved
#4 Learn how your audiences differ
Who cares about what?
Why analyze queries by audience?
Fortify your personas with data
Learn about differences between audiences
  Open University Enquirers: 16 of 25 queries are for subjects not taught at OU
  Open University Students: search for course codes, topics dealing with completing program
Determine what's commonly important to all audiences (these queries better work well)
#5 Know when to publish what
Interest in the football team: going... ...going... gone
Time to study!
#6 Own and enjoy your failures
Where navigation is failing ("Professional Resources" page)
Do users and AIGA mean different things by "Professional Resources"?
Comparing what users find and what they want
Failed business goals? Developing custom metrics
Netflix asks:
1. Which movies most frequently searched? (query count)
2. Which of them most frequently clicked through? (MDP views)
3. Which of them least frequently added to queue? (queue adds)
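The three questions above amount to a successive filter: most searched, then most clicked, then least added. Here is a minimal sketch of that filter. The titles, counts, cutoffs, and the function name are all hypothetical; nothing here reflects how Netflix actually computes the metric.

```python
def find_disappointing_titles(stats, top_searched=4, top_clicked=3):
    """Return often-searched, often-clicked titles, fewest queue adds first.

    stats maps title -> {"queries": int, "mdp_views": int, "queue_adds": int}.
    """
    # 1. Which movies are most frequently searched?
    searched = sorted(stats, key=lambda t: stats[t]["queries"], reverse=True)
    searched = searched[:top_searched]
    # 2. Which of them are most frequently clicked through?
    clicked = sorted(searched, key=lambda t: stats[t]["mdp_views"], reverse=True)
    clicked = clicked[:top_clicked]
    # 3. Which of them are least frequently added to the queue?
    return sorted(clicked, key=lambda t: stats[t]["queue_adds"])

# Hypothetical counts for one reporting period.
stats = {
    "Title A": {"queries": 900, "mdp_views": 700, "queue_adds": 50},
    "Title B": {"queries": 800, "mdp_views": 600, "queue_adds": 400},
    "Title C": {"queries": 700, "mdp_views": 100, "queue_adds": 90},
    "Title D": {"queries": 600, "mdp_views": 500, "queue_adds": 300},
    "Title E": {"queries": 50, "mdp_views": 40, "queue_adds": 35},
}

print(find_disappointing_titles(stats))
```

A title at the head of the returned list drew plenty of searches and clicks but few queue adds, which points to a problem on the detail page or with availability rather than with search itself.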
#7 Avoid disasters
The new and improved search engine that wasn't
Vanguard used SSA to help benchmark existing search engine's performance and help select new engine
New search engine performed poorly
But IT needed convincing to delay launch
Information Architect & Dev Team Meeting
"Search seems to have a few problems..."
"Nah. Where's the proof? You can't tell for sure."
What to do? Test performance of common queries
"Before and after" testing using two sets of metrics
1. Relevance: how reliably the search engine returns the best matches first
2. Precision: proportion of relevant results clustered at the top of the list
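The two metrics above could be scored per query roughly as follows. This is one plausible reading, not the method Vanguard used: relevance is taken as the position of the hand-picked best match (lower is better), and precision as the share of the top results judged relevant (higher is better). The document IDs, the judgments, and the cutoff of five results are made up.

```python
def relevance_rank(result_ids, best_match_id):
    """1-based position of the best match, or None if it was not returned."""
    try:
        return result_ids.index(best_match_id) + 1
    except ValueError:
        return None

def precision_at_k(result_ids, relevant_ids, k=5):
    """Share of the top k results that were judged relevant."""
    top = result_ids[:k]
    if not top:
        return 0.0
    # Divides by the number of results actually shown, which may be under k.
    return sum(1 for r in top if r in relevant_ids) / len(top)

# Hypothetical results for one common query on each engine.
best_match = "doc1"
relevant = {"doc1", "doc3", "doc7"}
old_engine = ["doc1", "doc7", "doc3", "doc9", "doc4"]
new_engine = ["doc9", "doc4", "doc8", "doc1", "doc7"]

for name, results in [("old", old_engine), ("new", new_engine)]:
    print(name, relevance_rank(results, best_match), precision_at_k(results, relevant))
```

Running the same set of common queries against both engines and comparing the averages gives the before-and-after evidence the meeting asked for.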
Old engine (target) and new compared
Note: low relevance and high precision scores are optimal
uh-oh
better
More on Vanguard case study: http://bit.ly/D3B8c
#8 Predict the future
Shaping the Financial Times editorial agenda
FT compares these:
Spiking queries for proper nouns (i.e., people and companies)
Recent editorial coverage of people and companies
Discrepancy? Breaking story?! Let the editors know!
Seed your
Can SSA bring us together?
Lou's TABLE OF OVERGENERALIZED DICHOTOMIES
Web Analytics vs. User Experience
What they analyze
  Web Analytics: Users' behaviors (what's happening)
  User Experience: Users' intentions and motives (why those things happen)
What methods they employ
  Web Analytics: Quantitative methods to determine what's happening
  User Experience: Qualitative methods for explaining why things happen
What they're trying to achieve
  Web Analytics: Helps the organization meet goals (expressed as KPI)
  User Experience: Helps users achieve goals (expressed as tasks or topics of interest)
How they use data
  Web Analytics: Measure performance (goal-driven analysis)
  User Experience: Uncover patterns and surprises (emergent analysis)
What kind of data they use
  Web Analytics: Statistical data ("real" data in large volumes, full of errors)
  User Experience: Descriptive data (in small volumes, generated in lab environment, full of errors)
Lands' End and SKUs
SKU: # 39072-2AH1
Use SSA to start work on a site report card
SSA helps determine common information needs
Read this
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld (Rosenfeld Media, 2011)
www.rosenfeldmedia.com
Use code WEBDAGENE2013 for 20% off all Rosenfeld Media books
Say hello
Louis Rosenfeld
www.louisrosenfeld.com
www.rosenfeldmedia.com
www.slideshare.net/lrosenfeld
@louisrosenfeld @rosenfeldmedia