Academic Computing & Network Services. Search Analytics: Conversations with Your Customers. Rich Wiggins, Senior Information Technologist, Michigan State University

Search Analytics: Conversations with Your Customers


DESCRIPTION

Did you know that the search box on your home page handles half or more of all your visitors' requests? What do people search for most often when they visit your Web site? How can you tune your site search -- and your site -- to perform better? Rich Wiggins presents a talk that he and co-author Lou Rosenfeld prepared, covering the topics of search analytics, Best Bets, and tuning your Web site to match what your customers seek.

Citation preview

Page 1: Search Analytics: Conversations with Your Customers

Search Analytics: Conversations with Your Customers

Rich Wiggins

Senior Information Technologist

Michigan State University

Page 2: Search Analytics: Conversations with Your Customers

Blacksburg Condolences

Page 3: Search Analytics: Conversations with Your Customers

Thesis

• By analyzing search logs, you engage in a conversation with your customers

• At best, it’s a two-way conversation:
  – Your users tell you what they seek
  – You tune your search engine (and your site) to give them what they seek the most

• If you’re not analyzing your search logs, then you aren’t listening to your customers

• Search is too important to leave in the hands of robots

Page 4: Search Analytics: Conversations with Your Customers

The Wonderful Things Search Engines Do

• Help harness massive amounts of content
  – Thousands, millions, billions of URLs

• Cut across barriers
  – Document structure
  – Topical structure
  – Institutional structure, silos

Page 5: Search Analytics: Conversations with Your Customers

The Horrible Things that Search Engines Do

• Confuse low-value content with vital content
  – And point to obsolete content
  – And draft, internal, duplicative content

• Rank leaf pages ahead of starting points

• Rank popular or personal pages ahead of official content

Page 6: Search Analytics: Conversations with Your Customers

Understand the Importance of the Search Box

Page 7: Search Analytics: Conversations with Your Customers

MSU Keywords: Accidental Thesaurus

• Circa 1999 MSU’s local AltaVista stopped scaling

• Search for “human resources” and you get resume for a student in the HR program

• We had to do something

• We asked AltaVista for a way to goose the real HR site to the top of the hit list

• They didn’t deliver

• So we rolled our own Best Bets service, called it MSU Keywords

• And it worked!

Page 8: Search Analytics: Conversations with Your Customers

Methodology

• Study the most popular unique searches

• Map each to appropriate URL
  – “human resources” -> hr.msu.edu
  – “campus map” -> www.msu.edu/maps

• Watch the results:

• User complaints go down

• So do content provider complaints

• Continue to watch, learn, and act
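
The mapping step in this methodology can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MSU Keywords implementation: `BEST_BETS`, `normalize`, and `best_bet_for` are hypothetical names, the whitespace and case normalization is an assumption, and only the two query-to-URL pairs come from the slide.

```python
# Hypothetical Best Bets table; the two entries are the examples from the slide.
BEST_BETS = {
    "human resources": "hr.msu.edu",
    "campus map": "www.msu.edu/maps",
}


def normalize(query):
    """Lowercase a query and collapse runs of whitespace (an assumed normalization)."""
    return " ".join(query.lower().split())


def best_bet_for(query):
    """Return the hand-picked URL for a query, or None when no Best Bet exists."""
    return BEST_BETS.get(normalize(query))
```

A search front end could call `best_bet_for` before running the regular engine and show the returned URL above the ordinary hit list.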

Page 9: Search Analytics: Conversations with Your Customers

Google Has Trained ’Em to Search First

• Top 10 searches, www.msu.edu, Jan 2007

• “map” is a top search even with a map logo on the home page

• MSU Usability Center, testing 2006 redesign, ordered testers to stay away from the search box

• Nielsen 50% theory may underestimate

Count  Unique Query
7218   campus map
5859   map
5184   im west
4320   library
3745   study abroad
3690   schedule of courses
3584   bookstore
3575   spartantrak
3229   angel
3204   cata

Page 10: Search Analytics: Conversations with Your Customers

The Zipf Curve: Short Head, Torso, and Long Tail

Page 11: Search Analytics: Conversations with Your Customers

Keep It In Proportion

• 7218 campus map
• 5859 map
• 5184 im west
• 4320 library
• 3745 study abroad
• 3690 schedule of courses
• 3584 bookstore
• 3575 spartantrak
• 3229 angel
• 3204 cata

Page 12: Search Analytics: Conversations with Your Customers

Find the Sweet Spot; Avoid Diminishing Returns

Rank  Cumulative Percent  Count  Query
1     1.40                7218   campus map
14    10.53               2464   housing
42    20.18               1351   webenroll
98    30.01               650    computer center
221   40.05               295    msu union
500   50.02               124    hotels
7877  80.00               7      department of surgery
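
The cumulative-percent column above is what makes the sweet spot visible: it shows how many of the top queries you must handle to cover a given share of all searches. A rough sketch of that calculation follows. The function names are hypothetical, and the input is simply a list of per-query counts; the toy numbers in the usage note are invented for illustration and are not MSU data.

```python
def cumulative_coverage(counts):
    """Return (rank, count, cumulative_percent) rows, most frequent query first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []
    rows = []
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        rows.append((rank, count, 100.0 * running / total))
    return rows


def rank_needed(counts, target_percent):
    """Smallest rank whose cumulative share reaches target_percent, or None."""
    for rank, _count, percent in cumulative_coverage(counts):
        if percent >= target_percent:
            return rank
    return None
```

With toy counts `[50, 30, 10, 5, 5]`, `rank_needed(counts, 80)` reports that the top two queries already cover 80 percent of searches, the same kind of reading the table supports for the real log.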

Page 13: Search Analytics: Conversations with Your Customers

Does Best Bets Apply to Everyone?

• Walter Underwood, former chief architect of Ultraseek:

• Perhaps you need a better search engine instead of Best Bets

• Best Bets requires human labor
  – Commitment of time and attention
  – … so do good search engine implementations

Page 14: Search Analytics: Conversations with Your Customers

We Didn’t Start the Fire; Credit to:

• Vilfredo Pareto, circa 1890 – “the law of the vital few” (simplified as “80-20 rule”)

• George Kingsley Zipf, Harvard, circa 1932 – counting the words used in Joyce’s Ulysses – “the” is more common than “no” or “Dublin”

• Bradford’s Law of Scattering, circa 1934 – a small number of journals accounts for a large percent of all important papers
  – Cited, most importantly, by the pricing model of Elsevier for leading scientific journals

Page 15: Search Analytics: Conversations with Your Customers

Anatomy of a Search Log (from Google Search Appliance)

• Critical elements: IP address, time/date stamp, query, and # of results:

• XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02

• XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16

• XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17

Full legend and more examples available from book site
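
As an illustration of pulling the critical elements out of a line like those above, here is a small Python sketch. It is not the Perl script mentioned on the book site. The pattern and the function name `parse_search_log_line` are hypothetical, and the reading of the four trailing numbers as HTTP status, response size, number of results, and elapsed seconds is an assumption inferred from the sample lines, so check it against the full legend before relying on it.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout of the trailing fields: HTTP status, response size,
# number of results, elapsed seconds.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<results>\d+) (?P<seconds>[\d.]+)$'
)


def parse_search_log_line(line):
    """Return the critical fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # parse_qs decodes "+" and %XX escapes, so "license+plate" becomes "license plate".
    params = parse_qs(urlsplit(match.group("url")).query)
    return {
        "ip": match.group("ip"),
        "timestamp": match.group("timestamp"),
        "query": params.get("q", [""])[0],
        "results": int(match.group("results")),
    }
```

Run over the first two sample lines, this would show the misspelled query returning 0 results and the corrected query, two seconds later, returning 146.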

Page 16: Search Analytics: Conversations with Your Customers

Sample Query Analysis Report

Excel template available from book site

Page 17: Search Analytics: Conversations with Your Customers

Querying your Queries: Some basic questions 1/2

1. What are the most common unique queries?

2. Do any interesting patterns emerge from analyzing these common queries?

3. When common queries are searched, are the results the ones your users should be seeing?

4. Which common queries retrieve zero results?

5. Which common queries retrieve a large number of results, say 100 or more?

Page 18: Search Analytics: Conversations with Your Customers

Querying your Queries: Some basic questions 2/2

6. Which common queries retrieve results that don’t get clicked through?

7. What page is the top source (referrer) per common query?

8. What is the number of click-throughs per common query?

9. Which result is most frequently clicked-through per common query?

10. What’s the average query length (number of terms, number of characters)?

11. Which URLs are users searching for?
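
Several of these questions can be answered directly from parsed log records. The sketch below covers questions 1, 4, and 10 only (most common queries, common zero-result queries, and average query length in terms). It assumes each record has already been reduced to a `(query, result_count)` pair; the function name and the shape of the returned dictionary are illustrative choices, not something taken from the talk. Click-through questions need click data that a plain search log may not contain.

```python
from collections import Counter


def summarize_queries(records, top_n=10):
    """Summarize (query, result_count) pairs for questions 1, 4, and 10."""
    counts = Counter()
    zero_hits = Counter()
    for query, results in records:
        counts[query] += 1
        if results == 0:
            zero_hits[query] += 1
    total = sum(counts.values())
    # Average number of whitespace-separated terms per search, weighted by frequency.
    if total:
        average_terms = sum(len(q.split()) * n for q, n in counts.items()) / total
    else:
        average_terms = 0.0
    return {
        "most_common": counts.most_common(top_n),
        "zero_result": zero_hits.most_common(top_n),
        "average_terms": average_terms,
    }
```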

Page 19: Search Analytics: Conversations with Your Customers

Tune your Questions: Broad to specific

• Netflix asks:

1. Which movies most frequently searched?

2. Which of them most frequently clicked through?

3. Which of them least frequently added to queue (and why)?

Examples:
  – “OO7” versus “007”
  – Porn-related (not carried by Netflix)
  – “yoga”: not stocking enough? or not indexing enough record content?

Page 20: Search Analytics: Conversations with Your Customers

SA as Diagnostic Tool: What can you fix or improve?

• User Research

• Interface Design: search entry interface, search results

• Retrieval Algorithm Modification

• Navigation Design

• Metadata Development

• Content Development

Page 21: Search Analytics: Conversations with Your Customers

User Research: What do they want?…

• SA is a true expression of users’ information needs (often surprising: e.g., SKU numbers at LL Bean; URLs at IBM)

• Provides context by displaying aspects of single search sessions

Page 22: Search Analytics: Conversations with Your Customers

User Research: …who wants it?…

• What can you learn from knowing these things?
  – What specific segments want; determined by:
    • Security clearance
    • IP address
    • Job function
    • Account information
  – Which pages they initiate searches from

Page 23: Search Analytics: Conversations with Your Customers

Look for Topical Patterns and Seasonal Changes

Page 24: Search Analytics: Conversations with Your Customers

User Research: …and when do they want it?

• Time-based variation (and clustered queries)

• By hour, by day, by season

• Helps determine “best bets” and “guide” development
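
One way to look at time-based variation is to bucket the log's time/date stamps. The sketch below assumes stamps in the format seen in the Google Search Appliance sample (for example `10/Jul/2006:10:25:46 -0800`); `count_by` and its `key` argument are hypothetical names. Parsing month abbreviations with `%b` depends on an English or C locale.

```python
from collections import Counter
from datetime import datetime

# Format of the bracketed time/date stamp in the sample log lines.
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def count_by(timestamps, key):
    """Count searches per bucket, where key maps a datetime to a bucket label."""
    buckets = Counter()
    for stamp in timestamps:
        buckets[key(datetime.strptime(stamp, TIMESTAMP_FORMAT))] += 1
    return buckets
```

Passing `lambda d: d.hour` groups by hour of day, `lambda d: d.weekday()` by day of week, and `lambda d: d.month` gives a rough seasonal view.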

Page 25: Search Analytics: Conversations with Your Customers

Search Entry Interface Design: “The Box” or something else?

• SA identifies “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added (e.g., revise search, browsing alternative)

• Syntax of queries informs selection of search features to expose (e.g., use of Boolean operators, fielded searching)

…OR…
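
To see how often users actually type advanced syntax, queries can be tagged with the features they use. This is a rough heuristic sketch, not a parser for any particular search engine: the three patterns and the function name `query_syntax` are assumptions, Boolean operators are only recognized in upper case, and anything of the form `word:value` (including a pasted URL) counts as fielded.

```python
import re

# Heuristic patterns; real engines define their own query syntax.
BOOLEAN_OPERATOR = re.compile(r"\b(AND|OR|NOT)\b")
FIELDED_TERM = re.compile(r"\b\w+:\S+")
QUOTED_PHRASE = re.compile(r'"[^"]+"')


def query_syntax(query):
    """Return the set of syntax features a query appears to use."""
    features = set()
    if BOOLEAN_OPERATOR.search(query):
        features.add("boolean")
    if FIELDED_TERM.search(query):
        features.add("fielded")
    if QUOTED_PHRASE.search(query):
        features.add("phrase")
    return features
```

If almost every query comes back with an empty set, that is an argument for keeping the entry interface to a plain box.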

Page 26: Search Analytics: Conversations with Your Customers

Search Results Interface Design: Which results where?

• #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)

From SLI Systems (www.sli-systems.com)

Page 27: Search Analytics: Conversations with Your Customers

Search Results Interface Design: How to sort results?

• Financial Times has found that users often include dates in their queries

• Obvious but effective improvement: allow users to sort by date

Page 28: Search Analytics: Conversations with Your Customers

Search System: What to change?

• Identify new functionality: Financial Times added spell checking

• Retrieval algorithm modifications:
  – Deloitte, Barnes & Noble use SA to demonstrate that basic improvements (e.g., Best Bets) are insufficient
  – Financial Times weights company names higher

Page 29: Search Analytics: Conversations with Your Customers

Navigation: Any improvements?

• Michigan State University builds A-Z index automatically based on frequent queries
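
The slide does not say how MSU generates its index, so the following is only one plausible sketch: take frequent queries that already have a Best Bet URL and file them under their first letter. The function name, the `min_count` threshold, and the input dictionaries are all hypothetical.

```python
from collections import defaultdict


def build_az_index(query_counts, best_bets, min_count=1):
    """Group frequent queries that have a Best Bet URL under their first letter."""
    index = defaultdict(list)
    for query, count in query_counts.items():
        url = best_bets.get(query)
        # Skip empty queries, rare queries, and queries with no hand-picked URL.
        if not query or url is None or count < min_count:
            continue
        index[query[0].upper()].append((query, url))
    return {letter: sorted(entries) for letter, entries in sorted(index.items())}
```

Raising `min_count` keeps the index short and limited to the queries in the short head.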

Page 30: Search Analytics: Conversations with Your Customers

Navigation: Where does it fail?

• Track and study pages (excluding main page) where search is initiated
  – Are there obvious issues that would cause a “dead end”?
  – Are there user studies that could test/validate problems on these pages?

• Sandia Labs analyzes most requested documents to test content independent of site structure; results used to improve structure

Page 31: Search Analytics: Conversations with Your Customers

Metadata Development: How do users express their needs?

• SA provides a sense of tone: how users’ needs are expressed
  – Jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms)
  – Length (e.g., number of terms/query)
  – Syntax (e.g., Boolean, natural language, keyword)

Page 32: Search Analytics: Conversations with Your Customers

Metadata Development: Which metadata values?

• SA helps in the creation of controlled vocabularies

• Terms are fodder for metadata values (e.g., “cell phone,” “JFK” vs. “John Kennedy,” “country music”), especially for determining preferred terms

• Works with tools that cluster synonyms (example from www.behaviortracking.com), enabling concept searching and thesaurus development

Page 33: Search Analytics: Conversations with Your Customers

Metadata Development: Which metadata attributes?

• SA helps in the creation of vocabularies

• Simple cluster analysis can detect metadata attributes (e.g., “product,” “person,” “topic”)

• Look for variations between short head and long tail (Deloitte intranet: “known-item” queries are common; research topics are infrequent)

Figure labels: known-item queries, research queries

Page 34: Search Analytics: Conversations with Your Customers

Content Development: Do we have the right content?

• SA identifies content that can’t be found (0 results)

• Does the content exist? If so, there are wording, metadata, or spidering problems

• If not, why not?

www.behaviortracking.com

Page 35: Search Analytics: Conversations with Your Customers

Content Development: Are we featuring the right stuff?

• Clickthrough tracking helps determine which results should rise to the top (example: SLI Systems)

• Also suggests which “best bets” to develop to address common queries

Page 36: Search Analytics: Conversations with Your Customers

Organizational Impact: Educational Opportunities

• SA is a way to “reverse engineer” how your site performs in order to:
  – Sensitize organization to analytics, specifically related to findability
  – Sensitize content owners/authors to benefits of good practices around content titling, tagging, and navigational placement

Page 37: Search Analytics: Conversations with Your Customers

Organizational Impact: Rethinking how you do things

• Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage

• Discrepancy = possible breaking story; reporter is assigned to follow up

• Next step? Assign reporters to “beats” that emerge from SA
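
The spike-and-discrepancy idea can be sketched as a comparison of today's query counts against a baseline, skipping names already in current coverage. This is an illustrative heuristic, not the Financial Times' method: the function name, the `ratio` and `min_today` thresholds, and the `covered` set are all assumptions.

```python
def find_spikes(baseline_counts, today_counts, covered=frozenset(),
                ratio=3.0, min_today=10):
    """Return (term, usual, today) rows for uncovered terms searched far more than usual."""
    spikes = []
    for term, today in today_counts.items():
        if term in covered or today < min_today:
            continue
        usual = baseline_counts.get(term, 0)
        # Treat a never-seen term as having a baseline of 1 to avoid a zero threshold.
        if today >= ratio * max(usual, 1):
            spikes.append((term, usual, today))
    return sorted(spikes, key=lambda row: row[2], reverse=True)
```

Here `baseline_counts` might hold a typical day's count per name, and `covered` the names that already appear in recent stories, so whatever the function returns is a discrepancy worth a look.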

Page 38: Search Analytics: Conversations with Your Customers

SA as User Research Method: Sleeper, but no panacea

• Benefits
  – Non-intrusive
  – Inexpensive and (usually) accessible
  – Large volume of “real” data
  – Represents actual usage patterns

• Drawbacks
  – Provides an incomplete picture of usage: was user satisfied at session’s end?
  – Difficult to analyze: where are the commercial tools?

• Ultimately an excellent complement to qualitative methods (e.g., task analysis, field studies)

Page 39: Search Analytics: Conversations with Your Customers

SA Headaches: What gets in the way?

• Lack of time

• Few useful tools for parsing logs, generating reports

• Tension between those who want to perform SA and those who “own” the data (chiefly IT)

• Ignorance of the method

• Hard work and/or boredom of doing analysis

• From summer 2006 survey (134 responses), available at book site.

Page 40: Search Analytics: Conversations with Your Customers

Please Share Your SA Knowledge: Visit our “book in progress” site

• Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2007)

• Site URL: www.rosenfeldmedia.com/books/searchanalytics/

• Feed URL: feeds.rosenfeldmedia.com/searchanalytics/

• Site contains:
  • Reading list
  • Survey results
  • Perl script for parsing logs
  • Log samples
  • Report templates
  • …and more

Page 41: Search Analytics: Conversations with Your Customers

Contact Information

• Rich Wiggins

[email protected]

• Louis Rosenfeld

[email protected]

• http://rosenfeldmedia.com/books/searchanalytics