Site search analytics workshop presentation

  • Published on
    28-Jan-2015


DESCRIPTION

Workshop presented at Webdagene 2013 (http://webdagene.no/en/) September 9, 2013; UX Lisbon (http://www.ux-lx.com), May 12, 2011; UX Hong Kong (http://www.uxhongkong.com/), February 17, 2011.

Transcript

  • 1. Workshop: Search Analytics for Your Site. Louis Rosenfeld, lou@louisrosenfeld.com, @louisrosenfeld. Webdagene, 9 September 2013

2. Hello, my name is Lou. www.louisrosenfeld.com | www.rosenfeldmedia.com
3. Agenda: 1. The basics of Site Search Analytics (SSA) 2. Exercise 1 (pattern analysis) 3. Things you can do with SSA 4. Exercise 2 (longitudinal analysis) 5. More things you can do with SSA 6. A case study 7. More on metrics 8. Things you can do today 9. Discussion
4. Let's look at the data
5. No, let's look at the real data. Critical elements in bold: IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2011:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2011:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
6. (same log excerpt) What are users searching?
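As a minimal sketch (not from the workshop itself), the four critical elements can be pulled out of log lines like the ones above with a few lines of Python; the layout of the trailing numbers (status, bytes, result count, elapsed seconds) is an assumption read off the sample lines:

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical parser for search-log lines in the format shown above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] '
    r'"GET (?P<path>\S+) HTTP/1\.\d" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_search_line(line):
    m = LOG_RE.match(line)
    if not m:
        return None  # not a search hit
    params = parse_qs(urlparse(m.group("path")).query)
    return {
        "ip": m.group("ip"),
        "ts": m.group("ts"),
        "query": params.get("q", [""])[0],  # parse_qs decodes '+' to ' '
        "results": int(m.group("results")),
    }
```

Run over a whole log, this yields the stream of (query, result count) records that the rest of the workshop's analyses are built on.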
7. (same log excerpt) How often are users failing?
8. SSA is semantically rich data, and...
9. (same) Queries sorted by frequency
10. ...what users want, in their own words
11. A little goes a long way: a handful of queries/tasks/ways to navigate/features/documents meet the needs of your most important audiences
12. (same) Not all queries are distributed equally
13. (same)
14. (same) Nor do they diminish gradually
15. (same)
16. (same) The 80/20 rule isn't quite accurate
17.–21. (and the tail is quite long) (repeated build slides)
22. (and the tail is quite long) The Long Tail is much longer than you'd suspect
23. The Zipf Distribution, textually
24. Insert Long Tail here
25. Agenda (repeated from slide 3)
26. Exercise 1 (pattern analysis): Work in pairs. Each pair should have a laptop with Microsoft Excel; laptop platform (Mac, PC) doesn't matter. Download data file: 2005-October.xls. Refer to the exercise sheet. No right answers. Have fun!
27. Agenda (repeated from slide 3)
28. Tune site-wide navigation
29. Nailing the basics in top-down navigation
30. (same)
31. Tune contextual navigation
32. Start with basic SSA data: queries and query frequency. Percent: volume of search activity for a unique query during a particular time period. Cumulative Percent: running sum of percentages
33. Tease out common content types
34. (same)
35. (same) Took an hour to: analyze top 50 queries (20% of all search activity); ask and iterate: what kind of content would users be looking for when they searched these terms?; add cumulative percentages. Result: prioritized list of potential content types: #1) application: 11.77% #2) reference: 10.5% #3) instructions: 8.6% #4) main/navigation pages: 5.91% #5) contact info: 5.79% #6) news/announcements: 4.27%
36. Clear content types lead to better contextual navigation: artist descriptions, album reviews, album pages, artist bios, discography, TV listings
37. Make search smarter
38. Clear content types improve search performance
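The Percent / Cumulative Percent arithmetic described above takes only a few lines; the query counts here are invented for illustration:

```python
from collections import Counter

# Invented query frequencies, standing in for a real search log.
counts = Counter({"license plate": 400, "renew registration": 250,
                  "forms": 150, "holiday hours": 120, "parking": 80})
total = sum(counts.values())

# For each unique query (most frequent first): percent of all search
# activity, plus the running (cumulative) percent.
cumulative = 0.0
rows = []
for query, n in counts.most_common():
    pct = 100.0 * n / total
    cumulative += pct
    rows.append((query, n, round(pct, 2), round(cumulative, 2)))

for row in rows:
    print(row)
```

Reading down the cumulative column shows how quickly a small head of queries covers most of the activity, which is the point of the Zipf slides above.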
39. Clear content types improve search performance
40. (same) Content objects related to products
41. (same) Raw search results
42. Enabling filtering/faceted search
43. Contextualizing advanced features
44. Session data suggest progression and context
45. (same) Search session pattern: 1. solar energy 2. how solar energy works
46. (same, adding) 1. solar energy 2. energy
47. (same, adding) 1. solar energy 2. solar energy charts
48. (same, adding) 1. solar energy 2. explain solar energy
49. (same, adding) 1. solar energy 2. solar energy news
50. Recognizing proper nouns, dates, and unique ID#s
51. Identifying a need for a glossary (2010 Louis Rosenfeld, LLC, www.louisrosenfeld.com. All rights reserved.)
52. Smarter best bets
53. Best bets without guessing
54. Frequent keywords recycled best bets
55. Learn how audiences differ
56. Who cares about what? (AIGA.org)
57. (same)
58. Who cares about what? (Open U)
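The session patterns above can be reconstructed from the raw (IP, timestamp, query) stream; a common, but arbitrary, assumption used in this sketch is that hits from the same IP less than 30 minutes apart belong to one session:

```python
from datetime import datetime, timedelta

def session_pairs(records, gap=timedelta(minutes=30)):
    """From (ip, timestamp, query) records, return the query
    reformulation pairs that occur within a single session.
    Session boundary: same IP, consecutive hits <= `gap` apart."""
    pairs = []
    last_by_ip = {}
    for ip, ts, query in sorted(records, key=lambda r: (r[0], r[1])):
        last = last_by_ip.get(ip)
        if last and ts - last[0] <= gap and query != last[1]:
            pairs.append((last[1], query))
        last_by_ip[ip] = (ts, query)
    return pairs
```

Counting how often each pair occurs surfaces progressions like "solar energy" followed by "how solar energy works".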
59.–61. Who cares about what? (Open U) (repeated build slides)
62. Why analyze queries by audience? Fortify your personas with data. Learn about differences between audiences. Open University enquirers: 16 of 25 queries are for subjects not taught at OU. Open University students: search for course codes and topics dealing with completing a program. Determine what's commonly important to all audiences (these queries had better work well).
63. Reduce jargon
64. Save the brand by killing jargon. Jargon related to online education: FlexEd, COD, College on Demand. Marketing's solution: expensive campaign to educate the public (via posters, brochures). Result: content relabeled, money saved.
    rank   query
    #22    online*
    #101   COD
    #259   College on Demand
    #389   FlexTrack
    *"online" is part of 213 queries
65. Agenda (repeated from slide 3)
66. Exercise 2 (longitudinal analysis): Work in pairs. Each pair should have a laptop with Microsoft Excel; laptop platform (Mac, PC) doesn't matter. Download data files: 2006-February.xls + 2006-June.xls. Refer to the exercise sheet. No right answers. Have fun!
67. Agenda (repeated from slide 3)
68. Know when to publish what
69. Interest in the football team: going...
70. (same) ...going...
71. (same) ...going... gone
72. (same) Time to study!
73. Before Tax Day
74. After Tax Day
75. Identify trends
76. Learn from failure
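One quick way to run the jargon check above on your own data is to look up where each candidate label ranks in the query-frequency table; the frequencies below are invented for illustration:

```python
# Invented query frequencies; the slide's point is the gap between the
# plain-language term and the internal jargon for the same thing.
freq = {
    "online": 1200,           # plain language
    "courses": 800,
    "COD": 30,                # internal jargon
    "College on Demand": 9,
    "FlexTrack": 4,
}
ranking = sorted(freq, key=freq.get, reverse=True)

def rank_of(term):
    """1-based rank of a candidate label in the query-frequency table."""
    return ranking.index(term) + 1
```

If a label's rank is hundreds of places below the plain term users actually type, that is a strong argument for relabeling content rather than educating the public.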
77. Failed navigation? Examining unexpected searching: look for places searches happen beyond the main page. What's going on? Navigational failure? Content failure? Something else?
78. Where navigation is failing (Professional Resources page): do users and AIGA mean different things by "Professional Resources"?
79. Comparing what users find and what they want
80. (same)
81. Failed business goals? Developing custom metrics. Netflix asks: 1. Which movies are most frequently searched? (query count) 2. Which of them are most frequently clicked through? (MDP views) 3. Which of them are least frequently added to the queue? (queue adds)
82. (same)
83. (same)
84. Learn from search sessions
85. Sample search session (Teach for America intranet)
86. Session analysis: these queries co-occur within sessions; why?
87. TFAnet session analysis results: searches for "delta ICEG" perform poorly (way below the fold); users then try an (incorrect) alternative ("delta learning team")
88. Identify content gaps
89. 0-results report (from behaviortracking.com): Are we missing something? Are we missing a type of something?
90. Identifying gaps helps force an issue
91. Identify failed content
92. 1. Choose a content type (e.g., events) 2. Ask: where should users go from here? 3. Analyze the frequent queries from this content type from aiga.org
93. Analyze frequent queries generated from each content sample
94. Make content owners into stakeholders
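A 0-results report like the one above is simple to compute once queries and result counts have been extracted from the log; a minimal sketch:

```python
from collections import Counter

def zero_results_report(records):
    """From (query, result_count) pairs, tally the queries that came
    back empty, most frequent first."""
    zeros = Counter(q for q, n in records if n == 0)
    return zeros.most_common()
```

The top of the report is the prioritized list of candidate content gaps (or spelling/synonym problems the engine should be handling).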
95. Sandia National Labs: regularly record which documents came up at position #1 for the 50 most frequent queries. If and when that top document falls out of position #1, the document's owner is alerted. Result: healthy dialogue (often about following policies and procedures, and their value).
96. Connecting pages (and their owners) that are found through search...
97. ...with how those pages were found
98. Predict the future
99. Shaping the Financial Times editorial agenda. The FT compares: spiking queries for proper nouns (i.e., people and companies) vs. recent editorial coverage of people and companies. Discrepancy? Breaking story?! Let the editors know!
100. Agenda (repeated from slide 3)
101. Avoiding a disaster at Vanguard. Vanguard used SSA to help benchmark the existing search engine's performance and help select a new engine. The new search engine performed poorly, but IT needed convincing to delay launch. Information Architect & Dev Team meeting: "Search seems to have a few problems." "Nah. Where's the proof? You can't tell for sure."
102. What to do? Test performance of the most frequent queries. Measure using the original two sets of metrics: 1. relevance: how reliably the search engine returns the best matches first 2. precision: proportion of relevant and irrelevant results clustered at the top of the list
103. Relevance: 5 metrics (queries tested have a best result). Mean: average distance from the top. Median: less sensitive to outliers, but not useful once at least half are ranked #1. Count below 1st: how often is the best target something other than 1st? Count below 5th: how often is the best target outside the critical area? Count below 10th: how often is the best target beyond the first page?
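The five relevance metrics can be computed directly from the list of positions at which each test query's best result appeared; a hypothetical sketch:

```python
from statistics import mean, median

def relevance_metrics(ranks):
    """`ranks` holds, for each frequent test query, the position at
    which the known-best result appeared in the search results."""
    return {
        "mean": mean(ranks),           # average distance from the top
        "median": median(ranks),       # less sensitive to outliers
        "below_1st": sum(r > 1 for r in ranks),    # best target not #1
        "below_5th": sum(r > 5 for r in ranks),    # outside critical area
        "below_10th": sum(r > 10 for r in ranks),  # beyond first page
    }
```

Run against the old and new engines over the same frequent queries, the two dictionaries give exactly the before/after comparison the Vanguard story turns on.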
104.–106. (repeated build slides of the same five metrics, annotated in turn: OK! / Hmmm... / Uh oh)
107. Precision: rating scale. Evalua...