Search Analytics Business Value & NoSQL Backend

Published on 27-Jan-2015

Transcript

Search Analytics: Business Value & NoSQL Backend
Otis Gospodnetić, Sematext International
@otisg @sematext
sematext.com, sematext.com/search-analytics

About Otis Gospodnetić
  • ASF Member: Lucene, Solr, Nutch, Mahout
  • Author: Lucene in Action 1 & 2
  • Entrepreneur: Sematext, Simpy
Copyright 2011 Sematext Intl. All rights reserved.

Sematext Metrics
  • 100% organic: no GMO, no VC
  • 4 years old
  • < 10 people
  • 7 countries, 3 time zones, 2 continents
  • > 100 customers

About Sematext
Products & Services: Consulting, Development, Tech Support for:
  • Search (Lucene, Solr, ElasticSearch...)
  • Big Data (Hadoop, HBase, Voldemort...)
  • Web Crawling (Nutch, Droids)
  • Machine Learning (Mahout)

Agenda
  • What is Search Analytics and why it matters
  • Example reports and their value
  • What we built, why, and how

Communication
  • twitter.com/sematext, twitter.com/otisg
  • Hash tags: #stsa or #stanalytics
  • http://sematext.com/search-analytics/index.html
  • Raise your hand!
  • otis@sematext.com

The Compass
  • Search logs are your Map
  • Search Analytics is your Compass

High Level Why
  • search users / search experience / search providers

High Level Why
  • "This search sucks! It takes 17 tries to find anything here! F!?@#$%^&?!?"
  • search users / search experience / search providers
  • "Cool, the latest search tweaks made our site really sticky! Awesome!"

Don't Be Like This Dude

Got Clue?
  • Performance Monitoring
  • Tuning
  • Search Analytics
  • UI
  • Quality Assurance

More Concrete Why
  • Measure and monitor everything
  • Introspection
  • Supports (re)design and navigation choices
  • Helps with content acquisition & enhancement
  • Improves search experience
  • Mula

The Moment of Truth
Question for the audience #1: What do you use for Search Analytics?
  a) Home-grown stuff
  b) Google Analytics
  c) Omniture
  d) Webtrends
  e) Other
  f) Nothing

Search Analytics Outline
  • Collect: queries & clicks & interactions & ...
  • Analyze: actions / xactions / conversions
  • Output: reports over time
  • Output++: feedback loop (remember this)
  • The means, not the goal
  • Ongoing, not one-off

Search vs. Web Analytics
  • User intent and information needs vs. inferring
  • Hand in hand
  • Ideally you can relate data from both, or even unify it

Example Core Reports
  • Rate & Volume, Latency (mean, 90th percentile)
  • Click Through Rate, Mean Reciprocal Rank
  • Top Queries by count, clicks, 0 hits...
  • Query Trending
  • Top Seen Docs, Top Clicked Docs (msft)
  • Page & Click Depth
  • Facet & Sort Usage
  • ...

More Reports in More Detail
  • See "Search Analytics: What? Why? How?": http://blog.sematext.com/tag/analytics/

Part Dos
  • Switching gears...
  • Juno digs NoSQL

What We've Built
  • Search Analytics SaaS
  • Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
  • Trending over time
  • Comparisons of time periods
  • Top N reports
  • Filter, slice and dice

Who Needs a Compass?
  • We need it: search-hadoop.com & search-lucene.com
  • Our customers need it!
  • You?

Sematext Search Analytics

Big Dreams
  • SaaS
  • Multitenant
  • Large Scale
  • Massive Data
  • Cloud
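
The Example Core Reports slide lists Click Through Rate, Mean Reciprocal Rank and latency percentiles without defining them. Below is a minimal sketch of one way to compute them from a search log, assuming a hypothetical `SearchEvent` record that stores each search's latency and the 1-based ranks of the results clicked. CTR is taken here as the share of searches with at least one click (clicks per search is another common definition), MRR treats the best-ranked click as a proxy for the first relevant result, and the percentile uses the nearest-rank method.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    """Hypothetical log record: one search, its latency, and 1-based ranks of clicked hits."""
    query: str
    latency_ms: float
    clicked_ranks: List[int] = field(default_factory=list)


def click_through_rate(events):
    """Share of searches that received at least one click."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.clicked_ranks) / len(events)


def mean_reciprocal_rank(events):
    """Mean of 1/rank of the best-ranked click per search; searches with no click add 0."""
    if not events:
        return 0.0
    return sum(1.0 / min(e.clicked_ranks) for e in events if e.clicked_ranks) / len(events)


def latency_percentile(events, pct):
    """Nearest-rank percentile of latency, e.g. pct=90 for the 90th percentile."""
    values = sorted(e.latency_ms for e in events)
    if not values:
        raise ValueError("no events to compute a percentile from")
    rank = max(1, math.ceil(pct / 100 * len(values)))
    return values[rank - 1]


# Tiny made-up log: three of the four searches got a click.
events = [
    SearchEvent("solr facets", 120, [1]),
    SearchEvent("hbase scan", 300, [2]),
    SearchEvent("flume sink", 80, []),
    SearchEvent("lucene analyzers", 200, [4, 5]),
]
print(click_through_rate(events))      # 0.75
print(mean_reciprocal_rank(events))    # (1 + 1/2 + 0 + 1/4) / 4 = 0.4375
print(latency_percentile(events, 90))  # 300
```

Searches without any click pull MRR down rather than being skipped, so CTR and MRR move together when users stop clicking.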

Storage Choices
  • RDBMS: MySQL, PostgreSQL
  • HDFS
  • Hive
  • HBase
  • Cassandra

SaaS vs. In-House
Question for the audience #2: SaaS or in-house Search Analytics?
  a) SaaS
  b) in-house

Sematext Search Analytics

Data Flow
  • See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Data Collection
  • See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Core Tech
  • JavaScript Beacons
  • Metric Capture Web App, aka Receiver
  • Flume Agents, Collectors, Sinks
  • HBase
  • MapReduce Aggregations
  • Search Analytics Reporting Web App

What is Flume
  • Distributed data/log collection service
  • Scalable, configurable, extensible
  • Centrally manageable, open source
  • Agents get data from the app, Collectors save it
  • Abstractions: Source, Decorator(s), Sink

What is HBase
  • Scalable, reliable, distributed, column-oriented DB
  • On top of HDFS
  • MapReduce-able

Data Flow, Detailed

Why Flume
  • Reliable delivery, e.g.
    queue messages locally if the destination is unreachable
  • Easy, centralized management via Web UI or console
  • Good community, good progress, now @ASF
  • But: more complex, more moving parts
  • On Flume: slideshare.net/cloudera/inside-flume
  • Alternatives: Kafka, Scribe...

Why HBase
  • Scalable raw & aggregate data storage
  • MapReduce data input
  • Fast scans for time ranges, fast key lookups
  • Easy storage and compute power expansion
  • Good-looking roadmap, community, progress

Open Sourcing
  • 2 open-source projects: github.com/sematext/HBaseWD, github.com/sematext/HBaseHUT
  • See sematext.com/open-source/index.html
  • Patches for Flume and HBase: blog.sematext.com/tag/flume/

Challenges
  • Data size. Solutions:
      - Compression (4-5x smaller with LZO)
      - Data pruning (variable levels)
  • Query string distribution: very long tail
  • Lots of data to process, update, aggregate
  • Young tools: Flume, HBase
  • Poor IO on EC2
  • Hadoop distributions

Output++
  • AutoComplete: $MM improvement
  • Better DYM (Did You Mean) spellchecker
  • Related Searches
  • Recommendations
  • Relevance Feedback
  • ...

Closing the Loop
  • search users / search experience / search providers

Resource
  • "Search Analytics for Your Site", Louis Rosenfeld: http://rosenfeldmedia.com/books/searchanalytics/

We're Hiring
  • Dig Search? Dig Analytics? Dig Big Data? Dig Performance?
  • Dig working with and in open source?
  • We're hiring worldwide! http://sematext.com/about/jobs.html

Contact
  • sematext.com, blog.sematext.com
  • @sematext, @otisg
  • otis@sematext.com
  • Want SA? Grab me or go to sematext.com/search-analytics
  • Hash tags: #stsa or #stanalytics
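
The Why HBase slide credits HBase with fast scans for time ranges, and the Open Sourcing slide lists HBaseWD. A known catch with time-ordered row keys is that sequential writes all land in the same region; a common remedy, and the idea HBaseWD is built around as we understand it, is to prefix each key with a small bucket byte and, at read time, run one range scan per bucket and merge the results. The sketch below assumes a hypothetical key layout (tenant, timestamp, event id), a hypothetical `NUM_BUCKETS`, and a toy in-memory `SortedTable`; it does not use the HBase or HBaseWD APIs.

```python
import bisect
import heapq
import struct
import zlib

# Hypothetical bucket count: more buckets spread sequential writes wider,
# but every time-range read then needs one scan per bucket.
NUM_BUCKETS = 4


def original_key(tenant, ts_millis, event_id):
    """Hypothetical layout: tenant, 0x00 separator (tenant must not contain 0x00),
    big-endian timestamp, event id. Big-endian keeps byte order == time order."""
    return tenant.encode("utf-8") + b"\x00" + struct.pack(">QI", ts_millis, event_id)


def salted_key(key):
    """Prefix a one-byte bucket derived from a hash of the key, so consecutive
    timestamps fall into different key ranges (and so different regions)."""
    return bytes([zlib.crc32(key) % NUM_BUCKETS]) + key


class SortedTable:
    """Toy in-memory stand-in for an HBase table: rows kept sorted by key bytes."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def _bucket_scan(table, bucket, start, stop):
    """Scan one bucket and strip the salt byte so results sort by original key."""
    prefix = bytes([bucket])
    for key, value in table.scan(prefix + start, prefix + stop):
        yield key[1:], value


def scan_time_range(table, tenant, ts_start, ts_end):
    """Return a tenant's values with ts_start <= ts < ts_end in time order:
    one range scan per bucket, then a k-way merge on the unsalted key."""
    prefix = tenant.encode("utf-8") + b"\x00"
    start = prefix + struct.pack(">QI", ts_start, 0)
    stop = prefix + struct.pack(">QI", ts_end, 0)
    streams = [_bucket_scan(table, b, start, stop) for b in range(NUM_BUCKETS)]
    return [value for _, value in heapq.merge(*streams, key=lambda kv: kv[0])]


table = SortedTable()
for event_id, ts in enumerate([1000, 1001, 1002, 1003, 2000]):
    table.put(salted_key(original_key("acme", ts, event_id)), f"query-{event_id}")
table.put(salted_key(original_key("other", 1001, 9)), "other-tenant")

print(scan_time_range(table, "acme", 1000, 1004))
# ['query-0', 'query-1', 'query-2', 'query-3']
```

The cost of salting shows up in `scan_time_range`: every read fans out to `NUM_BUCKETS` scans, which argues for keeping the bucket count small.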
