Search Analytics Business Value & NoSQL Backend Otis Gospodnetić Sematext International @otisg @sematext sematext.com sematext.com/search-analytics

Search Analytics Business Value & NoSQL Backend

Page 1: Search Analytics Business Value & NoSQL Backend

Search Analytics

Business Value &

NoSQL Backend

Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com

sematext.com/search-analytics

Page 2: Search Analytics Business Value & NoSQL Backend

Copyright 2011 Sematext Int'l. All rights reserved.

About Otis Gospodnetić

• ASF Member: Lucene, Solr, Nutch, Mahout

• Author: Lucene in Action 1 & 2

• Entrepreneur: Sematext, Simpy

Page 3: Search Analytics Business Value & NoSQL Backend

Sematext Metrics
● 100% organic: no GMO, no VC
● 4 years old
● < 10 people
● 7 countries
● 3 timezones
● 2 continents
● > 100 customers

Page 4: Search Analytics Business Value & NoSQL Backend

About Sematext

Products & Services
Consulting, Development, Tech Support:

● Search (Lucene, Solr, ElasticSearch...)
● Big Data (Hadoop, HBase, Voldemort...)
● Web Crawling (Nutch, Droids)
● Machine Learning (Mahout)

Page 5: Search Analytics Business Value & NoSQL Backend

Agenda

● What is Search Analytics and why it matters
● Example reports and their value
● What we built, why, and how

Page 6: Search Analytics Business Value & NoSQL Backend

Communication
● twitter.com/sematext
● twitter.com/otisg
● hash tags: #stsa or #stanalytics
● http://sematext.com/search-analytics/index.html
● Raise your hand!
● [email protected]

Page 7: Search Analytics Business Value & NoSQL Backend

The Compass

Search logs are your Map
Search Analytics is your Compass

Page 8: Search Analytics Business Value & NoSQL Backend

High Level Why

search users

search providers

search experience

Page 9: Search Analytics Business Value & NoSQL Backend

High Level Why

search providers

search experience

This search sucks! It takes 17 tries to find anything here!

F!?@#$%^&?!?

search users

Cool, the latest search tweaks made our site really sticky!

Awesome!

Page 10: Search Analytics Business Value & NoSQL Backend

Don't Be Like This Dude

Page 11: Search Analytics Business Value & NoSQL Backend

Got Clue?

Search Analytics

Performance Monitoring

Quality Assurance

Tuning UI

Page 12: Search Analytics Business Value & NoSQL Backend

More Concrete Why
● Measure and monitor everything. Introspection.
● Supports (re)design, navigation choices
● Helps with content acquisition & enhancement
● Improve search experience
● Mula

Page 13: Search Analytics Business Value & NoSQL Backend

The Moment of Truth

Question for the audience #1

What do you use for Search Analytics?

a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing

Page 14: Search Analytics Business Value & NoSQL Backend

Search Analytics Outline
● Collect: queries & clicks & interactions & ...
● Analyze: actions / xactions / conversions
● Output: reports – over time
● Output++: feedback loop

● The means, not the goal
● Ongoing, not one-off

remember this

Page 15: Search Analytics Business Value & NoSQL Backend

Search vs. Web Analytics
● User intent and information needs vs. inferring
● Hand in hand
● Ideally you can relate data from both or even unify it

Page 16: Search Analytics Business Value & NoSQL Backend

Example Core Reports
● Rate & Volume, Latency (mean, avg, 90%)
● Click Through Rate, Mean Reciprocal Rank
● Top Queries by count, clicks, 0 hits...
● Query Trending
● Top Seen Docs, Top Clicked Docs (msft)
● Page & Click Depth
● Facet & Sort Usage
● ...
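
A few of the reports above reduce to short formulas. The following is a minimal sketch, assuming a hypothetical log format with one record per search (the field names `latency_ms` and `clicked_rank` are illustrative, not from the talk). It uses the first click as a proxy for the first relevant result when computing Mean Reciprocal Rank, and a nearest-rank percentile for latency.

```python
from statistics import mean

# Hypothetical search-log records: one dict per search.
# "clicked_rank" is the 1-based rank of the first clicked result, or None.
searches = [
    {"query": "hbase", "latency_ms": 40, "clicked_rank": 1},
    {"query": "flume", "latency_ms": 120, "clicked_rank": 2},
    {"query": "solr facets", "latency_ms": 80, "clicked_rank": None},
    {"query": "lucene", "latency_ms": 60, "clicked_rank": 4},
]


def click_through_rate(records):
    """Fraction of searches that led to at least one click."""
    clicked = sum(1 for r in records if r["clicked_rank"] is not None)
    return clicked / len(records)


def mean_reciprocal_rank(records):
    """Average of 1/rank of the first click; searches without a click count as 0."""
    return mean(
        1.0 / r["clicked_rank"] if r["clicked_rank"] else 0.0 for r in records
    )


def percentile(values, pct):
    """Nearest-rank percentile, with pct given as an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[rank - 1]


ctr = click_through_rate(searches)
mrr = mean_reciprocal_rank(searches)
p90 = percentile([r["latency_ms"] for r in searches], 90)
```

Treating a search with no click as a reciprocal rank of 0 is one common convention; a real report might instead exclude such searches and track them separately.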

Page 17: Search Analytics Business Value & NoSQL Backend

More Reports in More Detail
● See Search Analytics What? Why? How?
  http://blog.sematext.com/tag/analytics/

Page 18: Search Analytics Business Value & NoSQL Backend

Part Dos

Switching gears... Juno digs NoSQL

Page 19: Search Analytics Business Value & NoSQL Backend

What We've Built
● Search Analytics SaaS
● Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
● Trending over time
● Comparisons of time periods
● Top N reports
● Filter, slice and dice
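
Top N reports and comparisons of time periods both come down to counting. This is a rough sketch over in-memory lists, with made-up sample queries; it says nothing about how the actual service computes them at scale.

```python
from collections import Counter


def top_n(queries, n):
    """Return the n most frequent queries with their counts."""
    return Counter(queries).most_common(n)


def compare_periods(previous, current):
    """Per-query change in count between two periods (current minus previous)."""
    before, after = Counter(previous), Counter(current)
    return {q: after[q] - before[q] for q in set(before) | set(after)}


# Made-up sample data for two periods.
last_week = ["hbase", "hbase", "flume", "solr"]
this_week = ["hbase", "flume", "flume", "flume", "kafka"]

top_query = top_n(this_week, 1)
changes = compare_periods(last_week, this_week)
```

A positive value in `changes` marks a query that grew between the two periods, a negative one a query that declined or disappeared.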

Page 20: Search Analytics Business Value & NoSQL Backend

Who Needs a Compass?
● We need it
  ● search-hadoop.com & search-lucene.com
● Our customers need it!
● You?

Page 21: Search Analytics Business Value & NoSQL Backend

Sematext Search Analytics

Page 22: Search Analytics Business Value & NoSQL Backend

Big Dreams
● SaaS
● Multitenant
● Large Scale – Massive Data
● Cloud

Page 23: Search Analytics Business Value & NoSQL Backend

Storage Choices
● RDBMS: MySQL, PostgreSQL
● HDFS
● Hive
● HBase
● Cassandra

Page 24: Search Analytics Business Value & NoSQL Backend

SaaS vs. In-House
Question for the audience #2

SaaS vs in-house Search Analytics?

a) SaaS
b) in-house

Page 25: Search Analytics Business Value & NoSQL Backend

Sematext Search Analytics

Page 26: Search Analytics Business Value & NoSQL Backend

Sematext Search Analytics

Page 27: Search Analytics Business Value & NoSQL Backend

Sematext Search Analytics

Page 28: Search Analytics Business Value & NoSQL Backend

Sematext Search Analytics

Page 29: Search Analytics Business Value & NoSQL Backend

Data Flow
● See Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Page 30: Search Analytics Business Value & NoSQL Backend

Data Collection
● See Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Page 31: Search Analytics Business Value & NoSQL Backend

Core Tech
● JavaScript Beacons
● Metric Capture Web App aka Receiver
● Flume Agents, Collectors, Sinks
● HBase
● MapReduce Aggregations
● Search Analytics Reporting Web App
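
To make the first two items concrete: a beacon is typically a small HTTP request whose query string carries the event, and the receiver turns it into a record. The sketch below is only an illustration of that idea. The URL, the host `receiver.example.com`, and the parameter names `t`, `q` and `hits` are all hypothetical and are not taken from the Sematext product.

```python
from urllib.parse import urlparse, parse_qs


def parse_beacon(url):
    """Turn a beacon request URL into a flat event dict (hypothetical field names)."""
    params = parse_qs(urlparse(url).query)
    return {
        "type": params.get("t", ["search"])[0],
        "query": params.get("q", [""])[0],
        "hits": int(params.get("hits", ["0"])[0]),
    }


# Example beacon request as a receiver might see it (made-up URL).
event = parse_beacon("http://receiver.example.com/b?t=search&q=hbase+scan&hits=12")
```

In the pipeline listed above, a record like `event` would then be handed to a Flume agent rather than processed in the web app itself.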

Page 32: Search Analytics Business Value & NoSQL Backend

What is Flume
● Distributed data/log collection service
● Scalable, configurable, extensible
● Centrally manageable, open source

● Agents get data from app, Collectors save it
● Abstractions: Source → Decorator(s) → Sink
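
The Source → Decorator(s) → Sink abstraction can be pictured as a chain of stages that each consume and emit events. The following is a conceptual sketch using Python generators. It is not Flume's API or configuration syntax, and every function name in it is invented for illustration.

```python
def source(lines):
    """Source: yields raw events (here, from an in-memory list)."""
    for line in lines:
        yield {"body": line}


def add_host(events, host):
    """Decorator: annotates each event in transit."""
    for event in events:
        yield {**event, "host": host}


def drop_empty(events):
    """Decorator: filters out events with an empty body."""
    return (e for e in events if e["body"].strip())


def list_sink(events, store):
    """Sink: delivers events to their destination (here, a list)."""
    for event in events:
        store.append(event)
    return len(store)


store = []
pipeline = drop_empty(add_host(source(["q=hbase", "  ", "q=flume"]), "web1"))
delivered = list_sink(pipeline, store)
```

Because each stage only depends on the stream of events it receives, decorators can be added, removed or reordered without touching the source or the sink.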

Page 33: Search Analytics Business Value & NoSQL Backend

What is HBase
● Scalable, reliable, distributed, column-oriented DB
● On top of HDFS
● MapReducable

Page 34: Search Analytics Business Value & NoSQL Backend

Data Flow, Detailed

Page 35: Search Analytics Business Value & NoSQL Backend

Why Flume
● Reliable delivery
  ● e.g. queue msgs locally if destination unreachable
● Easy, centralized management via Web UI or console
● Good community, good progress, now @ASF
● But: more complex, more moving parts
● On Flume: slideshare.net/cloudera/inside-flume
● Alternatives: Kafka, Scribe...
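
The reliable-delivery point (queue messages locally if the destination is unreachable) can be sketched in a few lines. This is an illustration of the general idea only, not of how Flume implements it. The class `BufferingSender` and the simulated destination are hypothetical, and the queue here lives in memory, whereas a durable implementation would persist it to disk.

```python
from collections import deque


class BufferingSender:
    """Queues messages locally and flushes them, in order, once delivery succeeds."""

    def __init__(self, deliver):
        self.deliver = deliver  # callable that raises ConnectionError on failure
        self.pending = deque()

    def send(self, msg):
        self.pending.append(msg)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.deliver(self.pending[0])
            except ConnectionError:
                return False  # keep the message queued; retry on the next flush
            self.pending.popleft()
        return True


# Simulated destination that is down at first (for illustration only).
received = []
destination_up = False


def deliver(msg):
    if not destination_up:
        raise ConnectionError("destination unreachable")
    received.append(msg)


sender = BufferingSender(deliver)
sender.send("q=hbase")
sender.send("q=flume")
queued_while_down = len(sender.pending)  # both messages wait locally

destination_up = True
sender.send("q=solr")  # the backlog is flushed first, in order
```

A message is removed from the queue only after `deliver` returns without raising, so a failure mid-flush never loses or reorders messages.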

Page 36: Search Analytics Business Value & NoSQL Backend

Why HBase
● Scalable raw & aggregate data storage
● MapReduce data input
● Fast scans for time ranges, fast key lookups
● Easy storage and compute power expansion
● Good looking roadmap, community, progress
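
Fast time-range scans follow from the fact that HBase keeps rows sorted by row key: if the timestamp is part of the key, a time range becomes a contiguous key range. The sketch below imitates this with a sorted in-memory table. `SortedTable` is a toy stand-in, not the HBase client API, and the key layout (tenant, metric, zero-padded timestamp) is an assumed example rather than the schema used in the talk.

```python
from bisect import bisect_left


def row_key(tenant, metric, ts):
    """Fixed-width timestamp so that lexicographic order matches time order."""
    return f"{tenant}|{metric}|{ts:010d}"


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        """Point lookup by exact key."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start, stop):
        """Rows with start <= key < stop, in key order."""
        lo, hi = bisect_left(self.keys, start), bisect_left(self.keys, stop)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


table = SortedTable()
for ts, count in [(1300000300, 7), (1300000100, 5), (1300000200, 9)]:
    table.put(row_key("acme", "queries", ts), count)
table.put(row_key("other", "queries", 1300000150), 1)

hits = table.scan(row_key("acme", "queries", 1300000100),
                  row_key("acme", "queries", 1300000300))
```

Putting the tenant first keeps each customer's rows together, which suits a multitenant service; the scan above touches only the rows for one tenant and one metric.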

Page 37: Search Analytics Business Value & NoSQL Backend

Open Sourcing
● 2 open-source projects:
  github.com/sematext/HBaseWD
  github.com/sematext/HBaseHUT
● See sematext.com/open-source/index.html
● Patches for Flume and HBase
  blog.sematext.com/tag/flume/

Page 38: Search Analytics Business Value & NoSQL Backend

Challenges
● Data size. Solutions:
  ● Compression (4-5x smaller with lzo)
  ● Data pruning (variable levels)
● Query string distribution: very long-tail
● Lots of data to process, update, aggregate
● Young tools: Flume, HBase
● Poor IO on EC2
● Hadoop distributions
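
The long tail and data pruning are related: most distinct queries occur only a handful of times, so folding rare queries into a single bucket shrinks the data while keeping totals intact. The sketch below shows one possible reading of "pruning" with a configurable threshold. The threshold, the `(other)` bucket name and the sample counts are assumptions for illustration, not the approach described in the talk.

```python
from collections import Counter


def head_coverage(counts, top_n):
    """Share of total query volume covered by the top_n most frequent queries."""
    total = sum(counts.values())
    head = sum(c for _, c in counts.most_common(top_n))
    return head / total


def prune(counts, min_count):
    """Keep queries seen at least min_count times; fold the rest into one bucket."""
    kept = Counter({q: c for q, c in counts.items() if c >= min_count})
    tail = sum(c for c in counts.values() if c < min_count)
    if tail:
        kept["(other)"] = tail
    return kept


# Made-up counts: 3 frequent queries and 10 queries seen once each.
counts = Counter({"hbase": 50, "flume": 30, "solr": 10,
                  "hbase wd": 1, "flume sink": 1, "lzo": 1,
                  "nutch": 1, "mahout": 1, "ec2 io": 1,
                  "droids": 1, "hive": 1, "hdfs": 1, "scribe": 1})

coverage = head_coverage(counts, 3)
pruned = prune(counts, min_count=2)
```

Raising `min_count` gives a more aggressive pruning level; the total volume stays the same, but per-query detail for the tail is lost.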

Page 39: Search Analytics Business Value & NoSQL Backend

Output++
● AutoComplete - $MM improvement
● Better DYM Spellchecker
● Related Searches
● Recommendations
● Relevance Feedback
● ...
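
AutoComplete is the easiest of these to picture as a feedback loop: past queries, ranked by frequency, become suggestions for future ones. A minimal sketch, assuming a plain list of logged query strings; a production suggester would use a prefix index rather than a linear scan, and this is not a description of any Sematext product.

```python
from collections import Counter


def build_suggester(queries):
    """Return a function mapping a prefix to the most frequent matching queries."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())

    def suggest(prefix, limit=3):
        prefix = prefix.strip().lower()
        matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:limit]]

    return suggest


# Made-up query log.
suggest = build_suggester(
    ["hbase", "HBase", "hbase scan", "hadoop", "hbase scan", "hbase scan", "flume"]
)
suggestions = suggest("h")
```

Ranking by click-through or conversion instead of raw frequency would be a natural next step, since the same logs carry that signal.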

Page 40: Search Analytics Business Value & NoSQL Backend

Closing the Loop

search users

search providers

search experience

Page 41: Search Analytics Business Value & NoSQL Backend

Resource

http://rosenfeldmedia.com/books/searchanalytics/

Search Analytics for Your Site
Louis Rosenfeld

Page 42: Search Analytics Business Value & NoSQL Backend

We're Hiring

Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open-source?
We're hiring world-wide!
http://sematext.com/about/jobs.html

Page 43: Search Analytics Business Value & NoSQL Backend

Contact

sematext.com blog.sematext.com @sematext @otisg [email protected]

Want SA? Grab me or go to: sematext.com/search-analytics

Hash tags: #stsa or #stanalytics