OCTOBER 13-16, 2016 AUSTIN, TX

Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kiran Chitturi, Lucidworks

Event Processing and Data Analytics with Lucidworks Fusion

Kiran Chitturi, Software Engineer, Lucidworks

Problem Statement

• How to capture/record user events?
• How to use events/signals for recommendations?
• How to produce reports/analytics from user events?
• What type of recommendations can be generated for different user types?

Event collection - Snowplow JS tracker

• Library to collect user events from the client-side tier of websites and apps (https://github.com/snowplow/snowplow-javascript-tracker)
• Open source alternative to enterprise analytics tools
• Sends events using a tracking pixel
• The Fusion Signals API acts as a collector for Snowplow events
• Tracks page views, page pings, links and any custom configured events
• Documentation: https://github.com/snowplow/snowplow/wiki/javascript-tracker
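
To make the tracking-pixel idea concrete, here is a minimal Python sketch of the kind of GET URL a tracker requests for one page view. The short parameter names (e, url, page, p, tv, uid) follow the Snowplow tracker protocol as I understand it; the collector address and the tracker-version string are hypothetical placeholders and do not come from the slides.

```python
from urllib.parse import urlencode

def page_view_pixel_url(collector, page_url, page_title, user_id=None):
    """Build the GET URL a tracking pixel would request for one page view."""
    params = {
        "e": "pv",           # event type: page view
        "url": page_url,     # address of the page being viewed
        "page": page_title,  # page title
        "p": "web",          # platform
        "tv": "sketch-0.1",  # tracker version (made-up value)
    }
    if user_id is not None:
        params["uid"] = user_id
    return f"{collector}?{urlencode(params)}"

pixel = page_view_pixel_url(
    "https://collector.example.com/i",  # hypothetical collector endpoint
    "http://www.ecommerce.com/abws-mcl008-080201",
    "Dark Gray Wool Suit",
    user_id="12891291",
)
print(pixel)
```

Because every event travels in a query string, the collector only has to answer an image request; no response body needs to be parsed on the client.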

Event collection - JSON payloads

• Examples:
    • page-view, query, search-click, add-to-cart, rating
• Signals schema:
    • Required field: type
    • Additional properties can be specified in the ‘params’ map
    • Special treatment for the fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’
• Processing logic lives in the ‘_signals_ingest’ pipeline
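
The schema above can be sketched in a few lines of Python. This is only an illustration of the payload shape: make_signal and special_params are hypothetical helper names, and sending a batch as a JSON array is an assumption based on the click example later in the deck.

```python
import json

# Fields the slide lists as receiving special treatment during ingest.
SPECIAL_FIELDS = {"docId", "userId", "query", "filterQueries",
                  "collection", "weight", "count"}

def make_signal(signal_type, timestamp=None, **params):
    """Return a signal dict; 'type' is the only required field."""
    if not signal_type:
        raise ValueError("a signal needs a non-empty 'type'")
    signal = {"type": signal_type, "params": dict(params)}
    if timestamp is not None:
        signal["timestamp"] = timestamp
    return signal

def special_params(signal):
    """Names in 'params' that the slide says are treated specially."""
    return sorted(SPECIAL_FIELDS & signal["params"].keys())

signals = [
    make_signal("click", query="Madden 12", docId="2375201",
                userId="abc121", position="4"),
    make_signal("rating", rating="5.0", source="web"),
]
body = json.dumps(signals, indent=2)  # request body for a batch of signals
print(body)
print(special_params(signals[0]))
```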

Signals - data flow

[Figure: JSON payloads and Snowplow payloads are sent to the Signals Service, which indexes them into Solr. Solr holds the primary collection (test), the raw signals collection (test_signals) and the aggregated signals collection (test_signals_aggr).]

Example: page-view signal

Input signal:

{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "url": "http://www.ecommerce.com/abws-mcl008-080201"
  }
}

Indexed signal document:

{
  "type_s": "pv",
  "flag_s": "event",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "62a26152-7971-406e-bf06-3df44974c220",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057367743463400
}

Example: page-view signal

Input signal:

{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "page": "Dark Gray Wool Suit",
    "url": "http://www.ecommerce.com/abws-mcl008-080201",
    "userId": "12891291",
    "useragent_type_name_s": "Browser",
    "ipAddr": "64.134.151.1",
    "tz": "America/NewYork"
  }
}

Indexed signal document:

{
  "type_s": "pv",
  "params.tz_s": "America/NewYork",
  "user_id_s": "12891291",
  "params.page_s": "Dark Gray Wool Suit",
  "tz_timestamp_txt": [ "Mon 2015-09-14 10:12:13.456 UTC" ],
  "flag_s": "event",
  "params.ipAddr_s": "64.134.151.1",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057643959353300
}

Example: click signal

Input signal:

{
  "type": "click",
  "params": {
    "query": "Madden 12",
    "docId": "2375201",
    "userId": "abc121",
    "position": "4",
    "filterQueries": [ "cat00000", "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008" ]
  }
}

Indexed signal document:

{
  "filters_orig_ss": [ "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008", "cat00000" ],
  "user_id_s": "abc121",
  "query_s": "madden 12",
  "type_s": "click",
  "params.position_s": "4",
  "query_t": "madden 12",
  "doc_id_s": "2375201",
  "tz_timestamp_txt": [ "Tue 2015-10-13 18:33:04.012 UTC" ],
  "filters_s": "abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000",
  "flag_s": "event",
  "query_orig_s": "Madden 12",
  "id": "69c609f6-a2c1-4f89-990e-88a63e68063d",
  "timestamp_tdt": "2015-10-13T18:33:04.01Z",
  "count_i": 1,
  "_version_": 1514941903557099520
}
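
The input-to-indexed mapping visible in the click example can be approximated as follows. This is a sketch inferred from the example documents only, not the actual ‘_signals_ingest’ pipeline logic: it skips generated fields such as id, _version_ and tz_timestamp_txt, and index_signal is a hypothetical name.

```python
def index_signal(signal):
    """Approximate the input -> indexed field mapping seen in the examples."""
    params = dict(signal.get("params", {}))
    doc = {"type_s": signal["type"], "flag_s": "event", "count_i": 1}
    if "timestamp" in signal:
        doc["timestamp_tdt"] = signal["timestamp"]
    if "userId" in params:
        doc["user_id_s"] = params.pop("userId")
    if "docId" in params:
        doc["doc_id_s"] = params.pop("docId")
    if "query" in params:
        query = params.pop("query")
        doc["query_orig_s"] = query     # query as typed by the user
        doc["query_s"] = query.lower()  # normalized form
        doc["query_t"] = query.lower()
    if "filterQueries" in params:
        filters = sorted(params.pop("filterQueries"))
        doc["filters_orig_ss"] = filters
        doc["filters_s"] = " $ ".join(filters)
    # Everything else is kept as a dynamic string field under "params.".
    for name, value in params.items():
        doc[f"params.{name}_s"] = value
    return doc

click = {
    "type": "click",
    "params": {
        "query": "Madden 12", "docId": "2375201", "userId": "abc121",
        "position": "4",
        "filterQueries": ["cat00000", "abcat0700000", "abcat0703000",
                          "abcat0703002", "abcat0703008"],
    },
}
indexed = index_signal(click)
print(indexed)
```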

Aggregations

• Batch processing using Apache Spark
• spark-solr library (https://github.com/LucidWorks/spark-solr)
• Types
    • Simple
    • Click
    • EventMiner

Aggregations - data flow

[Figure: data flow diagram. Components: aggregation job, Aggregator, Spark Agent, Spark Driver, and a Spark cluster (cluster manager and workers). Collections: primary (test), raw signals (test_signals), aggregated signals (test_signals_aggr). Arrow labels: "Fetches raw signals for processing" and "Stores aggregated results".]

Aggregation examples

• Simple aggregations
    • Top queries
    • Top clicked documents
    • Most popular categories
    • …
• Complex aggregations
    • Click stream aggregations with decaying weights
    • Generate a co-occurrence matrix for (user, docId, query) tuples

Example: simple aggregation

Input signals:

[
  { "type": "rating", "params": { "rating": "5.0", "source": "web" } },
  { "type": "rating", "params": { "rating": "1.0", "source": "web" } },
  { "type": "rating", "params": { "rating": "2.0", "source": "web" } },
  { "type": "rating", "params": { "rating": "2.0", "source": "web" } },
  { "type": "rating", "params": { "rating": "1.0", "source": "web" } }
]

[Figure: the signals are sent through the API to the Signals Service, which indexes them into Solr. Collections: primary (test), raw signals (test_signals), aggregated signals (test_signals_aggr).]

Example: simple aggregation (continued)

[Figure: job submission. The aggregation definition is submitted, manually or via the scheduler, to the Aggregation Service, which runs it on Spark. Spark fetches raw signals for processing from the raw signals collection (test_signals) and stores aggregated results in the aggregated signals collection (test_signals_aggr) in Solr. The primary collection is test.]

Aggregation definition:

{
  "id": "test_simple_aggr",
  "signalTypes": [ "rating" ],
  "selectQuery": "*:*",
  "aggregator": "simple",
  "groupingFields": "params.source_s",
  "aggregates": [
    {
      "type": "stddev",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "stddev_rating_d"
    },
    {
      "type": "topk",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "topk_rating_ss"
    },
    {
      "type": "mean",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "mean_position_d"
    }
  ]
}

Example: simple aggregation (continued)

• Aggregated document:

{
  "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7",
  "aggr_type_s": "simple@doc_id_s-query_s-filters_s",
  "flag_s": "aggr",
  "type_s": "rating",
  "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e",
  "aggr_id_s": "test_simple_aggr",
  "timestamp_tdt": "2015-10-15T02:26:17.337Z",
  "count_i": 5,
  "grouping_key_s": "web",
  "stddev_rating_d": 1.6431676725154982,
  "mean_position_d": 2.2,
  "values.topk_rating_ss": ["2.0", "1.0", "5.0"],
  "counts.topk_rating_ss": ["2", "2", "1"],
  "errors.topk_rating_ss": ["0", "0", "0"]
}

Example: Click aggregation

Raw docs:

[
  { "timestamp": "2014-09-01T23:44:52.533Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
  { "timestamp": "2014-09-05T12:25:37.420Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
  { "timestamp": "2014-08-24T12:56:58.910Z", "params": { "query": "Sharp TV", "docId": "1517163" }, "type": "click" },
  { "timestamp": "2015-10-25T07:18:14.722Z", "params": { "query": "rca", "docId": "2877125" }, "type": "click" }
]

Signals indexed and aggregated

Aggregated docs:

[
  { "doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405, "count_i": 1 },
  { "doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685, "count_i": 2 },
  { "doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.06324164569377899, "count_i": 1 }
]
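
The grouping and time decay in this example can be sketched as below. The slides do not state the decay function, so the exponential half-life used here (HALF_LIFE_DAYS) is an assumption and the resulting weights will not match the weight_d values shown; only the grouping by (docId, lowercased query), the counts, and the fact that recent clicks weigh more are meant to carry over.

```python
from collections import defaultdict
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # assumed decay rate, not Fusion's actual setting

def parse_ts(ts):
    """Parse an ISO-8601 UTC timestamp such as 2014-09-01T23:44:52.533Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

def aggregate_clicks(signals, now):
    """Group clicks by (docId, normalized query); sum time-decayed weights."""
    groups = defaultdict(lambda: {"count_i": 0, "weight_d": 0.0})
    for s in signals:
        key = (s["params"]["docId"], s["params"]["query"].lower())
        age_days = (now - parse_ts(s["timestamp"])).total_seconds() / 86400.0
        groups[key]["count_i"] += 1
        groups[key]["weight_d"] += 0.5 ** (age_days / HALF_LIFE_DAYS)
    return [
        {"doc_id_s": doc_id, "query_s": query, **stats}
        for (doc_id, query), stats in sorted(groups.items())
    ]

clicks = [
    {"timestamp": "2014-09-01T23:44:52.533Z", "type": "click",
     "params": {"query": "Sharp", "docId": "2009324"}},
    {"timestamp": "2014-09-05T12:25:37.420Z", "type": "click",
     "params": {"query": "Sharp", "docId": "2009324"}},
    {"timestamp": "2014-08-24T12:56:58.910Z", "type": "click",
     "params": {"query": "Sharp TV", "docId": "1517163"}},
    {"timestamp": "2015-10-25T07:18:14.722Z", "type": "click",
     "params": {"query": "rca", "docId": "2877125"}},
]
now = datetime(2015, 10, 26, tzinfo=timezone.utc)  # arbitrary reference time
for doc in aggregate_clicks(clicks, now):
    print(doc)
```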

Driving search relevancy

• How to mix signals with search results?
• Recommendation API
• Generic query pipeline configuration using a three-stage approach:
    • Sub-query
    • Rollup-results
    • Advanced-boost

Boosting search results using aggregated documents

[Figure: the user app sends a search query through the query-pipeline stages (recommendation stages, Set Params, Query Solr) and receives the search response. Collections: primary (test), raw signals (test_signals), aggregated signals (test_signals_aggr).]

Recommendation stages:

1. Query aggregated documents
2. Process results
3. Add parameters to the request
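
The three recommendation stages can be imitated in plain Python to show how aggregated weights end up as request parameters. This is a sketch under several assumptions: the aggregated documents and their weights are made up, the sub-query is replaced by an in-memory filter, and the boost is expressed with Solr's bq parameter against a field named id, which may not match how the Fusion stages build the request.

```python
from collections import defaultdict

def rollup(aggr_docs):
    """Stage 2: sum weights per document id across the sub-query hits."""
    totals = defaultdict(float)
    for doc in aggr_docs:
        totals[doc["doc_id_s"]] += doc["weight_d"]
    return dict(totals)

def boost_params(query, aggr_docs, top_n=10):
    """Stage 3: turn rolled-up weights into extra request parameters."""
    # Stage 1 stand-in: keep aggregated docs whose query matches the user's.
    hits = [d for d in aggr_docs if d["query_s"] == query.lower()]
    ranked = sorted(rollup(hits).items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    params = {"q": query}
    if ranked:
        params["bq"] = " ".join(f"id:{doc_id}^{weight:.4f}" for doc_id, weight in ranked)
    return params

# Illustrative aggregated documents; the weights are invented for readability.
aggr = [
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.5},
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.25},
    {"doc_id_s": "1517163", "query_s": "sharp", "weight_d": 0.125},
    {"doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.9},
]
print(boost_params("Sharp", aggr))
```

A query with no matching aggregated documents passes through unchanged, so the main search still works before any signals have been collected.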

Event Miner aggregation

• Calculate a co-occurrence matrix for tuples based on sessions
    • Example: (userId, query, docId)
• Construct a DAG from the matrix data
• Recommendations are served from the graph at query time
• Increases diversity in recommendations
• See https://lucidworks.com/blog/2015/08/31/mining-events-recommendations/
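
As a rough illustration of session-based co-occurrence, the sketch below counts how often two items (a user, a query or a document) show up in the same session. It is plain pair counting, not the Event Miner algorithm described in the linked post, and it stops before any graph is built. The events and the "session" field name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(events):
    """Count how often two (kind, value) items appear in the same session."""
    sessions = {}
    for e in events:
        items = sessions.setdefault(e["session"], set())
        for kind in ("userId", "query", "docId"):
            if kind in e:
                items.add((kind, e[kind]))
    counts = Counter()
    for items in sessions.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

# Made-up events: two sessions that share a query and a clicked document.
events = [
    {"session": "s1", "userId": "u1", "query": "sharp"},
    {"session": "s1", "userId": "u1", "query": "sharp", "docId": "2009324"},
    {"session": "s2", "userId": "u2", "query": "sharp", "docId": "2009324"},
    {"session": "s2", "userId": "u2", "query": "sharp tv", "docId": "1517163"},
]
matrix = cooccurrence(events)
for pair, n in matrix.most_common(3):
    print(pair, n)
```

Pairs with high counts are the natural candidates for edges when a graph is later built from the matrix.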

Graph Navigation - Example Query

Demo

Events & Signals

Using Signals = Modifying Your Behavior in Response to Your Environment
