Webinar: Event Processing & Data Analytics with Lucidworks Fusion

Event Processing and Data Analytics with Lucidworks Fusion

Kiran Chitturi, Software Engineer

Lucidworks Is Search

[Diagram labels: Connector Framework; Index Pipelines (ETL); Scale, Fault Tolerance, Real-Time; Fusion APIs; Recommendations, Personalization, Contextual Search, Relevancy Tool; Machine Learning / Signal Processing, Analytics; Security]

[Diagram labels: Ecommerce Site, Customer Analytics, Product Catalog, User History, Conversion Data, Lucidworks Fusion]

Problem Statement

• How to capture user events?

• How to use events for recommendations?

• How to produce reports from user events?

• What type of recommendations can be generated for different user types?

Event collection - Snowplow JS tracker

• Library to collect user events from the client-side tier of websites and apps

• Sends events using a tracking pixel

• The Signals API acts as a collector for Snowplow events

• Tracks page views, page pings, clicks, links, and any custom-configured events

• https://github.com/snowplow/snowplow/wiki/javascript-tracker

Signals - data flow

[Diagram: JSON payloads and Snowplow payloads are sent to the Signals Service, which indexes them in Solr. Collections shown: primary collection (test), raw signals collection (test_signals), aggregated signals collection (test_signals_aggr).]

Event collection - JSON payloads

• Examples: page-view, query, search-click, add-to-cart, rating

• Signals schema:

  • Required fields: type

  • Additional properties can be specified in the ‘params’ map

  • Special treatment for fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’

• Processing logic is in the ‘_signals_ingest’ pipeline
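The schema above can be sketched in Python as follows. This is a minimal illustration, not the Fusion client: the host, port, and endpoint path in SIGNALS_URL are assumptions that the slides do not state, and the helper names are hypothetical. The request is only constructed, never sent.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request

# Assumed endpoint, for illustration only; verify against your Fusion version.
SIGNALS_URL = "http://localhost:8764/api/apollo/signals/{collection}"


def format_timestamp(ts):
    """Render an aware datetime as ISO-8601 UTC with milliseconds."""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def make_signal(signal_type, params=None, timestamp=None):
    """Build one signal; 'type' is the only required field per the slide."""
    if not signal_type:
        raise ValueError("a signal requires a non-empty 'type'")
    signal = {"type": signal_type}
    if timestamp is not None:
        signal["timestamp"] = format_timestamp(timestamp)
    if params:
        # Extra properties, including special fields such as docId or userId.
        signal["params"] = dict(params)
    return signal


def build_request(collection, signals):
    """Prepare (but do not send) a POST carrying a JSON array of signals."""
    return Request(
        SIGNALS_URL.format(collection=collection),
        data=json.dumps(signals).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


page_view = make_signal(
    "pv",
    params={"url": "http://www.ecommerce.com/abws-mcl008-080201"},
    timestamp=datetime(2015, 9, 14, 10, 12, 13, 456000, tzinfo=timezone.utc),
)
request = build_request("test", [page_view])
```

Passing the prepared request to urllib.request.urlopen would submit it, assuming a service is actually listening at that address.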

Example: page-view signal

Input signal:

{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "url": "http://www.ecommerce.com/abws-mcl008-080201"
  }
}

Indexed signal document:

{
  "type_s": "pv",
  "flag_s": "event",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "62a26152-7971-406e-bf06-3df44974c220",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057367743463400
}

Example: page-view signal

Input signal:

{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "page": "Dark Gray Wool Suit",
    "url": "http://www.ecommerce.com/abws-mcl008-080201",
    "userId": "12891291",
    "useragent_type_name_s": "Browser",
    "ipAddr": "64.134.151.1",
    "tz": "America/NewYork"
  }
}

Indexed signal document:

{
  "type_s": "pv",
  "params.tz_s": "America/NewYork",
  "user_id_s": "12891291",
  "params.page_s": "Dark Gray Wool Suit",
  "tz_timestamp_txt": [ "Mon 2015-09-14 10:12:13.456 UTC" ],
  "flag_s": "event",
  "params.ipAddr_s": "64.134.151.1",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057643959353300
}

Example: click signal

Input signal:

{
  "type": "click",
  "params": {
    "query": "Madden 12",
    "docId": "2375201",
    "userId": "abc121",
    "position": "4",
    "filterQueries": [ "cat00000", "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008" ]
  }
}

Indexed signal document:

{
  "filters_orig_ss": [ "abcat0700000", "abcat0703000", "abcat0703002", "abcat0703008", "cat00000" ],
  "user_id_s": "abc121",
  "query_s": "madden 12",
  "type_s": "click",
  "params.position_s": "4",
  "query_t": "madden 12",
  "doc_id_s": "2375201",
  "tz_timestamp_txt": [ "Tue 2015-10-13 18:33:04.012 UTC" ],
  "filters_s": "abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000",
  "flag_s": "event",
  "query_orig_s": "Madden 12",
  "id": "69c609f6-a2c1-4f89-990e-88a63e68063d",
  "timestamp_tdt": "2015-10-13T18:33:04.01Z",
  "count_i": 1,
  "_version_": 1514941903557099520
}
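The field mapping visible in the examples above can be approximated in Python. This sketch is inferred only from the input and indexed documents shown on the slides. It is not the actual ‘_signals_ingest’ pipeline, which also assigns ids, timestamps, and version fields. The function name is hypothetical.

```python
def to_indexed_fields(signal):
    """Approximate the input-to-indexed field mapping seen in the slide examples."""
    if "type" not in signal:
        raise ValueError("signal requires a 'type' field")
    params = dict(signal.get("params", {}))
    doc = {"type_s": signal["type"], "flag_s": "event"}

    # Specially treated fields get their own top-level names.
    doc["count_i"] = int(params.pop("count", 1))
    if "userId" in params:
        doc["user_id_s"] = params.pop("userId")
    if "docId" in params:
        doc["doc_id_s"] = params.pop("docId")
    if "query" in params:
        query = params.pop("query")
        doc["query_orig_s"] = query        # original casing kept
        doc["query_s"] = query.lower()     # normalized form
        doc["query_t"] = query.lower()
    if "filterQueries" in params:
        filters = sorted(params.pop("filterQueries"))
        doc["filters_orig_ss"] = filters
        doc["filters_s"] = " $ ".join(filters)

    # Remaining params become 'params.<name>_s' string fields.
    for key, value in params.items():
        doc[f"params.{key}_s"] = value
    return doc


click = {
    "type": "click",
    "params": {
        "query": "Madden 12",
        "docId": "2375201",
        "userId": "abc121",
        "position": "4",
        "filterQueries": ["cat00000", "abcat0700000", "abcat0703000",
                          "abcat0703002", "abcat0703008"],
    },
}
indexed = to_indexed_fields(click)
```

Applied to the click signal, the sketch yields the same query_s, filters_s, doc_id_s, and params.position_s values as the indexed document on the slide.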

Aggregations

• Batch processing using Apache Spark

• spark-solr library (https://github.com/LucidWorks/spark-solr)

• Types

  • Simple

  • Click

  • EventMiner

Aggregations - data flow

[Diagram: an aggregation job is handed to the Aggregator and Spark Agent and runs on Spark (Spark Driver, Cluster Mgr., Workers). Spark fetches raw signals for processing from the raw signals collection (test_signals) and stores aggregated results in the aggregated signals collection (test_signals_aggr). The primary collection is test.]

Aggregation examples

• Simple aggregations

  • Top queries

  • Top clicked documents

  • Most popular categories

  • …

• Complex aggregations

  • Click stream aggregations with decaying weights

  • Generate a co-occurrence matrix for (user, docId, query) tuples
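Two of the aggregations listed above can be sketched in plain Python. This is a simplified illustration rather than the Spark jobs Fusion runs. The co-occurrence count here covers only pairs of documents clicked by the same user, not the full (user, docId, query) matrix, and the sample signals and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_queries(signals, k=3):
    """Count normalized query strings and return the k most frequent."""
    counts = Counter(
        s["params"]["query"].lower()
        for s in signals
        if "query" in s.get("params", {})
    )
    return counts.most_common(k)


def doc_cooccurrence(signals):
    """Count how many users clicked both documents of each pair."""
    docs_by_user = {}
    for s in signals:
        p = s.get("params", {})
        if s.get("type") == "click" and "userId" in p and "docId" in p:
            docs_by_user.setdefault(p["userId"], set()).add(p["docId"])
    pairs = Counter()
    for docs in docs_by_user.values():
        for a, b in combinations(sorted(docs), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical click signals for illustration.
signals = [
    {"type": "click", "params": {"userId": "u1", "query": "Sharp", "docId": "2009324"}},
    {"type": "click", "params": {"userId": "u1", "query": "Sharp TV", "docId": "1517163"}},
    {"type": "click", "params": {"userId": "u2", "query": "sharp", "docId": "2009324"}},
    {"type": "click", "params": {"userId": "u2", "query": "rca", "docId": "2877125"}},
    {"type": "click", "params": {"userId": "u2", "query": "Sharp TV", "docId": "1517163"}},
]
queries = top_queries(signals, k=2)
pairs = doc_cooccurrence(signals)
```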

Example: simple aggregation

Input signals:

{ "type": "rating", "params": { "rating": "5.0", "source": "web" } },
{ "type": "rating", "params": { "rating": "1.0", "source": "web" } },
{ "type": "rating", "params": { "rating": "2.0", "source": "web" } },
{ "type": "rating", "params": { "rating": "2.0", "source": "web" } },
{ "type": "rating", "params": { "rating": "1.0", "source": "web" } }

[Diagram: the signals are sent through the API to the Signals Service, which indexes them in Solr. Collections shown: primary collection (test), raw signals collection (test_signals), aggregated signals collection (test_signals_aggr).]

Example: simple aggregation (continued)

[Diagram: the aggregation definition is submitted as a job, manually or via the scheduler, to the Aggregation Service, which runs it on Spark. Spark fetches raw signals for processing from test_signals and stores aggregated results in test_signals_aggr in Solr. The primary collection is test.]

Aggregation definition (job submission):

{
  "id": "test_simple_aggr",
  "signalTypes": [ "rating" ],
  "selectQuery": "*:*",
  "aggregator": "simple",
  "groupingFields": "params.source_s",
  "aggregates": [
    { "type": "stddev", "sourceFields": [ "params.rating_s" ], "targetField": "stddev_rating_d" },
    { "type": "topk", "sourceFields": [ "params.rating_s" ], "targetField": "topk_rating_ss" },
    { "type": "mean", "sourceFields": [ "params.rating_s" ], "targetField": "mean_position_d" }
  ]
}

Example: simple aggregation (continued)

• Aggregated document:

{
  "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7",
  "aggr_type_s": "simple@doc_id_s-query_s-filters_s",
  "flag_s": "aggr",
  "type_s": "rating",
  "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e",
  "aggr_id_s": "test_simple_aggr",
  "timestamp_tdt": "2015-10-15T02:26:17.337Z",
  "count_i": 5,
  "grouping_key_s": "web",
  "stddev_rating_d": 1.6431676725154982,
  "mean_position_d": 2.2,
  "values.topk_rating_ss": [ "2.0", "1.0", "5.0" ],
  "counts.topk_rating_ss": [ "2", "2", "1" ],
  "errors.topk_rating_ss": [ "0", "0", "0" ]
}
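The numbers in the aggregated document can be checked with a small Python sketch of a grouped count, mean, standard deviation, and top-k. It is not Fusion's aggregator: the function and output key names are hypothetical, and the order of tied top-k values may differ from what Fusion returns. The sample standard deviation (n - 1 denominator) is what matches the stddev_rating_d value on the slide.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def simple_aggregate(signals, group_field="source", value_field="rating", k=3):
    """Group signals by one param and compute count, mean, stddev and top-k."""
    groups = defaultdict(list)
    for s in signals:
        params = s["params"]
        groups[params[group_field]].append(params[value_field])

    results = []
    for key, values in groups.items():
        numbers = [float(v) for v in values]
        top = Counter(values).most_common(k)
        results.append({
            "grouping_key": key,
            "count": len(values),
            "mean": mean(numbers),
            # Sample standard deviation; undefined for a single value.
            "stddev": stdev(numbers) if len(numbers) > 1 else 0.0,
            "topk_values": [v for v, _ in top],
            "topk_counts": [c for _, c in top],
        })
    return results


# The five rating signals from the slide.
ratings = [
    {"type": "rating", "params": {"rating": r, "source": "web"}}
    for r in ["5.0", "1.0", "2.0", "2.0", "1.0"]
]
aggregated = simple_aggregate(ratings)
```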

Example: click aggregation

Raw docs:

[
  { "timestamp": "2014-09-01T23:44:52.533Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
  { "timestamp": "2014-09-05T12:25:37.420Z", "params": { "query": "Sharp", "docId": "2009324" }, "type": "click" },
  { "timestamp": "2014-08-24T12:56:58.910Z", "params": { "query": "Sharp TV", "docId": "1517163" }, "type": "click" },
  { "timestamp": "2015-10-25T07:18:14.722Z", "params": { "query": "rca", "docId": "2877125" }, "type": "click" }
]

Signals indexed and aggregated

Aggregated docs:

{ "doc_id_s": "1517163", "query_s": "sharp tv", "weight_d": 0.000006602878329431405, "count_i": 1 },
{ "doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.000016734602468204685, "count_i": 2 },
{ "doc_id_s": "2877125", "query_s": "rca", "weight_d": 0.06324164569377899, "count_i": 1 }
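The effect of decaying weights can be illustrated with a short Python sketch. The slides do not state the decay formula, so the exponential form and the 30-day half-life below are assumptions, and the resulting weights will not match the weight_d values shown above. Only the qualitative behavior is reproduced: recent clicks outweigh old ones, and repeated clicks accumulate.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def parse_ts(text):
    """Parse timestamps of the form 2014-09-01T23:44:52.533Z as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def click_aggregate(signals, now, half_life_days=30.0):
    """Sum exponentially decayed click weights per (docId, query) pair."""
    decay = math.log(2) / half_life_days  # assumed decay model
    totals = defaultdict(lambda: {"weight_d": 0.0, "count_i": 0})
    for s in signals:
        p = s["params"]
        key = (p["docId"], p["query"].lower())
        age_days = (now - parse_ts(s["timestamp"])).total_seconds() / 86400.0
        totals[key]["weight_d"] += math.exp(-decay * age_days)
        totals[key]["count_i"] += 1
    return [
        {"doc_id_s": doc_id, "query_s": query, **values}
        for (doc_id, query), values in totals.items()
    ]


raw = [
    {"timestamp": "2014-09-01T23:44:52.533Z", "params": {"query": "Sharp", "docId": "2009324"}, "type": "click"},
    {"timestamp": "2014-09-05T12:25:37.420Z", "params": {"query": "Sharp", "docId": "2009324"}, "type": "click"},
    {"timestamp": "2014-08-24T12:56:58.910Z", "params": {"query": "Sharp TV", "docId": "1517163"}, "type": "click"},
    {"timestamp": "2015-10-25T07:18:14.722Z", "params": {"query": "rca", "docId": "2877125"}, "type": "click"},
]
now = datetime(2015, 10, 26, tzinfo=timezone.utc)
by_doc = {d["doc_id_s"]: d for d in click_aggregate(raw, now)}
```

As in the slide, the recent "rca" click ends up with the largest weight, and the two "sharp" clicks together outweigh the single, older "sharp tv" click.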

Driving search relevancy

• How to mix signals with search results?

• Recommendation API

• Generic query pipeline configuration using a three-stage approach

  • Sub-query

  • Rollup-results

  • Advanced-boost

Boosting search results using aggregated documents

[Diagram: the user app sends a search query through the query-pipeline stages (Recommendation Stages, Set Params, Query Solr) and receives the search response. The recommendation stages: 1. Query aggregated documents, 2. Process results, 3. Add parameters to the request. Collections shown: primary collection (test), raw signals collection (test_signals), aggregated signals collection (test_signals_aggr).]
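The three recommendation steps (query aggregated documents, process results, add parameters to the request) can be sketched in Python. This is an illustration under assumptions, not the Fusion stage implementations. The aggregated documents are held in memory instead of being queried from Solr, the weights are hypothetical, and using the Solr `bq` boost-query parameter is one plausible way to apply the boost.

```python
from collections import defaultdict


def sub_query(aggregated_docs, user_query):
    """Step 1: select aggregated documents recorded for this query."""
    normalized = user_query.lower()
    return [d for d in aggregated_docs if d["query_s"] == normalized]


def rollup(docs):
    """Step 2: sum weights per document id."""
    totals = defaultdict(float)
    for d in docs:
        totals[d["doc_id_s"]] += d["weight_d"]
    return dict(totals)


def advanced_boost(params, weights, id_field="id", top_n=10):
    """Step 3: add boost clauses for the highest-weighted documents."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    boosted = dict(params)
    boosted["bq"] = [f'{id_field}:"{doc_id}"^{weight:.3f}' for doc_id, weight in ranked]
    return boosted


# Hypothetical aggregated documents and weights.
aggregated_docs = [
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 1.5},
    {"doc_id_s": "1517163", "query_s": "sharp", "weight_d": 0.5},
    {"doc_id_s": "2009324", "query_s": "sharp", "weight_d": 0.25},
    {"doc_id_s": "2877125", "query_s": "rca", "weight_d": 2.0},
]
request_params = advanced_boost(
    {"q": "Sharp"},
    rollup(sub_query(aggregated_docs, "Sharp")),
)
```

The original query is left unchanged. The boost clauses only raise the score of documents that users previously clicked for the same query.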

[Screenshots: Before and After]

Demo

Using Signals = Modifying Your Behavior in Response to Your Environment

Events & Signals
