1How to create self-service analytics tool from activity logs garbageWrike Tech Hub Aleksei Pupyshev, Aleksei Smirnov 14.09.2016 slide Wrike
How to create self-service analytics tool from activity logs garbage
2016 Sep 14 Wrike Tech Hub
Aleksei Smirnov, Data Analyst at Wrike Inc.
Aleksei Pupyshev, Data Scientist at Wrike Inc.
Wrike is ...
Workspace (Web Application)
iOS & Android apps
Many integrations and a public API
We're releasing new products and features, and changing old ones, very quickly.
Wrike is a Data-Driven Development Company
Wrike Analytics Tools Evolution: What about logs?
So we implemented a log-processing infrastructure based on Spark SQL
Presentation from SPbDSM Sep 2015
UI events
Web Requests
Backend Services
ETL
More about the Parquet file structure: https://habrahabr.ru/company/wrike/blog/279797/
Thrift interface
Wrike Analytics Tools Evolution: Problems
Spark-submit python jobs
● More and more ETLs and PySpark jobs for specific tasks and dashboards
● No common standard or knowledge (code) base for metric extraction and computation
● Each analytics task has its own specific input and output sources
● It's hell to generate datasets for ML (predictions, lead scoring, personalization etc.) or for ad hoc analyses
● No way to build a single monitoring and alerting system for Wrike events and KPIs
● Hundreds of dashboards for Wrike data stakeholders, from which it is difficult to get any insight into product and business development
● No metric naming convention
Wrike Analytics Tools Evolution: Solution
● Unification of log-format data - converting different event timestamp formats to one, converting different production tables to a log-structured format, unifying user_id across all sources
● Unification of grouping format - (in our case) user_id and day
● Standardisation of metric naming principles - a position-based naming schema: entity__event__source__path__measure__unit__details
● Unification of the process for creating auto-updatable metrics and features and for testing metrics - via Jupyter Notebook, using any of the following syntaxes: Python, Pandas, SQL (PandasSQL)
● Generation of a single data source that contains all user activity metrics and features, with an updatable schema - the Daily User Activity Data Mart (Vitrina)
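The position-based naming schema can be illustrated with a small helper. This is only a sketch: the field order comes from the schema above, while the function names and the reading of "x" as a placeholder for an unused position (as in ses__x__x__x__avg__mn__x) are assumptions, not part of the talk.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    # Field order follows entity__event__source__path__measure__unit__details.
    entity: str
    event: str
    source: str
    path: str
    measure: str
    unit: str
    details: str


def parse_metric_name(name: str) -> MetricName:
    """Split a metric name into its seven positions (hypothetical helper)."""
    parts = name.split("__")
    if len(parts) != len(MetricName._fields):
        raise ValueError(
            f"expected {len(MetricName._fields)} positions, got {len(parts)}: {name!r}"
        )
    return MetricName(*parts)


def build_metric_name(**fields: str) -> str:
    """Build a name; positions that are not given are filled with 'x' (assumed placeholder)."""
    return "__".join(fields.get(f, "x") for f in MetricName._fields)


parsed = parse_metric_name("act__x__ws__dashb__cnt__ev__x")
rebuilt = build_metric_name(entity="act", source="ws", path="dashb", measure="cnt", unit="ev")
```

Because every position has a fixed meaning, names can be validated or filtered mechanically, for example by selecting all metrics with a given source.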
Wrike User Activity Data Mart: Tech Stack
Wrike User Activity Data Mart: Under the Hood
UADataMart Under the Hood: Concatenating logs
● Unification of log-format data - converting different event timestamp formats to one, converting different production tables to a log-structured format, unifying user_id across all sources
Logs:
● Client log (UI)
● Web log (Requests)
● Email log
● Event log (Invitations, Registrations etc)
● Search log
● Mobile log
● ...
Production databases (from many shards):
● Delta table
● File attachments
● Task changes
● ...
Union of Spark data frames with schema merging
~ we should also rename columns by adding a source prefix (except user_id and timestamp)
This operation isn't expensive and is very useful!
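The concatenation step can be sketched in plain Python, used here instead of Spark so the example runs without a cluster. Rows are modelled as dicts; the helper names, the sample rows, and the underscore separator for the source prefix are all assumptions made for illustration.

```python
def prefix_columns(rows, source, keep=("user_id", "timestamp")):
    """Rename every column to '<source>_<column>', except the shared key columns."""
    return [
        {(col if col in keep else f"{source}_{col}"): value for col, value in row.items()}
        for row in rows
    ]


def union_with_merged_schema(*tables):
    """Concatenate row lists; a column missing from a row is filled with None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


# Hypothetical sample rows from two sources.
client_log = [{"user_id": 1, "timestamp": "2016-09-01T10:00:00", "event": "open_dashboard"}]
web_log = [{"user_id": 1, "timestamp": "2016-09-01T10:00:05", "path": "/api/tasks", "status": 200}]

unified = union_with_merged_schema(
    prefix_columns(client_log, "client"),
    prefix_columns(web_log, "web"),
)
```

After the union every row has the same columns, and the prefix keeps same-named columns from different sources apart.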
UADataMart Under the Hood: Grouping by User
This is an expensive operation!
Then we apply the “magic” map function
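Grouping by the unified key (user_id and day) can be sketched as follows. This is a plain-Python stand-in for the Spark grouping, with hypothetical sample rows; in Spark the equivalent step moves all rows of one user to the same place, which is presumably why it is expensive.

```python
from collections import defaultdict
from datetime import datetime


def group_by_user_and_day(rows):
    """Collect all events of one user on one day under a (user_id, day) key."""
    groups = defaultdict(list)
    for row in rows:
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        groups[(row["user_id"], day)].append(row)
    return dict(groups)


# Hypothetical unified log rows.
rows = [
    {"user_id": 1, "timestamp": "2016-09-01T10:00:00", "client_event": "open_dashboard"},
    {"user_id": 1, "timestamp": "2016-09-01T18:30:00", "web_path": "/api/tasks"},
    {"user_id": 1, "timestamp": "2016-09-02T09:00:00", "client_event": "open_tasklist"},
    {"user_id": 2, "timestamp": "2016-09-01T11:00:00", "client_event": "open_dashboard"},
]

grouped = group_by_user_and_day(rows)
```

Each group is then handed to the map function, which turns it into one row of metrics.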
UADataMart Under the Hood: “magic” map function
● Create a Pandas Data Frame from the grouped Row object
● Apply each “Metrics Module Function” to a copy of the Pandas DF; each function returns a dictionary of metric (KPI) names and values
● If an exception occurs (some error inside a module function), return a dictionary with default KPI values instead
● Merge the returned dictionaries and convert the result to a Row
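The steps above can be sketched as follows. This is not the authors' code: a list of dicts stands in for the Pandas DF, a dict stands in for the Spark Row, and the module functions, their metric names, and the (function, defaults) registration format are hypothetical.

```python
import copy


def client_event_count(events):
    # Hypothetical metrics module: counts events that came from the client log.
    return {"act__x__client__x__cnt__ev__x": sum(1 for e in events if e["source"] == "client")}


def broken_module(events):
    # Hypothetical module that fails, to show the fallback to default values.
    return {"ses__x__x__x__avg__mn__x": events[0]["missing_column"]}


# Each module is registered together with its default KPI values (assumed format).
MODULES = [
    (client_event_count, {"act__x__client__x__cnt__ev__x": 0}),
    (broken_module, {"ses__x__x__x__avg__mn__x": 0.0}),
]


def magic_map(key, events, modules=MODULES):
    """Turn one (user_id, day) group into a single flat row of metrics."""
    user_id, day = key
    row = {"user_id": user_id, "day": day}
    for func, defaults in modules:
        try:
            # Every module gets its own copy, so it cannot affect the others.
            metrics = func(copy.deepcopy(events))
        except Exception:
            metrics = dict(defaults)
        row.update(metrics)
    return row


events = [
    {"source": "client", "event": "open_dashboard"},
    {"source": "web", "path": "/api/tasks"},
]
row = magic_map((1, "2016-09-01"), events)
```

Catching the exception per module means one faulty metric definition yields default values for its own KPIs while the rest of the row is still computed.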
UADataMart Under the Hood: Metrics Module Functions
Example: based on PandasSQL syntax
Note: here we can use any syntax we like: SQL, Python or Pandas!
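The slide's own example is not preserved in this text, so the following is only an illustrative sketch of an SQL-style metrics module function. It uses Python's built-in sqlite3 as a stand-in for PandasSQL; the table layout, the column values, and the reading of act__x__ws__dashb__cnt__ev__x as "number of workspace dashboard events" are assumptions.

```python
import sqlite3


def dashboard_event_count(events):
    """Hypothetical metrics module function written in SQL syntax."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (source TEXT, path TEXT)")
        conn.executemany(
            "INSERT INTO events VALUES (?, ?)",
            [(e.get("source"), e.get("path")) for e in events],
        )
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM events WHERE source = 'ws' AND path = 'dashb'"
        ).fetchone()
    finally:
        conn.close()
    # The function returns a dictionary of metric names and values.
    return {"act__x__ws__dashb__cnt__ev__x": count}


# Hypothetical events of one user on one day.
events = [
    {"source": "ws", "path": "dashb"},
    {"source": "ws", "path": "dashb"},
    {"source": "ws", "path": "tlist"},
]
metrics = dashboard_event_count(events)
```

The same metric could equally be written as a plain Python loop or a Pandas expression, as long as the function returns the same dictionary.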
UADataMart Under the Hood: Modules Structure
Wrike User Activity Data Mart: Under the Hood
Dimensions
Apply UDFs (converting to categorical values) to each dimension column
Categorical dimensions
Group by the categorical dimensions and aggregate (over all users) within each group

Registration Period      | Paid Details | Country | KPI Name                      | Sum of KPI | Day
From 1 year to 2 years   | Paid         | US      | ses__x__x__x__avg__mn__x      | 1000000    | 2016.09.01
From 6 months to 1 year  | Free         | BR      | act__x__ws__dashb__cnt__ev__x | 20000      | 2016.09.01
From 2 weeks to 1 month  | Free         | GB      | act__x__ws__tlist__cnt__ev__x | 100000     | 2016.09.02

~ 1 million rows
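The two steps, converting dimensions to categorical values and then aggregating over all users, can be sketched in plain Python. The bucket boundaries, the input column names (days_since_registration, is_paid), and the sample rows are hypothetical; only the shape of the output follows the table above.

```python
from collections import defaultdict


def registration_period(days):
    """Stand-in for a UDF: map account age in days to a category (assumed boundaries)."""
    if days < 14:
        return "Less than 2 weeks"
    if days < 30:
        return "From 2 weeks to 1 month"
    if days < 182:
        return "From 1 month to 6 months"
    if days < 365:
        return "From 6 months to 1 year"
    if days < 730:
        return "From 1 year to 2 years"
    return "More than 2 years"


def aggregate(user_rows, kpi_names):
    """Sum each KPI over all users that share the same categorical dimensions."""
    totals = defaultdict(float)
    for row in user_rows:
        period = registration_period(row["days_since_registration"])
        paid = "Paid" if row["is_paid"] else "Free"
        for kpi in kpi_names:
            totals[(period, paid, row["country"], kpi, row["day"])] += row.get(kpi, 0)
    return dict(totals)


KPI = "act__x__ws__dashb__cnt__ev__x"
# Hypothetical per-user daily rows from the data mart.
user_rows = [
    {"day": "2016-09-01", "country": "US", "is_paid": True, "days_since_registration": 400, KPI: 10},
    {"day": "2016-09-01", "country": "US", "is_paid": True, "days_since_registration": 500, KPI: 5},
    {"day": "2016-09-01", "country": "BR", "is_paid": False, "days_since_registration": 200, KPI: 7},
]
totals = aggregate(user_rows, [KPI])
```

The result has one entry per combination of dimensions, KPI name and day, which keeps it small enough for dashboards regardless of the number of users.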
Wrike User Activity Data Mart: For Wrike Data Stakeholders
● entity__event__source__path__measure__unit__details
Demo!
Flow:
Other Applications:
● Alarm system (notifications when something goes wrong with metric values)
● Email personalization
● Recommendation system (e.g. Wrike feature recommendations, search quality improvements, user-churn prediction, lead scoring etc.)