Big data on a small budget
Philipp Kandal / Twitter: @apphil / CTO & Co-Founder, skobbler
20th November 2013

What do I know about big data?

- skobbler logs all positions from our users (100 billion+)

- > 10TB of data from users

- Products / revenues significantly improved with business intelligence

Why should you learn about big data?

Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century”

Obama became president of the US in large part due to the use of big data…

World-class sports teams enhance their performance with big data

Amazon, Google, Facebook, etc. have by now made all their development processes data-driven

What are some great use-cases for big data?

Analyzing log files and user behavior (and predicting future behavior)

A/B testing and automatic optimization of functionality

Improving monetization (e.g. ad optimization, etc.)

Checking adoption and usage of new features

When is it better not to rely on big data?

When qualitative feedback is better than quantitative feedback (e.g. very early stage companies)

When you don’t have enough users yet to get statistically significant results

When you do not know what you are optimizing for
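
One way to check whether a user base is large enough is a two-proportion z-test on a conversion-style metric. The sketch below is an illustration and not something shown in the talk; the function name and the example counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same uplift (10% -> 12%) measured on hypothetical small and large user bases.
_, p_small = two_proportion_z_test(10, 100, 12, 100)
_, p_large = two_proportion_z_test(1000, 10000, 1200, 10000)
print(f"100 users per variant:    p = {p_small:.3f}")
print(f"10,000 users per variant: p = {p_large:.6f}")
```

With few users the same uplift cannot be told apart from noise, which is the situation where qualitative feedback is likely the better guide.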

What does a solid and simple workflow for big data analysis look like?

Log -> Process -> Analyse -> Improve -> Eval / Test -> back to Log

Tools / technologies for a good big data setup

Logging: MongoDB, VoltDB, Cassandra

Processing & Analyzing / Storing: Hadoop & HBase (batch), Storm (real-time), Samza (real-time)

Optimizing: Mahout (machine learning)

How can you build this without breaking the bank?

- Analyse / process asynchronously

- Cheap dedicated servers (vs. cloud)

- Use Open / Free Software

Key cost factor: Real-time, near-time vs. batch

- Real-time much more expensive than batch

- Leverage as much pre-processing as possible

- Try using in-memory technology for real-time analytics
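
The pre-processing point can be illustrated with a small sketch: a batch job folds raw events into hourly counters once, and later queries read only the compact counters. The event fields (`ts`, `country`) and function names are hypothetical and not taken from the talk.

```python
from collections import Counter
from datetime import datetime, timezone

def preaggregate(events):
    """Batch step: fold raw events into (hour, country) counters."""
    counts = Counter()
    for event in events:
        ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        hour = ts.strftime("%Y-%m-%dT%H:00Z")
        counts[(hour, event["country"])] += 1
    return counts

def events_in_hour(counts, hour, country):
    """Query step: a dictionary lookup instead of a scan over raw events."""
    return counts[(hour, country)]

# Hypothetical raw events with Unix timestamps in seconds.
raw_events = [
    {"ts": 1384941600, "country": "DE"},
    {"ts": 1384941900, "country": "DE"},
    {"ts": 1384945200, "country": "RO"},
]
hourly_counts = preaggregate(raw_events)
```

The expensive pass over raw data runs once in batch; anything that needs an answer quickly works on the much smaller aggregate, which can fit in memory.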

#1 Log: Initially as much data as feasible should be logged so it’s available later

- Define interesting data (rather log too much if unsure)

- Upload / collect data

- Decide on real-time, near-time or batch processing in the chain
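
A minimal sketch of the "log generously" idea, assuming a plain append-only file with one JSON object per line. The talk names MongoDB, VoltDB and Cassandra for logging; the file-based approach and the function names here are only an illustration.

```python
import json
import time

def log_event(path, event_type, **fields):
    """Append one event as a single JSON line; extra fields are stored as given."""
    record = {"ts": time.time(), "type": event_type, **fields}
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record

def read_events(path):
    """Read the log back for later batch processing."""
    with open(path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]
```

Because every record is schemaless, adding a field later (for example a new feature flag) needs no migration, which fits the advice to log too much rather than too little.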

#2 Process: Enhance the data and make it as rich as possible and easy to query

- Move data to processing environment

- Run logged data through processing chain so it can be queried

- Enhance the logged data with any additional data available (e.g. geography, social data, user data, etc.)
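
The enrichment step might look like the following sketch, which joins each logged record with lookup tables. The tables `users` and `countries` and all field names are hypothetical stand-ins for whatever additional data is available.

```python
def enrich(record, users, countries):
    """Return a copy of a logged record with user and geography data attached."""
    enriched = dict(record)
    user = users.get(record["user_id"], {})
    enriched["user_plan"] = user.get("plan", "unknown")
    enriched["country_name"] = countries.get(record.get("country"), "unknown")
    return enriched

# Hypothetical lookup tables.
users = {"u1": {"plan": "premium"}}
countries = {"DE": "Germany", "RO": "Romania"}

enriched_record = enrich({"user_id": "u1", "country": "DE"}, users, countries)
```

Doing this join once during processing means later queries can filter on `user_plan` or `country_name` directly.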

#3 Analyse: Cluster the data into meaningful groups and compare them

- Define Key Performance Indicators (KPIs)

- Cluster data in a meaningful way (e.g. by geography, time of day, customer past behaviour)

- Compare data vs. reference sets
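
As a rough sketch of these three steps, the code below averages one KPI per cluster and reports the relative change against a reference set. The function names and record fields are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def kpi_by_cluster(records, cluster_key, kpi_key):
    """Average one KPI per cluster, e.g. per country or per time of day."""
    groups = defaultdict(list)
    for record in records:
        groups[record[cluster_key]].append(record[kpi_key])
    return {cluster: mean(values) for cluster, values in groups.items()}

def compare_to_reference(current, reference):
    """Relative change of each cluster's KPI against the reference set."""
    return {
        cluster: (value - reference[cluster]) / reference[cluster]
        for cluster, value in current.items()
        if cluster in reference and reference[cluster] != 0
    }
```

Clusters missing from the reference set are skipped rather than guessed, so a new region shows up as "no comparison yet" instead of a misleading change.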

#4 Improve: Learn from the analysis where your challenges are in order to optimize behavior

- Manually / Automatically adjust features (e.g. lower prices in certain regions, etc.)

- Develop A/B testing scenarios and formulate improvement theories
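
A/B testing needs a stable way to split users into groups. One common approach, sketched here as an assumption and not as the method used at skobbler, is to hash the user ID together with the experiment name; the function and experiment names are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to one variant of an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

variant = assign_variant("user-42", "lower_price_region_x")
```

The same user always lands in the same variant without any stored state, and including the experiment name keeps assignments independent across experiments.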

#5 Evaluate

Check if the KPIs improve after applying the changes

Accept changes that improved your users' behavior / reject changes that left it unchanged

Define which additional logs you might need to better cluster / identify behaviour

Go back to step #1
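
The accept / reject rule can be written down as a small sketch. The 2% minimum gain is an arbitrary placeholder and the function name is hypothetical; a real threshold would depend on how noisy the KPI is.

```python
def evaluate_change(baseline_kpi, new_kpi, min_relative_gain=0.02, higher_is_better=True):
    """Accept a change only if the KPI improved by at least min_relative_gain."""
    if baseline_kpi == 0:
        raise ValueError("baseline KPI must be non-zero")
    gain = (new_kpi - baseline_kpi) / abs(baseline_kpi)
    if not higher_is_better:
        # For KPIs such as the number of re-routings, a decrease is the improvement.
        gain = -gain
    return "accept" if gain >= min_relative_gain else "reject"
```

A change that leaves the KPI where it was is rejected, since it adds complexity without a measurable benefit.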

#1 Log: Practical example of how this works at skobbler

Software version

Routing profile used

Device

Raw Positions

Geography (e.g. country)

Rating of the route (optional)

Destination reached (yes / no)

Etc.
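
The fields above could be captured in a record type like the following sketch. The class name, field names and types are assumptions; the talk does not show skobbler's actual log schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple

@dataclass
class DriveLog:
    software_version: str
    routing_profile: str
    device: str
    country: str
    destination_reached: bool
    # Optional: not every user rates a route.
    route_rating: Optional[int] = None
    # Raw positions as (unix timestamp, latitude, longitude) tuples.
    positions: List[Tuple[float, float, float]] = field(default_factory=list)

example_log = DriveLog("1.0", "car_fastest", "phone_model_x", "DE", True)
example_log.positions.append((1384941600.0, 52.52, 13.405))
record = asdict(example_log)  # plain dict, ready to be serialized and uploaded
```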

#2 Process: Enhance and split the data based on drives and segments

Combine the data on a per drive basis (= session)

Combine the data on a per segment basis (= how fast are people driving on a street versus our estimate)

Identify key behavior across the route (e.g. re-routings, etc.)
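
A sketch of the two groupings, assuming each position sample has already been matched to a street segment and carries a drive ID. All field names (`drive_id`, `segment_id`, `speed_kmh`, `estimated_kmh`) are hypothetical.

```python
from collections import defaultdict

def group_by_drive(samples):
    """Collect samples per drive (= session)."""
    drives = defaultdict(list)
    for sample in samples:
        drives[sample["drive_id"]].append(sample)
    return dict(drives)

def segment_speed_ratio(samples):
    """Per segment: mean observed speed divided by the estimated speed."""
    observed = defaultdict(list)
    estimated = {}
    for sample in samples:
        observed[sample["segment_id"]].append(sample["speed_kmh"])
        estimated[sample["segment_id"]] = sample["estimated_kmh"]
    return {
        segment: (sum(speeds) / len(speeds)) / estimated[segment]
        for segment, speeds in observed.items()
    }
```

A ratio above 1.0 means people drive faster on that segment than the routing assumes; below 1.0 means the estimate is too optimistic.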

Example: Real time analysis with Twitter Storm framework to detect road changes

Example visualization of drives in last five minutes (real-time)

Example: Historic driving patterns (processed with Hadoop / HBase)

#3 Analyse: Try to see in which areas our routing is not optimal

KPIs are:

Route rating (if given)

# of re-routings (the smaller the better)

Time to destination vs. estimation by routing

Cluster the data by

Routing algorithm (and parameters used)

Geography
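
The three KPIs and the two clustering dimensions can be combined as in this sketch. The drive fields (`algorithm`, `country`, `rating`, `reroutings`, `actual_s`, `estimated_s`) are hypothetical names for the processed per-drive data.

```python
from collections import defaultdict
from statistics import mean

def routing_kpis(drives):
    """Aggregate the three KPIs per (routing algorithm, country) cluster."""
    clusters = defaultdict(list)
    for drive in drives:
        clusters[(drive["algorithm"], drive["country"])].append(drive)

    report = {}
    for key, group in clusters.items():
        # Ratings are optional, so drives without one are left out of the mean.
        ratings = [d["rating"] for d in group if d["rating"] is not None]
        report[key] = {
            "mean_rating": mean(ratings) if ratings else None,
            "mean_reroutings": mean(d["reroutings"] for d in group),
            # Above 1.0 means drives took longer than the routing estimated.
            "time_vs_estimate": mean(d["actual_s"] / d["estimated_s"] for d in group),
        }
    return report
```

Clusters with a low rating, many re-routings or a time ratio far from 1.0 are candidates for areas where routing is not optimal.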

#4 Improve: Come up with strategies to improve routing experience based on data

For future routes, improve the estimated travel time on a segment by comparing it with the time actually travelled

Alter routing parameters based on country specifics to get better results (e.g. in Germany people drive faster on the Autobahn)
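
One simple way to feed observed travel times back into the estimates is an exponential moving average. This is a sketch of that general technique, not the method described in the talk; the function name and the weight of 0.2 are assumptions.

```python
def update_segment_estimate(estimated_s, observed_s, weight=0.2):
    """Move the stored travel-time estimate a fraction of the way toward an observation."""
    if not 0 <= weight <= 1:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * estimated_s + weight * observed_s

estimate = 60.0  # seconds the routing currently assumes for a segment
for observed in (48.0, 50.0, 47.0):  # hypothetical travel times from logged drives
    estimate = update_segment_estimate(estimate, observed)
```

Keeping one estimate per segment and country would be one way to capture country specifics such as faster Autobahn driving.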

#5 Evaluate: Deploy the changes and compare them to reference data

- Deploy changes to production and compare ratings / timings vs. base values (~weekly)

- Verify if other parameters such as usage, etc. also improve

Summary: Big data can drive big value but stay affordable

Simple formula: Log -> Process -> Analyse -> Improve -> Evaluate = Success

Thank you for your attention!

Get in touch: [email protected]
Phone: +49-172-4597015
Follow me on .com/apphil