Data Science: Bridging the Gap Between Data Generation and Data Comprehension Dr Carsten Riggselsen Principal Data Scientist Pivotal

Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Page 1: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Data Science: Bridging the Gap Between Data Generation and Data Comprehension

Dr Carsten Riggselsen Principal Data Scientist Pivotal

Page 2: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

2 © Copyright 2015 Pivotal. All rights reserved.

Analyzing data is nothing new

Page 3: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

“Their Data” “Our Data” “My Data”

“Data”

“The Data”

“Data (Big)”

Page 4: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

“Data” vs. “Data-Driven”

Deploy analytic apps and automation at scale

Store any type and size of data

Discover insights Create analytics algorithms

Page 5: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Page 6: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Data Science

Product Management

Product Design

Engineering

Continuous Improvement

Data Science

Page 7: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Isolated Data Science

I don’t think (Big) Data is valuable, it’s a hype – prove me wrong. We do BI and stuff already. Data Science is a hype – prove me wrong.

Page 8: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Data Science

Product Management

Product Design

Engineering

Continuous Improvement

Data Science

Page 9: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Data Science

Product Management

Product Design

Engineering

Continuous Improvement

Page 10: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

“Mere” convenience through Apps

Automate mundane or tedious tasks

Present information at a glance in an app

User Interaction with the app

Consistency and unbiasedness

24-7 availability

Scalability

Platform independence

Easy Provisioning

Page 11: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Smart Apps – Data Science Powered

Combining/linking data sources/streams across areas and domains

There is an element of prediction involved based on accumulated data/info

Inferring (ab)normal patterns, e.g., profiling users, usage patterns

There is an element of root-cause identification involved

Page 12: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

DS-Cheat-Sheet - Is it a SMART App?

•  Can past knowledge potentially improve on how to inform or act in the future?

•  Is past knowledge based on data/info from different domains?

•  Do you need to affect outcomes in real-time?

•  Are (ab)normal patterns to be inferred?

•  Is the reason or cause for an action or a pattern unclear yet an important thing to know?

•  Is the solution highly personalised?

•  Is “crowdsourcing” knowledge (data/information) beneficial?

Page 13: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

The Car Unlock Button – Press it!

Page 14: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

“Siri or OK Google – unlock my car… UnnnLoooock my Caaaar…”

“OK – I will unlock your house”

Page 15: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

SMART Unlock

Access to your Calendar/Agenda

Infer where/when you usually go by car

Awareness of Bank Holidays etc.

Knows where you parked your car

Knows where you are (GPS)

Page 16: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Works Efficient Convenient Smart

The Car-Unlock Experience

I unlocked your car!

Page 17: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Examples

Page 18: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Obstruction Duration Prediction

•  Predict duration of road incidents in London

•  Android app developed on top of the model

•  http://ds-demo-transport.cfapps.io

Page 19: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

REALTIME DASHBOARD Driving Prediction

https://youtu.be/5gySgGWJMHA

Page 20: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Time to Delivery

•  Three sub problems
    –  Time to delivery estimate
    –  Time slot availability
    –  Courier scheduling

•  Courier scheduling and time to delivery estimate may have mutual feedback

Logistics Comp. Logistics Comp.

Page 21: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Telco: Protecting Minors - Age Prediction

Estimate age of the customer based on their calling habits

Can distinguish minors with an accuracy of >80%

•  Call records from March-Aug 2014

•  Corresponds to ~3TB data

•  Attributes are
    •  Calling party ID
    •  Called party ID
    •  Date
    •  Time
    •  Duration at start/end
    •  Location
    •  Type of call and bearer
    •  TAC
    •  Data

CDR   CRM Data

Feature                        Importance  Observation
Calls (holidays-schooltime)    0.08-0.06   Minors call less in school holiday
Average call length            0.07        Minors make shorter calls
Call timing (night-day)        0.07-0.03   Minors call more at nighttime
Number of phone uses           0.05        Minors use the phones less
Percentage of text use         0.05        Minors text less
Number of contacts             0.05        Minors less likely to have 1 contact
Percentage of calls to minors  0.04        Minors call other minors more
Percentage of voice use        0.04        Typical
Caller-Callee ratio            0.04        Minors receive more calls than make
Fri/Sat/Thurs ratio            0.04-0.03   Minors call more at weekends
Number of locations            0.04        Minors more likely to have 2 locs
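
A few of the features in the table can be derived directly from call records. The following is only a rough sketch of that idea, not the model behind the slide: the record layout, the sample rows, the `NIGHT_HOURS` window and the function name `caller_features` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical call records (illustrative only, not data from the talk):
# (calling_party_id, called_party_id, hour_of_day, duration_seconds)
records = [
    ("A", "B", 23, 60),
    ("A", "C", 14, 120),
    ("A", "B", 2, 30),
    ("B", "A", 10, 300),
]

# Assumption: "night" means 22:00 to 05:59; the slide does not define it.
NIGHT_HOURS = set(range(22, 24)) | set(range(0, 6))


def caller_features(rows):
    """Aggregate per-caller features similar to those listed in the table."""
    by_caller = defaultdict(list)
    for caller, callee, hour, duration in rows:
        by_caller[caller].append((callee, hour, duration))

    features = {}
    for caller, calls in by_caller.items():
        night_calls = sum(1 for _callee, hour, _dur in calls if hour in NIGHT_HOURS)
        features[caller] = {
            "avg_call_length_s": mean(dur for _callee, _hour, dur in calls),
            "night_call_share": night_calls / len(calls),
            "n_contacts": len({callee for callee, _hour, _dur in calls}),
        }
    return features


features = caller_features(records)
for caller, values in features.items():
    print(caller, values)
```

Feature vectors like these would then be fed to a classifier; which classifier was used, and how the importances in the table were computed, is not stated on the slide.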

Page 22: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Internal Transaction Fraud Detection

Beyond signatures

Beyond simple metrics for thresholding

Beyond manual engineering of rules

Monitor each and every entity in its environmental context

Page 23: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Internal Transaction Fraud Detection

Beyond signatures

Beyond simple metrics for thresholding

Beyond manual engineering of rules

Monitor each and every entity in its environmental context

Page 24: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

[Figure: UserID and Data; Experts analyze (individual votes 2, 5, 3, 3); Overall vote is determined (3.25)]

\[
S(id) = w_1 \cdot M_1(id) + \dots + w_j \cdot M_j(id) \quad \text{s.t.} \quad \sum_i w_i = 1
\]

Weights are a measure of “importance” for model expert j. Initially uniform across all experts.

Mixture of Experts Metaphor
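
The weighted vote on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the speaker's implementation: the function name `combined_score` is hypothetical, and only the formula, the sum-to-one constraint and the uniform starting weights come from the slide. The example votes are the four numbers shown in the slide's figure.

```python
from typing import Optional, Sequence


def combined_score(expert_scores: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Return S(id) = w_1*M_1(id) + ... + w_j*M_j(id) for a single id.

    expert_scores holds M_1(id)..M_j(id). If no weights are given,
    every expert gets the same weight 1/j (the slide's initial state).
    """
    n = len(expert_scores)
    if n == 0:
        raise ValueError("need at least one expert score")
    if weights is None:
        weights = [1.0 / n] * n  # uniform across all experts
    if len(weights) != n:
        raise ValueError("need exactly one weight per expert")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, expert_scores))


# Four experts vote 2, 5, 3 and 3; with uniform weights the overall vote is 3.25.
votes = [2, 5, 3, 3]
overall = combined_score(votes)
print(overall)
```

Passing explicit weights, for example `combined_score(votes, [0.4, 0.2, 0.2, 0.2])`, shifts the overall vote towards the experts considered more important. How the weights are updated after the uniform start is not described on the slide.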

Page 25: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Anomalous User Behavior Comparison

          Mean Anomaly Scores                                                                Users
          Transaction  SoD    Terminated  CDHDR Access  VPN Access  Cluster  Total
          Anomaly      Risk   Employees   Anomaly       Anomaly     Outlier  Score   #      %

Reg B
  Red     0.6          0.6    0.1         0.2           0.1         0.6      2.3     26     0.3%
  Amber   0.4          0.5    0.1         0.1           0.1         0.6      1.7     73     0.8%
  Green   0.0          0.0    0.0         0.0           0.1         0.0      0.1     8,765  98.9%

Reg A
  Red     0.1          -      -           1.0           0.4         0.9      2.4     1      0.01%
  Amber   0.4          0.2    0.0         0.1           0.2         0.7      1.7     25     0.4%
  Green   0.0          0.0    0.0         0.0           0.1         0.0      0.2     6,853  99.6%
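
The percentage column appears to be each band's share of the users in its region. As a quick cross-check under that assumption, the sketch below recomputes the Reg B shares from the user counts in the table; the helper name `user_shares` is hypothetical.

```python
# User counts per band for Reg B, copied from the table above.
reg_b_counts = {"Red": 26, "Amber": 73, "Green": 8765}


def user_shares(counts):
    """Return each band's share of all users in the region, in percent."""
    total = sum(counts.values())
    return {band: 100.0 * n / total for band, n in counts.items()}


shares = user_shares(reg_b_counts)
for band, pct in shares.items():
    print(f"{band}: {pct:.1f}%")
```

Rounded to one decimal place this gives 0.3%, 0.8% and 98.9%, matching the Reg B rows.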

Page 26: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Add SMARTness to your app by leveraging data

Don’t think of Data Science in an isolated fashion

Move beyond POCs on Big Data

Start with a minimal viable product/solution

Get the right platform and resources in place

Collaborate and interact

Conclusions

Page 27: Pivotal Digital Transformation Forum: Data Science Bridging the Gap

Digital Transformation Forum

Disrupt or Be Disrupted 19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH