Data-Driven? Data Science? Lecture (part 1) @AaltoBIZ, Feb 2, 2015 by Johan Himberg Data Scientist, @ReaktorNow

Lecture notes on being Data-Driven and doing Data Science



Data contains information

• Data may contain information

• Information gives the capacity for making beneficial decisions

• compare with “energy gives the capacity of doing work” or “information is the currency of decision making”

• …yes, this is not the formal definition of “information”

Lecture @AaltoBIZ, Johan Himberg, 2015

Data

• You can’t work with “only (big) data”

• You need prior assumptions and models to gain information from data.

• Think of the famous example on

• ice cream

• drowning, and

• temperature
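The ice cream, drowning and temperature example can be sketched with simulated data. This is a minimal illustration, not real data: the data-generating model (temperature drives both quantities, which do not influence each other), the coefficients and the noise levels are all assumptions made up for the sketch.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """Residuals of a least-squares line fitted to ys as a function of xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.uniform(10, 30) for _ in range(2000)]
# Assumed model: temperature drives both series; they do not affect each other.
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream, drownings)
# Correlation that is left once the linear effect of temperature is removed.
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

print(f"raw correlation:            {raw_corr:.2f}")
print(f"after removing temperature: {partial_corr:.2f}")
```

Without the prior assumption that temperature matters, the raw correlation alone would suggest a link between ice cream and drowning; with it, the link largely disappears.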


Data-Driven: Probability & Empiricism

“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making”

Wikipedia 2015-02-02

“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”

fictional character Dr. House in Fox television series “House”

Data-Driven

• Being “data-driven” is an old concept … and very trendy

• “Digitalisation”

• “Big Data”

• Data Science

• Business acumen [“what for”]

• Operations Research [“optimal decisions”]

• Probability theory [“how to handle uncertainties”]

• Analytics [“insight”]

• Computer Science [“how to implement all that”]


Ideals of being Data-Driven

• be curious (seek evidence)

• be active (test, don’t just observe and analyse)

• be Bayesian (understand uncertainties)

• be courageous (act on the evidence)

• be agile (learn, fail fast… but not too fast: collect enough evidence)

• be transparent and helpful (show and share information, co-operate)

• be truthful and non-political (don’t abuse data, work across silos)

• be wise (there is a time to be data-driven and a time to be intuitive)

Culture eats strategy for breakfast

attributed to P. Drucker, popularised by M. Fields
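As a small illustration of “be Bayesian” and “collect enough evidence”, the sketch below updates a Beta prior on a response rate with observed counts. The campaign numbers are hypothetical, and the uniform Beta(1, 1) prior is an assumption chosen for simplicity.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with binomial data; return (a, b)."""
    return prior_a + successes, prior_b + failures


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Hypothetical campaign: the same 12 % observed response rate at two sample sizes.
small = beta_mean_sd(*beta_posterior(successes=6, failures=44))      # 50 contacts
large = beta_mean_sd(*beta_posterior(successes=600, failures=4400))  # 5000 contacts

print(f"50 contacts:   mean {small[0]:.3f}, sd {small[1]:.3f}")
print(f"5000 contacts: mean {large[0]:.3f}, sd {large[1]:.3f}")
```

The point estimate is similar in both cases, but the posterior spread is much wider after 50 contacts, which is one way to quantify “failing too fast”.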


Why?

• Data business

• e.g. Google, Facebook

• Operational and Strategic aspects

• case Ford


Data business

• Sell

• audiences (Google, Facebook, media, …)

• information (credit rating, car register, …)


Case Ford

• Ford closed 2006 with a $12.6 billion loss, the largest in the company’s history. Alan Mulally became CEO in 2006.

• “top-down data-driven culture, innovative data science techniques”

• profitable again in 3 years

• INFORMS Prize Winner, 2013, Best Company of the Year in Analytics, Operations Research: “Analytical tools and the operations research team supported many decisions in this period, and a number of critical applications were developed:

• a dealer vehicle recommendation system

• a detailed econometric model enabling the study of what-if analyses of inventory, production, pricing, and sales

• a strategic sourcing model to restructure the Ford auto interiors division”


Brynjolfsson et al. (2011) on Data-Driven

• survey data on the business practices and IT investments of 179 large, publicly traded companies

• Firms that emphasise “data driven decision making”

• have output and productivity that is 5-6% higher than what would be expected given other investments and IT usage.

• relationship also appears in asset utilisation, return on equity and market value

• statistical analysis suggests that this does not appear to be due to reverse causality (!)


Operations

• Favour beneficial events: target the operations and marketing, cross-sell, up-sell, …

• Avoid non-beneficial events: churn, people leaving, waste, credit loss, fraud, system failures

• Optimize: work force, schedules, prices, stocks, relevancy, production quality

• Rationalise: process efficiency, lead times, handle complexity, search time …

• Understand: master data, transactions, processes

• internally: ERP, CRM, HR, sales systems, production, …

• externally: location, routes, weather, demographics, estates, …

Strategic

• Efficiency and competition

• react faster, streamlined decision making, risk awareness

• more financial efficiency

• not giving the information advantage to competitors (cf. efficient markets, warfare)

• innovations

• Well-informed strategic decisions

• understanding and predicting customer groups, behaviour, and experience; product and service development

• understanding and predicting world events, economics, demographics, …: react to market fluctuations, changes in financial environment

• Brand aspects

• transparency, objectivity, personalisation as a part of company culture and brand


From business problem to predictive / prescriptive model

CRISP-DM and Plan-Do-Check-Act

• CRISP-DM (EU consortium ~1996-2008)

• Compare to PDCA popularised by W. E. Deming

• The quote “In God we trust; all others must bring data.” is attributed to Deming

• Comments

• Think first how to deploy, not last

• Don’t plan too big things up-front (the processes & data & test results might ruin your plan)

• Keep a backlog and communicate, but don’t get stuck in “understanding” and “insights” if they are not the main task

• You can’t make final evaluation on data before “deployment” (remember: be empirical)

• You should deploy several tests on the field before “final” evaluation

• Do not silo yourselves according to the boxes in the cycle!

• Learn continuously. Be truthful and curious.

[Figure: Plan-Do-Check-Act cycle (P, D, C, A)]


[Figure: business drivers pose challenges 1 to 5 along an information path through three boxes: Data, Information and Action]

Action: optimize, decide, deploy. For example

• Automatised decisions; recommendation, targeting

• Simulation

• prescriptive, predictive modelling

Information: report, visualize, model. For example

• documentation on meaning of the data

• KPIs, profiles, segments, factors, DW dashboards

• descriptive, diagnostic, predictive modelling

Data: big, small, open, local, web, meta, … For example

• source integrations

• Extract - Load - Transform

• Metadata

• modelling for cleansing & consistency

Labels in the figure

• modelling: what are the actions, what are the insights

• wrangling: what data means

• testing: what is the impact

Think & plan from deployment to data

Pick a challenge!


Small note: This is not a guideline for IT or enterprise architecture, which is an important question, too.

Architects may benefit from the observations collected during data-driven work, though.

[Figure: the same Data, Information, Action diagram with challenges 1 to 5; challenge 2 is marked “start from here!”]

Action. For example

• Business: need optimising for customer retention

• Marketing: we could start with special offer by SMS

• Data Scientist: we’ll set up test & control groups!

Information. For example

• M: some past campaign results & execution…

• Solution expert: Field ZPOR means revenue per unit and it is calculated based on …

• Database admin: Source X in DW is aggregated on monthly level

• DS: let’s have historical data on X and validate model

Data. For example

• DBA: we have X for 1M users for 1 yr, fields a, b, c

• DS: field c seems suspicious, we’ll try to correct it

Data-Driven is inherently iterative and benefits from agility. Data and processes are often not as assumed. Be curious, keep a backlog, inspect, adapt.
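The test & control set-up mentioned in the example could look roughly like the sketch below. It is an illustration only: the helper names, the 50/50 split, the retention counts and the choice of a pooled two-proportion z-test are assumptions, not something the slides prescribe.

```python
import math
import random


def assign_groups(customer_ids, test_fraction=0.5, seed=0):
    """Randomly split customers into (test, control) so the groups are comparable."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * test_fraction)
    return ids[:cut], ids[cut:]


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


test_ids, control_ids = assign_groups(range(2000))
# Hypothetical outcome counts: customers retained with and without the SMS offer.
z, p_value = two_proportion_z_test(success_a=130, n_a=len(test_ids),
                                   success_b=100, n_b=len(control_ids))
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Random assignment is what lets the difference between the groups be read as the impact of the offer rather than as a difference between the customers who happened to receive it.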


[Figure: the same Data, Information, Action diagram with challenges 1 to 5, now with an added “results” label]

Action. For example

• deploy campaign, collect responses

Information. For example

• calibrate & apply model

Data. For example

• get data for modeling

• store results

Execute based on model, collect data

[Figure: the same Data, Information, Action diagram with challenges 1 to 5]

Action. Backlog example

• test & control group handling in marketing automation

• Involve N.N. in the process

Information. Backlog example

• define new information source

• Look for a new data source for determining income on zip code areas

• correct documentation

• automatization for the campaign modelling

Data. Backlog example

• better system configuration & architecture

• automatization for the campaign process…

• new data: record information on all campaigns

Information path focused backlog


Aim - Explore - Exploit

• Röntgen and Fleming (Nobel laureates)

• their great findings were “accidental”, but

• they were skilled scientists doing disciplined research for some other aim

• Aim - explore - exploit

• Always aim at something specific but be open-minded and curious; insights come along with the process.

• Explore occasionally “from data to insights”. But don’t overdo exploration.

• If you find something interesting, make a disciplined test and exploit the finding.


Technology?

• Variation is big: a combination of

• business

• data and information

• decision type

• deployment (actions)

• Things evolve rapidly


Technology?

• Prefer systems

• that give mass-access to historical, transactional data on individual level instead of just aggregates (avoid being blinded by averages)

• from which you’ll get the data, transformations, and results out to another system (avoid being “data hostage”)

• where you see what the analytics actually does at least on modular level (avoid being “method hostage”). Prefer being able to see the actual implementation (open source).
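A toy illustration of being “blinded by averages”: two hypothetical customer segments with the same mean monthly spend but very different individual-level behaviour. All numbers are made up for the sketch.

```python
from statistics import mean

# Hypothetical monthly spend per customer (EUR) in two segments.
segment_a = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
segment_b = [0, 0, 0, 0, 0, 0, 0, 0, 250, 250]


def inactive_share(spend):
    """Share of customers with no purchases at all."""
    return sum(1 for s in spend if s == 0) / len(spend)


for name, spend in [("A", segment_a), ("B", segment_b)]:
    print(name, "mean:", mean(spend), "inactive share:", inactive_share(spend))
```

A system that stores only the aggregate cannot tell these segments apart; access to individual-level data can.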


Technology?

• We have used

• R, ggplot2, Shiny, …

• Apache Spark

• Python

• cloud based solutions

• proprietary products, if required or needed for a specific task (don’t reinvent all wheels)

• dedicated hardware, if critical or confidential…

• How to document the process of data transformation? That’s a question!


Culture Organisation

Data Science is cross-functional

• Data scientists’ main tasks are in methods, but also in processes and machinery of

• making evidence based decisions (automated if possible)

• finding out confidence on the outcome (by active tests if possible)

• getting insights based on models and data

• Data science and data scientists also act as “glue”


Cross-functional teams

• Doing data-driven work and data science in any organisation model boils down to

“Involve everyone along the information path”

[that was the red, crooked line in one of the previous slides]


Team & skills

• A change of culture; information is everybody’s business

• Business / Marketing / Finance specialists

• Project / Process / Solution owners

• Research

• Data Stewards / DB administration

• Developers

• Visualization / UX experts

• …

• One data scientist can’t excel at all of that but should be knowledgeable enough to work with everyone along the information path


Data Science skills

• There is no “one” definition for Data Science or the skills

• Data science is a combination of business acumen, statistics, data mining, DBs, big data, machine learning, computer science, etc. See for example:

• http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

• http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html

• You should team up anyway, the more the merrier, to find all relevant skills. One view on this:

• http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Team-Solution-Data-Scientist-Shortage.pdf

Ideals of being Data-Driven

• be curious (seek evidence)

• be active (test, don’t just observe and analyse)

• be Bayesian (understand uncertainties)

• be courageous (act on the evidence)

• be agile (learn, fail fast… but not too fast: collect enough evidence)

• be transparent and helpful (show and share information, co-operate)

• be truthful and non-political (don’t abuse data, work across silos)

• be wise (there is a time to be data-driven and a time to be intuitive)

Culture eats strategy for breakfast

attributed to P. Drucker, popularised by M. Fields


Read more

References & Suggested reading

• Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486

• T. Davenport, J. G. Harris, R. Morison. Analytics at Work – Smarter Decisions, Better Results

• I disagree with some cultural issues, but a good overview

• A. Croll, B. Yoskovitz. Lean Analytics – Use Data to Build a Better Startup Faster

• data-driven thinking is crucial for start-ups

• Ford case

• http://blog.revolutionanalytics.com/2014/11/ford-uses-r-for-data-driven-decision-making.html

• http://dataconomy.com/how-ford-uses-data-science-past-present-and-future/

• https://www.informs.org/About-INFORMS/News-Room/Press-Releases/INFORMS-Prize-2013-Ford

• http://adage.com/article/datadriven-marketing/ford-names-chief-data-analytics-officer/296383/
