Data-Driven? Data Science?
Lecture (part 1) @AaltoBIZ, Feb 2, 2015
by Johan Himberg, Data Scientist, @ReaktorNow
Data contains information
• Data may contain information
• Information gives the capacity for making beneficial decisions
• compare with “energy gives the capacity of doing work” or “information is the currency of decision making”
• …yes, this is not the formal definition of “information”
Lecture @AaltoBIZ, Johan Himberg, 2015
Data
• You can’t work with “only (big) data”
• You need prior assumptions and models to gain information from data.
• Think of the famous example of ice cream sales, drownings, and temperature: sales and drownings rise together only because both follow the temperature
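The ice cream example can be sketched as a small simulation. The block below is only an illustration under an assumed toy model (temperature drives both ice cream sales and drownings, and neither affects the other); the coefficients, noise levels, and sample size are made up and are not from the lecture. It compares the raw correlation with the partial correlation that controls for temperature.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Assumed toy model: temperature drives both quantities,
# and neither quantity affects the other.
rng = random.Random(0)
temperature = [rng.uniform(0, 30) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

r_raw = pearson(ice_cream, drownings)
r_given_temp = partial_corr(
    r_raw,
    pearson(ice_cream, temperature),
    pearson(drownings, temperature),
)

print(f"raw correlation:             {r_raw:.2f}")
print(f"controlling for temperature: {r_given_temp:.2f}")
```

Without the prior assumption that temperature matters, the data alone would suggest a strong link between ice cream and drowning; with it, the link largely disappears.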
Data-Driven: Probability & Empiricism
“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making”
Wikipedia 2015-02-02
“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”
fictional character Dr. House in Fox television series “House”
Data-Driven
• Being “data-driven” is an old concept … and very trendy
• “Digitalisation”
• “Big Data”
• Data Science
• Business acumen [“what for”]
• Operations Research [“optimal decisions”]
• Probability theory [“how to handle uncertainties”]
• Analytics [“insight”]
• Computer Science [“how to implement all that”]
Ideals of being Data-Driven
• be curious (seek evidence)
• be active (test, don’t just observe and analyse)
• be Bayesian (understand uncertainties)
• be courageous (act on the evidence)
• be agile (learn, fail fast… but not too fast: collect enough evidence)
• be transparent and helpful (show and share information, co-operate)
• be truthful and non-political (don’t abuse data, work across silos)
• be wise (there is a time to be data-driven and a time to be intuitive)
Culture eats strategy for breakfast
attributed to P. Drucker, popularised by M. Fields
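The ideals “be Bayesian” and “collect enough evidence” can be made concrete with a small worked example. The sketch below assumes a uniform Beta(1, 1) prior on a response rate and made-up counts; the function name `beta_posterior` is hypothetical. It shows that the same observed rate carries very different uncertainty depending on how much evidence is behind it.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of the Beta posterior for a success rate.

    A Beta(prior_a, prior_b) prior combined with binomial data gives a
    Beta(prior_a + successes, prior_b + failures) posterior.
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Same observed response rate (30%), very different amounts of evidence.
small_mean, small_sd = beta_posterior(successes=3, failures=7)
large_mean, large_sd = beta_posterior(successes=300, failures=700)

print(f"10 trials:   {small_mean:.3f} +/- {small_sd:.3f}")
print(f"1000 trials: {large_mean:.3f} +/- {large_sd:.3f}")
```

Failing fast on ten observations means acting on a wide posterior; waiting for more data narrows it considerably.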
Why?
• Data business
• e.g. Google, Facebook
• Operational and Strategic aspects
• case Ford
Data business
• Sell
• audiences (Google, Facebook, media, …)
• information (credit rating, car register, …)
Case Ford
• In 2006 Ford closed the year with a $12.6 billion loss, the largest in the company’s history. Alan Mulally became CEO in 2006.
• “top-down data-driven culture, innovative data science techniques”
• profitable again in 3 years
• INFORMS Prize Winner, 2013, Best Company of the Year in Analytics, Operations Research: “Analytical tools and the operations research team supported many decisions in this period, and a number of critical applications were developed:
• a dealer vehicle recommendation system
• a detailed econometric model enabling the study of what-if analyses of inventory, production, pricing, and sales
• a strategic sourcing model to restructure the Ford auto interiors division”
Brynjolfsson et al. (2011) on Data-Driven
• survey data on the business practices and IT investments of 179 large, publicly traded companies
• Firms that emphasise “data driven decision making”
• have output and productivity that is 5-6% higher than what would be expected given other investments and IT usage.
• relationship also appears in asset utilisation, return on equity and market value
• statistical analysis suggests that this does not appear to be due to reverse causality (!)
Operations
• Favour beneficial events: target the operations and marketing, cross-sell, up-sell, …
• Avoid non-beneficial events: churn, people leaving, waste, credit loss, fraud, system failures
• Optimize: work force, schedules, prices, stocks, relevancy, production quality
• Rationalise: process efficiency, lead times, handle complexity, search time …
• Understand: master data, transactions, processes
• internally: ERP, CRM, HR, sales systems, production, …
• externally: location, routes, weather, demographics, estates, …
Strategic
• Efficiency and competition
• react faster, streamlined decision making, risk awareness
• more financial efficiency
• not giving the information advantage to competitors (cf. efficient markets, warfare)
• innovations
• Well-informed strategic decisions
• understanding and predicting customer groups, behavior, and experience; product and service development
• understanding and predicting world events, economics, demographics, …: react to market fluctuations, changes in financial environment
• Brand aspects
• transparency, objectivity, personalisation as a part of company culture and brand
CRISP-DM and Plan-Do-Check-Act
• CRISP-DM (EU consortium ~1996-2008)
• Compare to PDCA popularised by W. E. Deming
• The quote “In God we trust; all others must bring data.” is attributed to Deming
• Comments
• Think first how to deploy, not last
• Don’t plan too much up-front (the processes, data, and test results might ruin your plan)
• Keep a backlog and communicate, but don’t get stuck in “understanding” and “insights” if they are not the main task
• You can’t make a final evaluation on data before “deployment” (remember: be empirical)
• You should deploy several tests in the field before the “final” evaluation
• Do not silo yourselves according to the boxes in the cycle!
• Learn continuously. Be truthful and curious.
[Figure: Plan-Do-Check-Act (PDCA) cycle]
Think & plan from deployment to data
Pick a challenge!
[Figure: diagram with the elements Business drivers, challenges 1 to 5, Data, Information and Action]
Action: optimize, decide, deploy. For example
• Automatised decisions; recommendation, targeting
• Simulation
• prescriptive, predictive modelling
Information: report, visualize, model. For example
• documentation on meaning of the data
• KPIs, profiles, segments, factors, DW dashboards
• descriptive, diagnostic, predictive modelling
Data: big, small, open, local, web, meta, … For example
• source integrations
• Extract - Load - Transform
• Metadata
• modelling for cleansing & consistency
Steps in the diagram
• modelling: what are the actions, what are the insights
• wrangling: what data means
• testing: what is the impact
Small note: This is not a guideline for IT or enterprise architecture, which is an important question, too.
Architects may benefit from the observations collected during data-driven work, though.
[Figure: the diagram with Business drivers, challenges 1 to 5, Data, Information and Action; one challenge is marked “start from here!”]
Action. For example
• Business: we need to optimise for customer retention
• Marketing (M): we could start with a special offer by SMS
• Data Scientist (DS): we’ll set up test & control groups!
Information. For example
• M: some past campaign results & execution…
• Solution expert: Field ZPOR means revenue per unit and it is calculated based on …
• Database admin (DBA): Source X in DW is aggregated on monthly level
• DS: let’s have historical data on X and validate the model
Data. For example
• DBA: we have X for 1M users for 1 yr, fields a, b, c
• DS: field c seems suspicious, we’ll try to correct it
Data-Driven is inherently iterative and benefits from agility. Data and processes are often not as assumed. Be curious, keep a backlog, inspect, adapt.
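The data scientist’s step in this example, setting up test and control groups, can be sketched as follows. This is a minimal illustration, not the lecture’s method: the function names, the 50/50 split, the outcome counts, and the normal-approximation interval are all assumptions made for the sketch.

```python
import math
import random


def split_test_control(customer_ids, test_fraction=0.5, seed=0):
    """Randomly assign customers to a test group and a control group."""
    shuffled = list(customer_ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def uplift_with_ci(test_retained, test_n, control_retained, control_n, z=1.96):
    """Difference in retention rates with a normal-approximation 95% interval."""
    p_test = test_retained / test_n
    p_control = control_retained / control_n
    diff = p_test - p_control
    se = math.sqrt(p_test * (1 - p_test) / test_n
                   + p_control * (1 - p_control) / control_n)
    return diff, diff - z * se, diff + z * se


# Hypothetical customer base of 2000 ids; the test group gets the SMS offer.
test_group, control_group = split_test_control(range(2000))

# Made-up outcome counts, only to show the calculation.
diff, low, high = uplift_with_ci(
    test_retained=890, test_n=len(test_group),
    control_retained=850, control_n=len(control_group),
)
print(f"uplift {diff:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Random assignment is what lets the difference between the groups be read as the impact of the offer rather than of how the customers were picked.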
Execute based on model, collect data
[Figure: the diagram with Business drivers, challenges 1 to 5, Data, Information and Action, with the additional label “results”]
Action. For example
• deploy campaign, collect responses
Information. For example
• calibrate & apply model
Data. For example
• get data for modeling
• store results
Information path focused backlog
[Figure: the diagram with Business drivers, challenges 1 to 5, Data, Information and Action]
Action. Backlog example
• test & control group handling in marketing automation
• Involve N.N. in the process
Information. Backlog example
• define new information source
• Look for a new data source for determining income by zip code area
• correct documentation
• automatization for the campaign modelling
Data. Backlog example
• better system configuration & architecture
• automatization for the campaign process…
• new data: record information on all campaigns
Aim - Explore - Exploit
• Röntgen and Fleming (Nobel laureates)
• their great findings were “accidental”, but
• they were skilled scientists doing disciplined research for some other aim
• Aim - explore - exploit
• Always aim at something specific but be open-minded and curious; insights come along with the process.
• Explore occasionally “from data to insights”. But don’t overdo exploration.
• If you find something interesting, make a disciplined test and exploit the finding.
Technology?
• Variation is big: a combination of
• business
• data and information
• decision type
• deployment (actions)
• Things evolve rapidly
Technology?
• Prefer systems
• that give mass-access to historical, transactional data on individual level instead of just aggregates (avoid being blinded by averages)
• from which you’ll get the data, transformations, and results out to another system (avoid being “data hostage”)
• where you see what the analytics actually does, at least on a modular level (avoid being “method hostage”). Prefer being able to see the actual implementation (open source).
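The warning about being blinded by averages can be illustrated with a small example of Simpson’s paradox. The campaign counts below are made up for illustration (they are not from the lecture), and the offer and segment names are hypothetical. Offer A wins within every segment, yet the pooled averages favour offer B because the two offers were sent to different mixes of customers.

```python
# Made-up campaign counts (responses, customers contacted) per segment.
# The numbers are illustrative only; they are not from the lecture.
campaigns = {
    "offer A": {"small customers": (81, 87), "large customers": (192, 263)},
    "offer B": {"small customers": (234, 270), "large customers": (55, 80)},
}


def rate(responses, contacted):
    """Response rate of one group."""
    return responses / contacted


def overall_rate(segments):
    """Response rate after pooling all segments of one offer."""
    responses = sum(r for r, _ in segments.values())
    contacted = sum(n for _, n in segments.values())
    return rate(responses, contacted)


for offer, segments in campaigns.items():
    per_segment = ", ".join(
        f"{name}: {rate(*counts):.0%}" for name, counts in segments.items()
    )
    print(f"{offer}: overall {overall_rate(segments):.0%} ({per_segment})")
```

A system that only exposes the pooled rates would point to the wrong offer; access to the segment-level (or individual-level) data makes the reversal visible.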
Technology?
• We have used
• R, ggplot2, Shiny, …
• Apache Spark
• Python
• cloud based solutions
• required proprietary products, if needed
• a specific task (don’t reinvent all wheels)
• dedicated hardware, if critical or confidential…
• How to document the process of data transformation? That’s a question!
Data Science is cross-functional
• Data scientists’ main tasks are in methods, but also in the processes and machinery of
• making evidence based decisions (automated if possible)
• finding out confidence on the outcome (by active tests if possible)
• getting insights based on models and data
• Data science and data scientists also act as a “glue”
Cross-functional teams
• Doing data-driven work and data science in any organisation model boils down to “Involve everyone along the information path”
[that was the red, crooked line in one of the previous slides]
Team & skills
• A change of culture; information is everybody’s business
• Business / Marketing / Finance specialists
• Project / Process / Solution owners
• Research
• Data Stewards / DB administration
• Developers
• Visualization / UX experts
• …
• One data scientist can’t excel at all of that but should be knowledgeable enough to work with everyone along the information path
Data Science skills
• There is no “one” definition for Data Science or the skills
• Data science is a combination of business acumen, statistics, data mining, DBs, big data, machine learning, computer science, etc. See for example:
• http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
• http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html
• You should team up anyway, the more the merrier, to find all relevant skills. One view on this:
• http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Team-Solution-Data-Scientist-Shortage.pdf
Ideals of being Data-Driven
• be curious (seek evidence)
• be active (test, don’t just observe and analyse)
• be Bayesian (understand uncertainties)
• be courageous (act on the evidence)
• be agile (learn, fail fast… but not too fast: collect enough evidence)
• be transparent and helpful (show and share information, co-operate)
• be truthful and non-political (don’t abuse data, work across silos)
• be wise (there is a time to be data-driven and a time to be intuitive)
Culture eats strategy for breakfast
attributed to P. Drucker, popularised by M. Fields
References & Suggested reading
• Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486
• T. Davenport, J. G. Harris, R. Morison. Analytics at Work – Smarter Decisions, Better Results
• I disagree with some cultural issues, but a good overview
• A. Croll, B. Yoskovitz. Lean Analytics – Use Data to Build a Better Startup Faster
• data-driven thinking is crucial for start-ups
• Ford case
• http://blog.revolutionanalytics.com/2014/11/ford-uses-r-for-data-driven-decision-making.html
• http://dataconomy.com/how-ford-uses-data-science-past-present-and-future/
• https://www.informs.org/About-INFORMS/News-Room/Press-Releases/INFORMS-Prize-2013-Ford
• http://adage.com/article/datadriven-marketing/ford-names-chief-data-analytics-officer/296383/