Copyright © 2012 Red Olive Ltd, All Rights Reserved.
Analytics Academy – “Statistical thinking” (Client: a household-name media company)
Information and Data Management

1505 Statistical Thinking course extract



Contents

1st day
Introduction
The research process: CRISP-DM
Analysis
• Reporting vs. modelling
• Is there an effect?
• Is there a single cause?
• Forecasting
• Could there be more than one cause?

2nd day
Working together
The Data Academy
Sharing results
• What to show
• How to show it
Next steps
• A further project


Introduction: Getting your data to speak


Many business challenges

• Customer Acquisition & Retention
• Marketing Efficiency & Advertising Revenue
• Cost to Serve & Profitability
• Promotion & Pricing Optimisation
• Demand Forecasting
• Fraud Detection


Before jumping into the data...

Do you know what you are looking for?
• The business need for the analysis is formulated and confirmed.
• The question that the stakeholder needs answered is articulated.
• Steer away from producing the ‘right answer to the wrong question’.

Do you know what you will do with the answers you find?
• The desired outputs from the analysis are shaped in detail so that they arrive in a format that is fit for purpose.
• Actual outputs can be easily integrated into the stakeholders’ target documents, systems or processes.

Do you have a way to evaluate success?
• Can you measure the current situation in terms of money, time or units?
• Do you have a way of tracking the results of your work in the same units?


Getting your data to speak

Gather ideas from people in your business about the cause → effect relationships.
• Gather impressions about the different classes or types of events.
• Consider both positive and negative outcomes.

Translate these ideas/impressions into data
• What would data have to look like to detect the effects and trends people believe in?
• Translate business objective into analysis goal…


The research process: CRISP-DM


CRISP-DM process

[Diagram: business data for analytics flows through six process stages]
1 Develop business understanding
2 Develop data understanding
3 Prepare data
4 Develop model
5 Evaluate results
6 Deploy live model

[Diagram key: data set, process stage, flow between stages]


CRISP-DM

Business understanding
• Determine objectives
• Establish use cases
• Summarise current situation
• Determine project goals
• Map business goals to data problem
• Estimate current value so that ROI can be calculated
• Create project plan

Data understanding
• Collect initial data
• Document the real meaning of each data field
• Capture baseline SQL
• Explore the data
• Check data quality
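As one possible illustration of the "explore the data" and "check data quality" steps, the sketch below profiles each column of a small extract, counting rows, missing values and distinct values. The file contents, column names and the helper `profile_columns` are hypothetical and invented for this example; a real extract would come from the baseline SQL.

```python
import csv
import io

# Hypothetical extract; column names and values are invented for illustration.
SAMPLE_CSV = """visit_id,section,subscribed
1,NEWS,N
2,SPORT,Y
3,,N
4,NEWS,
5,TRAVEL,N
"""


def profile_columns(rows):
    """Return, per column, the row count, missing count and distinct values."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            stats = profile.setdefault(
                column, {"rows": 0, "missing": 0, "distinct": set()}
            )
            stats["rows"] += 1
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                stats["missing"] += 1
            else:
                stats["distinct"].add(value.strip())
    return profile


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_columns(rows)
for column, stats in profile.items():
    print(column, stats["rows"], stats["missing"], sorted(stats["distinct"]))
```

A profile like this is a quick way to spot blank fields and unexpected codes before any modelling starts.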


Analysis: Reporting vs. Modelling


Reporting vs. Modelling

Reporting
• What was the rate of net growth?
• Information based on user-directed queries (hypothesis testing)
• Historical analysis
• Monitors performance measures
• Reactive

Modelling
• Why did we have a higher/lower rate?
• Knowledge based on finding unknown relationships (hypothesis generation)
• Predictive analysis
• Determines performance measures
• Proactive


Map analysis goal to modelling technique

Sometimes we put the data into the model and see what happens. Other times we manipulate the inputs (or the outputs) in some way so as to give the algorithm more information to work with.

By combining multiple techniques, we can often gain better insight into the nature of potential solutions to a business problem and, hopefully, reach a more useful result.

More than one approach may be used to address a single business problem, and the same data may serve a wide range of applications. Which application it serves depends on which model you choose, how you manipulate the data in the file, and which input or target variables you choose.


Analysis: Basic statistical terms


Level of measurement

Level of measurement | Summary statistic | Visualization
Categorical or Nominal | Mode | Bar chart, Pareto chart
Ranked or Ordinal | Median, percentile | Bar chart
Numeric or Scale | Mean or average | Histogram, line graph, bubble chart

[Figure: example charts; horizontal axis 1st Qtr to 4th Qtr, vertical axis 0 to 50]


Inserting functions
• Click in a cell
• Go to the Insert menu
• Choose Functions…
• Select a category
• Click on a function and look at the brief help (first-letter search works)
• Click OK to paste

Click “Help on this function” for more information and a worked example.


Numbers that describe a distribution

Statistic | Function | Definition
Mode | =MODE | The most common value
Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT.
Percentile | =PERCENTILE, =PERCENTRANK | How far down the list of an ordered set are you?
Median | =MEDIAN | The middle value of an ordered set. The 50th percentile.
Mean | =AVERAGE | Add all the values and divide by the count
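For readers who want to check the spreadsheet results outside the spreadsheet, the same statistics can be sketched with Python's standard `statistics` module. The data are hypothetical, and `percentile_rank` is an illustrative helper using one common definition (share of values less than or equal to x); spreadsheet percentile functions may interpolate differently, so small differences are to be expected.

```python
import statistics

# Hypothetical data: article views per visit for eleven visits.
views = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9, 16]

mode = statistics.mode(views)      # most common value -> 3
median = statistics.median(views)  # middle value of the ordered set -> 3
mean = statistics.mean(views)      # sum (55) divided by count (11) -> 5

# Percent: COUNT in the group divided by total COUNT.
heavy = [v for v in views if v >= 5]
percent_heavy = 100 * len(heavy) / len(views)


def percentile_rank(values, x):
    """Percentage of values less than or equal to x (one common definition)."""
    return 100 * sum(1 for v in values if v <= x) / len(values)


print(mode, median, mean)
print(round(percent_heavy, 1), round(percentile_rank(views, 3), 1))
```

Note how the mean (5) sits above the median (3) here: a single large value pulls the mean up, which is one reason to look at more than one summary statistic.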


Analysis: Is there an effect?


Discussion of crosstabs

A method to test if two variables have a non-random relationship.

Also called chi-square analysis, after the name of the statistic that is calculated: χ² (also written X2).


Discussion of crosstabs

Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe?

Or are the numbers just due to the normal visit pattern on the site?

Data: Subscribe y/n on this visit to this section

[Empty crosstab: columns YES, NO, TOTAL; one row per section (HOME, NEWS, SPORT, FINANCE, COMMENT, BLOGS, CULTURE, TRAVEL, LIFESTYLE, FASHION, TECH, Offers) plus a TOTAL row]


Crosstabs example

[Tables not reproduced: actual counts and calculated %]
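A minimal sketch of the crosstab calculation follows, using invented counts for three sections rather than the course's own figures. It computes the calculated % per section, the expected count for each cell if section and subscribing were unrelated, and the chi-square statistic. The helper name `chi_square` is illustrative; 5.991 is the standard 5% critical value for 2 degrees of freedom.

```python
# Hypothetical counts, invented for illustration:
# section -> (subscribed on this visit, did not subscribe)
observed = {
    "NEWS": (30, 970),
    "SPORT": (20, 480),
    "FINANCE": (50, 450),
}


def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a crosstab."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            # Count we would expect if section and subscribing were unrelated.
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Calculated %: subscription rate per section.
rates = {s: 100 * yes / (yes + no) for s, (yes, no) in observed.items()}

statistic, dof = chi_square(observed)
CRITICAL_5PCT_DF2 = 5.991  # 5% critical value for 2 degrees of freedom

print(rates)
print(round(statistic, 2), dof, statistic > CRITICAL_5PCT_DF2)
```

With these invented counts the statistic (about 35.79) is well above the critical value, so the differences in subscription rate between sections would be unlikely to arise from the normal visit pattern alone.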


Analysis: Is there a single cause?


Predictive modelling

When we seek to predict something, what we are really saying is that we have in mind what the cause is and we are trying to predict how likely the effect is.

Modelling techniques do not make predictions on their own. Analysts structure the data input so that the model can use it in a cause-and-effect way.

Thus, it is important to make sure that all of the inputs into a model precede the output in time. You can’t put the effect before the cause.
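The rule that inputs must precede the output in time can be sketched as a filter applied when building each training row. The event log, event names and the helper `build_training_row` below are hypothetical; the point is only that events dated on or after the effect are excluded from the inputs.

```python
from datetime import date

# Hypothetical event log for one customer; event names and dates are invented.
events = [
    {"date": date(2012, 1, 5), "type": "visit"},
    {"date": date(2012, 1, 9), "type": "visit"},
    {"date": date(2012, 1, 12), "type": "subscribe"},  # the effect to predict
    {"date": date(2012, 1, 20), "type": "visit"},      # happens after the effect
]


def build_training_row(events, target_type):
    """Count input events that occur strictly before the first target event."""
    target_dates = [e["date"] for e in events if e["type"] == target_type]
    cutoff = min(target_dates) if target_dates else None
    # Assumption: customers without a target event keep all their events;
    # in practice a fixed observation window would usually be applied.
    inputs = [
        e for e in events
        if e["type"] != target_type and (cutoff is None or e["date"] < cutoff)
    ]
    return {"visits_before": len(inputs), "target": cutoff is not None}


row = build_training_row(events, "subscribe")
print(row)  # {'visits_before': 2, 'target': True}
```

The visit on 20 January is left out because it happened after the subscription; including it would put the effect before the cause.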


Interpret the results, then make them better

• Try to get to models early in the process, even before you think you are ready.
• Models can tell you things about the data that you can’t see “just by looking”.
• Build lots of models. Throw away the ones that you are done with.
• Refine models based on what you learn at each iteration.
• Algorithms (within their limitations) are objective.


Could there be more than one cause?

A topic for another course…

Advanced analytics
• Structure the data into before and after
• Pick a target
• Test multiple input hypotheses at once

Forecasting: ARIMA allows for including multiple time series inputs
• Special events
• Weather
• Economic trends

Multivariate propensity
• Discover different predictive segments
• Works best with predicting Y/N actions rather than values


Sharing results: What to show


Inspire them

Your job is to inspire.
Your job is not to convince or teach.

• Lead with the important and interesting findings
• Explain in general terms
• Leave the details at the Data Academy


Translating analysis results into business terms

[Diagram labels: Business Objectives, Analysis Goals, Modelling & Evaluation (Accuracy & Significance), Analysis Results, Business Terms; Meaningful, Relevant, Actionable, Quantified]


What to leave in the Data Academy

• How you did it
• How long it took
• Statistical methods

• Problems you had
• Caveats related to data
• Dirty data

[Diagram labels: Analysis Goals, Modelling & Evaluation (Accuracy & Significance), Analysis Results]


Sharing results: How to show it


Design – Colour

Can everyone read it?
• Is someone colour blind?
• Does someone have corrective lenses?

Will it print in black and white?
• Test print
• Black text on dark colours, including red, will not be readable when printed. Use white text instead.

[Figures: colour examples labelled “Wrong”, “Better” and “Not as Nice”, including two charts of East, West and North by quarter (1st Qtr to 4th Qtr), vertical axis 0 to 100]
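As a rough companion to the test print, the sketch below estimates how light or dark a colour becomes in greyscale and picks white or black text accordingly. It uses the common Rec. 601 luma weights; the threshold of 128 and the palette are assumptions made for illustration, and a real test print remains the final check.

```python
def luma(rgb):
    """Approximate greyscale brightness (0 to 255) using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def text_colour_for(background):
    """Pick white text on dark backgrounds and black text on light ones."""
    # Assumption: 128 (mid-grey) as the dark/light threshold.
    return "white" if luma(background) < 128 else "black"


# Hypothetical chart colours, invented for illustration.
palette = {
    "red": (255, 0, 0),
    "dark blue": (0, 0, 139),
    "pale yellow": (255, 255, 204),
}

for name, rgb in palette.items():
    print(name, round(luma(rgb)), text_colour_for(rgb))
```

Red comes out dark in greyscale under these weights, which is consistent with the advice to use white rather than black text on it.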


Design – Colour for the colourblind

• http://www.colorbrewer2.org/


Contact information

Please direct enquiries to Jefferson Lynch: [email protected]
Office: 01256 831100
Mobile: 07860 353027