1
Dr. Paul Bracewell
8th November '06
Undermining the Future
2
Overview
Rationale
Approaches
Case Study (StateFleet, NSW Government)
  Project Background
  Methodology
  Results
Summary
3
25 Years of SAS
The same techniques introduced in SAS 71 (first limited release of SAS) are still used today (Data Step, Regression and ANOVA).
http://en.wikipedia.org/wiki/SAS_System
4
Why the Title?
Undermining the Future – underneath any meaningful forecast, there is a lot of hard work required to ensure that the results are viable.
5
Rationale
Robust results depend on stable framework
Appropriate Data (and manipulation) DOMAIN EXPERT
Appropriate Methodology ANALYST
Appropriate Interpretation END USER
For Results to be deployed
“Marketing” of “Appropriateness” SPONSOR
“Marketing” of Results TRUST
“apart from the price tag, there is little difference between a model that isn’t deployed and a model that isn’t built…”
6
Methodology
Data Mining Definitions
“Data mining is the … equivalent of sitting a huge number of monkeys down at keyboards, and then reporting on the monkeys who happened to type actual words.”
http://www.basketballfreesportspicks.com/glossary.shtml
“Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.”
Berry, M.J.A. & Linoff, G.S. (2000). Mastering data mining: The art and science of customer relationship management. John Wiley & Sons: New York.
7
Robust Methodologies…
  are rarely automatic
  require “expert” guidance
  typically have little involvement from monkeys
  need a team effort – interaction between skilled parties
  repetitive and/or interactive
  embrace process variability
  allow for customers to evolve (not always a caterpillar)
8
Approaches
Statistical Process Control
Significant Change in Behaviour (Predicting Churn)
  within customer changes
    standardisation (focus on change, not value of units)
  collective behaviour changes (between customer)
    seasonality
    growth/reduction in market
  if normality can be approximated (via transformation), can obtain a probability – a relative rank score [0,1] – of normally exhibiting that behaviour (eliminate impact of very large values)
  effectively create a hypothesis test comparing current observed behaviour with behaviour from specific period of time (past year, 4 years etc.)
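A minimal sketch of the scoring idea above, assuming a log transform is enough to approximate normality and that the reference period is the customer's own history. The function name, the transform and the monthly figures are illustrative assumptions, not taken from the slides.

```python
import math
from statistics import NormalDist, mean, stdev


def behaviour_score(history, current):
    """Return a [0, 1] score: the probability, under a normal model of the
    customer's own (log-transformed) history, of a value at or below `current`.
    Scores near 0 or 1 flag behaviour that is unusual for this customer."""
    logs = [math.log1p(x) for x in history]  # assumed normalising transform
    mu, sigma = mean(logs), stdev(logs)
    if sigma == 0:
        return 0.5  # no variation in the reference period; treat as neutral
    z = (math.log1p(current) - mu) / sigma  # standardise: change, not units
    return NormalDist().cdf(z)


# Hypothetical monthly spend for one customer over the past year.
history = [120, 135, 110, 128, 140, 125, 118, 132, 127, 122, 130, 126]
typical = behaviour_score(history, 126)  # close to the customer's norm
dropped = behaviour_score(history, 40)   # sharp fall, a possible churn signal
```

Because the score is a probability rather than a raw amount, a very large customer and a very small one are compared on the same [0,1] scale.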
9
Approaches
Statistical Inference
Predicting Outcomes
  Rugby Example – use same principle as Chi-Square Statistic to create relativity rugby rating (Think of a results grid, Expected Value is: Total Points Scored @ Home × Total Points Scored Away / Total Points Scored)
  Why? To answer questions like: If Wellington plays Canterbury, Canterbury plays Auckland, how will Wellington fare against Auckland?
  Continuous vs Discrete
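The expected-value formula above can be sketched as follows. The grid layout (rows are home teams, columns are away teams, each cell holds the points recorded for that fixture) and all scores are assumptions for illustration; the slides do not define the cell contents. The empty diagonal (a team does not play itself) is simply left at zero here.

```python
def expected_grid(points):
    """Chi-square style expected values for a home-by-away results grid:
    expected[i][j] = row_total[i] * column_total[j] / grand_total."""
    row_totals = [sum(row) for row in points]        # totals at home, per team
    col_totals = [sum(col) for col in zip(*points)]  # totals away, per team
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


teams = ["Wellington", "Canterbury", "Auckland"]
# Hypothetical points grid: points[i][j] is the figure for team i at home
# against team j away.
points = [
    [0, 20, 30],
    [25, 0, 15],
    [10, 35, 0],
]
expected = expected_grid(points)
# Expected figure for Wellington at home against Auckland away.
wellington_v_auckland = expected[0][2]
```

Comparing an observed cell with its expected value gives a relative measure for any pairing, including pairings that are only linked through common opponents.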
10
Approaches
Time Series Analysis
  Remove known components
  Test residuals for any remaining explainable relationship
  Repeat removal and residual testing until no relationship is left in the residuals (like strip mining)
  Smoothing
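One pass of the remove-and-test loop above might look like this. It is a sketch under stated assumptions: the known components are a linear trend and a fixed seasonal pattern, the residual test is a plain autocorrelation at the seasonal lag, and the series is synthetic. None of these choices come from the slides.

```python
def remove_linear_trend(y):
    """Subtract the least-squares straight line fitted against time index."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def remove_seasonal_means(y, period):
    """Subtract the mean of each position within the seasonal cycle."""
    means = [sum(y[p::period]) / len(y[p::period]) for p in range(period)]
    return [v - means[t % period] for t, v in enumerate(y)]


def autocorrelation(y, lag):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    m = sum(y) / len(y)
    denom = sum((v - m) ** 2 for v in y)
    if denom == 0:
        return 0.0
    return sum((y[t] - m) * (y[t - lag] - m) for t in range(lag, len(y))) / denom


# Synthetic quarterly series: level + trend + fixed seasonal pattern.
season = [5, -3, -4, 2]
series = [100 + 2 * t + season[t % 4] for t in range(24)]

detrended = remove_linear_trend(series)           # step 1: remove known trend
seasonal_signal = autocorrelation(detrended, 4)   # step 2: test the residuals
residuals = remove_seasonal_means(detrended, 4)   # step 3: remove what was found
```

A high autocorrelation at lag 4 after detrending is the cue that structure remains; the same test would then be repeated on `residuals` before deciding whether to stop.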
11
Approaches
Data Driven
  large data sets, possibly census
  what you see is what you get…
  …and all that there is
12
Case Study
StateFleet (NSW)
13
Project Background
Used Car Market Increasingly Volatile
As Tax Payer Money Involved, StateFleet:
  can’t charge too much – poor use of Tax
  can’t charge too little – slush fund wiped out
Public determine vehicle price (auction)
Existing predictions not accurate enough
14
Reducing Impact of “Oil”
Split problem into 2 parts:
1. Predict Base Trend
   Public Buying Behaviour – confounding/nuisance variables
2. Model Impact of Attributes (mileage, engine size, etc.)
   Information known at completion of lease
     show that approach is viable – descriptive model
   Information known at start of lease
     show forecasting viability – predictive model
15
Other Techniques
Proportion of Value Modelled
  i.e. Used Sale Price/New Price
  reduce impact of sales, currency etc.
Use of change percentage – removes unit of measure ($)
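The two transformations above are simple arithmetic; a short sketch with hypothetical prices makes the unit-free nature explicit. The function names and figures are illustrative only.

```python
def proportion_of_value(used_sale_price, new_price):
    """Used sale price as a proportion of the new price (unit free)."""
    return used_sale_price / new_price


def change_percentage(previous, current):
    """Percentage change from `previous` to `current` (unit free)."""
    return (current - previous) / previous * 100.0


# Hypothetical vehicle: bought new for $40,000, sold at auction for $18,000.
ratio = proportion_of_value(18_000, 40_000)

# Hypothetical cohort average moving from 0.50 to 0.45 of new price.
change = change_percentage(0.50, 0.45)
```

Modelling the ratio rather than the dollar amount means a price rise on the new vehicle does not by itself look like a change in resale behaviour.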
16
Cracking the Problem
Base trend biggest cause of dynamic volatility
  e.g. petrol price, demand for small vs large cars, increased popularity of 4WD etc.
Perceived value of colour, engine, size etc. relatively constant once base trend removed
17
Forecasting Base Trend
Two-stage approach
  1. Decision Tree: non-linearity, interactions
  2. Regression
Required transparent, continuous results
  Justify predictions to Treasury
Crude form of cointegration used
  large vehicle sales dependent on small and 4WD sales at previous lags
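The lagged dependence mentioned above can be illustrated with a single-predictor regression of one series on an earlier value of another. This is only a sketch of the lagging idea: it is not the model used in the project, it uses one predictor where the slides mention both small and 4WD sales, and it is not a formal cointegration test. All series values are made up.

```python
def lagged_pairs(predictor, response, lag):
    """Pair response[t] with predictor[t - lag]; requires lag >= 1."""
    return list(zip(predictor[:-lag], response[lag:]))


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical index series. `large` was constructed to follow `small`
# two periods later, so the fit should recover that relationship.
small = [100, 104, 99, 107, 111, 105, 113, 118]
large = [98.0, 101.0, 100.0, 102.0, 99.5, 103.5, 105.5, 102.5]

lag = 2
intercept, slope = fit_line(lagged_pairs(small, large, lag))

# The last `lag` observations of the predictor give forecasts for the
# next `lag` periods of the response.
forecasts = [intercept + slope * x for x in small[-lag:]]
```

A regression of this form stays transparent: each forecast can be traced back to an observed earlier value and two fitted coefficients.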
18
Predicting Base Trend (Utes)
Forecasting 2 Years in Advance (chart legend: Observed, Predicted)
19
Predicting Cohort Value
With all available information
(2 year forecast)
With all information known at start of lease
(2 year forecast)
20
Information Loss
[Chart: R-Sq (0% to 100%) by information level: 1. Base, 2. General, 3. Specific, 4. Lease, 5. Client, 6. Condition, 7. Market]
R-Sq for Observed and Predicted Residual Values for Cohorts sold within 60 day hold-out sample period
21
Project Outcome
Improved prediction capability by 250%
Predicted results within 2% of actual result
Savings of $6,000,000 per year
Automated tool created:
  Saving time
  Results auditable
Results reported in Sydney Morning Herald (29/08/06)
22