Steve Morlidge
Managing forecast performance
or My adventures in Errorland.
Steve Morlidge
Unilever (1978–2006), roles include:
§ Controller, Unilever Foods UK ($1 billion turnover)
§ Leader, Dynamic Performance Management change project (part of Unilever's Finance Academy), 2002–2006

Outside Unilever:
§ Chairman of the BBRT, 2001–2006
§ BBRT Associate, 2007 to present
§ Founder/director, Satori Partners Ltd., 2006
§ Ph.D., Hull University (Management Cybernetics), 2005
§ Visiting Fellow, Cranfield University, 2007
§ Coauthored book Future Ready: How to Master Business Forecasting, 2010
§ Editorial Board, Foresight magazine, 2010
§ Founder, CatchBull (forecasting performance management software), 2011
My perspective
1. Demand forecasting in the supply chain, characterized by: volume / volatility / velocity / variety / very short term
2. Forecasting is a business process that should add value
3. Forecasting should be theoretically robust but evidence driven
4. Any insight has to be operationalized before it adds value
5. I am not 'properly qualified' so I can ask the dumb questions
6. Privileged access to data and insights
Some dumb questions
1. Is it possible to measure forecastability (i.e. assess forecast quality)?
2. Can we measure whether forecasting adds value?
3. Does forecasting add value?
4. How do we measure intermittent demand forecast quality?
5. How do we value the value that forecasting adds (part 1)?
6. Can forecast quality be 'managed'?
The Theory
Forecastability and Adding Value
Why forecast? 101 (for the Supply Chain)
Replenishment based on consumption (Kanban)
Replenishment based on forecast
Errors from a good forecast (average demand) will be lower…

Changes in stock levels follow the pattern of demand…

…so to avoid stock outs we need safety stock based on the standard deviation of demand

…leading to less safety stock, as it is based on the (lower) standard deviation of error
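The safety-stock argument above can be sketched numerically. This is an illustrative example, not from the deck: the demand and forecast numbers are invented, and z = 1.65 assumes roughly a 95% cycle service level with normally distributed errors.

```python
import statistics

# Hypothetical weekly demand and a forecast for each week (illustration only).
demand   = [100, 130, 90, 150, 80, 120, 110, 140]
forecast = [105, 125, 95, 140, 90, 115, 112, 135]
errors   = [d - f for d, f in zip(demand, forecast)]

z = 1.65  # service factor for roughly a 95% cycle service level (assumption)

# Consumption-based (kanban) replenishment must buffer against demand variability...
ss_demand = z * statistics.stdev(demand)
# ...while forecast-based replenishment only buffers against error variability.
ss_error = z * statistics.stdev(errors)

print(round(ss_demand, 1), round(ss_error, 1))
```

As long as the forecast beats the naïve benchmark, the error standard deviation is smaller than the demand standard deviation, so the forecast-based buffer is smaller.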
Replenishment based on consumption is equivalent to using the prior period's actual as a forecast, so…

the upper bound of forecast error should be the naïve forecast error – any higher and forecasting adds no value…

…and comparing errors to those from a naïve forecast also allows for forecastability, because items with volatile demand (high levels of noise) are more difficult to forecast well
Quality: a practical definition

At the decision-making level (e.g. low level stock replenishment point) forecasts should be:
• Better than simple replenishment (upper bound of forecast error = naïve forecast error)
• As close as possible to minimum avoidable error (lower bound of error = ?)
• Produced at affordable cost (generate more value than the process consumes)
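The comparison against simple replenishment can be sketched in a few lines: the deck's Relative Absolute Error (RAE) is the forecast's absolute error divided by the naïve (prior-period-actual) forecast's absolute error. The series below is hypothetical, purely for illustration.

```python
def rae(actuals, forecasts):
    """Relative Absolute Error: forecast error divided by naive-forecast error.
    RAE < 1.0 means the forecast beats simple replenishment; > 1.0 destroys value."""
    # Compare over periods 2..n, since the naive forecast for period t
    # is the actual of period t-1.
    fc_err = sum(abs(a - f) for a, f in zip(actuals[1:], forecasts[1:]))
    nv_err = sum(abs(a - p) for a, p in zip(actuals[1:], actuals[:-1]))
    return fc_err / nv_err

# Hypothetical demand series and forecast:
actuals   = [100, 120, 90, 140, 110]
forecasts = [110, 110, 100, 125, 115]
print(round(rae(actuals, forecasts), 2))  # well below 1.0: adding value
```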
Lower Limit of error (Issue 30)
[Chart: a signal (totally forecastable) plus noise. The range of naïve forecast errors includes the noise; the range of actual (perfect) forecast errors excludes it. Flat signal: min RAE = √0.5 = 0.71; changing signal: min RAE < 0.71]
Goodwin: Foresight Summer 2013
The lower bound of forecast error is also related to the naïve forecast…expressed as Relative Absolute Error (RAE)…but how?
Noise = totally unforecastable
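Goodwin's √0.5 lower bound can be checked with a quick simulation (illustrative, not from the deck): when the signal is flat and the series is pure i.i.d. noise, even the perfect forecast (the flat signal itself) scores an RAE of about 0.71, because each naïve error is the difference of two independent noise terms.

```python
import random

random.seed(42)
n = 200_000
level = 100.0  # flat signal
actuals = [level + random.gauss(0, 10) for _ in range(n)]

# Perfect forecast = the flat signal itself; naive forecast = prior actual.
perfect_mae = sum(abs(a - level) for a in actuals[1:]) / (n - 1)
naive_mae = sum(abs(a - p) for a, p in zip(actuals[1:], actuals[:-1])) / (n - 1)

print(round(perfect_mae / naive_mae, 3))  # close to sqrt(0.5) ≈ 0.707
```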
New thinking: new measures
The RAE (Relative Absolute Error) scale runs from Zero Error to Total Error:
• 0.0 = zero forecast error
• 0.5 (best) = practical limit of forecastability (any trend) – unavoidable error? How low can you go?
• 0.7 (good) = theoretical limit of forecastability (no trend)
• 1.0 (break-even) = simple replenishment = naïve forecast
• Below 1.0 the forecast is adding value; above 1.0 it is destroying value
• Value is added or destroyed at the stock holding level (item/location)
• Increasing volatility = increasing difficulty of forecasting
The Evidence
Forecastability Adding Value
The evidence (Issues 32, 33)

9 samples from 8 businesses – 330,000 data points

Sample     | Median RAE | Wtd Av RAE | Median MAPE | Forecast Accuracy
Sample 1   | 0.94       | 0.89       | 56%         | 49%
Sample 2   | 1.15       | 1.04       | 34%         | 77%
Sample 3   | 0.97       | 0.81       | 89%         | 34%
Sample 4   | 1.00       | 1.53       | 56%         | 35%
Sample 5   | 0.99       | 1.14      | 56%         | 45%
Sample 6   | 1.06       | 1.89       | 42%         | 8%
Sample 7   | 0.94       | 0.99       | 10%         | 35%
Sample 8   | 1.05       | 0.87       | 105%        | 53%
Sample 9   | 1.10       | 0.99       | 110%        | 51%
Mean       | 1.02       | 1.13       | 62%         | 43%
Excl Outliers (Wtd Av RAE): 0.96

Very little value added (4%). Traditional measures (MAPE, forecast accuracy) unhelpful.
M3 Competition

Rank | Method          | RAE-wtd | Type
1    | ForecastPro     | 0.67    | Expert
2    | B-J automatic   | 0.68    | ARIMA
3    | Dampen          | 0.70    | Trend
4    | Comb S-H-D      | 0.71    | Trend
5    | Winter          | 0.71    | Trend
6    | ForecastX       | 0.72    | Expert
7    | Holt            | 0.72    | Trend
8    | Theta           | 0.73    | Decomposition
9    | Single          | 0.73    | Simple
10   | ARARMA          | 0.74    | ARIMA
11   | AAM1            | 0.74    | ARIMA
12   | PP-Autocast     | 0.76    | Trend
13   | Autobox1        | 0.76    | ARIMA
14   | Autobox3        | 0.78    | ARIMA
15   | Naïve2          | 0.79    | Simple
16   | Autobox2        | 0.79    | ARIMA
17   | Flores/Pearce 1 | 0.80    | Expert
18   | AutomatANN      | 0.81    | Expert
19   | AAM2            | 0.82    | ARIMA
20   | Robust-Trend    | 0.86    | Trend
21   | Theta-sm        | 0.88    | Trend
22   | RBF             | 0.88    | Expert
23   | Flores/Pearce 2 | 0.95    | Expert
24   | SmartFcs        | 0.96    | Expert
Average                | 0.78    |
Distribution of RAE (one row per sample; final row = all samples)

RAE        | <0.5 | 0.5–0.7 | 0.7–1.0 | >1.0
Sample 1   | 0%   | 6%      | 52%     | 42%
Sample 2   | 1%   | 5%      | 33%     | 62%
Sample 3   | 8%   | 12%     | 33%     | 47%
Sample 4   | 13%  | 11%     | 27%     | 49%
Sample 5   | 1%   | 9%      | 42%     | 48%
Sample 6   | 7%   | 10%     | 27%     | 56%
Sample 7   | 4%   | 11%     | 44%     | 40%
Sample 8   | 6%   | 2%      | 35%     | 57%
Sample 9   | 2%   | 3%      | 31%     | 64%
All        | 5%   | 8%      | 36%     | 52%

Few forecasts can beat an RAE of 0.5… a natural limit?
What "good" looks like

[Chart: the distribution of RAE; M3 Competition forecasts with RAE > 1.0 highlighted]
• 0.5 – about as good as it gets
• 0.7 is good
• Half of all forecasts are destroying value – significant levels of value destruction
Operationalization Examples from ForecastQT
Forecastability Adding Value
Key Concepts: Forecast Value Added

The Relative Absolute Error (RAE) scale, mapped to a Value Added Score (VAS):
• RAE 0.0 = zero error
• RAE 0.5 = unavoidable error: the limit of forecastability
• RAE 1.0 = 100% error: simple replenishment = naïve forecast
• Above RAE 1.0 (destroying value): VAS -100 to 0 = 'Unacceptable'
• RAE 1.0 to 0.85 (adding value): VAS 0–30 = 'Acceptable'
• RAE 0.85 to 0.7: VAS 30–60 = 'Good'
• RAE 0.7 to 0.5: VAS 60–100 = 'Excellent'
The gap between actual RAE and the limit of forecastability is the cost of avoidable error.
Performance Tracking (stock holding level, with drill-down)

• Apart from the first few months, performance is excellent but now falling away
• Performance of the benchmark method (SES) shown as a grey line
• League tables reflect different levels of demand variability (Italy 41%, Sweden 19%)

European Packaging Manufacturer
Portfolio Management

• UK Spirits Business: RAE 0.85, 49% value destructive
• UK Wine Business: RAE 0.90, 53% value destructive
• European Tobacco Business: RAE 0.80, 25% value destructive
• UK Beverages Business: RAE 0.65, 25% value destructive
Segmentation analysis

Axes: increasing volatility (COV) →; increasing size (units/cost) ↑

• HORSES – high volume, low variability. Optimise the forecast algorithm and restrict judgemental input. Should be easy to beat naïve. Aim for green.
• MAD BULLS – high volume, high variability. Forecasting involves judgement in addition to algorithms. Difficult, but possible to excel. Aim for blue.
• DONKEYS – low volume, low variability. Use simple/cheap approaches with no judgemental intervention. Consider using naïve/kanban. Aim for amber.
• JACK RABBITS – low volume, high variability. Use simple/cheap approaches with no judgemental intervention. Consider Make To Order. Aim for amber.
Segmentation Example (Issue 34)

• Scatter plot not aligned with performance aspirations
• High volume, low variability… often where the value is destroyed (and MAPE is 'best'!), so… consider using Kanban
The problem with measures
Intermittent Demand and
Valuing Adding Value
Remember this? (Issue 37)

          Period 1  Period 2  Period 3  Total
Actual        0        10         0       10
Forecast      0         0         0        0
AE            0        10         0       10
NAE           0        10        10       20
RAE                                      0.5
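The table above can be reproduced directly: a flat zero forecast halves the naïve error on this series, scoring an apparently excellent RAE of 0.5 even though it never predicts a single sale.

```python
actuals  = [0, 10, 0]
forecast = [0, 0, 0]

# Forecast absolute error, summed across the three periods.
ae = sum(abs(a - f) for a, f in zip(actuals, forecast))   # 0 + 10 + 0 = 10

# Naive error per the slide: prior actual as forecast, with period 1
# scored as zero error (no prior period available).
naive = [0] + actuals[:-1]
nae = sum(abs(a - p) for a, p in zip(actuals, naive))     # 0 + 10 + 10 = 20

print(ae / nae)  # 0.5
```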
What is causing this?
Intermittent Demand (1)
Q1: What is the best forecast for this data series?
Q2: What is the mean absolute error for a forecast of 3 per period?
Q3: What is the mean absolute error for a forecast of zero per period?
Intermittent Demand (2)
Q1: What is the mean absolute error for a forecast of 4 per period…and zero?
Q2: What is the mean absolute error for a forecast of 5 per period?
Intermittent Demand (3)
Q1: Why is the pattern of errors so (apparently) inconsistent?
Q2: How much confidence do you now have in your metrics?
Asymmetric Demand
• Absolute error metrics optimize on the median, not the mean
• As a result, whenever the distribution of data is asymmetric, absolute error:
  • is an unreliable guide to forecast performance
  • cannot be used for selecting algorithms
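The median-optimization claim can be checked numerically on a made-up intermittent series: the constant forecast with the lowest mean absolute error is the median (here zero), while the lowest mean squared error sits next to the mean.

```python
# Hypothetical intermittent series: mostly zeros, occasional demand.
actuals = [0, 0, 10, 0, 0, 5, 0, 0]  # median 0, mean 1.875

def mae(c):
    return sum(abs(a - c) for a in actuals) / len(actuals)

def mse(c):
    return sum((a - c) ** 2 for a in actuals) / len(actuals)

# Search over constant forecasts 0.0, 0.25, ..., 10.0.
candidates = [c / 4 for c in range(0, 41)]
best_mae = min(candidates, key=mae)  # lands on the median (0.0)
best_mse = min(candidates, key=mse)  # lands next to the mean (1.875)
print(best_mae, best_mse)
```

So an algorithm tuned to minimise absolute error on intermittent demand will drift towards forecasting zero, which is exactly the trap the deck describes.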
MAE for forecasts with different levels of bias (loss function)
Criteria for an alternative metric
1. Optimizes on the mean
2. Easy to calculate, aggregate and interpret
3. Can be used for symmetric demand as well
4. Reflects the business impact of forecast error
Candidates

Metric                 | Optimum | Simple | Symmetric | Business Impact
Squared Error Measures | Mean    | No     | Yes       | Partly
Direct Measures        | Mean?   | No     | Yes?      | Yes
Mean Based Measures    | Mean    | Yes    | No        | No

Mean Based Measures: e.g. forecast vs mean actual
Cumulative Forecast Error: measures of bias
The loss function optimizes on the mean and is linear
Mean Based Measures
1. Don't reflect all qualities of a forecast
2. OK for algorithm selection, but…
3. Can't be used for measuring non-intermittent demand forecast quality

Metric              | Optimum | Simple | Symmetric | Business Impact
Mean Based Measures | Mean    | Yes    | No        | No
Bias Adjusted Error Metric: |bias| + average |variation|

              Period 1  Period 2  Period 3  Period 4  Period 5  Mean (abs)
Actual           0.0       5.0       0.0      10.0       0.0       3.0

Zero Forecast:
Forecast         0.0       0.0       0.0       0.0       0.0
Error vs Mean   -3.0      -3.0      -3.0      -3.0      -3.0       3.0
Variation        3.0       8.0       3.0      13.0       3.0       6.0
Bias Adjusted                                                      9.0

Forecast of 3 per period:
Forecast         3.0       3.0       3.0       3.0       3.0
Error vs Mean    0.0       0.0       0.0       0.0       0.0       0.0
Variation        3.0       2.0       3.0       7.0       3.0       3.6
Bias Adjusted                                                      3.6

and… RAE 0.5 → 1.3 (zero forecast, bias adjusted); RAE 0.6 (forecast of 3)
Bias Adjusted Error
1. Reflects the two key qualities of a forecast: • How well it captures the LEVEL of demand • How well it captures the PATTERN of demand
2. OK for algorithm selection but… 3. Can be it used for measuring non – intermittent
demand forecast quality?
[Charts: Forecasts A–D, each with Bias = 0.0 and Variations of 3.6, 6.0, 2.4 and 3.6]
Bias Adjusted Error
• Bias adjusted absolute error optimizes on the mean
• As the shape of the loss function is linear, it is easy to interpret
Criteria for an alternative metric
1. Optimizes on the mean ✓
2. Easy to calculate, aggregate and interpret ✓
3. Can be used for symmetric demand as well ☐
4. Reflects the business impact of forecast error ☐
Forecasting 101- The Ideal
W1 W2 W3 W4 W5
Replenish based on forecast
No (cycle) stock at period end
Forecasting 101 – The Reality

Months 1–5:
• Months 3 & 5 = over-forecast (bias) → too much cycle stock
• Months 2 & 4 = under-forecast (bias) → too little cycle stock
• Months 2 & 3 = low variation requires less safety stock to meet the service target
• Months 4 & 5 = high variation requires more safety stock to meet the service target

1. The level of excess/shortfall in cycle stock is directly proportional to the level of bias
2. The level of safety stock is directly proportional to the level of variation (for a given target service level)
3. The impacts on the business of bias and variation (level and pattern) are independent of each other
Bias and traditional metrics (Issue 38)

Flat demand:
          Period 1  Period 2  Period 3  Period 4  Period 5  Mean
Actual        2         2         2         2         2      2.0

Q1: What is the mean absolute error for a forecast of 2 per period? = 0.0
Q2: What is the mean absolute error for a forecast of 3 per period? = 1.0

Variable demand (same mean):
          Period 1  Period 2  Period 3  Period 4  Period 5  Mean
Actual        1         2         4         2         1      2.0

Q3: What is the mean absolute error for a forecast of 2 per period? = 0.8
Q4: What is the mean absolute error for a forecast of 3 per period? = 1.4
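The four answers can be checked in a few lines, and they make the point: on the flat series, a unit of bias adds a full unit of error (0.0 → 1.0), but on the variable series with the same mean, the same unit of bias only adds 0.6 (0.8 → 1.4). MAE under-reports bias whenever variation is present.

```python
def mae(actuals, c):
    """Mean absolute error of a constant forecast c."""
    return sum(abs(a - c) for a in actuals) / len(actuals)

flat     = [2, 2, 2, 2, 2]  # mean 2.0
variable = [1, 2, 4, 2, 1]  # mean 2.0, same level but with variation

print(mae(flat, 2), mae(flat, 3))          # 0.0  1.0
print(mae(variable, 2), mae(variable, 3))  # 0.8  1.4
```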
Bias Adjusted Measures

1. The error increase is directly proportional to the increase in bias
2. The impacts of bias and variation (level and pattern) are independent
3. BAMAE provides a consistent measure of the level of bias (and its impact on cycle stock) and variation (impact on safety stock), irrespective of the distribution of demand
Traditional Measures

Traditional error metrics underestimate the impact of bias, especially when the level of bias is small compared to the level of variation, and the outcome is sensitive to the distribution of demand. As a result, changes in the size of error do not reflect the impact on the stock or the business. The loss function for MAE is non-linear.
Example UK FMCG Business
Candidates

Metric                 | Optimum | Simple | Symmetric | Business Impact
Squared Error Measures | Mean    | No     | Yes       | Partly
Direct Measures        | Mean?   | No     | Yes?      | Yes?
Mean Based Measures    | Mean    | Yes    | No        | No
Bias Adjusted Error    | Mean    | Yes    | Yes       | Yes
Operationalising BAMAE and RAE
Managing Forecast Quality
How can this be operationalized? BAMAE = Bias Adjusted Mean Absolute Error
Value Added, split into:
• Contribution of bias: under-forecast = stock outs; over-forecast = excess stock
• Contribution of variation

RAE = Relative Absolute Error
BARAE = BAMAE ÷ BAMAnE (the bias adjusted error of the forecast relative to that of the naïve forecast): a robust measure of forecast performance compared to simple replenishment (Value Added)
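The deck does not spell out a closed formula, so the sketch below assumes one plausible decomposition: bias = mean error, variation = mean absolute deviation of the errors around that bias, BAMAE = |bias| + variation, and BARAE = BAMAE of the forecast over BAMAE of the naïve forecast. Treat the function names and definitions as illustrative assumptions, not CatchBull's published formulas; the deck's own worked example combines bias and variation slightly differently.

```python
def bamae(actuals, forecasts):
    """Bias Adjusted MAE sketch (assumption): |bias| + mean deviation of
    the errors around the bias. Not necessarily the deck's exact formula."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    bias = sum(errors) / len(errors)
    variation = sum(abs(e - bias) for e in errors) / len(errors)
    return abs(bias) + variation

def barae(actuals, forecasts):
    """Bias Adjusted Relative Absolute Error sketch: BAMAE of the forecast
    over BAMAE of the naive (prior-actual) forecast."""
    naive = actuals[:-1]
    return bamae(actuals[1:], forecasts[1:]) / bamae(actuals[1:], naive)

# Hypothetical intermittent series with a flat mean-level forecast:
actuals   = [0, 5, 0, 10, 0, 5]
forecasts = [3, 3, 3, 3, 3, 3]
print(round(barae(actuals, forecasts), 2))  # below 1.0: beats the naive benchmark
```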
Managing Forecast Performance

Forecasting System(s) – SAP/APO, ForecastPro, SAS, Excel, etc.
• INPUT: history; forecasters' judgement
• JUDGEMENTAL INPUT: adjusting history; building forecast models; changing and tuning models; adjusting/overriding results
• HIGH COMPLEXITY: 1000's of forecasts, running in parallel, at multiple levels, across many dimensions, with seasonality/intermittency
• OUTPUT: forecast, used to make decisions (replenishment)

Analytical System – performance data (item level errors)
• ERROR ANALYSIS: forecast value added; allowing for forecastability; statistical alarms (signals from noise, bias from variation); continuously updated

Management
• ACTION: compare and rank; judgemental impact; optimise portfolio; real time issue resolution
ForecastQT Home Page
• Clear summary of three key indicators – Value Added Score (VAS), Bias and Variation
• Supports drill down into SKU level detail
• Easy to use tool with an intuitive graphical interface, based entirely in a web browser

• Drill down by product (sorted by size of bias)
• Value Added (compared to 'Kanban' replenishment) and the contribution of bias and variation
• Variation… demand vs error volatility (PATTERN OF ERRORS, reflected in SAFETY STOCK)
• Bias… in aggregate and in detail (LEVEL OF ERRORS, reflected in CYCLE STOCK)
Issue Management Example – UK FMCG: <£1b revenue
• High level bias (blue) c. 0%, but low level and high level bias (grey) = 15%…
• …generating both over- and under-forecasting bias alarms
Valuing Value Added (2): the cost of error

Forecast error splits into:
• Unavoidable variation
• Avoidable variation
• Avoidable bias

Over-forecasting → excess stock → excess stock write-off, excess warehousing costs, excess financing costs
Under-forecasting → avoidable lost sales, expediting costs

Stock impact of Avoidable Forecast Error (SAFE©): <0.25% of revenue
Total Cost of Avoidable Forecast Error (CAFE©): <2% of revenue
What is this worth?

                       Cost of Sales   Per $1b revenue¹
Total Cost of Error       4%–6%          $20m–$30m
Forecast Value Added      0%–2%          $0m–$10m
Avoidable Error           1%–3%          $5m–$15m
Avoidable Inventory²                     $1m–$5m

¹ Assuming 50% Gross Margin
² Assuming 10 weeks stock cover