
Steve Morlidge

Managing forecast performance

or My adventures in Errorland.

Steve Morlidge

Unilever (1978–2006), roles include:
§ Controller, Unilever Foods UK ($1 billion turnover)
§ Leader, Dynamic Performance Management change project (part of Unilever's Finance Academy), 2002–2006

Outside Unilever:
§ Chairman of the BBRT, 2001–2006
§ BBRT Associate, 2007 to present
§ Founder/director, Satori Partners Ltd., 2006
§ Ph.D., Hull University (Management Cybernetics), 2005
§ Visiting Fellow, Cranfield University, 2007
§ Coauthored book Future Ready: How to Master Business Forecasting, 2010
§ Editorial Board, Foresight magazine, 2010
§ Founder, CatchBull (forecasting performance management software), 2011

My perspective

1.  Demand forecasting in the supply chain, characterized by: volume / volatility / velocity / variety / very short term
2.  Forecasting is a business process that should add value
3.  Forecasting should be theoretically robust but evidence driven
4.  Any insight has to be operationalized before it adds value
5.  I am not 'properly qualified' so I can ask the dumb questions
6.  Privileged access to data and insights

Some dumb questions

1.  Is it possible to measure forecastability (i.e. assess forecast quality)?
2.  Can we measure whether forecasting adds value?
3.  Does forecasting add value?
4.  How do we measure intermittent demand forecast quality?
5.  How do we value the value that forecasting adds (part 1)?
6.  Can forecast quality be 'managed'?

The Theory: Forecastability and Adding Value

Why forecast? 101 (for the Supply Chain)

Replenishment based on consumption (Kanban): changes in stock levels follow the pattern of demand, so to avoid stock-outs we need safety stock based on the standard deviation of demand.

Replenishment based on forecast: errors from a good forecast (average demand) will be lower, leading to less safety stock, as it is based on the (lower) standard deviation of error.

Replenishment based on consumption is equivalent to using the prior period's actual as a forecast, so the upper bound of forecast error should be the naïve forecast error - any higher and forecasting adds no value. Comparing errors to those from a naïve forecast also allows for forecastability, because items with volatile demand (high levels of noise) are more difficult to forecast well.
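To make the stock argument concrete, here is a minimal simulation sketch (not from the slides: the seasonal demand series, noise level and 95% service factor are illustrative assumptions) comparing the buffer implied by the standard deviation of demand with the buffer implied by the standard deviation of forecast error:

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(104)

# Illustrative demand: a seasonal signal plus noise (all numbers assumed)
signal = 100 + 30 * np.sin(2 * np.pi * weeks / 52)
demand = signal + rng.normal(0, 10, size=weeks.size)

# Kanban-style replenishment tracks raw consumption: buffer on sigma(demand)
sigma_demand = demand.std(ddof=1)

# A good forecast captures the signal, leaving only noise as error:
# buffer on the (lower) sigma(error)
sigma_error = (demand - signal).std(ddof=1)

z = 1.645  # ~95% cycle service level factor (assumed)
print(f"safety stock ~ z * sigma(demand): {z * sigma_demand:5.1f}")
print(f"safety stock ~ z * sigma(error):  {z * sigma_error:5.1f}")
```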

Quality: a practical definition

At the decision-making level (e.g. the low-level stock replenishment point) forecasts should be:
•  Better than simple replenishment (upper bound of forecast error = naïve forecast error)
•  As close as possible to the minimum avoidable error (lower bound of error = ?)
•  Produced at affordable cost (generating more value than the process consumes)
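The first bound can be tracked with the Relative Absolute Error used throughout this deck; a minimal sketch (the function name is mine, not the deck's):

```python
import numpy as np

def rae(actual, forecast):
    """Relative Absolute Error: total absolute forecast error divided by the
    total absolute error of the naive forecast (prior period's actual).
    RAE < 1 beats simple replenishment; ~0.5 is near the practical floor."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ae = np.abs(actual[1:] - forecast[1:]).sum()
    nae = np.abs(actual[1:] - actual[:-1]).sum()
    return ae / nae

# Example: a flat forecast of 10 against mildly noisy demand
print(rae([8, 12, 9, 11, 10, 12], [10] * 6))  # 0.5
```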

Lower Limit of error (Issue 30)

[Chart: a simulated series made of Signal (totally forecastable) plus Noise (totally unforecastable), showing the range of naïve forecast errors (including noise) against the range of actual (perfect) forecast errors (excluding noise).]

The lower bound of forecast error is also related to the naïve forecast, expressed as Relative Absolute Error (RAE). But how?

•  Flat signal: minimum RAE = √0.5 ≈ 0.71
•  Changing signal: minimum RAE < 0.71

Goodwin: Foresight, Summer 2013
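A one-line sketch of where √0.5 comes from, assuming the actual is a flat signal plus i.i.d. Gaussian noise $n_t$ with variance $\sigma^2$: a perfect forecast leaves only $n_t$ as error, while the naïve forecast's error is $n_t - n_{t-1}$, whose variance is $2\sigma^2$. Hence

$$\mathrm{RAE}_{\min} = \frac{\mathbb{E}|n_t|}{\mathbb{E}|n_t - n_{t-1}|} = \frac{\sigma\sqrt{2/\pi}}{\sqrt{2}\,\sigma\sqrt{2/\pi}} = \frac{1}{\sqrt{2}} = \sqrt{0.5} \approx 0.71$$

A changing signal inflates the naïve error further, which is why the minimum RAE can fall below 0.71.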

New thinking: new measures

[Diagram: the forecast error scale, from Zero Error to Total Error, expressed as Relative Absolute Error (RAE).]

•  RAE 1.0 = break-even: simple replenishment (the naïve forecast). Above 1.0 value is destroyed; below 1.0 value is added. Value is added or destroyed at stock-holding level (item/location).
•  RAE 0.7 (good) = theoretical limit of forecastability (no trend).
•  RAE 0.5 (best) = practical limit of forecastability (any trend) - how low can you go? Below this lies unavoidable error.
•  Increasing volatility = increasing difficulty of forecasting.

The Evidence: Forecastability and Adding Value

The evidence (Issues 32, 33)

9 samples from 8 businesses - 330,000 data points

Sample      Median RAE   Wtd Av RAE   Median MAPE   Forecast Accuracy
Sample 1       0.94         0.89          56%            49%
Sample 2       1.15         1.04          34%            77%
Sample 3       0.97         0.81          89%            34%
Sample 4       1.00         1.53          56%            35%
Sample 5       0.99         1.14          56%            45%
Sample 6       1.06         1.89          42%             8%
Sample 7       0.94         0.99          10%            35%
Sample 8       1.05         0.87         105%            53%
Sample 9       1.10         0.99         110%            51%
Mean           1.02         1.13          62%            43%

Wtd Av RAE excluding outliers: 0.96 - very little value added (4%). The traditional measures (MAPE, forecast accuracy) are unhelpful.

M3 Competition

Rank   Method            RAE (wtd)   Type
1      ForecastPro       0.67        Expert
2      B-J automatic     0.68        ARIMA
3      Dampen            0.70        Trend
4      Comb S-H-D        0.71        Trend
5      Winter            0.71        Trend
6      ForecastX         0.72        Expert
7      Holt              0.72        Trend
8      Theta             0.73        Decomposition
9      Single            0.73        Simple
10     ARARMA            0.74        ARIMA
11     AAM1              0.74        ARIMA
12     PP-Autocast       0.76        Trend
13     Autobox1          0.76        ARIMA
14     Autobox3          0.78        ARIMA
15     Naïve-2           0.79        Simple
16     Autobox2          0.79        ARIMA
17     Flores/Pearce 1   0.80        Expert
18     AutomatANN        0.81        Expert
19     AAM2              0.82        ARIMA
20     Robust-Trend      0.86        Trend
21     Theta-sm          0.88        Trend
22     RBF               0.88        Expert
23     Flores/Pearce 2   0.95        Expert
24     SmartFcs          0.96        Expert

Average RAE: 0.78

The distribution of RAE (Research 2013: 9 samples from 8 businesses - 330,000 data points)

Share of items by RAE band:

Sample     <0.5   0.5-0.7   0.7-1.0   >1.0
Sample 1    0%      6%        52%      42%
Sample 2    1%      5%        33%      62%
Sample 3    8%     12%        33%      47%
Sample 4   13%     11%        27%      49%
Sample 5    1%      9%        42%      48%
Sample 6    7%     10%        27%      56%
Sample 7    4%     11%        44%      40%
Sample 8    6%      2%        35%      57%
Sample 9    2%      3%        31%      64%
All         5%      8%        36%      52%

Few forecasts can beat an RAE of 0.5 - a natural limit? Half of all forecasts (RAE > 1.0) are destroying value: significant levels of value destruction. (M3 Competition forecasts with RAE > 1.0 are shown for comparison.)

What "good" looks like:
•  0.5 - about as good as it gets
•  0.7 - good

Operationalization: Examples from ForecastQT (Forecastability / Adding Value)

Key Concepts: Forecast Value Added

[Diagram: the forecast error scale from Zero Error to 100% Error. Simple replenishment (the naïve forecast) sits at RAE 1.0; the limit of forecastability (unavoidable error) sits at RAE 0.5; the Cost of Avoidable Error spans the gap. Value is destroyed above RAE 1.0 and added below it.]

Value Added Score (VAS) bands:
•  VAS -100 to 0 = 'Unacceptable' (RAE above 1.0)
•  VAS 0 to 30 = 'Acceptable' (RAE 1.0 down to 0.85)
•  VAS 30 to 60 = 'Good' (RAE 0.85 down to 0.7)
•  VAS 60 to 100 = 'Excellent' (RAE 0.7 down to 0.5)
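One way to operationalize the scale - a sketch assuming VAS interpolates linearly between the RAE anchor points shown above (1.0 → 0, 0.85 → 30, 0.7 → 60, 0.5 → 100); the actual ForecastQT formula is not given in the deck:

```python
import numpy as np

# Anchor points read off the slide: RAE -> VAS (piecewise-linear is an assumption)
RAE_PTS = [0.5, 0.7, 0.85, 1.0]
VAS_PTS = [100.0, 60.0, 30.0, 0.0]

def value_added_score(rae):
    """Map an RAE onto a Value Added Score; RAE above 1.0 scores negative."""
    if rae >= 1.0:
        # extend the last segment below break-even, floored at -100
        return max(-100.0, (1.0 - rae) / 0.15 * 30.0)
    return float(np.interp(rae, RAE_PTS, VAS_PTS))

for r in (0.5, 0.7, 0.85, 1.0, 1.25):
    print(f"RAE {r:4.2f} -> VAS {value_added_score(r):6.1f}")
```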

DRILL: Stock Holding Level - Performance Tracking (European Packaging Manufacturer)

•  Apart from the first few months, performance is excellent but is now falling away.
•  The performance of the benchmark method (SES) is shown as a grey line.
•  League tables reflect different levels of demand variability (Italy 41%, Sweden 19%).

Portfolio Management

•  UK Spirits Business: RAE 0.85, 49% value destructive
•  UK Wine Business: RAE 0.90, 53% value destructive
•  European Tobacco Business: RAE 0.80, 25% value destructive
•  UK Beverages Business: RAE 0.65, 25% value destructive

Segmentation analysis

[2×2 matrix: increasing volatility (COV) along one axis, increasing size (units/cost) along the other; a classification sketch follows below.]

•  HORSES - high volume, low variability: optimise the forecast algorithm and restrict judgemental input. Should be easy to beat naïve. Aim for green.
•  MAD BULLS - high volume, high variability: forecasting involves judgement in addition to algorithms. Difficult, but possible to excel. Aim for blue.
•  DONKEYS - low volume, low variability: use simple/cheap approaches with no judgemental intervention. Consider using naïve/kanban. Aim for amber.
•  JACK RABBITS - low volume, high variability: use simple/cheap approaches with no judgemental intervention. Consider Make To Order. Aim for amber.
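A minimal classification sketch (the volume and COV cut-offs are illustrative assumptions; the deck does not specify thresholds):

```python
def segment(volume, cov, volume_cutoff=1000.0, cov_cutoff=0.5):
    """Place an item in the 2x2 size/volatility matrix (thresholds assumed)."""
    big, volatile = volume >= volume_cutoff, cov >= cov_cutoff
    if big and not volatile:
        return "HORSES: tune the algorithm, restrict judgement, beat naive"
    if big and volatile:
        return "MAD BULLS: algorithms plus judgement; hard, but can excel"
    if not big and not volatile:
        return "DONKEYS: simple/cheap methods; consider naive/kanban"
    return "JACK RABBITS: simple/cheap methods; consider make-to-order"

print(segment(volume=5000, cov=0.2))  # HORSES
print(segment(volume=200, cov=0.9))   # JACK RABBITS
```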

Segmentation Example (Issue 34)

The scatter plot is not aligned with the performance aspirations. High volume, low variability is often where the value is destroyed (and where MAPE looks 'best'!), so consider using Kanban.

The problem with measures: Intermittent Demand and Valuing Adding Value

Remember this? (Issue 37)

            Period 1   Period 2   Period 3   Total
Actual         0          10         0         10
Forecast       0           0         0          0
AE             0          10         0         10
NAE            0          10        10         20
RAE = 10 / 20 = 0.5

What is causing this?
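The arithmetic is easy to reproduce; a sketch (naïve error taken as the period-on-period change, as in the table):

```python
import numpy as np

actual = np.array([0.0, 10.0, 0.0])
forecast = np.zeros_like(actual)  # a "forecast" of zero every period

ae = np.abs(actual - forecast).sum()   # 10
nae = np.abs(np.diff(actual)).sum()    # |10-0| + |0-10| = 20
print(ae / nae)                        # 0.5: the zero forecast looks "good"
```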

Intermittent Demand (1)

[Data series chart]

Q1: What is the best forecast for this data series?
Q2: What is the mean absolute error for a forecast of 3 per period?
Q3: What is the mean absolute error for a forecast of zero per period?

Intermittent Demand (2)

[Data series chart]

Q1: What is the mean absolute error for a forecast of 4 per period - and of zero?
Q2: What is the mean absolute error for a forecast of 5 per period?

Intermittent Demand (3)

Q1: Why is the pattern of errors so (apparently) inconsistent?
Q2: How much confidence do you now have in your metrics?

Asymmetric Demand

•  Absolute error metrics optimize on the median, not the mean.
•  As a result, whenever the distribution of data is asymmetric, absolute error:
   -  is an unreliable guide to forecast performance
   -  cannot be used for selecting algorithms

[Chart: MAE for forecasts with different levels of bias (the loss function); see the sketch below.]
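A quick demonstration of the median point, with an assumed intermittent series: the forecast that minimises absolute error is the median (here zero), while minimising squared error picks the mean.

```python
import numpy as np

demand = np.array([0, 0, 0, 10, 0, 0, 8, 0], dtype=float)  # assumed series
candidates = np.linspace(0, 10, 201)

mae = [np.abs(demand - f).mean() for f in candidates]
mse = [((demand - f) ** 2).mean() for f in candidates]

print("MAE-optimal forecast:", candidates[np.argmin(mae)])  # 0.0, the median
print("MSE-optimal forecast:", candidates[np.argmin(mse)])  # 2.25, the mean
print("median:", np.median(demand), "mean:", demand.mean())
```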

Criteria for an alternative metric

1.  Optimizes on the mean
2.  Easy to calculate, aggregate and interpret
3.  Can be used for symmetric demand as well
4.  Reflects the business impact of forecast error

Candidates

Metric                   Optimum   Simple   Symmetric   Business Impact
Squared Error Measures   Mean      No       Yes         Partly
Direct Measures          Mean?     No       Yes?        Yes
Mean Based Measures      Mean      Yes      No          No

Mean Based Measures (e.g. forecast vs mean actual, cumulative forecast error, measures of bias): the loss function optimizes on the mean and is linear.

Mean Based Measures

1.  Don't reflect all the qualities of a forecast
2.  OK for algorithm selection, but…
3.  Can't be used for measuring non-intermittent demand forecast quality

Metric                Optimum   Simple   Symmetric   Business Impact
Mean Based Measures   Mean      Yes      No          No

[Chart: Forecasts A-D compared.]

Bias Adjusted Error Metric: (abs) Bias + average (abs) Variation

Zero forecast:
                 Period 1   Period 2   Period 3   Period 4   Period 5   Mean (abs)
Actual              0.0        5.0        0.0       10.0        0.0        3.0
Forecast            0.0        0.0        0.0        0.0        0.0
Error vs Mean      -3.0       -3.0       -3.0       -3.0       -3.0        3.0
Variation           3.0        8.0        3.0       13.0        3.0        6.0
Bias Adjusted                                                              9.0

Forecast of 3 per period:
                 Period 1   Period 2   Period 3   Period 4   Period 5   Mean (abs)
Forecast            3.0        3.0        3.0        3.0        3.0
Error vs Mean       0.0        0.0        0.0        0.0        0.0        0.0
Variation           3.0        2.0        3.0        7.0        3.0        3.6
Bias Adjusted                                                              3.6

and… RAE 0.5 → bias adjusted 1.3
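A sketch that reproduces the two worked tables. The convention is reverse-engineered from the slide's rows (bias taken as mean forecast minus mean actual; variation as the absolute gap between each period's error and that bias), so treat it as an assumption rather than the exact CatchBull definition:

```python
import numpy as np

def bamae(actual, forecast):
    """Bias Adjusted MAE (sketch): |bias| + mean |(actual - forecast) - bias|,
    with bias = mean(forecast) - mean(actual)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    bias = f.mean() - a.mean()
    variation = np.abs((a - f) - bias)
    return abs(bias) + variation.mean()

actual = [0.0, 5.0, 0.0, 10.0, 0.0]
print(bamae(actual, [0.0] * 5))  # 9.0 - the zero forecast is heavily penalised
print(bamae(actual, [3.0] * 5))  # 3.6 - forecasting the mean level does better
```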

Bias Adjusted Error

1.  Reflects the two key qualities of a forecast:
    •  how well it captures the LEVEL of demand
    •  how well it captures the PATTERN of demand
2.  OK for algorithm selection, but…
3.  Can it be used for measuring non-intermittent demand forecast quality?

[Chart: four unbiased forecasts, A-D, of the same series, each with Bias = 0.0 and with Variation values of 3.6, 6.0, 2.4 and 3.6; RAE 0.6 annotated.]

Bias Adjusted Error

•  Bias adjusted absolute error optimizes on the mean.
•  As the shape of the loss function is linear, it is easy to interpret.

Criteria for an alternative metric

1.  Optimizes on the mean ☑
2.  Easy to calculate, aggregate and interpret ☑
3.  Can be used for symmetric demand as well ☐
4.  Reflects the business impact of forecast error ☐

Forecasting 101 - The Ideal

[Chart: weeks W1-W5, replenishment based on forecast, with no (cycle) stock at period end.]

Forecasting 101 - The Reality

[Chart: months 1-5, forecast vs actual.]
•  Months 3 & 5 = over-forecast (bias) → too much cycle stock
•  Months 2 & 4 = under-forecast (bias) → too little cycle stock
•  Months 2 & 3 = low variation → less safety stock needed to meet the service target
•  Months 4 & 5 = high variation → more safety stock needed to meet the service target

1.  The level of excess/shortfall in cycle stock is directly proportional to the level of bias.
2.  The level of safety stock is directly proportional to the level of variation (for a given target service level).
3.  The impacts on the business of bias (level) and variation (pattern) are independent of each other (see the sketch below).
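In textbook terms (a standard approximation, not taken from the deck): with forecast bias $b$, error standard deviation $\sigma_e$, lead time $L$ periods and service factor $z$,

$$\text{excess cycle stock per period} \approx b, \qquad \text{safety stock} \approx z\,\sigma_e\sqrt{L},$$

so bias feeds cycle stock and variation feeds safety stock, each independently of the other.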

Bias and traditional metrics (Issue 38)

Series A:     Period 1   Period 2   Period 3   Period 4   Period 5   Mean
Actual           2          2          2          2          2       2.0

Q1: What is the mean absolute error for a forecast of 2 per period?  = 0.0
Q2: What is the mean absolute error for a forecast of 3 per period?  = 1.0

Series B:     Period 1   Period 2   Period 3   Period 4   Period 5   Mean
Actual           1          2          4          2          1       2.0

Q3: What is the mean absolute error for a forecast of 2 per period?  = 0.8
Q4: What is the mean absolute error for a forecast of 3 per period?  = 1.4
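Checking the answers makes the point: a full unit of bias raises MAE by 1.0 on the stable series but by only 0.6 on the variable one, so MAE understates bias whenever variation is present. A sketch:

```python
import numpy as np

def mae(actual, forecast):
    return np.abs(np.asarray(actual, dtype=float) - forecast).mean()

stable   = [2, 2, 2, 2, 2]   # series A
variable = [1, 2, 4, 2, 1]   # series B - same mean of 2.0

print(mae(stable, 2), mae(stable, 3))      # 0.0 -> 1.0: bias fully visible
print(mae(variable, 2), mae(variable, 3))  # 0.8 -> 1.4: one unit of bias adds only 0.6
```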

Bias Adjusted Measures

[Chart: error decomposed into Bias and Variation.]

1.  The increase in error is directly proportional to the increase in bias.
2.  The impacts of bias and variation (level and pattern) are independent.
3.  BAMAE provides a consistent measure of the level of bias (and its impact on cycle stock) and of variation (and its impact on safety stock), irrespective of the distribution of demand.

Traditional Measures

Traditional error metrics underestimate the impact of bias, especially when the level of bias is small compared to the level of variation, and the outcome is sensitive to the distribution of demand. As a result, changes in the size of the error do not reflect the impact on stock or on the business. The loss function for MAE is non-linear.

Example: UK FMCG Business

Candidates

Metric                   Optimum   Simple   Symmetric   Business Impact
Squared Error Measures   Mean      No       Yes         Partly
Direct Measures          Mean?     No       Yes?        Yes?
Mean Based Measures      Mean      Yes      No          No
Bias Adjusted Error      Mean      Yes      Yes         Yes

Operationalising BAMAE and RAE: Managing Forecast Quality

How can this be operationalized?

•  BAMAE = Bias Adjusted Mean Absolute Error. Value Added splits into the contribution of bias (over-forecast = excess stock; under-forecast = stock-outs) and the contribution of variation.
•  RAE = Relative Absolute Error.
•  BARAE = BAMAE / BAMAnE (the bias adjusted error of the naïve forecast): a robust measure of forecast performance compared to simple replenishment (Value Added). A sketch follows below.
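A sketch of the ratio, reusing the reverse-engineered bamae() from above and reading BAMAnE as the same bias-adjusted error applied to the naïve forecast (an assumption):

```python
import numpy as np

def bamae(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    bias = f.mean() - a.mean()
    return abs(bias) + np.abs((a - f) - bias).mean()

def barae(actual, forecast):
    """Bias Adjusted Relative Absolute Error: BAMAE of the forecast over
    BAMAnE, the same measure for the naive (prior-period actual) forecast.
    Below 1 the forecast adds value over simple replenishment."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return bamae(a[1:], f[1:]) / bamae(a[1:], a[:-1])

actual = [0.0, 5.0, 0.0, 10.0, 0.0]
print(barae(actual, [0.0] * 5))  # > 1: the zero forecast destroys value
```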

Managing Forecast Performance

Forecasting System(s) - SAP/APO, ForecastPro, SAS, Excel, etc.
•  INPUT: history; forecasters' judgement
•  OUTPUT: forecast
•  HIGH COMPLEXITY: 1000's of forecasts, running in parallel, at multiple levels, across many dimensions, with seasonality/intermittency
•  JUDGEMENTAL INPUT: adjusting history; building forecast models; changing and tuning models; adjusting/overriding results
•  USED TO MAKE DECISIONS: replenishment

Analytical System - performance data (item-level errors)
•  ERROR ANALYSIS: Forecast Value Added; allowing for forecastability; statistical alarms (signals from noise, bias from variation); continuously updated

Management
•  ACTION: compare and rank; assess judgemental impact; optimise the portfolio; real-time issue resolution

ForecastQT Home Page

•  Clear summary of three key indicators: Value Added Score (VAS), Bias and Variation
•  Supports drill-down into SKU-level detail
•  Easy-to-use tool with an intuitive graphical interface, based entirely in a web browser

Screens include:
•  Drill-down by product (sorted by size of bias)
•  Value Added (compared to 'Kanban' replenishment) and the contributions of bias and variation
•  Variation: demand vs error volatility (the PATTERN of errors, reflected in SAFETY STOCK)
•  Bias: in aggregate and in detail (the LEVEL of errors, reflected in CYCLE STOCK)

Issue Management Example - UK FMCG, <£1b revenue

Bias at the aggregate level (blue) is c. 0%, but bias at low level (grey) is 15%, generating both over- and under-forecasting bias alarms.

Valuing Value Added (2): the cost of error

Forecast error decomposes into:
•  Avoidable bias: over-forecasting → excess stock (excess stock write-offs, excess warehousing costs, excess financing costs); under-forecasting → avoidable lost sales and expediting costs
•  Avoidable variation
•  Unavoidable variation

Stock impact of Avoidable Forecast Error (SAFE©): <0.25% of revenue
Total Cost of Avoidable Forecast Error (CAFE©): <2% of revenue

What is this worth?

                         Cost of Sales   Per $1b revenue¹
Total Cost of Error      4%-6%           $20m-$30m
Forecast Value Added     0%-2%           $0m-$10m
Avoidable Error          1%-3%           $5m-$15m
Avoidable Inventory²                     $1m-$5m

¹ Assuming 50% Gross Margin   ² Assuming 10 weeks stock cover
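To see how the columns line up: at a 50% gross margin, $1b of revenue implies roughly $500m of cost of sales, so 4%-6% of cost of sales is $20m-$30m and 1%-3% is $5m-$15m - a rough check on the slide's own assumptions.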

Contact details

steve.morlidge@catchbull.com
www.catchbull.com

Thank you
