Risk Management and Reliable Forecasting Using Un-Reliable Data
First Presented at Lean Kanban Central Europe, Hamburg. November 2014
Troy Magennis Twitter: @t_magennis
Get Slides: http://bitly.com/1E9Hh8l
Don’t Follow the Light
Question Current Approaches to…
• Estimation
• Forecasting
• Risk
Sources of Forecast Risk
• Work
• Throughput
• Dependencies
• People
People
• People are biased – intentionally and/or unintentionally
• In order to forecast and manage risk:
– We need good expert opinions
– We need to confirm these opinions against reality
– We need to learn from our forecast errors
• Often we get opinions based on a fractional understanding of the problem eventually solved
Not Getting Data (At All, or Early Enough)
Getting Reliable Data from People
• Why would people take the time?
– We tell them (rarely works as intended)
– We politely ask them (works sometimes)
– We make it part of their self-interest (most likely)
• Gamification
• Challenge their view on fairness
• NEVER: Embarrass a team or individual
– You will totally destroy reliable data capture…
Strategy 1 – “Gamify” Presentation
Interactive charts get attention; use vibrant colors for teams with good data.
[Diagram: Teams, Strategies, Features]
Teams don't like being "Red" (default to red; teams will make them green).
Coloring teams dull grey when their data capture quality is poor often gets action.
Make it sexy. Show how "my" metric connects to strategy.
Strategy 2 – Visibility to Decisions
• Operations Reviews! Giving meaning to data!
• Make it clear when data has led to decisions
– "Based on the data and analysis presented, this is clearly an opportunity we will pursue."
– "Let's track the first month's actuals against the model and fully invest if it is tracking well."
• Make it clear when more data would have "won"
– "If I could clearly see the impact of giving you those extra team members, this would be easy."
• Promote lively debate around data
– React quickly if data presented is gamed, or if teams repeatedly fail against THEIR models
Strategy 3 – Perceived Fairness
• One team gets some "extra" attention based on an argument supported by data
– Extra resources, more investment
– More time to demo
• After just a few such examples, there is often an avalanche of willing metric support from others
• Make it clear why the data swayed a decision
Uncertain Data Quality
Checking for Gaming & Errors
• We can ask tougher questions
– What assumptions are built into this forecast?
• Why would we be 2x better than we have ever been before?
– Walk me through the logic supporting your analysis
– Looking at historical data, we predict very poorly when there are 3 or more dependent teams. Have you considered this?
• We can test for unlikely patterns
– Distribution analysis
– Benford's Law
[Chart: throughput-per-week histogram]
Evidence of data quality is a well-formed and explainable distribution shape.
Customer: "Our data is crap. You can't use any of it."
Distribution Shape & Outliers
• Plot visually using a histogram
• Set a rule, e.g. values > 10 times the mode? (State it.)
Example from the chart: mode is 3; 50 and 100 are outliers worth discussing.
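The "state a rule" advice is easy to automate. A minimal sketch in Python; the ×10 threshold is the slide's example rule, and the sample data are illustrative numbers echoing the "mode is 3, 50 and 100 are outliers" chart:

```python
from collections import Counter

def flag_outliers(samples, multiple=10):
    """Flag values exceeding a stated rule-of-thumb: anything more than
    `multiple` times the mode (the slide's example rule; pick and state
    your own threshold)."""
    mode = Counter(samples).most_common(1)[0][0]
    return mode, [x for x in samples if x > multiple * mode]

# Illustrative weekly throughput data echoing the slide's example
data = [1, 2, 3, 3, 3, 4, 3, 2, 5, 3, 6, 50, 3, 4, 100]
mode, outliers = flag_outliers(data)  # mode == 3, outliers == [50, 100]
```

The flagged values are discussion prompts, not automatic deletions: a 50 might be a genuinely huge work item, or a ticket someone forgot to close.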
Benford’s Law
• Benford's Law, also called the First-Digit Law, refers to the frequency distribution of leading digits in many real-life sources of data.
• Known to apply to: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, …, and processes described by power laws.
Source: Wikipedia
Common in story counts per epic in software projects. Also probable in lead time / cycle time values.
Benford’s Law Applied to Story Count
• Story count estimate for 48 randomly picked epics
• The frequency of the first digits was computed
• These were compared to Benford’s prediction (green within 1.5%)
d   Benford's Prediction P(d)   Actual Data P(d)
1   30.1%                       31.3%
2   17.6%                       18.8%
3   12.5%                       20.8%
4   9.7%                        8.3%
5   7.9%                        8.3%
6   6.7%                        8.3%
7   5.8%                        0%
8   5.1%                        4.2%
9   4.6%                        0%
Based on real data n = 48
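A first-digit comparison like the table above is a few lines of code. A minimal Python sketch; the 1.5-percentage-point tolerance mirrors the slide's "green" band and is a screening rule rather than a formal statistical test, and the sample data are made up:

```python
import math
from collections import Counter

def benford_check(counts, tolerance=0.015):
    """Compare first-digit frequencies of positive integer data (e.g. story
    counts per epic) against Benford's prediction P(d) = log10(1 + 1/d).
    Returns (digit, expected, actual, within_tolerance) for d = 1..9."""
    first = Counter(int(str(v)[0]) for v in counts)  # assumes positive ints
    n = sum(first.values())
    return [
        (d, math.log10(1 + 1 / d), first[d] / n,
         abs(math.log10(1 + 1 / d) - first[d] / n) <= tolerance)
        for d in range(1, 10)
    ]

# Made-up epic story counts whose first digits are roughly Benford-like
report = benford_check([1] * 30 + [2] * 18 + [3] * 13 + [4] * 10 +
                       [5] * 8 + [6] * 7 + [7] * 6 + [8] * 5 + [9] * 3)
```

Large deviations (like the 3s and the missing 7s and 9s in the table above) are conversation starters about how the numbers were produced, not proof of gaming on their own.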
Data Analysis Spreadsheet: https://github.com/FocusedObjective/FocusedObjective.Resources
Data
Forecasting using data without considering context
Throughput Trend by Week
[Chart: weekly throughput from W2-2012 to W15-2014; series: All, Enabling Spec, Bugs, NFRs]
Throughput Trend by Week (annotated)
[Same chart, marked up with context labels: high volatility; decline?; restructure?; training? coaches added; end-of-year break]
Good Contextual Forecasting
• Know the past
– Track the dates of significant company events
• Reorgs, releases, competitor releases, …
– Track reference data that may show context
• Staff numbers by date, national holidays
– Mark up all charts and data with context labels
• Consider the future
– What events are likely over the forecast period?
– Draw samples considering these contexts
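One way to "draw samples considering these contexts" is to filter history to weeks whose context matches the period being forecast before sampling. A hedged sketch in Python; the context labels, throughput numbers, and the median-of-sums summary are all illustrative assumptions, not the talk's specific method:

```python
import random

def context_aware_forecast(history, allowed_contexts, weeks=13,
                           trials=1000, seed=1):
    """Draw throughput samples only from historical weeks whose context
    label matches the period being forecast, then sum `weeks` draws per
    trial and return the median total. The point is to exclude
    non-representative weeks (e.g. holiday weeks) before sampling."""
    rng = random.Random(seed)
    pool = [t for t, ctx in history if ctx in allowed_contexts]
    totals = sorted(sum(rng.choices(pool, k=weeks)) for _ in range(trials))
    return totals[len(totals) // 2]

# Hypothetical (throughput, context) history, marked up per the slide
history = [(12, "normal"), (14, "normal"), (3, "holiday"), (13, "normal"),
           (15, "post-training"), (2, "holiday"), (11, "normal")]
median_quarter = context_aware_forecast(history, {"normal"})
```

Including the holiday weeks in the pool would drag the forecast down for a quarter that contains no holidays; that is exactly the "data without context" trap the chart annotations illustrate.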
Some Context Events…
• Internal differences in team skills
• Any change (Hawthorne Effect)
• Change of risk profile
• Unstable WIP
• Poor quality
• Unstable test environment
• Seasons – vacations
• Executive re-org
• Natural disasters
• Exceptional sickness
• Changes in staff
• Team changes
• Location
• Environmental disturbance
• Morale shifts
• Process change
• Architectural change
• Fatigue (low work morale)
• Change of demand for different classes of service
• Accounting for expedites
• Changes in how we measure
• Poor record keeping
• Delivery frequency / cadence
• Org changes / staffing
• Gaming the system
• Mergers and acquisitions
• Multitasking
• High attrition rates
• Staff availability due to production issues
• Critical specialists not available
• Introducing new technology
• Technical architectural changes
• Legal requirements (date fixed)
• Beginning the project
• User stories too large
• Dependency identification
• Technical complexity
• External spot demands
• Changing prioritization
• Expedited work
• External dependencies
• Better coffee
• Relevant training
• Process changes
• Process problems moving tickets
• New management policy
Forecasting using poor estimates from “Experts”
“Uncertain Uncertainty”
Improving Estimates
Stop
• Point estimates
• Ignoring uncertainty
• Thinking it's easy
• "Never speak of this again"
• Inventing units (points)
• Rewarding gaming
• Tolerating ambiguity

Start
• Using range estimates
• Expressing uncertainty
• Training & practicing estimation
• Learning with feedback
• Using dollars, time, counts
• Rewarding honesty
• Presenting unbiased data
http://ccnss.org/materials/pdf/sigman/callibration_probabilities_lichtenstein_fischoff_philips.pdf
Estimation Training
• How sure are you about guesses?
• This can be practiced
• Calibration – Trivia Game
– Ask a question with a known answer
– Ask people to guess the range
• "True or False: A hockey puck fits in a golf hole"
• "Confidence: Choose the probability that best represents your chance of getting this question right… 50% 60% 70% 80% 90% 100%"
– Disclose the result
– 50% (no idea) should get 50% of the questions right by guessing alone
Source: http://en.wikipedia.org/wiki/Calibrated_probability_assessment
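Scoring such a calibration game is a simple tally: a calibrated estimator is right about as often as they claim. A minimal Python sketch; the quiz answers are hypothetical:

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: (stated_confidence, was_correct) pairs from the trivia game.
    A calibrated estimator is right about as often as they claim: ~50% of
    their '50%' answers, ~90% of their '90%' answers, and so on."""
    buckets = defaultdict(lambda: [0, 0])  # confidence -> [correct, total]
    for confidence, correct in answers:
        buckets[confidence][1] += 1
        buckets[confidence][0] += int(correct)
    return {c: hits / total for c, (hits, total) in sorted(buckets.items())}

# Hypothetical quiz results: this estimator is overconfident at 90%
answers = [(0.5, True), (0.5, False), (0.9, False),
           (0.9, True), (0.9, False), (1.0, True)]
report = calibration_report(answers)  # {0.5: 0.5, 0.9: 0.33..., 1.0: 1.0}
```

With real groups you would want many more questions per confidence bucket before drawing conclusions; a handful of answers per bucket is far too noisy.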
No Lead Time Data?
• No team yet? No history?
• We need two estimates with probabilities:
– 1 in 5 tasks should take less than 1 day
– 4 in 5 tasks should take less than 5 days
• We need to solve for the curve that fits these two probabilities (and, hopefully, the others)
http://bit.ly/1tC1Phy
• Why lead time is Weibull, and why you care…
[Chart: Weibull lead-time curve annotated: 20% <= 1 Day (1 in 5), 80% <= 5 Days (4 in 5)]
How do we get experts to estimate ranges and predict higher order percentiles from two estimates?
Given two percentile estimates (p1, x1) and (p2, x2), solve for the distribution's parameters.
See detailed paper on the mathematics: http://www.johndcook.com/quantiles_parameters.pdf
https://github.com/FocusedObjective/FocusedObjective.Resources
Excel formulas (Weibull shape and scale from two percentile estimates):
Shape: =(LN(-LN(1-p2_param))-LN(-LN(1-p1_param)))/(LN(x2_param)-LN(x1_param))
Scale: =x1_param/(POWER((-LN(1-p1_param)),(1/Shape_result)))
Quantile: =Scale_result*POWER(-LN(1-A27),1/Shape_result)
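The same calculation in Python, useful for checking the spreadsheet. The derivation: the Weibull CDF is F(x) = 1 - exp(-(x/scale)^shape), so each estimate (x, p) gives -ln(1-p) = (x/scale)^shape, and dividing the two equations eliminates the scale:

```python
import math

def weibull_from_two_quantiles(x1, p1, x2, p2):
    """Solve the Weibull shape and scale from two percentile estimates,
    mirroring the spreadsheet formulas above."""
    shape = ((math.log(-math.log(1 - p2)) - math.log(-math.log(1 - p1)))
             / (math.log(x2) - math.log(x1)))
    scale = x1 / (-math.log(1 - p1)) ** (1 / shape)
    return shape, scale

def weibull_quantile(p, shape, scale):
    """Inverse CDF: the value below which a fraction p of tasks fall."""
    return scale * (-math.log(1 - p)) ** (1 / shape)

# "1 in 5 tasks take less than 1 day, 4 in 5 less than 5 days"
shape, scale = weibull_from_two_quantiles(1.0, 0.2, 5.0, 0.8)
p95 = weibull_quantile(0.95, shape, scale)  # ~8.29 days
```

The fitted curve reproduces both inputs exactly, and extrapolating it to the 95th percentile gives roughly 8.29 days, the figure quoted later in the deck.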
Missing HUGE delays and workload beyond the 95th percentile
http://connected-knowledge.com/
Long Tail Distribution Sampling
[Chart: long-tail distribution – good chance of samples in the body, low chance of samples in the tail]
Hard to sample high-end percentiles…
• You find the high end quickly for a uniform distribution
– 12 samples (50% certain of finding the 90% range)
• Not so for a long-tail distribution (e.g. Weibull, shape 1.5)
– 88% of trials never found it after 1000 samples; avg. 425 samples when lucky
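One plausible reading of the "12 samples, 50% certain" claim (my interpretation, not spelled out on the slide) is the chance that n uniform samples bracket the 10th-90th percentile range, i.e. at least one draw falls below the 10th percentile and at least one above the 90th. By inclusion-exclusion that probability is 1 - 2(0.9)^n + (0.8)^n, and 12 turns out to be the smallest n where it passes 50%:

```python
def p_covers_10_90(n):
    """Chance that n uniform samples include at least one value below the
    10th percentile AND at least one above the 90th. Inclusion-exclusion:
    1 - P(no draw in low tail) - P(no draw in high tail)
      + P(no draw in either tail)."""
    return 1 - 2 * 0.9 ** n + 0.8 ** n

n = 1
while p_covers_10_90(n) < 0.5:
    n += 1
# n == 12: the smallest sample size with at least a 50% chance
```

For a long-tail distribution the analogous question is much harsher: the extreme values carry most of the schedule risk but are visited so rarely that modest sample sizes almost never contain them, which is the slide's point.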
From samples (likely in practice)
By Formula (NOT likely in practice)
What is Risk?
95% <= 8.29 Days
Big Risks
How can we identify these?
The RISK is out there…
Lazy
Contact Details
www.FocusedObjective.com – download the latest software, videos, presentations and articles on forecasting and applied predictive analytics
[email protected] – email address for all questions and comments
@t_magennis – Twitter feed from Troy Magennis
CASE STUDY: ESTIMATING TOTAL STORY COUNT
Do we have to break down EVERY epic to estimate story counts?
Problem: Getting a high-level time and cost estimate for a proposed business strategy
Approach: Randomly sample epics from the 328 proposed and perform story breakdown. Then use throughput history to estimate time and costs.
[Diagram: sampling trials – epic story counts (9, 5, 13, 13, 11, 9, 13, …) are drawn at random and summed; Trial 1, Trial 2, … Trial 100 each produce a total number of stories]
Sample with replacement – remember to put the piece of paper back in after each draw!
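The paper-slip exercise is a small Monte Carlo program. A sketch in Python; the sampled story counts and the choice of 10,000 trials are illustrative, not the case study's actual data:

```python
import random

def bootstrap_story_total(sampled_counts, num_epics, trials=10_000, seed=7):
    """Monte Carlo estimate of the total story count across num_epics,
    given the counts from a randomly sampled subset of broken-down epics.
    Each trial draws num_epics counts WITH replacement ("put the paper
    back after each draw") and sums them."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choices(sampled_counts, k=num_epics)) for _ in range(trials)
    )
    # Report the 50th/75th/95th percentile totals, as in the results table
    return {p: totals[int(p * trials) - 1] for p in (0.50, 0.75, 0.95)}

# Illustrative epic story counts (not the case study's actual data)
sample = [9, 5, 13, 13, 11, 9, 13, 5, 3, 8, 2, 4]
estimate = bootstrap_story_total(sample, num_epics=48)
```

Reporting a percentile range rather than a single total is the point: the spread of the trial sums is the uncertainty in the estimate.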
Epic Breakdown – Sample Count
Facilitated by a well-known consulting company, the team performed story breakdown (counts) of epics. 48 (out of 328) epics were analyzed.

Process          50% CI   75% CI   95% CI
MC 48 samples    261      282      315
MC 24 samples    236      257      292
MC 12 samples    223      239      266
MC 6 samples     232      247      268
Actual sum: 262
PROBLEMS WITH NON-LINEAR SCALES
Fibonacci Bias…
1 2 3 5 8 13 … 21

Team (3 of 130 teams; 82% have median 5)   Median   Mean   SD
Team A – Process Change Team               5        4.4    3
Team B – UI Software Dev Team              5        5.4    6
Team C – Library Software Dev Team         5        5.7    5.5

Question: What is the middle value for this scale? Perceived (5), Mathematical (10.5)
Being < 0 at MEAN – 1 SD should be an indicator that something is wrong!
Normal?
[Chart: estimate distribution annotated with expected proportions – expect ~50%, expect ~35%, expect ~15%]
Paper: Does the use of Fibonacci numbers in Planning Poker affect effort estimates?
"Conclusion: The use of a Fibonacci scale, and possibly other non-linear scales, is likely to affect the effort estimates towards lower values compared to linear scales. A possible explanation for this scale-induced effect is that people tend to be biased toward the middle of the provided scale, especially when the uncertainty is substantial. The middle value is likely to be perceived as lower for the Fibonacci than for the linear scale."
https://www.simula.no/publications/Simula.simula.1282
R. Tamrakar and M. Jørgensen (2012)
Really, really, know the question…
• What is the goal or question being asked?
• How is this question answered now?
– Good enough? Is it believed?
– Is the current cost OK?
• What data would be necessary to answer this question slightly better?
– Is the cost justified?
– Would the result be more reliable?
Import/Cleaning Tools
• Re-runnable / automation
• Importing
• Normalizing
• Imputing (estimating missing values)
• Machine learning
• Visualization
Spurious Correlations: http://tylervigen.com/
Correlation != Causation
• Criteria for causality
– The cause precedes the effect in sequence
– The cause and effect are empirically correlated and have a plausible interaction
– The correlation is not spurious
Sources: Kan, 2003, p. 80 and Babbie, 1986
(http://xkcd.com/552/ – Creative Commons Attribution-NonCommercial 2.5 License)