Salford Systems © Copyright 2012
Felipe Fernández, Dan Steinberg
May, 2012
Broad Scale Predictive Modeling and Pricing Optimization in Retail Sales
Managing Complexity
I. The Problem
Problem Statement
• A typical grocery store stocks about 50,000 SKUs
• Managing these inventories requires sophisticated, automated logistics operations
• Managing daily and promotional prices requires daily field information feeding automated tools
Requirements:
• Ensure that products remain available to customers
  - Stock-outs of 3% to 10% of promoted products are common
• Keep margins under control and prices competitive
The Problem
Key questions:
• How many units of product x will we sell in store y next week?
• What is the right price for product x to get margin y?
What has to be estimated: the impact of each promotional event, i.e. the lift of the promotion (positive or negative)
Mechanisms may be complex, but crude
Standard industry practice:
• Simple moving-average uplifts
• Linear regression to extract demand curves: predict sales as a function of (own) price and other factors
• Quality of the data is usually far lower than ideal
Financial impact:
• Aging of stocks
• Carrying cost
• Loss of sales
Commercial impact:
• Me-too promotions
• Repetitive promotions
• Incoherent pricing
• Perceived price gouging
Promotions: An Additional Challenge
Volatility of Sales Volume
• Promoted sales are much more difficult to predict because they depart substantially from business as usual
• Quantity sold may deviate substantially from historical averages
• Running out of a promoted item risks alienating customers
[Figure: sales volume over time; the spikes represent promos]
How Relevant Is the Error in Demand Forecasting?
• Error is measured as a percentage of predicted sales (units sold)
• A typical absolute error of 70% to 90% is commonplace (outside the US)
• In September 2009, across 4 promotions, the chain experienced a typical overstock of 43% and a typical understock of 15%
• Tradeoff: the risk of understock can be avoided by overstocking everything heavily
• The goal is to achieve an optimal balance
Absolute error (% of units sold)

Month      Forecast < Actual   Forecast > Actual   Absolute error
Sep 2009         -15%                 43%               59%
Oct 2009         -27%                 36%               63%
Nov 2009         -46%                 40%               86%
Dec 2009         -19%                 56%               75%
Jan 2010         -16%                 44%               60%
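The split of forecast error into an understock part and an overstock part can be sketched as follows. This is only an illustration: the function name and the choice of total units sold as the denominator are assumptions, and the sample numbers are invented rather than taken from the chain's data.

```python
def error_breakdown(forecast, actual):
    """Return (understock_pct, overstock_pct, absolute_pct).

    Assumption: every figure is expressed as a percentage of total units
    actually sold. Understock is reported as a negative number, matching
    the sign convention of the chart.
    """
    total_actual = sum(actual)
    if total_actual <= 0:
        raise ValueError("total actual sales must be positive")
    # Units we failed to forecast (forecast below actual).
    under = sum(a - f for f, a in zip(forecast, actual) if f < a)
    # Units forecast but never sold (forecast above actual).
    over = sum(f - a for f, a in zip(forecast, actual) if f > a)

    def pct(units):
        return 100.0 * units / total_actual

    return -pct(under), pct(over), pct(under + over)


# Hypothetical store/product cells for one promotion.
print(error_breakdown([120, 80, 50], [100, 100, 50]))
```

Keeping the two components separate matters because their costs differ: overstock ties up capital, while understock loses sales and alienates customers.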
How Relevant Is Elasticity for Pricing Strategies?
• For this product, elasticity is low above R$11
• 54% of last year's sales were made at prices above R$11, averaging R$12.50
• Had the product been sold at R$15, unit sales would have remained the same, but sales in R$ would have grown by up to 13% and margin by up to 40%
II. The Case
The Project: Promotional Sales
• More limited than the management of stocking for all products
• Focus on a subset of 7,000 frequently promoted items
• Promotions involve different media (TV, newspaper, flyers delivered to homes)
• Advertised discounts (median: 7%; maximum: 70%)
Factors Influencing Sales
• Media (TV is generally more powerful in inducing a boost in sales)
• Discount (the law of demand)
• Calendar (summer vs. winter, weekday vs. weekend, holidays, payday, etc.)
• Prior promotions (how recent, how frequent, previous discounts)
• Cross-price effects (sales of Product A in response to a discount on Product B)
• Assortment effects (the fact that Product A and Product B are both promoted)
Factors Influencing Sales (continued)
• Weather (rain or snow storms can severely disrupt transportation)
• Traffic (principal traffic routes can, unusually, be paralyzed for five hours)
• Soccer matches (during World Cup matches, many stores shut their doors)
• Historical database (we did not attempt to augment the historical record)
• Retail science micro factors (positioning on shelves, product neighbor effects, special aisle displays, etc.)
Such detail was not available in our database, which covers sales volumes for a huge geographic region and stores that do not necessarily comply with central directives.
We were thus limited to making use of the types of factors listed earlier.
Types of Goods Promoted
• Commodities (rice, beans, pasta, cooking oil, coffee)
• Drug store items, personal hygiene (soaps, shampoos, deodorants, etc.)
• Drinks (alcoholic: beer, wine, spirits; nonalcoholic: soft drinks, ready-made teas)
• Bakery items
• Perishables (fruits, vegetables, meat, fish)
• Deli items (sausage, cheeses, smoked fish, pickles)
• Miscellaneous (non-food products)
Recommendation Challenge
[Figure: matrix of products (rows) by stores (columns)]
• The matrix is an array of products (rows) by stores (columns)
• Need to produce a recommended stock level for every store/product cell
• For a typical promotion there could be up to 166*2500 or 415,000 predictions
• These are total forecasts for the entire promotion period (4-10 days typically)
• New promotions are being launched continuously with different prices, product mix, and media
Solution Approach: Data Preparation
• We rolled up daily sales to the promotion level
  - A 10-day promotion would be rolled up into a single row of data
  - Non-promo periods were rolled up into “pseudo-promos”
• Some products are promoted frequently and have a rich historical database (e.g. 20,000 promotions); others are rarely promoted (e.g. fewer than 100 promotions)
• Data quality is moderate at best
  - Data cleaning was certainly an option (e.g. nearest-neighbor imputation would be reasonable for missing price information)
  - We opted for no data cleaning
  - Instead we opted for a robust learning machine that deals effectively with the problems we inherited (outliers, missing values, anomalies)
• SQL Server extended with custom-written C# modules as required
• Postgres version in development
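The roll-up of daily sales into one row per promotion might look roughly like the sketch below. The project itself did this in SQL Server with C# modules; the Python version, the field names (`product`, `store`, `promo_id`, `units`, `price`) and the sample rows are all hypothetical and serve only to show the shape of the transformation. Missing prices are left missing rather than imputed, in line with the decision not to clean the data.

```python
from collections import defaultdict


def roll_up(daily_rows):
    """Collapse daily rows into one summary row per (product, store, promo_id)."""
    totals = defaultdict(
        lambda: {"units": 0, "days": 0, "priced_units": 0, "revenue": 0.0}
    )
    for row in daily_rows:
        agg = totals[(row["product"], row["store"], row["promo_id"])]
        agg["units"] += row["units"]
        agg["days"] += 1
        if row.get("price") is not None:  # keep missing prices missing
            agg["priced_units"] += row["units"]
            agg["revenue"] += row["units"] * row["price"]

    summary = []
    for (product, store, promo_id), agg in sorted(totals.items()):
        avg_price = (
            agg["revenue"] / agg["priced_units"] if agg["priced_units"] else None
        )
        summary.append({
            "product": product, "store": store, "promo_id": promo_id,
            "units": agg["units"], "days": agg["days"], "avg_price": avg_price,
        })
    return summary


daily = [
    {"product": "rice", "store": "S1", "promo_id": "P1", "units": 10, "price": 8.0},
    {"product": "rice", "store": "S1", "promo_id": "P1", "units": 30, "price": 8.0},
    # A non-promo stretch treated as a "pseudo-promo", with a missing price.
    {"product": "rice", "store": "S1", "promo_id": "N1", "units": 5, "price": None},
]
for summary_row in roll_up(daily):
    print(summary_row)
```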
Modeling Strategies: TreeNet Boosted Trees
• Classical statistical regressions were frequently driven by outliers and often produced “wild” predictions
• We chose Friedman’s stochastic gradient boosting (TreeNet)
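For readers unfamiliar with stochastic gradient boosting, the toy sketch below shows the general idea: each round fits a small tree to the current residuals on a random subsample and adds a shrunken copy of it to the ensemble. It is not TreeNet; it uses one feature, single-split trees (stumps) and a least-squares objective, and every name and default value in it is an assumption made for illustration.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left, right)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x identical in the subsample: predict the mean
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def boost(xs, ys, n_trees=200, learn_rate=0.1, subsample=0.5, seed=0):
    """Return a predict(x) function built by stochastic gradient boosting."""
    rng = random.Random(seed)
    n = len(xs)
    base = sum(ys) / n
    preds = [base] * n
    stumps = []
    k = min(n, max(2, int(subsample * n)))
    for _ in range(n_trees):
        idx = rng.sample(range(n), k)  # the "stochastic" part
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]  # least-squares residuals
        threshold, left_val, right_val = fit_stump(sub_x, sub_r)
        stumps.append((threshold, left_val, right_val))
        preds = [
            p + learn_rate * (left_val if x <= threshold else right_val)
            for p, x in zip(preds, xs)
        ]

    def predict(x):
        return base + learn_rate * sum(
            lv if x <= t else rv for t, lv, rv in stumps
        )

    return predict


# Toy step-shaped relationship: low response below x = 10, high above.
xs = [float(i) for i in range(20)]
ys = [0.0] * 10 + [1.0] * 10
model = boost(xs, ys)
print(round(model(2.0), 3), round(model(15.0), 3))
```

Because each tree only partitions the data, a single extreme observation influences the leaf it falls in rather than tilting a global regression line, which is one reason tree ensembles tend to be less prone to "wild" predictions.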
Data Mining Approach
• Being based on trees, the models are far less sensitive to outliers and data errors
• Achieved a dramatic improvement in R-squared and other performance measures (Mean Absolute Deviation (MAD), Percent Prediction Error)
• Naturally handles interaction detection
• Allows a choice of Least Squares, Least Absolute Deviation, or Huber LS-LAD hybrid objective criterion for models
• Our target variable is defined as percent deviation from normal sales, with Y0 denoting normal sales:
  Yt* = (Yt - Y0) / Y0
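As a small illustration of the target definition and of the Huber LS-LAD hybrid criterion, the sketch below converts sales to the deviation target and back, and evaluates a Huber loss. The function names and the `delta` cutoff parameter are assumptions; the slides do not say how TreeNet sets the cutoff.

```python
def to_target(sales, normal_sales):
    """Percent deviation from normal sales: (Yt - Y0) / Y0."""
    if normal_sales <= 0:
        raise ValueError("normal sales must be positive")
    return (sales - normal_sales) / normal_sales


def from_target(deviation, normal_sales):
    """Invert the transform to recover a unit forecast from a predicted deviation."""
    return normal_sales * (1.0 + deviation)


def huber_loss(residual, delta):
    """Quadratic (least squares) for small residuals, linear (LAD) beyond delta."""
    size = abs(residual)
    if size <= delta:
        return 0.5 * size * size
    return delta * (size - 0.5 * delta)


print(to_target(150, 100))    # promotion sold 50% above normal
print(from_target(0.5, 100))  # back to units
print(huber_loss(1.0, 2.0), huber_loss(4.0, 2.0))
```

Modeling the deviation rather than raw units puts large and small stores on a comparable scale, and the prediction is turned back into units only at the recommendation stage.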
Daily vs. Promotion Level Models
• Consider a product promoted on 100 occasions, with each promotion lasting 10 days:
  - 1,000 daily records (great uncertainty, substantial random fluctuation)
  - Only 100 promotion-level summary records (dangerously small sample)
• We elected to build both types of models
  - Daily models were used to predict the pattern of sales (fraction sold each day)
  - Promotion-level models were used to predict volumes
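One plausible way to combine the two model types is to let the promotion-level model set the total and the daily model set only the shape. The sketch below assumes exactly that; the function name and the sample weights are hypothetical, and the slides do not describe the combination step in detail.

```python
def daily_plan(total_volume, daily_weights):
    """Spread a promotion-level volume forecast over days.

    daily_weights are the daily model's relative predictions; they are
    normalised to fractions so the days always add up to total_volume.
    """
    weight_sum = sum(daily_weights)
    if weight_sum <= 0:
        raise ValueError("daily weights must sum to a positive number")
    return [total_volume * w / weight_sum for w in daily_weights]


# Hypothetical 3-day promotion with a strong final day.
print(daily_plan(1000, [1, 1, 2]))
```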
III. Conclusions
Results
- Dramatic improvement in the error distribution: actual unwanted problematic overstock was reduced by 75% and actual store stock-outs improved by nearly 50%.
- Problematic stock is stock that requires more than 30 days to be cleared.

Forecast volume error distribution (forecast as a multiple of actual sales), illustrative

Before (45% problematic overstock)
Department    <0.5  0.5-1   1-2   2-3   3-4   4-5   5-6   6-7   7-8   8-9    >9  Total
D P H           4%    21%   48%   18%    4%    2%    1%    0%    0%    1%    2%   100%
LIQUIDA         1%     7%   47%   22%    5%    6%    4%    1%    0%    3%    4%   100%
P.A.S.          2%    14%   37%   24%   11%    4%    3%    1%    1%    0%    2%   100%
PERFUMARIA      3%    17%   39%   20%    9%    5%    2%    2%    1%    1%    2%   100%
SECA            2%    13%   40%   22%    9%    4%    2%    1%    1%    1%    5%   100%
Grand total     2%    13%   42%   22%    8%    5%    3%    1%    1%    1%    4%   100%

After (15% problematic overstock)
Department    <0.5  0.5-1   1-2   2-3   3-4   4-5   5-6   6-7   7-8   8-9    >9  Total
D P H           8%    39%   38%    7%    2%    1%    1%    0%    0%    0%    2%   100%
LIQUIDA         4%    34%   43%   10%    3%    2%    1%    0%    0%    0%    2%   100%
P.A.S.         16%    36%   37%    6%    2%    1%    0%    0%    0%    0%    1%   100%
PERFUMARIA     12%    36%   38%    8%    2%    1%    1%    0%    0%    0%    1%   100%
SECA            9%    40%   38%    7%    3%    1%    1%    0%    0%    0%    1%   100%
Grand total     9%    37%   39%    8%    3%    1%    1%    0%    0%    0%    2%   100%
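The 45% and 15% figures appear to be the share of the grand-total row that falls in the bins above 2x actual sales. That reading is an inference from the numbers rather than something the slide states, and the short check below simply adds up the published percentages under that assumption.

```python
# Error bins: forecast volume as a multiple of actual sales.
BINS = ["<0.5", "0.5-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8", "8-9", ">9"]

# Grand-total rows of the before/after tables, in percent.
BEFORE_TOTAL = [2, 13, 42, 22, 8, 5, 3, 1, 1, 1, 4]
AFTER_TOTAL = [9, 37, 39, 8, 3, 1, 1, 0, 0, 0, 2]


def share_from(percentages, first_bin):
    """Sum the percentages from first_bin through the last bin."""
    return sum(percentages[BINS.index(first_bin):])


def share_between(percentages, first_bin, last_bin):
    """Sum the percentages from first_bin through last_bin, inclusive."""
    start, end = BINS.index(first_bin), BINS.index(last_bin)
    return sum(percentages[start:end + 1])


print(share_from(BEFORE_TOTAL, "2-3"), share_from(AFTER_TOTAL, "2-3"))
print(share_between(BEFORE_TOTAL, "0.5-1", "1-2"),
      share_between(AFTER_TOTAL, "0.5-1", "1-2"))
```

Under the same reading, the share of forecasts between 0.5x and 2x of actual sales rises from 55% to 76%.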
Evolution of Accuracy
Over almost one year we saw continuous improvements in accuracy from tuning our models and/or the increments used in recommendations.
[Figure: two panels plotting accuracy by month, May to December, on a 50% to 85% scale. Accuracy is the % of forecast volume with no risk of either shortage or excess (events with forecast between 0.5x and 2x of actual sales). The second panel is labeled "MEDIA MOVIL" (moving average). Each panel shows Editions I, II and III with their 3-period moving averages; the legend refers to the 1st-10th, 10th-20th and 20th-30th day of the month.]
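The accuracy measure and the smoothing used in the chart can be sketched as below. Two assumptions are worth flagging: accuracy is computed here as a share of events, whereas the chart title speaks of forecast volume, and the monthly values in the example are invented for illustration.

```python
def no_risk_share(forecast, actual):
    """Percent of events whose forecast lies strictly between 0.5x and 2x of actual."""
    if not forecast:
        raise ValueError("need at least one event")
    ok = sum(1 for f, a in zip(forecast, actual) if a > 0 and 0.5 < f / a < 2)
    return 100.0 * ok / len(forecast)


def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points produce no output."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


print(no_risk_share([10, 30, 4], [10, 10, 10]))  # ratios 1.0, 3.0, 0.4
print(moving_average([60, 66, 72, 75]))          # hypothetical monthly accuracy
```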
Mass Modeling System (Managed in SQL Server)
Components:
• Database: daily sales, all products
• Modeling files: constructed variables, store characteristics, calendar details, promotion details/media, own and cross-price details
• Data mining engines
• Predictive models
• Prediction scenarios: pricing, promotion details
• Outputs: forecasted sales, revenues, recommended stock levels
Model replacement:
• If a model is rejected for poor predictive performance (acceptance requires test R2 >= .10), we search for a replacement model using a similar product
• Poor performance can be due to insufficient or poor-quality data
• Related product models can perform extremely well by comparison
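The accept-or-replace rule could be sketched as follows. The 0.10 threshold comes from the slide; everything else is assumed for illustration, including the function names, the `similar` lookup of related products, the product names and the idea of picking the related model with the highest test R2.

```python
def r_squared(actual, predicted):
    """Coefficient of determination on held-out test data."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def choose_model(product, test_r2, similar, threshold=0.10):
    """Return the product whose model to use, or None if nothing qualifies.

    test_r2 maps product -> test R2 of its own model.
    similar maps product -> list of related products (hypothetical lookup).
    """
    missing = float("-inf")
    if test_r2.get(product, missing) >= threshold:
        return product
    candidates = [
        p for p in similar.get(product, []) if test_r2.get(p, missing) >= threshold
    ]
    return max(candidates, key=lambda p: test_r2[p]) if candidates else None


scores = {"onion": 0.03, "garlic": 0.22, "potato": 0.15}
related = {"onion": ["garlic", "potato"]}
print(choose_model("onion", scores, related))
```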
Automated Modeling
• A common modeling strategy is applied to all products
• The main prediction system uses 30 predictors
• TreeNet adapts raw inputs as needed
• Some products involve powerful interactions while others can be safely predicted with purely additive models
• Models are developed on demand as sufficient new data becomes available
• A particular project normally required about 2,500 models to be developed overnight
• Special projects outside the normal promotion system required more
Example Product Model Summary
• Ridge regression yields an R2 of .137 on test data
• TreeNet reaches a test data R2 of .226
Sales Volume Variation in Some Frequently Promoted Items
[Figure: distribution of normal sales]
Huge variation across similar-sized stores
Distribution of Deviation from Normal Sales During Promotion
• Normal sales are based on the union of promo and non-promo days
• Deviation from normal is our target variable (and is often negative)
• The distribution of sales deviation from normal displays a median close to zero
• The interquartile range (Q3-Q1) is close to 100% of normal sales for these products
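The median and interquartile range of the deviation target can be computed with the standard library, as in this sketch. The sample deviations are invented, and the "inclusive" quartile method is just one reasonable convention; the slides do not say which was used.

```python
import statistics


def deviation_summary(deviations):
    """Return (median, interquartile range) of deviations from normal sales."""
    q1, _, q3 = statistics.quantiles(deviations, n=4, method="inclusive")
    return statistics.median(deviations), q3 - q1


# Hypothetical deviations: -0.6 means 60% below normal, 0.9 means 90% above.
sample = [-0.6, -0.3, -0.1, 0.0, 0.1, 0.4, 0.9]
median, iqr = deviation_summary(sample)
print(median, round(iqr, 2))
```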
Partial Dependency Plot: % Change in Sales vs. % Change in Price
[Figure: four panels: Fresh onion (kg); Rice Carolino 5kg T1; Detergent Omo 1kg; Condensed Milk Mococa 395g TP]
Reducing the price by just 15% from “normal” increases sales to more than 3x normal for detergent but does nothing for fresh onion
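A partial dependency curve of this kind is obtained by forcing one predictor to each value on a grid, leaving all other predictors as observed, and averaging the model's predictions. The sketch below shows the computation for any model exposed as a function; the toy model, the feature names `price_change` and `tv`, and the data rows are hypothetical stand-ins for a fitted TreeNet model and its inputs.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows stay untouched
            modified[feature] = value  # override only the feature of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Stand-in model: sales rise as price falls, plus a boost from TV."""
    return -2.0 * row["price_change"] + row["tv"]


rows = [
    {"price_change": 0.0, "tv": 1.0},
    {"price_change": -0.1, "tv": 0.0},
]
print(partial_dependence(toy_model, rows, "price_change", [-0.5, 0.0]))
```

Because the other predictors keep their observed values, the curve reflects the model's average response to price across the mix of media, calendar and store conditions in the data.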
Number of Days of a Promotion
[Figure: four panels: Fresh onion (kg); Rice Carolino 5kg T1; Detergent Omo 1kg; Condensed Milk Mococa 395g TP]
Impact of Most Recent Discount
[Figure: four panels: Fresh onion (kg); Rice Carolino 5kg T1; Detergent Omo 1kg; Condensed Milk Mococa 395g TP]
Seasonal Pattern
During the last 5 weeks of the year, sales steadily increase to more than 70% above normal by New Year's Day and then gradually fall back to normal by March
Locational Effects
• We measure about a 15% difference between the most and least favorable regions
• Recall that we already control for price and other key factors
Number of Days Before Next Holiday
Some consumers start planning as much as 10 days in advance, but more wait until the start of the holiday or the day or two before
Partial Dependency Plot: % Change in Sales vs. % Change in Competitor's Price
Price Optimization to Maximize Profit
• This problem is very well understood and is typically solved by making use of the slope of the demand curve
• Constraints may well be imposed:
  - A maximum price to be applied to highly inelastic products
  - A minimum price to control sales volumes in the presence of supply limitations
• The TreeNet models naturally capture nonlinearity and interactions
• The complexity of the TreeNet model requires that optimal solutions be found via simulation and search
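When the demand model is a black box, the simplest form of simulation and search is a grid search over the allowed price range, as sketched below. The minimum and maximum prices play the role of the constraints listed above. The linear demand curve, the unit cost and the function names are hypothetical; in practice `demand` would be a call into the fitted model for a given scenario.

```python
def best_price(demand, unit_cost, min_price, max_price, steps=100):
    """Grid-search the profit-maximising price within [min_price, max_price].

    demand is any callable price -> expected units, treated as a black box.
    Returns (price, profit).
    """
    best = None
    for i in range(steps + 1):
        price = min_price + (max_price - min_price) * i / steps
        profit = (price - unit_cost) * demand(price)
        if best is None or profit > best[1]:
            best = (price, profit)
    return best


def toy_demand(price):
    """Stand-in for model predictions: a simple downward-sloping line."""
    return max(0.0, 100.0 - 5.0 * price)


print(best_price(toy_demand, unit_cost=4.0, min_price=5.0, max_price=15.0))
# With a price cap of 10 the search stops at the constraint.
print(best_price(toy_demand, unit_cost=4.0, min_price=5.0, max_price=10.0))
```

A grid is crude but makes no smoothness assumptions, which suits tree-based models whose predicted demand is piecewise constant in price.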
Final Comments
• Retailers with more than 30 stores have trouble working out the right amount of merchandise to supply to each store
• Healthy stock turnover and the right pricing strategies are key to a profitable transition from 30 stores to hundreds of stores
• Our solution has demonstrated, on a massive retail chain, that forecast accuracy can be dramatically improved by using predictive analytics based on non-parametric modeling
Thank you