Mu Mart - Mu Sigma Business Case Analysis: Using Analytics to Maximize Reach and Revenue
Prepared by Renuka S
PGP 2011-2013
IIM Trichy
Date: 30th Jan 2013
Agenda
SCQA
Data & Approach
Business Imperatives
Assumptions
EDA
Demand Forecasting
Pricing Mechanism
Assortment Planning
Customer Targeting
Enhancing User Experience
Challenges faced
The strategy proposed for the new stores of Mu Mart using
given data would help it maximize its Reach and Revenue
Situation
Mu Mart has over 3,000 stores in 47
states
So far they have focused only on rural,
suburban and exurban centers
But, they are considering strategic
expansion into cities and urban centers
Their existing assortment strategy is
carrying large assortments owing to
limited real estate space
Complication
The new stores are to be launched in cities,
which are unlike the existing localities
Existing assortment strategy cannot be
implemented here
Space within new stores is limited to
85,000 Mod Space Units
Each shelf can accommodate
only 100 units
Key Question(s)
What assortment strategy
should be adopted by Mu Mart
for urban stores?
Will it help Mu Mart maximize its
Reach and Revenue?
Answer
Optimize number of SKUs held
Foresee demand to adequately
stock the stores
Strategically place assortments
conventionally bought together
Increase promotions for current
and potential high valued
segments
SCQA
The data given by the Consumer Insights team were mined
using R to arrive at meaningful information
[Figure: Raw Datasets and Analytical Approach. SKU Level Data, Customer Level Data, Transactions Level Data and Combined Data positioned by Information Content (Low, Medium, High, Very High) against Size (Low, Medium, High, Very High)]
Data Used
Data Given
SKU Data: information about the 18,000 SKU IDs
Customer Data: information about the 120,000 customers
Transactions Data: day-to-day transactions from 1st Jan 2010 to 31st Dec 2011
Combined Data: all of the above joined together
Steps Followed
1. The given raw datasets were imported into R
2. After the mining operations, data from R was taken to Excel
3. PPT and Word were used to explain the steps
Data and Approach
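The Combined Data is described only as the three datasets joined together. A minimal sketch of such a join is given here in Python for illustration (the analysis itself was done in R); every field name and value is a hypothetical assumption, since the actual column layout is not shown in the deck.

```python
# Hypothetical miniature versions of the three raw datasets.
# All field names and values are illustrative assumptions.
sku_rows = [
    {"sku_id": "SKU76427101", "min_display_unit": 4},
    {"sku_id": "SKU76428205", "min_display_unit": 6},
]
customer_rows = [
    {"customer_id": 1, "ethnicity": "Asian", "income_group": "High"},
    {"customer_id": 2, "ethnicity": "Hispanic", "income_group": "Low"},
]
transaction_rows = [
    {"date": "2010-01-01", "customer_id": 1, "sku_id": "SKU76427101",
     "units": 2, "unit_price": 1.50},
    {"date": "2011-06-15", "customer_id": 2, "sku_id": "SKU76428205",
     "units": 1, "unit_price": 3.00},
]


def combine(transactions, customers, skus):
    """Attach customer and SKU attributes to every transaction (left join)."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    skus_by_id = {s["sku_id"]: s for s in skus}
    combined = []
    for t in transactions:
        row = dict(t)
        # A transaction with no matching customer or SKU keeps its own fields only.
        row.update(customers_by_id.get(t["customer_id"], {}))
        row.update(skus_by_id.get(t["sku_id"], {}))
        combined.append(row)
    return combined


combined_rows = combine(transaction_rows, customer_rows, sku_rows)
```

Keying the lookups by ID keeps the join to one pass over the transactions, which matters when the transaction table is by far the largest of the three.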
The insights obtained by mining the data help Mu Mart
make important Business Decisions
Business Imperatives and Techniques Used
Accurate Demand Forecasting
Predict the demand accurately to delicately balance the tradeoffs due to margin lost and sales foregone
Technique used: Two Stage Forecasting Model* (see notes)
Appropriate Pricing Mechanism
Set prices in an intelligent manner that would help increase revenues
Technique used: Multiple Regression
Optimizing Brand Selection
Decide, from a plethora of brands, the ones to be stocked given the space constraint and revenue objective
Technique used: Linear Integer Programming Optimization
Customer Targeting
Identify current and potential high valued customers for better targeting to get more share of wallet
Technique used: Logistic Regression
Enhancing User Experience
Meet customers’ expectations in terms of store layout by stocking together commonly bought items
Technique used: Market Basket Analysis
Overall goals: Revenue Maximization, Reach Maximization
Business Imperatives
The Business Decisions could be meaningfully identified
only by making some assumptions on the datasets
Assumption on, Explanation, Reason
1) Customer Population
Explanation: Customer demographics represented in the customer data represent the population of the three given cities
Reason: Reliable demographic information about the given cities was not publicly available
2) Product Abstraction
Explanation: The SKU IDs, which are 11 characters long, can be aggregated to product category (9 characters) and product type (8 characters)
Reason: For models and market basket analysis, aggregated datasets give better results
3) Unit Size
Explanation: The size a unit occupies on shelves does not affect the no of units on a shelf, but there is a cost of $0.1 associated with each unit of size
Reason: To optimally allocate brands to every product, size was important as retail space is expensive
4) Datasets
Explanation: The final dataset is split into Training and Testing datasets. Models are trained using year 2010 data and tested on year 2011 data
Reason: To test the models for robustness, another similar dataset was required
5) Customer Preference
Explanation: When a customer does not find the product he wants he would leave; though he is product specific he is not very brand specific
Reason: To allocate SKU_IDs only with respect to space occupied and revenue brought in
Assumptions
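Assumptions 2 and 4 translate directly into data preparation steps. The sketch below, in Python for illustration, assumes that the product category and product type are the leading 9 and 8 characters of the 11-character SKU ID and that dates are stored as ISO strings; neither detail is stated in the deck, and the sample IDs are hypothetical.

```python
def product_category(sku_id):
    """First 9 characters of an 11-character SKU ID (prefix is an assumption)."""
    return sku_id[:9]


def product_type(sku_id):
    """First 8 characters of an 11-character SKU ID (prefix is an assumption)."""
    return sku_id[:8]


def split_by_year(rows):
    """Train on 2010 rows, test on 2011 rows (dates assumed as 'YYYY-MM-DD')."""
    train = [r for r in rows if r["date"].startswith("2010")]
    test = [r for r in rows if r["date"].startswith("2011")]
    return train, test


# Hypothetical rows for illustration only.
rows = [
    {"date": "2010-03-04", "sku_id": "SKU76427101"},
    {"date": "2011-03-04", "sku_id": "SKU76427102"},
    {"date": "2011-09-30", "sku_id": "SKU76431507"},
]
train_rows, test_rows = split_by_year(rows)
categories = sorted({product_category(r["sku_id"]) for r in rows})
types = sorted({product_type(r["sku_id"]) for r in rows})
```

Splitting by calendar year rather than at random keeps the test set strictly later than the training set, which is the relevant check for a forecasting model.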
EDA enabled broad level understanding of customers,
products and revenue contribution at different levels
Customers (Total = 120,000)
Ethnicities: African American, Asian, Caucasian, Hispanic, Other
Income Groups: Very High, High, Medium, Low, Unknown
Ethnicity, Revenue, %
African American: $5,142,985, 24%
Asian: $1,953,622, 9%
Caucasian: $9,623,455, 45%
Hispanic: $4,066,667, 19%
Other: $623,271, 3%
Income Group, Revenue, %
High: $4,239,893, 20%
Low: $5,511,764, 26%
Medium: $8,402,488, 39%
Unknown: $421,566, 2%
Very High: $2,834,288, 13%
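The percentage columns follow from the revenue columns. As a small check, written in Python for illustration, the shares can be recomputed from the revenue figures in the two tables; only the function name is an assumption.

```python
# Revenue figures copied from the EDA tables.
ethnicity_revenue = {
    "African American": 5_142_985,
    "Asian": 1_953_622,
    "Caucasian": 9_623_455,
    "Hispanic": 4_066_667,
    "Other": 623_271,
}
income_revenue = {
    "High": 4_239_893,
    "Low": 5_511_764,
    "Medium": 8_402_488,
    "Unknown": 421_566,
    "Very High": 2_834_288,
}


def revenue_shares(revenue_by_group):
    """Each group's share of total revenue, rounded to a whole percent."""
    total = sum(revenue_by_group.values())
    return {group: round(100 * revenue / total)
            for group, revenue in revenue_by_group.items()}


ethnicity_shares = revenue_shares(ethnicity_revenue)
income_shares = revenue_shares(income_revenue)
```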
SKU IDs: 18,000 units (Pricing Model)
Categories: 181 units (Demand Forecasting)
Types: 18 units (Market Basket Analysis)
The Broad Insights
Revenue contribution for 2010-2011: the share contributed by each income group and each ethnic group is similar to the percentage distribution shown in the tables
The distribution of income groups within every ethnic group, and vice versa, is similar to the overall distribution
SKU Details & Analysis at Various Levels
SKU Nomenclature
Level 0: Brands
Level 1: Product Categories
Level 2: Product Types
EDA
[Pie charts: Ethnicities (African American 24%, Asian 9%, Caucasian 45%, Hispanic 19%, Other 3%) and Income Group (High 20%, Low 26%, Medium 39%, Unknown 2%, Very High 13%)]
The 2 Stage model forecasting no of brands sold in a day
combines benefits of exponential smoothing & regression
Demand Forecasting
This model uses exponentially smoothed forecasts of the dependent variable as one of the independent variables, in addition to other external independent variables
Why 2 stage model
No seasonality or trend in the data taken at product category & transaction level
A large amount of information is available in the patterns of past units sold, which can be used to forecast future values
Advantages of 2 Stage Model
Uses the information contained in the past patterns of the dependent variable (exponential smoothing)
And the information provided by the causally related variables (multiple regression)
Thus both benefits are combined
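The two stages described above can be sketched as follows. This is a Python illustration under stated assumptions, not the R code used for the case: the smoothing constant, the sample data, the restriction to two regressors and all function names are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Stage 1: one-step-ahead forecasts; forecast[t] uses only values before t."""
    forecasts = [series[0]]  # seed the first forecast with the first observation
    for t in range(1, len(series)):
        forecasts.append(alpha * series[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, target):
    """Stage 2: least squares with intercept via the normal equations."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical daily data for one product category.
units = [20, 22, 19, 25, 30, 28, 24, 26]
discounts = [0, 1, 0, 2, 3, 2, 1, 1]

smoothed = exponential_smoothing(units, alpha=0.5)       # stage 1
features = [[s, d] for s, d in zip(smoothed, discounts)]
coefs = fit_ols(features, units)                          # stage 2
fitted = [predict(coefs, row) for row in features]
```

Because each smoothed value depends only on earlier days, it can be used as a regressor for the same day without leaking that day's sales into its own forecast.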
The 2 Stage forecasting model is highly robust as it is able
to explain well both the training and the testing datasets
R2 for Training Dataset (Year = 2010): 88%
MAPE for the testing dataset: 16% (hence accuracy = 84%)
Estimate Std. Error t value Pr(>|t|) Significance
(Intercept) -0.69 2.50E-01 -2.766 0.00567 **
Average Price of the product -0.08 9.59E-02 -0.783 0.43356
% of Customers buying this product category 13640.00 2.90E+01 469.648 < 2e-16 ***
Exponentially Smoothened Results 0.02 2.73E-03 5.605 2.09E-08 ***
No of Discounts in that Day for that product category 0.11 4.66E-02 2.413 0.01585 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model
Demand Forecasting
Variables Used
Dependent Variable: No of Units bought
Independent Variables:
1. Exponentially smoothed no of units (lagged)
2. Average Price of the Product category
3. No of discounts in that day for the product category
4. Percentage of customers buying the products in this category
Robustness of Two
Stage Model in
Forecasting
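The accuracy figure quoted for the testing dataset is 100% minus the MAPE (mean absolute percentage error). A minimal sketch of that calculation, in Python with hypothetical numbers:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Hypothetical actual and forecast units for three days.
actual_units = [100, 200, 50]
forecast_units = [90, 220, 45]

mape_value = mape(actual_units, forecast_units)
accuracy = 100 - mape_value  # the deck's convention: accuracy = 100% - MAPE
```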
The current pricing mechanism needs a revamp due to low
revenue and improper targeting
Discounts lead to low revenue: For instance, in the year 2010, for nearly 30% of the SKU_IDs the day the Unit Price was lowest was also the day revenue was lowest
Discount Shopper, no characteristic behavior: The shoppers who buy when the Unit Prices are lowest are representative of the overall population, both in terms of income group and ethnicity
Price & Revenue, high correlation: The higher the price of an SKU_ID, the more revenue it brings, as seen from the 0.97 correlation between the two
Hence these reasons suggest that the pricing mechanism needs to be revamped
Pricing Mechanism
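The 0.97 figure is a correlation between price and revenue across SKU_IDs. Assuming it is the usual Pearson coefficient (the deck does not say), it could be computed as in this Python sketch; the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical average price and total revenue for five SKU_IDs.
avg_price = [1.0, 2.5, 3.0, 4.5, 6.0]
revenue = [120.0, 260.0, 340.0, 470.0, 650.0]

price_revenue_corr = pearson(avg_price, revenue)
```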
A multiple regression model used to predict the prices
worked well on both testing and training datasets
Estimate Std. Error t value Pr(>|t|) Significance
(Intercept) 0.16 0.00 49.333 <2e-16 ***
No of units purchased -0.02 0.00 -21.831 <2e-16 ***
Previous day’s unit price 0.40 0.00 422.581 <2e-16 ***
Previous day’s no of units sold -0.01 0.00 -26.648 <2e-16 ***
Min display unit 0.00 0.00 1.851 0.0642 .
Revenue Contribution 112.00 0.20 562.447 <2e-16 ***
Min Mod Length 0.01 0.00 16.004 <2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model
R2 for Training Dataset (Year = 2010): 79%
MAPE for the testing dataset: 23% (hence accuracy = 77%)
Robustness of Multiple Regression
Model in Forecasting
Pricing Mechanism
Variables Used
Dependent Variable: Unit Price
Independent Variables:
No of Units Purchased: price to be set with respect to the no of items sold
Previous Day’s Unit Price: information lag in people’s minds
Previous Day’s No of Units Sold: price set based on recent sales
Min Display Unit, Min Mod Length: price affected by size, as bigger size leads to more attention
Revenue Contribution: proxy for brand value of the SKU in its category
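Two of the independent variables are previous-day values, so each SKU's daily series has to be shifted by one day before the regression is fitted. A minimal Python sketch of that step, with hypothetical field names and data:

```python
def add_lag_features(daily_rows):
    """Pair each day with the previous day's price and units.

    daily_rows: rows for ONE SKU, already sorted by date (an assumption).
    The first day is dropped because it has no previous day.
    """
    lagged = []
    for previous, current in zip(daily_rows, daily_rows[1:]):
        lagged.append({
            "date": current["date"],
            "unit_price": current["unit_price"],      # dependent variable
            "units": current["units"],
            "prev_unit_price": previous["unit_price"],
            "prev_units": previous["units"],
        })
    return lagged


# Hypothetical daily series for a single SKU.
sku_days = [
    {"date": "2010-01-01", "unit_price": 1.50, "units": 12},
    {"date": "2010-01-02", "unit_price": 1.45, "units": 15},
    {"date": "2010-01-03", "unit_price": 1.60, "units": 9},
]
model_rows = add_lag_features(sku_days)
```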
Assortment planning is done at two levels: strategic, to
address resource allocation, and operational, to optimize SKUs
Constraint Sum(Decision Variables*Min Display Unit)<=Total allocated for that product category
Assortment Planning
Linear integer optimization model that efficiently allocated approximately 85,000 units to high
performing SKUs within every product category
Strategic Level: Resources (shelf space) are allocated to product categories
depending on their contribution to revenue (in the year 2010)
Operational Level: The amount of each SKU to be carried for each product category
would depend on its size and revenue contribution
Optimization Objective
Maximize Sum{[(Minimum Display Unit * Average Price) - (Minimum Mod Length * Minimum Display Unit * 0.1)] * Decision Variables}
Decision Variables: no of units allocated to every SKU of a product category
Average Price term: the higher the price, the higher the revenue
0.1 term: $0.1 cost associated with every unit of mod length, as larger items take up more space
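For a single product category, the objective and the space constraint form a small integer program. The Python sketch below solves it by dynamic programming rather than with a linear integer programming solver, and it assumes an upper bound on each decision variable (without one, all space would go to the single best SKU); that bound, the sample SKUs and the function name are hypothetical and not taken from the deck.

```python
def allocate(skus, capacity, max_multiples):
    """Maximize the deck's objective subject to the category's space limit.

    skus: list of (name, min_display_unit, avg_price, min_mod_length).
    capacity: space units allocated to this product category.
    max_multiples: assumed upper bound on every decision variable.
    Returns (objective value, tuple of decision variables).
    """
    # best[c] = best (objective, allocation) using at most c space units
    best = [(0.0, (0,) * len(skus)) for _ in range(capacity + 1)]
    for i, (_, mdu, price, mml) in enumerate(skus):
        value = mdu * price - mml * mdu * 0.1  # objective coefficient of SKU i
        new_best = list(best)
        for c in range(capacity + 1):
            for x in range(1, max_multiples + 1):
                if x * mdu > c:
                    break
                base_value, base_alloc = best[c - x * mdu]
                candidate = base_value + x * value
                if candidate > new_best[c][0]:
                    alloc = list(base_alloc)
                    alloc[i] = x
                    new_best[c] = (candidate, tuple(alloc))
        best = new_best
    return best[capacity]


# Hypothetical SKUs within one product category.
category_skus = [
    ("A", 4, 2.0, 1.0),
    ("B", 6, 3.5, 2.0),
    ("C", 5, 1.0, 1.0),
]
objective, allocation = allocate(category_skus, capacity=30, max_multiples=3)
space_used = sum(x * sku[1] for x, sku in zip(allocation, category_skus))
```

In this example SKU C earns the least per space unit, so the allocation fills the category's 30 units with A and B up to their assumed bounds.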
Logistic Regression helped predict the characteristics of
high valued customers and identify potential high valued customers
High Valued customers are those who contribute to top 20% of the revenue in the years 2010 & 2011
Model
Dependent Variable:- Binary Variable indicating whether a customer is high
valued(1) or not(0)
Independent Variables Estimate Std. Error t value Pr(>|t|) Significance
(Intercept) -22.496389 0.27384 -82.151 < 2e-16 ***
Ethnicity: African_American(1/0) 0.059785 0.051259 1.166 0.2435
Ethnicity:Caucasian(1/0) 0.081489 0.043805 1.86 0.0628 .
Income Group:High(1/0) 0.245769 0.146732 1.675 0.0939 .
Income Group:Medium(1/0) 0.222782 0.143772 1.55 0.1213
Income Group:Low(1/0) 0.223944 0.145511 1.539 0.1238
Income Group:Very_High(1/0) 0.181818 0.149829 1.214 0.2249
Majorly Weekend Shopper 0.415228 0.004772 87.014 < 2e-16 ***
Majorly Weekday Shopper 0.418127 0.004457 93.818 < 2e-16 ***
Premium Shopper(1/0) 0.169173 0.042046 4.024 5.73E-05 ***
Discount Shopper(1/0) -0.113771 0.0431 -2.64 0.0083 **
Customer Targeting
Thus a customer who is Caucasian, in the High income group, more of a weekday shopper,
buys products at a premium and does not buy on discount days is likely to be high valued
Confusion Matrix (Training)
Actual/Pred 1 0
1 4252 911
0 507 53110
% Concordance: 97.6%
Confusion Matrix (Testing)
Actual/Pred 1 0
1 4036 1325
0 722 52687
% Concordance: 96.5%
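The two concordance percentages match the share of correctly classified customers in each confusion matrix. The Python sketch below recomputes them from the four cell counts; reading rows as actual and columns as predicted is an assumption about the layout, though the result does not depend on it.

```python
def correct_share(true_pos, false_neg, false_pos, true_neg):
    """Share of customers whose predicted class equals the actual class."""
    total = true_pos + false_neg + false_pos + true_neg
    return (true_pos + true_neg) / total


# Cell counts copied from the confusion matrices.
train_share = correct_share(4252, 911, 507, 53110)
test_share = correct_share(4036, 1325, 722, 52687)

train_pct = round(100 * train_share, 1)
test_pct = round(100 * test_share, 1)
```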
Market Basket Analysis helped identify product types that
should be placed together to generate better sales
SKU76427 SKU76428 SKU76425 SKU76430 SKU76429 SKU76424 SKU76423 SKU76426 SKU76422 SKU76421
SKU76434 SKU76432 SKU76433 SKU76431
Market Basket Analysis Parameter Benchmarks Used
Support 8%
Confidence 30%
Product types which are bought together on more than 10% of the days (could be stacked sequentially)
Product types which are bought together only on some days (could be placed together but away from
the others)
These four product types are bought together more frequently than the
rest of the items and hence should definitely be placed together
In the best possible circumstances such a product placement would give rise to
an increase in revenue of a whopping 17%
Enhancing User Experience
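The support and confidence benchmarks (8% and 30%) act as filters on pairs of product types. A minimal Python sketch of that filtering is shown here; the baskets are hypothetical, and treating one basket as one unit of analysis (for example, one day) is an assumption.

```python
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for pairs passing both benchmarks."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # share of lhs baskets with rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical baskets of product types.
baskets = [
    {"SKU76431", "SKU76432"},
    {"SKU76431", "SKU76432", "SKU76433"},
    {"SKU76431"},
    {"SKU76421"},
]
rules = pair_rules(baskets, min_support=0.08, min_confidence=0.30)
```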
The Analysis would be incomplete without the inclusion of
challenges faced while carrying out this project
Software Availability
SAS: not available
SPSS: available
R: available
Although R was the chosen tool, it was mired in complexities, such as the memory error:
Reached total allocation of 3758Mb: see help(memory.size)
Challenges with the Data
Limited Scope: Non-availability of various factors like customer age group and promotion schemas highly limited the scope of the analysis
Lack of Clarity: Since SKU_IDs were not given their actual names, there was a lack of clarity in understanding customer preferences
No Customer Insight: There was no evident insight available about customers behaving in a certain way, hence decision tree analysis could not be performed
Challenges Faced
Thank You
Questions