Renuka iim trichy strategy for mu martv

Mu Mart: Mu Sigma Business Case Analysis

Using Analytics to Maximize Reach and Revenue

Prepared by Renuka S

PGP 2011-2013

IIM Trichy

Date: 30th Jan 2013

Agenda

SCQA

Data & Approach

Business Imperatives

Assumptions

EDA

Demand Forecasting

Pricing Mechanism

Assortment Planning

Customer Targeting

Enhancing User Experience

Challenges faced

The strategy proposed for the new stores of Mu Mart using given data would help it maximize its Reach and Revenue

Situation

Mu Mart has over 3,000 stores in 47 states

So far they have focused only on rural, suburban and exurban centers

But they are considering strategic expansion into cities and urban centers

Their existing assortment strategy is to carry large assortments, which suits stores where real estate space is not a constraint

Complication

The new stores are to be launched in cities, which are unlike the existing localities

The existing assortment strategy cannot be implemented here

Space within the new stores is limited to 85,000 Mod Space Units

Shelves uniformly can accommodate only 100 units each

Key Question(s)

What assortment strategy should be adopted by Mu Mart for urban stores?

Will it help Mu Mart maximize its Reach and Revenue?

Answer

Optimize the number of SKUs held

Foresee demand to adequately stock the stores

Strategically place together assortments that are conventionally bought together

Increase promotions for current and potential high-valued segments

SCQA

The data given by the Consumer Insights team were mined using R to arrive at meaningful information

[Figure: the raw datasets (SKU-level, customer-level, transactions-level and combined data) rated by information content and by size on a Low / Medium / High / Very High scale, feeding into the analytical approach]

Data Used

Data Given
SKU Data: information about the 18,000 SKU IDs
Customer Data: information about the 120,000 customers
Transactions Data: day-to-day transactions from 1st Jan 2010 to 31st Dec 2011
Combined Data: all of the above joined together

Steps Followed
1. The given raw datasets were imported into R
2. After the mining operations, data from R was taken to Excel
3. PPT and Word were used to explain the steps
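
The combined dataset can be pictured as a key-based join of the three raw datasets. The sketch below is a minimal Python illustration, not the R code used in the project; the field names (`sku_id`, `customer_id`, `units` and so on) and the sample records are hypothetical.

```python
# Hypothetical sample records; the field names are assumptions, not the case's schema.
skus = {"SKU76427001": {"min_display_unit": 2, "min_mod_length": 1.5}}
customers = {"C001": {"ethnicity": "Asian", "income_group": "High"}}
transactions = [
    {"date": "2010-01-01", "sku_id": "SKU76427001", "customer_id": "C001",
     "units": 3, "unit_price": 4.0},
]


def combine(transactions, skus, customers):
    """Left-join SKU and customer attributes onto each transaction row."""
    combined = []
    for row in transactions:
        merged = dict(row)
        # A key missing from a lookup table contributes no extra columns.
        merged.update(skus.get(row["sku_id"], {}))
        merged.update(customers.get(row["customer_id"], {}))
        combined.append(merged)
    return combined


combined = combine(transactions, skus, customers)
```

Each combined row then carries the transaction, the SKU attributes and the customer demographics side by side, which is the shape the later models need.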

Data and Approach

The insights obtained by mining the data help Mu Mart make important Business Decisions

Business Imperatives and Techniques Used

Overall goals: Revenue Maximization and Reach Maximization

Accurate Demand Forecasting
Predict the demand accurately to delicately balance the trade-off between margin lost and sales foregone
Technique: Two Stage Forecasting Model* (see notes)

Appropriate Pricing Mechanism
Set prices in an intelligent manner that would help increase revenues
Technique: Multiple Regression

Optimizing Brand Selection
Decide, from a plethora of brands, the ones to be stocked given the space constraint and revenue objective
Technique: Linear Integer Programming Optimization

Customer Targeting
Identify current and potential high-valued customers for better targeting to get more share of wallet
Technique: Logistic Regression

Enhancing User Experience
Meet customers’ expectations in terms of store layout by stocking commonly bought items together
Technique: Market Basket Analysis

Business Imperatives

The Business Decisions could be meaningfully identified only by making some assumptions on the datasets

Assumption on: 1) Customer Population
Explanation: The customer demographics in the customer data represent the population of the three given cities
Reason: Reliable demographics information about the given cities was not publicly available

Assumption on: 2) Product Abstraction
Explanation: The SKU IDs, which are 11 characters long, can be aggregated to product category (9 characters) and product type (8 characters)
Reason: For the models and market basket analysis, aggregated datasets give better results

Assumption on: 3) Unit Size
Explanation: The size a unit occupies on the shelves does not affect the number of units on a shelf, but there is a cost of $0.1 associated with each unit of size
Reason: To optimally allocate brands to every product, size was important as retail space is expensive

Assumption on: 4) Datasets
Explanation: The final dataset is split into training and testing datasets. Models are trained using year 2010 data and tested on year 2011 data
Reason: To test the models for robustness, another similar dataset was required

Assumption on: 5) Customer Preference
Explanation: When a customer does not find the product he wants, he leaves; though he is product specific, he is not very brand specific
Reason: To allocate SKU_IDs only with respect to space occupied and revenue brought in

Assumptions
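
Assumptions 2 and 4 translate directly into data preparation steps. The following is a minimal Python sketch under two labeled assumptions: that the category and type codes are the leading 9 and 8 characters of the SKU ID (the slides give only the lengths), and that dates are stored as ISO strings. The sample rows and field names are hypothetical.

```python
from collections import defaultdict


def product_category(sku_id):
    """Assumed: the category code is the first 9 of the 11 characters."""
    return sku_id[:9]


def product_type(sku_id):
    """Assumed: the type code is the first 8 of the 11 characters."""
    return sku_id[:8]


def split_by_year(rows):
    """Train on 2010 rows, test on 2011 rows (assumption 4)."""
    train = [r for r in rows if r["date"].startswith("2010")]
    test = [r for r in rows if r["date"].startswith("2011")]
    return train, test


def units_by_category(rows):
    """Aggregate units sold from SKU level up to product category level."""
    totals = defaultdict(int)
    for r in rows:
        totals[product_category(r["sku_id"])] += r["units"]
    return dict(totals)


# Hypothetical transactions for illustration only.
rows = [
    {"date": "2010-03-01", "sku_id": "SKU76427001", "units": 3},
    {"date": "2010-03-01", "sku_id": "SKU76427002", "units": 2},
    {"date": "2011-03-01", "sku_id": "SKU76427101", "units": 5},
]
train, test = split_by_year(rows)
category_units = units_by_category(train)
```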

EDA enabled a broad-level understanding of customers, products and revenue contribution at different levels

Customers (Total = 120,000)
Ethnicities: African American, Asian, Caucasian, Hispanic, Other
Income Groups: Very High, High, Medium, Low, Unknown

Ethnicity           Revenue        %
African American    $5,142,985    24%
Asian               $1,953,622     9%
Caucasian           $9,623,455    45%
Hispanic            $4,066,667    19%
Other               $623,271       3%

Income Group        Revenue        %
High                $4,239,893    20%
Low                 $5,511,764    26%
Medium              $8,402,488    39%
Unknown             $421,566       2%
Very High           $2,834,288    13%

SKU IDs: 18,000 units (Pricing Model)
Categories: 181 units (Demand Forecasting)
Types: 18 units (Market Basket Analysis)

The Broad Insights

Revenue contribution for 2010-2011 by each income group and each ethnicity group is similar to the percentage distribution shown in the tables

The distribution of income groups within every ethnic group, and vice versa, is similar to the overall distribution

SKU Details & analysis at various levels

SKU Nomenclature
Level 0: Brands
Level 1: Product Categories
Level 2: Product Types

EDA

Ethnicities: African American 24%, Asian 9%, Caucasian 45%, Hispanic 19%, Other 3%

Income Group: High 20%, Low 26%, Medium 39%, Unknown 2%, Very High 13%

The 2 Stage model forecasting the number of brands sold in a day combines the benefits of exponential smoothing & regression

Demand Forecasting

This model uses exponentially smoothed forecasts of the dependent variable as one of the independent variables, in addition to other extraneous independent variables

Why a 2 Stage model

There is no seasonality or trend in the data taken at product category & transaction level

A large amount of information is available in the past patterns of the number of units sold, which can be used to forecast future values

Advantages of the 2 Stage Model

It uses the information contained in the past patterns of the dependent variable (exponential smoothing)

It also uses the information provided by the causally related variables (multiple regression)

Thus the benefits of both are combined
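
The two stages can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R model built for the case: simple exponential smoothing stands in for the first stage, the smoothing constant `alpha=0.3` is arbitrary, and the sample series are hypothetical.

```python
def exp_smooth_forecasts(series, alpha=0.3):
    """Stage 1: one-step-ahead simple exponential smoothing forecasts.

    forecasts[t] depends only on observations before t (the first value
    is seeded with series[0]), so it can be used as a lagged regressor.
    """
    forecasts = [series[0]]
    for actual in series[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, target):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)


def two_stage_fit(units, extra_features, alpha=0.3):
    """Stage 2: regress units on the smoothed forecast plus other drivers."""
    smoothed = exp_smooth_forecasts(units, alpha)
    features = [[s] + list(extra) for s, extra in zip(smoothed, extra_features)]
    return fit_ols(features, units)


# Hypothetical daily units and discount counts for one product category.
units = [10, 12, 11, 15, 14, 18, 17, 20]
discounts = [0, 1, 0, 2, 1, 3, 2, 3]
coefficients = two_stage_fit(units, [[d] for d in discounts])
```

The returned list holds the intercept, the weight on the smoothed forecast and the weight on the discount count, in that order.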

The 2 Stage forecasting model is robust, as it explains both the training and the testing datasets well

Robustness of Two Stage Model in Forecasting
R2 for training dataset (year 2010): 88%
MAPE for the testing dataset: 16% (hence accuracy = 84%)

Model
                                                        Estimate   Std. Error   t value   Pr(>|t|)   Significance
(Intercept)                                                -0.69     2.50E-01    -2.766    0.00567   **
Average Price of the product                               -0.08     9.59E-02    -0.783    0.43356
% of Customers buying this product category             13640.00     2.90E+01   469.648    < 2e-16   ***
Exponentially Smoothed Results                              0.02     2.73E-03     5.605   2.09E-08   ***
No of Discounts in that Day for that product category       0.11     4.66E-02     2.413    0.01585   *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Variables Used
Dependent variable: number of units bought
Independent variables:
1. Exponentially smoothed number of units (lagged)
2. Average price of the product category
3. Number of discounts on that day for the product category
4. Percentage of customers buying the products in this category

Demand Forecasting
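
The accuracy figure quoted for the testing dataset is one minus the MAPE. A minimal Python sketch of that calculation is shown here; the sample values are hypothetical, and skipping days with zero actual demand is an assumption made to avoid division by zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; days with zero actual demand are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical actual and forecast units for three days.
actual = [100, 200, 50]
forecast = [90, 220, 55]
error = mape(actual, forecast)
accuracy = 1 - error
```

With a MAPE of 16%, the same arithmetic gives the 84% accuracy reported on the slide.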

The current pricing mechanism needs a revamp due to low revenue and improper targeting

Discounts lead to low revenue: for instance, in the year 2010, for nearly 30% of the SKU_IDs the day the unit price was lowest was also the day revenue was lowest

Discount shopper, no characteristic behavior: the shoppers who buy when unit prices are lowest are representative of the overall population in terms of both income group and ethnicity

Price & revenue, high correlation: the higher the price of an SKU_ID, the more revenue it brings, as seen from the 0.97 correlation between the two

These reasons suggest that the pricing mechanism needs to be revamped
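
The first observation (lowest-price day coinciding with lowest-revenue day) can be checked per SKU as sketched below. This is a Python illustration with hypothetical data; the input layout, one list of daily `(unit_price, revenue)` pairs per SKU, is an assumption, and ties are resolved in favour of the earliest day.

```python
def share_lowest_price_is_lowest_revenue(daily):
    """daily maps sku_id -> list of (unit_price, revenue) tuples, one per day.

    Returns the share of SKUs whose lowest-price day is also their
    lowest-revenue day.
    """
    hits = 0
    for days in daily.values():
        low_price_day = min(range(len(days)), key=lambda i: days[i][0])
        low_revenue_day = min(range(len(days)), key=lambda i: days[i][1])
        hits += low_price_day == low_revenue_day
    return hits / len(daily)


# Hypothetical daily figures for two SKUs.
daily = {
    "SKU_A": [(5.0, 50.0), (4.0, 20.0), (6.0, 60.0)],  # cheapest day earns least
    "SKU_B": [(3.0, 90.0), (3.5, 35.0), (4.0, 40.0)],  # cheapest day earns most
}
share = share_lowest_price_is_lowest_revenue(daily)
```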

Pricing Mechanism

A multiple regression model used to predict the prices worked well on both the training and testing datasets

Robustness of Multiple Regression Model in Forecasting
R2 for training dataset (year 2010): 79%
MAPE for the testing dataset: 23% (hence accuracy = 77%)

Model
                                   Estimate   Std. Error   t value   Pr(>|t|)   Significance
(Intercept)                            0.16         0.00    49.333     <2e-16   ***
No of units purchased                 -0.02         0.00   -21.831     <2e-16   ***
Previous day’s unit price              0.40         0.00   422.581     <2e-16   ***
Previous day’s no of units sold       -0.01         0.00   -26.648     <2e-16   ***
Min display unit                       0.00         0.00     1.851     0.0642   .
Revenue Contribution                 112.00         0.20   562.447     <2e-16   ***
Min Mod Length                         0.01         0.00    16.004     <2e-16   ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Variables Used
Dependent variable: Unit Price
Independent variables (with rationale):
No of Units Purchased: price to be set with respect to the number of items sold
Previous Day’s Unit Price: information lag in people’s minds
Previous Day’s No of Units Sold: price set based on recent sales
Min Display Unit and Min Mod Length: price affected by size, as bigger size leads to more attention
Revenue Contribution: proxy for the brand value of the SKU in its category

Pricing Mechanism

Assortment planning is done at two levels: strategic, to address resource allocation, and operational, to optimize SKUs

A linear integer optimization model efficiently allocated approximately 85,000 units to high-performing SKUs within every product category

Strategic Level: resources (shelf space) are allocated to product categories depending on their contribution to revenue (in the year 2010)

Operational Level: the amount of each SKU to be carried for each product category depends on its size and revenue contribution

Optimization
Objective: Maximize Sum{[(Minimum Display Unit * Average Price) - (Minimum Mod Length * Minimum Display Unit * 0.1)] * Decision Variables}
Constraint: Sum(Decision Variables * Min Display Unit) <= Total allocated for that product category
Decision variables: number of units allocated to every SKU of a product category
Average Price term: the higher the price, the higher the revenue
0.1 term: a $0.1 cost is associated with every unit of mod length, as larger items take up more space

Assortment Planning
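
For a single product category, the objective and constraint can be sketched as a small bounded integer program. The Python sketch below solves it by dynamic programming rather than with the solver used in the project; the per-SKU cap `max_units`, the SKU figures and the capacity are all hypothetical, and the cap is an added assumption (without some upper bound, a linear objective would put every unit into one SKU).

```python
def allocate(skus, capacity, max_units):
    """Choose integer decision variables x_i in 0..max_units per SKU.

    Maximizes sum(coef_i * x_i) subject to
    sum(x_i * min_display_unit_i) <= capacity, where
    coef_i = mdu_i * avg_price_i - min_mod_length_i * mdu_i * 0.1.
    Returns (best objective value, tuple of x_i in input order).
    """
    # best[c] = (value, allocation) achievable within capacity c so far.
    best = [(0.0, ())] * (capacity + 1)
    for sku in skus:
        mdu = sku["min_display_unit"]  # assumed to be a positive integer
        coef = mdu * sku["avg_price"] - sku["min_mod_length"] * mdu * 0.1
        new_best = []
        for c in range(capacity + 1):
            options = []
            for x in range(max_units + 1):
                if x * mdu > c:
                    break
                prev_value, prev_alloc = best[c - x * mdu]
                options.append((prev_value + x * coef, prev_alloc + (x,)))
            new_best.append(max(options, key=lambda t: t[0]))
        best = new_best
    return best[capacity]


# Hypothetical SKUs in one product category.
skus = [
    {"min_display_unit": 2, "avg_price": 5.0, "min_mod_length": 1.0},
    {"min_display_unit": 3, "avg_price": 4.0, "min_mod_length": 2.0},
]
value, allocation = allocate(skus, capacity=12, max_units=3)
```

Running the same routine once per product category, with each category's capacity set at the strategic level, mirrors the two-level structure described on the slide.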

Logistic Regression helped predict the characteristics of high-valued customers and identify potential high-valued customers

High-valued customers are those who contribute to the top 20% of the revenue in the years 2010 & 2011

Model
Dependent variable: binary variable indicating whether a customer is high-valued (1) or not (0)

Independent Variables                  Estimate    Std. Error   t value   Pr(>|t|)   Significance
(Intercept)                          -22.496389       0.27384   -82.151    < 2e-16   ***
Ethnicity: African_American (1/0)      0.059785      0.051259     1.166     0.2435
Ethnicity: Caucasian (1/0)             0.081489      0.043805     1.86      0.0628   .
Income Group: High (1/0)               0.245769      0.146732     1.675     0.0939   .
Income Group: Medium (1/0)             0.222782      0.143772     1.55      0.1213
Income Group: Low (1/0)                0.223944      0.145511     1.539     0.1238
Income Group: Very_High (1/0)          0.181818      0.149829     1.214     0.2249
Majorly Weekend Shopper                0.415228      0.004772    87.014    < 2e-16   ***
Majorly Weekday Shopper                0.418127      0.004457    93.818    < 2e-16   ***
Premium Shopper (1/0)                  0.169173      0.042046     4.024   5.73E-05   ***
Discount Shopper (1/0)                -0.113771        0.0431    -2.64      0.0083   **

Significant variables are those flagged in the Significance column

Thus a customer who is Caucasian, in the High income group, more of a weekday shopper, who buys products at a premium and does not buy on discount days is likely to be high-valued. Of these, the ethnicity and income group effects are significant only at the 10% level.

Confusion Matrix (Training)
Actual / Pred       1        0
1                4252      911
0                 507    53110
% Concordance: 97.6%

Confusion Matrix (Testing)
Actual / Pred       1        0
1                4036     1325
0                 722    52687
% Concordance: 96.5%

Customer Targeting
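
The concordance figures follow from the confusion matrices as the share of customers on the diagonal. A minimal Python sketch of that arithmetic, using the counts from the slide:

```python
def concordance(matrix):
    """Share of cases where actual and predicted classes agree (diagonal cells)."""
    (a, b), (c, d) = matrix
    return (a + d) / (a + b + c + d)


# Counts copied from the training and testing confusion matrices.
training = [[4252, 911], [507, 53110]]
testing = [[4036, 1325], [722, 52687]]
training_concordance = concordance(training)
testing_concordance = concordance(testing)
```

Rounded to one decimal place these give 97.6% and 96.5%, matching the reported values.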

Market Basket Analysis helped identify product types that should be placed together to generate better sales

SKU76427 SKU76428 SKU76425 SKU76430 SKU76429 SKU76424 SKU76423 SKU76426 SKU76422 SKU76421

SKU76434 SKU76432 SKU76433 SKU76431

Market Basket Analysis Parameter Benchmarks Used
Support: 8%
Confidence: 30%

Product types which are bought together on more than 10% of the days (could be stacked sequentially)

Product types which are bought together only on some days (could be placed together but away from the others)

These four product types are bought together more frequently than the rest of the items and hence should be placed together

In the best possible circumstances, such a product placement would increase revenue by as much as 17%

Enhancing User Experience
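
The support and confidence benchmarks can be applied to pairs of product types as sketched below. This is a minimal Python illustration, not the R routine used in the project; the baskets and the `type_*` labels are hypothetical, and treating each basket as one day's set of purchased product types is an assumption.

```python
from itertools import combinations


def pair_rules(baskets, min_support=0.08, min_confidence=0.30):
    """Return (lhs, rhs, support, confidence) for pairs passing both thresholds.

    support    = share of baskets containing both items
    confidence = share of baskets with lhs that also contain rhs
    """
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical baskets of product types.
baskets = [
    ["type_A", "type_B"],
    ["type_A", "type_B", "type_C"],
    ["type_A"],
    ["type_D"],
]
rules = pair_rules(baskets)
```

Pairs that survive both thresholds are the candidates for adjacent shelf placement.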

The analysis would be incomplete without the challenges faced while carrying out this project

Software Availability
SAS: not available
SPSS: available
R: available

Although R was the Chosen One, it was mired in complexities, such as the memory error:

Reached total allocation of 3758Mb: see help(memory.size)

Challenges with the Data

Limited Scope: the non-availability of factors such as customer age group and promotion schemes highly limited the scope of the analysis

Lack of Clarity: since SKU_IDs were not given their actual names, there was a lack of clarity in understanding customer preferences

No Customer Insight: there was no evident insight available about customers behaving in a certain way, hence decision tree analysis could not be performed

Challenges Faced

Thank You

Questions
