Market Research Doc

8/11/2019 Market Research Doc


MARKET RESEARCH

Teaching you how to fish



Why, you ask?

Because I have this burning desire inside of me to give gyaan to people who don't need it.

Also because it is saddening to see the one sensible marketing subject detested by the majority.

And because Mala is so awesome.

To the best of my knowledge, this guide is accurate and should suffice for the end term

examination. It took me two days to do this. An invaluable learning experience for me as well.

However, I am not claiming it to be perfect. So, if you have any corrections or additions to make

to this document, please drop a mail to [email protected].

Cheers.

Oh, I almost forgot.

If you find this useful, do share it with others who might need this as well. There is no point in

holding onto information just to score more marks. That A grade in the mark sheet will take you only so far.

*Hardcore open source fan*



Contents

MULTI-DIMENSIONAL SCALING

CONJOINT ANALYSIS

CLUSTER ANALYSIS

FACTOR ANALYSIS

BINARY LOGISTIC REGRESSION

DISCRIMINANT ANALYSIS



MULTI-DIMENSIONAL SCALING

MDS allows the perceptions and preferences of the consumers to be clearly represented in a

spatial map. It gives quantitative estimates of similarity between groups of items.

Let's say that, in a survey, the respondents were asked to rate all possible pairs of nine soft drinks in terms of their similarity, and MDS results in the following spatial representation of the soft drinks:

From this it can be inferred that Coke and Pepsi are the most similar as there is the least distance

between them, while Dr Pepper and Diet 7-up (Or Diet coke and Tango) are the most dissimilar.

The catch in this is that MDS doesn't tell you what the horizontal and vertical axes represent. These are dependent on the judgment of the researcher. For instance, the vertical axis can be dietness and the horizontal axis can be flavor. Or it could be price, color, soda amount etc.

Other applications of MDS would be:

Market segmentation - Group customers with similar interests

New product development - Where would you want your new soft drink to be placed in the above map?

Pricing

Choice of retail outlets - Where and which channel?

Advertising - Which tagline or actor is suitable for different brands/products?

Eg. For advertising: Suppose there are four categories and four actors

Categories             Actors
1. Telecom             1. Shahrukh Khan
2. Fairness creams     2. Uday Chopra
3. Tourism             3. John Abraham
4. Viagra              4. Abhishek Bachchan

The respondents can be asked to rate individual pairs on a Likert scale of 1-7 (1 = good match, 7 = not a good match) to see which actor might be suitable for a category:



So, this shows that Abhishek Bachchan would be most suitable for Telecom ads (this is a random figure; we know that it's not true in reality), and none of the four are close enough to Viagra. Probably because they don't have what it takes to represent that category.

How to do MDS in SPSS?

To obtain a spatial map for 5 brands of beer:

1. Obtain the data from respondents on a Likert scale of 1-5 or 1-7, with 1 = most similar and 5 (or 7) = least similar.

2. Here row 1 represents Budvar, row 2 Budweiser and so on. So, they'll have 0 (or 1) against themselves. Only the upper half of the triangle in the data matrix suffices here. There might be some special cases where respondents may say, New Zealand is similar to Australia, but Australia is not similar to New Zealand. Then, you will need the complete matrix of data to analyze.

3. Now, there are two methods of MDS - PROXSCAL and ALSCAL. PROXSCAL is an update in SPSS after ALSCAL was found to be inefficient. Apparently, we were taught ALSCAL, which I am not comfortable with. So you may figure it out on your own once you have gone through the PROXSCAL method.

4. Go to Analyze > Scale > MDS [PROXSCAL]. Click on Define in the next window that appears.



5. Put all 5 brands of beer under proximities.

6. In model, choose upper-triangular matrix and dimensions as 1 to 4, as we have 5 brands. We need to see the stress plot and reduce the dimensions later.



7. Go to plots and check stress plot and uncheck common space for now.



8. Run it. The scree plot shows the elbow formed at dimension 2. After that, the stress doesn't reduce by a significant amount as the dimensions increase. Our objective is to keep both the number of dimensions and the stress to a minimum. So, we have to make a call at some point to balance the two.

9. Now, run the MDS PROXSCAL all over again, but this time, change the dimensions range to min 2, max 2 and check the common space plots.



10. You'll get the above result. It can be seen that Budweiser is the most distinct among the 5. Heineken and Carlsberg, and Corona and Budvar are similar pairs. It is hard to name the dimensions with such a small sample. It could be aftertaste, the degree of high that you get, etc. The dimensions can also be judged by taking a ranking of beer attributes in a separate survey.

11. Validity: Stress values are indicative of the quality of MDS solutions. In general, the following is the recommendation for stress values:

In our output, it is about 2.8% (0.02796) which makes it an excellent fit.
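As a rough illustration of the data layout in step 2 (not of what PROXSCAL does internally), the sketch here expands an upper-triangular dissimilarity matrix into the full symmetric matrix and picks out the most similar pair. The ratings, the order of the last three brands and the helper names `symmetrize` and `most_similar_pair` are all made up for this example.

```python
from itertools import combinations

brands = ["Budvar", "Budweiser", "Carlsberg", "Corona", "Heineken"]

# Hypothetical upper-triangular dissimilarities (1 = most similar, 7 = least).
# Illustrative values only; they are NOT the survey data used in this guide.
upper = {
    ("Budvar", "Budweiser"): 6, ("Budvar", "Carlsberg"): 4,
    ("Budvar", "Corona"): 2, ("Budvar", "Heineken"): 5,
    ("Budweiser", "Carlsberg"): 6, ("Budweiser", "Corona"): 7,
    ("Budweiser", "Heineken"): 6, ("Carlsberg", "Corona"): 4,
    ("Carlsberg", "Heineken"): 1, ("Corona", "Heineken"): 5,
}

def symmetrize(names, upper_half):
    """Build the full square matrix: 0 on the diagonal, mirrored elsewhere."""
    full = {a: {b: 0 for b in names} for a in names}
    for (a, b), dissimilarity in upper_half.items():
        full[a][b] = dissimilarity
        full[b][a] = dissimilarity
    return full

def most_similar_pair(upper_half):
    """Return the pair with the smallest dissimilarity rating."""
    return min(upper_half, key=upper_half.get)

matrix = symmetrize(brands, upper)
# Number of ratings each respondent gives: n * (n - 1) / 2 pairs.
n_pairs = len(list(combinations(brands, 2)))
```

With 5 brands each respondent rates 10 pairs; with the nine soft drinks from the earlier example it would be 36.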



CONJOINT ANALYSIS

Used to determine how people value different features that make up an individual product or service.

Let's say I am a shoe manufacturer and there are three important features (attributes):

Material

Durability

Price

Further, we know that there is a range of feasible alternatives (attribute levels) for each of these features, say,

Material    Durability    Price
Leather     1 year        300
Canvas      2 years       1000
Rubber                    2000

Obviously, the market's ideal shoe would be

Leather    2 years    300

And the ideal shoe from my perspective (manufacturer) would be

Rubber    1 year    2000

Here is the basic marketing issue: I'd be stripped naked selling the first shoe whereas the market wouldn't buy the second. So, the most viable product lies somewhere in between. Conjoint analysis lets us find out where.

Now, we would need a survey wherein the respondents are asked to evaluate different product combinations. In the present case, we have 3*2*3 = 18 possible combinations. Generally, it is advisable to bring it down to 10-12 combinations after careful judgment. For now, let us consider all 18 cases.
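The 18 combinations can be listed mechanically. A minimal sketch, assuming the attribute levels from the shoe example (the variable names are only for illustration):

```python
from itertools import product

materials = ["Leather", "Canvas", "Rubber"]
durabilities = ["1 year", "2 years"]
prices = [300, 1000, 2000]

# Every (material, durability, price) profile: 3 * 2 * 3 = 18 of them.
profiles = list(product(materials, durabilities, prices))
```

Each tuple in `profiles` is one product card that a respondent could be asked to score.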



Once you have got the scores/ranking, convert to binary variable format and enter into SPSS.

Below is an example:

For the Material attribute, there are 3 levels, so we use 2 variables.

If both VarCanvas and VarLeather are zero, it implies the material is rubber. Similarly for the other attributes and their respective levels.

Dur2 = 1 implies durability of 2 years.

V2000 and V1000 are two variables for the prices 2000 and 1000 rupees respectively.

Preference = 18 is the highest preference of the customer. So, consider them as scores and not as a ranking from 1 to 18.
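The binary coding described above can be sketched as follows. The function name `dummy_code` is hypothetical; the variable names are the ones used in this example, with Rubber, 1 year and 300 as the implicit (all-zero) levels.

```python
def dummy_code(material, durability, price):
    """Convert one shoe profile into the 0/1 variables entered into SPSS.

    Rubber, "1 year" and 300 are the reference levels, so they get no
    variable of their own and show up as zeros everywhere.
    """
    return {
        "VarCanvas": int(material == "Canvas"),
        "VarLeather": int(material == "Leather"),
        "Dur2": int(durability == "2 years"),
        "V2000": int(price == 2000),
        "V1000": int(price == 1000),
    }

reference_profile = dummy_code("Rubber", "1 year", 300)
leather_profile = dummy_code("Leather", "2 years", 1000)
```

An attribute with k levels needs only k - 1 variables, which is why five columns are enough for 3 + 2 + 3 levels.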

Then, you do a linear regression.



Preference is the dependent variable and the rest are independent variables. Click OK.



The output is as follows:

Interpretation:

Firstly, the validity of the linear regression model has to be checked from the model summary.

R square = 0.89, i.e. the model explains 89% of the variance. In general, 60%+ is considered to be a good fit.

Note on R square:



The regression model on the left accounts for 38.0% of the variance while the one on the

right accounts for 87.4%. The more variance that is accounted for by the regression

model the closer the data points will fall to the fitted regression line. Theoretically, if a

model could explain 100% of the variance, the fitted values would always equal the

observed values and, therefore, all the data points would fall on the fitted regression line.
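To make the note concrete, here is a minimal sketch of how R square is computed from observed and fitted values. The function name `r_squared` and the tiny data sets are made up for illustration.

```python
def r_squared(observed, fitted):
    """Share of the variance in `observed` that the fitted values account for."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_residual / ss_total

# Fitted values equal to the observed ones: every point lies on the line.
perfect = r_squared([1, 2, 3], [1, 2, 3])
# Fitted values stuck at the mean: the model explains nothing.
useless = r_squared([1, 2, 3], [2, 2, 2])
```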

Second, the ANOVA table. In a regression, the null hypothesis of the ANOVA is that all the coefficients are zero, meaning the independent variables taken together explain nothing about the preference scores, which would make the model redundant. So, the level of significance should be < 0.05 so that the null hypothesis is rejected. Here, sig. = 0 from the table.

Third, we calculate the utilities from the coefficients table. The B column gives the observed

utilities. Note that the total utility for any attribute (like material) is equal to zero.

Utility (Canvas) = 1.667

Utility (Leather) = 5.833

Therefore, Utility (Rubber) = -7.50

Similarly for others:

Utility (Dur2) = 6.778, Utility (Dur1) = -6.778

Utility (V2000) = -6.167, Utility (V1000) = -3.833, Utility (V300) = 10

Fourth, we calculate the actual utilities, because SPSS considered the utility of the implicit variables (Rubber, Dur1 and V300) as zero, and it gave the utilities of the explicit variables relative to zero.

So, Utility (Canvas) - Utility (Rubber) = 1.667

And, Utility (Leather) - Utility (Rubber) = 5.833

We already know that Utility (Canvas) + Utility (Leather) + Utility (Rubber) = 0

Solving these three equations, we get the actual utilities.

Material    Utility
Canvas      -0.85
Leather     3.36
Rubber      -2.51

Similarly, do for other two attributes.

Durability    Utility
2 years       3.389
1 year        -3.389

Price    Utility
2000     -2.837
1000     -0.493
300      3.33
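Solving those equations amounts to subtracting the average coefficient of an attribute from each of its levels, with the implicit level entered as 0. A minimal sketch, assuming the B values quoted in this section (the function name `center_utilities` is hypothetical); since the hand-solved figures are rounded, the output may differ from them in the second decimal place.

```python
def center_utilities(coefficients):
    """Shift the utilities of one attribute so that they sum to zero.

    `coefficients` maps each level to its regression B value, with the
    implicit (reference) level entered as 0.0.
    """
    mean = sum(coefficients.values()) / len(coefficients)
    return {level: b - mean for level, b in coefficients.items()}

material = center_utilities({"Canvas": 1.667, "Leather": 5.833, "Rubber": 0.0})
durability = center_utilities({"2 years": 6.778, "1 year": 0.0})
price = center_utilities({2000: -6.167, 1000: -3.833, 300: 0.0})
```

The differences between levels stay exactly as SPSS reported them; only the zero point moves.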

Now, we can rank the different product combinations on the basis of maximum sum of utilities.

The top ten combinations would be


Material    Durability    Price    Utility sum
Leather     2 years       300      10.079
Canvas      2 years       300      5.869
Rubber      2 years       300      4.209
Leather     1 year        300      3.301
Canvas      2 years       1000     2.046
Rubber      2 years       1000     0.386
Canvas      2 years       2000     -0.298
Rubber      2 years       2000     -1.958

This is where I, as a manufacturer, would make a trade-off. I would leave the top 3 preferred products as they would mean only losses for me and maybe go for the 4th or 5th combination. There would be no customers for the last few combinations.
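The ranking can be reproduced by summing the three utilities for every profile. This sketch uses the utility values tabulated in this section; the function name `rank_profiles` is hypothetical. It scores all 18 profiles, so leather shoes at 1000 and 2000 are ranked along with the rest.

```python
from itertools import product

# Zero-centred utilities as tabulated in this section.
material = {"Canvas": -0.85, "Leather": 3.36, "Rubber": -2.51}
durability = {"2 years": 3.389, "1 year": -3.389}
price = {2000: -2.837, 1000: -0.493, 300: 3.33}

def rank_profiles(material, durability, price):
    """Score every profile by its summed utilities, highest first."""
    scored = [
        ((m, d, p), material[m] + durability[d] + price[p])
        for m, d, p in product(material, durability, price)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranking = rank_profiles(material, durability, price)
best_profile, best_score = ranking[0]
worst_profile, worst_score = ranking[-1]
```

The manufacturer's trade-off is then a matter of walking down `ranking` until the profile is one that can be sold at a profit.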



CLUSTER ANALYSIS

Identifying groups of individuals or objects that are similar to each other but different

from individuals in other groups

Each object is assigned to only one cluster

In cluster analysis, there is no a priori information about the group or cluster membership

for any of the objects (In discriminant, we do. Will discuss that in later section.)

Mainly used for understanding buying behaviors and market segmentation.

DURR CASE:

Background - DURR Environmental Controls is a German conglomerate that produces air emission control systems and has extensive industrial operations in the US. The company is considering introducing one or more offers in the US market and believes that its product will need lower service costs than its competitors' products.

Our objective is to propose a market segmentation, allowing DURR to target customers with a specific and efficient sales pitch.

MARKET AND COMPETITION:

While choosing a product in the market, customers look at four dimensions:

Efficiency

Delivery Time

Price

Delivery Terms

Each of these dimensions has 4 levels.

Now, we do analysis using SPSS. We have 3 types of clustering: Hierarchical, K-means and two

step. We have studied only the first two methods.

For K-means, we need to know how many clusters we want. So first, we use hierarchical cluster

method.



We use squared Euclidean distance measurements

Also, check Dendrogram under the Plots section and run the program.



The proximity matrix gives the squared Euclidean distances (I'll call it SED) between 2 companies across the 16 variables. For eg., the SED between 8 and 9 is 235.

How is this obtained? You remember how we used to find the distance between two points in co-ordinate geometry? That was Euclidean distance too.

Square that distance c and you get the SED. So, consider company 8 and company 9 as two points and the variable values as their co-ordinates.

Going back to the DURR data sheet, we get the SED between 8 and 9 as

(19-17)^2 + (5-7)^2 + (3-4)^2 + (0-0)^2 + (19-20)^2 + ... = 235
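The same calculation as a small sketch. Only the first five terms of the worked sum are used, and reading the second and third terms as 5-7 and 3-4 is an assumption; the remaining 11 variables are not reproduced in this guide, so the code gives a partial sum rather than the full 235.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two cases, variable by variable."""
    if len(a) != len(b):
        raise ValueError("both cases need the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))

# First five of the 16 variables for companies 8 and 9, as read off the
# worked sum (assumed values; the other 11 are not shown in the text).
company_8 = [19, 5, 3, 0, 19]
company_9 = [17, 7, 4, 0, 20]
partial_sed = squared_euclidean(company_8, company_9)

# The familiar 3-4-5 triangle: distance c = 5, so the SED is 25.
triangle_sed = squared_euclidean([0, 0], [3, 4])
```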

Now, we know how all the possible SEDs are calculated. SPSS then arranges in increasing order

of the distances between the pairs and puts them in a cluster step by step. This is when you study

the agglomeration schedule table. It lists the 30 least distances. As there are 31 companies, we



You go on doing this till stage 8, then a company re-appears for the second time. This is where

the right half of the table helps.

At stage 9, in the right half, it says that company 8 had already appeared in stage 4. This means

you are going to put company 3 into the STAGE 4 box and then rename it to STAGE 9 now.

The number of boxes (or clusters) goes on reducing as you go further. In stage 14, for instance,

boxes 7 and 8 are merged to form a bigger box 14.

The dendrogram also depicts this process. Take a moment and you'll understand the figure. The plus signs are the points where the merge happens. (Some SPSS outputs don't give dotted lines or plus symbols, but they are also quite simple to read.)



Now, you'll have to make a call on the number of clusters to keep. Look at the agglomeration schedule and see where the maximum jump in distances happens. In class, Mala mentioned it as stage 29 (a jump of 1000), so we should keep 3 clusters. But if you see the dendrogram, the three clusters would have 21, 1 and 9 companies, which is not good segmentation. (Company 14 appears for the first time in stage 29.) So, you'll have to come down to two clusters with 22 and 9 companies.
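The "maximum jump" rule can be sketched as follows. The coefficients are invented so that the biggest jump (1000) falls at stage 29, as in the class example; they are not the DURR output, and the function name is hypothetical. After stage s there are n - s clusters left, so stopping just before the jump at stage 29 leaves 31 - 28 = 3 clusters.

```python
def clusters_before_largest_jump(coefficients, n_cases):
    """Suggest a cluster count from an agglomeration schedule.

    coefficients[i] is the merge distance at stage i + 1. The function
    finds the largest increase between consecutive stages and returns the
    number of clusters that exist just before that expensive merge.
    """
    jumps = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    last_stage_kept = biggest + 1  # the stage just before the jump
    return n_cases - last_stage_kept

# Hypothetical schedule for 31 companies (30 merge stages).
schedule = [100 + 50 * i for i in range(28)] + [2450, 2900]
suggested = clusters_before_largest_jump(schedule, n_cases=31)
```

As the text notes, this is only a starting point: the sizes of the resulting clusters still have to make sense as segments.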



Now that we know how many clusters we want, we can use K-means cluster analysis. The path

is: Analyze > Classify > K-means cluster

Put the 16 variables and put number of clusters as 2.

Under Save, check cluster membership.

Under options, check ANOVA table and cluster information for each case. Run the program.



In the output, two tables are important: First is the cluster membership table. Cluster membership

will also come in the data sheet in the last column.

Second is the ANOVA table. The ANOVA in itself is not important, i.e. the Sig. values play no role. However, the differences between the F-ratios (F column in the ANOVA table) make it possible to draw general conclusions about the role of the different variables in forming the clusters.



It shows that V11 has the greatest influence in the forming of the clusters and V13 has the least

influence, among the given values.
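For intuition, an F-ratio is the between-cluster mean square divided by the within-cluster mean square. The sketch here recomputes it for V11 and V13 from the cluster means, sizes and standard deviations listed in the means comparison table of this section. SPSS's own F values are not reproduced in this text, so treat the results as indicative only; the function name `f_ratio` is hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F from summary statistics.

    groups: list of (mean, n, sample_sd) tuples, one per cluster.
    """
    total_n = sum(n for _, n, _ in groups)
    grand_mean = sum(mean * n for mean, n, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, n, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, n, sd in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (total_n - len(groups))
    return ms_between / ms_within

# (mean, N, std. deviation) for cluster 1 and cluster 2.
f_v11 = f_ratio([(24.33, 9, 10.817), (7.77, 22, 4.985)])
f_v13 = f_ratio([(0.33, 9, 1.000), (0.09, 22, 0.426)])
```

A large F means the cluster means are far apart relative to the spread inside each cluster, which is why V11 separates the two clusters much better than V13.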

Now, we do a means comparison between the two clusters.



Cluster Number of Case

                                   Cluster 1                   Cluster 2                   Total
Variable                           Mean     N    Std. Dev.     Mean     N    Std. Dev.     Mean     N    Std. Dev.
Exceeds 9%                         15.56    9    5.028         32.14    22   12.552        27.32    31   13.250
Exceeds 5%                         6.11     9    2.667         21.27    22   9.867         16.87    31   10.908
Meets specifications               4.22     9    2.819         10.05    22   8.375         8.35     31   7.644
Short by 5%                        .00      9    .000          .00      22   .000          .00      31   .000
6 months                           17.33    9    4.975         36.82    22   11.283        31.16    31   13.287
9 months                           10.11    9    4.226         22.77    22   10.099        19.10    31   10.502
12 months                          6.44     9    3.909         10.32    22   5.801         9.19     31   5.552
15 months                          .00      9    .000          .00      22   .000          .00      31   .000
V10                                32.33    9    14.018        12.95    22   6.191         18.58    31   12.617
V11                                24.33    9    10.817        7.77     22   4.985         12.58    31   10.343
V12                                12.33    9    7.089         3.95     22   2.820         6.39     31   5.823
V13                                .33      9    1.000         .09      22   .426          .16      31   .638
Installed, with 2-year warranty    29.33    9    11.619        16.77    22   8.799         20.42    31   11.126
Installed, with 1-year warranty    19.56    9    11.706        11.59    22   6.659         13.90    31   9.005
Installed, with service contract   4.33     9    3.775         6.09     22   3.663         5.58     31   3.722
FOB, with service contract         .00      9    .000          .00      22   .000          .00      31   .000
Sales$_2004                        34.133   9    41.9710       4.464    22   3.8761        13.077   31   25.8396
Profit%                            9.611    9    11.8505       3.764    22   6.2700        5.461    31   8.4999
Return_on_Equity                   19.467   9    13.6689       17.714   22   17.0450       18.223   31   15.9327
Employees                          56.33    9    67.050        14.11    22   15.130        26.37    31   41.697
SalesGrowth_(2003-2004)            9.467    9    12.0271       20.164   22   24.7709       17.058   31   22.1913
TopMgt                             18.89    9    2.713         35.77    22   2.617         30.87    31   8.213
Engineering                        20.11    9    3.296         40.59    22   2.631         34.65    31   9.851
Finance                            28.89    9    3.180         10.09    22   2.759         15.55    31   9.124
Purchasing                         32.11    9    3.983         13.64    22   5.019         19.00    31   9.723
Growth                             21.89    9    2.315         9.59     22   2.594         13.16    31   6.192
Profit                             28.33    9    2.828         20.77    22   2.159         22.97    31   4.191
MarketShare                        14.22    9    2.819         10.36    22   3.170         11.48    31   3.511
TechLeadership                     15.78    9    2.489         8.77     22   3.023         10.81    31   4.301
CorpCitEnv                         6.00     9    3.000         25.55    22   3.143         19.87    31   9.521
GovReg                             13.56    9    7.452         25.18    22   6.382         21.81    31   8.491



It can be noticed that cluster 1 has high mean values (cluster centers) for price variable

and warranty variable. So, they are price sensitive and prefer good service.

Cluster 2 has high values for efficiency and delivery time variables. So, they want

efficient products and quick delivery.

You can write more gas on the company's strategies using the above two points.



FACTOR ANALYSIS

This is slightly complicated to explain in writing and requires a lot of time.

There is a series of 6 videos of less than 40 min in total by Dr. Dawg (Yeah, I know).

He explains the basics pretty well. Here's a link to the first video:

https://www.youtube.com/watch?v=MB-5WB3eZI8

You can find the remaining 5 videos on your own from there. Watch all 6 to get a hold on factor analysis, and then read a bit on the net or in the text book.



BINARY LOGISTIC REGRESSION

You can use binary logistic regression to predict whether you'll pass the upcoming MR examination or not, based on study time, test anxiety and lecture attendance.

BLR predicts the probability that an observation falls into one of two categories of a

dichotomous dependent variable based on one or more independent variables that can be either

continuous or categorical.

First, let's understand the different types of variables. Broadly there are two types: Categorical and Continuous.

Categorical variables can be divided further into nominal, ordinal or dichotomous.

Nominal - 2 or more categories with no intrinsic order. Eg: Types of property - Houses, co-ops or bungalows; Occupation - Doctor, Engineer, Artist

Ordinal - Similar to nominal, but here, the categories can be ordered and ranked. So if you asked someone if they liked the concept of the ALS ice bucket challenge and they could answer "Not very much", "It is Ok" or "Yes, a lot", then it is ordinal. You can rank these three as least positive, middle response and most positive respectively.

Dichotomous - Nominal variable with only 2 categories. Eg: Gender, or property divided into two segments only (Residential and commercial)

Continuous variables are also known as quantitative variables. They are further divided into interval or ratio variables.

Interval - Can be measured along a continuum and they have a numerical value. Eg. Temperature in degree Celsius

Ratio - Interval variables with the added condition that zero of the measurement indicates that there is none of the variable. Eg. Height, mass, distance

For BLR, dependent variable must be dichotomous with the 2 categories being mutually

exclusive and exhaustive, while independent variables can be continuous or categorical.

How to do in SPSS?



Consider the example from our final project. We wanted to predict whether the customer shopped online, based on their age, gender and expenditure per month.

Therefore, age, gender and expenditure become our independent variables while "online" becomes the dependent variable. The responses to online shopping were coded as:

1 = No, I don't shop online

2 = Yes, I shop online

The SPSS output will model the changes in the likelihood of online shopping, as it has the higher coding value.

We had 100 respondents. Part of the survey looked like this:

Gender    Age         Expenditure on clothing per month    Online shopping?
Female    26 to 30    Less than 1000                       Yes
Male      26 to 30    1000-2000                            No
Female    22 to 25    1000-2000                            No
Male      22 to 25    1000-2000                            No
Male      26 to 30    2000-5000                            Yes
Male      26 to 30    2000-5000                            Yes
Female    26 to 30    2000-5000                            No

Code it suitably to use in SPSS
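One possible coding, as a sketch. Gender (female 0, male 1) and the online shopping codes (1 = No, 2 = Yes) follow this guide; the numeric codes for the age and expenditure bands are assumptions, and the real survey may have had more bands than the ones visible in the sample rows.

```python
GENDER = {"Female": 0, "Male": 1}
ONLINE = {"No": 1, "Yes": 2}
# Assumed ordinal codes: a higher code means an older / higher-spending band.
AGE = {"22 to 25": 1, "26 to 30": 2}
EXPENDITURE = {"Less than 1000": 1, "1000-2000": 2, "2000-5000": 3}

def code_row(gender, age, expenditure, online):
    """Turn one survey response into the numbers typed into SPSS."""
    return (GENDER[gender], AGE[age], EXPENDITURE[expenditure], ONLINE[online])

responses = [
    ("Female", "26 to 30", "Less than 1000", "Yes"),
    ("Male", "26 to 30", "1000-2000", "No"),
]
coded = [code_row(*response) for response in responses]
```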



Go to binary logistic regression

Put online into dependent variable box and the other three into covariates box.



Under options, choose the three plots and continue.



Basically, the regression develops an equation of the kind -

ln(odds of online shopping) = b1 * age + b2 * gender + b3 * expenditure + constant (intercept)

If no model is used, it just uses the intercept to explain the behavior. So the sig. value under

coefficient for the model has to be



                                Predicted Online
Observed                        1      2      Percentage Correct
Step 1    Online 1              16     23     41.0
          Online 2              7      54     88.5
          Overall Percentage                  70.0
a. The cut value is .500
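The percentages in this table follow directly from the four counts. A minimal sketch (the function name `classification_summary` is hypothetical), reading each row as the observed group and each column as the predicted group:

```python
def classification_summary(table):
    """table[i][j] = number of cases observed in class i, predicted as class j."""
    per_class = [100 * row[i] / sum(row) for i, row in enumerate(table)]
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return per_class, 100 * correct / total

# Rows: observed 1 (No), observed 2 (Yes); columns: predicted 1, predicted 2.
per_class, overall = classification_summary([[16, 23], [7, 54]])
```

So the model is much better at spotting online shoppers (54 of 61) than non-shoppers (16 of 39).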

Variables in the Equation

                         B        S.E.     Wald     df    Sig.    Exp(B)
Step 1a    Gender        .767     .433     3.133    1     .077    2.153
           Age           -.328    .278     1.397    1     .237    .720
           Expenditure   .599     .263     5.173    1     .023    1.820
           Constant      -.817    1.211    .456     1     .500    .442
a. Variable(s) entered on step 1: Gender, Age, Expenditure.

Examine the standard errors for the b coefficients. A standard error larger than 2.0 indicates

numerical problems. Analyses that indicate numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard error larger than 2.0.

Also, from the level of significance, expenditure (p = 0.023) contributed significantly to the model, unlike age (p = 0.237) and gender (p = 0.077). The variable with the lowest significance value contributes the most to the equation.

We can observe that the logistic coefficient is highest for gender and least for age. Consider the Exp(B) column:

A unit increase in gender multiplies the odds of online shopping by 2.153. Since male was coded as 1 and female as 0, the odds were more in favor of the males.

A unit increase in age multiplies the odds of online shopping by 0.72, which means every unit increase in age decreases the odds of online shopping by 28%. In our case, a unit increase in age is an increase of about 5 years, as we have used age bands (intervals) for age.

A unit increase in expenditure multiplies the odds of online shopping by 1.82.
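Exp(B) is simply e raised to the B coefficient, and a predicted probability comes from passing the linear equation through the logistic function. A small sketch using the B values from the table; the function name `probability` and the example inputs (age band 2, expenditure band 3) are assumptions for illustration.

```python
import math

coefficients = {"Gender": 0.767, "Age": -0.328, "Expenditure": 0.599}
constant = -0.817

# Odds ratio per unit increase of each variable: Exp(B) = e ** B.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Percentage change in the odds per unit increase (negative = decrease).
percent_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}

def probability(gender, age, expenditure):
    """Predicted probability of shopping online for coded inputs."""
    log_odds = (constant
                + coefficients["Gender"] * gender
                + coefficients["Age"] * age
                + coefficients["Expenditure"] * expenditure)
    return 1 / (1 + math.exp(-log_odds))

p_male = probability(gender=1, age=2, expenditure=3)
p_female = probability(gender=0, age=2, expenditure=3)
```

With the cut value of .500, a case is classified as an online shopper when this probability is 0.5 or more.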



Now, create/use your own data or get data from the net and try Binary Logistic regression.

DISCRIMINANT ANALYSIS

Explained in the PPT attached along with this document. 122 slides, but most of them are

screenshot images and I guess only the first half of the PPT is required for us i.e. one example of

discriminant analysis should be enough to understand.
