8/11/2019 Market Research Doc
1/38
MARKET RESEARCH
Teaching you how to fish
Why, you ask?
Because I have this burning desire inside of me to give gyaan to people who don't need it.
Also because it is saddening to see the one sensible marketing subject detested by the majority.
And because Mala is so awesome.
To the best of my knowledge, this guide is accurate and should suffice for the end term
examination. It took me two days to do this. An invaluable learning experience for me as well.
However, I am not claiming it to be perfect. So, if you have any corrections or additions to make
to this document, please drop a mail to [email protected].
Cheers.
Oh, I almost forgot.
If you find this useful, do share it with others who might need this as well. There is no point in
holding onto information just to score more marks. That A grade in the mark sheet will take you only so far.
*Hardcore open source fan*
Contents
MULTI-DIMENSIONAL SCALING
CONJOINT ANALYSIS
CLUSTER ANALYSIS
FACTOR ANALYSIS
BINARY LOGISTIC REGRESSION
DISCRIMINANT ANALYSIS
MULTI-DIMENSIONAL SCALING
MDS allows the perceptions and preferences of the consumers to be clearly represented in a
spatial map. It gives quantitative estimates of similarity between groups of items.
Let's say that, in a survey, the respondents were asked to rate all possible pairs of nine
soft drinks in terms of their similarity, and MDS results in the following spatial representation of the soft drinks:
From this it can be inferred that Coke and Pepsi are the most similar, as there is the least distance
between them, while Dr Pepper and Diet 7-up (or Diet Coke and Tango) are the most dissimilar.
The catch is that MDS doesn't tell you what the horizontal and vertical axes represent. These depend on the judgment of the researcher. For instance, the vertical axis could be "dietness" and
the horizontal axis could be flavor. Or it could be price, color, soda amount etc.
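To get a feel for how much rating work "all possible pairs" means, here is a small sketch that counts the pairs for nine drinks. Only six brand names appear in the text; the last three entries are placeholders added for illustration.

```python
from itertools import combinations

# Six names come from the text; "Brand G/H/I" are hypothetical placeholders.
brands = ["Coke", "Pepsi", "Dr Pepper", "Diet 7-up", "Diet Coke", "Tango",
          "Brand G", "Brand H", "Brand I"]

# Every unordered pair is rated once for similarity.
pairs = list(combinations(brands, 2))
print(len(pairs))  # 9 * 8 / 2 = 36 pairs per respondent
```

So each respondent gives 36 similarity ratings, which is why MDS surveys get long quickly as brands are added.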
Other applications of MDS would be
Market segmentation - Group customers with similar interests
New product development - Where would you want your new soft drink to be placed in
the above map?
Pricing
Choice of retail outlets - Where and which channel?
Advertising - Which tagline or actor is suitable for different brands/products?
Eg. For advertising: Suppose there are four categories and four actors
Categories            Actors
1. Telecom            1. Shahrukh Khan
2. Fairness creams    2. Uday Chopra
3. Tourism            3. John Abraham
4. Viagra             4. Abhishek Bachchan
The respondents can be asked to rate individual pairs on a Likert scale of 1-7 (1 = good match, 7 =
not a good match) to see which actor might be suitable for a category:
So, this shows that Abhishek Bachchan would be most suitable for Telecom ads (this is a
random figure; we know that it's not true in reality), and none of the four are close enough to
Viagra. Probably because they don't have what it takes to represent that category.
How to do MDS in SPSS?
To obtain a spatial map for 5 brands of beer:
1. Obtain the data from respondents on a Likert scale of 1-5 or 1-7, with 1 = most similar and 5 (or 7) =
least similar.
2. Here row 1 represents Budvar, row 2 Budweiser and so on. So, they'll have 0 (or 1) against
themselves. Only the upper triangle of the data matrix suffices here. There might be some special cases where respondents may say New Zealand is similar to Australia,
but Australia is not similar to New Zealand. Then, you will need the complete matrix of data to analyze.
3. Now, there are two methods of MDS - PROXSCAL and ALSCAL.
PROXSCAL is an update in SPSS after ALSCAL was found to be inefficient. Apparently, we were taught ALSCAL, which I am not comfortable with. So you may
figure it out on your own once you have gone through the PROXSCAL method.
4. Go to Analyze > Scale > MDS [PROXSCAL]. Click on Define in the next window that
appears.
5. Put all 5 brands of beer under proximities.
6. In Model, choose upper-triangular matrix and dimensions as 1 to 4, as we have 5 brands.
We need to see the stress plot and reduce the dimensions later.
7. Go to plots and check stress plot and uncheck common space for now.
8. Run it. The scree plot shows the elbow formed at dimension 2. After that, the stress
doesn't reduce by a significant amount as the dimensions increase. Our objective is to
keep both the dimensions and the stress to a minimum. So, we have to make a call at some point to
balance the two.
9. Now, run the MDS PROXSCAL all over again, but this time, change the dimensions
range to min 2, max 2 and check the common space plots.
10. You'll get the above result. It can be seen that Budweiser is the most unique among the 5.
Heineken and Carlsberg, and Corona and Budvar, are similar pairs. It is hard to name the dimensions with such a small sample. It could be aftertaste, degree of high that you get,
etc. The dimensions can also be judged by taking a ranking of beer attributes in a separate
survey.
11. Validity: Stress values are indicative of the quality of MDS solutions. In general, the following is the recommendation for stress values:
In our output, it is 2.79% (0.02796), which makes it an excellent fit.
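If you want to see what the upper-triangular input and the stress number mean outside SPSS, here is a minimal sketch. It mirrors an upper triangle into a full symmetric matrix and computes Kruskal's stress-1 for a given 2-D map. The numbers are made up (not the beer data), the function names are hypothetical, and PROXSCAL also transforms the proximities before reporting stress, so its figures will not match this raw version exactly.

```python
import math

def symmetrize(upper):
    """Mirror the upper triangle of a square matrix onto the lower triangle."""
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = float(upper[i][j])
    return full

def stress1(dissim, coords):
    """Kruskal's stress-1: mismatch between dissimilarities and map distances."""
    num = den = 0.0
    n = len(dissim)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])   # distance on the spatial map
            num += (dissim[i][j] - d) ** 2
            den += d ** 2
    return math.sqrt(num / den)

# Made-up example: three items whose map reproduces the ratings exactly.
coords = [(0, 0), (3, 0), (3, 4)]
perfect = symmetrize([[0, 3, 5], [0, 0, 4], [0, 0, 0]])
noisy = symmetrize([[0, 3, 5], [0, 0, 5], [0, 0, 0]])

print(stress1(perfect, coords))  # 0.0, a perfect fit
print(stress1(noisy, coords))    # larger than 0 once a rating disagrees with the map
```

A stress of 0 means the map distances reproduce the ratings perfectly; the further the map is from the ratings, the larger the value.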
CONJOINT ANALYSIS
Used to determine how people value the different features that make up an individual product or service.
Let's say I am a shoe manufacturer and there are three important features (attributes):
Material
Durability
Price
Further, we know that there is a range of feasible alternatives (attribute levels) for each of these
features, say,
Material    Durability    Price
Leather     1 year        300
Canvas      2 years       1000
Rubber                    2000
Obviously, the market's ideal shoe would be
Material Durability Price
Leather 2 years 300
And the ideal shoe from my perspective (manufacturer) would be
Material Durability Price
Rubber 1 year 2000
Here is the basic marketing issue: I'd be stripped naked selling the first shoe, whereas the market
wouldn't buy the second. So, the most viable product lies somewhere in between. Conjoint
analysis lets us find out where.
Now, we would need a survey wherein the respondents are asked to evaluate different product combinations. In the present case, we have 3*2*3 = 18 possible combinations. Generally, it is
advisable to bring it down to 10-12 combinations after careful judgment. For now, let us consider
all 18 cases.
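The 18 combinations can be listed mechanically. A minimal sketch, using the attribute levels from the table above (the variable names are just for illustration):

```python
from itertools import product

materials = ["Leather", "Canvas", "Rubber"]
durabilities = ["1 year", "2 years"]
prices = [300, 1000, 2000]

# Full factorial design: every material x durability x price combination.
profiles = list(product(materials, durabilities, prices))
print(len(profiles))  # 3 * 2 * 3 = 18
for profile in profiles[:3]:
    print(profile)
```

Each tuple is one product profile that a respondent would be asked to score.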
Once you have got the scores/ranking, convert to binary variable format and enter into SPSS.
Below is an example:
For the Material attribute, there are 3 levels, so we use 2 variables.
If both VarCanvas and VarLeather are zero, it implies the material is rubber. Similarly for the other attributes and their respective levels.
Dur2 = 1 implies durability of 2 years.
V2000 and V1000 are two variables for the prices 2000 and 1000 rupees respectively.
Preference = 18 is the highest preference of the customer. So, consider the values as scores and not as a ranking from 1 to 18.
Then, you do a linear regression.
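The binary (dummy) coding described above can be sketched as a small function. The variable names VarCanvas, VarLeather, Dur2, V2000 and V1000 are the ones used in the text; the function name `encode` is made up for this example.

```python
def encode(material, durability, price):
    """Return the five dummy variables for one shoe profile.

    Rubber, 1 year and 300 are the implicit (reference) levels,
    so they are represented by all of their dummies being 0.
    """
    return {
        "VarCanvas": int(material == "Canvas"),
        "VarLeather": int(material == "Leather"),
        "Dur2": int(durability == "2 years"),
        "V2000": int(price == 2000),
        "V1000": int(price == 1000),
    }

print(encode("Leather", "2 years", 1000))
print(encode("Rubber", "1 year", 300))   # every dummy is 0 for the reference profile
```

An attribute with k levels needs only k - 1 dummies, which is why the 3 + 2 + 3 levels turn into 2 + 1 + 2 = 5 columns.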
Preference is the dependent variable and the rest are independent variables. Click OK.
The output is as follows:
Interpretation:
Firstly, the validity of the linear regression model has to be checked from the model summary.
R square = 0.89, i.e. the model explains 89% of the variance. In general, 60%+ is considered to be
a good fit.
Note on R square:
The regression model on the left accounts for 38.0% of the variance while the one on the
right accounts for 87.4%. The more variance that is accounted for by the regression
model the closer the data points will fall to the fitted regression line. Theoretically, if a
model could explain 100% of the variance, the fitted values would always equal the
observed values and, therefore, all the data points would fall on the fitted regression line.
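For reference, the usual textbook definition of R square can be written out as below. The notation is an assumption introduced here, not SPSS's: y_i are the observed preference scores, \hat{y}_i the fitted values and \bar{y} the mean score.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

If every fitted value equals the observed value, the numerator is zero and R square is 1, which is the 100% case described above.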
Second, the ANOVA table. The null hypothesis of the ANOVA F-test here is that all the regression
coefficients are zero, meaning that the independent variables taken together explain none of the
variation in preference, which would make the model redundant. So, the level of significance should be < 0.05 so
that the null hypothesis is rejected. Here, sig. = 0 from the table (SPSS shows three decimals, so read it as p < 0.001).
Third, we calculate the utilities from the coefficients table. The B column gives the observed
utilities. Note that the total utility for any attribute (like material) is equal to zero.
Utility (Canvas) = 1.667
Utility (Leather) = 5.833
Therefore, Utility (Rubber) = -7.50
Similarly for others:
Utility (Dur2) = 6.778, Utility (Dur1) = -6.778
Utility (V2000) = -6.167, Utility (V1000) = -3.833, Utility (V300) = 10
Fourth, we calculate the actual utilities, because SPSS considered the utility of the implicit
variables (Rubber, Dur1 and V300) as zero and gave the utilities of the explicit variables
relative to zero.
So, Utility (Canvas) - Utility (Rubber) = 1.667
And, Utility (Leather) - Utility (Rubber) = 5.833
We already know that Utility (Canvas) + Utility (Leather) + Utility (Rubber) = 0
Solving these three equations, we get the actual utilities.
Material Utility
Canvas -0.85
Leather 3.36
Rubber -2.51
Similarly, do for other two attributes.
Durability Utility
2 years 3.389
1 year -3.389
Price Utility
2000 -2.837
1000 -0.493
300 3.33
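Solving the three equations amounts to subtracting the attribute's mean coefficient from every level, with the implicit level entered as 0. A minimal sketch using the B values quoted in the text (the function name `center` is made up). The results agree with the tables above to within about 0.03; the small gaps presumably come from rounding in the quoted coefficients.

```python
def center(coeffs):
    """Shift dummy-coded coefficients so one attribute's levels sum to zero."""
    mean = sum(coeffs.values()) / len(coeffs)
    return {level: round(b - mean, 3) for level, b in coeffs.items()}

# B coefficients as quoted in the text; the implicit (reference) level is 0.
material = center({"Canvas": 1.667, "Leather": 5.833, "Rubber": 0.0})
durability = center({"2 years": 6.778, "1 year": 0.0})
price = center({2000: -6.167, 1000: -3.833, 300: 0.0})

print(material)
print(durability)
print(price)
```

The differences between levels stay exactly as SPSS reported them; only the zero point moves so that each attribute's utilities sum to zero.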
Now, we can rank the different product combinations on the basis of maximum sum of utilities.
The top ten combinations would be
Material    Durability    Price    Utility sum
Leather     2 years       300      10.079
Leather     2 years       1000     6.256
Canvas      2 years       300      5.869
Rubber      2 years       300      4.209
Leather     2 years       2000     3.912
Leather     1 year        300      3.301
Canvas      2 years       1000     2.046
Rubber      2 years       1000     0.386
Canvas      2 years       2000     -0.298
Leather     1 year        1000     -0.522
This is where I, as a manufacturer, would make a trade-off. I would leave the top 3 preferred products, as they would mean only losses for me, and maybe go for the 4th or 5th combination.
There would be no customers for the last few combinations.
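The ranking can be reproduced by summing the actual utilities for all 18 combinations and sorting. A minimal sketch, using the utility values from the three tables above (the variable names are for illustration only):

```python
from itertools import product

# Actual utilities as listed in the tables above.
material = {"Canvas": -0.85, "Leather": 3.36, "Rubber": -2.51}
durability = {"2 years": 3.389, "1 year": -3.389}
price = {2000: -2.837, 1000: -0.493, 300: 3.33}

# Total utility of a profile = sum of the utilities of its three levels.
ranked = sorted(
    ((round(material[m] + durability[d] + price[p], 3), m, d, p)
     for m, d, p in product(material, durability, price)),
    reverse=True,
)

for total, m, d, p in ranked[:10]:
    print(f"{m:8} {d:8} {p:5} {total:7.3f}")
```

Running it for all 18 profiles, rather than eyeballing, guards against skipping a combination when building the top-ten list by hand.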
CLUSTER ANALYSIS
Identifying groups of individuals or objects that are similar to each other but different
from individuals in other groups
Each object is assigned to only one cluster
In cluster analysis, there is no a priori information about the group or cluster membership
for any of the objects (In discriminant, we do. Will discuss that in later section.)
Mainly used for understanding buying behaviors and market segmentation.
DURR CASE:
Background - DURR Environmental Controls is a German conglomerate that produces air emission
control systems and has extensive industrial operations in the US. The company is considering introducing one or more offers in the US market and believes that its product will
need lower service costs than its competitors' products.
Our objective is to propose a market segmentation, allowing DURR to target customers with
a specific and efficient sales pitch.
MARKET AND COMPETITION:
While choosing a product in the market, customers look at four dimensions:
Efficiency
Delivery Time
Price
Delivery Terms
Each of these dimensions has 4 sub-levels.
Now, we do analysis using SPSS. We have 3 types of clustering: Hierarchical, K-means and two
step. We have studied only the first two methods.
For K-means, we need to know how many clusters we want. So first, we use hierarchical cluster
method.
We use squared Euclidean distance as the measure.
Also, check Dendrogram under the Plots section and run the program.
The proximity matrix gives the squared Euclidean distances (I'll call it SED) between 2
companies across the 16 variables. For example, the SED between 8 and 9 is 235.
How is this obtained? You remember how we used to find the distance between two points in
co-ordinate geometry? That was Euclidean distance too.
Square c (the distance) and you get SED. So, consider company 8 and company 9 as two points and the
variable values as their co-ordinates.
Going back to the DURR data sheet, we get the SED between 8 and 9 as
(19-17)^2 + (5-7)^2 + (3-4)^2 + (0-0)^2 + (19-20)^2 + ... = 235
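The same calculation as a small function. Only the first five terms of the sum are visible in the text, so the two vectors below hold just those five values, read as (19-17), (5-7), (3-4), (0-0) and (19-20); the full 16-variable rows would be needed to reach 235.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two equally long lists of values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# First five variable values of the two companies, as read from the sum above.
first_five_a = [19, 5, 3, 0, 19]
first_five_b = [17, 7, 4, 0, 20]

print(squared_euclidean(first_five_a, first_five_b))  # 4 + 4 + 1 + 0 + 1 = 10
```

Because the differences are squared, the order of the two companies does not matter, which is why only half of the proximity matrix carries information.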
Now, we know how all the possible SEDs are calculated. SPSS then arranges in increasing order
of the distances between the pairs and puts them in a cluster step by step. This is when you study
the agglomeration schedule table. It lists the 30 least distances. As there are 31 companies, we
You go on doing this till stage 8, then a company re-appears for the second time. This is where
the right half of the table helps.
At stage 9, in the right half, it says that company 8 had already appeared in stage 4. This means
you are going to put company 3 into the STAGE 4 box and then rename it to STAGE 9 now.
The number of boxes (or clusters) goes on reducing as you go further. In stage 14, for instance,
boxes 7 and 8 are merged to form a bigger box 14.
The dendrogram also depicts this process. Take a moment and you'll understand the figure. The
plus signs are the points where the merge happens. (Some SPSS outputs don't give dotted lines
or plus symbols, but they are also quite simple to read.)
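The box-merging process can be imitated in a few lines. This is a toy sketch on five made-up points, not the DURR data. It assumes average (between-groups) linkage on squared Euclidean distances and that a merged box keeps the lower of the two labels; treat both as assumptions about the SPSS settings rather than a statement of what was used in class.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(points):
    """Average-linkage clustering; returns (stage, box A, box B, coefficient) rows."""
    clusters = {i + 1: [i] for i in range(len(points))}   # label -> member indices
    schedule = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for pos, a in enumerate(labels):
            for b in labels[pos + 1:]:
                dists = [squared_euclidean(points[i], points[j])
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        coeff, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)   # merged box keeps the lower label
        schedule.append((len(schedule) + 1, a, b, coeff))
    return schedule

points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]   # made-up data
schedule = agglomerate(points)
for row in schedule:
    print(row)
```

With 5 cases there are 4 stages, just as 31 companies give 30 stages. The last merge here has a much larger coefficient than the ones before it, which is the kind of jump to look for in the schedule.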
Now, you'll have to make a call on the number of clusters to keep. Look at the agglomeration
schedule and see where the maximum jump in distances happens. In class, Mala mentioned it as
stage 29 (a jump of 1000), so we should keep 3 clusters. But if you see the dendrogram, the three clusters would have 21, 1 and 9 companies, which is not good segmentation. (Company 14
appears for the first time in stage 29.) So, you'll have to come down to two clusters with 22 and
9 companies.
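The "largest jump" rule can be written down as a small helper. The coefficients below are invented so that the big jump lands at stage 29 of 30, as in the class example; they are not the DURR output, and the function name is hypothetical.

```python
def clusters_from_largest_jump(coefficients):
    """coefficients[k] is the agglomeration coefficient at stage k + 1."""
    n_cases = len(coefficients) + 1
    jumps = [b - a for a, b in zip(coefficients, coefficients[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    last_cheap_stage = biggest + 1        # stop just before the expensive merge
    return n_cases - last_cheap_stage     # every stage removes one cluster

# Invented schedule for 31 cases: small steps, then a jump of 1000 at stage 29.
coefficients = list(range(1, 29)) + [1028, 1100]
print(clusters_from_largest_jump(coefficients))  # 31 - 28 = 3 clusters
```

The rule only gives a starting point. As noted above, a cluster containing a single company is a good reason to override it.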
Now that we know how many clusters we want, we can use K-means cluster analysis. The path
is: Analyze > Classify > K-means cluster
Put the 16 variables and put number of clusters as 2.
Under Save, check cluster membership.
Under options, check ANOVA table and cluster information for each case. Run the program.
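For intuition on what K-means does once the number of clusters is fixed, here is a bare-bones sketch of the standard assign-then-update loop (Lloyd's algorithm) on made-up 2-D points. The starting centers are passed in by hand here; how SPSS chooses its own starting centers is not covered, and the labels are 0-based whereas SPSS numbers clusters from 1.

```python
def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's algorithm starting from the given initial centers."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (squared Euclidean).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(tuple(centers[k]))   # leave an empty cluster's center alone
        if new_centers == [tuple(c) for c in centers]:
            break                                        # nothing moved: converged
        centers = new_centers
    return labels, centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]   # made-up data
labels, centers = kmeans(points, [(1, 1), (8, 8)])
print(labels)
print(centers)
```

The returned labels play the role of the cluster membership column, and the final centers are the cluster means that get compared between clusters.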
In the output, two tables are important: First is the cluster membership table. Cluster membership
will also come in the data sheet in the last column.
Second is the ANOVA table. The ANOVA analysis in itself is not important, i.e. the Sig. values play
no role. However, the differences between the F-ratios (F column in the ANOVA table) make
it possible to draw general conclusions about the role of the different variables in the
forming of the clusters.
It shows that V11 has the greatest influence in the forming of the clusters and V13 has the least
influence, among the given values.
Now, we do a means comparison between the two clusters.
Cluster Number of Case
                                   Cluster 1                    Cluster 2                    Total
Variable                           Mean     N    Std. Dev.      Mean     N    Std. Dev.      Mean     N    Std. Dev.
Exceeds 9%                         15.56    9    5.028          32.14    22   12.552         27.32    31   13.250
Exceeds 5%                         6.11     9    2.667          21.27    22   9.867          16.87    31   10.908
Meets specifications               4.22     9    2.819          10.05    22   8.375          8.35     31   7.644
Short by 5%                        .00      9    .000           .00      22   .000           .00      31   .000
6 months                           17.33    9    4.975          36.82    22   11.283         31.16    31   13.287
9 months                           10.11    9    4.226          22.77    22   10.099         19.10    31   10.502
12 months                          6.44     9    3.909          10.32    22   5.801          9.19     31   5.552
15 months                          .00      9    .000           .00      22   .000           .00      31   .000
V10                                32.33    9    14.018         12.95    22   6.191          18.58    31   12.617
V11                                24.33    9    10.817         7.77     22   4.985          12.58    31   10.343
V12                                12.33    9    7.089          3.95     22   2.820          6.39     31   5.823
V13                                .33      9    1.000          .09      22   .426           .16      31   .638
Installed, with 2-year warranty    29.33    9    11.619         16.77    22   8.799          20.42    31   11.126
Installed, with 1-year warranty    19.56    9    11.706         11.59    22   6.659          13.90    31   9.005
Installed, with service contract   4.33     9    3.775          6.09     22   3.663          5.58     31   3.722
FOB, with service contract         .00      9    .000           .00      22   .000           .00      31   .000
Sales$_2004                        34.133   9    41.9710        4.464    22   3.8761         13.077   31   25.8396
Profit%                            9.611    9    11.8505        3.764    22   6.2700         5.461    31   8.4999
Return_on_Equity                   19.467   9    13.6689        17.714   22   17.0450        18.223   31   15.9327
Employees                          56.33    9    67.050         14.11    22   15.130         26.37    31   41.697
SalesGrowth_(2003-2004)            9.467    9    12.0271        20.164   22   24.7709        17.058   31   22.1913
TopMgt                             18.89    9    2.713          35.77    22   2.617          30.87    31   8.213
Engineering                        20.11    9    3.296          40.59    22   2.631          34.65    31   9.851
Finance                            28.89    9    3.180          10.09    22   2.759          15.55    31   9.124
Purchasing                         32.11    9    3.983          13.64    22   5.019          19.00    31   9.723
Growth                             21.89    9    2.315          9.59     22   2.594          13.16    31   6.192
Profit                             28.33    9    2.828          20.77    22   2.159          22.97    31   4.191
MarketShare                        14.22    9    2.819          10.36    22   3.170          11.48    31   3.511
TechLeadership                     15.78    9    2.489          8.77     22   3.023          10.81    31   4.301
CorpCitEnv                         6.00     9    3.000          25.55    22   3.143          19.87    31   9.521
GovReg                             13.56    9    7.452          25.18    22   6.382          21.81    31   8.491
It can be noticed that cluster 1 has high mean values (cluster centers) for price variable
and warranty variable. So, they are price sensitive and prefer good service.
Cluster 2 has high values for efficiency and delivery time variables. So, they want
efficient products and quick delivery.
You can write more gas on the company's strategies using the above two points.
FACTOR ANALYSIS
This is slightly complicated to explain in writing and requires a lot of time.
There is a series of 6 videos, less than 40 min in total, by Dr. Dawg (yeah, I know).
He explains the basics pretty well. Here's a link to the first video:
https://www.youtube.com/watch?v=MB-5WB3eZI8
You can find the remaining 5 videos on your own from there. You can watch all 6 to get a hold on
factor analysis and then read a bit on the net or in the text book.
BINARY LOGISTIC REGRESSION
You can use binary logistic regression to predict whether you'll pass the upcoming MR
examination or not, based on study time, test anxiety and lecture attendance.
BLR predicts the probability that an observation falls into one of two categories of a
dichotomous dependent variable based on one or more independent variables that can be either
continuous or categorical.
First, let's understand the different types of variables. Broadly there are two types: categorical and continuous.
Categorical variables can be divided further into nominal, ordinal or dichotomous.
Nominal - 2 or more categories with no intrinsic order. Eg: Types of property - houses,
co-ops or bungalows; Occupation - doctor, engineer, artist
Ordinal - Similar to nominal, but here the categories can be ordered and ranked. So if
you asked someone if they liked the concept of the ALS ice bucket challenge and they could
answer "Not very much", "It is OK" or "Yes, a lot", then it is ordinal. You can rank these
three as least positive, middle response and most positive respectively.
Dichotomous - Nominal variable with only 2 categories. Eg: Gender, or property divided
into two segments only (residential and commercial)
Continuous variables are also known as quantitative variables. They are further divided into interval or
ratio variables.
Interval - Can be measured along a continuum and have a numerical value. Eg:
Temperature in degrees Celsius
Ratio - Interval variables with the added condition that a zero on the measurement indicates
that there is none of the variable. Eg: Height, mass, distance
For BLR, dependent variable must be dichotomous with the 2 categories being mutually
exclusive and exhaustive, while independent variables can be continuous or categorical.
How to do in SPSS?
Consider the example from our final project. We wanted to predict whether the customer
shopped online, based on their age, gender and expenditure per month.
Therefore, age, gender and expenditure become our independent variables while "online"
becomes the dependent variable. The responses to online shopping were coded as:
1 = No, I don't shop online
2 = Yes, I shop online
The SPSS output will model the changes in the likelihood of online shopping (code 2), as it has the higher coding value.
We had 100 respondents. Part of the survey looked like this:
Gender    Age         Expenditure on clothing per month    Online shopping?
Female    26 to 30    Less than 1000                       Yes
Male      26 to 30    1000-2000                            No
Female    22 to 25    1000-2000                            No
Male      22 to 25    1000-2000                            No
Male      26 to 30    2000-5000                            Yes
Male      26 to 30    2000-5000                            Yes
Female    26 to 30    2000-5000                            No
Code it suitably to use in SPSS
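One possible way to code the survey answers as numbers is sketched below. The gender codes (female 0, male 1) and the online shopping codes (No 1, Yes 2) come from the text; the codes for the age and expenditure bands are assumptions made for this example, and the survey may well have had more bands than the ones visible in the table.

```python
GENDER = {"Female": 0, "Male": 1}        # as stated in the text
ONLINE = {"No": 1, "Yes": 2}             # as stated in the text
AGE = {"22 to 25": 1, "26 to 30": 2}     # assumed codes, one step per age band
EXPENDITURE = {"Less than 1000": 1, "1000-2000": 2, "2000-5000": 3}   # assumed codes

def code_row(gender, age, expenditure, online):
    """Turn one survey response into the numbers entered in the data sheet."""
    return (GENDER[gender], AGE[age], EXPENDITURE[expenditure], ONLINE[online])

print(code_row("Female", "26 to 30", "Less than 1000", "Yes"))  # (0, 2, 1, 2)
```

Coding the bands as consecutive integers is what makes "a unit increase" in age or expenditure mean "one band higher" when the coefficients are interpreted.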
Go to binary logistic regression
Put online into dependent variable box and the other three into covariates box.
Under options, choose the three plots and continue.
Basically, regression develops an equation of the kind -
Log odds of online shopping = b1 * age + b2 * gender + b3 * expenditure + constant (intercept)
If no model is used, it just uses the intercept to explain the behavior. So the sig. value under
coefficient for the model has to be
Classification Table
                               Predicted Online
Observed                       1      2      Percentage Correct
Step 1   Online         1      16     23     41.0
                        2      7      54     88.5
         Overall Percentage                  70.0
a. The cut value is .500
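The percentages in the classification table can be checked by hand. A minimal sketch, assuming the usual layout in which rows are the observed category and columns the predicted category (the function name is made up):

```python
# observed category -> predicted category -> number of respondents
table = {1: {1: 16, 2: 23}, 2: {1: 7, 2: 54}}

def percent_correct(table):
    """Per-row and overall percentage of respondents classified correctly."""
    rows = {obs: 100 * preds[obs] / sum(preds.values()) for obs, preds in table.items()}
    total = sum(sum(preds.values()) for preds in table.values())
    overall = 100 * sum(table[k][k] for k in table) / total
    return rows, overall

rows, overall = percent_correct(table)
print(round(rows[1], 1), round(rows[2], 1), overall)  # 41.0 88.5 70.0
```

So the model is much better at spotting online shoppers (88.5%) than non-shoppers (41.0%), even though it gets 70 of the 100 respondents right overall.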
Variables in the Equation
                       B       S.E.    Wald    df   Sig.   Exp(B)
Step 1a  Gender        .767    .433    3.133   1    .077   2.153
         Age           -.328   .278    1.397   1    .237   .720
         Expenditure   .599    .263    5.173   1    .023   1.820
         Constant      -.817   1.211   .456    1    .500   .442
a. Variable(s) entered on step 1: Gender, Age, Expenditure.
Examine the standard errors for the b coefficients. A standard error larger than 2.0 indicates
numerical problems. Analyses that indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a standard error larger than 2.0.
Also, from the level of significance, expenditure (p = 0.023) added significantly to the
model, whereas age (p = 0.237) and gender (p = 0.077) did not at the 0.05 level. The lower the Sig. value, the stronger the
evidence that the variable contributes to the equation.
We can observe that the logistic coefficient is highest for gender and lowest for age. Consider the Exp(B) column:
A unit increase in gender multiplies the odds of online shopping by 2.153.
Since male was coded as 1 and female as 0, the odds were more in favor of the males.
A unit increase in age multiplies the odds of online shopping by 0.72, which means
every unit increase in age decreases the odds of online shopping by 28%. In our case, a unit
increase in age is an increase of about 5 years, as we have used age intervals for
age.
A unit increase in expenditure multiplies the odds of online shopping by 1.82.
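The Exp(B) column is simply e raised to the B coefficient, and the fitted equation can be turned into a probability with the logistic function. A minimal sketch using the B values from the table; the example respondent (male, age band 2, expenditure band 3) uses assumed band codes and is purely illustrative.

```python
import math

B = {"Gender": 0.767, "Age": -0.328, "Expenditure": 0.599}
CONSTANT = -0.817

# Odds ratios: how the odds are multiplied for a one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in B.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds x {ratio:.3f} ({(ratio - 1) * 100:+.0f}%)")

def probability(gender, age, expenditure):
    """Predicted probability of online shopping for coded inputs."""
    z = CONSTANT + B["Gender"] * gender + B["Age"] * age + B["Expenditure"] * expenditure
    return 1 / (1 + math.exp(-z))

# Hypothetical respondent: male (1), age band 2, expenditure band 3 (assumed codes).
print(round(probability(1, 2, 3), 2))  # roughly 0.75
```

A probability above the cut value of .500 would be classified as "shops online", which ties this back to the classification table.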
Now, create/use your own data or get data from the net and try Binary Logistic regression.
DISCRIMINANT ANALYSIS
Explained in the PPT attached along with this document. 122 slides, but most of them are
screenshot images and I guess only the first half of the PPT is required for us i.e. one example of
discriminant analysis should be enough to understand.