Tim Bock, Q, Australia Festival of NewMR 2012 – Training Day – Session 1
A Presentation from The Festival of NewMR – Training Day
3 December 2012
All copyright owned by The Future Place and the presenters of the material. For more information about NewMR events visit NewMR.org
An Introduction to Latent Class Analysis for Marketing Segmentation
Tim Bock, Q
www.q-researchsoftware.com tim.bock@q-researchsoftware.com
+61 425 241 989
Overview
• Latent class analysis versus cluster analysis
  – Theoretical difference: probabilities
  – Practical differences:
    • Non-numeric data (e.g., categorical data)
    • Missing values
• Application: what do research buyers want?
  – Missing values
  – Response bias
Latent class analysis turns data into segments
• Worriers: concerned with decay prevention
• Sociables: concerned with tooth colour
• Sensory: concerned with flavour
• Independent: concerned with price
Adapted from: Haley, R. I. (1968). "Benefit Segmentation: A Decision-Oriented Research Tool." Journal of Marketing 32(July): 30-35.
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Cluster Analysis versus Latent Class Analysis for segmentation
• Latent class analysis is theoretically superior
  – Clearly stated assumptions
  – Cluster analysis is inconsistent with elementary laws of probability (in particular, Bayes' Theorem)
• Latent class analysis software is superior
  – Any type of data (via distributional assumptions): Categorical, Conjoint, Choice, MaxDiff, Rankings, etc.
  – "Mixed" data (e.g., categorical and numeric)
  – Missing values
  – Response biases
k-Means Cluster Analysis
[Figure sequence: scatterplots (horizontal axis 0 to 35, vertical axis 0 to 25) illustrating each step]
• Specify number of clusters (k)
• Randomly allocate respondents to clusters
• Compute cluster means
• Allocate respondents to most similar clusters
• Repeat until changes in cluster means are small or non-existent
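The k-means steps listed above can be sketched in Python roughly as follows. This is a minimal illustration rather than the presenter's implementation. It assumes numeric variables, squared Euclidean distance as the measure of similarity, and a fixed random seed. The function name `k_means`, the handling of empty clusters and the sample data are all hypothetical.

```python
import random

def k_means(points, k, max_iter=100, seed=0):
    """Batch k-means on a list of numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Randomly allocate respondents to clusters.
    labels = [rng.randrange(k) for _ in points]
    means = None
    for _ in range(max_iter):
        # Compute cluster means.
        new_means = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption: restart an empty cluster at a randomly chosen respondent.
                members = [rng.choice(points)]
            new_means.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Repeat until the cluster means stop changing.
        if new_means == means:
            break
        means = new_means
        # Allocate each respondent to the most similar (closest) cluster.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, means[c])))
            for p in points
        ]
    return labels, means

# Hypothetical data: two variables per respondent.
points = [(1, 1), (2, 1), (1, 2), (30, 20), (31, 21), (29, 20)]
labels, means = k_means(points, k=2)
print(labels)
print(means)
```

Each respondent ends up in exactly one cluster. No probabilities are involved at any step.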
k-Means Cluster Analysis
• Specify number of clusters (k)
• Randomly allocate respondents to clusters
• Compute cluster means
• Allocate respondents to most similar clusters
• Repeat until changes in cluster means are small or non-existent

Latent Class Analysis
• Specify number of classes (k)
• Randomly allocate respondents to classes
• Compute class parameters*
• Compute probability of each respondent being in each class
• Repeat until changes in class parameters are small or non-existent
• Allocate respondents to classes with highest probabilities

This is a comparison of batch k-means and Latent Class Analysis with an EM Algorithm. See Celeux and Govaert (1991), "Clustering criteria for discrete data and latent class models", Journal of Classification, 8(2) for a more mathematical comparison.
* The class parameters are computed as weighted averages of the segmentation variables, where the weights are the probabilities of each respondent being in each segment.
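The latent class steps above can be sketched as a small EM loop. This is an illustration under stated assumptions, not the algorithm of any particular package. It assumes binary variables (1 = agree, 0 = disagree) that are treated as independent within each class, plus a small smoothing constant that keeps estimated probabilities away from exactly 0 and 1. The function name `latent_class_em` and the sample data are hypothetical.

```python
import random

def latent_class_em(data, k, max_iter=200, tol=1e-6, seed=0):
    """EM for a latent class model with binary variables (1 = agree, 0 = disagree)."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    # Randomly allocate respondents to classes (starting probabilities of 0 or 1).
    post = [[0.0] * k for _ in range(n)]
    for row in post:
        row[rng.randrange(k)] = 1.0
    sizes, probs = None, None
    for _ in range(max_iter):
        # Compute class parameters as probability-weighted averages.
        # The 0.01 and 0.02 terms are an assumed smoothing, not part of the slides.
        weight = [sum(post[i][c] for i in range(n)) for c in range(k)]
        new_probs = [
            [(sum(post[i][c] * data[i][j] for i in range(n)) + 0.01) / (weight[c] + 0.02)
             for j in range(m)]
            for c in range(k)
        ]
        change = None if probs is None else max(
            abs(a - b) for pa, pb in zip(new_probs, probs) for a, b in zip(pa, pb)
        )
        sizes, probs = [w / n for w in weight], new_probs
        # Repeat until changes in class parameters are small.
        if change is not None and change < tol:
            break
        # Compute the probability of each respondent being in each class.
        for i in range(n):
            joint = []
            for c in range(k):
                like = sizes[c]
                for j in range(m):
                    like *= probs[c][j] if data[i][j] == 1 else 1.0 - probs[c][j]
                joint.append(like)
            total = sum(joint)
            post[i] = [x / total for x in joint]
    # Allocate respondents to the class with the highest probability.
    labels = [max(range(k), key=lambda c: row[c]) for row in post]
    return sizes, probs, post, labels

# Hypothetical data: 10 respondents who agree with everything, 7 who disagree.
data = [[1, 1, 1, 1]] * 10 + [[0, 0, 0, 0]] * 7
sizes, probs, post, labels = latent_class_em(data, k=2)
print(labels)
```

The structural difference from k-means is that `post` holds a probability for every respondent and class. Respondents are only allocated to a single class once the loop has finished.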
[Figure: two scatterplots (horizontal axis 0 to 35, vertical axis 0 to 25), titled "Cluster Analysis" and "Latent Class Analysis"]
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Missing values
How many clusters (or classes) can you see in this data?
Missing values and latent class analysis
Class means:
          | A | B | C | D
Cluster 1 | 1 | 2 | 3 | 4
Cluster 2 | 4 | 3 | 2 | 1
Cluster 3 | 1 | 2 | 2 | 1
Missing values and cluster analysis
Cluster means:
          | A | B | C | D
Cluster 1 | 1 | 2 | 3 | 3
Cluster 2 | MISSING | MISSING | MISSING | MISSING
Cluster 3 | 3 | 3 | 2 | 1
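The slides do not spell out how latent class software arrives at class means when respondents have missing values. One common approach, shown here purely as an assumption, is to compute each respondent's class probabilities from only the variables they answered. The sketch below uses hypothetical class sizes and agreement probabilities for agree/disagree data. The function name `class_probabilities` is also hypothetical.

```python
def class_probabilities(answers, sizes, agree_probs):
    """Posterior class probabilities, skipping unanswered (None) variables."""
    joint = []
    for size, probs in zip(sizes, agree_probs):
        like = size
        for answer, p in zip(answers, probs):
            if answer is None:
                continue  # missing value: contributes nothing to the likelihood
            like *= p if answer == "A" else 1.0 - p
        joint.append(like)
    total = sum(joint)
    return [x / total for x in joint]

# Hypothetical parameters for two classes and three agree/disagree variables.
sizes = [0.6, 0.4]
agree_probs = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]]

partial = class_probabilities(["A", None, "A"], sizes, agree_probs)
empty = class_probabilities([None, None, None], sizes, agree_probs)
print(partial)
print(empty)  # with no answers at all, the result falls back to the class sizes
```

Under this assumption a respondent with missing values still contributes to every class in proportion to these probabilities, so no respondent has to be dropped.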
Distributional assumptions
• Basic idea: instruct a latent class model about how to interpret the data
• Categorical assumption: look only at matches
  – Example: respondent 1 is most similar to respondents 2 and 3 (i.e., they match on two variables)
• Numeric assumption: assign values and compute differences (e.g., Agree = 3, Neither = 2, Disagree = 1)
  – Example: respondent 1 is most similar to respondent 3
• Ranking assumption: look at relative order
  – Example: respondent 1 is identical to respondent 4

ID | Variable A | Variable B | Variable C
1 | Agree | Agree | Neither
2 | Agree | Disagree | Neither
3 | Agree | Neither | Neither
4 | Neither | Neither | Disagree
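The three examples can be checked with a short script. It is a sketch of the idea only: real latent class models use probability distributions rather than these direct comparisons. The use of absolute differences for the numeric assumption and the helper names are assumptions made for illustration.

```python
respondents = {
    1: ("Agree", "Agree", "Neither"),
    2: ("Agree", "Disagree", "Neither"),
    3: ("Agree", "Neither", "Neither"),
    4: ("Neither", "Neither", "Disagree"),
}
SCORES = {"Agree": 3, "Neither": 2, "Disagree": 1}

def matches(a, b):
    """Categorical assumption: count variables with identical answers."""
    return sum(x == y for x, y in zip(a, b))

def numeric_distance(a, b):
    """Numeric assumption: sum of absolute differences in assigned values."""
    return sum(abs(SCORES[x] - SCORES[y]) for x, y in zip(a, b))

def relative_order(a):
    """Ranking assumption: keep only the order of the answers (ties allowed)."""
    values = [SCORES[x] for x in a]
    distinct = sorted(set(values), reverse=True)
    return tuple(distinct.index(v) + 1 for v in values)

for other in (2, 3, 4):
    print(
        other,
        matches(respondents[1], respondents[other]),
        numeric_distance(respondents[1], respondents[other]),
        relative_order(respondents[1]) == relative_order(respondents[other]),
    )
```

Respondent 1 matches respondents 2 and 3 on two variables and respondent 4 on none. The numeric distances are 2, 1 and 3, so respondent 3 is closest. Only respondent 4 has the same relative order (A and B tied, both above C).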
Example: Categorical data
Data
• Shop: Agree (A) or disagree (D) that "It is important to shop around"
• Understand: Agree (A) or disagree (D) that "I understand my company's communication needs"
• Key: Agree (A) or disagree (D) that "Communications technology is key to our business"
• Interested: Agree (A) or disagree (D) that "I am interested in communications technology"
• Value: Agree (A) or disagree (D) that "Value for money is more important to us than getting the best technology"

ID | Shop | Understand | Key | Interest | Value
1 | A | A | A | A | D
2 | A | A | A | D | A
3 | A | A | A | A | D
4 | A | A | D | A | A
5 | A | D | A | D | D
6 | D | A | A | A | D
7 | A | D | A | D | D
8 | D | D | A | A | D
9 | A | A | A | A | A
10 | A | A | A | A | D
11 | D | A | D | D | A
12 | A | A | A | A | A
13 | D | D | D | D | D
… | … | … | … | … | …
Latent Class Analysis
• Specify number of classes (k)
• Randomly allocate respondents to classes
• Compute class parameters
• Compute probability of each respondent being in each class
• Repeat until changes in class parameters are small or non-existent
• Allocate respondents to classes with highest probabilities
Data
ID | Shop | Understand | Key | Interest | Value
… | … | … | … | … | …
6 | D | A | A | A | D
… | … | … | … | … | …

Parameters
Class | Size | Answer | Shop | Understand | Key | Interest | Value
Class 1 | 67% | Agree | 40% | 40% | 48% | 16% | 53%
Class 1 | 67% | Disagree | 60% | 60% | 52% | 84% | 47%
Class 2 | 33% | Agree | 65% | 90% | 88% | 100% | 26%
Class 2 | 33% | Disagree | 35% | 10% | 12% | 0% | 73%

Looking at the parameters, which class do you think respondent 6 belongs to?
Computing the probability of each respondent being in each class

Data (respondent 6): Shop = D, Understand = A, Key = A, Interest = A, Value = D

Parameters
Class | Size | Answer | Shop | Understand | Key | Interest | Value
Class 1 | 67% | Agree | 40% | 40% | 48% | 16% | 53%
Class 1 | 67% | Disagree | 60% | 60% | 52% | 84% | 47%
Class 2 | 33% | Agree | 65% | 90% | 88% | 100% | 26%
Class 2 | 33% | Disagree | 35% | 10% | 12% | 0% | 73%

Prior
Initial best guess of probabilities is given by the class sizes:
Class 1: 67%
Class 2: 33%

Class conditional densities
Probability that somebody in each class would give respondent 6's answers:
Class 1: 60% × 40% × 48% × 16% × 47% = 1%
Class 2: 35% × 90% × 88% × 100% × 73% = 20%

Posterior probability (probability of being in a class)
Class 1: (67% × 1%) / (67% × 1% + 33% × 20%) = 9%
Class 2: (33% × 20%) / (67% × 1% + 33% × 20%) = 91%
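The calculation for respondent 6 can be reproduced with a few lines of Python. The parameter values are copied from the slide as printed, including the 26% and 73% for Value in Class 2. The dictionary layout and the function name `posterior` are illustrative assumptions.

```python
# Class sizes and answer probabilities as printed on the slide, in the order
# Shop, Understand, Key, Interest, Value.
params = {
    "Class 1": {"size": 0.67,
                "A": [0.40, 0.40, 0.48, 0.16, 0.53],
                "D": [0.60, 0.60, 0.52, 0.84, 0.47]},
    "Class 2": {"size": 0.33,
                "A": [0.65, 0.90, 0.88, 1.00, 0.26],
                "D": [0.35, 0.10, 0.12, 0.00, 0.73]},
}
respondent_6 = ["D", "A", "A", "A", "D"]

def posterior(answers, params):
    """Apply Bayes' Theorem: prior (class size) times class conditional density."""
    joint = {}
    for name, p in params.items():
        density = 1.0
        for j, answer in enumerate(answers):
            density *= p[answer][j]
        joint[name] = p["size"] * density
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

result = posterior(respondent_6, params)
print(result)
```

At full precision the class conditional densities are about 0.9% and 20.2%, which gives posteriors of roughly 8% and 92%. The slide's 9% and 91% come from rounding the densities to 1% and 20% before applying Bayes' Theorem. Either way, respondent 6 is allocated to Class 2.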
Application
• n = 1,145 market researchers (GRIT2012/2013)
• Question: "How important do you think each of the following attributes is to clients when they select a research provider?"
• 5-point scale
• Randomly show 15 of 25 attributes to each respondent
Cluster Analysis
Numeric Assumption
[Figure: bar chart "Importance to clients (Research providers viewpoint): Top 2 boxes (out of 5) - reordered", showing Top 2 Box (%) for each attribute in the three-class numeric solution: Segment 1 (45%), Segment 2 (11%), Segment 3 (45%)]
Attributes shown:
• Lowest price
• Previous experience with client/supplier
• Rapid response to requests
• Listens well and understands client needs
• Flexibility on changing project parameters
• Familiarity with client needs
• Completes research in an agreed-upon time
• Good relationship with client/supplier
• Breadth of experience in the target segment
• Good reputation in the industry
• Familiarity with the industry or category
• Length of experience/time in business
• Has an access panel
• Company is financially stable
• Has knowledgeable staff
• High quality analysis
• Provides data analysis services
• Understands new consumer communication channels & technologies
• Also does qualitative research
• Consultation on best practices and methodology effectiveness
• Uses sophisticated research technology/strategies
• Provides highest data quality
• Uses the latest statistical/analytical packages
• Offers unique methodology or approach
• Uses the latest data collection technology
Percentages are Top 2 Box Scores. Where values are significantly higher than average the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
Categorical Assumption
[Figure: bar chart "Importance to clients (Research providers viewpoint): Top 2 boxes (out of 5) - reordered", showing Top 2 Box (%) for the same 25 attributes as in the Numeric Assumption chart, for the two-class solution using all categories: Segment 1 (50%), Segment 2 (50%)]
Percentages are Top 2 Box Scores. Where values are significantly higher than average the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
Ranking Assumption
[Figure: bar chart "Importance to clients (Research providers viewpoint): Top 2 boxes (out of 5) - reordered", showing Top 2 Box (%) for the same 25 attributes as in the Numeric Assumption chart, for the two-class ranking solution: Segment 1 (54%), Segment 2 (46%)]
Percentages are Top 2 Box Scores. Where values are significantly higher than average the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
Latent class analysis software
Product | Data/distributional assumptions | Covariates* | Complex sampling*
Sawtooth Software | Regression (discrete choice, ranks), Max-Diff | No | No
Q | Numeric, Binary, Categorical, Ranks, Partial Ranks, Ranks with Ties, Max-Diff, Regression (linear, discrete choice, ranks, partial ranks, ranks with ties, best-worst), Mixed data | No | No
Limdep | Regression (linear, discrete choice, censored, ranks, partial ranks, counts, survival, etc.) | Yes | No
SAS (PROC LCA/LTA/Mixed) | Numeric, Binary, Categorical, Growth, Regression (discrete choice, ranks, partial ranks) | Yes | Yes
MPlus | Numeric, Binary, Categorical, Ordered Categorical, Counts, Mixed data | Yes | Yes
Latent Gold/Latent Gold Choice | Numeric, Binary, Categorical, Growth, Ranks, Partial Ranks, Counts, Regression (linear, discrete choice, censored, ranks, partial ranks) | Yes | Yes
* Covariates and the ability to handle complex sampling can be relevant when applying latent class analysis to non-segmentation problems (e.g., creating predictive models).
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Thank you
Tim Bock, Q
www.q-researchsoftware.com
tim.bock@q-researchsoftware.com
+61 425 241 989