Upload
phungngoc
View
253
Download
0
Embed Size (px)
Citation preview
UNDERSTANDING CONSUMER PREFERENCE
WITH MULTIVARIATE VISUALISATION
DAVID ARNOLD
20TH JUNE 2013
THE PRODUCT DESIGN PROCESS
PRODUCT MARKET INSTRUMENTAL CONSUMER
micro-structure
physical/chemical
performance Liking/in-use perception
ingredients
molecules
formulation
processing
cost
purchase behaviour
Sensory Descriptive Analysis
Internal Preference Mapping
predict
optimise
Sensory description
Sensory discrimination
External Preference
Mapping
SENSORY
TALK OUTLINE
• Sensory description (Product Profiling)
• Trained panels
• Untrained panels
• Consumer Preference
• Preference Mapping
• Drivers Analysis
PRODUCT PROFILING OBTAINING A QUANTITATIVE
DESCRIPTION OF PRODUCTS
TRAINED PANELS
• Trained assessors (panel monitoring)
• Well defined scales (0-100, VAS)
• Experimental design and carry over issues (order effects)
• We want to be able to compare across panel sittings
Analysis of Variance (adjustment for multiple testing, Fisher Analysis)
Uni-variate testing, ignores dependencies between questions
Multivariate analysis of variance (multivariate normal distribution assumption)
We would also like to visualise the interdependencies in the data and preferably
on a two dimensional map...
PRINCIPAL COMPONENT BIPLOTS
p x q p x k k x q
p x q p x 2 2 x q
We would like a good two dimensional representation to graph,
that captures as much information as possible.
Principal component analysis allows us to do this...
• Groups of similar products.
• Groups of similar questions.
• The relative associations between products and questions.
• Relative sizes of the differences between the products.
X is a mean centred matrix where rows are products and columns questions
Capturing,
PRINCIPAL COMPONENT ANALYSIS (PCA)
1
1
2
11v
1
1
2
12v
A PCA finds linear combinations of the questions that explain the
maximum variation in the data, with as few principal axes as possible.
The first axis explains the highest possible amount of variation on a single axis.
The next axis finds the direction explaining the maximum remaining unexplained
variation and so on.
XVY 1VV T
q
2
1
y
If the first two PCs explain a high proportion
of the variation then we can use Y and V to
represent X on a two dimensional plot.
Columns of X are mean centred, so the average scores
are now zero.
V is chosen to produce uncorrelated
principal axes (components) Y.
-60 -40 -20 0 20
-10
01
02
03
0
Individuals factor map (PCA)
Dim 1 (83.71%)
Dim
2 (
13
.35
%)
AB
C
D
E
REPRESENTING THE PRODUCTS IN TWO DIMENSIONS
5 x 2
Distances between products
are proportional to the
Malahanobis distance
-10 -5 0 5 10 15 20
-50
5
Variables factor map (PCA)
Dim 1 (83.71%)
Dim
2 (
13
.35
%)
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
Coldness
Comfort
Crumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
SheenStickiness
Tone.of.Colour
Visibility
Visual.Consistency
Visual.Texture
Wetness
REPRESENTING THE QUESTIONS IN TWO DIMENSIONS
Vector length approximates variance
15 x 2
HOWEVER THE TWO MAPS ARE RELATED
A
B
C
B > A > C
Each element of X is represented
by multiplying a row of y and a row
of v.
So the projections of the rows of y
onto the rows of v give the
relative ordering of the product mean
scores on each question.
Y2 and V2 can be overlaid to produce a PCA biplot....
PCA BIPLOT
This plot is produced using standardised X
So now product distances are Euclidean not Malahanobis.
Vectors are all the same length
Almost zero correlation between
Residual Look and Residual Feel
Almost perfect correlation between
Tone of Colour and Visual Consistency
Angles between vectors are still correlations
Projections give relative ordering of products on questions
PCA BIPLOT
• Question scoring scales are calibrated to be linear and continuous.
• Panellists are trained and checked for scoring consistency and
reproducibility.
• The intrinsic dimensionality of the data is well represented on two
(or possibly three) axes.
PCA biplots produce a good representation of this kind of data.
What if my data is not continuous
or is intrinsically higher than two or three dimensions?
We can use a general class of models termed Multidimensional Scaling
MULTIDIMENSIONAL SCALING (MDS)
Finds a configuration of points representing products
Like PCA it finds a low dimensional representation of the data.
• Better representation in 2D of intrinsically high dimensional data than PCA
• Can represent non metric data such as ordered ranks
However, unlike PCA it tries to find the best representation possible in the
chosen number of dimensions, usually two.
Distances between pairs of points no longer approximate the original
dissimilarities (distances in multidimensional space)
Instead the rank order of the distances between pairs of points
match the rank order of the dissimilarities (as well as possible)
-3 -2 -1 0 1 2
-1.0
-0.5
0.0
0.5
1.0
1.5
MDS1
MD
S2
A
BC
D
E
Coldness
ComfortCrumbling
Dragginess
Evenness.of.spread
Glidability
Residue.FeelResidue.Look
Sheen
Stickiness
Tone.of.Colour
Visibility
Visual.ConsistencyVisual.Texture
Wetness
MDS ON THE SENSORY DATA
The product relationships are
reproduced as closely as possible
in two dimensions.
Questions are overlaid as weighed
averages across the products
Its all about proximity
UNTRAINED PANELS FREE CHOICE PROFILING
FREE CHOICE PROFILING (FCP)
FCP provides an indication of which attributes are likely to be perceived by
consumers
• It uses untrained consumers
• There is no need to maintain a sensory panel
• Less resource required
WHAT’S INVOLVED?
• Develop individual vocabularies
• Incorporate individual vocabularies into score sheets
• Individuals assess samples using their personalised score sheets
• Individuals must score the same objects
• Individuals need not record the same attributes or number of attributes
Generalised Procustes Analysis (GPA) compares the differing panellist
data without disturbing the relationships contained inside the sets
Principal Components Analysis is used to produce individual assessor
configurations.
GENERAL PROCUSTES ANALYSIS
Individual's configurations are translated, rotated and scaled to minimise
the sum of squares for individual deviations
» maximises product deviations
» maximises the agreement between individuals on each product
Example
10 Spanish consumers scoring 10 washing up liquids using their own
sensory attributes
creamy/moisturizing
Spanish English
Cremoso Creamy
Suave Smooth
Deslizable Flowly
Espumosa Foamy
Pegajoso Sticky
Absorbible Absorbable
Espesa Thick
Consistente Consistent
Líquido Liquid/ Fluid
Transparente Transparent
Liviano Light
Acuoso Watery
Grasoso Greasy
Abundante Abundant/ Plentiful
Áspera Rough
Opaco Opaque
Aceitoso Oily
Denso Thick
Humectada Moisturizing
Ligero Light
Rápido Quick
Resbaladizo Slippery
Adherente Adherent
Fluida Fluid
Gelatinoso Gelatinous
Perdurable Long Lasting
Pesado Heavy
Seca Dry
Translúcido Tanslucent
...
light/smooth/opaque
thick/abundant/consistent
Absorbable/foamy
with 95% confidence
intervals
CONSUMER PREFERENCE PREFERENCE MAPPING AND
DRIVERS ANALYSIS
WHAT IS PRODUCT PREFERENCE MAPPING?
• Models of preferential choice
• The data is scores such as rank order of preference for different panellists
for a set of products
• Individuals are untrained consumers
Internal Preference Mapping
• Used to find consumer clusters
sharing a product preference
• Data consists of a products x consumer
scores
External Preference Mapping
• Predict maximum consumer
acceptance for a set of prototypes
• Uses an external data set to predict
consumer acceptance
Assumptions
• Different individuals perceive the products in the same way
• but individuals differ as to what an ideal combination of the products attributes are
INTERNAL PREFERENCE MAPPING
We try to represent products and consumers on a common
underlying scale and visually examine the relationship between them.
We locate individuals and products as points in a joint space
It says that an individual will pick the product in the set closest to their ideal point.
MULTIDIMENSIONAL UNFOLDING
1 2 A B C D E
1st 2nd 3rd 4th 5th
Judge 1 B C A E D
Judge 2 A B C E D
taken from Multidimensional Scaling, Cox and Cox (2005)
Suppose we have 2 judges and 5 essays for the judges to rank
A B
C E D
Judge 1 rankings
Unfolding (common ordering)
Judge 2 rankings
A
B C E D
We represent the consumers and
products on a two dimensional
map maintaining as accurately as
possible the ranks of each
consumer
BREAKFAST EXAMPLE
-10 -5 0 5 10
-10
-50
51
0
Joint Configuration Plot
Configurations D1
Co
nfig
ura
tio
ns D
2
toast
butoast
engmuff
jdonut
cintoast
bluemuff
hrolls
toastmarm
butoastj
toastmarg
cinbun
danpastry
gdonut
cofcake
cornmuff
12
3
4
5
6
78
9
101112
131415
16
17
18
19
20
2122
23
2425
2627
28
29
30
31
32
3334
35
3637
38
39
40
4142
42 individuals were asked to order 15 breakfast items
based on their preference.
similar preference These individuals
had a high preference for
Danish pastry
taken from the smacof R package
EXTERNAL PREFERENCE MAPPING
In addition to the consumer data (internal data), an external set
of data is present which describes the products in some way.
Sensory data should have prioritisation of the attributes based
on their importance to consumers
and have dimensions that pertain to preference.
We can use PCA to create the product sensory map.
The sensory axes are regressed onto the consumer preference
e.g. by a quadratic model using the PCs.
This data is used in some form of regression model to give meaning
to the axes on the map.
External data is usually sensory data and/or product technical data.
EXTERNAL PREFERENCE MAP FOR CONDITIONERS
DRIVERS ANALYSIS
DRIVERS ANALYSIS
We have seen some ways to represent the correlations and distances
between products, sensory attributes and consumer preference.
Can we gain further insights into the structure of the data?
Factor analysis and regression
Similar to PCA but we seek an interpretation of the components as
hidden or latent variables to explain the observed correlations
between questions
Observed questions (measurements)
Latent variables (can’t measure directly but are represented indirectly through the observed answers to our questions
Each observation has a score for each factor.
These can be regressed onto a measure of product liking.
GRAPHICAL MODELS (GM)
These are probability models for multivariate random observations,
whose independence structure is characterised by a graph.
Consist of (data, probability model) -> likelihood function
Gives a measure of the relative support the data gives for different model
formulations
x3 x1
x2
1960980
9601980
9809801
..
..
..
S
102.071.0
02.0168.0
71.068.011S
Idea of conditional independence
This holds for sets of random variables in a
joint distribution. (global Markov property)
GRAPHICAL MODELS (GM)
The correlations hide the probable structure
between the variables.
Makes my hair feel soft
Moisturises my hair
Allows my hair to move naturally
ADVANTAGES OF GRAPHICAL MODELS
Interpretation
Conditioning is the key concept underpinning GM.
Conditional independence gives a visualisation of the structure of the data
not just the correlations.
Unification
GM provides a unified framework for continuous data (correlation) and discrete data
(contingency table), and this unification suggests generalisation to mixed
variable systems.
UNDERSTANDING THE DRIVERS OF CONSUMER PREFERENCE
Products 10 (commercially available) laundry bars incomplete mixture of 9 ingredients
Sensory Data
19 panellists 32 attributes (day 1 & day 3) complete sequential evaluation
Liking Data Location 1 - soft water - 339 households Location 2 - hard water - 323 households complete sequential evaluation
Hand Wash Bars
A factor analysis was performed on the sensory data to reduce the number of attributes
FACTOR ANALYSIS
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
factor #
att
ribute
pattern loadings visualisation
Brightness1 Hardness1 Dryness1
Grip1 Rubbing1 Disintegration1
Grit1 Lather1
Soapiness1 Perfume1
Breaks1 Effort1 RinsWater1
BucketDep1 Economy1
Brightness3 Hardness3 Dryness3
Grip3 Rubbing3 Disintegration3
Grit3 Lather3
Soapiness3 Perfume3
Breaks3 Effort3 RinsWater3
BucketDep3 Economy3
1 2 3 4 5
perfume grip/economy grit/deposits
engagement
tactile feel
Highlight are the most important correlations with the factors
The factors were rotated
to align them as well as
possible with the
sensory questions.
This also induces
correlation
between them.
CONDITIONAL INDEPENDENCE (SENSORY FACTORS & LIKING)
Location 1 Liking driven by Engagement and tactile feel
Location2 liking driven by Engagement and Grip/Economy
Pref Location 1
Pref Location 2
We then related the formulation profiles of the test products
to match the preference drivers from the study.
USEFUL RESOURCES
Sensory and Preference Mapping in R
SensoMineR http://sensominer.free.fr
FactoMineR
smacof Preference mapping using unfolding (MDS)
Graphical Modelling
R packages gRim and gRain
MIM (Mixed Integer Modelling http://www.hypergraph.dk/)
Thank you