MV7.Cluster Analysis

  • 8/13/2019 MV7.Cluster Analysis

    1/16

    Cluster Analysis

    M. Thenmozhi, Professor

    Department of Management Studies, IIT Madras

    [email protected]

  • 8/13/2019 MV7.Cluster Analysis

    2/16

    CLUSTER ANALYSIS

    Searches for the natural groupings among objects described by p variables.

    Within each cluster - high homogeneity, but between clusters - high heterogeneity.

    12/6/2013 2DoMS, [email protected]

  • 8/13/2019 MV7.Cluster Analysis

    3/16

    Purpose

    Data reduction - information from the entire population is reduced to information about specific, smaller sub-groups

    Segmenting a market on the basis of a number of variables

    Identifying similar test markets, similar firms, products

    Grouping personality profiles

    Product positioning - grouping brands

    When:

    Large sample of data consisting of many variables - data recorded on a continuous scale as well as on a categorical scale.


  • 8/13/2019 MV7.Cluster Analysis

    4/16

    Shampoo buying behaviour

    Degree of importance measured on 8 variables: brand name, price, availability, brand image, company name, advertising, retailer recommendations, family income.

    Result - five clusters


  • 8/13/2019 MV7.Cluster Analysis

    5/16

    Key patterns and classification

    1. High importance to price, brand, image, family, advt., influence, availability
       Classification: Conservative

    2. High importance to price; moderate product & Co. image; less dependence on advtg. & retailer reco.
       Classification: Value for money

    3. High brand image; moderate advtg. & Co. name; low family influence & retailer reco.
       Classification: Brand conscious & personal choice

    4. High brand image, loyalty, family influence and low price
       Classification: Habitual purchaser

    5. High availability and low brand loyalty
       Classification: Switcher

  • 8/13/2019 MV7.Cluster Analysis

    6/16

    Stage 1: Partitioning

    How should inter-object similarity be measured?

    Correlation coefficient / Distance

    What procedure (algorithm) should be used to place similar objects into groups?

    Hierarchical or Non-Hierarchical

    How many clusters?


  • 8/13/2019 MV7.Cluster Analysis

    7/16

    Hierarchical

    Agglomerative: single linkage rule, complete linkage rule, average linkage rule, Ward's method

    Divisive


  • 8/13/2019 MV7.Cluster Analysis

    8/16

    Agglomerative method: Hierarchical clustering procedure which starts with each object in a separate cluster. Subsequently, the clusters closest together are aggregated.

    Single linkage method: Procedure based on minimum distance. Finds the 2 objects with the shortest distance and places them in one cluster; the process continues until all objects are placed in one cluster.
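
The agglomerative procedure with the single linkage rule can be sketched in a few lines. This is only an illustrative sketch, not the routine of any particular statistics package: the function name `single_linkage` and the five two-variable respondent scores are hypothetical, and ties are broken by taking the first pair found.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points):
    """Agglomerate until one cluster remains; return the merge history."""
    # Start with every object in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance = smallest object-to-object distance.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        history.append((merged, round(d, 3)))
        # Replace the two closest clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical scores of five respondents on two variables.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
history = single_linkage(points)
for members, height in history:
    print(members, height)
```

Each entry of `history` records which objects were joined and at what distance, which is the information a dendrogram displays.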


  • 8/13/2019 MV7.Cluster Analysis

    9/16

    Complete linkage rule: Maximum distance

    Average linkage rule: Average distance

    Ward's method: Distance b/w 2 clusters is the sum of the squares b/w the 2 clusters, summed over all variables.

    Centroid method: Distance b/w 2 clusters is the distance b/w the centroids - less affected by outliers, requires metric data
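
To compare the rules side by side, the sketch below computes each inter-cluster distance for two small hypothetical clusters. The helper names are illustrative, and Ward's criterion is taken here in its usual form, as the increase in the within-cluster sum of squares caused by merging the two clusters; that reading is an assumption about what the slide intends.

```python
from math import dist


def centroid(cluster):
    """Mean of each variable over the objects in the cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))


def within_ss(cluster):
    """Squared deviations from the centroid, summed over all variables."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def linkage_distances(a, b):
    """Distance between clusters a and b under each linkage rule."""
    pairs = [dist(p, q) for p in a for q in b]
    return {
        "single": min(pairs),                  # minimum distance
        "complete": max(pairs),                # maximum distance
        "average": sum(pairs) / len(pairs),    # average distance
        "centroid": dist(centroid(a), centroid(b)),
        # Ward (assumed form): growth in within-cluster sum of squares on merging.
        "ward": within_ss(a + b) - within_ss(a) - within_ss(b),
    }


# Two hypothetical clusters, each with two objects measured on two variables.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]
result = linkage_distances(a, b)
for rule, value in result.items():
    print(rule, round(value, 3))
```

At every step of an agglomerative run, the pair of clusters with the smallest value under the chosen rule is the pair that gets merged.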


  • 8/13/2019 MV7.Cluster Analysis

    10/16

    Stage 2:

    Interpretation - naming, using average scores (for each variable/cluster) - significance

    Stage 3:

    Profiling - describing the characteristics of each cluster - explain how they may differ on relevant dimensions


  • 8/13/2019 MV7.Cluster Analysis

    11/16

    Mahalanobis distance: Standardized form of Euclidean distance (E.D.). Data is standardized by scaling responses in terms of standard deviations, and adjustments are made for inter-correlation b/w variables.
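
As a rough illustration of the Mahalanobis distance, the sketch below handles the two-variable case only, so that the inverse of the covariance matrix can be written out by hand. The function names and the respondent scores are hypothetical.

```python
from math import sqrt


def covariance_2d(data):
    """Sample covariance terms (s_xx, s_xy, s_yy) for two variables."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(p, q, cov):
    """sqrt(d' S^-1 d), with the 2x2 inverse of S written out explicitly."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Hypothetical scores of five respondents on two correlated variables.
data = [(2, 3), (4, 7), (6, 8), (8, 12), (10, 15)]
cov = covariance_2d(data)
print(round(mahalanobis_2d(data[0], data[4], cov), 3))
```

When the two variables are uncorrelated (`s_xy = 0`), the expression reduces to the Euclidean distance computed on scores divided by their standard deviations.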

    Euclidean Distance

    \[ D_{15} = \sqrt{(X_{11} - X_{15})^2 + (X_{21} - X_{25})^2 + \dots + (X_{n1} - X_{n5})^2} \]

    X11: score of respondent 1 on variable 1

    n: number of variables

    The result is a matrix of inter-respondent distances.

    If variables are categorical - the measure is the no. of questions on which 2 respondents gave the same answers
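
A minimal sketch of both measures follows. The scores and answers are hypothetical, and the function names are illustrative. Note that a count of matching answers grows as respondents become more alike, so it behaves as a similarity; the number of mismatches is used below where a distance is wanted, which is an assumption rather than something the slide states.

```python
from math import sqrt


def euclidean(a, b):
    """Square root of the summed squared differences over all variables."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(rows):
    """Symmetric matrix of inter-respondent distances."""
    return [[euclidean(a, b) for b in rows] for a in rows]


def matches(a, b):
    """Number of questions on which two respondents gave the same answer."""
    return sum(x == y for x, y in zip(a, b))


# Hypothetical scores of three respondents on n = 3 metric variables.
scores = [(7, 3, 5), (4, 7, 5), (7, 3, 1)]
matrix = distance_matrix(scores)
for row in matrix:
    print([round(d, 3) for d in row])

# Hypothetical categorical answers of two respondents to three questions.
answers_1 = ("yes", "urban", "weekly")
answers_5 = ("yes", "rural", "weekly")
same = matches(answers_1, answers_5)
mismatches = len(answers_1) - same  # assumed distance-style counterpart
print(same, mismatches)
```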

  • 8/13/2019 MV7.Cluster Analysis

    12/16

    Dendrogram: A tree graph - graphical representation of the results of a clustering procedure in which the vertical axis consists of the objects or individuals & the horizontal axis represents the number of clusters formed.
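
Reading a dendrogram amounts to cutting the tree at some distance and counting the branches that remain. For the single linkage rule, this can be sketched without building the tree at all: two objects fall in the same cluster whenever a chain of steps no longer than the cutoff connects them. The function name `clusters_at`, the cutoff values and the data are hypothetical.

```python
from itertools import combinations
from math import dist


def clusters_at(points, cutoff):
    """Single-linkage clusters obtained by cutting the tree at `cutoff`."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links up to the representative of i's group.
        while parent[i] != i:
            i = parent[i]
        return i

    # Join every pair of objects that lies within the cutoff distance.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical scores of five respondents on two variables.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
for cutoff in (0.1, 1.0, 5.5, 6.0):
    found = clusters_at(points, cutoff)
    print(cutoff, len(found), found)
```

Raising the cutoff reduces the number of clusters, which is one practical way to approach the question of how many clusters to retain.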


  • 8/13/2019 MV7.Cluster Analysis

    13/16

    Validity:

    Two separate samples - check for similarity of results

    Use two different C.A. programs

    Discriminant analysis

    ANOVA


  • 8/13/2019 MV7.Cluster Analysis

    14/16

    Group means & significance level for two groups

    Variable   Mean, Cluster 1   Mean, Cluster 2   F ratio   Level of significance
    X1          4.94              8.82             55.4      0.0001
    X2          6.11              3.10             34.1      0.0001
    X3         13.52             17.81             68.6      0.0001
    X4         10.87              9.88              2.6      0.1160
    X5          5.52              5.92              1.0      0.3212
    X6          5.35              5.07              0.4      0.5183
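
An F ratio of this kind is obtained by a one-way ANOVA on each variable, with cluster membership as the grouping factor. The sketch below shows the calculation on made-up scores; it does not reproduce the figures in the table, whose raw data are not given, and the function name `f_ratio` is illustrative.

```python
def f_ratio(*groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of clusters
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores on one variable for the members of two clusters.
cluster_1 = [1, 2, 3]
cluster_2 = [4, 5, 6]
f_value = f_ratio(cluster_1, cluster_2)
print(f_value)
```

A large F ratio means the cluster means differ by much more than the spread within clusters, so the variable helps to separate and label the clusters.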


  • 8/13/2019 MV7.Cluster Analysis

    15/16

    X1, X2, X3 - Interpretation - Labeling

    X4, X5, X6 - Not significant

    E.I.D. Parry

    200 respondents: two most crucial factors considered while buying a particular brand of genset.

    60% - initial cost of genset, running & maintenance cost

    Cluster analysis performed - ranking on a 9-point scale, most important to least important.

    Sample size 50: split into two.

    Euclidean distance, single linkage rule.


  • 8/13/2019 MV7.Cluster Analysis

    16/16

    Six clusters

    Cluster   Initial cost   Running & maintenance cost
    1         Very low       Very low
    2         Low            Very low
    3         Moderate       Low
    4         Low            Moderate
    5         High           Very low
    6         High           Moderate
