Cluster Analysis: Forming Groups within the Sample of Respondents

Cluster vs. Factor Analysis

• Factor analysis forms groups of items, identifying common traits that underlie the variation in respondents’ scores.

• Cluster analysis forms groups of respondents, based on the similarity of responses to “independent” items.

K-Means Cluster

• Procedural, judgmental approach, where you compare results of 2, 3, 4… cluster solutions.

• Best suited when you have a set of (6 or more) continuous, or interval coded, variables…
– that have low and non-significant inter-correlation (near independence), and…
– a good range of responses across the sample.

• Produces cluster scores for subsequent analysis.
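The "near independence" condition above can be checked before clustering. What follows is a minimal sketch, not part of the original slides: it flags item pairs whose correlation looks too high. The item names, the ratings, and the 0.3 cutoff are all illustrative assumptions, and the sketch screens on the size of r only; it does not run a significance test.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen_items(items, max_abs_r=0.3):
    """Return (item_a, item_b, r) for pairs whose |r| exceeds the assumed cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(items.items(), 2):
        r = pearson(a, b)
        if abs(r) > max_abs_r:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical 1-7 ratings from eight respondents on three items.
items = {
    "price":    [1, 2, 3, 4, 5, 6, 7, 4],
    "coverage": [2, 3, 3, 5, 5, 7, 6, 4],
    "style":    [4, 1, 6, 3, 5, 2, 4, 7],
}
flagged = screen_items(items)
print(flagged)  # pairs that are probably too correlated to cluster on together
```

In this made-up data only the price/coverage pair is flagged, which would suggest dropping one of the two items or combining them before forming clusters.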

Validity of Clusters

• Examine the relative sizes and composition of the clusters—are the sizes helpful?

• Do the clusters have face validity? Can you assign names to segments produced from the analysis based on the means on the individual items?

• Can you add new items and retain the same clusters?

Reliability of the Clusters

• Are the clusters “stable” across different sets of (randomly assigned) respondents?

• Are the clusters “stable” with the inclusion or deletion of items?

• Can significant differences be shown from an ANOVA across means on each of the items used in the cluster analysis?

• Do the clusters illustrate differences in responses to separate items?
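The ANOVA check in the list above can be sketched by hand for a single item. This is an illustrative sketch only: `one_way_f` and the scores are hypothetical, and it returns the F-statistic without a p-value (which would need an F distribution from a statistics package). Clusters are built to separate respondents on the clustering items, so a large F on those items is best read as descriptive; F-values on separate items not used to form the clusters are the more telling check.

```python
def one_way_f(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of scores)."""
    k = len(groups)                      # number of clusters
    n = sum(len(g) for g in groups)      # total respondents
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-cluster variation: how far each cluster mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-cluster variation: spread of respondents around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores on one item, grouped by cluster membership.
by_cluster = [[1, 2, 2, 3], [4, 5, 5, 6], [6, 7, 7, 6]]
f_stat = one_way_f(by_cluster)
print(round(f_stat, 1))
```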

Limitations of Cluster Analysis

• Is largely dependent on the composition of the sample; a different sample composition will produce different clusters.

• Has a well-deserved poor reputation as an a-theoretical approach toward classification and data analysis.

• Best suited for exploratory research designed to exaggerate differences between groups of respondents.

• Addictive, creative approach to forming segments.

Suggested Technique to Create an Understandable Cluster Analysis

• Start with a subset of the questionnaire items used to form clusters.

• Start with 2 clusters, and increase to 3, then 4, examining changing clusters and sizes of each.

• Include additional items one at a time—do the cluster definitions improve in consistency?
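The 2, then 3, then 4 cluster comparison above can be sketched with a plain k-means loop. This is a minimal illustration under stated assumptions, not the procedure of any particular statistics package: the respondents, the three items, the fixed random seed, and the helper names `nearest` and `kmeans` are all hypothetical. Results depend on the random starting centers, so a different seed can give a different solution.

```python
import random

def nearest(row, centers):
    """Index of the center closest to row (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # start from k random respondents
    labels = None
    for _ in range(iters):
        new_labels = [nearest(row, centers) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical 1-7 ratings: twelve respondents (rows) on three items (columns).
rows = [
    (1, 2, 6), (2, 1, 7), (1, 1, 6), (2, 2, 7),
    (6, 6, 2), (7, 6, 1), (6, 7, 2), (7, 7, 1),
    (4, 1, 1), (4, 2, 2), (3, 1, 2), (4, 2, 1),
]

solutions = {}
for k in (2, 3, 4):
    labels, centers = kmeans(rows, k)
    sizes = [labels.count(c) for c in range(k)]
    solutions[k] = (labels, sizes, centers)
    print(f"{k} clusters, sizes {sizes}")
    for c, center in enumerate(centers):
        print(f"  cluster {c}: item means {[round(v, 1) for v in center]}")
```

Reading the printed sizes and item means side by side for each k is the judgmental step: look for the solution whose clusters are large enough to be useful and whose means suggest segment names.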

Discriminant Analysis

• Cluster analysis forms a classification, or categorical variable, based on responses to continuous variables.

• Discriminant analysis takes a pre-determined classification variable and identifies continuous variables that show significant differences.

Appropriate Analyses for Project

• Descriptive Statistics
– Frequencies on items, particularly those showing popularity, strongest sentiments, importance.

• “Bivariate” Statistics
– ANOVA, F-statistics for differences in means
– T-tests for comparisons of means between two groups
– Cross-tabulations of categorical, nominally coded items.

Multiple Regression

• Limited number of continuous variables that would be appropriate/interesting as dependent variables for the Alltel project.

• Best:
– Willingness to pay $xx for a certain carrier service
– “Must have” vs. “Don’t need” for features items

Measurement Issues

• Correlations

• Reliability—mean inter-item correlations

• Factor Analysis
– Data reduction to subscales and underlying traits through explained variance.
– Factor loadings (pattern matrices)
– Later, we’ll use factor scores for visual plots (perceptual mapping)
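The reliability bullet above refers to mean inter-item correlations. As a hedged sketch, the mean correlation for a small subscale can be computed as below. The subscale data and function names are hypothetical. The standardized alpha step is an addition that is not in the slides: it uses the standard formula alpha = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_inter_item_r(items):
    """Average Pearson correlation over all distinct pairs of items."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def standardized_alpha(items):
    """Standardized alpha from the mean inter-item correlation (added assumption)."""
    k = len(items)
    r_bar = mean_inter_item_r(items)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Hypothetical subscale: three items, each answered by the same six respondents.
subscale = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [1, 3, 2, 5, 4, 6],
]
r_bar = mean_inter_item_r(subscale)
alpha = standardized_alpha(subscale)
print(round(r_bar, 2), round(alpha, 2))
```

Items meant to form one subscale should show a high mean correlation here, which is the opposite of what is wanted from the items used to form clusters.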

Classification Methods

• Discriminant Analysis
– Categorical, nominally coded dependent variable
– Wilks’ Lambda
– Classification results

• Cluster Analysis
– Creates categorical variables from sets of continuous, or interval coded items
