Steps to Performing a Cluster Analysis
Rod Funk
Chestnut Health Systems
Bloomington, IL
Performing a Cluster Analysis
• The first step is deciding which variables you want to cluster on
• Data can be continuous, counts, or dichotomous
• Are the variables at one time point, or do you want to look at trajectories across time?
– If across time, the data need to be in horizontal format: one row per adolescent
– We name variables by time with a suffix for the wave: _0 for intake, _3 for 3 months (e.g., dcs_0, dcs_3, dcs_6, etc.)
• Cluster analysis also expects data for every variable used in the analysis. If a record is missing even one variable, no cluster is calculated for that record.
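The horizontal layout and the complete-data requirement can be sketched in a few lines. This is a Python illustration rather than SPSS syntax; the function names (`to_wide`, `complete_cases`) and the sample records are hypothetical, and only the `dcs_<wave>` naming comes from the slide.

```python
from collections import defaultdict

def to_wide(long_rows, var="dcs"):
    """Pivot (person_id, wave, value) rows into one dict per person.

    Column names follow the slide's convention: <var>_<wave>, e.g. dcs_0, dcs_3.
    """
    wide = defaultdict(dict)
    for person_id, wave, value in long_rows:
        wide[person_id][f"{var}_{wave}"] = value
    return dict(wide)

def complete_cases(wide, columns):
    """Keep only records with a non-missing value for every clustering column."""
    return {
        pid: row
        for pid, row in wide.items()
        if all(row.get(col) is not None for col in columns)
    }

# Hypothetical data: three adolescents, waves at 0, 3 and 6 months.
long_rows = [
    ("A", 0, 4), ("A", 3, 3), ("A", 6, 2),
    ("B", 0, 5), ("B", 3, None), ("B", 6, 1),   # wave 3 answer missing
    ("C", 0, 2), ("C", 6, 2),                   # wave 3 row absent entirely
]
wide = to_wide(long_rows)
usable = complete_cases(wide, ["dcs_0", "dcs_3", "dcs_6"])
print(sorted(usable))  # only "A" has all three waves
```

Records B and C would receive no cluster assignment, which is why the missing-data steps come before the cluster run.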
Handling Missing Data
• Scale level: for a scale that has shown good internal consistency (alpha > .7), we calculate the score from the average of the answers, as long as there are at least 3 valid answers:
– Compute dcs=rnd(mean.3(l3a15d,l3a16d,l3a17d,l3a18d,l3a19d)*5).
• Item level: random replacement of missing values
– sort cases by loc xchk1.
– rmv ms2w=median(s2w,2).
– compute ms2w=rnd(ms2w).
– This replaces a missing s2w with the median of the 4 surrounding cases (the 2 before and the 2 after in the sort order)
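Both rules can be mirrored outside SPSS for checking by hand. The sketch below is Python, not SPSS syntax, and rests on two assumptions: rounding is half away from zero, and a missing value is left missing when fewer than `span` valid values exist on either side. The function names are hypothetical.

```python
import math
from statistics import median

def rnd(x):
    """Round half away from zero (Python's round() rounds halves to even)."""
    return math.floor(abs(x) + 0.5) * (1 if x >= 0 else -1)

def scale_score(items, min_valid=3):
    """Mean of the valid items rescaled to the full item count,
    in the spirit of rnd(mean.3(...)*5)."""
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None  # too few answers: the scale stays missing
    return rnd(sum(valid) / len(valid) * len(items))

def fill_with_neighbor_median(values, span=2):
    """Replace each missing value with the rounded median of `span`
    valid values on each side (assumption: otherwise leave it missing)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [x for x in values[:i] if x is not None][-span:]
        after = [x for x in values[i + 1:] if x is not None][:span]
        if len(before) == span and len(after) == span:
            filled[i] = rnd(median(before + after))
    return filled

# Hypothetical answers to a five-item scale, two of them missing.
print(scale_score([1, 0, None, 1, None]))        # 3 valid answers, so a score is computed
print(scale_score([1, None, None, 0, None]))     # only 2 valid answers, so None
# Hypothetical item values in sorted case order.
print(fill_with_neighbor_median([2, 4, None, 5, 9, None]))
```

In the last call the first gap is filled from its four neighbors, while the trailing gap stays missing because no cases follow it.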
Handling Missing Data (continued)
• Replacement of variables across time
• For scales where items were not asked:
– Use regression on the scale, with the other items in the cluster at that wave plus the intake and last-wave values as predictors
• For a missing wave of data: as long as it is not the first or last wave, interpolate using the average of the two surrounding waves.
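The wave interpolation rule is simple enough to sketch directly. This is a Python illustration, not SPSS syntax; the function name is hypothetical, and leaving a gap untouched when a neighboring wave is also missing is an assumption the slide does not address.

```python
def interpolate_middle_waves(waves):
    """Fill a missing interior wave with the mean of its two neighbors.

    The first and last waves are never filled, and a gap is left alone
    when either neighboring wave is itself missing (assumption).
    """
    filled = list(waves)
    for i in range(1, len(waves) - 1):
        if waves[i] is None and waves[i - 1] is not None and waves[i + 1] is not None:
            filled[i] = (waves[i - 1] + waves[i + 1]) / 2
    return filled

# Hypothetical scale scores, one value per wave.
print(interpolate_middle_waves([10, None, 6, 4]))    # middle wave becomes 8.0
print(interpolate_middle_waves([None, 5, 7]))        # first wave is not filled
print(interpolate_middle_waves([3, None, None, 9]))  # adjacent gaps are left alone
```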
Running the Cluster Analysis
• Sample syntax
– CLUSTER Zpci_0 Zrpci_3 Zrpci_6 Zrpci_9 Zrpci_12 Zpci_30 Zici_0 Zrici_3 Zrici_6 Zrici_9 Zrici_12 ZSco01 Zmdci_0 Zrdci_3 Zrdci_6 Zrdci_9 Zrdci_12 ZSco02 Zl3v_0 Zrl3d_3 Zrl3d_6 Zrl3d_9 Zrl3d_12 Zl3d_30 Zl3w_0 Zrl3e_3 Zrl3e_6 Zrl3e_9 Zrl3e_12 Zl3e_30 Zmaxce_0 Zrmaxce_3 Zrmaxce_6 Zrmaxce_9 Zrmaxce_12 Zmaxce_30
– /METHOD WARD
– /MEASURE= SEUCLID
– /PRINT SCHEDULE
– /PLOTS NONE
– /SAVE CLUSTER(2,12) .
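To make the syntax concrete, here is a minimal Python sketch of the procedure it requests: Ward's method on squared Euclidean distances, with cluster membership saved for a range of solutions. It is a naive illustration on hypothetical two-variable data, not SPSS's implementation; the function names are assumptions, and the inputs are taken to be already standardized.

```python
def sq_euclid(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_memberships(points, k_min, k_max):
    """Agglomerate with Ward's criterion; return {k: [label per point]}
    for every solution with k_min <= k <= k_max clusters."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    solutions = {}
    while True:
        k = len(clusters)
        if k_min <= k <= k_max:
            labels = [0] * len(points)
            # Number clusters by their lowest member index so labels are stable.
            ordered = sorted(clusters, key=lambda c: min(c[0]))
            for label, (members, _) in enumerate(ordered, start=1):
                for i in members:
                    labels[i] = label
            solutions[k] = labels
        if k <= k_min:
            break
        # Ward: merge the pair that adds the least within-cluster sum of squares.
        best = None
        for a in range(k):
            for b in range(a + 1, k):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_euclid(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return solutions

# Hypothetical standardized scores for five records on two variables.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
solutions = ward_memberships(points, k_min=2, k_max=3)
for k in sorted(solutions):
    print(k, solutions[k])
```

Each entry of `solutions` plays the role of one saved membership variable, so a call with `k_min=2, k_max=12` would correspond to the range requested by `/SAVE CLUSTER(2,12)`.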
Demonstration
• Purpose
• To show how to take the results of the cluster analysis and create a table and figures for validating and deciding on the proper number of clusters.
• Will cover pivot tables in SPSS output, pasting into Excel, and graphing in Excel
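One kind of table that can support this comparison is the size and mean profile of each cluster in a given solution. The Python sketch below is an assumption about what such a table might contain, not the table built in the demonstration; the function name and data are hypothetical.

```python
from collections import defaultdict

def cluster_profile(rows, labels):
    """Return {cluster: (n, [mean of each column])} for one cluster solution."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: (len(members), [sum(col) / len(members) for col in zip(*members)])
        for label, members in sorted(groups.items())
    }

# Hypothetical scores at three waves for three records, with a 2-cluster solution.
rows = [(4, 3, 2), (6, 5, 4), (1, 1, 1)]
labels = [1, 1, 2]
profile = cluster_profile(rows, labels)
for cluster, (n, means) in profile.items():
    print(cluster, n, means)
```

Plotting each cluster's means across waves, one line per cluster, gives the kind of trajectory figure that can be compared across solutions with different numbers of clusters.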