Steps to Performing a Cluster Analysis Rod Funk Chestnut Health Systems Bloomington, IL


Steps to Performing a Cluster Analysis

Rod Funk

Chestnut Health Systems

Bloomington, IL


Performing a Cluster Analysis

• The first step is deciding which variables you want to cluster on

• Data can be continuous, counts or dichotomous

• Are the variables measured at one time point, or do you want to look at trajectories across time?

– If across time, data will need to be in horizontal format: one row per adolescent

– We name variables by time with a suffix for wave: _0 for intake, _3 for 3 months (e.g. dcs_0, dcs_3, dcs_6)

• Cluster analysis also expects data for every variable used in the analysis. If a record is missing even one variable, no cluster membership will be calculated for that record.
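The horizontal layout and the complete-data requirement above can be sketched in Python (used here only for illustration; it is not the SPSS procedure itself). The field names `id`, `wave` and `dcs` and the sample values are hypothetical.

```python
def to_wide(long_rows, measures):
    """Pivot one-record-per-wave data into one dict per person.

    Each measure is renamed with a wave suffix, e.g. dcs -> dcs_0, dcs_3.
    """
    wide = {}
    for rec in long_rows:
        person = wide.setdefault(rec["id"], {"id": rec["id"]})
        for m in measures:
            person[f"{m}_{rec['wave']}"] = rec.get(m)
    return list(wide.values())


def complete_cases(wide_rows, variables):
    """Keep only records that have a value for every clustering variable."""
    return [r for r in wide_rows
            if all(r.get(v) is not None for v in variables)]


# Hypothetical example data: person 2 has no 3-month record.
long_rows = [
    {"id": 1, "wave": 0, "dcs": 4},
    {"id": 1, "wave": 3, "dcs": 2},
    {"id": 2, "wave": 0, "dcs": 5},
]
wide = to_wide(long_rows, ["dcs"])
usable = complete_cases(wide, ["dcs_0", "dcs_3"])
```

Here only person 1 would receive a cluster, which is why the missing-data steps on the following slides come before the cluster run.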


Handling Missing Data

• Scale level: For a scale that has shown good internal consistency (alpha > .7), we calculate the score from the average of the answers, as long as there are at least 3 valid answers:

– Compute dcs=rnd(mean.3(l3a15d,l3a16d,l3a17d,l3a18d,l3a19d)*5).

• Item level: random replacement of missing values

– sort cases by loc xchk1.

– rmv ms2w=median(s2w,2).

– compute ms2w=rnd(ms2w).

– This replaces a missing s2w with the median of the 4 surrounding cases (2 above and 2 below in the sort order)
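The two replacement rules on this slide can be sketched in Python to make the arithmetic explicit. This is an illustration under stated assumptions, not the SPSS implementation: the function names are hypothetical, missing values are represented as `None`, the median is taken over the nearest valid values on each side, and a value is left missing when fewer than `span` valid neighbors exist on either side.

```python
import math
from statistics import median


def spss_rnd(x):
    """Round half away from zero (Python's built-in round() rounds half to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def prorated_scale(items, min_valid=3):
    """Mean of the valid items times the item count, rounded.

    Mirrors rnd(mean.3(...)*5) for a 5-item scale: returns None when
    fewer than min_valid items were answered.
    """
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return spss_rnd(sum(valid) / len(valid) * len(items))


def median_fill(values, span=2):
    """Fill each missing value with the rounded median of nearby valid values.

    Assumption: uses the `span` nearest valid values above and below in the
    current sort order, and leaves the value missing if either side is short.
    """
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        above = [x for x in values[:i] if x is not None][-span:]
        below = [x for x in values[i + 1:] if x is not None][:span]
        if len(above) == span and len(below) == span:
            filled[i] = spss_rnd(median(above + below))
    return filled
```

Because the fill depends on neighboring rows, the sort step matters: sorting by site and a related variable first means the neighbors are similar cases rather than arbitrary ones.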


Handling Missing Data

• Replacement of variables across time

• For scales where items were not asked:

– Use regression on the scale, using the other items in the cluster at that wave along with the intake and last-wave values

• For missing a wave of data: As long as it is not the first or last wave, interpolate using the average of the two surrounding waves.
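The wave interpolation rule can be sketched as follows. This is a Python illustration with a hypothetical function name, assuming one person's values are listed in wave order, missing waves are `None`, and both neighboring waves must be observed for a value to be filled.

```python
def interpolate_waves(values):
    """Fill a missing interior wave with the mean of its two neighbors.

    The first and last waves are never filled, and a gap is left alone
    when either adjacent wave is also missing.
    """
    filled = list(values)
    for i in range(1, len(values) - 1):
        prev, nxt = values[i - 1], values[i + 1]
        if values[i] is None and prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2
    return filled


# Hypothetical scores at intake, 3, 6 and 9 months; the 3-month wave is missing.
example = interpolate_waves([10, None, 6, 4])
```

A simple average treats the two neighbors as equally distant in time, so it fits evenly spaced waves best.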


Running the Cluster Analysis

• Sample syntax

– CLUSTER Zpci_0 Zrpci_3 Zrpci_6 Zrpci_9 Zrpci_12 Zpci_30 Zici_0 Zrici_3 Zrici_6 Zrici_9 Zrici_12 ZSco01 Zmdci_0 Zrdci_3 Zrdci_6 Zrdci_9 Zrdci_12 ZSco02 Zl3v_0 Zrl3d_3 Zrl3d_6 Zrl3d_9 Zrl3d_12 Zl3d_30 Zl3w_0 Zrl3e_3 Zrl3e_6 Zrl3e_9 Zrl3e_12 Zl3e_30 Zmaxce_0 Zrmaxce_3 Zrmaxce_6 Zrmaxce_9 Zrmaxce_12 Zmaxce_30

– /METHOD WARD

– /MEASURE= SEUCLID

– /PRINT SCHEDULE

– /PLOTS NONE

– /SAVE CLUSTER(2,12) .
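To show what this run computes, here is a minimal pure-Python sketch of Ward's method on squared Euclidean distance that returns a membership list for each solution in a range, in the way /SAVE CLUSTER(2,12) saves one membership variable per solution from 2 to 12 clusters. It is an illustration, not SPSS's implementation: the function names are hypothetical, the Z prefix is assumed to mean the inputs were standardized beforehand, and cluster numbering may differ from SPSS output.

```python
from itertools import combinations


def zscore(column):
    """Standardize one variable to mean 0 and SD 1 (sample SD, n - 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((x - mean) ** 2 for x in column) / (n - 1)) ** 0.5
    return [(x - mean) / sd for x in column]


def ward_memberships(rows, k_min, k_max):
    """Agglomerate rows by Ward's criterion; return {k: labels} for k_min..k_max."""
    # Each cluster is stored as (member row indices, centroid).
    clusters = {i: ([i], list(r)) for i, r in enumerate(rows)}

    def cost(pair):
        # Increase in within-cluster sum of squares if the pair is merged.
        (ma, ca), (mb, cb) = clusters[pair[0]], clusters[pair[1]]
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * d2

    solutions = {}
    while True:
        k = len(clusters)
        if k_min <= k <= k_max:
            # Number clusters 1..k by their lowest-indexed member.
            order = sorted(clusters, key=lambda c: min(clusters[c][0]))
            labels = [0] * len(rows)
            for label, key in enumerate(order, start=1):
                for i in clusters[key][0]:
                    labels[i] = label
            solutions[k] = labels
        if k <= max(k_min, 1):
            break
        a, b = min(combinations(clusters, 2), key=cost)
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
    return solutions


# Hypothetical two-variable data with three visible groups.
rows = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
solutions = ward_memberships(rows, 2, 3)
```

Saving a range of solutions in one pass is what makes the later comparison across numbers of clusters possible without rerunning the analysis.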


Demonstration

• Purpose

• To show how to take the cluster results and create a table and figures for validating and deciding on the proper number of clusters.

• Will cover pivot tables in SPSS output, pasting into Excel and graphing in Excel
