34
SAS Global Forum SAS Global Forum April 24, 2012 April 24, 2012 Dynamically Evolving Dynamically Evolving Systems: Systems: Cluster Analysis Using Time Cluster Analysis Using Time David J. Corliss, PhD David J. Corliss, PhD Magnify Analytic Solutions Magnify Analytic Solutions WINNER BEST CONTRIBUTED STATS PAPER 2012 SAS GLOBAL FORUM

Dynamically Evolving Systems: Cluster Analysis Using Time

Embed Size (px)

DESCRIPTION

David Corliss, PhD originally presented "Dynamically Evolving Systems: Cluster Analysis Using Time" at the SAS Global Forum in 2012.

Citation preview

Page 1: Dynamically Evolving Systems: Cluster Analysis Using Time

SAS Global ForumSAS Global ForumApril 24, 2012April 24, 2012

Dynamically Evolving Dynamically Evolving Systems:Systems:

Cluster Analysis Using TimeCluster Analysis Using Time

David J. Corliss, PhDDavid J. Corliss, PhD

Magnify Analytic SolutionsMagnify Analytic SolutionsWINNER

BEST CONTRIBUTED STATS PAPER

2012 SAS GLOBAL FORUM

WINNER BEST

CONTRIBUTED STATS PAPER

2012 SAS GLOBAL FORUM

Page 2: Dynamically Evolving Systems: Cluster Analysis Using Time

OUTLINE

Overview and HistoryOverview and History

A Basic ExampleA Basic Example

Plotting the ResultsPlotting the Results

StandardizationStandardizationNon-Periodic Non-Periodic

EventsEventsSummarySummary

Page 3: Dynamically Evolving Systems: Cluster Analysis Using Time

A Typical Cluster Analysis in SAS

Page 4: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 5: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 6: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 7: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 8: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 9: Dynamically Evolving Systems: Cluster Analysis Using Time

title2 'Preliminary Analysis by FASTCLUS'; proc fastclus data=iris summary maxc=10 maxiter=99 converge=0 mean=mean out=prelim cluster=preclus; var petal: sepal:; run;

An Example of PROC FASTCLUS

Page 10: Dynamically Evolving Systems: Cluster Analysis Using Time

An Example of PROC FASTCLUS

Page 11: Dynamically Evolving Systems: Cluster Analysis Using Time

Cluster Analysis AppliedCluster Analysis Appliedto Time Series Datato Time Series Data

**** NOAA Precipitation Data ****; data work.noaa; infile "/home/sas/NESUG/noaa_mi_1950_2009_tab.txt" dsd dlm='09'x lrecl=1500 truncover firstobs=2; input state_code :3.0 division :3.0 year_month :$6. pcp :6.2 ; length year 8.0 month 8.0; year = left(year_month,1,4); month = right(year_month,5,2); run;

Page 12: Dynamically Evolving Systems: Cluster Analysis Using Time

Plotting the ResultsPlotting the Results

goptions device=png;symbol1 font=marker value=u height=0.6 c=blue;symbol2 font=marker value=u height=0.6 c=red;symbol3 font=marker value=u height=0.6 c=yellow;symbol4 font=marker value=u height=0.6 c=green;legend1 frame cframe=ligr label=none cborder=black position=center value=(justify=center);axis1 label=(angle=90 rotate=0) minor=none;axis2 minor=none; proc gplot data=work.cluster3; plot year * month = cluster /frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;run;

Page 13: Dynamically Evolving Systems: Cluster Analysis Using Time

Plotting the ResultsPlotting the Results

goptions device=png;symbol1 font=marker value=u height=0.6 c=blue;symbol2 font=marker value=u height=0.6 c=red;symbol3 font=marker value=u height=0.6 c=yellow;symbol4 font=marker value=u height=0.6 c=green;legend1 frame cframe=ligr label=none cborder=black position=center value=(justify=center);axis1 label=(angle=90 rotate=0) minor=none;axis2 minor=none; proc gplot data=work.cluster3; plot year * month = cluster /frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;run;

Page 14: Dynamically Evolving Systems: Cluster Analysis Using Time

Plotting the ResultsPlotting the Results

goptions device=png;symbol1 font=marker value=u height=0.6 c=blue;symbol2 font=marker value=u height=0.6 c=red;symbol3 font=marker value=u height=0.6 c=yellow;symbol4 font=marker value=u height=0.6 c=green;legend1 frame cframe=ligr label=none cborder=black position=center value=(justify=center);axis1 label=(angle=90 rotate=0) minor=none;axis2 minor=none; proc gplot data=work.cluster3; plot year * month = cluster /frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;run;

Page 15: Dynamically Evolving Systems: Cluster Analysis Using Time

Plotting the ResultsPlotting the Results

goptions device=png;symbol1 font=marker value=u height=0.6 c=blue;symbol2 font=marker value=u height=0.6 c=red;symbol3 font=marker value=u height=0.6 c=yellow;symbol4 font=marker value=u height=0.6 c=green;legend1 frame cframe=ligr label=none cborder=black position=center value=(justify=center);axis1 label=(angle=90 rotate=0) minor=none;axis2 minor=none; proc gplot data=work.cluster3; plot year * month = cluster /frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;run;

Page 16: Dynamically Evolving Systems: Cluster Analysis Using Time

Cluster Analysis AppliedCluster Analysis Appliedto Time Series Datato Time Series Data

Page 17: Dynamically Evolving Systems: Cluster Analysis Using Time

Rate of Change VariablesRate of Change Variables**** interval percent change ****; proc sort data=work.doe; by year week;run; data work.doe; set work.doe; by year week; retain pw_price_index; output; if first.week then pw_price_index = annualized_price_index;run; data work.doe; set work.doe; by year week; weekly_pct_change = (annualized_price_index - pw_price_index) * 100;run; 

Page 18: Dynamically Evolving Systems: Cluster Analysis Using Time

Rate of Change VariablesRate of Change Variables**** interval percent change ****; proc sort data=work.doe; by year week;run; data work.doe; set work.doe; by year week; retain pw_price_index; output; if first.week then pw_price_index = annualized_price_index;run; data work.doe; set work.doe; by year week; weekly_pct_change = (annualized_price_index - pw_price_index) * 100;run; 

Page 19: Dynamically Evolving Systems: Cluster Analysis Using Time

Rate of Change VariablesRate of Change Variables**** interval percent change ****; proc sort data=work.doe; by year week;run; data work.doe; set work.doe; by year week; retain pw_price_index; output; if first.week then pw_price_index = annualized_price_index;run; data work.doe; set work.doe; by year week; weekly_pct_change = (annualized_price_index - pw_price_index) * 100;run; 

Page 20: Dynamically Evolving Systems: Cluster Analysis Using Time

Rate of Change VariablesRate of Change Variables**** interval percent change ****; proc sort data=work.doe; by year week;run; data work.doe; set work.doe; by year week; retain pw_price_index; output; if first.week then pw_price_index = annualized_price_index;run; data work.doe; set work.doe; by year week; weekly_pct_change = (annualized_price_index - pw_price_index) * 100;run; 

Page 21: Dynamically Evolving Systems: Cluster Analysis Using Time

Standardization of VariablesStandardization of Variables

proc standard data=work.doe mean=0 std=1 out=work.doe_stan; var week annualized_price_index weekly_pct_change supply supply_pct_change;run; proc fastclus data=work.doe_stan maxc=6 maxiter=20 out=work.cluster1; var week annualized_price_index weekly_pct_change supply supply_pct_change; run;  

Page 22: Dynamically Evolving Systems: Cluster Analysis Using Time

Use of PROC STANDARDUse of PROC STANDARD

proc standard data=work.doe mean=0 std=1 out=work.doe_stan; var week annualized_price_index weekly_pct_change supply supply_pct_change;run; proc fastclus data=work.doe_stan maxc=6 maxiter=20 out=work.cluster1; var week annualized_price_index weekly_pct_change supply supply_pct_change; run;  

Page 23: Dynamically Evolving Systems: Cluster Analysis Using Time

Use of PROC STANDARDUse of PROC STANDARD

proc standard data=work.doe mean=0 std=1 out=work.doe_stan; var week annualized_price_index weekly_pct_change supply supply_pct_change;run; proc fastclus data=work.doe_stan maxc=6 maxiter=20 out=work.cluster1; var week annualized_price_index weekly_pct_change supply supply_pct_change; run;  

Page 24: Dynamically Evolving Systems: Cluster Analysis Using Time

Standardization of Volatile DataStandardization of Volatile Data

Page 25: Dynamically Evolving Systems: Cluster Analysis Using Time

Identification of SeasonsIdentification of Seasons

Blue: Post-holiday lull

Purple: Winter Season

Red: Spring Run-upAmber: Summer Driving

SeasonYellow: Holiday Spikes

Green: Fall Season

Page 26: Dynamically Evolving Systems: Cluster Analysis Using Time

Non-Periodic EventsNon-Periodic Eventsof Variable Durationof Variable Duration

**** Time series of high velocity absorption events ****; data work.first_date; set work.hva; by event_id jd_minus_24e5; if first.event_id; first_date = jd_minus_24e5; keep event_id first_date;run; (and similarly for the last date of each event, which are merged with the first date) data work.hva; merge work.hva work.first_last; by event_id; percent_duration = (day_of_event / duration) * 100;run;

Page 27: Dynamically Evolving Systems: Cluster Analysis Using Time

One-Time EventsOne-Time Eventsof Variable Durationof Variable Duration

**** Time series of high velocity absorption events ****; data work.first_date; set work.hva; by event_id jd_minus_24e5; if first.event_id; first_date = jd_minus_24e5; keep event_id first_date;run; (and similarly for the last date of each event, which are merged with the first date) data work.hva; merge work.hva work.first_last; by event_id; percent_duration = (day_of_event / duration) * 100;run;

Page 28: Dynamically Evolving Systems: Cluster Analysis Using Time

Non-Periodic EventsNon-Periodic Eventsof Variable Durationof Variable Duration

**** Time series of high velocity absorption events ****; data work.first_date; set work.hva; by event_id jd_minus_24e5; if first.event_id; first_date = jd_minus_24e5; keep event_id first_date;run; (and similarly for the last date of each event, which are merged with the first date) data work.hva; merge work.hva work.first_last; by event_id; percent_duration = (day_of_event / duration) * 100;run;

Page 29: Dynamically Evolving Systems: Cluster Analysis Using Time

Non-Periodic EventsNon-Periodic Eventsof Variable Durationof Variable Duration

**** Time series of high velocity absorption events ****; data work.first_date; set work.hva; by event_id jd_minus_24e5; if first.event_id; first_date = jd_minus_24e5; keep event_id first_date;run; (and similarly for the last date of each event, which are merged with the first date) data work.hva; merge work.hva work.first_last; by event_id; percent_duration = (day_of_event / duration) * 100;run;

Page 30: Dynamically Evolving Systems: Cluster Analysis Using Time

Absolute versusAbsolute versusRelative MeasuresRelative Measures

Page 31: Dynamically Evolving Systems: Cluster Analysis Using Time

Time Series Clustering Time Series Clustering of Non-Periodic Eventsof Non-Periodic Events

Page 32: Dynamically Evolving Systems: Cluster Analysis Using Time

SummaryThe SAS procedures developed for cluster analysis can be applied to events that change over time to identify distinct, successive stages in dynamically evolving systems.

Both cyclical and non-repeating events may be analyzed.

Normalization of fields through the use of PROC STANDARD may be necessary prior to cluster analysis to obtain the best results.

Page 33: Dynamically Evolving Systems: Cluster Analysis Using Time

References

Fisher, R.A., 1936, Annals of Eugenics, 7, 2, 179

Tryon, R. C., 1939, Cluster analysis. Ann Arbor: Edwards Brothers

The SAS Institute, Cary, N.C. www.support.sas.com

Page 34: Dynamically Evolving Systems: Cluster Analysis Using Time

Questions

David J. Corliss

[email protected]

[email protected]

http://astro1.panet.utoledo.edu/~dcorliss