David Corliss, PhD originally presented "Dynamically Evolving Systems: Cluster Analysis Using Time" at the SAS Global Forum in 2012.
SAS Global Forum
April 24, 2012
Dynamically Evolving Systems: Cluster Analysis Using Time
David J. Corliss, PhD
Magnify Analytic Solutions
WINNER: BEST CONTRIBUTED STATS PAPER, 2012 SAS GLOBAL FORUM
OUTLINE
Overview and History
A Basic Example
Plotting the Results
Standardization
Non-Periodic Events
Summary
A Typical Cluster Analysis in SAS
An Example of PROC FASTCLUS
   title2 'Preliminary Analysis by FASTCLUS';
   proc fastclus data=iris summary maxc=10 maxiter=99 converge=0
                 mean=mean out=prelim cluster=preclus;
      var petal: sepal:;
   run;
Cluster Analysis Applied to Time Series Data
   **** NOAA Precipitation Data ****;
   data work.noaa;
      infile "/home/sas/NESUG/noaa_mi_1950_2009_tab.txt"
             dsd dlm='09'x lrecl=1500 truncover firstobs=2;
      input state_code :3.0
            division   :3.0
            year_month :$6.
            pcp        :6.2;
      length year 8 month 8;
      /* year_month is a YYYYMM character string: split it with SUBSTR
         and convert each piece to a number with INPUT */
      year  = input(substr(year_month, 1, 4), 4.);
      month = input(substr(year_month, 5, 2), 2.);
   run;
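The deck goes from reading the NOAA data straight to plotting a data set named work.cluster3, so the clustering step itself is not shown. A minimal sketch of what such a step might look like, assuming the clusters are formed on month and precipitation and that four clusters are wanted (a guess based on the four SYMBOL statements in the plotting code), is given here. The choice of VAR variables, maxc=4 and maxiter=20 are assumptions, not the author's settings.

```sas
/* Hypothetical sketch: cluster the NOAA observations and save the
   assignments. The OUT= data set keeps the input variables and adds
   CLUSTER and DISTANCE, so YEAR, MONTH and CLUSTER are all available
   for a later plot. */
proc fastclus data=work.noaa
              maxc=4          /* assumed number of clusters */
              maxiter=20      /* assumed iteration limit */
              out=work.cluster3;
   var month pcp;             /* assumed clustering variables */
run;
```

Because month and pcp are on different scales, results from a sketch like this depend heavily on units, which is the motivation for the standardization slides later in the deck.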
Plotting the Results
   goptions device=png;
   symbol1 font=marker value=u height=0.6 c=blue;
   symbol2 font=marker value=u height=0.6 c=red;
   symbol3 font=marker value=u height=0.6 c=yellow;
   symbol4 font=marker value=u height=0.6 c=green;
   legend1 frame cframe=ligr label=none cborder=black
           position=center value=(justify=center);
   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;
   proc gplot data=work.cluster3;
      plot year * month = cluster / frame cframe=ligr legend=legend1
                                    vaxis=axis1 haxis=axis2;
   run;
Cluster Analysis Applied to Time Series Data
Rate of Change Variables
   **** interval percent change ****;
   proc sort data=work.doe;
      by year week;
   run;
   /* Attach the previous week's index to each row. OUTPUT writes the
      row before pw_price_index is updated, so every row carries the
      value retained from the prior week (missing on the first row). */
   data work.doe;
      set work.doe;
      by year week;
      retain pw_price_index;
      output;
      if first.week then pw_price_index = annualized_price_index;
   run;
   /* Change in the index from the prior week, scaled by 100 */
   data work.doe;
      set work.doe;
      by year week;
      weekly_pct_change = (annualized_price_index - pw_price_index) * 100;
   run;
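The retained-variable step above puts the previous week's index on each row. As a sketch of a variant, assuming work.doe holds one row per week and is already sorted by year and week, the LAG function returns the same previous value in a single DATA step, and dividing by it gives a change relative to the prior week. This is a different measure from the weekly_pct_change computed in the deck, and the names work.doe_lag, prev_index and rel_pct_change are hypothetical.

```sas
/* Hypothetical variant: previous-week value via LAG, with the change
   expressed relative to that previous value. */
data work.doe_lag;
   set work.doe;
   /* LAG must run on every row so its queue stays aligned */
   prev_index = lag(annualized_price_index);
   /* First row has no prior week; also guard against dividing by zero */
   if not missing(prev_index) and prev_index ne 0 then
      rel_pct_change = (annualized_price_index - prev_index)
                       / prev_index * 100;
run;
```

On the first observation prev_index is missing, so rel_pct_change is left missing rather than computed.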
Standardization of Variables
Use of PROC STANDARD
   proc standard data=work.doe mean=0 std=1 out=work.doe_stan;
      var week annualized_price_index weekly_pct_change
          supply supply_pct_change;
   run;
   proc fastclus data=work.doe_stan maxc=6 maxiter=20 out=work.cluster1;
      var week annualized_price_index weekly_pct_change
          supply supply_pct_change;
   run;
Standardization of Volatile Data
Identification of Seasons
Blue: Post-holiday lull
Purple: Winter Season
Red: Spring Run-up
Amber: Summer Driving Season
Yellow: Holiday Spikes
Green: Fall Season
Non-Periodic Events of Variable Duration
   **** Time series of high velocity absorption events ****;
   data work.first_date;
      set work.hva;
      by event_id jd_minus_24e5;
      if first.event_id;
      first_date = jd_minus_24e5;
      keep event_id first_date;
   run;
   /* (and similarly for the last date of each event, which are merged
      with the first date) */
   data work.hva;
      merge work.hva work.first_last;
      by event_id;
      percent_duration = (day_of_event / duration) * 100;
   run;
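The code above leaves out the step that builds work.first_last and does not show how duration and day_of_event are derived. One way this could be sketched, assuming duration is the last date minus the first date of an event and day_of_event is the observation date minus the first date, is shown here. Those two definitions, the variable last_date and the output name work.hva_pct are assumptions for illustration, not taken from the presentation.

```sas
/* Hypothetical sketch of the omitted step: one row per event holding
   its first date, last date and duration. */
proc sort data=work.hva;
   by event_id jd_minus_24e5;
run;

data work.first_last;
   set work.hva;
   by event_id jd_minus_24e5;
   retain first_date;
   if first.event_id then first_date = jd_minus_24e5;
   if last.event_id then do;
      last_date = jd_minus_24e5;
      duration  = last_date - first_date;      /* assumed definition */
      output;                                  /* one row per event */
   end;
   keep event_id first_date last_date duration;
run;

/* Attach the per-event values to every observation and express
   elapsed time as a percentage of the event's total duration. */
data work.hva_pct;
   merge work.hva work.first_last;
   by event_id;
   day_of_event = jd_minus_24e5 - first_date;  /* assumed definition */
   if duration > 0 then
      percent_duration = (day_of_event / duration) * 100;
run;
```

Events with a single observation have a duration of zero, so percent_duration is left missing for them instead of dividing by zero.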
Absolute versus Relative Measures
Time Series Clustering of Non-Periodic Events
Summary
The SAS procedures developed for cluster analysis can be applied to events that change over time to identify distinct, successive stages in dynamically evolving systems.
Both cyclical and non-repeating events may be analyzed.
Normalization of fields through the use of PROC STANDARD may be necessary prior to cluster analysis to obtain the best results.
References
Fisher, R.A., 1936, Annals of Eugenics, 7, 2, 179
Tryon, R. C., 1939, Cluster analysis. Ann Arbor: Edwards Brothers
The SAS Institute, Cary, N.C. www.support.sas.com
Questions
David J. Corliss
http://astro1.panet.utoledo.edu/~dcorliss