Transcript
Page 1: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Time Series Nonlinear Forecasting; Visualization; Applications

CSE 6242 / CX 4242 Mar 27, 2014

Duen Horng (Polo) ChauGeorgia Tech

Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

Page 2: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Last TimeSimilarity search"

• Euclidean distance"• Time-warping"

Linear Forecasting"

• AR (Auto Regression) methodology"• RLS (Recursive Least Square)

= fast, incremental least square

2

Page 3: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

This TimeLinear Forecasting"

• Co-evolving time sequences"Non-linear forecasting"

• Lag-plots + k-NN"Visualization and Applications

3

Page 4: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Co-Evolving Time Sequences• Given: A set of correlated time sequences • Forecast ‘Repeated(t)’

Num

ber

of p

acke

ts

0

23

45

68

90

Time Tick

1 4 6 9 11

sentlostrepeated

??

Page 5: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Solution:

Q: what should we do?

Page 6: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Solution:

Least Squares, with • Dep. Variable: Repeated(t) • Indep. Variables:

• Sent(t-1) … Sent(t-w); • Lost(t-1) …Lost(t-w); • Repeated(t-1), Repeated(t-w)

• (named: ‘MUSCLES’ [Yi+00])N

umbe

r of

pa

cket

s

023456890

Time Tick

1 4 6 9 11

sentlostrepeated

Page 7: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Forecasting - Outline

• Auto-regression • Least Squares; recursive least squares • Co-evolving time sequences • Examples • Conclusions

Page 8: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Examples - Experiments• Datasets

– Modem pool traffic (14 modems, 1500 time-ticks; #packets per time unit)

– AT&T WorldNet internet usage (several data streams; 980 time-ticks)

• Measures of success – Accuracy : Root Mean Square Error (RMSE)

Page 9: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Accuracy - “Modem”

MUSCLES outperforms AR & “yesterday”

RMSE

0

1

2

3

4

Modems1 2 3 4 5 6 7 8 9 10 11 12 13 14

ARyesterdayMUSCLES

Page 10: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Accuracy - “Internet”RMSE

0

0.35

0.7

1.05

1.4

Streams1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

ARyesterdayMUSCLES

MUSCLES consistently outperforms AR & “yesterday”

Page 11: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Linear forecasting - Outline

• Auto-regression • Least Squares; recursive least squares • Co-evolving time sequences • Examples • Conclusions

Page 12: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Conclusions - Practitioner’s guide

• AR(IMA) methodology: prevailing method for linear forecasting

• Brilliant method of Recursive Least Squares for fast, incremental estimation.

Page 13: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Resources: software and urls

• MUSCLES: Prof. Byoung-Kee Yi: http://www.postech.ac.kr/~bkyi/ or [email protected]

• R http://cran.r-project.org/

Page 14: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Books

• George E.P. Box and Gwilym M. Jenkins and Gregory C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.)

• Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York, Springer Verlag.

Page 15: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Additional Reading

• [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003

• [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

Page 16: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Outline

• Motivation • ... • Linear Forecasting • Non-linear forecasting • Conclusions

Page 17: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Chaos & non-linear forecasting

Page 18: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Reference:

[ Deepay Chakrabarti and Christos Faloutsos F4: Large-Scale Automated Forecasting using Fractals CIKM 2002, Washington DC, Nov. 2002.]

Page 19: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Detailed Outline

• Non-linear forecasting – Problem – Idea – How-to – Experiments – Conclusions

Page 20: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Recall: Problem #1

Given a time series {xt}, predict its future course, that is, xt+1, xt+2, ...Time

Value

Page 21: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Datasets

Logistic Parabola: xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]

time

x(t)

Lag-plotARIMA: fails

Page 22: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

How to forecast?

• ARIMA - but: linearity assumption

Lag-plotARIMA: fails

Page 23: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

How to forecast?

• ARIMA - but: linearity assumption "

"

• ANSWER: ‘Delayed Coordinate Embedding’ = Lag Plots [Sauer92]

~ nearest-neighbor search, for past incidents

Page 24: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xtLag = 1, k = 4 NN

Page 25: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xt

New Point

Lag = 1, k = 4 NN

Page 26: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xt

4-NNNew Point

Lag = 1, k = 4 NN

Page 27: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xt

4-NNNew Point

Lag = 1, k = 4 NN

Page 28: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xt

4-NNNew Point

Interpolate these…

Lag = 1, k = 4 NN

Page 29: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

General Intuition (Lag Plot)

xt-1

xt

4-NNNew Point

Interpolate these…

To get the final prediction

Lag = 1, k = 4 NN

Page 30: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Questions:

• Q1: How to choose lag L? • Q2: How to choose k (the # of NN)? • Q3: How to interpolate? • Q4: why should this work at all?

Page 31: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q1: Choosing lag L

• Manually (16, in award winning system by [Sauer94])

Page 32: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q2: Choosing number of neighbors k

• Manually (typically ~ 1-10)

Page 33: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q3: How to interpolate?

How do we interpolate between the k nearest neighbors? A3.1: Average A3.2: Weighted average (weights drop with distance - how?)

Page 34: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q3: How to interpolate?

A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)

Xt-1

xt

Page 35: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q3: How to interpolate?

A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)

Xt-1

xt

Page 36: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q3: How to interpolate?

A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)

Xt-1

xt

Page 37: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q3: How to interpolate?

A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)

Xt-1

xt

Page 38: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Q4: Any theory behind it?

A4: YES!

Page 39: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Theoretical foundation

• Based on the ‘Takens theorem’ [Takens81] • which says that long enough delay vectors can

do prediction, even if there are unobserved variables in the dynamical system (= diff. equations)

Page 40: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Detailed Outline

• Non-linear forecasting – Problem – Idea – How-to – Experiments – Conclusions

Page 41: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Datasets

Logistic Parabola: xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]

time

x(t)

Lag-plot

Page 42: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Datasets

Logistic Parabola: xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]

time

x(t)

Lag-plotARIMA: fails

Page 43: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Logistic Parabola

Timesteps

Value

Our Prediction from here

Page 44: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Logistic Parabola

Timesteps

Value

Comparison of prediction to correct values

Page 45: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Datasets

LORENZ: Models convection currents in the air dx / dt = a (y - x) dy / dt = x (b - z) - y dz / dt = xy - c z

Value

Page 46: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

LORENZ

Timesteps

Value

Comparison of prediction to correct values

Page 47: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Datasets

Time

Value

• LASER: fluctuations in a Laser over time (used in Santa Fe competition)

Page 48: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Laser

Timesteps

Value

Comparison of prediction to correct values

Page 49: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Conclusions

• Lag plots for non-linear forecasting (Takens’ theorem)

• suitable for ‘chaotic’ signals

Page 50: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

References

• Deepay Chakrabarti and Christos Faloutsos F4: Large-Scale Automated Forecasting using Fractals CIKM 2002, Washington DC, Nov. 2002.

• Sauer, T. (1994). Time series prediction using delay coordinate embedding. (in book by Weigend and Gershenfeld, below) Addison-Wesley.

• Takens, F. (1981). Detecting strange attractors in fluid turbulence. Dynamical Systems and Turbulence. Berlin: Springer-Verlag.

Page 51: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

References

• Weigend, A. S. and N. A. Gerschenfeld (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison Wesley. (Excellent collection of papers on chaotic/non-linear forecasting, describing the algorithms behind the winners of the Santa Fe competition.)

Page 52: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Overall conclusions

• Similarity search: Euclidean/time-warping; feature extraction and SAMs

• Linear Forecasting: AR (Box-Jenkins) methodology;

• Non-linear forecasting: lag-plots (Takens)

Page 53: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Must-Read Material• Byong-Kee Yi, Nikolaos D. Sidiropoulos,

Theodore Johnson, H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online Data Mining for Co-Evolving Time Sequences, ICDE, Feb 2000.

• Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity Estimation Using Query Feedbacks, SIGMOD 1994

Page 54: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Thanks

Deepay Chakrabarti (CMU) "

"

Spiros Papadimitriou (CMU) "

"

Prof. Byoung-Kee Yi (Pohang U.)

""

Prof. Dimitris Gunopulos (UCR) "

"

Mengzhi Wang (CMU)

Page 55: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Time Series Visualization + Applications

47

Page 56: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Why Time Series Visualization?Time series is the most common data type"

• But why is time series so common?

48

Page 57: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

How to build time series visualization?

Easy way: use existing tools, libraries"

• Google Public Data Explorer (Gapminder)http://goo.gl/HmrH"

• Google acquired Gapminder http://goo.gl/43avY(Hans Rosling’s TED talk http://goo.gl/tKV7)"

• Google Annotated Time Line http://goo.gl/Upm5W"

• Timeline, from MIT’s SIMILE projecthttp://simile-widgets.org/timeline/"

• Timeplot, also from SIMILEhttp://simile-widgets.org/timeplot/"

• Excel, of course49

Page 58: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

How to build time series visualization?

The harder way:"

• R (ggplot2)"• Matlab"• gnuplot"• ..."

The even harder way:"

• D3, for web"• JFreeChart (Java)"• ...

50

Page 59: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Time Series VisualizationWhy is it useful?"

When is visualization useful?"

(Why not automate everything? Like using the forecasting techniques you learned last time.)

51

Page 60: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Time Series User Tasks• When was something greatest/least?

• Is there a pattern?

• Are two series similar?

• Do any of the series match a pattern?

• Provide simpler, faster access to the series

• Does data element exist at time t ?

• When does a data element exist?

• How long does a data element exist?

• How often does a data element occur?

• How fast are data elements changing?

• In what order do data elements appear?

• Do data elements exist together?

Muller & Schumann 03 citing MacEachern 95

Page 61: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

horizontal axis is time

Page 62: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

http://www.patspapers.com/blog/item/what_if_everybody_flushed_at_once_Edmonton_water_gold_medal_hockey_game/

Page 63: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

http://www.patspapers.com/blog/item/what_if_everybody_flushed_at_once_Edmonton_water_gold_medal_hockey_game/

Page 64: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

Gantt Chart Useful for project

How to create in Excel: http://www.youtube.com/watch?v=sA67g6zaKOE

Page 65: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

ThemeRiver Stacked graph Streamgraph

http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html

http://bl.ocks.org/mbostock/3943967

Page 66: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

TimeSearcher support queries

http://hcil2.cs.umd.edu/video/2005/2005_timesearcher2.mpg

Page 67: Time Series - Georgia Institute of Technologypoloclub.gatech.edu/cse6242/2014spring/lectures/... · 3/27/2014  · Time Series User Tasks • When was something greatest/least?! •

GeoTimehttp://www.youtube.com/watch?v=inkF86QJBdA

59


Recommended