
STAT 153: Introduction to Time Series

Instructor: Aditya Guntuboyina

Lectures: 12:30 pm - 2 pm (Tuesdays and Thursdays)

Office Hours: 10 am - 11 am (Tuesdays and Thursdays) 423 Evans Hall

GSI: Brianna Heggeseth

Section: 10 am - 12 pm or 12 pm - 2 pm (Fridays)

Office Hours and Location: TBA

Announcements, lecture slides, assignments, etc. will be posted on the course site at bspace.

Tuesday, January 17, 12


Course Outline

A Time Series is a set of numerical observations, each one being recorded at a specific time.

The aim of this course is to teach you how to analyze such data.

Examples of Time Series data are ubiquitous.


[Figure: Population of the United States. x-axis: Year (once every ten years), 1800 to 2000; y-axis: US Population, 0.0e+00 to 3.0e+08.]


Course Outline (continued)

There are two approaches to time series analysis:

• Time Domain Approach

• Frequency Domain Approach (also known as the Spectral or Fourier analysis of time series)

Very roughly, 60% of the course will be on Time Domain methods and 40% on Frequency Domain methods.


Time Domain Approach

Seeks an answer to the following question:

Given the observed time series, how does one guess future values?

Forecasting or Prediction


Time Domain (continued)

Forecasting is carried out through three steps:

• Find a MODEL that adequately describes the evolution of the observed time series over time.

• Fit this model to the data. In other words, estimate the parameters of the model.

• Forecast based on the fitted model.
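
The three steps above can be sketched in R as follows. This is only an illustration: the AR(1) model and the built-in LakeHuron data set are assumptions made for the example, not choices stated in these slides.

```r
# Step 1: choose a model (assumed here: a stationary AR(1) with a mean term).
y <- LakeHuron                      # built-in annual series, 1875-1972

# Step 2: fit the model, i.e. estimate its parameters.
fit <- arima(y, order = c(1, 0, 0))
print(coef(fit))                    # estimated AR coefficient and mean level

# Step 3: forecast based on the fitted model.
fc <- predict(fit, n.ahead = 5)
print(fc$pred)                      # point forecasts for the next 5 years
print(fc$se)                        # their standard errors
```

In practice the model in step 1 would be chosen after looking at the data, so the AR(1) here should be read as a placeholder.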


Time Series Models

Most of our focus in the Time Domain part of the course will be on the following two classes of models:

• Trend + Seasonality + Stationary ARMA

• Differencing + Stationary ARMA (this class comprises the ARIMA and Seasonal ARIMA models)

In the Time Domain Part of the course, we study these models and learn how to execute each of the three steps outlined in the previous slide with them.
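
A rough sketch of the first move in each of the two model classes, using R's built-in USAccDeaths series. The use of decompose() for the first class and the particular differencing lags for the second are assumptions chosen for illustration; in both cases what is left over would then be modelled by a stationary ARMA process.

```r
y <- USAccDeaths                     # built-in monthly series, 1973-1978

# Class 1: estimate trend and seasonality, then model the remainder.
# decompose() is one simple (moving-average based) way to do this.
parts <- decompose(y)
remainder <- parts$random            # NA at both ends because of the moving average

# Class 2: difference the series instead of estimating trend and seasonality.
d12 <- diff(y, lag = 12)             # seasonal difference removes the yearly pattern
d1d12 <- diff(d12)                   # ordinary difference removes remaining trend

print(length(y))
print(length(d1d12))                 # 13 observations are lost to differencing
```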


Time Series Models (continued)

These provide a sturdy toolkit for analyzing many practical time series data sets.

State-Space models are a modern and very powerful class of time series models.

Forecasting in these models is carried out via an algorithm known as the Kalman Filter.

We shall spend some time on these models although we will not have time to study them in depth.


Frequency Domain Approach

[Figure: Brightness of a variable star on 600 consecutive nights. x-axis: day, 0 to 600; y-axis: Brightness, 0 to 35.]


Frequency Domain (continued)

Based on the idea that the observed time series is made up of CYCLES having different frequencies.

In the Frequency Domain Approach, the data are analyzed by discovering these hidden cycles along with their relative strengths.

The key tool here is the Discrete Fourier Transform (DFT) or, more specifically, a function of the DFT known as the Periodogram.
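
As a minimal sketch of how a periodogram can reveal a hidden cycle, the R code below builds a simulated series with a single cycle plus noise and computes the squared modulus of the DFT divided by n. The simulated series, its frequency of 0.05 cycles per observation, and the 1/n scaling are assumptions for illustration; other scalings of the periodogram are also in use.

```r
n <- 200
tt <- 1:n
set.seed(1)
# Assumed example: one hidden cycle with 20 observations per cycle
# (frequency 10/200 = 0.05 cycles per observation) plus noise.
x <- 2 * cos(2 * pi * tt * 10 / n) + rnorm(n)

dft <- fft(x)                           # DFT at frequencies 0, 1/n, ..., (n-1)/n
pgram <- Mod(dft)^2 / n                 # periodogram under one common scaling
freq <- (0:(n - 1)) / n

# Keep frequencies in (0, 1/2]; the remaining ones mirror these for real data.
keep <- 2:(n / 2 + 1)
peak <- freq[keep][which.max(pgram[keep])]
print(peak)                             # the largest ordinate sits at the hidden cycle's frequency
```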


Frequency Domain (continued)

In the Frequency Domain part of the course, we shall study the periodogram and its performance in discovering periodicities when the data are indeed made up of many different cycles.

It turns out that the raw periodogram is often too variable as an estimator of the true Spectrum and we shall study methods for improving it.
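
The variability of the raw periodogram can be seen on simulated white noise, whose true spectrum is flat. The sketch below compares the raw periodogram with a smoothed version from R's spec.pgram(); smoothing with a modified Daniell kernel via the spans argument is just one possible method, assumed here for illustration, and the slides do not say which methods the course will use.

```r
set.seed(2)
x <- rnorm(256)                       # white noise: its true spectrum is flat

# Raw periodogram (no tapering or detrending, to keep the comparison simple).
raw <- spec.pgram(x, taper = 0, detrend = FALSE, plot = FALSE)

# Smoothed periodogram: local averaging of neighbouring ordinates.
smooth <- spec.pgram(x, spans = c(9, 9), taper = 0, detrend = FALSE, plot = FALSE)

print(sd(raw$spec))                   # spread of the raw estimate around the flat spectrum
print(sd(smooth$spec))                # noticeably smaller after smoothing
```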


Rest of this Lecture

• Some more Time Series Data Examples

• Simplest Time Series Model: Purely Random Process (Section 3.4.1)

• Sample Autocorrelation Coefficients and the Correlogram (Section 2.7 and Page 56)


[Figure: Annual Measurements of the level of Lake Huron 1875-1972. x-axis: Year, 1880 to 1960; y-axis: Level in feet, 576 to 582.]


[Figure: Monthly Accidental Deaths in the US from 1973-1978. x-axis: Time, 1973 to 1979; y-axis: Number of Deaths, 7000 to 11000.]


The first step in the time domain analysis of a time series data set is to find a model that describes the evolution of the data over time well.

Basic Modelling Strategy:

• Start Simple

• Build Up

Simplest Model: $X_t,\ t = 1, \dots, n$ independent $N(0, \sigma^2)$

Purely Random Process or Gaussian White Noise
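
A short R sketch of this simplest model: it draws n independent N(0, sigma^2) values and plots them against time. The seed, n = 100, and sigma = 1 are assumptions for illustration.

```r
set.seed(153)                 # arbitrary seed, only for reproducibility
n <- 100
sigma <- 1                    # assumed value; the model allows any sigma > 0
x <- rnorm(n, mean = 0, sd = sigma)

plot(x, type = "l", xlab = "Time", ylab = "Purely Random Process")
print(mean(x))                # should be near 0
print(var(x))                 # should be near sigma^2
```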


[Figure: 100 Observations from Gaussian White Noise with unit Variance. x-axis: Time, 0 to 100; y-axis: Purely Random Process, -2 to 3.]


Is this data set from a purely random process?

[Figure: time plot of the data. x-axis: Time, 0 to 100; y-axis: Data, -3 to 2.]


How to check if a given time series is purely random?


Answer: Think in terms of Forecasting.


For a purely random series, the given data can NOT help in predicting $X_{n+1}$. The best estimate of $X_{n+1}$ is $E(X_{n+1}) = 0$.

In particular, $X_1$ cannot predict $X_2$, $X_2$ cannot predict $X_3$, and so on.


Therefore, the correlation coefficient between $Y = (X_1, \dots, X_{n-1})$ and $Z = (X_2, \dots, X_n)$ must be close to zero.
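
This check can be tried directly in R on a simulated purely random series. The simulated data and the seed are assumptions for illustration; the point is only that the ordinary correlation between the series and its one-step shift comes out small.

```r
set.seed(1)
n <- 100
x <- rnorm(n)                 # a purely random series

y <- x[1:(n - 1)]             # Y = (X_1, ..., X_{n-1})
z <- x[2:n]                   # Z = (X_2, ..., X_n)
r <- cor(y, z)                # ordinary (Pearson) correlation coefficient
print(r)                      # expected to be close to zero
```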


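The formula for the correlation between Y and Z is

$$ r = \frac{\sum_{t=1}^{n-1} \left( X_t - \bar{X}_{(1)} \right) \left( X_{t+1} - \bar{X}_{(2)} \right)}{\sqrt{\sum_{t=1}^{n-1} \left( X_t - \bar{X}_{(1)} \right)^2 \sum_{t=1}^{n-1} \left( X_{t+1} - \bar{X}_{(2)} \right)^2}} $$

where

$$ \bar{X}_{(1)} = \frac{\sum_{t=1}^{n-1} X_t}{n-1}, \qquad \bar{X}_{(2)} = \frac{\sum_{t=1}^{n-1} X_{t+1}}{n-1}. $$

This formula is usually simplified to obtain

$$ r_1 = \frac{\sum_{t=1}^{n-1} \left( X_t - \bar{X} \right) \left( X_{t+1} - \bar{X} \right)}{\sum_{t=1}^{n} \left( X_t - \bar{X} \right)^2}, \qquad \bar{X} = \frac{\sum_{t=1}^{n} X_t}{n}. $$

Note the subscript on the left hand side above.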


Sample Autocorrelation Coefficients

The quantity $r_1$ is called the Sample Autocorrelation Coefficient of $X_1, \dots, X_n$ at lag one. Lag one because this correlation is between $X_t$ and $X_{t+1}$.

When $X_1, \dots, X_n$ are obtained from a Purely Random Process, $r_1$ is close to zero, particularly when $n$ is large.

One can similarly consider Sample Autocorrelations at other lags:

$$ r_k = \frac{\sum_{t=1}^{n-k} \left( X_t - \bar{X} \right) \left( X_{t+k} - \bar{X} \right)}{\sum_{t=1}^{n} \left( X_t - \bar{X} \right)^2}, \qquad k = 1, 2, \dots $$
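
The lag-k formula can be written out directly in R and compared with the built-in acf() function. The helper name sample_acf and the simulated data are assumptions for illustration.

```r
# Sample autocorrelation at lag k, written directly from the formula.
# sample_acf is a hypothetical helper name, not a built-in function.
sample_acf <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  num <- sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar))
  den <- sum((x - xbar)^2)
  num / den
}

set.seed(1)
x <- rnorm(100)
by_hand <- sapply(1:5, function(k) sample_acf(x, k))
built_in <- acf(x, lag.max = 5, plot = FALSE)$acf[2:6]   # drop lag 0
print(all.equal(by_hand, built_in))
```

The two agree because acf() uses the overall mean and the full sum of squares in the denominator, as in the simplified formula.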


Correlogram

Mathematical Fact: When $X_1, \dots, X_n$ are obtained from a Purely Random process and $n$ is large, $r_1, r_2, \dots$ are approximately independent and distributed according to $N(0, 1/n)$.

So one way of testing if the series is purely random is to plot the sample autocorrelations. This plot is known as the Correlogram.

Use the function acf() in R to get the Correlogram.

ts.obs = rnorm(100)
acf(ts.obs, lag.max = 20, type = "correlation", plot = TRUE, drop.lag.0 = FALSE)


[Figure: Correlogram of a Purely Random Series of 100 Observations. x-axis: Lag, 0 to 20; y-axis: ACF, -0.2 to 1.0.]

The blue bands correspond to levels of $\pm 1.96/\sqrt{n}$.

The correlogram plots $r_k$ against $k$. $r_0$ always equals 1.
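
The band levels and the lags that fall outside them can also be computed without drawing the plot. The sketch below does this for a simulated purely random series; the seed and the series are assumptions for illustration.

```r
set.seed(1)
n <- 100
x <- rnorm(n)

band <- 1.96 / sqrt(n)                            # 0.196 when n = 100
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]   # r_1, ..., r_20 (lag 0 dropped)
outside <- which(abs(r) > band)

print(band)
print(outside)                                    # lags whose r_k falls outside the bands
```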


Interpreting the Correlogram

When $X_1, \dots, X_n$ are obtained from a Purely Random process, the probability that a fixed $r_k$ lies outside the blue bands is approximately 0.05.

A value of $r_k$ outside the blue bands is significant, i.e., it gives evidence against pure randomness.

However, the overall probability of getting at least one $r_k$ outside the bands increases with the number of coefficients plotted.

If 20 values of $r_k$ are plotted, one expects about one significant value under pure randomness (20 × 0.05 = 1).
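
The arithmetic behind this can be checked in a few lines of R. The second calculation treats the coefficients as independent, which holds only approximately, so the result should be read as a rough guide.

```r
alpha <- 0.05     # chance that one fixed r_k falls outside the bands
m <- 20           # number of coefficients plotted

expected_outside <- m * alpha          # 20 * 0.05 = 1
p_at_least_one <- 1 - (1 - alpha)^m    # treats the r_k as independent

print(expected_outside)
print(round(p_at_least_one, 3))        # about 0.64
```

So with 20 lags plotted, seeing one value just outside the bands is more likely than not even when the series is purely random.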


Rules of Thumb (for deciding if a correlogram indicates departure from randomness)

• A single $r_k$ just outside the bands may be ignored, but two or three values well outside indicate a departure from pure randomness.

• A single significant $r_k$ at a lag which has some physical interpretation, such as lag one or a lag corresponding to seasonal variation, also indicates evidence of non-randomness.

Chatfield (page 56)


Is this data set from a purely random process?

[Figure: time plot of the data. x-axis: Time, 0 to 100; y-axis: Data, -3 to 2.]


[Figure: Correlogram of the Data in the Previous Slide. x-axis: Lag, 0 to 20; y-axis: ACF, -0.2 to 1.0.]

This data was generated from a moving average process.
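
The slide does not state the order or coefficients of the moving average process used, so the sketch below assumes a first-order moving average with a hypothetical coefficient theta = 0.9, purely to show how such a series departs from pure randomness at lag one.

```r
set.seed(1)
n <- 100
z <- rnorm(n + 1)                       # white noise Z_0, ..., Z_n
theta <- 0.9                            # hypothetical MA(1) coefficient
x <- z[2:(n + 1)] + theta * z[1:n]      # X_t = Z_t + theta * Z_{t-1}

r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]
print(r[1])                             # theoretical value is theta / (1 + theta^2)
print(1.96 / sqrt(n))                   # band level for comparison
```

For this assumed model the lag-one autocorrelation is about 0.5 in theory, well outside the bands, while the theoretical autocorrelations at lags two and beyond are zero.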


Is this data set from a purely random process?

[Figure: time plot of the data. x-axis: Time, 0 to 100; y-axis: Data, -3 to 3.]


[Figure: Correlogram of the Data in the Previous Slide. x-axis: Lag, 0 to 20; y-axis: ACF, -0.4 to 1.0.]

Again, there is more structure in this dataset compared to pure randomness.


Is this data set from a purely random process?

[Figure: time plot of the data. x-axis: Time, 0 to 100; y-axis: Data, -3 to 2.]


[Figure: Correlogram of the Data in the Previous Slide. x-axis: Lag, 0 to 20; y-axis: ACF, -0.2 to 1.0.]

Lots of structure here.


Conclusions for Today

• Purely Random Process or Gaussian White Noise.

• Sample Autocorrelation Coefficient, $r_k$.

• Correlogram.

• How do the $r_k$ behave under pure randomness?

• How to tell from the Correlogram if there is evidence of departure from pure randomness?
