Time Series Regression (part 1) - IPB University


Time Series Regression (part 1)
LECTURE 7 | TIME SERIES FORECASTING METHOD
rahmaanisa@apps.ipb.ac.id

Review

Smoothing methods for non-seasonal time series data:
• Moving Average: SMA, DMA
• Exponential Smoothing: SES, DES

Smoothing methods for seasonal time series data:
• Additive Holt-Winters
• Multiplicative Holt-Winters

Outline
• Review of regression model
• Independence assumption and the consequences of its violation
• Regression model for time series data sets


Linear Regression

𝒚 = 𝑿𝜷 + 𝜺

where 𝒚 is the dependent variable, 𝑿 holds the independent variable(s), and 𝜺 is the error term.
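As a minimal sketch of how such a model is fitted, the block below estimates the single-regressor special case y = b0 + b1·x by ordinary least squares. The function name `fit_simple_ols` and the toy data are illustrative assumptions, not part of the lecture.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
```

With several independent variables the same idea is usually written in matrix form, which is what 𝒚 = 𝑿𝜷 + 𝜺 expresses.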

Linear Regression

[Figure: Fitted line plot of Y versus X, with fitted line Y = 2.803 + 1.511 X. S = 0.911075, R-Sq = 95.9%, R-Sq(adj) = 95.9%.]

Assumptions on Linear Regression Model

• The relationship between X and Y is linear

• 𝜀 ~ i.i.d. Normal(0, 𝜎²)

• No multicollinearity

Diagnostics

Serial Correlated Error

cov(e_t, e_{t−k}) ≠ 0

where
e_t = error at time t
e_{t−k} = error at time (t − k), k = 1, 2, …
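The covariance between errors k periods apart can be estimated from a residual series. The sketch below computes the sample autocovariance at lag k; the helper name `autocovariance` and the two toy residual series are assumptions made for illustration.

```python
def autocovariance(e, k):
    """Sample autocovariance of the series e at lag k (divisor n)."""
    n = len(e)
    mean = sum(e) / n
    return sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n)) / n


# Hypothetical residuals that flip sign every period
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals that drift slowly up and then down
drifting = [-2.0, -1.0, 0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0]

print(autocovariance(alternating, 1))  # negative: neighbours have opposite signs
print(autocovariance(drifting, 1))     # positive: neighbours are similar
```

Dividing the lag-k autocovariance by the lag-0 value gives the lag-k autocorrelation, which is easier to compare across series.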

Problems in Linear Regression: Serial Correlation

Positive serial correlation of residuals: the residuals change sign in gradual oscillation.

Negative serial correlation of residuals: the residuals bounce between positive and negative, but not randomly.

Possible Causes of Serial Correlated Error

1) omitted variables

2) ignoring nonlinearities

3) measurement errors

Consequences of Serial Correlated Error

1. The OLS estimators are still unbiased and consistent.

2. In large samples, the estimators are still approximately normally distributed.

3. The estimators are no longer efficient, i.e., they are no longer BLUE.

4. The estimated standard errors may be underestimated.

5. Tests based on the t and F distributions may no longer be appropriate.

Identification of Serial Correlated Error
• Residual plot
• Durbin-Watson test
• Runs test
• Breusch-Godfrey test
• Etc.
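Of the tools listed, the Durbin-Watson statistic is the simplest to compute by hand: d = Σ(e_t − e_{t−1})² / Σ e_t², summing the numerator from the second residual onward. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below assumes the residuals are already available; the function name and toy series are illustrative only, and the critical-value lookup needed for a formal test is not covered.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals with a slow oscillation (positive serial correlation)
d_positive = durbin_watson([1, 2, 3, 2, 1, -1, -2, -3, -2, -1])
# Hypothetical residuals bouncing between signs (negative serial correlation)
d_negative = durbin_watson([1, -1, 1, -1, 1, -1])

print(d_positive, d_negative)
```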

Possible Solutions for Autocorrelation Problem
• Cochrane-Orcutt
• Hildreth-Lu
• Distributed Lag
• Etc.
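To give a feel for the first remedy, here is a rough sketch of the iterative Cochrane-Orcutt procedure for one regressor, assuming the errors follow an AR(1) process e_t = ρ·e_{t−1} + u_t. The data are simulated, and every name (`ols`, `cochrane_orcutt`, the seed, the true coefficients 1 and 2, ρ = 0.7) is an assumption chosen for illustration rather than something taken from the lecture.

```python
import random


def ols(x, y):
    """Simple least squares fit; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def cochrane_orcutt(x, y, n_iter=10):
    """Iterate: estimate rho from residuals, quasi-difference, refit."""
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(n_iter):
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Regress e_t on e_{t-1} without intercept to estimate rho
        rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(
            e[t - 1] ** 2 for t in range(1, len(e))
        )
        # Quasi-differenced variables: y_t - rho*y_{t-1}, x_t - rho*x_{t-1}
        y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols(x_star, y_star)
        a = a_star / (1 - rho)  # recover the original intercept
    return a, b, rho


# Simulated data: y = 1 + 2x + e, with AR(1) errors (rho = 0.7)
random.seed(0)
x, y, err = [], [], 0.0
for _ in range(200):
    err = 0.7 * err + random.gauss(0, 1)
    xi = random.uniform(0, 10)
    x.append(xi)
    y.append(1 + 2 * xi + err)

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(a_hat, b_hat, rho_hat)
```

Hildreth-Lu targets the same transformed model but searches over a grid of ρ values instead of iterating.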

Illustration

Consider the following data set of the number of labour hours and sales (in dollars):

Year  Quarter  Number of labour hours  Sales in dollars
2011  1        126754                  15349829
2011  2        129839                  15629384
2011  3        106872                  15720934
2011  4        123787                  16230984
2012  1        137678                  16809312
2012  2        138279                  16923347
2012  3        109873                  16978434
2012  4        137368                  17203948
2013  1        139823                  17830230
2013  2        138346                  17937463
2013  3        112837                  18074652
2013  4        149870                  18347655
2014  1        147263                  18438749
2014  2        147868                  18604334
2014  3        113897                  18740234
2014  4        149879                  18943340
2015  1        149376                  19276345
2015  2        156982                  19173645
2015  3        123783                  19147234
2015  4        159734                  19842667
2016  1        159734                  20783274
2016  2        169283                  20348753
2016  3        128647                  20873488
2016  4        163467                  20475644

Source: kaggle.com

Illustration

The data set is available at:

https://github.com/raoy/Time-Series-Analysis

Illustration

[Figure: Scatterplot of sales in dollars vs number of labour hours.]

Correlations
Pearson correlation: 0.615
P-value: 0.001

Illustration

Regression Equation
sales in dollars = 10373187 + 56.8 Number of labour hours

Coefficients
Term                    Coef      SE Coef  T-Value  P-Value  VIF
Constant                10373187  2167627  4.79     0
Number of labour hours  56.8      15.5     3.66     0.001    1

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1317579  37.79%  34.97%     27.52%

Analysis of Variance
Source      DF  Adj SS    Adj MS    F-Value  P-Value
Regression  1   2.32E+13  2.32E+13  13.37    0.001
Error       22  3.82E+13  1.74E+12
Total       23  6.14E+13
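For readers without Minitab, the fit can be approximated in a few lines of plain Python. The sketch below uses the 24 quarterly observations from the illustration table and computes the least squares line, the Pearson correlation, and the residuals in time order. It assumes the tabulated values are exactly the ones behind the reported output, so small differences from the printed coefficients are possible.

```python
# Quarterly data copied from the illustration table (2011 Q1 to 2016 Q4)
hours = [126754, 129839, 106872, 123787, 137678, 138279, 109873, 137368,
         139823, 138346, 112837, 149870, 147263, 147868, 113897, 149879,
         149376, 156982, 123783, 159734, 159734, 169283, 128647, 163467]
sales = [15349829, 15629384, 15720934, 16230984, 16809312, 16923347,
         16978434, 17203948, 17830230, 17937463, 18074652, 18347655,
         18438749, 18604334, 18740234, 18943340, 19276345, 19173645,
         19147234, 19842667, 20783274, 20348753, 20873488, 20475644]

n = len(sales)
x_bar = sum(hours) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, sales))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in sales)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / (s_xx * s_yy) ** 0.5  # Pearson correlation

# Residuals in time order; a trend or cycle here hints at serial correlation
residuals = [y - (intercept + slope * x) for x, y in zip(hours, sales)]

print(f"sales = {intercept:.0f} + {slope:.1f} * hours, r = {r:.3f}")
for t, e in enumerate(residuals, start=1):
    print(t, round(e))
```

Listing the residuals in time order makes it easy to see whether they drift from negative to positive over the sample rather than scattering randomly around zero.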

Illustration

The residuals are NOT RANDOM!

Illustration

[Figure: Scatterplot of sales in dollars vs sales in dollars (t-1).]

Sales are HIGHLY CORRELATED with their value in period (t-1).

Illustration

Add the lag of SALES as an independent variable.

Regression Equation
sales in dollars = 1111051 + 0.9296 sales in dollars (t-1) + 2.80 Number of labour hours

Coefficients
Term                    Coef     SE Coef  T-Value  P-Value  VIF
Constant                1111051  796218   1.4      0.178
sales in dollars (t-1)  0.9296   0.0546   17.01    0        1.58
Number of labour hours  2.8      4.88     0.57     0.573    1.58

Model Summary
S       R-sq    R-sq(adj)  R-sq(pred)
326161  95.96%  95.56%     94.30%

Analysis of Variance
Source      DF  Adj SS    Adj MS    F-Value  P-Value
Regression  2   5.06E+13  2.53E+13  237.69   0
Error       20  2.13E+12  1.06E+11
Total       22  5.27E+13
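The lagged-sales model can be sketched the same way. The block below builds sales (t-1) by shifting the series one quarter, which drops the first observation and leaves 23 rows, then solves the two-regressor least squares problem on centered variables. The helper name `fit_two_regressor_ols` is an assumption, and the data are taken from the illustration table, so the estimates may differ slightly from the printed output.

```python
hours = [126754, 129839, 106872, 123787, 137678, 138279, 109873, 137368,
         139823, 138346, 112837, 149870, 147263, 147868, 113897, 149879,
         149376, 156982, 123783, 159734, 159734, 169283, 128647, 163467]
sales = [15349829, 15629384, 15720934, 16230984, 16809312, 16923347,
         16978434, 17203948, 17830230, 17937463, 18074652, 18347655,
         18438749, 18604334, 18740234, 18943340, 19276345, 19173645,
         19147234, 19842667, 20783274, 20348753, 20873488, 20475644]


def fit_two_regressor_ols(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det  # Cramer's rule on the 2x2 system
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Align each quarter's sales with the previous quarter's sales
y = sales[1:]           # sales at time t
sales_lag = sales[:-1]  # sales at time t-1
hours_t = hours[1:]     # labour hours at time t

b0, b_lag, b_hours = fit_two_regressor_ols(sales_lag, hours_t, y)
print(f"sales = {b0:.0f} + {b_lag:.4f} * sales(t-1) + {b_hours:.2f} * hours")
```

Centering the variables before solving keeps the arithmetic stable even though sales are in the tens of millions.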


Chapter Summary
• Assumptions on classical regression modeling
• Consequences of autocorrelated residuals
• Regression modeling for time series data

Another Example

See Section 4.8 of Hyndman and Athanasopoulos (2013): https://www.otexts.org/fpp/4/8

Exercise 1

Suppose there is a data set of the market share of a toothpaste product over 20 periods:

Period  Market share  Price    Period  Market share  Price
1       3.63          0.97     11      7.25          0.79
2       4.20          0.95     12      6.09          0.83
3       3.33          0.99     13      6.80          0.81
4       4.54          0.91     14      8.65          0.77
5       2.89          0.98     15      8.43          0.76
6       4.87          0.90     16      8.29          0.80
7       4.90          0.89     17      7.18          0.83
8       5.29          0.86     18      7.90          0.79
9       6.18          0.85     19      8.45          0.76
10      7.20          0.82     20      8.23          0.78

Regress market share (Y) on price (X), then investigate autocorrelation of the residuals.

Exercise 2

Conduct appropriate regression modeling using the following data set, and investigate autocorrelation of the residuals.

Year  Sales  Advertising    Year  Sales  Advertising
1975  11.7   9.4            1995  18.0   15.9
1976  12.0   9.6            1996  17.9   16.0
1977  12.3   10             1997  18.0   16.3
1978  12.8   10.4           1998  18.2   16.2
1979  13.1   10.8           1999  18.2   16.8
1980  13.6   10.9           2000  18.3   17.3
1981  13.9   11.7           2001  18.6   17.6
1982  14.4   12.2           2002  19.2   18.1
1983  14.7   12.5           2003  19.3   18.3
1984  15.3   12.9           2004  19.5   18.5
1985  15.5   13.0           2005  19.2   18.7
1986  15.8   13.2           2006  19.3   18.9
1987  16.1   13.8           2007  19.5   19.2
1988  16.6   14.2           2008  20.0   20.0
1989  16.9   14.6           2009  20.0   20.0
1990  16.7   14.4           2010  19.9   20.3
1991  16.9   15.0           2011  19.8   20.4
1992  17.4   15.4           2012  19.9   21.0
1993  17.6   15.7           2013  20.2   21.5
1994  17.9   15.9           2014  21.0   22.1

Next Topic…

Regression for Time Series Data Set (part 2)

References

Gujarati, D., McMillan, P. 2011. Econometrics by Example. London: Palgrave Macmillan.

Hyndman, R.J. and Athanasopoulos, G. 2013. Forecasting: Principles and Practice. https://www.otexts.org/fpp/6/2/ [March 21st, 2018]

Paulson, D.S. 2007. Handbook of Regression and Modeling: Applications for the Clinical and Pharmaceutical Industries. Boca Raton: Chapman & Hall.


The handouts are available on the following site:

stat.ipb.ac.id/en


PREPARE YOUR MID-EXAM
