
Natural Gas Time Series Analysis

By Sandesh Shalavadi

PSTAT 174

June 7, 2016


Sandesh Shalavadi (#7210834)
PSTAT 174
Gyorgy Terdik
6/6/2016

Natural Gas Data Time Series Analysis

I. Introduction

Natural gas is a hydrocarbon gas made up mainly of methane, with smaller amounts of carbon dioxide, hydrogen sulfide, helium, or nitrogen. It forms when layers of plant and animal matter are exposed to extreme heat and pressure beneath the Earth’s surface. It is a fossil fuel currently used for cooking, electricity generation, and heating. Compared with counterparts such as petroleum or coal, it burns more efficiently and releases far fewer emissions, which makes it a comparatively clean fuel, a significant advantage given today’s polluted atmosphere. In addition, natural gas is odorless, colorless, and shapeless. Because it is lighter than air, leaked gas tends to rise and disperse rather than pool near the ground, which lowers the risk of a build-up that could cause an explosion.

I chose natural gas as my research topic because I am interested in the recent efforts to address climate change and combat global warming. At the Paris climate talks in 2015, 195 countries gathered to discuss how they could slow global warming by reducing greenhouse gas emissions, helping developing countries become more eco-friendly and adapt to the adverse effects of climate change, and building a financial plan to support a pathway to climate-resilient development. A time series analysis of natural gas prices can help improve predictions of changes in manufacturing and labor costs, which will help each country reach its respective goals.

The goal of this project is to carry out a time series analysis on a real data set. I have chosen the price of natural gas from 1996 to 2016, a span of 20 years that shows how the price has fluctuated. I used R to analyze the data and fit a model. First, I differenced the data to obtain a stationary time series and remove trend and seasonality. Then, I estimated the model orders from the ACF and PACF of the differenced series. I analyzed several plots to select five candidate models, and subsequently compared their AIC values to find the smallest. Finally, I used the fitted model to forecast future values.

II. Sections


According to the original time series plot, the process is not stationary, since there are a few upward trends that reflect significant increases in the price of natural gas. The mean is 4.625945 and the variance is 5.428623. There is no obvious seasonality in the original plot. There were sharp changes in 2001, 2006, and 2008. The price reached its highest point in October 2005 and its lowest point in December 1998.

Modelling:

From the autocorrelation function (ACF) plot, the gradual decay of the ACF indicates non-stationarity. Therefore, to obtain a stationary series, I differenced the data to remove the trend. I used differencing rather than a Box-Cox or log transformation so as to avoid adding a further processing step.

Time Series plot of original data

ACF and PACF plot of original data

Differenced time series data:
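As a rough illustration of the differencing step, the sketch below applies diff() to a synthetic random-walk series and compares the mean and variance before and after. The series, the seed, and the names price, d_price, and summary_tbl are assumptions made for this example; they are not the natural gas data.

```r
# Synthetic stand-in for a trending monthly price series (NOT the natural gas data).
set.seed(174)
n <- 217
price <- ts(4 + cumsum(rnorm(n, mean = 0.01, sd = 0.8)),
            start = c(1996, 4), frequency = 12)

# First difference: d_t = x_t - x_{t-1}; the result is one observation shorter.
d_price <- diff(price, lag = 1, differences = 1)

# Compare level and spread before and after differencing.
summary_tbl <- data.frame(
  series   = c("original", "differenced"),
  n        = c(length(price), length(d_price)),
  mean     = c(mean(price), mean(d_price)),
  variance = c(var(price), var(d_price))
)
print(summary_tbl)
```

On a series with a stochastic trend, the differenced series would usually be expected to have a mean near zero and a much smaller variance than the original.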

Parameter estimation

Time series plot of differenced data

ACF and PACF plot of differenced data

After differencing, the new time series plot looks stationary. The mean is 0.01101852, which is approximately 0, and the variance decreased to 0.6964492. The ACF also shows seasonality. Since the ACF and the PACF both trail off rapidly, an ARIMA model is plausible. According to the ACF plot, the ACF cuts off at lag 5 and lag 9; according to the PACF plot, the PACF also cuts off at lag 5 and lag 9.
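Reading lags off an ACF or PACF plot can be cross-checked numerically. The sketch below, run on a synthetic AR(1) series rather than the gas data, lists the lags whose sample ACF or PACF falls outside the approximate 95% white-noise bounds of ±1.96/sqrt(n). The names x, bound, acf_lags, and pacf_lags are hypothetical.

```r
set.seed(174)
x <- arima.sim(model = list(ar = 0.6), n = 216)  # synthetic stationary series

# Sample ACF and PACF without plotting.
acf_vals  <- acf(x, lag.max = 24, plot = FALSE)
pacf_vals <- pacf(x, lag.max = 24, plot = FALSE)

# Approximate 95% bounds under the white-noise null: +/- 1.96 / sqrt(n).
bound <- qnorm(0.975) / sqrt(length(x))

# acf() includes lag 0 as its first element; drop it so index i means lag i.
acf_lags  <- which(abs(acf_vals$acf[-1]) > bound)
pacf_lags <- which(abs(pacf_vals$acf) > bound)

cat("ACF lags outside the bounds: ", acf_lags, "\n")
cat("PACF lags outside the bounds:", pacf_lags, "\n")
```

With 24 lags tested at the 5% level, roughly one lag can be expected to cross the bounds by chance, so isolated spikes at higher lags are weak evidence on their own.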

----------------------------------------------------------------------------------------------------------------
adf.test(newgas)

Augmented Dickey-Fuller Test
data: newgas
Dickey-Fuller = -6.1023, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

A p-value below 0.05 indicates stationarity, so the low p-value from the ADF test confirms that the data is stationary.
----------------------------------------------------------------------------------------------------------------
plot(decompose(newgas))

Seasonal trend of fitted data

From the decomposition of the time series, we can see a quadratic trend together with a seasonal component, which suggests fitting a seasonal model to the data.
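The stationarity check reported in this section can be illustrated with a hand-rolled regression. The sketch below is a simplified, non-augmented Dickey-Fuller regression with a constant; it is not the adf.test implementation from tseries, which, as far as I understand, also includes lagged differences and a trend term. The function df_stat, the synthetic series, and the use of the asymptotic 5% critical value of -2.86 for the constant-only case are assumptions of this example.

```r
# Simple (non-augmented) Dickey-Fuller regression with a constant:
#   diff(x)_t = alpha + gamma * x_{t-1} + e_t
# H0: gamma = 0 (unit root). Returns the t-statistic for gamma.
df_stat <- function(x) {
  x  <- as.numeric(x)
  dx <- diff(x)
  lagged <- x[-length(x)]
  fit <- lm(dx ~ lagged)
  unname(coef(summary(fit))["lagged", "t value"])
}

set.seed(174)
walk  <- cumsum(rnorm(217))   # synthetic random walk (has a unit root)
noise <- diff(walk)           # its first difference (white noise)

crit_5pct <- -2.86            # asymptotic 5% critical value, constant-only case

stat_walk  <- df_stat(walk)
stat_noise <- df_stat(noise)
cat("levels:      ", round(stat_walk, 2), "\n")
cat("differences: ", round(stat_noise, 2), "\n")
cat("differenced series rejects unit root at 5%:", stat_noise < crit_5pct, "\n")
```

A statistic well below the critical value, as for the differenced series here, rejects the unit-root null; the statistic must be compared with Dickey-Fuller critical values rather than the usual t table.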

Model Diagnostics:


Fit the models

In order to find the best model, I fit the following models and then compared their AIC values to find the one with the smallest AIC. The candidate models, labelled by their seasonal orders, are:

Model 1 ARIMA (1,1,0): SARIMA(1,0,0)×(1,1,0)12
Model 2 ARIMA (1,0,0): SARIMA(1,0,0)×(1,0,0)12; lowest AIC
Model 3 ARIMA (2,1,0): SARIMA(2,0,0)×(2,1,0)12
Model 4 ARIMA (2,1,1): SARIMA(2,0,0)×(2,1,1)12; second lowest AIC
Model 5 ARIMA (2,0,1): SARIMA(2,0,0)×(2,0,1)12

I first used the “auto.arima” function to get a general idea, and then attempted to find the seasonal ARIMA model with the best AIC value.

auto.arima(gas)
Series: gas
ARIMA(1,0,0) with non-zero mean
Coefficients:
         ar1     sar1
      0.0400  -0.4823
s.e.  0.0698   0.0591
sigma^2 estimated as 0.6964: log likelihood=-266.92
AIC=537.84 AICc=537.9 BIC=544.59

The “auto.arima” output suggests that ARIMA(1,0,0) is not a bad starting model, with an estimated ar1 coefficient of 0.0400.

Model                      AIC     Coefficient      s.e.    Conclusion
SARIMA(1,0,0)×(1,1,0)12    615.75  ar1 = 0.0400     0.0698  not significant
                                   sar1 = -0.4823   0.0591  significant
SARIMA(1,0,0)×(1,0,0)12    540.88  ar1 = -0.0093    0.0683  not significant
                                   sar1 = -0.0668   0.0680  not significant
SARIMA(2,0,0)×(2,1,0)12    593.32  ar1 = 0.0388     0.0698  not significant
                                   sar1 = -0.6561   0.0648  significant
SARIMA(2,0,0)×(2,1,1)12    541.01  ar1 = 0.0117     0.0701  not significant
                                   sar1 = -0.1245   0.0717  not significant
                                   sma1 = -1.0000   0.0731  significant
SARIMA(2,0,0)×(2,0,1)12    546.23  ar1 = -0.0151    0.0690  not significant
                                   sar1 = -0.0297   0.4389  not significant
                                   sar2 = -0.0507   0.0733  not significant
                                   sma1 = -0.0422   0.4374  not significant

A coefficient is marked significant when its absolute value exceeds 1.96 times its standard error.
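The AIC comparison can be organised as a small loop over candidate specifications. The sketch below uses a synthetic seasonal series, not the gas data, and the names candidates, fit_aic, and aic_tbl are hypothetical. All candidates here share the same differencing orders, because AIC values are generally only comparable between models fitted to the same (equally differenced) data.

```r
set.seed(174)
# Synthetic monthly series with a seasonal AR component (NOT the gas data).
x <- ts(as.numeric(arima.sim(model = list(ar = c(rep(0, 11), 0.5)), n = 216)),
        frequency = 12)

# Candidate (p,d,q) x (P,D,Q)_12 specifications, all with d = 0 and D = 0.
candidates <- list(
  "(1,0,0)x(0,0,0)" = list(order = c(1, 0, 0), seasonal = c(0, 0, 0)),
  "(1,0,0)x(1,0,0)" = list(order = c(1, 0, 0), seasonal = c(1, 0, 0)),
  "(2,0,0)x(1,0,0)" = list(order = c(2, 0, 0), seasonal = c(1, 0, 0)),
  "(1,0,0)x(0,0,1)" = list(order = c(1, 0, 0), seasonal = c(0, 0, 1))
)

# Fit one specification and return its AIC, or NA if the fit fails.
fit_aic <- function(spec) {
  fit <- tryCatch(
    arima(x, order = spec$order,
          seasonal = list(order = spec$seasonal, period = 12)),
    error = function(e) NULL
  )
  if (is.null(fit)) NA_real_ else fit$aic
}

aic_tbl <- data.frame(model = names(candidates),
                      AIC = vapply(candidates, fit_aic, numeric(1)),
                      row.names = NULL)
aic_tbl <- aic_tbl[order(aic_tbl$AIC), ]   # smallest AIC first
print(aic_tbl)
```

Differences of less than about two AIC units are usually too small to separate models, so residual diagnostics and parsimony still matter when the top candidates are close.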

Although SARIMA(1,0,0)×(1,1,0)12 is not a bad model, we conclude that SARIMA(2,0,0)×(2,1,1)12 is the better fit because it has the smaller AIC value of 541.01.

Diagnostic checking for ARIMA(1,1,0):

> fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
> fit1
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 1, 0), period = 12))
Coefficients:
         ar1     sar1
      0.0400  -0.4823
s.e.  0.0698   0.0591
sigma^2 estimated as 1.145: log likelihood = -304.88, aic = 615.75
> Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)

Box-Pierce test
data: residuals(fit1)
X-squared = 0.00081791, df = 1, p-value = 0.9772

All p-values are larger than 0.05, so it passed the tests.

SARIMA(1,0,0)×(1,1,0)12 is not a bad model, with an ar1 coefficient of 0.0400 and a significant sar1 coefficient of -0.4823. Next, I fit the other SARIMA models to the series and compare which one fits best. The model with significant coefficients and the lowest AIC value should be the best.
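One common way to judge whether a coefficient is significant is the ratio of the estimate to its standard error, compared with 1.96 for an approximate 5% test. The sketch below computes that ratio from an arima() fit on a synthetic series; the series and the names fit, coef_tbl, and reported are assumptions of this example. The last lines apply the same ratio to the estimates and standard errors reported for fit1.

```r
set.seed(174)
# Synthetic monthly series with a seasonal AR component (NOT the gas data).
x <- ts(as.numeric(arima.sim(model = list(ar = c(rep(0, 11), 0.5)), n = 216)),
        frequency = 12)

fit <- arima(x, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 12))

est <- coef(fit)                   # point estimates
se  <- sqrt(diag(fit$var.coef))    # standard errors from the covariance matrix
z   <- est / se                    # approximate z-ratios

coef_tbl <- data.frame(
  estimate    = round(est, 4),
  s.e.        = round(se, 4),
  z           = round(z, 2),
  significant = abs(z) > qnorm(0.975)
)
print(coef_tbl)

# The same ratio applied to the estimates reported for fit1.
reported <- c(ar1 = 0.0400, sar1 = -0.4823) / c(0.0698, 0.0591)
print(round(reported, 2))
```

The ratio treats the estimator as approximately normal, which is a large-sample approximation; confint() on the fitted object gives the matching intervals.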

Diagnostic checking for ARIMA (1,0,0):

> fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
> fit2
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))
Coefficients:
          ar1     sar1  intercept
      -0.0093  -0.0668     0.0110
s.e.   0.0683   0.0680     0.0527
sigma^2 estimated as 0.69: log likelihood = -266.44, aic = 540.88
> Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)

Box-Pierce test
data: residuals(fit2)
X-squared = 6.8896e-08, df = 1, p-value = 0.9998
> tsdiag(fit2)

All p-values are larger than 0.05, so it passed the tests.

According to the tsdiag plot for fit2, the standardized residuals plot doesn’t show clusters of volatility. The ACF plot shows no significant autocorrelation among the residuals. The p-values for the Ljung–Box statistics are mostly above the blue dashed line. This could be our optimal model, and its AIC (540.88) is marginally lower than that of fit4 (541.01); fit4 is nevertheless the model carried forward.

tsdiag plot of fit 2 model

Diagnostic checking for ARIMA (2,1,0):

> fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
> fit3
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 0), period = 12))
Coefficients:
         ar1      ar2     sar1     sar2
      0.0388  -0.0592  -0.6561  -0.3351
s.e.  0.0698   0.0699   0.0648   0.0626
sigma^2 estimated as 0.9913: log likelihood = -291.66, aic = 593.32
> Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)

Box-Pierce test
data: residuals(fit3)
X-squared = 0.0045882, df = 1, p-value = 0.946

All p-values are larger than 0.05, so it passed the tests.

Diagnostic checking for ARIMA (2,1,1):

> fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
> fit4
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
Coefficients:
         ar1      ar2     sar1     sar2     sma1
      0.0117  -0.0102  -0.1245  -0.1152  -1.0000
s.e.  0.0701   0.0705   0.0717   0.0696   0.0731
sigma^2 estimated as 0.6438: log likelihood = -264.5, aic = 541.01
> Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)

Box-Pierce test
data: residuals(fit4)
X-squared = 2.5971e-06, df = 1, p-value = 0.9987
> tsdiag(fit4)

All p-values are larger than 0.05, so it passes the tests.


According to the tsdiag plot for fit4, the standardized residuals plot doesn’t show clusters of volatility. The ACF plot shows no significant autocorrelation among the residuals. The p-values for the Ljung–Box statistics are mostly above the blue dashed line. Therefore, the residuals of SARIMA(2,0,0)×(2,1,1)12 behave like white noise, and it is an adequate model.

The model equation is: $(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - B^{12}) X_t = (1 + \Theta_1 B^{12}) W_t$
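For concreteness, the estimates reported for fit4 can be substituted into the operator form of a SARIMA(2,0,0)×(2,1,1)12 model. This is a sketch that assumes R's arima sign convention (AR coefficients enter the left-hand polynomials with a minus sign, MA coefficients enter the right-hand polynomial with a plus sign) and takes $X_t$ to be the differenced series newgas.

```latex
(1 - 0.0117B + 0.0102B^{2})\,(1 + 0.1245B^{12} + 0.1152B^{24})\,(1 - B^{12})\,X_t
  = (1 - 1.0000\,B^{12})\,W_t
```

With the seasonal MA estimate at -1.0000, the factor on the right coincides with the seasonal differencing operator on the left. An MA root on the unit circle like this is often read as a sign that the seasonal difference may be unnecessary, which is worth keeping in mind when interpreting this model.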

Diagnostic checking for ARIMA (2,0,1):

> fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
> fit5
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 0, 1), period = 12))
Coefficients:
          ar1     ar2     sar1     sar2     sma1  intercept
      -0.0151  0.0072  -0.0297  -0.0507  -0.0422     0.0105
s.e.   0.0690  0.0690   0.4389   0.0733   0.4374     0.0501
sigma^2 estimated as 0.6877: log likelihood = -266.12, aic = 546.23
> Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)

Box-Pierce test
data: residuals(fit5)
X-squared = 6.152e-06, df = 1, p-value = 0.998

All p-values are larger than 0.05, so it passed the tests.
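The Box tests in this section use lag = 1 and fitdf = 0, which checks only the first residual autocorrelation. A common complement is to run the Ljung-Box test at several lags and reduce the degrees of freedom by the number of fitted ARMA coefficients. The sketch below does this on a synthetic AR(1) fit; the series and the names n_par, lags, and lb_p are assumptions of this example.

```r
set.seed(174)
x <- arima.sim(model = list(ar = 0.6), n = 216)   # synthetic stand-in series
fit <- arima(x, order = c(1, 0, 0))

n_par <- 1                      # number of estimated ARMA coefficients (p + q)
lags  <- c(6, 12, 18, 24)

# Ljung-Box p-value at each lag, with degrees of freedom reduced by n_par.
lb_p <- vapply(lags, function(h) {
  Box.test(residuals(fit), lag = h, type = "Ljung-Box", fitdf = n_par)$p.value
}, numeric(1))

print(data.frame(lag = lags, p.value = round(lb_p, 3)))
```

For monthly data, including lags 12 and 24 helps reveal seasonal autocorrelation left in the residuals, which a lag-1 test cannot detect.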

Forecasting:

plot(forecast(fit))

The forecast plot seems to capture the overall movement of the data adequately.


Diagnostic Plots for Time-Series Fits

Comparing real time data with 10 future values

y=read.table("realdataofnaturalgas.txt", header=T)
data=ts(y$gas, frequency = 12, start = c(1991))
par(mfrow=c(2,1))
plot(forecast(fit4))
plot(data,main="real data")

The second time series plot above shows the real data. The real data show a decreasing trend from 2014 to 2016, which is similar to the forecast plot. The forecast values are relatively close to the real values, since most of the real observations from 2014 to 2016 fall within the prediction intervals (the shaded region of the forecast plot). As a result, the forecast appears fairly adequate.
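The visual comparison above can be made numeric by holding out the last observations and counting how many fall inside the prediction interval. The sketch below does this on a synthetic random walk, not the gas data; the names full, train, actual, lower, upper, and inside are hypothetical. As an assumption of this example, the model is fitted to the level of the series with d = 1, so the forecasts are on the same scale as the held-out values.

```r
set.seed(174)
# Synthetic monthly series (NOT the gas data): random walk plus a small drift.
full   <- ts(4 + cumsum(rnorm(240, mean = 0.01, sd = 0.5)),
             start = c(1996, 4), frequency = 12)
h      <- 10
train  <- ts(head(as.numeric(full), -h), start = start(full), frequency = 12)
actual <- tail(as.numeric(full), h)      # held-out values to compare against

# Fit on the level with d = 1, so forecasts are on the original scale.
fit <- arima(train, order = c(1, 1, 0))
fc  <- predict(fit, n.ahead = h)

# Approximate 95% prediction interval: forecast +/- 1.96 * standard error.
lower  <- as.numeric(fc$pred - qnorm(0.975) * fc$se)
upper  <- as.numeric(fc$pred + qnorm(0.975) * fc$se)
inside <- actual >= lower & actual <= upper

print(data.frame(actual = round(actual, 2),
                 forecast = round(as.numeric(fc$pred), 2),
                 lower = round(lower, 2), upper = round(upper, 2),
                 inside = inside))
cat("share of held-out values inside the 95% interval:", mean(inside), "\n")
```

With only ten held-out points the coverage share is a coarse measure, but it gives a reproducible number to set beside the plot.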

Real data time series plot


III. Sources

Data from: http://www.indexmundi.com/commodities/?commodity=natural-gas&months=240

IV. Code

# include packages astsa, forecast, MASS, timeDate, timeSeries, tseries
library(astsa)     # acf2
library(forecast)  # auto.arima, forecast
library(MASS)
library(tseries)   # adf.test
setwd("C:/Sandesh/College Stuff/UCSB/PSTAT 126")
x=read.table("realdataofnaturalgas.csv", sep=",", header=T)
gas=ts(x$Price, start = c(1996,4), end = c(2014,4), frequency = 12)
# mean and variance of gas data set before transformation
mean(gas)
var(gas)
# ACF and PACF plot of original data set (non-stationary), ACF trails off
acf2(ts(gas))
# transformed data set to make stationary time series
newgas=diff(gas)
# plot of original and new data set to show difference after removing trends and adding lag
plot(gas,main ='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
plot(newgas,main ='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
# mean and variance for transformed data set
mean(newgas)
var(newgas)
# ACF and PACF of transformed data set
acf2(ts(newgas))
adf.test(newgas)
plot(decompose(newgas))
auto.arima(newgas)
# fit models
fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))


# ARIMA(1,0,0) has the smallest AIC (540.88); ARIMA(2,1,1) and ARIMA(2,0,1) have similar AIC;
# ARIMA(2,1,0) and ARIMA(1,1,0) have noticeably larger AIC.

# simulate models
fit1
fit2
fit3
fit4
fit5
# Diagnostic checking for models fit1, fit2, fit3, fit4, fit5
# plot acf of residuals, standardized residuals and p-values test
tsdiag(fit2)
tsdiag(fit4)
plot(forecast(fit1))
# box-pierce and Ljung Box test for all fitted models
Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
# confidence intervals for all fitted models
confint(fit1)
confint(fit2)
confint(fit3)
confint(fit4)
confint(fit5)

y=read.table("realdata3.csv",sep=",", header=T)
data=ts(y$Price,start = c(2014,6), end = c(2016,4), frequency=12)
# forecast next 10 observations of original time series
pred<-predict(fit4, n.ahead = 10)
# standard errors used for the prediction interval
pred.se<-pred$se
par(mfrow=c(2,1))
plot(forecast(fit4))
plot.ts(data,main='real data', xlab='Year',ylab='Price',lwd=2)