8/11/2019 Arima Model Bu Budiasih
Autoregressive Integrated Moving
Average (ARIMA)
Popularly known as the Box-Jenkins
methodology
The ARIMA methodology places its emphasis not on constructing
single-equation or simultaneous-equation models but on
analyzing the probabilistic, or stochastic, properties of economic
time series on their own.
Unlike regression models, in which $Y_t$ is explained by k
regressors $X_1, X_2, X_3, \ldots, X_k$, the BJ-type time series models
allow $Y_t$ to be explained by past, or lagged, values of Y itself
and by stochastic error terms.
For this reason, ARIMA models are sometimes called
atheoretic models: they are not derived from any
economic theory, whereas economic theories are often the basis of
simultaneous-equation models.
Note that the emphasis in this topic is on univariate ARIMA
models, that is, models pertaining to a single time series.
The analysis can, however, be extended to multivariate ARIMA models.
Let us work with the GDP time series data for the United
States given in the Table.
Plots of this time series are given in Figures 1 (undifferenced
GDP) and 2 (first-differenced GDP).
GDP in level form is nonstationary, but in (first) differenced
form it is stationary.
If a time series is stationary, it can be modeled
in a variety of ways.
An Autoregressive (AR) Process
Let $Y_t$ represent GDP at time t.
If we model $Y_t$ as
$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + u_t$
where $\delta$ is the mean of Y and where $u_t$ is an uncorrelated
random error term with zero mean and constant variance $\sigma^2$
(i.e., it is white noise), then we say that $Y_t$ follows a first-order
autoregressive, or AR(1), stochastic process.
Here the value of Y at time t depends on its value in the
previous time period and a random term; the Y values are
expressed as deviations from their mean value.
In other words, this model says that the forecast value of Y at
time t is simply some proportion (= $\alpha_1$) of its value at time (t-1)
plus a random shock or disturbance at time t; again the Y
values are expressed around their mean value.
But in the model
$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + u_t$
$Y_t$ follows a second-order autoregressive, or AR(2), process.
The value of Y at time t depends on its value in the previous
two time periods, the Y values being expressed around their
mean value $\delta$.
In general,
$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + \cdots + \alpha_p (Y_{t-p} - \delta) + u_t$
Here $Y_t$ is a pth-order autoregressive, or AR(p), process.
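The AR(p) recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_ar`, the Gaussian shocks, the burn-in length, and the parameter values are not from the slides.

```python
import random

def simulate_ar(alphas, delta, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate (Y_t - delta) = sum_i alphas[i-1] * (Y_{t-i} - delta) + u_t.

    Shocks are drawn as Gaussian white noise (an assumption; the slides
    only require zero mean, constant variance, and no autocorrelation).
    """
    rng = random.Random(seed)
    dev = [0.0] * len(alphas)  # deviations from the mean, started at zero
    for _ in range(n + burn_in):
        u = rng.gauss(0.0, sigma)  # white-noise shock
        # dev[-i] is the deviation i periods back
        dev.append(sum(a * dev[-i] for i, a in enumerate(alphas, start=1)) + u)
    return [delta + d for d in dev[-n:]]  # drop the burn-in, add the mean back

# AR(1) with alpha_1 = 0.6 around a mean of 10 (illustrative values only)
y = simulate_ar([0.6], delta=10.0, n=2000)
print(sum(y) / len(y))  # sample mean, expected to lie close to delta
```

Passing two coefficients, for example `simulate_ar([0.5, 0.2], ...)`, gives an AR(2) series in the same way.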
A Moving Average (MA) Process
Suppose we model Y as follows:
$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1}$
where $\mu$ is a constant and $u_t$, as before, is the white noise
stochastic error term.
Here Y at time t is equal to a constant plus a moving average
of the current and past error terms.
Thus, in the present case, Y follows a first-order moving
average, or an MA(1), process.
But if Y follows the expression
$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2}$
then it is an MA(2) process.
Generally,
$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \cdots + \beta_q u_{t-q}$
is an MA(q) process.
In short, a moving average process is simply a linear
combination of white noise error terms.
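A small Python sketch of an MA(q) process follows, assuming Gaussian white noise. The helper names `simulate_ma` and `autocorr` and the parameter values are illustrative, not from the slides. For an MA(1) with $\beta_0 = 1$ and $\beta_1 = 0.5$, theory gives a lag-1 autocorrelation of $\beta_0 \beta_1 / (\beta_0^2 + \beta_1^2) = 0.4$ and zero autocorrelation beyond lag 1.

```python
import random

def simulate_ma(mu, betas, n, sigma=1.0, seed=0):
    """Y_t = mu + betas[0]*u_t + betas[1]*u_{t-1} + ... + betas[q]*u_{t-q}."""
    rng = random.Random(seed)
    q = len(betas) - 1
    u = [rng.gauss(0.0, sigma) for _ in range(n + q)]  # white noise shocks
    # u[t + q] is the current shock for output t; u[t + q - j] is j periods back
    return [mu + sum(b * u[t + q - j] for j, b in enumerate(betas))
            for t in range(n)]

def autocorr(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    ck = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    return ck / c0

y = simulate_ma(mu=5.0, betas=[1.0, 0.5], n=5000)
print(autocorr(y, 1), autocorr(y, 2))  # theory: 0.4 at lag 1, 0 at lag 2
```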
An Autoregressive and Moving Average (ARMA) Process
It is quite likely that Y has characteristics of both AR and MA
and is therefore ARMA.
Thus, $Y_t$ follows an ARMA(1, 1) process if it can be written as
$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1}$
because there is one autoregressive and one moving average
term; $\theta$ represents a constant term.
In general, in an ARMA(p, q) process, there will be p
autoregressive and q moving average terms.
An Autoregressive Integrated Moving Average (ARIMA) Process
Many economic time series are nonstationary, that is, they are
integrated.
If a time series is integrated of order 1 [i.e., it is I(1)], its first
differences are I(0), that is, stationary.
Similarly, if a time series is I(2), its second difference is I(0).
In general, if a time series is I(d), after differencing it d times
we obtain an I(0) series.
Therefore, if a time series has to be differenced d times to make it
stationary and the differenced series is then modeled as ARMA(p, q),
the original series is said to follow an ARIMA(p, d, q) model, that is, an
autoregressive integrated moving average time series model,
where p denotes the number of autoregressive terms, d the
number of times the series has to be differenced before it
becomes stationary, and q the number of moving average terms.
An ARIMA(2, 1, 2) time series has to be differenced once (d = 1)
before it becomes stationary, and it has two AR and two MA terms.
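The mechanics of differencing d times can be sketched as follows. The helper names are hypothetical, and the series is a deterministic quadratic trend used only to show the arithmetic, not a stochastic I(2) series.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def undifference(diffs, first_value):
    """Rebuild levels from first differences and the initial level."""
    levels = [first_value]
    for change in diffs:
        levels.append(levels[-1] + change)
    return levels

quad = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25
print(difference(quad, 1))        # [1, 3, 5, 7, 9]
print(difference(quad, 2))        # [2, 2, 2, 2]
print(undifference(difference(quad, 1), quad[0]) == quad)  # True
```

Each round of differencing shortens the series by one observation, which is why a model fitted to differenced data has slightly fewer usable observations than the original series.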
The important point to note is that to use the Box-Jenkins
methodology, we must have either a stationary time series or a
time series that is stationary after one or more differencings.
The reason for assuming stationarity can be explained as follows:
The objective of B-J [Box-Jenkins] is to identify and estimate
a statistical model which can be interpreted as having
generated the sample data.
If this estimated model is then to be used for forecasting, we
must assume that the features of this model are constant
through time, and particularly over future time periods.
Thus the reason for requiring stationary data is that any model
which is inferred from these data can itself be interpreted as
stationary or stable, therefore providing a valid basis for
forecasting.
THE BOX-JENKINS (BJ) METHODOLOGY
Looking at a time series, such as the US GDP series in the Figure,
how does one know whether it follows a purely AR process
(and if so, what is the value of p), a purely MA process (and
if so, what is the value of q), an ARMA process (and if so,
what are the values of p and q), or an ARIMA process,
in which case we must know the values of p, d, and q?
The BJ methodology helps answer these questions.
The method consists of four steps:
Step 1. Identification: That is, find out the appropriate values
of p, d, and q using the correlogram, the partial correlogram, and
the augmented Dickey-Fuller test.
Step 2. Estimation: Having identified the appropriate p and q
values, the next stage is to estimate the parameters of the
autoregressive and moving average terms included in the
model.
Sometimes this calculation can be done by simple least squares,
but sometimes we will have to resort to nonlinear (in
parameter) estimation methods.
Since this task is now routinely handled by several statistical
packages, we do not have to worry about the actual
mathematics of estimation.
Step 3. Diagnostic checking: Having chosen a particular
ARIMA model and having estimated its parameters, we next
see whether the chosen model fits the data reasonably well, for
it is possible that another ARIMA model might do the job as
well.
This is why Box-Jenkins ARIMA modeling is more an art than
a science; considerable skill is required to choose the right
ARIMA model.
One simple test of the chosen model is to see if the residuals
estimated from this model are white noise; if they are, we can
accept the particular fit; if not, we must start over.
Thus, the BJ methodology is an iterative process.
Step 4. Forecasting: One of the reasons for the popularity of
ARIMA modeling is its success in forecasting.
In many cases, the forecasts obtained by this method are more
reliable than those obtained from traditional econometric
modeling, particularly for short-term forecasts.
Let us look at these four steps in some detail. Throughout, we
will use the GDP data given in the Table.
IDENTIFICATION
The chief tools in identification are the autocorrelation
function (ACF), the partial autocorrelation function (PACF),
and the resulting correlograms, which are simply the plots of
ACFs and PACFs against the lag length.
The concept of partial autocorrelation is analogous to the
concept of partial regression coefficient.
In the k-variable multiple regression model, the kth regression
coefficient $\beta_k$ measures the rate of change in the mean value of
the regressand for a unit change in the kth regressor $X_k$,
holding the influence of all other regressors constant.
In similar fashion, the partial autocorrelation $\rho_{kk}$ measures
correlation between (time series) observations that are k time
periods apart after controlling for correlations at intermediate
lags (i.e., lags less than k).
In other words, partial autocorrelation is the correlation
between $Y_t$ and $Y_{t-k}$ after removing the effect of the intermediate Y's.
In the Figure, we show the correlogram and partial correlogram of
the GDP series.
From this figure, two facts stand out:
First, the ACF declines very slowly; the ACFs up to 23 lags are
individually statistically significantly different from zero, for
they all are outside the 95% confidence bounds.
Second, after the first lag, the PACF drops dramatically, and
all PACFs after lag 1 are statistically insignificant.
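The two identification tools can be computed directly. The sketch below estimates the sample ACF and then derives the PACF from it with the Durbin-Levinson recursion. That recursion is one standard way to obtain partial autocorrelations and is an assumption here, since the slides do not say how their correlograms were produced. Function names are hypothetical.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n  # sample variance (divisor n)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]

def pacf_from_acf(rho):
    """Durbin-Levinson recursion; rho[k-1] is the autocorrelation at lag k."""
    pacf = []
    phi_prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
        # update the lower-order coefficients, then append the new last one
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) with alpha_1 = 0.5: the PACF cuts off after lag 1
print(pacf_from_acf([0.5, 0.25, 0.125]))
```

Feeding the output of `sample_acf` into `pacf_from_acf` gives the sample partial correlogram of a data series.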
Since the US GDP time series is not stationary, we have to
make it stationary before we can apply the Box-Jenkins
methodology.
In the next Figure we plot the first differences of GDP.
Unlike the previous Figure, we do not observe any trend in this
series, perhaps suggesting that the first-differenced GDP time
series is stationary.
A formal application of the Dickey-Fuller unit root test shows
that this is indeed the case.
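As a rough illustration of what the Dickey-Fuller test computes, the sketch below runs the simplest version of the test regression, $\Delta Y_t = b_0 + \delta Y_{t-1} + u_t$, and returns the t-ratio on $\delta$. It assumes a constant, no trend, and no lagged difference terms, so it is not the augmented test, and the slides do not state which variant was applied to GDP. The statistic must be compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level for this specification in moderately sized samples), not with the usual t table.

```python
import math
import random

def dickey_fuller_tau(y):
    """t-ratio on delta in  dY_t = b0 + delta * Y_{t-1} + u_t  (no lagged differences)."""
    x = y[:-1]                                  # Y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]      # first differences
    n = len(x)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, dy))
    delta = sxy / sxx                           # OLS slope
    b0 = my - delta * mx                        # OLS intercept
    rss = sum((w - b0 - delta * v) ** 2 for v, w in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)         # standard error of the slope
    return delta / se

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]    # stationary by construction
walk = [sum(noise[: i + 1]) for i in range(300)]     # random walk, I(1)
print(dickey_fuller_tau(noise), dickey_fuller_tau(walk))
```

A strongly negative statistic, as for the white-noise series, rejects the unit root; a random walk typically produces a value much closer to zero.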
Now we have a different pattern of ACF and PACF. The ACFs
at lags 1, 8, and 12 seem statistically different from zero.
Approximate 95% confidence limits for $\rho_k$ are -0.2089 and
+0.2089.
At all other lags, however, they are not statistically different from zero.
This is also true of the partial autocorrelations $\rho_{kk}$.
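The quoted limits are consistent with the usual large-sample approximation $\pm 1.96/\sqrt{n}$. The check below assumes n = 88 quarterly observations (1970-I to 1991-IV); that sample size is inferred from the stated period and is not given explicitly on the slide.

```python
import math

n = 88                          # assumed: 22 years of quarterly data
limit = 1.96 / math.sqrt(n)     # approximate 95% bound for a sample autocorrelation
print(round(limit, 4))          # 0.2089
```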
Now how do the correlograms given in the Figure enable us to find
the ARMA pattern of the GDP time series?
We will consider only the first-differenced GDP series because
it is stationary.
One way of accomplishing this is to consider the ACF and
PACF and the associated correlograms of a selected number of
ARMA processes, such as AR(1), AR(2), MA(1), MA(2),
ARMA(1, 1), ARMA(2, 2), and so on.
Since each of these stochastic processes exhibits typical
patterns of ACF and PACF, if the time series under study fits
one of these patterns we can identify the time series with that
process.
Of course, we will have to apply diagnostic tests to find out if
the chosen ARMA model is reasonably accurate.
What we plan to do is to give general guidelines (see Table);
the references can give the details of the various stochastic
processes.
The ACFs and PACFs of AR(p) and MA(q) processes have
opposite patterns; in the AR(p) case the ACF declines geometrically
or exponentially but the PACF cuts off after a certain number
of lags, whereas the opposite happens to an MA(q) process.
Table: Theoretical Patterns of ACF and PACF
Type of model | Typical pattern of ACF | Typical pattern of PACF
AR(p) | Decays exponentially or with damped sine wave pattern or both | Significant spikes through lags p
MA(q) | Significant spikes through lags q | Declines exponentially
ARMA(p,q) | Exponential decay | Exponential decay
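For the two simplest cases the patterns in the table have closed forms: an AR(1) has $\rho_k = \alpha_1^k$ (geometric decay), while an MA(1) has $\rho_1 = \beta_0 \beta_1 / (\beta_0^2 + \beta_1^2)$ and $\rho_k = 0$ for k > 1 (a single spike). A minimal sketch, with hypothetical function names and illustrative parameter values:

```python
def ar1_acf(alpha1, max_lag):
    """Theoretical ACF of a stationary AR(1): rho_k = alpha1 ** k."""
    return [alpha1 ** k for k in range(1, max_lag + 1)]

def ma1_acf(beta0, beta1, max_lag):
    """Theoretical ACF of Y_t = mu + beta0*u_t + beta1*u_{t-1}."""
    rho1 = beta0 * beta1 / (beta0 ** 2 + beta1 ** 2)
    return [rho1] + [0.0] * (max_lag - 1)  # cuts off after lag 1

print(ar1_acf(0.5, 4))        # decays geometrically toward zero
print(ma1_acf(1.0, 0.5, 4))   # one spike at lag 1, zero afterwards
```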
ARIMA Identification of US GDP:
The correlogram and partial correlogram of the stationary
(after first-differencing) US GDP for 1970-I to 1991-IV are
shown in the Figure.
The autocorrelations decline up to lag 4; then, except at lags 8
and 12, the rest of them are statistically not different from zero
(the solid lines shown in this figure give the approximate 95%
confidence limits).
The partial autocorrelations with spikes at lags 1, 8, and 12
seem statistically significant but the rest are not; if the partial
correlation coefficient were significant only at lag 1, we could
have identified this as an AR(1) model.
Let us therefore assume that the process that generated the
(first-differenced) GDP is at most an AR(12) process.
We do not have to include all the AR terms up to 12, since only the
AR terms at lags 1, 8, and 12 are significant.
ESTIMATION OF THE ARIMA MODEL
Let $Y_t^*$ denote the first differences of US GDP.
Then our tentatively identified AR model is
$Y_t^* = \delta + \alpha_1 Y_{t-1}^* + \alpha_8 Y_{t-8}^* + \alpha_{12} Y_{t-12}^* + u_t$
Using EViews, we obtained the following estimates:
$\hat{Y}_t^* = 23.0894 + 0.3428 Y_{t-1}^* - 0.2994 Y_{t-8}^* - 0.2644 Y_{t-12}^*$
t = (7.7547) (3.4695) (-2.9475) (-2.6817)
$R^2$ = 0.2931   d = 1.7663
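A pure AR model with selected lags can be estimated by ordinary least squares on a constant and the chosen lagged values. The sketch below does this with the normal equations in plain Python. It runs on simulated data with lags 1 and 3, not on the GDP series, and all names and parameter values are illustrative assumptions; it is not a reproduction of the EViews estimation.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_subset_ar(y, lags):
    """OLS of y_t on a constant and y_{t-k} for each k in lags."""
    start = max(lags)
    rows = [[1.0] + [y[t - k] for k in lags] for t in range(start, len(y))]
    target = y[start:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)  # [constant, coefficient per lag]

# Simulated series: y_t = 1.0 + 0.5*y_{t-1} - 0.3*y_{t-3} + u_t
rng = random.Random(0)
y = [0.0, 0.0, 0.0]
for _ in range(5000):
    y.append(1.0 + 0.5 * y[-1] - 0.3 * y[-3] + rng.gauss(0.0, 1.0))
coef = fit_subset_ar(y, lags=[1, 3])
print(coef)  # estimates of the constant and the two lag coefficients
```

Calling `fit_subset_ar(series, lags=[1, 8, 12])` on a differenced series would give the same kind of regression as the one reported above.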
DIAGNOSTIC CHECKING
How do we know that the above model is a reasonable fit to
the data?
One simple diagnostic is to obtain residuals from the above
model and obtain the ACF and PACF of these residuals, say, up to
lag 25.
The estimated ACF and PACF are shown in the Figure.
As this figure shows, none of the autocorrelations and partial
autocorrelations is individually statistically significant.
Nor is the sum of the 25 squared autocorrelations, as shown by
the Box-Pierce Q and Ljung-Box LB statistics, statistically
significant.
The correlograms of both autocorrelation and partial autocorrelation give
the impression that the residuals estimated from the above model are purely random. Hence,
there may not be any need to look for another ARIMA model.
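The two portmanteau statistics mentioned above are simple functions of the residual autocorrelations: $Q = n \sum_{k=1}^{m} \hat{\rho}_k^2$ and $LB = n(n+2) \sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$. A minimal sketch follows, with hypothetical function names and made-up autocorrelations rather than the GDP residuals:

```python
def box_pierce_q(rho, n):
    """Box-Pierce Q; rho[k-1] is the residual autocorrelation at lag k."""
    return n * sum(r * r for r in rho)

def ljung_box(rho, n):
    """Ljung-Box LB, a small-sample refinement of Q."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

rho = [0.1, -0.2]   # illustrative values only
print(box_pierce_q(rho, n=100), ljung_box(rho, n=100))
```

Both statistics are compared with a chi-square critical value; with m = 25 lags the 5% value is about 37.65, so values below that do not reject the hypothesis that the residuals are white noise.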
FORECASTING
Suppose, on the basis of the above model, we want to forecast
GDP for the first four quarters of 1992.
But in the above model the dependent variable is the change in
GDP over the previous quarter.
Therefore, if we use the above model, what we can obtain are
the forecasts of GDP changes between the first quarter of 1992
and the fourth quarter of 1991, the second quarter of 1992 over the
first quarter of 1992, etc.
To obtain the forecast of the GDP level rather than its changes, we
can "undo" the first-difference transformation that we had used
to obtain the changes.
(More technically, we integrate the first-differenced series.)
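To obtain the forecast value of GDP (not $\Delta$GDP) for 1992-I,
we rewrite the model
$Y_t^* = \delta + \alpha_1 Y_{t-1}^* + \alpha_8 Y_{t-8}^* + \alpha_{12} Y_{t-12}^* + u_t$
as
$Y_{1992,I} - Y_{1991,IV} = \delta + \alpha_1 [Y_{1991,IV} - Y_{1991,III}] + \alpha_8 [Y_{1990,I} - Y_{1989,IV}] + \alpha_{12} [Y_{1989,I} - Y_{1988,IV}] + u_{1992,I}$
Here the differences at lags 8 and 12 from 1992-I are those dated 1990-I and 1989-I.
That is,
$Y_{1992,I} = \delta + (1 + \alpha_1) Y_{1991,IV} - \alpha_1 Y_{1991,III} + \alpha_8 Y_{1990,I} - \alpha_8 Y_{1989,IV} + \alpha_{12} Y_{1989,I} - \alpha_{12} Y_{1988,IV} + u_{1992,I}$
The values of $\delta$, $\alpha_1$, $\alpha_8$, and $\alpha_{12}$ are already known from the
estimated regression.
The value of $u_{1992,I}$ is assumed to be zero.
Therefore, we can easily obtain the forecast value of $Y_{1992,I}$.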
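The one-step-ahead level forecast can be sketched as follows, using the coefficient estimates reported in the estimation section and setting the future shock to zero. The function name and the test series are hypothetical; the series is a made-up linear trend, not the GDP data.

```python
# Coefficient estimates reported in the estimation section
DELTA, A1, A8, A12 = 23.0894, 0.3428, -0.2994, -0.2644

def forecast_next_level(levels):
    """One-step-ahead forecast of the level; levels[-1] is the latest observation."""
    if len(levels) < 13:
        raise ValueError("need at least 13 levels to form the lag-12 difference")
    diff = [b - a for a, b in zip(levels, levels[1:])]  # diff[-1] is the latest change
    # forecast change, with the future shock u set to zero
    change = DELTA + A1 * diff[-1] + A8 * diff[-8] + A12 * diff[-12]
    return levels[-1] + change  # undo the first difference

# Hypothetical levels rising by 10 each quarter (NOT actual GDP figures)
levels = [100.0 + 10.0 * i for i in range(13)]
print(forecast_next_level(levels))
```

Forecasts further ahead can be built the same way by appending each forecast to `levels` and calling the function again.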