
Testing for Second Order Stationarity and Forecasting Time Series

Stephen Arthur Bradley
Level 6 project 10cp

Deadline: Monday 9 February 2015


Acknowledgement of Sources

For all ideas taken from other sources (books, articles, internet), the source of the ideas is mentioned in the main text and fully referenced at the end of the report.

All material which is quoted essentially word-for-word from other sources is given in quotation marks and referenced.

Pictures and diagrams copied from the internet or other sources are labelled with a reference to the web page or book, article etc.

Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Contents

1 Examining the Stationarity of Time Series
    1.0.1 Examples of Stationary and Locally Stationary Series
  1.1 Packages to Test for Stationarity
    1.1.1 Locits
    1.1.2 Fractal
  1.2 Initial Data Analysis
  1.3 Testing for Features of Stationarity
2 Local Auto-Covariance
  2.1 Examining the Plotted Local Auto-Covariance Function
3 Forecasting
  3.1 Linear Model
  3.2 Non-stationary Forecasting
  3.3 Box-Jenkins


1 Examining the Stationarity of Time Series

This project will focus on analysing recent data relating to the Gross Domestic Product (GDP) of the UK, using data collected by the Office of National Statistics (ONS).

If a series is constant in mean then we say it is first order stationary, and if the variance and auto-covariances of the series are also constant in time then we describe the series as second order stationary (or weakly stationary). Throughout the text we will use the term stationary loosely to mean weakly stationary.

Most methods for analysing time series are designed for stationary series. Our aim is to construct accurate forecasts for ONS data, but we cannot make a reasonable forecast for a series using classical methods unless it is stationary. The methods used will therefore depend heavily on the stationarity of the time series in question, so it is very important to verify that a series is stationary before classical methods can be applied. Hence, before applying any methods to a series we shall spend some time discussing whether it is stationary and conducting some statistical tests.

There are many ways in which we could test a series for stationarity, and we may wish to apply a selection of tests in order to test separate features. If a graphical plot of the time series is clearly non-stationary then we assume non-stationarity and try to alter the data; stationarity implies some very restrictive properties and is unlikely to hold for raw data. If, for example, there were an obvious trend in the data then the corresponding correlogram would not decay, so any analysis in the time domain would be meaningless; detrending the series is therefore a crucial step. If there are no obvious features implying non-stationarity, i.e. there appears to be no trend or seasonal effect, then we may want to start by testing whether the mean is constant in time. One way we could test whether the mean of our time series, $\{X(t)\}_{1 \le t \le n}$, is independent of time $t$ is to form a paired t test in which $X(t)$ is matched to $X(t + n/2)$ for each $t \le n/2$, when $n$ is even.
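As a concrete illustration, this paired comparison can be carried out with R's built-in t.test; the following is a minimal sketch, assuming x holds the observed series and its length is even:

# Sketch of the paired t test for a constant mean, assuming x holds the series
n <- length(x)
stopifnot(n %% 2 == 0)                # pairing X(t) with X(t + n/2) needs even n
t.test(x[1:(n/2)], x[(n/2 + 1):n], paired = TRUE)   # H0: both halves share a mean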

Next we may want to test whether the variance is constant using a Bartlett test (Bartlett, 1937), in which the data is split into multiple sections and the sample variances of the sections are compared for significant differences. The last feature of a stationary series which we must test for is that the covariance function is constant in time; this is much more difficult to test, so we shall introduce some sophisticated packages for this purpose, detailed below. For those series which are non-stationary we shall consider further manipulations (e.g. differencing) to see if we can form a stationary series (Nason, 2013). All analysis shall be carried out in R (R Core Team, 2014).

1.0.1 Examples of Stationary and Locally Stationary Series

Autoregressive moving average (ARMA) processes are a commonly used example of a stationary time series, with the representation below, where $Z_t$ is a purely random process with mean zero and variance $\sigma^2$, and $\mu$ is the expected value of $X_t$. An ARMA(p, q) process is asymptotically stationary under some restrictions on $\{\alpha_i\}_{1 \le i \le p}$ and $\{\beta_i\}_{1 \le i \le q}$.

$$X_t - \mu = \alpha_1(X_{t-1} - \mu) + \dots + \alpha_p(X_{t-p} - \mu) + Z_t + \beta_1 Z_{t-1} + \dots + \beta_q Z_{t-q}$$
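For illustration, a stationary series of this kind can be simulated in R with arima.sim; the coefficients below are arbitrary values satisfying the stationarity restrictions, not fitted to any data:

# Simulate an ARMA(2,1) series; coefficients are illustrative only
set.seed(1)
x.arma <- 10 + arima.sim(model = list(ar = c(0.5, -0.3), ma = 0.4), n = 256)  # mu = 10
plot.ts(x.arma)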

We do not have to restrict ourselves to considering real-valued functions. The series:

$$X_t = A(\omega)e^{i\omega t}$$

is deterministic, so the expectation is not constant and the series is non-stationary. However, we can generalise this case into an example of a stationary series by integrating over an infinite number of frequencies, as shown below.

$$X_t = \int A(\omega)e^{i\omega t}\,d\omega$$

A series comprising the interaction of certain wavelets is non-stationary but can be considered as locally stationary, and has the following form, where a wavelet is any oscillation starting and ending at zero.

$$X_t = \sum_{j=1}^{\infty}\sum_{k} w_{j,k}\,\Psi_{j,k-t}\,\xi_{j,k}$$

Here, $\{\Psi_{j,k-t}\}$ represents a set of discrete wavelets, $\{\xi_{j,k}\}$ is a set of iid random variables (representing error) and $\{w_{j,k}\}$ is a set of amplitudes which describe the evolution of $X_t$ and are fitted to the data. This is the model fitted by the package locits (detailed below), which chooses the Haar wavelets as the set of discrete wavelets.

Figure 1: GDP data from The Office of National Statistics (2014)

1.1 Packages to Test for Stationarity

Initially, we may consider testing for stationarity by checking whether the series is constant in the mean or whether the local variances of different sections differ significantly from one another. These methods are easy to apply and will be implemented later. However, these tests are flawed: if we test for a constant variance in this way then an arbitrary split in the data must be chosen, which decreases the generality of the test. To reach a final decision about the stationarity of a series we must implement a more sophisticated procedure, and next we shall discuss some packages designed for this purpose.

1.1.1 Locits

We may consider conducting a hypothesis test for stationarity using locits, which completes this test by computing an estimate of the Evolutionary Wavelet Spectrum (EWS), i.e. the wavelet periodogram, and correlating it with a set of Haar wavelets to decompose the series into a weighted sum of these wavelets; the coefficients in this sum are referred to as Haar wavelet coefficients. The null hypothesis of stationarity is then rejected if any of the Haar wavelet coefficients are significantly large (Nason, 2014).
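A minimal sketch of this test, using the hwtos2 function from locits on a simulated white noise series (which should not be rejected):

# Haar wavelet test of second order stationarity on simulated white noise
library(locits)
set.seed(1)
x <- rnorm(256)    # hwtos2 requires the length to be a power of two
st <- hwtos2(x)
summary(st)        # reports how many Haar wavelet coefficients were significant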

The EWS of a time series is a measure of the local power of the series at a certain time and scale, where time is rescaled. Intuitively, local power can be considered as the contribution to the variance with respect to a particular wavelet. Locits also functions by using the EWS estimate to calculate the sample time-localized auto-covariance function (lacvf). This lacvf will be discussed in greater detail in the following section.

A common criticism of wavelet analysis is that the wavelets used are chosen arbitrarily, and in some circumstances different wavelet choices could lead to tests with differing powers. Here, the standard Haar wavelets are implemented, and Bonferroni and false discovery rate (FDR) analysis are used to determine which Haar wavelet coefficients are significant.

Locits' test for stationarity runs a large number of hypothesis tests, each at the 5% significance level, so we would expect about 5% of these hypothesis tests to be rejected even if the null hypothesis were true. The Bonferroni correction and FDR analysis are implemented to treat this problem of incorrect rejections.

If we are considering conducting $n$ tests at the 5% level then we could argue that the overall significance level becomes $5n\%$. Thus, the Bonferroni correction alters the outcome of the test by setting a new significance level for each individual test equal to $\frac{5}{n}\%$. This alteration ensures that the overall significance level is at most 5% (Shaffer, 1995). Hence the Bonferroni correction significantly reduces the likelihood of rejecting the null hypothesis when it is true, i.e. the type I error will be very low. For large values of $n$ this creates a much greater type II error in each individual hypothesis test, which causes the overall type II error to be fairly large. This means that some non-stationary time series will not be rejected.

The false discovery rate refers to the number of hypotheses we would expect to be rejected under a type I error, so we can expect FDR analysis to be similar to the Bonferroni correction. In this case FDR analysis refers to the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), which is a much less conservative approach. Suppose we are running $n$ hypothesis tests; then the Benjamini-Hochberg procedure finds the largest value of $m \in \{1, \dots, n\}$ such that the $m$th smallest p-value is less than $\frac{m}{n} \cdot 5\%$ (since we are generally concerned with tests at the 5% level) and states that the hypothesis tests with the $m$ smallest p-values all lead to rejections (Shaffer, 1995).
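Both corrections are available in base R through p.adjust; a small sketch with illustrative p-values:

# Bonferroni and Benjamini-Hochberg adjustments of illustrative p-values
p <- c(0.001, 0.012, 0.030, 0.045, 0.300)
p.adjust(p, method = "bonferroni")  # each p-value multiplied by n (capped at 1)
p.adjust(p, method = "BH")          # Benjamini-Hochberg false discovery rate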

1.1.2 Fractal

The package fractal is mainly concerned with fractal analysis and fractal modelling, but also offers a test for stationarity. This package conducts the Priestley-Subba Rao (PSR) test for non-stationarity, which is based upon examining how homogeneous a set of spectral density function (SDF) estimates are across time, across frequency, or both (Constantine and Percival, 2014). Similar to locits, fractal judges the stationarity of a series from analysis in the frequency domain. First, the logarithm of the spectral estimate is taken for a set of times and frequencies; ANOVA is then applied and examined for signs that the power spectrum is time dependent. If there is significant evidence to suggest that the power spectrum varies with time then the original time series is unlikely to be stationary and the null hypothesis of stationarity is rejected.
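A sketch of the PSR test as exposed by fractal, assuming x holds the series under study:

# Priestley-Subba Rao test of stationarity from the fractal package
library(fractal)
psr <- stationarity(x)   # ANOVA of log spectral estimates across time and frequency
print(psr)               # displays the PSR test results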

Leakage bias is one problem with this method of analysis; this refers to deviation in the spectral estimates due to white noise variation of the data. To minimise this bias, the time series is divided uniformly into non-overlapping blocks and sinusoidal tapers are used to develop the eigenspectra for each block of the time series.

Both of the above packages are relatively fast and include very helpful indications of the local nonstationarities in the data (shown in more detail later). However, fractal is based on theory which is restricted to Gaussian processes whereas locits is not, and this makes locits the more powerful test.

1.2 Initial Data Analysis

Throughout this project we shall refer to the data set of seasonally adjusted GDP by category of income up to 2014 Q3, as collected by The Office of National Statistics (2014). Before we can analyse this data set, we need some information about its structure. More specifically, we would like to plot the data to search for signs of a trend or any periodic variation. Once we have some understanding of these features in the data we can manipulate the series accordingly in an attempt to construct a stationary series. This approach is suggested by Chatfield (1989) and the plot is shown in Figure 1.

Figure 2: Differenced GDP data

Examining Figure 1, there is a very robust rate of increase (trend) and a dip in value immediately after the 2008 crisis which could arguably be treated as a small set of outliers, but for simplicity we shall assume no outliers are present. In any case, the data is clearly non-stationary because it is dominated by a trend which looks to be somewhere between linear and quadratic, so we shall difference the data in an attempt to remove this trend. The effect of the first difference is shown in Figure 2. The first differences are unlikely to be stationary because:

• The most prominent feature of the series is that the variance is not constant in time and explodes towards the end of the series.

• The trend has been largely removed, but a significant positive trend is still present, and it is very important that no significant trend is present before we can apply classical methods.

1.3 Testing for Features of Stationarity

Plotting the acf of the first differences (Figure 3) shows that the auto-correlation does not decay, which implies that the residual trend is in fact significant and so the series is non-stationary. The periodic oscillations in the acf also suggest that there is a periodic component present in the series. Due to this we shall look instead at the series of second differences.

Figure 2 appears to show that any significant trend has been removed, so we shall assume first order stationarity; we are now more interested in whether the second moment is constant. Figure 2 does not seem to reflect that this is the case, since the sample local variance increases with time. We can test this by splitting the series into four equal sections and calculating whether the variances of these sections are significantly different from one another. Here we shall choose Bartlett's test of homoscedasticity (Bartlett, 1937) for this purpose.
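A sketch of this four-section comparison, assuming d holds the differenced series (constructed explicitly later in the report):

# Bartlett test of equal variances across four equal sections of d
sections <- split(d, cut(seq_along(d), breaks = 4, labels = FALSE))
bartlett.test(sections)   # H0: all four sections share a common variance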

Figure 3: Correlogram of differenced GDP data from The Office of National Statistics (2014)

It cannot be ignored that this test relies on the assumption that the time series analysed has a Gaussian structure. Hence, we must first test whether the series can be treated as a realisation from a Gaussian distribution. We shall implement the Shapiro-Wilk normality test (Shapiro and Wilk, 1965), which has the test statistic:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $x_{(i)}$ denotes the $i$th order statistic and the values $a_1, \dots, a_n$ are calculated from the sample covariance matrix. For the time series of second differences, the Shapiro-Wilk test yields a p-value of $1.057 \times 10^{-14}$, so we must reject the hypothesis that the whole series is a Gaussian process and we cannot apply the Bartlett test to the whole series. However, applying this test to the first 70 observations reveals the following.

Shapiro-Wilk normality test

data: sample(d[1:70])
W = 0.9622, p-value = 0.03316

Although this p-value falls marginally below 0.05, it is far larger than that for the whole series and comfortably above the 1% level, so we shall treat this subset of 70 values as approximately Gaussian and apply the Bartlett test to this subset of the data.

> bartlett.test(list(d[1:35], d[36:70]))

Bartlett test of homogeneity of variances

data: list(d[1:35], d[36:70])
Bartlett's K-squared = 4.5786, df = 1, p-value = 0.03237

Analysing only the first 70 values is extremely restricting, but the p-value is 0.03 < 0.05, so even within this very small section we must reject the hypothesis of equal variances and conclude that the series is heteroskedastic, i.e. the local variance of the series is time dependent. This implies that the whole series has a time dependent local variance and so the whole series is non-stationary.

From this, we may expect the covariance to also be time dependent, and locits can be applied to test this hypothesis. However, hwtos2 is only applicable when the length of a time series is equal to a power of two, and because the series which we are interested in is initially close to zero we shall pad the series by adding a vector of zeros to the start, instead of taking a subset of the data. The output of this is shown in Figure 4, which corroborates the result from the Bartlett test above and verifies that the data is non-stationary. We could continue to difference the series until we find a stationary series, but this method is exceedingly clumsy and would quickly lead to the Slutsky-Yule effect: a sinusoidal pattern would be imposed on the data.
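The call behind Figure 4 is not printed in the report, but based on the padding code shown later it was presumably along these lines:

# Presumed construction of the padded series and the locits test behind Figure 4
library(locits)
d2 <- diff(diff(gdp))     # 237 second differences
d  <- c(rep(0, 19), d2)   # pad to 256 = 2^8, a power of two
st <- hwtos2(d)
plot(st)                  # marks the times and scales of significant nonstationarity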

Figure 4: Plotted nonstationarities of the second differences data, padded with zeros on the left

2 Local Auto-Covariance

Figure 4 confirms that the series of second differences is non-stationary via a hypothesis test relating to the structure of the data, which in turn relies heavily on the local auto-covariance function (lacvf). This section will analyse the behaviour of the lacvf and the lacf of this series for different lags and at different times.

Although the series is decidedly non-stationary, it will appear to be stationary if we consider small sections of the series separately, and from this idea we can treat the series of second differences as a locally stationary time series. Under this assumption, applying the locits package in R to the series gives an estimate of its local auto-covariance function, and by normalising these values we are able to examine the local auto-correlation function (lacf).

If we were to assume second order stationarity then we would replace the lacf with the correlogram. Thus, to compare the choice of accepting or rejecting the hypothesis of stationarity it makes sense to compare the correlogram with the time varying lacf, even though only the lacf will be a relevant object. This correlogram, as shown in Figure 7, gives $r_1 = -0.434$, $r_5 = -0.262$, and the significance bars are of magnitude $\approx 0.127$, which suggests that the auto-correlation function (acf) is zero for all other lags, where $r_h$ represents the auto-correlation at lag $h$.
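The lacf estimates discussed here come from the lacf function in locits (used explicitly later in the report); a sketch, where the lags argument to the plot method is an assumption about the plotting interface:

# Estimate and plot the local autocorrelation of the padded second differences
A <- lacf(d)
plot(A, lags = c(1, 5, 8, 11))   # lags argument assumed; plots the lacf at these lags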


2.1 Examining the Plotted Local Auto-Covariance Function

If this series is stationary then we would expect a simple linear regression model with intercept equal to -0.434 ($r_1$ from the correlogram) and a slope equal to zero to be a good model for the lacf at lag 1. From the plotted lacf at lag 1 in Figure 5, we can see that this is an unlikely model for the data due to some predominant features; for example, the plot seems to have a linear trend, which would imply that the acf cannot be constant and so the series is not stationary. This argument is difficult to oppose because a linear regression model with zero slope is obviously a poor model for this plot due to the systematic error. Although this cannot be ignored, the slope of this plotted data is not very sharp (0.0006), so further hypothesis testing is required. One such hypothesis test for this purpose is described below.

Figure 5: Local auto-correlation at lag 1 with the fitted linear regression model shown in blue

The mean of this data (-0.464) is fairly close to our previously mentioned $r_1$, so it seems sensible to assume that the only issue is the possible trend in the data. This is equivalent to testing whether the slope is in fact zero. This is explored in R to give the analysis of the fitted coefficients shown below, where 'gdp' refers to the data set we are analysing.

> d1 = diff(gdp)
> d2 = diff(d1)
> d = c(rep(0, 19), d2)
> A = lacf(d)
> L1 = A$lacr[,2]
> t = 1:256
> anova(lm(L1 ~ t))

Analysis of Variance Table

Response: L1
           Df  Sum Sq Mean Sq F value    Pr(>F)
t           1 0.46976 0.46976  236.48 < 2.2e-16 ***
Residuals 254 0.50455 0.00199

The test shown above reveals a p-value of $< 2.2 \times 10^{-16} \ll 0.05$, which makes it clear that we should reject the hypothesis that the true value of the slope for this model is zero. This alone is enough to show that a straight line with no slope is an unreasonable model for the lacf at lag 1. This shows that the local auto-covariance cannot be constant in time, and helps to explain why the hypothesis of stationarity of the time series is rejected.

Figure 6: Local auto-covariance function for lags 5, 8 and 11, along with the corresponding local auto-correlation functions

Figure 7 shows that lags 5, 8 and 11 may also yield auto-correlations which are different from zero, so these are the lags which we shall explore further. The plotted lacvf and lacf for these particular lags are shown in Figure 6. Despite $r_5$ (-0.262) being significantly different from zero in the correlogram, Figure 6 shows that the lacvf at lag five centres on a value close to zero with a fairly low variance.

Figure 7: Correlogram for series of second differences under assumption of stationarity

Comparing these two figures calls attention to further contradictions; it is clear that lag five is not the only lag which yields an acf seemingly different from the correlogram. The acf at lag 8 is outside the significance bars, but the lacf for lag 8 is extremely well centred around zero with very little variance. We reach a similar result if we compare these figures for a lag of 11, but this is not as strong as the previous contrasts because $r_{11}$ in Figure 7 is very close to the significance bar. These issues present further evidence to suggest that the lacf is not representative of the acf. This means that the lacf is time dependent and so the series is non-stationary.

Perhaps a more intuitive way of examining this time varying lacvf is to plot the local auto-covariances at different time points, similar to a correlogram. Judging by Figure 5, the biggest difference in the lacvf could be observed by evaluating the lacvf at times 32 and 175; this corresponds to a comparison of the lacvf between 1962 Q4 and 1998 Q3. Thus we shall observe the covariances at these times to see if they are significantly different from one another, as shown in Figure 8.

The plot on the left of Figure 8 shows many features which contrast with the plotted lacf on the right, namely:

1. The LHS lacvf plot shows that all confidence intervals contain the point 0, i.e. none of the estimated lacvf values are significantly different from zero, whereas the RHS plot shows that the lacvf at lag 1 is certainly different from zero.

2. An auto-covariance at lag zero is equivalent to the variance of a series, so the lacvf at lag 0 will give the local variance of the time series. A comparison of the lacvf estimate at lag 0 in each case shows that the local variance at time 175 is substantially larger than the local variance at time 32, since the local variance is well below $3 \times 10^5$ (300,000) at time 32 and is well above $4 \times 10^6$ (4,000,000) at time 175.

3. The confidence intervals at time 32 are very wide relative to the size of the estimates, whereas the confidence intervals at time 175 are much narrower relative to the magnitude of the estimates. This suggests that the structure at each of these times is completely different, i.e. around time 32 the series has some covariance structure but is most similar to a purely random process, and by time 175 the time series behaves similarly to a moving average process of order one.

Figure 8: Confidence intervals for the local auto-covariances at times 32 (1962) and 175 (1998) respectively

These two graphs (Figure 8) are very different, so considering this, and that the FDR and Bonferroni methods have a very low type I error probability, the statement that the series is stationary is dubious at best given this changing lacf. Furthermore, if we were to model this data or construct forecasts then we would not be able to do so with a great deal of confidence. For this reason we would be sensible to consider a different transformation of the time series in order to make predictions.

3 Forecasting

Our main focus in most practical situations regarding the analysis of time series is to construct accurate forecasts. To do this we will first need to transform the time series into a stationary form and find a model for the transformed series. We could do this as we have done above by differencing, or we could apply other manipulations. In any case we do not want to make too many alterations to the data because this will stretch the confidence intervals for our predictions. For example, if differencing the data once does not make it stationary then we may want to consider a different method, as opposed to continuing to difference the data.

Figure 9: Diagnostics of the fitted linear model using parameters t2, t3, t4

3.1 Linear Model

If we cannot easily manipulate a series to form a stationary series then we could resort to fitting a linear model as a simple prediction method. This approach is also very useful when a large number of predictions have to be made.

> t = 1:239
> t2 = t^2
> t3 = t^3
> t4 = t^4
> fit2 = lm(gdp ~ t + t2 + t3 + t4)
> summary(fit2)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.089e+03  1.736e+03   4.660 5.31e-06 ***
t           -1.681e+02  9.978e+01  -1.685   0.0934 .
t2          -3.046e-01  1.686e+00  -0.181   0.8567
t3           8.798e-02  1.054e-02   8.345 6.20e-15 ***
t4          -2.169e-04  2.179e-05  -9.952  < 2e-16 ***

This suggests that the parameters t and t2 do not significantly decrease the standard error of our model. To check whether t and/or t2 are redundant in the model we can apply an ANOVA with these two parameters fitted last, because this type of analysis applies successive hypothesis tests to decide whether the final parameter is redundant or not.

Analysis of Variance Table

Response: gdp
           Df     Sum Sq    Mean Sq    F value    Pr(>F)
t4          1 4.3303e+12 4.3303e+12 1.5812e+05 < 2.2e-16 ***
t3          1 3.1043e+11 3.1043e+11 1.1335e+04 < 2.2e-16 ***
t2          1 1.4723e+09 1.4723e+09 5.3762e+01 3.671e-12 ***
t           1 7.7711e+07 7.7711e+07 2.8376e+00   0.09341 .
Residuals 234 6.4083e+09 2.7386e+07

This gives a p-value of 0.093 > 0.05, so we cannot reject the null hypothesis that the coefficient of parameter t is zero. Hence we shall remove t from the model and apply a similar analysis to t2, which gives a p-value of $4.319 \times 10^{-12} \ll 0.05$, so we shall retain t2 in the model. This alteration should make the model more relevant and useful, but before we use it to make a prediction we should analyse whether the model fits the data well. We can do this visually with the selection of plots shown in Figure 9.
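A sketch of the refit without t, followed by the standard lm diagnostic plots of the kind Figure 9 appears to show:

# Refit without t (t2 fitted last, as in the report's ANOVA) and draw diagnostics
fit3 <- lm(gdp ~ t4 + t3 + t2)
anova(fit3)               # sequential test of t2 after t4 and t3
par(mfrow = c(2, 2))
plot(fit3, which = 1:4)   # residuals vs fitted, QQ, scale-location, Cook's distance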

The peaks in Cook's distance towards the end of the series show that some values have a much greater effect on the model than others, and this is a strong sign of a poor model. Also, both residual plots on the LHS indicate systematic error; e.g. fitted values near $1.5 \times 10^5$ (150,000) correspond to overestimates, and fitted values closer to $3.5 \times 10^5$ (350,000) are considerably overestimated. These are serious problems with the model and are difficult to remedy, hence a different method may be more appropriate.

3.2 Non-stationary Forecasting

Forecasting a non-stationary time series is very difficult, and a variety of methods have been proposed, including an extension of the Box-Jenkins procedure, i.e. fitting a parametric model (e.g. an ARMA process) whose parameters change over time (Nason et al., 2014). This is very complicated because we would have to find an effective method of choosing the order of this process before we could even calculate the fitted parameters. Here we shall apply a different approach which focuses only on the local partial auto-correlation function (lpacf), because this is much more easily applied and is an efficient forecasting method. For this method to be applicable we must consider a locally stationary time series with zero expectation.

Recall the series of second differences of the GDP dataset. This series has been shown in the previous sections to be non-stationary, but if the series is observed over very small time intervals it starts to look stationary; under this idea we can treat the series as a locally stationary time series. Also note that the mean of the series is consistently zero, so both criteria are fulfilled. Now we can use the package lpacf to estimate the local partial auto-correlation function and construct our forecasts.

The function lpacf first chooses the dimension of the Yule-Walker equations and then calculates the smoothed wavelet periodogram, which is used to forecast the lacvf. The Yule-Walker equations then give the forecast mean for the lpacf using a choice of three methods (Killick, 2014).

• fixed generates forecasts for the series using a limited number of recent observations; this number is previously determined by lpacf.

• extend is similar to the fixed method, but the number of previous observations used is limited to the number of steps ahead which are being forecast.


• recursive is applied when forecasting more than one step ahead. The first forecast is calculated and is then treated as an observed value when calculating the subsequent forecast. This is the method most similar to classical methods, i.e. the Box-Jenkins method is approached in the same way.

Applying the fixed method to the series gives the following result:

Number of times: 91
Number of lags: 23
Range of times from: 74 to 164
Part series was analyzed (alltimes=FALSE)
Smoothing binwidth used was: 147
Binwidth was chosen automatically
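The call producing this output is not shown in the report; by analogy with the recursive call given later, it was presumably something like the following (the forecast.type value is an assumption):

# Presumed call for the fixed method (mirrors the recursive call shown later)
forecast.lpacf(d, 10, forecast.type = 'fixed')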

Since we explored the structure of the lacvf and the lacf at lags 1, 5, 8 and 11 (considering them to be the most significant values), we shall also observe the structure of the lpacf at these lags. Contrary to the analysis regarding the lacvf and the lacf, the lpacf is a very consistent function, which is most obvious from the lpacf at lag 1 as illustrated on the LHS of Figure 10. This indicates that the lpacf is a much more reliable tool for forecasting than the lacvf or the lacf.

To inspect this consistency we shall compare all calculated values of the lpacf at two different times. These times must be chosen to give the greatest difference in the structure of the lpacf. Since the lpacf at lag 1 is much more significant than at all other lags, we shall set these time points so as to maximise the difference in the lpacf at lag 1. From the data we find that time points 1 and 77 yield the largest difference, and the lpacf at these times is plotted on the RHS of Figure 10. This plot reveals that the lpacf has a very similar structure at both time points and only shows a real difference towards the tails.

Figure 10: The LHS shows the lpacf for lags 1, 5, 8 and 11, and the RHS shows the complete lpacf for two different time points, 1 and 77.

This is much more reliable than the lacf, so we shall now apply the lpacf function to forecast the series of second differences using the recursive method described above, with input similar to the following.

> d = diff(diff(gdp))
> forecast.lpacf(d, 10, forecast.type = 'recursive')

This leads us to the predictions plotted in Figure 11. These forecasts look very convincing because they seem to fit the pattern set by the observed data. The confidence intervals plotted on the LHS of Figure 11 also look reasonable; if the method were not very useful in making these predictions then the confidence intervals would be very large, and if the method were overconfident in its results we would observe very narrow confidence intervals. Note that each of these intervals has a very similar size due to our recursive method, which treats all previous forecasts as fixed.

Figure 11: The series of (transformed) second differences along with the (untransformed) series of GDP, including predictions for the following ten values with confidence intervals for these predictions.

Back-transforming these confidence intervals (applying the inverse of our differencing procedure) creates the confidence intervals which relate to the untransformed data, shown on the RHS of Figure 11. These intervals are increasingly large, and the largest of them seems unreasonably so. More than the choice of method, this illustrates the problems caused by too many transformations, and there may be a more relevant approach which can be applied to just one transformation of the data, e.g. the series of first differences.

3.3 Box-Jenkins

Note that the problems with the linear model (fitted previously) are caused by the sudden drop in 2008, and we have mentioned that this drop in the GDP series is most likely to have been caused by the economic crisis in 2008. If this is true then the data which occurs before this drop is irrelevant to future values and so irrelevant to a prediction. Using this idea we shall form our predictions from a data set consisting of all points after the drop (data points 218 to 239).

This shortened data set is plotted in Figure 12 and is roughly linear, so differencing this series is likely to give us a stationary series. In this case we can apply the popular linear estimator, the Box-Jenkins procedure. The Box-Jenkins procedure can be computationally expensive but is intuitively simple. This method consists of fitting a model with an error term, e.g. an autoregressive moving average (ARMA) model, and setting all unobserved/future errors equal to zero. The Box-Jenkins procedure has the lowest prediction error of any linear estimator and is generally the preferred choice of method. The difficulty of this method lies in the fitting of the model.

Figure 12: Series of data points after the 2008 crisis, accompanied by the series of differences

Let $x(N, r)$ represent the prediction for $X_{N+r}$ made at time $N$, where $N$ is the length of the observed time series. The Box-Jenkins method of prediction uses a linear estimator of $X_{N+r}$; we call this a linear estimator because it is linear in $\{X_t\}_{0 \le t \le N}$. Expressions for $X_t$ and $x(N, r)$ are shown below, for some real valued sequence $\theta_i$ and a sequence of purely random variables $Z_t$ with zero mean and variance $\sigma^2$.

$$X_t = \sum_{i=0}^{\infty} \theta_i Z_{t-i}, \qquad x(N, r) = \sum_{i=0}^{\infty} \theta_{i+r} Z_{N-i} \qquad (1)$$

The prediction error for an r-step-ahead prediction made at time $N$ shall be denoted $e(N, r) = X_{N+r} - x(N, r)$. Note that $e(N, r)$ has zero expectation under the assumption that the correct model has been fitted. The variance of the prediction error, denoted similarly as $V(N, r)$, is given as follows.

$$V(N, r) = \operatorname{var}\left(X_{N+r} - x(N, r)\right) = \operatorname{var}\left(\sum_{i=0}^{r-1} \theta_i Z_{N+r-i}\right) = \sigma_Z^2 \sum_{i=0}^{r-1} \theta_i^2 \qquad (2)$$

This shows that the variance of the prediction error grows by $\sigma_Z^2 \theta_r^2$ each time $r$ increases by one, which suggests that the Box-Jenkins procedure may be unreliable when forecasting a large number of steps ahead.

The dataset in question is very small, but we would still like some confirmation that the series is second order stationary, so we shall apply locits, which gives the following.

There are 7 hypothesis tests altogether
There were 0 FDR rejects
No p-values were smaller than the FDR val of:
Using Bonferroni rejection p-value is 0.007142857
And there would be 0 rejections.

Let $Y_t$ denote the differenced series, such that $Y_t = X_t - X_{t-1}$. From this short analysis we find no rejections, so we shall assume that $Y_t$ is in fact a stationary series. Figure 13 shows the correlogram of the differenced series, which is entirely contained within the confidence bars; this is representative of a purely random process, which leads to the model $Y_t - \mu_Y = Z_t$, where $\mu_Y$ denotes the expectation of $Y_t$. This is equivalent to $\theta_0 = 1$ and $\theta_i = 0$ for $i \ge 1$ in the expansions (1) above. The plotted partial auto-correlations on the RHS of Figure 13 also show no values significantly different from zero, which corroborates the correlogram in supporting the white noise model.
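Figure 13 can be reproduced with the standard acf and pacf functions; a sketch, assuming gdp holds the full series:

# Correlogram and partial correlogram of the post-crisis differences (cf. Figure 13)
y <- diff(gdp[218:239])   # Yt = Xt - Xt-1 on the shortened data set
par(mfrow = c(1, 2))
acf(y)
pacf(y)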

Under this model, $y(N, r) = E(Y_{N+r}) = E(\mu_Y + Z_{N+r}) = \mu_Y$ for all $r \ge 1$. More specifically, this gives $x(N, r) = x(N, r-1) + \mu_Y = \dots = x(N, 0) + r\mu_Y = X_N + r\mu_Y$, for $r \ge 1$. From equation (2) it is clear that predictions for the series $\{Y_t\}$ will have a constant variance of $\sigma_Z^2$, and we can use this to find an expression for the variance of our prediction error with respect to $\{X_t\}$ as follows.

Figure 13: Correlogram and partial correlogram for data points after the 2008 crisis

$$\begin{aligned}
\operatorname{var}(e(N, r)) &= \operatorname{var}[X_{N+r} - x(N, r)] \\
&= \operatorname{var}[X_{N+r} - X_{N+r-1} + X_{N+r-1} - X_{N+r-2} + \dots + X_{N+1} - x(N, r)] \\
&= \operatorname{var}[(X_{N+r} - X_{N+r-1}) + \dots + (X_{N+1} - X_N) - r\mu_Y] \\
&= \operatorname{var}[Y_{N+r} + \dots + Y_{N+1} - r\mu_Y] \\
&= \operatorname{var}[Y_{N+r} + \dots + Y_{N+1}] \\
&= \operatorname{var}[Y_{N+r}] + \dots + \operatorname{var}[Y_{N+1}] \\
&= r\sigma_Z^2
\end{aligned}$$

where the variances add in the penultimate line because the $Y_t$ are uncorrelated under the white noise model.

Hence, a rough 95% prediction interval for $X_{N+r}$ is given by formula (3), using the 97.5% quantile of a standard Gaussian distribution (1.96):

$$\left[x(N, r) - 1.96\sqrt{r}\,\sigma_Z,\; x(N, r) + 1.96\sqrt{r}\,\sigma_Z\right], \quad r \ge 1 \qquad (3)$$

or equivalently,

$$\left[X_N + r\mu_Y - 1.96\sqrt{r}\,\sigma_Z,\; X_N + r\mu_Y + 1.96\sqrt{r}\,\sigma_Z\right], \quad r \ge 1 \qquad (4)$$

Formula (4) is the more practical form because we can estimate all the unknown values by fitting our purely random process model. Using R to fit this model gives $\hat{\mu}_Y = 3787.9$ and $\hat{\sigma}_Z = 3247.0$; substituting these estimates into formula (4) gives the prediction intervals for $\{X_t\}$ which are shown in Figure 14.
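A sketch of this computation, assuming gdp holds the full ONS series; the reported values of $\hat{\mu}_Y$ and $\hat{\sigma}_Z$ should be recovered approximately:

# Prediction intervals from formula (4) for the post-crisis data
x.post <- gdp[218:239]             # data points after the 2008 drop
y      <- diff(x.post)             # differenced series Yt
muY    <- mean(y)                  # approx. 3787.9 in the report
sigZ   <- sd(y)                    # approx. 3247.0 in the report
r      <- 1:10
centre <- tail(x.post, 1) + r * muY
cbind(lower = centre - 1.96 * sqrt(r) * sigZ,
      point = centre,
      upper = centre + 1.96 * sqrt(r) * sigZ)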

Comparing Figure 14 with Figure 11, we might prefer the Box-Jenkins procedure even though the first confidence interval from the Box-Jenkins procedure has length 12110.2 whereas the lpacf method gives an interval of length 9754.6. This suggests that, for a one step ahead prediction, the lpacf method is the more efficient method of prediction. However, the Box-Jenkins method yields a smaller confidence interval for all of the other predictions and has a more reasonable structure to its resultant prediction intervals, which indicates that the Box-Jenkins method is more appropriate in this case.

Note that it would not be fair to compare these methods' general performance from this analysis, because the lpacf method is restricted to modelling the second differences, which stretches the prediction intervals. Also, we cannot compare the performance of these methods directly by applying lpacf to the series of GDP data after the crisis, because that series is stationary but has a non-zero mean, which violates the assumptions of the lpacf method.


Figure 14: Shortened series of GDP extended with predictions from the Box-Jenkins method, where the dotted lines indicate a roughly 95% confidence interval for these predictions

To conclude, the Box-Jenkins approach yields a more reliable and useful result than the other methods considered, so the predictions we would have the most faith in are shown in Figure 14.


References

Bartlett, M. S. (1937) Properties of sufficiency and statistical tests, Journal of the Royal Statistical Society, Series A (Statistics in Society), 160, 268–282.

Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 289–300.

Chatfield, C. (1989) The Analysis of Time Series: An Introduction, Chapman and Hall, London, fourth edition.

Constantine, W. L. B. and Percival, D. B. (2014) fractal: Fractal time series modeling and analysis, R package version 2.0-0, url: http://cran.r-project.org/web/packages/fractal/fractal.pdf.

Killick, R. (2014) lpacf: R documentation, R package version 1.3, url: http://127.0.0.1:31024/library/lpacf/html/lpacf.html.

Nason, G. (2013) A test for second-order stationarity and approximate confidence intervals for localized auto-covariances for locally stationary time series, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75, 879–904.

Nason, G. (2014) locits: Test of stationarity and localised autocovariance, R package version 1.4, url: http://cran.r-project.org/web/packages/locits/locits.pdf.

Nason, G., Knight, M., and Eckley, I. (2014) Forecasting locally stationary time series, url: http://forecasters.org/wp/wp-content/uploads/gravity forms/7-2a51b93047891f1ec3608bdbd77ca58d/2014/07/Killick RebeccaISF2014.pdf.

R Core Team (2014) R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, url: http://www.R-project.org/.

Shaffer, J. P. (1995) Multiple hypothesis testing, Annual Review of Psychology, 46, 561–584.

Shapiro, S. and Wilk, M. (1965) An analysis of variance test for normality (complete samples), Biometrika, 52, 591–611.

The Office of National Statistics (2014) Gross domestic product by category of income, url:http://www.ons.gov.uk/ons/datasets-and-tables/data-selector.html?cdid=YBHA&dataset=qna&table-id=D.
