
Journal of Forecasting, Vol. 12, 365-378 (1993)

Forecasting with Generalized Bayesian Vector Autoregressions

K. RAO KADIYALA Purdue University, West Lafayette, IN 47907, U.S.A.

and

SUNE KARLSSON Stockholm School of Economics, Stockholm, Sweden

ABSTRACT The effects of using different distributions to parameterize the prior beliefs in a Bayesian analysis of vector autoregressions are studied. The well-known Minnesota prior of Litterman as well as four less restrictive distributions are considered. Two of these prior distributions are new to vector autoregressive models. When the forecasting performance of the different parameterizations of the prior beliefs is compared, it is found that the prior distributions that allow for dependencies between the equations of the VAR give rise to better forecasts.

KEY WORDS Diffuse prior ENC prior Normal-Diffuse prior Normal-Wishart prior Minnesota prior Monte Carlo integration Multivariate time series

The use of Vector Autoregressive (VAR) models in applied economics has increased significantly following the criticism of the Cowles Commission approach to the modelling of systems of simultaneous equations (see, for example, Sims, 1980). There has been a shift from the modelling of economic systems with structural equations towards modelling the joint time-series behaviour of the variables.

The frequent use of VARs for modelling the time-series behaviour can partly be explained by their relative ease of use as compared to the richer class of vector ARMA models. The main advantage of VAR models lies in the identification stage, especially if all variables are taken to enter with identical lags. The estimation problem is also particularly simple in the case of identical lags. In the classical analysis, OLS is efficient under the usual assumptions and the Bayesian case is also considerably more straightforward.

The main disadvantage of the VAR models is the large number of parameters that need to be estimated. With the sample sizes common in economics the classical estimation procedures often run into degrees-of-freedom problems. In a Bayesian framework, on the other hand, sharp posteriors can often be obtained even with relatively uninformative prior distributions. As a consequence, VAR modelling is one of the areas within applied economics where Bayesian methods have been most extensively used.


The Bayesian procedure of Litterman (1980), with the so-called 'Minnesota prior', is the most frequently used method, but the natural conjugate Normal-Wishart prior (Litterman, 1980; Broemeling, 1985) and the diffuse prior (Geweke, 1988) have also seen some use.

The Litterman procedure does not take account of dependencies between the equations and is of a univariate nature. The other two priors, on the other hand, allow for interaction and dependencies between the equations.

In most situations any prior concepts that the researcher may have about the parameter values in the VAR are, at best, expressed in terms of the first and second moments and perhaps the support of the prior distribution. It is, in most cases, very hard to form a prior belief concerning the dependencies between parameters in different equations as well as in the same equation. It is nonetheless important that such dependencies are allowed for and that we ‘let the data speak’ on this point.

In this paper the effects of various methods of parameterizing the prior beliefs are compared by way of their forecast performance. In order to make the comparisons richer, two additional priors not previously utilized in a VAR setting, the Normal-Diffuse and the ENC priors, are introduced.

In the comparisons of forecasting performance we do not consider the issue of model selection. Two models and data sets available in the literature are simply taken as given for the purpose of the forecasting exercises. The issue of model selection in VAR-based forecasting models is studied by Kling and Bessler (1985) and Edlund and Karlsson (1993).

The rest of the paper is organized as follows. The distributions used to express the prior beliefs are introduced together with the corresponding posterior distributions in the next section. The two forecasting approaches that are considered are discussed in the third section. The numerical methods used to evaluate the posterior distributions are reviewed in the fourth section. The fifth section presents the results of three forecasting experiments and conclusions are presented in the final section.

BAYESIAN ANALYSIS OF VARs

Let $y_t$ be the row vector of $q$ variables of interest observed at time $t$ and $x_t$ a row vector of $r$ exogenous variables that might influence $y_t$. The VAR can then be written as

$$y_t = \sum_{i=1}^{p} y_{t-i} A_i + x_t C + u_t \qquad (1)$$

The $A_i$'s and $C$ are matrices of unknown parameters, of dimensions $q \times q$ and $r \times q$, respectively. Throughout it will be assumed that the vector of disturbances, $u_t$, is distributed as multivariate normal with mean vector $0$ and variance-covariance matrix $\Psi$, and that $u_t$ is independent of $u_s$ for all $t \neq s$.

Prior beliefs

The prior beliefs used in the comparison are essentially those embedded in the Minnesota prior of Litterman (1980). A finite distributed lag structure is assumed to describe the time-series behaviour of each of the variables.

These prior beliefs are made operational by specifying the prior moments as follows. The prior means of the regression parameters are set to zero except for the first own lag, which has a prior mean of unity. The prior parameter variances decrease with the lag length, making the prior tighter around zero.


The regression parameters on the exogenous variables have a large prior variance, making the prior relatively uninformative on these parameters. The prior covariances are set to zero for simplicity. Finally, the variances are scaled to account for differing variability in the variables. See also Litterman (1986a) for a discussion and motivation of these prior beliefs.

The tightness of the prior distribution, that is, the magnitude of the prior variances, is determined by the three hyperparameters $\pi_1$, $\pi_2$, and $\pi_3$, where $\pi_1$ is the tightness on own lags, $\pi_2$ is the tightness on lags of other variables, and $\pi_3$ is the tightness on the exogenous variables. In each case the prior variance of the parameter is proportional to the relevant hyperparameter.

The prior-posterior pairs

For the technical discussion of the prior and posterior distributions the following notation is needed. Write equation (1) as $y_t = z_t \Gamma + u_t$, where $z_t = (x_t, y_{t-1}, \ldots, y_{t-p})$ and $\Gamma = (C', A_1', \ldots, A_p')'$. Performing the conventional stacking of the row vectors $y_t$, $z_t$, and $u_t$ for $t = 1, \ldots, T$ into $Y$, $Z$, and $U$, we have $Y = Z\Gamma + U$. Then, letting the subscript $i$ denote the $i$th column vector, we can express the equation for each variable as $y_i = Z\gamma_i + u_i$. For $y$, $\gamma$, and $u$ the vectors obtained by stacking the columns of $Y$, $\Gamma$, and $U$, the system can be written as $y = (I \otimes Z)\gamma + u$. Note that $u \sim N(0, \Psi \otimes I)$. In addition, the tilde denotes parameters of the prior distribution, the bar denotes parameters of the posterior distribution, the OLS estimates of $\Gamma$ and $\gamma$ are denoted by $\hat{\Gamma}$ and $\hat{\gamma}$, respectively, and $s_i^2$ is the residual variance of a $p$-lag univariate autoregression on variable $i$.

The Minnesota prior

This is the prior distribution advocated by Litterman (1986a). Each equation is treated separately and the prior beliefs are parameterized as $\gamma_i \sim N(\tilde{\gamma}_i, \tilde{\Sigma}_i)$ with the residual variance, $\psi_{ii}$, fixed at the value $0.81 s_i^2$. The prior variance matrix of the parameters, $\tilde{\Sigma}_i$, is assumed diagonal with diagonal elements

$$\begin{cases} 0.81\, \pi_3 s_i^2 & \text{for coefficients on exogenous variables} \\ \pi_2 s_i^2 / (k s_j^2) & \text{for coefficients on lag } k \text{ of variable } j \neq i \\ \pi_1 / k & \text{for coefficients on own lags} \end{cases} \qquad (2)$$

where $k$ denotes the lag length. With normally distributed data this yields the normal posterior distribution $\gamma_i \sim N(\bar{\gamma}_i, \bar{\Sigma}_i)$, with

$$\bar{\Sigma}_i = (\tilde{\Sigma}_i^{-1} + \psi_{ii}^{-1} Z'Z)^{-1} \quad \text{and} \quad \bar{\gamma}_i = \bar{\Sigma}_i (\tilde{\Sigma}_i^{-1} \tilde{\gamma}_i + \psi_{ii}^{-1} Z' y_i)$$

Note that this is equivalent to a multivariate analysis where the prior distribution of the vectors $\gamma_i$ is jointly normal with a block-diagonal variance-covariance matrix and fixed and diagonal residual variance-covariance matrix.
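To make this prior-posterior pair concrete, here is a minimal sketch of the computation for a single equation (Python with NumPy; the paper publishes no code, so the function and variable names are ours for illustration):

```python
import numpy as np

def minnesota_posterior(Z, y_i, gamma_prior, Sigma_prior, psi_ii):
    """Posterior mean and variance for one equation under the Minnesota prior.

    Z           : T x k matrix of right-hand-side variables
    y_i         : length-T vector of observations on variable i
    gamma_prior : length-k prior mean (unity on the first own lag, zero elsewhere)
    Sigma_prior : k x k diagonal prior variance matrix, equation (2)
    psi_ii      : fixed residual variance, 0.81 * s_i**2
    """
    Sigma_prior_inv = np.linalg.inv(Sigma_prior)
    # Posterior variance: (Sigma~_i^-1 + psi_ii^-1 Z'Z)^-1
    Sigma_post = np.linalg.inv(Sigma_prior_inv + Z.T @ Z / psi_ii)
    # Posterior mean: Sigma_bar_i (Sigma~_i^-1 gamma~_i + psi_ii^-1 Z'y_i)
    gamma_post = Sigma_post @ (Sigma_prior_inv @ gamma_prior + Z.T @ y_i / psi_ii)
    return gamma_post, Sigma_post
```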

The researcher using the Minnesota prior is thus claiming knowledge about the residual variance-covariance matrix, while at the same time admitting less than perfect knowledge about the regression parameters in the VAR. We find this to be strange since, in general, it is easier to form an opinion about the regression parameters than about the residual variance-covariance matrix. The assumption that the variance-covariance matrix is known can, to some extent, be justified by the practice of obtaining the diagonal elements from the data.


The diagonality assumption is more troublesome, since it is rarely supported by the data. In fact, much of the current controversy over the interpretation of VAR models is centred on the issue of how to achieve a diagonal variance-covariance matrix by transformations of the system of equations (1).

In the remainder of this section we introduce four families of prior distributions which allow for non-diagonal residual variance-covariance matrices. In effect, this takes us from a limited-information analysis to a full-information analysis of the system of equations (1) (see Drèze and Richard, 1983, for an overview of these issues in the context of simultaneous-equation systems). In the case of a VAR, the analysis is, however, more straightforward since there are no right-hand-side endogenous variables and no a priori restrictions on the parameters.

The Normal-Wishart prior¹

The natural conjugate prior for normal data is the Normal-Wishart, where the vector of regression parameters, $\gamma$, is normally distributed conditional on the residual variance-covariance matrix $\Psi$, and $\Psi$ has an inverted Wishart distribution with $\alpha$ degrees of freedom:

$$\gamma \mid \Psi \sim N(\tilde{\gamma}, \Psi \otimes \tilde{\Omega}), \qquad \Psi \sim IW(\tilde{\Psi}, \alpha)$$

Integrating $\Psi$ out of the joint prior, we have the marginal prior distribution of the $(pq + r) \times q$ matrix $\Gamma$ as

$$p(\Gamma) \propto \left| \tilde{\Psi} + (\Gamma - \tilde{\Gamma})' \tilde{\Omega}^{-1} (\Gamma - \tilde{\Gamma}) \right|^{-(\alpha + pq + r)/2}$$

a matricvariate t-distribution with $\alpha$ degrees of freedom, $\Gamma \sim MT(\tilde{\Omega}^{-1}, \tilde{\Psi}, \tilde{\Gamma}, \alpha)$, with prior mean $E(\gamma) = \tilde{\gamma}$, $\alpha > q$, and variance $\operatorname{Var}(\gamma) = (\alpha - q - 1)^{-1} \tilde{\Psi} \otimes \tilde{\Omega}$, $\alpha > q + 1$. The posterior distribution is given by

$$\gamma \mid \Psi, y \sim N(\bar{\gamma}, \Psi \otimes \bar{\Omega}), \qquad \Psi \mid y \sim IW(\bar{\Psi}, T + \alpha)$$

where $T$ is the number of observations, $\bar{\Omega} = (\tilde{\Omega}^{-1} + Z'Z)^{-1}$, $\bar{\Gamma} = \bar{\Omega}(\tilde{\Omega}^{-1}\tilde{\Gamma} + Z'Z\hat{\Gamma})$, and

$$\bar{\Psi} = \tilde{\Psi} + (Y - Z\hat{\Gamma})'(Y - Z\hat{\Gamma}) + \hat{\Gamma}' Z'Z \hat{\Gamma} + \tilde{\Gamma}' \tilde{\Omega}^{-1} \tilde{\Gamma} - \bar{\Gamma}'(\tilde{\Omega}^{-1} + Z'Z)\bar{\Gamma}$$

The marginal posterior distribution of $\Gamma$ is, of course, matricvariate t, $\Gamma \sim MT(\bar{\Omega}^{-1}, \bar{\Psi}, \bar{\Gamma}, T + \alpha)$. See Zellner (1971) or Drèze and Richard (1983) for a derivation of these results.
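The posterior parameters are cheap to compute; a sketch of the calculations (our own rendering of the formulas above, with illustrative names):

```python
import numpy as np

def normal_wishart_posterior(Y, Z, Gamma_prior, Omega_prior, Psi_prior, alpha):
    """Posterior parameters under the Normal-Wishart prior.
    Y: T x q, Z: T x k (k = pq + r), Gamma_prior: k x q,
    Omega_prior: k x k, Psi_prior: q x q, alpha: prior degrees of freedom."""
    T = Y.shape[0]
    Omega_prior_inv = np.linalg.inv(Omega_prior)
    Gamma_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]        # OLS estimate
    Omega_post = np.linalg.inv(Omega_prior_inv + Z.T @ Z)
    Gamma_post = Omega_post @ (Omega_prior_inv @ Gamma_prior + Z.T @ Z @ Gamma_hat)
    resid = Y - Z @ Gamma_hat
    Psi_post = (Psi_prior + resid.T @ resid
                + Gamma_hat.T @ Z.T @ Z @ Gamma_hat
                + Gamma_prior.T @ Omega_prior_inv @ Gamma_prior
                - Gamma_post.T @ (Omega_prior_inv + Z.T @ Z) @ Gamma_post)
    return Gamma_post, Omega_post, Psi_post, T + alpha      # posterior dof
```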

The two main shortcomings of the Minnesota prior, the forced independence between equations and the fixed residual variance-covariance matrix, are not present here. On the other hand, the structure of the variance-covariance matrix of $\gamma$ forces us to treat all equations symmetrically.

The prior beliefs are specified as follows. The diagonal elements of $\tilde{\Psi}$ are set to

$$\tilde{\psi}_{ii} = 0.81(\alpha - q - 1) s_i^2 \qquad (3)$$

and the off-diagonal elements are set to zero. The prior marginal expectation of the residual variance-covariance matrix thus coincides with the fixed variance-covariance matrix of the Minnesota prior. The diagonal elements of $\tilde{\Omega}$ are then chosen so that the prior variance of $\gamma$ is $0.81\, \pi_3 s_i^2$ for coefficients on exogenous variables and $(\pi_2 s_i^2)/(k s_j^2)$ for the coefficient on lag $k$ of variable $j$, dependent variable $i$. Note that the restricted structure of the variance-covariance matrix of $\gamma$ forces us to treat all variables in the same way. As a consequence, the prior variances for own lags differ from the Minnesota prior.

¹ Litterman (1980) considered this prior as well as the Minnesota prior but rejected the use of the Normal-Wishart prior on the grounds that it is computationally inconvenient. The calculation of the parameters of the posterior distribution is, however, only marginally more complicated than for the Minnesota prior.


The $i$th moment of $\gamma$ exists if $\nu - q \geq i$, where $\nu$ is the degrees of freedom of the marginal matricvariate t-distribution. For forecasts $m$ periods ahead the $m$th posterior moment is needed. In addition, the moment of order $2m$ is required for the variance of the forecast. The prior is specified in terms of means and variances of the regression parameters, so the prior second moment is needed. That is, $\alpha - q \geq 2$ is needed for specifying the prior and $T + \alpha - q \geq 2m$ is required for forecasting. The prior degrees of freedom are consequently set to

$$\alpha = \max(q + 2,\; q + 2m - T) \qquad (4)$$

The diffuse (Jeffreys') prior

The diffuse prior, proposed by Geisser (1965) and Tiao and Zellner (1964), is closely related to the Normal-Wishart in the sense that the posterior distribution is of the same form. Consequently, the posterior variance-covariance matrix of $\gamma$ suffers from the same restrictions as with the Normal-Wishart prior. Reflecting ignorance about the regression parameters, we have the prior distribution

$$p(\gamma, \Psi) \propto |\Psi|^{-(q+1)/2}$$

The posterior distribution is obtained as

$$\gamma \mid \Psi, y \sim N(\hat{\gamma}, \Psi \otimes (Z'Z)^{-1}), \qquad \Psi \mid y \sim IW\big((Y - Z\hat{\Gamma})'(Y - Z\hat{\Gamma}),\; T - pq - r\big)$$

with the marginal posterior distribution of $\Gamma$ as matricvariate t,

$$\Gamma \sim MT\big(Z'Z,\; (Y - Z\hat{\Gamma})'(Y - Z\hat{\Gamma}),\; \hat{\Gamma},\; T - pq - r\big)$$

Note that we have $E(\Gamma \mid y) = \hat{\Gamma}$ and $\operatorname{Var}(\gamma \mid y) = (T - pq - r - q - 1)^{-1}(Y - Z\hat{\Gamma})'(Y - Z\hat{\Gamma}) \otimes (Z'Z)^{-1}$. That is, the posterior moments are similar to what we obtain from Zellner's classical SUR estimator.
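Because this posterior is normal conditional on $\Psi$ and inverted Wishart in $\Psi$, direct simulation is straightforward; the Monte Carlo forecasting described below exploits this for the Diffuse and Normal-Wishart posteriors. A sketch, assuming SciPy's inverse-Wishart degrees-of-freedom convention matches the one used here (worth checking in any replication):

```python
import numpy as np
from scipy.stats import invwishart

def draw_gamma_diffuse(rng, Y, Z, n_draws):
    """Draws of Gamma from the posterior under the diffuse prior:
    Psi | y ~ IW(S, T - k), then Gamma | Psi, y ~ N(Gamma_hat, Psi (x) (Z'Z)^-1)."""
    T, q = Y.shape
    k = Z.shape[1]                                   # k = pq + r
    Gamma_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
    resid = Y - Z @ Gamma_hat
    S = resid.T @ resid
    L_omega = np.linalg.cholesky(np.linalg.inv(Z.T @ Z))
    draws = []
    for _ in range(n_draws):
        Psi = invwishart.rvs(df=T - k, scale=S, random_state=rng)
        E = rng.standard_normal((k, q))
        # Gamma = Gamma_hat + chol(Omega) E chol(Psi)' gives vec(Gamma) the
        # required Kronecker covariance Psi (x) (Z'Z)^-1
        draws.append(Gamma_hat + L_omega @ E @ np.linalg.cholesky(Psi).T)
    return draws
```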

The Normal-Diffuse prior

This prior avoids the restrictions on the variance-covariance matrix of $\gamma$ and still allows for a non-diagonal residual variance-covariance matrix. The multivariate normal prior on the regression parameters of the Minnesota prior is combined with the diffuse prior on the residual variance-covariance matrix. That is, we have prior independence between $\gamma$ and $\Psi$ with

$$\gamma \sim N(\tilde{\gamma}, \tilde{\Sigma}), \qquad p(\Psi) \propto |\Psi|^{-(q+1)/2}$$

This also allows the researcher to specify prior dependence between parameters of different equations. The marginal posterior of the parameters is proportional to the product of the marginal prior distribution and a matricvariate t-distribution. The matricvariate t-factor will give rise to posterior dependence between the equations even if the prior is specified with a diagonal or block-diagonal variance-covariance matrix. The matricvariate t-factor is identical to the posterior associated with the Diffuse prior. The form of the marginal posterior is troublesome in the sense that large differences between the information contained in the prior and in the likelihood function might cause the posterior to be bimodal, and thus the posterior mean to have low posterior probability. In the forecasting exercises the diagonal elements of $\tilde{\Sigma}$ are set as in equation (2) and the off-diagonal elements are set to zero.
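The log kernel of this marginal posterior is nonetheless cheap to evaluate, which is what the importance weights of the Monte Carlo section require. A sketch in our own notation, using the Diffuse-posterior factor $|(Y - Z\Gamma)'(Y - Z\Gamma)|^{-T/2}$ noted above:

```python
import numpy as np

def log_posterior_kernel(gamma, Y, Z, gamma_prior, Sigma_prior_inv):
    """Log kernel (constants dropped) of the marginal Normal-Diffuse
    posterior: normal prior times the matricvariate-t factor."""
    T, q = Y.shape
    Gamma = gamma.reshape((-1, q), order="F")   # unstack vec(Gamma)
    dev = gamma - gamma_prior
    log_prior = -0.5 * dev @ Sigma_prior_inv @ dev
    resid = Y - Z @ Gamma
    _, logdet = np.linalg.slogdet(resid.T @ resid)
    return log_prior - 0.5 * T * logdet
```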

The Extended Natural Conjugate prior

The Extended Natural Conjugate (ENC) prior overcomes the restrictions on $\operatorname{Var}(\gamma)$ of the Normal-Wishart prior by reparameterizing the VAR in equation (1). Let $A$ be a $q(pq + r) \times q$ matrix with the columns $\gamma_i$ on the diagonal and all other elements zero, that is,

$$A = \begin{pmatrix} \gamma_1 & 0 & \cdots & 0 \\ 0 & \gamma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \gamma_q \end{pmatrix}$$

Also let $\tilde{Z} = \iota' \otimes Z$, where $\iota$ is a $q \times 1$ vector of ones. Equation (1) can then be rewritten as $Y = \tilde{Z}A + U$. For the prior distribution

$$p(A) \propto \left| \tilde{\Psi} + (A - \tilde{A})' \tilde{M} (A - \tilde{A}) \right|^{-\alpha/2}, \qquad \Psi \mid A \sim IW\big(\tilde{\Psi} + (A - \tilde{A})' \tilde{M} (A - \tilde{A}),\; \alpha\big)$$

and normal data, the posterior distribution for this prior is given by Drèze and Morales (1976) as

$$p(A \mid y) \propto \left| \bar{\Psi} + (A - \bar{A})' \bar{M} (A - \bar{A}) \right|^{-(T + \alpha)/2}, \qquad \Psi \mid A, y \sim IW\big(\bar{\Psi} + (A - \bar{A})' \bar{M} (A - \bar{A}),\; T + \alpha\big)$$

where $\bar{M} = \tilde{M} + \tilde{Z}'\tilde{Z}$, $\bar{\Psi} = \tilde{\Psi} + \tilde{A}'\tilde{M}\tilde{A} + Y'Y - \bar{A}'\bar{M}\bar{A}$, and $\bar{A}$ is the solution to $\bar{M}\bar{A} = \tilde{M}\tilde{A} + \tilde{Z}'Y$. If $\tilde{M}$ is of full rank, $\bar{M}$ will be of full rank and $\bar{A}$ is unique.

The marginal distribution of $A$ has the form of a matricvariate t density. However, due to the restricted structure of $A$ it is not matricvariate t.

When parameterizing the prior beliefs, the following fact (Drèze and Richard, 1983) is used: if $\tilde{\Psi}$ is diagonal and $\tilde{M}$ is block diagonal with the same structure as $A$, then the prior distribution of $\gamma$ factors into independent multivariate t priors with $\alpha - pq - r$ degrees of freedom for the parameters of each equation.

The diagonal elements of $\tilde{\Psi}$ are set as in equation (3) and the off-diagonal elements are set to zero. The prior expectation of $\Psi$ conditional on $A = \tilde{A}$ then coincides with the fixed variance-covariance matrix of the Minnesota prior. Given the factorization into independent multivariate t priors, the diagonal elements of the blocks $\tilde{M}_{ii}$ are chosen such that the prior variances of $\gamma_i$ are as in equation (2), with the off-diagonal elements set to zero. A sufficient condition for the existence of the $i$th moment of $\gamma$ is given by Drèze and Richard (1983) as $\alpha - q - pq - r \geq i$. Consequently, the prior degrees of freedom are set to $\alpha = \max(2 + q + pq + r,\; 2m + q + pq + r - T)$.

The main disadvantage of the Normal-Diffuse and ENC priors is that no closed-form solutions exist for the posterior expectation and variance of $\gamma$; they must be evaluated numerically.

FORECASTS

In applications of the Minnesota prior, forecasts are generated using the posterior parameter means and the chain rule of forecasting. The implied loss function behind this approach is the sum of the squared errors for the parameters of the VAR. The chain rule will also be used for the OLS forecasts that are generated for comparative purposes.

For the other posterior distributions presented above, forecasts will be generated using a procedure suggested by Chow (1973). The loss function here is the sum of the squared forecast errors. Consequently the posterior risk is minimized by setting the forecast to the posterior expectation of the forecasted variables.


Rewriting equation (1) as a first-order system $y_t^* = y_{t-1}^* A^* + x_t D + u_t^*$, where $y_t^* = (y_t, \ldots, y_{t-p+1})$, $D = (C, 0, \ldots, 0)$, $u_t^* = (u_t, 0, \ldots, 0)$, and

$$A^* = \begin{pmatrix} A_1 & I & 0 & \cdots & 0 \\ A_2 & 0 & I & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ A_{p-1} & 0 & 0 & \cdots & I \\ A_p & 0 & 0 & \cdots & 0 \end{pmatrix}$$

the forecast $h$ periods ahead, conditional on $\gamma$, is given by

$$y_t^*(h) \mid \gamma = y_t^* A^{*h} + \sum_{i=0}^{h-1} x_{t+h-i} D A^{*i}$$

The expectation under the posterior distribution is then obtained as the integral

$$E[y_t^*(h) \mid y] = \int y_t^*(h \mid \gamma)\, p(\gamma \mid y)\, d\gamma \qquad (5)$$

Since no closed forms are readily available for the integral in equation (5), it is evaluated numerically.
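Equivalently, the conditional forecast can be computed by iterating equation (1) directly with the future disturbances set to their mean of zero, which avoids forming $A^*$ explicitly. A minimal sketch (Python/NumPy; the names are ours):

```python
import numpy as np

def chain_rule_forecast(y_hist, x_future, A_list, C):
    """Iterate y_t = sum_i y_{t-i} A_i + x_t C with zero disturbances.
    y_hist  : p x q array, rows y_{t-p+1}, ..., y_t (oldest first)
    x_future: h x r array, rows x_{t+1}, ..., x_{t+h} (assumed known)
    A_list  : [A_1, ..., A_p], each q x q; C: r x q."""
    p = len(A_list)
    lags = [y_hist[-i] for i in range(1, p + 1)]    # y_t, y_{t-1}, ...
    forecasts = []
    for x in x_future:
        y_next = x @ C + sum(lag @ A for lag, A in zip(lags, A_list))
        forecasts.append(y_next)
        lags = [y_next] + lags[:-1]                 # shift the lag state
    return np.vstack(forecasts)
```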

MONTE CARLO INTEGRATION

Following Kloek and van Dijk (1978), we have chosen to evaluate equation (5) using Monte Carlo integration instead of standard numerical integration techniques.² Standard numerical integration is relatively inefficient when the integral has a high dimensionality, and we expect to achieve greater precision with a lower computational effort using Monte Carlo integration. An additional benefit is that with Monte Carlo integration probabilistic error bounds are readily available.

For the priors that yield a matricvariate t posterior the procedure is straightforward, since draws can be generated directly from the matricvariate t-distribution and the integral in equation (5) is estimated as the sample mean of the forecasts. For the other posteriors importance sampling is used; that is, the posterior distribution is approximated by a distribution from which it is known how to generate random numbers. The forecasts calculated from each draw are then weighted by the ratio of the kernel of the posterior and the kernel of the importance function at the draw, and the integral is estimated by the weighted mean.
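In outline, the importance-sampling estimator of equation (5) looks as follows. The helper functions are hypothetical stand-ins for the pieces described in this section: a sampler returning a draw from the importance density together with its antithetic mirror image, the log of the posterior-to-importance kernel ratio, and the chain-rule forecast for a parameter draw.

```python
import numpy as np

def is_forecast_mean(rng, draw_pair, log_weight, forecast_fn, n_pairs=5000):
    """Weighted-mean estimate of the posterior expected forecast."""
    weighted_sum, weight_sum = 0.0, 0.0
    for _ in range(n_pairs):
        for gamma in draw_pair(rng):        # the draw and its antithesis
            w = np.exp(log_weight(gamma))
            weighted_sum = weighted_sum + w * forecast_fn(gamma)
            weight_sum += w
    return weighted_sum / weight_sum
```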

For the ENC posterior the importance function is constructed according to a suggestion of Bauwens (1984). The importance function is the product of independent multivariate t-distributions, each approximating the marginal distribution of the parameters of one of the equations. For the Normal-Diffuse posterior, the 2-0 poly-t distribution (Drèze, 1977) is used as the importance function.

Antithetic variates are used in all cases to further reduce the sampling error (see Geweke, 1988, for an example of the effect of antithetic variates in applications like these). Karlsson (1989) gives a full account of the methods used here.

² Other alternatives are the Gibbs sampler and the Laplace approximation. For the purpose of this paper we have, however, only considered the use of Monte Carlo integration.

FORECASTING EXPERIMENTS

Three forecasting experiments were conducted in order to assess the performance of the five methods of parameterizing the prior beliefs. The experiments are selected to reflect real-life situations in which VARs are used. In the experiments part of the data are set aside and the models are fitted to the rest of the data. Forecasts are then generated for the time periods that were set aside. The hyperparameters $\pi_1$, $\pi_2$, and $\pi_3$ were somewhat arbitrarily set to the values in Table I. They are close to the values that Litterman (1986b) and Doan et al. (1984) found to work well for the Minnesota prior.



[Figure 1: Canadian money and GNP data]

Table I. Hyperparameters of the prior distributions

Parameter                 Canadian data,   Canadian data,   US wheat
                          small sample     large sample     export data
π1                        0.07             0.07             0.07
π2                        0.007            0.007            0.007
π3                        1.4·10⁵          1.4·10⁵          1.4·10⁵
α (degrees of freedom)ᵃ   9                4                6
                          21               16               19

ᵃ First row is the Normal-Wishart prior and the second is the ENC prior.


The first two experiments use data on Canadian money and GNP from the first quarter of 1955 to the last quarter of 1977. As is evident from Figure 1, this data set displays very little variation around the trend and should be relatively easy to forecast. The data were obtained from Hsiao (1979) and have also been analysed by Lütkepohl (1982). As indicated in the introduction, we do not consider the issue of model identification. Instead, the model selected by Lütkepohl for the logarithm of nominal GNP and the logarithm of M2 is used with minor modifications. The Normal-Wishart and Diffuse priors require that we have the same right-hand-side variables in all equations, and the maximum lag length selected by Lütkepohl is used for both variables. In addition, a time trend is included. The right-hand-side variables are thus the same in both equations: five lags of log GNP and log M2, a constant term and a time trend. In both experiments forecasts are made 12 periods (three years) ahead, starting with forecasts made as of the first quarter of 1970.

For the first experiment a subset of the data was used in order to mirror situations where few data are available. Ten sets of forecasts were made; the first set was based on 17 observations and the last set on 26 observations. In this experiment forecasts were not generated from the diffuse posterior since they had no moments for the longer lead times.


Table II. Root mean square error, forecasts of Canadian money and GNP, small sample

Variable  Lead time   OLS        Minnesota   Normal-Wishart   ENC        Normal-Diffuse
Log GNP   1           0.0166     0.0096a     0.0099           0.0098     0.0096a
          2           0.0270b    0.0154      0.0155           0.0153     0.0152a
          3           0.0396b    0.0272      0.0263a          0.0265     0.0266
          4           0.0495     0.0384      0.0371           0.0370a    0.0372
          6           0.0736b    0.0665      0.0640           0.0635a    0.0641
          8           0.1091b    0.1050b     0.1028           0.1023a    0.1028b
          10          0.1456     0.1396b     0.1408           0.1392     0.1388a
          12          0.1831b    0.1696a     0.1772           0.1734     0.1715
          Sum 1-12    1.0870b    0.9854b     0.9860b          0.9749     0.9739a
Log M2    1           0.0231     0.0104      0.0122b          0.0103a    0.0103a
          2           0.0506b    0.0205      0.0241           0.0199a    0.0199a
          3           0.0719b    0.0298      0.0353           0.0289     0.0286a
          4           0.0930b    0.0390      0.0460b          0.0380     0.0371a
          6           0.0981     0.0598      0.0669           0.0598     0.0578a
          8           0.1244b    0.0920a     0.0965           0.0930     0.0922
          10          0.1624     0.1213a     0.1262           0.1254     0.1249
          12          0.1965b    0.1483a     0.1564           0.1579     0.1556
          Sum 1-12    1.3494b    0.8880a     0.9537           0.9083     0.8966

a The best result for each lead time.
b RMSE which differs significantly from the best at the 5% level.

The Root Mean Squared Errors (RMSEs) of these forecasts are in Table II. We also report those instances when an RMSE differs significantly, at the 5% level, from the lowest RMSE for that lead time.

The significance tests reported in Table II use the procedure of Ashley et al. (1980). Note that this procedure tests a sufficient but not necessary condition for the equality of two RMSEs. The test situation is equivalent to testing the joint null of equal variance and squared mean of the forecast errors against the alternative that the differences are both positive. Clearly this differs from testing the necessary condition that the sum of the differences in variance and squared mean is zero. Consequently, the test will fail to reject the null for large differences in the RMSE when, say, the difference in squared mean is significantly negative but partly offset by a large positive difference in variance.
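For replication purposes, a sketch of the Ashley et al. (1980) procedure as we read it: regress the difference of the two forecast-error series on a constant and the mean-adjusted sum; when both estimated coefficients are positive, the joint F-test of their nullity tests the equality of the two MSEs. (The paper applies the test per lead time; this simple version ignores any autocorrelation in multi-step forecast errors.)

```python
import numpy as np
from scipy import stats

def ashley_test(e1, e2):
    """e1: errors of the higher-RMSE method, e2: errors of the other.
    Returns the regression coefficients, the F statistic and its p-value."""
    d = e1 - e2
    s = (e1 + e2) - np.mean(e1 + e2)
    X = np.column_stack([np.ones_like(d), s])
    beta = np.linalg.lstsq(X, d, rcond=None)[0]
    resid = d - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    # Joint F-test of beta = 0; interpretable as a one-sided test of equal
    # MSE only when both coefficient estimates are positive
    F = beta @ X.T @ X @ beta / (k * sigma2)
    return beta, F, stats.f.sf(F, k, n - k)
```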

The ENC and Normal-Diffuse priors do best for log GNP and the Minnesota and Normal-Diffuse priors do best for log M2. The RMSE for OLS is significantly worse than the best RMSE for most of the lead times. It is clear that the introduction of prior information improves the forecasts in this experiment. The Minnesota prior gives the lowest RMSE for several lead times, but in no case does this differ significantly from the RMSE of the more general prior distributions. The Minnesota prior is, on the other hand, significantly worse than the ENC or Normal-Diffuse prior in four cases.

In the second experiment the full data set was used and 20 sets of forecasts were made. The first set of forecasts was based on 56 observations and the last set on 75 observations. RMSEs of the forecasts are in Table III.

In this case, the likelihood function dominates the posteriors and the differences between forecasts are small. We do, however, find that the Normal-Diffuse prior does best for log GNP and that the Diffuse prior does best for log M2.


Table III. Root mean square error, forecasts of Canadian money and GNP, large sample

Variable  Lead time   OLS        Minnesota   Normal-Wishart   ENC        Normal-Diffuse   Diffuse
Log GNP   1           0.0141     0.0132      0.0131a          0.0132     0.0132           0.0141
          2           0.0239     0.0225a     0.0228           0.0229     0.0225a          0.0239
          3           0.0324     0.0307      0.0314           0.0314     0.0306a          0.0324
          4           0.0424b    0.0394      0.0403           0.0398     0.0393a          0.0425
          6           0.0595     0.0548      0.0561           0.0550     0.0546a          0.0600b
          8           0.0776b    0.0718      0.0747           0.0724     0.0716a          0.0783
          10          0.1019b    0.0959      0.1004           0.0958     0.0957a          0.1034b
          12          0.1296     0.1231      0.1293           0.1217a    0.1232           0.1328b
          Sum 1-12    0.8054b    0.7540      0.7825           0.7547     0.7527a          0.8158b
Log M2    1           0.0120     0.0105a     0.0116           0.0112b    0.0105a          0.0120b
          2           0.0238     0.0218      0.0234           0.0225     0.0217a          0.0238b
          3           0.0334     0.0325      0.0344           0.0333     0.0324a          0.0333
          4           0.0433     0.0439      0.0454           0.0441     0.0436           0.0431a
          6           0.0616b    0.0652      0.0661           0.0642b    0.0644           0.0610a
          8           0.0799     0.0855      0.0871           0.0840     0.0842           0.0793a
          10          0.0996     0.1062b     0.1089b          0.1039     0.1042b          0.0993a
          12          0.1252a    0.1322b     0.1358b          0.1275     0.1295           0.1258
          Sum 1-12    0.8033     0.8421      0.8646b          0.8280     0.8294           0.8007a

a,b As Table II.

Figure 2. The US wheat export data (four panels: Shipments, Price, Exchange Rate, Sales; monthly, 1974 to 1985)


Table IV. Root mean square error, forecasts of US wheat export data

Variable  Lead time   OLS        Minnesota   Normal-Wishart   ENC        Normal-Diffuse   Diffuse
SALE      1           0.4380     0.4091      0.4288b          0.4061     0.4067           0.4380
          2           0.5014     0.4989      0.5143           0.4903a    0.4956           0.5015
          3           0.4820a    0.4949      0.4933           0.4881     0.4942           0.4829
          4           0.5071a    0.5125      0.5339b          0.5095     0.5099           0.5081
          6           0.4857     0.4814      0.4925           0.4822b    0.4796a          0.4839
          8           0.4790b    0.4810      0.4953           0.4867     0.4806           0.4769a
          10          0.4838     0.4814      0.4708a          0.4801     0.4824           0.4828
          12          0.4847b    0.4811      0.4768a          0.4804     0.4827           0.4887
          18          0.4964     0.4933      0.4922           0.4908a    0.4946           0.4991
          24          0.5522b    0.5458b     0.5564           0.5424a    0.5477           0.5594b
          Sum 1-24    12.029     11.993      12.117           11.955a    12.099           12.085
EXCH      1           0.0270     0.0268      0.0269           0.0266a    0.0267           0.0270
          2           0.0440     0.0439      0.0440           0.0435a    0.0437           0.0439
          3           0.0612b    0.0618      0.0615b          0.0612     0.0618           0.0610a
          4           0.0771     0.0785      0.0774b          0.0776     0.0786           0.0768a
          6           0.1001     0.1035      0.1009           0.1023     0.1038           0.0996a
          8           0.1142     0.1192      0.1145           0.1171     0.1191           0.1128a
          10          0.1267     0.1328      0.1253           0.1296     0.1318           0.1234a
          12          0.1428     0.1482      0.1377           0.1427     0.1457           0.1370a
          18          0.2185     0.2186      0.2026a          0.2071     0.2107           0.2052
          24          0.2896     0.2825      0.2561a          0.2611     0.2656           0.2639b
          Sum 1-24    3.8190     3.8487      3.5893a          3.6685     3.7328b          3.6183
SHIP      1           0.2060     0.1901      0.1844a          0.1865     0.1891           0.2060b
          2           0.2616     0.2468b     0.2331a          0.2384     0.2455           0.2609
          3           0.3040     0.2922b     0.2739a          0.2801     0.2900b          0.3035
          4           0.3487     0.3311b     0.3143a          0.3154     0.3276b          0.3468
          6           0.3928     0.3673      0.3544b          0.3458a    0.3608b          0.3897
          8           0.3748b    0.3467      0.3320           0.3232a    0.3386b          0.3692b
          10          0.3519b    0.3246b     0.3036           0.3031a    0.3154           0.3429b
          12          0.3393     0.3141      0.2916a          0.2943     0.3048           0.3308
          18          0.3282b    0.3054      0.3011           0.2952a    0.2986b          0.3234b
          24          0.3009b    0.2876b     0.2851           0.2825a    0.2850           0.3006b
          Sum 1-24    7.8956b    7.3723b     7.0953           7.0341a    7.2313b          7.7993
PRICE     1           0.0490b    0.0441      0.0434           0.0424a    0.0438           0.0490b
          2           0.0721     0.0604      0.0592           0.0570a    0.0599           0.0722
          3           0.0772     0.0641      0.0644           0.0618a    0.0636           0.0773
          4           0.0845     0.0730      0.0731           0.0726a    0.0728           0.0850b
          6           0.0883     0.0785      0.0769b          0.0800     0.0783           0.0892
          8           0.1109     0.1016b     0.0988a          0.1015     0.1010b          0.1121b
          10          0.1226b    0.1132b     0.1082a          0.1131     0.1121b          0.1240
          12          0.1252b    0.1184b     0.1091a          0.1182     0.1161           0.1262
          18          0.1724b    0.1691      0.1568a          0.1640     0.1629           0.1712b
          24          0.2025     0.2026      0.1817a          0.1919     0.1905           0.1959b
          Sum 1-24    3.2317     3.0845      2.8872a          3.0151     2.9957           3.2182

a,b As Table II.


For log GNP, the RMSEs of OLS and the Diffuse prior are significantly worse than the best (Normal-Diffuse) for most lead times. For log M2, on the other hand, the Diffuse prior is significantly better than the other methods at several lead times. The evidence on the value of the prior information is thus somewhat mixed in this case. The priors which allow for dependence between the equations tend to do better than OLS and the Minnesota prior, just as for the smaller data set.

The third experiment uses a data set and model from Bessler and Babula (1987) (Figure 2). This data set is considerably noisier than the Canadian data and is thus a tougher test for the forecasting methods. The VAR consists of the following variables: US export shipments of wheat (SHIP), an exchange rate index for the US dollar (EXCH), the dollar price of wheat (PRICE), and US export sales of wheat (SALE). Seasonally adjusted logarithms of monthly data for January 1974 to March 1985 are used. The right-hand-side variables are three lags of each variable and a constant term. For this experiment 30 sets of forecasts 24 periods (two years) ahead were made, with the first set of forecasts made as of October 1980. The RMSEs for these forecasts are in Table IV.

For the wheat export data, the Normal-Wishart and ENC priors do best overall. For the individual variables, the ENC prior does best for the sales variable, the Diffuse and Normal-Diffuse priors do best for the exchange rate variable, the ENC prior does best for the shipment variable, and the Normal-Wishart prior does best for the price variable. The significance tests do not provide a clear picture for the sales and exchange-rate variables. For the shipments and price variables the Normal-Diffuse and ENC priors clearly dominate the other methods. This forecasting experiment thus lends further support to the use of prior information and prior distributions which allow for dependencies between equations.

In addition to the RMSE, the mean absolute percentage error, mean error, and the log determinant of the forecast error variance-covariance matrix were also calculated. These measures are reported in Karlsson (1989) and the pattern is similar to the one displayed by the RMSE.

CONCLUSIONS

Methods for Bayesian analysis of Vector Autoregressions that allow for dependencies between equations are suggested. When evaluated on their forecast performance, several of the methods suggested here do better than the frequently used Minnesota prior. In no case does the Minnesota prior provide forecasts that are significantly better than those of the more general prior distributions.

The forecasting experiments indicate that the ability to take account of dependencies between parameters of different equations is an important factor in determining the forecast performance of a statistical model. Since such dependencies are most certainly not a phenomenon peculiar to the data sets analysed here, the methods suggested in this paper should be of value in many other applications as well.

Inasmuch as the forecast performance is indicative of the goodness of fit of a model, the results obtained here should carry over to situations where forecasting is not the main issue. Specifically, the use of prior specifications that allow for dependence between equations should prove profitable also when inference about the parameters of the VAR, or functions of the parameters such as impulse response functions or variance decompositions, is the key concern. The Normal-Wishart and Diffuse priors are especially suitable in this context, since analytic expressions for the first and second posterior moments of the parameters are available.


ACKNOWLEDGEMENTS

Earlier versions of this paper have been presented at the Joint Statistical Meetings in Washington, DC, 1989, at the 13th Nordic Conference in Mathematical Statistics, Odense, 1989, at FIEF and at Uppsala University. We have benefited from comments by participants in these seminars. In particular, we would like to thank Jerry Thursby, Sheng Hu, John Carlson, Erik Ruist, and P.-O. Edlund. The second author wishes to acknowledge the support of the Royal Swedish Academy of Sciences and the Swedish Research Council for the Humanities and Social Sciences (HSFR).

REFERENCES

Ashley, R., Granger, C. W. J. and Schmalensee, R., 'Advertising and aggregate consumption: an analysis of causality', Econometrica, 48 (1980), 1149-67.

Bauwens, L., Bayesian Full Information Analysis of Simultaneous Equation Models Using Integration by Monte Carlo, Berlin: Springer-Verlag, 1984.

Bessler, D. A. and Babula, R. A., 'Forecasting wheat exports: do exchange rates matter?', Journal of Business & Economic Statistics, 5 (1987), 397-406.

Broemeling, L. D., Bayesian Analysis of Linear Models, New York: Marcel Dekker, 1985.

Chow, G. C., 'Multiperiod predictions from stochastic difference equations by Bayesian methods', Econometrica, 41 (1973), 109-18, 796. Reprinted in Feinberg and Zellner (eds), Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage, Amsterdam: North-Holland, 1975.

Doan, T., Litterman, R. and Sims, C., 'Forecasting and conditional projection using realistic prior distributions' (with discussion), Econometric Reviews, 3 (1984), 1-144.

Drèze, J. H., 'Bayesian regression analysis using poly-t densities', Journal of Econometrics, 6 (1977), 329-54.

Drèze, J. H. and Morales, J.-A., 'Bayesian full information analysis of simultaneous equations', Journal of the American Statistical Association, 71 (1976), 919-23. Reprinted in A. Zellner (ed.), Bayesian Analysis in Econometrics and Statistics, Amsterdam: North-Holland, 1980.

Drèze, J. H. and Richard, J.-F., 'Bayesian analysis of simultaneous equation systems', in Z. Griliches and M. D. Intriligator (eds), Handbook of Econometrics, Vol. I, Amsterdam: North-Holland, 1983.

Edlund, P.-O. and Karlsson, S., 'Forecasting the Swedish unemployment rate: VAR vs. transfer function modelling', International Journal of Forecasting, 9 (1993), forthcoming.

Geisser, S., 'Bayesian estimation in multivariate analysis', Annals of Mathematical Statistics, 36 (1965), 150-59.

Geweke, J., 'Antithetic acceleration of Monte Carlo integration in Bayesian inference', Journal of Econometrics, 38 (1988), 73-89.

Hsiao, C., 'Autoregressive modelling of Canadian money and income data', Journal of the American Statistical Association, 74 (1979), 553-60.

Karlsson, S., Bayesian Analysis of Vector Autoregressions, unpublished doctoral dissertation, Purdue University, 1989.

Kling, J. L. and Bessler, D. A., 'A comparison of multivariate forecasting procedures for economic time series', International Journal of Forecasting, 1 (1985), 5-24.

Kloek, T. and van Dijk, H., 'Bayesian estimates of equation system parameters: an application of integration by Monte Carlo', Econometrica, 46 (1978), 1-19. Reprinted in A. Zellner (ed.), Bayesian Analysis in Econometrics and Statistics, Amsterdam: North-Holland, 1980.

Litterman, R. B., 'A Bayesian procedure for forecasting with vector autoregressions', mimeo, Massachusetts Institute of Technology, 1980.

Litterman, R. B., 'Forecasting with Bayesian vector autoregressions - five years of experience', Journal of Business & Economic Statistics, 4 (1986a), 25-38.

Litterman, R. B., 'Specifying vector autoregressions for macroeconomic forecasting', in P. Goel and A. Zellner (eds), Bayesian Inference and Decision Techniques, Amsterdam: Elsevier Science, 1986b, pp. 79-94.

Lütkepohl, H., 'Differencing multiple time series: another look at Canadian money and income data', Journal of Time Series Analysis, 3 (1982), 235-43.

Sims, C. A., 'Macroeconomics and reality', Econometrica, 48 (1980), 1-48.

Tiao, G. C. and Zellner, A., 'On the Bayesian estimation of multivariate regression', Journal of the Royal Statistical Society, Ser. B, 26 (1964), 389-99.

Zellner, A., An Introduction to Bayesian Inference in Econometrics, New York: John Wiley, 1971.

Authors' biographies: K. Rao Kadiyala is Professor of Economics in the Krannert Graduate School of Management, Purdue University. He received his BSc (Hons) from Andhra University, India, MStat from the Indian Statistical Institute, India, and PhD from the University of Minnesota. He was on the faculties of the Indian Statistical Institute, Wayne State University, the University of Western Ontario, and San José State University.

Sune Karlsson is Assistant Professor of Economic Statistics at the Stockholm School of Economics. He holds a BSc in Statistics from the University of Uppsala and a PhD in Economics from Purdue University.

Authors' addresses: K. Rao Kadiyala, Krannert Graduate School of Management, Purdue University, West Lafayette, IN 47907, USA.

Sune Karlsson, Department of Economic Statistics, Stockholm School of Economics, Box 6501, 113 83 Stockholm, Sweden.