Model and distribution uncertainty in multivariate GARCH estimation: A Monte Carlo analysis

Computational Statistics and Data Analysis 54 (2010) 2786–2800

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda

Model and distribution uncertainty in multivariate GARCH estimation: AMonte Carlo analysisE. Rossi a, F. Spazzini b,∗a Dipartimento di Economia Politica e Metodi Quantitativi, Università di Pavia, Via S.Felice 5, 27100 Pavia, Italyb Dipartimento di Economia, Università degli Studi di Milano, Via Conservatorio 7, 20122 Milano, Italy

a r t i c l e i n f o

Article history:Received 12 October 2008Received in revised form 2 June 2009Accepted 4 June 2009Available online 11 June 2009

a b s t r a c t

Multivariate GARCH models are in principle able to accommodate the features ofthe dynamic conditional covariances; nonetheless the interaction between modelparametrization of the second conditional moment and the conditional density of assetreturns adopted in the estimation determines the fitting of such models to the observeddynamics of the data. Alternative MGARCH specifications and probability distributions arecompared on the basis of forecasting performances by means of Monte Carlo simulations,using both statistical and financial forecasting loss functions.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Predicting the second-order moments in asset returns is important for many issues in financial econometrics and inapplications like asset pricing, portfolio optimization, risk management, etc. The results on multivariate GARCH (MGARCHhereafter) models show that this class of multivariate volatility models is able to accommodate the features of dynamiccorrelation processes. The main drawback is a rapidly increasing number of parameters with the cross-sectional dimensionof the dataset considered.More recently, several models have been proposed that have a fair amount of generality while keeping the number

of parameters at a reasonable level. However, simpler conditional covariance models can be highly misspecified, yieldingseverely distorted forecasts.Moreover, it is well known that the interaction between the covariance parametrization and the conditional density of

asset returns adopted in the quasi-maximum likelihood estimation procedures determines the fitting of such models to thedynamics of the data (see e.g. Fiorentini et al. (2003), Bauwens and Laurent (2005) and Bauwens et al. (2007)).This paper aims to evaluate the interactions, in forecasting volatilities and covolatilities, between conditional second

moment specifications and probability distributions adopted in the likelihood computation. We measure, by means ofMonte Carlo simulations, the relative performances of alternative conditional second moment and probability distributionspecifications using both statistical and financial forecasting loss functions. The Monte Carlo analysis is motivated by thefact that the precision of the ex post evaluation depends on the observability of the true volatilities which are indeedunobservable on themarkets. In this respect, this work can be considered a generalization of Engle (2002) since we considermodels which are misspecified in the conditional volatility, correlation and density components.The Data Generating Process (DGP) is a generalization of the multivariate stochastic volatility model proposed

by Danielsson (1998). It is able to generate dynamic correlations, leverage effects, volatility clustering and asset specific

∗ Corresponding address: Dipartimento di Scienze Economiche Aziendali e Statistiche, Università degli Studi di Milano, Via Conservatorio 7,20122, Milano, Italy. Tel.: +39 0236694509; fax: +39 0236694510.E-mail addresses: [email protected] (E. Rossi), [email protected] (F. Spazzini).

0167-9473/$ – see front matter© 2009 Elsevier B.V. All rights reserved.doi:10.1016/j.csda.2009.06.004

http://www.elsevier.com/locate/csda

http://www.elsevier.com/locate/csda

mailto:[email protected]

mailto:[email protected]

http://dx.doi.org/10.1016/j.csda.2009.06.004

E. Rossi, F. Spazzini / Computational Statistics and Data Analysis 54 (2010) 2786–2800 2787

kurtosis. We chose as DGP a process external to the class of models compared in our analysis for two sets of reasons: first,multivariate stochastic volatility models are coherent with the hypothesis that asset prices in a no-arbitrage and frictionlessmarket follow semi-martingale processes (see e.g. Shephard and Andersen (2008)); second, a DGP which is external to theclass of competing models allows us to fairly compare the alternative forecasts. We limit our analysis to bivariate models,hence ourMonte Carlo experiment does not take into account the problems connected with the curse of dimensionality andfeasibility considerations in general.The models considered are the Dynamic Conditional Correlation (DCC) model, the Asymmetric DCC-EGARCH, and the

BEKK parametrizations estimated using quasi-maximum likelihood with Normal, Student’s t Copula, multivariate Laplace,multivariate t densities and the recently introduced Multiple Degrees of Freedom Student’s t distribution (MDFt) (Serbanet al., 2007).First, for what concerns the results, our exercise shows that explicitly modeling the leverage effects present in the data

improves the forecasting ability only if the leptokurtosis is correctly accounted for. Both DCC specifications, with Gaussiandisturbances, lead to a severe underestimation of the VaR, while allowing for leptokurtic error distributions yields a sensibleenhancement in the risk measurement performances of these models. BEKK models with Normal distributions completelyfail to describe the tail behavior of our data set. On the contrary, with Laplace, multivariate t or MDFt distributions thefraction of VaR violations is a very goodmatch of the correct one. Finally, considering densities which allow for asset specificlevels of kurtosis (e.g Copula t and MDFt) strongly improves the VaR forecasting performances.The paper is structured as follows. The models are introduced in Section 2, while Section 3 reports the log-likelihoods

considered. The Data Generating Process and the evaluation measures are introduced in Section 4; Section 5 reports theresults of the Monte Carlo experiments and Section 6 concludes the paper.

2. The models

We suppose that the N-vector of asset returns rt has a conditional distribution:

rt |=t−1 ∼ D(0,Ωt) t = 1, . . . , T ; (1)

where D is a continuous distribution and =t−1 is the information set at time t − 1 (Et−1[·] ≡ E[·|=t−1], hereafter).Ωt is theconditional variance–covariance matrix, assumed to be time-varying and parametrized as Dynamic Conditional CorrelationGARCH (DCC), Asymmetric DCC EGARCH (ADCC), Scalar BEKK (SBEKK ) and Diagonal BEKK (DBEKK ).We estimate eachmodelusing different densities for the likelihood function. The assumed processes for the conditional covariances are misspecifiedby construction, as explained in Section 4.

2.1. Dynamic Conditional Correlation models

2.1.1. DCC-GARCHIn the Dynamic Conditional Correlation GARCH, introduced by Engle (2002), the conditional variance–covariance matrix

is written as:

Ωt = HtRtHt t = 1, . . . , T ;

where Rt is the conditional correlation matrix, and the standardized returns are defined as:

εt = H−1t rt t = 1, . . . , T ;

where Ht = diagσ1t , . . . , σNtwith

σit = Vart−1[rit ]1/2 i = 1, . . . ,N t = 1, . . . , T ;

where the σ 2it are modeled as univariate GARCH processes. The conditional variance–covariance matrix of εt is theconditional correlation matrix of the asset returns,

Et−1(εtε′t) = H−1t ΩtH−1t = Rt = ρij,t t = 1, . . . , T ;

which is defined as

ρij,t =qij,t

√qii,tqjj,t

,

where qij are modeled as:

qij,t = ρ ij + α(εi,t−1εj,t−1 − ρ ij)+ β(qij,t−1 − ρ ij). (2)

The matrix Qt is positive definite as long as it is a weighted average of positive definite and semidefinite matrices. Inmatrix form:

Qt = (1− A− B)Q + A(εt−1ε′t−1)+ B(Qt−1), (3)

2788 E. Rossi, F. Spazzini / Computational Statistics and Data Analysis 54 (2010) 2786–2800

where Q is estimated by S = 1T

∑t εt ε

′t , and A and B are two scalars. However, Aielli (2006) suggests to modify the standard

DCC in order to correct the asymptotic bias which is due to the fact that 1T∑t εt ε

′t does not converge to Q .

Finally, the DCC model specification is:

σ 2it = ωi + αir2it−1 + βiσ

2it−1, (4)

Qt = (1− A− B)Q + A(εt−1ε′t−1)+ B(Qt−1), (5)

Rt = diagQt−1/2 Qt diagQt−1/2. (6)

2.1.2. Asymmetric DCC-EGARCHThe Asymmetric DCC-EGARCH (Cappiello et al. (2003)) with conditional variances modeled as an EGARCH(1, 0) is:

σ 2it = exp(ωi + αiεit−1 + δi (|εit−1| − E |εit−1|)+ βi log(σ 2it−1)

), (7)

Qt = (Q − AQ − BQ − CN)+ A(εt−1ε′t−1)+ B(Qt−1)+ C(ηt−1η′

t−1), (8)

Rt = diagQt−1/2 Qt diagQt−1/2, (9)

where ηt = 1I(εt < 0) εt ( is the Hadamard product), with A, B and C scalars. The matrix N is estimated byN = 1

T

∑Tt=1 ηt η

′t .

2.2. BEKK parametrization of MGARCH

The BEKK (from the authors: Baba Engle Kraft and Kroner, see Engle and Kroner (1995)) restriction of a MultivariateGARCH(1, 1) model is:

Ωt = CC ′ + Art−1r ′t−1A′+ BΩt−1B′, (10)

where C is a lower triangular matrix. This model guarantees the positive definiteness of the conditional variance matrixΩtunder a set of conditions shown in Engle and Kroner (1995). An obvious restriction is the assumption that the matrices Aand B are either diagonal or scalar.

3. The likelihoods

3.1. DCC - Normal log-likelihood

When D(0,Ωt) is a Multivariate Normal density the log-likelihood of the DCC-GARCH model can be expressed as:

LT (θ) = −12

T∑t=1

(N log(2π)+ log |Ht |2 + r ′tH

−2t rt + ε

′

tR−1t εt − ε

′

tεt + log |Rt |).

The parameter vector θ includes the parameters in the conditional variance processes σ 2it ,ψ , as well as φ, the parametersin the correlation process Rt . The log-likelihood function can be written as the sum of two components. The first is theVolatility component:

LV (ψ) ≡ LV ,T (ψ) = −12

T∑t=1

N∑i=1

(log(2π)+ log(σ 2it )+

r2itσ 2it

). (11)

The second is the Correlation component:

LC (ψ, φ) ≡ LC,T (ψ, φ) = −12

T∑t=1

(ε′tR−1t εt − ε

′

tεt + log |Rt |). (12)

Once we have estimated the returns conditional variances σ 2it we can compute the standardized residuals εit = rit/σit andemploy them in the maximization ofLC .


3.2. DCC-Laplace log-likelihood

An alternative assumption for the conditional distribution of the returns is represented by the asymmetric multivariateLaplace (see Cajigas and Urga (2007) and Kotz et al. (2001)). The Multivariate Laplace is an interesting distribution forfinancial applications since it can accommodate both leptokurtosis and skewness. Moreover it has a closed form densityand it is closed under linear combinations, allowing for a trivial portfolio Value at Risk computation if the assets returns aredistributed according to a Multivariate Laplace law.The conditional asymmetric Laplace multivariate density of the vector rt is given by:

f (rt;m,Ωt) =2 exp(r ′tΩ

−1t m)

(2π)N/2|Ωt |1/2

(r ′tΩ−1t rt

2+m′Ω−1t m

)ν/2Kν

(√(2+m′Ω−1t m)(r ′tΩ

−1t rt)

),

where ν = (2− N)/2 and Kν(u) is the modified Bessel function of the second kind (see Abramowitz and Stegun (1965) fordetails). The vectorm is the location parameter and thematrixΩt is the scale parameter of this distribution. The distributionis unimodal with the mode equal to zero. As a consequence the parameter m also determines the level of asymmetry. Themean of distribution is equal to m while the conditional covariance matrix is Ωt + mm′. When m is zero Et−1(rt) = 0 andVart−1(rt) = Ωt . In this particular case the log-likelihood of DCC models is

LT (θ) =T∑t=1

−12log |Ωt | +

ν

2

(log(r ′tΩ

−1t rt)− log(2)

)+ log Kν

(√2(r ′tΩ

−1t rt)

). (13)

We can now proceed with the maximization of (13) w.r.t. θ . The same log-likelihood is used to estimate BEKK models.

3.3. DCC-Copula log-likelihood

Themultivariate density of rt can be constructed using copulas.We chose to link univariate t distributions via a Student’st Copula (see Serban et al. (2007) for a similar application). The Student copula function is defined as

CRt ,νC (u1, . . . , uN) = TRt ,νC (t−1νC(u1), . . . , t−1νC (uN)),

where ui = Fi(ri) with Fi being the c.d.f. of a univariate t distribution with νi degrees of freedom; TRt ,νC is the c.d.f. of themultivariate t with conditional correlation matrix Rt and common degree of freedom parameter νC , with t−1νC being theinverse of the univariate Student’s t distribution function. The density function for the Student copula is

c(u1, . . . , uN |Rt , νC ) = |Rt |−1/20

(νC+N2

) [0(νC2

)]N (1+ 1νCζ ′R−1t ζ

)−(νC+N)/2[0

(νC+12

)]N0(νC2

) N∏i=1

(1+ ζ 2i

νC

)−(νC+1)/2 , (14)

where ζi = t−1νC (ui) and 0(·) is the Gamma function.Let Θ = (ψ1, . . . , ψN;φ) be the parameters vector to be estimated, where ψi, i = 1, . . . ,N contains the parameters

of the marginal distributions Fi (i.e. the parameters of the processes σ 2it and the degree of freedom parameters νi) and φ isthe vector of the copula parameters, in this case the parameters in the specification of Rt and νC . It follows from the Sklar’sTheorem that the log-likelihood function for the joint conditional distribution D(·,Θ) is given by

LT (Θ) =T∑t=1

log(c(F1(r1,t;ψ1), . . . , FN(rN,t;ψN);φ))+T∑t=1

N∑i=1

log fi(ri,t;ψi), (15)

where c is the copula density, whereas fi are the marginal densities. Hence, the log-likelihood of the joint distribution is justthe sum of the log-likelihoods of the margins and the log-likelihood of the copula:

LT (Θ) =T∑t=1

(lm,t(ψ)+ lc,t(φ)), (16)

where ψ = (ψ1, . . . , ψN)′. The marginal model for each return series is:

rit =

√νi − 2νi

σityit ,

yit ∼ i.i.d. t(0, 1, νi), i = 1, . . . ,N are i.i.d. Student’s t with νi degrees of freedom, with density given by:

fi(rit; νi) =0((νi + 1)/2)

√(νi − 2)π0(νi/2)

σ−1it

(1+

r2it(νi − 2)σ 2it

)−(νi+1)/2νi > 2.


Table 1MAE, Bias and RMSE of the estimated sample correlations (ρM ) of data obtained from a t Copula with correlation ρC . We simulate 10000 samples of 10000observations each for every value of ρC .

ρC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9MAE 0.0218 0.0218 0.0211 0.0201 0.0188 0.0165 0.0135 0.0107 0.0078 0.0041Bias 0.0071 −0.0069 0.0057 0.0043 −0.0012 −0.0018 −0.0005 −0.0004 −0.0012 −0.0015RMSE 0.2841 0.2731 0.1311 0.0837 0.0585 0.0417 0.0282 0.0192 0.0121 0.0058

ρ M –ρ

C

–0.9 –0.8 –0.7 –0.6 –0.5 –0.4 –0.3 –0.2 –0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

–0.1

–0.05

0

0.05

0.1

Fig. 1. Boxplot of ρM − ρC .

The σit are modeled as in Eq. (4) or Eq. (7). The component of the log-likelihood for the tth observation corresponding to themarginal densities is:

lm,t(ψ) =N∑i=1

log0

(νi + 12

)− log0

(νi2

)−12logπ −

12log(νi − 2)

−νi + 12

log(1+

r2it(νi − 2)σ 2it

)−12log σ 2it

, (17)

while that from the Student copula is:

lc,t(Θ) = −12log |Rt | + log0

(νC + N2

)−νC + N2

log(1+

1νCζ ′R−1t ζ

)

+ (N − 1) log[0

(νC2

)]− N log

[0

(νC + 12

)]+νC + 12

N∑i=1

log(1+

ζ 2i

νC

). (18)

The correlation matrix Rt in the Student copula (14) is modeled as in Eqs. (5) and (6) or as in Eqs. (8) and (9).Adopting the Inference For Margins method (see Joe (1997)) we firstly maximize the second part of (15) with respect to

the vector ψ and secondly the first part with respect to φ, that is νC and the parameters in the processes of Rt .A drawback of thismodel is that the copula correlation is calculatedwith the transformed returns instead of the observed

returns. In other words, Eq. (14) shows that the arguments of the Student copula are the transformations ζi = t−1ν (ui). Thus,Rt is the correlation matrix between these transformedmarginals. The actual correlation between returns can be computedusing simulations from the estimated copula.In order to assess the error induced by considering the Copula correlation as the returns correlation, we set up a Monte

Carlo experiment with 10000 replications in which, for every replication, we simulate 10000 observations from a t Copulawith νC = 8 degrees of freedom and Student’s t marginals with 7 and 12 degrees of freedom respectively and with constantcorrelation equal to ρC = [−.9,−.8, . . . , .8, .9]. Then, we compute the linear correlation between these simulated valuesand boxplot the differences between the estimated linear correlations and the true Copula correlations. The results areshown in Fig. 1. Table 1 shows the MAE, the relative bias and the RMSE associated with using the sample correlation of thereturns as estimator of the underlying Copula correlation.The bias is of two orders ofmagnitude smaller than the estimated correlations and its distribution is symmetric and fairly

concentrated around zero, hence we decided to ignore this bias and carry on our exercise considering the copula correlationas the correlation between the returns; moreover the computational burden associated with the simulation of the linearcorrelations is unmanageable within our Monte Carlo setup.


3.4. BEKK-Multivariate Student’s t log-likelihood

A natural alternative to the multivariate Gaussian density is the Student density, see Harvey et al. (1992) and Fiorentiniet al. (2003). The latter has an extra scalar parameter, the degrees of freedom parameter, denoted ν hereafter. The value ofν indicates the order of existence of the moments: e.g. if ν = 2, the second-order moments do not exist, but the first-ordermoments exist. For this reason, it is convenient (although not necessary) to assume that ν > 2, so that Ωt can always beinterpreted as a conditional covariance matrix. The multivariate Student’s t density for rt is

f(rt; 0,

ν − 2ν

Ωt , ν

)=

0(ν+N2

)0(ν2

)[π(ν − 2)]N/2

|Ωt |−1/2

[1+

r ′tΩ−1t rt

ν − 2

]− N+ν2,

hence the log-likelihood function for observation t results to be:

lt(θ) = log0(ν + N2

)− log0

(ν2

)−N2logπ −

N2log(ν − 2)

−N + ν2

log

(1+

r ′tΩ−1t rt

ν − 2

)−12log |Ωt | .

3.5. BEKK-Multiple Degrees of Freedom student’s t log-likelihood

To allow different levels of heavy tailedness for different error terms we adopt the Multiple Degrees of Freedom t (MDFt,hereafter) introduced by Serban et al. (2007). Let the returns be generated as:

rt = Ω1/2t Vyt t = 1, . . . , T ;

with

V = diag

[√ν1 − 2ν1

, . . . ,

√νN − 2νN

],

where yit ∼ i.i.d. t(0, 1, νi), i = 1, . . . ,N . The density function for yit results to be:

f (yit; νi) =0((νi + 1)/2)√(νiπ)0(νi/2)

(1+

y2itνi

)−(νi+1)/2νi > 2.

Given the assumption of independent yit , the joint density of rt is given by

D(rt; θ) =N∏i=1

f

(u′iΩ

−1/2t rt

√(νi − 2)/νi

; νi

) ∣∣∣V−1Ω−1/2t

∣∣∣ ,with ui = (0, . . . , 0, 1, 0, . . . , 0)where 1 is in the ith position. Finally, the t-contribution to the log-likelihood function is

lt(rt; θ) = −N∑i=1

log

√νi − 2νi+

N∑i=1

log

0

(νi+12

)√πνi 0

(νi2

)− 1

2log |Ωt |

−

N∑i=1

(νi + 12

)log

1+ 1νi

(u′iΩ

−1/2t rt

√(νi − 2)/νi

)2 .

4. Data Generating Process and Monte Carlo setup

In order to obtain simulated data which are able to mimic as closely as possible the stylized facts observable in realfinancial time series sampled at daily frequency, we implement a Data Generating Process as in Danielsson (1998). Werelaxed the original assumption of constant correlation between the innovations of the returns, allowing this correlation toevolve through time following four different deterministic functions of the time index.The bivariate DGP is defined as:

yt = Ψtεt , (19)


0 20000

0.2

0.4

0.6

0.8

1

ρCONSTANT

ρ t0 2000

0

0.2

0.4

0.6

0.8

1

ρSTEP

ρ t

0 20000

0.2

0.4

0.6

0.8

1

ρSINE

ρ t

0 20000

0.2

0.4

0.6

0.8

1

ρFASTSINE

ρ t

Fig. 2. Correlation trajectories for the simulation study.

where Ψ ′tΨt = Ωt is the 2× 2 covariance matrix and εt is a 2× 1 vector of shocks to returns. The covariance matrix can befurther decomposed as follows:

Ωt = HtRtHt , (20)

where Ht is a diagonal matrix containing the square root of the univariate volatilities, and Rt is the time-varying correlationmatrix.The volatility of each asset is modeled via the univariate stochastic volatility process

ht,i = exp[ωi + δi log(ht−1,i)+ γiηt,i], i = 1, 2 (21)

while for the dynamic correlation matrix we adopt the specifications shown in Eq. (22) and in Fig. 2:

R12,t = R21,t ≡ ρt ,ρCONSTANT ,t = 0.6,ρSTEP,t = 0.2+ 0.51I(t > 750),ρSINE,t = 0.5+ 0.2 cos (2π t/1000) ,

ρFASTSINE,t = 0.5+ 0.2 cos (2π t/20) . (22)

These specifications are able to cover a great variety of empirically relevant correlation structures and they present theadditional advantage of being comparable with the correlation structures adopted by Engle (2002). This is particularlyinteresting since it gives us the opportunity to extend the results of Engle (2002) in a framework in which there is jointmisspecification of volatilities, correlation and innovations distribution.As far as the volatility shocks ηt are concerned, we impose unit variance and fixed covariance ς to these shocks. In order

to introduce both within and across assets leverage effects, we impose that the covariance matrix of returns and volatilitiesshocks L = E(εtη′t) 6= 0. The diagonal elements of L contain the within asset leverage effects, while the non-diagonalelements induce the across assets leverage.Finally, we further generalize the process in order to introduce different levels of excess kurtosis in the shocks of both the

returns and the volatilities by coupling univariate Student’s t distributions, with νi, i = 1, . . . , 4 d.o.f. for each innovation,with a Student’s t Copula with νC d.o.f.The values of the parameters chosen for our Monte Carlo experiment are shown in Table 2. Choosing these values for

the parameters of the DGP, we are able to closely match the descriptive statistics of the NASDAQ 100 and of the S&P 500indexes.

4.1. Copula simulation and Monte Carlo setup

Obtaining innovations with a dependence structure described by an elliptical copula is straightforward, even if thecomputational burden is almost fifteen times bigger than that associated to the simulation of a multivariate normaldistribution.Firstly, we need to draw the innovations from a four-variate t distribution with νC degrees of freedom and correlation

matrix given by

CC =

Rt L

L1 ςς 1

.


Table 2Parameters of the Data Generating Process.

Univariate volatility parameters

ω1 δ1 γ1 ω2 δ2 γ20.01 0.96 0.23 −0.04 0.95 0.34

Degrees of freedom

ν1 ν2 ν3 ν4 νC7 12 5 10 8

Leverage parameters and volatility innovations correlation

L11 L22 L12 ς

−0.4 −0.3 −0.1 0.32

Table 3Average descriptive statistics over the 200 Monte Carlo replications with constant correlation (standard errors in parenthesis).

Statistic Series 1 Series 2 Statistic Series 1 Series 2

Mean 0.0022(0.0232)

0.0023(0.0172) Skewness 0.0705

(0.2691)0.0490(0.4003)

Std. Dev. 1.0521(0.0633)

0.7757(0.0514) Kurtosis 9.1493

(3.5885)7.1764(2.2859)

Max 6.6347(1.9111)

5.2276(1.4579) p-Normal <1% <1%

Min −6.1711(1.4498)

−5.0848(1.5000)

There is statistical dependence between these four variables, and each has a Student’s t marginal distribution. Next, weapply the Student’s t cumulative distribution function separately to each variable, changing the marginal distributions intoUniform distributions over the interval (0, 1). At this point the statistical dependence is that proper of the multivariate tdistribution, but the marginal distributions are Uniforms over the interval (0, 1): we obtained a sample with a t Copuladistribution with νC degrees of freedom and correlation matrix given by CC . Finally, we exploit the fact that applying theinverse c.d.f. of any distribution F to a U(0, 1) random variable results in a random variable whose distribution is exactly F ;hence, we apply to each variable inverse standardized t c.d.f.with d.o.f. ν1 . . . ν4 and this concludes our simulation algorithm.We generate 200 independent datasets for every correlation process with 3000 observations each, discarding the first

1000 observations of each sample to minimize the initialization bias. Table 3 reports the average descriptive statistics andthe corresponding standard deviations over the 200 Monte Carlo replications with constant correlation.The first 1500 observations of each sample are used to estimate the 14 competing models. The in-sample performances

of the models are compared on the basis of the mean Bayesian Information Criterion (BIC), the Mean Absolute Error of theestimated correlation with respect to the true correlation:

MAE =1T

∑|ρt − ρt |, (23)

and the mean distance between the estimated conditional variances and the true variances, using the Frobenius norm asdecision criterion:∥∥∥diag(Ht − Ht)∥∥∥

F=

√∑i

∣∣∣Hii,t − Hii,t ∣∣∣2. (24)

These quantities are shown in Table 4.

4.2. Out-of-sample evaluation criteria

In order to assess the out-of-sample performances of the competing models, we use the last 500 observations of eachsample to compare the one-step-ahead forecasting capabilities of the models from both statistical and financial points ofview. As statistical performance criteria, we compute the Mean Absolute Forecasting Error for each model m over the 200Monte Carlo replications:

MAFE(m) =1N

N∑i=1

|(em,T+i+1)|, (25)

where N = 500, T = 1500 and em,i is the forecasting error at time i, computed as the difference between the forecastedvolatility or correlation and the actual one. This error statistic presents the shortcoming that the underlying loss function


Table 4Bayesian Information Criterion, Mean Absolute Error between the true and the estimated correlation and Frobenius norm of the difference between thetrue and the estimated covariance.

DCC ADCC SBEKK DBEKKN L tc N L tc N L t MDFt N L t MDFt

BIC

CONSTANT 6.987 6.045 4.415 7.099 6.052 4.377 5.092 4.958 4.445 4.460 5.074 4.948 4.445 4.463STEP 7.107 6.033 4.508 7.256 6.042 4.480 5.113 4.969 4.548 4.564 5.104 4.963 4.540 4.570SINE 7.097 6.032 4.509 7.185 6.019 4.481 5.118 4.969 4.544 4.562 5.111 4.970 4.542 4.563FASTSINE 7.031 6.008 4.532 7.150 6.009 4.505 5.119 4.972 4.573 4.587 5.117 4.969 4.564 4.587

MAE


FROB


is symmetric. Hence, as additional performance measures, we include in our analysis the Mean Mixed Error of Under-prediction (MMEU) and the Mean Mixed Error of Over-prediction (MMEO), as suggested by Brailsford and Faff (1996). TheMMEU can be defined as:

MMEU(m) =1N

(NU∑i=1

√|(em,T+i+1)| +

NO∑i=1

|(em,T+i+1)|

), (26)

while the MMEO as:

MMEO(m) =1N

(NO∑i=1

√|(em,T+i+1)| +

NU∑i=1

|(em,T+i+1)|

), (27)

where NU is the number of times that the predicted conditional variance or correlation is smaller than the actual one and NOis its complementary. While computing these statistics, when the absolute value of the forecast error is larger than unity,we square the error in order to achieve the desired penalty.The last statistic evaluation criterion presented in this work entails the classical Mincer–Zarnowitz regression, which

involves the regression of the realization of a variable on its forecast; the forecast is optimal if the null hypothesis of zerointercept and unity slope coefficient cannot be rejected. Since our datasets are simulated, we do not need to rely on volatilityproxies as in Patton and Sheppard (2008). The univariate Mincer–Zarnowitz regression takes the form:

zi,T+t = α + β zi,T+t + uT+t , (28)

where zi denotes the actual volatility or correlation and zi represents the corresponding forecast; T = 1500 is the in-samplethreshold and t = 1, . . . , 500.The covariance matrix of the estimated regression parameters is constructed using the Newey and West estimator, in

order to control for heteroskedasticity and autocorrelation in the residuals. If the null hypothesis α = 0 ∩ β = 1 cannotbe rejected by a Wald test, the forecast can be considered optimal with respect to the relevant filtration.Furthermore, we implement amultivariate version of theMincer–Zarnowitz test; we regress the distinct elements of the

actual covariance matrix on the corresponding elements of the forecasted covariance matrix using a Seemingly UnrelatedRegression Equations approach, rather than a panel estimator as suggested in Patton and Sheppard (2008). Hence, in ourframework, the Multivariate Mincer–Zarnowitz regression takes the form:

vech(ΩT+t) = α + β vech(ΩT+t)+ uT+t , (29)

where the vech operator stacks the column of a matrix in a column vector excluding the upper diagonal terms ; α and β arein this case 3 × 1 vectors. As an indicator of goodness of fit in these SURE regressions we adopt the McElroy systemwidemeasure:

R2 = 1−u′S−1u

y′(Σ−1 ⊗

(I − ιι′

N

))y, (30)


where Σ =[σij]with σij = u′iuj/(N−1), S = Σ⊗ I , N is the number of observations in each equation, I is an N×N identity

matrix, ι is a N × 1 vector of ones and y = vec(vech(ΩT+t)′

t=1,...,500,T=1500).

After having compared the forecast capabilities of these models with statistical indicators, we now focus on an exerciseof risk measurement in order to inspect the forecasting abilities of these models when exploited in a financial context.In order to do so, we consider a hedging (withweights 1 and−1) and an equally weighted portfolio. For each, we forecast

the Value at Risk at 99% over an holding period of 1 day, using the last 500 observations of each sample as back-testing data.The VaRs are based on forecasts obtained from models estimated with rolling samples of fixed size of 1500 observations.This exercise is then repeated for every Monte Carlo iteration.The vectors of Value at Risk violations and forecasts allow us to perform the Dynamic Quantile test proposed by Engle

and Manganelli (2004).

5. Results

5.1. In-sample results

Table 4 reports some decision criteria based on the in-sample performances of the competing models. The entries ofthe Table are simply the averages over the 200 Monte Carlo replications of the relevant statistic (in boldface the preferredchoice).Looking at Table 4, we notice how allowing for densities able to accommodate the different levels of excess kurtosis

dramatically reduces the Bayesian Information Criterion. This is particularly evident for the DCC class, while in the BEKKfamily the improvements are less striking. Another interesting feature is related to the leverage effects: modeling theleverage assumes significance only if the differences in the excess kurtosis of the data are taken into account with a copulamodel, with the ADCC Copula being the preferred model for every DGP considered. Moreover, it is worth noting that theranking of the models induced by all three in-sample decision criteria is invariant under the true correlation specification.The MAE for the correlation (Eq. (23)) and the Frobenius Norm (Eq. (24)) for the volatilities confirm that the flexibility of

DCC specifications greatly improves the in-sample fitting of these models.Finally, we can observe how the Laplace distribution outperforms both the Gaussian distribution and the t Copula. We

expected this result with respect to the Gaussian distribution, but the relatively poor performance of the Copula model isindeed puzzling, given that the data were drawn from amodel with a structure rather close to the structure of a Copula DCC,as confirmed by the BIC.

5.2. Out-of-sample results

Table 5 reports the results of the out-of-sample comparison of the competing models on the basis of the Mean AbsoluteForecasting Error described in Eq. (25).We can observe, in general, how the DCC-GARCH and the BEKK specifications have similar performances as indicated

by the MAFE criterion, while the ADCC–EGARCH parametrizations lead to a substantial improvement of the forecastingaccuracy. Looking at the first series, characterized by an higher kurtosis of the shocks of both the return and the volatility,we can observe how the MAFE criterion prefers a Laplace distribution over a Student’s t for all the models and correlationspecifications considered, with the ADCC Laplace being the preferred model in 4 out of 4 cases. Focusing now on the secondseries, less leptokurtic, we can observe how these differences in accuracy fade away, with the simple Normal distributionbeing preferred by theMAFE criterion for everymodel considered and for 3 out of 4 DGPs. Finally, considering the correlationforecast, DCC are strongly preferred to BEKK specifications, with the ADCC Laplace minimizing the forecasting error in 3cases, and the ADCC Copula being the second best choice.Looking nowmore in detail at the BEKK performances, we can observe how diagonal BEKKmodels generally increase the

correlation forecasting accuracy but perform worse in forecasting volatilities.Concluding, we obtain some counterintuitive results; it seems, in fact, that increasing the flexibility in the specification of

the conditional variances (Diagonal BEKK vs. Scalar BEKK) and/or in the distribution of the innovations (Student’s t marginalandCopula vs. Laplace andMDFt vs.Multivariate t), the forecasting accuracy is not improved, despite the increasednumericalcomplexity of the estimation procedures. On the other hand, explicitly modeling the leverage effects present in the datastrongly improves the forecasting results.Analyzing now the results from asymmetric loss function, reported in Tables 6 and 7, we can observe how every

considered model tends to under-predict both volatilities, and this effect is stronger for the first series. In general, DCCmodels outperform BEKK specifications when looking at theMMEO, while the converse is true looking at the error of under-prediction. Interestingly, fitting an ADCC EGARCH to our datasets, we are able to reduce substantially the magnitude ofthese under-predictions, even if their number increases. Finally, considering the correlations forecasts, we notice how theDCC Laplaceminimizes the over-predictionswhile the Copula ADCCminimizes the under-predictions. Once again, themodelranking does not seem to depend on the true correlation specification.The analysis of the correlation between point forecasts (results not reported here to save space) reveals two main

interesting features. First, if we focus on the single class ofmodels, e.g. DCC, we notice that the forecasts are highly correlated


Table 5Mean Absolute Forecasting Error.


MAFE 1


MAFE 2


MAFE 12


Table 6Mean Mixed Error of Over-prediction.


MMEO 1


MMEO 2


MMEO 12


Table 7Mean Mixed Error of Under-prediction.


MMEU 1


MMEU 2


MMEU 12



Table 8Multivariate Mincer–Zarnowitz test: Average R2 .

R2MZ DCC ADCC SBEKK DBEKKN L tc N L tc N L t MDFt N L t MDFt


Table 9Univariate Mincer–Zarnowitz test: Average rejection rates.


% Rej. MZ1

CONSTANT 99 98 98 99 97 99 96 94 94 90 96 90 97 92STEP 97 96 95 98 97 99 98 93 94 95 95 90 94 92SINE 99 98 99 100 97 100 98 94 95 87 96 93 96 96FASTSINE 98 96 97 100 97 99 96 87 93 94 96 88 92 93

% Rej. MZ2


% Rej. MZ12


across the distributions, suggesting that the error distribution has a negligible impact on the description of the dynamics ofthe data. Second, it seems that correlations across models are mainly driven by the presence of an asymmetric specification.The results of the Multivariate Mincer–Zarnowitz regressions are shown in Table 8; the performances of the competing

models are extremely poor, with rejection rates equal to 100% in every case (not reported in the table). Despite the fact thatthis test always rejects the null hypothesis of optimal forecast, the explanatory power of the regressions, as described by theMcElroy systemwide measure (Eq. (30)), are quite satisfactory, especially considering ADCC EGARCH models. Even in thiscase, capturing the leverage effects allows to sensibly increase the explanatory power of the regressions, despite the factthat the rejection rate fails to lower.We perform the same test in a univariate framework, implementing autonomous regressions for the three forecast series.

The results are shown in Table 9.We can observe how forecasting the second series is less problematic, given the reduced kurtosis in the variance shocks,

as confirmed by the lower rejection rate for every model considered. Interestingly enough, the rejection rate relative tothe first series increases with the ADCC–EGARCH models, while the opposite is true if we look at the second series. Thisconfirms that the benefit in forecasting accuracy induced by explicitly modeling the leverage effect present in the dataquickly disappears as the probability of extreme shocks increases. Interestingly, the best predictors of both the volatilitiesare the BEKK models when coupled with Laplace and MDF-t distributions. This is an unexpected result, given the inabilityof these specifications to take in account the leverage effects imposed to the data. On the other end, considering both theleverage effects and the different levels of kurtosis greatly improves the correlation forecasts, where the ADDC Copula is thepreferred model in 3 out of 4 cases.Finally, Table 10 reports the R2 of the univariateMincer–Zarnowitz regressions.We can see how leptokurtic ADCCmodels

obtain the highest R2 for both volatilities and correlation with Laplace and t Copula specification respectively.

5.3. Value at Risk forecasting results

Looking now at the results of the back-testing of the 99% Value at Risk of an equally weighted portfolio, depictedin Table 11, we can draw some general conclusions about the overall performances of the competing models whencoupled with different distributions for the innovations. First of all, when coupled with Gaussian disturbances, every modelspecification leads to a strong underestimation of the Value at Risk, resulting in about 40% rejection rates in the DynamicQuantile test. In all cases, we observe how the rejections are due to a number of Value at Risk violations more than 50%higher than the theoretical coverage rate. Another striking evidence of the inadequacy of the Gaussian distribution in this


Table 10Univariate MZ test: average R2 .


R2MZ1


R2MZ2


R2MZ12


Table 11Equally Weighted portfolio VaR forecasting results: Average Unconditional Coverage ratios and number of Dynamic Quantile test rejections.


θ


% DQ


framework is the fact that neither the unconditional coverage ratio nor the Dynamic Quantile rejection rates decrease whenthe leverage effects are explicitly modeled.Coupling DCC specifications with leptokurtic error distributions, we are able to obtain a sensible enhancement in the

performances of these models. In fact, both Laplace and Copula t allow us to obtain a fraction of VaR violations much closerto the correct value of 1% and to register a number of DQ test rejections much lower than the rejection rate observed withthe Gaussian distribution. In addition to this, when a leptokurtic distribution is assumed, the VaR forecasting performancesimprove if the leverage effect is accounted for with an ADCC–EGARCH specification. It is worth noting that with flexibleinnovations distributions we are able to obtain an unconditional coverage ratio exactly equal to the correct one (whilein general Laplace models slightly over-predict the Value at Risk), even if the Dynamic Quantile test prefers the ADCCLaplace model in 3 out of 4 cases. A possible explanation is related to the leverage effect in the data: since BEKK modelsare not able to model this feature, there is evidence of violations clustering due to the unexplained asymmetric correlationbetween negative shocks and volatilities. This result is evenmore important if we take into account the additional difficultiesintroduced in the estimation of the model by considering a multivariate distribution built via Copula functions. Moreover,the Value at Risk for a portfolio built with assets distributed as a multivariate symmetric Laplace is available in closed form(at least when considering one-step-ahead forecasts). This is not true for the Copula and MDF-t cases, in which we need toretrieve the Value at Risk using simulations.Analyzing now the same exercise performed with a hedging portfolio (Table 12), we notice some important differences

from the equally weighted case. First of all the ADCC Copula strongly under-predicts the Value at Risk, achieving resultscomparable to those of the much simpler Gaussian counterpart. This could be caused by the sensible correlation over-prediction of ADCC Copula, keeping also in mind the fact that the ADCC Laplace performs better than the DCC Laplacein 3 out of 4 cases. It is also interesting how in this exercise BEKK models achieve a better match of the unconditionalcoverage ratio than DCC models; this can be explained by the fact that using DCC models, when the predicted correlationincreases, the hedging portfolio incurs in huge losses. On the other hand, since BEKK models tend to strongly under-predictthe correlation (especially when the true correlation is dynamic), as shown in Tables 6 and 7, they over-predict the true VaRof the hedging portfolio, and this results in a lower number of VaR violations. Another interesting result of this experiment


Table 12Hedging portfolio VaR forecasting results: Average Unconditional Coverage ratios and number of Dynamic Quantile test rejections.


θ


% DQ


is that, differently from the equally weighted scenario, explicitly modeling the different levels of kurtosis of the series doesnot seem to be crucial to obtain the correct unconditional coverage ratio.

6. Conclusions

The aim of this work is to evaluate the beneficial effects induced by more flexible volatility structures and probabilitydistributions of the innovations in a context where both the conditional variance and the error distribution aremisspecified.We believe that this framework replicates the vast majority of real life empirical studies. We have applied a selection ofwidely adopted Multivariate GARCHmodels to simulated data obtained via a generator able to mimic the principal stylizedfacts of observed financial time series.Our main contributions can be summarized as follows. First of all, directly modeling the conditional correlation matrix

allows us to obtain an increase in both in-sample and out-of-sample forecasting performances. Secondly we are able toshow the paramount importance of jointly modeling the asymmetric effects in volatilities and the leptokurtosis in thestandardized innovations. Thirdly, all the models considered tend to strongly underestimate the variance of the data, andBEKK models steadily under-predict the conditional correlation.Analyzing the results of the Value at Risk forecasting, we can observe how the Normal distribution is completely

inadequate in our framework, leading to strong and persistent under-predictions of the Value at Risk with both BEKK andDCC class models. Furthermore, modeling the leverage effect in the data presents substantial benefits only if the excesskurtosis of the data is taken into account. These two results together stress further the importance of capturingwith suitableprobability distributions the excess kurtosis shown by the data.It is worth noting also that explicitly modeling the different levels of heavy tailedness of the data with a Copula-DCC

GARCH model leads to results very similar to those obtained with the much computationally simpler Multivariate LaplaceDCCmodel. On the other hand, in the BEKK framework, this flexibility inmodeling the tail behavior of themarginal densitiesgreatly improves both in-sample and out-of-sample results.A natural extension of this work can be the measurement of the costs associated with a misspecification of the skewness

of the Data Generating Process.

Acknowledgements

We are grateful to the Associate Editor and two anonymous referees for their comments and suggestions that haveresulted in a much improved paper. We also thank the participants of the 2nd Workshop in Computational and FinancialEconometrics in Neuchâtel, Switzerland, June 2008 for the stimulating discussions. The first author’s researchwas supportedby PRIN 2006.

References

Abramowitz, M., Stegun, I.A., 1965. Handbook of Mathematical Functions. In: National Bureau of Standards, Applied Mathematics Series, vol. 55. DoverPublications, (Sections 9.1.1, 9.1.89, and 9.12, formulas 9.1.10 and 9.2.5).

Aielli, G.P., 2006. Consistent estimation of large scale dynamic conditional correlations. Unpublished paper. Department of Statistics, University of Florence.Bauwens, L., Hafner, C.M., Rombouts, J.V.K., 2007. Multivariate mixed Normal conditional heteroskedasticity. Computational Statistics & Data Analysis 51(7), 3551–3566.

Bauwens, L., Laurent, S., 2005. A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticitymodels. Journal of Business & Economic Statistics 23, 346–354.

Brailsford, T., Faff, R., 1996. An evaluation of volatility forecasting techniques. Journal of Banking & Finance 20 (3), 419–438.Cajigas, J., Urga, G., 2007. Dynamic conditional correlation models with asymmetric multivariate laplace innovations. Mimeo. Cass Business School, CityUniversity.

Cappiello, L., Engle, R., Sheppard, K., 2003. Asymmetric dynamics in the correlations of global equity and bond returns. ECB Working Papers Series, 204.Danielsson, J., 1998. Multivariate stochastic volatility models: Estimation and comparison with MGARCHmodels. Journal of Empirical Finance 5, 155–173.


Engle, R., 2002. Dynamic Conditional Correlation—A simple class ofmultivariateGARCHmodels. Journal of Business and Economic Statistics 20 (3), 339–350.(12).

Engle, R., Kroner, K., 1995. Multivariate simultaneous GARCH. Econometric theory 11, 122–150.Engle, R., Manganelli, S., 2004. CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles. Journal of Business and Economic Statistics 22,367–381.

Fiorentini, G., Sentana, E., Calzolari, G., 2003. Maximum likelihood estimation and inference in multivariate conditionally heteroskedastic dynamicregression models with Student t innovations. Journal of Business and Economic Statistics 21, 532–546.

Harvey, A.C., Ruiz, E., Sentana, E., 1992. Unobservable component time series models with ARCH disturbances. Journal of Econometrics 52, 129–158.Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman and Hall, London.Kotz, S., Kozubowski, T., Podgórski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics,Engineering, and Finance. Birkhäuser.

Patton, A., Sheppard, K., 2008. Evaluating volatility and correlation forecasts. Oxford Financial Research Centre, OFRC Working Papers Series.Serban, M., Brockwell, A., Lehoczky, J., Srivastava, S., 2007. Modelling the dynamic dependence structure in multivariate financial time series. Journal ofTime Series Analysis 28 (5), 763–782.

Shephard, N., Andersen, T., 2008. In: Andersen, T. (Ed.), Stochastic Volatility: Origins and Overview. In: Handbook of Financial Time Series, Springer Verlag.

Documents

Model and distribution uncertainty in multivariate GARCH estimation: A Monte Carlo analysis