Scalable inference for a full multivariate stochastic volatility model

SYstemic Risk TOmography: Signals, Measurements, Transmission Channels, and Policy Interventions

P. Dellaportas (UCL, London), A. Plataniotis (AUEB, Athens) and M. Titsias (AUEB, Athens)

Final SYRTO Conference, Université Paris 1 Panthéon-Sorbonne, February 19, 2016

• Instantaneous volatilities and correlations are important indicators of systemic risk.

• N-dimensional asset returns: $r_t = \mu_t + \varepsilon_t$, $\varepsilon_t \sim N(0, \Sigma_t)$, $t = 1, \dots, T$.

• The focus is on modelling and predicting the covariance matrices $\Sigma_t$, so we assume $r_t \equiv \varepsilon_t$.

• For realistic financial applications (portfolio allocation, systemic risk), think of N in the hundreds and T = 2000.

• Problem 1: each $\Sigma_t$ has $N(N+1)/2$ parameters, a number that grows quadratically in N. The total number of parameters to be estimated is $TN(N+1)/2$.

• Problem 2: the $N(N+1)/2$ parameters of each $\Sigma_t$ are constrained, since $\Sigma_t$ must be positive definite.

• Problem 3: there are many missing values (about 3% in the data we looked at) and some series are short.
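To make Problem 1 concrete, the short sketch below counts the free parameters of one $\Sigma_t$ and of the whole path $\Sigma_1, \dots, \Sigma_T$. Plugging in N = 571 and T = 2017 (the dimensions of the data set described later) is our own illustrative choice; the function names are hypothetical.

```python
def cov_params_per_time(n: int) -> int:
    """Free parameters of one symmetric n x n covariance matrix."""
    return n * (n + 1) // 2


def cov_params_total(n: int, t: int) -> int:
    """Free parameters of the whole path Sigma_1, ..., Sigma_T."""
    return t * cov_params_per_time(n)


print(cov_params_per_time(571))     # 163306
print(cov_params_total(571, 2017))  # 329388202
```

Over 300 million unconstrained parameters is why parsimonious or factor structures are needed.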

1-d Stochastic volatility model

• 1-dimensional returns

$$r_t \sim N(\mu_t, \sigma_t^2),$$

with unobservable variances

$$\log\sigma_{t+1}^2 = \mu + \phi\log\sigma_t^2 + \eta_t, \qquad \eta_t \sim N(0, \tau^2).$$

• MCMC algorithms since 1994; sequential importance sampling, adaptive MCMC, Laplace approximations, etc.

• Compare parameter-driven stochastic volatility models with observation-driven GARCH-type models.
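A minimal simulation of the 1-d model above, assuming $\mu_t = 0$ and illustrative parameter values ($\mu$, $\phi$, $\tau$ here are our choices, not estimates from the talk). When $|\phi| < 1$ the log-variance is stationary with mean $\mu/(1-\phi)$, which is where the sketch starts the path.

```python
import math
import random


def simulate_sv(T, mu, phi, tau, seed=0):
    """Simulate r_t ~ N(0, sigma_t^2) with
    log sigma_{t+1}^2 = mu + phi * log sigma_t^2 + eta_t, eta_t ~ N(0, tau^2)."""
    rng = random.Random(seed)
    log_var = mu / (1.0 - phi)  # start at the stationary mean of log sigma_t^2
    log_vars, returns = [], []
    for _ in range(T):
        log_vars.append(log_var)
        returns.append(rng.gauss(0.0, math.exp(0.5 * log_var)))  # sd = sigma_t
        log_var = mu + phi * log_var + rng.gauss(0.0, tau)
    return log_vars, returns


# Stationary mean -0.9 / (1 - 0.9) = -9, i.e. a daily volatility of roughly 1%.
log_vars, returns = simulate_sv(T=2000, mu=-0.9, phi=0.9, tau=0.3)
```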

Volatility matrices - State of the art

• Two recent review articles on multivariate stochastic volatility (Asai, McAleer, Yu, 2006; Chib, Omori, Asai, 2009): the current state of the art is parsimonious modelling of $\Sigma_t$ and factor models with a few independent factors, each modelled as a univariate stochastic volatility process.

• A review article on multivariate GARCH models (Bauwens, Laurent, Rombouts, 2006): the state of the art is parsimonious modelling of $\Sigma_t$ and two-step estimation procedures.

• Other approaches include Wishart processes (Philipov and Glickman, 2006) and dynamic matrix-variate graphical models via inverted Wishart processes (Carvalho and West, 2007).

Dynamic eigenvalue and eigenvector modelling

• We decompose $\Sigma_t = U_t\Lambda_tU_t^\top$ and model $U_t$ and $\Lambda_t$ with AR(1) processes. Direct modelling of $U_t$ is hard.

• Since $U_t$ is a rotation matrix, it can be parameterised by $N(N-1)/2$ Givens angles, each one belonging to a matrix $G_{jt}$:

$$U_t = \prod_{j=1}^{N(N-1)/2} G_{jt}$$

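2-Dim

$$\Sigma_t = \begin{pmatrix} \cos\omega_t & \sin\omega_t \\ -\sin\omega_t & \cos\omega_t \end{pmatrix} \begin{pmatrix} \lambda_{1t} & 0 \\ 0 & \lambda_{2t} \end{pmatrix} \begin{pmatrix} \cos\omega_t & \sin\omega_t \\ -\sin\omega_t & \cos\omega_t \end{pmatrix}^{\top}$$

• Uniqueness: $\lambda_{1t} > \lambda_{2t}$, $-\pi/2 < \omega_t < \pi/2$.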
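A quick numerical check of the 2-D construction: for any angle and positive eigenvalues the product is symmetric, with trace $\lambda_1 + \lambda_2$ and determinant $\lambda_1\lambda_2$. The function name and the test values are illustrative.

```python
import math


def sigma_2d(omega, lam1, lam2):
    """Return Sigma = G diag(lam1, lam2) G^T for the 2-D Givens matrix G(omega)."""
    c, s = math.cos(omega), math.sin(omega)
    g = [[c, s], [-s, c]]
    lam = [lam1, lam2]
    # (G Lambda G^T)[a][b] = sum_k G[a][k] * lam[k] * G[b][k]
    return [[sum(g[a][k] * lam[k] * g[b][k] for k in range(2)) for b in range(2)]
            for a in range(2)]


sigma = sigma_2d(omega=0.4, lam1=2.0, lam2=0.5)
trace = sigma[0][0] + sigma[1][1]
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
```

Expanding the product gives the covariance $\cos\omega_t\sin\omega_t(\lambda_{2t} - \lambda_{1t})$, so with $\lambda_{1t} > \lambda_{2t}$ the sign of the correlation is carried entirely by the angle.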

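3-Dim

(Ignoring t): $\Sigma = U\Lambda U^\top = G_{12}G_{13}G_{23}\,\Lambda\,G_{23}^\top G_{13}^\top G_{12}^\top$, with

$$U = \begin{pmatrix} \cos\omega_{12} & \sin\omega_{12} & 0 \\ -\sin\omega_{12} & \cos\omega_{12} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\omega_{13} & 0 & \sin\omega_{13} \\ 0 & 1 & 0 \\ -\sin\omega_{13} & 0 & \cos\omega_{13} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\omega_{23} & \sin\omega_{23} \\ 0 & -\sin\omega_{23} & \cos\omega_{23} \end{pmatrix}$$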

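For general N, U is the product of the $N(N-1)/2$ Givens matrices $G_{jk}$, $j < k$, where rows and columns j and k carry the cosines and sines:

$$U = \prod_{j=1}^{N-1}\prod_{k=j+1}^{N} G_{jk}, \qquad G_{jk} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & \cos\omega_{jk} & \cdots & \sin\omega_{jk} & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -\sin\omega_{jk} & \cdots & \cos\omega_{jk} & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}$$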

• Note the sparsity of each N-dimensional Givens matrix $G_{jk}$: it contains 4 elements with the cosine and sine of the angle, ones elsewhere on the diagonal, and zeros everywhere else.
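The sketch below builds U from its Givens angles in the same order as the 3-Dim display (pairs (1,2), (1,3), (2,3)), checks that the result is orthogonal, and confirms the sparsity noted above. It uses plain Python lists and 0-based indices; the helper names are ours.

```python
import math


def givens(n, j, k, omega):
    """n x n Givens matrix G_jk (0-based, j < k): identity except
    (j,j) = (k,k) = cos(omega), (j,k) = sin(omega), (k,j) = -sin(omega)."""
    g = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    c, s = math.cos(omega), math.sin(omega)
    g[j][j], g[j][k], g[k][j], g[k][k] = c, s, -s, c
    return g


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rotation_from_angles(n, omegas):
    """U = prod_{j<k} G_jk with pairs ordered (1,2), (1,3), ..., (n-1,n)."""
    pairs = [(j, k) for j in range(n - 1) for k in range(j + 1, n)]
    assert len(omegas) == len(pairs) == n * (n - 1) // 2
    u = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for (j, k), omega in zip(pairs, omegas):
        u = matmul(u, givens(n, j, k, omega))
    return u


u = rotation_from_angles(3, [0.3, -1.1, 0.7])
utu = matmul([list(row) for row in zip(*u)], u)  # U^T U, should be the identity
```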

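• $r_t = (r_{1t}, \dots, r_{Nt})^\top$, $r_t \sim \mathrm{MVN}\left(0, U_t\Lambda_tU_t^\top\right)$.

• Transformations: $h_{it} = \log\lambda_{it}$ for $i = 1, \dots, N$, where $\lambda_{it}$ is the i-th diagonal element of $\Lambda_t$, and $\psi_{jt} = \log\left(\dfrac{\pi/2 + \omega_{jt}}{\pi/2 - \omega_{jt}}\right)$ for $j = 1, \dots, N(N-1)/2$, which maps each angle from $(-\pi/2, \pi/2)$ onto the real line; $t = 1, \dots, T$.

$$h_{i,t+1} = \mu^h_i + \phi^h_i\,(h_{it} - \mu^h_i) + \sigma^h_i\,\eta^h_{it}, \qquad i = 1, \dots, N,$$

$$\psi_{j,t+1} = \mu^\psi_j + \phi^\psi_j\,(\psi_{jt} - \mu^\psi_j) + \sigma^\psi_j\,\eta^\psi_{jt}, \qquad j = 1, \dots, \frac{N(N-1)}{2},$$

where $\eta^h_{it}, \eta^\psi_{jt} \sim N(0, 1)$ independently, and we denote

$$\theta^h = \left(\phi^h_1, \dots, \phi^h_N, \mu^h_1, \dots, \mu^h_N, \sigma^h_1, \dots, \sigma^h_N\right),$$

$$\theta^\psi = \left(\phi^\psi_1, \dots, \phi^\psi_{N(N-1)/2}, \mu^\psi_1, \dots, \mu^\psi_{N(N-1)/2}, \sigma^\psi_1, \dots, \sigma^\psi_{N(N-1)/2}\right).$$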
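A sketch of the angle transformation, its inverse, and one AR(1) transition, assuming the transformation above. Solving $\psi = \log\frac{\pi/2+\omega}{\pi/2-\omega}$ for $\omega$ gives $\omega = (\pi/2)\tanh(\psi/2)$, so any AR(1) path on the $\psi$ scale maps back to a valid angle. Function names and parameter values are hypothetical.

```python
import math
import random

HALF_PI = math.pi / 2.0


def angle_to_psi(omega):
    """psi = log((pi/2 + omega) / (pi/2 - omega)), mapping (-pi/2, pi/2) onto the real line."""
    return math.log((HALF_PI + omega) / (HALF_PI - omega))


def psi_to_angle(psi):
    """Inverse map: omega = (pi/2) * tanh(psi / 2), which lies inside (-pi/2, pi/2)."""
    return HALF_PI * math.tanh(psi / 2.0)


def ar1_step(x, mu, phi, sigma, eta):
    """One AR(1) transition x_{t+1} = mu + phi * (x_t - mu) + sigma * eta."""
    return mu + phi * (x - mu) + sigma * eta


# Usage: an AR(1) path on the psi scale, mapped back to angles.
rng = random.Random(0)
psi, omegas = 0.0, []
for _ in range(5):
    psi = ar1_step(psi, mu=0.0, phi=0.95, sigma=0.2, eta=rng.gauss(0.0, 1.0))
    omegas.append(psi_to_angle(psi))
```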

Priors

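$$h_{i,t+1} = \mu^h_i + \phi^h_i\,(h_{it} - \mu^h_i) + \sigma^h_i\,\eta^h_{it}, \qquad i = 1, \dots, N,$$

$$\psi_{j,t+1} = \mu^\psi_j + \phi^\psi_j\,(\psi_{jt} - \mu^\psi_j) + \sigma^\psi_j\,\eta^\psi_{jt}, \qquad j = 1, \dots, \frac{N(N-1)}{2},$$

$$\mu^h_i \sim N(\mu_1, \sigma_1^2), \qquad i = 1, \dots, N,$$

$$\phi^h_i \sim N(\mu_2, \sigma_2^2), \qquad i = 1, \dots, N,$$

$$\mu^\psi_j \sim N(\mu_3, \sigma_3^2), \qquad j = 1, \dots, \frac{N(N-1)}{2},$$

$$\phi^\psi_j \sim N(\mu_3, \sigma_3^2), \qquad j = 1, \dots, \frac{N(N-1)}{2}.$$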

The exchangeability assumption, implemented via a hierarchical model, allows borrowing strength across series. Partial exchangeability conditional on markets, sectors, etc. is probably more realistic.

A general model formulation

A more general structure is a K-factor model constructed with an $N \times K$ matrix of factor loadings B:

• $r_t = Bf_t + \epsilon_t$, $\epsilon_t \sim N(0, \sigma^2 I)$

• The factor loadings matrix B has a fixed, known structure, while its non-zero elements follow a Gaussian prior distribution.

• $f_t \sim N(0, \Sigma_t)$

• $\Sigma_t$ follows the multivariate stochastic volatility model with the Givens matrix construction.

• We need to constrain B so that the model is identifiable.

• This model is not only needed when N is large: it can also handle missing values, which is very important in real applications.
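Because the noise covariance $\sigma^2 I$ is diagonal, a missing entry of $r_t$ simply drops out of $N(r_t \mid Bf_t, \sigma^2 I)$ and no imputation is needed. A minimal sketch, assuming missing entries are encoded as `None` (our convention) and using made-up loadings and factors:

```python
import math


def obs_loglik(r_t, B, f_t, sigma2):
    """log N(r_t | B f_t, sigma2 I), summed over the observed entries of r_t only.

    Missing entries (None) contribute nothing; this is valid because the
    noise covariance sigma2 * I is diagonal.
    """
    total = 0.0
    for r_i, b_i in zip(r_t, B):
        if r_i is None:
            continue
        mean_i = sum(b_ik * f_k for b_ik, f_k in zip(b_i, f_t))
        total += -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * (r_i - mean_i) ** 2 / sigma2
    return total


B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]  # N = 3 series, K = 2 factors
f_t = [0.1, -0.2]
full = obs_loglik([0.2, -0.1, -0.5], B, f_t, sigma2=0.25)
partial = obs_loglik([0.2, None, -0.5], B, f_t, sigma2=0.25)  # second series missing
```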

Computation

• With the Givens-angle formulation of the model we now deal with a non-linear likelihood plus a Gaussian process prior.

• MCMC for these problems: we use an auxiliary Langevin MCMC algorithm based on an idea by Titsias in the discussion of the JRSS B discussion paper by Girolami and Calderhead (2011).

• Computational complexity: evaluating a normal density of dimension d generally costs $O(d^3)$; we achieve $O(d^2)$, even for the derivatives of the likelihood with respect to the Givens angles, so our MCMC algorithm has complexity $O(d^2)$.

• Missing data are handled without any problem.

The Sampling algorithm

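Model: $r_t = Bf_t + \epsilon_t$, $\epsilon_t \sim N(0, \sigma^2 I)$, $f_t \sim N(0, \Sigma_t)$. Denote by X all latent paths, with $x_t$ their values at time t.

$$p\left(B, \sigma^2, (f_t)_{t=1}^T \mid \text{rest}\right) \propto \left(\prod_{t=1}^T N(r_t \mid Bf_t, \sigma^2 I)\, N(f_t \mid 0, \Sigma_t(x_t))\right) p(B, \sigma^2),$$

$$p(X \mid \text{rest}) \propto \left(\prod_{t=1}^T N(f_t \mid 0, \Sigma_t(x_t))\right) p(X \mid \theta^h, \theta^\psi),$$

$$p(\theta^h, \theta^\psi \mid \text{rest}) \propto p(X \mid \theta^h, \theta^\psi)\, p(\theta^h, \theta^\psi).$$

We do not need to generate the missing data in $r_t$.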

Sampling the Gaussian latent process

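• Denote $F = (f_1, \dots, f_T)$.

• Prior $p(X) = N(X \mid M, Q^{-1})$.

• The current state of X is $X_n$. We use a slice Gibbs scheme:

• Introduce auxiliary variables U that live in the same space as X, with step size $\delta > 0$:

$$p(U \mid X_n) = N\left(U \,\middle|\, X_n + \tfrac{\delta}{2}\nabla\log p(F \mid X_n),\ \tfrac{\delta}{2} I\right)$$

• U injects Gaussian noise into $X_n$ and shifts it by $(\delta/2)\nabla\log p(F \mid X_n)$.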

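• We cannot sample from $p(X \mid U)$, so we use a Metropolis step: propose Y from the proposal

$$q(Y \mid U) = \frac{1}{Z(U)}\, N\left(Y \,\middle|\, U, \tfrac{\delta}{2} I\right) p(Y) = N\left(Y \,\middle|\, \left(I + \tfrac{\delta}{2}Q\right)^{-1}\left(U + \tfrac{\delta}{2}QM\right),\ \tfrac{\delta}{2}\left(I + \tfrac{\delta}{2}Q\right)^{-1}\right),$$

where $Z(U) = \int N\left(Y \mid U, \tfrac{\delta}{2} I\right) p(Y)\, dY$.

• Accept Y with Metropolis-Hastings probability $\min(1, r)$:

$$\begin{aligned} r &= \frac{p(F \mid Y)\, p(U \mid Y)\, p(Y)}{p(F \mid X_n)\, p(U \mid X_n)\, p(X_n)} \cdot \frac{q(X_n \mid U)}{q(Y \mid U)} \\ &= \frac{p(F \mid Y)\, p(U \mid Y)\, p(Y)}{p(F \mid X_n)\, p(U \mid X_n)\, p(X_n)} \cdot \frac{\frac{1}{Z(U)} N(X_n \mid U, \frac{\delta}{2} I)\, p(X_n)}{\frac{1}{Z(U)} N(Y \mid U, \frac{\delta}{2} I)\, p(Y)} \\ &= \frac{p(F \mid Y)\, N(U \mid Y + \frac{\delta}{2} G_y, \frac{\delta}{2} I)}{p(F \mid X_n)\, N(U \mid X_n + \frac{\delta}{2} G_x, \frac{\delta}{2} I)} \cdot \frac{N(X_n \mid U, \frac{\delta}{2} I)}{N(Y \mid U, \frac{\delta}{2} I)} \\ &= \frac{p(F \mid Y)}{p(F \mid X_n)} \exp\left\{ -(U - X_n)^\top G_x + (U - Y)^\top G_y - \frac{\delta}{4}\left(\|G_y\|^2 - \|G_x\|^2\right) \right\}, \end{aligned}$$

where $G_x = \nabla\log p(F \mid X_n)$, $G_y = \nabla\log p(F \mid Y)$ and $\|Z\|$ denotes the Euclidean norm of a vector Z.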

• The Gaussian prior terms $p(X_n)$ and $p(Y)$ cancel out of the acceptance probability, so their computationally expensive evaluation is not required: the proposal $q(Y \mid U)$ already incorporates the Gaussian prior.

• Tune $\delta$ to achieve an acceptance rate of around 50 to 60%.
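A minimal sketch of one step of the scheme above for a generic latent vector. To stay short and dependency-free it assumes a diagonal prior precision Q, so that $(I + \frac{\delta}{2}Q)^{-1}$ acts elementwise; in the MSV model Q is the banded precision of the AR(1) paths, and that solve would need a banded linear solver instead. The names `aux_langevin_step`, `loglik` (standing for $\log p(F \mid X)$) and `grad` are hypothetical.

```python
import math
import random


def aux_langevin_step(x, loglik, grad, m, q_diag, delta, rng):
    """One auxiliary-Langevin Metropolis-Hastings update of x.

    Prior N(m, Q^{-1}) with diagonal precision Q = diag(q_diag);
    loglik(x) = log p(F | x) and grad(x) is its gradient (a list).
    Returns (new_x, accepted).
    """
    half = 0.5 * delta
    g_x = grad(x)
    # Auxiliary variable: U ~ N(x + (delta/2) G_x, (delta/2) I).
    u = [xi + half * gi + math.sqrt(half) * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g_x)]
    # Proposal q(Y | U), proportional to N(Y | U, (delta/2) I) p(Y); elementwise for diagonal Q.
    y = []
    for ui, mi, qi in zip(u, m, q_diag):
        a = 1.0 + half * qi
        y.append((ui + half * qi * mi) / a + math.sqrt(half / a) * rng.gauss(0.0, 1.0))
    g_y = grad(y)
    # log r: the prior terms p(x) and p(y) have cancelled.
    log_r = (loglik(y) - loglik(x)
             - sum((ui - xi) * gi for ui, xi, gi in zip(u, x, g_x))
             + sum((ui - yi) * gi for ui, yi, gi in zip(u, y, g_y))
             - 0.25 * delta * (sum(gi * gi for gi in g_y) - sum(gi * gi for gi in g_x)))
    if rng.random() < math.exp(min(0.0, log_r)):
        return y, True
    return x, False


# Usage: prior N(0, 1), one observation 2.0 with unit noise, so the posterior is N(1, 1/2).
def toy_loglik(x):
    return -0.5 * (2.0 - x[0]) ** 2


def toy_grad(x):
    return [2.0 - x[0]]


rng = random.Random(1)
x, draws, accepted = [0.0], [], 0
for _ in range(20000):
    x, acc = aux_langevin_step(x, toy_loglik, toy_grad, m=[0.0], q_diag=[1.0], delta=1.0, rng=rng)
    draws.append(x[0])
    accepted += acc
```

If the likelihood is flat, $r = 1$ and every proposal is accepted: the move then samples exactly from the Gaussian prior, which is why the prior never has to be evaluated.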

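$O(K^2)$ computation for the K-factor MSV model

• $f_t \sim N(0, \Sigma_t)$, $\Sigma_t = U_t\Lambda_tU_t^\top$, $U_t = \prod_{j=1}^{K(K-1)/2} G_{jt}$, so that

$$\log \mathrm{MSV}(f_t) = -\frac{K}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{K} h_{it} - \frac{1}{2} v_t^\top v_t, \qquad (1)$$

where $v_t = \Lambda_t^{-1/2}U_t^\top f_t$ and where we used $\log|\Sigma_t| = \log|\Lambda_t| = \sum_{i=1}^{K} h_{it}$.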

• Given $v_t$, expression (1) takes O(K) time to compute.

• $G_{ij}(\omega_{ij,t})^\top f_t$ takes O(1) time to compute, since all of its elements equal the corresponding ones of $f_t$ apart from the i-th and j-th elements, which become $f_t[i]\cos\omega_{ij,t} - f_t[j]\sin\omega_{ij,t}$ and $f_t[i]\sin\omega_{ij,t} + f_t[j]\cos\omega_{ij,t}$, respectively. Applying all $K(K-1)/2$ rotations in turn therefore gives $U_t^\top f_t$ in $O(K^2)$ time.

• Similarly, $\nabla_{h_t}\log\mathrm{MSV}$ and $\nabla_{\omega_{ij,t}}\log\mathrm{MSV}$ can be calculated in $O(K^2)$ time.
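As a worked example for the first of these gradients (our own derivation from (1), not taken from the slides): write $w_t = U_t^\top f_t$, so that $v_{it} = e^{-h_{it}/2} w_{it}$ and $w_t$ does not depend on $h_t$. Then

$$\frac{\partial \log\mathrm{MSV}(f_t)}{\partial h_{it}} = -\frac{1}{2} - \frac{1}{2}\frac{\partial}{\partial h_{it}}\left(e^{-h_{it}} w_{it}^2\right) = -\frac{1}{2} + \frac{1}{2} e^{-h_{it}} w_{it}^2 = \frac{1}{2}\left(v_{it}^2 - 1\right),$$

so once $v_t$ is available the whole gradient with respect to $h_t$ costs only O(K); the $O(K^2)$ cost lies in forming $v_t$ through the Givens rotations.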

$O(N^2)$ computation for the MSV model

Initialize $v_t = f_t$.
for i = 1 to N − 1 do
    for j = i + 1 to N do
        c = cos(ω_{ij,t}), s = sin(ω_{ij,t})
        t1 = v_t[i], t2 = v_t[j]
        v_t[i] ← c · t1 − s · t2
        v_t[j] ← s · t1 + c · t2
    end for
end for
$v_t = v_t \odot \mathrm{diag}(\Lambda_t^{-1/2})$ (elementwise product)
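The pseudocode translates directly into the following sketch, which also evaluates (1). Angles are passed as a flat list in the loop order (1,2), (1,3), …, (N−1,N) and `h` holds the log-eigenvalues; these conventions and the function names are ours. The 2-D usage compares the $O(N^2)$ result with the dense Gaussian log-density.

```python
import math


def givens_sweep(f, omegas):
    """Return U^T f in O(N^2) time for U = prod_{i<j} G_ij.

    Angles are given in the loop order (1,2), (1,3), ..., (N-1,N); each
    G_ij^T changes only entries i and j, so every step is O(1).
    """
    v = list(f)
    n, idx = len(v), 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            c, s = math.cos(omegas[idx]), math.sin(omegas[idx])
            t1, t2 = v[i], v[j]
            v[i] = c * t1 - s * t2
            v[j] = s * t1 + c * t2
            idx += 1
    return v


def log_msv(f, h, omegas):
    """Equation (1): log N(f | 0, U diag(exp(h)) U^T), using v = Lambda^{-1/2} U^T f."""
    k = len(f)
    w = givens_sweep(f, omegas)
    v = [w_i * math.exp(-0.5 * h_i) for w_i, h_i in zip(w, h)]  # elementwise Lambda^{-1/2}
    return -0.5 * k * math.log(2.0 * math.pi) - 0.5 * sum(h) - 0.5 * sum(x * x for x in v)


# Usage: 2-D check against the dense density with Sigma = G diag(exp(h)) G^T.
f, h, omega = [0.7, -0.4], [0.2, -1.0], [0.6]
fast = log_msv(f, h, omega)
c, s = math.cos(omega[0]), math.sin(omega[0])
l1, l2 = math.exp(h[0]), math.exp(h[1])
s00, s11, s01 = c * c * l1 + s * s * l2, s * s * l1 + c * c * l2, c * s * (l2 - l1)
det = s00 * s11 - s01 * s01
quad = (s11 * f[0] ** 2 - 2.0 * s01 * f[0] * f[1] + s00 * f[1] ** 2) / det
dense = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
```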


Sampling the latent factors in O(TNK) time

• $p(f_t \mid \text{rest}) \propto N(r_t \mid Bf_t, \sigma^2 I)\, N(f_t \mid 0, \Sigma_t) = N\left(f_t \mid \sigma^{-2}M_t^{-1}B^\top r_t,\ M_t^{-1}\right)$, where $M_t = \sigma^{-2}B^\top B + \Sigma_t^{-1}$. To simulate from this Gaussian we first need to compute the stochastic volatility matrix $\Sigma_t$ and then the Cholesky decomposition of $M_t$; both operations cost $O(K^3)$.

• We replace the exact Gibbs step with a much faster Metropolis-within-Gibbs step that scales as $O(T(NK + K^2))$.

• To achieve this we apply the same auxiliary Langevin scheme as before.
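For reference, completing the square gives the full conditional above, and the same algebra gives the gradient of $\log p(f_t \mid \text{rest})$ that a Langevin-type step can use. The cost breakdown below is our own reading of the $O(T(NK + K^2))$ claim, not a statement from the slides:

$$\log p(f_t \mid \text{rest}) = -\frac{1}{2\sigma^2}\|r_t - Bf_t\|^2 - \frac{1}{2} f_t^\top\Sigma_t^{-1}f_t + \text{const} = -\frac{1}{2} f_t^\top M_t f_t + \sigma^{-2} f_t^\top B^\top r_t + \text{const},$$

$$\nabla_{f_t}\log p(f_t \mid \text{rest}) = \sigma^{-2}B^\top(r_t - Bf_t) - \Sigma_t^{-1}f_t, \qquad \Sigma_t^{-1}f_t = U_t\Lambda_t^{-1}U_t^\top f_t.$$

The term $B^\top(r_t - Bf_t)$ costs O(NK), and $U_t\Lambda_t^{-1}U_t^\top f_t$ costs $O(K^2)$ with two Givens sweeps (the transposed rotations in loop order, then the untransposed ones in reverse order), so neither $\Sigma_t$ nor a Cholesky factor of $M_t$ has to be formed.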

The Data

• 571 stocks from the STOXX Europe 600 index

• Daily data from 08/01/2010 to 5/1/2014 (T = 2017)

• 36,340 missing values, i.e. 36,340/(571 × 2017) ≈ 3.2%

• Factor model with 30 factors: the dimension of the latent path is 2017 × 30 × 31/2 = 937,905

• Choice of the number of factors: based on predictive performance with respect to the quadratic covariation. We tried 20, 30 and 40 factors.
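The figures on this slide follow from N, T, K and the missing count. The split of the $K(K+1)/2 = 465$ latent states per day into 30 log-eigenvalues and 435 Givens angles follows from the model above; the variable names are ours.

```python
N, T, K = 571, 2017, 30
missing = 36340

missing_rate = missing / (N * T)          # share of missing returns
latent_per_time = K + K * (K - 1) // 2    # K log-eigenvalues + K(K-1)/2 angles
latent_dim = T * latent_per_time          # dimension of the latent path X

print(f"missing rate: {missing_rate:.1%}")       # missing rate: 3.2%
print(f"latent path dimension: {latent_dim:,}")  # latent path dimension: 937,905
```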

[Figure: Next-day minimum variance portfolio weights for the 571 stocks]

[Figure: Pairwise correlations across time]

[Figure: Log-variances across time]
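A sketch of how minimum variance weights can be computed from a covariance forecast in the Givens form. It assumes the standard fully invested, otherwise unconstrained minimum variance portfolio, $w = \Sigma^{-1}\mathbf{1}/(\mathbf{1}^\top\Sigma^{-1}\mathbf{1})$, and treats $\Sigma = U\,\mathrm{diag}(e^{h})\,U^\top$ directly as the return covariance. For the factor model the return covariance would instead be $B\Sigma_tB^\top + \sigma^2 I$, which this sketch does not handle. With that assumption, $\Sigma^{-1}\mathbf{1} = U\Lambda^{-1}U^\top\mathbf{1}$ needs two Givens sweeps and no matrix inversion. Function names and inputs are hypothetical.

```python
import math


def apply_ut(x, omegas):
    """Return U^T x, U = prod_{i<j} G_ij with pairs ordered (1,2), (1,3), ..., (N-1,N)."""
    v, n, idx = list(x), len(x), 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            c, s = math.cos(omegas[idx]), math.sin(omegas[idx])
            v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
            idx += 1
    return v


def apply_u(x, omegas):
    """Return U x by applying each (untransposed) G_ij in reverse pair order."""
    v, n = list(x), len(x)
    pairs = [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
    for (i, j), omega in reversed(list(zip(pairs, omegas))):
        c, s = math.cos(omega), math.sin(omega)
        v[i], v[j] = c * v[i] + s * v[j], -s * v[i] + c * v[j]
    return v


def min_variance_weights(h, omegas):
    """w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1) for Sigma = U diag(exp(h)) U^T."""
    z = apply_ut([1.0] * len(h), omegas)
    z = [z_i * math.exp(-h_i) for z_i, h_i in zip(z, h)]  # Lambda^{-1} U^T 1
    y = apply_u(z, omegas)                                # Sigma^{-1} 1
    total = sum(y)
    return [y_i / total for y_i in y]


w = min_variance_weights([0.1, -0.4, 0.5], [0.3, -1.1, 0.7])
```

With all angles equal to zero, $\Sigma$ is diagonal and the weights reduce to inverse-variance weights.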

• January 2009: banking shares in the UK plummet as the Royal Bank of Scotland posts the biggest loss in British history. The Bank of England reduces the base rate of interest to a new historic low of 1%. The U.S. economy lost 598,000 jobs during January 2009, with unemployment rising to 7.6 percent. Bankruptcies in the United Kingdom rose during 2008 by 50 percent to an all-time high. California's Alliance Bank and Georgia's FirstBank are closed, raising the number of 2009 U.S. bank failures to eight.

• July 2012: the chairman and the chief executive of British bank Barclays resign following a scandal in which the bank tried to manipulate the Libor and Euribor interest rate systems. The central banks of the European Union, Great Britain and the People's Republic of China, in what appears to be a co-ordinated action, each loosen their respective monetary policies.

Discussion

• Incorporation of leverage effects and jumps

• Small N: nested Laplace approximations (PhD thesis by Plataniotis, AUEB), importance sampling based on copulas (in progress)

• Bayesian model determination for the number of factors

• Relations with other pCN (preconditioned Crank-Nicolson) proposals

This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement n° 320270

www.syrtoproject.eu

This document reflects only the author’s views.

The European Union is not liable for any use that may be made of the information contained therein.

Economy & Finance
