Balance Sheet XVA by Deep Learning and GPU

Stephane Crepey1, Rodney Hoskinson2, and Bouazza Saadeddine1,3

September 22, 2019

Abstract

Two competing XVA paradigms are the semi-replication framework and a cost-of-capital, incomplete market approach. Burgard and Kjaer once dismissed an earlier incarnation of the Albanese and Crepey holistic, incomplete market XVA model as being well-grounded economically but difficult to solve explicitly. We show that this model, set on a forward/backward SDE formulation, is not only elegant, but can also be solved efficiently using GPU computing combined with artificial intelligence methods in a whole bank balance sheet context. We calculate the mark-to-market cube (or its increment, in the context of trade incremental XVA computations) using GPU computing and the XVA processes using deep learning regression and quantile regression methods.

Keywords: Counterparty risk, X-valuation adjustment (XVA), deep learning, graphics processing units (GPU).

Mathematics Subject Classification: 91B25, 91B26, 91B30, 91G20, 91G40, 62G08, 62J02, 62M45, 68Q32, 82C32.

1 Introduction

XVAs, with X as C for credit, D for debt, F for funding, M for margin, or K for capital, are post-2007–09 crisis valuation adjustments for financial derivatives. Two competing XVA paradigms are the “semi-replication” framework of Burgard and Kjaer (2011, 2013, 2017), Green, Kenyon, and Dennis (2014), and the cost-of-capital, incomplete market approach of Albanese, Caenazzo, and Crepey (2016, 2017), Albanese and Crepey (2019b). Burgard and Kjaer once dismissed an earlier incarnation of the Albanese and Crepey holistic (balance sheet dynamics), incomplete market XVA model as being well-grounded economically but difficult to solve explicitly. In this paper we show that the Albanese and Crepey model, set on a forward/backward SDE formulation, is not only elegant, but can also be solved efficiently using GPU computing combined with machine learning methods in a whole bank balance sheet context.

1 LaMME, Univ Evry, CNRS, Universite Paris-Saclay, 91037, Evry, France
2 ANZ Banking Group, Singapore
3 Quantitative Research GMD/GMT Credit Agricole CIB, Paris

Acknowledgement: We are grateful for their comments to the participants of the Quantminds 2019 conference, in particular Marc Chataigner, Matthew Dixon, and Chris Kenyon, and to Lokman Abbas-Turki for his expert advice on the GPU implementation.

Disclaimers: The views expressed herein by Rodney Hoskinson are his personal views and do not necessarily reflect the views of ANZ Banking Group. They are not guaranteed fit for any purpose so use at your own risk.

Email addresses: [email protected], [email protected], [email protected]

1.1 Outline and Contribution

Section 2 is a succinct presentation of the XVA equations. The XVA deep learning regression computational strategy is presented in Section 3. Section 4 is a numerical case study on large, multi-counterparty portfolios of interest rate swaps. Section A provides a primer on the CNTK-based deep learning computational toolkit used in our numerics. The underlying interest rate and FX model is briefly presented in Section B.

This paper represents an XVA implementation milestone. The first generation consisted of nested Monte Carlo implemented by explicit CUDA programming on GPUs (see Albanese, Caenazzo, and Crepey (2017), Abbas-Turki, Diallo, and Crepey (2018)). The second generation (this paper) leverages GPUs via pre-coded CUDA/AAD deep learning packages that are used for the embedded regression and quantile regression tasks.

Further developments, mainly of an engineering nature, would include XVA AAD sensitivities and an assessment of the scalability of the approach when hybrid portfolios are considered.

2 XVA Equations

In line with a cost-of-capital XVA rationale, we stick to bank shareholder valuation, i.e. valuation of pre-bank default cash flows only. As detailed in Albanese and Crepey (2019a, Section 4.1; see also Remark 2.1 and the discussion preceding it there) and Crepey and Song (2017, Section 4.2), shareholder valuation is tantamount to valuation of all (as opposed to pre-bank default restricted) cash flows, but with respect to the singular bank survival probability measure. We denote by $\mathbb{E}_t$, $\mathrm{VaR}_t$, and $\mathrm{ES}_t$ (and simply, in case $t = 0$, $\mathbb{E}$, $\mathrm{VaR}$, and $\mathrm{ES}$) the time-$t$ conditional expectation, value-at-risk, and expected shortfall with respect to the latter stochastic basis.

As a consequence, all our XVAs are nonnegative and, even though we do crucially include the default of the bank itself in our modeling, unilateral. This makes them naturally in line with the regulatory requirement that capital should not diminish as an effect of the sole deterioration of the bank credit spread.

To alleviate the notation, we use the risk-free asset as numeraire.

2.1 Cash Flows

The client derivative portfolio of a bank, with final maturity $T$, is partitioned into bilateral netting sets of contracts which are jointly collateralized and liquidated upon client or bank default. Given a netting set $c$ of the client portfolio, we denote by:

• $\mathcal{P}^c$ and $P^c$, the corresponding contractually promised cash flows and clean value processes;

• $\tau_c$, $J^c$, and $R_c$, the corresponding default times, survival indicators, and recovery rates, whereas $\tau$, $J$, and $R$ are the analogous data regarding the bank itself, with bank credit spread process $\lambda$ taken as a proxy of its risky funding spread process (referring the reader to Albanese et al. (2017) and Armenti and Crepey (2019) for the discussion of cheaper funding schemes for initial margin);

• $\tau_c^\delta = \tau_c + \delta$ and $\tau^\delta = \tau + \delta$, where $\delta$ is a positive margin period of risk, in the sense that the liquidation of the netting set $c$ happens at time $\tau_c^\delta \wedge \tau^\delta$;

• $\mathrm{VM}^c$, $\mathrm{PIM}^c$, and $\mathrm{RIM}^c$, the variation margin (re-hypothecable collateral) exchanged between the bank and client $c$, counted positively when received by the bank, and the related initial margin (segregated collateral) posted and received by the bank, all assumed stopped before $\tau_c \wedge \tau$;

• RC and CR, the reserve capital and capital at risk of the bank, respectively meant to cope with the expected and the beyond-expected losses of the bank.

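The contractually promised cash flows are supposed to be hedged out by the bank (but one conservatively assumes no XVA hedge), so that the bank is left with the following trading cash flows $\mathcal{C}$ and $\mathcal{F}$ (see Albanese and Crepey (2019a, Lemmas 5.2 and 5.3) for detailed derivations of analogous equations in a slightly simplified setup), where $\delta$ with a time subscript denotes a Dirac measure:

• The (counterparty) credit cash flows

$$
\begin{aligned}
d\mathcal{C}_t ={}& \sum_{c;\,\tau_c \le \tau^\delta} (1-R_c)\Big((P^c+\mathcal{P}^c)_{\tau_c^\delta\wedge\tau^\delta} - (\mathcal{P}^c+\mathrm{VM}^c+\mathrm{RIM}^c)_{(\tau_c\wedge\tau)-}\Big)^+ \delta_{\tau_c^\delta\wedge\tau^\delta}(dt)\\
&- (1-R)\sum_{c;\,\tau\le\tau_c^\delta}\Big((P^c+\mathcal{P}^c)_{\tau^\delta\wedge\tau_c^\delta} - (\mathcal{P}^c-\mathrm{VM}^c+\mathrm{PIM}^c)_{(\tau\wedge\tau_c)-}\Big)^- \delta_{\tau^\delta\wedge\tau_c^\delta}(dt);
\end{aligned}
\tag{1}
$$

• The (risky) funding cash flows

$$
\begin{aligned}
d\mathcal{F}_t ={}& J_t\lambda_t\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{RC}-\mathrm{CR}\Big)^+_t dt
- (1-R)\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{RC}-\mathrm{CR}\Big)^+_{\tau-}\delta_\tau(dt)\\
&+ J_t\lambda_t\sum_c J^c_t\,\mathrm{PIM}^c_t\,dt - (1-R)\sum_c J^c_{\tau-}\mathrm{PIM}^c_{\tau-}\,\delta_\tau(dt),
\end{aligned}
\tag{2}
$$

where the RC and CR terms account for the fungibility of reserve capital and capital at risk with variation margin.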

2.2 Valuation

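The (other than K)VA equations are

$$
\mathrm{RC} = \mathrm{CA} = \mathrm{CVA} + \mathrm{FVA} + \mathrm{MVA}, \tag{3}
$$

the so-called “contra-assets valuation” sourced from the clients and deposited in the reserve capital account of the bank, where, for $t < \tau$,

$$
\begin{aligned}
\mathrm{CVA}_t &= \mathbb{E}_t\sum_{c;\,t<\tau_c^\delta}(1-R_c)\Big((P^c+\mathcal{P}^c)_{\tau_c^\delta} - (\mathcal{P}^c+\mathrm{VM}^c+\mathrm{RIM}^c)_{\tau_c-}\Big)^+,\\
\mathrm{FVA}_t &= \mathbb{E}_t\int_t^T \lambda_s\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{CA}-\mathrm{CR}\Big)^+_s ds,\\
\mathrm{MVA}_t &= \mathbb{E}_t\int_t^T \lambda_s\sum_c J^c_s\,\mathrm{PIM}^c_s\,ds,
\end{aligned}
\tag{4}
$$

giving rise to a trading loss and profit process $L$ of the bank such that $L_0 = 0$ and, for $t < \tau$,

$$
\begin{aligned}
dL_t ={}& \sum_c (1-R_c)\Big((P^c+\mathcal{P}^c)_{\tau_c^\delta} - (\mathcal{P}^c+\mathrm{VM}^c+\mathrm{RIM}^c)_{\tau_c-}\Big)^+\delta_{\tau_c^\delta}(dt)\\
&+ \lambda_t\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{CA}-\mathrm{CR}\Big)^+_t dt\\
&+ \lambda_t\sum_c J^c_t\,\mathrm{PIM}^c_t\,dt\\
&+ d\mathrm{CA}_t.
\end{aligned}
\tag{5}
$$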

The capital at risk (CR) of the bank is its resource devoted to coping with losses beyond their expected levels, the latter being already taken care of by reserve capital CA.

Definition 2.1 $\mathrm{EC}_t$ is the time-$t$ conditional 97.5% expected shortfall$^1$ of $(L^{\circ}_{t+1} - L^{\circ}_t)$ under $\mathbb{Q}$.

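The (KVA) risk margin is loss-absorbing, hence part of capital at risk. As a consequence, shareholder capital at risk (SCR) is only the difference between the capital at risk (CR) of the bank and the (to be determined) KVA risk margin, i.e.

$$
\mathrm{SCR} = \mathrm{CR} - \mathrm{KVA}. \tag{6}
$$

Given a positive target hurdle rate $h$:

Definition 2.2 We set

$$
\mathrm{CR} = \max(\mathrm{EC}, \mathrm{KVA}), \tag{7}
$$

for a KVA process such that, for $t < \tau$,

$$
\mathrm{KVA}_t = \mathbb{E}_t\Big[\int_t^T h\big(\mathrm{CR}_s - \mathrm{KVA}_s\big)\,ds\Big]. \tag{8}
$$

That is, for $t < \tau$,

$$
\begin{aligned}
\mathrm{KVA}_t &= \mathbb{E}_t\Big[\int_t^T h e^{-h(s-t)}\,\mathrm{CR}_s\,ds\Big]\\
&= \mathbb{E}_t\Big[\int_t^T h e^{-h(s-t)}\max\big(\mathrm{EC}_s,\mathrm{KVA}_s\big)\,ds\Big].
\end{aligned}
\tag{9}
$$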
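The passage from (8) to (9) can be spelled out as follows. This is only a sketch, assuming enough integrability for the local martingale $M$ below to be a true martingale; the symbol $M$ is ours.

```latex
% (8) with KVA_T = 0 means that KVA_t + \int_0^t h(CR_s - KVA_s) ds is a martingale M, i.e.
d\mathrm{KVA}_t = -h\big(\mathrm{CR}_t - \mathrm{KVA}_t\big)\,dt + dM_t .
% Multiplying by the integrating factor e^{-ht}:
d\big(e^{-ht}\mathrm{KVA}_t\big) = e^{-ht}\big(d\mathrm{KVA}_t - h\,\mathrm{KVA}_t\,dt\big)
  = -h e^{-ht}\,\mathrm{CR}_t\,dt + e^{-ht}\,dM_t .
% Integrating over [t,T], using KVA_T = 0 and taking time-t conditional expectations:
-e^{-ht}\,\mathrm{KVA}_t = -\,\mathbb{E}_t\int_t^T h e^{-hs}\,\mathrm{CR}_s\,ds
\quad\Longrightarrow\quad
\mathrm{KVA}_t = \mathbb{E}_t\int_t^T h e^{-h(s-t)}\,\mathrm{CR}_s\,ds .
% Worked example: if EC_s = C is constant, then KVA <= C, hence CR = C and
\mathrm{KVA}_t = C\big(1 - e^{-h(T-t)}\big).
```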

The next-to-last identity is the continuous-time analog of the risk margin formula under the Swiss solvency test cost of capital methodology: see Swiss Federal Office of Private Insurance (2006, Section 6, middle of page 86 and top of page 88). This formula can be used either in the direct mode, for computing the KVA corresponding to a given $h$, or in the reverse-engineering mode, for defining the “implied hurdle rate” associated with the actual level on the risk margin account of the bank. Cost of capital proxies have always been used to estimate return on equity (ROE). The KVA is a refinement, dynamic and fine-tuned for derivative portfolios, but the base ROE concept itself is far older than even the CVA. In particular, the KVA is very useful in the context of collateral and capital optimization.

1 See e.g. Follmer and Schied (2016, Section 4.4).

2.3 The XVA Equations are Well-Posed

In view of (3), the second line in (4) is in fact an FVA equation. Likewise, the second line in (9) is a KVA equation. Moreover, as capital at risk is fungible with variation margin, i.e. in consideration of the CR terms in (4)-(5), where CR = max(EC, KVA), we actually deal with an (FVA, KVA) system, and even, as EC depends on $L$ (cf. Definition 2.1), with a forward backward system for the forward loss process $L$ and the backward pair (FVA, KVA).

However, the coupling between (FVA, KVA) and $L$ can be disentangled by the following Picard iteration:

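• Let CVA and MVA be as in (4), $L^{(0)} = \mathrm{KVA}^{(0)} = 0$, and, for $t < \tau$,

$$
\mathrm{FVA}^{(0)}_t = \mathbb{E}_t\int_t^T \lambda_s\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{CA}^{(0)}\Big)^+_s ds, \tag{10}
$$

where $\mathrm{CA}^{(0)} = \mathrm{CVA} + \mathrm{FVA}^{(0)} + \mathrm{MVA}$;

– Hence $\mathrm{FVA}^{(0)}$ is the FVA accounting only for the re-hypothecation of the variation margin received on hedges, and reflecting the possible use of reserve (but not risk) capital as VM;

• For $k \ge 1$, writing explicitly $\mathrm{EC} = \mathrm{EC}(L)$ to emphasize the dependence of EC on $L$, let $L^{(k)}_0 = 0$ and, for $t < \tau$,

$$
\begin{aligned}
dL^{(k)}_t ={}& \sum_c (1-R_c)\Big((P^c+\mathcal{P}^c)_{\tau_c^\delta} - (\mathcal{P}^c+\mathrm{VM}^c+\mathrm{RIM}^c)_{\tau_c-}\Big)^+\delta_{\tau_c^\delta}(dt)\\
&+ \lambda_t\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{CA}^{(k-1)}-\max\big(\mathrm{EC}(L^{(k-1)}),\mathrm{KVA}^{(k-1)}\big)\Big)^+_t dt\\
&+ \lambda_t\sum_c J^c_t\,\mathrm{PIM}^c_t\,dt + d\mathrm{CA}^{(k-1)}_t,\\
\mathrm{KVA}^{(k)}_t ={}& h\,\mathbb{E}_t\int_t^T e^{-h(s-t)}\max\big(\mathrm{EC}_s(L^{(k)}),\mathrm{KVA}^{(k)}_s\big)\,ds,\\
\mathrm{CA}^{(k)}_t ={}& \mathrm{CVA}_t + \mathrm{FVA}^{(k)}_t + \mathrm{MVA}_t, \text{ where}\\
\mathrm{FVA}^{(k)}_t ={}& \mathbb{E}_t\int_t^T \lambda_s\Big(\sum_c J^c(P^c-\mathrm{VM}^c)-\mathrm{CA}^{(k)}-\max\big(\mathrm{EC}(L^{(k)}),\mathrm{KVA}^{(k)}\big)\Big)^+_s ds.
\end{aligned}
\tag{11}
$$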

Theorem 4.1 in Crepey, Elie, Sabbagh, and Song (2019). Assuming square integrable data, the XVA equations are well-posed within square integrable solutions (including when one accounts for the fact that capital can be used for funding variation margin). Moreover, the above Picard iteration converges to the unique square integrable solution of the XVA equations.

2.4 Collateralization Schemes

We denote by $\Delta^c_t = \mathcal{P}^c_t - \mathcal{P}^c_{(t-\delta)-}$ the cumulative contractual cash flows with the client $c$ accumulated over a past period of length $\delta$. We consider both “no CSA” netting sets $c$, with VM = RIM = PIM = 0, and “(VM/IM) CSA” netting sets $c$, with $\mathrm{VM}^c_t = P^c_t$ and, for $t \le \tau_c$,

$$
\mathrm{RIM}^c_t = \mathrm{VaR}_t\big((P^c_{t^\delta} + \Delta^c_{t^\delta}) - P^c_t\big),\qquad
\mathrm{PIM}^c_t = \mathrm{VaR}_t\big(-(P^c_{t^\delta} + \Delta^c_{t^\delta}) + P^c_t\big), \tag{12}
$$

for some PIM and RIM quantile levels $a_{pim}$ and $a_{rim}$ (and $t^\delta = t + \delta$). The default times of the clients and the bank itself are jointly modeled by a “common shock” or dynamic Marshall-Olkin copula as per Crepey, Bielecki, and Brigo (2014, Chapt. 8–10) and Crepey and Song (2016) (see also Elouerkhaoui (2007, 2017)).

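The following result can be derived by computations similar to the ones in Armenti and Crepey (2019, Appendix).

Proposition 2.1 In a common shock default model of the clients and the bank itself, with pre-default intensity processes $\gamma^c$ of the clients and assuming continuous market factors, then $\mathrm{CVA} = \mathrm{CVA}^{nocsa} + \mathrm{CVA}^{csa}$, where, for $t < \tau$,

$$
\begin{aligned}
\mathrm{CVA}^{nocsa}_t ={}& \sum_{c\ \mathrm{nocsa}} \mathbf{1}_{t<\tau_c}(1-R_c)\,\mathbb{E}_t\int_t^T \big(P^c_{s^\delta}+\Delta^c_{s^\delta}\big)^+\gamma^c_s e^{-\int_t^s\gamma^c_u du}\,ds\\
&+ \sum_{c\ \mathrm{nocsa}} \mathbf{1}_{\tau_c<t<\tau_c^\delta}(1-R_c)\,\mathbb{E}_t\big(P^c_{\tau_c^\delta}+\Delta^c_{\tau_c^\delta}\big)^+,
\end{aligned}
\tag{13}
$$

$$
\begin{aligned}
\mathrm{CVA}^{csa}_t ={}& \sum_{c\ \mathrm{csa}} \mathbf{1}_{t<\tau_c}(1-R_c)(1-a_{rim})\times\\
&\qquad \mathbb{E}_t\int_t^T \big(\mathrm{ES}_s-\mathrm{VaR}_s\big)\big((P^c_{s^\delta}+\Delta^c_{s^\delta})-P^c_s\big)\,\gamma^c_s e^{-\int_t^s\gamma^c_u du}\,ds\\
&+ \sum_{c\ \mathrm{csa}} \mathbf{1}_{\tau_c<t<\tau_c^\delta}(1-R_c)\,\mathbb{E}_t\Big(\big(P^c_{\tau_c^\delta}+\Delta^c_{\tau_c^\delta}\big)-\big(P^c_{\tau_c}+\mathrm{RIM}^c_{\tau_c}\big)\Big)^+,
\end{aligned}
\tag{14}
$$

where $(\mathrm{ES}_s - \mathrm{VaR}_s)$ in (14) is computed at the $a_{rim}$ confidence level.

Assuming its posted initial margin borrowed unsecured by the bank, then $\mathrm{MVA} = \mathrm{MVA}^{csa}$, where, for $t < \tau$,

$$
\mathrm{MVA}^{csa}_t = \sum_{c\ \mathrm{csa}} J^c_t\,\mathbb{E}_t\int_t^T (1-R)\,\gamma^c_s\,\mathrm{PIM}^c_s\,e^{-\int_t^s\gamma^c_u du}\,ds. \tag{15}
$$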
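The factor $(1-a_{rim})(\mathrm{ES}_s - \mathrm{VaR}_s)$ in (14) reflects a standard identity, sketched here for a generic loss $X$ with continuous distribution and confidence level $a$ (the symbols $X$ and $a$ are ours, and $\mathrm{ES}_a(X)$ is understood as the mean of $X$ beyond $\mathrm{VaR}_a(X)$):

```latex
\mathbb{E}\big[(X - \mathrm{VaR}_a(X))^+\big]
  = \mathbb{P}\big(X > \mathrm{VaR}_a(X)\big)\,
    \mathbb{E}\big[X - \mathrm{VaR}_a(X)\,\big|\,X > \mathrm{VaR}_a(X)\big]
  = (1-a)\big(\mathrm{ES}_a(X) - \mathrm{VaR}_a(X)\big).
```

Applied conditionally at time $s$ with $X = (P^c_{s^\delta}+\Delta^c_{s^\delta}) - P^c_s$ and $a = a_{rim}$, so that $\mathrm{RIM}^c_s = \mathrm{VaR}_s(X)$ as in (12), this turns the expected exposure beyond the received initial margin into the integrand of (14).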

3 XVAs Computational Strategy with Deep Learning

3.1 Computational challenges

A CVA as per (4) can be computed as the sum of the CVAs restricted to each netting set (i.e. client of the bank) $c$. The (initial margins and) the MVA are also most accurately calculated at each netting set level. By contrast, we deal with a semilinear FVA equation that can only be solved at the level of the overall client portfolio of the bank. The KVA can only be computed at the level of the overall portfolio and relies on conditional risk measures of future fluctuations of the trading loss process $L$, which itself involves future fluctuations of other XVAs (as these are part of the bank liabilities).

These are heavy computations encompassing all the derivative contracts of the bank. Yet these computations require accuracy so that trade incremental XVA computations (which are required as XVA add-ons to derivative entry prices, see Albanese and Crepey (2019b, Section 5)) are not in the numerical noise of the machinery.

3.2 Deep Regression XVA Framework

As developed in Abbas-Turki, Diallo, and Crepey (2018), computational strategies for the XVA equations involve a mix of nested Monte Carlo and of simulation/regression schemes, optimally implemented on GPUs. In view of Figure 1 in Abbas-Turki, Diallo, and Crepey (2018), a pure nested Monte Carlo approach would involve five nested layers of simulation. Moreover, nested Monte Carlo implies intensive repricing of the mark-to-market cube, i.e. pathwise clean valuation $P^c$ of each netting set $c$, and/or high dimensional interpolation.

In what follows, we use no nested Monte Carlo or conditional repricing of future MtM cubes: beyond the base MtM layer, each successive XVA layer is “learned” instead.

Specifically, we calculate the mark-to-market cube (or the impact on the cube of a new trade, in the context of trade incremental XVA computations) using GPU computing, the pathwise XVAs by deep learning regression (an extension of Longstaff and Schwartz (2001) type schemes to deep neural network regression bases, as also considered in Hure, Pham, and Warin (2019) or Beck, Becker, Cheridito, Jentzen, and Neufeld (2019)), and the conditional value-at-risks and expected shortfalls involved in the embedded pathwise EC and IM computations by deep quantile regression, as follows.

We recall from Fissler, Ziegel, and Gneiting (2016) and Fissler and Ziegel (2016) that value-at-risk is elicitable, expected shortfall is not, but their pair is jointly elicitable. As a consequence (cf. Dimitriadis and Bayer (2019)), given features $X$ and labels $Y$ (random variables), the conditional value-at-risk and expected shortfall functions $q(\cdot)$ and $e(\cdot)$ such that $\mathrm{VaR}(Y|X) = q(X)$ and $\mathrm{ES}(Y|X) = e(X)$ can be jointly characterized as the minimizer, over all measurable pair-functions $(q(\cdot), e(\cdot))$, of the error (expected loss)

$$
\mathbb{E}\,\rho\big(q(\cdot), e(\cdot); X, Y\big), \tag{16}
$$

for a loss function $\rho$ of the form

$$
\begin{aligned}
\rho\big(q(\cdot), e(\cdot); X, Y\big) ={}& \big(\mathbf{1}_{\{Y<q(X)\}} - \alpha\big)G_1(q(X)) - \mathbf{1}_{\{Y<q(X)\}}G_1(Y)\\
&+ G_2(e(X))\Big(e(X) - q(X) + \alpha^{-1}\big(q(X) - Y\big)\mathbf{1}_{\{Y<q(X)\}}\Big) - \mathcal{G}_2(e(X)) + G(Y),
\end{aligned}
$$

where the functions $G_1$, $\mathcal{G}_2$, $G_2 = \mathcal{G}_2'$, and $G$ satisfy suitable regularity conditions. The last term, $G$, plays no role in the minimization, but is meant to guarantee nonnegativity and the possibility to define a related training pseudo-R-squared.

In practice, we minimize numerically the error (16) for $G_1(z) = z$, $\mathcal{G}_2(z) = \ln(1 + e^z)$, and $G(z) = \alpha G_1(z) + \mathcal{G}_2(z)$, based on $m$ independent simulated values of $(X, Y)$, with the minimization restricted to a deep neural network family of functions $(q, e)(x) \equiv (q, e)_\theta(x)$, with inputs $X$ and network weights $\theta$ (multilayer perceptron, see e.g. Goodfellow, Bengio, and Courville (2017), while Dimitriadis and Bayer (2019) restrict themselves to linear regression). The minimizing pair $(q, e)_\theta$ represents the two scalar neural network outputs for the respective conditional VaR and ES functions $(q, e)$.
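As a plain scalar illustration of the loss above (independent of any deep learning toolkit), the following sketch evaluates the pointwise loss for one triple (q, e, y), under the reading that $\mathcal{G}_2(z) = \ln(1+e^z)$ is the antiderivative term and $G_2 = \mathcal{G}_2'$ is the logistic function. The function names are hypothetical and this is not the CNTK code of Section A.1.

```cpp
#include <cmath>

// Antiderivative term: ln(1 + e^z).
double softplus(double z) { return std::log1p(std::exp(z)); }

// Its derivative G2: the logistic function.
double logistic(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Pointwise joint VaR/ES loss for predicted quantile q, predicted expected
// shortfall e, realized label y and level alpha, with G1(z) = z and
// G(z) = alpha * G1(z) + ln(1 + e^z).
double varEsLoss(double q, double e, double y, double alpha)
{
    const double hit = (y < q) ? 1.0 : 0.0;  // indicator 1{Y < q(X)}
    const double quantilePart = (hit - alpha) * q - hit * y;
    const double esPart =
        logistic(e) * (e - q + hit * (q - y) / alpha) - softplus(e);
    return quantilePart + esPart + alpha * y + softplus(y);
}
```

With these choices the loss vanishes when q = e = y, which is the role of the additive term $G(Y)$ described above; averaging `varEsLoss` over the simulated sample gives the empirical counterpart of (16).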

The left and right panels of Figure 1 show the respective neural networks for pathwise value-at-risk/expected shortfall (with error (16)) and pathwise XVAs (with classical quadratic norm error). Beyond their now extensively documented “unreasonably good” generalization and scalability performances (see Goodfellow et al. (2017) and Section 4.3), in the case of conditional value-at-risk and expected shortfall computations, deep learning (quantile) regression is also easier to implement (see Section A) than more naive methods, such as resimulation and sort-based schemes for the value-at-risk and expected shortfall at each outer node of a nested Monte Carlo simulation (cf. Barrera, Crepey, Diallo, Fort, Gobet, and Stazhynski (2019)).
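For comparison, the sort-based scheme mentioned above can be sketched as follows for a single sample of losses (in a nested Monte Carlo it would be repeated at each outer node). The estimator convention is an assumption made for illustration: VaR is taken as the ceil(a n)-th order statistic and ES as the average of the sample values at or above it; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns {VaR, ES} at confidence level a in (0, 1) from a non-empty sample.
// The sample is taken by value because it is sorted in place.
std::pair<double, double> empiricalVarEs(std::vector<double> losses, double a)
{
    std::sort(losses.begin(), losses.end());
    const std::size_t n = losses.size();
    // Index of the ceil(a * n)-th order statistic (0-based).
    const std::size_t k =
        static_cast<std::size_t>(std::ceil(a * static_cast<double>(n))) - 1;
    const double var = losses[k];
    const double tailSum = std::accumulate(
        losses.begin() + static_cast<std::ptrdiff_t>(k), losses.end(), 0.0);
    return {var, tailSum / static_cast<double>(n - k)};
}
```

Each call costs a full sort of the inner sample, which is what the regression approach avoids by learning the conditional functions once per pricing time.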

[Figure 1 diagrams: each network has an input layer (risk factors $RF^1_t, \ldots, RF^n_t$), 3 hidden layers of 20 nodes each ($H_{1,1}, \ldots, H_{20,3}$), and an output layer. Left network outputs: VaR and ES of $\mathrm{XVA}_t$. Right network output: $\mathrm{XVA}_t$.]

Figure 1: Neural networks with state variables (realizations of the risk factors at the considered pricing time) as features, 100 epochs, full batch training, 3 by 20 hidden nodes. (Left) Joint value-at-risk/expected shortfall neural network: labels are pathwise XVA items (e.g. loss function 1 year increment), output is the joint estimate of the pathwise conditional value-at-risk and expected shortfall of the label given the features at a selected confidence level; hyperbolic tangent activation. (Right) XVAs neural network: labels are pathwise XVA items (e.g. CVA), output is the estimate of the pathwise conditional mean of the label given the features at selected alpha level; ReLU activation.

3.3 Deep XVAs Algorithm

Given pricing times $t_i$ with time step $h$, our fully discrete (time and space) algorithm for simulating the Picard iteration (11) until numerical convergence to the XVA processes goes as follows:

• For each pricing time $t = t_i$ and client $c$, learn the corresponding $\mathrm{VaR}_t$ and $\mathrm{ES}_t$ terms visible in (12) or (under the time-discretized outer integral in) (14);

• For each pricing time $t = t_i$ and client $c$, learn the corresponding $\mathbb{E}_t$ terms visible in (13) through (15);

• For $\mathrm{FVA}^{(0)}$, consider the following time discretization of (10) with time step $h$:

$$
\mathrm{FVA}^{(0)}_t \approx \mathbb{E}_t\big[\beta_t\beta^{-1}_{t+h}\mathrm{FVA}^{(0)}_{t+h}\big] + h\lambda_t\Big(\sum_c J^c_t\big(P^c_t-\mathrm{VM}^c_t\big)-\mathrm{CA}^{(0)}_t\Big)^+ \tag{17}
$$

and, for each $t = t_i$, learn the corresponding $\mathbb{E}_t$ in (17), then solve the semi-linear equation for $\mathrm{FVA}^{(0)}_t$ (where $\mathrm{CA}^{(0)}_t$ includes $\mathrm{FVA}^{(0)}_t$, see after (10));

• For each Picard iteration $k$, simulate forward $L^{(k)}$ as per the first line in (11) (which only uses known or already learned quantities), and:

– For economic capital $\mathrm{EC}(L^{(k)})$, for each $t = t_i$, learn $\mathrm{ES}_{t_i}\big((L^{(k)})^{\circ}_{t_i+1} - (L^{(k)})^{\circ}_{t_i}\big)$ (cf. Definition 2.1);

– $\mathrm{KVA}^{(k)}$ and $\mathrm{FVA}^{(k)}$ then require a backward recursion solved by deep learning approximation much like the one for $\mathrm{FVA}^{(0)}$ above.

Numerically, one thus iterates the Picard iteration (11) over $k$ as many times as is required to reach a fixed point within a preset accuracy. For the reasons explained after Equation (32) in Albanese, Caenazzo, and Crepey (2017, Section 5), two to three iterations ($k = 2$ or 3) are sufficient.


4 Swap Portfolio Case Study

We consider an interest rate swap portfolio case study with 10 clients in different economies, involving 10 one-factor Hull-White interest rates, 9 Black-Scholes exchange rates, and 11 Cox-Ingersoll-Ross default intensity processes (common shock drivers, cf. Section 2.4), i.e. up to 40 risk factors used as deep learning features (including the client default indicators): see Section B for more detail.

In this model we consider a bank portfolio of 10,000 randomly generated swap trades, with

• swap rates uniformly distributed on [0.005, 0.05],

• number of six-monthly coupon resets uniform on [5, ..., 60],

• trade currency and counterparty both uniform on [1, 2, 3, ..., 10],

• notional uniform on [10000, 20000, ..., 100000],

• direction: either “asset-heavy” bank 75% likely to pay fixed in the swaps, or “liability-heavy” bank 75% likely to receive fixed,

• collateralization: either “no CSA” portfolio without initial margin (IM) or variation margin (VM), or “CSA” portfolio with VM = MtM and IM posted by the bank (PIM) pledged at 99% gap risk value-at-risk, IM received by the bank (RIM) covering 75% gap risk and leaving excess as residual gap CVA,

• 97.5% expected shortfall of 1-year ahead loss process for economic capital,

• incremental trade given as a par 30 year swap with 100k notional.
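The random portfolio specification above can be sketched as follows. This is a minimal illustration, assuming independent draws for all trade attributes and a standard Mersenne Twister generator; the struct and function names are hypothetical and this is not the generator used for the results reported here.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct SwapTrade {
    double swapRate;     // fixed rate, uniform on [0.005, 0.05)
    int numResets;       // six-monthly coupon resets, uniform on {5, ..., 60}
    int currency;        // uniform on {1, ..., 10}
    int counterparty;    // uniform on {1, ..., 10}
    double notional;     // uniform on {10000, 20000, ..., 100000}
    bool bankPaysFixed;  // true with probability payFixedProb
};

// payFixedProb = 0.75 mimics the "asset-heavy" bank and 0.25 the
// "liability-heavy" bank.
std::vector<SwapTrade> generatePortfolio(std::size_t numTrades,
                                         double payFixedProb, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> rate(0.005, 0.05);
    std::uniform_int_distribution<int> resets(5, 60);
    std::uniform_int_distribution<int> oneToTen(1, 10);
    std::bernoulli_distribution paysFixed(payFixedProb);

    std::vector<SwapTrade> trades;
    trades.reserve(numTrades);
    for (std::size_t i = 0; i < numTrades; ++i) {
        SwapTrade t;
        t.swapRate = rate(rng);
        t.numResets = resets(rng);
        t.currency = oneToTen(rng);
        t.counterparty = oneToTen(rng);
        t.notional = 10000.0 * oneToTen(rng);
        t.bankPaysFixed = paysFixed(rng);
        trades.push_back(t);
    }
    return trades;
}
```

Fixing the seed makes the portfolio reproducible across runs on a given platform, which matters when the same portfolio is repriced with and without an incremental trade.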

We use GPU-based Monte Carlo simulation with $10^5$ paths, 10 time steps per year for the risk factor evolution, and 2 portfolio pricings per year. Our deep learning XVA implementation uses CNTK, the Microsoft Cognitive Toolkit. The figures that follow only display profiles (i.e. term structures, that is, expectations as a function of time), but all the corresponding processes are computed pathwise, based on the algorithm of Section 3.3, allowing for all XVA inter-dependencies. XVA profiles (or pathwise XVAs if desired) are of course much more informative for traders than the spot XVA values (or time 0 confidence intervals) returned by most XVA systems.

4.1 Portfolio-Wide XVA Profiles

Figure 2 is a sanity check that the deeply regressed CVA in one year, $\mathrm{CVA}^c_1$ (here in the case of a no CSA netting set $c$), is consistent with the output of a nested Monte Carlo. In fact, the design of the corresponding neural network architecture is guided by k-fold cross-validation and benchmarking with nested Monte Carlo.

Figure 2: Random variable $\mathrm{CVA}^c_1$ (in the case of a no CSA netting set $c$) obtained by deep regression (green histogram) versus nested Monte Carlo (orange histogram) applied to the corresponding integrands (blue histogram) visible in (13). (Left) In-sample distributions; (Right) out-of-sample distributions.

Figure 3 is a further sanity check that (left) the profiles of the successive iterates $L^{(k)}$ of the loss process converge rapidly with $k$, with the profile for $k = 3$ already visually indistinguishable from the one for $k = 2$ (see the end of Section 3.3), and (right) the loss process $L^{(3)}$ (as, in fact, any iterate $L^{(k)}$), displayed as its mean and mean ± 2 stdev profiles, is numerically centered around zero (consistent with its martingale property). The latter holds, at least, beyond $t \sim 5$ years: for earlier times, the regression errors, accumulated backward across pricing times since the final maturity of the portfolio, induce a significant bias (the corresponding confidence intervals no longer contain 0).

[Figure 3 plots. Left panel: “Asset-Heavy Bank Mean Loss Process”, series Loss Iter 0, Loss Iter 1, Loss Iter 2. Right panel: “Asset-Heavy Bank Loss Process”, series Loss Mean, Loss Mean+2SE, Loss Mean-2SE. Horizontal axis: Year; vertical axis: Domestic Currency Units.]

Figure 3: (Left) Profiles of the processes $L^{(k)}$, for $k = 1, 2, 3$; (Right) Mean ± 2 stdev profiles of the process $L^{(3)}$.

Figure 4 shows the (top) out-of-sample error profiles related to the deep learning of pathwise CVAs per counterparty and (bottom) FVA profiles, reusing (right) or not (left) the trained weights from the previous learning time step ($t_i$) to initialize the training of the weights of the next learning time step ($t_{i-1}$). The CVA error profiles reveal more difficulty in learning the earliest CVAs, because of a high (absolute) variance of the corresponding cash flows (integrated over longer time frames), as well as the latest CVAs, because of a high (relative) variance of the corresponding (counterparty default) cash flows, which are mostly (but not all) zeros. The blue FVA curves represent the mean FVA originating cash flows, which, in principle as on the pictures, should match the orange mean FVA itself learned from these cash flows. In terms of such (mean) FVA profiles, the impact of reusing the weights or not is practically indiscernible. However, regarding the extremes, one can see that the 5th and 95th percentile FVA estimates with reuse of the trained weights are a bit smoother in time than the ones without it.

Figure 4: Out-of-sample learning performance. (Top) Learning error profiles. (Bottom) Learned FVA profiles. (Left) With reuse of the trained weights from one time step to the previous one. (Right) Without it.

Figure 5 shows the portfolio-wide XVA profiles of the asset-heavy (top) vs. liability-heavy (bottom) portfolio and of the no CSA (left) vs. CSA portfolio (right). Obviously, asset-heavy or no CSA means more CVA. The corresponding curves also emphasize the transfer from counterparty credit into funding risk prompted by extensive collateralisation. However, FVA/MVA risk is ignored in current derivatives capital regulation.

Figure 6 shows that (top left) capital at risk as funding has a material impact on the already (reserve capital as funding)-reduced FVA, (top right) treating KVA as a risk margin gives a huge discounting impact, (bottom left) deep learning detects material initial margin convexity in the asset-heavy CSA portfolio, and (bottom right) deep learning detects material economic capital convexity in the asset-heavy no CSA portfolio. These observations demonstrate that pathwise capital and margin calculations are indeed necessary for accurate FVA, MVA, and KVA calculations.

4.2 Incremental Trades

Figure 7 shows the trade incremental XVA profiles produced by our deep learning approach. Note that, for obtaining such smooth incremental profiles, it has been key to use common random numbers, as much as possible, between the original portfolio XVA computations and the ones regarding the portfolio expanded with the new trade.
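The effect of common random numbers on an incremental quantity can be seen on a toy example, unrelated to the actual XVA machinery: the increment of an expected positive part when a small bump (a hypothetical stand-in for the effect of a new trade) is added to a standard normal exposure. The sketch compares the pathwise standard deviation of the increment when both legs share the same draws versus independent draws; all names and parameter values are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>

// Sample standard deviation of the pathwise increment
//   max(Z' + bump, 0) - max(Z, 0),
// where Z' = Z under common random numbers, and Z' is an independent
// standard normal draw otherwise.
double incrementStdDev(bool commonRandomNumbers, std::size_t numPaths,
                       unsigned seed)
{
    const double bump = 0.01;  // hypothetical small effect of the new trade
    std::mt19937 rngBase(seed);
    std::mt19937 rngOther(seed + 1);
    std::normal_distribution<double> normalBase(0.0, 1.0);
    std::normal_distribution<double> normalOther(0.0, 1.0);

    double sum = 0.0;
    double sumSq = 0.0;
    for (std::size_t i = 0; i < numPaths; ++i) {
        const double z = normalBase(rngBase);
        const double zNew = commonRandomNumbers ? z : normalOther(rngOther);
        const double d = std::max(zNew + bump, 0.0) - std::max(z, 0.0);
        sum += d;
        sumSq += d * d;
    }
    const double n = static_cast<double>(numPaths);
    const double mean = sum / n;
    const double variance = (sumSq - n * mean * mean) / (n - 1.0);
    return std::sqrt(std::max(variance, 0.0));
}
```

With common random numbers each pathwise increment lies between 0 and the bump, so the noise on the estimated increment is of the order of the increment itself; with independent draws it is of the order of the exposure, which would bury a small incremental effect.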

[Figure 5 plots. Panel titles: “Swaps Portfolio Asset-Heavy - XVA no CSA” (series CVA, FVA, KVA); “Swaps Portfolio Asset-Heavy - XVA IM CSA” (series CVA, MVA, KVA); “Swaps Portfolio Liability-Heavy - XVA no CSA” (series CVA, FVA, KVA); “Swaps Portfolio Liability-Heavy - XVA IM CSA” (series CVA, MVA, KVA). Horizontal axis: Years (0 to 30); vertical axis: Domestic Currency Units.]

Figure 5: (Top left) Asset-heavy portfolio, no CSA. (Top right) Asset-heavy portfolio, VM/IM CSA. (Bottom left) Liability-heavy portfolio, no CSA. (Bottom right) Liability-heavy portfolio, VM/IM CSA.

[Figure 7 plots. Panel titles: “Swaps Portfolio Asset-Heavy - Incremental XVA IM CSA” (series CVA, MVA, KVA); “Swaps Portfolio Asset-Heavy (mainly Payer) - Incremental Receiver XVA IM CSA” (series CVA, MVA, KVA); “Swaps Portfolio Liability-Heavy - Incremental XVA no CSA” (two panels, series CVA, FVA, KVA). Horizontal axis: Years (0 to 30); vertical axis: Domestic Currency Units.]

Figure 7: (Top left) Asset-heavy portfolio, no CSA. Incremental receive fix trade. (Top right) Liability-heavy portfolio, no CSA. Incremental pay fix trade. (Bottom left) Asset-heavy portfolio under CSA. Incremental pay fix trade. (Bottom right) Liability-heavy portfolio under CSA. Incremental receive fix trade.

[Figure 6 plots. Panel titles: “Swaps Portfolio Liability-Heavy - FVA offsets - no CSA” (series FVA No Offset - Bank level FCA, FVA CA Offset, FVA CA EC Offset); “Swaps Portfolio Asset-Heavy - KVA Discounting no CSA” (series Discount OIS+h, Discount OIS); “Swaps Portfolio Asset-Heavy - Posted IM Unconditional vs Average Conditional” (series Unconditional, Average Conditional); “Swaps Portfolio Asset-Heavy - Convexity ES(L) Unconditional vs Average Conditional: no CSA” (series Unconditional, Average Conditional). Horizontal axis: Years (0 to 30); vertical axis: Domestic Currency Units.]

Figure 6: (Top left) FVA ignoring the off-setting impact of reserve capital and capital at risk (blue), FVA as per (10) accounting for the off-setting impact of reserve capital but ignoring the one of capital at risk (green), refined FVA as per (4) accounting for both impacts (red). (Top right) KVA ignoring the off-setting impact of the risk margin, i.e. with CR instead of (CR − KVA) in (8) (red), refined KVA as per (7)–(8) (blue). (Bottom left) In the case of the asset-heavy portfolio under CSA, unconditional PIM profile, i.e. with $\mathrm{VaR}_t$ replaced by VaR in (12) (blue), vs. pathwise PIM profile, i.e. mean of the pathwise PIM process as per (12) (red). (Bottom right) In the asset-heavy portfolio no CSA case, unconditional economic capital profile, i.e. EC profile ignoring the words “time-t conditional” in Definition 2.1 (blue), vs. pathwise economic capital profile, i.e. mean of the pathwise EC process as per Definition 2.1 (red).

4.3 Scalability

Additional results obtained by doubling the numbers of counterparties and risk factors (to 20 counterparties and 80 risk factors) indicate that scaling to realistically high dimension is achievable. The timings displayed in Table 1 are taken on a Lenovo P52 laptop with an NVidia Quadro P3200 GPU (5.5 Teraflops peak FP32 performance, 14 streaming multiprocessors). An NVidia Tesla V100 should be at least 6 to 10 times faster. To obtain acceptable trade incremental pricing performance, one would cache computations and scale over multiple V100s.

| Computation Time (seconds) | 10 clients, 40 risk factors, No CSA | 10 clients, 40 risk factors, CSA | 20 clients, 80 risk factors, No CSA | 20 clients, 80 risk factors, CSA |
|---|---|---|---|---|
| Initial Risk Factor & Trade Pricing Simulation (Cuda) | 118.0 | 118.0 | 134.0 | 134.0 |
| Counterparty Learning Calculations | 386.7 | 1,890.0 | 435.8 | 2,211.1 |
| Bank Level Learning Calculations | 525.3 | 156.0 | 687.4 | 229.9 |
| Total Initial Batch | 1,030.0 | 2,164.0 | 1,257.2 | 2,574.9 |
| Re-simulate 1 counterparty trade pricing (Cuda) | 19.0 | 19.0 | 25.0 | 25.0 |
| Counterparty Learning Calculations | 38.7 | 189.0 | 43.6 | 221.1 |
| Bank Level Learning Calculations | 525.3 | 156.0 | 687.4 | 229.9 |
| Total Incremental Trade | 583.0 | 364.0 | 756.0 | 476.0 |

Table 1: GPU Simulation + CNTK Learning Indicative Timings
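As a reading aid for Table 1, each total is the sum of the three rows above it. For the column with 10 clients, 40 risk factors and no CSA, this arithmetic reads:

$$118.0 + 386.7 + 525.3 = 1030.0, \qquad 19.0 + 38.7 + 525.3 = 583.0, \qquad \frac{583.0}{1030.0} \approx 0.57, \qquad \frac{525.3}{583.0} \approx 0.90.$$

In that column the incremental run therefore takes a little under 60% of the initial batch time, and about 90% of it is spent in the bank level learning step, whose timing is the same in both blocks of the table.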

A Deep Learning Computational Toolkit

Our deep learning XVA implementation uses CNTK, the Microsoft Cognitive Toolkit, which has the following features:

• Open source unified deep learning toolkit (https://cntk.ai; the last Microsoft release, 2.7, dates from April 2019),

• Neural network computational steps described via a directed graph,

• Easy realization and combination of DNNs, CNNs, RNNs/LSTMs and other network types,

• Stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation on CPU and GPU,

• Parallelization across multiple GPUs and servers,

• Core in C++/CUDA, with wrappers for Python, C#, Java.

The last point is key for XVA applications, which are usually developed in C++: CNTK automatic differentiation in C++/CUDA means C++ training, as opposed to automatic differentiation in Python (hence no C++ training) with TensorFlow. CNTK allows embedding the deep learning task within the XVA process, hence leveraging the GPU: our ensuing deep learning calculations achieve around 80 to 90% Cuda occupancy, and the GPU acceleration factor is of the order of 15.

A.1 CNTK: Joint value-at-risk and expected shortfall neural network loss function in C++


```cpp
inline FunctionPtr RiskMeasureLearning::VaRES_Loss(const Variable& prediction,
    const Variable& labels, const double q, const std::wstring& name)
{
    // Define placeholder variables
    Variable labelPlaceholder = PlaceholderVariable(L"label");
    Variable quantilePlaceholder = PlaceholderVariable(L"quantile");
    Variable predictionPlaceholderVaR = PlaceholderVariable(L"predictionVaR");
    Variable predictionPlaceholderES = PlaceholderVariable(L"predictionES");
    // Slice the 2-element NN prediction tensor into VaR and ES
    Variable predictionVaR = Slice(prediction, { Axis(0) }, { 0 }, { 1 });
    Variable predictionES = Slice(prediction, { Axis(0) }, { 1 }, { 2 });

    FunctionPtr one = OnesLike(labelPlaceholder);
    FunctionPtr Ind = ElementTimes(one, Less(labelPlaceholder, predictionPlaceholderVaR));
    FunctionPtr alpha = ElementTimes(quantilePlaceholder, one);
    FunctionPtr VaRP1 = ElementTimes(Minus(Ind, alpha), predictionPlaceholderVaR);
    FunctionPtr VaRP2 = ElementTimes(Ind, labelPlaceholder);
    // g1(z) = z; g2(z) = exp(z)/(1+exp(z)), g curly 2 = ln(1+e^z)
    FunctionPtr ESP1 = Minus(predictionPlaceholderES, predictionPlaceholderVaR);
    FunctionPtr ESP2 = ElementTimes(Minus(predictionPlaceholderVaR, labelPlaceholder), Ind);
    FunctionPtr ESP3 = ElementDivide(ESP2, quantilePlaceholder);
    FunctionPtr ESP4 = ElementDivide(Exp(predictionPlaceholderES),
        Plus(one, Exp(predictionPlaceholderES)));
    FunctionPtr ESP5 = Log(Plus(one, Exp(predictionPlaceholderES)));
    FunctionPtr ESP6 = Plus(ElementTimes(alpha, labelPlaceholder),
        Log(Plus(one, Exp(labelPlaceholder))));
    FunctionPtr ESVaR = Plus(Minus(ElementTimes(ESP4, Plus(ESP1, ESP3)), ESP5), ESP6);

    FunctionPtr jointVaRESloss = Plus(Minus(VaRP1, VaRP2), ESVaR);
    // Bind placeholders and return loss function
    return AsBlock(std::move(jointVaRESloss),
        { { predictionPlaceholderVaR, predictionVaR },
          { predictionPlaceholderES, predictionES },
          { labelPlaceholder, labels },
          { quantilePlaceholder, Constant::Scalar((float)q) } },
        L"JointVaRESLoss", name);
}
```
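To make the term structure of the listing easier to check by hand, the following is a minimal sketch of the same expression for a single observation, written with plain doubles and no CNTK dependency. The function name `jointVaRESLossScalar` and its argument names are hypothetical and not part of the authors' code; the sketch assumes the indicator is 1 when the label lies strictly below the VaR prediction, as `Less(label, predictionVaR)` suggests.

```cpp
#include <cassert>
#include <cmath>

// Scalar counterpart of the joint (VaR, ES) loss for one observation.
// y: realized label, v: VaR prediction, e: ES prediction, alpha: quantile level.
// Hypothetical helper; each local mirrors a named node of the CNTK graph.
double jointVaRESLossScalar(double y, double v, double e, double alpha)
{
    assert(alpha > 0.0 && alpha < 1.0);
    const double ind = (y < v) ? 1.0 : 0.0;               // Ind
    const double varP1 = (ind - alpha) * v;               // VaRP1, with g1(z) = z
    const double varP2 = ind * y;                         // VaRP2
    const double esp1 = e - v;                            // ESP1
    const double esp3 = (v - y) * ind / alpha;            // ESP2 / quantile
    const double esp4 = std::exp(e) / (1.0 + std::exp(e)); // g2(e)
    const double esp5 = std::log1p(std::exp(e));          // ln(1 + e^e)
    const double esp6 = alpha * y + std::log1p(std::exp(y));
    return (varP1 - varP2) + (esp4 * (esp1 + esp3) - esp5 + esp6);
}
```

Summing this quantity over all observations corresponds to the `ReduceSum` applied to the block output in the training routine.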

A.2 CNTK: Training the neural network for conditional value-at-risk and expected shortfall in C++

```cpp
void RiskMeasureLearning::VaRESDeepLearning(const MatrixXd& X, const MatrixXd& Y,
    const double quantile, const DeviceDescriptor& device)
{
    rescaleXYData(X, Y);
    // Declare the network input and label variables
    auto inputVar = InputVariable({ m_inputDim }, DataType::Float, L"features");
    auto labelsVar = InputVariable({ m_outputDim }, DataType::Float, L"Labels");
    // Wrap the feature and label buffers as CNTK values on the target device
    NDShape inputShape({ m_inputDim });
    ValuePtr inputValue = Value::CreateSequence(inputShape, m_inputData, device, true);
    NDShape labelShape({ m_outputDim });
    ValuePtr labelValue = Value::CreateSequence(labelShape, m_labelData, device, true);

    // The network has m_outputDim + 1 outputs, sliced into VaR and ES by VaRES_Loss
    auto trainingOutput = FullyConnectedFeedForwardRegressionNet(inputVar,
        m_outputDim + 1, m_hiddenLayersDim, m_numHiddenLayers, device,
        m_nonLinearity, L"trainingOutput");

    // Sum the joint VaR/ES loss over all axes
    auto trainingLoss = ReduceSum(VaRES_Loss(trainingOutput, labelsVar, quantile,
        L"LossFunction"), Axis::AllAxes(), L"LossFunction");

    m_prediction = trainingOutput;

    ProgressWriterPtr pw = MakeSharedObject<MyProgressWriter>(0, 0, 0, 0, 0, 0);
    m_learner = MomentumSGDLearner(trainingOutput->Parameters(), m_learningRate,
        m_Momentum, true);
    m_trainer = CreateTrainer(trainingOutput, trainingLoss, m_prediction,
        vector<LearnerPtr>({ m_learner }), { pw });
    // Each iteration feeds the same full data set to the trainer
    for (size_t i = 0; i < m_iterationCount; ++i)
        m_trainer->TrainMinibatch({ { inputVar, inputValue },
            { labelsVar, labelValue } }, device);

    EvaluateSequence(m_trainer->EvaluationFunction(), m_inputData,
        learningRM::JointVaRES, false, device);
}
```
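The training routine delegates the optimisation to CNTK's momentum SGD learner. To illustrate the underlying idea without CNTK, here is a minimal sketch under strong simplifying assumptions: plain full-batch subgradient descent (no momentum, no neural network, no ES component) on the pinball loss, fitting a single unconditional quantile. The function name `fitQuantile` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full-batch subgradient descent on the mean pinball (quantile) loss
//   rho(y, v) = (1{y < v} - alpha) * (v - y)
// for one scalar parameter v, started from v = 0.
// The subgradient of the mean loss in v is (fraction of y below v) - alpha.
double fitQuantile(const std::vector<double>& y, double alpha,
                   double learningRate, std::size_t iterations)
{
    assert(!y.empty() && alpha > 0.0 && alpha < 1.0);
    double v = 0.0;
    for (std::size_t it = 0; it < iterations; ++it) {
        std::size_t below = 0;
        for (double yi : y) {
            if (yi < v) ++below;
        }
        const double grad = static_cast<double>(below) / y.size() - alpha;
        v -= learningRate * grad; // moves v until a fraction alpha lies below it
    }
    return v;
}
```

For example, with the sample 1, 2, ..., 100, `alpha = 0.9`, a learning rate of 10 and 2000 iterations, the iterate settles in the interval (90, 91], where exactly 90% of the sample lies strictly below it. In the paper's setting the scalar `v` is replaced by the network output and the pinball loss by the joint VaR/ES loss of A.1.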

B Multi-Risk Factor Framework

Considering a covariance structure directly between the risk factors would make it difficult to handle the default intensity. Instead, we impose a correlation structure between the driving Brownian motions of our factors. Let $R \in \mathcal{M}_{4,4}(\mathbb{R})$ be our correlation matrix, which we assume to be positive definite. We can then write the Cholesky decomposition $R = (R^{1/2})^\top R^{1/2}$.

For every currency $i \in \{1, 2\}$, we consider a vector of independent Brownian motions $W^{(i)}_t$, where the superscript $i$ indicates that the process is a Brownian motion in the $\mathbb{Q}^{(i)}$ world, which in turn is where cash flows in the currency $i$ are priced.
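The correlated drivers $\langle R^{1/2} e_k, dW_t \rangle$ used throughout this appendix can be sketched as follows, assuming $R^{1/2}$ is taken to be the upper-triangular Cholesky factor $U$ with $U^\top U = R$. The names `Matrix`, `choleskyUpper` and `correlatedIncrement` are illustrative and do not come from the authors' code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Upper-triangular Cholesky factor U with U^T U = R (U plays the role of R^{1/2}).
Matrix choleskyUpper(const Matrix& R)
{
    const std::size_t n = R.size();
    Matrix U(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            double s = R[i][j];
            for (std::size_t k = 0; k < i; ++k) s -= U[k][i] * U[k][j];
            if (i == j) {
                assert(s > 0.0); // fails if R is not positive definite
                U[i][i] = std::sqrt(s);
            } else {
                U[i][j] = s / U[i][i];
            }
        }
    }
    return U;
}

// Increment driving factor k (0-based): <U e_k, dW> = sum_j U[j][k] * dW[j],
// where dW holds independent Brownian increments.
double correlatedIncrement(const Matrix& U, std::size_t k,
                           const std::vector<double>& dW)
{
    double s = 0.0;
    for (std::size_t j = 0; j < U.size(); ++j) s += U[j][k] * dW[j];
    return s;
}
```

With this convention the covariation of the drivers of factors $k$ and $l$ is $(U^\top U)_{k,l}\,dt = R_{k,l}\,dt$, which is the correlation structure imposed above.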

B.1 Interest rates, HW, i ∈ {1, 2}, factor idx = i

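$$dr_i(t) = \left(\theta_i(t) - a_i r_i(t)\right) dt + \sigma_i \left\langle R^{1/2} e_i, dW^{(i)}_t \right\rangle,$$

with $\theta_i$ an exogenous deterministic function that allows fitting the forward curve at time 0.

Or equivalently, we may write $r_i(t) = x_i(t) + \beta_i(t)$, with $\beta_i$ a deterministic function and:

$$\begin{cases} dx_i(t) = -a_i x_i(t)\,dt + \sigma_i \left\langle R^{1/2} e_i, dW^{(i)}_t \right\rangle \\ x_i(0) = 0 \end{cases}$$

If $f_i(0, \cdot)$ is the time-0 instantaneous forward curve for rate $i$, then:

$$\forall t \ge 0, \quad \beta_i(t) = f_i(0, t) + \frac{\sigma_i^2}{2 a_i^2} \left(1 - e^{-a_i t}\right)^2$$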
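As an illustration of the decomposition $r_i(t) = x_i(t) + \beta_i(t)$, the following sketch evaluates the deterministic shift and advances the stochastic part by one Euler step. It is only a sketch: the function names are hypothetical, the forward rate $f_i(0,t)$ is passed in as a number, and the Euler discretisation is an assumption of this example, not necessarily the scheme used by the authors.

```cpp
#include <cassert>
#include <cmath>

// Deterministic shift beta(t) = f(0,t) + sigma^2 / (2 a^2) * (1 - exp(-a t))^2.
// f0t is the time-0 instantaneous forward rate for maturity t.
double hullWhiteShift(double f0t, double sigma, double a, double t)
{
    assert(a > 0.0 && t >= 0.0);
    const double decay = -std::expm1(-a * t); // 1 - exp(-a t), accurate for small t
    return f0t + sigma * sigma / (2.0 * a * a) * decay * decay;
}

// One Euler step of dx = -a x dt + sigma dZ, where dZ is the correlated
// increment <R^{1/2} e_i, dW> over the step of length dt.
double eulerStepX(double x, double a, double sigma, double dt, double dZ)
{
    return x - a * x * dt + sigma * dZ;
}
```

The short rate at the new time is then recovered as `eulerStepX(...) + hullWhiteShift(...)`. At $t = 0$ the shift reduces to $f_i(0,0)$, consistent with $x_i(0) = 0$.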

B.2 FX rate, GBM, (i, j) ∈ {(1, 2), (2, 1)}, factor idx = 3

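$$\frac{d\,\mathrm{FX}_{j,i}(t)}{\mathrm{FX}_{j,i}(t)} = \left(r_i(t) - r_j(t)\right) dt + \alpha_{j,i}\,\sigma_3 \left\langle R^{1/2} e_3, dW^{(i)}_t \right\rangle,$$

where $\alpha_{1,2} = -1$ and $\alpha_{2,1} = 1$. By comparing with the dynamics obtained by FX inversion, one can deduce the following Brownian motion change between the $\mathbb{Q}^{(1)}$ and the $\mathbb{Q}^{(2)}$ worlds:

$$dW^{(2)}_t = dW^{(1)}_t - \sigma_3 R^{1/2} e_3\,dt$$

or equivalently:

$$\left(\frac{d\mathbb{Q}^{(2)}}{d\mathbb{Q}^{(1)}}\right)_t = \exp\left(\int_0^t \sigma_3 \left\langle R^{1/2} e_3, dW^{(1)}_s \right\rangle - \frac{1}{2} \int_0^t \sigma_3^2 \left\| R^{1/2} e_3 \right\|^2 ds\right) = \frac{\mathrm{FX}_{2,1}(t)}{\mathrm{FX}_{2,1}(0)} \exp\left(\int_0^t \left(r_2(s) - r_1(s)\right) ds\right)$$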
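The comparison with FX inversion can be spelled out as follows. This is a sketch that assumes $\mathrm{FX}_{1,2} = 1/\mathrm{FX}_{2,1}$ and uses $\| R^{1/2} e_3 \|^2 = R_{3,3} = 1$, which holds because $R$ is a correlation matrix. Applying Itô's formula to the inverse of $\mathrm{FX}_{2,1}$ under $\mathbb{Q}^{(1)}$ gives

$$\frac{d\,\mathrm{FX}_{1,2}(t)}{\mathrm{FX}_{1,2}(t)} = \left(r_2(t) - r_1(t)\right) dt - \sigma_3 \left\langle R^{1/2} e_3, dW^{(1)}_t \right\rangle + \sigma_3^2\,dt = \left(r_2(t) - r_1(t)\right) dt - \sigma_3 \left\langle R^{1/2} e_3, dW^{(1)}_t - \sigma_3 R^{1/2} e_3\,dt \right\rangle.$$

Matching this with the postulated $\mathbb{Q}^{(2)}$ dynamics of $\mathrm{FX}_{1,2}$, whose drift is $r_2 - r_1$ and whose diffusion term is $\alpha_{1,2}\,\sigma_3 \langle R^{1/2} e_3, dW^{(2)}_t \rangle$ with $\alpha_{1,2} = -1$, identifies $dW^{(2)}_t$ with $dW^{(1)}_t - \sigma_3 R^{1/2} e_3\,dt$.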

B.3 Default intensity, CIR++, crncy = 1, factor idx = 4

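We write the instantaneous default intensity process as $\lambda(t) = y(t) + \gamma(t)$, where $\gamma(\cdot)$ is again a deterministic function that allows fitting exactly the time-0 CDS curve, and:

$$\begin{cases} dy(t) = k\left(\mu - y(t)\right) dt + \nu \sqrt{y(t)} \left\langle R^{1/2} e_4, dW^{(1)}_t \right\rangle \\ y(0) > 0 \end{cases}$$

where $2k\mu > \nu^2$. If $s(0, \cdot)$ is the time-0 default intensity curve implied from time-0 CDS quotes, then:

$$\forall t \ge 0, \quad \gamma(t) = s(0, t) + \frac{1}{2} \nu^2 y(0) B(t)^2 + k\left(y(0) - \mu\right) B(t) - y(0)$$

where

$$B(t) = \frac{2\left(e^{\gamma t} - 1\right)}{(\gamma + k)\left(e^{\gamma t} - 1\right) + 2\gamma} \quad \text{and} \quad \gamma = \sqrt{k^2 + 2\nu^2}.$$

Note that, with a slight abuse of notation, the constant $\gamma$ appearing in $B(t)$ is distinct from the shift function $\gamma(\cdot)$.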
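The shift function and the parameter constraint can be evaluated with a few lines of code. The sketch below uses hypothetical function names, passes $s(0,t)$ in as a number, and writes `h` for the constant $\sqrt{k^2 + 2\nu^2}$ to avoid the clash with the shift function $\gamma(\cdot)$.

```cpp
#include <cassert>
#include <cmath>

// B(t) = 2 (e^{h t} - 1) / ((h + k)(e^{h t} - 1) + 2 h), with h = sqrt(k^2 + 2 nu^2).
double cirB(double t, double k, double nu)
{
    assert(t >= 0.0 && k > 0.0);
    const double h = std::sqrt(k * k + 2.0 * nu * nu);
    const double em1 = std::expm1(h * t); // e^{h t} - 1
    return 2.0 * em1 / ((h + k) * em1 + 2.0 * h);
}

// Shift gamma(t) = s(0,t) + 0.5 nu^2 y0 B(t)^2 + k (y0 - mu) B(t) - y0.
// s0t is the time-0 default intensity curve evaluated at t.
double cirShift(double s0t, double t, double y0, double k, double mu, double nu)
{
    const double B = cirB(t, k, nu);
    return s0t + 0.5 * nu * nu * y0 * B * B + k * (y0 - mu) * B - y0;
}

// Condition 2 k mu > nu^2 stated in the text.
bool fellerHolds(double k, double mu, double nu)
{
    return 2.0 * k * mu > nu * nu;
}
```

Since $B(0) = 0$, the shift at time 0 equals $s(0,0) - y(0)$, so that $\lambda(0) = y(0) + \gamma(0) = s(0,0)$.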

B.4 Summing up

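$$\begin{aligned}
dx_1(t) &= -a_1 x_1(t)\,dt + \sigma_1 \left\langle R^{1/2} e_1, dW^{(1)}_t \right\rangle && \text{Interest rates (stochastic part)} \\
dx_2(t) &= -a_2 x_2(t)\,dt + \sigma_2 \left\langle R^{1/2} e_2, dW^{(2)}_t \right\rangle && \\
\frac{d\,\mathrm{FX}_{2,1}(t)}{\mathrm{FX}_{2,1}(t)} &= \left(r_1(t) - r_2(t)\right) dt + \sigma_3 \left\langle R^{1/2} e_3, dW^{(1)}_t \right\rangle && \text{FX} \\
dy_1(t) &= k_1\left(\mu_1 - y_1(t)\right) dt + \nu_1 \sqrt{y_1(t)} \left\langle R^{1/2} e_4, dW^{(1)}_t \right\rangle && \text{Credit (stochastic part)} \\
dW^{(2)}_t &= dW^{(1)}_t - \sigma_3 R^{1/2} e_3\,dt && \text{Switching between } W^{(1)}_t \text{ and } W^{(2)}_t
\end{aligned}$$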
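One way to put the summary together is a single Euler step of all four factors driven by $W^{(1)}$. The sketch below is an assumption-laden illustration, not the authors' scheme: it assumes the simulation runs under $\mathbb{Q}^{(1)}$, so that substituting the switching relation into the $x_2$ equation adds the drift $-\sigma_2 \sigma_3 R_{2,3}\,dt$; it uses a full-truncation treatment of the square root; and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct State { double x1, x2, fx, y; };

struct Params {
    double a1, sigma1; // Hull-White, currency 1
    double a2, sigma2; // Hull-White, currency 2
    double sigma3;     // FX volatility
    double k, mu, nu;  // CIR parameters
    double rho23;      // entry R_{2,3} of the correlation matrix
};

// One Euler step under Q^(1). dZ[k] is the correlated increment
// <R^{1/2} e_{k+1}, dW^{(1)}> over the step; beta1 and beta2 are the
// deterministic Hull-White shifts at the current time.
State eulerStep(const State& s, const Params& p, double beta1, double beta2,
                double dt, const std::array<double, 4>& dZ)
{
    assert(dt > 0.0);
    const double r1 = s.x1 + beta1;
    const double r2 = s.x2 + beta2;
    const double yPos = std::max(s.y, 0.0); // full truncation (an assumption)
    State n{};
    n.x1 = s.x1 - p.a1 * s.x1 * dt + p.sigma1 * dZ[0];
    // Drift correction from dW^(2) = dW^(1) - sigma3 R^{1/2} e_3 dt
    n.x2 = s.x2 + (-p.a2 * s.x2 - p.sigma2 * p.sigma3 * p.rho23) * dt
         + p.sigma2 * dZ[1];
    n.fx = s.fx * (1.0 + (r1 - r2) * dt + p.sigma3 * dZ[2]);
    n.y = s.y + p.k * (p.mu - yPos) * dt + p.nu * std::sqrt(yPos) * dZ[3];
    return n;
}
```

The increments `dZ` would come from independent Gaussian draws scaled by $\sqrt{dt}$ and mixed through $R^{1/2}$; the intensity and the short rates are then obtained by adding the deterministic shifts $\gamma(t)$ and $\beta_i(t)$.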

References

Abbas-Turki, L., B. Diallo, and S. Crépey (2018). XVA principles, nested Monte Carlo strategies, and GPU optimizations. International Journal of Theoretical and Applied Finance 21, 1850030. Preprint version available on https://math.maths.univ-evry.fr/crepey.

Albanese, C., S. Caenazzo, and S. Crépey (2016). Capital and funding. Risk Magazine, May 71–76.

Albanese, C., S. Caenazzo, and S. Crépey (2017). Credit, funding, margin, and capital valuation adjustments for bilateral portfolios. Probability, Uncertainty and Quantitative Risk 2 (7), 26 pages. Preprint version available on https://math.maths.univ-evry.fr/crepey.

Albanese, C. and S. Crépey (2019a). Capital valuation adjustment and funding valuation adjustment. Working paper available on https://math.maths.univ-evry.fr/crepey (First, very preliminary version: arXiv:1603.03012 and ssrn.2745909, March 2016).

Albanese, C. and S. Crépey (2019b). XVA analysis from the balance sheet. Working paper available on https://math.maths.univ-evry.fr/crepey.

Armenti, Y. and S. Crépey (2019). XVA Metrics for CCP optimisation. Working paper available on https://math.maths.univ-evry.fr/crepey.

Barrera, D., S. Crépey, B. Diallo, G. Fort, E. Gobet, and U. Stazhynski (2019). Stochastic approximation schemes for economic capital and risk margin computations. ESAIM: Proceedings and Surveys 65, 182–218.

Beck, C., S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld (2019). Deep splitting method for parabolic PDEs. arXiv:1907.03452.

Burgard, C. and M. Kjaer (2011). In the balance. Risk Magazine, October 72–75.

Burgard, C. and M. Kjaer (2013). Funding costs, funding strategies. Risk Magazine, December 82–87. Preprint version available at https://ssrn.com/abstract=2027195.

Burgard, C. and M. Kjaer (2017). Derivatives funding, netting and accounting. Risk Magazine, March 100–104. Preprint version available at https://ssrn.com/abstract=2534011.

Crépey, S., T. R. Bielecki, and D. Brigo (2014). Counterparty Risk and Funding: A Tale of Two Puzzles. Chapman & Hall/CRC Financial Mathematics Series.

Crépey, S., R. Élie, W. Sabbagh, and S. Song (2019). When capital is a funding source: The XVA Anticipated BSDEs. Working paper available on https://math.maths.univ-evry.fr/crepey.

Crépey, S. and S. Song (2016). Counterparty risk and funding: Immersion and beyond. Finance and Stochastics 20 (4), 901–930.

Crépey, S. and S. Song (2017). Invariance times. The Annals of Probability 45 (6B), 4632–4674.

Dimitriadis, T. and S. Bayer (2019). A joint quantile and expected shortfall regression framework. Electronic Journal of Statistics 13 (1), 1823–1871.

Elouerkhaoui, Y. (2007). Pricing and hedging in a dynamic credit model. International Journal of Theoretical and Applied Finance 10 (4), 703–731.

Elouerkhaoui, Y. (2017). Credit Correlation: Theory and Practice. Palgrave Macmillan.

Fissler, T. and J. Ziegel (2016). Higher order elicitability and Osband’s principle. The Annals of Statistics 44 (4), 1680–1707.

Fissler, T., J. Ziegel, and T. Gneiting (2016). Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting. Risk Magazine, January.

Föllmer, H. and A. Schied (2016). Stochastic Finance: An Introduction in Discrete Time (4th ed.). De Gruyter Graduate.

Goodfellow, I., Y. Bengio, and A. Courville (2017). Deep Learning. MIT Press.

Green, A., C. Kenyon, and C. Dennis (2014). KVA: capital valuation adjustment by replication. Risk Magazine, December 82–87. Preprint version “KVA: capital valuation adjustment” available at ssrn.2400324.

Huré, C., H. Pham, and C. Warin (2019). Some machine learning schemes for high-dimensional nonlinear PDEs. arXiv:1902.01599.

Longstaff, F. A. and E. S. Schwartz (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies 14 (1), 113–147.

Swiss Federal Office of Private Insurance (2006). Technical document on the Swiss solvency test. https://www.finma.ch/FinmaArchiv/bpv/download/e/SST_techDok_061002_E_wo_Li_20070118.pdf.
