CHAPTER 1

VaR, CVaR and mean-downside risk portfolio selection

Paolo Vanini¹ and Luigi Vignola²

Preliminary Draft, Please Do Not Distribute

¹ Corporate Risk Control, Zürcher Kantonalbank, [email protected]
² Models and Methods, Zürcher Kantonalbank, [email protected]

Contents

Chapter 1. VaR, CVaR and mean-downside risk portfolio selection
1. Quadratic Finance or Practice of Value-at-Risk (VaR)
1.1. Examples
2. Downside risk measures
2.1. Properties of Value-at-Risk (VaR)
2.2. Conditional Value-at-Risk (CVaR)
3. Mean-downside risk portfolio selection
3.1. Preliminaries
3.2. Mean-downside risk portfolio selection
3.3. Example
3.4. Mean-downside risk portfolio selection, the case of normal distributions
4. Further topics in VaR analysis: Asymptotic analysis for large portfolios
5. Further topics in VaR and CVaR analysis: Sensitivity analysis, stress testing and back testing
6. Summary and critique

4 Optimal Portfolio Selection, Chapter 1 Version May 29, 2001

In Chapter ?? we analyzed optimal portfolio selection under a mean-variance criterion. A number of critical comments can be raised against the use of this criterion. For the moment we content ourselves with two remarks: (i) an asymmetric risk measure could be preferred, since individuals are typically loss averse but not gain averse, and (ii) regulatory authorities enforce the use of Value-at-Risk (VaR), which is a downside risk measure. Hence, if you intend to work in the finance industry you simply have to know downside risk measures. Because of its practical importance we first give a fairly detailed practical exposition of VaR. Then we discuss general properties of VaR and compare it with the risk measure Conditional Value-at-Risk (CVaR). Both measures are then used in portfolio selection, where the objective is to maximize expected return subject to a given VaR or CVaR, respectively. It turns out that mean-CVaR portfolio selection has more desirable mathematical properties than the corresponding mean-VaR portfolio selection problem. We end this section with a summary and some critical comments regarding the use of downside risk measures in portfolio selection.

1. Quadratic Finance or Practice of Value-at-Risk (VaR)

We define VaR in this section, explain some common approximations used in the finance industry and provide some examples.

VaR is a single statistical measure of possible portfolio losses. Losses greater than the VaR are suffered only with a specified small probability. Depending on the assumptions, VaR condenses all of the risk in a portfolio into a single number that describes the magnitude of the likely losses on the portfolio.

Consider a single-period intervention model on $[0, t]$, starting at time $0$ and ending at time $t > 0$. That is, no actions take place during $(0, t]$. This period is called the holding, liquidation or prognostication period, according to the scope of the business unit. The VaR concept is based on a probability distribution of future changes in the market value of the portfolio, given the information available at time 0.³ To define the VaR concept, we denote by $V^\phi(t)$ the value process of the portfolio $\phi$ and by

$$\Delta V^\phi(0) = V^\phi(0 + t) - V^\phi(0)$$

the change in market value of the portfolio $\phi$. Let $z_{1-\alpha}$ be a threshold value such that

$$P(\Delta V^\phi(0) \le z_{1-\alpha}) = 1 - \alpha , \qquad P(\Delta V^\phi(0) > z_{1-\alpha}) = \alpha .$$

Since $\alpha = 0.95$ or $0.99$ in applications, the change in value (in CHF units) falls below the threshold only with the small probability $1 - \alpha$. The threshold $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the probability distribution $P$, which models the change in market value of the portfolio. Given this threshold, the VaR is by definition the amount of money needed to cover losses up to the threshold $z_{1-\alpha}$, but not more extreme losses. VaR therefore depends on the security level $\alpha$, the time interval $[0, t]$, the portfolio $\phi$ and the distribution of the portfolio value. It is implicitly defined by

$$P(\Delta V^\phi(0) \le -\mathrm{VaR}^\alpha_\phi) = 1 - \alpha \quad \text{if } P(\Delta V^\phi(0) < 0) > 1 - \alpha \tag{1}$$

or by

$$\mathrm{VaR}^\alpha_\phi = 0 \quad \text{if } P(\Delta V^\phi(0) < 0) \le 1 - \alpha .$$

³ In Section 1.1 (Examples) we extend this one-period setup to the multi-period one, which basically leads to the use of conditional probabilities.


Figure 1. VaR and CVaR for the possible losses of a portfolio. (Axes: portfolio loss against frequency; marked are VaR, CVaR, the maximum loss and the tail probability 1 − α.)

We omit the time dependence since for the moment we are in a one-period world. Therefore

$$\mathrm{VaR}^\alpha_\phi = \max\{0, -z_{1-\alpha}\} \tag{2}$$

and we always assume in the sequel that the normal case

$$\mathrm{VaR}^\alpha_\phi = -z_{1-\alpha} \tag{3}$$

holds. If the distribution is symmetric about zero, the important equality

$$-z_{1-\alpha} = z_\alpha \tag{4}$$

holds. See Figure 1 for an illustration. If $Y$ is a random variable, we denote by

$$F_Y(x) = P(Y \le x) = \int_{y \le x} f(y)\,dy \tag{5}$$

its distribution function.

Assumption 1. We assume that all random variables possess an absolutely continuous distribution.

For the quantiles, we then also use the notation $F_Y^{-1}(1-\alpha) = z_{1-\alpha}$.

Equation (1) defines VaR implicitly. Unless some specific probability distributions are chosen, the equation is non-linear and cannot be solved explicitly. Hence, approximations (which define, for example, so-called Quadratic Finance) or asymptotic analysis are needed to calculate the VaR.

An equivalent definition of VaR, where we use $Y$ for a portfolio loss, i.e. $Y = -\Delta V(0)$, is given next:

Definition 2. For a fixed level $\alpha$, the Value-at-Risk for the level $\alpha$, $\mathrm{VaR}_\alpha(Y)$, is defined as the $\alpha$-quantile, i.e.

$$\mathrm{VaR}_\alpha(Y) = F_Y^{-1}(\alpha) := \inf\{v \mid F_Y(v) \ge \alpha\} = \inf\Big\{v \,\Big|\, \int_{x \le v} f_Y(x)\,dx \ge \alpha\Big\} . \tag{6}$$
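Definition 2 can be made concrete on a finite sample of losses. The following is a minimal sketch, assuming the distribution function is replaced by the empirical one; the function names `empirical_var` and `var_from_value_changes` are illustrative and do not come from the text.

```python
import math
import random


def empirical_var(losses, alpha):
    """Smallest sample value v whose empirical distribution function is >= alpha."""
    ordered = sorted(losses)
    n = len(ordered)
    # The empirical F at the k-th smallest value equals k/n, so we need the
    # smallest k with k/n >= alpha. The small offset guards against rounding.
    k = max(1, math.ceil(alpha * n - 1e-12))
    return ordered[k - 1]


def var_from_value_changes(changes, alpha):
    """VaR in the sense of (2): the alpha-quantile of the loss, floored at zero.

    For a continuous distribution the alpha-quantile of the loss equals
    minus the (1 - alpha)-quantile of the change in value.
    """
    losses = [-c for c in changes]
    return max(0.0, empirical_var(losses, alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    changes = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    # For standard normal changes the 95 percent VaR should be near 1.65.
    print(var_from_value_changes(changes, 0.95))
```

The floor at zero mirrors the second case of the implicit definition: if losses are rarer than the tail probability, no capital is needed.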


Therefore, $\mathrm{VaR}_\alpha(Y)$ is, with respect to the probability level $\alpha$, the lowest amount $v$ such that, with probability $\alpha$, the loss will not exceed $v$. The infimum in the definition of the quantile function is needed since the equation which implicitly defines VaR can have either more than one solution or none; in both cases, the VaR is well defined by the infimum. With this definition the tail probability $1 - \alpha$ typically takes the values 1 or 5 percent, i.e. $\alpha = 0.99$ or $0.95$. It should be clear by now that there is no single definition of VaR and of other risk measures in the literature, a fact which has led to considerable confusion.

Typical holding periods are 1, 2 or 10 days and 1 month. Obviously, the loss that is exceeded with a probability of 1 percent is larger than the loss that is exceeded with a probability of 5 percent. The choice of the holding period is important since the VaR computed for a $t$-day period is approximately $\sqrt{t}$ times larger than the VaR for a 1-day holding period. In practice, this scaling behavior is widely used and enforced by regulatory authorities. Nevertheless, it strongly depends on the specific probability distribution under consideration. Applied, for example, to a non-Gaussian situation with heavy tails, the results can be significantly wrong. We discuss in Section 1.1 (Examples) the conditions under which the square-root scaling behavior holds.

In order to compute VaR, first the market factors that affect the value of the portfolio are identified, such as exchange rates, interest rates and stock prices. Typically, market factors are identified by decomposing the instruments in the portfolio into simpler instruments directly related to basic market risk factors and interpreting the actual instruments as portfolios of the simpler instruments.

There are three main methods to calculate VaR: historical simulation, the analytical method and Monte Carlo simulation.

Historical simulation is a simple method that requires relatively few assumptions about the statistical distributions of the underlying market factors. To construct the distribution of profits and losses (P&L), take the current portfolio and subject it to the actual changes in the market factors experienced during each of the last $N$ periods. Equivalently, $N$ sets of hypothetical market factors are constructed using their current values and the changes experienced during the last $N$ periods. The strength of the method, i.e. not having to choose a possibly misspecified distribution, is also its weakness: what happens if the future is not "equal" to the past?

The Monte Carlo method is similar to the historical simulation method. The main difference is the choice of a distribution which is believed to adequately capture the possible changes in the market factors. Then, a pseudo-random number generator is used to generate thousands of hypothetical changes in the market factors. Using these changes, thousands of hypothetical profits and losses on the current portfolio, and thus the distribution of the possible P&L, are constructed. The VaR is then determined from this distribution.

Analytical methods are based on the assumption that the underlying market factors are multivariate normally distributed. In this way, the P&L, which is then also normally distributed, can be determined. The basic analytical approach used in the finance industry is to approximate (??) using Taylor series.
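The three methods can be compared side by side for the simplest case of a single position whose P&L is linear in one factor return. This is only a sketch under that assumption; the helper names (`quantile_loss`, `historical_var`, `monte_carlo_var`, `analytical_var`) and the normal distribution chosen for the Monte Carlo step are illustrative choices, not prescriptions of the text.

```python
import math
import random
from statistics import NormalDist


def quantile_loss(pnl, alpha):
    """alpha-quantile of the loss (minus the P&L), floored at zero."""
    losses = sorted(-x for x in pnl)
    k = max(1, math.ceil(alpha * len(losses) - 1e-12))
    return max(0.0, losses[k - 1])


def historical_var(position_value, past_returns, alpha):
    # Subject the current position to each factor change observed in the past.
    pnl = [position_value * r for r in past_returns]
    return quantile_loss(pnl, alpha)


def monte_carlo_var(position_value, mu, sigma, alpha, n_paths=50_000, seed=1):
    # Draw hypothetical factor changes from an assumed normal distribution.
    rng = random.Random(seed)
    pnl = [position_value * rng.gauss(mu, sigma) for _ in range(n_paths)]
    return quantile_loss(pnl, alpha)


def analytical_var(position_value, mu, sigma, alpha):
    # Closed form under normality: the loss quantile is -mu + z_alpha * sigma.
    z = NormalDist().inv_cdf(alpha)
    return max(0.0, position_value * (-mu + z * sigma))


if __name__ == "__main__":
    print(monte_carlo_var(100.0, 0.0, 0.01, 0.95))
    print(analytical_var(100.0, 0.0, 0.01, 0.95))
```

When the simulated distribution is the same normal law that the analytical method assumes, the two estimates should agree up to sampling error; the historical estimate differs whenever the past returns are not well described by that law.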
We develop this approach next. We define the following notions for a jointly normally distributed random vector $P = (P_1, \ldots, P_n)$ with density function $f$ and marginal densities $f_i$, $f_{ij}$:

$$\mu_i = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\,dx_i \tag{7}$$

$$\sigma^2_{ij} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j)\, f_{ij}(x_i, x_j)\,dx_i\,dx_j , \tag{8}$$

where the density function of an $n$-dimensional jointly normally distributed random vector is given by

$$f(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det V}} \exp\Big(-\frac{1}{2}\langle x - \mu, V^{-1}(x - \mu)\rangle\Big) \tag{9}$$

and where $V = (\sigma^2_{ij})_{1 \le i,j \le n}$ is the variance-covariance matrix.

Assumption 3. All random variables are normally distributed in Section 1.

This is the first assumption typically made in the Quadratic Finance approach to VaR.

A further assumption in calculating VaR for portfolios is that the prices of the securities $S(t)$ in a portfolio depend on underlying market factors with values denoted by $P(t)$. Hence, if there are $N$ securities and $n$ stochastic market factors $P(t) = (P_1(t), \ldots, P_n(t))$, the portfolio value at time $t$ is given by

$$V(t) := V^\phi(P(t)) = \sum_{j=1}^{N} \phi_j S_j(P(t)) \tag{10}$$

where we omit the superscript for the strategy if no confusion is possible. Clearly, if the prices are complicated functions of the factors, calculating the VaR exactly is in most cases not possible. Therefore, as usual in science and practice, one relies on Taylor's theorem to replace the possibly non-linear expressions by linear, quadratic and, if necessary, higher order approximations. The first-order Taylor approximation, which is called the delta approximation in practice, is still the most widely used one. If we denote by

$$\Delta_0 = \nabla_P V = (\partial_{P_1} V|_{P(0)}, \ldots, \partial_{P_n} V|_{P(0)}) \tag{11}$$

the gradient vector, the delta approach is simply the assumption

$$V(t) = V(0) + \langle \Delta_0, P(t) - P(0) \rangle . \tag{12}$$

Since second order terms are suppressed, this approach is exact only if the portfolios are linear in their risk factors. Delta hedging then simply means choosing a portfolio $\tilde V$, i.e.

$$\tilde V(t) = V(t) - \langle \Delta_0, P(t) - P(0) \rangle ,$$

which implies that $\tilde V(t) = V(0)$, i.e. all uncertainty of the future portfolio value is removed if the portfolio is linear. Note that the position in the new portfolio is minus the delta!

Example

Consider the Black and Scholes model and the price of a European call option $C(t)$ given by

$$C(t) = C(S, K, t, T, r, \sigma) = S\,N(d_1) - K e^{-r(T-t)} N(d_2)$$

with

$$d_1 = \frac{1}{\sigma\sqrt{T-t}} \ln\frac{S}{K e^{-r(T-t)}} + \frac{1}{2}\sigma\sqrt{T-t} , \qquad d_2 = d_1 - \sigma\sqrt{T-t} ,$$

where $T$ is the maturity, $S$ the price of the underlying, $K$ the strike price, $r$ the risk-free interest rate and $N(x)$ the cumulative standard normal distribution function. If we assume that our portfolio consists of this call option, i.e. $V(t) = C(t)$, it follows from the Black and Scholes formula that the portfolio is a non-linear function of the risk factor $S$ (i.e. $P$ is the underlying's price). This is the only risk factor, since all other parameters such as $r$ or $\sigma$ are constant in the Black and Scholes model. More specifically, the returns of the underlying follow a geometric Brownian motion

$$\frac{dS_t}{S_t} = \alpha\,dt + \sigma\,dB_t$$

with $B_t$ a standard Brownian motion and the instantaneous expected drift $\alpha$ and the instantaneous variance $\sigma^2$ constant. (Here $\alpha$ denotes the drift, not the security level of the VaR.) A model with constant drift and volatility and $B_t$ a Brownian motion is called a plain vanilla model of returns. This is the standard benchmark model in finance. The delta vector then possesses a single component given by

$$\frac{\partial V}{\partial P}\Big|_{P_0} = \frac{\partial V}{\partial S}\Big|_{S_0} = \Delta_0 .$$

Figure 2. The delta of a call option as a function of the underlying price S with strike price K = 1, r = 5 percent (per annum, p.a.), σ = 16 percent p.a. and T − t = 180 days.

Since we have an explicit formula, the delta can easily be calculated. Consider for example $T - t = 180$ days, $K = S$ (i.e. the option is at-the-money), $r = 5$ percent (per annum, p.a.), $\sigma = 16$ percent p.a. and $S_0 = 100$. Then, using the Black and Scholes formula, it follows that

$$\Delta_0 = \frac{\partial V}{\partial S}\Big|_{S_0} = 0.60931$$

and the option price is 5.7981, since

$$\Delta_0 = \frac{1}{2} + \frac{1}{2}\,\mathrm{Erf}\Big(\frac{\sqrt{T-t}\,(2r + \sigma^2)}{2\sqrt{2}\,\sigma}\Big)$$

with Erf the error function.

To understand what delta hedging means, suppose that you want to hedge a short position⁴ of 100 calls. Hence you receive cash up front but you have a potential liability at maturity time $T$. Since a single option gives the buyer, for example, the right to buy 100 shares, the delta hedge of the short position is to buy 60.931 shares. Therefore, the position in the underlying asset, which is minus the delta of the derivative, hedges changes in the price of the derivative if it is continually reset as the delta changes (hedging in the Black and Scholes model is done continuously in time) and if there are no jumps in the underlying price process (which is true for the geometric Brownian motion dynamics).

⁴ A "short position" in options means that you have sold or written the option; in a "long position" you have bought the option.

If the linearity assumption is not fulfilled, delta hedging leads to the hedge error

$$(V(t) - V(0))_{\text{delta hedged}} = \langle P(t) - P(0), \mathrm{Hess}_S(V(0))\,(P(t) - P(0)) \rangle + o(\|P(t) - P(0)\|^2) \tag{13}$$

with Hess the Hessian matrix. The Hessian matrix, which is denoted $\Gamma_0$ (the "Gamma") in the practitioners' literature, measures how the deltas change if the prices of market factors, such as underlying assets, currencies or commodities, change. Hence, in this second order approach we speak of delta-gamma hedging.

We reconsider the call option problem above. Up to second order we have for the portfolio of call options and stocks (we recall that $P = S$)

$$\begin{aligned} V(t) - V(0) &= -100 \cdot 1 \cdot \frac{\partial V}{\partial S}\Big|_{S_0} + 60.931 \cdot 1 + \langle S(t) - S(0), \mathrm{Hess}_S(V(0))\,(S(t) - S(0)) \rangle \\ &= -60.931 + 60.931 + \langle P(t) - P(0), \mathrm{Hess}_S(V(0))\,(P(t) - P(0)) \rangle \\ &= \langle P(t) - P(0), \mathrm{Hess}_S(V(0))\,(P(t) - P(0)) \rangle , \end{aligned}$$

i.e. the first order corrections are delta-hedged or, in other words, the portfolio is delta-neutral. The Gamma equals

$$\Gamma_0 = 0.033929 .$$

Therefore, an increase of 1 in $S$ implies an increase of the delta of 0.033929. To make the portfolio gamma-neutral, $100 \cdot 0.033929 = 3.3929$ stocks are included in the portfolio. This, however, changes the delta hedge, i.e. we have to correct the number of stocks for delta hedging to

$$64.117 = 60.931 - 3.3929 \cdot 0.0609 + 3.3929 ,$$

which is the number of stocks needed for a delta-neutral and gamma-neutral portfolio.

As an exercise, you should plot $\Delta$ and $\Gamma$ (i) as functions of the stock price and (ii) as functions of time to maturity (for at-, in- and out-of-the-money call options).
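As a starting point for the exercise, the price, delta and gamma of the call can be evaluated directly from the Black and Scholes formula. The sketch below assumes a 360-day year, so that 180 days correspond to $T - t = 0.5$; this is an assumption on our part, chosen because it gives values close to those quoted in the example. The function name `bs_call` is illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def bs_call(S, K, r, sigma, tau):
    """Price, delta and gamma of a European call; tau = T - t in years."""
    sqrt_tau = math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt_tau)
    d2 = d1 - sigma * sqrt_tau
    price = S * STD_NORMAL.cdf(d1) - K * math.exp(-r * tau) * STD_NORMAL.cdf(d2)
    delta = STD_NORMAL.cdf(d1)                          # dC/dS
    gamma = STD_NORMAL.pdf(d1) / (S * sigma * sqrt_tau)  # d^2C/dS^2
    return price, delta, gamma


# At-the-money call of the example, with 180 days taken as half a year.
price, delta, gamma = bs_call(S=100.0, K=100.0, r=0.05, sigma=0.16, tau=180 / 360)

if __name__ == "__main__":
    print(price, delta, gamma)
    # Values for the exercise: delta and gamma along a grid of stock prices.
    for s in (60.0, 80.0, 100.0, 120.0, 140.0):
        print(s, bs_call(s, 100.0, 0.05, 0.16, 0.5)[1:])
```

Looping the same function over a grid of maturities instead of prices gives the second part of the exercise.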

In general, we consider dynamics for the factors of the form

$$P(t) = (P_1(0)\exp(\eta_1(t)), \ldots, P_n(0)\exp(\eta_n(t))) \tag{14}$$

$$\eta(t) = (\eta_1(t), \ldots, \eta_n(t)) \sim \mathcal{N}(\nu(t), W^{-1}(t)) .$$

For example, a geometric Brownian motion satisfies this assumption on the dynamics (see Section 1.1, Examples, for a motivation of such dynamics). We stress that, although the factors are dynamic, we are still restricted to a one-period risk measurement model. Hence interventions take place only at the end of the period, independent of the realizations of the factors in between.

Figure 3. The gamma of a call option as a function of the underlying price S with strike price K = 1, r = 5 percent (per annum, p.a.), σ = 16 percent p.a. and T − t = 180 days.

Inserting the dynamics in the quadratic expansion of wealth and expanding the exponential in $\eta$ also up to second order, we finally get

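$$\begin{aligned} V(t) &= V(0) + \langle \Delta_0, P(t) - P(0) \rangle + \langle P(t) - P(0), \mathrm{Hess}_S(V)\,(P(t) - P(0)) \rangle + o(\|P(t) - P(0)\|^2) \\ &= V(0) + \langle \Delta, \eta \rangle + \frac{1}{2}\langle \eta, \Gamma\eta \rangle + o(\|\eta\|^2) \end{aligned} \tag{15}$$

$$\Delta_i = P_i(0)\,\Delta_{0,i} , \qquad \Gamma_{ij} = \begin{cases} P_i^2(0)\,\Gamma_{0,ii} + \Delta_i & \text{if } i = j \\ P_i(0)P_j(0)\,\Gamma_{0,ij} & \text{if } i \ne j \end{cases} \tag{16}$$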

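This is the situation we consider in this section: portfolios expanded up to second order and Gaussian distributions. This defines so-called "Quadratic Finance". Using (15), the calculation of the $\mathrm{VaR}^\alpha$

$$P(V^\phi(0) - V^\phi(t) \ge \mathrm{VaR}^\alpha_\phi(V)) = \alpha \tag{17}$$

is approximated in the Quadratic Finance approximation by

$$P\Big(\langle \Delta, \eta \rangle + \frac{1}{2}\langle \eta, \Gamma\eta \rangle \le -\mathrm{VaR}^{\alpha}_{\phi,QF}(V)\Big) = \alpha \tag{18}$$

or, written explicitly,

$$(2\pi)^{-n/2} \int_{\langle \Delta, \eta \rangle + \frac{1}{2}\langle \eta, \Gamma\eta \rangle \le -\mathrm{VaR}^{\alpha}_{\phi,QF}(V)} \frac{1}{\sqrt{\det V}} \exp\Big(-\frac{1}{2}\langle \eta - \mu, V^{-1}(\eta - \mu) \rangle\Big)\, d^n\eta = \alpha . \tag{19}$$

With the variable transformations

$$\eta_i - \mu_i = y_i , \; i = 1, \ldots, n , \qquad \Delta' = \Delta + \Gamma\mu , \qquad \mathrm{VaR}^{\alpha}_{*}(V) = -\mathrm{VaR}^{\alpha}_{\phi,QF}(V) - \langle \mu, \Delta \rangle - \frac{1}{2}\langle \mu, \Gamma\mu \rangle \tag{20}$$

the final expression for VaR is

$$(2\pi)^{-n/2} \int_{\langle \Delta', y \rangle + \frac{1}{2}\langle y, \Gamma y \rangle \le \mathrm{VaR}^{\alpha}_{*}(V)} \frac{1}{\sqrt{\det V}} \exp\Big(-\frac{1}{2}\langle y, V^{-1} y \rangle\Big)\, d^n y = \alpha . \tag{21}$$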

Calculating the VaR from expression (21) means calculating a Gaussian integral over a domain defined by a quadratic form. It is well known that Gaussian integrals can be calculated explicitly if the integration domain is not restricted. In the case under consideration, perturbative or asymptotic tools from mathematics and mathematical physics can be used to analyze (21) further. This approximation corresponds to the case of large portfolios, which is an important subject to tackle. We discuss such a more advanced procedure in the final section of this chapter.
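Short of the asymptotic analysis, the probability in (18) can always be estimated by simulation. The following sketch is not the perturbative method referred to above; it simply samples the Gaussian factors and reads off the quantile of the delta-gamma P&L. The names `cholesky` and `quadratic_var` are illustrative, and `alpha` is the small tail probability, as in (18).

```python
import math
import random


def cholesky(V):
    """Lower-triangular L with L L^T = V, for a symmetric positive definite V."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L


def quadratic_var(delta, gamma, mu, V, alpha, n_paths=50_000, seed=7):
    """Monte Carlo estimate of the VaR solving
    P(<delta, eta> + 0.5 <eta, gamma eta> <= -VaR) = alpha for eta ~ N(mu, V)."""
    rng = random.Random(seed)
    n = len(mu)
    L = cholesky(V)
    pnl = []
    for _ in range(n_paths):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated factor moves: eta = mu + L z.
        eta = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        linear = sum(delta[i] * eta[i] for i in range(n))
        quadratic = sum(eta[i] * gamma[i][j] * eta[j] for i in range(n) for j in range(n))
        pnl.append(linear + 0.5 * quadratic)
    pnl.sort()
    k = max(1, math.ceil(alpha * n_paths - 1e-12))
    return -pnl[k - 1]  # minus the alpha-quantile of the P&L


if __name__ == "__main__":
    # One factor, pure delta position versus a position with positive gamma.
    print(quadratic_var([1.0], [[0.0]], [0.0], [[1.0]], 0.05))
    print(quadratic_var([1.0], [[1.0]], [0.0], [[1.0]], 0.05))
```

In the one-factor illustration the positive gamma bounds the P&L from below, so the estimated VaR is markedly smaller than in the pure delta case.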

1.1. Examples.

Example

We first consider a single risk factor $P(t) \sim \mathcal{N}(\mu, \sigma^2)$, and the value process $V(t, P(t))$ is assumed to be explicitly time dependent. The value after a time period $t$ is

$$\begin{aligned} V(t, P(t)) &= V(0, P(0)) + \frac{\partial V}{\partial t}\Big|_{t=0} t + \frac{\partial V}{\partial P}\Big|_{P(t)=P(0)} (P(t) - P(0)) \\ &\quad + \frac{1}{2}\frac{\partial^2 V}{\partial P^2}\Big|_{P(t)=P(0)} (P(t) - P(0))^2 + o(|P(t) - P(0)|^2, t) \end{aligned} \tag{22}$$

$$=: V(0, P(0)) + \underbrace{\mu_t t + \Delta_0 (P(t) - P(0))}_{\text{normal}} + \underbrace{\frac{1}{2}\Gamma_0 (P(t) - P(0))^2}_{\text{non-central } \chi^2} + o(|P(t) - P(0)|^2, t) \tag{23}$$

where we expanded the value process up to second order in the risk factor but only up to first order in time. The reason for this asymmetry lies in stochastic calculus: if one assumes, for example, a geometric Brownian motion for the market factor dynamics, a part of the second order term $(P(t) - P(0))^2$ turns out to be of first order in time by Ito's lemma. If we set $\Delta V_2 = V(t, P(t)) - V(0, P(0))$ with

$$\Delta V_2 =: \underbrace{\mu_t t + \Delta_0 (P(t) - P(0))}_{\text{normal}} + \underbrace{\frac{1}{2}\Gamma_0 (P(t) - P(0))^2}_{\text{non-central } \chi^2} , \tag{24}$$

it follows that the characterization of the second order approximation of

$$P(-\Delta V_2 \le \mathrm{VaR}_\alpha(V)) = \alpha$$

is equivalent to the characterization of the sum of a normal and a non-central $\chi^2$ random variable. But completing the square in (24) implies

$$\Delta V_2 = \tilde\mu_t t + \frac{1}{2}\Gamma_0 (\Psi + P(t) - P(0))^2 , \tag{25}$$

with

$$\Psi = \frac{\Delta_0}{\Gamma_0} , \qquad \tilde\mu_t t = \mu_t t - \frac{1}{2}\frac{\Delta_0^2}{\Gamma_0} .$$

Since $\Psi + P(t) - P(0)$ is normally distributed, it follows that

$$\frac{2(\Delta V_2 - \tilde\mu_t t)}{\Gamma_0 \sigma^2} = \Big(\frac{\Psi + P(t) - P(0)}{\sigma}\Big)^2 \sim \chi^2_{v,d} , \quad v = 1 , \quad d = \Big(\frac{\Psi + \mu}{\sigma}\Big)^2 \tag{26}$$

where we recall that for the single risk factor $P(t) \sim \mathcal{N}(\mu, \sigma^2)$.
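The step from (24) to (25) is an ordinary completion of the square. Written out with the shorthand $x = P(t) - P(0)$ (a symbol introduced here only for brevity) and assuming $\Gamma_0 \neq 0$, it reads:

```latex
\begin{aligned}
\mu_t t + \Delta_0 x + \tfrac{1}{2}\Gamma_0 x^2
  &= \mu_t t + \tfrac{1}{2}\Gamma_0\Big(x^2 + 2\,\frac{\Delta_0}{\Gamma_0}\,x\Big) \\
  &= \mu_t t + \tfrac{1}{2}\Gamma_0\Big(x + \frac{\Delta_0}{\Gamma_0}\Big)^2
     - \tfrac{1}{2}\frac{\Delta_0^2}{\Gamma_0} \\
  &= \Big(\mu_t t - \tfrac{1}{2}\frac{\Delta_0^2}{\Gamma_0}\Big)
     + \tfrac{1}{2}\Gamma_0\,(\Psi + x)^2 ,
  \qquad \Psi = \frac{\Delta_0}{\Gamma_0} .
\end{aligned}
```

The bracket in the last line is the shifted drift term of (25), and the remaining term is a scaled square of a normal random variable, which is where the non-central $\chi^2$ distribution with one degree of freedom comes from.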

Example

We consider in this example the case of a single stock and of a portfolio of stocks. The risk factors $P$ are the stock prices $S$ and therefore the portfolio value process $V$ is linear in the risk factors. Furthermore, assuming normally distributed returns, VaR can be calculated explicitly using some approximations.

We first assume that there is a single stock with price $S_t$ and geometric Brownian motion price dynamics

$$dS_t = S_t(\mu\,dt + \sigma\,dB_t), \quad t \ge 0 , \tag{27}$$

with $(B_t)$ a standard Brownian motion and the initial price $S_0$ given. Using Ito's lemma, the solution of (27) is

$$S_t = S_0 \exp\Big(\big(\mu - \tfrac{1}{2}\sigma^2\big)t + \sigma B_t\Big) . \tag{28}$$

From the solution it follows that for $0 \le t < t + s$ the return $R^s_{t+s}$ at time $t + s$ over the past period of length $s$ is

$$R^s_{t+s} = \ln\Big(\frac{S_{t+s}}{S_t}\Big) = \tilde\mu s + \sigma(B_{t+s} - B_t) , \tag{29}$$

with $\tilde\mu = \mu - \frac{1}{2}\sigma^2$. The definition of the Brownian motion then implies that

$$R^s_{t+s} \sim \mathcal{N}(\tilde\mu s, \sigma^2 s) .$$

We assume that the portfolio value process is

$$V(t) = \phi S_t$$

with $\phi$ a constant number of stocks. We distinguish the case of a long position, $\phi > 0$, and of a short position, $\phi < 0$. Over the holding period $[t, t+s]$ the change in value $\Delta V(t) = V(t+s) - V(t)$ is

$$\Delta V(t) = \phi S_t \big(e^{R^s_{t+s}} - 1\big) . \tag{30}$$

To calculate the VaR we use conditional probabilities, i.e.

$$P_t(\Delta V(t) \le -\mathrm{VaR}_{\alpha,s}) = P(\Delta V(t) \le -\mathrm{VaR}_{\alpha,s} \mid V_t = v_t) = 1 - \alpha \quad \text{if } P_t(\Delta V(t) < 0) > 1 - \alpha$$

or

$$\mathrm{VaR}_{\alpha,s} = 0 \quad \text{if } P_t(\Delta V(t) < 0) \le 1 - \alpha . \tag{31}$$

First, since at time $t$ we know the price of the stock with certainty, we consider the VaR for the holding period conditional on this information and not on all possible prices at time $t$, which would imply the use of the unconditional probability. Second, the price process is a Markov process, i.e. conditioning on the whole history of the price process at time $t$ is equivalent to conditioning on the value at time $t$. In other words, the past possesses no additional informational content once we know the present price. Third, we always consider the generic first case in (31). Hence we suppress the case where the situation is so favorable that losses occur with a probability smaller than $1 - \alpha$. Using (29) in (31), an elementary calculation gives

$$\mathrm{VaR}_{\alpha,s} = \phi S_t \Big(1 - e^{\tilde\mu s - \mathrm{sign}(\phi)\, z_\alpha \sigma\sqrt{s}}\Big) , \tag{32}$$

where $\tilde\mu = \mu - \frac{1}{2}\sigma^2$ is the drift of the log-return in (29), the sign function distinguishes a short from a long position, and $z_\alpha$ is the $\alpha$-quantile of a standard normal distribution, i.e. the solution of

$$P(Z \le z_\alpha) = \alpha$$

with $Z \sim \mathcal{N}(0, 1)$. The relation $z_{1-\alpha} = -z_\alpha$ holds, and this notation for the quantiles is frequently used instead of $F^{-1}(\alpha)$ $(= z_\alpha)$. Typical values in applications are $z_{0.95} = 1.65$ and $z_{0.99} = 2.33$. Two approximations of the exact expression (32) used in practice are (i) the mean approximation (M), i.e. setting the log-return drift $\tilde\mu = \mu - \frac{1}{2}\sigma^2$ to zero, and (ii) the linear approximation (L), i.e. using $e^x = 1 + x + o(x)$ and neglecting the higher order terms. Applying both approximations, $\mathrm{VaR}_{\alpha,s}$ is approximated by

$$\mathrm{VaR}^{ML}_{\alpha,s} = |\phi S_t|\, z_\alpha \sigma \sqrt{s} , \tag{33}$$

which is the approach used in RiskMetrics™, for example. In this approximation the time scaling behavior

$$\mathrm{VaR}^{ML}_{\alpha,s} = \sqrt{s}\;\mathrm{VaR}^{ML}_{\alpha,1} \tag{34}$$

follows immediately. If we apply only the linear approximation, which is just the $\Delta$-approach discussed above, we get

$$\mathrm{VaR}^{L}_{\alpha,s} = \mathrm{VaR}^{\Delta}_{\alpha,s} = -\phi S_t \tilde\mu s + |\phi S_t|\, z_\alpha \sigma \sqrt{s} . \tag{35}$$
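The exact expression (32) and its two approximations (33) and (35) are easy to compare numerically. The sketch below takes the drift entering the exponent to be the log-return drift $\mu - \frac{1}{2}\sigma^2$ of (29); the function names and the parameter values in the demonstration are illustrative assumptions.

```python
import math
from statistics import NormalDist


def var_exact(phi, S, mu, sigma, s, alpha):
    """Equation (32) for a position of phi stocks at price S over horizon s."""
    z = NormalDist().inv_cdf(alpha)
    mu_tilde = mu - 0.5 * sigma**2          # drift of the log-return
    sign = 1.0 if phi > 0 else -1.0         # long versus short position
    exponent = mu_tilde * s - sign * z * sigma * math.sqrt(s)
    return phi * S * (1.0 - math.exp(exponent))


def var_linear(phi, S, mu, sigma, s, alpha):
    """Equation (35): the exponential is replaced by its first-order expansion."""
    z = NormalDist().inv_cdf(alpha)
    mu_tilde = mu - 0.5 * sigma**2
    return -phi * S * mu_tilde * s + abs(phi * S) * z * sigma * math.sqrt(s)


def var_ml(phi, S, sigma, s, alpha):
    """Equation (33): linear approximation with the drift set to zero."""
    z = NormalDist().inv_cdf(alpha)
    return abs(phi * S) * z * sigma * math.sqrt(s)


if __name__ == "__main__":
    args = dict(S=100.0, sigma=0.01, alpha=0.95)
    for horizon in (1.0, 10.0):
        print(horizon,
              var_exact(1.0, mu=0.0002, s=horizon, **args),
              var_linear(1.0, mu=0.0002, s=horizon, **args),
              var_ml(1.0, s=horizon, **args))
```

Only the zero-drift linear form scales exactly with the square root of the horizon; for a long position the linear form overstates the exact VaR because $e^x \ge 1 + x$, and for a short position it understates it.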

If we next consider a portfolio of $N$ stocks with constant numbers of stocks $\phi_j$, $j = 1, 2, \ldots, N$, one gets for the linear or delta approximation

$$\mathrm{VaR}^{L}_{\alpha,s} = \mathrm{VaR}^{\Delta}_{\alpha,s} = -\langle \phi S_t, \mu \rangle s + \sqrt{\langle \phi S_t, V \phi S_t \rangle}\; z_\alpha \sqrt{s} , \tag{36}$$

where $\phi S_t$ denotes the vector with components $\phi_j S_{j,t}$. If the mean vector $\mu$ is also set equal to zero, formula (36) simplifies again to the expression used by RiskMetrics™, which again possesses the square-root time scaling property and is useful in a second respect: since only the covariance matrix matters as a parameter, the matrix equation

$$V = DRD ,$$

with $R$ the correlation matrix and $D$ the diagonal matrix with the standard deviations of the stocks on its diagonal, allows the covariance matrix to be constructed from these two measurable matrices.
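The construction $V = DRD$ and formula (36) translate into a few lines of code. This is a minimal sketch with plain lists; the function names and the two-stock numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def covariance_from_vol_corr(vols, corr):
    """V = D R D with D the diagonal matrix of standard deviations."""
    n = len(vols)
    return [[vols[i] * corr[i][j] * vols[j] for j in range(n)] for i in range(n)]


def delta_normal_var(exposures, mu, V, s, alpha):
    """Equation (36); exposures[j] is phi_j times the price of stock j."""
    z = NormalDist().inv_cdf(alpha)
    n = len(exposures)
    drift = sum(exposures[j] * mu[j] for j in range(n))
    variance = sum(exposures[i] * V[i][j] * exposures[j]
                   for i in range(n) for j in range(n))
    return -drift * s + math.sqrt(variance) * z * math.sqrt(s)


if __name__ == "__main__":
    # Hypothetical daily volatilities of 1 and 2 percent, correlation 0.5.
    V = covariance_from_vol_corr([0.01, 0.02], [[1.0, 0.5], [0.5, 1.0]])
    print(delta_normal_var([100.0, 50.0], [0.0, 0.0], V, s=1.0, alpha=0.95))
```

With the mean vector set to zero the result depends on the horizon only through $\sqrt{s}$, which is the scaling property mentioned above.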

Example

The next examples are taken from the documentation of the RiskMetrics Group™. Since the strategy $\phi$ is trivial, we omit it from the notation. We consider a US-based firm which holds a 140 Mio DEM position, i.e. a foreign currency position (an FX position). Hence,

$$V^{DEM}(0) = S_1(0) = 140 \text{ Mio DEM} .$$

What is the 5 percent VaR over 1 day of this position? The first step is to mark the exposure to market. In this case this means expressing the DEM position in USD. If we assume that the DEM/USD exchange rate is 1.40, we get

$$V^{USD}(0) = S_1(0) = 100 \text{ Mio USD} .$$

The single risk factor in this example is the FX rate, which is assumed to be normally distributed with mean zero and a daily volatility $\sigma^{DEM/USD}_{day} = 0.565$ percent.⁵ The quantile function at the 5 percent level takes the value 1.65. Therefore,

$$\mathrm{VaR}_{5\%,1day} = 100 \cdot 1.65 \cdot 0.565\% \text{ Mio USD} \approx 932000 \text{ USD} .$$

⁵ Old parameters can be downloaded for free from the JP Morgan homepage.

This implies that 95 percent of the time the firm will not lose more than 932 000 USD over the next 24 hours.

We consider the same firm, but we now assume that it holds a 140 Mio DEM position in 10-year German government bonds. Again, what is the 5 percent VaR over 1 day of this position? The difference from the former example is a second risk factor, i.e. interest rate risk. The first step in determining the VaR is again marking the position to market, which is the same as before.

First, we consider the two risk factors separately. We get

$$\mathrm{VaR}^{IR}_{5\%,1day} = 100 \cdot 1.65 \cdot 0.605\% \text{ Mio USD} \approx 998000 \text{ USD} ,$$

where IR denotes "interest rate risk", and for FX risk as before

$$\mathrm{VaR}^{FX}_{5\%,1day} = 100 \cdot 1.65 \cdot 0.565\% \text{ Mio USD} \approx 932000 \text{ USD} .$$

Let us next consider the joint impact of both risk factors on the VaR. For that, we have to know the correlation between FX risk and interest rate risk (IR). From the JP Morgan home page, we get

$$\mathrm{corr}_{IR,FX} = -0.27 .$$

For the first time in these notes we meet a negative correlation, and it is a must that you think about why this negative sign makes economic sense. From the general formula, we get

$$\mathrm{VaR}^{FX+IR}_{5\%,1day} = 1.65 \cdot \sqrt{\sigma^2_{IR} + \sigma^2_{FX} + 2\,\mathrm{corr}_{IR,FX} \cdot \sigma_{IR} \cdot \sigma_{FX}} \cdot 100 \text{ Mio USD} \approx 1.168 \text{ Mio USD} .$$
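The arithmetic of this example can be retraced in a few lines. The sketch uses only the figures quoted in the text (position of 100 Mio USD, quantile 1.65, daily volatilities of 0.565 and 0.605 percent, correlation of −0.27); the helper name `combine_var` is illustrative.

```python
import math


def combine_var(var_a, var_b, corr):
    """VaR of two jointly normal risk factors with correlation corr."""
    return math.sqrt(var_a**2 + var_b**2 + 2.0 * corr * var_a * var_b)


position_usd = 100e6   # marked-to-market position
z = 1.65               # 5 percent quantile used in the text

var_fx = position_usd * z * 0.00565   # FX risk alone
var_ir = position_usd * z * 0.00605   # interest rate risk alone
var_total = combine_var(var_ir, var_fx, -0.27)

if __name__ == "__main__":
    print(round(var_fx), round(var_ir), round(var_total))
    print("undiversified sum:", round(var_fx + var_ir))
```

Because both stand-alone VaRs are the same multiple of the respective volatility, combining the VaRs with the correlation gives the same number as inserting the volatilities into the general formula.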

Hence, the VaR including the diversification effect is much less than the sum of the VaRs of the two individual risk factors, which ignores their correlation.

We extend the example further by considering two assets for the US-based firm. Asset $S_1$ is a future cash flow of 1 Mio DEM received at the end of 1 year. Since the 1-year DEM rate is 10 percent, today's value $S_1(0)$ is

$$S_1(0) = 909091 \text{ DEM} .$$

The second asset $S_2$ is an at-the-money put option on the DEM (a call on the USD) specified by: contract size 1 Mio DEM, expiration 1 month, option premium 0.0105, spot exchange rate at contracting time 1.538 DEM/USD and implied volatility⁶ $\sigma = 14$ percent. Hence the portfolio is

$$V(t) = 1 \cdot S_1(t) + 1 \cdot S_2(t) .$$

⁶ If $C(t)$ is the theoretical price of an option and $C_M(t)$ the market price, implied volatility is defined by $C(t, \sigma) = C_M(t)$.

In this example there are several risk factors. The price risk factors are the USD/DEM exchange rate and the 1-year DEM bond price. We consider these two factors. Other factors are the USD interest rate for the option and the implied volatility. The return $R$ is the sum of the bond return and the option return, i.e.

$$R = R_{Bond} + \hat R_{DEM/USD} .$$

The option return is approximated up to first order, i.e.

$$\hat R_{DEM/USD} = R_{DEM/USD} + \Delta_0 R_{DEM/USD} ,$$

where $R_{DEM/USD}$ is the return of the DEM/USD exchange rate, $\Delta_0$ is the delta of the option and $R_{Bond}$ is the price return of a 1-year government bond. The goal is to calculate $\mathrm{VaR}_{5\%,5day}$. The following daily data can be downloaded: $\sigma_{FX,1day} = 0.42\%$, $\sigma_{Bond,1day} = 0.08\%$ and $\mathrm{corr}_{FX,Bond;1day} = -0.17$; the other data are the same as in the previous examples. This implies that the delta of the option is

$$\Delta_0 = -0.4919 .$$

Since we are interested in a 5-day VaR, we use the scaling law

$$\mathrm{VaR}_{5\%,5day} = \sqrt{5}\;\mathrm{VaR}_{5\%,1day} .$$

Applying the DEM/USD exchange rate of 1.538, mark-to-market gives 591086 USD for the bond position. Mark-to-market of the second position is the difference between the FX position and the FX delta hedge, which equals 300331 USD. Finally, VaR is calculated as follows:

$$\begin{aligned} \mathrm{VaR}_{5\%,5day} &= \sqrt{5}\;\mathrm{VaR}_{5\%,1day} \\ &= \sqrt{5} \cdot 1.65 \cdot \Big[\sigma^2_{Bond,1d} + (1 + \Delta_0)^2 \sigma^2_{DEM/USD,1d} + 2(1 + \Delta_0) \cdot \mathrm{corr}_{FX,Bond;1day} \cdot \sigma_{Bond,1d} \cdot \sigma_{DEM/USD,1d}\Big]^{1/2} \cdot 591086 \text{ USD} . \end{aligned}$$

Inserting the values gives

$$\mathrm{VaR}_{5\%,5day} = 4684 \text{ USD} .$$

If the VaRs of the two assets are each calculated independently of the other factor, one gets

$$\mathrm{VaR}^{Bond}_{5\%,5day} = 1745 \text{ USD} , \qquad \mathrm{VaR}^{Option}_{5\%,5day} = 4654 \text{ USD} .$$

Hence, the VaR of the sum is smaller than the sum of the VaRs; this is the diversification effect.
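The three figures of this example can be reproduced from the quoted inputs. The sketch assumes, as the numbers suggest, that all returns are applied to the marked-to-market bond value of about 591 086 USD; the variable and function names are illustrative.

```python
import math

z = 1.65                      # 5 percent quantile
horizon_days = 5
sigma_fx = 0.0042             # daily FX volatility
sigma_bond = 0.0008           # daily bond price volatility
corr = -0.17                  # correlation between FX and bond returns
delta_option = -0.4919        # delta of the put on the DEM
bond_value_usd = 909091 / 1.538   # mark-to-market of the cash flow


def five_day_var(value, sigma_1d):
    """Square-root-of-time scaling of the one-day delta-normal VaR."""
    return math.sqrt(horizon_days) * z * sigma_1d * value


fx_weight = 1.0 + delta_option   # FX exposure net of the option delta
sigma_portfolio = math.sqrt(
    sigma_bond**2
    + (fx_weight * sigma_fx) ** 2
    + 2.0 * fx_weight * corr * sigma_bond * sigma_fx
)

var_total = five_day_var(bond_value_usd, sigma_portfolio)
var_bond = five_day_var(bond_value_usd, sigma_bond)
var_option = five_day_var(bond_value_usd, fx_weight * sigma_fx)

if __name__ == "__main__":
    print(round(var_total), round(var_bond), round(var_option))
```

The sum of the two stand-alone figures exceeds the joint figure, which is the diversification effect stated above.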

2. Downside risk measures

We assume that all random variables in this section possess a first and second moment and that the distribution functions are differentiable⁷, unless we explicitly state the regularity assumptions. These assumptions remove many technicalities of a more general approach which may obscure the basic ideas.

2.1. Properties of Value-at-Risk (VaR). If it is clear from the context which random variable is under consideration, we simply write $F$ for the distribution function. It is best to think of the random variables $Y, Y_1, Y_2$ and others introduced below as unspecified losses. Since in this section time and the portfolio strategy do not matter, we omit them in the notation. Before we state some properties of $\mathrm{VaR}_\alpha$, we define properties of risk measures in terms of preference structures induced by dominance relations.

Definition 4. (1) A relation between two random variables $Y_1, Y_2$ is of stochastic dominance of order 1, $Y_1 \prec_{sd1} Y_2$, iff

$$E[f(Y_1)] \le E[f(Y_2)] \tag{37}$$

for all integrable, monotonic functions $f$.

(2) A relation between two random variables $Y_1, Y_2$ is of stochastic dominance of order 2, $Y_1 \prec_{sd2} Y_2$, iff

$$E[f(Y_1)] \le E[f(Y_2)] \tag{38}$$

for all integrable, concave, monotonic functions $f$.

(3) A relation between two random variables $Y_1, Y_2$ is of monotonic dominance of order 1, $Y_1 \prec_{md1} Y_2$, iff

$$E[f(Y_1)] \le E[f(Y_2)] \tag{39}$$

for all integrable, concave functions $f$.

(4) Two random variables $Y_1, Y_2$ defined on a probability space $(\Omega, \mathcal{A}, P)$ are comonotone if for all elementary events $\omega, \omega' \in \Omega$

$$(Y_1(\omega) - Y_1(\omega'))(Y_2(\omega) - Y_2(\omega')) \ge 0 , \quad P\text{-a.s.} \tag{40}$$

⁷ In other words, the measures are assumed to be absolutely continuous.

Stochastic dominance of first order (SD1) is equivalent to

$$F_{Y_1}(x) \ge F_{Y_2}(x) , \quad \forall x ,$$

which means that $F_{Y_1}$ attaches at least as much probability as $F_{Y_2}$ to outcomes below any level $x$. Since we think of $Y_1, Y_2$ as losses, $Y_1$ should be preferred by a decision maker who prefers more to less. If we think of the function $f$ as a utility index, with SD1 alone we are unable to provide a complete ordering of probability distributions. In particular, SD1 does not account for the decision maker's attitude towards risk. As we will prove in Chapter ??, risk aversion is equivalent to the concavity of the utility index $f$. This leads to the introduction of stochastic dominance of order 2. Clearly, one could consider SDn for $n > 2$. These generalizations are rarely considered since the set of agents satisfying the corresponding restrictions becomes smaller and smaller.

Proposition 5. The following relations hold between the dominance criteria defined in Definition 4:

$$Y_1 \prec_{sd1} Y_2 \text{ implies } Y_1 \prec_{sd2} Y_2 \tag{41}$$

$$Y_1 \prec_{sd2} Y_2 \text{ iff } Y_1 \prec_{sd1} Y_2 \text{ and } Y_1 \prec_{md1} Y_2$$

$$Y_1 \prec_{sd2} Y_2 \text{ iff } \int_{-\infty}^{x} F_{Y_1}(u)\,du \le \int_{-\infty}^{x} F_{Y_2}(u)\,du \text{ for all } x .$$

Two random variables $Y_1, Y_2$ are comonotone if there exists a representation

$$Y_1 = f(U) , \quad Y_2 = g(U)$$

with $f, g$ monotonically increasing and $U$ a uniformly distributed random variable on $[0, 1]$.

Proof. Only the second to last statement is proven; the others are immediate, and for the final one see Wang (1997), Insurance: Mathematics and Economics, volume 2. Since

$$\int_{-\infty}^{x} F_Y(u)\,du = \int_{-\infty}^{\infty} [x - u]^+\,dF_Y(u)$$

with

$$z^+ = \max\{z, 0\} , \quad z^- = \min\{z, 0\} , \tag{42}$$

it follows that $Y_1 \prec_{sd2} Y_2$ is equivalent to

$$\int_{-\infty}^{\infty} h(u)\,dF_{Y_1}(u) \le \int_{-\infty}^{\infty} h(u)\,dF_{Y_2}(u)$$

for all functions

$$h(u) = \sum_{k=0}^{\infty} (-\alpha_k)[x_k - u]^+ , \quad \alpha_k \ge 0 .$$

The functions $h$ are dense in the set of all concave, monotone functions, which proves the claim. □

We next state basic properties of $\mathrm{VaR}_\alpha$.

Proposition 6. (1) $\mathrm{VaR}_\alpha$ is translation equivariant, i.e.

$$\mathrm{VaR}_\alpha(Y + c) = \mathrm{VaR}_\alpha(Y) + c \tag{43}$$

for any real number $c$.

(2) $\mathrm{VaR}_\alpha$ is positively homogeneous, i.e.

$$\mathrm{VaR}_\alpha(cY) = c\,\mathrm{VaR}_\alpha(Y) \tag{44}$$

for $c > 0$.

(3)

$$\mathrm{VaR}_\alpha(Y) = -\mathrm{VaR}_{1-\alpha}(-Y) . \tag{45}$$

(4) $\mathrm{VaR}_\alpha$ is monotonic w.r.t. stochastic dominance of order 1, i.e. if

$$Y_1 \prec_{sd1} Y_2 \tag{46}$$

then

$$\mathrm{VaR}_\alpha(Y_1) \le \mathrm{VaR}_\alpha(Y_2) . \tag{47}$$

(5) $\mathrm{VaR}_\alpha$ is comonotone additive, i.e. if $Y_1$ and $Y_2$ are comonotone, then

$$\mathrm{VaR}_\alpha(Y_1 + Y_2) = \mathrm{VaR}_\alpha(Y_1) + \mathrm{VaR}_\alpha(Y_2) . \tag{48}$$

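Proof. We prove only 5., the other claims being immediate from the definitions. Suppose that $Y_1 = f(U)$ with $U$ uniform on $[0, 1]$ and $f$ monotonically increasing. Then

$$\mathrm{VaR}_\alpha(Y_1) = F^{-1}_{Y_1}(\alpha) = F^{-1}_{f(U)}(\alpha) = f(\alpha) .$$

The same reasoning applies to $\mathrm{VaR}_\alpha(Y_2)$, i.e. $\mathrm{VaR}_\alpha(Y_2) = g(\alpha)$ for $g$ monotonically increasing. Since $Y_1 + Y_2 = (f + g)(U)$ and $f + g$ is again monotonically increasing, we obtain

$$\mathrm{VaR}_\alpha(Y_1 + Y_2) = f(\alpha) + g(\alpha) = \mathrm{VaR}_\alpha(Y_1) + \mathrm{VaR}_\alpha(Y_2) . \tag{49}$$

□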
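Comonotone additivity can be checked on simulated data. The following is a minimal sketch, assuming the empirical quantile (an order statistic) as the estimator of $\mathrm{VaR}_\alpha$; the functions $f(u) = e^u$ and $g(u) = u^3$ and the sample size are illustrative choices, not taken from the text.

```python
import math
import random


def empirical_var(sample, alpha):
    """Empirical alpha-quantile: the k-th order statistic with k = ceil(alpha * N)."""
    ordered = sorted(sample)
    k = min(len(ordered), max(1, math.ceil(alpha * len(ordered))))
    return ordered[k - 1]


random.seed(0)
u = [random.random() for _ in range(10_000)]   # common uniform driver U
y1 = [math.exp(x) for x in u]                  # Y1 = f(U), f increasing (assumed example)
y2 = [x ** 3 for x in u]                       # Y2 = g(U), g increasing (assumed example)
total = [a + b for a, b in zip(y1, y2)]        # Y1 + Y2 = (f + g)(U)

alpha = 0.95
lhs = empirical_var(total, alpha)
rhs = empirical_var(y1, alpha) + empirical_var(y2, alpha)
print(lhs, rhs)
```

Both numbers agree because all three samples are ordered by the same underlying value of $U$, which is the sample analogue of the argument in the proof.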

The key missing property of VaR is convexity, i.e. the incentive to diversify the portfolio. The risk measure CVaR, discussed in the next section, possesses this property.
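A small discrete example makes the lack of convexity concrete. The sketch below assumes a hypothetical loan that loses 100 with probability 4% and nothing otherwise, and compares the 95% VaR of a single loan with that of an equally weighted portfolio of two independent loans; the numbers are illustrative only.

```python
from itertools import product


def var_discrete(outcomes, alpha):
    """VaR_alpha = smallest loss x with P(Y <= x) >= alpha, for a list of (loss, prob)."""
    cumulative = 0.0
    for loss, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= alpha - 1e-12:   # small tolerance against rounding
            return loss
    return max(loss for loss, _ in outcomes)


# Hypothetical loan: lose 100 with probability 4%, otherwise lose nothing.
loan = [(0.0, 0.96), (100.0, 0.04)]

# Equally weighted portfolio of two independent such loans: 0.5 * L1 + 0.5 * L2.
portfolio = [(0.5 * l1 + 0.5 * l2, p1 * p2)
             for (l1, p1), (l2, p2) in product(loan, loan)]

alpha = 0.95
var_single = var_discrete(loan, alpha)          # 0.0
var_portfolio = var_discrete(portfolio, alpha)  # 50.0
print(var_single, var_portfolio)
```

Convexity would require the portfolio VaR to be at most $0.5 \cdot 0 + 0.5 \cdot 0 = 0$, yet it equals 50: at the 95% level the diversified position looks riskier than either loan alone.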

2.2. Conditional Value-at-Risk (CVaR). Conditional Value-at-Risk (CVaR)⁸ is defined as the optimal value of the optimization problem

$$\mathrm{CVaR}_\alpha(Y) = \inf\left\{ a + \frac{1}{1-\alpha}\,E[Y - a]^+ : a \in \mathbb{R} \right\} . \tag{50}$$

This definition of $\mathrm{CVaR}_\alpha$ turns out to be very powerful for portfolio optimization problems, since the functional to be minimized is convex in $a$, can be handled by linear programming, and does not depend on $\mathrm{VaR}_\alpha$ and its cumbersome mathematical properties.

Nevertheless, the following characterization of CVaR is very intuitive and avoids the solution of an optimization problem. In fact, this result is often used to define CVaR.

Proposition 7. Conditional Value-at-Risk can be written as the conditional expectation

$$\mathrm{CVaR}_\alpha(Y) = E[Y \mid Y \ge \mathrm{VaR}_\alpha(Y)] . \tag{51}$$

Furthermore, $\mathrm{VaR}_\alpha$ is a minimizer of (50), i.e.

$$\mathrm{VaR}_\alpha(Y) = F^{-1}(\alpha) \in \operatorname{argmin}\left\{ a + \frac{1}{1-\alpha}\,E[Y - a]^+ \right\} . \tag{52}$$

This property of $\mathrm{VaR}_\alpha$ also holds for non-differentiable distributions $F$.
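The relation between (50) and the tail average can be illustrated on a finite sample. The sketch below assumes 100 equally likely hypothetical losses 1, 2, ..., 100 and $\alpha = 0.9$. For such a discrete sample the minimum in (50) equals the mean of the worst 10% of the losses, which is the sample counterpart of (51) for continuous distributions.

```python
def objective(a, sample, alpha):
    """a + E[(Y - a)^+] / (1 - alpha) for equally likely sample points."""
    excess = sum(max(y - a, 0.0) for y in sample)
    return a + excess / ((1.0 - alpha) * len(sample))


sample = [float(k) for k in range(1, 101)]   # hypothetical losses 1, 2, ..., 100
alpha = 0.9

# The objective is piecewise linear in a with kinks at the sample points,
# so its minimum over a is attained at one of them.
cvar = min(objective(a, sample, alpha) for a in sample)

var = sorted(sample)[89]                     # empirical 90% quantile (90th order statistic)
tail_mean = sum(sorted(sample)[90:]) / 10    # mean of the ten largest losses
print(cvar, objective(var, sample, alpha), tail_mean)
```

Up to rounding, all three printed values coincide, so the empirical quantile is indeed a minimizer and the optimal value is the tail mean.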

For a proof of the first claim see Rockafellar and Uryasev (1999), and for the second one see Pflug (2000). Proposition 7 shows that $\mathrm{CVaR}_\alpha$ is a conditional tail expectation, while $\mathrm{VaR}_\alpha$ is a percentile (quantile) function. Figure ?? illustrates the two risk measures. Given Proposition 7, it is not surprising that there are close connections between $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$. It is less evident that $\mathrm{CVaR}_\alpha$ possesses economically sounder properties; we show below that $\mathrm{CVaR}_\alpha$ is a coherent risk measure in the sense of Artzner et al. (1999). In summary, given the threshold $v$ determined by $\mathrm{VaR}_\alpha$, $\mathrm{CVaR}_\alpha$ is the conditional expectation of the losses at or above $v$.

⁸ CVaR is also called Mean Excess Loss or Expected Shortfall in the literature.


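The following alternative characterizations of $\mathrm{CVaR}_\alpha$ can therefore be given:

$$\begin{aligned}
\mathrm{CVaR}_\alpha(Y) &= E[Y \mid Y \ge \mathrm{VaR}_\alpha(Y)] \\
&= \frac{1}{1-\alpha} \int_{\alpha}^{1} F^{-1}(x)\,dx \\
&= \frac{1}{1-\alpha} \int_{F^{-1}(\alpha)}^{\infty} x\,dF(x) .
\end{aligned} \tag{53}$$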
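As a numerical cross-check of (53), the sketch below assumes a standard normal $Y$ (an illustrative choice) and compares the quantile integral, evaluated with a simple midpoint rule, with the closed form of the last line, which for $N(0,1)$ reduces to the density at the quantile divided by $1 - \alpha$.

```python
from statistics import NormalDist

alpha = 0.95
std_normal = NormalDist()            # Y ~ N(0, 1), assumed for illustration
var = std_normal.inv_cdf(alpha)      # F^{-1}(alpha)

# Second line of (53): midpoint rule for (1 / (1 - alpha)) * int_alpha^1 F^{-1}(x) dx.
n = 20_000
h = (1.0 - alpha) / n
cvar_quantile = sum(std_normal.inv_cdf(alpha + (i + 0.5) * h) for i in range(n)) * h / (1.0 - alpha)

# Third line of (53): for N(0, 1), int_var^inf x dF(x) equals the density at var.
cvar_closed = std_normal.pdf(var) / (1.0 - alpha)
print(var, cvar_quantile, cvar_closed)
```

The two CVaR values agree up to the discretization error of the midpoint rule, and both exceed the quantile.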

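In the next proposition we state and prove basic properties of $\mathrm{CVaR}_\alpha$.

Proposition 8. (1) $\mathrm{CVaR}_\alpha$ is translation equivariant, i.e.

$$\mathrm{CVaR}_\alpha(Y + c) = \mathrm{CVaR}_\alpha(Y) + c \tag{54}$$

for $c$ a real number.

(2) $\mathrm{CVaR}_\alpha$ is positively homogeneous, i.e.

$$\mathrm{CVaR}_\alpha(cY) = c\,\mathrm{CVaR}_\alpha(Y) \tag{55}$$

for $c > 0$.

(3) If $Y$ has a density,

$$E[Y] = (1-\alpha)\,\mathrm{CVaR}_\alpha(Y) - \alpha\,\mathrm{CVaR}_{1-\alpha}(-Y) . \tag{56}$$

(4) $\mathrm{CVaR}_\alpha$ is monotonic w.r.t. monotonic dominance of order 2, i.e. if

$$Y_1 \prec_{md2} Y_2 \tag{57}$$

then

$$\mathrm{CVaR}_\alpha(Y_1) \le \mathrm{CVaR}_\alpha(Y_2) . \tag{58}$$

(5) $\mathrm{CVaR}_\alpha$ is monotonic w.r.t. stochastic dominance of order 2, i.e. if

$$Y_1 \prec_{sd2} Y_2 \tag{59}$$

then

$$\mathrm{CVaR}_\alpha(Y_1) \le \mathrm{CVaR}_\alpha(Y_2) . \tag{60}$$

(6) $\mathrm{CVaR}_\alpha$ is convex, i.e. for arbitrary random variables $Y_1$ and $Y_2$ and $0 < \lambda < 1$,

$$\mathrm{CVaR}_\alpha(\lambda Y_1 + (1-\lambda) Y_2) \le \lambda\,\mathrm{CVaR}_\alpha(Y_1) + (1-\lambda)\,\mathrm{CVaR}_\alpha(Y_2) . \tag{61}$$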

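Proof. Translation equivariance and positive homogeneity are immediate from the characterization of $\mathrm{CVaR}_\alpha$. To prove 3., we note

$$\begin{aligned}
\mathrm{CVaR}_{1-\alpha}(-Y) &= E[-Y \mid -Y \ge \mathrm{VaR}_{1-\alpha}(-Y)] = E[-Y \mid -Y \ge -\mathrm{VaR}_\alpha(Y)] \\
&= -E[Y \mid Y \le \mathrm{VaR}_\alpha(Y)] .
\end{aligned}$$

This implies

$$\begin{aligned}
E[Y] &= \alpha\,E[Y \mid Y \le \mathrm{VaR}_\alpha(Y)] + (1-\alpha)\,E[Y \mid Y \ge \mathrm{VaR}_\alpha(Y)] \\
&= -\alpha\,\mathrm{CVaR}_{1-\alpha}(-Y) + (1-\alpha)\,\mathrm{CVaR}_\alpha(Y) .
\end{aligned}$$

To prove the convexity property, we fix numbers $a_i$ such that

$$\mathrm{CVaR}_\alpha(Y_i) = a_i + \frac{1}{1-\alpha}\,E[Y_i - a_i]^+ . \tag{62}$$

Since the function $y \mapsto (y - a)^+$ is convex, we have

$$\begin{aligned}
\mathrm{CVaR}_\alpha(\lambda Y_1 + (1-\lambda) Y_2)
&\le \lambda a_1 + (1-\lambda) a_2 + \frac{1}{1-\alpha}\,E[\lambda Y_1 + (1-\lambda) Y_2 - \lambda a_1 - (1-\lambda) a_2]^+ \\
&\le \lambda a_1 + (1-\lambda) a_2 + \frac{\lambda}{1-\alpha}\,E[Y_1 - a_1]^+ + \frac{1-\lambda}{1-\alpha}\,E[Y_2 - a_2]^+ \\
&= \lambda\,\mathrm{CVaR}_\alpha(Y_1) + (1-\lambda)\,\mathrm{CVaR}_\alpha(Y_2) .
\end{aligned}$$

The proofs of the ordering properties follow from the fact that the function $y \mapsto (y - a)^+$ is convex and monotone. □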
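Convexity and translation equivariance can also be observed on scenario data. The sketch below assumes that both losses are observed on the same equally likely scenarios and evaluates definition (50) on the sample; the distributions, the weight $\lambda = 0.3$ and the sample size are hypothetical choices.

```python
import random


def cvar(sample, alpha):
    """Sample CVaR via definition (50); the minimum over a is attained at a sample point."""
    n = len(sample)

    def objective(a):
        return a + sum(max(y - a, 0.0) for y in sample) / ((1.0 - alpha) * n)

    return min(objective(a) for a in sample)


random.seed(1)
n, alpha, lam = 400, 0.9, 0.3
y1 = [random.gauss(0.0, 1.0) for _ in range(n)]      # hypothetical loss scenarios
y2 = [random.expovariate(1.0) for _ in range(n)]     # a skewed second loss
mix = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]

lhs = cvar(mix, alpha)
rhs = lam * cvar(y1, alpha) + (1.0 - lam) * cvar(y2, alpha)
shifted = cvar([y + 2.0 for y in y1], alpha)         # translation by c = 2
print(lhs <= rhs, shifted - cvar(y1, alpha))
```

The first printed value confirms inequality (61) on this sample, and the second is the shift $c = 2$ up to rounding, in line with (54).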

Since, in the axiomatic setup of Artzner et al. (1999), a risk measure is coherent if it is translation invariant, convex, positively homogeneous and monotonic w.r.t. stochastic dominance of order 1, $\mathrm{CVaR}_\alpha$ is a coherent risk measure in this sense for continuous random variables. Continuity is a standing assumption in this section unless otherwise specified.

The next proposition compares $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ as risk measures. Basically, the two values coincide only if the tails of the distribution are cut off. Hence, $\mathrm{VaR}_\alpha$ is insensitive to the tails of a distribution: it measures neither the decay rate of the density function for heavy losses nor the expected loss given that we are in the heavy-loss region. $\mathrm{CVaR}_\alpha$ measures the latter. If we write

$$Y^c = \min(Y, c)$$

for the right-censored variable, then for $c := \mathrm{VaR}_\alpha(Y)$ it follows that

$$\mathrm{CVaR}_\alpha(Y^c) = \mathrm{VaR}_\alpha(Y) . \tag{63}$$

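Proposition 9. (1) $\mathrm{CVaR}_\alpha(Y) \ge \mathrm{VaR}_\alpha(Y)$.

(2) $\mathrm{VaR}_\alpha(Y) = \sup\{c : \mathrm{CVaR}_\alpha(Y^c) = c\}$.

(3) If $Y$ is nonnegative, then

$$\left( \frac{E[Y^n] - (1-\alpha)\,\mathrm{CVaR}_\alpha(Y^n)}{\alpha} \right)^{1/n} \to \mathrm{VaR}_\alpha(Y), \quad (n \to \infty) .$$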

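Proof. The proof of 1. is immediate. To prove 2., we note that from the representation

$$\mathrm{CVaR}_\alpha(Y) = \frac{1}{1-\alpha} \int_{\alpha}^{1} F^{-1}(u)\,du$$

it follows that

$$\mathrm{CVaR}_\alpha(Y^c) = \frac{1}{1-\alpha} \int_{\alpha}^{1} \min(F^{-1}(u), c)\,du ,$$

which implies 2. To prove 3., we first note that for every nonnegative random variable $Z$,

$$(E[Z^n])^{1/n} \to \inf\{u : P(Z > u) = 0\}, \quad (n \to \infty) .$$

Proposition 8 implies

$$-\mathrm{CVaR}_{1-\alpha}(-Y) = \frac{1}{\alpha}\left( E[Y] - (1-\alpha)\,\mathrm{CVaR}_\alpha(Y) \right) .$$

Furthermore, a characterization similar to the one in Proposition 7 implies

$$-\mathrm{CVaR}_{1-\alpha}(-Y) = E[Y \mid Y \le F^{-1}(\alpha)] = \sup\left\{ a - \frac{1}{\alpha}\,E[Y - a]^- : a \in \mathbb{R} \right\} .$$

Finally, from

$$\left( E[Y^n \mid Y^n \le F^{-1}_{Y^n}(\alpha)] \right)^{1/n} = \left( E[Y^n \mid Y \le F^{-1}_{Y}(\alpha)] \right)^{1/n} \to \mathrm{VaR}_\alpha(Y)$$

the result follows. □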
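Identity (63) and claim (1) of Proposition 9 are easy to reproduce on a sample. The sketch assumes 100 equally likely hypothetical losses 1, ..., 100 and uses the tail average as sample CVaR, which is valid here because $N(1-\alpha) = 10$ is an integer.

```python
def tail_cvar(sample, alpha):
    """Mean of the worst (1 - alpha) share of equally likely losses (N * (1 - alpha) integer)."""
    ordered = sorted(sample)
    k = round(len(ordered) * (1.0 - alpha))
    return sum(ordered[-k:]) / k


sample = [float(k) for k in range(1, 101)]   # hypothetical losses
alpha = 0.9
var = sorted(sample)[89]                     # empirical 90% quantile
censored = [min(y, var) for y in sample]     # Y^c with c = VaR_alpha(Y)

print(tail_cvar(sample, alpha), tail_cvar(censored, alpha), var)
```

Cutting the tail off at the quantile removes exactly the information that distinguishes CVaR from VaR: the CVaR of the censored losses collapses to the VaR of the original ones.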

3. Mean-downside risk portfolio selection

3.1. Preliminaries. So far the functional form of $\mathrm{CVaR}_\alpha$ is given by

$$\mathrm{CVaR}_\alpha^\phi = \frac{1}{1-\alpha} \int_{g_\phi(y) \ge \mathrm{VaR}_\alpha^\phi} g_\phi(y)\,f(y)\,dy$$

for $f$ the density function and $g_\phi(y) = g(\phi, y)$ a loss function depending on the decision vector $\phi$ and the random vector $Y$. For the loss function $g$ one can consider the difference between a portfolio value at time 0 and at a future time $t$, given a portfolio $\phi$. Again, the probability that $g_\phi(y) \ge \mathrm{VaR}_\alpha^\phi$ is equal to $1 - \alpha$. Thus, $\mathrm{CVaR}_\alpha^\phi$ comes out as the conditional expectation of the loss associated with $\phi$, given that the loss is equal to or larger than $\mathrm{VaR}_\alpha^\phi$. The key in the optimization approach is to characterize VaR and CVaR in terms of the function

$$H_\alpha^\phi(\beta) = \beta + \frac{1}{1-\alpha} \int_{y \in \mathbb{R}^n} (g_\phi(y) - \beta)^+ f(y)\,dy . \tag{64}$$

The expression

$$G_\phi(\beta) = \int_{y \in \mathbb{R}^n} (g_\phi(y) - \beta)^+ f(y)\,dy \tag{65}$$

is the $L^1$-norm of the so-called expected regret function. It can be interpreted as a measure of under-performance of a portfolio w.r.t. a given benchmark $\beta$.

The following proposition summarizes the main features of the function $H$. We assume throughout that the probability distribution of $y$ has a density and that the probability of $g$ not exceeding a threshold $\beta$, given by

$$F_\phi(\beta) = \int_{g_\phi(y) \le \beta} f(y)\,dy$$

with $f$ the density of the vector $y$, is an everywhere continuous function of $\beta$.

Proposition 10. Given the assumptions stated above about the distribution of $y$, the function $H$ is convex and continuously differentiable. The CVaR of the loss associated with any $\phi$ can be determined from

$$\mathrm{CVaR}_\alpha^\phi = \min_{\beta \in \mathbb{R}} H_\alpha^\phi(\beta) . \tag{66}$$

The set $A_\alpha^\phi$ of values of $\beta$ which minimize the function $H_\alpha^\phi(\beta)$ is a nonempty, closed, bounded interval (perhaps reducing to a single point), and $\mathrm{VaR}_\alpha^\phi$ of the loss is the left end point of $A_\alpha^\phi$. In particular,

$$\mathrm{CVaR}_\alpha^\phi = H_\alpha^\phi(\mathrm{VaR}_\alpha^\phi) . \tag{67}$$

The power of the formulas in Proposition 10 is apparent because continuously differentiable convex functions are especially easy to minimize numerically. It also follows that $\mathrm{CVaR}_\alpha$ can be calculated without first having to calculate the $\mathrm{VaR}_\alpha$ on which its definition depends, which would be more complicated.

Proof. The proof uses the following result of Shapiro and Wardi (1994):

Lemma 11. Fix $\phi$ and define

$$G_\phi(\beta) = \int_{y \in \mathbb{R}^n} (g_\phi(y) - \beta)^+ f(y)\,dy . \tag{68}$$

Then $G_\phi$ is a convex, continuously differentiable function of $\beta$ with derivative

$$G_\phi'(\beta) = F_\phi(\beta) - 1 . \tag{69}$$

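From Lemma 11 it follows that $H$ is convex and continuously differentiable with derivative

$$\frac{\partial H_\alpha^\phi(\beta)}{\partial \beta} = 1 + \frac{1}{1-\alpha}\,(F_\phi(\beta) - 1) = \frac{1}{1-\alpha}\,(F_\phi(\beta) - \alpha) . \tag{70}$$

Therefore, the values of $\beta$ which minimize the function $H$ are those for which $F_\phi(\beta) - \alpha = 0$. Since $F_\phi$ is continuous and nondecreasing in $\beta$, they form a nonempty closed interval. This further yields the validity of the $\mathrm{VaR}_\alpha$ formula in the proposition. In particular, we have

$$\begin{aligned}
\min_{\beta \in \mathbb{R}} H_\alpha^\phi(\beta)
&= H_\alpha^\phi(\mathrm{VaR}_\alpha^\phi) = \mathrm{VaR}_\alpha^\phi + \frac{1}{1-\alpha} \int_{y \in \mathbb{R}^n} (g_\phi(y) - \mathrm{VaR}_\alpha^\phi)^+ f(y)\,dy \\
&= \mathrm{VaR}_\alpha^\phi + \frac{1}{1-\alpha} \int_{g_\phi(y) \ge \mathrm{VaR}_\alpha^\phi} (g_\phi(y) - \mathrm{VaR}_\alpha^\phi)\,f(y)\,dy \\
&= \mathrm{VaR}_\alpha^\phi + \frac{1}{1-\alpha} \int_{g_\phi(y) \ge \mathrm{VaR}_\alpha^\phi} g_\phi(y)\,f(y)\,dy - \frac{\mathrm{VaR}_\alpha^\phi}{1-\alpha} \int_{g_\phi(y) \ge \mathrm{VaR}_\alpha^\phi} f(y)\,dy \\
&= \mathrm{VaR}_\alpha^\phi + \frac{1}{1-\alpha}\,(1-\alpha)\,\mathrm{CVaR}_\alpha^\phi - \frac{\mathrm{VaR}_\alpha^\phi}{1-\alpha}\,(1 - F_\phi(\mathrm{VaR}_\alpha^\phi)) \\
&= \mathrm{VaR}_\alpha^\phi + \mathrm{CVaR}_\alpha^\phi - \frac{\mathrm{VaR}_\alpha^\phi}{1-\alpha}\,(1-\alpha) \\
&= \mathrm{CVaR}_\alpha^\phi .
\end{aligned} \tag{71}$$

This proves the $\mathrm{CVaR}_\alpha$ formula in the proposition and concludes the proof. □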

The integral $H_\alpha^\phi(\beta)$ can be approximated in various ways, which will become important in the optimization problems discussed below. Suppose, for example, that the probability distribution of $y$ is sampled, generating a sequence of vectors $y_1, \ldots, y_s$. Then the corresponding approximation to the integral $H_\alpha^\phi(\beta)$ reads

$$\tilde H_\alpha^\phi(\beta) = \beta + \frac{1}{s(1-\alpha)} \sum_{k=1}^{s} (g_\phi(y_k) - \beta)^+ . \tag{72}$$

The expression $\tilde H$ is piecewise linear and convex in $\beta$, but not differentiable w.r.t. $\beta$. Using a line search algorithm, or representing the function in terms of an extended linear program, it can nevertheless readily be minimized.
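For a fixed portfolio the minimization of (72) does not even need a line search: since the function is piecewise linear with kinks at the scenario losses, the left end point of the set of minimizers is an order statistic. The sketch below assumes the linear loss $g_\phi(y) = -\langle \phi, y \rangle$ and five hypothetical return scenarios for two assets; names and data are illustrative.

```python
import math


def h_tilde(beta, losses, alpha):
    """Sample version (72): beta + sum_k (loss_k - beta)^+ / (s * (1 - alpha))."""
    s = len(losses)
    return beta + sum(max(g - beta, 0.0) for g in losses) / (s * (1.0 - alpha))


def var_cvar(losses, alpha):
    """Left end point of the minimizers of h_tilde and the minimum value."""
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered) - 1e-9)   # smallest k with k / s >= alpha
    var = ordered[k - 1]
    return var, h_tilde(var, losses, alpha)


# Hypothetical return scenarios y_1, ..., y_5 for two assets and a portfolio phi.
scenarios = [(0.02, -0.01), (-0.03, 0.01), (0.01, 0.00), (-0.05, -0.02), (0.04, 0.03)]
phi = (0.5, 0.5)
losses = [-(phi[0] * y[0] + phi[1] * y[1]) for y in scenarios]   # g_phi(y) = -<phi, y>

var, cvar = var_cvar(losses, alpha=0.8)
print(var, cvar)
```

With $\alpha = 0.8$ and five scenarios the VaR is the fourth smallest loss and the CVaR is the single worst loss, so VaR is obtained as a by-product of evaluating the CVaR.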

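The next proposition captures other advantages of $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ represented through the formulas of Proposition 10.

Proposition 12. Minimizing $\mathrm{CVaR}_\alpha$ is equivalent to minimizing the function $H$ in the following sense:

$$\min_{\phi \in X} \mathrm{CVaR}_\alpha^\phi = \min_{(\phi, \beta)} H_\alpha^\phi(\beta) . \tag{73}$$

A pair $(\phi^*, \beta^*)$ achieves the second minimum if and only if $\phi^*$ achieves the first minimum and $\beta^* \in A_\alpha^{\phi^*}$. Therefore, if the set $A_\alpha^{\phi^*}$ consists of a single point, the minimization of $H_\alpha^\phi(\beta)$ over $(\phi, \beta)$ produces a pair $(\phi^*, \beta^*)$ such that

$$\phi^* \text{ minimizes the } \mathrm{CVaR}_\alpha, \qquad \beta^* \text{ gives the corresponding } \mathrm{VaR}_\alpha . \tag{74}$$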

The proof is straightforward and omitted (see Rockafellar and Uryasev (1999)). We note that the set $A_\alpha^\phi$ consists of a single point in most applications. In such cases, according to the proposition, it is not necessary to work directly with the function $\mathrm{CVaR}_\alpha$ in order to determine the strategy (action) $\phi$ which minimizes $\mathrm{CVaR}_\alpha$. Working with $\mathrm{CVaR}_\alpha$ directly may be hard, because $\mathrm{VaR}_\alpha$ is involved in its definition and VaR has troublesome mathematical properties. Instead, one can work with $H_\alpha^\phi(\beta)$, which is convex in $\beta$ and in most applications also in $(\phi, \beta)$.


3.2. Mean-downside risk portfolio selection. We apply the results of the last section to the VaR and CVaR portfolio optimization problems. To this end, we change our notation so that it is compatible with the notation used in the former portfolio optimization problems. The random variable $Y$ in the general setup of Sections 2.1 and 2.2, or the more specific loss function $g(\phi, Y)$ of the last section, is now specified through the gain

$$-Y = -g(\phi, Y) = \langle \phi, R \rangle$$

with $\phi$ the $n$-dimensional portfolio vector and $R$ the $n$-dimensional return vector. The mean $E[R]$ is abbreviated by $\mu$.

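The mean-$\mathrm{VaR}_\alpha$ optimization problem reads

$$\begin{aligned}
\min_{\phi} \ & \mathrm{VaR}_\alpha(-\langle \phi, R \rangle) \qquad \text{(VAR)} \\
\text{s.t.} \ & \langle \phi, \mu \rangle \ge \mu_0 \\
& \langle \phi, e \rangle = 1 \\
& \phi \ge 0 .
\end{aligned} \tag{75}$$

The mean-$\mathrm{CVaR}_\alpha$ optimization problem reads

$$\begin{aligned}
\min_{\phi} \ & \mathrm{CVaR}_\alpha(-\langle \phi, R \rangle) \qquad \text{(CVAR}^*\text{)} \\
\text{s.t.} \ & \langle \phi, \mu \rangle \ge \mu_0 \\
& \langle \phi, e \rangle = 1 \\
& \phi \ge 0 .
\end{aligned} \tag{76}$$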

(1) The properties in Propositions 6 and 8, respectively, imply that the minimization is equivalent to the maximization of the objective functions $\mathrm{VaR}_{1-\alpha}(\langle \phi, R \rangle)$ and $\mathrm{CVaR}_{1-\alpha}(\langle \phi, R \rangle)$, respectively.

(2) VaR and CVaR are usually defined in monetary units and not in percentage returns as we do here. Hence, we consider the case where there is a one-to-one relationship between percentage return and monetary value. If portfolios have zero net investment, this relationship may not hold. Since one goal is to compare the solution properties of the two programs above with the mean-variance (M) problem of Markowitz, it is consistent to use percentage values.

(3) In the programs the goal was to minimize risk while requiring a minimum expected return. In practice, one often encounters the situation where expected returns are to be maximized while not allowing large risks. As we have shown in the mean-variance setup, the two problems can be transformed into each other such that they lead to the same efficient frontier. The next proposition states this property for arbitrary convex risk measures.

Proposition 13. Consider a risk function $R_\phi(Y)$ and a reward function $W_\phi(Y)$, which depend on a decision vector $\phi$ and on a random variable $Y$. Consider the problems

$$\min_{\phi} \ R_\phi(Y) - \lambda W_\phi(Y), \quad \phi \in X, \ \lambda \ge 0,$$

$$\min_{\phi} \ R_\phi(Y), \quad W_\phi(Y) \ge W_0, \ \phi \in X,$$

$$\max_{\phi} \ W_\phi(Y), \quad R_\phi(Y) \le R_0, \ \phi \in X .$$

There are one-to-one relationships between the three parameters $\lambda$, $W_0$ and $R_0$ such that, for the risk measure $R_\phi(Y)$ convex, the reward function $W_\phi(Y)$ concave and $X$ a convex set, the three problems generate the same efficient frontier.

Since the proof is similar to the case in the mean-variance setup, we omit it.

A major drawback of the program (VAR) is its non-convexity; several local minima may occur. Non-convex optimization problems are in general very difficult to solve numerically, since local search methods can get trapped in a local minimum. Hence, such choices of risk measures in portfolio selection should be avoided. If we also recall the other unpleasant features of VaR as a measure of risk, there is no big loss in doing so.

Why is the (CVAR∗) program better suited, given that conditional VaR uses VaR in its definition? To understand why the mean-CVaR optimization program possesses nice features for optimization, we refer to the proper definition (50) of CVaR and not to the representation using conditioning on VaR. The mean-$\mathrm{CVaR}_\alpha$ optimization problem then reads

$$\begin{aligned}
\min_{\phi, a} \ & a + \frac{1}{1-\alpha}\,E[Z] \qquad \text{(CVAR)} \\
\text{s.t.} \ & Z \ge -\langle \phi, R \rangle - a, \ \text{with probability } 1 \\
& \langle \phi, \mu \rangle \ge \mu_0 \\
& \langle \phi, e \rangle = 1 \\
& Z \ge 0 \\
& \phi \ge 0 .
\end{aligned} \tag{77}$$

The program (CVAR) is a linear program in $\phi$ and $a$ (with the auxiliary variable $Z$), and due to Proposition ?? it is equivalent to the original one; compared with (CVAR∗), the number of decision variables increases by one, namely $a$. Since the constraints are linear, the feasible set is a convex polyhedron, and the general properties of convex programs imply that every local optimum is a global one. This crucial property is missing in the program (VAR). The properties of the program do not depend on any assumptions about the distribution of the random variables. The drawback of (CVAR) is its (typically) infinite dimensionality. For example, if the sample space is not finite,

$$Z(\omega) \ge -\langle \phi, R(\omega) \rangle - a, \ \text{with probability } 1$$

is not a finite set of constraints. The infinite dimensionality is not a serious problem for practical purposes, since the random variable $Y$ is then assumed to be discrete, taking the values

$$-\langle \phi, R_i \rangle, \quad i = 1, \ldots, N \tag{78}$$

with all values having equal probability. The vectors $R_i$ are called scenarios. Clearly, other possibilities to reduce the problem to a finite dimensional one could be used. The discrete version of the (CVAR) program therefore reads

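$$\begin{aligned}
\min_{\phi, a} \ & a + \frac{1}{1-\alpha}\,\frac{1}{N} \sum_{i=1}^{N} z_i \qquad \text{(CVARdiscrete)} \\
\text{s.t.} \ & z_i \ge -\langle \phi, R_i \rangle - a, \quad i = 1, \ldots, N \\
& \langle \phi, \mu \rangle \ge \mu_0 \\
& \langle \phi, e \rangle = 1 \\
& z_i \ge 0, \quad i = 1, \ldots, N \\
& \phi \ge 0 .
\end{aligned} \tag{79}$$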

Finally, the program (CVARdiscrete) is a finite dimensional linear optimization problem which can be solved using any linear programming (LP) algorithm.
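The program (CVARdiscrete) can be handed to an off-the-shelf LP solver. The following is a minimal sketch, assuming NumPy and SciPy's `linprog` with the HiGHS backend are available; the scenario matrix is simulated, and the helper name `min_cvar_portfolio` and the choice of $\mu_0$ are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def min_cvar_portfolio(R, alpha, mu0):
    """Solve (CVARdiscrete) for an N x n scenario matrix R with equally likely rows."""
    N, n = R.shape
    mu = R.mean(axis=0)
    # Decision vector x = (phi_1..phi_n, a, z_1..z_N).
    cost = np.concatenate([np.zeros(n), [1.0], np.full(N, 1.0 / ((1.0 - alpha) * N))])
    # z_i >= -<phi, R_i> - a   <=>   -R_i phi - a - z_i <= 0
    A_scen = np.hstack([-R, -np.ones((N, 1)), -np.eye(N)])
    # <phi, mu> >= mu0         <=>   -mu phi <= -mu0
    A_ret = np.concatenate([-mu, [0.0], np.zeros(N)])[None, :]
    A_ub = np.vstack([A_scen, A_ret])
    b_ub = np.concatenate([np.zeros(N), [-mu0]])
    A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(N)])[None, :]   # <phi, e> = 1
    bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * N        # a is free
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[n], res.fun   # portfolio, a (a VaR estimate), CVaR


rng = np.random.default_rng(0)
# Hypothetical scenarios: a nearly riskless asset and two risky ones.
R = rng.normal(loc=[0.001, 0.004, 0.008], scale=[0.0005, 0.02, 0.05], size=(500, 3))
mu0 = float(R.mean(axis=0).mean())   # return target reachable by equal weights
phi, a, cvar = min_cvar_portfolio(R, alpha=0.9, mu0=mu0)
print(phi, a, cvar)
```

The optimal value of $a$ is a VaR estimate for the optimal portfolio and comes for free with the solution. The dense constraint matrix is fine for a few hundred scenarios; for larger problems a sparse matrix would be the natural choice.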

3.3. Example. We apply the theory of the last sections to a one-period optimization problem with transaction costs. We denote by $\phi_0$ the $n$-dimensional vector of the initial portfolio, and $\phi$ is the optimal portfolio we are looking for. Contrary to the examples considered so far, we work with absolute values in this example. We write $S(0)$ for the vector of initial prices and $S(1)$ for the scenario-dependent prices at time $t = 1$. Therefore, the loss function reads

$$g = -\langle \phi, S(1) \rangle + \langle \phi_0, S(0) \rangle .$$

The return performance function, given the strategy $\phi$, is the negative of the expected portfolio value at the end of the period divided by the initial value, so that minimizing it maximizes the expected return:

$$E[R_\phi] = -\frac{\langle \phi, E[S(1)] \rangle}{\langle \phi, S(0) \rangle} . \tag{80}$$

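The first asset is assumed to be cash $S_1$ with a certain return $R_{cash}$. In the optimization program, several restrictions are introduced. First, we consider a balance constraint which maintains the value of the portfolio net of the transaction costs. We assume that these costs are linear, i.e. proportional to the value of the shares traded. Non-linear costs are much more cumbersome. If $c_i$ are the marginal costs of instrument $i$, the balance constraint is

$$\langle S(0), \phi_0 \rangle = \langle S(0), \phi \rangle + \langle cS(0), |\phi_0 - \phi| \rangle , \tag{81}$$

where $cS(0)$ denotes the componentwise product. Writing $\underline{\delta}$ for the quantities sold and $\overline{\delta}$ for the quantities bought, so that $|\phi_0 - \phi| = \underline{\delta} + \overline{\delta}$, the problem can be reformulated in the form of linear constraints

$$\langle S(0), \phi_0 \rangle = \langle S(0), \phi \rangle + \langle cS(0), \underline{\delta} + \overline{\delta} \rangle , \quad \phi_0 - \underline{\delta} + \overline{\delta} = \phi , \quad \underline{\delta} \ge 0, \ \overline{\delta} \ge 0 . \tag{82}$$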
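The linearization of the absolute value in the balance constraint can be illustrated with a few numbers. The sketch below assumes three hypothetical instruments, the first being cash without trading costs, and splits the trade into nonnegative buy and sell quantities; all data and function names are illustrative.

```python
def split_trades(phi0, phi):
    """Split phi - phi0 into nonnegative buys and sells, so that phi = phi0 - sells + buys."""
    buys = [max(new - old, 0.0) for old, new in zip(phi0, phi)]
    sells = [max(old - new, 0.0) for old, new in zip(phi0, phi)]
    return buys, sells


def transaction_cost(prices, cost_rates, buys, sells):
    """Proportional cost <c S(0), buys + sells>, which equals <c S(0), |phi0 - phi|>."""
    return sum(c * s * (b + d) for c, s, b, d in zip(cost_rates, prices, buys, sells))


# Hypothetical data: three instruments, the first one being cash (no trading cost).
prices = [1.0, 50.0, 20.0]
cost_rates = [0.0, 0.01, 0.01]
phi0 = [1000.0, 0.0, 10.0]
phi = [594.0, 10.0, 5.0]

buys, sells = split_trades(phi0, phi)
cost = transaction_cost(prices, cost_rates, buys, sells)
value_before = sum(s * q for s, q in zip(prices, phi0))
value_after = sum(s * q for s, q in zip(prices, phi))
print(cost, value_before, value_after + cost)
```

In this example the initial value of 1200 equals the new portfolio value plus the trading cost, so the balance constraint holds, and no instrument is bought and sold at the same time.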

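In addition there is the non-linear constraint $\underline{\delta}_i \overline{\delta}_i = 0$ (with $\underline{\delta}_i$ the quantity sold and $\overline{\delta}_i$ the quantity bought of security $i$), which excludes buying and selling the same security simultaneously. This constraint can be neglected, since it is never optimal to act in this way. The next constraint is the bounded position change assumption

$$0 \le \underline{\delta}_i \le \underline{\delta}_i^{max}, \quad 0 \le \overline{\delta}_i \le \overline{\delta}_i^{max}, \quad i = 1, \ldots, n . \tag{83}$$

We further allow for bounds on positions

$$\underline{\phi}_i \le \phi_i \le \overline{\phi}_i, \quad i = 1, \ldots, n . \tag{84}$$

The next constraint is a value constraint: we do not allow an instrument $i$ to constitute more than a given percentage $\nu_i$ of the portfolio value, i.e.

$$\phi_i S_i(1) \le \nu_i \langle \phi, S(1) \rangle . \tag{85}$$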


Current regulations impose capital requirements for investments proportional to the VaR of the portfolio. Since $\mathrm{CVaR} \ge \mathrm{VaR}$, and because of the nice features of optimizing conditional VaR, we impose a CVaR constraint

$$\mathrm{CVaR}_\alpha(\langle \phi, R \rangle) \le \kappa . \tag{86}$$

Proposition (X) implies that instead of $\mathrm{CVaR}_\alpha$ we can use the function $H_\alpha^\phi(\beta)$, i.e. the constraint

$$H_\alpha^\phi(\beta) \le \kappa . \tag{87}$$

The next step is to approximate the integral by its discrete form, i.e.

$$\tilde H_\alpha^\phi(\beta) = \beta + \Gamma \sum_{j=1}^{J} (g_\phi(Y_j) - \beta)^+ \tag{88}$$

with $\Gamma = \frac{1}{J(1-\alpha)}$ and $J$ samples of the vector $Y$. Since in our case the loss function $g$ is linear w.r.t. $\phi$, the function $\tilde H_\alpha^\phi(\beta)$ is convex and piecewise linear. As a final step, the piecewise linear CVaR constraint is replaced by a set of linear constraints. To this end, dummy variables $z_j$, $j = 1, \ldots, J$, are introduced and the linear system reads

$$\beta + \Gamma \sum_{j=1}^{J} z_j \le \kappa , \qquad z_j \ge g_\phi(y_j) - \beta , \quad z_j \ge 0 , \quad j = 1, \ldots, J . \tag{89}$$

Therefore, the optimization problem is

$$\min_{\phi, \beta} \ E[R_\phi] = -\frac{\langle \phi, E[S(1)] \rangle}{\langle \phi, S(0) \rangle} \tag{90}$$

s.t. (82), (83), (84), (85), (89).

Solving this linear program, we get

$\phi^*$ : optimal portfolio vector

$\mathrm{CVaR}_\alpha^*$ : optimal CVaR

$\mathrm{VaR}_\alpha^*$ : optimal VaR

$E[R_{\phi^*}]$ : maximum expected return.

We see from this list that the VaR is obtained as a by-product.

It is worth noting the size of the LP: there are $3n + J + 1$ variables and $2(n + 1) + J$ constraints, without counting the bounds for $\underline{\delta}$, $\overline{\delta}$ and the position bounds. The system of linear constraints, written in matrix form, leads to the constraint matrix. The number of nonzero elements in this matrix is $6n + nJ + n^2 + 3J + 1$. It follows that the number of nonzero elements in the constraint matrix grows quadratically in the number of securities and, for a fixed number of securities, linearly in the number of scenarios.

Scenarios are used to approximate the integral of the CVaR function. We describe one approach using historical data without assuming a particular distribution. Consider $n$ time series for the $n$ assets. Each series is divided into $J$ scenarios, where the period length equals the time horizon $\Delta t$ in the portfolio optimization. The historical return $r^h_{ij}$ in scenario $j$ for asset $i$ is then

$$r^h_{ij} = \frac{S_i(t + \Delta t)}{S_i(t)} .$$

The expected return for each instrument is then

$$E[r^h_i] = \frac{1}{J} \sum_{j=1}^{J} r^h_{ij} .$$

End of period prices and expected end of period prices are then

$$S^h_{ij}(t + \Delta t) = S_i(0)\,r^h_{ij} , \qquad E[S^h_{ij}(t + \Delta t)] = \frac{1}{J} \sum_{j=1}^{J} S^h_{ij}(t + \Delta t) .$$
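The scenario construction above can be sketched in a few lines. The example assumes that scenarios are formed from non-overlapping windows of length `step` in a single price history; the price series, the current price and the function name are hypothetical.

```python
def historical_scenarios(prices, step):
    """Gross returns S(t + step) / S(t) over non-overlapping windows of one price series."""
    return [prices[t + step] / prices[t] for t in range(0, len(prices) - step, step)]


# Hypothetical price history of one asset, sampled at the optimization horizon (step = 1).
history = [100.0, 102.0, 99.96, 104.958]
current_price = 50.0

returns = historical_scenarios(history, step=1)           # r_ij for j = 1, ..., J
expected_return = sum(returns) / len(returns)             # E[r_i]
scenario_prices = [current_price * r for r in returns]    # S_ij(t + dt) = S_i(0) * r_ij
expected_price = sum(scenario_prices) / len(scenario_prices)
print(returns, expected_return, scenario_prices, expected_price)
```

Applying the same function to each of the $n$ series yields the scenario price vectors that enter the loss function of the example.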

In the sequel, we consider the stocks of the S&P 100 index with a two-week time horizon ($\Delta t = 10$ days). The following limits were set:

$$\underline{\phi}_i = 0 , \quad \overline{\phi}_i = \infty , \quad \forall i,$$

i.e. no short selling is allowed,

$$\underline{\delta}^{max} = \overline{\delta}^{max} = \infty ,$$

i.e. unlimited changes in the positions are possible,

$$\nu_i = 20\% ,$$

i.e. at most 20 percent of the portfolio value may be invested in a single asset. The return of cash was set to 0.16 over two weeks. We consider $\alpha = 0.90$. In the following table, a risk level of 0.1 means that we allow for a 10% loss of the initial portfolio value with a probability of 10%. Finally, the initial portfolio contained only cash with a value of 10 000 USD.

Risk      1%    2%    3%    4%    5%    6%    7%    8%    9%   10%
Cash    2000   960     0     0     0     0     0     0     0     0
AA         1     1     0     0     0     0     0     0     0     0
AIT        0    11    24    24    28    28    17     0     0     0
AVP        0     0   0.6     0     0     0     0     0     0     0
BAX        4     0     0     0     0     0     0     0     0     0
BEL        3     2     3     0     0     0     0     0     0     0
CSC        0   0.6     0     0     0     0     0     0     0     0
CSCO       0     0     0     3     7    21    30    30    30    30
ETR       33    10     0     0     0     0     0     0     0     0
GD        10    13     8     0     0     0     0     0     0     0
HM        50     0     0     0     0     0     0     0     0     0
IBM        5    10    11     5     3     0   1.8   1.3     0     9
IFF        1     0     0     0     0     0     0     0     0     0
LTD        5     5     3     0     0     0     0     0     0     0
MOB        4     5     0     0     0     0     0     0     0     0
MSFT       0     0     0     0     0     0     0     0     0     7
MTC        0     0     0     0     0     0     0     0     0     0
SO        26    11     0     0     0     0     0     0     0     0
T         11    21    31    35    35    35    35    35    17     0
TAN        8    18    28    32    37    37    37    37    37    37
TXN        0     0     0     0     0     0     0    12    26    26
UCM       47    47    47    47    29    13     0     0     0     0
UIS        0     0     9    27    40    45    45    45    45    45
WMT        0     0     1     6     0     0    11    18    22     0

Portfolio configuration: number of shares of stock in the optimized portfolio depending on the risk level. To simplify the numbers, for optimal numbers of shares greater than 1, the decimal digits were set equal to zero.


The table shows the following facts:

(1) The higher the allowed risk level, the lower the number of assets in the portfolio: assets with a higher return become more important. Conversely, diversifying the portfolio reduces risk.

(2) Small changes in the allowed risk level have a strong impact on the resulting portfolio. Therefore, it is important to think carefully about the appropriate allowed risk level.

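3.4. Mean-downside risk portfolio selection, the case of normal distributions. In many applications people assume that the random vectors are normally distributed. Therefore, it is worth considering this case. We denote by $X$ the set of admissible portfolios, i.e.

$$X = \left\{ \phi \in \mathbb{R}^n \ \Big| \ \phi_k \ge 0 \ \forall k, \ \sum_{k} \phi_k = 1, \ \langle \phi, \mu \rangle \ge r^* \right\} . \tag{91}$$

We then consider the following three problems:

$$\min_{\phi \in X} \ \mathrm{CVaR}_\alpha(\langle \phi, R \rangle) , \quad \text{(CVAR)}, \tag{92}$$

$$\min_{\phi \in X} \ \mathrm{VaR}_\alpha(\langle \phi, R \rangle) , \quad \text{(VAR)}, \tag{93}$$

and

$$\min_{\phi \in X} \ \mathrm{var}(\langle \phi, R \rangle) , \quad \text{(M)} . \tag{94}$$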

Proposition 14. Suppose that $Y$ is normally distributed, so that the loss associated with each strategy $\phi$ is also normally distributed. Suppose furthermore that $\alpha \ge 0.5$ and that the constraint

$$\langle \phi, \mu \rangle = r^*$$

is active at the solutions of any two of the three problems (CVAR), (VAR), (M). Then the solutions of those two problems are the same, i.e. a common portfolio $\phi^*$ is optimal for both criteria under consideration.

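Proof. Expressing the VaR and CVaR under the normality assumption implies

$$\mathrm{VaR}_\alpha(Y) = \langle \phi, \mu \rangle + c_1(\alpha) \sqrt{\mathrm{var}(\langle \phi, R \rangle)} , \qquad \mathrm{CVaR}_\alpha(Y) = \langle \phi, \mu \rangle + c_2(\alpha) \sqrt{\mathrm{var}(\langle \phi, R \rangle)}$$

with

$$c_1(\alpha) = \sqrt{2}\,\mathrm{erf}^{-1}(2\alpha - 1) \tag{95}$$

$$c_2(\alpha) = \left( \sqrt{2\pi}\,\exp\!\left( (\mathrm{erf}^{-1}(2\alpha - 1))^2 \right) (1 - \alpha) \right)^{-1}$$

$$\mathrm{erf}(y) = \frac{2}{\sqrt{\pi}} \int_{0}^{y} e^{-t^2}\,dt . \tag{96}$$

If the return constraint is binding, we get

$$\mathrm{VaR}_\alpha(Y) = r^* + c_1(\alpha) \sqrt{\mathrm{var}(\langle \phi, R \rangle)} , \qquad \mathrm{CVaR}_\alpha(Y) = r^* + c_2(\alpha) \sqrt{\mathrm{var}(\langle \phi, R \rangle)} .$$

Since $c_1(\alpha) \ge 0$ and $c_2(\alpha) > 0$ for $\alpha \ge 0.5$, minimizing either of these expressions with respect to $\phi$ is evidently the same as minimizing the variance $\sigma^2$ over the same set $X' \subset X$, where the return inequality constraint is replaced by an equality. □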
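The constants $c_1(\alpha)$ and $c_2(\alpha)$ are easy to evaluate. The sketch below assumes Python's `statistics.NormalDist` and uses the equivalent expressions $c_1 = \Phi^{-1}(\alpha)$ and $c_2 = \varphi(c_1)/(1-\alpha)$, with $\Phi$ and $\varphi$ the standard normal distribution function and density; the helper name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_var_cvar(mean, std, alpha):
    """VaR and CVaR of a normally distributed quantity with the given mean and std."""
    c1 = NormalDist().inv_cdf(alpha)              # equals sqrt(2) * erfinv(2 * alpha - 1)
    c2 = NormalDist().pdf(c1) / (1.0 - alpha)     # equals 1 / (sqrt(2 pi) exp(c1^2 / 2) (1 - alpha))
    return mean + c1 * std, mean + c2 * std, c1, c2


var, cvar, c1, c2 = normal_var_cvar(mean=0.0, std=1.0, alpha=0.95)
print(c1, c2)
```

For $\alpha = 0.95$ this gives $c_1 \approx 1.645$ and $c_2 \approx 2.063$. Both constants are nonnegative for $\alpha \ge 0.5$, which is what makes VaR and CVaR increasing functions of the standard deviation once the mean is fixed.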

4. Further topics in VaR analysis: Asymptotic analysis for large portfolios

TO BE WRITTEN


5. Further topics in VaR and CVaR analysis: Sensitivity analysis, stress testing and back testing

TO BE WRITTEN

6. Summary and critique

TO BE WRITTEN

• Mean-CVaR portfolio optimization fulfills the requirements of the initial question.

• Mean-VaR portfolio optimization should be avoided (for economic and mathematical reasons).

• Under the normality assumption, the Mean-VaR, Mean-CVaR and Mean-Variance problems are equivalent (i.e. there exist parameters such that Mean-Variance optimization can be used).

• We did not tackle the problem of aggregating risk for different business units with different time horizons. VaR and CVaR have "bad" scaling properties!

• One-period world. The mathematical programs are not intended for dynamic setups or for data with specific patterns (intraday trading).