
Convex Stochastic optimization WiSe 2019/2020

Ari-Pekka Perkkiö

February 17, 2020

Contents

1 Practicalities and background

2 Introduction

2.1 Examples

3 Convex sets

3.1 Separation theorems

4 Convex functions

4.1 Lower semicontinuity, recessions and directional derivatives

4.2 Continuity of convex functions

5 Convex normal integrands

6 Integral functionals

7 Dynamic programming

7.1 Conditional expectation of a normal integrand

7.2 Existence of solutions


7.3 Kernels and conditioning

7.4 Examples

8 Conjugate duality

8.1 Convex conjugates

8.2 Conjugate duality in optimization

8.3 Dual spaces of random variables

8.4 Conjugate duality in stochastic optimization

8.5 Relaxation

9 Examples

10 Absence of duality gap


1 Practicalities and background

Upon passing the exam, attending and solving the exercises gives a bonus to the final grade.

We assume that the following concepts are familiar:

1. Vector spaces, topology.

2. Probability space, random variables, expectation, convergence theorems.

3. Conditional expectations, martingales.

4. The fundamentals of discrete time financial mathematics.

2 Introduction

Let (Ω, F, P) be a probability space with a filtration (F_t)_{t=0}^T of sub-σ-algebras of F and consider the dynamic stochastic optimization problem

minimize   Eh(x) := ∫ h(x(ω), ω) dP(ω)   over x ∈ N,        (SP)

where, for given integers n_t and m,

N := {(x_t)_{t=0}^T | x_t ∈ L^0(Ω, F_t, P; R^{n_t})}

is the space of adapted processes and h is an extended real-valued B(R^n) ⊗ F-measurable function, where n := n_0 + · · · + n_T. Here and in what follows, we define the expectation of a measurable function φ as +∞ unless the positive part φ^+ is integrable¹. The function Eh is thus a well-defined extended real-valued function on N.

We will assume throughout that the function h(·, ω) is convex for every ω ∈ Ω. Then Eh is a convex function on N and (SP) is a convex stochastic optimization problem on the space of adapted processes.

The aim of this course is to analyze (SP) using dynamic programming and conjugate duality. These lead to characterizations of optimal solutions of (SP) via "Bellman equations" and "Karush–Kuhn–Tucker conditions", both of which are starting points of various modern numerical methods. Duality theory also leads to characterizations and lower bounds of the optimal value of (SP), a classical example being that the superhedging price of an option in a liquid market can be computed via "martingale measures".

¹ In particular, the sum of extended real numbers is defined as +∞ if any of the terms equals +∞.


2.1 Examples

Example 2.1 (Mathematical programming). Consider the problem

minimize   Ef_0(x)   over x ∈ N
subject to f_j(x) ≤ 0  P-a.s., j = 1, . . . , m,

where the f_j are normal integrands. The problem fits the general framework with

h(x, ω) =
  f_0(x, ω)   if f_j(x, ω) ≤ 0 for j = 1, . . . , m,
  +∞          otherwise.

When each f_0(·, ω), . . . , f_m(·, ω) is an affine function, the problem becomes a linear stochastic optimization problem.

Example 2.2 (Optimal investment). Let s = (s_t)_{t=0}^T be an adapted R^J-valued stochastic process describing the unit prices of assets in a perfectly liquid financial market. Consider the problem of finding a dynamic trading strategy x = (x_t)_{t=0}^T that provides the "best hedge" against a financial liability of delivering a random amount u ∈ L^0 of cash at time T. If we measure our risk preferences over random cash-flows with the "expected shortfall" associated with a nondecreasing convex "loss function" V : R → R, the problem can be written as

minimize   EV(u − ∑_{t=0}^{T−1} x_t · Δs_{t+1})   over x ∈ N_D,        (2.1)

where N_D denotes the set of adapted trading strategies x = (x_t)_{t=0}^T that satisfy the portfolio constraints x_t ∈ D_t for all t = 0, . . . , T almost surely. Here D_t is a random F_t-measurable set consisting of the portfolios we are allowed to hold over the time period (t, t + 1].

The problem fits the general framework with

h(x, ω) =
  V(u(ω) − ∑_{t=0}^{T−1} x_t · Δs_{t+1}(ω))   if x_t ∈ D_t(ω) for t = 0, . . . , T,
  +∞                                            otherwise.

This example can be extended to a semi-static hedging problem, where some (or all) of the assets are allowed to be traded only at the initial time t = 0. It is also possible to allow some of the assets to be "American type options". The above could also be readily extended by allowing the loss function V to be random or by adding transaction costs.
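As a rough numerical illustration of the objective in (2.1), the following sketch estimates the expected shortfall of a given trading strategy from sampled price scenarios. The arrays, the buy-and-hold strategy and the choice V(y) = y⁺ are illustrative assumptions, not part of the notes.

```python
import numpy as np

def expected_shortfall(x, s, u, V=lambda y: np.maximum(y, 0.0)):
    """Monte Carlo estimate of E V(u - sum_t x_t . delta s_{t+1}).

    x : array (T, J, N)   -- trading strategy x_t per scenario
    s : array (T+1, J, N) -- asset prices s_t per scenario
    u : array (N,)        -- liability payable at time T
    V : nondecreasing convex loss function (here the positive part)
    """
    gains = np.sum(x * (s[1:] - s[:-1]), axis=(0, 1))  # sum_t x_t . delta s_{t+1}
    return np.mean(V(u - gains))

# tiny illustration: one asset, two periods, simulated price paths
rng = np.random.default_rng(0)
N = 10_000
s = np.cumprod(1 + 0.01 * rng.standard_normal((3, 1, N)), axis=0)  # price paths
u = np.full(N, 1.05)                                               # claim
x = np.ones((2, 1, N))                                             # buy-and-hold (adapted)
print(expected_shortfall(x, s, u))
```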

Example 2.3 (Stochastic control). The problem

minimize   E[∑_{t=0}^T L_t(X_t, U_t)]   over (X, U) ∈ N,
subject to ΔX_t = A_t X_{t−1} + B_t U_{t−1} + u_t,   t = 1, . . . , T


fits the general framework with x = (X,U),

h(x, ω) =
  ∑_{t=0}^T L_t(x_t, ω)   if πΔx_t − Ā_t(ω)x_{t−1} = u_t(ω) for t = 1, . . . , T,
  +∞                       otherwise,

where π = [I 0] and Ā_t = [A_t B_t]. Here the stochastic process X is the "state" and U is the "control".
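The following minimal sketch (with one fixed ω and illustrative matrices, controls and a quadratic stage cost, none of which come from the notes) simulates the linear dynamics and checks that the pair x_t = (X_t, U_t) satisfies the reformulated constraint πΔx_t − Ā_t x_{t−1} = u_t.

```python
import numpy as np

T, n_state, n_ctrl = 3, 2, 1
rng = np.random.default_rng(1)
A = [rng.standard_normal((n_state, n_state)) * 0.1 for _ in range(T + 1)]
B = [rng.standard_normal((n_state, n_ctrl)) for _ in range(T + 1)]
u = [rng.standard_normal(n_state) * 0.05 for _ in range(T + 1)]
U = [np.zeros(n_ctrl) for _ in range(T + 1)]          # some control sequence
L = lambda X_t, U_t: X_t @ X_t + U_t @ U_t            # illustrative stage cost

X = [np.zeros(n_state)]                                # X_0 = 0
for t in range(1, T + 1):                              # Delta X_t = A_t X_{t-1} + B_t U_{t-1} + u_t
    X.append(X[t - 1] + A[t] @ X[t - 1] + B[t] @ U[t - 1] + u[t])
print(sum(L(X[t], U[t]) for t in range(T + 1)))        # value of the cost sum

# check the reformulated constraint pi*Delta x_t - Abar_t x_{t-1} = u_t
pi = np.hstack([np.eye(n_state), np.zeros((n_state, n_ctrl))])
for t in range(1, T + 1):
    x_t, x_tm = np.concatenate([X[t], U[t]]), np.concatenate([X[t - 1], U[t - 1]])
    Abar = np.hstack([A[t], B[t]])
    assert np.allclose(pi @ (x_t - x_tm), Abar @ x_tm + u[t])
```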

Example 2.4 (Problems of Bolza). Consider the problem

minimize_{x∈N}   E[∑_{t=0}^T K_t(x_t, Δx_t)],        (2.2)

where n_t = d, Δx_t := x_t − x_{t−1}, x_{−1} := 0.

The problem fits the general framework with

h(x, ω) = ∑_{t=0}^T K_t(x_t, Δx_t, ω).

For instance, currency market models fit into this framework, where the components of x_t describe the different currencies in the portfolio held over the time period (t, t + 1].

Example 2.5 (Risk measures). Some modern optimization problems in finance are given in terms of risk measures that do not a priori fit into (SP). However, some of them can be expressed as (SP) by introducing additional variables.

Consider the problem

minimize_{x∈N}   V(x)        (2.3)

for V : L0 → R defined by

V(x) = inf_{α∈R} Ec(α, x),

where c(·, ·, ω) is convex on R × R^n.

Extending N to N̄ := L^0(F_{−1}) × N for a trivial F_{−1} and denoting x̄ = (α, x), the problem fits the general framework with

h(x̄, ω) = c(α, x, ω).

Assume now that c(α, x, ω) = α + θ(g(x, ω) − α, ω) for convex θ and g. When θ is nondecreasing with θ(0) = 0 and 1 ∈ ∂θ(0),

V(x) = inf_α E[α + θ(g(x) − α)]

is known as the optimized certainty equivalent of the random variable g(x). When θ(u) = e^u − 1, we get the entropic risk

V(x) = log E e^{g(x)},

5

Page 6: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

while for θ(u) = u^+/γ with γ ∈ (0, 1) we obtain the conditional value at risk at level γ,

V(x) = inf_α E[α + (1/γ)(g(x) − α)^+].

When g is affine and θ(u) = (1/2)u² + u, we obtain the "mean-variance" risk measure

V(x) = (1/2)E[(g(x) − Eg(x))²] + Eg(x).
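The conditional value at risk formula above can be evaluated numerically by minimizing the optimized certainty equivalent over α. The following is a minimal sketch on a simulated loss sample; the sample, the level γ = 0.05 and the grid of candidate values for α are illustrative assumptions.

```python
import numpy as np

def cvar_oce(y, gamma, grid_size=1000):
    """CVaR of the loss sample y at level gamma, computed by minimizing
    alpha -> alpha + E[(y - alpha)^+]/gamma over a grid of alphas."""
    alphas = np.quantile(y, np.linspace(0.0, 1.0, grid_size))
    objective = [a + np.mean(np.maximum(y - a, 0.0)) / gamma for a in alphas]
    return min(objective)

rng = np.random.default_rng(0)
y = rng.standard_normal(50_000)                       # illustrative sample of g(x)
print(cvar_oce(y, gamma=0.05))                        # optimized certainty equivalent
print(np.mean(np.sort(y)[-int(0.05 * y.size):]))      # tail average; should be close
```

The second printed value is the average of the worst 5% of outcomes, which the optimized certainty equivalent with θ(u) = u⁺/γ is known to reproduce; the small discrepancy is due to the finite grid of α values.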

3 Convex sets

Let U be a real vector space. A set A ⊆ U is convex if

λu + (1 − λ)u′ ∈ A

for every u, u′ ∈ A and λ ∈ (0, 1). For λ ∈ R and sets A and B, we define the scalar multiplication and the sum of sets as

λA := {λu | u ∈ A},
A + B := {u + u′ | u ∈ A, u′ ∈ B}.

The summation is also known as Minkowski addition. With this notation, A is convex if and only if

λA+ (1− λ)A ⊆ A ∀λ ∈ (0, 1).

Convex sets are stable under many algebraic operations. Let X be another linear vector space.

Theorem 3.1. Let J be an arbitrary index set, (A^j)_{j∈J} a collection of convex sets and A ⊂ X × U a convex set. Then,

1. for λ ∈ R+, the scaled set λA is convex,

2. for finite J, the sum ∑_{j∈J} A^j is convex,

3. the intersection ⋂_{j∈J} A^j is convex,

4. the projection {u ∈ U | ∃x ∈ X : (x, u) ∈ A} is convex.

Proof. Exercise.

Next we turn to topological properties of convex sets. Let τ be a topology on U (the collection of open sets; their complements are called closed sets) and let A ⊂ U. The interior intA of A is the union of all open sets contained in A and the closure clA is the intersection of all closed sets containing A.


The set A is a neighborhood of u if u ∈ intA. We denote the collection of neighborhoods of u by H_u and the collection of open neighborhoods of u by H^o_u. Note that A is a neighborhood of u if and only if A contains an open neighborhood of u.

Exercise 3.0.1. For A ⊂ U, u ∈ clA if and only if A ∩ O ≠ ∅ for all O ∈ H^o_u.

A function g from U to another topological space V is continuous at a point u if the preimage of every neighborhood of g(u) is a neighborhood of u. A function is continuous if it is continuous at every point.

Exercise 3.0.2. A function is continuous if and only if the preimage of every open set is open.

A collection E of neighborhoods of u is called a neighborhood base if every neighborhood of u contains an element of E. Evidently H^o_u is a neighborhood base.

Exercise 3.0.3. Given local bases E_u of u and E_v of v = g(u), g is continuous at u if and only if the preimage of every element of E_v contains an element of E_u.

Given another topological space (U′, τ′), the product topology on U × U′ is the smallest topology containing all the sets {O × O′ | O ∈ τ, O′ ∈ τ′}. We always equip products of topological spaces with the product topology. Clearly {O × O′ | O ∈ H^o_u, O′ ∈ H^o_{u′}} is a neighborhood base of (u, u′).

Exercise 3.0.4. Let p : U × U′ → V be continuous. For every u′ ∈ U′, u ↦ p(u, u′) is continuous.

The space (U, τ) is a topological vector space (TVS) if (u, u′) ↦ u + u′ is continuous from U × U to U and (u, α) ↦ αu is continuous from U × R to U.

Exercise 3.0.5. In a topological vector space U ,

1. αO ∈ H^o_0 for all α ≠ 0 and O ∈ H^o_0,

2. for all u ∈ U and O ⊂ U, (O + u) ∈ H^o_u if and only if O ∈ H^o_0,

3. the sum of a nonempty open set with any set is open,

4. for every O ∈ H^o_0, there exists O′ ∈ H^o_0 such that 2O′ ⊂ O,

5. αA ∈ H_0 for all α ≠ 0 and A ∈ H_0,

6. for all u ∈ U and A ⊂ U, (A + u) ∈ H_u if and only if A ∈ H_0,

7. for every A ∈ H_0, there exists A′ ∈ H_0 such that 2A′ ⊂ A.

A set C is symmetric if x ∈ C implies −x ∈ C.


Lemma 3.2. In a topological vector space, every (resp. convex) neighborhood of the origin contains a symmetric (resp. convex) neighborhood of the origin.

Proof. Let A ∈ H_0. By continuity of p(α, u) := αu from R × U to U, there are α′ > 0 and O ∈ H^o_0 such that αO ⊂ A for all |α| ≤ α′. The set B := ⋃_{|α|≤α′}(αO) is the sought neighborhood.

Assume additionally that A is convex. The set A ∩ (−A) is symmetric, so, since B ⊂ A is symmetric as well, B ⊂ A ∩ (−A). Hence A ∩ (−A) is a symmetric convex set containing a neighborhood of the origin.

Lemma 3.3. Let C be a convex set in a TVS. Then intC and clC are convex.

Proof. Let λ ∈ (0, 1). We have intC ⊂ C, so λ(intC) + (1 − λ) intC ⊂ C. Since sums and strictly positive scalings of open sets are open, we see that λ(intC) + (1 − λ) intC ⊂ intC, since intC is the largest open set contained in C. Since λ ∈ (0, 1) was arbitrary, this means that intC is convex.

To prove that clC is convex we use the results from Exercises 3.0.1 and 3.0.5. Let u, u′ ∈ clC, λ ∈ (0, 1) and Ō ∈ H^o_0. It suffices to show that

(λu + (1 − λ)u′ + Ō) ∩ C ≠ ∅.

There are O, O′ ∈ H^o_0 with λO + (1 − λ)O′ ⊂ Ō, and points ū ∈ C ∩ (u + O) and ū′ ∈ C ∩ (u′ + O′). Thus

λū + (1 − λ)ū′ ∈ λ(u + O) + (1 − λ)(u′ + O′) ⊂ λu + (1 − λ)u′ + Ō,

where the left side belongs to C.

3.1 Separation theorems

Sets of the form

{u ∈ U | l(u) = α}

are called hyperplanes, where l is a real-valued linear function and α ∈ R. Each hyperplane generates two half-spaces (the opposite sides of the plane)

{u ∈ U | l(u) ≤ α},   {u ∈ U | l(u) ≥ α}.

A hyperplane separates sets C_1 and C_2 if they belong to the opposite sides of the hyperplane. The separation is proper unless both sets are contained in the hyperplane. In other words, proper separation means that

sup{l(u_1 − u_2) | u_i ∈ C_i} ≤ 0 and inf{l(u_1 − u_2) | u_i ∈ C_i} < 0.

A set C ⊂ U is called algebraically open if {α ∈ R | u + αu′ ∈ C} is open for any u, u′ ∈ U. The set C is algebraically closed if its complement is algebraically open, or equivalently, if the set {α ∈ R | u + αu′ ∈ C} is closed for any u, u′ ∈ U.


Exercise 3.1.1. In a topological vector space, open (resp. closed) sets are algebraically open (resp. closed), and the sum of a nonempty algebraically open set with any set is algebraically open.

The following separation theorem states that the origin and an algebraically open convex set not containing the origin can be properly separated.

Theorem 3.4. Assume that C in a linear vector space U is an algebraically open convex set with 0 ∉ C. Then there exists a linear l : U → R such that

sup{l(u) | u ∈ C} ≤ 0, inf{l(u) | u ∈ C} < 0.

In particular, l(u) < 0 for all u ∈ C.

Proof. This is an application of Zorn’s lemma. Omitted.

The above separation theorem implies a series of other separation theorems for convex sets. In the locally convex setting below, we get separation theorems in terms of continuous linear functionals, or equivalently, in terms of closed hyperplanes, as the next exercise shows.

A real-valued function g is bounded from above on B ⊂ U if there is M ∈ R such that g(u) < M for all u ∈ B. If g is continuous at u ∈ dom g, then it is bounded from above on a neighborhood of u. Indeed, choose the neighborhood g^{−1}((−∞, M)) for some M > g(u).

Theorem 3.5. Assume that l is a real-valued linear function on a topological vector space. Then the following are equivalent:

1. l is bounded from above in a neighborhood of the origin.

2. l is continuous.

3. {u ∈ U | l(u) = α} is closed for all α ∈ R.

4. {u ∈ U | l(u) = 0} is closed.

Proof. Exercise.

A topological vector space is locally convex (LCTVS) if every neighborhood of the origin contains a convex neighborhood of the origin.

Theorem 3.6. Assume that C is a closed convex set in a LCTVS and u ∉ C. Then there is a continuous linear functional properly separating u and C.

Proof. The origin belongs to the open set (C − u)^C, so there is a convex O ∈ H^o_0 such that 0 ∉ C − u + O. By Theorem 3.4, there is a linear l such that

l(u′) < 0 ∀ u′ ∈ C − u+O.

This means that l(u′) < l(u) for all u′ ∈ C+O, so l is continuous by Theorem 3.5.


The following corollary is very important in the sequel. For instance, it will give the biconjugate theorem that is the basis of duality theory in convex optimization.

Corollary 3.7. The closure of a convex set in a LCTVS is the intersection of all closed half-spaces containing the set.

Proof. By Lemma 3.3, clC is convex for convex C. For any u ∉ clC, there is, by the above theorem, a closed half-space H_u such that clC ⊂ H_u and u ∉ H_u. We get

clC = ⋂_{u∉clC} H_u.

4 Convex functions

Throughout the course, R̄ := R ∪ {±∞} is the extended real line. For a, b ∈ R̄, the ordinary summation is extended as a + b = +∞ if a = +∞, and as a + b = −∞ if a ≠ +∞ and b = −∞.

Let g : U → R. The function g is convex if

g(λu+ (1− λ)u′) ≤ λg(u) + (1− λ)g(u′)

for all u, u′ ∈ U and λ ∈ [0, 1]. A function is convex if and only if its epigraph

epi g := {(u, α) ∈ U × R | g(u) ≤ α}

is a convex set. Applying the last part of Theorem 3.1 to epi g, we see that the domain

dom g := {u ∈ U | g(u) < ∞}

is convex when g is convex.

Exercise 4.0.1. A function g is convex if and only if the strict epigraph

epi^o g := {(u, α) ∈ U × R | g(u) < α}

is convex.

Many algebraic operations also preserve convexity of functions.

Theorem 4.1. Let J be an arbitrary index set, (g^j)_{j∈J} a collection of convex functions, and p : X × U → R a convex function. Then,

1. for finite J and strictly positive (λ^j)_{j∈J}, the sum ∑_{j∈J} λ^j g^j is convex,


2. the infimal convolution

u ↦ inf{∑_{j∈J} g^j(u^j) | ∑_{j∈J} u^j = u}

is convex,

3. the supremum u ↦ sup_{j∈J} g^j(u) is convex,

4. the marginal function u ↦ inf_x p(x, u) is convex.

Proof. Exercise.

The function g is called positively homogeneous if

g(αu) = αg(u) ∀ u ∈ dom g and ∀ α > 0,

and sublinear if

g(α_1u_1 + α_2u_2) ≤ α_1g(u_1) + α_2g(u_2)  ∀ u_i ∈ dom g and ∀ α_i > 0.

The second part in the following exercise shows that norms are convex.

Exercise 4.0.2. Let g be an extended real-valued function on U .

1. If g is positively homogeneous and convex, then it is sublinear.

2. If g is positively homogeneous, then it is convex if and only if

g(u_1 + u_2) ≤ g(u_1) + g(u_2)  ∀ u_i ∈ dom g.

3. If g is convex, then

G(λ, u) =
  λg(u/λ)   if λ > 0,
  +∞        otherwise

is positively homogeneous and convex on R × U. In particular,

p(u) = inf_{λ>0} G(λ, u)

is positively homogeneous and convex on U.

The third part above is sometimes a surprising source of convexity. It also implies properties of recession functions and directional derivatives introduced later on.

Let G be a function from a subset domG of X to U and let K ⊂ U be a convex cone. The function G is K-convex if

epi_K G := {(x, u) | x ∈ domG, G(x) − u ∈ K}

is a convex set in X × U. Note that domG is convex, being the projection of epi_K G to X. When G : X → R, G is convex if and only if it is K-convex for K = R_−, in which case epi_K G = epi G.


Lemma 4.2. The function G is K-convex if and only if domG is convex and

G(λx_1 + (1 − λ)x_2) − λG(x_1) − (1 − λ)G(x_2) ∈ K

for every x_i ∈ domG and λ ∈ (0, 1).

Proof. Exercise.

The composition gG of g and G is defined by

dom(gG) := {x ∈ domG | G(x) ∈ dom g},
(gG)(x) := g(G(x)) ∀x ∈ dom(gG).

The range of G is denoted by rgeG.

Theorem 4.3. If G is K-convex and g is convex such that g(u_1) ≤ g(u_2) whenever u_1 ∈ rge G and u_1 − u_2 ∈ K, then gG is convex.

Proof. Exercise.

Exercise 4.0.3. Let G be a convex function on X and h a nondecreasing convex function on rge G. Then hG is convex.

Exercise 4.0.4. The function g(u) = ∏_{i=1}^n u_i^{λ_i} is concave on R^n_+, where u = (u_1, . . . , u_n), λ_i > 0 and ∑_{i=1}^n λ_i < 1.

Hint: When ∑_{i=1}^n λ_i = 1, apply Exercise 4.0.2. For the general case, combine with the composition rule.

4.1 Lower semicontinuity, recessions and directional derivatives

Assume now that g is an extended real-valued function on U. The function g is said to be proper if it is not identically +∞ and if it never takes the value −∞. The function g is lower semicontinuous (lsc) if the level set

lev_{≤α} g := {u ∈ U | g(u) ≤ α}

is closed for each α ∈ R. Equivalently, g is lsc if its epigraph is closed, or if, for every u ∈ U,

sup_{A∈H_u} inf_{u′∈A} g(u′) ≥ g(u).

For sequences, a lsc function g satisfies

lim inf_{u^ν→u} g(u^ν) ≥ g(u).

When U is "sequential" (e.g., a Banach space), this property is equivalent to lower semicontinuity. We denote

argmin g := {u ∈ U | g(u) = inf_{u′∈U} g(u′)}.


Theorem 4.4. Let J be an arbitrary index set, g a lsc function and (g^j)_{j∈J} a collection of lsc functions on a topological vector space U. Then,

1. for a continuous F : V → U, gF is lsc,

2. for finite J and strictly positive (λ^j)_{j∈J}, the sum ∑_{j∈J} λ^j g^j is lsc,

3. the supremum u ↦ sup_{j∈J} g^j(u) is lsc.

Proof. Exercise.

Let g be a lsc convex function. Given ū ∈ dom g, the function

G(λ, u) :=
  λ(g(ū + u/λ) − g(ū))   if λ > 0,
  +∞                      otherwise

is positively homogeneous and convex on R × U by Exercise 4.0.2. The function G(·, u) is decreasing on R_+, i.e.,

λ ↦ (g(ū + λu) − g(ū))/λ

is increasing on R_+. The function

u ↦ g′(ū; u) := lim_{λ↓0} (g(ū + λu) − g(ū))/λ

gives the directional derivative of g at ū. We have

g′(ū; u) = inf_{λ>0} (g(ū + λu) − g(ū))/λ,

so g′(ū; ·) is positively homogeneous and convex by Exercise 4.0.2. The function

g∞(u) := lim_{λ↑∞} (g(ū + λu) − g(ū))/λ

is called the recession function of g. Note that g∞ is independent of the choice of ū ∈ dom g, and

g∞(u) = sup_{λ>0} (g(ū + λu) − g(ū))/λ,

so g∞ is positively homogeneous and convex. Since g is lsc, g∞ is lsc as well.

Theorem 4.5. For a proper lsc convex function g and ū ∈ dom g, the function

G(λ, u) :=
  λ(g(ū + u/λ) − g(ū))   if λ > 0,
  g∞(u)                   if λ = 0,
  +∞                       otherwise

is lsc.


Proof. Exercise.

For a convex C, the set

C∞ := {x | x′ + λx ∈ C ∀x′ ∈ C, λ > 0}

is called the recession cone of C. For a closed C, we have (δ_C)∞ = δ_{C∞}, so the recession cone is closed for a closed C. This is a consequence of the following lemma.

Lemma 4.6. For a closed convex C in a topological vector space,

C∞ = {x | ū + λx ∈ C ∀λ > 0}

for any ū ∈ C.

Proof. It suffices to show that the right side is a subset of C∞. Let x ≠ 0 and ū ∈ C be such that ū + λx ∈ C for all λ > 0. Let u′ ∈ C and λ′ > 0. For any λ ≥ λ′,

u′ + λ′x + (λ′/λ)(ū − u′) = (1 − λ′/λ)u′ + (λ′/λ)(ū + λx) ∈ C

by convexity. Since C is closed, letting λ ↑ ∞ gives u′ + λ′x ∈ C.

Exercise 4.1.1. If (x^ν) is a sequence in a closed convex C, λ^ν ↓ 0 and λ^ν x^ν → x, then x ∈ C∞.

In a topological vector space, a set C is bounded if for any neighborhood A of the origin, C ⊂ λA for some λ > 0. In a normed space, like R^n, this means that C is contained in some ball.

Theorem 4.7. A convex set C in R^d is bounded if and only if (clC)∞ = {0}.

Proof. Exercise.

Remark 4.8. In a general LCTVS, a closed set C need not be bounded even though C∞ = {0}. Consider, e.g., U = L^∞, the space of essentially bounded random variables equipped with the topology generated by the essential supremum norm. The set C = {u ∈ L^∞ | E|u| ≤ 1} is closed and C∞ = {0}. If there are non-null A^ν ∈ F with P(A^ν) ↓ 0, then u^ν := 1_{A^ν}/P(A^ν) belongs to C but ‖u^ν‖_{L^∞} = 1/P(A^ν) ↑ ∞, so C is not bounded.

Recall that in Rd, a set is compact if and only if it is bounded and closed.

Theorem 4.9. Let g : R^d → R be a proper lsc convex function. Then lev_{≤α} g is bounded (and hence compact) for every α if and only if

{x | g∞(x) ≤ 0} = {0}.

In this case, argmin g is nonempty and compact.


Proof. Exercise.

Exercise 4.1.2. Assume that g : R^n × R^m → R is a proper lsc convex function such that

L := {x ∈ R^n | g∞(x, 0) ≤ 0}

is a linear space. Then g(x + x′, u) = g(x, u) for every x′ ∈ L.

Theorem 4.10. Assume that g : R^n × R^m → R is a proper lsc convex function such that

L := {x ∈ R^n | g∞(x, 0) ≤ 0}

is a linear space. Then the infimum in

p(u) := inf_{x∈R^n} g(x, u)

is attained, p is a proper lsc convex function and

p∞(u) = inf_{x∈R^n} g∞(x, u).

Proof. To prove that p is lsc, it suffices to show that lev_{≤β} p ∩ B is compact (and hence closed) for every closed ball B. That g(x + x′, u) = g(x, u) for all x′ ∈ L is left as an exercise. Let L^⊥ := {x ∈ R^n | x · x′ = 0 ∀ x′ ∈ L} and g̃ := g + δ_{L^⊥×B}, so that p(u) + δ_B(u) = p̃(u) := inf_{x∈R^n} g̃(x, u).

For a proper lsc convex function f : R^n → R, we have that lev_{≤β} f is bounded (and hence compact) for every β if and only if {x | f∞(x) ≤ 0} = {0} (an exercise). Applying this to x ↦ g̃(x, u), this function has inf-compact level sets, and thus the infimum in the definition of p and in that of p̃ is attained for every u. In particular, we have lev_{≤β} p ∩ B = Π(lev_{≤β} g̃), where Π is the projection from R^n × R^m to R^m. We have

{(x, u) | g̃∞(x, u) ≤ 0} = {(x, u) | u = 0, x ∈ L^⊥, g∞(x, u) ≤ 0} = {(0, 0)},

so g̃ has compact level sets as well. Thus lev_{≤β} p ∩ B is a projection of a compact set and hence compact.

The proof of the recession formula is left as an exercise.

4.2 Continuity of convex functions

Given a set A ⊂ U,

coreA := {u ∈ U | ∀u′ ∈ U ∃λ̄ > 0 : u + λu′ ∈ A ∀λ ∈ (0, λ̄)}

is known as the core (or algebraic interior) of A and its elements are called internal points, not to be confused with interior points. We always have intA ⊂ coreA.


Theorem 4.11. If a convex function is bounded from above on an open set, then it is continuous throughout the core of its domain.

Proof. Let g be a function that is bounded from above on a neighborhood O of ū. We show first that this implies that g is continuous at ū. Replacing g by u ↦ g(ū + u) − g(ū), we may assume that ū = 0 and that g(0) = 0. Hence there exists M > 0 such that g(u) ≤ M for all u ∈ O.

Let ε > 0 be arbitrary and choose λ ∈ (0, 1) with λM < ε. We have g(λu) < ε for each u ∈ O by convexity. Moreover,

0 = g((1/2)(−λu) + (1/2)λu) ≤ (1/2)g(−λu) + (1/2)g(λu),

which implies that −g(−λu) ≤ g(λu) for all u ∈ O ∩ (−O). Thus, |g(u)| < ε for all u ∈ λ(O ∩ (−O)), so g is continuous at the origin.

Assume now that g is bounded from above on an open set A, i.e., there is M such that g(u) ≤ M for each u ∈ A. By the above, it suffices to show that g is bounded from above on a neighborhood of each ū ∈ core dom g. Let u′ ∈ A. There are u ∈ dom g and λ ∈ (0, 1) with ū = λu + (1 − λ)u′. We have

g(λu + (1 − λ)v) ≤ λg(u) + (1 − λ)M  ∀v ∈ A,

so g is bounded from above on the open neighborhood λu + (1 − λ)A of ū.

In R^d, geometric intuition suggests that a convex function is continuous on the core of its domain. This idea extends to lsc convex functions on a barreled space. The LCTVS U is barreled if every closed convex symmetric absorbing set is a neighborhood of the origin². A set C is called absorbing if ⋃_{α∈R_+}(αC) = U. A set is absorbing if and only if the origin belongs to its core. For example, every Banach space is barreled³.

Lemma 4.12. Let A ⊂ U be a convex set. Then intA = coreA under any of the following conditions:

1. U is finite dimensional,

2. intA ≠ ∅,

3. U is barreled and A is closed.

Proof. We leave the first case as an exercise. To prove the second, it suffices to show that, for ū ∈ coreA, we have ū ∈ intA. Let u′ ∈ intA. There are λ ∈ (0, 1) and u ∈ A with ū = λu + (1 − λ)u′. Now

ū + (1 − λ)(intA − u′) = λu + (1 − λ) intA ⊂ A,

where the left side is an open neighborhood of ū.

To prove the last claim, let U be barreled and A closed. Again, it suffices to show that, for ū ∈ coreA, we have ū ∈ intA. Let B = A − ū. Now 0 ∈ coreB, so 0 ∈ core(B ∩ (−B)). Thus B ∩ (−B) is a closed convex symmetric absorbing set and hence it is a neighborhood of the origin. Thus 0 ∈ intB and ū ∈ intA.

² It is an exercise to show that in a LCTVS, every neighborhood of the origin is absorbing and contains a closed convex symmetric neighborhood of the origin.
³ An application of the Baire category theorem: if U = ⋃_{n∈N}(nC) for a closed C, then intC ≠ ∅.

Theorem 4.13. A convex function g is continuous on core dom g in the following situations:

1. U is finite dimensional

2. U is barreled and g is lower semicontinuous.

Proof. We leave the first part as an exercise. To prove the second, let u ∈ core dom g and α > g(u). For u′ ∈ U, the function λ ↦ g(u + λu′) is continuous at the origin by the first part, so u ∈ core lev_{≤α} g. By Lemma 4.12 and lower semicontinuity of g, int lev_{≤α} g ≠ ∅. Thus continuity follows from Theorem 4.11.

5 Convex normal integrands

Throughout, every finite dimensional space is equipped with the usual topology and the usual Borel σ-algebra, and S : Ω ⇒ R^d is a set-valued mapping, i.e., for every ω, S(ω) ⊂ R^d. The mapping S is measurable if the preimage

S^{−1}(O) := {ω ∈ Ω | S(ω) ∩ O ≠ ∅}

of every open O ⊂ R^d is measurable, i.e., S^{−1}(O) ∈ F for every open O. The mapping S is closed-valued (resp. convex-valued, cone-valued, etc.) when S(ω) is closed (resp. convex, a cone, etc.) for each ω ∈ Ω. The set

domS := {ω | S(ω) ≠ ∅}

is the domain of S. Being the preimage of the whole space, it is measurable as soon as S is measurable. Evidently, if S is measurable, then its image-closure mapping ω ↦ clS(ω) is measurable, since its preimages of open sets coincide with those of S. The function

d(x, A) = inf_{x′∈A} |x − x′|

is the distance mapping.

Theorem 5.1. Let S : Ω ⇒ Rn be closed-valued. The following are equivalent.

1. S is measurable,


2. S−1(C) ∈ F for every compact set C,

3. S−1(C) ∈ F for every closed set C,

4. S−1(B) ∈ F for every closed ball B,

5. S−1(O) ∈ F for every open ball O,

6. {ω ∈ Ω | S(ω) ⊂ O} ∈ F for every open O,

7. {ω ∈ Ω | S(ω) ⊂ C} ∈ F for every closed C,

8. ω ↦ d(x, S(ω)) is measurable for every x ∈ R^n.

Proof. Assuming 1, for a compact C we have S^{−1}(C) = ⋂_ν S^{−1}(C + B(0, 1/ν)), which proves 2. For a closed C, we have C = ⋃_ν (C ∩ B(0, ν)), so 2 implies 3. Clearly, 3 implies 4. For any open ball O, O = ⋃{B(q, r) | B(q, r) ⊂ O, (q, r) ∈ Q^n × Q_+}, so 4 implies 5. Any open O is a countable union of open balls, so 5 implies 1.

Since {ω | S(ω) ⊆ A} = Ω \ S^{−1}(A^C), 3 and 6 are equivalent, and 1 and 7 are equivalent. For any α ∈ R_+, d(x, S(ω)) ≤ α if and only if S(ω) ∩ B(x, α) ≠ ∅, so 4 and 8 are equivalent.

For a set-valued mapping S : Ω ⇒ Rn,

gphS := {(x, ω) ∈ R^n × Ω | x ∈ S(ω)}

is the graph of S.

Corollary 5.2. The graph of a measurable closed-valued mapping is B(Rn)⊗F-measurable.

Proof. Exercise. Hint: consider open balls with rational centers and rational radii.

Measurability is preserved under many algebraic operations.

Theorem 5.3. Let J be a countable index set and let each S^j, j ∈ J, be a measurable set-valued mapping. Then

1. ω ↦ ⋂_{j∈J} S^j(ω) is measurable if each S^j is closed-valued,

2. ω ↦ ⋃_{j∈J} S^j(ω) is measurable,

3. ω ↦ ∑_{j∈J} λ^j S^j(ω) is measurable for finite J, where λ^j ∈ R,

4. ω ↦ (S^1(ω), . . . , S^J(ω)) is measurable for finite J; here we may allow S^j : Ω ⇒ R^{d_j}.


Proof. 4. Let R(ω) = (S^1(ω), . . . , S^J(ω)). Every open set O in the product space is expressible as a countable union of rectangular open sets ×_j O^j_ν. Thus R^{−1}(O) = ⋃_ν (⋂_j (S^j)^{−1}(O^j_ν)), where each (S^j)^{−1}(O^j_ν) is measurable by assumption.

3. Let R(ω) = ∑_{j=1}^J λ^j S^j(ω). For an open O, the set

O′ = {(x^1, . . . , x^J) | ∑_{j=1}^J λ^j x^j ∈ O}

is open in ×_{j=1}^J R^d. Now

R^{−1}(O) = {ω | (∑_{j=1}^J λ^j S^j)(ω) ∩ O ≠ ∅}
          = {ω | (S^1(ω), . . . , S^J(ω)) ∩ O′ ≠ ∅},

where the set on the right-hand side is measurable by part 4.

2. (⋃_{j∈J} S^j)^{−1}(O) = ⋃_{j∈J} (S^j)^{−1}(O) for any open O.

1. Assume first that J = {1, 2}. Take any compact C ⊂ R^d and denote R^ν(ω) = S^ν(ω) ∩ C; then

(S^1 ∩ S^2)^{−1}(C) = {ω | S^1(ω) ∩ S^2(ω) ∩ C ≠ ∅}
                    = {ω | 0 ∈ R^1(ω) − R^2(ω)}
                    = (R^1 − R^2)^{−1}({0}).

Here R^1 − R^2 is measurable by part 3; let us show that it is closed-valued as well. Since the S^ν are closed-valued, the R^ν are compact-valued, so R^1 − R^2 is compact-valued (an exercise). Hence S^1 ∩ S^2 is measurable. The case of finite J follows by induction.

Suppose finally that J is countable, J = {1, 2, 3, . . .}. Denote S̄^μ := ⋂_{ν=1}^μ S^ν. Note that ⋂_{ν=1}^∞ S^ν(ω) = ⋂_{μ=1}^∞ S̄^μ(ω), and that the S̄^μ are measurable by the preceding step. The proof is complete as soon as we show that

(⋂ S^ν)^{−1}(C) = ⋂_{μ=1}^∞ (S̄^μ)^{−1}(C).

If ω ∈ (⋂ S^ν)^{−1}(C), it is straightforward to check that ω ∈ ⋂_{μ=1}^∞ (S̄^μ)^{−1}(C). For the converse, take ω ∈ ⋂_{μ=1}^∞ (S̄^μ)^{−1}(C). Since (S̄^μ(ω) ∩ C)_{μ=1}^∞ is a nested sequence of nonempty compact sets, ⋂_{μ=1}^∞ (S̄^μ(ω) ∩ C) ≠ ∅. By ⋂_{ν=1}^∞ S^ν(ω) = ⋂_{μ=1}^∞ S̄^μ(ω), this means that ω ∈ (⋂ S^ν)^{−1}(C).

Theorem 5.4. If M(·, ω) : Rn ⇒ Rm is such that

gphM(ω) := {(x, u) | u ∈ M(x, ω)}

defines a measurable closed-valued mapping, then the following mappings are measurable:


1. R(ω) := M(S(ω), ω), where S : Ω ⇒ Rn is measurable and closed-valued,

2. ω ↦ {x ∈ R^n | M(x, ω) ∩ S(ω) ≠ ∅}, where S : Ω ⇒ R^m is measurable and closed-valued.

Proof. Let Π : R^n × R^m → R^m be the projection mapping. We have R = Π ∘ Q for Q(ω) := [S(ω) × R^m] ∩ gphM(ω), which is measurable by Theorem 5.1. Since R^{−1}(O) = Q^{−1}(Π^{−1}(O)), where Π^{−1}(O) is open for any open O, R is measurable.

To prove 2, we have {x | M(x, ω) ∩ S(ω) ≠ ∅} = Γ(S(ω), ω) for Γ(·, ω) := M^{−1}(·, ω). Here gph Γ(·, ω) = {(u, x) | u ∈ M(x, ω)}, so the result follows from 1.

A function h : R^n × Ω → R is a normal integrand on R^n if

epi h(ω) := {(x, α) ∈ R^n × R | h(x, ω) ≤ α}

defines a measurable closed-valued mapping. A normal integrand is convex (positively homogeneous, etc.) if, for all ω, h(·, ω) is convex (positively homogeneous, etc.). The indicator function of S,

δ_{S(ω)}(x) := δ_S(x, ω) :=
  0    if x ∈ S(ω),
  +∞   otherwise,

defines a normal integrand on R^n if and only if S is closed-valued and measurable.

A function h is a Caratheodory integrand if h(·, ω) is continuous for each ω ∈ Ω and h(x, ·) is measurable for each x ∈ R^n.

Theorem 5.5. A Caratheodory integrand is a normal integrand.

Proof. Let {x^ν | ν ∈ N} be a dense set in R^d and define α^{ν,q}(ω) := h(x^ν, ω) + q, where q ∈ Q_+. Since h(·, ω) is continuous, the set Ō := {(x, α) | h(x, ω) < α} is open. For any (x, α) ∈ epi h(·, ω) and any open neighborhood O of (x, α), O ∩ Ō is open and nonempty, and there exists (x^ν, α^{ν,q}(ω)) ∈ O ∩ Ō, i.e., {(x^ν, α^{ν,q}(ω)) | ν ∈ N, q ∈ Q_+} is dense in epi h(·, ω). Thus, for any open set O ⊂ R^n × R,

{ω | epi h(ω) ∩ O ≠ ∅} = ⋃_{ν,q} {ω | (x^ν, α^{ν,q}(ω)) ∈ O}

is measurable.

Let

S^o_h(ω) := {(x, α) ∈ R^n × R | h(x, ω) < α}.

Since the preimages of open sets under S_h := epi h and S^o_h coincide, one is measurable if and only if the other is.


Theorem 5.6. For a normal integrand h on R^n and β ∈ L^0, the level-set mapping

ω ↦ {x ∈ R^n | h(x, ω) ≤ β(ω)}

is measurable and closed-valued. A function h : R^n × Ω → R is a normal integrand if and only if

lev_{≤β} h(ω) := {x ∈ R^n | h(x, ω) ≤ β}

is measurable and closed-valued for every β ∈ R.

Proof. Let S(ω) := {x ∈ R^n | h(x, ω) ≤ β(ω)}. For a closed C ⊂ R^n, R(ω) := C × {α | α ≤ β(ω)} is measurable and closed-valued (an exercise). Now

S^{−1}(C) = {ω | S(ω) ∩ C ≠ ∅} = {ω | S_h(ω) ∩ R(ω) ≠ ∅} = dom(epi h ∩ R),

which shows the measurability of S.

To prove the second claim, note first that the closedness of the level sets implies that epi h is closed-valued. Let (β^ν) be a dense sequence in R. Since countable unions of measurable mappings are measurable,

lev_{<β^ν} h(ω) := {x ∈ R^n | h(x, ω) < β^ν}

are measurable. We have

S^o_h(ω) = ⋃_ν (lev_{<β^ν} h(ω) × [β^ν, ∞)),

so S_h is measurable.

Theorem 5.7. The following are normal integrands:

1. h(x, ω) = sup_{i∈J} h^i(x, ω), where the h^i are normal integrands and J is countable.

2. h(x, ω) = ∑_{i=1}^n h^i(x, ω), where the h^i are normal integrands.

3. h(·, ω) = α(ω)h_0(·, ω), where h_0 is a normal integrand and α ∈ L^0(R_+). When α(ω) = 0, the scalar multiplication is defined as α(ω)h_0(·, ω) := δ_{cl dom h_0(·,ω)}.

4. h(x, ω) = f(x, u(ω), ω), where f : R^n × R^m × Ω → R is a normal integrand and u ∈ L^0(Ω; R^m).

Proof. In 1, epi h = ⋂_i epi h^i, so the claim follows directly from Theorem 5.3. Below, in each case, M satisfies the assumption of Theorem 5.4, which can be verified using Theorems 5.5, 5.6 and 5.3.


To prove 2, let S = epi h^1 × · · · × epi h^J and

M(x^1, α^1, . . . , x^J, α^J) :=
  (x, α^1 + · · · + α^J)   if x^1 = · · · = x^J = x,
  ∅                        otherwise,

so that epi h(ω) = M(S(ω)) and the claim follows from Theorem 5.4.1. The multiplication in 3 is an exercise.

In 4, we have epi h(ω) = {(x, α) | M(x, α, ω) ∈ epi f(·, ·, ω)} for M(x, α, ω) := (x, u(ω), α), so Theorem 5.4.2 applies.

Theorem 5.8. Assume that f : R^n × R^m × Ω → R is a normal integrand on R^n × R^m and let

p(u, ω) := inf_{x∈R^n} f(x, u, ω).

The function defined by (cl_u p)(u, ω) is a normal integrand on R^m. In particular, if p(·, ω) is lsc for every ω, p is a normal integrand.

Proof. Let Π(x, u, α) := (u, α) be the projection from R^n × R^m × R to R^m × R. It is easy to check that Π epi^o f(ω) = epi^o p(ω), so, for an open O ⊂ R^m × R,

(epi^o p)^{−1}(O) = (epi^o f)^{−1}(Π^{−1}(O)),

where the right side is measurable, since f is a normal integrand and Π is continuous. Thus epi^o p is measurable. Moreover, ω ↦ cl epi^o p(ω) is measurable, which shows that (u, ω) ↦ (cl_u p)(u, ω) is a normal integrand. If p is lsc in the first argument, it is thus a normal integrand.

Corollary 5.9. For a normal integrand h, p(ω) := inf_x h(x, ω) is measurable, and

S(ω) := argmin_x h(x, ω)

is measurable and closed-valued.

Proof. The first claim follows from Theorem 5.8. We have

S(ω) = {x | h(x, ω) ≤ p(ω)},

so S is measurable by Theorem 5.6. Since h(·, ω) is lsc, S is closed-valued.

Example 5.10. For a measurable closed-valued S : Ω ⇒ R^d and x ∈ L^0(R^d), the projection mapping

ω ↦ P_{S(ω)}(x(ω)) := argmin_{x′∈S(ω)} |x′ − x(ω)|

is measurable and closed-valued. Indeed, this follows from Corollary 5.9 applied to h(x′, ω) := δ_{S(ω)}(x′) + |x′ − x(ω)|.
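A minimal numerical sketch of this example, with the random closed set taken to be an interval S(ω) = [a(ω), b(ω)] in R (an illustrative assumption, not part of the notes): the projection is computed scenario-wise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scenarios = 5
a = rng.standard_normal(n_scenarios) - 1.0
b = a + 2.0                                   # S(omega) = [a, a + 2] is nonempty and closed
x = 3.0 * rng.standard_normal(n_scenarios)    # the random point to be projected

proj = np.clip(x, a, b)                       # argmin_{x' in [a,b]} |x' - x|, omega by omega
print(np.column_stack([x, a, b, proj]))
```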


Corollary 5.11. Let h be a normal integrand such that h(x, ω) ≥ −ρ(ω)|x| − m(ω) for some ρ, m ∈ L^0. The functions

h^ν(x, ω) := inf_{x′∈R^d} {h(x′, ω) + νρ(ω)|x − x′|},   ν ∈ N,

are Caratheodory integrands, (νρ)-Lipschitz, with h^ν(x, ω) ≥ −ρ(ω)|x| − m(ω), and, as ν increases, they increase pointwise to h.

Proof. For any x_1, x_2 ∈ R^n,

h^ν(x_1, ω) ≤ inf_{x′}{h(x′, ω) + νρ(ω)|x′ − x_2|} + νρ(ω)|x_2 − x_1| = h^ν(x_2, ω) + νρ(ω)|x_2 − x_1|.

By symmetry, h^ν(·, ω) is νρ(ω)-Lipschitz continuous. For every ν and ε > 0, there is a y^ν such that

h^ν(x, ω) ≥ h(y^ν, ω) + νρ(ω)|y^ν − x| − ε
          ≥ −ρ(ω)|y^ν| − m(ω) + νρ(ω)|y^ν − x| − ε
          ≥ −ρ(ω)|x| − m(ω) + (ν − 1)ρ(ω)|y^ν − x| − ε.

Thus, either h^ν(x, ω) → ∞ or y^ν → x as ν → ∞. In the latter case,

lim inf h^ν(x, ω) ≥ h(x, ω)

by lower semicontinuity of h.
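The Lipschitz regularizations of Corollary 5.11 can be visualized numerically for a fixed ω. The sketch below uses a brute-force infimum over a grid; the particular integrand (a quadratic plus the indicator of [−1, 1]) and the grid are illustrative assumptions.

```python
import numpy as np

grid = np.linspace(-3.0, 3.0, 601)
h = np.where(np.abs(grid) <= 1.0, grid**2, np.inf)   # h(x) = x^2 + delta_{[-1,1]}(x)
rho = 1.0

def lipschitz_regularization(h_vals, x_grid, lipschitz_const):
    # h^nu(x) = inf_{x'} { h(x') + L*|x - x'| }, evaluated on the grid
    penalties = lipschitz_const * np.abs(x_grid[:, None] - x_grid[None, :])
    return np.min(h_vals[None, :] + penalties, axis=1)

for nu in (1, 2, 8):
    h_nu = lipschitz_regularization(h, grid, nu * rho)
    # values at x = -3 (outside dom h, grows with nu) and x = 0 (equals h(0) = 0)
    print(nu, h_nu[0], h_nu[300])
```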

A function x : Ω → R^d is called a selection of S if x(ω) ∈ S(ω) for all ω ∈ domS. The set of measurable selections of S is denoted by L^0(S). The sequence (x^ν) of measurable selections of S in the following theorem is known as a Castaing representation of S.

Theorem 5.12 (Castaing representation). Let S : Ω ⇒ R^d be closed-valued. Then S is measurable if and only if domS is measurable and there exists a sequence (x^ν) ∈ L^0(R^d) such that, for all ω ∈ domS,

S(ω) = cl{x^ν(ω) | ν = 1, 2, . . .}.

Proof. Assuming a Castaing representation exists, we have, for an open O,

S^{−1}(O) = ⋃_{ν=1}^∞ (x^ν)^{−1}(O),

so S is measurable. Assume now that S is measurable. Let J be the countable collection of tuples q = (q_0, q_1, . . . , q_d) of points q_i ∈ Q^d such that q_0, q_1, . . . , q_d are affinely independent. For each q ∈ J, we define recursively S^{q,0}(ω) := P_{S(ω)}(q_0) and

S^{q,i}(ω) := P_{S^{q,i−1}(ω)}(q_i).


These mappings are measurable and closed-valued by Example 5.10. Moreover, S^{q,d}(ω) is a singleton, a point in S(ω) nearest to q_0. Setting x^q(ω) := S^{q,d}(ω), (x^q) is a Castaing representation of S.

Let us verify that S^{q,d} is single-valued. We fix ω and omit it from the notation. By the recursive definition of S^{q,i}, for each q_i there is r_i ≥ 0 such that S^{q,d} ⊂ ∂B(q_i, r_i). Thus, for any x ∈ S^{q,d}, |x − q_i|² = (r_i)² for all i. By affine independence, these equations have a unique solution.

Corollary 5.13 (Measurable selection theorem). Any measurable closed-valued S : Ω ⇒ R^n admits a measurable selection.

Corollary 5.14 (Doob–Dynkin, set-valued version). Assume that S is a σ(ξ)-measurable closed random set for ξ ∈ L^0(R^d). Then there exists a measurable closed-valued S̄ : R^d ⇒ R^n such that S(ω) = S̄(ξ(ω)).

Proof. Let (x^ν) be a σ(ξ)-measurable Castaing representation of S. By the Doob–Dynkin lemma, there exist Borel-measurable g^ν : R^d → R^n such that x^ν(ω) = g^ν(ξ(ω)). Let

S̄(y) = cl{g^ν(y) | ν = 1, 2, . . .}

so that S(ω) = cl{g^ν(ξ(ω)) | ν = 1, 2, . . .} = S̄(ξ(ω)).

Corollary 5.15. If h is a σ(ξ)-normal integrand on R^n for an R^d-valued random variable ξ, then

h(x, ω) = H(x, ξ(ω))

for a B(Rd)-normal integrand H on Rn.

Proof. Apply Corollary 5.14 to the epigraphical mapping of h.

Corollary 5.16. A closed-valued mapping S : Ω ⇒ R^n is measurable if and only if there exists a sequence (x^ν) ∈ L^0(R^d) such that

1. for each ν, {ω | x^ν(ω) ∈ S(ω)} is measurable,

2. for each ω, S(ω) ∩ {x^ν(ω) | ν ∈ N} is dense in S(ω).

Proof. By Theorem 5.12, it suffices to show that the stated condition is equivalent to the existence of a Castaing representation. If a Castaing representation exists, it satisfies 1 and 2. When (x^ν) ∈ L^0(R^d) satisfies 1 and 2, ...??

Theorem 5.17 (Section theorem). For normal integrands h and h̄, we have h(·, ω) ≤ h̄(·, ω) for P-almost every ω if and only if, for every w ∈ L^∞(F), h(w) ≤ h̄(w) almost surely. In particular, h(·, ω) = h̄(·, ω) for P-almost every ω if and only if h(w) = h̄(w) almost surely for every w ∈ L^∞(F).


Proof. Necessity is obvious. Let h^ν and h̄^ν be the respective Lipschitz regularizations. For w ∈ L^∞, we have h(w) ≤ h̄(w) if and only if h^ν(w) ≤ h̄^ν(w) for every ν. Thus it suffices to prove the claim for Lipschitz integrands h and h̄.

We assume for a contradiction that, for some constants α and β with α < β, the domain of

S(ω) = {x ∈ R^d | h(x, ω) ≥ β > α ≥ h̄(x, ω)}

is not evanescent. Since −h is a normal integrand and

S = (lev_{≤α} h̄) ∩ (lev_{≤−β}(−h)),

S is measurable and closed-valued by Theorems 5.6 and 5.3. By Corollary 5.13, there is a measurable w̄ with w̄ ∈ S on domS. For ν large enough, w := w̄1_{|w̄|≤ν} and A := {|w̄| ≤ ν} ∩ domS satisfy P(A) > 0, which contradicts the assumption that h(w) ≤ h̄(w) almost surely.

A function M : R^n × Ω → R^m is a Caratheodory mapping if M(·, ω) is continuous for all ω and M(x, ·) is measurable for all x ∈ R^n.

Theorem 5.18. When M is a Caratheodory mapping,

gphM(·, ω) := {(x, u) | u = M(x, ω)}

defines a measurable closed-valued mapping.

Proof. For any countable dense D ⊂ R^n, {(x, M(x, ω)) | x ∈ D} is a Castaing representation of gphM(·, ω).

The rest of the section is concerned with convexity.

For a convex-valued S, we define S∞ as the ω-wise recession cone of S, i.e.,

S∞(ω) = {x | x′ + λx ∈ S(ω) ∀ x′ ∈ S(ω), λ > 0}.

Theorem 5.19. For a measurable closed convex-valued S, S∞ is measurable, closed and convex-valued.

Proof. By Corollary 5.13, there is x̄ ∈ L^0(R^n) such that x̄ ∈ S on domS. By Theorem ?? in the appendix,

S∞(ω) = ⋂_{ν=1}^∞ (1/ν)(S(ω) − x̄(ω))

for ω ∈ domS. The measurability now follows from Theorem 5.3.

Given a convex normal integrand h, we define h∞ scenariowise as

h∞(·, ω) := h(·, ω)∞;

see the Appendix.


Theorem 5.20. For a convex normal integrand h, h∞ is a normal integrand. If h is proper, then

h∞(x, ω) = sup_{λ>0} (h(x̄(ω) + λx, ω) − h(x̄(ω), ω))/λ   ∀(x, ω) ∈ R^n × Ω

for every x̄ ∈ L^0(dom h).

Proof. By Theorem 5.19, h∞ is a normal integrand.

Theorem 5.21. Assume that f is a convex normal integrand and that the set-valued mapping

N(ω) = {x ∈ R^n | f∞(x, 0, ω) ≤ 0}

is linear-valued. Then

p(u, ω) := inf_{x∈R^n} f(x, u, ω)

is a normal integrand with

p∞(u, ω) = inf_{x∈R^n} f∞(x, u, ω).

Moreover, given a u ∈ L^0(F), there is an x ∈ L^0(F) with x(ω) ⊥ N(ω) and

p(u(ω), ω) = f(x(ω), u(ω), ω).

Proof. By Theorem 4.10, the linearity condition implies that the infimum in the definition of p is attained and that p(·, ω) is a lower semicontinuous convex function with

p∞(u, ω) = inf_{x∈R^n} f∞(x, u, ω).

By Theorem 5.8, the lower semicontinuity implies that p is a normal integrand. By Theorem ??, there is an x that attains the minimum for every ω. By Theorem 4.10, we may replace x(ω) by its projection to the orthogonal complement of N(ω). By Example 5.10, such a projection preserves measurability.

Given an extended real-valued function g on R^m and an R^m-valued function H on a subset domH of R^n, we define their composition as the extended real-valued function

(gH)(x) :=
  g(H(x))   if x ∈ domH,
  +∞        if x ∉ domH.

Given a convex cone K ⊂ R^m, the function H is said to be K-convex if the set

epi_K H := {(x, u) | x ∈ domH, H(x) − u ∈ K}

is convex. A K-convex function is closed if epi_K H is a closed set. It is easily verified (see the proof below) that if g is convex and H is K-convex, then gH is convex if

H(x) − u ∈ K ⟹ g(H(x)) ≤ g(u)  ∀x ∈ domH.        (5.1)


We say that H : R^n × Ω → R^m is a K-convex normal function if ω ↦ epi_K H(·, ω) is closed convex-valued and measurable.

Theorem 5.22. The following are convex normal integrands:

1. h(·, ω) = α(ω)h_0(·, ω), where α ∈ L^0_+ and h_0 is a convex normal integrand. When α(ω) = 0, the scalar multiplication is defined as α(ω)h_0(·, ω) := δ_{cl dom h_0(·,ω)}.

2. h(x, ω) = ∑_{i=1}^n h^i(x, ω), where the h^i are convex normal integrands.

3. h = gH, where g is a convex normal integrand and H is a K-convex normal function such that (5.1) holds almost surely and (−K) ∩ {u ∈ R^m | g∞(u, ω) ≤ 0} is linear.

4. h(x, ω) = g(A(ω)x, ω), where g is a convex normal integrand and A : R^n × Ω → R^d is an affine Caratheodory mapping.

Proof. The first two parts follow from Theorem 5.7 and the fact that scalar multiplication and the sum preserve convexity. By 2,

h̄(x, u, ω) := g(u, ω) + δ_{epi_K H(·,ω)}(x, u)

defines a convex normal integrand. The growth condition gives

(gH)(x, ω) = inf_{u∈R^m} h̄(x, u, ω),

while the linearity condition implies, by Theorem 5.21, that this expression is a normal integrand. Part 4 follows from 3 by choosing H = A and K = {0}.

For a normal integrand h, we define h* scenariowise, that is,

h*(y, ω) := sup_u {u · y − h(u, ω)}.

Theorem 5.23. For a normal integrand h, h∗ is a convex normal integrand.

Proof. Let (x^ν, α^ν) be a Castaing representation of epi h. We have

h*(y, ω) =
  sup_ν {x^ν(ω) · y − α^ν(ω)}   if ω ∈ dom epi h,
  −∞                             if ω ∉ dom epi h;

this formula will be justified later on in the section "Conjugate duality". Being a countable supremum of normal integrands (in fact, Caratheodory integrands), h* is normal.

In particular, for a measurable S,

σ_S(v, ω) := σ_{S(ω)}(v) := sup_{x∈S(ω)} x · v

is a normal integrand.


Exercise 5.0.1. Show that, for a convex normal integrand h,

H(x, α, ω) :=
  αh(x/α, ω)   if α > 0,
  h∞(x, ω)     if α = 0,
  +∞            otherwise

is a convex normal integrand and that, for α ∈ L^0(R_+),

h_0(x, ω) :=
  α(ω)h(x/α(ω), ω)   if α(ω) > 0,
  h∞(x, ω)            if α(ω) = 0,
  +∞                   otherwise

is a convex normal integrand. Hint: Compute the conjugate H* of H and then (H*)*.

6 Integral functionals

Given a normal integrand h, the associated integral functional Eh : L^0 → R is defined by

Eh(x) := ∫ h(x(ω), ω) dP(ω).

Here and in what follows, the expectation of an extended real-valued random variable is defined as +∞ unless the positive part is integrable. Likewise, the sum of α_i ∈ R is defined as +∞ if α_i = +∞ for some i. That the integral is well-defined for any x ∈ L^0 follows from the following lemma.

Lemma 6.1. A normal integrand h is B(R^n) ⊗ F-measurable from R^n × Ω to R. In particular,

ω ↦ h(x(ω), ω)

is measurable for any x ∈ L0(Rn).

Proof. Note that {[−∞, β] | β ∈ R} generate the σ-algebra of R. For β ∈ R, lev_{≤β} h is closed-valued and measurable by Theorem 5.6, so {(x, ω) | h(x, ω) ≤ β} is measurable by Corollary 5.2. The second claim follows from the fact that compositions of measurable functions are measurable.

A vector space X ⊆ L^0 is decomposable if L^∞ ⊂ X and 1_A x ∈ X whenever A ∈ F and x ∈ X. Equivalently, X is decomposable if

1_A x + 1_{Ω\A} x′ ∈ X

whenever A ∈ F, x ∈ X and x′ ∈ L^∞. Examples of decomposable spaces include L^p-spaces with p ≥ 0 and Orlicz spaces.


Theorem 6.2 (Interchange rule). For a normal integrand h : R^n × Ω → R and a decomposable X, we have

inf_{x∈X} Eh(x) = E[inf_{x∈R^n} h(x)]

if X = L^0 or the left side is less than +∞. As long as this common value is finite,

argmin_{x∈X} Eh = {x ∈ X | x ∈ argmin h  P-a.s.}.

Proof. By Theorem 5.8, the function p defined by

p(ω) := inf_{x∈R^n} h(x, ω)

is measurable. Let α > Ep and let ε > 0 be small enough so that Eβ < α for β := ε + max{p, −1/ε}. The mapping

S(ω) := {x | h(x, ω) ≤ β(ω)}

is measurable and closed-valued with P(domS) = 1, so, by Corollary 5.13, there exists x̄ ∈ L^0(R^n) such that h(x̄) ≤ β almost surely and thus Eh(x̄) ≤ Eβ < α. Let x ∈ X be such that Eh(x) < ∞, and define

x^ν := 1_{|x̄|≤ν} x̄ + 1_{|x̄|>ν} x.

By construction, x^ν ∈ X, and h(x^ν) ≤ max{h(x̄), h(x)} for all ν, so, by Fatou's lemma, Eh(x^ν) < E[h(x̄)] + ε for ν large enough. Since α > Ep was arbitrary, this proves the first claim.

Assume now that Ep is finite. If x′ ∈ argmin h almost surely, then x′ ∈ argmin Eh. If x′ ∉ argmin h on a set of positive probability, then P(A) > 0 for A := {h(x′) > ε + p} for some ε > 0, and, as above, there is an x̄ such that h(x̄) < h(x′) on A, so Eh(1_A x̄ + 1_{A^C} x′) < Eh(x′), which means that x′ ∉ argmin Eh.

We equip L0 with the translation invariant metric

d(x, x′) := Eρ(|x′ − x|),

where ρ is a bounded nondecreasing subadditive function vanishing only at the origin. A sequence (x^ν) in L^0 converges in measure to an x ∈ L^0 if

lim_{ν→∞} P(|x^ν − x| ≥ ε) = 0

for all ε > 0.

Lemma 6.3. The space L^0 is a complete metric vector space where a sequence converges if and only if it converges in probability. A sequence converges in probability if and only if every subsequence has an almost surely convergent subsequence with a common limit.


Proof. Exercise.

Exercise 6.0.1. Show that, for Ω = [0, 1], F = B([0, 1]) and the Lebesgue measure P, L^0 is not locally convex. Hint: show that any nonempty open convex set is the whole L^0.

We say that a normal integrand h is bounded from below if there is an m ∈ L^1 such that

h(x, ω) ≥ m(ω)  ∀x ∈ R^n, ω ∈ A,

where A ∈ F is of full measure.

Theorem 6.4. If h is a normal integrand bounded from below, then Eh is lsc on L^0.

Proof. Let x^ν → x in L^0. By Lemma 6.3, we may assume, by passing to a subsequence if necessary, that x^ν → x almost surely. By lower semicontinuity of h(·, ω),

lim inf_{ν→∞} h(x^ν(ω), ω) ≥ h(x(ω), ω)

almost surely, so lower semicontinuity follows from Fatou's lemma.

Exercise 6.0.2 (2 points). Show that the lower boundedness assumption in Theorem 6.4 can be relaxed as follows. Recall that A is an atom of P if for every B ⊂ A, P(B) = 0 or P(B) = P(A).

Given a normal integrand h such that Eh is proper on L^0, Eh is lsc on L^0 if and only if

A := {ω ∈ Ω | inf_x h(x, ω) = −∞}

contains only atoms of P and at most finitely many of them, and there exists m ∈ L^1 such that h ≥ −m on A^C. If Eh is not lsc, then lsc Eh = −∞ on dom Eh.

When restricted to the space N of adapted processes, the lower boundedness in Theorem 6.4 can be relaxed significantly using the following very useful result. We denote N^∞ := N ∩ L^∞ and

N^⊥ := {v ∈ L^1(R^{n_0+···+n_T}) | E[x · v] = 0 ∀x ∈ N^∞}.

Lemma 6.5. Let x ∈ N and v ∈ N^⊥. If [x · v]^+ ∈ L^1, then E[x · v] = 0.

Proof. Assume first that T = 0. Defining x^ν := 1_{|x|≤ν} x, we have x^ν ∈ L^∞, so E[x^ν · v] = 0 and

E[x · v]^− ≤ lim inf_{ν→∞} E[x^ν · v]^− = lim inf_{ν→∞} E[x^ν · v]^+ ≤ E[x · v]^+,

where the inequalities follow from Fatou's lemma. By dominated convergence, E[x · v] = lim E[x^ν · v] = 0.


Assume now that the claim holds for every (T − 1)-period model. Defining x^ν := 1_{|x_0|≤ν} x, we have

[∑_{t=1}^T x^ν_t · v_t]^+ ≤ [x^ν · v]^+ + [x^ν_0 · v_0]^− ≤ [x · v]^+ + [x^ν_0 · v_0]^−,

so E[∑_{t=1}^T x^ν_t · v_t] = 0 by the induction hypothesis. Since x^ν_0 ∈ L^∞, we get E[x^ν · v] = 0. It then follows that E[x · v] = 0 just like in the case T = 0.

Example 6.6. If a martingale s and x ∈ N are such that

E[(∑_{t=0}^{T−1} x_t · Δs_{t+1})^+] < ∞,

then E[∑_{t=0}^{T−1} x_t · Δs_{t+1}] = 0. This follows from Lemma 6.5 with v ∈ N^⊥ defined by v_t = Δs_{t+1}.

Lemma 6.7. Given extended real-valued random variables ξ1 and ξ2, we have

E[ξ1 + ξ2] = E[ξ1] + E[ξ2]

under any of the following:

1. ξ_1^+, ξ_2^+ ∈ L^1,

2. ξ_1 ∈ L^1 or ξ_2 ∈ L^1,

3. ξ_1^−, ξ_2^− ∈ L^1,

4. ξ_1 or ξ_2 is {0, +∞}-valued.

Proof. Exercise.

The following is an immediate corollary of Lemma 6.7.

Lemma 6.8. Given normal integrands h1 and h2, we have

E[h1 + h2] = E[h1] + E[h2]

under any of the following:

1. the integrands are bounded from below.

2. h1 or h2 is an indicator function of a measurable closed-valued mapping.

Lemma 6.9. Given a normal integrand h and v ∈ L^0, the following are equivalent.


1. h(x, ω) ≥ λx · v(ω) − m(ω) for some m ∈ L^1 and two different values of λ ∈ R.

2. λv ∈ domEh∗ for two different values of λ ∈ R.

Definition 6.10. A normal integrand h is N^⊥_v-bounded if it satisfies one of the equivalent conditions of Lemma 6.9 with v ∈ N^⊥. A normal integrand is N^⊥-bounded if it is N^⊥_v-bounded for some v ∈ N^⊥.

Theorem 6.11. Assume that h is an N^⊥_v-bounded normal integrand with

h(x, ω) ≥ λx · v(ω)−m(ω)

where m ∈ L1 and λ ∈ R. Let

k(x, ω) := h(x, ω)− λx · v(ω).

Then Ek is lsc on L0 and Ek = Eh on N .

Proof. Theorem 6.4 gives the lower semicontinuity. If x ∈ dom Eh ∩ N, we have E[x · v] = 0 by Lemma 6.5, so Ek(x) = Eh(x). On the other hand, by assumption, there exists λ′ ≠ λ such that

h(x, ω) ≥ λ′x · v(ω) − m(ω),

so

k(x, ω) ≥ (λ′ − λ)x · v(ω) − m(ω).

By Lemma 6.5 again, Ek = Eh on domEk ∩N .

The assumption in Theorem 8.19 may seem somewhat strange but it is satisfiedin many interesting applications as we will see in the following section.

If h is a convex normal integrand, then Eh is a convex function on L0.

Theorem 6.12. If h is a convex normal integrand such that Eh is lsc andproper, then

(Eh)∞ = Eh∞.

Proof. Let x ∈ domEh. Since Eh is lsc and h(x) is integrable, Theorem ?? andLemma 6.7 give

(Eh)∞(x) = supλ>0

E

[h(λx+ x)− h(x)

λ

].

The difference quotients

hλ(x(ω), ω) :=h(λx(ω) + x(ω), ω)− h(x(ω), ω)

λ

increase pointwise to h∞(x(ω), ω). Thus, (Eh)∞ ≤ Eh∞. If x + x /∈ domEh,then (Eh)∞(x) = +∞. If x+ x ∈ domEh, the claim follows from the monotoneconvergence theorem.

32

Page 33: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

7 Dynamic programming

The purpose of this section is to prove a general dynamic programming recursionwhich generalizes the classical Bellman equation for convex stochastic optimiza-tion. We will use the notion of a conditional expectation of a normal integrand.

An extended real-valued random variable X is said to quasi-integrable if eitherX+ or X− is integrable. For a quasi-integrable X, there exists an extended real-valued G-measurable random variable EGX, unique almost surely, such that

E[α(EGX)

]= E [αX] ∀α ∈ L∞+ (Ω,G, P ).

The random variable EGX is known as the G-conditional expectation of X.

Definition 7.1. Given a normal integrand h, a G-integrand EGh is a G-conditionalexpectation of h if

(EGh)(x) = EG [h(x)] a.s.

for all x ∈ L0(G) such that h(x) is quasi-integrable.

We will use notations xt = (x0, . . . , xt) and Et = EFt . We say that an adaptedsequence (ht)

Tt=0 of normal integrands ht : Rn0+···+nt × Ω → R solves the gen-

eralized Bellman equations if

hT = ETh,

ht−1(xt−1, ω) := infxt∈Rnt

ht(xt−1, xt, ω),

ht−1 = Et−1ht−1

(BE)

for t = T, . . . , 1. It is implicitly assumeed here that ht are normal integrandssince otherwise the last equation is not well defined.

In the above formulation, one does not separate the decision variables xt into“state” and “control” like in the classical dynamic programming models. A for-mulation closer to the classical dynamic programming equations will be givenin Corollary ?? below.

Given a collection C ⊂ L0 of random variables, a random variable ξ ∈ L0 issaid to be the essential infimum of C if ξ ≤ ξ′ almost surely for every ξ′ ∈ Cand if ξ ∈ L0 is another random variable with this property, then ξ ≤ ξ almostsurely. The essential infimum ξ exists, is unique almost surely and denoted byess infξ′∈C ξ

′.

Denote by N t the projection of N to its first t components. When h is boundedfrom below and EGh exists, we have E(EGh) = Eh.

Theorem 7.2. Assume that h is bounded from below, (SP ) is feasible and thatthe Bellman equations (BE) admit a solution (ht)

Tt=0. Then

ht(xt) = ess inf x∈N

Eth(x)

∣∣ xt = xt∀xt ∈ L0(Ft)

33

Page 34: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

and

inf (SP ) = infxt∈N t

Eht(xt)

for all t = 0, . . . , T and an x ∈ N is a solution of (SP ) if and only if

xt(ω) ∈ argminxt∈Rnt

ht(xt−1(ω), xt, ω) P -a.s. t = 0, . . . , T. (OC)

Proof. By the definition of ht, for every Ft-measurable xt,

ht(xt) = Et[ inf

xt+1

ht+1(xt, xt+1)] = ess infxt+1∈L0(Ft+1)Et[ht+1(xt, xt+1)],

where the last equality follows from Corollary 5.13 applied to ε-argmin; hence

Etht+1(xt, xt+1) = EtEt+1 infxt+2

ht+2(xt+1, xt+2) = ess infxt+2∈L0(Ft+2)Etht+2(xt+1, xt+2)

and the first statement follows from recursion and the fact that if C = ξ′α,α′ |α ∈ J , α′ ∈ J ′, then

ess infξ′∈C ξ′ = ess infα∈J (ess infα′∈J ′ ξ

′α,α′).

Let x ∈ N . Theorem 6.2 gives, for any t,

Eht−1(xt−1) = infxt∈L0(Ft)

Eht(xt−1, xt).

Since ht−1 is bounded from below, Eht−1 = Eht−1 on N t−1 and thus

infxt−1∈N t−1

Eht−1(xt−1) = infxt∈N t

Eht(xt).

By induction, inf (SP ) = infxt∈N t Eht(xt). We have

xt ∈ argminxt∈N t

Eht(xt)

if and only if

xt−1 ∈ argminxt−1∈N t−1

Eht−1(xt−1) and xt ∈ argminxt∈L0(Ft)

Eht(xt−1, xt).

By the second part of Theorem 6.2,

argminxt∈L0(Ft)

Eht(xt−1, xt) = xt ∈ L0(Ft) | xt ∈ argmin

xt∈Rntht(x

t−1, xt) a.s..

By induction, x ∈ N is optimal if and only if (OC) holds.

In Sections 7.1 and 7.2, we study existence of solutions to (BE) and (SP )and extend Theorem 7.2 to settings where the objective h is not bounded frombelow.

34

Page 35: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

7.1 Conditional expectation of a normal integrand

We say that two set-valued mappings S and S are indistinguishable if thereexists a P -null set N such that S = S outside N . Normal integrands are saidto indistinguishable if their epigraphical mappings are so. Note that if h is a G-conditional expectation of h, then any h that is indistinguishable from h is alsoa G-conditional expectation of h. Uniqueness and equality of normal integrandsmeans indistinguishable.

For an integrable Rd-valued random variable X, EGX is defined componentwiseas the conditional exectation of X.

We say that a normal integrand h is L-bounded if there exist ρ,m ∈ L1 suchthat

h(x) ≥ −ρ|x| −m ∀x ∈ Rn. (L)

Lemma 7.3. A convex normal integrand h is L-bounded if and only if domEh∗∩L1 6= ∅.

Proof. Let v ∈ L1 such that Eh∗(v) <∞ is finite. By Fenchel’s inequality,

h(x, ω) ≥ x · v − h∗(v, ω) ≥ −|v||x| − h∗(v, ω),

so we may choose ρ = |v| and m = h∗(v)+. On the other hand, h ≥ −ρ| · | −mcan be written as (h + ρ| · |)∗(0) ≤ m. By Corollary ?? in the appendix, thismeans that

infv∈Rnh∗(v) + δB(v/ρ) ≤ m,

where the infimum is attained. By Corollary 5.13, there is a v ∈ L0 with |v| ≤ ρand h∗(v) ≤ m.

For an L-bounded normal integrand, EG [h(x)] is well-defined for every x ∈L∞(G). The following lemma shows that, in the definition of conditional expec-tation, for L-bounded normal integrands, it suffices to test with x ∈ L∞(G).

Lemma 7.4. For an L-bounded normal integrand h, a G-normal integrand h isthe unique G-conditional expectation of h if

h(x) = EG [h(x)] ∀x ∈ L∞(G).

Proof. If there is x ∈ L0(G) such that h(x) is quasi-integrable and h(x) 6=EG [h(x)], then, for ν large enough,

1|x|≤νh(1|x|≤νx) 6= 1|x|≤νEG [h(1|x|≤νx)] = EG [1|x|≤νh(1|x|≤νx)]

which is a contradiction. Uniqueness follows from Theorem 5.17.

Example 7.5. Let h(x, ω) = x · v(ω) for v ∈ L1. Then (EGh)(x, ω) = x ·(EGv)(ω).

35

Page 36: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 7.6. Let h be a Caratheodory function such that there exist x ∈ Rnand ρ ∈ L1 with h(x) ∈ L1 and

|h(x)− h(x′)| ≤ ρ|x− x′| ∀x, x′ ∈ Rn.

Then EGh exists, is unique and characterized by

(EGh)(x) = EG [h(x)] ∀x ∈ Rn.

Moreover,|(EGh)(x− x′)| ≤ (EGρ)|x− x′| ∀x, x′ ∈ Rd.

Proof. The assumptions imply that h(x) ∈ L1 for all x ∈ Rd. Let D be acountable dense set in Rd and, for each x ∈ D, let h(x) be the conditionalexpectation of h(x). There is a P -null set N such that, outside N ,

|h(x)− h(x′)| ≤ (EGρ)|x− x′|

for each x, x′ ∈ D. Outside N , h extends to whole Rd by continuity while on Nwe set h = 0.

For a simple x =∑Nn=1 x

n1An , where An ∈ G form a disjoint partition of Ω and

xn ∈ D, we have

EG [h(x)] = EG [h(

N∑n=1

xn1An)] = EG [

N∑n=1

1Anh(xn)] =

N∑n=1

1An h(xn) = h(x)

Any x ∈ L∞(G) is a pointwise limit of such simple random variables boundedby ‖x‖, so dominated convergence for conditional expectation and scenariowisecontinuity of h and h imply that EG [h(x)] = h(x). Thus Lemma 7.4 implies theclaim.

Exercise 7.1.1. Find a solution to (BE) and a solution of the correspondingstochastic optimization problem when T = 0, n = 1

h(x, ω) = V (x0(s1 − s0)),

where si have standard normal distributions, s0 ∈ F0 (in particular, F0 is nottrivial), s1 is independent of F0, and

V (x) =

exp(x)− 1 if x ≤ 0,

x if x > 1.

Lemma 7.7. Let h1 and h2 be L-bounded normal integrands with h1 ≤ h2.Then EGh1 ≤ EGh2 whenever the conditional expectations exist.

Proof. For any x ∈ L∞(G), h1(x) and h2(x) are quasi-integrable, so h1 ≤ h2

implies EGh1(x) ≤ EGh2(x) and the result follows from Theorem 5.17.

36

Page 37: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

The following result is a monotone convergence theorem for conditional expec-tations of integrands.

Theorem 7.8. Let (hν)∞ν=1 be a nondecreasing sequence of L-bounded normalintegrands and

h = supνhν .

If each EGhν exists, then EGh exists and

EGh = supνEGhν .

Proof. For any x ∈ L∞(G) and α ∈ L∞(G;R+), monotone convergence andLemma 7.7 imply that

E[αh(x)] = E[α supνhν(x)] = sup

νE[αhν(x)]

= supνE[αEG [hν(x)]] = E[α sup

νEG [hν(x)]].

Thus supν EGhν = EGh.

The following gives existence and uniqueness of the conditional expectation ofgeneral L-bounded normal integrand.

Theorem 7.9. An L-bounded normal integrand has a unique conditional expec-tation and that is L-bounded as well.

Proof. Let h be an L-bounded normal integrand. Assume first that h ≤ n forsome constant n > 0. By Corollary 5.11,

hν(x, ω) := infx′h(x′, ω) + νρ(ω)|x− x′|

form a nondecreasing sequence of Caratheodory functions increasing pointwiseto h. By Theorems 7.6 and 7.8, EGh exists. To remove the assumption h ≤ n,consider the nondecreasing sequence hn(x) := minh(x), n and apply Theo-rem 7.8 again.

Corollary 7.10. Assume that S is a closed random set. Then EGδS = EδS,

where S is a unique closed random set such that L0(G; S) = L0(G;S).

Proof. By Theorem 7.9, the normal integrand h = δS has a unique conditionalexpectation EGh. Since the conditional expectation of a 0,+∞-valued randomvariable is 0,+∞-valued as well, we have, for every x ∈ L0(G) and a ≥ 0, thatif EGh(x) ≤ a, then EGh(x) = 0. Hence, by Theorem 5.17, lev≤a(EGh) =

lev≤0(EGh) for all a ≥ 0, so EGh is the indicator of its domain S and the claimfollows from the definition of EGh.

In the above lemma, S is known as the G-measurable core of S.

37

Page 38: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 7.11 (Tower property). Assume that h is an L-bounded normal in-tegrand, and G is a sub-σ-algebra of G. Then

EG(EGh) = EGh.

Proof. By Theorem 7.9, EGh exists and is L-bounded, so the result follows fromthe usual tower property of conditional expectation and Lemma 7.4.

Lemma 7.12. Let ξ1, ξ2 ∈ L0(F ;R) be quasi-integrable. Then

1. EG [ξ1 + ξ2] = EG [ξ1] + EG [ξ2] if ξ1, ξ2 satisfy any of the conditions inLemma 6.7.

2. EG [ξ1ξ2] = ξ1EG [ξ2] if ξ1 ∈ L∞(G).

Proof. Let α ∈ L∞+ (G). To prove 1, Lemma 6.7 gives

E[α(ξ1 +ξ2)] = E[αξ1]+E[αξ2] = E[αEGξ1]+E[αEGξ2] = E[α(EGξ1 +EGξ2)].

The second claim is clear.

Theorem 7.13 (Conditional expectation in operations). Let h, h1 and h2 beL-bounded normal integrands.

1. h1 + h2 is L-bounded and EG(h1 + h2) = EGh1 + EGh2.

2. If α ∈ L∞+ (G), then αh is L-bounded and EG(αh) = αEGh.

3. If M : Ω × Rn → Rd is a G-measurable Caratheodory mapping such that|M(x)| ≤ α|x|+ b for α, β ∈ L0 with ρα and ρβ integrable, then h1M isL-bounded and EG(hM) = (EGh)M .

Proof. For any x ∈ L∞(G), h1(x)− and h2(x)− are integrable, so 1 follows fromLemmas 7.4 and 7.12. The second follows similarly from Lemmas 7.4 and 7.12.To prove , assumptions imply that hM satisfies the assumption of Theorem 7.9and that h(M(x) is quasi-integrable for every x ∈ L∞. Thus Theorem 7.9implies EG(h(M(x)) = (EGh)(M(x)).

Next we focus on convex normal integrands.

Theorem 7.14. For a convex L-bounded normal integrand h, EGh is a convexL-bounded normal integrand. If h is positively homogeneous, so is EGh.

Proof. Exercise.

38

Page 39: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 7.15. Let h be a convex normal integrand and F a K-convex G-normal function such that (5.1) holds and (−K) ∩ u | h∞(u, ω) ≤ 0 is linearalmost surely, and there exists y ∈ domEh∗ ∩ L1 such that y · F is L-bounded.Then hF is an L-bounded convex normal integrand,

EG(hF ) = (EGh)F

and EG(hF ) is convex.

Proof. By Fenchel’s inequality,

h(F (x, ω), ω) ≥ y(ω) · F (x, ω)− h∗(y),

so the composition is L-bounded and the claim follows from Theorem 7.9.

Theorem 7.16. Assume that h is a convex normal integrand such that thereexists x ∈ domEh ∩ L0(G) and v ∈ domEh∗ ∩ L1 with (x · v)− ∈ L1. Then

EG(h∞) = (EGh)∞.

Proof. The difference quotients

hλ(x′, ω) :=h(λx′ + x(ω), ω)− h(x(ω), ω)

λ

define a nondecreasing sequence of integrands (hλ)∞λ=1 with pointwise limit h∞.Fenchel’s inequality gives

hλ(x′, ω) ≥ x′ · v(ω) + x · v(ω)− h∗(v(ω), ω))− h(x(ω), ω),

so Theorems 7.8 and 7.13 imply the claim.

A G-conditional expectation of an F-measurable set-valued mapping S : Ω ⇒ Rnis a G-measurable closed-valued mapping EGS such that

L1(G;EGS) = clEGv | v ∈ L1(F ;S),

where the closure is in L1. The following lemma gives the existence.

Lemma 7.17. Assume that S is closed convex-valued mapping and admits atleast one integrable selection. Then EGS exists, is unique, closed-convex valuedand characterized by

δEGS = (EGσS)∗.

Proof. By Theorem 7.14, EGσS is a positively homogeneous convex normal in-tegrand. By Theorem 5.23 and ??, its conjugate is the indicator function ofsome G-measurable closed convex-valued Γ.

39

Page 40: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Let C = EGv | v ∈ L1(F ;S). For every x ∈ L∞(G),

σC(x) = EσS(x) = E[EGσS(x)] = E[σΓ(x)] = σL1(Γ)(x),

where the first and the last equality follow from Theorem 6.2 and the secondfrom Theorem 7.20. By the biconjugate theorem in Section ??, clC = L1(Γ),so Γ = EGS by definition.

Lemma 7.18. Assume that h is a convex normal integrand such that thereexists x ∈ domEh ∩ L0(G) and v ∈ domEh∗ ∩ L1 with (x · v)− ∈ L1. Then

H(x, α, ω) :=

αh(x/α, ω) if α > 0,

h∞(x, ω) if α = 0,

+∞ otherwise

is an L-bounded convex normal integrand with H∗ = δepih∗ and

EGH(x, α, ω) =

αEGh(x/α, ω) if α > 0,

(EGh)∞(x, ω) if α = 0,

+∞ otherwise

Proof. Since v ∈ domEh∗ ∩ L1, H is L-bounded.

Assume for a contradiction that there exists (x, α) ∈ L0(G)+ ×L0(G) such thatEGσepih∗(x, α) 6= σepi(EGh)∗(α, x). Let

Aν := ω | α(ω) ∈ 0 ∪ [1/ν, ν], |x(ω)| ≤ ν.

Now L-boundedness of h gives

EG [σepih∗(x, α)] = σepi(EGh)∗(α, x) on Aν .

Since P (Aν) 1, this leads to a contradiction.

For a convex normal integrand h, the mapping EG epih is an epigraph of aG-normal integrand. We denote this G-normal integrand by Gh and call it theepi-conditional expectation of h.

Theorem 7.19. Assume that h is a convex normal integrand such that thereexists x ∈ domEh ∩ L0(G) and v ∈ domEh∗ ∩ L1 with (x · v)− ∈ L1. Then

(EGh)∗ = G(h∗).

Proof. By assumption, there exists v ∈ L1 such that (v, h∗(v)+) is an integrableselection of epih∗. Applying Lemma 7.17 to epih∗, we get that EG epih∗ existsand is characterized by

δEG epih∗ = (EGσepih∗)∗.

Thus the claim follows from Lemma 7.18.

40

Page 41: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Exercise 7.1.2. Give an example of a convex normal integrand h for which

EGh 6= Gh.

When h is bounded from below, then Eh = E(EGh) on L0(G) and, by Fatou’slemma, Eh is lsc on L0(G). More generally, we have the following.

Theorem 7.20. Assume that h is an L-bounded normal integrand. Then Eh =E(EGh) on domEh ∩ L0(G) and

infx∈L0(G)

E(EGh)(x) = infx∈L0(G)

Eh(x).

If Eh is proper and lsc on L0(G), then E(EGh) = Eh on L0(G) and, in partic-ular,

argminEh ∩ L0(G) = L0(G; argminEGh).

Proof. Let x ∈ domEh ∩ L0(G). Then h(x) is quasi-integrable, so Eh(x) =E[EGh(x)] = E(EGh)(x). Thus Eh = E(EGh) on domEh ∩ L0(G). Let x ∈domE(EGh) ∩ L0(G). We have

h(x1|x|≤ν + x1|x|>ν)− ≤ maxh(x1|x|≤ν)

−, h(x)−,

where the right side is integrable when h is L-bounded and Eh is proper. Thus

Eh(x1|x|≤ν + x1|x|>ν) = E(EGh)(x1|x|≤ν + x1|x|>ν).

Since

(EGh)(x1|x|≤ν + x1|x|>ν)+ ≤ max(EGh)(x)+, (EGh)(x)+,

where the right side in integrable, Fatou’s lemma gives

lim supν→∞

Eh(x1|x|≤ν + x1|x|>ν) ≤ E(EGh)(x),

proving the second claim. If Eh is proper and lsc on L0(G), the last in-equality gives Eh ≤ E(EGh) on L0(G). In this case, argminEh ∩ L0(G) =argminE(EGh) ∩ L0(G), where the latter equals L0(G; argminEGh) by Theo-rem 6.2.

Lemma 7.21. Assume that h is a convex normal integrand such that there existv ∈ L∞(G)⊥, m ∈ L1 and two different values of λ ∈ R such that

h(x, ω) ≥ λx · v(ω)−m(ω).

Then lev0 h∞ and lev0(EGh)∞ have the same G-measurable selections.

41

Page 42: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Proof. If h∞(x) ≤ 0, then (EGh)∞(x) ≤ 0 by Theorem 7.16. We have

h∞(x) ≥ (λ′ − λ) maxx · v, 0 − λx · v,

so (EGh)∞ ≥ 0, by Theorem 7.16 and Theorem 7.13.1. If (EGh)∞(x) ≤ 0,then Eh∞(x) ≤ 0 by the previous theorem. In this case, the lower bound andLemma 6.5 imply x · v = 0, so h∞(x) ≥ 0 which together with Eh∞(x) ≤ 0gives h∞(x) = 0.

The next example shows that we can have E(EGh) 6= Eh if the lower semicon-tinuity assumption in Theorem ?? is omitted.

Example 7.22. Let h(x, ω) = x · v(ω) for v ∈ L∞(G)⊥ with v 6= 0 almostsurely. Then EGh ≡ 0, but Eh(x) = +∞ for some x ∈ L0(G) if L∞(G) is notfinite-dimensional while Eh is proper by Lemma 6.5. Hence Eh 6= E(EGh) onL0(G).

7.2 Existence of solutions

Recall that an adapted sequence (ht)Tt=0 of normal integrands ht : Rn0+···+nt ×

Ω→ R solves the generalized Bellman equations if

hT = ETh,

ht−1(xt−1, ω) := infxt∈Rnt

ht(xt−1, xt, ω),

ht−1 = Et−1ht−1

(BE)

for t = T, . . . , 1. The first part of the following result gives sufficient condi-tions for the existence of a solution. The second part uses the solution andTheorem 7.2 to prove the existence of a solution to (SP ).

Theorem 7.23. Assume that h is convex, (SP ) is feasible and that

1. there exist v ∈ N⊥, λ ∈ R and m ∈ L1 such that

h(x, ω) ≥ λx · v(ω)−m(ω), (7.1)

2. the setNt(ω) := xt ∈ Rnt | h∞t (xt, ω) ≤ 0, xt−1 = 0

is linear-valued whenever ht is a well-defined normal integrand.

Then (BE) has a unique solution (ht)Tt=0 with

ht(xt, ω) ≥ λxt · (Etvt)(ω)− (Etm)(ω) (7.2)

for all t = 0, . . . , T .

42

Page 43: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

If h is N⊥-bounded, then

inf (SP ) = infxt∈N t

Eht(xt)

for all t = 0, . . . , T , (SP ) has a solution x ∈ N with xt ⊥ Nt for every t =0, . . . , T and, moreover, the solutions of (SP ) are characterized by

xt(ω) ∈ argminxt

ht(xt−1(ω), xt, ω) P -a.s. t = 0, . . . , T. (OC)

Proof. Assume that ht+1 is well-defined and that

ht+1(xt+1, ω) ≥ λxt+1 · (Et+1vt+1)(ω)− (Et+1m)(ω).

By Theorem 5.21, ht is well-defined. Since Et+1vt+1t+1 = 0, ht is L-bounded, so

ht exists and is unique by Theorem 7.9. By the same theorem, the lower boundadmits a conditional expectation, so we get

ht(x, ω) ≥ λxt · (Etvt)(ω)− (Etm)(ω).

By induction, (BE) admits a unique solution satisfying (7.2).

Assume now that h is N⊥-bounded. Let

k(x, ω) := h(x, ω)− λx · v(ω).

and (kt) be the corresponding solution to (BE). By Theorem 8.19, we haveEk = Eh on N . Since Etv

tt = 0, Theorem 7.13 gives, recursively backward in

time, thatkt(x, ω) := ht(x

t, ω)− λxt · Etvt(ω).

Thus, Ekt = Eht on N t, by Theorem 8.19 again. Since k is bounded frombelow, Theorem 7.2 says that

inf (SP ) = infxt∈N t

Ekt(xt)

and that an x ∈ N is a solution of (SP ) associated with k if and only if

xt(ω) ∈ argminxt

kt(xt−1(ω), xt, ω) ∀t = 0, . . . , T P -a.s..

Since Etvtt = 0 we also have

argminxt

kt(xt−1(ω), xt, ω) = argmin

xt

ht(xt−1(ω), xt, ω)

which gives the optimality condition. Applying Theorem 5.21 recursively for-ward in time gives an optimal x ∈ N with xt ⊥ Nt.

The following example shows that assumptions 1 and 2 in Theorem 7.23 are notsufficient for the last statement of the theorem.

43

Page 44: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Example 7.24. Let α ∈ L2(F0) and v ∈ N⊥ be such that Eαv = ∞ andconsider

h(x, ω) :=1

2|x0 − α(ω)|2 + x0v(ω).

Here h0(x, ω) = 12 |x0 − α(ω)|2, so x ∈ N satisfies (OC) if and only if x0 = α.

For such x, neither h+(x) nor h−(x) is integrable, so E0h(x) does not exist.Hence (SP ) does not have a solution albeit (BE) has a unique solution andthere is a unique x satisfying (OC).

The following result shows that the linearity condition of Theorem 7.2 can bestated in terms of the original normal integrand h directly. We will use theconsequence of Theorem 5.17 that if C is closed-valued and G-measurable, thenit is almost surely linear-valued if and only if the set of its measurable selectorsis a linear space.

Theorem 7.25. Assume that h is an N⊥-bounded convex normal integrand.Then assumptions of Theorem 7.23 hold if and only if

L = x ∈ N|h∞(x(ω), ω) ≤ 0 a.s.

is a linear space.

Proof. We proceed by induction on T . When T = 0, Theorem 7.9 implies thathT is well defined and the second part of Lemma 7.21 gives

L = x ∈ N|h∞T (x(ω), ω) ≤ 0 a.s.,

which, by definition of NT , equals L0(FT ;NT ). Since NT is FT -measurable, byTheorem ??, the linearity of L is equivalent to NT being linear-valued. Let nowT be arbitrary and assume that the claim holds for every (T − 1)-period model.If L is linear then L′ := x ∈ N|x0 = 0, h∞(x(ω), ω) ≤ 0 a.s. is linear aswell. Applying the induction hypothesis to the (T − 1)-period model obtainedby fixing x0 ≡ 0, we get that ht is well defined and Nt is linear for t = T, . . . , 1.By Theorem 5.21 and Theorem 7.9, h0 is well defined. By Theorem 5.21 andthe second part of Lemma 7.21,

L0(F0;N0) = x0 ∈ L0(F0) |h∞0 (x0(ω), ω) ≤ 0 a.s.= x0 ∈ L0(F0) | h∞0 (x0(ω), ω) ≤ 0 a.s.= x0 ∈ L0(F0) | inf

x1

h∞1 (x0(ω), x1, ω) ≤ 0 a.s.

= x0 ∈ L0(F0) | ∃x ∈ N : x0 = x0, h∞1 (x1(ω), ω) ≤ 0 a.s.,

where the last equality follows by applying the last part of Theorem 5.21 to thenormal integrand h∞. Repeating the argument for t = 1, . . . , T , we get

L0(F0;N0) = x0 ∈ L0(F0) | ∃x ∈ N : x0 = x0, h∞T (x(ω), ω) ≤ 0 a.s.

= x0 ∈ L0(F0) | ∃x ∈ N : x0 = x0, h∞(x(ω), ω) ≤ 0 a.s.

= x0 ∈ L0(F0) | ∃x ∈ L : x0 = x0. (7.3)

44

Page 45: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

The linearity of L thus implies that of L0(F0;N0) which is equivalent to N0

being linear-valued.

Assume now that Nt is linear-valued for t = T, . . . , 0 and let x ∈ L. Expression(7.3) for L0(F0;N0) is again valid so, by linearity of N0, there is an x ∈ Lwith x0 = −x0. Since h∞ is sublinear, L is a cone, so that x + x ∈ L. Sincex0 + x0 = 0, we also have x + x ∈ L′. Since, by the induction assumption,L′ is linear and since L′ ⊆ L, we get −x − x ∈ L. Since L is a cone, we get−x = x− x− x ∈ L. Thus, L is linear.

For t = 0, the last claim follows directly from expression (7.3). The generalcase follows by applying this to the (T − t)-period model obtained by fixingxt−1 ≡ 0.

Given N := (xt)Tt=0 | xt ∈ L0(Ft;Rnt, n := n0 . . . nT we say that a mappingA : Rn × Ω→ Rn is adapted if Ax ∈ N for every x ∈ N .

Theorem 7.26. Let h be an N⊥-bounded convex normal integrand such that

h(x, ω) ≥ λx · v(ω)−m(ω)

for v ∈ N⊥, m ∈ L1 and two different values of λ ∈ R, and that

L = x ∈ N | h∞(x) ≤ 0

is a linear space. Then each h below is an N⊥-bounded convex normal integrandsuch that

L = x ∈ N | h∞(x) ≤ 0

is a linear space.

1. h(x, ω) := infh(x, ω) | A(ω)x = x as soon as h is a normal integrandand A : Rn × Ω → Rn is an adapted affine map such that v = AT v forsome v ∈ N⊥.

2. h(x, ω) := h(A(ω)x, ω), where A : Rn ×Ω→ Rn is an adapted affine mapsuch that AT v ∈ L1.

Proof. In 1, L = x ∈ N | ∃x ∈ L : Ax ∈ L, so L is a linear space. SinceAT v = v,

h(x) ≥ infxλx ·AT v −m | Ax = x ≥ λx · v −m,

so h is N⊥-bounded.

In 2, L = x ∈ N | Ax ∈ L, so L is a linear space. The lower bound becomes

h(x) ≥ λx ·AT v −m

By assumption, AT v ∈ L1 so that, for any x ∈ N∞, E[x ·AT v] = E[Ax · v] = 0by Lemma 6.5. Thus AT v ∈ N⊥ and h is N⊥-bounded.

45

Page 46: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

7.3 Kernels and conditioning

Given a measurable space (S,S), a mapping µ : (Ω,G)×S → R is a (probability)kernel if µ(A) := ω 7→ µ(ω,A) is G-measurable for each A ∈ S and A 7→ µ(ω,A)is a (probability) measure for each ω ∈ Ω. A probability kernel µ is a G-conditional regular distribution of η ∈ L0(Ω;S) if, for all B ∈ G and A ∈ S,

E[1B1η∈A] = E[1Bµ(A)].

This means that µ(A) is a G-conditional expectation of 1η∈A.

Theorem 7.27 (Disintegration). Let (S,S) be a measurable space, H : Rn ×S → R be B(Rn)⊗ S-measurable and η ∈ L0(Ω;S) with a G-conditional regulardistribution µ. The function

h(x, ω) :=

∫H(x, s)µ(ω, ds)

is B(Rn) × G-measurable. If H(x, η) is quasi-integrable for x ∈ L0(Ω,G;Rn),then, almost surely,

EG [H(x, η)] = h(x).

Proof. The claims holds if H = 1B1A for B ∈ S ′ and A ∈ S. By the monotoneclass theorem, the claims holds for every S ′×S-measurable nonnegative H. Thegeneral case follows by taking differences.

In the case where (S,S) = (Ω,F) and the coordinate mapping η(ω) = ω has aG-conditional regular distribution µ, Theorem 7.27 gives, almost surely

EG [X](ω) =

∫X(ω′)µ(ω, dω′)

for any quasi-integrable random variable X.

Theorem 7.28. Let h be an L-bounded normal integrand and assume that thecoordinate mapping η(ω) = ω has a G-conditional regular distribution µ. Then

h(x, ω) :=

∫h(x, ω′)µ(ω, dω′)

is a G-conditional expectation of h.

Proof. By Theorem 7.27,

EG [h(x)](ω) =

∫h(x(ω), ω′)µ(ω, dω′) = h(x(ω), ω)

for every x ∈ L∞(G). Let ρ(ω) :=∫ρ(ω′)µ(ω, dω′) for ρ from the definition of

L-boundedness of h. If

|h(x, ω)− h(x′, ω)| ≤ ρ(ω)|x− x′|,

46

Page 47: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

then|h(x, ω)− h(x′, ω)| ≤ ρ(ω)|x− x′|,

so h(·, ω) is continuous for every ω and, since h is measurable, h is a Caratheodoryintegrand and hence a normal integrand. Let

hν(x, ω) = infx′h(x′, ω) + νρ(ω)|x− x′|.

By above, each hν(x, ω) :=∫hν(x, ω′)µ(ω, dω′) is a normal integrand. Since

h = supν hν , h is a normal integrand by Theorem 7.8.

Theorem 7.29. Let H : Rn × Rm → R be a B(Rm)-normal integrand andη ∈ L0(F ;Rd) with a regular G-conditional distribution µ such that there existmeasurable M,R : Rm → R with M(η), R(η) ∈ L1, and

H(x, p) ≥ −R(p)|x| −M(p).

For h(x, ω) := H(x, η(ω)) we have

(EGh)(x, ω) = h(x, ω),

where h is the normal integrand

h(x, ω) :=

∫H(x, p)µ(ω, dp).

Proof. When h is a B(Rd)-normal integrand, Theorem 7.27 gives

EG [H(x, η)] =

∫H(x, p)µ(ω, dp) = h(x, ω)

for every x ∈ L∞(G). Denoting ρ(ω) :=∫R(p)µ(ω, dp), if

|H(x, p)−H(x′, p)| ≤ R(p)|x− x′|,

then|h(x, ω)− h(x′, ω)| ≤ ρ(ω)|x− x′|,

so h(·, ω) is continuous for every s ∈ Rd and, since h is measurable, h is aCaratheodory integrand and hence a normal integrand. Let

Hν(x, p) = infx′H(x′, p) + νR(p)|x− x′|.

By above, each hν(x, ω) :=∫Hν(x, p)µ(ω, dp) is a normal integrand. Since

h = supν hν , h is a normal integrand by Theorem 7.8.

47

Page 48: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Given ξ ∈ L0(Rd) and η ∈ L0(Rm), let ν be a probability kernel on Rd×B(Rm)such that Eσ(ξ)[1η∈B ] = ν(ξ,B) for all B ∈ B(Rn) almost surely. The kernel νis unique L(ξ)-a.e. and called the regular ξ-conditional distribution of η. Givena measurable R : Rm → R with R(η) ∈ L1, there exists a version of ν such that

R(s) :=

∫R(p)ν(s, dp) <∞ for all s.

Indeed, R(s) :=∫R(p)ν(s, dp) < ∞ L(ξ)-a.e., so it suffices to redefine ν to be

zero on a L(ξ)-null Borel set that contains s | R(s) =∞.

Theorem 7.30. Let H : Rn × Rm → R be a B(Rm)-normal integrand, η ∈L0(F ;Rd) such that there exist measurable M,R : Rm → R with M(η), R(η) ∈L1 and

H(x, p) ≥ −R(p)|x| −M(p)

and let ν be a regular ξ-conditional distribution of η for which

R(s) :=

∫R(p)ν(s, dp) <∞ for all s.

For h(x, ω) := H(x, η(ω)) we have

(Eσ(ξ)h)(x, ω) = H(x, ξ(ω)),

where H is the normal integrand

H(x, s) :=

∫H(x, p)ν(s, dp).

Proof. Theorem 7.27 gives

EG [H(x, η)] =

∫H(x, p)ν(ξ, dp) = H(x, ξ)

for every x ∈ L∞(σ(ξ)), so it suffices to prove that H is a B(Rd)-normal inte-grand. If

|H(x, p)−H(x′, p)| ≤ R(p)|x− x′|,

then|H(x, s)− H(x′, s)| ≤ R(s)|x− x′|,

so H(·, s) is continuous for every s ∈ Rd and, since H is measurable, H is aCaratheodory integrand and hence a normal integrand. Let

Hν(x, p) = infx′H(x′, p) + νR(p)|x− x′|.

By above, each Hν(x, s) :=∫Hν(x, p)ν(s, dp) is a normal integrand. Since

H = supν Hν , H is a normal integrand by Theorem 7.8.

48

Page 49: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Given G′,H ⊂ F , the σ-algebras G′ and G are H-conditionally independent if

EH[1A′1A] = EH[1A′ ]EH[1A]

for every A′ ∈ G′ and A ∈ G such that the conditional expectations exist.

Lemma 7.31. For σ-algebras G′, G and H, the following are equivalent:

1. G′ and G are H-conditionally independent

2. EH[w′w] = EH[w′]EH[w] for every w′ ∈ L1(G′) and w ∈ L∞(G).

3. EG∨H[w′] = EH[w′] for every w′ ∈ L1(G′).

Proof. The first implies the second by two applications of the monotone classtheorem. When the second holds, we have, for any w′ ∈ L1(G′), A ∈ G andB ∈ H,

E[EH[w′]1A∩B ] = E[EH[w′1A]1B ] = E[w′1A1B ] = E[EG∨H[w′]1A∩B ],

and, by the monotone class theorem, this extends from sets of the form A ∩ Bto any set in G ∨H. Thus the second implies the third. Assuming the third, wehave, for A′ ∈ G′ and A ∈ G,

EH[1A1A′ ] = EH[EG∨H[1A1A′ ]] = EH[1AEG∨H1A′ ]

= EH[1AEH1A′ ] = EH[1A]EH[1A′ ],

so the first holds.

A random variable w is H-conditionally independent of G if σ(w) is so. Likewise,we say that a normal integrand is H-conditionally independent of G if σ(h) isH-conditionally independent of G. Here σ(h) is the smallest σ-algebra underwhich epih is measurable.

Theorem 7.32. Let h be an L-bounded normal integrand H-conditionally in-dependent of G. Then EG∨Hh = EHh. In particular, if h is independent of G,then EGh is deterministic.

Proof. Assume first that h satisfies the assumptions of Theorem 7.6. Then EGhis characterized by

EG(h(x)) = (EGh)(x) ∀x ∈ Rd

and likewise for EG∨Hh. Thus EG∨Hh = EHh, by Lemma 7.31. The first claimnow follows using Lipschitz regularizations as in the proof of Theorem 7.9. Thesecond claim follows by taking H the trivial σ-algebra.

49

Page 50: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

7.4 Examples

Example 7.33 (Financial mathematics continued). In the optimal investmentproblem in Example 2.2, we have

h(x, ω) = V

(u(ω)−

T−1∑t=0

xt ·∆st+1(ω)

)+

T−1∑t=0

δDt(xt, ω).

Here h is N⊥-bounded if there exists a P -absolutely continuous martingalemeasure Q of the price process s such that, for y := dQ/dP , yu ∈ L1 andEV ∗(λiy) <∞ for two different λi ∈ R. Indeed, Fenchel’s inequality then gives

h(x) ≥ −λiyT−1∑t=0

xt ·∆st+1 − λiyu− V ∗(λiy),

so h is N⊥-bounded with vt := y∆st+1 and

m = maxλ1yu+ V ∗(λ1y), λ2yu+ V ∗(λ2y).

Exercise 7.4.1. In the setting of Example 7.33, assume that V is not a constantfunction (i.e., V is a convex nondecreasing non-constant extended real-valuedfunction on the real line) and that there are no portfolio constraints (Dt is thewhole space for all t). Show that the linearity condition in Theorem 7.25 is

equivalent to the no-arbitrage condition: the terminal wealth∑T−1t=0 xt ·∆st+1 is

nonnegative almost surely if and only if it is zero almost surely.

Exercise 7.4.2. Assume that V is a finite differentiable convex function on thereal line. Show that

V∞(x) =

v+x if x ≥ 0,

v−x if x < 0,

where v+ := supx V′(x), v− := infx V

′(x) and V ′ is the derivative of V .

Exercise 7.4.3 (Variance optimal hedging). Let s = (st)Tt=0 be an Rd-valued

(Ft)Tt=0-adapted stochastic process, u ∈ L0(Ω,F , P ;R) and consider the problemof minimizing

E

(α+

T−1∑t=0

zt ·∆st+1 − u

)2

over α ∈ R and Ft-measurable Rd-valued functions zt. Here α is interpreted asan initial value of a self-financing trading strategy where zt is the portfolio ofrisky assets held over period [t, t+ 1].

Show that optimal solutions exists.

The remaining examples are concerned with the dynamic programming princi-ple.

50

Page 51: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 7.34 (Problems of Bolza). Assume that

h(x, ω) =

T∑t=0

Kt(xt−1,∆xt, ω) + k(xT , ω)

for some fixed initial state x−1 and Ft-measurable convex normal integrands Kt

such that

1. There exists two different values of λ ∈ R and a, b ∈ N 1 with

Kt(xt−1,∆xt) ≥ λ(at · xt−1 + bt · xt)−mt,

k(xT , ω) ≥ λaT+1 · xT −mT+1

and Etat+1 + bt = 0.

2. x ∈ N |∑Tt=0K

∞t (xt−1,∆xt) + k∞(xT ) ≤ 0 is a linear space.

Then the functions Vt : Rnt × Ω→ R defined by

VT = ET k,

Vt−1(xt−1, ω) = infxt∈Rnt

Kt(xt−1,∆xt, ω) + Vt(xt, ω),

Vt−1 = Et−1Vt−1,

(7.4)

are L-bounded convex normal integrands,

Nt(ω) = xt ∈ Rnt | K∞t (0, xt, ω) + V∞t (xt, ω) ≤ 0

are linear-valued,

inf (SP ) = infxt∈N t

E

[t∑

s=0

Ks(xs−1,∆xs) + Vt(xt)

]t = 0, . . . , T,

an x ∈ N is optimal if and only if

xt(ω) ∈ argminxt∈Rnt

Kt(xt−1(ω),∆xt, ω) + Vt(xt, ω) P -a.s. t = 0, . . . , T,

and there is an optimal solution x ∈ N with xt ⊥ Nt for every t = 0, . . . , T .

Proof. Summing up the lower bounds,

h(x) ≥ λT∑t=0

xt · (at+1 + bt)−T+1∑t=0

mt

so h is N⊥-bounded. By Theorems 7.25 and 7.23, it suffices to show that thesolution of (BE) satisfies

ht(xt, ω) =

t∑s=0

Ks(xs−1,∆xs, ω) + Vt(xt, ω) (7.5)

51

Page 52: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

for every t = 0, . . . , T . For t = T , (7.5) is obvious since VT = ET k by definition.

Assuming that (7.5) holds for t and that Vt(xt) ≥ xt · Etat+1 − Et∑T+1s=t+1ms,

we get

ht−1(xt−1, ω) = infxt∈Rnt

ht(xt−1, xt, ω)

=

t−1∑s=0

Ks(xs−1,∆xs, ω) + infxt∈Rnt

Kt(xt−1,∆xt, ω) + Vt(xt, ω).

The lower bounds give

Vt−1(xt−1) ≥ xt−1 · at −mt − EtT+1∑s=t+1

ms,

so Vt−1 is L-bounded, while, by Theorem 5.21, the linearity of Nt implies thatVt−1 is a well-defined convex normal integrand. We obtained

ht−1(xt−1, ω) =

t−1∑s=0

Ks(xs−1,∆xs, ω) + Vt−1(xt−1, ω).

Since for s = 0, . . . , t− 1, Ks is Ft−1-measurable and L-bounded, Theorem 7.13gives

ht−1(xt−1, ω) =

t−1∑s=0

Ks(xs−1,∆xs, ω) + Vt−1(xt−1, ω),

so the claim follows by induction on t.

Corollary 7.35 (Stochastic control continued). Consider the problem

minimize E

[T∑t=0

Lt(Xt, Ut)

]over (X,U) ∈ N ,

subject to ∆Xt = AtXt−1 +BtUt−1 + ut

(SC)

and assume that

1. There exists two different values of λ ∈ R and a, b ∈ N 1 with Etat+1+bt =0, ATt bt, B

Tt bt and bt · ut integrable and

Lt−1(Xt−1, Ut−1) ≥ λ(at ·Xt−1 + bt ·Xt)−mt,

LT (XT , UT ) ≥ λaT+1 ·XT −mT+1

for all feasible (X,U).

2. (X,U) ∈ N |∑Tt=0 L

∞t (Xt, Ut) ≤ 0, ∆Xt = AtXt−1 + BtUt−1 is a

linear space.

52

Page 53: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Then functions Jt : Rlt ×Ω→ R and It : Rlt+mt ×Ω→ R defined recursively by

IT+1 := 0

Jt(Xt) := infUt∈Rmt

(Lt + EtIt+1)(Xt, Ut),

It(Xt−1, Ut−1) := Jt(Xt−1 +AtXt−1 +BtUt−1 + ut),

are L-bounded convex normal integrands,

Nt(ω) := Ut | L∞t (0, Ut, ω) + (EtI∞t+1)(0, Ut, ω) ≤ 0

are linear-valued, the optimum value of (SC) coincides with that of

minimize E

[t∑

s=0

Ls(Xs, Us) + Jt(Xt)

]over (Xt, U t) ∈ N t,

subject to ∆Xs = AsXs−1 +BsUs−1 + us s = 0, . . . , t

for all t = 0, . . . , T , an U ∈ N is optimal if and only if

Ut(ω) ∈ argminUt∈Rmt

Lt(Xt, Ut, ω) + (EtIt+1)(Xt, Ut, ω)

and there is an optimal solution (X,U) ∈ N with Ut ⊥ Nt for every t = 0, . . . , T .

Proof. Omitted.

Exercise 7.4.4. Prove Corollary 7.35 in the case when Lt ≥ m for all t forsome m ∈ L1.

Theorem 7.36. In the setting of Theorem 7.34, assume that Kt+1 is Kt-conditionally independent of Ft and k is KT -conditionally independent of FT .Then Vt is σ(Kt+1)-measurable and Vt = Eσ(Kt)Vt. In particular, if (Kt)

Tt=0

and k are mutually independent, then each Vt is deterministic.

Proof. By Theorem 7.32, VT = ET k is σ(KT )-measurable. If Vt+1 is σ(Kt+1)-measurable, then Vt is so as well, and Vt = Eσ(Kt)Vt, by Theorem 7.32. Thusthe claim follows by induction.

A stochastic process (ξt)Tt=0 is (Ft)Tt=0-Markov if ξt+1 is ξt-conditionally inde-

pendent of Ft. For each t, let νt be the regular ξt-conditional distribution νt ofξt+1. For a B(Rdt × Rdt+1)-normal integrand V : Rn × Rdt × Rdt+1 → R, wedefine Eξt V : Rn × Rdt → R by

(Eξt V )(x, st) :=

∫V (x, st, st+1)dνt(st, st+1).

This means that (Eξt V )(x, ξt(ω)) is the σ(ξt)-conditional expectation of

V (x, ξt(ω), ξt+1(ω)).

Thus we get the following consequence of Theorem 7.34.

53

Page 54: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 7.37 (Problems of Bolza under Markovian dynamics). In the settingof Theorem 7.34, assume that

Kt(xt−1,∆xt, ω) = Kt(xt−1,∆xt, ξt−1(ω), ξt(ω))

k(xT , ω) = k(xT , ξT (ω))

for an (Ft)Tt=0-Markov process ξ and convex normal integrands Kt and k. ThenVt(xt, ω) = Vt(xt, ξt(ω)), where Vt are given by the recursion

VT = k,

Vt−1(xt−1, st−1, st) = infxtKt(xt−1, xt, st−1, st) + Vt(xt, st),

Vt−1 = Eξt−1 Vt−1.

Corollary 7.38 (Stochastic control under Markovian dynamics). In the settingof Corollary 7.35, assume that

Lt(XT , UT , ω) = Lt(XT , UT , ξt(ω))

for an (Ft)Tt=0-Markov process ξ, normal integrands Lt and Borel measurableAt, Bt, ut. Then Jt(Xt, ω) = Jt(Xt, ξt(ω)), where

JT (XT , sT ) = infUT∈RmT

LT (XT , UT , sT ),

It+1(Xt, Ut, st+1) := Jt+1(Xt +At+1(st+1)Xt +Bt+1(st+1)Ut + ut+1(st+1), st+1),

Jt(Xt, st) = infUt∈Rmt

(Lt + Eξt It+1)(Xt, Ut, st).

Example 7.39 (Financial mathematics continued). We can write the optimalinvestment problem as a stochastic control problem by defining the rate of returnprocess Rit := ∆sit/s

it−1 of the price process s so that the problem becomes

minimize EV (u−XT ) over U ∈ ND,

subject to ∆Xt = Rt+1 · Ut,d∑j=0

U jt −Xt.

where U is the portfolio process in terms of wealth invested (U it := sitxit) and

D(ω) = U | U it ∈ Dt(ω)/Sit(ω). We assume that ξ = (S,R) is a Markov pro-cess, the claim is of the form u(ω) = c(ST ) and that each Dt is σ(ξt)-measurableso that D(ω) = D(ξt(ω)) for some measurable closed convex-valued D. This fitsCorollary 7.35 with

Lt(Xt, Ut, ξt) = δ0(

d∑j=0

U jt −Xt) + δDt(ξt)(Ut)

LT (XT , ξT ) = V (c(ST )−XT ).

54

Page 55: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Assuming that there are no portfolio constraints Dt, the recursion in Corol-lary 7.35 becomes

JT (XT , ξT ) = V (c(sT )−XT ),

Jt(Xt, ξt) = infUt∈Rd+1

EξtJt+1(Xt +Rt+1 · Ut, st+1)) |d∑j=0

U jt = Xt.(7.6)

Using the constraint to substitute out U0, we get

Jt(Xt, ξt) = infUt∈Rd

EξtJt+1((1 +R0t+1)Xt +

d∑j=1

(Rjt+1 −R0t+1)U jt , st+1).

If the returns Rt+1 are independent of (St, Rt), we see, using the fact that st+1 =st(1 +Rt+1), that Jt does not depend on Rt. If in addition the claim c(ST ) = 0and R0 = 0, we see by induction that Jt are independent of ξt and

JT (XT ) = V (−XT ),

Jt(Xt) = infUt∈Rd

EξtJt+1(Xt +Rt+1 · Ut).

In addition to European type claims c(sT ) that depend only on the terminalvalues sT of the underlyings s, many path-dependent claims can be handledsimilarly. For example, an Asian c(

∑t st) can be expressed using a new process

At =∑tt′=0 st with dynamics At+1 = At + (1 +Rt+1)st, the running maximum

Mt = supt′≤t st′ with Mt+1 = max(Mt, (1 + Rt+1)st). The joint process ξ =(s,A,M,R) will be Markov as soon as R is so.

Exercise 7.4.5. In the setting of Example 7.39, assume that there are no port-folio constraints, Rt+1 are independent of (St, Rt), c(ST ) = 0 and that R0 = 0.

For V (u) = exp(u)− 1, show that

JT (XT ) = exp(−XT )− 1,

Jt(Xt) = exp(−Xt)αt+1 infUt∈Rd

Eξt exp(−Rt+1 · Ut) − 1

where αT = 1 and

αt = αt+1 infUt∈Rd

Eξt exp(−Rt+1 · Ut).

Note that here the optimal risky allocation (U1t , . . . , U

dt ) does not even depend

on the current wealth Xt.

On the other hand, if we make the change of variables pj = U j/X (proportionof wealth invested in j), (7.6) becomes

JT (XT ) = V (c(sT )−XT ),

Jt(Xt, ξt) = infp∈Rd+1

EξtWt+1(Xt(1 +Rt+1 · p), st+1)) |d∑j=0

pj = 1.

55

Page 56: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

For V (u) = − ln(−u), show that

JT (XT ) = − ln(XT ),

Jt(Xt) = − ln(Xt) + αt+1 + infp∈Rd+1

Eξt [− ln((1 +Rt+1 · p))] |d∑j=0

pj = 1,

where αT = 0 and

αt = αt+1 + infp∈Rd+1

Eξt [− ln((1 +Rt+1 · p)) |d∑j=0

pj = 1.

Likewise, for V (u) = −(−u)q for q ∈ (0, 1), show that

JT (XT ) = −(XT )q,

Jt(Xt) = −Xqαt+1 infp∈Rd+1

Eξt [−((1 +Rt+1 · p))q] |d∑j=0

pj = 1,

where αT = 1 and

αt = αt+1 infp∈Rd+1

Eξt [−((1 +Rt+1 · p))q] |d∑j=0

pj = 1.

Note that, for both the logarithmic and the power loss function, the proportionsof wealth invested pt do not even depend on the current wealth Xt.

56

Page 57: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

8 Conjugate duality

8.1 Convex conjugates

From now on, we assume that U and Y are vector spaces that are in separatingduality under the bilinear form

〈u, y〉.That the bilinear form is separating means that for every u 6= u′, there is y ∈ Ywith 〈u − u′, y〉 6= 0. On U the weak topology σ(U, Y ) is the weakest locallyconvex topology under which each

u 7→ 〈u, y〉

is continuous. That is, σ(U, Y ) is generated by sets of the form

u ∈ U | |〈u, y〉| < α

where α > 0 and y ∈ Y . Under σ(U, Y ), U is a locally convex topological space.The Mackey topology τ(U, Y ) is the strongest locally convex topology underwhich each continuous linear functional can be identified with an element of Y .The Mackey topology is generated by sets of the form

u ∈ U | supy∈K〈u, y〉 < 1 (8.1)

where K is convex symmetric and σ(Y,U)-compact.

Turning the idea around, when U is a locally convex topological vector space,a natural choice for Y is the dual space of continuous linear functionals on U .By Theorem 3.6, the bilinear form is separating. Especially for Banach spaces,σ(U, Y ) is called the weak topology and σ(Y,U) the weak∗-topology, and theMackey topology τ(U, Y ) coincides with the norm topology. We call both thesetopologies simply weak topologies, when the spaces in question are clear. WhenU = Rd, we always choose Y = Rd and the bilinear form as just the usual innerproduct.

Example 8.1. Recall that, for p ∈ [1,∞), the Lebesque space Lp := Lp(Ω,F , P )is a Banach space under the norm

‖u‖ := (E|up|)1/p.

For p > 1, its continuous dual can be identified with Lp′

for p′ satisfying 1/p+1/p′ = 1. For p = 1, we set p′ =∞, and the continuous dual of L1 is the spaceL∞ = L∞(Ω,F , P ) of essentially bounded random variables. For all p = [1,∞),the bilinear form between Lp and Lp

′is given by

〈q, c〉 := E[qc].

Note, however, than when L∞ is equipped with the essential supremum norm,its continuous dual is not L1 (but the space of finitely additive measures on(Ω,F)). In particular, τ(L∞, L1) is a weaker topology than the one generatedby the essential supremum norm.

57

Page 58: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Given an extended real-valued function g on U , its conjugate g∗ : Y → R is

g∗(y) := supu∈U〈u, y〉 − g(u).

The function g∗ is also known as Legendre-Fenchel transform, polar function, orconvex conjugate of g. Since g∗ is a supremum of lower semicontinuous functions,g∗ is a lower semicontinuous function on Y . The Fenchel inequality

g(u) + g∗(y) ≥ 〈u, y〉

follows directly from the definition of the convex conjugate. In the exercises, wewill familiarize ourselves with this transformation by calculating conjugates ofconvex functions defined on Rd.

The biconjugate of g is the function

g∗∗(c) = supy〈u, y〉 − g∗(y).

By the Fenchel inequality, we always have

g ≥ g∗∗.

The following biconjugate theorem is the fundamental theorem on convex con-jugates. The lower semicontinuous hull lsc g is a function defined via

epi(lsc g) := cl epi g.

Theorem 8.2 (Biconjugate theorem). Assume that g is a convex extended real-valued function on U such that (lsc g)(u) > −∞ for all u. Then lsc g = g∗∗,i.e.,

(lsc g)(u) = supy∈Y 〈u, y〉 − g∗(y).

In particular, if g is a lsc proper convex function, then g = g∗∗, i.e., g has thedual representation

g)(u) = supy∈Y〈u, y〉 − g∗(y).

Proof. Recall that the closure of convex set in a locally convex space is theintersection of all closed half spaces containing the set. Applying this to theepigraph of lsc g, we get that lsc g is the supremum of all affine continuousfunctions less than g, i.e.,

(lsc g)(u) = supy∈Y,α∈R

〈u, y〉 − α | 〈u, y〉 − α ≤ g(u).

Let y ∈ Y be fixed. Then u 7→ 〈u, y〉 − α is smaller than g if 〈u, y〉 − α ≤ g(u)for all u, i.e.

α ≥ supu〈u, y〉 − g(u) = g∗(y).

58

Page 59: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

We got that, if g∗(y) <∞, then the largest affine function less than g with theslope y is u→ 〈u, y〉−g∗(y), whereas if g∗(y) =∞, then there are no such affinefunctions. Since this is valid for any y, we get that u 7→ sup〈u, y〉 − g∗(y) isthe supremum of all affine functions less than g. Combining this with the firstparagraph of the proof, we get the result.

Exercise 8.1.1. A proper convex function is σ(U, Y )-lsc if and only if it isτ(U, Y )-lsc.

Given a set C ⊂ U , the function

σC(y) = supu∈C〈u, y〉

is known as the support function of C,

jC(u) := infλ>0λ | u ∈ λC

as the gauge of C, and the set

C := y | σC(y) ≤ 1

as the polar of C. Note that σC is the conjugate of the indicator function

δC(u) =

0 if u ∈ C,+∞ otherwise.

In the following theorem, clC = C is known as the bipolar theorem.

Theorem 8.3. If a convex set C contains the origin, then σC = jC , j∗C = δC

andclC = C.

Proof. Exercise.

Theorem 8.4. The set C ⊂ U is a Mackey neighborhood of the origin if andonly if y ∈ Y | σC(y) ≤ α is weakly compact for some α > 0. In this case,y ∈ Y | σC(y) ≤ α is weakly compact for all α. In particular, C is a Mackeyneighborhood of the origin if and only if C is weakly compact.

Proof. Let C be a Mackey neighborhood of the origin. By (8.1), K ⊂ C forsome convex weakly compact K for which

y ∈ Y | σC(y) ≤ α ⊂ y ∈ Y | σK(y) ≤ α = αK = αK.

Thus the closed set on the left side belongs to a weakly compact set and is thuscompact for all α > 0.

To prove the converse, fix α > 0 with αC = y ∈ Y | σC(y) ≤ α weaklycompact. The convex symmetric set K := co(αC∪(−αC)) is weakly compactas well (an exercise). By (8.1), K is a Mackey neighborhood of the origin. SinceC ⊂ K/α, we have αK ⊂ C = clC, so C is a neighborhood of the origin.

59

Page 60: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 8.5. Let g be a proper convex lower semicontinuous function on U .The following are equivalent:

1. g is bounded from above on a τ(U, Y )- neighborhood of u,

2. for every α ∈ R, y ∈ Y | g∗(y)− 〈u, y〉 ≤ α is σ(Y, U)-compact.

Here 1 implies 2 even when g is not lsc.

Proof. By translations, we may assume that u = 0 and g(0) = 0. To prove that1. implies 2., note that we have (see the exercises)

g(u) ≤ γ + δO(u) ∀ u ⇐⇒ g∗(y) ≥ σO(y)− γ ∀ y,

so, for any α ∈ R,

y ∈ Y | g∗(y) ≤ α ⊂ y ∈ Y | σO(y) ≤ α+ γ,

where the set on the right side is weakly compact when O is a Mackey neigh-borhood of the origin.

That 2. implies 1., we may again do translations so that g∗(0) = infy g∗(y) = 0;

the details are left as an exercise. Let γ > 0 and denote

K := y ∈ Y | g∗(y) ≤ γ.

If y /∈ K, we have

jK(y) := infλ>0λ | y ∈ λK

= infλ>1λ | y ∈ λK

= infλ>1λ | g∗(y/λ) ≤ γ

≤ infλ>1λ | g∗(y)/λ ≤ γ

= g∗(y)/γ.

If y ∈ K, we have jK(y) ≤ 1, so putting these together we get that g∗(y) ≥γjK(y)− γ. Conjugating, we get

g(u) ≤ δK(u/γ) + γ.

Therefore, g is bounded above in a neighborhood of the origin, since K is thepolar of a weakly compact set.

8.1.1 Exercises

Exercise 8.1.2. Let f be a convex lower semicontinous function on the realline. Convince yourself that, given a ”slope” v, f∗(v) is the smallest constant αsuch that the affine function x→ vx−α is majorized by f. What does this meangeometrically?

60

Page 61: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Exercise 8.1.3. Calculate the conjugates of the following functions on the realline:

1. f(x) = |x|

2. f(x) = δB(x), where B = x | |x| ≤ 1.

3. f(x) = 1p |x|

p, for p > 1.

4. V (x) = (eax − 1)/a.

Exercise 8.1.4. Let V be a nondecreasing convex function on the real line.Analyze V ∗ using the geometric idea from the first exercise.

1. Is V ∗ positive?

2. Is V ∗ zero somewhere?

3. Is V ∗ monotone?

4. Where is V ∗ finite?

5. Is V ∗ necessarily finite at the origin?

Hint: The answers depends on your choice of V .

8.1.2 Subgradients

A vector y ∈ Y is a subgradient of g at u ∈ dom g if

g(u′) ≥ g(u) + 〈u′ − u, y〉 ∀u′ ∈ U.

The subdifferential ∂g(u) is the set of all subgradients of g at u. Note that weavoided defining subgradients outside the domain.

Exercise 8.1.5. We have y ∈ ∂g(u) if and only if

g(u) + g∗(y) = 〈u, y〉.

We say that g is subdifferentiable at u if ∂g(u) 6= ∅.

Exercise 8.1.6. Assume that g is a differentiable convex function on the realline. Show that ∂g(u) = g′(u), the derivative of g at u. Give an exampleof a proper lsc convex extended real-valued function on the real line that is notsubdifferentiable at a point in its domain.

Theorem 8.6. Assume that g is proper and bounded from above in a neigh-borhood of u. Then ∂g(u) is nonempty and weakly compact, and the directionalderivative g′(u, ·) is the support function of ∂g(u).

61

Page 62: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Proof. Exercise. Hint: show first that g′(u, ·) is bounded above in a neighbor-hood of the origin.

Theorem 8.7. For a proper convex g, we have (g∗)∞ = σdom g. If g is also lsc,then g∞ = σdom g∗ .

Proof. Exercise.

8.2 Conjugate duality in optimization

Now we turn our attention to convex optimization problems. Given a vectorspace X and a locally convec topological vector space U , assume that F is ajointly convex extended real-valued function on X × U such that F (x, ·) is lscfor all x. The value function

ϕ(u) := infx∈ X

F (x, u)

gives the optimal value of the convex minimization problem

minimize F (x, u) overx ∈ X (Pu)

as a function of u. The value function is always convex, so, when it is properand lower semicontinuous, the biconjugate theorem gives

infx∈X

F (x, u) = ϕ(u) = sup〈u, y〉 − ϕ∗(y).

The optimization problem (Pu) is called the primal problem and

maximize 〈u, y〉 − ϕ∗(y) over y ∈ Y (Du)

is called the dual problem. When ϕ is proper and lower semicontinuous, theiroptimal values coincide, and one says that there is no duality gap, or there isstrong duality between (Pu) and (Du). Even when ϕ is not lsc, the inequalityϕ ≥ ϕ∗∗ means that, at least, the dual problem gives lower bounds to theoptimal value of the primal problem. This is sometimes called as weak dualitybetween (Pu) and (Du).

A simple condition for the absence of duality gap and the existence of dualsolutions is given by continuity of ϕ.

Theorem 8.8. Assume that ϕ is proper and bounded from above in a neighbor-hood of u. Then the optimal values of (Pu) and (Du) coincide, and (Du) hasa solution.

Proof. Exercise.

When also X is a LCTVS and F is jointly lsc, we have the following symmetrybetween the primal and the dual formulations.

62

Page 63: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Theorem 8.9. Assume that F : X × U → R is proper lsc convex functionon LCTVSs X and U and let V be the dual space of X. The conjugate ofx 7→ F (x, u) is (lsc γu), where

γu(v) := infyF ∗(v, y)− 〈u, y〉.

The function ϕ is lsc at u if and only if γu is lsc at the origin.

Proof. Exercise.

Many fundamental theorems in duality theory correspond to minimax theoremsof convex-concave functions. The function L : X × Y → R defined by

L(x, y) := infu∈UF (x, u)− 〈u, y〉

is called the Lagrangian associated with F . It is a convex-concave function inthe sense that L(·, y) is convex for all y ∈ Y and L(x, ·) is concave for all x ∈ X.We have

infx∈X

L(x, y) = −ϕ∗(y)

andsupy∈YL(x, y) + 〈u, y〉 = F (x, u).

Denoting Lu(x, y) := 〈u, y〉+ L(x, y) we have the elementary inequalities

F (x, u) ≥ Lu(x, y) ≥ 〈u, y〉 − ϕ∗(y)

and thus

infx

supyLu(x, y) = F (x, u) ≥ sup

y〈u, y〉 − ϕ∗(y) = sup

yinfxLu(x, y).

If we have an equality above, the common value is called a saddle-value of Lu.A pair (x, y) ∈ X × Y is called a saddle-point of Lu if

Lu(x, y′) ≤ Lu(x, y) ≤ Lu(x′, y) ∀(x′, y′) ∈ X × Y.

This means that the pair (x, y) satisfies the Karush-Kuhn-Tucker (KKT) con-dition

0 ∈ ∂xL(x, y) u ∈ ∂y(−L)(x, y).

Theorem 8.10. Assume that F : X × U → R is jointly convex with F (x, ·) lscfor every x ∈ X. The following are equivalent:

1. inf(Pu) = sup(Du)

2. ϕ is lsc at u.

3. The saddle-value of Lu exists.

63

Page 64: Convex Stochastic optimization WiSe 2019/2020 · Convex sets are stable under many algebraic operations. Let X be another linear vector space. Theorem 3.1. Let Jbe an arbitrary index

Furthermore, the following are equivalent:

4 x solves (Pu), y solves (Du) and inf(Pu) = sup(Du).

5 The pair (x, y) satisfies the KKT-condition.

Proof. Exercise.

Exercise 8.2.1. Let X = U = R. Find a lsc convex function F : X × U suchthat the primal and the dual problem have solutions but there is a duality gap.

8.3 Dual spaces of random variables

We assume that the space U is decomposable in the sense that

1Au+ 1Ω\Au′ ∈ U

whenever A ∈ F , u ∈ U and u′ ∈ L∞(Ω,F , P ;Rm). Note that decomposabilityof the spaces U and Y implies that the separation property holds automaticallyfor the bilinear form

〈u, y〉 := E[u · y]

Moreover, we have the following relations for relative topologies.

Lemma 8.11. If U and Y are decomposable, then L∞ ⊆ U ⊆ L1 and

σ(L1, L∞)|U ⊆ σ(U ,Y), σ(U ,Y)|L∞ ⊆ σ(L∞, L1),

τ(L1, L∞)|U ⊆ τ(U ,Y), τ(U ,Y)|L∞ ⊆ τ(L∞, L1).

Proof. Exercise.

The following is a corollary of Theorem 6.2.

Corollary 8.12 (Jensen's inequality). Let h be a G-measurable convex normal integrand such that Eh∗ is proper on E^G Y and that E^G U ⊂ U and E^G Y ⊂ Y. Then

Eh(E^G u) ≤ Eh(u)

for every u ∈ U .

Proof. Applying Theorem 6.2 twice, we get

Eh(E^G u) = sup_{y∈Y(G)} E[y · E^G u − h∗(y)]
          = sup_{y∈Y(G)} E[y · u − h∗(y)]
          ≤ sup_{y∈Y} E[y · u − h∗(y)]
          = Eh(u).
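On a finite probability space, Corollary 8.12 is easy to test numerically. The sketch below (my own illustration; the data and the integrand h(u, ω) = a(ω)u² are hypothetical) takes four equally likely scenarios, lets G be generated by a two-block partition, and checks Eh(E^G u) ≤ Eh(u).

import numpy as np

# Four equally likely scenarios; G is generated by the partition {0,1} | {2,3}.
P = np.array([0.25, 0.25, 0.25, 0.25])
blocks = [np.array([0, 1]), np.array([2, 3])]

a = np.array([1.0, 1.0, 3.0, 3.0])        # G-measurable (constant on blocks)
u = np.array([2.0, -1.0, 0.5, 4.0])       # an arbitrary random variable

def cond_exp(z):
    # E[z | G]: probability-weighted average over each block
    out = np.empty_like(z)
    for b in blocks:
        out[b] = np.sum(P[b] * z[b]) / np.sum(P[b])
    return out

h = lambda v: a * v ** 2                   # convex in v for each scenario
lhs = np.sum(P * h(cond_exp(u)))           # Eh(E^G u)
rhs = np.sum(P * h(u))                     # Eh(u)
print(lhs, rhs)                            # lhs <= rhs by Jensen's inequality
assert lhs <= rhs + 1e-12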


Theorem 8.13. Let U and Y be decomposable and let h be a convex normal integrand. Then Eh : U → R and Eh∗ : Y → R are conjugates of each other as soon as they are proper. Moreover, y ∈ ∂Eh(u) if and only if y ∈ ∂h(u) almost surely.

Proof. Exercise.
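The identity (Eh)∗ = Eh∗ behind Theorem 8.13 rests on interchanging supremum and expectation over a decomposable space; on a finite Ω with U = Y the space of all random variables this can be verified directly. The sketch below (my own, with the hypothetical integrand h(u, ω) = ½a(ω)u²) compares the scenariowise formula for Eh∗(y) with a numerical maximization of E[uy] − Eh(u) over u.

import numpy as np
from scipy.optimize import minimize

# Finite Omega with 3 scenarios; h(u, omega) = 0.5 * a(omega) * u^2 (hypothetical).
P = np.array([0.2, 0.5, 0.3])
a = np.array([1.0, 2.0, 4.0])
y = np.array([0.7, -1.3, 2.0])             # a fixed dual variable

# Scenariowise conjugate: h*(y, omega) = y^2 / (2 a(omega)), so Eh*(y) = E[y^2/(2a)].
Eh_conj = np.sum(P * y ** 2 / (2 * a))

# Direct computation of (Eh)*(y) = sup_u { E[u*y] - Eh(u) } over random variables u.
obj = lambda u: -(np.sum(P * u * y) - np.sum(P * 0.5 * a * u ** 2))
res = minimize(obj, x0=np.zeros(3))
print(Eh_conj, -res.fun)                   # the two values agree (interchange rule)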

8.4 Conjugate duality in stochastic optimization

We will study (SP) within the conjugate duality framework by introducing an additional parameter z ∈ L∞ := L∞(Ω,F, P;Rn) and the extended real-valued convex function F on L∞ × L∞ × U defined by

F (x, z, u) = Ef(x, u) + δN (x− z).

The space L∞ is in separating duality with L1 := L1(Ω,F, P;Rn) under the bilinear form

〈z, p〉 := E[z · p].

We denote the associated optimum value function by

ϕ(z, u) := inf_{x∈L∞} {Ef(x, u) | x − z ∈ N}.

The function

l(x, y, ω) := inf_{u∈Rm} {f(x, u, ω) − u · y}

is the "scenariowise Lagrangian" associated with f.
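Being defined scenariowise, l is straightforward to evaluate numerically. As a small sanity check (my own, with hypothetical data), take the integrand f(x, u, ω) = V(u − x∆s(ω)) with V(z) = ½z² for a single asset over a single period; then l(x, y, ω) = −y x∆s(ω) − V∗(y) with V∗(y) = ½y², and the inner infimum can be compared with this closed form scenario by scenario.

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical integrand f(x, u, omega) = V(u - x*ds(omega)) with V(z) = 0.5*z^2.
# Then l(x, y, omega) = inf_u {f(x, u, omega) - u*y} = -y*x*ds(omega) - 0.5*y**2.
ds = np.array([1.5, -0.5, 0.2])            # price increments in three scenarios
x, y = 2.0, 0.8

for d in ds:
    inner = minimize_scalar(lambda u: 0.5 * (u - x * d) ** 2 - u * y)
    closed_form = -y * x * d - 0.5 * y ** 2
    print(inner.fun, closed_form)          # agree scenario by scenario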

Theorem 8.14. If Ef is proper on L∞ × U, then

L(x, p, y) =
  +∞                  if x ∉ dom1 F,
  E[l(x, y) − x · p]   if x ∈ dom1 F and p ∈ N⊥,
  −∞                  otherwise,

F∗(v, p, y) = Ef∗(v + p, y) + δN⊥(p),

ϕ∗(p, y) = Ef∗(p, y) + δN⊥(p).

The KKT conditions mean that x ∈ N∞, p ∈ N⊥ and

p ∈ ∂xl(x, y) and u ∈ ∂y(−l)(x, y)

P -almost surely.

Proof. By definition,

L(x, p, y) = inf_{(z,u)∈L∞×U} {F(x, z, u) − 〈z, p〉 − 〈u, y〉}
           = inf_{(z,u)∈L∞×U} {E[f(x, u) − z · p − u · y] | x − z ∈ N}
           = inf_{(z′,u)∈L∞×U} {E[f(x, u) − (x − z′) · p − u · y] | z′ ∈ N},


so the expression for L follows from Theorem ??. When Ef is proper on L∞ × U, the interchange rule gives

F∗(v, p, y) = sup_{x∈L∞} {〈x, v〉 − L(x, p, y)}
            = sup_{x∈L∞, z′∈L∞, u∈U} {〈x, v〉 − E[f(x, u) − (x − z′) · p − u · y] | z′ ∈ N}
            = sup_{x∈L∞, z′∈L∞, u∈U} {E[x · (v + p) + u · y − f(x, u) − z′ · p] | z′ ∈ N}
            = Ef∗(v + p, y) + δN⊥(p).

The KKT condition

(0, p, y) ∈ ∂F(x, 0, u)

means that F(x, 0, u) + F∗(0, p, y) = 〈u, y〉. Using the expressions for F and F∗, this can be written as x ∈ N∞, p ∈ N⊥ and

Ef(x, u) + Ef∗(p, y) = E[x · p] + E[u · y].

Since f(x, u, ω) + f∗(p, y, ω) ≥ x · p + u · y for all x, p ∈ Rn and u, y ∈ Rm, this is equivalent to the given subdifferential condition.

The dual of (SP ) can be written as

minimize E[f∗(p, y)− u · y] over (p, y) ∈ N⊥ × Y. (Du)

Remark 8.15. In the deterministic setting, N⊥ = {0}, so the dual problem becomes

minimize f∗(0, y)− u · y over y ∈ Rm (Du)

and the optimality conditions in Theorem 8.14 become

0 ∈ ∂xl(x, y),

u ∈ ∂y(−l)(x, y)

which are the classical KKT conditions.

Given a convex function H on L∞, we say that p ∈ L1 is a shadow price of information (spi) for H if p is a subgradient at the origin of the function

φ(z) := inf_{x∈L∞} {H(x) + δN(x − z)}

on L∞.

on L∞. In particular, if (p, y) ∈ ∂ϕ(0, u), then p is an spi of x 7→ Ef(x, u).

Lemma 8.16. A p ∈ L1 is an spi for H : L∞ → R if and only if p ∈ N⊥ and

inf_{x∈N∞} H(x) = inf_{x∈L∞} {H(x) − 〈x, p〉}.


Proof. The condition p ∈ ∂φ(0) can be written as

H(x′ + z) ≥ inf_{x∈N∞} H(x) + 〈p, z〉   ∀z ∈ L∞, x′ ∈ N∞

or, equivalently,

H(z′) − 〈p, z′ − x′〉 ≥ inf_{x∈N∞} H(x)   ∀z′ ∈ L∞, x′ ∈ N∞,

which proves the claim.
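Lemma 8.16 can be tested on a one-period toy example (my own, with hypothetical data): with F0 trivial, N∞ consists of the constants and N⊥ = {p ∈ L1 | E[p] = 0}. For H(x) = E[½(x − α)²], the candidate p = Eα − α has mean zero, and the sketch below checks numerically that it makes the two infima of the lemma coincide, so this p is an spi for H.

import numpy as np
from scipy.optimize import minimize

# Toy spi check: one period, F0 trivial, so N_infty = constants and N^perp = {p : E[p] = 0}.
# H(x) = E[0.5*(x - alpha)^2].
P = np.array([0.3, 0.4, 0.3])
alpha = np.array([1.0, 4.0, -2.0])

# inf over constants: attained at x = E[alpha], value 0.5*Var(alpha).
m = np.sum(P * alpha)
inf_adapted = 0.5 * np.sum(P * (m - alpha) ** 2)

# Candidate shadow price of information: p = E[alpha] - alpha (mean zero, so p in N^perp).
p = m - alpha

# inf over all random variables x of E[0.5*(x - alpha)^2 - x*p]
obj = lambda x: np.sum(P * (0.5 * (x - alpha) ** 2 - x * p))
res = minimize(obj, x0=np.zeros(3))
print(inf_adapted, res.fun)                # the two infima coincide, so p is an spi for H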

Let

L0(x, y) := inf_{u∈U} {Ef(x, u) − 〈u, y〉} =
  +∞          if x ∉ dom1 Ef,
  E[l(x, y)]  otherwise.

Theorem 8.17. The following are equivalent

1. (p, y) ∈ ∂ϕ(0, u),

2. y ∈ ∂uϕ(0, u) and p is an spi of L0(·, y),

3. p ∈ ∂zϕ(0, u) and y ∈ Y satisfies y ∈ ∂kp(u) almost surely, where

kp(u, ω) = inf_{x∈Rn} {f(x, u, ω) − x · p(ω)}.

Proof. Exercise. Hint: Check the KKT conditions of the general setting for the function F(z, u) = ϕ(z, u).

8.5 Relaxation

In order to obtain existence of solutions and the absence of a duality gap, we will extend the space of strategies to the linear space N of all (not necessarily bounded) adapted strategies. The objective in the larger space is defined as the lower semicontinuous hull F̄ of F on L0 × L∞ × U, where L0 is endowed with the topology of convergence in measure. We denote the associated value function by

ϕ̄(z, u) = inf_{x∈L0} F̄(x, z, u).

Taking the lsc hull does not affect the infimum of a function, so

ϕ̄∗(p, y) = sup_{z,u} {〈z, p〉 + 〈u, y〉 − ϕ̄(z, u)}
          = sup_{x,z,u} {〈z, p〉 + 〈u, y〉 − F̄(x, z, u)}
          = sup_{x,z,u} {〈z, p〉 + 〈u, y〉 − F(x, z, u)}
          = ϕ∗(p, y).


The following theorem gives a simple expression for F̄. More importantly, the theorem gives that

ϕ̄(0, u) = inf_{x∈N} Ef(x, u),

which is precisely (SP).

Theorem 8.18. Assume that F is proper. We have

F̄(x, z, u) = Ef(x, u) + δN(x − z)

as soon as the right side is lsc and proper on L0 × L∞ × U. In particular, the optimum value of (SP) equals ϕ̄(0, u).

Proof. By assumption,

Ef(x, u) + δN(x − z) ≤ F̄(x, z, u)

for all (x, z, u) ∈ L0 × L∞ × U. To prove the converse, let (x, z, u) ∈ L0 × L∞ × U be such that the left side is finite. Define Q by dQ/dP = φ, where φ := 1/(1 + |x|2). We have x ∈ X, where X := L2(Ω,F, Q;Rn) is paired with V = X. We have

F̄(x, z, u) ≤ F̃(x, z, u),

where F̃ denotes the closure of F with respect to the locally convex space X × L∞ × U (the topology of which is stronger than that of L0 × L∞ × U). By the biconjugate theorem, it thus suffices to show that

F∗∗(x, z, u) ≤ Ef(x, u) + δN(x − z),

where the conjugates are defined with respect to the pairing of X × L∞ × U with V × L1 × Y. Clearly, it suffices to consider the case x − z ∈ N. For any (v, p, y) ∈ V × L1 × Y,

F∗(v, p, y) = sup_{(x,z,u)∈L∞×L∞×U} {〈x, v〉 + 〈z, p〉 + 〈u, y〉 − Ef(x, u) | x − z ∈ N}
            = sup_{(x,z,u)∈L∞×L∞×U} {E[x · (φv) + z · p + u · y − f(x, u)] | x − z ∈ N}
            = sup_{(x,z′,u)∈L∞×L∞×U} {E[x · (φv) + (x − z′) · p + u · y − f(x, u)] | z′ ∈ N}
            = Ef∗(φv + p, y) + δN⊥(p),

where the last equality follows from the interchange rule. Hence,

F∗∗(x, z, u) = sup_{(v,p,y)∈V×N⊥×Y} {〈x, v〉 + 〈z, p〉 + 〈u, y〉 − Ef∗(φv + p, y)}
             = sup_{(v,p,y)∈V×N⊥×Y} E[x · (φv) + z · p + u · y − f∗(φv + p, y)]
             = sup_{(v,p,y)∈V×N⊥×Y} E[x · (φv + p) + (z − x) · p + u · y − f∗(φv + p, y)].


By Fenchel’s inequality,

f(x, u) + f∗(φv + p, y) ≥ x · (φv + p) + u · y,

so [x · p]+ ∈ L1 and [(z − x) · p]+ ∈ L1. Thus, by Lemma 6.5, E[(z − x) · p] = 0 and then,

F∗∗(x, z, u) = sup_{(v,p,y)∈V×N⊥×Y} E[x · (φv + p) + u · y − f∗(φv + p, y)]
             ≤ sup_{p∈N⊥} E[ sup_{v,y} {x · (φv + p) + u · y − f∗(φv + p, y)} ]
             = sup_{p∈N⊥} Ef(x, u) = Ef(x, u),

which completes the proof.

Theorem 8.19. Let f be a normal integrand such that there exist a v ∈ N⊥, a normal integrand g and two different values of λ ∈ R such that

f(x, u, ω) ≥ λx · v(ω) − g(u, ω)

and Eg is continuous on U. Then

(x, z, u) 7→ Ef(x, u) + δN(x − z)

is lsc on N × L∞ × U.

Proof. Let

k(x, u, ω) := f(x, u, ω) − λx · v(ω) + g(u, ω)

and

K(x, z, u) := Ek(x, u) + δN(x − z).

By Theorem 6.4, K is lsc on L0 × L∞ × U. If (x, z, u) ∈ dom F, we have E[x · v] = E[z · v] by Lemma 6.5, so

K(x, z, u) = F(x, z, u) − λ〈z, v〉 + Eg(u). (8.2)

On the other hand, by assumption, there exists λ′ ≠ λ such that

f(x, u, ω) ≥ λ′x · v(ω) + g(u, ω),

so

k(x, u, ω) ≥ (λ′ − λ)x · v(ω) + g(u, ω).

By Lemma 6.5 again, (8.2) holds on dom K as well, so F differs from the lsc function K by the continuous function (z, u) 7→ λ〈z, v〉 − Eg(u) and is therefore lsc.

Remark 8.20. The assumption of Theorem 8.19 means that there exists a p ∈ N⊥ such that

inf_{y∈Y} Ef∗(λp, y) < ∞

for two different values of λ ∈ R. This holds, in particular, if f is bounded from below. In the context of portfolio optimization, it holds when the utility function satisfies the so-called "reasonable asymptotic elasticity".


Example 8.21. This example shows that

F̄(x, z, u) < Ef(x, u) + δN(x − z)

is possible when the right side is not lsc. We continue Example ?? and let α ∈ L2(F0) and v ∈ N⊥ be such that E[αv] = ∞, and consider

f(x, u, ω) := ½|x0 − α(ω)|2 + x0 v(ω).

Here

F(x, z, u) = E[½|x0 − α|2 + x0 v] + δN(x − z) = E[½|x0 − α|2] + 〈z0, v〉 + δN(x − z),

so

F̄(x, z, u) = E[½|x0 − α|2] + 〈z0, v〉 + δN(x − z).

Now F̄(α, 0, 0) = 0 while Ef(α, 0) = ∞.

Example 8.22. There exists an f such that F(x, u) := Ef(x, u) is lsc on N × U and proper on N∞ × U, but

cl(F + δN∞×U) ≠ F.

That is, choose T = 1, n0 = n1 = 1, trivial F0, and

f(x, u, ω) :=
  0   if x1 = x0ξ(ω),
  +∞  otherwise,

where ξ ∉ L∞. Then cl(F + δN∞×U) = δ{0}×U, which does not coincide with F.

We define the Lagrangian associated with the relaxed problem by

L(x, p, y) := inf_{z,u} {F̄(x, z, u) − 〈z, p〉 − 〈u, y〉}.

Note that the interchange rule gives the expression

L(x, p, y) = E[x · p− f∗(p, y)].

Theorem 8.23. Assuming that

F̄(x, z, u) = Ef(x, u) + δN(x − z)

and that the primal and the dual are feasible, the following are equivalent:

(a) x solves (SP), (p, y) solves (Du) and there is no duality gap,

(b) x and (p, y) form a saddle-point of L,

(c) x is primal feasible, (p, y) is dual feasible and

(p, y) ∈ ∂f(x, u) P -a.s.


(d) x is primal feasible, (p, y) is dual feasible and

p ∈ ∂xl(x, y), u ∈ ∂y[−l](x, y) P -a.s.

Proof. Let x and (p, y) be feasible. By Fenchel's inequality,

f(x, u) + f∗(p, y) − u · y ≥ x · p   P-a.s.,

so

Ef(x, u) + E[f∗(p, y) − u · y] ≥ E[x · p],

and the latter holds as an equality if and only if the former does, which, in turn, is equivalent to the subdifferential condition. It now suffices to note that E[x · p] = 0 by Lemma 6.5.

Theorem 8.24. Assume that F is proper and that

F̄(x, z, u) = Ef(x, u) + δN(x − z)

is lsc and proper. The following are equivalent.

1. (p, y) ∈ ∂ϕ(0, u).

2. y ∈ ∂uϕ(0, u) and

inf_{x∈N} L0(x, y) = inf_{x∈L0} {L0(x, y) − E[x · p]}.

3. p ∈ ∂zϕ(0, u) and y ∈ Y satisfies y ∈ ∂φ(u) almost surely, where

φ(u, ω) = inf_{x∈Rn} {f(x, u, ω) − x · p(ω)}.

Proof. Omitted.

Example 8.25. Let f(x, u, ω) = ½(x0 − 1)2 + δ0(x0ξ(ω) − x1), F0 trivial, and ξ ∈ L0(F1) with ξ ∉ L∞. We have

f∗(v, y, ω) = δ0(y) + v0 + v1ξ(ω) + ½(v0 + v1ξ(ω))2.

Clearly Ef∗(p, y) ≥ 0 for all (p, y) ∈ N⊥ × Y, so the dual optimum is attained at the origin. Clearly ϕ̄ is lsc, so (0, 0) ∈ ∂ϕ̄(0, 0). Here L0(x, 0) = Ef(x, 0), so x ∈ N∞ lies in dom1 L0 if and only if x = 0. Thus

inf_{x∈N∞} L0(x, 0) = ½,

while, choosing x = (1, ξ), we see that

inf_{x∈L∞} L0(x, 0) = 0.


9 Examples

Exercise 9.0.1 (Mathematical programming). Consider the mathematical programming problem

minimize Ef0(x) over x ∈ N
subject to fj(x) ≤ uj P-a.s., j = 1, . . . , m,

where fj are convex normal integrands.

Embed the problem into our duality framework, show that the corresponding f is a convex normal integrand, and write down l and the optimality conditions for u = 0. You may assume that f satisfies all the conditions of the main theorems.

Exercise 9.0.2 (Linear stochastic programming). Consider Exercise 9.0.1 in the case where f0(x, ω) = c(ω) · x and fj(x, ω) = aj(ω) · x − bj(ω) for j = 1, . . . , m. The problem can then be written in matrix notation as

minimize E[x · c] over x ∈ N
subject to Ax ≤ b P-a.s.

Assume that c is adapted and the rows of A∗ define adapted processes. Write down l, the dual problem and the optimality conditions. For every y, minimize over p in the dual problem.
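The exercise itself is analytic, but on a finite scenario set the primal problem is an ordinary linear program, which is convenient for checking the optimality conditions one derives. Below is a minimal two-stage sketch of this kind (entirely hypothetical data: one first-stage variable, one recourse variable per scenario, and a covering constraint) solved with scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

# Two-stage sketch (hypothetical data): choose x0 >= 0 today at unit cost c0,
# then recourse x1_s >= 0 in scenario s at unit cost c1, to cover demand d_s:
#   minimize c0*x0 + sum_s P_s*c1*x1_s  s.t.  x0 + x1_s >= d_s for every scenario.
P = np.array([0.3, 0.5, 0.2])
d = np.array([2.0, 5.0, 8.0])
c0, c1 = 1.0, 1.8

c = np.concatenate([[c0], c1 * P])                  # variables: x0, x1_1, x1_2, x1_3
A_ub = np.hstack([-np.ones((3, 1)), -np.eye(3)])    # -x0 - x1_s <= -d_s
b_ub = -d

res = linprog(c, A_ub=A_ub, b_ub=b_ub)              # bounds default to x >= 0
print(res.x, res.fun)
# The dual variables of the scenario constraints correspond (up to sign conventions)
# to the multipliers y of the exercise; recent SciPy versions expose them via
# res.ineqlin.marginals when the HiGHS solvers are used.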

Exercise 9.0.3 (Problems of Bolza). Consider the problem of Bolza

minimize_{x∈N} E[ Σ_{t=0}^{T} Kt(xt−1, ∆xt + ut) + k(xT) ], (9.1)

where Kt and k are convex normal integrands.

Embed the problem into the duality framework and write down f, l and f∗. Write down l in terms of the associated Hamiltonian

Ht(xt−1, yt, ω) := inf_{ut∈Rd} {Kt(xt−1, ut, ω) − ut · yt}.

When Kt(xt−1, u) = K1t(xt−1) + K2t(u) for Ft−1-measurable K1t and Ft-measurable K2t, use Jensen's inequality to show that, for u = 0, the dual problem reduces to

inf_{p,y} ϕ∗(p, y) = inf_{adapted y} E[ Σ_{t=0}^{T} K∗t(Et−1∆yt, yt) + k∗(−yT) ].

You may assume that the Kt and k are bounded from below.

Example 9.1 (Financial mathematics, continued). The problem in Example 9.1 fits the duality framework with

f(x, u) = V(u − Σ_{t=0}^{T−1} xt · ∆st+1) + Σ_{t=0}^{T−1} δDt(ω)(xt),


so

l(x, y) = −y Σ_{t=0}^{T−1} xt · ∆st+1 − V∗(y) + Σ_{t=0}^{T−1} δDt(ω)(xt),

l = l and

f∗(v, y) = V∗(y) + Σ_{t=0}^{T−1} σDt(ω)(vt + y∆st+1).

Thus the dual problem becomes

minimize E[ V∗(y) + Σ_{t=0}^{T−1} σDt(ω)(pt + y∆st+1) − uy ] over p ∈ N⊥, y ∈ Y,

while the optimality conditions are p ∈ N⊥ and

−Σ_{t=0}^{T−1} xt∆st+1 ∈ ∂V∗(y),
pt + y∆st+1 ∈ NDt(xt),  t = 0, . . . , T.

Since Dt is adapted, Corollary 8.12 gives that, given y, the optimal p satisfies pt = Et[y∆st+1] − y∆st+1, so the dual reduces to

minimize E[ V∗(y) + Σ_{t=0}^{T−1} σDt(Et[y∆st+1]) − uy ] over y ∈ Y.

If the dual problem has a solution, then x is optimal if and only if it is feasible and there is a y such that

−Σ_{t=0}^{T−1} xt∆st+1 ∈ ∂V∗(y),
Et[y∆st+1] ∈ NDt(xt),  t = 0, . . . , T.

Example 9.2 (Stochastic control, continued). The problem

minimize E[ Σ_{t=0}^{T} Lt(Xt, Ut) ] over X, U ∈ N,
subject to ∆Xt = AtXt−1 + BtUt−1 + ut,  t = 1, . . . , T

fits the general framework (equality constraints) with x = (X, U), u = (u1, . . . , uT),

f(x, u) = Σ_{t=0}^{T} Lt(xt) + Σ_{t=1}^{T} δ0(π∆xt − Ātxt−1 − ut),


π = [I 0] and Āt = [At Bt]. Denoting y0 = yT+1 = 0 and ĀT+1 = 0, we get

l(x, y) = Σ_{t=0}^{T} Lt(xt) − Σ_{t=1}^{T} (π∆xt − Ātxt−1) · yt
        = Σ_{t=0}^{T} ( Lt(xt) + xt · (π∗∆yt+1 + Ā∗t+1yt+1) ),

f∗(v, y) = sup_x Σ_{t=0}^{T} ( xt · (vt − π∗∆yt+1 − Ā∗t+1yt+1) − Lt(xt) )
         = Σ_{t=0}^{T} L∗t(vt − π∗∆yt+1 − Ā∗t+1yt+1).

Optimality conditions become p ∈ N⊥, and, for all t,

pt − (∆yt+1 +A∗t+1yt+1, B∗t+1yt+1) ∈ ∂Lt(Xt, Ut),

∆Xt = AtXt−1 +BtUt−1 + ut.

When each Lt and ut is adapted, Corollary 8.12 gives

inf_{p∈N⊥} Ef∗(p, y) = E[ Σ_{t=0}^{T} L∗t(−Et[π∗∆yt+1 + Ā∗t+1yt+1]) ].

Moreover, if A and B are adapted and the dual optimum is attained, then an x is optimal if and only if it is feasible and there is an adapted y (adapted by Jensen) such that

−Et[(∆yt+1 +A∗t+1yt+1, B∗t+1yt+1)] ∈ ∂Lt(Xt, Ut),

∆Xt = AtXt−1 +BtUt−1 + ut.

Note that the first condition can be written as (an exercise)

−(Et∆yt+1, 0) ∈ ∂(Xt,Ut)Ht(Xt, Ut, yt+1)

where Ht is the “Hamiltonian” defined by

Ht(Xt, Ut, yt+1) := Lt(Xt, Ut) + Et[yt+1 · (At+1Xt +Bt+1Ut)].

If Lt are of the form Lt(Xt, Ut) = L0t(Xt, Ut) + LXt(Xt) + LUt(Ut) with a differentiable L0t, all Ft-measurable, then the optimality conditions can be written as

Ut ∈ argminHt(Xt, ·, yt+1),

−Et∆yt+1 ∈ ∂XHt(Xt, Ut, yt+1),

∆Xt = AtXt−1 +BtUt−1 + ut.


This can be viewed as a stochastic version of the classical “maximum principle”in discrete time.
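As a sanity check of these conditions, the following sketch (my own, with hypothetical scalar data) considers T = 1 with F0 trivial, X1 = (1 + a)X0 + bU0 + w, the initial state X0 = x̄ enforced through L0, and costs L0(X0, U0) = ½U0² and L1(X1) = ½X1². The costate equations then give y1 = X1 and U0 = −bE[y1], whose solution is compared with a direct minimization over the deterministic control U0.

import numpy as np
from scipy.optimize import minimize_scalar

# Toy check of the maximum principle (hypothetical scalar data):
# T = 1, F0 trivial, X0 = xbar fixed, X1 = (1 + a)*X0 + b*U0 + w,
# costs L0(X0, U0) = 0.5*U0**2 (plus the constraint X0 = xbar) and L1(X1) = 0.5*X1**2.
a, b, xbar = 0.3, 1.2, 2.0
w = np.array([-1.0, 0.5, 2.0])             # noise scenarios
P = np.array([0.3, 0.4, 0.3])
Ew = np.sum(P * w)

# Costate: y2 = 0 and y1 = X1; the U-component of the optimality condition gives
#   U0 = -b * E[y1] = -b * ((1 + a)*xbar + b*U0 + Ew),
# i.e. U0 = -b*((1 + a)*xbar + Ew) / (1 + b**2).
U0_mp = -b * ((1 + a) * xbar + Ew) / (1 + b ** 2)

# Direct minimization over the (deterministic) control U0.
cost = lambda U0: 0.5 * U0 ** 2 + 0.5 * np.sum(P * ((1 + a) * xbar + b * U0 + w) ** 2)
U0_direct = minimize_scalar(cost).x

print(U0_mp, U0_direct)                    # the two controls agree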

Recall the "cost-to-go function"

Wt(Xt) = inf_{Ut} Et[Lt(Xt, Ut) + Wt+1(Xt + At+1Xt + Bt+1Ut + ut+1)]

from Corollary 7.35. We have

∂Wt(Xt) = {Vt | ∃Ut : (Vt, 0) ∈ ∂Et[Lt(Xt, Ut) + Wt+1(Xt + At+1Xt + Bt+1Ut + ut+1)]}.

Thus, if (X, U, y) satisfy the optimality conditions and Et+1[yt+1] ∈ ∂Wt+1(Xt+1), then

Lt(X′t, U′t) + Wt+1(X′t + At+1X′t + Bt+1U′t + ut+1)
  ≥ Lt(Xt, Ut) + Wt+1(Xt + At+1Xt + Bt+1Ut + ut+1)
    + Et+1[yt+1] · (X′t + At+1X′t + Bt+1U′t − Xt − At+1Xt − Bt+1Ut)
    − Et[(∆yt+1 + A∗t+1yt+1, B∗t+1yt+1)] · (X′t − Xt, U′t − Ut).

Taking conditional expectations gives

Et[Lt(X′t, U′t) + Wt+1(X′t + At+1X′t + Bt+1U′t + ut+1)]
  ≥ Et[Lt(Xt, Ut) + Wt+1(Xt + At+1Xt + Bt+1Ut + ut+1)] + Et[yt] · (X′t − Xt),

i.e.,

(Et[yt], 0) ∈ ∂Et[Lt(Xt, Ut) + Wt+1(Xt + At+1Xt + Bt+1Ut + ut+1)],

so Et[yt] ∈ ∂Wt(Xt). We have thus shown that dual optimizers are subgradients of the cost-to-go function.

10 Absence of duality gap

Assumption 10.1. There exists a p ∈ N⊥ such that

inf_{y∈Y} Ef∗(λp, y) < ∞

for two different values of λ ∈ R.

Assumption 10.2. The set L = {x ∈ N | f∞(x, 0) ≤ 0} is linear.

The following theorem combines the results of this section with the main theorems of the previous one.

Theorem 10.1. Under Assumptions 10.1 and 10.2, ϕ is proper and lsc, (SP) admits solutions and its optimum value equals

sup_{p∈N⊥, y∈Y} E[u · y − f∗(p, y)].

Moreover, ϕ∗(p, y) = Ef∗(p, y) + δN⊥(p).

Proof. Omitted.


11 Optimal investment and indifference pricing

Recall the optimal investment problem

minimize EV( u − Σ_{t=0}^{T−1} xt · ∆st+1 ) over x ∈ N0, (ALM)

where N0 = {x ∈ N | xT = 0 P-a.s.} and V is a nondecreasing nonconstant convex function with V(0) = 0.

The market satisfies the no-arbitrage condition if

{ Σ_{t=0}^{T−1} xt · ∆st+1 | Σ_{t=0}^{T−1} xt · ∆st+1 ≥ 0, x ∈ N } = {0}. (NA)

We denote the set of nonnegative multiples of absolutely continuous martingale densities of s by Q. It is an exercise to verify that a nonnegative y ∈ Y belongs to Q if and only if Et[y∆st+1] = 0 for all t = 0, . . . , T − 1.
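In a one-period model with trivial F0 the characterization is a single linear equation, as the following sketch (my own, with hypothetical trinomial data) illustrates.

import numpy as np

# One-period trinomial sketch (hypothetical data): s0 = 1 and three scenarios for s1.
P = np.array([0.2, 0.5, 0.3])
s0 = 1.0
s1 = np.array([0.8, 1.0, 1.3])
ds = s1 - s0

# A nonnegative y belongs to Q iff E[y*ds] = 0 (F0 is trivial, so t = 0 is the only check).
def in_Q(y, tol=1e-10):
    return bool(np.all(y >= -tol) and abs(np.sum(P * y * ds)) < tol)

print(in_Q(np.ones(3)))          # False: E[ds] != 0, so y = 1 is not a martingale density

# A martingale measure q (chosen by hand so that sum(q*s1) = s0) gives y = dq/dP in Q:
q = np.array([0.3, 0.5, 0.2])
y = q / P
print(in_Q(y), np.sum(P * y))    # True, and E[y] = 1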

The following theorem is from the exercise section above.

Theorem 11.1. Assume that the market satisfies (NA) and that there exists a martingale measure Q ≪ P of s such that EV∗(λ dQ/dP) < ∞ for two different λ ≥ 0. If dQ/dP ∈ Y, then the value function of (2.1) is closed in U and (2.1) has a solution for all u ∈ U.

11.1 Pricing formulas for indifference prices

Given a current financial position ū, the indifference price of a claim u is given by

π(u; ū) := inf_{α∈R} {α | ϕ(ū + u − α) ≤ ϕ(ū)}.

This is the least amount of cash the agent needs to cover the claim u without worsening her risk profile. The indifference price can be expressed as

π(u; ū) = inf {α | u − α ∈ A(ū)},

where the acceptance set

A(ū) := {u ∈ U | ϕ(ū + u) ≤ ϕ(ū)}

describes the claims that the agent finds "acceptable" given her risk preferences and her ability to trade in the market.

Theorem 11.2. Assume that F0 is the trivial σ-field, that ϕ is proper and lower semicontinuous, and that the price process is arbitrage-free. Then

π(u; ū) = sup_{y∈Y} {E[uy] − σA(ū)(y) | Ey = 1}.


If inf ϕ < ϕ(ū), then

π(u; ū) = sup_{y∈Q, λ>0} {E[uy − λV∗(y/λ) + ūy] − λϕ(ū) | Ey = 1}.

Proof. We have

π∗(y; ū) = sup_{u∈U} {E[uy] − π(u; ū)}
         = sup_{α∈R, u∈U} {E[uy] − α | u − α ∈ A(ū)}
         = sup_{α∈R, u∈U} {E[(u + α)y] − α | u ∈ A(ū)}
         = sup_{α∈R, u∈U} {E[uy] + α(Ey − 1) | u ∈ A(ū)}
         =
  σA(ū)(y)  if Ey = 1,
  +∞        otherwise.

Thus the first formula follows from the biconjugate theorem since π(·; ū) is proper and lower semicontinuous (Exercise 11.1.1 below).

The support function σA(ū) satisfies

σA(ū)(y) = sup_{u∈U} {〈u, y〉 | ϕ(ū + u) ≤ ϕ(ū)}
         = inf_{λ≥0} sup_{u∈U} {〈u, y〉 − λ(ϕ(ū + u) − ϕ(ū))}   (11.1)
         = inf_{λ≥0} sup_{u∈U} {〈u − ū, y〉 − λ(ϕ(u) − ϕ(ū))}
         = inf_{λ≥0} {λϕ∗(y/λ) − E[ūy] + λϕ(ū)}.

Here the second equality is based on Lagrangian duality and the assumption that inf ϕ < ϕ(ū) (Exercise 11.1.2 below). Thus the result follows from Theorem ?? once we recall that taking the lsc hull does not change the infimum.

Exercise 11.1.1. Prove that π(·; ū) is proper and lsc. Hint: Prove sequential lower semicontinuity using the "direct method"; get precompactness of "minimizing sequences" using recession analysis and recalling that ϕ∞(u) = inf_{x∈N} Ef∞(x, u).

Exercise 11.1.2. Prove the formula (11.1). Hint: Apply conjugate duality to

F(α, u) = 〈u, y〉 + δR−(ϕ(ū + u) − ϕ(ū) + α),

where α ∈ R is paired with λ ∈ R, 〈α, λ〉 := αλ.

Example 11.3 (Superhedging prices). For the superhedging prices, we have V = δR− and ū = 0, so the above formula reduces to

π(u) = sup_{y∈Q} {E[uy] | Ey = 1}.


We have recovered the pricing formula for superhedging prices in terms of absolutely continuous martingale measures of s.
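On a finite scenario set this formula is a linear program over martingale densities and can be checked against the direct superhedging LP. The sketch below (my own, with the same kind of hypothetical one-period trinomial data as above) computes both with scipy.optimize.linprog for a call option and verifies that the two prices coincide.

import numpy as np
from scipy.optimize import linprog

# Superhedging in a one-period trinomial model (hypothetical data).
P = np.array([0.2, 0.5, 0.3])
s0, s1 = 1.0, np.array([0.8, 1.0, 1.3])
ds = s1 - s0
u = np.maximum(s1 - 1.0, 0.0)             # a call option with strike 1

# Dual LP: maximize E[u*y] over y >= 0 with E[y] = 1 and E[y*ds] = 0.
res_dual = linprog(-P * u, A_eq=np.vstack([P, P * ds]), b_eq=np.array([1.0, 0.0]))
dual_price = -res_dual.fun

# Primal LP: minimize alpha over (alpha, x) with alpha + x*ds >= u scenariowise.
A_ub = np.column_stack([-np.ones(3), -ds])
res_primal = linprog(np.array([1.0, 0.0]), A_ub=A_ub, b_ub=-u,
                     bounds=[(None, None), (None, None)])
print(dual_price, res_primal.fun)         # the two superhedging prices coincide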
