Slides Optimization

    Introduction and Unconstrained Optimization

Master QFin at WU Vienna, Lecture Optimization

Rüdiger Frey

[email protected], http://statmath.wu.ac.at/frey

    Spring 2014

    Admin

Dates of the lecture. 4.3., 6.3., 11.3., 18.3. (incl. mid-term), 25.3., 1.4., 8.4. (final exam at some point before Easter)

Examination. 10% home assignments, 30% Midterm Exam (unit 4), 60% Final Exam

Tutor. Giorgo Ottonello, 2nd year QFin. He will correct homework assignments.

    Useful references

R. Frey, Lecture Notes Optimization, available on learn@WU, 2014

Bertsekas, D., Nonlinear Programming, Athena Scientific Publishing, 1999

Griva, Nash, Sofer, Linear and Nonlinear Optimization, SIAM Publishing, 2009

    Overview

Introduction and Unconstrained Optimization: Introduction; Mathematical Background; Unconstrained Optimization: Theory

    Optimization problems

In its most general form, an optimization problem is

$$\text{minimize } f(x) \quad \text{subject to } x \in X \qquad (1)$$

Here the set of admissible points $X$ is a subset of $\mathbb{R}^n$, and the cost function $f$ is a function from $X$ to $\mathbb{R}$. Often the admissible points are further restricted by explicit inequality constraints.

Note that maximization problems can be addressed by replacing $f$ with $-f$, since $\sup_{x \in X} f(x) = -\inf_{x \in X} \{-f(x)\}$.
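As a concrete illustration (not part of the original slides), problem (1) with $X = \mathbb{R}^n$ can be handed to a numerical solver; the sketch below uses scipy.optimize.minimize on an arbitrarily chosen smooth cost function.

```python
# Minimal sketch: solving "minimize f(x), x in R^n" numerically with SciPy.
# The cost function below is an illustrative choice, not from the slides.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # simple smooth cost: f(x) = (x1 - 1)^2 + 2*(x2 + 0.5)^2
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

result = minimize(f, x0=np.array([0.0, 0.0]))  # unconstrained: X = R^2
print(result.x)       # approximately [1.0, -0.5]
print(result.fun)     # approximately 0.0
```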

    Types of optimization problems.

Continuous problems. Here $X$ is of continuous nature, such as $X = \mathbb{R}^n$ or sets of the form $X = \{x \in \mathbb{R}^n : g(x) \le 0\}$ for some $g: \mathbb{R}^n \to \mathbb{R}^n$. These problems are usually tackled with calculus or convex analysis.

Discrete problems. Here $X$ is a (usually large) finite set (as in network optimization).

Nonlinear programming. Here $f$ is nonlinear or the constraint set $X$ is specified by nonlinear equations.

Linear programming. Here $f$ and $g$ are linear, and (1) takes the form
$$\min\, c^t x \quad \text{such that} \quad Ax \le b$$
for $c, x \in \mathbb{R}^n$, a matrix $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
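A small numerical sketch of such a linear program, with made-up data for $c$, $A$ and $b$; note that scipy.optimize.linprog additionally imposes $x \ge 0$ through its default bounds.

```python
# Minimal sketch of the LP  min c'x  s.t.  A x <= b  (and x >= 0, linprog's default bounds).
# The data below is made up purely for illustration.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                  # cost vector (negative: pushes x upwards)
A = np.array([[1.0, 1.0], [1.0, 2.0]])      # inequality matrix
b = np.array([4.0, 6.0])

res = linprog(c, A_ub=A, b_ub=b)            # x >= 0 is linprog's default bound
print(res.x, res.fun)                       # an optimal vertex, objective value -6.0
```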

    Optimization problems in finance, economics and statistics.

    a) Portfolio optimization. We give two examples.

Maximization of expected utility:
$$\max_{\xi \in \mathbb{R}^n} E\big(u(V_0 + \xi^t (S_T - S_0))\big).$$
Here $V_0$ is the initial wealth of an investor; $S_0 = (S_0^1, \dots, S_0^n)$ is the initial asset price; $S_T(\omega) = (S_T^1(\omega), \dots, S_T^n(\omega))$ the terminal asset price; $\xi$ represents the portfolio strategy. The increasing and concave function $u: \mathbb{R} \to \mathbb{R}$ is the utility function that is used to model the risk aversion of the investor.

Markowitz problem. Here one looks for the minimal-variance portfolio under all portfolios with a given mean.
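A minimal sketch of the Markowitz problem with invented covariance, mean and target data, using SciPy's SLSQP solver; the numbers are purely illustrative and not from the lecture.

```python
# Minimal sketch of the Markowitz problem: minimize portfolio variance w' Sigma w
# over all portfolios w with a given target mean; all data below is invented.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.05, 0.08, 0.12])                 # expected returns (assumed)
Sigma = np.array([[0.04, 0.01, 0.00],             # covariance matrix (assumed)
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
target = 0.08                                     # required portfolio mean

variance = lambda w: w @ Sigma @ w
constraints = [{"type": "eq", "fun": lambda w: w @ mu - target},   # mean constraint
               {"type": "eq", "fun": lambda w: w.sum() - 1.0}]     # fully invested
w0 = np.ones(3) / 3
res = minimize(variance, w0, method="SLSQP", constraints=constraints)
print(res.x, variance(res.x))                     # minimum-variance weights
```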

    Optimization problems ctd.

b) Calibration problems. Denote by $g_1(\theta), \dots, g_m(\theta)$ the model prices of $m$ financial instruments for a given parameter vector $\theta \in X \subseteq \mathbb{R}^n$, and by $g_1^*, \dots, g_m^*$ the observed market prices. Model calibration leads to the optimization problem
$$\min_{\theta \in X} \; \frac{1}{2} \sum_{i=1}^m \big(g_i(\theta) - g_i^*\big)^2.$$
If $g_i$ is linear in $\theta$ we have a standard regression problem; otherwise one speaks of a generalized regression problem (a small least-squares sketch follows after this list).

c) Maximum likelihood methods in statistics.

d) Financial mathematics. Duality results from convex analysis are crucial in financial mathematics (first fundamental theorem of asset pricing or superhedging duality).
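The least-squares sketch referred to in b): a toy calibration in which the "model price" function and all data are invented for illustration; scipy.optimize.least_squares minimizes exactly the sum of squared residuals above (including the factor 1/2).

```python
# Minimal sketch of the calibration problem in b): least-squares fit of a parameter
# vector theta so that toy "model prices" match observed prices. The model g and all
# data are invented for illustration only.
import numpy as np
from scipy.optimize import least_squares

maturities = np.array([0.5, 1.0, 2.0, 5.0])
observed = np.array([0.98, 0.95, 0.90, 0.78])        # hypothetical market prices

def model_prices(theta):
    a, b = theta                                     # toy two-parameter discount curve
    return np.exp(-(a + b * maturities) * maturities)

def residuals(theta):
    return model_prices(theta) - observed            # g_i(theta) - g_i*

fit = least_squares(residuals, x0=np.array([0.01, 0.0]))
print(fit.x)                                         # calibrated parameters
```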

    Overview

1. Unconstrained Optimization: necessary and sufficient optimality conditions; numerical methods

2. Lagrange multiplier theory and Karush-Kuhn-Tucker theory

3. Convex optimization and duality: convexity and separation; the dual problem; duality results and economic applications

    Differentiability and partial derivatives

Consider some $f: U \subseteq \mathbb{R}^n \to \mathbb{R}$, $(x_1, \dots, x_n)^t \mapsto f(x_1, \dots, x_n)$, where $U$ is an open subset of $\mathbb{R}^n$, and some $x \in U$.

Definition. (1) $f$ is called continuously differentiable on $U$ ($f \in C^1(U)$) if for all $x \in U$ all partial derivatives exist and if the partial derivatives are continuous functions of $x$.

(2) More generally, a function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, $(x_1, \dots, x_n)^t \mapsto (f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n))^t$, is continuously differentiable on $U$ if all components $f_1, \dots, f_m$ belong to $C^1(U)$.

    Example: Quadratic form

Consider a symmetric $2 \times 2$ matrix $A$ and let
$$f(x_1, x_2) = x^t A x = a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2.$$
Then
$$\frac{\partial f(x)}{\partial x_1} = 2 a_{11} x_1 + 2 a_{12} x_2 = (2Ax)_1 \quad\text{and}\quad \frac{\partial f(x)}{\partial x_2} = (2Ax)_2.$$
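A quick numerical cross-check (illustration only) that the gradient of $x^t A x$ is $2Ax$, using central finite differences on a randomly drawn symmetric matrix.

```python
# Check numerically that the gradient of f(x) = x' A x is 2 A x,
# using central finite differences on a random symmetric 2x2 matrix.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 2))
A = (B + B.T) / 2                      # symmetric 2x2 matrix
f = lambda x: x @ A @ x

x = np.array([0.7, -1.3])
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(grad_fd)        # finite-difference gradient
print(2 * A @ x)      # analytic gradient 2 A x -- matches up to O(h^2)
```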

    Gradient and Jacobi matrix

Suppose that $f: U \to \mathbb{R}$ is in $C^1(U)$. Then the column vector
$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x)\Big)^t$$
is the gradient of $f$.

For a $C^1$ function $g: U \to \mathbb{R}^m$ the Jacobi matrix is given by
$$J_g(x) = \begin{pmatrix} \frac{\partial g_1(x)}{\partial x_1} & \cdots & \frac{\partial g_1(x)}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial g_m(x)}{\partial x_1} & \cdots & \frac{\partial g_m(x)}{\partial x_n} \end{pmatrix}.$$
Sometimes one also uses the gradient matrix
$$\nabla g(x) = J_g(x)^t = (\nabla g_1(x), \dots, \nabla g_m(x)).$$

    First order (Taylor) approximation

Consider some $C^1$ function $f: U \to \mathbb{R}$. Then for any $x, y \in U$
$$f(y) - f(x) = \nabla f(x)^t (y - x) + R(x, y - x) \qquad (2)$$
where it holds that $\lim_{z \to 0} \frac{R(x, z)}{\|z\|} = 0$.

Idea. The function $f$ can be approximated locally around $x$ by the affine mapping $y \mapsto f(x) + \nabla f(x)^t (y - x)$.

Similarly, we get for a $C^1$ function $g: U \to \mathbb{R}^m$ that
$$g(y) - g(x) = J_g(x)(y - x) + R(x, y - x) \quad\text{with}\quad \lim_{z \to 0} \frac{R(x, z)}{\|z\|} = 0.$$

    Chain rule

Theorem. Consider $C^1$ functions $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^k \to \mathbb{R}^n$ and let $h := f \circ g$. Then $h$ is $C^1$ and it holds for the Jacobi matrix that $J_h(x) = J_f(g(x))\, J_g(x)$, i.e. the Jacobian of the composition is the product of the individual Jacobi matrices.

Example (Derivative along a vector). Consider a $C^1$ function $f: \mathbb{R}^n \to \mathbb{R}$. We want to consider $f$ along the straight line $\gamma(t) := x + t v$, for $t \in \mathbb{R}$, $x, v \in \mathbb{R}^n$. We have $J_\gamma(t) = v$, $J_f(x) = (\nabla f(x))^t$ and hence
$$\frac{d}{dt} f(\gamma(t)) = (\nabla f(x + t v))^t v, \quad\text{in particular}\quad \frac{d}{dt} f(\gamma(t))\Big|_{t=0} = (\nabla f(x))^t v.$$
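A small numerical illustration of the derivative along a vector: for an arbitrarily chosen quadratic $f$, the finite-difference derivative of $t \mapsto f(x + tv)$ at $t = 0$ agrees with $(\nabla f(x))^t v$.

```python
# Numerical illustration of d/dt f(x + t v) |_{t=0} = grad f(x)' v
# for an invented quadratic example f(x) = x' A x (gradient 2 A x).
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: x @ A @ x
x = np.array([1.0, -2.0])
v = np.array([0.3, 0.4])

h = 1e-6
numeric = (f(x + h * v) - f(x - h * v)) / (2 * h)   # derivative along v at t = 0
analytic = (2 * A @ x) @ v                          # (grad f(x))' v
print(numeric, analytic)                            # essentially equal
```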

    Second derivatives

Definition. Consider a $C^1$ function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}$. Then the first order partial derivatives $\frac{\partial f(x)}{\partial x_i}$, $1 \le i \le n$, are themselves functions from $U$ to $\mathbb{R}$.

1. If all partial derivatives are $C^1$ functions, $f$ is called twice continuously differentiable on $U$ ($f \in C^2(U)$). Fix $i, j \in \{1, \dots, n\}$. Then one writes
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) := \frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}(x)\Big)$$
for the second partial derivative in direction $x_i$ and $x_j$.

2. For $f \in C^2(U)$ the matrix $H_f$ with $H_f^{ij}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$ is the Hessian matrix of $f$.

    Theorem of Young and Schwarz

Theorem. Consider $f \in C^2(U)$. Then the Hessian matrix is symmetric, that is
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x), \quad 1 \le i, j \le n.$$
It follows that the Hessian is a symmetric matrix, that is $H_f^{ij}(x) = H_f^{ji}(x)$, $1 \le i, j \le n$. In particular, the definiteness of $H_f$ can be checked using eigenvalues: $H_f$ is positive definite if all eigenvalues are strictly positive and positive semidefinite if all eigenvalues are non-negative.
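A minimal sketch of this eigenvalue check, with a made-up symmetric matrix standing in for some Hessian $H_f(x)$.

```python
# Checking definiteness of a symmetric Hessian via its eigenvalues.
# The matrix below is an invented stand-in for some H_f(x).
import numpy as np

H = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

eigvals = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
print(eigvals)                           # [1., 3.] here
if np.all(eigvals > 0):
    print("positive definite")
elif np.all(eigvals >= 0):
    print("positive semidefinite")
```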

    Example

(1) Consider $f(x_1, x_2) = x_1^3 x_2 + x_1^2 x_2^2 + x_1 + x_2^2$. Then we have
$$\frac{\partial^2 f}{\partial x_1^2} = 6 x_1 x_2 + 2 x_2^2, \quad \frac{\partial^2 f}{\partial x_2^2} = 2 x_1^2 + 2, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 3 x_1^2 + 4 x_1 x_2.$$

(2) Consider $f(x) = x^t A x$ for some symmetric matrix $A$. Then $H_f(x) = 2A$.
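The Hessian entries in example (1) can be cross-checked symbolically; the sketch below uses SymPy and is illustration only.

```python
# Symbolic check of the Hessian entries in example (1) using SymPy.
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

H = sp.hessian(f, (x1, x2))
print(H)   # Matrix([[6*x1*x2 + 2*x2**2, 3*x1**2 + 4*x1*x2],
           #         [3*x1**2 + 4*x1*x2, 2*x1**2 + 2]])
```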

    Second order Taylor expansion

Theorem. If $f \in C^2(U)$ and $x, y \in U$ the Taylor formula becomes
$$f(y) - f(x) = \nabla f(x)^t (y - x) + \frac{1}{2} (y - x)^t H_f(x) (y - x) + R_2(x, y - x)$$
where $\lim_{z \to 0} \frac{R_2(x, z)}{\|z\|^2} = 0$.

Idea. $f$ can be approximated locally around $x \in U$ by the quadratic function
$$y \mapsto f(x) + \nabla f(x)^t (y - x) + \frac{1}{2} (y - x)^t H_f(x) (y - x).$$
Locally, this is a better approximation than the first order Taylor approximation.
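A small numerical illustration, for an arbitrarily chosen smooth function, of how much smaller the second-order Taylor error is compared to the first-order one.

```python
# First- vs. second-order Taylor approximation of f(x1, x2) = exp(x1) * sin(x2)
# around a point; the test function and the points are invented for illustration.
import numpy as np

f = lambda x: np.exp(x[0]) * np.sin(x[1])
grad = lambda x: np.array([np.exp(x[0]) * np.sin(x[1]), np.exp(x[0]) * np.cos(x[1])])
hess = lambda x: np.array([[np.exp(x[0]) * np.sin(x[1]), np.exp(x[0]) * np.cos(x[1])],
                           [np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])]])

x = np.array([0.2, 0.5])
y = x + np.array([0.05, -0.03])          # a nearby point
d = y - x

first  = f(x) + grad(x) @ d
second = first + 0.5 * d @ hess(x) @ d
print(abs(f(y) - first), abs(f(y) - second))   # second-order error is much smaller
```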

    Unconstrained optimization: the problem

In this section we consider problems of the form
$$\text{minimize } f(x) \quad \text{for } x \in X = \mathbb{R}^n. \qquad (3)$$
Moreover, we assume that $f$ is once or twice continuously differentiable.

Most results also hold in the case where $X$ is an open subset of $\mathbb{R}^n$.

    Local and global optima

Definition. Consider the optimization problem (3).

1. $x^*$ is called an (unconstrained) local minimum of $f$ if there is some $\varepsilon > 0$ such that $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$ with $\|x - x^*\| < \varepsilon$.

2. $x^*$ is called a global minimum of $f$ if $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$.

3. $x^*$ is said to be a strict local/global minimum if the inequality $f(x^*) \le f(x)$ is strict for $x \ne x^*$.

4. The value of the problem is $f^* := \inf\{f(x) : x \in \mathbb{R}^n\}$.

Remark. Local and global maxima are defined analogously.

    Necessary optimality conditions

Proposition. Suppose that $x^* \in U$ is a local minimum of $f$.

1. If $f$ is $C^1$ in $U$, then $\nabla f(x^*) = 0$ (First Order Necessary Condition or FONC).

2. If moreover $f \in C^2(U)$, then $H_f(x^*)$ is positive semi-definite (Second Order Necessary Condition or SONC).

Comments.

Any $x \in \mathbb{R}^n$ with $\nabla f(x) = 0$ is called a stationary point of $f$.

The proof is based on the Taylor formula.

Necessary conditions for a local maximum: $\nabla f(x^*) = 0$ and $H_f(x^*)$ negative semidefinite.

The necessary conditions are in general not sufficient: consider $f(x) = x^3$, $x^* = 0$.
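A minimal sketch of this workflow on an invented test function: solve $\nabla f = 0$ symbolically (FONC) and classify each stationary point via the eigenvalues of the Hessian.

```python
# Find stationary points (grad f = 0) and classify them via the Hessian,
# for a simple two-variable test function chosen only for illustration.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**3 - 3*x1 + x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, (x1, x2), dict=True)    # solve grad f = 0
H = sp.hessian(f, (x1, x2))

for pt in stationary:
    eigs = list(H.subs(pt).eigenvals())
    print(pt, eigs)   # (1, 0): eigenvalues 6, 2 -> local minimum; (-1, 0): -6, 2 -> saddle
```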

    Sufficient optimality conditions

Proposition (Sufficient conditions). Let $f: U \subseteq \mathbb{R}^n \to \mathbb{R}$ be $C^2$ on $U$. Suppose that $x^* \in U$ satisfies the conditions
$$\nabla f(x^*) = 0, \quad H_f(x^*) \text{ strictly positive definite.} \qquad (4)$$
Then $x^*$ is a local minimum.

Comments.

The sufficient conditions are not necessary: consider e.g. $f(x) = x^4$, $x^* = 0$. No global statements are possible.

    The case of convex functions

Definition (Convex sets and functions).

i) A set $X \subseteq \mathbb{R}^n$ is convex if for all $x_1, x_2 \in X$ and $\lambda \in [0, 1]$ the convex combination $\lambda x_1 + (1 - \lambda) x_2$ belongs to $X$.

ii) A function $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$ ($X$ convex) is called convex if for all $x_1, x_2 \in X$, $\lambda \in [0, 1]$
$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2); \qquad (5)$$
$f$ is strictly convex if the inequality is strict for $\lambda \in (0, 1)$.

iii) $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if $-f$ is convex, that is if "$\ge$" holds in (5). Strict concavity is defined in the same way.

    Characterizations of Convexity

Lemma. Consider an open convex set $X \subseteq \mathbb{R}^n$. A $C^1$ function $f: X \to \mathbb{R}$ is convex on $X$ if and only if it holds for all $x, z \in X$ that
$$f(z) \ge f(x) + \nabla f(x)^t (z - x).$$
If $f$ is $C^2$, a necessary and sufficient condition for the convexity of $f$ on $X$ is that $H_f(x)$ is positive semi-definite for all $x \in X$.

Comments.

$f$ is concave on $U$ if and only if $H_f$ is negative semidefinite on $U$.

Note that we may decide convexity or concavity by finding the eigenvalues of $H_f(x)$.

    Example

Problem. Let $f(x_1, x_2) = 2 x_1 - x_2 - x_1^2 + 2 x_1 x_2 - x_2^2$. Is $f$ convex, concave or neither?

Solution. The symmetric matrix representing the quadratic part of $f$ is
$$A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$
An easy computation gives for the Hessian that $H_f(x) = 2A$. Hence we need to check the definiteness of $A$.

Approach via eigenvalues. The characteristic polynomial of $A$ is $P(\lambda) = \lambda^2 + 2\lambda$; the equation $P(\lambda) = 0$ has solutions (eigenvalues) $-2$ and $0$. Hence $A$ is negative semidefinite and the function is concave.
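A one-line numerical cross-check of the eigenvalue computation above.

```python
# Numerical cross-check of the example: eigenvalues of the quadratic-part matrix A.
import numpy as np

A = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])
print(np.linalg.eigvalsh(A))   # [-2.  0.] -> negative semidefinite, so f is concave
```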

    Optimality conditions for convex functions

Proposition. Let $f: X \to \mathbb{R}$ be a convex function on some convex set $X \subseteq \mathbb{R}^n$. Then

1. A local minimum of $f$ over $X$ is also a global minimum. If $f$ is strictly convex, there exists at most one global minimum.

2. If $X$ is open, the condition $\nabla f(x^*) = 0$ is necessary and sufficient for $x^* \in X$ to be a global minimum of $f$.

    Example: Quadratic cost functions

Let
$$f(x) = \frac{1}{2}\, x^t Q x - b^t x, \quad x \in \mathbb{R}^n,$$
for a symmetric $n \times n$ matrix $Q$ and some $b \in \mathbb{R}^n$. Then we have
$$\nabla f(x) = Qx - b \quad\text{and}\quad H_f(x) = Q.$$

a) Local minima. If $x^*$ is a local minimum we must have $\nabla f(x^*) = Qx^* - b = 0$ and $H_f(x^*) = Q$ positive semi-definite; hence if $Q$ is not positive semi-definite, $f$ has no local minima.

b) If $Q$ is positive semi-definite, $f$ is convex. In that case we need not distinguish global and local minima, and $f$ has a global minimum if and only if there is some $x^*$ with $Qx^* = b$.

c) If $Q$ is positive definite, $Q^{-1}$ exists and the unique global minimum is attained at $x^* = Q^{-1} b$.

Existence results for a global minimum

Proposition (Weierstrass Theorem). Let $X \subseteq \mathbb{R}^n$ be non-empty and suppose that $f: X \to \mathbb{R}$ is continuous on $X$. Suppose moreover that one of the following three conditions holds:

(1) $X$ is compact (closed and bounded).

(2) $X$ is closed and $f$ is coercive, that is, for every sequence $(x_k)_{k \in \mathbb{N}} \subseteq X$ with $\|x_k\| \to \infty$ one has $\lim_{k \to \infty} f(x_k) = \infty$.

(3) There is some $\gamma \in \mathbb{R}$ such that the level set $\{x \in X : f(x) \le \gamma\}$ is non-empty and compact.

Then $f$ has at least one global minimum and the set of all global minima is compact.

Remark. The result also holds for lower semicontinuous functions. ($f$ is called lower semicontinuous if for all $x \in X$ and all $(x_k)_{k \in \mathbb{N}}$ with $x_k \to x$ one has $\liminf_{k \to \infty} f(x_k) \ge f(x)$.)
