Chapter 2: Preliminaries and elements of convex
analysis
Edoardo Amaldi
DEIB – Politecnico di Milano [email protected]
Website: http://home.deib.polimi.it/amaldi/OPT-14-15.shtml
Academic year 2014-15
Edoardo Amaldi (PoliMI) Optimization Academic year 2014-15 1 / 23
2.1 Basic concepts
In Rn with Euclidean norm
x ∈ S ⊆ Rn is an interior point of S if ∃ ε > 0 such that
Bε(x) = {y ∈ Rn : ‖y − x‖ < ε} ⊆ S .
x ∈ Rn is a boundary point of S if, for every ε > 0, Bε(x) contains at least one
point of S and one point of its complement Rn \ S .
The set of all the interior points of S ⊆ Rn is the interior of S , denoted by int(S).
The set of all boundary points of S is the boundary of S , denoted by ∂(S).
S ⊆ Rn is open if S = int(S); S is closed if its complement is open. Intuitively, a
closed set contains all the points in ∂(S).
S ⊆ Rn is bounded if ∃ M > 0 such that ‖x‖ ≤ M for every x ∈ S .
S ⊆ Rn closed and bounded is compact.
Property:
A set $S \subseteq \mathbb{R}^n$ is closed if and only if the limit of every convergent sequence $\{x_i\}_{i \in \mathbb{N}} \subseteq S$ belongs to $S$.
A set $S \subseteq \mathbb{R}^n$ is compact if and only if every sequence $\{x_i\}_{i \in \mathbb{N}} \subseteq S$ admits a subsequence
that converges to a point $x \in S$.
Existence of an optimal solution
In general, when we wish to minimize a function f : S ⊆ Rn → R, we only know that
there exists a greatest lower bound (infimum), that is
$\inf_{x \in S} f(x)$
Theorem (Weierstrass):
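Let $S \subseteq \mathbb{R}^n$ be a nonempty compact set and let $f : S \to \mathbb{R}$ be continuous on $S$. Then
there exists an $x^* \in S$ such that $f(x^*) \le f(x)$ for every $x \in S$.
Examples in which the result fails: $S$ is not closed (e.g., $f(x) = x$ on $S = (0, 1]$), $S$ is not bounded (e.g., $f(x) = e^x$ on $S = \mathbb{R}$), or $f$ is not continuous on $S$ (e.g., $f(x) = x$ for $x > 0$ and $f(0) = 1$ on $S = [0, 1]$).
When the problem admits an optimal solution $x^* \in S$, we can write $\min_{x \in S} f(x)$.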
Observation: this result holds in any vector space of finite dimension.
Cones and affine subspaces
Consider any subset S ⊂ Rn
Definition: cone(S) denotes the set of all the conic combinations of points of $S$, i.e., all the points $x \in \mathbb{R}^n$ such that $x = \sum_{i=1}^m \alpha_i x_i$ with $x_1, \dots, x_m \in S$ and $\alpha_i \ge 0$ for every $i$, $1 \le i \le m$.
Examples: polyhedral cone generated by a finite number of vectors, “ice cream” cone in
$\mathbb{R}^3$ generated by an infinite number of vectors
Definition: aff(S) denotes the smallest affine subspace that contains S .
aff(S) coincides with the set of all the affine combinations of points in $S$, i.e., all the points $x \in \mathbb{R}^n$ such that $x = \sum_{i=1}^m \alpha_i x_i$ with $x_1, \dots, x_m \in S$ and with $\sum_{i=1}^m \alpha_i = 1$, where $\alpha_i \in \mathbb{R}$ for every $i$, $1 \le i \le m$.
Examples: straight line containing two points in $\mathbb{R}^2$, plane containing three points in
general position in $\mathbb{R}^3$
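The two definitions above differ only in the constraint placed on the coefficients. As a minimal sketch (the helper names, the tolerance and the restriction to two linearly independent generators in R^2 are assumptions made for illustration), the code below recovers the coefficients of a point with respect to two given points and classifies the combination as conic and/or affine.

```python
def coefficients(x, v1, v2):
    """Solve x = a1*v1 + a2*v2 in R^2 by Cramer's rule.

    Hypothetical helper: assumes v1 and v2 are linearly independent.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if abs(det) < 1e-12:
        raise ValueError("v1 and v2 are linearly dependent")
    a1 = (x[0] * v2[1] - x[1] * v2[0]) / det
    a2 = (v1[0] * x[1] - v1[1] * x[0]) / det
    return a1, a2


def is_conic(alphas, tol=1e-9):
    """All coefficients nonnegative: x lies in cone({v1, v2})."""
    return all(a >= -tol for a in alphas)


def is_affine(alphas, tol=1e-9):
    """Coefficients sum to one: x lies on the line aff({v1, v2})."""
    return abs(sum(alphas) - 1.0) <= tol


v1, v2 = (1.0, 0.0), (1.0, 1.0)
for x in [(3.0, 1.0), (1.0, 0.5), (1.0, 2.0)]:
    alphas = coefficients(x, v1, v2)
    print(x, alphas, "conic:", is_conic(alphas), "affine:", is_affine(alphas))
```

A point whose coefficients are both nonnegative and sum to one lies on the segment between the two generators, which anticipates the notion of convex combination.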
2.2 Elements of convex analysis
Definition:
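A set $C \subset \mathbb{R}^n$ is convex if
$\alpha x_1 + (1 - \alpha) x_2 \in C \quad \forall x_1, x_2 \in C$ and $\forall \alpha \in [0, 1]$.
A point $x \in \mathbb{R}^n$ is a convex combination of $x_1, \dots, x_m \in \mathbb{R}^n$ if
$x = \sum_{i=1}^m \alpha_i x_i$
with $\alpha_i \ge 0$ for every $i$, $1 \le i \le m$, and $\sum_{i=1}^m \alpha_i = 1$.
Property: If $C_i$ with $i = 1, \dots, k$ are convex, then $\bigcap_{i=1}^k C_i$ is convex.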
Examples of convex sets
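1) Hyperplane $H = \{x \in \mathbb{R}^n : p^t x = \beta\}$ with $p \ne 0$.
For any $\bar{x} \in H$, $p^t \bar{x} = \beta$ implies $H = \{x \in \mathbb{R}^n : p^t (x - \bar{x}) = 0\}$, and hence $H$ is the set of all
the points $x \in \mathbb{R}^n$ such that $x - \bar{x}$ is orthogonal to $p$.
N.B.: $H$ is closed since $H = \partial(H)$
2) Closed half-spaces $H^+ = \{x \in \mathbb{R}^n : p^t x \ge \beta\}$ and $H^- = \{x \in \mathbb{R}^n : p^t x \le \beta\}$ with $p \ne 0$.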
3) Feasible region $X = \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\}$ of a Linear Program (LP)
$\min\ c^t x$
s.t. $Ax \ge b$
$x \ge 0$
$X$ is a convex and closed subset (intersection of $m + n$ closed half-spaces if $A \in \mathbb{R}^{m \times n}$).
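The convexity of the feasible region X can be observed numerically on a small instance. The sketch below is only illustrative: the matrix, the right-hand side, the two feasible points and the helper names are assumptions, and testing a few values of α does not replace the proof by intersection of half-spaces.

```python
def is_feasible(A, b, x, tol=1e-9):
    """True if A x >= b and x >= 0 hold componentwise (up to tol)."""
    rows_ok = all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i - tol
        for row, b_i in zip(A, b)
    )
    return rows_ok and all(x_j >= -tol for x_j in x)


def convex_combination(x1, x2, alpha):
    """alpha * x1 + (1 - alpha) * x2 for alpha in [0, 1]."""
    return [alpha * u + (1 - alpha) * v for u, v in zip(x1, x2)]


# Hypothetical instance with m = 2 constraints and n = 2 variables.
A = [[1.0, 1.0], [1.0, -1.0]]
b = [1.0, -2.0]
x1, x2 = [1.0, 0.0], [0.0, 2.0]   # both feasible

# Every sampled point of the segment [x1, x2] stays feasible.
segment_ok = all(
    is_feasible(A, b, convex_combination(x1, x2, k / 10)) for k in range(11)
)
print(segment_ok)
```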
Definition: The intersection of a finite number of closed half-spaces is a polyhedron.
N.B.: The set of optimal solutions of an LP is also a polyhedron (just add $c^t x = z^*$ to the constraints, where $z^*$ is the optimal value) and hence convex.
Definition: The convex hull of S ⊆ Rn, denoted by conv(S), is the intersection of all
convex sets containing S .
conv(S) coincides with the set of all convex combinations of points in $S$.
These are two equivalent characterizations (external/internal descriptions).
Definition: Given a convex set $C$ of $\mathbb{R}^n$, a point $x \in C$ is an extreme point of $C$ if it cannot be expressed as a convex combination of two different points of $C$, that is
$x = \alpha x_1 + (1 - \alpha) x_2$ with $x_1, x_2 \in C$ and $\alpha \in (0, 1)$
implies that $x_1 = x_2$.
Examples: convex sets with a finite and infinite number of extreme points.
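For a finite set S in the plane, the extreme points of conv(S) are the vertices of its convex hull. The sketch below computes them with Andrew's monotone chain algorithm, a standard technique that is not part of these notes and is used here only as an illustration; the function names and the sample points are assumptions.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Extreme points of conv(points) in R^2, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that are not strict left turns (interior or collinear).
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Corners of a square plus its center and the midpoint of one edge.
S = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(hull_vertices(S))
```

The center and the edge midpoint are convex combinations of two different points of conv(S), so they are discarded; only the four corners remain.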
Projection on a convex set
Projection of a point on a set to which it does not belong.
Lemma (Projection):
Let $C \subseteq \mathbb{R}^n$ be a nonempty, closed and convex set. Then for every $y \notin C$ there exists a
unique $x' \in C$ at minimum distance from $y$.
Moreover, $x' \in C$ is the closest point to $y$ if and only if
$(y - x')^t (x - x') \le 0 \quad \forall x \in C$.
Definition: x ′ is the projection of y on C .
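The characterization in the Lemma can be checked on a set whose projection has a closed form. The sketch below assumes that C is the closed Euclidean unit ball centered at the origin and tests the inequality on a small hand-picked sample of points of C; the helper names are hypothetical, and a finite sample can only refute the condition, not prove it.

```python
import math


def project_onto_ball(y, radius=1.0):
    """Projection of y onto the closed ball of the given radius centered at 0."""
    norm = math.sqrt(sum(c * c for c in y))
    if norm <= radius:
        return list(y)
    return [radius * c / norm for c in y]


def satisfies_projection_condition(y, x_proj, sample, tol=1e-9):
    """Check (y - x')^t (x - x') <= 0 for every x in a sample of points of C."""
    d = [yc - pc for yc, pc in zip(y, x_proj)]
    return all(
        sum(dc * (xc - pc) for dc, xc, pc in zip(d, x, x_proj)) <= tol
        for x in sample
    )


y = (3.0, 4.0)
x_proj = project_onto_ball(y)   # y scaled back onto the unit sphere
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.6, 0.8)]
print(x_proj, satisfies_projection_condition(y, x_proj, sample))
# A point of C that is not the closest one violates the condition.
print(satisfies_projection_condition(y, (1.0, 0.0), sample))
```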
Geometric illustration:
Proof:
Separation theorem and consequences
Geometrically intuitive but fundamental result.
Theorem (Separating hyperplane)
Let $C \subset \mathbb{R}^n$ be a nonempty, closed and convex set and $y \notin C$. Then there exists a $p \in \mathbb{R}^n$
such that $p^t x < p^t y$ for every $x \in C$.
Thus there exists a hyperplane $H = \{x \in \mathbb{R}^n : p^t x = \beta\}$ with $p \ne 0$ that separates $y$
from $C$, i.e., such that
$C \subseteq H^- = \{x \in \mathbb{R}^n : p^t x \le \beta\}$ and $y \notin H^-$ ($p^t y > \beta$)
Geometric illustration:
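Proof: According to the Lemma, the projection $x'$ of $y$ on $C$ satisfies $(y - x')^t (x - x') \le 0 \quad \forall x \in C$.
Taking $p = (y - x') \ne 0$ and $\beta = (y - x')^t x'$ we have $p^t x - \beta = (y - x')^t (x - x') \le 0$, i.e., $p^t x \le \beta \quad \forall x \in C$,
while $p^t y - \beta = (y - x')^t (y - x') = \|y - x'\|^2 > 0$ since $y \notin C$.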
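The proof is constructive: once the projection x' is available, p and β follow directly. As a minimal sketch, assume C is the box [0, 1]^2, whose projection is a componentwise clipping; the helper names and the point y are illustrative assumptions. Since p^t x is linear, its maximum over the box is attained at a corner, so checking the four corners suffices for this particular C.

```python
def project_onto_box(y, lower, upper):
    """Componentwise clipping: the projection onto the box [lower, upper]."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(y, lower, upper)]


def separating_hyperplane(y, x_proj):
    """Return (p, beta) with p = y - x' and beta = p^t x', as in the proof."""
    p = [yc - pc for yc, pc in zip(y, x_proj)]
    beta = sum(pc * xc for pc, xc in zip(p, x_proj))
    return p, beta


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


y = (2.0, 0.5)                                   # a point outside C
x_proj = project_onto_box(y, (0.0, 0.0), (1.0, 1.0))
p, beta = separating_hyperplane(y, x_proj)

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(p, beta)
print(all(dot(p, x) <= beta for x in corners), dot(p, y) > beta)
```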
Three important consequences:
1) Any nonempty, closed and convex set C ⊆ Rn is the intersection of all closed
half-spaces containing it.
Definition: Let $S \subset \mathbb{R}^n$ be nonempty and $\bar{x} \in \partial(S)$ (boundary w.r.t. aff(S)).
$H = \{x \in \mathbb{R}^n : p^t (x - \bar{x}) = 0\}$ is a supporting hyperplane of $S$ at $\bar{x}$ if $S \subseteq H^-$ or
$S \subseteq H^+$.
2) Supporting hyperplane:
If $C \ne \emptyset$ is a convex set in $\mathbb{R}^n$, then for every $\bar{x} \in \partial(C)$ there exists a supporting
hyperplane $H$ at $\bar{x}$, i.e., $\exists\, p \ne 0$ such that $p^t (x - \bar{x}) \le 0$ for each $x \in C$.
A convex set admits at least one supporting hyperplane at each boundary point. For nonconvex sets, such a hyperplane may not exist.
Examples: cases with 1/∞/0 supporting hyperplanes at a given boundary point $\bar{x}$
Proof sketch:
Central result of Optimization (also of Game theory) from which we will derive the optimality conditions for Nonlinear Programming.
3) Farkas Lemma:
Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then
$\exists\, x \in \mathbb{R}^n$ such that $Ax = b$ and $x \ge 0$ $\iff$ $\nexists\, y \in \mathbb{R}^m$ such that $y^t A \le 0^t$ and $y^t b > 0$.
In this form, it provides an infeasibility certificate for a given linear system, but it is also known as the theorem of the alternative.
Alternative: exactly one of the two systems $Ax = b,\ x \ge 0$ and $y^t A \le 0^t,\ y^t b > 0$ is feasible.
Geometric interpretation:
$b$ belongs to the convex cone generated by the columns $A_1, \dots, A_n$ of $A$, i.e., to cone(A) $= \{z \in \mathbb{R}^m : z = \sum_{j=1}^n \alpha_j A_j,\ \alpha_1 \ge 0, \dots, \alpha_n \ge 0\}$, if and only if no hyperplane separating $b$ from cone(A) exists.
Alternative: $b \in$ cone(A) or $b \notin$ cone(A) (hence $\exists$ a hyperplane separating $b$ from cone(A))
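Either side of the alternative is easy to verify once a candidate vector is given. The sketch below only checks candidates and does not search for them (a search would require solving an LP); the function names, the tolerance and the tiny instance are assumptions made for illustration.

```python
def is_primal_solution(A, b, x, tol=1e-9):
    """True if A x = b and x >= 0 (up to tol)."""
    rows_ok = all(
        abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i) <= tol
        for row, b_i in zip(A, b)
    )
    return rows_ok and all(x_j >= -tol for x_j in x)


def is_farkas_certificate(A, b, y, tol=1e-9):
    """True if y^t A <= 0^t and y^t b > 0, i.e. y certifies infeasibility."""
    n = len(A[0])
    yA = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(n)]
    yb = sum(y_i * b_i for y_i, b_i in zip(y, b))
    return all(v <= tol for v in yA) and yb > tol


A = [[1.0, 0.0], [0.0, 1.0]]      # cone(A) is the nonnegative quadrant

b_in = [1.0, 2.0]                 # inside cone(A): a primal solution exists
print(is_primal_solution(A, b_in, [1.0, 2.0]))

b_out = [1.0, -1.0]               # outside cone(A): y = (0, -1) separates it
print(is_farkas_certificate(A, b_out, [0.0, -1.0]))
```

For b_out the certificate y defines the hyperplane y^t z = 0, which keeps cone(A) on the side y^t z ≤ 0 while y^t b_out = 1 > 0.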
Application of Farkas Lemma: asset pricing in absence of arbitrage
Single period, m assets are traded, n possible states (scenarios).
Assets can be bought or sold “short”, i.e., with the promise to buy them back at the end.
Portfolio $y \in \mathbb{R}^m$, with $y_i$ = amount invested in asset $i$.
A negative value of $y_i$ indicates a “short” position.
If $p_i$ is the price of a unit of asset $i$ at the beginning, the portfolio cost is: $p^t y$
Consider $A \in \mathbb{R}^{m \times n}$ with $a_{ij}$ = value of one euro invested in asset $i$ if state $j$ occurs.
At the end all assets are sold (we receive a payoff of $a_{ij} y_i$) and the “short” positions are covered (we pay $a_{ij} |y_i|$ and hence have a payoff of $a_{ij} y_i$).
Portfolio value: $v^t = y^t A$
Absence of arbitrage condition: any portfolio with nonnegative values for all states must have a nonnegative cost.
Algebraically: $\nexists\, y$ such that $y^t A \ge 0^t$ and $p^t y < 0$.
According to Farkas Lemma:
No arbitrage opportunity exists ($\nexists\, y$ such that $y^t A \ge 0^t$ and $p^t y < 0$)
if and only if
∃ q ∈ Rn with q ≥ 0 such that the asset price vector p satisfies
Aq = p.
If the market is efficient, some “state prices” $q_j$ (one for each state $j$) exist.
N.B.: in general q is not unique.
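A two-asset, two-state market makes the equivalence concrete. The sketch below uses a hypothetical payoff matrix and hypothetical prices, and solves Aq = p by Cramer's rule, which is only possible because A is square and nonsingular here; in general q would come from an LP and need not be unique.

```python
def state_prices_2x2(A, p):
    """Solve A q = p for a nonsingular 2x2 payoff matrix A (Cramer's rule)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("A is singular")
    q1 = (p[0] * A[1][1] - A[0][1] * p[1]) / det
    q2 = (A[0][0] * p[1] - A[1][0] * p[0]) / det
    return [q1, q2]


def is_arbitrage(A, p, y, tol=1e-9):
    """True if y^t A >= 0^t (no loss in any state) and p^t y < 0 (negative cost)."""
    values = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    cost = sum(p_i * y_i for p_i, y_i in zip(p, y))
    return all(v >= -tol for v in values) and cost < -tol


# Rows are assets, columns are states (illustrative numbers).
A = [[1.0, 1.0],    # asset 1 is worth 1 in both states
     [2.0, 0.5]]    # asset 2 is worth 2 in state 1 and 0.5 in state 2

p_fair = [0.9, 1.05]
q_fair = state_prices_2x2(A, p_fair)    # nonnegative: no arbitrage

p_bad = [1.0, 0.4]
q_bad = state_prices_2x2(A, p_bad)      # has a negative component
y = [-1.0, 2.0]                         # short asset 1, long asset 2
print(q_fair, q_bad, is_arbitrage(A, p_bad, y))
```

With the second price vector the unique solution of Aq = p has a negative component, so no nonnegative state prices exist, and the portfolio y exhibits the arbitrage predicted by Farkas Lemma.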
2.2.2 Convex functions
Definitions:
A function $f : C \to \mathbb{R}$ defined on a convex set $C \subseteq \mathbb{R}^n$ is convex if
$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2) \quad \forall x_1, x_2 \in C$ and $\forall \alpha \in [0, 1]$,
$f$ is strictly convex if the inequality holds with $<$ for every $x_1, x_2 \in C$ with $x_1 \ne x_2$ and
every $\alpha \in (0, 1)$.
$f$ is concave if $-f$ is convex.
The epigraph of $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$, denoted by epi(f), is the subset of $\mathbb{R}^{n+1}$
epi(f) $= \{(x, y) \in S \times \mathbb{R} : f(x) \le y\}$.
Let $f : C \to \mathbb{R}$ be convex; the domain of $f$ is the subset of $\mathbb{R}^n$
dom(f) $= \{x \in C : f(x) < +\infty\}$.
Property:
Let C ⊆ Rn be a (nonempty) convex set and f : C → R be a convex function.
For each real $\beta$ (also for $+\infty$), the level sets
$L_\beta = \{x \in C : f(x) \le \beta\}$ and $\{x \in C : f(x) < \beta\}$
are convex subsets of $\mathbb{R}^n$.
The function $f$ is continuous in the relative interior (with respect to aff(C)) of its domain.
The function $f$ is convex if and only if epi(f) is a convex subset of $\mathbb{R}^{n+1}$ (exercise 1.3).
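The defining inequality of a convex function can be probed numerically. The sketch below looks for a violating triple (x1, x2, α) on a grid for functions of one variable; the function name, the grid and the tolerance are assumptions. Such a search can disprove convexity by exhibiting a counterexample, but finding none proves nothing.

```python
def find_convexity_violation(f, points, alphas, tol=1e-9):
    """Return (x1, x2, alpha) violating the convexity inequality, or None."""
    for x1 in points:
        for x2 in points:
            for a in alphas:
                mid = a * x1 + (1 - a) * x2
                if f(mid) > a * f(x1) + (1 - a) * f(x2) + tol:
                    return (x1, x2, a)
    return None


points = [i / 4 for i in range(-8, 9)]    # grid on [-2, 2]
alphas = [0.25, 0.5, 0.75]

print(find_convexity_violation(lambda x: x ** 2, points, alphas))
print(find_convexity_violation(abs, points, alphas))
print(find_convexity_violation(lambda x: x ** 3, points, alphas))
```

On this grid no violation is found for x^2 and |x|, while x^3 fails the inequality on the negative half-line, where it is concave.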
Optimal solution of convex problems
Consider $\min_{x \in C \subseteq \mathbb{R}^n} f(x)$ where $C \subseteq \mathbb{R}^n$ is a convex set and $f$ is a convex function.
Proposition:
If C ⊆ Rn is convex and f : C → R is convex, each local minimum of f on C is a global
minimum.
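If $f$ is strictly convex on $C$, then there exists at most one global minimum (there may be none, e.g., if the problem is unbounded).
Proof: Suppose $x'$ is a local minimum and $\exists\, x^* \in C$ such that $f(x^*) < f(x')$.
Since $f$ is convex,
$f(\alpha x' + (1 - \alpha) x^*) \le \alpha f(x') + (1 - \alpha) f(x^*) < f(x') \quad \forall \alpha \in (0, 1)$.
For $\alpha$ close to 1 the point $\alpha x' + (1 - \alpha) x^* \in C$ is arbitrarily close to $x'$, which contradicts the fact that $x'$ is a local minimum.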
If $f$ is strictly convex and $x_1^* \ne x_2^*$ are two global minima, the convexity of $C$ implies
$\tfrac{1}{2}(x_1^* + x_2^*) \in C$
and strict convexity of $f$ implies
$f(\tfrac{1}{2}(x_1^* + x_2^*)) < \tfrac{1}{2} f(x_1^*) + \tfrac{1}{2} f(x_2^*) = f(x_1^*)$.
Thus $x_1^*$ and $x_2^*$ cannot be two distinct global minima.
Characterizations of convex functions
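1) Proposition: A continuously differentiable function (of class $C^1$) $f : C \to \mathbb{R}$ defined on an open and nonempty convex set $C \subseteq \mathbb{R}^n$ is convex if and only if
$f(x) \ge f(\bar{x}) + \nabla^t f(\bar{x}) (x - \bar{x}) \quad \forall x, \bar{x} \in C$.
$f$ is strictly convex if and only if the inequality holds with $>$ for every pair $x, \bar{x} \in C$ with $x \ne \bar{x}$.
Definition: Directional derivative of $f$ at $\bar{x}$ in the direction $x - \bar{x}$: $\lim_{\alpha \to 0^+} \frac{f(\bar{x} + \alpha (x - \bar{x})) - f(\bar{x})}{\alpha} = \nabla^t f(\bar{x}) (x - \bar{x})$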
Geometric interpretation:
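The linear approximation of $f$ at $\bar{x}$ (1st order Taylor's expansion) bounds $f(x)$ from below, and
$H = \left\{ \binom{x}{y} \in \mathbb{R}^{n+1} : \left( \nabla^t f(\bar{x}) \;\; -1 \right) \binom{x}{y} = -f(\bar{x}) + \nabla^t f(\bar{x})\, \bar{x} \right\}$
is a supporting hyperplane of epi(f) at $(\bar{x}, f(\bar{x}))$, with epi(f) $\subseteq H^-$.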
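The first-order characterization says that the gap between f and any of its linearizations is never negative. As a minimal sketch, the code below evaluates this gap on an integer grid for a convex quadratic and for a saddle-shaped quadratic; both test functions, their hand-coded gradients and the helper name are assumptions chosen for illustration.

```python
def f(x):
    """Convex quadratic x1^2 + x1*x2 + x2^2."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2


def grad_f(x):
    return (2 * x[0] + x[1], x[0] + 2 * x[1])


def g(x):
    """Saddle x1^2 - x2^2 (not convex)."""
    return x[0] ** 2 - x[1] ** 2


def grad_g(x):
    return (2 * x[0], -2 * x[1])


def first_order_gap(func, grad, x, x_bar):
    """func(x) - [func(x_bar) + grad(x_bar)^t (x - x_bar)]."""
    gr = grad(x_bar)
    linear = func(x_bar) + sum(gi * (xi - bi) for gi, xi, bi in zip(gr, x, x_bar))
    return func(x) - linear


grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
min_gap_f = min(first_order_gap(f, grad_f, x, xb) for x in grid for xb in grid)
min_gap_g = min(first_order_gap(g, grad_g, x, xb) for x in grid for xb in grid)
print(min_gap_f, min_gap_g)
```

A nonnegative minimum gap is consistent with convexity on the sampled pairs, whereas a negative gap, as for the saddle, is a certificate that the function is not convex.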
2) Proposition: A twice continuously differentiable function (of class $C^2$) $f : C \to \mathbb{R}$
defined on an open and nonempty convex set $C \subseteq \mathbb{R}^n$ is convex if and only if the Hessian
matrix $\nabla^2 f(x) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)$ is positive semidefinite at every $x \in C$.
For $C^2$ functions, if $\nabla^2 f(x)$ is positive definite $\forall x \in C$ then $f$ is strictly convex.
N.B.: This condition is sufficient but not necessary: $f(x) = x^4$ is strictly convex but $f''(0) = 0$.
Definition: A symmetric $n \times n$ matrix $A$ is positive definite if $y^t A y > 0 \quad \forall y \in \mathbb{R}^n$ with $y \ne 0$.
A symmetric $n \times n$ matrix $A$ is positive semidefinite if $y^t A y \ge 0 \quad \forall y \in \mathbb{R}^n$.
Equivalent definitions: based on the sign of the eigenvalues/principal minors of $A$ or of the diagonal coefficients of specific factorizations of $A$ (e.g., Cholesky factorization).
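For a symmetric 2 × 2 matrix the principal-minor tests can be written out by hand. The sketch below is restricted to that case and uses hypothetical function names: positive definiteness only needs the leading principal minors to be positive, whereas positive semidefiniteness requires all principal minors (both diagonal entries and the determinant) to be nonnegative.

```python
def is_positive_definite_2x2(a, b, c):
    """Matrix [[a, b], [b, c]]: both leading principal minors are positive."""
    return a > 0 and a * c - b * b > 0


def is_positive_semidefinite_2x2(a, b, c):
    """Matrix [[a, b], [b, c]]: all principal minors are nonnegative."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0


# Hessian of x1^2 + x1*x2 + x2^2: positive definite, hence strictly convex.
print(is_positive_definite_2x2(2, 1, 2))
# Hessian of (x1 + x2)^2 / 2: semidefinite but not definite.
print(is_positive_semidefinite_2x2(1, 1, 1), is_positive_definite_2x2(1, 1, 1))
# Hessian of x1^2 - x2^2: indefinite.
print(is_positive_semidefinite_2x2(2, 0, -2))
# Leading minors alone (0 and 0) would wrongly accept this matrix.
print(is_positive_semidefinite_2x2(0, 0, -1))
```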
Subgradient of convex and concave functions
Convex/concave (continuous) functions that are not everywhere differentiable, e.g., $f(x) = |x|$.
Generalization of the concept of gradient for C1 functions to piecewise C1 functions.
Definitions: Let $C \subseteq \mathbb{R}^n$ be a convex set and $f : C \to \mathbb{R}$ be a convex function on $C$
• a vector $\gamma \in \mathbb{R}^n$ is a subgradient of $f$ at $\bar{x} \in C$ if
$f(x) \ge f(\bar{x}) + \gamma^t (x - \bar{x}) \quad \forall x \in C$,
• the subdifferential, denoted by $\partial f(\bar{x})$, is the set of all the subgradients of $f$ at $\bar{x}$.
Example: For $f(x) = x^2$, at $\bar{x} = 3$ the only subgradient is $\gamma = 6$.
Indeed, $0 \le (x - 3)^2 = x^2 - 6x + 9$ implies for every $x$:
$f(x) = x^2 \ge 6x - 9 = 9 + 6(x - 3) = f(\bar{x}) + 6(x - \bar{x})$
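The claim that γ = 6 is the only subgradient of x^2 at 3 can be illustrated by testing the subgradient inequality on a grid. The helper name, the grid and the tolerance in the sketch below are assumptions; a sampled check can rule out a candidate γ but does not prove that a candidate is valid.

```python
def is_subgradient(f, x_bar, gamma, sample, tol=1e-9):
    """Check f(x) >= f(x_bar) + gamma * (x - x_bar) on a sample of points."""
    return all(f(x) >= f(x_bar) + gamma * (x - x_bar) - tol for x in sample)


def square(x):
    return x * x


sample = [i / 100 for i in range(0, 601)]   # grid on [0, 6] with step 0.01

for gamma in (5.9, 6.0, 6.1):
    print(gamma, is_subgradient(square, 3.0, gamma, sample))
```

Slopes slightly different from 6 fail at points close to 3 (just to the left for a smaller slope, just to the right for a larger one), which is why the grid has to be fine near the point of interest.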
Other examples:
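1) For $f(x) = |x|$ it is clear that: $\gamma = 1$ if $x > 0$, $\gamma = -1$ if $x < 0$, $\partial f(x) = [-1, 1]$ if $x = 0$.
2) Consider $f(x) = \min\{f_1(x), f_2(x)\}$ with $f_1(x) = 4 - |x|$ and $f_2(x) = 4 - (x - 2)^2$.
Here $f$ is concave, so the subgradient inequality is reversed: $f(x) \le f(\bar{x}) + \gamma (x - \bar{x})$ for every $x$.
Since $f_2(x) \ge f_1(x)$ for $1 \le x \le 4$,
$f(x) = 4 - x$ if $1 \le x \le 4$, and $f(x) = 4 - (x - 2)^2$ otherwise,
$\gamma = -1$ for $x \in (1, 4)$,
$\gamma = -2(x - 2)$ for $x < 1$ or $x > 4$,
$\gamma \in [-1, 2]$ at $x = 1$,
$\gamma \in [-4, -1]$ at $x = 4$.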
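Both examples can be checked on a grid. The sketch below tests the subgradient inequality for |x| at 0 and, since the second function is concave, the reversed inequality at its two kinks x = 1 and x = 4. The helper names, the grid and the tolerance are assumptions, and the check is a sampled one.

```python
def is_subgradient(f, x_bar, gamma, sample, tol=1e-9):
    """Convex case: f(x) >= f(x_bar) + gamma * (x - x_bar) on the sample."""
    return all(f(x) >= f(x_bar) + gamma * (x - x_bar) - tol for x in sample)


def is_supergradient(f, x_bar, gamma, sample, tol=1e-9):
    """Concave case: f(x) <= f(x_bar) + gamma * (x - x_bar) on the sample."""
    return all(f(x) <= f(x_bar) + gamma * (x - x_bar) + tol for x in sample)


def f(x):
    """Example 2: pointwise minimum of two concave functions."""
    return min(4 - abs(x), 4 - (x - 2) ** 2)


sample = [i / 100 for i in range(-300, 701)]   # grid on [-3, 7]

# Example 1: every slope in [-1, 1] works for |x| at 0, while 1.5 does not.
print([is_subgradient(abs, 0.0, g, sample) for g in (-1.0, 0.0, 0.5, 1.0, 1.5)])
# Example 2 at x = 1: slopes in [-1, 2] work, 3 and -1.5 do not.
print([is_supergradient(f, 1.0, g, sample) for g in (-1.0, 0.0, 2.0, 3.0, -1.5)])
# Example 2 at x = 4: slopes in [-4, -1] work, -0.5 does not.
print([is_supergradient(f, 4.0, g, sample) for g in (-4.0, -2.0, -1.0, -0.5)])
```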
Properties:
1) A convex function $f : C \to \mathbb{R}$ admits at least one subgradient at every interior point $\bar{x}$ of $C$. In particular, if $\bar{x} \in$ int(C) then there exists $\gamma \in \mathbb{R}^n$ such that
$H = \{(x, y) \in \mathbb{R}^{n+1} : y = f(\bar{x}) + \gamma^t (x - \bar{x})\}$
is a supporting hyperplane of epi(f) at $(\bar{x}, f(\bar{x}))$.
N.B.: The existence of (at least) one subgradient at every point of int(C), with $C$ convex, is a necessary and sufficient condition for $f$ to be convex on int(C).
2) If $f$ is a convex function and $\bar{x} \in$ int(C), $\partial f(\bar{x})$ is a nonempty, convex, closed and bounded set.
3) $x^*$ is a (global) minimum of a convex function $f : C \to \mathbb{R}$ if and only if $0 \in \partial f(x^*)$.