IE 5531: Engineering Optimization I
Lecture 14: Unconstrained optimization
Prof. John Gunnar Carlsson
October 27, 2010
Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I October 27, 2010 1 / 21
Administrivia
Midterms returned 11/01
11/01 office hours moved
PS5 posted this evening
Recap: Applications of KKT conditions
Applications of KKT conditions:
Portfolio optimization
Public good allocation
Communication channel power allocation (water-filling)
Fisher's exchange market
Today
Algorithms for unconstrained minimization:
Introduction
Bisection search
Golden section search
Line search
Wolfe, Goldstein conditions
Gradient method (steepest descent)
Introduction
Today's lecture is focused on solving the unconstrained problem
minimize f (x)
for x ∈ Rn
Ideally, we would like to find a global minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x ∈ Rn
In general, as we have seen with the KKT conditions, we have to settle for a local minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x in a local neighborhood N(x∗)
If f (x) is convex, these two notions are the same
Necessary and sufficient conditions
If x∗ is a local minimizer, then there must be no descent direction, i.e. a direction d such that ∇f(x∗)^T d < 0
This immediately implies that ∇f(x∗) = 0
We also need to distinguish between local maximizers and local minimizers, so we also require that H ⪰ 0, where hij = ∂²f(x∗)/∂xi∂xj
The stronger condition H ≻ 0 is a sufficient condition for x∗ to be a minimizer
Again, if f(x) is convex (and continuously differentiable), then ∇f(x∗) = 0 is a necessary and sufficient condition for a global minimizer
Overview
Optimization algorithms tend to be iterative procedures:
Starting at a given point x0, they generate a sequence {xk} of iterates
This sequence terminates either when no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily
At any given iterate xk, we generally want xk+1 to satisfy f(xk+1) < f(xk)
Furthermore, we want our sequence to converge to a local minimizer x∗
The general approach is a line search:
At any given iterate xk , choose a direction dk , and then set
xk+1 = xk + αkdk for some scalar αk > 0
Convergent sequences
Definition
Let {xk} be a sequence of real numbers. Then {xk} converges to x∗ if and only if for all real numbers ε > 0, there exists a positive integer K such that |xk − x∗| < ε for all k ≥ K.
Examples of convergence:
xk = 1/k
xk = (1/2)^k
xk = (1/log(k + 1))^k
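These three examples can be verified numerically; a minimal sketch (the sample index k = 2000 is an arbitrary cutoff we chose):

```python
import math

# Numerical check of the three convergence examples: each sequence
# tends to the limit x* = 0, so eventually |x_k| stays below any eps.
seqs = [
    lambda k: 1 / k,
    lambda k: 0.5 ** k,
    lambda k: (1 / math.log(k + 1)) ** k,
]
eps = 1e-3
tails = [abs(x(2000)) < eps for x in seqs]   # all three are within eps of 0
```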
Searching in one variable: root-�nding
Intermediate value theorem: given a continuous single-variable function f(x) and a pair of points xℓ and xr such that f(xℓ) < 0 and f(xr) > 0, there exists a point x∗ ∈ [xℓ, xr] such that f(x∗) = 0
A simpler question to motivate: how can we find x∗ (or a point within ε of x∗)?
Bisection
1 Choose xmid = (xℓ + xr)/2 and evaluate f(xmid)
2 If f(xmid) = 0, then x∗ = xmid and we're done
3 Otherwise,
1 If f(xmid) < 0, then set xℓ = xmid
2 If f(xmid) > 0, then set xr = xmid
4 If xr − xℓ < ε, we're done; otherwise, go to step 1
The algorithm above divides the search interval in half at every iteration; thus, to approximate x∗ within ε we require at most log2((xr − xℓ)/ε) iterations
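The steps above translate directly into Python; a minimal sketch, assuming f is continuous with f(xℓ) < 0 < f(xr) as in the intermediate value theorem setup:

```python
# Bisection for root-finding: halves [xl, xr] until it is shorter than eps.
def bisect(f, xl, xr, eps=1e-8):
    while xr - xl >= eps:
        xmid = (xl + xr) / 2
        fmid = f(xmid)
        if fmid == 0:
            return xmid      # exact root found
        elif fmid < 0:
            xl = xmid        # root lies in [xmid, xr]
        else:
            xr = xmid        # root lies in [xl, xmid]
    return (xl + xr) / 2

# Example: the root of f(x) = x^2 - 2 on [0, 2] is sqrt(2).
root = bisect(lambda x: x * x - 2, 0.0, 2.0)
```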
Golden section search
Consider a unimodal function f(x) defined on an interval [xℓ, xr]
Unimodal: f(x) has only one local minimizer x∗ in [xℓ, xr]
How can we find x∗ (or a point within ε of x∗)?
Hint: we can do this without derivatives
Hint: we need to sample two points x′ℓ, x′r in [xℓ, xr]
Golden section search
Assume without loss of generality that xℓ = 0 and xr = 1; set ψ = (3 − √5)/2
1 Set x′ℓ = ψ and x′r = 1 − ψ.
2 If f(x′ℓ) < f(x′r), then the minimizer must lie in the interval [xℓ, x′r], so set xr = x′r
3 Otherwise, the minimizer must lie in the interval [x′ℓ, xr], so set xℓ = x′ℓ
4 If xr − xℓ < ε, we're done; otherwise, go to step 1
By setting ψ = (3 − √5)/2 we decrease the search interval by a constant factor 1 − ψ ≈ 0.618
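A minimal Python sketch of the procedure on a general interval. For simplicity this version evaluates both interior points at every iteration; the classical method reuses one of the two evaluations per step:

```python
PSI = (3 - 5 ** 0.5) / 2              # psi = (3 - sqrt(5))/2 ≈ 0.382

# Golden-section search for a unimodal f on [xl, xr]: the interval
# shrinks by the constant factor 1 - psi ≈ 0.618 each iteration.
def golden_section(f, xl, xr, eps=1e-8):
    while xr - xl >= eps:
        xlp = xl + PSI * (xr - xl)    # interior points x'_l < x'_r
        xrp = xr - PSI * (xr - xl)
        if f(xlp) < f(xrp):
            xr = xrp                  # minimizer lies in [xl, x'_r]
        else:
            xl = xlp                  # minimizer lies in [x'_l, xr]
    return (xl + xr) / 2

# Example: unimodal f(x) = (x - 1)^2 on [0, 3] has minimizer x* = 1.
xstar = golden_section(lambda x: (x - 1) ** 2, 0.0, 3.0)
```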
Line search: step length
Consider the multi-dimensional problem
minimize f (x)
for x ∈ Rn
At each iteration xk we set dk = −∇f(xk) and set xk+1 = xk + αk dk, for appropriately chosen αk
Ideally, we would like for αk to be the minimizer of the univariate function
φ(α) := f(xk + αdk)
but this is time-consuming
In the big picture, we want αk to give us a sufficient reduction in f(x), without spending too much time on it
Two conditions we can impose are the Wolfe and Goldstein conditions
Armijo condition
Clearly the step length αk should guarantee a sufficient decrease in f(x), so we require
φ(α) = f(xk + αdk) ≤ f(xk) + c1 α∇f(xk)^T dk
with c1 ∈ (0, 1)
The right-hand side is linear in α
Note that this is satisfied for all α that are sufficiently small
In practice, we often set c1 ≈ 10^−4
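A common way to use the Armijo condition is a backtracking line search: start from a trial step and shrink until the condition holds. A minimal sketch (the shrink factor β = 0.5 and the function names are our own choices, not fixed by the slides):

```python
# Backtracking line search enforcing the Armijo sufficient-decrease condition.
def backtracking_armijo(f, grad_f, x, d, c1=1e-4, beta=0.5):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    alpha = 1.0
    fx, slope = f(x), dot(grad_f(x), d)          # phi(0) and phi'(0)
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c1 * alpha * slope:
        alpha *= beta                            # shrink until sufficient decrease
    return alpha

# Example: f(x) = x1^2 + x2^2 at x = (1, 1) with d = -grad f(x).
f = lambda x: x[0] ** 2 + x[1] ** 2
grad = lambda x: [2 * x[0], 2 * x[1]]
x0 = [1.0, 1.0]
d0 = [-g for g in grad(x0)]
alpha = backtracking_armijo(f, grad, x0, d0)     # alpha = 0.5 here
```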
Curvature condition
The preceding condition is not sufficient because an arbitrarily small α satisfies it, which means that {xk} may not converge to a minimizer
One way to get around this is to impose the additional condition
φ′(α) = ∇f(xk + αdk)^T dk ≥ c2 ∇f(xk)^T dk
where c2 ∈ (c1, 1)
This condition just says that the slope of φ at α has to be at least c2 times the slope at φ(0)
Typically we choose c2 ≈ 0.9
If the slope at φ(α) were really small, it would mean that our step size wasn't chosen very well (we could continue in that direction and decrease the function)
The Armijo condition and the curvature condition, when combined, are called the Wolfe conditions
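The combined Wolfe conditions can be checked directly for a candidate step; a minimal sketch (function names are ours):

```python
# Check both Wolfe conditions for step alpha: Armijo (sufficient decrease)
# plus the curvature condition phi'(alpha) >= c2 * phi'(0).
def wolfe_ok(f, grad_f, x, d, alpha, c1=1e-4, c2=0.9):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope0 = dot(grad_f(x), d)                        # phi'(0), negative for a descent direction
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0   # sufficient decrease
    curvature = dot(grad_f(x_new), d) >= c2 * slope0  # slope has flattened enough
    return armijo and curvature

# Example: f(x) = x1^2 + x2^2 at x = (1, 1), steepest-descent direction.
f = lambda x: x[0] ** 2 + x[1] ** 2
grad = lambda x: [2 * x[0], 2 * x[1]]
x0, d0 = [1.0, 1.0], [-2.0, -2.0]
ok_half = wolfe_ok(f, grad, x0, d0, 0.5)     # exact minimizer along d: both hold
ok_tiny = wolfe_ok(f, grad, x0, d0, 1e-6)    # rejected by the curvature condition
```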
Goldstein conditions
An alternative to the Wolfe conditions is the Goldstein conditions:
f(xk) + (1 − c)α∇f(xk)^T dk ≤ f(xk + αdk) ≤ f(xk) + cα∇f(xk)^T dk
with c ∈ (0, 1/2)
The second inequality is just the sufficient decrease condition
The first inequality bounds the step length from below
One disadvantage is that the local minimizers of φ(α) may be excluded in this search
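The Goldstein conditions can be checked the same way; a minimal sketch (the choice c = 0.25 and the function names are ours):

```python
# Check the Goldstein conditions for step alpha: the upper bound is
# sufficient decrease, the lower bound rules out overly small steps.
def goldstein_ok(f, grad_f, x, d, alpha, c=0.25):
    dot = sum(g * di for g, di in zip(grad_f(x), d))   # grad f(x)^T d < 0
    fnew = f([xi + alpha * di for xi, di in zip(x, d)])
    lower = f(x) + (1 - c) * alpha * dot               # step-length lower bound
    upper = f(x) + c * alpha * dot                     # sufficient decrease
    return lower <= fnew <= upper

# Example: f(x) = x1^2 + x2^2 at x = (1, 1), d = -grad f(x).
f = lambda x: x[0] ** 2 + x[1] ** 2
grad = lambda x: [2 * x[0], 2 * x[1]]
ok_quarter = goldstein_ok(f, grad, [1.0, 1.0], [-2.0, -2.0], 0.25)
ok_tiny = goldstein_ok(f, grad, [1.0, 1.0], [-2.0, -2.0], 1e-6)   # too small
```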
Steepest (gradient) descent example
Recall that in the method of steepest descent, we set dk = −∇f (xk)
Consider the case where we want to minimize
f(x) = c^T x + (1/2) x^T Qx
where Q is a symmetric positive definite matrix
Clearly, the unique minimizer lies where ∇f(x∗) = 0, which occurs precisely when
Qx = −c
The descent direction will be d = −∇f (x) = − (c + Qx)
Steepest descent example
The iteration scheme
xk+1 = xk + αk dk
is given by
xk+1 = xk − αk(c + Qxk)
We need to choose a step size αk, so we consider
φ(α) = f(xk − α(c + Qxk))
Steepest descent example
Note that we don't even need the Wolfe or Goldstein conditions, as we can find the optimal α analytically!
φ(α) = f(xk − α(c + Qxk))
= c^T(xk − α(c + Qxk)) + (1/2)(xk − α(c + Qxk))^T Q (xk − α(c + Qxk))
Since φ(α) is a strictly convex quadratic function in α, it is not hard to see that its minimizer occurs where
c^T dk + xk^T Q dk + α dk^T Q dk = 0
and thus we set
αk = (dk^T dk)/(dk^T Q dk)
with dk = −(c + Qxk)
Steepest descent example
The recursion for the steepest descent method is therefore
xk+1 = xk + (dk^T dk / dk^T Q dk) dk
with dk = −(c + Qxk)
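This recursion takes only a few lines of Python; a minimal sketch on a small instance (the 2×2 matrix Q and vector c below are our own example):

```python
# Steepest descent with the exact step alpha_k = (d^T d)/(d^T Q d)
# for f(x) = c^T x + (1/2) x^T Q x with Q symmetric positive definite.
def matvec(Q, v):
    return [sum(qij * vj for qij, vj in zip(row, v)) for row in Q]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(Q, c, x, iters=100):
    for _ in range(iters):
        d = [-(ci + Qxi) for ci, Qxi in zip(c, matvec(Q, x))]  # d = -(c + Qx)
        denom = dot(d, matvec(Q, d))
        if denom == 0:                     # gradient is zero: at the minimizer
            break
        alpha = dot(d, d) / denom          # exact line-search step
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# The minimizer solves Qx = -c, i.e. x* = (1, 1) for this instance.
Q = [[2.0, 0.0], [0.0, 4.0]]
c = [-2.0, -4.0]
x = steepest_descent(Q, c, [0.0, 0.0])
```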
Convergence of steepest descent
Theorem
Let f(x) be a given continuously differentiable function. Let x0 ∈ Rn be a point for which the sub-level set
X0 = {x ∈ Rn : f(x) ≤ f(x0)}
is bounded. Let {xk} be a sequence of points generated by the steepest descent method initiated at x0, using either the Wolfe or Goldstein line search conditions. Then {xk} converges to a stationary point of f(x).
The above theorem gives what is called the global convergence property of the steepest-descent method
No matter how far away x0 is, the steepest descent method must converge to a stationary point
The steepest descent method may, however, be very slow to reach that point