
Unconstrained Univariate Optimization

Univariate optimization means optimization of a scalar function of a single variable:

y = P(x)

These optimization methods are important for a variety of reasons:

1) there are many instances in engineering when we want to find the optimum of functions such as these (e.g. optimum reactor temperature, etc.),

2) almost all multivariable optimization methods in commercial use today contain a line search step in their algorithm,

3) they are easy to illustrate and many of the fundamental ideas are directly carried over to multivariable optimization.

Of course these discussions will be limited to nonlinear functions, but before discussing optimization methods we need some mathematical background.


Continuity

In this course we will limit our discussions to continuous functions.

Functions can be continuous but their derivatives may not be.

A function P(x) is continuous at a point $x_0$ iff:

$$P(x_0) \text{ exists, and} \quad \lim_{x \to x_0^-} P(x) = \lim_{x \to x_0^+} P(x) = P(x_0)$$

[Figure: sketches of functions that are continuous or discontinuous at x_0, and of functions whose derivatives are continuous or discontinuous at x_0.]


Some discontinuous functions include:
• price of processing equipment,
• manpower costs,
• measurements.

Many of the functions we deal with in engineering are either continuous or can be approximated as continuous functions. However, there is a growing interest in solving optimization problems with discontinuities.

Convexity of Functions

In most optimization problems we require functions which are either convex or concave over some region (preferably globally):

[Figure: a concave function P(x) and a convex function P(x), each sketched over an interval from x_a to x_b.]


This can be expressed mathematically as:

i) for a convex function, for all $x_a$, $x_b$ and $\gamma \in [0,1]$:

$$P\big(\gamma x_a + (1-\gamma)x_b\big) \le \gamma P(x_a) + (1-\gamma)P(x_b)$$

ii) for a concave function, for all $x_a$, $x_b$ and $\gamma \in [0,1]$:

$$P\big(\gamma x_a + (1-\gamma)x_b\big) \ge \gamma P(x_a) + (1-\gamma)P(x_b)$$

Convexity is a general idea used in optimization theory which, when applied to a function, describes the property of having an optimum. Convexity combines both stationarity and curvature into a single concept. Often both convex and concave functions are described as exhibiting the property of convexity.

These conditions are very inconvenient for testing a specific function, so we tend to test the derivatives of the function at selected points. First derivatives give us a measure of the rate of change of the function. Second derivatives give us a measure of curvature, or the rate of change of the first derivatives.
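As a quick numerical illustration of this definition (a sketch, not part of the original notes; the sample functions, interval and tolerance are assumptions):

```python
import random

# Spot-check the convexity inequality
#   P(g*xa + (1-g)*xb) <= g*P(xa) + (1-g)*P(xb)
# at randomly chosen points; passing is evidence, not a proof, of convexity.
def looks_convex(P, lo, hi, trials=10000):
    for _ in range(trials):
        xa, xb = random.uniform(lo, hi), random.uniform(lo, hi)
        g = random.random()
        if P(g*xa + (1-g)*xb) > g*P(xa) + (1-g)*P(xb) + 1e-12:
            return False                 # found a violating triple (xa, xb, g)
    return True

print(looks_convex(lambda x: x**2 - 2*x + 1, -5.0, 5.0))            # True: convex
print(looks_convex(lambda x: x**3 - 2*x**2 - 5*x + 6, -5.0, 5.0))   # False: not convex on this interval
```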


Notice for a convex function:

The first derivatives increase (become more positive) as x increases. Thus, in the region we are discussing, the second derivative is positive. Since the first derivative starts out negative and becomes positive in the region of interest, there is some point where the function no longer decreases in value as x increases, and this point is the function's minimum. This is another way of stating that the point x* is a (local) minimum of the function P(x) iff:

$$P(x^*) \le P(x) \qquad \forall\; x \in [a, b]$$

If this can be stated as a strict inequality, x* is said to be a unique (local) minimum.

[Figure: a convex P(x) with dP/dx < 0 to the left of the minimum, dP/dx = 0 at the minimum, and dP/dx > 0 to the right.]


A similar line of argument can be followed for concave functions, with the exception that: the first derivatives decrease (become more negative) with increasing x, the second derivative is negative and, as a result, the optimum is a maximum.

The problem with these ideas is that we can only calculate the derivatives at a point. So looking for an optimum using these fundamental ideas would be infeasible, since a very large number of points (theoretically all of the points) in the region of interest must be checked.

Necessary and Sufficient Conditions for an Optimum

Recall that for a twice continuously differentiable function P(x), the point x* is an optimum iff:

$$\left.\frac{dP}{dx}\right|_{x^*} = 0 \qquad \text{(stationarity)}$$

and:

$$\left.\frac{d^2P}{dx^2}\right|_{x^*} > 0 \qquad \text{(a minimum)}$$

$$\left.\frac{d^2P}{dx^2}\right|_{x^*} < 0 \qquad \text{(a maximum)}$$


What happens when the second derivative is zero? Consider the function:

$$y = x^4$$

which we know to have a minimum at x = 0.

[Figure: plot of y = x^4, showing the flat minimum at x = 0.]

At the point x = 0:

$$\left.\frac{dy}{dx}\right|_{x=0} = \left.4x^3\right|_{x=0} = 0 \qquad \text{(stationarity)}$$

$$\left.\frac{d^2y}{dx^2}\right|_{x=0} = \left.12x^2\right|_{x=0} = 0 \qquad \text{(a minimum?)}$$

$$\left.\frac{d^3y}{dx^3}\right|_{x=0} = \left.24x\right|_{x=0} = 0$$

$$\left.\frac{d^4y}{dx^4}\right|_{x=0} = 24 > 0$$
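A quick symbolic check of these derivatives (a sketch using sympy; the library choice is an assumption, not part of the notes):

```python
import sympy as sp

# Successive derivatives of y = x**4 evaluated at the stationary point x = 0.
x = sp.symbols('x')
y = x**4
for n in range(1, 5):
    print(n, sp.diff(y, x, n).subs(x, 0))   # prints 0, 0, 0, 24: the first non-zero derivative is the (even) fourth
```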


We need to add something to our optimality conditions. If at the stationary point x* the second derivative is zero, we must check the higher-order derivatives. Thus, if the first non-zero derivative is odd, that is:

$$\left.\frac{d^nP}{dx^n}\right|_{x^*} \neq 0 \quad \text{where } n \in \{3, 5, 7, \ldots\}$$

then x* is an inflection point. Otherwise, if the first higher-order, non-zero derivative is even:

$$\left.\frac{d^nP}{dx^n}\right|_{x^*} > 0 \quad \text{where } n \in \{4, 6, 8, \ldots\} \qquad \text{(a minimum)}$$

$$\left.\frac{d^nP}{dx^n}\right|_{x^*} < 0 \quad \text{where } n \in \{4, 6, 8, \ldots\} \qquad \text{(a maximum)}$$

Some Examples:

1) $\min_x \; ax^2 + bx + c$

stationarity: $\dfrac{dP}{dx} = 2ax + b = 0 \;\Rightarrow\; x^* = -\dfrac{b}{2a}$

curvature: $\dfrac{d^2P}{dx^2} = 2a \quad \begin{cases} a > 0 & \Rightarrow \text{minimum} \\ a < 0 & \Rightarrow \text{maximum} \end{cases}$
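A short symbolic check of Example 1 (a minimal sketch using sympy; the library choice and variable names are illustrative assumptions):

```python
import sympy as sp

# Verify the stationarity and curvature results for P(x) = a*x**2 + b*x + c.
x, a, b, c = sp.symbols('x a b c', real=True)
P = a*x**2 + b*x + c

x_star = sp.solve(sp.diff(P, x), x)[0]   # stationarity: dP/dx = 2ax + b = 0
print(x_star)                            # -b/(2*a)
print(sp.diff(P, x, 2))                  # curvature: d2P/dx2 = 2a (sign of a decides min vs max)
```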


2) $\min_x \; P(x) = x^3 - 2x^2 - 5x + 6$

stationarity: $\dfrac{dP}{dx} = 3x^2 - 4x - 5 = 0 \;\Rightarrow\; x^* = \dfrac{2 \pm \sqrt{19}}{3}$

curvature: $\dfrac{d^2P}{dx^2} = 6x - 4 \quad \begin{cases} 2\sqrt{19} > 0 & \text{at } x^* = \dfrac{2 + \sqrt{19}}{3} \;\Rightarrow\; \text{minimum} \\ -2\sqrt{19} < 0 & \text{at } x^* = \dfrac{2 - \sqrt{19}}{3} \;\Rightarrow\; \text{maximum} \end{cases}$

[Figure: plot of P(x) = x^3 - 2x^2 - 5x + 6, showing the local maximum and local minimum at the two stationary points.]

3) $\min_x \; (x - 1)^n$
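A numerical cross-check of Example 2 (a sketch; numpy is an assumed tool, not part of the notes):

```python
import numpy as np

# Stationary points of P(x) = x**3 - 2*x**2 - 5*x + 6 are the roots of dP/dx,
# classified with the sign of the second derivative d2P/dx2 = 6x - 4.
dP_coeffs = [3, -4, -5]                          # dP/dx = 3x^2 - 4x - 5
for x_star in np.roots(dP_coeffs):
    kind = "minimum" if 6*x_star - 4 > 0 else "maximum"
    print(x_star, kind)   # (2+sqrt(19))/3 ~ 2.12 -> minimum, (2-sqrt(19))/3 ~ -0.79 -> maximum
```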


4) A more realistic example. Consider the situation where we have two identical settling ponds in a wastewater treatment plant:

[Figure: two settling ponds in series, each of volume V and density ρ; the feed F, c_i enters the first pond and the intermediate stream F, c_1 flows from the first pond to the second.]

These are normally operated at a constant throughput flow of F, with an incoming concentration of contaminants c_i. The system is designed so that at nominal operating conditions the outgoing concentration c_o is well below the required levels. However, we know that twice a day (~7 am and 7 pm) the incoming concentration of contaminants to this section of the plant dramatically increases above these nominal levels. What we would like to determine is whether the peak outgoing levels will remain within acceptable limits.

What assumptions can we make?
1) well mixed,
2) constant density,
3) constant flowrate.


contaminant balance:

first pond: $\dfrac{d(Vc_1)}{dt} = Fc_i - Fc_1$

second pond: $\dfrac{d(Vc_o)}{dt} = Fc_1 - Fc_o$

Let the residence time be $\tau = \dfrac{V}{F}$. Then the balances become:

first pond: $\tau\dfrac{dc_1}{dt} = c_i - c_1$

second pond: $\tau\dfrac{dc_o}{dt} = c_1 - c_o$

We wish to know how changes in $c_i$ will affect the outlet concentration $c_o$, so define perturbation variables around the nominal operating point and take Laplace transforms:

first pond: $\tau s\,C_1(s) = C_i(s) - C_1(s)$

second pond: $\tau s\,C_o(s) = C_1(s) - C_o(s)$

Combining and simplifying into a single transfer function:

$$C_o(s) = \frac{1}{(\tau s + 1)^2}\,C_i(s)$$


If we represent the concentration disturbances as ideal impulses:

$$C_i(s) = k$$

then the transform of the outlet concentration becomes:

$$C_o(s) = \frac{k}{(\tau s + 1)^2}$$

In the time domain this is:

$$c_o(t) = k\,t\,e^{-t/\tau}$$

[Figure: c_o(t) versus t, rising to a peak value c_o^* at time t^*.]

Since we want to determine the peak outlet concentration (for k > 0), we solve:

$$\max_t \; c_o(t) = k\,t\,e^{-t/\tau}$$

stationarity: $\dfrac{dc_o}{dt} = k\,e^{-t/\tau}\left(1 - \dfrac{t}{\tau}\right) = 0 \;\Rightarrow\; t^* = \tau$

curvature: $\left.\dfrac{d^2c_o}{dt^2}\right|_{t^*} = k\,e^{-t/\tau}\left(\dfrac{t}{\tau^2} - \dfrac{2}{\tau}\right)\bigg|_{t=\tau} = -\dfrac{k}{\tau}\,e^{-1} < 0 \;\Rightarrow\; \text{a maximum}$
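A quick numerical confirmation of the peak location (a minimal sketch; scipy and the particular τ value are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

tau = 2.0   # illustrative residence time
# Maximize c_o(t) = k*t*exp(-t/tau) by minimizing its negative; k > 0 only scales the objective.
res = minimize_scalar(lambda t: -t*np.exp(-t/tau), bounds=(0.0, 10.0*tau), method='bounded')
print(res.x)   # ~ 2.0, i.e. the maximum occurs at t* = tau
```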


Finally, the maximum outlet concentration is:

$$c_o(t^*) = k\,\tau\,e^{-1} \approx 0.368\,k\tau \qquad \text{at } t^* = \tau = \frac{V}{F}$$

This example serves to highlight some difficulties with the analytical approach. These include:

→ determining when the first derivative goes to zero. This requires the solution of a nonlinear equation, which is often as difficult as the solution of the original optimization problem,

→ computation of the appropriate higher-order derivatives,

→ evaluation of the higher-order derivatives at the appropriate points.

We will need more efficient methods for optimization of complex nonlinear functions. Existing numerical methods can be classified into two broad types:

1) those that rely solely on the evaluation of the objective function,

2) those that use derivatives (or approximations of the derivatives) of the objective function.

= #$

Interval Elimination Methods

These methods work by systematically reducing the size of the interval within which the optimum is believed to exist. Since we are only ever working with an interval, we can never find the optimum exactly. There are many interval elimination algorithms; however, they all consist of the same steps:

1) determine an interval in which the optimum is believed to exist,
2) reduce the size of the interval,
3) check for convergence,
4) repeat steps 2 & 3 until converged.

We will examine only one interval elimination method, called the "Golden Section Search".

Scanning & Bracketing

As an example, consider that our objective is to minimize a function P(x). In order to start the optimization procedure we need to determine the upper and lower bounds for an interval in which we believe an optimum will exist. These could be established by:

• physical limits,
• scanning:
  - choosing a starting point,
  - checking the function initially decreases,
  - finding another point of higher value.


Golden Section Search

Consider that we want to break up the interval, in which we believe the optimum exists, into two regions:

[Figure: interval from a to b of total length L, divided at x into segments l_1 and l_2.]

where:

$$\frac{l_1}{l_2} = \frac{l_2}{L} = r$$

Since $L = l_1 + l_2$, $l_2 = rL$ and $l_1 = r^2L$, then:

$$L = rL + r^2L \qquad\text{or:}\qquad r^2 + r - 1 = 0$$

There are two solutions for this quadratic equation. We are only interested in the positive root:

$$r = \frac{-1 + \sqrt{5}}{2} = 0.61803398\ldots$$

which has the interesting property (the Golden ratio):

$$r = \frac{1}{1 + r}$$


We will use this Golden ratio to break up our interval as follows:

[Figure: interval from a^k to b^k of length L, with interior points x_1^k and x_2^k each placed a distance rL from one end.]

By breaking up the interval in this way we will be able to limit ourselves to one simple calculation of an x_i and one objective function evaluation per iteration.

The optimization algorithm is:
1) using the interval limits a^k and b^k, determine x_1^k and x_2^k,
2) evaluate P(x_1^k) and P(x_2^k),
3) eliminate the part of the interval in which the optimum is not located,
4) repeat steps 1 through 3 until the desired accuracy is obtained.

How can we decide which part of the interval to eliminate?


There are only three possible situations. For the minimization case, consider:

1) P(x_1^k) > P(x_2^k): eliminate the interval to the left of x_1^k. For a convex function, all values to the right of x_1^k are less than those on the left.

2) P(x_1^k) < P(x_2^k): eliminate the interval to the right of x_2^k. For a convex function, all values to the left of x_2^k are less than those on the right.

3) P(x_1^k) = P(x_2^k): we know that the optimum is between x_1^k and x_2^k, so we could eliminate the intervals to the left of x_1^k and to the right of x_2^k. In practice we eliminate only one region, for computational ease.

[Figure: three sketches of P(x) with the test points x_1^k and x_2^k, one for each of the three cases.]


Graphically the algorithm is:

[Figure: successive iterations showing the interval [a^k, b^k] shrinking to [a^{k+1}, b^{k+1}]; at each iteration one end region is eliminated and one of the interior points is reused.]

Three things worth noting:

1) only one new point needs to be evaluated during each iteration,

2) the length of the remaining interval after k iterations (or the accuracy to which you know the optimum value of x) is:

$$L^k = r^kL^0 = (0.618034)^k\,L^0$$

3) the new point for the current iteration is:

$$x_1^k = a^k + b^k - x_2^k \qquad\text{or:}\qquad x_2^k = a^k + b^k - x_1^k$$
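Because the bookkeeping (reuse one interior point, evaluate only one new point per iteration) is easy to get wrong, here is a minimal Python sketch of the Golden Section Search; the function and parameter names are illustrative assumptions, not from the notes:

```python
import math

def golden_section(P, a, b, tol=1e-5, max_iter=200):
    """Minimize a univariate function P on the initial interval [a, b]."""
    r = (math.sqrt(5) - 1) / 2           # Golden ratio r = 0.618034...
    x1 = b - r * (b - a)                 # interior points, each a distance r*L from one end
    x2 = a + r * (b - a)
    P1, P2 = P(x1), P(x2)
    while b - a > tol and max_iter > 0:
        max_iter -= 1
        if P1 > P2:                      # optimum cannot lie to the left of x1
            a, x1, P1 = x1, x2, P2       # old x2 becomes the new x1
            x2 = a + r * (b - a)         # only one new point and one new evaluation
            P2 = P(x2)
        else:                            # optimum cannot lie to the right of x2
            b, x2, P2 = x2, x1, P1       # old x1 becomes the new x2
            x1 = b - r * (b - a)
            P1 = P(x1)
    return (a + b) / 2                   # midpoint of the final interval

# Example from the notes: minimize x^2 - 2x + 1 on [0, 2]
print(golden_section(lambda x: x**2 - 2*x + 1, 0.0, 2.0))   # ~1.0
```

On the example that follows, the first interior points come out as x_1^0 = 0.7640 and x_2^0 = 1.2360, matching the table.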


Golden-Section Example

Find the solution to within 5% for:

$$\min_x \; x^2 - 2x + 1$$

Guess an initial interval for the optimum as $0 \le x^* \le 2$.

k   a^k      b^k      x_1^k    x_2^k    P(x_1^k)    P(x_2^k)
0   0.0000   2.0000   0.7640   1.2360   5.5696E-02  5.5696E-02
1   0.7640   2.0000   1.2360   1.5280   5.5696E-02  2.7878E-01
2   0.7640   1.5280   1.0558   1.2360   3.1158E-03  5.5696E-02
3   0.7640   1.2360   0.9443   1.0558   3.1025E-03  3.1158E-03
4   0.7640   1.0558   0.8755   0.9443   1.5507E-02  3.1025E-03
5   0.8755   1.0558   0.9443   0.9869   3.1025E-03  1.7130E-04
6   0.9443   1.0558   0.9869   1.0132   1.7130E-04  1.7440E-04

Note that the number of calculations we have performed is:

• (k+2) = 8 evaluations of P(x_i),
• (k+2) = 8 determinations of a new point x_i.

There are a variety of other interval elimination methods, including Interval Bisection, Fibonacci Searches, and so forth. They are all simple to implement and not computationally intensive. They tend to be fairly robust and somewhat slow to converge.

Polynomial Approximation Methods

There are a number of polynomial approximation methods, which differ only in the choice of polynomial used to locally approximate the function to be optimized. Polynomials are attractive since they are easily fit to data and the optima of low-order polynomials are simple to calculate. We will only discuss Successive Quadratic Approximation, as it illustrates the basic ideas and will carry over into multivariable optimization techniques.

Successive Quadratic Approximation

Suppose we have evaluated some test points of a function as follows:

[Figure: P(x) evaluated at three test points x_1, x_2, x_3.]

We could approximate the objective function by fitting a quadratic, P(x) ≈ ax^2 + bx + c, through the test points.


The main advantage of using a quadratic approximation is that we know the optimum is:

$$\hat{x}^* = -\frac{b}{2a}$$

The first step is to calculate the approximation constants (a, b, c) in the quadratic from the available points (x_1, x_2, x_3). Using these three evaluation points we can write:

$$P(x_1) = ax_1^2 + bx_1 + c$$

$$P(x_2) = ax_2^2 + bx_2 + c$$

$$P(x_3) = ax_3^2 + bx_3 + c$$

which is linear in (a, b, c). We have three equations and three unknowns, so in general the constants (a, b, c) can be uniquely determined. In matrix form the set of equations is written:

$$\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} P(x_1) \\ P(x_2) \\ P(x_3) \end{bmatrix}$$

which has the solution:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} P(x_1) \\ P(x_2) \\ P(x_3) \end{bmatrix}$$


Some algebra yields the solution for the minimum as:

$$\hat{x}^* = \frac{1}{2}\left[\frac{(x_2^2 - x_3^2)P(x_1) + (x_3^2 - x_1^2)P(x_2) + (x_1^2 - x_2^2)P(x_3)}{(x_2 - x_3)P(x_1) + (x_3 - x_1)P(x_2) + (x_1 - x_2)P(x_3)}\right]$$

Graphically, the situation is:

[Figure: P(x) with the fitted quadratic ax^2 + bx + c through x_1, x_2, x_3 and the predicted optimum x̂*.]

To perform the next iteration, we need to choose which three points to use next. The textbook gives you some elaborate rules for coding into a computer. If you're doing hand calculations, choose the point (x_i) with the smallest value of the objective function P(x) and the points immediately on either side. Note that $\hat{x}^*$ will not necessarily have the smallest objective function value.

Finally, the accuracy to which the optimum is known at any iteration is given by the two points which bracket the point with the smallest objective function value.
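A minimal Python sketch of Successive Quadratic Approximation, using the hand-calculation rule above to pick the next three points (the function names and the details of the bracketing heuristic are assumptions):

```python
import math

def quadratic_search(P, x1, x2, x3, tol=1e-4, max_iter=50):
    """Minimize P starting from three points that bracket the optimum."""
    x_hat = x2
    for _ in range(max_iter):
        p1, p2, p3 = P(x1), P(x2), P(x3)
        num = (x2**2 - x3**2)*p1 + (x3**2 - x1**2)*p2 + (x1**2 - x2**2)*p3
        den = (x2 - x3)*p1 + (x3 - x1)*p2 + (x1 - x2)*p3
        x_hat = 0.5 * num / den                            # optimum of the fitted quadratic
        pts = sorted([x1, x2, x3, x_hat])
        i = min(max(pts.index(min(pts, key=P)), 1), 2)     # best point, kept bracketed
        x1, x2, x3 = pts[i-1], pts[i], pts[i+1]
        if x3 - x1 < tol:
            break
    return x_hat

# Wastewater example with tau = 1: maximize t*exp(-t), i.e. minimize -t*exp(-t)
print(quadratic_search(lambda t: -t*math.exp(-t), 0.5, 1.5, 2.5))   # ~1.0
```

Starting from (0.5, 1.5, 2.5), this reproduces the predicted optima 1.1953, 1.0910, 1.0396, ... shown in the iteration table below.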


The procedure is:
1) choose three points that bracket the optimum,
2) determine a quadratic approximation for the objective function based on these three points,
3) calculate the optimum of the quadratic approximation (the predicted optimum $\hat{x}^*$),
4) repeat steps 1 to 3 until the desired accuracy is reached.

Example: revisit the wastewater treatment problem with τ = 1:

$$\max_t \; te^{-t} \quad\Longleftrightarrow\quad \min_t \; -te^{-t}$$

iteration   t_1    t_2      t_3      P(t_1)    P(t_2)    P(t_3)    t̂*       P(t̂*)
0           0.5    1.5      2.5      -0.3033   -0.3347   -0.2052   1.1953   -0.3617
1           0.5    1.1953   1.5      -0.3033   -0.3617   -0.3347   1.0910   -0.3664
2           0.5    1.0910   1.1953   -0.3033   -0.3664   -0.3617   1.0396   -0.3676
3           0.5    1.0396   1.0910   -0.3033   -0.3676   -0.3664   1.0185   -0.3678
4           0.5    1.0185   1.0396   -0.3033   -0.3678   -0.3676   1.0083   -0.3679

Newton’s Method

This is a derivative-based method that uses first and second derivatives to approximate the objective function. The optimum of the objective function is then estimated using this approximation.

Consider the 2nd-order Taylor series approximation of our objective function:

$$P(x) \approx P(x^k) + \left.\frac{dP}{dx}\right|_{x^k}(x - x^k) + \frac{1}{2}\left.\frac{d^2P}{dx^2}\right|_{x^k}(x - x^k)^2$$

which is a quadratic approximation of P(x) at the point $x^k$. Then we could approximate the first derivative of the objective function as:

$$\frac{dP}{dx} \approx \left.\frac{dP}{dx}\right|_{x^k} + \left.\frac{d^2P}{dx^2}\right|_{x^k}(x - x^k)$$

Since we require the first derivative of the objective function to be zero at the optimum:

$$\left.\frac{dP}{dx}\right|_{x^k} + \left.\frac{d^2P}{dx^2}\right|_{x^k}(x - x^k) = 0$$


A little algebra will allow us to estimate what the optimum value of x is:

$$\hat{x}^* = x^{k+1} = x^k - \left[\frac{\left.\dfrac{dP}{dx}\right|_{x^k}}{\left.\dfrac{d^2P}{dx^2}\right|_{x^k}}\right]$$

This is called a Newton step, and we can iterate on this equation to find the optimum value of x.

Graphically, this is:

[Figure: plot of dP/dx versus x; the tangent at x^k crosses zero at x^{k+1}, which approaches the stationary point x^*.]

Newton's method attempts to find the stationary point of a function by successive approximation using the first and second derivatives of the original objective function.
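A minimal Python sketch of the Newton iteration for the wastewater example with τ = 1 (the derivative expressions are worked out analytically for P(t) = -t e^{-t}; the names are illustrative):

```python
import math

def dP(t):
    return (t - 1.0) * math.exp(-t)      # first derivative of P(t) = -t*exp(-t)

def d2P(t):
    return (2.0 - t) * math.exp(-t)      # second derivative of P(t) = -t*exp(-t)

def newton(x, n_iter=5):
    for _ in range(n_iter):
        x = x - dP(x) / d2P(x)           # Newton step
    return x

print(newton(0.0))    # 0 -> 0.5 -> 0.8333 -> 0.9762 -> ... -> ~1.0000
```

The iterates match the table in the example that follows.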


Example: revisit the wastewater treatment problem with τ = 1.

$$\max_t \; te^{-t} \quad\Longleftrightarrow\quad \min_t \; -te^{-t}$$

iteration   x^k      P'(x^k)   P''(x^k)   P(x^k)
0           0.0000   -1.0000   2.0000      0.0000
1           0.5000   -0.3033   0.9098     -0.3033
2           0.8333   -0.0724   0.5070     -0.3622
3           0.9762   -0.0090   0.3857     -0.3678
4           0.9994   -0.0002   0.3683     -0.3679
5           1.0000

In summary, Newton's Method:
1) shows relatively fast convergence (it finds the optimum of a quadratic in a single step),
2) requires first and second derivatives of the function,
3) has a computational load of first and second derivative evaluations, plus a Newton step calculation, per iteration,
4) may oscillate or not converge when there are many local optima or stationary points.

Quasi-Newton Methods

A major drawback of Newton's method is that it requires us to have analytically determined both the first and second derivatives of our objective function. Often this is considered onerous, particularly in the case of the second derivative. The large family of optimization algorithms that use finite difference approximations of the derivatives are called "Quasi-Newton" methods.

There are a variety of ways first derivatives can be approximated using finite differences:

$$P'(x) \approx \frac{P(x + \Delta x) - P(x - \Delta x)}{2\Delta x} \qquad \text{(central)}$$

$$P'(x) \approx \frac{P(x + \Delta x) - P(x)}{\Delta x} \qquad \text{(forward)}$$

$$P'(x) \approx \frac{P(x) - P(x - \Delta x)}{\Delta x} \qquad \text{(backward)}$$

Approximations for the second derivatives can be determined in terms of either the first derivatives:

$$P''(x) \approx \frac{P'(x + \Delta x) - P'(x - \Delta x)}{2\Delta x} \qquad \text{(central)}$$

$$P''(x) \approx \frac{P'(x + \Delta x) - P'(x)}{\Delta x} \qquad \text{(forward)}$$

$$P''(x) \approx \frac{P'(x) - P'(x - \Delta x)}{\Delta x} \qquad \text{(backward)}$$

or the original objective function:

$$P''(x) \approx \frac{P(x + \Delta x) - 2P(x) + P(x - \Delta x)}{\Delta x^2} \qquad \text{(central)}$$

$$P''(x) \approx \frac{P(x + 2\Delta x) - 2P(x + \Delta x) + P(x)}{\Delta x^2} \qquad \text{(forward)}$$

$$P''(x) \approx \frac{P(x) - 2P(x - \Delta x) + P(x - 2\Delta x)}{\Delta x^2} \qquad \text{(backward)}$$
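A minimal quasi-Newton sketch in Python, replacing the analytical derivatives in the Newton step with the central difference approximations above (the step size and names are illustrative assumptions):

```python
import math

def quasi_newton(P, x, dx=1e-4, n_iter=20):
    for _ in range(n_iter):
        d1 = (P(x + dx) - P(x - dx)) / (2*dx)             # central first derivative
        d2 = (P(x + dx) - 2*P(x) + P(x - dx)) / dx**2     # central second derivative
        x = x - d1 / d2                                   # Newton step with the approximations
    return x

# Wastewater example with tau = 1: minimize -t*exp(-t)
print(quasi_newton(lambda t: -t*math.exp(-t), 0.5))       # ~1.0
```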

The major differences within the Quasi-Newton family of algorithms arise from: the number of function and/or derivative evaluations required, the speed of convergence, and the stability of the algorithm.

Regula Falsi

This method approximates the second derivative as:

$$\frac{d^2P}{dx^2} \approx \frac{P'(x^q) - P'(x^p)}{x^q - x^p}$$

where $x^p$ and $x^q$ are chosen so that $P'(x^p)$ and $P'(x^q)$ have different signs (i.e. the two points bracket the stationary point).


Substituting this approximation into the formula for a Newton step yields:

$$\hat{x}^*_k = x^q - \left.\frac{dP}{dx}\right|_{x^q}\left[\frac{x^q - x^p}{\left.\dfrac{dP}{dx}\right|_{x^q} - \left.\dfrac{dP}{dx}\right|_{x^p}}\right]$$

The Regula Falsi method iterates on this formula, taking care on each iteration to retain two points, $\hat{x}^*_k$ and either $x^p$ or $x^q$, depending upon:

$$\mathrm{sign}\!\left[P'(\hat{x}^*_k)\right] = \mathrm{sign}\!\left[P'(x^p)\right] \quad\Rightarrow\quad \text{keep } \hat{x}^*_k \text{ and } x^q$$

or:

$$\mathrm{sign}\!\left[P'(\hat{x}^*_k)\right] = \mathrm{sign}\!\left[P'(x^q)\right] \quad\Rightarrow\quad \text{keep } \hat{x}^*_k \text{ and } x^p$$

Graphically:

[Figure: plot of dP/dx versus x showing the bracketing points x^p and x^q, the successive estimates x̂*_k and x̂*_{k+1}, and the stationary point x^*.]
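A minimal Python sketch of the Regula Falsi iteration for the wastewater example with τ = 1 (the derivative expression and starting bracket are assumptions for illustration):

```python
import math

def dP(t):
    return (t - 1.0) * math.exp(-t)      # dP/dt for P(t) = -t*exp(-t); changes sign at t* = 1

def regula_falsi(dP, xp, xq, n_iter=10):
    # xp and xq must bracket the stationary point: dP(xp) and dP(xq) differ in sign
    x_hat = xq
    for _ in range(n_iter):
        x_hat = xq - dP(xq) * (xq - xp) / (dP(xq) - dP(xp))   # secant-style step
        if dP(x_hat) * dP(xq) > 0:       # same sign as dP(xq): keep x_hat and xp
            xq = x_hat
        else:                            # same sign as dP(xp): keep x_hat and xq
            xp = x_hat
    return x_hat

print(regula_falsi(dP, 0.0, 2.0))        # 1.7616, 1.5578, 1.3940, ... approaching t* = 1
```

These iterates match the table in the example that follows.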


Example: revisit the wastewater treatment problem with τ = 1:

$$\max_t \; te^{-t} \quad\Longleftrightarrow\quad \min_t \; -te^{-t}$$

iteration   x^p   x^q      x̂*       P'(x^p)   P'(x^q)   P'(x̂*)
0           0     2.0000   1.7616   -1        0.1353    0.1308
1           0     1.7616   1.5578   -1        0.1308    0.1175
2           0     1.5578   1.3940   -1        0.1175    0.0978
3           0     1.3940   1.2699   -1        0.0978    0.0758
4           0     1.2699   1.1804   -1        0.0758    0.0554
5           0     1.1804   1.1184   -1        0.0554    0.0387
6           0     1.1184   1.0768   -1        0.0387    0.0262

note: the computational load is one derivative evaluation, a sign check and a step calculation per iteration.

In summary, Quasi-Newton methods:
1) show slower convergence rates than Newton's Method,
2) use finite difference approximations for derivatives of the objective function,
3) may oscillate or not converge when there are many local optima or stationary points.

Univariate Summary

There are only two basic ideas in unconstrained univariate optimization:

1) direct search (scanning, bracketing, interval elimination, approximation),

2) derivative-based methods (optimality conditions, Newton and Quasi-Newton methods).

The direct search methods work directly on the objective function to successively determine better values of x. The derivative-based methods try to find the stationary point of the objective function, usually by some iterative procedure.

Choosing which method to use is always a trade-off between:

1) ease of implementation,
2) convergence speed,
3) robustness,
4) computational load (number of function evaluations, derivative evaluations / approximations, etc.).

There are no clear answers as to which univariate method is better. The best choice is problem dependent:

• for nearly quadratic functions, the Newton / Quasi-Newton methods are generally a good choice,

• for very flat functions, interval elimination methods can be very effective.


When the univariate search is buried in a multivariate code, the best choice of univariate search will depend upon the properties and requirements of the multivariate search method.

Finally, remember that to solve an unconstrained univariate optimization problem you must:

1) have a function (and derivatives) you can evaluate,
2) choose an optimization method appropriate for the problem,
3) choose a starting point (or interval),
4) select a termination criterion (or accuracy).