Survey of gradient based constrained optimization algorithms
• Select algorithms based on their popularity.
• Additional details and additional algorithms in Chapter 5 of Haftka and Gurdal’s Elements of Structural Optimization.


Optimization with constraints
Standard formulation
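The formulation itself appeared as an equation on the slide and is not reproduced in this transcript. In the notation of Elements of Structural Optimization it is typically stated as follows (a standard-form reconstruction, with the inequalities written in the g >= 0 convention):

\begin{aligned}
\min_{x}\;& f(x)\\
\text{such that}\;& g_j(x) \ge 0, \quad j = 1,\dots,n_g\\
& h_k(x) = 0, \quad k = 1,\dots,n_e
\end{aligned}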

Equality constraints are a challenge, but are fortunately missing from most engineering design problems, so this lecture will deal only with inequality constraints.

Derivative based optimizers
• All are predicated on the assumption that function evaluations are expensive and gradients can be calculated.
• Similar to a person put at night on a hill and directed to find the lowest point in an adjacent valley using a flashlight with a limited battery.
• Basic strategy: flash the light to get derivatives and select a direction. Go straight in that direction until you start going up or hit a constraint. Repeat until converged.
• Some methods move mostly along the constraint boundaries, some mostly on the inside (interior point algorithms).

As with unconstrained gradient based algorithms, the basic idea is to calculate a direction based on the gradients, but here these include the gradients of both the objective function and the constraints (possibly only the active ones). Then, because gradient calculation is expensive, we move in that direction as far as we can.

It can be likened to a person with a flashlight with a very limited battery trying to go downhill in terrain with fenced-off areas. Shining the flashlight is the equivalent of calculating the gradients, and you use it to select a direction. Then you move until you either start going up or bump into a constraint.

Some algorithms move mostly away from constraints and some inch along the constraint boundaries.

Gradient projection and reduced gradient methods
• Find good direction tangent to active constraints
• Move a distance and then restore to constraint boundaries
• A typical active set algorithm, used in Excel

The gradient projection algorithm starts with a point on the boundary of the feasible domain and moves in the plane tangent to all the active constraints there. The direction is the projection of the gradient on that plane.
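As an illustration (not from the original slides), here is a minimal MATLAB sketch of this projection, assuming the gradients of the active constraints are collected as the columns of a matrix N: the move direction is the negative objective gradient with its component normal to the active constraint surfaces removed.

% Projected steepest-descent direction (illustrative sketch, assumed values)
gradf = [2; 20];                        % gradient of the objective (assumed numbers)
N     = [1; 1];                         % gradients of the active constraints, one per column
P = eye(length(gradf)) - N*((N'*N)\N'); % projector onto the tangent plane of the constraints
s = -P*gradf;                           % direction: projected negative gradient
s = s/norm(s);                          % normalized move direction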

This move will end either when a new constraint is encountered or when the move takes the point too far from the constraint boundaries due to their nonlinearity. Then there is a restoration move that brings it back to the constraint boundary.

Penalty function methods
Quadratic penalty function

Gradual rise of the penalty parameter leads to a sequence of unconstrained minimizations, hence the name sequential unconstrained minimization technique (SUMT). Why is it important?

Penalty function methods convert the constrained optimization problem to an unconstrained one by adding a penalty for violating the constraints. For gradient based algorithms the penalty is usually quadratic, because this preserves differentiability. The penalty for equality constraints is proportional to their square, while for inequality constraints it is their square when they are positive and zero otherwise. This is indicated by the triangular brackets.

The disadvantage of a quadratic penalty is that it is very small for small violations, so the solution will tend to violate some constraints. This can be mitigated by multiplying the penalty by a large multiplier r, but that makes the problem numerically ill-conditioned. So the established procedure is to gradually increase r as one gets closer to the solution.
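The following MATLAB sketch (an illustration, not the book's code) shows this procedure: a quadratic exterior penalty with a gradually increased multiplier r, each unconstrained subproblem solved with fminunc and warm-started from the previous solution. The equality constraint is that of Example 5.7.1 (introduced on the next slide); the inequality constraint g(x) <= 0 is an assumed extra one, included only to show the bracket term max(0, g).

% Sequence of unconstrained minimizations (SUMT) with a quadratic exterior penalty
f = @(x) x(1)^2 + 10*x(2)^2;           % objective of Example 5.7.1
h = @(x) x(1) + x(2) - 4;              % equality constraint h(x) = 0
g = @(x) x(1) - 5;                     % assumed inequality constraint, feasible when g(x) <= 0
x = [2; 2];                            % starting point
for r = [1 10 100 1000]                % gradually raise the penalty parameter
    phi = @(x) f(x) + r*( h(x)^2 + max(0, g(x))^2 );   % augmented function
    x = fminunc(phi, x);               % unconstrained subproblem, warm start from previous x
    fprintf('r = %4g   x = [%.4f  %.4f]   f = %.4f\n', r, x(1), x(2), f(x));
end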

Penalty function methods are no longer popular for gradient based algorithms, but we introduce them because they are still popular for global gradient-free algorithms.

Example 5.7.1

As an example we minimize a quadratic function with a linear equality constraint. This leads to an augmented function, including the penalty, that is quadratic, so we can find an analytical solution for any penalty multiplier r.

The table shows the solution, the objective and the augmented function for a series of increasing r values. The difference between f and phi is the penalty. For low values of r the constraint is violated a lot; the penalty is high but the objective is small. As we increase the penalty parameter the constraint violation decreases fast enough so that the total penalty actually decreases!
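A short check of this behavior (not from the book): setting the gradient of the augmented function to zero, with the objective f = x1^2 + 10*x2^2 and the constraint x1 + x2 = 4 implied by the contour script shown later, gives x1 = 40r/(10+11r) and x2 = 4r/(10+11r). The following MATLAB snippet tabulates this for increasing r.

% Analytical minimizer of phi = x1^2 + 10*x2^2 + r*(4 - x1 - x2)^2 versus r
for r = [1 10 100 1000]
    x1 = 40*r/(10 + 11*r);
    x2 =  4*r/(10 + 11*r);
    f   = x1^2 + 10*x2^2;              % objective value
    phi = f + r*(4 - x1 - x2)^2;       % augmented function (phi - f is the penalty)
    fprintf('r = %4g   x1+x2 = %.3f   f = %.2f   phi = %.2f\n', r, x1+x2, f, phi);
end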

For the highest penalty parameter, the sum of the variables is 3.966, so that the violation is smaller than 1%. That may not be acceptable, in which case an even higher penalty parameter would be required.

Contours for r=1

The contours of the augmented function for r=1 are well behaved, so optimizing the function numerically would be easy. The contours were obtained with the following Matlab script:

r=1;                                   % penalty multiplier
x=linspace(1,5,41);                    % grid for x1
y=linspace(0.1,0.5,41);                % grid for x2
[X,Y]=meshgrid(x,y);
Z=X.^2+10*Y.^2+r*(4-X-Y).^2;           % augmented (penalized) function
cs=contour(X,Y,Z); clabel(cs);         % labeled contour plot

Contours for r=1000

For non-derivative methods we can avoid this by making the penalty proportional to the absolute value of the violation instead of its square!

On the other hand, for r=1000 the contours show that we have a canyon at the constraint boundary. For a linear constraint this may not be much of a problem, but if the constraint is curved, an optimization algorithm that moves in straight lines would require a large number of iterations and may get stuck.

For non-gradient algorithms we may avoid this problem by using a penalty that is proportional to the violation instead of its square, so that the penalty parameter would not need to be very high.

Problems
• Penalty: With an extremely robust algorithm, we can find a very accurate solution with a penalty function approach by using a very high r. However, at some high value the algorithm will begin to falter, either taking a very large number of iterations or not reaching the solution. Test fminunc and fminsearch on Example 5.7.1 starting from x0=[2,2]. Start with r=1000 and increase (a sketch of this experiment is given below).
Solution
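A sketch of the suggested experiment (illustrative; it assumes the objective and constraint of Example 5.7.1, as used in the contour script above):

% Penalized function of Example 5.7.1 with a high penalty parameter
r   = 1000;
phi = @(x) x(1)^2 + 10*x(2)^2 + r*(4 - x(1) - x(2))^2;
x0  = [2 2];                            % suggested starting point
[x_unc, f_unc] = fminunc(phi, x0);      % gradient-based quasi-Newton solver
[x_nm,  f_nm ] = fminsearch(phi, x0);   % derivative-free Nelder-Mead solver
disp([x_unc f_unc; x_nm f_nm])          % compare the two solutions
% Increase r (e.g. 1e4, 1e5, ...) and watch when each solver starts to falter.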

5.9: Projected Lagrangian methods
Sequential quadratic programming

Find direction by solving

Find alpha by minimizing

Projected Lagrangian methods are based on the idea of adding a penalty term to the Lagrangian function, so that instead of the solution being just a stationary point of the Lagrangian, it becomes a minimum of the augmented function.

The most popular method of this class is sequential quadratic programming (SQP), also used by Matlab's fmincon. At each iteration it finds a direction by solving

where A is an approximation to the Hessian of the Lagrangian function. A problem of minimizing a quadratic function with linear constraints is called quadratic programming (QP), hence the name of the method. QP problems have specially efficient solution methods.

The distance to move along the direction s is found by minimizing the objective plus a linear penalty for constraint violations, where the penalty is proportional to estimates (mus) of the Lagrange multipliers of the constraints.
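For reference, the two pieces described above are commonly written as follows (a standard-form reconstruction, since the equations on the original slides are not reproduced here; the inequalities are taken in the g_j(x) >= 0 convention of the book):

\min_{s}\; \nabla f(x)^T s + \tfrac{1}{2}\, s^T A\, s
\quad \text{subject to} \quad \nabla g_j(x)^T s + g_j(x) \ge 0, \qquad j = 1,\dots,n_g

f(\alpha) = f(x + \alpha s) + \sum_{j} \mu_j \,\max\bigl(0,\, -g_j(x + \alpha s)\bigr)

where the \mu_j are the Lagrange multiplier estimates and \alpha is the step length along s.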

More details in Section 5.9 of Elements of Structural Optimization.

Matlab function fmincon

FMINCON attempts to solve problems of the form:
    min F(X)  subject to:  A*X  <= B, Aeq*X  = Beq (linear constraints)
     X                     C(X) <= 0, Ceq(X) = 0   (nonlinear constraints)
                               LB <= X <= UB       (bounds)

>> [x,fval,exitflag,output]=fmincon(@quad2,x0,[],[],[],[],[],[],@ring)

Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in feasible directions, to within the default value of the function tolerance, and constraints are satisfied to within the default value of the constraint tolerance.

x =
         0   10.0000

fval =
   1.0000e+03

exitflag =
     1

output =
         iterations: 7
          funcCount: 24
    constrviolation: 0
           stepsize: 9.9532e-05
          algorithm: 'interior-point'
      firstorderopt: 3.4971e-05
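The quad2 objective and ring constraint files passed to fmincon above are not included in this transcript. Purely to illustrate the signatures fmincon expects (an objective returning a scalar, and a nonlinear-constraint function returning [c, ceq] with c(x) <= 0 and ceq(x) = 0), hypothetical stand-ins could look like the following; the specific numbers are guesses chosen only to be consistent with the reported solution x = (0, 10), f = 1000, and are not the lecture's actual functions.

function f = quad2(x)                  % hypothetical objective (assumed form)
f = x(1)^2 + 10*x(2)^2;
end

function [c, ceq] = ring(x)            % hypothetical ring (annulus) constraint
c   = [100 - x(1)^2 - x(2)^2;          % infeasible inside radius 10, so ||x|| >= 10
       x(1)^2 + x(2)^2 - 400];         % infeasible outside radius 20, so ||x|| <= 20
ceq = [];                              % no nonlinear equality constraints
end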