
Page 1: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS

Delphine Sinoquet (IFPEN)

COURSE 1: MAIN DFO METHODS

https://www.ifpenergiesnouvelles.fr/page/delphine-sinoquet

Page 2: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS

Course 1: main DFO methods

Course 2: various applications of DFO

Course 3: some challenges in DFO

Page 3: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


REFERENCES

Audet, C. and Hare W., Derivative-Free and Blackbox Optimization, Springer Series in Operations Research and Financial Engineering (2017)

Conn, A.R., Scheinberg, K., Vicente, L.N., Introduction to derivative-free optimization. SIAM, Philadelphia (2009)

Rios, L.M. and Sahinidis, N.V., Derivative-free optimization: a review of algorithms and comparison of software implementations, J. Glob. Optim. (2013), Vol. 56, Issue 3, pp. 1247–1293

Cartis, C., Lectures on global and derivative-free optimization, University of Oxford (2018)

Page 4: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


WHY DERIVATIVE FREE OPTIMIZATION?

Page 5: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS

Parameters → Experiments/Simulations → Responses of interest

Reach one or several objectives by tuning the input parameters

Page 6: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS

Parameters → Experiments/Simulations → Responses of interest

Reach one or several objectives by tuning the input parameters

➢ Manual optimization (trial & error): when the expert knows the system very well and can control it, and when the number of parameters is small

Page 7: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS

Parameters → Experiments/Simulations → Responses of interest

Reach one or several objectives by tuning the input parameters

➢ Manual optimization (trial & error): when the expert knows the system very well and can control it, and when the number of parameters is small

➢ Random exploration: how many simulations should we do? How do we know that the current set of values is close to a solution?

Page 8: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS

Parameters → Experiments/Simulations → Responses of interest

Reach one or several objectives by tuning the input parameters

➢ Manual optimization (trial & error): when the expert knows the system very well and can control it, and when the number of parameters is small

➢ Random exploration: how many simulations should we do? How do we know that the current set of values is close to a solution?

➢ Discretisation of the parameter space on a regular grid: 3^n simulations if we consider 3 points per dimension and n parameters!

Page 9: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS → OPTIMIZATION

Parameters → Experiments/Simulations → Responses of interest

Page 10: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS → OPTIMIZATION: $\min_x f(x)$

Parameters → Experiments/Simulations → Responses of interest

➢ An optimizer O is an algorithm which iteratively proposes a new x, based on the information from previous trials, in order to approximate the solution of the problem $x^* = \arg\min_x f(x)$

Loop between the simulator S (which evaluates f) and the optimizer O:
$x_0 \xrightarrow{\;S\;} f(x_0) \xrightarrow{\;O\;} x_1 \;\cdots\; f(x_k) \xrightarrow{\;O\;} x_{k+1}$

Page 11: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STUDY OF COMPLEX SYSTEMS → OPTIMIZATION: $\min_x f(x)$

Parameters → Experiments/Simulations → Responses of interest

➢ An optimizer O is an algorithm which iteratively proposes a new x, based on the information from previous trials, in order to approximate the solution of the problem $x^* = \arg\min_x f(x)$

➢ The cost of the optimization is linked with the number of calls to the simulator S which evaluates f

Loop between the simulator S and the optimizer O:
$x_0 \xrightarrow{\;S\;} f(x_0) \xrightarrow{\;O\;} x_1 \;\cdots\; f(x_k) \xrightarrow{\;O\;} x_{k+1}$
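
As a rough illustration (not from the slides), this exchange between S and O can be written as an ask/tell loop; here a naive random-search optimizer and a cheap quadratic stand in for the real optimizer and simulator:

```python
import numpy as np

def simulator(x):
    """Stand-in for the expensive simulator S: returns the response f(x)."""
    return float(np.sum((x - 1.0) ** 2))

class RandomSearchOptimizer:
    """Toy optimizer O: proposes new points around the best point seen so far."""
    def __init__(self, x0, step=0.5, seed=0):
        self.best_x = np.asarray(x0, dtype=float)
        self.best_f = np.inf
        self.step = step
        self.rng = np.random.default_rng(seed)

    def ask(self):
        # Propose the next candidate x using only past information (no derivatives).
        return self.best_x + self.step * self.rng.standard_normal(self.best_x.size)

    def tell(self, x, fx):
        # Record the simulator response f(x) and update the incumbent.
        if fx < self.best_f:
            self.best_x, self.best_f = x, fx
        else:
            self.step *= 0.95  # shrink the search radius after a failed trial

opt = RandomSearchOptimizer(x0=[0.0, 0.0])
budget = 100  # optimization cost = number of calls to the simulator S
for k in range(budget):
    x = opt.ask()
    opt.tell(x, simulator(x))
print(opt.best_x, opt.best_f)
```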

Page 12: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


WHY DERIVATIVE FREE OPTIMIZATION?

More and more complex simulators:
black-box simulator (proprietary code or a simulation package): derivatives of the objective function are not available

numerical approximation of ∇f(x) is expensive: finite differences when computing f(x) is expensive or for a high number of optimization variables x

computing f(x) is expensive: time-consuming numerical simulations or experiments

➢ Need for optimization methods adapted to derivative free problems

(Diagram: the optimizer, solving $\min_x f(x)$, sends x to the simulator, which returns f(x))

Page 13: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct Search methods: Nelder-Mead Simplex, Pattern Search

Surrogate optimization / model-based DFO methods: local model or global model of the objective function

Stochastic DFO methods: evolutionary strategies, simulated annealing

Page 14: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS

by finite differences (F.D.): n+1 simulations per iteration
$\nabla_{x_i} f(x_k) \approx \frac{f(x_k + h e_i) - f(x_k)}{h}$

by generalized finite differences (E.F.D.):
$\nabla f(x_k) \approx \Delta x^{-1} \left( f(x_k + \Delta x) - f(x_k) \right)$, where $\Delta x$ is the perturbation matrix (for F.D., $\Delta x = h\, I_n$)

by an approximation model of the objective function: simulations from previous iterations + m (≤ n) new simulations
$\tilde{F}(x_k + s) = f(x_k) + g_k^T s$  or  $\tilde{F}(x_k + s) = f(x_k) + g_k^T s + \frac{1}{2} s^T \tilde{H}_k s$, with $\nabla f(x_k) \approx g_k$
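
A minimal sketch of the forward finite-difference formula above (the step h and the test function are arbitrary choices for illustration); note the n extra evaluations of f on top of f(x_k):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward finite-difference approximation of grad f at x (n extra f evaluations)."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        grad[i] = (f(x + h * e_i) - fx) / h
    return grad

# Example on a simple quadratic: the exact gradient is 2*(x - 1).
f = lambda x: np.sum((x - 1.0) ** 2)
print(fd_gradient(f, np.array([0.0, 2.0])))  # approximately [-2., 2.]
```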

Page 15: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: FINITE DIFFERENCES

(Figure: finite-difference evaluation points in the (x1, x2) plane)

Page 16: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: FINITE DIFFERENCES

(Figure: finite-difference evaluation points in the (x1, x2) plane)

Page 17: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: FINITE DIFFERENCES

(Figure: finite-difference evaluation points in the (x1, x2) plane)

Page 18: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: FINITE DIFFERENCES

(Figure: finite-difference evaluation points in the (x1, x2) plane)

Page 19: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

(Figure: evaluation points and model of the objective function in the (x1, x2) plane)

Page 20: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

(Figure: evaluation points and model of the objective function in the (x1, x2) plane)

Page 21: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

(Figure: evaluation points and model of the objective function in the (x1, x2) plane)

Page 22: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

(Figure: evaluation points and model of the objective function in the (x1, x2) plane)

Page 23: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

(Figure: evaluation points and model of the objective function in the (x1, x2) plane)

Page 24: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


A CLASSICAL EXAMPLE FOR NL OPTIMIZATION

Rosenbrock function: $f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2$

Initial point: (-1; 1); solution: (1; 1)

(Figure: contours of the Rosenbrock function in the (x1, x2) plane)

Page 25: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DERIVATIVE-BASED METHOD

14 iterations

Solution obtained with Newton method

Page 26: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DERIVATIVE-BASED METHOD

53 iterations

Solution obtained with BFGS method

Page 27: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS

Solution obtained with BFGS method: 92 simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 28: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


APPROXIMATE GRADIENTS

Function-evaluation cost: n + 1 evaluations at each iteration

Difficulty in choosing the finite-difference step h

If the function is noisy → meaningless approximate gradients

➢ Convergence issues
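
For reference, this kind of experiment can be approximated with SciPy (a sketch; evaluation counts will differ from the slides' implementation): when no gradient is supplied, BFGS falls back on internal finite-difference approximations, and res.nfev reports the number of calls to f.

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.0, 1.0])  # initial point used in the slides
# No jac provided: BFGS approximates the gradient by finite differences internally.
res = minimize(rosen, x0, method="BFGS")
print(res.x, res.nit, res.nfev)  # solution, iterations, number of f evaluations
```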

Page 29: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

Use only objective function values

No gradient approximation

Sample of points $\{x_i\}_{i=1,\dots,p}$ → simulations $\{f(x_i)\}_{i=1,\dots,p}$ → new iterate $x_k$

Page 30: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct Search methods: Nelder-Mead Simplex, Pattern Search

Surrogate optimization / model-based DFO methods: local model or global model of the objective function

Stochastic DFO methods: evolutionary strategies, simulated annealing

Page 31: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Nelder-Mead simplex algorithm

Source: Richards (2010)

based on the comparison of objective function values at the vertices of a simplex of n + 1 points: $f(x_1) \le f(x_2) \le \dots \le f(x_{n+1})$

Attempt to improve the worst objective function value $f(x_{n+1})$: $x_{n+1}$ is replaced by a point on the line $(\bar{x}, x_{n+1})$, with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ the centroid of the best n points

→ expansion, reflection or contraction of the simplex at each iteration
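
A simplified sketch of one such iteration (reflection only, with a shrink as fallback; the full algorithm also has expansion and inside/outside contractions):

```python
import numpy as np

def nelder_mead_step(f, simplex, fvals):
    """One simplified Nelder-Mead iteration (reflection, else shrink).
    simplex: (n+1, n) array of vertices; fvals: objective values at the vertices."""
    order = np.argsort(fvals)                  # sort so that f(x_1) <= ... <= f(x_{n+1})
    simplex, fvals = simplex[order], fvals[order]
    centroid = simplex[:-1].mean(axis=0)       # centroid of the n best vertices
    x_r = centroid + (centroid - simplex[-1])  # reflect the worst vertex through the centroid
    f_r = f(x_r)
    if f_r < fvals[-2]:                        # accept the reflected point
        simplex[-1], fvals[-1] = x_r, f_r
    else:                                      # otherwise shrink the simplex toward the best vertex
        simplex[1:] = simplex[0] + 0.5 * (simplex[1:] - simplex[0])
        fvals[1:] = np.array([f(v) for v in simplex[1:]])
    return simplex, fvals

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
simplex = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
fvals = np.array([f(v) for v in simplex])
for _ in range(50):
    simplex, fvals = nelder_mead_step(f, simplex, fvals)
print(simplex[np.argmin(fvals)], fvals.min())
```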

Page 32: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Nelder-Mead simplex algorithm

Source: Wright (2013)

Page 33: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Nelder-Mead simplex algorithm

Termination conditions: function values at vertices are close to each other

or simplex becomes too small

Simulation cost: (n + 1) evaluations at k = 0 and for any shrinkage step, 1 or 2 evaluations for all other steps

Limited convergence results (only for n = 1 or n = 2); see Torczon (1991) for other simplex methods with better convergence results

Many known failure examples
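
A hedged usage sketch with SciPy's Nelder-Mead, whose xatol/fatol options correspond to the two termination conditions above (simplex size and closeness of the vertex function values):

```python
from scipy.optimize import minimize, rosen

res = minimize(rosen, [-1.0, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6, "maxfev": 1000})
print(res.x, res.nfev, res.message)
```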

Page 34: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Linesearch derivative-free methods: e.g. the coordinate search method

$x_1 = x_0 + \alpha_0 e_1$
$x_2 = x_1 + \alpha_1 e_2$
$\vdots$
$x_n = x_{n-1} + \alpha_{n-1} e_n$
$x_{n+1} = x_n + \alpha_n e_1$

$\alpha_k$ is chosen to produce a sufficient decrease
$f(x_k + \alpha_k e_{i_k}) < f(x_k) - \rho(\alpha_k)$
with $\rho(t) \ge 0$ an increasing function of $t$ such that $\rho(t)/t \to 0$ when $t \to 0$, and $e_{i_k}$ the coordinate direction used at iteration $k$

→ Inefficient when the coordinate direction is (almost) ⊥ $\nabla f(x_k)$
→ Efficient when the variables are essentially uncoupled
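
A compact sketch of a coordinate search with the sufficient-decrease test above (ρ(t) = 10⁻⁴ t² is one admissible choice; the test function is arbitrary):

```python
import numpy as np

def coordinate_search(f, x0, alpha=1.0, rho=lambda t: 1e-4 * t ** 2,
                      alpha_min=1e-8, max_sweeps=200):
    """Cycle through +/- coordinate directions; accept a step alpha*e_i only if
    it gives a sufficient decrease f(x + alpha*d) < f(x) - rho(alpha)."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    for sweep in range(max_sweeps):
        improved = False
        for i in range(n):
            for sign in (+1.0, -1.0):
                d = np.zeros(n)
                d[i] = sign
                f_trial = f(x + alpha * d)
                if f_trial < fx - rho(alpha):       # sufficient-decrease condition
                    x, fx, improved = x + alpha * d, f_trial, True
                    break
        if not improved:
            alpha *= 0.5                            # the whole sweep failed: reduce the step
            if alpha < alpha_min:
                break
    return x, fx

print(coordinate_search(lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2, [0.0, 0.0]))
```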

Page 35: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Pattern search methods

Motivation: parallelisation of function evaluations. Instead of one search direction $s_l$ ($= e_l$) as in linesearch, explore a set of directions $D_k$, e.g. $D_k = \{e_1, e_2, \dots, e_n, -e_1, -e_2, \dots, -e_n\}$

At each iteration, for a given mesh step $\alpha_k$:

Search step (OPTIONAL): evaluate the objective function on a finite number of points chosen with any method: along a given direction, on a simplex, ...

Poll step: if no better point is found in the optional search step, search for a better point along the directions of $D_k$: $x_k + \alpha_k s_i$, $s_i \in D_k$

If no better point is found (no smaller function value): $x_{k+1} = x_k$ and the mesh size $\alpha_k$ is decreased; else $x_{k+1} = x_k + \alpha_k s_i$ and the mesh size is increased
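
A minimal compass-search sketch of the poll step and mesh update (the expansion/shrink factors 2 and 0.5 are illustrative; the 2n candidates could be evaluated in parallel):

```python
import numpy as np

def poll_step(f, x, f_x, alpha):
    """One poll step of a simple pattern (compass) search with D = {+/- e_i}.
    Returns the new point, its value, and the updated mesh size."""
    n = x.size
    directions = np.vstack([np.eye(n), -np.eye(n)])   # D_k = {e_1..e_n, -e_1..-e_n}
    for d in directions:
        trial = x + alpha * d
        f_trial = f(trial)
        if f_trial < f_x:                  # better point found
            return trial, f_trial, 2.0 * alpha   # increase the mesh size
    return x, f_x, 0.5 * alpha             # poll failure: keep x, decrease the mesh size

# Usage sketch: iterate poll steps until the mesh size is small.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
x = np.array([0.0, 0.0])
fx, alpha = f(x), 1.0
while alpha > 1e-6:
    x, fx, alpha = poll_step(f, x, fx, alpha)
print(x, fx)
```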

Page 36: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Pattern search methods

Source: Kolda et al. (2003)

Page 37: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Pattern search methods: DIRECT

Source: Perttunen et al. (1993)

DIRECT = DIviding RECTangles

Divide each side of the "rectangle(s)" associated with the smallest function values into 3 in order to define sub-rectangles

Evaluate the center of the new rectangles

Stopping criteria: minimal size of the rectangles

Global convergence for continuous functions

High evaluation cost

Page 38: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT ALGORITHM

517 simulations

(Figure: points evaluated by DIRECT on the Rosenbrock contours)

Page 39: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT ALGORITHM

Last iterations

(Figure: points evaluated by DIRECT on the Rosenbrock contours)

Page 40: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DIRECT SEARCH METHODS

Pattern search methods

Popular methods: easy to implement, easy to parallelize

But expensive in terms of simulations → often coupled with a surrogate model in the search step

Page 41: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct Search methods: Nelder-Mead Simplex, Pattern Search

Surrogate optimization / model-based DFO methods: local model or global model of the objective function

Stochastic DFO methods: evolutionary strategies, simulated annealing

Page 42: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Optimization methods based on a surrogate model of the objective function

to limit the number of evaluations of the objective function

the model is updated during the iterations based on new simulations

The model is either global or local

Page 43: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models

Design of experiments technique: choose the evaluation points used to compute the initial model
space-filling design (maximin criterion)

Regression: choose a model type
Gaussian process or kriging, Radial Basis Functions (RBF), Neural Networks

Sampling criterion: choose new point(s) to evaluate for the update of the model
minimum of the current model, maximum of the prediction error, ...
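
As an aside, the space-filling initial design can be produced, for instance, with a Latin hypercube (a sketch assuming SciPy ≥ 1.7; the slides do not prescribe this particular tool):

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)       # 2 input parameters
X01 = sampler.random(n=10)                      # 10 points in [0, 1]^2
X = qmc.scale(X01, l_bounds=[-1.0, -0.5], u_bounds=[1.0, 1.5])  # map to the parameter bounds
print(X)
```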

Page 44: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models: Gaussian process (kriging)

Assumption: the objective function is assumed to be a realization of a Gaussian random process (GP) with parametric mean function and stationary covariance function

$F(x) = \beta^T r(x) + Z(x)$

where $\beta^T r(x)$ is the regression part (e.g. polynomial of degree d = 1) and $Z(x)$ is the stochastic part, with zero mean and stationary covariance

Page 45: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models: Gaussian process (kriging)

Assumption: the objective function is assumed to be a realization of a Gaussian random process (GP) with parametric mean function and stationary covariance function

$F(x) = \beta^T r(x) + Z(x)$

The surrogate model is the conditional expectation of the GP:
$\hat{F}(x) = E\left[ F(x) \mid \{x_i, f(x_i)\}_{i=1,\dots,p} \right] = \beta^T r(x) + k^T(x)\, K^{-1} \left( Y_p - R\beta \right)$

with $R = \left( r_j(x_i) \right)_{i,j}$, $K = \left( \rho(x_i, x_j) \right)_{i,j}$, $k(x) = \left( \rho(x, x_1), \dots, \rho(x, x_p) \right)$

Page 46: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models: Gaussian process (kriging)

Assumption: the objective function is assumed to be a realization of a Gaussian random process (GP) with parametric mean function and stationary covariance function

𝐹 𝑥 = 𝛽𝑇𝑟 𝑥 + 𝑍(𝑥)

The surrogate model is the conditional expectation of the GP:
$\hat{F}(x) = E\left[ F(x) \mid \{x_i, f(x_i)\}_{i=1,\dots,p} \right] = \beta^T r(x) + k^T(x)\, K^{-1} \left( Y_p - R\beta \right)$

The variance of the GP is used as an error indicator: $\hat{\sigma}^2(x) = \sigma^2 - k^T(x)\, K^{-1} k(x)$

with $R = \left( r_j(x_i) \right)_{i,j}$, $K = \left( \rho(x_i, x_j) \right)_{i,j}$, $k(x) = \left( \rho(x, x_1), \dots, \rho(x, x_p) \right)$
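
A minimal numpy sketch of these formulas, assuming a constant mean (r(x) ≡ 1) and a squared-exponential covariance with fixed hyperparameters; this is a simplification, not the exact model of the slides:

```python
import numpy as np

def rho(a, b, length=0.5, sigma2=1.0):
    """Assumed stationary covariance: squared exponential."""
    d = np.linalg.norm(np.asarray(a) - np.asarray(b))
    return sigma2 * np.exp(-0.5 * (d / length) ** 2)

def fit_kriging(X, y, length=0.5, sigma2=1.0, nugget=1e-10):
    """Precompute K^{-1} and beta for a constant-mean GP surrogate."""
    p = len(X)
    K = np.array([[rho(X[i], X[j], length, sigma2) for j in range(p)] for i in range(p)])
    K += nugget * np.eye(p)                              # numerical regularization
    K_inv = np.linalg.inv(K)
    ones = np.ones(p)
    beta = (ones @ K_inv @ y) / (ones @ K_inv @ ones)    # generalized least squares, constant trend
    return K_inv, beta

def predict(x, X, y, K_inv, beta, length=0.5, sigma2=1.0):
    """Kriging mean (conditional expectation) and variance (error indicator) at x."""
    k = np.array([rho(x, xi, length, sigma2) for xi in X])
    mean = beta + k @ K_inv @ (y - beta)    # beta^T r(x) + k(x)^T K^{-1} (Y_p - R beta), r(x) = 1
    var = sigma2 - k @ K_inv @ k            # sigma^2 - k(x)^T K^{-1} k(x)
    return mean, max(var, 0.0)

# Tiny 1D example
X = np.array([[0.0], [0.5], [1.0]])
y = np.array([1.0, 0.2, 0.9])
K_inv, beta = fit_kriging(X, y)
print(predict(np.array([0.25]), X, y, K_inv, beta))
```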

Page 47: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models: Gaussian process (kriging)

Page 48: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Sampling criterion based on Expected Improvement (EI): a balance between exploration and minimization

The next evaluation point maximizes the EI: $\arg\max_x EI(x)$, where $EI(x) = E[I(x)] = E\left[ \max(0, f_{min} - F(x)) \right]$

➢ EGO (Efficient Global Optimization), or Bayesian Optimization
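
Under the Gaussian predictive distribution of the kriging model, the EI has the standard closed form $EI(x) = (f_{min} - \mu(x))\,\Phi(z) + \sigma(x)\,\phi(z)$ with $z = (f_{min} - \mu(x))/\sigma(x)$ (a well-known result, not spelled out in the slides). A small numerical illustration:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, sd, f_min):
    """Closed-form EI for a Gaussian prediction N(mean, sd^2)."""
    sd = np.maximum(sd, 1e-12)
    z = (f_min - mean) / sd
    return (f_min - mean) * norm.cdf(z) + sd * norm.pdf(z)

# Toy illustration: surrogate predictions at 5 candidate points.
mean = np.array([0.8, 0.3, 0.5, 0.9, 0.4])    # surrogate means mu(x)
sd = np.array([0.05, 0.10, 0.40, 0.02, 0.30])  # surrogate standard deviations sigma(x)
f_min = 0.35                                   # best objective value observed so far
ei = expected_improvement(mean, sd, f_min)
print(ei, np.argmax(ei))  # EI favors either a low mean or a high uncertainty
```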

Page 49: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

Quadratic interpolation models built from a set of appropriately chosen sample points:
$F_k(s) = c_k + s^T g_k + \frac{1}{2} s^T H_k s$, $\quad s \in \mathcal{B}(x_k, \Delta_k)$ (trust region) (TR)

with $c_k \in \mathbb{R}$, $g_k \in \mathbb{R}^n$ and $H_k \in \mathbb{R}^{n \times n}$ (symmetric) that satisfy the interpolation conditions $F_k(x_i - x_k) = f(x_i)$

The matrix of the linear system must be non-singular and well conditioned

Minimization of the quadratic model in the trust region: $\min_{\|s\| \le \Delta_k} F_k(s)$

Update the model with new evaluations

Improve the geometry of the interpolation set to help with the model interpolation step: one point is replaced by another one that improves the conditioning of the interpolation matrix
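
For n = 2, the interpolation conditions amount to a 6×6 linear system in the model coefficients; a small sketch for the determined case, with hand-picked sample points (real codes manage the point geometry much more carefully):

```python
import numpy as np

def quadratic_basis(s):
    """Monomial basis of a quadratic model in n = 2 variables: 1, s1, s2, s1^2, s1*s2, s2^2."""
    s1, s2 = s
    return np.array([1.0, s1, s2, s1 * s1, s1 * s2, s2 * s2])

def fit_quadratic_model(points, fvals, x_k):
    """Solve the interpolation conditions F_k(x_i - x_k) = f(x_i).
    Needs p = (n+1)(n+2)/2 = 6 well-poised points for n = 2 (non-singular matrix)."""
    M = np.array([quadratic_basis(np.asarray(x) - np.asarray(x_k)) for x in points])
    return np.linalg.solve(M, fvals)   # inaccurate (or fails) if the geometry is poor

# 6 sample points around x_k = (0, 0) on f(x) = (x1 - 1)^2 + 2*x2^2
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2
pts = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5), (0.5, 0.5)]
coeffs = fit_quadratic_model(pts, np.array([f(np.array(p)) for p in pts]), (0.0, 0.0))
print(coeffs)  # monomial coefficients of F_k, from which c_k, g_k and H_k follow
```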

Page 50: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

Let 𝑠𝑘 be the solution of the (TR) minimization problem

Predicted model decrease: $F_k(0) - F_k(s_k) = f(x_k) - F_k(s_k)$

Actual function decrease: $f(x_k) - f(x_k + s_k)$

The trust region is updated according to the value of $\rho_k = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - F_k(s_k)}$

If $\rho_k \ge \eta$ (successful step): $x_{k+1} = x_k + s_k$, $\Delta_{k+1} \ge \Delta_k$

If $\rho_k < \eta$ (unsuccessful step): $x_{k+1} = x_k$, $\Delta_k$ is reduced or the interpolation set is improved
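
A literal transcription of this update rule (a sketch; η and the expansion/shrink factors are illustrative values, and the "improve the interpolation set" alternative is not shown):

```python
import numpy as np

def trust_region_update(x, f_x, f_trial, pred_decrease, s, delta, eta=0.1):
    """Acceptance test and trust-region update from the ratio rho_k above."""
    actual_decrease = f_x - f_trial
    rho = actual_decrease / pred_decrease if pred_decrease > 0 else -np.inf
    if rho >= eta:                        # successful step
        return x + s, f_trial, max(delta, 2.0 * np.linalg.norm(s))
    return x, f_x, 0.5 * delta            # unsuccessful: keep x_k, reduce Delta_k

# Example: predicted decrease 1.5, actual decrease 1.0 -> rho ~ 0.67 >= eta -> step accepted.
print(trust_region_update(np.array([0.0, 0.0]), 5.0, 4.0, 1.5, np.array([0.3, 0.0]), 1.0))
```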

Page 51: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure legend: objective function, quadratic model to be minimized, evaluation points, solution of the QP, point for model improvement, trust region, minimal region)

Page 52: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure: trust-region DFO illustration, same legend as above)

Page 53: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure: trust-region DFO illustration, same legend as above)

Page 54: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure: trust-region DFO illustration, same legend as above)

Page 55: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure: trust-region DFO illustration, same legend as above)

Page 56: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models

(Figure: trust-region DFO illustration, same legend as above)

Page 57: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Local models (TR DFO)

86 simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 58: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


SURROGATE OPTIMIZATION METHODS

Global models (Gaussian Process)

29 simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 59: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct Search methods

Nelder Mead Simplex

Pattern Search

Surrogate optimization / model-based DFO methods

Local model of the objective function

Global model of the objective function

Stochastic DFO methods

Evolutionary strategies

Simulated annealing

Page 60: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies

Global optimization

Very few assumptions on function regularity

Main principle:
1. Random generation of the initial population
2. Evaluation of each individual of the current generation
3. Reproduction: selection of the best individuals
4. Diversification: cross-over and mutation
5. Replacement: survival of the best individuals
6. Repeat from step 2 until a satisfying solution is obtained

Page 61: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies

An example of a cross-over operator for continuous variables: a pair of individuals $(x, y)$ selected randomly,
$(x, y) \to \alpha x + (1 - \alpha) y$, $\quad \alpha \sim U([0, 1])$

An example of a mutation operator for continuous variables: addition of a Gaussian noise,
$x_i := x_i + u_i$, $\quad u_i \sim N(0, \sigma_i^2)$, $\quad i = 1, 2, \dots, n$

𝜎𝑖 is a critical parameter to tune
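
Putting the selection, cross-over and mutation operators together, a deliberately small ES sketch (fixed step size σ, comma-type replacement; real implementations adapt σ, as discussed on the next slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(x, y):
    """Blend cross-over of two parents: alpha*x + (1-alpha)*y, alpha ~ U(0, 1)."""
    alpha = rng.uniform(0.0, 1.0)
    return alpha * x + (1.0 - alpha) * y

def mutate(x, sigma):
    """Gaussian mutation: x_i := x_i + u_i, u_i ~ N(0, sigma^2)."""
    return x + sigma * rng.standard_normal(x.size)

def evolution_strategy(f, n, pop_size=20, n_parents=5, sigma=0.3, generations=100):
    """Small ES sketch: select the best, recombine, mutate, replace."""
    pop = rng.uniform(-2.0, 2.0, size=(pop_size, n))         # 1. random initial population
    for g in range(generations):
        fitness = np.array([f(ind) for ind in pop])           # 2. evaluation
        parents = pop[np.argsort(fitness)[:n_parents]]        # 3. elitist selection of the best
        children = []
        for _ in range(pop_size):                             # 4. diversification
            i, j = rng.integers(0, n_parents, size=2)
            children.append(mutate(crossover(parents[i], parents[j]), sigma))
        pop = np.array(children)                              # 5. replacement
    fitness = np.array([f(ind) for ind in pop])
    best = pop[np.argmin(fitness)]
    return best, f(best)

print(evolution_strategy(lambda x: np.sum((x - 1.0) ** 2), n=2))
```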

Page 62: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies

1/5-rule (Rechenberg)

If the proportion of successful mutations over the last T generations is larger than 0.2, the step size increases;
if it is smaller than 0.2, the step size decreases

➢ Can adapt one general step size

➢ But no individual step sizes

Mutative step-size control: strategy parameters (step sizes) are treated similarly to the optimized parameters

➢Facilitates adaptation of individual step-sizes
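
A hedged transcription of the rule (the shrink factor c ≈ 0.85 is a commonly used value, not specified in the slides):

```python
def one_fifth_rule(sigma, success_rate, c=0.85):
    """Rechenberg's 1/5 success rule for a single global step size sigma."""
    if success_rate > 0.2:
        return sigma / c     # more than 1/5 of mutations succeed -> larger steps
    if success_rate < 0.2:
        return sigma * c     # fewer than 1/5 succeed -> smaller steps
    return sigma

print(one_fifth_rule(0.3, 0.4), one_fifth_rule(0.3, 0.1))
```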

Page 63: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies

Mutation operator: Gaussian mutation whose covariance matrix ~ inverse of a "global" Hessian matrix

Select the best individuals to compute the mean and to update:
the covariance matrix
the global standard deviation
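
A heavily simplified numpy sketch of one generation of this scheme (rank-μ covariance update with equal weights, fixed global step size, no evolution paths; Hansen's actual CMA-ES does considerably more):

```python
import numpy as np

def cma_like_step(f, mean, sigma, C, lam=12, mu=4, rng=np.random.default_rng(0)):
    """One very simplified CMA-ES-like generation."""
    A = np.linalg.cholesky(C)                        # C^{1/2}: shapes the Gaussian mutation
    z = rng.standard_normal((lam, mean.size))
    X = mean + sigma * z @ A.T                       # sample lambda offspring ~ N(mean, sigma^2 C)
    order = np.argsort([f(x) for x in X])
    best = X[order[:mu]]                             # select the mu best individuals
    new_mean = best.mean(axis=0)                     # recombination: new mean
    Y = (best - mean) / sigma
    new_C = 0.8 * C + 0.2 * (Y.T @ Y) / mu           # shift the covariance toward successful steps
    return new_mean, new_C

f = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2  # Rosenbrock
mean, C = np.array([-1.0, 1.0]), np.eye(2)
for g in range(60):
    mean, C = cma_like_step(f, mean, 0.3, C)
print(mean)  # drifts along the Rosenbrock valley; a full CMA-ES also adapts sigma
```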

Page 64: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies vs. genetic algorithms

Main difference: in evolutionary strategies, only the best individuals are allowed to reproduce (elitist selection)

The parents can be included in the next generation

Similar operators: mutation, cross-over

Page 65: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategies: CMAES method (Hansen)

(Figure: population at Generation 1, Generation 20 and Generation 30)

Page 66: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategy: CMAES

370 simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 67: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Evolution strategy: CMAES

170 last simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 68: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Simulated annealing

Principle: emulate the physical cooling of a solid so that the frozen state corresponds to a minimum-energy configuration

At a given iteration, a new solution (state) $x_{k+1}$ is determined from the previous solution (state) $x_k$:

randomly choose a neighbour $y_k$ of $x_k$; then the new state is
$x_{k+1} = \begin{cases} y_k & \text{if } f(y_k) \le f(x_k) \\ y_k & \text{if } f(y_k) > f(x_k), \text{ with probability } e^{\left( f(x_k) - f(y_k) \right)/t_k} \\ x_k & \text{otherwise} \end{cases}$

$t_k$ (temperature of the system) is a decreasing sequence, in order to decrease the probability of accepting a bad solution (increasing f) at the last iterations
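
A direct transcription of this acceptance rule with Gaussian neighbours and a geometric cooling schedule (the neighbourhood, schedule and parameters are illustrative choices, not those of SIMPSA):

```python
import numpy as np

def simulated_annealing(f, x0, t0=1.0, cooling=0.995, n_iter=2000, step=0.5, seed=0):
    """Sketch of simulated annealing with Gaussian neighbours and geometric cooling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, t = f(x), t0
    best_x, best_f = x.copy(), fx
    for k in range(n_iter):
        y = x + step * rng.standard_normal(x.size)    # random neighbour y_k of x_k
        fy = f(y)
        # Accept if better, or with probability exp((f(x_k) - f(y_k)) / t_k) if worse.
        if fy <= fx or rng.random() < np.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling                                   # decreasing temperature t_k
    return best_x, best_f

print(simulated_annealing(lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2,
                          [-1.0, 1.0]))
```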

Page 69: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


STOCHASTIC METHODS

Simulated annealing: SIMPSA

280 last simulations

(Figure: evaluated points on the Rosenbrock contours)

Page 70: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

SQPAL = SQP BFGS – FD
SQA = local surrogate optimization
EGO = global surrogate optimization (kriging)
NMSMAX = Nelder-Mead simplex
CMAES = Evolutionary Strategy

53 problems, 2 ≤ n ≤ 12

Benchmark Moré & Wild

Page 71: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

Summary (I)

Classical methods with approximate gradients
➢ popular (no need to change the optimizer)
➢ but not adapted to large-scale problems
➢ step-size tuning is cumbersome

Direct Search methods
➢ revival of these methods with parallelization
➢ hybrid implementations (coupled with surrogate models)
➢ not adapted to large-scale problems

Page 72: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

Summary (II)

Surrogate optimization

Local interpolation model with TR
➢ can handle constraints (coupled with SQP)
➢ good performance in terms of number of function evaluations

Global models
➢ global methods
➢ not adapted to large-scale problems: needs a lot of evaluations to obtain a good accuracy

Page 73: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

Summary (III)

Evolutionary strategies / Simulated annealing
➢ no assumption on function regularity
➢ discrete optimization is possible
➢ global methods
➢ but expensive in terms of function evaluations
➢ difficulties in handling constraints

Page 74: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DFO METHODS

Global versus Local

Multi-start optimization: run a local method from several initial points

➢well adapted for functions with a small number of local minima

➢handles constraints

Global surrogate optimization (kriging, RBF, NN) / Evolutionary algorithms / Simulated annealing
➢ not adapted to large-scale problems: needs a lot of evaluations to obtain a good accuracy
➢ difficulties in handling constraints

Page 75: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


REFERENCES

Audet, C. and Hare W., Derivative-Free and Blackbox Optimization, Springer Series in Operations Research and Financial Engineering (2017)

Conn, A.R., Scheinberg, K., Vicente, L.N., Introduction to derivative-free optimization. SIAM, Philadelphia (2009)

Rios, L.M. and Sahinidis, N.V., Derivative-free optimization: a review of algorithms and comparison of software implementations, J. Glob. Optim. (2013), Vol. 56, Issue 3, pp. 1247–1293

Cartis, C., Lectures on global and derivative-free optimization, University of Oxford (2018)

Page 76: DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS


DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS

Course 1: main DFO methods

Course 2: various applications of DFO

Course 3: some challenges in DFO