Derivative Free Optimization class, Part I
Exercise 1 - partial correction
Anne Auger (Inria), anne.auger_AT_inria.fr
https://www.lri.fr/~auger/teaching.html
Adaptive step-size algorithms Questions 2 - 3
• You should obtain a plot similar to the one already shown in the class slides.
Adaptive step-size algorithms Question 4
• Three phases:
1. step-size too small compared to the distance to the optimum
2. step-size adapted to the distance to the optimum
3. step-size too large compared to the distance to the optimum
Adaptive step-size algorithms Question 5
• Adaptation of the step-size takes place:
1. the initial step-size is too small, so the algorithm increases it
2. once the step-size reaches the order of the distance to the optimum (after about 50 f-evaluations), both the step-size and the distance to the optimum decrease linearly
Adaptive step-size algorithms Question 5
• We observe linear convergence (after the initial stage); formally, there exists $c < 0$ such that

$$\lim_{t\to\infty} \frac{1}{t}\ln\frac{\|X_t\|}{\|X_0\|} = c \quad\text{and}\quad \lim_{t\to\infty} \frac{1}{t}\ln\frac{\sigma_t}{\sigma_0} = c .$$
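The limit defining c can be estimated empirically by running the algorithm for a fixed budget and computing (1/t) ln(||X_t||/||X_0||). A compact sketch, using the same hedged one-fifth-rule parameterization as above (factors 1.5 and 1.5^(-1/4) are an assumption, not the unique choice):

```python
import numpy as np

# Minimal (1+1)-ES run on the sphere to estimate the convergence rate
# c = lim (1/t) ln(||X_t|| / ||X_0||) from a single trajectory.
rng = np.random.default_rng(1)
n = 10
x = np.ones(n)
sigma = 1.0
x0_norm = np.linalg.norm(x)
T = 5000
for t in range(T):
    y = x + sigma * rng.standard_normal(n)
    if np.dot(y, y) <= np.dot(x, x):   # comparison-based acceptance
        x, sigma = y, sigma * 1.5
    else:
        sigma *= 1.5 ** (-0.25)
c_hat = float(np.log(np.linalg.norm(x) / x0_norm) / T)
print(c_hat)  # negative: empirical linear convergence rate per evaluation
```

The estimate stabilizes as T grows, which is the empirical counterpart of the limit existing.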
Adaptive step-size algorithms Question 5
• $\|X_t\|/\sigma_t$ is a stable homogeneous Markov chain
• difficult to prove, but feasible
• this implies linear convergence of the algorithm
• PhD thesis topics possible, related to these aspects
References:
Anne Auger, Nikolaus Hansen (2013), Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains
Anne Auger, Nikolaus Hansen (2013), Linear Convergence on Positively Homogeneous Functions of a Comparison Based Step-Size Adaptive Randomized Search: the (1+1) ES with Generalized One-fifth Success Rule
Adaptive step-size algorithms Question 6
• The (1+1)-ES is slower than on the sphere because the ellipsoid is an ill-conditioned function
• need for covariance matrix adaptation
[Plots: convergence of the (1+1)-ES on the sphere vs. on the ellipsoid]
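The slowdown can be seen by running the same isotropic (1+1)-ES on the sphere and on an ill-conditioned ellipsoid with the same budget. The ellipsoid definition (axis weights spread geometrically up to a condition number of 1e6) and the step-size update factors are assumptions for this sketch:

```python
import numpy as np

def sphere(x):
    return float(np.dot(x, x))

def ellipsoid(x, cond=1e6):
    """Convex quadratic with condition number `cond` (axis-aligned)."""
    n = len(x)
    w = cond ** (np.arange(n) / (n - 1))
    return float(np.sum(w * np.asarray(x) ** 2))

def run_es(f, x0, sigma0, evals, seed=3):
    """(1+1)-ES with a one-fifth success rule; returns the final f-value."""
    rng = np.random.default_rng(seed)
    x, sigma = np.asarray(x0, dtype=float), float(sigma0)
    fx = f(x)
    for _ in range(evals):
        y = x + sigma * rng.standard_normal(x.size)
        fy = f(y)
        if fy <= fx:
            x, fx, sigma = y, fy, sigma * 1.5
        else:
            sigma *= 1.5 ** (-0.25)
    return fx

f_sph = run_es(sphere, np.ones(10), 1.0, 3000)
f_ell = run_es(ellipsoid, np.ones(10), 1.0, 3000)
print(f_sph, f_ell)  # the ellipsoid run ends far above the sphere run
```

The isotropic sampling distribution cannot match the elongated level sets, which is exactly what covariance matrix adaptation (CMA) fixes by learning an anisotropic sampling covariance.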
Adaptive step-size algorithms Question 7
• Rosenbrock function (banana-shaped, "ill-conditioned" level sets)
• The (1+1)-ES is slower than on the sphere because of the shape of the function
• Remark: the optimum of the function is at (1,…,1), so change the initial point
Adaptive step-size algorithms Question 8
• The function g being strictly increasing and the algorithm being rank-based, you should theoretically observe the same convergence plot on f and on g∘f
• But: a numerical precision issue is observed, due to the floating-point representation
• Typically, Matlab's double-precision significand carries roughly 15–16 significant decimal digits, so that 1 + 10^(-16) == 1 for the computer
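The invariance claim can be checked directly: the (1+1)-ES uses f only through the comparison f(y) <= f(x), so composing f with a strictly increasing g leaves every acceptance decision, and hence the whole trajectory, unchanged when the random seed is fixed. A sketch under assumed parameters, using g(v) = v^3 (strictly increasing on the nonnegative sphere values) and a short run so that f-values stay well inside the normal floating-point range:

```python
import numpy as np

def run_es(f, x0, sigma0, evals, seed):
    """(1+1)-ES with a one-fifth success rule; only comparisons of f are used."""
    rng = np.random.default_rng(seed)
    x, sigma = np.asarray(x0, dtype=float), float(sigma0)
    fx = f(x)
    for _ in range(evals):
        y = x + sigma * rng.standard_normal(x.size)
        fy = f(y)
        if fy <= fx:
            x, fx, sigma = y, fy, sigma * 1.5
        else:
            sigma *= 1.5 ** (-0.25)
    return x

def sphere(x):
    return float(np.dot(x, x))

def g_of_f(x):
    v = sphere(x)
    return v * v * v   # g(v) = v**3: strictly increasing for v >= 0

x_f  = run_es(sphere, np.ones(10), 1.0, 500, seed=42)
x_gf = run_es(g_of_f, np.ones(10), 1.0, 500, seed=42)
print(np.allclose(x_f, x_gf))  # the two runs take identical steps
```

With very long runs or a transformation that compresses small values (e.g. g(v) = 2v + 1 near machine precision), the two trajectories eventually diverge: distinct f-values can round to the same floating-point number, which is the numerical precision issue mentioned above.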