How to update the different parameters m, σ, C?
1. Adapting the mean m
2. Adapting the step-size σ
3. Adapting the covariance matrix C
Why Step-size Adaptation?
Assume a (1+1)-ES algorithm with fixed step-size σ (and C = Id) optimizing the function f(x) = ∑_{i=1}^n x_i² = ∥x∥².

Initialize m, σ
While (stopping criterion not met)
    sample new solution: x ← m + σ𝒩(0, Id)
    if f(x) ≤ f(m): m ← x
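The loop above can be sketched in Python (a minimal illustration; the dimension, step-size, iteration count, and seed are arbitrary choices, not taken from the slides):

```python
import numpy as np

def one_plus_one_es_fixed(n=10, sigma=1e-3, iterations=2000, seed=1):
    """A minimal (1+1)-ES with constant step-size on the sphere f(x) = ||x||^2.

    Returns the trajectory of f(m), so the behavior with a fixed sigma
    can be inspected (e.g. stagnation once sigma is large relative to ||m||).
    """
    rng = np.random.default_rng(seed)
    f = lambda x: np.sum(x ** 2)                 # sphere function
    m = rng.standard_normal(n)                   # initial mean
    history = [f(m)]
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(n)   # sample new solution
        if f(x) <= f(m):                         # elitist ("+") selection
            m = x
        history.append(f(m))
    return np.array(history)

hist = one_plus_one_es_fixed()
```

With elitist selection f(m) is non-increasing by construction; the interesting question, addressed next, is how fast it decreases.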
What will happen if you look at the convergence of f(m)?

[Figure] red curve: (1+1)-ES with optimal step-size (see later); green curve: (1+1)-ES with constant step-size (σ = 10⁻³)
Why Step-size Adaptation?

What is going on for the (1+1)-ES with fixed step-size?

[Hand-drawn figure: ∥m_t∥ versus t, with three phases marked.]
Phase III: m_t is close to the optimum and σ ≫ ∥m_t∥, so the probability to sample a better point is small.

Does the (1+1)-ES with fixed step-size converge, i.e. does m_t → 0?
(Answer after the next slide.)

If σ ≫ ∥m_t∥, then Pr(∥m_t + σ𝒩(0, Id)∥ < ∥m_t∥) is small
⇒ with large probability m_{t+1} = m_t (the sampled solution is not accepted)
⇒ stagnation in phase III.
Why Step-size Adaptation?
We need step-size adaptation to approach the optimum fast, i.e. to converge linearly.
Methods for Step-size Adaptation
1/5th success rule, typically applied with "+" selection
[Rechenberg, 73] [Schumer and Steiglitz, 78] [Devroye, 72]

σ-self-adaptation, applied with "," selection
random variation is applied to the step-size, and the better step-size, according to the objective function value, is selected
[Schwefel, 81]

path-length control or cumulative step-size adaptation (CSA), applied with "," selection
[Ostermeier et al. 94] [Hansen, Ostermeier, 2001]

two-point adaptation (TPA), applied with "," selection [Hansen 2008]
test two solutions in the direction of the mean shift, and increase or decrease the step-size accordingly
Step-size control: 1/5th Success Rule
probability of success per iteration:
p_s = (# candidate solutions better than m) / (# candidate solutions)
    = fraction of samples with f(x) ≤ f(m)
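A minimal sketch of a (1+1)-ES driven by a 1/5th success rule. The multiplicative update factors exp(0.8/n) on success and exp(−0.2/n) on failure are one common choice, assumed here for illustration: the expected change of ln σ is zero exactly when the success probability equals 1/5.

```python
import numpy as np

def one_plus_one_es_fifth(n=10, iterations=5000, seed=2):
    """(1+1)-ES with a 1/5th success rule on the sphere (illustrative sketch).

    On success, sigma is increased; on failure, it is decreased, so that
    sigma is stationary when the empirical success rate is 1/5.
    """
    rng = np.random.default_rng(seed)
    f = lambda x: np.sum(x ** 2)
    m = rng.standard_normal(n)
    sigma = 1.0
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(n)
        if f(x) <= f(m):                 # success: accept and increase sigma
            m = x
            sigma *= np.exp(0.8 / n)
        else:                            # failure: decrease sigma
            sigma *= np.exp(-0.2 / n)
    return m, sigma

m_final, sigma_final = one_plus_one_es_fifth()
```

Since 0.8·p − 0.2·(1 − p) = 0 at p = 1/5, the rule keeps the success probability near 1/5, which (on the sphere) keeps σ roughly proportional to the distance to the optimum.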
(1+1)-ES with One-fifth Success Rule - Convergence

[Hand-drawn figure: convergence plot of ∥m_t∥ in log-scale. What changes if we display f(m_t) instead?]

(1/t) ln ∥m_t∥ → −c, with c > 0, almost surely

GLOBAL LINEAR CONVERGENCE: in log-scale, we observe a line after an adaptation phase.

Proof too involved to be presented shortly.
Also proven for g ∘ f where f is strongly convex with Lipschitz gradient and g : Im(f) → ℝ strictly increasing.
Path Length Control - Cumulative Step-size Adaptation (CSA)

Step-size adaptation used in the (μ/μ_w, λ)-ES algorithm framework (in CMA-ES in particular).

Main Idea: compare the length of the evolution path (the cumulated successive mean shifts) to its expected length under random selection. If the path is longer than expected, consecutive steps point in a consistent direction and σ is increased; if it is shorter, consecutive steps cancel each other out and σ is decreased.
CSA-ES: The Equations

The update of the step-size in CSA-ES satisfies: if f = rand (random objective function, i.e. the ranking of the candidate solutions is uniformly random, sampled independently at each iteration) and p_σ^0 ∼ 𝒩(0, Id), then

E[log(σ_{t+1}) | σ_t, m_t] = log(σ_t),

i.e. the step-size update is unbiased on a random function.

Proof: candidate solutions x_i^t = m_t + σ_t y_i^t, where the y_i^t, i = 1, …, λ, are i.i.d. with y_i^t ∼ 𝒩(0, Id).

Define the ranking i:λ such that f(x_{1:λ}^t) ≤ … ≤ f(x_{λ:λ}^t), and y_{i:λ}^t such that x_{i:λ}^t = m_t + σ_t y_{i:λ}^t.

Under random selection, the selected steps (y_{1:λ}^t, …, y_{λ:λ}^t) have the same distribution as (y_1^t, …, y_λ^t), i.e. they are i.i.d. 𝒩(0, Id) and independent of the past.

Consequently
y_w^t = ∑_{i=1}^μ w_i y_{i:λ}^t ∼ 𝒩(0, (∑_{i=1}^μ w_i²) Id),
so with μ_w = 1 / ∑_{i=1}^μ w_i² we have √μ_w · y_w^t ∼ 𝒩(0, Id).

Therefore
p_σ^{t+1} = (1 − c_σ) p_σ^t + √(c_σ (2 − c_σ)) √μ_w y_w^t ∼ 𝒩(0, Id) if p_σ^t ∼ 𝒩(0, Id).

Hence, if f = rand and p_σ^0 ∼ 𝒩(0, Id), then p_σ^t ∼ 𝒩(0, Id) for all t.

Since
σ_{t+1} = σ_t exp( (c_σ / d_σ) ( ∥p_σ^{t+1}∥ / E∥𝒩(0, Id)∥ − 1 ) ),
log(σ_{t+1}) = log(σ_t) + (c_σ / d_σ) ( ∥p_σ^{t+1}∥ / E∥𝒩(0, Id)∥ − 1 ),

and taking the conditional expectation,
E[log(σ_{t+1}) | σ_t, m_t] = log(σ_t) + (c_σ / d_σ) ( E∥p_σ^{t+1}∥ / E∥𝒩(0, Id)∥ − 1 ) = log(σ_t),

because E∥p_σ^{t+1}∥ = E∥𝒩(0, Id)∥.
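The equations above can be collected into a short sketch of a (μ/μ_w, λ)-ES with CSA on the sphere. The parameter settings (weights, c_σ, d_σ, the approximation of E∥𝒩(0, Id)∥) follow common defaults and are assumptions of this sketch, not the exact constants of any particular CMA-ES implementation:

```python
import numpy as np

def csa_es(n=10, lam=10, iterations=800, seed=3):
    """Sketch of a (mu/mu_w, lambda)-ES with cumulative step-size adaptation."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sum(x ** 2)                  # sphere function
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                  # positive recombination weights
    mu_w = 1.0 / np.sum(w ** 2)                   # variance effective selection mass
    c_sigma = (mu_w + 2) / (n + mu_w + 5)         # cumulation factor
    d_sigma = 1 + np.sqrt(mu_w / n)               # damping
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # ~ E||N(0,Id)||
    m, sigma, p_sigma = rng.standard_normal(n), 1.0, np.zeros(n)
    for _ in range(iterations):
        y = rng.standard_normal((lam, n))         # y_i ~ N(0, Id)
        x = m + sigma * y                         # candidate solutions
        idx = np.argsort([f(xi) for xi in x])[:mu]  # select the mu best
        y_w = w @ y[idx]                          # weighted recombination
        m = m + sigma * y_w
        p_sigma = (1 - c_sigma) * p_sigma \
            + np.sqrt(c_sigma * (2 - c_sigma) * mu_w) * y_w   # evolution path
        sigma *= np.exp((c_sigma / d_sigma)
                        * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m, sigma

m_csa, sigma_csa = csa_es()
```

Note how the path update uses exactly the coefficients (1 − c_σ) and √(c_σ(2 − c_σ))√μ_w from the proof, so that p_σ stays 𝒩(0, Id)-distributed under random selection.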
Convergence of (μ/μ_w, λ)-CSA-ES

[Figure: 2×11 runs.]
Note: initial step-size taken too small (σ₀ = 10⁻²) to illustrate the step-size adaptation.
Optimal Step-size - Lower-bound for Convergence Rates

In the previous slides we have displayed some runs with "optimal" step-size.

Optimal step-size means step-size proportional to the distance to the optimum: σ_t = σ∥x − x⋆∥, where x⋆ is the optimum of the optimized function (and the constant σ is properly chosen).

The associated algorithm is not a real algorithm (it needs to know the distance to the optimum), but it gives bounds on convergence rates and allows us to compute many important quantities.

The goal for a step-size adaptive algorithm is to achieve convergence rates close to the one with optimal step-size.

We will formalize this in the context of the (1+1)-ES. Similar results can be obtained for other algorithm frameworks.
Optimal Step-size - Bound on Convergence Rate - (1+1)-ES

Consider a (1+1)-ES algorithm with any step-size adaptation mechanism:

X_{t+1} = X_t + σ_t 𝒩_{t+1}  if f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t),
X_{t+1} = X_t                otherwise,

with {𝒩_t, t ≥ 1} i.i.d. ∼ 𝒩(0, Id).

Equivalent writing:
X_{t+1} = X_t + σ_t 𝒩_{t+1} 1{f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t)}
Bound on Convergence Rate - (1+1)-ES

Theorem: For any objective function f : ℝⁿ → ℝ and for any y⋆ ∈ ℝⁿ,

E[ln ∥X_{t+1} − y⋆∥] ≥ E[ln ∥X_t − y⋆∥] − τ

where τ = max_{σ ∈ ℝ_{>0}} E[ln⁻ ∥e₁ + σ𝒩∥] =: max_σ φ(σ), with e₁ = (1, 0, …, 0) and 𝒩 ∼ 𝒩(0, Id).

Theorem: The convergence rate lower-bound is reached on spherical functions f(x) = g(∥x − x⋆∥) (with g : ℝ_{≥0} → ℝ strictly increasing) and step-size proportional to the distance to the optimum, σ_t = σ_opt ∥x − x⋆∥, with σ_opt such that φ(σ_opt) = τ.
[Hand-written proof of the lower bound, for y⋆ = 0.]

X_{t+1} = X_t + σ_t 𝒩_{t+1} 1{f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t)}, with 𝒩_{t+1} ∼ 𝒩(0, Id).

X_{t+1} is either equal to X_t or to X_t + σ_t 𝒩_{t+1}. Thus

∥X_{t+1}∥ ≥ min{ ∥X_t∥, ∥X_t + σ_t 𝒩_{t+1}∥ }
          = ∥X_t∥ min{ 1, ∥ X_t/∥X_t∥ + (σ_t/∥X_t∥) 𝒩_{t+1} ∥ }

and hence

ln ∥X_{t+1}∥ ≥ ln ∥X_t∥ + ln min{ 1, ∥ X_t/∥X_t∥ + (σ_t/∥X_t∥) 𝒩_{t+1} ∥ }.

By isotropy of 𝒩(0, Id), X_t/∥X_t∥ can be replaced by e₁ inside the expectation, so

E[ln ∥X_{t+1}∥ | σ_t, X_t] ≥ ln ∥X_t∥ + E[ln min{ 1, ∥e₁ + (σ_t/∥X_t∥) 𝒩∥ }]
                            = ln ∥X_t∥ − φ(σ_t/∥X_t∥)
                            ≥ ln ∥X_t∥ − max_{σ > 0} φ(σ)

where the progress rate is
φ(σ) = −E[ln min{ 1, ∥e₁ + σ𝒩∥ }] = E[ln⁻ ∥e₁ + σ𝒩∥].

[Figure: progress rate φ(σ) as a function of σ, maximized at σ_opt.]
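The progress rate φ(σ) = E[ln⁻ ∥e₁ + σ𝒩∥] has no simple closed form, but it can be estimated by plain Monte Carlo and maximized on a grid to locate σ_opt. The dimension, sample size, and grid below are illustrative assumptions:

```python
import numpy as np

def phi(sigma, n=10, samples=200_000, seed=4):
    """Monte-Carlo estimate of phi(sigma) = E[ln^-(||e1 + sigma*N(0,Id)||)],
    where ln^-(u) = max(0, -ln u)."""
    rng = np.random.default_rng(seed)
    v = sigma * rng.standard_normal((samples, n))
    v[:, 0] += 1.0                            # e1 + sigma * N(0, Id)
    norms = np.linalg.norm(v, axis=1)
    return np.maximum(0.0, -np.log(norms)).mean()

# grid search for the maximizer sigma_opt (common random numbers via fixed seed)
sigmas = np.linspace(0.02, 1.0, 25)
values = np.array([phi(s) for s in sigmas])
sigma_opt = float(sigmas[np.argmax(values)])
tau = float(values.max())
```

φ(σ) is non-negative by definition, small both for tiny σ (tiny steps) and large σ (improvements become rare), with a maximum at an intermediate σ_opt of order 1/n.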
Log-Linear Convergence of the Scale-Invariant Step-size ES

Theorem: The (1+1)-ES with step-size proportional to the distance to the optimum, σ_t = σ∥X_t∥, converges (log)-linearly on the sphere function f(x) = g(∥x∥) almost surely:

(1/t) ln( ∥X_t∥ / ∥X_0∥ ) → −φ(σ) =: CR_{(1+1)}(σ)  as t → ∞.
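The theorem can be observed empirically with a short sketch. The normalized step-size constant 0.12 (roughly of order 1/n for n = 10), the dimension, and the horizon are assumed choices:

```python
import numpy as np

def one_plus_one_es_scale_invariant(n=10, sigma_norm=0.12, iterations=4000, seed=5):
    """(1+1)-ES with step-size proportional to the distance to the optimum,
    sigma_t = sigma_norm * ||X_t||.

    Not a practical algorithm (it uses the unknown distance to the optimum),
    but it exhibits the log-linear convergence stated in the theorem.
    """
    rng = np.random.default_rng(seed)
    f = lambda x: np.sum(x ** 2)              # sphere function, optimum at 0
    x = rng.standard_normal(n)
    logd = [np.log(np.linalg.norm(x))]        # ln of distance to the optimum
    for _ in range(iterations):
        cand = x + sigma_norm * np.linalg.norm(x) * rng.standard_normal(n)
        if f(cand) <= f(x):                   # elitist ("+") selection
            x = cand
        logd.append(np.log(np.linalg.norm(x)))
    return np.array(logd)

logd = one_plus_one_es_scale_invariant()
rate = float((logd[-1] - logd[0]) / (len(logd) - 1))  # empirical (1/t) ln(||X_t||/||X_0||)
```

Plotting `logd` against t gives a straight line (after the initial transient), and `rate` is a negative constant: the empirical counterpart of −φ(σ).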
Asymptotic Results (n → ∞)

σ_t = σ∥X_t∥