Page 1

The challenges for stochastic optimization and a variable metric approach

Nikolaus Hansen, INRIA Saclay

Microsoft Research–INRIA Joint Centre, INRIA Saclay

April 6, 2009

Page 2

Content

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

5 Discussion

6 Evaluation

Einstein once spoke of the “unreasonable effectiveness of mathematics” in describing how the natural world works. Whether one is talking about basic physics, about the increasingly important environmental sciences, or the transmission of disease, mathematics is never any more, or any less, than a way of thinking clearly. As such, it always has been and always will be a valuable tool, but only valuable when it is part of a larger arsenal embracing analytic experiments and, above all, wide-ranging imagination.

Lord May

Page 3

Problem Statement
Continuous Domain Search/Optimization

Task: minimize an objective function (fitness function, loss function) in continuous domain

$f : X \subseteq \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto f(x)$

Black Box scenario (direct search scenario)

[figure: black box mapping $x \mapsto f(x)$]

gradients are not available or not useful
problem domain specific knowledge is used only within the black box, e.g. within an appropriate encoding

Search costs: number of function evaluations

Page 4

Problem Statement
Continuous Domain Search/Optimization

Goal
solution x with small function value with the least search cost

there are two conflicting objectives
fast convergence to the global optimum

. . . or to a robust solution x

Typical Examples

shape optimization (e.g. using CFD): curve fitting, airfoils

model calibration: biological, physical

parameter calibration: controller, plants, images

Approach: stochastic search, Evolutionary Algorithms

. . . metaphors

Page 5

Metaphors

Evolutionary Computation ←→ Optimization

genome ←→ decision variables, design variables, object variables

individual, offspring, parent ←→ candidate solution

population ←→ set of candidate solutions

fitness function ←→ objective function, loss function, cost function

generation ←→ iteration

. . . properties

Page 6

Objective Function Properties

We assume $f : X \subset \mathbb{R}^n \to \mathbb{R}$ to be non-linear, non-separable and to have at least moderate dimensionality, say $n \not\ll 10$. Additionally, $f$ can be

non-convex
non-smooth (derivatives do not exist)
discontinuous
ill-conditioned
multimodal (there are possibly many local optima)
noisy
. . .

Goal: cope with any of these function properties
they are related to real-world problems

. . . Tripeds

Page 7

Comparison of CMA-ES, IDEA and Simplex-Downhill

CMA-ES: Covariance Matrix Adaptation Evolution Strategy
IDEA: Iterated Density-Estimation Evolutionary Algorithm¹
Fminsearch: Nelder–Mead simplex downhill method²

see http://www.icos.ethz.ch/cse/research/highlights/Race.gif

. . . function properties

¹ Bosman (2003). Design and Application of Iterated Density-Estimation Evolutionary Algorithms. PhD thesis.
² Nelder and Mead (1965). A simplex method for function minimization. Computer Journal.

Page 8

What Makes a Function Difficult to Solve?
Why stochastic search?

ruggedness: non-smooth, discontinuous, multimodal, and/or noisy function

dimensionality: (considerably) larger than three

non-separability: dependencies between the objective variables

ill-conditioning: gradient direction $-f'(x)^T$ vs. Newton direction $-H^{-1} f'(x)^T$

[figure: cut from a 5-D solvable example]

Page 9

Separable Problems

Definition (Separable Problem)

A function f is separable if

$$\operatorname*{arg\,min}_{(x_1,\dots,x_n)} f(x_1, \dots, x_n) = \left( \operatorname*{arg\,min}_{x_1} f(x_1, \dots), \;\dots,\; \operatorname*{arg\,min}_{x_n} f(\dots, x_n) \right)$$

⇒ it follows that f can be optimized in a sequence of n independent 1-D optimization processes

Example: additively decomposable functions
$$f(x_1, \dots, x_n) = \sum_{i=1}^{n} f_i(x_i)$$

[figure: level sets of the (separable) Rastrigin function]
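To make the consequence concrete, here is a minimal sketch (assuming Python with numpy; not part of the original slides) that minimizes the 5-D Rastrigin function by five independent 1-D grid searches:

```python
# Exploiting separability: n independent 1-D searches instead of one
# n-D search, here with a plain grid search per coordinate.
import numpy as np

def rastrigin(x):
    x = np.asarray(x)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

n = 5
grid = np.linspace(-3, 3, 601)       # 1-D candidate values, spacing 0.01
x_opt = np.empty(n)
for i in range(n):                   # n independent 1-D optimizations
    x = np.zeros(n)
    values = []
    for v in grid:
        x[i] = v
        values.append(rastrigin(x))
    x_opt[i] = grid[np.argmin(values)]

print(x_opt, rastrigin(x_opt))       # close to the global optimum at 0
```

Any dependency between the coordinates, e.g. after the rotation shown on the next slide, breaks this decomposition.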

Page 10

Non-Separable Problems
Building a non-separable problem from a separable one

Rotating the coordinate system

$f : x \mapsto f(x)$ separable
$f : x \mapsto f(Rx)$ non-separable
where $R$ is a rotation matrix

[figure: level sets of a separable function and of the same function under a rotation R]³ ⁴

³ Hansen, Ostermeier, Gawelczyk (1995). On the adaptation of arbitrary normal mutation distributions in evolution strategies: The generating set adaptation. Sixth ICGA, pp. 57–64, Morgan Kaufmann.
⁴ Salomon (1996). “Reevaluating Genetic Algorithm Performance under Coordinate Rotation of Benchmark Functions; A survey of some theoretical and practical aspects of genetic algorithms.” BioSystems, 39(3):263–278.

Page 11

Curse of Dimensionality

The term curse of dimensionality (Richard Bellman) refers to problems caused by the rapid increase in volume associated with adding extra dimensions to a (mathematical) space.

Example: Consider placing 100 points onto a real interval, say [−1, 1]. To get similar coverage, in terms of distance between adjacent points, of the 10-dimensional space [−1, 1]¹⁰ would require 100¹⁰ = 10²⁰ points. The 100 points now appear as isolated points in a vast empty space.

Consequently, a search policy (e.g. exhaustive search) that is valuable in small dimensions might be useless in moderate or large dimensional search spaces.
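The arithmetic of the example in a few lines of Python (an illustration, not from the slides):

```python
# Points needed to keep the spacing of a 100-point grid on [-1, 1]
# when the dimension n grows: the count explodes as 100**n.
for n in (1, 2, 3, 5, 10):
    print(f"n = {n:2d}: {100**n:.0e} points")
```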

Page 12

Ill-Conditioned Problems
Curvature of level sets

Consider the convex-quadratic function
$$f(x) = \tfrac{1}{2} (x - x^*)^T H (x - x^*)$$

gradient direction $-f'(x)^T$
Newton direction $-H^{-1} f'(x)^T$

[figure: level sets; the condition number equals nine here.] Condition numbers between 100 and even $10^{10}$ can often be observed in real-world problems.

If $H \approx I$ (small condition number of $H$), first order information (e.g. the gradient) is sufficient. Otherwise second order information (estimation of $H^{-1}$) is required.
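A small numeric illustration (assuming numpy; $H$ and $x$ are chosen freely here): on an ill-conditioned quadratic the negative gradient is dominated by the high-curvature axis, while the Newton direction points exactly at the optimum.

```python
import numpy as np

H = np.diag([1.0, 9.0])             # condition number 9, as on the slide
x_star = np.zeros(2)                # optimum
x = np.array([1.0, 1.0])

grad = H @ (x - x_star)             # f'(x)^T for f(x) = 1/2 (x-x*)^T H (x-x*)
newton = -np.linalg.solve(H, grad)  # -H^{-1} f'(x)^T

print(-grad)    # [-1. -9.]  biased toward the high-curvature axis
print(newton)   # [-1. -1.]  points exactly to x*
```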

Page 13

What Makes a Function Difficult to Solve?
. . . and what can be done

Challenge ←→ Approach in Evolutionary Computation

Ruggedness ←→ non-local policy, large sampling width (step-size)
  (as large as possible while preserving a reasonable convergence speed)
stochastic, non-elitistic, population-based method
  (the recombination operator serves as repair mechanism)

Dimensionality, Non-Separability ←→ exploiting the problem structure
  (locality, neighborhood, encoding)

Ill-conditioning ←→ second order approach
  (changes the neighborhood metric)

Page 14

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
  Covariance Matrix Adaptation
  Cumulation—the Evolution Path
  Step-Size Control

5 Discussion

6 Evaluation

Page 15

Stochastic Search

A black box search template to minimize $f : \mathbb{R}^n \to \mathbb{R}$
Initialize distribution parameters $\theta$, set population size $\lambda \in \mathbb{N}$
While not terminate

1. Sample distribution $P(x \mid \theta) \to x_1, \dots, x_\lambda \in \mathbb{R}^n$
2. Evaluate $x_1, \dots, x_\lambda$ on $f$
3. Update parameters $\theta \leftarrow F_\theta(\theta, x_1, \dots, x_\lambda, f(x_1), \dots, f(x_\lambda))$

Everything depends on the definition of $P$ and $F_\theta$
deterministic algorithms are covered as well

In Evolutionary Algorithms the distribution $P$ is often implicitly defined via operators on a population, in particular selection, recombination and mutation
Natural template for Estimation of Distribution Algorithms
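The template translates almost verbatim into code. A sketch (assuming Python; `sample` and `update` are placeholders standing in for the problem-specific choices of $P$ and $F_\theta$):

```python
def stochastic_search(f, sample, update, theta, lam, iterations=100):
    """Minimize f with the black box search template.

    sample(theta, lam) -> list of lam candidate solutions
    update(theta, X, F) -> new distribution parameters theta
    """
    for _ in range(iterations):
        X = sample(theta, lam)           # 1. sample P(x | theta)
        F = [f(x) for x in X]            # 2. evaluate on f
        theta = update(theta, X, F)      # 3. update parameters
    return theta
```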

Page 16

Stochastic Search

A black box search template to minimize $f : \mathbb{R}^n \to \mathbb{R}$
Initialize distribution parameters $\theta$, set population size $\lambda \in \mathbb{N}$
While not terminate

1. Sample distribution $P(x \mid \theta) \to x_1, \dots, x_\lambda \in \mathbb{R}^n$
2. Evaluate $x_1, \dots, x_\lambda$ on $f$
3. Update parameters $\theta \leftarrow F_\theta(\theta, x_1, \dots, x_\lambda, f(x_1), \dots, f(x_\lambda))$

In the following

$P$ is a multivariate normal distribution $\mathcal{N}(m, \sigma^2 C) \sim m + \sigma\, \mathcal{N}(0, C)$

$\theta = \{m, C, \sigma\} \in \mathbb{R}^n \times \mathbb{R}^{n \times n} \times \mathbb{R}_+$

$F_\theta = F_\theta(\theta, x_{1:\lambda}, \dots, x_{\mu:\lambda})$, where $\mu \le \lambda$ and $x_{i:\lambda}$ is the $i$-th best of the $\lambda$ points

. . . why?

Page 17

Normal Distribution

[figure: probability density of the 1-D standard normal distribution]

[figure: probability density of a 2-D normal distribution]

Page 18

The Multi-Variate (n-Dimensional) Normal Distribution

Any multivariate normal distribution $\mathcal{N}(m, C)$ is uniquely determined by its mean value $m \in \mathbb{R}^n$ and its symmetric positive definite $n \times n$ covariance matrix $C$.

The mean value $m$
  determines the displacement (translation)
  is the value with the largest density (modal value)
  the distribution is symmetric about the distribution mean

[figure: 1-D standard normal density]

The covariance matrix C

determines the shape

has a valuable geometrical interpretation: any covariance matrix can be uniquely identified with the iso-density ellipsoid $\{x \in \mathbb{R}^n \mid x^T C^{-1} x = 1\}$

Page 19

. . . any covariance matrix can be uniquely identified with the iso-density ellipsoid $\{x \in \mathbb{R}^n \mid x^T C^{-1} x = 1\}$

Lines of Equal Density

$\mathcal{N}(m, \sigma^2 I) \sim m + \sigma\, \mathcal{N}(0, I)$
  one degree of freedom $\sigma$
  components are independent, standard normally distributed

$\mathcal{N}(m, D^2) \sim m + D\, \mathcal{N}(0, I)$
  $n$ degrees of freedom
  components are independent, scaled

$\mathcal{N}(m, C) \sim m + C^{\frac{1}{2}}\, \mathcal{N}(0, I)$
  $(n^2 + n)/2$ degrees of freedom
  components are correlated

where $I$ is the identity matrix (isotropic case), $D$ is a diagonal matrix (reasonable for separable problems), and $A\, \mathcal{N}(0, I) \sim \mathcal{N}(0, A A^T)$ holds for all $A$.

. . . CMA
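The identity $A\,\mathcal{N}(0, I) \sim \mathcal{N}(0, A A^T)$ is also how the distribution is sampled in practice. A sketch (assuming numpy), using the Cholesky factor of $C$ as one valid choice of $A$:

```python
import numpy as np

def sample_gaussian(m, sigma, C, lam, rng=np.random.default_rng()):
    """Draw lam samples from N(m, sigma^2 C) as m + sigma * A N(0, I)."""
    A = np.linalg.cholesky(C)                 # A @ A.T == C
    Z = rng.standard_normal((lam, len(m)))    # lam rows of N(0, I)
    return m + sigma * Z @ A.T                # rows ~ N(m, sigma^2 C)
```

Any $A$ with $A A^T = C$ works; CMA-ES implementations typically use the eigendecomposition $C = B D^2 B^T$ instead, so that both $C^{1/2} = B D B^T$ and $C^{-1/2}$ are available.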

Page 20

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
  Covariance Matrix Adaptation
  Cumulation—the Evolution Path
  Step-Size Control

5 Discussion

6 Evaluation

Page 21

Stochastic Search

A black box search template to minimize $f : \mathbb{R}^n \to \mathbb{R}$
Initialize distribution parameters $\theta$, set population size $\lambda \in \mathbb{N}$
While not terminate

1. Sample distribution $P(x \mid \theta) \to x_1, \dots, x_\lambda \in \mathbb{R}^n$
2. Evaluate $x_1, \dots, x_\lambda$ on $f$
3. Update parameters $\theta \leftarrow F_\theta(\theta, x_1, \dots, x_\lambda, f(x_1), \dots, f(x_\lambda))$

$P$ is a multivariate normal distribution $\mathcal{N}(m, \sigma^2 C) \sim m + \sigma\, \mathcal{N}(0, C)$

. . . sampling

Page 22

Sampling New Search Points
The Mutation Operator

New search points are sampled normally distributed

$$x_i \sim m + \sigma\, \mathcal{N}_i(0, C) \quad \text{for } i = 1, \dots, \lambda$$
as perturbations of $m$, where $x_i, m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, and $C \in \mathbb{R}^{n \times n}$

where

the mean vector $m \in \mathbb{R}^n$ represents the favorite solution
the so-called step-size $\sigma \in \mathbb{R}_+$ controls the step length
the covariance matrix $C \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid

The question remains how to update m, C, and σ.

Page 23

Update of the Distribution Mean m
Selection and Recombination

Given the $i$-th solution point $x_i = m + \sigma \underbrace{\mathcal{N}_i(0, C)}_{=:\, y_i} = m + \sigma\, y_i$

Let $x_{i:\lambda}$ be the $i$-th ranked solution point, such that $f(x_{1:\lambda}) \le \dots \le f(x_{\lambda:\lambda})$. The new mean reads

$$m \leftarrow \sum_{i=1}^{\mu} w_i\, x_{i:\lambda} = m + \sigma \underbrace{\sum_{i=1}^{\mu} w_i\, y_{i:\lambda}}_{=:\, y_w}$$
where $w_1 \ge \dots \ge w_\mu > 0$ and $\sum_{i=1}^{\mu} w_i = 1$

The best $\mu$ points are selected from the sampled solutions (non-elitistic) and a weighted mean is taken.
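A sketch of this selection-and-recombination step (assuming numpy; `weights` holds $w_1 \ge \dots \ge w_\mu$ summing to one):

```python
import numpy as np

def update_mean(X, F, weights):
    """X: (lam, n) sampled points, F: their f-values, weights: (mu,)."""
    idx = np.argsort(F)[: len(weights)]               # the mu best points
    return np.sum(weights[:, None] * X[idx], axis=0)  # m <- sum w_i x_{i:lam}
```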

Page 24

Covariance Matrix Adaptation
Rank-One Update

$m \leftarrow m + \sigma\, y_w$, $\quad y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$, $\quad y_i \sim \mathcal{N}_i(0, C)$

new distribution: $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$

the ruling principle: the adaptation increases the likelihood of successful steps, $y_w$, to appear again

. . . equations

Page 25

Preliminary Set of Equations
Covariance Matrix Adaptation with Rank-One Update

Initialize $m \in \mathbb{R}^n$ and $C = I$, set $\sigma = 1$, learning rate $c_{\mathrm{cov}} \approx 2/n^2$

While not terminate

$x_i = m + \sigma\, y_i$, $\quad y_i \sim \mathcal{N}_i(0, C)$, $\quad i = 1, \dots, \lambda$

$m \leftarrow m + \sigma\, y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$

$C \leftarrow (1 - c_{\mathrm{cov}})\, C + c_{\mathrm{cov}}\, \mu_w \underbrace{y_w y_w^T}_{\text{rank-one}}$ where $\mu_w = \dfrac{1}{\sum_{i=1}^{\mu} w_i^2} \ge 1$

$\lambda$ can be small
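One iteration of this preliminary algorithm as a sketch (assuming numpy; the choice of `f`, `lam` and the weights is left to the caller):

```python
import numpy as np

def rank_one_iteration(f, m, sigma, C, weights, lam, rng):
    n, mu = len(m), len(weights)
    mu_w = 1.0 / np.sum(weights**2)            # variance effective selection mass
    c_cov = 2.0 / n**2                         # learning rate ~ 2/n^2

    Y = rng.multivariate_normal(np.zeros(n), C, size=lam)  # y_i ~ N(0, C)
    X = m + sigma * Y                          # x_i = m + sigma y_i
    idx = np.argsort([f(x) for x in X])[:mu]   # the mu best points
    y_w = weights @ Y[idx]                     # y_w = sum w_i y_{i:lam}

    m = m + sigma * y_w                                      # update mean
    C = (1 - c_cov) * C + c_cov * mu_w * np.outer(y_w, y_w)  # rank-one update
    return m, C
```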

Page 26

$$C \leftarrow (1 - c_{\mathrm{cov}})\, C + c_{\mathrm{cov}}\, \mu_w\, y_w y_w^T$$

The covariance matrix adaptation

learns all pairwise dependencies between variables
  (off-diagonal entries in the covariance matrix reflect the dependencies)

conducts a principal component analysis (PCA) of steps $y_w$, sequentially in time and space
  (eigenvectors of the covariance matrix $C$ are the principal components / the principal axes of the mutation ellipsoid)

approximates the inverse Hessian on convex-quadratic functions
  (overwhelming empirical evidence, proof is in progress)

learns a new, rotated problem representation and a new variable metric (Mahalanobis)
  (components are independent (only) in the new representation; rotationally invariant; equivalent with an adaptive (general) linear encodingᵃ)

ᵃ Hansen 2000. Invariance, Self-Adaptation and Correlated Mutations in Evolution Strategies. PPSN VI.

. . . cumulation, step-size control

Page 27

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
  Covariance Matrix Adaptation
  Cumulation—the Evolution Path
  Step-Size Control

5 Discussion

6 Evaluation

Page 28

Cumulation
The Evolution Path

Evolution Path

Conceptually, the evolution path is the path the strategy mean m takes over a number of generation steps.

An exponentially weighted sum of steps $y_w$ is used:
$$p_c \propto \sum_{i=0}^{g} \underbrace{(1 - c_c)^{g-i}}_{\substack{\text{exponentially} \\ \text{fading weights}}} y_w^{(i)}$$

The recursive construction of the evolution path (cumulation):
$$p_c \leftarrow \underbrace{(1 - c_c)}_{\text{decay factor}} p_c + \underbrace{\sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}}_{\text{normalization factor}} \underbrace{y_w}_{\text{input, } \frac{m - m_{\text{old}}}{\sigma}}$$
where $\mu_w = \frac{1}{\sum w_i^2}$ and $c_c \ll 1$. History information is accumulated in the evolution path.
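In code the recursion is one line (a sketch, assuming numpy arrays):

```python
import numpy as np

def cumulate(p_c, y_w, c_c, mu_w):
    # (1 - c_c) is the decay factor; the square-root terms normalize so
    # that p_c ~ N(0, C) stays stationary under random selection
    return (1 - c_c) * p_c + np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
```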

Page 29

“Cumulation” is a widely used technique and is also known as
exponential smoothing in time series and forecasting
exponentially weighted moving average
iterate averaging in stochastic approximation
momentum in the back-propagation algorithm for ANNs
. . .

. . . why?

Page 30

Cumulation
Utilizing the Evolution Path

We used $y_w y_w^T$ for updating $C$. Because $y_w y_w^T = -y_w (-y_w)^T$, the sign of $y_w$ is neglected. The sign information is (re-)introduced by using the evolution path:
$$p_c \leftarrow \underbrace{(1 - c_c)}_{\text{decay factor}} p_c + \underbrace{\sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}}_{\text{normalization factor}} y_w$$
where $\mu_w = \frac{1}{\sum w_i^2}$ and $c_c \ll 1$.

. . . equations

Page 31

Preliminary Set of Equations (2)
Covariance Matrix Adaptation, Rank-One Update with Cumulation

Initialize $m \in \mathbb{R}^n$, $C = I$, and $p_c = 0 \in \mathbb{R}^n$; set $\sigma = 1$, $c_c \approx 4/n$, $c_{\mathrm{cov}} \approx 2/n^2$

While not terminate

$x_i = m + \sigma\, y_i$, $\quad y_i \sim \mathcal{N}_i(0, C)$, $\quad i = 1, \dots, \lambda$

$m \leftarrow m + \sigma\, y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$

$p_c \leftarrow (1 - c_c)\, p_c + \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\; y_w$

$C \leftarrow (1 - c_{\mathrm{cov}})\, C + c_{\mathrm{cov}} \underbrace{p_c\, p_c^T}_{\text{rank-one}}$

. . . $O(n^2)$ to $O(n)$

Page 32

Using an evolution path for the rank-one update of the covariance matrix reduces the number of function evaluations to adapt to a straight ridge from $O(n^2)$ to $O(n)$.ᵃ

ᵃ Hansen, Müller and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1–18.

The overall model complexity is $n^2$, but important parts of the model can be learned in time of order $n$.

. . . step-size

Page 33

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
  Covariance Matrix Adaptation
  Cumulation—the Evolution Path
  Step-Size Control

5 Discussion

6 Evaluation

Page 34

Path Length Control
The Concept

$x_i = m + \sigma\, y_i$
$m \leftarrow m + \sigma\, y_w$

Measure the length of the evolution path
  the pathway of the mean vector $m$ in the generation sequence

[figures: short evolution path ⇒ decrease $\sigma$; long evolution path ⇒ increase $\sigma$]

loosely speaking steps are
  perpendicular under random selection (in expectation)
  perpendicular in the desired situation (to be most efficient)

Page 35

Summary
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in a Nutshell

1. Multivariate normal distribution to generate new search points
     follows the maximum entropy principle
2. Selection only based on the ranking of the f-values
     preserves invariance
3. Covariance matrix adaptation (CMA) increases the likelihood of previously successful steps
     learning all pairwise dependencies ⇒ adapts a variable metric ⇒ new (rotated) problem representation
4. An evolution path (a non-local trajectory)
     enhances the covariance matrix (rank-one) adaptation, which sometimes yields linear time complexity
     controls the step-size (step length), aiming at conjugate perpendicularity

Page 36

Summary of Equations
The Covariance Matrix Adaptation Evolution Strategy

Initialize $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $C = I$, and $p_c = 0$, $p_\sigma = 0$;
set $c_c \approx 4/n$, $c_\sigma \approx 4/n$, $c_1 \approx 2/n^2$, $c_\mu \approx \mu_w/n^2$, $c_1 + c_\mu \le 1$, $d_\sigma \approx 1 + \sqrt{\mu_w/n}$;
set $\lambda$ and $w_i$, $i = 1, \dots, \mu$, such that $\mu_w \approx 0.3\,\lambda$

While not terminate

$x_i = m + \sigma\, y_i$, $\quad y_i \sim \mathcal{N}_i(0, C)$ (sampling)

$m \leftarrow m + \sigma\, y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$ (update mean)

$p_c \leftarrow (1 - c_c)\, p_c + \mathbb{1}_{\{\|p_\sigma\| < 1.5\sqrt{n}\}}\, \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\; y_w$ (cumulation for $C$)

$p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma + \sqrt{1 - (1 - c_\sigma)^2}\, \sqrt{\mu_w}\; C^{-\frac{1}{2}}\, y_w$ (cumulation for $\sigma$)

$C \leftarrow (1 - c_1 - c_\mu)\, C + c_1\, p_c\, p_c^T + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}\, y_{i:\lambda}^T$ (update $C$)

$\sigma \leftarrow \sigma \times \exp\!\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\|p_\sigma\|}{E\|\mathcal{N}(0, I)\|} - 1\right)\right)$ (update $\sigma$)
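For concreteness, here is a compact sketch of the whole loop in Python/numpy. It follows the summary of equations above; the population size, the weights, and the exact constants are common default choices and an assumption, not prescribed by the slide:

```python
import numpy as np

def cma_es(f, m, sigma, iterations=500, rng=np.random.default_rng(0)):
    n = len(m)
    lam = 4 + int(3 * np.log(n))              # population size (assumed default)
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                              # w_1 >= ... >= w_mu > 0, sum = 1
    mu_w = 1.0 / np.sum(w**2)                 # roughly 0.3 * lam
    c_c = c_sigma = 4.0 / n
    c_1 = 2.0 / n**2
    c_mu = min(mu_w / n**2, 1 - c_1)          # ensures c_1 + c_mu <= 1
    d_sigma = 1 + np.sqrt(mu_w / n)
    chi_n = np.sqrt(n) * (1 - 1/(4*n) + 1/(21*n**2))   # E||N(0, I)||
    C, p_c, p_sigma = np.eye(n), np.zeros(n), np.zeros(n)

    for _ in range(iterations):
        D2, B = np.linalg.eigh(C)                      # C = B diag(D2) B^T
        D = np.sqrt(D2)
        Y = (B * D) @ rng.standard_normal((n, lam))    # columns y_i ~ N(0, C)
        X = m + sigma * Y.T                            # sampling
        idx = np.argsort([f(x) for x in X])[:mu]       # mu best, by f-value
        Yb = Y[:, idx]
        y_w = Yb @ w                                   # sum of w_i y_{i:lam}
        m = m + sigma * y_w                            # update mean
        # cumulation for sigma, with C^{-1/2} y_w = B D^{-1} B^T y_w
        p_sigma = (1 - c_sigma) * p_sigma \
            + np.sqrt(1 - (1 - c_sigma)**2) * np.sqrt(mu_w) * (B @ (B.T @ y_w / D))
        h = float(np.linalg.norm(p_sigma) < 1.5 * np.sqrt(n))
        # cumulation for C (stalled while p_sigma is long)
        p_c = (1 - c_c) * p_c + h * np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
        # rank-one plus rank-mu update of C
        C = (1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c) + c_mu * (Yb * w) @ Yb.T
        # step-size update
        sigma *= np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m
```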

Page 37

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

5 Discussion

6 Evaluation

Page 38

Experimentum Crucis
What did we want to achieve?

reduce any convex-quadratic function
$$f(x) = x^T H x, \quad \text{e.g. } f(x) = \sum_{i=1}^{n} 10^{6 \frac{i-1}{n-1}}\, x_i^2$$
to the sphere model
$$f(x) = x^T x$$
without use of derivatives

lines of equal density align with lines of equal fitness
$$C \propto H^{-1}$$
in a stochastic sense
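Running the sketch from the summary slide on this ellipsoid reproduces the experiment qualitatively (this assumes `cma_es` from the earlier sketch is in scope):

```python
import numpy as np

n = 9
f = lambda x: sum(10**(6 * i / (n - 1)) * x[i]**2 for i in range(n))
x = cma_es(f, m=np.ones(n), sigma=1.0, iterations=700)
print(f(x))   # small values, approaching the precision seen in the plots
```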

Page 39

Experimentum Crucis (1)
f convex-quadratic, separable

[figure: a typical run in 9-D, reaching f ≈ 7.6e−15 after ~6000 evaluations; panels show abs(f), f − min(f), σ and the axis ratio; the object variables; the principal axes lengths of C; and the coordinate-wise standard deviations divided by σ]

$$f(x) = \sum_{i=1}^{n} 10^{\alpha \frac{i-1}{n-1}}\, x_i^2, \quad \alpha = 6$$

. . . crucis rotated

Page 40

Experimentum Crucis (2)
f convex-quadratic, as before but non-separable (rotated)

[figure: a typical run on the rotated ellipsoid in 9-D, reaching f ≈ 2.0e−15; the principal axes lengths of C adapt exactly as in the separable case, while the coordinate-wise standard deviations stay close to each other]

$C \propto H^{-1}$ for all $g$, $H$, where $f(x) = g(x^T H x)$ and $g : \mathbb{R} \to \mathbb{R}$ is strictly monotonic

. . . on convergence

Page 41

On Global Convergence

convergence on a very broad class of functions, e.g. for Monte Carlo pure random search
  very slow

convergence with practically feasible convergence rates on, e.g., $\|x\|^\alpha$
  Markov chain analysis

Stability/Stationarity/Ergodicity of a Markov chain
  the Markov chain always returns to “the center” of the state space (recurrence)
  the chain exhibits an invariant measure, a limit probability distribution
implies convergence/divergence

Page 42

The Convergence Rate
Optimal convergence rate can be achieved

The convergence rate for evolution strategies on $f(x) = g(\|x - x^*\|)$ in iteration $t$ reads⁵
$$\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} \log \frac{\|m_k - x^*\|}{\|m_{k-1} - x^*\|} \propto -\frac{1}{n}$$

[figure: a typical run in 10-D illustrating the linear convergence of $\|m_t - x^*\|$]

loosely
$$\|m_t - x^*\| \propto \exp\left(-\frac{t}{n}\right) = \left(\frac{1}{e^t}\right)^{1/n}$$
random search exhibits $\|m_t - x^*\| \propto \left(\frac{1}{t}\right)^{1/n}$, which is the lower bound for randomized direct search with isotropic sampling⁶

⁵ Auger 2005
⁶ Jägersküpper 2008

Page 43

Convergence of the Covariance Matrix
Yet to be proven

Theorem (convergence of covariance matrix C)
Given the function
$$f(x) = g(x^T H x)$$
where $H$ is positive definite and $g$ is monotonic, we have
$$E(C) \propto H^{-1}$$
where the expectation is taken with respect to the invariant measure

without use of derivatives

[figure: the run on the separable ellipsoid from Experimentum Crucis (1)]

Page 44

1 Introduction

2 The Challenges

3 Stochastic Search

4 The Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

5 Discussion

6 Evaluation

Page 45

Evaluation/Selection of Search Algorithms

Evaluation (of the performance) of a search algorithm needs

meaningful quantitative measure on benchmark functions or real-world problems

account for meta-parameter tuning
  (can be quite expensive)

account for invariance properties (symmetries)
  (prediction of performance is based on “similarity”, ideally on equivalence classes of functions)

account for algorithm internal cost
  (often negligible, depending on the objective function cost)

Page 46

Comparison to BFGS, NEWUOA, PSO and DE (1)
f convex-quadratic, separable with varying α

[figure: SP1 vs. condition number; Ellipsoid, dimension 20, 21 trials, tolerance 1e−09, eval max 1e+07; algorithms: NEWUOA, BFGS, DE2, PSO, CMAES]

$f(x) = g(x^T H x)$ with $g$ the identity (BFGS, NEWUOA) or $g$ order-preserving (strictly increasing; all others)

SP1 = average number of objective function evaluations to reach the target function value of $10^{-9}$

. . . population size, invariance

Page 47

Comparison to BFGS, NEWUOA, PSO and DE (2)
f convex-quadratic, non-separable (rotated) with varying α

[figure: SP1 vs. condition number; Rotated Ellipsoid, dimension 20, 21 trials, tolerance 1e−09, eval max 1e+07; algorithms: NEWUOA, BFGS, DE2, PSO, CMAES]

$f(x) = g(x^T H x)$ with $g$ the identity (BFGS, NEWUOA) or $g$ order-preserving (strictly increasing; all others)

SP1 = average number of objective function evaluations to reach the target function value of $10^{-9}$

. . . population size, invariance

Page 48

Comparison to BFGS, NEWUOA, PSO and DE (3)
f non-convex, non-separable (rotated) with varying α

[figure: SP1 vs. condition number; sqrt of sqrt of rotated ellipsoid, dimension 20, 21 trials, tolerance 1e−09, eval max 1e+07; algorithms: NEWUOA, BFGS, DE2, PSO, CMAES]

$f(x) = g(x^T H x)$ with $g(\cdot) = (\cdot)^{1/4}$ (BFGS, NEWUOA) or $g$ any order-preserving transformation (strictly increasing; all others)

SP1 = average number of objective function evaluations to reach the target function value of $10^{-9}$

. . . population size, invariance

Page 49

Invariance
The short version

The grand aim of all science is to cover the greatest number of empirical facts by logical deduction from the smallest number of hypotheses or axioms.

— Albert Einstein

[figures: $f(x) = x^2$, $f(x) = g_1(x^2)$, and $f(x) = g_2(x^2)$ for two strictly increasing transformations $g_1$, $g_2$]

all three functions are equivalent for rank-based search methods
  large equivalence class

invariance allows a safe generalization of empirical results, here from $f(x) = x^2$ (left) to any $f(x) = g(x^2)$, where $g$ is monotonic
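The equivalence is easy to check numerically: rank-based selection only ever looks at the ordering of the f-values, which any strictly increasing g preserves (a small sketch, assuming numpy):

```python
import numpy as np

f_values = np.array([4.0, 0.25, 1.0, 9.0])
for g in (lambda y: y, np.sqrt, lambda y: np.log(1 + y)):
    print(np.argsort(g(f_values)))   # identical ranking every time: [1 2 0 3]
```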

Page 50

Comprehensive Comparison of 11 Algorithms
Empirical Distribution of Normalized Success Performance

[figure, n = 10: empirical distribution over all functions of FEs / FEs_best; legend (top to bottom): G-CMA-ES, DE, L-SaDE, DMS-L-PSO, EDA, BLX-GL50, L-CMA-ES, K-PCX, SPC-PNX, BLX-MA, CoEVO]

[figure, n = 30: empirical distribution over all functions of FEs / FEs_best; legend (top to bottom): G-CMA-ES, K-PCX, L-CMA-ES, DMS-L-PSO, L-SaDE, EDA, SPC-PNX, BLX-GL50, CoEVO, BLX-MA, DE]

$$\mathrm{FEs} = \mathrm{mean}(\#\mathrm{fevals}) \times \frac{\#\text{all runs } (25)}{\#\text{successful runs}}, \quad \text{where } \#\mathrm{fevals} \text{ includes only successful runs.}$$

Shown: empirical distribution function of the success performance FEs divided by the FEs of the best algorithm on the respective function.

Results of all functions are used where at least one algorithm was successful at least once, i.e. where the target function value was reached in at least one experiment (out of 11 × 25 experiments).

Small values for FEs, and therefore large (cumulative frequency) values in the graphs, are preferable.
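The measure transcribed directly into Python (a hypothetical helper, not from the slides; `evals_successful` holds the evaluation counts of the successful runs only):

```python
import numpy as np

def success_performance(evals_successful, n_runs=25):
    """FEs = mean(#fevals of successful runs) * #all runs / #successful runs."""
    evals = np.asarray(evals_successful, dtype=float)
    return evals.mean() * n_runs / len(evals)
```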

Page 51

Thank you!

http://www.lri.fr/~hansen/cmaesintro.html
or google NIKOLAUS HANSEN
