The steady-state variants of the Multi-Objective Covariance Matrix Adaptation Evolution Strategy (SS-MO-CMA-ES) generate one offspring from a uniformly selected parent. This paper investigates other parental selection operators for SS-MO-CMA-ES. These operators involve the definition of multi-objective rewards, estimating the expectation of the offspring's survival and its hypervolume contribution. Two selection modes, respectively based on tournaments and inspired by the Multi-Armed Bandit framework, are used on top of these rewards. Extensive experimental validation comparatively demonstrates the merits of these new selection operators on unimodal MO problems.
Not all parents are equal for MO-CMA-ES
Ilya Loshchilov ¹,², Marc Schoenauer ¹,², Michèle Sebag ²,¹
¹ TAO Project-team, INRIA Saclay - Île-de-France
² Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors
Evolutionary Multi-Criterion Optimization - EMO 2011
Content
1 Motivation
   Why are some parents better than others?
   Multi-Objective Covariance Matrix Adaptation Evolution Strategy
2 Parent Selection Operators
   Tournament Selection (quality-based)
   Multi-armed Bandits
   Reward-based Parent Selection
3 Experimental validation
   Results
   Summary
Why are some parents better than others?
[Figure: two sketches in objective space, Objective 1 vs. Objective 2.]
Hard to find segments of the Pareto front
Some parents improve the fitness faster than others
Why MO-CMA-ES?
µ_MO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection
CMA-ES excellent on single-objective problems (e.g., BBOB)
In µ_MO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES ¹
[Figure: two panels in objective space, Objective 1 vs. Objective 2.]
Two typical adaptations for MO-CMA-ES
¹ C. Igel, N. Hansen, and S. Roth (2005). "The Multi-objective Variable Metric Evolution Strategy"
Evolution Loop for Steady-state MO-CMA-ES
Parent Selection
Variation
Evaluation
Environmental selection
Update / Adaptation
Tournament Selection (quality-based)
A total preorder relation ≺X is defined on any finite subset X:
x ≺_X y ⇔ PRank(x, X) < PRank(y, X)   // lower Pareto rank
or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X)   // same Pareto rank, higher hypervolume contribution

Tournament (µ +_t 1) selection for MOO
Input: tournament size t ∈ ℕ; population X of µ individuals
Procedure: uniformly select t individuals from X
Output: the best of the t individuals according to ≺_X
With t = 1: the standard steady-state MO-CMA-ES with random selection ᵃ
ᵃ C. Igel, T. Suttorp, and N. Hansen (EMO 2007). "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES"
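For concreteness, a minimal Python sketch of this tournament, assuming the Pareto ranks and hypervolume contributions of the current population are precomputed (the names prank and delta_h are illustrative, not from the paper):

```python
import random

def tournament_select(population, t, prank, delta_h):
    """(mu +_t 1) parent selection: draw t individuals uniformly at random
    and return the best under the preorder above: lower Pareto rank first,
    ties broken by larger hypervolume contribution Delta-H."""
    candidates = random.sample(population, t)
    return min(candidates, key=lambda i: (prank[i], -delta_h[i]))
```

With t = 1 this reduces to uniform random selection, i.e., the baseline steady-state MO-CMA-ES.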
Multi-armed Bandits
Original Multi-armed Bandits (MAB) problem
A gambler plays arm (machine) j at time t
and wins reward r_{j,t} = 1 with probability p, 0 with probability 1 − p
Goal: maximize the sum of rewards
Upper Confidence Bound (UCB) [Auer, 2002]
Initialization: play each arm once
Loop: play the arm j that maximizes
   r̄_{j,t} + C √( 2 ln Σ_k n_{k,t} / n_{j,t} )
where r̄_{j,t} is the average reward of arm j, and n_{j,t} the number of times arm j has been played
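A direct transcription of this rule, as a sketch (avg_reward and n_plays are illustrative names; reward bookkeeping is left to the caller):

```python
import math

def ucb_choose(avg_reward, n_plays, C=1.0):
    """UCB as on the slide: play every arm once first, then play the arm j
    maximizing avg_reward[j] + C * sqrt(2 * ln(sum_k n_plays[k]) / n_plays[j])."""
    for j, n in enumerate(n_plays):
        if n == 0:                       # initialization: unplayed arms first
            return j
    total = sum(n_plays)
    return max(range(len(n_plays)),
               key=lambda j: avg_reward[j]
               + C * math.sqrt(2.0 * math.log(total) / n_plays[j]))
```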
Reward-based Parent Selection
MAB for MOO
r̄_{j,t} is the average reward along a time window of size w
n_{i,t} = 0 for a new offspring or for an individual selected w steps ago
Select parent i:
   some i with n_{i,t} = 0, if one exists;
   otherwise i = argmax_j { r̄_{j,t} + C √( 2 ln Σ_k n_{k,t} / n_{j,t} ) }
At the moment, C = 0 (exploration iff n_{i,t} = 0); see the sketch after the figure below.
[Figure: one selection-reward cycle in objective space (Objective 1 vs. Objective 2): 1. ParentSelection(); 2. offspring = parent + mutation; 3. is the offspring successful?; 4. update; 5. ComputeRewards().]
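Putting the pieces together, a sketch of the MAB-based parent selection under the stated assumptions (one simplification: the window of size w counts an individual's own recent rewards rather than global time steps; the class and its bookkeeping are illustrative, not the authors' implementation):

```python
import math
import random
from collections import deque

class MABParentSelection:
    """Reward-averaged parent selection: each individual keeps its rewards
    over a sliding window of size w; an individual with an empty window
    (n_i = 0) is selected first; otherwise the UCB score decides
    (with C = 0 this is simply the best windowed average reward)."""

    def __init__(self, w, C=0.0):
        self.w, self.C = w, C
        self.rewards = {}                 # individual id -> recent rewards

    def register(self, ind):
        """A new offspring enters the population with an empty history."""
        self.rewards[ind] = deque(maxlen=self.w)

    def update(self, ind, reward):
        self.rewards[ind].append(reward)

    def select(self, population):
        unplayed = [i for i in population if not self.rewards[i]]
        if unplayed:                      # exploration iff n_i = 0
            return random.choice(unplayed)
        total = sum(len(self.rewards[i]) for i in population)
        def score(i):
            n = len(self.rewards[i])
            return (sum(self.rewards[i]) / n
                    + self.C * math.sqrt(2.0 * math.log(total) / n))
        return max(population, key=score)
```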
Defining Rewards I: (µ + 1_succ), (µ + 1_rank)
Parent a from the population Q^(g) generates offspring a′. Both the offspring and the parent receive reward r:
(µ + 1_succ)
If a′ becomes a member of the new population Q^(g+1):
r = 1 if a′ ∈ Q^(g+1), and 0 otherwise
(µ + 1_rank)
A smoother reward is defined by the rank of a′ in Q^(g+1):
r = 1 − rank(a′)/µ if a′ ∈ Q^(g+1), and 0 otherwise
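Both rewards are one-liners; a sketch, with rank(a′) counted within the new population (illustrative names):

```python
def reward_succ(survived):
    """(mu + 1_succ): 1 iff the offspring a' entered Q^(g+1), else 0."""
    return 1.0 if survived else 0.0

def reward_rank(survived, rank, mu):
    """(mu + 1_rank): 1 - rank(a')/mu if a' entered Q^(g+1), else 0;
    better-ranked offspring earn rewards closer to 1."""
    return 1.0 - rank / mu if survived else 0.0
```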
Defining Rewards II: (µ + 1_{∆H1}), (µ + 1_{∆Hi})
(µ + 1_{∆H1})
Set the reward to the increase of the total hypervolume contribution from generation g to g + 1:
r = 0 if the offspring is dominated,
r = Σ_{a ∈ Q^(g+1)} ∆H(a, Q^(g+1)) − Σ_{a ∈ Q^(g)} ∆H(a, Q^(g)) otherwise
(µ + 1_{∆Hi})
A relaxation of the above reward, involving a rank-based penalization:
r = (1 / 2^(k−1)) [ Σ_{a ∈ ndom_k(Q^(g+1))} ∆H(a, ndom_k(Q^(g+1))) − Σ_{a ∈ ndom_k(Q^(g))} ∆H(a, ndom_k(Q^(g))) ]
where k denotes the Pareto rank of the current offspring, and ndom_k(Q^(g)) is the k-th non-dominated front of Q^(g).
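To make ∆H concrete, a minimal two-objective sketch of the (µ + 1_{∆H1}) reward, assuming minimization and a reference point worse than every front member in both objectives (helper names are illustrative; the (µ + 1_{∆Hi}) variant applies the same computation to the k-th front, scaled by 1/2^(k−1)):

```python
def hv_contributions(front, ref):
    """Exclusive hypervolume contribution Delta-H of each point of a
    two-objective non-dominated front (minimization), w.r.t. reference
    point `ref`. With the front sorted by the first objective, each
    point's exclusive box is bounded by its neighbours (or by `ref`)."""
    pts = sorted(front)
    contrib = {}
    for i, (f1, f2) in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        upper = pts[i - 1][1] if i > 0 else ref[1]
        contrib[(f1, f2)] = (right - f1) * (upper - f2)
    return contrib

def reward_dH1(old_front, new_front, ref, offspring_dominated):
    """(mu + 1_{Delta H1}): increase of the total hypervolume contribution
    from Q^(g) to Q^(g+1); zero when the offspring is dominated."""
    if offspring_dominated:
        return 0.0
    return (sum(hv_contributions(new_front, ref).values())
            - sum(hv_contributions(old_front, ref).values()))
```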
Experimental validation
Algorithms
The steady-state MO-CMA-ES with modified parent selection:
- 2 tournament-based: (µ +_2 1) and (µ +_10 1);
- 4 MAB-based: (µ + 1_succ), (µ + 1_rank), (µ + 1_{∆H1}) and (µ + 1_{∆Hi}).
The baseline MO-CMA-ES:
- steady-state (µ + 1)-MO-CMA and (µ_≺ + 1)-MO-CMA, and generational (µ + µ)-MO-CMA.
Default parameters (µ = 100), 200,000 evaluations, 31 runs.
Benchmark Problems:
sZDT1:3-6, with the true Pareto front shifted in decision space: x′_i ← |x_i − 0.5| for 2 ≤ i ≤ n (see the sketch after this list)
IHR1:3-6, rotated variants of the original ZDT problems
LZ09-1:5, with complicated Pareto sets in decision space ²
² H. Li and Q. Zhang. "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II." IEEE TEC 2009
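The sZDT shift as code (a sketch; 0-based indexing, so x[1:] covers variables 2 ≤ i ≤ n, and the shifted problem is assumed to evaluate the original ZDT function on the transformed variables):

```python
import numpy as np

def szdt_shift(x):
    """sZDT transformation x'_i <- |x_i - 0.5| for 2 <= i <= n: moves the
    true Pareto front away from the boundary of the decision space."""
    x = np.array(x, dtype=float)
    x[1:] = np.abs(x[1:] - 0.5)
    return x
```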
Results: sZDT1, IHR1, LZ09-3 and LZ09-4
[Figure: four panels plotting ∆H_target (log scale) vs. number of function evaluations (0 to 10^5) on sZDT1, IHR1, LZ09-3 and LZ09-4, comparing (µ + 1), (µ_≺ + 1), (µ +_10 1), (µ + 1_{∆Hi}), (µ + 1_succ) and (µ + 1_rank).]
Results: (µ + 1), (µ +_10 1), (µ + 1_rank) on LZ problems
[Figure: populations in decision space (x1, x2, x3) on LZ09-1 to LZ09-5; three panels per problem, one per algorithm: (µ + 1), (µ +_10 1) and (µ + 1_rank).]
Loss of diversity
[Figure: two panels in objective space (Objective 1 vs. Objective 2) on sZDT2 (left) and IHR3 (right), showing all populations of MO-CMA-ES, the true Pareto front, and the Pareto front found by MO-CMA-ES.]
Typical behavior of (µ + 1_succ)-MO-CMA on the sZDT2 (left) and IHR3 (right) problems after 5,000 fitness function evaluations.
Summary
Speed-up (factor 2-5) with the MAB schemes (µ + 1_rank) and (µ + 1_{∆Hi}).
Loss of diversity, especially on multi-modal problems, with too greedy schemes: (µ + 1_succ) and (µ +_t 1).
Perspectives
Find "appropriate" C for exploration term r̄j,t + C√
2 ln∑
k nk,t
nj,t.
Allocate some evaluation budget to dominated arms (individuals generated in the past) to preserve diversity.
Integrate the reward mechanism with the update rules of (1+1)-CMA-ES, e.g., success rate and average reward in (µ + 1_rank).
Experiments on many-objective problems.
Thank you for your attention!
Please send your questions to {Ilya.Loshchilov, Marc.Schoenauer, Michele.Sebag}@inria.fr