The steady-state variants of the Multi-Objective Covariance Matrix Adaptation Evolution Strategy (SS-MO-CMA-ES) generate one offspring from a uniformly selected parent. This paper investigates other parental selection operators for SS-MO-CMA-ES. These operators involve the definition of multi-objective rewards, estimating the expectation of the offspring's survival and its hypervolume contribution. Two selection modes, respectively based on tournaments and inspired by the Multi-Armed Bandit framework, are used on top of these rewards. Extensive experimental validation comparatively demonstrates the merits of these new selection operators on unimodal MO problems.
Not all parents are equal for MO-CMA-ES
Ilya Loshchilov ¹,², Marc Schoenauer ¹,², Michèle Sebag ²,¹
¹ TAO Project-team, INRIA Saclay - Île-de-France
² Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors
Evolutionary Multi-Criterion Optimization - EMO 2011
Content
1 Motivation
   Why are some parents better than others?
   Multi-Objective Covariance Matrix Adaptation Evolution Strategy
2 Parent Selection Operators
   Tournament Selection (quality-based)
   Multi-armed Bandits
   Reward-based Parent Selection
3 Experimental validation
   Results
   Summary
Why are some parents better than others?
[Figure: two sketches in objective space, Objective 1 vs. Objective 2.]
Hard to find segments of the Pareto front
Some parents improve the fitness faster than others
Why MO-CMA-ES?
µ_MO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection
CMA-ES excellent on single-objective problems (e.g., BBOB)
In µ_MO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES ¹
[Figure: two panels in objective space, Objective 1 vs. Objective 2.]
Two typical adaptations for MO-CMA-ES
¹ C. Igel, N. Hansen, and S. Roth (2005). "The Multi-objective Variable Metric Evolution Strategy"
Evolution Loop for Steady-state MO-CMA-ES
Parent Selection
Variation
Evaluation
Environmental selection
Update / Adaptation
Tournament Selection (quality-based)
A total preorder relation ≺X is defined on any finite subset X:
x ≺_X y ⇔ PRank(x, X) < PRank(y, X)   // lower Pareto rank
or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X)   // same Pareto rank, higher hypervolume contribution

Tournament (µ +_t 1) selection for MOO
Input: tournament size t ∈ ℕ; population X of µ individuals
Procedure: uniformly select t individuals from X
Output: the best of the t individuals according to ≺_X
With t = 1: the standard steady-state MO-CMA-ES with random selection ᵃ
ᵃ C. Igel, T. Suttorp, and N. Hansen (EMO 2007). "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES"
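For concreteness, a minimal Python sketch of this tournament, assuming the Pareto ranks and hypervolume contributions of the current population are precomputed (the names prank and delta_h are illustrative, not from the paper):

```python
import random

def tournament_select(population, t, prank, delta_h):
    """(mu +_t 1) parent selection: draw t individuals uniformly at random
    and return the best under the preorder above: lower Pareto rank first,
    ties broken by larger hypervolume contribution Delta-H."""
    candidates = random.sample(population, t)
    return min(candidates, key=lambda i: (prank[i], -delta_h[i]))
```

With t = 1 this reduces to uniform random selection, i.e., the baseline steady-state MO-CMA-ES.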
Multi-armed Bandits
Original Multi-armed Bandits (MAB) problem
A gambler plays arm (machine) j at time t
and wins reward r_{j,t} = 1 with probability p, 0 with probability 1 − p
Goal: maximize the sum of rewards
Upper Confidence Bound (UCB) [Auer, 2002]
Initialization: play each arm once
Loop: play the arm j that maximizes
   r̄_{j,t} + C √( 2 ln Σ_k n_{k,t} / n_{j,t} )
where r̄_{j,t} is the average reward of arm j, and n_{j,t} the number of times arm j has been played
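A direct transcription of this rule, as a sketch (avg_reward and n_plays are illustrative names; reward bookkeeping is left to the caller):

```python
import math

def ucb_choose(avg_reward, n_plays, C=1.0):
    """UCB as on the slide: play every arm once first, then play the arm j
    maximizing avg_reward[j] + C * sqrt(2 * ln(sum_k n_plays[k]) / n_plays[j])."""
    for j, n in enumerate(n_plays):
        if n == 0:                       # initialization: unplayed arms first
            return j
    total = sum(n_plays)
    return max(range(len(n_plays)),
               key=lambda j: avg_reward[j]
               + C * math.sqrt(2.0 * math.log(total) / n_plays[j]))
```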
Reward-based Parent Selection
MAB for MOO
r̄_{j,t} is the average reward along a time window of size w
n_{i,t} = 0 for a new offspring or for an individual selected w steps ago
Select parent i:
   some i with n_{i,t} = 0, if one exists;
   otherwise i = argmax_j { r̄_{j,t} + C √( 2 ln Σ_k n_{k,t} / n_{j,t} ) }
At the moment, C = 0 (exploration iff n_{i,t} = 0); see the sketch after the figure below.
[Figure: one selection-reward cycle in objective space (Objective 1 vs. Objective 2): 1. ParentSelection(); 2. offspring = parent + mutation; 3. is the offspring successful?; 4. update; 5. ComputeRewards().]
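Putting the pieces together, a sketch of the MAB-based parent selection under the stated assumptions (one simplification: the window of size w counts an individual's own recent rewards rather than global time steps; the class and its bookkeeping are illustrative, not the authors' implementation):

```python
import math
import random
from collections import deque

class MABParentSelection:
    """Reward-averaged parent selection: each individual keeps its rewards
    over a sliding window of size w; an individual with an empty window
    (n_i = 0) is selected first; otherwise the UCB score decides
    (with C = 0 this is simply the best windowed average reward)."""

    def __init__(self, w, C=0.0):
        self.w, self.C = w, C
        self.rewards = {}                 # individual id -> recent rewards

    def register(self, ind):
        """A new offspring enters the population with an empty history."""
        self.rewards[ind] = deque(maxlen=self.w)

    def update(self, ind, reward):
        self.rewards[ind].append(reward)

    def select(self, population):
        unplayed = [i for i in population if not self.rewards[i]]
        if unplayed:                      # exploration iff n_i = 0
            return random.choice(unplayed)
        total = sum(len(self.rewards[i]) for i in population)
        def score(i):
            n = len(self.rewards[i])
            return (sum(self.rewards[i]) / n
                    + self.C * math.sqrt(2.0 * math.log(total) / n))
        return max(population, key=score)
```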
Defining Rewards I: (µ + 1_succ), (µ + 1_rank)
Parent a from the population Q^(g) generates offspring a′. Both the offspring and the parent receive reward r:
(µ + 1_succ)
If a′ becomes a member of the new population Q^(g+1):
r = 1 if a′ ∈ Q^(g+1), and 0 otherwise
(µ + 1_rank)
A smoother reward is defined by the rank of a′ in Q^(g+1):
r = 1 − rank(a′)/µ if a′ ∈ Q^(g+1), and 0 otherwise
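Both rewards are one-liners; a sketch, with rank(a′) counted within the new population (illustrative names):

```python
def reward_succ(survived):
    """(mu + 1_succ): 1 iff the offspring a' entered Q^(g+1), else 0."""
    return 1.0 if survived else 0.0

def reward_rank(survived, rank, mu):
    """(mu + 1_rank): 1 - rank(a')/mu if a' entered Q^(g+1), else 0;
    better-ranked offspring earn rewards closer to 1."""
    return 1.0 - rank / mu if survived else 0.0
```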
Defining Rewards II: (µ + 1_{∆H1}), (µ + 1_{∆Hi})
(µ + 1_{∆H1})
Set the reward to the increase of the total hypervolume contribution from generation g to g + 1:
r = 0 if the offspring is dominated,
r = Σ_{a ∈ Q^(g+1)} ∆H(a, Q^(g+1)) − Σ_{a ∈ Q^(g)} ∆H(a, Q^(g)) otherwise
(µ + 1_{∆Hi})
A relaxation of the above reward, involving a rank-based penalization:
r = (1 / 2^(k−1)) [ Σ_{a ∈ ndom_k(Q^(g+1))} ∆H(a, ndom_k(Q^(g+1))) − Σ_{a ∈ ndom_k(Q^(g))} ∆H(a, ndom_k(Q^(g))) ]
where k denotes the Pareto rank of the current offspring, and ndom_k(Q^(g)) is the k-th non-dominated front of Q^(g).
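To make ∆H concrete, a minimal two-objective sketch of the (µ + 1_{∆H1}) reward, assuming minimization and a reference point worse than every front member in both objectives (helper names are illustrative; the (µ + 1_{∆Hi}) variant applies the same computation to the k-th front, scaled by 1/2^(k−1)):

```python
def hv_contributions(front, ref):
    """Exclusive hypervolume contribution Delta-H of each point of a
    two-objective non-dominated front (minimization), w.r.t. reference
    point `ref`. With the front sorted by the first objective, each
    point's exclusive box is bounded by its neighbours (or by `ref`)."""
    pts = sorted(front)
    contrib = {}
    for i, (f1, f2) in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        upper = pts[i - 1][1] if i > 0 else ref[1]
        contrib[(f1, f2)] = (right - f1) * (upper - f2)
    return contrib

def reward_dH1(old_front, new_front, ref, offspring_dominated):
    """(mu + 1_{Delta H1}): increase of the total hypervolume contribution
    from Q^(g) to Q^(g+1); zero when the offspring is dominated."""
    if offspring_dominated:
        return 0.0
    return (sum(hv_contributions(new_front, ref).values())
            - sum(hv_contributions(old_front, ref).values()))
```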
Experimental validation
Algorithms
The steady-state MO-CMA-ES with modified parent selection:
- 2 tournament-based: (µ +_2 1) and (µ +_10 1);
- 4 MAB-based: (µ + 1_succ), (µ + 1_rank), (µ + 1_{∆H1}) and (µ + 1_{∆Hi}).
The baseline MO-CMA-ES:
- steady-state (µ + 1)-MO-CMA and (µ_≺ + 1)-MO-CMA, and generational (µ + µ)-MO-CMA.
Default parameters (µ = 100), 200,000 evaluations, 31 runs.
Benchmark Problems:
sZDT1:3-6, with the true Pareto front shifted in decision space: x′_i ← |x_i − 0.5| for 2 ≤ i ≤ n (see the sketch after this list)
IHR1:3-6, rotated variants of the original ZDT problems
LZ09-1:5, with complicated Pareto sets in decision space ²
² H. Li and Q. Zhang. "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II." IEEE TEC 2009
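The sZDT shift as code (a sketch; 0-based indexing, so x[1:] covers variables 2 ≤ i ≤ n, and the shifted problem is assumed to evaluate the original ZDT function on the transformed variables):

```python
import numpy as np

def szdt_shift(x):
    """sZDT transformation x'_i <- |x_i - 0.5| for 2 <= i <= n: moves the
    true Pareto front away from the boundary of the decision space."""
    x = np.array(x, dtype=float)
    x[1:] = np.abs(x[1:] - 0.5)
    return x
```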
Results: sZDT1, IHR1, LZ09-3 and LZ09-4
[Figure: four panels plotting ∆H_target (log scale) vs. number of function evaluations (0 to 10^5) on sZDT1, IHR1, LZ09-3 and LZ09-4, comparing (µ + 1), (µ_≺ + 1), (µ +_10 1), (µ + 1_{∆Hi}), (µ + 1_succ) and (µ + 1_rank).]
Results: (µ + 1), (µ +_10 1), (µ + 1_rank) on LZ problems
[Figure: populations in decision space (x1, x2, x3) on LZ09-1 to LZ09-5; three panels per problem, one per algorithm: (µ + 1), (µ +_10 1) and (µ + 1_rank).]
Loss of diversity
[Figure: two panels in objective space (Objective 1 vs. Objective 2) on sZDT2 (left) and IHR3 (right), showing all populations of MO-CMA-ES, the true Pareto front, and the Pareto front found by MO-CMA-ES.]
Typical behavior of (µ + 1_succ)-MO-CMA on the sZDT2 (left) and IHR3 (right) problems after 5,000 fitness function evaluations.
Summary
Speed-up (factor 2-5) with the MAB schemes (µ + 1_rank) and (µ + 1_{∆Hi}).
Loss of diversity, especially on multi-modal problems, with too greedy schemes: (µ + 1_succ) and (µ +_t 1).
Perspectives
Find "appropriate" C for exploration term r̄j,t + C√
2 ln∑
k nk,t
nj,t.
Allocate some evaluation budget to dominated arms (individuals generated in the past) to preserve diversity.
Integrate the reward mechanism with the update rules of (1+1)-CMA-ES, e.g., success rate and average reward in (µ + 1_rank).
Experiments on many-objective problems.
Thank you for your attention!
Please send your questions to {Ilya.Loshchilov, Marc.Schoenauer, Michele.Sebag}@inria.fr