63
Adaptive Operator Selection for Optimization ´ Alvaro Fialho Advisors: Marc Schoenauer & Mich` ele Sebag Ph.D. Defense ´ Ecole Doctorale d’Informatique Universit´ e Paris-Sud, Orsay, France December 22, 2010

Adaptive Operator Selection for Optimization Alvaro Fialho

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Adaptive Operator Selection for Optimization

Alvaro Fialho

Advisors: Marc Schoenauer & Michele Sebag

Ph.D. DefenseEcole Doctorale d’Informatique

Universite Paris-Sud, Orsay, FranceDecember 22, 2010

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Outline

1 Context & Motivation

2 Operator Selection

3 Credit Assignment

4 Empirical Validation

5 Conclusions & Further Work

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 2/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Context & Motivation

1 Context & MotivationEvolutionary AlgorithmsParameter Setting in EAsParameter Setting of Variation OperatorsAdaptive Operator Selection

2 Operator Selection

3 Credit Assignment

4 Empirical Validation

5 Conclusions & Further Work

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 3/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Evolutionary Algorithms

Stochastic optimization algorithms (Darwinian paradigm)Bottleneck: parameter setting

Population size and number of offspring generatedParameters of selection and replacement methodsParameters of Variation Operators (application rate, etc)

Goal: Automatic parameter setting (Crossing the Chasm)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 4/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Evolutionary Algorithms

Stochastic optimization algorithms (Darwinian paradigm)Bottleneck: parameter setting

Population size and number of offspring generatedParameters of selection and replacement methodsParameters of Variation Operators (application rate, etc)

Goal: Automatic parameter setting (Crossing the Chasm)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 4/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parameter Setting in EAs

[Eiben et al., 2007]

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 5/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parameter Setting of Variation Operators

Difficult to predict the performance

Problem-dependent and inter-dependent choices

Off-line tuning can find the best static strategy (expensive)

fitness of the parent

1000 3000 5000 7000 9000

0

1

2

3

4

5 1-Bit 3-Bit 5-Bit 1/n BitFlip

Performance of operators on OneMax

Depends also on...

Fitness of the parents

Pop. fitness distribution(sample fig. with a (1+50)-EA)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 6/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parameter Setting of Variation Operators

Difficult to predict the performance

Problem-dependent and inter-dependent choices

Off-line tuning can find the best static strategy (expensive)

fitness of the parent

1000 3000 5000 7000 9000

0

1

2

3

4

5 1-Bit 3-Bit 5-Bit 1/n BitFlip

Performance of operators on OneMax

Depends also on...

Fitness of the parents

Pop. fitness distribution(sample fig. with a (1+50)-EA)

⇒ Should be adapted on-line, while solving the problem

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 6/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parameter Setting in EAs

[Eiben et al., 2007]

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 7/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Adaptive Operator Selection

Position of the Problem

Given a set of K variation operators

Select on-line the operator to be applied next

Based on their recent performance

Operator

Application

Impact

Evaluation

EA AOS

Operator

Selection

Credit

Assignment

quality op1

. . .

quality op2

quality opk

impact

operator credit or

reward

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 8/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Operator Selection

1 Context & Motivation

2 Operator SelectionRelated WorkDiscussion on Operator SelectionA (kind of) Multi-Armed Bandit problemDynamic Multi-Armed Bandit (DMAB)Sliding Multi-Armed Bandit (SLMAB)Contributions to Operator Selection: Summary

3 Credit Assignment

4 Empirical Validation

5 Conclusions & Further WorkAlvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 9/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Operator Selection - Related Work

“Empirical quality”: qj ,t+1 = (1 − α) · qj ,t + α · rj ,t

Probability Matching (PM) [Goldberg, 1990]

si proportional to qi

si ,t+1 = pmin + (1 − K · pmin) ·qi ,t+1

∑Kj=1 qj ,t+1

Adaptive Pursuit (AP) [Thierens, 2005]

si∗ is pushed to pmax ; others to pmin

i∗ = argmax{qi ,t, i = 1 . . . K}

si∗,t+1 = si∗,t + β · (pmax − si∗,t) ,

si ,t+1 = si ,t + β · (pmin − si ,t) , for i 6= i∗

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 10/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Discussion on Operator Selection

Exploration versus Exploitation

In operators search space, not problem search space

Acquire new information (use other operators) vs.Capitalize on the available knowledge (use current best)

Probability-based Methods (PM and AP)

Conservative approach: fixed pmin

Entails over-exploration when many operators

EvE balance ⇒ Game Theory: Multi-Armed Bandits

Level of exploration depends on confidence about knowledge

i.e., pmin should be “dynamic”

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 11/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

A (kind of) Multi-Armed Bandit problem

Original Multi-Armed Bandits (Machine Learning - ML)

Given K arms (≡ operators)

At time t, gambler plays arm j and gets

rj,t = 1 with (unknown) prob. pj

rj,t = 0 otherwise

Goal: maximize cumulative reward ≡ minimize regret

L(T ) =

T∑

t=1

(r∗t − rt)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 12/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

The Upper Confidence Bound MAB algorithm

Assymptotic optimality guarantees (static context) [Auer et al., 2002]

Optimal L(T ) = O(log T )

At time t, choose arm i maximizing:

scorei ,t = qi ,t︸︷︷︸

exploitation

+

2 log∑

k nk,t

ni ,t︸ ︷︷ ︸

exploration

with ni ,t+1 = ni ,t + 1 # times

and qi ,t+1 =(

1 − 1ni,t+1

)

· qi ,t + 1ni,t+1

· ri ,t emp. qual.

Efficiency comes from optimal EvE balance

Interval between exploration trials increases exponentiallyw.r.t. # time steps

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 13/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Operator Selection with UCB: shortcomings

Exploration vs. Exploitation (EvE) balance

Original MAB: rewards ∈ {0, 1};

AOS: rewards ∈ [a, b] (e.g., fitness improvement)

UCB’s EvE balance is broken, Scaling is needed:

scorei ,t = qi ,t + C√

2 logP

k nk,t

ni,t

Dynamics

When opi is not the best anymore . . .

qi ,t+1 =(

1 − 1ni,t+1

)

· qi ,t + 1ni,t+1

· ri ,t

Weight of r is inversely proportional to n

Adjusting q’s after a change takes a long time

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 14/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Dynamic Multi-Armed Bandit (DMAB)

Rationale

No need for exploration in stationary situations

⇒ Upon the detection of a change, restart the MAB.

How to detect a change in a distribution?

Page-Hinkley statistical test [Page, 1954]

1 rt = 1t

Pti=1 ri

2 mt =Pt

i=1(ri − ri + δ),

3 Mt = max{|mi |, i = 1 . . . t}

4 Return (Mt − |mt | > γ)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 15/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Dynamic Multi-Armed Bandit (DMAB)

Rationale

No need for exploration in stationary situations

⇒ Upon the detection of a change, restart the MAB.

How to detect a change in a distribution?

Page-Hinkley statistical test [Page, 1954]

1 rt = 1t

Pti=1 ri

2 mt =Pt

i=1(ri − ri + δ),

3 Mt = max{|mi |, i = 1 . . . t}

4 Return (Mt − |mt | > γ)

DMAB = UCB + Scaling + Page-Hinkley

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 15/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Sliding Multi-Armed Bandit (SLMAB)

MAB: qi ,t+1 =(

1 − 1ni,t+1

)

· qi ,t + 1ni,t+1

· ri ,t

Too slow: weight of r inversely proportional to n

AP/PM: qi ,t+1 = (1 − α) · qi ,t + α · ri ,tFixed weight, extra hyper-parameter

Rationale

Weight of ri ,t adapted w.r.t. operator application frequency

smaller weight if frequently applied, bigger otherwise

qi ,t+1 = qi ,t ·W

W+(t−ti )+ 1

ni,t+1· ri ,t

ni ,t+1 = ni ,t ·(

WW+(t−ti )

+ 1ni,t+1

) weight ∈ {1/W ,W}

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 16/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Sliding Multi-Armed Bandit (SLMAB)

MAB: qi ,t+1 =(

1 − 1ni,t+1

)

· qi ,t + 1ni,t+1

· ri ,t

Too slow: weight of r inversely proportional to n

AP/PM: qi ,t+1 = (1 − α) · qi ,t + α · ri ,tFixed weight, extra hyper-parameter

Rationale

Weight of ri ,t adapted w.r.t. operator application frequency

smaller weight if frequently applied, bigger otherwise

qi ,t+1 = qi ,t ·W

W+(t−ti )+ 1

ni,t+1· ri ,t

ni ,t+1 = ni ,t ·(

WW+(t−ti )

+ 1ni,t+1

) weight ∈ {1/W ,W}

SLMAB = MAB with Sliding update rule

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 16/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Contributions to Operator Selection: Summary

MAB = UCB + Scaling

Optimal EvE, but in static setting. . . AOS is dynamic

DMAB = MAB + Page-Hinkley change-detection

Won Pascal challenge on On-line EvE trade-off [Hartland et al., 2007]

Utilization in the AOS context [GECCO’08]

2 hyper-parameters: scaling C and Page-Hinkley threshold γ

Very efficient, but very sensitive to hyper-parameter setting

Change-detection works only when changes are abrupt

SLMAB = MAB + Sliding update rule

2 hyper-parameters: scaling C and sliding window size W

W set to the Credit Assignment sliding window size

Better than or similar to DMAB on artificial scenarios [AMAI’10]

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 17/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Credit Assignment

1 Context & Motivation

2 Operator Selection

3 Credit AssignmentRelated WorkExtreme Value Based Credit AssignmentDiscussion on Credit AssignmentRank-based Area-Under-Curve (AUC)Rank-based AUC with MABContributions to Credit Assignment: Summary

4 Empirical Validation

5 Conclusions & Further WorkAlvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 18/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Credit Assignment - Related Work

Impact of an operator application?

Most common: Fitness Improvement ∆F

Given fitness function F , operator o, and individual x

∆F = F(o(x)) −F(x)

For multi-modal problems: diversity also important

From Impact to Credit (or reward)

Instantaneous (∆F last application) likely to be unstable

Average of the last W applications

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 19/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Extreme Value Based Credit Assignment

Impact: fitness improvement

Outlier operators are rarely considered - smaller expectation.

EC: Focus on extreme, rather than avg events [Whitacre et al., 2006]

Complex systems, e.g. rogue waves, epidemic propagation

Extreme Value-Based Credit Assignment

r = Extreme value over a Window of size W [PPSN’08]

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 20/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Discussion on Credit Assignment

Schemes based on raw values of ∆F

Instantaneous, Average, Extreme

Rewards are problem-dependent

Different fitness landscapes = different ranges and variance

Rewards are “moment”-dependent

Range and variance might reduce as the search advancesImprovements tend to become more and more scarce

Sensitiveness of scaling factor C

MAB, DMAB and SLMAB very sensitive to C

C has a double role:

Correction of the EvE balanceScaling of the rewards

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 21/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Discussion on Credit Assignment

Schemes based on raw values of ∆F

Instantaneous, Average, Extreme

Rewards are problem-dependent

Different fitness landscapes = different ranges and variance

Rewards are “moment”-dependent

Range and variance might reduce as the search advancesImprovements tend to become more and more scarce

Sensitiveness of scaling factor C

MAB, DMAB and SLMAB very sensitive to C

C has a double role:

Correction of the EvE balanceScaling of the rewards

Higher robustness: Credit Assignment based on Ranks

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 21/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

Area Under ROC Curve in ML

Evaluation of binary classifiers [Fawcett, 2006]

[ + + - - + + + - - - - . . . ]

Performance: % of misclassificationEquivalent to MannWhitneyWilcoxon test

Pr (rank(n+) > rank(n−))

Area Under ROC Curve in AOS

One operator versus others [GECCO’10]

[ op1, op2,op1,op1,op1, op2, op2, . . .]

Fitness improvements are ranked

Size of the segment = assigned rank-value 0

1

2

3

4

5

6

0 1 2 3 4 5 6 7 8 9

opera

tor

under

ass

ess

ment (1

)

other operators

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 22/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

00

Op. 2, Step 0

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

1

0

Op. 2, Step 1

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

2

0

Op. 2, Step 2

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

2

1

Op. 2, Step 3

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

2

2

Op. 2, Step 4

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

8

7

Op. 2, Step 15

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10 12 14

Op. 0: 1.39Op. 1: 31.94Op. 2: 61.11Op. 3: 5.56

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10 12 14

original AUC in ML: equal widths

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10 12 14

Op. 0: 1.39Op. 1: 31.94Op. 2: 61.11Op. 3: 5.56

segments with same size

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based Area-Under-Curve (AUC)

R ∆F Op

1 5.0 22 4.7 23 4.2 14 3.5 15 3.4 26 3.3 27 3.1 28 3.0 29 2.9 210 2.8 211 2.5 312 2.0 113 1.5 014 1.0 315 0.8 0

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10 12 14

original AUC in ML: equal widthsexponential D=0.5: (D^R).(W-R)

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10 12 14

Op. 0: 1.39Op. 1: 31.94Op. 2: 61.11Op. 3: 5.56

segments with same size

0

5

10

15

20

25

0 5 10 15 20 25 30

Op. 0: 0.00Op. 1: 5.36

Op. 2: 94.64Op. 3: 0.00

exponential decayDR(W − R)

(example with D = 0.5) ⇒

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 23/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Rank-based AUC with MAB

MAB for AOS

Original MAB: very slow adaptation

AUC: behavior of all ops.: dynamic by construction

Extreme DMAB/SLMAB: sensitive hyper-parameter setting

AUC is rank-based, fitness ranges don’t matter

Use of AUC within MAB

Originally, q is the average of received rewards

AUC is already an aggregation

⇒ directly use AUC in UCB:

scorej ,t = AUCj ,t + C ·√

2 logP

k nk,t

nj,t

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 24/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Contributions to Credit Assignment: Summary

Extreme Value Based

Empirically shown to outperform baseline Inst/Avg

But based on raw values of ∆F : problem-dependent

Area-Under-Curve (AUC)

Ranks over fitness improvements (∆F)

Invariant w.r.t. linear scaling of F

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 25/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Contributions to Credit Assignment: Summary

Extreme Value Based

Empirically shown to outperform baseline Inst/Avg

But based on raw values of ∆F : problem-dependent

Area-Under-Curve (AUC)

Ranks over fitness improvements (∆F)

Invariant w.r.t. linear scaling of F

Fitness-based AUC (FAUC)

Ranks over fitness values (F), rather than ranks over ∆F

Invariant with respect to monotonous transformations

e.g., behavior on F = behavior on Fn, log(F), exp(F), . . .

Comparison-based property maintained

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 25/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Empirical Validation

1 Context & Motivation

2 Operator Selection

3 Credit Assignment

4 Empirical ValidationAOS Combinations and Hyper-ParametersGoals of ExperimentsComparative Performance on the OneMax ProblemInvariance Analysis on the OneMax ProblemComparative Results on BBOB

5 Conclusions & Further WorkAlvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 26/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

AOS Combinations and Hyper-Parameters

Proposed AOS Combinations

MAB DMAB SLMABExtreme(∆F) [GECCO’08, PPSN’08, LION’09, GECCO’09] [AMAI’10]

AUC(∆F)FAUC(F)

[GECCO’10, BBOB’10,

PPSN’10]

Hyper-Parameters

Off-line tuned by F-Race [Birattari et al., 2002]

Operator Selection:

MAB : scaling factor CDMAB : scaling factor C, PH threshold γ

SLMAB : scaling factor C – if using W from Credit AssignmentAUC-B : scaling factor C, decay factor D (≡ 0.5 ok)

Credit Assignment:sliding window size W and type (Average, Extreme, AUC)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 27/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Goals of Experiments

Given a set of K operators . . .

Performance ?

Baseline methods1 Each operator being applied alone2 Naive uniform selection between operators3 Static off-line tuning of application rates (cost ≫)4 Optimal behavior (available only on simple benchmarks)5 State-of-the-art OS method: Adaptive Pursuit [Thierens, 2005]

Robustness/Generality ?

AOS methods have hyper-parameters

Robustness w.r.t. hyper-parameter settingGenerality w.r.t. different problems/landscapes

Invariance properties

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 28/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

The OneMax Problem

104 bits

Fitness:# of “1”s

(1+50)-GA

4 mutationoperators

fitness of the parent

1000 3000 5000 7000 9000

0

1

2

3

4

5 1-Bit 3-Bit 5-Bit 1/n BitFlip

Performance of operators on OneMax

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 29/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

The OneMax Problem

104 bits

Fitness:# of “1”s

(1+50)-GA

4 mutationoperators

fitness of the parent

1000 3000 5000 7000 9000

0

1

2

3

4

5 1-Bit 3-Bit 5-Bit 1/n BitFlip

Performance of operators on OneMax

Generations0 1000 2000 3000 4000 5000

0

1

0.5

Fitness 1-Bit 3-Bit 5-Bit 1/n BitFlipChanges

Optimal Operator Selection (Oracle)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 29/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Comparative Behavior on the OneMax Problem

0 1000 2000 3000 4000 50000

1

0.5

Extreme - Adaptive Pursuit

0 1000 2000 3000 4000 50000

1

0.5

Extreme - Dynamic Multi-Armed Bandit

0 1000 2000 3000 4000 50000

1

0.5

Area-Under-Curve - Bandit

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 30/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Comparative Performance on the OneMax Problem

0

25

50

75

100

4500 5000 5500 6000 6500 7000 7500

AUC-MABExt-SLMABExt-DMAB

Ext-MABExt-APNaive

Best StaticOracle

*Best Static: 1-Bit 80% + 5-Bit 20%

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 31/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Comparative Performance on the OneMax Problem

0

25

50

75

100

4500 5000 5500 6000 6500 7000 7500

AUC-MABExt-SLMABExt-DMAB

Ext-MABExt-APNaive

Best StaticOracle

*Best Static: 1-Bit 80% + 5-Bit 20%

Other scenarios

Artificial

LongK-Path

Royal Road

SAT probs.

Continuous

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 31/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Analysis of Invariance w.r.t. Monotonous Transformations

Original OneMax: F =∑n

i=1 bi

3 monotonous transformations: log(F), exp (F) and F2

(h-l) F =P

bi log(F) exp(F) F 2 AOS tech.

485 5103/427 5195/430 5562/950 5588/950 AUC-MAB

807 5123/218 5431/223 5930/334 5792/382 Ext-AP

0 5726/399 5726/399 5726/399 5726/399 FAUC-MAB

2591 5376/285 7967/718 7722/2151 6138/516 Ext-DMAB

6971 6059/667 8863/694 13030/3053 12136/949 Ext-SLMAB

7052 9044/840 7947/1267 14999/0 14999/0 Ext-MAB

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 32/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Black-Box Optimization Benchmark (BBOB)

Exp. framework for rigorous benchmarking [Hansen et al., 2010]

24 continuous functions, 15 instances per function

Several problem dimensions (2, 3, 5, 10, 20, 40)

Adaptive Operator Selection in Differential Evolution

A completely different evolutionary algorithm

NP = 100 · DIM;CR = 1.0;F = 0.5

With 4 possible mutation strategies

rand/1, rand/2, rand-to-best/2, current-to-rand/1

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 33/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parwise comparisons of FAUC-Bandit with . . . (sample fig)

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

pro

port

ion

f6-9

DE1: 4/4

DE2: 4/3

DE3: 4/4

DE4: 4/0

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 34/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parwise comparisons of FAUC-Bandit with . . .

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-24

DE1: 15/15

DE2: 15/12

DE3: 15/15

DE4: 15/0

(a) all functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-5

DE1: 3/3

DE2: 3/3

DE3: 3/3

DE4: 3/0

(b) separable functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f6-9

DE1: 4/4

DE2: 4/3

DE3: 4/4

DE4: 4/0

(c) moderate functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f10-14

DE1: 5/5

DE2: 5/5

DE3: 5/5

DE4: 5/0

(d) ill-conditioned functions

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 35/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parwise comparisons of FAUC-Bandit with . . .

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-24

pm: 15/15

AP: 15/15

DMAB: 15/15

SLMAB: 15/15

MAB: 15/15

(e) all functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-5

pm: 3/3

AP: 3/3

DMAB: 3/3

SLMAB: 3/3

MAB: 3/3

(f) separable functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f6-9

pm: 4/4

AP: 4/4

DMAB: 4/4

SLMAB: 4/4

MAB: 4/4

(g) moderate functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f10-14

pm: 5/5

AP: 5/5

DMAB: 5/5

SLMAB: 5/5

MAB: 5/5

(h) ill-conditioned functions

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 36/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Parwise comparisons of FAUC-Bandit with . . .

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-24

Naive: 15/15

StAll: 15/15

StEach: 15/15

CMA: 15/19

(i) all functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f1-5

Naive: 3/3

StAll: 3/3

StEach: 3/3

CMA: 3/3

(j) separable functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f6-9

Naive: 4/4

StAll: 4/4

StEach: 4/4

CMA: 4/4

(k) moderate functions

-2 -1 0 1 2log10 of FEvals(A1)/FEvals(A0)

prop

ortio

n

f10-14

Naive: 5/5

StAll: 5/5

StEach: 5/5

CMA: 5/5

(l) ill-conditioned functions

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 37/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Conclusions & Further Work

1 Context & Motivation

2 Operator Selection

3 Credit Assignment

4 Empirical Validation

5 Conclusions & Further WorkSummary of ContributionsSome Perspectives for Further Work

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 38/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Summary of Contributions I

Algorithmic Contributions

Operator Selection

MAB = UCB + ScalingDMAB = MAB + Page-Hinkley test [GECCO’08]

SLMAB = MAB + Sliding update rule [AMAI’10]

Credit Assignment

Extreme value-based (∆F) [PPSN’08]

Rank-based methods [GECCO’10]

AOS Combinations

Extreme-xMAB: efficient, but sensitive w.r.t. hyper-parameters(F)AUC-MAB: efficient and robust w.r.t. hyper-parameters

FAUC: comparison-based

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 39/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Summary of Contributions I

Algorithmic Contributions

Operator Selection

MAB = UCB + ScalingDMAB = MAB + Page-Hinkley test [GECCO’08]

SLMAB = MAB + Sliding update rule [AMAI’10]

Credit Assignment

Extreme value-based (∆F) [PPSN’08]

Rank-based methods [GECCO’10]

AOS Combinations

Extreme-xMAB: efficient, but sensitive w.r.t. hyper-parameters(F)AUC-MAB: efficient and robust w.r.t. hyper-parameters

FAUC: comparison-based

⇒ Combining concepts from ML: MABs and AUC⇒ Extending them to a dynamic context

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 39/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Summary of Contributions II

Proposal of new Artificial Scenarios

Boolean and Outlier [GECCO’08], derived from Uniform [Thierens, 2005]

Family of Two-Values scenarios [AMAI’10]

Two params: control of variance and expectation of rewardsAnalysis of different behavioral aspects of AOS methods

Empirical Validation (performance, robustness and generality)

Genetic Algorithms

Artificial scenarios [GECCO’08, AMAI’10, GECCO’10]

Boolean problems [PPSN’08, LION’09, GECCO’09, AMAI’10, GECCO’10]

OneMax, Long K-Path and Royal Road problems

Memetic Algorithms

SAT problems, with the Compass Credit Assign. [CEC’09, Chapter’10]

Differential Evolution

Continuous problems [BBOB’10, PPSN’10]

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 40/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Some Perspectives for Further Work

Application extensions: AOS paradigm is very general

Use within other meta-heuristicsUse at the level of hyper-heuristics

Cross-domain Heuristic Search Challenge (CHeSC)

Algorithmic extensions: towards real-world problems

Extend to multi-modal (diversity, pop.size, . . . )Extend to multi-objective (Pareto, hyper-volume, . . . )

First trial in real-world: sustainable development

Optimization of designs of buildings for energy efficiency

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 41/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Our Publications I

Da Costa, L., Fialho, A., Schoenauer, M., and Sebag, M. (2008).

Adaptive operator selection with dynamic multi-armed bandits.In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.

Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2008).

Extreme value based adaptive operator selection.In Proc. Intl. Conf. on Parallel Problem Solving from Nature (PPSN). Springer.

Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2009).

Dynamic multi-armed bandits and extreme value-based rewards for AOS in evolutionary algorithms.In Proc. Intl. Conf. on Learning and Intelligent Optimization (LION). Springer.

Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., and Sebag, M. (2009).

Extreme compass and dynamic multi-armed bandits for adaptive operator selection.In Proc. IEEE Congress on Evolutionary Computation (CEC). IEEE.

Fialho, A., Schoenauer, M., and Sebag, M. (2009).

Analysis of adaptive operator selection techniques on the royal road and long k-path problems.In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.

Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., Lardeux, F., and Sebag, M. (2010).

Adaptive operator selection and management in evolutionary algorithms.In Y. Hamadi et al, editor, Autonomous Search. Springer. (to appear)

Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2010).

Analyzing bandit-based adaptive operator selection mechanisms.Annals of Mathematics and A. I. – Special Issue on Learning and Intelligent Optimization. Springer.

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 42/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Our Publications II

Fialho, A., Schoenauer, M., and Sebag, M. (2010).

Toward comparison-based adaptive operator selection.In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.

Gong, W., Fialho, A., and Cai, Z. (2010).

Adaptive strategy selection in differential evolution.In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.

Fialho, A., Schoenauer, M., and Sebag, M. (2010).

Fitness-AUC bandit adaptive strategy selection vs. the probability matching one within DE.In Black-Box Optimization Benchmarking Workshop (BBOB-GECCO). ACM.

Fialho, A., Gong, W., and Cai, Z. (2010).

Probability matching-based adaptive strategy selection vs. uniform strategy selection within DE.In Black-Box Optimization Benchmarking Workshop (BBOB-GECCO). ACM.

Fialho, A. and Ros, R. (2010).

Analysis of adaptive strategy selection within differential evolution on the BBOB-2010 noiseless benchmark.Research Report RR-7259, INRIA.

Fialho, A., Ros, R., Schoenauer, M., and Sebag, M. (2010).

Comparison-based adaptive strategy selection in differential evolution.In Proc. Intl. Conf. on Parallel Problem Solving from Nature (PPSN). Springer.

Li, K., Fialho, A., and Kwong, S. (2011).

Multi-objective differential evolution with adaptive control of parameters and operators.In Proc. Intl. Conf. on Learning and Intelligent Optimization (LION). Springer. (to appear)

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 43/46

Adaptive Operator Selection for Optimization

Alvaro Fialho

Advisors: Marc Schoenauer & Michele Sebag

Ph.D. DefenseEcole Doctorale d’Informatique

Universite Paris-Sud, Orsay, FranceDecember 22, 2010

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Other References I

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002).

Finite-time analysis of the multi-armed bandit problem.Machine Learning, 47(2-3):235–256.

Birattari, M., Stutzle, T., Paquete, L., and Varrentrapp, K. (2002).

A racing algorithm for configuring metaheuristics.In W.B. Langdon et al., editor, Proc. Genetic and Evolutionary Computation Conference (GECCO), pages11–18. Morgan Kaufmann.

Eiben, A., Michalewicz, Z., Schoenauer, M., and Smith, J. (2007).

Parameter control in evolutionary algorithms.volume 54 of Studies in Computational Intelligence, pages 19–46. Springer.

Fawcett, T. (2006).

An introduction to ROC analysis.Pattern Recogn. Lett., 27(8):861–874.

Goldberg, D. (1990).

Probability matching, the magnitude of reinforcement, and classifier system bidding.Machine Learning, 5(4):407–426.

Hansen, N., Auger, A., Finck, S., and Ros, R. (2010).

Real-parameter black-box optimization benchmarking 2010: Experimental setup.Technical Report RR-7215, INRIA.

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 45/46

Context Operator Selection Credit Assignment Empirical Validation Conclusion

Other References II

Hartland, C., Baskiotis, N., Gelly, S., Teytaud, O., and Sebag, M. (2007).

Change point detection and meta-bandits for online learning in dynamic environments.In Proc. Conference Francophone sur l’Apprentissage Automatique (CAPS).

Page, E. (1954).

Continuous inspection schemes.Biometrika, 41:100–115.

Thierens, D. (2005).

An adaptive pursuit strategy for allocating operator probabilities.In H.-G. Beyer et al., editor, Proc. Genetic and Evolutionary Computation Conference (GECCO), pages1539–1546. ACM.

Whitacre, J., Pham, T., and Sarker, R. (2006).

Use of statistical outlier detection method in adaptive evolutionary algorithms.In M. Cattolico et al., editor, Proc. Genetic and Evolutionary Computation Conference (GECCO), pages1345–1352. ACM.

Alvaro Fialho – Ph.D. Defense – December 22, 2010 Adaptive Operator Selection for Optimization 46/46