An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond
Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, Kevin P. Murphy
Department of Computer Science, University of British Columbia, Canada
{hutter, hoos, kevinlb, murphyk}@cs.ubc.ca


Page 1: An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond (hutter/papers/09-GECCO-SPO+presentation.pdf)

An Experimental Investigation of Model-Based Parameter Optimization:

SPO and Beyond

Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, Kevin P. Murphy

Department of Computer Science, University of British Columbia, Canada

{hutter, hoos, kevinlb, murphyk}@cs.ubc.ca

Page 2

Motivation for Parameter Optimization

Genetic Algorithms & Evolution Strategies are

+ Very flexible frameworks

– Tedious to configure for a new domain:
  • Population size
  • Mating scheme
  • Mutation rate
  • Search operators
  • Hybridizations, ...

Automated parameter optimization can help:

• High-dimensional optimization problem

• Automation saves time & improves results


Page 5

Parameter Optimization Methods

• Numerical parameters
  – See Blackbox optimization workshop (this GECCO)
  – Algorithm parameters: CALIBRA [Adenso-Diaz & Laguna, '06]

• Few categorical parameters: racing algorithms
  [Birattari, Stützle, Paquete & Varrentrapp, '02]

• Many categorical parameters
  – Genetic algorithms [Terashima-Marín, Ross & Valenzuela-Rendón, '99]
  – Iterated Local Search [Hutter, Hoos, Leyton-Brown & Stützle, '07-'09]
    Dozens of parameters (e.g., CPLEX with 63 parameters)
    For many problems: SAT, MIP, time-tabling, protein folding, MPE, ...


Page 9

Parameter Optimization Methods

Model-free Parameter Optimization

• Numerical parameters: see BBOB workshop (this GECCO)

• Few categorical parameters: racing algorithms
  [Birattari, Stützle, Paquete & Varrentrapp, '02]

• Many categorical parameters
  [e.g., Terashima-Marín, Ross & Valenzuela-Rendón, '99; Hutter, Hoos, Leyton-Brown & Stützle, '07-'09]

Model-based Parameter Optimization

• Methods
  – Fractional factorial designs [e.g., Ridge & Kudenko, '07]
  – Sequential Parameter Optimization (SPO) [Bartz-Beielstein, Preuss, Lasarczyk, '05-'09]

• Can use model for more than optimization
  – Importance of each parameter
  – Interaction between parameters


Page 13

Outline

1. Sequential Model-Based Optimization (SMBO): Introduction

2. Comparing Two SMBO Methods: SPO vs SKO

3. Components of SPO: Model Quality

4. Components of SPO: Sequential Experimental Design

5. Conclusions and Future Work


Page 15

SMBO: Introduction

1. Get response values at initial design points

2. Fit a model to the data

3. Use model to pick most promising next design point (based on expected improvement criterion)

4. Repeat 2. and 3. until time is up

[Figure: response y vs. parameter x — true function, function evaluations, DACE mean prediction ± 2·stddev, and scaled expected improvement (EI); first and second steps of SMBO]
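The four-step loop above can be sketched end to end. This is a minimal illustration in plain numpy — the RBF kernel, the 1-D test function, and the grid search for the EI maximizer are arbitrary choices for the sketch, not the DACE model or the authors' code:

```python
# Minimal SMBO loop: noise-free GP model + expected improvement (EI).
# Kernel, test function, and grid are illustrative choices, not the authors'.
import math
import numpy as np

def rbf_kernel(A, B, length=0.1):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

def gp_predict(X, y, Xs, jitter=1e-6):
    """Noise-free GP posterior mean and stddev at query points Xs."""
    K = rbf_kernel(X, X) + jitter * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # k(x,x) = 1
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, f_min):
    z = (f_min - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (f_min - mu) * Phi + sigma * phi

def smbo(f, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n_init)          # 1. initial design
    y = np.array([f(x) for x in X])
    grid = np.linspace(0.0, 1.0, 401)
    for _ in range(n_iter):
        mu, sigma = gp_predict(X, y, grid)     # 2. fit model to the data
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]  # 3.
        if np.any(np.isclose(X, x_next)):      # model converged on this grid
            break
        X, y = np.append(X, x_next), np.append(y, f(x_next))  # 4. repeat
    return X[np.argmin(y)], y.min()

x_best, y_best = smbo(lambda x: (10 * x - 7) ** 2 * np.sin(12 * x))
```

Real implementations differ mainly in the model (DACE/kriging with fitted hyperparameters) and in how the EI maximizer is searched; the loop structure is the same.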


Page 23

Dealing with Noise: SKO vs SPO

• Method I (used in SKO) [Huang, Allen, Notz & Zeng, '06]
  – Fit standard GP assuming Gaussian observation noise
  – Can only fit the mean of the responses

• Method II (used in SPO) [Bartz-Beielstein, Preuss, Lasarczyk, '05-'09]
  – Compute statistic of empirical distribution of responses at each design point
  – Fit noise-free GP to that

[Figure: Method I — noisy GP fit of the original response; GP mean prediction ± 2·stddev, true function, and function evaluations over parameter x]

[Figure: Method II — noise-free DACE fit of the cost statistic; DACE mean prediction ± 2·stddev, true function, and function evaluations over parameter x]
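The two methods can be contrasted on synthetic data. This is an illustrative sketch, not the authors' code: the "model" is just a GP posterior-mean predictor in plain numpy, and the noisy target function is made up for the demo:

```python
# Method I vs Method II on synthetic noisy responses (illustrative sketch).
import numpy as np

def rbf(A, B, length=0.1):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

def gp_mean(X, y, Xs, noise_var):
    """GP posterior mean; noise_var > 0 gives the noisy fit of Method I."""
    K = rbf(X, X) + noise_var * np.eye(len(X))
    return rbf(X, Xs).T @ np.linalg.solve(K, y)

rng = np.random.default_rng(1)
design = np.linspace(0.0, 1.0, 10)                 # 10 design points
repeats = np.sin(6 * design)[:, None] + 0.3 * rng.standard_normal((10, 4))

grid = np.linspace(0.0, 1.0, 101)

# Method I (SKO-style): fit all raw runs, with a Gaussian observation-noise term
pred_raw = gp_mean(np.repeat(design, 4), repeats.ravel(), grid, noise_var=0.3**2)

# Method II (SPO-style): collapse the repeats at each design point into an
# empirical cost statistic (here the mean), then fit a noise-free model to it
c = repeats.mean(axis=1)
pred_stat = gp_mean(design, c, grid, noise_var=1e-8)
```

The key structural difference is visible in the code: Method I smooths through all raw observations, while Method II interpolates the per-configuration statistic exactly.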


Page 25

Experiment: SPO vs SKO for Tuning CMA-ES

• CMA-ES [Hansen et al., '95-'09]
  – Evolution strategy for global optimization
  – State-of-the-art (see BBOB workshop this GECCO)
  – Parameters: population size, number of parents, learning rate, damping parameter

• Tuning objective
  – Solution cost: best function value found in budget
  – Here: Sphere function
  – Minimize mean solution cost across many runs

[Figure: cost c_k vs. number of target algorithm runs k (50-200), log scale 10^-6 to 10^-3, comparing SKO, SPO 0.3, SPO 0.4, and SPO+]


Page 29

Components of SPO: initial design

• Fixed number of initial design points (250) and repeats (2)
  – Size of initial design studied before [Bartz-Beielstein & Preuss, '06]

• Here: studied which 250 design points to use
  – Sampled uniformly at random
  – Random Latin Hypercube
  – Iterated Hypercube Sampling [Beachkofski & Grandhi, '02]
  – SPO's standard LHD

• Result: no significant difference
  – Initial design not very important
  – Using cheap random LHD from here on
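The two cheapest designs on the list — uniform random sampling and a random Latin hypercube — take only a few lines to generate. This is an illustrative sketch (the sizes match the slide's 250-point, 4-parameter setting, but the code is not SPO's):

```python
# Uniform random design vs. random Latin hypercube design (LHD).
import numpy as np

def random_lhd(n_points, n_dims, rng):
    """Each dimension: one jittered sample per stratum, independently shuffled."""
    strata = (np.arange(n_points) + rng.random((n_dims, n_points))) / n_points
    return np.array([rng.permutation(row) for row in strata]).T

rng = np.random.default_rng(0)
uniform = rng.random((250, 4))     # 250 design points, 4 parameters
lhd = random_lhd(250, 4, rng)      # same size, but stratified per dimension
```

The LHD guarantees exactly one point in each of the 250 equal-width strata of every parameter, which is what distinguishes it from plain uniform sampling.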


Page 32

Components of SPO: Transformations

• Compute empirical cost statistics c(θ) first

• Then transform cost statistics: log(c(θ))

• Data: solution cost of CMA-ES on sphere
  – Training: 250 · 2 data points as above
  – Test: 250 new points, sampled uniformly at random

[Figure: scatter plots of log10(predicted response) vs. log10(true response), with no transformation (left) and with the log transformation (right)]

Note: In newer experiments, SKO with log models was competitive
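A small synthetic demo shows why the log transformation helps when costs span several orders of magnitude. Everything here is an assumption for illustration — a made-up cost function with multiplicative noise, and a kernel smoother standing in for the DACE model:

```python
# Fitting the raw cost vs. fitting log(cost), on data spanning six decades.
import numpy as np

rng = np.random.default_rng(2)
theta_train = rng.random(250)
theta_test = rng.random(250)

def true_cost(t):
    return 10.0 ** (6 * t - 5)        # spans 1e-5 ... 1e1, like solution costs

# noisy observed costs: multiplicative noise, i.e. additive on the log scale
y_obs = true_cost(theta_train) * 10 ** (0.5 * rng.standard_normal(250))

def smooth(Xtr, ytr, Xte, length=0.03):
    # simple kernel smoother standing in for the model (illustration only)
    W = np.exp(-0.5 * (Xte[:, None] - Xtr[None, :]) ** 2 / length**2)
    return (W @ ytr) / W.sum(axis=1)

pred_raw = smooth(theta_train, y_obs, theta_test)                  # fit c
pred_log = 10 ** smooth(theta_train, np.log10(y_obs), theta_test)  # fit log c

err_raw = np.abs(np.log10(pred_raw) - np.log10(true_cost(theta_test))).mean()
err_log = np.abs(np.log10(pred_log) - np.log10(true_cost(theta_test))).mean()
```

On the raw scale, the few largest observations dominate every local average and drag predictions for small-cost regions upward; fitting on the log scale removes that distortion, mirroring the left vs. right scatter plots above.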


Page 38

Components of SPO: expected improvement criterion

User wants to optimize some objective c

• We transform c to improve the model

• But that doesn't change the user's objective

Have to adapt expected improvement criterion to handle un-transformed objective

Fix for log-transform: new expected improvement criterion

• Want to optimize Iexp(θ) = max{0, fmin − exp[f(θ)]}

• There is a closed-form solution (see paper)

• However: no significant improvement in our experiments
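The closed form is in the paper, not on the slide. The sketch below reconstructs one from the standard assumption that the model's log-cost prediction f(θ) is Gaussian, via the log-normal partial expectation — treat it as a reconstruction under that assumption, not as the authors' exact formula:

```python
# EI for an un-transformed objective when the model predicts the log-cost:
# E[max{0, f_min - exp(F)}] with F ~ N(mu, sigma^2), f_min > 0.
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_ei(mu, sigma, f_min):
    u = (math.log(f_min) - mu) / sigma
    # f_min * P(exp(F) < f_min)  -  E[exp(F); exp(F) < f_min]
    return f_min * norm_cdf(u) - math.exp(mu + 0.5 * sigma**2) * norm_cdf(u - sigma)
```

As sigma → 0 this collapses to max{0, f_min − exp(mu)}, i.e. the plain improvement at the predicted cost, as it should.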


Page 41

Components of SPO: choosing the incumbent parameter setting in presence of noise

Some algorithm runs can be lucky
→ need extra mechanism to ensure incumbent is really good
→ SPO increases number of repeats over time

SPO's mechanism in a nutshell

• Compute cost statistic c(θ) for each configuration θ

• θinc ← configuration with lowest c(θ)

• Perform up to R runs for θinc to ensure it is good
  – Increase R over time

• But what if it doesn't perform well?
  – Then a different incumbent is picked in the next iteration
  – That might also turn out not to be good...


Page 46

Components of SPO: choosing the incumbent parameter setting in presence of noise

Simple fix

• Iteratively perform runs for single most promising θnew

• Compare against current incumbent θinc

• Once θnew has as many runs as θinc: make it new θinc

• Maintain invariant: θinc has the most runs of all

• Substantially improves robustness → new SPO variant: SPO+

[Figure: performance p_k vs. number of algorithm runs k (500-1000) for SPO 0.3, SPO 0.4, and SPO+ — tuning CMA-ES on Griewangk (left, log scale 10^-3 to 10^0) and on Rastrigin (right, log scale 10^1 to 10^3)]
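The fix can be sketched as a small intensification routine. This is an illustrative reconstruction: the mean cost statistic, the early-rejection check, and the toy target algorithm are assumptions for the demo, not the paper's exact procedure:

```python
# SPO+-style incumbent handling: a challenger must accumulate as many runs as
# the incumbent before it can take over, so the incumbent always has the most
# runs of any configuration. Early rejection on a worse mean is an assumption.
import random
from collections import defaultdict

def run_algorithm(theta, rng):
    # stand-in for one noisy run of the target algorithm at configuration theta
    return (theta - 0.3) ** 2 + rng.gauss(0.0, 0.05)

def mean(xs):
    return sum(xs) / len(xs)

def intensify(theta_new, theta_inc, runs, rng):
    """Challenge theta_inc with theta_new; return the (possibly new) incumbent."""
    while len(runs[theta_new]) < len(runs[theta_inc]):
        runs[theta_new].append(run_algorithm(theta_new, rng))
        if mean(runs[theta_new]) > mean(runs[theta_inc]):
            return theta_inc            # challenger looks worse: keep incumbent
    return theta_new                    # matched the run count without losing

rng = random.Random(0)
runs = defaultdict(list)
theta_inc = 0.8
runs[theta_inc] = [run_algorithm(theta_inc, rng) for _ in range(8)]
theta_inc = intensify(0.31, theta_inc, runs, rng)   # much better configuration
```

Because a lucky single run can no longer unseat a well-evaluated incumbent, the incumbent trajectory becomes far less erratic — the robustness gain the plots above attribute to SPO+.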




Page 50: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Summary of Study of SPO components & Definition of SPO+

Model Quality

I Initial design not very important

→ use simple random LHD (Latin hypercube design) in SPO+

I Log-transforms sometimes improve model quality a lot

→ use them in SPO+ (for positive cost functions)

Sequential Experimental Design

I Expected improvement criterion

New criterion that is better in theory, but not in practice → use original one in SPO+

I New mechanism for increasing #runs & selecting incumbent

substantially improves robustness → use it in SPO+
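For the log-transform point, a minimal helper sketch (illustrative names; the idea is to fit the response-surface model to log-transformed costs, which only applies when all observed costs are strictly positive):

```python
import math

def log_transform(costs):
    # Transform strictly positive cost observations before model fitting;
    # predictions are mapped back with the inverse transform below.
    if any(c <= 0 for c in costs):
        raise ValueError("log transform requires strictly positive costs")
    return [math.log10(c) for c in costs]

def inverse_transform(log_costs):
    # Map model predictions back to the original cost scale.
    return [10.0 ** y for y in log_costs]
```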

18


Page 52: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Comparison to State of the Art for tuning SAPS

I SAPS
  I Stochastic local search algorithm for SAT
  I 4 continuous parameters
  I Here: min. search steps for single problem instance

I Results known for CALIBRA & ParamILS [Hutter et al, AAAI’07]

[Figure: comparison to SPO variants with varying budget; performance pk vs. number of algorithm runs k for SPO 0.3, SPO 0.4, and SPO+.]

Procedure        SAPS median run-time / 10^3
SAPS default     85.5
CALIBRA(100)     10.7 ± 1.1
BasicILS(100)    10.9 ± 0.6
FocusedILS       10.6 ± 0.5
SPO 0.3          18.3 ± 13.7
SPO 0.4          10.4 ± 0.7
SPO+             10.0 ± 0.4

With a budget of 20,000 runs of SAPS

19




Page 56: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Outline

1. Sequential Model-Based Optimization (SMBO): Introduction

2. Comparing Two SMBO Methods: SPO vs SKO

3. Components of SPO: Model Quality

4. Components of SPO: Sequential Experimental Design

5. Conclusions and Future Work

20

Page 57: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Conclusions

I SMBO can help design algorithms
  I More principled, saves development time
  I Can exploit full potential of flexible algorithms

I Our contribution
  I Insights: what makes a popular SMBO algorithm, SPO, work
  I Improved version, SPO+, often performs better than SPO

21


Page 59: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Ongoing & Future Work

Ongoing Extensions of Model-Based Framework

I Use of different models in SPO+ framework

I Dealing with categorical parameters

I Optimization for Sets/Distributions of Instances

Use of models for scientific understanding

I Interactions of instance features and parameter values

I Can help understand and hopefully improve algorithms

22


Page 61: An Experimental Investigation of Model-Based …hutter/papers/09-GECCO-SPO+presentation.pdf · An Experimental Investigation of Model-Based Parameter Optimization: SPO and Beyond

Thanks to

I Thomas Bartz-Beielstein

– SPO implementation & CMA-ES wrapper

I Theodore Allen

– SKO implementation

23