Multi-objective meta-parameter tuning for mono-objective stochastic metaheuristics


Johann Dréo

THALES Research & Technology

Introduction

Multi-objective method

Parameter tuning

Stochastic metaheuristics

Performance profiles

http://www.flickr.com/photos/k23/2792398403/

Dreo & Siarry, 2004

This presentation concerns the use of a multi-objective approach for tuning the parameters of stochastic metaheuristics.

Going further, it introduces the notion of a performance profile.

This work is based on an idea I first published in 2004, during my thesis.

Stochastic metaheuristics

What I consider to be stochastic metaheuristics here are optimization algorithms that manipulate a sample of points of an objective function and make it progress toward the area where the global optimum is supposed to be.

The sample is probabilistic and is iteratively biased toward a single region of the search space, which is what makes the algorithm stochastic.

These kinds of algorithms try to perform a global search, avoiding local optima. Using probabilistic sampling operators enables them to do so even when the objective function carries many constraints.
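As a minimal illustration of this sample-and-bias loop, here is a generic sketch (not any specific published algorithm), assuming a toy sphere objective:

```python
# A generic sample-and-bias loop (a sketch, not a specific published
# algorithm): draw a probabilistic sample, select the best points, and
# bias the sampler toward the region they occupy.
import random
import statistics

def sphere(x):
    """Toy objective: global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def stochastic_search(f, dim=2, pop=50, keep=10, iters=100):
    mu = [0.0] * dim   # centre of the probabilistic sample
    sigma = 5.0        # spread of the sample
    for _ in range(iters):
        sample = [[random.gauss(m, sigma) for m in mu] for _ in range(pop)]
        sample.sort(key=f)                                   # rank by objective
        elite = sample[:keep]                                # selection
        mu = [statistics.mean(col) for col in zip(*elite)]   # bias the sampler
        sigma = max(0.9 * sigma, 1e-6)                       # shrink the region
    return mu, f(mu)

if __name__ == "__main__":
    print(stochastic_search(sphere))
```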

Examples of stochastic metaheuristics

The best-known stochastic metaheuristics are evolutionary algorithms, ant colony optimization, particle swarm optimization, estimation of distribution algorithms, stochastic local search, etc.

These form the majority of metaheuristics, but there are also methods that are not always stochastic, such as tabu search, path relinking, or variable neighbourhood search.

But most of the time, the efficient methods are the stochastic ones.

Parameter setting

Parameter setting of metaheuristics can be achieved through several methods.

The first approach is parameter control, that is, setting the parameters during the execution of the algorithm.

But the method I present here concerns parameter tuning, which is done before the metaheuristic is used.

Three main approaches can be used here; all three consider the parameter tuning problem as an optimization problem:

Racing, which consists in running several algorithms and keeping only the better ones as the tests go along (see the sketch after this list).

SPO, a dedicated method that optimizes while estimating the stochastic bias, trying to reduce the number of runs of the algorithm being tuned.

Meta-parameter tuning, which uses a general-purpose optimization algorithm to find the best set of parameters.
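As an illustration of the first idea, here is a deliberately simplified caricature of racing (real racing methods such as F-Race use statistical tests rather than a fixed margin); `run_algorithm` is a hypothetical stand-in for one run of the tuned metaheuristic:

```python
# A caricature of racing: candidate configurations are run on successive
# test rounds and discarded as soon as they look clearly dominated.
import random
import statistics

def run_algorithm(config, seed):
    """Hypothetical stand-in for one run of the tuned metaheuristic
    (lower is better): quality depends noisily on the config value."""
    rng = random.Random(seed)
    return (config - 0.3) ** 2 + rng.gauss(0, 0.05)

def race(configs, max_rounds=30, margin=0.05):
    scores = {c: [] for c in configs}
    for rnd in range(max_rounds):
        for c in list(scores):
            scores[c].append(run_algorithm(c, seed=rnd))
        if rnd >= 4:   # wait for a few observations before discarding
            best = min(statistics.mean(v) for v in scores.values())
            scores = {c: v for c, v in scores.items()
                      if statistics.mean(v) <= best + margin}
        if len(scores) == 1:
            break
    return scores

if __name__ == "__main__":
    survivors = race([0.1, 0.2, 0.3, 0.4, 0.8])
    print(sorted(survivors))
```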

Meta-parameter tuning

What I propose is a method that considers the parameter tuning problem as a multi-objective problem.

As a mono-objective problem

Parameter setting:

Improve performance

http://www.flickr.com/photos/sigfrid/223626315/

Indeed, in all the methods I just cited, parameter setting is considered as an optimization problem with only one objective, generally finding the best optimum or reducing the uncertainty of the results. Sometimes, one tries to improve the speed.

More rarely, speed, precision, or robustness are aggregated into one criterion, with an ad hoc formula.

As a multi-objective problem

Parameter setting:

What is performance?

multi-objective problem

http://www.flickr.com/photos/jesusdq/345379863/

Thus, in fact, one can set parameters according to several objectives (improving speed, improving robustness, etc.).

One cannot find a set of parameters fitting all the potential uses of a single algorithm on a single problem instance.

Thus, parameter setting is a multi-objective problem.

Multi-objective problem

Performance?

Precision

Speed

Robustness (of precision or speed)

Stability (on a benchmark)

http://www.flickr.com/photos/matthewfch/1688409628/

So, what is performance?

There are four big categories of performance:

Precision is the error in locating the optimum, either on the objective scale, on the variable scale, or, more rigorously, in terms of (Euclidean) distance to the optimum.

Speed is the time taken by the implementation to reach a stopping criterion or, if one wants to avoid implementation bias, the number of calls to the objective function.

Robustness is a measure of the dispersion of the results (in precision or speed) for a given algorithm instance on a given problem instance.

Finally, what I call stability is the dispersion of results for a given algorithm instance on a given SET of problem instances, i.e. a benchmark.

Multi-objective problem


In this work, we will only consider speed and precision, although the method may handle any performance metrics.

Meta-parameter tuning

Mono-objective problem

Stochastic metaheuristic

Thus, we have a mono-objective optimization problem to solve, and we want to use a stochastic metaheuristic.

In our approach, this defines a multi-objective problem, where the parameters of the metaheuristic form the variables and its performances the objective functions.

The obvious and classical criticism here is that there are too many layers of optimization...
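To make the layering concrete, here is a minimal sketch: the tuned parameters are the decision variables of a meta-level, bi-objective problem. `metaheuristic` is a hypothetical stand-in (a plain random search), not one of the algorithms compared later in the talk:

```python
# The layering made concrete: params -> one stochastic run -> (speed, precision).
import random

def rosenbrock(x):
    """The 2-D test problem used in the talk's first experiments."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def metaheuristic(params, f, budget, rng):
    """Hypothetical stand-in solver: one stochastic run,
    returning (error, evaluations used)."""
    step = params[0]                      # the parameter being tuned
    x = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
    best = f(x)
    for evals in range(1, budget + 1):
        y = [xi + rng.gauss(0, step) for xi in x]
        fy = f(y)
        if fy < best:
            x, best = y, fy
        if best < 1e-3:                   # stopping criterion: target precision
            return best, evals
    return best, budget

def meta_objectives(params, budget=10_000, seed=0):
    """One evaluation of the meta-level problem: params -> (speed, precision)."""
    err, evals = metaheuristic(params, rosenbrock, budget, random.Random(seed))
    return evals, err                     # both objectives are minimized
```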


Multi-objective parameter tuning problem

Meta-parameter tuning

Mono-objective problem

Stochastic metaheuristic

Meta-optimizer


Complexity

Multi-objective parameter tuning problem

Mono-objective problem

Stochastic metaheuristic

Meta-optimizer

Difficult

Easier (1 time)

But metaheuristics DO have parameters, and we really want to set them to their best values.

The key point here is that it is easier to set the parameters of a solver than to solve the problem directly. The simplest example of this idea is solving a continuous optimization problem with hundreds of variables using a metaheuristic that has 3 parameters.

Moreover, you only have to tune your parameters once, even if you will solve many problem instances later.

Methodology

Speed / Precision

Median estimation

Mono-objective problem

Stochastic metaheuristic

NSGA-2

Today, I will just present a proof of concept of the approach.

In this work, we have used the NSGA-2 multi-objective algorithm, tackling a bi-objective problem, with the dispersion of the metaheuristic's results estimated by the median.

Indeed, stochastic metaheuristics cannot guarantee that they will always output the same optimum. In practice, if you run the same algorithm instance (with the same parameters) several times on the same problem instance, your results will be scattered.

We chose the median here, and not the mean, because of the non-symmetric distribution of the results of most of the tested metaheuristics.
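A sketch of this median estimation step, assuming a generic `one_run(params, seed) -> (error, evaluations)` wrapper such as the stand-in shown earlier; the two medians are what the multi-objective algorithm (NSGA-2 in this work) would see:

```python
# Median estimation of the two objectives over repeated runs.
import statistics

def median_objectives(one_run, params, n_runs=10):
    """one_run(params, seed) -> (error, evaluations); any stochastic solver
    wrapper fits, e.g. built from the stand-in of the previous sketch."""
    results = [one_run(params, seed) for seed in range(n_runs)]
    errors = [err for err, _ in results]
    speeds = [evals for _, evals in results]
    # Medians rather than means: the result distributions of stochastic
    # metaheuristics are typically asymmetric.
    return statistics.median(speeds), statistics.median(errors)
```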


Results plots

[Plot: the Pareto front in the (speed, precision) objective space, i.e. the performance profile, or front]

What is crucial in our method is that we do not want to aggregate the criteria; instead, we want the Pareto front corresponding to all the non-dominated parameter sets.

Later I will show plots representing the Pareto front, which I will sometimes call the performance front, or performance profile.

The idea is that we can then compare several algorithms more rigorously, by comparing their respective performance fronts. We also benefit from having a cursor, scaling from a behaviour oriented towards speed at one extreme to one oriented towards precision at the other.
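A minimal sketch of extracting such a front from a set of measured (speed, precision) points, with both objectives minimized; the helper names are illustrative:

```python
# Keep every parameter set whose (speed, precision) pair is not dominated
# by another one (both objectives minimized).
def dominates(a, b):
    """True if point a is at least as good as b everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """points: list of (speed, precision) tuples; returns the non-dominated ones."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

if __name__ == "__main__":
    pts = [(100, 0.5), (200, 0.1), (150, 0.3), (120, 0.6), (300, 0.1)]
    print(sorted(pareto_front(pts)))  # (120, 0.6) and (300, 0.1) are dominated
```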

Some results

Example

2 continuous EDAs (CEDA, CHEDA)

Sampling density parameter

Rosenbrock, 2 dimensions

Median estimated with 10 runs

10 000 max eval.

NSGA-2

20 iter., 50 indiv.

10 runs

3 days computation

+ Nelder-Mead Search

Here are plotted the performance profiles of two estimation of distribution algorithms, tackling a small classical continuous problem. Here, we are trying to tune the sample size.

The yellow algorithm is the same as the blue one (an EDA using a multivariate Gaussian PDF), except that it applies a local NMS to selected points of the sample.

One can see here that the hybridization with NMS has a cost in terms of speed, but that it brings a large gain in precision.
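For illustration, a rough sketch in the spirit of the two algorithms compared here, assuming numpy and scipy are available; this is not the talk's implementation: an EDA that fits a multivariate Gaussian to the elite sample, with an optional Nelder-Mead polish of a few selected points for the hybrid variant:

```python
# Gaussian EDA sketch; hybrid=True adds a local Nelder-Mead search (NMS)
# on a few selected sample points, in the spirit of the CHEDA variant.
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return float(sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2))

def gaussian_eda(f, dim=2, pop=100, keep=30, iters=50, hybrid=False, seed=0):
    rng = np.random.default_rng(seed)
    mean, cov = np.zeros(dim), 4.0 * np.eye(dim)
    for _ in range(iters):
        sample = rng.multivariate_normal(mean, cov, size=pop)
        if hybrid:  # polish a few points with a short local NMS
            for i in range(3):
                res = minimize(f, sample[i], method="Nelder-Mead",
                               options={"maxiter": 20})
                sample[i] = res.x
        elite = sample[np.argsort([f(x) for x in sample])[:keep]]
        mean = elite.mean(axis=0)                    # re-estimate the PDF
        cov = np.cov(elite.T) + 1e-8 * np.eye(dim)   # avoid a degenerate cov
    return mean, f(mean)

if __name__ == "__main__":
    print(gaussian_eda(rosenbrock, hybrid=True))
```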

Example

+ simulated annealing

stable temperature parameter

Rosenbrock, 2 dimensions

Median estimated with 10 runs

10 000 max eval.

NSGA-2

20 iter., 50 indiv.

10 runs

1 day computation

On the same problem, let's check how simulated annealing performs. Here, we tune the number of tries permitted before decreasing the temperature (which thus decreases step by step).

Here, we gain on speed, but at the cost of precision.
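A minimal simulated annealing sketch (illustrative, not the tested implementation) exposing the tuned parameter, i.e. the number of tries at a stable temperature before each stepwise decrease:

```python
# Simulated annealing with a stepwise temperature schedule; `tries_per_step`
# is the "stable temperature" parameter tuned in this experiment.
import math
import random

def simulated_annealing(f, x0, tries_per_step=50, t0=10.0, alpha=0.9,
                        t_min=1e-3, step=0.5, rng=None):
    rng = rng or random.Random(0)
    x, fx = list(x0), f(x0)
    t = t0
    while t > t_min:
        for _ in range(tries_per_step):   # tries at a stable temperature
            y = [xi + rng.gauss(0, step) for xi in x]
            fy = f(y)
            # accept downhill moves always, uphill ones with Boltzmann probability
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
        t *= alpha                        # the temperature decreases step by step
    return x, fx

if __name__ == "__main__":
    print(simulated_annealing(lambda x: sum(v * v for v in x), [2.0, 2.0]))
```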

Example

+ genetic algorithm

population parameter

Rosenbrock, 2 dimensions

Median estimated with 10 runs

10 000 max eval.

NSGA-2

20 iter., 50 indiv.

10 runs

1 day computation

Finally, we add a genetic algorithm in the comparison, tuning the size of the population.

One can notice that the parameters used in this small study are quite similar: they all tune what I would call the sampling density.

Anyway, the genetic algorithm has an interesting performance profile, which is clearly convex, with an obvious compromise.

[Plots: performance fronts of SA, JGEN, CEDA and CHEDA projected onto the parameter space, against the speed and precision objectives]

Even more interesting is the performance profile projected onto the parameter space.

One can see that every algorithm has its very own profile, which tells a lot about how it behaves.

Behaviour exploration

[Plots: speed and precision as functions of the genetic algorithm's population size]

As an example, let's take a closer look at the genetic algorithm's performance profile.

What I find interesting with such plots is that they immediately tell you where the best compromise lies when setting your parameters.

On the precision plot, one can see that adding more and more individuals to the population has little effect beyond 200 individuals. At the same time, this value corresponds to an inflection point on the speed objective.

Performance front

Temporal planner, "Divide & Evolve > CPT", version "GOAL"

2 mutation parameters

IPC ''rovers'' problem, instance 06

Median estimated with 10 runs

NSGA-2

10 iter., 5 indiv.

30 runs

1 week computation for 1 run

I have tried this approach on a more complex combinatorial problem: a temporal planning problem, where the parametrized metaheuristic is an indirect evolutionary algorithm piloting a constraint-programming scheduler, in order to solve a problem from the International Planning Competition.

Due to the nature of the problem, it takes more time to find the performance front.

Anyway, one can see that we have a nice convex front, unfortunately exhibiting perturbations due to the stochastic errors of the parameter solver.

Performance front in Parameters space

[Plots: the performance front projected onto the parameter space, for mutation parameters M1 and M2, against speed and precision]

In this work we have tuned two probabilities of applying a mutation operator.

There would be a lot to say here, but let's just say that here too, the performance profiles tell us a lot about how the algorithm behaves.

As an example, one can see that for M2 versus speed there is a clear structure, but that the two mutation operators are not dependent, as shown by the correlation plot.

These results may be quite intuitive to an expert, but you can imagine that it is less simple when dealing with several parameters. Moreover, performance profiles make it possible to quantify such relationships.

Previous parameters settings

As an example of the interest of performance profiles, let me show how they helped us better tune those parameters.

For some applications, the algorithm may not be sufficiently precise, and one may be ready to lose a bit of speed to increase the precision of the solver.

Looking at the performance profile, one can see that the previous parameter setting (in red) can be improved by slightly shifting it to other values (in green).

The upper-left figure shows the approximate corresponding gain in terms of speed and precision.
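A sketch of this kind of budgeted re-tuning, with hypothetical names and values: among the front's (speed, precision, parameters) triples, take the most precise setting whose slowdown stays acceptable:

```python
# Pick a new setting from the performance front under a speed budget;
# both objectives are minimized, and the data below are illustrative.
def retune(front, current_speed, max_slowdown=1.5):
    """front: list of (speed, precision, params) triples."""
    affordable = [p for p in front if p[0] <= current_speed * max_slowdown]
    return min(affordable, key=lambda p: p[1]) if affordable else None

if __name__ == "__main__":
    front = [(1000, 0.80, {"m1": 0.1, "m2": 0.3}),
             (1400, 0.20, {"m1": 0.2, "m2": 0.5}),
             (2500, 0.05, {"m1": 0.4, "m2": 0.6})]
    print(retune(front, current_speed=1000))  # trades some speed for precision
```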

Conclusion

Drawbacks

Computation cost

Stochastic M.-O. algo.
supplementary bias



Valid only for:

Algorithm implementation

Problem instance

Stopping criterion

Error

Time

t steps, improvement < ε



Fronts often convex
aggregations?

No benchmarking

http://www.flickr.com/photos/orvaratli/2690949652/

Advantages

Performance profiles

Objectives space

Parameters space

Quantification of expert knowledge


Automatic parameter tuning

One step before use

N parameters → 1 parameter

More degrees of freedom


Algorithms comparison

Statistical tests more meaningful


Behaviour understanding

Perspectives

Include robustness

Include dispersion estimation

Include benchmarking

Multi-objective SPO, F-Race

Regressions in parameters space

Performances / parameters

Behaviour models?

Links?

Fitness Landscape /
Performance profiles

Run time distribution

Taillard's significance plots

...

http://www.flickr.com/photos/colourcrazy/2065575762/

[email protected]

http://www.flickr.com/photos/earlg/275371357/

"Thales confidential. All rights reserved"

Research & Technology