Multi-objective meta-parameter tuning for mono-objective stochastic metaheuristics
Johann Dréo
THALES Research & Technology
Introduction
Multi-objective method
Parameter tuning
Stochastic metaheuristics
Performance profiles
http://www.flickr.com/photos/k23/2792398403/
Dreo & Siarry, 2004
This presentation concerns the use of a multi-objective approach for tuning the parameters of stochastic metaheuristics.
Going further, it introduces the notion of performance profile.
This work is based on a first idea I published while I was doing my thesis, in 2004.
Stochastic metaheuristics
What I consider to be stochastic metaheuristics here are optimization algorithms that manipulate a sample of an objective function and make it progress toward the area where the global optimum is supposed to be.
The sample is probabilistic and is iteratively biased toward a single region of the search space, which makes the algorithm stochastic.
These kinds of algorithms try to perform a global search, avoiding local optima. Using probabilistic sampling operators makes them able to do so even if the objective function shows a lot of constraints.
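Such a sample-based search can be sketched in a few lines (a generic illustration with truncation selection and Gaussian resampling, not any specific published algorithm; all names and default values are illustrative):

```python
import random

def sphere(x):
    """Toy objective: squared distance to the origin."""
    return sum(xi * xi for xi in x)

def stochastic_search(f, dim=2, pop=30, keep=10, sigma=0.5, iters=100, seed=0):
    """Iteratively bias a probabilistic sample toward the best region found."""
    rng = random.Random(seed)
    # Initial sample drawn uniformly in [-5, 5]^dim.
    sample = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        # Keep the best points (truncation selection)...
        sample.sort(key=f)
        elite = sample[:keep]
        # ...and resample around them with Gaussian perturbations.
        sample = [[xi + rng.gauss(0, sigma) for xi in rng.choice(elite)]
                  for _ in range(pop)]
        sigma *= 0.95  # shrink the sampling dispersion over time
    sample.sort(key=f)
    return sample[0]

best = stochastic_search(sphere)
print(sphere(best))  # small value close to 0
```

The probabilistic resampling is what makes the search stochastic: two runs with different seeds will return different (scattered) results.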
Examples of stochastic metaheuristics
The best-known stochastic metaheuristics are evolutionary algorithms, ant colony optimization, particle swarm optimization, estimation of distribution algorithms, stochastic local searches, etc.
These form the majority of metaheuristics, but there also exist methods that are not always stochastic, like tabu search, path relinking or variable neighbourhood search.
But most of the time, the efficient methods are the stochastic ones.
Parameter setting
Parameter setting of metaheuristics can be achieved through several methods.
The first approach is the control of parameters, that is to say, setting them during the execution of the algorithm.
But the method I present here concerns parameter tuning, which is done before the use of the metaheuristic.
Three main approaches can be used here; all three consider the parameter tuning problem as an optimization one:
* racing, which consists in running several algorithms, keeping only the better ones as the tests go along;
* SPO, a dedicated method that optimizes and estimates the probabilistic bias, trying to reduce the number of runs of the algorithm being tuned;
* meta-parameter tuning, which uses a generalistic optimization algorithm to find the best set of parameters.
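The racing idea can be illustrated with a toy sketch (a deliberate simplification; the real F-Race procedure uses statistical tests such as the Friedman test to discard candidates, not a plain mean):

```python
import random
import statistics

def race(candidates, evaluate, rounds=10, drop_per_round=1, seed=0):
    """Toy racing: evaluate all surviving candidates on successive test
    instances and drop the worst performer as the tests go along."""
    rng = random.Random(seed)
    scores = {c: [] for c in candidates}
    alive = list(candidates)
    for _ in range(rounds):
        instance_seed = rng.random()  # same instance for every survivor
        for c in alive:
            scores[c].append(evaluate(c, instance_seed))
        if len(alive) > 1:
            # Drop the candidate with the worst mean score so far.
            alive.sort(key=lambda c: statistics.mean(scores[c]))
            alive = alive[:max(1, len(alive) - drop_per_round)]
    return alive[0]

# Hypothetical evaluation: the candidate is a step-size, smaller error is
# better, and a shared noise term mimics instance-to-instance variation.
def evaluate(step, seed):
    return abs(step - 0.3) + random.Random(seed).gauss(0, 0.05)

best = race([0.1, 0.3, 0.5, 0.9], evaluate)
print(best)  # 0.3 wins the race
```

Because every survivor is evaluated on the same instances, fewer runs are wasted on clearly bad parameter sets.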
Meta-parameter tuning
What I propose is a method considering the parameter tuning problem as a multi-objective problem.
As a mono-objective problem
Parameter setting:
Improve performance
http://www.flickr.com/photos/sigfrid/223626315/
Indeed, in all the methods that I just cited, the parameter setting is considered as an optimization problem with only one objective, generally finding the best optimum or reducing the uncertainty of the results. Sometimes, one tries to improve the speed.
More rarely, speed, precision or robustness are aggregated into one criterion, with an ad hoc formula.
As a multi-objective problem
Parameter setting:
What is performance ?
multi-objective problem
http://www.flickr.com/photos/jesusdq/345379863/
Thus, in fact, one can set parameters according to several objectives (improving speed, improving robustness, etc.).
One cannot find a set of parameters fitting all the potential uses of a single algorithm on a single problem instance.
Thus, parameter setting is a multi-objective problem.
Multi-objective problem
Performance ?
Precision
Speed
Robustness (precision, speed)
Stability (→ benchmark)
http://www.flickr.com/photos/matthewfch/1688409628/
So, what is performance?
There are four big categories of performance:
* precision is the error of localization of the optimum, either on the objective scale, on the variable scale or, more rigorously, in terms of [Euclidean] distance to the optimum;
* speed is the time taken by the implementation to reach a stopping criterion or, if one wants to avoid implementation bias, the number of calls to the objective function;
* robustness is a measure of the dispersion of the results (for both precision and speed) for a given algorithm instance on a given problem instance;
* finally, what I call stability is the dispersion of results for a given algorithm instance on a given SET of problem instances, a benchmark.
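These measures can be estimated from repeated runs; a possible sketch (function and field names are illustrative, not from the talk):

```python
import statistics

def performance_summary(runs, optimum=0.0):
    """Summarize repeated runs of one algorithm instance on one problem
    instance. Each run is a (best_value, nb_evaluations) pair."""
    errors = [abs(v - optimum) for v, _ in runs]  # precision, objective scale
    evals = [e for _, e in runs]                  # speed, in function calls
    return {
        "precision": statistics.median(errors),
        "speed": statistics.median(evals),
        # Robustness: dispersion of the results across the runs,
        # for both precision and speed.
        "robustness_precision": statistics.stdev(errors),
        "robustness_speed": statistics.stdev(evals),
    }

# Hypothetical results of 4 runs: (best value found, evaluations used).
runs = [(0.01, 900), (0.05, 1100), (0.02, 950), (0.08, 1300)]
print(performance_summary(runs))
```

Stability would be computed the same way, but across the per-instance summaries of a whole benchmark rather than across runs on one instance.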
In this work, we will only consider speed and precision, although the method may handle any performance metric.
Meta-parameter tuning
Mono-objective problem → Stochastic metaheuristic → Meta-optimizer
Multi-objective parameter tuning problem
Thus, we have a mono-objective optimization problem to solve, and we want to use a stochastic metaheuristic.
In our approach, this defines a multi-objective problem, where the parameters of the metaheuristic form the variables and its performances form the objective function.
The obvious and classical criticism here is that there are too many layers of optimization...
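The layering can be sketched as follows: the meta-level bi-objective function maps a parameter of the inner metaheuristic to a (speed, precision) pair. Here a toy (1+1) stochastic search stands in for a real metaheuristic (all names and values are illustrative); a meta-optimizer such as NSGA-2 would then minimize `meta_objectives`:

```python
import random
import statistics

def rosenbrock(x):
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def inner_metaheuristic(f, sigma, max_evals=2000, seed=0):
    """Toy stochastic local search; `sigma` is the parameter being tuned.
    Returns (best_value, nb_evaluations_used)."""
    rng = random.Random(seed)
    x = [rng.uniform(-2, 2) for _ in range(2)]
    best = f(x)
    evals = 1
    while evals < max_evals and best > 1e-3:  # stopping criterion
        y = [xi + rng.gauss(0, sigma) for xi in x]
        fy = f(y)
        evals += 1
        if fy < best:
            x, best = y, fy
    return best, evals

def meta_objectives(sigma, runs=10):
    """Meta-level bi-objective function: parameters in, performances out."""
    results = [inner_metaheuristic(rosenbrock, sigma, seed=s)
               for s in range(runs)]
    precision = statistics.median(v for v, _ in results)
    speed = statistics.median(e for _, e in results)
    return speed, precision  # both objectives are to be minimized

print(meta_objectives(0.1))
```

Note that one meta-evaluation already costs several full runs of the inner algorithm, which is the source of the computation cost discussed later.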
Complexity
Multi-objective parameter tuning problem: Mono-objective problem → Stochastic metaheuristic → Meta-optimizer
Difficult
Easier, 1 time
But metaheuristics DO HAVE parameters, and we really want to set them at their best values.
The key point here is that it is easier to set the parameters of a solver than to solve the problem directly. The simplest example of this idea is when you want to solve a continuous optimization problem with hundreds of variables, with a metaheuristic that has 3 parameters.
Moreover, you only have to tune your parameters once, even if you will solve many problem instances later.
Methodology
Mono-objective problem → Stochastic metaheuristic → NSGA-2, with median estimation of Speed / Precision
Today, I will just present a proof of concept of the approach.
In this work, we have used the NSGA-2 multi-objective algorithm, tackling a bi-objective problem, with the dispersion of the metaheuristic results estimated by the median.
Indeed, stochastic metaheuristics cannot guarantee that they will always output the same optimum. In practice, if you run the same algorithm instance (with the same parameters) several times on the same problem instance, your results will be scattered.
We chose to use the median here, and not the mean, because of the non-symmetric distribution of the results of the majority of the tested metaheuristics.
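The effect of this skew can be seen on synthetic data (purely illustrative; a log-normal sample mimics a non-symmetric distribution of run results, where most runs end close to the optimum and a few end far away):

```python
import random
import statistics

rng = random.Random(42)
# Skewed, non-symmetric "run results": a long tail of rare bad runs.
results = [rng.lognormvariate(0, 1) for _ in range(1000)]

print(statistics.mean(results))    # pulled upward by the rare bad runs
print(statistics.median(results))  # closer to the typical run
```

On such distributions the mean overstates the typical error, which is why the median is the more faithful estimator of a stochastic metaheuristic's central tendency.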
Results plots
Speed
Precision
Performance profile / front
What is crucial in our method is that we do not want to aggregate the criteria; instead, we want the Pareto front corresponding to all the non-dominated parameter sets.
Later I will show plots representing the Pareto front, which I will sometimes call the performance front, or performance profile.
The idea is that we can then compare several algorithms more rigorously, by comparing their respective performance fronts. We also benefit from having a cursor, scaling from a behaviour oriented towards speed, at one extreme, to precision, at the other.
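Extracting the non-dominated parameter sets amounts to a simple filter (a naive quadratic sketch for illustration; NSGA-2 uses a faster non-dominated sorting):

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective (all minimized)
    and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the non-dominated (speed, precision) pairs."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (speed, precision) results of several parameter sets:
points = [(100, 0.5), (200, 0.1), (150, 0.3), (300, 0.1), (120, 0.6)]
print(pareto_front(points))  # [(100, 0.5), (200, 0.1), (150, 0.3)]
```

Each point kept by the filter is one non-dominated parameter set; plotting them in the objective space gives the performance front, and plotting the corresponding parameter values gives the profile in parameter space.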
Some results
Example
2 continuous EDAs (CEDA, CHEDA)
Sampling density parameter
Rosenbrock, 2 dimensions
Median estimated with 10 runs
10 000 max eval.
NSGA-2
20 iter., 50 indiv.
10 runs
3 days computation
+ Nelder-Mead Search
Here are plotted the performance profiles of two estimation of distribution algorithms, tackling a small classical continuous problem. Here, we are trying to tune the sample size.
The yellow algorithm is the same as the blue one (an EDA using a multivariate Gaussian PDF), except that it applies a local NMS to selected points of the sample.
One can see here that the hybridization with NMS has a cost in terms of speed, but that it permits a large gain in precision.
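A minimal continuous EDA of this family can be sketched as follows (an axis-aligned Gaussian model with truncation selection; this is an illustration, not the CEDA/CHEDA implementation used in the study, and all names and defaults are assumptions):

```python
import random
import statistics

def sphere(x):
    return sum(xi * xi for xi in x)

def gaussian_eda(f, dim=2, pop=50, keep=15, iters=60, seed=0):
    """Fit a diagonal Gaussian on the selected points, then resample.
    `pop` plays the role of the sampling-density parameter tuned above."""
    rng = random.Random(seed)
    sample = [[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        sample.sort(key=f)
        elite = sample[:keep]
        # Estimate mean and standard deviation per dimension.
        mu = [statistics.mean(p[d] for p in elite) for d in range(dim)]
        sd = [statistics.stdev(p[d] for p in elite) + 1e-12
              for d in range(dim)]
        # Resample the whole population from the fitted model.
        sample = [[rng.gauss(mu[d], sd[d]) for d in range(dim)]
                  for _ in range(pop)]
    return min(sample, key=f)

best = gaussian_eda(sphere)
print(sphere(best))
```

A larger `pop` gives a better density estimate per iteration but spends more evaluations, which is exactly the speed/precision trade-off the tuning explores.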
Example
+ simulated annealing
stable temperature parameter
Rosenbrock, 2 dimensions
Median estimated with 10 runs
10 000 max eval.
NSGA-2
20 iter., 50 indiv.
10 runs
1 day computation
On the same problem, let's check how a simulated annealing performs. Here, we tune the number of tries we permit before decreasing the temperature (which thus decreases step by step).
Here, we gain on speed, but at the cost of precision.
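The tuned parameter, i.e. the number of tries at a stable temperature before each decrease, can be sketched like this (an illustration of step-wise annealing, not the exact implementation used in the study; names and defaults are assumptions):

```python
import math
import random

def sphere(x):
    return sum(xi * xi for xi in x)

def simulated_annealing(f, x0, tries_per_step=50, t0=1.0, alpha=0.9,
                        max_evals=10_000, seed=0):
    """Step-wise SA: the temperature stays stable for `tries_per_step`
    moves, then is decreased geometrically."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t, evals = t0, 1
    while evals < max_evals:
        for _ in range(tries_per_step):  # stable-temperature plateau
            y = [xi + rng.gauss(0, 0.1) for xi in x]
            fy = f(y)
            evals += 1
            # Metropolis acceptance rule.
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = list(x), fx
            if evals >= max_evals:
                break
        t *= alpha  # temperature decreases step by step
    return best, fbest

best, fbest = simulated_annealing(sphere, [2.0, 2.0])
print(fbest)
```

A small `tries_per_step` cools fast (quick but greedy), a large one cools slowly (thorough but expensive), which is the trade-off the tuning exposes.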
Example
+ genetic algorithm
population parameter
Rosenbrock, 2 dimensions
Median estimated with 10 runs
10 000 max eval.
NSGA-2
20 iter., 50 indiv.
10 runs
1 day computation
Finally, we add a genetic algorithm to the comparison, tuning the size of the population.
One can notice that the parameters used in this small study are almost similar: they all tune what I would call the sampling density.
Anyway, the genetic algorithm has an interesting performance profile, which is very convex, with an obvious compromise.
[Plots: performance profiles of SA, JGEN, CEDA and CHEDA; axes: Speed, Precision]
Even more interesting is the performance profile projected on the parameter space.
One can see that every algorithm has its very own profile, which tells a lot about how it behaves.
Behaviour exploration
[Plots: genetic algorithm performance profile in parameter space; Speed and Precision vs Population size]
As an example, let's take a closer look at the genetic algorithm's performance profile.
What I find interesting with such plots is that they immediately tell you where the best compromise for setting your parameters is.
On the precision plot, one can see that adding more and more individuals to the population does not have much effect beyond 200 individuals. At the same time, for this value, we can see an inflection point on the speed objective.
Performance front
Temporal planner, "Divide & Evolve > CPT", version "GOAL"
2 mutation parameters
IPC ''rovers'' problem, instance 06
Median estimated with 10 runs
NSGA-2
10 iter., 5 indiv.
30 runs
1 week computation for 1 run
I have tried this approach on a more complex combinatorial problem: here, a temporal planning problem, where the parametrized metaheuristic is an indirect evolutionary algorithm piloting a constraint programming scheduler, in order to solve a problem from the international planning competition.
Due to the nature of the problem, it takes more time to find the performance front.
Anyway, one can see that we have a nice convex front, unfortunately exhibiting perturbations due to the probabilistic errors of the parameter solver.
Performance front in Parameters space
[Plots: performance front projected on the parameter space; Speed and Precision vs mutation parameters M1, M2]
In this work, we have tuned two probabilities of applying a mutation operator.
There would be a lot of things to say here, but let us just say that here also, the performance profiles tell us a lot about how the algorithm behaves.
As an example, one can see that for M2 versus speed there is a clear structure, but that the two mutation operators are not dependent, as shown by the correlation plot.
These results may be quite intuitive for an expert, but you can imagine that it is less simple when you deal with several parameters. Moreover, the performance profiles make it possible to quantify such relationships.
Previous parameter settings
As an example of the interest of using performance profiles, let me show you how they helped us to better tune those parameters.
For one application, the algorithm may not be sufficiently precise, and one may be ready to lose a bit of speed to increase the precision of the solver.
Looking at the performance profile, one can see that the previous parameter setting (in red) can be improved by slightly changing it to other values (in green).
On the upper-left figure you can see the approximate corresponding gain in terms of speed and precision.
Conclusion
Drawbacks
Computation cost
Stochastic M.-O. algo.
supplementary bias
http://www.flickr.com/photos/orvaratli/2690949652/
Valid only for:
Algorithm implementation
Problem instance
Stopping criterion
Error
Time
t steps, improvement < ε
Fronts often convex → aggregations?
No benchmarking
Advantages
Performance profiles
Objectives space
Parameters space
Quantification of expert knowledge
Automatic parameter tuning
One step before use
N parameters → 1 parameter
More degrees of freedom
Algorithms comparison
Statistical tests more meaningful
Behaviour understanding
Perspectives
Include robustness
Include dispersion estimation
Include benchmarking
Multi-objective SPO, F-Race
Regressions in parameters space
Performances / parameters
Behaviour models?
Links?
Fitness landscape / Performance profiles
Run time distribution
Taillard's significance plots
...
http://www.flickr.com/photos/colourcrazy/2065575762/
http://www.flickr.com/photos/earlg/275371357/
"Thales confidential. All rights reserved"
Research & Technology