Empirical Algorithmics Reading Group Oct 11, 2007
Tuning Search Algorithms for Real-World Applications:
A Regression Tree Based Approach
by Thomas Bartz-Beielstein & Sandor Markon
Presenter: Frank Hutter
Motivation
• “How to find a set of working parameters for direct search algorithms when the number of allowed experiments is low”
  – i.e. find good parameters with few evaluations
• Taking a user’s perspective:
  – Adopt standard parameters from the literature
  – But by the No Free Lunch (NFL) theorem, no setting can do well everywhere
  – Tune for an instance class, or for individual optimization instances, even for a single instance
Considered approaches
• Regression analysis
• ANOVA
• DACE (Design and Analysis of Computer Experiments)
• CART (Classification and Regression Trees)
Elevator Group Control
• Multi-objective problem
  – Overall service quality
  – Traffic throughput
  – Energy consumption
  – Transport capacity
  – Many more …
• Here: only one objective
  – Minimize the time customers have to wait until they can enter the elevator car
Optimization via Simulation
• Goal: optimize the expected performance $\mathbb{E}[y(x_1, \ldots, x_n)]$, with $x_1, \ldots, x_n$ controllable
• Black box function y
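Since y is a black box that is only observed through noisy simulation runs, $\mathbb{E}[y]$ has to be estimated, for instance by averaging independent replications. The sketch below assumes a hypothetical toy simulator `noisy_waiting_time` (a quadratic plus Gaussian noise) standing in for the elevator simulation; it is not the paper's model.

```python
import math
import random
import statistics


def noisy_waiting_time(x, rng):
    """Hypothetical toy simulator: quadratic 'true' cost plus Gaussian noise."""
    return sum((xi - 1.0) ** 2 for xi in x) + rng.gauss(0.0, 0.5)


def estimate_expected(y, x, replications=30, seed=0):
    """Estimate E[y(x)] by the mean of independent replications.

    Returns (sample mean, standard error of the mean).
    """
    rng = random.Random(seed)
    samples = [y(x, rng) for _ in range(replications)]
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(replications)
    return mean, se


mean, se = estimate_expected(noisy_waiting_time, [1.0, 2.0])
print(f"E[y] ~ {mean:.3f} +/- {se:.3f}")
```

The standard error makes explicit how much a single tuned value can be trusted, which is the same concern as running an algorithm several times instead of once.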
Direct search algorithms
• Do not construct a model of the fitness function
• Interesting aside: same nomenclature as I use, but developed independently
• Here:
  – Evolution strategy (a special class of evolutionary algorithm)
  – Simulated annealing
Evolution strategies (ES)
• Start out with a parental population at t = 0
• For each new generation:
  – Create offspring:
    • Select a parent family of size ρ at random
    • Apply recombination to object variables (?) and strategy parameters (?)
  – Mutation of each offspring
  – Selection
Many parameters in ES
• Number of parent individuals (μ)
• Number of offspring individuals (λ)
• Initial mean step sizes (σ_i)
  – Can choose problem-specific, different σ_i for each dimension (not done here)
• Number of standard deviations (??)
• Mutation strength (global/individual, extended log-normal rule ??)
• Mixing number ρ (size of each parent family)
• Recombination operator
  – For object variables
  – For strategy variables
• Selection mechanism, maximum life span
  – Plus-strategies (μ + λ) and comma-strategies (μ, λ)
  – Can be generalized by κ (maximum age of an individual)
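The generation loop and several of the parameters above fit into one small sketch of a (μ, κ, λ, ρ)-ES. It assumes intermediate recombination of both the object variables and a single global step size (one standard deviation), log-normal step-size mutation with τ = 1/√n, a sphere test function instead of the elevator simulator, and the κ generalization (κ = 1 gives comma selection, κ = ∞ gives plus selection). All of these choices are illustrative, not the settings tuned in the paper.

```python
import math
import random


def sphere(x):
    """Toy fitness to minimize (hypothetical stand-in for the simulator)."""
    return sum(v * v for v in x)


def evolution_strategy(f, n=5, mu=5, lam=20, rho=2, sigma0=1.0,
                       kappa=1, max_evals=5000, seed=0):
    """Sketch of a (mu, kappa, lambda, rho)-ES with illustrative defaults.

    kappa = 1 gives comma selection; kappa = math.inf gives plus selection.
    Individuals are tuples (x, sigma, age, fitness).
    """
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(n)  # learning rate for the global step size
    pop = []
    for _ in range(mu):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        pop.append((x, sigma0, 0, f(x)))
    evals = mu
    while evals + lam <= max_evals:
        offspring = []
        for _ in range(lam):
            family = rng.sample(pop, rho)  # parent family of size rho
            # Intermediate recombination of object and strategy variables
            x = [sum(p[0][i] for p in family) / rho for i in range(n)]
            sigma = sum(p[1] for p in family) / rho
            # Log-normal mutation of the step size, then of the object variables
            sigma *= math.exp(tau * rng.gauss(0.0, 1.0))
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((x, sigma, 0, f(x)))
        evals += lam
        # Parents age by one generation and survive only while age < kappa
        survivors = [(x, s, a + 1, fx) for x, s, a, fx in pop if a + 1 < kappa]
        pool = sorted(survivors + offspring, key=lambda ind: ind[3])
        pop = pool[:mu]  # truncation selection of the mu best
    return min(pop, key=lambda ind: ind[3])


best_comma = evolution_strategy(sphere, kappa=1)
best_plus = evolution_strategy(sphere, kappa=math.inf)
print(best_comma[3], best_plus[3])
```

Every keyword argument (mu, lam, rho, sigma0, kappa, and the choice of recombination) is a candidate factor for the experimental designs discussed next.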
Simulated Annealing
• Proposal: Gaussian Markov kernel with scale proportional to the temperature
• Decrease temperature on a logarithmic cooling schedule
• Two parameters
  – Starting temperature
  – Number of function evaluations at each temperature
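The two parameters map directly onto a short sketch. The cooling formula used here, $T_t = T_0 / \log(\lfloor (t-1)/t_{max} \rfloor \, t_{max} + e)$, is the one documented for R's `optim(method = "SANN")`, whose `temp` and `tmax` controls match the description above; treating it as the paper's exact schedule, and using a proposal scale equal to T (proportionality constant 1), are assumptions.

```python
import math
import random


def temperature(t, temp=10.0, tmax=10):
    """Logarithmic cooling: T stays constant for tmax consecutive steps."""
    return temp / math.log(((t - 1) // tmax) * tmax + math.e)


def simulated_annealing(f, x0, temp=10.0, tmax=10, maxit=5000, seed=0):
    """Minimize f starting from x0; returns (best point, best value)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    for t in range(1, maxit + 1):
        T = temperature(t, temp, tmax)
        # Gaussian proposal whose scale is proportional to the temperature
        y = [xi + T * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        # Metropolis acceptance: always accept improvements,
        # accept worse points with probability exp(-(fy - fx) / T)
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
    return best, fbest


def sphere(x):
    return sum(v * v for v in x)


best, fbest = simulated_annealing(sphere, [3.0, 3.0])
print(fbest)
```

Because the schedule decays only logarithmically, the starting temperature dominates the proposal scale for the whole run, which is one reason these two parameters are worth tuning.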
Experimental Analysis of Search Heuristics
• Which parameters have the greatest effect?
  – Screening
• Which parameter setting might lead to improved performance?
  – Modelling
  – Optimization
Design of experiments (DOE)
• Choose two levels for each parameter (factor)
  – Both qualitative and quantitative factors
• $2^{k-p}$ fractional factorial design
  – 2: number of levels for each factor
  – $k$: number of parameters
  – Only $2^{k-p}$ experiments
  – Can be generated from a full factorial design on $k-p$ parameters
  – Resolution $= (k-p)+1$ only for the best half fraction ($p = 1$); in general the resolution is the length of the shortest word in the defining relation
• Resolution II: not useful; main effects are confounded with each other
• Resolution III: often used; main effects are unconfounded with each other (but confounded with 2-factor interactions)
• Resolution IV: all main effects are unconfounded with all 2-factor interactions
• Resolution V: all 2-factor interactions are unconfounded with each other
• Here: $2^{9-5}_{III}$ fractional factorial design (9 parameters in 16 runs)
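The construction of a $2^{9-5}$ design from a full factorial on $k-p = 4$ base factors can be sketched as follows. The generators E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD are an assumed standard choice; the slide does not say which generators the paper used. The function `resolution` checks the result by finding the shortest word in the defining relation.

```python
from itertools import combinations, product

BASE = "ABCD"  # full 2^4 factorial on k - p = 4 base factors
# Assumed generators for the 5 added factors (one standard choice)
GENERATORS = {"E": "ABC", "F": "BCD", "G": "ACD", "H": "ABD", "J": "ABCD"}


def fractional_factorial():
    """Return the 16 runs of the 2^(9-5) design as dicts of -1/+1 levels."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for factor, word in GENERATORS.items():
            level = 1
            for base_factor in word:
                level *= run[base_factor]
            run[factor] = level  # e.g. E = A*B*C
        runs.append(run)
    return runs


def resolution():
    """Length of the shortest word in the defining relation."""
    # Each generator yields a defining word, e.g. E = ABC gives I = ABCE
    words = [frozenset(word + factor) for factor, word in GENERATORS.items()]
    shortest = None
    for r in range(1, len(words) + 1):
        for subset in combinations(words, r):
            product_word = frozenset()
            for w in subset:
                product_word ^= w  # letters appearing twice cancel (X*X = I)
            if shortest is None or len(product_word) < shortest:
                shortest = len(product_word)
    return shortest


design = fractional_factorial()
print(len(design), "runs; resolution", resolution())  # 16 runs; resolution 3
```

Each column is balanced and no two main-effect columns coincide, but some main effects are aliased with two-factor interactions (for example D with EJ, since E·J = ABC·ABCD = D), which is what resolution III means.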
Regression analysis
• Using the stepAIC function from R's MASS package
  – Akaike's information criterion (AIC) to penalize many parameters in the model
  – Line search to improve the algorithm's performance (?)
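On a two-level orthogonal design the idea behind AIC-based model selection can be shown without any R machinery: each coefficient is estimated independently, so dropping term j raises the residual sum of squares by $n b_j^2$. The sketch below is a simplified, backward-only stand-in for `stepAIC`, using $\mathrm{AIC} = n \log(\mathrm{RSS}/n) + 2p$ (the Gaussian-likelihood AIC up to an additive constant). The full $2^4$ design and its response are hypothetical.

```python
import math
import random
from itertools import product

FACTORS = ["A", "B", "C", "D"]


def fit_orthogonal(runs, y):
    """OLS for orthogonal +/-1 columns: b_j = sum(x_ij * y_i) / n."""
    n = len(y)
    coef = {f: sum(run[f] * yi for run, yi in zip(runs, y)) / n for f in FACTORS}
    intercept = sum(y) / n
    rss = sum(
        (yi - intercept - sum(coef[f] * run[f] for f in FACTORS)) ** 2
        for run, yi in zip(runs, y)
    )
    return intercept, coef, rss


def aic(n, rss, p):
    return n * math.log(rss / n) + 2 * p


def backward_aic(runs, y):
    """Drop main effects one at a time while doing so lowers the AIC."""
    n = len(y)
    _, coef, rss = fit_orthogonal(runs, y)
    terms = set(FACTORS)
    current = aic(n, rss, len(terms) + 1)  # +1 for the intercept
    while terms:
        # Dropping term t raises RSS by n * b_t^2 (orthogonality)
        candidates = {t: aic(n, rss + n * coef[t] ** 2, len(terms)) for t in terms}
        drop, new_aic = min(candidates.items(), key=lambda kv: kv[1])
        if new_aic >= current:
            break
        terms.remove(drop)
        rss += n * coef[drop] ** 2
        current = new_aic
    return sorted(terms)


rng = random.Random(1)
runs = [dict(zip(FACTORS, lv)) for lv in product((-1, 1), repeat=4)]
# Hypothetical response: only A and C matter
y = [5 + 3 * r["A"] - 2 * r["C"] + rng.gauss(0, 0.3) for r in runs]
selected = backward_aic(runs, y)
print(selected)
```

A pure-noise factor can occasionally survive the AIC penalty, so the selected set may contain more than the two active factors.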
Tree based regression
• Used for screening
• Based on the fractional factorial design
• Forward growing
  – Splitting criterion: minimal variance within the two children
  – Backward pruning: snipping away branches to minimize the penalized (cost-complexity) cost
• Using the rpart implementation from R
  – 10-fold cross validation
  – “1-SE” rule: minimum cross-validated error + 1 standard error as a pessimistic estimate; choose the smallest tree below it
  – Threshold complexity parameter: chosen visually based on the 1-SE rule
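The splitting criterion and the 1-SE rule can each be sketched in a few lines. `best_split` searches one numeric parameter for the threshold that minimizes the summed within-child squared deviation (child variance weighted by child size). `one_se_rule` takes a hypothetical cross-validation table of (number of splits, CV error, CV standard error) rows, similar in spirit to rpart's complexity table, and returns the smallest tree within one standard error of the minimum. Both are illustrations, not rpart's implementation, and the data are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) minimizing SSE(left) + SSE(right)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def one_se_rule(cv_table):
    """cv_table rows: (n_splits, cv_error, cv_std). Returns n_splits of the
    smallest tree whose CV error <= min CV error + its standard error."""
    min_row = min(cv_table, key=lambda row: row[1])
    limit = min_row[1] + min_row[2]
    return min(row[0] for row in cv_table if row[1] <= limit)


# Hypothetical data: performance drops once the parameter exceeds 5
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [10.0, 10.5, 9.8, 10.2, 3.1, 2.9, 3.3, 3.0]
print(best_split(x, y)[0])  # 5.0

# Hypothetical CV results for trees with 0..3 splits
table = [(0, 1.00, 0.10), (1, 0.45, 0.08), (2, 0.40, 0.07), (3, 0.42, 0.07)]
print(one_se_rule(table))  # 1
```

The 1-SE rule prefers the 1-split tree here even though the 2-split tree has the lowest CV error, because the difference is within the estimation noise.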
Experimental results
• 5000 fitness evaluations as termination criterion
• Initialization already finds good parameters, so only small improvements are possible
• Actual results not too important, but methods!
• Questions
  – Is the strategy useful?
  – Improve parameters
  – Which analysis strategy works?
• Regression tree: two splits; regression analysis: only the first split is significant
• Tuned algorithm found a solution with quality y = 32.252
  – Which parameter settings?
  – What does 32.252 mean?
  – How about multiple runs?
Is the strategy useful?
[Figure: regression tree analysis]
New Gupta selection vs. classical (μ + λ) selection
• Tune old and new variants
• Report new results and runtime for tuning
  – Except that they do not report the runtime for tuning
Comparison of approaches on Simulated Annealing
• Only two (continuous) parameters
• Classical regression “fails”
  – No significant effects
• Regression tree
  – Best around (10, 10)
  – Based on a full factorial design with 2 levels each, this is pretty shaky
Comparison of approaches
E.g. regression trees for screening, then DACE if only a few continuous parameters remain (why the restriction to few?)
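DACE builds a Kriging (Gaussian process) surrogate that interpolates the observed responses and can then be searched for promising settings. A bare-bones sketch, assuming a constant mean estimated by generalized least squares, a Gaussian correlation function with a fixed, hand-picked θ (DACE implementations usually estimate θ by maximum likelihood), no noise term, and hypothetical one-dimensional data:

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def corr(u, v, theta):
    """Gaussian correlation exp(-theta * sum_k (u_k - v_k)^2)."""
    return math.exp(-theta * sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_kriging(xs, ys, theta=10.0):
    """Return a predictor mu + r(x)^T R^{-1} (y - mu * 1)."""
    n = len(xs)
    R = [[corr(xi, xj, theta) for xj in xs] for xi in xs]
    r_inv_ones = solve(R, [1.0] * n)
    r_inv_y = solve(R, ys)
    # Generalized least-squares estimate of the constant mean
    mu = sum(r_inv_y) / sum(r_inv_ones)
    weights = solve(R, [y - mu for y in ys])

    def predict(x):
        r = [corr(x, xi, theta) for xi in xs]
        return mu + sum(ri * wi for ri, wi in zip(r, weights))

    return predict


xs = [[0.0], [0.25], [0.5], [0.75], [1.0]]
ys = [math.sin(2 * math.pi * x[0]) for x in xs]  # hypothetical response
model = fit_kriging(xs, ys)
print(round(model([0.25]), 6))  # 1.0, the observed value is reproduced
print(model([0.4]))  # interpolated prediction between design points
```

The correlation matrix has one row and column per design point, and in practice one θ per parameter has to be estimated, so the fitting effort grows with both the number of runs and the number of parameters.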