Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
Monte Carlo Tree Search for Algorithm Configuration: MOSAIC
Herilalaina Rakotoarison and Michele SebagTAU
CNRS − INRIA − LRI − Universite Paris-Sud
NeurIPS MetaLearning Wshop − Dec. 8, 2018
1 / 14
Monte Carlo Tree Search for Algorithm Configuration: MOSAIC
Herilalaina Rakotoarison and Michele SebagTackling the Underspecified
CNRS − INRIA − LRI − Universite Paris-Sud
NeurIPS MetaLearning Wshop − Dec. 8, 2018
1 / 14
AutoML: Algorithm Selection and Configuration
A mixed optimization problem
Find λ∗ ∈ arg minλ∈ΛL(λ,P)
with λ a pipeline and L the predictive loss on dataset P
Modes
I offline hyper-parameter setting
I online hyper-parameter setting
Approaches
I Bayesian optimization: SMAC, Auto-SkLearn, AutoWeka, BHOBHutter et al., 11; Feurer et al. 15; Kotthoff et al. 17; Falkner et al. 18
I Evolutionary Computation Olson et al. 16; Choromanski et al. 18
I Bilevel optimization Franceschi et al. 17, 18
I Reinforcement learning Andrychowicz 16; Drori et al. 18
2 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search Tree
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
New Node
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
New Node
PhaseRandom
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
New Node
PhaseRandom
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
New Node
PhaseRandom
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search
Kocsis & Szepesvari 06, Gelly & Silver 07
Game playing when no good evaluation function and huge search space.
I Upper Confidence Tree (UCT)I Gradually grow the search treeI Building Blocks
I Select next action (bandit-based phase)Auer et al. 02
I Add a node (leaf of the search tree)I Select next action bis (random phase)I Compute instant rewardI Update information in visited nodes
I Returned solutionI Path visited most often
Explored Tree
Search TreePhase
Bandit−Based
New Node
PhaseRandom
Within learningFeature selection Gaudel, Sebag, 10
Active learning Rolet, Teytaud, Sebag, 09
3 / 14
Monte Carlo Tree Search for AutoML
1. Select next action (alg/hyperparameter)
select arg maxi
{µi + c
√logN
ni
}with µi average reward, ni number visits,N =
∑i ni
2. Add a node: new alg orhyper-parameter;
3. Random phase: complete pipeline withdefault/random choices.
4. Compute reward v : predictive accuracyof pipeline
5. Use v to update µi , increment ni in allvisited nodes
4 / 14
Mosaic: MCTS for AutoML
Overview
I Search space: { Preprocessing algs } × { Algorithms }I Fixed sequence of choices:
1. Preprocessing alg PCA, random proj,...2. hyper-parameters of pre-processing alg dimension3. Algorithm SVM, RF, ...4. hyper-parameters of Alg. C , ε,...
Key features
I CPU management(every ∆t, kills unpromising pipelines; increases evaluation resources forothers)
I Sample continuous hyperparameters: Progressive wideningGelly Silver 07
Increase number of sampled values like⌊√
N⌋
5 / 14
A MOSAIC Tree
6 / 14
Search space
Methods Parameters
Preprocessing PCA n componentsSelectKBest k, score func
Gaussian Random Projection n components, epsNo preprocessing -
Algorithm Logistic regression C, penalty, solver
SGD Classifierlearning rate, penalty,
alpha, l1 ratio, lossKNN classifier K, metric, weights
XGBoost classifierlearning rate, max depth,
gamma, subsample,regularization
LDA n components,learning decay
Random forestcriterion, max features,max depth, bootstrap,
min sample split
7 / 14
Experiments and Results
AutoML Challenge PAKDD 2018
I Binary classification (Final phase)
I 10 or 20 minutes time budget for each dataset
I Metric: balanced accuracy
Set 1 Set 2 Set 3 Set 4 Set 5aad freiburg 0.5533 0.2839 0.3932 0.2635 0.6766
mosaic 0.5382 0.3161 0.3376 0.3182 0.6317narnars0 0.5418 0.2894 0.3665 0.2005 0.6922W|wang| 0.5655 0.4851 0.2829 -0.0886 0.6840
Final test phase: mosaic ranked second w.r.t. average rank.
8 / 14
Experiment 2: extended pre-processing search space
Methods Parameters
PCA n components, whiten, svd solver, tol, iterated power
KernelPCAn components, kernel, gamma, degree, coef0, alpha,
eigen solver, tol, max iterFastICA n components, algorithm, max iter, tol, whiten, funIdentity -
IncrementalPCA n components, whiten, batch sizeSelectKBest score func, k
SelectPercentile score func, percentileLinearSVC Pre-processing C, class weight, max iter
ExtraTreesClassifierPre-processing
n estimators, criterion, max depth,min samples split, min samples leaf,
min weight fraction leaf, max features,max leaf nodes, class weight
FeatureAgglomeration n clusters, affinity, linkagePolynomialFeatures degree
RBFSampler gamma, n components
RandomTreesEmbeddingn components, max depth, min samples split,
min samples leaf, min weight fraction leaf, max leaf nodes,min impurity decrease
9 / 14
Experiment 2: extended algorithm search space
Algorithms ParametersLinearDiscriminantAnalysis solver, shrinkage
QuadraticDiscriminantAnalysis reg paramDummyClassifier -
AdaBoostClassifierbase estimator, n estimators,
learning rate, algorithm
ExtraTreesClassifier
n estimators, criterion, max depth,min samples split, min samples leaf,
min weight fraction leaf, max features,max leaf nodes, class weight
RandomForestClassifier
n estimators, criterion, min samples split,min samples leaf, min weight fraction leaf,
max features, max leaf nodes,class weight, bootstrap
GradientBoostingClassifier
loss, learning rate, n estimators,max depth, criterion, min samples split,
min samples leaf, min weight fraction leaf,subsample, max features, max leaf nodes
SGD Classifierlearning rate, penalty,
alpha, l1 ratio, loss, epsilon,eta0, power t, class weight, max iter
10 / 14
Experiment 2: extended algorithm search space, foll.
Algorithms ParametersPerceptron penalty, alpha, max iter, tol, shuffle, eta0
RidgeClassifier alpha, max iter, class weight, solverPassiveAggressiveClassifier C, max iter, tol, loss, class weight
KNeighborsClassifier n neighbors, weights, algorithm, leaf size, p, metric
MLPClassifier
hidden layer sizes, activation, solver,alpha, batch size, learning rate,
learning rate init, power t, max iter,shuffle, warm start, momentum,
nesterovs momentum, early stopping,validation fraction, beta 1, beta 2, epsilon
SVCC, max iter, tol, loss, class weight,
kernel, degree, gamma, coef0
DecisionTreeClassifier
criterion, splitter, max depth, min samples split,min samples leaf, min weight fraction leaf,
max features, max leaf nodes,min impurity decrease, class weight
ExtraTreeClassifier
criterion, splitter, max depth, min samples split,min samples leaf, min weight fraction leaf,
max features, max leaf nodes,min impurity decrease, class weight
11 / 14
Experiment 2
On 133 datasets from the OpenML repository Vanschoren et al. 14
1 hour per (dataset, run); 10 runs per dataset.
Average rank (lower is better) of MOSAIC and Vanilla Auto-Sklearn across 102datasets (Datasets on which the performance of both methods differsstatistically according to Mann-Whitney rank test with p = 0.05).
12 / 14
Discussion
Monte Carlo Tree Search for Algorithm Configuration
I Proof of concept
Limitations
I Order of hyper-parameters
I Time allocation
13 / 14
Perspectives
Short term
I Refine initialization
I Extend to constrained satisfaction
Medium term
I Learn value of parameters across datasets
I Improving the sampling of continuous hyperparameter values
Long term
I Learning Meta-Features: a (the) key Meta-Learning task.
14 / 14