Dynamic Restarts: Optimal Randomized Restart Policies with Observation
Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes and Bart Selman
Outline
- Background
  - Heavy-tailed runtime distributions of backtracking search
  - Restart policies
- Optimal strategies to improve expected time to solution
  - Observation of solver behavior during particular runs
  - Predictive model of solver performance
- Empirical results
Backtracking Search
- Backtracking search algorithms often exhibit remarkable variability in performance among:
  - slightly different problem instances
  - slightly different heuristics
  - different runs of randomized heuristics
- Problematic for practical applications: verification, scheduling, planning
Heavy-tailed Runtime Distributions
- Observation (Gomes 1997): distributions of runtimes of backtrack solvers often have heavy tails
  - infinite mean and variance
  - probability of long runs decays by a power law (Pareto-Levy), rather than exponentially (Normal)
[Figure: runtime distribution with substantial mass at both very short and very long runs]
Formal Models of Heavy-tailed Behavior
- Imbalanced tree search models (Chen 2001)
  - Exponentially growing subtrees occur with exponentially decreasing probabilities
- Heavy-tailed runtime distributions can arise in backtrack search for imbalanced models with appropriate parameters p and b (simulated in the sketch below):
  - p is the probability of the branching heuristic making an error
  - b is the branching factor
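A minimal simulation of this model (not from the slides; the parameters p and b follow the bullets above): each level of the search errs with probability p, and an error at depth i costs a subtree of roughly b^i nodes, so P(T = b^i) = p^i(1-p), which has infinite mean whenever p·b >= 1.

```python
import random

def imbalanced_tree_runtime(p=0.5, b=2, max_depth=64):
    """Sample a runtime from the imbalanced-tree model: at each level the
    branching heuristic errs with probability p, and each error multiplies
    the size of the subtree that must be searched by the branch factor b,
    so P(T = b^i) = p^i * (1 - p)."""
    cost = 1
    depth = 0
    while random.random() < p and depth < max_depth:
        cost *= b
        depth += 1
    return cost

# When p * b >= 1 the mean is infinite; the sampled tail is dramatic:
samples = sorted(imbalanced_tree_runtime() for _ in range(10_000))
print("median:", samples[5_000], " 99.9th percentile:", samples[9_990])
```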
Randomized Restarts
- Solution: randomize the systematic solver
  - Add noise to the heuristic branching (variable choice) function
  - Cutoff and restart search after some number of steps (see the sketch below)
- Provably eliminates heavy tails
- Effective whenever search stagnates, even if the RTD is not formally heavy-tailed!
- Used by all state-of-the-art SAT engines: Chaff, GRASP, BerkMin
  - Superscalar processor verification
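A hedged sketch of the restart loop itself; `randomized_solve` is a stand-in for any randomized backtracking solver that returns a solution within `max_steps` or gives up with None.

```python
def solve_with_restarts(instance, randomized_solve, cutoff):
    """Fixed-cutoff restart loop: rerun the randomized solver, aborting
    each attempt after `cutoff` steps, until some attempt succeeds. The
    number of attempts is geometrically distributed, so the total runtime
    has finite mean and variance even when a single run's RTD is heavy-tailed."""
    while True:
        result = randomized_solve(instance, max_steps=cutoff)
        if result is not None:
            return result   # solved within the cutoff; otherwise restart
```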
Complete Knowledge of RTD
[Figure: runtime distribution P(t) over t for RTD D, with the fixed cutoff T* marked]
Luby (1993): The optimal policy uses a fixed cutoff, restarting every t steps:
$$T^* = \arg\min_t E[R_t]$$
where $E[R_t]$ is the expected time to solution when restarting every t steps.
Complete Knowledge of RTD
[Figure: runtime distribution P(t) over t for RTD D, with the fixed cutoff T* marked]
Luby (1993): The optimal policy uses a fixed cutoff:
$$T^* = \arg\min_t E(T_t), \qquad E(T_t) = \frac{t \cdot P(T > t) + E(T \mid T \le t)\, P(T \le t)}{P(T \le t)}$$
where T is the length of a single complete run (without cutoff).
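Given samples from the RTD, T* can be estimated directly from the formula above. A sketch (assumption: candidate cutoffs are restricted to the observed run lengths; this is illustrative, not the authors' code):

```python
def optimal_fixed_cutoff(run_lengths):
    """Estimate Luby's optimal fixed cutoff T* = argmin_t E(T_t) from a
    sample of complete-run lengths (the empirical RTD), using the formula
    above."""
    runs = sorted(run_lengths)
    n = len(runs)
    best_t, best_cost = None, float("inf")
    finished_sum = 0.0                      # sum of run lengths with T <= t
    for k, t in enumerate(runs, start=1):
        finished_sum += t
        p = k / n                           # empirical P(T <= t)
        # E(T_t) = [ t * P(T > t) + E(T | T <= t) * P(T <= t) ] / P(T <= t)
        cost = (t * (1 - p) + finished_sum / n) / p
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```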
No Knowledge of RTD
Luby (1993): The universal sequence of cutoffs
1, 1, 2, 1, 1, 2, 4, ...
is within a factor of O(log T*) of the optimal policy for the unknown distribution.
In practice: 1-2 orders of magnitude slower.
Open cases:
- Partial knowledge of RTD (CP 2002)
- Additional knowledge beyond RTD
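The universal sequence has a simple recursive structure; a standard generator for its i-th term (a well-known construction, sketched here in Python):

```python
def luby(i):
    """i-th term (1-indexed) of Luby's universal restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
    k = 1
    while (1 << k) - 1 < i:        # find k with 2^(k-1) <= i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:          # end of a block: emit 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)   # otherwise recurse into the block

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```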
Example: Runtime Observations
[Figure: combined RTD D decomposed into component RTDs D1 and D2, with cutoffs T1, T2, T*]
Idea: use observations of early progress of a run to induce finer-grained RTD's
Example: Runtime Observations
[Figure: component RTDs D1 and D2 under a single static cutoff T* for the combined RTD D]
What is the optimal policy, given the original & component RTD's, and a classification of each run?
Lazy: use the static optimal cutoff for the combined RTD
Example: Runtime Observations
[Figure: component RTDs D1 and D2 with separate static cutoffs T1* and T2*]
What is the optimal policy, given the original & component RTD's, and a classification of each run?
Naïve: use a separate static optimal cutoff for each RTD
Results
- Method for inducing component distributions using Bayesian learning on traces of the solver (resampling & runtime observations)
- Optimal policy where an observation assigns each run to a component distribution
- Conditions under which the optimal policy prunes one (or more) distributions
- Empirical demonstration of speedup
I. Learning to Predict Solver Performance
Formulation of Learning Problem
- Consider a burst of evidence over an observation horizon
- Learn a runtime predictive model using supervised learning
[Figure: runs observed over the observation horizon, labeled Short or Long relative to the median runtime]
(Horvitz et al., UAI 2001)
Runtime Features
- Solver instrumented to record at each choice (branch) point:
  - SAT & CSP generic features: number of free variables, depth of tree, amount of unit propagation, number of backtracks, ...
  - CSP domain-specific features (QCP): degree of balance of uncolored squares, ...
- Gather statistics over 10 choice points (see the sketch below):
  - initial / final / average values
  - 1st and 2nd derivatives
- SAT: 127 variables; CSP: 135 variables
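A sketch of the summary-statistics step described above, assuming each run's trace is a matrix of raw feature values with one row per choice point (the exact feature schema here is illustrative, not the authors'):

```python
import numpy as np

def summarize_trace(window):
    """Summarize a window of per-choice-point feature vectors (e.g. the
    first 10 choice points) into the statistics the slide describes:
    initial / final / average values plus 1st and 2nd derivatives.
    `window` is a (choice_points x raw_features) array."""
    w = np.asarray(window, dtype=float)
    d1 = np.diff(w, axis=0)            # 1st derivative between choice points
    d2 = np.diff(w, n=2, axis=0)       # 2nd derivative
    return np.concatenate([
        w[0], w[-1], w.mean(axis=0),   # initial / final / average
        d1.mean(axis=0), d2.mean(axis=0),
    ])
```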
Learning a Predictive Model
- Training data: samples from the original RTD, labeled as (summary features, length of run)
- Learn a decision tree that predicts whether the current run will complete in less than the median runtime
- 65% - 90% accuracy
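A minimal illustration of this learning step using scikit-learn; the synthetic data here merely stands in for the real (summary features, run length) training pairs:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: one summary-feature vector per run plus the
# run's length (in practice, the output of the instrumentation above).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
run_lengths = np.exp(rng.normal(size=200) + X[:, 0])   # feature 0 carries signal

# Label each run: did it complete in less than the median runtime?
y = (run_lengths < np.median(run_lengths)).astype(int)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```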
Generating Distributions by Resampling the Training Data
Reasons:
- The predictive models are imperfect
- Analyses that include a layer of error analysis for the imperfect model are cumbersome
Resampling the training data:
- Use the inferred decision trees to define different classes
- Relabel the training data according to these classes
Creating Labels
- The decision tree reduces all the observed features to a single evidential feature F
- F can be:
  - Binary valued: indicates the prediction (shorter than median runtime?)
  - Multi-valued: indicates the particular leaf of the decision tree that is reached when the trace of a partial run is classified
Result
The decision tree can be used to precisely classify new runs as random samples from the induced RTD's.
[Figure: combined RTD D split at the observation point into component RTDs according to the observed value of F, with the median marked]
II. Creating Optimal Control Policies
Control Policies
Problem Statement:
- A process generates runs randomly from a known RTD
- After a run has completed K steps, we may observe features of the run
- We may stop a run at any point
Goal: minimize expected time to solution
Note: using induced component RTD's implies that runs are statistically independent, so the optimal policy is stationary
Optimal Policies
The optimal restart policy using a binary feature F is of one of two forms (see the sketch below):
(1) Set cutoff to a fixed T, or
(2) Wait for T_obs steps, then observe F; if F holds, use cutoff T_1, else use cutoff T_2,
for appropriate constants T_1, T_2.
Straightforward generalization to multi-valued features.
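A sketch of the case-(2) policy as a control loop. The `run` object, with `advance` (run up to a step budget, returning True on success), `solution`, and `kill`, is a hypothetical interface, not the authors' API; cutoffs are assumed to be at least t_obs:

```python
def dynamic_restart_solve(instance, start_run, observe_f, t_obs, t1, t2):
    """Control loop for the case-(2) policy: observe the binary feature F
    after t_obs steps, then commit to cutoff t1 (F holds) or t2 (F fails),
    restarting until a solution is found."""
    while True:
        run = start_run(instance)         # fresh randomized run
        if run.advance(t_obs):            # solved during the observation phase
            return run.solution
        cutoff = t1 if observe_f(run) else t2
        if run.advance(cutoff - t_obs):   # continue up to the chosen cutoff
            return run.solution
        run.kill()                        # cutoff reached without success: restart
```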
Case (2): Determining Optimal Cutoffs

$$(T_1^{**}, T_2^{**}, \ldots) = \arg\min_{(T_1, T_2, \ldots)} \frac{\sum_i d_i\, E[\min(T, T_i)]}{\sum_i d_i\, q_i(T_i)}$$

where d_i is the probability of observing the i-th value of F, and q_i(t) is the probability that a run succeeds in t or fewer steps.
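These cutoffs can be estimated empirically by minimizing the ratio above over a grid of candidates; a brute-force sketch assuming each component RTD is represented by sampled run lengths (numpy arrays) and that the observation classifies runs exactly:

```python
import itertools
import numpy as np

def expected_policy_runtime(cutoffs, d, samples):
    """Expected time to solution for per-component cutoffs T_i, following
    the ratio above: (sum_i d_i E[min(T, T_i)]) / (sum_i d_i q_i(T_i)).
    `samples[i]` is a numpy array of run lengths drawn from component RTD i."""
    cost = sum(d_i * np.minimum(s, t).mean()
               for d_i, s, t in zip(d, samples, cutoffs))
    p_success = sum(d_i * (s <= t).mean()
                    for d_i, s, t in zip(d, samples, cutoffs))
    return cost / p_success if p_success > 0 else float("inf")

def optimal_cutoffs(d, samples, grid):
    """Brute-force minimization over a grid of candidate cutoffs."""
    return min(itertools.product(grid, repeat=len(samples)),
               key=lambda ts: expected_policy_runtime(ts, d, samples))
```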
Optimal Pruning
Runs from component D2 should be pruned (terminated) immediately after the observation when, for all t > 0:

$$E(T \mid T_1, T_{obs}) \le E(T \mid T_1, T_{obs} + t)$$

Equivalently:

$$E(T \mid T_1, T_{obs}) \le \frac{t - \sum_{T_{obs} < t' \le T_{obs}+t} q_2(t')}{q_2(T_{obs} + t) - q_2(T_{obs})}$$

Note: the condition does not depend on the priors d_i.
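A numerical check of this condition, following the discrete form above; `q2` maps a step count to the success CDF of D2 runs, and `e_star` is the expected time to solution of the policy that prunes D2. A sketch under those assumptions, not the authors' code:

```python
def should_prune_d2(e_star, q2, t_obs, horizon=10_000):
    """Terminate D2 runs right at the observation point if no extension t
    of the cutoff would lower the expected time to solution: for every t,
    the marginal expected cost of the extra steps divided by the marginal
    success probability must exceed e_star."""
    base = q2(t_obs)
    dc = 0.0                       # marginal expected cost of the extra steps
    for t in range(1, horizon + 1):
        dc += 1 - q2(t_obs + t)    # running value of t - sum of q2 terms
        dp = q2(t_obs + t) - base  # marginal success probability
        if dp > 0 and dc / dp < e_star:
            return False           # extending the cutoff by t steps would help
    return True
```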
III. Empirical Evaluation
Backtracking Problem Solvers
- Randomized SAT solver: Satz-Rand, a randomized version of Satz (Li 1997)
  - DPLL with 1-step lookahead
  - Randomization with a noise parameter for variable choice
- Randomized CSP solver
  - Specialized CSP solver for QCP, built on the ILOG constraint programming library
  - Variable choice: a variant of the Brelaz heuristic
Domains
- Quasigroup With Holes
- Graph Coloring
- Logistics Planning (SATPLAN)
Dynamic Restart Policies
- Binary dynamic policies: runs are classified as having either short or long runtime distributions
- N-ary dynamic policies: each leaf in the decision tree is considered as defining a distinct distribution
Policies for Comparison
- Luby optimal fixed cutoff (for the original combined distribution)
- Luby universal policy
- Binary naïve policy: select distinct, separately optimal fixed cutoffs for the long and for the short distributions
Illustration of Cutoffs
[Figure: RTDs D, D1, D2 with static optimal cutoffs T*, T1*, T2* and dynamic cutoffs T1**, T2** applied after the observation point]
Comparative Results

Expected Runtime (Choice Points):

Policy               QCP (CSP)   QCP (Satz)   Graph Coloring (Satz)   Planning (Satz)
Dynamic n-ary            3,295        8,962                   9,499             5,099
Dynamic binary           5,220       11,959                  10,157             5,366
Fixed optimal            6,534       12,551                  14,669             6,402
Binary naïve            17,617       12,055                  14,669             6,962
Universal               12,804       29,320                  38,623            17,359
Median (no cutoff)      69,046       48,244                  39,598            25,255

Improvement of the dynamic policies over the Luby fixed optimal cutoff policy is 40-65%.
Cutoffs: Graph Coloring (Satz)
- Dynamic n-ary: 10, 430, 10, 345, 10, 10
- Dynamic binary: 455, 10
- Binary naive: 342, 500
- Fixed optimal: 363
Discussion
- Most optimal policies turned out to prune runs
- Policy construction is independent of run classification: other learning techniques may be used
- Does not require highly accurate prediction!
- Widely applicable
Limitations
Analysis does not apply in cases where runs are statistically dependent.
Example:
- We begin with 2 or more RTD's, e.g. of SAT and UNSAT formulas
- The environment flips a coin to choose an RTD, and then always samples from that RTD
- We do not get to see the coin flip!
- Now each unsuccessful run gives us information about that coin flip!
The Dependent Case
- The dependent case is much harder to solve
- Ruan et al., CP-2002: "Restart Policies with Dependence among Runs: A Dynamic Programming Approach"
Future Work
- Using RTD's of ensembles to reason about RTD's of individual problem instances
- Learning RTD's on the fly (reinforcement learning)
Big Picture
[Diagram: problem instances and their static features feed the solver; runtime and dynamic features feed a learning/analysis component that builds a predictive model, which in turn drives the restart control policy applied to the solver]