Dynamic Restarts: Optimal Randomized Restart Policies with Observation
Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes and Bart Selman
Outline
- Background
  - Heavy-tailed runtime distributions of backtracking search
  - Restart policies
- Optimal strategies to improve expected time to solution
  - Observation of solver behavior during particular runs
  - Predictive model of solver performance
- Empirical results
Backtracking Search
- Backtracking search algorithms often exhibit remarkable variability in performance among:
  - slightly different problem instances
  - slightly different heuristics
  - different runs of randomized heuristics
- Problematic for practical applications: verification, scheduling, planning
Heavy-tailed Runtime Distributions
- Observation (Gomes 1997): distributions of runtimes of backtrack solvers often have heavy tails
  - infinite mean and variance
  - probability of long runs decays by a power law (Pareto-Levy), rather than exponentially (Normal)
[Figure: runtime distribution with substantial mass at both very short and very long runs]
Formal Models of Heavy-tailed Behavior
- Imbalanced tree search models (Chen 2001)
  - Exponentially growing subtrees occur with exponentially decreasing probabilities
- Heavy-tailed runtime distributions can arise in backtrack search for imbalanced models with appropriate parameters p and b (simulated in the sketch below):
  - p is the probability of the branching heuristic making an error
  - b is the branching factor
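A minimal simulation of this model (not from the slides; the parameters p and b follow the bullets above): each level of the search errs with probability p, and an error at depth i costs a subtree of roughly b^i nodes, so P(T = b^i) = p^i(1-p), which has infinite mean whenever p·b >= 1.

```python
import random

def imbalanced_tree_runtime(p=0.5, b=2, max_depth=64):
    """Sample a runtime from the imbalanced-tree model: at each level the
    branching heuristic errs with probability p, and each error multiplies
    the size of the subtree that must be searched by the branch factor b,
    so P(T = b^i) = p^i * (1 - p)."""
    cost = 1
    depth = 0
    while random.random() < p and depth < max_depth:
        cost *= b
        depth += 1
    return cost

# When p * b >= 1 the mean is infinite; the sampled tail is dramatic:
samples = sorted(imbalanced_tree_runtime() for _ in range(10_000))
print("median:", samples[5_000], " 99.9th percentile:", samples[9_990])
```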
Randomized Restarts
- Solution: randomize the systematic solver
  - Add noise to the heuristic branching (variable choice) function
  - Cutoff and restart search after some number of steps (see the sketch below)
- Provably eliminates heavy tails
- Effective whenever search stagnates, even if the RTD is not formally heavy-tailed!
- Used by all state-of-the-art SAT engines: Chaff, GRASP, BerkMin
  - Superscalar processor verification
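A hedged sketch of the restart loop itself; `randomized_solve` is a stand-in for any randomized backtracking solver that returns a solution within `max_steps` or gives up with None.

```python
def solve_with_restarts(instance, randomized_solve, cutoff):
    """Fixed-cutoff restart loop: rerun the randomized solver, aborting
    each attempt after `cutoff` steps, until some attempt succeeds. The
    number of attempts is geometrically distributed, so the total runtime
    has finite mean and variance even when a single run's RTD is heavy-tailed."""
    while True:
        result = randomized_solve(instance, max_steps=cutoff)
        if result is not None:
            return result   # solved within the cutoff; otherwise restart
```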
Complete Knowledge of RTD
[Figure: runtime distribution P(t) over t for RTD D, with the fixed cutoff T* marked]
Luby (1993): The optimal policy uses a fixed cutoff, restarting every t steps:
$$T^* = \arg\min_t E[R_t]$$
where $E[R_t]$ is the expected time to solution when restarting every t steps.
Complete Knowledge of RTD
[Figure: runtime distribution P(t) over t for RTD D, with the fixed cutoff T* marked]
Luby (1993): The optimal policy uses a fixed cutoff:
$$T^* = \arg\min_t E(T_t), \qquad E(T_t) = \frac{t \cdot P(T > t) + E(T \mid T \le t)\, P(T \le t)}{P(T \le t)}$$
where T is the length of a single complete run (without cutoff).
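Given samples from the RTD, T* can be estimated directly from the formula above. A sketch (assumption: candidate cutoffs are restricted to the observed run lengths; this is illustrative, not the authors' code):

```python
def optimal_fixed_cutoff(run_lengths):
    """Estimate Luby's optimal fixed cutoff T* = argmin_t E(T_t) from a
    sample of complete-run lengths (the empirical RTD), using the formula
    above."""
    runs = sorted(run_lengths)
    n = len(runs)
    best_t, best_cost = None, float("inf")
    finished_sum = 0.0                      # sum of run lengths with T <= t
    for k, t in enumerate(runs, start=1):
        finished_sum += t
        p = k / n                           # empirical P(T <= t)
        # E(T_t) = [ t * P(T > t) + E(T | T <= t) * P(T <= t) ] / P(T <= t)
        cost = (t * (1 - p) + finished_sum / n) / p
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```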
No Knowledge of RTD
Luby (1993): The universal sequence of cutoffs
1, 1, 2, 1, 1, 2, 4, ...
is within a factor of O(log T*) of the optimal policy for the unknown distribution.
In practice: 1-2 orders of magnitude slower.
Open cases:
- Partial knowledge of RTD (CP 2002)
- Additional knowledge beyond RTD
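The universal sequence has a simple recursive structure; a standard generator for its i-th term (a well-known construction, sketched here in Python):

```python
def luby(i):
    """i-th term (1-indexed) of Luby's universal restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
    k = 1
    while (1 << k) - 1 < i:        # find k with 2^(k-1) <= i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:          # end of a block: emit 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)   # otherwise recurse into the block

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```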
Example: Runtime Observations
[Figure: combined RTD D decomposed into component RTDs D1 and D2, with cutoffs T1, T2, T*]
Idea: use observations of early progress of a run to induce finer-grained RTD's
Example: Runtime Observations
[Figure: component RTDs D1 and D2 under a single static cutoff T* for the combined RTD D]
What is the optimal policy, given the original & component RTD's, and a classification of each run?
Lazy: use the static optimal cutoff for the combined RTD
Example: Runtime Observations
[Figure: component RTDs D1 and D2 with separate static cutoffs T1* and T2*]
What is the optimal policy, given the original & component RTD's, and a classification of each run?
Naïve: use a separate static optimal cutoff for each RTD
Results
- Method for inducing component distributions using Bayesian learning on traces of the solver (resampling & runtime observations)
- Optimal policy where an observation assigns each run to a component distribution
- Conditions under which the optimal policy prunes one (or more) distributions
- Empirical demonstration of speedup
I. Learning to Predict Solver Performance
Formulation of Learning Problem
- Consider a burst of evidence over an observation horizon
- Learn a runtime predictive model using supervised learning
[Figure: runs observed over the observation horizon, labeled Short or Long relative to the median runtime]
(Horvitz et al., UAI 2001)
Runtime Features
- Solver instrumented to record at each choice (branch) point:
  - SAT & CSP generic features: number of free variables, depth of tree, amount of unit propagation, number of backtracks, ...
  - CSP domain-specific features (QCP): degree of balance of uncolored squares, ...
- Gather statistics over 10 choice points (see the sketch below):
  - initial / final / average values
  - 1st and 2nd derivatives
- SAT: 127 variables; CSP: 135 variables
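A sketch of the summary-statistics step described above, assuming each run's trace is a matrix of raw feature values with one row per choice point (the exact feature schema here is illustrative, not the authors'):

```python
import numpy as np

def summarize_trace(window):
    """Summarize a window of per-choice-point feature vectors (e.g. the
    first 10 choice points) into the statistics the slide describes:
    initial / final / average values plus 1st and 2nd derivatives.
    `window` is a (choice_points x raw_features) array."""
    w = np.asarray(window, dtype=float)
    d1 = np.diff(w, axis=0)            # 1st derivative between choice points
    d2 = np.diff(w, n=2, axis=0)       # 2nd derivative
    return np.concatenate([
        w[0], w[-1], w.mean(axis=0),   # initial / final / average
        d1.mean(axis=0), d2.mean(axis=0),
    ])
```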
Learning a Predictive Model
- Training data: samples from the original RTD, labeled as (summary features, length of run)
- Learn a decision tree that predicts whether the current run will complete in less than the median runtime
- 65% - 90% accuracy
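A minimal illustration of this learning step using scikit-learn; the synthetic data here merely stands in for the real (summary features, run length) training pairs:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: one summary-feature vector per run plus the
# run's length (in practice, the output of the instrumentation above).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
run_lengths = np.exp(rng.normal(size=200) + X[:, 0])   # feature 0 carries signal

# Label each run: did it complete in less than the median runtime?
y = (run_lengths < np.median(run_lengths)).astype(int)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```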
Generating Distributions by Resampling the Training Data
Reasons:
- The predictive models are imperfect
- Analyses that include a layer of error analysis for the imperfect model are cumbersome
Resampling the training data:
- Use the inferred decision trees to define different classes
- Relabel the training data according to these classes
Creating Labels
- The decision tree reduces all the observed features to a single evidential feature F
- F can be:
  - Binary valued: indicates the prediction (shorter than median runtime?)
  - Multi-valued: indicates the particular leaf of the decision tree that is reached when the trace of a partial run is classified
Result
The decision tree can be used to precisely classify new runs as random samples from the induced RTD's.
[Figure: combined RTD D split at the observation point into component RTDs according to the observed value of F, with the median marked]
II. Creating Optimal Control Policies
Control Policies
Problem Statement:
- A process generates runs randomly from a known RTD
- After a run has completed K steps, we may observe features of the run
- We may stop a run at any point
Goal: minimize expected time to solution
Note: using induced component RTD's implies that runs are statistically independent, so the optimal policy is stationary
Optimal Policies
The optimal restart policy using a binary feature F is of one of two forms (see the sketch below):
(1) Set cutoff to a fixed T, or
(2) Wait for T_obs steps, then observe F; if F holds, use cutoff T_1, else use cutoff T_2,
for appropriate constants T_1, T_2.
Straightforward generalization to multi-valued features.
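A sketch of the case-(2) policy as a control loop. The `run` object, with `advance` (run up to a step budget, returning True on success), `solution`, and `kill`, is a hypothetical interface, not the authors' API; cutoffs are assumed to be at least t_obs:

```python
def dynamic_restart_solve(instance, start_run, observe_f, t_obs, t1, t2):
    """Control loop for the case-(2) policy: observe the binary feature F
    after t_obs steps, then commit to cutoff t1 (F holds) or t2 (F fails),
    restarting until a solution is found."""
    while True:
        run = start_run(instance)         # fresh randomized run
        if run.advance(t_obs):            # solved during the observation phase
            return run.solution
        cutoff = t1 if observe_f(run) else t2
        if run.advance(cutoff - t_obs):   # continue up to the chosen cutoff
            return run.solution
        run.kill()                        # cutoff reached without success: restart
```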
Case (2): Determining Optimal Cutoffs

$$(T_1^{**}, T_2^{**}, \ldots) = \arg\min_{(T_1, T_2, \ldots)} \frac{\sum_i d_i\, E[\min(T, T_i)]}{\sum_i d_i\, q_i(T_i)}$$

where d_i is the probability of observing the i-th value of F, and q_i(t) is the probability that a run succeeds in t or fewer steps.
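These cutoffs can be estimated empirically by minimizing the ratio above over a grid of candidates; a brute-force sketch assuming each component RTD is represented by sampled run lengths (numpy arrays) and that the observation classifies runs exactly:

```python
import itertools
import numpy as np

def expected_policy_runtime(cutoffs, d, samples):
    """Expected time to solution for per-component cutoffs T_i, following
    the ratio above: (sum_i d_i E[min(T, T_i)]) / (sum_i d_i q_i(T_i)).
    `samples[i]` is a numpy array of run lengths drawn from component RTD i."""
    cost = sum(d_i * np.minimum(s, t).mean()
               for d_i, s, t in zip(d, samples, cutoffs))
    p_success = sum(d_i * (s <= t).mean()
                    for d_i, s, t in zip(d, samples, cutoffs))
    return cost / p_success if p_success > 0 else float("inf")

def optimal_cutoffs(d, samples, grid):
    """Brute-force minimization over a grid of candidate cutoffs."""
    return min(itertools.product(grid, repeat=len(samples)),
               key=lambda ts: expected_policy_runtime(ts, d, samples))
```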
Optimal Pruning
Runs from component D2 should be pruned (terminated) immediately after the observation when, for all t > 0:

$$E(T \mid T_1, T_{obs}) \le E(T \mid T_1, T_{obs} + t)$$

Equivalently:

$$E(T \mid T_1, T_{obs}) \le \frac{t - \sum_{T_{obs} < t' \le T_{obs}+t} q_2(t')}{q_2(T_{obs} + t) - q_2(T_{obs})}$$

Note: the condition does not depend on the priors d_i.
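A numerical check of this condition, following the discrete form above; `q2` maps a step count to the success CDF of D2 runs, and `e_star` is the expected time to solution of the policy that prunes D2. A sketch under those assumptions, not the authors' code:

```python
def should_prune_d2(e_star, q2, t_obs, horizon=10_000):
    """Terminate D2 runs right at the observation point if no extension t
    of the cutoff would lower the expected time to solution: for every t,
    the marginal expected cost of the extra steps divided by the marginal
    success probability must exceed e_star."""
    base = q2(t_obs)
    dc = 0.0                       # marginal expected cost of the extra steps
    for t in range(1, horizon + 1):
        dc += 1 - q2(t_obs + t)    # running value of t - sum of q2 terms
        dp = q2(t_obs + t) - base  # marginal success probability
        if dp > 0 and dc / dp < e_star:
            return False           # extending the cutoff by t steps would help
    return True
```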
III. Empirical Evaluation
Backtracking Problem Solvers
- Randomized SAT solver: Satz-Rand, a randomized version of Satz (Li 1997)
  - DPLL with 1-step lookahead
  - Randomization with a noise parameter for variable choice
- Randomized CSP solver
  - Specialized CSP solver for QCP, built on the ILOG constraint programming library
  - Variable choice: a variant of the Brelaz heuristic
Domains
- Quasigroup With Holes
- Graph Coloring
- Logistics Planning (SATPLAN)
Dynamic Restart Policies
- Binary dynamic policies: runs are classified as having either short or long runtime distributions
- N-ary dynamic policies: each leaf in the decision tree is considered as defining a distinct distribution
Policies for Comparison
- Luby optimal fixed cutoff (for the original combined distribution)
- Luby universal policy
- Binary naïve policy: select distinct, separately optimal fixed cutoffs for the long and for the short distributions
Illustration of Cutoffs
[Figure: RTDs D, D1, D2 with static optimal cutoffs T*, T1*, T2* and dynamic cutoffs T1**, T2** applied after the observation point]
Comparative Results

Expected Runtime (Choice Points):

Policy               QCP (CSP)   QCP (Satz)   Graph Coloring (Satz)   Planning (Satz)
Dynamic n-ary            3,295        8,962                   9,499             5,099
Dynamic binary           5,220       11,959                  10,157             5,366
Fixed optimal            6,534       12,551                  14,669             6,402
Binary naïve            17,617       12,055                  14,669             6,962
Universal               12,804       29,320                  38,623            17,359
Median (no cutoff)      69,046       48,244                  39,598            25,255

Improvement of the dynamic policies over the Luby fixed optimal cutoff policy is 40-65%.
Cutoffs: Graph Coloring (Satz)
- Dynamic n-ary: 10, 430, 10, 345, 10, 10
- Dynamic binary: 455, 10
- Binary naive: 342, 500
- Fixed optimal: 363
Discussion
- Most optimal policies turned out to prune runs
- Policy construction is independent of run classification: other learning techniques may be used
- Does not require highly accurate prediction!
- Widely applicable
Limitations
Analysis does not apply in cases where runs are statistically dependent.
Example:
- We begin with 2 or more RTD's, e.g. of SAT and UNSAT formulas
- The environment flips a coin to choose an RTD, and then always samples from that RTD
- We do not get to see the coin flip!
- Now each unsuccessful run gives us information about that coin flip!
The Dependent Case
- The dependent case is much harder to solve
- Ruan et al., CP-2002: "Restart Policies with Dependence among Runs: A Dynamic Programming Approach"
Future Work
- Using RTD's of ensembles to reason about RTD's of individual problem instances
- Learning RTD's on the fly (reinforcement learning)
Big Picture
[Diagram: problem instances and their static features feed the solver; runtime and dynamic features feed a learning/analysis component that builds a predictive model, which in turn drives the restart control policy applied to the solver]