Dynamic Restarts: Optimal Randomized Restart Policies with Observation
Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes and Bart Selman


Page 1:

Dynamic Restarts: Optimal Randomized Restart Policies with Observation

Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes and Bart Selman

Page 2:

Outline
- Background
  - heavy-tailed run-time distributions of backtracking search
  - restart policies
- Optimal strategies to improve expected time to solution, using:
  - observation of solver behavior during particular runs
  - a predictive model of solver performance
- Empirical results

Page 3:

Backtracking Search

Backtracking search algorithms often exhibit remarkable variability in performance among:
- slightly different problem instances
- slightly different heuristics
- different runs of randomized heuristics

This is problematic for practical applications: verification, scheduling, planning.

Page 4:

Heavy-tailed Runtime Distributions

Observation (Gomes 1997): distributions of runtimes of backtrack solvers often have heavy tails:
- infinite mean and variance
- probability of long runs decays by a power law (Pareto-Levy), rather than exponentially (Normal)

[Figure: runtime distribution with a heavy tail, spanning very short to very long runs]
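To make the contrast concrete, here is a minimal sketch (illustrative parameters, not values from the paper) comparing how fast the survival function P(T > t) decays under a power law versus an exponential:

```python
# Illustrative only: compare the survival function P(T > t) for a Pareto
# (heavy-tailed) versus an exponential runtime distribution.
import math

alpha = 1.5      # Pareto tail index; alpha <= 2 implies infinite variance
rate = 0.01      # exponential decay rate (both parameters are assumptions)

for t in [10, 100, 1000, 10000]:
    pareto_tail = t ** (-alpha)       # power law: P(T > t) ~ t^(-alpha)
    expo_tail = math.exp(-rate * t)   # exponential: P(T > t) = e^(-rate * t)
    print(f"t={t:>6}  Pareto: {pareto_tail:.2e}  exponential: {expo_tail:.2e}")
```

The power-law tail keeps non-negligible mass at very large t, which is exactly why occasional runs take astronomically long.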

Page 5:

Formal Models of Heavy-tailed Behavior

- Imbalanced tree search models (Chen 2001): exponentially growing subtrees occur with exponentially decreasing probabilities
- Heavy-tailed runtime distributions can arise in backtrack search for imbalanced models with appropriate parameters p and b:
  - p is the probability of the branching heuristic making an error
  - b is the branching factor
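A minimal simulation sketch of this kind of model (parameter values are illustrative, not the paper's): each heuristic error, made with probability p, multiplies the remaining search cost by the branching factor b, so when p·b >= 1 the mean run length diverges.

```python
# Sketch of an imbalanced-tree model: a run makes a geometric number of
# heuristic errors (probability p each), and each error multiplies the
# subtree that must be searched by the branching factor b.
import random

def sample_run_length(p=0.5, b=2, max_errors=40):
    errors = 0
    while errors < max_errors and random.random() < p:
        errors += 1
    return b ** errors   # with p * b >= 1 the mean diverges (capped here)

runs = [sample_run_length() for _ in range(100_000)]
print("mean:", sum(runs) / len(runs))   # dominated by a few very long runs
print("median:", sorted(runs)[len(runs) // 2])
```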

Page 6:

Randomized Restarts
Solution: randomize the systematic solver
- Add noise to the heuristic branching (variable choice) function
- Cutoff and restart search after some number of steps
- Provably eliminates heavy tails
- Effective whenever search stagnates, even if the RTD is not formally heavy-tailed!
- Used by all state-of-the-art SAT engines (Chaff, GRASP, BerkMin), e.g. for superscalar processor verification
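A minimal sketch of the scheme (the `solve(seed, cutoff)` interface is a hypothetical stand-in for a randomized solver, not an API from the paper):

```python
# Restart wrapper: run the randomized solver for up to `cutoff` steps; on
# failure, restart with a fresh random seed and try again.
import random

def solve_with_restarts(solve, cutoff):
    restarts = 0
    while True:
        result = solve(seed=random.getrandbits(32), cutoff=cutoff)
        if result is not None:   # solver found a solution within the cutoff
            return result, restarts
        restarts += 1            # cutoff hit: abandon this run and restart
```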

Page 7:

Complete Knowledge of RTD

[Figure: runtime distribution D, plotting P(t) against t]

Page 8:

Complete Knowledge of RTD

[Figure: runtime distribution D with optimal fixed cutoff T*]

Luby (1993): the optimal policy uses a fixed cutoff

T^* = \arg\min_t E(R_t)

where E(R_t) is the expected time to solution restarting every t steps.

Page 9:

Complete Knowledge of RTD

[Figure: runtime distribution D with optimal fixed cutoff T*]

Luby (1993): the optimal policy uses a fixed cutoff

T^* = \arg\min_t E(T_t), \quad where \quad
E(T_t) = t \cdot \frac{1 - P(T \le t)}{P(T \le t)} + E(T \mid T \le t)

and T is the length of a single complete run (without cutoff).
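Given sampled run lengths, the formula above can be evaluated empirically. A minimal sketch (the toy heavy-tailed RTD is an assumption for illustration):

```python
# Estimate the optimal fixed cutoff T* = argmin_t E(T_t) from sampled run
# lengths, with q(t) = P(T <= t) and E(T | T <= t) estimated empirically.
import numpy as np

def optimal_fixed_cutoff(run_lengths, candidates):
    runs = np.asarray(run_lengths, dtype=float)
    best_t, best_cost = None, float("inf")
    for t in candidates:
        q = np.mean(runs <= t)                 # empirical P(T <= t)
        if q == 0:
            continue                           # no run ever finishes by t
        cost = t * (1 - q) / q + runs[runs <= t].mean()   # E(T_t) per the formula
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

toy_rtd = np.random.pareto(1.1, size=50_000) * 50 + 1     # heavy-tailed toy samples
print(optimal_fixed_cutoff(toy_rtd, candidates=range(5, 2000, 5)))
```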

Page 10:

No Knowledge of RTD

Luby (1993): the universal sequence of cutoffs

1, 1, 2, 1, 1, 2, 4, ...

is within a factor of O(\log T^*) of the optimal policy for the unknown distribution. In practice it can be 1-2 orders of magnitude slower.

Open cases:
- Partial knowledge of RTD (CP 2002)
- Additional knowledge beyond RTD
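For reference, a standard recursive generator for the universal sequence above (a textbook formulation of Luby, Sinclair & Zuckerman's sequence, not code from the paper):

```python
def luby(i):
    """i-th element (1-indexed) of 1,1,2,1,1,2,4,1,1,2,1,1,2,4,8,..."""
    k = 1
    while (1 << k) - 1 < i:              # find k with 2^(k-1) <= i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)              # i closes a block: emit 2^(k-1)
    return luby(i - (1 << (k - 1)) + 1)  # otherwise recurse into the prefix

print([luby(i) for i in range(1, 16)])   # [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```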

Page 11:

Example: Runtime Observations

[Figure: combined RTD D and component RTDs D1, D2, with cutoffs T1, T2, and T*]

Idea: use observations of the early progress of a run to induce finer-grained RTDs

Page 12:

Example: Runtime Observations

[Figure: combined RTD D with static optimal cutoff T*, alongside component RTDs D1 and D2]

What is the optimal policy, given the original & component RTDs, and a classification of each run?

Lazy: use the static optimal cutoff for the combined RTD

Page 13:

Example: Runtime Observations

[Figure: component RTDs D1 and D2 with separate optimal cutoffs T1* and T2*]

What is the optimal policy, given the original & component RTDs, and a classification of each run?

Naïve: use a static optimal cutoff for each RTD

Page 14:

Results
- Method for inducing component distributions using Bayesian learning on traces of the solver (resampling & runtime observations)
- Optimal policy where an observation assigns each run to a component distribution
- Conditions under which the optimal policy prunes one (or more) distributions
- Empirical demonstration of speedup

Page 15:

I. Learning to Predict Solver Performance

Page 16:

Formulation of Learning Problem
- Consider a burst of evidence over an observation horizon
- Learn a runtime predictive model using supervised learning

[Figure: timeline of a run showing the observation horizon; runs are labeled Short or Long relative to the median run time]

Horvitz, et al. UAI 2001

Page 17:

Runtime Features
- Solver instrumented to record at each choice (branch) point:
  - SAT & CSP generic features: number of free variables, depth of tree, amount of unit propagation, number of backtracks, …
  - CSP domain-specific features (QCP): degree of balance of uncolored squares, …
- Gather statistics over 10 choice points: initial / final / average values, 1st and 2nd derivatives
- SAT: 127 variables; CSP: 135 variables

Page 18:

Learning a Predictive Model
- Training data: samples from the original RTD, labeled as (summary features, length of run)
- Learn a decision tree that predicts whether the current run will complete in less than the median run time
- 65%-90% accuracy
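A minimal sketch of this learning step, assuming scikit-learn and synthetic stand-ins for the summary features (the real features are the solver-trace statistics described on the previous slide):

```python
# Train a decision tree to predict "will this run finish below the median?"
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                      # stand-in summary features
runtimes = np.exp(X[:, 0] + rng.normal(size=2000))   # toy runtimes tied to feature 0
y = runtimes < np.median(runtimes)                   # label: shorter than median?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", tree.score(X_te, y_te))  # paper reports 65%-90% on real traces
```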

Page 19:

Generating Distributions by Resampling the Training Data
Reasons:
- The predictive models are imperfect
- Analyses that include a layer of error analysis for the imperfect model are cumbersome

Resampling the training data:
- Use the inferred decision trees to define different classes
- Relabel the training data according to these classes
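A minimal sketch of the relabeling step (the trace format and `classify` function are assumptions):

```python
# Regroup observed run lengths by the class the decision tree assigns,
# yielding one empirical component RTD per class.
from collections import defaultdict

def induce_component_rtds(traces, classify):
    """traces: iterable of (summary_features, run_length) pairs;
    classify: maps summary features to a class (e.g., a decision-tree leaf)."""
    rtds = defaultdict(list)
    for features, run_length in traces:
        rtds[classify(features)].append(run_length)
    return rtds   # empirical RTD samples for each induced class
```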

Page 20:

Creating Labels
- The decision tree reduces all the observed features to a single evidential feature F
- F can be:
  - Binary-valued: indicates the prediction "shorter than median runtime?"
  - Multi-valued: indicates the particular leaf of the decision tree reached when the trace of a partial run is classified

Page 21:

Result

The decision tree can be used to precisely classify new runs as random samples from the induced RTDs.

[Figure: combined RTD D split, after the observation point, into induced RTDs for runs where F is observed true or false, relative to the median]

Page 22:

II. Creating Optimal Control Policies

Page 23:

Control Policies
Problem statement:
- A process generates runs randomly from a known RTD
- After the run has completed K steps, we may observe features of the run
- We may stop a run at any point
- Goal: minimize expected time to solution

Note: using induced component RTDs implies that runs are statistically independent, so the optimal policy is stationary.

Page 24:

Optimal Policies

The optimal restart policy using a binary feature F is of one of two forms:
(1) set the cutoff to T, for a fixed T; or
(2) wait for T_{Obs} steps, then observe F; if F holds, use cutoff T_1, else use cutoff T_2, for appropriate constants T_1, T_2.

Straightforward generalization to multi-valued features.
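A minimal sketch of policy form (2) in use (the run object with `advance` and `observe_feature` is a hypothetical interface, not the paper's):

```python
# Binary dynamic restart policy: observe F after T_obs steps, then commit to
# the class-specific cutoff; restart until some run succeeds.
def dynamic_restart(start_run, t_obs, t1, t2):
    while True:
        run = start_run()                    # hypothetical: fresh randomized run
        if run.advance(t_obs):               # hypothetical: True once solved
            return run.solution
        cutoff = t1 if run.observe_feature() else t2   # observe F at T_obs
        if run.advance(cutoff - t_obs):      # continue up to the chosen cutoff
            return run.solution
        # cutoff reached without a solution: fall through and restart
```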

Page 25:

Case (2): Determining Optimal Cutoffs

(T_1^{**}, T_2^{**}, \ldots) = \arg\min_{(T_1, T_2, \ldots)} E[T_{(T_1, T_2, \ldots)}]
= \arg\min_{(T_1, T_2, \ldots)} \frac{\sum_i d_i \left( T_i - \sum_{t < T_i} q_i(t) \right)}{\sum_i d_i \, q_i(T_i)}

where d_i is the probability of observing the i-th value of F, and q_i(t) is the probability that a run succeeds in t or fewer steps.
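A minimal sketch of evaluating this objective by grid search, estimating d_i and q_i from labeled training runs (the data layout is an assumption; note the numerator above is the expected work per run, E[min(T, T_i)]):

```python
# Grid-search the per-class cutoffs (T1**, T2**, ...) minimizing expected
# time to solution: (expected work per run) / (success probability per run).
import itertools
import numpy as np

def optimal_class_cutoffs(runs_by_class, grid):
    total = sum(len(r) for r in runs_by_class.values())
    classes = sorted(runs_by_class)
    best, best_cost = None, float("inf")
    for cutoffs in itertools.product(grid, repeat=len(classes)):
        work = success = 0.0
        for i, T in zip(classes, cutoffs):
            runs = np.asarray(runs_by_class[i], dtype=float)
            d_i = len(runs) / total                     # P(F takes value i)
            work += d_i * np.minimum(runs, T).mean()    # d_i * E[min(T_run, T)]
            success += d_i * np.mean(runs <= T)         # d_i * q_i(T)
        if success > 0 and work / success < best_cost:
            best, best_cost = dict(zip(classes, cutoffs)), work / success
    return best, best_cost
```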

Page 26:

Optimal Pruning

Runs from component D_2 should be pruned (terminated) immediately after observation when, for all t > 0:

E(T_1, T_1^{**}) \le E(T_2, T_{Obs} + t)

where, equivalently, in terms of the component RTD:

E(T_2, T_{Obs} + t) = \frac{t - \sum_{T_{Obs} < t' \le T_{Obs}+t} q_2(t')}{q_2(T_{Obs} + t) - q_2(T_{Obs})}

Note: this condition does not depend on priors on the D_i.

Page 27:

III. Empirical Evaluation

Page 28:

Backtracking Problem Solvers
- Randomized SAT solver: Satz-Rand, a randomized version of Satz (Li 1997)
  - DPLL with 1-step lookahead
  - Randomization with a noise parameter for variable choices
- Randomized CSP solver: specialized solver for QCP, built on the ILOG constraint programming library
  - Variable choice: a variant of the Brelaz heuristic

Page 29:

Domains
- Quasigroup With Holes
- Graph Coloring
- Logistics Planning (SATPLAN)

Page 30:

Dynamic Restart Policies
- Binary dynamic policies: runs are classified as having either short or long run-time distributions
- N-ary dynamic policies: each leaf in the decision tree is considered as defining a distinct distribution

Page 31:

Policies for Comparison
- Luby optimal fixed cutoff: for the original combined distribution
- Luby universal policy
- Binary naïve policy: select distinct, separately optimal fixed cutoffs for the long and for the short distributions

Page 32:

Illustration of Cutoffs

[Figure: combined RTD D and component RTDs D1, D2; static cutoffs T*, T1*, T2*; the observation point; and dynamic cutoffs T1**, T2**]

Page 33:

Comparative Results

Expected Runtime (Choice Points):

Policy             | QCP (CSP) | QCP (Satz) | Graph Coloring (Satz) | Planning (Satz)
-------------------|-----------|------------|-----------------------|----------------
Dynamic n-ary      |     3,295 |      8,962 |                 9,499 |           5,099
Dynamic binary     |     5,220 |     11,959 |                10,157 |           5,366
Fixed optimal      |     6,534 |     12,551 |                14,669 |           6,402
Binary naïve       |    17,617 |     12,055 |                14,669 |           6,962
Universal          |    12,804 |     29,320 |                38,623 |          17,359
Median (no cutoff) |    69,046 |     48,244 |                39,598 |          25,255

Improvement of the dynamic policies over the Luby fixed optimal cutoff policy is 40-65%.

Page 34:

Cutoffs: Graph Coloring (Satz)

Dynamic n-ary: 10, 430, 10, 345, 10, 10
Dynamic binary: 455, 10
Binary naïve: 342, 500
Fixed optimal: 363

Page 35:

Discussion
- Most optimal policies turned out to prune runs
- Policy construction is independent of run classification; other learning techniques may be used
- Does not require highly accurate prediction!
- Widely applicable

Page 36:

Limitations
- The analysis does not apply in cases where runs are statistically dependent
- Example:
  - We begin with 2 or more RTDs, e.g. of SAT and UNSAT formulas
  - The environment flips a coin to choose an RTD, and then always samples from that RTD
  - We do not get to see the coin flip!
  - Now each unsuccessful run gives us information about that coin flip!

Page 37:

The Dependent Case
- The dependent case is much harder to solve
- Ruan et al., CP-2002: "Restart Policies with Dependence among Runs: A Dynamic Programming Approach"

Future work:
- Using RTDs of ensembles to reason about RTDs of individual problem instances
- Learning RTDs on the fly (reinforcement learning)

Page 38:

Big Picture

[Diagram: Problem Instances feed the Solver; static features, dynamic features, and runtimes feed a Learning/Analysis stage, which produces a Predictive Model that sends control/policy decisions back to the Solver]