Lukas Kroc, Ashish Sabharwal, Bart Selman
Cornell University, USA
SAT 2010 Conference
Edinburgh, July 2010
An Empirical Study of Optimal Noise and Runtime Distributions in Local Search
Presented by:
Holger H. Hoos
Local Search Methods for SAT
A lot is known about Stochastic Local Search (SLS) methods [e.g. Hoos-Stutzle ’04], especially their behavior on random 3-SAT
Along with systematic search, the main SAT solution paradigm
Walksat: one of the first widely successful local search solvers
• Biased random walk
• Combines greedy moves (downhill) with stochastic moves (possibly uphill), controlled by a “noise” parameter [0% .. 100%]
Yet, new surprising findings are still being discovered
Part of this work is motivated by the following observation:
• Empirical evidence that Walksat's running time on large, random 3-SAT instances is quite predictable, and scales linearly with the number of variables for a specific setting of the noise parameter [Seitz-Alava-Orponen 2005]
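To make the role of the noise parameter concrete, here is a minimal Python sketch of a Walksat-style flip loop. It follows the commonly described variant (pick a random unsatisfied clause; take a "free" flip if one exists; otherwise flip a random variable of the clause with probability equal to the noise, else the variable that breaks the fewest clauses). The function name `walksat`, its arguments, and the DIMACS-style clause encoding are assumptions for illustration and are not taken from the slides; the sketch also assumes each clause mentions distinct variables and favors clarity over speed.

```python
import random

def walksat(clauses, n_vars, noise, max_flips, seed=0):
    """Hypothetical helper: return (assignment, flips), or (None, flips) on failure.

    clauses: list of tuples of non-zero ints (positive = variable, negative = negation).
    noise:   probability in [0, 1] of a random-walk move when no free flip exists.
    """
    rng = random.Random(seed)
    # assign[v] is the truth value of variable v (index 0 is unused).
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]

    def is_true(lit):
        return assign[abs(lit)] == (lit > 0)

    # occurs[v] lists the indices of the clauses that mention variable v.
    occurs = [[] for _ in range(n_vars + 1)]
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occurs[abs(lit)].append(idx)

    def break_count(var):
        # Number of clauses whose only true literal belongs to var:
        # flipping var would turn exactly these clauses unsatisfied.
        count = 0
        for idx in occurs[var]:
            true_lits = [l for l in clauses[idx] if is_true(l)]
            if len(true_lits) == 1 and abs(true_lits[0]) == var:
                count += 1
        return count

    flips = 0
    while True:
        # Recomputed from scratch each step for clarity (real solvers update incrementally).
        unsat = [c for c in clauses if not any(is_true(l) for l in c)]
        if not unsat:
            return assign, flips
        if flips >= max_flips:
            return None, flips
        clause = rng.choice(unsat)
        breaks = {abs(l): break_count(abs(l)) for l in clause}
        best = min(breaks, key=breaks.get)
        if breaks[best] > 0 and rng.random() < noise:
            var = abs(rng.choice(clause))  # stochastic move, possibly uphill
        else:
            var = best                     # greedy move (or a free flip)
        assign[var] = not assign[var]
        flips += 1
```

With `noise=0.0` the loop is purely greedy within the chosen clause; with `noise=1.0` every non-free move is a random-walk step.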
Optimal Noise and Runtime Distributions in Local Search 2
Our Motivation
Our work looks at Walksat again, on large, random, 3-SAT formulas, and seeks answers to two questions:
A. Can we further characterize the “optimal noise” and the linear scaling behavior of Walksat?
• Key parameter: the clause-to-variable ratio, α
B. How do runtime distributions of Walksat behave at sub-optimal noise?
• Are they concentrated around the mean, or do they have “heavy tails” similar to complete search methods?
• Heavy tails ⇒ very long runs are more likely than we might expect
• Heavy tails not reported in local search so far
Note: Walksat is still faster than current adaptive, dynamic-noise solvers on these formulas, so studying its behavior at optimal static noise is of much interest
Summary of Results
Walksat on large, random, 3-SAT formulas:
A. Further characterization of the “optimal noise” and linear scaling:
• A detailed analysis, showing a piece-wise linear fit for optimal noise as a function of α, with transitions at interesting points (extending the previous observation that ~57% is optimal for α=4.2)
• Simple inverse polynomial dependence of runtime on α
B. Runtime distributions of Walksat at sub-optimal noise:
• Exponential decay in the high noise regime
• Heavy tails in the low noise regime
• First quantitative observation of heavy tails in local search [earlier insights: Hoos-Stutzle 2000]
• Preliminary Markov Chain model
A. Further Study of Optimal Noise and Linear Scaling
Optimal Noise Setting vs. α
Question:
How does the optimal noise setting vary with α and N?
Experiment: For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
For each, find the noise setting where Walksat is the fastest (binary search)
Average these optimal noise settings and plot against α
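The formula-generation step of this experiment can be sketched as follows. The slides do not spell out the generation model, so this assumes the usual uniform random 3-SAT model (each clause picks three distinct variables and negates each with probability 1/2); the function name `random_3sat` and its parameters are hypothetical.

```python
import random

def random_3sat(n_vars, alpha, seed=0):
    """Hypothetical helper: random 3-SAT formula with about alpha * n_vars clauses.

    Returns a list of 3-tuples of non-zero ints (negative = negated variable).
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        # Three distinct variables, each negated independently with probability 1/2.
        variables = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses
```

For the sizes used in the talk (N in the hundreds of thousands) a list of tuples is still workable, though a production solver would likely use flat arrays.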
Optimal Noise Setting vs. α
[Figure: optimal noise setting vs. α; data with 1 standard deviation bars. Annotations mark the α up to which the Generalized Unit Clause heuristic works and the α up to which greedy Walksat (GSAT) works.]
Optimal noise depends significantly on α (e.g., ~46% at α=3.9; ~57% at α=4.2)
Very good piece-wise linear fit
Transitions at interesting places:
• α≈3: up to which the generalized unit clause (GUC) rule works almost surely [Frieze-Suen 1996]
• α≈3.9: up to which greedy Walksat (GSAT) works (also where the “clustering structure” of the solution space is believed to change drastically: from one giant cluster to exponentially many small ones [Mezard-Mora-Zecchina 2005])
Linear Scaling at Optimal Noise
Experiment:
• For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
• Measure Walksat's runtime with optimal noise (#flips till solution found)
• Plot #flips/N against α (one point per run, no averaging)
Results:
• Inverse polynomial fit of #flips/N as a function of α
• Suggesting linear scaling for α < 4.235
• Points with varying N fall on each other after rescaling by N, showing linearity wrt N
[fig explained in paper]
B. Runtime Distribution of Local Search Methods
Standard vs. Heavy Tailed Distributions
Standard distributions: Exponential or faster decay
e.g., Normal distribution
Heavy-tailed distributions: Power law decay
e.g. Pareto-Levy distribution
[Figure: power law decay compared with the exponential decay of a standard distribution (finite mean & variance)]
Heavy Tailed Distributions
Heavy-tailed distributions: Power law decay
e.g. Pareto-Levy distribution
Signature: tail of the distribution is a line in log-log plot
Observed in systematic search solvers
• Mechanism well-understood in terms of “bad” variable assignments that are hard to recover from [Gomes, Kautz and Selman ’99, ’00]
• Motivated key techniques such as search restarts, algorithm portfolios
Not previously observed in studies on local search methods
Power Law Decay
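The log-log signature follows directly from the two functional forms. A short derivation, where the constants C, a and λ are illustrative symbols that do not appear in the slides:

```latex
\begin{align*}
\text{Power-law tail:}\quad
  & P[X > x] \approx C\,x^{-a}
  \;\Longrightarrow\; \log P[X > x] \approx \log C - a \log x \\
\text{Exponential tail:}\quad
  & P[X > x] \approx C\,e^{-\lambda x}
  \;\Longrightarrow\; \log P[X > x] \approx \log C - \lambda x
\end{align*}
```

So a power-law tail is a straight line of slope −a when both axes are logarithmic, whereas an exponential tail is a straight line only when the y-axis alone is logarithmic and bends downward in a log-log plot.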
Runtime Distributions of Walksat
Experiment:
Generate a random 3-SAT formula with N=100K at α=4.2
• Large formulas, free of small size effects
• Very hard to solve
• Still less constrained than formulas at the phase transition (α≈4.26)
Run 100K (!) runs of Walksat with noise settings around the optimal
Plot the runtime distribution: probability of failure to find a solution as a function of #flips
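The plotted quantity can be computed from the recorded run lengths as an empirical survival function. The sketch below assumes every run eventually finished and that `runtimes` holds the #flips of each run; the name `failure_curve` is hypothetical.

```python
from bisect import bisect_right

def failure_curve(runtimes):
    """Hypothetical helper: list of (flips, P[run not yet solved after `flips`]).

    One point per distinct runtime, in increasing order of flips.
    """
    ordered = sorted(runtimes)
    n = len(ordered)
    # bisect_right counts the runs that finished within x flips.
    return [(x, (n - bisect_right(ordered, x)) / n) for x in sorted(set(ordered))]

# Tiny illustration with made-up run lengths (not data from the talk).
print(failure_curve([10, 20, 20, 40]))  # [(10, 0.75), (20, 0.25), (40, 0.0)]
```

Plotting these pairs with both axes logarithmic gives the kind of view used on the following slides; the final point has probability 0 and has to be dropped on a log axis.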
Runtime Distributions of Walksat
[Setting: Large, random, 3-SAT formula with α=4.2]
Summary of Results:
There is a qualitative difference between noise higher than optimal (>56.7%) and lower than optimal (<56.7%)
• High noise regime: tail of P[failure] has an exponential distribution
• Low noise regime: tail of P[failure] has a power-law distribution
Intuition captured by a (preliminary) Markov Chain model
• High noise means “guessing the solution”
• Low noise (too greedy) leads the search into “local traps”
• Optimal noise is where the two effects balance
Heavy-Tails in Low Noise Regimes
LOG-LOG scale: straight line ⇒ power-law decay
Last 5% of tail (5K points): linear slope = 0.38
100K data points plotted per curve; actual data points, no fitting; not all data points marked with o, x, +, etc. for clarity
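A tail slope of this kind can be estimated with an ordinary least-squares line through the last few percent of the points in log-log coordinates. The sketch below is one plausible way to do it, not necessarily the procedure used for the plot; `loglog_tail_slope` and `tail_fraction` are hypothetical names, and the input is assumed to be a list of (flips, failure probability) pairs sorted by flips.

```python
import math

def loglog_tail_slope(curve, tail_fraction=0.05):
    """Hypothetical helper: least-squares slope of log(p) vs. log(x) over the tail."""
    # Points with zero probability or zero flips cannot be placed on log axes.
    points = [(math.log(x), math.log(p)) for x, p in curve if x > 0 and p > 0]
    k = max(2, int(len(points) * tail_fraction))
    tail = points[-k:]
    if len(tail) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(x for x, _ in tail) / len(tail)
    mean_y = sum(y for _, y in tail) / len(tail)
    sxx = sum((x - mean_x) ** 2 for x, _ in tail)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in tail)
    if sxx == 0:
        raise ValueError("tail points share a single x value")
    return sxy / sxx
```

For a decaying tail the returned slope is negative, and its magnitude is the estimated power-law exponent. A slope that keeps steepening as `tail_fraction` shrinks would point to faster-than-power-law decay.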
Heavy-Tails in Low Noise Regimes
Same data as previous plot, but with all 100K data points (per curve) marked with o, x, +, etc., and full y-axis. As before, actual data points, no fitting.
Qualitative Contrast: High vs. Low Noise Regimes
LOG-LOG scale: straight line ⇒ power-law decay
High Noise: not straight lines ⇒ not heavy tailed. In fact, a log-linear plot reveals a clear exponential tail
Low Noise: line ⇒ heavy tailed. Extremely long runs are much more likely than one might expect!
Understanding Variation with Noise Level and Power-Law Decay: Preliminary Insights
Different “Search” at High, Low, Opt Noise
Experiment:
• Run Walksat at different noise levels on a formula with 100K vars, 420K clauses
• Plot how the number of unsatisfied clauses evolves as the search progresses (0 on y-axis = solution)
High noise: search “stuck” at a relatively high value
Optimal noise: a gradual descent until solution found
Low noise: #unsat clauses decreases fast but gets “stuck” at a relatively low value
Markov Chain Model Capturing Power-Law Decay (preliminary)
[details omitted; refer to paper. Similar to work of Hoos ’02]
Key features:
• States represent (roughly) the number of unsatisfied clauses; left-most state = all solutions
• Ladder structures capture falling into a “trap”; the farther it keeps falling, the harder it gets to recover (recovery time = hitting time of a biased 1-dimensional Markov Chain)
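The slides omit the chain's details, so the following is only a generic illustration of the "hitting time of a biased 1-dimensional Markov chain" idea, not the authors' model. It simulates a walk that starts `depth` levels down a ladder and, at each step, climbs one level with probability `p_up` or falls one level otherwise; all names and parameters are hypothetical.

```python
import random

def hitting_time(depth, p_up, rng, max_steps=10**6):
    """Hypothetical helper: steps until a walk started at `depth` first reaches 0.

    Returns None if level 0 is not reached within max_steps.
    """
    pos, steps = depth, 0
    while pos > 0 and steps < max_steps:
        # Climb toward level 0 with probability p_up, otherwise fall one level deeper.
        pos += -1 if rng.random() < p_up else 1
        steps += 1
    return steps if pos == 0 else None

def mean_hitting_time(depth, p_up, runs=2000, seed=0):
    """Average recovery time over independent walks that reached level 0."""
    rng = random.Random(seed)
    times = [hitting_time(depth, p_up, rng) for _ in range(runs)]
    finished = [t for t in times if t is not None]
    return sum(finished) / len(finished)
```

For `p_up` above 1/2 the expected recovery time of such a walk is depth / (2·p_up − 1), so it grows with how far the walk has fallen, which matches the qualitative statement on the slide.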
Markov Chain Model Capturing Power-Law Decay (preliminary)
[details omitted; refer to paper. Similar to work of Hoos ’02]
In the horizontal part of the chain:
High noise: avoids traps but attraction towards the top-middle node; exponential time to convergence, very concentrated around the mean
Low noise: leftward drift but good chance of falling into a trap; exponential time to convergence but power-law decay
Summary
A. Further study of optimal noise for Walksat
• depends on the clause-to-variable ratio, α, in piece-wise linear fashion with transitions at interesting points
• allows for a simple inverse polynomial fit for the linearity constant
B. Runtime distributions in local search
• drastic change in behavior below and above optimal noise
• exponential decay for higher-than-optimal noise
• power-law decay (heavy tails) for lower-than-optimal noise
Future directions:
• A better understanding of when heavy tails appear and when they don’t
• Improved model capturing heavy tails in local search
• Ways of utilizing these insights to improve local search solvers (similar to restarts and algorithm portfolios for complete search)