Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms
Slides from the presentation at the Optimization by Building and Using Probabilistic Models (OBUPM-2011) workshop at ACM SIGEVO GECCO 2011.
Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms

Martin Pelikan and Mark W. Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
[email protected], [email protected]
http://medal.cs.umsl.edu/
Motivation

Two key questions
- Can we use past EDA runs to solve future problems faster?
  - EDAs do more than solve a problem: they provide us with a lot of information about the landscape.
  - Why throw out this information?
- Can we use problem-specific knowledge to speed up EDAs?
  - EDAs are able to adapt exploration operators to the problem, so we do not have to know much about the problem to solve it.
  - But why throw away prior problem-specific information if it is available?

This presentation
- Reviews some of the approaches that attempt to do this.
- Focuses on two areas:
  - Using prior problem-specific knowledge.
  - Learning from experience (past EDA runs).
Outline

1. EDA bottlenecks.
2. Prior problem-specific knowledge.
3. Learning from experience.
4. Summary and conclusions.
Estimation of Distribution Algorithms

Estimation of distribution algorithms (EDAs)
- Work with a population of candidate solutions.
- Learn a probabilistic model of promising solutions.
- Sample the model to generate new solutions.

Also known as probabilistic model-building GAs: select promising solutions from the current population, learn a probabilistic model of the selected population, and sample the model to create the new population; learning and sampling the model replace crossover and mutation. A minimal sketch follows.
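To make the loop concrete, here is a minimal sketch of a univariate EDA (UMDA-style) in Python. It is illustrative only: the `onemax` fitness function and all parameter values are assumptions, and real EDAs such as hBOA build much richer models (e.g., Bayesian networks) than the independent-bit model shown here.

```python
import random

def onemax(bits):
    """Toy fitness: number of ones (an assumed example problem)."""
    return sum(bits)

def umda(n=20, pop_size=100, generations=50):
    """Minimal univariate EDA: select, learn marginals, sample."""
    population = [[random.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half of the population.
        selected = sorted(population, key=onemax, reverse=True)[:pop_size // 2]
        # Model building: estimate one independent probability per bit.
        probs = [sum(ind[i] for ind in selected) / len(selected)
                 for i in range(n)]
        # Model sampling: generate the new population from the model.
        population = [[1 if random.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
    return max(population, key=onemax)

print(onemax(umda()))  # usually reaches the optimum of 20
```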
Efficiency Enhancement of EDAs

Main EDA bottlenecks
- Evaluation.
- Model building.
- Model sampling.
- Memory complexity (models, candidate solutions).

Efficiency enhancement techniques
- Address one or more bottlenecks.
- Can adopt much from standard evolutionary algorithms.
- But EDAs provide opportunities to do more than that!
- There are many approaches; we focus on a few.
What Comes Next?

1. Using problem-specific knowledge.
2. Learning from experience.
Problem-Specific Knowledge in EDAs

Basic idea
- We don't have to know much about the problem to use EDAs.
- But what if we do know something about it?
- Can we use prior problem-specific knowledge in EDAs?

Bias populations
- Inject high-quality solutions into the population.
- Modify solutions using a problem-specific procedure.

Bias model building
- How to bias:
  - Bias model structure (e.g., Bayesian network structure).
  - Bias model parameters (e.g., conditional probabilities).
- Types of bias:
  - Hard bias: restrict admissible models/parameters.
  - Soft bias: give some models/parameters preference over others.
Example: Biasing Model Structure in Graph Bipartitioning

Graph bipartitioning
- Input: a graph G = (V, E), where V is the set of nodes and E is the set of edges.
- Task: split V into two equally sized subsets so that the number of edges between these subsets is minimized.
Example: Biasing Model Structure in Graph Bipartitioning

Biasing models in graph bipartitioning
- Soft bias (Schwarz & Ocenasek, 2000):
  - Increase the prior probability of models with dependencies included in E.
  - Decrease the prior probability of models with dependencies not included in E.
- Hard bias (Mühlenbein & Mahnig, 2002):
  - Strictly disallow model dependencies that disagree with edges in E.
- In both cases the performance of EDAs was substantially improved; a sketch of both kinds of bias follows.
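As a rough illustration of how these two biases could enter a score-based model builder, here is a minimal sketch. The function names, the constant `kappa`, and the log-prior form are assumptions for illustration, not the exact schemes of the cited papers; the log-prior term would simply be added to the usual structure score (e.g., BIC or a Bayesian-Dirichlet metric).

```python
import math

def edge_log_prior(i, j, graph_edges, kappa=0.9):
    """Soft bias: a dependency matching a problem edge gets a higher prior.

    graph_edges: set of (u, v) pairs for the edges E of the input graph.
    Returns a log-prior term to add to the model score for edge (i, j).
    """
    if (i, j) in graph_edges or (j, i) in graph_edges:
        return math.log(kappa)        # preferred dependency
    return math.log(1.0 - kappa)      # discouraged dependency

def edge_allowed(i, j, graph_edges):
    """Hard bias: only dependencies that agree with edges in E are admissible."""
    return (i, j) in graph_edges or (j, i) in graph_edges
```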
Important Challenges

Challenges in the use of prior knowledge in EDAs
- Parameter bias using prior probabilities has not been explored much.
- Structural bias has been introduced only rarely.
- Model bias is often studied only on the surface.
- Theory is missing.
Learning from Experience

Basic idea
- Consider solving many instances of the same problem class.
- Can we learn from past EDA runs to solve future instances of this problem type faster?
- Similar to the use of prior knowledge, but in this case we automate the discovery of problem properties (instead of relying on expert knowledge).

What features to learn?
- Model structure.
- Promising candidate solutions or partial solutions.
- Algorithm parameters.

How to use the learned features?
- Modify/restrict algorithm parameters.
- Bias populations.
- Bias models.
Example: Probability Coincidence Matrix

Probability coincidence matrix (PCM)
- Hauschild, Pelikan, Sastry, Goldberg (2008).
- Each model may contain a dependency between X_i and X_j.
- The PCM stores the observed probabilities of dependencies across past runs.
- PCM = {p_ij} where i, j ∈ {1, 2, ..., n}.
- p_ij = proportion of past models with a dependency between X_i and X_j.

[Figure: an example PCM.]
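A minimal sketch of how a PCM could be accumulated from the dependency structures of past runs; the list-of-edge-sets input encoding is an assumption made for illustration.

```python
def build_pcm(models, n):
    """Build the probability coincidence matrix from past model structures.

    models: list of past models, each given as a set of (i, j) dependency
            pairs with i < j (an assumed encoding).
    n: number of problem variables.
    Returns an n x n matrix where pcm[i][j] is the proportion of models
    that contained a dependency between X_i and X_j.
    """
    pcm = [[0.0] * n for _ in range(n)]
    for edges in models:
        for i, j in edges:
            pcm[i][j] += 1.0
            pcm[j][i] += 1.0
    m = float(len(models))
    return [[count / m for count in row] for row in pcm]
```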
Example: Probability Coincidence Matrix

Using the PCM for hard bias (Hauschild et al., 2008)
- Set a threshold for the minimum proportion of a dependency.
- Only accept dependencies occurring at least that often.
- Strictly disallow other dependencies.

Using the PCM for soft bias (Hauschild & Pelikan, 2009)
- Introduce a prior probability of a model structure.
- Dependencies that were more likely in the past are given preference (see the sketch below).
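A sketch of both uses of the PCM, assuming the `build_pcm` output from above. `p_min` corresponds to the threshold pmin reported in the results that follow; the soft-bias prior form and the `kappa` scaling are assumptions, not the exact prior of Hauschild and Pelikan (2009).

```python
import math

def dependency_allowed(i, j, pcm, p_min):
    """Hard bias: admit a dependency between X_i and X_j only if it
    appeared in at least a fraction p_min of past models."""
    return pcm[i][j] >= p_min

def structure_log_prior(edges, pcm, kappa=2.0):
    """Soft bias (assumed form): log-prior of a model structure that
    rewards dependencies that were frequent in past runs."""
    return sum(kappa * math.log(max(pcm[i][j], 1e-9)) for i, j in edges)
```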
Results: PCM for 2D Spin Glass

[Figure: execution-time speedup vs. minimum edge percentage allowed, for 24x24 and 32x32 instances; speedup grows with increased restrictions on model building. (Hauschild, Pelikan, Sastry, Goldberg; 2008)]
Results: PCM for 2D Spin Glass

Size          Execution-time speedup  pmin   % Total Dep.
256 (16x16)   3.89                    0.020  6.4%
324 (18x18)   4.37                    0.011  8.7%
400 (20x20)   4.34                    0.020  7.0%
484 (22x22)   4.61                    0.010  6.3%
576 (24x24)   4.63                    0.013  4.6%
676 (26x26)   4.62                    0.011  4.7%
784 (28x28)   4.45                    0.009  5.4%
900 (30x30)   4.93                    0.005  8.1%
1024 (32x32)  4.14                    0.007  5.5%

Table 2: Optimal speedup and the corresponding PCM threshold pmin, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan, Sastry, Goldberg; 2008)

Choosing the maximum distance of dependencies remains a challenge. If the distances are restricted too severely, the bias on model building may be too strong to allow for sufficiently complex models; this was also supported by results in Hauschild, Pelikan, Lima, and Sastry (2007). On the other hand, if the distances are not restricted sufficiently, the benefits of this approach may be negligible.
Example: Distance Restrictions

PCM limitations
- Can only be applied when variables have a fixed function.
- Assumes that dependencies between specific variables are either more likely or less likely across many problem instances.
- The concept is difficult to scale with the number of variables.

Distance restrictions (Hauschild, Pelikan, Sastry, Goldberg, 2008)
- Introduce a distance metric over problem variables such that variables at shorter distances are more likely to interact.
- Gather statistics of dependencies at particular distances.
- Decide on a distance threshold to disallow some dependencies.
- Use distances to provide soft bias via prior distributions.
- Distance metrics are often straightforward to define, especially for additively decomposable problems.
Example: Distance Restrictions for Graph Bipartitioning

Example for graph bipartitioning
- Given a graph G = (V, E).
- Assign weight 1 to all edges in E.
- The distance between two vertices is given by the shortest path between them.
- Unconnected vertices are given distance |V| (see the sketch below).
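A minimal sketch of this distance metric in Python; the adjacency-list representation and the function name are assumptions for illustration.

```python
from collections import deque

def graph_distances(n, edges):
    """All-pairs shortest-path distances on an unweighted graph via BFS.

    n: number of vertices (labeled 0..n-1); edges: iterable of (u, v) pairs.
    Unconnected pairs keep distance n (= |V|), as on the slide above.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:  # vertex not reached yet
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist
```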
Example: Distance Restrictions for ADFs

Distance metric for an additively decomposable function
- Additively decomposable function (ADF):
  f(X_1, ..., X_n) = sum_{i=1}^{m} f_i(S_i),
  where f_i is the i-th subfunction and S_i is a subset of the variables {X_1, ..., X_n}.
- Connect variables that appear in the same subset S_i for some i.
- The distance between two variables is the shortest path between them (if connected).
- The distance is n if no path exists (see the sketch below).
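Under the same assumed encoding, the ADF case only needs the interaction graph built from the subsets; this hypothetical helper reuses `graph_distances` from the previous sketch.

```python
from itertools import combinations

def adf_distances(n, subsets):
    """Distances for an ADF: connect variables that share a subset S_i,
    then take shortest paths; distance n stands in for 'no path exists'.
    Assumes graph_distances from the earlier sketch is in scope.
    """
    edges = set()
    for s in subsets:
        for u, v in combinations(sorted(s), 2):
            edges.add((u, v))
    return graph_distances(n, edges)
```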
Results: Distance Restrictions on 2D Spin Glass

[Figure: execution-time speedup vs. the ratio of total dependencies allowed, for 20x20 and 28x28 instances, with one curve per maximum distance. (Hauschild, Pelikan; 2009)]
Results: Distance Restrictions on 2D Spin Glass

Biasing models in hBOA using prior knowledge:

Size          Execution-time speedup  Max Dist Allowed  qmin  % Total Dep.
256 (16x16)   4.2901                  2                 0.62  4.7%
400 (20x20)   4.9288                  3                 0.64  6.0%
576 (24x24)   5.2156                  3                 0.60  4.1%
784 (28x28)   4.9007                  5                 0.63  7.6%

Table 3: Distance cutoff runs with their best speedups by distance, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan; 2009)

We ran experiments with dependencies restricted by the maximum distance, which was varied from 1 to the maximum distance found between any two propositions (for example, for p = 24 we ran experiments using a maximum distance from 1 to 9). For some instances with p = 1 the maximum distance was 500, indicating that there was no path between some pairs of propositions. On the tested problems, small distance restrictions (restricting to only distance 1 or 2) were sometimes too restrictive, and some instances would not be solved even with extremely large population sizes (N = 512000); in these cases the results were omitted (such restrictions were not used).
Important Challenges

Challenges in learning from experience
- The process of selecting thresholds is manual and difficult.
- The ideas must be applied and tested on more problem types.
- Theory is missing.
Another Related Idea: Model-Directed Hybridization

Model-directed hybridization
- EDA models reveal a lot about the problem landscape.
- Use this information to design advanced neighborhood structures (operators).
- Use this information to design problem-specific operators.
- Many successes so far, and a lot of work still to be done.
Conclusions and Future Work

Conclusions
- EDAs do a lot more than just solve the problem: they give us a lot of information about the problem.
- EDAs allow the use of prior knowledge of various forms.
- Yet most EDA researchers focus on the design of new EDAs, and only a few look at the use of EDAs beyond solving an isolated problem instance.

Future work
- Some of the key challenges were mentioned throughout the talk.
- If you are interested in collaboration, talk to us.
Acknowledgments

- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.