Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms
Slides from the presentation at the Optimization by Building and Using Probabilistic Models (OBUPM-2011) workshop at ACM SIGEVO GECCO 2011.
Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms

Martin Pelikan and Mark W. Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
[email protected], [email protected]
http://medal.cs.umsl.edu/
Motivation

Two key questions
- Can we use past EDA runs to solve future problems faster?
  - EDAs do more than solve a problem: they provide us with a lot of information about the landscape.
  - Why throw out this information?
- Can we use problem-specific knowledge to speed up EDAs?
  - EDAs are able to adapt exploration operators to the problem, so we do not have to know much about the problem to solve it.
  - But why throw away prior problem-specific information if it is available?

This presentation
- Reviews some of the approaches that attempt to do this.
- Focuses on two areas:
  - Using prior problem-specific knowledge.
  - Learning from experience (past EDA runs).
Outline

1. EDA bottlenecks.
2. Prior problem-specific knowledge.
3. Learning from experience.
4. Summary and conclusions.
Estimation of Distribution Algorithms

Estimation of distribution algorithms (EDAs)
- Work with a population of candidate solutions.
- Learn a probabilistic model of promising solutions.
- Sample the model to generate new solutions.

Also known as probabilistic model-building GAs: select promising solutions from the current population, learn a probabilistic model of the selected population, and sample the model to create the new population; learning and sampling the model replace crossover and mutation. A minimal sketch follows.
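To make the loop concrete, here is a minimal sketch of a univariate EDA (UMDA-style) in Python. It is illustrative only: the `onemax` fitness function and all parameter values are assumptions, and real EDAs such as hBOA build much richer models (e.g., Bayesian networks) than the independent-bit model shown here.

```python
import random

def onemax(bits):
    """Toy fitness: number of ones (an assumed example problem)."""
    return sum(bits)

def umda(n=20, pop_size=100, generations=50):
    """Minimal univariate EDA: select, learn marginals, sample."""
    population = [[random.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half of the population.
        selected = sorted(population, key=onemax, reverse=True)[:pop_size // 2]
        # Model building: estimate one independent probability per bit.
        probs = [sum(ind[i] for ind in selected) / len(selected)
                 for i in range(n)]
        # Model sampling: generate the new population from the model.
        population = [[1 if random.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
    return max(population, key=onemax)

print(onemax(umda()))  # usually reaches the optimum of 20
```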
Efficiency Enhancement of EDAs

Main EDA bottlenecks
- Evaluation.
- Model building.
- Model sampling.
- Memory complexity (models, candidate solutions).

Efficiency enhancement techniques
- Address one or more bottlenecks.
- Can adopt much from standard evolutionary algorithms.
- But EDAs provide opportunities to do more than that!
- There are many approaches; we focus on a few.
What Comes Next?

1. Using problem-specific knowledge.
2. Learning from experience.
Problem-Specific Knowledge in EDAs

Basic idea
- We don't have to know much about the problem to use EDAs.
- But what if we do know something about it?
- Can we use prior problem-specific knowledge in EDAs?

Bias populations
- Inject high-quality solutions into the population.
- Modify solutions using a problem-specific procedure.

Bias model building
- How to bias:
  - Bias model structure (e.g., Bayesian network structure).
  - Bias model parameters (e.g., conditional probabilities).
- Types of bias:
  - Hard bias: restrict admissible models/parameters.
  - Soft bias: give some models/parameters preference over others.
Example: Biasing Model Structure in Graph Bipartitioning

Graph bipartitioning
- Input: a graph G = (V, E), where V is the set of nodes and E is the set of edges.
- Task: split V into two equally sized subsets so that the number of edges between these subsets is minimized.
Example: Biasing Model Structure in Graph Bipartitioning

Biasing models in graph bipartitioning
- Soft bias (Schwarz & Ocenasek, 2000):
  - Increase the prior probability of models with dependencies included in E.
  - Decrease the prior probability of models with dependencies not included in E.
- Hard bias (Mühlenbein & Mahnig, 2002):
  - Strictly disallow model dependencies that disagree with edges in E.
- In both cases the performance of EDAs was substantially improved; a sketch of both kinds of bias follows.
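As a rough illustration of how these two biases could enter a score-based model builder, here is a minimal sketch. The function names, the constant `kappa`, and the log-prior form are assumptions for illustration, not the exact schemes of the cited papers; the log-prior term would simply be added to the usual structure score (e.g., BIC or a Bayesian-Dirichlet metric).

```python
import math

def edge_log_prior(i, j, graph_edges, kappa=0.9):
    """Soft bias: a dependency matching a problem edge gets a higher prior.

    graph_edges: set of (u, v) pairs for the edges E of the input graph.
    Returns a log-prior term to add to the model score for edge (i, j).
    """
    if (i, j) in graph_edges or (j, i) in graph_edges:
        return math.log(kappa)        # preferred dependency
    return math.log(1.0 - kappa)      # discouraged dependency

def edge_allowed(i, j, graph_edges):
    """Hard bias: only dependencies that agree with edges in E are admissible."""
    return (i, j) in graph_edges or (j, i) in graph_edges
```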
Important Challenges

Challenges in the use of prior knowledge in EDAs
- Parameter bias using prior probabilities has not been explored much.
- Structural bias has been introduced only rarely.
- Model bias is often studied only on the surface.
- Theory is missing.
Learning from Experience

Basic idea
- Consider solving many instances of the same problem class.
- Can we learn from past EDA runs to solve future instances of this problem type faster?
- Similar to the use of prior knowledge, but in this case we automate the discovery of problem properties (instead of relying on expert knowledge).

What features to learn?
- Model structure.
- Promising candidate solutions or partial solutions.
- Algorithm parameters.

How to use the learned features?
- Modify/restrict algorithm parameters.
- Bias populations.
- Bias models.
Example: Probability Coincidence Matrix

Probability coincidence matrix (PCM)
- Hauschild, Pelikan, Sastry, Goldberg (2008).
- Each model may contain a dependency between X_i and X_j.
- The PCM stores the observed probabilities of dependencies across past runs.
- PCM = {p_ij} where i, j ∈ {1, 2, ..., n}.
- p_ij = proportion of past models with a dependency between X_i and X_j.

[Figure: an example PCM.]
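A minimal sketch of how a PCM could be accumulated from the dependency structures of past runs; the list-of-edge-sets input encoding is an assumption made for illustration.

```python
def build_pcm(models, n):
    """Build the probability coincidence matrix from past model structures.

    models: list of past models, each given as a set of (i, j) dependency
            pairs with i < j (an assumed encoding).
    n: number of problem variables.
    Returns an n x n matrix where pcm[i][j] is the proportion of models
    that contained a dependency between X_i and X_j.
    """
    pcm = [[0.0] * n for _ in range(n)]
    for edges in models:
        for i, j in edges:
            pcm[i][j] += 1.0
            pcm[j][i] += 1.0
    m = float(len(models))
    return [[count / m for count in row] for row in pcm]
```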
Example: Probability Coincidence Matrix

Using the PCM for hard bias (Hauschild et al., 2008)
- Set a threshold for the minimum proportion of a dependency.
- Only accept dependencies occurring at least that often.
- Strictly disallow other dependencies.

Using the PCM for soft bias (Hauschild & Pelikan, 2009)
- Introduce a prior probability of a model structure.
- Dependencies that were more likely in the past are given preference (see the sketch below).
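A sketch of both uses of the PCM, assuming the `build_pcm` output from above. `p_min` corresponds to the threshold pmin reported in the results that follow; the soft-bias prior form and the `kappa` scaling are assumptions, not the exact prior of Hauschild and Pelikan (2009).

```python
import math

def dependency_allowed(i, j, pcm, p_min):
    """Hard bias: admit a dependency between X_i and X_j only if it
    appeared in at least a fraction p_min of past models."""
    return pcm[i][j] >= p_min

def structure_log_prior(edges, pcm, kappa=2.0):
    """Soft bias (assumed form): log-prior of a model structure that
    rewards dependencies that were frequent in past runs."""
    return sum(kappa * math.log(max(pcm[i][j], 1e-9)) for i, j in edges)
```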
Results: PCM for 2D Spin Glass

[Figure: execution-time speedup vs. minimum edge percentage allowed, for 24x24 and 32x32 instances; speedup grows with increased restrictions on model building. (Hauschild, Pelikan, Sastry, Goldberg; 2008)]
Results: PCM for 2D Spin Glass

Size          Execution-time speedup  pmin   % Total Dep.
256 (16x16)   3.89                    0.020  6.4%
324 (18x18)   4.37                    0.011  8.7%
400 (20x20)   4.34                    0.020  7.0%
484 (22x22)   4.61                    0.010  6.3%
576 (24x24)   4.63                    0.013  4.6%
676 (26x26)   4.62                    0.011  4.7%
784 (28x28)   4.45                    0.009  5.4%
900 (30x30)   4.93                    0.005  8.1%
1024 (32x32)  4.14                    0.007  5.5%

Table 2: Optimal speedup and the corresponding PCM threshold pmin, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan, Sastry, Goldberg; 2008)

Choosing the maximum distance of dependencies remains a challenge. If the distances are restricted too severely, the bias on model building may be too strong to allow for sufficiently complex models; this was also supported by results in Hauschild, Pelikan, Lima, and Sastry (2007). On the other hand, if the distances are not restricted sufficiently, the benefits of this approach may be negligible.
Example: Distance Restrictions

PCM limitations
- Can only be applied when variables have a fixed function.
- Assumes that dependencies between specific variables are either more likely or less likely across many problem instances.
- The concept is difficult to scale with the number of variables.

Distance restrictions (Hauschild, Pelikan, Sastry, Goldberg, 2008)
- Introduce a distance metric over problem variables such that variables at shorter distances are more likely to interact.
- Gather statistics of dependencies at particular distances.
- Decide on a distance threshold to disallow some dependencies.
- Use distances to provide soft bias via prior distributions.
- Distance metrics are often straightforward to define, especially for additively decomposable problems.
Example: Distance Restrictions for Graph Bipartitioning

Example for graph bipartitioning
- Given a graph G = (V, E).
- Assign weight 1 to all edges in E.
- The distance between two vertices is given by the shortest path between them.
- Unconnected vertices are given distance |V| (see the sketch below).
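A minimal sketch of this distance metric in Python; the adjacency-list representation and the function name are assumptions for illustration.

```python
from collections import deque

def graph_distances(n, edges):
    """All-pairs shortest-path distances on an unweighted graph via BFS.

    n: number of vertices (labeled 0..n-1); edges: iterable of (u, v) pairs.
    Unconnected pairs keep distance n (= |V|), as on the slide above.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:  # vertex not reached yet
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist
```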
Example: Distance Restrictions for ADFs

Distance metric for an additively decomposable function
- Additively decomposable function (ADF):
  f(X_1, ..., X_n) = sum_{i=1}^{m} f_i(S_i),
  where f_i is the i-th subfunction and S_i is a subset of the variables {X_1, ..., X_n}.
- Connect variables that appear in the same subset S_i for some i.
- The distance between two variables is the shortest path between them (if connected).
- The distance is n if no path exists (see the sketch below).
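Under the same assumed encoding, the ADF case only needs the interaction graph built from the subsets; this hypothetical helper reuses `graph_distances` from the previous sketch.

```python
from itertools import combinations

def adf_distances(n, subsets):
    """Distances for an ADF: connect variables that share a subset S_i,
    then take shortest paths; distance n stands in for 'no path exists'.
    Assumes graph_distances from the earlier sketch is in scope.
    """
    edges = set()
    for s in subsets:
        for u, v in combinations(sorted(s), 2):
            edges.add((u, v))
    return graph_distances(n, edges)
```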
Results: Distance Restrictions on 2D Spin Glass

[Figure: execution-time speedup vs. the ratio of total dependencies allowed, for 20x20 and 28x28 instances, with one curve per maximum distance. (Hauschild, Pelikan; 2009)]
Results: Distance Restrictions on 2D Spin Glass

Biasing models in hBOA using prior knowledge:

Size          Execution-time speedup  Max Dist Allowed  qmin  % Total Dep.
256 (16x16)   4.2901                  2                 0.62  4.7%
400 (20x20)   4.9288                  3                 0.64  6.0%
576 (24x24)   5.2156                  3                 0.60  4.1%
784 (28x28)   4.9007                  5                 0.63  7.6%

Table 3: Distance cutoff runs with their best speedups by distance, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan; 2009)

We ran experiments with dependencies restricted by the maximum distance, which was varied from 1 to the maximum distance found between any two propositions (for example, for p = 24 we ran experiments using a maximum distance from 1 to 9). For some instances with p = 1 the maximum distance was 500, indicating that there was no path between some pairs of propositions. On the tested problems, small distance restrictions (restricting to only distance 1 or 2) were sometimes too restrictive, and some instances would not be solved even with extremely large population sizes (N = 512000); in these cases the results were omitted (such restrictions were not used).
Important Challenges

Challenges in learning from experience
- The process of selecting thresholds is manual and difficult.
- The ideas must be applied and tested on more problem types.
- Theory is missing.
Another Related Idea: Model-Directed Hybridization

Model-directed hybridization
- EDA models reveal a lot about the problem landscape.
- Use this information to design advanced neighborhood structures (operators).
- Use this information to design problem-specific operators.
- Many successes so far, and a lot of work still to be done.
Conclusions and Future Work

Conclusions
- EDAs do a lot more than just solve the problem: they give us a lot of information about the problem.
- EDAs allow the use of prior knowledge of various forms.
- Yet most EDA researchers focus on the design of new EDAs, and only a few look at the use of EDAs beyond solving an isolated problem instance.

Future work
- Some of the key challenges were mentioned throughout the talk.
- If you are interested in collaboration, talk to us.
Acknowledgments

- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.