
  • Natural Computing

    Michael Herrmann
    [email protected]
    phone: 0131 6 517177
    Informatics Forum 1.42

    INFR09038
    22/10/2010

    Lecture 9: Evolutionary Strategies

  • Evolutionary algorithms

    | Method                   | Genotype (encoding)                   | Mutation/crossover                                     | Phenotype (applied to)                                 |
    |--------------------------|---------------------------------------|--------------------------------------------------------|--------------------------------------------------------|
    | Genetic algorithm        | strings of binary or integer numbers  | e.g. 1-point crossover; either operator with p_m, p_c  | optimization or search for optimal solutions           |
    | Genetic programming      | trees (can be represented as strings) | like GA plus additional operators                      | computer programs for a computational problem          |
    | Evolutionary programming | real-valued parameter vector          | mutation with self-adaptive rates                      | parameters of a computer program with fixed structure  |
    | Evolution strategy       | real-valued encoding                  | mutation with self-adaptive rates                      | optimization or search for optimal solutions           |

  • Characteristics Suggesting the Use of GP

    1. Discovering the size and shape of the solution
    2. Reusing substructures
    3. Discovering a set of useful substructures
    4. Discovering the nature of the hierarchical references among substructures
    5. Passing parameters to a substructure
    6. Discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage)
    7. Discovering the number of arguments possessed by a substructure
    8. Maintaining syntactic validity and locality by means of a developmental process
    9. Discovering a general solution in the form of a parametrized topology containing free variables

  • Fundamental differences between GP and other approaches to AI and ML

    1. Representation: Genetic programming overtly conducts its search for a solution to the given problem in program space.

    2. Role of point-to-point transformations in the search: Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points.

    3. Role of hill climbing in the search: Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that appear to be inferior at a given stage.

    4. Role of determinism in the search: Genetic programming conducts its search probabilistically.

    5. Role of an explicit knowledge base: none (perhaps for initialisation).
    6. Role of formal logic in the search: none (perhaps for editing).
    7. Underpinnings of the technique: biologically inspired.

  • Promising GP Application Areas

    Problem areas involving many variables that are interrelated in highly non-linear ways
    Inter-relationship of variables is not well understood
    A good approximate solution is satisfactory
    − design, control, classification and pattern recognition, data mining, system identification and forecasting
    Discovery of the size and shape of the solution is a major part of the problem
    Areas where humans find it difficult to write programs
    − parallel computers, cellular automata, multi-agent strategies / distributed AI, FPGAs
    "Black art" problems
    − synthesis of topology and sizing of analog circuits, synthesis of topology and tuning of controllers, quantum computing circuits, synthesis of designs for antennas
    Areas where you simply have no idea how to program a solution, but where the objective (fitness measure) is clear
    Problem areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data

  • Open Questions/Research Areas

    • Scaling up to more complex problems and larger programs
    • Using large function and terminal sets
    • How well do the evolved programs generalise?
    • How can we evolve nicer programs? (size, efficiency, correctness)
    • What sort of problems is GP good at / not-so-good at?
    • Convergence, optimality etc.?
    • Relation to human-based evolutionary processes (e.g. Wikipedia)

    • Reading: J. Koza 1990, especially pp. 8–14, 27–35, 42–43 (paper linked to web page)

    • Riccardo Poli, William B. Langdon, Nicholas F. McPhee (2008) A Field Guide to Genetic Programming. For free at http://www.lulu.com/content/2167025

    • See also: http://www.genetic-programming.org and http://www.geneticprogramming.us

    • Outlook: Practical issues of EC

  • Cross-Domain Features

    Native representations are sufficient when working with genetic programming
    Genetic programming breeds "simulatability" (Koza)
    Genetic programming starts small and controls bloat
    Genetic programming frequently exploits a simulator's built-in assumption of reasonableness
    Genetic programming engineers around existing patents and creates novel designs more frequently than it creates infringing solutions

    John R. Koza: GECCO 2007 Tutorial / Introduction to Genetic Programming, http://www.genetic-programming.org

  • Overview

    1. Introduction: History
    2. The genetic code
    3. The canonical genetic algorithm
    4. Examples & variants of GA
    5. The schema theorem
    6. The building block hypothesis
    7. Hybrid algorithms
    8. Multiobjective optimization
    9. Genetic programming
    10. Evolutionary strategies
    11. Differential evolution

  • Evolution strategies

    Natural problem-dependent representation for search and optimisation (without "genetic" encoding)
    Individuals are vectors of real numbers which describe current solutions of the problem
    Recombination by exchange or averaging of components (but is sometimes not used)
    Mutation in continuous steps with adaptation of the mutation rate to account for different scales and correlations of the components
    Selection by fitness from various parent sets
    Elitism, islands, adaptation of parameters

    1964: Ingo Rechenberg; Hans-Paul Schwefel

  • Multidimensional Mutations in ES

    Three cases: uncorrelated mutations, uncorrelated mutations with scaled axes, correlated mutations

    Generation of offspring: y = x + N(0, C'), where
    x stands for the vector (x_1, …, x_n) describing a parent and
    C' is the covariance matrix C after mutation of the σ values, with
    C = diag(σ, …, σ) for uncorrelated mutations,
    C = diag(σ_1, …, σ_n) for scaled axes, or
    C = (C_ij) for correlated mutations.

    A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing: Evolution Strategies
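    As a concrete illustration, a minimal numpy sketch of the three schemes (assumptions: σ denotes a standard deviation, so the covariance diagonal holds σ²; the parent vector and step sizes are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)             # parent vector (x_1, ..., x_n), example values

sigma = 0.3                        # single step size
sigmas = rng.uniform(0.1, 0.5, n)  # one step size per axis

C_uncorrelated = sigma**2 * np.eye(n)   # C = diag(sigma, ..., sigma)
C_scaled = np.diag(sigmas**2)           # C = diag(sigma_1, ..., sigma_n)
A = rng.normal(size=(n, n))
C_correlated = A @ A.T / n              # full C = (C_ij), PSD by construction

for C in (C_uncorrelated, C_scaled, C_correlated):
    y = x + rng.multivariate_normal(np.zeros(n), C)  # offspring y = x + N(0, C)
    print(y)
```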

  • Multidimensional Mutations in ES

    Offspring vectors: x_i := m + z_i, with z_i ~ N(0, C)
    Select the λ best, i.e. (1, λ)-ES
    Correlations among successful offspring: Z := (1/λ) Σ_i z_i z_i^T
    Update correlations: C := (1 − ε) C + ε Z
    New state vector: m := m + (1/λ) Σ_i z_i (smoothes fitness fluctuations; or: m = best)
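    A minimal sketch of this update loop, assuming a sphere fitness and averaging over the μ best of λ offspring (the slide averages over the selected set; all names and constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, mu, eps = 5, 20, 5, 0.2

def f(x):                 # assumed example fitness (sphere), to be minimised
    return np.sum(x**2)

m = rng.normal(size=n)    # current state vector
C = np.eye(n)             # mutation covariance

for g in range(100):
    z = rng.multivariate_normal(np.zeros(n), C, size=lam)  # z_i ~ N(0, C)
    x = m + z                                              # x_i := m + z_i
    best = np.argsort([f(xi) for xi in x])[:mu]            # select best offspring
    zs = z[best]
    Z = zs.T @ zs / mu               # Z := (1/mu) sum_i z_i z_i^T
    C = (1 - eps) * C + eps * Z      # C := (1 - eps) C + eps Z
    m = m + zs.mean(axis=0)          # m := m + (1/mu) sum_i z_i

print(f(m))
```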

  • Evolution strategies

    (μ, λ): selection from the set of λ children
    (μ + λ): selection from the set of μ parents and λ children
    (μ', λ'(μ, λ)^γ): isolate the children for γ generations, where each time λ children are created (total population is λλ'). Then the best subpopulation is selected and becomes parents (e.g. λ = μ') for the new cycle of γ generations
    Analogous: (μ' + λ'(μ, λ)^γ), (μ' + λ'(μ + λ)^γ), (μ', λ'(μ + λ)^γ)
    Heuristic 1/5 rule: if fewer than 1/5 of the children are better than their parents, then decrease the size of the mutations
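    The 1/5 rule is easy to state in code; below is a minimal (1+1)-ES sketch, where the adaptation factor 0.85 and the 10-trial window are common textbook choices rather than values from the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def f(x):                  # assumed example fitness (sphere), to be minimised
    return np.sum(x**2)

x, sigma = rng.normal(size=n), 1.0
successes, trials = 0, 0
for g in range(1000):
    child = x + sigma * rng.normal(size=n)   # mutate parent
    trials += 1
    if f(child) < f(x):                      # child better than parent
        x, successes = child, successes + 1
    if trials == 10:                         # adapt step size every 10 mutations
        if successes < 2:                    # success rate below 1/5: shrink steps
            sigma *= 0.85
        else:                                # success rate above 1/5: enlarge steps
            sigma /= 0.85
        successes, trials = 0, 0

print(f(x), sigma)
```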

  • Nested Evolution Strategy

    Hills are not independently distributed (hills of hills)
    Find a local maximum as a start state
    Generate 3 offspring populations (founder populations) that then evolve in isolation
    Local hill-climbing (if convergent: increase diversity of offspring populations)
    Select only the highest population
    A walking process from peak to peak within an "ordered hill scenery", named Meta-Evolution
    Takes the role of crossover in GA

    http://www.bionik.tu-berlin.de/intseit2/xs2mulmo.html

  • ES: Conclusion

    A class of metaheuristic search algorithms
    Adaptive parameters are important
    Relations to Gaussian adaptation
    Advanced ESs compare favourably to other metaheuristic algorithms (see www.lri.fr/~Hansen)
    Diversity of the population of solutions needs to be specifically considered
    See also www.scholarpedia.org/article/Evolution_strategies


  • Differential Evolution

    NP D-dimensional parameter vectors x_i,G; i = 1, 2, …, NP; G: generation counter
    Mutation: v_i,G+1 = x_r1,G + F · (x_r2,G − x_r3,G); F in [0, 2] (possible amplification of the differential variation; F is just a real number)
    r1, r2, r3: random indexes, mutually different and different from i
    Crossover: component j of the trial vector u_i,G+1 is taken from v_i,G+1 if randb(j) ≤ CR or j = rnbr(i), and from x_i,G otherwise; randb(j) in [0, 1], rnbr(i) a random coordinate index
    Selection: x_i,G+1 = u_i,G+1 if u_i,G+1 is better, otherwise x_i,G+1 = x_i,G

    Rainer Storn & Kenneth Price (1997) Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341–359
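    Putting mutation, binomial crossover and selection together gives the classic DE/rand/1/bin loop of the cited paper; a minimal sketch (the sphere objective and the bounds are assumed examples):

```python
import numpy as np

rng = np.random.default_rng(3)
D, NP, F, CR = 5, 50, 0.5, 0.1

def f(x):                                  # assumed example objective, minimised
    return np.sum(x**2)

pop = rng.uniform(-5, 5, size=(NP, D))     # NP D-dimensional vectors x_i,G
for G in range(200):
    for i in range(NP):
        # three mutually different indexes, all different from i
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # mutation
        jrand = rng.integers(D)                 # rnbr(i): forces one component from v
        cross = rng.random(D) <= CR             # randb(j) <= CR
        cross[jrand] = True
        u = np.where(cross, v, pop[i])          # binomial crossover
        if f(u) <= f(pop[i]):                   # selection: keep the better vector
            pop[i] = u

print(min(f(x) for x in pop))
```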

  • Differential Evolution

  • DE: Details

    Properties:
    Simple, very fast
    Reasonably good results
    Diversity increases in flat regions (divergence property)

    Parameters:
    NP = 5D (4 … 10D)
    CR = 0.1 (0 … 1.0)
    F = 0.5 (0.4 … 1.0)

    A proof exists that effectiveness requires F ≥ F_crit = √((1 − CR/2) / NP)
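    As a quick worked check (illustrative numbers, not from the slide): for D = 10 with the default NP = 5D = 50 and CR = 0.1, F_crit = √((1 − 0.05)/50) = √0.019 ≈ 0.14, so the recommended F = 0.5 lies well above the critical value.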

  • Search in Differential Evolution

    Rainer Storn (2008) Differential Evolution Research – Trends and Open Questions. Chapter 1 of Uday K. Chakraborty: Advances in Differential Evolution

  • Objective function used here (shown as a figure in the slides)

  • DE with Crossover

  • Invariant representations

    Crossover depends on the coordinate directions and is thus not rotationally invariant
    Using randomly rotated coordinate systems, the search becomes isotropic
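    One way to realise this (an illustrative sketch, not code from the slides) is to draw a random orthogonal matrix, perform the crossover in the rotated frame, and rotate the trial vector back:

```python
import numpy as np

rng = np.random.default_rng(4)
D, CR = 5, 0.5

# random rotation: QR decomposition of a Gaussian matrix gives orthogonal Q
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))

x, v = rng.normal(size=D), rng.normal(size=D)  # target and mutant vectors (examples)
xr, vr = Q @ x, Q @ v                          # rotate into random coordinates
cross = rng.random(D) <= CR
ur = np.where(cross, vr, xr)                   # binomial crossover along rotated axes
u = Q.T @ ur                                   # rotate the trial vector back

print(u)
```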

  • DE with Jitter

    Choose for each vector i and for each coordinate j a different random increment, e.g.:

  • DE: Variants

    Mutability and threshold parameters can also be evolved for each individual (as the step sizes in ES), i.e. the dimension becomes D + 2.

    Scheme for denoting DE variants: DE/x/y/z, where x names the vector to be perturbed (e.g. rand, best), y the number of difference vectors, and z the crossover scheme; e.g. best/2.

    A number of self-adapting variants also exist, cf. [Storn, 08].

  • Meta-Heuristic Search

    μετα "beyond", ευρισκειν "to find"
    Applied mainly to combinatorial optimization
    The user has to modify the algorithm to a greater or lesser extent in order to adapt it to the specific problem
    These algorithms seem to defy the no-free-lunch (NFL) theorem due to the combination of
    − biased choice of problems
    − user-generated modifications
    Can often be outperformed by a problem-dependent heuristic

  • The General Scheme

    1. Use populations of solutions/trials/individuals
    2. Transfer information in the population from the best individuals to others by selection + crossover/attraction
    3. Maintain diversity by adding noise/mutations/intrinsic dynamics/amplifying differences
    4. Avoid local minima (leapfrog/crossover/more noise/subpopulations/border of instability/checking success, random insertions)
    5. Whenever possible, use building blocks/partial solutions/royal road functions
    6. Store good solutions in memory as best-so-far/iteration best/individual best/elite/pheromones
    7. Use domain knowledge and intuition for encoding, initialization, termination, choice of the algorithm
    8. Tweak the parameters, develop your own variants

  • "Banal Metaheuristic" *** in three easy steps ***

    1. Call the user-provided state generator.
    2. Print the resulting state.
    3. Stop.

    Given any two distinct metaheuristics M and N, and almost any goal function f, it is usually possible to write a set of auxiliary procedures that will make M find the optimum much more efficiently than N, by many orders of magnitude; or vice versa. In fact, since the auxiliary procedures are usually unrestricted, one can submit the basic step of metaheuristic M as the generator or mutator for N.

    en.wikipedia.org/wiki/Metaheuristic

  • Contra

    The no-free-lunch theorem implies that there must be some implicit assumptions that single out "good" problems (one such assumption is the correlation between goal function values at nearby candidate solutions)
    If these assumptions were made explicit, more specific algorithms could be designed
    Random search often seems to be the essential component
    The quality of an ME algorithm is not well-defined because user-provided domain knowledge enters
    There are many "classical" problems which are fully understood and where ME algorithms perform comparatively poorly (LS is usually not state of the art)
    Dilettantism: a few hours of reading, thinking and programming can easily save months of computer time used up by ME

    en.wikipedia.org/wiki/Metaheuristic

  • Pro

    If you know a better solution, then why use ME? But if not, then why not?
    It's not just random search
    There are a number of applications where ME performs reasonably well
    Theoretical expertise, problem analysis, modeling and implementation are cost factors in real-world problems
    There are domains where modeling is questionable, but the combination of existing solutions is possible (minority games, e.g. aesthetic design, financial markets)
    Nature is an important source of inspiration
    It may help to understand decision making in nature and society

  • Ecological niches for MH algorithms

    PSO Mini Tutorial on Particle Swarm Optimisation (2004) [email protected]

  • Some of the dimensions of the problem space
