Ga and Mycin


  • 8/3/2019 Ga and Mycin

    A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution.
    This heuristic is routinely used to generate useful solutions to optimization and search
    problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA),
    which generate solutions to optimization problems using techniques inspired by natural
    evolution, such as inheritance, mutation, selection, and crossover.

    Methodology

    In a genetic algorithm, a population of strings (called chromosomes or the genotype of the
    genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an
    optimization problem, evolves toward better solutions. Traditionally, solutions are represented
    in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually
    starts from a population of randomly generated individuals and happens in generations. In each
    generation, the fitness of every individual in the population is evaluated, and multiple
    individuals are stochastically selected from the current population (based on their fitness)
    and modified (recombined and possibly randomly mutated) to form a new population. The new
    population is then used in the next iteration of the algorithm. Commonly, the algorithm
    terminates when either a maximum number of generations has been produced, or a satisfactory
    fitness level has been reached for the population. If the algorithm has terminated due to a
    maximum number of generations, a satisfactory solution may or may not have been reached.

    Genetic algorithms find application in bioinformatics, phylogenetics, computational science,
    engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

    A typical genetic algorithm requires:

    1. a genetic representation of the solution domain,

    2. a fitness function to evaluate the solution domain.

    A standard representation of the solution is as an array of bits. Arrays of other types and
    structures can be used in essentially the same way. The main property that makes these genetic
    representations convenient is that their parts are easily aligned due to their fixed size, which
    facilitates simple crossover operations. Variable length representations may also be used, but
    crossover implementation is more complex in this case. Tree-like representations are explored in
    genetic programming, and graph-form representations are explored in evolutionary programming.

    The fitness function is defined over the genetic representation and measures the quality of the
    represented solution. The fitness function is always problem dependent. For instance, in the
    knapsack problem one wants to maximize the total value of objects that can be put in a knapsack
    of some fixed capacity. A representation of a solution might be an array of bits, where each bit
    represents a different object, and the value of the bit (0 or 1) represents whether or not the
    object is in the knapsack. Not every such representation is valid, as the size of objects may
    exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all
    objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is
    hard or even impossible to define the fitness expression; in these cases, interactive genetic
    algorithms are used.
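
    The knapsack fitness described above can be sketched in a few lines of Python. The function
    name, argument names and the three-object instance are hypothetical and chosen only for
    illustration; the rule itself (sum of values if the selection fits, 0 otherwise) follows the
    text.

```python
from typing import Sequence


def knapsack_fitness(bits: Sequence[int], values: Sequence[float],
                     sizes: Sequence[float], capacity: float) -> float:
    """Sum of the values of the selected objects, or 0 if they do not fit."""
    total_size = sum(size for bit, size in zip(bits, sizes) if bit)
    if total_size > capacity:
        return 0  # invalid representation: selected objects exceed the capacity
    return sum(value for bit, value in zip(bits, values) if bit)


# Hypothetical instance: three objects, knapsack capacity 50.
values = [60, 100, 120]
sizes = [10, 20, 30]
print(knapsack_fitness([0, 1, 1], values, sizes, capacity=50))  # objects 2 and 3 fit
print(knapsack_fitness([1, 1, 1], values, sizes, capacity=50))  # total size 60 is too large
```

    Scoring every invalid selection as 0 is the simplest choice; it gives the search no hint of how
    far an invalid solution is from being valid.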

    Once the genetic representation and the fitness function are defined, a GA proceeds to
    initialize a population of solutions randomly, then improves it through repeated application of
    mutation, crossover, inversion and selection operators.

    Initialization

    Initially many individual solutions are randomly generated to form an initial population. The
    population size depends on the nature of the problem, but typically contains several hundred or
    several thousand possible solutions. Traditionally, the population is generated randomly,
    covering the entire range of possible solutions (the search space). Occasionally, the solutions
    may be "seeded" in areas where optimal solutions are likely to be found.

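    As a minimal sketch of the initialization step, assuming bit-string chromosomes: the function
    name init_population and its parameters are hypothetical. Optional seed individuals are placed
    first and the rest of the population is filled with uniformly random bit strings.

```python
import random


def init_population(pop_size, chromosome_length, seeds=(), rng=None):
    """Build an initial population of bit lists.

    seeds: optional hand-picked individuals (the "seeded" case);
    the remaining slots are filled with random individuals.
    """
    rng = rng or random.Random()
    population = [list(seed) for seed in seeds][:pop_size]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(chromosome_length)])
    return population


# 20 individuals of 8 bits each, one of them seeded by hand.
population = init_population(20, 8, seeds=[[1] * 8], rng=random.Random(0))
print(len(population), population[0])
```
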
    Selection

    During each successive generation, a proportion of the existing population is selected to breed
    a new generation. Individual solutions are selected through a fitness-based process, where
    fitter solutions (as measured by a fitness function) are typically more likely to be selected.
    Certain selection methods rate the fitness of each solution and preferentially select the best
    solutions. Other methods rate only a random sample of the population, because rating every
    solution may be very time-consuming.
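
    The two families of selection methods can be illustrated as follows. This is a sketch, not a
    prescribed method: fitness-proportionate ("roulette wheel") selection stands in for methods
    that rate every solution, and tournament selection for methods that rate only a random sample.
    Function and parameter names are hypothetical, and fitness values are assumed to be
    non-negative.

```python
import random


def roulette_select(population, fitnesses, k, rng=None):
    """Pick k individuals with probability proportional to fitness."""
    rng = rng or random.Random()
    if sum(fitnesses) == 0:
        # No individual has positive fitness: fall back to a uniform choice.
        return rng.choices(population, k=k)
    return rng.choices(population, weights=fitnesses, k=k)


def tournament_select(population, fitness_fn, k, sample_size=3, rng=None):
    """Pick k individuals; each is the fittest of a small random sample."""
    rng = rng or random.Random()
    winners = []
    for _ in range(k):
        contestants = rng.sample(population, sample_size)
        winners.append(max(contestants, key=fitness_fn))
    return winners


population = [[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1]]
fitnesses = [sum(ind) for ind in population]  # toy fitness: number of 1 bits
print(roulette_select(population, fitnesses, k=2, rng=random.Random(1)))
print(tournament_select(population, sum, k=2, sample_size=2, rng=random.Random(1)))
```

    Tournament selection evaluates only sample_size individuals per pick, which is the saving the
    text refers to when fitness evaluation is expensive.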

    Reproduction

    The next step is to generate a second generation population of solutions from those selected
    through genetic operators: crossover (also called recombination), and/or mutation.

    For each new solution to be produced, a pair of "parent" solutions is selected for breeding from
    the pool selected previously. By producing a "child" solution using the above methods of
    crossover and mutation, a new solution is created which typically shares many of the
    characteristics of its "parents". New parents are selected for each new child, and the process
    continues until a new population of solutions of appropriate size is generated. Although
    reproduction methods that are based on the use of two parents are more "biology inspired", some
    research[1][2] suggests that using more than two "parents" can produce better quality
    chromosomes.

    These processes ultimately result in a next generation population of chromosomes that is
    different from the initial generation. Generally the average fitness of the population will
    have increased by this procedure, since mostly the best organisms from the first generation are
    selected for breeding, along with a small proportion of less fit solutions, which helps
    preserve diversity in the population.

    Although crossover and mutation are known as the main genetic operators, it is possible to use
    other operators such as regrouping, colonization-extinction, or migration in genetic
    algorithms.[3]
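
    The two main operators can be sketched for fixed-length bit lists as follows. Single-point
    crossover and per-bit flip mutation are used here as common, simple variants; the text does not
    prescribe a particular variant, and the function names and the default mutation rate are
    hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=None):
    """Child takes the head of parent_a and the tail of parent_b."""
    rng = rng or random.Random()
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate=0.01, rng=None):
    """Flip each bit independently with probability `rate`."""
    rng = rng or random.Random()
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(0)
child = single_point_crossover([0] * 6, [1] * 6, rng)
print(child)                       # some zeros followed by some ones
print(mutate(child, rate=0.2, rng=rng))
```

    Because both parents have the same fixed length, the cut point lines up in both of them, which
    is the alignment property mentioned earlier for array representations.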

    Termination

    This generational process is repeated until a termination condition has been reached. Common

    terminating conditions are:

    A solution is found that satisfies minimum criteria

    Fixed number of generations reached

    Allocated budget (computation time/money) reached

    The highest ranking solution's fitness is reaching or has reached a plateau such that
    successive iterations no longer produce better results

    Manual inspection

    Combinations of the above

    Simple generational genetic algorithm procedure:

    1. Choose the initial population of individuals

    2. Evaluate the fitness of each individual in that population

    3. Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):

        1. Select the best-fit individuals for reproduction

        2. Breed new individuals through crossover and mutation operations to give birth to
           offspring

        3. Evaluate the individual fitness of new individuals

        4. Replace least-fit population with new individuals
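
    The procedure above can be sketched as one short function. Everything specific here is an
    assumption made for illustration: the name run_ga, the parameter defaults, the choice of the
    top half of the population as parents, single-point crossover, bit-flip mutation, and the toy
    fitness (number of 1 bits). The comments map each part to the numbered steps.

```python
import random


def run_ga(fitness, length, pop_size=30, n_offspring=10,
           max_generations=200, target=None, mutation_rate=0.02, seed=0):
    """Return (best_individual, best_fitness) for bit lists of the given length."""
    rng = random.Random(seed)

    def by_fitness(pairs):
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)

    # 1. Choose the initial population of individuals.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    # 2. Evaluate the fitness of each individual.
    scored = by_fitness((fitness(ind), ind) for ind in population)

    # 3. Repeat until termination (generation limit or sufficient fitness).
    for _ in range(max_generations):
        if target is not None and scored[0][0] >= target:
            break
        # 3.1 Select the best-fit individuals for reproduction.
        parents = [ind for _, ind in scored[:pop_size // 2]]
        # 3.2 Breed new individuals through crossover and mutation.
        offspring = []
        for _ in range(n_offspring):
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, length - 1)
            child = a[:point] + b[point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # 3.3 Evaluate the fitness of the new individuals.
        scored_offspring = [(fitness(child), child) for child in offspring]
        # 3.4 Replace the least-fit individuals with the new ones.
        scored = by_fitness(scored[:pop_size - n_offspring] + scored_offspring)

    best_fitness, best = scored[0]
    return best, best_fitness


# Toy problem: maximize the number of 1 bits in a 12-bit string.
best, score = run_ga(sum, length=12, target=12)
print(best, score)
```

    Because only the least-fit individuals are replaced, the best fitness in the population never
    decreases from one generation to the next in this sketch.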

    The building block hypothesis

    Genetic algorithms are simple to implement, but their behavior is difficult to understand. In

    particular it is difficult to understand why these algorithms frequently succeed at generating

    solutions of high fitness when applied to practical problems. The building block hypothesis

    (BBH) consists of:

    1. A description of a heuristic that performs adaptation by identifying and recombining
       "building blocks", i.e. low order, low defining-length schemata with above average
       fitness.

    2. A hypothesis that a genetic algorithm performs adaptation by implicitly and efficiently
       implementing this heuristic.

    Goldberg describes the heuristic as follows:

    "Short, low order, and highly fit schemata are sampled, recombined [crossed over], and
    resampled to form strings of potentially higher fitness. In a way, by working with these
    particular schemata [the building blocks], we have reduced the complexity of our
    problem; instead of building high-performance strings by trying every conceivable
    combination, we construct better and better strings from the best partial solutions of past
    samplings.

    "Because highly fit schemata of low defining length and low order play such an
    important role in the action of genetic algorithms, we have already given them a special
    name: building blocks. Just as a child creates magnificent fortresses through the

    arrangement of simple blocks of wood, so does a genetic algorithm seek near optimal

    performance through the juxtaposition of short, low-order, high-performance schemata,

    or building blocks."[4]
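
    The terms "order" and "defining length" used in the hypothesis can be made concrete with a
    small sketch. It assumes the usual notation for schemata over bit strings, where each position
    is 0, 1 or the wildcard *: the order is the number of fixed (non-wildcard) positions, and the
    defining length is the distance between the first and last fixed positions. The function names
    are hypothetical.

```python
def schema_order(schema: str) -> int:
    """Number of fixed (non-wildcard) positions in the schema."""
    return sum(1 for symbol in schema if symbol != "*")


def defining_length(schema: str) -> int:
    """Distance between the first and the last fixed position."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema: str, bits: str) -> bool:
    """True if the bit string is an instance of the schema."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits))


schema = "1**0*"
print(schema_order(schema))       # two fixed positions
print(defining_length(schema))    # positions 0 and 3 are three apart
print(matches(schema, "10101"))   # first bit 1 and fourth bit 0
```

    A schema with a short defining length is less likely to be split by a single crossover cut,
    which is why such schemata are the natural candidates for "building blocks".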

    MYCIN: A Quick Case Study

    No course on expert systems is complete without a discussion of Mycin. As mentioned above,
    Mycin was one of the earliest expert systems, and its design has strongly influenced the design
    of commercial expert systems and expert system shells.

    Mycin was an expert system developed at Stanford in the 1970s. Its job was to diagnose and
    recommend treatment for certain blood infections. To do the diagnosis "properly" involves
    growing cultures of the infecting organism. Unfortunately this takes around 48 hours, and if
    doctors waited until this was complete their patient might be dead! So, doctors have to come up
    with quick guesses about likely problems from the available data, and use these guesses to
    provide a "covering" treatment where drugs are given which should deal with any possible
    problem.

    Mycin was developed partly in order to explore how human experts make these rough (but
    important) guesses based on partial information. However, the problem is also a potentially
    important one in practical terms - there are lots of junior or non-specialised doctors who
    sometimes have to make such a rough diagnosis, and if there is an expert tool available to help
    them then this might allow more effective treatment to be given. In fact, Mycin was never
    actually used in practice. This wasn't because of any weakness in its performance - in tests it
    outperformed members of the Stanford medical school. It was as much because of ethical and
    legal issues related to the use of computers in medicine - if it gives the wrong diagnosis, who
    do you sue?

    Anyway, Mycin represented its knowledge as a set of IF-THEN rules with certainty factors. The
    following is an English version of one of Mycin's rules:

    IF the infection is primary-bacteremia

    AND the site of the culture is one of the sterile sites

    AND the suspected portal of entry is the gastrointestinal tract

    THEN there is suggestive evidence (0.7) that infection is bacteroid.

    The 0.7 is roughly the certainty that the conclusion will be true given the evidence. If the
    evidence is uncertain the certainties of the bits of evidence will be combined with the
    certainty of the rule to give the certainty of the conclusion.
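
    How the certainties might be combined can be sketched in Python. This is an illustration, not
    Mycin's actual Lisp code: it assumes the combination scheme usually described for Mycin, in
    which the certainty of AND-ed premises is the minimum of their individual certainties and the
    certainty of the conclusion is that minimum multiplied by the rule's certainty factor. The
    Rule class, the conclude function and the premise strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    premises: List[str]   # conditions joined by AND
    conclusion: str
    certainty: float      # certainty factor attached to the rule


def conclude(rule: Rule, evidence: Dict[str, float]) -> float:
    """Certainty of the rule's conclusion given certainties of the evidence.

    Assumed scheme: min over the AND-ed premises, times the rule's factor.
    Premises with no recorded evidence count as certainty 0.
    """
    premise_certainty = min(evidence.get(p, 0.0) for p in rule.premises)
    return premise_certainty * rule.certainty


rule = Rule(
    premises=[
        "infection is primary-bacteremia",
        "site of culture is a sterile site",
        "portal of entry is the gastrointestinal tract",
    ],
    conclusion="infection is bacteroid",
    certainty=0.7,
)

# Fully certain evidence: the conclusion gets the rule's own certainty.
print(conclude(rule, {p: 1.0 for p in rule.premises}))

# Uncertain evidence: min(0.8, 1.0, 0.6) times the rule's 0.7.
uncertain = dict(zip(rule.premises, [0.8, 1.0, 0.6]))
print(conclude(rule, uncertain))
```
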

    Mycin was written in Lisp, and its rules are formally represented as Lisp expressions. The
    action part of the rule could just be a conclusion about the problem being solved, or it could
    be an arbitrary Lisp expression. This allowed great flexibility, but removed some of the
    modularity and clarity of rule-based systems, so the facility had to be used with care.
