Upload
rohit-saini
View
215
Download
0
Embed Size (px)
Citation preview
8/3/2019 Ga and Mycin
1/5
A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution.
This heuristic is routinely used to generate useful solutions to optimization and searchproblems.
Genetic algorithms belong to the larger class ofevolutionary algorithms (EA), which generate
solutions to optimization problems using techniques inspired by natural evolution, such as
inheritance,mutation, selection, and crossover.
Methodology
In a genetic algorithm, a population of strings (called chromosomes or the genotype of thegenome), which encode candidate solutions(called individuals, creatures, orphenotypes) to an
optimization problem, evolves toward better solutions. Traditionally, solutions are represented in
binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually startsfrom a population of randomly generated individuals and happens in generations. In each
generation, the fitness of every individual in the population is evaluated, multiple individuals are
stochastically selected from the current population (based on their fitness), and modified
(recombined and possibly randomly mutated) to form a new population. The new population isthen used in the next iteration of thealgorithm. Commonly, the algorithm terminates when either
a maximum number of generations has been produced, or a satisfactory fitness level has been
reached for the population. If the algorithm has terminated due to a maximum number ofgenerations, a satisfactory solution may or may not have been reached.
Genetic algorithms find application in bioinformatics, phylogenetics, computational science,
engineering, economics, chemistry,manufacturing,mathematics,physics and other fields.
A typical genetic algorithm requires:
1. agenetic representation of the solution domain,
2. afitness function to evaluate the solution domain.
A standard representation of the solution is as an array of bits. Arrays of other types and
structures can be used in essentially the same way. The main property that makes these genetic
representations convenient is that their parts are easily aligned due to their fixed size, whichfacilitates simple crossover operations. Variable length representations may also be used, but
crossover implementation is more complex in this case. Tree-like representations are explored in
genetic programming and graph-form representations are explored inevolutionary programming.
The fitness function is defined over the genetic representation and measures the quality of therepresented solution. The fitness function is always problem dependent. For instance, in the
knapsack problem one wants to maximize the total value of objects that can be put in a knapsackof some fixed capacity. A representation of a solution might be an array of bits, where each bitrepresents a different object, and the value of the bit (0 or 1) represents whether or not the object
is in the knapsack. Not every such representation is valid, as the size of objects may exceed the
capacity of the knapsack. The fitness of the solution is the sum of values of all objects in theknapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even
impossible to define the fitness expression; in these cases, interactive genetic algorithms are
used.
http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Heuristichttp://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Problemhttp://en.wikipedia.org/wiki/Evolutionary_algorithmhttp://en.wikipedia.org/wiki/Heredityhttp://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genomehttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Phenotypehttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Computational_sciencehttp://en.wikipedia.org/wiki/Engineeringhttp://en.wikipedia.org/wiki/Economicshttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Mathematicshttp://en.wikipedia.org/wiki/Physicshttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Bit_arrayhttp://en.wikipedia.org/wiki/Genetic_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Knapsack_problemhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computationhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computationhttp://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Heuristichttp://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Problemhttp://en.wikipedia.org/wiki/Evolutionary_algorithmhttp://en.wikipedia.org/wiki/Heredityhttp://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genomehttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Phenotypehttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Computational_sciencehttp://en.wikipedia.org/wiki/Engineeringhttp://en.wikipedia.org/wiki/Economicshttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Mathematicshttp://en.wikipedia.org/wiki/Physicshttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Bit_arrayhttp://en.wikipedia.org/wiki/Genetic_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Knapsack_problemhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computation8/3/2019 Ga and Mycin
2/5
Once we have the genetic representation and the fitness function defined, GA proceeds to
initialize a population of solutions randomly, then improve it through repetitive application of
mutation, crossover, inversion and selection operators.
Initialization
Initially many individual solutions are randomly generated to form an initial population. The
population size depends on the nature of the problem, but typically contains several hundreds or
thousands of possible solutions. Traditionally, the population is generated randomly, coveringthe entire range of possible solutions (the search space). Occasionally, the solutions may be
"seeded" in areas where optimal solutions are likely to be found.
Selection
During each successive generation, a proportion of the existing population is selected to breed a
new generation. Individual solutions are selected through a fitness-based process, where fitter
solutions (as measured by a fitness function) are typically more likely to be selected. Certainselection methods rate the fitness of each solution and preferentially select the best solutions.
Other methods rate only a random sample of the population, as this process may be very time-consuming.
Reproduction
The next step is to generate a second generation population of solutions from those selected
through genetic operators: crossover(also called recombination), and/ormutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from
the pool selected previously. By producing a "child" solution using the above methods ofcrossover and mutation, a new solution is created which typically shares many of the
characteristics of its "parents". New parents are selected for each new child, and the processcontinues until a new population of solutions of appropriate size is generated. Although
reproduction methods that are based on the use of two parents are more "biology inspired", some
research[1][2]suggests more than two "parents" are better to be used to reproduce a good quality
chromosome.
These processes ultimately result in the next generation population of chromosomes that is
different from the initial generation. Generally the average fitness will have increased by this
procedure for the population, since only the best organisms from the first generation are selected
for breeding, along with a small proportion of less fit solutions, for reasons already mentionedabove.
Although Crossover and Mutation are known as the main genetic operators, it is possible to use
other operators such as regrouping, colonization-extinction, or migration in genetic algorithms. [3]
Termination
http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Genetic_operatorhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-0http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-2http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Genetic_operatorhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-0http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-28/3/2019 Ga and Mycin
3/5
This generational process is repeated until a termination condition has been reached. Common
terminating conditions are:
A solution is found that satisfies minimum criteria
Fixed number of generations reached
Allocated budget (computation time/money) reached The highest ranking solution's fitness is reaching or has reached a plateau such that
successive iterations no longer produce better results Manual inspection
Combinations of the above
Simple generational genetic algorithm procedure:
1. Choose the initialpopulation ofindividuals
2. Evaluate the fitness of each individual in that population
3. Repeat on this generationuntil termination (time limit, sufficient fitness achieved, etc.):
1.Select the best-fit individuals forreproduction
2. Breed new individuals through crossoverand mutation operations to give birth to
offspring
3. Evaluate the individual fitness of new individuals
4. Replace least-fit population with new individuals
The building block hypothesis
Genetic algorithms are simple to implement, but their behavior is difficult to understand. In
particular it is difficult to understand why these algorithms frequently succeed at generating
solutions of high fitness when applied to practical problems. The building block hypothesis
(BBH) consists of:
1. A description of a heuristic that performs adaptation by identifying and recombining
"building blocks", i.e. low order, low defining-length schemata with above average
fitness.2. A hypothesis that a genetic algorithm performs adaptation by implicitly and efficiently
implementing this heuristic.
Goldberg describes the heuristic as follows:
"Short, low order, and highly fit schemata are sampled, recombined [crossed over], and
resampled to form strings of potentially higher fitness. In a way, by working with theseparticular schemata [the building blocks], we have reduced the complexity of our
problem; instead of building high-performance strings by trying every conceivable
combination, we construct better and better strings from the best partial solutions of pastsamplings.
"Because highly fit schemata of low defining length and low order play such an
important role in the action of genetic algorithms, we have already given them a specialname: building blocks. Just as a child creates magnificent fortresses through the
http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Individualhttp://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Reproducehttp://en.wikipedia.org/wiki/Breedhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Offspringhttp://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Individualhttp://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Reproducehttp://en.wikipedia.org/wiki/Breedhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Offspringhttp://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)8/3/2019 Ga and Mycin
4/5
arrangement of simple blocks of wood, so does a genetic algorithm seek near optimal
performance through the juxtaposition of short, low-order, high-performance schemata,
or building blocks."[4]
.MYCIN: A Quick Case Study
No course on Expert systems is complete without a discussion of Mycin. As mentioned above,Mycin was one of the earliest expert systems, and its design has strongly influenced the design of
commercial expert systems and expert system shells.
Mycin was an expert system developed at Stanford in the 1970s. Its job was to diagnose andrecommend treatment for certain blood infections. To do the diagnosis ``properly'' involves
growing cultures of the infecting organism. Unfortunately this takes around 48 hours, and if
doctors waited until this was complete their patient might be dead! So, doctors have to come up
with quick guesses about likely problems from the available data, and use these guesses toprovide a ``covering'' treatment where drugs are given which should deal with any possible
problem.
Mycin was developed partly in order to explore how human experts make these rough (but
important) guesses based on partial information. However, the problem is also a potentiallyimportant one in practical terms - there are lots of junior or non-specialised doctors who
sometimes have to make such a rough diagnosis, and if there is an expert tool available to help
them then this might allow more effective treatment to be given. In fact, Mycin was neveractually used in practice. This wasn't because of any weakness in its performance - in tests it
outperformed members of the Stanford medical school. It was as much because of ethical and
legal issues related to the use of computers in medicine - if it gives the wrong diagnosis, who doyou sue?
Anyway Mycin represented its knowledge as a set of IF-THEN rules with certainty factors. The
following is an English version of one of Mycin's rules:
IF the infection is pimary-bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that infection is bacteroid.
The 0.7 is roughly the certainty that the conclusion will be true given the evidence. If the
evidence is uncertain the certainties of the bits of evidence will be combined with the certainty ofthe rule to give the certainty of the conclusion.
Mycin was written in Lisp, and its rules are formally represented as Lisp expressions. The action
part of the rule could just be a conclusion about the problem being solved, or it could be an
arbitary lisp expression. This allowed great flexibility, but removed some of the modularity and
clarity of rule-based systems, so using the facility had to be used with care.
http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-3http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-38/3/2019 Ga and Mycin
5/5