Ga and Mycin

8/3/2019 Ga and Mycin

1/5

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution.

This heuristic is routinely used to generate useful solutions to optimization and searchproblems.

Genetic algorithms belong to the larger class ofevolutionary algorithms (EA), which generate

solutions to optimization problems using techniques inspired by natural evolution, such as

inheritance,mutation, selection, and crossover.

Methodology

In a genetic algorithm, a population of strings (called chromosomes or the genotype of thegenome), which encode candidate solutions(called individuals, creatures, orphenotypes) to an

optimization problem, evolves toward better solutions. Traditionally, solutions are represented in

binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually startsfrom a population of randomly generated individuals and happens in generations. In each

generation, the fitness of every individual in the population is evaluated, multiple individuals are

stochastically selected from the current population (based on their fitness), and modified

(recombined and possibly randomly mutated) to form a new population. The new population isthen used in the next iteration of thealgorithm. Commonly, the algorithm terminates when either

a maximum number of generations has been produced, or a satisfactory fitness level has been

reached for the population. If the algorithm has terminated due to a maximum number ofgenerations, a satisfactory solution may or may not have been reached.

Genetic algorithms find application in bioinformatics, phylogenetics, computational science,

engineering, economics, chemistry,manufacturing,mathematics,physics and other fields.

A typical genetic algorithm requires:

1. agenetic representation of the solution domain,

2. afitness function to evaluate the solution domain.

A standard representation of the solution is as an array of bits. Arrays of other types and

structures can be used in essentially the same way. The main property that makes these genetic

representations convenient is that their parts are easily aligned due to their fixed size, whichfacilitates simple crossover operations. Variable length representations may also be used, but

crossover implementation is more complex in this case. Tree-like representations are explored in

genetic programming and graph-form representations are explored inevolutionary programming.

The fitness function is defined over the genetic representation and measures the quality of therepresented solution. The fitness function is always problem dependent. For instance, in the

knapsack problem one wants to maximize the total value of objects that can be put in a knapsackof some fixed capacity. A representation of a solution might be an array of bits, where each bitrepresents a different object, and the value of the bit (0 or 1) represents whether or not the object

is in the knapsack. Not every such representation is valid, as the size of objects may exceed the

capacity of the knapsack. The fitness of the solution is the sum of values of all objects in theknapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even

impossible to define the fitness expression; in these cases, interactive genetic algorithms are

used.
http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Heuristichttp://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Problemhttp://en.wikipedia.org/wiki/Evolutionary_algorithmhttp://en.wikipedia.org/wiki/Heredityhttp://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genomehttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Phenotypehttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Computational_sciencehttp://en.wikipedia.org/wiki/Engineeringhttp://en.wikipedia.org/wiki/Economicshttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Mathematicshttp://en.wikipedia.org/wiki/Physicshttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Bit_arrayhttp://en.wikipedia.org/wiki/Genetic_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Knapsack_problemhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computationhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computationhttp://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Heuristichttp://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Search_algorithmhttp://en.wikipedia.org/wiki/Problemhttp://en.wikipedia.org/wiki/Evolutionary_algorithmhttp://en.wikipedia.org/wiki/Heredityhttp://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Chromosome_(genetic_algorithm)http://en.wikipedia.org/wiki/Genotypehttp://en.wikipedia.org/wiki/Genomehttp://en.wikipedia.org/wiki/Candidate_solutionhttp://en.wikipedia.org/wiki/Phenotypehttp://en.wikipedia.org/wiki/Stochasticshttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Bioinformaticshttp://en.wikipedia.org/wiki/Phylogeneticshttp://en.wikipedia.org/wiki/Computational_sciencehttp://en.wikipedia.org/wiki/Engineeringhttp://en.wikipedia.org/wiki/Economicshttp://en.wikipedia.org/wiki/Chemistryhttp://en.wikipedia.org/wiki/Manufacturinghttp://en.wikipedia.org/wiki/Mathematicshttp://en.wikipedia.org/wiki/Physicshttp://en.wikipedia.org/wiki/Genetic_representationhttp://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Bit_arrayhttp://en.wikipedia.org/wiki/Genetic_programminghttp://en.wikipedia.org/wiki/Evolutionary_programminghttp://en.wikipedia.org/wiki/Knapsack_problemhttp://en.wikipedia.org/wiki/Interactive_evolutionary_computation


2/5

Once we have the genetic representation and the fitness function defined, GA proceeds to

initialize a population of solutions randomly, then improve it through repetitive application of

mutation, crossover, inversion and selection operators.

Initialization

Initially many individual solutions are randomly generated to form an initial population. The

population size depends on the nature of the problem, but typically contains several hundreds or

thousands of possible solutions. Traditionally, the population is generated randomly, coveringthe entire range of possible solutions (the search space). Occasionally, the solutions may be

"seeded" in areas where optimal solutions are likely to be found.

Selection

During each successive generation, a proportion of the existing population is selected to breed a

new generation. Individual solutions are selected through a fitness-based process, where fitter

solutions (as measured by a fitness function) are typically more likely to be selected. Certainselection methods rate the fitness of each solution and preferentially select the best solutions.

Other methods rate only a random sample of the population, as this process may be very time-consuming.

Reproduction

The next step is to generate a second generation population of solutions from those selected

through genetic operators: crossover(also called recombination), and/ormutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding from

the pool selected previously. By producing a "child" solution using the above methods ofcrossover and mutation, a new solution is created which typically shares many of the

characteristics of its "parents". New parents are selected for each new child, and the processcontinues until a new population of solutions of appropriate size is generated. Although

reproduction methods that are based on the use of two parents are more "biology inspired", some

research[1][2]suggests more than two "parents" are better to be used to reproduce a good quality

chromosome.

These processes ultimately result in the next generation population of chromosomes that is

different from the initial generation. Generally the average fitness will have increased by this

procedure for the population, since only the best organisms from the first generation are selected

for breeding, along with a small proportion of less fit solutions, for reasons already mentionedabove.

Although Crossover and Mutation are known as the main genetic operators, it is possible to use

other operators such as regrouping, colonization-extinction, or migration in genetic algorithms. [3]

Termination
http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Genetic_operatorhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-0http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-2http://en.wikipedia.org/wiki/Selection_(genetic_algorithm)http://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Fitness_functionhttp://en.wikipedia.org/wiki/Genetic_operatorhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-0http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-1http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-2


3/5

This generational process is repeated until a termination condition has been reached. Common

terminating conditions are:

A solution is found that satisfies minimum criteria

Fixed number of generations reached

Allocated budget (computation time/money) reached The highest ranking solution's fitness is reaching or has reached a plateau such that

successive iterations no longer produce better results Manual inspection

Combinations of the above

Simple generational genetic algorithm procedure:

1. Choose the initialpopulation ofindividuals

2. Evaluate the fitness of each individual in that population

3. Repeat on this generationuntil termination (time limit, sufficient fitness achieved, etc.):

1.Select the best-fit individuals forreproduction

2. Breed new individuals through crossoverand mutation operations to give birth to

offspring

3. Evaluate the individual fitness of new individuals

4. Replace least-fit population with new individuals

The building block hypothesis

Genetic algorithms are simple to implement, but their behavior is difficult to understand. In

particular it is difficult to understand why these algorithms frequently succeed at generating

solutions of high fitness when applied to practical problems. The building block hypothesis

(BBH) consists of:

1. A description of a heuristic that performs adaptation by identifying and recombining

"building blocks", i.e. low order, low defining-length schemata with above average

fitness.2. A hypothesis that a genetic algorithm performs adaptation by implicitly and efficiently

implementing this heuristic.

Goldberg describes the heuristic as follows:

"Short, low order, and highly fit schemata are sampled, recombined [crossed over], and

resampled to form strings of potentially higher fitness. In a way, by working with theseparticular schemata [the building blocks], we have reduced the complexity of our

problem; instead of building high-performance strings by trying every conceivable

combination, we construct better and better strings from the best partial solutions of pastsamplings.

"Because highly fit schemata of low defining length and low order play such an

important role in the action of genetic algorithms, we have already given them a specialname: building blocks. Just as a child creates magnificent fortresses through the
http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Individualhttp://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Reproducehttp://en.wikipedia.org/wiki/Breedhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Offspringhttp://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Populationhttp://en.wikipedia.org/wiki/Individualhttp://en.wikipedia.org/wiki/Fitness_(biology)http://en.wikipedia.org/wiki/Generationhttp://en.wikipedia.org/wiki/Reproducehttp://en.wikipedia.org/wiki/Breedhttp://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)http://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)http://en.wikipedia.org/wiki/Offspringhttp://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Schema_(genetic_algorithms)http://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)


4/5

arrangement of simple blocks of wood, so does a genetic algorithm seek near optimal

performance through the juxtaposition of short, low-order, high-performance schemata,

or building blocks."[4]

.MYCIN: A Quick Case Study

No course on Expert systems is complete without a discussion of Mycin. As mentioned above,Mycin was one of the earliest expert systems, and its design has strongly influenced the design of

commercial expert systems and expert system shells.

Mycin was an expert system developed at Stanford in the 1970s. Its job was to diagnose andrecommend treatment for certain blood infections. To do the diagnosis ``properly'' involves

growing cultures of the infecting organism. Unfortunately this takes around 48 hours, and if

doctors waited until this was complete their patient might be dead! So, doctors have to come up

with quick guesses about likely problems from the available data, and use these guesses toprovide a ``covering'' treatment where drugs are given which should deal with any possible

problem.

Mycin was developed partly in order to explore how human experts make these rough (but

important) guesses based on partial information. However, the problem is also a potentiallyimportant one in practical terms - there are lots of junior or non-specialised doctors who

sometimes have to make such a rough diagnosis, and if there is an expert tool available to help

them then this might allow more effective treatment to be given. In fact, Mycin was neveractually used in practice. This wasn't because of any weakness in its performance - in tests it

outperformed members of the Stanford medical school. It was as much because of ethical and

legal issues related to the use of computers in medicine - if it gives the wrong diagnosis, who doyou sue?

Anyway Mycin represented its knowledge as a set of IF-THEN rules with certainty factors. The

following is an English version of one of Mycin's rules:

IF the infection is pimary-bacteremia

AND the site of the culture is one of the sterile sites

AND the suspected portal of entry is the gastrointestinal tract

THEN there is suggestive evidence (0.7) that infection is bacteroid.

The 0.7 is roughly the certainty that the conclusion will be true given the evidence. If the

evidence is uncertain the certainties of the bits of evidence will be combined with the certainty ofthe rule to give the certainty of the conclusion.

Mycin was written in Lisp, and its rules are formally represented as Lisp expressions. The action

part of the rule could just be a conclusion about the problem being solved, or it could be an

arbitary lisp expression. This allowed great flexibility, but removed some of the modularity and

clarity of rule-based systems, so using the facility had to be used with care.
http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-3http://en.wikipedia.org/wiki/Genetic_algorithm#cite_note-3


5/5

Documents

Ga and Mycin