Genetic algorithms in Data Mining

An overview of genetic algorithms and their use in Data Mining.

GENETIC ALGORITHMS AND THEIR APPLICATIONS IN DATA MINING

UNIVERSITY SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

GENETIC ALGORITHMS

• DARWINIAN SELECTION: Survival of the fittest.

Understanding a GA means understanding the simple, iterative processes that underpin evolutionary change.

EXAMPLE: Finding the largest divisor of a big number. By applying Darwinian selection, only the best solutions remain, thus narrowing the search space.

BIOLOGICAL BACKGROUND (BASIC CONCEPTS)

• CHROMOSOME: A set of genes. A chromosome contains the solution in the form of genes.

• GENE: A part of a chromosome. A gene contains a part of the solution; the genes together determine the solution. E.g. 16743 is a chromosome and 1, 6, 7, 4 and 3 are its genes.
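The 16743 example can be written out in code. This is a minimal sketch, assuming a chromosome is stored as a string of decimal digits and each digit is one gene; the helper names `to_genes` and `to_chromosome` are hypothetical and chosen only for illustration.

```python
def to_genes(chromosome: str) -> list[int]:
    """Split a digit-string chromosome into its individual genes."""
    return [int(digit) for digit in chromosome]


def to_chromosome(genes: list[int]) -> str:
    """Join a list of single-digit genes back into a chromosome string."""
    return "".join(str(gene) for gene in genes)


genes = to_genes("16743")
print(genes)                 # [1, 6, 7, 4, 3]
print(to_chromosome(genes))  # 16743
```

Other encodings (bit strings, real-valued vectors) are equally possible; the digit encoding is used here only because it matches the example above.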

BIOLOGICAL BACKGROUND (BASIC CONCEPTS) CONTD..

• POPULATION: The number of individuals present, all with the same chromosome length.

• FITNESS: The value assigned to an individual, based on how far from or close to the solution that individual is. The greater the fitness value, the better the solution it contains.

BIOLOGICAL BACKGROUND (BASIC CONCEPTS) CONTD..

• FITNESS FUNCTION: A function that assigns a fitness value to an individual. It is problem-specific.
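As an illustration, a fitness function for the earlier "largest divisor" example might look like the sketch below. It assumes a chromosome is a list of decimal digits that decodes to a candidate divisor, and that only proper divisors score above zero. The names `decode` and `divisor_fitness` and the target number 360 are hypothetical.

```python
def decode(genes: list[int]) -> int:
    """Turn a list of digit genes into the integer it spells out."""
    return int("".join(str(gene) for gene in genes))


def divisor_fitness(genes: list[int], n: int) -> int:
    """Score a candidate proper divisor of n; a larger divisor scores higher.

    Candidates that do not divide n (or are 0, or equal n) get fitness 0.
    """
    candidate = decode(genes)
    if 1 <= candidate < n and n % candidate == 0:
        return candidate
    return 0


print(divisor_fitness([1, 8, 0], 360))  # 180
print(divisor_fitness([0, 0, 7], 360))  # 0, because 7 does not divide 360
```

This scoring is deliberately crude: every non-divisor gets the same fitness of 0, so it gives the search little guidance. A practical fitness function would usually grade near misses as well.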

FLOWCHART OF A GENETIC ALGORITHM
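A typical generational loop (initialise, evaluate, select, cross over, mutate, repeat) can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the toy "count the 1s" fitness, the bit-string encoding, fitness-proportionate selection, single-point crossover, and all parameter values are illustrative choices that do not come from the slides.

```python
import random


def one_max(chromosome):
    """Toy fitness (an assumption): the number of 1-genes in the chromosome."""
    return sum(chromosome)


def select(population, fitnesses, rng):
    """Pick one parent with probability proportional to fitness + 1.

    The +1 keeps the weights from all being zero.
    """
    weights = [fitness + 1 for fitness in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover producing one child."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def run_ga(length=20, pop_size=30, generations=50, rate=0.02, seed=0):
    rng = random.Random(seed)
    # 1. Initialise a random population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        # 2. Evaluate the fitness of every individual.
        fitnesses = [one_max(chromosome) for chromosome in population]
        # 3. Select parents, recombine them, and mutate the child.
        population = [
            mutate(crossover(select(population, fitnesses, rng),
                             select(population, fitnesses, rng), rng),
                   rate, rng)
            for _ in range(pop_size)
        ]
        # 4. Remember the best individual seen so far.
        best = max(population + [best], key=one_max)
    return best


best = run_ga()
print(one_max(best), best)
```

The loop stops after a fixed number of generations; stopping when the fitness stops improving is a common alternative.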

SELECTION

• Selection is the stage of a genetic algorithm in which individual genomes are chosen from a population for later breeding (recombination or crossover).

• We will discuss two techniques:

  • Roulette Wheel Selection

  • Rank Selection

ROULETTE WHEEL SELECTION

• Parents are selected according to their fitness.

• The better the chromosomes are, the greater their chance of being selected.

• Imagine a roulette wheel on which all the chromosomes in the population are placed, each occupying a slice whose size is proportional to its fitness.
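The wheel can be sketched in a few lines. This is a minimal sketch, assuming all fitness values are non-negative and at least one is positive; the function name `roulette_wheel_select` is hypothetical.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability fitness / total fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    # Spin the wheel: a point in [0, total).
    spin = rng.random() * total
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness  # right edge of this individual's slice
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(42)
picks = [roulette_wheel_select(["a", "b"], [1, 9], rng) for _ in range(1000)]
print(picks.count("a"), picks.count("b"))  # "b" is picked far more often
```

An individual with fitness 0 has a slice of width 0 and is never selected, which is one reason raw fitness values are sometimes replaced by ranks.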

RANK SELECTION

• Rank selection first ranks the population, and then every chromosome receives its fitness from this ranking.

• The worst will have fitness 1, the second worst 2, and so on; the best will have fitness N (the number of chromosomes in the population).
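The ranking step can be sketched as below. This is a minimal sketch, assuming ties are broken by position in the population and that the ranks are then used as selection weights; the names `rank_fitnesses` and `rank_select` are hypothetical.

```python
import random


def rank_fitnesses(fitnesses):
    """Replace raw fitness with rank: worst gets 1, best gets N."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its rank."""
    ranks = rank_fitnesses(fitnesses)
    return rng.choices(population, weights=ranks, k=1)[0]


print(rank_fitnesses([0.5, 100.0, 3.0]))  # [1, 3, 2]
```

Note how the individual with raw fitness 100.0 gets rank 3 rather than a weight 200 times larger than the worst: ranking keeps one very fit chromosome from dominating the wheel.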

OPERATORS

CROSSOVER

• Combine bits and pieces of good parents

• Speculate on new, possibly better children

• By itself, a random shuffle
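One common way to combine two parents is single-point crossover, sketched below. The slides do not say which crossover variant is meant, so this choice is an assumption, as is the function name `single_point_crossover`.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Cut both parents at one random point and swap the tails."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(parent_a) - 1)  # never cut at either end
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


child_1, child_2 = single_point_crossover([1, 6, 7, 4, 3], [9, 9, 9, 9, 9])
print(child_1, child_2)
```

Every gene in a child comes from one of the two parents at the same position, so crossover alone only reshuffles existing genetic material.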

BASIC CONCEPTS CONTD..

MUTATION

• Mutation is a random alteration of a string

• Change a gene: a small movement in the neighbourhood

• By itself, a random walk
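Mutation on a digit-string chromosome such as 16743 can be sketched as follows. This is a minimal sketch, assuming each gene is a decimal digit and is independently replaced with a small probability; the function name `mutate` and the rate 0.2 are hypothetical.

```python
import random


def mutate(chromosome, rate, rng=random):
    """Replace each digit gene, with probability `rate`, by a random digit.

    The replacement digit is drawn from 0-9 and may equal the original.
    """
    return [rng.randint(0, 9) if rng.random() < rate else gene
            for gene in chromosome]


print(mutate([1, 6, 7, 4, 3], rate=0.2))
```

With a low rate, most genes are copied unchanged, so the mutated chromosome stays close to the original.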

DATA MINING

The goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

EXAMPLE

OCCUPATION: POLITICIAN

BELONGS TO: ABC PARTY

AIM: WANTS TO CONTEST THE UPCOMING ELECTIONS IN BYTELAND.

PROBLEM: NOT SURE FROM WHERE HE SHOULD CONTEST THE ELECTIONS.

KNOWS DATA MINING …

COLLECTS DATA FROM ALL PREVIOUS STATE AND GENERAL ELECTIONS

PREPARES THE DATA: MAKES IT CONSISTENT AND NOISE-FREE

SPLITS IT INTO TWO EQUAL, MUTUALLY EXCLUSIVE SUBSETS: A TRAINING DATASET AND A TEST DATASET.

SETS THE PREDICTOR VARIABLES AS LITERACY RATE, LOCALITY, ETC. THE TARGET IS WHETHER A PERSON X WILL VOTE FOR PARTY ABC OR NOT.

Feeds the input and output variables of the training data into software that runs genetic algorithms on it.

The software processes the inputs of the training dataset and matches them against its outputs. E.g. after applying genetic operators, the software establishes two rules:

Rule 1: If voter X belongs to locality A, then he will vote for party ABC.

Rule 2: If voter X is literate and belongs to locality A, then he will vote for party ABC.

Feeds the input variables of the test data, applies the rules obtained from the GA, and checks whether the expected output matches the actual output. Keeps the rules that are validated.
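The validation step can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the five test records are made up, "validated" is taken to mean that a rule's prediction holds for at least 90% of the test records it applies to, and the names `confidence`, `RULES`, and `THRESHOLD` are hypothetical.

```python
# Made-up test records (NOT real election data):
# (locality, literate, voted_for_abc)
TEST_DATA = [
    ("A", True, True),
    ("A", False, False),
    ("B", True, False),
    ("A", True, True),
    ("B", False, False),
]


def rule_1(locality, literate):
    """Rule 1: the voter belongs to locality A."""
    return locality == "A"


def rule_2(locality, literate):
    """Rule 2: the voter is literate and belongs to locality A."""
    return literate and locality == "A"


def confidence(rule, records):
    """Fraction of records matching the rule's condition that voted for ABC."""
    matched = [voted for locality, literate, voted in records
               if rule(locality, literate)]
    return sum(matched) / len(matched) if matched else 0.0


RULES = {"rule_1": rule_1, "rule_2": rule_2}
THRESHOLD = 0.9  # assumed cut-off for calling a rule "validated"

kept = [name for name, rule in RULES.items()
        if confidence(rule, TEST_DATA) >= THRESHOLD]
print(kept)  # ['rule_2']
```

On this made-up data, Rule 1 holds for two of the three locality-A voters and is discarded, while Rule 2 holds for both literate locality-A voters and is kept.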

Now, after applying data mining and using genetic algorithms, the politician knows that his probability of winning is highest if he contests the election from a constituency which:

• Has the highest literacy rate

• Falls in locality A.

ADVANTAGES

• Concepts are easy to understand because the techniques resemble natural processes such as inheritance and mutation.

• Can be used where traditional search methods fail.

• Useful where search space is large, complex or poorly understood.

ADVANTAGES CONTD..

• Can provide several local optima as well as, when it is found, the global optimum.

• Solves problems with multiple solutions.

• Genetic algorithms are easily transferred to existing simulations and problems.

LIMITATIONS

• Because their fitness functions are poorly known, some optimization problems cannot be solved by genetic algorithms. These are called variant problems.

• There is no assurance of finding a global optimum. This failure happens very often when the population has a lot of individuals.

• Like other artificial intelligence techniques, the genetic algorithm cannot assure constant optimization response times.

LIMITATIONS CONTD..

• With genetic algorithms, the population as a whole improves, but the same cannot be said of every individual within it.

• The fitness function must be written accurately.

APPLICATIONS

• Optimization: GAs have been used in a wide variety of optimization tasks.

• Automatic Programming: for building computational structures like cellular automata and sorting networks.

• Machine and Robot Learning: used for classification and prediction, and protein structure prediction.

• Economic models: for the development of bidding strategies in emerging markets.

CONCLUSIONS

• Genetic algorithms are easy to apply to a wide range of problems, such as the travelling salesman problem (TSP) and concept learning.

• Results can be very good on some problems while rather poor on others.

• Using mutation alone makes the algorithm very slow; crossover makes it significantly faster.

CONCLUSIONS CONTD..

• They have applications in commercial, educational and scientific areas.

• Very useful where the developer does not have precise domain expertise, because of their ability to explore and learn from their domain.

THANK YOU
