GENETIC ALGORITHMS AND THEIR APPLICATIONS IN DATA MINING UNIVERSITY SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

Genetic algorithms in Data Mining

DESCRIPTION

An overview of genetic algorithms and their use in Data Mining.

Page 1: Genetic algorithms in Data Mining

GENETIC ALGORITHMS

AND THEIR APPLICATIONS IN DATA MINING

UNIVERSITY SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

Page 2: Genetic algorithms in Data Mining

GENETIC ALGORITHMS

• DARWINIAN SELECTION: Survival of the fittest. Understanding a GA means understanding the simple, iterative processes that underpin evolutionary change.

EXAMPLE: Finding the largest divisor of a big number. By applying Darwinian selection, only the best candidate solutions remain, which narrows the search space.

Page 3: Genetic algorithms in Data Mining

BIOLOGICAL BACKGROUND (BASIC CONCEPTS)

• CHROMOSOME: A set of genes. A chromosome encodes a candidate solution in the form of genes.

• GENE: A part of a chromosome. Each gene encodes one part of the solution. E.g., 16743 is a chromosome and 1, 6, 7, 4 and 3 are its genes.
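The 16743 example can be written out as a small sketch. The helper names `to_genes` and `to_chromosome` are hypothetical, and storing the chromosome as a string of decimal digits is an assumption made for illustration.

```python
def to_genes(chromosome: str) -> list[int]:
    """Split a digit-string chromosome into its individual genes."""
    return [int(digit) for digit in chromosome]


def to_chromosome(genes: list[int]) -> str:
    """Join single-digit genes back into one chromosome string."""
    return "".join(str(gene) for gene in genes)


print(to_genes("16743"))               # [1, 6, 7, 4, 3]
print(to_chromosome([1, 6, 7, 4, 3]))  # 16743
```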

Page 4: Genetic algorithms in Data Mining

BIOLOGICAL BACKGROUND (BASIC CONCEPTS) CONTD..

• POPULATION: The set of individuals present, all with chromosomes of the same length.

• FITNESS: The value assigned to an individual, based on how far from or close to the solution that individual is. The greater the fitness value, the better the solution it contains.

Page 5: Genetic algorithms in Data Mining

BIOLOGICAL BACKGROUND (BASIC CONCEPTS) CONTD..

• FITNESS FUNCTION: The function that assigns a fitness value to each individual. It is problem-specific.
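As a minimal sketch of a problem-specific fitness function, here is one possible scoring for the largest-divisor example mentioned earlier. The function name `divisor_fitness` and the scoring rule (an exact proper divisor scores its own value, everything else scores 0) are assumptions for illustration; the slides do not specify the function.

```python
def divisor_fitness(candidate: int, n: int) -> int:
    """Score a candidate proper divisor of n.

    Exact divisors score their own value, so larger divisors are fitter;
    non-divisors (and n itself) score 0.
    """
    if 1 <= candidate < n and n % candidate == 0:
        return candidate
    return 0


for candidate in (5, 12, 18):
    print(candidate, divisor_fitness(candidate, 36))
```

With this scoring, 18 is fitter than 12 for n = 36, and 5 scores 0 because it does not divide 36.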

Page 6: Genetic algorithms in Data Mining

FLOWCHART OF A GENETIC ALGORITHM
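A common form of the loop that such a flowchart depicts (initialise, evaluate, select, cross over, mutate, repeat) can be sketched as follows. The toy problem (maximise the number of 1-bits), the function name `run_ga`, every parameter value, and the choice to carry the best individual into the next generation are assumptions for illustration, not taken from the slides.

```python
import random


def run_ga(chrom_len=20, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Run a small GA on a toy problem and return (best, best-fitness history)."""
    rng = random.Random(seed)
    fitness = sum  # toy fitness: the number of 1-bits in the chromosome

    # 1. Initialise a random population of fixed-length chromosomes.
    population = [[rng.randint(0, 1) for _ in range(chrom_len)]
                  for _ in range(pop_size)]
    history = []

    for _ in range(generations):
        # 2. Evaluate every individual.
        scores = [fitness(individual) for individual in population]
        best = max(population, key=fitness)
        history.append(fitness(best))

        # 3. Build the next generation, keeping a copy of the current best.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            # Selection: fitness-proportionate; +1 keeps every weight positive.
            weights = [score + 1 for score in scores]
            mum, dad = rng.choices(population, weights=weights, k=2)
            child = mum[:]
            # Crossover: splice the parents at a single cut point.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, chrom_len - 1)
                child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        population = next_population

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(sum(best), history[0], history[-1])
```

Because the best individual is copied forward unchanged, the best fitness in `history` never decreases from one generation to the next.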

Page 7: Genetic algorithms in Data Mining

SELECTION

• Selection is the stage of a genetic algorithm in which individual genomes are chosen from a population for later breeding (recombination or crossover).

• We will discuss two techniques:

• Roulette Wheel Selection

• Rank Selection

Page 8: Genetic algorithms in Data Mining

ROULETTE WHEEL SELECTION

• Parents are selected according to their fitness.

• The fitter a chromosome is, the greater its chance of being selected.

• Imagine a roulette wheel on which every chromosome in the population is placed, each occupying a slice whose size is proportional to its fitness.
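The wheel can be sketched as a single spin over cumulative fitness. The function name `roulette_select` is hypothetical, and the sketch assumes non-negative fitness values with a positive total.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("roulette selection needs a positive total fitness")
    spin = rng.random() * total  # where the ball lands, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit  # right-hand edge of this individual's slice
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(42)
picks = [roulette_select(["A", "B"], [1, 9], rng) for _ in range(1000)]
print(picks.count("A"), picks.count("B"))
```

With fitness values 1 and 9, individual B occupies nine tenths of the wheel, so it should be picked roughly nine times as often as A.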

Page 9: Genetic algorithms in Data Mining

RANK SELECTION

• Rank selection first ranks the population and then every chromosome receives fitness from this ranking.

• The worst will have fitness 1, the second worst 2, and so on; the best will have fitness N (the number of chromosomes in the population).
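A minimal sketch of this ranking step follows. The names `rank_fitness` and `rank_select` are hypothetical, ties are broken by position in the list (an assumption), and the final pick uses the standard library's `random.choices` with the ranks as weights.

```python
import random


def rank_fitness(raw_fitnesses):
    """Replace raw fitness values with ranks: worst gets 1, best gets N."""
    order = sorted(range(len(raw_fitnesses)), key=lambda i: raw_fitnesses[i])
    ranks = [0] * len(raw_fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_select(population, raw_fitnesses, rng):
    """Pick one individual with probability proportional to its rank."""
    return rng.choices(population, weights=rank_fitness(raw_fitnesses), k=1)[0]


raw = [0.5, 900.0, 3.0, 12.0]
print(rank_fitness(raw))  # [1, 4, 2, 3]
print(rank_select(["w", "x", "y", "z"], raw, random.Random(0)))
```

Note how the individual with raw fitness 900.0 gets weight 4 rather than dominating the wheel, which is the usual motivation for ranking.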

Page 10: Genetic algorithms in Data Mining

OPERATORS

CROSSOVER

• Combine bits and pieces of good parents

• Speculate on new, possibly better children

• By itself, a random shuffle
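One common way to combine pieces of two parents is single-point crossover, sketched below. The slides do not say which crossover variant is meant, so the variant, the function name `single_point_crossover`, and the example parents are assumptions.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length parents at one random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    cut = rng.randint(1, len(parent_a) - 1)  # never cut at either end
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


rng = random.Random(7)
print(single_point_crossover("16743", "52098", rng))
```

Each child starts with one parent's genes and ends with the other's, so no gene values are invented; they are only rearranged between the two chromosomes.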

Page 11: Genetic algorithms in Data Mining

BASIC CONCEPTS CONTD..

MUTATION

• Mutation is random alteration of a string

• Change a gene, small movement in the neighbourhood

• By itself, a random walk
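The "small movement in the neighbourhood" can be sketched for digit genes as a step of plus or minus one. The function name `mutate`, the per-gene mutation rate, and clamping to the range 0 to 9 are assumptions for illustration.

```python
import random


def mutate(genes, rate, rng):
    """Nudge each digit gene by +1 or -1 with probability `rate`."""
    mutated = []
    for gene in genes:
        if rng.random() < rate:
            step = rng.choice((-1, 1))
            gene = min(9, max(0, gene + step))  # stay within 0..9
        mutated.append(gene)
    return mutated


rng = random.Random(5)
print(mutate([1, 6, 7, 4, 3], rate=0.2, rng=rng))
```

A low rate leaves most genes untouched, so each mutated chromosome stays close to its original.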

Page 12: Genetic algorithms in Data Mining

DATA MINING

The goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Page 13: Genetic algorithms in Data Mining

EXAMPLE

OCCUPATION: POLITICIAN
BELONGS TO: ABC PARTY

AIM: WANTS TO CONTEST THE UPCOMING ELECTIONS IN BYTELAND.
PROBLEM: NOT SURE WHERE HE SHOULD CONTEST FROM.

KNOWS DATA MINING …

COLLECTS DATA OF ALL PREVIOUS STATE AND GENERAL ELECTIONS

Page 14: Genetic algorithms in Data Mining

PREPARES THE DATA: MAKES IT CONSISTENT AND NOISE-FREE

SPLITS IT INTO TWO EQUAL, MUTUALLY EXCLUSIVE PARTS: A TRAINING DATASET AND A TEST DATASET.

SETS THE PREDICTOR VARIABLES (LITERACY RATE, LOCALITY, ETC.) AND THE TARGET: WHETHER A PERSON X WILL VOTE FOR PARTY ABC OR NOT.
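The split into two equal, mutually exclusive halves can be sketched as a shuffle followed by a cut in the middle. The function name `split_half`, the record layout (a dict with `locality`, `literate` and `votes_abc` keys), and the records themselves are hypothetical placeholders, not data from the slides.

```python
import random


def split_half(records, seed=0):
    """Shuffle the records and cut them into two disjoint halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Hypothetical voter records: predictor variables plus the target.
voters = [
    {"locality": "A", "literate": True, "votes_abc": True},
    {"locality": "A", "literate": False, "votes_abc": False},
    {"locality": "B", "literate": True, "votes_abc": False},
    {"locality": "B", "literate": False, "votes_abc": False},
]
training, test = split_half(voters)
print(len(training), len(test))  # 2 2
```

Shuffling first avoids a split that simply follows the order in which the records were collected.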

Page 15: Genetic algorithms in Data Mining

Feeds the input and output variables of the training data into software that runs a genetic algorithm on them.

The software processes the inputs of the training dataset and matches them against the outputs. E.g., after applying genetic operators, the software establishes two rules:
Rule 1: If voter X belongs to locality A, then he will vote for party ABC.
Rule 2: If voter X is literate and belongs to locality A, then he will vote for party ABC.

Feeds the input variables of the test data, applies the rules obtained from the GA, and checks whether the expected output matches the actual output. Keeps the rules that are validated.
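The validation step can be sketched by scoring each rule on the test records it applies to. The test records, the function name `rule_accuracy`, and the 0.8 acceptance threshold are all hypothetical; only the two rules come from the slides.

```python
# Hypothetical test records: predictor variables plus the actual outcome.
test_data = [
    {"locality": "A", "literate": True, "votes_abc": True},
    {"locality": "A", "literate": True, "votes_abc": True},
    {"locality": "A", "literate": False, "votes_abc": False},
    {"locality": "B", "literate": True, "votes_abc": False},
    {"locality": "A", "literate": False, "votes_abc": True},
]


def rule1(voter):
    """Rule 1: the voter belongs to locality A."""
    return voter["locality"] == "A"


def rule2(voter):
    """Rule 2: the voter is literate and belongs to locality A."""
    return voter["literate"] and voter["locality"] == "A"


def rule_accuracy(rule, records):
    """Fraction of the records matched by the rule that did vote for ABC."""
    covered = [record for record in records if rule(record)]
    if not covered:
        return 0.0
    return sum(record["votes_abc"] for record in covered) / len(covered)


THRESHOLD = 0.8  # assumed cut-off for keeping a rule
for name, rule in (("Rule 1", rule1), ("Rule 2", rule2)):
    accuracy = rule_accuracy(rule, test_data)
    print(name, accuracy, "kept" if accuracy >= THRESHOLD else "dropped")
```

On these made-up records Rule 1 is right for 3 of the 4 voters it covers (0.75) and Rule 2 for both of the voters it covers (1.0), so only Rule 2 passes the assumed threshold.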

Page 16: Genetic algorithms in Data Mining

After applying data mining with genetic algorithms, the politician knows that his best chance of winning is to contest the election from a constituency that

• has the highest literacy rate

• and falls in locality A.

Page 17: Genetic algorithms in Data Mining

ADVANTAGES

• Concepts are easy to understand because the techniques mirror natural processes such as inheritance and mutation.

• Can be used where traditional search methods fail.

• Useful where search space is large, complex or poorly understood.

Page 18: Genetic algorithms in Data Mining

ADVANTAGES CONTD..

• Can provide several local optima, and possibly the global optimum, because a whole population of candidate solutions is maintained.

• Solves problems with multiple solutions.

• Genetic algorithms are easily transferred to existing simulations and problems.

Page 19: Genetic algorithms in Data Mining

LIMITATIONS

• Some optimization problems cannot be solved by genetic algorithms because their fitness functions are poorly known. These are called variant problems.

• There is no assurance of finding a global optimum. It happens very often when the populations have a lot of individuals.

• Like other artificial intelligence techniques, the genetic algorithm cannot assure constant optimization response times.

Page 20: Genetic algorithms in Data Mining

LIMITATIONS CONTD..

• With genetic algorithms the population as a whole improves, but the same cannot be said for every individual within it.

• The fitness function must be written accurately.

Page 21: Genetic algorithms in Data Mining

APPLICATIONS

• Optimization: GAs have been used in a wide variety of optimization tasks.

• Automatic Programming: for building computational structures like cellular automata and sorting networks.

• Machine and Robot Learning: used for classification and prediction, and protein structure prediction.

• Economic models: for development of bidding strategies in the emerging markets.

Page 22: Genetic algorithms in Data Mining

CONCLUSIONS

• Genetic Algorithms are easy to apply to a wide range of problems, like TSP, concept learning, etc.

• Results can be very good on some problems while rather poor on others.

• Using mutation only makes the algorithm very slow; crossover makes it significantly faster.

Page 23: Genetic algorithms in Data Mining

CONCLUSIONS CONTD..

• They have applications in commercial, educational and scientific areas.

• Very useful where the developer does not have precise domain expertise, because GAs can explore and learn from their domain.

Page 24: Genetic algorithms in Data Mining

THANK YOU