8/10/2019 Survey table.docx
1/7
Genetic Algorithms for Association Rule Mining: A Comparative Study

Each entry below summarizes one surveyed paper under the table's original columns: S. No., Paper (authors), Objective, Year, Methodology, Data set / Sample size, Evaluated by / Parameters, Results, and Conclusion (C) & Future work (F).
1. Robert Cattral, Franz Oppacher, Dwight Deugo (1999)
   Objective: Implement RAGA (Rule Acquisition using a Genetic Algorithm) on large and noisy databases.
   Methodology: Rules are of varying length and complexity; the initial population is seeded for the purpose of rule refinement; macro-mutations are used to explore the space of rules; a hierarchy of rules is evolved.
   Data set / Sample size: Grilled mushrooms of the Agaricus and Lepiota families (8,128 records of 23 species, 22 attributes); Vehicle Silhouette data set (846 records, 18 attributes).
   Evaluated by: Support and confidence.
   Results: Produced 14 and 25 rules with five test runs on 1,000 instances, and 4 using 7,124. Accuracy on the training data is between 95% and 100%; on the test data, between 62% and 71%.
   C: Halfway between a GA and GP; a default hierarchy of rules is generated; promotes validity and integrity.
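The support and confidence measures used to evaluate the rules above can be computed in a few lines. A minimal sketch (the transaction data below is invented for illustration):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A | C) / support(A) for the rule A -> C."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

# Hypothetical transactions, each a set of items.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
print(support(transactions, {"a", "b"}))       # 0.5
print(confidence(transactions, {"a"}, {"b"}))  # about 0.667
```

A rule is typically kept only when both measures clear user-set minimum thresholds, as in the minimum-support and minimum-confidence parameters reported throughout this table.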
2. Manish Saggar, Ashish Kumar Agarwal, Abhimanyu Lad (2004)
   Objective: Optimize the rules generated by Apriori; rules with negative occurrences of attributes are also considered.
   Methodology: Binary coding of rules; roulette-wheel sampling procedure for selection; Pm decides whether mutation is required and at what point of the locus; Pc governs crossover in the same way; fitness is calculated using TP, TN, FP, FN (true positives, true negatives, false positives, false negatives).
   Data set: Synthetic database for the selection of electives for a course.
   Parameters: GA population 100, Pc = 0.1, Pm = 0.005, fitness function as above.
   Results: Accuracy 100%; produces rules with negation of attributes as well as general rules.
   F: Handle other databases; reduce the complexity of the GA through distributed computing.
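Roulette-wheel selection and a confusion-matrix-based fitness of the kind described above might be sketched as follows. The accuracy-style formula is an assumption for illustration; the paper combines TP, TN, FP, FN but its exact expression is not reproduced here, and all names are hypothetical:

```python
import random

def fitness(tp, tn, fp, fn):
    """Accuracy computed from the confusion-matrix counts (assumed formula)."""
    return (tp + tn) / (tp + tn + fp + fn)

def roulette_wheel(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= pick:
            return individual
    return population[-1]          # guard against floating-point round-off

rules = ["rule_1", "rule_2", "rule_3"]
fits = [fitness(40, 40, 10, 10),   # 0.8
        fitness(25, 25, 25, 25),   # 0.5
        fitness(10, 10, 40, 40)]   # 0.2
chosen = roulette_wheel(rules, fits)   # rule_1 is selected most often
```

Fitter rules occupy a larger slice of the "wheel", so they are sampled more often while weaker rules still retain a nonzero chance of selection.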
3. Cunrong Li, Mingzhong Yang (2004)
   Objective: Association rule mining using a GA on manufacturing information system (MIS) data.
   Methodology: Encode; select; save the gene list; apply mutation and crossover; repeat until no new gene list is generated; decode.
   Data set: Data set from an MIS.
   Parameters: Support and compatibility set to 5%.
   Results: In terms of running time the GA is faster than Apriori; the time Apriori takes to fetch knowledge increases homogeneously.
   C: Precision is found to decrease slightly, while efficiency increases greatly.
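The encode/select/vary/decode loop listed above can be sketched as a toy program. Bit-string encodings of itemsets and support-as-fitness are illustrative assumptions, and a fixed generation budget stands in for the "no new gene list" stopping test:

```python
import random

def run_ga(transactions, n_items, pop_size=20, generations=50):
    """Toy sketch of the encode / select / save-gene-list / vary / decode loop.
    Itemsets are bit-tuples over the item universe; support plays the fitness."""
    rng = random.Random(42)

    def support(bits):
        members = {i for i, b in enumerate(bits) if b}
        if not members:
            return 0.0
        return sum(1 for t in transactions if members <= t) / len(transactions)

    # Encode: a random population of bit strings.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_items)) for _ in range(pop_size)]
    gene_list = set(pop)                     # every encoding seen so far
    for _ in range(generations):             # fixed budget as stopping rule
        pop.sort(key=support, reverse=True)  # select: keep the fitter half
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_items)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.05:          # occasional bit-flip mutation
                i = rng.randrange(n_items)
                child[i] ^= 1
            children.append(tuple(child))
        pop = parents + children
        gene_list.update(pop)
    # Decode: turn encodings back into itemsets, keeping the frequent ones.
    return [{i for i, b in enumerate(bits) if b}
            for bits in gene_list if any(bits) and support(bits) >= 0.5]

frequent = run_ga([{0, 1}, {0, 1, 2}, {0, 2}, {1, 2}], n_items=3)
```

Unlike Apriori's level-wise candidate generation, a loop of this shape samples the itemset space stochastically, which is the source of the running-time advantage the paper reports.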
4-5. (Entries incomplete in the source.) Data set: Wine (13 features, 3 classes, 178 instances).
6. Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia (2007)
   Objective: Find implicating rules and optimized rules using a GA.
   Methodology: Coding with natural numbers; individual fitness evaluation that includes the strength of implication; population initialization; crossover; mutation.
   Data set: Car test-results data set.
   Parameters: Population size 10, mutation rate 0.2, crossover rate 0.4, fitness threshold 0.6, 50 generations.
   Results: Between 1 and 200 interesting rules are discovered frequently during generation, whatever the threshold; as the threshold changes from 0.3 to 0.5 to 0.7, the grade of the curve goes from gentle to steep.
   C: The rules that are cut out have low implication values; the method finds implicating rules and can judge and denote concurrent rules.
7. Hua Tang, Jun Lu (2007)
   Objective: Discover classification rules in data sets.
   Methodology: Array representation of chromosomes; the fitness function is based on sustaining, creditable, and inclusive indices; population initialization based on entropy; crossover; mutation; rule pruning (removal of irrelevant terms).
   Data set: Six data sets from the UCI (Irvine) repository.
   Parameters: Sustainable index 1.0, creditable index 1.0, inclusive index 1.0, mutation rate 0.6, crossover rate 0.8.
   Results: Compared with CN2 and Ant-Miner; predictive accuracy is better than both methods on all six data sets, and the simplicity of the discovered rules is also better on all six.
   C: The genetic miner is superior to the other two methods in both prediction and simplicity of rules.
8. Wenxiang Dou, Jinglu Hu, Kotaro Hirasawa, Genfeng Wu (2008)
   Objective: An efficient data-mining system with quick response to users and a friendly interface.
   Methodology: Mines the maximum frequent sets; the user picks the desired rules; the real support and confidence are then scanned and reported; because the scanning is done on the k-itemset alone, it is fast.
   Data set: A single randomly produced table with 100 transactions and 40 attributes.
   Parameters: Minimum support 10%, minimum confidence 50%, maximum frequent item sets 500.
   Results: Response obtained in ten seconds, whereas Apriori takes more than 3,000 seconds; lets users choose the association rules they really need.
   C: Greatly reduces mining time compared to Apriori; produces fewer, genuinely demanded rules compared with Apriori.
   F: Could be applied to network data mining with large data; could be combined with other methods for multi-relational data.
9. Antonio Peregrin, Miguel Angel Rodriguez (2008)
   Objective: A distributed genetic algorithm (EDGAR) for classification-rule extraction.
   Methodology: Multiple intercommunicating subpopulations; distributed data with a central elite pool; the Data Learning Flow (DLF) copies training examples with low fitness to the neighborhood; each node is assigned a different partition of the learning data.
   Data set: UCI Nursery (12,960 instances); compared with REGAL using 5 partitions and a 50% training/testing split.
   Results: Faster and better behavior; classification accuracy is similar; the number of rules generated is between 60% and 80% smaller.
   C: EDGAR shows considerable speedup with no compromise in the accuracy or quality of the classifier.
10. Eghbal G. Mansoori, Mansoor J. Zolghadri, Seraj D. Katebi (2008)
   Objective: A steady-state genetic algorithm to extract a compact set of fuzzy rules from numerical data.
   Methodology: Generate fuzzy rules with one active antecedent and determine their consequents; divide the rules into M groups and rank them by fitness value; choose the best Q rules; apply the genetic operators; repeat for a fixed number of generations or until no offspring is produced.
   Data set: 11 data sets from the UCI (Irvine) Machine Learning Repository; compared with C4.5.
   Results: Sensitive to the rule-evaluation method; rule-evaluation measure 10 was found to be optimum; classification error rates are low; outperforms C4.5.
   C: The algorithm is simple and intuitive; it generates few short, accurate, and interpretable rules; it is fast and could be applied to high-dimensional problems.
   F: Rule selection can be made dependent on other classes; selection of more rules.
11. Xian-Jun Shi, Hong Lei (2008)
   Objective: A GA-based approach to mining classification rules from large databases.
   Methodology: The fitness function depends on predictive accuracy, comprehensibility, and interestingness; crossover is done on matching attributes when found between individuals, otherwise random attributes are chosen; the elitist recombination method is used for selection.
   Data set: Adult (48,842 instances, 15 attributes) and Nursery (12,960 instances, 9 attributes) from UCI.
   Parameters: Mutation rate 0.05, population size 40.
   Results: Better classification performance.
   F: Application to other domains for validation; further studies are needed.
12. Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo (2009)
   Objective: Apply an improved evaluation strategy to the GA evolution process.
   Methodology: The degree of dissimilarity among individuals in the population is judged at each generation; the crossover and mutation probabilities are set dynamically during evolution, enhancing the diversity of the population; evolution of the current generation is based on the last generation.
   Data set: Finance-service data of a certain city (2,050 groups).
   Parameters: Minimum support 0.3, minimum confidence 0.6, 50 generations.
   Results: Produces partial association rules after 252 generations, whereas a traditional GA needs 850; accelerated search speed.
   F: Could be applied to other domains.
13. Hong Guo, Ya Zhou (2009)
   Objective: Improve the GA through an adaptive mutation rate and better methods of individual selection.
   Methodology: The mutation rate is made adaptive so as to avoid the excessive variation that causes non-convergence; an individual-based selection method is used rather than a fixed fitness function.
   Data set: Database of student achievement in schools in recent years.
   Parameters: Minimum support 0.7, population size 200, crossover rate 0.9, mutation rate 0.01.
   Results: The algorithm's output at 0.1 support and 0.7 confidence is close to the actual situation.
   C: Effective and feasible; reduces unnecessary operation steps.
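An adaptive mutation rate of the kind described above might be sketched like this. The diversity proxy, thresholds, and scaling factors are illustrative assumptions, not the paper's exact scheme:

```python
def adaptive_mutation_rate(base_rate, pop_diversity, low=0.2, high=0.8):
    """Raise mutation when the population has nearly converged (low diversity),
    and lower it when diversity is already high, avoiding the excessive
    variation that prevents convergence."""
    if pop_diversity < low:
        return min(1.0, base_rate * 2.0)
    if pop_diversity > high:
        return base_rate * 0.5
    return base_rate

def diversity(population):
    """Fraction of distinct individuals: a crude diversity proxy."""
    return len(set(population)) / len(population)

pop = ["101"] * 10 + ["110"]          # nearly converged population
rate = adaptive_mutation_rate(0.01, diversity(pop))
print(rate)                            # 0.02: the rate is doubled
```

Tying the rate to a population statistic rather than a fixed constant is what lets the search keep exploring late in a run without destabilizing early generations.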
14. Yong Wang, Dawu Gu, Xiuxia Tian, Jing Li (2009)
   Objective: An efficient rule generator for denial-of-service network intrusion detection, using effective fitness-function generation.
   Methodology: The fitness function is modified to decide whether a chromosome is suitable for detecting an intrusion; the GA operators are then applied.
   Data set: KDD CUP 99 data set.
   Parameters: Chromosome length 41, 100 generations, generation gap 0.9.
   Results: The generated rules are useful in detecting intrusions.
   F: More research on ping-of-death and smurf attacks; could be applied to test other attacks such as teardrop and SYN flooding.
15. D. Rodriguez, J. C. Riquelme, R. Ruiz, J. S. Aguilar-Ruiz (2009)
   Objective: Data-mining techniques to search for rules identifying modules with a high probability of being defective.
   Methodology: Feature selection is applied to the attributes capable of indicating defects; a GA is then applied to the reduced set to search for rules whose probability of defect is high.
   Data set: CM1, KC1, KC2, and PC1 from the UCI repository (22 attributes).
   Results: The data sets are not homogeneous; the generated rules are simpler, and they combine sets of attributes to provide better estimation and explanation of defective modules.
   F: The algorithm could be improved to generate still simpler rules; the percentage of error could be increased so as to increase the number of instances.
16. Eloy Gonzales, Shingo Mabu, Karla Taboada, Kaoru Shimada, Kotaro Hirasawa (2009)
   Objective: A new evolutionary genetic algorithm for reducing the classification rules generated by other methods.
   Methodology: Two GRA variants, with directed and undirected branches; the fitness of a GRA is based on the distance between rules; the genetic operators selection, crossover, mutation 1, and mutation 2 are applied; classification is then done with the reduced rule set.
   Data set: Lymphography data set (148 records, 18 attributes, 4 classes) and Vehicle data set (846 records, 18 attributes).
   Parameters: Population 240, 100 generations, crossover probability 0.1, mutation 1 probability 0.01.
   Results: For lower reduction rates, GRA produces accuracy comparable with other methods; GRA outperforms conventional methods.
   C: Could be integrated with other methods; classification accuracy improves when partial matching is used.