8/10/2019 Survey table.docx
1/7
Genetic Algorithms for Association Rule Mining: A Comparative Study

Each entry below summarizes one surveyed paper under the table's original columns: S. No., Paper (authors), Objective, Year, Methodology, Data set / Sample size, Evaluated by / Parameters, Results, and Conclusion (C) & Future work (F).
1. Robert Cattral, Franz Oppacher, Dwight Deugo (1999)
   Objective: Implement RAGA (Rule Acquisition using a Genetic Algorithm) on large and noisy databases.
   Methodology: Rules are of varying length and complexity; the initial population is seeded for the purpose of rule refinement; macro-mutations are used to explore the space of rules; a hierarchy of rules is evolved.
   Data set / Sample size: Grilled mushrooms of the Agaricus and Lepiota families (8,128 records of 23 species, 22 attributes); Vehicle Silhouette data set (846 records, 18 attributes).
   Evaluated by: Support and confidence.
   Results: Produced 14 and 25 rules with five test runs on 1,000 instances, and 4 using 7,124. Accuracy on the training data is between 95% and 100%; on the test data, between 62% and 71%.
   C: Halfway between a GA and GP; a default hierarchy of rules is generated; promotes validity and integrity.
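The support and confidence measures used to evaluate the rules above can be computed in a few lines. A minimal sketch (the transaction data below is invented for illustration):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A | C) / support(A) for the rule A -> C."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

# Hypothetical transactions, each a set of items.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
print(support(transactions, {"a", "b"}))       # 0.5
print(confidence(transactions, {"a"}, {"b"}))  # about 0.667
```

A rule is typically kept only when both measures clear user-set minimum thresholds, as in the minimum-support and minimum-confidence parameters reported throughout this table.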
2. Manish Saggar, Ashish Kumar Agarwal, Abhimanyu Lad (2004)
   Objective: Optimize the rules generated by Apriori; rules with negative occurrences of attributes are also considered.
   Methodology: Binary coding of rules; roulette-wheel sampling procedure for selection; Pm decides whether mutation is required and at what point of the locus; Pc governs crossover in the same way; fitness is calculated using TP, TN, FP, FN (true positives, true negatives, false positives, false negatives).
   Data set: Synthetic database for the selection of electives for a course.
   Parameters: GA population 100, Pc = 0.1, Pm = 0.005, fitness function as above.
   Results: Accuracy 100%; produces rules with negation of attributes as well as general rules.
   F: Handle other databases; reduce the complexity of the GA through distributed computing.
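Roulette-wheel selection and a confusion-matrix-based fitness of the kind described above might be sketched as follows. The accuracy-style formula is an assumption for illustration; the paper combines TP, TN, FP, FN but its exact expression is not reproduced here, and all names are hypothetical:

```python
import random

def fitness(tp, tn, fp, fn):
    """Accuracy computed from the confusion-matrix counts (assumed formula)."""
    return (tp + tn) / (tp + tn + fp + fn)

def roulette_wheel(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= pick:
            return individual
    return population[-1]          # guard against floating-point round-off

rules = ["rule_1", "rule_2", "rule_3"]
fits = [fitness(40, 40, 10, 10),   # 0.8
        fitness(25, 25, 25, 25),   # 0.5
        fitness(10, 10, 40, 40)]   # 0.2
chosen = roulette_wheel(rules, fits)   # rule_1 is selected most often
```

Fitter rules occupy a larger slice of the "wheel", so they are sampled more often while weaker rules still retain a nonzero chance of selection.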
3. Cunrong Li, Mingzhong Yang (2004)
   Objective: Association rule mining using a GA on manufacturing information system (MIS) data.
   Methodology: Encode; select; save the gene list; apply mutation and crossover; repeat until no new gene list is generated; decode.
   Data set: Data set from an MIS.
   Parameters: Support and compatibility set to 5%.
   Results: In terms of running time the GA is faster than Apriori; the time Apriori takes to fetch knowledge increases homogeneously.
   C: Precision is found to decrease slightly, while efficiency increases greatly.
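The encode/select/vary/decode loop listed above can be sketched as a toy program. Bit-string encodings of itemsets and support-as-fitness are illustrative assumptions, and a fixed generation budget stands in for the "no new gene list" stopping test:

```python
import random

def run_ga(transactions, n_items, pop_size=20, generations=50):
    """Toy sketch of the encode / select / save-gene-list / vary / decode loop.
    Itemsets are bit-tuples over the item universe; support plays the fitness."""
    rng = random.Random(42)

    def support(bits):
        members = {i for i, b in enumerate(bits) if b}
        if not members:
            return 0.0
        return sum(1 for t in transactions if members <= t) / len(transactions)

    # Encode: a random population of bit strings.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_items)) for _ in range(pop_size)]
    gene_list = set(pop)                     # every encoding seen so far
    for _ in range(generations):             # fixed budget as stopping rule
        pop.sort(key=support, reverse=True)  # select: keep the fitter half
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_items)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.05:          # occasional bit-flip mutation
                i = rng.randrange(n_items)
                child[i] ^= 1
            children.append(tuple(child))
        pop = parents + children
        gene_list.update(pop)
    # Decode: turn encodings back into itemsets, keeping the frequent ones.
    return [{i for i, b in enumerate(bits) if b}
            for bits in gene_list if any(bits) and support(bits) >= 0.5]

frequent = run_ga([{0, 1}, {0, 1, 2}, {0, 2}, {1, 2}], n_items=3)
```

Unlike Apriori's level-wise candidate generation, a loop of this shape samples the itemset space stochastically, which is the source of the running-time advantage the paper reports.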
4-5. (Entries incomplete in the source.) Data set: Wine (13 features, 3 classes, 178 instances).
6. Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia (2007)
   Objective: Find implicating rules and optimized rules using a GA.
   Methodology: Coding with natural numbers; individual fitness evaluation that includes the strength of implication; population initialization; crossover; mutation.
   Data set: Car test-results data set.
   Parameters: Population size 10, mutation rate 0.2, crossover rate 0.4, fitness threshold 0.6, 50 generations.
   Results: Between 1 and 200 interesting rules are discovered frequently during generation, whatever the threshold; as the threshold changes from 0.3 to 0.5 to 0.7, the grade of the curve goes from gentle to steep.
   C: The rules that are cut out have low implication values; the method finds implicating rules and can judge and denote concurrent rules.
7. Hua Tang, Jun Lu (2007)
   Objective: Discover classification rules in data sets.
   Methodology: Array representation of chromosomes; the fitness function is based on sustaining, creditable, and inclusive indices; population initialization based on entropy; crossover; mutation; rule pruning (removal of irrelevant terms).
   Data set: Six data sets from the UCI (Irvine) repository.
   Parameters: Sustainable index 1.0, creditable index 1.0, inclusive index 1.0, mutation rate 0.6, crossover rate 0.8.
   Results: Compared with CN2 and Ant-Miner; predictive accuracy is better than both methods on all six data sets, and the simplicity of the discovered rules is also better on all six.
   C: The genetic miner is superior to the other two methods in both prediction and simplicity of rules.
8. Wenxiang Dou, Jinglu Hu, Kotaro Hirasawa, Genfeng Wu (2008)
   Objective: An efficient data-mining system with quick response to users and a friendly interface.
   Methodology: Mines the maximum frequent sets; the user picks the desired rules; the real support and confidence are then scanned and reported; because the scanning is done on the k-itemset alone, it is fast.
   Data set: A single randomly produced table with 100 transactions and 40 attributes.
   Parameters: Minimum support 10%, minimum confidence 50%, maximum frequent item sets 500.
   Results: Response obtained in ten seconds, whereas Apriori takes more than 3,000 seconds; lets users choose the association rules they really need.
   C: Greatly reduces mining time compared to Apriori; produces fewer, genuinely demanded rules compared with Apriori.
   F: Could be applied to network data mining with large data; could be combined with other methods for multi-relational data.
9. Antonio Peregrin, Miguel Angel Rodriguez (2008)
   Objective: A distributed genetic algorithm (EDGAR) for classification-rule extraction.
   Methodology: Multiple intercommunicating subpopulations; distributed data with a central elite pool; the Data Learning Flow (DLF) copies training examples with low fitness to the neighborhood; each node is assigned a different partition of the learning data.
   Data set: UCI Nursery (12,960 instances); compared with REGAL using 5 partitions and a 50% training/testing split.
   Results: Faster and better behavior; classification accuracy is similar; the number of rules generated is between 60% and 80% smaller.
   C: EDGAR shows considerable speedup with no compromise in the accuracy or quality of the classifier.
10. Eghbal G. Mansoori, Mansoor J. Zolghadri, Seraj D. Katebi (2008)
   Objective: A steady-state genetic algorithm to extract a compact set of fuzzy rules from numerical data.
   Methodology: Generate fuzzy rules with one active antecedent and determine their consequents; divide the rules into M groups and rank them by fitness value; choose the best Q rules; apply the genetic operators; repeat for a fixed number of generations or until no offspring is produced.
   Data set: 11 data sets from the UCI (Irvine) Machine Learning Repository; compared with C4.5.
   Results: Sensitive to the rule-evaluation method; rule-evaluation measure 10 was found to be optimum; classification error rates are low; outperforms C4.5.
   C: The algorithm is simple and intuitive; it generates few short, accurate, and interpretable rules; it is fast and could be applied to high-dimensional problems.
   F: Rule selection can be made dependent on other classes; selection of more rules.
11. Xian-Jun Shi, Hong Lei (2008)
   Objective: A GA-based approach to mining classification rules from large databases.
   Methodology: The fitness function depends on predictive accuracy, comprehensibility, and interestingness; crossover is done on matching attributes when found between individuals, otherwise random attributes are chosen; the elitist recombination method is used for selection.
   Data set: Adult (48,842 instances, 15 attributes) and Nursery (12,960 instances, 9 attributes) from UCI.
   Parameters: Mutation rate 0.05, population size 40.
   Results: Better classification performance.
   F: Application to other domains for validation; further studies are needed.
12. Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo (2009)
   Objective: Apply an improved evaluation strategy to the GA evolution process.
   Methodology: The degree of dissimilarity among individuals in the population is judged at each generation; the crossover and mutation probabilities are set dynamically during evolution, enhancing the diversity of the population; evolution of the current generation is based on the last generation.
   Data set: Finance-service data of a certain city (2,050 groups).
   Parameters: Minimum support 0.3, minimum confidence 0.6, 50 generations.
   Results: Produces partial association rules after 252 generations, whereas a traditional GA needs 850; accelerated search speed.
   F: Could be applied to other domains.
13. Hong Guo, Ya Zhou (2009)
   Objective: Improve the GA through an adaptive mutation rate and better methods of individual selection.
   Methodology: The mutation rate is made adaptive so as to avoid the excessive variation that causes non-convergence; an individual-based selection method is used rather than a fixed fitness function.
   Data set: Database of student achievement in schools in recent years.
   Parameters: Minimum support 0.7, population size 200, crossover rate 0.9, mutation rate 0.01.
   Results: The algorithm's output at 0.1 support and 0.7 confidence is close to the actual situation.
   C: Effective and feasible; reduces unnecessary operation steps.
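An adaptive mutation rate of the kind described above might be sketched like this. The diversity proxy, thresholds, and scaling factors are illustrative assumptions, not the paper's exact scheme:

```python
def adaptive_mutation_rate(base_rate, pop_diversity, low=0.2, high=0.8):
    """Raise mutation when the population has nearly converged (low diversity),
    and lower it when diversity is already high, avoiding the excessive
    variation that prevents convergence."""
    if pop_diversity < low:
        return min(1.0, base_rate * 2.0)
    if pop_diversity > high:
        return base_rate * 0.5
    return base_rate

def diversity(population):
    """Fraction of distinct individuals: a crude diversity proxy."""
    return len(set(population)) / len(population)

pop = ["101"] * 10 + ["110"]          # nearly converged population
rate = adaptive_mutation_rate(0.01, diversity(pop))
print(rate)                            # 0.02: the rate is doubled
```

Tying the rate to a population statistic rather than a fixed constant is what lets the search keep exploring late in a run without destabilizing early generations.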
14. Yong Wang, Dawu Gu, Xiuxia Tian, Jing Li (2009)
   Objective: An efficient rule generator for denial-of-service network intrusion detection, using effective fitness-function generation.
   Methodology: The fitness function is modified to decide whether a chromosome is suitable for detecting an intrusion; the GA operators are then applied.
   Data set: KDD CUP 99 data set.
   Parameters: Chromosome length 41, 100 generations, generation gap 0.9.
   Results: The generated rules are useful in detecting intrusions.
   F: More research on ping-of-death and smurf attacks; could be applied to test other attacks such as teardrop and SYN flooding.
15. D. Rodriguez, J. C. Riquelme, R. Ruiz, J. S. Aguilar-Ruiz (2009)
   Objective: Data-mining techniques to search for rules identifying modules with a high probability of being defective.
   Methodology: Feature selection is applied to the attributes capable of indicating defects; a GA is then applied to the reduced set to search for rules whose probability of defect is high.
   Data set: CM1, KC1, KC2, and PC1 from the UCI repository (22 attributes).
   Results: The data sets are not homogeneous; the generated rules are simpler, and they combine sets of attributes to provide better estimation and explanation of defective modules.
   F: The algorithm could be improved to generate still simpler rules; the percentage of error could be increased so as to increase the number of instances.
16. Eloy Gonzales, Shingo Mabu, Karla Taboada, Kaoru Shimada, Kotaro Hirasawa (2009)
   Objective: A new evolutionary genetic algorithm for reducing the classification rules generated by other methods.
   Methodology: Two GRA variants, with directed and undirected branches; the fitness of a GRA is based on the distance between rules; the genetic operators selection, crossover, mutation 1, and mutation 2 are applied; classification is then done with the reduced rule set.
   Data set: Lymphography data set (148 records, 18 attributes, 4 classes) and Vehicle data set (846 records, 18 attributes).
   Parameters: Population 240, 100 generations, crossover probability 0.1, mutation 1 probability 0.01.
   Results: For lower reduction rates, GRA produces accuracy comparable with other methods; GRA outperforms conventional methods.
   C: Could be integrated with other methods; classification accuracy improves when partial matching is used.