Survey table.docx

  • 8/10/2019 Survey table.docx

    1/7

    Genetic Algorithms for Association Rule Mining: A Comparative Study

    Columns: S. No | Paper Id | Objective | Year | Methodology | Data set | Sample Size | Evaluated By / Parameters | Results | Conclusion & Future Work

    S. No: 1
    Paper Id: Robert Cattral, Franz Oppacher, Dwight Deugo
    Objective: To implement RAGA (Rule Acquisition Using Genetic Algorithm) for large and noisy databases
    Year: 1999
    Methodology:
        Rules are complex and of varying length.
        The initial population is seeded for the purpose of rule refinement.
        Uses macro mutations to explore the space of rules.
        Evolves a hierarchy of rules.
    Data set:
        Gilled mushrooms in the Agaricus and Lepiota family
        Vehicle silhouette dataset
    Sample Size:
        Mushroom: 8128 records of 23 species, 22 attributes
        Vehicle: 846 records, 18 attributes
    Evaluated By / Parameters: Support and confidence
    Results:
        Produced 14 and 25 rules with five test runs on 1000 instances, and 4 using 7124.
        Accuracy on the training dataset between 95 and 100.
        Accuracy on test data between 62 and 71.
    Conclusion & Future Work:
        C: Halfway between GA and GP.
        A default hierarchy of rules is generated.
        Promotes validity and integrity.
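    Illustrative sketch for entry 1: the entry is evaluated by support and confidence, so a minimal Python version of the two measures may help when comparing rows. The toy transactions and item names are hypothetical and are not taken from the mushroom or vehicle data.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: Sequence[Transaction]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(antecedent | consequent, transactions) / base


# Hypothetical toy data, used only to exercise the two functions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

    For the rule bread -> milk on this toy data, support is 2/4 and confidence is (2/4) / (3/4).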

    S. No: 2
    Paper Id: Manish Saggar, Ashish Kumar Agarwal, Abhimanyu Lad
    Objective:
        Optimize rules generated by Apriori.
        Rules with negative occurrences of attributes are considered.
    Year: 2004
    Methodology:
        Binary coding of rules.
        Roulette wheel sampling procedure for selection.
        Pm: decides whether mutation is required or not, and at what locus point to mutate.
        Pc: decides crossover in the same way as Pm.
        Fitness calculated using TP, TN, FP, FN (true positive, true negative, false positive, false negative).
    Data set: Synthetic database for the selection of electives for a course
    Evaluated By / Parameters:
        GA
        Population 100
        Pc = 0.1
        Pm = 0.005
        Fitness function
    Results:
        Accuracy 100%.
        Rules with negation of attributes as well as general rules.
    Conclusion & Future Work:
        F: Handle other databases.
        Complexity reduction of GA by distributed computing.
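    Illustrative sketch for entry 2: the entry names roulette wheel selection and a fitness built from TP, TN, FP and FN, but the table does not give the formula. The Python below assumes sensitivity times specificity as one common combination; that choice and the function names are assumptions, not the authors' stated method.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def confusion_fitness(tp: int, tn: int, fp: int, fn: int) -> float:
    """Assumed fitness: sensitivity * specificity (not confirmed by the table)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity * specificity


def roulette_select(population: Sequence[T], fitnesses: List[float],
                    rng: random.Random) -> T:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # no usable fitness: fall back to uniform choice
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off
```

    A caller would typically invoke roulette_select twice per mating step to obtain two parents, then apply crossover with probability Pc and mutation with probability Pm.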

    S. No: 3
    Paper Id: Cunrong Li, Mingzhong Yang
    Objective: Association rule mining using GA for a manufacturing information system data set
    Year: 2004
    Methodology:
        Algorithm:
        Encode
        Select
        Save gene list
        Mutation and crossover
        Repeat until no new gene list is generated
        Decode
    Data set: Data set from MIS
    Evaluated By / Parameters: Support and compatibility set to 5%
    Results:
        Based on time, GA is faster than Apriori.
        Time for fetching knowledge increases homogeneously for Apriori.
    Conclusion & Future Work:
        C: The precision is found to decrease slightly.
        The efficiency has increased a lot.


    Wine: Features 13, Classes 3, Size 178

    S. No: 6
    Paper Id: Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia
    Objective:
        Finding implicating rules based on GA.
        Finding optimized rules using GA.
    Year: 2007
    Methodology:
        Coding using natural numbers.
        Individual fitness evaluation, with strength of implication included.
        Initialization of population.
        Crossover.
        Mutation.
    Data set: Car Test Results dataset
    Evaluated By / Parameters:
        Population size 10
        Mutation rate 0.2
        Crossover rate 0.4
        Fitness threshold 0.6
        Generation 50
    Results:
        Between generations 1 and 200, interesting rules are discovered frequently whatever the threshold.
        As the threshold changes from 0.3 to 0.5 to 0.7, the slope of the curve goes from gentle to steep.
    Conclusion & Future Work:
        C: The rules that are cut out have low implication values.
        Finds implicating rules; judges and denotes concurrent rules.

    S. No: 7
    Paper Id: Hua Tang, Jun Lu
    Objective: To discover classification rules in datasets
    Year: 2007
    Methodology:
        Array representation of chromosomes.
        The fitness function is based on the sustaining, creditable and inclusive indexes.
        Population initialization based on entropy.
        Crossover operation.
        Mutation operation.
        Rule pruning (removing irrelevant terms).
    Data set: Six datasets from the Irvine repository
    Evaluated By / Parameters:
        Sustainable index 1.0
        Creditable index 1.0
        Inclusive index 1.0
        Mutation rate 0.6
        Crossover rate 0.8
        Compared with CN2 and Ant-miner
    Results:
        Predictive accuracy better than the other two methods for all six datasets.
        Simplicity of discovered rules better for all six datasets.
    Conclusion & Future Work:
        Genetic miner superior to the other two methods for both prediction and simplicity of rules.

    S. No: 8
    Paper Id: Wenxiang Dou, Jinglu Hu, Kotaro Hirasawa, Genfeng Wu
    Objective: Efficient data mining system with quick response to users and a friendly interface
    Year: 2008
    Methodology:
        Mines the maximum frequent set.
        User picks the desired rules.
        Real support and confidence are scanned and reported.
        Scanning is done on the k-item set alone and is hence fast.
    Data set: Single table produced randomly with 100 transactions
    Sample Size: 40 attributes
    Evaluated By / Parameters:
        Minimum support 10%
        Minimum confidence 50%
        Max frequent item sets 500
    Results:
        Response obtained in ten seconds, whereas Apriori takes more than 3000 seconds.
        Lets users choose the association rules they really demand.
    Conclusion & Future Work:
        C: Reduces the large mining time when compared to Apriori.
        Produces fewer rules, and only those really demanded, when compared with Apriori.
        F: Could be applied to network data mining with large data.
        Could be combined with other methods for multi-relation data.

    S. No: 9
    Paper Id: Antonio Peregrin, Miguel Angel Rodriguez
    Objective: Distributed genetic algorithm for classification rule extraction
    Year: 2008
    Methodology:
        Multiple intercommunicating subpopulations.
        Distributed data and DLF.
        Central elite pool.
        The Data Learning Flow (DLF) copies the training examples with low fitness to the neighborhood.
        Each node is assigned a different partition of the learning data.
    Data set: UCI: Nursery
    Sample Size: 12960 instances
    Evaluated By / Parameters: Compared with REGAL using 5 partitions and 50% of training and testing
    Results:
        Faster and better behavior.
        Classification accuracy similar.
        Number of rules generated is between 60% and 80% smaller.
    Conclusion & Future Work:
        C: EDGAR shows considerable speedup with no compromise in accuracy and quality of classifier.

    S. No: 10
    Paper Id: Eghbal G. Mansoori, Mansoor J. Zolghadri, Seraj D. Katebi
    Objective: Steady state genetic algorithm to extract a compact set of fuzzy rules from numerical data
    Year: 2008
    Methodology:
        Generate fuzzy rules with one active antecedent and determine the consequents.
        Divide rules into M groups.
        Rank them based on fitness values.
        Choose the best Q rules.
        Apply genetic operators.
        Repeat until a fixed number of generations is reached or no offspring is produced.
    Data set: 11 datasets from the Irvine Machine Learning Repository
    Evaluated By / Parameters: Compared with C4.5
    Results:
        Sensitive to the rule evaluation method.
        Rule evaluation measure 10 found to be optimum.
        Classification error rates are low.
        Outperforms C4.5.
    Conclusion & Future Work:
        C: Algorithm is simple and intuitive.
        Generates few short, accurate and interpretable rules.
        Fast and could be applied to high dimensional problems.
        F: Rule selection can be made dependent on other classes.
        Selection of more rules.

    S. No: 11
    Paper Id: Xian-Jun Shi, Hong Lei
    Objective: Genetic algorithm based approach to mining classification rules from a large database
    Year: 2008
    Methodology:
        Fitness function depends on predictive accuracy, comprehensibility and interestingness.
        Crossover is done on the same attributes if any are found between individuals.
        If there are no same attributes, random attributes are chosen.
        Elitist recombination method of selection.
    Data set: Adult and Nursery datasets from UCI
    Sample Size:
        Adult: 48842 instances with 15 attributes
        Nursery: 12960 instances with 9 attributes
    Evaluated By / Parameters:
        Mutation rate 0.05
        Population size 40
    Results: Better classification performance
    Conclusion & Future Work:
        F: For validation, application to other domains and further studies are needed.
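    Illustrative sketch for entry 11: the crossover described there works on attributes shared by two individuals and falls back to random attributes when none are shared. The Python below is one possible reading. The dictionary encoding of a rule, the function name and the example values are assumptions; the table does not describe the authors' encoding.

```python
import random
from typing import Dict, Tuple

Rule = Dict[str, str]  # assumed encoding: attribute name -> required value


def attribute_crossover(a: Rule, b: Rule, rng: random.Random) -> Tuple[Rule, Rule]:
    """Swap values on a shared attribute; otherwise exchange one random condition each."""
    child_a, child_b = dict(a), dict(b)
    shared = sorted(set(a) & set(b))
    if shared:
        attr = rng.choice(shared)
        child_a[attr], child_b[attr] = b[attr], a[attr]
    else:
        # No attribute in common: move one randomly chosen condition in each direction.
        attr_a = rng.choice(sorted(a))
        attr_b = rng.choice(sorted(b))
        child_a.pop(attr_a)
        child_a[attr_b] = b[attr_b]
        child_b.pop(attr_b)
        child_b[attr_a] = a[attr_a]
    return child_a, child_b


# Hypothetical parents with one shared attribute ("age").
parent_1 = {"age": "young", "education": "Bachelors"}
parent_2 = {"age": "old", "occupation": "Sales"}
child_1, child_2 = attribute_crossover(parent_1, parent_2, random.Random(0))
```

    Both parents must contain at least one condition. The parents are copied first, so the originals are left unchanged.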

    S. No: 12
    Paper Id: Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo
    Objective: Evaluation strategy's excellence is applied to the GA evolution process
    Year: 2009
    Methodology:
        The evolution strategy is improved as follows.
        The degree of dissimilarity between individuals in the colony is judged when a generation is evolved.
        Crossover probability and mutation are set dynamically during evolution, thereby enhancing the diversity of the colony.
        Evolution of the current generation is based on the last generation.
    Data set: Finance service data of a certain city
    Sample Size: 2050 groups
    Evaluated By / Parameters:
        Minimum support 0.3
        Minimum confidence 0.6
        Generations 50
    Results:
        Produces partial association rules after 252 generations, whereas it takes 850 in traditional GA.
        Accelerated search speed.
    Conclusion & Future Work:
        F: Could be applied to other domains.

    S. No: 13
    Paper Id: Hong Guo, Ya Zhou
    Objective: GA improved through an adaptive mutation rate and improved methods of individual choice
    Year: 2009
    Methodology:
        Mutation rate is made adaptive so as to avoid excessive variation causing non-convergence.
        Individual based selection method rather than a fixed fitness function.
    Data set: Database of student achievement in schools in recent years
    Evaluated By / Parameters:
        Minimum support 0.7
        Population size 200
        Crossover rate 0.9
        Mutation rate 0.01
    Results:
        The algorithm based on 0.1 support and 0.7 confidence is close to the actual situation.
    Conclusion & Future Work:
        C: Effective and feasible.
        Reduces unnecessary operation steps.
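    Illustrative sketch for entry 13: the table says only that the mutation rate is made adaptive to avoid excessive variation; it gives no formula. The Python below swaps in a fitness-scaled rule loosely in the style of Srinivas and Patnaik's adaptive GA, which is not confirmed as the authors' scheme. The function name is hypothetical, and the default pm_max simply reuses the 0.01 mutation rate listed in the entry as an assumed ceiling.

```python
def adaptive_mutation_rate(fitness: float, f_avg: float, f_max: float,
                           pm_max: float = 0.01) -> float:
    """Assumed scheme: below-average individuals mutate at pm_max,
    above-average ones at a rate that shrinks linearly to 0 at f_max."""
    if fitness < f_avg or f_max == f_avg:
        # Weak individual, or a fully converged population: mutate at the ceiling.
        return pm_max
    return pm_max * (f_max - fitness) / (f_max - f_avg)
```

    Under this rule the best individual is never mutated, while individuals at or below the population average keep the full rate, which limits disruption of good rules without freezing the search.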

    S. No: 14
    Paper Id: Yong Wang, Dawu Gi, Xiuxia Tian, Jing Li
    Objective: Efficient rule generator for denial of service network intrusion detection, using effective fitness function generation
    Year: 2009
    Methodology:
        Fitness function modified to decide whether or not a chromosome is right for intrusion detection.
        The GA operators are applied.
    Data set: KDDCUP99 dataset
    Evaluated By / Parameters:
        Length of chromosome 41
        Generations 100
        Generation gap 0.9
    Results: Rules generated are useful in detecting intrusion
    Conclusion & Future Work:
        F: More research on ping of death or smurf attacks.
        Could be applied to test other attacks such as teardrop, SYN flooding, etc.

    S. No: 15
    Paper Id: D. Rodriguez, J.C. Riquelme, R. Ruiz, J.S. Aguilar-Ruiz
    Objective: Data mining techniques to search for rules with a high probability of being defective
    Year: 2009
    Methodology:
        Feature selection is applied to attributes capable of being defective.
        GA is applied to the reduced set to search for rules whose probability of defect is high.
    Data set: CM1, KC1, KC2, PC1 from the UCI repository
    Sample Size: 22 attributes
    Results:
        Datasets are not homogeneous.
        Generated rules are simpler.
        Generated rules combine sets of attributes to provide better estimation and explanation of defective modules.
    Conclusion & Future Work:
        F: The algorithm could be improved to generate even simpler rules.
        Percentage of error could be increased so as to increase the number of instances.

    S. No: 16
    Paper Id: Eloy Gonzales, Shingo Mabu, Karla Taboada, Kaoru Shimada, Kotaro Hirasawa
    Objective: New evolutionary genetic algorithm for reducing classification rules generated by other methods
    Year: 2009
    Methodology:
        Two GRA: directed and undirected branches.
        Fitness of GRA is based on the distance between rules.
        Genetic operators such as selection, crossover, mutation 1 and mutation 2 are applied.
        Classification done with
    Data set: Vehicle dataset and Lymphography dataset
    Sample Size:
        Lymphography: 148 records with 18 attributes and 4 classes
        Vehicle: 846 records, 18 attributes
    Evaluated By / Parameters:
        Population 240
        Generations 100
        Crossover probability 0.1
        Mutation 1: 0.01
    Results:
        For lower reduction rates GRA produces accuracy comparable with other methods.
        GRA outperforms conventional methods.
    Conclusion & Future Work:
        C: Could be integrated with other methods.
        Classification accuracy improves when partial match used in
