Dcmeet Second v2


  • 8/4/2019 Dcmeet Second v2


    Presented By

    K.Indira

    Under the Guidance of

    Dr. S. Kanmani,

    Professor, Department of Information Technology,

    Pondicherry Engineering College.


    Mining Association Rules using Optimal Genetic Algorithm & Quantum Swarm Intelligent PSO


    Contents

    Objective
    Introduction
        Data Mining
        Association Analysis
        Limitations of the Existing System
        GA and PSO: An Introduction
    Existing Work
        Based on GA
        Based on PSO
    Work Done So Far
    Proposed Work
    Execution Plan
    Papers Published
    References


    Objective

    To propose an efficient methodology for mining association rules (ARs) using an Optimal Genetic Algorithm and Quantum Swarm Intelligent PSO.


    Data Mining

    Data mining is the extraction of interesting information or patterns from data in large databases.


    Association Rules

    Find all the rules X → Y with minimum support and confidence.

    Support, s: probability that a transaction contains X ∪ Y.
    Confidence, c: conditional probability that a transaction having X also contains Y.

    Tid   Items bought
    10    Milk, Nuts, Sugar
    20    Milk, Coffee, Sugar
    30    Milk, Sugar, Eggs
    40    Nuts, Eggs, Bread
    50    Nuts, Coffee, Sugar, Eggs, Bread

    [Figure: Venn diagram of customers who buy milk, customers who buy sugar, and customers who buy both]

    Let minsup = 50%, minconf = 50%.

    Frequent patterns: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3

    Association rules (support, confidence):
    Milk → Sugar (60%, 100%)
    Sugar → Milk (60%, 75%)
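The support and confidence figures on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names `count`, `support` and `confidence` are illustrative choices, not part of the presentation; only the five transactions come from the slide.

```python
# Transactions from the slide's example table (Tid -> items bought).
transactions = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = count(X u Y) / count(X)."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"Milk", "Sugar"}, transactions))       # 0.6
print(confidence({"Milk"}, {"Sugar"}, transactions))  # 1.0
print(confidence({"Sugar"}, {"Milk"}, transactions))  # 0.75
```

Both rules clear the 50% thresholds, which matches the (60%, 100%) and (60%, 75%) values quoted on the slide.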


    Limitations of Existing System

    Apriori, FP-growth and Eclat are some of the popular algorithms for mining ARs.
    They traverse the database many times.
    Their I/O overhead and computational complexity are high.
    They cannot meet the requirements of large-scale database mining.


    GA and PSO: An Introduction

    Evolutionary algorithms provide a robust and efficient approach to exploring a large search space.

    A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology.

    The mechanism of PSO is inspired by the social and cooperative behavior displayed by various species, such as birds and fish, and also by human beings.


    Existing Work

    Mining ARs Based on Genetic Algorithm

    Efficient Distributed Genetic Algorithm: the population is spatially partitioned into several semi-isolated nodes, each evolving in parallel and possibly exploring different regions of the search space.

    Genetic algorithm that does not take minimum support and confidence into account: it extracts the rules that have the best correlation between support and confidence.

    Improved Niched Pareto Genetic Algorithm (INPGA): selects accurate candidates and also saves selection time by combining BNPGA and SDNPGA.

    GRA with a new operator called guided mutation: GRA considers the correlation coefficient between nodes in each individual.


    Existing Work contd.

    Mining ARs Based on Particle Swarm Optimization

    A novel algorithm for association rule mining that improves computational efficiency and automatically determines suitable threshold values.

    An algorithm that operates at three evolution levels, where an adaptive inertia weight is presented. The safety distance is introduced to move the particle through its current position, and the proximity index.

    A self-adaptive method that adjusts the inertia weight of the velocity update rule based on empirical values and a negative feedback technique, which relieves the burden of specifying parameter values.

    A method that combines Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs), using fuzzy logic to integrate the results of both methods and to tune parameters. The new optimization method combines the advantages of PSO and GA to give an improved FPSO + FGA hybrid approach.
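For readers unfamiliar with the velocity update rule and inertia weight mentioned above, here is a minimal sketch of the textbook PSO update with a fixed inertia weight `w`. It is not the adaptive scheme of any of the surveyed papers; the function name `pso_minimize`, the parameter defaults and the `sphere` test function are all assumptions made for illustration.

```python
import random


def sphere(x):
    """Simple test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(f, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook PSO with a fixed inertia weight w (illustrative defaults)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]            # best position seen by each particle
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position of the swarm
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term + cognitive pull (pbest) + social pull (gbest)
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


best, best_val, history = pso_minimize(sphere, dim=2)
print(best_val)
```

An adaptive variant would recompute `w` every iteration instead of keeping it constant, which is the parameter the self-adaptive methods above aim to tune automatically.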


    Work Done So Far

    Association rule mining was carried out using a Genetic Algorithm in Matlab 2008a.

    Association rule mining was carried out using a Self Adaptive Genetic Algorithm in Java.

    The GA parameters were varied and the results were recorded for each case.


    Mining ARs using GA in Matlab 2008a

    Methodology

    Selection             : Tournament
    Crossover Probability : Fixed (tested with 3 values)
    Mutation Probability  : No mutation
    Fitness Function      :
    Dataset               : Lenses, Iris, Haberman from the UCI (Irvine) repository
    Population            : Fixed (tested with 3 values)
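The configuration above (tournament selection, fixed crossover probability, no mutation) can be sketched as follows. This is an illustration in Python rather than the Matlab code used in the work; the bit-list encoding, the tournament size `k`, single-point crossover and all function names are assumptions, and the fitness function is passed in as a parameter because the slide does not reproduce it.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Pick k distinct individuals at random and return the fittest one."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)


def single_point_crossover(a, b, pc, rng):
    """With probability pc, swap the tails of a and b at a random cut point."""
    if rng.random() >= pc or len(a) < 2:
        return a[:], b[:]
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def next_generation(population, fitness, pc, rng, k=2):
    """One generation: tournament selection plus crossover, no mutation."""
    children = []
    while len(children) < len(population):
        p1 = tournament_select(population, fitness, k, rng)
        p2 = tournament_select(population, fitness, k, rng)
        c1, c2 = single_point_crossover(p1, p2, pc, rng)
        children.extend([c1, c2])
    return children[:len(population)]


rng = random.Random(1)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
# `sum` stands in for the real fitness function, which is not given here.
population = next_generation(population, fitness=sum, pc=0.5, rng=rng)
```

Because there is no mutation, every bit in a child comes from one of its parents, so diversity can only shrink over generations; that is one reason the population size matters in this setup.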


    Flow chart of the GA


    Results Analysis

    Comparison based on variation in population size

    Dataset     No. of Instances           No. of Instances * 1.25    No. of Instances * 1.5
                Accuracy %   Generations   Accuracy %   Generations   Accuracy %   Generations
    Lenses      75           7             82           12            95           17
    Haberman    71           114           68           88            64           70
    Iris        77           88            87           53            82           45

    Comparison based on variation in minimum support (Sup) and minimum confidence (Con)

    Dataset     Sup = 0.4, Con = 0.4       Sup = 0.9, Con = 0.9       Sup = 0.9, Con = 0.2       Sup = 0.2, Con = 0.9
                Accuracy %   Generations   Accuracy %   Generations   Accuracy %   Generations   Accuracy %   Generations
    Lenses      22           20            49           11            70           21            95           18
    Haberman    45           68            58           83            71           90            62           75
    Iris        40           28            59           37            78           48            87           55


    Comparison based on variation in crossover probability (Pc)

    Dataset     Pc = 0.25                  Pc = 0.5                   Pc = 0.75
                Accuracy %   Generations   Accuracy %   Generations   Accuracy %   Generations
    Lenses      95           8             95           16            95           13
    Haberman    69           77            71           83            70           80
    Iris        84           45            86           51            87           55

    Comparison of the optimum parameter values for the maximum accuracy achieved

    Dataset     No. of instances   No. of attributes   Population size   Min. support   Min. confidence   Crossover rate   Accuracy %
    Lenses      24                 4                   36                0.2            0.9               0.25             95
    Haberman    306                3                   306               0.9            0.2               0.5              71
    Iris        150                5                   225               0.2            0.9               0.75             87


    Inferences

    The values of minimum support, minimum confidence and population size determine the accuracy of the system more than the other GA parameters do.

    Crossover rate affects the convergence rate rather than the accuracy of the system.

    The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.

    The size of the dataset and the relationships between its attributes contribute to the setting of the parameters.


    Mining ARs using Self Adaptive GA in Java

    Methodology

    Selection             : Roulette Wheel
    Crossover Probability : Fixed (tested with 3 values)
    Mutation Probability  : Self Adaptive
    Fitness Function      :
    Dataset               : Lenses, Iris, Car from the UCI (Irvine) repository
    Population            : Fixed (tested with 3 values)


    Procedure SAGA

    Begin
        Initialize population p(k);
        Define the crossover and mutation rate;
        Do
        {
            Do
            {
                Calculate support of all k rules;
                Calculate confidence of all k rules;
                Obtain fitness;
                Select individuals for crossover / mutation;
                Calculate the average fitness of the n-th and (n-1)-th generations;
                Calculate the maximum fitness of the n-th and (n-1)-th generations;
                Based on the fitness of the selected item, calculate the new crossover and mutation rate;
                Choose the operation to be performed;
            } k times;
        }
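The procedure does not give the formula used to recompute the rates, so the following is only a hypothetical sketch of what a fitness-driven update could look like. The rule shown (shrink the rate while the average or maximum fitness is improving between generations n-1 and n, grow it when both stagnate) and the names `adapt_rate`, `step`, `lo` and `hi` are assumptions, not the author's method.

```python
def adapt_rate(rate, avg_prev, avg_curr, max_prev, max_curr,
               step=0.1, lo=0.001, hi=0.5):
    """Hypothetical self-adaptive update for a mutation or crossover rate.

    avg_prev / avg_curr: average fitness of generations n-1 and n.
    max_prev / max_curr: maximum fitness of generations n-1 and n.
    """
    improving = avg_curr > avg_prev or max_curr > max_prev
    # Exploit while the search is making progress, explore when it stalls.
    factor = (1.0 - step) if improving else (1.0 + step)
    # Keep the rate inside [lo, hi] so it can neither vanish nor explode.
    return min(hi, max(lo, rate * factor))


rate = 0.1
rate = adapt_rate(rate, avg_prev=0.50, avg_curr=0.60, max_prev=0.80, max_curr=0.80)
print(rate)  # slightly below 0.1, since the average fitness improved
```

Any rule of this kind removes the need to hand-tune a fixed mutation rate, which is the motivation for the self-adaptive variant.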


    Self Adaptive GA



    Results Analysis

    Dataset          Traditional GA                     Self Adaptive GA
                     Accuracy   No. of Generations      Accuracy   No. of Generations
    Lenses           75         38                      87.5       35
    Haberman         52         36                      68         28
    Car Evaluation   85         29                      96         21

    Dataset          Traditional GA                     Self Adaptive GA
                     Accuracy   No. of Generations      Accuracy   No. of Generations
    Lenses           50         35                      87.5       35
    Haberman         36         38                      68         28
    Car Evaluation   74         36                      96         21

    Accuracy comparison between GA and SAGA when parameters are according to termination of SAGA

    Accuracy comparison between GA and SAGA when parameters are ideal for traditional GA


    Inferences

    Better accuracy.

    Better convergence.

    Self Adaptive GA gives better accuracy than Traditional GA.


    Proposed Work

    1. To implement a distributed niched Pareto memetic algorithm for rule mining.

    2. To propose an association rule mining algorithm based on chaotic PSO and swarm intelligence.

    3. To propose a particle swarm optimization rule mining methodology combined with quantum computing and quantum differential evolution.


    Niched Pareto Selection Algorithm

    1. Obtain the comparison set S from clustering-based samples.

    2. For any two candidates and the comparison set S, if one candidate is dominated and the other is not, the non-dominated candidate is selected. Exit.

    3. Otherwise, for the two candidates (cd_1 and cd_2), compute the number of samples in their two niches, count1 and count2. If count1 = 0, cd_1 is selected, and if count2 = 0, cd_2 is selected. Exit.

    4. If count1 - count2 > delta, cd_2 is selected; if count2 - count1 > delta, cd_1 is selected. Exit.

    5. If abs(count1 - count2) ... sd2, cd_1 is selected; otherwise, cd_2 is selected. Exit.
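The selection steps above can be sketched in Python as follows. Several details are assumptions made for illustration: candidates are represented as tuples of objective values to be maximised (for example support and confidence), a niche is taken to be a Euclidean ball of a given `radius`, and the final sd1/sd2 tie-break is replaced by a simple fallback because that step is not fully legible in the slide.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximisation)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def niche_count(candidate, samples, radius):
    """Number of samples within `radius` (Euclidean) of the candidate."""
    return sum(
        1 for s in samples
        if sum((x - y) ** 2 for x, y in zip(candidate, s)) ** 0.5 <= radius
    )


def niched_pareto_select(cd_1, cd_2, comparison_set, samples, radius, delta):
    # Dominance check of each candidate against the comparison set S.
    dom_1 = any(dominates(s, cd_1) for s in comparison_set)
    dom_2 = any(dominates(s, cd_2) for s in comparison_set)
    if dom_1 != dom_2:
        return cd_2 if dom_1 else cd_1
    # Both dominated or both non-dominated: compare niche counts.
    count1 = niche_count(cd_1, samples, radius)
    count2 = niche_count(cd_2, samples, radius)
    if count1 == 0:
        return cd_1
    if count2 == 0:
        return cd_2
    # Prefer the candidate from the clearly less crowded niche.
    if count1 - count2 > delta:
        return cd_2
    if count2 - count1 > delta:
        return cd_1
    # Assumed fallback in place of the sd1/sd2 comparison.
    return cd_1 if count1 <= count2 else cd_2


S = [(0.5, 0.5)]
samples = [(0.9, 0.1), (0.88, 0.12), (0.1, 0.9)]
winner = niched_pareto_select((0.9, 0.1), (0.1, 0.9), S, samples,
                              radius=0.1, delta=0)
print(winner)  # (0.1, 0.9): its niche holds fewer samples
```

Preferring the less crowded niche is what keeps the population spread along the Pareto front instead of collapsing onto one trade-off.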


    Distributed Model

    [Figure: distributed model showing the full dataset, four GA subpopulations (GA1, GA2, GA3, GA4) and the rules generated by each subpopulation]

    Concept

    Description



    Association Rule Mining Algorithm based on Chaotic PSO and Swarm Intelligence

    Based on chaotic maps

    Swarm intelligence concept
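The slide does not say which chaotic map is intended. As one common example, the logistic map can generate a deterministic but irregular sequence in [0, 1] that chaotic PSO variants use in place of uniform random numbers. The sketch below is an assumption-laden illustration: the choice of the logistic map, the parameter `r = 4.0` and the function name `logistic_map` are not taken from the presentation.

```python
def logistic_map(x0, n, r=4.0):
    """Return n successive iterates of x <- r * x * (1 - x), starting from x0.

    With r = 4.0 and 0 <= x0 <= 1 every iterate stays inside [0, 1].
    Seeds such as 0, 0.25, 0.5, 0.75 and 1 should be avoided because
    they fall onto fixed points instead of producing a chaotic orbit.
    """
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


chaos = logistic_map(0.3, 5)
print(chaos)
```

In a chaotic PSO, values from such a sequence would stand in for the random coefficients of the velocity update, or drive the inertia weight, instead of calls to a pseudo-random generator.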



    Execution Plan

    July                : Niched Pareto sampling based selection. Implementing GA for local intensity search.

    August              : Distributed methodology implementation. Preparing the above work as a paper.

    September & October : Particle Swarm Optimization based rule mining to be implemented.

    November            : Chaotic PSO & swarm intelligence based PSO for mining ARs to be implemented. Documenting the same in a paper.

    December & January  : Study of quantum computing and differential evolution concepts.



    Papers Published

    Paper titled "Framework for Comparison of Association Rule Mining Using Genetic Algorithm" has been presented at the International Conference on Computers, Communication & Intelligence at VCET, 2010.

    Paper titled "Mining Association Rules Using Genetic Algorithm: The Role of Estimation Parameters" has been selected for presentation at the International Conference on Advances in Computing and Communications, 2011. To be published in the Springer LNCS (CCIS) series.

    Paper titled "Rule Acquisition in Data Mining Using a Self Adaptive Genetic Algorithm" has been selected for presentation at the First International Conference on Computer Science and Information Technology (CCSEIT-2011). To be published in the Springer LNCS (CCIS) series.



    Thank You