17

A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

Scientia Iranica D (2020) 27(3), 1316{1332

Sharif University of TechnologyScientia Iranica

Transactions D: Computer Science & Engineering and Electrical Engineeringhttp://scientiairanica.sharif.edu

A genetic algorithm-based framework for miningquantitative association rules without specifyingminimum support and minimum con�dence

F. Moslehi and A. Haeri�

School of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran.

Received 15 May 2018; received in revised form 14 January 2019; accepted 20 May 2019

KEYWORDSMachine learning;Data mining;Quantitativeassociation rules;Multi-objectiveevolutionaryalgorithms;Genetic algorithm.

Abstract. Discovering association rules is a useful and common technique for datamining, in which relations and co-dependencies of datasets are shown. One of the mostimportant challenges of data mining is to discover the rules of continuous numericaldatasets. Furthermore, another restriction imposed by algorithms in this area is the needto determine the minimum threshold for the support and con�dence criteria. In this paper,a multi-objective algorithm for mining quantitative association rules is proposed. Theprocedure is based on the genetic algorithm, and there is no need to determine the extent ofthe threshold for the support and con�dence criteria. By proposing a multi-criteria method,useful and attractive rules and the most suitable numerical intervals are discovered, withoutthe need to discretize numerical values and determine the minimum support threshold andminimum con�dence threshold. Di�erent criteria are considered to determine appropriaterules. In this algorithm, selected rules are extracted based on con�dence, interestingness,and cosine2. The results obtained from real-world datasets demonstrate the e�ectiveness ofthe proposed approach. The algorithm is used to examine three datasets, and the resultsshow the superior performance of the proposed algorithm compared to similar algorithms.© 2020 Sharif University of Technology. All rights reserved.

1. Introduction

Association Rule Mining (ARM) is one of the mostimportant techniques of Data Mining (DM) [1]. Itshould be noted that DM consists of a set of com-putational techniques that are applied to discoverknowledge, hidden patterns, and rules obtained fromdata in various sciences [2]. Among several techniquesfor discovering meaningful knowledge from datasets,ARM has an important role [3]. ARM is a descriptive

*. Corresponding author. Tel.: +98 21 73225019;Fax: +98 21 73225098E-mail addresses: moslehi [email protected] (F.Moslehi); [email protected] (A. Haeri)

doi: 10.24200/sci.2019.51030.1969

approach that can extract interesting patterns amonga large collection of data. Mining or generating usefulrules from the dataset is done by ARM techniques [4].The technique used for extracting association rules was�rst proposed to analyze the customers' market basket,expressing the correlation among the features.

Many algorithms have been developed to identifyassociation rules from a database. However, most ofthese algorithms are designed to deal with categoricalattributes. However, the data in real-world databasesconsist of both categorical (e.g., education) and quanti-tative attributes (e.g., weight). The di�culty in dealingwith both quantitative and categorical attributes andthe problem of working with large databases are thereasons that make it complex to extract rules form bothquantitative and categorical data [5]. The majority ofthe proposed methods produce numerous laws along

Page 2: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1317

with a large number of �elds, and they cannot minethe association rules. These methods apply the ideaof dividing a range of values of all numerical �eldsinto a series of distances, called discretization [6];therefore, the information will be lost. Moreover,most of the proposed methods need to determine thethreshold limit value such as minimum support andminimum con�dence before the DM process begins.Many developed rule mining algorithms need to specifythe minimum support threshold and the minimumcon�dence threshold that cause these methods to bedependent on datasets, and they must be run severaltimes [7]; furthermore, it is di�cult to de�ne them foreach database.

The appropriate rules for the Quantitative As-sociation Rules (QAR) process are selected accordingto several criteria and objects. Multi-Objective Pro-gramming (MOP) has been one of the most commonresearch areas [8]. Due to the growing importance ofMOP, a wide diversity of Multi-Objective EvolutionaryAlgorithms (MOEAs) has been developed in di�erent�elds [9{11]. In this study, the MOEA is used tooptimize the QAR problem. This approach can leadto an optimal solution to QAR problem.

This study aims to identify both quantitative andcategorical association rules using a Multi-ObjectiveGenetic Algorithm (MOGA) approach. In the pro-posed approach, there is no need to determine theminimum support threshold and the minimum con�-dence threshold, and rules are extracted without anydiscretization of a dataset. Three criteria includingcon�dence, cosine2, and interestingness are used toevaluate the generated rules. The criterion of cosine2

in this method has been considered for selecting usefulrules in the �tness function for the �rst time. Sincethis criterion is considered to evaluate the resemblancedegree of antecedent and consequent and the �tnessfunction chooses rules based on this criterion, the rulesresulting from this method enjoy a higher con�dencelevel than other proposed methods. The main objectiveof this paper is to develop a novel framework forQuantitative Association Rule Mining (QARM) usinga Genetic Algorithm (GA), where there is no need todetermine the extent of the threshold for the supportand con�dence criteria. To do so, the criterion cosine2

is applied with the aim of increasing the number ofextracted rules and improving the support percentageand the con�dence measure.

The present research aims to answer the followingquestions:

� Will the clustering of records and the implementa-tion of the QAR algorithm improve the e�ciency ofQAR algorithm?

� Does the use of the con�dence, cosine2, and in-terestingness criteria in the �tness function of an

evolutionary algorithm (EA) lead to the extractionof attractive and useful rules from a numericaldataset?

� Can the adoption of a multi-objective �tness func-tion increase the percentage of support and con�-dence criteria of extracted rules?

This paper is organized as follows. Section 2presents a brief overview of MOEAs for mining quanti-tative association rules. Section 3 introduces the pro-posed model. Then, Section 4 describes experimentalresults. Finally, Section 5 draws the conclusions.

1.1. Related worksIn recent years, the frequent association rule hasbeen one of the interesting research topics in the�eld of descriptive DM. Therefore, a large number ofe�cient algorithms have been proposed by researchersin this �eld. Kaya and Alhajj [12] developed a hybridapproach based on GA and clustering to �nd fuzzysets used for mining meaningful association rules. Inaddition, they [13] proposed a cluster-based GA withtwo �tness functions related to minimum support andcon�dence. Alatas and Akin [14] proposed a novel EAbased on rough particle swarm optimization algorithmto solve the problem of numerical ARM. Ayubi etal. [15] developed a novel method for mining generalassociation rules from tabular data based on GA. Thismethod was improved in terms of time complexityand memory requirements. In this method, candidategeneral item sets were stored in a tree structure. Nasiriet al. [16] in 2010 applied simulated annealing as amulti-objective numerical ARM. Qodmanan et al. [17]proposed a method based on GA without taking theminimum support and con�dence into account. Thebest rules indicate that a perfect correlation betweensupport and con�dence has been drawn out from adataset by their algorithm. A new algorithm basedon Ant colony optimization was proposed by Moslehiet al. [7]. This algorithm was suggested for miningnumerical rules without determining minimum sup-port. Djenouri et al. [18] proposed an e�cient hybridalgorithm based on bees swarm optimization and Tabusearch to extract rules. Agbehadji et al. [19] applieda meta-heuristic algorithm to �nd interesting rules innumerical datasets. Wolf search algorithm techniquewas applied in their proposed approach. Can andAlatas [20], for the �rst time, applied GravitationalSearch Algorithm (GSA) to search for a quantitativeassociation rule. The results of their method havedemonstrated that GSA is an e�ective technique for�nding rules in numerical datasets.

A review of the literature reveals that thereare many researchers who have used EAs, especiallyGas [21], for �nding QARs from datasets with quan-titative values [22,23]. GENetic Association Rules

Page 3: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1318 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

(GENAR) is an EA-based approach, as proposed byMata et al. [24]. The �tness function of this algorithmwas based on support, con�dence, and cover measures.They [25] also presented an EA for extracting thefrequent item sets in numerical databases. Theirmethod was developed based on GA. They used anelastic strategy for the selection phase. Alatas andAkin [22] developed a GA to mine positive and neg-ative association quantitative rules. The tournamentselection method and uniform crossover were used inthis algorithm. In their method, there is no need todetermine the minimum support and the minimumcon�dence. QuantMiner was proposed by Salleb-Aouissi et al. [26]. They applied GA for miningquantitative association rules and de�ned the �tnessfunction based on gain measure to select appropriaterules. Taboada et al. [27] developed a hybrid methodbased on graph theory and GA called Genetic NetworkProgramming (GNP). The purpose of this algorithmis to extract rules among numerical datasets. Yan etal. [28] presented an evolutionary tool Evolutionary As-sociation Rules Mining with Genetic Algorithm (EAR-MGA) for �nding association rules in quantitativenumerical databases. The proposed GA performs an al-gorithm that does not rely upon the minimum supportand the minimum con�dence thresholds. Mart��nez-Ballesteros et al. [29] developed a new algorithm calledQARGA (Quantitative Association Rules by GeneticAlgorithm). A roulette selection method was used intheir proposed algorithm. Mart��n et al. [30] applied GAto maximize the interestingness, comprehensibility, andperformance of quantitative association rules. �Alvarezand Vazquez [5] designed a GA-based strategy (GAR-plus) for identifying association rules without the needto partition the domain of the numerical attributes.Three measures of con�dence, interestingness, andcomprehensibility as �tness functions were de�ned for�nding meaningful quantitative association rules byMinaei-Bidgoli et al. [31]. The proposed algorithm(MOGAR) draws on the Michigan approach for rep-resenting rules in chromosomes. Mart��n et al. [32]proposed QAR-CIP-NSGA-II, a mining quantitativeassociation rules system. This method is based onGA that extracts good intervals in association rules.They [33] also presented another algorithm to mineinteresting quantitative association rules by using aNiching GA. By drawing on Greedy Particle SwarmOptimization (GPSO), a hybrid mechanism was pro-posed by Indira and Kanmani [34] for the ARM.GPSO improves the ARM by integrating the uniquefeatures of both GA and PSO. The GPSO methodologyoutperformed the GA and PSO algorithms in ARMregarding predictive accuracy and consistency whentested on �ve benchmark datasets. Agarwal andNanavati [35] developed a novel hybrid ARM algorithmbased on PSO and GA algorithms. The estimation

of their method for Bakery dataset has shown thatthis algorithm is an e�ective technique for �ndingrules in non-numerical datasets. Sarkar et al. [36]proposed a new rule mining algorithm by using GA thatselects and extracts useful rules according to optimalsupport and con�dence. Djenouri et al. [37] proposeda novel parallel for ARM with the aim of reducingcomputation time in �nding appropriate rules process.This algorithm used GA that was implemented onGraphics Processing Units (GPUs) cluster. Kumar andSingh [38] proposed a mining quantitative associationrule system. This method is based on a GA thatextracts good intervals in association rules. Mart��nez-Ballesteros et al. [39] attempted to apply GA toimprove the scalability of the methods for QARM and,thus, to process large datasets without sacri�cing qual-ity. The results of the comparative analysis mirroredthe signi�cant improvements made by the proposedmethods in the computational costs of QARGA-Mand previously developed GAs without reducing thequality of the results. Padilla et al. [40] developeda new genetic programming algorithm for large dataARM. The genetic operators involved in our proposedtechnique were particularly designed to avoid highsolution complexity and any improvements in valuesof the �tness function. In 2018, a new generic parallelframework called MRQAR was proposed by Mart��n etal. [41] to unveil the QAR in large amounts of data.The MapReduce paradigm formed the basis of thisapproach along with Apache Spark. The incrementallearning process provided by MapReduce QuantitativeAssociation Rules (MRQAR) can apply any sequentialQAR algorithm to large datasets without redesigningthe algorithm.

Table 1 summarizes the applied measures for bothevaluation and optimization in several works that arereviewed in this section.

As mentioned above, the researchers have care-fully considered the ARM domain and that the appli-cation of EAs has been welcomed because they cansearch for and solve problems in this area (Table 1).Initially, Studies have focused more on the ARM inthe categorical attributes' dataset. Furthermore, sincemany datasets in the real world are characterizedby continuous numerical values, it is necessary toprovide algorithms for identifying association rulesfrom a dataset with quantitative attributes. Recently,numerous methods have been proposed in the QARarea, many of which have used the metaheuristicapproach. In the QAR area, which is the subjectof this paper, several metaheuristic algorithms havebeen used in many studies. GA is the most widelyused evolutionary technique for QAR problems dueto its longer historical background. GA has beenhighly enhanced so far, the general motivation of whichpoints to search mechanism, representation, and �tness

Page 4: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1319

Table 1. Quality measures used in the literature.

Quality measures considered

Related work Support Con�dence Recovered Comprehen.a or# attributes

Amplitude Interest Leverage

GENARp p p

GARp p p p p

Alatas & Akinp p p p p

QuantMinerp p

GNPp p

EARMGAp

QARGAp p p p p

NSGA-II-QARp p p p

GAR-plusp p p p p

MOGARp p p p

QAR-CIP-NSGA-IIp p p

NICGARp p p

GPSOp p p

Agarwal & Nanavatip p p p

Sarkar et al.p

CGPUGAp p

Kumar & Singhp p p

QARGA-Mp p p

Padilla et al.p p p

MRQARp

a: Comprehen.: Comprehensibility

function. Di�erent criteria have been applied to selectuseful rules in the works reviewed. Di�erent combi-nations of these criteria have been applied in di�erentstudies. In this study, the cosine2 measure is used forthe �tness function along with two other criteria. Ascan be seen in Table 1, this criterion has not beenused in the previous work to extract rules in QARalgorithms. A hybrid of interestingness, con�dence,and cosine2 has been adopted in the �tness functionof the proposed approach. Similarly, a combinationof these criteria is used in previous works includingMOGAR [31]. Nevertheless, the advantage of usingcosine2 is that it examines the degree of similaritybetween the antecedent and consequential extractiverules and, thus, chooses rules that are more appropriateand useful. This property has made extraction rulesmore qualitative in this study than the previous oneswith the same �tness function. The combination ofcriteria has increased the support and con�dence ofselected rules shown in the comparison results, aspresented in Section 4 of this paper.

Another point that has been highlighted in pre-vious works is the change of trend in the ARM �eldresearches into a rule-extracting algorithm from large-scale datasets. Due to a large amount of information

and the progress of data storage equipment, the datadimension is increasing. Therefore, the need to providemethods for dealing with big data is necessary morethan ever. In the QAR area, researchers have notignored this subject, and papers such as Mart��nez-Ballesteros et al. [39] and Mart��n et al. [41] have beenproposed to extract quantitative association rules inthe large-scale dataset.

2. Multi-objective evolutionary algorithms

ARM is considered to be one of the most commontechniques of DM to discover patterns and rules [42].Association rules that are applied to �nd a correlationbetween the existing set of items in a dataset arede�ned as X ! Y , where X and Y represent thesets of items and X \ Y = � [1; 32]. If the intervalis continuous, association rules are considered to bequantitative, and each item is a pair of distanceproperties [43]. The general classi�cation of di�erentmethods of association rules is shown in Figure 1.In order to discover rules in a continuous numericaldataset, di�erent techniques such as distribution anddiscretization are adopted. An exponent is used toindicate a numerical interval, which can also be used

Page 5: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1320 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

Figure 1. Numerical ARs mining approaches [46].

for the formation of rules. However, the application ofsuch techniques and the discretization of data lead tothe loss of valuable information and cause the resultingrules not to be precise enough. In these methods, thedesirability of rules is only assessed based on a singlecriterion.

Some researchers have recently removed somerestrictions of the current methods and consideredassociation rules rather than a single-objective one[44,45]. A variety of types of objectives are consideredin the process of mining association rules, resulting inmore interesting and precise rules.

The investigation of association rules in DM con-sists of a procedure used for determining the relationsbetween the properties or between their values in adataset so as to facilitate the decision-making. Theserelations can be presented in the form of IF-THEN.The condition examined in the IF section is calledAntecedent, and what the THEN section contains iscalled Consequent. Thus, the relation between the twocan be shown as X ! Y . Di�erent criteria are used tostudy the value and desirability of such rules, the mostimportant and common of which are the coe�cients ofcon�dence and those of support.

� Support: This measure indicates the number oftimes a rule has appeared in transaction data.This measure is computed through Eq. (1). Thenumber of records covered by both antecedent andconsequent of the rule is determined by the supportas a statistical measure, which is an indicator thatpresents the number of times a set of items comesalong in a dataset.

Support (X ! Y ) =SUP (XY )jDj : (1)

� Con�dence: According to con�dence, if the �rstside of a rule holds, the second side will happen,as well. Thus, it indicates the validity of a rule andis calculated as expressed in Eq. (2). The strengthof a rule is measured by the con�dence recognized aspredictive accuracy, which is a demonstration of thevalidity of a rule. The quality of a rule is assessedby the con�dence criterion based on the occurrencefrequency of an AR in the entire dataset. Whena rule occurs more frequently in the dataset, it isaccounted to be of better quality [46].

Confidence (X ! Y ) =SUP (XY )SUP (X)

: (2)

� Interestingness: This measure uses the con�denceof the antecedent and consequent to calculate theinterestingness of the rules and is described asin Eq. (3). It consists of three sides: the �rstside expresses the probability of production ruleaccording to the antecedent. The second sideexpresses the probability of creating a rule accordingto the consequent, while the third side indicatesthe probability that a rule may not be producedaccording to the whole dataset. What this criterionis that rules with a high degree of con�dence arecharacterized by a lower degree of interestingness[31]. As a part of DM, the ARM process needsto excerpt some unseen information, that is, thereshould be relatively a smaller occurrence number ofthe extracted rules in the entire database. While

Page 6: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1321

it is challenging to measure such astonishing rules,users may �nd themselves more interested in theirextraction. For classi�cation rules, this factor canbe measured by information gain theoretic measures[47], which is not of e�ective interestingness for thecalculation of ARs. This study uses a di�erentapproach according to the support count of bothantecedent and consequent parts [48]. Interesting-ness (X ! Y ) of a rule indicates a measure of therule's surprisingness to the user. The discovery ofsome unseen information from the data is the mostimportant concept in ARM. Not only are rules withhigher frequency discovered by the interestingnessof a rule, but also they are relatively less frequent inthe database [49].

Interestingness =�SUP (XY )SUP (X)

���SUP (XY )SUP (Y )

���1� SUP (XY )

SUP (D)

�: (3)

� Cosine2: The criterion Cosine2 measures the simi-larity between the consequent and antecedent. Thequantity amplitude of such a law ranges from 0 to 1,and its square is also present in this amplitude. Acosine2 value of approx. 1 suggests that the majorityof transactions holding item X also contain item Y ,and vice versa. Likewise, a cosine value of nearly 0speci�es that most transactions containing item Xdo not possess item Y , and vice versa [49]. Most ofthe transactions possessing item X also hold itemY , and vice versa (i.e., many of transactions havingitem Y contain item X, too) when cosine2 valueapproximates to one. With the approximation ofcosine2 value to zero, the majority of transactionspossess item X, but lack item Y , and vice versa [50].The square of this measure is used in this paper sinceit is more accurate [51]. It is de�ned through Eq. (4):

Cosine2 =SUP (XY )SUP (X)

� SUP (XY )SUP (C)

: (4)

2.1. Genetic Algorithm (GA)In recent years, most researchers have used EAs[14,33,52], especially GA [21,28,32], in order to discoverquantitative association rules of quantitative datasets[23]. GA is a widely renowned evolutionary approach,which randomizes the generation of a population ofstrings. The best string in the population will accessan optimal solution by applying evolutionary operatorssuch as crossover, mutation, and reproduction [53].GAs are mathematical tools with a wide range of appli-cations. They are very e�cient in the optimization ofthe problems, especially when the respective objectivefunctions are discontinuous and exhibit many localoptima [54]. GA is a heuristic algorithm of random

search, which denotes the biological evolution. Suchalgorithms have been successfully used in solving hardoptimization problems, those for which the steepestancestry techniques go through local minima or remainincapable due to the complications [55]. Processing ofnumerical dataset for getting information is a tediousand computationally complex task because of thenature of the data. ARM for a numerical dataset isa challenging task as the huge number of expectedrules will be generated. Therefore, a technique thatruns for many solutions in parallel is required so thata large number of rules can be processed. The pre-existing technique that follows these criteria is thegenetic algorithm that has the property of running forsolution in parallel and improving their own solutionin all iterations, i.e., generation in terms of GA. Giventhe features of the GA such as parent selection, muta-tion, and crossover, this algorithm passes appropriateresponses to the next generation. In the QAR process,the exploration property is required for providing anumber of answers and the exploitation property fortheir desirability. Given the steps in the GA, thisalgorithm allows for two exploration and exploitationneeds. Therefore, the GA is selected to solve the prob-lem of ARM. Based on the evolutionary ideas of naturalselection and genetic, GAs are adaptive heuristic searchalgorithms. The concept of GAs is basically intendedto mimic natural system processes essential for evo-lution. Thus, GAs manifest the intelligent utilizationof a random search within a de�ned search space forproblem solving. There is little knowledge about thefact that GAs are among the best methods for solvinga problem. As a very general algorithm, it will functionproperly in any search space. GA is a stochasticsearch algorithm, modeled on the natural selection pro-cess underlining biological evolution [56]. Numeroussearch, optimization, and machine learning problemshave successfully employed GA, working iteratively byproducing new populations of strings from the olderones. Each string is an encoded binary, real, etc.version of a candidate solution. An evaluation functionfollows the �tness criterion for each string, suggestingits �tness with the problem [57]. Because of its longerhistorical background, GA is the most commonly usedevolutionary procedure for solving QAR problems. GAhas received many improvements, of which the generalattention is given to search mechanism, representation,and �tness function.

3. Proposed method

In this paper, an MOEA is proposed for discoveringquantitative association rules (MOEA-QAR), as ex-plained in this section. This method is a combinationof the MOGAs and clustering. In this method, �rst,the dataset is clustered; then, the records of each

Page 7: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1322 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

cluster are determined as the input of each GA and,then, the attractive rules of each cluster are extracted.Characterized by a number of advantages and practi-cality, the K-means clustering algorithm is employedin the present research. The K-means algorithm isadvantageous because of its simplicity and rapidityfor low-dimensional data. It can also detect puresub-clusters when de�ning a huge number of clusters.Besides, K-means computation is considered to bequicker than hierarchical clustering and is projectedto yield tighter clusters than hierarchical clustering[58]. In the K-mean algorithm, there is a possibilityto change and select the number of clusters by theuser. This makes the K-mean algorithm exible unlikehierarchical algorithms. In this algorithm, according tothe requirements of the problem, the expert can test anumber of di�erent clusters and determine the propernumber of clusters according to the results. The K-mean algorithm allows the user to participate and isin fact an interactive process between the algorithmand the expert. In the K-means method, the recordsare grouped so that the records inside a cluster can besimilar (SSW ) and the records in the individual clustersbe di�erent (SSB). This algorithm attempts to mini-mize SSB and maximize SSW . Therefore, it providesan appropriate tradeo� for forming clusters. Since thesimilarity of the records for rule mining algorithm canbe an important advantage, the K-means method isselected for the clustering records. Because of thesebene�ts, the criteria for evaluating the rules extractedfrom each cluster are calculated by using the wholedataset. In this method, there is one archive to keep theattractive rules of each generation. Finally, the rules orchromosomes remaining in the archive are introducedas the extracted rules. This idea is similar to theelitism method in the GA that was applied to storethe best chromosomes of each generation. The �rstinput generation is selected randomly, and the initialchromosome gains a value. The algorithm is run untilthe terminating conditions are met. These conditionsmay include the number of algorithm repetitions or thegiven number of the created rules. The running of thealgorithm ends if the number of the rules existing inthe archive remains unchanged. In the following, �rst,the method of presenting chromosomes is explained;then, the mutation, crossover, and calculation of the�tness function are expressed. The pseudocode of theproposed algorithm is described in Figure 2, and the owchart of the algorithm is shown in Figure 3.

3.1. Genetic operationsLike the methods that rely on GAs, in the �rst step,the encoding or representation of the applied chromo-somes must be de�ned. Then, how to compute the�tness value of chromosome, mutation, and crossoveroperations is described.

3.1.1. Presentation of chromosomesEach chromosome shows a rule. Each feature of thedataset is presented by a gene. The ith gene expressesthe ith feature. Each gene includes 2 bits and 2numbers. The �rst 2 bits determine whether thefeature is involved in creating the rule or not and theyalso determine the section of the rule where the featureis present. The bit 00 represents the presence of thisfeature in the rule consequence; further to this, if thesetwo �elds are equal to 11, this feature is involved increating the rule. In addition, if these two bits are 01or 10, then this feature is not involved in the creationof the rule. The two other numbers represent the lowerand upper limits of the given feature interval. The �rstnumber is considered as the lower limit of the feature,and the second number is the upper limit.

Genei = (aci; lbi; ubi) ; i = 1; :::;m;

CT = Gene1Gene2::::Genem:

3.1.2. Fitness functionTo achieve the projected aim, it is required to de�nea suitable objective function that comprises a numberof parameters that are sensible enough to identify thewhole plausible structural damages [59]. As previouslymentioned, the proposed method is of multi-criteriatype, and several criteria are used to choose the gener-ation. The �tness of each chromosome is measured bythree parameters, as shown in Eq. (5). This functionattempts to extract the rules that are more valuableand attractive. In this method, the problem is solvedusing non-dominated solutions. If solution (B) is betterthan solution (A) with respect to the pre-determinedcriteria, then solution (A) is dominated by anothersolution, i.e., solution (B). In this case, solution (B)is called the non-dominated solution. The value ofeach rule is assessed by three parameters such as in-terestingness, cosine2, and con�dence coe�cient. Thechromosomes that can dominate other chromosomeswith respect to the values of these three factors willremain in the archive. The rules are ranked based onthese three factors, and the rules with greater scoreswill be extracted as the �nal rules.

Fitness = Confidence+ Interestingness

+Cosine2: (5)

The value and attractiveness of these rules arestudied based on di�erent criteria, all of which aimto excerpt bene�cial rules, with each criterion usedfor assessing the rules from a di�erent viewpoint.Although there are di�erent de�nitions of some of thesebenchmarks, they share a goal, viz. elevation and vivid-ness, both of which attempt to choose fascinating rules.Most often, it su�ces to concentrate on the integration

Page 8: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1323

Figure 2. The pseudocode of the proposed algorithm.

Figure 3. The owchart of the proposed algorithm.

of support, con�dence, and either elevation or in uenceto measure the \quality" of a rule quantitatively.However, the real value of a rule is subjective in termsof e�ectiveness and functionality and is strongly depen-dent on the speci�c area and business purposes [60].

3.1.3. Crossover and mutation operatorsRegarding the methods of presenting the chromosomesand calculating the �tness function, the crossover and

mutation operators are introduced in this section. Theposition of the features does not change at mutationand intersection stages, i.e., if a variable represents thesecond gene, it will remain the second gene after theaforementioned stages. A two-point crossover methodis used to perform the crossover operator, as shownin Figure 4. A high rate of crossover causes thegenomes in the next generation to be more random,as there will be more genomes that represent a mix of

Page 9: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1324 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

Figure 4. A simple example of the crossover operator.

previous generation genomes. A low rate of crossoverkeeps the �t genomes from the previous generation,although it decreases the chance that a very �t genomewill be produced by the crossover operation. Uniformcrossover will create genomes that will be very di�erentfrom their parents if their parents are not similar. Ifits parents are similar, the o�spring will be similar toits parents. Using the 1-point crossover means thato�spring genomes will be less diverse; they will be quitesimilar to their parents. Therefore, the trade of theseoptions is appropriate, and the two-point crossover ischosen in this paper. Both parental genotypes in atwo-point crossover are cleaved at two points, makinga new progeny through parts one and three from the�rst and the middle part from the second ancestor.A more thorough searching problem space is madepossible through the two-point crossover. The schemaremains safe against disruption by applying an operatorwith single-point and two-point crossovers; however,search space grows narrower in case of a homogeneouspopulation [61].

The �rst 2 bits of the chromosome can changeto do the mutation process, and the given featuremay be involved in creating the rule or not. The 4states mentioned for these two bits (00, 11, 10, and01) are changed into one another randomly. Also,the lower and upper limits of the values can change.Of course, one should be careful that increasing orreducing the upper and lower limits must not exceedthe permissible variables and variable limits; by thesame token, the lower unit should not exceed the upperlimit. The changes to the mutation process are maderandomly.

According the output of implementing algorithmon di�erent datasets that are shown in Figures 5 and6, the provided solutions had exploitation requirement.These �gures proved the convergence of the proposedresponses.

To meet the exploration requirement, it can be ex-pressed that all of the required changes and alterationshave been considered to generate new chromosomes.Determining the crossover, initial population, and mu-tation provided su�cient changes and the populationgeneration. To do so, all types of mutation are appliedto chromosomes including 2 bits and the lower or upperlimits of the values mutation. The �rst 2 bits of thechromosome can be modi�ed to perform the mutationprocess, and the given feature may be involved increating the rule. The four states mentioned for these

Figure 5. The output of the implementation of algorithmon basketball dataset.

Figure 6. The output of the implementation of algorithmon basketball dataset.

two bits (00, 11, 01, and 10) change into one anotherrandomly. Moreover, the lower and upper limits of thevalues can vary. Of course, one should exercise carewhen increasing or reducing the upper and lower limitsno to exceed the permissible variables and variablelimits, and also the lower unit should not exceed theupper limit. The changes in the mutation process aremade randomly. The two-point crossover method hasbeen used for the crossover operator. A high crossoverrate causes the genomes in the next generation to bemore random, as there will be more genomes thatare a mix of previous generation genomes. A lowcrossover rate keeps the �t genomes from the previous

Page 10: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1325

generation, although it decreases the chance that a very�t genome will be produced by the crossover operation.The uniform crossover will create genomes that will bevery di�erent from their parents if their parents arenot similar. If its parents are similar, the o�springwill be similar to its parents. The application of 1-point crossover means that o�spring genomes will beless diverse; they will be quite similar to their parents.Therefore, the trade of these options is appropriate,and the two-point crossover is chosen in this paper. Inthe two-point crossover, both parental genotypes aresplit at two points, constructing a new o�spring byusing parts one and three from the �rst and the middlepart from the second ancestor. The application of thetwo-point crossover will enable searching the problemspace more thoroughly. The application of single-pointand two-point crossover operators prevents schemafrom disruption; however, when population becomeshomogeneous, the search space becomes smaller [61].Moreover, the ability of the method to extract morerules and improve the results of the evaluation crite-ria (support and con�dence) in comparison with theresults of the previous work proves that the proposedalgorithm enjoys desirable performance at the point ofexploration.

4. Results

To evaluate the e�ciency of the proposed approach, thepresented algorithm was examined in three datasets,and the results were compared with those of similarmodels. Such datasets include properties with contin-uous and numerical quantities. The test data are shownin Table 2. The datasets are available at \the BilkentUniversity Function Approximation Repository" [62].The adjusted parameters in the implementation ofthe algorithm are presented in Table 3. To achievethe accurate results of transactions, the presentedalgorithm was implemented on each dataset more than10 times. The results are shown in Table 3. Thefrequency of the GA was considered between 100 and400 each time so that the results might be achieved indi�erent scenarios. A two-point crossover method wasused in this algorithm.

As mentioned before, �rst, the dataset is clusteredand, then, the useful rules of each cluster are minedby means of the GA. The number of clusters of eachdataset is selected according to the Silhouette value.

Table 3. The parameter settings for algorithmimplementation.

Parameter Populationsize

Crossoverprobability

Mutationprobability

Value 800 0.7 0.2

The number of clusters with higher Silhouette valuesis chosen as the optimal number of clusters. Theclustering results of these three datasets are presentedin Table 4.

In the following sections, the results of the pro-posed algorithm (MOEA-QAR) were compared withthose of previous studies in similar cases such as Alatasand Akin [22], RPSO [14], GAR [25], and MOGAR[31]. Mining association rules of continuous numericaldatasets using EAs is the main objective of suchstudies. The descriptions of the method used in theabove-mentioned algorithms are provided in the �rstpart of the introduction. In each study, a di�erent�tness function is de�ned to select and mine the �nalrules. The results of these methods are directly derivedfrom the papers presented by them. The developedalgorithm was implemented in two ways: once on theentire dataset and once on each cluster after clusteringthe data. The results of both implementations areprovided in the following sections. The comparison ofthe number of rules extracted by other algorithms andthose of the proposed approach is presented in Table 5.

According to the results shown in Table 5, theproposed model can detect more useful and interestingresults from datasets. The performance improvementof the proposed approach in comparison with that ofother methods in terms of the number of extractedrules is also shown. Another result presented in thistable is that implementing algorithm on clusters canextract more rules than implementing it on the wholedataset. The aim of applying the clustering approachbefore executing the QAR algorithm is to examine thedi�erence between the results of the QAR algorithmon the total dataset records and on the records ofa cluster. Since the records of a cluster are closerand more similar to the total records of the dataset,the goal is to determine whether this similarity hasan e�ect on the results of the QAR algorithm. Theresults demonstrate that such similarity and proximitycause the QAR algorithm to extract a greater numberof rules in a cluster than the whole dataset, and this

Table 2. De�nition of used datasets for comparisons.

Target feature No. of features No. of records Dataset

Basketball 96 5 Points per minuteBodyfat 225 18 Body heightQuake 2178 4 Richter

Page 11: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1326 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

Table 4. The results of three datasets clustering.

Dataset No. of records

Basketball Cluster 1: 16 records Cluster 2: 27 records Cluster 3: 20 records Cluster 4: 33 recordsBodyfat Cluster 1: 147 records Cluster 2: 105 recordsQuake Cluster 1: 1516 records Cluster 2: 662 records

Table 5. Comparison of the number of the obtained rules.

Dataset Alatas & Akin[22]

RPSO[14]

MOGAR[31]

MOEA-QAR(this study)

Total data set: Cluster 1: Cluster 2: Cluster 3: Cluster 4:Basketball 33.8 34.2 50 54.71 61 57.27 57 56.37

Total data set: Cluster 1: Cluster 2:Bodyfat 44.2 46.4 84 84.33 84.4 84.66

Total data set: Cluster 1: Cluster 2:Quake 43.8 46.4 44.87 54.28 66.16 57.4

superiority is shown in Figures 7{9. As is clear in these�gures, the number of extracted rules from the totaldataset is shown by the blue columns, the number ofextracted rules from each cluster is determined by theyellow columns, and the green columns illustrate thetotal number of rules from all clusters. Whereas thenumber of records in a cluster is less than the wholedataset, the percentage of support rules decreases.

Table 6 shows the results of con�dence coe�cient,and Table 7 shows the support percentage obtained by

Figure 7. Comparison of the number of extracted rulesfrom the total bodyfat dataset and clusters.

Figure 8. Comparison of the number of extracted rulesfrom the total basketball dataset and clusters.

Figure 9. Comparison of the number of extracted rulesfrom the total quake dataset and clusters.

implementing the proposed algorithm along with theresults obtained from previous studies. The averages ofthese coe�cients are presented in the above-mentionedtables. As Table 6 shows, this method can signi�cantlyimprove the con�dence of rules. The rules extracted bythis method had a higher con�dence coe�cient thanthe previous methods. In the mode of implementingalgorithm, support percentage would be very highfor extracting the rules. In this case, the supportpercentage of rules of DM via this method also showeda signi�cant di�erence compared to previous methods,and there was an improvement. Based on the obtainedresults from the two di�erent modes of algorithm, itcan be mentioned that the rules extracted from thewhole dataset had higher support percentage thanthose of clusters. The improvement of results in termsof support and con�dence indicated that the resultsobtained by the proposed method were more reliableand that the algorithm used to discover associationrules outperformed other methods.

The criterion of interestingness in Table 8, whichis used in this study, is compared with MOGARmethod in the �tness function. Based on the results

Page 12: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1327

Table 6. Comparison of con�dence (%) of the obtained rules.

Dataset Alatas & Akin[22]

RPSO[14]

MOGAR[31]

MOEA-QAR(this study)

Total data set: Cluster 1: Cluster 2: Cluster 3: Cluster 4:Basketball 60 6 83 98 97.1 97.7 96 97.6

Total data set: Cluster 1: Cluster 2:Bodyfat 59 61 85 99.4 99.4 99.4

Total data set: Cluster 1: Cluster 2:Quake 62 63 82 99 99 99

Table 7. Comparison of support (%) of the obtained rules.

Dataset Alatas & Akin[22]

RPSO[14]

GAR[25]

MOGAR[31]

MOEA-QAR(this study)

Total data set: Cluster 1: Cluster 2: Cluster 3: Cluster 4:

Basketball 32.21 36.44 36.69 50.28 83.11 34.27 43.34 27.27 44.69

Total data set: Cluster 1: Cluster 2:

Bodyfat 63.29 65.22 65.22 65.26 86.57 76.70 71.34

Total data set: Cluster 1: Cluster 2:

Quake 38.74 38.74 36.96 30.12 60.11 59.8 40.12

Table 8. Results of interestingness criteria.

Dataset MOGAR MOEA-QAR (this study)

Basketball 0.53 Total data set: Cluster 1: Cluster 2: Cluster 3: Cluster 4:0.213 0.182 0.119 0.071 0.206

Bodyfat 0.56 Total data set: Cluster 1: Cluster 2:0.209 0.230 0.173

Quake 0.46 Total data set: Cluster 1: Cluster 2:0.173 0.161 0.151

Table 9. Results of cosine2 of the obtained rules.

Dataset MOEA-QAR (this study)

Basketball Total data set: Cluster 1: Cluster 2: Cluster 3: Cluster 4:0.93 0.497 0.49 0.309 0.584

Bodyfat Total data set: Cluster 1: Cluster 2:0.882 0.827 0.819

Quake Total data set: Cluster 1: Cluster 2:0.61 0.615 0.485

obtained from Table 8, the rules extracted by MOGARmethod are characterized by higher interestingness. Assaid before, regarding the de�nition of criterion of in-terestingness, the rules with higher support percentagewill have lower interestingness, and since the rulesexamined by the proposed method in this study havehigher support percentage than MOGAR method, theyhave a lower interestingness coe�cient. The results ofcosine2 criterion are presented in Table 9. Since this

criterion has not been used in algorithms presented sofar, it has not been compared to other methods.

Regarding the results of the proposed algorithm,it could be said that the proposed method outper-formed the proposed algorithm at the same tasks, andit would be an appropriate and trustworthy algorithmfor �nding useful and attractive rules on the sets ofcontinuous numerical data. In comparison to othermethods, this method achieved signi�cant improve-

Page 13: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1328 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

Table 10. The improvement of MOEA-QAR compared to other methods.

Previous methods No. of rules Con�dence support%

Alatas & Akin [22] 35.03% 38.93% 41.22%RPSO [14] 32.32% 37.92% 38.78%

MOGAR [31] 8.77% 15.65% 38%

ments with respect to the number of the rules extractedfrom the dataset, high con�dence measure, and supportpercentage. Table 10 shows the average improvementsof di�erent criteria resulting from the rules extractedby the proposed method in the examined dataset inrelation to the compared methods. In this table, thepercentage of progression of the proposed method tothe comparative methods is mentioned in terms of thecriteria considered. The superiority of the proposedmethod is surveyed based on the following perspectives:the number of extracted rules, the support percentage,and the con�dence measure. These criteria are themain and basic factors in evaluating ARM algorithmsand are used to prove the e�ciency of the proposedapproach. These factors were used in all of the previousworks, some of which are mentioned in the \relatedworks" section. The highest value of improvementbelongs to the criterion of con�dence. This methodcan extract more rules with higher support percentagecon�dence coe�cient in relation to other methods ofsets of continuous numerical data. The criterion ofcosine2 has not been used in previous methods. Sincethis criterion evaluates the degree of resemblance ofantecedent and consequent and �tness function choosesrules based on this criterion, this criterion has positiveimpact on the results of extracted rules in terms ofcon�dence measure.

5. Discussion

In this paper, a novel QAR adopts GA algorithm. The�tness function is de�ned based on Con�dence, cosine2,and Interestingness criteria. The e�ciency of the algo-rithms proposed is assessed and compared regardingthree measures (the number of rules, con�dence, andsupport). The proposed method is compared withthose in previous works (i.e., Alatas and Akin [22],RPSO [14], GAR [25], and MOGAR [31]) that aregenerated using GA to evaluate the e�ectiveness ofthe proposed algorithm. As mentioned before, thisalgorithm was implemented in two separate ways: oneway on the entire dataset and the other on the clustereddataset. In the latter case, more rules are identi�edfrom the dataset. The aim of using the clusteringapproach before implementing the QAR algorithm isto examine the di�erence between the results of theQAR algorithm on the total dataset records and thoseon the records of a cluster. Since the records of

a cluster are closer and more similar to the totalrecords of the dataset, the goal is to determine whetherthis similarity has an e�ect on the results of theQAR algorithm. The results demonstrated that thissimilarity and proximity caused the QAR algorithm toextract a greater number of rules in a cluster than thewhole dataset. Whereas the number of records in acluster is less than the whole dataset, the percentageof support rules decreases. The average number ofrules extracted from di�erent methods is comparedand shown in Table 5, indicating that the presentedmethod is capable of extracting further rules thanother approaches. Comparisons between the MOEA-QAR and the aforementioned approaches regardingthe average value of support criterion and the averagecon�dence of ARs are shown in Tables 6 and 7; theseare considered to be the power of ARs. In comparisonto other approaches, MOEA-QAR absolutely rivals thecon�dence and support of ARs. The MOEA-QAR-extracted con�dence of ARs has substantially enhancedthe concerning con�dence and support of the rulesextracted in all of the datasets.

This study characterizes the problem of miningnumerical ARs as a multi-objective optimization prob-lem. In addition, the MOEA-QAR algorithm, namelyMOEA-QAR, is recommended for mining all preciseand understandable ARs. A summary of the majorcharacteristics of the proposed algorithm is presentedbelow:

� The con�dence, comprehensibility, and interest-ingness measures are concurrently optimized byMOEA-QAR by making no item sets;

� Unlike the conventional ARM approaches, the userdoes not determine the minimum support andminimum con�dence, hence automating the miningprocess;

� In the majority of the experiments, the success of theproposed method demonstrates that MOEA-QARis a method capable of numerical ARM featuringa simple particle design with no need for a prioridiscretization;

� The results of using clustering method and the QARalgorithm in a hybrid framework con�rmed that theQAR algorithm enjoyed the ability to mine a highernumber of rules in a cluster than the entire dataset.

The dynamic and competitive nature of today's

Page 14: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1329

business environment has prompted managers andbusiness analysts to apply Decision Support Systems(DSS) to provide quick answers to complex businessquestions. The organizations will not be able to meetcustomer expectations without a deeper awarenessof customers' habitual and behavioral patterns, andthis can have a signi�cant e�ect on their business.The rapid growth of the volume of data and theimportance of using DSS systems demonstrate thenecessity of adopting DM techniques. The applica-tion of DM technology enables the companies to paygreater attention to the customers' interactions, gainproper knowledge more rapidly through behavioral andsystemic information, create more e�ective practicalattitudes, and eventually function more wisely basedon such attitudes. The decision-makers will accomplishtheir chief useful purposes by e�ectively managingand analyzing the volume, velocity, and variety ofthe present data, which are rapidly increasing, andbene�ting from proper tools and skills for a better un-derstanding of their operations, customers, and market.In this study, a novel hybrid QAR approach is proposedfor numerical datasets. DM is concerned with �ndingthe hidden relationships present in business data inorder to allow businesses to make predictions for futureuse [63]. ARM is one of important techniques in DMthat provides valuable information for decision-makersby extracting helpful and applicable rules. The methodcan extract more rules with higher con�dence measureand support percentage than similar and previousapproaches. A larger number of rules with higherquality allow decision-makers to have more alternativesand make more appropriate executive decisions.

6. Conclusion

In the present study, a new approach using multi-objective Genetic Algorithms (GA) was proposed tomine the numerical association rule. One of themost important applications and interesting �elds ofData Science is to extract association rules. Ex-tracting the rules of the dataset containing continu-ous data is one of the challenges of this �eld. Inthis study, a multi-criteria approach based on GAwas developed to extract meaningful and attractiverules from the dataset containing continuous numericalvariables. Association Rule Mining (ARM) is not asingle objective problem and, naturally, is a multi-objective one. In the proposed algorithm, three criteriawere employed in the �tness function. The e�ciencyof the proposed method was examined using threecontinuous numerical datasets. The results of thealgorithm implementation were compared with thoseof previous works in similar cases. The distinctionbetween the proposed method and the similar casescan be expressed as follows: in this method, there is

no need to determine a certain limit for the minimumsupport and minimum con�dence; however, this couldnot a�ect the number of the rules extracted, and thismethod improved the number of these rules comparedto other approaches. Second, one of these criteria usedin �tness function's de�nition was cosine2, which wasnot employed in this �eld of study. This study foundthat the application of cosine2 criteria in the �tnessfunction could signi�cantly improve the usefulness andattractiveness of the extracted rules. The proposedapproach led to an improvement in the results of theevaluation of support and con�dence criteria comparedto other methods; both support and con�dence criteriaincreased in the proposed method, and tradeo� wasmade between the values of these two criteria. Third,no limitation was considered on the number of rulesin this method, causing the extracted rules to havehigher desirability than other methods. In addition,support and con�dence values in the rules extractedfrom the proposed method improved. According tothe mentioned points, it can be expressed that thedetermined goals of this study were achieved. Theresults obtained through comparisons showed the im-provement of the results of a couple of criteria forproper rule extraction: the support percentage andthe con�dence measure. The ability of this methodto extract more rules with higher con�dence measureand support percentage proved the applicability of theproposed approach. This algorithm was implementedin two ways: once on the entire dataset and once oneach cluster after clustering the data. Implementingthe algorithm on each cluster separately extracted morerules than those on the entire dataset. Consideringthat only the positive rules were mined in the presentresearch, the extraction of positive and negative rulesfor future studies is suggested.

References

1. Agrawal, R., Imieli�nski, T., and Swami, A. \Min-ing association rules between sets of items in largedatabases", In ACM SIGMOD Record, 22(2), pp. 207{216 (June 1993).

2. Haeri, A. and Tavakkoli-Moghaddam, R. \Develop-ing a hybrid data mining approach based on multi-objective particle swarm optimization for solving atraveling salesman problem", Journal of Business Eco-nomics and Management, 13(5), pp. 951{967 (2012).

3. Soysal, �O.M. \Association rule mining with mostlyassociated sequential patterns", Expert Systems withApplications, 42(5), pp. 2582{2592 (2015).

4. Rana, M. and Mann, P.S. \Analysis of MFGA toextract interesting rules" International Journal ofComputer Applications, 84(3), pp. 15{21 (2013).

5. �Alvarez, V.P. and V�azquez, J.M. \An evolutionaryalgorithm to discover quantitative association rules

Page 15: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1330 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

from huge databases without the need for an a pri-ori discretization", Expert Systems with Applications,39(1), pp. 585{593 (2012).

6. Miller, R.J. and Yang, Y. \Association rules overinterval data", ACM SIGMOD Record, 26(2), pp. 452{461 (1997).

7. Moslehi, P., Bidgoli, B.M., Nasiri, M., and Salajegheh,A. \Multi-objective numeric association rules miningvia ant colony optimization for continuous domainswithout specifying minimum support and minimumcon�dence", International Journal of Computer Sci-ence Issues (IJCSI), 8(5), pp. 1{34 (2011).

8. Ehrgott, M., Multicriteria Optimization, 491. SpringerScience & Business Media (2005).

9. Coello, C.A.C., Lamont, G.B., and Van Veldhuizen,D.A., Evolutionary Algorithms for Solving Multi-Objective Problems, Springer, 800 (2002).

10. Coello, C.A.C., Lamont, G.B., and Van Veldhuizen,D.A., Evolutionary Algorithms for Solving Multi-Objective Problems, 5, New York: Springer (2007).

11. Deb, K., Multi-Objective Optimization Using Evolu-tionary Algorithms, 16, John Wiley & Sons (2001).

12. Kaya, M. and Alhajj, R. \Genetic algorithm basedframework for mining fuzzy association rules", FuzzySets and Systems, 152(3), pp. 587{601 (2005).

13. Kaya, M. and Alhajj, R. \Utilizing genetic algorithmsto optimize membership functions for fuzzy weightedassociation rules mining", Applied Intelligence, 24(1),pp. 7{15 (2006).

14. Alatas, B. and Akin, E. \Rough particle swarm op-timization and its applications in data mining", SoftComputing, 12(12), pp. 1205{1218 (2008).

15. Ayubi, S., Muyeba, M.K., Baraani, A., et al. \An al-gorithm to mine general association rules from tabulardata", Information Sciences, 179(20), pp. 3520{3539(2009).

16. Nasiri, M., Taghavi, L.S., and Minaee, B. \Multi-objective rule mining using simulated annealing algo-rithm", Journal of Convergence Information Technol-ogy, 5(1), pp. 60{68 (2010).

17. Qodmanan, H.R., Nasiri, M., and Minaei-Bidgoli, B.\Multi objective association rule mining with geneticalgorithm without specifying minimum support andminimum con�dence", Expert Systems with Applica-tions, 38(1), pp. 288{298 (2011).

18. Djenouri, Y., Drias, H., and Chemchem, A. \A hybridbees swarm optimization and tabu search algorithmfor association rule mining", In Nature and BiologicallyInspired Computing (NaBIC), 2013 World Congress onIEEE pp. 120{125, (Aug, 2013).

19. Agbehadji, I.E., Fong, S., and Millham, R. \Wolfsearch algorithm for numeric association rule mining",In Cloud Computing and Big Data Analysis (ICC-

CBDA), 2016 IEEE International Conference on IEEEpp. 146{151 (July, 2016).

20. Can, U. and Alatas, B. \Automatic mining of quan-titative association rules with gravitational searchalgorithm", International Journal of Software Engi-neering and Knowledge Engineering, 27(3), pp. 343{372 (2017).

21. Erickson, M., Mayer, A., and Horn, J. \The nichedPareto genetic algorithm 2 applied to the design ofgroundwater remediation systems", In InternationalConference on Evolutionary Multi-Criterion Optimiza-tion, (pp. 681{695), Springer, Berlin, Heidelberg(March, 2001).

22. Alata�s, B. and Akin, E. \An e�cient genetic algorithmfor automated mining of both positive and negativequantitative association rules", Soft Computing, 10(3),pp. 230{237 (2006).

23. Alcala-Fdez, J., Flugy-Pape, N., Bonarini, A., andHerrera, F. \Analysis of the e�ectiveness of the geneticalgorithms based on extraction of association rules",Fundamental Informaticae, 98(1), pp. 1{14 (2010).

24. Mata, J., Alvarez, J.L., and Riquelme, J.C. \Miningnumeric association rules with genetic algorithms",In Arti�cial Neural Nets and Genetic Algorithms, pp.264{267, Springer, Vienna (2001).

25. Mata, J., Alvarez, J.L., and Riquelme, J.C. \Dis-covering numeric association rules via evolutionaryalgorithm", Advances in Knowledge Discovery andData Mining, pp. 40{51 (2002).

26. Salleb-Aouissi, A., Vrain, C., and Nortet, C. \Quant-Miner: A genetic algorithm for mining quantitativeassociation rules", In IJCAI, 7, pp. 1035{1040 (2007,January).

27. Taboada, K., Gonzales, E., Shimada, K., et al. \As-sociation rule mining for continuous attributes usinggenetic network programming", IEEJ Transactions onElectrical and Electronic Engineering, 3(2), pp. 199{211 (2008).

28. Yan, X., Zhang, C., and Zhang, S. \Genetic algorithm-based strategy for identifying association rules withoutspecifying actual minimum support", Expert Systemswith Applications, 36(2), pp. 3066{3076 (2009).

29. Mart��nez-Ballesteros, M., Mart��nez-�Alvarez, F., Tron-coso, A., and Riquelme, J.C. \An evolutionary algo-rithm to discover quantitative association rules in mul-tidimensional time series", Soft Computing, 15(10), p.2065 (2011).

30. Mart��n, D., Rosete, A., Alcal�a-Fdez, J., and Her-rera, F. \A multi-objective evolutionary algorithm formining quantitative association rules", In IntelligentSystems Design and Applications (ISDA), 2011 11thInternational Conference on, pp. 1397{1402, IEEE(November, 2011).

31. Minaei-Bidgoli, B., Barmaki, R., and Nasiri, M. \Min-ing numerical association rules via multi-objectivegenetic algorithms", Information Sciences, 233, pp.15{24 (2013).

Page 16: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332 1331

32. Mart��n, D., Rosete, A., Alcal�a-Fdez, J., and Herrera,F. \QAR-CIP-NSGA-II: A new multi-objective evo-lutionary algorithm to mine quantitative associationrules", Information Sciences, 258, pp. 1{28 (2014).

33. Mart��n, D., Alcal�a-Fdez, J., Rosete, A., and Herrera,F. \NICGAR: A niching genetic algorithm to minea diverse set of interesting quantitative associationrules", Information Sciences, 355, pp. 208{228 (2016).

34. Indira, K. and Kanmani, S. \Mining association rulesusing hybrid genetic algorithm and particle swarm op-timisation algorithm", International Journal of DataAnalysis Techniques and Strategies, 7(1), pp. 59{76(2015).

35. Agarwal, A. and Nanavati, N. \Association rule miningusing hybrid GA-PSO for multi-objective optimisa-tion", In Computational Intelligence and ComputingResearch (ICCIC), 2016 IEEE International Confer-ence on, pp. 1{7, IEEE (December, 2016).

36. Sarkar, S., Lohani, A., and Maiti, J. \Geneticalgorithm-based association rule mining approach to-wards rule generation of occupational accidents", InInternational Conference on Computational Intelli-gence, Communications, and Business Analytics, pp.517{530, Springer, Singapore (March, 2017).

37. Djenouri, Y., Belhadi, A., Fournier-Viger, P., andFujita, H. \Mining diversi�ed association rules in bigdatasets: A cluster/GPU/genetic approach", Informa-tion Sciences, 459, pp. 117{134 (2018).

38. Kumar, P. and Singh, A.K. \E�cient generation ofassociation rules from numeric data using geneticalgorithm for smart cities", In Security in SmartCities: Models, Applications, and Challenges pp. 323{343, Springer, Cham. (2019).

39. Mart��nez-Ballesteros, M., Bacardit, J., Troncoso, A.,and Riquelme, J.C. \Enhancing the scalability of agenetic algorithm to discover quantitative associationrules in large-scale datasets", Integrated Computer-Aided Engineering, 22(1), pp. 21{39 (2015).

40. Padillo, F., Luna, J.M., Herrera, F., et al. \Mining as-sociation rules on big data through MapReduce geneticprogramming", Integrated Computer-Aided Engineer-ing, 25(1), pp. 31{48 (2018).

41. Mart��n, D., Mart��nez-Ballesteros, M., Garc��a-Gil, D.,et al. \MRQAR: A generic MapReduce frameworkto discover quantitative association rules in big dataproblems", Knowledge-Based Systems, 153, pp. 176{192 (2018).

42. Haery, A., Salmasi, N., Yazdi, M.M., and Iranmanesh,H. \Application of association rule mining in supplierselection criteria", World Academy of Science, Engi-neering and Technology, 40(1), pp. 358{362 (2008).

43. Srikant, R. and Agrawal, R. \Mining quantitative asso-ciation rules in large relational tables", In AcmSigmodRecord, 25(2), pp. 1{12, ACM (1996, June).

44. Shenoy, P.D., Srinivasa, K.G., Venugopal, K.R., andPatnaik, L.M. \Dynamic association rule mining usinggenetic algorithms", Intelligent Data Analysis, 9(5),pp. 439{453 (2005).

45. Kuo, R.J., Chao, C.M., and Chiu, Y.T. \Applicationof particle swarm optimization to association rulemining", Applied Soft Computing, 11(1), pp. 326{336(2011).

46. Beiranvand, V., Mobasher-Kashani, M., and Bakar,A.A. \Multi-objective PSO algorithm for mining nu-merical association rules without a priori discretiza-tion", Expert Systems with Applications, 41(9), pp.4259{4273 (2014).

47. Freitas, A.A., Data Mining and Knowledge Discoverywith Evolutionary Algorithms, Springer (2002).

48. Ghosh, A. and Nath, B. \Multi-objective rule miningusing genetic algorithms", Information Sciences, 163,pp. 123{133 (2004).

49. Prajapati, D.J., Garg, S., and Chauhan, N.C. \In-teresting association rule mining with consistent andinconsistent rule detection from big sales data indistributed environment", Future Computing and In-formatics Journal, 2(1), pp. 19{30 (2017).

50. Merceron, A. and Yacef, K. \Interestingness measuresfor association rules in educational data", EDM, 8, pp.57{66 (2008).

51. Rokh, B., Mirvaziri, H., and Eftekhari, M. \Proposingan e�cient combination of interesting measures formining association rules via NSGA-II", In Technol-ogy, Communication and Knowledge (ICTCK), 2014International Congress on, pp. 1{7, IEEE (November,2014).

52. Luna, J.M., Romero, J.R., and Ventura, S. \Grammar-based multi-objective algorithms for mining associa-tion rules", Data & Knowledge Engineering, 86, pp.19{37 (2013).

53. Hsieh, Y., Lee, P., and You, P. \Immune basedevolutionary algorithm for determining the opti-mal sequence of multiple disinfection operations",Scientia Iranica, 26(2), pp. 959{974 (2019). DOI:10.24200/sci.2018.20324

54. Sadeghi, H., Zolfaghari, M., and Heydarizade, M.\Estimation of electricity demand in residential sectorusing genetic algorithm approach", Journal of Indus-trial Engineering & Production Research, 22(1), pp.43{50 (2011).

55. Ostadi, B., MotamediSedeh, O., HusseinzadehKashan, A., and Amin-Naseri, M. \An intelligentmodel to predict the day-ahead deregulated marketclearing price: a hybrid NN, PSO and GA approach",Scientia Iranica, 26(6), pp. 3846{3856 (2019). DOI:10.24200/sci.2018.50910.1909

56. Russell, S.J. and Norvig, P., Arti�cial Intelligence AModern Approach, Pearson Education (2008).

Page 17: A genetic algorithm-based framework for mining quantitative association rules …scientiairanica.sharif.edu/article_21432_8f2cc4b83db58b... · 2021. 3. 10. · Data mining; Quantitative

1332 F. Moslehi and A. Haeri/Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1316{1332

57. Goldberg, D.E., Genetic Algorithms in Search Opti-mization and Machine Learning, Addison Wesley, p.41 (1989).

58. Sonagara, D. and Badheka, S. \Comparison of basicclustering algorithms", Int. J. Comput. Sci. Mob.Comput, 3(10), pp. 58{61 (2014).

59. Dabbagh, H., GhodratiAmiri, G., and Shaabani, S.\Modal data-based approach to structural damageidenti�cation by means of imperialist competitive op-timization algorithm", Scientia Iranica, 25(3), pp.1070{1080 (2018). DOI: 10.24200/sci.2017.4590

60. Mart��nez-Ballesteros, M. and Riquelme, J.C. \Analysisof measures of quantitative association rules", In Inter-national Conference on Hybrid Arti�cial IntelligenceSystems, pp. 319{326, Springer, Berlin, Heidelberg(May, 2011).

61. Picek, S. and Golub, M. \Comparison of a crossoveroperator in binary-coded genetic algorithms", WSEASTransactions on Computers, 9, pp. 1064{1073 (2010).

62. Guvenir, H. and Uysal, I., Bilkent University FunctionApproximation Repository (2000).<http://funapp.cs.bilkent.edu.tr/>

63. Moslehi, F., Haeri, A., and Moini, A. \Analyzing andinvestigating the use of electronic payment tools inIran using data mining techniques", Journal of AIand Data Mining, 6(2), pp. 417{437 (2018). DOI:10.22044/jadm.2017.5352.1643

Biographies

Fateme Moslehi received her BSc degree in Com-puter Engineering from University of Shahid Rajaee,Iran in 2014. She pursued MSc degree in 2015 andreceived it in Information Technology Engineering fromIran University of Science and Technology in 2017.Her research interests are data mining techniques suchas rule mining and clustering and soft computingmethods such as genetic algorithm and particle swarmoptimization.

Abdorrahman Haeri is an Assistant Professor ofIndustrial Engineering at Iran University of Scienceand Technology, Iran. He received his PhD fromthe Department of Industrial Engineering, College ofEngineering, and University of Tehran, Iran in 2013.He holds an MSc degree in Industrial Engineering,Sharif University of Technology, Tehran, Iran in 2008.His main areas of teaching and research interestsinclude data envelopment analysis, network design, anddata mining. He has published several papers in inter-national conferences and academic journals includingSafety Science, Scientometrics, Energy Sources, Part B:Economics, Planning, and Policy, Journal of Engineer-ing Manufacture: Part B, Int. J. Production Research,Int. J. Advanced Manufacturing and Technology, Int.J. Productivity and Quality Management (IJPQM).