Research Article

Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning Algorithm That Learns to Search Optimized Ensembles

Ruihan Hu,1,2 Songbin Zhou,1 Yisen Liu,1 and Zhiri Tang2

1 Guangdong Key Laboratory of Modern Control Technology, Guangdong Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou 510070, Guangdong Province, China
2 School of Physics and Technology, Wuhan University, Wuhan 430072, Hubei, China

Correspondence should be addressed to Songbin Zhou; [email protected]

Received 7 December 2018; Revised 6 March 2019; Accepted 9 April 2019; Published 3 June 2019

Academic Editor: Cheng-Jian Lin

Hindawi, Computational Intelligence and Neuroscience, Volume 2019, Article ID 7560872, 12 pages, https://doi.org/10.1155/2019/7560872

Copyright © 2019 Ruihan Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The ensemble pruning system is an effective machine learning framework that combines several learners as experts to classify a test set. Generally, ensemble pruning systems aim to define a region of competence based on the validation set to select the most competent ensembles from the ensemble pool with respect to the test set. However, the size of the ensemble pool is usually fixed, and the performance of an ensemble pool heavily depends on the definition of the region of competence. In this paper, a dynamic pruning framework called margin-based Pareto ensemble pruning is proposed for ensemble pruning systems. The framework explores the optimized ensemble pool size during the overproduction stage and fine-tunes the experts during the pruning stage. The Pareto optimization algorithm is used to explore the size of the overproduction ensemble pool that can result in better performance. Considering the information entropy of the learners in the indecision region, the marginal criterion for each learner in the ensemble pool is calculated using margin criterion pruning, which prunes the experts with respect to the test set. The effectiveness of the proposed method for classification tasks is assessed using benchmark datasets. The results show that margin-based Pareto ensemble pruning can achieve smaller ensemble sizes and better classification performance on most datasets when compared with state-of-the-art models.

1. Introduction

Recent publications have widely applied multiple classifier systems (MCSs) [1] in fields such as digit recognition [2], facial recognition [3], acoustic recognition [4], credit scoring [5], imbalanced classification [6], recommender systems [7], software bug detection [8], and environmental data analysis [9]. Unlike deep learning frameworks [10], MCSs [2] have been shown to learn well on both small- and large-scale sets. The advantage of an MCS is that more decision guidelines are provided by the ensemble pool than by a single learner. However, MCSs cannot determine which learners are most suitable with respect to the incoming dataset, because not all decision guidelines are useful for classifying targets.

As a modified case of MCS, an ensemble pruning system (EPS) [11–20] is a popular machine learning model that can select base learners from an ensemble pool to construct the expert. Previous studies [12] have demonstrated that an EPS can achieve performance superior to an MCS because the competence level of each base learner is calculated using the validation set, and learners with low competitiveness that are unlikely to improve the performance of the ensemble pool are pruned when the testing set is used. In general, EPS models can be separated into three categories: static pruning, dynamic classifier pruning, and dynamic ensemble pruning. In static pruning [13], experts are directly selected from the ensemble pool using the training set. In dynamic classifier pruning [14, 15], only the most competent learner is chosen from the ensemble pool once a test sample emerges. In dynamic ensemble pruning [16–20], subsets of the ensemble pool are selected as experts to deal with the test samples. From our viewpoint, the dynamic classifier pruning model can be seen as a special case of the dynamic ensemble


pruning model. The dynamic ensemble pruning model has four stages: overproduction, region of competence definition, selection, and integration. During overproduction, every base learner learns independently when composing the ensemble pool using the training set. During region of competence definition, a criterion is defined to explore the correlation between the ensemble pool and the targets using the validation set. During selection, according to the region of competence, learners with low competitiveness are pruned with respect to the testing set. During integration, the outputs of the learners are aggregated. To the best of our knowledge, different EPS models use different definitions of the region of competence criteria to select classifiers. Antosik and Kurzynski [16] estimated the source competence of every learner using the minimum difference minimization criterion between the outputs and the targets with respect to the validation set. Lobato et al. [17] used a Bayesian computation framework to approximate the confidence level of the remaining learners by querying the selected expert; the size of the expert is increased until the confidence level of the remaining learners is above a predefined level. Li and Zhou [18] generalized the EPS problem as a quadratic programming problem with a sparse solution; their results show that dynamic ensemble pruning can achieve good generalization performance with a compact expert. KNORA-Eliminate [19] selects the expert that can correctly classify all samples in the region of competence; the region of competence for KNORA-Eliminate is reduced until the expert achieves 100% accuracy in that region. META-DES [20] provides an alternative framework that views dynamic ensemble pruning as a meta-learning procedure; the meta-features are constructed according to the correlations among the probability outputs, inputs, and targets.

Regarding dynamic ensemble pruning, the size of the ensemble pool has been fixed in previous publications [16–20], and these dynamic ensemble pruning methods fail to search for the optimal ensemble pool during the overproduction stage. According to Davis et al. [21], the selection of the expert is known to be a nondeterministic polynomial-hard (NP-hard) problem. Additionally, most pruning criteria estimate the competency of the learners but neglect the condition in which a query sample is located on the indecision boundary and the learners have different suggestions for samples in the region of competence; this is insufficient for correctly selecting the expert from the ensemble pool. To address these problems, we first regard the calculation of the pool size during the overproduction stage as an optimization problem. Two targets, i.e., ensemble pool size and learning performance for dynamic ensemble pruning models, are globally searched by a bi-objective programming formulation. The ensemble pool size calculated by the programming method does not utilize the local information about the region of competence for the validation set; thus, the obtained ensemble pool size may be suboptimal. More precisely, the local exploration method between base learners in the ensemble pool with the suboptimal size needs to be considered. Accordingly, margin-based Pareto ensemble pruning (MBPEP) is proposed, consisting of Pareto

optimization and a margin criterion pruning mechanism for global and local search of the optimal ensemble pool size. In this paper, the following two contributions are made: the Pareto optimization algorithm [22] is applied to calculate the optimal size of the ensemble pool during the overproduction stage for dynamic ensemble pruning in MBPEP, and the margin criterion pruning (MCP) mechanism is contained within MBPEP so that the learners in the indecision region can detect the different classes in that region. Subsequently, the margin criterion for each class in the indecision region is further calculated to prune the size of the expert; classifiers in the indecision region that carry little information are pruned according to the MCP. Unlike some dynamic ensemble pruning methods [19, 20] based on various pruning criteria, the local exploration technique in MBPEP acts as a preselector for these pruning criteria: it can be directly inserted into them and provide more robust learning performance than the original dynamic ensemble pruning methods.

The remainder of this paper is organized as follows. Section 2.1 describes the proposed MBPEP framework, Section 2.2 discusses the Pareto optimization that estimates the optimal size of the ensemble pool, and Section 2.3 introduces the MCP method that prunes the ensemble pool. In Section 3, a series of experiments is executed to evaluate the performance of the MBPEP framework. The study's conclusions are given in Section 4.

2. Methodology

In this section, the methodology of the MBPEP framework is discussed: the basic architecture of MBPEP is described in Section 2.1, and the key ensemble exploration details for the ensemble pool, Pareto optimization and MCP, are described in Sections 2.2 and 2.3.

2.1. Margin-Based Pareto Ensemble Pruning (MBPEP) Framework for Dynamic Ensemble Pruning

Like the general dynamic ensemble models [16–20], the MBPEP-based dynamic ensemble pruning framework can be separated into four stages. During the overproduction stage, T is the initial ensemble pool size, and each learner in ensemble pool H = {H1, …, HT} can be independently learned based on the training set XTr. To keep the ensemble pool diverse and informative during this stage, these T base learners are initialized by training on different training sets; for example, bagging, boosting, and clustering approaches [1], which can generate different distributions of training sets, are used to train every learner.

Bagging is used in this paper, and classification and regression trees (CARTs) [23] are used as base learners. Subsequently, the optimal overproduction pool size HS = {H1, …, HTs} is estimated using the Pareto optimization algorithm. The global exploration over the T initial base learners is calculated, and the Ts learners that result in suboptimal performance are selected during this stage. During the region of competence definition stage, the neighbors in validation set XVal are learned using various


criteria. Based on the region of competence definition, similar samples for an unknown query instance Xquery and the competence level of each base learner can be estimated. Common techniques for defining the region of competence include minimum difference minimization [16], k-nearest neighbors (KNN) [24], K-means [25], and the competence map method [26]. During the pruning stage, some learners are extracted to construct the expert with respect to the test set XTes. In contrast, traditional dynamic ensemble pruning models use insufficient pruning techniques, such as KNORA-Eliminate [19] and META-DES [20], to prune learners with low competitiveness in the ensemble pool. In MBPEP, the expert H* = {H1, …, HT*} is estimated through two steps for the test set. Meanwhile, the region of competence for XTes is checked using MBPEP with respect to whether the learners in the ensembles have the same suggestions when they recognize the samples in the region of competence for XTes. When every learner in the ensembles has the same suggestion, that region is defined as a safe region; in this condition, the MCP method does not need to be activated, and the ensemble pool is passed down to the various pruning techniques. However, when the learners in the ensembles make different suggestions for samples in the region of competence for XTes, the region is defined as the indecision region; MCP applies to the scope of the indecision region, and the size of the expert can be further pruned. The MCP operator, which does not damage the general pruning criteria [19, 20], can be used as a preselector to improve exploration of the local information between the base learners in the ensemble pool. During the integration stage, the results calculated by the expert are aggregated. The strategies used for aggregation can be separated into three types: nontrainable methods [1], in which the weights of the outputs do not need to be learned; trainable methods [1], in which the outputs from the experts in the pruning stage are used as input features to be trained by another learner; and dynamic weighting methods [1], in which the weights are determined by the estimated competence levels of every expert. Well-known integration methods include majority voting and the oracle. In this paper, majority voting is used as the aggregation strategy. The architecture of the MBPEP-based dynamic ensemble pruning framework is shown in Figure 1. Notably, the performance of dynamic ensemble pruning is sensitive to the definition of the region of competence; since the region of competence is not the key contribution of MBPEP, different criteria are used to define the region of competence in this paper.
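The overproduction stage described above (bagged CART base learners trained on bootstrap samples) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the variable names (X_tr, X_val, X_tes) mirror the paper's notation, and the dataset here is synthetic, purely for demonstration.

```python
# Sketch of the overproduction stage: T bagged base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
# Three-way split as in the paper: training / validation / test.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_tes, y_val, y_tes = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

T = 100  # initial ensemble pool size, as in the experiments
# BaggingClassifier's default base estimator is a decision tree (a CART);
# each of the T learners is fitted on its own bootstrap sample of X_tr.
pool = BaggingClassifier(n_estimators=T, random_state=0).fit(X_tr, y_tr)
```

The fitted `pool.estimators_` list then plays the role of the initial ensemble pool H = {H1, …, HT} that the later stages prune.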

2.2. Pareto Optimization Evolutionary Algorithm

In recent years, several studies [11–20] have demonstrated that ensemble models always achieve better results than any base learner. However, methods of building ensembles that achieve great performance with respect to the test set are still required. In this paper, this problem is solved by MBPEP during the overproduction stage. In Section 2.1, the architecture of MBPEP introduced during the overproduction stage in Figure 1 is discussed, and the overproduced learners are trained to compose the ensemble pool. In general, there is a conflict: few base learners produce few decision boundaries and thus cannot achieve powerful performance. According to previous studies [16–20], the ensemble pool size is fixed during the overproduction stage, and calculating the ensemble pool size that is suitable for handling classification tasks is challenging. Our objective is to estimate an optimal ensemble pool size that achieves better performance than a fixed ensemble pool size. The MBPEP computation framework uses Pareto optimization to address both the learning performance and the ensemble pool size.

In this study, the initial ensemble pool size is T (H = {H1, …, HT}), the inputs are X = {X1, …, Xn}, and the targets are y = {y1, …, yn}, where n is the number of instances and c is the number of classes. The corresponding optimized pool size after calculation is defined as HS. The vector representation of the optimized ensemble pool is a binary vector S ∈ {0, 1}^T with T dimensions: Si = 1 denotes that the ith learner in H = {H1, …, HT} is selected; otherwise, Si = 0. The Pareto optimization algorithm converts the calculation of the optimal ensemble pool size into a subset selection problem. Suppose that the outputs of the different ensemble pools are denoted as h_{isk} (i ∈ {1, …, n}, s ∈ [0, T], and k ∈ {1, …, c}), where i, s, and k denote the indexes of instances, ensembles, and classes, respectively. Following these symbols, the classification error f(HS) can be denoted as follows:

f(H_S) = 1 - \frac{1}{n} \sum_{i=1}^{n} \max_{k=1}^{c} \sum_{s=1}^{T} h_{isk}   (1)

The classification performance and ensemble pool size are estimated using the Pareto optimization bi-objective programming formulation as follows:

\operatorname{argmin}_{S \in \{0,1\}^T} \big( f(H_S),\, L \big)   (2)

where L denotes sum(S). The operator sum(·) sums the binary vector S to obtain the ensemble pool size. Unlike single-objective algorithms [27, 28], the concept of "domination", which continuously perturbs the ensemble size and measures the difference between f(HS) and L during the global search iterations, is introduced in Pareto optimization.
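The bi-objective search of equation (2) can be sketched as a simple Pareto subset-selection loop in the spirit of Pareto optimization for subset selection (POSS): mutate the binary selector S and keep an archive of non-dominated (error, size) solutions. This is a hedged sketch, not the paper's exact algorithm; `err_of` is a hypothetical stand-in for the validation error f(HS).

```python
# Minimal Pareto (bi-objective) subset selection over a binary selector S.
import random

def dominates(a, b):
    """(error, size) pair a dominates b if it is no worse in both and differs."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_subset_selection(err_of, T, iters=2000, seed=0):
    rng = random.Random(seed)
    archive = [tuple([0] * T)]              # start from the empty selection
    for _ in range(iters):
        S = list(rng.choice(archive))
        for i in range(T):                  # flip each bit with probability 1/T
            if rng.random() < 1.0 / T:
                S[i] = 1 - S[i]
        S = tuple(S)
        obj = (err_of(S), sum(S))
        objs = {A: (err_of(A), sum(A)) for A in archive}
        if not any(dominates(o, obj) for o in objs.values()):
            # S is non-dominated: drop archive members it dominates, keep S
            archive = [A for A, o in objs.items() if not dominates(obj, o)]
            if S not in archive:
                archive.append(S)
    # return the best non-empty selector by (error, then size)
    return min((A for A in archive if sum(A) > 0),
               key=lambda A: (err_of(A), sum(A)))
```

The archive is exactly the set of trade-offs between equation (1)'s error and the pool size L; the caller picks one point from it as the suboptimal pool HS.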

2.3. Margin Criterion Pruning (MCP)

As mentioned in Section 2.1, the Pareto optimization method is used during the overproduction stage to calculate the optimal ensemble pool size in the dynamic ensemble pruning model. If query sample xquery is located in the indecision boundaries of the learners, then the predictions of the learners for xquery might differ across the samples that belong to its corresponding region of competence. As with other heuristic algorithms [22], the global exploration between the base learners performed by Pareto optimization with respect to the two objectives (f(HS) and sum(S)) cannot deal with this condition: Pareto optimization provides a suboptimal [21] ensemble pool size during the overproduction stage because the local information between the base learners and the targets is neglected in the solution


space. Unlike the present work, previous studies [16–20] use pruning criteria to estimate the competence of each learner for query sample xquery without considering the case in which this query sample is located in the indecision boundaries. In this paper, the local information between the base learners in the ensemble pool is measured by the MCP mechanism to compute the information quantity when query sample xquery is located in the indecision boundaries of the learners. MCP is added during the pruning stage as a preselector that is executed in parallel with the classical pruning criteria [16–20]. As a subset of the region of competence, the indecision region of the ensemble pool is defined and explored to estimate the local competence of each sublearner in the ensembles. A schematic diagram of the region of competence for four ensemble pool size conditions is shown in Figure 2.

The safe region and the indecision region are defined and shown in Figures 2(a) and 2(b) and Figures 2(c) and 2(d), respectively. As shown in Figure 2, the two kinds of regions of competence, in which query samples (yellow triangles in Figure 2) are located in the safe and indecision regions, are determined according to whether the ensembles can correctly recognize the neighbors of the query sample. If the ensembles make the same suggestions for these neighbors, the region of competence is a safe region; when the ensembles have different suggestions for these neighbors, the region of competence is an indecision region. According to Figures 2(b)–2(d), the ensembles reach 100% (7/7), 71.4% (5/7), and 57.1% (4/7), respectively. In addition, MCP focuses on the indecision region, and the local competence of each learner that can distinguish the samples with different classes in the indecision region is estimated.
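The safe/indecision distinction above reduces to a unanimity check over the neighbors in the region of competence. A small sketch, where `preds[i][j]` is learner i's predicted label for neighbor j (a hypothetical layout, for illustration only):

```python
# Classify the region of competence as "safe" or "indecision":
# safe only if every ensemble member gives the same label to every neighbor.
def region_type(preds):
    n_neighbors = len(preds[0])
    for j in range(n_neighbors):
        if len({p[j] for p in preds}) > 1:  # learners disagree on neighbor j
            return "indecision"
    return "safe"
```

Only when `region_type` reports "indecision" does MCP need to be activated; in the safe case the pool is passed to the downstream pruning criterion unchanged.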

Suppose that TS learners HS = {H1, …, HTs} (Section 2.2) are selected to compose the ensemble pool after being globally explored using the Pareto optimization algorithm during the overproduction stage, and suppose that query sample x with N neighbors in the region of competence receives various suggestions from the ensembles. N^1_x denotes the number of votes of the most popular label voted for by the ensembles HS, and N^2_x represents the number of votes of the second most popular label voted for by H*. The marginal criterion is used to measure the amount of information for the ensembles in the indecision region using MCP. We define the difference between the labels with the most and second most votes as the marginal information for the neighbors of query sample x:

margin(x) = \frac{N_x^1 - N_x^2}{T_S}   (3)

Entropy is used to compute the amount of information in the machine learning field [29]. To measure the margins of the final hypotheses, the margin entropy criterion is defined to calculate the amount of information of samples X for Hi in HS = {H1, …, HTs}:

C_{H_i}(X) = -\frac{1}{N} \sum_{j=1}^{N} \log\big[\mathrm{margin}(x_j)\big]   (4)

where N denotes the number of neighbors around query samples X. If a greater difference between the most and second most popular labels is calculated by equation (3), then it

Figure 1: Architecture of MBPEP, which consists of four stages: overproduction, region of competence definition, pruning, and integration.


can be inferred that these ensembles cannot detect samples that belong to different classes in the region of competence. In addition, in MBPEP, N^1_x − N^2_x is always less than TS, and hence the maximum value of margin(x) is less than 1. According to equation (4), if the calculated margin entropy C_{Hi}(X) of validation sample X is greater than 0, the learner can be used to compose the expert; otherwise, when C_{Hi}(X) ≤ 0, the learner is pruned. This process is called MCP. MCP performs a greedy search to define a subset of the ensemble pool that can be used as a preselector to determine suitable learners that can recognize the different targets in the indecision region. The MCP algorithm is shown in Algorithm 1.
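Equations (3) and (4) translate directly into code. A minimal sketch of the two formulas only (not the full pruning loop), where `n1` and `n2` are the vote counts of the two most popular labels among the T_S selected learners:

```python
import math

def margin(n1, n2, t_s):
    """Equation (3): scaled vote margin; below 1 since n1 - n2 < t_s."""
    return (n1 - n2) / t_s

def margin_entropy(margins):
    """Equation (4): C(X) = -(1/N) * sum_j log(margin(x_j)) over N neighbors."""
    return -sum(math.log(m) for m in margins) / len(margins)
```

For example, with 7 learners where the top label gets 5 votes and the runner-up 2, margin(5, 2, 7) is 3/7 ≈ 0.429.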

3. Results and Discussion

3.1. Configuration

Sixteen UCI datasets [30] are used as benchmarks in this paper. The characteristics of these datasets are shown in Table 1, and each is split into three parts: a training set, a validation set, and a test set. Classification and regression trees (CARTs) are used as base learners. First, the bagging method is applied to split the training datasets: the training sets are divided into subsets, and each subset is selected using bootstrap extraction. The subclassifiers are trained using the training set, and the number of ensembles is preset to 100. Second, the MBPEP method is applied for pruning using the validation set. The optimized subclassifiers are aggregated into the final hypotheses using the majority voting method [1]. Finally, the test set is used to measure the performance of the methods.

3.2. Pruning Metrics

The bagging ensemble is a common architecture that trains learners on different bootstrapped samples. In this paper, the full bagging method, in which all base learners are selected to construct the ensemble

Figure 2: The region of competence during the pruning stage. The query sample is located in (a, b) the safe region and (c, d) the indecision region.


classifiers, is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [26]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that selects the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the subclassifiers and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use the backfitting search method to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the kappa error as a statistical method that measures the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners; the classifiers with the lowest κ statistics are iteratively added to the ensemble.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that, in each iteration, finds the learners most complementary to the ensemble and incorporates them into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the fewest votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool for the validation set using the corresponding probability outputs and several random variables with a beta probability distribution.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.

3.3. Characteristics of MBPEP

To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes. Notably, the full bagging method can be used as the baseline without pruning. The two contributions of MBPEP (Pareto optimization and MCP) apply to the overproduction and pruning stages, respectively; hence, two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

To measure the advantages of MCP, from Figure 3 we can see that the performances of the dynamic ensemble pruning models (the META and KNORA models) based on MCP are superior to those without MCP. In addition, all

Require: the ensemble pool HS = {H1, …, HTs} generated during the overproduction stage
(1) H* ← ∅ (the pruned ensemble pool is empty)
(2) for Hi in HS = {H1, …, HTs} do
(3)   if sample X is in the safe region then
(4)     continue
(5)   else if sample X is in the indecision region then
(6)     if C_Hi(X) > 0 then
(7)       H* ← H* ∪ {Hi}
(8)     else if C_Hi(X) ≤ 0 then
(9)       Hi is pruned
(10)    end if
(11)  end if
(12) return H*

Algorithm 1: MCP algorithm.
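Algorithm 1 can be rendered in Python as follows. This is a sketch: the callables `in_indecision_region` and `margin_entropy_of` are hypothetical stand-ins for the region check and the margin entropy C_Hi(X) of equation (4), and in the safe-region case the pool is kept unchanged, matching the text's statement that MCP is not activated there.

```python
# A Python rendering of Algorithm 1 (MCP).
def mcp_prune(pool, X, in_indecision_region, margin_entropy_of):
    pruned = []                            # H*, initially empty
    for h in pool:
        if not in_indecision_region(X):
            pruned.append(h)               # safe region: MCP is not activated
        elif margin_entropy_of(h, X) > 0:
            pruned.append(h)               # informative learner joins H*
        # otherwise C_Hi(X) <= 0 and the learner is pruned
    return pruned
```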

Table 1: Benchmark datasets used in the experiments.

Dataset         Examples   Attributes
Australian      690        14
Breast cancer   286        9
Liver           345        6
Heart-statlog   270        13
Waveform        1000       20
Ionosphere      351        34
Kr-vs-kp        3196       96
Letter          20000      16
German          306        3
Diabetes        76         4
Sonar           208        60
Spambase        4601       57
Tic-tac-toe     958        9
Vehicle         846        18
Vote            435        16
Pima            768        8


EPS models achieve better performance than full bagging. As an alternative technique for measuring performance, the F1 value is an effective metric that has been widely used [35] to measure the precision and sensitivity of results, and it is calculated as follows:

F_1 = \frac{2TP}{2TP + FP + FN}   (5)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89; for class 1, the F1 values are 0.88, 0.88, and 0.88, respectively. To explore the influence of MBPEP in dynamic ensemble models, optimal overproduction and learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, whereas META, KNORA, and RRC denote ensemble pruning models learned by Pareto optimization but not MCP. From the top panel of Figure 4, the META and KNORA-Eliminate models with MCP achieve learning performance superior to those without MCP; for RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP; for the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.
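The F1 computation of equation (5) is a one-liner; a minimal sketch from raw true-positive, false-positive, and false-negative counts (the counts below are illustrative, not taken from the experiments):

```python
# Equation (5) as code: F1 from TP, FP, and FN counts.
def f1_score(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)
```

For instance, f1_score(88, 12, 12) evaluates to 0.88, matching the scale of the per-class F1 values reported above.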

3.4. Comparison

To measure the classification performance, the test error and average optimized ensemble size are calculated using MBPEP and compared with the other ensemble methods; the results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments, so for each benchmark set, every experiment is executed 30 times, and the average results are shown in Tables 2 and 3, with the winners for MBPEP shown in bold. The comparison methods are full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, Kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner; EA pruning is based on the optimized dynamic pruning model.

Figure 3: Performance (accuracy) of the dynamic ensemble pruning methods (META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, RRC, and full bagging) with and without MCP.

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP. (a) Accuracy. (b) Optimal ensemble size.

Computational Intelligence and Neuroscience 7

In Table 2, it can be seen that MBPEP achieves the lowest test error for 10 of 16 datasets (10/16). Meanwhile, full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test errors on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP has been demonstrated to achieve better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins is validated between MBPEP and the other algorithms using a sign test. Notably, a win is counted as 1 and a tie is counted as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
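The direct-win counting used for the sign test (win = 1, tie = 0.5) can be sketched as follows; the helper name and inputs are illustrative, not from the paper:

```python
def direct_wins(errors_a, errors_b):
    """Direct wins of method A over method B across datasets:
    a lower test error counts as 1, a tie counts as 0.5."""
    score = 0.0
    for ea, eb in zip(errors_a, errors_b):
        if ea < eb:       # A strictly better on this dataset
            score += 1.0
        elif ea == eb:    # tie
            score += 0.5
    return score
```

Under the sign test of Demsar [36], MBPEP is then compared against each rival by checking whether its win count over the 16 datasets is significantly larger than the 8 expected by chance.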

We also measure the optimal ensemble size in Table 3. It can easily be found that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the lowest ensemble size on less than 19% (3/16) of them. The results from Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method for most datasets. The reason for this result is that MBPEP can simultaneously minimize classification error and ensemble size, whereas competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) focus on optimizing only one objective, such as diversity, but neglect the others.

Table 2: Test error comparison (%).

Data sets | Ours | Full bagging | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
Australian | 11.62 | 14.30 | 14.40 | 14.30 | 14.50 | 14.80 | 14.30
Breast cancer | 23.83 | 27.90 | 27.70 | 28.70 | 29.20 | 29.50 | 27.50
Liver | 25.73 | 32.70 | 32.00 | 32.60 | 30.60 | 33.70 | 31.70
Heart-statlog | 23.50 | 19.50 | 18.70 | 20.10 | 19.90 | 19.90 | 19.60
Waveform | 20.80 | 22.20 | 20.50 | 20.90 | 20.30 | 20.10 | 20.50
Ionosphere | 5.70 | 9.20 | 8.60 | 8.40 | 8.90 | 10.00 | 9.30
Kr-vs-kp | 0.90 | 1.50 | 1.00 | 1.00 | 1.10 | 1.10 | 1.20
Letter | 8.52 | 5.90 | 4.80 | 4.80 | 4.80 | 5.70 | 5.30
German | 22.54 | 25.30 | 24.10 | 24.10 | 23.90 | 24.00 | 24.30
Diabetes | 22.16 | 25.10 | 24.30 | 24.30 | 24.10 | 24.20 | 24.00
Sonar | 21.40 | 26.60 | 26.70 | 24.90 | 25.00 | 26.80 | 25.10
Spambase | 6.93 | 6.80 | 6.60 | 6.60 | 6.60 | 6.80 | 6.60
Tic-tac-toe | 15.67 | 16.40 | 13.50 | 13.20 | 13.20 | 14.50 | 13.80
Vehicle | 21.97 | 22.80 | 22.60 | 23.30 | 23.40 | 24.40 | 23.00
Vote | 4.40 | 4.70 | 4.40 | 4.10 | 4.30 | 4.50 | 4.50
Optdigits | 3.20 | 3.80 | 3.60 | 3.50 | 3.60 | 3.70 | 3.50
Count of wins | 10 | 2 | 3 | 4 | 3 | 1 | 1
Direct wins | — | 13 | 10.5 | 11 | 10 | 11 | 11

Table 3: Average optimized ensemble size.

Data sets | Ours | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
Australian | 7.2 | 12.5 | 14.7 | 11.0 | 8.5 | 41.9
Breast-cancer | 7.2 | 8.7 | 26.1 | 8.8 | 7.8 | 44.6
Liver | 7.4 | 13.9 | 24.7 | 15.3 | 17.7 | 42
Heart-statlog | 9.6 | 11.4 | 17.9 | 13.2 | 13.6 | 44.2
Waveform | 9 | 12.3 | 15.6 | 11.7 | 11.5 | 42.4
Ionosphere | 9.1 | 7.9 | 10.5 | 8.5 | 10.7 | 48.8
Kr-vs-kp | 4.3 | 5.8 | 10.6 | 9.6 | 7.2 | 45.9
Letter | 12.5 | 15.1 | 13.8 | 12.9 | 23.2 | 38.3
German | 7.0 | 9.4 | 13.6 | 11.5 | 19.7 | 41.2
Diabetes | 11.7 | 14.3 | 16.8 | 12.6 | 18.1 | 42.8
Sonar | 13.2 | 11.0 | 20.6 | 13.9 | 20.6 | 43.1
Spambase | 12.6 | 18.5 | 20.0 | 19.0 | 28.8 | 39.7
Tic-tac-toe | 13.0 | 16.1 | 17.4 | 15.4 | 28.0 | 39.8
Vehicle | 7.0 | 15.7 | 16.5 | 11.2 | 21.6 | 41.9
Vote | 3.5 | 3.2 | 5.1 | 5.4 | 6.0 | 47.8
Optdigits | 22.3 | 25.0 | 25.2 | 21.4 | 46.8 | 41.4
Count of wins | 13 | 3 | 0 | 0 | 0 | 0
Direct wins | — | 13 | 16 | 14 | 16 | 16


3.5. Robustness of Classification. The robustness of the algorithms is an important indicator for measuring classifier performance, as it can reflect the fault tolerance of the algorithms. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise; the noise levels are determined using the variance intensity of the Gaussian noise. When the experiment is executed, the test errors of the ensemble classifiers are measured after the training sets are corrupted by noise of variance 0.00, 0.02, 0.04, and 0.08. The results are measured using the 7 datasets described in Table 4. The average test errors are shown in Table 4, and each one is calculated 50 times. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured. The winners are bolded for MBPEP in Table 4.
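The noise-corruption step can be sketched as below, assuming zero-mean Gaussian noise whose variance gives the intensity (the function name is ours, not from the paper):

```python
import numpy as np

def corrupt_with_noise(features, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to a feature matrix,
    as done for the 0.00/0.02/0.04/0.08 intensities in Table 4."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(variance), size=features.shape)
    return features + noise
```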

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP can achieve a 21.40% test error when there is no noise corrupting the training set of the sonar dataset; however, the test error reaches 32.30% when the training set is corrupted by Gaussian noise of variance 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the different noise intensities. Specifically, when the variance intensity of the

Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP reaches 20.80% on the waveform dataset when there is no noise corrupting the training set; this is not the best classification performance among all of the comparison methods, but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of winners for MBPEP increases with noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise | Data sets | Ours | Full bagging | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
0.00 | Diabetes | 22.10 | 25.10 | 24.30 | 24.30 | 24.30 | 24.20 | 24.00
0.00 | German | 22.50 | 25.30 | 24.10 | 24.10 | 23.90 | 24.00 | 24.30
0.00 | Heart-statlog | 23.50 | 19.50 | 18.70 | 20.10 | 19.90 | 19.90 | 19.60
0.00 | Ionosphere | 5.70 | 9.20 | 8.60 | 8.40 | 8.90 | 10.00 | 9.30
0.00 | Sonar | 21.40 | 26.60 | 26.70 | 24.90 | 25.00 | 26.80 | 25.10
0.00 | Vehicle | 21.90 | 22.80 | 22.60 | 23.30 | 23.40 | 24.40 | 23.00
0.00 | Waveform | 20.80 | 22.20 | 20.50 | 20.90 | 20.30 | 20.10 | 20.50
0.02 | Diabetes | 24.20 | 26.60 | 26.60 | 26.30 | 26.70 | 26.40 | 27.00
0.02 | German | 24.40 | 27.70 | 27.70 | 28.10 | 27.00 | 27.10 | 27.60
0.02 | Heart-statlog | 24.30 | 23.20 | 22.70 | 21.60 | 22.00 | 22.20 | 23.10
0.02 | Ionosphere | 7.20 | 13.10 | 11.80 | 13.00 | 11.50 | 11.30 | 12.20
0.02 | Sonar | 23.40 | 28.10 | 24.00 | 26.20 | 24.60 | 23.50 | 25.30
0.02 | Vehicle | 25.80 | 33.00 | 29.90 | 29.90 | 29.60 | 30.10 | 29.80
0.02 | Waveform | 25.20 | 26.80 | 24.30 | 25.80 | 24.20 | 23.90 | 24.50
0.04 | Diabetes | 27.30 | 30.10 | 29.60 | 29.70 | 29.60 | 29.40 | 30.10
0.04 | German | 29.10 | 31.00 | 30.10 | 31.30 | 29.90 | 29.70 | 30.70
0.04 | Heart-statlog | 28.00 | 28.20 | 28.10 | 26.50 | 27.50 | 27.50 | 28.00
0.04 | Ionosphere | 11.40 | 17.70 | 15.80 | 18.30 | 15.70 | 15.90 | 17.50
0.04 | Sonar | 25.10 | 30.10 | 28.10 | 29.60 | 27.70 | 25.90 | 28.30
0.04 | Vehicle | 25.10 | 30.10 | 28.10 | 29.60 | 27.70 | 25.90 | 28.30
0.04 | Waveform | 27.50 | 30.10 | 28.10 | 29.20 | 27.90 | 27.60 | 28.60
0.08 | Diabetes | 27.50 | 30.10 | 28.10 | 29.20 | 27.90 | 27.60 | 28.60
0.08 | German | 33.60 | 36.20 | 36.10 | 36.40 | 35.90 | 36.10 | 38.20
0.08 | Heart-statlog | 32.40 | 33.40 | 32.60 | 32.90 | 33.20 | 33.30 | 34.40
0.08 | Ionosphere | 26.40 | 27.50 | 26.50 | 27.60 | 26.60 | 26.50 | 27.90
0.08 | Sonar | 32.30 | 36.10 | 36.50 | 38.40 | 35.70 | 36.40 | 36.50
0.08 | Vehicle | 38.70 | 43.00 | 40.60 | 42.00 | 40.50 | 40.20 | 41.60
0.08 | Waveform | 34.30 | 37.60 | 35.80 | 36.80 | 35.60 | 35.30 | 37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 and 10,000 grayscale images for training and testing, respectively. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9; Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble method on MNIST, it was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average max process in gc-Forest has been replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules in gc-Forest, i.e., multigrained scanning and cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.
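The substitution can be pictured as follows: the cascade's final averaging ("average max") over all forests is preceded by a pruning hook that keeps only the competent forests. This is a hypothetical sketch of the idea, not the gc-Forest API; `prune` stands in for any of the dynamic pruning methods.

```python
def cascade_predict(forest_probs, prune=lambda probs: probs):
    """Average the class-probability vectors of the forests kept by `prune`,
    then return the argmax class (the 'average max' step)."""
    kept = prune(forest_probs)
    n_classes = len(kept[0])
    avg = [sum(p[k] for p in kept) / len(kept) for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)
```

With the default (identity) hook this reproduces plain averaging; passing a real pruning function changes which forests vote.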

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The number of raw input features is scaled using the scanning method on the novel features (printed in blue blocks); the scaling process is determined using the scaling window sizes. The generated features are concatenated into large feature vectors (printed in red blocks) that are prepared to enter the cascade forest. To encourage diversity and improve pruning efficiency in the construction of the cascade forest, each layer consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance is below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error, in this section; specifically, the original version of gc-Forest is used as a baseline. The results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble

Figure 6: Modified version of gc-Forest (inputs → multigrained scanning → cascade forest of random forests → dynamic pruning → output).

Figure 7: Performance of pruning ensemble algorithms with full bagging on digital character recognition for MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA. (a) Improvement ratio of the classification accuracy. (b) Reduction percentage of the ensemble size.


size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), it can be seen that not all ensemble methods can improve the classification accuracy when compared with the original version of gc-Forest. The modified version of gc-Forest with MBPEP could improve the classification accuracy by 0.4% over that of the original version, while the other methods can improve the classification accuracy by 0.2%. From Figure 7(b), the results show that MBPEP can store fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining these sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, with less storage and higher computational efficiency than the other pruning methods.
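The two metrics of Figure 7 reduce to simple ratios; a minimal sketch (helper names are ours):

```python
def improvement_ratio(acc_pruned, acc_baseline):
    """Improvement of classification accuracy over the gc-Forest baseline."""
    return acc_pruned - acc_baseline

def reduction_percentage(pruned_size, original_size):
    """Percentage of sublearners that no longer need to be queried."""
    return 100.0 * (1.0 - pruned_size / original_size)
```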

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of the learners to select the expert in an ensemble. In this paper, MBPEP is proposed.

First, the Pareto optimization algorithm is used by MBPEP to globally search for the feasible ensemble pool size during the overproduction stage; the "domination" computation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensembles is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP with respect to other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework to conduct handwritten digital character recognition tasks, with the average max process of the original version of gc-Forest replaced by MBPEP. This modified version of gc-Forest can improve the classification accuracy while using fewer sublearners when combining the final hypotheses.

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning constructions has become an advanced research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (grant number 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. Lecun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3–4, 2005.

[34] G. Martinez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of the 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.


pruning model. The dynamic ensemble pruning model has four stages: overproduction, region of competence definition, selection, and integration. During overproduction, every base learner independently learns when composing the ensemble pool using the training set. During region of competence definition, the criterion is defined to explore the correlation between the ensemble pool and the targets using the validation set. During selection, according to the region of competence, learners with low competitiveness are pruned with respect to the testing set. During integration, the outputs of the learners are aggregated. To the best of our knowledge, different EPS models use different definitions of the region of competence criteria to select classifiers. Antosik and Kurzynski [16] estimated the source competence of every learner using the minimum difference minimization criterion between the outputs and the targets with respect to the validation set. Lobato et al. [17] used the Bayesian computation framework to approximate the confidence level of the remaining learners by querying the selected expert; the size of the expert is increased until the confidence level of the remaining learners is above the predefined level. Li and Zhou [18] generalized the EPS problem as a quadratic programming problem with a sparse solution; the results in that publication show that dynamic ensemble pruning can achieve good generalization performance with little expertise. KNORA-Eliminate [19] selects the expert that can correctly classify all samples in the region of competence; the region of competence for KNORA-Eliminate is reduced until the expert achieves 100% accuracy in the region of competence. META-DES [20] provides an alternative framework that views dynamic ensemble pruning as a meta-learning procedure; the meta-features are constructed according to the correlations among the probability outputs, inputs, and targets.

Regarding dynamic ensemble pruning, the size of the ensemble pool has been fixed in previous publications [16–20], and these dynamic ensemble pruning methods fail to search for the optimal ensemble pool during the overproduction stage. According to Davis et al. [21], the selection of the expert is known to be a nondeterministic polynomial-hard (NP-hard) problem. Additionally, most pruning criteria estimate the competency of the learners but neglect the condition in which the queried sample is located on the indecision boundary and the learners have different suggestions for samples in the region of competence; this is insufficient to correctly select the expert in the ensemble pool. To address these problems, we first regard the calculation procedure for the size of the pool during the overproduction stage as an optimization problem. Two targets for dynamic ensemble pruning models, i.e., ensemble pool size and learning performance, are globally searched by a bi-objective programming formulation. The ensemble pool size calculated by the programming method does not utilize the local information about the region of competence for the validation set; thus, the obtained ensemble pool size may be suboptimal. More precisely, the local exploration method between base learners in the ensemble pool with the suboptimal size needs to be considered. Accordingly, margin-based Pareto ensemble pruning (MBPEP) is proposed, consisting of Pareto

optimization and a margin criterion pruning mechanism for global and local search of the optimal ensemble pool size. In this paper, the following two contributions are made: the Pareto optimization algorithm [22] is applied to calculate the optimal size of the ensemble pool during the overproduction stage for dynamic ensemble pruning in MBPEP, and the margin criterion pruning (MCP) mechanism is contained within MBPEP so that the learners in the indecision region can detect the different classes in the region. Subsequently, the margin criterion for each class in the indecision region is further calculated to prune the size of the expert; classifiers in the indecision region that contain a lot of information will be pruned according to MCP. Unlike some dynamic ensemble pruning methods [19, 20] based on various pruning criteria, the local exploration technique in MBPEP acts as a preselector for these pruning criteria; it can be directly inserted into them and provide more robust learning performance than the original dynamic ensemble pruning methods.
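The safe-region/indecision-region distinction that gates MCP can be sketched as a simple agreement check (an illustrative helper, not the authors' code):

```python
def is_indecision_region(votes):
    """votes[s][i] is the label that learner s assigns to sample i in the
    region of competence. The region is safe when all learners agree on
    every sample; any disagreement makes it an indecision region, which
    is when MCP is activated."""
    return any(len(set(column)) > 1 for column in zip(*votes))
```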

The remainder of this paper is organized as follows. Section 2.1 describes the proposed MBPEP framework, Section 2.2 discusses the Pareto optimization that estimates the optimal size of the ensemble pool, and Section 2.3 introduces the MCP method to prune the ensemble pool. In Section 3, a series of experiments is executed to evaluate the performance of the MBPEP framework. The study's conclusions are given in Section 4.

2. Methodology

In this section, the methodology of the MBPEP framework is discussed. The basic architecture of MBPEP is described in Section 2.1, and the key ensemble exploration details for the ensemble pool, Pareto optimization and MCP, are described in Sections 2.2 and 2.3.

2.1. Margin-Based Pareto Ensemble Pruning (MBPEP) Framework for Dynamic Ensemble Pruning. Like the general dynamic ensemble models [16–20], the MBPEP-based dynamic ensemble pruning framework can be separated into four stages. During the overproduction stage, T is the initial ensemble pool size, and each learner in the ensemble pool H = {H1, ..., HT} can be independently learned based on the training set XTr. To keep the ensemble pool diverse and informative during this stage, these T base learners are initialized by training on different training sets; for example, bagging, boosting, and clustering approaches [1] that can generate different distributions of training sets are used to train every learner.
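Bagging's role here is only to give each of the T learners a differently distributed training set; a minimal sketch of the bootstrap step (the helper name is ours):

```python
import random

def bootstrap_indices(n_instances, pool_size, seed=0):
    """One bootstrap sample (indices drawn with replacement) per base learner."""
    rng = random.Random(seed)
    return [[rng.randrange(n_instances) for _ in range(n_instances)]
            for _ in range(pool_size)]
```

Each index list would then train one base learner, yielding the initial pool H = {H1, ..., HT}.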

Bagging is used in this paper, and classification and regression trees (CARTs) [23] are used as base learners. Subsequently, the optimal overproduction pool size HS = {H1, ..., HTs} is estimated using the Pareto optimization algorithm: the global exploration over the T initial base learners is calculated, and the Ts learners that result in suboptimal performance are selected during this stage. During the region of competence definition stage, the neighbors in the validation set XVal are learned using various


criteria. Based on the region of competence definition, similar samples for an unknown query instance Xquery and the competence level of each base learner can be estimated. The common techniques for defining the region of competence include minimum difference minimization [16], k-nearest neighbors (KNN) [24], K-means [25], and the competence map method [26]. During the pruning stage, some learners are extracted to construct the expert with respect to the test set XTes. In contrast, traditional dynamic ensemble pruning models use insufficient pruning techniques, such as KNORA-Eliminate [19] and META [20], to prune learners with low competitiveness in the ensemble pool. In MBPEP, the expert H* = {H1, ..., HT*} is estimated through two steps for the test set. Meanwhile, the region of competence for XTes is checked using MBPEP with respect to whether the learners in the ensembles have the same suggestions when they recognize the samples in the region of competence for XTes. When every learner in the ensembles has the same suggestion, that region is defined as a safe region; in this condition, the MCP method does not need to be activated, and the ensemble pool is passed down to the various pruning techniques. However, when the learners in the ensembles make different suggestions for samples in the region of competence for XTes, the region is defined as the indecision region; MCP is applicable to the scope of the indecision region, and the size of the expert can be further pruned. The MCP operator, which does not damage the general pruning criteria [19, 20], can be used as a preselector to improve the exploration of the local information between the base learners in the ensemble pool. During the integration stage, the results that are calculated by the expert are aggregated. The strategies used for aggregation can be separated into three types: nontrainable methods [1], in which the weights of the outputs do not need to be learned; trainable methods [1], in which the outputs from the experts in the pruning stage are used as input features to be trained by another learner; and dynamic weighting methods [1], in which the weights are determined by the estimated competence levels of every expert. Famous integration methods include majority voting and the oracle; in this paper, majority voting is used as the aggregation strategy. The architecture of the MBPEP-based dynamic ensemble pruning framework is shown in Figure 1. Notably, the performance of dynamic ensemble pruning is sensitive to the definition of the region of competence; since the region of competence is not the key contribution of MBPEP, different criteria are used to define it in this paper.
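Majority voting at the integration stage reduces to a frequency count over the experts' labels; a minimal sketch (tie-breaking by first occurrence is our simplification):

```python
from collections import Counter

def majority_vote(expert_labels):
    """Return the class label receiving the most votes from the expert."""
    return Counter(expert_labels).most_common(1)[0][0]
```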

2.2. Pareto Optimization Evolutionary Algorithm. In recent years, several studies [11–20] have demonstrated that ensemble models always achieve better results than any base learner. However, methods of building ensembles that can achieve great performance with respect to the test set are still required. In this paper, this problem is solved by MBPEP during the overproduction stage. In Section 2.1, the architecture of MBPEP during the overproduction stage in Figure 1 is discussed, and the overproduced learners are trained to compose the ensemble pool. In general, there is a conflicting phenomenon in that few base learners produce few decision boundaries without achieving

powerful performance According to previous studies [16ndash20] the ensemble pool size is fixed during the over-production stage and calculation of the size of the ensemblepool that is suitable for handling classification tasks ischallenging Our objective is to estimate an optimal en-semble pool size for achieving better performance than thatof a fixed ensemble pool size -e MBPEP computationframework uses Pareto optimization to address the learningperformance and the ensemble pool size

In this study, the initial ensemble pool size is T (H = {H_1, ..., H_T}), the inputs are X = {X_1, ..., X_n}, and the targets are y = {y_1, ..., y_n}, where n is the number of instances and c is the number of classes. The corresponding optimized pool after calculation is defined as H_S. The vector representation of the optimized ensemble pool is a binary vector S \in \{0, 1\}^T with T dimensions: S_i = 1 denotes that the ith learner in H = {H_1, ..., H_T} is selected; otherwise, S_i = 0. The Pareto optimization algorithm converts the calculation of the optimal ensemble pool size into a subset selection problem. Suppose that the outputs of the different ensemble pools are denoted as h_{isk} (i \in \{1, ..., n\}, s \in [0, T], and k \in \{1, ..., c\}), where i, s, and k denote the indexes of instances, ensembles, and classes, respectively. Following this notation, the classification error f(H_S) is

f(H_S) = 1 - \frac{1}{n} \sum_{i=1}^{n} \max_{k=1,\dots,c} \sum_{s=1}^{T} h_{isk}.  (1)

The classification performance and ensemble pool size are estimated using the Pareto optimization bi-objective programming formulation

\arg\min_{S \in \{0,1\}^T} \left( f(H_S), L \right),  (2)

where L denotes sum(S). The operator sum(·) sums the binary vector S to obtain the ensemble pool size. Unlike single-objective algorithms [27, 28], Pareto optimization introduces the concept of "domination", which continuously perturbs the ensemble size and measures the trade-off between f(H_S) and L during the global search iterations.
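The bi-objective subset search in equation (2) can be sketched as follows. This is a simplified POSS-style loop under our own assumptions (random bit-flip mutation, a toy archive), not the exact procedure of [22]; `pareto_subset_search` and `error_fn` are hypothetical names:

```python
import random

def dominates(a, b):
    """a and b are (error, size) objective pairs; a dominates b when it is
    no worse on both objectives and different (hence better on at least one)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_subset_search(error_fn, T, iterations=500, seed=0):
    """Subset selection over binary vectors S in {0,1}^T.

    error_fn(S) plays the role of f(H_S); the second objective is sum(S),
    the ensemble pool size L. An archive of non-dominated solutions is kept;
    each round a random archive member is perturbed by bit-flip mutation.
    """
    rng = random.Random(seed)
    S0 = tuple([0] * T)
    archive = {S0: (error_fn(S0), sum(S0))}
    for _ in range(iterations):
        S = rng.choice(list(archive))
        S2 = tuple(bit ^ int(rng.random() < 1.0 / T) for bit in S)
        obj2 = (error_fn(S2), sum(S2))
        if not any(dominates(obj, obj2) for obj in archive.values()):
            archive = {s: o for s, o in archive.items()
                       if not dominates(obj2, o)}
            archive[S2] = obj2
    return archive  # maps subset vectors S to (error, size) pairs

# Toy error that shrinks as more learners are added, so error and size conflict:
front = pareto_subset_search(lambda S: 1.0 / (1 + sum(S)), T=6)
```

The archive returned at the end approximates the Pareto front of error/size trade-offs, from which the smallest adequate pool size can be read off.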

2.3. Margin Criterion Pruning (MCP). As mentioned in Section 2.2, the Pareto optimization method is used during the overproduction stage to calculate the optimal ensemble pool size in the dynamic ensemble pruning model. If a query sample x_query is located on the indecision boundaries of the learners, then the predictions of the learners for x_query might differ across the samples that belong to its region of competence. As with other heuristic algorithms [22], the global exploration between the base learners used by Pareto optimization with respect to the two objectives (f(H_S) and sum(S)) cannot handle this condition. Pareto optimization provides a suboptimal [21] ensemble pool size during the overproduction stage because the local information between the base learners and the targets is neglected in the solution

Computational Intelligence and Neuroscience 3

space. Previous studies [16-20] propose using pruning criteria to estimate the competence of each learner for query sample x_query without considering whether this query sample is located on the indecision boundaries. In this paper, the MCP mechanism measures the local information between the base learners in the ensemble pool to quantify the degree to which query sample x_query lies on the indecision boundaries of the learners. MCP is added during the pruning stage as a preselector executed in parallel with the classical pruning criteria [16-20]. As a subset of the region of competence, the indecision region of the ensemble pool is defined and explored to estimate the local competence of each sublearner in the ensembles. A schematic diagram of the region of competence for four ensemble pool size conditions is shown in Figure 2.

The safe region and the indecision region are shown in Figures 2(a)-2(b) and Figures 2(c)-2(d), respectively. As shown in Figure 2, whether the region of competence of a query sample (yellow triangle in Figure 2) is a safe or an indecision region is determined according to whether the ensembles can correctly recognize the neighbors of the query sample. If the ensembles make the same suggestions for these neighbors, the region of competence is a safe region; when the ensembles make different suggestions for these neighbors, the region of competence is an indecision region. According to Figures 2(b)-2(d), the ensembles reach 100% (7/7), 71.4% (5/7), and 57.1% (4/7) agreement, respectively. MCP focuses on the indecision region, where the local competence of each learner to distinguish samples of different classes is estimated.

Suppose that T_S learners H_S = {H_1, ..., H_s} (Section 2.2) are selected to compose the ensemble pool after the global exploration by the Pareto optimization algorithm during the overproduction stage. Suppose that query sample x, with N neighbors in its region of competence, receives various suggestions from the ensembles. N_x^1 denotes the number of votes for the most popular label voted for by the ensembles H_S, and N_x^2 denotes the number of votes for the second most popular label voted for by H*. The marginal criterion is used by MCP to measure the amount of information for the ensembles in the indecision region. We define the difference between the labels with the most and second most votes as the marginal information for the neighbors of query sample x:

\mathrm{margin}(x) = \frac{N_x^1 - N_x^2}{T_S}.  (3)

Entropy is used to compute the amount of information in the machine learning field [29]. To measure the margins of the final hypotheses, the margin entropy criterion is defined to calculate the amount of information of samples X for each H_i in H_S = {H_1, ..., H_s}:

C_{H_i}(X) = -\frac{1}{N} \sum_{j=1}^{N} \log\left[\mathrm{margin}(x_j)\right],  (4)

where N denotes the number of neighbors around the query samples X. If a greater difference between the most and second most popular labels is calculated by equation (3), then it

Figure 1: Architecture of MBPEP, which consists of four stages: overproduction, definition of the region of competence, pruning, and integration.

can be inferred that the ensembles cannot detect samples that belong to different classes in the region of competence. In addition, in MBPEP, N_x^1 - N_x^2 is always less than T_S, and hence the maximum value of margin(x) is less than 1. According to equation (4), if the calculated margin entropy C_{H_i}(X) of validation sample X is greater than 0, then the learner can be used to compose the expert; otherwise, when C_{H_i}(X) <= 0, the learner is pruned. This process is called MCP. MCP performs a greedy search to define a subset of the ensemble pool and can be used as a preselector to determine suitable learners that can recognize the different targets in the indecision region. The MCP algorithm is shown in Algorithm 1.

3. Results and Discussion

3.1. Configuration. Sixteen UCI datasets [30] are used as benchmarks in this paper. The characteristics of these datasets are shown in Table 1, and each dataset is split into three parts: a training set, a validation set, and a test set. Classification and regression trees (CARTs) are used as base learners. First, the bagging method is applied to the training data: the training sets are divided into subsets, and each subset is drawn using bootstrap extraction. The subclassifiers are trained on these subsets, and the number of ensembles is preset to 100. Second, the MBPEP method is applied for pruning on the validation set. The optimized subclassifiers are aggregated into the final hypotheses using the majority voting method [1]. Finally, the test set is used to measure the performance of the methods.
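The overproduction step just described (bootstrap replicates feeding 100 base learners) can be sketched as follows. This is a minimal, library-free sketch; `fit_fn` is a placeholder of ours for any learner-training routine (a CART in the paper):

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bagging replicate."""
    return [rng.randrange(n) for _ in range(n)]

def make_bagging_pool(X, y, fit_fn, pool_size=100, seed=0):
    """Overproduction-stage sketch: train `pool_size` base learners,
    each on a bootstrap replicate of the training set."""
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):
        idx = bootstrap_indices(len(X), rng)
        pool.append(fit_fn([X[i] for i in idx], [y[i] for i in idx]))
    return pool
```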

3.2. Pruning Metrics. The bagging ensemble is a common architecture in which learners are trained on different bootstrapped samples. In this paper, the full bagging method, in which all base learners are selected to construct the ensemble

Figure 2: The region of competence during the pruning stage. The query sample is located in (a, b) the safe region and (c, d) the indecision region.

classifiers is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [16]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that selects the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the subclassifiers and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use the backfitting search method to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the kappa error as a statistical method that measures the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners. Classifiers are iteratively added to the ensemble in order of lowest κ statistics.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that focuses on finding the learners most complementary to the ensemble in each iteration and incorporates them into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the least votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool for the validation set using the corresponding probability outputs and several random variables with a beta probability distribution.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.

3.3. Characteristics of MBPEP. To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes. Notably, the full bagging method is used as the baseline without pruning. The two contributions of MBPEP (Pareto optimization and MCP) are applied to the overproduction and pruning stages, respectively; hence, two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

To measure the advantages of MCP, Figure 3 shows that the performances of the dynamic ensemble pruning models (the META and KNORA models) with MCP are superior to those without MCP. In addition, all

Require: the ensemble pool H_S = {H_1, ..., H_s} generated during the overproduction stage
(1) H* <- ∅ (the pruned ensemble pool is empty)
(2) for H_i in H_S = {H_1, ..., H_s} do
(3)   if sample X is in the safe region then
(4)     continue
(5)   else if sample X is in the indecision region then
(6)     if C_{H_i}(X) > 0 then
(7)       H* <- H* ∪ H_i
(8)     else if C_{H_i}(X) <= 0 then
(9)       H_i is pruned
(10)    end if
(11)  end if
(12) return H*

Algorithm 1: MCP algorithm.
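Equations (3)-(4) and the pruning test of Algorithm 1 can be sketched in a few lines. This is our own minimal rendering (function names are hypothetical), and it assumes every neighbor has a unique top label so that margin(x) > 0 and the logarithm is defined:

```python
from collections import Counter
from math import log

def margin(votes, pool_size):
    """Eq. (3): margin(x) = (N1_x - N2_x) / T_S, where N1_x and N2_x are
    the vote counts of the two most popular labels among the ensemble's
    predictions (`votes`) for one neighbor of the query sample."""
    top = Counter(votes).most_common(2)
    n1 = top[0][1]
    n2 = top[1][1] if len(top) > 1 else 0
    return (n1 - n2) / pool_size

def margin_entropy(neighbor_votes, pool_size):
    """Eq. (4): C_Hi(X) = -(1/N) * sum_j log(margin(x_j)) over N neighbors."""
    margins = [margin(votes, pool_size) for votes in neighbor_votes]
    return -sum(log(m) for m in margins) / len(margins)

def mcp_prune(pool, criterion):
    """Algorithm 1, inner test: keep only learners whose criterion is > 0."""
    return [h for h in pool if criterion(h) > 0]
```

When all neighbors receive unanimous votes (margins of 1), the entropy is 0 and the learner is pruned, matching the C_{H_i}(X) <= 0 branch of Algorithm 1.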

Table 1: Benchmark datasets used in the experiments.

Dataset         Examples  Attributes
Australian      690       14
Breast cancer   286       9
Liver           345       6
Heart-statlog   270       13
Waveform        1000      20
Ionosphere      351       34
Kr-vs-kp        3196      96
Letter          20000     16
German          306       3
Diabetes        76        4
Sonar           208       60
Spambase        4601      57
Tic-tac-toe     958       9
Vehicle         846       18
Vote            435       16
Pima            768       8


EPS models achieve better performance than full bagging. As an alternative metric, the F1 value is an effective measure that has been widely used [35] to summarize the precision and sensitivity of results, and it is calculated as follows:

F1 = \frac{2\,TP}{2\,TP + FP + FN},  (5)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89; for class 1, the F1 values are 0.88, 0.88, and 0.88, respectively. To explore the influence of MBPEP in dynamic ensemble models, the optimal overproduction size and learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, while META, KNORA, and RRC denote ensemble pruning models learned by Pareto optimization but not MCP. From the top panel of Figure 4, the META and KNORA-Eliminate models with MCP achieve superior learning performance to those without MCP. For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP. For the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.
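Equation (5) is a one-liner; the sketch below is ours, with an illustrative (made-up) confusion count:

```python
def f1_score(tp, fp, fn):
    """Eq. (5): F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical counts: 8 true positives, 1 false positive, 1 false negative.
score = f1_score(8, 1, 1)  # -> 16/18, about 0.889
```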

3.4. Comparison. To measure the classification performance, the test error and average optimized ensemble size are calculated using MBPEP and compared with the other ensemble methods. The results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments; for each benchmark set, each experiment is executed 30 times, and the average results are shown in Tables 2 and 3, with the wins for MBPEP shown in bold. The comparison methods are full bagging, RE pruning, kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner; EA pruning is based on the optimized dynamic pruning model.

Figure 3: Performance (accuracy) of the dynamic ensemble pruning methods with and without MCP.

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP: (a) accuracy; (b) optimal ensemble size.


From Table 2, MBPEP achieves the lowest test error on 10 of the 16 datasets (10/16). Meanwhile, full bagging, RE pruning, kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test errors on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP achieves better performance than the other methods. However, according to Demšar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins is validated between MBPEP and the other algorithms using a sign test: a win is counted as 1 and a tie as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.

We also measure the optimal ensemble size in Table 3. The EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the lowest ensemble size on less than 19% (3/16) of the datasets. The results in Tables 2 and 3 support the claim that MBPEP is superior to the competence ordering methods and the optimization-based pruning method on most datasets. The reason is that MBPEP simultaneously minimizes classification error and ensemble size, whereas competence ordering ensemble methods (RE pruning, kappa pruning, CM pruning, and MDM pruning) optimize only one objective, such as diversity, and neglect the others.
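The direct-wins tally used in the sign test above (win = 1, tie = 0.5) can be sketched as follows; the function name and the sample error values are ours:

```python
def direct_wins(errors_ours, errors_other):
    """Sign-test tally over per-dataset test errors: a win (strictly
    lower error) counts 1 and a tie counts 0.5."""
    score = 0.0
    for ours, other in zip(errors_ours, errors_other):
        if ours < other:
            score += 1.0
        elif ours == other:
            score += 0.5
    return score

# Two wins and one tie over three hypothetical datasets:
tally = direct_wins([11.6, 23.8, 25.7], [14.3, 27.9, 25.7])  # -> 2.5
```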

Table 2: Test error comparison (%).

Dataset         Ours   Full bagging  RE     Kappa  CM     MDM    EA
Australian      11.62  14.30         14.40  14.30  14.50  14.80  14.30
Breast cancer   23.83  27.90         27.70  28.70  29.20  29.50  27.50
Liver           25.73  32.70         32.00  32.60  30.60  33.70  31.70
Heart-statlog   23.50  19.50         18.70  20.10  19.90  19.90  19.60
Waveform        20.80  22.20         20.50  20.90  20.30  20.10  20.50
Ionosphere      5.70   9.20          8.60   8.40   8.90   10.00  9.30
Kr-vs-kp        0.90   1.50          1.00   1.00   1.10   1.10   1.20
Letter          8.52   5.90          4.80   4.80   4.80   5.70   5.30
German          22.54  25.30         24.10  24.10  23.90  24.00  24.30
Diabetes        22.16  25.10         24.30  24.30  24.10  24.20  24.00
Sonar           21.40  26.60         26.70  24.90  25.00  26.80  25.10
Spambase        6.93   6.80          6.60   6.60   6.60   6.80   6.60
Tic-tac-toe     15.67  16.40         13.50  13.20  13.20  14.50  13.80
Vehicle         21.97  22.80         22.60  23.30  23.40  24.40  23.00
Vote            4.40   4.70          4.40   4.10   4.30   4.50   4.50
Optdigits       3.20   3.80          3.60   3.50   3.60   3.70   3.50
Count of wins   10     2             3      4      3      1      1
Direct wins     -      13            10.5   11     10     11     11

Table 3: Average optimized ensemble size.

Dataset         Ours   RE     Kappa  CM     MDM    EA
Australian      7.2    12.5   14.7   11.0   8.5    41.9
Breast cancer   7.2    8.7    26.1   8.8    7.8    44.6
Liver           7.4    13.9   24.7   15.3   17.7   42.0
Heart-statlog   9.6    11.4   17.9   13.2   13.6   44.2
Waveform        9.0    12.3   15.6   11.7   11.5   42.4
Ionosphere      9.1    7.9    10.5   8.5    10.7   48.8
Kr-vs-kp        4.3    5.8    10.6   9.6    7.2    45.9
Letter          12.5   15.1   13.8   12.9   23.2   38.3
German          7.0    9.4    13.6   11.5   19.7   41.2
Diabetes        11.7   14.3   16.8   12.6   18.1   42.8
Sonar           13.2   11.0   20.6   13.9   20.6   43.1
Spambase        12.6   18.5   20.0   19.0   28.8   39.7
Tic-tac-toe     13.0   16.1   17.4   15.4   28.0   39.8
Vehicle         7.0    15.7   16.5   11.2   21.6   41.9
Vote            3.5    3.2    5.1    5.4    6.0    47.8
Optdigits       22.3   25.0   25.2   21.4   46.8   41.4
Count of wins   13     3      0      0      0      0
Direct wins     -      13     16     14     16     16


3.5. Robustness of Classification. The robustness of an algorithm is an important indicator of classifier performance, as it reflects the algorithm's fault tolerance. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise, where the noise level is determined by the variance of the Gaussian noise. In the experiment, the test errors of the ensemble classifiers are recorded when the training sets are corrupted by noise of variance 0.00, 0.02, 0.04, and 0.08. The results are measured on the 7 datasets described in Table 4. The average test errors are shown in Table 4, each calculated over 50 runs. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured, with the wins for MBPEP shown in bold in Table 4.

From Table 4, the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.4% test error when there is no noise corrupting the training set of the sonar dataset, but the test error reaches 32.3% when the training set is corrupted by Gaussian noise of variance 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the respective noise intensities. Specifically, when the variance of the Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP reaches 20.80% on the waveform dataset when there is no noise corrupting the training set; this is not the best classification performance among the comparison methods, but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with the noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.
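The training-set corruption used in this experiment can be sketched as below; the function name is ours, and zero-mean Gaussian noise with the stated variance is assumed:

```python
import random

def corrupt_with_gaussian_noise(X, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to every
    feature value, as in the robustness experiment (variance 0.00-0.08)."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in X]
```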

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Dataset         Ours   Full bagging  RE     Kappa  CM     MDM    EA
0.00   Diabetes        22.10  25.10         24.30  24.30  24.30  24.20  24.00
       German          22.50  25.30         24.10  24.10  23.90  24.00  24.30
       Heart-statlog   23.50  19.50         18.70  20.10  19.90  19.90  19.60
       Ionosphere      5.70   9.20          8.60   8.40   8.90   10.00  9.30
       Sonar           21.40  26.60         26.70  24.90  25.00  26.80  25.10
       Vehicle         21.90  22.80         22.60  23.30  23.40  24.40  23.00
       Waveform        20.80  22.20         20.50  20.90  20.30  20.10  20.50
0.02   Diabetes        24.20  26.60         26.60  26.30  26.70  26.40  27.00
       German          24.40  27.70         27.70  28.10  27.00  27.10  27.60
       Heart-statlog   24.30  23.20         22.70  21.60  22.00  22.20  23.10
       Ionosphere      7.20   13.10         11.80  13.00  11.50  11.30  12.20
       Sonar           23.40  28.10         24.00  26.20  24.60  23.50  25.30
       Vehicle         25.80  33.00         29.90  29.90  29.60  30.10  29.80
       Waveform        25.20  26.80         24.30  25.80  24.20  23.90  24.50
0.04   Diabetes        27.30  30.10         29.60  29.70  29.60  29.40  30.10
       German          29.10  31.00         30.10  31.30  29.90  29.70  30.70
       Heart-statlog   28.00  28.20         28.10  26.50  27.50  27.50  28.00
       Ionosphere      11.40  17.70         15.80  18.30  15.70  15.90  17.50
       Sonar           25.10  30.10         28.10  29.60  27.70  25.90  28.30
       Vehicle         25.10  30.10         28.10  29.60  27.70  25.90  28.30
       Waveform        27.50  30.10         28.10  29.20  27.90  27.60  28.60
0.08   Diabetes        27.50  30.10         28.10  29.20  27.90  27.60  28.60
       German          33.60  36.20         36.10  36.40  35.90  36.10  38.20
       Heart-statlog   32.40  33.40         32.60  32.90  33.20  33.30  34.40
       Ionosphere      26.40  27.50         26.50  27.60  26.60  26.50  27.90
       Sonar            32.30  36.10         36.50  38.40  35.70  36.40  36.50
       Vehicle          38.70  43.00         40.60  42.00  40.50  40.20  41.60
       Waveform         34.30  37.60         35.80  36.80  35.60  35.30  37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 and 10,000 grayscale images for training and testing, respectively. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9. Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble method on MNIST, it was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average max process in gc-Forest has been replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules in gc-Forest, i.e., multigrained scanning and cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The raw input features are scaled into novel features (printed in blue blocks) using the scanning method, where the scaling process is determined by the scanning window sizes. The generated features are concatenated into large feature vectors (printed in red blocks) that then enter the cascade forest. To encourage diversity and improve pruning efficiency in the construction of the cascade forest, each layer consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance falls below the error tolerance. We introduce a metric, the improvement ratio of the test error, in this section, with the original version of gc-Forest used as the baseline. The results are shown in Figure 7(a). In addition, the percentage reduction of the number of optimized learners compared with the original ensemble

Figure 6: Modified version of gc-Forest (multigrained scanning, cascade forest with random forests, and dynamic pruning before the output).

Figure 7: Performance of the pruning ensemble algorithms (MBPEP, complementarity pruning, margin distance minimization pruning, reduced error pruning, and EA) with full bagging on digital character recognition: (a) improvement ratio of the classification accuracy; (b) reduction percentage of the ensemble size.


size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), not all ensemble methods improve the classification accuracy over the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over the original version, while the other methods improve it by at most 0.2%. From Figure 7(b), MBPEP stores fewer sublearners during aggregation, reducing the query quantity by approximately 64.4% when combining the sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, with less storage and higher computational efficiency than the other pruning methods.

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of learners to select the expert in an ensemble. In this paper, MBPEP is proposed.

First, the Pareto optimization algorithm is used by MBPEP to globally search for a feasible ensemble pool size during the overproduction stage; the "domination" relation between learning performance and ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensembles is used to locally determine the expert that can detect the classes in the indecision region. Several experiments demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework for handwritten digital character recognition tasks: the average max process of the original gc-Forest is replaced by MBPEP. This modified version of gc-Forest improves the classification accuracy while using fewer sublearners when combining the final hypotheses.

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning constructions has become an advanced research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (Grant number 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195-216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2700-2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627-635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463-484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1-19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202-212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. Lecun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3-17, 2014.

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1-4, pp. 796-801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405-410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879-1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence - heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197-206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364-369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293-303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665-3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157-166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57-98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939-1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185-188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656-2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239-263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469-482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586-591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D Margineantu and T G Dietterich ldquoPruning adaptiveboostingrdquo in Proceedings of the 14th International Conferenceon Machine Learning pp 211ndash218 Nashville TN USA July1997

[33] L I Kuncheva ldquoDiversity in multiple classifier systemsrdquoInformation Fusion vol 6 no 1 pp 3-4 2005

[34] G Martinezmuoz and A Suarez ldquoAggregation ordering inbaggingrdquo in Proceedings of 8th International Conference onArtificial Intelligence and Applications vol 1 pp 258ndash263Innsbruck Austria 2004

[35] Q Umer H Liu and Y Sultan ldquoEmotion based automatedpriority prediction for bug reportsrdquo IEEE Access vol 6pp 35743ndash35752 2018

[36] J Demsar ldquoStatistical comparisons of classifiers over multipledata setsrdquo Journal of Machine Learning Research vol 7 no 1pp 1ndash30 2006

[37] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

12 Computational Intelligence and Neuroscience



criteria. Based on the region of competence definition, similar samples for an unknown query instance X_query are found, and the competence level of each base learner can be estimated. The common techniques for defining the region of competence include minimum difference minimization [16], k-nearest neighbors (KNN) [24], K-means [25], and the competence map method [26]. During the pruning stage, some learners are extracted to construct the expert with respect to the test set X_Tes. In contrast, traditional dynamic ensemble pruning models use single pruning techniques, such as KNORA-Eliminate [19] and META-DES [20], to prune learners with low competitiveness in the ensemble pool. In MBPEP, the expert H* = {H_1, ..., H_T*} is estimated through two steps for the test set. The region of competence for X_Tes is checked by MBPEP to determine whether the learners in the ensembles make the same suggestions when they recognize the samples in that region. When every learner in the ensembles makes the same suggestion, the region is defined as a safe region; in this condition, the MCP method does not need to be activated, and the ensemble pool is passed down to the various pruning techniques. However, when the learners in the ensembles make different suggestions for samples in the region of competence for X_Tes, the region is defined as the indecision region. MCP is applicable within the scope of the indecision region, and the size of the expert can be further pruned there. The MCP operator, which does not damage the general pruning criteria [19, 20], can be used as a preselector to improve exploration of the local information between the base learners in the ensemble pool. During the integration stage, the results calculated by the expert are aggregated. The strategies used for aggregation can be separated into three types: nontrainable methods [1], in which the weights of the outputs do not need to be learned; trainable methods [1], in which the outputs from the experts in the pruning stage are used as input features to be trained by another learner; and dynamic weighting methods [1], in which the weights are determined by the estimated competence levels of every expert. Well-known integration methods include majority voting and the oracle; in this paper, majority voting is used as the aggregation strategy. The architecture of the MBPEP-based dynamic ensemble pruning framework is shown in Figure 1. Notably, the performance of dynamic ensemble pruning is sensitive to the definition of the region of competence. Since the region of competence is not the key contribution of MBPEP, different criteria are used in this paper to define it.
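As a concrete illustration of the nontrainable aggregation used here, the following is a minimal sketch of majority voting over an expert's label predictions. The function name and the data layout (one prediction list per learner) are illustrative, not from the paper:

```python
from collections import Counter

def majority_vote(expert_predictions):
    """Nontrainable aggregation: each selected learner casts one vote per
    test sample, and the most frequent label wins.

    expert_predictions: list of per-learner prediction lists, one label
    per test sample (layout assumed for this sketch)."""
    n_samples = len(expert_predictions[0])
    aggregated = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in expert_predictions)
        aggregated.append(votes.most_common(1)[0][0])
    return aggregated
```

With three learners predicting `[0, 1]`, `[0, 0]`, and `[1, 1]` on two samples, the vote yields `[0, 1]`.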

2.2. Pareto Optimization Evolutionary Algorithm. In recent years, several studies [11–20] have demonstrated that ensemble models almost always achieve better results than any single base learner. However, methods of building ensembles that achieve strong performance on the test set are still needed. In this paper, this problem is addressed by MBPEP during the overproduction stage. In Section 2.1, the architecture of MBPEP introduced during the overproduction stage (Figure 1) is discussed, and the overproduced learners are trained to compose the ensemble pool. In general, there is a conflict: too few base learners produce too few decision boundaries to achieve powerful performance, while overly large pools are costly. According to previous studies [16–20], the ensemble pool size is fixed during the overproduction stage, and calculating an ensemble pool size that suits a given classification task is challenging. Our objective is to estimate an optimal ensemble pool size that achieves better performance than a fixed pool size. The MBPEP computation framework uses Pareto optimization to jointly address learning performance and ensemble pool size.

In this study, the initial ensemble pool size is T (H = {H_1, ..., H_T}), the inputs are X = {X_1, ..., X_n}, and the targets are y = {y_1, ..., y_n}, where n is the number of instances and c is the number of classes. The corresponding optimized pool after calculation is denoted H_S. The optimized ensemble pool is represented by a binary vector S ∈ {0, 1}^T with T dimensions: S_i = 1 denotes that the ith learner in H = {H_1, ..., H_T} is selected; otherwise, S_i = 0. The Pareto optimization algorithm converts the calculation of the optimal ensemble pool size into a subset selection problem. Suppose that the outputs of the different ensemble pools are denoted h_{isk} (i ∈ {1, ..., n}, s ∈ [0, T], and k ∈ {1, ..., c}), where i, s, and k index instances, ensembles, and classes, respectively. With this notation, the classification error f(H_S) can be written as follows:

$$ f(H_S) = 1 - \frac{1}{n} \sum_{i=1}^{n} \max_{k=1,\dots,c} \sum_{s=1}^{T} h_{isk} \qquad (1) $$

The classification performance and ensemble pool size are estimated using the following Pareto bi-objective programming formulation:

$$ \arg\min_{S \in \{0,1\}^{T}} \left( f(H_S),\, L \right) \qquad (2) $$

where L denotes sum(S). The operator sum(·) sums the binary vector S to obtain the ensemble pool size. Unlike single-objective algorithms [27, 28], Pareto optimization introduces the concept of "domination," which continuously perturbs the ensemble size and weighs the trade-off between f(H_S) and L during the global search iterations.
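The bi-objective search of equation (2) can be sketched as a POSS-style loop [22] that mutates selection vectors and keeps an archive of non-dominated (error, size) trade-offs. The code below is an illustrative reconstruction, not the authors' implementation; the majority-vote evaluation, the 1/T mutation rate, and the iteration budget are assumptions of the sketch:

```python
import random

def evaluate(S, preds, y):
    """Bi-objective value of a selection vector S: (majority-vote error
    of the selected learners, selected pool size sum(S))."""
    chosen = [p for p, s in zip(preds, S) if s]
    if not chosen:                       # empty pool: worst-case error
        return 1.0, 0
    errors = 0
    for i, target in enumerate(y):
        votes = [p[i] for p in chosen]
        majority = max(set(votes), key=votes.count)
        errors += int(majority != target)
    return errors / len(y), sum(S)

def dominates(a, b):
    """Pareto domination on (error, size): no worse on both, better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_prune(preds, y, iters=2000, seed=0):
    """POSS-style subset selection: perturb archived selection vectors and
    keep only non-dominated (error, size) trade-offs in the archive."""
    rng = random.Random(seed)
    T = len(preds)
    archive = [[0] * T]                  # start from the empty selection
    for _ in range(iters):
        S = archive[rng.randrange(len(archive))][:]
        for j in range(T):               # flip each bit with prob 1/T
            if rng.random() < 1.0 / T:
                S[j] = 1 - S[j]
        if S in archive:
            continue
        f_new = evaluate(S, preds, y)
        if any(dominates(evaluate(a, preds, y), f_new) for a in archive):
            continue                     # dominated by the archive: discard
        archive = [a for a in archive
                   if not dominates(f_new, evaluate(a, preds, y))]
        archive.append(S)
    # lowest error first, ties broken by the smaller pool size
    return min(archive, key=lambda a: evaluate(a, preds, y))
```

With `preds` holding each base learner's label predictions on the validation set, `pareto_prune` returns a binary vector whose ones mark the retained learners, i.e., an H_S that trades off f(H_S) against L.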

2.3. Margin Criterion Pruning (MCP). As mentioned in Section 2.2, the Pareto optimization method is used during the overproduction stage to calculate the optimal ensemble pool size in the dynamic ensemble pruning model. If a query sample x_query is located in the indecision boundaries of the learners, then the predictions of the learners for x_query might differ across the samples that belong to its region of competence. As with other heuristic algorithms [22], the global exploration between the base learners performed by Pareto optimization over the two objectives (f(H_S) and sum(S)) cannot handle this condition: Pareto optimization provides a suboptimal [21] ensemble pool size during the overproduction stage because the local information between the base learners and the targets is neglected in the solution space. Previous studies [16–20] propose pruning criteria that estimate the competence of each learner for query sample x_query without considering the case in which this query sample is located in the indecision boundaries. In this paper, the local information between the base learners in the ensemble pool is measured by the MCP mechanism, which computes the amount of information indicating that query sample x_query lies in the indecision boundaries of the learners. MCP is added during the pruning stage as a preselector that is executed in parallel with the classical pruning criteria [16–20]. As a subset of the region of competence, the indecision region of the ensemble pool is defined and explored to estimate the local competence of each sublearner in the ensembles. A schematic diagram of the region of competence for four ensemble pool size conditions is shown in Figure 2.

The safe region and the indecision region are illustrated in Figures 2(a) and 2(b) and Figures 2(c) and 2(d), respectively. As shown in Figure 2, whether a query sample (yellow triangle in Figure 2) lies in the safe or the indecision region is determined by whether the ensembles can consistently recognize the neighbors of the query sample. If the ensembles make the same suggestions for these neighbors, the region of competence is a safe region; when the ensembles make different suggestions for these neighbors, the region of competence is an indecision region. According to Figures 2(b)–2(d), the ensembles reach agreement levels of 100% (7/7), 71.4% (5/7), and 57.1% (4/7), respectively. MCP focuses on the indecision region, where the local competence of each learner, i.e., its ability to distinguish samples of different classes in that region, is estimated.

Suppose that there are T_S learners H_S = {H_1, ..., H_s} (Section 2.2) selected to compose the ensemble pool after global exploration by the Pareto optimization algorithm during the overproduction stage. Suppose also that, for a query sample x with N neighbors in the region of competence, the ensembles make various suggestions. Let N_x^1 denote the number of votes for the most popular label voted by the ensembles H_S, and let N_x^2 denote the number of votes for the second most popular label voted by H*. The marginal criterion is used by MCP to measure the amount of information for the ensembles in the indecision region. We define the difference between the labels with the most and second most votes as the marginal information for the neighbors of query sample x as follows:

$$ \mathrm{margin}(x) = \frac{N_x^1 - N_x^2}{T_S} \qquad (3) $$

Entropy is widely used to quantify information in machine learning [29]. To measure the margins of the final hypotheses, the margin entropy criterion is defined to calculate the amount of information of samples X for H_i in H_S = {H_1, ..., H_s}:

$$ C_{H_i}(X) = -\frac{1}{N} \sum_{j=1}^{N} \log\left[\mathrm{margin}\left(x_j\right)\right] \qquad (4) $$

where N denotes the number of neighbors around query samples X. If a large difference is calculated by equation (3) between the most and second most popular labels, it can be inferred that these ensembles cannot detect samples that belong to different classes in the region of competence. In addition, in MBPEP, N_x^1 − N_x^2 is always less than T_S, and hence the maximum value of margin(x) is less than 1. According to equation (4), if the calculated margin entropy C_{H_i}(X) of validation sample X is greater than 0, then the learner can be used to compose the expert; otherwise, when C_{H_i}(X) ≤ 0, the learner is pruned. This process is called MCP. MCP performs a greedy search to define a subset of the ensemble pool and can be used as a preselector to determine suitable learners that can recognize the different targets in the indecision region. The MCP algorithm is shown in Algorithm 1.

Figure 1: Architecture of MBPEP, which consists of four stages: overproduction, region of competence definition, pruning, and integration.
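Equations (3) and (4) and the pruning rule above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the vote-list layout, the tie guard before the logarithm, and the separation of entropy scoring from the selection step are assumptions of the sketch.

```python
import math
from collections import Counter

def margin(votes, pool_size):
    """Equation (3): normalized gap between the most and the second most
    popular labels among the pool's votes for one neighbor."""
    counts = Counter(votes).most_common()
    top = counts[0][1]
    second = counts[1][1] if len(counts) > 1 else 0
    return (top - second) / pool_size

def margin_entropy(neighbor_votes, pool_size):
    """Equation (4): C = -(1/N) * sum(log(margin(x_j))) over the N
    neighbors of a query sample."""
    total = 0.0
    for votes in neighbor_votes:
        m = max(margin(votes, pool_size), 1e-12)  # guard: a tie makes margin 0
        total += math.log(m)
    return -total / len(neighbor_votes)

def mcp_select(pool, entropy_scores, in_indecision_region):
    """Pruning rule of Algorithm 1: in the indecision region keep learner
    H_i only when its margin entropy C_{H_i}(X) is positive; in the safe
    region the pool is passed through untouched."""
    if not in_indecision_region:
        return list(pool)
    return [h for h, c in zip(pool, entropy_scores) if c > 0]
```

For example, `margin([0, 0, 0, 0, 1], 5)` gives (4 − 1)/5 = 0.6, and a learner whose entropy score is not positive is dropped from the expert.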

3. Results and Discussion

3.1. Configuration. Sixteen UCI datasets [30] are used as benchmarks in this paper. The characteristics of these datasets are shown in Table 1, and each dataset is split into three parts: a training set, a validation set, and a test set. Classification and regression trees (CARTs) are used as base learners. First, the bagging method is applied to the training data: the training set is divided into subsets, each drawn using bootstrap sampling, and a sublearner is trained on each bootstrap sample. The ensemble pool size is preset to 100. Second, the MBPEP method performs pruning using the validation set, and the optimized sublearners are aggregated into the final hypotheses using the majority voting method [1]. Finally, the test set is used to measure the performance of the methods.
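The overproduction setup described above (one base learner per bootstrap sample) can be sketched as follows; `train_fn` stands in for a CART trainer and is an assumption of this sketch, not part of the paper:

```python
import random

def bootstrap_bagging(X, y, n_learners, train_fn, seed=0):
    """Overproduction-stage sketch: train n_learners base learners, each
    on a bootstrap sample (drawn with replacement) of the training set.

    train_fn(X_sub, y_sub) -> fitted learner; any base-learner trainer
    (e.g., a CART implementation) can be plugged in here."""
    rng = random.Random(seed)
    n = len(X)
    pool = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        pool.append(train_fn(X_sub, y_sub))
    return pool
```

Calling it with `n_learners=100` reproduces the preset pool size used in the experiments; the toy `train_fn` below just returns the majority label of its bootstrap sample.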

Figure 2: The region of competence during the pruning stage. The query sample is located in (a, b) the safe region and (c, d) the indecision region.

3.2. Pruning Metrics. The bagging ensemble is a common architecture in which learners are trained on different bootstrapped samples. In this paper, the full bagging method, in which all base learners are selected to construct the ensemble classifier, is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [26]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that aims to select the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the sublearners and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use a backfitting search to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the kappa error as a statistical measure of the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners, and classifiers are iteratively added to the ensemble in order of lowest κ.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that, at each iteration, finds the learner most complementary to the current ensemble and incorporates it into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the fewest votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool on the validation set using the corresponding probability outputs and several random variables with a beta probability distribution.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.

3.3. Characteristics of MBPEP. To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes; notably, full bagging serves as the no-pruning baseline. The two contributions of MBPEP (Pareto optimization and MCP) apply to the overproduction and pruning stages, respectively, and hence two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

Require: the ensemble pool H_S = {H_1, ..., H_s} generated during the overproduction stage
(1) H* ← ∅ (the pruned ensemble pool is empty)
(2) for H_i in H_S = {H_1, ..., H_s} do
(3)   if sample X is in the safe region then
(4)     continue
(5)   else if sample X is in the indecision region then
(6)     if C_{H_i}(X) > 0 then
(7)       H* ← H* ∪ {H_i}
(8)     else if C_{H_i}(X) ≤ 0 then
(9)       H_i is pruned
(10)    end if
(11)  end if
(12) return H*

Algorithm 1: MCP algorithm.

Table 1: Benchmark datasets used in the experiments.

Data set | Examples | Attributes
Australian | 690 | 14
Breast cancer | 286 | 9
Liver | 345 | 6
Heart-statlog | 270 | 13
Waveform | 1000 | 20
Ionosphere | 351 | 34
Kr-vs-kp | 3196 | 96
Letter | 20000 | 16
German | 306 | 3
Diabetes | 76 | 4
Sonar | 208 | 60
Spambase | 4601 | 57
Tic-tac-toe | 958 | 9
Vehicle | 846 | 18
Vote | 435 | 16
Pima | 768 | 8

To measure the advantages of MCP, consider Figure 3: the dynamic ensemble pruning models (the META and KNORA models) with MCP are superior to those without MCP. In addition, all EPS models achieve better performance than full bagging. As an alternative technique for measuring performance, the F1 value is an effective metric that has been widely used [35] to measure the precision and sensitivity of results, and it is calculated as follows:

$$ F_1 = \frac{2TP}{2TP + FP + FN} \qquad (5) $$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89; for class 1, the F1 values are 0.88, 0.88, and 0.88, respectively. To explore the influence of MBPEP in dynamic ensemble models, the optimal overproduction pool size and learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, while META, KNORA, and RRC denote ensemble pruning models learned by Pareto optimization but without MCP. From the top panel of Figure 4, it can be seen that the META and KNORA-Eliminate models with MCP achieve superior learning performance to those without MCP. For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, it can be seen that META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP. For the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.
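The per-class F1 value of equation (5) can be computed as in the following sketch (the function name is illustrative):

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Equation (5): F1 = 2TP / (2TP + FP + FN) for one class,
    treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Calling it once with `positive=0` and once with `positive=1` reproduces the per-class figures quoted above.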

3.4. Comparison. To measure classification performance, the test error and average optimized ensemble size are calculated using MBPEP and compared with the other ensemble methods. The results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments; on the benchmark sets, each experiment is executed 30 times, and the average results are shown in Tables 2 and 3, in which the winners for MBPEP are bolded. The comparison methods are full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, Kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner; EA pruning is an optimization-based dynamic pruning model.

Figure 3: Accuracy of the dynamic ensemble pruning methods with and without MCP (META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, RRC, and full bagging).

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP (META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, and RRC): (a) accuracy; (b) optimal ensemble size.


In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of the 16 datasets (10/16). Meanwhile, full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test error on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP achieves better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins between MBPEP and the other algorithms is validated using a sign test: a win counts as 1 and a tie as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
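The direct-wins tally used in the sign test (win = 1, tie = 0.5) can be sketched as follows, assuming per-dataset test-error lists for the two methods being compared:

```python
def direct_wins(errors_a, errors_b):
    """Sign-test tally of method A over method B across datasets:
    a strictly lower error counts 1, an equal error counts 0.5."""
    score = 0.0
    for ea, eb in zip(errors_a, errors_b):
        if ea < eb:
            score += 1.0
        elif ea == eb:
            score += 0.5
    return score
```

For instance, one win, one tie, and one loss over three datasets gives a tally of 1.5.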

We also measure the optimal ensemble size in Table 3. It is readily seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the smallest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the smallest ensemble size on fewer than 19% (3/16). The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method on most datasets. The reason is that MBPEP can simultaneously minimize the classification error and the ensemble size, whereas competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) optimize only one objective, such as diversity, and neglect the others.

Table 2: Test error comparison (%).

Data set | Ours | Full bagging | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
Australian | 11.62 | 14.30 | 14.40 | 14.30 | 14.50 | 14.80 | 14.30
Breast cancer | 23.83 | 27.90 | 27.70 | 28.70 | 29.20 | 29.50 | 27.50
Liver | 25.73 | 32.70 | 32.00 | 32.60 | 30.60 | 33.70 | 31.70
Heart-statlog | 23.50 | 19.50 | 18.70 | 20.10 | 19.90 | 19.90 | 19.60
Waveform | 20.80 | 22.20 | 20.50 | 20.90 | 20.30 | 20.10 | 20.50
Ionosphere | 5.70 | 9.20 | 8.60 | 8.40 | 8.90 | 10.00 | 9.30
Kr-vs-kp | 0.90 | 1.50 | 1.00 | 1.00 | 1.10 | 1.10 | 1.20
Letter | 8.52 | 5.90 | 4.80 | 4.80 | 4.80 | 5.70 | 5.30
German | 22.54 | 25.30 | 24.10 | 24.10 | 23.90 | 24.00 | 24.30
Diabetes | 22.16 | 25.10 | 24.30 | 24.30 | 24.10 | 24.20 | 24.00
Sonar | 21.40 | 26.60 | 26.70 | 24.90 | 25.00 | 26.80 | 25.10
Spambase | 6.93 | 6.80 | 6.60 | 6.60 | 6.60 | 6.80 | 6.60
Tic-tac-toe | 15.67 | 16.40 | 13.50 | 13.20 | 13.20 | 14.50 | 13.80
Vehicle | 21.97 | 22.80 | 22.60 | 23.30 | 23.40 | 24.40 | 23.00
Vote | 4.40 | 4.70 | 4.40 | 4.10 | 4.30 | 4.50 | 4.50
Optdigits | 3.20 | 3.80 | 3.60 | 3.50 | 3.60 | 3.70 | 3.50
Count of wins | 10 | 2 | 3 | 4 | 3 | 1 | 1
Direct wins | — | 13 | 10.5 | 11 | 10 | 11 | 11

Table 3: Average optimized ensemble size.

Data set | Ours | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
Australian | 7.2 | 12.5 | 14.7 | 11.0 | 8.5 | 41.9
Breast cancer | 7.2 | 8.7 | 26.1 | 8.8 | 7.8 | 44.6
Liver | 7.4 | 13.9 | 24.7 | 15.3 | 17.7 | 42
Heart-statlog | 9.6 | 11.4 | 17.9 | 13.2 | 13.6 | 44.2
Waveform | 9 | 12.3 | 15.6 | 11.7 | 11.5 | 42.4
Ionosphere | 9.1 | 7.9 | 10.5 | 8.5 | 10.7 | 48.8
Kr-vs-kp | 4.3 | 5.8 | 10.6 | 9.6 | 7.2 | 45.9
Letter | 12.5 | 15.1 | 13.8 | 12.9 | 23.2 | 38.3
German | 7.0 | 9.4 | 13.6 | 11.5 | 19.7 | 41.2
Diabetes | 11.7 | 14.3 | 16.8 | 12.6 | 18.1 | 42.8
Sonar | 13.2 | 11.0 | 20.6 | 13.9 | 20.6 | 43.1
Spambase | 12.6 | 18.5 | 20.0 | 19.0 | 28.8 | 39.7
Tic-tac-toe | 13.0 | 16.1 | 17.4 | 15.4 | 28.0 | 39.8
Vehicle | 7.0 | 15.7 | 16.5 | 11.2 | 21.6 | 41.9
Vote | 3.5 | 3.2 | 5.1 | 5.4 | 6.0 | 47.8
Optdigits | 22.3 | 25.0 | 25.2 | 21.4 | 46.8 | 41.4
Count of wins | 13 | 3 | 0 | 0 | 0 | 0
Direct wins | — | 13 | 16 | 14 | 16 | 16


3.5. Robustness of Classification. The robustness of an algorithm is an important indicator of classifier performance, as it reflects the algorithm's fault tolerance. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise, where the noise level is given by the variance intensity of the Gaussian noise. In the experiment, the test errors of the ensemble classifiers are measured when the training sets are corrupted by noise of intensity 0.00, 0.02, 0.04, and 0.08. The results are measured on the 7 datasets listed in Table 4; the average test errors, each computed over 50 runs, are shown in Table 4. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured, and the winners for MBPEP are bolded in Table 4.

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.4% test error when no noise corrupts the training set of the sonar dataset, but the test error reaches 32.3% when the training set is corrupted by Gaussian noise of intensity 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the respective noise intensities. Specifically, when the variance intensity of the Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP is 20.80% on the waveform dataset when there is no noise, which is not the best among all comparison methods; but when the noise variance exceeds 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.
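The corruption step of this robustness test can be sketched as follows: zero-mean Gaussian noise of a given variance is added to every feature of every training instance (the function name and list-of-lists layout are assumptions of this sketch):

```python
import random

def corrupt_with_gaussian_noise(X, variance, seed=0):
    """Robustness-test sketch: return a copy of the training instances
    with zero-mean Gaussian noise of the given variance added to every
    feature value."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in X]
```

Running the full pipeline on `corrupt_with_gaussian_noise(X_train, v)` for v in {0.00, 0.02, 0.04, 0.08} reproduces the four noise settings of Table 4.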

3.6. Application to a Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise | Data set | Ours | Full bagging | RE pruning | Kappa pruning | CM pruning | MDM pruning | EA pruning
0.00 | Diabetes | 22.10 | 25.10 | 24.30 | 24.30 | 24.30 | 24.20 | 24.00
0.00 | German | 22.50 | 25.30 | 24.10 | 24.10 | 23.90 | 24.00 | 24.30
0.00 | Heart-statlog | 23.50 | 19.50 | 18.70 | 20.10 | 19.90 | 19.90 | 19.60
0.00 | Ionosphere | 5.70 | 9.20 | 8.60 | 8.40 | 8.90 | 10.00 | 9.30
0.00 | Sonar | 21.40 | 26.60 | 26.70 | 24.90 | 25.00 | 26.80 | 25.10
0.00 | Vehicle | 21.90 | 22.80 | 22.60 | 23.30 | 23.40 | 24.40 | 23.00
0.00 | Waveform | 20.80 | 22.20 | 20.50 | 20.90 | 20.30 | 20.10 | 20.50
0.02 | Diabetes | 24.20 | 26.60 | 26.60 | 26.30 | 26.70 | 26.40 | 27.00
0.02 | German | 24.40 | 27.70 | 27.70 | 28.10 | 27.00 | 27.10 | 27.60
0.02 | Heart-statlog | 24.30 | 23.20 | 22.70 | 21.60 | 22.00 | 22.20 | 23.10
0.02 | Ionosphere | 7.20 | 13.10 | 11.80 | 13.00 | 11.50 | 11.30 | 12.20
0.02 | Sonar | 23.40 | 28.10 | 24.00 | 26.20 | 24.60 | 23.50 | 25.30
0.02 | Vehicle | 25.80 | 33.00 | 29.90 | 29.90 | 29.60 | 30.10 | 29.80
0.02 | Waveform | 25.20 | 26.80 | 24.30 | 25.80 | 24.20 | 23.90 | 24.50
0.04 | Diabetes | 27.30 | 30.10 | 29.60 | 29.70 | 29.60 | 29.40 | 30.10
0.04 | German | 29.10 | 31.00 | 30.10 | 31.30 | 29.90 | 29.70 | 30.70
0.04 | Heart-statlog | 28.00 | 28.20 | 28.10 | 26.50 | 27.50 | 27.50 | 28.00
0.04 | Ionosphere | 11.40 | 17.70 | 15.80 | 18.30 | 15.70 | 15.90 | 17.50
0.04 | Sonar | 25.10 | 30.10 | 28.10 | 29.60 | 27.70 | 25.90 | 28.30
0.04 | Vehicle | 25.10 | 30.10 | 28.10 | 29.60 | 27.70 | 25.90 | 28.30
0.04 | Waveform | 27.50 | 30.10 | 28.10 | 29.20 | 27.90 | 27.60 | 28.60
0.08 | Diabetes | 27.50 | 30.10 | 28.10 | 29.20 | 27.90 | 27.60 | 28.60
0.08 | German | 33.60 | 36.20 | 36.10 | 36.40 | 35.90 | 36.10 | 38.20
0.08 | Heart-statlog | 32.40 | 33.40 | 32.60 | 32.90 | 33.20 | 33.30 | 34.40
0.08 | Ionosphere | 26.40 | 27.50 | 26.50 | 27.60 | 26.60 | 26.50 | 27.90
0.08 | Sonar | 32.30 | 36.10 | 36.50 | 38.40 | 35.70 | 36.40 | 36.50
0.08 | Vehicle | 38.70 | 43.00 | 40.60 | 42.00 | 40.50 | 40.20 | 41.60
0.08 | Waveform | 34.30 | 37.60 | 35.80 | 36.80 | 35.60 | 35.30 | 37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 and 10,000 grayscale images for training and testing, respectively. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9; Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble method on MNIST, it was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [1]. To validate the efficiency of MBPEP, the average max process of gc-Forest has been replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules of gc-Forest, i.e., multigrained scanning and the cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.

According to Figure 6 the overall learning procedure ofthe modied version of gc-Forest has the same constructionas the original version of gc-Foreste number of raw inputfeatures is scaled using the scanning method on the novelfeatures (printed in blue blocks) e scaling process isdetermined using the scaling window sizes e generatedfeatures are concatenated to the large feature vectors(printed in red blocks) that are prepared to enter the cascadeforest To encourage diversity and improve pruning e-ciency for the construction of the cascade forest each layerconsists of 100 random forests in this sectione number oflayers is self-adapted until the validation performance isbelow the error tolerance We introduce a metric namelythe improvement ratio of the test error in this sectionSpecically the original version of gc-Forest is used as abaseline e results are shown in Figure 7(a) In additionthe percentage of reduction for the number of optimizedlearners that has been compared with the original ensemble

Multi-grained scanningInputs

Random forest

Random forest

DynamicPruning

Output

Cascade forest

Figure 6 Modied version of gc-Forest

05

Impr

ovem

ent r

atio

of c

lass

ifica

tion

accu

racy 04

03

02

01

0

ndash01

ndash02

MBPEP

Complementarity pruning

Margin distance minimization pruning

Reduce-error pruning

EA

(a)

100

90

80

70

60

50

40

30

20

10

0

Perc

enta

ge o

f red

uctio

n

MBPEP

Complementarity pruning

Margin distance minimization pruning

Reduce-error pruning

EA

(b)

Figure 7 Performance of pruning ensemble algorithms with full bagging on digital character recognition (a) Improvement ratio per-formance of the classication performance (b) Reduction percentage of the ensemble size

Computational Intelligence and Neuroscience


4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of the learners to select the expert in an ensemble. In this paper, MBPEP is proposed to address both issues.

First, the Pareto optimization algorithm is used by MBPEP to globally search for a feasible ensemble pool size during the overproduction stage. The "domination" computation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensembles is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework to conduct handwritten digital character recognition tasks: the average max process of the original gc-Forest is replaced by MBPEP. This modified version of gc-Forest with MBPEP improves the classification accuracy while using fewer sublearners when combining the final hypotheses.
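The "domination" computation mentioned above can be sketched as follows. This is a minimal illustration of Pareto dominance over (validation error, ensemble size) pairs, not the paper's exact procedure; all names and numbers are illustrative:

```python
def dominates(a, b):
    """a = (error, size) dominates b when it is no worse in both
    objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

# Toy (validation error, ensemble size) pairs:
print(pareto_front([(0.12, 40), (0.10, 60), (0.12, 55), (0.15, 30)]))
# -> [(0.12, 40), (0.1, 60), (0.15, 30)]
```

The pair (0.12, 55) is removed because (0.12, 40) matches its error with a smaller ensemble; the remaining pairs are mutually non-dominated trade-offs.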

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning constructions has become an advanced research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (grant number 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.


[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semisupervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3–4, 2005.

[34] G. Martínez-Muñoz and A. Suárez, "Aggregation ordering in bagging," in Proceedings of the 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.




space. Different from other publications, previous studies [16–20] propose using pruning criteria to estimate the competence of each learner for a query sample x_query without considering the case in which this query sample is located in the indecision boundaries. In this paper, the local information between the base learners in the ensemble pool is measured by the MCP mechanism to compute the information quantity when query sample x_query is located in the indecision boundaries of the learners. MCP is added during the pruning stage as a preselector that is executed in parallel with the classical pruning criteria [16–20]. As a subset of the region of competence, the indecision region of the ensemble pool is defined and explored to estimate the local competence of each sublearner in the ensembles. A schematic diagram of the region of competence for four ensemble pool size conditions is shown in Figure 2.

The safe region and the indecision region are defined and shown in Figures 2(a) and 2(b) and Figures 2(c) and 2(d), respectively. As shown in Figure 2, the two kinds of region of competence, in which the query sample (yellow triangle in Figure 2) is located in the safe or the indecision region, are determined according to whether the ensembles can correctly recognize the neighbors of the query sample. If the ensembles make the same suggestions for these neighbors, the region of competence is a safe region; when the ensembles make different suggestions for these neighbors, the region of competence is an indecision region. According to Figures 2(b)–2(d), it can be seen that the ensembles can reach 100% (7/7), 71.4% (5/7), and 57.1% (4/7) agreement, respectively. In addition, MCP focuses on the indecision region, and the local competence of each learner that can distinguish the samples with different classes in the indecision region is estimated.
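The safe/indecision distinction above can be sketched in code. This is a minimal illustration assuming scikit-learn-style learners with a `predict` method; the `Always` helper and the toy neighbors are purely illustrative:

```python
import numpy as np

def region_type(neighbor_X, ensemble):
    """Decide whether a query sample lies in the safe region or the
    indecision region: the region is safe only when all ensemble members
    make the same suggestion for every neighbor of the query sample."""
    # votes[i, j] = label suggested by learner i for neighbor j
    votes = np.array([clf.predict(neighbor_X) for clf in ensemble])
    unanimous = (votes == votes[0]).all()
    return "safe" if unanimous else "indecision"

class Always:
    """Stand-in learner with a scikit-learn-style predict() (illustrative)."""
    def __init__(self, label):
        self.label = label
    def predict(self, X):
        return np.full(len(X), self.label)

X_neigh = np.zeros((7, 2))  # seven neighbors, as in Figure 2
print(region_type(X_neigh, [Always(0), Always(0)]))  # -> safe
print(region_type(X_neigh, [Always(0), Always(1)]))  # -> indecision
```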

Suppose that there are T_S learners H^S = {H_1, ..., H_s} (see Section 2.2) selected to compose the ensemble pool after being globally explored using the Pareto optimization algorithm during the overproduction stage. Suppose that query sample x, with N neighbors in the region of competence, receives various suggestions from the ensembles. N_x^1 denotes the number of votes for the most popular label voted for by the ensembles H^S, and N_x^2 represents the number of votes for the second most popular label voted for by H^*. The marginal criterion is used by MCP to measure the amount of information for the ensembles in the indecision region. We define the difference between the labels with the most and second most votes as the marginal information for the neighbors of query sample x as follows:

margin(x) = (N_x^1 - N_x^2) / T_S.  (3)

Entropy is used to compute the amount of information in the machine learning field [29]. To measure the margins of the final hypotheses, the margin entropy criterion is defined to calculate the amount of information of samples X for H_i in H^S = {H_1, ..., H_s}:

C_{H_i}(X) = -(1/N) ∑_{j=1}^{N} log[margin(x_j)],  (4)

where N denotes the number of neighbors around query samples X.

Figure 1: Architecture of MBPEP, which consists of four stages: overproduction, definition of the region of competence, pruning, and integration.

If a greater difference is calculated by equation (3) between the most and second most popular labels, then it can be inferred that these ensembles cannot detect samples that belong to different classes in the region of competence. In addition, in MBPEP, N_x^1 - N_x^2 is always less than T_S, and hence the maximum value of margin(x) is less than 1. According to equation (4), if the calculated margin entropy C_{H_i}(X) of validation sample X is more than 0, then the learner can be used to compose the expert; otherwise, when C_{H_i}(X) <= 0, the learner is pruned. This process is called MCP. MCP performs a greedy search to define a subset of the ensemble pool, acting as a preselector that determines the learners able to recognize the different targets in the indecision region. The MCP algorithm is shown in Algorithm 1.

3. Results and Discussion

3.1. Configuration. Sixteen UCI datasets [30] are used as benchmarks in this paper. The characteristics of these datasets are shown in Table 1, and each of them is split into three parts: a training set, a validation set, and a test set. Classification and regression trees (CARTs) are used as base learners. First, the bagging method is applied to split the training datasets: the training sets are divided into subsets, each subset is selected using bootstrap extraction, and the subclassifiers are trained on these samples. The number of ensembles is preset to 100. Second, the MBPEP method is applied on the validation set to prune the ensemble pool, and the optimized subclassifiers are aggregated into the final hypotheses using the majority voting method [1]. Finally, the test set is used to measure the performance of the methods.
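The overproduction stage described above (bootstrap extraction plus majority voting) can be sketched in pure NumPy. For brevity, this sketch uses one-level decision stumps and 25 learners instead of the paper's 100 CARTs; the toy data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Train a one-level tree: the feature/threshold pair with the lowest
    training error (a stand-in for a full CART base learner)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            err = np.mean((X[:, j] > t).astype(int) != y)
            if best is None or err < best[0]:
                best = (err, j, t)
    return best[1], best[2]

def bagging(X, y, n_estimators=25):
    """Overproduction: each learner is trained on a bootstrap sample."""
    pool = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), len(X))  # bootstrap extraction
        pool.append(fit_stump(X[idx], y[idx]))
    return pool

def majority_vote(pool, X):
    """Aggregate the sublearners into the final hypothesis by voting."""
    votes = np.array([(X[:, j] > t).astype(int) for j, t in pool])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Toy data: the class is determined by the first feature.
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
pool = bagging(X, y)
print((majority_vote(pool, X) == y).mean())  # training accuracy (close to 1.0)
```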

3.2. Pruning Metrics. The bagging ensemble is a common architecture that trains learners on different bootstrapped samples. In this paper, the full bagging method, in which all base learners are selected to construct the ensemble

Figure 2: The region of competence during the pruning stage. The query sample is located in (a, b) the safe region and (c, d) the indecision region.


classifiers is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], Kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [26]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that intends to select the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the subclassifiers and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use the backfitting search method to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the Kappa error as a statistical method that measures the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners; the classifiers are iteratively added to the ensemble in order of lowest κ statistics.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that focuses on finding the learners most complementary to the ensemble in each iteration and incorporates them into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the fewest votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool on the validation set using the corresponding probability outputs and several random variables with a beta probability distribution.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.

3.3. Characteristics of MBPEP. To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes; notably, the full bagging method serves as the baseline without pruning. The two contributions of MBPEP (Pareto optimization and MCP) are applied to the overproduction and pruning stages, respectively, and hence two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

To measure the advantages of MCP, from Figure 3 we can see that the performances of the dynamic ensemble pruning models (the META and KNORA models) based on MCP are superior to those without MCP. In addition, all

Require: the ensemble pool H^S = {H_1, ..., H_s} generated during the overproduction stage
(1) H^* <- {} (the pruned ensemble pool is empty)
(2) for H_i in H^S = {H_1, ..., H_s} do
(3)   if sample X is in the safe region
(4)     continue
(5)   else if sample X is in the indecision region
(6)     if C_{H_i}(X) > 0
(7)       H^* <- H^* ∪ {H_i}
(8)     else if C_{H_i}(X) <= 0
(9)       H_i is pruned
(10)    end if
(11)  end if
(12) return H^*

Algorithm 1: MCP algorithm.
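Equations (3) and (4) and the Algorithm 1 decision rule can be sketched as follows; the function names, the `entropy_of` callback, and the toy vote counts are illustrative, not the paper's implementation:

```python
import math

def margin(n1, n2, t_s):
    """Equation (3): margin(x) = (N1_x - N2_x) / T_S, the normalized gap
    between the vote counts of the two most popular labels."""
    return (n1 - n2) / t_s

def margin_entropy(margins):
    """Equation (4): C_Hi(X) = -(1/N) * sum_j log(margin(x_j)), summed
    over the N neighbors of the query sample."""
    return -sum(math.log(m) for m in margins) / len(margins)

def mcp_select(pool, entropy_of):
    """Algorithm 1 decision rule: keep a learner when its margin entropy
    on the indecision-region samples is positive, prune it otherwise."""
    return [h for h in pool if entropy_of(h) > 0]

# Toy usage: 10 learners, five neighbors where the top label gets 9 votes
# and the runner-up gets 1, so each margin is 0.8.
m = [margin(9, 1, 10)] * 5
print(round(margin_entropy(m), 4))  # -> 0.2231
```

Note that with margins in (0, 1], the entropy is never negative; it reaches 0 exactly when every neighbor is classified unanimously, which is the pruning case of lines (8)–(9).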

Table 1: Benchmark datasets used in the experiments.

Data set        Examples   Attributes
Australian      690        14
Breast cancer   286        9
Liver           345        6
Heart-statlog   270        13
Waveform        1000       20
Ionosphere      351        34
Kr-vs-kp        3196       96
Letter          20000      16
German          306        3
Diabetes        76         4
Sonar           208        60
Spambase        4601       57
Tic-tac-toe     958        9
Vehicle         846        18
Vote            435        16
Pima            768        8


EPS models achieve better performance than full bagging. As an alternative technique for measuring performance, the F1 value is an effective metric that has been widely used [35] to measure the precision and sensitivity of results, and it is calculated as follows:

F1 = 2TP / (2TP + FP + FN),  (5)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89; for class 1, the F1 values are 0.88, 0.88, and 0.88, respectively. To explore the influence of MBPEP in dynamic ensemble models, the optimal overproduction pool size and the learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, whereas META, KNORA, and RRC denote ensemble pruning models learned with Pareto optimization but without MCP. From the top panel of Figure 4, it can be seen that the META and KNORA-Eliminate models with MCP achieve learning performance superior to those without MCP. For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, it can be seen that META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP; for the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.
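Equation (5) is straightforward to compute from confusion-matrix counts; the counts below are illustrative, not the paper's:

```python
def f1_score_from_counts(tp, fp, fn):
    """Equation (5): F1 = 2*TP / (2*TP + FP + FN), i.e., the harmonic
    mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Toy counts for one class:
print(f1_score_from_counts(tp=80, fp=10, fn=10))  # -> 0.888...
```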

3.4. Comparison. To measure the classification performance, the test error and average optimized ensemble size are calculated for MBPEP and compared with the other ensemble methods; the results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments, so for the benchmark sets each experiment is executed 30 times and the average results are reported in Tables 2 and 3, with the winners marked for MBPEP. The comparison methods are full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, Kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner, while EA pruning is based on the optimized dynamic pruning model.

Figure 3: Performance (accuracy) of the dynamic ensemble pruning methods with and without MCP.

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP. (a) Accuracy. (b) Optimal ensemble size.


In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of the 16 datasets (10/16). Meanwhile, full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test errors on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP has been demonstrated to achieve better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins between MBPEP and the other algorithms is validated using a sign test; notably, a win is counted as 1 and a tie as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.

We also measure the optimal ensemble size in Table 3. It can easily be seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while the other methods achieve the lowest ensemble size on less than 19% (3/16) of the datasets. The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method on most datasets. The reason for this result is that MBPEP can simultaneously minimize the classification error and the ensemble size, whereas the competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) focus on optimizing only one objective, such as diversity, and neglect the others.
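The direct-win counting used in the sign test above (a win counts 1, a tie counts 0.5) can be sketched as follows; the toy error values are illustrative:

```python
def direct_wins(ours, other):
    """Count direct wins of `ours` over `other` across datasets: a lower
    test error counts as a win (1 point), an equal error as a tie (0.5)."""
    score = 0.0
    for a, b in zip(ours, other):
        if a < b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score

# Illustrative test errors over four datasets:
print(direct_wins([11.6, 23.8, 25.7, 6.9], [14.3, 27.9, 25.7, 6.6]))  # -> 2.5
```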

Table 2: Test error comparison (%).

Data sets       Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian      11.62   14.30         14.40       14.30          14.50       14.80        14.30
Breast cancer   23.83   27.90         27.70       28.70          29.20       29.50        27.50
Liver           25.73   32.70         32.00       32.60          30.60       33.70        31.70
Heart-statlog   23.50   19.50         18.70       20.10          19.90       19.90        19.60
Waveform        20.80   22.20         20.50       20.90          20.30       20.10        20.50
Ionosphere      5.70    9.20          8.60        8.40           8.90        10.00        9.30
Kr-vs-kp        0.90    1.50          1.00        1.00           1.10        1.10         1.20
Letter          8.52    5.90          4.80        4.80           4.80        5.70         5.30
German          22.54   25.30         24.10       24.10          23.90       24.00        24.30
Diabetes        22.16   25.10         24.30       24.30          24.10       24.20        24.00
Sonar           21.40   26.60         26.70       24.90          25.00       26.80        25.10
Spambase        6.93    6.80          6.60        6.60           6.60        6.80         6.60
Tic-tac-toe     15.67   16.40         13.50       13.20          13.20       14.50        13.80
Vehicle         21.97   22.80         22.60       23.30          23.40       24.40        23.00
Vote            4.40    4.70          4.40        4.10           4.30        4.50         4.50
Optdigits       3.20    3.80          3.60        3.50           3.60        3.70         3.50
Count of wins   10      2             3           4              3           1            1
Direct wins     -       13            10.5        11             10          11           11

Table 3: Average optimized ensemble size.

Data sets       Ours   RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian      7.2    12.5        14.7           11.0        8.5          41.9
Breast-cancer   7.2    8.7         26.1           8.8         7.8          44.6
Liver           7.4    13.9        24.7           15.3        17.7         42.0
Heart-statlog   9.6    11.4        17.9           13.2        13.6         44.2
Waveform        9.0    12.3        15.6           11.7        11.5         42.4
Ionosphere      9.1    7.9         10.5           8.5         10.7         48.8
Kr-vs-kp        4.3    5.8         10.6           9.6         7.2          45.9
Letter          12.5   15.1        13.8           12.9        23.2         38.3
German          7.0    9.4         13.6           11.5        19.7         41.2
Diabetes        11.7   14.3        16.8           12.6        18.1         42.8
Sonar           13.2   11.0        20.6           13.9        20.6         43.1
Spambase        12.6   18.5        20.0           19.0        28.8         39.7
Tic-tac-toe     13.0   16.1        17.4           15.4        28.0         39.8
Vehicle         7.0    15.7        16.5           11.2        21.6         41.9
Vote            3.5    3.2         5.1            5.4         6.0          47.8
Optdigits       22.3   25.0        25.2           21.4        46.8         41.4
Count of wins   13     3           0              0           0            0
Direct wins     -      13          16             14          16           16


3.5. Robustness of Classification. The robustness of the algorithms is an important indicator for measuring classifier performance, as it reflects the fault tolerance of the algorithms. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise; the noise levels are determined by the variance intensity of the Gaussian noise. In the experiment, the test errors of the ensemble classifiers are measured when the training sets are corrupted by 0.00, 0.02, 0.04, and 0.08 noise. The results are measured on the 7 datasets described in Table 4. The average test errors are shown in Table 4, each calculated over 50 runs. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured, with the winners marked for MBPEP in Table 4.

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.4% test error when there is no noise corrupting the training set of the sonar dataset, but the test error reaches 32.3% when the training set is corrupted by 0.08 Gaussian noise. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the respective noise intensities. Specifically, when the variance intensity of the Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP reaches 20.80% on the waveform dataset when there is no noise corrupting the training set, which is not the best classification performance among the comparison methods; when the variance of the noise is larger than 0.04, however, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with the noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.
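The noise-corruption setup above can be sketched as follows; the data shapes are illustrative, and `np.sqrt` converts the stated variance intensity into the standard deviation expected by the normal sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(X, variance):
    """Corrupt a training set with zero-mean Gaussian noise of the given
    variance intensity (0.00, 0.02, 0.04, 0.08 in the experiments above)."""
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)

X_train = rng.normal(size=(100, 5))  # illustrative training features
for v in (0.00, 0.02, 0.04, 0.08):
    X_noisy = corrupt(X_train, v)
    # retrain the ensemble on X_noisy and record its test error here
```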

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Data sets      Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
0.00   Diabetes       22.10   25.10         24.30       24.30          24.30       24.20        24.00
       German         22.50   25.30         24.10       24.10          23.90       24.00        24.30
       Heart-statlog  23.50   19.50         18.70       20.10          19.90       19.90        19.60
       Ionosphere     5.70    9.20          8.60        8.40           8.90        10.00        9.30
       Sonar          21.40   26.60         26.70       24.90          25.00       26.80        25.10
       Vehicle        21.90   22.80         22.60       23.30          23.40       24.40        23.00
       Waveform       20.80   22.20         20.50       20.90          20.30       20.10        20.50
0.02   Diabetes       24.20   26.60         26.60       26.30          26.70       26.40        27.00
       German         24.40   27.70         27.70       28.10          27.00       27.10        27.60
       Heart-statlog  24.30   23.20         22.70       21.60          22.00       22.20        23.10
       Ionosphere     7.20    13.10         11.80       13.00          11.50       11.30        12.20
       Sonar          23.40   28.10         24.00       26.20          24.60       23.50        25.30
       Vehicle        25.80   33.00         29.90       29.90          29.60       30.10        29.80
       Waveform       25.20   26.80         24.30       25.80          24.20       23.90        24.50
0.04   Diabetes       27.30   30.10         29.60       29.70          29.60       29.40        30.10
       German         29.10   31.00         30.10       31.30          29.90       29.70        30.70
       Heart-statlog  28.00   28.20         28.10       26.50          27.50       27.50        28.00
       Ionosphere     11.40   17.70         15.80       18.30          15.70       15.90        17.50
       Sonar          25.10   30.10         28.10       29.60          27.70       25.90        28.30
       Vehicle        25.10   30.10         28.10       29.60          27.70       25.90        28.30
       Waveform       27.50   30.10         28.10       29.20          27.90       27.60        28.60
0.08   Diabetes       27.50   30.10         28.10       29.20          27.90       27.60        28.60
       German         33.60   36.20         36.10       36.40          35.90       36.10        38.20
       Heart-statlog  32.40   33.40         32.60       32.90          33.20       33.30        34.40
       Ionosphere     26.40   27.50         26.50       27.60          26.60       26.50        27.90
       Sonar          32.30   36.10         36.50       38.40          35.70       36.40        36.50
       Vehicle        38.70   43.00         40.60       42.00          40.50       40.20        41.60
       Waveform       34.30   37.60         35.80       36.80          35.60       35.30        37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 and 10,000 grayscale images for training and testing, respectively. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9; Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble methods on MNIST, each was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework multi-grained cascade forest (gc-Forest) [1]. To validate the efficiency of MBPEP, the average max process of gc-Forest has been replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules of gc-Forest, i.e., multi-grained scanning and the cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The raw input features are scanned into novel features (printed in blue blocks), with the scanning process determined by the scaling window sizes. The generated features are concatenated into large feature vectors (printed in red blocks) that then enter the cascade forest. To encourage diversity and improve pruning efficiency in the construction of the cascade forest, each layer consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance falls below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error, in this section. Specifically, the original version of gc-Forest is used as a baseline. The results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble size is shown in Figure 7(b).

Figure 6: Modified version of gc-Forest: inputs pass through multi-grained scanning, then through a cascade forest of random forests with dynamic pruning, to the output.

Figure 7: Performance of the pruning ensemble algorithms with full bagging on digital character recognition. (a) Improvement ratio of the classification accuracy (y-axis from -0.2 to 0.5). (b) Percentage of reduction of the ensemble size (y-axis from 0 to 100). Methods: MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA.


The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), it can be seen that not all ensemble methods improve the classification accuracy when compared with the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over the original version, while the other methods improve it by 0.2%. From Figure 7(b), the results show that MBPEP stores fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining these sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, offering less storage and higher computational efficiency than the other pruning methods.

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of learners to select the expert in an ensemble. In this paper, MBPEP is proposed.

First, the Pareto optimization algorithm is used by MBPEP to globally search for the feasible ensemble pool size during the overproduction stage. The "domination" computation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensembles is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework to conduct handwritten digital character recognition tasks, with the average max process of the original gc-Forest replaced by MBPEP. This modified version of gc-Forest improves the classification accuracy while using fewer sublearners when combining the final hypotheses.

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning construction has become an advanced research direction in machine learning. For example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (Grant number 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.
[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.
[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.
[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.
[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.
[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.
[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.
[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.
[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.
[10] Y. Lecun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.
[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.
[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.
[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.
[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.
[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.
[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.
[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.
[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.
[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.
[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.
[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.
[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.
[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.
[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.
[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.
[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.
[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.
[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.
[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.
[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.
[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.
[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3-4, 2005.
[34] G. Martinez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.
[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.
[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.
[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.




can be inferred that these ensembles cannot detect samples that belong to different classes in the region of competence. In addition, in MBPEP, N1_x − N2_x is always less than TS, and hence the maximum value of margin(x) is less than 1. According to equation (4), if the calculated margin entropy CH_i(X) of validation sample X is more than 0, then the learner can be used to compose the expert; otherwise, when CH_i(X) ≤ 0, the learner needs to be pruned. This process is called MCP. MCP performs a greedy search to define a subset of the ensemble pool that can be used as a preselector to determine suitable learners that can recognize the different targets in the indecision region. The MCP algorithm is shown in Algorithm 1.
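Under this reading of the margin, N1_x and N2_x are the vote counts of the two most-voted classes over the TS ensemble members; the sketch below computes margin(x) on that assumption (the normalization by TS is inferred from the stated bound margin(x) < 1, not quoted from the paper).

```python
from collections import Counter

def ensemble_margin(votes, ts):
    """margin(x) = (N1_x - N2_x) / TS, where N1_x and N2_x are the vote
    counts of the two most-voted classes among the TS ensemble members."""
    top_two = Counter(votes).most_common(2)
    n1 = top_two[0][1]
    n2 = top_two[1][1] if len(top_two) > 1 else 0
    return (n1 - n2) / ts
```

For example, with 5 learners voting [0, 0, 0, 1, 1], the margin is (3 − 2)/5 = 0.2, a low-confidence sample that would fall in the indecision region.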

3. Results and Discussion

3.1. Configuration. Sixteen UCI datasets [30] are used as benchmarks in this paper. The characteristics of these datasets are shown in Table 1, and each of them is split into three parts: a training set, a validation set, and a test set. Classification and regression trees (CARTs) are used as base learners. First, the bagging method is applied to the training datasets: the training sets are divided into subsets, and each subset is selected using bootstrap extraction. The subclassifiers are trained using the training set, and the number of ensembles is preset to 100. Second, the MBPEP method is applied to prune the ensemble pool on the validation set. The optimized subclassifiers are aggregated into the final hypotheses using the majority voting method [1]. Finally, the test set is used to measure the performance of the methods.
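The overproduction setup above can be sketched with the standard library alone; this is a minimal illustration in which `train_one` stands in for fitting a CART (or any base learner) and returns a callable predictor.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Bootstrap extraction: draw len(data) examples with replacement.
    return [rng.choice(data) for _ in data]

def bagging_train(data, train_one, n_ensembles=100, seed=0):
    """Train n_ensembles base learners (CARTs in the paper) on bootstrap
    replicates of the training set; train_one(sample) returns a fitted
    predictor, i.e., a callable mapping an input x to a class label."""
    rng = random.Random(seed)
    return [train_one(bootstrap_sample(data, rng))
            for _ in range(n_ensembles)]

def majority_vote(pool, x):
    # Final hypotheses: plurality vote over the (possibly pruned) pool.
    return Counter(h(x) for h in pool).most_common(1)[0][0]
```

The pruning stage then only has to shrink `pool` before `majority_vote` is called on the test samples.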

3.2. Pruning Metrics. The bagging ensemble is a common architecture that trains learners on different bootstrapped samples. In this paper, the full bagging method, in which all base learners are selected to construct the ensemble

Figure 2: The region of competence during the pruning stage. The query sample is located in (a, b) the safe region and (c, d) the indecision region.


classifiers, is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with those of the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], Kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [26]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that intends to select the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the subclassifiers and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use the backfitting search method to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the Kappa error as a statistical method that measures the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners. The classifiers with the lowest κ statistics are iteratively added to the ensemble.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that focuses on finding the learners most complementary to the ensemble in each iteration and incorporates them into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the fewest votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool on the validation set using the corresponding probability outputs and several random variables with a beta probability distribution.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.
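As an illustration of the competence ordering family, the greedy loop behind RE pruning can be sketched as follows; `error_of` is a caller-supplied validation-error estimate, and this is our sketch of the idea rather than the authors' implementation.

```python
def reduce_error_prune(pool, error_of, max_size=None):
    """Greedy sketch of reduced error pruning: repeatedly add the learner
    whose inclusion gives the lowest ensemble error on the pruning set,
    and return the grown subensemble with the lowest observed error."""
    selected, remaining = [], list(pool)
    best_subset, best_err = [], float("inf")
    while remaining and (max_size is None or len(selected) < max_size):
        # Pick the learner that most reduces the current ensemble error.
        nxt = min(remaining, key=lambda h: error_of(selected + [h]))
        selected.append(nxt)
        remaining.remove(nxt)
        err = error_of(selected)
        if err < best_err:
            best_subset, best_err = list(selected), err
    return best_subset
```

Kappa and CM pruning follow the same ordered-growth skeleton with a different selection key (pairwise κ statistics and complementarity, respectively).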

3.3. Characteristics of MBPEP. To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes; notably, the full bagging method serves as the baseline without pruning. The two contributions of MBPEP (Pareto optimization and MCP) apply to the overproduction and pruning stages, respectively, and hence two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

To measure the advantages of MCP, from Figure 3 we can see that the performances of the dynamic ensemble pruning models (the META and KNORA models) based on MCP are superior to those without MCP. In addition, all

Require: the ensemble pool HS = {H1, ..., Hs} generated during the overproduction stage
(1) H* = ∅ (the pruned ensemble pool is empty)
(2) for Hi in HS = {H1, ..., Hs} do
(3)   if sample X is in the safe region
(4)     continue
(5)   else if sample X is in the indecision region
(6)     if CH_i(X) > 0
(7)       H* = H* ∪ {Hi}
(8)     else if CH_i(X) ≤ 0
(9)       Hi is pruned
(10)    end if
(11)  end if
(12) return H*

Algorithm 1: MCP algorithm.
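Algorithm 1 transcribes directly into a few lines; in this sketch the margin entropy CH_i of equation (4) is passed in as a callable, since its closed form is defined earlier in the paper and not reproduced here.

```python
def mcp_prune(pool, x, in_indecision_region, ch):
    """Margin criterion pruning (MCP) for one query sample x.

    pool: the ensemble pool H1..Hs from the overproduction stage.
    in_indecision_region: callable(x) -> bool, region-of-competence test.
    ch: callable(h, x) -> float, the margin entropy CH_i(X).
    """
    if not in_indecision_region(x):
        # Safe region: Algorithm 1 makes no pruning decision here.
        return list(pool)
    # Indecision region: keep only learners with positive margin entropy.
    return [h for h in pool if ch(h, x) > 0]
```

The returned subset H* is the expert that votes on the query sample.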

Table 1: Benchmark datasets used in the experiments.

Data set       Examples   Attributes
Australian     690        14
Breast cancer  286        9
Liver          345        6
Heart-statlog  270        13
Waveform       1000       20
Ionosphere     351        34
Kr-vs-kp       3196       96
Letter         20000      16
German         306        3
Diabetes       76         4
Sonar          208        60
Spambase       4601       57
Tic-tac-toe    958        9
Vehicle        846        18
Vote           435        16
Pima           768        8


EPS models achieve better performance than full bagging. As an alternative technique for measuring performance, the F1 value is an effective metric that has been widely used [35] to measure the precision and sensitivity of results, and it is calculated as follows:

F1 = 2TP / (2TP + FP + FN),  (5)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89, respectively; for class 1, the F1 values are 0.88, 0.88, and 0.88. To explore the influence of MBPEP in dynamic ensemble models, the optimal overproduction pool size and the learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, whereas META, KNORA, and RRC are learned by Pareto optimization but not MCP. From the top panel of Figure 4, it can be seen that the META and KNORA-Eliminate models with MCP achieve superior learning performance to those without MCP. For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, it can be seen that META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP. For the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.
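The F1 values quoted above follow equation (5) directly:

```python
def f1_score(tp, fp, fn):
    # Equation (5): F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn)
```

For instance, 88 true positives with 11 false positives and 11 false negatives give F1 = 176/198 ≈ 0.89.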

3.4. Comparison. To measure the classification performance, the test error and the average optimized ensemble size are calculated for MBPEP and compared with those of the other ensemble methods. The results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments; on the benchmark sets, each result is executed 30 times, and the average results are shown in Tables 2 and 3, with the wins for MBPEP marked. The comparison methods are full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, Kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner, while EA pruning is based on the optimized dynamic pruning model.

Figure 3: Accuracy (0.85 to 0.90) of the dynamic ensemble pruning methods with and without MCP: META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, RRC, and full bagging.

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP: META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, and RRC. (a) Accuracy (0.90 to 0.92). (b) Optimal ensemble size (20 to 30).


In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of 16 datasets (10/16), while full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test error on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP achieves better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the number of direct wins is validated between MBPEP and the other algorithms using a sign test. Notably, for each dataset, a win is counted as 1 and a tie as 0.5 when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
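The direct-wins count feeding the sign test (1 per win, 0.5 per tie, with the lower test error winning) can be sketched as:

```python
def direct_wins(ours, other):
    """Count direct wins of `ours` against `other` across datasets:
    a win (strictly lower test error) counts 1 and a tie counts 0.5."""
    score = 0.0
    for a, b in zip(ours, other):
        if a < b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score
```

Applied to the per-dataset test errors in Table 2, this yields the 13, 10.5, 11, ... values reported above.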

We also measure the optimal ensemble size in Table 3. It can easily be seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the lowest ensemble size on fewer than 19% (3/16). The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method on most datasets. The reason is that MBPEP can simultaneously minimize the classification error and the ensemble size, whereas competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) optimize only one objective, such as diversity, and neglect the others.

Table 2: Test error comparison (%).

Data sets      Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian     11.62   14.30   14.40   14.30   14.50   14.80   14.30
Breast cancer  23.83   27.90   27.70   28.70   29.20   29.50   27.50
Liver          25.73   32.70   32.00   32.60   30.60   33.70   31.70
Heart-statlog  23.50   19.50   18.70   20.10   19.90   19.90   19.60
Waveform       20.80   22.20   20.50   20.90   20.30   20.10   20.50
Ionosphere     5.70    9.20    8.60    8.40    8.90    10.00   9.30
Kr-vs-kp       0.90    1.50    1.00    1.00    1.10    1.10    1.20
Letter         8.52    5.90    4.80    4.80    4.80    5.70    5.30
German         22.54   25.30   24.10   24.10   23.90   24.00   24.30
Diabetes       22.16   25.10   24.30   24.30   24.10   24.20   24.00
Sonar          21.40   26.60   26.70   24.90   25.00   26.80   25.10
Spambase       6.93    6.80    6.60    6.60    6.60    6.80    6.60
Tic-tac-toe    15.67   16.40   13.50   13.20   13.20   14.50   13.80
Vehicle        21.97   22.80   22.60   23.30   23.40   24.40   23.00
Vote           4.40    4.70    4.40    4.10    4.30    4.50    4.50
Optdigits      3.20    3.80    3.60    3.50    3.60    3.70    3.50
Count of wins  10      2       3       4       3       1       1
Direct wins    —       13      10.5    11      10      11      11

Table 3: Average optimized ensemble size.

Data sets      Ours   RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian     7.2    12.5   14.7   11.0   8.5    41.9
Breast-cancer  7.2    8.7    26.1   8.8    7.8    44.6
Liver          7.4    13.9   24.7   15.3   17.7   42.0
Heart-statlog  9.6    11.4   17.9   13.2   13.6   44.2
Waveform       9.0    12.3   15.6   11.7   11.5   42.4
Ionosphere     9.1    7.9    10.5   8.5    10.7   48.8
Kr-vs-kp       4.3    5.8    10.6   9.6    7.2    45.9
Letter         12.5   15.1   13.8   12.9   23.2   38.3
German         7.0    9.4    13.6   11.5   19.7   41.2
Diabetes       11.7   14.3   16.8   12.6   18.1   42.8
Sonar          13.2   11.0   20.6   13.9   20.6   43.1
Spambase       12.6   18.5   20.0   19.0   28.8   39.7
Tic-tac-toe    13.0   16.1   17.4   15.4   28.0   39.8
Vehicle        7.0    15.7   16.5   11.2   21.6   41.9
Vote           3.5    3.2    5.1    5.4    6.0    47.8
Optdigits      22.3   25.0   25.2   21.4   46.8   41.4
Count of wins  13     3      0      0      0      0
Direct wins    —      13     16     14     16     16


3.5. Robustness of Classification. The robustness of an algorithm is an important indicator of classifier performance, as it reflects the fault tolerance of the algorithm. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise, where the noise level is determined by the variance intensity of the Gaussian noise. When the experiment is executed, the test errors of the ensemble classifiers are measured with the training sets corrupted by 0.00, 0.02, 0.04, and 0.08 noise. The results are measured on the 7 datasets described in Table 4. The average test errors are shown in Table 4, each calculated over 50 runs. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured, with the wins for MBPEP marked in Table 4.
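The corruption step can be sketched as follows; the paper does not give implementation details, so this is a minimal stdlib sketch assuming feature-wise additive noise.

```python
import random

def corrupt(features, variance, seed=0):
    """Add zero-mean Gaussian noise to every training feature; the
    variance (0.00, 0.02, 0.04, or 0.08 in the experiments) sets the
    noise intensity."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in features]
```

With variance 0.00 the training set is returned unchanged, matching the noise-free baseline rows of Table 4.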

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.4% test error when there is no noise corrupting the training set of the sonar dataset, but the test error reaches 32.3% when the training set is corrupted by 0.08 Gaussian noise. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the different noise intensities. Specifically, when the variance intensity of the Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP reaches 20.80% on the waveform dataset when there is no noise corrupting the training set; this is not the best classification performance among all of the comparison methods, but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with the noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Data sets      Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
0.00   Diabetes       22.10   25.10   24.30   24.30   24.30   24.20   24.00
       German         22.50   25.30   24.10   24.10   23.90   24.00   24.30
       Heart-statlog  23.50   19.50   18.70   20.10   19.90   19.90   19.60
       Ionosphere     5.70    9.20    8.60    8.40    8.90    10.00   9.30
       Sonar          21.40   26.60   26.70   24.90   25.00   26.80   25.10
       Vehicle        21.90   22.80   22.60   23.30   23.40   24.40   23.00
       Waveform       20.80   22.20   20.50   20.90   20.30   20.10   20.50
0.02   Diabetes       24.20   26.60   26.60   26.30   26.70   26.40   27.00
       German         24.40   27.70   27.70   28.10   27.00   27.10   27.60
       Heart-statlog  24.30   23.20   22.70   21.60   22.00   22.20   23.10
       Ionosphere     7.20    13.10   11.80   13.00   11.50   11.30   12.20
       Sonar          23.40   28.10   24.00   26.20   24.60   23.50   25.30
       Vehicle        25.80   33.00   29.90   29.90   29.60   30.10   29.80
       Waveform       25.20   26.80   24.30   25.80   24.20   23.90   24.50
0.04   Diabetes       27.30   30.10   29.60   29.70   29.60   29.40   30.10
       German         29.10   31.00   30.10   31.30   29.90   29.70   30.70
       Heart-statlog  28.00   28.20   28.10   26.50   27.50   27.50   28.00
       Ionosphere     11.40   17.70   15.80   18.30   15.70   15.90   17.50
       Sonar          25.10   30.10   28.10   29.60   27.70   25.90   28.30
       Vehicle        25.10   30.10   28.10   29.60   27.70   25.90   28.30
       Waveform       27.50   30.10   28.10   29.20   27.90   27.60   28.60
0.08   Diabetes       27.50   30.10   28.10   29.20   27.90   27.60   28.60
       German         33.60   36.20   36.10   36.40   35.90   36.10   38.20
       Heart-statlog  32.40   33.40   32.60   32.90   33.20   33.30   34.40
       Ionosphere     26.40   27.50   26.50   27.60   26.60   26.50   27.90
       Sonar          32.30   36.10   36.50   38.40   35.70   36.40   36.50
       Vehicle        38.70   43.00   40.60   42.00   40.50   40.20   41.60
       Waveform       34.30   37.60   35.80   36.80   35.60   35.30   37.40


In the future we would like to focus on merging en-semble methods with deep learning frameworks -ecombination of ensemble methods and deep learningconstruction has become an advanced research directionin machine learning For example deep ensemble learning[2] has been demonstrated as a powerful tool foraddressing various recognition tasks

Data Availability

-e datasets generated during and analysed during thecurrent study are available in the UCI repository (httpsarchiveicsuciedumlindexphp) and theMNISTrepository(httpyannlecuncomexdbmnist)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is work was supported by the National Science Foun-dation for Young Scientists of China (61803107)-e Scienceand Technology Planning Project of Guangzhou China(Grant number 201803020025)

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semisupervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3–4, 2005.

[34] G. Martinez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of the 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.


classifiers is used as the baseline algorithm. The classification accuracy and ensemble size of MBPEP are compared with the optimization-based pruning method EA [31] and four competence ordering-based pruning methods, i.e., reduced error pruning [32], kappa pruning [33], complementarity measure pruning [34], and margin distance minimization pruning [16]. These pruning methods are described as follows:

(i) Optimization-based pruning [31] regards aggregation as a programming problem that intends to select the optimized subset of base learners by minimizing the validation error. In this paper, EA is used as the optimization ensemble method.

(ii) Reduced error (RE) pruning [32] sorts the subclassifiers and adds them one by one to find the lowest classification error of the final ensemble. Margineantu and Dietterich use the backfitting search method to approximate the generalization performance of RE pruning.

(iii) Kappa pruning [33] uses the kappa error as a statistical measure of the diversity between a pair of sublearners. Kuncheva uses κ statistics to measure the correlation between the outputs of two sublearners, and the classifiers with the lowest κ statistics are iteratively added to the ensemble.

(iv) Complementarity measure (CM) pruning [34] is an ordered pruning ensemble that, at each iteration, finds the learners most complementary to the current ensemble and incorporates them into the ensemble classifier. The complementarity pruning method enhances the performance of the classes with the most votes without harming the classes with the least votes.

(v) Margin distance minimization (MDM) pruning [16] was discussed in the Introduction.

(vi) Randomized reference classifier (RRC) pruning [26] estimates the competence of each learner in the ensemble pool on the validation set using the corresponding probability outputs and several random variables with beta probability distributions.

(vii) KNORA-Eliminate [19] was discussed in the Introduction.

(viii) META-DES [20] was discussed in the Introduction.
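To illustrate how the competence ordering baselines operate, the greedy loop of reduced error (RE) pruning can be sketched as below. This is a minimal sketch rather than the authors' implementation: `preds` is assumed to hold the integer class predictions of each sublearner on a validation set, and the backfitting refinement mentioned above is omitted.

```python
import numpy as np

def reduce_error_prune(preds, y, max_size=None):
    """Greedy RE-style ordering: repeatedly add the sublearner whose
    inclusion gives the lowest majority-vote error on the validation set.

    preds: (n_learners, n_samples) array of integer class predictions
    y:     (n_samples,) validation labels
    Returns the indices of the selected sublearners, in selection order.
    """
    n_learners = preds.shape[0]
    max_size = max_size or n_learners
    selected, remaining = [], list(range(n_learners))
    best_err = np.inf
    while remaining and len(selected) < max_size:
        errs = []
        for i in remaining:
            votes = preds[selected + [i]]
            # majority vote over the trial ensemble, column by column
            maj = np.apply_along_axis(
                lambda c: np.bincount(c).argmax(), 0, votes)
            errs.append(np.mean(maj != y))
        if min(errs) > best_err:   # stop once the error no longer improves
            break
        best_err = min(errs)
        i_best = remaining[int(np.argmin(errs))]
        selected.append(i_best)
        remaining.remove(i_best)
    return selected
```

The same skeleton covers kappa and CM pruning by swapping the selection criterion (pairwise κ statistics or the complementarity measure) in place of the majority-vote error.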

3.3. Characteristics of MBPEP. To investigate the characteristics of MBPEP, the advantages of both Pareto optimization and MCP are measured. In this section, RRC, KNORA-Eliminate, full bagging, and META-DES are used for comparison purposes; notably, full bagging serves as the baseline without pruning. The two contributions of MBPEP (Pareto optimization and MCP) apply to the overproduction and pruning stages, respectively. Hence, two experiments, in which MCP and MBPEP are embedded into dynamic ensemble pruning models, are executed to explore the characteristics of MBPEP.

To measure the advantages of MCP: from Figure 3, we can see that the performances of the dynamic ensemble pruning models (the META and KNORA models) based on MCP are superior to those without MCP. In addition, all

Require: the ensemble pool HS = {H1, ..., Hs} generated during the overproduction stage
(1)  H* = ∅ (the pruned ensemble pool is empty)
(2)  for Hi in HS = {H1, ..., Hs} do
(3)      if sample X is in the safe region then
(4)          continue
(5)      else if sample X is in the indecision region then
(6)          if C_Hi(X) > 0 then
(7)              H* = H* ∪ {Hi}
(8)          else if C_Hi(X) ≤ 0 then
(9)              Hi is pruned
(10)         end if
(11)     end if
(12) return H*

Algorithm 1: MCP algorithm.
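Algorithm 1 can be sketched in Python as follows. The margin criterion C_Hi(X) and the indecision-region test are passed in as callables, since their exact definitions (the margin entropy of the pruning stage) are given elsewhere in the paper; the parameter names are illustrative.

```python
def mcp_prune(pool, X, in_indecision_region, criterion):
    """Margin criterion pruning (sketch of Algorithm 1).

    pool:                 list of candidate sublearners H_1 .. H_s
    X:                    a query sample
    in_indecision_region: callable(X) -> bool, True when X falls in the
                          indecision (class-overlap) region
    criterion:            callable(H, X) -> float, the margin criterion
                          C_H(X) of learner H for sample X
    Returns the pruned ensemble H*.
    """
    pruned = []  # H*, initially empty
    for H in pool:
        if not in_indecision_region(X):
            # safe region: the learner is neither kept nor pruned here
            continue
        if criterion(H, X) > 0:
            pruned.append(H)   # keep the competent learner
        # learners with C_H(X) <= 0 are dropped
    return pruned
```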

Table 1: Benchmark datasets used in the experiments.

Data set        Examples    Attributes
Australian      690         14
Breast cancer   286         9
Liver           345         6
Heart-statlog   270         13
Waveform        1000        20
Ionosphere      351         34
Kr-vs-kp        3196        96
Letter          20000       16
German          306         3
Diabetes        76          4
Sonar           208         60
Spambase        4601        57
Tic-tac-toe     958         9
Vehicle         846         18
Vote            435         16
Pima            768         8


EPS models achieve better performance than full bagging. As an alternative measure of performance, the F1 value is an effective metric that has been widely used [35] to capture the precision and sensitivity of the results, and it is calculated as follows:

F1 = 2TP / (2TP + FP + FN),  (5)

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89, respectively; for class 1, the F1 values are 0.88, 0.88, and 0.88. To explore the influence of MBPEP in dynamic ensemble models, the optimal overproduction size and the learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, whereas META, KNORA, and RRC denote ensemble pruning models learned by Pareto optimization but without MCP. From the top panel of Figure 4, it can be seen that the META and KNORA-Eliminate models with MCP achieve superior learning performance to those without MCP.
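Equation (5) in code, with the per-class counts computed from label vectors (a minimal sketch):

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute TP, FP, and FN for one class and return the F1 value of
    equation (5): F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn)
```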

For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, it can be seen that META and RRC with MCP achieve a smaller optimal overproduction pool size than those without MCP. For the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that without MCP.

3.4. Comparison. To measure the classification performance, the test error and the average optimized ensemble size are calculated for MBPEP and compared with the other ensemble methods; the results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments, so each result on the benchmark sets is averaged over 30 runs, and the wins for MBPEP are shown in bold in Tables 2 and 3. The comparison methods are full bagging, RE pruning, kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner, while EA pruning is based on the optimized dynamic pruning model.

Figure 3: Accuracy of the dynamic ensemble pruning methods with and without MCP (META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, RRC, and full bagging).

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP. (a) Accuracy. (b) Optimal ensemble size.


In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of the 16 datasets (10/16). Meanwhile, full bagging, RE pruning, kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test error on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP achieves better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins is therefore validated between MBPEP and the other algorithms using a sign test; a win is counted as 1 and a tie as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
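The direct-wins tally used for the sign test (win = 1, tie = 0.5) can be computed from two per-dataset error vectors as:

```python
def direct_wins(ours, other):
    """Count direct wins of `ours` over `other` across datasets:
    a strictly lower test error counts 1, a tie counts 0.5."""
    score = 0.0
    for a, b in zip(ours, other):
        if a < b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score
```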

We also measure the optimal ensemble size in Table 3. It is easily seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the lowest ensemble size on less than 19% (3/16) of the datasets. The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method on most datasets. The reason is that MBPEP can simultaneously minimize the classification error and the ensemble size, whereas competence ordering ensemble methods (RE pruning, kappa pruning, CM pruning, and MDM pruning) focus on optimizing a single objective, such as diversity, and neglect the others.

Table 2: Test error comparison (%).

Data sets       Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian      11.62   14.30   14.40   14.30   14.50   14.80   14.30
Breast cancer   23.83   27.90   27.70   28.70   29.20   29.50   27.50
Liver           25.73   32.70   32.00   32.60   30.60   33.70   31.70
Heart-statlog   23.50   19.50   18.70   20.10   19.90   19.90   19.60
Waveform        20.80   22.20   20.50   20.90   20.30   20.10   20.50
Ionosphere      5.70    9.20    8.60    8.40    8.90    10.00   9.30
Kr-vs-kp        0.90    1.50    1.00    1.00    1.10    1.10    1.20
Letter          8.52    5.90    4.80    4.80    4.80    5.70    5.30
German          22.54   25.30   24.10   24.10   23.90   24.00   24.30
Diabetes        22.16   25.10   24.30   24.30   24.10   24.20   24.00
Sonar           21.40   26.60   26.70   24.90   25.00   26.80   25.10
Spambase        6.93    6.80    6.60    6.60    6.60    6.80    6.60
Tic-tac-toe     15.67   16.40   13.50   13.20   13.20   14.50   13.80
Vehicle         21.97   22.80   22.60   23.30   23.40   24.40   23.00
Vote            4.40    4.70    4.40    4.10    4.30    4.50    4.50
Optdigits       3.20    3.80    3.60    3.50    3.60    3.70    3.50
Count of wins   10      2       3       4       3       1       1
Direct wins     —       13      10.5    11      10      11      11

Table 3: Average optimized ensemble size.

Data sets       Ours    RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian      7.2     12.5    14.7    11.0    8.5     41.9
Breast cancer   7.2     8.7     26.1    8.8     7.8     44.6
Liver           7.4     13.9    24.7    15.3    17.7    42.0
Heart-statlog   9.6     11.4    17.9    13.2    13.6    44.2
Waveform        9.0     12.3    15.6    11.7    11.5    42.4
Ionosphere      9.1     7.9     10.5    8.5     10.7    48.8
Kr-vs-kp        4.3     5.8     10.6    9.6     7.2     45.9
Letter          12.5    15.1    13.8    12.9    23.2    38.3
German          7.0     9.4     13.6    11.5    19.7    41.2
Diabetes        11.7    14.3    16.8    12.6    18.1    42.8
Sonar           13.2    11.0    20.6    13.9    20.6    43.1
Spambase        12.6    18.5    20.0    19.0    28.8    39.7
Tic-tac-toe     13.0    16.1    17.4    15.4    28.0    39.8
Vehicle         7.0     15.7    16.5    11.2    21.6    41.9
Vote            3.5     3.2     5.1     5.4     6.0     47.8
Optdigits       22.3    25.0    25.2    21.4    46.8    41.4
Count of wins   13      3       0       0       0       0
Direct wins     —       13      16      14      16      16


3.5. Robustness of Classification. The robustness of an algorithm is an important indicator of classifier performance, as it reflects the fault tolerance of the algorithm. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise, where the noise level is the variance intensity of the Gaussian noise. In each experiment, the test errors of the ensemble classifiers are measured after the training sets are corrupted by noise with variances of 0.00, 0.02, 0.04, and 0.08. The results are measured on the 7 datasets described in Table 4; the average test errors, each computed over 50 runs, are shown in Table 4. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured, and the wins for MBPEP are shown in bold in Table 4.

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.40% test error when no noise corrupts the training set of the sonar dataset, but the test error reaches 32.30% when the training set is corrupted by Gaussian noise with a variance of 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the four noise intensities, respectively. Specifically, when the variance intensity of the Gaussian noise is larger than 0.04, MBPEP performs better than all the other methods. For example, the test error of MBPEP is 20.80% on the waveform dataset when no noise corrupts the training set, which is not the best classification performance among the comparison methods, but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with the noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.
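The corruption procedure can be reproduced as below; this is a sketch under the assumption, stated above, that the noise level is the variance of zero-mean Gaussian noise added to the training features.

```python
import numpy as np

def corrupt_with_gaussian_noise(X, variance, seed=0):
    """Corrupt a training matrix with zero-mean Gaussian noise of the
    given variance (the noise levels 0.00/0.02/0.04/0.08 in Table 4)."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)
```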

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Data set        Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
0.00   Diabetes        22.10   25.10   24.30   24.30   24.30   24.20   24.00
       German          22.50   25.30   24.10   24.10   23.90   24.00   24.30
       Heart-statlog   23.50   19.50   18.70   20.10   19.90   19.90   19.60
       Ionosphere      5.70    9.20    8.60    8.40    8.90    10.00   9.30
       Sonar           21.40   26.60   26.70   24.90   25.00   26.80   25.10
       Vehicle         21.90   22.80   22.60   23.30   23.40   24.40   23.00
       Waveform        20.80   22.20   20.50   20.90   20.30   20.10   20.50
0.02   Diabetes        24.20   26.60   26.60   26.30   26.70   26.40   27.00
       German          24.40   27.70   27.70   28.10   27.00   27.10   27.60
       Heart-statlog   24.30   23.20   22.70   21.60   22.00   22.20   23.10
       Ionosphere      7.20    13.10   11.80   13.00   11.50   11.30   12.20
       Sonar           23.40   28.10   24.00   26.20   24.60   23.50   25.30
       Vehicle         25.80   33.00   29.90   29.90   29.60   30.10   29.80
       Waveform        25.20   26.80   24.30   25.80   24.20   23.90   24.50
0.04   Diabetes        27.30   30.10   29.60   29.70   29.60   29.40   30.10
       German          29.10   31.00   30.10   31.30   29.90   29.70   30.70
       Heart-statlog   28.00   28.20   28.10   26.50   27.50   27.50   28.00
       Ionosphere      11.40   17.70   15.80   18.30   15.70   15.90   17.50
       Sonar           25.10   30.10   28.10   29.60   27.70   25.90   28.30
       Vehicle         25.10   30.10   28.10   29.60   27.70   25.90   28.30
       Waveform        27.50   30.10   28.10   29.20   27.90   27.60   28.60
0.08   Diabetes        27.50   30.10   28.10   29.20   27.90   27.60   28.60
       German          33.60   36.20   36.10   36.40   35.90   36.10   38.20
       Heart-statlog   32.40   33.40   32.60   32.90   33.20   33.30   34.40
       Ionosphere      26.40   27.50   26.50   27.60   26.60   26.50   27.90
       Sonar           32.30   36.10   36.50   38.40   35.70   36.40   36.50
       Vehicle         38.70   43.00   40.60   42.00   40.50   40.20   41.60
       Waveform        34.30   37.60   35.80   36.80   35.60   35.30   37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 grayscale images for training and 10,000 for testing. Each image is mapped to one of 10 classes covering the digits 0 to 9; Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble methods on MNIST, each experiment was run 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average-max process of gc-Forest is replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules of gc-Forest, i.e., multigrained scanning and the cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The raw input features are scanned into novel features (printed in blue blocks), with the scaling process determined by the scanning window sizes, and the generated features are concatenated into large feature vectors (printed in red blocks) that enter the cascade forest. To encourage diversity and improve pruning efficiency, each layer of the cascade forest consists of 100 random forests in this section, and the number of layers is self-adapted until the validation performance falls below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error, for which the original version of gc-Forest is used as the baseline; the results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble

Figure 6: Modified version of gc-Forest. Inputs pass through multigrained scanning into the cascade forest of random forests, whose outputs are combined by dynamic pruning to produce the final output.

Figure 7: Performance of the pruning ensemble algorithms with full bagging on digital character recognition. (a) Improvement ratio of the classification accuracy. (b) Reduction percentage of the ensemble size. Compared methods: MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA.


size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), not all ensemble methods improve the classification accuracy compared with the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over the original version, while the other methods improve it by 0.2%. From Figure 7(b), the results show that MBPEP stores fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining the sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, with less storage and higher computational efficiency than the other pruning methods.
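The aggregation change behind these numbers can be illustrated as follows: the original gc-Forest averages the class-probability vectors of all forests and takes the arg-max class, while the modified version aggregates only the sublearners retained by pruning. The boolean mask `keep` below is a hypothetical stand-in for the output of the pruning step (MBPEP in the paper), and `reduction_percentage` mirrors the metric of Figure 7(b).

```python
import numpy as np

def average_max(probas):
    """Original gc-Forest combination: average the class-probability
    vectors of all forests, then take the arg-max class."""
    return int(np.mean(probas, axis=0).argmax())

def pruned_average_max(probas, keep):
    """Modified combination: average only the sublearners retained by
    the pruning step (boolean mask), then take the arg-max class."""
    kept = probas[np.asarray(keep)]
    return int(np.mean(kept, axis=0).argmax())

def reduction_percentage(original_size, pruned_size):
    """Reduction percentage of the ensemble size, as in Figure 7(b)."""
    return 100.0 * (original_size - pruned_size) / original_size
```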

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of a dynamic ensemble pruning model, the initial ensemble pool size is usually fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria of the pruning stage, and few studies utilize the local margin information of the learners to select the expert in an ensemble. In this paper, MBPEP is proposed to address both issues.

First, the Pareto optimization algorithm is used by MBPEP to globally search for feasible ensemble pool sizes during the overproduction stage; the "domination" relation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensemble is used to locally determine the expert that can detect the classes in the indecision region. Several experiments demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework for handwritten digital character recognition tasks, where the average-max process of the original gc-Forest is replaced by MBPEP. This modified version of gc-Forest improves the classification accuracy while using fewer sublearners when combining the final hypotheses.
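The "domination" computation mentioned above can be sketched as a standard Pareto dominance test over (test error, ensemble size) pairs; this is an illustrative reconstruction, not the authors' exact routine.

```python
def dominates(a, b):
    """True if candidate a = (error, size) Pareto-dominates b:
    no worse in both objectives and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep the candidates not dominated by any other, i.e., the
    (error, size) trade-offs worth exploring during overproduction."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]
```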

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning architectures has become an active research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (Grant no. 201803020025).


Computational Intelligence and Neuroscience 11

[12] R M O Cruz H Zakane R Sabourin and G CavalcantildquoDynamic ensemble selection VS K-NN why and whendynamic selection obtains higher classification performancerdquo2018 httparxivorgabs180407882

[13] A Lazarevic and Z Obradovic ldquoEffective pruning of neuralnetwork classifiersrdquo in Proceedings of the 14th InternationalJoint Conference on Neural Networks vol 1ndash4 pp 796ndash801Washington DC USA July 2001

[14] K Woods W P Kegelmeyer and K Bowyer ldquoCombinationof multiple classifiers using local accuracy estimatesrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 19 no 4 pp 405ndash410 1997

[15] G Giacinto and F Roli ldquoDynamic classifier selection based onmultiple classifier behaviourrdquo Pattern Recognition vol 34no 9 pp 1879ndash1881 2001

[16] B Antosik and M Kurzynski ldquoNew measures of classifiercompetence ndash heuristics and application to the design ofmultiple classifier systemsrdquo in Proceedings of 7th InternationalConference on Computer Recognition Systems vol 4pp 197ndash206 Wroclaw Poland 2011

[17] D H Lobato G M Martinez and A Suarez ldquoStatisticalinstance-based pruning in ensembles of independent classi-fiersrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 31 no 2 pp 364ndash369 2009

[18] N Li and Z H Zhou ldquoSelective ensemble under regulari-zation frameworkrdquo in Proceedings of the 8th InternationalWorkshop on Multiple Classifier Systems vol 5519 pp 293ndash303 Reykjavik Iceland June 2009

[19] A H R Ko R Sabourin and J Britto ldquoFrom dynamicclassifier selection to dynamic ensemble selectionrdquo PatternRecognition vol 47 no 11 pp 3665ndash3680 2014

[20] R M O Cruz R Sabourin and G D C Cavalcanti ldquoOnmeta-learning for dynamic ensemble selectionrdquo in Pro-ceedings of the International Workshop on Multiple ClassifierSystems pp 157ndash166 Naples Italy June 2011

[21] G Davis S Mallat and M Avellaneda ldquoAdaptive greedyapproximationsrdquo Constructive Approximation vol 13 no 1pp 57ndash98 1997

[22] C Qian J Shi Y Yu K Tang and Z Zhou ldquoParallel Paretooptimization for subset selectionrdquo in Proceedings of the 25thInternational Joint Conference on Artificial Intelligencepp 1939ndash1945 New York NY USA July 2016

[23] Q Lu J Ren and ZWang ldquoUsing genetic programming withprior formula knowledge to solve symbolic regressionproblemrdquo Computational Intelligence and Neurosciencevol 2016 Article ID 1021378 17 pages 2016

[24] H Song Z Jiang A Men and B Yang ldquoA hybrid semi-supervised anomaly detection model for high-dimensionaldatardquo Computational Intelligence and Neuroscience vol 2017Article ID 8501683 9 pages 2017

[25] L Kuncheva ldquoClustering and selection model for classifiercombinationrdquo in Proceedings of the 4th International Conferenceon Knowledge-Based Intelligent Information Engineering Systemsand Allied Technologies pp 185ndash188 Brighton UK 2000

[26] T Woloszynski and M Kurzynski ldquoA probalistic model ofclassifier competence for dynamic ensemble selectionrdquo Pat-tern Recognition vol 41 no 10-11 pp 2656ndash2668 2011

[27] Z-H Zhou J Wu and W Tang ldquoEnsembling neural net-works many could be better than allrdquo Artificial Intelligencevol 137 no 1-2 pp 239ndash263 2002

[28] P Castro G P Coelho M F Caetano and V ZubenldquoDesigning ensembles of fuzzy classification systems animmune-inspired approachrdquo in Proceedings of the 4th

International Conference on Artificial Immune Systemsvol 3627 pp 469ndash482 Banff AB Canada August 2005

[29] A Kirshners S Parshutin and H Gorskis ldquoEntropy-basedclassifier enhancement to handle imbalanced class problemrdquoin Proceedings of the International Conference on Technologyand Education vol 104 pp 586ndash591 Balikpapan Indonesia2017

[30] C Blake E Keogh and C J Merz ldquoUCI repository of ma-chine learning databasesrdquo 1998 httpsarchiveicsuciedumlindexphp

[31] T Back Evolutionary Algorithms in eory and PracticeEvolution Strategies Evolutionary Programming Genetic Al-gorithms Oxford University Press Vol 2 Oxford UK 1997

[32] D Margineantu and T G Dietterich ldquoPruning adaptiveboostingrdquo in Proceedings of the 14th International Conferenceon Machine Learning pp 211ndash218 Nashville TN USA July1997

[33] L I Kuncheva ldquoDiversity in multiple classifier systemsrdquoInformation Fusion vol 6 no 1 pp 3-4 2005

[34] G Martinezmuoz and A Suarez ldquoAggregation ordering inbaggingrdquo in Proceedings of 8th International Conference onArtificial Intelligence and Applications vol 1 pp 258ndash263Innsbruck Austria 2004

[35] Q Umer H Liu and Y Sultan ldquoEmotion based automatedpriority prediction for bug reportsrdquo IEEE Access vol 6pp 35743ndash35752 2018

[36] J Demsar ldquoStatistical comparisons of classifiers over multipledata setsrdquo Journal of Machine Learning Research vol 7 no 1pp 1ndash30 2006

[37] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning ...downloads.hindawi.com/journals/cin/2019/7560872.pdf · Regarding dynamic ensemble pruning, the size of the ensemble pool

EPS models achieve better performance than full bagging. As an alternative technique for measuring performance, the F1 value is an effective metric that has been widely used [35] to measure the precision and sensitivity of results, and it is calculated as follows:

F1 = 2TP / (2TP + FP + FN), (5)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. For example, for class 0 in Figure 3, the F1 values for META-DES, KNORA-Eliminate, and RRC with MCP are 0.88, 0.89, and 0.89, respectively; for class 1, the F1 values are 0.88, 0.88, and 0.88. To explore the influence of MBPEP on dynamic ensemble models, optimal overproduction pool size and learning performance are used as metrics in Figure 4. Different from Figure 3, META-MCP, KNORA-MCP, and RRC-MCP in Figure 4 are processed by Pareto optimization during the overproduction stage, whereas META, KNORA, and RRC are learned by Pareto optimization without MCP. From the top panel of Figure 4, it can be seen that the META and KNORA-Eliminate models with MCP achieve superior learning performance to those without MCP.
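As a quick sanity check of equation (5), the computation can be expressed in a few lines of Python (a minimal sketch; the example counts are illustrative and are not taken from the paper's confusion matrices):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Equation (5): F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Illustrative counts: 88 true positives, 12 false positives, 12 false negatives
print(round(f1_score(88, 12, 12), 2))  # 0.88
```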

For RRC, RRC-MCP achieves learning performance comparable to that of RRC. From the bottom panel of Figure 4, it can be seen that META and RRC with MCP achieve a smaller optimal overproduction pool size than the versions without MCP. For the KNORA-Eliminate model, KNORA-MCP achieves a pool size comparable to that of the version without MCP.

3.4. Comparison. To measure classification performance, the test error and average optimized ensemble size are calculated for MBPEP and compared with the other ensemble methods. The results are shown in Tables 2 and 3. The test errors and optimal ensemble sizes of each model are not fixed across experiments, so for each benchmark set every experiment is executed 30 times and the averaged results are reported in Tables 2 and 3, with the wins of MBPEP shown in bold. The comparison methods are full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning. RE pruning, Kappa pruning, CM pruning, and MDM pruning are dynamic ensemble pruning models that use competence ordering to estimate the competence of each learner, whereas EA pruning is an optimization-based dynamic pruning model.

Figure 3: Performance of the dynamic ensemble pruning methods with and without MCP (accuracy, 0.85-0.90, for META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, RRC, and full bagging).

Figure 4: Performance of the dynamic ensemble pruning models with and without MBPEP: (a) accuracy; (b) optimal ensemble size (META-MCP, KNORA-MCP, RRC-MCP, META, KNORA, and RRC).

Computational Intelligence and Neuroscience 7

In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of 16 datasets (10/16). Meanwhile, full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test error on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP has been demonstrated to achieve better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins between MBPEP and the other algorithms is validated using a sign test. Notably, a win is counted as 1 and a tie is counted as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
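The direct-wins tally behind the sign test can be sketched as follows; `direct_wins` is a hypothetical helper name (not part of MBPEP), and the error vectors are illustrative:

```python
def direct_wins(ours, other):
    """Per dataset: a win (lower test error) counts 1, a tie counts 0.5."""
    return sum(1.0 if a < b else (0.5 if a == b else 0.0)
               for a, b in zip(ours, other))

# Three illustrative datasets where the first method has the lower error on each
print(direct_wins([11.62, 23.83, 25.73], [14.30, 27.90, 32.70]))  # 3.0
```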

We also measure the optimal ensemble size in Table 3. It can easily be seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the lowest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the lowest ensemble size on fewer than 19% (3/16). The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method for most datasets. The reason for this result is that MBPEP simultaneously minimizes classification error and ensemble size, whereas competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) optimize only one objective, such as diversity, and neglect the others.
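Minimizing classification error and ensemble size simultaneously amounts to a Pareto comparison over (error, size) pairs. A minimal sketch of that domination check follows; the function names are ours, not the paper's:

```python
def dominates(a, b):
    """a dominates b if a is no worse on both objectives and differs from b.
    Candidates are (test_error, ensemble_size) pairs; both objectives are minimized."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# (0.14, 12) is dominated by (0.12, 7): higher error and a larger ensemble
print(pareto_front([(0.12, 7), (0.14, 12), (0.11, 40)]))  # [(0.12, 7), (0.11, 40)]
```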

Table 2: Test error comparison (%).

Data sets      Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian     11.62   14.30         14.40       14.30          14.50       14.80        14.30
Breast cancer  23.83   27.90         27.70       28.70          29.20       29.50        27.50
Liver          25.73   32.70         32.00       32.60          30.60       33.70        31.70
Heart-statlog  23.50   19.50         18.70       20.10          19.90       19.90        19.60
Waveform       20.80   22.20         20.50       20.90          20.30       20.10        20.50
Ionosphere     5.70    9.20          8.60        8.40           8.90        10.00        9.30
Kr-vs-kp       0.90    1.50          1.00        1.00           1.10        1.10         1.20
Letter         8.52    5.90          4.80        4.80           4.80        5.70         5.30
German         22.54   25.30         24.10       24.10          23.90       24.00        24.30
Diabetes       22.16   25.10         24.30       24.30          24.10       24.20        24.00
Sonar          21.40   26.60         26.70       24.90          25.00       26.80        25.10
Spambase       6.93    6.80          6.60        6.60           6.60        6.80         6.60
Tic-tac-toe    15.67   16.40         13.50       13.20          13.20       14.50        13.80
Vehicle        21.97   22.80         22.60       23.30          23.40       24.40        23.00
Vote           4.40    4.70          4.40        4.10           4.30        4.50         4.50
Optdigits      3.20    3.80          3.60        3.50           3.60        3.70         3.50
Count of wins  10      2             3           4              3           1            1
Direct wins    –       13            10.5        11             10          11           11

Table 3: Average optimized ensemble size.

Data sets      Ours   RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian     7.2    12.5        14.7           11.0        8.5          41.9
Breast-cancer  7.2    8.7         26.1           8.8         7.8          44.6
Liver          7.4    13.9        24.7           15.3        17.7         42
Heart-statlog  9.6    11.4        17.9           13.2        13.6         44.2
Waveform       9      12.3        15.6           11.7        11.5         42.4
Ionosphere     9.1    7.9         10.5           8.5         10.7         48.8
Kr-vs-kp       4.3    5.8         10.6           9.6         7.2          45.9
Letter         12.5   15.1        13.8           12.9        23.2         38.3
German         7.0    9.4         13.6           11.5        19.7         41.2
Diabetes       11.7   14.3        16.8           12.6        18.1         42.8
Sonar          13.2   11.0        20.6           13.9        20.6         43.1
Spambase       12.6   18.5        20.0           19.0        28.8         39.7
Tic-tac-toe    13.0   16.1        17.4           15.4        28.0         39.8
Vehicle        7.0    15.7        16.5           11.2        21.6         41.9
Vote           3.5    3.2         5.1            5.4         6.0          47.8
Optdigits      22.3   25.0        25.2           21.4        46.8         41.4
Count of wins  13     3           0              0           0            0
Direct wins    –      13          16             14          16           16


3.5. Robustness of Classification. The robustness of an algorithm is an important indicator of classifier performance, as it reflects the algorithm's fault tolerance. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise, where the noise level is determined by the variance intensity of the Gaussian noise. In each experiment, the test errors of the ensemble classifiers are measured when the training sets are corrupted by noise of variance 0.00, 0.02, 0.04, and 0.08. The results are measured on the 7 datasets described in Table 4. The average test errors are shown in Table 4, and each one is averaged over 50 runs. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured. The wins of MBPEP are bolded in Table 4.
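A sketch of the corruption step, under the assumption (consistent with the description above) that zero-mean Gaussian noise of the stated variance is added to the training features; `corrupt` is our name for the helper, not the paper's:

```python
import numpy as np

def corrupt(X, variance, seed=0):
    """Add zero-mean Gaussian noise with the given variance to a feature matrix."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)

X_train = np.ones((5, 3))          # toy training features
X_noisy = corrupt(X_train, 0.04)   # variance 0.04, one of the levels used above
```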

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.40% test error when no noise corrupts the training set of the sonar dataset, but the test error reaches 32.30% when the training set is corrupted by Gaussian noise of variance 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the respective noise intensities. Specifically, when the variance intensity of the Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP reaches 20.80% on the waveform dataset when no noise corrupts the training set; this is not the best classification performance among all the comparison methods, but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Data sets      Ours    Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
0.00   Diabetes       22.10   25.10         24.30       24.30          24.30       24.20        24.00
       German         22.50   25.30         24.10       24.10          23.90       24.00        24.30
       Heart-statlog  23.50   19.50         18.70       20.10          19.90       19.90        19.60
       Ionosphere     5.70    9.20          8.60        8.40           8.90        10.00        9.30
       Sonar          21.40   26.60         26.70       24.90          25.00       26.80        25.10
       Vehicle        21.90   22.80         22.60       23.30          23.40       24.40        23.00
       Waveform       20.80   22.20         20.50       20.90          20.30       20.10        20.50
0.02   Diabetes       24.20   26.60         26.60       26.30          26.70       26.40        27.00
       German         24.40   27.70         27.70       28.10          27.00       27.10        27.60
       Heart-statlog  24.30   23.20         22.70       21.60          22.00       22.20        23.10
       Ionosphere     7.20    13.10         11.80       13.00          11.50       11.30        12.20
       Sonar          23.40   28.10         24.00       26.20          24.60       23.50        25.30
       Vehicle        25.80   33.00         29.90       29.90          29.60       30.10        29.80
       Waveform       25.20   26.80         24.30       25.80          24.20       23.90        24.50
0.04   Diabetes       27.30   30.10         29.60       29.70          29.60       29.40        30.10
       German         29.10   31.00         30.10       31.30          29.90       29.70        30.70
       Heart-statlog  28.00   28.20         28.10       26.50          27.50       27.50        28.00
       Ionosphere     11.40   17.70         15.80       18.30          15.70       15.90        17.50
       Sonar          25.10   30.10         28.10       29.60          27.70       25.90        28.30
       Vehicle        25.10   30.10         28.10       29.60          27.70       25.90        28.30
       Waveform       27.50   30.10         28.10       29.20          27.90       27.60        28.60
0.08   Diabetes       27.50   30.10         28.10       29.20          27.90       27.60        28.60
       German         33.60   36.20         36.10       36.40          35.90       36.10        38.20
       Heart-statlog  32.40   33.40         32.60       32.90          33.20       33.30        34.40
       Ionosphere     26.40   27.50         26.50       27.60          26.60       26.50        27.90
       Sonar          32.30   36.10         36.50       38.40          35.70       36.40        36.50
       Vehicle        38.70   43.00         40.60       42.00          40.50       40.20        41.60
       Waveform       34.30   37.60         35.80       36.80          35.60       35.30        37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 and 10,000 grayscale images for training and testing, respectively. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9; Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble methods on MNIST, each was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average max process has been replaced by dynamic pruning in gc-Forest; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules in gc-Forest, i.e., multigrained scanning and the cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.
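gc-Forest grows its cascade layer by layer until validation performance stops improving within an error tolerance. A minimal sketch of such self-adapted layer growth, where `train_layer` and `validate` are hypothetical callables standing in for fitting one forest layer and scoring the cascade on the validation set:

```python
def grow_cascade(train_layer, validate, tol, max_layers=20):
    """Add cascade layers until validation error stops improving by more than tol."""
    layers, best_err = [], float("inf")
    for _ in range(max_layers):
        layer = train_layer(len(layers))   # hypothetical: fit one layer of forests
        err = validate(layers + [layer])   # hypothetical: validation error so far
        if best_err - err <= tol:          # no meaningful improvement: stop growing
            break
        layers.append(layer)
        best_err = err
    return layers
```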

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The raw input features are scaled into novel features (printed in blue blocks) by the scanning method, with the scaling process determined by the scanning window sizes. The generated features are concatenated into large feature vectors (printed in red blocks) that then enter the cascade forest. To encourage diversity and improve pruning efficiency, each layer of the cascade forest consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance falls below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error; specifically, the original version of gc-Forest is used as the baseline. The results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble

Figure 6: Modified version of gc-Forest: inputs pass through multigrained scanning into a cascade forest of random forests, with dynamic pruning before the output.

Figure 7: Performance of the pruning ensemble algorithms (MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA) with full bagging on digital character recognition: (a) improvement ratio of the classification accuracy; (b) percentage of reduction of the ensemble size.


size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), it can be seen that not all ensemble methods improve the classification accuracy over the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over the original version; meanwhile, the other methods improve the classification accuracy by 0.2%. From Figure 7(b), the results show that MBPEP stores fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining the sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages in storage and computational efficiency over the other pruning methods.
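The aggregation step referred to above is plain majority voting over the retained sublearners' predictions; a minimal sketch:

```python
from collections import Counter

def majority_vote(votes):
    """Return the class predicted by the most sublearners."""
    return Counter(votes).most_common(1)[0][0]

print(majority_vote([7, 7, 1, 7, 9]))  # 7
```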

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of learners to select the expert in an ensemble. In this paper, MBPEP is proposed.

First, the Pareto optimization algorithm is used by MBPEP to globally search for the feasible ensemble pool size during the overproduction stage; the "domination" computation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensemble is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework to conduct handwritten digital character recognition tasks: compared to the original version of gc-Forest, the average max process is replaced by MBPEP. This modified version of gc-Forest with MBPEP improves the classification accuracy while using fewer sublearners when combining the final hypotheses.

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning constructions has become an advanced research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (Grant no. 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195-216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp. 2700-2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627-635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463-484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1-19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202-212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3-17, 2014.

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1-4, pp. 796-801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405-410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879-1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence - heuristics and application to the design of multiple classifier systems," in Proceedings of 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197-206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364-369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293-303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665-3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157-166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57-98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939-1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185-188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656-2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239-263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469-482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586-591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211-218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3-4, 2005.

[34] G. Martínez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258-263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743-35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1-30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

12 Computational Intelligence and Neuroscience


In Table 2, it can be seen that MBPEP achieves the lowest test error on 10 of the 16 datasets (10/16). Full bagging, RE pruning, Kappa pruning, CM pruning, MDM pruning, and EA pruning achieve the lowest test error on 2, 3, 4, 3, 1, and 1 of the 16 datasets, respectively. Thus, MBPEP has been demonstrated to achieve better performance than the other methods. However, according to Demsar [36], it is inappropriate to use a single criterion to evaluate the performance of an ensemble classifier. In this section, the pairwise significance of the numbers of direct wins between MBPEP and the other algorithms is validated using a sign test. Notably, a win is counted as 1 and a tie as 0.5 for each dataset when comparing MBPEP with the other algorithms in Tables 2 and 3. From Table 2, the numbers of direct wins are 13, 10.5, 11, 10, 11, and 11 when comparing MBPEP with the other methods.
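As an illustration, the direct-wins count and the sign test described above can be sketched as follows. The helper names and the four test-error values are hypothetical, not the paper's; a win counts 1 and a tie 0.5, and the p-value treats each dataset as a fair coin flip under the null hypothesis.

```python
import math

def direct_wins(errors_a, errors_b):
    """Count direct wins of method A over B: a win is 1, a tie is 0.5."""
    wins = 0.0
    for ea, eb in zip(errors_a, errors_b):
        if ea < eb:
            wins += 1.0
        elif ea == eb:
            wins += 0.5
    return wins

def sign_test_p(wins, n):
    """Two-sided sign test: twice the probability of observing at least
    ceil(wins) successes in n fair coin flips."""
    k = math.ceil(wins)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical test errors (%) of two methods on 4 datasets.
a = [11.6, 23.8, 25.7, 23.5]
b = [14.3, 27.9, 32.7, 19.5]
print(direct_wins(a, b))  # 3.0
```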

We also measure the optimal ensemble size in Table 3. It can easily be seen that the EA algorithm needs to query more sublearners during the aggregation process; this phenomenon has been explained by Zhou et al. [27]. According to Table 3, MBPEP achieves the smallest ensemble size on 81.25% (13/16) of the datasets, while each of the other methods achieves the smallest ensemble size on fewer than 19% (3/16). The results in Tables 2 and 3 support the claim that the performance of MBPEP is superior to those of the competence ordering methods and the optimization-based pruning method for most datasets. The reason for this result is that MBPEP can simultaneously minimize classification error and ensemble size, whereas competence ordering ensemble methods (RE pruning, Kappa pruning, CM pruning, and MDM pruning) focus on optimizing only one objective, such as diversity, and neglect the others.
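The joint minimization of classification error and ensemble size can be illustrated with a minimal Pareto-dominance sketch. The candidate (error, size) pairs below are hypothetical, and this is only the generic dominance test, not the paper's full MBPEP search.

```python
def dominates(a, b):
    """a and b are (test_error, ensemble_size) pairs; a dominates b when it
    is no worse in both objectives and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Hypothetical (error, size) pairs for candidate sub-ensembles.
cands = [(0.12, 7), (0.14, 5), (0.15, 9), (0.12, 9)]
print(pareto_front(cands))  # [(0.12, 7), (0.14, 5)]
```

A single-objective pruner would keep only one of the two surviving candidates; the Pareto front retains both trade-offs for the later pruning stage to choose from.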

Table 2: Test error comparison (%).

Data sets      Ours   Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian     11.62  14.30         14.40       14.30          14.50       14.80        14.30
Breast cancer  23.83  27.90         27.70       28.70          29.20       29.50        27.50
Liver          25.73  32.70         32.00       32.60          30.60       33.70        31.70
Heart-statlog  23.50  19.50         18.70       20.10          19.90       19.90        19.60
Waveform       20.80  22.20         20.50       20.90          20.30       20.10        20.50
Ionosphere      5.70   9.20          8.60        8.40           8.90       10.00         9.30
Kr-vs-kp        0.90   1.50          1.00        1.00           1.10        1.10         1.20
Letter          8.52   5.90          4.80        4.80           4.80        5.70         5.30
German         22.54  25.30         24.10       24.10          23.90       24.00        24.30
Diabetes       22.16  25.10         24.30       24.30          24.10       24.20        24.00
Sonar          21.40  26.60         26.70       24.90          25.00       26.80        25.10
Spambase        6.93   6.80          6.60        6.60           6.60        6.80         6.60
Tic-tac-toe    15.67  16.40         13.50       13.20          13.20       14.50        13.80
Vehicle        21.97  22.80         22.60       23.30          23.40       24.40        23.00
Vote            4.40   4.70          4.40        4.10           4.30        4.50         4.50
Optdigits       3.20   3.80          3.60        3.50           3.60        3.70         3.50
Count of wins  10      2             3           4              3           1            1
Direct wins    —      13            10.5        11             10          11           11

Table 3: Average optimized ensemble size.

Data sets      Ours  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
Australian      7.2  12.5        14.7           11.0         8.5         41.9
Breast-cancer   7.2   8.7        26.1            8.8         7.8         44.6
Liver           7.4  13.9        24.7           15.3        17.7         42.0
Heart-statlog   9.6  11.4        17.9           13.2        13.6         44.2
Waveform        9.0  12.3        15.6           11.7        11.5         42.4
Ionosphere      9.1   7.9        10.5            8.5        10.7         48.8
Kr-vs-kp        4.3   5.8        10.6            9.6         7.2         45.9
Letter         12.5  15.1        13.8           12.9        23.2         38.3
German          7.0   9.4        13.6           11.5        19.7         41.2
Diabetes       11.7  14.3        16.8           12.6        18.1         42.8
Sonar          13.2  11.0        20.6           13.9        20.6         43.1
Spambase       12.6  18.5        20.0           19.0        28.8         39.7
Tic-tac-toe    13.0  16.1        17.4           15.4        28.0         39.8
Vehicle         7.0  15.7        16.5           11.2        21.6         41.9
Vote            3.5   3.2         5.1            5.4         6.0         47.8
Optdigits      22.3  25.0        25.2           21.4        46.8         41.4
Count of wins  13     3           0              0           0            0
Direct wins    —     13          16             14          16           16


3.5. Robustness of Classification. The robustness of the algorithms is an important indicator for measuring classifier performance, as it reflects the fault tolerance of the algorithms. In this section, the ensemble classifiers are constructed under different levels of Gaussian noise. The noise levels are determined by the variance intensity of the Gaussian noise. When the experiment is executed, the test errors of the ensemble classifiers are measured while the training sets are corrupted by noise of variance 0.00, 0.02, 0.04, and 0.08. The results are measured on the 7 datasets described in Table 4. The average test errors, each computed over 50 runs, are shown in Table 4. To better evaluate the robustness of MBPEP, the performances of full bagging and the other ensemble pruning methods are also measured. The winners for MBPEP are bolded in Table 4.
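The corruption step of this protocol can be sketched as follows, assuming zero-mean Gaussian feature noise at the stated variances; the function name, seed, and placeholder training matrix are illustrative, not the paper's implementation.

```python
import numpy as np

def corrupt(X, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to the
    training features, as in the robustness experiment."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)

X_train = np.zeros((4, 3))  # placeholder training features
for var in (0.00, 0.02, 0.04, 0.08):
    X_noisy = corrupt(X_train, var)
    # train and evaluate the ensemble on X_noisy here; test errors are
    # then averaged over 50 repetitions as reported in Table 4
```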

In Table 4, it can be seen that the test errors of all methods increase with the intensity of the Gaussian noise. For example, MBPEP achieves a 21.4% test error when there is no noise corrupting the training set of the sonar dataset, but the test error reaches 32.3% when the training set is corrupted by Gaussian noise with a variance of 0.08. According to Table 4, MBPEP achieves the lowest test error on 5, 5, 6, and 7 of the 7 datasets under the respective noise intensities. Specifically, when the variance intensity of the

Gaussian noise is larger than 0.04, MBPEP performs better than the other methods. For example, the test error of MBPEP is 20.80% on the waveform dataset when there is no noise corrupting the training set, which is not the best classification performance among all of the comparison methods; but when the variance of the noise is larger than 0.04, MBPEP achieves the lowest test errors. The number of wins for MBPEP increases with noise intensity, which demonstrates that MBPEP is robust with respect to Gaussian noise.

3.6. Application to the Pattern Recognition Task. MBPEP is applied to handwritten digital character pattern recognition.

Table 4: Test errors (%) in the noisy classification problem.

Noise  Data sets      Ours   Full bagging  RE pruning  Kappa pruning  CM pruning  MDM pruning  EA pruning
0.00   Diabetes       22.10  25.10         24.30       24.30          24.30       24.20        24.00
       German         22.50  25.30         24.10       24.10          23.90       24.00        24.30
       Heart-statlog  23.50  19.50         18.70       20.10          19.90       19.90        19.60
       Ionosphere      5.70   9.20          8.60        8.40           8.90       10.00         9.30
       Sonar          21.40  26.60         26.70       24.90          25.00       26.80        25.10
       Vehicle        21.90  22.80         22.60       23.30          23.40       24.40        23.00
       Waveform       20.80  22.20         20.50       20.90          20.30       20.10        20.50
0.02   Diabetes       24.20  26.60         26.60       26.30          26.70       26.40        27.00
       German         24.40  27.70         27.70       28.10          27.00       27.10        27.60
       Heart-statlog  24.30  23.20         22.70       21.60          22.00       22.20        23.10
       Ionosphere      7.20  13.10         11.80       13.00          11.50       11.30        12.20
       Sonar          23.40  28.10         24.00       26.20          24.60       23.50        25.30
       Vehicle        25.80  33.00         29.90       29.90          29.60       30.10        29.80
       Waveform       25.20  26.80         24.30       25.80          24.20       23.90        24.50
0.04   Diabetes       27.30  30.10         29.60       29.70          29.60       29.40        30.10
       German         29.10  31.00         30.10       31.30          29.90       29.70        30.70
       Heart-statlog  28.00  28.20         28.10       26.50          27.50       27.50        28.00
       Ionosphere     11.40  17.70         15.80       18.30          15.70       15.90        17.50
       Sonar          25.10  30.10         28.10       29.60          27.70       25.90        28.30
       Vehicle        25.10  30.10         28.10       29.60          27.70       25.90        28.30
       Waveform       27.50  30.10         28.10       29.20          27.90       27.60        28.60
0.08   Diabetes       27.50  30.10         28.10       29.20          27.90       27.60        28.60
       German         33.60  36.20         36.10       36.40          35.90       36.10        38.20
       Heart-statlog  32.40  33.40         32.60       32.90          33.20       33.30        34.40
       Ionosphere     26.40  27.50         26.50       27.60          26.60       26.50        27.90
       Sonar          32.30  36.10         36.50       38.40          35.70       36.40        36.50
       Vehicle        38.70  43.00         40.60       42.00          40.50       40.20        41.60
       Waveform       34.30  37.60         35.80       36.80          35.60       35.30        37.40

Figure 5: Examples of handwritten digits from the MNIST dataset.


Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 grayscale images for training and 10,000 for testing. Each image is mapped to one of 10 classes, the digits 0 to 9. Figure 5 shows samples from MNIST. To evaluate the generalization performance of the pruning ensemble method on MNIST, it was tested 30 times.

In this section, MBPEP and the five dynamic pruning methods that were mentioned in Section 3.2 are applied to the deep forest framework, multigrained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average max process in gc-Forest has been replaced by dynamic pruning. We call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules in gc-Forest, i.e., multigrained scanning and cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.
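The difference between the original average-max step and the pruned variant can be sketched on toy per-class probability vectors. The competence scores and the 0.8 threshold below are illustrative stand-ins, not the actual MBPEP criterion.

```python
import numpy as np

# Class-probability vectors produced by four forests in one cascade layer
# for a single two-class sample (hypothetical values).
probs = np.array([[0.7, 0.3],
                  [0.6, 0.4],
                  [0.1, 0.9],
                  [0.1, 0.9]])
# Hypothetical competence scores for the four forests (e.g. validation accuracy).
val_acc = np.array([0.9, 0.85, 0.4, 0.45])

# Original gc-Forest: average all class vectors, then take the arg max.
original = probs.mean(axis=0).argmax()

# Modified version: a dynamic-pruning step first keeps only the competent
# forests (here, those scoring at least 0.8), then averages and takes arg max.
experts = probs[val_acc >= 0.8]
modified = experts.mean(axis=0).argmax()

print(original, modified)  # 1 0
```

On this toy sample the two incompetent forests outvote the competent ones under plain averaging, while the pruned layer follows the competent experts instead.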

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version. The raw input features are scaled using the scanning method into the novel features (printed in blue blocks); the scaling process is determined by the scanning window sizes. The generated features are concatenated into the large feature vectors (printed in red blocks) that then enter the cascade forest. To encourage diversity and improve pruning efficiency in the construction of the cascade forest, each layer consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance is below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error, in this section. Specifically, the original version of gc-Forest is used as a baseline. The results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble

Figure 6: Modified version of gc-Forest. (Flowchart: inputs pass through multi-grained scanning, then a cascade forest of random forests with a dynamic pruning step before the output.)

Figure 7: Performance of pruning ensemble algorithms with full bagging on digital character recognition. (a) Improvement ratio of the classification performance. (b) Reduction percentage of the ensemble size. (The bar charts compare MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA.)


size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

According to Figure 7(a), it can be seen that not all ensemble methods improve the classification accuracy when compared with the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over that of the original version, while the other methods improve the classification accuracy by at most 0.2%. From Figure 7(b), the results show that MBPEP stores fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining these sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, with less storage and higher computational efficiency than the other pruning methods.
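The two quantities reported above can be made concrete with a short sketch. The accuracy and ensemble-size values below are hypothetical, and the improvement ratio is taken here as a simple difference over the baseline, which is an assumption about the metric's exact definition.

```python
def improvement_ratio(baseline_acc, new_acc):
    """Improvement of classification accuracy over the gc-Forest baseline
    (assumed here to be a simple difference, in percentage points)."""
    return new_acc - baseline_acc

def reduction_percentage(original_size, pruned_size):
    """Percentage of sublearners no longer queried after pruning."""
    return 100.0 * (original_size - pruned_size) / original_size

# Hypothetical values: pruning 100 sublearners down to 36 cuts the number
# of majority-vote queries by 64%.
print(reduction_percentage(100, 36))  # 64.0
```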

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed, since selecting the optimal ensemble size is an NP-hard optimization problem. Many previous publications mainly address the pruning criteria during the pruning stage, and few studies utilize the local margin information of learners to select the expert in an ensemble. In this paper, MBPEP is proposed.

First, the Pareto optimization algorithm is used by MBPEP to globally search for the feasible ensemble pool size during the overproduction stage. The "domination" computation between the learning performance and the ensemble pool size is continuously estimated by Pareto optimization. Second, the margin entropy of every learner in the ensemble is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP with respect to other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework to conduct handwritten digital character recognition tasks: the average max process of the original gc-Forest is replaced by MBPEP. This modified version of gc-Forest can improve the classification accuracy while using fewer sublearners when combining the final hypotheses.

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning construction has become an advanced research direction in machine learning. For example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (Grant no. 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.

[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.

[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.

[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.

[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.

[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.

[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.

[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.

[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.

[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, Vol. 2, Oxford University Press, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3–4, 2005.

[34] G. Martínez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.


[7] C Porcel A Tejeda-Lorente M A Herrera-Viedma andE Herrera ldquoA hybrid recommender system for the selectivedissemination of research resources in a technology transferofficerdquo Information Sciences vol 184 no 1 pp 1ndash19 2012

[8] D Nucci F Palomba R Oliveto and A Lucia ldquoDynamicselection of classifiers in bug prediction an adaptive methodrdquoIEEE Transactions on Emerging Topics in Computational In-telligence vol 1 no 3 pp 202ndash212 2017

[9] R Hu Q Huang H Wang J He and S Chang ldquoMonitor-based spiking recurrent network for the representation ofcomplex dynamic patternsrdquo International Journal of NeuralSystems article 1950006 2017

[10] Y Lecun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[11] M Wozniak M Grantildea and E Corchado ldquoA survey ofmultiple classifier systems as hybrid systemsrdquo InformationFusion vol 16 pp 3ndash17 2014

Computational Intelligence and Neuroscience 11

[12] R M O Cruz H Zakane R Sabourin and G CavalcantildquoDynamic ensemble selection VS K-NN why and whendynamic selection obtains higher classification performancerdquo2018 httparxivorgabs180407882

[13] A Lazarevic and Z Obradovic ldquoEffective pruning of neuralnetwork classifiersrdquo in Proceedings of the 14th InternationalJoint Conference on Neural Networks vol 1ndash4 pp 796ndash801Washington DC USA July 2001

[14] K Woods W P Kegelmeyer and K Bowyer ldquoCombinationof multiple classifiers using local accuracy estimatesrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 19 no 4 pp 405ndash410 1997

[15] G Giacinto and F Roli ldquoDynamic classifier selection based onmultiple classifier behaviourrdquo Pattern Recognition vol 34no 9 pp 1879ndash1881 2001

[16] B Antosik and M Kurzynski ldquoNew measures of classifiercompetence ndash heuristics and application to the design ofmultiple classifier systemsrdquo in Proceedings of 7th InternationalConference on Computer Recognition Systems vol 4pp 197ndash206 Wroclaw Poland 2011

[17] D H Lobato G M Martinez and A Suarez ldquoStatisticalinstance-based pruning in ensembles of independent classi-fiersrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 31 no 2 pp 364ndash369 2009

[18] N Li and Z H Zhou ldquoSelective ensemble under regulari-zation frameworkrdquo in Proceedings of the 8th InternationalWorkshop on Multiple Classifier Systems vol 5519 pp 293ndash303 Reykjavik Iceland June 2009

[19] A H R Ko R Sabourin and J Britto ldquoFrom dynamicclassifier selection to dynamic ensemble selectionrdquo PatternRecognition vol 47 no 11 pp 3665ndash3680 2014

[20] R M O Cruz R Sabourin and G D C Cavalcanti ldquoOnmeta-learning for dynamic ensemble selectionrdquo in Pro-ceedings of the International Workshop on Multiple ClassifierSystems pp 157ndash166 Naples Italy June 2011

[21] G Davis S Mallat and M Avellaneda ldquoAdaptive greedyapproximationsrdquo Constructive Approximation vol 13 no 1pp 57ndash98 1997

[22] C Qian J Shi Y Yu K Tang and Z Zhou ldquoParallel Paretooptimization for subset selectionrdquo in Proceedings of the 25thInternational Joint Conference on Artificial Intelligencepp 1939ndash1945 New York NY USA July 2016

[23] Q Lu J Ren and ZWang ldquoUsing genetic programming withprior formula knowledge to solve symbolic regressionproblemrdquo Computational Intelligence and Neurosciencevol 2016 Article ID 1021378 17 pages 2016

[24] H Song Z Jiang A Men and B Yang ldquoA hybrid semi-supervised anomaly detection model for high-dimensionaldatardquo Computational Intelligence and Neuroscience vol 2017Article ID 8501683 9 pages 2017

[25] L Kuncheva ldquoClustering and selection model for classifiercombinationrdquo in Proceedings of the 4th International Conferenceon Knowledge-Based Intelligent Information Engineering Systemsand Allied Technologies pp 185ndash188 Brighton UK 2000

[26] T Woloszynski and M Kurzynski ldquoA probalistic model ofclassifier competence for dynamic ensemble selectionrdquo Pat-tern Recognition vol 41 no 10-11 pp 2656ndash2668 2011

[27] Z-H Zhou J Wu and W Tang ldquoEnsembling neural net-works many could be better than allrdquo Artificial Intelligencevol 137 no 1-2 pp 239ndash263 2002

[28] P Castro G P Coelho M F Caetano and V ZubenldquoDesigning ensembles of fuzzy classification systems animmune-inspired approachrdquo in Proceedings of the 4th

International Conference on Artificial Immune Systemsvol 3627 pp 469ndash482 Banff AB Canada August 2005

[29] A Kirshners S Parshutin and H Gorskis ldquoEntropy-basedclassifier enhancement to handle imbalanced class problemrdquoin Proceedings of the International Conference on Technologyand Education vol 104 pp 586ndash591 Balikpapan Indonesia2017

[30] C Blake E Keogh and C J Merz ldquoUCI repository of ma-chine learning databasesrdquo 1998 httpsarchiveicsuciedumlindexphp

[31] T Back Evolutionary Algorithms in eory and PracticeEvolution Strategies Evolutionary Programming Genetic Al-gorithms Oxford University Press Vol 2 Oxford UK 1997

[32] D Margineantu and T G Dietterich ldquoPruning adaptiveboostingrdquo in Proceedings of the 14th International Conferenceon Machine Learning pp 211ndash218 Nashville TN USA July1997

[33] L I Kuncheva ldquoDiversity in multiple classifier systemsrdquoInformation Fusion vol 6 no 1 pp 3-4 2005

[34] G Martinezmuoz and A Suarez ldquoAggregation ordering inbaggingrdquo in Proceedings of 8th International Conference onArtificial Intelligence and Applications vol 1 pp 258ndash263Innsbruck Austria 2004

[35] Q Umer H Liu and Y Sultan ldquoEmotion based automatedpriority prediction for bug reportsrdquo IEEE Access vol 6pp 35743ndash35752 2018

[36] J Demsar ldquoStatistical comparisons of classifiers over multipledata setsrdquo Journal of Machine Learning Research vol 7 no 1pp 1ndash30 2006

[37] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 10: Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning ...downloads.hindawi.com/journals/cin/2019/7560872.pdf · Regarding dynamic ensemble pruning, the size of the ensemble pool

Digital character recognition is an important research field in computer vision. MNIST [37] is a standard digital character benchmark dataset that contains 60,000 grayscale images for training and 10,000 for testing. Each image is mapped to one of 10 classes corresponding to the digits 0 to 9. Figure 5 shows samples from MNIST. To evaluate the generalization performance of each pruning ensemble method on MNIST, it was tested 30 times.
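The 30-run evaluation protocol can be sketched as a small harness (a hypothetical helper, not code from the paper): `train_and_test` stands in for any function that trains a pruned ensemble with a given random seed and returns its test accuracy.

```python
import random
import statistics

def evaluate_repeatedly(train_and_test, n_trials=30, seed=0):
    """Run the full train/test cycle n_trials times with different random
    seeds and summarize the resulting accuracies, as in the 30-run protocol."""
    rng = random.Random(seed)
    accuracies = [train_and_test(rng.randrange(2**32)) for _ in range(n_trials)]
    return statistics.mean(accuracies), statistics.pstdev(accuracies)
```

Reporting the mean together with the spread over trials is what makes the comparison between pruning methods on MNIST statistically meaningful.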

In this section, MBPEP and the five dynamic pruning methods mentioned in Section 3.2 are applied to the deep forest framework multi-grained cascade forest (gc-Forest) [2]. To validate the efficiency of MBPEP, the average-max process of gc-Forest is replaced by dynamic pruning; we call gc-Forest with the dynamic pruning process the modified version of gc-Forest. The two modules in gc-Forest, i.e., multi-grained scanning and the cascade forest, are retained. The modified version of gc-Forest is shown in Figure 6.

According to Figure 6, the overall learning procedure of the modified version of gc-Forest has the same construction as the original version of gc-Forest. The raw input features are scanned into novel features (printed in blue blocks); the scanning process is determined by the scanning window sizes. The generated features are concatenated into large feature vectors (printed in red blocks) that are then fed into the cascade forest. To encourage diversity and improve pruning efficiency in the construction of the cascade forest, each layer consists of 100 random forests in this section. The number of layers is self-adapted until the validation performance is below the error tolerance. We introduce a metric, namely, the improvement ratio of the test error; specifically, the original version of gc-Forest is used as a baseline. The results are shown in Figure 7(a). In addition, the percentage of reduction in the number of optimized learners compared with the original ensemble size is shown in Figure 7(b). The different colors in Figure 7 denote the different ensemble methods.

Figure 6: Modified version of gc-Forest. Inputs pass through multi-grained scanning into random forests, followed by dynamic pruning within the cascade forest to produce the output.

Figure 7: Performance of pruning ensemble algorithms with full bagging on digital character recognition: (a) improvement ratio of the classification accuracy; (b) percentage of reduction of the ensemble size. The compared methods are MBPEP, complementarity pruning, margin distance minimization pruning, reduce-error pruning, and EA.
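How the dynamic pruning step slots into the cascade in place of the average-max aggregation can be illustrated with a toy sketch (hypothetical helpers, assuming callable sublearners; MBPEP's actual criterion uses margin entropy and Pareto search rather than the plain validation accuracy used here):

```python
from collections import Counter

def majority_vote(learners, x):
    """Aggregate the predictions of the retained sublearners by majority voting."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]

def dynamic_prune(learners, validation, k):
    """Toy stand-in for the dynamic pruning step: keep the k sublearners with
    the highest validation accuracy."""
    def val_acc(learner):
        return sum(learner(x) == y for x, y in validation) / len(validation)
    return sorted(learners, key=val_acc, reverse=True)[:k]

def modified_gcforest_predict(layers, validation, x, k=3):
    """Each cascade layer is pruned into an expert before voting, replacing
    the original average-max step of gc-Forest."""
    prediction = None
    for layer in layers:  # layers: lists of trained sublearners per cascade level
        expert = dynamic_prune(layer, validation, k)
        prediction = majority_vote(expert, x)
    return prediction
```

The point of the sketch is only the control flow: pruning happens per layer, so weak sublearners never reach the voting stage.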

According to Figure 7(a), not all ensemble methods improve the classification accuracy compared with the original version of gc-Forest. The modified version of gc-Forest with MBPEP improves the classification accuracy by 0.4% over the original version, while the other methods improve it by at most 0.2%. From Figure 7(b), the results show that MBPEP stores fewer sublearners during aggregation: MBPEP reduces the query quantity by approximately 64.4% when combining these sublearners into the final hypotheses. The other four ensemble methods need to query more sublearners, which increases the time consumption of the majority voting process. Thus, MBPEP has apparent advantages, with less storage and higher computational efficiency than the other pruning methods.

4. Conclusions

The dynamic ensemble pruning technique is an important strategy for improving the performance of ensemble classifiers when a subset of the ensemble pool is used as a classification expert with respect to the incoming set. During the overproduction stage of the dynamic ensemble pruning model, the initial ensemble pool size is fixed because selecting the optimal ensemble size is an NP-hard optimization problem. Most previous publications address the pruning criteria during the pruning stage, and few studies utilize the local margin information of the learners to select the expert in an ensemble. In this paper, MBPEP is proposed to address these limitations.

First, the Pareto optimization algorithm is used by MBPEP to globally search for a feasible ensemble pool size during the overproduction stage; the "domination" relation between learning performance and ensemble pool size is continuously evaluated by Pareto optimization. Second, the margin entropy of every learner in the ensemble is used to locally determine the expert that can detect the classes in the indecision region. Several experiments are conducted to demonstrate the advantages of MBPEP over other state-of-the-art pruning methods (RE pruning, Kappa pruning, complementarity measure pruning, MDM pruning, RRC pruning, KNORA-Eliminate, META-DES, and the EA ensemble method). Finally, MBPEP is applied to the deep forest framework for handwritten digital character recognition tasks: the average-max process of the original gc-Forest is replaced by MBPEP. This modified version of gc-Forest improves the classification accuracy while using fewer sublearners when combining the final hypotheses.
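The "domination" computation in this bi-objective search can be made concrete with a minimal sketch (hypothetical helper names; both objectives, here validation error and ensemble size, are minimized):

```python
def dominates(a, b):
    """a and b are (validation_error, ensemble_size) pairs; a dominates b if it
    is no worse in both objectives and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidate ensembles that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]
```

For candidates such as (0.10, 12), (0.08, 20), (0.15, 5), and (0.12, 12), only the last is dominated, so the front keeps the first three: none of the retained pool sizes can be shrunk without losing accuracy, or made more accurate without growing.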

In the future, we would like to focus on merging ensemble methods with deep learning frameworks. The combination of ensemble methods and deep learning architectures has become an active research direction in machine learning; for example, deep ensemble learning [2] has been demonstrated to be a powerful tool for addressing various recognition tasks.

Data Availability

The datasets generated and analysed during the current study are available in the UCI repository (https://archive.ics.uci.edu/ml/index.php) and the MNIST repository (http://yann.lecun.com/exdb/mnist).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Science Foundation for Young Scientists of China (61803107) and the Science and Technology Planning Project of Guangzhou, China (grant number 201803020025).

References

[1] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.
[2] Z. Zhou and J. Feng, "Deep forest: towards an alternative to deep neural networks," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, August 2017.
[3] R. Hu, S. Chang, Q. Huang, H. Wang, and J. He, "Efficient multispike learning for spiking neural networks using probability-modulated timing method," IEEE Transactions on Neural Networks and Learning Systems, 2017.
[4] P. R. L. Almeida, E. J. Silva, T. M. Celinski, A. S. Britto, L. E. S. Oliveira, and A. L. Koerich, "Music genre classification using dynamic selection of ensemble of classifiers," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2700–2705, Banff, AB, Canada, October 2017.
[5] S. Lessmann, B. Baesens, H. V. Seow, and L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.
[6] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463–484, 2012.
[7] C. Porcel, A. Tejeda-Lorente, M. A. Herrera-Viedma, and E. Herrera, "A hybrid recommender system for the selective dissemination of research resources in a technology transfer office," Information Sciences, vol. 184, no. 1, pp. 1–19, 2012.
[8] D. Nucci, F. Palomba, R. Oliveto, and A. Lucia, "Dynamic selection of classifiers in bug prediction: an adaptive method," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 3, pp. 202–212, 2017.
[9] R. Hu, Q. Huang, H. Wang, J. He, and S. Chang, "Monitor-based spiking recurrent network for the representation of complex dynamic patterns," International Journal of Neural Systems, article 1950006, 2017.
[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[11] M. Wozniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3–17, 2014.
[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection VS K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.
[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.
[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.
[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.
[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.
[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.
[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.
[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.
[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.
[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.
[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.
[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.
[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.
[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.
[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.
[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.
[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.
[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.
[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.
[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, vol. 2, Oxford University Press, Oxford, UK, 1997.
[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.
[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3-4, 2005.
[34] G. Martínez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of the 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.
[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.
[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.
[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 11: Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning ...downloads.hindawi.com/journals/cin/2019/7560872.pdf · Regarding dynamic ensemble pruning, the size of the ensemble pool

size is shown in Figure 7(b) -e different colors in Figure 7denote the different ensemble methods

According to Figure 7(a) it can be seen that not all en-semble methods can improve the classification accuracy whencompared with the original version of gc-Forest-emodifiedversion of gc-Forest with MBPEP could improve the classi-fication accuracy by 04 over that of the original versionMeanwhile the other methods can improve the classificationaccuracy by 02 From Figure 7(b) the results show thatMBPEP can store fewer sublearners during aggregationMBPEP reduces the query quantity by approximately 644for the MBPEP algorithm when combining these sublearnersinto the final hypotheses -e other four ensemble methodsneed to query more sublearners which increases the timeconsumption of the majority voting process -us MBPEPhas apparent advantages with less storage and higher com-putational efficiency than the other pruning methods

4 Conclusions

-e dynamic ensemble pruning technique is an importantstrategy for improving the performance of ensembleclassifiers when a subset of ensemble pools is used as aclassification expert with respect to the incoming setDuring the overproduction stage of the dynamic ensemblepruning model since selecting the optimal ensemble size isan NP-hard optimization problem the initial ensemblepool size is fixed Many previous publications mainly ad-dress the pruning criteria during the pruning stage and fewstudies utilize the local margin information for learners toselect the expert in an ensemble In this paper MBPEP isproposed

First the Pareto optimization algorithm is used byMBPEP to globally search for the feasible ensemble pool sizeduring the overproduction stage -e ldquodominationrdquo com-putation between the learning performance and the en-semble pool size is continuously estimated by Paretooptimization Second the margin entropy for every learnerin the ensembles is used to locally determine the expert thatcan detect the classes in the indecision region Several ex-periments are conducted to demonstrate the advantages ofMBPEP with respect to other state-of-art pruning methods(RE pruning Kappa pruning complementarity measurepruning MDM pruning RRC pruning KNORA-EliminateMETA-DES and EA ensemble method) Finally MBPEP isapplied to the deep forest framework to conduct hand-written digital character recognition tasks Compared to theoriginal version of gc-Forest the average max process hasbeen replaced by MBPEP-based gc-Forest -is modifiedversion of gc-Forest with MBPEP can improve the classi-fication accuracy while using fewer sublearners whencombining the final hypotheses

In the future we would like to focus on merging en-semble methods with deep learning frameworks -ecombination of ensemble methods and deep learningconstruction has become an advanced research directionin machine learning For example deep ensemble learning[2] has been demonstrated as a powerful tool foraddressing various recognition tasks

Data Availability

-e datasets generated during and analysed during thecurrent study are available in the UCI repository (httpsarchiveicsuciedumlindexphp) and theMNISTrepository(httpyannlecuncomexdbmnist)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is work was supported by the National Science Foun-dation for Young Scientists of China (61803107)-e Scienceand Technology Planning Project of Guangzhou China(Grant number 201803020025)

References

[1] R M O Cruz R Sabourin and G D C Cavalcanti ldquoDy-namic classifier selection recent advances and perspectivesrdquoInformation Fusion vol 41 pp 195ndash216 2018

[2] Z Zhou and J Feng ldquoDeep forest towards an alternative todeep neural networksrdquo in Proceedings of the 26th In-ternational Joint Conference on Artificial Intelligence Mel-bourne VIC Australia August 2017

[3] R Hu S Chang Q Huang H Wang and J He ldquoEfficientmultispike learning for spiking neural networks usingprobability-modulated timing methodrdquo IEEE Transactions onNeural Networks and Learning Systems 2017

[4] P R L Almeida E J Silva T M Celinski A S BrittoL E S Oliveira and A L Koerich ldquoMusic genre classificationusing dynamic selection of ensemble of classifiersrdquo in Pro-ceedings of IEEE International Conference on Systems Man andCybernetics pp 2700ndash2705 Banff AB Canada October 2017

[5] S Lessmann B Baesens H V Seow and L C -omasldquoBenchmarking state-of-art classification algorithms forcredit scoringrdquo Journal of the Operational Research Societyvol 54 no 6 pp 627ndash635 2003

[6] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging- boosting- and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part C(Applications and Reviews) vol 42 no 4 pp 463ndash484 2012

[7] C Porcel A Tejeda-Lorente M A Herrera-Viedma andE Herrera ldquoA hybrid recommender system for the selectivedissemination of research resources in a technology transferofficerdquo Information Sciences vol 184 no 1 pp 1ndash19 2012

[8] D Nucci F Palomba R Oliveto and A Lucia ldquoDynamicselection of classifiers in bug prediction an adaptive methodrdquoIEEE Transactions on Emerging Topics in Computational In-telligence vol 1 no 3 pp 202ndash212 2017

[9] R Hu Q Huang H Wang J He and S Chang ldquoMonitor-based spiking recurrent network for the representation ofcomplex dynamic patternsrdquo International Journal of NeuralSystems article 1950006 2017

[10] Y Lecun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[11] M Wozniak M Grantildea and E Corchado ldquoA survey ofmultiple classifier systems as hybrid systemsrdquo InformationFusion vol 16 pp 3ndash17 2014

Computational Intelligence and Neuroscience 11

[12] R. M. O. Cruz, H. Zakane, R. Sabourin, and G. Cavalcanti, "Dynamic ensemble selection vs. K-NN: why and when dynamic selection obtains higher classification performance," 2018, http://arxiv.org/abs/1804.07882.

[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifiers," in Proceedings of the 14th International Joint Conference on Neural Networks, vol. 1–4, pp. 796–801, Washington, DC, USA, July 2001.

[14] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405–410, 1997.

[15] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, vol. 34, no. 9, pp. 1879–1881, 2001.

[16] B. Antosik and M. Kurzynski, "New measures of classifier competence – heuristics and application to the design of multiple classifier systems," in Proceedings of the 7th International Conference on Computer Recognition Systems, vol. 4, pp. 197–206, Wroclaw, Poland, 2011.

[17] D. H. Lobato, G. M. Martinez, and A. Suarez, "Statistical instance-based pruning in ensembles of independent classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 364–369, 2009.

[18] N. Li and Z. H. Zhou, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems, vol. 5519, pp. 293–303, Reykjavik, Iceland, June 2009.

[19] A. H. R. Ko, R. Sabourin, and J. Britto, "From dynamic classifier selection to dynamic ensemble selection," Pattern Recognition, vol. 47, no. 11, pp. 3665–3680, 2014.

[20] R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "On meta-learning for dynamic ensemble selection," in Proceedings of the International Workshop on Multiple Classifier Systems, pp. 157–166, Naples, Italy, June 2011.

[21] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[22] C. Qian, J. Shi, Y. Yu, K. Tang, and Z. Zhou, "Parallel Pareto optimization for subset selection," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1939–1945, New York, NY, USA, July 2016.

[23] Q. Lu, J. Ren, and Z. Wang, "Using genetic programming with prior formula knowledge to solve symbolic regression problem," Computational Intelligence and Neuroscience, vol. 2016, Article ID 1021378, 17 pages, 2016.

[24] H. Song, Z. Jiang, A. Men, and B. Yang, "A hybrid semi-supervised anomaly detection model for high-dimensional data," Computational Intelligence and Neuroscience, vol. 2017, Article ID 8501683, 9 pages, 2017.

[25] L. Kuncheva, "Clustering and selection model for classifier combination," in Proceedings of the 4th International Conference on Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, pp. 185–188, Brighton, UK, 2000.

[26] T. Woloszynski and M. Kurzynski, "A probabilistic model of classifier competence for dynamic ensemble selection," Pattern Recognition, vol. 41, no. 10-11, pp. 2656–2668, 2011.

[27] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.

[28] P. Castro, G. P. Coelho, M. F. Caetano, and V. Zuben, "Designing ensembles of fuzzy classification systems: an immune-inspired approach," in Proceedings of the 4th International Conference on Artificial Immune Systems, vol. 3627, pp. 469–482, Banff, AB, Canada, August 2005.

[29] A. Kirshners, S. Parshutin, and H. Gorskis, "Entropy-based classifier enhancement to handle imbalanced class problem," in Proceedings of the International Conference on Technology and Education, vol. 104, pp. 586–591, Balikpapan, Indonesia, 2017.

[30] C. Blake, E. Keogh, and C. J. Merz, "UCI repository of machine learning databases," 1998, https://archive.ics.uci.edu/ml/index.php.

[31] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, Oxford University Press, Vol. 2, Oxford, UK, 1997.

[32] D. Margineantu and T. G. Dietterich, "Pruning adaptive boosting," in Proceedings of the 14th International Conference on Machine Learning, pp. 211–218, Nashville, TN, USA, July 1997.

[33] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, no. 1, pp. 3-4, 2005.

[34] G. Martinez-Muñoz and A. Suarez, "Aggregation ordering in bagging," in Proceedings of the 8th International Conference on Artificial Intelligence and Applications, vol. 1, pp. 258–263, Innsbruck, Austria, 2004.

[35] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752, 2018.

[36] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006.

[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
