Applied Soft Computing 49 (2016) 1020–1033
Contents lists available at ScienceDirect: Applied Soft Computing
Journal homepage: www.elsevier.com/locate/asoc

Iterative software fault prediction with a hybrid approach

Ezgi Erturk a,*, Ebru Akcapinar Sezer b
a The Scientific and Technological Research Council of Turkey (TUBITAK), Software Technologies Research Institute, 06100 Ankara, Turkey
b Hacettepe University, Department of Computer Engineering, 06800 Ankara, Turkey
* Corresponding author. E-mail addresses: [email protected] (E. Erturk), [email protected] (E. Akcapinar Sezer).

Article history: Received 3 November 2015; Received in revised form 12 July 2016; Accepted 12 August 2016; Available online 17 August 2016

Keywords: Software fault prediction; Iterative prediction; Artificial neural network; Fuzzy inference systems; Adaptive neuro fuzzy inference system

Abstract

In this study, we consider a software fault prediction task that can assist a developer during the lifetime of a project. We aim to improve the performance of the software fault prediction task while keeping it applicable. Initial predictions are constructed by Fuzzy Inference Systems (FISs), whereas subsequent predictions are performed by data-driven methods; in this paper, an Artificial Neural Network and an Adaptive Neuro Fuzzy Inference System are employed. We propose an iterative prediction model that begins with a FIS when no data are available for the software project and continues with a data-driven method when adequate data become available. To prove the usability of this iterative prediction approach, software fault prediction experiments are performed using expert knowledge for the initial version and information about previous versions for subsequent versions. The datasets employed in this paper comprise different versions of the Ant, jEdit, Camel, Xalan, Log4j and Lucene projects from the PROMISE repository. The metrics of the models are common object-oriented metrics, such as coupling between objects, weighted methods per class and response for a class. The results of the models are evaluated according to the receiver operating characteristics with the area under the curve approach. The results indicate that the iterative software fault prediction is successful and can be transformed into a tool that automatically locates fault-prone modules due to its well-organized information flow. We also implement the proposed methodology as a plugin for the Eclipse environment.

© 2016 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asoc.2016.08.025
1568-4946/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Software fault prediction (SFP) is a classification task that determines whether a software module is faulty by considering some characteristics or parameters collected from software projects. If the larger scales and different types of current software projects are considered, the importance of identifying fault-prone modules during the development phase can be better understood. Software processes contain software engineering activities, such as maintenance, review, refactoring or testing, and they are as costly as development processes because these activities have to be applied to all modules of large-scale software to achieve a reliable software system. The cost of these activities can be reduced by the following approaches: (i) reduce the number of modules that are processed in the activities and (ii) reduce the need to repeat these activities. SFP becomes a key solution for both approaches because the modules that may require refactoring or testing can be determined by employing SFP before their faults are noticed by people. In this manner, developers can be informed about the modules that are flagged by SFP and handle the review and refactoring process accordingly. If SFP is implemented during coding, developers can be immediately warned about the fault susceptibility of their code. If SFP can detect fault-prone modules in the early stages of software projects, it provides benefits to project teams in terms of time and budget. To render SFP a useful and meaningful activity for software projects, it should be implemented in the early stages of projects.

The majority of studies of SFP address the appropriateness of soft computing methods (e.g., [1,2]) or useful metric groups for SFP models (e.g., [3–5]). Some studies compare the performance of various methods (e.g., Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Trees (DTs) and Adaptive Neuro Fuzzy Inference System (ANFIS)) to determine the most applicable classification method for SFP (e.g., [6,7]). In addition to classification methods, several metric groups, such as process-level metrics, class-level metrics or method-level metrics (e.g., [3]), are defined, and some preprocessing techniques are suggested to extract the most useful metrics (e.g., [8–10]). The experiments in these studies are typically conducted on public datasets; the most popular datasets are available in PROMISE (e.g., [7,11,12]). A detailed list of studies is provided in Section 2.


If we assume that the software fault assessment problem only involves the classification of faulty modules, we suggest that this problem has almost been solved. However, SFP cannot occur in the development process of software projects because the practical use of SFP is not prevalent. The main reason is the approach to constructing prediction models. The dataset that includes collected code metrics for a finalized project is employed for both the testing stage and the training stage while employing a machine learning-based method. This approach has two different problems: (i) the method cannot be applied to the first version of the software due to the lack of labeled or historical data and (ii) predictions become available after a code is completed or a project is finalized. The fault proneness of a completed code does not provide any benefits to the project team because the software code comes into service. Existing faults in software are not detected at an early stage; as a result, the faults reach the end users. However, the primary aim of SFP should be the detection of risky modules of software systems while the systems are developing. As a result, SFP should be implemented from the beginning of software development (coding) activities to the end of a project. SFP can reduce costs in terms of time and efforts consumed in the project. The requirement for a SFP model, which can serve the entire software before it comes into service, should be emphasized. If this requirement is satisfied, then a SFP task can be practically applied. This paper presents a SFP methodology for creating systematic predictions during ongoing development activities, which is integrated in the development environment.

In recent years, traditional software development methods have been replaced with agile software development approaches [13], such as the Agile Unified Process, Extreme Programming, Scrum and Kanban. The advantage of these methods is that they are more suitable for managing and conducting large systems. Agile software development suggests an incremental delivery of software [13]. Software projects that are developed according to agile software development approaches are iteratively produced as versions. The testing and refactoring activities can be conducted in parallel during the development of these versions. Additionally, incremental delivery indicates that a small set of new features is developed and the maintenance of available features is created for each software version, which is available to use. Because each new software version has different features and code parts from previous versions, each version should be individually addressed by an SFP task. However, each version may include classes or code that correspond with a previous version. As a result, a distinct relationship between different versions of the same software project exists. If a fault-prone module or class exists in a version, it may be inherited by later versions. Therefore, the proposed methodology enables the detection of fault-prone modules as much as possible and prevents the inheritance of modules that require refactoring between subsequent versions. Thus, the methodology can be employed to produce more reliable software versions by detecting fault-prone modules when they emerge. Fault-prone modules should be determined during development of the current version to ensure that available faults can be located before the version comes into service. We propose an iterative SFP methodology by considering the version-based development of software projects. The model consists of two types of iterations:

- First Iteration corresponds to the first version of the software and the development of the software; no historical data exists to perform SFP. Therefore, a fault prediction task can be performed for this version using FIS, which is a rule-based soft computing method. Because FIS does not require any historical data and predictions can be made using expert knowledge, software faults can be predicted.

- Later Iteration(s) denotes the iterations after the completion of the first version of the software and the subsequent versions in the beginning of development. The collection of metric data from previous iterations becomes possible. They are considered to be historical data to be employed as a training set, whereas the metrics collected from the current developed version can be considered to be a test set. The use of data-driven methods to make an SFP becomes possible for the project versions subsequent to the initial version. Data-driven methods are preferred because they produce more successful results than rule-based methods; they can learn complexity from data that experts cannot adequately describe. Thus, ANN and ANFIS have been the preferred data-driven methods according to the results of previous studies [2,14].

The use of FIS to evaluate an initial version of a project and the use of data from previous versions in the training phase of a prediction model for subsequent versions are proposed in this paper. Thus, a ready-to-use SFP model exists for each version of software from the beginning of the development phase, and faulty modules of each version can be addressed before the software is opened for service. We conduct some experiments to prove the sufficiency of the iterative model, namely, the beneficial relationships among the versions for making SFP. In the experiments, the object-oriented metrics cbo, wmc and rfc are used because of their common use (e.g., [15–17]). Because the proposed iterative model requires a versioned dataset, the experiments in the study are performed on versions of the Ant, jEdit, Camel, Xalan, Log4j and Lucene projects from the PROMISE repository. The results of the experiments are compared and evaluated according to the criteria of the receiver operating characteristics curves with the area under the curve (ROCAUC). According to the experimental results, FIS is suitable for the first iteration of the prediction methodology, and ANN is preferred for the remaining iterations. After the success of the proposed SFP methodology is demonstrated in the experiments, a plugin is developed for the Eclipse environment to demonstrate that the model is not only successful but also implementable. The developed plugin includes the FIS and ANN methods; it can make predictions for compiled Java files in the selected directory.

The remainder of the paper is organized as follows: a brief summary is provided in Section 2, and an overview of the employed datasets and the metrics cbo, wmc and rfc are introduced in Section 3. In Section 4, the proposed iterative SFP methodology is detailed: first, the definition and structure of the employed methods (FIS, ANN and ANFIS) are presented, and second, the iterative SFP model and the usage of the composition of methods are explained. Section 5 contains the experiments and results, and the implementation of the SFP methodology, the developed Eclipse plugin, is explained in Section 6.

2. Previous SFP studies

Many studies about SFP are briefly summarized to depict the trends and approaches in the literature. For a systematic presentation, studies are classified according to their solution approaches, and a total of seven categories are constructed. All categories are labeled with their common characteristics; relevant studies are listed within these categories, and the resulting list is provided as follows:

(1) The studies that compare the results of various classification methods based on statistical, intelligent or heuristic approaches: Cahill et al. [8] compared the methods of SVM and Naïve Bayes (NB) by conducting experiments on nine datasets from the NASA MDP repository [18];
they determined that NB was more successful than SVM. Carrozza et al. [6] presented the comparison results of DT, Bayesian Networks (BNs), Multinomial Logistic Regression, SVM and NB. According to the comparison, SVM and Multinomial Logistic Regression produced more satisfying results. Carrozza et al. [6] employed an industrial software system as a dataset. Dejaeger et al. [19] compared Logistic Regression, Random Forests (RFs), NB, K2, Augmented NB Classifiers and Max-Min Hill Climbing. They employed eight datasets of the NASA MDP repository and three datasets of the Eclipse Bug repository [20] to perform the comparative experiments. The results of the study indicated that all experimental methods had similar performances, and RF and K2 were slightly more successful than the other methods. Malhotra [7] employed the methods of Linear Regression (LR), ANN, SVM, DT, the Cascade Correlation Network, the Group Method of Data Handling Method and Gene Expression Programming for comparison. She preferred the AR1 and AR6 datasets from the PROMISE repository for the experiments. The experimental results indicate that DT was superior to the remaining compared methods.

(2) The studies that suggest the use of combined methods to achieve successful results for the identification of faultiness: The combinations employed by Cotroneo et al. [21] include NB + logarithmic transformation (log), BN + log, DT + log and LR + log. They utilized the Linux Kernel, MySQL DBMS and CARDAMOM projects as datasets of the experiments. SVM and manual inspection were combined by Kasai et al. [22]. They conducted experiments on a telecommunication system. The method tuple of Mizuno [23] was Fault-prone Filtering + Multi-Variate Bernoulli NB. The combined approach was tested on eight datasets of the PROMISE repository. The list of combined methods by Chen et al. [12] includes TC + Random Sampling (RS) + NB, TC + RS + C4.5 and TC + RS + K-Nearest Neighbor (KNN). They preferred several datasets from the PROMISE and NASA MDP repositories. Khoshgoftaar et al. [24] combined methods, such as C4.5 + case-based learning, with genetic algorithm optimization. They employed an industrial-scale dataset. Li et al. [25] presented the results from the combination of Clustering + C4.5. They preferred Eclipse as the dataset.

(3) The studies that suggest the use of approaches or methods that were not previously employed for the SFP problem: A new method proposed by Abaei et al. [26] was the Self-Organizing Map. The method was tested on the AR3, AR4 and AR5 datasets from the PROMISE repository. The first use of a fuzzy inference system (FIS) for SFP was attempted on datasets from the PROMISE Software Engineering Repository (PROMISE SER) [27] by Erturk and Sezer [14]. Goyal et al. [15] proposed the use of Step-Wise Regression and utilized the Eclipse, Equinox, Lucene, Mylyn and PDE datasets from the Bug Prediction Dataset repository [28] to test the method. Czibula et al. [1] proposed the use of Defect Prediction using Relational Association Rules and utilized ten datasets from the PROMISE repository to conduct experiments. Erturk and Sezer [2] employed ANFIS and preferred datasets from PROMISE SER.

(4) The studies that investigate metrics and attempt to specify useful metrics for SFP models: Catal and Diri [3] investigated method-level, class-level, component-level, file-level, process-level and quantitative-level metrics. Oyetoyan et al. [4] discussed the use of cyclic dependencies and investigated these metrics for a commercial application and several open source projects, such as Eclipse, Apache Camel and Apache Lucene. Radjenovic et al. [5] assessed the metric sets of Chidamber-Kemerer metrics (object-oriented metrics), McCabe metrics (traditional metrics) and process metrics. Erturk and Sezer [29] performed experiments on the KC1 dataset from PROMISE SER to investigate Chidamber-Kemerer metrics and found that the coupling between objects (cbo), weighted methods per class (wmc) and response for a class (rfc) metrics are more useful than the depth of inheritance tree and the lack of cohesion of methods. Wu et al. [30] investigated the effect of developer-quality metrics on several open source projects, such as Pdfbox and Apache Camel. Erturk and Sezer [2] investigated McCabe metrics and found that the number of code lines, cyclomatic complexity and design complexity are more useful metrics than essential complexity.

(5) The studies that suggest preprocessing techniques to eliminate irrelevant software features: Cahill et al. [8] employed Rank Sum, whereas Chen et al. [11] employed two-stage preprocessing. In the first stage, Chi-Square, Symmetrical Uncertainty and ReliefF were utilized, and a clustering-based method was utilized in the second stage. They preferred the KC1, eclipse-2.0, eclipse-2.1 and eclipse-3.0 datasets from the PROMISE repository. Peters et al. [16] presented the results of the Peters Filter using 35 datasets from the PROMISE repository. Shivaji et al. [9] investigated several preprocessing techniques, such as Gain Ratio Attribute Evaluation, Chi-squared Attribute Evaluation, Significance Attribute Evaluation, Relief-F Attribute Selection and Wrapper Methods, and determined that the best performance is exhibited by Significance Attribute Evaluation. They preferred open source projects, such as Eclipse, Apache Columba and Mozilla, as datasets of the study. Lu et al. [31] preferred Multidimensional Scaling as a preprocessor and tested the technique on the KC1, PC1, PC3 and PC4 datasets from the NASA MDP repository. Laradji et al. [10] employed Greedy Forward Selection. They conducted experiments on datasets from the PROMISE (ant-1.7, camel-1.6 and KC3) and NASA MDP (MC1, PC2 and PC4) repositories. Lu et al. [32] compared different types of preprocessing approaches, such as feature selection techniques (Information Gain) and dimensionality reduction techniques (Multidimensional Scaling). They tested the techniques on the eclipse-2.0, eclipse-2.1 and eclipse-3.0 datasets and discovered that dimensionality reduction techniques are more useful than feature selection techniques.

(6) Cross-project defect prediction studies: The main logic of cross-project defect prediction is that a constructed prediction model learns from project A and makes a prediction using these learnings for project B. To apply a cross-project defect prediction solution, Ma et al. [33] used the Transfer NB (TNB) method, which is a transfer learning approach. They applied seven datasets from the PROMISE repository as source data and three datasets (AR3, AR4 and AR5) from the PROMISE repository as target data. Canfora et al. [34] employed the Multi-Objective Genetic Algorithm and tested it on ten datasets of the PROMISE repository. Peters et al. [16] employed the Peters filter + RF, Peters filter + NB, Peters filter + LR and Peters filter + KNN approaches and compared their results. According to the comparison, the best performance was achieved by the Peters filter + RF method. In another study by Peters et al. [35], they employed CLIFF and MORPH and preferred ten datasets from the PROMISE repository. Turhan et al. [36] utilized NB and investigated recall increments and the possibility of false alarm reductions. They performed the experiments on six datasets from the PROMISE repository and three datasets (AR3, AR4 and AR5) from the NASA MDP repository. Ma et al. [37] used the methods of Associative Classification, C4.5, DT, CART, ADT and RIPPER. They tested these methods on 12 datasets from the NASA MDP repository.

(7) The studies that apply solutions for the lack of data in the early life cycles of software projects: To predict the fault proneness of an object class, Kamiya et al. [38] introduced four checkpoints in the analysis/design/implementation phases and used applicable Chidamber-Kemerer metrics for the prediction tasks at each checkpoint. The experiments were performed on a private project of a software company.

Table 1
Details of datasets.

Dataset | # of Instances | # of Defective Instances | Rate of Defective Instances | cbo Min/Avg/Max | rfc Min/Avg/Max | wmc Min/Avg/Max
ant-v1.3 | 125 | 20 | 0.16 | 0 / 10.43 / 103 | 0 / 34.37 / 186 | 0 / 10.51 / 71
ant-v1.4 | 178 | 40 | 0.225 | 0 / 10.75 / 136 | 0 / 33.84 / 196 | 0 / 10.49 / 77
ant-v1.5 | 293 | 32 | 0.109 | 0 / 10.65 / 193 | 0 / 30.67 / 213 | 0 / 10.1 / 91
ant-v1.6 | 351 | 92 | 0.262 | 0 / 11.46 / 243 | 0 / 34.23 / 247 | 0 / 11.15 / 100
ant-v1.7 | 745 | 166 | 0.223 | 0 / 11.05 / 499 | 0 / 34.36 / 288 | 0 / 11.07 / 120
jEdit-v3.2 | 272 | 90 | 0.331 | 1 / 12.04 / 162 | 1 / 37.74 / 487 | 1 / 12.53 / 399
jEdit-v4.0 | 306 | 75 | 0.245 | 1 / 12.39 / 184 | 1 / 38.24 / 494 | 1 / 12.88 / 407
jEdit-v4.1 | 312 | 79 | 0.253 | 1 / 12.98 / 197 | 1 / 39.87 / 505 | 1 / 13.13 / 413
jEdit-v4.2 | 367 | 48 | 0.131 | 0 / 14.08 / 258 | 0 / 40.98 / 522 | 0 / 13.16 / 351
jEdit-v4.3 | 492 | 11 | 0.022 | 0 / 14.32 / 346 | 0 / 39.85 / 540 | 0 / 12.35 / 351
camel-1.0 | 339 | 13 | 0.038 | 0 / 9.96 / 185 | 0 / 19.63 / 143 | 0 / 8.07 / 82
camel-1.2 | 608 | 216 | 0.355 | 0 / 10.1 / 272 | 0 / 20.23 / 186 | 0 / 8.31 / 94
camel-1.4 | 872 | 145 | 0.166 | 0 / 10.76 / 389 | 0 / 21.2 / 286 | 0 / 8.52 / 141
camel-1.6 | 965 | 188 | 0.195 | 0 / 11.1 / 448 | 0 / 21.42 / 322 | 0 / 8.57 / 166
xalan-2.4 | 723 | 110 | 0.152 | 0 / 14.5 / 171 | 0 / 30.16 / 355 | 0 / 11.45 / 123
xalan-2.5 | 803 | 387 | 0.482 | 0 / 12.86 / 173 | 0 / 29.6 / 391 | 0 / 11.32 / 130
xalan-2.6 | 885 | 411 | 0.464 | 0 / 12.07 / 168 | 0 / 29.29 / 409 | 0 / 11.03 / 133
xalan-2.7 | 909 | 898 | 0.988 | 0 / 11.99 / 172 | 0 / 29.2 / 428 | 0 / 10.87 / 138
log4j-1.0 | 135 | 34 | 0.252 | 0 / 6.86 / 60 | 0 / 21.4 / 96 | 0 / 6.59 / 49
log4j-1.1 | 109 | 37 | 0.339 | 0 / 6.86 / 46 | 0 / 23.22 / 90 | 0 / 7.82 / 50
log4j-1.2 | 205 | 189 | 0.922 | 0 / 7.61 / 65 | 0 / 25.22 / 321 | 0 / 8.42 / 105
lucene-2.0 | 195 | 91 | 0.467 | 0 / 9.76 / 80 | 1 / 23.15 / 129 | 1 / 9.26 / 57
lucene-2.2 | 247 | 144 | 0.583 | 0 / 9.96 / 118 | 1 / 22.91 / 171 | 1 / 9.36 / 80
lucene-2.4 | 340 | 203 | 0.597 | 0 / 10.75 / 128 | 1 / 25.19 / 392 | 1 / 10.39 / 166
Nagappan and Ball [39] employed two different static analysis tools (PREfix and PREfast) to determine the pre-release defect density for Windows Server 2003. Halim [40] aims to predict fault-prone modules in the design phase of a software development life cycle. He applied metrics collected from a UML class diagram as inputs for NB and KNN models and conducted the experiments on jEdit-3.2.1, jEdit-4.0, jEdit-4.1 and jEdit-4.2 datasets of the PROMISE repository. In a study by Jiang et al. [41], 11 code metrics were replaced by six design metrics using Canonical Correlation Analysis. A total of five classification algorithms (Bagging, LR, Boosting, NB, and RF) were utilized for prediction. The datasets in this paper included 13 datasets from the NASA MDP repository. Lu et al. [31] suggested the fitting-the-confident-fits method, which is a semi-supervised approach because the method can be employed to make predictions with limited labeled data. Lu et al. [32] proposed an active learning approach to make predictions for early phases of each version, with the exception of the initial versions of software projects.

Identification of the faulty modules has been attempted many times with different priorities; the majority of machine learning methods have been employed to make predictions in a classical manner. Some attempts have been made to render predictions in early stages of projects because the timing of predictions is important. The problem that we address in this paper is a common problem in SFP research; however, our solution differs from previous solutions based on its methods and iterative approach.

3. Data

In SFP research, NASA MDP [18] and PROMISE [42] are very popular repositories. However, the PROMISE repository is more common if we consider Section 2 (previous SFP studies) (e.g., [43,44]) and several reviews (e.g., [3,5]) of SFP. The PROMISE repository includes datasets of approximately 65 different software projects that are related with the SFP problem. Because a novel application methodology of methods that were previously employed for datasets of PROMISE is proposed in this paper, the datasets of the PROMISE repository enable us to obtain comparable experimental results.

As a result, the available versions of the Ant, jEdit, Camel, Xalan, Log4j and Lucene projects in the PROMISE repository are selected as datasets. Because the proposed iterative methodology requires versioned datasets of each project, the selected projects from PROMISE satisfy this requirement, and the versions of the listed projects are preferred for the experiments. Ant and jEdit are the projects with the maximum number of versions in PROMISE (5 versions). Camel and Xalan have four versions, whereas Log4j and Lucene have three versions. A short description of each project is as follows: the Apache Ant project is an open source, Java-based, shell-independent build tool, and jEdit is an open source mature programmer's text editor. Apache Camel is an open-source integration framework that is based on Enterprise Integration Patterns, and Xalan-Java is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. Log4j is an open-source logging package for Java, and the Apache Lucene project is an open-source, Java-based search engine that provides indexing and search technology with advanced analysis/tokenization capabilities. The versions of the projects are detailed in Table 1 [45].

The datasets consist of 18 object-oriented metrics. Only some of these metrics are employed in the scope of this study: the three most common object-oriented metrics (cbo, wmc and rfc) from the Chidamber-Kemerer metric category [46] are preferred because a review [5] of software metrics in SFP indicates that the best metrics from the CK metrics suite are cbo, wmc and rfc. The employed metrics exist in 71 of the 94 datasets in the PROMISE repository. Some studies suggest that these metrics are useful as a minimal parameter set to construct successful models for this problem (e.g., [15–17,29,34,47,48]). The details of the metrics are briefly explained as follows [46]:

Coupling between objects: The cbo value for a class indicates the number of classes on which the class depends. For instance, if class A utilizes attributes or methods that belong to class B, class A is dependent on B. If the cbo value increases, maintenance, testability, modularity and reusability decrease and the complexity of the system increases.
Fig. 1. Triangle membership function form.

Weighted methods per class: Wmc is the sum of the complexities of all methods per class. The metric indicates the development and maintenance cost of a class. If the number of methods increases, the specialization degree of the class also increases.

Response for a class: Rfc is the number of methods, including the inherited methods, per class. If the rfc value increases, the testability and cohesion of a class decrease and the responsibility of the class increases.
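The experiments use only these three metrics together with a binary fault label derived from the reported bug count. As a minimal sketch of how one version of a PROMISE-style dataset could be prepared for the models, assuming the file has been exported to CSV with columns named cbo, rfc, wmc and bug (the actual PROMISE files use a different format, so these names are assumptions):

```python
import csv

def load_metrics(path):
    """Read one version file and keep cbo, rfc, wmc plus a binary fault label
    (1 if the recorded bug count is greater than zero, 0 otherwise)."""
    rows = []
    with open(path, newline="") as f:
        for record in csv.DictReader(f):
            rows.append({
                "cbo": float(record["cbo"]),
                "rfc": float(record["rfc"]),
                "wmc": float(record["wmc"]),
                "faulty": 1 if float(record["bug"]) > 0 else 0,
            })
    return rows

# Hypothetical usage: metrics = load_metrics("ant-1.3.csv")
```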

4. Iterative methodology

4.1. Methods

In this study, we aim to predict software faults from the beginning of a project (the moment of t0 for the project) to the end of a project. We propose an iterative model that consists of several classification methods. In the first iteration, FIS is considered because it can make predictions using only expert opinion. Thus, it does not require historical data for prediction tasks. In subsequent iterations, data-driven methods are employed due to their success in classifications. In this study, ANN is the preferred prediction method because the success of the ANN for the SFP problem has been presented in previous studies (e.g., [7,49,50]). Additionally, ANFIS is selected as another method due to its hybrid approach and its usefulness for comparison and justification of results. As a result, three different classification methods (ANN, FIS and ANFIS) are employed to predict software faults in this study, and ANN and ANFIS are applied as opponents. ANFIS and ANN are employed in the proposed SFP methodology. The opposing role of ANFIS against ANN is sourced from the differences in their learning schemes. ANFIS applies expert opinion and data to learn and gradually modifies prediction methods from expert-based approaches (i.e., FIS) to data-driven approaches (i.e., ANN). To represent the ability and limitations of learning from experts, ANFIS is employed in the experiments with ANN. ANFIS is utilized to assess whether the application of expert data in the later stages of a project is an appropriate design.

FIS is a rule-based inference system that is dependent on fuzzy set theory. The basic approach of fuzzy set theory is that 'an element belongs to a set with a particular membership degree' [51]. FIS models are based on linguistic variables and their linguistic values, i.e., fuzzy sets. In this study, the software metrics (cbo, rfc, wmc) correspond to input linguistic variables, and faultiness represents the output linguistic variable. The values of the input linguistic variables are designated as Low, Moderate and High, whereas the output linguistic values are specified as Low and High. Membership functions are employed to convert crisp values (metric values) of linguistic variables to fuzzy values. In this paper, membership functions assume a triangular form (Fig. 1). The rule-based FIS model is composed of If-Then rules. The range and rules of the prepared triangular membership functions for each FIS model in this study are listed in Tables 2 and 3, respectively. Fuzzy sets, fuzzy rules and the membership function properties of an expert system are defined and adjusted by a domain expert. In this study, the Mamdani-type FIS, which is a common FIS model, is employed [52,53]. The Mamdani-type FIS produces the final crisp results in four steps: fuzzification, rule evaluation, aggregation and defuzzification [54] (Fig. 2); additional explanations are provided in the study by Akgun et al. [55]. The common implementation details of FIS for this study are provided in Table 4.

The rules employed in this study were developed by both authors, who are computer engineers. Table 3 lists a total of 27 rules. Rules 1, 2, 3, 4, 7, 10 and 19 produce a Low value (denoted by L) for the dependent (output) parameter, whereas the remaining rules have a High value (denoted by H). Rules are produced in an algorithmic manner: if two or three of the three software metrics have 'L' as the linguistic value in the conditional part, faultiness also has the linguistic value 'L' in the conclusive part of the rule.
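Since the consequent depends only on how many antecedents are Low, the 27 rules of Table 3 can be generated mechanically. A minimal sketch of that generation pattern (for illustration only; the paper lists the rules explicitly in Table 3):

```python
from itertools import product

def build_rules():
    """Enumerate the 27 antecedent combinations in the order of Table 3 and
    set the consequent to Low when at least two of wmc, cbo and rfc are Low."""
    rules = []
    for wmc, cbo, rfc in product("LMH", repeat=3):
        faultiness = "L" if (wmc, cbo, rfc).count("L") >= 2 else "H"
        rules.append(((wmc, cbo, rfc), faultiness))
    return rules

rules = build_rules()
assert len(rules) == 27
# Rules 1, 2, 3, 4, 7, 10 and 19 are the Low-output rules, as in Table 3.
assert [i + 1 for i, (_, out) in enumerate(rules) if out == "L"] == [1, 2, 3, 4, 7, 10, 19]
```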

Table 2
Membership function range of FIS models.

Dataset | cbo (a, b, c) | rfc (a, b, c) | wmc (a, b, c)
ant-v1.3 | 0, 24, 103 | 0, 47, 186 | 0, 17, 71
jEdit-v3.2 | 0, 52, 162 | 0, 31, 487 | 0, 12, 399
camel-1.0 | 0, 67, 185 | 0, 81, 143 | 0, 20, 82
xalan-2.4 | 0, 26, 171 | 0, 168, 355 | 0, 46, 123
log4j-1.0 | 0, 15, 60 | 0, 69, 96 | 0, 12, 49
lucene-2.0 | 0, 16, 80 | 0, 81, 129 | 0, 21, 57

Fig. 2. General structure of Mamdani-type FIS.

Table 3
Rules of FIS model.

Rule No | Rule
1 | IF wmc is L AND cbo is L AND rfc is L THEN faultiness is L
2 | IF wmc is L AND cbo is L AND rfc is M THEN faultiness is L
3 | IF wmc is L AND cbo is L AND rfc is H THEN faultiness is L
4 | IF wmc is L AND cbo is M AND rfc is L THEN faultiness is L
5 | IF wmc is L AND cbo is M AND rfc is M THEN faultiness is H
6 | IF wmc is L AND cbo is M AND rfc is H THEN faultiness is H
7 | IF wmc is L AND cbo is H AND rfc is L THEN faultiness is L
8 | IF wmc is L AND cbo is H AND rfc is M THEN faultiness is H
9 | IF wmc is L AND cbo is H AND rfc is H THEN faultiness is H
10 | IF wmc is M AND cbo is L AND rfc is L THEN faultiness is L
11 | IF wmc is M AND cbo is L AND rfc is M THEN faultiness is H
12 | IF wmc is M AND cbo is L AND rfc is H THEN faultiness is H
13 | IF wmc is M AND cbo is M AND rfc is L THEN faultiness is H
14 | IF wmc is M AND cbo is M AND rfc is M THEN faultiness is H
15 | IF wmc is M AND cbo is M AND rfc is H THEN faultiness is H
16 | IF wmc is M AND cbo is H AND rfc is L THEN faultiness is H
17 | IF wmc is M AND cbo is H AND rfc is M THEN faultiness is H
18 | IF wmc is M AND cbo is H AND rfc is H THEN faultiness is H
19 | IF wmc is H AND cbo is L AND rfc is L THEN faultiness is L
20 | IF wmc is H AND cbo is L AND rfc is M THEN faultiness is H
21 | IF wmc is H AND cbo is L AND rfc is H THEN faultiness is H
22 | IF wmc is H AND cbo is M AND rfc is L THEN faultiness is H
23 | IF wmc is H AND cbo is M AND rfc is M THEN faultiness is H
24 | IF wmc is H AND cbo is M AND rfc is H THEN faultiness is H
25 | IF wmc is H AND cbo is H AND rfc is L THEN faultiness is H
26 | IF wmc is H AND cbo is H AND rfc is M THEN faultiness is H
27 | IF wmc is H AND cbo is H AND rfc is H THEN faultiness is H

Table 4
FIS implementation details.

Fuzzy inference system type | Mamdani
Input fuzzy sets | Low, Moderate, High
Output fuzzy sets | Low, High
Membership function type | Triangle-shaped membership function (Fig. 1)
Input membership function range (Table 2) | (a) 0, (b) Expert observation, (c) Max. value of metric
Output membership function range | (a) 0, (b) 1
Rule count | 27

The fuzzy sets that were specified for the faultiness parameter are designed to address fuzziness at the maximum level (Fig. 1). This type of design makes the system more susceptible to small changes in the independent parameters.

Although the same rules are employed for all projects, different 'a', 'b' and 'c' values are employed in the FIS models of different projects or different versions of the same project. An 'a' value is zero for all software metrics, a 'c' value is the existing maximum value of the relevant metric and a 'b' value is half of 'c'. When the maximum value of one parameter increases, its fuzzy sets change in accordance with this situation. Although the rules are static (Table 3), their behaviors exhibit differences because parameterized membership functions are utilized. Thus, the same rule may produce different membership degrees, which denote the value of L or H, for different fuzzified inputs.
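A minimal sketch of such parameterized triangular membership functions is given below. It assumes one plausible layout of the three input sets (Low peaking at a, Moderate at b, High at c); the exact shapes of Fig. 1 and the 'b' values of Table 2 come from the authors' expert observation, so this is only an illustration of how the (a, b, c) parameterization can be coded.

```python
def tri(x, left, peak, right):
    """Triangular membership degree: 0 outside [left, right], 1 at the peak."""
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def input_sets(a, b, c):
    """Low/Moderate/High sets built from one (a, b, c) row of Table 2
    (assumed layout: Low peaks at a, Moderate at b, High at c)."""
    return {"L": lambda x: tri(x, a, a, b),
            "M": lambda x: tri(x, a, b, c),
            "H": lambda x: tri(x, b, c, c)}

# Example with the ant-v1.3 cbo row of Table 2 (a=0, b=24, c=103):
cbo = input_sets(0, 24, 103)
print(cbo["L"](10), cbo["M"](24), cbo["H"](103))  # ~0.58, 1.0, 1.0
```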

An ANN is a powerful supervised learning method that simulates the structure of the human brain using a network of artificial neurons. The network structure consists of two main elements: neurons and directed, weighted relations that connect neurons to other neurons (Fig. 3). In these relations, some parameters or weights are adjusted in the training phase. ANN models can learn these parameters from data without any intervention and use this knowledge to make predictions or classifications on a test set. Because an ANN performs its task as a black box, the readability and interpretability of ANN models are difficult. An important advantage of ANN is that ANN models can tolerate inconsistent or missing values in datasets. Additional details about ANN are provided in the study of Hecht Nielsen [56]. In our experiments, three-layered ANN models are constructed. The neuron count of the hidden layer is calculated as six according to the inequality "≤2N + 1" (N is the number of inputs), which is suggested by Hecht Nielsen [56]. The training method for the ANN models is scaled conjugate gradient.
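A minimal sketch of a comparable three-layer network (three inputs, six hidden neurons, one output) is shown below using scikit-learn. This is a stand-in under stated assumptions: the paper's models are built in MATLAB with scaled conjugate gradient training, which scikit-learn does not provide, so the library's default optimiser is used here purely for illustration.

```python
from sklearn.neural_network import MLPClassifier

def build_ann():
    # Three inputs (cbo, rfc, wmc), one hidden layer with six neurons
    # (consistent with the <= 2N + 1 guideline for N = 3), one output.
    return MLPClassifier(hidden_layer_sizes=(6,), activation="logistic",
                         max_iter=2000, random_state=0)

# Usage sketch: model = build_ann(); model.fit(X_train, y_train)
# fault_scores = model.predict_proba(X_test)[:, 1]  # continuous fault proneness
```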

ANFIS is a hybrid method that benefits from the advantages of the ANN and FIS methods [57]; it combines the learning ability of ANN and the understandability of FIS. ANFIS is a specific structure of the ANN. The neuron layers of ANFIS models have certain responsibilities, which correspond to the phases of FIS. In ANFIS, Sugeno-type FIS rules are employed. According to the Sugeno-type FIS, the conclusion part of the rule is a polynomial equation that consists of crisp inputs and coefficients. Experts supply the number of fuzzy sets for each input (linguistic variable) with their membership functions to the model. The number of neurons in the hidden layers is specified by an expert, whereas the remainder of the parameters (the parameters of the membership functions and the polynomial coefficients) are learned from data. A typical ANFIS model consists of six different layers (Fig. 4). In layer 0, crisp inputs are taken. The neurons of layer 1 fuzzify the crisp inputs with the selected membership function. In layer 2, rule neurons are defined. Then, the outputs of the rule neurons are normalized in layer 3. The defined rules are applied with the normalized strength values in layer 4. The crisp output is produced in layer 5. Additional details and a practical example of ANFIS are discussed in a study by Sezer et al. [58]. The common implementation details of ANFIS for the study are listed in Table 5.
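As a rough illustration of the Sugeno-type inference that ANFIS learns, the sketch below evaluates the forward pass of a first-order Sugeno model (product firing strengths, weighted-average defuzzification, as in Table 5). The hybrid least-squares/gradient-descent training that fits the membership and consequent parameters is omitted, so this is not the authors' implementation, only a sketch of the rule-evaluation step.

```python
import numpy as np

def sugeno_predict(x, rule_memberships, consequents):
    """Forward pass of a first-order Sugeno model for one input vector x.

    rule_memberships: one list of per-input membership functions per rule.
    consequents: array of shape (n_rules, n_inputs + 1); each row holds the
    linear coefficients and bias of that rule's output f_i = p_i . x + r_i.
    """
    x = np.asarray(x, dtype=float)
    strengths = np.array([np.prod([mf(v) for mf, v in zip(mfs, x)])
                          for mfs in rule_memberships])
    outputs = consequents[:, :-1] @ x + consequents[:, -1]
    total = strengths.sum()
    # Weighted-average defuzzification of the rule outputs.
    return float((strengths * outputs).sum() / total) if total else 0.0
```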

Fig. 3. General structure of ANN.

Fig. 4. General structure of ANFIS.

Table 5
ANFIS implementation details.

Fuzzy inference system type | Sugeno
Layer count | 6
Neuron count of hidden layers | 64
Input fuzzy sets | Low, Moderate, High
Membership function type | Triangle-shaped membership function
Membership function range | Learning from data
Rule count | 27
Aggregation method | Max. method
Defuzzification method | Weighted average
Training algorithm | Combination of the least squares estimator and the gradient descent method
Iteration count | 100

4.2. Model

SFP is not a problem that can be solved with a single application step. Because the development of software can be described with processes, SFP should support all phases of these processes. We argue that the consideration of SFP as a task of binary classification that is applied to labeled data causes the main problem to be disregarded. Consider the following two points regarding SFP: (i) how SFP should be configured to make predictions at the beginning of a project and (ii) how predictive models can be retained without losing their generalization ability. SFP should be performed beginning on the first day of a project using the advantages of metric data, and SFP should be compatible with agile approaches.

We propose an iterative SFP methodology that involves more than one classification method. According to the methodology, rule-based methods, such as FIS, are employed for fault prediction when historical data are not available, which corresponds to initial versions of agile-developed software projects. In subsequent iterations or versions of projects, historical data accumulate. Previous versions come into service and their faultiness information can be collected from users or developers. According to the collected faultiness information, the modules of previous versions can be labeled as faulty or not faulty. As a result, the proposed iterative SFP methodology suggests the use of rule-based methods for initial versions, and data-driven methods are employed for subsequent versions. After the first version, fault prediction is iteratively performed using data-driven methods.

According to the proposed iterative SFP methodology, the training sets of data-driven methods can be specified using two approaches. In the first approach, only the previous version of the current version is selected as the training set (Fig. 5-a) because sequential versions are more likely to resemble the previous versions. Thus, use of the previous version as the training set provides high performance for the prediction model. In the second approach, all previous versions of the current version are employed as the training set (Fig. 5-b). In this manner, the training set becomes larger than the training set in the first approach and the speed of prediction decreases. Because data-driven methods can learn more information from a larger training set, they may produce more successful prediction results.
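A minimal sketch of this version-driven selection logic is shown below, under the assumption that each released version's labelled metric data are kept in a list ordered from oldest to newest (the names and data layout are hypothetical):

```python
def training_versions(history, mode="cumulative"):
    """Pick the training data for the version currently under development.

    history: labelled data of the already-released versions, oldest first.
    Returns None for the first iteration, where the expert-defined FIS is
    used instead of a data-driven model.
    """
    if not history:
        return None                  # first iteration: no historical data, use FIS
    if mode == "previous":
        return [history[-1]]         # only the immediately preceding version
    return list(history)             # cumulative: all previous versions

# Example for jEdit while v4.2 is being developed:
# training_versions(["v3.2", "v4.0", "v4.1"], "previous")   -> ["v4.1"]
# training_versions(["v3.2", "v4.0", "v4.1"], "cumulative") -> ["v3.2", "v4.0", "v4.1"]
```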

5. Experiments and results

In this study, we aim to indicate the practical usage of SFP by transferring knowledge among software versions and to investigate methods that can be applied to predict software faults in the early phases of software projects.
To achieve these goals, several experiments are performed in the MATLAB 7.13.0 (R2011b) [59] environment. The experiments are applied to the Ant, jEdit, Camel, Xalan, Log4j and Lucene projects from the PROMISE repository because they have more than one version. The available versions of the projects in the PROMISE repository do not contain the real first versions of the projects. Thus, we assume that ant-v1.3, jEdit-v3.2, camel-1.0, xalan-2.4, log4j-1.0 and lucene-2.0 are the first versions of the Ant, jEdit, Camel, Xalan, Log4j and Lucene projects, respectively. In this study, three types of experiments are performed: (i) FIS is employed for the first version of projects, i.e., the beginning of projects; (ii) ANN and ANFIS models are trained with the metrics of exactly one previous version due to the similarity between sequential versions; and (iii) ANN and ANFIS models are trained with the accumulated data of previous versions to take advantage of a larger training set, and the faults of the current version are predicted. Fig. 6 illustrates the third type of experiments on the jEdit project as an example to explain the structure of the experiment.

Fig. 5. Proposed methodology.

Fig. 6. Application of cumulative SFP on jEdit with ANN.

To assess the results of the iterative SFP model, ROCAUC is preferred because it is an extensively applied evaluation criterion (e.g., [1–3,5,7,9–12,14,19,22,29,31–33,37,40,41,43,44,47]) in this research area and does not require any threshold values. A ROCAUC value that is less than or equal to 0.5 indicates that the prediction model performs no better than chance. The success of the prediction models increases as the ROCAUC value converges to 1 [60].

The experimental results are shown in Table 6. FIS predicts software faults for the first versions (first type of experiments) with a high success rate (ROCAUC(ant-v1.3): 0.8312, ROCAUC(jEdit-v3.2): 0.7696, ROCAUC(camel-1.0): 0.84, ROCAUC(xalan-2.4): 0.7862, ROCAUC(log4j-1.0): 0.8735 and ROCAUC(lucene-2.0): 0.7714). In the second type of experiments ('Previous' columns in Table 6), results in the acceptable range (ROCAUC > 0.5) are obtained for both the ANN and ANFIS prediction models. However, the performance of the ANN exceeds the performance of the ANFIS, with the exception of the experiments in the Camel project. The results of the third type of experiments ('Cumulative' columns in Table 6) indicate behavior similar to the second type of experiments: the ANN performs better than ANFIS, with the exception of the experiments in the Camel and Log4j projects.
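A brief sketch of how a single ROCAUC value can be computed from a model's continuous fault-proneness scores, using scikit-learn as one possible implementation (the paper's evaluation is carried out in MATLAB):

```python
from sklearn.metrics import roc_auc_score

# y_true: 0/1 fault labels of the predicted version's modules.
# y_score: the continuous output of the FIS, ANN or ANFIS model for the same
# modules. Values near 0.5 mean no discrimination; values near 1 are better.
def rocauc(y_true, y_score):
    return roc_auc_score(y_true, y_score)
```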

According to the suggested methodology, the training sets of the learning methods can be specified using two approaches, as previously mentioned; we use 'previous prediction' to indicate the use of data from the previous version in the training phase and 'cumulative prediction' to indicate the use of data from all previous versions in the training phase. The results of the experiments (refer to Table 6) reveal small differences between these two approaches. However, the cumulative prediction models can be preferred because the number of successful results of the cumulative prediction models is greater than that of the previous prediction models. In Table 6, the results of the cumulative and previous prediction models can be compared for each test set of the ANN and ANFIS models.

According to Table 6, some prediction results are not acceptable, such as the results for ant-v1.4 (ROCAUC(ANN): 0.5752, ROCAUC(ANFIS): 0.6116) and jEdit-v4.3 (ROCAUC(ANN): 0.6292, ROCAUC(ANFIS): 0.5457). To investigate the reasons for these unsatisfactory results, self-training ANN and ANFIS prediction models are constructed: the software faults of each version of each project within this study are predicted using that version's own data as the training set. The objective of these experiments is to specify the possible maximum prediction performance for each version. The results are listed in Table 7; some versions could not be predicted with high scores, such as ant-v1.4 (ROCAUC(ANN): 0.7009, ROCAUC(ANFIS): 0.7455) and jEdit-v4.3 (ROCAUC(ANN): 0.4613), because these datasets may contain imbalanced values or pathological irregularity. Thus, the previously mentioned unsatisfactory results are not produced by the proposed iterative SFP methodology itself. However, the proposed methodology may decrease the prediction performance relative to the performance of the self-training SFP models. This situation is expected, and the results of self-training SFP models can be assumed to be upper limits for the success of the proposed SFP. The most important point is that the results of the self-training SFP and the proposed SFP are similar. Table 8 lists the decreases in ROCAUC due to the proposed methodology. According to these results, the average differences between the ANN self-training ROCAUC results and the ANN previous and cumulative ROCAUC results are minimal (ROCAUC(Previous): 0.046 and ROCAUC(Cumulative): 0.0329). Similarly, the average differences for ANFIS can also be accepted as low. ANN may be preferred for the cumulative and previous approaches.

Table 6
ROCAUC results.

Predicted Set (Version) | FIS ROCAUC | ANN Previous (Train Set: ROCAUC) | ANN Cumulative (Train Set: ROCAUC) | ANFIS Previous (Train Set: ROCAUC) | ANFIS Cumulative (Train Set: ROCAUC)
ant-v1.3 | 0.8312 | - | - | - | -
ant-v1.4 | - | v1.3: 0.5752 | - | v1.3: 0.6116 | -
ant-v1.5 | - | v1.4: 0.8186 | v1.4+v1.3: 0.8224 | v1.4: 0.5985 | v1.4+v1.3: 0.6742
ant-v1.6 | - | v1.5: 0.8390 | v1.5+v1.4+v1.3: 0.8387 | v1.5: 0.7944 | v1.5+v1.4+v1.3: 0.7397
ant-v1.7 | - | v1.6: 0.8244 | v1.6+v1.5+v1.4+v1.3: 0.8221 | v1.6: 0.7938 | v1.6+v1.5+v1.4+v1.3: 0.8058
jEdit-v3.2 | 0.7696 | - | - | - | -
jEdit-v4.0 | - | v3.2: 0.7766 | - | v3.2: 0.7704 | -
jEdit-v4.1 | - | v4.0: 0.8199 | v4.0+v3.2: 0.8047 | v4.0: 0.8216 | v4.0+v3.2: 0.8170
jEdit-v4.2 | - | v4.1: 0.8386 | v4.1+v4.0+v3.2: 0.8450 | v4.1: 0.8257 | v4.1+v4.0+v3.2: 0.8102
jEdit-v4.3 | - | v4.2: 0.6292 | v4.2+v4.1+v4.0+v3.2: 0.6303 | v4.2: 0.5457 | v4.2+v4.1+v4.0+v3.2: 0.6299
camel-1.0 | 0.84 | - | - | - | -
camel-1.2 | - | v1.0: 0.5757 | - | v1.0: 0.5795 | -
camel-1.4 | - | v1.2: 0.7103 | v1.2+v1.0: 0.7075 | v1.2: 0.7120 | v1.2+v1.0: 0.7159
camel-1.6 | - | v1.4: 0.6016 | v1.4+v1.2+v1.0: 0.6399 | v1.4: 0.6364 | v1.4+v1.2+v1.0: 0.6432
xalan-2.4 | 0.7862 | - | - | - | -
xalan-2.5 | - | v2.4: 0.6232 | - | v2.4: 0.5889 | -
xalan-2.6 | - | v2.5: 0.6577 | v2.5+v2.4: 0.6537 | v2.5: 0.6457 | v2.5+v2.4: 0.6333
xalan-2.7 | - | v2.6: 0.7771 | v2.6+v2.5+v2.4: 0.7872 | v2.6: 0.7098 | v2.6+v2.5+v2.4: 0.7551
log4j-1.0 | 0.8735 | - | - | - | -
log4j-1.1 | - | v1.0: 0.8527 | - | v1.0: 0.7986 | -
log4j-1.2 | - | v1.1: 0.6491 | v1.1+v1.0: 0.6673 | v1.1: 0.5870 | v1.1+v1.0: 0.6706
lucene-2.0 | 0.7714 | - | - | - | -
lucene-2.2 | - | v2.0: 0.6778 | - | v2.0: 0.6698 | -
lucene-2.4 | - | v2.2: 0.7456 | v2.2+v2.0: 0.74 | v2.2: 0.6749 | v2.2+v2.0: 0.6974

Table 7
ROCAUC results for self-training.

Version | ANN | ANFIS
ant-v1.3 | 0.9524 | 0.8214
ant-v1.4 | 0.7009 | 0.7455
ant-v1.5 | 0.8626 | 0.9199
ant-v1.6 | 0.8643 | 0.8664
ant-v1.7 | 0.8469 | 0.8184
jEdit-v3.2 | 0.8796 | 0.8997
jEdit-v4.0 | 0.8246 | 0.7826
jEdit-v4.1 | 0.8682 | 0.9144
jEdit-v4.2 | 0.8750 | 0.9755
jEdit-v4.3 | 0.4613 | 0.9115
camel-1.0 | 0.9242 | 0.8939
camel-1.2 | 0.6008 | 0.6009
camel-1.4 | 0.7911 | 0.8132
camel-1.6 | 0.6807 | 0.7143
xalan-2.4 | 0.8186 | 0.8197
xalan-2.5 | 0.6747 | 0.6633
xalan-2.6 | 0.6821 | 0.6782
xalan-2.7 | 0.8167 | 0.8589
log4j-1.0 | 0.8929 | 0.8857
log4j-1.1 | 0.9018 | 0.9018
log4j-1.2 | 0.7804 | 0.7719
lucene-2.0 | 0.8492 | 0.8651
lucene-2.2 | 0.7628 | 0.7457
lucene-2.4 | 0.8248 | 0.8148

Table 8
Decrease in the ROCAUC of the proposed methodology according to self-training.

                | ANN Previous | ANN Cumulative | ANFIS Previous | ANFIS Cumulative
ROCAUC(min)     | -0.1679 | -0.169 | 0.0122 | 0.0126
ROCAUC(max)     | 0.1313 | 0.1131 | 0.3658 | 0.2816
ROCAUC(average) | 0.046 | 0.0329 | 0.1185 | 0.1221

However, all decreased rates in Table 8 demonstrate that the minimal performance loss with respect to the self-training data-driven models warrants the use of the proposed SFP methodology to achieve the practical use of SFP in the development phase.

In the ANN and ANFIS experiments, which are detailed in Table 6, no data partitioning activities are performed because the training and test sets belong to different versions. The self-training experiments require a sampling process because they employ the same dataset for training and testing. Thus, N-fold cross-validation is employed to achieve valid results for the experiments whose results are shown in Table 7. This technique is applied with N = 5. The datasets are partitioned into five parts; four parts of the dataset (80%) are employed for the training phase, and the remaining part (20% of the dataset) is utilized for the testing phase in each experiment. In the application of N-fold cross-validation, attention is paid to generating parts that include approximately the same faulty instance rate.
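A minimal sketch of such a stratified split, using scikit-learn as one possible implementation of the 5-fold scheme described above (the actual experiments are run in MATLAB):

```python
from sklearn.model_selection import StratifiedKFold

def five_fold_splits(X, y):
    """Five folds; each keeps approximately the same faulty-instance rate,
    so every experiment trains on 80% and tests on the remaining 20%."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return list(skf.split(X, y))
```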

The proposed SFP model has the ability to produce promising results. However, the methodology is implementable and can serve as a solution for the previously mentioned SFP problems. For this reason, the following section of the paper explains the development of the proposed SFP methodology.

6. Eclipse plugin

In this study, the proposed iterative fault prediction model isimplemented as a plugin for the Eclipse environment. The objectiveof plugin implementation is to prove that the suggested itera-tive model is implementable and practically applicable in software

Fig. 7. Layered view of the SFP Eclipse plugin.



Fig. 8. Preliminaries for FIS prediction in the Eclipse plugin.

Fig. 9. Example predictions of the Eclipse plugin.





Fig. 10. Flow and details of plugin functions.

The fault prediction plugin is developed in the Eclipse Luna development environment [61]. First, a 'Plug-in Project' is automatically created by selecting the File/New menu of Eclipse Luna. Then, the instructions in the 'Plug-in Development Environment Overview' section of the Eclipse Luna documentation [62] are followed. The fault prediction plugin consists of four main modules: metric collector, FIS modeler, ANN modeler and performance evaluator. As shown in Fig. 7, ANFIS is not considered in the scope of the plugin development because the ANN gives superior performance, and the implementation of an ANN can serve as an example for all data-driven methods. When other data-driven methods are to be integrated into this plugin, their input and output connections become available due to the existing ANN module. Thus, they can be easily integrated into the plugin. Details of the components are as follows:

Metric collector: Collected metrics from real software projects are available in the datasets of the PROMISE repository. Thus, metrics do not need to be collected when a prediction model is constructed for one of these datasets. However, software metrics must be collected when a real software project is employed as the dataset. An open-source and extended version of a tool for calculating the Chidamber and Kemerer Java Metrics (CKJM extended) [63] is integrated in the plugin. CKJM extended collects object-oriented metrics from compiled Java files (*.class) by processing the byte codes of the files. Within the prediction plugin, the metrics collected using CKJM extended can be listed as follows: wmc, depth of inheritance tree, number of children, cbo, rfc, lack of cohesion in methods, afferent coupling, efferent coupling, number of public methods for a class, lack of cohesion in methods (Henderson-Sellers version), lines of code, data access metric, measure of aggregation, measure of functional abstraction, cohesion among methods of class, inheritance coupling, and coupling between methods. According to user preferences, the metrics that are supposed to be collected may be filtered prior to any prediction. The values of the preferred metrics are stored in a file that is only updated if the user changes his/her metric selection preferences. The metric collector should be run at least once before tagging a software version. Thus, all available metrics are collected and saved with the version number to be employed as a candidate training set of data-driven methods for predictions of subsequent versions.
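The following is only a sketch of how compiled class files could be handed to an external CKJM-extended jar and its textual output captured; the jar name, the command-line form and the output format are assumptions and must be adapted to the actual CKJM extended distribution, not taken as its documented interface.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: run an external CKJM-extended jar on the compiled *.class files of a project
// and collect its textual metric lines, one line per class (format assumed).
public class CkjmRunner {

    public static List<String> collectMetrics(Path classesDir, Path ckjmJar) throws Exception {
        // Gather all compiled class files; CKJM extended works on byte code, not on sources.
        List<String> classFiles;
        try (var stream = Files.walk(classesDir)) {
            classFiles = stream.filter(p -> p.toString().endsWith(".class"))
                               .map(Path::toString)
                               .collect(Collectors.toList());
        }

        List<String> command = new ArrayList<>(List.of("java", "-jar", ckjmJar.toString()));
        command.addAll(classFiles);

        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // metric values to be saved together with the version number
            }
        }
        process.waitFor();
        return lines;
    }
}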

FIS modeler: The FIS modeler module is implemented to predict software faults by FIS. No third-party tools are employed for this module. A Mamdani-type FIS is implemented in this paper. The validation of the FIS implementation is performed by comparing its results with the results obtained from the MATLAB FIS modeler for the same inputs. According to the results, the developed FIS modeler module is considered to be reliable. Before the FIS modeler is executed, the input metrics that are provided by the plugin are selected (Fig. 8-a). In the FIS modeler, the membership functions are in triangular form, as suggested by the iterative methodology. All input linguistic variables (selected metrics) have three linguistic values, such as L, M and H. This type of FIS model has three parameters, as shown in Fig. 1. Because the proposed methodology uses 0 for point "a" and the maximum value of the metric for point "c", the plugin automatically assigns 0 and the maximum value of the metric to points "a" and "c". Thus, the user should only specify the peak point (b) of the M linguistic value (Fig. 8-b). In Fig. 8-b, the maximum points of the metrics are shown for information purposes. After the range of the fuzzy sets is adjusted, the rules of the FIS model are specified (Fig. 8-b). According to the proposed methodology, the values of the output linguistic variable (faultiness) are L and H. While the FIS modeler automatically generates all possible condition parts of the rules, it presents two options (L and H) for the user to select for the rule consequent parts. The FIS modeler saves all specifications about fuzzy sets and fuzzy rules to a file. Thus, the user can repeatedly use the saved model. When the user begins a prediction by the FIS modeler, the module triggers the collection of the selected metrics and produces the prediction results (Fig. 9-a). The produced results range between 0 and 1, and the faultiness susceptibility increases as the prediction results converge to 1.
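A minimal sketch of the triangular fuzzification described above is given below. It assumes one plausible reading of Fig. 1, namely that L peaks at 0, M at the user-supplied point b, and H at the metric maximum; the class and method names are illustrative, not the plugin's actual code.

// Sketch of the triangular fuzzy sets for an input metric: point "a" is fixed to 0,
// point "c" to the maximum observed value, and b (the peak of "M") is user-supplied.
// Assumes 0 < b < max.
public class TriangularFuzzifier {

    // Standard triangular membership function with corners a <= b <= c (peak at b).
    static double triangle(double x, double a, double b, double c) {
        if (x < a || x > c) return 0.0;
        if (x == b) return 1.0;
        return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    // Membership degrees of a metric value in the L, M and H linguistic values.
    static double[] fuzzify(double value, double b, double max) {
        double low  = value <= 0 ? 1.0 : Math.max(0.0, (b - value) / b);            // "L": peak at 0
        double mid  = triangle(value, 0.0, b, max);                                  // "M": peak at b
        double high = value >= max ? 1.0 : Math.max(0.0, (value - b) / (max - b));   // "H": peak at max
        return new double[] { low, mid, high };
    }
}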

ANN modeler: To predict software faults by the ANN, the Encog tool [64] is integrated into the plugin. Encog is an advanced neural network and bot programming library [64] developed by Heaton Research. The reliability of the ANN modeler is tested by comparing its results with the MATLAB ANN results. They are considerably similar (R2 = 96%). However, the similarity is not stable due to the randomness of the ANN models. The ANN modeler has a three-layered ANN structure. The hidden layer of the structure includes two times as many neurons as inputs, and a backpropagation method is employed for learning. The possible training sets for the ANN models must be accumulated by collecting all metrics and tagging all Java files as "Faulty" or "Non-Faulty" for each version. Then, the composed historical data can be used as a training set for the developing version. After the training sets and input metrics are selected, the ANN modeler collects the selected metrics and makes a fault prediction for each class (Fig. 9-b). The prediction results can be interpreted as mentioned in the FIS modeler section.
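A minimal sketch of the described topology using the Encog 3.x API is shown below (input layer sized to the selected metrics, hidden layer with twice as many neurons as inputs, single output neuron, backpropagation training). The learning rate, momentum and stopping criterion are illustrative assumptions, not the plugin's actual settings.

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.back.Backpropagation;

// Sketch of the three-layered ANN described above, built with Encog.
public class AnnModelerSketch {

    public static BasicNetwork train(double[][] metrics, double[][] labels) {
        int inputs = metrics[0].length;

        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, inputs));                        // input layer
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 2 * inputs)); // hidden layer
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));         // output layer
        network.getStructure().finalizeStructure();
        network.reset();

        MLDataSet trainingSet = new BasicMLDataSet(metrics, labels);
        // Learning rate and momentum below are illustrative values only.
        Backpropagation train = new Backpropagation(network, trainingSet, 0.1, 0.3);
        int epoch = 0;
        do {
            train.iteration();
            epoch++;
        } while (train.getError() > 0.01 && epoch < 500);
        train.finishTraining();
        return network; // network.compute(...) then yields a faultiness score in [0, 1] per class
    }
}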


Performance evaluator: The performance evaluator module contains calculations for three types of evaluation criteria: ROCAUC, precision and recall. The validation of the performance evaluator module is performed by comparing its output with the MATLAB ROCAUC, precision and recall calculations; the two programs produce exactly the same results. Calculation of the evaluation criteria requires tagged or labeled data for each class so that the predicted results can be compared with the real values. Thus, the process of tagging Java files as "Faulty" or "Non-Faulty" for each version must be performed before the performance evaluator module runs. In addition, the threshold value for precision and recall must be specified by the user before the calculations are performed. The performance evaluator generates a confusion matrix for a given version number, calculates the performance measures from it, and then saves the historical results to a file.
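As an illustration of this evaluation step (not the plugin's code), the confusion matrix, precision, recall and a rank-based ROCAUC for a user-chosen threshold can be computed as sketched below; the class and method names are hypothetical.

// Sketch of the evaluation step: a confusion matrix is built from the predicted scores and
// the tagged labels using a user-chosen threshold, then precision, recall and ROCAUC follow.
public class EvaluationSketch {

    public static double[] precisionRecall(double[] scores, boolean[] faulty, double threshold) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedFaulty = scores[i] >= threshold;
            if (predictedFaulty && faulty[i]) tp++;
            else if (predictedFaulty) fp++;
            else if (faulty[i]) fn++;
            else tn++;
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall    = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        return new double[] { precision, recall };
    }

    // ROCAUC in its rank-based (Mann-Whitney) form: the probability that a randomly chosen
    // faulty class receives a higher score than a randomly chosen non-faulty one.
    public static double rocauc(double[] scores, boolean[] faulty) {
        double better = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!faulty[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (faulty[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) better += 1.0;
                else if (scores[i] == scores[j]) better += 0.5;
            }
        }
        return pairs == 0 ? 0.0 : better / pairs;
    }
}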

These four modules can be illustrated in a layered manner (Fig. 7). The process of software fault prediction with this plugin is detailed in Fig. 10. In this figure, the steps that are highlighted with stars are repeated for each class of each version of the software project, and the first three steps, which only have to be performed once, can be considered to be the pre-settings.
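The layered organization and the claim that further data-driven methods can be attached through the same connection points as the ANN modeler can be summarized by a small interface sketch; the interface names below are illustrative assumptions and do not reproduce the plugin's source.

import java.util.List;
import java.util.Map;

// Illustrative sketch of the plugin's layered design (names are assumptions).
interface MetricCollector {
    // Collects the selected object-oriented metrics for every compiled class of a version.
    Map<String, double[]> collect(String versionTag, List<String> selectedMetrics);
}

interface FaultPredictor {
    // Returns a faultiness score in [0, 1] per class; the FIS and ANN modelers both fit here,
    // so another data-driven method can reuse the same input and output connections.
    Map<String, Double> predict(Map<String, double[]> metricsPerClass);
}

interface PerformanceEvaluator {
    // Compares predictions with the tagged ("Faulty"/"Non-Faulty") labels of a version.
    double rocauc(Map<String, Double> predictions, Map<String, Boolean> labels);
}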

7. Conclusion

An SFP task can support project management and software development in large software systems if it is methodically employed. To regularly apply SFP, the constructed fault prediction models should be compatible with current popular software development processes, such as agile approaches. In this study, we treat SFP as the problem of how it can become a part of the systems development life cycle, rather than as a simple classification problem, and propose an iterative SFP methodology that can be integrated into a tool. According to the methodology, we suggest that previous versions of the same project can be utilized to transfer knowledge for prediction of the current version. We determine that expert knowledge can be employed for the first version, which is a special case. Thus, the first version of the software project is handled using a rule-based technique, such as the FIS model, as the predictor, according to the experimental results. The method is suitable for predicting faults in early project phases (ROCAUC(average): 0.812). The experiments for subsequent iterations of the proposed methodology indicate that data-driven prediction methods, such as ANN and ANFIS, can transfer knowledge among software versions and perform iterative SFP for each version, with the exception of the first version (ROCAUC(ANN average): 0.7317, ROCAUC(ANN max): 0.8527, ROCAUC(ANFIS average): 0.6986 and ROCAUC(ANFIS max): 0.8257). The ANN is preferred for subsequent iterations of the methodology because the performance of the ANN models is generally better than the performance of ANFIS (Table 6). The comparison of the results of the previous and cumulative experiments suggests a cumulative approach for subsequent iterations in the methodology (ROCAUC(ANN previous average): 0.7218, ROCAUC(ANN cumulative average): 0.7466, ROCAUC(ANFIS previous average): 0.6869 and ROCAUC(ANFIS cumulative average): 0.716).

In this study, we claim that an SFP task can be converted to a tool and run automatically. We propose a methodology and integrate it as a tool. In future studies, we will apply the tool to an ongoing software project and observe the effect of an SFP tool on project quality. Another research interest that we will explore is assessing the faultiness of data, which is the greatest challenge during development of the tool. When we consider large systems, thousands of files have to be manually labeled so that each version can employ the developed tool. Thus, we aim to automate this process by integrating the SFP tool with an issue tracking system, such as Atlassian Jira.

References

[1] G. Czibula, Z. Marian, I.G. Czibula, Software defect prediction using relational association rule mining, Inf. Sci. 264 (2014) 260–278, http://dx.doi.org/10.1016/j.ins.2013.12.031.
[2] E. Erturk, E.A. Sezer, A comparison of some soft computing methods for software fault prediction, Expert Syst. Appl. 42 (4) (2015) 1872–1879, http://dx.doi.org/10.1016/j.eswa.2014.10.025.
[3] C. Catal, B. Diri, A systematic review of software fault prediction studies, Expert Syst. Appl. 36 (4) (2009) 7346–7354, http://dx.doi.org/10.1016/j.eswa.2008.10.027.
[4] T.D. Oyetoyan, D.S. Cruzes, R. Conradi, A study of cyclic dependencies on defect profile of software components, J. Syst. Software 86 (12) (2013) 3162–3182, http://dx.doi.org/10.1016/j.jss.2013.07.039.
[5] D. Radjenovic, M. Hericko, R. Torkar, A. Zivkovic, Software fault prediction metrics: a systematic literature review, Inf. Software Technol. 55 (8) (2013) 1397–1418, http://dx.doi.org/10.1016/j.infsof.2013.02.009.
[6] G. Carrozza, D. Cotroneo, R. Natella, R. Pietrantuono, S. Russo, Analysis and prediction of mandelbugs in an industrial software system, in: Proc. IEEE the 6th International Conference on Software Testing, Verification and Validation (ICST 2013), IEEE Press, 2013, pp. 262–271, http://dx.doi.org/10.1109/ICST.2013.21 (March).
[7] R. Malhotra, Comparative analysis of statistical and machine learning methods for predicting faulty modules, Appl. Soft Comput. 21 (2014) 286–297, http://dx.doi.org/10.1016/j.asoc.2014.03.032.
[8] J. Cahill, J.M. Hogan, R. Thomas, Predicting fault-prone software modules with rank sum classification, in: Proc. the 22nd Australian Conference on Software Engineering (ASWEC 2013), IEEE Press, 2013, pp. 211–219, http://dx.doi.org/10.1109/ASWEC.2013.33 (June).
[9] S. Shivaji, E.J. Whitehead, R. Akella, S. Kim, Reducing features to improve code change-based bug prediction, IEEE Trans. Software Eng. 39 (4) (2012) 552–569, http://dx.doi.org/10.1109/TSE.2012.43.
[10] I.H. Laradji, M. Alshayeb, L. Ghouti, Software defect prediction using ensemble learning on selected features, Inf. Software Technol. 58 (2015) 388–402, http://dx.doi.org/10.1016/j.infsof.2014.07.005.
[11] J. Chen, S. Liu, W. Liu, X. Chen, Q. Gu, D. Chen, Empirical studies on feature selection for software fault prediction, in: Proc. the 5th Asia-Pacific Symposium on Internetware (Internetware 2013), ACM Press, 2013, pp. 163–166, http://dx.doi.org/10.1145/2532443.2532461 (October).
[12] J. Chen, S. Liu, W. Liu, X. Chen, Q. Gu, D. Chen, A two-stage data preprocessing approach for software fault prediction, in: Proc. the 8th International Conference on Software Security and Reliability (SERE 2014), IEEE Press, 2014, pp. 20–29, http://dx.doi.org/10.1109/SERE.2014.15 (June–July).
[13] K. Beck, M. Beedle, A. Van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, et al., Manifesto for Agile Software Development, 2001.
[14] E. Erturk, E.A. Sezer, Software fault prediction using Mamdani type fuzzy inference system, Proc. the 3rd International Fuzzy Systems Symposium (FUZZYSS'13) (2013) 1–6 (October).
[15] R. Goyal, P. Chandra, Y. Singh, Identifying influential metrics in the combined metrics approach of fault prediction, Springer Plus 2 (1) (2013) 627–634, http://dx.doi.org/10.1186/2193-1801-2-627.
[16] F. Peters, T. Menzies, A. Marcus, Better cross company defect prediction, in: Proc. the 10th International Conference on Measurement, Mining Software Repositories (MSR 2013), IEEE Press, 2013, pp. 409–418, http://dx.doi.org/10.1109/MSR.2013.6624057 (May).
[17] D. Rodríguez, R. Ruiz, J.C. Riquelme, R. Harrison, A study of subgroup discovery approaches for defect prediction, Inf. Software Technol. 55 (10) (2013) 1810–1822, http://dx.doi.org/10.1016/j.infsof.2013.05.002.
[18] M.D.P. NASA, <http://mdp.ivv.nasa.gov/>.
[19] K. Dejaeger, T. Verbraken, B. Baesens, Towards comprehensible software fault prediction models using bayesian network classifiers, IEEE Trans. Software Eng. 39 (2) (2013) 237–257, http://dx.doi.org/10.1109/TSE.2012.20.
[20] T. Zimmermann, R. Premraj, A. Zeller, Predicting defects for Eclipse, in: Proc. the 3rd International Workshop on Predictor Models in Software Engineering (PROMISE 07), IEEE Press, 2007, p. 9, http://dx.doi.org/10.1109/promise.2007.10 (May).
[21] D. Cotroneo, R. Natella, R. Pietrantuono, Predicting aging-related bugs using software complexity metrics, Perform. Eval. 70 (3) (2013) 163–178, http://dx.doi.org/10.1016/j.peva.2012.09.004.
[22] N. Kasai, S. Morisaki, K. Matsumoto, Fault-prone module prediction using a prediction model and manual inspection, in: Proc. the 20th Asia-Pacific Software Engineering Conference (ASPEC 2013), IEEE Press, 2013, pp. 106–115, http://dx.doi.org/10.1109/APSEC.2013.25 (December).
[23] O. Mizuno, On effects of tokens in source code to accuracy of fault-prone module prediction, in: Proc. of the International Computer Science and Engineering Conference (ICSEC), IEEE Press, 2013, pp. 103–108, http://dx.doi.org/10.1109/ICSEC.2013.6694761 (September).
[24] T.M. Khoshgoftaar, Y. Xiao, K. Gao, Software quality assessment using a multi-strategy classifier, Inf. Sci. 259 (2014) 555–570, http://dx.doi.org/10.1016/j.ins.2010.11.028.
[25] B. Li, B. Shen, J. Wang, Y. Chen, T. Zhang, J. Wang, A scenario-based approach to predicting software defects using compressed C4.5 model, in: Proc. the 38th Annual International Computers, Software and Applications Conference (COMPSAC 2014), IEEE Press, 2014, pp. 406–415, http://dx.doi.org/10.1109/COMPSAC.2014.64 (July).

[26] G. Abaei, Z. Rezaei, A. Selamat, Fault prediction by utilizing self-organizing map and threshold, in: Proc. of the International Conference on Control System, Computing and Engineering (ICCSCE), IEEE Press, 2013, pp. 465–470, http://dx.doi.org/10.1109/ICCSCE.2013.6720010 (December).
[27] J. Sayyad Shirabad, T.J. Menzies, The PROMISE Repository of Software Engineering Databases, School of Information Technology and Engineering, University of Ottawa, Canada, 2005 (available: <http://promise.site.uottawa.ca/SERepository>, visit date 01.08.2015).
[28] M. D'Ambros, M. Lanza, R. Robbes, An extensive comparison of bug prediction approaches, in: Proc. the 7th IEEE Working Conference on Mining Software Repositories (MSR), IEEE Press, 2010, pp. 31–41, http://dx.doi.org/10.1109/MSR.2010.5463279 (May, available: <http://bug.inf.usi.ch>, visit date 01.08.2015).
[29] E. Erturk, E.A. Sezer, Software fault prediction using fuzzy inference system and object-oriented metrics, in: Proc. the 13th IASTED International Conference on Software Engineering (SE 2014), Acta Press, 2014, pp. 101–108, http://dx.doi.org/10.2316/P.2014.810-004 (February).
[30] Y. Wu, Y. Yang, Y. Zhao, H. Lu, Y. Zhou, B. Xu, The influence of developer quality on software fault-proneness prediction, in: Proc. of the 8th International Conference on Software Security and Reliability (SERE), IEEE Press, 2014, pp. 11–19, http://dx.doi.org/10.1109/SERE.2014.14 (June–July).
[31] H. Lu, B. Cukic, M. Culp, A semi-supervised approach to software defect prediction, in: Proc. of the 38th Annual International Computers, Software and Applications Conference (COMPSAC), IEEE Press, 2014, pp. 416–425, http://dx.doi.org/10.1109/COMPSAC.2014.65 (July).
[32] H. Lu, E. Kocaguneli, B. Cukic, Defect prediction between software versions with active learning and dimensionality reduction, in: Proc. IEEE the 25th International Symposium on Software Reliability Engineering (ISSRE 2014), IEEE Press, 2014, pp. 312–322, http://dx.doi.org/10.1109/ISSRE.2014.35 (November).
[33] Y. Ma, G. Luo, X. Zeng, A. Chen, Transfer learning for cross-company software defect prediction, Inf. Software Technol. 54 (3) (2012) 248–256, http://dx.doi.org/10.1016/j.infsof.2011.09.007.
[34] G. Canfora, A.D. Lucia, M.D. Penta, R. Oliveto, A. Panichella, S. Panichella, Multi objective cross-project defect prediction, in: Proc. IEEE the 6th International Conference on Software Testing, Verification and Validation (ICST 2013), IEEE Press, 2013, pp. 252–261, http://dx.doi.org/10.1109/ICST.2013.38 (March).
[35] F. Peters, T. Menzies, L. Gong, H. Zhang, Balancing privacy and utility in cross-company defect prediction, IEEE Trans. Software Eng. 39 (8) (2013) 1054–1068, http://dx.doi.org/10.1109/TSE.2013.6.
[36] B. Turhan, A.T. Mısırlı, A. Bener, Empirical evaluation of the effects of mixed project data on learning defect predictors, Inf. Software Technol. 55 (6) (2013) 1101–1118, http://dx.doi.org/10.1016/j.infsof.2012.10.003.
[37] B. Ma, H. Zhang, G. Chen, Y. Zhao, B. Baesens, Investigating associative classification for software fault prediction: an experimental perspective, Int. J. Software Eng. Knowl. Eng. 24 (1) (2014) 61–90, http://dx.doi.org/10.1142/S021819401450003X.
[38] T. Kamiya, S. Kusumoto, K. Inoue, Prediction of fault-proneness at early phase in object-oriented development, in: Proc. the 2nd IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC'99), IEEE Press, 1999, pp. 253–258, http://dx.doi.org/10.1109/ISORC.1999.776386 (May).
[39] N. Nagappan, T. Ball, Static analysis tools as early indicators of pre-release defect density, in: Proc. the 27th International Conference on Software Engineering (ICSE '05), ACM Press, 2005, pp. 580–586, http://dx.doi.org/10.1145/1062455.1062558 (May).
[40] A. Halim, Predict fault-prone classes using the complexity of UML class diagram, in: Proc. 2013 International Conference on Computer, Control, Informatics and Its Applications (IC3INA), IEEE Press, 2013, pp. 289–294, http://dx.doi.org/10.1109/IC3INA.2013.6819188 (November).
[41] Y. Jiang, J. Lin, B. Cukic, S. Lin, Z. Hu, Replacing code metrics in software fault prediction with early life cycle metrics, in: Proc. the 3rd International Conference on Information Science and Technology (ICIST 2013), IEEE Press, 2013, pp. 516–523, http://dx.doi.org/10.1109/ICIST.2013.6747602 (March).
[42] Tera-Promise, <http://openscience.us/repo/>, visit date 01.08.2015.
[43] C. Catal, Software fault prediction: a literature review and current trends, Expert Syst. Appl. 38 (4) (2011) 4626–4636, http://dx.doi.org/10.1016/j.eswa.2010.10.024.
[44] T. Hall, S. Beecham, D. Bowes, D. Gray, S. Counsell, A systematic literature review on fault prediction performance in software engineering, IEEE Trans. Software Eng. 38 (6) (2012) 1276–1304, http://dx.doi.org/10.1109/TSE.2011.103.
[45] M. Jureczko, L. Madeyski, Towards identifying software project clusters with regard to defect prediction, in: Proc. the 6th International Conference on Predictive Models in Software Engineering (PROMISE '10), ACM Press, 2010, pp. 9:1–9:10, http://dx.doi.org/10.1145/1868328.1868342 (September).
[46] S.R. Chidamber, C.F. Kemerer, A metrics suite for object oriented design, IEEE Trans. Software Eng. 20 (6) (1994) 476–493, http://dx.doi.org/10.1109/32.295895.
[47] E. Erturk, E.A. Sezer, Software fault inference based on expert opinion, J. Software 10 (6) (2015) 757–766, http://dx.doi.org/10.17706/jsw.10.6.757-766.




[48] R. Goyal, P. Chandra, Y. Singh, Suitability of KNN regression in the development of interaction based software fault prediction models, IERI Procedia 6 (2014) 15–21, http://dx.doi.org/10.1016/j.ieri.2014.03.004.
[49] S. Kanmani, V.R. Uthariaraj, V. Sankaranarayanan, P. Thambidurai, Object-oriented software fault prediction using neural networks, Inf. Software Technol. 49 (5) (2007) 483–492, http://dx.doi.org/10.1016/j.infsof.2006.07.005.
[50] I. Gondra, Applying machine learning to software fault-proneness prediction, J. Syst. Software 81 (5) (2008) 186–195, http://dx.doi.org/10.1016/j.jss.2007.05.035.
[51] L.A. Zadeh, Fuzzy sets, Inf. Control 8 (3) (1965) 338–353, http://dx.doi.org/10.1016/S0019-9958(65)90241-X.
[52] E. Ghasemi, H. Amini, M. Ataei, R. Khalokakaei, Application of artificial intelligence techniques for predicting the flyrock distance caused by blasting operation, Arabian J. Geosci. 7 (1) (2012) 193–202, http://dx.doi.org/10.1007/s12517-012-0703-6.
[53] A. Yazdani, S. Shariati, A. Yazdani-Chamzini, A risk assessment model based on fuzzy logic for electricity distribution system asset management, Decis. Sci. Lett. 3 (3) (2014) 343–352.
[54] E.H. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man Mach. Stud. 7 (1) (1975) 1–13, http://dx.doi.org/10.1016/S0020-7373(75)80002-2.
[55] A. Akgun, E.A. Sezer, H.A. Nefeslioglu, C. Gokceoglu, B. Pradhan, An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm, Comput. Geosci. 38 (1) (2012) 23–34, http://dx.doi.org/10.1016/j.cageo.2011.04.012.
[56] R. Hecht-Nielsen, Kolmogorov's mapping neural network existence theorem, in: Proc. the First IEEE International Conference on Neural Networks, IEEE Press, 1987, pp. 11–14 (June).
[57] J.S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (3) (1993) 665–685, http://dx.doi.org/10.1109/21.256541 (May–June).
[58] E.A. Sezer, B. Pradhan, C. Gokceoglu, Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia, Expert Syst. Appl. 38 (7) (2011) 8208–8219, http://dx.doi.org/10.1016/j.eswa.2010.12.167.
[59] MATLAB, User's Guide Version 7.8, R2009a, MathWorks Co., USA, 2009.
[60] T. Menzies, J. Greenwald, A. Frank, Data mining static code attributes to learn defect predictors, IEEE Trans. Software Eng. 33 (1) (2007) 2–13, http://dx.doi.org/10.1109/TSE.2007.256941.


[61] Eclipse Luna, <https://eclipse.org/luna/>, visit date 02.07.2015.
[62] Eclipse Luna Documentation, <http://help.eclipse.org/luna/index.jsp>, visit date 02.07.2015.
[63] M. Jureczko, D.D. Spinellis, Using object-oriented design metrics to predict software defects, in: Proc. the 5th International Conference on Dependability of Computer Systems (DepCoS 2010), Oficyna Wydawnicza Politechniki Wroclawskiej, 2010, pp. 69–81 (June–July), 10.1.1.226.2285.
[64] Encog, <https://code.google.com/p/encog-java/>, visit date 05.07.2015.

Ezgi Erturk was born in Izmir, Turkey in 1987. In 2010, she received her BSc in computer engineering from the Department of Computer Engineering at the University of Hacettepe, where she is currently studying to receive a PhD degree in computer engineering. Her research interests include software engineering, software metrics and fuzzy logic, and she has published several international conference proceedings about these subjects. She works as a software engineer at the Scientific and Technological Research Council of Turkey, Software Technologies Research Institute.

Ebru A. Sezer was born in Ankara, Turkey in 1974. She received her BSc, MSc and PhD degrees from the Department of Computer Engineering at the University of Hacettepe in Turkey, where she presently teaches as an associate professor. She has published approximately 30 articles about intelligent systems, fuzzy logic, and susceptibility analysis and two book chapters on soft computing modeling and semantic information retrieval.