
DETECTION OF CODE SMELLS FROM MINING VERSION HISTORIES USING AGGREGATE FUNCTION BASED ENHANCED APRIORI

Dr.D.Gayathri Devi Assistant Professor

Department of Computer Science Sri Ramakrishna College of Arts and Science for Women Coimbatore-44

[email protected]

G.K Purnimaa Research Scholar

Department of Computer Science Sri Ramakrishna College of Arts and Science for Women Coimbatore-44

[email protected]

Abstract

Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase change- and fault-proneness. Different tools have been proposed for code smell detection, each characterized by particular features. The aim of this paper is to describe different tools for code smell detection and to evaluate the accuracy of each tool in detecting five code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy. The proposed work detects smells in source code using data mining techniques, namely HIST and association rules. Association rules are evaluated using support and confidence. The algorithm used here is Apriori, which joins itemsets and performs a breadth-first search to find itemsets that occur repeatedly; it then needs to scan the whole dataset to find the code smells that occur in the source code. The Aggregate Function Based Enhanced Apriori (AFEA) algorithm uses minimum support and confidence to detect code smells in different classes. For each code smell, performance is measured using recall, precision, and F-measure.

1. Introduction

Code smells are a concept used to characterize source code that suffers from structural deficiencies that make it hard to understand, change, or test [1]. Fowler et al. introduced code smells as indicators that the source code structure might need to be improved through refactoring. Research has addressed the detection [2] and correction [3] of code smells. Moreover, the negative impact of code smells on software development has been studied. Complementarily, Brown et al. introduced anti-patterns, which are related to code smells but describe shortcomings with more profound consequences (e.g., architectural problems) that are not limited to code. Despite the maturity of code smell and anti-pattern research for traditional software systems (especially object-oriented software), current approaches fall short when dealing with the variability of highly configurable software systems. A highly configurable software system (a.k.a. software product line (SPL)) implements not just a single program but a set of related programs (a program family), which are built from a common set of assets [4]. The commonalities and differences among members of this program family are communicated in terms of features, i.e., increments in functionality that are important to some stakeholder.

Code smells have been defined by Fowler [5] as symptoms of poor design and implementation choices. In some cases, such symptoms may originate from activities performed by developers while in a hurry, e.g., implementing urgent patches or simply making suboptimal choices. In other cases, smells come from recurring, poor design solutions, also known as anti-patterns [6]. For example, a Blob is a large and complex class that centralizes the behavior of a portion of a system and only uses other classes as data holders. Blob classes can rapidly grow out of control, making it harder and harder for developers to understand them, to fix bugs, and to add new features.

G K Purnima et al, Int.J.Computer Technology & Applications,Vol 7 (1),142-151

IJCTA | Jan-Feb 2016 Available [email protected]

ISSN:2229-6093

Association rule mining aims to find frequent rules that define relations between seemingly unrelated frequent items in databases, and it relies on two main measures: support and confidence. A frequent itemset is an itemset whose support is greater than or equal to a minimum support threshold, and a frequent rule is a rule whose confidence is greater than or equal to a minimum confidence threshold. These threshold values are traditionally assumed to be available for mining frequent itemsets. Association rule mining thus amounts to finding all rules whose support and confidence exceed the minimum support and minimum confidence thresholds.
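The two measures above can be illustrated with a minimal Python sketch. It assumes that each transaction is the set of methods changed together in one commit; the function names and the sample commits are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(A and B together) / support(A); 0.0 when A never occurs."""
    a = frozenset(antecedent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0
    return support(a | frozenset(consequent), transactions) / sup_a


# Hypothetical change history: each set lists methods changed in one commit.
commits = [
    frozenset({"parse", "validate"}),
    frozenset({"parse", "validate", "render"}),
    frozenset({"parse"}),
    frozenset({"render"}),
]

rule_support = support({"parse", "validate"}, commits)
rule_confidence = confidence({"parse"}, {"validate"}, commits)
```

A rule such as parse → validate would be kept only if both values reach the chosen minimum support and minimum confidence.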

In this work, we perform association rule mining using a well-known algorithm, namely Apriori. Note that the minimum support and confidence required to consider an association rule valid can be set in the Apriori algorithm. Once Historical Information for Smell deTection (HIST) detects these change rules between methods of the same class, it identifies classes affected by Divergent Change as those containing at least two sets of methods with the following characteristics:

1) The cardinality of the set is at least a given threshold.

2) All methods in the set change together, as detected by the association rules.

3) Each method in the set does not change with methods in other sets, as detected by the association rules.
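The three conditions above can be sketched in Python as follows. This is only an illustration under stated assumptions: association rules are simplified to unordered method pairs, the sets are approximated by connected components of the co-change graph, and the names `min_set_size`, `build_graph`, `components`, and `is_divergent_change` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(methods: Iterable[str],
                rules: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected co-change graph: an edge means a rule links the two methods."""
    graph: Dict[str, Set[str]] = {m: set() for m in methods}
    for a, b in rules:
        if a in graph and b in graph:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components; methods in different components never co-change."""
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            m = stack.pop()
            if m in group:
                continue
            group.add(m)
            stack.extend(graph[m] - group)
        seen |= group
        groups.append(group)
    return groups


def is_divergent_change(methods: Iterable[str],
                        rules: Iterable[Tuple[str, str]],
                        min_set_size: int) -> bool:
    """True when at least two method sets satisfy conditions 1) to 3)."""
    graph = build_graph(methods, rules)
    qualifying = 0
    for group in components(graph):
        if len(group) < min_set_size:          # condition 1: cardinality
            continue
        # condition 2: every pair in the set changes together
        if all(b in graph[a] for a, b in combinations(group, 2)):
            qualifying += 1                    # condition 3 holds by construction
    return qualifying >= 2
```

Using components keeps condition 3 trivially true, at the cost of rejecting a group in which only some pairs co-change; a real detector might split such a group instead.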

We developed an algorithm that converts the HIST information into a bitmap, i.e., an array of zeros and ones. The support value is calculated for each function and variable. The minimum support (min-supp) value is calculated based on aggregation functions: Simple Mean, Mean Square Error, and Standard Deviation. Rarely changed methods and variables are those whose low support value lies in the extra-small region defined by the Standard Deviation parameter. After pruning these elements (non-changed variables), whose support is below the min-supp value, the frequently changed variables and functions are used to generate rules; the confidence value is then calculated for each rule, together with the minimum confidence (min-conf) value.
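The bitmap and threshold steps can be sketched as below. The paper does not state how Simple Mean, Mean Square Error, and Standard Deviation are combined, so the threshold used here (mean minus one population standard deviation) is an assumption made only for illustration, and all names and sample data are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Set


def to_bitmap(transactions: Sequence[Set[str]], items: Sequence[str]) -> List[List[int]]:
    """One row per transaction, one 0/1 column per item."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def item_supports(bitmap: List[List[int]]) -> List[float]:
    """Column sums divided by the number of transactions."""
    n = len(bitmap)
    return [sum(column) / n for column in zip(*bitmap)]


def aggregate_min_support(values: Sequence[float]) -> float:
    """ASSUMPTION: min-supp = mean - one standard deviation, floored at 0."""
    return max(0.0, mean(values) - pstdev(values))


# Hypothetical history: each set holds the elements changed in one commit.
items = ["a", "b", "c", "d", "e"]
history = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c"},
    {"a", "b", "d"},
]

bitmap = to_bitmap(history, items)
supports = item_supports(bitmap)
min_supp = aggregate_min_support(supports)
# Elements below the threshold are pruned before rule generation.
frequent = [item for item, s in zip(items, supports) if s >= min_supp]
```

With this data only `e`, changed in a single commit, falls below the threshold; a different aggregate would move the cut-off.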

The rest of the paper is organized as follows. Section 2 discusses the proposed Aggregate Function Based Enhanced Apriori for code smell detection. Section 4 discusses the experiments and results. Finally, Section 5 concludes the paper.

2. Proposed Methodology

Code and design smells include low-level or local problems such as code smells, which are usually symptoms of more global design smells such as anti-patterns. The problem with the existing system is the low accuracy of code smell detection and the resulting poor software quality. To improve detection accuracy and software quality, the proposed work uses the minimum absolute support value. The proposed work builds on the HIST technique and identifies six different code smells, namely Duplicated Code, Long Method, Inappropriate Intimacy, Refused Bequest (override), Cyclomatic Complexity, and Down Casting. The performance of the enhanced Apriori algorithm is evaluated using F-measure, precision, and recall. The result is improved software quality and better code smell detection performance.

The detection of smells can substantially reduce the cost of subsequent activities in the development and maintenance phases.

3.1 Proposed Code Smells

In this section, the different code smells are described and the detection process of each code smell is outlined.

Duplicate Code

Identical or very similar code exists in more than one location. Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons. A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. The following are some of the ways in which two code sequences can be duplicates of each other:

• character-for-character identical
• character-for-character identical with white space characters and comments being ignored
• token-for-token identical
• token-for-token identical with occasional variation (i.e., insertion/deletion/modification of tokens)
• functionally identical

Long Method

A method, function, or procedure that has grown too large. Such a method is too long and makes it difficult to understand the changes that occur in its class. The Long Method smell is similar to the Brain Method smell, which tends to centralize the functionality of a class in the same way as a God Class centralizes the functionality of an entire subsystem, or sometimes even a whole system. Measuring long methods should be quite easy. However, relying on an overly simple size measure such as NLOC will give misleading results because, for example, initialization methods can often be quite long. There is no sense in refactoring long initialization methods, because they usually have very low cyclomatic complexity and are therefore easy to understand and modify.

Inappropriate Intimacy

A class that has dependencies on implementation details of another class. This smell indicates that a class has too many dependencies on implementation details of another class. A related spreadsheet smell would be a worksheet that is overly related to a second worksheet. This is possibly unhealthy for several reasons. First, adapting one sheet likely requires inspecting the other worksheet, requiring the spreadsheet user to switch back and forth between the two worksheets and increasing the chance that errors are made. Secondly, since there is a strong semantic connection between the two worksheets, the fact that they are split could hinder understandability.

Refused Bequest

A class that overrides a method of a base class in such a way that the contract of the base class is not honored by the derived class, which violates the Liskov substitution principle. Refused Bequest refers to inappropriate use of inheritance in object-oriented systems. This code smell occurs when subclasses do not take advantage of the inherited behavior, implying that replacement by delegation should be used instead.

Cyclomatic Complexity

Too many branches or loops; this may indicate that a function needs to be broken up into smaller functions, or that it has potential for simplification. The cyclomatic complexity of a program is a structural (or topological) measure of the program's complexity used for measuring software quality. Measuring cyclomatic complexity evaluates the quality of the program code and detects high-complexity procedures. High-complexity procedures are error-prone, and detecting them is important for guiding code review. Cyclomatic complexity was the first topological complexity measure used in practice and became the basis for many variants. Measuring cyclomatic complexity is a static code analysis method.

Down Casting

A type cast which breaks the abstraction model; the abstraction may have to be refactored or eliminated. Downcasting, or type refinement, is the act of casting a reference of a base class to one of its derived classes.
In many programming languages, it is possible to check through type introspection whether the type of the referenced object is indeed the one being cast to or a derived type of it, and thus to issue an error if it is not. In other words, downcasting is possible when a variable of the base class (parent class) holds a value of the derived class (child class).

3.2 Detection of Proposed Code Smells

The set of fine-grained changes computed by the Change History Extractor is provided as input to the Code Smell Detector, which identifies the list of code components (if any) affected by specific smells. While the exploited underlying information is the same for all target smells (i.e., the change history information), AFEA with HIST uses custom detection heuristics for each smell. Note that, since AFEA with HIST relies on the analysis of change history information, it is possible that a class or method that appeared to be affected by a smell in the past does not exist in the current version of the system, e.g., because it has been refactored by the developers. Thus, once AFEA with HIST identifies a component that is affected by a smell, it checks for the presence of this component in the current version of the system under analysis before presenting the results to the user. If the component no longer exists, AFEA with HIST removes it from the list of components affected by smells.

Enhanced Apriori Algorithm

We developed an algorithm that converts the database into a bitmap, i.e., an array of zeros and ones. The support value is calculated for each element, followed by the minimum support (min-supp) value. The minimum support value is calculated based on aggregation functions: Simple Mean, Mean Square Error, and Standard Deviation. Infrequent items are the items whose low support value lies in the extra-small region defined by the Standard Deviation parameter. After pruning the elements (non-frequent items) whose support is below the min-supp value, we use the frequent items to generate rules, and then calculate the confidence value for each rule and the minimum confidence (min-conf) value. The minimum confidence value is likewise calculated based on Simple Mean, Mean Square Error, and Standard Deviation. Rules whose confidence is below min-conf are pruned (pruning of non-frequent rules). The next pass then generates further rules by combining frequent rules with frequent items.

Input: Transaction database D
Output: Non-coincidental frequent itemsets

For all transactions t ∈ D {
    Ct = subset(C1, t);
    For all codes c ∈ Ct
        c.count++;
}
L1 = MinSup(C1);
For (k = 2; Lk-1 ≠ ∅; k++) {
    Ck = Apriori(Lk-1);
    For all transactions t ∈ D {
        Ct = subset(Ck, t);
        For all codes c ∈ Ct
            c.count++;
    }
    Lk = MinSup(Ck);
}
Return ∪k Lk
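The level-wise loop of the pseudocode can be made concrete with a small Python sketch. It is not the authors' implementation: a fixed absolute threshold `min_count` stands in for the aggregate-based MinSup step, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def count_support(candidates: Iterable[Itemset],
                  transactions: List[Set[str]]) -> Dict[Itemset, int]:
    """Number of transactions containing each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def generate_candidates(prev_level: Set[Itemset], k: int) -> Set[Itemset]:
    """Join (k-1)-itemsets; keep size-k unions whose (k-1)-subsets are all frequent."""
    candidates: Set[Itemset] = set()
    for a, b in combinations(prev_level, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in prev_level for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least `min_count`."""
    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {c: n for c, n in count_support(singletons, transactions).items()
             if n >= min_count}
    frequent = dict(level)
    k = 2
    while level:  # corresponds to the loop condition Lk-1 != empty set
        candidates = generate_candidates(set(level), k)
        level = {c: n for c, n in count_support(candidates, transactions).items()
                 if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent
```

Replacing `min_count` with a threshold recomputed from the support values of each level would bring the sketch closer to the aggregate-based variant described above.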

3.3 Apriori Inverse

The Apriori-Inverse algorithm [18] is based on a level-wise search. On the first pass through the database, an inverted index is built using the unique items as keys and the transaction IDs as data. At this point, the support of each unique item (1-itemset) in the database is available as the length of its data chain. To generate k-itemsets under max-sup, the (k - 1)-itemsets are extended in precisely the same manner as in Apriori to generate candidate k-itemsets. That is, a (k - 1)-itemset i1 is turned into a k-itemset by finding another (k - 1)-itemset i2 that has a matching prefix of size (k - 2), and attaching the last item of i1 to i2. For example, the 3-itemsets {1, 3, 4} and {1, 3, 6} can be extended to form the 4-itemset {1, 3, 4, 6}, but {1, 3, 4} and {1, 2, 5} will not produce a 4-itemset because their prefixes fail to match at the second item. These candidates are then checked against the inverted index to ensure that they at least meet a minimum absolute support requirement, and are pruned if they do not (the length of the intersection of the data chains in the inverted index provides the support for a k-itemset with k larger than 1). The process continues until no candidate itemsets can be generated, and then association rules are formed in the usual way.

Apriori-Inverse finds all perfectly sporadic rules, since it simply inverts the downward-closure principle of the Apriori algorithm: rather than all subsets of rules being over min-sup, all subsets are under max-sup. Since making a candidate itemset longer cannot increase its support, all extensions are viable except those that fall under the minimum absolute support requirement. Those exceptions are pruned out and are not used to extend itemsets in the next round.

For example, let D be {{1, 2, 3, 4}, {1, 3, 5}, {1, 3, 5, 7}, {1, 6, 8}, {2, 3, 4, 6}, {3, 6, 7, 8}, {3, 6, 8}, {6, 9}}. The inverted index Idx built from D, in the form {item: [tid-list]}, is {{1: [1, 2, 3, 4]}, {2: [1, 5]}, {3: [1, 2, 3, 5, 6, 7]}, {4: [1, 5]}, {5: [2, 3]}, {6: [4, 5, 6, 7, 8]}, {7: [3, 6]}, {8: [4, 6, 7]}, {9: [8]}}. Given a maximum support of 25% and supposing that the minimum absolute support value is 2, S1 will be {2, 4, 5, 7}. Items below the minimum absolute support value are not considered for extension; thus item 9, which has a support of 1, is pruned out. The itemsets are then extended to {{2, 4}, {2, 5}, {2, 7}, {4, 5}, {4, 7}, {5, 7}}, but S2 contains only the itemset {2, 4}, because the other itemsets have support below the minimum absolute support value and so are pruned out.

Because we are dealing with candidate itemsets with low support, the chance that an itemset appears due to noise or just by coincidence is higher than for candidate itemsets with higher support. Itemsets that occur within the database due to coincidence do not add any meaningful information and, therefore, should not be considered when we are searching for rare itemsets using Apriori-Inverse. The minimum absolute support value is used to filter out these candidate items. The value varies for different itemsets; the minimum absolute support value for items that have a higher support is generally higher. The minimum absolute support value depends solely on the support of the individual items.

3.3.1 Minimum Absolute Support Value

When searching for rare itemsets, two circumstances are considered: occurrences of itemsets due to some nonrandom process that is generating them, or occurrences of itemsets by coincidence. It is important to distinguish the two, as itemsets that have a low support but high confidence and seem interesting may be occurring by chance and should be considered noise. Clearly, it makes sense to consider only candidate itemsets that appear together more often than by coincidence. Coincidence is defined in this manner: for N transactions in which antecedent A occurs in a transactions and consequent B occurs in b transactions, the probability that A and B will occur together exactly c times by chance can be calculated. We refer to this as the probability of chance collision, Pcc, which can be calculated using equation (3). The probability that A and B will occur together exactly c times is:

$$P_{cc}(c \mid N, a, b) = \frac{\binom{a}{c}\binom{N-a}{b-c}}{\binom{N}{b}}$$

This equation is the usual calculation of the exact probability of a 2×2 contingency table. Now, we want the least number of collisions above which Pcc is smaller than some small value p (usually 0.001). This is:

$$\mathrm{MinAbsSup}(N, a, b) = \min\left\{ c \;\middle|\; \sum_{i=0}^{c} P_{cc}(i \mid N, a, b) > 1 - p \right\}$$

This formula amounts to inverting the usual sense of Fisher's exact test [19]. Usually a 2×2 contingency table is provided and a p-value is calculated; here, however, we provide two of the four values and a p-value, and calculate the minimum value needed to complete the table.
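The chance-collision probability and the resulting minimum absolute support can be sketched in Python as follows. The probability is the standard hypergeometric term; the cumulative-threshold reading of the minimum absolute support (the smallest c whose cumulative probability exceeds 1 - p) is an assumption about the intended formula, and the function names are hypothetical.

```python
from math import comb


def p_cc(c: int, n: int, a: int, b: int) -> float:
    """Probability that A (a transactions) and B (b transactions)
    co-occur exactly c times by chance among n transactions."""
    if c < 0 or c > a or b - c < 0 or b - c > n - a:
        return 0.0
    return comb(a, c) * comb(n - a, b - c) / comb(n, b)


def min_abs_sup(n: int, a: int, b: int, p: float = 0.001) -> int:
    """ASSUMPTION: smallest c whose cumulative chance probability exceeds 1 - p."""
    cumulative = 0.0
    for c in range(min(a, b) + 1):
        cumulative += p_cc(c, n, a, b)
        if cumulative > 1.0 - p:
            return c
    return min(a, b)
```

For eight transactions with a = b = 2, `min_abs_sup(8, 2, 2)` returns 2 under this reading, since two co-occurrences are needed before the cumulative probability exceeds 0.999.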


To define a detection strategy for the Shotgun Surgery smell, we exploited the following conjecture: a class affected by Shotgun Surgery contains at least one method that changes together with several other methods contained in other classes. In this case too, the Code Smell Detector uses association rules to detect methods (here, methods from different classes) that often change together. Hence, a class is identified as affected by Shotgun Surgery if it contains at least one method that changes with methods present in more than δ different classes.

4. Results and Discussion

The goal of the study is to evaluate AFEA with HIST against HIST, with the purpose of analyzing its effectiveness in detecting smells in software systems. The quality focus is on detection accuracy and completeness as compared to approaches based on the analysis of a single project snapshot, while the perspective is that of researchers who want to evaluate the effectiveness of historical information in identifying smells in order to build better recommenders for developers.

Context Selection

The context of the study consists of twenty software projects. The analyzed systems are characterized by the software history that we investigated and their size range (in terms of KLOC and number of classes). Among the analyzed projects we have:

• Nine projects belonging to the Apache ecosystem: ANT, TOMCAT, COMMONS LANG, CASSANDRA, COMMONS CODEC, DERBY, JAMES MIME4J, COMMONS IO, and COMMONS LOGGING.

• Five projects belonging to the Android APIs: FRAMEWORK-OPT-TELEPHONY, FRAMEWORKS-BASE, FRAMEWORKS-SUPPORT, SDK, and TOOL-BASE. Each of these projects is responsible for implementing parts of the Android APIs. For example, framework-opt-telephony provides APIs for developers of Android apps, allowing them to access services such as texting.

• Six open source projects from elsewhere: JEDIT, ECLIPSE CORE, GOOGLE GUAVA, AARDVARK, AND ENGINE, and MONGO DB.

Performance Measures

$$\mathit{recall} = \frac{|\mathit{correct} \cap \mathit{detected}|}{|\mathit{correct}|}\,\%$$

$$\mathit{precision} = \frac{|\mathit{correct} \cap \mathit{detected}|}{|\mathit{detected}|}\,\%$$

where correct and detected represent the set of true positive smells (those manually identified) and the set of smells detected by HIST, respectively. As an aggregate indicator of precision and recall, we report the F-measure, defined as the harmonic mean of precision and recall:

$$F\text{-}\mathit{measure} = 2 \cdot \frac{\mathit{precision} \cdot \mathit{recall}}{\mathit{precision} + \mathit{recall}}\,\%$$

The f-measure performance is evaluated for Blob, Feature Envy, Divergent Change, Shotgun Surgery, Duplicated code, Long method, Inappropriate intimacy, Refused bequest, Cyclomatic complexity, Downcasting.
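The three performance measures can be computed as in the following minimal Python sketch. The function name and the sample class names are hypothetical; `correct` is the manually identified set and `detected` is the tool output, as defined above.

```python
from typing import Set, Tuple


def precision_recall_f(correct: Set[str], detected: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) as percentages."""
    true_positives = len(correct & detected)
    recall = 100.0 * true_positives / len(correct) if correct else 0.0
    precision = 100.0 * true_positives / len(detected) if detected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical oracle and tool output for one smell type.
oracle = {"ClassA", "ClassB", "ClassC", "ClassD"}
reported = {"ClassA", "ClassB", "ClassE"}
precision, recall, f_measure = precision_recall_f(oracle, reported)
```

Here two of the three reported classes are correct and two of the four oracle classes are found, so precision is about 66.7%, recall is 50%, and the F-measure is about 57.1%.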

Table 1: Snapshots considered for the smell detection

| Project | git snapshot | Date | Classes | KLOC |
|---|---|---|---|---|
| Apache Ant | da641025 | Jun 2006 | 846 | 173 |
| Apache Tomcat | 398ca7ee | Jun 2010 | 1,284 | 336 |
| jEdit | feb608el | Aug 2005 | 316 | 101 |
| Android API (framework-opt-telephony) | b3a03455 | Feb 2012 | 223 | 75 |
| Android API (frameworks-base) | b4ff35df | Nov 2011 | 2,766 | 770 |
| Android API (frameworks-support) | 0f6f72e1 | Jun 2012 | 246 | 59 |
| Android API (sdk) | 6feca9ac | Nov 2011 | 268 | 54 |
| Android API (tool-base) | cfebaa9b | Dec 2012 | 532 | 119 |
| Apache Commons Lang | 4af8bf41 | Jul 2009 | 233 | 76 |
| Apache Cassandra | 4f9e551 | Sep 2011 | 826 | 117 |
| Apache Commons Codec | c6c8ae7a | Jul 2007 | 103 | 23 |
| Apache Derby | 562a9252 | Jun 2006 | 1,746 | 166 |
| Eclipse Core | 0eb04df7 | Dec 2004 | 1,190 | 162 |
| Apache James Mime4j | f4ad2176 | Mar 2009 | 250 | 280 |
| Google Guava | e8959ed0 | Aug 2012 | 153 | 16 |
| Aardvark | ff98d508 | Jun 2012 | 103 | 25 |
| And Engine | f25236e4 | Oct 2011 | 596 | 20 |
| Apache Commons IO | c8cb451c | Oct 2010 | 108 | 27 |
| Apache Commons Logging | d821ed3e | May 2005 | 61 | 23 |
| Mongo DB | b67c0c43 | Oct 2011 | 22 | 25 |

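Table 2: Compared values for HIST and AFEA with HIST for Duplicated code

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 14 | 25 |
| 2 | 55 | 58 |
| 3 | 63 | 67 |
| 4 | 65 | 69 |
| 5 | 68 | 73 |
| 6 | 52 | 58 |
| 7 | 32 | 45 |
| 8 | 22 | 31 |
| 9 | 15 | 18 |
| 10 | 1 | 10 |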

Fig 1: Comparison of HIST Duplicated code and AFEA with HIST Duplicated code (τ). Axes: number of subjects (x) versus F-measure (y); series: HIST and AFEA with HIST.


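Table 3: Compared values for HIST and AFEA with HIST for Long method

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 10 | 25 |
| 2 | 58 | 65 |
| 3 | 60 | 63 |
| 4 | 53 | 58 |
| 5 | 22 | 45 |
| 6 | 18 | 32 |
| 7 | 10 | 22 |
| 8 | 5 | 18 |
| 9 | 3 | 10 |
| 10 | 0 | 5 |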

Fig 2: Comparison of HIST Long method and AFEA with HIST Long method (ω). Axes: number of subjects (x) versus F-measure (y); series: HIST and AFEA with HIST.

Table 4: Compared values for HIST and AFEA with HIST for Inappropriate Intimacy

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 12 | 20 |
| 2 | 48 | 55 |
| 3 | 56 | 61 |
| 4 | 61 | 67 |
| 5 | 66 | 71 |
| 6 | 48 | 64 |
| 7 | 28 | 55 |
| 8 | 18 | 31 |
| 9 | 9 | 18 |
| 10 | 2 | 10 |

Fig 3: Comparison of HIST Inappropriate intimacy and AFEA with HIST Inappropriate intimacy (S). Axes: number of subjects (x) versus F-measure (y); series: HIST and AFEA with HIST.


Table 5: Compared values for HIST and AFEA with HIST for Refused bequest

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 15 | 27 |
| 2 | 63 | 67 |
| 3 | 68 | 71 |
| 4 | 57 | 67 |
| 5 | 25 | 47 |
| 6 | 20 | 35 |
| 7 | 13 | 25 |
| 8 | 7 | 21 |
| 9 | 5 | 12 |
| 10 | 3.5 | 8 |

Fig 4: Comparison of HIST Refused bequest and AFEA with HIST Refused bequest (U)

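Table 6: Compared values for HIST and AFEA with HIST for Cyclomatic Complexity

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 16 | 25 |
| 2 | 60 | 65 |
| 3 | 70 | 75 |
| 4 | 67 | 69 |
| 5 | 63 | 65 |
| 6 | 57 | 60 |
| 7 | 48 | 55 |
| 8 | 32 | 41 |
| 9 | 18 | 25 |
| 10 | 10 | 12 |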

Fig 5: Comparison of HIST Cyclomatic complexity and AFEA with HIST Cyclomatic complexity (V). Axes: number of subjects (x) versus F-measure (y); series: HIST and AFEA with HIST.


Table 7: Compared values for HIST and AFEA with HIST for Downcasting

| Classes | HIST (%) | AFEA with HIST (%) |
|---|---|---|
| 1 | 12 | 27 |
| 2 | 60 | 67 |
| 3 | 62 | 65 |
| 4 | 55 | 60 |
| 5 | 24 | 47 |
| 6 | 20 | 34 |
| 7 | 18 | 24 |
| 8 | 7 | 20 |
| 9 | 5 | 12 |

Results of the calibration are reported in Fig. 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 for the HIST parameters α, β, γ, δ, τ, ω, S, U, V, and W. As for the confidence and support, the calibration was not different from what was done in other work using association rule discovery. In particular, we tried all combinations of confidence and support obtained by varying the confidence between 0.60 and 0.90 in steps of 0.05 and the support between 0.004 and 0.04 in steps of 0.004, searching for the combination ensuring the best F-measure value on XERCES. Table 1 summarizes the calibration process, reporting the values for each parameter that we experimented with and the values that achieved the best results.

Accuracy Performance

Fig 7 shows the accuracy comparison of the existing HIST and the proposed AFEA with HIST. Accuracy is measured for six detection techniques: Blob, Feature Envy, Divergent Change, Shotgun Surgery, Duplicated Code, and Long Method. The comparison shows that the accuracy of the proposed AFEA with HIST is higher than that of the existing HIST.
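The confidence and support grid from the calibration paragraph above can be enumerated with a short Python sketch. The helper names are hypothetical, and `toy_f_measure` is only a placeholder for running the detector and measuring its F-measure on the calibration system.

```python
from typing import Callable, List, Tuple


def stepped_range(start: float, stop: float, step: float, digits: int = 3) -> List[float]:
    """Inclusive range with rounding to avoid floating-point drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, digits) for i in range(count)]


confidences = stepped_range(0.60, 0.90, 0.05)   # 0.60, 0.65, ..., 0.90
supports = stepped_range(0.004, 0.04, 0.004)    # 0.004, 0.008, ..., 0.04
grid = [(c, s) for c in confidences for s in supports]


def best_setting(settings: List[Tuple[float, float]],
                 f_measure: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return the (confidence, support) pair with the highest F-measure."""
    return max(settings, key=lambda cs: f_measure(*cs))


def toy_f_measure(conf: float, sup: float) -> float:
    """Placeholder score (an assumption), peaking at confidence 0.75 and support 0.02."""
    return -(abs(conf - 0.75) + abs(sup - 0.02))


chosen = best_setting(grid, toy_f_measure)
```

The grid contains 7 confidence values and 10 support values, so 70 combinations are evaluated in total.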

Fig 6: Comparison of HIST Downcasting and AFEA with HIST Downcasting W

Code smell               HIST (%)   AFEA with HIST (%)
Duplicated code          74         75.65
Long method              69         71
Inappropriate intimacy   65         66.5
Refused bequest          73         74.5
Cyclomatic complexity    75         75.6
Downcasting              69         70.12

[Plot: detection accuracy (%) (y-axis, 58 to 78) for each code smell detection technique (x-axis)]

Fig 7: Accuracy comparison of HIST and AFEA with HIST of different detection techniques


5. Conclusion

The detection of smells is important to improve the quality of software systems, to facilitate their evolution, and thus to reduce the overall cost of their development and maintenance. We proposed the following improvements to previous HIST work. First, we introduced AFEA, a method that embodies all the steps necessary to define detection techniques. AFEA with HIST is evaluated through two empirical studies. Six types of code smells were used to evaluate the proposed scheme: Blob, Feature Envy, Divergent Change, Shotgun Surgery, Duplicated code, and Long method. The first study, conducted on twenty open source projects, aimed at assessing the accuracy of Aggregate Function Based Enhanced Apriori (AFEA) in detecting instances of the code smells mentioned above. The results indicate that the precision and recall of AFEA are increased. The results of the first study also indicate that AFEA is able to identify code smells that cannot be identified by competitive approaches based solely on code analysis of a single snapshot of the system. We then conducted a second study aimed at investigating to what extent the code smells detected by AFEA (and by competitive code analysis techniques) reflect developers' perception of poor design and implementation choices. Experimental results show that the proposed AFEA with HIST achieves higher accuracy and F-measure than the existing HIST.

Future work includes using the WORDNET dictionary, using existing tools to improve the implementation of our method, improving the quality and performance of the source code of the generated detection algorithms, computing the recall on other systems, applying our detection technique to other kinds of smells, and quantitatively comparing our method with previous work. With respect to the last item, we are currently conducting a study of smell detection tools, including RevJava, FindBugs, PMD, Hammurapi, and Lint4j, to compare our detection technique against existing tools.

6. References

[1] M. Fowler, K. Beck, J. Brant, and W. Opdyke, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.

[2] E. van Emden and L. Moonen, “Java quality assurance by detecting code smells,” in WCRE. IEEE, 2002, pp. 97–106.

[3] K. Czarnecki and U. W. Eisenecker, Generative Programming. Addison-Wesley, 2000.

[4] M. Fowler, Refactoring: improving the design of existing code. Addison-Wesley, 1999.

[5] W. J. Brown, R. C. Malveau, W. H. Brown, H. W. McCormick III, and T. J. Mowbray, Anti Patterns: Refactoring Software, Architectures, and Projects in Crisis, 1st ed. John Wiley and Sons, 1998.

[6] N. Moha, Y.-G. Guéhéneuc, L. Duchien, and A.-F. Le Meur, “DECOR: A method for the specification and detection of code and design smells,” IEEE Trans. Softw. Eng., vol. 36, no. 1, pp. 20–36, Jan./Feb. 2010.

[7] E. Murphy-Hill and A. P. Black, “Refactoring Tools: Fitness for Purpose,” IEEE Transactions on Software Engineering, vol. 30, no. 2, September 2008.

[8] M. Mantyla, J. Vanhanen, and C. Lassenius, “A Taxonomy and Initial Empirical Study of Bad Smells in Code,” Proceedings of the International Conference on Software Maintenance, 2003, pp. 381.

[9] S. Counsell, H. Hamza, and R. M. Hierons, “An Empirical Investigation of Code Smell 'Deception' and Research Contextualisation through Paul's Criteria,” Journal of Computing and Information Technology - CIT, vol. 18, no. 4, 2010, pp. 333-340.

[10] F. Khomh, M. D. Penta and Y. Guéhéneuc, “An Exploratory Study of the Impact of Code Smells on Software Change proneness”, Proceedings of the 16th Working Conference on Reverse Engineering (WCRE), Lille, France, IEEE Computer Society Press, (2009) October 13-16.

[11] N. Moha, Y.-G. Guéhéneuc, L. Duchien, and A.-F. Le Meur, “DECOR: A method for the specification and detection of code and design smells,” IEEE Transactions on Software Engineering, vol. 36, no. 1, January-February 2010, pp. 20-36.

[12] A. Chatzigeorgiou and A. Manakos, “Investigating the Evolution of Bad Smells in Object-Oriented Code,” Quality of Information and Communications Technology (QUATIC), 2010 Seventh International Conference, September 29-October 2, 2010, pp. 106-115.

[13] F. Fontana, E. Mariani, A. Mornioli, R. Sormani and A. Tonello, “An Experience Report on Using Code Smells Detection Tools”, ICSTW '11: Proceedings of the 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops, (2011).

[14] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” in ICSE ’04: Proceedings of the 26th International Conference on Software Engineering, 2004, pp. 563–572.

[15] Eva van Emden and Leon Moonen. Java quality assurance by detecting code smells. In Proceedings 9th Working Conference on Reverse Engineering (WCRE 2002), Richmond, Virginia, USA, 2002. IEEE Computer Society.

[16] T. Scheffer, “Finding association rules that trade support optimally against confidence,” in L. De Raedt and A. Siebes, eds., Principles of Data Mining and Knowledge Discovery, vol. 2168 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2001, pp. 424–435. doi:10.1007/3-540-44794-6_35.
