[ACM Press the 27th IEEE/ACM International Conference - Essen, Germany (2012.09.03-2012.09.07)] Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering

Identifying Refactoring Sequences for Improving Software Maintainability

Panita Meananeatra
Computer Science Department, Thammasat University, Pathumthani, Thailand

Software Engineering Laboratory, National Electronics and Computer Technology Center, Pathumthani, Thailand

[email protected]

ABSTRACT
Refactoring is a well-known technique that preserves software behavior while improving bad structures, or bad smells, in code. In most cases, more than one bad smell is found in a program, so developers frequently apply refactorings more than once. By applying an appropriate refactoring sequence, an ordered list of refactorings, developers can remove bad smells, reduce improvement time and produce highly maintainable software. According to our 2011 survey, developers consider four main criteria when selecting an optimal refactoring sequence: 1) the number of removed bad smells, 2) maintainability, 3) the size of the refactoring sequence and 4) the number of modified program elements. A refactoring sequence that satisfies these four criteria produces code without bad smells, with higher maintainability, using the least improvement effort and time, and providing more traceability. Some existing works suggest a list of refactorings without ordering, and others suggest refactoring sequences. However, these works do not consider the four criteria discussed earlier. Therefore, our research proposes an approach to identify an optimal refactoring sequence that meets these criteria. In addition, it is expected that the findings will reduce maintenance time and cost, increase maintainability and enhance software quality.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - restructuring, reverse engineering, and reengineering

General Terms
Algorithm

Keywords
Refactoring sequence, bad smell, maintainability and software maintenance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ASE ’12, September 3-7, 2012, Essen, Germany
Copyright 2012 ACM 978-1-4503-1204-2/12/09 ...$15.00.

1. INTRODUCTION
Recently, interest in eliminating bad structures in software, known as bad smells (such as duplicated code, large class or long method), has been growing in the community of software developers. Removing bad smells makes software easier to understand and modify. Moreover, it increases software maintainability and decreases maintenance time. A well-known technique that improves software structure without changing its behavior is called refactoring [1].

To remove a bad smell, developers must identify and apply appropriate refactorings. Different refactoring selections lead to differences in the number of modified program elements (such as classes, methods and variables) and in code quality. Therefore, identifying and applying appropriate refactorings not only reduces improvement time but also produces highly maintainable software. Generally, many bad smells are found in a program. Consequently, developers frequently apply refactorings more than once. The resulting ordered list of refactorings to be applied is called a refactoring sequence.

According to our survey [2], developers consider four main criteria to select an optimal refactoring sequence: 1) the number of removed bad smells, 2) maintainability, 3) the size of the refactoring sequence and 4) the number of modified program elements. An optimal sequence should produce code without bad smells, with higher maintainability, using the least improvement effort and time, and providing more traceability, that is, the understandability of changes [3].

Some existing works [4, 5, 6, 7, 8, 9] only suggest a list of refactorings without ordering, and others [3, 11, 12] suggest refactoring sequences. However, these works do not consider the four criteria above. Therefore, our research problem is “Can we find an optimal refactoring sequence that removes the bad smells, uses the least effort to understand refactored code and improves the maintainability?”

The primary contributions of our research are:

• Refactoring filtering conditions (RFC), which help developers in refactoring identification and program element identification.

• The optimal refactoring sequence selection, which helps developers to choose a refactoring sequence that satisfies a developer’s criteria.

This paper is organized as follows. Section 2 describes the proposed methodology to identify an optimal refactoring sequence. Section 3 explains related works about refactoring and sequence identification. Section 4 discusses an evaluation plan, and Section 5 presents research progress. Section 6 lists our publications.

2. PROPOSED METHODOLOGY
This section describes the scope of our approach and explains the steps of the proposed methodology.

2.1 The Scope of Our Approach
The scope is:

• Focus on refactoring sequence identification for removing the long method bad smell, which often occurs when a method is too long or has many variables, parameters or complex conditions. Our research aims to remove this bad smell because it leads to code that is difficult to understand and reuse [5, 10]. Moreover, it may introduce other bad smells such as large class or duplicated code [1].

• Use six refactorings (replace temp with query, introduce parameter object, preserve whole object, extract method, decompose conditional, replace method with method object) for removing the long method bad smell and one basic refactoring (move method).

• Assess four sub characteristics of ISO 9126’s maintainability through OO metrics.

2.2 The Steps of Proposed Methodology
This section explains the steps of our approach. At the beginning, developers provide code with bad smells and specify their objectives. In our approach, the objectives are defined by two kinds of criteria: 1) stop criteria, which decide when to stop the refactoring process, and 2) selection criteria, which are used to select an optimal refactoring sequence among the candidate sequences. For example, if a developer’s objective is to obtain more comprehensible code after refactoring, the process will stop when a sequence that transforms the original code into code with a higher changeability value is found. If two or more such sequences are generated, our approach uses the selection criteria to select the sequence with the higher changeability. After transforming the developer’s objective into stop and selection criteria, our approach identifies appropriate refactorings, applies each refactoring, and assesses whether the stop criteria are met. If the stop criteria are not satisfied, the approach repeats these steps on the refactored code obtained in the previous iteration.

During this process, our approach tracks the transformations of the original piece of code through a graph, as shown in Figure 1. Each state of the code is represented as a node in this graph. For example, the original piece of code is represented as the root node of the graph, P. The code after each refactoring is then represented as a new node. Each refactoring is represented as an edge in this graph. For example, the refactoring r1 is represented as an edge from node P to node Pr1, which represents the code after applying refactoring r1. Therefore, when our approach identifies and applies each new set of refactorings, it unfolds the graph by adding a new set of nodes and edges corresponding to the new states of the code obtained by applying each of the refactorings. In Figure 1, suppose that the four candidate refactoring sequences {r1, r2}, {r1, r3, r4}, {r2, r4} and {r3} meet the stop criteria. The approach selects an optimal refactoring sequence among these four sequences based on the selection criteria. Each step of our methodology is explained in detail below.

Figure 1: Refactoring graph.
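The refactoring graph described above can be pictured as a small data structure. The following is only a minimal sketch: the class name `Node`, the helper `sequences` and the tree shape (no merging of equivalent states) are assumptions made for illustration and are not taken from the prototype.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One state of the code, identified by the refactorings applied so far."""
    path: Tuple[str, ...] = ()  # the empty path stands for the root node P
    children: Dict[str, "Node"] = field(default_factory=dict)

    def apply(self, refactoring: str) -> "Node":
        """Add an edge labelled `refactoring` (if new) and return its target node."""
        child = self.children.get(refactoring)
        if child is None:
            child = Node(self.path + (refactoring,))
            self.children[refactoring] = child
        return child


def sequences(node: Node) -> List[Tuple[str, ...]]:
    """Collect all root-to-leaf paths, i.e. the explored refactoring sequences."""
    if not node.children:
        return [node.path]
    found: List[Tuple[str, ...]] = []
    for child in node.children.values():
        found.extend(sequences(child))
    return found


# Rebuild the four example sequences mentioned in the text.
root = Node()
root.apply("r1").apply("r2")
root.apply("r1").apply("r3").apply("r4")
root.apply("r2").apply("r4")
root.apply("r3")
example_sequences = sequences(root)
```

Here each leaf corresponds to one candidate sequence; a real implementation would also attach the refactored code and its metric values to each node.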

2.2.1 Transforming Developer Objectives
Our approach allows developers to specify their refactoring objectives by determining the number of bad smells to remove and the priorities of all sub characteristics of maintainability listed in Table 1, on a scale from one to four, where four means very important and one means unimportant. From a given objective, we derive the threshold values for the two metrics used to define the stop criteria: 1) the number of removed bad smells and 2) the maintainability value of the code after refactoring. Suppose that a developer’s objective is to remove all bad smells in the original code, with the priority of changeability set to four and the priorities of the others set to one. This fixes the threshold value for the first metric at the total number of bad smells detected in the original code. For the second metric, we infer that the changeability value of the refactored code must be higher than that of the original code, while the values of the other sub characteristics for the refactored code may be higher or lower than those of the original code.
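The derivation of stop criteria from a developer objective can be sketched as follows. This is an illustrative reading of the example above, not the prototype’s logic: the names `StopCriteria`, `derive_stop_criteria` and `is_satisfied` are hypothetical, and the rule that only sub characteristics with priority four must improve is an assumption generalized from the single example in the text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

SUB_CHARACTERISTICS = ("analyzability", "changeability", "stability", "testability")


@dataclass(frozen=True)
class StopCriteria:
    min_removed_smells: int        # threshold for metric 1
    must_improve: FrozenSet[str]   # sub characteristics whose value must increase


def derive_stop_criteria(smells_to_remove: int,
                         priorities: Dict[str, int],
                         top_priority: int = 4) -> StopCriteria:
    """Assumption: only sub characteristics rated `top_priority` must improve."""
    must = frozenset(name for name in SUB_CHARACTERISTICS
                     if priorities.get(name, 1) == top_priority)
    return StopCriteria(smells_to_remove, must)


def is_satisfied(criteria: StopCriteria,
                 removed_smells: int,
                 original: Dict[str, float],
                 refactored: Dict[str, float]) -> bool:
    """Check both thresholds for one refactored version of the code."""
    if removed_smells < criteria.min_removed_smells:
        return False
    return all(refactored[name] > original[name] for name in criteria.must_improve)


# Example objective: remove all 3 detected smells, changeability is very important.
criteria = derive_stop_criteria(
    3, {"analyzability": 1, "changeability": 4, "stability": 1, "testability": 1})
before = {"analyzability": 5.0, "changeability": 20.0, "stability": 7.0, "testability": 4.0}
after = {"analyzability": 5.5, "changeability": 24.0, "stability": 6.0, "testability": 4.0}
stop_now = is_satisfied(criteria, removed_smells=3, original=before, refactored=after)
```

In this made-up example the stability value drops, but the criteria are still met because only changeability was given the highest priority.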

Analogously, our approach defines the selection criteria from the developer’s objective. Our selection criteria are defined as a function f(x1, x2, x3, x4), where x1 is the number of removed bad smells, x2 is the maintainability value composed of the four sub characteristics shown in Table 1, x3 is the size of the refactoring sequence, and x4 is the number of program elements to modify. We select an optimal refactoring sequence that maximizes the values of x1 and x2 and minimizes the values of x3 and x4.
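The text does not fix a concrete form for f, so the sketch below picks one possible interpretation purely for illustration: a lexicographic ordering in which x1 is compared first, then x2, then x3, then x4. The ordering, the class `Candidate` and the sample values are all assumptions; a weighted sum or a Pareto-based selection would be equally compatible with the description.

```python
from typing import NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    refactorings: Tuple[str, ...]
    removed_smells: int       # x1, to maximize
    maintainability: float    # x2, to maximize
    modified_elements: int    # x4, to minimize

    @property
    def size(self) -> int:    # x3, to minimize
        return len(self.refactorings)


def f(c: Candidate) -> Tuple[int, float, int, int]:
    """Lexicographic key: larger is better, so minimized terms are negated."""
    return (c.removed_smells, c.maintainability, -c.size, -c.modified_elements)


def select_optimal(candidates: Sequence[Candidate]) -> Candidate:
    return max(candidates, key=f)


# Made-up values for the four candidate sequences of Figure 1.
candidates = [
    Candidate(("r1", "r2"), removed_smells=3, maintainability=24.0, modified_elements=5),
    Candidate(("r1", "r3", "r4"), removed_smells=3, maintainability=24.0, modified_elements=4),
    Candidate(("r2", "r4"), removed_smells=3, maintainability=22.0, modified_elements=2),
    Candidate(("r3",), removed_smells=2, maintainability=30.0, modified_elements=1),
]
best = select_optimal(candidates)
```

With these values the first two candidates tie on x1 and x2, and the shorter sequence wins on x3.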

2.2.2 Unfolding Graph of Refactoring Sequences
After setting the stop and selection criteria, our approach calculates metric values of the four sub characteristics for the original code. This calculation uses the formulae proposed in [13]. For example, the formula for changeability is C = 0.5 LOC + 0.8 CC + 0.15 I, where C is the changeability value, LOC is the number of lines of code, CC is the complexity and I is the instability value.
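As a worked illustration of the changeability formula quoted above, the following sketch evaluates it for one set of metric values. The function name and the input values (40 lines of code, complexity 10, instability 0.5) are invented for the example.

```python
def changeability(loc: float, cc: float, instability: float) -> float:
    """C = 0.5 LOC + 0.8 CC + 0.15 I, with the weights quoted in the text."""
    return 0.5 * loc + 0.8 * cc + 0.15 * instability


# Hypothetical metric values: 0.5 * 40 + 0.8 * 10 + 0.15 * 0.5 = 20 + 8 + 0.075
c_example = changeability(loc=40, cc=10, instability=0.5)
```

For these inputs the value is approximately 28.075; the same function would be evaluated once per node of the refactoring graph.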

Table 1: ISO 9126 Sub Characteristics of Maintainability

Sub Characteristics   Description
Analyzability         Can the faults be easily diagnosed?
Changeability         Can the software be easily modified?
Stability             Can the software continue to function if changes are made?
Testability           Can the software be tested easily?

Our approach identifies candidate refactorings through the use of refactoring filtering conditions (RFC). The approach then applies each refactoring to the original code and obtains a different version of the code, generating as many versions as there are candidate refactorings. It then calculates metric values for each version and determines whether the stop criteria are met. If the stop criteria are not satisfied, the approach iterates all the steps using each version of the code obtained by applying a refactoring in the previous iteration. This process gradually generates the refactoring graph: the graph unfolds as our approach explores the refactoring space.
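The iteration described above amounts to a level-by-level exploration. The sketch below shows one way to write that loop; the three callables (`identify`, `apply`, `meets_stop`) are hypothetical stand-ins for RFC-based identification, refactoring application and stop-criteria assessment, and the `max_depth` guard is an added assumption to keep the sketch terminating.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Code = TypeVar("Code")
Path = Tuple[str, ...]


def unfold(original: Code,
           identify: Callable[[Code], Sequence[str]],
           apply: Callable[[Code, str], Code],
           meets_stop: Callable[[Code], bool],
           max_depth: int = 10) -> List[Path]:
    """Unfold the graph one level at a time; return sequences meeting the stop criteria."""
    level: List[Tuple[Path, Code]] = [((), original)]
    for _ in range(max_depth):
        next_level: List[Tuple[Path, Code]] = []
        for path, code in level:
            for refactoring in identify(code):
                next_level.append((path + (refactoring,), apply(code, refactoring)))
        if not next_level:          # no candidate refactorings left
            return []
        hits = [path for path, code in next_level if meets_stop(code)]
        if hits:                    # stop at the first level with a satisfying node
            return hits
        level = next_level
    return []


# Toy model: the "code" is just its number of remaining bad smells.
def toy_identify(smells: int) -> Sequence[str]:
    return ["r1", "r2"] if smells > 0 else []


def toy_apply(smells: int, refactoring: str) -> int:
    removed = {"r1": 1, "r2": 2}[refactoring]
    return max(0, smells - removed)


found = unfold(3, toy_identify, toy_apply, meets_stop=lambda smells: smells == 0)
```

In the toy model no single refactoring removes all three smells, so the loop stops at the second level and returns every sequence of length two that does.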

From the second iteration onwards, our approach also checks whether the candidate refactorings identified in the current iteration conflict with previous refactorings in the sequence. A conflict occurs when the refactorings constitute an inverse path [11]. Our approach also attempts to reduce the number of refactoring sequences by considering commutative and equivalent paths.
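A rough sketch of this pruning step is given below. It is not the method of [11] or of the prototype: it assumes that inverse and commutative relations are supplied as sets of unordered pairs, it only detects inverse refactorings that are adjacent in a sequence, and it treats two sequences as equivalent when swapping adjacent commuting refactorings yields the same canonical order. The labels `r1`, `r1_inv`, `r2`, `r3` and `r4` are placeholders.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Path = Tuple[str, ...]
Pairs = Set[FrozenSet[str]]


def has_inverse_path(seq: Path, inverses: Pairs) -> bool:
    """True if two adjacent refactorings undo each other (simplifying assumption)."""
    return any(frozenset(pair) in inverses for pair in zip(seq, seq[1:]))


def canonical(seq: Path, commutes: Pairs) -> Path:
    """Sort adjacent commuting refactorings so equivalent orders share one form."""
    items = list(seq)
    changed = True
    while changed:
        changed = False
        for i in range(len(items) - 1):
            a, b = items[i], items[i + 1]
            if frozenset((a, b)) in commutes and b < a:
                items[i], items[i + 1] = b, a
                changed = True
    return tuple(items)


def prune(seqs: Sequence[Path], inverses: Pairs, commutes: Pairs) -> List[Path]:
    """Drop sequences with inverse paths and keep one sequence per commutative class."""
    seen: Set[Path] = set()
    kept: List[Path] = []
    for seq in seqs:
        if has_inverse_path(seq, inverses):
            continue
        key = canonical(seq, commutes)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


inverses = {frozenset(("r1", "r1_inv"))}
commutes = {frozenset(("r2", "r3"))}
pruned = prune(
    [("r1", "r1_inv", "r2"), ("r2", "r3"), ("r3", "r2"), ("r3", "r4")],
    inverses, commutes)
```

In this example the first sequence is discarded as an inverse path and the second and third collapse into one representative.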

2.2.3 Assessing Stop Criteria
After unfolding each level of the graph, our approach assesses the values of all the nodes in that level. If there exists a node whose value satisfies the stop criteria, we stop the unfolding process. This ensures that at least one refactoring sequence that satisfies the developer’s objective exists. Note that several sequences may satisfy the stop criteria; in that case, we select the one that best satisfies our selection criteria.

2.2.4 Selecting the Optimal Refactoring Sequence
The last and most important step is to select, from the candidate sequences, an optimal refactoring sequence that meets our selection objectives. The conclusion from our survey [2] shows that developers want a refactoring sequence that removes the largest number of bad smells, yields code with the highest maintainability values and requires the least effort to understand the changed code. Therefore, the selection objectives can be formulated as maximizing the number of removed bad smells and the maintainability values, and minimizing the number of refactorings in the refactoring sequence and the number of program elements to modify.

2.3 Our prototype
We are developing a prototype for use in the evaluation of our approach. The prototype receives code and developer objectives (the number of bad smells to be removed and the priority values of the four sub characteristics of maintainability) as inputs. The tool then (1) transforms developer objectives into stop and selection criteria, (2) identifies and applies appropriate refactorings, (3) gradually generates refactoring graphs, (4) calculates maintainability metrics, (5) assesses stop criteria, (6) detects conflicts among refactorings in the sequence, (7) simplifies the sequences, and (8) selects an optimal refactoring sequence. The output of the prototype is an optimal refactoring sequence.

By focusing on only the seven refactorings mentioned in the scope of our approach, we try to automate all these steps as much as possible. For example, applying the “replace temp with query” refactoring can be automated (assuming the query method naming convention is provided). However, applying the “replace method with method object” refactoring is a challenge and may require user intervention.

3. RELATED WORK
Our research is related to two categories of works: refactoring identification and refactoring sequence.

3.1 Refactoring Identification
In refactoring identification, existing works propose three well-known methods: logic meta programming, block-based slicing and software metrics.

Tourwé and Mens use a semi-automated approach based on logic meta programming to detect bad smells and propose refactoring opportunities that remove these bad smells [4]. This method detects two bad smells, obsolete parameter and inappropriate interface, and selects among five refactorings: 1) remove parameter, 2) add class, 3) add method, 4) rename variable and 5) pull up variable. This work considers only bad smells at the class hierarchy level.

For the block-based slicing method, Yang, Liu and Niu propose an approach to recommend fragments within long methods for extraction [5]. This approach uses blank lines to divide a method into fragments; therefore, it cannot be applied to source code without blank lines. Tsantalis and Chatzigeorgiou propose an approach that automatically identifies extract method refactoring opportunities [6]. This approach uses the union of static slices to extract the complete computation of a given variable declared inside a method. Their approach also proposes a set of rules that preserve the code behavior after slice extraction and prevent code duplication. Both works focus on selecting the fragment to which the extract method refactoring is applied.

For the software metrics method, Schneider, Vasa and Hoon use change metrics to detect refactoring opportunities in object-oriented systems after release [10]. Meananeatra et al. propose refactoring filtering conditions (RFC) based on software metrics, which are defined in terms of data flow and control flow graphs [8]. Furthermore, they propose rules that help to select refactorings for removing the long method bad smell.

In brief, we choose an approach that combines logic meta programming and software metrics to identify refactorings [9]. Our approach uses software metrics [8] to define predicates for detecting refactoring opportunities. To support automatic refactoring identification, we are developing a tool in which the rules used to identify program elements and the refactorings to apply are implemented in Prolog and interface with the other parts, which are implemented in Java.

3.2 Refactoring Sequence
Existing works optimize refactoring sequences in two ways: 1) given one sequence, they reduce the steps in that sequence and 2) given a set of sequences that constitutes a search space, they select the best one.

Piveta et al. propose an approach to narrow the number of refactoring sequences by discarding those that do not make sense semantically and avoiding those that lead to the same result [11]. The situations that can occur in a sequence are commutative path, inverse path, independent path and forbidden path. The approach uses a deterministic finite automaton (DFA) to represent a refactoring sequence. Kuhlemann et al. present how to optimize sequences of refactoring transformations to reduce the composition time for product line programs [12]. For refactoring transformations, they use feature modules that host restructuring transformations, called refactoring feature modules (RFMs). In their procedure, they reorder refactorings and replace a single refactoring in a sequence by a different, newly created ad hoc sequence of refactorings. They then focus on optimizing refactoring sequences with RFM refactorings.

Qayum and Heckel propose an approach based on the approximated unfolding of graph transformation to improve the scalability and understandability of search-based refactoring [3]. They use graphs to represent object-oriented software at the class level and graph transformations to describe their refactoring operations. In the graph unfolding step, they identify dependencies and conflicts between refactoring steps, leading to an implicit and therefore more scalable representation of the search space. An optimization algorithm based on the Ant Colony paradigm is used to explore this search space to find the best sequence of refactoring steps.

In summary, these works are used when developers know which refactorings should be applied in sequence. In contrast, our approach does not generate all refactorings at once but gradually unfolds a graph of refactoring sequences and then uses a pruning technique to eliminate conflicting refactorings. After that, we use an objective function based on the four criteria mentioned in Section 1 to find an optimal sequence.

4. EVALUATION PLAN
To evaluate the effectiveness of our approach, we aim to compare the optimal refactoring sequence selected by our approach with the refactoring sequences suggested by experienced developers. We will use ten software projects from three software companies for the evaluation.

5. RESEARCH PROCESS
This section presents our research progress: finished tasks and ongoing tasks. According to our methodology, we have completed the following tasks:

• Defining three RFC (replace temp with query, introduce parameter object and preserve whole object) in the form of predicates based on a formal model of the data flow graph.

• Defining all object-oriented and maintainability metrics.

• Developing an Eclipse plug-in for assessing maintainability and automatically identifying refactorings using the three RFC.

The ongoing tasks are:

• Defining the rest of the RFC necessary for long method bad smell removal.

• Creating an optimization method for selecting a refactoring sequence from the candidate sequences.

• Enhancing our Eclipse plug-in with the newly defined RFC.

6. PUBLICATION
So far, our publications include three articles, References 2, 8 and 9, covering the survey of refactoring practices in Thailand as well as RFC definition and metrics.

7. REFERENCES
[1] Fowler, M. Refactoring: Improving the Design of Existing Code. Addison-Wesley Publishing Company, 1999.

[2] Meananeatra, P., Rongviriyapanish, S., and Apiwattanapong, T. A Survey on the Maintaining of Software Structure in Thai Software Industries. In Computer Science and Information Technology, International Conference on Information and Digital Engineering, September 2011.

[3] Qayum, F., and Heckel, R. Search-Based Refactoring based on Unfolding of Graph Transformation Systems. In the Fifth International Conference on Graph Transformation - Doctoral Symposium Proceedings, September 2010.

[4] Tourwé, T., and Mens, T. Identifying Refactoring Opportunities Using Logic Meta Programming. In the Seventh European Conference on Software Maintenance and Reengineering Proceedings, March 2003.

[5] Yang, L., Liu, H., and Niu, Z. Identify Fragments to Be Extracted from Long Method. In the 2009 16th Asia-Pacific Software Engineering Conference, December 2009.

[6] Tsantalis, N., and Chatzigeorgiou, A. Identification of Extract Method Refactoring Opportunities. In Software Maintenance and Reengineering Proceedings, March 2009.

[7] Tsantalis, N., Chaikalis, T., and Chatzigeorgiou, A. JDeodorant: Identification and Removal of Type-Checking Bad Smells. In European Conference on Software Maintenance and Reengineering Proceedings, April 2008.

[8] Meananeatra, P., Rongviriyapanish, S., and Apiwattanapong, T. Using software metrics to select refactoring for long method bad smell. In 8th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology Proceedings, May 2011.

[9] Meananeatra, P., Rongviriyapanish, S., and Apiwattanapong, T. Identifying Refactoring through Formal Model Based on Data Flow Graph. In 5th Malaysian Conference in Software Engineering Proceedings, December 2011.

[10] Schneider, J., Vasa, R., and Hoon, L. Do metrics help to identify refactoring? In the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution Proceedings, 2010.

[11] Piveta, E., Araujo, J., Pimenta, M., Moreira, A., Guerreiro, P., and Price, R.T. Searching for Opportunities of Refactoring Sequences: Reducing the Search Space. In 32nd Annual IEEE International Computer Software and Applications Conference Proceedings, July 2008.

[12] Kuhlemann, M., Liang, L., and Saake, G. Algebraic and Cost-based Optimization of Refactoring Sequences. In 2nd International Workshop on Model-driven Product Line Engineering held in conjunction with ECMFA 2010 Proceedings, June 2010.

[13] Kanellopoulos, Y., Antonellis, P., Antoniou, D., Makris, C., Theodoridis, E., Tjortjis, C., and Tsirakis, N. Code quality evaluation methodology using the ISO/IEC 9126 standard. International Journal of Software Engineering and Applications, July 2010.
