[ACM Press the Fifth Workshop - Rapperswil, Switzerland (2012.06.01-2012.06.01)] Proceedings of the Fifth Workshop on Refactoring Tools - WRT '12 - Let's make refactoring tools user-extensible!

Let’s Make Refactoring Tools User-extensible!

Huiqing LiSchool of [email protected]

Simon ThompsonSchool of Computing

[email protected]

AbstractWe present a framework for making a refactoring tool ex-tensible, allowing users to define refactorings from scratchusing the concrete syntax of the language, as well as todescribe complex refactorings in a domain-specific lan-guage for scripting. We demonstrate the approach in practicethrough a series of examples.

The extension framework is built into Wrangler, a tool forrefactoring Erlang programs, but we argue that the approachis equally applicable to tools for other languages.

Categories and Subject Descriptors D.2.3 [SOFTWAREENGINEERING]: Coding Tools and Techniques; D.2.6 []:Programming Environments; D.2.7 []: Distribution, Main-tenance, and Enhancement

General Terms Languages, Design

Keywords Erlang, refactoring, Wrangler, API, DSL, anal-ysis, program transformation, extensible, concrete syntax.

1. IntroductionWhat do we mean when we say ‘refactoring’? Interpretednarrowly, it is the process of invoking a refactoring tool toperform one of a fixed repertoire of transformations, whereasthe term ‘refactoring’ is often also used for any process oftransforming our programs so that they work in a differentway, though preserving their original behaviour. A recentstudy by Murphy-Hill et. al. [14] reports that up to 90% ofrefactoring is done ‘by hand’ in practice.

Why such a low proportion of tool use? Tools can onlysupport a limited collection of transformations, and thesewill be the common, ‘vanilla flavoured’ refactorings suchas renaming, function or method extraction, data-type andclass-based refactorings and so on. Users who want to dosomething more complex, which will typically also be more

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. To copy otherwise, to republish, to post on servers or to redistributeto lists, requires prior specific permission and/or a fee.WRT, June 01 2012, Rapperswil, Switzerland.Copyright c© 2012 ACM /12/06-978-1-4503-1500-5. . . $10.00

specific to their application, will have to implement theirrefactorings for themselves. This could simply mean doingthe whole refactoring ‘by hand’ in an editor or IDE – whichmay well provide support for some of the steps – or it couldinvolve the user extending an existing refactoring tool fortheir own purposes.

The latter option – to extend an existing tool such as theJava refactorer in Eclipse – is attractive in theory but sel-dom used in practice. The cost of understanding the internalrepresentation of programs, the internal APIs and the overallarchitecture of the tool means that it’s just not cost effectivefor the working programmer to take this option.1

In this paper we advocate designing refactoring tools forease of extension, and report on the extensibility frameworkbuilt into the Wrangler [11] refactoring tool for Erlang [2].While this is a tool for a specific language, rather than ageneric tool (a point we discuss in more detail in Section7), the approach and the lessons learned are applicable torefactoring tools for any language. The guiding principle ofthe design is to improve the cost-benefit ratio, making itsimple for a user to define powerful new refactorings withas little effort as possible; in particular:

• Users are able to specify transformations using the con-crete syntax of Erlang [9], rather than some internal rep-resentation of a syntax tree (or an XML version of it).This means that users – who will understand how to writeErlang programs – can express transformations throughtemplates and rules written in a familiar language ratherthan having to learn something new.• Information about static semantics that forms the basis

for most refactoring pre-conditions is accessible througha simple API.• As the study [14] points out, some 40% of refactorings

performed using a tool occur in batches. We provide ahigh-level domain-specific language for building com-plex refactorings from simpler components [10].• We specify a callback interface so that user-defined

refactorings can be invoked from within Wrangler, em-

1 The study [14] suggests that even when tools are configurable, this facilitytends not to be used in practice, perhaps because a manual tweak to thedefault behaviour is easier to do than changing the configuration.

32

bedded in either Emacs or Eclipse. Doing this allowsuser-defined refactorings to be applied interactively, pre-viewed and undone, just as for their built-in counterparts.This allows users to develop refactorings in an iterative,test-driven, style.

The main goal of this paper is to prove the usefulnessof the framework through a series of examples in Section 6that demonstrate the practicality of making refactoring toolsuser-extensible. Before that we briefly introduce Erlang andWrangler in Section 2, and give a high-level overview ofthe extension framework: Section 3 covers the template- andrule-based framework for defining elementary Erlang refac-torings, Section 4 the domain-specific language for describ-ing complex refactorings, and Section 5 the callback in-terface (or Erlang behaviour) for refactorings. After theexamples, Section 7 discusses related work and Section 8draws some conclusions and addresses future work.

2. Erlang and WranglerErlang [2] is a strict, impure, dynamically typed functionalprogramming language with support for macros, higher-order functions, pattern matching, concurrency, distribution,fault-tolerance, and dynamic code loading.

An Erlang program consists of a number of modules, eachof which defines a collection of functions. Only functionsexported explicitly through the export directive may becalled from other modules. In Erlang, a function name canbe defined with different arities, and the same function namewith different arities represent entirely different functions.Calls to functions defined in other modules generally qualifythe function name with the module name: the function F

from the module M is called thus: M:F(...).Wrangler [11], downloadable from http://www.github.

com/RefactoringTools, is an open source tool that sup-ports interactive refactoring, API migration and “code smell”detection for Erlang programs. Wrangler is implemented inErlang, and is integrated with (X)Emacs as well as withEclipse through the ErlIDE plugin. Wrangler supports a va-riety of built-in structural refactorings, as well as facilities todetect and eliminate duplicate code [8].

Wrangler uses an Abstract Syntax Tree (AST) to repre-sent Erlang programs. The AST representation generated isdesigned so that all the nodes in the tree have a uniformstructure; building on this we extend nodes with various an-notations such as location, static semantic information, etc.We call the extended ASTs annotated ASTs (AAST). Wran-gler preserves the original layout of the program as much aspossible, and supports undo of refactorings and preview ofrefactoring results.

Wrangler, as an interactive refactoring tool, allows theuser to perform what we term selective refactorings. By thiswe mean refactorings that involve a single local transfor-mation, but may be applicable to multiple places across theproject. An example of this kind of refactoring is to replace

the use of lists:map/2 with a list comprehension through-out a project. Selective refactoring allows the user to choosewhich candidates to refactor, and which not.

3. Composing Elementary RefactoringsIn this section, we give an overview of how to write new

refactorings from scratch in Wrangler.

3.1 Templates and RulesThe template- and rule-based API allows Erlang program-mers to express program analysis and transformation in Er-lang concrete syntax. In Wrangler, a code template is de-noted by an Erlang macro ?T whose only argument is thestring representation of an Erlang code fragment that maycontain meta-variables. A meta-variable is a placeholder fora syntax element in the program, or a sequence of syntax el-ements of the same kind. We also support meta-atoms, thatonly match atoms, which in Erlang represent function namesand so forth.

Syntactically a meta-variable is an Erlang variable, end-ing with the character ‘@’. A meta-variable ending witha single ‘@’ represents a single language element, andmatches a single subtree in the AST; a meta-variable endingwith ‘@@’ is a list meta-variable that matches a sequence ofelements of the same sort. For instance, the template

?T("erlang:spawn(Args@@, Arg1@)")

matches the applications of spawn function to one or morearguments, where Arg1@ matches the last argument, andArgs@@ will match the remaining arguments, if any.

Templates are matched at AST level, that is, the tem-plate’s AST is pattern matched to the program’s AST usingstructural pattern matching techniques. If the pattern match-ing succeeds, the meta-variables/atoms in the template arebound to AST subtrees, and the context and static seman-tic information attached to the subtrees matched can be re-trieved as described below in Section 3.2.

The template-based API is not only used to retrieve in-formation about a program, but also to define transformationrules used in the transformation of a program. A rule definesa basic step in the transformation of a program; it involvesrecognising a program fragment to transform and construct-ing a new program fragment to replace the old one, and isdenoted by a macro ?RULE with the format of

?RULE(Template, NewCode, Cond),

where Template is a template representing the kind of codefragment to search for; Cond is an Erlang expression thatevaluates to true or false; and NewCode is an Erlang ex-pression that returns the new code fragment in the format ofa string or an AST. All the meta-variables/atoms declared inTemplate are made visible to NewCode and Cond by meansof parse transformation, i.e., applying transformations to theparse tree generated by the compiler before it is further pro-cessed and checked for errors, and can be referenced in

33

defining them; furthermore, it is also possible for NewCodeto define its own meta-variables to represent code fragments.

Erlang allows a programmer to make many arbitrary syn-tactic decisions while constructing an expression or func-tion. For example, a case expression in Erlang can havemultiple expression clauses, and there is often flexibility inthe order of the clauses. This flexibility will significantlyincrease the number or complexity of the rules that haveto be written to describe a transformation. To address thisproblem, we have introduced the concept of a meta-rule thatprovides a concise way to express a collection of rules thata user has to write in order to cover the various syntacticchoices that could be made when writing an expression. Likerules, a meta-rule is denoted by a macro ?META RULE withthe format of

?META RULE(Template, NewCode, Cond),

however, a meta rule uses more flexible algorithms for pat-tern matching, and for the generation of new object code.More details about meta-rules can be found in the Wranglerdocumentation.

3.2 Support for Program AnalysisProgram analysis plays a vital role in refactoring. Very oftenthe program analysis process needs to collect some syntacticor semantic information from an AST. This task is supportedby Wrangler in two ways. First, information derived from theprogram by Wrangler is attached to the AST as annotations:we provide functions to extract these annotations from theAST. Second, information available at different nodes canbe collected together: our API provides a macro to supportthis collection.

The Erlang macro ?COLLECT is defined to allow informa-tion collection from nodes that match the template spec-ified and satisfies certain conditions. Calls to the macro?COLLECT have the format:

?COLLECT(Template, Collector, Cond),

in which Template is a template representation of the kindof code fragments of interest; Cond is an Erlang expressionthat evaluates to either true or false; and Collector isan Erlang expression which retrieves information from thecurrent AST node. We call an application of the ?COLLECT

macro a collector.

3.3 AST Traversal StrategiesEach transformation rule or collection macro is written tobe applied to a particular node of an AST; in order forthese operations to be applied across a complete AST it isnecessary to use an AST traversal strategy, as introducedin [1]. An AST traversal strategy walks through the AST incertain order, and applies transformation rules to nodes thatmeet certain conditions or collects some information fromeach node visited when program analysis is concerned.

A number of pre-defined AST traversal strategies areprovided through the Wrangler API. Traversal strategies canbe distinguished according to three criteria:

• The purpose of the traversal, i.e., collection informa-tion (TU) or AST transformation (TP).• The termination condition for the traversal (FULL/STOP/

ONCE), i.e. whether to continue the AST traversal after asuccessful match and how.• The order in which the AST nodes are visited, i.e top-

down traversals (TD) or bottom-up traversals (BU).

In Wrangler, a traversal strategy macro is named to reflectthe three aspects above. For example, the traversal strategyFULL TD TU means that the AST is to be traversed top-down,all the nodes will be visited, and the traversal will return theinformation collected.

A traversal strategy macro takes two arguments. The firstis a collection of transformation rules or a collection ofinformation collectors, and the second specifies the scopeto which the transformation or analysis is to be applied.

4. Scripting Composite RefactoringsThe idea of composite refactorings was first proposed by

Opdyke [16], and investigated by Roberts [17] and others.Existing approaches to composite refactoring tend to focuson the derivation of a combined precondition for a compositerefactoring, so that the entire precondition of the compositerefactoring can be checked on the initial program before per-forming any transformation [3, 7]. The ostensible rationalefor this is to give improved performance of the refactoringengine. However, given the usual way in which refactoringtools are used in practice – where the time to decide on theappropriate refactoring to apply will outweigh the executiontime – we do not see that the efficiency gains that this ap-proach might give are of primary importance to the user.

In contrast, our aim is to increase the usability and ap-plicability of the refactoring tool, by expanding the way inwhich refactorings can be put together. Our work does nottry to carry out precondition derivation, instead each primi-tive refactoring is executed in the same way as it is invokedindividually, i.e., precondition checking followed by pro-gram transformation. While it may be less efficient when anatomic composite refactoring, whose meaning we will ex-plain later, fails during the execution, it does have its advan-tages in expressivity. In an overview, Wrangler’s frameworktries to address the following limitations when compositerefactorings are described manually:

• When the number of primitive refactoring steps involvedis large, enumerating all the primitive refactoring com-mands could be tedious and error prone.• The static composition of refactorings does not support

generation of refactoring commands that are program-dependent or refactoring scenario dependent, or where a

34

subsequent refactoring command is somehow dependenton the results of an earlier application.• Some refactorings refer to program entities by source lo-

cation instead of name, as this information may be ex-tracted from cursor position in an editor or IDE, say.Tracking of locations is again tedious and error prone;furthermore, the location of a program entity might bechanged after a number of refactoring steps, and in thatcase locations become untrackable.• Even though some refactorings refer to program entities

by name (rather than location), the name of a programentity could also be changed after a number of refactoringsteps, which makes the tracking of entity names hardor sometimes impossible, particularly when non-atomiccomposite refactorings are involved.

We resolve these problems in a number of ways:

• Each primitive refactoring has been extended with a refac-toring command generator that can be used to generaterefactoring commands in batch mode. A command gener-ator searches the AST of a program for refactoring candi-dates according to the constraints on arguments specified,and returns a set of refactoring commands.• A command generator can generate commands lazily, i.e.,

a refactoring command is generated only as it is to be ap-plied, so we can make sure that the information gatheredby the generator always reflects the latest status, includingsource locations, of the program under refactoring.• Wrangler always allows a program entity to be referenced

using its original name, as it performs name tracking be-hind the scenes.• Finally, and most importantly, we provide a small domain-

specific language (DSL) to allow composition of refactor-ings in a compact and intuitive way. The DSL allows usersto have fine-grained control over the generation of refac-toring commands and the interaction between the user andthe refactoring engine so as to allow decision making dur-ing the execution of a composite refactoring.

4.1 Transactions: atomic vs non-atomic compositionWe use the notion of atomic and non-atomic to control thepropagation of failure during the execution of a compositerefactoring. If a composite refactoring succeeds only if allof its direct constituent refactorings succeed, then we saythis composition is an atomic composition; a non-atomiccomposite refactoring always succeeds. The failure of anatomic composite refactoring will leave the original programunchanged. Another way of expressing this would be to saythat atomic refactorings are treated as transactions.

4.2 The Domain-Specific LanguageWe give an overview of the definition of the DSL now, butmore details can be found in [10]. The DSL, as shown in

RefacName ::= rename fun | rename mod

| rename var | new fun | gen fun | ...PR ::= {refactoring, RefacName, Args}CR ::= PR

| {interactive, Qualifier, [PRs]}| {repeat interactive, Qualifier, [PRs]}| {if then, fun()→ Cond end, CR}| {while, fun()→ Cond end, Qualifier, CR}| {Qualifier, [CRs]}

PRs ::= PR | PRs, PR

CRs ::= CR | CRs, CR

Qualifier ::= atomic | non atomic

Args ::= ...A list of Erlang terms...

Cond ::= ...An Erlang expression that evaluates to

a boolean value...

Figure 1. The DSL for scripting composite refactorings

Figure 1, is defined in Erlang syntax, using tuples and atoms.In the definition, PR denotes a primitive refactoring, andCR denotes a composite refactoring.

• A primitive refactoring is, by definition, an atomic com-posite refactoring. A primitive refactoring is an ele-mentary behaviour-preserving source-to-source programtransformation that consists of a set of preconditions anda set of transformation rules. When a primitive refactoringis applied to a program, all the preconditions are checkedbefore the program is actually transformed by applyingall the transformation rules. A primitive refactoring failsif the conjunction of the set of preconditions returns false;otherwise the primitive refactoring succeeds.• {interactive, Qualifier, [PRs]} represents a list of

primitive refactorings to be executed in an interactive way,that is, before the execution of every primitive refactoring,Wrangler asks the user for confirmation that he/she reallywants that refactoring to be applied. The confirmationquestion is generated automatically by Wrangler.• {repeat interactive, Qualifier, [PRs]} also repre-

sents a list of primitive refactorings to be executed in aninteractive way, but different from the previous one, it al-lows user to repeatedly apply one refactoring, with differ-ent parameters supplied by the user, until he/she decidesto stop. The user-interaction is carried out before each runof a primitive refactoring.• {if then, fun() → Cond end, CR} represents the

conditional application of CR, i.e., CR is applied only ifCond, which is an Erlang expression, evaluates to true.We wrap Cond in an Erlang function closure to delay itsevaluation until it is needed.• {while, fun() → Cond end, Qualifier, CR} allowsCR, which is generated dynamically, to be continually ap-

35

plied until Cond evaluates to false. Qualifier specifieswhether the refactoring is to be applied atomically or not.• {Qualifier, [CRs ]} represents the composition of a list

of composite refactorings into a new composite refactor-ing, where the qualifier states whether the resulting refac-toring is applied atomically or not.

5. Generic Refactoring BehaviourWhile every refactoring has its own pre-condition analysis

and transformation rules, there are some parts of the refac-toring process that are common to most refactorings, suchas the generation and annotation of ASTs, the outputting ofrefactoring results, the collecting of change candidates, andthe workflow of the refactoring process. We can use an Er-lang behaviour to capture this genericity.

An Erlang behaviour is an application framework thatis parameterized by a callback module. The behaviour im-plements the generic parts of the problem, while the call-back module implements the specific parts. We have definedtwo behaviours, namely gen refac and gen composite refac.gen refac is the behaviour to use when defining a primitiverefactoring, whereas gen composite refac is the behaviourfor defining a composite refactoring.

With both behaviours, we aim to encapsulate those partsthat are generic to all refactorings in the behaviour module,and let the user to handle the parts that are specific to therefactoring under consideration. Another advantage of hav-ing a refactoring behaviour is the ease of integration with anIDE. Since the integration only involves the IDE and the be-haviour module, it is done by Wrangler; the developer of thecallback module need not be concerned.

6. ExamplesIn this section, we demonstrate our approach by means of

four examples. All the examples are available for download(as part of Wrangler).

Example 1. Remove an argument. Removing an unusedargument from a function definition is a common refactor-ing supported by most refactoring tools. Part of the imple-mentation of this refactoring is shown in Figs 2 and 3. Thepart of code not shown includes two more transformationrules, which handle the M:F/A format representation of func-tion names and the function type specification respectively,and the function for refactoring modules that directly dependon the current module; however this should not affect one’sunderstanding of the way in which refactorings are imple-mented in Wrangler. We explain the implementation now.

• Line 2 declares that this module implements the gen refacbehaviour, and lines 4-5 exports the callback functionsthat are required by gen refac.• This refactoring needs the user to specify the index of the

argument to be removed, and lines 6-7 defines the promptused when asking the user to input the index.

1. -module(refac_remove_an_argument).

2. -behaviour(gen_refac).

3. -include_lib("wrangler/include/wrangler.hrl").

4. -export([input_par_prompts/0,select_focus/1,

5. check_pre_cond/1,selective/0,transform/1]).

6. -spec input_par_prompts() -> [string()]

7. input_par_prompts() -> ["Parameter Index : "].

8. -spec select_focus(Args::#args{}) ->

9. {ok, syntaxTree()} |{ok, none}

10. select_focus(_Args=#args{current_file_name=File,

11. cursor_pos=Pos}) ->

12. api_interface:pos_to_fun_def(File, Pos).

13. -spec check_pre_cond(Args::#args{})->

14. ok | {error, Reason}

15. check_pre_cond(Args=#args{focus_sel=FunDef,

16. user_inputs=[Ith]}) ->

17. {_M,_F,A} = api_refac:fun_define_info(FunDef),

18. case Ith>=1 andalso Ith=<A of

19. true -> check_pre_cond_1(Args, Ith);

20. false -> {error, "Index is invalid."}

21. end.

22. check_pre_cond_1(#args{focus_sel=FunDef},Ith) ->

23. IthArgs=?STOP_TD_TU(

24. [?COLLECT(

25. ?T("f@(Args@@)when Guard@@-> Bs@@;"),

26. lists:nth(Ith, Args@@),

27. true)], FunDef),

28. case lists:all(fun(A) ->

29. api_refac:type(A) == variable

30. andalso api_refac:var_refs(A) ==[]

31. end, IthArgs) of

32. true -> ok;

33. false ->{error, "Parameter cannot be removed."}

34. end.

Figure 2. Remove an argument (part 1)

• lines 8-12 defines the callback function select focus,which allows the user to select the function of interest bypointing the cursor to the function definition, and returnsthe AST representation of the function selected.• Lines 13-34 implements the pre-condition checking. The

API function fun define info takes the function defini-tion as input, and returns the module name, function nameand arity of the function in a tuple. Apart from checkingthat the index value is valid, the pre-condition also checksthat the argument to be removed is represented as a vari-able, and not used by the function definition. Since an Er-lang function could consist of multiple function clauses,the argument from each clause needs to be checked; thecollecting of these arguments is done by the code betweenlines 23-27, where a COLLECT macro is used together withan AST traversal strategy. i.e. STOP TD TU, to retrieve theargument from each clause.• The callback function transform defined in Fig 3 performs

the program transformation using the AST traversal strat-egy FULL TD TP, which does a full top-down traversal of

36

35. transform(Args=#args{current_file_name=File,

36. focus_sel=FunDef,

37. user_inputs=[Ith]}) ->

38. MFA = api_refac:fun_define_info(FunDef),

39. ?FULL_TD_TP([rule1(MFA,Ith),rule2(MFA,Ith)],[File]).

40. rule1({M,F,A}, Ith) ->

41. ?RULE(?T("f@(Args@@) when Guard@@ -> Bs@@;"),

42. begin

43. NewArgs@@=delete(Ith, Args@@),

44. ?TO_AST("f@(NewArgs@@) when Guard@@->Bs@@;")

45. end,

46. api_refac:fun_define_info(f@) == {M, F, A}).

47. rule2({M,F,A}, Ith) ->

48. ?RULE(?FUN_APPLY(M,F,A),

49. begin

50. Args = api_refac:get_app_args(_This@),

51. NewArgs=delete(Ith, Args),

52. api_refac:update_app_args(_This@, NewArgs)

53. end, true).

54. delete(Ith, Args) ->

55. ... delete the Ith argument from a list...

Figure 3. Remove an argument (part 2)

transform(_Args=#args{search_paths=SearchPaths})->

?FULL_BU_TP([replace_bug_cond_macro_rule(),

logic_rule_1(),

.....

if_rule_1()], SearchPaths).

replace_bug_cond_macro_rule() ->

?RULE(?T("Expr@"), ?TO_AST("false"),

is_bug_cond_macro(Expr@)).

logic_rule_1() ->

?RULE(?T("not false"),?TO_AST("true"),true).

if_rule() ->

?RULE(?T("if Conds1@@, false,Conds2@@ -> Body1@@;

true -> Body2@@

end"), Body2@@, true).

is_bug_cond_macro(Expr) ->

api_refac:type(Expr) == macro andalso

is_bug_cond_name(?PP(Expr)).

is_bug_cond_name::string()-> boolean().

is_bug_cond_name(Str) -> ..check the macro name....

Figure 4. Eliminating Bug Pre-conditions

the AST while trying to apply the rules to the each of thenodes visited. The first rule, i.e. rule1 defined betweenlines 40-46, removes the Ith parameter from the func-tion clause definition, and the second rule, i.e. rule2 de-fined between lines 47-53 removes the Ith argument froman application site of the function. The pre-defined macroFUN APPLY is used to capture the various ways a functioncan be called in Erlang. The function delete is a normalErlang function that deletes the Ith element from a list.

Example 2. Bug pre-condition This example illustratesthe implementation of a refactoring that is more application-

specific, therefore not likely to be included in the distributionof refactoring tools. Without tool support, this kind of refac-toring is likely to be applied manually. This example showshow little effort a user needs to invest to implement a partic-ular refactoring that meets his/her own needs.

This particular example comes from testing, and specifi-cally testing based on models. It is often necessary to patcha faulty model to be able to continue testing, otherwise thesame faults keep being detected, and no new faults are found.The solution is to add macros to the code: if the bug ispresent in the code, the macro is true, otherwise it is false.However, when delivering the code after testing, it is nec-essary to remove uses of those macros. This is a repetitiveprocess, and previously had to be done manually. One of thefiles we examined contains 43 use places of the bug macros;obviously, manual refactoring of all these use cases could bea time-consuming process, particularly if it had to be per-formed repeatedly.

The actual refactoring is simple: recognise the macro ap-plications of syntax ?xxx bug nnn, replace the macro ap-plication with ‘false’, and clean up the code accordingly.For example, the code fragment

if ?com bug 006 -> set raw data(Id, Data, NewS);

true -> NewS

end

should be refactored to the variable NewS.To automate this refactoring process, we once again im-

plement a gen refac behaviour. This refactoring does notneed user input, focus selection or pre-condition checking,therefore the only salient function is the transform call-back function. The definition of transform together withsome of the rules are shown in Fig 4. In total, 14 rules weredefined to handle various scenarios. With this refactoringsupport, eliminating bug pre-condition macros becomes amatter of simply pressing a button.

The previous two examples demonstrate how Wrangler’stemplate- and rule-based API makes it feasible for users todefine refactorings themselves. With the next two examples,we will demonstrate how Wrangler’s DSL makes it practicalfor users to script composite refactorings.

Example 3. Batch prefixing of module names. Renam-ing module names is one of the most common refactorings.When a collection of modules need to be renamed in a sys-tematic way, refactoring support for batch renaming of mod-ule names becomes handy, but composite refactorings of thiskind are not supported by most refactoring tools. Wrangler’sDSL support for composite refactorings means that batchrenaming of module names is easy to achieve: for exam-ple, Fig 5 shows the script that adds a given prefix to ev-ery Erlang module name in a project. The script implementsthe gen composite refac behaviour, i.e. all the callback func-tions required by this behaviour are defined by this module.This refactoring asks the user to input the prefix, as shownby the function input par prompts.

37

1. -module(refac_batch_prefix_module).

2. -export([input_par_prompts/0, select_focus/1,

3. composite_refac/1]).

4. -behaviour(gen_composite_refac).

5. -include_lib("wrangler/include/wrangler.hrl").

6. input_par_prompts() -> ["Prefix: "].

7. select_focus(_Args) -> {ok, none}.

8. -spec composite_refac(Args::#args{})->

9. composite_refac().

10. composite_refac(_Args=#args{user_inputs=[Prefix],

11. search_paths=SearchPaths}) ->

12. {non_atomic,

13. {refac_, rename_mod,

14. [{file, fun(File)->

15. filename:extension(File)==".erl"

16. end},

17. {generator, fun(File) ->

18. Prefix++filename:basename(File)

19. end},

20. false, SearchPaths])).

Figure 5. Batch prefixing of module names

The script of this composite refactoring is defined in func-tion composite refac. Lines 13-20 use the refactoring com-mand generator for the rename mod refactoring, representedas a tuple with refac as the first element, to generate acollection of refactoring commands. The command gener-ator says that for every file in the directories specified bySearchPaths, if this file is an Erlang file, i.e. has a file ex-tension of .erl, then generate a rename mod command forthis file, and the new module name is generated by prefixingthe original module name with the prefix given by the user.

The ‘false’ in line 20 tells the command generatorthat all the commands should be generated before the firstrefactoring is to be applied, i.e. no lazy command generationis used. Finally, the top-level qualifier non atomic says thatif one of the refactorings fails to rename a module, therefactoring process does not fail as a whole.

Example 4. Batch clone removal Wrangler’s similar codedetection functionality [8] is able to detect code clones in anErlang program, and help with the clone elimination process.For each set of code fragments that are clones of each other,Wrangler generates a function, named new fun, which rep-resents the least-general common abstraction of the clones;cloned code fragments can then be replaced by applicationsof this function, thus eliminating duplicate code.

The general procedure to remove such a clone in Wran-gler is to copy and paste the function new fun into a module,then carry out a sequence of refactoring steps as follows:

• Rename the function to reflect its meaning.• Rename variables if necessary, especially those of the

form NewVari , which are generated by the clone detector.• Swap the order of parameters if necessary.• Export the function if some clones are in other modules.

{atomic,

[{interactive, atomic, RENAME_FUNCTION new_fun},

{atomic, RENAME_VARIABLES NewVar*},

{repeat_interactive, atomic, SWAP_ARGS},

{if_then, NOT_EXPORTED, ADD_TO_EXPORT},

{non_atomic, {refac, fold_expr, CLONE_INSTANCES}}

]}

Figure 6. Batch Clone Elimination

• For each module that contains one or more cloned codefragments, apply the ‘fold’ refactoring to replace theclones in that module with calls to the new function.

The process can be scripted as a gen composite refac be-haviour, and full details are given in [10]; here we reviewthe top-level structure, as given in Fig. 6.

• The whole refactoring is atomic, i.e. a single transaction.• This does not stop parts of the refactoring not being

transactions (i.e. non-atomic): if replacing a particularclone instance fails, this will not affect the correctnessof the refactoring. On the other hand, swapping functionarguments must be a transaction: we must have all thearguments in the correct order before we continue.• We may we need to export the renamed function, but we

don’t know its new name. The infrastructure will trackthis, and allow us to refer to the function by its old name.• Aspects of the refactoring are interactive: we prompt for

names of variables and functions. Interactions can also berepeated, as in the case of swapping argument positions.

7. Related Work

Rule-based structural transformation is used by manyother meta-programming paradigms, we discuss these now.

TXL [4] is a generalised source-to-source translation sys-tem. A TXL program takes as input a context-free grammarin BNF-like notation, and a set of transformation rules to beapplied to inputs parsed using the grammar.

Stratego/XT [1] is a language for transformation of ab-stract syntax trees, and XT is a bundle of transformationtools that combine it with tools for other aspects of programtransformation, such as parsing, pretty-printing, etc.

The ASF+SDF [19] meta-environment supports programanalysis, transformation, generation of interactive programenvironments and pretty-printers, etc. As the successor of theASF+SDF, the recently-released RASCAL [6] is a languagethat aims to provide a general framework for high-levelintegration of source code analysis and manipulation.

JunGL [20] is a DSL for implementing refactorings, com-bining an imperative core, ML-like algebraic data types withpattern-matching for representing syntax trees, and featuresfor defining attributes on edges between AST nodes. In-stead of providing declarative descriptions of transforma-tions, JunGL relies on imperative modification of the AST.

38

The tools above give powerful support for program trans-formation, and infrastructure on which to build programanalysis; others explicitly support the analysis of conditionsfor composite refactorings; we examine these now.

The idea of composite refactorings was proposed byOpdyke [16], and investigated by Roberts [17]. This workfocused on the derivation of a composite refactoring’s pre-conditions from the pre- and postconditions of its constituentrefactorings. O Cinneide [3] extends Roberts’ approach invarious ways including static manual derivation of pre- andpostconditions for a composite refactoring.

ContTraCT is an experimental refactoring editor for Java[7], allowing composition of larger refactorings from ex-isting ones. ContTraCT has two basic composition opera-tions: AND and OR, that correspond to the atomic and non-atomic composition described in this paper. A formal modelbased on the notion of backward transformation descriptionis used to derive the preconditions of an AND-sequence.

Extensibility has long been a question for those designingrefactoring tools [13]. MOOSE [15] is a language-genericenvironment that provides support for re-engineering com-plex software systems. While it is clearly tailorable, andhas been used successfully in a variety of projects, it istoo heavyweight for the working programmer to use. More-over, the effort of tuning a system of this sort to handle thespecifics of a language (see e.g. [18] on renaming in Java)makes it difficult for it to perform as well as a system de-signed for a particular language.

Jbrx [12] is a Java refactoring tool that provides extensi-bility through giving users access to a program representa-tion in XML, over which they can describe transformationsand pre-conditions, but this requires users to learn both XMLand the Sapid/XML tool platform.

RubyTL [5] is the closest to our work, in presentinga language for describing Ruby program transformations,embedded in Ruby. It does not combine this with the use ofconcrete syntax for specifying new atomic transformations;rather it concentrates on transformations on (meta-)models.

8. Conclusions and Future WorkWe have presented the API and DSL based approach taken

by Wrangler to make it practical for users to define forthemselves elementary and composite refactorings. Througha series of examples, we have demonstrated the usability andpower of this framework. We believe that making refactoringtools user-extensible is crucial step towards removing thebarriers that inhibit the full take-up of refactoring tools.

In the future, we aim to carry out more case studies to seehow the support for user-defined refactorings is perceived byusers, and whether this changes the way they refactor theircode. We will also explore how this work applies to othertools, such as HaRe, a refactoring tool for Haskell programs.

This research was supported by EU FP7 collaborativeproject 215868, ProTest (www.protest-project.eu).

References[1] M. Bravenboer, K. T. Kalleberg, et al. Stratego/XT 0.17.

A language and toolset for program transformation. Sci. ofComputer Prog., 72(1-2), 2008.

[2] F. Cesarini and S. Thompson. Erlang Programming. O’ReillyMedia, Inc., 2009.

[3] M. O. Cinneide. Automated Application of Design Patterns:A Refactoring Approach. PhD thesis, Trinity College Dublin,2000.

[4] J. R. Cordy. Source Transformation, Analysis and Generationin TXL. In PEPM, 2006.

[5] J. S. Cuadrado, J. G. Molina, and M. Menarguez. Rubytl: Apractical, extensible transformation language. In 2nd Euro-pean Conference on Model Driven Architecture, 2006.

[6] P. Klint, T. van der Storm, and J. J. Vinju. RASCAL: ADomain Specific Language for Source Code Analysis andManipulation. In SCAM, 2009.

[7] G. Kniesel and H. Koch. Static Composition of Refactorings.Sci. Comput. Program., 52, August 2004.

[8] H. Li and S. Thompson. Incremental Code Clone Detectionand Elimination for Erlang Programs. In Fundamental Ap-proaches to Software Engineering (FASE’11), 2011.

[9] H. Li and S. Thompson. A User-extensible Refactoring Toolfor Erlang Programs. Technical Report 4-11, School of Com-puting, Univ. of Kent, UK, 2011.

[10] H. Li and S. Thompson. A Domain-Specific Language forScripting Refactoring In Erlang. In Fundamental Approachesto Software Engineering (FASE’12), 2012.

[11] H. Li, S. Thompson, G. Orosz, and M. Toth. Refactoring withWrangler, updated. In ACM SIGPLAN Erlang Wkshp, 2008.

[12] K. Maruyama and S. Yamamoto. Design and Implementationof an Extensible and Modifiable Refactoring Tool. In Pro-ceedings of IWPC ’05. IEEE Computer Society, 2005.

[13] T. Mens and A. V. Deursen. Refactoring: Emerging Trendsand Open Problems, 2003.

[14] E. Murphy-Hill, C. Parnin, and A. P. Black. How We Refac-tor, and How We Know It. IEEE Transactions on SoftwareEngineering, 99, 2011.

[15] O. Nierstrasz, S. Ducasse, and T. Gırba. The Story of Moose:an Agile Reengineering Environment. In Proceedings of the10th European software engineering conference. ACM, 2005.

[16] W. F. Opdyke. Refactoring Object-Oriented Frameworks.PhD thesis, Univ. of Illinois, 1992.

[17] D. B. Roberts. Practical Analysis for Refactoring. PhD thesis,Univ. of Illinois, 1999.

[18] M. Schafer, T. Ekman, and O. de Moor. Sound and extensiblerenaming for java. In OOPSLA ’08. ACM, 2008.

[19] M. van den Brand et al. The ASF+SDF Meta-environment: AComponent-Based Language Development Environment. InCompiler Construction, volume 44, 2001.

[20] M. Verbaere et al. JunGL: A scripting Language for Refactor-ing. In Proceedings of the 28th international conference onSoftware engineering(ICSE), 2006.

39

Documents

[ACM Press the Fifth Workshop - Rapperswil, Switzerland (2012.06.01-2012.06.01)] Proceedings of the Fifth Workshop on Refactoring Tools - WRT '12 - Let's make refactoring tools user-extensible!