
A Multidimensional Empirical Study on Refactoring Activity

Nikolaos Tsantalis†, Victor Guana*, Eleni Stroulia*, Abram Hindle*

†Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada

*Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada

Abstract

In this paper we present an empirical study on the refactoring activity in three well-known projects. We have studied five research questions that explore the different types of refactorings applied to different types of sources, the individual contribution of team members on refactoring activities, the alignment of refactoring activity with release dates and testing periods, and the motivation behind the applied refactorings. The studied projects have a history of 12, 7, and 6 years, respectively. We have found that there is very little variation in the types of refactorings applied on test code, since the majority of the refactorings focus on the reorganization and renaming of classes. Additionally, we have identified that the refactoring decision making and application is often performed by individual refactoring “managers”. We have found a strong alignment between refactoring activity and release dates. Moreover, we found that the development teams apply a considerable amount of refactorings during testing periods. Finally, we have also found that in addition to code smell resolution the main drivers for applying refactorings are the introduction of extension points, and the resolution of backward compatibility issues.

Copyright © 2013 Nikolaos Tsantalis, Victor Guana, Eleni Stroulia, and Abram Hindle. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.

1 Introduction

In the past, refactoring activity has been empirically investigated with respect to its frequency [15], the adoption of refactoring tools [10], its relation with bug fixing [7] and testing [11], and its perception by developers [9]. There are still some interesting questions that have not been explored yet. For instance, what are the principal drivers behind the application of refactorings, and how do different team roles and code artifacts affect refactoring practice?

In this paper, we inspected the refactoring history of three well-known projects, namely JUnit, HTTPCore, and HTTPClient, and investigated 5 research questions related to important aspects of refactoring practice. These questions are:

RQ1: Do software developers perform different types of refactoring operations on test code and production code?

RQ2: Which developers are responsible for refactorings?

RQ3: Is there more refactoring activity before major project releases than after?

RQ4: Is refactoring activity on production code preceded by the addition or modification of test code?

RQ5: What is the purpose of the applied refactorings?

It is our strong belief that answers to these research questions will help the community better reflect on the actual refactoring practice. Furthermore, they will increase the awareness of the software maintenance community on the drivers behind the application of refactorings and help motivate the creation of refactoring tools (e.g., code smell detectors, IDEs) addressing the actual needs of developers in a principled and empirical manner. This study primarily focuses on high and medium level refactorings, as defined by Murphy-Hill et al. [10]. High level refactorings are those that change the signatures of classes, methods, and fields, while medium level refactorings also change blocks of code (e.g., Extract Method) in addition to the aforementioned changes [10]. To the best of our knowledge, this is the first empirical study that investigates the purpose of the applied refactorings with respect to the design problems being faced or the design decisions being made. Additionally, this work makes a distinction between the refactoring operations applied on production and test code.

The rest of the paper is organized as follows. Section 2 presents the related work regarding refactoring activity analysis and refactoring detection; Section 3 shows our experimental setup, selection of case studies, issues on decentralized repository analysis, and our refactoring detection methodology; Section 4 presents our experimental results along with a discussion around the research questions; Section 5 discusses the threats to the validity of the study; Section 6 summarizes the paper and discusses its main conclusions.

2 Related Work

One of the earlier empirical studies on refactoring practice was that of Xing and Stroulia [15], who examined the structural changes in the evolution of the Eclipse JDT, using the UMLDiff [14] algorithm, in order to investigate (a) what proportion of these changes are due to refactorings, (b) which are the typical refactorings applied in practice, and (c) which types of refactorings are “safe” to client applications that reuse the refactored system. They concluded that about 70% of structural changes may be due to refactorings and that, for about 60% of these changes, the references to the affected entities in a component-based application can be automatically updated by a refactoring-migration tool.

Murphy-Hill et al. [10] investigated the refactoring practices followed by the developers using the version history of the Eclipse code base as extracted from its CVS repository. They concluded that (a) commit comments are not useful for predicting refactoring activity, (b) refactoring is frequently used as a means to reach a specific end, such as adding a feature or fixing a bug (floss refactoring), and (c) the percentage of low-level and medium-level refactorings (i.e., sub-method level refactorings) is significant (40-60% of the total number of refactorings) compared to high-level refactorings (i.e., refactorings changing the signature of classes, methods and fields).

Kim et al. [7] investigated the role of API-level refactorings (i.e., package/class/method moves/renames, and method signature changes) using the fine-grained evolution history (i.e., revisions) of three open source projects, namely Eclipse JDT Core, JEdit and Columba. They concluded that (a) there is an increase in the number of bug fixes after API-level refactorings, (b) the time taken to fix bugs is shorter after API-level refactorings than before, (c) a large number of refactoring revisions include bug fixes at the same time or are related to later bug fix revisions, implying that refactorings introduce bugs or refactorings are applied to facilitate bug fixes, and (d) API-level refactorings occur more frequently before than after major software releases.

Kim et al. [9] surveyed developers at Microsoft regarding their use and understanding of refactoring. They found that many developers did not view refactoring solely as “semantic-preserving code transformations” but more as efforts related to code quality improvement. Some developers were motivated to refactor in order to decrease dependencies to aid code reuse. They repeated some of their prior analysis on Microsoft products [7].

Rachatasumrit and Kim [11] studied the impact of refactoring operations on regression tests. The main concern of the study was to identify whether code that has been refactored preserves its original behavior or, in some cases, introduces faulty enhancements that need to be retested. Among other findings, the study reported that only 22% of refactorings are tested, and that at least half of the failed tests affected by code modifications exercised code that involved refactoring edits. Of particular interest to us, this empirical study reports the ratio of failure-inducing changes out of all the refactoring edits in three open source projects.

With respect to refactoring detection tools, several approaches have been proposed in the literature. Dig et al. [3] combined a fast syntactic analysis based on Shingles encoding to detect refactoring candidates and a more expensive semantic analysis to refine the results. Weissgerber and Diehl [13] developed a signature-based analysis to detect refactoring candidates and clone detection to rank these candidates. Kim et al. [8] proposed a rule-based program differencing approach that automatically discovers and summarizes systematic code changes as logic rules.

3 Study Design

In this section we discuss the experimental design of our empirical study. First, we discuss our rationale for selecting the particular software systems for our study. Second, we expose the technical problems we faced in our attempt to compare subsequent revisions in decentralized repositories, and the solutions we developed for these problems. Lastly, we explain our conceptual and practical framework for detecting refactorings within successive revisions of code.

3.1 Selecting Repositories

For our study we selected three medium-size projects: the JUnit testing framework, the Jakarta HTTPCore, and the Jakarta HTTPClient. These systems have a development history ranging from 6 to 12 years. We chose JUnit because it is a well studied system and the two Jakarta HTTP components because they are managed by the same team of people and the comparison of their histories would enable us to investigate if the types of applied refactorings depend on the development team, or whether they are dictated by the specific project requirements. All three selected systems were developed in Java and provide a reliable build automation system, which matters because our refactoring detection mechanism requires the analysis of compiled Java sources.

3.1.1 JUnit

JUnit is perhaps the most widely known and used testing framework for Java programs. Its development was started by Erich Gamma and Kent Beck in 2000. Its development history spans over 12 years. In particular, between releases 3.8.1 (in September 2002) and 4.0 (in February 2006) the project went through a relatively inactive period. We analyzed 1018 revisions ranging from JUnit 3.5 to 4.6.

3.1.2 Jakarta HTTPComponents

HTTPCore is a set of Java components used to implement client and server side HTTP-based services. It has been mirrored in a publicly available git repository exposing 7 years of history (February 2005 to March 2012). We have analyzed 1588 revisions produced in this period of time. HTTPClient provides client side applications with authentication, connection state representation, and general connection management capabilities. It has also been mirrored in a git repository covering its development history from December 2005 to March 2012. We have analyzed a total of 1838 revisions registered during its 6 years of development.

3.2 Determining Successive Revisions

In order to identify the refactorings applied throughout the evolution of a software system, it is necessary to extract the changes that occurred between successive revisions. However, in contrast to centralized repositories (such as SVN) which have a linear commit history, the history of decentralized repositories is a directed acyclic graph (DAG) due to the presence of implicit branches. In git, as explained by Bird et al. [2], when two collaborating developers commit changes to their local repositories, their repositories diverge (i.e., they contain different commits). When one of the developers pulls from the other, git creates a new commit that merges the remote sequence of commits into the pulling developer’s repository.

As a result, the definition of successive revisions cannot be based on the commit time, as is possible in the case of centralized repositories. In our approach, we compare each examined revision with its parent revision in the commit history DAG. In order to avoid the duplicate report of refactorings we excluded the merge commits from our analysis. A merge commit has at least two parents and thus the comparison of the merged revision with all parent revisions from the merged branches would result in duplicate refactoring reports in the case where refactoring activity has taken place in at least one of the branches. In the example depicted in Figure 1, c5 is a merge commit in which two branches are merged. If a refactoring took place in commit c4, then comparing the revisions corresponding to commits c2 and c4 is sufficient to detect it. The comparison of the revisions corresponding to c3 (the last commit of the branch where no refactoring activity took place) and c5 (the merge commit) would erroneously report that the same refactoring was applied twice.

Figure 1: An example of implicit branching.
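The parent-based comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `parents` dictionary is a hypothetical in-memory view of the commit DAG (not the tooling used in the study), and the sample history merely imitates the shape of the example in Figure 1.

```python
from typing import Dict, Iterator, List, Tuple


def successive_revision_pairs(
    parents: Dict[str, List[str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (parent, child) pairs to compare, skipping merge commits.

    `parents` maps each commit id to the list of its parent ids.
    """
    for commit, commit_parents in parents.items():
        if len(commit_parents) != 1:
            # Root commits have no parent to compare with; merge commits
            # have two or more parents and are excluded so that the same
            # refactoring is not reported twice.
            continue
        yield commit_parents[0], commit


# Hypothetical history: c2 forks into c3 and c4, and c5 merges them.
history = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c2"],
    "c5": ["c3", "c4"],
}
pairs = sorted(successive_revision_pairs(history))
```

With this sample history the merge commit c5 never appears as the child of a compared pair, so a refactoring made in c4 is reported once, by the comparison of c2 with c4.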

3.3 Detecting Refactorings

To detect the refactorings applied between two successive revisions, we have adopted a lightweight version of the UMLDiff [14] algorithm for differencing object-oriented models. First, our process matches the model elements in a top-down order (starting from the classes and going to the methods and fields) based on their names and the similarity of their signatures (in the case of methods). This step performs exact matching and returns a set of matched elements, a set of elements that were removed from the first model, and a set of elements that were added to the second model. Next, the removed/added elements between the two models are matched based only on the equality of their names in order to find changes in the signatures of fields and methods (in the case of multiple removed/added methods having the same name, an equal number of parameters is also required). This process returns a set of changed elements which are deleted from the corresponding sets of removed/added elements. Third, the removed/added classes are matched based on the similarity of their members at signature level. This process tolerates changes of types in the signatures of the class members and returns a set of moved/renamed classes which are deleted from the corresponding sets of removed/added classes. This process is more lightweight than the original UMLDiff algorithm in the sense that it does not take into account the dependencies between the model elements to compute their similarity.
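The first two matching steps can be illustrated with a simplified sketch for methods only. The `Method` record and the `diff_methods` function are hypothetical names introduced here, exact matching is reduced to plain equality of signatures, and the class-level third step is omitted; the actual tool may differ in all of these respects.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set, Tuple


class Method(NamedTuple):
    owner: str                 # fully qualified class name
    name: str
    params: Tuple[str, ...]    # parameter types
    return_type: str


def diff_methods(old: Set[Method], new: Set[Method]):
    # Step 1: exact matching on the whole signature.
    matched = old & new
    removed = old - new
    added = new - old

    # Step 2: pair leftover methods that share owner and name.
    added_by_name = defaultdict(list)
    for m in sorted(added):
        added_by_name[(m.owner, m.name)].append(m)
    removed_by_name = defaultdict(list)
    for m in sorted(removed):
        removed_by_name[(m.owner, m.name)].append(m)

    changed: List[Tuple[Method, Method]] = []
    for key, olds in removed_by_name.items():
        news = added_by_name.get(key, [])
        # With several same-named candidates, also require equal arity.
        ambiguous = len(olds) > 1 or len(news) > 1
        for o in olds:
            candidates = [n for n in news
                          if not ambiguous or len(n.params) == len(o.params)]
            if candidates:
                news.remove(candidates[0])
                changed.append((o, candidates[0]))

    # Changed elements leave the removed/added sets.
    removed -= {o for o, _ in changed}
    added -= {n for _, n in changed}
    return matched, changed, removed, added


# Hypothetical models of one class in two successive revisions.
old_model = {
    Method("a.Foo", "run", (), "void"),
    Method("a.Foo", "size", ("int",), "int"),
    Method("a.Foo", "close", (), "void"),
}
new_model = {
    Method("a.Foo", "run", (), "void"),
    Method("a.Foo", "size", ("long",), "int"),
    Method("a.Foo", "open", (), "void"),
}
matched, changed, removed, added = diff_methods(old_model, new_model)
```

In this example `run` is matched exactly, `size` is reported as a signature change, and `close` and `open` stay in the removed and added sets, where later rules can examine them.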

We adopted and extended the refactoring-detection rules defined by Biegel et al. [1]. For each revision $v \in V$ we extract the following sets:

• $C_v$: The set of system classes/interfaces. This set contains tuples $(p, n, m)$, where $p$ is the name of the package to which the class belongs, $n$ is the class name and $m$ is the set of class members (methods and fields) belonging to the class.

• $M_v$: The set of system methods. This set contains tuples $(c, m, p, r, b)$, where $c$ is the fully qualified name of the class to which the method belongs, $m$ is the method name, $p$ is the parameter list of the method, $r$ is the return type of the method and $b$ is the set of class members (methods and fields) being accessed within the body of the method ($b = \{x : x \in M_v \lor x \in F_v\}$).

• $F_v$: The set of system fields. This set contains tuples $(c, f, t)$, where $c$ is the fully qualified name of the class to which the field belongs, $f$ is the field name and $t$ is the field type.

• $G_v$: The set of generalization relationships in the system. This set contains tuples $(c_i, c_j)$, where class $c_i$ extends class $c_j$ and $c_i$, $c_j$ are fully qualified names.

• $R_v$: The set of realization relationships in the system. This set contains tuples $(c_i, c_j)$, where class $c_i$ implements interface $c_j$ and $c_i$, $c_j$ are fully qualified names.

Next, for each transaction $t$, which corresponds to the comparison of two successive revisions $v_1, v_2 \in V$, we determine the entities that were added, removed and remained. For example, $C^{+}_t$ is the set of classes that were added in transaction $t$, $C^{-}_t$ is the set of classes that were removed in transaction $t$ and $C^{=}_t$ is the set of classes that remained in transaction $t$. The same notation (i.e., “+” for added, “-” for removed, and “=” for remained) is used for the other sets of entities as well. Using this conceptual framework, Table 1 presents a set of rules used to identify 11 different types of refactorings in successive code revisions.
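As an illustration of this framework, the added, removed and remained sets reduce to ordinary set operations, and a rule such as Move Field (with its Pull Up and Push Down refinements from Table 1) becomes a search over two of those sets. The sketch below is a minimal reading of the rule, not the authors' implementation; the `Field` record, the function names and the sample revisions are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    owner: str   # fully qualified class name (c)
    name: str    # field name (f)
    type: str    # field type (t)


def split(old: set, new: set) -> Tuple[set, set, set]:
    """Return the (added, removed, remained) sets for one transaction."""
    return new - old, old - new, old & new


def detect_field_moves(f_removed, f_added, g_remained) -> List[Tuple[str, str, str, str]]:
    """Move Field rule: same name and type, removed from c and added to c'."""
    found = []
    for c, f, t in sorted(f_removed):
        for c2, f2, t2 in sorted(f_added):
            if (f, t) != (f2, t2):
                continue
            if (c, c2) in g_remained:      # c extends c'
                kind = "Pull Up Field"
            elif (c2, c) in g_remained:    # c' extends c
                kind = "Push Down Field"
            else:
                kind = "Move Field"
            found.append((kind, f, c, c2))
    return found


# Hypothetical revisions: field `id` moves from Child to its superclass Base.
fields_v1 = {Field("app.Child", "id", "int"), Field("app.Base", "name", "String")}
fields_v2 = {Field("app.Base", "id", "int"), Field("app.Base", "name", "String")}
extends_v1 = {("app.Child", "app.Base")}
extends_v2 = {("app.Child", "app.Base")}

f_added, f_removed, f_remained = split(fields_v1, fields_v2)
_, _, g_remained = split(extends_v1, extends_v2)
moves = detect_field_moves(f_removed, f_added, g_remained)
```

Because the generalization pair (Child, Base) is present in both revisions, the detected move is classified as a Pull Up Field rather than a plain Move Field.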

4 Results

This section discusses the findings of our exploration of the research questions we identified in the introduction and the details of the methodology used to investigate each question based on our revision extraction mechanism and refactoring detection tool. More importantly, we discuss the observations from our empirical study, and address possible explanations of the examined behaviors.

4.1 RQ1: Do software developers perform different types of refactoring operations on test code and production code?

The first step towards answering this question is to make a distinction between classes corresponding to test code and those corresponding to production code. In JUnit, test code has been placed within package junit.tests since the beginning of the project. Thus refactorings on classes/methods/fields with qualified names matching junit.tests were classified as test code refactorings. Out of the total 383 detected refactorings, 219 (57%) were production code refactorings, while 164 (43%) were test code refactorings.

Table 1: Refactoring detection rules

| Refactoring | Rule |
| --- | --- |
| Extract Method | $\exists\,(c, m_i, p_i, r_i, b'_i) \in M^{=}_t \wedge \exists\,(c, m_j, p_j, r_j, b_j) \in M^{+}_t \wedge b_j \subseteq b_i \wedge (c, m_j, p_j, r_j, b_j) \in b'_i$, where $b'_i$ is the body of $m_i$ after refactoring and $b_i$ is the body of $m_i$ before refactoring |
| Move Field* | $\exists\,(c, f, t) \in F^{-}_t \wedge \exists\,(c', f, t) \in F^{+}_t$ |
| Move Method* | $\exists\,(c, m, p, r, b) \in M^{-}_t \wedge \exists\,(c', m, p', r, b) \in M^{+}_t \wedge p \subseteq p'$ |
| Extract Superclass | $\exists\,(p, n, m) \in C^{+}_t \wedge \exists\,(p_x, n_x, m'_x) \in C^{=}_t \wedge \exists\,(p_x.n_x, p.n) \in G^{+}_t \wedge m \subseteq m_x \wedge m'_x \subseteq m_x$, where $m'_x$ are the members of $n_x$ after refactoring and $m_x$ are the members of $n_x$ before refactoring |
| Extract Interface | $\exists\,(p, n, m) \in C^{+}_t \wedge \exists\,(p_x, n_x, m'_x) \in C^{=}_t \wedge \exists\,(p_x.n_x, p.n) \in R^{+}_t \wedge m \subseteq m_x \wedge m'_x = m_x$ |
| Move Class | $\exists\,(p, n, m) \in C^{-}_t \wedge \exists\,(p', n, m) \in C^{+}_t$ |
| Rename Class | $\exists\,(p, n, m) \in C^{-}_t \wedge \exists\,(p, n', m) \in C^{+}_t$. Note: constructor names and field/parameter types may be renamed accordingly |

* If $\exists\,(c, c') \in G^{=}_t$, then Move $\Rightarrow$ Pull Up; if $\exists\,(c', c) \in G^{=}_t$, then Move $\Rightarrow$ Push Down.

For the HTTPCore and HTTPClient projects we followed a different approach, since the test classes are not placed within a specific package. We noticed that the names of test classes had a prefix of Test or a suffix of Mock. Out of the total 527 detected refactorings in HTTPCore, 421 (80%) were applied to production code, while 106 (20%) were applied to test code. In HTTPClient, 161 (81%) were applied to production code and 37 (19%) to test code out of 198 total refactorings. We manually inspected the detected refactorings in all examined projects.
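The two classification heuristics (package name for JUnit, class-name prefix or suffix for the HTTP components) could be expressed as a single predicate. The function below is a sketch under the assumption that each refactoring is reported with the fully qualified name of its class; the function name, the project labels and the HTTP class names in the usage are hypothetical.

```python
def is_test_code(project: str, qualified_class_name: str) -> bool:
    """Classify a class as test code using the heuristics described above."""
    if project == "JUnit":
        # JUnit keeps its tests under the junit.tests package.
        return qualified_class_name.startswith("junit.tests.")
    # HTTPCore / HTTPClient: test classes are recognized by their simple
    # name, which starts with "Test" or ends with "Mock".
    simple_name = qualified_class_name.rsplit(".", 1)[-1]
    return simple_name.startswith("Test") or simple_name.endswith("Mock")


# Illustrative names only.
examples = [
    ("JUnit", "junit.tests.framework.AssertTest"),
    ("JUnit", "junit.framework.Assert"),
    ("HTTPCore", "org.example.http.TestHeaderParsing"),
    ("HTTPCore", "org.example.http.ConnectionMock"),
    ("HTTPClient", "org.example.http.HttpHost"),
]
labels = [is_test_code(project, name) for project, name in examples]
```

A name-based heuristic of this kind can misclassify a production class whose name happens to start with Test, which is one reason a manual inspection of the detected refactorings remains useful.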

We found 8 false positives for the Extract Method refactoring (96.4% precision) and 4 false positives for the Rename Class refactoring (97.6% precision), while the precision for all the other types of refactorings was 100%.
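The reported precision values can be reproduced from the counts in Table 2, assuming that the denominator for each refactoring type is its total over all three projects and both kinds of code (the text does not state the denominator explicitly, so this is an inference).

```python
# Per-type totals summed over the six columns of Table 2.
detected = {
    "Extract Method": 66 + 81 + 71 + 2 + 0 + 0,   # 220 instances
    "Rename Class": 22 + 75 + 14 + 22 + 25 + 7,   # 165 instances
}
false_positives = {"Extract Method": 8, "Rename Class": 4}


def precision(refactoring: str) -> float:
    """Precision = true positives / all detected instances."""
    total = detected[refactoring]
    return (total - false_positives[refactoring]) / total


precision_pct = {name: round(100 * precision(name), 1) for name in detected}
```

Under this assumption the computation gives 212/220 for Extract Method and 161/165 for Rename Class, which round to the 96.4% and 97.6% quoted above.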

Table 2 reports the number of detected refactorings on production code and test code, for each examined project. The variety of the refactorings applied to production code and test code differed: in test code of the examined projects we found only 3 types of refactorings, namely Move Class, Rename Class and Move Method. The goal of the applied refactorings on test code was often to organize the tests into new packages, reorganize inner classes, rename some test classes by giving a more meaningful name and move methods between test classes for conceptual reasons. Conversely, the refactorings that were applied to production code of the examined systems were intended to improve the design of the system by modularizing the code and removing design problems. The three most dominant refactorings on production code were Move Class, Extract Method and Rename Class, followed by refactorings related to the reallocation of the system’s behavior (Move and Pull Up Method). The refactorings related to the reallocation of the system’s state (Move and Pull Up Field) as well as the introduction of additional inheritance levels (Extract Superclass/Interface) are less frequent. Finally, the refactorings that move state or behavior to subclasses (Push Down Method/Field) are rarely applied.

4.2 RQ2: Which developers are responsible for refactorings?

The JUnit version control system references 32 developers; although most of them are not major contributors, each has at least one commit. Figures 2a and 2b show the percentage of the refactoring activities performed by JUnit committers on production and test code, respectively. It is evident that the refactorings are performed by a limited number of developers (7 contributors on production code and 4 on test code). Additionally, the proportion of the refactorings contributed by each developer is almost identical between production and test code. The top two refactoring contributors, namely David Saff and Kent Beck (the current and the former project managers), are also the two top committers with 63% and 12% of the commits, respectively.

Figure 2: Refactoring contributors. Panels: (a) JUnit production code; (b) JUnit test code; (c) HTTPCore production code; (d) HTTPCore test code; (e) HTTPClient production code; (f) HTTPClient test code.

Figures 2c and 2d show the distribution of the refactoring activities contributed to the HTTPCore production code and test code, respectively. In this case, only 4 developers actively applied production code refactorings, while 3 applied test code refactorings. We have identified Oleg Kalnichevski (top committer with 78% of the commits), and Roland Weber (third in the list of top committers with 8% of the commits) as the two main refactoring contributors of this project. The second in the top list of committers (Sebastian Bazley) with 11% of the total commits has contributed a small number of refactorings only on test code.

Table 2: Number of detected refactorings on the production and test code of the examined projects.

| Refactoring | Production: JUnit | Production: HTTPCore | Production: HTTPClient | Test: JUnit | Test: HTTPCore | Test: HTTPClient |
| --- | --- | --- | --- | --- | --- | --- |
| Move Class | 94 | 199 | 52 | 131 | 48 | 28 |
| Extract Method | 66 | 81 | 71 | 2 | 0 | 0 |
| Rename Class | 22 | 75 | 14 | 22 | 25 | 7 |
| Move Method | 15 | 14 | 8 | 9 | 33 | 2 |
| Pull Up Method | 17 | 11 | 2 | 0 | 0 | 0 |
| Move Field | 1 | 10 | 7 | 0 | 0 | 0 |
| Extract Interface | 1 | 12 | 1 | 0 | 0 | 0 |
| Extract Superclass | 4 | 7 | 1 | 0 | 0 | 0 |
| Pull Up Field | 0 | 7 | 1 | 0 | 0 | 0 |
| Push Down Method | 1 | 2 | 0 | 0 | 0 | 0 |
| Push Down Field | 0 | 3 | 0 | 0 | 0 | 0 |

Figures 2e and 2f present the refactoring distribution for the HTTPClient production and test code, respectively. HTTPClient shares the same development team with HTTPCore except for one additional developer (Jonathan Moore, fourth in the list of top committers with 6% of the commits) who is the second most active refactoring contributor after Oleg Kalnichevski. Similarly to the HTTPCore project, Oleg Kalnichevski (top committer with 58% of the commits) is also the top refactoring contributor. However, Sebastian Bazley and Roland Weber (the second and third in the list of top committers with 18% and 16% of the total commits, respectively) have not contributed any refactorings in the HTTPClient project.

It is worth noticing that although HTTPCore (with 8 committers) and HTTPClient (with 9 committers) share almost the same team of developers, only Oleg Kalnichevski contributes refactorings in both projects. This means that the rest of the developers, although they commit code in both projects, contribute refactorings in only one of them (probably the project they are more familiar with).

Based on these observations, we can conclude that most of the applied refactorings are performed by specific developers that usually have a key role in the management of the project. Moreover, we did not notice any substantial change in the distribution of the refactoring commits by the developers to the production code and test code of the three projects under study.

4.3 RQ3: Is there more refactoring activity before major project releases than after?

We inspected the refactoring activity around the release points along the lifetime of the three projects under study. We tracked each project's major release dates (based on the public software release dates) and manually examined their relevance in order to filter out minor releases such as isolated application of patches, major documentation migration, or change of license types. In order to perform a homogeneous study, we selected windows of 80 days around each release date. Each window was divided into two groups of 40 days in order to split the analysis into the periods before and after a release point. In our three case studies the window size met two fundamental conditions. First, a high volume of change activity was registered in the repository during the framed time period, and second, the time window did not overlap with others. While in some cases the HTTPCore project had major releases within 2 months of each other, in the JUnit project the commit activity decreased substantially 30 days after the release dates. We found that a window of 80 days is a practical time frame around the release dates, and consistent with the development style of the three projects under observation.

Figures 3, 4, and 5 present the refactoring activity around the release dates of JUnit (13 releases), HTTPCore (9 releases), and HTTPClient (7 releases), respectively. The distribution of the refactoring frequency around the release dates of each project has been summarized with violin plots [6]. The advantage of a violin plot over a box plot is that the former also includes frequency density information. First, for each project using a single window, we projected the distribution of the refactoring activity by adding the daily refactoring activity counts. This projection allowed us to condense the information of the total refactoring activity per day in a single dimension, and to analyze each project's refactoring activity trends around a generic release point. Figure 3b, for example, presents the distribution of the refactoring activity around the generic release day for the JUnit project. On the left, the violin plot shows the distribution of the refactoring activity before the release dates of the project, while on the right it portrays the after-release counterpart.
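The windowing and projection steps can be sketched as follows. The code assumes one date per detected refactoring and a list of major release dates; the function name and the sample dates are hypothetical. How the release day itself (offset 0) is assigned to the before or after half is not stated for this analysis, so the sketch leaves it out of both totals.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable

HALF_WINDOW = 40  # days before and after each release (80-day window)


def project_onto_generic_release(
    refactoring_dates: Iterable[date], release_dates: Iterable[date]
) -> Counter:
    """Sum daily refactoring counts of all releases by day offset."""
    refactoring_dates = list(refactoring_dates)
    projected = Counter()
    for release in release_dates:
        for day in refactoring_dates:
            offset = (day - release).days  # negative means before the release
            if -HALF_WINDOW <= offset <= HALF_WINDOW:
                projected[offset] += 1
    return projected


# Hypothetical data: two releases and five detected refactorings.
r1, r2 = date(2021, 3, 1), date(2021, 9, 1)
refactorings = [
    r1 - timedelta(days=20),
    r1 - timedelta(days=20),
    r1 + timedelta(days=4),
    r2 - timedelta(days=20),
    r2 + timedelta(days=60),   # outside every window, ignored
]
projected = project_onto_generic_release(refactorings, [r1, r2])
before = sum(n for offset, n in projected.items() if offset < 0)
after = sum(n for offset, n in projected.items() if offset > 0)
```

The resulting per-offset counts are the kind of one-dimensional data that a violin plot summarizes, with one distribution for the negative offsets and one for the positive offsets.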

In the JUnit project, Figure 3a shows a refactoring activity peak around 20 days before the release day. Moreover, it shows constant refactoring activity 10 days before the release day in multiple release points. On the right side (after the release day) no significant refactoring activity was detected in less than 25 days. Figure 3b shows a wide and tall distribution of refactoring activity before the projected release day. A very small distribution body was spotted on the right, exposing the small significance of the activity after the release point.

Regarding the HTTPCore project, Figure 4 shows a similar behavior. In this case, most of the refactoring activity that occurred before the release day is situated within a single week. Figure 4a shows significant activity 10 days after the release day. However, this activity has a much smaller scale compared to the one registered before the same release. The violin plot presented in Figure 4b shows the high refactoring distribution before the release days of the project. However, an important, although comparatively smaller, body was found for the period after the release point.

With respect to the HTTPClient project, Figure 5a shows a common refactoring zone in the period between 36 and 28 days before the project release date. Another common considerable peak can be observed just one week before the release day. In general, the refactoring activity after the release point is very small in comparison to the window before the release. This is also observable in Figure 5b where both distribution bodies are compared.

In the three projects we studied, we observed a common behavior where the refactoring activity is high before the release dates. Moreover, our results expose that the refactoring activity after the releases is almost non-existent, and that in the cases where some activity was found, it was very small compared to the activity observed before the releases. The analysis of commits before and after the release dates revealed that the developers were always committing changes after the releases, although in most cases the number of commits is lower in comparison to the number of commits before the releases. This means that the observed absence of refactoring activity after the releases is not due to the absence of commits.

4.4 RQ4: Is refactoring activity on production code preceded by the addition or modification of test code?

Figure 3: Refactoring Activity Comparison (Release) - JUnit (40 days before and after release)

Figure 4: Refactoring Activity Comparison (Release) - HTTPCore (40 days before and after release)

Figure 5: Refactoring Activity Comparison (Release) - HTTPClient (40 days before and after release)

Our methodology for studying this question involved three steps: (i) test activity detection, (ii) manual inspection of test hotspots, and (iii) window analysis construction. In the first step, we built an algorithm that explored the revision history of the projects and detected the addition and modification of test files, counting its occurrences along the lifetime of each project. We followed the same approach presented in Section 4.1 to differentiate production code from test code. Next, based on the time periods identified as test hotspots, we manually inspected the commit comments, and their respective new or modified sources. Through a manual inspection process we found a number of false positives that were excluded from the analysis. In these cases several test classes were modified, but the actual changes were: reorganization of the import headers, addition of copyright notices, or documentation of specific cases. Lastly, in the third step, we grouped the contiguous test hotspots and identified the actual test periods of each project. Taking the last day of each period as our pivot point, we followed the same window analysis as the one presented in Section 4.3. Figures 6, 7, and 8 present the refactoring activity of our three case studies around the end of each testing period. The pivot of the window (the zero point) is labeled as End of Testing Period (E.T.P).

In the case of the JUnit project, Figure 6a summarizes the refactoring activity around the end of 10 testing periods. Here, we can observe a common pattern where the refactoring activity was particularly high during the testing phases of the project. Furthermore, in almost all cases, the refactoring activity substantially increased in the last days before (and even on the same day as) the end of testing periods. In the context of the same project, Figure 6b presents the frequency distribution bodies of the project during (left) and after (right) testing periods. A higher refactoring density can be easily observed during the testing phases of the project.

Figure 7a shows the results obtained for the HTTPCore project with 8 testing periods under analysis. In this case, a very peculiar pattern emerged. In almost all cases, the significant refactoring activity peaks occurred on the same day the testing period ended. This phenomenon revealed that, for this project, testing and refactoring activities were linked and specially ordered. Since the E.T.P day is included in the left portion of the window frequency density analysis, Figure 7b portrays a large density body for the period under the project testing phases.

The refactoring activity around testing periods for the HTTPClient project is very similar to the one observed in the JUnit project. Figure 8a shows two interesting sections. In the first section, around 30 and 10 days before the end of the testing activities, the project presents a considerably high refactoring activity, while the refactoring activity after the end of testing periods is almost non-existent. Consequently, the density summary in Figure 8b is flat and no significant volume of refactorings is observed after the testing end point.

The three projects under examination demonstrate a strong alignment of refactoring activities with testing periods. In each of the analyzed windows, significant refactoring operations were applied along with test modifications. Especially in the HTTPCore project, the majority of refactoring activity was performed on the ending day of testing periods. We believe that one possible explanation for the alignment between refactoring and test modifications is the adoption of test-driven development practices. In some cases, we observed that the developers updated their test code first and then performed refactorings.

4.5 RQ5: What is the purpose of the applied refactorings?

To answer this question, two of the authors (Tsantalis and Guana) inspected together the source code involved in each detected refactoring, in order to postulate the main drivers that motivated the developers towards the application of the refactorings. The process was performed in two rounds. In the first round, a subset of randomly selected refactorings was examined by inspecting the relevant source code before and after the application of the refactoring with a text diff tool. Next, they defined rules (described in the rest of the section) that must hold in order to classify a refactoring instance into each of the categories defined in the first round. In the second round, they performed a systematic labeling of all refactoring instances using the same inspection method and applying the aforementioned rules. For each refactoring instance, the authors inspected the text diff report as well as the corresponding commit logs and suggested a label. In the case of disagreement, a discussion with arguments from both sides took place in order to reach a consensus. In total, 210 Extract Method and 70 Inheritance related refactorings were inspected over the span of approximately 40 hours (i.e., 5 working days). The inspection time for each refactoring was between 5 and 10 minutes, depending on the complexity of the diff report and the number of files affected by the corresponding refactoring.

4.5.1 Extract Method refactoring

Figure 6: Refactoring Activity Comparison (Test) - JUnit (40 days before and after release)

Figure 7: Refactoring Activity Comparison (Test) - HTTPCore (40 days before and after release)

Figure 8: Refactoring Activity Comparison (Test) - HTTPClient (40 days before and after release)

The Extract Method refactoring can be applied to resolve a large variety of code smells [4] either directly or as a part of a more complex transformation, as well as to facilitate the introduction of new functionalities and bug fixes. To provide an insight into this diversity, we inspected in total 210 instances (detected in the three examined projects) and found three main motivations, namely Code smell resolution, Extension and Backward compatibility, which are further divided into more fine-grained sub-categories:

Code smell resolution:

• Remove duplicate code: Duplicate code fragments are replaced with a single method call to the extracted method.

• Decompose method: A code fragment having a distinct functionality is extracted into a new separate method in order to decompose and simplify a long or complex method (Long Method [4]).

• Hide message chain: A chain of method calls is extracted and replaced with a single method call in order to simplify the original method (Message Chain [4]).

• Encapsulate field: A field access/assignment is replaced with a call to a newly introduced getter/setter method and the visibility of the field is reduced in order to enforce the design principle of encapsulation or data hiding.
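As an illustration of the Decompose method sub-category above, the following sketch shows a method before and after Extract Method is applied. It is a made-up example, not code from the examined projects: the class `DecomposeMethodExample`, its method names, and the discount rule are all hypothetical.

```java
import java.util.List;

// Hypothetical example of the "Decompose method" motivation.
final class DecomposeMethodExample {

    // Before: summation and the discount rule are tangled in one method.
    static int totalBefore(List<Integer> pricesInCents) {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        if (sum > 10_000) {
            sum -= sum / 10; // 10% discount above 100.00
        }
        return sum;
    }

    // After: each fragment with a distinct purpose lives in its own extracted method.
    static int totalAfter(List<Integer> pricesInCents) {
        return applyDiscount(sumOf(pricesInCents));
    }

    private static int sumOf(List<Integer> pricesInCents) {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }

    private static int applyDiscount(int sum) {
        return sum > 10_000 ? sum - sum / 10 : sum;
    }

    public static void main(String[] args) {
        List<Integer> prices = List.of(6_000, 5_000);
        // Both versions must agree, since a refactoring preserves behavior.
        System.out.println(totalBefore(prices) + " == " + totalAfter(prices));
    }
}
```

A reviewer comparing the two versions in a diff would see the original body shrink to a sequence of calls, which is the kind of evidence the classification rules rely on.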

Extension:

• Facilitate functionality extension: The extracted/original method contains newly added code apart from the extracted/removed one. The goal is to simplify the code in the original method and make the addition of the new functionality easier.

• Introduce polymorphism: The extracted method is abstract and the extracted code is moved to a subclass providing a concrete implementation for the newly introduced abstract method. The purpose is to enable future extension of the system through polymorphism.

• Self encapsulate field: A field access/assignment is replaced with a call to a newly introduced getter/setter method, but the access modifier of the field remains the same. The goal is to enable the future extension of the system, since indirect variable access allows subclasses to override the superclass getter/setter and manage the data with more flexibility (e.g., lazy initialization).

• Introduce factory method: The creation of an object is replaced with a call to a newly introduced factory method creating the same object. The goal is to enable future extension of the system, since constructors can only return an instance of the object that is asked for, while a factory method can return instances of subclass types as well [4].
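The Introduce factory method sub-category above can be sketched as follows. The example is hypothetical and does not come from the examined projects: `Channel`, `BufferedChannel`, `Sender`, and `BufferedSender` are invented names used only to show how an extracted creation method becomes an extension point.

```java
// Hypothetical product hierarchy.
class Channel {
    String describe() {
        return "plain";
    }
}

class BufferedChannel extends Channel {
    @Override
    String describe() {
        return "buffered";
    }
}

class Sender {
    // Before the refactoring, this method would contain `new Channel()` inline.
    String open() {
        Channel channel = createChannel();
        return "opened " + channel.describe();
    }

    // Extracted factory method: by default it creates the same object as before.
    protected Channel createChannel() {
        return new Channel();
    }
}

class BufferedSender extends Sender {
    // A subclass can now return a subtype, which a constructor call could not do.
    @Override
    protected Channel createChannel() {
        return new BufferedChannel();
    }
}

final class FactoryMethodDemo {
    public static void main(String[] args) {
        System.out.println(new Sender().open());
        System.out.println(new BufferedSender().open());
    }
}
```

The behavior of `Sender` is unchanged by the extraction; only the new subclass takes advantage of the extension point.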

Backward compatibility:

• Extract Delegate: The entire body of a method or a single method call (without chain) is extracted into a new method. The goal is (a) to improve the name of a method, (b) to remove unnecessary or unused parameters from a method, or (c) to deprecate a method and at the same time preserve the public API of the class.
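A minimal sketch of the Extract Delegate motivation, under the assumption of an invented class `NumberReader` (not taken from the examined projects): the whole body of the old method is extracted into a better-named method without the unused parameter, and the old method is kept as a deprecated delegate so that existing callers still compile.

```java
// Hypothetical example of "Extract Delegate" for backward compatibility.
final class NumberReader {

    /**
     * Old public method, kept so that existing clients are not broken.
     *
     * @deprecated use {@link #readInt(String)}; the flag was never used.
     */
    @Deprecated
    static int read(String text, boolean unusedFlag) {
        return readInt(text); // the entire former body now lives in readInt
    }

    // New method: clearer name, no unused parameter.
    static int readInt(String text) {
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(readInt(" 42 "));
        System.out.println(read(" 42 ", true)); // old call sites behave the same
    }
}
```

In a diff this appears as an Extract Method whose extracted fragment is the complete method body, which is what distinguishes it from the code smell and extension sub-categories.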

Table 3: Motivations for the application of Extract Method refactorings.

Motivation               Sub-category             Count (%)
Code smell               Decompose method         32 (15%)
Code smell               Remove duplication       21 (10%)
Code smell               Hide message chain       19 (9%)
Code smell               Encapsulate field        20 (10%)
Extension                Facilitate extension     23 (11%)
Extension                Use polymorphism         9 (4%)
Extension                Self encapsulate field   24 (11%)
Extension                Factory method           37 (18%)
Backward compatibility   Extract Delegate         25 (12%)
Total                                             210 (100%)

As one can observe from Table 3, the Extract Method refactoring serves a large variety of different purposes. With respect to the code smells being resolved in the examined projects, the use of the Extract Method refactoring for the decomposition of methods is more dominant compared to the other three code smells. Additionally, there is an almost equal use of the Extract Method refactoring for the purposes of removing duplicate code, hiding message chains and encapsulating fields. With respect to the extension motivation, the Extract Method refactoring has been primarily used to introduce factory methods. Moreover, it has been used to facilitate the introduction of new functionality and self encapsulate fields, while it has been more rarely used for the introduction of polymorphism (actually, instances of the latter motivation were found only in the HTTPCore project). Finally, the Extract Method refactoring has been used a significant number of times across the three examined projects for preserving the backward compatibility and public API of the classes. By analyzing the results for each project separately, we found that the application of the Extract Method refactoring is driven more by code smells in JUnit (where 53% of the refactorings were applied for the resolution of code smells) and HTTPClient (where 63% of the refactorings were applied for the resolution of code smells), while it is more extension driven in HTTPCore (where 74% of the refactorings were applied for extension purposes).

4.5.2 Inheritance related refactorings

For the inheritance related refactorings (Pull Up/Push Down Method/Field, Extract Superclass/Interface) we inspected in total 70 instances (detected in the three examined projects) and found three main motivations, namely Code smell resolution, Extension and Abstraction level refinement:

Code smell resolution:

• Remove duplicate code: Duplicate methods or fields declared in multiple classes are moved to an already existing or newly introduced common superclass. The goal is to make the subclasses inherit and not copy the duplicate behavior/state.

Extension:

• Form template method: The Template Method design pattern [5] is introduced by “pulling up” into a template method the behavior of methods having the same signature but different functionality. The body of the template method calls a newly introduced abstract method in the superclass and the subclasses provide a concrete implementation for the abstract method by copying the bodies of the pulled up methods.
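The result of the Form template method transformation can be sketched as below. This is an invented example rather than code from JUnit: `Task`, `CompileTask`, and `CheckTask` are hypothetical names, and we assume that before the refactoring each subclass declared its own `run()` with a different body.

```java
// Hypothetical result of "Form template method".
abstract class Task {

    // Template method: keeps the signature the subclasses used to declare themselves.
    final String run() {
        return "start;" + doRun() + ";end";
    }

    // Newly introduced abstract method; subclasses receive the former run() bodies.
    protected abstract String doRun();
}

final class CompileTask extends Task {
    @Override
    protected String doRun() {
        return "compile"; // body copied from the old CompileTask.run()
    }
}

final class CheckTask extends Task {
    @Override
    protected String doRun() {
        return "check"; // body copied from the old CheckTask.run()
    }
}

final class TemplateMethodDemo {
    public static void main(String[] args) {
        System.out.println(new CompileTask().run());
        System.out.println(new CheckTask().run());
    }
}
```

New subclasses only need to implement `doRun()`, which is why this motivation is grouped under Extension.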

Abstraction level refinement:

• Generalize by pulling up code: A method/field is moved from a subclass to an already existing or newly introduced superclass. The goal is to reallocate behavior/state to a higher level of abstraction.

• Specialize by pushing down code: A method/field is moved from a superclass to a subclass. The goal is to reallocate the behavior/state that is not used by a superclass to a lower level of abstraction.

• Generalize contract: A new interface is created containing methods which are already implemented in existing classes. The goal is to introduce an additional level of abstraction to capture the behavior being common between different classes.

• Decompose interface: A portion of the methods declared in an existing interface is moved to a new interface that extends the original. The goal is to make the original interface “thinner” and cover future classes that do not need to implement the removed methods.
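As a small illustration of the Decompose interface sub-category above, consider the following sketch. All names (`Store`, `MutableStore`, `FixedStore`, `MapStore`) are hypothetical; we assume that `Store` originally declared both `get` and `put`.

```java
import java.util.HashMap;
import java.util.Map;

// Original interface after the refactoring: it is now "thinner".
interface Store {
    String get(String key);
}

// New interface that extends the original and receives the moved method.
interface MutableStore extends Store {
    void put(String key, String value);
}

// A class that does not need the removed method can implement the thin interface only.
final class FixedStore implements Store {
    @Override
    public String get(String key) {
        return "fixed:" + key;
    }
}

// Existing implementations switch to the new, richer interface.
final class MapStore implements MutableStore {
    private final Map<String, String> entries = new HashMap<>();

    @Override
    public String get(String key) {
        return entries.get(key);
    }

    @Override
    public void put(String key, String value) {
        entries.put(key, value);
    }
}

final class DecomposeInterfaceDemo {
    public static void main(String[] args) {
        MapStore store = new MapStore();
        store.put("k", "v");
        Store readOnlyView = store; // clients that only read depend on Store
        System.out.println(readOnlyView.get("k") + " " + new FixedStore().get("a"));
    }
}
```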

As one can observe from Table 4, the application of inheritance-related refactorings is mainly motivated by the removal of duplicate code and the generalization of abstraction (either by pulling up code or extracting common superclasses and interfaces). On the other hand, the specialization of abstraction (by pushing down code) and the decomposition of interfaces are rarely applied practices and have been mainly observed in the HTTPCore project. Finally, the application of inheritance-related refactorings in order to form instances of the Template Method design pattern was observed only in the JUnit project and thus we cannot draw any general conclusion about this specific motivation.

Table 4: Motivations for the application of Inheritance related refactorings.

Motivation                     Sub-category             Count (%)
Code smell                     Remove duplication       22 (31%)
Extension                      Template method          14 (20%)
Abstraction level refinement   Generalize (pull up)     12 (17%)
Abstraction level refinement   Generalize contract      12 (17%)
Abstraction level refinement   Specialize (push down)   6 (9%)
Abstraction level refinement   Decompose interface      4 (6%)
Total                                                   70 (100%)

5 Threats to Validity

Internal validity is threatened by the presence of false negatives (i.e., undetected actual refactorings). Our refactoring detection technique, as mentioned in Section 4, has a small number of false positives and thus high precision, yet it was not possible to compute its recall due to the absence of the actual refactorings applied in the examined projects. Furthermore, we tried to mitigate inconsistencies in the labeling of refactorings into the intent categories through a consensus building process.

External validity is threatened by the scope of the study. The selected projects are developed in Java, are managed by relatively small teams and constitute libraries (in the case of HTTP Components) or frameworks (in the case of JUnit). Consequently, we cannot claim that the results of the study can be generalized to other programming languages where refactoring tool support may be different, projects with a significantly larger number of developers or different team organization, and different software systems (e.g., applications) where the need for backward compatibility and extensibility might be less important.

6 Conclusions

In this empirical study we investigated five questions addressing different aspects of refactoring activity in three projects.

RQ1: Do software developers perform different types of refactoring operations on test code and production code? We concluded that there is a wider variety of refactoring types applied to production code. Additionally, we observed that this activity was primarily focused on design improvements, such as the resolution of code smells and modularization refinement. In terms of the refactorings observed on test code, there is a clear focus on internal reorganization of the classes of each project into packages and renaming of classes for conceptual reasons.

RQ2: Which developers are responsible for refactorings? We observed that specific developers have taken over the responsibility of planning and performing refactorings. We found that, in all of the projects we studied, this role was mainly fulfilled by a single developer acting as a refactoring manager.

RQ3: Is there more refactoring activity before major project releases than after? In our three case studies, we observed that the refactoring activity is significantly more frequent before a release than after. We believe this activity is motivated by the desire to improve the design, and prepare the code for future extensions before stable versions of the API are released to the public.

RQ4: Is refactoring activity on production code preceded by the addition or modification of test code? We found a tight alignment between refactorings and active periods of test code changes. We detected intense refactoring activity during the testing periods of the projects that declined immediately after the test period ended. Our results and subsequent manual inspection led us to conclude that testing and refactoring are dependent, which suggests the adoption of test-driven development practices in the examined projects.

RQ5: What is the purpose of the applied refactorings? We found a wide variety of reasons motivating the application of the Extract Method refactoring. With respect to code smells, the decomposition of methods was the most dominant motivation for applying Extract Method refactorings. With respect to facilitating extension, the primary motivation was the introduction of Factory Methods. Regarding inheritance related refactorings, the main motivation was the removal of duplicate code and the generalization of abstraction, while the decomposition of interfaces and specialization of abstraction were rarely applied.

Lessons learned: Continuous Code Quality Management platforms (e.g., Sonar) have mainly incorporated code convention checkers (e.g., FindBugs, Checkstyle, and PMD) to assess and improve code quality. There is also evidence that open-source projects fix code convention violations before public releases [12]. We believe that the results of the study highlight the need for incorporating code smell detection and resolution tools in Code Quality Management platforms. Additionally, research around the detection of refactoring opportunities has mainly focused on refactorings that remove code smells. Based on the results of the study, developers refactor their code for other reasons as well (e.g., extension and backward compatibility). Consequently, current and future refactoring recommendation tools should support refactorings serving multiple purposes.

Acknowledgments: The authors would like to acknowledge the generous support of NSERC, AITF (former iCORE), and IBM.

References

[1] Benjamin Biegel, Quinten David Soetens, Willi Hornig, Stephan Diehl, and Serge Demeyer. Comparison of similarity metrics for refactoring detection. In Proceedings of the 8th Working Conference on Mining Software Repositories, pages 53–62, 2011.

[2] Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, and Prem Devanbu. The promises and perils of mining git. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories, pages 1–10, 2009.

[3] Danny Dig, Can Comertoglu, Darko Marinov, and Ralph Johnson. Automated detection of refactorings in evolving components. In Proceedings of the 20th European Conference on Object-Oriented Programming, pages 404–428, 2006.

[4] Martin Fowler. Refactoring: improving the design of existing code. Addison-Wesley, Boston, MA, USA, 1999.

[5] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[6] J.L. Hintze and R.D. Nelson. Violin plots: a box plot-density trace synergism. American Statistician, 52(2):181–184, 1998.

[7] Miryung Kim, Dongxiang Cai, and Sunghun Kim. An empirical investigation into the role of API-level refactorings during software evolution. In Proceedings of the 33rd International Conference on Software Engineering, pages 151–160, 2011.

[8] Miryung Kim, David Notkin, and Dan Grossman. Automatic inference of structural changes for matching across program versions. In Proceedings of the 29th International Conference on Software Engineering, pages 333–343, 2007.

[9] Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, pages 50:1–50:11, 2012.

[10] Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black. How we refactor, and how we know it. In Proceedings of the 31st International Conference on Software Engineering, 2009.

[11] Napol Rachatasumrit and Miryung Kim. An empirical investigation into the impact of refactoring on regression testing. In Proceedings of the 28th IEEE International Conference on Software Maintenance, pages 357–366, 2012.

[12] Michael Smit, Barry Gergel, H. James Hoover, and Eleni Stroulia. Code convention adherence in evolving software. In Proceedings of the 27th IEEE International Conference on Software Maintenance, pages 504–507, 2011.

[13] Peter Weissgerber and Stephan Diehl. Identifying refactorings from source-code changes. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, pages 231–240, 2006.

[14] Zhenchang Xing and Eleni Stroulia. UMLDiff: an algorithm for object-oriented design differencing. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 54–65, 2005.

[15] Zhenchang Xing and Eleni Stroulia. Refactoring practice: How it is and how it should be supported - an Eclipse case study. In Proceedings of the 22nd IEEE International Conference on Software Maintenance, pages 458–468, 2006.
