Explaining why methods change together

Angela Lozano, Carlos Noguera, Viviane Jonckers Vrije Universiteit Brussel, Belgium

Why co-changes?Reveal hidden dependencies [Gall, Hajek, Jazayeri ICSM 1998]

Identify restructuring candidates [Gall, Hajek, Jazayeri ICSM 1998, Girba, Ducasse, Lanza ICSM 2004]

Predict change propagation [Hassan & Holt ICSM 2004]

Validate the completeness of a change [Zimmermann, Weisgerber, Diehl, Zeller ICSE 2004]

Support some change tasks [Robillard, Dagenais JSME 2010]

Explaining co-changes

Find out the reason of the co-change

• Out-perform co-changes

• Useful for impact analysis of new entities

Explaining co-changes

Find out the reason of the co-change

• Out-perform co-changes

• Useful for impact analysis of new entities

Reason: common properties (structural/semantic) of co-changing methods

For instance*

CALLS_METHOD_NAME:goodsType, LOCAL_VARIABLE_DECLARATION_NAME:good, !METHOD_JAVADOC_MENTIONS:manufactured, METHOD_JAVADOC_MENTIONS:material, ! !METHOD_JAVADOC_MENTIONS:raw, METHOD_JAVADOC_MENTIONS:type

net.sf.freecol.common.model.Goods.getRawMaterial(int) Inet.sf.freecol.common.model.Goods.getManufactoredGoods(int) I

"Fixes a bug in getRawMaterial and getManufactoredGoods."

* The typos you see in the examples come from the data collected

Commit transaction

Cluster

Time (commit transactions)

) m1 Im2 I I I I I I Im3 I I I I I I Im4 I I I I I I Im5 I I I I I I Im6 I I I I I I I Im7 I I I I I I I I Im8 I I I I I Im9 I I I I I I Im0 I I

Identifying co-change relations

!!RETURN_TYPE: !

METHOD_PARAM_TYPE: !DECLARING_TYPE: !

DECLARING_TYPE_EXTENDS: !

DECLARING_TYPE_IMPLEMENTS: !

LOCAL_VARIABLE_DECLARATION_TYPE:

Structural properties (a.k.a. Syntactic / Explicit)

Semantic properties (a.k.a. Lexical / Implicit)

!!METHOD_NAME: !

CALLS_METHOD_NAME: !

LOCAL_VARIABLE_DECLARATION_NAME: !

METHOD_PARAM_NAME: !

METHOD_JAVADOC_MENTIONS: !

Are these good reasons?

Coverage: Describes most co-changes !

Idiosyncrasy: Describes only co-changing methods

What is a good reason?

defined above, we describe the methods on this committransaction using 93 properties (50 for the first one,and 43 for the second one). Since the methods shared 9characteristics, the reason for the commit transaction becomes:CALLS_METHOD_NAME: add, equals, get, getLength, size,toString; LOCAL_VARIABLE_DECLARATION_NAME:i, task; and LOCAL_VARIABLE_DECLARATION_TYPE:GanttTask.

B. Some reasons are better than others

With this set of properties as possible explanations for co-changes, we aim at answering five research questions:

RQ1: To what extent is it possible to automatically find areason for a set of co-changing methods? We analyze thisquestion by comparing the coverage of the reasons found:

Coverage: This relates to the number of commit-transactions that have non-empty reasons. We define two typesof coverage.

Coverage per commit, CovC as the ratio of commits witha non-empty CR reason to total number of commits in thesystem’s history Cs. And coverage per methods, CovM as theratio of methods with a non-empty MR reason† to total numberof methods in the system’s history Ms.

CovC = CRCs

, CovM = MRMs

Given that we eliminate commits in which only one methodchanges, and that commits in which many methods change areunlikely to have a single reason, the coverage of our approachwill be low.

RQ2: To what extent the automatically detected reasonsdescribe only the set of co-changing methods? We analyzethis question by assessing the discriminating power of reasons.

Idiosyncrasy: Good reasons will contain properties thattend to occur only in methods that change together. If theproperties found in a reason are also found in methods thatdid not change together, then those properties are likely foundby a coincidence and do not represent an explanation for thechange.

Therefore, we measure the idiosyncrasy Idios(RM ) of areason RM as one minus the ratio between the set of methodsthat are described by RM by coincidence (i.e., the number ofmethods that have properties in common with RM but thatdo not belong to the methods it describes –M–) and the totalnumber of methods in the system’s history Ms.

Idios(RM ) = 1− | ∪m∈Ms RM ⊂ Dm|− |M ||Ms|

For example the idiosyncrasy for the example commit is: 1-((5‡ - 2§)/(14895¶) ) = 1- 0.0002 = 0.9998.

†Methods modified in at least one commit with non-empty reasons.‡Methods whose description includes all the properties of the example

commit: GanttXMLSaver.writeTask(OutputStreamWriter,int,String)GanttXMLOpen.DefaultTagHandler.startElement(String,String,String,Attributes)GanttXMLOpen.GanttXMLParser.startElement(String,String,String,Attributes)GanttXMLSaver.writeTask(OutputStreamWriter,DefaultMutableTreeNode,String)GanttXMLSaver.writeTask(Writer,DefaultMutableTreeNode,String)

§Number of methods modified in the example commit see section II-A.¶Methods belonging to GanttProject during the period analyzed (table III)

RQ3: To what extent the automatically detected reasons forco-changing methods overlap with each other? We analyzethis question by measuring the uniqueness of reasons.

Uniqueness: It is also important to know whether rea-sons are sufficiently different between each other to serveas explanations only for the changes they describe. Thus, wemeasure the similarity Sim(R1, R2) between two reasons astheir Jaccard index (i.e., the intersection over the union of theirproperties.).

The uniqueness of a reason Ri is the mean difference tothe rest of reasons found in the project (i.e., R).

Unq(Ri) = 1− x̃(!

Rj∈R∧i ̸=j

Sim(Ri, Rj))

Uniqueness tells us if different co-change relations are likelyto be due to different reasons. Lets suppose that there are threemethods (m1, m2, and m3) but there are only two co-changeclusters (m1 and m2, m2 and m3). Even though m2 co-changeswith m1 and m3, it is likely that the reasons for co-changingwith m2, are different from the reasons for co-changing withm3. Therefore we expect the reasons to be unique. For examplethe uniqueness for the example commit is∥ 0.985333.

RQ4: To what extent the automatically detected reasonsfor sets of co-changing methods are sound? We analyzethis question by manually checking their plausibility.

Plausibility: Even if reasons explain a large number ofmethods (high CovM ), covering most of the history of theapplication (high CovC), being sufficiently discriminatory(high Idios, and high Unq), they might still make no sense.Therefore, we need to perform a manual assessment of thereasons found to see whether they are plausible. This isachieved by comparing the reasons for a (set) of commitsproduced by our analysis with the message that corresponds tothe (set of) commits. We assert each reason as either plausibleor not. We consider a reason a plausible explanation for acommit, if the words or terms mentioned in the properties ofthe reason appear in the commit message that accompanies theco-change. For groups of co-changes, for those that span morethan 6 co-changes, we extract a word-cloud from the commitmessages to provide an overview of the general terms therein.We apply a small degree of liberty when aligning the termsof the reason with those present in the commit message(s);for example if the commit message mentions UI, or mouseevents, we will consider plausible reasons those that includedependencies to JPanel or ActionListener types (both typespresent in Java’s SWING UI framework). Finally, note that theplausibility depends on the quality of the commit message, ifthe commit message provides no information (i.e., “bugfix” or“no message”) we will consider the reason as an implausibleexplanation for the commit regardless of the properties foundin the reason itself.

The example commit is considered plausible because thecommit message and the reason refer to an I/O task∗∗.

∥The example commit had a non-empty intersection with 128 commits(from the 626 commits in GanttProject with a non-empty reason). Moreover,most of the intersections had only one property. Therefore, its similarity withother commits is very low (Min: 0, 1st Qu.:0, Median:0, Mean:0.018, 3rdQu.:0, Max:0.3).

∗∗See the description of the example in section II-A.

CovC = CRCs

, CovM = MRMs

Unq(Ri) = 1− x̃(!

Rj∈R∧i ̸=j

Sim(Ri, Rj))

Uniqueness: Differs from other reasons !

Plausibility: Makes sense !

What is a good reason?

CovC = CRCs

, CovM = MRMs

Unq(Ri) = 1− x̃(!

Rj∈R∧i ̸=j

Sim(Ri, Rj))

20% random sample manually compared to commit message

Empirical study

CVS repository

Freecol GanttProject

Empirical study

commit transactions

CVS repository

1.087 2.701

Empirical study

commit transactions properties

semantic structuralCVS repository

1.087 2.701

478.312 in 4.099 methods 547.394 in 14.895 methods

Empirical study

commit transactions properties

semantic structural

CmBoth CmSm CmSt

CVS repository

1.087 2.701

Reasons

Empirical study

commit transactions

clusters

propertiessemantic structural

CmBoth CmSm CmSt

CVS repository

1.087 2.701

14 280

Reasons

Empirical study

commit transactions

clusters

propertiessemantic structural

ClBoth ClSm ClSt

CmBoth CmSm CmSt

CVS repository

1.087 2.701

14 280

Reasons

Results!

Coverage (bad. lower for clusters.) Idiosyncrasy (good. lower for clusters.)

• Which properties are better? • Both prop. >> Structural prop. only • Both prop. > Semantic prop. only

Coverage (bad. lower for clusters.) Idiosyncrasy (good. lower for clusters.)

• Which properties are better? • Both prop. >> Structural prop. only • Both prop. > Semantic prop. only

The reasons tend to be unique (usual similarity < 5% & 10%, worst case <10% & 20%).

0.90 0.94 0.98Freecol

0.85 0.95GanttProject

Uniqueness (good. cluster ? commit) !!!

0.90 0.94 0.98Freecol

!• Commits:

• Both prop. is better. Structural ≃ Semantic • Clusters:

• Semantic -> Both -> Structural

0.90 0.94 0.98Freecol

!• Commits:

0.90 0.94 0.98Freecol

!• Commits:

0.90 0.94 0.98Freecol

!• Commits:

Are these good reasons?Plausibility (good. lower for commits)

• Best properties? (depend on the project)

example: CmSm - Freecol

Conclusion• Finding automatically the reason for co-changes IS

POSSIBLE!

• Clusters provide better plausibility

• Commits provide better coverage

• Both properties provide better reasons for commits, unclear for clusters

Explaining why methods change together

More info: Angela Lozano alozano@soft.vub.ac.be

Explaining why methods change together

Software

Entrepreneurship Methods - Sites FCT/UNLsites.fct.unl.pt/.../files/entrepreneurship_methods.pdf · Why Entrepreneurship Methods? ... Unit 2 & 9 • putting together ... Entrepreneurship

Explaining Genetics

Explaining Contrasting Solution Methods Supports Problem-Solving Flexibility and Transfer Bethany Rittle-Johnson Vanderbilt University Jon Star Michigan

Agile Methods (Extreme Programming) A Framework For Pulling It All Together

Prescriptive and risk based SIL allocation methods used ... · Working together for a safer world Prescriptive and risk based SIL allocation methods used together Handling of potential

Methods for Self Improvement Together

Rdg Document Template - uniquespeakerbureau.com€¦ · Web viewTo apply, please submit a completed application form, together with a 600-700 word motivational letter, explaining

Explaining stroke

Tutorial on Computational Linguistic Phylogenylbbe.univ-lyon1.fr/projets/tannier/ENSL/warnow_linguistics.pdf · estimation methods are evaluated. Explaining this methodology, and

Counting Crime Methods for Counting Crime? Current Crime Numbers/Trends Explaining the Crime Drop 1

Methods Better Together - trieca.com › app › uploads › 2019 › 04 › 7-130-200pm-Heather-… · Methods –Better Together . Agenda 1. Design Approach Overview 2. Idlewood

Easy Methods To Put Together The Perfect Tesla Generator Fast But Easy

Explaining Motion

Explaining Contrasting Solution Methods Supports Problem-Solving Flexibility and Transfer

EXPLAINING DIFFERENCES…EXPLAINING SIMILARITIES NATURE VS NURTURE

· Web viewChapter IV introduced Object Oriented Programming. We took small steps and started by explaining how to use class methods and object methods. In …

Explaining Punctuations

Explaining Enron

Tying together means and methods in a Methods of Construction – Concrete & Masonry curriculum A Faculty Perspective Case Study

Chapter Two: Explaining Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e