View
69
Download
0
Embed Size (px)
Citation preview
Authors:
Edison Klafke Fillus
Silvia Regina Vergilio
Software Engineering Research Group
UFPR – Federal University of Paraná
LAWASP
2012
Aims to discover potential aspect candidates inObject Oriented (OO) software.
Searches for crosscutting concern symptoms from:◦ Source code; or
◦ Run-time behavior
4
Generally use data mining and data analysis:◦ Fan-In
◦ Formal Concept Analysis
◦ Clone Detection
◦ Pattern Matching
◦ Natural Language Processing
◦ Clustering Analisys
◦ Etc.
5
Discovers potential aspect candidates fromthe searched data without requiring priorknowledge of the system characteristics.
Identifies groups of methods or statementsrelated to the crosscutting concerns.
Composed by two main parts:◦ Clustering Algorithm: Defines grouping strategy.
◦ Distance Measure: Measure the distance betweenelements based on the selected characteristics.
6
Use different distance measures based oncode scattering and tangling symptoms.
Based on a single symptom a time.
No support neither to core concerns removalnor to fast core concerns separation frompotential crosscutting concerns.
Discover advices, don’t support pointcutidentification.
7
Introduces an integrated process to discoveraspect candidates and to identify pointcuts.
Input: OOP source code.
Output:◦ i) the groups obtained with clustering analysis
ranked by crosscutting concern potential, and
◦ ii) the pointcut candidates associated to eachcrosscutting concern.
9
10
+Support some Core ConcernRemoval with Fan-In Analisys
+ Support combinedsymptoms with DSOND
Included
+ SupportPointcut part
+ Support FastCCC prioritization
Included
DSOND (Combined) – It is an aggregatedfunction of three distance measures:◦ DS - Scattering
◦ DO - Operation Cloning
◦ DN - Name convention
Combines the measures by weight attributionto each symptom.
11
DS (Scattering) – Groups methods that arefrequently invoked by the same classes andmethods.
It is an enhancement of DCCC, addingpolymorphism treatment.
Polymorphism is given by treating all thepolymorphic methods as a group, counting allthe invocations to methods that overrides andrefines them.
12
DO (Operation Cloning) – Groups methods withsimilar or cloned operations.
Identifies code cloning by considering only theinvocations to methods and classes (Fan-out).
13
int i = 0;
while (i <=20)
{
i++;
(...)
DN (Name Convention) – Groups methods withsimilar name conventions.
Considers the method name, class name,parameter types and return type.
Uses the edit distance to measure the stringdistance (quantity of inserts, changes, deletes):◦ Levenstein Distance:
d(figureChanged, figureWillChange)=11/16 = 68,7%
14
Defines a score to each group found in theclustering step such that the elements ofgroups with higher score are more likely tobe crosscutting concerns than elements ofother ones with lower score.
K123 120
K232 104
K083 98
CrosscuttingPotencial
Group Scattering (GSRank): Scores based onthe number of distinct methods and classesthat invoke the methods belonging to agroup.
Group Fan-in (GFRank): Scores based on thesum of the fan-in associated to the methodsbelonging to a group.
16
m1()Fan-In: 2
m2()Fan-In: 2
GFRank = 4GSRank = 3
Interconnectivity: Computes how related themethods of the group are. If a group has lowinterconnectivity, so its score is reduced bythe factor.
GSIRank: GSRank with InterconnectivityReduction Factor.
GFIRank: GFRank with InterconnectivityReduction Factor.
17
m1()Fan-In: 2
m2()Fan-In: 2
GFIRank = 2GSIRank = 1,5
DS=0,5
Pointcut Identification Method based onAssociation Rule Mining on Source Code.◦ Item Set = Methods
◦ Transaction Set = Invocations.
The analysis is based on adjunct operationinvocations, when the consequent occursright after or right before the antecedent,considering the invocation sequence order.
18
Item Set
Transaction Set
a b c d
Before Execution
After Execution
Before Call
After Call PROCEEDEDb c
PROCEEDED ENDd
ANTECEDED STARTa
19
ANTECEDEDc b
T1 = {a,b,c,d}T2 = {a,p,b,c,q,d}
PC A Before Execution [T1,T2]
T1 = {a,b,c,d}T2 = {a,p,b,c,q,d} PC D After Execution [T1,T2]
T1 = {a,b,c,d}T2 = {a,p,b,c,q,d}T3 = {q,w,b,e}
PC B Before Call C
PC C After Call BT1 = {a,b,c,d}T2 = {a,p,b,c,q,d}T3 = {q,w,c,e}
* Rules found in format: IF Core THEN Crosscutting
20
T1 = {t,a,b,a,q}T2 = {x,a,b,a,w}
PROCEEDEDb a
ANTECEDEDb a
Around Execution
T1 = {t,a,b,a,q}T2 = {x,a,b,a,w}T3 = {e,a,y,a,o} PC A Around Execution B
= =
22
JHotDraw v5.4b1
• Graphical Editor
HSQLDB v1.8.0.2
• Database Server
Tomcat v5.5.17
• Java Web Server
Were combined:◦ Clustering Algorithms: k-medoids, Hierarchical and
Chameleon.◦ Distance Measures: DSOND, DS, DO, DN, DCCC e DHBZH.◦ 18 instances.
Parameter calibration to each combinationalgorithm X measure X system.
Evaluation Indexes: DISP and DIV. Based onthe principle that an optimal partition is thatwhere each group represents a singlecrosscutting concern, and, each crosscuttingconcern is represented by a single group.
23
24
Algorithm Measure JHotDraw Tomcat HSQLDB
7 Hierarchical DCCC 1.1650 0.6367 0.2416
8 Hierarchical DHBZH 0.8947 0.4413 0.2719
9 Hierarchical DS 0.3428 0.1734 0.1835
10 Hierarchical DO 0.9892 0.4672 0.3095
11 Hierarchical DN 0.5833 0.5763 0.5272
12 Hierarchical DSOND 0.1071 0.1272 0.1690
13 Chameleon DCCC 1.1368 0.5304 0.4325
14 Chameleon DHBZH 1.0378 0.7860 0.3916
15 Chameleon DS 0.3585 0.2687 0.4054
16 Chameleon DO 1.1750 0.6226 0.5607
17 Chameleon DN 0.3091 0.3615 0.4305
18 Chameleon DSOND 0.2211 0.2233 0.1690
*DIF
The combination DSOND with the Hierarchicalalgorithm obtained the best results for allsystems. 91% efficacy.
The best results obtained by the clusteringalgorithms were with DSOND.
The best results were obtained by hierarchicalalgorithms.
Combinations Hierarchical and DS, andCHAMELEON and DN also obtained goodresults.
25
Ranking Measures:◦ GSRank◦ GSIRank◦ GFRank◦ GFIRank
Ranking applied to the best instance obtainedin clustering step in each system. (I12)
Evaluation Index: RANK - Based on theprinciple that in an optimal ranking the ngroups with higher score belongs to acrosscutting concern set composed by ninstances
26
27
Measure JHotDraw Tomcat HSQLDB
GSRank 0.9013 0.9647 0.7444
GSIRank 0.8682 0.9678 0.7457
GFRank 0.9114 0.8682 0.7299
GFIRank 0.9037 0.9517 0.7552
Any measure was absolutely better thanothers, considering the three cases.
The most notably measures are GSIRank andGFIRank, measures that use theinterconnectivity as a reduction factor.
Only ARPIM.
ARPIM applied to the best instance obtainedin clustering step in each system. (I12)
Evaluation Indexes: COV and USE - Based onthe principle that an optimal pointcutidentification occurs when at least onepointcut was identified for each crosscutting,and that each pointcut is useful on therefactoring.
28
29
System COV USE
JHotDraw 0.7647 0.5263
Tomcat 0.4054 N/A
HSQLDB 0.4000 0.6666
The minimal coverage was 40%, achieving 76% inJHotdraw.
More than 52% of the identified pointcuts wererefactored on the aspect oriented version of thesystems, achieving 66% of usefulness onHSQLDB.
The CAAMPI process allows the analyst toprioritize the aspect candidates analysis byusing the ranking information, and providesthe advice and pointcut information about theaspect candidates.
Clustering: The introduced measure DSOND isthe best distance measure among theevaluated ones, independently of system andclustering algorithm, with 91% of efficacy.
Ranking: Measures proposed make feasible theaspect candidates prioritization by crosscuttingpotential.
31
Pointcut: ARPIM method enables theidentification of the pointcuts with more 40%of coverage and 52% of usefulness, whencompared with the aspectized version of thesystems.
32
Filter step can be improved, by using heuristics or patterns catalog, to automatically eliminate utility and accessor methods that do not follow the get and set conventions.
Multi-objective optimization algorithms should be explored to eliminate the weights calibration task.
To explore ways to classify the type ofcrosscutting concern represented by thecandidate, such as persistence, logging, etc.◦ Patterns recognition, natural language processing, etc.
33
Edison Klafke Fillus◦ Software Engineering Research Group◦ Informatics Department◦ UFPR - Federal University of Paraná◦ [email protected]
34