2001 IEEE International Conference on Software Maintenance (ICSM'01)

Preview:

DESCRIPTION

Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements. 2001 IEEE International Conference on Software Maintenance (ICSM'01). Brian S. Mitchell & Spiros Mancoridis Math & Computer Science, Drexel University. Motivation. - PowerPoint PPT Presentation

Citation preview

1

Comparing the Decompositions Produced by Software Clustering

Algorithms using Similarity Measurements

2001 IEEE International Conference on Software Maintenance (ICSM'01).

Brian S. Mitchell & Spiros MancoridisMath & Computer Science, Drexel University

2Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Motivation

Using module dependencies when determining the similarity between two decompositionsis a good idea…

3Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (1)Given the structure of a system…

4Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (2)

The goal is to partition the system structure graphinto clusters…

The clusters shouldrepresent the

subsystems

5Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (3)

But how do we know that the clustering result is good?

6Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Ways to Evaluate Software Clustering Results…

Assess it against a mental modelAssess it against a benchmark standardTechniques: Subjective Opinions Similarity Measurements

Given a software clustering result, we can:

7Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Green Edges:Similarity still the same…Red Edges:Not as similar…Conclusions:Once we add the red edges the similarity between PA and PB decreases

PA

PB

Blue Edges:Similarity still the same…

8Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Observations

Edges are important for determining the similarity between decompositionsExisting measurements don’t consider edges: Precision / Recall (similarity) MoJo (distance)

Our idea: Use the edges to determine similarity

9Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Research ObjectivesCreate new similarity measurements that use dependencies (edges) EdgeSim (similarity) MeCl (distance)

Evaluate the new similarity measurements against MoJo & Precision/RecallUse similarity measurements to support evaluation of software clustering results (see our WCRE’01 paper)

10Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Add Green Edges:PR, MoJo, MeCl & EdgeSim unchanged.

Add Red Edges:PR, MoJo unchanged. EdgeSim, MeCl reduced.

PA

PB

Add Blue Edges:PR, MoJo, MeCl & EdgeSim unchanged.

11Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Definitions

M1

M2

M3

M4

Internal/Intra-Edge: Edge within a cluster

External/Inter-Edge: Edge between two clusters

12Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

13Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Step 1:Find Common

Inter- and Intra-Edges

14Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

CommonEdge Weight

Total EdgeWeight

=10

19= 53%

15Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

16Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

17Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c

A1,1

A1 A3A2

B1 B2

U

18Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

A1,1 A2,1

A1 A3A2

B1 B2

U

19Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e

A1,1 A2,1

A1,2

A1 A3A2

B1 B2

U

20Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

A1,1 A2,1

A1,2 A2,2

A1 A3A2

B1 B2

U

21Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A3 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

U

22Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

B1

B2

MeCl Example(AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

23Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

B1

B2

Newly Introduced Inter-Edges

24Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

25Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

A1 A2

A3

MeCl Example(BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

26Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

A1 A2

A3

Newly Introduced Inter-Edges

27Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Calculation

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Inter-Edges Introduced

MeCl(AB):({b,e},{e,c},{g,h},{f,h})

MeCl(BA):({e,i},{h,j},{b,f},{c,f},{h,e})

MeCl= 1 -maxW(MAB, MBA)Total Edge Weight

MeCl = 1 -5

19 = 73.7%

28Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

MoJo(P1) = MoJo(P2) = 87.5%

PR(P1) = PR(P2) = P:84.6%, R:68.7%, AVGPR=76.7%

Conclusion… P1 is equally similar to P2

29Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

EdgeSim(P1)=77.8% EdgeSim(P2)=58.3%

MeCl(P1)=88.9% MeCl(P2)=66.7%

Conclusion… P1 is more similar than P2

30Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Summary:EdgeSim & MeCl

EdgeSim: Rewards clustering algorithms for

preserving the edge types Penalizes clustering algorithms for

changing the edge types

MeCl: Rewards the clustering algorithm for

creating cohesive “subclusters”

31Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Omnipresent Modules:“Strong” Connection to other ModulesLibrary Modules:Always used by other modules, never use other modules

Isomorphic Modules:Modules equally connected to other subsystems

32Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i

k la b

c f g

h

c

f

g

a

b

h

k

l

i

PA

PB

A1 A3A2

B1 B2

k

k

Omnipresent Modules:RemovedLibrary Modules:Removed

Isomorphic Modules:Replicated

Special Treatment of Special Treatment of Special Modules helps to Special Modules helps to determine the Similaritydetermine the Similarity

d

d

33Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study Overview

Clustering Algorithms

Similarity Evaluation Tool

Similarity Analysis

Precision/Recall

Source Codevoid main(){ printf(“hello”);}

Clustered Result

M1

M2

M3

M5M4

M6

M7 M8

MoJo

EdgeSim

MeClAverage, Variance, etc. based on 100 clustering runs…(4950 Evaluations)

34Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study ObservationsAll similarity measurements exhibit consistent behavior for the systems studied

Removal of “special” modules improved all similarity measurements Treating isomorphic modules specially only improved similarity slightlyEdgeSim and MeCl produced higher and less variable similarity values then Precision/Recall and MoJo

For all systems examined:If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)

35Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Questions

Special Thanks To: AT&T Research Sun Microsystems DARPA NSF US Army

Recommended