Information Theory in Software Metrics (Assessment and Issues)

Steve Counsell

(Brunel University and CREST)

Introduction

Coupling:
  Well understood
  Excessive coupling should be avoided
  Empirically, excessive coupling has been associated with fault-proneness (in C++ at least)
The Coupling Between Objects (CBO) metric of Chidamber and Kemerer has dominated the area
  A simple count of the number of unique classes to which any single class is coupled (in whatever way)
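As a minimal sketch of the CBO count described above, assume the coupling relations are already available as a mapping from each class to the classes it uses. The class names and the `cbo` helper are hypothetical, and coupling is counted in both directions here, which is one reading of "in whatever way".

```python
def cbo(uses: dict[str, set[str]]) -> dict[str, int]:
    """Count, for each class, the unique other classes it is coupled to."""
    coupled: dict[str, set[str]] = {}
    for cls, targets in uses.items():
        coupled.setdefault(cls, set())
        for target in targets:
            if target == cls:
                continue  # a class is not coupled to itself
            # Record the relation on both ends (assumed bidirectional).
            coupled[cls].add(target)
            coupled.setdefault(target, set()).add(cls)
    return {cls: len(others) for cls, others in coupled.items()}


# Hypothetical example system.
uses = {
    "Order": {"Customer", "Invoice"},
    "Invoice": {"Customer", "Order"},
    "Report": {"Order"},
    "Customer": set(),
}
cbo_values = cbo(uses)
print(cbo_values)
```

Because only unique classes are counted, several references from one class to another contribute a single unit to the count.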

Introduction (cont.)

Theoretical properties are also well understood
  The coupling of a modular system is non-negative
  Merging two modules can’t increase system coupling
Based on a modular system being composed of nodes and ‘edges’ connecting those nodes
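The two properties above can be illustrated with a deliberately simple stand-in measure, the number of edges that cross module boundaries. This is not A&K's metric, only a sketch that makes the properties concrete. The graph, the module names, and both helper functions are hypothetical.

```python
def inter_module_edges(edges, module_of):
    """Count edges whose end points lie in different modules."""
    return sum(1 for a, b in edges if module_of[a] != module_of[b])


def merge(module_of, keep, absorb):
    """Return a new assignment in which module `absorb` is folded into `keep`."""
    return {node: (keep if m == absorb else m) for node, m in module_of.items()}


# Hypothetical modular system: five nodes in three modules.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
module_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}

before = inter_module_edges(edges, module_of)
after = inter_module_edges(edges, merge(module_of, "A", "B"))
print(before, after)  # 3 2
```

Merging turns the edges between the two merged modules into intra-module edges and leaves all other edges unchanged, so the count can only stay the same or fall, and it can never go below zero.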

Information Theoretic metrics (for coupling)

Pioneered by Allen and Khoshgoftaar (A&K)
  First appeared based on Allen’s PhD work, c.1996
  METRICS paper in 1999
At the time created a bit of a stir
  Metrics community re-think
  Could be applied to both OO and procedural
  Appealed to the cross-disciplinary ethos

Roadmap

Explain A&K’s metric for system coupling
  Based on a modular system graph
Demonstrate its usefulness and drawbacks
Identify open issues
  Research paths in evaluating/modifying the metric
  Other applications

Explaining A&K’s coupling

A modular system

Source: Allen and Khoshgoftaar, 1999

Inter-module coupling

Source: Allen and Khoshgoftaar, 1999

Part I

Representation

Source: Allen and Khoshgoftaar, 1999

Entropy

The average information per node
Always non-negative
Defined as:
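A sketch of the definition, assuming the standard Shannon form taken over the distribution of node patterns (the rows of a nodes × edges table). The symbols $p_l$ (the proportion of nodes sharing the $l$-th distinct pattern) and $n_S$ (the number of distinct patterns) are notation assumed here rather than taken from the slide.

```latex
H(S) = \sum_{l=1}^{n_S} p_l \left( -\log_2 p_l \right)
```

Each term is a probability multiplied by a non-negative quantity, which is consistent with the entropy always being non-negative.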

Entropy (cont.)

All logs base 2
Unit of measure is a bit
Graph selected has entropy H(S) of 2.46
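The calculation can be sketched as follows, assuming the graph is represented as a nodes × edges table and the entropy is the base-2 Shannon entropy of the distribution of row patterns. The toy graph below is hypothetical and is not the graph from the slides, and `row_patterns` and `entropy` are illustrative helper names.

```python
from collections import Counter
from math import log2


def row_patterns(n_nodes, edges):
    """Build one 0/1 row per node: 1 where the node is an end point of the edge."""
    return [
        tuple(1 if node in edge else 0 for edge in edges)
        for node in range(n_nodes)
    ]


def entropy(rows):
    """Base-2 Shannon entropy (in bits) of the distribution of row patterns."""
    counts = Counter(rows)
    total = len(rows)
    return sum((c / total) * -log2(c / total) for c in counts.values())


# Hypothetical toy graph: nodes 0..3, with edges 1-2 and 2-3.
toy_edges = [(1, 2), (2, 3)]
h = entropy(row_patterns(4, toy_edges))
print(f"H(S) = {h:.2f} bits")  # H(S) = 2.00 bits
```

In the toy graph all four nodes have distinct patterns, so each has probability 1/4 and the entropy is log2(4) = 2 bits.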

Part II

Sub-graph analysis

Consider the subgraph S_i consisting of all the nodes in S and the edges of S that have the i-th node as an end point
  Disconnected nodes are included in the sub-graph
Calculate the same probability distribution as we did previously

For node 2

Node  Edge 1  Edge 4
0     0       0
1     1       0
2     1       1
3     0       0
4     0       0
5     0       0
6     0       0
7     0       0
8     0       0
9     0       0
10    0       0
11    0       1
12    0       0
13    0       0
14    0       0

Source: Allen and Khoshgoftaar, 1999

Entropy (for distribution of node labels)

Defined as:

Entropy (cont.)

Summing the entropies H(S_i) over all nodes (i : 0..14) gives a total value of 6.28
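The per-node step can be sketched for node 2, using the Edge 1 / Edge 4 table given earlier for that node. This assumes the same base-2 Shannon entropy over row patterns as for the whole graph. The `entropy` helper is an illustrative name, and the 6.28 total is not recomputed here because the full graph is not reproduced in the text.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Base-2 Shannon entropy (in bits) of the distribution of row patterns."""
    counts = Counter(rows)
    total = len(rows)
    return sum((c / total) * -log2(c / total) for c in counts.values())


# Rows of the node-2 table: (Edge 1, Edge 4) for nodes 0..14.
rows = [(0, 0)] * 15
rows[1] = (1, 0)   # node 1 is an end point of edge 1
rows[2] = (1, 1)   # node 2 is an end point of both edges
rows[11] = (0, 1)  # node 11 is an end point of edge 4

h_s2 = entropy(rows)
print(f"H(S_2) = {h_s2:.2f} bits")  # H(S_2) = 1.04 bits
```

Twelve of the fifteen nodes share the all-zero pattern, so most of the entropy comes from the three nodes that edges 1 and 4 touch.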

Part III

Ethos of the coupling metric

The entropy of the modular system taken as a whole is less than or equal to the sum of the entropies of the individual components
  H(S) <= sum H(S_i)
The difference between these values represents the true coupling relationships or ‘excess entropy’

Excess entropy C(S)

C(S) = 6.28 – 2.46 = 3.82

Where:

Coupling in a modular system (MS)

Coupling(MS) = (n+1) C(S)

= 15 * 3.82 = 57.28
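The arithmetic on the last two slides can be retraced as a short sketch, taking C(S) as the sum of sub-graph entropies minus the system entropy and using n = 14 for nodes 0..14. The function names are illustrative. With the rounded values shown on the slides the product comes to 57.30, so the quoted 57.28 presumably reflects unrounded intermediate values; that is an assumption.

```python
def excess_entropy(sum_subgraph_entropy, system_entropy):
    """C(S): sum of the sub-graph entropies minus the entropy of the whole graph."""
    return sum_subgraph_entropy - system_entropy


def coupling_ms(n, c_s):
    """Coupling(MS) = (n + 1) * C(S)."""
    return (n + 1) * c_s


# Rounded values quoted on the slides.
c_s = excess_entropy(6.28, 2.46)
coupling = coupling_ms(14, c_s)
print(f"C(S) = {c_s:.2f}")               # C(S) = 3.82
print(f"Coupling(MS) = {coupling:.2f}")  # Coupling(MS) = 57.30
```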

Assessment of the metric

‘A metric sensitive to patterns of connections. This is attractive, because software engineers recognize patterns as well’ (Allen and Khoshgoftaar, 1999)

Coupling(MS):
a. 2.76     f. 26.83
b. 8.00     g. 30.83
c. 16.00    h. 34.83
d. 17.32    i. 22.04
e. 24.07    j. 27.78

Source: Allen and Khoshgoftaar, 1999

Coupling (CBO):
a. 2    f. 8
b. 4    g. 10
c. 6    h. 12
d. 6    i. 8
e. 8    j. 8

Source: Allen and Khoshgoftaar, 1999

Coupling(MS):
a. 2.76     f. 26.83
b. 8.00     g. 30.83
c. 16.00    h. 34.83
d. 17.32    i. 22.04
e. 24.07    j. 27.78

Comparison with CBO

[Chart: Coupling(MS) and CBO plotted for graphs 1 to 10; vertical axis from 0 to 40]
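One way to read the comparison is to group the ten graphs by their CBO value and look at the Coupling(MS) values within each group. The sketch below uses only the values listed on the preceding slides, and the variable names are illustrative.

```python
from collections import defaultdict

# Values for graphs a..j as listed on the slides.
coupling_ms = {
    "a": 2.76, "b": 8.00, "c": 16.00, "d": 17.32, "e": 24.07,
    "f": 26.83, "g": 30.83, "h": 34.83, "i": 22.04, "j": 27.78,
}
cbo = {
    "a": 2, "b": 4, "c": 6, "d": 6, "e": 8,
    "f": 8, "g": 10, "h": 12, "i": 8, "j": 8,
}

# Group graphs that CBO cannot tell apart.
by_cbo = defaultdict(list)
for graph, value in cbo.items():
    by_cbo[value].append((graph, coupling_ms[graph]))

for value in sorted(by_cbo):
    print(value, by_cbo[value])
```

Graphs e, f, i and j all have a CBO of 8 yet their Coupling(MS) values range from 22.04 to 27.78, which is the sensitivity to patterns of connections that the quotation from A&K refers to.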

Issues

Computes system coupling
  Most coupling studies use a class coupling basis
  Need a ‘class-based’ entropy measure (NHD)
Comparison between i. and j.
  Suggests that i. is ‘better’ than j.
  OO people might disagree with an inheritance structure being ‘better’
  Maintaining the root node would be highly problematic
Do developers really look for patterns?
Does not take into account the ‘type’ of coupling
Cannot be gleaned from a UML class diagram

Potential studies

Fault analysis
  Which of the two correlates more with faults
Larger-scale study
The effect of refactoring on the values of both Coupling(MS) and CBO
Hamming distance for coupling?
A final word on cohesion…

Cohesion

A key advantage of CBO, and the reason for its popularity, is that there is no argument about its interpretation; to some extent the same holds for Coupling(MS). It is an objective measure.

The same cannot be said about cohesion, because it is subjective.

Thanks for listening
