Software metrics have many uses, e.g., defect prediction, effort estimation, and benchmarking an organization against peers and industry standards. In all these cases, metrics may depend on the context, such as the programming language. Here we investigate whether the distributions of commonly used metrics do, in fact, vary with six context factors: application domain, programming language, age, lifespan, the number of changes, and the number of downloads. For this preliminary study we select 320 nontrivial software systems from SourceForge. These software systems are randomly sampled from nine popular application domains of SourceForge. For each software system we calculate 39 metrics commonly used to assess software maintainability, and we use the Kruskal-Wallis test and the Mann-Whitney U test to determine whether there are significant differences among the distributions with respect to each of the six context factors. We use Cliff’s delta to measure the magnitude of the differences and find that all six context factors affect the distribution of 20 metrics and that the programming language factor affects 35 metrics. We also briefly discuss how each context factor may affect the distribution of metric values. We expect our results to help software benchmarking and other software engineering methods that rely on these commonly used metrics to be tailored to a particular context.
How does context affect the distribution of software maintainability metrics?
Feng Zhang, Audris Mockus, Ying Zou, Foutse Khomh, and Ahmed E. Hassan
2
Software Metrics
Numerous Software
3
Various Usage of Software Metrics
4
Contexts!
Motivation
5
In the Software Engineering Area?
6
What are the Contexts of Software?
Age (AG)
Number of Changes (NC)
Life Span (LS)
Number of Downloads (ND)
Application Domain (AD)
Programming Language (PL)
7
39 Software Maintainability Metrics
Complexity (14 metrics)
Abstraction (5 metrics)
Coupling (8 metrics)
Cohesion (4 metrics)
Encapsulation (4 metrics)
Documentation (4 metrics)
8
Data Collection
56,833
824
9
Data Cleaning
Systems remaining after each cleaning step: 824, 618, 506, 478, 390, 320
10
Data Description
320 Software Systems

Application Domain            Systems
Build Tools                   31
Code Generators               26
Communications                23
Framework                     29
Games / Entertainment         49
Internet                      19
Network                       16
Software Development          41
System Administrator          29
Build & CodeGen               14
Comm & Internet               13
Comm & Network                7
Games & Internet              7
Internet & SW Dev             9
SW Dev & Sys Admin            7

Programming Language          Systems
C                             57
C++                           85
C#                            18
Java                          146
Pascal                        14
11
Research Questions
12
Separately
RQ1. Analysis Methods
13
RQ1. Analysis Methods (cont'd)
For example, with programming language as the context factor:
[Figure: for each metric (Metric 1 ... Metric n), one Kruskal-Wallis test compares the metric's distributions across the C, Java, Pascal, C++, and C# groups.]
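The per-metric comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names `average_ranks` and `kruskal_wallis_h` and the sample values are made up for the example, and only the tie-corrected H statistic is computed. A p-value would normally be read from a chi-square distribution with k - 1 degrees of freedom (k groups), typically through a statistics library.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical values of one metric, grouped by programming language.
metric_by_language = {
    "C": [1, 2, 3],
    "Java": [4, 5, 6],
    "Pascal": [7, 8, 9],
}
h_stat = kruskal_wallis_h(list(metric_by_language.values()))
```

Running one such test per metric and per context factor gives the "is this metric affected by this factor at all?" answer that RQ1 asks for.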
14
Complexity (8/14 metrics)
Abstraction (1/5 metrics)
Coupling (5/8 metrics)
Cohesion (2/4 metrics)
Encapsulation (1/4 metrics)
Documentation (3/4 metrics)
YES!! the Contexts Matter!
51% of metrics (20 of 39) are impacted by all six contexts
15
And among the six contexts, each single context impacts at least 72% of metrics
16
Does it mean ALL six contexts
should be considered all the time?
17
Research Question 2
18
RQ2. Analysis Methods
19
RQ2. Analysis Methods (cont'd)
[Figure: for each metric i, a Mann-Whitney U test compares the metric's distributions between each pair of groups among C, Java, Pascal, C++, and C#.]
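The pairwise step can be illustrated as follows. This is a sketch under stated assumptions: `mann_whitney_u` and `pairwise_u` are hypothetical helper names, the sample values are invented, and only the U statistics are computed (significance testing and any correction for multiple comparisons are left out).

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic of x against y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def pairwise_u(groups):
    """Compute (U_a, U_b) for every unordered pair of named groups."""
    results = {}
    for a, b in combinations(sorted(groups), 2):
        u_a = mann_whitney_u(groups[a], groups[b])
        u_b = len(groups[a]) * len(groups[b]) - u_a  # the two U values sum to m*n
        results[(a, b)] = (u_a, u_b)
    return results


# Hypothetical values of one metric for the five language groups.
metric_i = {
    "C": [1, 2, 3],
    "C++": [2, 3, 4],
    "C#": [3, 4, 5],
    "Java": [4, 5, 6],
    "Pascal": [5, 6, 7],
}
pairs = pairwise_u(metric_i)
```

With five language groups this yields ten pairwise comparisons per metric, which is why an effect-size measure is useful for deciding which differences are large enough to matter.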
20
RQ2. Analysis Methods (cont'd)
Cohen’s standard      Small     Medium    Large
Cohen’s d             0.20      0.50      0.80
% of non-overlap      14.7%     33.0%     47.4%
Cliff’s delta         0.147     0.330     0.474
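A minimal sketch of how Cliff’s delta and the thresholds above might be applied. The function names are illustrative, and the "negligible" label for values below 0.147 is an assumption: the slide lists only the small, medium, and large cut-offs.

```python
def cliffs_delta(x, y):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (len(x) * len(y))."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta):
    """Map |delta| to a label using the 0.147 / 0.330 / 0.474 thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"  # assumed label; not shown on the slide
    if d < 0.330:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical metric values for two groups of systems.
java_values = [4, 5, 6]
c_values = [1, 2, 3]
delta = cliffs_delta(java_values, c_values)
label = magnitude(delta)
```

Cliff’s delta is directly related to the Mann-Whitney U statistic: with U counting pairs where x exceeds y (ties as 0.5), delta equals 2U/(mn) - 1, so the two steps of the analysis share the same pair counts.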
21
RQ2. Findings for each Category of Metrics
22
Metric AD PL AG LS NC ND
TLOC - - - - -
TNF - - - - -
TNC - - -
TNM - - -
TNS - - - - - -
CLOC - - - - - -
NOM - - - - - -
NIM - - - - - -
NIV - - - - - -
WMC - - - - - -
NMP - - - - - -
CC - - - - - -
NPATH - - - - - -
MNL - - - - - -
Contexts Impacting ‘Complexity’
AD: Application Domain
PL: Programming Language
NC: Number of Changes
23
Metric AD PL AG LS NC ND
CF - - - - -
CBO - - - -
ICP - - - - - -
MPC - - - - - -
RFC - - - -
NMI - - - - -
FANIN - - - - - -
FANOUT - - - - - -
Contexts Impacting ‘Coupling’
AD: Application Domain
PL: Programming Language
NC: Number of Changes
24
Metric AD PL AG LS NC ND
LCOM - - - - -
TCC - - - - - -
LCC - - - - - -
ICH - - - - - -
Contexts Impacting ‘Cohesion’
AD: Application Domain
25
Metric AD PL AG LS NC ND
NACI - - - - -
MIF - - - - -
IFANIN - - - -
NOC - - - - - -
DIT - - - - -
Contexts Impacting ‘Abstraction’
AD: Application Domain
PL: Programming Language
26
Metric AD PL AG LS NC ND
RPA - - - - -
RPM - - - - - -
RSA - - - - - -
RSM - - - - -
Contexts Impacting ‘Encapsulation’
AD: Application Domain
27
Metric AD PL AG LS NC ND
CLC - - - - - -
RCCC - - - -
CLM - - - - - -
RCCM - - - - - -
Contexts Impacting ‘Documentation’
AD: Application Domain
PL: Programming Language
28
Summary of RQ2 Findings
Metric Category AD PL AG LS NC ND
Complexity - - -
Coupling - - -
Cohesion - - - - -
Abstraction - - - -
Encapsulation - - - - -
Documentation - - - -
AD: Application Domain
PL: Programming Language
NC: Number of Changes
29
Metric Category    Context    Groups
Complexity         AD (2)     (Framework); and others
                   PL (3)     (C); (Pascal); and others
                   NC (3)     (low NC); (moderate NC); and (high NC)
Coupling           AD (3)     (Communication, Network); (Build Tools, Code Generators); and others
                   PL (3)     (Pascal); (Java); and others
                   NC (3)     (low NC); (moderate NC); and (high NC)
Cohesion           AD (2)     (Communication, Network); and others
Abstraction        AD (4)     (Communication, Network); (Games); (Build Tools, Code Generators); and others
                   PL (3)     (Java); (C++); and others
Encapsulation      AD (3)     (Build Tools); (Communication, Network); and others
Documentation      AD (2)     (Build Tools, Code Generators); and others
                   PL (2)     (Java); and others
Guidelines for Benchmarking Maintainability Metrics
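One way to use the guideline table is as a lookup that tells which peer group a system should be benchmarked against for a given metric category. The sketch below is only an illustration: `GUIDELINE_GROUPS` and `peer_group` are hypothetical names, the group contents are transcribed from the table, and the NC (number of changes) groups are omitted because the slides do not give the boundaries between low, moderate, and high.

```python
# (metric category, context factor) -> explicitly named groups from the table.
# Any value not named in a listed row falls into the "others" group.
GUIDELINE_GROUPS = {
    ("Complexity", "AD"): [("Framework",)],
    ("Complexity", "PL"): [("C",), ("Pascal",)],
    ("Coupling", "AD"): [("Communication", "Network"),
                         ("Build Tools", "Code Generators")],
    ("Coupling", "PL"): [("Pascal",), ("Java",)],
    ("Cohesion", "AD"): [("Communication", "Network")],
    ("Abstraction", "AD"): [("Communication", "Network"), ("Games",),
                            ("Build Tools", "Code Generators")],
    ("Abstraction", "PL"): [("Java",), ("C++",)],
    ("Encapsulation", "AD"): [("Build Tools",), ("Communication", "Network")],
    ("Documentation", "AD"): [("Build Tools", "Code Generators")],
    ("Documentation", "PL"): [("Java",)],
}


def peer_group(category, factor, value):
    """Return the benchmarking group for a system's context value.

    Returns None when the table does not list the factor for this category,
    i.e. the factor is not used to split systems for that category.
    """
    groups = GUIDELINE_GROUPS.get((category, factor))
    if groups is None:
        return None
    for group in groups:
        if value in group:
            return group
    return ("others",)


coupling_group = peer_group("Coupling", "AD", "Network")
complexity_group = peer_group("Complexity", "PL", "Java")
cohesion_group = peer_group("Cohesion", "PL", "Java")
```

Under this reading, a Java system's complexity metrics would be compared against the "others" group, while its coupling metrics would be compared only against other Java systems.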
30