Information and Software Technology 39 (1997) 497-509
Metrics for quality analysis and improvement of object-oriented software
Christof Ebert a,*, Ivan Morschel b
a Alcatel Telecom, Switching Systems Division, B-2000 Antwerpen, Belgium
b Daimler Benz Research Centre, Ulm, Germany
Received 13 August 1996; revised 15 November 1996; accepted 21 January 1997
Abstract
Software metrics play an important role in analysing and improving the quality of software work products during their development. Measuring aspects of software complexity for object-oriented software helps to improve the quality of such systems during their development, with a particular focus on reusability and maintainability. It is widely accepted that more widespread use of object-oriented techniques can only come about when there are tool systems that provide development support beyond visualising code. Unfortunately, many object-oriented metrics are defined and applied to classroom projects, but no evidence is given that the metrics are useful and applicable - both from an experience viewpoint and from a tools viewpoint - for industrial object-oriented development. Distinct complexity metrics are developed and integrated in a Smalltalk development support system called SmallMetric. Thus, we achieve a basis for software analysis (metrics) and development support (critique) of Smalltalk systems. The main concepts of the environment including the underlying metrics are explained, its use and operation is discussed, and some results of the implementation and its application to several industrial projects are given with examples. © 1997 Elsevier Science B.V.
Keywords: Development support; Maintainability; Metrics; Object-oriented metrics; Quality control; Smalltalk
1. Introduction
Software metrics are measures of development processes and the resulting work products. In this context we will focus on metrics that are applicable to software developed in Smalltalk. We will further concentrate on metrics that can be used as quality indicators during the development process, hence providing support for the developers. These metrics are often classified as product metrics because their inputs are products of the development process. We will not distinguish metrics and measures from a mathematical point of view. When referring to complexity metrics we use this phrase for a group of software metrics that measure structural or volume aspects of products that are intuitively related to parts that are difficult to understand. Such difficulties in dealing with complex components have been shown to cause high error rates, high testing effort and poor maintainability (for further details on metrics see [1]).

Object-oriented programming practices per se will never
make poor programmers into good ones. The opposite is the
* Corresponding author. Tel.: +32 3 240 4081; fax: +32 3 240 9935; e-mail: christof.ebert@alcatel.be
0950-5849/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved
PII S0950-5849(97)00005-0
case, especially considering current surveys which show that corporate adoption of object-oriented technology has tripled since 1994 while, at the same time, it is considered to have the lowest success rate among new technologies. Or in the words of one recent workshop summary: "Realistically, object-orientation will deliver benefits, but fewer of them, and with more effort required on the part of the organisation, than the promises of its most zealous advocates imply." Typical class libraries consist of look-alike classes that drown programmers, classes that cannot delete unwanted properties, encapsulation at the class level that severely limits reuse (i.e. hiding too many details of overly specialized classes), or static class hierarchies undermined by quickly evolving information domains. As a result most programmers rather copy a class, change it, and add a new member to the library. It is therefore necessary to provide insight during design into design decisions that improve reuse, instead of hiding valuable concepts in a flood of case-specific details.

Because of the extreme subjectivity of quality attributes per se, it is important to select metrics that can be applied to the specific objectives of a project, that have been derived from the project's requirements and can be used to prove consistency, that can be applied during several phases of the
development process on resulting products (design, code, documentation, etc.), and that can be collected and
evaluated automatically by the development environment
in use. This article introduces complexity-based classification
techniques as a basis for constructing quality models that can identify outlying software components that might cause
potential quality problems. Such models are based on the
experience that typically a rather small number of components (e.g. methods) has a high failure rate and is most difficult to test and maintain. Our own project experience recently showed, for instance, that 20% of all components in telecommunication projects were the origin of over 40% of all field failures with high priority. Even worse, we could also show that it is not difficult to identify these components ex ante - either by asking designers and testers and grouping their subjective ratings, or by applying classification rules based on simple structural software metrics [2].
Effects of applying complexity-based criticality prediction to a new project can be summarised as follows:

• 20% of all modules in the project were predicted as most critical (after coding), and
• these modules contained over 40% of all faults (up to release time).

Knowing that

• 60% of all faults can theoretically be detected before system integration, and
• fault correction during module test and code reading costs less than 10% compared to fault correction during system test,

24% of all faults can be detected early by investigating 20% of all modules more intensively with 10% of the effort compared to fault correction during system test, therefore yielding a 20% total cost reduction for fault correction. Additional costs for providing the statistical analysis are in the range of two person days per project. Necessary tools are off the shelf and account for even less per project.
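The arithmetic behind these figures can be checked with a back-of-the-envelope calculation. The sketch below is purely illustrative: it assumes 100 faults in total and a unit cost of 1.0 for fixing a fault during system test (neither number is from the paper's raw data), and reproduces a saving close to the quoted 20%.

```python
# Illustrative check of the fault-cost figures quoted above.
# Assumptions: 100 faults total, unit cost 1.0 per fault fixed
# during system test, early fixes cost "less than 10%" of that.

total_faults = 100
system_test_cost_per_fault = 1.0
early_cost_per_fault = 0.1

# 20% of modules flagged as critical contain over 40% of all faults;
# 60% of those faults are theoretically detectable before integration.
faults_in_critical_modules = 0.40 * total_faults
faults_detectable_early = 0.60 * faults_in_critical_modules  # = 24

baseline = total_faults * system_test_cost_per_fault
with_prediction = (faults_detectable_early * early_cost_per_fault
                   + (total_faults - faults_detectable_early)
                   * system_test_cost_per_fault)

saving = 1 - with_prediction / baseline
print(f"early-detectable faults: {faults_detectable_early:.0f}")
print(f"total cost saving: {saving:.1%}")
```

Under these assumptions the saving comes out at 21.6%, consistent with the roughly 20% reduction reported in the text.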
In this context the paper addresses typical questions often asked in object-oriented software engineering projects:

• How can I identify early the relatively small number of critical components that mainly contribute to bad quality identified later in the life cycle?
• Which components should be redesigned because their maintainability is poor and their overall criticality to the project's success is high?
• What is the benefit of introducing a metrics program that investigates structural properties of software?
• Are there structural properties that can be measured early in the code to predict quality attributes?
• Can I use the often heuristic design and test know-how on trouble identification and risk assessment to build up knowledge-based systems that help to identify critical components early in the development process?
• Last, but surely not least, is it feasible to automate such classification in an object-oriented development environment?
The paper is organised as follows. The introductory Section 2 presents a brief overview of background and problems associated with metric-based classification models in the context of object-oriented development. Section 3 gives a very brief overview of the basic concepts of object-orientation and of Smalltalk. We will describe a programming analysis environment for Smalltalk-80 [3], which we selected because of the language's perceived uniformity and elegance. The syntax of Smalltalk is easy to understand, it possesses a small number of operators (in contrast to C++), and it completely supports the notions of object, class, and inheritance. This article presents a basic set of metrics to support the development of object-oriented programs as well as a tool to automatically measure and judge programs written in Smalltalk. Sections 4 and 5 describe the selection of metrics for object-oriented software and a tool environment called SmallMetric. Results from applying the analysis environment in industrial projects are presented in Section 6. Conclusions and a brief summary with an outlook for further work are given in Sections 7 and 8.
2. Metrics for quality models
Although striving for high quality standards, only a few organisations apply true quality management. Quality management consists of proactively comparing observed quality with expected quality, hence minimising the effort expended on correcting the sources of defects. In order to achieve software quality, software must be developed in an organised form, using defined methods and techniques and applying them consistently. In order to achieve an indication of software quality, software must be subjected to measurement. This is accomplished through the use of metrics and statistical evaluation techniques that relate specific quantified product requirements to some attributes of quality.

The approach of integrating software measures and statistical techniques is shown in Fig. 1. The object-oriented CASE environment provides the formal description of different products developed during the software life-cycle with its defined methods and the underlying process. Multivariate statistical techniques provide feedback about relationships between components (e.g. factor analysis, cluster analysis) [2,4]. Classification techniques help to determine outliers (e.g. error-prone components) [4-6]. Finally, detailed diagrams and tables provide insight into the reasons why distinct components are potential outliers and how to improve them [1,4].
Fig. 1. Measures and statistical techniques in software engineering.
Product metrics are used to supply models for [1,2,5-7]:

• estimating effort and costs of new projects;
• evaluating the productivity of introducing new technologies (together with their methods and tools);
• measuring and improving software quality;
• forecasting and reducing testing and maintenance effort.
Quality or productivity factors to be predicted during the development of a software system are affected by many product and process attributes, e.g. software design characteristics or the underlying development process and its environment [1,5,6]. Quality models are based upon former project experiences and combine the quantification of aspects of software components with a framework of rules (e.g. limits for metrics, appropriate ranges, etc.). They are generated by the combination and statistical analysis of product metrics (e.g. complexity measures) and product or process attributes (e.g. quality characteristics, effort, etc.) [5,6]. These models are evaluated by applying and comparing exactly those invariant figures they are intended to predict, the quality metrics (e.g. maintainability, error rate, etc.). Iterative repetition of this process can refine the quality models, allowing them to be used as predictors for similar environments and projects. For assessing overall quality or productivity, it is suitable to break it down into its component factors (e.g. maintainability), thus arriving at several aspects of software that can be analysed quantitatively. Typical problems connected with data collection, analysis, and quality modelling are addressed and discussed comprehensively in [1,5,6].
Classification or clustering algorithms are mathematical tools for detecting similarities between members of a collection of objects. Classification algorithms can be loosely categorised by the underlying principle (objective function, graph-theoretical, hierarchical) or model type (deterministic, probabilistic, statistical, fuzzy). Information about the objects (e.g. software components) to be analysed is input to classification algorithms in the form of metric vectors. The elements of a metric vector are the measurements of distinct software features that have been chosen as a basis for comparing a specific object to other objects. The output of a classification or clustering algorithm can then be used to classify the objects into subsets or clusters. The classification of metric vectors can be performed with information about different classes (e.g. errors, change-rate). The training of a classification algorithm using this kind of information is called supervised. If the algorithm classifies the data autonomously, the training is unsupervised. We will further focus on supervised learning because quality metrics are provided within the training data sets.

Metric vectors assigned to the same cluster are in some sense similar to each other, more so than they are to metric vectors not assigned to that cluster. Instead of predicting the number of errors or changes (i.e. algorithmic relationships), we consider assignments to groups (e.g. 'high maintainability'). While the first goal has been achieved more or less with regression models or neural networks, predominantly for finished projects, the latter goal seems adequate for predicting potential outliers in running projects, where precision is too expensive and unnecessary for decision support.
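To make the idea of supervised classification of metric vectors concrete, the following sketch shows a deliberately minimal classifier: it assigns a component's metric vector to the quality group whose mean training vector (centroid) is nearest. This is an illustrative stand-in, not the paper's actual algorithm, and the training data and labels are invented.

```python
# Minimal nearest-centroid classifier over metric vectors
# (illustrative only; data and labels are hypothetical).
from math import dist

# Hypothetical training set: (metric vector, quality label).
# Vector elements could be e.g. (methods, variables, messages sent).
training = [
    ((5, 2, 10), "high maintainability"),
    ((6, 3, 12), "high maintainability"),
    ((40, 15, 120), "low maintainability"),
    ((55, 20, 150), "low maintainability"),
]

def centroids(samples):
    """Compute the mean metric vector of each labelled group."""
    groups = {}
    for vec, label in samples:
        groups.setdefault(label, []).append(vec)
    return {label: tuple(sum(c) / len(vecs) for c in zip(*vecs))
            for label, vecs in groups.items()}

def classify(vec, cents):
    """Assign a metric vector to the group with the nearest centroid."""
    return min(cents, key=lambda label: dist(vec, cents[label]))

cents = centroids(training)
print(classify((50, 18, 140), cents))
```

A new component with a large metric vector lands near the 'low maintainability' centroid; this kind of coarse group assignment is exactly the decision-support output the text argues for, in place of precise error-count prediction.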
Due to successful application in many projects, such metrics obviously should be available for object-oriented environments. The goals are the same as for procedural systems, primarily indicating potentially troublesome classes that should be improved before being introduced to the class libraries. The object-oriented paradigm could directly profit from metrics as a vehicle to instruct staff who are new to this approach. Furthermore, software metrics could be used to measure the problems of introducing this paradigm and its acceptance, as well as to set design standards for an organisation.

Traditional metrics for procedural approaches are not adequate for evaluating object-oriented software, primarily because they are not designed to measure basic elements like classes, objects, polymorphism, and message-passing. Even when adjusted to syntactically analyse object-oriented software, they can only capture a small part of such software and so can provide only weak quality indication [8-10]. Even dedicated research on metrics for object-oriented programs gave only rough guidelines, such as limiting the size of methods in Smalltalk to 12 lines, without any indication of how to tailor project-specific design guidelines [10]. It is hence important to define customised metrics for object-oriented programs. Additionally, the characteristics of the target language should be considered. Some languages directly support the object-oriented approach (C++, Smalltalk, Eiffel) and others only to some extent (Ada). Other factors like the size and contents of the class library and the semantics and syntactical form of particular commands should also be considered.
3. Object-orientation and Smalltalk
Object-oriented modelling and programming is based on four fundamental concepts, namely abstraction, inheritance, encapsulation within classes, and polymorphism. For a better understanding of the approaches described later, we give a brief summary of the relevant features of object-orientation with respect to the Smalltalk programming language. In this paper we use the Smalltalk terminology.

As can easily be imagined, the term 'object' plays a central role in object-oriented programs. An object encompasses data structures that describe its state, and methods that realise its functionality. Data structures are encapsulated and provide information hiding with respect to their object, which means that they offer only access functions, called methods, and no direct use of the internal data structures. Objects communicate with each other via message passing, which means that one method starts a method in another object. Mechanisms to hierarchically structure objects in classes exist in all object-oriented languages. Instances can be derived from classes and differ from other objects only on the basis of associated states.

Another important characteristic is the possibility of incrementally defining class hierarchies. This is done by the inheritance mechanism. From a superclass, a subclass inherits all its data structures and methods. In the subclass, new data structures and methods can be defined, or inherited ones can be rewritten. Smalltalk's inheritance mechanism, for example, is designed to model software evolution as well as to classify.
Smalltalk fully supports the object-oriented concepts. It manipulates classes and objects and implements a single inheritance mechanism. It does not include multiple inheritance, prototypes, delegation, or concurrency in its standard version. In addition to its programming language, Smalltalk and its current derivatives and flavours include an open programming environment for developing object-oriented programs. It offers a comfortable graphical user interface, several helpful tools and a vast class library. Programming language and environment coexist in a homogeneous form, where concepts at the programming level are reproduced in the environment. An example is the message passing mechanism. In Smalltalk programs, it means the activation of a method. The same strategy is used in the user interface to identify a selected object. This message passing consists of: an object (the receiver of the message), a message selector, and optional arguments as parameters.

Object behaviour is described by the mentioned methods, which have a selector and include Smalltalk commands. Methods with the same selector can exist in different classes; this is called polymorphism. The status of an object is captured through class and instance variables that may be accessed from outside by suitable methods. Class variables are defined at the class level and instance variables at the object level to store an object's state.
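The concepts just introduced - methods with the same selector in different classes (polymorphism), class-level versus object-level state, and inheritance - can be illustrated in a rough Python analogue. The class names and behaviour below are invented for illustration; Python terminology differs slightly from Smalltalk's, but the mechanisms correspond.

```python
# Rough Python analogue of the Smalltalk concepts described above.

class Account:
    count = 0                      # class variable: class-level state

    def __init__(self, balance):
        self.balance = balance     # instance variable: object-level state
        Account.count += 1

    def describe(self):            # a 'selector' shared across classes
        return f"account with {self.balance}"

class SavingsAccount(Account):     # subclass inherits data and methods
    def describe(self):            # same selector, different behaviour
        return f"savings account with {self.balance}"

accounts = [Account(10), SavingsAccount(20)]
for a in accounts:
    print(a.describe())            # polymorphic dispatch on the receiver
print(Account.count)
```

Sending `describe` to each element activates a different method depending on the receiver's class, which is the message-passing view of polymorphism described in the text.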
4. Metric analysis and development support for object-oriented software
Goals such as quality improvement, increasing productivity or maturity certification are of growing interest in industry. Navigating with metrics is one important approach to ensure that a company stays on course towards achieving these goals. Though the search for underlying structures and rules in a set of observations is performed in software quality control, and effective solutions to refine forecasting methods based on past data have been suggested, so far their applicability to object-oriented software development has been restricted.
There is a growing awareness that such approaches could also support the object-oriented software development process. Anybody starting with object-oriented software raises similar questions that also serve as guidelines for developing a measurement tool environment for quality control:

• What is good style in object-oriented programs?
• Are there any rules that can be applied to develop a good object-oriented program?
• Which metrics could be employed in order to determine if a program is 'good' or not?
• What contributes to the complexity of an object-oriented system?
Based on such questions there has been substantial work concerning the definition of metrics for object-oriented programs; however, most approaches have been based on number crunching after the fact, and only a few industrial environments that include complete tool support have been described. With the broader application of this paradigm, quality control, both analytic (i.e. with metrics) and constructive (i.e. by providing design and help facilities), is of increasing importance.

One of the first attempts to investigate quality aspects of object-oriented programs was made by Lieberherr and colleagues [11]. They defined a set of design rules that restricts the message-sending structure of methods, called the Law of Demeter. Informally, the law says that each method can send messages to only a limited set of objects: to argument objects, to the self pseudovariable, and to the immediate subparts of self (self being the object or class itself). The Law of Demeter thus attempts to minimise the coupling between classes.

Many applications of metrics for object-oriented programs originated from transforming metric concepts for procedural programs (e.g. message passing, calling structures, cyclomatic complexity) [7,9,12]. It is of course clear that many concepts hold for object-oriented software as well and thus might be considered as influencing parameters when measuring the achievement of objectives (e.g. volume, class tree). However, such metrics do not completely cover the relevant aspects of coupling, such as inheritance or polymorphism. Other approaches suggest metrics that go beyond such transformations and really focus on object-oriented descriptions, but do not offer any guidelines for using the metrics in practical projects [8,13]. Sharble and Cohen compare two object-oriented development methods using an object-oriented brewery as an example [14]. They suggest indicators to enhance software quality by increasing cohesion, reducing coupling, increasing polymorphism and eliminating redundancy. Many metrics have been defined and applied to a toy environment, but no evidence was given that the metrics are useful and applicable - both from an experience viewpoint and from a tools viewpoint - for practical object-oriented development.
Our approach for selecting appropriate metrics is goal-oriented, rather than following already published literature that often measured what seemed measurable. A mere definition of a metrics suite combined with statistical number crunching, without an intuitive background, would run into the same acceptance problems that procedural metrics applications encountered during the eighties [1]. As long as the objectives of an object-oriented development process are not stated and supported with tailored methods, metrics would be of no practical help. We therefore focused on the product and its inherent quality attributes and then determined how to measure their achievement during design.

The process for building a development support environment thus closely followed measurement theory [1]:
1. Identify and define intuitive and well-understood attributes of software quality and productivity that should be achieved during the project. Here we selected reuse and maintainability.
2. Determine metrics that clearly specify these desired attributes. We built this set of quality metrics (as opposed to quality indicators, which are determined in a later step) upon interviews with designers and retrieved few reproducible quality metrics related to maintainability and reuse. Most were based on ranking classes of past projects.
3. Specify precisely the underlying documents, structures, or attributes of these documents to be measured. We identified Smalltalk programs, from the beginning of a project onwards, as the objects to be measured.
4. Determine formal models or abstractions which relate the quality attributes to properties of the underlying documents and their individual attributes. Based on the interviews with designers we built relations between what reusability and maintainability mean and how they relate to distinct product properties.
5. Define metrics that measure these selected product properties. These metrics must be available during design because otherwise they cannot be used as indicators for the already selected quality metrics. We thus call them quality indicators.
6. Validate these metrics based on past projects.
To overcome the stated problems related to practical applicability and necessary tool support, we introduce SmallMetric, a tool to evaluate and improve object-oriented programs written in Smalltalk. It is embedded in an environment for learning object-oriented programming [15].
5. A description of the object-oriented metrics framework
SmallMetric analyses object-oriented programs by applying construction rules that distinguish between (Fig. 2):

• the static and dynamic structure of a class or an object;
• the static and dynamic relationships between classes and/or objects.
The metrics presented in the following list represent different aspects of object-oriented software. We describe the underlying intuition of the metrics as well as a comprehensive summary of our observations from object-oriented development projects.
5.1. Metric 1: Volume
The volume of an object is a basic size measure intuitively connected with the amount of information inside a class. Many empirical investigations of metrics showed relationships among size metrics and comprehensibility or number of errors [7,10,12]. Volume thus is a potential indicator of the effort to develop an object as well as of its future maintenance. The larger the number of variables and methods, the more specific the object is to one application; in other words, the object's reusability is likely to decrease with increasing volume. Volume can be measured by:

• number of variables (class and instance variables);
• number of methods (class and instance methods).

Both metrics measure mere volume and do not account for the impact of inheritance and reuse, which will only be seen within a whole suite of metrics.
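The two volume counts are straightforward to compute once a class has been parsed. The sketch below uses a plain dictionary as a stand-in for a parsed Smalltalk class; the representation and the class contents are invented for illustration, not SmallMetric's internal model.

```python
# Volume metrics over a toy class model (illustrative representation).

def volume(cls):
    """Return (number of variables, number of methods) for a class."""
    n_vars = len(cls["class_variables"]) + len(cls["instance_variables"])
    n_methods = len(cls["class_methods"]) + len(cls["instance_methods"])
    return n_vars, n_methods

toy_class = {
    "class_variables": ["Count"],
    "instance_variables": ["text", "owner"],
    "class_methods": ["new"],
    "instance_methods": ["initialize", "text", "text:"],
}
print(volume(toy_class))   # (3, 4)
```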
5.2. Metric 2: Method structure
The internal structure of an object, based on its methods and the information accessed by them, is an indicator of its functionality. If the methods are overloaded with information to pass back and forth, there is good reason to assume that the object or class should be broken into several objects or classes. Method metrics are used to forecast effort for debugging and testing early. Method structure can be measured by:

• number of parameters per method;
• number of temporary variables per method;
• number of messages passed per method.
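These three counts can be sketched the same way as the volume metrics, over a toy method representation. The selector, parameter names and message list below are invented examples, not output of SmallMetric.

```python
# Method-structure metrics over a toy method model (illustrative).

def method_structure(method):
    """Count parameters, temporaries and messages sent for one method."""
    return {
        "parameters": len(method["parameters"]),
        "temporaries": len(method["temporaries"]),
        "messages": len(method["messages_sent"]),
    }

toy_method = {
    "selector": "transfer:to:",
    "parameters": ["anAmount", "anAccount"],
    "temporaries": ["newBalance"],
    "messages_sent": ["balance", "balance:", "deposit:"],
}
print(method_structure(toy_method))
```

A method whose counts grow large on all three dimensions is exactly the "overloaded with information to pass back and forth" case the text flags as a candidate for splitting.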
5.3. Metric 3: Cohesion
The term cohesion is frequently used in software engineering to designate a mechanism for keeping related things together. Cohesion can be defined as the degree of similarity of methods: the higher the degree of similarity of the methods in one class or object, the greater their cohesiveness. Cohesion in Smalltalk means the organisation of methods which set or access the value of a class or instance variable under predefined schemes (protocols). These protocols are predetermined in Smalltalk. The programmer can use them to manipulate variables of an object; such methods are called accessors [16]. The underlying intuition is that direct reference to class and instance variables limits inheritance by fixing storage decisions in the superclass that cannot be changed in a subclass. Besides, modifications in the structure of these variables are then not visible to other methods, just to the accessors. Hence, the effort to extend or modify a given program is minimised.
As an example consider an instance variable instVar of an object anObject. To access the class and instance variables it is necessary to define two kinds of methods:

• one method for getting the value of an instance variable:

instVar
    ^ instVar
[Fig. 2 presents the SmallMetric taxonomy in two columns:

INTER-classes/objects: used inherited variables; used inherited methods; external use of methods; coupling: protocol 'private'; abstract classes; cohesion: predefined protocols.

INTRA-classes/objects: naming; number of variables; number of methods; use of protocols; number of parameters per method; number of messages passed per method.]

Fig. 2. Taxonomy for SmallMetric.
• and another for setting it:

instVar: aValue
    instVar := aValue

This solution forces all accesses to variables to go through an accessor method. Therefore, information hiding with respect to variables and methods in a class is enforced [17]. SmallMetric examines a Smalltalk program to find accesses to variables outside of the predefined protocols. This is called a cohesion violation of an object.
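The cohesion check can be pictured as a scan over each class: flag any method that references an instance variable directly while not sitting in an accessing protocol. The sketch below is an illustrative reconstruction over an invented class model, not SmallMetric's implementation; the protocol name "accessing" and all selectors are assumptions.

```python
# Illustrative cohesion-violation check (invented model, not
# SmallMetric's parser): flag direct variable access outside
# an accessing protocol.

def cohesion_violations(cls):
    violations = []
    for m in cls["methods"]:
        direct = set(m["variables_referenced"]) & set(cls["instance_variables"])
        if direct and m["protocol"] != "accessing":
            violations.append((m["selector"], sorted(direct)))
    return violations

example_class = {
    "instance_variables": ["instVar"],
    "methods": [
        {"selector": "instVar", "protocol": "accessing",
         "variables_referenced": ["instVar"]},   # proper accessor
        {"selector": "printOn:", "protocol": "printing",
         "variables_referenced": ["instVar"]},   # direct access
    ],
}
print(cohesion_violations(example_class))   # [('printOn:', ['instVar'])]
```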
5.4. Metric 4: Coupling
Coupling designates the interaction between objects that are not related through inheritance. Excessive coupling between objects outside of inheritance is detrimental to modular design and prevents reuse: the more independent an object, the easier it is to reuse in another project [1,11]. The suggested metric is:

• number of invoked classes.

A predefined scheme in Smalltalk is the protocol private. It comprises methods that should only be activated inside an object. The Smalltalk compiler or interpreter does not check these specific accesses. When a message from another object starts a method under this protocol, undesirable effects can occur because such access had not been anticipated during development. SmallMetric tries to identify such references.
5.5. Metric 5: Inheritance tree
This group of metrics analyses the number of inherited variables and methods used by a class. The use of inherited methods and data in a class indicates the difficulty of changing superior classes. At a low level of the inheritance tree, the meaning of the variables and methods available to a class may have been changed several times at higher levels, increasing complexity even more. It is hence necessary to provide information about how many methods and variables are available to a distinct class. The metrics are:

• inherited variables used;
• inherited methods used.

In Smalltalk, an instance variable can be directly set by an object of a subclass. This can reduce the reuse of a class in other applications. SmallMetric flags this as an 'information hiding violation' (Fig. 3).
5.6. Metric 6: Class organisation
This group of analyses captures three comprehensibility indicators: naming, checking of comments, and the use of predefined protocols. Naming analyses all identifiers of a class; SmallMetric informs the developer about their distribution. This metric serves documentation purposes only. The existence of comments within an object is also checked. In Smalltalk, one can define a global comment to clarify the intent and functionality of an object. SmallMetric warns when no such comment is provided. It is clearly impossible to check comments based on content, so we focused only on existence. The programmer may organise the methods of an object under predefined protocols. The Smalltalk environment advises the developer to use these recommendations, which is checked by SmallMetric. For novices, these protocols can help to elucidate some aspects of a Smalltalk program.
6. Experiences with SmallMetric for Smalltalk program analysis
Upon starting SmallMetric a window is opened which asks for the name of the class to be analysed. Wildcards (*) can be used. When the given class is found, a new window is created (Fig. 4). It presents the following information:

1. number and list of all variables;
2. number of methods;
3. buttons to switch between class and instance;
4. predefined protocols used;
5. naming;
6. violations of SmallMetric metrics.

Four buttons are provided to select a new class, to print the information of a class, to switch between different dialogue languages (currently English and German) and to activate help. The critique window can of course be adjusted to specific standards and process guidelines of an organisation. It has a menu which presents the design limits for development support.
SmallMetric comprises a basic set of guidelines for metric-based development support of Smalltalk applications. On the basis of the metrics above, as applied to Smalltalk projects with available quality data, we extracted design guidelines to enhance the quality of object-oriented programs written in Smalltalk. Because one of the main reasons for using object-oriented technology is reusability, we focused our evaluations on maintainability and reusability. Such guidelines should be understood as recommendations and not as restrictions of a programmer's creativity. The projects analysed ranged in size from a few classes to the 400 classes of a commercially available Smalltalk-based tool, thus covering efforts of up to 30 person-years. Our approach for extracting guidelines from metrics that can serve as indicators of poor quality is based on analysing the classes with respect to complexity metrics and quality data. Since the metrics are applied on different scales (complexity metrics: at least interval scale; quality metrics: ordinal scale), we used non-parametric statistical methods for correlations and factor analysis. Unlike in
C. Ebert, I. Morschel / Information and Software Technology 39 (1997) 497-509
Object subclass: #Superclass
    instanceVariableNames: 'text'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SmallMetric'

Superclass subclass: #Subclass
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SmallMetric'

Subclass class methodsFor: 'instance creation'
new
    ^super new initialize

Subclass methodsFor: 'initialize release'
initialize
    text := 'This is an example of an information hiding violation !!'
Fig. 3. An information hiding violation.
other approaches, we do not discard outliers, because it is ex ante unknown which classes or metrics are outliers. Instead, all metrics are normalised with a quadratic approach before comparing or ranking them.

To show the practical application of SmallMetric we took a Smalltalk project related to a development environment with simulation on an Occam real-time engine. The whole project was performed by a five-person core development team over a timeframe of several years. 71 classes were provided and we investigated these classes with SmallMetric.

Fig. 4. The user interface of SmallMetric.

Fig. 5. Scatterplot of number of methods (abscissa) and cohesion (ordinate) with respect to maintainability (shape of dots on a 5-level scale) for all samples.

Before the object-oriented metric collection, however, we interviewed the development team to find out what the ranking of the classes in terms of maintainability is. This subjective, yet experience-driven, ranking was then exploited for analysing the structural metrics on their validity as indicators of maintainability. Maintainability involves a longitudinal study, which we achieved by interviewing the designers after the fact of the first design (which we investigated), when they had already been maintaining it for more than a year. The approach can be generalised because it follows exactly what has been proposed by measurement theory (see Section 4). Fig. 5 shows all 71 samples that were investigated in the course of this analysis (the abscissa is volume in terms of methods, the ordinate is the cohesion metric, and the shape represents the perceived maintainability). The scatterplot relates volume in terms of number of methods and cohesion on the two axes together with maintainability
as the shape of the dots.

Fig. 6 shows, for one metric (number of methods), the variance related to the five levels of maintainability assigned to each sample. The box is drawn with the mean in its centre and one standard deviation to each side. Outliers are indicated by the whiskers. With few samples the standard deviation is less meaningful, which results in whiskers lying inside the box.
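The box-and-whisker summary described above (mean in the centre, one standard deviation to each side, whiskers at the observed extremes) can be sketched as follows. The sample data here are purely hypothetical and only illustrate the computation; with two samples per level, the whiskers indeed fall inside the box, as noted above:

```python
import statistics
from collections import defaultdict

# Hypothetical samples: (number of methods, maintainability level 1..5).
samples = [(150, 1), (120, 1), (95, 2), (80, 2), (60, 3),
           (45, 3), (40, 4), (25, 4), (20, 5), (12, 5)]

by_level = defaultdict(list)
for vol_meth, level in samples:
    by_level[level].append(vol_meth)

# Box per level: mean with one standard deviation to each side;
# whiskers at the observed minimum and maximum.
for level in sorted(by_level):
    xs = by_level[level]
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs) if len(xs) > 1 else 0.0
    print(f"level {level}: box [{mean - sd:.1f}, {mean + sd:.1f}], "
          f"whiskers [{min(xs)}, {max(xs)}]")
```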
Not all object-oriented metrics calculated by SmallMetric will be discussed in this article, due to space restrictions. We concentrate on the following metrics, which have been condensed with factor analysis (i.e. factor analysis was applied to reduce the dimensionality of the original metrics by focusing on orthogonal factors; factor analysis typically replaces the original metrics, which is why in two cases so-called hybrids were added that mainly reflect structure and cohesion):

• vol-meth: number of methods;
• vol-var: number of variables;
• str-meth: method structure (a hybrid of structural metrics regarding the code of a method);
Fig. 6. Box-whisker plots for number of methods and the related maintainability levels of all samples. Boxes are drawn with the mean in the centre and one standard deviation length in both directions, which explains why in cases with few samples the outliers are within the boxes.
Table 1
Results of the Spearman rank correlation

            vol-var   str-meth  cohesion  comments  quality
vol-meth     0.800     0.607     0.721    -0.131    -0.691
vol-var                0.413     0.754    -0.194    -0.569
str-meth                         0.414     0.077    -0.439
cohesion                                  -0.101    -0.675
comments                                            -0.023
• cohesion: cohesion (this hybrid of primarily the above-mentioned cohesion metrics mainly focuses on access to variables outside the predefined protocols);
• comments: comments (this metric was only added because we wanted to investigate the importance of comments).

All metrics but the last are at least on an interval scale; comments are on an ordinal scale. The related quality metric, maintainability, is on an ordinal scale as well, ranging from 1 to 5, where 5 is the highest maintainability.
6.1. Analysis 1: Relationships between the metrics
A Spearman rank correlation was performed to investigate relationships among the metrics (Table 1). Significance levels for most correlations were far below 0.0005; only comments had a significance level above this limit. The highest random correlation coefficient that we generated in 1000 trials of random metric generation, based on the given set of metric observations and their distribution, was 0.244. This means that correlation coefficients above this limit are meaningful, because even many trials with random (however well fitting) data would not generate higher correlations. The confidence interval for the given r-coefficients higher than 0.50, at a significance level of α = 0.05, is [0.43, 0.73]. An orthogonal factor rotation of all metrics clearly separates three groups, namely the object-oriented metrics, comments, and quality. For better insight, the single linkage values for building clusters of 'similar' metrics are given: single linkage combines vol-meth and vol-var at a level of 0.35, both with cohesion at a level of 0.42, then quality at a level of 0.50, str-meth at 0.59 and finally comments at 0.92.
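The rank correlation and the random-trial threshold used above can be reproduced with a small script. The following Python sketch is illustrative only (the data handling, trial count and seed are assumptions, not the authors' original analysis); correlations above the empirically generated threshold are unlikely to arise by chance:

```python
import random
import statistics

def rank(xs):
    """Average ranks (1-based); ties receive the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def random_threshold(x, y, trials=1000, seed=1):
    """Highest |rho| reached when one metric is randomly shuffled:
    observed correlations above this level are meaningful."""
    rng = random.Random(seed)
    best = 0.0
    y = list(y)
    for _ in range(trials):
        rng.shuffle(y)
        best = max(best, abs(spearman(x, y)))
    return best

# Example: perfectly monotone metrics correlate with rho = 1.
print(spearman([5, 10, 20, 40], [1, 2, 3, 4]))  # -> 1.0
```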
6.2. Analysis 2: Regression model of maintainability
The second step of the analyses concerned the predictability of the maintainability factor. For the complete regression analysis the dependent variable is quality, while the independent variables are vol-meth, vol-var, str-meth, cohesion and comments. The probability > F is 2E-07. There are five independent variables and one dependent variable in the regression model. Probability > F, commonly known as the p-value, indicates the significance of a regression: the smaller the probability, the more significant the regression. The very small value of probability > F indicates, with a high degree of confidence, that some prediction is possible.

R-square is the quality indicator of a regression test that measures the quality of the predictions. It shows how much variance in the dependent variable is accounted for in the sample of 71 observations. Adjusted R-square measures the same aspect as R-square, but in the population, with the adjustment depending on both sample size and the number of independent variables. R-square is 0.450 and adjusted R-square is 0.408. The residual is 88.2. In our sample, more than 45% of the total variance in maintainability is accounted for by the metrics, and more than 40% in the population.

It is interesting to run the same regression without comments as an independent variable. R-square and the residual values remain unchanged, while the probability > F is slightly smaller. Obviously comments do not contribute much to the prediction model, which was already clear from the cluster analysis. The standard error in both cases for all predictions is below 1.165, which is a good result given the scale of [1, 5] for the quality variable.

We conclude that the prediction of maintainability from the given metrics is possible.
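The R-square figures above can be recomputed with an ordinary least-squares fit. The Python sketch below is a hypothetical illustration (not the authors' original computation); note that the standard adjusted-R-square formula reproduces the reported 0.408 from R-square = 0.450 with n = 71 samples and k = 5 predictors:

```python
import numpy as np

def ols_r2(X, y):
    """Ordinary least squares with intercept; returns (R^2, adjusted R^2).

    Adjusted R^2 corrects for sample size n and number of predictors k:
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)               # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum()) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, r2_adj

# Reproduce the paper's adjusted value from the reported R-square:
n, k, r2 = 71, 5, 0.450
print(round(1 - (1 - r2) * (n - 1) / (n - k - 1), 3))  # -> 0.408
```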
6.3. Analysis 3: Discriminant analysis
The third step of the statistical analyses giving insight into the validity of the given metrics as quality indicators is typically a real prediction based on some kind of discriminant analysis. Although more advanced analyses are feasible, we stick to one because the effects are similar [2]. The chosen discriminant analysis tries to fit hyperplanes in a space built up by the independent variables. The planes separate the five classes of the dependent variable, i.e. maintainability (Table 2).

In total, 52.11% of all cases were classified correctly. Below 10% of all samples were predicted as belonging to the opposite side of the dependent variable (i.e. difference between real value and predicted value > 2).
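The reported classification rates follow directly from the confusion matrix of Table 2 (counts as transcribed from the table): summing the diagonal gives the correctly classified samples, and entries with |real - calculated| > 2 give the badly misclassified ones. A short Python check:

```python
# Confusion matrix from Table 2: rows = real maintainability level (1..5),
# columns = level calculated by the discriminant analysis.
table2 = [
    [7, 1, 1, 0, 3],    # real 1, N = 12
    [0, 1, 2, 1, 1],    # real 2, N = 5
    [0, 0, 7, 1, 3],    # real 3, N = 11
    [0, 1, 2, 9, 0],    # real 4, N = 12
    [1, 2, 0, 15, 13],  # real 5, N = 31
]

total = sum(sum(row) for row in table2)
correct = sum(table2[i][i] for i in range(5))
# Samples predicted on the "opposite side": |real - calculated| > 2.
far_off = sum(table2[i][j] for i in range(5) for j in range(5) if abs(i - j) > 2)

print(f"N = {total}")                               # -> N = 71
print(f"correct: {100 * correct / total:.2f}%")     # -> correct: 52.11%
print(f"|diff| > 2: {100 * far_off / total:.2f}%")  # -> |diff| > 2: 9.86%
```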
7. Discussion and further results
Based on the three analyses we conclude that there is a
Table 2
Results of discriminant analysis

Real    N     Calculated
              1    2    3    4    5
1       12    7    1    1    0    3
2        5    0    1    2    1    1
3       11    0    0    7    1    3
4       12    0    1    2    9    0
5       31    1    2    0   15   13
strong statistical relationship between object-oriented
metrics used within SmallMetric for decision support during development and maintainability. Maintainability can be
predicted from combinations of metrics collected from
Smalltalk.
Based on the described set of metrics we investigated different projects, both from industry and academia, to provide practical guidelines. Our experiences with quantitative support during the development of object-oriented software projects can be summarised as follows:

• Provide continuous integration of new components instead of one big integration effort near the delivery date.
• Attack risks actively during the whole development process, based on early assessments and especially on quantitative feedback. Both risk and progress are measured in the product and less in supporting documentation.
• Quantitative feedback on the product to the engineers and the project manager helps in much better scheduling of the next steps (e.g. testing, integration, supportive tool delivery, configuration management).
• Metrics used as quality indicators during design should cover at least the diversity of object-oriented software in terms of classes (i.e. volume, complexity, structure), messages (i.e. interface structure, communication between entities), and processes (i.e. dynamic structure, run-time dependencies, method usage).
• Rapid changes are a source of risk because they are a sign of unstable architectures. Quantitative feedback on changes in each class helps design co-ordinators during design, and even more importantly the test co-ordinators during test, to immediately detect critical areas that need thorough investigation. For instance, changes in several classes with the same frequency clearly indicate inherent ripple effects that are better removed early than during maintenance.
• Industrial software projects, such as switching systems, are typically functionally structured. Since object-oriented software emphasises data structuring, many data structures that were originally part of a module become new classes. It is therefore impossible to reuse quantitative design heuristics of former projects when moving to object-oriented technology in parts of legacy software (as is the case for most large industrial projects).
Our investigations of further classes of Smalltalk projects provide some generalised suggestions related to the design and implementation of classes that, based on the larger number of projects analysed, go beyond the described analysis. They seem very clear in theory; however, practical use often shows the opposite:
• Volume:
  - number of object attributes or instance variables per class;
  - number of methods per class: maximum 30.
• Structure:
  - number of messages passed per method: maximum 30;
  - cyclomatic complexity of methods: maximum 5;
  - nesting depth within methods: maximum 5.
• Cohesion and coupling:
  - existence of an accessor outside a predefined protocol;
  - number of external messages passed per method: maximum 5;
  - external access of methods under the protocol 'private': maximum 5.
• Inheritance:
  - number of predecessor classes: maximum 5;
  - number of successor classes: minimum 2, maximum 10;
  - number of changes to inherited methods: maximum 5 (in fact, zero overriding would be best; however, in some cases it cannot be avoided).
• Class organisation:
  - number of characters of an identifier: minimum 7;
  - comments must be given; suggested comment density: minimum 0.2;
  - editing distance of identifiers: minimum 3.
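Limits like these lend themselves to automated critique. The sketch below shows one way such checks could be wired up; `ClassMetrics` and its field names are hypothetical illustrations, not SmallMetric's actual interface, and only a subset of the guidelines is encoded:

```python
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    methods: int                 # number of methods in the class
    messages_per_method_max: int # worst-case messages passed in one method
    cyclomatic_max: int          # worst-case cyclomatic complexity
    nesting_max: int             # worst-case nesting depth
    predecessors: int            # number of predecessor (super)classes
    identifier_min_length: int   # shortest identifier in the class
    has_class_comment: bool      # global class comment present?

# Each rule mirrors one guideline from the list above.
LIMITS = [
    ("methods per class > 30",    lambda m: m.methods > 30),
    ("messages per method > 30",  lambda m: m.messages_per_method_max > 30),
    ("cyclomatic complexity > 5", lambda m: m.cyclomatic_max > 5),
    ("nesting depth > 5",         lambda m: m.nesting_max > 5),
    ("predecessor classes > 5",   lambda m: m.predecessors > 5),
    ("identifier shorter than 7", lambda m: m.identifier_min_length < 7),
    ("class comment missing",     lambda m: not m.has_class_comment),
]

def critique(m: ClassMetrics) -> list[str]:
    """Return the guideline violations for one class."""
    return [msg for msg, broken in LIMITS if broken(m)]

sample = ClassMetrics(methods=42, messages_per_method_max=12, cyclomatic_max=4,
                      nesting_max=2, predecessors=3, identifier_min_length=4,
                      has_class_comment=False)
print(critique(sample))
# -> ['methods per class > 30', 'identifier shorter than 7', 'class comment missing']
```

As in SmallMetric's critique window, the limits would be tailored per organisation rather than hard-coded.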
The measured values were analysed with respect to boundaries (minimum, maximum), intervals, deviations from the average, and nonparametric correlations between them. The interpretation was performed according to these criteria and used as follows:
1. Sustain a high comprehensibility level by providing sufficiently long descriptive parts in all design objects, and object names with meaning rather than enumerations such as 'class-1'. The descriptions should include subclasses or inheritance relations, changes of inherited methods or variables, functionality, related objects, used data items, date, author, test cases to be performed, requirements fulfilled, and the management activities and staff connected with the project.
2. During class and object design, the metrics and their statistical evaluation (regarding similar projects) are used to distinguish between different designs (e.g. alternative approaches, division into subclasses). Rules that can be tailored to support design include size limits (e.g. lines per method) or functionality limits (e.g. methods per class, variables per class). It should be clear that for distinct applications and libraries these limits must be adjusted, taking into consideration, for instance, the design experience, the available resources, or the requested quality.
3. During reviews at the end of design and coding, the metrics are used as indicators of weak components (e.g. inadequate inheritance hierarchy, unsatisfying object description) and as indicators for process management (timely ordered number of classes or volume metrics).
After applying such metrics to several projects, the results obtained can be used to define intervals or limits for project- specific metrics in order to achieve more tailored quality indicators.
8. Summary
Most complexity metrics have been designed without regard to the problem domain and the programming environment. There are many aspects of complexity and many design decisions that influence the complexity of a product. This paper has presented an approach to integrating software metrics with design support for object-oriented techniques based on Smalltalk. A tool environment for program analysis called SmallMetric, which incorporates metrics and guidelines for improving programs, has been developed. Based on this set of metrics we investigated different projects, both from industry and academia, to provide practical guidelines.
This approach to integrate a measurement tool system into Smalltalk illustrates a way to minimise the efforts for implementation and maintenance of such a tool and shows how to cope with changes in future requirements for such tools and their individual interfaces. By transforming the object-oriented information representation into another language it is possible to integrate such measurement techniques into other environments as well.
Collecting metrics in running projects helps in building up a historical database that can be further used for better estimates and risk assessments in subsequent projects. Be careful, however, not to measure everything; 6 to 12 metrics should be sufficient, which is substantiated by the factor analysis we described, which reduces the factor dimensionality to 5 or less. It is harder to tell what is important than to say what can be measured. Clearly, traditional complexity metrics, such as cyclomatic complexity or function size, do not measure the significant aspects of complexity in object-oriented systems. Sticking to lines of code alone does not help in identifying the maximum size of methods. We rarely see high cyclomatic numbers or numerous lines of code in methods of Smalltalk programs, but this does not mean that Smalltalk programs are better than C programs. It only indicates that Smalltalk uses a paradigm that distributes functionality into smaller units.
With an early analysis of software products we are able to provide developers with helpful hints to improve their designs and code during the development process, and not at the end when it will be much more expensive. By following the given suggestions we could improve designs and achieve better programs in terms of quality attributes such as understandability, reusability and maintainability. Of course, much more research is necessary to provide complete guidelines for achieving high-quality designs. We also consider this approach a vehicle towards measuring productivity and estimating effort early in the analysis and design of object-oriented software. The basic step, however, is still the measurement and evaluation of software complexity as early as possible: during the software development process, when the most expensive faults are induced (e.g. inheritance trees). By making software engineers aware that there are suitable techniques and tools for analysing their programs, even when they are object-oriented, this could be a small step towards avoiding a software crisis similar to the one we are currently facing in procedural environments.
Acknowledgements
The assistance of the Landis & Gyr corporation, Switzerland, in providing product and process data of object-oriented projects is gratefully acknowledged. Several discussions with A. Riegg of Debis in Stuttgart contributed to the proposed guidelines.
References
[1] N.E. Fenton, S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, 2nd edition, Thomson Computer Press, London, UK, 1996.
[2] C. Ebert, Evaluation and application of complexity-based criticality models, Proc. of the 3rd Int. Software Metrics Symposium (METRICS 96), IEEE Computer Soc. Press, Los Alamitos, CA, USA, 1996, pp. 174-185.
[3] A. Goldberg, D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983.
[4] C. Ebert, Visualization techniques for analysing and evaluating software measures, IEEE Transactions on Software Engineering 18 (11) (1992) 1029-1034.
[5] G. Stark, R.C. Durst, C.W. Vowell, Using metrics in management decision making, IEEE Computer 27 (9) (1994) 42-48.
[6] B.A. Kitchenham, S.G. Linkman, D.T. Law, Critical review of quantitative assessment, Software Engineering Journal 9 (3) (1994) 43-53.
[7] S.R. Chidamber, C.F. Kemerer, Towards a metrics suite for object oriented design, Proc. of Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Sigplan Notices 26 (11) (1991).
[8] W. LaLonde, J. Pugh, Gathering metric information using metalevel facilities, Journal of Object Oriented Programming 6 (1994) 33-37.
[9] B. Henderson-Sellers, J. Edwards, BOOKTWO of Object-Oriented Knowledge: The Working Object, Prentice Hall, Sydney, Australia, 1994.
[10] M. Lorenz, J. Kidd, Object-Oriented Software Metrics, Prentice Hall Object-Oriented Series, Englewood Cliffs, USA, 1994.
[11] K.J. Lieberherr, I.M. Holland, Assuring good style for object-oriented programs, IEEE Software 6 (9) (1989) 38-48.
[12] S. Karunanithi, J.M. Bieman, Candidate reuse metrics for object oriented and Ada software, Proc. Int. Software Metrics Symposium, IEEE Computer Society Press, New York, 1993, pp. 120-128.
[13] N.I. Churcher, M.J. Shepperd, Towards a conceptual framework for object-oriented software metrics, Software Engineering Notes 20 (2) (1995) 69-75.
[14] R. Sharble, S. Cohen, The object-oriented brewery: A comparison of two object-oriented development methods, Software Engineering Notes 18 (2) (1993).
[15] I. Morschel, An intelligent tutoring system for the learning of object-oriented programming, Proc. EAEEIE'93, Prague, 1993.
[16] K. Beck, To accessor or not to accessor?, The Smalltalk Report 2 (8) (1993).
[17] D.L. Parnas, P.C. Clements, D.M. Weiss, The modular structure of complex systems, IEEE Transactions on Software Engineering 11 (3) (1985) 259-266.