
Information and Software Technology 39 (1997) 497-509

Metrics for quality analysis and improvement of object-oriented software

Christof Ebert a,*, Ivan Morschel b

a Alcatel Telecom, Switching Systems Division, B-2000 Antwerpen, Belgium
b Daimler-Benz Research Centre, Ulm, Germany

Received 13 August 1996; revised 15 November 1996; accepted 21 January 1997

* Corresponding author. Tel.: +32 3 240 4081; fax: +32 3 240 9935; e-mail: [email protected]
0950-5849/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0950-5849(97)00005-0

Abstract

Software metrics play an important role in analysing and improving the quality of software work products during their development. Measuring aspects of software complexity for object-oriented software helps to improve the quality of such systems during their development, with a particular focus on reusability and maintainability. It is widely accepted that more widespread use of object-oriented techniques can only come about when there are tool systems that provide development support beyond visualising code. Unfortunately, many object-oriented metrics are defined and applied to classroom projects, while no evidence is given that the metrics are useful and applicable - both from an experience viewpoint and from a tools viewpoint - for industrial object-oriented development. Distinct complexity metrics are developed and integrated in a Smalltalk development support system called SmallMetric. Thus, we achieve a basis for software analysis (metrics) and development support (critique) of Smalltalk systems. The main concepts of the environment including the underlying metrics are explained, its use and operation is discussed, and some results of the implementation and its application to several industrial projects are given with examples. © 1997 Elsevier Science B.V.

Keywords: Development support; Maintainability; Metrics; Object-oriented metrics; Quality control; Smalltalk

1. Introduction

Software metrics are measures of development processes and the resulting work products. In this context we focus on metrics that are applicable to software developed in Smalltalk. We further concentrate on metrics that can be used as quality indicators during the development process, hence providing support for the developers. These metrics are often classified as product metrics because their inputs are products of the development process. We will not distinguish metrics and measures from a mathematical point of view. When referring to complexity metrics, we use this phrase for a group of software metrics that measure structural or volume aspects of products that are intuitively related to parts difficult to understand. Difficulties in dealing with such complex components have been shown to cause high error rates, high testing effort and bad maintainability (for further details on metrics see [1]).

Object-oriented programming practices per se will never make poor programmers into good ones. The opposite is the case, especially considering current surveys which show that corporate adoption of object-oriented technology has tripled since 1994, while at the same time it is considered to have the lowest success rate among new technologies. Or, in the words of one recent workshop summary: "Realistically, object-orientation will deliver benefits, but fewer of them, and with more effort required on the part of the organisation, than the promises of its most zealous advocates imply." Typical class libraries consist of look-alike classes that drown programmers, classes that cannot delete unwanted properties, encapsulation at the class level that severely limits reuse (i.e. hiding too many details of overly specialised classes), or static class hierarchies undermined by quickly evolving information domains. As a result, most programmers rather copy a class, change it, and add a new member to the library. It is necessary to give insight during design into design decisions that improve reuse, instead of hiding valuable concepts in a flood of case-specific details.

Because of the extreme subjectivity of quality attributes per se, it is important to select metrics that can be applied to the specific objectives of a project, that have been derived from the project's requirements and can be used to prove consistency, that can be applied during several phases of the


development process on resulting products (design, code, documentation, etc.), and that can be collected and evaluated automatically by the development environment in use.

This article introduces complexity-based classification techniques as a basis for constructing quality models that can identify outlying software components that might cause potential quality problems. Such models are based on the experience that typically a rather small number of components (e.g. methods) has a high failure rate and is most difficult to test and maintain. Our own project experience recently showed that 20% of all components in telecommunication projects were the origin of over 40% of all high-priority field failures. Even worse, we could also show that it is not difficult to identify these components ex ante, either by asking designers and testers and grouping their subjective ratings, or by applying classification rules based on simple structural software metrics [2].

Effects of applying complexity-based criticality prediction to a new project can be summarised as follows:

• 20% of all modules in the project were predicted as most critical (after coding), and
• these modules contained over 40% of all faults (up to release time).

Knowing that

• 60% of all faults can theoretically be detected before system integration, and
• fault correction during module test and code reading costs less than 10% compared to fault correction during system test,

it follows that 24% of all faults can be detected early by investigating these 20% of all modules more intensively, with 10% of the effort compared to fault correction during system test, therefore yielding a 20% total cost reduction for fault correction. Additional costs for providing the statistical analysis are in the range of two person days per project. The necessary tools are off the shelf and account for even less per project.
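A worked check of these figures (our reading of the percentages above; the derivation is not spelled out in the original): with 40% of all faults concentrated in the predicted critical modules and 60% of all faults theoretically detectable before system integration,

$$0.40 \times 0.60 = 0.24,$$

i.e. 24% of all faults can be caught early, and correcting them at roughly one tenth of the system-test cost saves about

$$0.24 \times (1 - 0.10) \approx 0.22 \approx 20\%$$

of the total fault-correction cost.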

In this context the paper addresses typical questions often asked in object-oriented software engineering projects:

• How can I identify early the relatively small number of critical components that mainly contribute to bad quality identified later in the life cycle?
• Which components should better be redesigned because their maintainability is bad and their overall criticality to the project's success is high?
• What is the benefit of introducing a metrics program that investigates structural properties of software?
• Are there structural properties that can be measured early in the code to predict quality attributes?
• Can I use the - often heuristic - design and test know-how on trouble identification and risk assessment to build up knowledge-based systems that help to identify critical components early in the development process?
• Last, but surely not least, is it after all feasible to automate such classification in an object-oriented development environment?

The paper is organised as follows. Section 2 presents a brief overview of background and problems associated with metric-based classification models in the context of object-oriented development. Section 3 gives a very brief overview of the basic concepts of object-orientation and of Smalltalk. We describe a programming analysis environment for Smalltalk-80 [3], which we selected because of the language's perceived uniformity and elegance. The syntax of Smalltalk is easy to understand, it possesses a small number of operators (in contrast to C++), and it completely supports the notions of object, class, and inheritance. This article presents a basic set of metrics to support the development of object-oriented programs, as well as a tool to automatically measure and judge programs written in Smalltalk. Sections 4 and 5 describe the selection of metrics for object-oriented software and a tool environment called SmallMetric. Results from applying the analysis environment in industrial projects are presented in Section 6. Conclusions and a brief summary with an outlook on further work are given in Sections 7 and 8.

2. Metrics for quality models

Although striving for high quality standards, only a few organisations apply true quality management. Quality management consists of proactively comparing observed quality with expected quality, hence minimising the effort expended on correcting the sources of defects. In order to achieve software quality, software must be developed in an organised form by using defined methods and techniques and applying them consistently. In order to achieve an indication of software quality, software must be subjected to measurement. This is accomplished through the use of metrics and statistical evaluation techniques that relate specific quantified product requirements to some attributes of quality.

The approach of integrating software measures and statistical techniques is shown in Fig. 1. The object-oriented CASE environment provides the formal description of the different products developed during the software life-cycle, with its defined methods and the underlying process. Multivariate statistical techniques provide feedback about relationships between components (e.g. factor analysis, cluster analysis) [2,4]. Classification techniques help to determine outliers (e.g. error-prone components) [4-6]. Finally, detailed diagrams and tables provide insight into the reasons why distinct components are potential outliers and how to improve them [1,4].


Fig. 1. Measures and statistical techniques in software engineering.

Product metrics are used to supply models for [1,2,5-7]:

• estimating effort and costs of new projects;
• evaluating the productivity of introducing new technologies (together with their methods and tools);
• measuring and improving software quality;
• forecasting and reducing testing and maintenance effort.

Quality or productivity factors to be predicted during the development of a software system are affected by many product and process attributes, e.g. software design characteristics or the underlying development process and its environment [1,5,6]. Quality models are based upon former project experiences and combine the quantification of aspects of software components with a framework of rules (e.g. limits for metrics, appropriate ranges, etc.). They are generated by the combination and statistical analysis of product metrics (e.g. complexity measures) and product or process attributes (e.g. quality characteristics, effort, etc.) [5,6]. These models are evaluated by applying and comparing exactly those invariant figures they are intended to predict, the quality metrics (e.g. maintainability, error rate, etc.). Iterative repetition of this process can refine the quality models, hence allowing their use as predictors for similar environments and projects. For assessing overall quality or productivity, it is suitable to break it down into its component factors (e.g. maintainability), thus arriving at several aspects of software that can be analysed quantitatively. Typical problems connected to data collection, analysis, and quality modelling are addressed and discussed comprehensively in [1,5,6].

Classification or clustering algorithms are mathematical tools for detecting similarities between members of a collection of objects. Classification algorithms can be loosely categorised by the underlying principle (objective function, graph-theoretical, hierarchical) or model type (deterministic, probabilistic, statistical, fuzzy). Information about the objects (e.g. software components) to be analysed is input to classification algorithms in the form of metric vectors. The elements of a metric vector are the measurements of distinct software features that have been chosen as a basis for comparing a specific object to other objects. The output of a classification or clustering algorithm can then be used to classify the objects into subsets or clusters. The classification of metric vectors can be performed with information about different classes (e.g. errors, change-rate). The training of a classification algorithm using this kind of information is called supervised. If the algorithm classifies the data autonomously, the training is unsupervised. We will further focus on supervised learning because quality metrics are provided within training data sets.

Metric vectors assigned to the same cluster are in some sense similar to each other, more so than they are to metric vectors not assigned to that cluster. Instead of predicting numbers of errors or changes (i.e. algorithmic relationships), we consider assignments to groups (e.g. 'high maintainability'). While the first goal has been achieved more or less with regression models or neural networks, predominantly for finished projects, the latter goal is adequate for predicting potential outliers in running projects, where preciseness is too expensive and unnecessary for decision support.
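To make the notion of metric vectors concrete, the following minimal Smalltalk sketch (our illustration, not SmallMetric code; the class selection and the limits are arbitrary) maps each class to a vector of measurements and flags classes whose vector exceeds simple limits:

| metricVectorFor critical |
"map a class to its metric vector: (number of methods, number of instance variables)"
metricVectorFor := [:cls |
    Array
        with: cls selectors size
        with: cls instVarNames size].
"flag classes whose vector exceeds simple illustrative limits"
critical := (Array with: OrderedCollection with: Dictionary with: Set)
    select: [:cls |
        | v |
        v := metricVectorFor value: cls.
        (v at: 1) > 30 or: [(v at: 2) > 5]].
Transcript show: (critical collect: [:cls | cls name]) printString; cr.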

Due to successful application in many projects, such metrics obviously should be available for object-oriented environments. The goals are the same as for procedural systems, primarily indicating potentially troublesome classes that should be improved before being introduced to the class libraries. The object-oriented paradigm could directly profit from metrics as a vehicle to instruct staff who are new to this approach. Furthermore, software metrics could be used to measure the problems of introducing this paradigm and its acceptance, as well as to set design standards for an organisation.

Traditional metrics for procedural approaches are not adequate for evaluating object-oriented software, primarily because they are not designed to measure basic elements like classes, objects, polymorphism, and message-passing. Even when adjusted to syntactically analyse object-oriented software, they can only capture a small part of such software and so can provide only weak quality indication [8-10]. Even dedicated research on metrics for object-oriented programs gave only rough guidelines, such as limiting the size of methods in Smalltalk to 12 lines, without any indication of how to tailor project-specific design guidelines [10]. It is hence important to define customised metrics for object-oriented programs. Additionally, the characteristics of the target language should be considered. Some languages directly support the object-oriented approach (C++, Smalltalk, Eiffel) and others just to some extent (Ada). Other factors like the size and contents of the class library and the semantics and syntactical form of particular commands should also be considered.

3. Object-orientation and Smalltalk

Object-oriented modelling and programming is based on four fundamental concepts, namely abstraction, inheritance, encapsulation within classes, and polymorphism. For a better understanding of the approaches described later, we give a brief summary of the relevant features of object-orientation with respect to the Smalltalk programming language. In this paper we use the Smalltalk terminology.

As can easily be imagined, the term 'object' plays a central role in object-oriented programs. An object encompasses data structures that describe its state, and methods that realise its functionality. Data structures are encapsulated and provide information hiding with respect to their object, which means that they only offer access functions, called methods, but no direct use of the internal data structures. Objects communicate with each other via message passing, which means that one method starts a method in another object. Mechanisms to hierarchically structure objects in classes exist in all object-oriented languages. Instances can be derived from classes and differ from other objects only on the basis of associated states.

Another important characteristic is the possibility of incrementally defining class hierarchies. This is done by the inheritance mechanism. From a superclass, a subclass inherits all its data structures and methods. In the subclass, new data structures and methods can be defined, or inherited ones can be rewritten. Smalltalk's inheritance mechanism, for example, is designed to model software evolution as well as to classify.

Smalltalk supports these object-oriented concepts fully. It manipulates classes and objects and implements a single inheritance mechanism. It does not include multiple inheritance, prototypes, delegation, or concurrency in its standard version. In addition to its programming language, Smalltalk and its current derivatives and flavours include an open programming environment to develop object-oriented programs. It offers a comfortable graphical user interface, several helpful tools and a vast class library. Programming language and environment coexist in a homogeneous form, where concepts at the programming level are reproduced in the environment. An example is the message passing mechanism. In Smalltalk programs, it means the activation of a method. The same strategy is used in the user interface to identify a selected object. A message send consists of an object (the receiver of the message), a message selector and optional arguments as parameters.
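For instance, in the following hypothetical expression the three parts are easy to identify:

anAccount deposit: 100 on: Date today.
"receiver: anAccount; message selector: #deposit:on:;
 arguments: 100 and the result of the expression 'Date today'"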

Object behaviour is described by the mentioned methods, which have a selector and include Smalltalk commands. Methods with the same selector can exist in different classes; this is called polymorphism. The status of an object is captured through class and instance variables that might be accessed from outside by suitable methods. Class variables are defined at the class level and instance variables at the object level to store an object's state.
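A standard illustration of polymorphism (class and method names are ours): both classes understand the selector #area, and the class of the receiver determines which method is executed.

Circle>>area
    "answer the area of the receiver"
    ^radius * radius * Float pi

Square>>area
    "same selector, different implementation"
    ^side * side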

4. Metric analysis and development support for object-oriented software

Goals such as quality improvement, increased productivity or maturity certification are of growing interest in industry. Navigating the way with metrics is one important approach to ensure that a company stays on course towards achieving these goals. Although the search for underlying structures and rules in a set of observations is performed in software quality control, and effective solutions to refine forecasting methods based on past data have been suggested, so far their applicability to object-oriented software development has been restricted.

There is a growing awareness that such approaches could also support the object-oriented software development process. Anybody starting with object-oriented software raises similar questions, which also serve as guidelines for developing a measurement tool environment for quality control:

• What is good style in object-oriented programs?
• Are there any rules that can be applied to develop a good object-oriented program?
• Which metrics could be employed in order to determine whether a program is 'good' or not?
• What contributes to the complexity of an object-oriented system?

Based on such questions there has been substantial work concerning the definition of metrics for object-oriented programs; however, most approaches have been based on number crunching after the fact, and only a few industrial environments that include complete tool support have been described. With the broader application of this paradigm, quality control, both analytic (i.e. with metrics) and constructive (i.e. by providing design and help facilities), is of increasing importance.

One of the first attempts to investigate quality aspects of object-oriented programs was made by Lieberherr and colleagues [11]. They defined a set of design rules, called the Law of Demeter, that restricts the message-sending structure of methods. Informally, the law says that each method can send messages to only a limited set of objects: to argument objects, to the self pseudovariable, and to the immediate subparts of self (self being the object or class itself). The Law of Demeter thus attempts to minimise the coupling between classes.
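A hypothetical illustration (the names are ours, not from [11]): the first version reaches through the internals of an acquaintance and thereby violates the law; the second only talks to its immediate collaborator.

invoiceTotal
    "violates the Law of Demeter: navigates customer -> order -> lineItems"
    ^customer order lineItems
        inject: 0 into: [:sum :item | sum + item price]

invoiceTotal
    "conforms to the law: the work is delegated to the direct collaborator"
    ^customer orderTotal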

Many applications of metrics for object-oriented programs originated from transforming metric concepts for procedural programs (e.g. message passing, calling structures, cyclomatic complexity) [7,9,12]. It is of course clear that many concepts hold for object-oriented software as well and thus might be considered as influencing parameters when it comes to measuring the achievement of objectives (e.g. volume, class tree). However, such metrics do not completely cover the relevant aspects of coupling, such as inheritance or polymorphism. Other approaches suggest metrics that go beyond such transformations and really focus on object-oriented descriptions, but do not offer any guidelines for using the metrics in practical projects [8,13]. Sharble and Cohen compare two object-oriented development methods using an object-oriented brewery as an example [14]. They suggest indicators to enhance software quality by increasing cohesion, reducing coupling, increasing polymorphism and eliminating redundancy. Many metrics have been defined and applied to a toy environment, but no evidence was given that the metrics are useful and applicable - both from an experience viewpoint and from a tools viewpoint - for practical object-oriented development.

Our approach for selecting appropriate metrics is goal-oriented, rather than following the already published literature, which often measured what seemed measurable. Such a mere definition of a metrics suite, combined with statistical number crunching without intuitive background, would run into the same acceptance problems that procedural metrics applications faced during the eighties [1]. As long as the objectives of an object-oriented development process are not stated and supported with tailored methods, metrics are of no practical help. We therefore focused on the product and its inherent quality attributes and then determined how to measure their achievement during design.

The process for building a development support environment thus closely followed measurement theory [1]:

1. Identify and define intuitive and well-understood attributes of software quality and productivity that should be achieved during the project. Here we selected reuse and maintainability.

2. Determine metrics that clearly specify these desired attributes. We built this set of quality metrics (as opposed to quality indicators, which are determined in a later step) upon interviews with designers, and retrieved few reproducible quality metrics related to maintainability and reuse. Most were based on ranking classes of past projects.

3. Specify precisely the underlying documents, structures, or attributes of these documents to be measured. We identified Smalltalk programs, from the beginning of a project onwards, as the objects to be measured.

4. Determine formal models or abstractions which relate the quality attributes to properties of the underlying documents and their individual attributes. Based on the interviews with designers, we built relations between what reusability and maintainability mean and how they relate to distinct product properties.

5. Define metrics that measure these selected product properties. These metrics must be available during design, because otherwise they cannot be used as indicators for the already selected quality metrics. We thus call them quality indicators.

6. Validate these metrics based on past projects.

To overcome the stated problems related to practical applicability and necessary tool support, we introduce SmallMetric, a tool to evaluate and improve object-oriented programs written in Smalltalk. It is embedded in an environment for learning object-oriented programming [15].


5. A description of the object-oriented metrics framework

SmallMetric analyses object-oriented programs by applying construction rules that distinguish between (Fig. 2):

• the static and dynamic structure of a class or an object;
• the static and dynamic relationships between classes and/or objects.

[Fig. 2. Taxonomy for SmallMetric: inter-class/object aspects (used inherited variables, used inherited methods, external use of methods, coupling via the protocol 'private', abstract classes) and intra-class/object aspects (naming, number of variables, number of methods, use of protocols, number of parameters per method, number of message passings per method, cohesion via predefined protocols).]

The metrics presented in the following represent different aspects of object-oriented software. We describe the underlying intuition of the metrics as well as a comprehensive summary of our observations from object-oriented development projects.

5.1. Metric 1: Volume

The volume of an object is a basic size measure that is intuitively connected with the amount of information inside a class. Many empirical investigations of metrics showed relationships among size metrics and comprehensibility or number of errors [7,10,12]. Volume thus is a potential indicator of the effort to develop an object, as well as of its future maintenance effort. The larger the number of variables and methods, the more specific the object is to one application; in other words, the object's reusability is likely to decrease with increasing volume. Volume can be measured by:

• number of variables (class and instance variables);
• number of methods (class and instance methods).

Both metrics measure mere volume and do not account for the impact of inheritance and reuse, which will only be seen within a whole suite of metrics.
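Both counts can be read directly off Smalltalk's metaobjects; a minimal sketch using standard reflection (this is not SmallMetric's actual implementation):

| cls |
cls := OrderedCollection.
Transcript
    show: 'instance methods: ', cls selectors size printString; cr;
    show: 'class methods: ', cls class selectors size printString; cr;
    show: 'instance variables: ', cls instVarNames size printString; cr;
    show: 'class variables: ', cls classVarNames size printString; cr.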

5.2. Metric 2: Method structure

The internal structure of an object, based on its methods and the information that is accessed by them, is an indicator of its functionality. If the methods are overloaded with information to pass back and forth, there is good reason to assume that the object or class should be broken into several objects or classes. Method metrics are used to forecast the effort for debugging and testing early. Method structure can be measured by:

• number of parameters per method;
• number of temporary variables per method;
• number of message passings per method.
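As an illustration, a hypothetical method annotated with its three counts:

addItem: anItem at: anIndex
    "2 parameters (anItem, anIndex); 1 temporary variable (slot);
     3 message passings (slotFor:, store:, changed)"
    | slot |
    slot := self slotFor: anIndex.
    slot store: anItem.
    ^self changed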

5.3. Metric 3: Cohesion

The term cohesion is frequently used in software engineering to designate a mechanism for keeping related things together. Cohesion can be defined as the degree of similarity of methods: the higher the degree of similarity of methods in one class or object, the greater the cohesiveness of the methods. Cohesion in Smalltalk means the organisation of methods which set or access the value of a class or instance variable under predefined schemes (protocols). These protocols are predetermined in Smalltalk. The programmer can use them to manipulate variables of an object. Such methods are called accessors [16]. The underlying intuition is that direct reference to class and instance variables limits inheritance by fixing storage decisions in the superclass that cannot be changed in a subclass. Moreover, modifications in the structure of these variables are then visible only to the accessors, not to other methods. Hence, the effort to extend or to modify a given program is minimised.

As an example, consider an instance variable instVar of an object anObject. To access class and instance variables it is necessary to define two kinds of methods:

• one method for getting the value of the instance variable:

instVar
    ^instVar


• and another for setting the instance variable:

instVar: aValue
    instVar := aValue

This solution forces all accesses to variables to go through an accessor method. Therefore, information hiding with respect to variables and methods in a class is enforced [17]. SmallMetric examines a Smalltalk program to find accesses to variables outside of the predefined protocols. Such an access is called a cohesion violation of an object.
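For contrast, a method like the following hypothetical one would be reported as a cohesion violation, because it reads the variable directly instead of going through the accessor:

printOn: aStream
    "direct reference to instVar outside the accessor protocol"
    aStream nextPutAll: instVar printString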

5.4. Metric 4: Coupling

Coupling designates the interaction between objects that are not related through inheritance. Excessive coupling between objects beyond inheritance is detrimental to modular design and prevents reuse: the more independent an object, the easier it is to reuse it in another project [1,11]. The suggested metric is:

• number of invoked classes.

A predefined scheme in Smalltalk is the protocol private. It comprises methods that should only be activated inside an object. The Smalltalk compiler or interpreter does not check these specific accesses. When a message from another object starts a method under this protocol, undesirable effects can occur, because such access had not been anticipated during development. SmallMetric tries to identify such references.
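Nothing in the language prevents an external send such as the following hypothetical one, which SmallMetric therefore reports:

"recomputeCache is filed under the protocol 'private' of the
 receiver's class, yet the compiler accepts this external send"
anObject recomputeCache.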

5.5. Metric 5: Inheritance tree

This group of metrics analyses the number of inherited variables and methods used by a class. The use of inherited methods and data in a class indicates the difficulty of changing superior classes. For a class low in the inheritance tree, the variables and methods available to it could have been changed in meaning several times on higher levels, thus increasing complexity even more. It is hence necessary to provide information about how many methods and variables are available to a distinct class. The metrics are:

• inherited variables used;
• inherited methods used.

In Smalltalk, an instance variable can be directly set by an object of a subclass. This can reduce the reusability of a class in other applications. SmallMetric calls this an 'information hiding violation' (Fig. 3).

5.6. Metric 6: Class organisation

This group of analyses captures three comprehensibility indicators: naming, checking of comments, and the use of predefined protocols. Naming analyses all identifiers of a class; SmallMetric informs the developer about their distribution. This metric serves documentation purposes only. The existence of comments within an object is also checked. In Smalltalk, one can define a global comment to clarify the intents and functionality of an object. SmallMetric warns when no such comment is provided. It is clearly impossible to check comments based on contents, so we focused only on their existence. The programmer may organise the methods of an object under predefined protocols. The Smalltalk environment advises the developer to use these recommendations, which is checked by SmallMetric. For novices, these protocols can help to elucidate some aspects of a Smalltalk program.

6. Experiences with SmallMetric for Smalltalk program analysis

Upon starting SmallMetric a window is opened, which inquires the name of the class to be analysed. Wildcards (*) can be used. When the given class is found, a new window is created (Fig. 4). It presents the following information:

1. number and list of all variables;
2. number of methods;
3. buttons to switch between class and instance;
4. predefined protocols used;
5. naming;
6. violations of SmallMetric metrics.

Four buttons are provided to select a new class, to print the information of a class, to switch between different dialogue languages (currently English and German) and to activate help. The critique window can of course be adjusted to the specific standards and process guidelines of an organisation. It has a menu which presents the design limits for development support.

SmallMetric comprises a basic set of guidelines for metric-based development support of Smalltalk applications. On the basis of the metrics above, as applied to Smalltalk projects with available quality data, we extracted some design guidelines to enhance the quality of object-oriented programs written in Smalltalk. Because one of the main reasons for using object-oriented technology is reusability, we focused our evaluations on maintainability and reusability. Such guidelines should be understood as recommendations and not as restrictions of a programmer's creativity. The projects analysed ranged in size from a few classes to the 400 classes of a commercially available Smalltalk-based tool, thus covering efforts of up to 30 person years. Our approach for extracting guidelines from metrics that can serve as indicators of poor quality is based on analysing the classes with respect to complexity metrics and quality data. Since the metrics are applied on different scales (complexity metrics: at least interval scale; quality metrics: ordinal scale), we used non-parametric statistical methods for correlations and factor analysis.


Object subclass: #Superclass
    instanceVariableNames: 'text'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SmallMetric'

Superclass subclass: #Subclass
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SmallMetric'

Subclass class methodsFor: 'instance creation'
new
    ^super new initialize

Subclass methodsFor: 'initialize release'
initialize
    text := 'This is an example of an information hiding violation !!'

Fig. 3. An information hiding violation.

Unlike other approaches, we do not discard outliers, because it is - ex ante - unknown which classes or metrics are outliers. Instead, all metrics are normalised with a quadratic approach before comparing or ranking them.

To show the practical application of SmallMetric we took a Smalltalk project related to a development environment with simulation on an Occam real-time engine. The whole project was performed by a five-person core development team in a timeframe of several years. 71 classes were provided, and we investigated these classes with SmallMetric.

[Fig. 4. The user interface of SmallMetric: per-class counts of messages (to SELF, to a superclass, to MODEL, VIEW, CONTROLLER, and to other classes), configurable maxima for messages per method, methods and variables, and warnings such as 'There are 6 variables in this class (more than 5)!'.]


[Fig. 5. Scatterplot of number of methods (abscissa) and cohesion (ordinate) with respect to maintainability (shape of dots on a 5-level scale) for all samples.]

Before the object-oriented metric collection, however, we interviewed the development team to find out the ranking of the classes in terms of maintainability. This subjective, yet experience-driven, ranking was then exploited for analysing the validity of the structural metrics as indicators of maintainability. Maintainability requires a longitudinal study, which we achieved by interviewing the designers about the first design - the one we investigated - after they had been maintaining it for more than a year. The approach can be generalised because it follows exactly what has been proposed by measurement theory (see Section 4). Fig. 5 shows all 71 samples that were investigated in the course of this analysis: the scatterplot relates volume in terms of number of methods (abscissa) and cohesion (ordinate), with the perceived maintainability represented by the shape of the dots.

as the shape of the dots. Fig. 6 shows for one metric, number of methods, the

sualr & __ . . . . . __ __ .._ . . . . . . . . . . .

m

variance related to the five levels of maintainability assigned to each sample. The box is drawn with the mean

in its centre and one standard deviation size to both direc- tions. Outliers are indicated by the whiskers. In the case of

few samples the standard deviation makes not so much sense which results with whiskers inside the box.

Not all object-oriented metrics calculated by SmallMetric will be discussed in this article, due to space restrictions. We concentrate on the following metrics, which have been condensed with factor analysis (i.e. factor analysis has been applied to reduce the dimensionality of the original metrics by focusing on orthogonal factors; factor analysis typically replaces the original metrics, which is why in two cases so-called hybrids were added that mainly reflect structure and cohesion):

• vol-meth: number of methods;
• vol-var: number of variables;
• str-meth: method structure (a hybrid of structural metrics regarding the code of a method);

[Fig. 6. Box-whisker plots for number of methods and the related maintainability levels of all samples. Boxes are drawn with the mean in the center and one standard deviation length in both directions, which explains why in cases with few samples the outliers are within the boxes.]


• cohesion: cohesion (a hybrid of primarily the mentioned cohesion metrics, mainly focusing on accesses to variables outside the predefined protocols);
• comments: comments (this metric was only added because we wanted to investigate the importance of comments).

Table 1
Results of the Spearman rank correlation

            vol-var   str-meth   cohesion   comments   quality
vol-meth    0.800     0.607      0.721      -0.131     -0.691
vol-var               0.413      0.754      -0.194     -0.569
str-meth                         0.414       0.077     -0.439
cohesion                                    -0.101     -0.675
comments                                               -0.023

All metrics but the last are at least on an interval scale; comments are on an ordinal scale. The related quality metric, maintainability, is on an ordinal scale as well, ranging from 1 to 5, where 5 denotes the highest maintainability.

6.1. Analysis 1: Relationships between the metrics

A Spearman rank correlation was performed to investigate relationships among the metrics (Table 1). Significance levels for most correlations were far below 0.0005; only comments had a significance level above this limit. The highest random correlation coefficient that we generated in 1000 trials with random metric generation, based on the given set of metric observations and their distribution, was 0.244. This means that correlation coefficients higher than this limit are meaningful, because even many trials with random, yet fitting, data would not generate higher correlations. The p value based on the given r-coefficients higher than 0.50 with a significance level of α = 0.05 is in the interval [0.43, 0.73]. An orthogonal factor rotation of all metrics clearly separates three groups, namely the object-oriented metrics, comments, and quality. For better insight, the single linkage values of building clusters of 'similar' metrics are given: single linkage combines vol-meth and vol-var on a level of 0.35, both with cohesion on a level of 0.42, then quality on a level of 0.50, str-meth on 0.59 and finally comments on 0.92.
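For reference, in the absence of ties the Spearman coefficient reduces to the well-known rank-difference form (with ties, the Pearson correlation of the ranks is used instead):

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$

where $d_i$ is the difference of the two ranks assigned to sample $i$ and $n = 71$ is the number of classes.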

6.2. Analysis 2: Regression model of maintainability

The second step of the analyses concerned the predictability of the maintainability factor. For the complete regression analysis the dependent variable is quality, while the independent variables are vol-meth, vol-var, str-meth, cohesion and comments; there are thus 5 independent variables and one dependent variable in the regression model. The probability > F, commonly known as the p-value, indicates the significance of a regression: the smaller the probability, the more significant the regression. Here the probability > F equals 2E-07; this very small value indicates, with a high degree of confidence, that some prediction is possible.

R-square is the quality indicator of a regression test that measures the quality of the predictions: it shows how much variance in the dependent variable is accounted for in the sample of 71 observations. Adjusted R-square measures the same aspect, but in the population, with the adjustment depending on both the sample size and the number of independent variables. R-square is 0.450 and adjusted R-square is 0.408; the residual is 88.2. In our sample, over 45% of the total variance in maintainability is accounted for by the metrics, and over 40% in the population.

It is interesting to investigate the same regression without comments as an independent variable. R-square and residual values remain unchanged, while the probability > F becomes slightly smaller. Obviously comments do not account for much in the prediction model, which was already clear from the cluster analysis. The standard error for all predictions is in both cases below 1.165, which is a good result given the scale of [1,5] for the quality variable.
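The reported adjusted value is consistent with the usual small-sample correction for $n = 71$ observations and $p = 5$ predictors:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} = 1 - 0.550 \cdot \frac{70}{65} \approx 0.408.$$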

We conclude that the prediction of maintainability from the given metrics is possible.

6.3. Analysis 3: Discriminant analysis

The third step of the statistical analyses that give insight into the validity of the given metrics as quality indicators is typically a real prediction based on some kind of discriminant analysis. Although more advanced analyses are feasible, we stick to one, because the effects are similar [2]. The chosen discriminant analysis tries to fit hyperplanes in a space built up by the independent variables. The planes separate the 5 classes of the dependent variable, i.e. maintainability (Table 2).

Table 2
Results of discriminant analysis

Real maintainability    N    Calculated:  1    2    3    4    5
1                      12                 7    1    1    0    3
2                       5                 0    1    2    1    1
3                      11                 0    0    7    1    3
4                      12                 0    1    2    9    0
5                      31                 1    2    0   15   13

In total, 52.11% of all cases were classified correctly. Fewer than 10% of all samples were predicted as belonging to the opposite side of the dependent variable (i.e. difference between real value and predicted value > 2).
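Both figures can be verified from Table 2: the correctly classified cases lie on the diagonal, and the cases off by more than two levels lie in the far off-diagonal corners:

$$\frac{7 + 1 + 7 + 9 + 13}{71} = \frac{37}{71} \approx 52.11\%, \qquad \frac{(0+3) + 1 + 0 + (1+2)}{71} = \frac{7}{71} \approx 9.9\% < 10\%.$$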

7. Discussion and further results

Based on the three analyses we conclude that there is a strong statistical relationship between the object-oriented metrics used within SmallMetric for decision support during development and maintainability. Maintainability can be predicted from combinations of metrics collected from Smalltalk programs.


Based on the described set of metrics we investigated different projects both from industry and academia to provide practical guidelines. Our experiences with quantitative support during the development of object-oriented software projects can be summarised as follows:

• Provide continuous integration of new components instead of one big integration effort near the delivery date.
• Attack risks actively during the whole development process, based on early assessments and especially on quantitative feedback. Both risk and progress are measured in the product and less in supporting documentation.
• Quantitative feedback about the product to the engineers and the project manager helps in much better scheduling of the next steps (e.g. testing, integration, supportive tool delivery, configuration management).
• Metrics used as quality indicators during design should cover at least the diversity of object-oriented software in terms of classes (i.e. volume, complexity, structure), messages (i.e. interface structure, communication between entities), and processes (i.e. dynamic structure, run-time dependencies, method usage).
• Rapid changes are a source of risk because they are a sign of unstable architectures. Quantitative feedback on changes in each class helps design co-ordinators during design, and even more importantly test co-ordinators during test, to immediately detect critical areas that need thorough investigation. For instance, changes in several classes with the same frequency clearly indicate inherent ripple effects that are better removed early than during maintenance.
• Industrial software projects, such as switching systems, are typically structured functionally. Since object-oriented software emphasises data structuring, many data structures that were originally part of a module become classes of their own. It is therefore impossible to reuse quantitative design heuristics of former projects when moving to object-oriented technology in parts of legacy software (as is the case for most large industrial projects).

Our investigations of further classes from additional Smalltalk projects provide some generalised suggestions related to the design and implementation of classes; because they draw on more projects, these suggestions go beyond the analysis described above. They seem very clear in theory, yet practical use often shows the opposite:

• Volume:
  - number of object attributes or instance variables per class;
  - number of methods per class: maximum 30.
• Structure:
  - number of message passings per method: maximum 30;
  - cyclomatic complexity of methods: maximum 5;
  - nesting depth within methods: maximum 5.
• Cohesion and coupling:
  - existence of an accessor outside of a predefined protocol;
  - number of external message passings per method: maximum 5;
  - external accesses of methods under the protocol private: maximum 5.
• Inheritance:
  - number of predecessor classes: maximum 5;
  - number of successor classes: minimum 2, maximum 10;
  - number of changes to inherited methods: maximum 5 (in fact zero overriding would be best; however, in some cases it cannot be avoided).
• Class organisation:
  - number of characters of an identifier: minimum 7;
  - comments must be given; suggested comment density: minimum 0.2;
  - editing distance of identifiers: minimum 3.
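Several of these limits can be checked mechanically inside the Smalltalk image; a minimal sketch for the inheritance guidelines using standard reflection (thresholds as listed above; this is not SmallMetric's implementation):

| cls depth successors |
cls := SortedCollection.
depth := cls allSuperclasses size.      "number of predecessor classes"
successors := cls subclasses size.      "number of immediate successor classes"
depth > 5
    ifTrue: [Transcript show: cls name asString, ': inheritance tree too deep'; cr].
(successors < 2 or: [successors > 10])
    ifTrue: [Transcript show: cls name asString, ': subclass count outside 2..10'; cr].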

The measured values were analysed with respect to boundaries (minimum, maximum), intervals, deviations from the average, and nonparametric correlations between them. The interpretation was performed according to these criteria and used as follows:

1. Sustain a high level of comprehensibility by providing sufficiently long descriptive parts in all design objects, and object names with meaning rather than enumerations such as 'class-1'. The descriptions should include subclasses or inheritance relations, changes of inherited methods or variables, functionality, related objects, used data items, date, author, test cases to be performed, requirements fulfilled, and management activities and staff connected with the project.

2. During class and object design, the metrics and their statistical evaluation (with regard to similar projects) are used to distinguish between different designs (e.g. alternative approaches, division into subclasses). Rules that can be tailored to support design include size limits (e.g. lines per method) or functionality limits (e.g. methods per class, variables per class). It should be clear that for distinct applications and libraries these limits must be adjusted, taking into consideration for instance the design experience, the available resources, or the requested quality.

3. During reviews at the end of design and coding, the metrics are used as indicators for weak components (e.g. inadequate inheritance hierarchy, unsatisfying object description) and as indicators for process management (timely ordered number of classes or volume metrics).

After applying such metrics to several projects, the results obtained can be used to define intervals or limits for project- specific metrics in order to achieve more tailored quality indicators.

8. Summary

Most complexity metrics have been designed without regard to the problem domain and the programming environment. There are many aspects of complexity and many design decisions which influence the complexity of a product. This paper presented an approach to integrate software metrics with design support for object-oriented techniques based on Smalltalk. A tool environment for program analysis called SmallMetric, which incorporates metrics and guidelines for improving programs, has been developed. Based on this set of metrics we investigated different projects, both from industry and academia, to provide practical guidelines.

This approach of integrating a measurement tool system into Smalltalk illustrates a way to minimise the effort for implementation and maintenance of such a tool, and shows how to cope with changing future requirements for such tools and their individual interfaces. By transforming the object-oriented information representation into another language it is possible to integrate such measurement techniques into other environments as well.

Collecting metrics in running projects helps to build up a historical database that can be used for better estimates and risk assessments in subsequent projects. Be careful, however, not to measure everything; 6 to 12 metrics should be sufficient, which is substantiated by the factor analysis described above that reduces the factor dimensionality to 5 and less. It is harder to tell what is important than to say what can be measured. Clearly, traditional complexity metrics, such as cyclomatic complexity or function size, do not measure the significant aspects of complexity in object-oriented systems. Sticking to lines of code alone does not help in identifying the maximum size of methods. We rarely see high cyclomatic numbers or numerous lines of code in methods of Smalltalk programs, but this does not mean that Smalltalk programs are better than C programs; it only indicates that Smalltalk utilises a paradigm that distributes functionality into smaller units.

With an early analysis of software products we are able to provide developers with helpful hints to improve their designs and code during the development process, and not at the end, when changes will be much more expensive. By following the given suggestions we could improve designs and achieve better programs in terms of quality attributes such as understandability, reusability and maintainability. Of course, much more research is necessary in order to provide complete guidelines for achieving high-quality designs. We consider this approach also as a vehicle towards measuring productivity and estimating effort early in the analysis and design of object-oriented software. The basic step, however, is still the measurement and evaluation of software complexity as early as possible: during the phases of the software development process in which the most expensive faults are induced (e.g. inheritance trees). By making software engineers aware that there are suitable techniques and tools for analysing their programs, even when they are object-oriented, this could be a small step towards avoiding a software crisis similar to the one we are currently facing in procedural environments.

Acknowledgements

The assistance of the Landis & Gyr corporation, Switzerland, in providing product and process data of object-oriented projects is gratefully acknowledged. Several discussions with A. Riegg of Debis in Stuttgart contributed to the proposed guidelines.

References

[1] N.E. Fenton, S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, 2nd edition, Thomson Computer Press, London, UK, 1996.
[2] C. Ebert, Evaluation and application of complexity-based criticality models, Proc. of the 3rd Int. Software Metrics Symposium (METRICS 96), IEEE Computer Soc. Press, Los Alamitos, CA, USA, 1996, pp. 174-185.
[3] A. Goldberg, D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983.
[4] C. Ebert, Visualization techniques for analysing and evaluating software measures, IEEE Transactions on Software Engineering 18 (11) (1992) 1029-1034.
[5] G. Stark, R.C. Durst, C.W. Vowell, Using metrics in management decision making, IEEE Computer 27 (9) (1994) 42-48.
[6] B.A. Kitchenham, S.G. Linkman, D.T. Law, Critical review of quantitative assessment, Software Engineering Journal 9 (3) (1994) 43-53.
[7] S.R. Chidamber, C.F. Kemerer, Towards a metrics suite for object oriented design, Proc. of Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Sigplan Notices 26 (11) (1991).
[8] W. LaLonde, J. Pugh, Gathering metric information using metalevel facilities, Journal of Object Oriented Programming 6 (1994) 33-37.
[9] B. Henderson-Sellers, J. Edwards, BOOKTWO of Object-Oriented Knowledge: The Working Object, Prentice Hall, Sydney, Australia, 1994.
[10] M. Lorenz, J. Kidd, Object-Oriented Software Metrics, Prentice Hall Object-Oriented Series, Englewood Cliffs, USA, 1994.
[11] K.J. Lieberherr, I.M. Holland, Assuring good style for object-oriented programs, IEEE Software 6 (9) (1989) 38-48.
[12] S. Karunanithi, J.M. Bieman, Candidate reuse metrics for object oriented and Ada software, Proc. Int. Software Metrics Symposium, IEEE Computer Society Press, New York, 1993, pp. 120-128.
[13] N.I. Churcher, M.J. Shepperd, Towards a conceptual framework for object-oriented software metrics, Software Engineering Notes 20 (2) (1995) 69-75.
[14] R. Sharble, S. Cohen, The object-oriented brewery: A comparison of two object-oriented development methods, Software Engineering Notes 18 (2) (1993).
[15] I. Morschel, An intelligent tutoring system for the learning of object-oriented programming, Proc. EAEEIE '93, Prague, 1993.
[16] K. Beck, To accessor or not to accessor?, The Smalltalk Report 2 (8) (1993).
[17] D.L. Parnas, P.C. Clements, D.M. Weiss, The modular structure of complex systems, IEEE Transactions on Software Engineering 11 (3) (1985) 259-266.