
716.064 VU

Software-Maintenance

Lecture Notes

Prof. Dr. Franz Wotawa

Dipl.-Ing. Dipl.-Ing. Roxane Koitz

Dr. Birgit Hofer

WS 2016/17

October 31, 2016

Institut für Softwaretechnologie, Inffeldgasse 16b/2, A-8010 Graz, Austria, Phone +43-316-873-5711, Fax +41-316-873-5706, [email protected], http://www.ist.tugraz.at/


Contents

1 Introduction
  1.1 Motivation
  1.2 Overview

2 The Context of Maintenance
  2.1 Definition
  2.2 Classification of Changes
  2.3 Software Evolution - Lehman’s Laws
  2.4 Costs of Software Maintenance
  2.5 The Maintenance Framework
  2.6 Potential Solutions to Maintenance Problems
  2.7 Decision Theory
    2.7.1 Decisions based on one external factor
    2.7.2 Decision based on several external factors
  2.8 Maintenance Process Models
    2.8.1 Traditional software development models
    2.8.2 Maintenance process models
    2.8.3 Capability maturity model
  2.9 The Maintenance Process
    2.9.1 Program comprehension
    2.9.2 Reverse engineering
    2.9.3 Reuse and reusability

3 Techniques for Program Analysis
  3.1 Flow Propagation Algorithm
  3.2 Program Slicing
    3.2.1 Basics
    3.2.2 Static slicing
    3.2.3 Dynamic Program Slicing
    3.2.4 Forward slicing
    3.2.5 Hitting sets
    3.2.6 Summary
  3.3 Delta Debugging
    3.3.1 The minimizing delta debugging algorithm
    3.3.2 The isolation differences algorithm
  3.4 Object Flow Graph
    3.4.1 Language abstraction
    3.4.2 Graph creation
    3.4.3 Containers
    3.4.4 Object sensitivity
  3.5 Class Diagram Recovery


Bibliography


Chapter 1

Introduction

1.1 Motivation

Why is software maintenance important?
Functionality, flexibility, availability, correctness and performance are some important characteristics of good software. The aim of software maintenance is to guarantee these characteristics over the whole software life cycle. Changes to software are often necessary in order to eliminate errors, extend the functionality of the software or adapt the software to a changed environment.

The following example shows that software maintenance can lead to errors in programs. However, a good software maintenance department must be able to identify and correct errors as soon as possible in order to avoid high costs and a possible image loss for the company.

Man Charged $23 Quadrillion For Cigarettes

Thursday July 16, 2009

A man was charged more than 23 quadrillion dollars on his credit card for a packet of cigarettes.

He had been charged a total of $23,148,855,308,184,500 (twenty-three quadrillion, one hundred forty-eight trillion, eight hundred fifty-five billion, three hundred eight million, one hundred eighty-four thousand, five hundred dollars).

”I thought somebody bought Europe with my credit card,” he told the WMUR-TV station in his home town of Manchester, New Hampshire.

. . .

Visa has since released a statement saying: ”A temporary programming error at Visa Debit Processing Services caused some transactions to be inaccurately posted to a small number of Visa prepaid accounts. The technical glitch, which impacted fewer than 13,000 Visa prepaid transactions, has been corrected and erroneous postings have been removed. . . . Visa regrets any inconvenience to our customers and has taken immediate steps to ensure this error doesn’t occur again.” (Source: http://news.sky.com/, July 30th, 2009)

What are the main problems/challenges in software maintenance?
One of the biggest challenges in software maintenance is the change/correction of one part of the software without changing other parts of the software.

In many companies, different departments are responsible for the development and maintenance of software. A considerable part of the knowledge about the software is not transmitted from the development department to the maintenance department. In many cases, the responsible programmer has already left the company, together with his or her information about the program structure, e.g. in the case of the Y2K problem:



The Y2K Bug. In the early 1970s, programmers saved memory by shortening dates to 2 digits. At that time, programmers thought that their programs would be replaced or updated by the year 2000. At the end of the 1990s, many of the programs were still in use, but the programmers were not available any more because they had retired or left the company. It is estimated that several hundred billion dollars were spent worldwide to replace or update programs with the Y2K bug. [23]

Software maintenance has to deal with the problems made in the pre-release phase and has to straighten out the errors made. The following example shows the influence of decisions made during the pre-release phase on maintenance:

Intel Pentium Floating-Point Division Bug. In 1994, the software test engineers at Intel discovered in pre-release tests a floating-point division bug burned into computer chips. Intel’s management decided not to fix the bug because it emerges only in extremely math-intensive calculations. In October of 1994, the bug was found by a user. Intel tried to play down the bug. There was a public outcry. The Internet community and the media denounced Intel as uncaring and incredulous. Finally, Intel replaced the faulty chips. The total costs to correct this fault amounted to more than $400 million (http://en.wikipedia.org/wiki/Pentium_FDIV_bug, August 4th, 2009) [23].

Can you avoid software maintenance?
Software maintenance consists of two major tasks: (1) the correction of errors and (2) changes in functionality. Let’s answer the question of whether we can avoid errors and changes.

(1) Can you avoid errors so that there is no need to correct them?

”’Error-free’ software is non-existent.” [11]

Some reasons for faulty software are the following:

• Insufficient knowledge of the user domain.

• To err is human. Humans make errors again and again.

• Changes to software are nearly unavoidable. Often the documentation is insufficient and thus new errors are introduced when changing a software system.

• Complexity of programs.

There exist errors which are difficult to detect even with the most powerful testing techniques. In practice, these errors are often detected by users.

(2) Can you avoid changes in functionality?
When you design software, it is not possible to consider all possibilities that could happen in the future. We are living in a changing world (e.g. bigger airports, change to e-payment systems). Consequently, software systems must be adapted in order to cope with the changed environment.

If you cannot avoid maintenance, can you improve software maintenance?
You can consider maintenance in the development phase by increasing the development effort in order to avoid errors and by writing structured code. In addition, you can decrease the maintenance effort by using the techniques presented in this lecture for error location and reverse engineering.



1.2 Overview

These lecture notes consist of two major parts: (1) the general view on the field of software maintenance and (2) concrete reverse-engineering and debugging techniques.

Chapter 2 is based on the book Software Maintenance: Concepts and Practice written by Grubb and Takang [11]. This chapter presents the definition of the term maintenance, the classification of changes made during maintenance, the costs of maintenance, and the environment of maintenance. In addition, the maintenance of an existing system is compared to the development of a new system and the risks of both options are discussed. A digression deals with decision theory and explains how decision theory can be used in software maintenance. Finally, different process models are explained and the software maintenance process is discussed.

Chapter 3 addresses concrete techniques which (1) help to comprehend a program (e.g. by automatically extracting different types of diagrams from the source code) and (2) support the maintenance personnel in the task of finding the location of errors. The content of this chapter is based on

• the book Reverse engineering of object oriented code written by Tonella and Potrich [32],

• the article A Survey of Program Slicing Techniques written by Frank Tip [31],

• different papers dealing with delta debugging [36–39] and

• papers concerning model-based debugging [21,27,34].


Chapter 2

The Context of Maintenance

This chapter deals with the general view on software maintenance. Section 2.1 gives a definition of the term software maintenance. Section 2.2 explains the different types of changes. Section 2.3 lists Lehman’s Laws. Lehman identified over the last 40 years regularities which hold in the field of software maintenance. Section 2.4 compares the costs of software maintenance with the total costs of software engineering and explains why the maintenance costs are high. Section 2.5 addresses the environment of software maintenance. It identifies the factors which influence software maintenance. The maintenance of an existing system is compared to the development of a new system in Section 2.6. In addition, the risks of both options are discussed. Section 2.7 is a digression into decision theory and explains how decision theory can be used in the field of software maintenance. Section 2.8 describes the difference between the software development process and the maintenance process, and explains several process models for both. Finally, Section 2.9 describes the terms forward engineering, reverse engineering, reengineering, restructuring and abstraction. Additionally, this section explains in detail program comprehension, reverse engineering, reuse and reusability.

2.1 Definition

Simplified, software maintenance can be understood as ”any work that is undertaken after delivery of a software system” [3]. The Institute of Electrical and Electronics Engineers (IEEE) defines software maintenance as follows:

”Software maintenance is the process of modifying a software system or component after delivery to correct faults, improve performance or other attributes, or adapt to a changed environment.” [25]

Some authors (e.g. [24,28]) disagree with this definition, because they see software maintenance as an accompanying process during the whole software life-cycle instead of an after-release process. They argue that the development process must consider future maintenance tasks. Therefore, software developers must produce intuitive code, documentation which is easy to understand, and automated tests which can be used in order to ensure the basic functionality after changes to the software. The development department is responsible for providing a complete software product, which comprises the program (source code and object code), the documentation (analysis documents, e.g. formal specifications, and design documents, e.g. UML diagrams), the testing data (test cases, test results) and the operating procedures (installation handbook, user handbook). In practice, certain parts are often missing. As a consequence, the maintenance department has to deal with incomplete information.


2.2 Classification of Changes

Software maintenance can be categorized into four sub-tasks [19,26]:

• Corrective maintenance
This task comprises the correction of faults, which can be classified as follows [20]:

◦ Design errors (e.g. wrong requirements)

◦ Logic errors

◦ Coding errors

• Adaptive maintenance
This task comprises the adaptation of the software to a changed environment, e.g. adaptations due to changes in law or a new platform (hardware, operating system).

• Perfective maintenance
Perfective maintenance covers changes made because of user requests. This task comprises removing, extending, and modifying functionality as well as improving performance.

• Preventive maintenance
This task comprises all improvements of the software that ease further maintenance.

Figure 2.1 gives an overview of the categorization. In practice, companies often see software maintenance only as corrective maintenance.

Figure 2.1: Categorization of maintenance

Figure 2.2 shows the distribution of the effort in software maintenance according to the categorization of Lientz et al. [19]. Most of the effort in software maintenance is spent on perfective maintenance (55 %), while 25 % is spent on adaptive maintenance and 20 % on corrective maintenance.

Figure 2.2: Distribution of the effort in software maintenance [19]


2.3 Software Evolution - Lehman’s Laws

Lehman formulated the laws of software evolution over the last 40 years. He subsequently amended these laws and summarized them in [17]:

I Continuing change

Systems must be continually adapted otherwise they become progressively less satisfactory.

II Increasing complexity

The complexity of evolving systems increases unless work is done to maintain or reduce it.

III Self regulation

As the software is implemented within the context of a wider organization, the objective of getting the system finished is constrained by the wider objectives and constraints of organizational goals [11].

IV Conservation of organizational stability

The average effective global activity rate in an evolving system is invariant over the product lifetime.

V Conservation of familiarity

The average incremental growth of systems tends to remain constant or to decrease [11]. The maintenance personnel must maintain control of its content and behavior to achieve satisfactory evolution. Excessive growth diminishes that control.

VI Continuing growth

The functional content of a system must be continually increased in order to maintain user satisfaction over its life time.

VII Declining quality

The quality of a system will appear to be declining unless the system is rigorously maintained and adapted to operational environment changes.

VIII Feedback system

Evolution processes constitute multi-level, multi-loop, multi-agent feedback systems and must be treated as such to achieve significant improvement over any reasonable base.


2.4 Costs of Software Maintenance

Figure 2.3 illustrates the costs of software maintenance. Figure 2.3(a) shows the results of different investigations on the costs of maintenance in comparison to the costs of a new development over 10 years (see [2, 9, 16, 19]). Bernstein [5] estimates even higher costs for maintenance:

”In the ’90’s, yearly corporate expenditures for software will reach $100 billion dollars. At least $70 billion are spent yearly on maintaining systems, while only $30 billion on new development.”

Figure 2.3: Costs of software maintenance: (a) costs of maintenance and new development, (b) costs of the correction of a bug [11]

Reasons for the high costs of maintenance

The high costs for the correction of a bug in the maintenance phase (see Figure 2.3(b)) are the main reason for the high costs of maintenance. Further (more precise) reasons are:

• Insufficient knowledge of the user domain

• Insufficient or obsolete documentation

• Complexity of programs

• Programmer ≠ maintenance personnel

• Employee responsible for the code left the company and with him/her the corresponding knowledge

• Usage of undocumented assumptions introduces (further) errors

• Knowledge about management, business rules and processes, as well as operational and functional information, is often only implicitly available (in the program code)

Jones [13] compares software maintenance with the extension of an existing building:

”The architect and the builders must take care not to weaken the existing structure when additions are made. Although the costs of the new room usually will be lower than the costs of constructing an entirely new building, the costs per square foot may be much higher because of the need to remove existing walls, reroute plumbing and electrical circuits and take special care to avoid disrupting the current site.”


2.5 The Maintenance Framework

Figure 2.4 illustrates the software maintenance framework with the organizational environment, the user environment, and the operational environment as external influence factors as well as the software, the maintenance process itself, and the maintenance personnel as internal influence factors.

Figure 2.4: The main influence factors to software maintenance [11]

• External influence factors

◦ User environment
The user requirements involve requests for additional functionality and error correction as well as requests for non-programming-related support.

◦ Operational environment
The changes here can be the exchange of the hardware platform or of the software system, e.g. a new operating system or a different database management system.

◦ Organizational environment
The changes in the organizational environment comprise changes in policies, e.g. changes in business rules or taxation policies, and changes which enable the company to stay competitive.

• Internal influence factors

◦ Maintenance process
The maintenance process itself is the most important part in the software maintenance framework. It covers the following tasks:

∗ Capturing change requirements - process of finding out what changes are required.

∗ Variation in programming practice - the differences in the approaches used for writing and maintaining programs. Basic guidelines (e.g. meaningful identifiers, no ’GOTO’s) avoid variations in the code.

∗ Paradigm shift - alteration of the way software is developed and maintained. There are still many systems in use developed in low-level programming languages.


∗ Error detection and correction.

◦ Software to maintain
The main aspects of the software itself are the difficulty of the application domain, the quality of the documentation (obsolete or no documentation at all) and the structure of the program.

◦ Maintenance personnel
The high staff turnover is a big problem in the field of software maintenance. Most software systems are maintained by people who are not the original authors of the program. The maintenance personnel often lacks expertise in the domain. Programmers can introduce errors in other parts of the software by changing one part of the software (ripple effect).

Good employees for software maintenance have the following abilities:

∗ Knowledge of the programming language

∗ Abstraction, compression of information, and analytical abilities

∗ Impact analysis abilities

Often the maintenance personnel is inexperienced. Beath and Swanson [30] reported that 25 % of the maintenance personnel are students and 60 % are newly hired people.

Changes to software often cannot be realized because of restrictions from the environment, for example [11]:

• Limitations of resources

◦ Lack of skilled and trained maintenance personnel

◦ Lack of suitable tools and environment

◦ Lack of sufficient budget allocation

• Quality of the existing system

◦ Ripple effects

◦ Errors sometimes cannot be addressed because of the poor quality of the software

• Organizational strategy

◦ Determination of the maintenance budget

2.6 Potential Solutions to Maintenance Problems

There exist the following solutions to the problem that the maintenance of existing systems is as expensive as the development of a new system [11]:

• Invest more budget in the development of new systems in order to decrease the maintenance costs and to increase the quality of the product. The use of more advanced requirement specification approaches, design techniques and tools, as well as quality assurance procedures and standards, should help make the new system more maintainable.

• Replace the existing system with a new system. Problems result from the fact that errors can also occur in the new system. You have to take the maintenance costs of the new system into consideration. In addition, you have to consider that maintenance causes small costs over the total period of time, whereas a new development causes large costs in a short period of time. Furthermore, there is much information only implicitly available in the source code. Often the requirements are only indirectly available in the old system. It makes sense to redevelop a system if the program is unmaintainable, if the costs for maintenance are higher than the costs of a new system, or if there are profound changes in the process structure or general requirements.


• Improve the existing system by preventive maintenance.

The next section describes how to rationally decide for or against redevelopment.

2.7 Decision Theory

It is necessary to make different decisions in the context of software maintenance. In principle, there exist many different modes to make decisions and to enforce them. Beside random decisions and decisions based on emotions, decisions can be rational as well. Rational decisions are based on the available knowledge; they can be comprehended and justified.

Making decisions means choosing among alternatives, which have an impact on the environment. In the field of software maintenance, amongst others the following questions appear: ”Do we have to fix an error?” or ”Should we continue maintaining the existing system or should we start the development from scratch?”

A rational answer to these questions requires that several options are available. A good decision is one which selects an alternative which furnishes an as good as possible result with regard to a given criterion. It is essential to have a good value system which allows one to differentiate between good and bad results. Decision theory deals with the question of how to make a rational and good selection of one alternative.

Decision theory assumes that there exist evaluation functions which help to evaluate the effects of decisions. The selection of appropriate evaluation functions is not part of decision theory. More information on decision theory is given, for example, by Sven Ove Hansson in ”Decision Theory: A Brief Introduction” (http://www.infra.kth.se/~soh/decisiontheory.pdf). A good technical introduction with relation to probability theory and stochastics is given by Herman Chernoff and Lincoln E. Moses in Elementary Decision Theory, Reprint Dover Edition, 1986, originally published by John Wiley & Sons, Inc., 1959.

2.7.1 Decisions based on one external factor

In this section, we discuss how decisions can be made if only one aspect is considered. Have a look at the following example:

You have to decide if you continue maintaining an existing software system. Alternatively, your company can redevelop the system, which will lead to an improved system concerning functionality, stability and performance. You assume € 250,000 to € 400,000 maintenance costs per year because of necessary perfective and adaptive changes. The department responsible for the new development of the system estimates that the costs for the new development are about € 1.5 million and that one year of development will be required. A budget overrun of 10 % is possible. The maintenance costs for the new software system amount to € 50,000 to € 80,000 per year. Do you order the new development of the system? Assume that the software is needed for the next 5 years. The corporate management is interested in knowing when the new development amortizes.

In order to decide if a new development should be made or not, we have to evaluate the alternatives. This can be done either by

• stating the relation between the alternatives, or

• assigning a numerical value to each alternative.

In the first case, the answer can be ”The new development is better than the maintenance of the existing system”. This answer establishes the relation between the new development (N) and the maintenance (M): N > M. In order to establish such relations, we have to evaluate the impacts of our decisions. In this example, the impacts of the decision influence the expected costs. Thus,



we focus only on the costs. We regard the costs as stochastic parameters (X) and describe the total costs as equations. The costs of the new development N are

    N = N_0 + \sum_{i=1}^{j} M_i^{(M)} + \sum_{i=j+1}^{n} M_i^{(N)},

where N_0 are the costs for the new development, n is the life time of the software, j is the time required to build the new system, M_i^{(M)} are the costs for the maintenance of the old system and M_i^{(N)} are the costs for the maintenance of the new system. The maintenance costs of the old system have to be considered because the old system has to be maintained until the new system can be used. The costs of the old system M are

    M = \sum_{i=1}^{n} M_i^{(M)}.

We compare the expectations of the costs. Therefore, we need a distribution function of the stochastic variables (X). We assume a Gaussian distribution (X ∼ N(µ, σ²)). Other distributions can be used as well. The mean value µ and the standard deviation σ are computed from the maximal and minimal costs as follows:

    \mu = \frac{\max + \min}{2}, \qquad \sigma = \frac{\max - \min}{2}.

The advantage of the Gaussian distribution is that the sum and difference of two Gaussian distributed variables are Gaussian distributed again:

    X \sim N(\mu_x, \sigma_x^2),\; Y \sim N(\mu_y, \sigma_y^2) \;\Rightarrow\; X \pm Y \sim N(\mu_x \pm \mu_y,\, \sigma_x^2 + \sigma_y^2).

The probabilities of an arbitrary Gaussian value can be calculated with the Gaussian distribution function:

    P(X < x) = \phi\left(\frac{x - \mu}{\sigma}\right).

The expectations for our example are:

    E(N) = \frac{1,500,000 + 1,650,000}{2} + 1 \cdot \frac{250,000 + 400,000}{2} + (n - 1) \cdot \frac{50,000 + 80,000}{2}

    E(M) = n \cdot \frac{250,000 + 400,000}{2}.

The expected costs over 5 years are € 2,160,000 for the new development and € 1,625,000 for the maintenance of the existing system. Thus we decide to continue using the old system.

In order to answer the second question (the time of amortization), we equate E(N) with E(M) and solve the equation for n:

n ≈ 7.31

Thus, the software needs to be used for at least eight years in order to justify a new development.
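
The comparison can also be reproduced programmatically. The following Python sketch is purely illustrative (the helper functions, variable names and the five-year horizon are assumptions for this example, not part of the lecture notes); it computes the expected costs, the probability that the new development is cheaper over five years, and the amortization point:

from math import erf, sqrt

def mean_std(lo, hi):
    # mu = (max + min) / 2 and sigma = (max - min) / 2, as defined above
    return (hi + lo) / 2.0, (hi - lo) / 2.0

def phi(z):
    # distribution function of the standard normal distribution
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n = 5  # assumed life time of the software in years

dev_mu, dev_sd = mean_std(1_500_000, 1_650_000)  # new development incl. 10 % overrun
old_mu, old_sd = mean_std(250_000, 400_000)      # yearly maintenance of the old system
new_mu, new_sd = mean_std(50_000, 80_000)        # yearly maintenance of the new system

# E(N): development, one year of old-system maintenance, (n - 1) years of new-system maintenance
e_n = dev_mu + old_mu + (n - 1) * new_mu
# E(M): n years of old-system maintenance
e_m = n * old_mu
print(e_n, e_m)  # 2160000.0 1625000.0

# Variances of independent Gaussian variables add up, so the difference D = N - M
# is Gaussian again; P(N < M) = phi((0 - mu_D) / sigma_D).
var_n = dev_sd ** 2 + old_sd ** 2 + (n - 1) * new_sd ** 2
var_m = n * old_sd ** 2
print(phi(-(e_n - e_m) / sqrt(var_n + var_m)))  # roughly 0.004 over five years

# Amortization: smallest life time for which E(N) <= E(M)
n_amortize = next(k for k in range(1, 100)
                  if dev_mu + old_mu + (k - 1) * new_mu <= k * old_mu)
print(n_amortize)  # 8 -> the software must be used for at least eight years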

Up to now, we have only considered problems limited to two alternatives (new development or maintenance) and one external factor (the costs). In practice, there are often many different factors which have to be considered in the decision. This problem is addressed in the next section.


2.7.2 Decision based on several external factors

We are looking for an alternative which furnishes the best results according to the evaluation function in the decision making process. The decision making problem is defined by the following three parameters:

• the set of alternatives or options,

• the set of external factors which influence the decision and

• the results of the alternatives for the given factors.

This type of decision making deals with an arbitrary number of alternatives and factors. The result of a decision is the combined effect of all external factors.

The generally used illustration of a decision making problem is the decision matrix. All possible alternatives are opposed to the external factors. Each alternative is assigned to a row. The external factors are given in the columns. Each cell contains the potential impacts of the decision on the particular factor. The external factors model the state of the world according to the chosen abstraction.

The example of the previous section is extended by the external factors ‘Costs for the new development’, ‘Costs for maintenance’ and ‘Revenues’. Table 2.1 illustrates the decision matrix for this example.

                  Costs new development   Maintenance costs   Expected revenues
New development
Maintenance

Table 2.1: Decision matrix

We make the following considerations for the values of the cells: The costs for the new development are high and have a main influence on the budget. On the other hand, the maintenance costs of the newly developed system are low. There are no development costs for the old system, but high costs for the maintenance. The results for the expected revenues have to be investigated by the sales department. We assume a higher revenue for the new system. Table 2.2 shows the filled decision table.

                  Costs new development   Maintenance costs   Expected revenues
New development   high                    low                 higher
Maintenance       -                       high                unchanged

Table 2.2: Filled decision matrix

The considerations depend on the environment and are made at the beginning of the decision making process. It is important to analyze all possible impacts of a decision according to the external factors. The values of Table 2.2 are subjective values; costs could be used alternatively.

It is not possible to get the best alternative directly from Table 2.2. The evaluation scheme is missing. We have to assign an evaluation to every cell by allocating so-called utility values. The utility value indicates the importance of a result for the decision. The utility values are relative values. We assign utility values between 0 and 100 inclusively, where 0 is bad and 100 is excellent. Table 2.3 shows the utility values that we have chosen for this example.

                  Costs new development   Maintenance costs   Expected revenues
New development   0                       80                  80
Maintenance       100                     0                   50

Table 2.3: Utility values for the decision matrix


One way to obtain the expected results for the given alternatives is to sum up all utility values of one row and choose the alternative with the highest value. This is not the best solution, because the utility values should be set in relation to each other according to their importance. Instead, each utility value is weighted with the probability of the corresponding external factor. The probabilities can be obtained by observations. In software maintenance, information is often collected over current projects, which can be projected to probability values. The probability values of the external factors (i.e., of the columns) must sum to 1, because the listed external factors should mirror the complete state of the world. The probabilities quantify the extent of influence of each external factor.

Table 2.4 shows the assigned probabilities for our example and the weighted sum per row. The new development would be preferred because it shows the better value. If the results are close, more external factors should be used in the decision matrix.

                  Costs new development   Maintenance costs   Expected revenues   Sum
New development   0 × 0.25                80 × 0.25           80 × 0.5            60
Maintenance       100 × 0.25              0 × 0.25            50 × 0.5            50

Table 2.4: Utility values and probabilities for the decision matrix
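
The weighted evaluation of Table 2.4 can also be written down in a few lines of code. The following Python sketch is purely illustrative; the data structures and names are not part of the lecture notes:

weights = [0.25, 0.25, 0.5]  # probabilities of the external factors; they sum to 1

utilities = {                 # utility values from Table 2.3
    "New development": [0, 80, 80],
    "Maintenance":     [100, 0, 50],
}

# Expected result per alternative: sum of utility value times factor probability
scores = {alt: sum(u * w for u, w in zip(values, weights))
          for alt, values in utilities.items()}
print(scores)                       # {'New development': 60.0, 'Maintenance': 50.0}
print(max(scores, key=scores.get))  # the new development is preferred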

Besides the objective information, subjective but quantifiable information is used. The chosen utility values have a strong impact on the final result.


2.8 Maintenance Process Models

There exist different models for the new development of software and for the maintenance of software. Figure 2.5 illustrates the differences in the effort spent on new development and maintenance in the different development phases. While the effort during the development of a new system is high in the implementation and testing phases, the effort in the maintenance of software is high in the analysis, specification and testing phases.

Section 2.8.1 describes typical process models for new developments. Section 2.8.2 describes process models which are used for software maintenance. Finally, Section 2.8.3 describes the different levels of process maturity.

Figure 2.5: Effort spent during development phases [11]

2.8.1 Traditional software development models

Figure 2.6 summarizes some important software development models. Figure 2.6(a) shows the generic life cycle of software, which consists of the five phases requirements analysis, design, implementation, testing and operation.

Code-and-fix model

This ad hoc model [11] consists of two phases, as shown in Figure 2.6(b): writing code and fixing it. Fixing can be the correction of errors or the extension of the functionality. Analysis and design are not considered in this model. Thus the code soon becomes chaotic and incomprehensible, more errors are introduced, and the program becomes less maintainable. Even though this model introduces many problems, it is still used in practice because of time pressure.

Waterfall model

The waterfall model gives a high-level view of the software life-cycle [11]. It distinguishes the following phases:

• Analysis (Decide what to do)

• Design (Decide how to do it)

• Implementation / Coding (Do it)

• Testing (Test it)

• Operation (Use it)


Figure 2.6: Overview of traditional software development models [11]: (a) the generic life cycle, (b) the code-and-fix model, (c) the waterfall model, (d) the spiral model


As shown in Figure 2.6(c), the phases of the waterfall model are represented as a cascade [11]. The output of one phase is the input to the next phase. Therefore, this model is document driven. The big disadvantage of the waterfall model is that it is not suited for a large number of modifications to the software. However, software itself is evolutionary and changes are unavoidable.

Spiral model

The spiral model is a cyclic model with four stages. In each cycle, the risks and alternatives are identified and evaluated, a part of the system is developed and verified, and the next phases are planned (see Figure 2.6(d)). The spiral model tries to eliminate the disadvantage of the waterfall model by introducing cycles [11]. The aim of this model is to eliminate high-risk problems as soon as possible in order to avoid high costs for their elimination. In contrast to the waterfall model, the spiral model is risk driven. Developers tend to ’do the easy bits first’ and to delay the difficult bits until the end. The spiral model forces the developers to attend to the problems first.

2.8.2 Maintenance process models

In general, a maintenance process involves the following five steps:

1. Analyze the existing system

2. Analyze the necessary changes of the system

3. Prediction of potential impacts (ripple effects)

4. Determination of skills and knowledge required for the changes

5. Implementation of the changes

Grubb and Takang [11] present the following process models for software maintenance. Figure 2.7summarizes them.

Quick-fix model

This ad hoc approach [11] is shown in Figure 2.7(a). If a problem occurs, programmers try to fix it as quickly as possible. There is no analysis of effects and little or no documentation. This model can be used if a system is developed and maintained by a single person. However, this model is often used in bigger projects because of time pressures and deadlines. It is possible to use the advantages of the quick-fix model by integrating it into another, more sophisticated model: urgent modifications are made using the quick-fix model. Later, these quick modifications can be revised as part of the preventive maintenance and the documentation is updated.

Boehm’s model

Boehm [6] proposed the model illustrated in Figure 2.7(b) for the software maintenance process. He describes the maintenance process as a cycle of decisions, implementations, usage and evaluation. Boehm’s maintenance process is driven by manager decisions [11].

Osborne’s model

Osborne [22] proposed the maintenance model described in Figure 2.7(c). In contrast to other maintenance models, it deals directly with the reality of the maintenance environment and does not assume an ideal situation (e.g. the existence of a complete documentation) [11]. If the documentation or formal specification is missing, allowance is made for them to be built in. Many problems during maintenance are a consequence of inadequate management communication and control.


Figure 2.7: Overview of maintenance process models: (a) the quick-fix model, (b) Boehm’s model, (c) Osborne’s model, (d) the iterative enhancement model, (e) the reuse model


Iterative enhancement model

Software maintenance is an iterative process and enhances a system in an iterative way [11]. The iterative enhancement model requires a complete documentation as the starting point for each iteration. Figure 2.7(d) shows the model as a three-stage cycle. The existing documentation for each stage (requirements, design, coding, testing) is modified, from the highest-level document affected down to the other documents. Problems occur when you use this model for a software system with obsolete or even missing documentation.

Reuse-oriented model

This model is based on the principle of seeing maintenance as an activity involving the reuse of existing program components [11]. Figure 2.7(e) shows this model, which builds a component library for reusing requirements, design, source code and test data. These are the four main steps for reuse [4]:

1. Identification of the parts of the old system which can be reused

2. Understanding of these parts

3. Modification of the parts according to the new requirements

4. Integration of the modified parts into the new system

2.8.3 Capability maturity model

Knowledge of the theory does not automatically lead to effective use in practice [11]. A maturity model describes to what extent a process model is used in an organization. The capability maturity model developed by the Software Engineering Institute (SEI) defines five levels of maturity of processes [7]:

1. Initial: ad hoc software process

2. Repeatable: basic processes are established (tracking costs, scheduling, . . . )

3. Defined: processes are documented and standardized

4. Managed: measures of the quality of the process

5. Optimized: continuous evaluation and use of the quantitative feedback of projects


2.9 The Maintenance Process

Five important terms in the context of software maintenance are [11]:

Forward engineering The traditional software development approach.

Reverse engineering The process of analyzing a system in order to identify the components of a system and their interrelationships.

Re-engineering Enhancement of a system by first using reverse engineering to comprehend the system and then forward engineering to produce the new system.

Restructuring Creating a more maintainable system by changing the abstract representation of the system without changing its functionality.

Abstraction A model which summarizes important features of a system and ignores irrelevant ones. There exist three types of abstraction:

• Function/procedural abstraction
It is used to obtain knowledge about the input-output relation of functions.

• Data abstraction
It is used to obtain knowledge about the data models used.

• Process abstraction
It is used to obtain the exact order in which operations are performed.

The IEEE standard IEEE-1219 defines the maintenance process as a seven-step process:

1. Problem identification, classification and prioritization

2. Analysis

(a) Feasibility analysis (Identification of alternatives, impacts and costs)

(b) Detailed analysis (Definition of requirements for modification, test strategy, . . . )

3. Design

4. Implementation

5. Regression/system testing

6. Acceptance testing

7. Delivery

The software maintenance process consists of the following main steps [11]:

1. Identifying the need for change

2. Understanding the current system - program comprehension

• What does the software actually do?

• Where does the change need to be made?

• How do the parts work that need to be corrected?

3. Carrying out the change - re-engineering

4. Testing - to gain confidence that the changes are correct and to ensure the absence of accidental changes to other parts of the software


Section 2.9.1 deals with program comprehension. The aims and the influence factors of program comprehension are explained. Section 2.9.2 is about reverse engineering. It explains its goals, its different levels and the reasons for it. Finally, the types of restructuring are explained and the problems of reverse engineering are listed. Section 2.9.3 deals with the topic of reuse. It explains what can be reused, what the benefits of reuse are and which approaches to reuse exist. It summarizes the factors which impact reusability and the problems with reuse libraries. Subsequently, it explains the reuse process model.

2.9.1 Program comprehension

The program comprehension process is one of the most important parts of software maintenance. It is estimated that 50 % to 90 % of the time in software maintenance is spent on program comprehension [8, 18, 29].

By investigating a system, the programmer builds a mental model of the program (a mental representation of the target system). The completeness and accuracy of the model depend to a large extent on the needs of the person who builds it.

Figure 2.8 shows the harmonization of the mental model with the knowledge presented in the software. During the process of comprehension, the mental model is sometimes closer to the real program and sometimes farther away.

Figure 2.8: Mental model

Influence factors on program understanding

A number of factors can affect the ease with which a program can be understood [11]:

• Domain knowledge

• Programming practice, expertise

• Implementation issues (see the small example after this list)

◦ Naming style (informative, concise, unambiguous identifier names)

◦ Comments, level of nesting, readability and simplicity

◦ Coding standards

• Documentation

• Program organization and presentation

◦ Emphasis of the control flow

◦ Visual enhancement of the source code through indentation and spacing

• Comprehension support tools
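
To illustrate the implementation issues listed above, consider the following small Python sketch (it is not taken from the lecture notes): both functions compute the same result, but identifier names, a comment and a clear layout make the second version far easier to comprehend.

def f(a, b, c):
    return a * b * (1 - c / 100.0)

def total_price(unit_price, quantity, discount_percent):
    # Price for `quantity` items after applying the percentage discount.
    gross = unit_price * quantity
    return gross * (1 - discount_percent / 100.0)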


The aims of program comprehension

Grubb and Takang [11] list the following features of a software product and explain why it is important to understand them:

• Problem domain

◦ Important to estimate the necessary resources

◦ Helps to choose suitable algorithms, methods, tools and personnel

Information about the problem domain can be obtained, for example, from the system documentation, from end users and from the program source code.

• Execution effect

◦ Determines whether a change achieves the desired effect

◦ Allows the maintenance personnel to reason about how components interact

• Cause-effect relation

◦ Establishes the scope of a change

◦ Predicts potential ripple effects

• Product-environment relation

◦ Determines how changes in the product’s environment affect the product

The environment of a software product is the totality of all conditions and influences which act from outside upon the product, e.g. business rules, government regulations, work patterns, and software and hardware operating platforms.

• Decision-support features

◦ Support of technical and management decision-making processes

The program comprehension process

Figure 2.9 shows the comprehension process model described in [11]. In the first step (’Read about the program’), different sources of information (e.g. specification and design documents) are browsed in order to get an overall understanding of the system. In the second step (’Read the source code’), global and local views of the program are obtained. In this step, tools can be used to generate these views (e.g. static analyzers). In the third step (’Run the program’), the dynamic behavior of the program is studied. In practice, backtracking and iterations of these steps are used to clarify doubts and to obtain more information.

There are three different program comprehension strategies:

• Top-down model
You start by comprehending the top-level aspects of a program and work towards understanding the low-level details. This model maps from how the program works (programming domain) to what is to be done (problem domain). First, you generate hypotheses. Afterwards, you evaluate them.

• Bottom-up model
The programmer recognizes patterns in the program. These patterns are chunked together into bigger structures.


Figure 2.9: Comprehension process model [11]

• Opportunistic model
The programmer makes use of both bottom-up and top-down strategies. Comprehension is dependent on three complementary features: knowledge base (expertise and background knowledge), mental model (current understanding of the program) and an assimilation process (procedure used to obtain information from various sources).

Figure 2.10 shows the knowledge domains encountered during the process of comprehension. It shows the two fundamental processes of software development: composition (production of a design) and comprehension (understanding of the design). Comprehension is the transformation from the programming domain to the problem domain. Thus, it is the reverse of composition.

Figure 2.10: Knowledge domains during comprehension [11]


2.9.2 Reverse engineering

In this section, we consider the goals of reverse engineering, the different levels where reverse engineering can be used, the reasons why reverse engineering is necessary, the different types of restructuring and the main problems of reverse engineering.

Goals of reverse engineering [11]

• Recovering of lost information

• Facilitate migration between platforms

• Improve or provide documentation

• Provide alternative views (e.g. data flow diagrams, control flow diagrams and entity-relationship diagrams)

• Extract reusable components

• Cope with complexity by abstracting system information

• Detect side effects

• Reduce maintenance effort

Levels of abstraction of reverse engineering

Figure 2.11 shows different levels of abstraction [11]. If the product of a reverse engineering process is at the same level of abstraction as the original system, the process is called re-documentation; otherwise it is called design recovery or specification recovery. The goal of re-documentation is to create alternative views (e.g. control flow graphs, class diagrams) of the system and thus to improve the documentation. The aim of design recovery is to obtain a higher abstraction directly from the source code. Specification recovery is required, for example, when there is a paradigm shift. The switch from structured programming to object-oriented programming, for example, requires the recovery of the specification because the design is not valid any more.

Figure 2.11: Levels of abstraction in reverse engineering [11]


Reasons for reverse engineering [11]

• Missing or incomplete design/specification

• Obsolete, incorrect or missing documentation

• Increased program complexity

• Poorly structured source code

• Need to translate a program into a different programming language

• Need to migrate between different software and hardware platforms

• Increasing error rate

• Need for continuous and excessive corrective change

• Need to extend economic life of the system

• Need to make a similar product

Restructuring [11]

• Control-flow-driven restructuring
This is the conversion to a clear control structure and can be done inter-modular (e.g. regrouping physically distant routines in order to obtain a coherent structure) or intra-modular (e.g. restructuring of a ’spaghetti-like’ module).

• Efficiency-driven restructuring
This involves the restructuring of functions and algorithms to make them more efficient.

• Adaptation-driven restructuring
This involves the change of the coding style in order to adapt the program to a new programming language or a new environment (e.g. a new operating system).

Problems of reverse engineering [11]

• Automation
Automation is not possible at a very high level of abstraction.

• Naming problem
It is difficult to give the results adequate names.


2.9.3 Reuse and reusability

Instead of ’reinventing the wheel’, parts of software systems can be used for different projects. Software reuse can increase productivity because less time and effort is required to specify, design, implement and test the new system. The quality of the new product tends to be higher because the reused components have already been rigorously tested.

What can be reused?

• Code (libraries, high level languages)

• Test data

• Software architecture

• Algorithms

• Knowledge

◦ Process knowledge (application of methodologies, e.g. formal methods, object-oriented design, cost estimation models, to different problems)

◦ Lessons learned

◦ Domain knowledge obtained in previous similar problems

◦ Product knowledge (data, design, programs)

Benefits of reuse

• Increase in productivity because of the reduction in time and effort spent on specification, design, implementation and testing

• Increase in quality because the reused programs have already been tested and have been shown to satisfy the requirements

• Reduction of the maintenance time and effort because reused components are often much easier to read, understand and modify

• Increase in maintainability

Approaches to reuse

• Composition-based reuse (see the small example after this list)
The reused components are atomic building blocks that are assembled to compose the target system. The components retain their basic characteristics even if they have been reused. The UNIX pipe, for example, is such a mechanism which ’glues’ the components together. A composition mechanism in object-oriented programming is the principle of inheritance. There are two types of reuse:

◦ Black-box reuse. The component is reused without modification. Only information about what the component does, and not how the component works, is required. The use of standard libraries is an example of black-box reuse.

◦ White-box reuse. The component is reused after modification. Information about what the component does and how it works is required.

• Generation-based reuse. The reusable components are active entities that are used to generate the target system. The reused component is the program that generates the target system, for example application generators, transformation-based systems and language-based systems.


Components engineering

Defining reuse is easy; actively supporting it is more difficult. There exist two ways to enforce components engineering [11]:

• Design for reuse. Components are designed in a way that they can easily be reused.

• Reverse engineering. The code of existing programs is investigated in order to identify components which can be reused.

Principles which impact reusability [11]

• Generality. The potential use of a component for a wide spectrum of application or problem domains.

• Cohesion versus coupling. Cohesion (an internal property which describes the degree to which elements within a component are related) should be high, but coupling (an external property which characterizes the interdependence between two or more modules) should be low.

• Interaction. The interaction with the user should be minimized.

• Uniformity and standardization. The use of standards promotes the re-usability of software components.

• Data and control abstractions. Abstract data types, encapsulation, inheritance and iterators promote the re-usability.

• Interoperability. Take advantage of remote services.

Problems with reuse libraries [11]

• Granularity and size dilemma.
Low granularity: few functions and thus easy to understand, but restricted in usage.
High granularity: many functions and thus confusing.

• Search problem. Difficulty of finding a component that matches the requirements. It may take more time than writing the component from scratch.

• Classification problem. It is difficult to determine a classification (scheme) for components.

• Specification and flexibility problems. The specification and the constraints of a component are often difficult to describe. In addition, it is not possible to tell which design and implementation decisions are fixed and which are flexible.


Reuse process model

Figure 2.12 shows a 'generic reuse/reuseability model' [12,14]. The first step of the model involves the understanding of the problem and the identification of a solution structure based on the available components [12]. In the second step, domain experts study the proposed solution and identify reusable components. In the third step, the reusable components are prepared (modified and/or instantiated) in order to solve the problem. In the fourth step, the components are integrated into the product. In the fifth step, the components that need to be developed and those components that have been obtained from the adaption are evaluated.

Figure 2.12: Generic Reuse Process Model [11]

Factors that impact upon reuse [11]

• Technical factors

◦ Use of several programming languages

◦ Representation of (design) information

◦ Availability of a reuse library

• Non-technical factors

◦ Initial capital outlay for a reuse library

◦ Not-invented-here factor (software engineers tend to develop their own code rather than use others' code)

◦ Commercial interest (wide circulation of components)

◦ Education of managers (their ability to recognize the potential financial and productivity benefits of reuse)

◦ Degree of inter-project co-ordination (lack of co-ordination often leads to duplicated work)

◦ Legal issues (patents of someone else’s software, liability for faults)


Chapter 3

Techniques for Program Analysis

In this chapter, we discuss different techniques which help in the process of program analysis. Every documentation of source code becomes obsolete over time [32]. In particular, it is too expensive to manually update a set of different views (e.g. class diagram, interaction diagram and state diagram) of the same code after changes. Thus, the source code should be the only source of information about the behavior and organization of a program in order to avoid outdated documentation. In this chapter, we learn how to obtain different views directly from the source code.

Section 3.1 explains the flow propagation algorithm. This algorithm is used in several program analysis algorithms, e.g., forward slicing and class diagram extraction.

In Section 3.2, we will discuss program slicing. Slicing is a technique that determines the statements which influence the value of a variable at a certain location in the program. We will learn about static and dynamic slicing. Static slicing requires the source code as the only source of information. The obtained results hold for any program execution. However, static analysis often investigates code which is unreachable. It is undecidable whether a statically possible path is feasible (i.e. whether there exists an input value allowing its traversal). Dynamic slicing focuses on a single execution of a program. Therefore, it requires a trace tool in order to obtain information about the objects manipulated and the methods executed. The output obtained from a dynamic slicer holds only for the given execution of the program. This output cannot be generalized to the behavior of the program for arbitrary executions. Dynamic analysis is only possible for an executable program, but in object-oriented programming often only the analysis of a single component is required.

In addition, we will learn about forward slicing and hitting sets. Forward slicing is a technique that can be used when correcting errors: it helps to determine which variables are influenced by a change in a certain statement. Hitting sets can be used to determine minimal causes for failures. The hitting set technique combines the results of slices of different faulty variables and returns the statements or combinations of statements that are likely to produce the detected error.

Section 3.3 deals with delta debugging. Delta debugging is a technique that helps to reduce the amount of failure-inducing input in case of an error. The approach systematically tests parts of the input against the program and determines a smaller subset which is responsible for the failure.

Sections 3.4 and 3.5 deal with the creation of object flow graphs and the reverse engineering of class diagrams. Object flow graphs are used to describe the flow of objects in object-oriented programs. They allow the tracing of object information and can be used to improve class diagrams.


3.1 Flow Propagation Algorithm

The flow propagation algorithm can be applied to many different tasks. For example, it can be used for compiler optimization, forward slicing (see Section 3.2.4) and the improvement of class diagrams (see Section 3.5).

The algorithm works with a directed graph g. Each node n in the graph g has four sets: in(n), out(n), gen(n) and kill(n). in(n) and out(n) store the incoming and outgoing flow information of the node n. Each node n generates a set of flow information items which is stored in gen(n). The elements in kill(n) are prevented from being further propagated after node n. Depending on the application area, the sets gen(n) and kill(n) are defined differently; we indicate in the respective sections how to compute them. Incoming information in(n) is transformed into outgoing information out(n) by removing the elements in the set kill(n) and adding those in gen(n). The flow information is propagated as long as any incoming or outgoing information changes. This is the pseudo code for the generic flow propagation algorithm (see [1, 32] for a detailed description):

1  for each node n ∈ g
2      in(n) = {}
3      out(n) = gen(n) ∪ (in(n) \ kill(n))
4  end for
5  while any in(n) or out(n) changes
6      for each node n ∈ g
7          in(n) = ∪_{p ∈ pred(n)} out(p)
8          out(n) = gen(n) ∪ (in(n) \ kill(n))
9      end for
10 end while

If backward propagation is required, Line 7 must be changed to in(n) = ∪_{p ∈ succ(n)} out(p).
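To make the propagation concrete, the following Python sketch implements the generic algorithm. The dictionary-based graph representation and the function name propagate are illustrative choices and not part of the lecture material; the gen and kill sets must be supplied by the respective application.

    def propagate(nodes, pred, gen, kill):
        # nodes: list of node ids; pred[n]: list of predecessors of n (pass the
        # successors instead to obtain backward propagation, i.e. the change to Line 7);
        # gen[n], kill[n]: sets of flow information items.
        in_set = {n: set() for n in nodes}
        out_set = {n: gen[n] | (in_set[n] - kill[n]) for n in nodes}
        changed = True
        while changed:                      # repeat until no in or out set changes
            changed = False
            for n in nodes:
                new_in = set().union(*[out_set[p] for p in pred[n]])
                new_out = gen[n] | (new_in - kill[n])
                if new_in != in_set[n] or new_out != out_set[n]:
                    in_set[n], out_set[n] = new_in, new_out
                    changed = True
        return in_set, out_set

With the gen and kill sets of the example below filled in, propagate produces exactly the in and out sets derived in the following iterations.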

Example

In the following example, the values for the gen and kill sets are already defined. This is the control flow graph of the example:

First, we transfer all gen sets to the out sets:

30

Page 34: Lecture Notes - Graz University of Technology · 2016-10-31 · the article A Survey of Program Slicing Techniques written by Frank Tip [31], di erent papers dealing with delta debugging

Afterwards, we start to calculate the in sets. in(1) remains empty because node 1 does not have any predecessors. Node 2 has node 1 and node 3 as predecessors, therefore in(2) = out(1) ∪ out(3):

The nodes 3 and 4 have node 2 as predecessor. Therefore in(3) = out(2) and in(4) = out(2):

Node 5 has node 4 as predecessor. Therefore in(5) = out(4):

As there have been changes in the out sets we have to do a second iteration, where only in(2) changes:

There aren’t any changes in the out sets and we are finished.


3.2 Program Slicing

The detection of a fault location is often a very time consuming task when debugging a software system. Program slicing helps to determine the location of a fault faster.

A program slice is a reduced executable program obtained from a program by removing statements. A program slice consists of the parts of the program that may affect the value of a variable at some point of interest. Here are some important definitions in the context of slicing:

Program slice A program slice is a subset of statements and control predicates of a program which directly or indirectly affect the values of certain variables at a given point in the program.

Program slicing Program slicing is the task of computing program slices.

Slicing criterion A slicing criterion is a pair (i, V) where i is a line number in the program and V is the set of variables of interest, e.g. (4, {x}).

Static slice A static slice is computed without making assumptions regarding the program's input.

Dynamic slice A dynamic slice relies on a specific test case.

A slice has the following characteristics:

• It is defined for variables at a chosen point in the program.

• It ignores statements which are irrelevant for the variable's value at the chosen program point.

• It focuses on relevant parts and removes irrelevant information.

• It is a subset of the original program and is itself a program.

• It is a projection of the original program into a lower dimensional space and preserves the program semantics for the chosen variables.

The following subsections explain different types of slices. The explained algorithms are borrowed from [31]. We refer the reader to this paper in case anything remains unclear.

3.2.1 Basics

We explain how slicing can be used for imperative languages. In the following subsection, we define a language which is used in the examples of the following sections. Afterwards, we present an algorithm for how to construct control flow graphs from code written in that language.

3.2.1.1 Language definition

We will use the following abstract language in the examples:

P   → B
B   → begin S* end;
S   → identifier = E;
    | if E then B [else B] fi;
    | while E do B od;
E   → num
    | identifier
    | (E OP E)
    | (UOP E)
OP  → + | - | * | / | && | || | > | < | == | ...
UOP → ! | -


Keywords are written in bold. Tokens are written in italic. There exist two types of tokens: identifier and num. They can be composed according to the following rules:

num        ::= [0 | 1 | ... | 9]+
letter     ::= [A | B | ... | Z] | [a | b | ... | z]
identifier ::= [letter] [letter | num]*

Identifiers are case sensitive. In addition, there are two predefined functions:

read(identifier):  assigns the value of the standard input channel to the identifier
write(identifier): writes the value of the identifier to the standard output channel

3.2.1.2 Control flow graphs (CFG)

A control flow graph is a representation of all paths that might be traversed during the program execution. Each node represents a basic block, e.g. a statement. The directed edges represent the jumps in the control flow. Table 3.1 shows the transformation of code written in the language described above to a control flow graph.

Sequence of statements

1 begin
2   Stmt1;
3   ...
4   StmtN;
5 end;

If statement

1  begin
2    Stmt1;
3    if (Cond1) then
4      begin
5        BranchStmt1;
6        ...
7        BranchStmtN;
8      end;
9    fi;
10   Stmt2;
11 end;

If-else statement

1  begin
2    Stmt1;
3    if (Cond1) then
4      begin
5        BranchStmt1;
6        ...
7        BranchStmtN;
8      end;
9    else
10     begin
11       ElseStmt1;
12       ...
13       ElseStmtN;
14     end;
15   fi;
16   Stmt2;
17 end;


While statement

1  begin
2    Stmt1;
3    while (Cond1) do
4      begin
5        LoopStmt1;
6        ...
7        LoopStmtN;
8      end;
9    od;
10   Stmt2;
11 end;

Table 3.1: Transformation of code to the CFG

The following algorithm (written in pseudo-code) explains how to obtain a control flow graph from a program written in the language described above. The algorithm is not relevant to the examination. However, it helps to understand how the different parts are nested.

P is a queue containing the single tokens of the program. Assume the function NextToken() returns the next token of the program. A token can either be a single keyword (e.g. while) or all tokens before the next keyword (e.g. identifier = E).

The algorithm does not check the syntax of the program. It takes a valid program structure for granted. The result of the algorithm for an invalid program is undefined.

Pair<List, List> createCFG(Queue program)
{
    List<Node> Nodes = new List<Node>();
    List<Pair<Node, Node>> Edges = new List<Pair<Node, Node>>();
    Queue<Token> P = program;

    Node startNode = new StartNode();
    Nodes.Add(startNode);
    List<Node> before = createNodes(new List<Node>(startNode));
    Node endNode = new EndNode();
    Nodes.Add(endNode);
    foreach Node n in before
        Edges.Add(n, endNode);
    return new Pair<List, List>(Nodes, Edges);
}

List<Node> createNodes(List<Node> before)
{
    P.NextToken();                       // begin
    List<Node> after;
    Token token = P.NextToken();
    while (token != END)
    {
        after = new List<Node>();
        if (token == IF)
            after = createNodesIfStatement(before);
        else if (token == WHILE)
            after = createNodesWhileStatement(before);
        else
        {
            Node stmt = new Node(token);
            Nodes.Add(stmt);
            foreach Node n in before
                Edges.Add(n, stmt);
            after.Add(stmt);
        }
        token = P.NextToken();
        before = after;
    }
    return after;
}

List<Node> createNodesIfStatement(List<Node> before)
{
    Node branchNode = new BranchNode(P.NextToken());
    List<Node> after = new List<Node>();
    Nodes.Add(branchNode);
    foreach Node n in before
        Edges.Add(n, branchNode);
    P.NextToken();                       // then
    after.Add(createNodes(new List<Node>(branchNode)));
    Token token = P.NextToken();         // could be else or fi
    if (token == ELSE)
    {
        after.Add(createNodes(new List<Node>(branchNode)));
        token = P.NextToken();           // fi
    }
    else
        after.Add(branchNode);
    return after;
}

List<Node> createNodesWhileStatement(List<Node> before)
{
    Node branchNode = new BranchNode(P.NextToken());
    Nodes.Add(branchNode);
    List<Node> after = new List<Node>();
    after.Add(branchNode);
    P.NextToken();                       // do
    before.Add(createNodes(new List<Node>(branchNode)));
    foreach Node n in before
        Edges.Add(n, branchNode);
    P.NextToken();                       // od
    return after;
}


3.2.2 Static slicing

A static slice S_C of a program P on the slicing criterion C = (i, V) is any program with the following properties: (1) S_C can be obtained from P by deleting zero or more statements from P. (2) S_C has the same values for the variables in V as P for any input I. In this section, we will discuss two different approaches which can be used to compute static slices. Afterwards, we will discuss some problems in the field of static slicing.

3.2.2.1 Static slices with relevant variables

Before we explain the algorithm, we introduce basic definitions required in the algorithm:

• A definition set DEF (n) denotes the set of variables which are defined at Node n.

• A reference set REF (n) denotes the set of variables which are referenced at Node n.

• The predecessor PRE(n) is the statement before Statement n.

With these definitions, we construct a first version of a static slicing algorithm.

Static slicing algorithm (Version 1):
Requires: A program Π, and a slicing criterion C = (i, V).
Ensures: A static slice.

1. Compute the relevant variables R_C backwards (begin with Statement i and end with Statement 1) as follows:

R_C(n) =
    V                                if n = i                                               (Rule 1)
    REF(n) ∪ (R_C(m) \ DEF(n))       if ∃w : w ∈ DEF(n) ∧ w ∈ R_C(m) ∧ n ∈ PRE(m)           (Rule 2.(a))
    {w ∈ R_C(m) | w ∉ DEF(n)}        if n ∈ PRE(m)                                          (Rule 2.(b))

The statement n ∈ PRE(m) is equivalent to n →CFG m. This means that Line n is a predecessor of m in the control flow graph.

2. Add all statements n where R_C(m) ∩ DEF(n) ≠ ∅ with n ∈ PRE(m) to the slice S_C.

R_C is a function mapping statements to sets of variables; S_C is a set of statements. The slice S_C comprises all statements that define a variable which is among the relevant variables of one of their successors. Note that Rule 1 defines the base case of the algorithm.
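As an illustration, here is a minimal Python sketch of Version 1 for straight-line programs, where every statement has exactly one predecessor. The dictionaries DEF and REF and the name static_slice_v1 are assumptions made for this sketch, not part of the original algorithm description.

    def static_slice_v1(stmts, DEF, REF, i, V):
        # stmts: statement numbers in execution order; DEF[n]/REF[n]: sets of
        # variable names. Returns the relevant variables R and the slice S for (i, V).
        idx = stmts.index(i)
        R = {i: set(V)}                     # Rule 1: base case at statement i
        S = set()
        for pos in range(idx - 1, -1, -1):  # walk backwards towards Statement 1
            n, m = stmts[pos], stmts[pos + 1]       # n ∈ PRE(m)
            if DEF[n] & R[m]:               # Rule 2.(a): n defines a relevant variable of m
                R[n] = REF[n] | (R[m] - DEF[n])
                S.add(n)                    # step 2: add n to the slice
            else:                           # Rule 2.(b): carry R(m) over unchanged
                R[n] = set(R[m])
        return R, S

For Example 1 below, static_slice_v1([2, 3, 4, 5], {2: {'z'}, 3: {'y'}, 4: {'x'}, 5: set()}, {2: set(), 3: {'z'}, 4: {'z'}, 5: set()}, 5, {'x'}) returns the slice {2, 4}, matching Table 3.2.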

Example 1

1 begin
2   z = 4
3   y = z + 1
4   x = 5 + z
5 end;

Slicing criterion: (5,{x})

n | PRE(n) | REF(n) | DEF(n) | R_(5,{x})(n) | Notes      | S_(5,{x})
2 | -      | -      | {z}    | -            |            | x
3 | 2      | {z}    | {y}    | {z}          | Rule 2.(b) |
4 | 3      | {z}    | {x}    | {z}          | Rule 2.(a) | x
5 | 4      | -      | -      | {x}          | Rule 1     |

Table 3.2: Slice for Example 1


Control flow statements

The algorithm described above works fine if there are no control flow statements in the program. Let us consider Example 2 below with a simple if statement in Line 3. It is obvious that this statement must be part of the static slice, but according to our algorithm the static slice for the slicing criterion (9, {z}) is {2, 5, 8}. We have to improve our algorithm in order to handle control flow statements in programs.

Through the introduction of control flow statements, statements can have more than one predecessor. We define PRE(n) as the set of predecessors of statement n.

The influence INFL(b) is the set of statements which are on the path P from b to its nearest inverse dominator d, excluding the endpoints of P. In the case of our language defined in Section 3.2.1.1, we can simply say that the statements in the branches of an if or while statement are influenced by the control statement. Let us consider the code of Example 2: Statement 3 is the branch statement b. The nearest inverse dominator d is the statement in Line 7. All statements between b and d are influenced by b: INFL(3) = {5}.

If a statement j is in the slice S_C and j ∈ INFL(b), then b must be part of the slice S_C. We call the initial slice (without consideration of control flow statements) S^0_C and the subsequent slice S^1_C.

This is the improved algorithm for static slicing (Version 2):

1. Compute the directly relevant variables R_C and the slice S^0_C.

2. Compute the control flow statements influencing the variables which are already in the slice:

B_C = {b | ∃j : j ∈ S^0_C ∧ j ∈ INFL(b)}

3. Compute the next slice S^1_C as follows: S^1_C = S^0_C ∪ B_C
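A small Python sketch of Steps 2 and 3, assuming the first-pass slice S0 and a mapping INFL from branch statements to the statements they influence are already available (both names are illustrative):

    def add_branch_statements(S0, INFL):
        # B_C: every branch statement b whose range of influence contains a sliced statement
        B = {b for b, influenced in INFL.items() if S0 & set(influenced)}
        return B, S0 | B                 # S^1_C = S^0_C ∪ B_C

    # Example 2 below: add_branch_statements({2, 5, 8}, {3: [5]}) -> ({3}, {2, 3, 5, 8})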

Example 2

1 begin
2   x = 1;
3   if (y > 0) then
4     begin
5       z = 4 + x;
6     end;
7   fi;
8   z = z + y;
9 end;

Slicing criterion: (9,{z})

n | PRE(n) | REF(n) | DEF(n) | R_(9,{z})(n) | S^0_(9,{z}) | INFL(n) | B | S^1_(9,{z})
2 | -      | -      | {x}    | {y, z}       | x           |         |   | x
3 | 2      | {y}    | -      | {x, y, z}    |             | 5       | x | x
5 | 3      | {x}    | {z}    | {x, y}       | x           |         |   | x
8 | 3,5    | {y, z} | {z}    | {y, z}       | x           |         |   | x
9 | 8      | -      | -      | {z}          |             |         |   |

Table 3.3: Slice for Example 2

R_(9,{z})(3) is the union of the relevant variables of its successors (Line 5 and Line 8). We now investigate which statements are part of the slice: First we build the intersection of DEF(2) with the relevant variables of its successor (Line 3): DEF(2) ∩ R_(9,{z})(3) = {x}, so Line 2 is part of the slice. The statement in Line 3 has two successors, Line 5 and Line 8: DEF(3) ∩ R_(9,{z})(5) = {} and DEF(3) ∩ R_(9,{z})(8) = {}; because both intersections are empty, Line 3 is not part of the slice. DEF(5) ∩ R_(9,{z})(8) = {z}, so Line 5 is part of the slice. DEF(8) ∩ R_(9,{z})(9) = {z}, therefore Line 8 is part of the slice.

The result of the first iteration, S^0_(9,{z}), is {2, 5, 8}. The behavior of this program does not correspond to the behavior of the original program with respect to the slicing criterion (9, {z}). We have to take the range of influence of the branch statement in Line 3 into consideration. Statement 5 is in the slice S^0_(9,{z}) and in the range of influence of statement 3 (INFL(3) = {5}). Therefore the branch statement is added and the whole program is in the slice.

Example 3

1  begin
2    r = 0;
3    i = 0;
4    while (i < n) do
5      begin
6        if (A[i] == 1) then
7          begin
8            r = r + 1;
9          end;
10       fi;
11       i = i + 1;
12     end;
13   od;
14 end;

Slicing criterion: (14,{i})

n  | PRE  | REF    | DEF | R_(14,{i}) | S^0_(14,{i}) | INFL   | B | S^1_(14,{i})
2  | -    | -      | {r} | -          |              |        |   |
3  | 2    | -      | {i} | -          | x            |        |   | x
4  | 3,11 | {i, n} | -   | {i}        |              | 6,8,11 | x | x
6  | 4    | {A, i} | -   | {i}        |              | 8      |   |
8  | 6    | {r}    | {r} | {i}        |              |        |   |
11 | 6,8  | {i}    | {i} | {i}        | x            |        |   | x
14 | 4    | -      | -   | {i}        |              |        |   |

Table 3.4: Slice for Example 3

Our improved algorithm works well for Example 2 and Example 3, but let us consider Example 4. According to our algorithm we obtain the slice S^1_C = {3, 4, 11}. Since Line 12 influences the result of the condition in Line 4, Line 12 should be part of the slice. Therefore, we have to modify our algorithm: we have to compute the slice for the variables of the added branch conditions and add them to the slice. Since variables in branch statements may be defined in other branches, we have to make several iterations. We introduce a counter variable i for the iterations which is increased as long as B^i ≠ B^(i-1).


This is the improved algorithm for static slicing (Version 3):

1. Compute the directly relevant variables R^0_C and the slice S^0_C.

2. For i ≥ 0:

(a) Compute the branch statements influencing the variables which are already in the slice:

    B^i_C = {b | ∃j : j ∈ S^i_C ∧ j ∈ INFL(b)}

(b) Compute the statements which influence the variables referenced in B^i_C (indirectly relevant variables) and union them with the relevant variables of the previous iteration:

    R^(i+1)_C(n) = R^i_C(n) ∪ ⋃_{b ∈ B^i_C} R^0_(b,REF(b))(n)

(c) The slice of the next iteration consists of the statements in B^i_C and the statements which redefine one of the variables of interest:

    S^(i+1)_C = B^i_C ∪ {n | R^(i+1)_C(m) ∩ DEF(n) ≠ ∅ where n ∈ PRE(m)}

Example 4

1  begin
2    r = 0;
3    i = 0;
4    while (i < n) do
5      begin
6        if (A[i] == 1) then
7          begin
8            r = r + 1;
9          end;
10       fi;
11       i = i + 1;
12       n = n - 1;
13     end;
14   od;
15   write(i);
16 end;

Slicing criterion: (15,{i})

n  | PRE  | REF    | DEF | R^0_(15,{i}) | S^0_(15,{i}) | INFL      | B^0 | R^0_(4,{i,n}) | S^0_(4,{i,n}) | S^1_(15,{i})
2  | -    | -      | {r} | -            |              |           |     | {n}           |               |
3  | 2    | -      | {i} | -            | x            |           |     | {n}           | x             | x
4  | 3,12 | {i, n} | -   | {i}          |              | 6,8,11,12 | x   | {i, n}        |               | x
6  | 4    | {A, i} | -   | {i}          |              | 8         |     | {i, n}        |               |
8  | 6    | {r}    | {r} | {i}          |              |           |     | {i, n}        |               |
11 | 6,8  | {i}    | {i} | {i}          | x            |           |     | {i, n}        | x             | x
12 | 11   | {n}    | {n} | {i}          |              |           |     | {i, n}        | x             | x
15 | 4    | {i}    | -   | {i}          |              |           |     | -             |               |

Table 3.5: Slice for Example 4

In this example, it is important to consider statements 3 and 12 as predecessors of statement 4. Otherwise statement 12 would not be in the slice.


Example 5

1  begin
2    a = 0;
3    i = 0;
4    while (i < n) do
5      begin
6        a = b;
7        b = i;
8        i = i + 1;
9      end;
10   od;
11   write(a);
12 end;

Slicing criterion: (11,{a})

n  | PRE | REF    | DEF | R^0_(11,{a}) | S^0_(11,{a}) | INFL  | B^0 | R^0_(4,{i,n}) | S^0_(4,{i,n}) | S^1_(11,{a})
2  | -   | -      | {a} | {b}          | x            |       |     | {n}           |               | x
3  | 2   | -      | {i} | {a, b}       | x            |       |     | {n}           | x             | x
4  | 3,8 | {i, n} | -   | {a, b, i}    |              | 6,7,8 | x   | {i, n}        |               | x
6  | 4   | {b}    | {a} | {b, i}       | x            |       |     | {i, n}        |               | x
7  | 6   | {i}    | {b} | {a, i}       | x            |       |     | {i, n}        |               | x
8  | 7   | {i}    | {i} | {a, b, i}    | x            |       |     | {i, n}        | x             | x
11 | 4   | {a}    | -   | {a}          |              |       |     |               |               |

Table 3.6: Slice for Example 5

Now, we investigate in detail the values of column R^0_(11,{a}). First, we write the variables of the slicing criterion into Line 11. The predecessor of Line 11 is Line 4 (see CFG). Because DEF(4) ∩ R(11) = {}, we can carry the relevant variables of Line 11 over to Line 4. Thus, R(4) is {a}. DEF(8) ∩ R(4) = {}, therefore we carry the relevant variables of Line 4 over to one of its predecessors, Line 8 (R(8) = {a}). The same holds for the transition from Line 8 to Line 7 (R(7) = {a}). DEF(6) ∩ R(7) = {a}, so we have to replace the variable a by the variable b (R(6) = {b}). R(4) is composed of the relevant variables of its successors (Line 6 and Line 11) and is therefore R(6) ∪ R(11) = {a, b}.

We have to iterate through the loop until there are no more changes. In the next iteration, we have the following relevant variables: R(8) = {a, b}, R(7) = {a, i}, R(6) = {b, i} and R(4) = {a, b, i}. These are the relevant variables of the third iteration: R(8) = {a, b, i}, R(7) = {a, i}, R(6) = {b, i} and R(4) = {a, b, i}. We can stop iterating the loop and continue with the calculation of the relevant variables of Line 3.


3.2.2.2 Minimal slice

A slice S of a program P on the slicing criterion C is statement-minimal if no other slice of P on C has fewer statements than S. There exists no algorithm for finding statement-minimal slices for arbitrary programs, because this problem is similar to the halting problem [33]. The following example illustrates that:

1  begin
2    A = constant;
3    while (K < 10) do
4      begin
5        if (C = 0) then
6          begin
7            B = A;
8            X = 1;
9          end;
10       else
11         begin
12           C = B;
13           Y = 2;
14         end;
15       fi;
16       K = K + 1;
17     end;
18   od;
19   Z = X + Y;
20   write(Z)
21 end;

Statement 2 will be in S_(20,{Z}) but cannot affect the values of Z because any path by which A at Line 2 influences C at Line 5 will also execute both statements 8 and 13, resulting in a constant value for Z at Line 20.


3.2.2.3 Static slicing with program dependence graphs (PDG)

A program dependence graph is a directed graph for a single procedure in a program. The nodes are the statements of the program and the edges are data and control dependencies. Node j is data dependent on node i if there exists a variable x such that

• x ∈ DEF (i),

• x ∈ REF (j) and

• there exists a path P from i to j without intervening definitions of x.

Alternatively, we can say that the definition of x at node i is a reaching definition for node j. Control dependence is usually defined in terms of post-dominance: A node i is post-dominated by a node j if all paths from node i to the end of the program pass through j. A node j is control dependent on a node i if:

• there exists a path P from i to j such that every node u ∉ {i, j} on P is post-dominated by j and

• i is not post-dominated by j.

We can determine control dependencies in programs with a structured control flow in a simple syntax-directed manner: the statements in the branches of an if or while are control dependent on the control predicate.

The slicing criterion is defined with a vertex v in the PDG, where v corresponds to the line number n in the slicing criterion (n, V). V is the set of all variables used at v. For single-procedure programs, the slice with respect to v consists of all nodes from which v is reachable as well as all nodes from which the last definition nodes of v are reachable.
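A Python sketch of this backward reachability; the PDG is assumed to be given as a mapping from each node to the sources of its incoming data and control dependence edges, which is an illustrative representation and not prescribed by the lecture notes.

    def pdg_slice(depends_on, criterion_nodes):
        # depends_on[n]: the nodes n is data or control dependent on (sources of
        # its incoming edges); criterion_nodes: the initially marked nodes, e.g.
        # the last definition nodes of the variables of interest.
        marked = set(criterion_nodes)
        worklist = list(criterion_nodes)
        while worklist:
            n = worklist.pop()
            for m in depends_on.get(n, ()):     # follow the edges backwards
                if m not in marked:
                    marked.add(m)
                    worklist.append(m)
        return marked

    # Example 6 below: pdg_slice({3: set(), 4: {3}, 6: {4}, 7: {4}, 11: {3, 4}}, {3, 6})
    # returns {3, 4, 6}.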

Continuation of Example 4

We construct the program dependency graph for Example 4 and mark the slicing criterion, i.e. node 15. write(i), as well as all nodes defining the variables in the slicing criterion reaching the end of the program. These are node 3. i = 0, 11. i = i + 1 and ENTRY:

We mark all nodes which reach node 15. write(i), 3. i = 0 and 11. i = i + 1, i.e. we follow the edges in a backward direction and mark the node 4. while(i<n):


Finally, we mark the node 12. n = n - 1, because 4. while(i<n) is data dependent on it:

Example 6

1  begin
2    a = 0;
3    a = 2 + x;
4    if (a > z) then
5      begin
6        a = z;
7        b = 2;
8      end
9    else
10     begin
11       b = a - 2;
12     end
13   fi;
14 end;

There are no data dependencies between the then- and the else-branch as only one of these branches can be executed. Consider the nodes 6. a = z and 11. b = a - 2. There is no edge between 6. a = z and 11. b = a - 2, as the value of a in the then-branch cannot influence the value of b in the else-branch.

Imagine we would like to compute the static slice for the slicing criterion (14, {a}). There is no node in the PDG corresponding to line 14. Thus, we just mark all last definitions of a reaching the last line of the program, i.e. node 3. a = 2 + x and 6. a = z. Note that 2. a = 0 is not marked, as the definition is overwritten by the one in line 3. Tracing the edges in a backward direction we obtain the slice {3, 4, 6}.


PDG for the example from Section 3.2.2.2

The following figure shows the PDG for the example from Section 3.2.2.2 and highlights two aspects:

• In a PDG, it is not possible to distinguish between statements of the then- and the else-branch. They are just arranged side-by-side.

• Have a look at the data dependency from Node 7 to Node 12. Usually, data dependencies between the then- and the else-branch are not possible, because only one of the branches is executed. However, in this example the if-construct is embedded into a loop. Therefore, it is possible to execute both branches of the if-construct.


3.2.2.4 Problems of static program slicing

In this lecture, we focus on the basic concepts of program slicing. There are many problems which we have not discussed:

• Function calls (call-by-value, call-by-reference)

• Presence of other control flow statements (e.g. goto, break, continue)

• Presence of composite data types (e.g. an array): a conservative solution is to regard an update of/access to one element as an update of/access to the whole composite data type

• Aliasing/pointers (two or more variables refer to the same memory location)

• Interprocess communication: Distributed/concurrent programs

3.2.2.5 Conclusion

Static slices can be computed offline at compile time. They provide relevant information about a program, but they do not help in all cases and often contain information that is not relevant to a specific test case.


3.2.3 Dynamic Program Slicing

Korel and Laski [15] introduced the basic ideas behind dynamic program slicing. A dynamic slice is an executable part of a program whose behavior is identical to the behavior of the original program for a certain input and a certain variable at a specific program location. As only the execution trace of the test case is considered, the slice can be computed very fast and the resulting slice can be significantly smaller than the corresponding static slice.

This section is organized as follows: First, we discuss the main reason for moving from static to dynamic program slicing by means of a small example. In Section 3.2.3.1, we learn about basic dynamic slicing. There exist several extensions to dynamic slicing. We will deal with one of those extensions (relevant slicing [40]) in Section 3.2.3.3.

Example 1

This small example program implements the addition of two variables x and y using only an increment operator (represented as + 1 in the program).

1  begin
2    i = 0;
3    result = x;
4    while (i < y) do
5      begin
6        result = result + 1;
7        i = i + 1;
8      end;
9    od;
10   ...
11 end;

To show that static slicing does not always provide a good solution, we have to introduce a bug into the program. Assume that we wrote result = 0; instead of result = x; in Line 3 of the program. The test case x=3 and y=0 with the expected outcome result=3 reveals the bug, because the faulty variant computes 0 instead of 3 for the variable result.

Figure 3.1: Program dependence graph for Example 1

If we want to use static slicing for debugging, we have to compute the static program slice for all variables where the computed value differs from the expected one. For our example, the static slicing criterion is (11, {result}). Figure 3.1 shows the program dependence graph (PDG) for the program. Unfortunately, all nodes are part of the static slice. Thus, static slicing does not provide any useful information. Hence, there is a need for improvement, and dynamic slicing is a proposal for reducing the size of slices by taking a particular test case into account.


3.2.3.1 Basic definitions

We start with the definition of execution traces, which represent one run of a given program and one particular test case. Execution traces comprise the executed statements. Execution traces represent a program run in the real order of executions. Therefore, an execution trace is represented as a sequence and not as a set.

Definition 1 (Execution trace) An execution trace for a program Π and a test case TC is a sequence 〈s(1)1, . . . , s(k)k〉 where s(i) ∈ Π is a statement that has been executed when running Π on the given inputs of test case TC. The index i is used to further indicate the execution order, i.e., 1 indicates the first executed statement, 2 the second, and so on.

Example 1.a)

Consider the test case TC: x=3 and y=0 with the expected outcome result=3 for Example 1. The obtained execution trace contains four statements:

(2. i=0)1, (3. result=0)2, (4. while (i<y))3, (10. )4

We ignore syntactical constructs like end or od and do not represent them in the execution trace because they do not directly contribute to the computation of values.

From the execution trace, we easily obtain a directed graph called execution trace graph, which is used for computing dynamic program slices. The nodes of the graph are the elements of the execution trace. The directed edges of the graph are the data and control dependencies of the execution trace.

Definition 2 (Execution Trace Graph) An execution trace graph (ETG) for a given execution trace is a directed graph (VE, AE) where VE is the set of nodes, which is equivalent to the set of elements in the execution trace, and AE is a set of edges n → m, where n, m ∈ VE, representing the data and control dependencies of the execution trace.

For the purpose of computing the data dependencies, we use the functions DEF and REF, where DEF returns the set of variables defined in a statement, and REF returns the set of variables referenced or used in the statement. DEF returns the empty set for loop and conditional statements. Using these functions, we define data dependencies as follows:

Definition 3 (Data dependency) Given an execution trace 〈s(1)1, . . . , s(k)k〉 for a program Π and a test case T. An element of the execution trace s(j)j is data dependent on another element s(i)i, i.e., s(i)i →D s(j)j, iff there exists a variable x in DEF(s(i)i), which is used in s(j)j, i.e., x ∈ REF(s(j)j), and there exists no element s(k)k, i < k < j, in the execution trace where x is defined.

s(i)i →D s(j)j ⇔DEF ∃x ∈ DEF(s(i)i) : (x ∈ REF(s(j)j) ∧ ∄k : i < k < j ∧ x ∈ DEF(s(k)k))

On the one hand, we have data dependencies capturing the data flow through the program.

On the other hand, we have control dependencies representing the necessary control knowledge for computing dynamic slices. In contrast to control flow graphs representing all possible paths through a program, we now have a concrete execution run. In this run, the sequence of statements is given as defined in the programming language's semantics. In our limited language, we have only two different control dependencies: while statements and if-then-else statements. In the following definition, the term test element refers to an element of an execution trace where the corresponding statement is either a while-statement or an if-then-else statement.


Definition 4 (Control dependency) Given an execution trace 〈s(1)1, . . . , s(k)k〉 for a program Π and a test case TC. An element of the execution trace s(j)j is control dependent on test element s(i)i, i.e., s(i)i →C s(j)j, iff the execution of s(i)i causes the execution of s(j)j.

In the above definition, the term cause has to be interpreted carefully. If the condition of the while-statement evaluates to TRUE, then all statements of the sub-block of the while-statement are control dependent. If the condition evaluates to FALSE, no statement is control dependent because the first statement after the while is always executed regardless of the evaluation of the condition (except in the case of an infinite loop, which is not in our scope). For if-then-else statements the interpretation is similar. If its condition evaluates to TRUE, then the statements of the then-block are control dependent on the conditional statement. If it evaluates to FALSE, the statements of the else-block are considered. In case of nested while-statements or if-then-else statements, the control dependencies are not automatically assigned for the blocks of the inner while-statements or if-then-else statements. Table 3.7 summarizes the control dependence computation for the i-th statement of the execution.

Statement                                 | Condition            | Control dependencies
while E do begin s1 ... sn end od         | E evaluates to TRUE  | E^i →C s1^(i+1), ..., E^i →C sn^(i+n)
if E then begin t1 ... tn end             | E evaluates to TRUE  | E^i →C t1^(i+1), ..., E^i →C tn^(i+n)
   else begin e1 ... em end fi            | E evaluates to FALSE | E^i →C e1^(i+n+1), ..., E^i →C em^(i+n+m)

Table 3.7: Computation of control dependence

Using the definitions of data and control dependencies, we are now able to formally specify all directed edges of an execution trace graph.

n → m ⇔DEF n →D m ∨ n →C m
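The following Python sketch computes the data dependence edges of an execution trace graph from a trace; DEF and REF are assumed to be given, and representing the trace as a list of statement numbers is an illustrative choice. Control dependence edges are added separately according to Table 3.7, since they require knowing which block each executed statement belongs to.

    def data_dependencies(trace, DEF, REF):
        # trace: list of executed statement numbers (position = node id).
        # Returns the set of edges (i, j) with trace[i] ->_D trace[j].
        edges = set()
        last_def = {}                       # variable -> position of its latest definition
        for j, s in enumerate(trace):
            for v in REF[s]:
                if v in last_def:           # that definition reaches position j
                    edges.add((last_def[v], j))
            for v in DEF[s]:
                last_def[v] = j
        return edges

The union of these data dependence edges and the control dependence edges then gives the edge set AE of the ETG, in line with the definition above.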

Continuation of Example 1.a)

The execution trace graph of the execution trace from Example 1 comprises the nodes

VE = {(2. i=0)1, (3. result=0)2, (4. while (i<y))3, (10. )4}

and the following edges:

AE = {(2. i=0)1 → (4. while (i<y))3}.

In this example, there is only one dependency in the execution trace and therefore only one edge in the corresponding graph. This dependency is a data dependency because the variable i is defined in node (2. i=0)1 and used in node (4. while (i<y))3.


Example 1.b)

Consider again the program introduced in Example 1 together with test case TC′: x=3 and y=2 with the expected outcome result=5. When we apply the introduced theory we obtain the following execution trace:

(2. i=0)1, (3. result=0)2, (4. while (i<y))3, (6. result=result+1)4, (7. i=i+1)5,
(4. while (i<y))6, (6. result=result+1)7, (7. i=i+1)8, (4. while (i<y))9, (10. )10

Figure 3.2 shows the graphical representation of the corresponding execution trace graph.

Figure 3.2: Execution trace for Example 1.b) enhanced with data and control dependencies

We are able to directly use the execution trace graph of an execution trace obtained from executing a program Π using a test case T for computing a dynamic program slice. We only need to adapt the idea of the computation of static program slices from program dependence graphs. Hence, marking a relevant node and traversing the graph backward marking each reachable node is sufficient. What are the relevant nodes? In contrast to static slicing where several nodes might be marked, in dynamic slicing there is only one node where a variable of interest is defined the last time. Hence, only this corresponding node needs to be taken into account. What is still missing is the definition of a slicing criterion for which a slice should be computed.

Definition 5 (Dynamic slicing criterion) A dynamic slicing criterion is a tuple (TC, I_n, x) where TC is a test case, I is a statement of interest, n is the index of the statement and x is a variable. Note that n is not the line number of the statement but indicates the node of the execution trace that is of interest.

The following algorithm formalizes the computation of dynamic slices.


Algorithm DynamicSlice
Requires: An execution trace ET = 〈s(1)1, . . . , s(m)m〉 of a program Π and a slicing criterion (TC, I_n, x), n ≤ m.
Ensures: A dynamic slice.

1. Compute the execution trace graph ETG for ET using the definitions of →C and →D.

2. Mark the node s(k)k in ETG where x ∈ DEF(s(k)k) and there is no s(i)i, k < i ≤ n, x ∈ DEF(s(i)i) in ETG.

3. Traverse the graph ETG from the marked node in the reverse direction of the arrows in ETG until no new nodes can be marked.

4. Let S be the set of all marked nodes.

5. Return the set {s(i)|s(i)i ∈ S} as result.

Note that DynamicSlice returns only elements of the program Π. Hence, the returned set is smaller than or equal in size to the set of marked nodes in the execution trace graph.
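A Python sketch of DynamicSlice over such a graph; trace positions serve as node identifiers, and the edge set is assumed to contain the data and control dependence edges computed as above (these representation choices are assumptions of the sketch).

    def dynamic_slice(trace, edges, DEF, n, x):
        # trace: statements in execution order (position = node id); edges: set of
        # (i, j) ETG edges with i before j; (n, x): position and variable of the criterion.
        defs = [i for i in range(n + 1) if x in DEF[trace[i]]]
        if not defs:
            return set()                          # x is never defined up to position n
        marked = {max(defs)}                      # Step 2: last definition of x
        worklist = list(marked)
        while worklist:                           # Step 3: backward traversal
            j = worklist.pop()
            for (i, j2) in edges:
                if j2 == j and i not in marked:
                    marked.add(i)
                    worklist.append(i)
        return {trace[i] for i in marked}         # Steps 4 and 5: back to statements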

Continuation of Example 1.a)

The following dynamic slice will be obtained for the slicing criterion (TC, 10_4, result), which only comprises the faulty statement:

1. begin

3. result = 0;

10. ...

11. end;

Continuation of Example 1.b)

From the execution trace of Figure 3.2, DynamicSlice returns the whole program as dynamic program slice for (TC′, 10_10, result).

The complexity of DynamicSlice is of order O(n + m) where n is the number of nodes and m the number of edges of the execution trace graph. This is due to the fact that every node and every arrow has to be traversed in the worst case. The size of the execution trace graph depends on the number of executed statements, which is of order O(I) where O(I) represents the complexity of the program. Therefore, the complexity of computing dynamic slices for a program and a given test case is also of order O(I).


3.2.3.2 Terminating slices

As stated in the introduction, a dynamic slice is an executable part of a program whose behavior is identical to the behavior of the original program for a certain input and a certain variable at a specific program location. In cases where a loop is executed only once, it is possible that the resulting slice does not terminate, as the following example demonstrates.

Example 2

Consider the following code fragment with respect to the slicing criterion ({n = 0, x = 1}, 9_6, {x}):

1  begin
2    i = 0;
3    while (i <= n) do
4      begin
5        x = x + 1;
6        i = i + 1;
7      end
8    od
9    ...
10 end

The resulting trajectory is:

(2. i=0;)1, (3. while(i<=n))2, (5. x=x+1;)3, (6. i=i+1;)4, (3. while(i<=n))5, (9. )6

The obtained dynamic slice is:

1 begin
2   i = 0;
3   while (i <= n) do
4     begin
5       x = x + 1;
6     end
7   od
8 end

It is obvious that the resulting program does not terminate. Thus, the behavior of the dynamic slice is not identical to the behavior of the original program for the given input and the variable x.

We can avoid the problem of non-terminating slices by introducing an additional dependency.

Definition 6 (Symmetric dependency) Given an execution trace 〈s(1)1, . . . , s(k)k〉 for a program Π and a test case T. An element of the execution trace s(j)j is symmetrically dependent on another element s(i)i iff s(j) = s(i) and s(j) is a while-statement.

n →S m ⇔DEF s(m) = s(n) ∧ s(n) is a while-statement

We improve the previous algorithm by adding the transitions →S to the (extended) execution trace graph.
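A sketch of the additional →S edges, assuming the trace and a predicate is_while telling whether a statement is a while-statement are available (both are assumptions of the sketch). Adding these edges to the edge set used by the dynamic slicing sketch above keeps the loop-counter update in the slice.

    def symmetric_dependencies(trace, is_while):
        # n ->_S m (in both directions) between all occurrences of the same while-statement
        edges = set()
        for i, s in enumerate(trace):
            for j in range(i + 1, len(trace)):
                if trace[j] == s and is_while(s):
                    edges.add((i, j))
                    edges.add((j, i))
        return edges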


Continuation of Example 2

The extended execution trace graph of Example 2 is shown in Figure 3.3.

Figure 3.3: Extended execution trace graph for Example 2

By applying the algorithm to the new extended execution trace graph, we obtain the following slice:

1 begin
2   i = 0;
3   while (i <= n) do
4     begin
5       x = x + 1;
6       i = i + 1;
7     end
8   od
9 end

This slice contains the statement responsible for updating the counter variable of the loop. The program terminates for the given input and is thus equivalent to the original program with respect to the slicing criterion.

In this section, we introduced dynamic slicing and showed that the results obtained for one example are better than those obtained when using static slicing. There is still an open question: Does dynamic slicing always perform better than or equal to static slicing with respect to the size of the returned slice? The answer is yes, because static slicing considers all possible program runs. Hence, all dependencies, whether they are really used in one run or not, are taken into account for static slicing. There are more arrows in the program dependence graph of static slices and therefore it is more likely that a node is reachable. As a consequence, static slicing never leads to smaller slices than dynamic slicing.


3.2.3.3 Relevant slicing

Does a computed dynamic slice always contain the statement that is faulty? At first glance, this question seems to be somehow awkward because the computation of slices uses control and data dependencies. Therefore, we expect that all necessary dependencies are captured and that faulty statements are contained in slices. Unfortunately, this is not the case, as we will see in the following example.

Example 3

Consider the program used in Example 1 again, but this time with Line 2 as faulty. Instead of i = 0; Line 2 comprises the statement i = 1;.

1  begin
2    i = 1;
3    result = x;
4    while (i < y) do
5      begin
6        result = result + 1;
7        i = i + 1;
8      end;
9    od;
10   ...
11 end;

Moreover, let us assume to run the program on the following test case TC′′: x=2, y=1, and result=3. The program returns result=2, which contradicts test case TC′′. The execution trace for TC′′ is:

(2. i=1)1, (3. result=x)2, (4. while (i<y))3, (10. )4

After constructing the corresponding execution trace graph and applying the dynamic slicing algorithm, we obtain only node (3. result=x)2 as marked. The corresponding slice is:

1 begin
2   result = x;
3   ...
4 end;

This dynamic slice is small but, unfortunately, it does not contain the faulty line. What is the reason for the observation obtained in Example 3? The fault in Line 2 prevents the condition in Line 4 from evaluating to TRUE, and therefore the block of the while-statement is not executed. In this block, we have a statement that re-defines the variable result. If this statement were executed, then the missing dependency, which finally causes Line 2 not to be in the slice, would be available and Line 2 would be an element of the slice. When using static slicing this dependency is always present. Therefore, we do not have a similar problem in static slicing.

A solution for the described problem is to consider while-statements or if-then-else statements in the dynamic slice when there exists a statement in a sub-block that re-defines a relevant variable. Zhang et al. [40] introduced the described improvement and call this solution relevant slicing. In the following, we introduce the basic concepts of relevant slicing and extend our definitions and algorithms for this purpose.

Therefore, we introduce a new dependency, the potential data dependency. This dependency is between a test node (i.e., the condition in a while-statement or if-then-else statement) and a


node where a variable is used that is defined somewhere in the non-executed block of the while-statement or if-then-else statement. In order to define potential data dependency, we introduce the concept of potential relevant variables for conditional and while statements. This information can be directly obtained statically from the source code.

Definition 7 (Potential relevant variables) Given a while-statement or an if-then-else statement n. PR (potential relevant variables) is a function mapping the statement and a boolean value to the set of variables which would be defined in a sub-block of n if the condition of n evaluates to a different value.

The definition above requires all defined variables to be elements of the set of potential relevant variables under a certain condition: If there are other while-statements or if-then-else statements in a sub-block, the defined variables of all their sub-blocks must be considered as well. Table 3.8 summarizes the definition of potential relevant variables:

Statement n                      | Condition E | Potential relevant variables PR
while E do begin S end od        | TRUE        | PR(n, TRUE) = ∅
                                 | FALSE       | PR(n, FALSE) = {m | m defined in S}
if E then begin S1 end           | TRUE        | PR(n, TRUE) = {m | m defined in S2}
   else begin S2 end fi          | FALSE       | PR(n, FALSE) = {m | m defined in S1}

Table 3.8: Summary of potential relevant variables
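A small Python sketch of PR for the two statement forms, assuming the sets of variables defined in the respective sub-blocks (including nested blocks) have been collected statically; the parameter names are illustrative and not part of the lecture material.

    def pr_while(defs_body, cond_value):
        # variables that would have been defined had the loop condition evaluated differently
        return set() if cond_value else set(defs_body)

    def pr_if(defs_then, defs_else, cond_value):
        # TRUE: the else-block was skipped; FALSE: the then-block was skipped
        return set(defs_else) if cond_value else set(defs_then)

    # Continuation of Example 3: pr_while({'result', 'i'}, False) == {'result', 'i'}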

Continuation of Example 3

The while statement in Line 4 of the program has the following potential relevant variables: PR(4. while (i<y), TRUE) = ∅ and PR(4. while (i<y), FALSE) = {result, i}.

From the definition of potential relevant variables, we can define potential data dependencies straightforwardly.

Definition 8 (Potential data dependency) Given an execution trace 〈s(1)1, . . . , s(k)k〉 for a program Π and a test case TC. An element of the execution trace s(j)j is potentially data dependent on a test element s(i)i which evaluates to TRUE (FALSE), i.e., s(i)i →P s(j)j, iff there is a variable x ∈ PR(s(i), TRUE) (x ∈ PR(s(i), FALSE)) that is referenced in s(j)j and not re-defined between i and j.


Example 4

Consider the following program that wrongly implements the multiplication of two integer variables taking care of the signs.

1  begin
2    i = 1;            // BUG: should be i = 0;
3    result = 0;
4    if (y < 0) then
5      begin
6        sig = -1;
7      end;
8    else
9      begin
10       sig = 1;
11     end;
12   fi;
13   y = y * sig;
14   while (i < y) do
15     begin
16       result = result + x;
17       i = i + 1;
18     end;
19   od;
20   result = result * sig;
21   ...
22 end;

Given the faulty program, the test case TCM: x = 3, y = -1, and result = -3 reveals the bug. In this case, the program returns the value 0 for variable result because the condition of the while-statement in Line 14 evaluates to FALSE. For TCM, we obtain the following execution trace:

(2. i=1;)1,
(3. result=0;)2,
(4. if (y < 0))3          PR(4, TRUE) = {sig}
(6. sig=-1;)4,
(13. y=y*sig;)5,
(14. while (i<y))6        PR(14, FALSE) = {result, i}
(20. result=result*sig;)7,
(21. )8

In addition to the data and control dependencies, we have a potential data dependency between the element (14. while (i<y))6 and (20. result=result*sig;)7, i.e., (14. while (i<y))6 →P (20. result=result*sig;)7.

Please note that additional potential data dependencies could be added from (4. if (y < 0))3 to (13. y=y*sig;)5 and (20. result=result*sig;)7. However, since the then-branch of the if-then-else construct also changes the variable sig, this is not necessary.

The new potential data dependency can be used in an execution trace graph. We call such an execution trace graph an extended execution trace graph in order to distinguish it from the ordinary execution trace graph.

Definition 9 (Extended execution trace graph) An extended execution trace graph is an execution trace graph where the potential data dependencies are also represented as arrows.


Continuation of Example 4

The extended execution trace graph for the program is depicted in Figure 3.4. This graph comprises three additional edges representing potential data dependencies between the if-then-else statement (Line 4) and the while-statement (Line 14) in the program.

Figure 3.4: Extended execution trace graph for Example 4

Similar to extended execution trace graphs, we extend the dynamic slicing algorithm.

Algorithm RelevantSlice
Requires: A program Π and a slicing criterion (TC, I_n, x), n ≤ m.
Ensures: A relevant slice.

1. Compute the extended execution trace graph EETG for program Π and test case TC using the definitions of control dependencies (→C), data dependencies (→D), and potential data dependencies (→P).

2. Mark the node s(k)k in EETG where x ∈ DEF(s(k)k) and there is no s(i)i, k < i ≤ n, x ∈ DEF(s(i)i) in EETG.

3. Mark all test nodes between s(k)k and s(n)n which evaluate to the boolean value B and where x ∈ PR(s(t), B) with k < t < n.

4. Traverse the graph EETG from the marked nodes in the reverse direction of the arrows until no new nodes can be marked.

5. Let S be the set of all marked nodes.

6. Return the set {s(i)|s(i)i ∈ S} as result.

There are only two changes to the original dynamic slicing algorithm in order to obtain the RelevantSlice algorithm:

• Instead of the execution trace graph, we are now using the extended execution trace graph (see Step 1).

• Test nodes that potentially re-define the variable of interest after its last definition in the execution trace are marked (see Step 3).

Please note that the statement in Line 4 of Example 3 becomes part of the slice because of Step 3 of the algorithm.
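A sketch of these two modifications on top of the dynamic slicing sketch above; the potential data dependence edges and the PR sets per trace position (for the value the condition actually took) are assumed to be given.

    def relevant_slice(trace, edges, pot_edges, DEF, PR, n, x):
        # edges: data/control edges of the ETG; pot_edges: potential data dependence
        # edges; PR[t]: potential relevant variables of the test element at trace
        # position t for the value its condition actually took (empty otherwise).
        all_edges = edges | pot_edges                 # Step 1: extended ETG
        defs = [i for i in range(n + 1) if x in DEF[trace[i]]]
        start = set()
        if defs:
            k = max(defs)                             # Step 2: last definition of x
            start = {k}
            start |= {t for t in range(k + 1, n) if x in PR.get(t, set())}   # Step 3
        marked, worklist = set(start), list(start)
        while worklist:                               # Step 4: backward traversal
            j = worklist.pop()
            for (i, j2) in all_edges:
                if j2 == j and i not in marked:
                    marked.add(i)
                    worklist.append(i)
        return {trace[i] for i in marked}             # Steps 5 and 6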


Continuation of Example 4

From the extended execution trace graph, we use the RelevantSlice algorithm to obtain thefollowing slice for the program.

1  begin
2    i = 1;            // BUG: Should be i = 0;
3    result = 0;
4    if (y < 0) then
5    begin
6      sig = -1;
7    end;
8    fi;
9    y = y * sig;
10   while (i < y) do
11   od;
12   result = result * sig;
13   ...
14 end;

In this example, only the node (result = result * sig)20 is marked. There is no node to mark in Step 3. Please note that the statement in Line 14 becomes part of the slice because of the potential data dependency (14. while (i<y))6 →P (20. result=result*sig;)7. The statement in Line 2 becomes part of the slice since there is a data dependency to Line 14 and Line 14 is already part of the slice.

3.2.3.4 Conclusion

In this section, we introduced the ideas behind dynamic slicing, its basic definitions and algorithms. We discussed why dynamic slices are usually smaller than static slices and showed how to extend the approach in order to be useful for debugging. In addition, we learned about relevant slicing. Relevant slicing is an advancement of dynamic slicing where dependencies that are not revealed during a program run are also considered for the computation of slices. This extension avoids losing information. Therefore, relevant slicing leads to slices comprising the faulty statements, which is not guaranteed when relying on the original definitions of dynamic slicing. In addition, we have learned how to obtain executable dynamic slices.


3.2.4 Forward slicing

Up to now, we have learned how to compute a backward slice. Thus, we can compute the statements which influence a variable. We can use backward slicing for the identification of faulty statements. If we want to correct these statements, it is desirable to compute the variables which are influenced by the statement. We call the task of computing the variables which are influenced by a statement forward slicing. Forward slices describe possible effects of one statement on the rest of the program.

How can we compute a forward slice for a variable v at Line n? We could compute a backward slice for all variables after Line n and check if v is in the slice. Obviously, this approach is too time consuming. We need another approach which computes the slice forward: We construct a control flow graph of the code fragment and use the flow propagation algorithm (see Section 3.1) in order to compute the affected variables. We define gen(n) and kill(n) as follows:

gen(n)  = {v | v ∈ def(n) ∧ (ref(n) ∩ in(n) ≠ {} ∨ inSlice(n))}
kill(n) = {v | v ∈ def(n)}

A variable v is added

• if v is defined in the statement n and the statement n is already in the forward slice or

• if v is defined in the statement n and a relevant variable of a predecessor node is referenced(v2 ∈ REF (n) ∧ v2 ∈ in(n)).

The forward slicing approach works as follows: First, we set the logical value inSlice(n) for all nodes to false, except for the node for which the forward slice should be calculated. Afterwards, we compute the relevant variables with the help of the flow propagation algorithm. We set the logical value inSlice(n) to true for all nodes n where ref(n) ∩ out(n) ≠ {}. We set inSlice(m) to true for all nodes m which are in the influence of n (m ∈ INFL(n)) if inSlice(n) is true. The process is repeated until there are no more changes.
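The following Python sketch illustrates this fixpoint computation under the assumption that the control flow graph is given as predecessor sets together with the def, ref and INFL sets per node; the function and variable names are illustrative only.

def forward_slice(nodes, preds, DEF, REF, INFL, criterion):
    in_slice = {n: (n == criterion) for n in nodes}

    changed = True
    while changed:
        # Flow propagation: out(n) collects variables whose values may still
        # carry the influence of the slicing criterion.
        out = {n: set() for n in nodes}
        stable = False
        while not stable:
            stable = True
            for n in nodes:
                in_n = set().union(*(out[p] for p in preds[n])) if preds[n] else set()
                gen = {v for v in DEF[n] if (REF[n] & in_n) or in_slice[n]}
                new_out = gen | (in_n - set(DEF[n]))
                if new_out != out[n]:
                    out[n], stable = new_out, False

        # Mark nodes that reference an influenced variable, and nodes that
        # are in the influence of an already marked test node.
        changed = False
        for n in nodes:
            if not in_slice[n] and REF[n] & out[n]:
                in_slice[n] = changed = True
        for n in nodes:
            if in_slice[n]:
                for m in INFL[n]:
                    if not in_slice[m]:
                        in_slice[m] = changed = True

    return {n for n in nodes if in_slice[n]}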

Example 1

We want to compute the forward slice for Statement 3 for the following code fragment:

1  begin
2    r = 0;
3    i = 0;
4    while (i < x) do
5    begin
6      r = r + y;
7      i = i + 1;
8    end
9    od
10   ...
11 end

First, we compute the control flow graph:


After the initialization, the graph looks like this:

Afterwards, we compute the relevant variables:

We continue setting all inSlice(n) to true

• where ref(n) ∩ out(n) ≠ {} - in our example the statements 4 and 7

• where n ∈ INFL(m) and inSlice(m) is true - in our example statement 6.


After the first iteration of the algorithm, the graph looks like this:

Since there have been some changes in inSlice, we have to do a second iteration and compute the relevant variables:

We continue setting all inSlice(n) to true where ref(n) ∩ out(n) ≠ {} and where n ∈ INFL(m) and inSlice(m) is true. Since no logical values of inSlice change, we are finished. The forward slice for statement 3 is {3, 4, 6, 7}.


Example 2

If we apply this approach to statement 2 we obtain the following relevant variables after the first iteration:

We continue with setting inSlice(6) to true because ref(6) ∩ out(6) = {r}:

In the next iteration, no logical values of inSlice change. We obtain {2, 6} as result.

Summary of the necessary steps to compute a forward slice

1. Set inSlice to true for the statement of interest, set inSlice to false for all other statements.

2. Use the flow propagation algorithm to compute the relevant variables.

3. Set inSlice(n) to true for all nodes where ref(n) ∩ out(n) ≠ {}.

4. If inSlice(n) is true for a node n set inSlice(m) to true for all m ∈ INFL(n).

5. If inSlice has changed for at least one node continue with step 2.


3.2.5 Hitting sets

In practice, slices are often very large. If an error influences the values of several variables, a slice can be too voluminous. Therefore, we have to use a technique that reduces the number of statements that a programmer has to investigate. Consider the following example of calculating features of a circle:

1  begin
2    r = d / 1;
3    c = 2 * r * 3.14;
4    a = r * r * 3.14;
5    ...
6  end

Obviously, the program is erroneous in Line 2. One of our test cases (d = 2, c = 6.28, a = 3.14) detects the error, because a and c do not correspond to their expected results. We calculate the slice for the slicing criterion (5, {a, c}) (all variables with wrong values) and get as result S = {2, 3, 4}.

It is better to calculate a separate slice for each variable and then combine the slices. If we compute two separate slices for our example we obtain the slices S(5,{c}) = {2, 3} and S(5,{a}) = {2, 4}. Statement 2 is contained in both slices. Thus statement 2 is the minimal cause for the malfunction of both variables. Let us define a more general algorithm for the computation of minimal explanations.

Conflict: A conflict for a given program and a given test case is a slice (n, {x}) for exactly one variable x whose computed value does not correspond to the value expected by the test case.

Conflict set: The conflict set CO is the set of all conflicts.

Hitting set: A hitting set is a set which contains at least one element of every conflict. The hitting set h for a conflict set CO is a subset of ⋃_{C∈CO} C for which ∀C ∈ CO : C ∩ h ≠ {} holds.

Minimal hitting set: A hitting set h is minimal if there exists no set h′ ⊂ h which is a valid hitting set as well.

Let us consider our example with the conflict set CO = {{2, 3}, {2, 4}}.

• {2} is a hitting set because: {2} ∩ {2, 3} ≠ {} ∧ {2} ∩ {2, 4} ≠ {}

• {3} is not a hitting set because: {3} ∩ {2, 4} = {}

• {3, 4} is a hitting set because: {3, 4} ∩ {2, 3} ≠ {} ∧ {3, 4} ∩ {2, 4} ≠ {}

• {2, 3, 4} is a hitting set because: {2, 3, 4} ∩ {2, 3} ≠ {} ∧ {2, 3, 4} ∩ {2, 4} ≠ {}.

{2} and {3, 4} are minimal hitting sets. {2, 3, 4} is not minimal because {2} ⊂ {2, 3, 4} and {3, 4} ⊂ {2, 3, 4}, respectively. Both minimal hitting sets are possible causes for the malfunction. In the first hitting set ({2}), a single statement is assumed to be the cause, while in the second hitting set ({3, 4}), multiple errors are the cause.

For this tiny example, the minimal hitting sets can be computed intuitively. However, the computation gets more complicated for a larger conflict set. We need an algorithm which helps to determine the minimal hitting sets: We can create a directed graph for our example which contains all possible minimal hitting sets. We create the root node and choose the conflict c from CO with the lowest cardinality. We create a child node for each element in the conflict. We label the edges with the corresponding element:


Afterwards, we work on each new node n: We compute the set of all edge labels from the root node to n (h(n)) and search whether there exists an element c ∈ CO where h(n) ∩ c = {}. If there exists such a c, we add the elements of c and continue recursively; otherwise we label the node n as finished (√). We finish when all nodes are marked as finished:

The above described algorithm does not always deliver minimal hitting sets. The hitting set {2, 3} is not minimal because there exists the hitting set {2}. We improve our algorithm by adding a new rule: If there exists a node m which is already marked with √ and where h(m) ⊆ h(n), we mark n with ×:

More formally, we can define the algorithm (Reiter's algorithm [27]) as described in Algorithm 1. Hitting sets are not executable programs. You can use static slices and dynamic slices as conflicts. In addition, you can use the dynamic slices of different test cases.

Example

We have the test input a=10, b=-9, r1=4 and the test output s1=-1, r1=21, r2=-11 for the following code fragment:

1  begin
2    s1 = a + b;
3    if (s1 > 0) then
4    begin
5      s1 *= (-1);
6      r1 = a - b;
7    end
8    fi
9    c = a * 4;
10   r2 = r1 - c;
11   r1 = b + c;
12   ...
13 end

We obtain wrong results for the variables r1 and r2. Thus, we compute the slices for these variables (S(12,{r1}) = {9, 11} and S(12,{r2}) = {2, 3, 6, 9, 10}) and the minimal hitting sets:


Algorithm 1 HittingSets(CO)

Require: Set of conflict sets CO ordered by their cardinality (left-most is the smallest)
Ensure: Set of minimal diagnoses ∆S

 1: Let L, L′, H be the empty sets. Create the root node n with label(n) = {} and h(n) = {}. Add n to L.
 2: for all nodes n ∈ L do
 3:   From left to right search for a set C ∈ CO with C ∩ h(n) = {}.
 4:   if C exists then
 5:     for all x ∈ C do
 6:       if a previously handled node m with h(m) = h(n) ∪ {x} exists then
 7:         Generate a new arc from n to m.
 8:       else
 9:         Generate a new node n′ with h(n′) = h(n) ∪ {x} and an arc from n to n′.
10:         if a previously handled node m in H (label(m) = √) with h(m) ⊂ h(n′) exists then
11:           Close n′ and set label(n′) = ×.
12:         else
13:           Add n′ to L′.
14:         end if
15:       end if
16:     end for
17:   else
18:     if not (h(m) in H with h(m) ⊂ h(n) exists) then
19:       Set label(n) = √ (new minimal hitting set found) and add h(n) to H.
20:     end if
21:   end if
22: end for
23: if L′ ≠ {} then
24:   Set L = L′, L′ = {} and goto step 2
25: else
26:   return H
27: end if


One explanation for the error can be found in statement 9. The corrected program is:

1  begin
2    s1 = a + b;
3    if (s1 > 0) then
4    begin
5      s1 *= (-1);
6      r1 = a - b;
7    end
8    fi
9    c = a * 3;
10   r2 = r1 - c;
11   r1 = b + c;
12   ...
13 end

Summary of the steps to compute the minimal causes

1. Compute the set CV comprising all variables whose values contradict the expected values of a test case t.

2. Compute the CO set by computing the slices for all variables in CV

3. Sort the sets in CO according to their cardinality (lowest cardinality first).

4. Compute the minimal hitting sets of CO.

You can find additional literature on hitting sets in [10, 27, 34].
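For small conflict sets, the minimal hitting sets can also be computed with a few lines of code. The following Python sketch follows the breadth-first tree construction described above (a simplified variant of Reiter's algorithm without explicit arc reuse); the function name and the data representation are illustrative.

def minimal_hitting_sets(conflicts):
    conflicts = sorted((frozenset(c) for c in conflicts), key=len)
    minimal = []                  # corresponds to H (nodes labelled with a check mark)
    level = [frozenset()]         # current tree level; h(root) = {}
    while level:
        next_level = []
        for h in level:
            # Search for a conflict that is not yet hit by h.
            open_conflict = next((c for c in conflicts if not (c & h)), None)
            if open_conflict is None:
                # h hits every conflict; keep it if no smaller hitting set is known.
                if not any(m <= h for m in minimal):
                    minimal.append(h)
            else:
                for x in open_conflict:          # branch on every element of the conflict
                    child = h | {x}
                    # Pruning rule: close the branch if a known hitting set is contained.
                    if not any(m <= child for m in minimal):
                        next_level.append(child)
        level = next_level
    return [set(m) for m in minimal]

# For the circle example, the conflicts {2, 3} and {2, 4} yield {2} and {3, 4}.
print(minimal_hitting_sets([{2, 3}, {2, 4}]))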


3.2.6 Summary

In this section, we have learned about different types of slices and different techniques to compute them. In conclusion, we give a short survey of what slices can be used for:

• Debugging and program analysis

◦ Slicing potentially allows us to ignore many statements in the process of localizing a bug. If a program computes an erroneous value for a variable x, only the statements in the slice with respect to x have (possibly) contributed to the computation of the wrong value. All other statements can be ignored safely.

◦ Forward slicing can be used to examine the program parts which are affected by a statement s.

◦ Another method that can be used is program dicing. Both the hitting-set method introduced in Section 3.2.5 and program dicing process the information of several slices. Unlike the hitting-set method, the traces of positive test cases and correctly computed variables are considered by the dicing method. The basic idea is that when a program computes a correct value for variable x and an incorrect value for variable y, the bug is likely to be found in statements which are in the slice Sy but not in the slice Sx. Example: A test case computes the correct value for variable x but the wrong value for variable y. The dynamic slices for the test case are Sx = {1, 2, 3, 4} and Sy = {1, 2, 3, 5}. Statement 5 is only part of the slice of the incorrectly computed variable. Therefore, Statement 5 is more suspicious than statements 1, 2 and 3, which are contained in both slices. This approach is not failsafe because of coincidental correctness (computations that use incorrect values produce correct values).

◦ Slicing can be used to detect 'dead code' and uninitialized variables.

◦ The information of several dynamic slices can be combined to gain some insight into the location of a bug by building the union, intersection and difference of them.

• Program differencing and program integration

◦ Program differencing is the task of analyzing an old and a new version of a program in order to determine the set of program components of the new version that represent syntactic and semantic changes. The key issue in program differencing consists of partitioning the components of the old and new version in a way that two components are in the same partition only if they have equivalent behaviors.

◦ A program integration algorithm compares slices to detect equivalent behaviors.

• Software maintenance

◦ Program slicing can be used to determine whether a change at some place in a program will affect the behavior of other parts of the program.

◦ Program slicing can be used to decompose a program into a set of components where each component captures a part of the behavior of the original program.

• Testing

◦ You can define that each def-use pair must be executed in a test case and that the pair must influence the output value of the test.

◦ You can use program slicing in regression testing to determine the parts which are affected by a modification of a previously tested program.

• Optimization

◦ Dynamic slices are used in compiler tuning in order to detect potential occurrences of redundant common subexpressions, which indicate that sub-optimal code is generated.

◦ Slicing can be used to parallelize the execution of a sequential program.


3.3 Delta Debugging

In practice, we often have to face large amounts of input which cause a program to fail. It is easier to debug a program if the location of the error can be narrowed down. The code responsible for the failure must be part of the execution trace of the failure-inducing input. If this input is large, the execution trace is large too. A large execution trace is not a great help in localizing the cause of the failure. Therefore, we have to minimize the execution trace by minimizing the failure-inducing input. We need a technique which systematically downsizes the failure-inducing input so that faults can be localized and corrected more easily.

Delta debugging is a technique that uses the results of automated testing to systematically narrow down the set of failure-inducing circumstances. It is called delta debugging because it works with differences (deltas). Figure 3.5 represents the main idea of delta debugging. We start with a large failure-inducing input and divide it into two parts. If one of the parts still produces the error, we focus on that part (divide and conquer).

Figure 3.5: Narrowing down hypothesis [35]

Figure 3.6 illustrates the delta debugging process. A program and a failing test case are the inputs to a function called test. The output of this function can either be FAIL, PASS or UNRESOLVED. Depending on the output of the function, the test case is minimized by the delta debugging algorithm.

Figure 3.6: Delta debugging process


Delta debugging relies on monotony. A test is monotone if

∀c ⊆ C : test(c) = FAIL → ∀c′ ⊇ c : test(c′) ≠ PASS

holds. Delta debugging can be used in different areas, e.g. for simplifying program input, simplifying user interactions and finding failure-inducing code changes. There exist two types of delta debugging algorithms: (1) algorithms which try to minimize the input (see Section 3.3.1) and (2) algorithms which deliver one failing and one successful input differing only in a small number of characters (see Section 3.3.2). The following algorithms are based on [36–39].

3.3.1 The minimizing delta debugging algorithm

In this section, we develop the final minimization algorithm step-by-step. Only the final version (Version 3) of the algorithm is of practical value.

3.3.1.1 Version 1 - Initial Version

Assume we have “12345678” as input string which induces a crash of the software system. “7” is the input which causes the crash. We can downsize our input string by applying a binary search [36]:

ddmin(cf) =
    cf           if |cf| = 1                (1)
    ddmin(∆1)    else if test(∆1) = FAIL    (2)
    ddmin(∆2)    else if test(∆2) = FAIL    (3)

Assume test(x) is a function which returns the result of applying input x to the software under test. It returns 'PASS' if the program behaves as expected for the input x, and 'FAIL' if the input x produces the failure in the program. cf is the initial input sequence which should be minimized. ∆1 and ∆2 are substrings of the input string cf with the following characteristics:

• ∆1 ∩∆2 = ∅ (pairwise disjoint)1

• ∆1 ∪∆2 = cf (complete partitioning)2

• |∆1| ≈ |∆2| (approximately same size)

We strengthen this definition by the following restriction:

|∆1| = |∆2|+ ε with |ε| ≤ 1
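As a sketch, the Version 1 case distinction translates directly into Python; test is an illustrative stand-in for the automated test function, and the code assumes (as Version 1 does) that one of the two halves always reproduces the failure.

def ddmin_v1(cf, test):
    if len(cf) == 1:                       # rule (1)
        return cf
    half = (len(cf) + 1) // 2              # |∆1| = |∆2| + ε with |ε| ≤ 1
    d1, d2 = cf[:half], cf[half:]
    if test(d1) == "FAIL":                 # rule (2)
        return ddmin_v1(d1, test)
    if test(d2) == "FAIL":                 # rule (3)
        return ddmin_v1(d2, test)
    return cf                              # no rule applies: Version 1 gives up

# Example 1: the crash is caused by the character "7".
test = lambda s: "FAIL" if "7" in s else "PASS"
print(ddmin_v1("12345678", test))          # -> "7"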

Example 1

“12345678” is the input where a program crash can be observed. “7” is the input which causes the crash. We apply the algorithm to this example as follows:

Step  Test case  Input     Result  Rule  Action
0                12345678  FAIL          ddmin(12345678)
1     ∆1         1234      PASS
2     ∆2         5678      FAIL    (3)   ddmin(5678)
3     ∆1         56        PASS
4     ∆2         78        FAIL    (3)   ddmin(78)
5     ∆1         7         FAIL    (1)   ddmin(7) → return 7

1 ∆1 and ∆2 do not contain any of the same elements, thus their intersection is empty.
2 The union of ∆1 and ∆2 results in the initial input.


3.3.1.2 Version 2 - Dealing with interference

Our first algorithm works fine for this example. But what happens in the case of interference? In practice, programs often crash because of a combination of several inputs. Assume the application in our example crashes only when “3” and “7” are applied together. The above described algorithm is not able to deal with interference. Thus, we have to improve the algorithm [36]:

ddmin(cf) = ddmin2(cf, ∅)

ddmin2(cf, r) =
    cf                                          if |cf| = 1                    (1)
    ddmin2(∆1, r)                               else if test(∆1 ∪ r) = FAIL    (2)
    ddmin2(∆2, r)                               else if test(∆2 ∪ r) = FAIL    (3)
    ddmin2(∆1, ∆2 ∪ r) ∪ ddmin2(∆2, ∆1 ∪ r)     otherwise                      (4)

If the program does not fail on the two subsets of the failing input, the error must be caused by a combination of the inputs. r is the set of input that remains applied. First, we try to minimize ∆1 by still applying ∆2 to the test function (test(∆1 ∪ r)). Afterwards, we minimize ∆2 in the same way. Finally, we return the union of both minimizations.
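A possible transcription of this variant is sketched below. The input is modelled as a list of (position, character) pairs so that the union of one half with the remaining input r again yields a well-formed test input; all names are illustrative.

def ddmin2_interference(cf, r, test):
    if len(cf) == 1:                                           # rule (1)
        return cf
    half = (len(cf) + 1) // 2
    d1, d2 = cf[:half], cf[half:]
    if test(sorted(d1 + r)) == "FAIL":                         # rule (2)
        return ddmin2_interference(d1, r, test)
    if test(sorted(d2 + r)) == "FAIL":                         # rule (3)
        return ddmin2_interference(d2, r, test)
    # rule (4): interference – minimize each half while keeping the other applied
    return (ddmin2_interference(d1, sorted(d2 + r), test)
            + ddmin2_interference(d2, sorted(d1 + r), test))

# Example 2: the crash needs "3" and "7" together.
def test(chars):
    s = "".join(c for _, c in chars)
    return "FAIL" if "3" in s and "7" in s else "PASS"

cf = list(enumerate("12345678"))
result = ddmin2_interference(cf, [], test)
print("".join(c for _, c in sorted(result)))                   # -> "37"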

Example 2

“12345678” is the input string that reveals an error. The application crashes only when “3” and “7” are applied together. We utilize the improved algorithm for this example:

Step  Delta      r     Test input  Result  Rule     Action
0                      12345678    FAIL             ddmin2(12345678, -)
1     ∆1 = 1234        1234        PASS
2     ∆2 = 5678        5678        PASS    (4)      ddmin2(1234, 5678) ∪ ddmin2(5678, 1234)

                                                    ddmin2(1234, 5678)
3     ∆1 = 12    5678  12 5678     PASS
4     ∆2 = 34    5678  345678      FAIL    (3)      ddmin2(34, 5678)
5     ∆1 = 3     5678  3 5678      FAIL    (2),(1)  ddmin2(3, 5678) → return 3

                                                    ddmin2(5678, 1234)
6     ∆1 = 56    1234  123456      PASS
7     ∆2 = 78    1234  1234 78     FAIL    (3)      ddmin2(78, 1234)
8     ∆1 = 7     1234  1234 7      FAIL    (2),(1)  ddmin2(7, 1234) → return 7

                       3 7         FAIL

After Step 2 we have to apply rule (4), since the tests in Steps 1 and 2 pass. In Steps 3-5, we minimize ∆1 = 1234 by still applying ∆2 to the test function. In Steps 6-8, we determine the error-inducing input in ∆2. Finally, we combine the results.

The algorithm works fine as long as there is only one sub-input revealing an error. In case of several failure-inducing sub-inputs the algorithm might not return the correct minimized input. You can avoid this problem by replacing ddmin2(∆2, ∆1 ∪ r) with ddmin2(∆2, ddmin2(∆1, ∆2 ∪ r) ∪ r).


3.3.1.3 Version 3 - Dealing with inconsistency

In practice, inconsistency is often a problem. Inconsistency occurs when some parts of the input depend on other parts of the input. Example: Input a cannot be applied without applying input b. The outcome of a test applying input a without input b is unresolved (UNRES). The previous algorithm cannot handle unresolved results. Thus, we have to adapt it [39]:

ddmin(cf) = ddmin2(cf, 2)

ddmin2(cf, n) =
    ddmin2(∆i, 2)                    if ∃i : test(∆i) = FAIL         (1. reduce to subset)
    ddmin2(∇i, max(n − 1, 2))        else if ∃i : test(∇i) = FAIL    (2. reduce to complement)
    ddmin2(cf, min(|cf|, 2n))        else if n < |cf|                (3. increase granularity)
    cf                               otherwise                       (4. done)

The algorithm differs from the previous one: It does not make use of the set r. Instead, it increases the number of ∆s if the failure cannot be reproduced by the current ∆s (see rule 3), and it also uses the complements (∇) of the single ∆s.

Example 3

“12345678” is the failure-inducing input string. Inputs 2, 3 and 7 cannot be applied without each other. Input 8 reveals the bug.

Step  n  Subset/Complement    Result  Rule     Action
0        ∆ = 12345678         FAIL             ddmin2(12345678, 2)
1     2  ∆1 = 1234 (= ∇2)     UNRES
2     2  ∆2 = 5678 (= ∇1)     UNRES   (3)      increase granularity: ddmin2(12345678, 4)
3     4  ∆1 = 12              UNRES
4     4  ∆2 = 34              UNRES
5     4  ∆3 = 56              PASS
6     4  ∆4 = 78              UNRES
7     4  ∇1 = 345678          UNRES
8     4  ∇2 = 12 5678         UNRES
9     4  ∇3 = 1234 78         FAIL    (2)      reduce to complement: ddmin2(123478, 3)
10    3  ∆1 = 12              UNRES
11    3  ∆2 = 34              UNRES
12    3  ∆3 = 78              UNRES
13    3  ∇1 = 34 78           UNRES
14    3  ∇2 = 12 78           UNRES
15    3  ∇3 = 1234            UNRES   (3)      increase granularity: ddmin2(123478, 6)
16    6  ∆1 = 1               PASS
17    6  ∆2 = 2               UNRES
18    6  ∆3 = 3               UNRES
19    6  ∆4 = 4               PASS
20    6  ∆5 = 7               UNRES
21    6  ∆6 = 8               FAIL    (1),(4)  reduce to subset: ddmin2(8, 2) → return 8

Let us consider Steps 1 and 2. First, since n = 2 we split the input into two ∆s. Since the ∇s and ∆s are the same (∆1 = ∇2 and ∆2 = ∇1), we only have to perform two tests, i.e. test(∆1) and test(∆2). Second, since both of these tests return UNRES, we know that rules (1) and (2) are not applicable. Checking rule (3), we can observe that n < |cf| since n = 2 and |cf| = 8, and subsequently we increase the granularity to 2n = 4.

Steps 3 to 8 are uneventful, as all tests for each ∆ and ∇ result in either PASS or UNRES. For Step 9 the test shows FAIL, thus we can check whether any of the rules apply. Rule (1) does not apply as we have tested the complement ∇3, but rule (2) is relevant. Therefore, we continue with ∇3 and reduce it to narrow down the failure-inducing input. From Step 10 to 15 we test all ∆s and ∇s, and since all are unresolved, we have to increase the granularity (as we did in Step 2).

The last check in Step 21 reveals the bug. Since test(∆6) = FAIL we apply rule (1) and subsequently perform ddmin2(8, 2). Lastly, we can see that |cf| = 1 and n = 2, therefore rule (3) is not applicable and we are done (rule (4)).

Informally, we can describe the algorithm ddmin2(cf , n) as follows:

1. Call the algorithm with ddmin2(cf , 2)

2. Split the input cf into n ∆s

3. Test all ∆s and their ∇s

4. Examine the test results

a. If any test(∆i) leads to FAIL (∆i is still failing, thus we further split it to isolate the failure-inducing input)

   a.1 Apply rule (1) and call ddmin2(∆i, 2)

   a.2 Goto Step 2

b. Else if any test(∇i) leads to FAIL (∇i is failing, thus we further split it to isolate the failure-inducing input)

   b.1 Apply rule (2) and call ddmin2(∇i, max(n − 1, 2))

   b.2 Goto Step 2

c. Else if n < |cf| (all tests are either unresolved or passed, thus we need to increase the granularity)

   c.1 Apply rule (3) and call ddmin2(cf, min(|cf|, 2n))

   c.2 Goto Step 2

d. Otherwise return cf and we are done (rule (4))
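Putting the informal steps together, a Python sketch of Version 3 could look as follows. The input is again modelled as a list of (position, character) pairs, the splitting into n parts is approximate, and the guard for |cf| = 1 reflects the reasoning used in the example above; test and all other names are illustrative.

def ddmin(cf, test):
    return ddmin2(cf, 2, test)

def ddmin2(cf, n, test):
    # Split cf into n roughly equal parts and build their complements.
    size = (len(cf) + n - 1) // n
    deltas = [cf[i:i + size] for i in range(0, len(cf), size)]
    nablas = [[e for e in cf if e not in d] for d in deltas]

    for d in deltas:                                   # rule (1): reduce to subset
        if d and len(d) < len(cf) and test(d) == "FAIL":
            return ddmin2(d, 2, test)
    for nb in nablas:                                  # rule (2): reduce to complement
        if nb and test(nb) == "FAIL":
            return ddmin2(nb, max(n - 1, 2), test)
    if n < len(cf):                                    # rule (3): increase granularity
        return ddmin2(cf, min(len(cf), 2 * n), test)
    return cf                                          # rule (4): done

# Example 3: "2", "3" and "7" can only be applied together; "8" causes the crash.
def test(chars):
    s = "".join(c for _, c in chars)
    if ("2" in s or "3" in s or "7" in s) and not ("2" in s and "3" in s and "7" in s):
        return "UNRES"
    return "FAIL" if "8" in s else "PASS"

cf = list(enumerate("12345678"))
print("".join(c for _, c in ddmin(cf, test)))          # -> "8"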

3.3.1.4 Quality of the minimization

We want to minimize the error-producing input. Casually, we would say: an input c′f ⊆ cf is a minimum if there is no smaller subset of cf that causes the error. More formally, we distinguish [36]:

Global minimum: An input c′ ⊆ c is said to be the global minimum if and only if test(c′) = FAIL and there exists no c′′ with |c′′| < |c′| where test(c′′) = FAIL.

Local minimum: An input c′ ⊆ c is said to be a local minimum if and only if test(c′) = FAIL and there exists no c′′ ⊂ c′ where test(c′′) = FAIL. The problem of checking local and global minimality is a decision problem which is NP-complete.

n-minimal input: An input c′ ⊆ c is said to be n-minimal if and only if test(c′) = FAIL and there exists no c′′ ⊂ c′ with |c′| − |c′′| ≤ n where test(c′′) = FAIL. We want to achieve 1-minimality.

If an input consists of several failure-inducing inputs (this means the input is ambiguous), the delta debugging algorithm returns only one possibility [36]. The returned result may not be the minimal solution.


Example 4

Let us consider our example input string “12345678”. Assume “34” and “7” cause a program crash. If we apply the delta debugging algorithm we get “34” as result:

Step  n  Subset/Complement  Result  Rule  Action
0        ∆ = 12345678       FAIL          ddmin2(12345678, 2)
1     2  ∆1 = 1234          FAIL    (1)   ddmin2(1234, 2)
2     2  ∆1 = 12            PASS
3     2  ∆2 = 34            FAIL    (1)   ddmin2(34, 2)
4     2  ∆1 = 3 (= ∇2)      PASS
5     2  ∆2 = 4 (= ∇1)      PASS

         34                 FAIL    (4)   → return 34

3.3.1.5 Example of practical usage

Bugzilla entry #24735 of the Mozilla project describes the “Mozilla cannot print” bug: The browser crashes when you print a certain file (see [37] for details). The original input file has 896 lines. First, the delta debugging algorithm was used to reduce the input to one failure-inducing line <SELECT NAME="priority" MULTIPLE SIZE=7>. Finally, this line is reduced to the single statement <SELECT>.


3.3.2 The isolation differences algorithm

In some cases, it makes sense to determine a minimal difference between a failing and a successful test run because of

• Time constraints: To simplify an input to n characters, we need at least n tests because we have to verify that each character is relevant for the failure. As an alternative you can focus on the input difference between a failing and a successful test. This is in general more efficient than simplifying a large failing input [37, 38].

• Required input structure: In some cases, a special input structure is required. There can be much input which is essential but does not influence the error.

The failure-inducing difference ∆ is the minimal difference between two program inputs cs and cf with test(cs) = PASS and test(cf) = FAIL. This means cs is the succeeding program input and cf the failing program input. Thus, we want to maximize cs while minimizing cf. The following algorithm describes how to isolate a difference:

dd(cs, cf) = dd2(cs, cf, 2)

dd2(c′s, c′f, n) =
    dd2(c′s, c′s ∪ ∆i, 2)                  if ∃{i | test(c′s ∪ ∆i) = FAIL}        (1)
    dd2(c′f \ ∆i, c′f, 2)                  else if ∃{i | test(c′f \ ∆i) = PASS}   (2)
    dd2(c′s ∪ ∆i, c′f, max(n − 1, 2))      else if ∃{i | test(c′s ∪ ∆i) = PASS}   (3)
    dd2(c′s, c′f \ ∆i, max(n − 1, 2))      else if ∃{i | test(c′f \ ∆i) = FAIL}   (4)
    dd2(c′s, c′f, min(2n, |∆|))            else if n < |∆|                        (5)
    (c′s, c′f)                             otherwise                              (6)

We have two sets c′s and c′f which converge: cs ⊆ c′s ⊂ c′f ⊆ cf. The final difference ∆ between c′s and c′f is 1-minimal.

Let us look at each rule individually (and remember that we always call dd2(c′s, c′f, n) with a successful input c′s and a failing input c′f):

(1): ∃{i | test(c′s ∪ ∆i) = FAIL} → dd2(c′s, c′s ∪ ∆i, 2)

The successful input c′s together with ∆i leads to a failing test. Thus, ∆i must contain the faulty input sequence. We call dd2 with a successful input, in our case c′s, and a failing input, in our case c′s ∪ ∆i. Since we observed a failing test case and would like to minimize the faulty sequence, we reduce n to 2.

(2): ∃{i | test(c′f \ ∆i) = PASS} → dd2(c′f \ ∆i, c′f, 2)

The failing test case c′f without ∆i leads to a successful test. Thus, ∆i must contain the faulty input sequence. In order to determine the difference between the successful test case c′f \ ∆i and the failing test case c′f, we try to reduce the difference between the failing and the passing test case by setting n = 2.

(3): ∃{i | test(c′s ∪ ∆i) = PASS} → dd2(c′s ∪ ∆i, c′f, max(n − 1, 2))

The passing test case c′s with ∆i still passes. Thus, ∆i does not lead to a failing test. Again we try to find a difference between a passing test case, this time c′s ∪ ∆i, and a failing one, i.e. c′f. To reduce the difference we decrease n by 1 and ensure that the new n ≥ 2, since c′s and c′f now differ by one ∆ less.


(4): ∃{i | test(c′f \ ∆i) = FAIL} → dd2(c′s, c′f \ ∆i, max(n − 1, 2))

The failing test case c′f fails even without ∆i. Thus, c′f \ ∆i must still contain a faulty input sequence. We call the algorithm with the successful input c′s and the failing input c′f \ ∆i. Further, we decrease n by 1 and ensure that the new n ≥ 2, since c′s and c′f now differ by one ∆ less.

(5): n < |∆| → dd2(c′s, c′f, min(2n, |∆|))

This rule is applied in case none of the above can be applied, i.e. all tests were UNRES. We increase n and use the same inputs as before.

(6): otherwise → (c′s, c′f)

The stopping criterion is met and the final difference ∆ between c′s and c′f is 1-minimal.

Note that the rules should be applied in order, i.e. rule (1) has priority over rule (2), etc.

Example 1

Let us consider our example input “12345678”. Assume the program crashes if the input string contains “6”. In this example, the strings “3” and “7” must be part of the input in order to get a result different from UNRES. The test passes if the program gets the empty string as input.

Step  n  c′s      c′f        ∆         TC         Test input  Result  Rule  Action
1        -        12345678   12345678  cs         -           PASS
2                                       cf         12345678    FAIL          dd2(-, 12345678, 2)
3     2  -        12345678   12345678  c′s ∪ ∆1   1234        UNRES
4     2                                 c′s ∪ ∆2   5678        UNRES   (5)   dd2(-, 12345678, 4)
5     4  -        12345678   12345678  c′s ∪ ∆1   12          UNRES
6     4                                 c′s ∪ ∆2   34          UNRES
7     4                                 c′s ∪ ∆3   56          UNRES
8     4                                 c′s ∪ ∆4   78          UNRES
9     4                                 c′f \ ∆1   345678      FAIL    (4)
10    4                                 c′f \ ∆2   12 5678     UNRES
11    4                                 c′f \ ∆3   1234 78     PASS    (2)   dd2(123478, 12345678, 2)
12    2  123478   12345678   56        c′s ∪ ∆1   12345 78    PASS
13    2                                 c′s ∪ ∆2   1234 678    FAIL    (1)   dd2(123478, 1234678, 2)
14    2  123478   1234678    6         -          -           -       (6)   (123478, 1234678)

We start with the failing input cf = {12345678} and the passing input cs = {}. In Steps 1 and 2, we verify those inputs and call the algorithm for the first time (dd(cs, cf) = dd2(cs, cf, 2)).

The ∆, i.e. difference, between cs and cf in Step 3 is {12345678}. This ∆ is split into n = 2 parts, i.e. ∆1 = {1234} and ∆2 = {5678}. As before within the minimization algorithm, where we tested all deltas and all complements, we always check test(c′s ∪ ∆i), i.e. the succeeding input and a delta, as well as test(c′f \ ∆i), i.e. the failing input without a delta, for all ∆i. The results of test(c′s ∪ ∆1) and test(c′s ∪ ∆2) in Steps 3 and 4 are unresolved. Further, we know that test(c′f \ ∆1) and test(c′f \ ∆2) are unresolved since c′f \ ∆1 = c′s ∪ ∆2 and c′f \ ∆2 = c′s ∪ ∆1. Thus, we cannot apply rules (1) to (4). We check whether the condition for rule (5) is met, i.e. n < |∆|. Since n = 2 and |∆| = 8, we apply rule (5) and increase the granularity (dd2(c′s, c′f, min(2n, |∆|)) → n = 4).

The test input in Step 11 passes the test (∃{i | test(c′f \ ∆i) = PASS}) for ∆3, i.e. rule (2) has to be applied. Subsequently, we continue with dd2(c′f \ ∆i, c′f, 2). Please note that we call the algorithm in Step 11 and not in Step 9. If we called dd2(−, 345678, 3) in Step 9 instead of dd2(123478, 12345678, 2) in Step 11, our ∆ would be {345678} instead of {56}. To ensure that we apply the right rule, we have to check all test cases, that is all test(c′s ∪ ∆i) and all test(c′f \ ∆i).


Based on the computed outcomes we investigate whether we can apply rule (1) due to one of the tests. If not, we continue to check whether we can use rule (2), etc. Thus, the ordering of the rules is essential to ensure a minimal number of steps.

The test input in Step 12 passes the test and the test input in Step 13 fails the test. The difference between these two tests is already 1-minimal, but the algorithm calls itself with dd2(c′s, c′s ∪ ∆i, 2). Since the difference between the successful input cs and the failing input cf contains only one item, it is not possible to divide the difference into the required two parts. Thus, only the last rule, rule (6), matches and cs and cf are returned.

Informally we can describe the algorithm as follows:

1. Split the input into a passing (cs) and a failing part (cf) and call the algorithm with dd2(cs, cf, 2).

2. The difference between cs and cf is ∆, which is split into n parts.

3. Compute for each ∆i

• c′s ∪∆i and test it

4. Compute for each ∆i

• c′f \∆i and test it

5. Check according to the results of all tests which rule is applicable

• If any test(c′s ∪∆i) fails apply rule (1) and goto Step 2

• else if any test(c′f \∆i) passes apply rule (2) and goto Step 2

• else if any test(c′s ∪∆i) passes apply rule (3) and goto Step 2

• else if any test(c′f \∆i) fails apply rule (4) and goto Step 2

• else if all tests are unresolved and n < |∆| apply rule (5) and goto Step 2

• else apply rule (6), return(c′s, c′f ) and done
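The rule priorities above can be encoded directly. The following Python sketch models inputs as sets of (position, character) pairs and, as discussed for Example 1, treats a difference that cannot be split into two non-empty parts as 1-minimal (rule (6)); test and all other names are illustrative.

def dd(cs, cf, test):
    return dd2(cs, cf, 2, test)

def dd2(cs, cf, n, test):
    delta = sorted(cf - cs)
    size = (len(delta) + n - 1) // n
    deltas = [set(delta[i:i + size]) for i in range(0, len(delta), size)]
    # Only proper, non-empty parts of the difference are considered.
    parts = [d for d in deltas if d and len(d) < len(delta)]

    for d in parts:                                    # rule (1)
        if test(cs | d) == "FAIL":
            return dd2(cs, cs | d, 2, test)
    for d in parts:                                    # rule (2)
        if test(cf - d) == "PASS":
            return dd2(cf - d, cf, 2, test)
    for d in parts:                                    # rule (3)
        if test(cs | d) == "PASS":
            return dd2(cs | d, cf, max(n - 1, 2), test)
    for d in parts:                                    # rule (4)
        if test(cf - d) == "FAIL":
            return dd2(cs, cf - d, max(n - 1, 2), test)
    if n < len(delta):                                 # rule (5)
        return dd2(cs, cf, min(2 * n, len(delta)), test)
    return cs, cf                                      # rule (6)

# Example 1: the crash needs "6"; "3" and "7" are needed for a resolved run.
def test(chars):
    s = "".join(c for _, c in sorted(chars))
    if s and not ("3" in s and "7" in s):
        return "UNRES"
    return "FAIL" if "6" in s else "PASS"

cs, cf = dd(set(), set(enumerate("12345678")), test)
print("".join(c for _, c in sorted(cf - cs)))          # -> "6"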

Example 2

Let us consider the example input “12345678”. Assume the program crashes if the input string contains either “2” or “8”. In this example, the strings “1” and “7” must be part of the input in order to get a result different from UNRES. The test passes if the program gets the empty string as input.


Step  n  c′s   c′f        ∆         TC         Test input  Result  Rule  Action
1        -     12345678   12345678  cs         -           PASS
2                                    cf         12345678    FAIL          dd2(-, 12345678, 2)
3     2  -     12345678   12345678  c′s ∪ ∆1   1234        UNRES
4     2                              c′s ∪ ∆2   5678        UNRES   (5)   dd2(-, 12345678, 4)
5     4  -     12345678   12345678  c′s ∪ ∆1   12          UNRES
6     4                              c′s ∪ ∆2   34          UNRES
7     4                              c′s ∪ ∆3   56          UNRES
8     4                              c′s ∪ ∆4   78          UNRES
9     4                              c′f \ ∆1   345678      UNRES
10    4                              c′f \ ∆2   12 5678     FAIL    (4)   dd2(-, 125678, 3)
11    4                              c′f \ ∆3   1234 78     FAIL    (4)
12    4                              c′f \ ∆4   123456      UNRES
13    3  -     125678     125678    c′s ∪ ∆1   12          UNRES
14    3                              c′s ∪ ∆2   56          UNRES
15    3                              c′s ∪ ∆3   78          UNRES
16    3                              c′f \ ∆1   5678        UNRES
17    3                              c′f \ ∆2   12 78       FAIL    (4)   dd2(-, 1278, 2)
18    3                              c′f \ ∆3   12 56       UNRES
19    2  -     1278       1278      c′s ∪ ∆1   12          UNRES
20    2                              c′s ∪ ∆2   78          UNRES   (5)   dd2(-, 1278, 4)
21    4  -     1278       1278      c′s ∪ ∆1   1           UNRES
22    4                              c′s ∪ ∆2   2           UNRES
23    4                              c′s ∪ ∆3   7           UNRES
24    4                              c′s ∪ ∆4   8           UNRES
25    4                              c′f \ ∆1   2 78        UNRES
26    4                              c′f \ ∆2   1 78        FAIL    (4)   dd2(-, 178, 3)
27    4                              c′f \ ∆3   12 8        UNRES
28    4                              c′f \ ∆4   12 7        FAIL    (4)
29    3  -     178        178       c′s ∪ ∆1   1           UNRES
30    3                              c′s ∪ ∆2   7           UNRES
31    3                              c′s ∪ ∆3   8           UNRES
32    3                              c′f \ ∆1   78          UNRES
33    3                              c′f \ ∆2   1 8         UNRES
34    3                              c′f \ ∆3   1 7         PASS    (2)   dd2(17, 178, 2)
35    2  17    178        8         -          -           -       (6)   (17, 178)

Have a look at Steps 13 to 18. First, we check if there exists a ∆i where test(c′s ∪ ∆i) = FAIL (Steps 13 to 15). Since no such ∆i exists, we check if there exists a ∆i where test(c′f \ ∆i) = PASS (Steps 16 to 18). Again, we do not find such a ∆i. We continue our search by checking if there exists a ∆i where test(c′s ∪ ∆i) = PASS. For this check, we recycle the test results of Steps 13 to 15. Finally, we check if there exists a ∆i where test(c′f \ ∆i) = FAIL. By inspecting the test results of Steps 16 to 18, we find such a ∆i in Step 17. Thus, we call dd2(−, 1278, 2) and do not investigate the test result of Step 18.


3.3.2.1 Example of practical usage

Consider the following code which causes the GNU C compiler version 2.95.2 to crash [37,38]:

double mult (double z[], int n)
{
  int i, j;
  i = 0;
  for (j = 0; j < n; j++)
  {
    i = i + j + 1;
    z[i] = z[i] * (z[0] + 1.0);
  }
  return z[n];
}

GCC only crashes if “+ 1.0” is present in the code. The compiler requires much structure around the failure-inducing input in order to keep the lexical structure. The empty file can be compiled by GCC and is therefore our primary successful input (cs = {}). Our primary failing input is the above shown program (cf = {∆1, ..., ∆n}).

As a compiler needs a certain input structure, there are many other error messages which could be produced by the compiler. We define our test function test(x) to return FAIL if and only if the run crashes at the same location as the original run. It returns PASS if the program exits normally and UNRES otherwise. Now, the above described algorithm can be applied.


3.4 Object Flow Graph

There are two types of analysis techniques: data flow analysis techniques and control flow analysis techniques. We have already learned about control flow graphs, which are used in control flow analysis. Now, we will learn about a technique used in data flow analysis: the object flow graph (OFG). Control flow analysis is a popular analysis technique for imperative programs, while object flow analysis is used for analyzing object-oriented programs.

The OFG describes the flow of objects in an object-oriented program. It allows tracing the flow of information about an object from the object creation through object assignments to variables, up until the storage of objects in class fields or their usage in method invocations. OFGs can be used to improve class diagrams.

Figure 3.7 shows the basic steps in the transformation of Java source code to an OFG. First, the Java program is translated into an abstract program. This step is explained in Section 3.4.1. Afterwards, the abstract program is transformed into an OFG as explained in Section 3.4.2. The description of both parts is from the book Reverse Engineering of Object Oriented Code [32].

Figure 3.7: Projection of the original program to the OFG

3.4.1 Language abstraction

In the first step, the Java language is simplified into an abstract language. This abstract language is data-flow sensitive, but control-flow insensitive. All Java instructions that refer to data-flows are properly represented in the abstract language while control-flow statements (e.g. conditionals, loops) are ignored.

The name resolution of the abstract language is simplified. All identifiers are represented by the fully scoped name instead of being enclosed by packages, classes and methods. In addition, we ignore the type of variables and the type of the return value of methods.

The abstract language consists of zero or more declarations (D) followed by zero or more statements (S). A declaration can be made for a class attribute (a), a method (m) or a class constructor (cs). f1, ..., fk are formal parameters, while a1, ..., ak are actual parameters. x and y are program locations (local variables, class attributes, method parameters). c is a class name. m.return is the return value of a method and E is an expression (e.g. a local variable, a class attribute, a new class). If the expression results in a primitive data type in the original Java program, the return statement will not be taken over into the abstract program.

P := D* S*

D := a
   | m(f1, ..., fk)
   | cs(f1, ..., fk)

S := x = new c(a1, ..., ak);
   | x = y;
   | [x =] y.m(a1, ..., ak);
   | m.return = E;

In summary, we have to discard control flow statements and primitive data types and resolve identifier names. The exact transformation of statements and declarations is explained below. We use the following example to demonstrate the transformation for each type of declaration and statement.


Example 1

 1 class Library {
 2   Collection loans = new LinkedList();
 3   HashMap<Integer, User> users = new HashMap<Integer, User>();
 4
 5   public Library(){
 6     ...
 7   }
 8
 9   private void addLoan(Loan loan){
10     if (loan == null) return;
11     User user = loan.getUser();
12     Document doc = loan.getDocument();
13     loans.add(loan);
14     ...
15   }
16
17   public boolean borrowDocument(User user, Document doc){
18     if (user == null || doc == null) return false;
19     Document document = doc;
20     if (user.numberOfLoans() < MAX_NUMBER_OF_LOANS &&
21         doc.isAvailable() && doc.authorizedLoan(user)) {
22       Loan loan = new Loan(user, doc);
23       addLoan(loan);
24       return true;
25     }
26     return false;
27   }
28
29   public User getUser(int userCode){
30     User user = users.get(new Integer(userCode));
31     return user;
32   }
33
34   public void addUser(Integer userCode, User user){
35     users.put(userCode, user);
36   }
37 }

Declarations

• Attribute declarations
  An attribute declaration consists of the fully scoped name a of the attribute. The fully scoped name consists of a dot-separated list of packages, followed by a dot-separated list of classes, followed by the attribute identifier.
  Example 1, Line 2: Collection loans → Library.loans

• Method declarations
  A method declaration consists of the fully scoped method name m, followed by the list of formal parameters f1, ..., fk. Each formal parameter fi has m (the fully scoped method name) as prefix and the parameter identifier as dot-separated suffix.
  Example 1, Line 17: public boolean borrowDocument(User user, Document doc)
  → Library.borrowDocument(Library.borrowDocument.user, Library.borrowDocument.doc)

• Constructor declarations
  Constructors have an abstract syntax similar to that of methods.
  Example 1, Line 5: public Library() → Library.Library()


Statements

• Allocation statements
  x and y have the same structure as the formal parameters (dot-separated package/class prefix, followed by a method identifier, followed by a variable identifier). Chains of attribute accesses are replaced by the last field only, fully scoped (e.g. a.b.c → B.c assuming b of class B). The actual parameters a1, ..., ak are also program locations. this (representing a pointer to the current object) and return (representing the return value of a method) are special values of the program location.
  Example 1, Line 22: Loan loan = new Loan(user, doc);
  → Library.borrowDocument.loan = new Loan.Loan(Library.borrowDocument.user, Library.borrowDocument.doc);

• Assignment statements
  Example 1, Line 19: Document document = doc;
  → Library.borrowDocument.document = Library.borrowDocument.doc;

• Method invocations

  ◦ Method invocation on another object
    This type of method invocation consists of the fully scoped object, followed by a dot and the fully scoped method.
    Example 1, Line 12: Document doc = loan.getDocument();
    → Library.addLoan.doc = Library.addLoan.loan.Loan.getDocument();

  ◦ Invocation of a method of the same object
    This type of method invocation starts with the fully scoped caller method, followed by .this. and ends with the fully scoped called method.
    Example 1, Line 23: addLoan(loan);
    → Library.borrowDocument.this.Library.addLoan(Library.borrowDocument.loan);

• Return statements
  Example 1, Line 31: return user;
  → Library.getUser.return = Library.getUser.user;

Summary of the necessary steps to create an abstract program

1. Replace all variables through their unique names

• package.class.name for class and instance variables

• package.class.method.name for formal parameters and local variables

• package.class.this for references to the current instance or class respectively

2. Replace all methods and constructors through their unique names

• package.class.method for methods

• package.class.constructor for constructors

3. Remove all variables that have primitive data types

4. Remove all characters from the program except

• Declarations

• Statements

Constants (even null) and characters for structuring the program are removed. If a statement is incomplete after deleting constants (e.g. the right side is missing), the whole statement has to be removed.


3.4.2 Graph creation

An object flow graph is a directed graph (N, E) with a set of nodes N and a set of edges E. A node is added to the OFG for each program location. The edges define the object flow.

When a constructor or a method is invoked, edges are added which connect each actual parameter ai to the respective formal parameter fi. In case of constructor invocation, the newly created object (c.this) is paired with the left hand side x of the related assignment. In case of method invocation, the target object y becomes m.this inside the called method, generating the edge (y, m.this), and the value returned by method m flows to the left hand side x (m.return, x).

• x = new c(a1, ..., ak);

  ◦ (a1, f1) ∈ E, ..., (ak, fk) ∈ E
  ◦ (c.this, x) ∈ E

  Example: Lib.borrowDoc.loan = new Loan.Loan(Lib.borrowDoc.user, Lib.borrowDoc.doc)
  → (Lib.borrowDoc.user, Loan.Loan.user)
  → (Lib.borrowDoc.doc, Loan.Loan.doc)
  → (Loan.Loan.this, Lib.borrowDoc.loan)

• [x =] y.m(a1, ..., ak);

  ◦ (y, m.this) ∈ E
  ◦ (a1, f1) ∈ E, ..., (ak, fk) ∈ E
  ◦ (m.return, x) ∈ E

• x = y;

  ◦ (y, x) ∈ E

  Example: Loan.user = Loan.Loan.user
  → (Loan.Loan.user, Loan.user)
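As an illustration of how these rules could be applied mechanically, the following Python sketch derives OFG edges from abstract statements that are assumed to be already parsed into a small tuple encoding; this encoding is our own and not taken from the book.

def ofg_edges(statements):
    edges = set()
    for stmt in statements:
        kind = stmt[0]
        if kind == "new":                       # x = new c(a1, ..., ak);
            _, x, c, formals, actuals = stmt
            edges.update(zip(actuals, formals)) # (ai, fi) edges
            edges.add((c + ".this", x))         # newly created object flows to x
        elif kind == "call":                    # [x =] y.m(a1, ..., ak);
            _, x, y, m, formals, actuals = stmt
            edges.add((y, m + ".this"))         # target object becomes m.this
            edges.update(zip(actuals, formals))
            if x is not None:
                edges.add((m + ".return", x))   # return value flows to x
        elif kind == "assign":                  # x = y;
            _, x, y = stmt
            edges.add((y, x))
    return edges

# The allocation in Line 22 of Example 1, in the encoding assumed above:
stmts = [("new", "Library.borrowDocument.loan", "Loan.Loan",
          ["Loan.Loan.usr", "Loan.Loan.doc"],
          ["Library.borrowDocument.user", "Library.borrowDocument.doc"])]
for edge in sorted(ofg_edges(stmts)):
    print(edge)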

3.4.3 Containers

Containers are used to store an arbitrary number of objects of the same type (e.g. the Java class Vector). Each container class has at least one method to add objects (add, insert, ...) and one method to remove objects (remove, delete, ...) or to get objects (extract, ...). If we used the source code of container classes, the OFG would grow significantly and we would obtain more information than required. In addition, the source code of container classes is often not available. In order to avoid these problems, we use simplifications for container classes:

• We replace c.insert(o) with c = o. This ensures the object flow from o to the container c.

• We replace o = c.extract() with o = c.

Similar adaptations can be made for other container classes and their methods.


Continuation of Example 1

The code snippets of Example 1 result in the following abstract program:

 2 Library.loans = new LinkedList.LinkedList();
 3 Library.users = new HashMap.HashMap();
 5 Library.Library()
 9 Library.addLoan(Library.addLoan.loan)
11 Library.addLoan.user = Library.addLoan.loan.Loan.getUser()
12 Library.addLoan.doc = Library.addLoan.loan.Loan.getDocument()
13 Library.loans = Library.addLoan.loan
17 Library.borrowDocument(Library.borrowDocument.user, Library.borrowDocument.doc)
19 Library.borrowDocument.document = Library.borrowDocument.doc
20 Library.borrowDocument.user.User.numberOfLoans()
21 Library.borrowDocument.doc.Document.isAvailable()
21 Library.borrowDocument.doc.Document.authorizedLoan(Library.borrowDocument.user)
22 Library.borrowDocument.loan = new Loan.Loan(Library.borrowDocument.user, Library.borrowDocument.doc)
23 Library.borrowDocument.this.Library.addLoan(Library.borrowDocument.loan)
29 Library.getUser()
30 Library.getUser.user = Library.users
31 Library.getUser.return = Library.getUser.user
34 Library.addUser(Library.addUser.userCode, Library.addUser.user);
35 Library.users = Library.addUser.userCode;
35 Library.users = Library.addUser.user

This is the resulting OFG. Please note that not all method declarations are available, so the formal parameters are abstracted by a placeholder.


Example 2

Consider the following code fragment:

 1 class Toy {
 2   Object id;
 3   public Toy() {
 4     id = null;
 5   }
 6   public void name(Object name) {
 7     id = name;
 8   }
 9   public static void main(String args[]) {
10     Toy t = new Toy();
11     t.name("PSP");
12   }
13 }

It results in the following abstract program:

 2 Toy.id
 3 Toy.Toy()
 6 Toy.name(Toy.name.name)
 7 Toy.id = Toy.name.name;
 9 Toy.main(Toy.main.args)
10 Toy.main.t = new Toy.Toy();
11 Toy.main.t.Toy.name();

This is the resulting OFG:


Example 3

 1 public class BinaryTreeNode{
 2   BinaryTreeNode left, right;
 3   Comparable obj;
 4   public BinaryTreeNode(Comparable x) {
 5     obj = x;
 6   }
 7   ...
 8 }
 9 public class ToyCollection {
10   static Set toys = new TreeSet();
11   ...
12   public static void addToy(Toy t) {
13     BinaryTreeNode n = new BinaryTreeNode(t);
14     toys.insert(n);
15   }
16   ...
17   public static void main(String args[]) {
18     ...
19     Toy t = new Toy("PS2");
20     addToy(t);
21     ...
22   }
23 }
24 class Toy implements Comparable {
25   Object id;
26   public Toy() {
27     id = null;
28   }
29   public void name(Object name) {
30     id = name;
31   }
32 }

It results in the following abstract program:

 2 BinaryTreeNode.left, BinaryTreeNode.right
 3 BinaryTreeNode.obj
 4 BinaryTreeNode.BinaryTreeNode(BinaryTreeNode.BinaryTreeNode.x)
 5 BinaryTreeNode.obj = BinaryTreeNode.BinaryTreeNode.x;
10 ToyCollection.toys = new TreeSet.TreeSet();
12 ToyCollection.addToy(ToyCollection.addToy.t)
13 ToyCollection.addToy.n = new BinaryTreeNode.BinaryTreeNode(ToyCollection.addToy.t);
14 ToyCollection.toys = ToyCollection.addToy.n;
17 ToyCollection.main(ToyCollection.main.args)
18 ...
19 ToyCollection.main.t = new Toy.Toy();
20 ToyCollection.main.this.ToyCollection.addToy(ToyCollection.main.t);
25 Toy.id
26 Toy.Toy()
29 Toy.name(Toy.name.name)
30 Toy.id = Toy.name.name;


This is the resulting OFG:

Figure 3.8: OFG for example 3


Example 4

Consider the following code fragment:

 1 class Course {
 2   String name;
 3   List<Student> students;
 4   int iterator = 0;
 5
 6   public Course(String name) {
 7     this.name = name;
 8     students = new ArrayList<Student>();
 9   }
10   public void addStudent(Student student) {
11     students.add(student);
12   }
13   public Student getNextStudent(){
14     if (!(iterator < students.size()))
15       return null;
16     Student student = students.get(iterator);
17     iterator++;
18     return student;
19   }
20   public String toString(){
21     String result = name;
22     while (true){
23       Student student = getNextStudent();
24       if (student == null)
25         break;
26       result += student.toString();
27     }
28     return result;
29   }
30   public static void main(String args[]) {
31     Course course = new Course("Software maintenance");
32     Student otto = new Student();
33     course.addStudent(otto);
34     String courseDescription = course.toString();
35     System.out.println(courseDescription);
36   }
37 }

It results in the following abstract program:

 2 Course.name
 3 Course.students
 6 Course.Course(Course.Course.name)
 7 Course.name = Course.Course.name;
 8 Course.students = new ArrayList.ArrayList();
10 Course.addStudent(Course.addStudent.student)
11 Course.students = Course.addStudent.student;
13 Course.getNextStudent()
14 Course.students.List.size();
16 Course.getNextStudent.student = Course.students;
18 Course.getNextStudent.return = Course.getNextStudent.student;
20 Course.toString()
21 Course.toString.result = Course.name;
23 Course.toString.student = Course.toString.this.Course.getNextStudent();
26 Course.toString.result = Course.toString.student.Student.toString();
28 Course.toString.return = Course.toString.result;


30 Course.main(Course.main.args)
31 Course.main.course = new Course.Course();
32 Course.main.otto = new Student.Student();
33 Course.main.course.Course.addStudent(Course.main.otto);
34 Course.main.courseDescription = Course.main.course.Course.toString();

This is the resulting OFG:


3.4.4 Object sensitivity

We map class attributes, method names and program locations to the class scope. We are able to distinguish two locations (e.g. two class attributes) if they belong to different classes, but we cannot distinguish between them if they belong to the same class but to different class instances (objects). Different instances of a class are not distinguished. Therefore, the OFG is imprecise. We call this type of OFG an object-insensitive OFG.

An object-sensitive OFG can be built by giving the objects an object scope instead of a class scope, while static attributes and static methods maintain the class scope. The construction of object-sensitive OFGs is more complicated, but the finer-grained edge construction results in a more precise propagation of information along the data flows.

Have a look at the following example. Can you answer the question "What is the exact type of the object doc in Line 8?" by using an object-insensitive OFG?

Example 5

1  class Main {
2    public static void main() {
3      User u1 = new User("J. Smith", "", "");
4      Document d1 = new Article("Very interesting article");
5      Loan l1 = new Loan(u1, d1);
6      Document d2 = new Book("Very interesting book");
7      Loan l2 = new Loan(u1, d2);
8      Document doc = l1.getDocument();
9    }

10 }
11 class Loan {
12   User user;
13   Document document;
14   public Loan(User usr, Document doc) {
15     user = usr;
16     document = doc;
17   }
18   public Document getDocument() {
19     return document;
20   }
21 }
22 class Article extends Document { ... }
23 class Book extends Document { ... }

Object-insensitive OFG

It results in the following object-insensitive abstract program:

2 Main.main()
3 Main.main.u1 = new User.User( , , );
4 Main.main.d1 = new Article.Article( );
5 Main.main.l1 = new Loan.Loan(Main.main.u1, Main.main.d1);
6 Main.main.d2 = new Book.Book( );
7 Main.main.l2 = new Loan.Loan(Main.main.u1, Main.main.d2);
8 Main.main.doc = Main.main.l1.Loan.getDocument();

12 Loan.user
13 Loan.document
14 Loan.Loan(Loan.Loan.usr, Loan.Loan.doc)
15 Loan.user = Loan.Loan.usr
16 Loan.document = Loan.Loan.doc
18 Loan.getDocument()
19 Loan.getDocument.return = Loan.document;


This is the resulting object-insensitive OFG:

Figure 3.9: Object-insensitive OFG for Example 5

By using this object-insensitive graph, we are only able to answer the question with "The type of the object doc in Line 8 could either be Book or Article." This answer is often not sufficient. Object-sensitive OFGs provide a remedy here.


Object-sensitive OFG

In the object-sensitive version, you have to replace the class scopes with object scopes. Therefore, the statement

19 Loan.getDocument.return = Loan.document;

is replaced with the following two statements, because there exist two objects of the type Loan:

19 Loan1.getDocument.return = Loan1.document;
19 Loan2.getDocument.return = Loan2.document;

This is the resulting object-sensitive OFG:

Figure 3.10: Object-sensitive OFG for Example 5

With the object-sensitive graph, we can answer the question with "The exact type of the object doc in Line 8 is Article."


3.5 Class Diagram Recovery

A class diagram shows the static structure of a system. The most relevant features - attributes and methods - are provided together with their properties (visibility, type, ...) and the relationships between the classes. It is a very informative summary of many design decisions about the system's organization. The information about internal class features can be obtained directly by analyzing the syntax of the source code (see [11] for a detailed description).

Figure 3.11: Reverse engineering of inter-class relationships [11]

There are several types of inter-class relationships; a short Java sketch after the list shows how each of them typically appears in source code:

• Access relationship
There are two types of access relationships: aggregation and composition. A class is related to another class by an aggregation if the latter is part of the former. Composition is a stronger form of aggregation where the parts and the whole have the same lifetime.

• Association
Two classes are connected by a (bidirectional) association if there is a possibility to navigate from an object instantiating the first class to an object instantiating the second class (and vice versa).

• Dependency
Dependency is a weak relationship among classes. A dependency holds between two classes if any change in one class might affect the dependent class. A typical dependency is a class that uses resources from another class.

• Generalization
The subclass inherits the features (attributes and methods) of its superclass. The subclass can add further features or redefine inherited methods (overriding).

• Realization
A class A realizes an interface B if class A implements all methods declared in interface B.
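The following minimal Java sketch illustrates how these relationships typically surface in source code. The class and interface names (Printable, Publication, Leaflet, Shelf, Library) are purely hypothetical and not taken from the examples above:

import java.util.ArrayList;
import java.util.List;

interface Printable {                          // realization target
    void print();
}

class Publication { }                          // generalization target

class Leaflet extends Publication              // generalization: Leaflet --|> Publication
        implements Printable {                 // realization:    Leaflet ..|> Printable
    public void print() { System.out.println("leaflet"); }
}

class Shelf {
    // access relationship (aggregation): Leaflets are stored in an attribute of Shelf
    private List<Leaflet> items = new ArrayList<Leaflet>();
    public void add(Leaflet l) { items.add(l); }
}

class Library {
    private Shelf shelf = new Shelf();         // association: attribute of another class type

    // dependency: Leaflet only appears as a parameter, i.e. a used resource
    public void archive(Leaflet l) { shelf.add(l); }
}

In a declaration-based recovery, extends/implements yield generalization/realization edges, attribute types yield associations (or aggregations for container attributes), and parameter or local variable types yield dependencies.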


Aggregation, association and dependency relationships are displayed in a class diagram to indicate that a class has access to resources (attributes or operations) from another class.

The reverse engineering of inter-class relationships is explained in Figure 3.11. Generalization and realization can easily be determined by looking for the keywords extends and implements. The declared type of the program locations (attributes, local variables, method parameters) is used to determine associations and dependencies.
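As a rough illustration of the keyword-based part of this recovery, the following sketch scans a single declaration line with a regular expression. This is a deliberate simplification (the class RelationScanner and its pattern are illustrative, not part of any tool described here); a real recovery tool works on a parsed syntax tree rather than on raw text:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelationScanner {
    // Very naive pattern for declarations like "class A extends B implements C, D".
    private static final Pattern CLASS_DECL = Pattern.compile(
        "class\\s+(\\w+)(?:\\s+extends\\s+(\\w+))?(?:\\s+implements\\s+([\\w,\\s]+))?");

    public static void scan(String line) {
        Matcher m = CLASS_DECL.matcher(line);
        if (!m.find()) return;
        String cls = m.group(1);
        if (m.group(2) != null)                            // generalization
            System.out.println(cls + " --|> " + m.group(2));
        if (m.group(3) != null)                            // realization(s)
            for (String itf : m.group(3).split(","))
                System.out.println(cls + " ..|> " + itf.trim());
    }

    public static void main(String[] args) {
        scan("class Book extends Document implements Printable {");
    }
}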

Problems with inter-class relationship recovery

It is quite typical that the declared type is the root of a sub-tree in the inheritance hierarchy or an interface. Containers often have Object as members; binary trees, for example, work with objects which implement the interface Comparable. If the application program uses only a portion of the inheritance sub-tree, the target of the association/dependency is inaccurate. There might be a mismatch between the type declared for a program location (super classes and interfaces) and the actual types of the objects that are possibly assigned to such a location. In these cases, a precise recovery of the class diagram can be achieved by determining the types of the actually allocated objects.
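A minimal sketch of this problem, reusing the class name Toy from Example 3 together with a hypothetical container class ToyBox: the declared element type of the container is Object, although only Toy objects are ever stored in it.

import java.util.ArrayList;
import java.util.List;

class Toy { }

class ToyBox {
    // Declared element type is Object, so a recovery that relies only on
    // declared types reports the association ToyBox -> Object (too coarse).
    private List<Object> items = new ArrayList<Object>();

    public void add(Toy t) {
        // The objects actually stored are always Toys; tracking the allocated
        // types (e.g. with the OFG) refines the association to ToyBox -> Toy.
        items.add(t);
    }
}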

We can improve the class diagram by using the generic flow propagation algorithm described in Section 3.1. All gen sets are empty except those of nodes of the type x.this (constructors), for which gen[x.this] = {x}. All kill sets are empty.

Assume we want to know whether an instance of a class is stored directly or indirectly in a certain variable during program execution. We can answer that question by marking the node which represents the object and then propagating the marking through the graph.
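The following sketch shows what this propagation can look like in code, assuming (purely for illustration) that the OFG is given as adjacency lists of node names and that the gen/out sets contain class names. The node names in main mirror a simplified fragment of the Example 3 OFG; this is a sketch of the generic flow propagation idea, not the exact implementation used in the literature:

import java.util.*;

public class FlowPropagation {

    // out[n] = gen[n] united with the out sets of all predecessors of n
    // (the kill sets are empty for the class diagram recovery described above).
    public static Map<String, Set<String>> propagate(
            Map<String, List<String>> successors,
            Map<String, Set<String>> gen) {

        Map<String, Set<String>> out = new HashMap<>();
        for (String node : successors.keySet())
            out.put(node, new HashSet<>(gen.getOrDefault(node, Collections.emptySet())));

        boolean changed = true;
        while (changed) {                                  // iterate until a fixpoint is reached
            changed = false;
            for (String node : successors.keySet()) {
                Set<String> in = new HashSet<>();
                for (Map.Entry<String, List<String>> e : successors.entrySet())
                    if (e.getValue().contains(node))       // e.getKey() is a predecessor of node
                        in.addAll(out.get(e.getKey()));
                Set<String> newOut = new HashSet<>(gen.getOrDefault(node, Collections.emptySet()));
                newOut.addAll(in);
                if (!newOut.equals(out.get(node))) {
                    out.put(node, newOut);
                    changed = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> succ = new HashMap<>();
        succ.put("Toy.Toy.this", Arrays.asList("ToyCollection.main.t"));
        succ.put("ToyCollection.main.t", Arrays.asList("ToyCollection.addToy.t"));
        succ.put("ToyCollection.addToy.t", Arrays.asList("BinaryTreeNode.obj"));
        succ.put("BinaryTreeNode.obj", Collections.<String>emptyList());

        Map<String, Set<String>> gen = new HashMap<>();
        gen.put("Toy.Toy.this", Collections.singleton("Toy"));  // gen[x.this] = {x}

        // "Toy" ends up in out[BinaryTreeNode.obj], i.e. a Toy instance can be stored
        // there, which refines the class diagram as discussed below.
        System.out.println(propagate(succ, gen));
    }
}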

Continuation of Example 3 from Section 3.4

The class diagram for this example would look like the one in Figure 3.14(a). The direct relation between BinaryTreeNode and Toy is missing. We can use the OFG and the flow propagation algorithm to improve the class diagram. First, we set the gen sets of all nodes representing constructors to the class name (e.g. gen[Toy.Toy.this] = {Toy}), while all other sets are empty, as illustrated in Figure 3.12. Then, we propagate this information through the OFG by setting the in and out sets accordingly. Figure 3.13 shows the OFG after the flow propagation algorithm terminates.

Figure 3.12: Initial state of the FPA for Example 3

For example, we see in Figure 3.13 that out[BinaryTreeNode.obj] = {Toy}. Thus, we can draw an association from BinaryTreeNode to Toy in the class diagram, as shown in Figure 3.14(b). Class names that reach nodes representing class attributes help us refine association targets, while class names reaching local variables or method parameters allow an improvement of dependency relations. The entire improved class diagram is illustrated in Figure 3.14(b).

Figure 3.13: Final state of the FPA for Example 3

(a) Without OFG information (b) With OFG information

Figure 3.14: Class diagram for Example 3


Bibliography

[1] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison Wesley, August 2006.

[2] Ghazi Alkhatib. The maintenance problem of application software: an empirical analysis. Journal of Software Maintenance, 4(2):83–104, 1992.

[3] B. J. Cornelius, M. Munro, and D. J. Robson. An approach to software maintenance education. Software Engineering Journal, July 1988.

[4] V. R. Basili. Viewing software maintenance as reuse-oriented software development. IEEE Software, 7(1):19–25, January 1990.

[5] L. Bernstein. Tidbits. ACM SIGSOFT - Software Engineering Notes, 18(3):A-55, July 1993.

[6] B. W. Boehm. The economics of software maintenance. In R. S. Arnold, editor, Proceedings of the Workshop on Software Maintenance, pages 9–37, Silver Spring, MD, 1983. IEEE Computer Society Press.

[7] Carnegie Mellon University, Mark C. Paulk, Charles V. Weber, Bill Curtis, and Mary Beth Chrissis. The Capability Maturity Model: Guidelines for Improving the Software Process. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[8] T. A. Corbi. Program understanding: challenge for the 1990's. IBM Syst. J., 28(2):294–306, 1989.

[9] J. L. Elshoff. An Analysis of Some Commercial PL/I Programs. IEEE Trans. Softw. Eng., 2(2):113–120, 1976.

[10] R. Greiner, B. A. Smith, and R. W. Wilkerson. A correction to the algorithm in Reiter's theory of diagnosis. Artif. Intell., 41(1):79–88, 1989.

[11] Penny Grubb and Armstrong A. Takang. Software Maintenance: Concepts and Practice. World Scientific Publishing Company, 2nd edition, July 2003.

[12] James W. Hooper and R. O. Chester. Software Reuse: Guidelines and Methods. Plenum Press, New York, 1991.

[13] C. Jones. How not to measure programming quality. Computer World, XX(3):82, January 20, 1986.

[14] K. C. Kang. A Reuse-based software development methodology. IEEE Computer Society Press, Los Alamitos, CA, USA, 1988.

[15] Bogdan Korel and Janusz Laski. Dynamic Program Slicing. Information Processing Letters, 29:155–163, 1988.

[16] S. Lauchlan. Case Study Reveals Future Shock. Computing, February 1993.


[17] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. Metrics and Laws of Software Evolution - The Nineties View, 1997.

[18] Hareton K. N. Leung. The Dark Side of Object-Oriented Software Development. In ICSM '94: Proceedings of the International Conference on Software Maintenance, page 438, Washington, DC, USA, 1994. IEEE Computer Society.

[19] Bennett P. Lientz and E. Burton Swanson. Software Maintenance Management. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1980.

[20] David H. Longstreet. Software Maintenance and Computers. IEEE Computer Society Press, Los Alamitos, CA, USA, 1990.

[21] Cristinel Mateis, Markus Stumptner, Dominik Wieland, and Franz Wotawa. Extended Abstract - Model-Based Debugging of Java Programs. In M. Ducasse, editor, Proceedings of the Fourth International Workshop on Automated Debugging (AADEBUG 2000), Munich, Germany, 28-30 August 2000.

[22] W. M. Osborne. Building and Sustaining Software Maintainability. In Proc. Int'l Conf. Software Maintenance (ICSM), pages 13–23, 1987.

[23] Ron Patton. Software Testing. Sams, Indianapolis, IN, USA, 2000.

[24] Thomas M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, Inc., New York, NY, USA, 1996.

[25] IEEE Computer Society Press. IEEE Std. 610.12, "Standard Glossary of Software Engineering Terminology", Los Alamitos, CA, 1990.

[26] R. S. Pressman. Software engineering: a practitioner's approach (2nd ed.). McGraw-Hill, Inc., New York, NY, USA, 1986.

[27] R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32(1):57–95, 1987.

[28] N. F. Schneidewind. The State of Software Maintenance. IEEE Trans. Softw. Eng., 13(3):303–310, 1987.

[29] Thomas A. Standish. An Essay on Software Reuse. IEEE Trans. Software Eng., 10(5):494–497, 1984.

[30] E. Burton Swanson and Cynthia Mathis Beath. Maintaining information systems in organizations. John Wiley & Sons, Inc., New York, NY, USA, 1989.

[31] Frank Tip. A Survey of Program Slicing Techniques. Journal of Programming Languages, 3:121–189, 1995.

[32] Paolo Tonella. Reverse engineering of object oriented code. In ICSE '05: Proceedings of the 27th International Conference on Software Engineering, pages 724–725, New York, NY, USA, 2005. ACM.

[33] Mark Weiser. Program Slicing. IEEE Transactions on Software Engineering, 10(4):352–357, 1984.

[34] Franz Wotawa. On the Relationship between Model-Based Debugging and Program Slicing.Artificial Intelligence, 135(1–2):124–143, 2002.

[35] Andreas Zeller. Delta debugging - from automated testing to automated debugging. http://www.st.cs.uni-saarland.de/dd/, 2010-11-10.

[36] Andreas Zeller. Yesterday, my program worked. Today, it does not. Why? SIGSOFT Softw. Eng. Notes, 24(6):253–267, 1999.


[37] Andreas Zeller. Automated Debugging: Are We Close? IEEE Computer, pages 26–31, November 2001.

[38] Andreas Zeller. Isolating Cause-Effect Chains from Computer Programs, 2002.

[39] Andreas Zeller and Ralf Hildebrandt. Simplifying and Isolating Failure-Inducing Input. IEEE Transactions on Software Engineering, 28(2):183–200, 2002.

[40] Xiangyu Zhang, Haifeng He, Neelam Gupta, and Rajiv Gupta. Experimental evaluation of using dynamic slices for fault localization. In Sixth International Symposium on Automated & Analysis-Driven Debugging (AADEBUG), pages 33–42, 2005.
