A Novel Software Visualisation Model to
Support Object-Oriented Program
Comprehension
Michael John Pacione
Department of Computer and Information Sciences
PhD
November 2005
The copyright of this thesis belongs to the author under the terms of the United Kingdom
Copyright Acts as qualified by University of Strathclyde Regulation 3.51. Due
acknowledgement must always be made of the use of any material contained in, or derived
from, this thesis.
Abstract
Current software visualisation tools do not address the full range of software comprehension
requirements. This thesis presents a novel software visualisation model for supporting
object-oriented software comprehension that is intended to address the shortcomings of
existing tools. Related work in the fields of software visualisation, tool evaluation,
abstraction, diagrams, views, exploration and querying, metamodels, and software modelling
is discussed. An initial case study that prompted the development of this novel model is
described. The model is then introduced, based on multiple levels of abstraction, structural
and behavioural perspectives, and the integration of statically and dynamically extracted
information. The model is assessed theoretically against its original goals, and its support for
software comprehension strategies is examined. Abstraction operations between views in the
model and the combination of views are defined formally. A demonstration of the
application of the model to a real system is presented. A tool implementation of the model is
introduced. This tool is then used to evaluate the utility of the model in addressing typical
software comprehension tasks in real world software systems. It is concluded that the novel
software visualisation model proposed in this thesis provides effective support for the full
range of software comprehension tasks.
The contributions of this thesis are as follows: an abstraction scale and set of criteria for
classifying software comprehension tools; a thorough review and comparison of the extant
software visualisation tools; typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools; a schema for categorising view
arrangements in software tools; the findings of an initial study assessing the capabilities of
the extant software visualisation tools using typical software comprehension tasks; the novel
software visualisation model based on a range of abstraction levels and structural and
behavioural perspectives; a prototype implementation of the model as the VANESSA tool;
and the findings of the evaluation of this model using real software comprehension tasks and
real systems.
Publications
Refereed conference papers
M J Pacione, ‘VANESSA: Visualisation Abstraction NEtwork for Software Systems
Analysis’ in Industrial and Tool Proceedings of the 21st IEEE International Conference on
Software Maintenance (ICSM), Budapest, pp. 85-88, Vienna: Harry Sneed, 2005
Best Tool Paper Award
M J Pacione, M Roper, M Wood, ‘A novel software visualisation model to support software
comprehension’ in Proceedings of the 11th Working Conference on Reverse Engineering
(WCRE), Delft, pp. 70-79, Los Alamitos, CA: IEEE CS Press, 2004
M J Pacione, ‘Software visualisation for object-oriented program comprehension’ in
Doctoral Symposium, Proceedings of the 26th International Conference on Software
Engineering (ICSE), Edinburgh, pp. 63-65, Los Alamitos, CA: IEEE CS Press, 2004
M J Pacione, ‘Software visualisation for object-oriented program comprehension’ in Poster
Proceedings of the 5th Postgraduate Research Conference in Electronics, Photonics,
Communications & Networks, and Computing Science (PREP), Hatfield, pp. 158-159,
Swindon: EPSRC, 2004
M J Pacione, M Roper, M Wood, ‘A comparative evaluation of dynamic visualisation tools’
in Proceedings of the 10th Working Conference on Reverse Engineering (WCRE), Victoria,
BC, pp. 80-89, Los Alamitos, CA: IEEE CS Press, 2003
Technical reports
M J Pacione, A Fully Specified Abstraction Model for Software Visualisation, Technical
Report EFoCS-54-2004, Glasgow: Department of Computer and Information Sciences,
University of Strathclyde, 2004
M J Pacione, Evaluating a Model of Software Visualisation for Software Comprehension,
Technical Report EFoCS-53-2004, Glasgow: Department of Computer and Information
Sciences, University of Strathclyde, 2004
M J Pacione, Effective Visualisation for Comprehending Object-Oriented Software: A
Multifaceted, Three-Dimensional Abstraction Model for Software Visualisation, Technical
Report EFoCS-52-2004, Glasgow: Department of Computer and Information Sciences,
University of Strathclyde, 2004
M J Pacione, A Review and Evaluation of Dynamic Visualisation Tools, Technical Report
EFoCS-50-2003, Glasgow: Department of Computer and Information Sciences, University
of Strathclyde, 2003
Acknowledgements
I would like to thank my supervisors, Murray Wood and Marc Roper, for their guidance and
support during the course of my research.
I would also like to thank my fellow EFoCS PhD students – Neil Walkinshaw, Doug Kirk,
Al Dunsmore, and Matt Munro – for their solidarity, friendship, and innumerable games of
pool, foosball, and frisbee.
Doug Kirk at the University of Strathclyde, Jens Gulden at the Technical University of
Berlin, and Rob Lintern at the University of Victoria each made an invaluable contribution
by providing me with evaluation data.
The greatest debt I owe to my parents, Michael and Christine, for their encouragement and
support – both financial and moral – over the past three years (and the preceding twenty-
two). Lastly, I want to thank all my friends, especially Clare and Richard, for keeping me
sane and reminding me that there are more important things in life than research and the
pursuit of knowledge.
“All this worldly wisdom was once the unamiable heresy of some wise man.”
“All endeavor calls for the ability to tramp the last mile, shape the last plan, endure the last hour’s toil. The fight-to-the-finish spirit is the one... characteristic we must possess if we are to face the future as finishers.”
Henry David Thoreau, 1817-1862
“You qualify in your boiler suit and then put on your tuxedo.”
Jock Stein, 1922-1985
Contents
Abstract iii
Publications iv
Acknowledgements vi
Contents vii
List of figures xix
List of tables xxx
1 Introduction 1
1.1 Background 1
1.1.1 Software visualisation 1
1.1.2 Software comprehension 1
1.1.3 Challenges in object-oriented software comprehension 2
1.1.4 Reverse engineering tools support software comprehension 2
1.1.4.1 The relationship between forward and reverse engineering 3
1.1.5 Abstraction 3
1.2 Thesis overview 4
1.2.1 Motivation and aim 4
1.2.2 Research hypothesis 4
1.2.3 Approach and methodology 5
1.2.4 Contributions of this thesis 5
2 Related Work 6
2.1 Software visualisation techniques 6
2.1.1 Static software comprehension techniques 6
2.1.2 Dynamic software comprehension techniques 7
2.1.3 Advantages of dynamic analysis for object-oriented systems 7
2.1.4 Debuggers 8
2.1.5 Software visualisation tools 9
2.1.6 Data collection for software visualisation 9
2.1.7 Analysing the data produced 11
2.1.8 Presenting the results 12
2.1.8.1 Basic graph representations 12
2.1.8.2 UML diagrams 12
2.1.8.3 Message sequence charts 14
2.1.8.4 Other representations 15
2.2 Software visualisation tools 16
2.2.1 Characteristics of software visualisation tools 17
2.2.1.1 Three criteria for characterising software visualisation tools 17
2.2.1.2 A scale to indicate level of abstraction 17
2.2.1.3 Software visualisation tool taxonomies 18
2.2.2 Program Explorer (level 2) 19
2.2.2.1 Description 19
2.2.2.2 Evaluation 20
2.2.2.3 Comparison 20
2.2.2.4 Assessment 21
2.2.3 Scene (level 2) 22
2.2.3.1 Description 22
2.2.3.2 Evaluation 23
2.2.3.3 Comparison 23
2.2.3.4 Assessment 23
2.2.4 Architecture-oriented visualization (level 4) 23
2.2.4.1 Description 23
2.2.4.2 Evaluation 25
2.2.4.3 Comparison 26
2.2.4.4 Assessment 27
2.2.5 ISVis (level 4) 27
2.2.5.1 Description 27
2.2.5.2 Evaluation 28
2.2.5.3 Comparison 29
2.2.5.4 Assessment 30
2.2.6 Dali (level 4) 30
2.2.6.1 Description 30
2.2.6.2 Evaluation 32
2.2.6.3 Comparison 33
2.2.6.4 Assessment 33
2.2.7 Ovation (level 2) 33
2.2.7.1 Description 33
2.2.7.2 Evaluation 35
2.2.7.3 Comparison 35
2.2.7.4 Assessment 36
2.2.8 Reflexion models (level 4) 36
2.2.8.1 Description 36
2.2.8.1.1 AVID 36
2.2.8.1.2 RMTool 38
2.2.8.2 Evaluation 39
2.2.8.2.1 AVID 39
2.2.8.2.2 RMTool 40
2.2.8.3 Comparison 41
2.2.8.3.1 AVID 41
2.2.8.3.2 RMTool 41
2.2.8.4 Assessment 42
2.2.9 Gaudi (levels 3-4) 42
2.2.9.1 Description 42
2.2.9.2 Evaluation 43
2.2.9.3 Comparison 44
2.2.9.4 Assessment 44
2.2.10 Shimba (levels 2-4) 44
2.2.10.1 Description 44
2.2.10.2 Evaluation 45
2.2.10.3 Comparison 46
2.2.10.4 Assessment 46
2.2.11 Jinsight (levels 2-3) 47
2.2.11.1 Description 47
2.2.11.2 Evaluation 49
2.2.11.3 Comparison 49
2.2.11.4 Assessment 49
2.2.12 Collaboration Browser (levels 2-4) 50
2.2.12.1 Description 50
2.2.12.2 Evaluation 51
2.2.12.3 Comparison 52
2.2.12.4 Assessment 52
2.2.13 Together debugger (level 1) 53
2.2.13.1 Description 53
2.2.13.2 Evaluation 53
2.2.13.3 Comparison 53
2.2.13.4 Assessment 54
2.2.14 Together diagrams (levels 2-3) 54
2.2.14.1 Description 54
2.2.14.2 Evaluation 54
2.2.14.3 Comparison 55
2.2.14.4 Assessment 55
2.2.15 SHriMP (levels 0, 2-4) 55
2.2.15.1 Description 55
2.2.15.2 Evaluation 56
2.2.15.3 Comparison 56
2.2.15.4 Assessment 56
2.2.16 BLOOM and JIVE (levels 2-3) 57
2.2.16.1 Description 57
2.2.16.2 Evaluation 57
2.2.16.3 Comparison 57
2.2.16.4 Assessment 58
2.2.17 Polymetric Views, Class Blueprint, RelVis (levels 2-3) 58
2.2.17.1 Description 58
2.2.17.2 Evaluation 60
2.2.17.3 Comparison 61
2.2.17.4 Assessment 62
2.2.18 Seesoft, SeeSys, SeeSlice, HierNet, SeeNet, SeeNet3D (levels 0, 2-3) 62
2.2.18.1 Description 62
2.2.18.2 Evaluation 64
2.2.18.3 Comparison 65
2.2.18.4 Assessment 66
2.2.19 sv3D and Imsovision (levels 2-3) 66
2.2.19.1 Description 66
2.2.19.2 Evaluation 66
2.2.19.3 Comparison 67
2.2.19.4 Assessment 67
2.2.20 Tool summary 67
2.3 Abstraction 68
2.3.1 The concept of abstraction 68
2.3.2 The historical origins of abstraction 69
2.3.3 The application of abstraction 69
2.3.4 Abstraction in software engineering 70
2.3.5 Abstraction in software visualisation 71
2.4 Effective presentation techniques for software visualisation 71
2.4.1 Diagrams for describing software 71
2.4.1.1 Structured design diagrams 72
2.4.1.2 Object-oriented diagrams 73
2.4.1.3 Recent literature 74
2.4.2 Views for software comprehension 74
2.4.2.1 A single view illustrating a single facet 75
2.4.2.2 Multiple independent views illustrating a single facet 76
2.4.2.3 Multiple interdependent views illustrating a single facet 76
2.4.2.4 A single view illustrating multiple facets 77
2.4.2.5 Multiple independent views illustrating multiple facets 77
2.4.2.6 Multiple interdependent views illustrating multiple facets 77
2.5 Effective techniques for exploring and querying visualisations 78
2.5.1 Exploration 78
2.5.2 Querying 78
2.5.3 Guided navigation 79
2.6 Software modelling 79
2.6.1 The 4+1 view model 79
2.6.2 Hofmeister et al. 80
2.6.3 ManSART 81
2.6.4 Zachman Framework for Enterprise Architecture 82
2.6.5 IEEE Recommended Practice for Architectural Description 82
2.6.6 Other approaches 83
2.7 Evaluation 83
2.7.1 Globus and Uselton (1995) 84
2.7.2 Murphy et al. (1996) 85
2.7.3 Bellay and Gall (1997) 86
2.7.4 Armstrong and Trudeau (1998) 87
2.7.5 Storey et al. (1996) 88
2.7.6 Sim and Storey (2000) 88
2.7.7 Sim et al. (2000) 89
2.7.8 Storey et al. (2000) 90
2.7.9 Bassil and Keller (2001) 91
2.7.10 Hatch et al. (2001) 92
2.7.11 Knight (2001) 93
2.7.12 Kollmann et al. (2002) 93
2.7.13 Conclusions 93
3 Initial Study 96
3.1 Introduction 96
3.2 Generic questions 97
3.2.1 General software comprehension questions 97
3.2.2 Specific reverse engineering questions 98
3.3 Specific reverse engineering questions specified for JHotDraw 98
3.4 Together diagrams 101
3.5 Jinsight 102
3.6 Reflexion models 103
3.7 Together debugger 104
3.8 Case study summary 105
3.9 Conclusions 111
4 A Novel Software Visualisation Model 112
4.1 Background 112
4.2 Research hypothesis 112
4.3 A visualisation model for object-oriented software 113
4.4 Examples 118
4.5 Key research challenges 122
5 Refining the Initial Model 123
5.1 Evaluation based on representative tasks 123
5.2 The basis for typical software comprehension tasks 123
5.3 Task set analysis 127
5.4 New task sets 128
5.4.1 General software comprehension tasks 129
5.4.2 Specific reverse engineering tasks 129
5.5 Justification 130
5.6 Task set revision summary 131
5.7 Theoretical evaluation of the proposed model 131
5.7.1 Model information required to address typical software comprehension tasks 131
6 The Refined Model 134
6.1 Introduction 134
6.2 Abstraction levels 134
6.3 Inter-level abstraction relationships 138
6.3.1 Abstraction mechanisms 138
6.3.2 Detailed abstraction example 140
6.3.3 Generic abstraction mappings 142
6.3.3.1 Structure hierarchy 143
6.3.3.2 Behaviour hierarchy 146
6.3.4 Combining information from multiple views 147
6.3.4.1 From the same level of each hierarchy 148
6.3.4.2 From different levels of the same hierarchy 148
6.3.4.3 From different levels of each hierarchy 149
6.4 Metamodels 151
6.4.1 Dagstuhl Middle Metamodel 152
6.4.2 UML metamodel 152
6.5 Applying the model to a real system 154
6.6 VANESSA: Visualisation Abstraction NEtwork for Software Systems Analysis 155
6.6.1 Tool implementation 155
6.6.2 Example analyses 157
6.6.3 Comparison with other software visualisation tools 160
6.7 Summary 161
7 Evaluation 163
7.1 Experimental setup 163
7.2 Comprehension questions 164
7.3 Threats to validity 164
7.3.1 Internal validity 165
7.3.2 Construct validity 165
7.3.3 External validity 166
7.4 Subject systems 167
7.4.1 JHotDraw 167
7.4.2 BeautyJ 167
7.4.3 SHriMP 168
7.4.4 ArgoUML 170
7.5 Findings 171
7.5.1 Finding 1 171
7.5.1.1 Replication summary 174
7.5.2 Finding 2 175
7.5.3 Finding 3 178
7.5.4 Finding 4 183
7.5.5 Finding 5 185
7.5.6 Finding 6 186
7.5.7 Finding 7 188
7.5.8 Finding 8 189
7.5.9 Finding 9 192
7.5.10 Finding 10 193
7.5.11 Finding 11 195
7.5.12 Miscellaneous issues 197
7.5.13 Conclusions 199
8 Conclusions 200
8.1 Summary 200
8.2 Conclusions 201
8.3 Future work 201
References 203
Appendices A-1
Appendix A – Initial Study Lab Book A-2
A.1 Together diagrams A-2
A.1.1 General software comprehension questions A-2
A.1.2 Specific reverse engineering questions A-7
A.2 Jinsight A-14
A.2.1 General software comprehension questions A-14
A.2.2 Specific reverse engineering questions A-21
A.3 Reflexion models A-28
A.3.1 jRMTool A-28
A.3.2 AVID A-29
A.3.3 General software comprehension questions A-32
A.3.4 Specific reverse engineering questions A-33
A.4 Together debugger A-34
A.4.1 General software comprehension questions A-35
A.4.2 Specific reverse engineering questions A-36
Appendix B – Manual Model Verification Lab Book A-43
B.1 The JHotDraw framework A-43
B.2 Producing system-specific abstraction mappings A-44
B.2.1 The structure hierarchy A-44
B.2.1.1 From source code to level 0 A-44
B.2.1.2 From level 0 to level 1 A-56
B.2.1.3 From level 0 to level 2 A-59
B.2.1.4 From level 1 to level 2 A-61
B.2.1.5 From level 2 to level 3 A-63
B.2.1.6 From level 3 to level 4 A-87
B.2.1.7 From level 3 to level 5 A-91
B.2.2 The behaviour hierarchy A-97
B.2.2.1 From event trace to level 0 A-97
B.2.2.2 From level 0 to level 1 A-99
B.2.2.3 From level 0 to level 2 A-102
B.2.2.4 From level 1 to level 2 A-104
B.2.2.5 From level 2 to level 3 A-105
B.2.2.6 From level 3 to level 4 A-123
B.2.2.7 From level 3 to level 5 A-128
B.3 JavaDrawApp abstraction hierarchies A-131
B.4 Combining information from multiple views A-134
B.4.1 From the same level of each hierarchy A-134
B.4.2 From different levels of the same hierarchy A-137
B.4.3 From different levels of each hierarchy A-147
B.5 Validation and analysis A-154
B.5.1 Logical validation A-154
B.5.2 Comparison with other diagrams A-154
B.5.2.1 The JHotDraw class diagram A-155
B.5.2.2 The JHotDraw sequence diagram A-158
B.5.2.3 An expert’s component model A-160
B.5.2.4 An expert’s use case model A-164
Appendix C – Validation of Support for Software Comprehension Strategies A-167
C.1 Software comprehension strategies A-167
C.1.1 The bottom-up model A-167
C.1.2 The top-down model A-168
C.1.3 The knowledge-based model A-168
C.1.4 The systematic and as-needed models A-168
C.1.5 The integrated model A-169
C.2 Object-oriented software comprehension A-169
C.3 Comprehension in software visualisation A-171
C.4 Support for software comprehension strategies in the novel software visualisation model A-172
C.5 Summary A-173
Appendix D – Entity-Relationship Diagrams for the Novel Model A-175
Appendix E – Comparison with Simulation and Continuous System Abstraction Techniques A-181
E.1 Multimodelling A-181
E.2 Modelling and simulation abstraction techniques A-184
Appendix F – Comparison of Abstraction Relations in the Novel Software Visualisation Model and the UML Metamodel A-188
F.1 Abstraction relations in the novel software visualisation model A-188
F.2 UML metamodel A-190
F.3 Representing model information in the UML metamodel A-190
F.3.1 Activities metamodel (behaviour level 1) A-190
F.3.2 Interactions metamodel (behaviour level 2) A-190
F.3.3 Classes metamodel (structure level 2) A-191
F.3.4 Components metamodel (structure level 3) A-192
F.4 Representing abstraction relationships in the UML metamodel A-192
F.4.1 Behaviour levels 1-2 A-192
F.4.2 Structure levels 2-3 A-193
F.5 Summary A-195
Appendix G – Evaluation Lab Book A-196
G.1 System 1 - JHotDraw A-196
G.1.1 Replication - general software comprehension questions A-196
G.1.2 Replication - specific reverse engineering questions A-200
G.1.3 Together diagrams A-208
G.1.4 System designers’ diagrams A-214
G.1.5 System expert’s diagrams A-217
G.2 System 2 – BeautyJ A-221
G.2.1 Comprehension questions A-221
G.2.1.1 Question 1 A-221
G.2.1.2 Question 2 A-229
G.2.1.3 Question 3 A-236
G.2.1.4 Question 4 A-240
G.2.1.5 Question 5 A-241
G.2.2 Documentation A-246
G.2.3 Together diagrams A-253
G.3 System 3 - SHriMP A-260
G.3.1 Comprehension questions A-260
G.3.2 Together diagrams A-266
G.4 System 4 - ArgoUML A-277
G.4.1 Comprehension questions A-277
G.4.2 Together diagrams A-323
G.5 Conclusions A-336
G.5.1 Abstraction A-336
G.5.1.1 Abstraction levels A-336
G.5.1.2 Navigation between levels A-337
G.5.1.3 Combination of levels A-337
G.5.2 Facets A-337
G.5.2.1 Structural and behavioural A-337
G.5.2.2 Combination of facets A-337
G.5.3 Static/dynamic analysis A-338
G.6 Package structures A-338
G.6.1 JHotDraw A-338
G.6.2 BeautyJ A-338
G.6.3 SHriMP A-339
G.6.4 ArgoUML A-339
List of Figures
Figure 1.1 The relationship between forward and reverse engineering 3
Figure 2.1 The UML sequence diagram for the Singleton design pattern 13
Figure 2.2 The UML collaboration diagram for the Singleton design pattern 14
Figure 2.3 A scale to indicate level of abstraction 18
Figure 2.4 The positions of tools on the abstraction scale of Figure 2.3 68
Figure 2.5 The six arrangements of views onto a software model. The rectangles around the views in parts c and f represent the coordination inherent in such interdependent arrangements 75
Figure 3.1 The orrery application. The circles represent astronomical bodies, such as planets and moons, coloured according to their diameter. A blue border around a planet represents atmosphere. The satellite icons represent satellites. The directed arcs indicate gravitational attraction. The toolbar on the left is used to select diagram objects, and to create planets, satellites (orbiting and non-orbiting), atmosphere, and gravity 99
Figure 3.2 The sequence diagram drawn by Together for the CH.ifa.draw.standard.SelectionTool.mouseDown() method, illustrating the wide and shallow diagrams produced by static analysis 110
Figure 4.1 A multifaceted, three-dimensional abstraction model for software visualisation 117
Figure 4.2 An example of the structure abstraction hierarchy 119
Figure 4.3 An example of the behaviour abstraction hierarchy 120
Figure 4.4 An example of the data abstraction hierarchy 121
Figure 6.1 An example instantiation of the behaviour hierarchy for part of the JHotDraw framework 137
Figure 6.2 An abstraction network illustrating the abstraction relationships between the views of Table 6.1 139
Figure 6.3 Example combination of level 2 structure and level 5 behaviour information 151
Figure 6.4 The VANESSA analysis process 156
Figure 6.5 The level 3 Behaviour view of JHotDraw. Arcs denote usage 157
Figure 6.6 Combining views from the same level of each hierarchy. In the combined view, solid arcs denote usage and dashed arcs denote dependency 158
Figure 6.7 Combining views from different levels of the same hierarchy. In the combined view, arcs between components denote usage and arcs between business entities denote business rules 159
Figure 6.8 Combining views from different levels of each hierarchy. Between classes: solid arcs denote association; dashed arcs denote extension; dotted arcs denote inheritance. Between components: arcs denote usage 160
Figure 7.1 A screenshot of the BeautyJ options dialogue 168
Figure 7.2 A screenshot of the SHriMP application 169
Figure 7.3 A screenshot of ArgoUML 170
Figure 7.4 A part of the S2 view of JavaDrawApp 172
Figure 7.5 The B3 view for JavaDrawApp 173
Figure 7.6 The custom B1 view of RoundRectangleFigure 174
Figure 7.7 The BeautyJ documentation main classes diagram 176
Figure 7.8 A custom S2/S3 view of BeautyJ 177
Figure 7.9 The Together class diagram for the shrimp.DisplayBean package 177
Figure 7.10 The custom S2 view of the shrimp.DisplayBean package 177
Figure 7.11 The JHotDraw expert’s use case diagram 178
Figure 7.12 The S5 and B5 views of JavaDrawApp 179
Figure 7.13 The expert’s diagram of the static behavioural relationships between the BeautyJ components 180
Figure 7.14 The B3 view of BeautyJ 181
Figure 7.15 The custom S2 view of the cognitive.critics package 181
Figure 7.16 The custom S1 view of PluggableDiagram 182
Figure 7.17 A custom S2 view of BeautyJ 184
Figure 7.18 The custom S2 view of the Pluggable types from application.api 185
Figure 7.19 The custom S2 view of the multi editor pane classes 186
Figure 7.20 The custom B2 view of StandardSourclet’s interactions 187
Figure 7.21 The combined S3/B3 view of BeautyJ 188
Figure 7.22 The Together sequence diagram for GXLPersistentStorageBean.loadData() 190
Figure 7.23 A custom B2 view of SHriMP 191
Figure 7.24 A part of the custom B2 view 193
Figure 7.25 Illustration of abstraction level usage 194
Figure A.1 The class diagram generated by Together for the orrery application A-3
Figure A.2 The sequence diagram generated by Together for the Orbit.MainClass.main() method A-5
Figure A.3 The sequence diagram generated by Together for the AbstractFigure.moveBy() method A-8
Figure A.4 The sequence diagram generated by Together for the EllipseFigure.basicMoveBy() method A-9
Figure A.5 The class diagram generated by Together for the Figure interface A-10
Figure A.6 The class diagram generated by Together for the Rectangle class A-11
Figure A.7 The sequence diagram generated by Together for the AbstractFigure.displayBox(Point, Point) method A-12
Figure A.8 The sequence diagram generated by Together for the AbstractFigure.displayBox(Rectangle) method A-13
Figure A.9 The sequence diagram generated by Together for the EllipseFigure.basicDisplayBox() method A-14
Figure A.10 Part of the Jinsight execution view for the orrery application. The coloured horizontal lines represent method calls A-15
Figure A.11 A Jinsight reference pattern view for part of the orrery application A-16
Figure A.12 Part of the Jinsight execution view for the orrery application with repetition detection turned off A-17
Figure A.13 The Jinsight execution view from Figure A.12 with repetition detection turned on A-18
Figure A.14 Part of the Jinsight object histogram for the orrery application, showing the number of calls to methods of each object. The scale is shown at the top of the window, with black being the lowest and red the highest. Filled rectangles represent objects; outline rectangles represent garbage-collected objects. Diamonds represent the class object of a class A-19
Figure A.15 Part of the Jinsight object histogram for the orrery application, showing the active memory size of the objects A-19
Figure A.16 Part of the Jinsight method histogram for the orrery application, showing the methods called by the selected method A-20
Figure A.17 Part of the Jinsight invocation browser view for the orrery application, showing methods that call the method highlighted in Figure A.16 A-21
Figure A.18 Part of the Jinsight execution view for the second orrery event trace, showing the redraw methods A-22
Figure A.19 Part of the Jinsight call tree for the second orrery event trace, showing the call tree from the invalidate() method A-23
Figure A.20 Part of the Jinsight execution view for the second orrery event trace, showing the implicit control structure of the JHotDraw screen redraw mechanism A-24
Figure A.21 The Jinsight call tree showing the implicit control structure of the JHotDraw screen redraw mechanism A-24
Figure A.22 Part of the Jinsight execution view for the second orrery event trace, with the EllipseFigure.basicMoveBy() method selected A-25
Figure A.23 The Jinsight call tree for the EllipseFigure.basicMoveBy() method A-26
Figure A.24 Part of the Jinsight execution view for the second orrery event trace, which shows that AbstractFigure.displayBox(Point, Point) calls MyEllipseFigure.basicDisplayBox(Point, Point) A-27
Figure A.25 The initial high-level model input to jRMTool. Ovals represent high-level system components. Directed arcs represent communication A-28
Figure A.26 The reflexion model computed by jRMTool. Ovals represent high-level system components. Directed arcs represent communication; arc annotations indicate frequency. Solid arcs indicate agreement with the analyst’s model (convergences); dashed arcs indicate absences from the analyst’s model (divergences); dotted arcs indicate erroneous communications in the analyst’s model (absences) A-29
Figure A.27 The AVID cell view at the start of the execution A-30
Figure A.28 The AVID cell view partway through the execution A-31
Figure A.29 The AVID summary view of the execution A-31
Figure A.30 Together debugger output showing the method calls involved in a screen redraw in JHotDraw A-38
Figure A.31 Together debugger output showing the implicit control structure of the JHotDraw redraw mechanism A-41
Figure A.32 The Together debugger user interface. The top-left pane shows packages and classes. The top-right pane shows a class diagram. The middle-right pane shows the program code. The bottom pane is the debugger interface, which shows the watches set at the DEFAULT_MASS, MINIMUM_MASS, MAXIMUM_MASS, and mass attributes A-42
Figure B.1 The JavaDrawApp source code, marked up to illustrate level 0 structure entities and relationships. Key: ClassContainmentOperator0, MethodContainmentOperator0, MethodDeclarationOperand0, AttributeDeclarationOperand0, ClassOperand0, InheritanceOperator0, CompositionOperator0 A-54
Figure B.2 A graphical illustration of the extracted level 1 structure information. Key: ClassContainmentOperator0, MethodContainmentOperator0 A-58
Figure B.3 A graphical illustration of the extracted level 2 structure information. Key: InheritanceOperator0, CompositionOperator0 A-60
Figure B.4 A graphical illustration of the extracted level 2 structure information. Key: extension, implementation, composition A-74
Figure B.5 A graphical illustration of the generated level 3 structure information A-86
Figure B.6 A graphical illustration of the derived level 4 structure information A-90
Figure B.7 A graphical illustration of the derived level 5 structure information A-96
Figure B.8 Part of the extracted event trace. Square bracketed lines contain state information A-98
Figure B.9 A graphical illustration of the extracted AnimationDecorator level 1 behaviour information A-101
Figure B.10 A graphical illustration of the extracted level 2 behaviour information A-103
Figure B.11 A graphical illustration of the extracted level 2 behaviour information A-117
Figure B.12 A graphical illustration of the generated level 3 behaviour information A-122
Figure B.13 A graphical illustration of the derived level 4 behaviour information A-127
Figure B.14 A graphical illustration of the derived level 5 behaviour information A-130
Figure B.15 The JavaDrawApp structure hierarchy A-132
Figure B.16 The JavaDrawApp behaviour hierarchy A-133
Figure B.17 A graphical illustration of the combined level 3 structure and behaviour information A-137
Figure B.18 A graphical illustration of the combined level 2 and level 5 behaviour information A-146
Figure B.19 A graphical illustration of the combined level 2 structure and level 5 behaviour information A-153
Figure B.20 The main class diagram from the JHotDraw architecture overview A-155
Figure B.21 Combined and filtered level 2 structure and behaviour information from the novel model. Key: composition A-155
Figure B.22 Combined and filtered level 2 structure and behaviour information from the novel model. Key: extension, composition, implementation, invocation A-157
Figure B.23 Combined and filtered level 3 structure and behaviour information from the novel model. Key: structure, behaviour A-158
Figure B.24 The sequence diagram from the JHotDraw architecture overview A-159
Figure B.25 Filtered level 3 behaviour information from the novel model A-160
Figure B.26 A component diagram produced by an experienced JHotDraw reuser A-161
Figure B.27 Combined level 3 information from the novel model A-162
Figure B.28 The reflexion model resulting from comparing the combined level 3 model with the expert’s component model A-163
Figure B.29 A use case diagram produced by an experienced JHotDraw reuser A-164
Figure B.30 Combined level 5 information from the novel model A-165
Figure B.31 The reflexion model resulting from comparing the combined level 5 model with the expert’s use case model A-166
Figure D.1 An ERD for the program code (level 0). Operators can be unary (e.g. boolean NOT, ! in C, C++, and Java), binary (e.g. assignment, = in C, C++, and Java), or ternary (e.g. conditional, ?: in C, C++, and Java) A-175
Figure D.2 An ERD for the event trace (level 0) A-175
Figure D.3 An ERD for intra-class structure (structure level 1) A-176
Figure D.4 An ERD for inter-class structure (structure level 2). The (0, *) cardinality of the Inheritance relationship assumes that multiple inheritance is permitted A-176
Figure D.5 An ERD for system architecture (structure level 3) A-177
Figure D.6 An ERD for system structure deployment (structure level 4) A-177
Figure D.7 An ERD for business structure (structure level 5) A-178
Figure D.8 An ERD for intra-object interaction (behaviour level 1) A-178
Figure D.9 An ERD for inter-object interaction (behaviour level 2) A-179
Figure D.10 An ERD for component interaction (behaviour level 3) A-179
Figure D.11 An ERD for system behaviour distribution (behaviour level 4) A-180
Figure D.12 An ERD for business behaviour (behaviour level 5) A-180
Figure E.1 Frantz’s taxonomy of model abstraction techniques (from [Frantz 1995]) A-184
Figure G.1 A part of the S2 view of JavaDrawApp A-196
Figure G.2 A part of the B2 view for JavaDrawApp A-197
Figure G.3 The S3 view for JavaDrawApp A-198
Figure G.4 The B3 view for JavaDrawApp A-199
Figure G.5 A part of the custom B2 view A-200
Figure G.6 A part of the pruned B2 view A-202
Figure G.7 The custom S1/S2 view showing the methods of Figure and AbstractFigure A-206
Figure G.8 The custom B1 view of RoundRectangleFigure A-207
Figure G.9 The Together class diagram for the figures package A-209
Figure G.10 The custom S2 view of the figures package A-210
Figure G.11 The Together sequence diagram for DrawApplication.saveAsStorableOutput() A-212
Figure G.12 The custom B2 view A-213
Figure G.13 The main class diagram from the architecture overview A-214
Figure G.14 The custom S2 view corresponding to the main class diagram A-215
Figure G.15 The SelectionTool class diagram from the architecture overview A-215
Figure G.16 The custom S2 diagram corresponding to the SelectionTool class diagram A-215
Figure G.17 The sequence diagram from the architecture overview A-216
Figure G.18 The custom B2 view corresponding to the architecture overview sequence diagram A-217
Figure G.19 The JHotDraw expert’s component diagram A-218
Figure G.20 The S3 view of JavaDrawApp A-218
Figure G.21 The B3 view of JavaDrawApp A-219
Figure G.22 The JHotDraw expert’s use case diagram A-220
Figure G.23 The S3 and B3 views of JavaDrawApp A-221
Figure G.24 The expert’s diagram of the static behavioural relationships between the BeautyJ components A-222
Figure G.25 The B3 view of BeautyJ A-223
Figure G.26 The combined S3/B3 view of BeautyJ A-224
Figure G.27 The expert’s diagram of the static behavioural relationships between BeautyJ’s main classes A-225
Figure G.28 The custom B2 view of BeautyJ’s main classes A-226
Figure G.29 The expert’s diagram of the dynamic behavioural relationships between BeautyJ’s main classes A-228
Figure G.30 A part of the custom B2/B3 view of the system’s main classes A-229
Figure G.31 A part of the S2 view of BeautyJ A-230
Figure G.32 The custom S2 view of the javasource package A-231
Figure G.33 The expert’s solution for adding generics support to BeautyJ A-233
Figure G.34 The expert’s solution for adding typesafe enum support to BeautyJ A-234
Figure G.35 The expert’s solution for adding vararg support to BeautyJ A-235
Figure G.36 The expert’s solution for adding static import support to BeautyJ A-236
Figure G.37 The expert’s diagram of the internal behaviour of StandardSourclet.buildStartSource A-237
Figure G.38 The custom B1 view of StandardSourclet’s interactions A-238
Figure G.39 The custom B2 view of StandardSourclet A-239
Figure G.40 The expert’s diagram of Sourclet behaviour A-241
Figure G.41 The custom S1 view of ProgressTracker A-242
Figure G.42 The custom B2 view showing SourceParser interactions A-243
Figure G.43 The expert’s solution for adding ProgressTracker support to BeautyJ A-245
Figure G.44 The BeautyJ documentation component overview diagram A-246
Figure G.45 The S3 view of BeautyJ A-246
Figure G.46 The BeautyJ documentation main classes diagram A-247
Figure G.47 A custom S2/S3 view of BeautyJ A-248
Figure G.48 The BeautyJ documentation sourclet diagram A-248
Figure G.49 A custom S2/S3 view of BeautyJ A-249
Figure G.50 The BeautyJ documentation Java Source Parser diagram A-250
Figure G.51 A custom S2 view of BeautyJ A-251
Figure G.52 The application.beautyj class diagram A-254
Figure G.53 The custom S2 view showing the application.beautyj package A-254
Figure G.54 The util.javasource class diagram A-254
Figure G.55 The util.javasource.jit class diagram A-255
Figure G.56 The custom S2 view showing the util.javasource.jit package A-256
Figure G.57 The class diagram for the util.javasource.sourclet package A-257
Figure G.58 The custom S2 view of the util.javasource.sourclet package A-257
Figure G.59 The sequence diagram for SourceParser.buildSource() A-258
Figure G.60 A custom B2 view of BeautyJ A-259
Figure G.61 The S3 view of SHriMP A-261
Figure G.62 The B3 view of SHriMP A-261
Figure G.63 The expert’s component diagram of SHriMP A-261
Figure G.64 The custom S2 view of the DisplayBean.layout package A-263
Figure G.65 The pruned custom S1 view of ShrimpView A-265
Figure G.66 The Together class diagram for the shrimp package A-266
Figure G.67 The custom S2 diagram for the shrimp package A-267
Figure G.68 The Together class diagram for the shrimp.gui package A-268
Figure G.69 The custom S2 view of the shrimp.gui package A-269
Figure G.70 The Together class diagram for the shrimp.SearchBean package A-270
Figure G.71 The custom S2 view of the shrimp.SearchBean package A-271
Figure G.72 The Together class diagram for the shrimp.DisplayBean package A-272
Figure G.73 The custom S2 view of the shrimp.DisplayBean package A-273
Figure G.74 The Together sequence diagram for GXLPersistentStorageBean.loadData() A-275
Figure G.75 A custom B2 view of SHriMP A-276
Figure G.76 The cookbook diagram for the model subsystems A-277
Figure G.77 A custom S3 and B3 view of ArgoUML A-278
Figure G.78 The cookbook diagram for the view subsystem A-278
Figure G.79 A custom S3 view of ArgoUML A-278
Figure G.80 A custom B3 view of ArgoUML A-279
Figure G.81 The cookbook diagram for the control subsystem A-279
Figure G.82 A custom S3 view of ArgoUML A-280
Figure G.83 A custom B3 view of ArgoUML A-280
Figure G.84 The cookbook diagram for the loadable subsystems A-281
Figure G.85 A custom S3 view of ArgoUML A-281
Figure G.86 A custom B3 view of ArgoUML A-281
Figure G.87 A part of the S1 view of ModelEventPump A-283
Figure G.88 The custom S2 view of the model.uml package A-285
Figure G.89 The custom S1 view of CoreFactoryImpl A-288
Figure G.90 The custom S1 view of CoreHelperImpl A-289
Figure G.91 A custom S2 view of ArgoUML A-290
Figure G.92 The cookbook diagram of the main critics and cognitive classes A-291
Figure G.93 The custom S2 view of the cognitive.critics package A-292
Figure G.94 The custom S1 view of CrCircularInheritance A-293
Figure G.95 The custom S1 view of CrEmptyPackage A-293
Figure G.96 The custom S1 view of CrIllegalName A-294
Figure G.97 The cookbook diagram of the cognitive.critics package and related classes A-296
Figure G.98 The custom S3 combination of the cognitive.critics package and related classes A-297
Figure G.99 The cookbook diagram of the multi editor pane classes A-298
Figure G.100 The custom S2 view of the multi editor pane classes A-299
Figure G.101 A part of the custom S1 view of MultiEditorPane A-299
Figure G.102 The custom S2 view of the uml.diagram.static_structure.layout package A-301
Figure G.103 The custom S2 view of the uml.diagram.static_structure.ui package A-302
Figure G.104 A part of the custom S1 view of ClassDiagramGraphModel A-304
Figure G.105 The custom S1 view of ClassDiagramRenderer A-306
Figure G.106 The custom S2 view of the PropertyPanel classes A-308
Figure G.107 The cookbook diagram of the other languages components A-309
Figure G.108 The custom S3 view of the other languages components A-309
Figure G.109 The custom B3 view of the other languages components A-309
Figure G.110 A part of the custom S1 view of DetailsPane A-311
Figure G.111 The custom S2 view of the uml.ui.TabXXXX classes A-313
Figure G.112 A part of the custom S2 view of the ui.explorer.rules package A-315
Figure G.113 A part of the custom S1 view of PerspectiveManager A-316
Figure G.114 The custom S1 view of FigNodeModelElement A-316
Figure G.115 The custom S2 view of the moduleloader package A-317
Figure G.116 The custom S1 view of ModuleInterface A-318
Figure G.117 A part of the custom S1 view of ModuleLoader A-318
Figure G.118 The custom S1 view of ModuleStatus A-319
Figure G.119 A portion of the custom S1 view of ModuleTableModel A-319
Figure G.120 The custom S1 view of PluggableMenu A-321
Figure G.121 The custom S1 view of PluggableDiagram A-322
Figure G.122 The custom S2 view of the Pluggable types from application.api A-323
Figure G.123 The Together class diagram for the kernel package A-324
Figure G.124 The custom S2 view of the kernel package A-324
Figure G.125 The Together class diagram for the language.java.generator package A-325
Figure G.126 The custom S2 view of the language.java.generator package A-326
Figure G.127 The Together class diagram for the model.uml package A-327
Figure G.128 The Together class diagram for the uml.reveng.java package A-331
Figure G.129 The custom S2 view of the uml.reveng.java package A-332
Figure G.130 The Together sequence diagram for ClassDiagramGraphModel.addNode() A-334
Figure G.131 A custom S2 view of ArgoUML A-335
List of Tables
Table 2.1 A selection of diagrams for describing software 72
Table 3.1 Tools summary comparison 106
Table 3.2 Questions summary comparison 107
Table 4.1 The proposed visualisation model for object-oriented software 114
Table 5.1 The correspondence between typical software comprehension activities and the revised task sets 130
Table 5.2 Information required from each dimension of the proposed model to address the general software comprehension tasks 132
Table 5.3 Information required from each dimension of the proposed model to address the specific reverse engineering tasks 132
Table 6.1 The abstraction levels of the proposed model 135
Table 6.2 Abstraction mappings for the structure hierarchy 144
Table 6.3 Abstraction mappings for the behaviour hierarchy 146
Table 7.1 Instances of usage of each of the five abstraction levels of the model in addressing the comprehension tasks 194
Table 7.2 Categorisation of BeautyJ evaluation questions by typical software comprehension questions 195
Table 7.3 Categorisation of SHriMP evaluation questions by typical software comprehension questions 196
Table 7.4 Categorisation of ArgoUML evaluation questions by typical software comprehension questions 196
Table B.1a The level 0 structure entities and relationships extracted from the JavaDrawApp source code. Format: line_number:name A-55
Table B.1b The level 0 structure entities and relationships extracted from the JavaDrawApp source code A-56
Table B.2 The level 1 structure entities and relationships derived from Table B.1a A-57
Table B.3 The level 2 structure entities and relationships derived from Table B.1b A-60
Table B.4 The level 2 structure entities and relationships derived from Table B.2 A-62
Table B.5 The extracted level 2 structure entities and relationships A-64
Table B.6 The analyst mappings from level 2 – level 3 A-75
Table B.7 The generated level 3 structure entities and relationships A-83
Table B.8 The derived level 4 structure entities and relationships A-87
Table B.9 The analyst mappings from structure level 3 – level 5 A-91
Table B.10 The derived level 5 structure entities and relationships A-95
Table B.11 The extracted level 0 behaviour entities and relationships A-99
Table B.12 The AnimationDecorator level 1 behaviour entities and relationships derived from Table B.11 A-100
Table B.13 The level 2 behaviour entities and relationships derived from Table B.11 A-102
Table B.14 The level 2 behaviour entities and relationships derived from Table B.12 A-104
Table B.15 The extracted level 2 behaviour entities and relationships A-105
Table B.16 The derived level 3 behaviour information A-118
Table B.17 The derived level 4 behaviour entities and relationships A-123
Table B.18 The derived level 5 behaviour entities and relationships A-128
Table B.19 The entity and relationship sets of the combined view A-134
Table B.20 The entity and relationship sets of the combined view. Key: level 2 behaviour, level 5 behaviour A-138
Table B.21 The entity and relationship sets of the combined view. Key: level 2 structure, level 5 behaviour A-148
Table E.1 Abstraction processes listed by Zeigler (from [Zeigler 2000]) A-183
Table E.2 Correspondence between abstraction operations in the novel model and Frantz’s taxonomy A-186
Table F.1 Entities and relationships at behaviour levels 1 and 2 A-189
Table F.2 Entities and relationships at structure levels 2 and 3 A-189
Table F.3 Activities metamodel correspondence A-190
Table F.4 Interactions metamodel correspondence A-190
Table F.5 Classes metamodel correspondence A-191
Table F.6 Components metamodel correspondence A-192
Table G.1 Correspondence between interactions in the architecture overview sequence diagram and VANESSA diagram A-216
1 Introduction
“Software visualisation is nifty stuff”
M Petre, A F Blackwell, T R G Green [Petre 1997]
1.1 Background
1.1.1 Software visualisation
This thesis presents a novel software visualisation model for supporting object-oriented
software comprehension. Software visualisation is the process of modelling software systems
for comprehension [Price 1993]. The comprehension of software systems both during and
after development is a crucial component of the software process. The complex interactions
inherent in the object-oriented paradigm make visualisation a particularly appropriate
comprehension technique. Software visualisation is therefore a useful technique in object-
oriented software maintenance. The large volume of information typically generated during
visualisation necessitates tool support.
1.1.2 Software comprehension
Software comprehension involves gaining an understanding of the functionality, structure,
and behaviour of a software system [von Mayrhauser 1995]. Software comprehension has a
number of applications in the development and maintenance of software. During the
development phase, software comprehension techniques can be used to ensure that the
system being developed complies with the system design. During software maintenance,
software comprehension can be applied to assist software evolution (extension and
contraction of functionality), reverse engineering (extracting design information),
reengineering (changing existing functionality), and refactoring or restructuring (improving
code by making it more extensible or maintainable). Software comprehension also has
applications in the field of software reuse [Szyperski 1998, Fayad 1999], where source code
or accurate documentation are not always available. Other areas of application include
redocumentation (documenting existing software) and legacy system migration (making old
systems work in new environments, e.g. the World Wide Web).
1.1.3 Challenges in object-oriented software comprehension
The prevalent software engineering paradigm in use today is the object-oriented (OO)
approach. Object-oriented software systems present new challenges for software
comprehension compared to traditional procedural or functional systems. The principal
features of the OO paradigm that place additional requirements on comprehension and
visualisation techniques, and hence require new approaches, are complex control flow,
inheritance, polymorphism, and dynamic binding. In the OO paradigm there are typically
many asynchronous interactions between objects and many different points at which methods
can be called, making it much more difficult to follow a program’s execution than in the
more linear programs produced by traditional paradigms. Polymorphism, a concept closely
related to inheritance, allows an instance of a subtype to be substituted wherever a reference
to a supertype is expected; this makes it difficult to determine exactly which class is being
referred to at a given point in the program. Dynamic binding is a method of implementing polymorphism
referred to in the program. Dynamic binding is a method of implementing polymorphism
where the type to be used is not bound to the type reference until runtime; thus it is not
possible to determine statically (i.e. from the program code) which class, and hence which
methods, will actually be referred to at runtime, nor is there any guarantee that subsequent
executions of the same code will refer to the same classes/methods.
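The difficulty can be seen in a few lines of Java (all class and method names here are invented for illustration): a caller whose parameter has the static type of a supertype cannot be statically resolved to a single method implementation.

```java
// Illustrative sketch only: the names Figure, RectangleFigure, etc. are invented.
abstract class Figure {
    abstract String draw();
}

class RectangleFigure extends Figure {
    String draw() { return "rectangle"; }
}

class EllipseFigure extends Figure {
    String draw() { return "ellipse"; }
}

public class DynamicBindingDemo {
    // The static type of 'f' is Figure; which draw() body runs is bound at runtime.
    static String render(Figure f) {
        return f.draw();
    }

    public static void main(String[] args) {
        // The concrete class is chosen from a runtime value, so no static
        // inspection of render() can determine which method body will execute.
        Figure f = System.nanoTime() % 2 == 0
                ? new RectangleFigure()
                : new EllipseFigure();
        System.out.println(render(f));
    }
}
```

The same call site, `f.draw()`, dispatches to different method bodies on different executions, which is precisely the information a purely static analysis cannot recover.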
1.1.4 Reverse engineering tools support software comprehension
Reverse engineering [Cross 1992] describes the process of analysing a software system
(complete or incomplete) in order to extract information about its design. Reverse
engineering tools exist to support the software developer in his software comprehension
tasks. A variety of industrial and academic reverse engineering tools exist, employing either
static or dynamic techniques, or an integrated approach. These tools range from relatively
simple debuggers, which allow the developer to step through the code execution in a
controlled manner and examine variable assignments and method calls as they occur, to
interactive visualisation tools, which produce diagrams to the user’s specification, based on a
directed analysis of the software system.
1.1.4.1 The relationship between forward and reverse engineering
The reverse engineering process can, to an extent, be considered the inverse of the forward
engineering process. The traditional forward software engineering process (linear sequential
model, waterfall model, or classic life cycle) comprises: requirements elicitation and
analysis, high level design, detailed design, code generation, testing, and maintenance, in
that order [Pressman 2000 Sec. 2.4]. The reverse engineering process is equivalent to the
reverse of the second, third, and fourth stages of this process. The first stage of reverse
engineering is to extract information from the program code. This information can then be
analysed to produce low-level and high-level views of the software system. This relationship
is illustrated in Figure 1.1.
Figure 1.1 The relationship between forward and reverse engineering
1.1.5 Abstraction
Abstraction is the process of producing a simplified representation that emphasises the
important information while suppressing details that are (currently) uninteresting, with the
goal of reducing complexity and increasing comprehensibility [Berard 1993]. Thus, a more
abstract representation is produced from a less abstract (i.e. more detailed) base
representation by applying an abstraction operation to it. An abstraction operation may
perform aggregation on the entities and relationships in the representation, or it may apply
some mapping or other function to the information. Different levels of abstraction are
commonly employed in both the forward and reverse engineering of object-oriented software
systems. For example, a software system is typically specified at a high level of abstraction
during the initial design phase, which is then refined to a more detailed (less abstract)
representation later in the development prior to the system’s implementation in code.
Conversely, in reverse engineering, it is possible to generate representations of an existing
system at various levels of abstraction. For example, a debugger produces detailed
information about method calls and variable accesses, while an integrated development
environment (IDE) may provide functionality to extract a (more abstract) class diagram from
the system’s source code.
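As an informal sketch (and not the formal abstraction operations defined later in this thesis), an aggregation-style abstraction operation can be expressed as a function that collapses method-level call relations into class-level relationships; all names below are invented for illustration.

```java
import java.util.*;

// Informal sketch of abstraction by aggregation: method-level call relations
// are collapsed into class-level "uses" relationships, discarding method detail.
public class AbstractionDemo {
    // A call relation: Caller.callerMethod invokes Callee.calleeMethod.
    record Call(String callerClass, String callerMethod,
                String calleeClass, String calleeMethod) {}

    // The abstraction operation: keep only inter-class relationships.
    static Set<String> abstractToClassLevel(List<Call> calls) {
        Set<String> classRelations = new LinkedHashSet<>();
        for (Call c : calls) {
            if (!c.callerClass().equals(c.calleeClass())) {
                classRelations.add(c.callerClass() + " -> " + c.calleeClass());
            }
        }
        return classRelations;
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("DrawApplication", "open", "Drawing", "add"),
            new Call("DrawApplication", "save", "Drawing", "figures"),
            new Call("Drawing", "add", "Drawing", "notify"));
        // Three method-level calls abstract to one class-level relationship.
        System.out.println(abstractToClassLevel(calls)); // [DrawApplication -> Drawing]
    }
}
```

The detail suppressed here (which methods participate, and intra-class calls) could still be recovered from the base representation, reflecting the point that abstraction produces a simplified view of a more detailed underlying model.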
1.2 Thesis overview
1.2.1 Motivation and aim
It appears that software visualisation tools are seldom employed outwith the context of
research. This is because current tools are relatively tightly focussed in that they address only
a very small range of abstraction levels or a single aspect (i.e. structure or behaviour) of the
software. As a result, each of the extant software visualisation tools addresses only a small
subset of the range of software comprehension tasks. In order to address the limitations of
current visualisation techniques, an approach is proposed that integrates abstraction with
structural and behavioural perspectives. The aim of this research is to improve the
effectiveness of visualisation techniques for large-scale software understanding. The
motivation for this work was the lack of a unified model for software visualisation that
allows the analyst to move conveniently between abstraction levels. Such a model would
allow the analyst to visualise the information required for their analysis within the context of
the system as a whole, and hence to relate and reason about visualisations.
1.2.2 Research hypothesis
The research hypothesis explored in this thesis is that a model which supports visualisation
of software through a range of abstraction levels, incorporates structural and behavioural
views, and integrates statically and dynamically extracted information provides effective
support for the full range of software comprehension tasks.
1.2.3 Approach and methodology
The first stage in investigating this hypothesis was to examine related work in the fields of
software visualisation, tool evaluation, abstraction, diagrams, views, exploration and
querying, metamodels, and software modelling. An initial case study was then carried out to
assess the capabilities of the extant software visualisation tools. It is from the results of this
study that the research hypothesis was derived. A model was then proposed based on
multiple levels of abstraction, structural and behavioural perspectives, and the integration of
statically and dynamically extracted information. The model was assessed theoretically
against its original goals and then refined. Support for software comprehension strategies in
the proposed model was considered and abstraction operations between views in the model
and the combination of views were defined formally. The model was then applied manually
to a real system to demonstrate and verify its utility. A tool implementation of the model was
developed to facilitate its evaluation. The tool was used to perform a replication of the
original study with the novel model proposed, and to evaluate the performance of the model
in addressing typical software comprehension tasks in real world software systems. It is
concluded that the model proposed in this thesis provides support for the full range of
software comprehension tasks.
1.2.4 Contributions of this thesis
This thesis presents a novel software visualisation model based on a range of abstraction
levels and incorporating structural and behavioural perspectives of software, and introduces
a prototype implementation of the model. An abstraction scale and set of criteria for
classifying software comprehension tools are presented, and are used to review and compare
the extant software visualisation tools. A schema for categorising view arrangements in
software tools is presented. Typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools are proposed, and are used to assess the
extant tools and the novel model presented in this thesis.
2 Related Work
“Effectively presenting large amounts of information in any form is challenging.”
M-A D Storey, H Müller [Storey 1995]
This chapter discusses related work in order to provide a foundation for the work in
succeeding chapters and demonstrate the need for such work. Firstly, the basic techniques
involved in producing software visualisations are discussed. An overview and comparison of
the extant software visualisation tools is then presented. The fundamental concept of
abstraction is explored in detail. Various diagrams for presenting visualisations are then
described, along with the concept of views for organising them, and techniques for exploring
and querying visualisations. Related work from the field of software modelling is discussed.
Lastly, a variety of evaluation techniques in software comprehension and visualisation are
surveyed.
2.1 Software visualisation techniques
This section introduces the basic techniques involved in software visualisation – static and
dynamic extraction, data analysis, and presentation.
2.1.1 Static software comprehension techniques
Software comprehension techniques can be classified as either static or dynamic. Static
techniques analyse a system by examining its source or object code. Static techniques can
help in understanding the relationships between classes in a system, and in identifying the
system architecture [Müller 1993]. Although software systems written in procedural
languages are well suited to analysis with static techniques, aspects of the object-oriented
paradigm, such as polymorphism, overloading, and dynamic binding, make it more difficult
to gain an understanding of an object-oriented software system using static techniques alone.
Gamma et al. [Gamma 1995 pp.22-23] state, “An object-oriented program’s run-time
structure often bears little resemblance to its code structure. The code structure is frozen at
compile-time; it consists of classes in fixed inheritance relationships. A program’s run-time
structure consists of rapidly changing networks of communicating objects. In fact, the two
structures are largely independent. Trying to understand one from the other is like trying to
understand the dynamism of living ecosystems from the static taxonomy of plants and
animals, and vice versa. […] With such disparity between a program’s run-time and
compile-time structures, it’s clear that code won’t reveal everything about how a system will
work.”
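To make the idea of static extraction concrete, the sketch below recovers inheritance relationships from source text using a regular expression. This is deliberately naive (a real static analyser parses the code properly), and all class names are invented for illustration.

```java
import java.util.*;
import java.util.regex.*;

// Deliberately simplified static extraction: find "class X extends Y"
// declarations in source text. Real tools use a full parser instead.
public class StaticExtractorDemo {
    private static final Pattern CLASS_DECL = Pattern.compile(
        "class\\s+(\\w+)(?:\\s+extends\\s+(\\w+))?");

    // Returns "Sub extends Super" facts found in the given source text.
    static List<String> extractInheritance(String source) {
        List<String> facts = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(source);
        while (m.find()) {
            if (m.group(2) != null) {
                facts.add(m.group(1) + " extends " + m.group(2));
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        String source = """
            abstract class AbstractFigure implements Figure { }
            class RectangleFigure extends AbstractFigure { }
            class RoundRectangleFigure extends RectangleFigure { }
            """;
        System.out.println(extractInheritance(source));
    }
}
```

Note what this extraction cannot see: which subtype a `Figure` reference will actually denote at runtime. That gap is exactly what the dynamic techniques discussed next address.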
2.1.2 Dynamic software comprehension techniques
Dynamic software comprehension techniques analyse a software system by extracting
information from the system as it is executing. Dynamic techniques can help to illustrate the
interactions between objects in a target system, and the flow of control between the system’s
components. Dynamic software comprehension techniques address many of the
shortcomings of static techniques in the comprehension of object-oriented software systems.
A potential disadvantage of dynamic techniques is that they can consider only a subset of the
software system’s possible behaviour. While static techniques can analyse the entire system,
dynamic techniques analyse only the behaviour evident in the execution trace. It is the
responsibility of the analyst to ensure that a suitably representative trace is selected for
analysis.
2.1.3 Advantages of dynamic analysis for object-oriented systems
As described above, dynamic analysis is particularly useful in the context of object-oriented
software systems. Dynamic information describes the actions of a system at run time; it
includes information such as object instantiation and communication, method calls, and
branching decisions. In contrast to the collection of static information, dynamic analysis
takes place in the context of a running system, rather than by examination of static program
code or design documents. As described in Section 1.1.3, the object-oriented programming
model often has a complex control flow, with many asynchronous interactions between
objects and points where methods can be called. Information for the comprehension of
object-oriented systems is hence often difficult to collect and complex to analyse. The large
number of object interactions and often unpredictable control flow can result in a large and
complicated event trace.
It should be stressed that dynamic analysis does not supplant static analysis, even in the
context of object-oriented systems. However, much of the information traditionally collected
by static analysis techniques can also be collected dynamically, thus subsuming much of the
functionality of static techniques. For example, that one method calls another method can be
revealed through dynamic analysis, but may also be evident using static analysis
techniques, such as a call graph extractor (notwithstanding the difficulties posed to static
techniques by object-oriented concepts such as polymorphism, overloading, and dynamic
binding, as noted above). There are, however, a number of exceptions to the
subsumption of static analysis techniques by dynamic analysis, in the form of information
that cannot necessarily be extracted dynamically. Such information includes, for example,
line numbers, comments, and branch conditions. It depends upon the goals of the software
comprehension process whether it is more important to know, for example, the conditions
pertaining to a branch structure, or whether or not that branch is taken during the execution
of the software. Also, as noted above, dynamically extracted information pertains only to the
program execution from which it was extracted, and does not necessarily represent all of the
possible runtime behaviour of a system. Thus, as in all scientific analysis, the analyst must
select the technique appropriate to his task.
2.1.4 Debuggers
A debugger is a utility that enables the collection of dynamic information from a running
system, and has long been part of the software engineer’s tool set. Debuggers operate in an
online mode, producing output as the software executes. The software engineer can control
the execution of the software by means of the debugger interface, for example by stepping
through the code or by suspending and resuming threads of execution. Breakpoints can also
be set at points in the code in which the software engineer is interested. Upon encountering a
breakpoint during execution, the debugger will output an appropriate message, e.g. ‘Method
x called’. The debugger can also be used to examine the values of variables and expressions
during execution. A debugger provides a view of a software system at a low level of
abstraction (i.e. at a level relatively close to the level of detail provided by the source code
itself), and can be invaluable in locating code-level errors in a program. However, its low level
of abstraction is less useful in software comprehension activities, where a view at a higher
level of abstraction (i.e. at a level more distant from that of the code) of the system under
analysis is often required [Ball 1996].
2.1.5 Software visualisation tools
The foregoing discussion suggests that tools are required to assist the software developer in
the collection and analysis of information. As with any scientific analysis procedure, analysis
of a software system consists of three phases: collection of data about the software system;
analysis of the data collected; and presentation of the analysis results. In common with many
other scientific fields [Nielson 1990], visualisation has been found to be a particularly
effective method of presentation for the large and complex data sets produced by dynamic
analysis [Roman 1993].
Software visualisation tools typically operate in an offline mode, in which the collection
phase precedes the analysis and presentation phases. Walker et al. [Walker 1998] explain
that an offline system has two distinct advantages over an online system. Firstly,
preprocessing of the entire data set can be carried out prior to the presentation of the results,
allowing summary information to be produced for the execution. Such information can be
useful in helping the analyst to gain an overall view of the system. Secondly, (a part of) the
execution can be analysed repeatedly without requiring the execution to be repeated. This
allows the analyst to examine the same execution data in a number of different ways.
However, the disadvantage of the offline approach is that it is not possible to explore
alternative paths through the execution without rerunning the execution. This makes it
difficult to ask “What if…” questions of the system under analysis.
2.1.6 Data collection for software visualisation
During the data collection phase, static data is extracted from the program code and dynamic
data is extracted from the system during execution; this data is stored in a repository on disk.
The repository can be either a simple file or set of files, or a database. The usual advantages
and disadvantages of database systems also apply in this context: while it is quicker to write
a simple text file, a database can be queried more efficiently. The most appropriate
repository format depends on the functionality of the visualisation tool.
Extracting information statically from the program code typically involves analysis of the
program code (e.g. by means of call graph analysis [Grove 1997]). Dynamic data collection
occurs during the execution of the system. This necessitates some form of data collection
procedure running either within or alongside the system. One method of collecting this data
is by instrumentation of the source or object code of the system. This involves inserting
additional statements into the code that generate appropriate output when an ‘interesting’
event occurs during the execution of the system. In the context of object-oriented systems,
‘interesting’ events are usually defined as method calls and returns (when instrumenting the
caller) or method entries and exits (when instrumenting the callee). Koskimies and
Mössenböck [Koskimies 1996] explain that the advantages of inserting the instrumentation
in the caller’s code are that callee methods with multiple return points do not require
additional instrumentation, and that information about the caller method is conveniently
available. However, the instrumentation of method calls within expressions can appear
convoluted. Instrumentation of the source code can also reduce the readability of the code.
Code can be instrumented manually or automatically, e.g. using a preprocessor as in Scene
[Koskimies 1996]. One method of instrumenting either source or object code is the use of
wrappers. Brant et al. [Brant 1998] define wrappers as “mechanisms for introducing new
behaviour that is executed before and/or after, and perhaps even in lieu of, an existing
method”. Method wrappers were used in Gaudi [Richner 1999] to add instrumentation to the
code of the system under analysis.
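The wrapper idea can be sketched briefly in Python (for brevity only; the tools above target C++ and Smalltalk, and the Account class and event list here are invented for illustration):

```python
import functools

events = []  # the event trace produced by the instrumentation

def trace_wrapper(method):
    """Introduce behaviour before and after an existing method,
    recording call and return events in the trace."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        events.append(("call", type(self).__name__, method.__name__))
        result = method(self, *args, **kwargs)
        events.append(("return", type(self).__name__, method.__name__))
        return result
    return wrapper

class Account:
    def __init__(self):
        self.balance = 0

    @trace_wrapper
    def deposit(self, amount):
        self.balance += amount

acct = Account()
acct.deposit(10)
# events now holds one call and one return event for Account.deposit
```

The wrapped method behaves exactly as before; only the trace output is added, which is why wrappers are attractive as an instrumentation mechanism.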
Another method of collecting the dynamic data required for visualisation is the
instrumentation of the environment in which the software system is executing. This method
has the advantage that no changes to the source or object code of the system are required.
The environment is instrumented to produce appropriate output on the occurrence of relevant
events, as with code-level instrumentation. This method is employed in Ovation [De Pauw
1998], where the system under analysis is executed in an instrumented Smalltalk [Goldberg
1983] environment.
An alternative to instrumentation of the code or environment is to run the system under the
control of a debugger. Breakpoints set at appropriate points generate the output required.
Breakpoints can be set automatically, e.g. at every method entry and exit. This technique is
used in Shimba [Systä 2001] to generate trace information for selected methods and control
statements. Jinsight [De Pauw 2002] uses a profiling agent to control execution. As with an
instrumented environment, running under debugger or profiler control does not require
changes to be made to the code.
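In languages that expose a tracing hook, the same effect can be obtained without modifying the program at all. A minimal sketch in Python, whose sys.settrace hook plays the role of the debugger or profiling agent (the helper and main functions are invented for the example):

```python
import sys

trace = []
TRACED = {"main", "helper"}  # record events only for these functions

def collector(frame, event, arg):
    # The runtime invokes this hook on frame events; no source or
    # object code of the traced program is changed.
    if event in ("call", "return") and frame.f_code.co_name in TRACED:
        trace.append((event, frame.f_code.co_name))
    return collector  # continue tracing inside each new frame

def helper():
    return 42

def main():
    return helper()

previous = sys.gettrace()
sys.settrace(collector)
result = main()
sys.settrace(previous)
```

The collected trace interleaves calls and returns in execution order, which is exactly the raw material that the analysis phase described next must reduce.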
All of these methods are employed by the tools that are discussed later in this chapter and
evaluated in Chapter 3.
2.1.7 Analysing the data produced
The huge amount of data produced during the data collection phase must be analysed to
produce useful information about the software system. Three ways of reducing the event
trace to a manageable size are: selective instrumentation, pattern recognition, and
abstraction. These techniques may be used singly or in sequence.
Selective instrumentation involves instrumenting only those methods that are considered
‘important’. An analyst interested in gaining an overall understanding of a system may
choose to exclude all methods in library classes (such as java.lang.* and
javax.swing.* in Java [Arnold 2000, Gosling 2000, Sun 2005]). Alternatively, an
analyst pursuing a specific reverse engineering task, such as investigating how two classes
interact, may choose to instrument only the methods of those classes. Selective
instrumentation is employed in Shimba.
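The filtering step can be sketched in Python (the event tuples and package names are invented for illustration):

```python
# Hypothetical (caller, callee, method) events from an instrumented run.
raw_trace = [
    ("app.Main", "app.OrderService", "placeOrder"),
    ("app.OrderService", "java.util.ArrayList", "add"),
    ("app.OrderService", "app.Billing", "charge"),
    ("app.Billing", "javax.swing.JLabel", "setText"),
]

EXCLUDED = ("java.", "javax.")  # library packages considered 'unimportant'

def select(trace, excluded=EXCLUDED):
    """Discard events whose callee belongs to an excluded library package."""
    return [event for event in trace if not event[1].startswith(excluded)]

filtered = select(raw_trace)
```

In practice the selection would be applied before or during collection rather than afterwards, so that the excluded events are never generated at all.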
Pattern recognition is concerned with examining the event trace to detect repetition, in order
that this can be factored out to improve comprehensibility. This can be performed using string
matching algorithms, such as the Boyer-Moore algorithm [Boyer 1977] used in Shimba.
Alternatively, Ovation employs a hashing technique to detect and generalise patterns in the
event trace.
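Whatever matching algorithm is used, the essence is to replace n consecutive copies of a call sequence with the sequence and a repetition count. A naive greedy sketch in Python (not the Boyer-Moore or hashing schemes used by the tools themselves):

```python
def compress(trace, max_pattern=4):
    """Greedily factor out immediately repeated subsequences, replacing
    n consecutive copies of a pattern with (pattern, n)."""
    out, i = [], 0
    while i < len(trace):
        best_len, best_count = 1, 1
        for k in range(1, max_pattern + 1):
            pattern = trace[i:i + k]
            count = 1
            while trace[i + count * k : i + (count + 1) * k] == pattern:
                count += 1
            if count > 1 and count * k > best_count * best_len:
                best_len, best_count = k, count
        if best_count > 1:
            out.append((tuple(trace[i:i + best_len]), best_count))
        else:
            out.append(trace[i])
        i += best_len * best_count
    return out

trace = ["a.open", "b.read", "b.read", "b.read", "a.close"]
summary = compress(trace)
```

Here the three consecutive b.read events collapse to a single entry with a count of three, shortening the trace while preserving its structure.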
Abstraction according to specified criteria can be performed on the event trace to raise the
level of abstraction from that of individual method calls and returns to some higher level.
Gaudi allows trace elements to be clustered arbitrarily into user-defined components to aid
understanding of the system under investigation.
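Such clustering amounts to mapping class-level events onto user-defined components. A sketch in Python (the component mapping and class names are invented for illustration):

```python
# Hypothetical user-defined clustering of classes into components.
COMPONENTS = {
    "Parser": {"Lexer", "TokenStream"},
    "Backend": {"CodeGen", "Emitter"},
}

def to_component(cls):
    """Map a class name to its user-defined component (or itself)."""
    for name, members in COMPONENTS.items():
        if cls in members:
            return name
    return cls

def abstract(trace):
    """Lift a (caller, callee) trace from class level to component level,
    dropping events that are internal to a single component."""
    lifted = []
    for caller, callee in trace:
        a, b = to_component(caller), to_component(callee)
        if a != b:
            lifted.append((a, b))
    return lifted

class_trace = [("Lexer", "TokenStream"), ("TokenStream", "CodeGen"), ("CodeGen", "Emitter")]
component_trace = abstract(class_trace)
```

The lifted trace contains only inter-component events, raising the level of abstraction from individual method calls to interactions between components.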
Additionally, traces may be split manually or automatically into one or more smaller traces
to aid manageability, as in Shimba. It is also possible to start and stop the instrumentation
process, producing trace output only for defined periods of the system’s execution, as in
AVID [Walker 1998].
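Both mechanisms are straightforward to sketch: fixed-size splitting of a long trace, and retaining only the events between start/stop markers (the marker names here are invented):

```python
def split_trace(events, chunk=1000):
    """Split a long event trace into fixed-size sub-traces."""
    return [events[i:i + chunk] for i in range(0, len(events), chunk)]

def between(events, start="START", stop="STOP"):
    """Keep only events recorded while instrumentation is 'switched on',
    i.e. between start and stop markers in the trace."""
    recording, kept = False, []
    for event in events:
        if event == start:
            recording = True
        elif event == stop:
            recording = False
        elif recording:
            kept.append(event)
    return kept

parts = split_trace(list(range(10)), chunk=4)
window = between(["x", "START", "a.m1", "b.m2", "STOP", "y"])
```

In practice the start/stop decision is made at collection time, so that events outside the window are never recorded, rather than filtered out afterwards as here.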
2.1.8 Presenting the results
The goal of software visualisation is to present information about the software system under
investigation to the analyst in a format that is useful in helping them to achieve their
software comprehension tasks. A number of diagramming techniques have been employed in
visualisation tools in an effort to achieve this goal; these include basic graph representations,
UML diagrams, and message sequence chart-based representations.
2.1.8.1 Basic graph representations
Basic node/arc graphs are often used to illustrate the structure or behaviour of a software
system. For example, flow graphs are directed graphs that can be used to represent the flow
of control in a system; one application of these is in testing [Pressman 2000 Sec. 17.4.1,
17.6.1]. In an object-oriented context, directed or undirected graphs can be used to depict
object interactions by representing objects as nodes and messages as directed arcs between
nodes. The problem of scalability common to many representations is particularly evident
with graphs – attempting to draw numerous messages between objects can quickly reduce
the readability of the diagram. Directed graphs are used in Program Explorer [Lange 1995a].
The tool described by Sefika et al. [Sefika 1996a] and Gaudi employ directed graphs to
illustrate interaction between system components, while Dali [Kazman 1999] uses
undirected graphs to do so. The reflexion models used to show this in AVID are based on
directed graphs. Shimba uses undirected graphs to illustrate static method dependencies.
2.1.8.2 UML diagrams
Interaction diagrams, statechart diagrams, and activity diagrams are part of the UML
diagramming standard [Rumbaugh 1999, OMG 2001], which provides diagrams that
illustrate both the static structure and dynamic behaviour of a system. Interaction diagrams
illustrate interactions, which comprise objects, the relationships between them, and the
messages that are passed among them. There are two types of interaction diagrams:
collaboration diagrams and interaction sequence diagrams (sequence diagrams). The
emphasis of collaboration diagrams is on the structural organisation of the objects, while
sequence diagrams emphasise the temporal order of the messages passed between the
objects. Though semantically equivalent, the information shown by the two types of diagram
differs: collaboration diagrams show the connections between objects, while this is only
implied in sequence diagrams. While sequence diagrams show message return values,
collaboration diagrams do not. Figures 2.1 and 2.2 show a pair of corresponding sequence
and collaboration diagrams representing the Singleton.getInstance() method of
the Singleton design pattern [Gamma 1995 pp. 127-134]. This pattern ensures that a class
has only one instance in a system, and provides a global point of access to the instance. For
example, a system may be connected to a number of printers, but there should be only one
print queue. The getInstance() method returns this instance. Interaction diagrams solve
some of the scalability issues inherent in graph-based representations by representing time
explicitly along the vertical axis.
Figure 2.1 The UML sequence diagram for the Singleton design pattern
Figure 2.2 The UML collaboration diagram for the Singleton design pattern
Statechart diagrams (statecharts) [Harel 1990] model the behaviour of an individual object
as it changes state in response to events. Statecharts emphasise the states in which an object
can exist and the transitions between these states. Activity diagrams are flowcharts that
describe the flow of control between activities; the Together documentation [TogetherSoft
2001a] describes an activity as “an ongoing, non-atomic execution within a state machine”.
While interaction diagrams emphasise the flow of control between objects, activity diagrams
emphasise control flow between activities.
Together ControlCenter [TogetherSoft 2001b] synthesises interaction diagrams, statechart
diagrams, and activity diagrams from source code. An extended version of UML sequence
diagrams and statecharts are used in Shimba. Booch’s object interaction diagram [Booch
1994] (a precursor to the UML sequence diagram) is used in Sefika et al.’s approach [Sefika
1996a] to illustrate component interaction in a system.
2.1.8.3 Message sequence charts
Message sequence charts (MSCs) [ITU-T 1996] are similar to UML interaction diagrams.
Objects are listed along the top of the diagram, with vertical lines indicating the lifetime of
the object. Messages between objects are shown as directed arcs; time progresses
downwards. A variation of message sequence charts is used in Ovation [De Pauw 1998]. De
Pauw et al. [De Pauw 1998] explain that the tree-structured interaction diagrams – called
execution patterns – used in Ovation emphasise the progression of time, rather than the flow
of control as in sequence diagrams. They also give the disadvantages of sequence diagrams
as being that they do not scale conveniently, there are ambiguities in that the ordering of
objects on the horizontal axis is arbitrary, and the lifetimes of recursive calls are not easily
resolved. The execution patterns of De Pauw et al. also use colour to indicate the class of an
object, and label each object with a unique identifier. De Pauw et al. explain that execution
patterns address the perceived shortcomings of sequence diagrams because being
unidirectional in both axes makes them more convenient to read, they scale better, and more
efficient use is made of space in both axes.
An early form of MSC – an interaction chart – was used in Program Explorer. The OMT
[Rumbaugh 1991] event trace diagram (scenario diagram) used in Scene [Koskimies 1996]
is a variant of the MSC. ISVis [Jerding 1997] used a style of MSC called a Temporal
Message Flow Diagram (TMFD) [Citrin 1995].
2.1.8.4 Other representations
A number of less widely used representations also exist, including three-dimensional
[Marcus 2003a] and virtual reality [Maletic 2001] visualisations (discussed later in the
context of specific tools).
Chuah et al. present three novel interactive glyphs for visualising software: InfoBUG,
timeWheel, and 3D-wheel [Chuah 1997]. The InfoBUG glyph provides an overview of the
software’s components. It is shaped like an insect and consists of wings, head, tail, and body,
each of which encodes some metric about the software, such as LOC, number of errors, lines
of code added or deleted, contained objects, etc. The timeWheel glyph illustrates multiple
properties of a software system over time. Each property is represented by a time series;
these series are laid out in a circle. This glyph is useful for visualising trends in evolution. The
3D-wheel glyph represents the same data as timeWheel, but uses height to denote time. Each
variable is represented by an equal-sized portion of the circumference of the circle, with its
radius (i.e. thickness) denoting its value (as in a rose diagram). It is easier to identify trends
using the 3D-wheel than the timeWheel, but harder to identify divergences. The approach is
demonstrated by a description of its application to a large real-time software system
developed by thousands of developers over twenty years.
Eick et al. present several visualisations designed to help in understanding and managing the
software change process [Eick 2002]. These are matrix views, cityscapes, bar and pie charts,
data sheets, and networks. Multiple visualisations can be combined to form perspectives that
show high-level structure in change data while allowing access to details. Matrix views are
effective for displaying values that are a function of two indices. Advantages of matrix
displays are that many cells are visible and there is no overplotting. One drawback that also
applies to cityscape views is the arbitrary ordering of the columns and rows, which makes it
difficult to relate the representation to other data. Cityscapes are 3D bar charts that extend
matrix views: each cell is addressed by two indices and can encode one or more values. Although more
compelling than 2D matrices, cityscapes have decreased scalability and suffer from
occlusion. The bar and pie charts used here have been enhanced to improve scalability; as a
result, bar charts scale effectively, though pie charts still do not. These representations are most
effective when used as selectors linked to other views. Data sheets are basically
multicolumn scrollable textual displays, with the addition of zooming. They are useful in
providing direct access to details, and are especially effective when linked to other views. In
network views, nodes represent software units and visual attributes denote measures of
association between them. The strength of network views lies in revealing high-level
structure. Their weaknesses are lack of scalability, inability to display multiple link
characteristics, and overlap. Perspectives are used to show multiple views simultaneously,
with links between them so that manipulations in one view can be reflected in the others.
Usage is demonstrated in the context of understanding software change by exploration of
change data, and managing software development.
2.2 Software visualisation tools
This section discusses a representative selection of the extant software visualisation tools. A
scheme for characterising software visualisation tools and a scale for measuring abstraction
level are presented. The extant tools are then assessed and discussed in the context of this
framework. Examining and comparing the existing tools in this way emphasises the
capabilities of current software visualisation tools and highlights potential areas for
improvement.
2.2.1 Characteristics of software visualisation tools
2.2.1.1 Three criteria for characterising software visualisation tools
From the foregoing discussion, three distinguishing criteria regarding software visualisation
tools can be identified. The first of these is the method used to extract the dynamic
information from the software system. Techniques include instrumentation of the source or
object code (e.g. using wrappers) or environment, or running the system under the control of
a debugger or profiler.
The second criterion is the methods of analysis that are applied to the extracted data to
improve its comprehensibility and usefulness to the analyst. These include selective
instrumentation, pattern recognition, abstraction, trace splitting, and suspension/resumption
of tracing.
The third criterion is the method by which the results of the visualisation are presented to the
analyst. Diagramming techniques are typically based on graphs, UML diagrams, or message
sequence charts.
2.2.1.2 A scale to indicate level of abstraction
The combination of these three criteria determines the level of abstraction at which the
software visualisation tool operates. This thesis proposes a scale for the classification of
software analysis tools based on their level of abstraction; this is illustrated in Figure 2.3. An
ordinal scale is used to assign a value (or range of values) from one to five to a tool, based on
its position relative to the five indicated reference points. At the microscopic end of the
scale, debuggers (1) are representative of the lowest level of abstraction that an analysis tool
can produce. At the opposite, macroscopic, end are tools that provide a broad overview of an
entire software system at a high level of abstraction (5). The middle portion of the scale
ascends from tools that illustrate method calls and returns (2), through tools giving an object-
or class-level representation of the system (3), to tools that provide an architectural-level
view of the system (4). The program code itself can be considered to be at level 0. The
application of this scale is not restricted to the assessment of software visualisation tools. It
is equally applicable to diagrams, and indeed other forms of documentation, at any stage of
the software engineering life cycle.
Figure 2.3 A scale to indicate level of abstraction
2.2.1.3 Software visualisation tool taxonomies
The remainder of this section examines a representative sample of the software visualisation
tools that have been produced. Each tool is described according to the three criteria proposed
in Section 2.2.1.1, and placed on the abstraction scale described in Section 2.2.1.2.
The template used to categorise software visualisation tools in this section is based around
four headings – Description, Evaluation, Comparison, and Assessment. A number of
alternative taxonomies have been proposed in the literature, including work by Myers
[Myers 1986], Chikofsky and Cross [Chikofsky 1990], Stasko and Patterson [Stasko 1992],
Price et al. [Price 1992, Price 1993], and Roman and Cox [Roman 1993]. Price et al. [Price
1993] propose a detailed, multi-level taxonomy for classifying software visualisation tools.
Unlike earlier taxonomies that have derived categorisations based on observations of tools,
Price et al. justify their categories (Scope, Content, Form, Method, Interaction, and
Effectiveness) based on the theory of visualisation tools. They then attempt to classify a
selection of software visualisation tools according to this taxonomy. The software
visualisation tools in this thesis are classified according to four categories derived from
observation of the extant tools (extraction, analysis, and presentation
techniques, and abstraction level). There is some commonality between these categories and
those of the taxonomy of Price et al. While this categorisation may be less detailed than the
taxonomy of Price et al., it provides much of the cogent information that may be required
when selecting a software visualisation tool for a software comprehension or reverse
engineering task.
2.2.2 Program Explorer (level 2)
2.2.2.1 Description
Lange and Nakamura [Lange 1995a] discuss the investigation of object-oriented frameworks
by means of visualisation that focuses on identifying design patterns. They describe Program
Explorer, a tool that uses a combination of static and dynamic information to visualise C++
programs. The program has a GUI front end, and queries are formulated in Prolog. Static and
dynamic information are stored in a single “Program Database”. The static information for
this database is gleaned from files output by a compiler. The system consists of a C++
program database; an “instrumentation utility” that instruments the C++ source code; a
“Trace Recorder” that is linked with the program under analysis to capture the event trace
during execution; and “Program Explorer”, which is used to control the execution of the
program, and present the static and dynamic information using its GUI.
The tool presents the visualisations using a graph-based representation and interaction charts.
These visualisations can be navigated step-by-step in a hypertext-like manner (e.g. expand a
node in the graph, explore a relationship between two nodes). The dynamic information is
extracted by automatic instrumentation of the source code. Further information on the
instrumentation technique is given by Lange and Nakamura [Lange 1995b]. A version that
uses runtime trapping instead of source code instrumentation, thus eliminating the need for
an extra compilation stage at the expense of execution speed, is discussed by Lange and
Nakamura [Lange 1997].
The visualisation can be localised by allowing the user to set breakpoints at a variety of
points in the source code, including classes, objects, function calls, etc. This also allows the
user to switch the tracing on and off, limiting the size of the trace. It does not appear that any
automatic analyses (e.g. behavioural pattern matching as in Shimba, described in Section
2.2.10) are applied to limit the size of the event trace and, hence, the resultant diagrams.
Lange and Nakamura [Lange 1995a] explain how identifying design patterns can help in
framework understanding, using examples from the Interviews framework [Linton 1992]. It
is argued that patterns can help in two ways. Firstly, once identified in the software
comprehension process, they can help to “fill in the blanks” about the rest of the system.
Secondly, patterns can provide a starting point for the exploration of a system. Lange and
Nakamura comment that although some automation using heuristics may be possible, it is
unlikely that the identification of patterns in the visualisation could be fully automated. The
system relies on the user being “pattern-literate”, and being able to identify the semantics of
design patterns from the method names and interactions between objects.
2.2.2.2 Evaluation
Lange and Nakamura [Lange 1995a] cite user reports that the tool was useful for three types
of task, namely: in supporting the understanding of certain specific C++ frameworks; for
reviewing designs, allowing visualisation of the implemented design in comparison to the
original design; and for visually debugging the application logic of C++ systems. Lange and
Nakamura state that Program Explorer provides framework developers with “abstract”
design pattern views, and “microscopic” views that provide sufficient detail as to make
source code superfluous in the software comprehension process. They explain that Program
Explorer’s ability to handle complex frameworks such as Interviews and CommonPoint
[Taligent 1994] is attributed mainly to the integration of static and dynamic information,
ease of view navigation and interaction, the ability to trace selectively, and user control over
the execution.
2.2.2.3 Comparison
At the time of the original paper [Lange 1995a], there do not appear to be any tools with
functionality comparable to that of Program Explorer. Lange and Nakamura [Lange 1995a]
discuss briefly two static analysis tools – CIA++ [Grass 1992] and GraphLog [Consens
1993] – and two dynamic analysis tools – Object Visualizer [De Pauw 1993, De Pauw 1994]
and HotWire [Laffra 1994]. Both dynamic tools are based on the same instrumentation
mechanism, which is less accurate than that used in Program Explorer in that it lacks
information on implicit functions, variable usage, and variable values. Object Visualizer is an
object-oriented profiling tool, while HotWire is a visual C++ debugger. Lange and
Nakamura explain that it is HotWire that is most similar to Program Explorer. Both HotWire
and Program Explorer generate microscopic visualisations concerning the state and
behaviour of objects, while Object Visualizer provides a more general overview, similar to a
profiling tool.
Jerding and Rugaber [Jerding 1997] observe that Program Explorer is not intended to give
an overall understanding of a software system, but to focus on specific classes or objects.
The analyst must be aware of what he is looking for, or where in the execution it occurs,
before he begins his analysis.
Walker et al. [Walker 1998] note that the analyst must possess in-depth knowledge of
the software system under analysis in order to make useful queries of the fine-grained models
produced by Program Explorer.
De Pauw et al. [De Pauw 1998] argue that Program Explorer bridges the gap between
microscopic and macroscopic extremes by its use of interaction diagrams. However, they
note that such diagrams are inconvenient and suffer from the difficulties described
previously relating to scalability, ambiguity, and recursive calls.
Richner and Ducasse [Richner 1999] note that the highest level of abstraction supported by
Program Explorer is the class or object level.
Systä et al. [Systä 2001] observe that the analyst cannot specify how the event trace is split
into sequence diagrams, and that the level of abstraction of these diagrams is fixed.
2.2.2.4 Assessment
The comments made by Jerding and Rugaber [Jerding 1997] suggest that Program Explorer
is more suited to specific reverse engineering tasks than overall software comprehension, as
the analysis must be focussed clearly. Therefore, it would be expected that Program
Explorer might struggle with general software comprehension tasks, but could perform well
in specific reverse engineering tasks, depending on the level of detail required for useful
analysis.
2.2.3 Scene (level 2)
2.2.3.1 Description
Koskimies and Mössenböck [Koskimies 1996] describe a tool called Scene (Scenario
Environment) that produces scenario diagrams from a dynamic event trace. Calls can be
expanded and collapsed to simplify the scenario diagram. A hypertext approach enables the
analyst to click various areas of the diagram (e.g. a method call or an object) to jump to a
related document (e.g. a point in the source code or a class interface). An externally-
produced class diagram can also be linked to the scenario diagram. The scenario diagrams
are partitioned to display only as many objects as fill the screen horizontally, thus
eliminating horizontal scrolling. Calls to ‘uninteresting’ objects can be filtered out in the
diagram by selecting the object(s) to retain or discard. Calls can be expanded and viewed in a
‘single-step mode’ where subsequent events are displayed one by one in a separate window.
For any call event (or for the whole diagram), a call summary can be viewed in the form of a
call matrix.
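A call matrix is simply a table of call counts indexed by caller and callee. A sketch in Python (for brevity; Scene itself analyses Oberon-2 programs, and the object names here are invented):

```python
from collections import Counter

def call_matrix(events):
    """Summarise (caller, callee) call events as a matrix of call counts."""
    counts = Counter(events)
    objects = sorted({name for event in events for name in event})
    return {a: {b: counts.get((a, b), 0) for b in objects} for a in objects}

events = [("Editor", "Buffer"), ("Editor", "Buffer"), ("Buffer", "Undo")]
matrix = call_matrix(events)
```

Each row gives the number of calls an object made to every other object, summarising an arbitrarily long scenario in a fixed-size table.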
The system is implemented in the Oberon-2 language [Mössenböck 1991] and runtime
environment [Reiser 1991, Wirth 1992], and traces programs in this language. Oberon-2 is a
hybrid language, which, in addition to the object-oriented concepts of classes and methods,
also supports modules and procedures. The trace is obtained by automatically instrumenting
the source code using a preprocessor, which is then compiled and executed. The event trace
is then input to Scene, which produces a scenario diagram. ‘Uninteresting’ modules (e.g.
those related to mouse events in a GUI) can be excluded from the instrumentation, or
instrumented manually by the analyst at their discretion.
One problem identified was the lack of support for understanding of the relationships
between multiple scenario windows, which represent a hierarchy. Future work includes
automatic production of object state information, instrumentation of object code rather than
source code, and application of Scene to other languages such as C++. More detail on Scene is given
by Koskimies and Mössenböck [Koskimies 1995a].
2.2.3.2 Evaluation
Koskimies and Mössenböck [Koskimies 1996] report that Scene has been used to analyse a
number of “framework-like” systems, including a compiler construction framework
[Koskimies 1995b] and a graphics editor [Templ 1994]. They believe that Scene is most
beneficial in the analysis of frameworks, as understanding their complex dynamic behaviour
is vital for reuse.
2.2.3.3 Comparison
The scenario diagrams used in Scene are similar to Program Explorer’s interaction charts,
and represent the same level of abstraction. Both tools extract dynamic information through
automatic instrumentation of the program source code.
2.2.3.4 Assessment
The similarity of the representations in both Scene and Program Explorer would suggest that
Scene may also be better suited to targeted reverse engineering tasks than overall software
comprehension activities. However, the diagram manipulations and summary generation
supported by Scene would be expected to give it an advantage over Program Explorer in
overall comprehension tasks.
2.2.4 Architecture-oriented visualization (level 4)
2.2.4.1 Description
Sefika et al. [Sefika 1996a] discuss the concept of architecture-oriented visualization, which
is concerned with the visualisation of architecture-level components of software systems,
e.g. subsystems, frameworks, design patterns. Sefika et al. state that it is often architectural-
level questions that are most useful in understanding software systems, but that answering
such questions using traditional programming tools is difficult for a number of reasons.
Firstly, the volume of data generated by “flat” instrumentation of method calls and returns is
too great for architectural-level understanding, and its collection disrupts the system under
analysis. Secondly, the more abstract architectural structures are hidden from
instrumentation. Thirdly, prior systems have scant support for multiple perspectives or
hierarchical navigation, making it difficult to analyse the information from various
abstraction levels and design aspects that is required to discern the software architecture.
The (unnamed) tool described by Sefika et al. generates a variety of diagrams. Summary
statistics are displayed using bar charts and ternary diagrams [Haynes 1995]. Relationships
between run-time information and static system structure are represented using space-filling
diagrams [Baker 1994]. Interaction between system components is illustrated using affinity
diagrams [Sefika 1996b] and object interaction diagrams [Booch 1994].
The user has a choice of diagrams for different purposes. In the case studies, a bar chart is
used to display the number of processes blocked per subsystem; space-filling diagrams to
illustrate process blocking statistics at sub-framework, inheritance structure and class levels,
and for particular class instances; ternary diagrams to show communication between sub-
frameworks and subsystems; affinity diagrams to show communication between classes of a
sub-framework and classes of a subsystem; and object interaction diagrams to show object
interactions. The system supports multiple simultaneous diagrams of combined static and
dynamic information, with hyperlinks between diagrams. The tool described uses an online
approach.
The two principal constraints on the design of the system were that it must incur low spatial
and temporal overheads, and that it must be flexible enough to allow the analyst to change
the data extraction technique conveniently. The structure of the system is based around
events being received by an event sensor and passed to an event announcer, which informs
an instrument at the relevant level of abstraction (i.e. that selected by the user) that the event
has occurred. The key architectural design decisions were identified as: how the query
interpreter maps architectural units to instruments; how the instrument managers control
instruments; how data collectors visit instruments to obtain data; and how events should be
directed to interested instruments. The information is obtained from method-level
instrumentation contained in the Choices operating system. Queries are formed automatically
via the GUI, or can be entered manually (the grammar is given in Extended BNF notation
[Aho 1986]).
At the time of the original paper [Sefika 1996a], this appears to be the first work to consider
dynamic architecture-oriented visualization. Potential for future research is identified in
combining the system with a code refactory to automate design repair; in utilising the
instrumentation techniques in an optimising compiler; and integration of the system with
everyday programming tools such as debuggers and code browsers. More generally, Sefika
et al. expect the pervasive instrumentation technique to provide analysts with better user
interfaces and views, particularly 3D views exploiting virtual reality technology.
Unfortunately, this latter development does not appear to have materialised.
2.2.4.2 Evaluation
Sefika et al. [Sefika 1996a] present two case studies based on the Choices object-oriented
operating system [Campbell 1993], written in C++. One is related to identifying a system
bottleneck, and the other related to analysing subsystem cohesion and coupling.
The performance of the architecture-oriented tool is compared with that of traditional “flat”
method-level instrumentation tools, and the improvements of architecture-oriented
visualisation over traditional visualisation are identified as follows. Firstly,
architecture-oriented instrumentation utilises knowledge of the software structure, enabling it to reduce
the volume of data that must be collected, lowering the overhead of dynamic analysis. This
reduction also decreases the volume of data sent to the visualiser, and hence the amount of
analysis needed to make the data comprehensible. The volume of data
collected is further reduced in the architecture-oriented approach as event sensors and
instruments are enabled according to the requirements of the query. The hierarchical
organisation of the instruments allows information about system structure to be obtained
quickly.
In terms of data generation, a graph comparing architecture-oriented instrumentation with
traditional instrumentation reveals that architecture-oriented instrumentation dramatically
reduces the trace size as the level of abstraction employed increases. In terms of
instrumentation overhead, two graphs illustrate clearly that architecture-oriented
instrumentation reduces the analysis overhead, as the volume of data and time required for
collection are decreased, and instruments are enabled depending on the components of the
current query.
Sefika et al. note that architecture-oriented instrumentation is not entirely without cost.
While traditional instrumentation increased the size of the Choices executable by 7.8%,
architecture-oriented instrumentation caused an increase in size of 14.3% in the worst case.
This difference is due to the addition of code for query processing and instrument
management required by architecture-oriented instrumentation. Sefika et al. feel that this is
an acceptable trade-off, given the benefits of architecture-oriented instrumentation and
falling memory prices, and the fact that most of the architecture-oriented instruments in the
code will be unused until explicitly required.
2.2.4.3 Comparison
Jerding and Rugaber [Jerding 1997] note that the goals of Sefika et al.’s approach are similar
to those of ISVis in visualising a system from a variety of architectural levels. However, they
point out that some of the views described by Sefika et al. are tightly coupled to the subject
system domain, rather than being generally applicable to software architectures. They
speculate that this could be because the tool was applied only to an operating system.
Walker et al. [Walker 1998] state that the higher-level visualisations provided by Sefika et
al.’s tool are an improvement over earlier techniques in analysing component interactions in
large systems. However, Walker et al. also explain that the tool is not as flexible as it could
be. While an online approach provides a connection between system execution speed and the
speed shown in the visualisation, this places restrictions on the analyses that can be carried
out, as discussed in Section 2.1.5. The approach taken by Sefika et al. of using predefined
abstraction types built into the tool, while gathering dynamic information effectively,
reduces the flexibility of the technique by making it more difficult for the analyst to adapt it
to a different system. The reflexion model technique described by Walker et al. can
conveniently be applied to a variety of systems, partly due to the decoupling of the data
collection and visualisation components.
Richner and Ducasse [Richner 1999] note that while Sefika et al.’s tool is one of the few
tools to support architectural-level visualisations, the approach taken requires application-
specific instrumentation, unlike Gaudi.
Systä et al. [Systä 2001] observe that Sefika et al.’s tool requires the analyst to select the
abstraction level and views to be produced before running the software system to be
analysed. Shimba is more flexible: it does not have this requirement, and provides a variety
of techniques to allow the analyst to construct abstractions from the low-level views
produced.
2.2.4.4 Assessment
The architectural-level visualisations produced by Sefika et al.’s tool suggest that it would
perform well in general software comprehension tasks. If appropriate views and abstraction
level were selected, it could also be useful in specific reverse engineering tasks. However,
the evaluation presented by Sefika et al. was in the context of an operating system only, and
it remains to be seen whether the technique will perform well when visualising other types of
software.
2.2.5 ISVis (level 4)
2.2.5.1 Description
Jerding and Rugaber [Jerding 1997] describe a tool called ISVis (Interaction Scenario
Visualiser) for identifying software system architecture. Static information is extracted from
files generated by the Solaris C/C++ compiler. An instrumentor then combines this static
information, the source code, and information from the analyst about what to instrument, and
generates instrumented source code. This code is compiled, executed according to the
desired usage pattern, and event traces are produced. The ISVis trace analyser then converts
this information into a set of scenarios and involved actors that are stored in a program
model. The user then queries views of this program model. ISVis has a Main View and a
Scenario View. The Main View lists the actors in the program model, including user-defined
components, files, classes, and functions, and the scenarios and interactions in the program
model. A Scenario View can be opened for any scenario in the model, which takes the form
of a Temporal Message Flow Diagram (TMFD; also called an interaction diagram, message
sequence chart, or event-trace diagram). A global overview is shown using an Information
Mural [Jerding 1995]; this allows the analyst to identify repeated patterns in the execution
visually. An option allows actors to be grouped by containing file, class, or component
actors. Another option allows the user to select an interaction or class of interactions and
define them as a scenario, which can then be abstracted out and replaced in the diagram by a
reference to the scenario. Interaction patterns can also be identified by a technique similar to
regular expression matching. Jerding and Rugaber compare the interaction patterns of ISVis
to design patterns [Coplien 1995, Gamma 1995], stating that interaction patterns are a result
of the implementation of design patterns, and constitute low-level evidence of their
existence.
The relationship between the two views and the program model is an implementation of the
Observer design pattern [Gamma 1995 pp. 293-303], and an example of the Model-View-
Controller (MVC) architecture used in languages such as Smalltalk [Krasner 1988]. The
Observer design pattern defines a one-to-many relationship between objects, such that when
one object changes state all its dependent objects are notified and updated automatically. For
example, objects representing different views of the same data, e.g. a pie chart, a bar chart,
and a spreadsheet, could be registered to observe the data source and hence be updated
automatically when the data source changed. The Observer pattern allows consistency
between cooperating objects, without making them tightly coupled which would reduce their
reusability. ISVis allows the analyst to save the event traces and program model for future
analysis. The process of reading in the trace, creating the program model, creating scenarios
and architectural models, and viewing the results is iterative, with each analysis building on
the results from the previous analysis. Analysis sessions can be loaded and saved. ISVis can
simultaneously analyse a number of traces from one system.
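The Observer relationship between the program model and its views can be sketched as follows. This is a generic illustration of the pattern as described above, not ISVis's actual (C++) implementation, and the class names are simplified stand-ins.

```python
class ProgramModel:
    """Subject: holds scenarios and notifies registered views on change."""
    def __init__(self):
        self._observers = []
        self.scenarios = []

    def attach(self, observer):
        self._observers.append(observer)

    def add_scenario(self, scenario):
        self.scenarios.append(scenario)
        # Every dependent view is notified and updated automatically.
        for observer in self._observers:
            observer.update(self)

class MainView:
    """Observer: keeps its displayed list consistent with the model,
    without the model knowing anything about view internals."""
    def __init__(self):
        self.displayed = []

    def update(self, model):
        self.displayed = list(model.scenarios)

model = ProgramModel()
view = MainView()
model.attach(view)
model.add_scenario("startup sequence")
```

The model depends only on the `update` interface, which is what keeps the cooperating objects consistent without tight coupling.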
Jerding and Rugaber identify two future improvements to ISVis: suggesting patterns to the
analyst more effectively, and importing/exporting components from/to other tools. Future work
is to include interoperation of ISVis with the Balboa machine-learning finite state machine
generation tool [Cook 1995], and with the SAAMTool architectural analysis tool [Kazman
1994].
2.2.5.2 Evaluation
ISVis is applied to a case study involving adding functionality to the Mosaic web browser
[NCSA 2003]. Jerding and Rugaber term the problem of finding where in a system to insert
an enhancement “architectural localization”. The high-level process of architectural
localization during the case study consisted of producing scenarios, removing interactions
that do not pertain to the functionality being localised, using the information mural to browse
the scenarios and identify patterns, using pattern matching to find scenarios similar to those
already identified, then relating this behaviour to the source code.
The principal strength of ISVis was reported to be its support of the abstraction process by
means of interaction patterns. This frees the analyst from computationally intensive work
and allows them to identify semantically those patterns that are relevant to the task at hand.
This allows the analyst to perform inferences manually that would not be considered by a
wholly automated approach. The authors emphasise the importance of appropriate usage
scenarios being chosen, as these have a direct effect on the analyst’s ability to identify
patterns. The problem of selecting a suitably representative trace is a key concept in dynamic
analysis, as discussed above in Section 2.1.2 and Section 2.1.3.
A reported weakness is the complexity of the user interface, which is attributed to its rich
feature set. The importance of scalability is emphasised, as architectural visualisation is only
useful if the system is large enough to benefit from such analysis. It is reported that the
information mural was effective at compressing the large volume of data.
2.2.5.3 Comparison
Unlike Ovation [De Pauw 1998] and Jinsight [De Pauw 2002], ISVis does not automatically
identify patterns of repeated execution; ISVis requires the analyst to identify such patterns.
Richner and Ducasse [Richner 1999] believe that Gaudi complements ISVis in that while
both tools acknowledge that higher-level views are required for architectural understanding,
ISVis concentrates on pattern detection, while Gaudi allows the analyst to specify the type of
view used.
Systä et al. [Systä 2001] observe that source files are the lowest level of granularity that can
be excluded from the trace in ISVis, while Shimba allows individual classes and methods to
be excluded. Shimba also allows more flexible construction of abstractions. However,
Shimba only allows pattern searching using exact string matches, and patterns must be
contained within a single sequence diagram.
2.2.5.4 Assessment
As with Sefika’s Architecture-Oriented Visualization tool [Sefika 1996a], the architectural-
level visualisations used in ISVis would appear to lend themselves well to general software
comprehension. The tool may also be useful in specific reverse engineering tasks, depending
on the level of abstraction required for the task.
2.2.6 Dali (level 4)
2.2.6.1 Description
Kazman and Carrière [Kazman 1998, Kazman 1999] describe a tool called the Dali
Workbench, which is designed to help with the extraction of program architecture. It is
designed as a lightweight, flexible tool that integrates other tools, the argument being that no
single tool is adequate for architectural extraction. Kazman and Carrière [Kazman 1999]
argue that software architecture is a “shared hallucination” – it exists from the various points
of view of people involved with the software. It is thus argued further that a human element
is essential in the process of architectural extraction. The goal of Dali is to assist the analyst
in the analysis of software architecture. This implies a need for the reconstruction of
architectural representations of the system. Kazman and Carrière list the main contributions
of Dali as its use of a central data repository to integrate system information, its use of a
common language (SQL) to enable the combination of views and user-defined pattern
matching, and its assessment of such patterns as a metric for architectural conformance.
Four iteratively applied techniques are involved in the process of reconstructing software
architecture using Dali. Firstly, static information from source artefacts, such as the program
code, and dynamic information from the output of profilers or coverage tools is used to
create extracted views of the system. These views represent the implemented architecture of
the system. Secondly, the extracted views are combined to produce fused views giving a
more complete representation of the architecture. Thirdly, the analyst defines a number of
architectural patterns that represent his understanding of the implemented architecture, which
are used to create refined views. Fourthly, the refined views are visualised to allow the
analyst to compare the implemented architecture to the designed architecture.
The extraction component of Dali extracts information using tools such as lexical analysis,
parsing, and profiling tools, then combines this information. This information is stored in a
central repository (a relational database). The contents of the repository can be visualised
and manipulated, and analyses can be performed on them. The various tools that are used
with the Dali Workbench are not fixed in its specification, but examples include the
following tools: Lightweight Source Model Extraction (LSME) [Murphy 1996b] for
extraction of static information; gprof for extraction of dynamic information; PostgreSQL
(based on POSTGRES [Stonebraker 1990]) as the relational database; SQL for view fusion
and architectural pattern definition; and RMTool [Murphy 1995] for analysis.
Fusing views in Dali means defining connections between them. The fused views in Dali are
concerned with providing complementary information from multiple views, navigating
between views, and improving the accuracy of a view with information from another view.
When combining views the fusion process must reconcile the information extracted using
different, complementary techniques. For example, Kazman and Carrière point out that both
their static and dynamic extractors provide information on function calls, the former listing
potential calls and the latter actual calls. A naïve union of these two sets of
information would lead to inconsistencies, so it is necessary to reconcile the elements in
these views. Statically extracted class inheritance information can be added to disambiguate
calls to sub/super classes.
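The reconciliation of potential and actual calls might look like this in outline. It is a sketch of the idea only, not Dali's SQL-based fusion, and the call edges are hypothetical.

```python
def fuse_call_views(static_calls, dynamic_calls):
    """Fuse statically extracted (potential) and dynamically observed
    (actual) call edges into one annotated view. A plain union would
    lose the distinction between the two sources."""
    fused = {}
    for edge in static_calls | dynamic_calls:
        fused[edge] = "actual" if edge in dynamic_calls else "potential-only"
    return fused

# Hypothetical call edges as (caller, callee) pairs.
static_calls = {("A::f", "B::g"), ("A::f", "C::g")}   # what the parser sees
dynamic_calls = {("A::f", "B::g")}                    # what actually executed
view = fuse_call_views(static_calls, dynamic_calls)
```

In the real tool further information, such as statically extracted inheritance relationships, would be consulted before an edge could be classified, e.g. to disambiguate calls to sub- and superclasses.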
Kazman and Carrière state that the intention was not to provide an ultimate solution, but to
develop an extensible environment for tool integration. Future research includes extending
the scope of Dali to analyse other languages and larger systems (e.g. legacy COBOL
systems) – at the time of the original work [Kazman 1999], Dali had been used on systems
up to 200 KLOC (thousand lines of code) in C, C++, Objective C, and Fortran – there is
evidence of such extension in work by O’Brien [O’Brien 2002]. There is also the possibility
of integrating other tools, such as to enable the import and export of architecture
representations in ACME [Garlan 1997] or UniCon [Shaw 1995]. It is also hoped to improve
user interaction, with the addition of a history/undo feature in the short term, and the ability
for the user to manipulate the architecture directly and have the system infer appropriate
architectural rules. Finally, Dali could be used to guide architectural evolution, e.g. in
determining how difficult it would be to change the connection mechanisms of an
architecture; this could be useful in web-enabling legacy systems or distributing them via
CORBA.
2.2.6.2 Evaluation
Kazman and Carrière [Kazman 1999] describe the application of Dali to two C++ systems:
VANISH [Kazman 1996], which has a well-designed architecture, and UCMEdit [Buhr
1996], which has no designed architecture. The study describes the stages of extracting the
information, forming “fused views”, then applying patterns (expressed as SQL queries) to
simplify the resultant visualisation (application-independent patterns, common application
patterns, then application-specific patterns). The analyst carrying out the architectural
extraction would appear to need to be either a very skilled software engineer or intimately
familiar with the system under investigation. The sorts of manipulation that are
carried out involve, for example, the grouping of methods and variables into their associated
classes, and the grouping of functions and header files into their associated classes. The case
studies extracted the as-implemented architecture from both systems, but, as would be
expected, found the VANISH architecture much more useful. The analysts were also able to
identify some architectural exceptions and points for improvement in the VANISH
architecture using the extracted model. Kazman and Carrière note that a good architecture is
characterised by functional consistency.
O’Brien [O’Brien 2002] describes three case studies in which Dali was employed in an
industrial architecture reconstruction project at Nokia. The system involved in the first case
study was a network management system consisting of 500 KLOC of C; the goal was to
understand how the system could be improved. The second case study concerned another
network management system consisting of 100 KLOC of Java; the goal was to understand
the system and determine whether it could be reused. The third case study involved a mobile
phone system consisting of 1 MLOC (million lines of code) of C++; the goals were to
examine the way in which this application was integrated with the operating system, and to
determine whether a specific component could be extracted and reused. O’Brien reports that
the architecture reconstruction efforts were successful in each of these contexts with their
various goals, and that the architects found the Dali views to be useful. However, a difficulty
was identified concerning the static analysis of the C and C++ systems. It was found that
identifier names extracted from the source code were often not unique, and could not be
discriminated between without compiling and linking. O’Brien concludes that architecture
reconstruction requires tool support, and that such tools are available. However, research is
required to improve the reconstruction process and the tools that support it.
2.2.6.3 Comparison
Systä et al. [Systä 2001] observe that Dali uses a single merged view to represent both static
and dynamic information about the software system, whereas Shimba uses separate, linked
views to separate static and dynamic information.
2.2.6.4 Assessment
As with the other architecture-level visualisation tools in this report, Dali would appear to be
well-suited to general software comprehension tasks. The intended role of Dali as an
architectural extractor may make it less suitable for specific reverse engineering tasks.
However, performance in either type of task will depend on the ease with which appropriate
architectural patterns can be identified and useful architectural views built.
2.2.7 Ovation (level 2)
2.2.7.1 Description
De Pauw et al. [De Pauw 1998] describe a tool for visualising programs using an execution
pattern view, which is a variation of Jacobson’s interaction diagrams [Jacobson 1992]. The
technique is based on that used in Ovation [De Pauw 1993, De Pauw 1994], and has since
been implemented in Jinsight [De Pauw 2002]. De Pauw et al. [De Pauw 1998] recognise the
inherent information overload problem, noting that both statically complex and small,
repetitive programs can produce huge traces. They state that dynamic execution trace data
can be comprehended if it is summarised into distinct, abstract portions and detail is
provided to the analyst on demand, and if patterns in the trace can be detected and
generalised. The execution pattern view achieves these requirements by allowing the analyst
to examine program execution at various levels of detail, with information supplied only on
demand, and by extracting and visualising generalised patterns in the trace. Ovation can
visualise C++ or Java programs using traces generated from the VisualAge development
environment [IBM 2004a], and Smalltalk programs through instrumentation added to the
Little Smalltalk [Budd 1987] and VisualAge Smalltalk [IBM 2003] environments.
De Pauw et al. observe that, while interaction diagrams are an improvement on directed
graphs for illustrating program interactions, they do not scale up well to larger execution
traces. The execution pattern view instead uses a tree structure, emphasising the progression
of time, rather than control structure. Colour is used to indicate the class of an object, and a
unique object ID appears in each object box. In the execution pattern views, horizontal space
is mapped to the call sequence, not the object population, and vertical space is also used
more efficiently. The view can be explored by searching for execution patterns based on a
number of criteria, such as the involvement of a specific class, object, or method. Subtrees
can be collapsed and expanded, allowing the user to “drill down” to focus on interactions of
interest while excluding extraneous detail. The context of the view can also be changed by
moving up or down the call hierarchy. Filtered expansion is also possible, for example by
expanding only those nodes in the tree that lead to a certain type of object. The system can
detect repetition automatically, either in the form of iteration (shown vertically) or recursion
(shown horizontally). Zooming and panning the view is also supported. Flattening can be
used to limit the horizontal depth by collapsing only the receiver of the message.
Underlaying saves horizontal space by hiding all the messages sent by the underlaying class
and displaying call recipient objects on top of the object that initiated the call. These
techniques allow the analyst to navigate the execution one step at a time. A number of
alternative charts for representing subtrees are available, including class legends and class
communication graphs. Other possible charts could include a CPU time meter, a call matrix,
or an instance histogram [De Pauw 1993]. To aid comprehension, “flyovers” and zooming
(without scaling method names) are supported.
Ovation supports generalised (i.e. non-identical occurrences) pattern matching for detecting
patterns of similar execution. The generalisation criteria for pattern matching implemented
in Ovation are those that De Pauw et al. report that programmers found most useful: object
identity, class identity, message structure, depth-limiting, repetition, polymorphism,
associativity, and commutativity. To implement this generalisation, the tool assigns a hash
value to each subtree of the execution tree. The subtree hash code is formed from the hash
codes of the subtree’s children and values in the subtree’s root. The values used to form the
hash code depend on the matching criterion specified, e.g. method names (method and class
names would be used) or class names (class names would be used). The hash values are
stored in a pattern dictionary, which records summary statistics for each entry (e.g.
frequency of this pattern). De Pauw et al. argue that the execution pattern view bridges the
gap between microscopic and macroscopic visualisation representations by providing a view
of the entire trace, with more detail available on demand.
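The subtree-keying scheme might be sketched as follows. This is a simplified reading: only the method-name and class-name criteria are shown, and hashable Python tuples stand in for the hash values that Ovation computes.

```python
from collections import Counter

def subtree_key(node, criterion):
    """Build a hashable key for an execution subtree from the keys of
    its children and selected values in its root; which root values
    contribute depends on the matching criterion."""
    cls, method, children = node
    root = (cls, method) if criterion == "method" else (cls,)
    return (root, tuple(subtree_key(child, criterion) for child in children))

def pattern_dictionary(subtrees, criterion):
    """Record how often each generalised pattern occurs in the trace."""
    return Counter(subtree_key(tree, criterion) for tree in subtrees)

# Two call subtrees that differ only in the method invoked on List.
t1 = ("List", "add",    [("Node", "link", [])])
t2 = ("List", "remove", [("Node", "link", [])])
by_method = pattern_dictionary([t1, t2], criterion="method")  # two distinct patterns
by_class  = pattern_dictionary([t1, t2], criterion="class")   # one generalised pattern
```

Loosening the criterion from method names to class names merges the two subtrees into a single pattern with frequency two, which is the kind of summary statistic the pattern dictionary records.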
De Pauw et al. conclude that execution patterns have three key benefits for object-oriented
visualisation. Firstly, they provide a convenient representation of object-oriented
communication. Secondly, similar execution patterns can be generalised. Thirdly, execution
patterns can help in the assessment of system complexity (for example through metrics such
as pattern redundancy). Future work is to include improving the flexibility of the pattern
matching, visual grammars, and reporting of qualitative results.
2.2.7.2 Evaluation
De Pauw et al. [De Pauw 1998] report that the system proved helpful for discovering
unexpected behaviour, comprehension of unfamiliar code, and performance improvement in
both medium-sized systems (such as Ovation itself) and large systems (such as Taligent).
2.2.7.3 Comparison
Walker et al. [Walker 1998] believe that the analyst requires a detailed knowledge of the
system under investigation in order to compose appropriate queries for Ovation.
Koskimies and Mössenböck [Koskimies 1996] believe that the techniques employed in
Scene and Ovation are complementary. They observe that Ovation compresses the extracted
execution trace into statistical information, while Scene retains the trace. The variations in
the two approaches are due to their different intended applications. Scene aims to visualise
method calls and returns, whereas Ovation aims to characterise and illustrate programs using
dynamic statistics. The call summary in Scene is an example of such statistical information,
and was inspired by earlier work by De Pauw et al. [De Pauw 1994].
2.2.7.4 Assessment
As with the other method-level visualisation tools, it would be expected that Ovation would
perform better in a specific reverse engineering task, where the area of application would be
more focussed than in a general software comprehension task. However, the summary views
of Ovation may be useful in this latter context.
2.2.8 Reflexion models (level 4)
2.2.8.1 Description
2.2.8.1.1 AVID
Walker et al. [Walker 1998] describe an approach for producing architectural-level
visualisations of behaviour. The approach derives its abstractions from the number of objects
in the program trace, and the communications between these objects. The tool uses a
sequence of cels to represent the information collected during the system’s execution. Each
cel constitutes an abstraction of dynamic information about the system at that point, and
about the execution until that point. The approach is intended to complement and extend
existing techniques for analysing dynamic information. The benefits of the approach are that
it enables the analysis of a system without changing the source code, allows the user to
manipulate the abstraction, provides an offline visualisation that is independent of the
execution speed of the target system, and allows the analyst to navigate both forwards and
backwards through the visualisation. The tool was originally implemented in Smalltalk for
the analysis of Smalltalk programs, and has since been implemented in Java for the analysis
of Java programs and named AVID (Architectural Visualization of Dynamics in Java
Systems).
The tool has two main views. One view displays a series of cels showing the events that
occurred during the program execution. The other view is a summary view showing cels
representing an aggregate of the whole execution. The execution can be viewed in an
animated form in the first view, and the user can step both forwards and backwards through
the execution. Each cel consists of: a box that represents a set of objects in the high-level
model defined by the analyst; a directed hyperarc passing between and through a number of
boxes; a set of directed arcs between pairs of boxes, representing method calls; a histogram
representing the age and garbage collection status of the objects associated with the box;
annotations and bars within boxes; and annotations on each directed arc. The hyperarc
represents the call stack at the end of the interval displayed. The summary view is equivalent
to the final cel of the animated view. Additionally, it displays two histograms for each box:
one showing the pattern of object allocation for the entire execution, and the other the age of
garbage-collected objects. Although only one view can be displayed at a time, the offline
nature of the tool allows multiple instances to be run simultaneously on the same execution
trace. The animation controls allow the user to “play” the trace, step back and forward
through it, and set the step (number of cels between steps) and interval (number of events
represented per cel) size. Clicking an arc, hyperarc, or histogram in either view pops up a
text box giving more information on the selection. Walker et al. note that it would be
possible to link the tool with a textual code browser, and have the browser jump to the
relevant position in the source code when an item in the text box popup is selected.
Constructing a visualisation in AVID is a four-stage process. Firstly, execution data is
extracted from the system under analysis and stored to disk. Secondly, the analyst produces a
high-level model of the system using abstract entities designed to emphasise the architectural
properties that he is investigating. Thirdly, the analyst defines a mapping from the abstract
entities to the extracted dynamic information. The tool then applies this mapping to the
extracted information to produce the visualisation. Finally, the analyst examines the
visualisation to investigate the system’s dynamic behaviour. This offline, multi-stage process
increases the tool’s usability by allowing iterations over the latter stages of the process –
there is no need to re-run the program to collect the dynamic information again. This process
is based on the concept of reflexion models introduced by Murphy et al. [Murphy 1995].
The tool collects information for every method call, object creation, and object deletion,
which consists of the class of the calling (or creating) object, and either the method being
called and the class of the object containing it, or the class of the object being created or
deleted. The tool was originally implemented in Smalltalk and the dynamic information is
collected by instrumenting the Smalltalk VM. A map relates dynamic system entities (e.g.
objects or methods) to abstract ones (e.g. a box in the visualisation). The mapping process is
achieved by use of regular expressions. The map consists of a set of entries, each with three
parts: the name of the level of the Smalltalk structural hierarchy being mapped (i.e.
application, subapplication, category, class, or method); a regular expression defining the set
of names to be mapped for that level; and the name of the abstract entity to which the system
entities represented by these names should be mapped.
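A map of this three-part form might be sketched as follows. The hierarchy levels, name patterns, and abstract entity names here are invented for illustration, not taken from AVID.

```python
import re

# Each entry: (hierarchy level, name regular expression, abstract entity).
ENTITY_MAP = [
    ("category", r"^Networking-.*", "Network box"),
    ("class",    r"^Socket.*",      "Network box"),
    ("class",    r"^.*View$",       "UI box"),
]

def map_to_abstract(level, name, entries=ENTITY_MAP):
    """Return the abstract entity (box) for a concrete system entity,
    or None if it is unmapped and therefore absent from the view."""
    for entry_level, pattern, box in entries:
        if entry_level == level and re.match(pattern, name):
            return box
    return None
```

Applying such a map to every recorded event is what turns the raw trace into the boxes and arcs of the high-level visualisation, and refining the map requires no re-execution of the program.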
As discussed previously, the separation of visualisation from system execution by using an
off-line approach has two benefits. Firstly, pre-processing can be performed prior to
visualisation, e.g. to generate summary information for the entire execution. Secondly, it
allows the trace to be replayed from an arbitrary point without having to re-run the
execution. Concerning navigation, a further advantage of the off-line approach is that the
user can play, step back and forward through, and access randomly any part of the execution.
Although no information on execution time is built into the representations, Walker et al.
note that this could be desirable. An object is identified by a description of the call stack that
exists when the object is created.
An area for further research is the possibility of allowing objects’ mappings to change, to
allow them to “migrate” between abstraction units. Walker et al. recognise the difficulties
concerning the huge volume of data generated by tracing and believe that the flexibility and
usability of the tool are limited by the use of trace information and that the use of sampled
information could partially resolve such limitations.
2.2.8.1.2 RMTool
Murphy et al. [Murphy 2001] discuss a technique to extract a model of a system that is
“good enough” to be used for a specified task. The reflexion model technique involves
comparing a high-level model (produced by the analyst) of a system with the actual
implemented model. The analyst defines a mapping (using regular expressions) between the
source code constructs (e.g. file names, class names, function names, etc.) and his high-level
model. The RMTool (Reflexion Model Tool) system compares the two models and produces
a diagram containing the modules from the analyst’s model with three types of arc
connecting them: convergences (communications that agree with the analyst’s model),
divergences (communications that did not appear in the analyst’s model, but do appear in the
extracted model), and absences (communications that appear in the analyst’s model but not
in the extracted model). Not all source code constructs need be mapped to a high-level
equivalent – partial and approximate models are allowed. The process is designed to be
iterative – the mapping can be refined as the task proceeds. Murphy et al. give the key
characteristics of the technique as being that it is “lightweight”, requiring low effort and a
timeframe of hours not days; “approximate”, using a variety of source models and refining
the mapping as the analysis proceeds; and “scalable”, capable of analysing various languages
and systems from several to over 1000 KLOC. The procedure is as follows: the analyst
specifies his model; he then uses a third-party tool to extract structural information from the
system (via static or dynamic analysis); he then defines the mapping between this source
model and his high-level model; the analyst uses a tool to compute the reflexion model; and
finally he investigates the reflexion model via a GUI.
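The core reflexion model computation can be sketched as a set comparison. This is a minimal reconstruction of the idea described above, not RMTool's implementation; the entity names and mapping entries are invented for the example.

```python
import re

def compute_reflexion(hl_edges, src_edges, mapping):
    """Compare a high-level model against an extracted source model.

    hl_edges:  set of (module, module) edges the analyst expects.
    src_edges: set of (entity, entity) edges extracted from the system.
    mapping:   list of (regex, module) pairs lifting source entities to
               high-level modules; unmapped entities are ignored, mirroring
               the technique's support for partial models.
    """
    def lift(entity):
        for pattern, module in mapping:
            if re.fullmatch(pattern, entity):
                return module
        return None

    lifted = {(lift(a), lift(b)) for a, b in src_edges}
    lifted = {e for e in lifted if None not in e}

    convergences = hl_edges & lifted   # agree with the analyst's model
    divergences  = lifted - hl_edges   # in the system, not the model
    absences     = hl_edges - lifted   # in the model, not the system
    return convergences, divergences, absences
```

Refining the mapping and re-running this computation is the iterative loop the technique prescribes.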
A formal Z specification of the technique for producing the reflexion models is given by
Murphy et al. Optimisations were applied to reduce the computation time to acceptable
levels (55 seconds for the 1000 KLOC MS Excel application). Murphy et al. discuss the
similarities and differences between their tool and consistency checkers, reverse engineering
tools, knowledge-based approaches, and model comparison techniques. Future work is to
include use of the tool to produce documentation on demand for a specific task.
2.2.8.2 Evaluation
2.2.8.2.1 AVID
A qualitative evaluation was obtained through two case studies involving performance-
tuning tasks on Smalltalk programs, each involving an expert and a non-expert Smalltalk
developer. The expert participant found the summary view and animated hyperarc useful, but
felt that the tool lacked integration with a traditional code browser and the ability to view a
detailed stack dump as in a Smalltalk debugger. The tool was designed to allow the
integration of a code browser, but seeks to complement existing techniques, so does not seek
to replace a debugger by incorporating one. The non-expert found the garbage collection
histograms useful, along with the pop-up’s correlation between abstract information and
method/object names, but desired different displays of information, feeling that one screen
was “too cluttered”.
2.2.8.2.2 RMTool
Murphy et al. [Murphy 2001] discuss the tool in the context of NetBSD (written in C), and a
number of case studies are discussed, including Microsoft Excel (C), the SPIN OS (Modula-
3 and C), and a restructuring tool (C++); the tool appeared to help with all of them.
The MS Excel case study is described in more detail by Murphy and Notkin [Murphy 1997].
The Excel application consists of 1.2 MLOC of C. The goal of the reengineering task was to
identify and extract components from the application source code. To achieve this, an
understanding of the structure of the application was required. Specifically, the analyst
needed to gain an understanding of how the source code was divided into static modules, and
how the modules communicated at runtime. The analyst reported that the reflexion model
technique had assisted him in refining an architectural view of the application, and in
investigating the correspondence between that view and the source code. Additionally, the
reflexion model helped the analyst with his overall understanding of the application, and
highlighted aspects that were not apparent from the initial high-level model or the source
code. The analyst also reported that it was straightforward to focus the investigation on the
relevant parts of the system and exclude extraneous detail. Murphy and Notkin assert that
this case study demonstrates that the reflexion model technique has useful practical applications for
the following reasons. Firstly, the analyst elected to use the reflexion model technique even
with the constraints of an industrial setting. Secondly, the analyst continued to use the
technique for future revisions of the application outwith the case study period. Thirdly, the
analyst believed that the reengineering task could have been completed sooner had the
reflexion model technique been employed earlier. Murphy and Notkin attribute much of the
success of the technique to its support for approximation in the form of unrefined areas of
the model. They believe that the results of this case study can be generalised to similar
reengineering efforts, as the application was written in a commonly-used language (C), the
source code had evolved over time with multiple developers, and the task of identifying and
extracting components from an existing system is a common one.
2.2.8.3 Comparison
2.2.8.3.1 AVID
Richner and Ducasse [Richner 1999] believe that the Gaudi technique complements that of
Walker et al. [Walker 1998] in recognising that object-level tracing information is too low-
level to assist in architectural understanding of a system. While the approach of Walker et al.
appears to be targeted to performance evaluation, Gaudi aims to allow the analyst to specify
the view that most suits his analysis.
Systä et al. [Systä 2001] observe that the mapping between low-level system artefacts and
high-level components of the analyst’s model in Walker et al.’s approach is constructed
manually using a declarative mapping language. Shimba presents static and dynamic
information in separate views, and Rigi is used to build high-level static components. The
analyst can then construct high-level sequence diagrams by mapping low-level artefacts to
high-level components.
2.2.8.3.2 RMTool
Richner and Ducasse [Richner 1999] note the similarity of their process with that of Murphy
and Notkin [Murphy 1997], in that it allows the analyst to navigate their investigation
through an iterative process. Another similarity is that Richner and Ducasse also expect the
engineer to produce a high-level model of the system under analysis.
Murphy et al. [Murphy 2001] present the idea of combining models from different extractors
as a simple case of set union, which is in contrast to the production of fused views in Dali
described by Kazman and Carrière [Kazman 1997]. A possible disadvantage of the reflexion
model technique is that the analyst needs to start with a model – the system gives no help if
the model is very inaccurate. It must be considered whether or not it would always be
acceptably straightforward to produce a sufficiently accurate model. The technique appears
to require either an understanding of the system under investigation, or an experienced
analyst. The effort involved in producing the mapping for a large system would appear to be
considerable, even if it were produced iteratively (e.g. 1,425 map entries for Excel). The
system appears to be reliant on conventions (e.g. directory or class structure) in the source
code for producing its models; although Murphy et al. note that this was not a problem in
their case studies, if the source code is disorganised the model produced may be of little
value.
2.2.8.4 Assessment
The high-level architectural views produced by these tools would be expected to be useful in
general software comprehension tasks, provided appropriate high-level models of the target
system could be constructed. The reflexion model approach may be less successful with
specific reverse engineering tasks, depending on the level of abstraction required.
Specifically, tasks at a low level of abstraction, such as those concerned with intra-object
behaviour, would be too detailed for the information presented in a reflexion model to
address.
2.2.9 Gaudi (levels 3-4)
2.2.9.1 Description
Richner and Ducasse [Richner 1999] describe a technique for extracting application
visualisations from Smalltalk programs using a combination of static and dynamic
information. A set of Prolog facts defines the basic static (e.g. superclass-subclass) and
dynamic (e.g. message send) relations between elements. Derived relations can be produced
from these, such as overrides (static) and sendsCreate (dynamic). Views are defined by
describing a set of components and the connectors between them. Prolog rules are used to
define a clustering of components (C), and a relation (R). The diagrams contain ovals
representing components, and directed arcs representing communications between those
components. Methods can also be grouped by class.
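Gaudi expresses these facts and rules in Prolog; the same idea can be sketched in Python using sets of tuples. The facts, class names, and clustering below are invented for illustration and do not reproduce Gaudi's actual vocabulary.

```python
# Base facts extracted from the system (illustrative, not Gaudi's actual
# Prolog predicates): static superclass relations and dynamic message sends.
superclass = {("Figure", "CompositeFigure"), ("Tool", "SelectionTool")}
sends      = {("SelectionTool", "CompositeFigure", "add:")}

# A derived relation, analogous to a Prolog rule over the base facts:
# invokes(C1, C2) holds if some instance of C1 sends any message to C2.
invokes = {(sender, receiver) for (sender, receiver, _msg) in sends}

# A clustering C groups classes into components; the connectors of a view
# are the relation R lifted through the clustering.
cluster = {"Figure": "Figures", "CompositeFigure": "Figures",
           "Tool": "Tools", "SelectionTool": "Tools"}
connectors = {(cluster[a], cluster[b]) for (a, b) in invokes}
```

The resulting `connectors` set corresponds to the directed arcs between component ovals in the generated diagram.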
The static information is extracted by parsing the code using the MOOSE tool [Ducasse
2000] and representing it in the FAMIX model [Tichelaar 1998]. The dynamic information is
collected by instrumenting the application with Method Wrappers [Brant 1998], and stored
as Prolog facts. Prolog queries are used to build the abstractions. The Gaudi tool was used to
create the views, which were then displayed using the dot tool [Koutsofios 1996b]. Richner
and Ducasse note that the approach could be adapted easily to Java or C++, but that it does
not presently support concurrency.
Richner and Ducasse give the weaknesses of the approach as follows. Obtaining dynamic
information requires an executable, instrumentable system – Gaudi is therefore not suitable
for sections of partially constructed systems, or other unexecutable code. They also note the
problem of scalability, and give possible solutions as instrumenting only some
methods/classes, feedback from query results to instrumentation so that only relevant
methods are instrumented, appropriate scenario choice, and pre-analysis trace filtering.
Richner and Ducasse give the strengths of Gaudi as flexibility in the kinds of views that can
be recovered by allowing the analyst to define relations and clusterings, and in the questions
that can be answered through its use of both static and dynamic information.
Future work includes determining which views are most useful in reverse engineering, and
guidelines for the use of such tools in reverse engineering.
2.2.9.2 Evaluation
A case study of reverse engineering of Smalltalk HotDraw [Johnson 1992, Beck 1994]
demonstrates the technique. The case study proceeded as follows. A high level view was
created that shows all the relations between HotDraw classes, grouped by Smalltalk
category. Based on this information, a new clustering was then defined to give a different
view. A view was then created showing creation invocations, and one to show non-creation
invocations.
The clustering in Gaudi provided a number of views at different levels of granularity, while
the combination of static and dynamic information was reported to assist in focussing the
effort. The views produced helped the analyst to formulate questions about the interactions
in the system, and provided a comparison with his own mental model of the system.
2.2.9.3 Comparison
Systä et al. [Systä 2001] observe that the query-based approach of Gaudi allows the user to
tailor the views produced, which may contain either static or dynamic information, or a
combination of both, and exist at various levels of abstraction. The query-based approach
also allows the analyst to control the volume of information generated. However, unlike
Shimba, Gaudi does not support the direct exchange of information among views.
2.2.9.4 Assessment
In common with other architectural-level tools, Gaudi would be expected to perform best in
general software comprehension tasks. The varying levels of abstraction that can be
produced using its query-based approach may also allow it to perform well in specific
reverse engineering tasks.
2.2.10 Shimba (levels 2-4)
2.2.10.1 Description
Systä et al. [Systä 2001] describe the Shimba tool, which produces visualisations of Java
programs using both static and dynamic information. Shimba extracts static and dynamic
information from the Java bytecode of the system. It displays static information using
directed graphs (Rigi dependency graphs), and dynamic information using a variation of
UML sequence diagrams (SCED (Scenario Editor) sequence diagrams) from which
statecharts can be generated automatically. The principal contribution of this work is that
Shimba considers both static (structural) and dynamic (behavioural) information and
constructs separate diagrams for each, but maintains a relationship between the diagrams.
Most other tools consider either static structure or dynamic behaviour, or combine both into
a single diagram.
The dynamic information is extracted by running the target system under a customised Java
SDK debugger [Sun 2000], which automatically sets breakpoints in the code. Shimba
integrates the Rigi [Müller 1988, Müller 2001] (static) and SCED [Koskimies 1998]
(dynamic) tools to carry out both general program understanding and goal-driven reverse
engineering. Shimba (and, in a similar manner, Dali) demonstrates the possibility of
constructing software comprehension tools using pre-existing tools, rather than starting from
scratch. SCED sequence diagrams can be used to slice the static graphs produced by Rigi, to
enable visualisation of the part of the system that is responsible for a particular observed
behaviour. Rigi graphs can be used to guide the generation of SCED sequence diagrams to
observe the behaviour of a specific part of the system, and can also be used to raise the level
of abstraction of the SCED diagrams. Dynamic control flow information can also be added to
sequence diagrams, while the static graphs can be annotated with software metrics
[Chidamber 1994]. The event trace explosion problem is handled by applying behavioural
pattern matching algorithms [Boyer 1977] to the trace to extract repeated patterns. These
are then represented in the SCED sequence diagram using subscenario and repetition
constructs. The trace can be split (both automatically and by the user) into a number of
smaller traces to limit the size of the sequence diagrams produced.
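The effect of this pattern extraction can be illustrated with a much simpler stand-in. Shimba's matching detects arbitrary nested behavioural patterns; the sketch below only collapses immediately repeated events into repetition constructs, which is sufficient to show how a trace shrinks.

```python
def compress_repetitions(trace):
    """Collapse immediately repeated trace events into (event, count) pairs.

    A deliberately simplified stand-in for Shimba's behavioural pattern
    matching: real patterns may span several events and nest, which this
    sketch ignores.
    """
    out = []
    for event in trace:
        if out and out[-1][0] == event:
            out[-1] = (event, out[-1][1] + 1)
        else:
            out.append((event, 1))
    return out
```

A run of identical calls in the event trace would thus be rendered once with a repetition count, analogous to SCED's repetition construct.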
Systä et al. note that the techniques in Shimba are also applicable to forward engineering, to
check the implemented structure against design guidelines and the implemented behaviour
against use cases. Future work is planned to integrate the techniques of Shimba into the
Nokia TED UML modelling tool [Wikman 1998]. This will allow the usefulness of the
techniques in Shimba to be studied with real users, and will allow tighter integration than is
possible with current reverse engineering environments. Systä et al. comment that a reverse
engineering environment using various UML diagrams would be useful.
Further details on the use of Shimba in analysing metrics is given by Systä et al. [Systä
2000a]. Further information is available on the reverse engineering of Java software using
Shimba [Systä 2000b, Systä 2000c], using Rigi and SCED [Systä 1999a], and using SCED
[Systä 1999b, Systä 2000d].
2.2.10.2 Evaluation
A case study of the FUJABA system [Rockel 2000] illustrates the use of Shimba. The
combination of static and dynamic information was found to be particularly useful. Although
the string matching algorithms employed were able to detect numerous, nested patterns in
the trace, one of the most problematic aspects involved structuring the SCED sequence
diagrams using behavioural patterns. One problem related to the naming of subscenario
boxes, which is automatic and therefore not descriptive of the subscenario. Another problem
relating to subscenarios was that a pattern is defined based on its length and contains an
arbitrary sequence of SCED sequence diagram elements, which may not necessarily form a
logical unit within the context of the system under analysis.
Using static information to guide the generation of dynamic information was found to be
particularly useful for goal-driven reverse engineering tasks. This helps to prune
‘uninteresting’ information from the visualisation. The statechart synthesis functionality was
useful for analysing the dynamic behaviour and control flow of selected parts of the system.
The model slicing technique was used to determine the cause of certain behaviour, the
system structure that relates to this behaviour, and how elements of a SCED sequence
diagram relate to the rest of the system. Raising the level of abstraction of the SCED
sequence diagrams using static Rigi abstractions was also employed to understand
communication between high-level components, and to validate such static abstractions.
Further information on this case study is given by Systä [Systä 2000c].
2.2.10.3 Comparison
Unlike other tools that produce diagrams containing only static or dynamic information, or
combine both into one diagram, Shimba produces separate diagrams for static and dynamic
information and provides linkages between them. The pattern-matching functionality is
comparable to that employed in Ovation, allowing repeated behaviour to be factored out as a
subscenario in the visualisation. Shimba’s automatic statechart generation function is unique
– none of the other tools considered produce state-level representations of dynamic
behaviour.
2.2.10.4 Assessment
The sequence diagram, statechart, and dependency graph representations used in Shimba
should enable it to perform well in specific reverse engineering tasks. The ability to slice the
static dependency graphs using dynamic sequence diagrams, and to raise the level of
abstraction of a scenario diagram using high-level static abstractions should make Shimba
useful for general software comprehension tasks also.
2.2.11 Jinsight (levels 2-3)
2.2.11.1 Description
De Pauw et al. [De Pauw 2002] describe the Jinsight tool and its application to the visual
exploration of runtime information. Jinsight illustrates object population, thread activity, and
method calls in Java software. Jinsight includes a profiling agent that is used to produce an
execution trace from which visualisations are generated. Tracing can be enabled and disabled
during execution. Uninteresting classes and packages can be excluded from the visualisation.
Visualisations are presented in the form of interdependent views, each of which illustrates a
different facet of the software’s runtime behaviour.
One such view is the histogram view, which illustrates resource usage (CPU time and
memory space) for classes, objects, and methods. This view allows the analyst to identify
‘hot spots’1 of activity in the execution that could indicate a bottleneck. Each row in the
histogram corresponds to a class in the system. Symbols are coloured to represent activity on
that class, such as the time spent executing methods of the class, the number of calls made to
methods in the class, the amount of memory consumed by instances of the class, or the
number of threads in which instances of the class participate. Hollow rectangles represent
garbage-collected objects, which helps in identifying memory leaks. The lines in the
histogram view represent inter-object communication, and can be set to show method calls,
object creation, or references between objects. However, the combinatorial nature of
inter-object communications means that this aspect of the histogram view is not scalable
beyond very simple programs.
One way in which Jinsight simplifies the huge amount of data produced from a dynamic
trace is through pattern extraction. A pattern extractor analyses the event trace information
and identifies patterns of repeated behaviour. These patterns can be used to present an
aggregated view of the execution. The reference pattern view illustrates patterns of object
1 Hot spots in this context are distinct from hot spots in the context of framework reuse, where they are points at which a framework is designed to be extended.
references in the execution. Colours denote classes. Double rectangles represent a group of
objects of a certain type. Labels denote the number of instances of a class, and the class
name. The reference pattern view can be used to help in identifying memory leaks in the
form of objects that are no longer required but cannot be garbage-collected due to
outstanding references from other objects.
Jinsight’s execution view illustrates the sequence of method calls that make up the event
trace of the system’s dynamic behaviour. Time proceeds from top to bottom. Each horizontal
stripe represents the execution of a method, with deeper calls at the right hand side. Stripes
are coloured by class. A vertical lane constitutes all of the method stripes for a thread of the
execution. Lanes are added from left to right. Zooming in further to the execution view
reveals individual method calls, annotated with their names. Pattern recognition can also be
applied to the execution view. The execution pattern view illustrates patterns of method calls
in the execution.
The call tree view gives quantitative data on the sequence of method calls, including the
number of calls and their contribution to the total execution time.
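The kind of aggregation behind such a call tree view can be sketched as follows. The representation of calls as stack paths, and the method names, are assumptions made for this example, not Jinsight's internal format.

```python
from collections import defaultdict

def call_tree_stats(calls):
    """Aggregate call counts and total time per call path.

    calls: list of (path, seconds) tuples, where path is the stack of
    method names for one completed call, e.g. ("main", "parse").
    The names here are illustrative only.
    """
    counts = defaultdict(int)
    total = defaultdict(float)
    for path, secs in calls:
        counts[path] += 1
        total[path] += secs
    return {p: (counts[p], total[p]) for p in counts}
```

Sorting the result by total time would surface the paths contributing most to execution time, as the call tree view does.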
Jinsight allows the analyst to group related behaviour into execution slices, which can be
used as a basis for comparison between executions, or to filter out information not pertinent
to the visualisation objectives. Execution slices can be defined by selecting elements in a
view, or by querying the trace data directly.
De Pauw and Sevitsky [De Pauw 1999, De Pauw 2000] describe the use of Jinsight in
examining memory leaks, while Sevitsky et al. [Sevitsky 2001] discuss the use of Jinsight
for performance analysis. A brief summary of Jinsight’s functionality is given by De Pauw et
al. [De Pauw 2001].
Future work includes enabling the visualisation of systems running on multiple JVMs
simultaneously and across networks, and of heterogeneous systems containing middleware
such as databases in addition to Java components.
2.2.11.2 Evaluation
De Pauw et al. [De Pauw 2002] report that Jinsight has been used successfully to diagnose a
number of problems in industrial applications. They note that the system did not perform
well when analysing high-volume web-based applications as the tracing overhead caused
undesirable behaviour in the application, requiring more selective trace information
collection. They found that their aggregate statistics did not provide sufficient information to
support some analyses, and that broad filtering at the class or method level did not scale
well. To rectify this, Jinsight allows task-oriented tracing, where relevant details can be
extracted while retaining other important contextual information.
2.2.11.3 Comparison
De Pauw et al. [De Pauw 2002] note that it is important to select appropriate diagram
abstractions that are sufficiently scalable to large amounts of execution information. They
comment that Sefika et al. [Sefika 1996] use large architectural units, while Walker et al.
[Walker 1998] include additional structural units to organise the data.
Jinsight shares some ideas with Ovation [De Pauw 1998], notably the concept of execution
patterns.
2.2.11.4 Assessment
The call tree view and execution view would be expected to help with specific reverse
engineering tasks. The reference patterns may be useful for general software comprehension
tasks. Jinsight would appear to be particularly useful for examining performance issues, for
which the histogram view would be useful.
2.2.12 Collaboration Browser (levels 2-4)
2.2.12.1 Description
Richner and Ducasse [Richner 2002a] describe a process for recovering collaborations from
software systems using dynamic information. A tool called the Collaboration Browser
illustrates the technique. A collaboration represents a part of the software system that
performs some function and details how the classes that make up the collaboration interact
by playing certain roles.
The first stage in extracting collaborations from source code is to analyse the code
dynamically to extract interactions. Static analysis is inadequate for this purpose as it cannot
provide the object-oriented control flow information required. It is then necessary to identify
the important collaborations that help to answer the analyst’s questions. Collaboration
Browser records an event trace containing information for each method call, consisting of
sender class and identity, receiver class and identity, and the name of the called method.
Pattern matching is used to abstract similar sequences of execution from the trace. Querying
allows the analyst to identify the interesting collaborations.
A collaboration instance is the sequence of method calls between a method call and its
corresponding return. A collaboration pattern is a generalised class of collaboration
instances, and represents the collaboration design concept. The set of methods called on a
class during a collaboration pattern corresponds to the role design concept.
The pattern matching settings used to identify collaboration patterns from instances can be
adjusted in three ways. Firstly, any of the five items of information that represent an event in
the trace (caller class and identity, callee class and identity, and method) can be included or
excluded from the match. Secondly, events can be ignored when an object sends itself a
message, or if the depth of invocation exceeds some limit in the pattern or overall execution.
Thirdly, the analyst can choose to treat events as a tree-structure sequence, or simply as a set
of events with no implied ordering.
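The first of these adjustable settings, selecting which of the five event fields participate in the match, can be sketched as grouping events by an abstracted key. This is an illustrative reconstruction, not Collaboration Browser's implementation, and it flattens away the tree structure of a collaboration instance.

```python
def collaboration_patterns(events, keep=("sender_cls", "receiver_cls", "method")):
    """Group call events into patterns by abstracting away selected fields.

    events: list of dicts with keys sender_cls, sender_id, receiver_cls,
            receiver_id, method (a flattened view of collaboration
            instances; nesting is ignored in this sketch).
    keep:   which of the five fields participate in the match.
    """
    def key(event):
        return tuple(event[f] for f in keep)

    patterns = {}
    for event in events:
        patterns.setdefault(key(event), []).append(event)
    return patterns
```

With object identities excluded from `keep`, two interactions between different instances of the same classes fall into one collaboration pattern; including identities would keep them as distinct instances.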
Collaboration Browser uses a textual GUI to allow the analyst to query the entire execution
or a single collaboration. The analysis can be focussed by excluding selected senders,
receivers, or methods. A collaboration can also be illustrated as a sequence diagram.
Two limitations of the Collaboration Browser were identified as follows. Firstly, the pattern
matching was simplified by only considering all of the events between a method call and
return; it could be useful to consider a subset. Secondly, the role of a class is identified as the
set of all methods called on that class during the execution; considering individual class
instances separately could produce a more refined view of roles.
Collaboration Browser is implemented in Smalltalk and visualises Smalltalk programs. The
program to be analysed is instrumented using Method Wrappers [Brant 1998], which allows
selective instrumentation. The Interaction Diagram tool [Brant 1998] is used as the basis for
the sequence diagram representations.
Richner and Ducasse note that the recovery of collaborations is most effective when
combined with high-level views showing the interaction of components in a system [Richner
1999, Richner 2002b].
2.2.12.2 Evaluation
Collaboration Browser is evaluated in a HotDraw case study where the goal is to investigate
the implementation of tools. The scenario executed produced 53,735 method calls, from
which 183 collaboration patterns were extracted using the pattern matching functionality.
The results were then queried to discover the collaboration patterns containing an interaction
between the Tool class and another class in the trace; this produced twelve unique
collaboration patterns. The results were then focussed further to examine four collaboration
patterns resulting from a call to Tool.handleEvent. Further queries on these collaboration
patterns revealed the role played by each of the participant classes. The role of Tool in other
collaborations was also investigated. Further case study evaluation of Collaboration Browser
is given by Richner [Richner 2002b].
It is reported that the case studies showed that the queries helped in locating interesting
collaborations and in understanding the roles of classes in collaborations. They also
demonstrate that the process cannot be fully automated – a human analyst is required. It was
a challenge to identify suitable pattern matching criteria to obtain a balance between too
much and too little information. The iterative process employed in the case study was as
follows: collaboration patterns were created; queries were formulated regarding class
interfaces; collaboration patterns involving certain classes were identified; the collaboration
pattern participants were investigated; and the collaboration was investigated further using
the interaction diagram representation.
2.2.12.3 Comparison
Richner and Ducasse [Richner 2002a] consider their approach to be complementary to other
reverse engineering techniques that are more focussed towards visualisation, such as those of
De Pauw et al. (Ovation and Jinsight) [De Pauw 1998] and ISVis [Jerding 1997]. The
approach of Richner and Ducasse is focussed more on querying the trace data to extract
collaborations than on producing a visualisation. They feel that, whereas the techniques of
De Pauw et al. and Jerding and Rugaber consider the trace as a whole, their approach
complements these techniques by concentrating on smaller portions of the interaction. They
also note that no single tool can provide all of the functionality necessary for design
recovery.
The only other approach that attempts to reverse engineer collaborations is one based on
static analysis only [De Hondt 1998]. This approach relies on the analyst selecting
participants and roles for the collaboration and proposing appropriate links between them.
2.2.12.4 Assessment
The collaboration approach used in Collaboration Browser suggests that it would be useful
in general comprehension of specific parts of a software system. The ability to view
collaborations as sequence diagrams would be expected to be helpful in specific reverse
engineering tasks.
2.2.13 Together debugger (level 1)
2.2.13.1 Description
The Together debugger is part of the Together ControlCenter development environment
[TogetherSoft 2001a, TogetherSoft 2001b]. It provides all of the standard debugger features,
including breakpoints, expression evaluation and monitoring, variable modification, and
program flow control. Breakpoints can be set at classes, methods, lines, or exceptions.
Whenever a breakpoint is encountered during the execution of the program, the debugger
outputs a message and/or suspends the execution. The values of variables and expressions
can be monitored during execution, and variable values can be modified. Program execution
can be suspended and resumed by the user. Execution can proceed as normal, or in steps
where the debugger executes one line of code then suspends. The user can instruct the
debugger to step to the next line, or into, out of, or over a method. Integration with the
source code allows the user to set breakpoints and watches by selecting a position in the
code, and also to instruct the debugger to run the program up to the current cursor position.
Many IDEs provide a debugger as part of their standard tool set, such as Eclipse [Eclipse
2005].
2.2.13.2 Evaluation
There do not appear to have been any evaluations published regarding the performance or
functionality of the Together debugger.
2.2.13.3 Comparison
The graphical interface of the Together debugger makes it easier for non-experts to use.
Debuggers traditionally have a command line interface, for example jdb [Sun 2002]. The
integration with the source code also makes it more convenient to set and manage
breakpoints and watches.
2.2.13.4 Assessment
The low level information provided by the debugger is likely to be useful for some specific
reverse engineering tasks, which are often amenable to analysis at a low level of abstraction.
The debugger is less likely to be useful for general software comprehension tasks, where
information at a higher level of abstraction is typically required.
2.2.14 Together diagrams (levels 2-3)
2.2.14.1 Description
Together ControlCenter can produce UML class and interaction diagrams from program
source code. Unlike other tools considered in this section, Together produces behavioural
diagrams by parsing the program code, rather than by analysing an event trace. As discussed
in Section 2.1, this limits the accuracy of the interaction diagrams generated, while
maximising their generality by considering the entire system. When generating interaction
diagrams, Together addresses the potential information overload problem by allowing the
user to select the classes to be included in the diagram, limit the depth of method calls to be
included, and hide method internals. Interaction diagrams are generated for a method
specified by the user. Together supports ‘simultaneous round trip engineering’, meaning that
changes to the program code are reflected in the derived diagrams and vice versa.
2.2.14.2 Evaluation
Kollmann et al. [Kollmann 2002a] present a comparison of four static reverse engineering
tools. Together is compared with the commercial Rational Rose tool [Rational 2003], and the
IDEA [Kollmann 2001, Kollmann 2002b] and Fujaba [Fujaba 2002] research tools. The
tools were assessed by evaluating the class diagrams that they produced. While basic
diagram generation results were broadly similar across the tool set, Rational Rose detected
some associations that Together did not. The research tools were able to handle more
advanced diagram concepts than the industrial tools, such as multiplicities, inverse
associations, and container resolution.
2.2.14.3 Comparison
Together is unique among the tools in this section as it produces behavioural diagrams by
parsing the program code. All other tools considered extract behavioural information
dynamically. As discussed above, this has the effect of reducing the detail of the diagrams
while increasing their generality.
2.2.14.4 Assessment
It would be expected that the combination of the class and interaction diagrams for the entire
system produced by Together would be useful in general software comprehension tasks. The
lack of dynamically extracted information and resultant lack of detail may be a problem in
specific reverse engineering tasks.
2.2.15 SHriMP (levels 0, 2-4)
2.2.15.1 Description
SHriMP (Simple Hierarchical Multi-Perspective) views [Storey 1995] display software
modelled as nested graphs [Harel 1988] using fisheye views [Furnas 1986]. Nodes represent
software artefacts, such as functions or variables. Arcs represent dependencies, such as
function calls. Composite nodes represent subsystems, and composite arcs represent
collections of dependencies. This nesting encapsulates the hierarchical nature of the
software, and allows multiple levels of abstraction to be visualised concurrently. The fisheye
view approach allows the analyst to examine some area of the system in detail in the context
of the entire system. This is achieved by enlarging the nodes of interest while shrinking those
not immediately relevant. Graphs also include links to the source code. SHriMP is intended
to i) provide the user with a range of views of a system, from information about its
architecture down to the source code; and ii) enable the user to focus in on part of the system
while maintaining the big picture.
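The fisheye mechanism can be sketched as Furnas's degree-of-interest calculation, DOI(x) = API(x) - D(x, focus), where API is a node's a priori importance and D its distance from the current focus. The sizing rule and constants below are illustrative assumptions, not SHriMP's actual implementation:

```python
def degree_of_interest(api, distance):
    """Furnas-style DOI: a priori importance minus distance from the focus."""
    return api - distance

def fisheye_size(api, distance, base=10.0, scale=2.0, minimum=2.0):
    """Map DOI onto a display size: nodes near the focus are enlarged,
    while distant nodes shrink towards a minimum legible size.
    The base/scale/minimum constants are arbitrary assumptions."""
    return max(minimum, base + scale * degree_of_interest(api, distance))
```

Under this sketch, a subsystem node with high a priori importance remains visible even when far from the focus, while uninteresting distant nodes shrink to the minimum size, preserving the big-picture context that SHriMP aims for.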
2.2.15.2 Evaluation
The authors describe the application of SHriMP to comprehend the structures of two
systems. Ray Tracer is a C system consisting of approximately thirty modules, and SQL/DS
(Structured Query Language/Data System) is an RDBMS written in PL/AS (a proprietary
IBM systems language) consisting of around 1,300 compilation units. SHriMP views were
implemented as an extension to the Rigi program understanding tool, and their performance
was compared to that of Rigi without nested graphs and fisheye views. They found
showing detail in context, visualising software structures, visualising source code, and
navigating the hierarchy to be useful techniques in comprehending the subject systems. One
potential drawback noted was that the capability of Rigi to illustrate part of the software
system in a separate window without higher-level information is not present in SHriMP; this
may be useful for very large systems where the maintainer is only interested in a small part
of the system. Similarly, the Rigi overview window, which shows a tree or graph-based view
of the containment hierarchy, is not present in SHriMP; this may be a more familiar
visualisation of a hierarchy for some analysts. SHriMP has since been reimplemented using
Java Beans, and applied to itself as a case study [Storey 2001].
2.2.15.3 Comparison
Unlike most of the tools discussed thus far, SHriMP addresses a range of abstraction levels
from code to system architecture through its use of hierarchical views. SHriMP is also
unique in its use of fisheye views, which allow the analyst to display more detail for
interesting parts of the system while maintaining overall context.
2.2.15.4 Assessment
SHriMP would be expected to be useful for tasks relating to the static structure of software
systems at a range of abstraction levels.
2.2.16 BLOOM and JIVE (levels 2-3)
2.2.16.1 Description
BLOOM extracts static and dynamic information [Reiss 2001]. A visual query language
allows views to be combined. The system suggests appropriate visualisations based on the
data chosen by the user.
JIVE visualises dynamic information about Java programs [Reiss 2003a]. It uses a ‘box
layout’ which consists of a number of rectangles whose height, width, hue, saturation, and
brightness depict various properties, such as number of calls, number of instantiations, etc.
2.2.16.2 Evaluation
There does not appear to be any documented evaluation of BLOOM.
Anecdotal evidence regarding the use of JIVE on a variety of Java programs is presented by
Reiss [Reiss 2003b]. It is reported that JIVE illustrates the different phases that an
application goes through during its execution, provides rudimentary performance
information, and highlights unexpected behaviour. Weaknesses reported are bias in the
statistics presented, missed thread state transitions, and unwanted artefacts in the trace. Reiss
comments that it would be useful to allow the user to determine how information is grouped.
This would allow the user to examine a program at a high level, then zoom in on a particular
area of interest. Reiss also notes that it would be useful to be able to save and later reload the
trace data.
2.2.16.3 Comparison
Like Ovation and Jinsight, JIVE and BLOOM are concerned with analysing the dynamic
behaviour of software. Reiss comments that the tracing element of Jinsight, which makes use
of a modified JVM, is too inefficient for extensive use of the tool.
2.2.16.4 Assessment
The focus of JIVE and BLOOM on dynamically extracted behavioural information should
make them suitable for tasks involving the run-time behaviour of software systems, such as
analysing memory leaks, etc.
2.2.17 Polymetric Views, Class Blueprint, RelVis (levels 2-3)
2.2.17.1 Description
Bertuli et al. describe a lightweight dynamic visualisation technique [Bertuli 2003]. The
technique employs polymetric views, which consist of rectangular nodes connected by arcs,
annotated with metrics. Up to five metrics can be represented per node by the node’s x
position, y position, height, width, and colour. A minimal amount of information is collected
at run-time. The twelve measurements extracted include the number of called methods, rate
of called methods, number of method invocations, number of created instances (class-based),
and total number of method calls (method-based). The collected data requires much less
storage space, and incurs a much lower collection overhead, than traditional tracing
approaches. Static
information is extracted using the Moose reengineering environment (built on the FAMIX
metamodel). Wrappers are used to trace the program and output the metrics (by means of
counters). The views are specified and displayed using CodeCrawler [Lanza 2003b].
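This metric-to-attribute encoding can be sketched as a simple mapping of up to five measurements onto a node's visual properties; the greyscale colour ramp and the optional logarithmic normalisation below are assumptions for illustration, not CodeCrawler's actual rendering:

```python
import math

def polymetric_node(x_metric, y_metric, width_metric, height_metric,
                    colour_metric, log_scale=False):
    """Map up to five metrics onto a node's position, size, and colour,
    as in a polymetric view. A logarithmic scale compresses the large
    value ranges seen in views such as the Instance Usage Overview."""
    def norm(v):
        return math.log10(v + 1) if log_scale else v
    return {
        "x": norm(x_metric),
        "y": norm(y_metric),
        "width": norm(width_metric),
        "height": norm(height_metric),
        # Lower grey level = darker node = higher metric value (assumed ramp).
        "grey": max(0, 255 - int(norm(colour_metric)) * 25),
    }
```

For an Instance Usage Overview node, for example, width would carry the number of created instances, height the number of called methods, and colour the number of method invocations.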
Four types of view are produced using the technique. Each view is illustrated in the context
of a case study of the Moose system. A number of system characteristics were identified
using the views. The Instance Usage Overview view shows the instantiation and usage of
classes, and is intended as a starting point for analysis. It considers the entire system, and
uses a logarithmic scale. This view is displayed as an inheritance tree with nodes
representing classes and edges representing inheritance. The node width represents the
number of created instances, the node height represents the number of called methods, and
the node colour represents the number of method invocations on a class. This view combines
both static (inheritance hierarchy, number of classes) and dynamic (number of class
instances, number of method calls, number of invoked methods) information. It is useful as
an overview of the whole system’s behaviour, and shows the classes used in the system in
the context of the inheritance hierarchy.
The Communication Interaction View shows inter-class communication. It considers the
entire system and uses a linear scale. This view uses the embedded spring layout, with
springs being weighted so that classes between which there is a lot of communication will be
aggregated. Nodes represent classes, and edges represent invocations. The node width and
height represent the number of called methods, the node colour represents the number of
method invocations on a class, and the edge width represents the number of invocations
between two classes. This view identifies heavily used classes. It is less scalable than the
Instance Usage Overview view as the layout algorithm employed does not readily identify
well defined groups of classes.
The Creation Interaction View shows class creation between classes. It considers the entire
system and uses a logarithmic scale. This view also uses the embedded spring layout, with
springs being weighted so that classes between which there are a lot of creation invocations
will be grouped. Nodes represent classes, and edges represent invocations. The node width
represents the number of objects created by the class, the node height and colour represent
the number of created instances of the class, and the edge width represents the number of
creation invocations between the two classes. The lower number of arcs makes the Creation
Interaction View more scalable than the Communication Interaction View.
The Method Call Origin View shows the origin of method calls – i.e. internal or external to
the class. It can be used to consider the entire system, a subsystem, or a single class, and uses
a logarithmic scale. This view is displayed as a scatterplot, with nodes representing methods.
The x coordinate represents the number of calls from external methods, the y coordinate
represents the number of calls from internal methods, and the node colour represents the total
number of calls. The scatterplot layout illustrates the three metrics well, even with a large
number of nodes.
The class blueprint approach visualises the static structure of a class [Lanza 2001]. A class
blueprint is based on a template consisting of five rectangles representing Initialization,
Interface, Implementation, Accessor, and Attributes. Size, shape, and colour are used to
visualise these attributes – there is an obvious connection with the authors’ polymetric views
approach.
The RelVis approach provides graphical views of source code and release history
information [Pinzger 2005]. Kiviat diagrams are used to display metrics, which can then be
used to identify trends and hence potential refactoring targets [Kolence 1973].
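A Kiviat (radar) diagram places each metric on its own axis, with axes at equal angles and the point's radius proportional to the metric value. A minimal placement sketch, with max-normalisation as an assumed scaling:

```python
import math

def kiviat_points(metrics, radius=1.0):
    """Place each metric on its own axis of a Kiviat diagram: axes at
    equal angles, point radius proportional to the metric value
    normalised against the largest value (an assumed scaling)."""
    peak = max(metrics) or 1
    points = []
    for i, value in enumerate(metrics):
        angle = 2 * math.pi * i / len(metrics)
        r = radius * value / peak
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

Connecting successive points yields the characteristic polygon whose changing shape across releases reveals the trends RelVis exploits.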
2.2.17.2 Evaluation
The polymetric views approach has been used to analyse a number of applications of up to
1800 classes in size written in Smalltalk, COBOL, C, C++, and Java. It was found that the
approach was useful to give an overview of the system, assess the quality of inheritance
hierarchies, identify candidate classes for refactoring, and assess class coupling.
Disadvantages of the approach are that it considers only static information, that it lacks
detail in places, and that reengineering larger systems would require more information than
the current model provides [Demeyer 1999, Ducasse 2001, Lanza 2003a].
A version of the approach based on run-time information was useful in providing insights
into the runtime behaviour of the system, presenting various different kinds of information,
and providing overviews as well as more detailed information [Ducasse 2004]. Drawbacks
include the lack of very detailed information (e.g. sequence of interactions, as in a sequence
diagram), and that the user is required to interact with the view to gather the relevant
information.
The approach has also been applied to the problem of software evolution [Lanza 2002]. It
was found that the approach reduces complexity and provides system wide views of the
evolution, provides a finer-grained understanding of class evolution, builds a vocabulary to
describe evolution, and scales well. Limitations include fragility relating to class naming,
screen limitations that necessitate working at a new level of abstraction, and a lack of other
levels of granularity.
Another application of the approach was to the problem of code duplication [Rieger 2004]. It
is reported that the goal of data reduction on different levels was achieved and that the views
were useful for providing overview information. However, layout and readability could be
improved, and a link to the source code would be useful.
The approach has also been used to analyse class hierarchy evolution [Gîrba 2005]. Gîrba et
al. describe how they used the technique to answer a number of questions regarding the
evolution of the inheritance hierarchies in several systems.
Two case studies of the class blueprint approach are presented by Lanza and Ducasse [Lanza
2001] and four by Ducasse and Lanza [Ducasse 2005]. The benefits of the approach are
listed as the reduction of complexity and the definition of a common vocabulary. Limitations
are the lack of consideration for cognitive science, the layout, the lack of illustration of a
class’s functionality, the lack of illustration of collaboration between classes, and the lack of
dynamic information.
Pinzger et al. [Pinzger 2005] demonstrate the RelVis approach by applying the technique to
seven releases of the open source Mozilla project spanning three years. The graphs produced
highlighted positive and negative trends in the entities and relationships of the system.
Future work is planned to explore 3D Kiviat diagrams and different sets of metrics.
2.2.17.3 Comparison
Bertuli et al. note that AVID, Program Explorer, ISVis, and Jinsight all employ sophisticated
diagramming techniques to make an entire event trace comprehensible [Bertuli 2003]. In
contrast, the polymetric views approach condenses this information into a number of metrics
that are used to annotate visualisations.
Bertuli et al. note that Program Explorer focuses on classes and objects, such as method
invocation, object instantiation, and attribute access, but it is not intended as a global
understanding tool. They point out that the user must know what he is looking for before
commencing the analysis, whereas the polymetric views approaches are intended to cover
the whole system.
Bertuli et al. explain that the purpose of ISVis is to visualise method calls. While patterns can
be recognised and extracted, there is a lack of flexibility in the analysis. The approach scales
well for a large number of messages, but not for a large number of classes in which case the
visualisation becomes less useful.
Bertuli et al. comment that AVID is focussed on the lifetimes and numbers of objects in a
system. They explain that AVID is concerned more with static architectural models while the
polymetric views based approaches consider the various types of interactions between
classes during execution.
Bertuli et al. point out that Jinsight visualises messages between objects and extracts
execution patterns, but that class roles are difficult to understand during execution for large
traces. They comment further that the approach taken by Ovation involving class call
clusters and the class call matrix is closer to their approach. However, while such visualisations
are simple and have good scalability, they present only a small facet of an OO application.
2.2.17.4 Assessment
The advantages of the techniques based on the polymetric views approach are as follows.
The lightweight approach employed allows minimal disruption to the system under analysis.
It also reduces the amount of data produced compared to a full trace. The technique can be
attached to a running system. This allows it to be used for systems such as web servers that
are running constantly. The approach is incremental and data can be analysed cumulatively.
The views provide overviews as well as more fine-grained information. The disadvantages
are the lack of invocation sequence level information, as in Jinsight, and the shortcomings of
the spring layout in dealing with high levels of communication.
2.2.18 Seesoft, SeeSys, SeeSlice, HierNet, SeeNet, SeeNet3D (levels 0, 2-3)
2.2.18.1 Description
Seesoft visualises source code from systems up to 50KLOC [Eick 1992]. Each line of code is
mapped to a thin row of colour. The four key ideas are: reduced representation, colouring by
statistic, direct manipulation, and capability to read the actual code. Data can be taken from
version control systems, static analyses, or dynamic analyses.
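The reduced representation can be sketched as a mapping from a per-line statistic (e.g. age of last modification) to a thin coloured row; the linear colour ramp is an illustrative assumption:

```python
def seesoft_rows(line_stats, row_height=1, palette_size=256):
    """Map each source line to a thin row whose colour index is driven
    by a per-line statistic, as in Seesoft's reduced representation.
    The linear normalisation onto the palette is an assumption."""
    lo, hi = min(line_stats), max(line_stats)
    span = (hi - lo) or 1  # guard against a file where all lines are equal
    rows = []
    for i, stat in enumerate(line_stats):
        colour = int((stat - lo) / span * (palette_size - 1))
        rows.append({"y": i * row_height, "colour": colour})
    return rows
```

Because each line collapses to a row a few pixels high, tens of thousands of lines fit on one screen while hot spots in the chosen statistic remain visible.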
SeeSys visualises statistics associated with code organised hierarchically into systems,
subsystems, and files [Baker 1994]. The approach can display the relative sizes of
components, which components are stable and which are changing, where new functionality
is being added, and identify error-prone code that has many bug fixes. Animation can be
used to display code evolution. The visualisation is based on nested rectangles. Each
subsystem is denoted by a rectangle whose area is determined by some statistic. These
rectangles are then partitioned to show their internal directory structure, each sub-rectangle’s
area being proportional to the NCSL (non-commentary source lines) metric for that
directory. Rectangles can be filled to illustrate additional metrics. For example, a fill may be
used to show the proportion of a directory’s NCSL that corresponds to new code.
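The nested-rectangle scheme anticipates what later became known as a treemap. A one-level sketch, partitioning a subsystem rectangle so that each directory's area is proportional to its NCSL (the directory names and sizes are hypothetical):

```python
def slice_layout(ncsl_by_dir, x, y, width, height):
    """Partition a subsystem rectangle horizontally so that each
    directory's sub-rectangle area is proportional to its NCSL metric.
    Returns (x, y, width, height) per directory; one level only."""
    total = sum(ncsl_by_dir.values())
    rects, cursor = {}, x
    for name, ncsl in ncsl_by_dir.items():
        w = width * ncsl / total
        rects[name] = (cursor, y, w, height)
        cursor += w
    return rects
```

Recursing into each sub-rectangle, alternating the slicing direction, would reproduce the full nested SeeSys display; fills within each rectangle can then encode a second statistic such as the proportion of new code.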
SeeSlice is a tool that allows slicing at the statement, procedure, or file level and visualises
the structure of the slice produced [Ball 1994]. Files are displayed as columns containing
representations of procedures. Procedures can be displayed ‘open’ (code visible) or ‘closed’
(code hidden). Pointing to a statement immediately highlights the procedures and code
included in the slice.
HierNet visualises networks where each link has an associated weight, and exploits any
hierarchy present [Eick 1993]. The position, area, and colour of nodes are significant, as is the
colour of arcs. For example, in an email system, the area of a node could be proportional to
the number of messages sent or received by the user represented by that node, the node
colour could indicate job function (clerical, technical, management), links could show email
communication between individuals, and a heat colouring scale could be used to indicate
communication frequency.
SeeNet is a tool for visualising network data [Becker 1995]. It consists of three static
displays and direct manipulation techniques that allow these displays to be parameterised.
Link maps consist of nodes connected by lines to indicate data flow. This shows the
connectivity of the network. Line segments may be coloured or drawn with varying
thicknesses to illustrate values. Arrows can be used to indicate link directionality. Problems
with link maps are link overlap, long links, and difficulties in determining line terminations.
An alternative representation that avoids this clutter is the nodemap. Nodemaps use symbols
or glyphs to represent nodes and illustrate statistics through visual characteristics such as
size, shape, and colour. Complex glyphs can represent more than one statistic. Although a
nodemap solves the clutter problem experienced with link maps, it does so at the expense of
detailed information about individual links. Another possible approach to the clutter problem
is to omit geographical information. A matrix display presents a network in matrix form
with each matrix element allocated to a link. While overcoming the clutter problem, the
matrix display sacrifices information about the geography of the network – indeed, it may
introduce a false idea of geography due to the ambiguous ordering of rows and columns.
SeeNet allows direct manipulation of the various parameters involved in network
visualisations (statistics, levels, geography/topology, time, aggregation, size, colour).
SeeNet3D introduces five new three-dimensional views to address some of the fundamental
problems that limit the scalability of two-dimensional geographical network displays [Cox
1996]. A global network positions nodes geographically on a globe and draws arcs between
them. Restricting the 3D space to a globe captures many of the advantages of a general 3D
layout, while helping the user to maintain context. Users are also familiar with globes. Arc
crossings, and hence visual clutter, are reduced by the background of the globe surface and
the 3D embedding. An arc map positions nodes on a flat 2D map and draws arcs between
them in 3D space. Advantages of arc maps are that they are not restricted to whole world
displays, they can be positioned arbitrarily in space, the use of arcs greatly reduces the line
crossings typical in 2D displays, and the most important links are represented by the highest
arcs. To analyse a particular node or subnetwork, drill-down network views can be
employed. These linked views show data on demand, displaying the links emanating from a
central focal node. Spoke displays order nodes around the focal node in a circle. Spoke
displays become overwhelmed with more than 50-100 nodes. This problem can be circumvented by
means of a 3D layout that positions the nodes on a helix. An alternative 3D display,
motivated by the helix display, positions the nodes approximately uniformly round a sphere
(as with a pincushion), thus forming them into lines of latitude. Another alternative is to
tessellate the sphere surface and select points from the tessellation. To be effective, the
pincushion display (like the helix) must be viewed interactively, with motion.
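The spoke and helix layouts can be sketched as parametric placements around a focal node; the radius, step, and pitch constants below are arbitrary assumptions:

```python
import math

def spoke_layout(n, radius=1.0):
    """Place n nodes evenly on a circle around a central focal node
    (the 2D spoke display)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def helix_layout(n, radius=1.0, pitch=0.1, step=0.5):
    """Place n nodes on a 3D helix, trading the crowding a 2D spoke
    suffers beyond roughly 50-100 nodes for extra vertical space."""
    return [(radius * math.cos(i * step),
             radius * math.sin(i * step),
             pitch * i * step) for i in range(n)]
```

The helix keeps the angular ordering of the spoke display but spreads nodes along a third axis, which is why it degrades more gracefully as the node count grows.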
2.2.18.2 Evaluation
An example of using Seesoft to visualise change data is presented for a 9 KLOC system
[Eick 1992]. The analyst was able to learn which files were changed most often, the age of
the code, when each file was last changed, and how files can be grouped by modification
request. Anecdotal evidence of field experiences is also discussed.
SeeSys [Baker 1994] is applied to the source code for the AT&T 5ESS telephone switch. The
system consists of several MLOC, written by thousands of programmers over a decade.
They were able to show the sizes of the subsystems and directories that have changed
recently, zoom in on particularly active subsystems, discover how much of the development
activity involved bug fixes and new functionality, identify directories and subsystems with
high fix-on-fix rates, and identify the subsystems that have been historically active and also
those that have shrunk or been removed.
SeeSlice [Ball 1994] is applied to a 12KLOC profiling/tracing tool written in C. They were
able to determine that most of the program is dependent on five highly interdependent input
procedures, a set of interdependent procedures spanning four files is responsible for output,
and a single variable influences a large portion of the program.
HierNet is demonstrated by applying it to an intra-departmental email network over eight
months and to changes to a large section of a computer program [Eick 1993]. For the email
network, the visualisation showed that the amount of mail varies greatly, a community of
three users can be identified, and that it typically takes two months for communication
patterns to solidify after a new user joins the system. For the software system, the approach
reduced the size of the data set, found a large group of near-identical modules, located a
group of modules performing a function independently of the other modules, and identified
an anomalous module whose files are linked with most other modules.
The SeeNet approach is demonstrated by applying it to the CICNet packet-switched data
network and an email communications network [Becker 1995].
Anecdotal evidence of the application of the SeeNet3D approach to data such as the
NSFNET/ANSnet backbone (50 countries) and MBone Internet traffic is presented by Cox et al.
[Cox 1996].
2.2.18.3 Comparison
Unlike the other tools described in this section, these approaches focus on visualising a
particular type of data (i.e. source code, hierarchical data, network data, or slices), rather than
visualising the data for a particular purpose (e.g. to show object interactions, or to gain a
general understanding of the software) or at a predefined abstraction level (e.g. methods,
classes).
2.2.18.4 Assessment
These techniques would be particularly useful for tasks involving the visualisation of source
code, network data, or slices.
2.2.19 sv3D and Imsovision (levels 2-3)
2.2.19.1 Description
sv3D uses a three-dimensional representation to visualise software structure [Marcus 2003a].
Source files are represented as coloured cylinders, where height represents nesting and
colour represents control structure.
Imsovision uses a VR style representation to display classes and their relationships, along
with metric information [Maletic 2001]. Planes are used to represent classes, spheres
represent attributes, and columns represent functions.
2.2.19.2 Evaluation
Marcus et al. present anecdotal evidence of applying sv3D to a small (4KLOC) C++ system
to demonstrate the approach [Marcus 2003a]. Marcus et al. describe the application of sv3D
to a 56KLOC system (Doxygen) [Marcus 2003b]. They report on how sv3D was used to
identify execution hotspots from profiling information.
Imsovision is applied to a small mail system by Maletic et al. [Maletic 2001]. The purpose of
Imsovision seems to be to provide a general understanding of a system.
2.2.19.3 Comparison
sv3D and Imsovision are unique amongst the tools discussed here in their use of 3D and
virtual reality visualisations respectively. Marcus et al. [Marcus 2003a] comment that
SeeSoft’s use of 2D pixel bars limits the number of attributes that can be represented, and
makes it difficult to represent hierarchical relationships and multiple abstraction levels.
These are issues that sv3D seeks to address.
2.2.19.4 Assessment
sv3D would appear to be useful in analysing dynamic behaviour, while Imsovision’s strength
lies in gaining a general understanding of a software system.
2.2.20 Tool summary
This section has reviewed a selection of software visualisation tools, which illustrate the
concepts described in Section 2.1. Each tool was discussed in the context of the three
characteristic criteria introduced in Section 2.2.1. Early object-oriented software
visualisation tools were concerned primarily with illustrating method-level interactions; such
tools included Program Explorer and Scene. Later tools began to consider the problem of
architectural extraction, and architectural-level visualisations were produced by tools such as
Sefika’s, ISVis, Dali, AVID, and RMTool. The latest tools have attempted to bridge the gap
between microscopic and macroscopic visualisations and provide both low-level and
architectural visualisations, namely Gaudi, Shimba, Collaboration Browser, and SHriMP.
Figure 2.4 annotates the abstraction scale from Figure 2.3 to illustrate the relative levels of
abstraction of these tools. It is clear from this figure that the extant software visualisation
tools address only a single level of abstraction or a limited range of levels.
Tools have also been developed to address specific tasks, such as Jinsight for performance
analysis, Together to support software development, BLOOM, JIVE, and sv3D for dynamic
understanding, and Imsovision for VR exploration. Some tools focus on addressing the
requirements of displaying specific types of data, such as the tools developed by Eick et al.
for visualising source code, hierarchical data, network data, and program slices, and the
Polymetric Views approach for visualising metrics. There is an emerging trend of
retargetable software visualisation tools, which can be used to visualise programs in a variety
of languages, rather than being designed for use with one specific language. Such tools
include Dali, RMTool, Gaudi, and Together. A retargetable design makes the tool more
flexible and should encourage usage and interoperability. Section 3 assesses those tools that
were available in the context of a case study involving both general software comprehension
and specific reverse engineering activities.
Figure 2.4 The positions of tools on the abstraction scale of Figure 2.3
2.3 Abstraction
It is clear from the foregoing discussion that abstraction is a crucial concept in software
visualisation. This section discusses the concept of abstraction and its application in software
engineering and visualisation.
2.3.1 The concept of abstraction
As stated in Section 1.1.5, abstraction is the process of producing a simplified representation
that emphasises the important information while suppressing details that are (currently)
uninteresting, with the goal of reducing complexity and increasing comprehensibility [Berard
1993]. Lee and Fishwick define an abstraction as a “generalized, idealised model of a
system” [Lee 1996]. Abstraction is employed in a wide variety of scientific fields, including
statistics, simulation theory, management science, and software engineering. Two principal
features of abstract models identified by Fishwick are that they are usually less complex and
more comprehensible than the model from which they are derived [Fishwick 1988].
2.3.2 The historical origins of abstraction
Abstraction has provided the foundation that we use for performing mental tasks ever since
human thought began [Kirsanov 1998]. The modern use of abstraction began in the early
twentieth century in a variety of fields [Hooker 1996]. Hooker provides evidence for this
with the examples of abstract art, atonal music, Einstein’s Theory of Relativity [Einstein
1920], and Keynesian economics [Keynes 1936]. In this modern context, abstraction refers
to the view that separate aspects of human experience are independent of each other, and can
hence be reasoned about in isolation.
2.3.3 The application of abstraction
Fishwick [Fishwick 1988] presents abstraction in the context of simulation using the dining
philosophers (DP) problem [Dijkstra 1968]. The models used are a frequency distribution,
finite state automaton, observed data, Petri net [Petri 1962, Peterson 1981], flow graph, and
equations. These models are then presented as an abstraction network, consisting of the
models and abstraction techniques that relate them. For example, a more abstract flow graph
model of the DP system can be derived from the Petri net model using abstraction by
representation. Fishwick describes a number of abstraction techniques, namely: abstraction
by representation, abstraction by induction, abstraction by reduction, total systems
morphism, and partial systems morphism.
In abstraction by representation, an abstract model represents a base model in another form.
Such models are often purely structural and have no behaviour, except as defined by the
more detailed base model. Abstraction by induction involves combining elements from the
base model to form a smaller, more compact representation. Abstraction by reduction is
achieved by deriving a representative summary of the base model. A total systems morphism
(TSM) [Zeigler 1976] is a mapping between all of the elements in the base and abstract
models. A TSM preserves both structure and behaviour. TSMs are well-suited for abstracting
discrete representations (e.g. graphs), but less so for continuous systems. A partial systems
morphism (PSM) is a mapping between some subset of the elements in the base and abstract
models. In contrast to a TSM, not all structure and behaviour is necessarily preserved in a
PSM. Sensory (visual) and cerebral abstraction are also discussed; unlike the previous five
techniques, these do not define any mappings. Sensory abstraction aims to produce a model
that is convincing to an audience, but without the attendant complexity of a mapping
technique, for example, particle systems simulating fire or explosions [Reeves 1983].
Cerebral abstraction relates to the way in which humans reason about models. Other methods
of abstraction include geometric model abstraction, where complex geometric elements are
approximated by simpler ones [Clark 1976, Feiner 1985].
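The morphism-based techniques above can be made concrete with a short sketch. The following is illustrative only, and not taken from Fishwick: a base model and an abstract model are represented as directed graphs, and a candidate morphism is a mapping between their elements that is total (it covers every base element) and structure-preserving (every base relation has a corresponding abstract relation). Dropping the totality condition yields a partial systems morphism.

```python
# Illustrative sketch: a (simplified) dining philosophers base model and a
# coarser abstract model, each as a set of nodes and directed edges.
base_nodes = {"p1", "p2", "fork1", "fork2"}
base_edges = {("p1", "fork1"), ("p2", "fork2"), ("fork1", "p2"), ("fork2", "p1")}

abstract_nodes = {"philosophers", "forks"}
abstract_edges = {("philosophers", "forks"), ("forks", "philosophers")}

# Candidate morphism: philosophers collapse to one abstract node, forks to another.
mapping = {"p1": "philosophers", "p2": "philosophers",
           "fork1": "forks", "fork2": "forks"}

def is_total(mapping, base_nodes):
    """A total systems morphism maps *every* element of the base model."""
    return base_nodes <= set(mapping)

def preserves_structure(mapping, base_edges, abstract_edges):
    """Each mapped base-model edge must correspond to an abstract-model edge."""
    return all((mapping[a], mapping[b]) in abstract_edges
               for a, b in base_edges if a in mapping and b in mapping)

print(is_total(mapping, base_nodes))                             # True
print(preserves_structure(mapping, base_edges, abstract_edges))  # True
```

Removing, say, `fork2` from the mapping would leave `preserves_structure` satisfiable on the remaining elements while `is_total` fails, which is the situation a partial systems morphism describes.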
It is important that abstractions are evaluated in order to determine their utility. Fishman and
Kiviat [Fishman 1967] define three components of evaluation as verification (ensure the
model is consistent and behaves as intended), validation (test the model against the real
system to assess similarities and differences), and analysis (ensure the output data is
correctly interpreted). Fishwick [Fishwick 1988] defines an abstraction method as being
valid by dint of its definition (i.e. if the definition of the method is valid, then the method
itself is valid). An abstract model is considered valid if it can be either validated empirically
or produced from a valid base model using a valid abstraction technique. An example of an
empirical validation of an abstraction model could be the percentage of human observers
who found the model convincing. Fishwick argues that abstraction models should be
formalised whenever possible.
2.3.4 Abstraction in software engineering
Abstraction is employed in software engineering to help manage the complexity of software
systems. For example, a diagram may be used as an abstraction to illustrate the principal
components of a system. A number of different types of abstraction are used in software
engineering. Functional or procedural abstraction allows a package of program functionality
to be considered as a ‘black box’ with a clearly defined interface and its implementation
hidden [Alexandridis 1986, Liskov 1986]. Iteration or action abstraction is used to express
repeated patterns of program behaviour, such as loop constructs [Zimmer 1985, Liskov
1986]. Data abstraction is based on the idea of ‘abstract data types’, which allow data to be
stored and manipulated through a defined interface without concern for how the raw data is
represented [Guttag 1977, Ledgard 1977, Shaw 1984]. Process abstractions are similar to
data abstractions, but include a thread of control [Alexandridis 1986]. In the context of
knowledge-based OO logic programming, Park defines object abstraction as the
combination of knowledge abstraction (models of knowledge base representation and
control), data abstraction, and connection abstraction (models of object hierarchy and
communication) [Park 1991].
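Data abstraction in particular can be illustrated with a minimal sketch (the class and method names here are illustrative, not drawn from the cited works): clients manipulate the data solely through the defined interface, and the underlying representation could be replaced without affecting client code.

```python
class Stack:
    """An abstract data type: clients use only push/pop/peek/is_empty and
    never touch the underlying representation (here a Python list, but it
    could be swapped for a linked structure without changing client code)."""

    def __init__(self):
        self._items = []          # hidden representation

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
print(s.pop())    # 2
print(s.peek())   # 1
```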
2.3.5 Abstraction in software visualisation
Abstraction is crucial in software visualisation to allow the large quantities of information
involved to be comprehended usefully. The study of software visualisation tools described in
Section 2.2 found that the various tasks typically involved in software comprehension and
reverse engineering efforts are best addressed at different levels of abstraction. The work
also showed that most extant software visualisation tools operate at only one or two such
levels (as measured on the five-level abstraction scale proposed). Consequently, it is
currently necessary to utilise several tools in combination in order to address satisfactorily
the full range of software comprehension tasks.
2.4 Effective presentation techniques for software visualisation
It is clear from the foregoing discussion that the technique used to present the results of the
data analysis is a crucial component of the software visualisation process. This section
discusses diagram types for presenting software visualisations and view arrangements to
organise them.
2.4.1 Diagrams for describing software
The goal of software visualisation is to present information about the software system under
investigation to the analyst in a format that is useful in helping them to achieve their
software comprehension tasks. A variety of diagram types for describing software systems
have been proposed in the literature and implemented in CASE (computer-aided software
engineering) and visualisation tools. A selection of these are listed in Table 2.1 and
discussed below.
Table 2.1 A selection of diagrams for describing software
Structured design diagrams: Basic graph; Petri net; Nassi-Shneiderman diagram; Entity relationship diagram; Control flow diagram; Data flow diagram; Data structure diagram; Statechart

Pre-UML OO diagrams: Booch diagram; Message sequence chart

UML diagrams: Class diagram; Object diagram; Sequence diagram; Collaboration diagram; Component diagram; Deployment diagram; Activity diagram; Statechart diagram; Use case diagram

UML extension diagrams: Robustness analysis diagram; Business process diagram

Real time modelling: System context diagram; System architecture diagram; Event sheet diagram

XML modelling: XML structure diagram

Recent SE literature: InfoBUG, timeWheel, 3D-wheel [Chuah 1997]; Execution pattern [De Pauw 1998]; Reflexion model [Murphy 2001]; Story board diagram [Fischer 2000]; SoftArch diagrams [Grundy 2000]; Virtual reality [Knight 2000, Maletic 2001]; Matrix views, cityscapes [Eick 2002]; 3D [Martin 2002, Marcus 2003a]; Use case [Riva 2002]; Visualization in contexts [Yin 2002]; Polymetric views [Bertuli 2003]; DRT [Chan 2003]
2.4.1.1 Structured design diagrams
Before object-oriented techniques became popular in the early 1990s, a number of diagrams
for supporting the traditional structured design process had been proposed. These included
Petri nets [Petri 1962], Nassi-Shneiderman diagrams [Nassi 1973], entity relationship
diagrams [Chen 1977], control flow diagrams [Hatley 1987], data flow diagrams [Pressman
2000, Sec. 12.4.1], data structure diagrams [Pressman 2000, Sec. 13.4.7], and statecharts
[Harel 1990].
2.4.1.2 Object-oriented diagrams
The advent of the object-oriented paradigm produced a new set of diagrams. These included
Booch diagrams [Booch 1994] and message sequence charts (MSCs) [ITU-T 1996]. A
popular set of OO diagrams is that defined by the Unified Modeling Language (UML)
[Rumbaugh 1999, OMG 2003c]. UML version 1.5 defines a set of nine diagrams for
describing various aspects of the analysis, design, and implementation of software, which are
popular during the forward engineering process. These diagrams consist of boxes
representing entities (e.g. classes, objects, components), connected by arcs representing
relationships (e.g. inheritance, communication, dependency). Selonen et al. [Selonen 2001]
discuss transformations between UML diagram types. Burd et al. [Burd 2002] describe an
experiment demonstrating that animation aids understanding of UML sequence diagrams.
UML models are essentially graph-based, and basic graphs (with one type of node and one
type of edge), such as call graphs, can also be used to represent software (e.g. in the Program
Explorer tool [Lange 1995b]). MSCs were a precursor to UML sequence diagrams, while
UML statechart diagrams are derived from Harel’s statecharts.
The UML diagrams described above are implemented in many popular CASE tools. A
number of additional diagrams that are not part of the UML standard, such as robustness
analysis diagrams and business process diagrams, as well as diagrams intended specifically
for modelling real-time systems, such as system context diagrams, system architecture
diagrams, and event sheet diagrams, and for modelling XML (XML structure diagrams), are also available
in some tools.
2.4.1.3 Recent literature
The recent software engineering literature has also proposed a number of diagrams. De Pauw
et al. [De Pauw 1998] describe a variation of the MSC called an execution pattern that
incorporates colour and emphasises time rather than control flow. Murphy et al. [Murphy
2001] present reflexion models for modelling high-level system entities. Fischer et al.
[Fischer 2000] describe story board diagrams (SBDs), which combine aspects from three
UML diagrams. Grundy and Hosking [Grundy 2000] have implemented the SoftArch
environment for architectural visualisations. Knight and Munro [Knight 2000] discuss the
use of virtual reality environments for modelling software. Martin et al. [Martin 2002] use a
three-dimensional environment to illustrate component dynamics. Riva and Rodriguez [Riva
2002] incorporate a basic use case visualisation into their approach. Yin and Keller [Yin
2002] use a UML-based notation in their visualization in contexts technique. Bertuli et al.
[Bertuli 2003] describe polymetric views, which are annotated with measurements collected
from software. Chan et al. [Chan 2003] enhance their visualisations with application
screenshots.
The survey in Section 2.2 revealed that the diagrams used in the extant software visualisation
tools address only a single abstraction level, or a small range. Arranging diagrams in an
interrelated hierarchy encompassing the entire range of abstraction levels would increase
their utility and aid comprehension, as all levels of abstraction could be addressed
conveniently.
2.4.2 Views for software comprehension
Diagrams, such as those described in the previous section, are used to illustrate models of
software. Different views of a software model are possible - these views are implemented
using diagrams. It is proposed in this thesis that there are six possible arrangements of views
onto a software model, illustrated in Figure 2.5, namely: (a) a single view illustrating a single
facet2; (b) multiple independent3 views illustrating a single facet; (c) multiple interdependent
2 A facet in this context is taken to mean an (interesting) property of a software system, such as its structure or behaviour [Jahnke 2002].
3 The views are independent in the sense that there is no coordination between them. Two models of the same system may be implicitly dependent on each other unless they refer to disjoint parts of the system.
views illustrating a single facet; (d) a single view illustrating multiple facets; (e) multiple
independent views illustrating multiple facets; and (f) multiple interdependent views
illustrating multiple facets. This categorisation distinguishes view arrangements by the
number of views (one or multiple), the number of facets (one or multiple), and their
relationship (independent or interdependent). The remainder of this section describes these
arrangements in more detail.
Figure 2.5 The six arrangements of views onto a software model. The rectangles around the views in
parts c and f represent the coordination inherent in such interdependent arrangements
2.4.2.1 A single view illustrating a single facet
This arrangement illustrates a single facet of the software system in one view. A single facet
may not illustrate all of the information necessary for comprehension, but for a specific task
it may be sufficient. A single view provides the analyst with only one perspective of the facet
under investigation, which may restrict exploration. The reference implementation of the
Dali workbench [Kazman 1999] and jRMTool [Murphy 2001] implement this arrangement.
In the case of these tools, either structural or behavioural facets can be visualised.
2.4.2.2 Multiple independent views illustrating a single facet
This arrangement uses multiple views to illustrate a single facet of the software system.
Multiple views give the analyst a number of perspectives of the facet, and may improve the
navigability of the model (cf. Baldonado et al.’s ‘Rule of Diversity’ and ‘Rule of
Complementarity’ [Baldonado 2000]). However, the lack of relationships between the views
can cause the analyst cognitive difficulties in reconciling the multiple views and transferring
information between them (cf. Baldonado et al.’s ‘Rule of Parsimony’ [Baldonado 2000]).
An example of this arrangement would be the use of a number of single view, single facet
tools in combination to visualise a single facet from multiple views. For example, the Dali
and jRMTool tools could be used in combination to provide two independent views of
structural or behavioural information.
2.4.2.3 Multiple interdependent views illustrating a single facet
This arrangement also illustrates a single facet of the software system using multiple views.
In this case, the interdependency between views alleviates many of the cognitive difficulties
inherent in the previous arrangement (cf. Baldonado et al.’s ‘Rule of Self-Evidence’ and
‘Rule of Consistency’ [Baldonado 2000]). Such interdependent arrangements are typically
implemented using a Model-View-Controller architecture [Krasner 1988] to maintain
synchronisation between the views and with the model. Scene [Koskimies 1996],
Architecture-Oriented Visualization [Sefika 1996a], ISVis [Jerding 1997], Sced [Koskimies
1998], Ovation [De Pauw 1998], AVID [Walker 1998], Gaudi [Richner 1999], Jinsight [De
Pauw 2002], and Collaboration Browser [Richner 2002a] implement this arrangement. The
facet in these tools illustrates behavioural information.
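The synchronisation such tools require can be sketched with a minimal Model-View-Controller arrangement (the class and method names are hypothetical, and this is not the architecture of any of the tools cited): when the model changes, every registered view is notified, so interdependent views never drift apart.

```python
class Model:
    """Holds the software model; notifies all registered views on change."""
    def __init__(self):
        self._observers = []
        self.selected = None

    def attach(self, view):
        self._observers.append(view)

    def select(self, element):
        self.selected = element
        for view in self._observers:
            view.update(element)      # keep every view synchronised

class View:
    def __init__(self, name):
        self.name = name
        self.showing = None

    def update(self, element):
        self.showing = element        # redraw this view around the new selection

model = Model()
class_view = View("class diagram")
sequence_view = View("sequence diagram")
model.attach(class_view)
model.attach(sequence_view)

model.select("Order.process()")       # a selection made in one view...
print(class_view.showing)             # ...is reflected in the class view
print(sequence_view.showing)          # ...and in the sequence view
```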
2.4.2.4 A single view illustrating multiple facets
This arrangement presents multiple facets of the software system in a single view. Multiple
facets present more information to the analyst, which may help with comprehension of the
software. However, compressing all the information into a single view can lead to
information overload and reduced comprehensibility (cf. Baldonado et al.’s ‘Rule of
Decomposition’ [Baldonado 2000]). The implementation of story board diagrams described
by Jahnke et al. [Jahnke 2002] is an example of this arrangement. The tool they describe
illustrates structural, behavioural, and data facets.
2.4.2.5 Multiple independent views illustrating multiple facets
This arrangement uses multiple views to illustrate multiple facets of the software system,
with no interaction between the views. While this arrangement combines the benefits of
multiple facets and multiple views, the lack of relationships between the views can cause
difficulties in comprehension as described in Section 2.4.2.2. An example of this
arrangement would be the use of a number of single view, single facet tools in combination
to visualise multiple facets from multiple views. For example, the Dali and jRMTool tools
described above could be used in combination to provide two independent views of
structural and behavioural information.
2.4.2.6 Multiple interdependent views illustrating multiple facets
This arrangement also uses multiple views to illustrate multiple facets, with the addition of
interrelationships between the views. This arrangement has the same advantages as the
previous one, but the interdependency between views aids comprehension as described in
Section 2.4.2.3. This arrangement is employed in Kruchten’s 4+1 View Model [Kruchten
1995] and in work by Hofmeister et al. [Hofmeister 1999b] to illustrate structural,
behavioural, and (in Kruchten’s work) data facets. The Program Explorer [Lange 1995b]
and Shimba [Systä 2001] visualisation tools and the Together CASE tool [Borland 2004a]
implement this arrangement. These tools illustrate both structural and behavioural facets.
It appears from the foregoing discussion that an arrangement of multiple interdependent
views illustrating multiple facets of a software system is the most desirable arrangement of
views for software comprehension. Multiple views give a variety of different perspectives on
various facets of the software, while the interdependency between the views aids cognition.
Such an arrangement would allow software to be described conveniently using a set of
diagrams illustrating relevant information at appropriate levels of abstraction. The use of
multiple views in visualisation is discussed in more detail by Baldonado et al. [Baldonado
2000].
2.5 Effective techniques for exploring and querying visualisations
A crucial factor in the usefulness of a visualisation system is the ease with which the analyst
can interact with the visualisation to obtain the information they require. In this thesis the
two principal types of navigation technique observed in the extant visualisation tools are
classified as exploration and querying.
2.5.1 Exploration
A system employing the exploration technique presents the visualisation to the analyst and
allows them to explore it freely. Although giving the analyst complete freedom to explore
the visualisation, the large volume of information typically generated can make it difficult to
find the cogent information required for the analyst’s tasks. The complexities inherent in the
object-oriented paradigm compound this issue. Tools such as ISVis and Together utilise the
exploration technique.
2.5.2 Querying
A system employing the querying technique allows the analyst to specify queries to be
applied to the visualisation and then view the results. Queries can be specified in a textual or
visual query language, such as SQL [ANSI 1998] or MURAL [Reiss 2002] respectively, or
using a GUI. This approach can help the analyst to focus the visualisation on the information
pertaining to their specific tasks. However, the analyst must know enough about the system
to be able to form useful queries. Tools such as Gaudi, Collaboration Browser, and BLOOM
utilise the querying technique. Gaudi uses a textual query language, Collaboration Browser
uses a GUI, and BLOOM uses a visual query language.
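A textual query over extracted trace data might look like the following sketch (the table layout and data are invented for illustration; none of the cited tools necessarily stores events this way):

```python
import sqlite3

# Hypothetical table of dynamically extracted call events.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (caller TEXT, callee TEXT, count INTEGER)")
db.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("OrderForm", "Order", 12),
    ("Order", "Invoice", 3),
    ("Order", "Logger", 250),
])

# Focus the visualisation on heavily exercised interactions only.
rows = db.execute(
    "SELECT caller, callee FROM calls WHERE count > 100 ORDER BY count DESC"
).fetchall()
print(rows)   # [('Order', 'Logger')]
```

The query narrows the visualisation to the information relevant to the analyst's current task, which is precisely the focusing benefit of the querying technique, but it also demonstrates the prerequisite noted above: the analyst must already suspect that high-frequency calls are of interest.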
2.5.3 Guided navigation
There is a third possibility, not observed in the extant visualisation tools, that in
this thesis is termed guided navigation. A system employing guided navigation would assist
the analyst in achieving their goals by suggesting likely lines of enquiry. A wizard-based
approach may be suitable for this technique.
In practice, some systems employ a combination of the exploration and querying techniques.
Such an arrangement combines the flexibility and scope of exploration with the focussing
power of the querying technique. Guided navigation, possibly using wizards, is an interesting
and complementary alternative.
2.6 Software modelling
This section discusses related work from the field of software modelling. Software modelling
is closely related to software visualisation: the goal of both approaches is to produce a
representation of a software system. In the case of software visualisation, such
representations are visual, whereas in software modelling they may be purely conceptual.
2.6.1 The 4+1 view model
Kruchten describes an architectural model consisting of four views and a set of scenarios for
validating these views [Kruchten 1995]. The logical view describes the object structure of
the system. Representations include Rational/Booch class diagrams or (for data-driven
systems) entity-relationship diagrams (ERDs). The process view describes how the logical
entities of the system are delineated into processing units. Kruchten uses a version of
Booch’s Ada task notation for this view. The development view describes the organisation of
the system’s development into a hierarchy of layers of subsystems. Again, a variation on
Booch’s notation is employed. The physical view describes the deployment of the system
amongst processing nodes. It appears that some form of Booch notation is used for this view.
The views are illustrated by examples. Scenarios are detailed in a similar manner to the
logical view, and are accompanied by a script that describes the interaction. There are
interconnections amongst the views, for example between the logical and process views and
the logical and development views. An iterative, scenario-driven approach is used to develop
and refine architectural specifications using the technique.
2.6.2 Hofmeister et al.
Hofmeister et al. describe an approach to documenting software architecture using four views
consisting of UML diagrams [Hofmeister 1999b]. The views are based on the authors’
experiences with large systems. The conceptual view describes the functionality of the
system. The module view describes the decomposition of the software. The execution view
describes the correspondence between modules and run-time concepts, such as threads. The
code view describes the mapping of logical entities to program files.
The conceptual architecture view consists of components with ports, and connectors with
roles that define how they can connect to ports. A combination of components and ports is
termed a configuration. The conceptual view uses: class diagrams to depict the static
configuration; ROOM (Real-time Object-Oriented Modelling) protocol declarations [Selic
1994, Selic 1998] and sequence or state diagrams to show the correspondence between
protocols and ports; and sequence diagrams to illustrate sequences of interactions among
components.
The module architecture view decomposes subsystems into modules, and assigns them to
layers. There is no configuration in the module view as it describes inherent properties of the
system, rather than a particular instantiation. The module view uses: tables to map elements
between the conceptual and module views; package diagrams to illustrate subsystem
decomposition dependencies, use-dependencies among layers, and the mapping between
modules and layers; and class diagrams to illustrate inter-module use dependencies.
The execution architecture view describes the combination of modules to form a particular
product by assigning them to run-time images. Run-time images are bound to
communication paths to form a configuration. The execution view uses: class diagrams to
illustrate the static configuration; sequence diagrams to illustrate the dynamic behaviour of a
configuration, or transition between configurations; and state or sequence diagrams to
illustrate a communication path’s protocol.
The code architecture view consists of files and directories. As in the module view, there is
no configuration, as the relations described apply to all instantiations of the system. The code
architecture view uses: tables to map between elements in the module and execution views
and the code view; and component diagrams to illustrate the dependencies between source,
object, and executable files.
The views are demonstrated using an example. One concern the authors note is that UML
notation can be susceptible to a number of different interpretations due to the lack of a well-
defined semantics. They also note that the use of UML, which is traditionally used to design
implementation classes, to describe architecture risks blurring the distinction between
architecture and implementation. They found that UML was useful for describing static
structure, variability, and particular sequences of activities. They found it to be lacking in
describing correspondences, protocols, ports on components, dynamics, and general
sequences of activities. Although entities can be contained by other entities, the views are
only at one level of abstraction.
2.6.3 ManSART
ManSART (MITRE Software Architecture Recovery Tool) recovers architectural features
from source code by means of a library of ‘recognizers’ [Chase 1996]. ManSART makes use
of ‘operators’ to extract information from views. These operators manipulate views through
a combination of graph operations and ‘containment analysis’ (determining when an element
of a view overlaps or contains another element). Manipulations can be syntactic, derived, or
transformational. Syntactic manipulations provide an alternative representation of the
information. Derived manipulations create new views based on combinations of components
and connectors in existing views. Transformational manipulations produce a new
architecture from the information.
Yeh et al. describe the six types of view manipulation possible in ManSART [Yeh 1997]. A
cross-view relation combines the components of one view with the relationships from
another view. Merging combines the information from two views into a single view.
Bundling aggregates relations between components. Building a hierarchy allows the user to
create a hierarchy of views where each component in a view is described in terms of the
components it contains from a lower level view. Finding neighbours creates a subset of a
view by including only the selected component and its immediate neighbours. Finding
connected subsets decomposes a view into connected subgraphs, where each subgraph
becomes a component in the new view.
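The 'finding neighbours' manipulation, for example, can be sketched as a simple graph operation (an illustrative reconstruction, not ManSART's implementation; a view is modelled here as a pair of components and relations):

```python
def find_neighbours(view, selected):
    """Return the sub-view containing only the selected component, its
    immediate neighbours, and the relations among that retained subset."""
    components, relations = view
    kept = ({selected}
            | {b for a, b in relations if a == selected}
            | {a for a, b in relations if b == selected})
    kept_relations = {(a, b) for a, b in relations if a in kept and b in kept}
    return kept, kept_relations

# A small hypothetical architectural view.
view = ({"ui", "parser", "db", "logger"},
        {("ui", "parser"), ("parser", "db"), ("db", "logger")})

sub_components, sub_relations = find_neighbours(view, "parser")
print(sorted(sub_components))   # ['db', 'parser', 'ui']
print(sorted(sub_relations))    # [('parser', 'db'), ('ui', 'parser')]
```

Note that the relation ("db", "logger") is dropped because "logger" is not an immediate neighbour of the selected component.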
2.6.4 Zachman framework for enterprise architecture
Zachman’s framework for enterprise architecture consists of thirty views arranged in six
hierarchies [Zachman 1996]. The framework is intended to be a logical structure for
categorising representations of an enterprise that are significant to its management and the
development of its systems. The six hierarchies describe product abstractions for an
enterprise: Data (what it is made of), Function (how it works), Network (where the
components are located), People (who does what work), Time (when things happen), and
Motivation (why choices are made). Each hierarchy consists of five levels which define roles
in the design process: Planner (concerned with system scope, at a contextual level), Owner
(concerned with the business model, at a conceptual level), Designer (concerned with the
system model, at a logical level), Builder (concerned with the technology model, at a
physical level), and Subcontractor (concerned with detailed representation, out of context).
The intersection of a row and a column represents the interaction of a design role with a product
abstraction. For example, the intersection of the Function hierarchy with the Builder role
corresponds to the System Design, which illustrates data elements and sets (the I/O) between
computer functions (the processes).
2.6.5 IEEE Recommended Practice for Architectural Description
IEEE P1471 Draft Recommended Practice for Architectural Description defines a
conceptual framework for describing software architectures [Hilliard 1999]. The central
abstraction of this standard is the concept of an Architectural Description – this is a
collection of artefacts that document the architecture of a System. An Architectural
Description is organised into one or more architectural Views. Each View conforms to a
Viewpoint, which embodies the rules governing the view. A View consists of one or more
architectural Models – the representational scheme (e.g. UML) of the model is not defined
by the standard, and each model may use a different scheme. A System exists in an
Environment and has an Architecture and one or more Stakeholders. Each Stakeholder has
one or more Concerns – aspects of the system that are important to them (e.g. security,
reliability).
2.6.6 Other approaches
Issarny et al. [Issarny 1998] propose four views for software architecture: Functional
showing the operations provided by the system; Interaction showing the protocols used in
interacting with the system; Efficiency showing how efficiency could be improved; and
Dependability showing fault tolerance mechanisms. In combining these views, they
concentrate on the issues of consistency and structure. They list some considerations to be
taken into account to ensure the former. The latter is stated as being ongoing work – they are
developing a tool to produce CORBA architectures from specific views.
The approach of Waters et al. [Waters 1999] does not specify pre-defined views (unlike
Kruchten), but integrates architectural views generated by other tools or from
documentation. This approach allows non-disjoint views to be combined (‘fused’) to check
for commonality, consistency, and to create compositions (present structural information
from multiple views as a single view).
Bergey et al.’s horseshoe model describes the architecture recovery process [Bergey 1999].
Changes to the system can be made at the ‘code structure’, ‘function’, or ‘architectural’
levels. There is no explicit mapping or interaction defined between the levels of the model.
2.7 Evaluation
Evaluation is a crucial component of any project. In order to assess the results of a project,
evaluation criteria are required by which it can be determined what has been achieved.
Without evaluation, it is impossible to state whether the project was successful or not, or to
draw any conclusions from the results. Before embarking on a project the criteria by which it
will be evaluated should be established. This provides a predetermined basis for assessing
the project on completion.
Empirical evaluation is evaluation consisting of experimentation, rather than purely
theoretical analysis [Basili 1996, Perry 2000]. Empirical studies are useful for testing
hypotheses, as they provide experimental data that can be analysed. Empirical studies are
often used in software engineering to perform more realistic evaluations than are possible
with purely theoretical methods.
This section will discuss a number of methods for the evaluation of software visualisation
and software comprehension tools. The techniques discussed are also more widely applicable
in other areas of software engineering and beyond.
2.7.1 Globus and Uselton (1995)
Globus and Uselton discuss the evaluation of scientific visualisation software [Globus 1995].
Although their analysis is concerned with modeling physical systems, such as the field of
computational fluid dynamics, there are important points that are relevant to software
visualisation. They observe that visualisation is becoming widely used, and that for a
visualisation to be useful it is important to be able to evaluate it. However, they also note that
there was a lack of evaluation in the visualisation community at their time of writing.
Firstly, they discuss the need for standardised test suites consisting of test data and a set of
tests of particular visualisation techniques and functions. In the case of software
visualisation, the test data would be provided by a representative sample of systems to which
the visualisation would be applied. The wider the range of the systems, the more generally
applicable and useful the validated visualisation is likely to be.
Secondly, they highlight the importance of the effect of error in visualisation. Errors should
be minimised and it is important to characterise (recognise and, where possible, quantify)
any error in order to provide an accurate visualisation. Although the potential for error is less
in software visualisation than in the visualisation of continuous physical systems, it is
important to recognise the possibility of inaccuracies in the visualisation (e.g. caused by an
inappropriate or misleading abstraction technique).
Finally, Globus and Uselton discuss the evaluation of visualisations using human subjects.
They argue that as the purpose of visualisation is to improve human insight into data,
humans are best suited to evaluate the performance of a visualisation in achieving this.
Although insight cannot be measured directly, task performance can be used as an indicator of
it. They state that it is easier to perform experiments that compare two visualisation
systems than experiments intending to evaluate or characterise a single system. Given the
experimental results, it may be possible to predict performance in related tasks, and to
predict the effect of making changes to the visualisation system. As with all subject-based
empirical evaluation, the choice of subjects is crucial and must be representative of the
intended user base of the visualisation.
2.7.2 Murphy et al. (1996)
Murphy et al. describe an evaluation of five static call graph extractors [Murphy 1996a,
Murphy 1998]. The tools analysed were cflow (a standard Unix tool), CIA [Chen 1990],
Field [Reiss 1995], mkfunctmap [Hoagland 1995], and rigiparse [Müller 1988]. These tools
were chosen as they are readily available, extract calls from C code in textual form, and all
run on the same platform (SunOS on a Sun SPARC). The aim of the evaluation was to
compare both quantitatively and qualitatively the call graphs produced by the tools. Three C
systems were used for the case study: mapmaker (a molecular biology application) [Lincoln
1993], mosaic (a web browser) [NCSA 2003], and gcc (the GNU C compiler) [GNU 2004];
these were intended to represent a variety of application domains.
Call graphs were generated for each of the three applications by each tool. The results were
then compared quantitatively (pairwise) to determine the number of calls detected by both
tools, and by one tool but not the other. Of course, a higher number of calls detected does not
make one tool better than another. The differences in calls detected are attributable to the
analysis algorithms employed by the tools (which are often not elucidated). The results were
also sampled and qualitatively analysed to assess the numbers of false positives (a call
detected where one does not exist) and false negatives (a call not detected where one does
exist) in the results. It was determined that all of the tools generate both false positives and
false negatives. It appears that the study was conducted by the authors.
The use of a number of different application types makes both the experimental procedure
and results more generally applicable than if only a single system, or type of system, had been
considered. The use of objective evaluation criteria (number of calls detected, number of
false positives, etc.) increases the validity of the study.
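The pairwise, quantitative comparison described above amounts to simple set operations over the extracted call edges. The sketch below illustrates the idea; the edge encoding and the sample data are hypothetical, not taken from the Murphy et al. study.

```java
import java.util.HashSet;
import java.util.Set;

public class CallGraphComparison {

    // Calls detected by both tools (set intersection).
    static Set<String> detectedByBoth(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        return both;
    }

    // Calls detected by the first tool but not the second (set difference).
    static Set<String> detectedByFirstOnly(Set<String> a, Set<String> b) {
        Set<String> onlyA = new HashSet<>(a);
        onlyA.removeAll(b);
        return onlyA;
    }

    public static void main(String[] args) {
        // Hypothetical call edges ("caller->callee") extracted by two tools.
        Set<String> toolA = Set.of("main->parse", "parse->lex", "main->emit");
        Set<String> toolB = Set.of("main->parse", "main->emit", "emit->write");

        System.out.println("Both: " + detectedByBoth(toolA, toolB).size());        // 2
        System.out.println("A only: " + detectedByFirstOnly(toolA, toolB).size()); // 1
        System.out.println("B only: " + detectedByFirstOnly(toolB, toolA).size()); // 1
    }
}
```

Note that, as the study found, neither difference set can be labelled as false positives without qualitative inspection: an edge reported by only one tool may be real (the other tool having missed it) or spurious.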
2.7.3 Bellay and Gall (1997)
Bellay and Gall describe an experiment to evaluate the capabilities of four reverse
engineering tools [Bellay 1997, Bellay 1998]. The tools analysed were Refine/C [Reasoning
1994], Imagix 4D [Imagix 2004], Rigi [Müller 1988], and SNiFF+ [Wind River 2003]. The
aim of the case study was to investigate the capabilities of the tools, and identify their
advantages and disadvantages in terms of applicability to embedded software, usability, and
extensibility. An industrial embedded train control system was used for the case study,
containing approximately 150 KLOC.
The assessment criteria were expressed as a checklist, which was formulated based on the
authors’ experience in applying the tools during the case study. The consequence of this is
that the checklist is likely to be biased towards the system and tools analysed in the case
study. The checklist was delineated into four categories: Analysis, Representation,
Editing/Browsing, and General Capabilities. The Analysis category is concerned with the
functionality and performance of the parser. The Representation section is concerned with
the features of the representation used and how quickly it is generated; for textual reports,
sorting is examined, while for graphical reports the view type and editing facilities are
examined. The Editing/Browsing category is concerned with text editor integration and
speed, and other user interface facilities, such as search and history functions. The General
Capabilities section covers support for multiple platforms and users, extensibility, and
storage, output, history, search, and help facilities.
The study found that the different tools are best suited to different usage contexts. The
general result is that tool performance and capabilities depend on the case study and
application domain as well as the purpose of the analysis. It appears that the evaluation was
carried out by (one of) the authors. It is not clear exactly what was done in terms of
analysing the software system in order to exercise the tools’ functionality.
2.7.4 Armstrong and Trudeau (1998)
Armstrong and Trudeau perform an evaluation to compare the functionalities of five
architectural extraction tools [Armstrong 1998]. The tools analysed were Rigi, Dali [Kazman
1999], PBS [Finnigan 1997], CIA, and SNiFF+. The aim of the evaluation was to assess the
extraction, classification, and visualisation features of the tools. Two C systems were used
for the case study: the CLIPS expert system tool [Riley 2003], and a small test program that
was intended to be problematic for the tools to parse.
Similar to the study by Bellay and Gall described in Section 2.7.3, the evaluation criteria
were in the form of a checklist. The criteria were devised based on the authors’ experiences
of using the tools. As in the Bellay and Gall study, this makes it likely that the assessment
criteria will be biased towards the particular tools and systems used for the case study, hence
making both the results and the assessment criteria themselves difficult to generalise.
The Extraction checklist assessed the functionality of the tools’ parsers, such as the
exclusion of library calls, the contents of C structs, and recursion. The Classification
evaluation did not have an explicit checklist associated with it, but was based on using the
tools to generate meaningful abstractions from the extracted data. The Visualization
checklist was concerned with issues such as the types of nodes and edges available, and the
navigation functionality.
Similar to the Bellay and Gall study, it was found that various features from the different
tools are useful but no one tool integrates all of the features that would be desired. Again, it
appears that the evaluation was carried out by (one of) the authors. It appears that general
exploration of the software systems was performed in order to exercise the tools’
functionality, though there is little detail about this.
2.7.5 Storey et al. (1996)
Storey et al. [Storey 1996a] describe the preparation and execution of an empirical study to
assess the usability of two interfaces to the Rigi reverse engineering tool [Tilley 1994,
Wong 1995]. The aim of the study was to compare the interfaces to each other and to
standard Unix command line tools (vi and grep). Three C game programs of similar
complexity but varying size (300-1700 LOC) were used in the evaluation. Storey et al.
evaluated the usability of the user interfaces by observing users completing a set of software
maintenance tasks followed by a questionnaire and an interview. The small tasks involved in
the Storey et al. study were intended to be typical of those performed by software
maintainers working towards a larger goal; a trade-off was necessary between experiment
time and task complexity. The tasks were divided into two groups of four tasks, ‘abstract’
and ‘concrete’, which were concerned with high- and low-level understanding, respectively.
The importance of experimental setup is stressed; a ‘dry run’ was conducted in advance,
which helped to refine the experiment. The subjects were given training in each of the
interfaces beforehand. In addition to the questionnaire and interview, the participants were
observed performing the tasks. Appropriate statistical tests were applied to the results. In
addition to useful results in terms of the relative usabilities of the tool interfaces, Storey et al.
also identify a number of improvements to the experiment, namely the need for a larger user
group, more tasks, longer time, and greater experimental control.
2.7.6 Sim and Storey (2000)
Sim and Storey describe a structured tool demonstration in which several reverse
engineering tools were evaluated using a common software system and set of analysis tasks
[Sim 2000a]. The aim was to provide a fair comparative demonstration of the capabilities of
the tools. They argue that tool evaluations in the literature tend to be ad hoc, tools are rarely
evaluated formally by users, and when they are evaluated it is for only a short time by people
unfamiliar with the tool. Such potential users often assess tools on superficial observations,
such as appearance or feature set, rather than factors such as ease of use or scalability. While
useful, the results of case study evaluations are often difficult to generalise. The structured
demonstration was intended to address some of these shortcomings in evaluation. The three
main contributions were the establishment of an evaluation benchmark for reverse
engineering tools, the combination of usability assessment with benchmarking, and the
development of a package of materials to facilitate future tool evaluations.
The six tools evaluated in the study were Lemma [von Mayrhauser 1999], PBS, Rigi, TkSee
[Singer 1997], VisualAge C++ [IBM 2004a], and Unix command-line utilities. Each tool
was used by a team of expert users, who were monitored by industrial observers in an
attempt to evaluate the usefulness of the tool in industrial software maintenance. The teams
were presented with two reverse engineering tasks (‘Documentation’ and ‘Evaluate the
structure of the application’) and three maintenance tasks (‘Modify the existing command
panel’, ‘Add a new method for specifying arcs’, and ‘Bug fix: loading library objects’) to be
performed on the xfig 3.2.1 utility [Xfig 2003], which consists of 50 KLOC of ANSI C. The
teams were then asked to present the results of their investigation. The documentation
generated was less than expected, and there were some differences of opinion between the
teams. A number of issues regarding the tools themselves were also uncovered.
The evaluation found that different tools are best suited to different tasks, and it would be
useful to combine features from a number of tools. Additionally, it is important to
understand what the tool will be used for and select an appropriate tool accordingly. It is also
important that the cost of introducing the tool can be justified. Certain users may be biased
towards certain types of tool based on past experience. The organisers noted that a pilot
evaluation would have been helpful in refining the experimental design, and is an important
phase in any experimental evaluation. They also state that more inter-tool evaluations would
have been interesting, more explicit instructions may also have been helpful, and more time
(longer than one day) may have been desirable. The networking and teambuilding fostered
by the collaborative demonstration was also noted. A future demonstration is planned based
on parsing tools, an area with which many of the teams had problems.
2.7.7 Sim et al. (2000)
Sim et al. also present a number of observations regarding maintenance tools based on
structured demonstration [Sim 2000b]. The aim was the same as that of the structured
demonstration by Sim and Storey discussed in Section 2.7.6: to compare the capabilities of
the tools. Three of the tools from the Sim and Storey evaluation (Rigi, PBS, and Unix
utilities) were supplemented by two tools from the Workshop on Algebraic and Graph-
Theoretic Approaches in Software Reengineering 2000 (GUPRO [GUPRO 2004] and
Bauhaus [Koschke 2003]). Remarks on the tools and their application covered parsing,
flexibility, quantity of extracted data, experience, and reasons for participation, while
remarks on the demonstration scenario addressed the issues of tool selection, educational
value, fairness, replication, and results scalability. A collaborative reengineering exercise is
planned in which tools will be combined to address tasks.
2.7.8 Storey et al. (2000)
Storey et al. describe a subject-based evaluation of how program understanding tools affect
users’ comprehension strategies [Storey 1997, Storey 2000]. Thirty subjects were observed
carrying out a number of software comprehension tasks using the Rigi, ShriMP [Storey
1996b], and SNiFF+ tools. The goals of the study were to: examine the factors affecting the
subjects’ choice of comprehension strategy; observe whether the tools enhanced the subjects’
preferred comprehension strategy; devise a framework to characterise comprehension tools;
and provide feedback for tool developers. A number of comprehension strategies have been
proposed in the literature, such as: bottom-up [Shneiderman 1980, Pennington 1987], top-
down [Brooks 1983, Soloway 1984], knowledge-based [Letovsky 1986], systematic
[Littman 1986, Soloway 1988], as-needed [Littman 1986, Soloway 1988], and integrated
[von Mayrhauser 1995].
Participants were assigned randomly to one of the three tools, and were asked to complete a
number of comprehension tasks relating to an implementation of the Monopoly game [Brady
1974]. Each two-hour participant session consisted of orientation, training, practice tasks,
formal tasks, a post-study questionnaire, and a post-study interview and debriefing. During
the orientation phase, the outline of the experiment was explained to the subjects. The
training phase was used to familiarise the subjects with the basic functionality of the tool
they were to use. A number of practice tasks were used to allow the subjects to acquaint
themselves with using the tool. The formal tasks were observed and videotaped, with the
subjects encouraged to think aloud as they worked through the tasks. The questionnaire
consisted of questions regarding the tools’ usabilities. Finally, the interview and debriefing
was intended to stimulate further thoughts from the subject that may not have been expressed
during the experiment.
Statistically significant results were obtained regarding the usabilities of the tools and the
extents to which they supported each subject’s comprehension strategy. In general, it was
found that the tools did enhance the subjects’ preferred comprehension strategies while
carrying out the tasks, though there were instances where users were hindered by the tools.
In this study, participants frequently browsed hierarchies of abstraction. Future work is
intended to study fewer, more experienced subjects with a broader task set over a greater
time period.
2.7.9 Bassil and Keller (2001)
Bassil and Keller describe a questionnaire-based evaluation of visualisation tools [Bassil
2001a, Bassil 2001b]. The questionnaire was available on the web and its location was
publicised via mailing lists, newsgroups, and email. A total of 107 responses were received, concerning
more than 40 tools. This wide user base may make the results more generalisable. The aim of
the study was to assess the functional, practical and cognitive aspects of visualisation tools
that users desire, and how these compare to the functionality available in the various tools.
The questionnaire was designed around a list of properties of software visualisation tools,
extracted from existing taxonomies. This should result in an objective questionnaire. The
questionnaire consisted of two sections: the first for all software visualisation tool users, and
the second for expert users. In the first section, participants were asked about their work
context, the software systems they visualise, functional and practical aspects of software
visualisation tools, and the tool they use. The functional aspects were assessed in terms of a
list of 34 functional properties, such as source code browsing, graph visualisation, zooming,
and program slicing. The second section asked technical questions about the software
visualisation tool used by the participants. Practical aspects were investigated in terms of a
list of 13 aspects, such as tool cost, availability of technical support, ease of use, and
portability. Most questions were closed, though there were fields to allow expanded answers
for some questions.
Statistical analyses were performed on the survey results to reveal trends in the responses.
Although the small number of participants compared to the number of tools makes results for
individual tools insignificant, some statistically significant correlations were identified. The
most interesting correlations in terms of the work presented here were as follows. There was
a positive correlation between software system size and desire to visualise it graphically, and
a negative correlation between size and a desire to jump straight to the source code. There
was also a higher correlation between source code visualisation and procedural systems than
object-oriented systems. Lastly, there were high correlations between analysing object-
oriented software and the desire for hierarchical representations, and between OO software
and the ability to navigate across hierarchies.
Areas for improvement were identified as finer-grained choices, more open questions, and
different surveys for different visualisation tool types. Future work is to include additional
statistical analyses (e.g. factor and cluster analyses), more targeted surveys, and industrial
integration.
2.7.10 Hatch et al. (2001)
Hatch et al. describe the strengths and weaknesses of four strategies for software
visualisation evaluation [Hatch 2001]. Guidelines and frameworks can be useful during the
initial formulation of a visualisation, but can also be used during evaluation if appropriate.
Care must be taken to avoid ‘self-measurement’ when a visualisation is constructed
according to a set of guidelines then evaluated against those same guidelines. Feature-based
evaluation frameworks are useful for assessing the features of a visualisation against a set of
questions. Care must be taken to select appropriate questions and question types. Also,
current frameworks often omit potential negative features of the visualisation. Scenarios and
walkthroughs allow a visualisation to be evaluated according to specific tasks, though it is
easy to show the visualisation in its best light while hiding undesirable features, and the
evaluation is influenced by the user and their biases. User and empirical studies can be a
valuable source of evidence for evaluation. However, these entail overheads such as
selecting and training subjects and analysing the results, and are also subject to user bias. It
may be possible to perform statistical analyses on the results, though care must be taken
when attempting to generalise any findings.
2.7.11 Knight (2001)
Knight discusses briefly some considerations to be taken into account when deciding
whether or not a visualisation is effective [Knight 2001]. This is expressed in the form of an
equation: effectiveness = suitability for task(s) + suitability of representation, metaphor, and
mapping based on the underlying data. It is important to take into account influences from
the domain for which the visualisation was designed, and also the dataset that it was intended
to visualise.
2.7.12 Kollmann et al. (2002)
Kollmann et al. evaluate four static UML-based reverse engineering tools [Kollmann 2002a].
The tools evaluated were Together [Borland 2004b], Rational Rose [IBM 2004b], Idea
[Kollmann 2002b], and Fujaba [Wikman 1998]. The aim of the case study was to compare
the class diagram generation facilities of the tools. The Java-based Mathaino legacy user
interface migration tool was the subject of the case study [Kapoor 2001].
The tools were assessed quantitatively by examining various properties of the class diagrams
they produced from the program code, such as the number of classes, types of associations,
multiplicities, and role names. The results were compared by performing model operations
using the BMO Toolkit [Koskinen 2001]. While basic diagram generation results were
broadly similar across the tool set, the research tools were able to handle more advanced
diagram concepts than the industrial tools, such as multiplicities, inverse associations, and
container resolution. It appears that the investigation was carried out by the authors, though
there is little information on the experimental procedure. The use of the BMO Toolkit to
compare the quantitative results should result in an objective comparison of the tools’
capabilities, provided the chosen measures provide an accurate reflection of the tools’
capabilities.
2.7.13 Conclusions
This chapter has provided a foundation for the thesis by discussing software visualisation
techniques; comparing the extant software visualisation tools; exploring the concept of
abstraction; presenting diagrams for visualisations, views to organise them, and techniques
for exploring and querying visualisations; reviewing related work from the field of software
modelling; and surveying evaluation techniques from the fields of software comprehension
and visualisation.
The basic software visualisation techniques presented demonstrate the requirement for
appropriate extraction, analysis, and presentation mechanisms for software visualisations. An
abstraction scale and assessment criteria were presented in order to compare the extant
software visualisation tools. The principal conclusion from this comparison was that the
current tools address only a single level of abstraction, or a very small range of levels, and
thus several of the existing tools would have to be employed in order to address the full
range of software comprehension tasks. This conclusion led on to a discussion of the concept
of abstraction and its application in software engineering and visualisation. A range of
diagram types observed in the extant tools and recent literature were then discussed, and it
was concluded that arranging such diagrams in an interrelated abstraction hierarchy would
increase their utility and aid comprehension, in line with the conclusions of the tool
comparison. This conclusion led on to an analysis of view arrangements for organising
visualisations; it was concluded that the most effective arrangement was that of multiple
interdependent views illustrating multiple facets. Techniques for exploring and querying
visualisations were discussed briefly; again, some of these techniques were observed in the
comparison of the extant visualisation tools. A discussion of relevant work from the related
field of software modelling was then presented; this discussion provided a useful perspective
from the point of view of the underlying model, in contrast to the externally observed
visualisation.
A variety of evaluation techniques from software comprehension and visualisation studies
were discussed. This survey consisted of both qualitative and quantitative studies, ranging
from small-scale studies conducted by the authors, through medium-scale studies consisting
of around 10-30 participants, to large-scale studies of over 100 participants. The evaluation
techniques included standard tests, checklists, specific tasks, interviews, observation,
questionnaires, and scenario walkthroughs. Murphy et al.’s use of a broad base of application
types makes both their experimental procedure and results more generalisable [Murphy
1996a, Murphy 1998]. Their use of objective evaluation criteria increases the accuracy of the
study. The use of a specific checklist, for example by Bellay and Gall [Bellay 1997, Bellay
1998] and Armstrong and Trudeau [Armstrong 1998], reduces the generalisability of their
results. Storey et al. [Storey 1996a, Storey 1997, Storey 2000], Sim and Storey [Sim 2000a],
and Sim et al. [Sim 2000b] analyse performance in typical software comprehension tasks as
a basis for tool evaluation, and advocate the use of multiple subjects for studies. The use of a
questionnaire, for example by Bassil and Keller, allows a broader base of participants [Bassil
2001a, Bassil 2001b]. The use of an automated technique to compare quantitative study
results by Kollmann et al. should improve the accuracy and objectivity of the analysis
[Kollmann 2002a].
The survey by Bassil and Keller reveals that as the size of software grows, analysts are more
likely to employ graphical visualisations, and also that analysts are less likely to go straight
to the source code [Bassil 2001a, Bassil 2001b]. This survey also shows that analysts are less
likely to examine source code visualisations for object-oriented software, which implies that
visualisations at a higher level of abstraction than source code are most useful in the context
of OO systems. The studies by Storey et al. [Storey 1997, Storey 2000] and Bassil and Keller
both describe users navigating hierarchies of abstraction.
This chapter has presented related work and compared the extant software visualisation
tools. In order to assess the capabilities of these tools comprehensively an empirical study is
required that will allow us to compare the real-world performance of these tools objectively.
Such a study will also highlight areas of improvement and potential for future work.
3 Initial Study
“Descriptions of visualisation systems rarely specify any particular task that they are intended to support.”
M. Petre, A. F. Blackwell, T. R. G. Green [Petre 1997]
3.1 Introduction
This section describes a study to evaluate a selection of the extant software visualisation
tools. The motivation for this work was the lack of use of software visualisation tools in
industry despite their apparent potential. Therefore, the tools were evaluated by assessing
their performance in a variety of software visualisation tasks. This is the approach taken in
the studies by Storey et al. [Storey 1996a, Storey 1997, Storey 2000], Sim and Storey [Sim
2000a], and Sim et al. [Sim 2000b] described in Section 2.7. The tasks take the form of
questions that an analyst would find it useful to be able to ask about a software system. The
questions are divided into two sets: general software comprehension questions consider the
entire system, and are typical of those that would be asked in a general software
comprehension effort; specific reverse engineering questions address only a part of the
system, and are typical of those asked while carrying out a specific reverse engineering task.
The goal of this study is to assess and compare the capabilities of the extant software
visualisation tools to determine where there is scope for improvement and hence future
research. The study was carried out by a single analyst who attempted to use the tools to
address the questions. Further detail is presented in the lab book in Appendix A.
The JHotDraw semantic drawing editor framework [Gamma 1998] was chosen for this case
study as it is a reasonably complex, real-life application framework typical of the type of
system that would be subject to software comprehension and reverse engineering efforts.
JHotDraw is also widely used as a case study in the literature.
The tools evaluated were Together diagrams, Jinsight, jRMTool, AVID, and Together
debugger. These tools were chosen as implementations capable of analysing Java programs
were available.4 All tasks were carried out on a minimally loaded AMD Athlon XP 2100+
machine with 512MB RAM running Windows 2000 Professional.
3.2 Generic questions
These generic questions can be reused for the evaluation of any type of software
comprehension tool in the context of any specific system. The general software
comprehension questions are immediately reusable, while the specific reverse engineering
questions can be instantiated within the context of the system being used for the evaluation.
3.2.1 General software comprehension questions
The following questions are intended to be typical of those asked during the course of a
software comprehension effort. Questions G1-G6 are inspired by the six ‘overall
understanding’ questions of Systä et al. [Systä 2001, p.378]. Questions G7 and G8 address
issues that are particularly relevant to framework reuse, while G9 is an important software
comprehension issue.
G1 What is the static structure of the software system?
G2 What interactions occur between objects at runtime?
G3 What is the high-level structure/architecture of the software system?
G4 How do the high-level components of the software system interact?
G5 What patterns of repeated behaviour occur at runtime?
G6 What is the load on each component of the software system at runtime?
G7 What design patterns are present in the software system's implementation?
G8 Where in the software system are the hotspots where additional functionality can be
added?
G9 What impact will a change made to the software system have on the rest of the software
system?
4 Although Dali is retargetable, a tool to provide dynamic information from Java programs in the format required by Dali was not available.
3.2.2 Specific reverse engineering questions
The following questions are intended to be typical of those asked during the course of a
specific reverse engineering effort. Questions S1, S2, and S6 are inspired by the ‘goal-driven
reverse engineering’ and ‘object/method behaviour’ questions of Systä et al. [Systä 2001,
p.378]. Questions S3, S4, and S5 address issues typically encountered in framework
comprehension [Kirk 2001] and are typical maintenance activities.
S1 What are the collaborations between the objects involved in an interaction?
S2 What is the control structure in an interaction?
S3 How can a problem solution be mapped onto the functionality provided by the software
system?
S4 Where is the functionality required to implement a solution located in the software
system?
S5 What alternative functionalities are available in the software system to implement a
solution?
S6 How does the state of an object change during an interaction?
3.3 Specific reverse engineering questions specified for JHotDraw
The system used for this case study was an orrery simulation consisting of 133 classes
constructed as a sample solution to a final year undergraduate software architecture
assignment at the University of Strathclyde, Glasgow. A JHotDraw drawing editor consists
of a drawing containing figures and connections between them, and a set of tools for creating
and manipulating the drawing elements. The orrery application is shown in Figure 3.1. The
source code for the application was available in the Orbit and CH.ifa.draw.*
packages. Javadoc documentation was available for the JHotDraw classes, but not for the
orrery extension. The coursework assignment worksheets provided some background to the
application functionality. The following questions instantiate the specific reverse engineering
questions for the JHotDraw domain.
Figure 3.1 The orrery application. The circles represent astronomical bodies, such as planets and
moons, coloured according to their diameter. A blue border around a planet represents atmosphere.
The satellite icons represent satellites. The directed arcs indicate gravitational attraction. The toolbar
on the left is used to select diagram objects, and to create planets, satellites (orbiting and non-
orbiting), atmosphere, and gravity.
J1 A common problem in JHotDraw applications is the display not being updated as desired
when a change is made to the model. For example, attempting to move a planet (represented
by an object of type Figure) in an orrery application may not be reflected in the display.
To understand this problem, it is necessary to investigate the redraw mechanism of
JHotDraw. The redraw mechanism is an interaction consisting of a sequence of object
collaborations.
(Answer: The correct sequence of method calls is Figure.willChange(),
Figure.invalidate(), Figure.changed(), then Figure.invalidate().)
J2 When a Figure object is moved or has its dimensions changed, there may be erratic
changes both to this Figure and to other Figure objects to which it is connected (by
ConnectionFigure objects). For example, an orrery application may represent three
planets (as Figure objects) A, B, and C, and gravity between them (as
ConnectionFigure objects), such that A is connected to B, B to C, and C to A. If the
Figure objects are connected such that moving one planet also moves those planets
connected to it, then moving A would cause C to move, which would in turn cause B to
move, which would then cause A to move, resulting in an infinite loop of Figure
movements. To understand this problem, it is necessary to investigate the way in which
JHotDraw deals with cyclic constraints such as this. Interactions in JHotDraw can use an
implicit or explicit control structure; the control structure used is important in solving
problems such as this.
(Answer: It is JHotDraw's use of an implicit invocation mechanism to enforce constraints
that causes this problem. This can be circumvented by the use of explicit invocation, as in
the Pert Chart example application provided with JHotDraw.)
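The infinite loop described above, and the effect of breaking it with an explicit termination condition, can be sketched as follows. ConnectedFigure and its guard flag are hypothetical; JHotDraw's actual implicit invocation mechanism differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cyclic-constraint problem from the answer above.
// ConnectedFigure and its 'moving' guard are hypothetical; JHotDraw's real
// implicit invocation mechanism differs in detail.
public class CyclicConstraints {

    public static class ConnectedFigure {
        final List<ConnectedFigure> connected = new ArrayList<>();
        boolean moving;   // guard flag: breaks the A -> B -> C -> A cycle
        int moves;

        // Implicit invocation: moving a figure also moves its neighbours.
        void moveBy(int dx, int dy) {
            if (moving) {
                return;   // without this guard the cycle never terminates
            }
            moving = true;
            moves++;
            for (ConnectedFigure f : connected) {
                f.moveBy(dx, dy);
            }
            moving = false;
        }
    }

    // Wires up the cyclic orrery from the text (A -> B -> C -> A) and
    // returns the total number of moves after moving A once.
    public static int totalMoves() {
        ConnectedFigure a = new ConnectedFigure();
        ConnectedFigure b = new ConnectedFigure();
        ConnectedFigure c = new ConnectedFigure();
        a.connected.add(b);
        b.connected.add(c);
        c.connected.add(a);
        a.moveBy(1, 0);
        return a.moves + b.moves + c.moves;
    }
}
```

Explicit invocation, as in the Pert Chart example, makes this termination condition visible in the application code rather than hiding it in the framework.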
J3 JHotDraw applications often require collision detection, so that action can be taken when
two figures 'collide' (i.e. overlap on the diagram). For example, in an orrery application, it
may be desirable to detect when a Figure representing an asteroid crosses the connection
(i.e. overlaps the ConnectionFigure) between two Figures representing a planet and
a satellite orbiting that planet respectively. To understand this problem, it is necessary to
investigate the mechanism by which JHotDraw determines the locations of Figures in a
drawing. Collision detection is not provided natively in JHotDraw; therefore, the solution to
the collision detection problem must be mapped onto the functionality available in
JHotDraw.
(Answer: JHotDraw uses the concept of a ‘display box’ to define the location of a Figure.
This can cause problems by constraining all Figures to be rectangles.)
J4 Question J3 describes how collision detection can be implemented in JHotDraw by testing
when two Figures’ display boxes overlap. For example, if the display box of a Figure
representing a planet in an orrery application overlaps with that of a Figure representing
an asteroid, then a collision would have occurred. In order to implement this solution, it is
necessary to investigate how a Figure’s display box can be obtained, and how display
boxes can be tested to determine whether they overlap. In order to implement the solution to
the collision detection issue described in Question J3, it is necessary to identify the location
of the required functionality in JHotDraw.
(Answer: Figure.displayBox() returns a Figure’s display box as an object of type
java.awt.Rectangle. The Rectangle.intersects(Rectangle) method can
then be used to test if two rectangles intersect.)
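A minimal sketch of this solution, assuming only that each figure's bounds are available as the java.awt.Rectangle returned by Figure.displayBox() (the collides helper itself is hypothetical, not part of JHotDraw):

```java
import java.awt.Rectangle;

// Sketch of the collision test from the answer above. The 'collides' helper
// is hypothetical; the Rectangle arguments stand in for the values returned
// by Figure.displayBox().
public class CollisionCheck {

    // Two figures collide when their display boxes overlap.
    public static boolean collides(Rectangle a, Rectangle b) {
        return a.intersects(b);
    }
}
```

For the orrery example, collides would be called with the display boxes of the asteroid's Figure and the ConnectionFigure joining the planet and its satellite.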
J5 When Figures in a diagram are moved or resized, they may also be resized or moved
unexpectedly. For example, when moving a planet (represented by a Figure) in an orrery
application the planet may appear larger or smaller than expected, or when resizing a planet
its position may change. To understand this problem, it is necessary to investigate the way in
which Figures are moved and resized in JHotDraw. JHotDraw provides a number of ways
of altering the position and/or dimensions of a Figure, and it is necessary to select the
appropriate functionality.
(Answer: Figure.displayBox(java.awt.Point, java.awt.Point) and
Figure.displayBox(Rectangle) allow both the position and dimensions of a
Figure to be changed in one operation. Figure.moveBy(int, int) can be used to
move a Figure without changing its dimensions.)
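The distinction between the two operations can be sketched as follows; MiniFigure is a hypothetical stand-in for a JHotDraw Figure that stores only its display box, and only the method signatures mirror those named above.

```java
import java.awt.Point;
import java.awt.Rectangle;

// Sketch contrasting the two ways of altering a figure's geometry.
// MiniFigure is hypothetical; only the method signatures mirror JHotDraw's.
public class MoveResize {

    public static class MiniFigure {
        public Rectangle box = new Rectangle(0, 0, 20, 20);

        // displayBox(Point, Point): sets position and dimensions in one step.
        public void displayBox(Point origin, Point corner) {
            box = new Rectangle(origin.x, origin.y,
                                corner.x - origin.x, corner.y - origin.y);
        }

        // moveBy(int, int): translates the figure; dimensions are unchanged.
        public void moveBy(int dx, int dy) {
            box.translate(dx, dy);
        }
    }
}
```

Selecting displayBox where moveBy is intended (or vice versa) produces exactly the unexpected resizing or repositioning described above.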
J6 When debugging a JHotDraw application, it may be important to examine the internal
state of objects in the diagram. For example, in an orrery application, a Figure object
representing a planet would contain a reference to the mass of the planet it represents. In
order to extract such information, it is necessary to investigate the way in which an object’s
state changes during the course of an execution.
3.4 Together diagrams
Together was successful in producing a model of the static structure of the system in the
form of a class diagram. Its statically derived interaction diagrams could be used to give an
approximation of the runtime behaviour of a single method. There is no functionality for
identifying high-level structural components or interactions, save for what can be determined
by the analyst from the class and interaction diagrams. Behavioural and design patterns are
not automatically identified. The lack of runtime information makes it impossible to measure
the load on system components. There is no way to identify hotspots automatically. Some
idea of change impact analysis can be obtained using the ‘Search for Usages’ function,
which identifies all code locations where an attribute, method, class, interface, or package is
used.
Together coped well with the specific reverse engineering questions J1-J5: it was able to
answer the questions on object collaboration, control structure, mapping, and functionality
identification. However, Together’s lack of dynamically-extracted information prevents it
from observing changes to the state of an object at runtime.
The strengths of Together were seen as:
• the comprehensiveness of its diagrams due to their generation from source code; and
• its ‘Search for Usages’ functionality.
Together’s principal weaknesses are attributable to its lack of dynamically-extracted
information:
• while the diagrams are broad in scope they lack depth;
• it is impossible to focus the diagrams on a particular part of the system’s execution;
• it is difficult to know which are the ‘interesting’ methods for which the analyst
should create sequence diagrams;
• sequence diagram generation can be time-consuming;
• references to (methods of) interfaces and abstract classes cannot be resolved to
objects, as the implementing/extending class cannot be determined statically;
• references to subtypes cannot always be fully resolved, as it is not possible to
determine statically whether an object is an instance of the supertype or of one of its
subtypes. For example, a reference in a statically derived sequence diagram may be
to an object of type Figure, which could resolve to an object of any subtype of
Figure (e.g. EllipseFigure) at runtime; and
• the inability to examine internal object state.
3.5 Jinsight
Jinsight was not able to give information on the static structure or high-level architecture of
the system. It provides an array of diagrams for examining dynamic behaviour, but cannot
display behavioural information for high-level components. The execution pattern view was
used to identify patterns of repeated behaviour. The execution view and object histogram can
be used to identify high-activity classes and methods. Jinsight does not support the
identification of design patterns or hotspots for extension. The method histogram and
invocation browser can be used in conjunction with the execution view to identify where
methods are used, which would be useful for change impact analysis.
Jinsight was able to answer questions on object collaboration and control structure. The size
of the diagrams made it difficult to identify how a solution could be mapped onto the
framework. The lack of a static view hindered the identification of framework functionality.
Jinsight does not support analysis of objects’ internal state.
The strengths of Jinsight were considered to be:
• a variety of dynamic views;
• accuracy of its diagrams due to dynamically-extracted information; and
• automatic behavioural pattern identification.
The weaknesses of Jinsight were seen as:
• difficulty in focussing the visualisation due to the size of the diagrams;
• lack of a static representation of the software system;
• lack of generality in its diagrams resulting from a lack of statically-extracted
information; and
• the inability to examine internal object state.
3.6 Reflexion models
Reflexion models are at too high a level of abstraction to show basic static structure or object
interactions. The architecture and high-level interactions were clearly shown in the reflexion
model. Only a very general, aggregated impression of patterns of repeated behaviour and
runtime load were evident in the reflexion model. The identification of design patterns and
extension hotspots were both below the level of abstraction provided by the reflexion model.
Change impact can be investigated by altering the input high-level model or the mapping
from source to high-level entities.
Reflexion models are at too high a level of abstraction to illustrate object collaborations,
control flow, alternative functionalities, or object state. They would be useful for mapping
problems at a higher level of abstraction.
The strengths of the reflexion model technique are:
• it illustrates the software system architecture;
• it illustrates the high-level interactions in the system; and
• it enables the analyst to validate their model of the system.
The weaknesses of reflexion models were felt to be:
• the reflexion model technique relies on the analyst to provide an adequately accurate
high-level model as input; and
• reflexion models are at too high a level of abstraction for them to answer specific
reverse engineering questions, such as those relating to object interactions or internal
state.
3.7 Together debugger
Although static information is not shown, dynamic information can be output by setting
breakpoints at ‘interesting’ methods or classes. High-level structural and behavioural
information is above the low level of abstraction provided by the debugger. There is no
functionality to detect repeated patterns of execution, or to show runtime component load.
Questions relating to design patterns and extension hotspots are at too high a level of
abstraction to be answered using a debugger. Basic change impact analysis can be performed
by comparing the output from executions before and after the change.
If breakpoints can be accurately placed at ‘interesting’ methods, questions about object
collaborations and control structure can be answered straightforwardly. The lack of a view of
the whole system makes mapping problems and identifying functionality difficult. The
dynamically extracted nature of the information means that alternative functionalities are not
always apparent, and the lack of full method signatures makes method identification
confusing. The debugger was able to query internal object state conveniently.
The strengths of the Together debugger are as follows:
• the low level of abstraction would be useful for finding code-level errors;
• dynamically extracted information gives precise output;
• integration with source code makes setting and monitoring breakpoints and watches
more convenient;
• diagram animation during debugging assists comprehension; and
• the ability to examine internal object state.
The weaknesses of the debugger were found to be:
• the low level of abstraction makes it impossible to answer many higher-level
questions, such as those relating to the system architecture;
• lack of statically extracted information means only a subset of possible behaviour is
shown;
• unlike some other debuggers, such as jdb [Sun 2002], the Together debugger
requires source code, which may not always be available, particularly for legacy
systems;
• it can be very time-consuming to set each breakpoint manually. To obtain
information comparable to that provided by a tracing tool, method breakpoints
would be required at every method in the system; and
• it is often difficult to know where to set breakpoints. Setting breakpoints for every
method would result in information overload.
3.8 Case study summary
A summary comparison of the five software visualisation tools evaluated in the case study is
given in Table 3.1 and Table 3.2. Table 3.1 evaluates the performance of each tool on the
question set; Table 3.2 assesses the performance of the tools on each question. These
comparisons assess each tool’s performance in each task simply as yes/no: if a tool
performed a task sufficiently well, it received a ‘yes’, otherwise a ‘no’.
Table 3.1 Tools summary comparison

Tool: Together diagrams. Extraction technique (Section 2.1.6): Static. Analysis technique (Section 2.1.7): Abstraction. Presentation technique (Section 2.1.8): UML diagrams. Abstraction level (Section 2.2.1.2): 2-3. GSC performance: 3/9 {G1, G2, G9}. SRE performance: 5/6 {J1, J2, J3, J4, J5}. Overall performance: 8/15 (53%).

Tool: Jinsight. Extraction technique: Dynamic (profiler). Analysis technique: Pattern recognition, abstraction, suspension5. Presentation technique: MSC-based. Abstraction level: 2-3. GSC performance: 4/9 {G2, G5, G6, G9}. SRE performance: 4/6 {J1, J2, J3, J5}. Overall performance: 8/15 (53%).

Tool: jRMTool. Extraction technique: Static. Analysis technique: Abstraction. Presentation technique: Graph-based. Abstraction level: 4. GSC performance: 3/9 {G3, G4, G9}. SRE performance: 0/6 {}. Overall performance: 3/15 (20%).

Tool: AVID. Extraction technique: Dynamic (profiler). Analysis technique: Abstraction, suspension. Presentation technique: Graph-based. Abstraction level: 4. GSC performance: 3/9 {G3, G4, G9}. SRE performance: 0/6 {}. Overall performance: 3/15 (20%).

Tool: Together debugger. Extraction technique: Dynamic (debugger). Analysis technique: Selective instrumentation, suspension. Presentation technique: Textual. Abstraction level: 1. GSC performance: 1/9 {G2}. SRE performance: 3/6 {J1, J2, J6}. Overall performance: 4/15 (27%).

5 ‘Suspension’ refers to the ability to suspend and resume tracing.
Table 3.2 Questions summary comparison
General software comprehension Specific reverse engineering
Question Success (/5) Question Success (/5)
G1 1 J1 3
G2 3 J2 3
G3 2 J3 2
G4 2 J4 1
G5 1 J5 2
G6 1 J6 1
G7 0
G8 0
G9 4
TOTAL 14/45 (31%) TOTAL 12/30 (40%)
OVERALL 26/75 (35%)
It is clear from Table 3.1 that Together diagrams and Jinsight were able to answer the most
questions (53%), whereas jRMTool and AVID could answer the fewest (20%). Comparing
tools of similar abstraction levels that use different extraction techniques indicates that the
choice of statically or dynamically extracted information does not affect significantly the
number of questions the tool can answer. This was surprising, although a larger case study
involving more tools would be required before any strong conclusions could be drawn from
this result. Table 3.1 also shows that the reflexion model technique is unsuitable for specific
reverse engineering questions whether statically or dynamically extracted information is
used. It would be interesting to assess in this way the performance of a tool that combines
both types of information, such as Shimba (see Section 2.2.10) [Systä 2001]. With an
abstraction level of 2-4, Shimba addresses a wider range of abstraction levels than any of the
tools in this case study. This range of abstraction levels, combined with the inclusion of both
statically and dynamically extracted information, should allow Shimba to perform well in
both the general software comprehension and specific reverse engineering questions. Shimba
would be expected to be useful in answering a higher proportion of questions than the tools
considered in this case study. Unfortunately, Shimba was not available for evaluation.
Table 3.1 reveals that an abstraction level of around 2-3 is optimal in terms of answering the
most questions. Moving away from this point, for specific reverse engineering questions, the
tools become less effective as their abstraction levels move towards the higher (macroscopic)
end of the scale, while for general software comprehension questions the opposite is true. As
expected, tools that employ abstraction as an analysis technique were able to answer more
general software comprehension questions than the tool that did not (Together debugger).
However, increasing the level of abstraction still further resulted in worse performance in
specific reverse engineering questions than if no abstraction were used. A larger case study
involving more tools is required before further conclusions can be drawn regarding the
effectiveness of the presentation techniques, analysis techniques (other than abstraction), or
dynamic extraction techniques.
Table 3.1 shows that tools employing solely behavioural information (Jinsight and Together
debugger) were not capable of answering questions relating to the structure of the software
system. This implies that a combination of structural and behavioural information is required
to address all tasks.
Table 3.2 shows that the tools were more successful in answering the specific reverse
engineering questions: 40% compared to 31%. It also shows that, on average, a tool could
answer only 35% of the questions. This suggests that a single software comprehension tool
may not be adequate for all tasks. Kazman and Carrière [Kazman 1999] posit that this is the
case for architectural extraction, and Richner and Ducasse [Richner 2002a] say this with
regard to design recovery. However, it may also suggest that tools require a combination of
both statically and dynamically extracted information to perform well in all tasks.
No tools were able to answer either of the general software comprehension questions G7
(What design patterns are present in the software system's implementation?) and G8 (Where
in the software system are the hotspots where additional functionality can be added?). Keller
et al. [Keller 1999] describe the role of the SPOOL environment in assisting an analyst in
locating three design patterns (Template Method [Gamma 1995 pp.325-330], Factory
Method [Gamma 1995 pp.107-116], and Bridge [Gamma 1995, pp.151-161]) in C++ code.
Tonella and Antoniol [Tonella 1999] describe a technique based on concept analysis [Siff
1997] and illustrate its use in identifying instances of the Adapter pattern [Gamma 1995,
pp.139-150]. The work by both Keller et al. and Tonella and Antoniol stresses the role of the
human analyst in identifying design patterns. Demeyer [Demeyer 1998] discusses hotspot
identification in Smalltalk HotDraw; the technique employed identified a large number of
false positives. Schauer et al. [Schauer 1999] describe the use of the SPOOL environment in
identifying hotspots in C++ code, and emphasise the importance of the human analyst.
However, Codenie et al. [Codenie 1997] contend that building applications by extending
framework hotspots is too simplistic an approach for real-world problems. These papers
reveal that detecting design patterns and hotspots is a non-trivial task, and one that can
benefit from tool support.
As discussed in Section 2.1.3, the complex interactions typical of object-oriented software
systems mean that dynamic analysis is often more appropriate than static analysis for
software comprehension tasks. However, dynamic analysis captures only a subset of the
possible behaviour of the program. The diagrams produced by dynamic analysis are narrow
and deep, while those produced by static analysis are wide and shallow. Figure 3.2 illustrates
this comparison. The fchild object is an attribute of initial. In practice, fchild is an
instance of one of three Tool subclasses: HandleTracker, DragTracker, or
SelectAreaTracker. This statically extracted diagram shows the three possible
outcomes of the user clicking on some part of a diagram: on either a handle, a figure, or a
blank space, respectively. The diagram cannot show messages that occur after message 1.8
(the call to mouseDown (e, x, y):void), as these depend on the type of fchild,
which is known only at runtime and hence cannot be determined statically. The diagram is
wide but shallow: it shows all of the possibilities. A dynamically extracted diagram would
show one possibility in more detail: it would be narrow but deep.
There are three possibilities for combining the benefits of both static and dynamic
information to produce a suitably wide and deep visualisation. Firstly, the analyst can ensure
that a representative trace is extracted. This raises the questions of how the analyst can
ensure that the trace is representative, and how he knows that it is representative enough for
the task at hand. Most dynamic analysis tools implicitly require the analyst to perform this
function. Secondly, a tool can combine multiple event traces into a single visualisation; this
approach is used in Dali and RMTool. Thirdly, statically and dynamically extracted
information can be combined, as in Shimba. The key problem of ensuring a representative
trace is inherent in dynamic visualisation, even when one of the latter two techniques is
employed.
Figure 3.2 The sequence diagram drawn by Together for the
CH.ifa.draw.standard.SelectionTool.mouseDown() method, illustrating the wide and
shallow diagrams produced by static analysis
It is clear from the case study results in Table 3.2 that no one software visualisation tool
answers all questions that are typical of a software comprehension or reverse engineering
effort. Some tasks are less well supported than others, and some tasks are beyond the
capabilities of all the tools. This implies that current software visualisation tools are not
adequate in isolation for supporting software comprehension, and must be employed along
with other software comprehension tools if all typical issues are to be addressed. The above
results also reveal that the application of software visualisation tools in combination can
improve comprehension performance. Tools employing higher levels of abstraction were
more successful in addressing general software comprehension questions, while those using
a lower level of abstraction were more useful for specific reverse engineering questions;
tools employing an abstraction level of 2-3 were most generally effective. The results
suggest that a combination of structural and behavioural information may be required to
address all comprehension tasks effectively. The results also suggest that a combination of
statically and dynamically extracted information may improve performance. The
visualisations generated from statically extracted data are more general but less precise than
those obtained from dynamically extracted data: statically extracted visualisations are wide
but shallow, while dynamically extracted visualisations are narrow but deep. The lack of a
single software visualisation tool that performs well in all tasks is likely a large contributory
factor in the lack of use of software visualisation tools outwith the context of research.
Analysts are evidently using alternative types of tool to obtain the information they require
for software comprehension.
3.9 Conclusions
The principal conclusion from this initial study is that the abstraction level of a tool is crucial
in determining which questions it can answer. It is clear that a range of abstraction levels
would be required to address the full range of comprehension questions from the study. It is
also observed that the use of statically and dynamically extracted information, and structural
and behavioural perspectives, allows different questions to be addressed. If all of the tools
were used in combination it should be possible to address almost all of the tasks. Therefore,
as the questions in this initial study were typical software comprehension tasks, a tool that
combines the desirable properties of these individual tools would be expected to perform
well in real-world software comprehension tasks.
4 A Novel Software Visualisation Model
“Model-based simulation is like a gem: it is multifacetted”
T I Ören [Ören 1984]
4.1 Background
The initial study to assess the capabilities of software visualisation tools found that no single
tool examined was capable of satisfying more than slightly over half of the typical software
comprehension and reverse engineering tasks set. However, if all five of the tools in the
study were used in combination, it should be possible to address 13 out of the 15 tasks.6
However, such an arrangement of multiple independent views would cause the analyst
cognitive difficulties in reconciling the multiple views and transferring information between
them, as described in Section 2.4.2. It is clear from these results that a tool combining the
desirable properties of the individual tools in the previous study would perform well in these
representative software comprehension tasks. It would therefore be reasonable to expect that
such a tool would be useful in real world software comprehension.
4.2 Research hypothesis
It is proposed that a model that supports visualisation of software through a range of
abstraction levels that incorporate structural and behavioural views and integrates statically
and dynamically extracted information will provide effective support for the full range of
software comprehension tasks.
6 The remaining two tasks involved automatic framework hotspot and design pattern detection. Though they are amenable to visualisation, these are non-trivial tasks that require a high level of analyst interaction.
4.3 A visualisation model for object-oriented software
In order to combine the benefits of these alternative approaches to extraction, analysis,
presentation, and abstraction, this thesis proposes a multifaceted, three-dimensional
abstraction model for software visualisation. Similar to the abstraction scale proposed in
Section 2.2.1.2, the first dimension of the model consists of a number of abstraction levels
from microscopic to macroscopic. This arrangement allows the analyst to explore the
software system at the level(s) of abstraction appropriate to the comprehension task they are
undertaking. The second dimension of the model consists of a number of facets [Jahnke
2002], each representing some property of the system. The use of interrelated facets allows
the analyst to examine a property of the software system individually or in combination,
allowing them to focus the visualisation on the information appropriate to their query.
The model shown in Table 4.1 is proposed. The principal challenges associated with this
model are the way in which information extracted from the software system will be
represented, how view hierarchies will be generated from this information, and the definition
of inter- and intra-hierarchy relationships between views. It will also be important to identify
which views are useful for a variety of comprehension tasks.
Five levels of abstraction are chosen to represent OO systems; the program code can be
considered to be at level 0 as it is the least abstract representation of the software. Structure,
behaviour, and data have been selected as the three facets, as these are the principal elements
of typical OO systems. Classes, packages, and files provide structural abstractions;
procedures, functions, and – in OO systems – methods and interfaces provide behavioural
abstractions; and abstract data types provide data abstractions. (Jahnke et al. [Jahnke 2002]
also use these three facets.) Each abstraction level of each facet is a view and consists of a
name, a description, a set of entities and relationships, and example diagram types that can
be used to illustrate information from the facet at the specific level of abstraction. It is
intended that the analyst will be able to move conveniently between these views during the
course of their investigation in order to examine the information relevant to their task. The
views selected are intended to represent the information that an analyst would find useful
during software comprehension.
Diagrams (with the exception of storyboard diagrams7) that can illustrate information in
more than one facet appear at the same level of abstraction in each facet of the model,
though this is not a requirement. Each facet need not have the same number of abstraction
levels. There are no diagrams at level 5 or 1 of the structure facet. This is because the system
structure is not relevant at a business level (only behaviour and the data it operates on are
specified), and the internal structure of classes is not relevant or visible outside the class.
There are no diagrams at level 4 of the behaviour facet. This is because the behaviour
distribution is dictated by, and therefore encapsulated in, the structure distribution. There are
no diagrams at level 3 of the data facet. Abstract data types are typically described using
textual descriptions, algorithm pseudocode, and specific pictorial representations.
Robustness analysis diagrams bridge between levels 5 and 2: they relate business entities to
classes. System context diagrams bridge between levels 5 and 3: they relate business entities
to components.
Table 4.1 The proposed visualisation model for object-oriented software

Level 5 (macroscopic)
Structure: Business structure. The structure defined by the high-level business goals of the system. Entities: {}. Relationships: {}. Diagrams: {}.
Behaviour: Business behaviour. The behaviour defined by the high-level business goals of the system. Entities: {BusinessEntity}. Relationships: {BusinessRule}. Diagrams: {Use case diagram, business process diagram, robustness analysis diagram}.
Data: Business data. The data defined by the high-level business goals of the system. Entities: {BusinessEntity}. Relationships: {DataDependency}. Diagrams: {Entity relationship diagram, XML structure diagram, robustness analysis diagram}.

Level 4
Structure: System structure deployment. The structural deployment of the system. Entities: {Component, Machine}. Relationships: {Dependency, Containment}. Diagrams: {Deployment diagram}.
Behaviour: System behaviour distribution. The behavioural distribution of the system. Entities: {}. Relationships: {}. Diagrams: {}.
Data: Data distribution. The distribution of the system’s data. Entities: {DataObject, Machine}. Relationships: {Dependency, Containment}. Diagrams: {Deployment diagram}.

Level 3
Structure: System architecture. The structural relationships between the system’s high-level components. Entities: {Component}. Relationships: {Dependency}. Diagrams: {Component diagram, system context diagram, system architecture diagram, reflexion model, story board diagram}.
Behaviour: Component interaction. The behavioural relationships between the system’s high-level components. Entities: {Component}. Relationships: {Usage}. Diagrams: {System context diagram, system architecture diagram, reflexion model, [Martin 2002]}.
Data: Abstract data types. The abstract data types used to encapsulate the system’s data. Entities: {}. Relationships: {}. Diagrams: {}.

Level 2
Structure: Inter-class structure. The structural relationships between the system’s classes. Entities: {Class}. Relationships: {Inheritance, Implementation, Aggregation, Composition}. Diagrams: {Class diagram, object diagram, basic graph}.
Behaviour: Inter-object interaction. The behavioural relationships between the system’s objects. Entities: {Object}. Relationships: {Invocation}. Diagrams: {Class diagram, object diagram, sequence diagram, collaboration diagram, event sheet diagram, message sequence chart, execution pattern}.
Data: Physical implementation. The classes used in the physical implementation of the system’s data structures. Entities: {Class}. Relationships: {Inheritance, Implementation, Aggregation, Composition}. Diagrams: {Class diagram, object diagram}.

Level 1 (microscopic)
Structure: Intra-object structure. The internal structure of the system’s objects. Entities: {}. Relationships: {}. Diagrams: {}.
Behaviour: Intra-object interaction. The internal behaviour of the system’s objects. Entities: {State}. Relationships: {Action}. Diagrams: {Statechart diagram, activity diagram, story board diagram}.
Data: Primitives. The primitive data objects used in the system. Entities: {Primitive}. Relationships: {Operator}. Diagrams: {Statechart diagram, activity diagram, story board diagram}.

Level 0
Structure, Behaviour, Data: Program code.

7 This is because SBDs illustrate data and behaviour at a low level, but are contained within a package. The SBD implementation considered in this report is that described by Jahnke et al. [Jahnke 2002].
The third dimension of the abstraction model consists of static and/or dynamic analyses of
the software. As discussed in Section 2.1, static analyses have broad coverage but less detail,
while dynamic analyses are more focussed and more detailed. In this three-dimensional
model, the width of static analysis can be combined with the depth of dynamic analysis,
without their attendant disadvantages. This is achieved by the combination of analyses.
Combining a single static analysis with several dynamic analyses results in a visualisation
that is both detailed and broad in its coverage. A combination of multiple dynamic analyses
(without static analysis) could also be used to achieve this to an extent. The principal
challenge with respect to this aspect of the model is the way in which statically and
dynamically extracted information is combined and presented.
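As a concrete illustration, the three dimensions of the model can be represented as a simple data structure. This is a sketch only, under the assumption that a view is identified by its facet, abstraction level, and originating analysis; the class and method names are hypothetical and are not part of the tool described later in this thesis.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: VisualisationModel, View, and combined() are
// hypothetical names, not taken from the thesis's tool implementation.
public class VisualisationModel {

    public enum Facet { STRUCTURE, BEHAVIOUR, DATA }
    public enum Analysis { STATIC, DYNAMIC }

    // A view is one facet of the system at one abstraction level (1-5),
    // produced by one analysis.
    public static class View {
        public final Facet facet;
        public final int level;
        public final Analysis analysis;

        public View(Facet facet, int level, Analysis analysis) {
            this.facet = facet;
            this.level = level;
            this.analysis = analysis;
        }
    }

    private final List<View> views = new ArrayList<>();

    public void add(View v) { views.add(v); }

    // Selecting every view of one facet at one level gathers the broad
    // static view together with the deep dynamic views: the combination
    // of analyses described in the text.
    public List<View> combined(Facet f, int level) {
        List<View> result = new ArrayList<>();
        for (View v : views) {
            if (v.facet == f && v.level == level) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Combining one static analysis with several dynamic analyses of the same facet and level then yields a visualisation that is both broad and detailed, as argued above.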
The multifaceted, three-dimensional abstraction model is illustrated in Figure 4.1. This
model is an example of the ‘multiple interdependent views illustrating multiple facets’
arrangement described in Section 2.4.2.6. The model will be refined in the following chapter.
The interrelationships between facets and analyses are not shown explicitly; these will be
defined as part of this formalisation process.
[Figure: a three-dimensional grid of views indexed by facet (Structure, Behaviour, Data), abstraction level (1–5), and analysis (Static, Dynamic 1, Dynamic 2).]
Figure 4.1 A multifaceted, three-dimensional abstraction model for software visualisation
4.4 Examples
Examples of the structure, behaviour, and data abstraction hierarchies are given in Figures
4.2 – 4.4 respectively. It would be possible to synthesise a fourth facet that combines the
three existing facets and represents the integration of structure, behaviour and data in a single
hierarchy. However, as discussed in Section 2.4.2.4, this can lead to information overload
and reduced comprehensibility. It would instead be more useful to allow the analyst to define
their own views by combining information from the three existing facets. These figures serve
only as examples of the type of information represented by each level and each facet; there is
no abstraction relationship between the diagrams at the various levels of these example
figures. Each of the diagrams shown is only one example of the possible diagram types that
could be used to illustrate the information from each view. For example, activity diagrams
are given as an example diagram for level 2 structural information in Figure 4.2; these may
also be used at higher levels of abstraction, e.g. to refine use cases, to describe data
processing within an information system or organisation, or to specify an algorithm. The
empty levels in each figure correspond to the as yet undefined levels of the model.
[Figure: the structure abstraction hierarchy illustrated at levels 0–4. Level 0: a code sample for class MyFigure, which extends AbstractFigure and implements Cloneable. Level 2: a class diagram relating MyFigure, AbstractFigure, and Cloneable. Level 3: a component diagram of the packages CH.ifa.draw.standard, com.michael.myapp, and com.michael.tracer. Level 4: a deployment diagram of the nodes debugger:PC and debuggee:Sun communicating via <<JDWP>> com.sun.jdi.]
Figure 4.2 An example of the structure abstraction hierarchy
[Figure: the behaviour abstraction hierarchy illustrated at levels 0–5. Level 0: a code sample for myMethod(int a, int b). Level 1: an activity diagram of its control flow, branching on [a==b]. Level 2: a sequence diagram of objectA, objectB, and objectC exchanging doSomething(int), ObjectC(), and doMore(char). Level 3: a reflexion model relating DrawingView, Drawing, Figure, and Tool. Level 5: a use case diagram with use cases ‘Start new project’, ‘Find developer’, ‘Assign developer to project’, and ‘Assign task to developer’ linked by <<uses>> relationships.]
Figure 4.3 An example of the behaviour abstraction hierarchy
[Figure: the data abstraction hierarchy illustrated at levels 0–5. Level 0: a code sample for class ResearchStudent, which extends Person. Level 1: a statechart diagram of its Registered, Researching, and Writing Up states, with the submitThesis() transition. Level 2: a class diagram relating Person, Name, ResearchStudent, Staff, and java.util.Hashtable. Level 4: a deployment diagram of a PC client, uniServer:Sun, staffServer:Vax, and resStudentServer:Vax communicating via <<JDBC>> and <<TCP/IP>>. Level 5: an entity relationship diagram relating ResStudent, Staff, and Department via the SUPERVISION, WORKS_IN, and RESEARCHES_IN relationships.]
Figure 4.4 An example of the data abstraction hierarchy
4.5 Key research challenges
There are a number of key research challenges associated with this proposed solution. One
such challenge is the way in which the visualisation information will be stored as a model,
and how this will be used to generate view hierarchies. Another challenge is the definition of
the inter- and intra-hierarchy relationships between the views. Abstraction techniques
applicable to software visualisation will also be investigated. Identifying which views are
appropriate and useful for which comprehension tasks is a further challenge. The way in
which statically and dynamically extracted information is combined and presented will also
require investigation. The following chapters will address these research challenges.
5 Refining the Initial Model
“Modelling, in its computerized form, increasingly will take its place as the key knowledge
component in all forms of decision making in modern life.”
B P Zeigler [Zeigler 1984]
This chapter addresses the challenge identified in Section 4.5 of identifying which views are
appropriate and useful for which comprehension tasks. This is achieved by first validating
and refining the typical comprehension task sets used in the initial evaluation, then
theoretically evaluating and refining the proposed model based on the refined task sets.
5.1 Evaluation based on representative tasks
Chapter 3 evaluated visualisation tools by assessing their performance in typical software
comprehension tasks. The first part of this chapter: describes what the basis
for such a task set should be in terms of what information is useful for the comprehension of
OO systems; reviews the sets of tasks used in that evaluation and evaluates their
appropriateness and usefulness in evaluating a model for software comprehension; and
presents a revised evaluation task set, with accompanying justification. It is intended that this
revised task set will be used to evaluate the multifaceted, three-dimensional model proposed
in Chapter 4.
5.2 The basis for typical software comprehension tasks
A set of typical software comprehension tasks should seek to encapsulate the principal
activities typically performed during real world software comprehension. Software
comprehension activities can be divided into those performed during general software
comprehension, where the intention is to gain an overall understanding of (a subset of) a
system, and those performed during a specific reverse engineering effort, where the intention
is to carry out a specific task (e.g. fix a bug). Some activities may involve examining the
structure of the software system, its behaviour, or both. Analysis at various levels of
abstraction is often required. Depending on the activity, statically or dynamically extracted
information, or a combination of both, may be desirable.
A number of typical software comprehension tasks are suggested in the literature. Storey et
al. used two sets of tasks in their study of interfaces to the Rigi tool described in Section
2.7.5 [Storey 1996a]. The ‘abstract’ tasks, which were high-level comprehension activities
that involved understanding the overall structure or design of the software, were:
1. Show familiarity with the game [that the system simulates]
2. Summarise what subsystem x does
3. Describe the purpose of artefact x
4. On a scale of 1-5, how well was the program designed?
The ‘concrete’ tasks, which were low-level comprehension activities that involved
understanding only part of the software, were:
1. Find all artefacts on which artefact x directly or indirectly depends
2. Find all artefacts that directly or indirectly depend on artefact x
3. Find an artefact that is not used
4. Find an artefact that is heavily used
Sim and Storey used two sets of tasks in their structured tool demonstration described in
Section 2.7.6 [Sim 2000a]. The tasks were intended to be representative of those encountered
by a software developer in their everyday work. The reverse engineering tasks were:
1. Provide a textual and/or graphical summary of how the [system’s] source code is
organised
2. Was [the system] well designed initially?
3. Do you think the original design is still intact?
4. How difficult will [the system] be to maintain and modify?
5. Are there some modules that are unnecessarily complex?
6. Are there any GOTOs? If so, how many? What changes would need to be made to
remove them?
The maintenance tasks were:
1. Modify the existing command panel
2. Add a new method for specifying arcs
3. Bug fix: loading library objects
Storey et al. used a set of tasks in their evaluation of the comprehension strategies supported
by the Rigi, SHriMP, and SNiFF+ tools, described in Section 2.7.8, that were intended to be
typical of what a maintenance programmer would be asked to do [Storey 1997, Storey 2000].
These were:
1. Look at the real Monopoly game until you understand the general concept and rules
of the game. Have you played Monopoly before?
2. Spend a while browsing the program using the provided software maintenance tool
and try to gain a high level understanding of the structure of the program.
3. In the computer game, how many players can play at any one time?
4. Does the program support a ‘computer’ mode where the computer will play against
one opponent?
5. There should be a limited total number of hotels and houses; how is this limit
implemented and where is it used? If this functionality is not currently implemented,
would it be difficult to add? What changes would this enhancement require?
6. Where and what needs to be changed in the code to implement a new rule which
states that a player in jail (and not just visiting) cannot collect rent from anyone
landing on his/her properties?
7. Overall, what was your impression of the structure of the program? Do you think it
was well written?
In their description of the Shimba reverse engineering tool, Systä et al. suggest three sets of
tasks supported by the tool [Systä 2001]. The ‘overall understanding’ tasks were:
1. What are the static software artefacts and how are they related?
2. How are the software artefacts used at run-time?
3. What is the high-level structure of a subject system?
4. How do the high-level components interact with each other?
5. Does the run-time behaviour contain regular behavioural patterns that are repeated?
If so, what are the patterns and under which circumstances do they occur?
6. How heavily has each component of a subject system been used at run-time and
which components have not been used at all?
The ‘goal-driven reverse engineering tasks’ were:
1. How does a certain component behave and how is it related to the rest of the system?
2. When was an exception thrown or when did an error occur? What happened before
that and in which order?
3. How is the component that causes exceptional behaviour constructed?
The ‘object/method behaviour’ tasks were:
1. What is the dynamic control flow and the overall behaviour of an object or a
method?
2. How can a certain state of an object be reached (i.e. which execution paths lead from
the initial state to this state) and how does the execution continue (i.e. which
execution paths lead from this state to the final state)?
3. To which messages has an object responded at a certain state during its lifetime?
4. Which methods of the object have been called during execution?
Kirk et al. conducted a questionnaire survey of students reusing a framework [Kirk 2001].
The questions asked how difficult the students found understanding the following aspects of
the framework:
1. Understanding individual classes and their methods
2. Using abstract classes and interfaces
3. Mapping your solution to framework code
4. Understanding the structure of inheritance hierarchies and object compositions
5. Understanding design patterns
6. Understanding the dynamic structure of the framework
7. Choosing from alternative framework solution strategies
8. Understanding the [framework’s] problem domain
The study found that the key issues were:
3. Mapping your solution to framework code
6. Understanding the dynamic structure of the framework
7. Choosing from alternative framework solution strategies
From a review of these tasks, the principal software comprehension activities can be defined
as follows.
A1. Investigating the functionality of (a part of) the system
A2. Adding to or changing the system’s functionality
A3. Investigating the internal structure of an artefact
A4. Investigating dependencies between artefacts
A5. Investigating runtime interactions in the system
A6. Investigating how much an artefact is used
A7. Investigating patterns in the system’s execution
A8. Assessing the quality of the system’s design
A9. Understanding the domain of the system
A set of typical software comprehension tasks should address all of these activities.
5.3 Task set analysis
A definitive set of typical software comprehension tasks does not appear to exist in the
literature. Therefore, in Chapter 3, two sets of tasks that were intended to be representative
of those performed in a typical software comprehension effort were compiled. The tasks
were divided into those typical of general software comprehension tasks, usually carried out
when attempting to understand a large part of the system, and those typical of specific
reverse engineering tasks, usually carried out on smaller parts of the system for a specific
purpose.
The classification of these tasks into general software comprehension tasks and specific
reverse engineering tasks delineates the tasks into those that are likely to be most
conveniently solved using higher and lower levels of abstraction, respectively, which
constitutes the first dimension of the model proposed. The tasks can also be classified by
whether they are concerned with the system’s structure, behaviour, or both – the second
dimension of the proposed model.
• Structural
o G1 What is the static structure of the software system?
o G3 What is the high-level structure/architecture of the software system?
• Behavioural
o G2 What interactions occur between objects at runtime?
o G4 How do the high-level components of the software system interact?
o G5 What patterns of repeated behaviour occur at runtime?
o S1 What are the collaborations between the objects involved in an
interaction?
o S2 What is the control structure in an interaction?
o S3 How can a problem solution be mapped onto the functionality provided
by the software system?
o S5 What alternative functionalities are available in the software system to
implement a solution?
o S6 How does the state of an object change during an interaction?
• Both
o G6 What is the load on each component of the software system at runtime?
o G7 What design patterns are present in the software system's
implementation?
o G8 Where in the software system are the hotspots where additional
functionality can be added?
o G9 What impact will a change made to the software system have on the rest
of the software system?
o S4 Where is the functionality required to implement a solution located in the
software system?
All of the tasks can be analysed using either statically or dynamically extracted information,
except G5 ‘What patterns of repeated behaviour occur at runtime?’ and G6 ‘What is the load
on each component of the software system at runtime?’ (these tasks cannot be answered
using statically extracted information). The third dimension of the proposed model integrates
statically and dynamically extracted information.
5.4 New task sets
None of the tools in the original study were able to answer either question G7 ‘What design
patterns are present in the software system's implementation?’ or G8 ‘Where in the software
system are the hotspots where additional functionality can be added?’. These tasks are most
applicable to frameworks and may not have been anticipated by the tool developers. Work
by Keller et al. [Keller 1999] and others on identifying design patterns, and by Schauer et al.
[Schauer 1999] and others on identifying hotspots, stress the role of the human analyst and
reveal that detecting design patterns and hotspots is a non-trivial task that can benefit from
tool support. It is for these reasons that these tasks are excluded from the revised task set.
Tasks G3 ‘What is the high-level structure/architecture of the software system?’ and G4
‘How do the high-level components of the software system interact?’ are more abstract
versions of G1 ‘What is the static structure of the software system?’ and G2 ‘What
interactions occur between objects at runtime?’, respectively. However, this similarity is
desirable to allow higher levels of abstraction to be evaluated. To clarify this distinction, the
word “class” is added to G1.
The word “static” is removed from G1, as it is not meant to imply that we are concerned
solely with the structure as defined by statically extracted information. For the same reason,
the phrase “at runtime” is removed from G2. Information on both a system’s structure and
behaviour can be extracted both statically and dynamically.
These task sets address all of the issues relating to the questions from previous studies
identified in Section 5.2 and constitute typical software comprehension tasks that can be
used to realistically evaluate the usefulness and effectiveness of software comprehension
models and tools for real-world software comprehension.
5.4.1 General software comprehension tasks
G1. What is the class structure of the software system?
G2. What interactions occur between objects?
G3. What is the high-level structure/architecture of the software system?
G4. How do the high-level components of the software system interact?
G5. What patterns of repeated behaviour occur at runtime?
G6. What is the load on each component of the software system at runtime?
G7. What impact will a change made to the software system have on the rest of the
software system?
5.4.2 Specific reverse engineering tasks
S1. What are the collaborations between the objects involved in an interaction?
S2. What is the control structure in an interaction?
S3. How can a problem solution be mapped onto the functionality provided by the
software system?
S4. Where is the functionality required to implement a solution located in the software
system?
S5. What alternative functionalities are available in the software system to implement a
solution?
S6. How does the state of an object change during an interaction?
5.5 Justification
The above task set is intended to exercise all of the features of the proposed model. It has a
selection of tasks requiring structural, behavioural, data, and combined information, various
levels of abstraction, and statically and dynamically extracted information. The tasks are
intended to be representative of typical software comprehension tasks, and are based on
software comprehension activities as described in Section 5.2. Therefore, an evaluation of
the proposed model using this task set should provide an accurate assessment of its utility
and effectiveness in supporting software visualisation for program comprehension.
Table 5.1 illustrates the principal correspondences between the typical software
comprehension activities identified in Section 5.2 and the revised evaluation tasks from
Sections 5.4.1 and 5.4.2. This table illustrates that the revised evaluation tasks address all of
the typical software comprehension activities. The number of tasks that address each
activity varies as not all activities are at the same level of granularity. These tasks are
proposed as a complete set of typical comprehension tasks, representative of the full range of
comprehension activities, and encompassing all those found in the related literature.
Table 5.1 The correspondence between typical software comprehension activities and the revised task
sets
Activity Tasks
A1 G1, G2, S1
A2 G7, S3, S4, S5
A3 G1
A4 G1, G3
A5 G2, G4, S1, S2, S6
A6 G2, G6
A7 G5
A8 G3, G4, G7
A9 G3, G4
5.6 Task set revision summary
The first part of this chapter has discussed the basis for a set of typical OO software
comprehension tasks. It also reviewed the set of tasks used in the initial study described in
Chapter 3 and evaluated their appropriateness and usefulness in evaluating a model for
software comprehension. On the basis of this analysis, a new evaluation task set, with
accompanying justification, was presented. In the remainder of this chapter, this task set will
be used to evaluate the proposed software visualisation model.
5.7 Theoretical evaluation of the proposed model
Chapter 4 proposed a multi-faceted, three-dimensional model for software comprehension,
which is designed to address the comprehension shortcomings in current software
visualisation tools identified in the study described in Chapter 3. In the remainder of this
chapter, the evaluation technique described in this chapter will be applied to this model
theoretically, in order to determine which aspect(s) of the model are most useful in
improving the effectiveness of software visualisation for comprehension and hence most
promising for future research. The refined model will also be analysed to assess its support
for software comprehension strategies.
5.7.1 Model information required to address typical software comprehension tasks
The model will be evaluated theoretically by comparing the information required by each
task against the information provided by each aspect of the model. For example, in order to
answer question G6 ‘What is the load on each component of the software system at
runtime?’, both structural and behavioural information are required concerning classes,
components, and distribution (levels 2-4), and only dynamically extracted information would
be useful. As another example, to answer question S6 ‘How does the state of an object
change during an interaction?’, behavioural information at the intra-object level (level 1) is
required, and both statically and dynamically extracted information would be useful.
Tables 5.2 and 5.3 illustrate the information required from each dimension of the proposed
model to address each of the typical software comprehension tasks from Sections 5.4.1 and
5.4.2 respectively.
Table 5.2 Information required from each dimension of the proposed model to address the general
software comprehension tasks
Task Abstraction levels Facets Static/dynamic
G1 2 Structure Both
G2 2 Behaviour Both
G3 3-4 Structure Both
G4 3-5 Behaviour Both
G5 1-5 Behaviour Dynamic
G6 2-4 Structure, Behaviour Dynamic
G7 1-5 Structure, Behaviour, Data Both
Table 5.3 Information required from each dimension of the proposed model to address the specific
reverse engineering tasks
Task Abstraction levels Facets Static/dynamic
S1 2 Behaviour Both
S2 1-2 Behaviour Both
S3 1-3 Behaviour Both
S4 2-3 Structure, Behaviour Both
S5 2-3 Behaviour Both
S6 1 Behaviour Both
Firstly, these tables show that a variety of abstraction levels are required to address the
typical software comprehension tasks.
Secondly, it is clear from these tables that the Data facet is rarely used, and when it is used it
is in conjunction with the other facets. This is because in the object-oriented paradigm, a
system’s implemented data structures are encapsulated in the system structure. Hence, the
information that may have made a data facet appropriate in a procedural system is available
in the structure facet for object-oriented systems. Higher-level data structures may also be
present, but not apparent in the structure facet due to physical implementation details: a
logical data structure may be implemented as a number of smaller physical data structures.
For example, a hash map may be implemented as an array of Vectors in Java. The higher
levels (3-5) of the data facet would illustrate these higher-level data structures if the required
information on such structures were available.
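To make this point concrete, the sketch below (hypothetical code, not drawn from the thesis's case studies) implements a logical hash map as an array of Vector buckets; a structure-facet view of the physical classes would show only an array and some Vectors, not the logical map they realise.

```java
import java.util.Vector;

// A logical hash map realised physically as an array of Vector buckets.
// The physical structure (array + Vectors) hides the logical data structure.
public class BucketMap {
    private final Vector<Object[]>[] buckets; // each element is a {key, value} pair

    @SuppressWarnings("unchecked")
    public BucketMap(int capacity) {
        buckets = new Vector[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new Vector<>();
    }

    private int index(Object key) {
        return Math.abs(key.hashCode() % buckets.length);
    }

    public void put(Object key, Object value) {
        Vector<Object[]> bucket = buckets[index(key)];
        for (Object[] pair : bucket) {
            if (pair[0].equals(key)) { pair[1] = value; return; } // update in place
        }
        bucket.add(new Object[] { key, value });
    }

    public Object get(Object key) {
        for (Object[] pair : buckets[index(key)]) {
            if (pair[0].equals(key)) return pair[1];
        }
        return null;
    }

    public static void main(String[] args) {
        BucketMap m = new BucketMap(4);
        m.put("colour", "red");
        m.put("colour", "blue");
        System.out.println(m.get("colour")); // prints "blue"
    }
}
```

Recovering the logical map from the array and the Vectors is exactly the kind of higher-level data-structure information that the upper levels of a data facet would need to present.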
Thirdly, these tables show that dynamically extracted information is useful for all tasks, and
a combination of statically and dynamically extracted information is useful for addressing
most of the tasks. Only the two general software comprehension questions regarding runtime
behaviour do not require statically extracted information.
It would appear that the most useful features of the proposed model are the range of
abstraction levels and the structural and behavioural facets. Therefore, the data facet is
removed from the model. The following chapter specifies and refines the abstraction scales
for the structural and behavioural facets of the model. A preliminary assessment of support
for software comprehension strategies in the model is given in Appendix C.
6 The Refined Model
“Manipulating abstractions is a potent means of formulating and solving real problems.”
B P Zeigler [Zeigler 1984]
6.1 Introduction
This chapter specifies the proposed model in more detail, in preparation for its
implementation and evaluation, thus addressing the remainder of the research challenges
identified in Section 4.5. The fully specified model will consist of well-defined abstraction
levels, defined in terms of entities and relationships, along with abstraction mechanisms and
corresponding inter-level mappings that enable information to be transformed from one level
to another and combined across levels. An application to a real
system is presented, which validates the practicality of the model, and illustrates how
abstraction mappings are created in practice.
6.2 Abstraction levels
Table 6.1 illustrates the structure and behaviour abstraction hierarchies of the proposed
model. This table differs from the initial model presented in Section 4.3 in that all levels are
specified for both the Structure and Behaviour hierarchies. Each abstraction level of each
facet is a view and consists of a name, a description, a set of entities, a set of relationships
between those entities, and a set of diagrams that illustrate software at that level of that facet.
Each named view in Table 6.1 is accompanied by an example diagram type that can be used
to illustrate information from the facet at the specific level of abstraction. It is intended that
the analyst will be able to move conveniently between these diagrams during the course of
their investigation in order to examine the information relevant to their task. The views
selected are intended to represent the information that an analyst would find useful during
software comprehension. The directed arcs between levels 0 and 1 indicate that the program
code and event trace do not belong in either hierarchy: information for both hierarchies can
be obtained from either source. A clear benefit of the integrated approach proposed is the
ease of integration of and movement between views.
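The composition of a view described above can be captured in a small data structure. The sketch below is illustrative only; the type and field names are assumptions rather than the model's formal definition, and the example method populates level 2 of the structure facet from Table 6.1.

```java
import java.util.List;

// A sketch of a model view: each abstraction level of each facet pairs a name
// and description with the entities, relationships, and diagram types it uses.
public class ModelView {
    public enum Facet { STRUCTURE, BEHAVIOUR }

    public final Facet facet;
    public final int level;              // 0 (code/trace) .. 5 (business)
    public final String name;
    public final String description;
    public final List<String> entities;
    public final List<String> relationships;
    public final List<String> diagrams;

    public ModelView(Facet facet, int level, String name, String description,
                     List<String> entities, List<String> relationships,
                     List<String> diagrams) {
        this.facet = facet;
        this.level = level;
        this.name = name;
        this.description = description;
        this.entities = entities;
        this.relationships = relationships;
        this.diagrams = diagrams;
    }

    // Level 2 of the structure facet, as defined in Table 6.1.
    public static ModelView interClassStructure() {
        return new ModelView(Facet.STRUCTURE, 2, "Inter-class structure",
                "The structural relationships between the system's classes",
                List.of("Class"),
                List.of("Inheritance", "Implementation", "Aggregation", "Composition"),
                List.of("Class diagram"));
    }

    public static void main(String[] args) {
        ModelView v = interClassStructure();
        System.out.println(v.level + " " + v.name);
    }
}
```

Representing views uniformly in this way is what makes it straightforward to enumerate them, move between them, and attach abstraction operations to pairs of them.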
Table 6.1 The abstraction levels of the proposed model
Level | Structure | Behaviour

Level 5:
- Name: Business structure | Business behaviour
- Description: The structure defined by the high-level business goals of the system | The behaviour defined by the high-level business goals of the system
- Entities: BusinessEntity | BusinessEntity
- Relationships: BusinessRelationship | BusinessRule
- Example diagram: Use case diagram | Use case diagram

Level 4:
- Name: System structure deployment | System behaviour distribution
- Description: The structural deployment of the system | The behavioural distribution of the system
- Entities: Component, Machine | Component, Machine
- Relationships: Dependency, Containment, Communication | Dependency, Containment, Communication
- Example diagram: Deployment diagram | Deployment diagram

Level 3:
- Name: System architecture | Component interaction
- Description: The structural relationships between the system’s high-level components | The behavioural relationships between the system’s high-level components
- Entities: Component | Component
- Relationships: Dependency | Usage
- Example diagram: Component diagram | Reflexion model

Level 2:
- Name: Inter-class structure | Inter-object interaction
- Description: The structural relationships between the system’s classes | The behavioural relationships between the system’s objects
- Entities: Class | Object
- Relationships: Inheritance, Implementation, Aggregation, Composition | Invocation
- Example diagram: Class diagram | Sequence diagram

Level 1:
- Name: Intra-class structure | Intra-object interaction
- Description: The internal structure of the system’s objects | The internal behaviour of the system’s objects
- Entities: Method, Attribute | State
- Relationships: Containment | Action
- Example diagram: Source code representation | Activity diagram

Level 0:
- Name: Source code | Event trace
- Description: The system’s source code | A dynamic event trace of the system’s execution
- Entities: Operand | Operand
- Relationships: Operator | Operator
- Example diagram: Program listing | Output statements
As a further example, an instantiation of the behaviour hierarchy is illustrated graphically in
Figure 6.1 for part of the JHotDraw framework [Johnson 1992]. In contrast to the initial
examples presented in Section 4.4, there is clear continuity between the abstraction levels of
this figure. The application of the model to a JHotDraw system is described in Section 6.5.
The entity-relationship diagrams for each of the twelve views are given in Appendix D.
These diagrams specify formally the information present in each view, and provide the basis
for defining the inter-view abstraction relationships.
[Figure: the behaviour hierarchy instantiated for part of JHotDraw. Level 5: a use case diagram with the use case ‘Create new Figure’. Level 4: a deployment diagram of drawingClient:PC and drawingServer:Sun communicating via <<TCP>>. Level 3: a reflexion model relating DrawingView, Drawing, Tools, and Figures. Level 2: a sequence diagram of the mouseDown(MouseEvent,int,int) invocation creating a figure via createFigure(), displayBox(), view(), and add(). Level 1: an activity diagram of the control flow of createFigure(), branching on [fPrototype == null] and including the HJDError path. Level 0: code samples for mouseDown and createFigure.]
Figure 6.1 An example instantiation of the behaviour hierarchy for part of the JHotDraw framework
6.3 Inter-level abstraction relationships
6.3.1 Abstraction mechanisms
Fundamental to the model is the definition of abstraction relationships between the model
views. These abstractions allow software visualisations to be related and manipulated. Such
an integrated arrangement is preferable to an ad hoc approach as it allows software
visualisations to be reasoned about in a coordinated and correct manner. Yan et al. comment
that the most critical challenge in discovering high-level software artefacts from low-level
system events is “finding mechanisms to bridge the abstraction gap: in general, low-level
system observations do not map directly to architectural actions” [Yan 2004, p. 470].
Abstraction operations are defined between each of the views in the model. These operations
illustrate the relative abstraction level of each view (e.g. that one view is more abstract than
another), and define the transformations between views. Figure 6.2 presents the abstraction
hierarchies from Table 6.1 in the form of an abstraction network that illustrates these
operations [Fishwick 1988]. The more abstract representation (the arc target) is derived from
the less abstract base representation (the arc source) by applying the transformation indicated
by the arc label. The three abstraction mechanisms used in the network are:
• abstraction by reduction (RED);
• abstraction by induction (IND); and
• partial systems morphism (PSM).
These mechanisms are described in Section 2.3.3. Fishwick describes these mechanisms and
presents examples in the context of a simulation of the dining philosophers (DP) problem
[Dijkstra 1968]. For abstraction by reduction, Fishwick gives the example of abstracting
from a Petri net [Peterson 1981] to a frequency distribution. For abstraction by induction, the
example given by Fishwick is of abstracting from the observed data to a finite state
automaton. For partial systems morphism, Fishwick’s example makes use of a PSM to
produce a more abstract Petri net (with fewer arcs) from a less abstract one. None of these
abstraction mechanisms require a change in representation: the representation used is simply
one that is appropriate to the underlying data.
[Figure 6.2 content: an abstraction network with levels 0–5. Level 0: Program Code / Event Trace; level 1: Intra Class Structure and Intra Object Interaction; level 2: Inter Class Structure and Inter Object Interaction; level 3: System Architecture and Component Interaction; level 4: System Structure Deployment and System Behaviour Distribution; level 5: Business Structure and Business Behaviour. Arcs are labelled with the abstraction mechanism applied: PSM, IND, or RED]
Figure 6.2 An abstraction network illustrating the abstraction relationships between the views of Table
6.1
The interpretation of abstraction by reduction employed here is an abstraction function that
applies some summarisation function to the data to extract its essential higher-level
properties. For example, abstraction from level 3 – level 5 requires input from a human
analyst to define the summarisation function that recognises business-level information from
component-level information. Abstraction by reduction in this context typically requires
human or heuristic analysis.
Abstraction by induction is interpreted here as a function that amalgamates lower-level
entities and relationships to form higher-level ones. For example, abstraction from level 1 –
level 2, level 2 – level 3, and level 3 – level 4 involves grouping entities and relationships at
the lower level into their equivalents at the higher level. Abstraction by induction can often
be automated entirely, though it may benefit from analyst information to define a more
appropriate or precise amalgamation.
A partial systems morphism in this context is a direct mapping from a subset (proper or
improper) of the entities and relationships at the lower level to those at the higher level. An
entity or relationship at the lower level may map to zero, one, or more entities or
relationships respectively at the higher level, and each entity at the higher level must be
mapped to by at least one entity at the lower level. For example, abstraction from level 0 –
level 1 involves mapping operators and operands to higher-level entities and relationships.
Given the generic mappings to apply to the system, abstraction by partial systems morphism
can be automated conveniently, and indeed benefits from automation in terms of time and
correctness over a human implementation of the mappings.
In the model presented here, a PSM is used to extract basic information from the program
code (or from an event trace in the case of dynamic analysis). From this basic information,
information on intra- and inter-class and object interactions is available. This information
can then be transformed using abstraction by induction to generate information on the system
architecture and high-level component interaction. From this information, abstraction by
induction can be applied to produce information on the system’s structural and behavioural
distribution, or abstraction by reduction can be used to elicit the system’s business-level
structure and behaviour.
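The flow just described can be sketched in Java (the language VANESSA itself is implemented in). Everything below — the Element record, the kind strings, and the abstractToLevel2 method — is an illustrative assumption, not the model's actual implementation:

```java
import java.util.*;

// Sketch of abstraction by partial systems morphism (PSM): each low-level
// element either maps directly to a higher-level entity/relationship or is
// outside the domain of the mapping and is dropped (e.g. comments).
public class PsmSketch {
    // Hypothetical level 0 source/trace elements: a kind tag plus raw text.
    record Element(String kind, String text) {}

    static List<String> abstractToLevel2(List<Element> level0) {
        List<String> level2 = new ArrayList<>();
        for (Element e : level0) {
            switch (e.kind()) {
                case "objectReference" -> level2.add("Object:" + e.text());
                case "methodCall"      -> level2.add("Invocation:" + e.text());
                default                -> { /* comments, whitespace: not mapped */ }
            }
        }
        return level2;
    }

    public static void main(String[] args) {
        List<Element> code = List.of(
            new Element("comment", "// create the anchor point"),
            new Element("objectReference", "fAnchorPoint"),
            new Element("methodCall", "Point(x, y)"));
        System.out.println(abstractToLevel2(code));
        // -> [Object:fAnchorPoint, Invocation:Point(x, y)]
    }
}
```

Because the mapping is a direct table lookup, this kind of abstraction is straightforward to automate, as the text observes.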
6.3.2 Detailed abstraction example
The following example considers the behavioural hierarchy from Figure 6.1, and is intended
to illustrate the abstraction mechanisms of the network. Firstly, the program code is
abstracted using a PSM to generate a data set. This makes use of a mapping between some
elements in the source code and elements in the extracted dataset. For example, method calls
would have a mapping to an appropriate representation in the extracted data, but comments
would be ignored. Views of the system’s intra- and inter-object behaviour can be produced
directly from this extracted data (alternatively, the inter-object view could be generated from
the intra-object view by means of abstraction by induction). These views could be visualised
using, for example, activity diagrams and sequence diagrams respectively.
The PSM abstraction from level 0 (code) to level 1 (intra-object behaviour) entails a
mapping from the source code to the entities and relationships at level 1, which are states and
actions respectively (see Table 6.1). In this example, the call fAnchorPoint = new
Point(x, y); in the code becomes the first state in the activity diagram, while the
condition if (fPrototype == null) becomes an action.
The PSM abstraction from level 0 (code) to level 2 (inter-object behaviour) entails a
mapping from the source code to objects and invocations. Object references in the source
code become objects, while method calls become invocations. In this example, the object
reference fAnchorPoint in the code becomes the first object in the sequence diagram,
while the call to Point(x, y) becomes the second invocation.
The abstraction by induction from level 1 to level 2 involves grouping the level 1 entities and
relationships (states and actions respectively) into their level 2 counterparts (objects and
invocations). The activity diagram for each object maps onto the corresponding level 2
object, while the states that contain method calls become invocations. In this example, the
activity diagram becomes the object initial:CreationTool in the sequence diagram, while the
first state (fAnchorPoint = new Point(x,y)) becomes invocation 1.1.
From level 2, abstraction by induction can be applied to the inter-object information to
generate information on component interactions in the system. This view could be visualised
using a reflexion model [Murphy 2001]. The abstraction by induction from level 2 to level 3
involves grouping objects and invocations into components and usages respectively. Objects
are grouped together to form components. Invocations between objects in separate
components become usages. In this example, the CreationTool object becomes part of the
Tools component, and the invocations from this object to the Figure objects in the sequence
diagram become the usage from the Tools component to the Figures component in the
reflexion model.
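The level 2 to level 3 induction step can be sketched as follows; the grouping map and method names are illustrative assumptions, not VANESSA code:

```java
import java.util.*;

// Sketch of abstraction by induction from level 2 (objects, invocations) to
// level 3 (components, usages): objects are grouped into components, and an
// invocation between objects in different components becomes a usage.
public class InductionSketch {
    static Set<String> usages(Map<String, String> objectToComponent,
                              List<String[]> invocations) {
        Set<String> usages = new LinkedHashSet<>();
        for (String[] inv : invocations) {
            String from = objectToComponent.get(inv[0]);
            String to = objectToComponent.get(inv[1]);
            if (from != null && to != null && !from.equals(to)) {
                usages.add(from + " -> " + to);  // inter-component invocations only
            }
        }
        return usages;
    }

    public static void main(String[] args) {
        // Grouping taken from the JHotDraw example in the text.
        Map<String, String> grouping = Map.of(
            "initial:CreationTool", "Tools",
            "fCreatedFigure", "Figures",
            "drawingView", "DrawingView");
        List<String[]> invocations = List.of(
            new String[]{"initial:CreationTool", "fCreatedFigure"},
            new String[]{"initial:CreationTool", "drawingView"});
        System.out.println(usages(grouping, invocations));
        // -> [Tools -> Figures, Tools -> DrawingView]
    }
}
```

Note that invocations between objects within the same component are dropped, matching the definition that only invocations between objects in separate components become usages.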
From level 3, there are two more abstract representations that can be generated. Abstraction
by induction can be applied to generate information on the system’s behavioural distribution
(level 4), which could be visualised using a deployment diagram. Abstraction by reduction
can be applied to produce information on the system’s business behaviour (level 5), which
could be visualised by a use case diagram.
The abstraction by induction from level 3 to level 4 involves grouping components into
components and machines, and usages into usage, containment, and communication
relationships. The level 3 components map directly to the components of the same name at
level 4. Similarly, local inter-component usages at level 3 map directly to usages at level 4.
(Information on which components are located on which machine is available from the
distribution manager (e.g. CORBA).) Where a level 3 usage exists between two
components not executing on the same machine, that usage maps to a communication
between the machines at level 4. In this example, the Figures component from the reflexion
model becomes the Figures component in the deployment diagram. The usage between
Drawing and Figures in the reflexion model becomes the inter-machine communication in
the deployment diagram.
The abstraction by reduction from level 3 to level 5 involves the application of a function to
summarise component interaction, expressed in terms of components and usages, into
business behaviour in terms of business entities and business rules. The levels 3 components
map to business entities, while the level 3 usages map to business rules. In this example, the
component usages between Tools, Figures, Drawing, and DrawingView in the reflexion
model correspond to a single use case for creating a new figure in the diagram.
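A minimal sketch of abstraction by reduction, in which the summarisation function itself is supplied by the analyst; the class, method, and component names here are purely illustrative:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of abstraction by reduction from level 3 to level 5: an
// analyst-supplied summarisation function collapses a set of component
// usages into a single business-level artefact (here, a use case name).
public class ReductionSketch {
    static String reduce(Set<String> usages, Function<Set<String>, String> summarise) {
        return summarise.apply(usages);  // the model supplies data; the analyst supplies meaning
    }

    public static void main(String[] args) {
        Set<String> usages = Set.of("Tools -> Figures", "Tools -> Drawing",
                                    "Drawing -> DrawingView");
        // The analyst recognises these inter-component interactions as one use case.
        Function<Set<String>, String> summarise = u -> "Create new Figure";
        System.out.println(reduce(usages, summarise));
    }
}
```

The point of the sketch is that, unlike the PSM and induction steps, the summarisation function cannot in general be derived from the code; it encodes human or heuristic domain knowledge.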
6.3.3 Generic abstraction mappings
In order to perform these abstractions, the mappings mentioned above must be defined. The
mappings for the structure and behaviour hierarchies are listed in Tables 6.2 and 6.3
respectively. The mappings listed in Tables 6.2 and 6.3 are defined generically in this
section, and relate the entities and relationships at level n to their counterparts at level n+1.
The entities and relationships in Tables 6.2 and 6.3 are subtypes of those in Table 6.1. For
example, those at level 0 constitute the complete set of entities and relationships in the
source code or event trace that are relevant to the model. An example of abstraction from
level 0 – level 1 would be an entity class Figure {…} extracted from the source code,
which is of type ClassContainmentOperator0, which relates to the entity Figure of type
ClassContainment1 at level 1. Further detail and examples are given when the model is
applied manually to a real system in Section 6.5. The → operator denotes an abstraction
operation. The abstraction operations between each level are those illustrated in Figure 6.2.
For example, abstraction from intra-class information at level 1 to inter-class information at
level 2 involves the application of the appropriate mappings from Table 6.2, which
implement abstraction by induction between the two levels. The reflexion model technique
of Murphy et al. also makes use of mappings to relate low-level software artefacts to higher-level
architectural components [Murphy 2001].
When applying the mappings to a specific system, some mappings may be generated
automatically, while some may require or benefit from knowledge of the system and its
domain. In both hierarchies, the mappings from levels 0–1, 0–2, 1–2, and 3–4 are generated
entirely automatically. The mappings from levels 2–3 can be generated automatically for
some systems (e.g. based on source code naming conventions), but can also benefit from
analyst knowledge of the system and domain (i.e. RelatedClasses2 and RelatedObjects2 relate
to Component3 based on the analyst’s information). The mappings from levels 3–5 can be
generated automatically by means of selective tracing of use cases, or an analyst can relate
RelatedComponents3 to BusinessEntity5. The process of generating and applying the specific
mappings will be demonstrated in Section 6.5 when the model is applied in the context of a
real system.
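Reading analyst-supplied mappings can be sketched as follows. The line format "Component=ClassA,ClassB" is a hypothetical assumption; the thesis states only that expert mappings are read from text files, without specifying their format:

```java
import java.util.*;

// Sketch of parsing analyst-supplied level 2 -> level 3 mappings
// (RelatedClasses2 -> Component3) from lines of the hypothetical form
// "Component=ClassA,ClassB".
public class ExpertMappings {
    static Map<String, String> classToComponent(List<String> lines) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            if (parts.length != 2) continue;            // skip malformed lines
            String component = parts[0].trim();
            for (String cls : parts[1].split(",")) {
                mapping.put(cls.trim(), component);     // class name -> component
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "Tools=CreationTool,SelectionTool",
            "Figures=Figure,ConnectionFigure");
        System.out.println(classToComponent(file).get("CreationTool")); // Tools
    }
}
```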
6.3.3.1 Structure hierarchy
Table 6.2 gives the abstraction mappings for the structure hierarchy.
Table 6.2 Abstraction mappings for the structure hierarchy
(In each Entity/Relationship cell, entities are listed before the slash and relationships after it.)

From level | From Entity/Relationship | Abstraction technique | To Entity/Relationship | To level
0 | Operand / Operator | PSM | Method, Attribute / Containment | 1
0 | Operand / Operator | PSM | Class / Inheritance, Implementation, Aggregation, Composition | 2
1 | Method, Attribute / Containment | IND | Class / Inheritance, Implementation, Aggregation, Composition | 2
2 | Class / Inheritance, Implementation, Aggregation, Composition | IND | Component / Dependency | 3
3 | Component / Dependency | IND | Component, Machine / Dependency, Containment, Communication | 4
3 | Component / Dependency | RED | BusinessEntity / BusinessRelationship | 5
The generic abstraction mappings for the structure hierarchy are defined formally as follows.
Level 0 – Level 1
ClassContainmentOperator0 → ClassContainment1
MethodContainmentOperator0 → MethodContainment1
MethodDeclarationOperand0 → Method1
AttributeDeclarationOperand0 → Attribute1
Level 0 – Level 2
InterfaceOperand0 → Interface2
ClassOperand0 → Class2
InheritanceOperator0 → Inheritance2
ImplementationOperator0 → Implementation2
AggregationOperator0 → Aggregation2
CompositionOperator0 → Composition2
Level 1 – Level 2
Class1 → Class2
Interface1 → Interface2
Method1 → MethodOfClass2
Attribute1 → AttributeOfClass2
ClassContainmentWithInheritance1 → Inheritance2
ClassContainmentWithImplementation1 → Implementation2
ClassContainmentWithAggregation1 → Aggregation2
ClassContainmentWithComposition1 → Composition2
Level 2 – Level 3
RelatedClasses2 → Component3
InheritanceBetweenUnrelatedClasses2 → Dependency3
ImplementationBetweenUnrelatedClasses2 → Dependency3
AggregationBetweenUnrelatedClasses2 → Dependency3
CompositionBetweenUnrelatedClasses2 → Dependency3
Level 3 – Level 4
Component3 → Component4
LocalDependency3 → Dependency4
RemoteDependency3 → Communication4
ComponentsSharingLocalDependencies3 → Machine4, Containment4
Level 3 – Level 5
RelatedComponents3 → BusinessEntity5
DependencyBetweenUnrelatedComponents3 → BusinessRelationship5
6.3.3.2 Behaviour hierarchy
Table 6.3 gives the abstraction mappings for the behaviour hierarchy.
Table 6.3 Abstraction mappings for the behaviour hierarchy
(In each Entity/Relationship cell, entities are listed before the slash and relationships after it.)

From level | From Entity/Relationship | Abstraction technique | To Entity/Relationship | To level
0 | Operand / Operator | PSM | State / Action | 1
0 | Operand / Operator | PSM | Object / Invocation | 2
1 | State / Action | IND | Object / Invocation | 2
2 | Object / Invocation | IND | Component / Usage | 3
3 | Component / Usage | IND | Component, Machine / Usage, Containment, Communication | 4
3 | Component / Usage | RED | BusinessEntity / BusinessRule | 5
The generic abstraction mappings for the behaviour hierarchy are defined formally as
follows.
Level 0 – Level 1
OperandsDefiningCurrentState0 → State1
OperatorTriggeringStateChange0 → Action1
Level 0 – Level 2
ObjectOperand0 → Object2
InterObjectOperator0 → Invocation2
Level 1 – Level 2
AllStates1 → Object2
InterObjectAction1 → Invocation2
Level 2 – Level 3
RelatedObjects2 → Component3
InvocationBetweenUnrelatedObjects2 → Usage3
Level 3 – Level 4
Component3 → Component4
LocalUsage3 → Usage4
RemoteUsage3 → Communication4
ComponentsSharingLocalUsages3 → Machine4, Containment4
Level 3 – Level 5
RelatedComponents3 → BusinessEntity5
UsageBetweenUnrelatedComponents3 → BusinessRule5
6.3.4 Combining information from multiple views
The rigorous definition of the abstraction mappings enables information from multiple views
to be combined. Most CASE and software visualisation tools do not support this flexibility.
View combination is useful for determining the low-level artefacts that correspond to high
level system properties, and for focusing analyses. For example, an analyst may wish to
examine the class structure responsible for some behaviour observed at the business level.
The combination of level 2 structural and level 5 behavioural information would allow them
to investigate this. The three possible scenarios for combining information are:
1. from the same level of each hierarchy;
2. from different levels of the same hierarchy; and
3. from different levels of each hierarchy.
6.3.4.1 From the same level of each hierarchy
The first combination is achieved by forming the union of the sets of entities and
relationships of each view. This can be expressed algebraically by Equations 1 and 2.
(1) E^C_x = E^B_x ∪ E^S_x

(2) R^C_x = R^B_x ∪ R^S_x

Where: E^y_x denotes the set of entities at level x of hierarchy y; R^y_x denotes the set of
relationships at level x of hierarchy y; B denotes the behaviour hierarchy; S denotes the
structure hierarchy; and C denotes the combination.
An example application of combining information from the same level of each hierarchy
would be to produce a unified visualisation of the structural and behavioural characteristics
of the system, for example at the component level.
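Equations 1 and 2 reduce to plain set unions, as this small Java sketch illustrates (the entity names are illustrative):

```java
import java.util.*;

// Sketch of combining views from the same level x of each hierarchy
// (Equations 1 and 2): the combined entity set is the union of the
// behavioural and structural entity sets (and likewise for relationships).
public class SameLevelCombination {
    static Set<String> union(Set<String> behaviour, Set<String> structure) {
        Set<String> combined = new LinkedHashSet<>(behaviour);
        combined.addAll(structure);
        return combined;
    }

    public static void main(String[] args) {
        Set<String> eB = Set.of("Tools", "Figures");    // E^B_x
        Set<String> eS = Set.of("Figures", "Drawing");  // E^S_x
        System.out.println(union(eB, eS).size());       // 3 distinct entities
    }
}
```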
6.3.4.2 From different levels of the same hierarchy
The second combination is achieved by relating the less abstract entities and relationships to
their more abstract counterparts in the same hierarchy. The relationship between these levels
is defined by the composition of the mappings from the less abstract level to the more
abstract level. This is shown in Equation 3.
(3) f^y_{m..n} = f^y_{n-1..n} ∘ … ∘ f^y_{m..m+1}

Where: f^y_{m..n} denotes the abstraction operation from level m to level n of hierarchy y;
m < n; and ∘ is the composition operator.
The sets of entities and relationships in the combination consist of the union of the sets of
entities and relationships at levels m and n of hierarchy y, with three exceptions. Firstly, level
m entities that do not have an abstraction mapping to level n entities are not included.
Secondly, level m relationships whose entities are not present in the combination (due to
exception 1) are not included. Thirdly, it may be desirable to hide level m relationships that
are subsumed by level n relationships. The sets of entities and relationships remaining after
applying these exceptions are expressed algebraically in Equations 6 and 7. Thus, the sets of
entities and relationships in the combination can be expressed algebraically by Equations 4
and 5.
(4) E^C_{m,n} = E^y_n ∪ Ex^y_m

(5) R^C_{m,n} = R^y_n ∪ Rx^y_m

Exception 1:

(6) Ex^y_m = {e | ∀e ∈ E^y_m, e ∈ dom[f^y_{m..n}]}

Exception 2 ∪ Exception 3:

(7) Rx^y_m = {r_{e1,e2} | ∀r ∈ R^y_m, e1 ∈ E^C_{m,n}, e2 ∈ E^C_{m,n}}
        ∪ {r_{e1,e2} | ∀r ∈ R^y_m, r_{f^y_{m..n}(e1),f^y_{m..n}(e2)} ∉ R^y_n}

Where: e is an element of E; and r_{e1,e2} is an element of R relating entities e1 and e2.
An example application of combining information from different levels of the same
hierarchy would be to reveal the lower-level interactions responsible for the system’s high-
level behaviour.
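The exceptions can be sketched as a filtering step, with a Map standing in for the composed abstraction mapping f and all class, method, and entity names being illustrative assumptions:

```java
import java.util.*;

// Sketch of combining levels m and n of one hierarchy (Equations 4-7).
// f plays the role of the composed abstraction mapping f_{m..n}: level m
// entities outside dom(f) are dropped (exception 1); level m relationships
// are dropped when an endpoint is absent (exception 2) or when they are
// subsumed by a level n relationship (exception 3).
public class CrossLevelCombination {
    record Rel(String e1, String e2) {}

    static Set<String> entities(Set<String> eN, Set<String> eM, Map<String, String> f) {
        Set<String> combined = new LinkedHashSet<>(eN);
        for (String e : eM) if (f.containsKey(e)) combined.add(e);  // exception 1
        return combined;
    }

    static Set<Rel> relationships(Set<Rel> rN, Set<Rel> rM,
                                  Map<String, String> f, Set<String> combinedEntities) {
        Set<Rel> kept = new LinkedHashSet<>(rN);
        for (Rel r : rM) {
            boolean endpointsPresent = combinedEntities.contains(r.e1())
                                    && combinedEntities.contains(r.e2());          // exception 2
            boolean subsumed = rN.contains(new Rel(f.get(r.e1()), f.get(r.e2()))); // exception 3
            if (endpointsPresent && !subsumed) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> f = Map.of("CreationTool", "Tools", "Figure", "Figures");
        Set<String> eN = Set.of("Tools", "Figures");
        Set<String> eM = Set.of("CreationTool", "Figure", "Unmapped");
        Set<String> combined = entities(eN, eM, f);
        System.out.println(combined.contains("Unmapped")); // false: dropped by exception 1
    }
}
```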
6.3.4.3 From different levels of each hierarchy
The third combination is achieved by relating the less abstract entities and relationships in
one hierarchy to the more abstract entities and relationships in the other. This combination is
equivalent to applying both of the previous combinations. As in the previous combination,
the relationship between these levels is defined by the composition of the mappings from the
less abstract level to the more abstract level. This function is given by Equation 3, above.
The sets of entities and relationships in the combination consist of the union of the sets of
entities and relationships at level n of hierarchies y and y’, and level m of hierarchy y
(assuming hierarchy y contains the less abstract view), with the three exceptions defined in
Equations 6 and 7, above. Thus, the sets of entities and relationships in the combination can
be expressed algebraically by Equations 8 and 9.
(8) E^C_{m,n} = E^y_n ∪ E^{y'}_n ∪ Ex^y_m

(9) R^C_{m,n} = R^y_n ∪ R^{y'}_n ∪ Rx^y_m
Where: y’ denotes the inverse hierarchy to y (i.e. when y represents the behaviour hierarchy,
y’ represents the structure hierarchy, and vice versa).
An example application of combining information from different levels of each hierarchy
would be to investigate the structural elements responsible for some high-level behaviour.
An example of this is shown in Figure 6.3.
The general cases considered here discuss the combination of two views. The specifications
can also be used to combine more than two views and to focus analyses. View combination
in a real system is demonstrated in Section 6.5.
Create new Figure
Figure
ConnectionFigure
FigureChangeListener
Drawing
DrawingChangeListener
DrawingView
Tool
Create new Figure
Structure level 2 Behaviour level 5
Combination
Figure
ConnectionFigure
FigureChangeListener
Drawing
DrawingChangeListener
DrawingView
Tool
Figure 6.3 Example combination of level 2 structure and level 5 behaviour information
6.4 Metamodels
This section describes two metamodels that have been proposed for representing information
about software systems and compares them with the novel model proposed in this thesis.
6.4.1 Dagstuhl Middle Metamodel
The DMM is a metamodel for representing the static structure of source code [Lethbridge
2003]. The DMM has separate hierarchies for source elements and model elements. A third
hierarchy represents relationships: between model elements, between source elements, and
between model elements and source elements. The DMM can be used to model both OO and
non-OO systems. Some work has been done on integrating dynamic information. It has been
proposed that a formal semantics for the classes and relationships of the model should be
formulated. Proposed extensions include the modelling of dynamic information. Future work
may include developing mappings to relate to lower level schemas, and developing
architectural schemas that link to the DMM.
A small portion of the DMM could be used to represent the level 0 information extracted
from the source code or event trace. However, this would be overkill for the purposes of the
model proposed in this thesis. As an implementation detail, it would perhaps be desirable to
accept information from a DMM model as level 0 input to the model if interoperability were
a concern and it was observed that the DMM was achieving widespread acceptance in the
reverse engineering community (though this does not appear to be the case).
6.4.2 UML metamodel
The Unified Modelling Language (UML) is defined using a four-layer metamodel hierarchy.
The UML 2.0 superstructure metamodel [OMG 2003a] defines relationships between
concepts such as components and deployments.
The infrastructure [OMG 2003b] defines specific diagrams at a lower level of abstraction. It
is therefore possible, using the superstructure and infrastructure, to relate the different UML
diagram types and their elements. For example, UML can be used to relate statecharts
depicting behaviour to the class responsible for encapsulating that behaviour.
There does not appear to be any explicit acknowledgement of software abstraction levels. It
is therefore more cumbersome to relate software artefacts. In the novel model proposed in
this thesis, all level 1 information (intra-object interactions, which may be represented as
statecharts) relating to the same object entity can be abstracted to that object entity at level 2
(inter-object interactions, which may be represented as an object diagram). In UML, the
StateMachine entity is related to a BasicBehaviours:Behaviour entity, which is in turn related
to a Kernel:Class entity. In contrast, the novel model defines generic mappings to relate
software artefacts at different levels of abstraction; these mappings are intuitive and can be
conveniently created and read by a human analyst.
While UML describes and relates diagrams in terms of diagram concepts, the novel model
makes use of software entities and relationships, which are independent of a specific diagram
type. Thus, the novel model allows any type of diagram to be plugged in and used for
visualisation of the underlying model information by populating it with the required entities
and relationships. To achieve this with UML would require translation of this basic
information (which is extracted directly from a system) into the UML metamodel.
The novel model is also much more compact and focussed than the UML metamodel, as it is
targeted specifically to the reverse engineering process, rather than encompassing the
forward engineering steps of analysis and design.
There does not appear to be anything useful in the UML metamodel that the novel model
omits. This is because the novel model is based on accepted software artefacts (e.g. classes,
components, etc.) so any omission would be readily identified. A partial instantiation of the
UML schema could perhaps be used to implement the novel model, though this would lead
to an unnecessarily complicated implementation.
The fine-grained concepts embodied in UML (e.g. activities and triggers in state machines)
are not necessary in the novel model. The basic information required to draw the relevant
UML models at each level of abstraction is present, and could be augmented with this
additional information to produce a more precise UML diagram if desired. However, it
would be overly burdensome to include such information that is specific to a particular
diagramming notation by default; the intention of the novel model was to be independent of
any specific notation by representing the principal software constructs, thus allowing any
notation that represents the entities and relationships at a particular level to be plugged into
the model for display purposes as needed.
If all of the UML metamodel diagrams were amalgamated into a single diagram, diagrams
could be related to each other. Abstraction mappings are not defined explicitly, though one
model could be related to another through such a comprehensive metamodel. For example,
all the state machines for a class could be combined to abstract them into an entity
representing that class in a class diagram.
The principal differences between the novel model and the UML metamodel can be
summarised as follows:
1. the novel model deals explicitly with software artefacts at multiple levels of
abstraction;
2. generic mappings in the novel model make inter-abstraction level relationships
explicit;
3. the novel model supports multiple diagram types; and
4. the novel model is focussed on reverse engineering, and hence simpler.
An exploration of how the UML could be used to represent some of the information from the
novel model proposed in this thesis is given in Appendix F.
6.5 Applying the model to a real system
The applicability and feasibility of the model is demonstrated by applying it manually to a
real system (JHotDraw). The system-specific abstraction mappings were generated, which
allowed the abstraction hierarchies for the system to be produced. The diagrams produced
were then validated by comparison with those from reliable sources.
The class and sequence diagrams produced by the model were compared to those from the
system documentation, and were found to match closely. Discrepancies were due to the use
of static information in the documentation diagrams, the system designers’ ‘idealised’ view
of the system’s interactions, and the coverage of the event trace.
The component diagrams produced from the novel model were compared to those produced
by a system expert. There was a 30% agreement between the communications illustrated in
the diagram produced from the manual application of the model and the expert’s diagram.
Discrepancies were again due to the expert’s use of static information, his idealised view of
the system’s interactions, and the coverage of the event trace. This comparison illustrates
that although the novel model is an accurate representation of the system as implemented, it
may not entirely reflect an analyst’s idealised view of the system’s design. A potential
weakness that would affect the accuracy of the novel model in this comparison is the
accuracy of the level 2-3 abstraction mappings provided by the JHotDraw expert.
The use case diagrams produced from the novel model were compared to that produced by
the system expert. There was no agreement between the communications shown in these
diagrams. This shows that the business relationships generated by abstracting from low-level
structural and behavioural interactions in the novel model are not synonymous with the
expert’s perception of the system’s use cases. As in the previous comparison, this result
shows that the novel model may not entirely reflect an analyst’s idealised view of the
system’s design.
These initial results provide confidence in the validity of the model and the abstraction
relationships. The points of variability are the information provided by the expert analyst and
the coverage of any dynamic trace. Full details of the manual application of the model are
given in Appendix B.
6.6 VANESSA: Visualisation Abstraction NEtwork for Software Systems Analysis
The motivation in building VANESSA was to demonstrate the feasibility of the model, and
to allow a rigorous evaluation of the model using real software systems to be performed.
VANESSA fully implements all aspects of the model described in this thesis.
6.6.1 Tool implementation
VANESSA is implemented in Java (J2SE 5.0) and analyses Java systems. Structural
information is extracted statically from the program code. VANESSA first converts the
source code to XML using BeautyJ [Gulden 2004], then manipulates the XML
representation by means of an XSLT stylesheet using Xalan [Apache 2004] to generate basic
(‘level 0’) structure information. Behavioural information is extracted dynamically by
generating an event trace. VANESSA incorporates a custom-built JPDA-based tracing utility
implemented using JDI [Sun 2004a]. The trace generated contains the level 0 information for
the behaviour hierarchy.
The next stage is to parse the generated level 0 information to produce level 1 and level 2
information. Once this process is completed, the abstraction mappings are applied to
generate the higher-level views and the abstraction relationships between them. Expert
mappings are read from text files if available. The ten ‘basic’ views comprising the model
hierarchies (see Table 6.1) can now be output to files. The user can also specify any focussed
view or combination of views to be generated, as specified in Section 6.3.4. The dot format
[Gansner 2002] is currently used for output, though the generic nature of the model
implementation allows any output format to be conveniently plugged in, such as UML. The
output can now be viewed using a viewer such as dotty [Koutsofios 1996a]. The analysis
process is illustrated in Figure 6.4.
Figure 6.4 The VANESSA analysis process
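The final output stage can be sketched as a minimal dot emitter. The node and edge labelling here is illustrative; the text does not show VANESSA's actual dot output:

```java
import java.util.*;

// Minimal sketch of emitting a level 3 Behaviour view in the dot format
// consumed by viewers such as dotty: one node per component, one labelled
// edge per inter-component usage.
public class DotEmitter {
    static String toDot(String name, List<String[]> usages) {
        StringBuilder sb = new StringBuilder("digraph " + name + " {\n");
        for (String[] u : usages) {
            sb.append("  \"").append(u[0]).append("\" -> \"")
              .append(u[1]).append("\" [label=\"usage\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toDot("Behaviour3", List.of(
            new String[]{"Tools", "Figures"},
            new String[]{"Drawing", "Figures"})));
    }
}
```

Because dot is a plain-text graph language, any view in the model reduces to emitting its entities as nodes and its relationships as labelled edges, which is consistent with the claim that other output formats can be conveniently plugged in.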
6.6.2 Example analyses
This section demonstrates a number of example analyses of JHotDraw to illustrate the
capabilities of VANESSA and the underlying model. A simple example is presented first,
followed by examples of each type of combination described in Section 6.3.4. This serves to
illustrate both a selection of the basic views and the possible types of combination. In the
combination figures, less abstract entities are depicted nested within more abstract entities.
An example application of one of the basic views is investigating behavioural interactions
between components in a system. The level 3 view from the Behaviour hierarchy illustrates
this information. This view is shown in Figure 6.5.
Figure 6.5 The level 3 Behaviour view of JHotDraw. Arcs denote usage
An example application of combining information from the same level of each hierarchy is
to produce a unified visualisation of the structural and behavioural characteristics of the
system, for example at the component level. This is accomplished in VANESSA by
combining the level 3 views from the Structure and Behaviour hierarchies. Part of the result
is illustrated in Figure 6.6.
Figure 6.6 Combining views from the same level of each hierarchy. In the combined view, solid arcs
denote usage and dashed arcs denote dependency
An example application of combining information from different levels of the same
hierarchy is to reveal the lower-level interactions responsible for the system’s high-level
behaviour, such as the inter-component interactions responsible for some business level
behaviour. This is accomplished in VANESSA by combining the level 3 and level 5 views
from the Behaviour hierarchy. Part of the result is illustrated in Figure 6.7.
Figure 6.7 Combining views from different levels of the same hierarchy. In the combined view, arcs
between components denote usage and arcs between business entities denote business rules
An example application of combining information from different levels of each hierarchy is
to investigate the structural elements responsible for some high-level behaviour, such as the
inter-class relationships responsible for some component level behaviour. This is
accomplished in VANESSA by combining the level 2 view from the Structure hierarchy and
the level 3 view from the Behaviour hierarchy. Part of the result is illustrated in Figure 6.8.
Figure 6.8 Combining views from different levels of each hierarchy. Between classes: solid arcs
denote association; dashed arcs denote extension; dotted arcs denote inheritance. Between
components: arcs denote usage
6.6.3 Comparison with other software visualisation tools
Of the software visualisation tools discussed in Section 2.2, those with the most
commonality in functionality with VANESSA are Dali, Shimba, and SHriMP. The similarity
with Dali lies in its ‘view fusion’ functionality, in which links are established between views
from different sources in order to show complementary information. This typically involves
the combination of information from static and dynamic extractors. VANESSA supports the
combination of statically and dynamically extracted information, and allows the combination
of structural and behavioural information. VANESSA also differs from Dali in that
VANESSA provides a range of abstraction levels, while Dali visualises only architectural
level information.
Like VANESSA, Shimba extracts structural information statically and behavioural
information dynamically and allows the views generated from this information to be
combined. In contrast to VANESSA, Shimba addresses only a limited range of abstraction
levels by visualising inter-class to architectural level information, though this is a wider
range than the single level offered by Dali.
SHriMP achieves view combination by means of a fisheye view approach which allows the
analyst to show parts of a diagram at a lower level of abstraction while retaining context.
This differs from the current VANESSA implementation which makes use of basic graphs
where all entities and relationships are at one of a defined set of abstraction levels (i.e. the
levels of abstraction of the views from which the combined view was generated). SHriMP
visualises inter-class to architectural level information, providing a similar range of abstraction levels to Shimba, together with linkage to source code; even so, this range is narrower than that provided by VANESSA.
6.7 Summary
The first part of this chapter described and demonstrated a fully specified abstraction model
for software visualisation. The abstraction relationships in the model were defined formally,
and the model was applied to an application built using the JHotDraw object-oriented
framework to demonstrate its use. The combination of information from multiple views of
the model was defined formally and demonstrated. Lastly, the models of the system
produced were compared with models produced by other sources. It is concluded that the
model presented is a practical and valid approach to visualising a software system. The
relationships between the model presented in this thesis and simulation and continuous
system abstraction techniques are discussed in Appendix E.
The second part of this chapter presented a tool implementation of the fully-specified model.
VANESSA was created to demonstrate the feasibility of the visualisation model, and to
allow a rigorous evaluation of the model to be performed. Having demonstrated the
feasibility of the approach, the effectiveness of the model in supporting software
comprehension will now be evaluated. The evaluation will employ a range of system types,
and will make use of typical software comprehension tasks, as in the initial study described
in Chapter 3. This will allow the effectiveness of the proposed visualisation model in
supporting large-scale, real world software comprehension to be assessed.
7 Evaluation
“As Minsky [Minsky 1965] observed a model is not simply a model, it is a model which can
answer certain questions about a certain object for a certain questioner.”
B P Zeigler [Zeigler 1984]
This chapter describes the evaluation of the software visualisation model as implemented in
VANESSA using typical software comprehension tasks. The purpose of this evaluation is to
explore the research hypothesis stated in Section 4.2, namely:
A model that supports visualisation of software through a range of abstraction levels
that incorporate structural and behavioural views and integrates statically and
dynamically extracted information will provide effective support for the full range of
software comprehension tasks.
7.1 Experimental setup
The model was evaluated using four systems – two small (JHotDraw and BeautyJ) and two
medium-sized (SHriMP and ArgoUML). A replication of the original study described in
Chapter 3 was also performed using the model. The systems were selected to provide a
variety of system types and sizes. For each system a set of comprehension questions was
obtained either from an expert (3 systems) or documentation (1 system). Where an expert
was employed, they were also asked to provide mappings between system elements as
described in Section 6.3.3. The experts were given some basic information on the model, an
explanation of what was required, and a set of example comprehension questions based on
the set of typical questions to demonstrate the style expected. The experimenter then
attempted to answer the experts’ questions using VANESSA. This setup was designed to
examine the conclusion from the original study that a visualisation model combining
structural and behavioural information and a range of abstraction levels would be useful in
addressing the majority of the tasks in that study, and to validate the research hypothesis that
the visualisation model developed is useful in addressing typical software comprehension
tasks in real world systems.
7.2 Comprehension questions
The typical software comprehension questions to be used in the evaluation as defined in
Section 5.4 are as follows.
General software comprehension questions
G1. What is the class structure of the software system?
G2. What interactions occur between objects?
G3. What is the high-level structure/architecture of the software system?
G4. How do the high-level components of the software system interact?
G5. What patterns of repeated behaviour occur at runtime?
G6. What is the load on each component of the software system at runtime?
G7. What impact will a change made to the software system have on the rest of the
software system?
Specific reverse engineering questions
S1. What are the collaborations between the objects involved in an interaction?
S4. Where is the functionality required to implement a solution located in the software
system?
S6. How does the state of an object change during an interaction?
The specific reverse engineering questions have been reduced from six to three in order to
clarify them: S2 was removed and is represented by S1; S3 and S5 were removed and are
represented by S4. It was felt that these questions were too framework-oriented and the
differences between them too subtle to be helpful.
7.3 Threats to validity
There are three principal types of validity threat that must be considered in this evaluation.
These are internal validity, construct validity, and external validity.
7.3.1 Internal validity
Internal validity is concerned with mitigating sources of bias in the experiment that would
affect the cause-effect process being studied [Bryman 1988]. In the case of the replication of
the original study, there is the possibility that the experience of the experimenter in
performing that original study would result in improved performance in the replication. This
was considered unlikely to affect validity as there was a temporal separation of almost three
years between these studies.
In the evaluation proper, there is the possibility that the experimenter’s experience with the
systems used in the evaluation would affect his ability to act as a ‘typical’ software
maintainer. To mitigate this risk, systems were chosen with which he had a range of
experience, namely as a reuser (JHotDraw), as a user (BeautyJ), familiarity with the concept
only (SHriMP), and no exposure (ArgoUML).
7.3.2 Construct validity
Construct validity is the extent to which a test actually measures what it purports to be
measuring [Kirk 1986, Litwin 1995, de Vaus 1996]. This study claims to be measuring the
extent to which the model is useful in addressing typical software comprehension tasks.
There are two potential threats to construct validity.
Firstly, it must be ensured that the tasks used in the study are indeed typical of those in real
world software comprehension. It has been defined previously what is meant by these typical
tasks – they are tasks that would commonly be encountered during the software
comprehension process. In the case of the two systems for which an expert produced
comprehension questions – BeautyJ and SHriMP – we can be confident that these questions
are typical as they were provided by people who are actively involved in comprehending and
maintaining these systems on an everyday basis. In the case of ArgoUML, where questions
were taken from documentation, we can again be confident of their validity as they are
intended to help developers in comprehending the system. Moreover, the questions from all
three sources are conveniently categorised according to the typical software comprehension
activities and questions presented in Chapter 5.
Secondly, it must be ensured that the implementation of the model in VANESSA is an
accurate representation of the approach, and does not add any overhead to inhibit the user or
indeed give them some undisclosed advantage. The implementation of VANESSA was
careful to adhere to these guidelines, hence minimising this potential threat to construct
validity.
7.3.3 External validity
External validity is concerned with the extent to which the results from this study can be
generalised [Bryman 1988]. There were three principal threats to external validity in this
study.
Firstly, it is possible that the comprehension questions were not sufficiently ‘typical’ to be
usefully representative of those that would be encountered in other studies. As described in
the previous section on construct validity, this risk was minimised firstly by gathering tasks
from actual software maintainers, and secondly by checking that these tasks conformed to
the typical activities observed in the literature.
Secondly, it is possible that the four systems studied are not typical of real world systems.
This typicality would be threatened by the type and size of the systems studied. This risk was
minimised by selecting a variety of system types – a graphical editor framework, a source
code processor, a visualisation tool, and a software modelling application – and sizes ranging
from 53 to 1165 core types8. While the system selection was limited to systems for which
expert information and source code were available, this was not overly restrictive as a great
deal of open source software is available that fulfils these criteria. Indeed, it could be argued
that the genre of open source software is particularly appropriate as it is widely subjected to
comprehension by a variety of maintainers.
Thirdly, it is likely that as the developer of the visualisation model and VANESSA tool, the
experimenter would have a better understanding of them than other users would and hence
may be more proficient in their use. As a result, other users using this approach to address
software comprehension tasks may be less successful. It was thought that this threat was
8 i.e. static types (interfaces, abstract classes, and concrete classes) that are not part of ancillary libraries
acceptable, as being an expert with the approach allows the experimenter to exploit it to its fullest potential, thus providing the most accurate assessment of its utility, though at some
potential cost to reproducibility and comparison with the results of the initial study described
in Chapter 3.
7.4 Subject systems
7.4.1 JHotDraw
The core of the JHotDraw version 5.1 framework contains 125 types. The JavaDrawApp
extension consists of 124 of these classes (applet.DrawApplet is not included) plus 10 of its
own types, resulting in a total of 134 types. JHotDraw was created over 2-3 months by two
researchers; it has since become an open source project. The system expert has five years' software engineering experience, and has been working on JHotDraw for the same length of
time. The experimenter’s familiarity with this system was limited to a small reuse effort as
part of a coursework assignment four years ago, the original study three years ago, and more
recently during the manual application of the model described in Section 6.5. As in the
original study, mappings were provided by the system expert.
7.4.2 BeautyJ
BeautyJ is a Java source code transformation utility [Gulden 2004]. It performs auto-
formatting (beautification) of Java source code; input and output can be either standard
source code files or XML. The principal use cases of BeautyJ are therefore ‘Perform
beautification’ and ‘Read/write XML’. BeautyJ can be executed in batch mode from the
command line or via a GUI. The GUI was used in generating the trace to allow both use
cases to be exercised in a single trace. The trace execution consisted of:
1. Start BeautyJ
2. Configure BeautyJ to generate beautified source code from source code
3. Perform the beautification
4. Configure BeautyJ to output XML from source code
5. Perform the XML conversion
6. Exit BeautyJ
BeautyJ version 1.1 contains 53 types, excluding ancillary libraries which were omitted from
the analysis in order to focus on the core of the system. BeautyJ is three years old; during
that period, approximately four months were spent on its development. The expert was the
sole developer, who has eight years' software engineering experience. The experimenter's
experience with BeautyJ was limited to invoking it in batch mode from VANESSA to
generate XML representations of source code (as described in Section 6.6.1). A screenshot
of BeautyJ is shown in Figure 7.1. The system’s package structure was used to identify class-
component mappings due to its functional structure. Component to use case mappings were
not available, though there would have been little interest in level 5 diagrams as the system
has only two use cases. The BeautyJ comprehension questions and answers were provided
by the BeautyJ expert.
Figure 7.1 A screenshot of the BeautyJ options dialogue
7.4.3 SHriMP
SHriMP is a visualisation system for hierarchical information [Storey 1995]. A number of
applications implementing the system have been produced, such as the Creole Eclipse plugin
[CHISEL 2005] for visualising Java code and the standalone SHriMP application [Storey 2001] for displaying graph based data. In the trace, the standalone SHriMP application was
used to explore a simple example program. The trace execution consisted of opening the
standalone SHriMP application, exercising the following use cases identified by the system
expert, then exiting the application:
1. Load a GXL [Winter 2002] file into SHriMP
2. Layout a node’s children in a tree
3. Search for and zoom to a particular node
4. Change the colour of an arc type
5. Export an SVG/HTML snapshot of the current view to a file
Standalone SHriMP version 2.1.11 contains 455 types, excluding ancillary libraries which
were not analysed. SHriMP has been under development for five years. The development
team typically consists of two or three developers – a mix of students and programmers. The
system expert was a programmer who has been maintaining SHriMP for four years and is
currently the sole maintainer. He has four years' software engineering experience. Although the experimenter was broadly familiar with the ideas behind the approach, he had no prior
experience with SHriMP. A screenshot of SHriMP is shown in Figure 7.2. Mappings and
comprehension questions and answers were provided by the system expert.
Figure 7.2 A screenshot of the SHriMP application
7.4.4 ArgoUML
ArgoUML is a UML modelling application [ArgoUML 2005]. The trace consisted of starting
the application, creating a class, creating an association, and exiting the application.
ArgoUML version 0.18.1 contains 1165 types, excluding ancillary libraries which were not
analysed. ArgoUML has been under development for ten years; it has been an open source
project for the past six years. The experimenter had no prior experience with ArgoUML. A
screenshot of ArgoUML is shown in Figure 7.3. Class-component mappings were derived
from the package structure and documentation; component-use case mappings were not
available. The ArgoUML comprehension questions and answers were extracted from a
developers’ guide to the system edited by two of the system owners (the ArgoUML
Cookbook) [Tolke 2005].
Figure 7.3 A screenshot of ArgoUML
7.5 Findings
This section presents the findings from the evaluation. Comprehensive details are given in
the evaluation logbook in Appendix G.
7.5.1 Finding 1
Finding: VANESSA was able to address the majority of the tasks in the original study.
Justification: VANESSA’s performance of 79% in the replication
In the original study the performance of six visualisation tools was compared by assessing
their performance in addressing typical comprehension tasks in JHotDraw. This study
revealed that different tools were capable of answering different questions, and led to the
hypothesis that a tool that made use of structural and behavioural information in combination
with abstraction should be able to address the majority of these typical comprehension
questions. In order to investigate if this goal has been achieved, a replication of the original
study was performed using VANESSA9.
Eight use cases were identified by the system expert:
1. Add a new figure
2. Animate the drawing
3. Delete an existing figure
4. Edit an existing figure
5. Load or save a drawing
6. Print the drawing
7. Select an existing figure
8. Select a tool
The trace consisted of starting the JavaDrawApp JHotDraw application, executing these use
cases, then exiting the application. The mappings used were those provided by the system
expert in the original study.
9 As a replication was being performed, the original question sets from the original study described in Section 3.2 and 3.3 were used, not the revised question sets used in the remainder of this evaluation.
Here some examples of how VANESSA was used to answer the replication questions are
presented. The full replication is detailed in the evaluation logbook in Appendix G.
Large scale question 1: What is the static structure of the software system?
Level 2 of the Structure hierarchy illustrates inter-class information. To answer this question
the S2 view10 is generated, part of which is shown in Figure 7.4. As explained earlier, the
information in the diagrams presented in this evaluation could also be presented
conveniently in other diagram formats, such as UML.
Figure 7.4 A part of the S2 view of JavaDrawApp
Large scale question 4: How do the high-level components of the software system interact?
Level 3 of the Behaviour hierarchy illustrates component interaction information. To answer
this question the B3 view shown in Figure 7.5 is generated.
10 Individual views are referenced by the initial letter of their containing hierarchy, followed by their abstraction level. Thus, ‘S2’ refers to the view at level 2 of the Structure hierarchy (inter-class structure). Combinations are denoted by listing the views that they comprise, delimited by a slash. The ordering is not significant, but for sake of consistency the convention adopted here is to list the less abstract view first; where a combination consists of two views from different hierarchies, the Structure view is listed first. Thus, an S2/B3 combination refers to the combination of the level 2 Structure and level 3 Behaviour views. The word ‘custom’ is prepended to the name of a view or combination that has been focussed – i.e. does not contain all of the entities and relationships contained in the view(s) from which it is derived. Thus, a custom S1 view would be a subset of the level 1 Structure view.
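The naming convention in this footnote can be made concrete with a small illustrative parser. This is not part of VANESSA; the `ViewName` and `ViewRef` types are hypothetical.

```java
import java.util.*;

// Illustrative parser for the view-naming convention used in this chapter:
// 'S2' is level 2 of the Structure hierarchy, 'B3' is level 3 of the
// Behaviour hierarchy, 'S2/B3' is their combination, and a 'custom '
// prefix marks a focussed (subset) view.
class ViewName {
    record ViewRef(char hierarchy, int level) {}   // 'S' or 'B', levels 1-5

    final boolean custom;
    final List<ViewRef> views = new ArrayList<>();

    ViewName(String name) {
        custom = name.startsWith("custom ");
        String body = custom ? name.substring("custom ".length()) : name;
        for (String part : body.split("/")) {
            views.add(new ViewRef(part.charAt(0),
                                  Integer.parseInt(part.substring(1))));
        }
    }
}
```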
Figure 7.5 The B3 view for JavaDrawApp
Small scale question 6: When debugging a JHotDraw application, it may be important to
examine the internal state of objects in the diagram. For example, in a class diagram
application, a Figure object representing a class would contain references to the attributes,
operations, and associations of the class it represents. In order to extract such information,
it is necessary to investigate the way in which an object’s state changes during the course of
an execution.
A B1 diagram is generated showing how RoundRectangleFigure changes when it is resized,
shown in Figure 7.6. It can be seen from this figure that
RoundRectangleFigure(id=1918).fDisplayBox changes value from Rectangle(id=1951) to
Rectangle(id=1960) after a call to RoundRectangleFigure.basicDisplayBox(Point, Point)
changes the display box, thus setting the new dimensions of the figure.
Figure 7.6 The custom B1 view of RoundRectangleFigure
7.5.1.1 Replication summary
Using VANESSA it was possible to answer 11 of the 14 questions. Of the three questions that could not be answered, two (detecting design patterns and hotspots) could not be answered by any of the tools in the original study. The remaining question (pattern detection) was answered by only one of the tools in the original study.
VANESSA’s large-scale performance was 6/9; the best result from the original study was
4/9, while the average was less than 3/9. VANESSA’s small-scale performance was 5/5; the
best result from the original study was 5/5 (one tool achieved this), while the average was
less than 3/5. This results in an overall performance of 79% for VANESSA; the best result
from the original study was 53%, while the average was 37%.
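The overall percentage can be checked directly from the large-scale and small-scale scores:

```java
// Quick check of the overall replication score reported above:
// 6 of 9 large-scale plus 5 of 5 small-scale questions answered.
public class ReplicationScore {
    public static void main(String[] args) {
        int answered = 6 + 5;          // large-scale + small-scale successes
        int total = 9 + 5;             // questions attempted
        long pct = Math.round(100.0 * answered / total);
        System.out.println(pct + "%"); // prints 79%
    }
}
```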
It is concluded from these results that the original proposal that a tool combining structural
and behavioural information with abstraction would be able to answer almost all of the case
study questions was indeed correct. This result supports the hypothesis that such a tool
should be able to address the full range of software comprehension questions. The evaluation
in the remainder of this section investigates this hypothesis further.
7.5.2 Finding 2
Finding: The VANESSA diagrams are correct and complete
Justification: The VANESSA diagrams are compared with diagrams from other authoritative
sources, i.e. diagrams produced by Together and by the system experts and from the system
documentation. Where there are differences, they are explained. VANESSA often uncovers
information missed by the expert, or conversely highlights errors in the expert diagram.
This evidence is taken from the comparison of VANESSA’s diagrams with the expert’s
diagrams. The second BeautyJ documentation diagram is entitled Main Classes and
illustrates structural information about the main classes of the application (Figure 7.7).
Figure 7.7 The BeautyJ documentation main classes diagram
The VANESSA custom S2/S3 combination showing only these classes (Figure 7.8) matches the
documentation diagram (it is assumed that the interface SourcletOption in the expert’s
diagram is a typo for SourcletOptions), with one exception: the VANESSA diagram does not
include an association Sourclet → SourcletOptions. Inspecting the JavaDoc for these interfaces shows that there is indeed no structural dependency Sourclet → SourcletOptions;
the only structural interaction between these interfaces is that Sourclet.init() takes an
argument of type SourcletOptions. VANESSA does not consider method arguments when
forming relationships. It could be that the expert has chosen to create a dependency due to
this argument, though this is not normal practice in UML.
Figure 7.8 A custom S2/S3 view of BeautyJ
The Together class diagram for shrimp.DisplayBean (Figure 7.9) is compared to the
corresponding VANESSA custom S2 view (Figure 7.10). The diagrams correspond entirely.
Figure 7.9 The Together class diagram for the shrimp.DisplayBean package
Figure 7.10 The custom S2 view of the shrimp.DisplayBean package
7.5.3 Finding 3
Finding: The range of abstraction levels was comprehensive and useful for addressing the
tasks
Justification: It was useful to be able to examine the systems at a level of abstraction
appropriate to the task
When examining JHotDraw, level 5 information was used to compare VANESSA’s
interpretation of the system’s business level behaviour with that of the expert. The expert’s
use case diagram is shown in Figure 7.11. The expert’s diagram is compared with the
VANESSA level 5 structural and behavioural diagrams (Figure 7.12). It is clear from these
diagrams that the VANESSA model contains more relationships than the expert’s model,
probably due to the expert’s model being a more idealised view of the system as it was
designed, whereas the VANESSA model is more precise.
Figure 7.11 The JHotDraw expert’s use case diagram
Figure 7.12 The S5 and B5 views of JavaDrawApp
It is interesting to note that the S5 and B5 VANESSA diagrams are identical and that both
are totally connected – i.e. every business entity has both structural and behavioural
dependencies on every other business entity. This may indicate that the concept of business
entities defined here as being entities derived by abstraction from basic software entities does
not accord with the conventional notion of a use case as accepted by most analysts. Another
possibility is that the component-use case mappings supplied by the JHotDraw expert were
too liberal in that they mapped each class to many components. A further possibility is that
this is a quirk of frameworks in general or of JHotDraw in particular.
Level 4 information was not used as none of the four systems examined were distributed,
either logically (across JVMs) or physically (across processors).
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?” level 3 information was
used. The expert’s first diagram for this question illustrates static behavioural relationships
(Figure 7.13). The B3 diagram is generated to show behavioural relationships between
components (Figure 7.14).
Figure 7.13 The expert’s diagram of the static behavioural relationships between the BeautyJ
components
The VANESSA diagram matches the expert's diagram, except that the BeautyJ → AMODA relationship is reversed and there is an extra communication from SourceParser → Sourclet.
Some discrepancies are to be expected as the VANESSA diagram is based on dynamic
information. (One of the expert’s subsequent diagrams confirms that there should indeed be
a relationship from AMODA → BeautyJ as there is in the VANESSA diagram.)
Figure 7.14 The B3 view of BeautyJ
Level 2 information was used extensively in the analysis of ArgoUML, such as in addressing
the tasks in Cookbook section 5.2. For example, in addressing the task “How do I create a
new critique?”, the likely-sounding cognitive.critics package was investigated first by
generating a custom S2 view (Figure 7.15).
Figure 7.15 The custom S2 view of the cognitive.critics package
This package contains the Critic class and the CompoundCritic class which extends Critic.
However, it does not appear to contain any actual critics, which it is assumed would extend
one of these classes and have an appropriate name. As we are dealing with UML, the uml
package is examined and it is found that it also contains a cognitive.critics package. This
package is investigated by generating a custom S2 view. From this, we find that this package
contains 92 classes with names in the form CrXXXX, where XXXX is a potential problem
with a UML diagram, for example CrCircularInheritance, CrEmptyPackage, CrIllegalName.
All of these classes extend CrUML, which in turn extends cognitive.critics.Critic. To add a new critic for UML, a new class extending CrUML would be created in uml.cognitive.critics and called, say, CrMyCritic to comply with the naming convention.
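A skeleton of such a critic might look as follows. This is a sketch only: CrUML's actual interface is not visible in the views, so a minimal stub stands in for it here, and the predicate method shown is assumed rather than taken from ArgoUML.

```java
// Stub standing in for ArgoUML's uml.cognitive.critics.CrUML, whose
// real interface is not shown in the VANESSA views.
abstract class CrUML {
    abstract boolean critique(Object designMaterial);
}

// A new critic follows the CrXXXX naming convention and extends CrUML.
// The class body and the check performed here are hypothetical.
class CrMyCritic extends CrUML {
    @Override
    boolean critique(Object designMaterial) {
        // Hypothetical trigger condition; a real critic would inspect
        // the UML model element it is given.
        return designMaterial == null;
    }
}
```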
Level 1 Behaviour information was used to examine the state changes occurring in
JHotDraw when addressing the following task (see Figure 7.6 in Finding 1, above).
“When debugging a JHotDraw application, it may be important to examine the internal state
of objects in the diagram. For example, in a class diagram application, a Figure object
representing a class would contain references to the attributes, operations, and associations
of the class it represents. In order to extract such information, it is necessary to investigate
the way in which an object’s state changes during the course of an execution.”
Level 1 Structure information was used to investigate the functionality of individual classes
in ArgoUML when addressing the task “How do I create a pluggable diagram?”.
application.api contains an interface called PluggableDiagram, for which a custom S1 view
is generated (Figure 7.16).
Figure 7.16 The custom S1 view of PluggableDiagram
This diagram shows that PluggableDiagram contains one method: JMenuItem
getDiagramMenuItem(). The S2 view of ArgoUML shows that PluggableDiagram is
implemented by only one class: DiagramHelper. DiagramHelper extends ui.ArgoDiagram. It
appears that pluggable diagrams are diagram classes that implement PluggableDiagram.
Therefore, the new class would extend an appropriate parent class, such as
uml.diagram.ui.UMLDiagram for a new type of UML diagram, and also implement
PluggableDiagram.
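Based on the structure inferred above, a new pluggable diagram type might be sketched as follows. The getDiagramMenuItem() signature is taken from the S1 view; the UMLDiagram stub and the menu item label are assumptions made for illustration.

```java
import javax.swing.JMenuItem;

// Stub standing in for ArgoUML's uml.diagram.ui.UMLDiagram parent class.
abstract class UMLDiagram {}

// Interface signature taken from the S1 view of
// application.api.PluggableDiagram.
interface PluggableDiagram {
    JMenuItem getDiagramMenuItem();
}

// Hypothetical new diagram type: extends a diagram parent class and
// implements PluggableDiagram, as the views suggest.
class MyPluggableDiagram extends UMLDiagram implements PluggableDiagram {
    @Override
    public JMenuItem getDiagramMenuItem() {
        return new JMenuItem("My Diagram");   // label is an assumption
    }
}
```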
7.5.4 Finding 4
Finding: The ability to navigate between abstraction levels was useful in addressing the tasks
Justification: It was useful to be able to drill down to show more detail, and to move up the
abstraction hierarchy to examine context
In examining BeautyJ, drilling down was used from the level 3 diagram showing the
component behaviour generated while addressing the task “How are BeautyJ's main
components interfaced to each other statically, and how do they interact dynamically at
runtime?” (see Figure 7.14 in Finding 3, above) to show the class structure (level 2) of the
javasource package (Figure 7.17) in order to address the task “Which steps need to be taken
to make BeautyJ capable of handling new features of the Java 1.5 language?”. It is apparent
from the S2 view of BeautyJ that the classes of the util.javasource package are used to
represent parsed Java code in BeautyJ; hence, changes to the language would be
accommodated in this package.
Figure 7.17 A custom S2 view of BeautyJ
In the case of ArgoUML, the analyst moved up the abstraction hierarchy from the S1 view of
application.api.PluggableDiagram generated while addressing the task “How do I create a
pluggable diagram?” (see Figure 7.16 in Finding 3, above) to the S2 view of the
application.api package (Figure 7.18) in order to address the task “How do I create a new
pluggable type?”.
Figure 7.18 The custom S2 view of the Pluggable types from application.api
Figure 7.18 illustrates the Pluggable interface and eight classes that implement it. It appears
that to create a new pluggable type a new class that implements Pluggable must be created,
called say PluggableNew to comply with the naming conventions. The cookbook explains
that calls must also be added to the new pluggable type in the context in which it is to be
used – this involves editing method internals. This would only be known to someone with an
in-depth understanding of the system.
7.5.5 Finding 5
Finding: The ability to combine abstraction levels was useful in addressing the tasks
Justification: It was useful to be able to display more detailed or contextual information in a
single view
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?” combined S2/S3
diagrams were used to illustrate the class and component dependencies (see Figure 7.8 in
Finding 2, above).
7.5.6 Finding 6
Finding: The structural and behavioural facets were useful in addressing the tasks
Justification: It was useful to be able to focus the analysis on the relevant facet of the system.
In the analysis of ArgoUML, it was useful to be able to focus the analysis on the structural
facet as the tasks mainly involved investigating the existing functionality available, rather
than the system’s behaviour. For example, when comparing the VANESSA diagram to the
expert’s diagram to explore the concept of multi editor panes, a custom S2 view of the
relevant classes was used (Figure 7.19).
Figure 7.19 The custom S2 view of the multi editor pane classes
In addressing the BeautyJ task “How are textual fragments of Javadoc documentation
automatically generated by the StandardSourclet?”, it was useful to be able to focus the
analysis on the behavioural facet, as we were concerned solely with the existing behaviour of
the system as implemented. The VANESSA custom B2 view of StandardSourclet is shown
in Figure 7.20. This figure shows the methods executed by StandardSourclet during the
output generation process, and the order in which they occur. It would be trivial to present
this information in another form, such as a sequence diagram.
Figure 7.20 The custom B2 view of StandardSourclet’s interactions
7.5.7 Finding 7
Finding: The ability to combine facets was useful in addressing the tasks
Justification: It was useful to be able to show both structural and behavioural
information in the same diagram
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?”, level 3 information from
both facets was combined to illustrate the structural and behavioural interactions between the
components of BeautyJ (Figure 7.21).
Figure 7.21 The combined S3/B3 view of BeautyJ
7.5.8 Finding 8
Finding: The use of static and dynamic analyses was useful in addressing the tasks
Justification: It was useful to have information about the entire system from static analysis,
and more detailed information about specific parts of interest from dynamic analysis
The statically generated VANESSA S2 view, generated when addressing the BeautyJ task
“Which steps need to be taken to make BeautyJ capable of handling new features of the Java
1.5 language?” (see Figure 7.17 in Finding 4, above) shows the de.gulden.util.javasource
package from BeautyJ. This is useful as it allows the entire functionality available to be
examined. In contrast, when comparing the VANESSA and Together diagrams for SHriMP,
it can be seen that the VANESSA B2 view for GXLPersistentStorageBean.loadData()
(Figure 7.23) is more precise than the corresponding sequence diagram produced by
Together (Figure 7.22), in that it refers to the specific objects accessed during the execution,
rather than to their superclasses as the static Together diagram does. The VANESSA
diagram matches the Together diagram except for two missing calls: the calls from
GXLPersistentStorageBean to GXLPersistentStorageBean.getNextID() and to
GenericRigiArc.setCustomizedData() are absent, as they are contained within conditionals
that were evidently not executed in the trace. This is useful
as the dynamic diagram can be used to see which parts of the static diagram are executed in
the trace.
Figure 7.22 The Together sequence diagram for GXLPersistentStorageBean.loadData()
Figure 7.23 A custom B2 view of SHriMP
7.5.9 Finding 9
Finding: More diagram types may be useful
Justification: Some information is better displayed using diagrams other than basic graphs
Tasks that would benefit from a more explicit ordering of the answer, such as the following
JHotDraw task, would be better expressed using a representation ordered explicitly by time,
such as a sequence diagram:
“A common problem in JHotDraw applications is the display not being updated as desired
when a change is made to the model. For example, attempting to move a box (Figure) in an
organisation chart application may not be reflected in the display. To understand this
problem, it is necessary to investigate the redraw mechanism of JHotDraw. The redraw
mechanism is an interaction consisting of a sequence of object collaborations.”
Part of the VANESSA B2 view is shown in Figure 7.24. In this diagram, time is indicated by
the numbers after the method calls, which indicate their ordering. As the ordering of the
edges in the diagram is arbitrary (optimised to avoid node/edge overlap, point edges in the
same direction, minimise edge crossings, and reduce edge length), the order in which the
method invocations occurred is not immediately obvious. In a time-ordered diagram, the
edges would be drawn in order of their numbers, thus making the ordering obvious to the
viewer. This would, of course, be an alternative to the present layout and as such would not
feature the overlap-avoidance, monodirectionality, and minimal edge crossings and edge
lengths of the current layout. Hence, it is important to select a layout that is appropriate both
for the data being displayed and the purpose for which the visualisation is being produced.
The generic nature of the model and the modular implementation of VANESSA make
plugging in alternative visualisations, such as UML diagrams, a straightforward process.
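The time-ordering described above can be recovered from the numbered edges by a simple sort; the edge representation and method names below are illustrative simplifications, not VANESSA's internal model or the actual JHotDraw trace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A hypothetical call edge as it might appear in a B2 view: caller, callee,
// and the sequence number attached to each method call.
public class TimeOrderedView {
    record CallEdge(String caller, String callee, int seq) {}

    // Sorting on the sequence number recovers the temporal ordering that an
    // arbitrary graph layout obscures.
    static List<CallEdge> timeOrdered(List<CallEdge> edges) {
        List<CallEdge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingInt(CallEdge::seq));
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative redraw-style interactions, deliberately out of order.
        List<CallEdge> edges = List.of(
            new CallEdge("DrawingView", "Figure.moveBy", 2),
            new CallEdge("Figure", "FigureChangeListener.figureChanged", 3),
            new CallEdge("StandardDrawing", "DrawingView.repairDamage", 1));
        for (CallEdge e : timeOrdered(edges)) {
            System.out.println(e.seq() + ": " + e.caller() + " -> " + e.callee());
        }
    }
}
```

A sequence-diagram renderer plugged into the model would essentially perform this sort before laying out the lifelines.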
Figure 7.24 A part of the custom B2 view
7.5.10 Finding 10
Finding: Levels 1-3 were used most commonly. Level 5 was used less often. Level 4 was not
used.
Justification: The frequency of employing each level in addressing the tasks.
This is apparent from the detail in the evaluation logbook in Appendix G. Table 7.1
quantifies this.
Table 7.1 Instances of usage of each of the five abstraction levels of the model in addressing the
comprehension tasks
Instances of usage11

Abstraction level   Replication12   Comprehension tasks13   Diagram comparisons14   Total
5                   -               -                       1                       1
4                   -               -                       -                       -
3                   2               3                       8                       13
2                   7               26                      12                      45
1                   2               19                      -                       21
Figure 7.25 illustrates these results.
[Bar chart: number of tasks (0-50) by abstraction level (1-5), with series for Replication, Comprehension tasks, Diagram comparisons, and Total]
Figure 7.25 Illustration of abstraction level usage
11 Count of how many tasks each abstraction level was useful in addressing. Hence, where an abstraction level was used more than once in addressing a particular task, this contributes only one to its score. Where a combination was used, this contributes one to the score of each participating view.
12 The replication of the original study using JHotDraw.
13 The comprehension tasks provided by the system experts in the case of BeautyJ and SHriMP, and from the Cookbook for ArgoUML.
14 The diagram comparisons of the VANESSA diagrams with diagrams from the authoritative sources (documentation or system expert); the comparisons with Together diagrams are counted in the ‘Comprehension tasks’ column as these pertain specifically to tasks G1 and G2.
7.5.11 Finding 11
Finding: The model proposed is capable of addressing the full range of software
comprehension tasks.
Justification: VANESSA was used successfully to address tasks representative of the full
range of software comprehension tasks.
VANESSA was able to address all of the comprehension tasks posed by the system experts
(the Cookbook in the case of ArgoUML) and by the diagrams from the documentation and
Together, except for two ArgoUML tasks concerning a non-code artefact.
The categorisation of the evaluation questions according to the typical software
comprehension questions identified earlier is given in the following tables.
Table 7.2 Categorisation of BeautyJ evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Expert 1 G4
Expert 2 G7
Expert 3 S6
Expert 4 S1, S4
Expert 5 S4
Documentation 1 G3
Documentation 2 G1
Documentation 3 G1
Documentation 4 G1
Together 1 G1
Together 2 G2
This table shows that the set of tasks used in the BeautyJ evaluation exercises almost all of
the typical comprehension questions. Only G5 (identifying patterns of repeated behaviour,
which VANESSA does not support) and G6 (component runtime load, which VANESSA does
support) were not present.
Table 7.3 Categorisation of SHriMP evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Expert 1 G3/G4
Expert 2 S4
Together 1 G1
Together 2 G2
This table shows that half of the typical software comprehension questions were addressed
by the tasks performed in the SHriMP evaluation. This was due to the low number of
questions, which was due primarily to the lack of documentation similar to that provided for
the other three systems and the time constraints of the system expert.
Table 7.4 Categorisation of ArgoUML evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Cookbook 4.4 G3/G4
Cookbook 4.5 G3/G4
Cookbook 4.6 G3/G4
Cookbook 5.1.3.2.1 S4
Cookbook 5.1.3.2.2 S4
Cookbook 5.1.3.2.3 S4
Cookbook 5.1.5 i S4
Cookbook 5.1.5 ii S4
Cookbook 5.1.5 iii S4
Cookbook 5.2.1 G1
Cookbook 5.2.2 i S4
Cookbook 5.2.2 ii S4
Cookbook 5.2.3 G1
Cookbook 5.3.1 G1
Cookbook 5.3.1.1 S4
Cookbook 5.3.2 S4
Cookbook 5.3.3 S4
Cookbook 5.4.1 S4
Cookbook 5.9 G3/G4
Cookbook 5.11.2.1 i S4
Cookbook 5.11.2.1 ii S4
Cookbook 5.17.4 S4
Cookbook 6.1.1 i S4
Cookbook 6.1.1 ii S4
Cookbook 6.2.2.6 S4
Cookbook 6.2.3.2 i S4
Cookbook 6.2.3.2 ii S4
Cookbook 6.2.3.2 iii S4
Together 1 G1
Together 2 G2
This table shows that the comprehension questions in the ArgoUML study exercise half of
the typical software comprehension questions. The focus on S4 (“Where is the functionality
required to implement a solution located in the software system?”) questions is to be
expected given that the source of the questions was a developers’ handbook intended to
explain how to reuse, extend, and interoperate with the application.
7.5.12 Miscellaneous issues
The VANESSA diagrams were verified by comparison with diagrams from a reliable source
(Together). VANESSA uncovered multiple errors in the expert-supplied information for the
two most thoroughly-analysed systems (BeautyJ and ArgoUML). In addition to these errors,
there were discrepancies between some diagrams that were not errors on the part of
VANESSA or the expert. In these cases, the expert had chosen to represent a simpler, purer,
or ‘as-designed’ version of the system, while VANESSA represents a more detailed view
with all information that describes the system as implemented. This raises the question: is it
more useful to be precise or to highlight the important relationships? Examples include the
representation of arguments as associations and the omission of minor relationships for
BeautyJ, discrepancies in the component diagrams for the other systems, and discrepancies in
the use case diagram for JHotDraw. As only a human can properly decide what is important,
it is better to be precise and then allow the human analyst to abstract further by generating
their own custom diagrams for their own particular purposes, rather than automatically
making potentially misleading assumptions.
In addition to the types of discrepancies described above, there were also some discrepancies
that were attributable to the greater precision of the VANESSA behavioural diagrams due to
their basis on dynamic rather than static information.
Some diagram discrepancies were due to the use of unparameterised collections – the system
expert would indicate a composition between the class holding the reference and the class
that he intends to be stored in that collection. However, in practice any type of object can be
stored in an unparameterised collection, and VANESSA quite correctly represents such
dependencies as a composition between the class holding the reference and the collection
class. If parameterised collections were used VANESSA would indicate a dependency to the
parameterised type.
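The distinction can be illustrated with a minimal example: with a raw collection the only type a static analysis can soundly report is the collection class itself, whereas a parameterised declaration exposes the element type. The class names here are illustrative, not taken from the evaluated systems.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionDependencies {
    static class SourceClass {}

    // With a raw List, any object may legally be stored, so a static
    // analysis can only report a dependency on the collection class.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawHolder() {
        List contents = new ArrayList();
        contents.add(new SourceClass());
        contents.add("not a SourceClass");   // legal: no element type declared
        return contents;
    }

    // With a parameterised List<SourceClass>, the element type is part of
    // the declaration, so the dependency on SourceClass is statically visible.
    static List<SourceClass> typedHolder() {
        List<SourceClass> contents = new ArrayList<>();
        contents.add(new SourceClass());
        // contents.add("not a SourceClass");  // would not compile
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(rawHolder().size());    // mixed element types
        System.out.println(typedHolder().size());
    }
}
```

This is why VANESSA's report of a composition with the collection class, rather than with the expert's intended element type, is correct for unparameterised code.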
As is the practice in UML, VANESSA does not represent method arguments as
dependencies between classes (only the BeautyJ expert chose to do so). This is also the
approach taken by Together.
There is occasional uncertainty in attributing the precise reason for some discrepancies
between the VANESSA and expert diagrams. This is inherent in the approach as one cannot
predict what the expert intended when he committed the error. The most likely explanation
for the discrepancy is offered.
Javadoc was used only to verify hypotheses, not to gather new information about the
systems. Explanatory comments in the Javadoc from the developers were not referred to – it
was used solely as an alternative to browsing the source code directly in order to verify
results (e.g. check the existence of a particular instance variable, ensure VANESSA has
covered all available methods).
A potential weakness in the VANESSA approach is the dependence on expert-supplied
mappings for some systems. This was particularly evident in the analysis of ArgoUML when
only partial class-component mappings were available – the information in the Cookbook
was incomplete and did not indicate which classes should be mapped to a number of
components. This issue can often be circumvented by determining the mappings from the
package structure, as was done for BeautyJ, or source code naming conventions.
Component-use case mappings can always be determined by tracing individual use cases.
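The package-based workaround can be sketched as a simple longest-prefix mapping. The package prefixes and component names below are illustrative only; they are not the actual BeautyJ class-component mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch of deriving class-to-component mappings from the package
// structure when no expert-supplied mapping is available.
public class PackageMapping {
    static final Map<String, String> COMPONENT_BY_PREFIX = new LinkedHashMap<>();
    static {
        // Hypothetical prefix-to-component assignments.
        COMPONENT_BY_PREFIX.put("de.gulden.util.javasource", "Parser");
        COMPONENT_BY_PREFIX.put("de.gulden.application", "Application");
    }

    // The longest matching package prefix wins, so subpackages inherit their
    // parent's component unless a more specific mapping is given.
    static String componentOf(String className) {
        String best = "Unmapped";
        int bestLen = -1;
        for (Map.Entry<String, String> e : COMPONENT_BY_PREFIX.entrySet()) {
            if (className.startsWith(e.getKey() + ".") && e.getKey().length() > bestLen) {
                best = e.getValue();
                bestLen = e.getKey().length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(componentOf("de.gulden.util.javasource.SourceParser"));
    }
}
```

Classes that fall under no known prefix remain unmapped, which mirrors the incomplete-mapping problem encountered with the ArgoUML Cookbook.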
7.5.13 Conclusions
The first conclusion that can be drawn from this evaluation is that it confirms the
conclusion from the original study described in Chapter 3: a model combining a range of
abstraction levels, structural and behavioural views, and statically and dynamically extracted
information is useful in addressing almost all of the typical software comprehension
tasks in that study. This was ratified through the replication of the original study.
The second conclusion that can be stated, indeed the principal goal of this evaluation, is that
the research hypothesis stated in Section 4.2 has been validated: the model provides
effective support for the full range of software comprehension tasks. This has been
demonstrated through the evaluation of the model using four real world systems of various
types and sizes and real comprehension questions supplied by real life software maintainers.
Furthermore, these questions exercise the full range of typical software comprehension tasks
identified in Chapter 5.
8 Conclusions
“We do not need to have an infinity of different machines doing different jobs. A single one
will suffice”
A M Turing [Turing 1948]
8.1 Summary
This thesis has presented a novel software visualisation model consisting of multiple levels
of abstraction, structural and behavioural perspectives, and the integration of statically and
dynamically extracted information that addresses the full range of software comprehension
tasks. Related work in the fields of software visualisation, tool evaluation, abstraction,
diagrams, views, exploration and querying, metamodels, and software modelling was
discussed. An initial case study that prompted the development of the novel model was
described. The model was introduced and assessed theoretically against its original goals,
and its support for software comprehension strategies was examined. Abstraction operations
between views in the model and the combination of views were defined formally. A
demonstration of the application of the model to a real system was presented. VANESSA, a
tool implementation of the model, was introduced. VANESSA was then used to evaluate the
utility of the model in addressing typical software comprehension tasks in real world
software systems. This section draws conclusions from the thesis and presents suggestions
for future work.
8.2 Conclusions
The foregoing evaluation has demonstrated the veracity of the hypothesis stated in this
thesis, namely that:
A model that supports visualisation of software through a range of abstraction levels
that incorporate structural and behavioural views and integrates statically and
dynamically extracted information provides effective support for the full range of
software comprehension tasks.
The contributions of this thesis are as follows: an abstraction scale and set of criteria for
classifying software comprehension tools; a thorough review and comparison of the extant
software visualisation tools; typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools; a schema for categorising view
arrangements in software engineering tools; the findings of an initial study assessing the
capabilities of the extant software visualisation tools using typical software comprehension
tasks; the novel software visualisation model based on a range of abstraction levels and
structural and behavioural perspectives; a prototype implementation of the model as the
VANESSA tool; and the findings of the evaluation of this model using real software
comprehension tasks and real systems.
8.3 Future work
The use of the typical comprehension questions defined in this thesis in future evaluations
would provide a common basis for real-world evaluation, and hence allow the results of such
studies to be compared objectively. Such studies would also validate the utility of the
question sets. Further empirical studies to compare software visualisation tools, like the
initial study described in Chapter 3 of this thesis, would provide useful information on the
relative merits of such tools.
Further evaluation of the model with additional systems would provide more evidence
regarding its support for software comprehension. Empirical studies with users would
provide a perspective on how easy analysts find it to use the model to achieve their
comprehension goals. An evaluation of the novel model involving industrial users would
provide valuable data, as the motivation for this work was the lack of use of software
visualisation in industry. It would also be interesting to investigate in more detail which
software comprehension strategies are best supported by the model.
A further evaluation possibility would be to evaluate VANESSA using even larger systems,
such as Eclipse (11,548 types), to assess how well the benefits to comprehension
demonstrated in the evaluation scale to the very largest systems. As with the model itself, an
evaluation employing real users would provide an insight into how usable the tool is in
practice for those who are unfamiliar with it.
A number of enhancements could be made to VANESSA to expand its functionality. Firstly,
the model could be stored on disk in a database, rather than entirely in memory, to improve
performance and aid scalability. The Metadata Repository format would be suitable for this
task [MDR 2005]. Use of this repository would also allow VANESSA to exchange models in
a standard XML format. Secondly, reading and writing traces in a standard (compressed)
format would allow further interoperability and aid scalability. Thirdly, the implementation
of new diagram types, such as UML and SHriMP views, would provide alternative views of
the model data. The model was designed from the start to allow alternative views to be
plugged in. The implementation of VANESSA makes it simple to plug in such alternative
views. Fourthly, integrating the graph rendering functionality into VANESSA, rather than
relying on external viewers such as dotty, would improve user interaction with the graphs
and simplify navigation of the model. A framework such as JUNG [O’Madadhain 2005] or
prefuse [Heer 2004] would be useful for this purpose. Fifthly, incorporating some standard
pattern detection algorithms into VANESSA would allow it to detect common patterns in the
model and exploit these with appropriate visual cues (cf. Jinsight). The addition of pattern
detection to the current implementation would simply be a matter of implementing standard
algorithms to operate on the model data, which is conveniently accessible.
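As a minimal illustration of the kind of standard algorithm meant here (not Jinsight's actual technique), the following sketch counts caller-callee pairs in an execution trace and reports those that repeat; the trace format is a hypothetical simplification of the model data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A minimal sketch of trace-based pattern detection: count how often each
// caller -> callee pair occurs in a trace and report the repeated ones,
// which a visualisation could then highlight with appropriate visual cues.
public class RepeatDetector {
    static Map<String, Integer> repeatedCalls(List<String> trace) {
        Map<String, Integer> counts = new HashMap<>();
        for (String call : trace) {
            counts.merge(call, 1, Integer::sum);
        }
        // Keep only the pairs that occur more than once.
        Map<String, Integer> repeats = new LinkedHashMap<>();
        counts.forEach((call, n) -> { if (n > 1) repeats.put(call, n); });
        return repeats;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
            "A->B.init", "A->C.load", "A->C.load", "B->C.load", "A->C.load");
        System.out.println(repeatedCalls(trace)); // {A->C.load=3}
    }
}
```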
One of the principal barriers to the uptake of state of the art software engineering tools in
industry is lack of integration with a common development environment. The motivation for
this research was the lack of use of software visualisation in industry despite its apparent
benefits, which have been demonstrated in this thesis. A potential enhancement to
VANESSA to encourage usage would be integration with a standard environment, such as
Eclipse [Eclipse 2005].
References
[Addanki 2001] S Addanki, R Cremonini, J Penberthy, ‘Graphs of models’, Artificial
Intelligence, 51:145-177, 1991
[Aho 1986] A V Aho, R Sethi, J D Ullman, Compilers: Principles, Techniques,
and Tools, Reading, MA: Addison-Wesley, 1986
[Alexandridis 1986] N A Alexandridis, ‘Adaptable hardware and software: problems and
solutions’, IEEE Computer, 19(2):29-39, 1986
[ANSI 1998] ANSI, Information Systems – Database Language – SQL,
Document # ANSI INCITS 135-1992 (R1998), Washington,
DC: ANSI, 1998
[Apache 2004] Apache Software Foundation, Xalan-Java,
http://xml.apache.org/xalan-j/, 2004
[ArgoUML 2005] ArgoUML team, ArgoUML project home, http://argouml.tigris.org/,
2005
[Armstrong 1998] M N Armstrong, C Trudeau, ‘Evaluating architectural extractors’ in
Proceedings of the 5th Working Conference on Reverse Engineering
(WCRE), Honolulu, HI, pp. 30-39, Los Alamitos, CA: IEEE
Computer Society Press, 1998
[Arnold 2000] K Arnold, J Gosling, D Holmes, The Java Programming Language,
3rd edition, Boston, MA: Addison Wesley, 2000
[Arnold 2003] M Arnold, W De Pauw, ‘Websight: visualizing the execution of web
services’, demonstration at 1st ACM Symposium on Software
Visualization, San Diego, CA, 2003
[Baker 1994] M J Baker, S G Eick, ‘Visualizing software systems’ in Proceedings
of the 16th International Conference on Software Engineering
(ICSE), Sorrento, pp. 59-67, Los Alamitos, CA: IEEE Computer
Society Press, 1994
[Baldonado 2000] M Q W Baldonado, A Woodruff, A Kuchinsky, ‘Guidelines for
using multiple views in information visualization’ in Proceedings of
the Working Conference on Advanced Visual Interfaces (AVI),
Palermo, pp. 110-119, New York, NY: ACM Press, 2000
[Ball 1994] T Ball, S G Eick, ‘Visualizing program slices’ in Proceedings of the
IEEE Computer Society Symposium on Visual Languages (VL), St.
Louis, MO, pp. 288-295, Los Alamitos, CA: IEEE Computer
Society Press, 1994
[Ball 1996] T Ball, S G Eick, ‘Software visualization in the large’, IEEE
Computer, 29(4):33-43, 1996
[Basili 1996] V Basili, ‘Editorial’, Empirical Software Engineering, 1(2):105-108,
1996
[Bassil 2001a] S Bassil, R K Keller, ‘Software visualization tools: survey and
analysis’ in Proceedings of the 9th International Workshop on
Program Comprehension (IWPC), Toronto, ON, pp. 7-17, Los
Alamitos, CA: IEEE Computer Society Press, 2001
[Bassil 2001b] S Bassil, R K Keller, ‘A qualitative and quantitative evaluation of
software visualization tools’ in Proceedings of the Workshop on
Software Visualization, 23rd International Conference on Software
Engineering (ICSE), Toronto, ON, pp. 33-37, Los Alamitos, CA:
IEEE Computer Society Press, 2001
[Beck 1994] K Beck, R Johnson, ‘Patterns generate architectures’, in
Proceedings of the 8th European Conference on Object-Oriented
Programming (ECOOP), Bologna, Lecture Notes in Computer
Science 821, pp. 139-149, Berlin: Springer-Verlag, 1994
[Becker 1995] R A Becker, S G Eick, A R Wilks, ‘Visualizing network data’, IEEE
Transactions on Visualization and Computer Graphics, 1(1):16-28,
1995
[Bellay 1997] B Bellay, H Gall, ‘A comparison of four reverse engineering tools’
in Proceedings of the 4th Working Conference on Reverse
Engineering (WCRE), Amsterdam, pp. 2-11, Los Alamitos, CA:
IEEE Computer Society Press, 1997
[Bellay 1998] B Bellay, H Gall, ‘An evaluation of reverse engineering tool
capabilities’, Journal of Software Maintenance: Research and
Practice, 10(5):305-331, 1998
[Berard 1993] E V Berard, ‘Abstraction, encapsulation, and information hiding’ in
E Berard, Essays on Object-Oriented Software Engineering, Vol. 1,
Englewood Cliffs, NJ: Prentice-Hall, 1993
[Bergey 1999] J Bergey, D Smith, N Weiderman, S Woods, Options Analysis for
Reengineering (OAR): Issues and Conceptual Approach, Technical
Note CMU/SEI-99-TN-014, Carnegie Mellon Software Engineering
Institute, 1999
[Bloch 2001] J Bloch, Effective Java: Programming Language Guide, Cambridge,
MA: Addison-Wesley, 2001, Item 17
[Brady 1974] M Brady, The Monopoly Book: Strategy and Tactics of the World’s
Most Popular Game, New York, NY: David McKay, 1974
[Bertuli 2003] R Bertuli, S Ducasse, M Lanza, ‘Run-time information visualization
for understanding object-oriented systems’, paper presented at 4th
International Workshop on Object-Oriented Reengineering,
Darmstadt, 2003
[Booch 1994] G Booch, Object-Oriented Design with Applications, 2nd ed.,
Redwood City, CA: Benjamin Cummings, 1994
[Borland 2004a] Borland, Borland Together, http://www.borland.com/together/, 2004
[Borland 2004b] Borland, Together Technologies: Simplify and Accelerate the
Success of your Applications,
http://www.borland.com/together/index.html, 2004
[Boyer 1977] R Boyer, J Moore, ‘A fast string searching algorithm’,
Communications of the ACM, 20(10):762-772, 1977
[Brant 1998] J Brant, B Foote, R E Johnson, D Roberts, ‘Wrappers to the rescue’
in Proceedings of the 12th European Conference on Object-
Oriented Programming (ECOOP), Brussels, Lecture Notes in
Computer Science 1445, pp. 396-417, Berlin: Springer-Verlag, 1998
[Brooks 1983] R Brooks, ‘Towards a theory of the comprehension of computer
programs’, International Journal of Man-Machine Studies, 18:543-
554, 1983
[Bryman 1988] A Bryman, Quantity and Quality in Social Research, London:
Unwin Hyman, 1988, pp. 30-31
[Budd 1987] T Budd, A Little Smalltalk, Reading, MA: Addison-Wesley, 1987
[Buhr 1996] R Buhr, R Casselman, Use Case Maps for Object-Oriented Systems,
Englewood Cliffs, NJ: Prentice-Hall, 1996
[Burd 2002] E Burd, D Overy, A Wheetman, ‘Evaluating using animation to
improve understanding of sequence diagrams’ in Proceedings of the
10th International Workshop on Program Comprehension (IWPC),
Paris, pp. 107-113, Los Alamitos, CA: IEEE Computer Society
Press, 2002
[Burkhardt 1997] J-M Burkhardt, F Détienne, S Wiedenbeck, ‘Mental representations
constructed by experts and novices in object-oriented program
comprehension’ in Proceedings of the 6th IFIP International
Conference on Human-Computer Interaction (INTERACT), Sydney,
NSW, pp. 339-346, Amsterdam: North Holland, 1997
[Burkhardt 1998] J-M Burkhardt, F Détienne, S Wiedenbeck, ‘The effect of object-
oriented programming expertise in several dimensions of
comprehension strategies’ in Proceedings of the 6th International
Workshop on Program Comprehension (IWPC), Ischia, pp. 82-89,
Los Alamitos, CA: IEEE Computer Society Press, 1998
[Campbell 1993] R H Campbell, N Islam, D Raila, P Madany, ‘Designing and
implementing Choices: an object-oriented system in C++’,
Communications of the ACM, 36(9):117-126, 1993
[Chan 2003] K Chan, Z C L Liang, A Michail, ‘Design recovery of interactive
graphical applications’ in Proceedings of the 25th International
Conference on Software Engineering (ICSE), Portland, OR, pp.
114-124, Los Alamitos, CA: IEEE Computer Society Press, 2003
[Chase 1996] M P Chase, D R Harris, S N Roberts, A S Yeh, ‘Analysis and
presentation of recovered software architectures’ in Proceedings of
the 3rd Working Conference on Reverse Engineering (WCRE),
Monterey, CA, pp. 153-162, Los Alamitos, CA: IEEE Computer
Society Press, 1996
[Chen 1977] P P Chen, The Entity-Relationship Approach to Logical Database
Design, Wellesley, MA: QED Information Sciences, 1977
[Chen 1990] Y-F Chen, M Y Nishimoto, C V Ramamoorthy, ‘The C information
abstraction system’, IEEE Transactions on Software Engineering,
16(3):325-334, 1990
[Chidamber 1994] S R Chidamber, C F Kemerer, ‘A metrics suite for object-oriented
design’, IEEE Transactions Software Engineering, 20(6):476-493,
1994
[Chikofsky 1990] E J Chikofsky, J H Cross II, ‘Reverse engineering and design
recovery: a taxonomy’, IEEE Software, 7(1):13-17, 1990
[CHISEL 2005] The CHISEL Group, SHriMP Suite,
http://sourceforge.net/projects/chiselgroup/, 2005
[Chuah 1997] M C Chuah, S G Eick, ‘Glyphs for software visualization’ in
Proceedings of the 5th International Workshop on Program
Comprehension (IWPC), Dearborn, MI, pp. 183-191, Los Alamitos,
CA: IEEE Computer Society Press, 1997
[Citrin 1995] W Citrin, A Cockburn, J von Kanel, R Hauser, ‘Using formalised
temporal message-flow diagrams’, Software – Practice and
Experience, 25(12):1367-1401, 1995
[Clark 1976] J H Clark, ‘Hierarchical geometric models for visible surface
algorithms’, Communications of the ACM, 19(10):547-554, 1976
[Codenie 1997] W Codenie, K De Hondt, P Steyaert, A Vercammen, ‘From custom
applications to domain-specific frameworks’, Communications of
the ACM, 40(10):70-77, 1997
[Consens 1993] M Consens, A Mendelzon, ‘Hy+: a hypergraph-based query and
visualization system’ in Proceedings of the ACM SIGMOD
International Conference on Management of Data, Washington,
DC, SIGMOD Record 22(2):511-516, New York, NY: ACM Press,
1993
[Cook 1995] J E Cook, A L Wolf, ‘Automated process discovery through event-
data analysis’ in Proceedings of the 17th International Conference
on Software Engineering (ICSE), Seattle, WA, pp. 73-82, Los
Alamitos, CA: IEEE Computer Society Press, 1995
[Coplien 1995] J O Coplien, D C Schmidt, Pattern Languages of Program Design,
Reading, MA: Addison-Wesley, 1995
[Corritore 1999] C L Corritore, S Wiedenbeck, ‘Mental representations of expert
procedural and object-oriented programmers in a software
maintenance task’, International Journal of Human-Computer
Studies, 50(1):61-83, 1999
[Corritore 2000] C L Corritore, S Wiedenbeck, ‘Direction and scope of
comprehension-related activities by procedural and object-oriented
programmers: an empirical study’ in Proceedings of the 8th
International Workshop on Program Comprehension (IWPC),
Limerick, pp. 139-148, Los Alamitos, CA: IEEE Computer Society
Press, 2000
[Cox 1996] K C Cox, S G Eick, T He, ‘3D geographic network displays’,
SIGMOD Record, 25(4):50-54, 1996
[Cross 1992] J H Cross II, E J Chikofsky, C H May Jr., ‘Reverse engineering’,
Advances in Computers, 35:199-254, 1992
[Cytron 1991] R Cytron, J Ferrante, B K Rosen, M N Wegman, F K Zadeck,
‘Efficiently computing static single assignment form and the control
dependence graph’, ACM Transactions on Programming Languages
and Systems, 13(4):451-490, 1991
[De Hondt 1998] K De Hondt, A Novel Approach to Architectural Recovery in
Evolving Object-Oriented Systems, PhD thesis, Brussels: Vrije
Universiteit Brussel, 1998
[De Pauw 1993] W De Pauw, R Helm, D Kimelman, J Vlissides, ‘Visualizing the
behaviour of object-oriented systems’ in Proceedings of the 8th
Conference on Object-Oriented Programming, Systems, Languages,
and Applications (OOPSLA), Washington, DC, pp. 326-337, New
York, NY: ACM Press, 1993
[De Pauw 1994] W De Pauw, D Kimelman, J Vlissides, ‘Modelling object-oriented
program execution’ in Proceedings of the 8th European Conference
on Object-Oriented Programming (ECOOP), Bologna, Lecture
Notes in Computer Science 821, pp. 163-182, Berlin: Springer-
Verlag, 1994
[De Pauw 1998] W De Pauw, D Lorenz, J Vlissides, M Wegman, ‘Execution patterns
in object-oriented visualization’ in Proceedings of the 4th USENIX
Conference on Object-Oriented Technologies and Systems
(COOTS), Santa Fe, NM, pp. 219-234, Berkeley, CA: USENIX
Association, 1998
[De Pauw 1999] W De Pauw, G Sevitsky, ‘Visualizing reference patterns for solving
memory leaks in Java’ in Proceedings of the 13th European
Conference on Object-Oriented Programming (ECOOP), Lisbon,
Lecture Notes in Computer Science 1628, pp. 116-134, Berlin:
Springer-Verlag, 1999
[De Pauw 2000] W De Pauw, G Sevitsky, ‘Visualizing reference patterns for solving
memory leaks in Java’, Concurrency: Practice and Experience,
12(14):1431-1454, 2000
[De Pauw 2001] W De Pauw, N Mitchell, M Robillard, G Sevitsky, H Srinivasan,
‘Drive-by analysis of running programs’, paper presented at
Workshop on Software Visualization, 23rd International Conference
on Software Engineering (ICSE), Toronto, ON, 2001
[De Pauw 2002] W De Pauw, E Jensen, N Mitchell, G Sevitsky, J Vlissides , J Yang,
‘Visualizing the execution of Java programs’ in Proceedings of the
International Seminar on Software Visualization, Dagstuhl Castle,
Wadern, pp. 151-162, Lecture Notes in Computer Science 2269,
Berlin: Springer-Verlag, 2002
[de Vaus 1996] D A de Vaus, Surveys in Social Research, 4th edition, London: UCL Press, 1996, pp. 56-57
[Demeyer 1998] S Demeyer, ‘Analysis of overridden methods to infer hot spots’ in Proceedings of the Workshop on Object-Oriented Technology, 12th European Conference on Object-Oriented Programming (ECOOP), Brussels, Lecture Notes in Computer Science 1543, pp. 66-67, Berlin: Springer-Verlag, 1998
[Demeyer 1999] S Demeyer, S Ducasse, M Lanza, ‘A hybrid reverse engineering approach combining metrics and program visualisation’ in Proceedings of the 6th Working Conference on Reverse Engineering (WCRE), Atlanta, GA, pp. 175-186, Washington, DC: IEEE Computer Society Press, 1999
[Dijkstra 1968] E W Dijkstra, ‘Cooperating sequential processes’ in F Genuys (ed.), Programming Languages, New York, NY: Academic Press, 1968
[Ducasse 2000] S Ducasse, M Lanza, S Tichelaar, ‘MOOSE: an extensible language-independent environment for reengineering object-oriented systems’ in Proceedings of the 2nd International Symposium on Constructing Software Engineering Tools (CoSET), International Conference on Software Engineering (ICSE), Limerick, pp. 24-30, Wollongong, NSW: School of Information Technology and Computer Science, University of Wollongong, 2000
[Ducasse 2001] S Ducasse, M Lanza, ‘Towards a methodology for the understanding of object-oriented systems’, Techniques et Sciences Informatiques, 20(4):539-566, 2001
[Ducasse 2004] S Ducasse, M Lanza, R Bertuli, ‘High-level polymetric views of
condensed run-time information’ in Proceedings of the 8th
Euromicro Working Conference on Software Maintenance and
Reengineering (CSMR), Tampere, pp. 309-318, Los Alamitos, CA:
IEEE Computer Society Press, 2004
[Ducasse 2005] S Ducasse, M Lanza, ‘The class blueprint: visually supporting the
understanding of classes’, IEEE Transactions on Software
Engineering, 31(1):1-16, 2005
[Eclipse 2005] Eclipse Foundation, Eclipse.org Main Page, http://eclipse.org/, 2005
[Eick 1992] S G Eick, J L Steffen, E E Sumner Jr., ‘Seesoft – a tool for
visualizing line oriented software statistics’, IEEE Transactions on
Software Engineering, 18(11):957-968, 1992
[Eick 1993] S G Eick, G J Wills, ‘Navigating large networks with hierarchies’ in Proceedings of the 4th Conference on Visualization, San Jose, CA, pp. 204-209, Los Alamitos, CA: IEEE Computer Society Press, 1993
[Eick 2002] S G Eick, T L Graves, A F Karr, A Mockus, P Schuster,
‘Visualizing software changes’, IEEE Transactions on Software
Engineering, 28(4):396-412, 2002
[Einstein 1920] A Einstein, Relativity: The Special and General Theory, New York, NY: Henry Holt and Company, 1920
[Falkenhainer 1991] B Falkenhainer, K Forbus, ‘Compositional modeling: finding the right model for the job’, Artificial Intelligence, 51:95-143, 1991
[Fayad 1999] M E Fayad, D C Schmidt, R E Johnson, Building Application Frameworks: Object-Oriented Foundations of Framework Design, New York, NY: Wiley Computer Publishing, 1999
[Feiner 1985] S Feiner, ‘Apex: an experiment in the automated creation of pictorial explanations’, IEEE Computer Graphics and Applications, 5(11):29-37, 1985
[Finnigan 1997] P J Finnigan, R C Holt, I Kalas, S Kerr, K Kontogiannis, H A Müller, J Mylopoulos, S G Perelgut, M Stanley, K Wong, ‘The software bookshelf’, IBM Systems Journal, 36(4):564-593, 1997
[Fischer 2000] T Fischer, J Niere, L Torunski, A Zündorf, ‘Story diagrams: a new graph rewrite language based on the Unified Modelling Language and Java’ in Proceedings of the 6th International Workshop on Theory and Application of Graph Transformations (TAGT), Paderborn, 1998, Lecture Notes in Computer Science 1764, pp. 296-309, Berlin: Springer-Verlag, 2000
[Fishman 1967] G S Fishman, P J Kiviat, ‘The analysis of simulation generated time series’, Management Science, 13(7):525-557, 1967
[Fishwick 1988] P A Fishwick, ‘The role of process abstraction in simulation’, IEEE Transactions on Systems, Man, and Cybernetics, 18(1):18-39, 1988
[Frantz 1995] F K Frantz, ‘A taxonomy of model abstraction techniques’ in Proceedings of the 27th Winter Simulation Conference, Arlington, VA, pp. 1413-1420, New York, NY: ACM Press, 1995
[Fujaba 2002] Fujaba Developer Team, Fujaba is a public domain Case Tool for UML, http://www.fujaba.de/, 2002
[Furnas 1986] G W Furnas, ‘Generalized fisheye views’ in Proceedings of the 4th ACM Conference on Human Factors in Computing Systems, Boston, MA, pp. 16-23, New York, NY: ACM Press, 1986
[Gamma 1995] E Gamma, R Helm, R Johnson, J Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Boston, MA: Addison-Wesley, 1995
[Gamma 1998] E Gamma, T Eggenschwiler, JHotDraw 5.1, http://members.pingnet.ch/gamma/JHD-5.1.zip, 1998
[Gansner 2002] E Gansner, E Koutsofios, S North, ‘Drawing graphs with dot’, http://www.graphviz.org/Documentation/dotguide.pdf, 2002
[Garlan 1997] D Garlan, B Monroe, D Wile, ‘ACME: an interchange language for software architecture’, 2nd ed., Technical Report, Pittsburgh, PA: Carnegie Mellon University, 1997
[Gîrba 2005] T Gîrba, M Lanza, S Ducasse, ‘Characterizing the evolution of class hierarchies’ in Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR), Manchester, pp. 2-11, Los Alamitos, CA: IEEE Computer Society Press, 2005
[Globus 1995] A Globus, S P Uselton, ‘Evaluation of visualization software’, ACM
SIGGRAPH Computer Graphics, 29(2):41-44, 1995
[GNU 2004] GNU, GCC Home Page – GNU Project – Free Software
Foundation (FSF), http://gcc.gnu.org/, 2004
[Goldberg 1983] A J Goldberg, D Robson, Smalltalk-80: The Language and its
Implementation, Reading, MA: Addison-Wesley, 1983
[Gosling 2000] J Gosling, B Joy, G Steele, G Bracha, The Java Language
Specification, 2nd edition, Boston, MA: Addison-Wesley, 2000
[Grass 1992] J E Grass, ‘Object-oriented design archaeology with CIA++’,
Computing Systems, 5(1):5-67, 1992
[Grove 1997] D Grove, G DeFouw, J Dean, C Chambers, ‘Call graph construction
in object-oriented languages’ in Proceedings of the 12th ACM
SIGPLAN Conference on Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA), Atlanta, GA, pp. 108-124,
New York, NY: ACM Press, 1997
[Grundy 2000] J Grundy, J Hosking, ‘High-level static and dynamic visualisation of software architectures’ in Proceedings of the 4th IEEE Symposium on Visual Languages (VL), Halifax, NS, pp. 5-12, Los Alamitos, CA: IEEE Computer Society Press, 2000
[Gulden 2004] J Gulden, BeautyJ – Java source code transformation tool, http://beautyj.berlios.de/, 2004
[GUPRO 2004] GUPRO, GUPRO – Homepage, http://www.uni-koblenz.de:8080/Uni/CampusKoblenz/Contrib/GUPRO/Site/Home, 2004
[Guttag 1977] J Guttag, ‘Abstract data types and the development of data structures’, Communications of the ACM, 20(6):396-404, 1977
[Harel 1988] D Harel, ‘On visual formalisms’, Communications of the ACM, 31(5), May 1988
[Harel 1990] D Harel, H Lachover, A Naamad, A Pnueli, M Politi, R Sherman, A Shtull-Trauring, M Trakhtenbrot, ‘STATEMATE: a working environment for the development of complex reactive systems’, IEEE Transactions on Software Engineering, 16(4):403-414, 1990
[Hatch 2001] A S Hatch, M P Smith, C M B Taylor, M Munro, ‘No silver bullet for software visualisation evaluation’ in Proceedings of the Workshop on Fundamental Issues of Visualization, International Conference on Imaging Science, Systems, and Technology (CISST), Las Vegas, NV, pp. 651-657, Athens, GA: CSREA Press, 2001
[Hatley 1987] D J Hatley, I A Pirbhai, Strategies for Real-Time System Specification, New York, NY: Dorset House, 1987
[Haynes 1995] P Haynes, T Menzies, R F Cohen, Visualisations of Large Object-
Oriented Systems, Technical Report TR 95-4, Melbourne, VIC:
Monash University, 1995
[Heer 2004] J Heer, prefuse: an interactive visualization toolkit, http://prefuse.sourceforge.net/, 2004
[Henry 1993] S Henry, M Humphrey, ‘Object-oriented vs. procedural programming languages: effectiveness in program maintenance’, Journal of Object-Oriented Programming, 6(3):41-49, 1993
[Hilliard 1999] R Hilliard, ‘Using the UML for architectural description’ in Proceedings of the 2nd International Conference on The Unified Modeling Language (<<UML>>), Fort Collins, CO, Lecture Notes in Computer Science 1723, Berlin: Springer, 1999
[Hooker 1996] R Hooker, Abstraction, http://www.wsu.edu:8080/~dee/GLOSSARY/ABSTRACT.HTM, 1996
[Imagix 2004] Imagix Corporation, Imagix 4D, http://www.imagix.com/products/products.html, 2004
[Hoagland 1995] J Hoagland, Mkfunctmap Home Page,
http://seclab.cs.ucdavis.edu/~hoagland/mkfunctmap.html, 1995
[Hofmeister 1999a] C Hofmeister, R Nord, D Soni, Applied Software Architecture,
Reading, MA: Addison-Wesley, 1999
[Hofmeister 1999b] C Hofmeister, R L Nord, D Soni, ‘Describing software architecture
with UML’ in Proceedings of the 1st Working IFIP Conference on
Software Architecture (WICSA), San Antonio, TX, pp. 145-160,
Dordrecht: Kluwer Academic Publishers, 1999
[IBM 2003] IBM, VisualAge Smalltalk – Product Overview,
http://www-3.ibm.com/software/ad/smalltalk/, 2003
[IBM 2004a] IBM, VisualAge C++ – Product Overview – IBM Software,
http://www-306.ibm.com/software/awdtools/vacpp/, 2004
[IBM 2004b] IBM, Rational Software from IBM,
http://www-306.ibm.com/software/rational/, 2004
[Issarny 1998] V Issarny, T Saridakis, A Zarras, ‘Multi-view description of
software architectures’ in Proceedings of the 3rd International
Workshop on Software Architecture, Orlando, FL, pp. 81-84, New
York, NY: ACM Press, 1998
[ITU-T 1996] International Telecommunications Union – Standardization (ITU-
T), ITU-T Recommendation Z.120: Message Sequence Chart
(MSC), Geneva: ITU-T, 1996
[Jacobson 1992] I Jacobson, M Christerson, P Johnson, G Overgaard, Object-Oriented Software Engineering: A Use Case Driven Approach, Reading, MA: Addison-Wesley, 1992
[Jahnke 2002] J H Jahnke, H A Müller, A Walenstein, N Mansurov, K Wong, ‘Fused data-centric visualizations for software evolution environments’ in Proceedings of the 10th International Workshop on Program Comprehension (IWPC), Paris, pp. 187-196, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Jerding 1995] D F Jerding, J T Stasko, ‘The information mural: a technique for displaying and navigating large information spaces’ in Proceedings of the 1st Symposium on Information Visualization, Atlanta, GA, pp. 43-50, Los Alamitos, CA: IEEE Computer Society Press, 1995
[Jerding 1997] D F Jerding, S Rugaber, ‘Using visualization for architectural localization and extraction’ in Proceedings of the 4th Working Conference on Reverse Engineering (WCRE), Amsterdam, pp. 56-65, Los Alamitos, CA: IEEE Computer Society Press, 1997
[Johnson 1992] R E Johnson, ‘Documenting frameworks using patterns’ in
Proceedings of the 7th Conference on Object-Oriented
Programming, Systems, Languages, and Applications (OOPSLA),
Vancouver, BC, pp. 63-76, New York, NY: ACM Press, 1992
[Kapoor 2001] R V Kapoor, E Stroulia, ‘Mathaino: simultaneous legacy interface
migration to multiple platforms’ in Proceedings of the 9th
International Conference on Human-Computer Interaction, New
Orleans, LA, Vol. 1, pp. 51-55, Mahwah, NJ: Lawrence Erlbaum
Associates, 2001
[Kazman 1994] R Kazman, L Bass, G Abowd, S M Webb, ‘SAAM: a method for
analysing the properties of software architectures’ in Proceedings of
the 16th International Conference on Software Engineering (ICSE),
Sorrento, pp. 81-90, Los Alamitos, CA: IEEE Computer Society
Press, 1994
[Kazman 1996] R Kazman, S J Carrière, ‘An adaptable software architecture for
rapidly creating information visualizations’, Proceedings of
Graphics Interface, Toronto, ON, pp. 17-27, San Francisco: Morgan
Kaufmann, 1996
[Kazman 1998] R Kazman, S J Carrière, ‘View extraction and view fusion in
architectural understanding’ in Proceedings of the 5th International
Conference on Software Reuse (ICSR), Victoria, BC, pp. 290-299,
Los Alamitos, CA: IEEE Computer Society Press, 1998
[Kazman 1999] R Kazman, S J Carrière, ‘Playing detective: reconstructing software
architecture from available evidence’, Journal of Automated
Software Engineering, 6(2):107-138, 1999
[Keller 1999] R K Keller, R Schauer, S Robitaille, P Pagé, ‘Pattern-based reverse-engineering of design components’ in Proceedings of the 21st International Conference on Software Engineering (ICSE), Los Angeles, CA, pp. 226-235, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Keynes 1936] J M Keynes, The General Theory of Employment, Interest, and
Money, Cambridge: Macmillan Cambridge University Press, 1936
[Kirk 1986] J Kirk, M L Miller, Reliability and Validity in Qualitative Research,
Beverley Hills, CA: Sage, 1986, pp. 22-23
[Kirk 2001] D Kirk, M Roper, M Wood, Understanding Object-Oriented
Frameworks – An Exploratory Case Study, Technical Report
EFoCS-42-2001, Glasgow: Department of Computer and
Information Sciences, University of Strathclyde, 2001
[Kirsanov 1998] D Kirsanov, The Flesh and Soul of Information: The Origins of
Abstraction, http://webreference.com/dlab/9804/origins.html, 1998
[Knight 2000] C Knight, M Munro, ‘Virtual but visible software’ in Proceedings of
the International Conference on Information Visualisation (IV),
London, pp. 198-205, Los Alamitos, CA: IEEE Computer Society
Press, 2000
[Knight 2001] C Knight, ‘Visualisation effectiveness’ in Workshop on
Fundamental Issues of Visualization, Proceedings of the
International Conference on Imaging Science, Systems, and
Technology (CISST), Las Vegas, NV, pp. 639-643, Athens, GA:
CSREA Press, 2001
[Koenemann 1991] J Koenemann, S P Robertson, ‘Expert problem solving strategies for software comprehension’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New Orleans, LA, pp. 125-130, New York, NY: ACM Press, 1991
[Kolence 1973] K W Kolence, ‘The software empiricist’, ACM SIGMETRICS
Performance Evaluation Review, 2(2):31-36, 1973
[Kollmann 2001] R Kollmann, M Gogolla, ‘Application of UML associations and
their adornments in design recovery’ in Proceedings of the 8th
Working Conference on Reverse Engineering (WCRE), Stuttgart,
pp. 81-90, Los Alamitos, CA: IEEE Computer Society Press, 2001
[Kollmann 2002a] R Kollmann, P Selonen, E Stroulia, T Systä, A Zündorf, ‘A study on the current state of the art in tool-supported UML-based static reverse engineering’ in Proceedings of the 9th Working Conference on Reverse Engineering (WCRE), Richmond, VA, pp. 22-33, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Kollmann 2002b] R Kollmann, M Gogolla, ‘Metric-based selective representation of UML diagrams’ in Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR), Budapest, pp. 89-98, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Korn 1999] J L Korn, Non-Exclusive Limited-Use Software – Chava, http://www.research.att.com/sw/tools/chava/, 1999
[Koschke 2003] R Koschke, Bauhaus Stuttgart, http://www.bauhaus-stuttgart.de/, 2003
[Koskimies 1995a] K Koskimies, H Mössenböck, ‘Scenario-based browsing of object-oriented systems with Scene’, Report 4, Linz: Institut für Informatik (Systemsoftware), Johannes Kepler Universität, 1995
[Koskimies 1995b] K Koskimies, H Mössenböck, ‘Designing a framework by stepwise generalization’ in Proceedings of the 5th European Software Engineering Conference (ESEC), Barcelona, Lecture Notes in Computer Science 989, pp. 479-497, Berlin: Springer-Verlag, 1995
[Koskimies 1996] K Koskimies, H Mössenböck, ‘Scene: using scenario diagrams and active text for illustrating object-oriented programs’ in Proceedings of the 18th International Conference on Software Engineering (ICSE), Berlin, pp. 366-375, Los Alamitos, CA: IEEE Computer Society Press, 1996
[Koskimies 1998] K Koskimies, T Männistö, T Systä, J Tuomi, ‘Automated support
for modeling OO software’, IEEE Software, 15(1):87-94, 1998
[Koskinen 2001] J Koskinen, J Peltonen, P Selonen, T Systä, K Koskimies, ‘Towards tool assisted UML development environments’ in Proceedings of the 7th Symposium on Programming Language and Software Tools (SPLST), Szeged, pp. 1-15, Szeged: University of Szeged, 2001
[Koutsofios 1996a] A Koutsofios, S C North, ‘Editing graphs with dotty’, http://www.graphviz.org/Documentation/dottyguide.pdf, 1996
[Koutsofios 1996b] E Koutsofios, S C North, Drawing Graphs with dot, Murray Hill, NJ: AT&T Bell Laboratories, 1996
[Krasner 1988] G E Krasner, S T Pope, ‘A cookbook for using the model-view-controller user interface paradigm in Smalltalk-80’, Journal of Object-Oriented Programming, 1(3):26-49, 1988
[Kruchten 1995] P B Kruchten, ‘The 4+1 View Model of Architecture’, IEEE Software, 12(6):42-50, 1995
[Laffra 1994] C Laffra, A Malhotra, ‘HotWire – a visual debugger for C++’ in Proceedings of the 6th USENIX C++ Technical Conference, Cambridge, MA, pp. 109-122, Berkeley, CA: USENIX Association, 1994
[Lange 1995a] D B Lange, Y Nakamura, ‘Interactive visualization of design patterns can help in framework understanding’ in Proceedings of the 10th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Austin, TX, pp. 342-357, New York, NY: ACM Press, 1995
[Lange 1995b] D B Lange, Y Nakamura, ‘Program Explorer: a program visualizer for C++’ in Proceedings of the 1st USENIX Conference on Object-Oriented Technologies (COOTS), Monterey, CA, pp. 39-54, Berkeley, CA: USENIX Association, 1995
[Lange 1997] D B Lange, Y Nakamura, ‘Object-oriented program tracing and visualization’, IEEE Computer, 30(5):63-70, 1997
[Lanza 2001] M Lanza, S Ducasse, ‘A categorization of classes based on the
visualization of their internal structure: the class blueprint’ in
Proceedings of the 16th International Conference on Object-
Oriented Programs, Systems, Languages, and Applications
(OOPSLA), Tampa Bay, FL, pp. 300-311, New York, NY: ACM
Press, 2001
[Lanza 2002] M Lanza, S Ducasse, ‘Understanding software evolution using a combination of software visualization and software metrics’ in Proceedings of Langages et Modèles à Objets (LMO), pp. 135-149, London: Hermes, 2002
[Lanza 2003a] M Lanza, S Ducasse, ‘Polymetric views – a lightweight visual
approach to reverse engineering’, IEEE Transactions on Software
Engineering, 29(9):782-795, 2003
[Lanza 2003b] M Lanza, CodeCrawler,
http://www.iam.unibe.ch/~scg/Research/CodeCrawler/, 2003
[Ledgard 1977] H F Ledgard, R W Taylor, ‘Two views of data abstraction’,
Communications of the ACM, 20(6):382-384, 1977
[Lee 1996] K Lee, P A Fishwick, ‘Dynamic model abstraction’ in Proceedings of the 28th Winter Simulation Conference, Coronado, CA, pp. 764-771, New York, NY: ACM Press, 1996
[Lethbridge 2004] T C Lethbridge, S Tichelaar, E Ploedereder, ‘The Dagstuhl Middle Metamodel: a schema for reverse engineering’ in Proceedings of the 1st International Workshop on Meta-Models and Schemas for Reverse Engineering (ateM), Victoria, BC, ENTCS 94, pp. 7-18, Amsterdam: Elsevier, 2004
[Letovsky 1986] S Letovsky, ‘Cognitive processes in program comprehension’ in Proceedings of the First Workshop on Empirical Studies of Programmers, Washington, DC, pp. 58-79, Norwood, NJ: Ablex, 1986
[Lincoln 1993] S E Lincoln, M J Daly, E S Lander, Constructing Genetic Linkage Maps with MapMaker/EXP Version 3.0: A Tutorial and Reference Manual, 3rd ed., Technical Report, Cambridge, MA: Whitehead Institute for Biomedical Research, 1993
[Linton 1992] M A Linton, P R Calder, J A Interrante, S Tank, J M Vlissides,
Interviews Reference Manual Version 3.1, Stanford, CA: Stanford
University, 1992
[Liskov 1986] B Liskov, J Guttag, Abstraction and Specification in Program
Development, Cambridge, MA: MIT Press, 1986
[Littman 1986] D Littman, J Pinto, S Letovsky, E Soloway, ‘Mental models and
software maintenance’ in Proceedings of the First Workshop on
Empirical Studies of Programmers, Washington, DC, pp. 80-98,
Norwood, NJ: Ablex, 1986
[Litwin 1995] M S Litwin, How to Measure Survey Reliability and Validity,
Thousand Oaks, CA: SAGE, 1995, pp. 43-44
[Maletic 2001] J I Maletic, J Leigh, A Marcus, G Dunlap, ‘Visualizing object-
oriented software in virtual reality’ in Proceedings of the 9th
International Workshop on Program Comprehension (IWPC),
Toronto, ON, pp. 26-35, Washington, DC: IEEE Computer Society
Press, 2001
[Marcus 2003a] A Marcus, L Feng, J I Maletic, ‘Source Viewer 3D (sv3D): a system
for visualizing multi dimensional software analysis data’, paper
presented at The 2nd Annual “Designfest” on Visualizing Software
for Understanding and Analysis (VISSOFT), Amsterdam, 2003
[Marcus 2003b] A Marcus, L Feng, J I Maletic, ‘Comprehension of software analysis
data using 3D visualization’ in Proceedings of the 11th International
Workshop on Program Comprehension (IWPC), Orlando, FL, pp.
105-114, Washington, DC: IEEE Computer Society Press, 2003
[Martin 2000] R C Martin, ‘Design principles and design patterns’, http://www.objectmentor.com/resources/articles/Principles_and_Patterns.PDF, 2000, p. 17
[Martin 2002] L Martin, A Giesl, J Martin, ‘Dynamic component program
visualisation’ in Proceedings of the 9th Working Conference on
Reverse Engineering (WCRE), Richmond, VA, pp. 289-298, Los
Alamitos, CA: IEEE Computer Society Press, 2002
[MDR 2005] MDR team, Metadata Repository (MDR) project home,
http://mdr.netbeans.org/, 2005
[Minsky 1965] M Minsky, ‘Models, minds, machines’ in Proceedings of the
IFIPS Congress, New York, NY, pp. 45-49, Montvale, NJ:
AFIPS Press, 1965
[Mössenböck 1991] H Mössenböck, N Wirth, ‘The programming language Oberon-2’,
Structured Programming, 12(4):179-195, 1991
[Mulholland 1997] P Mulholland, ‘Using a fine-grained comparative evaluation
technique to understand and design software visualization tools’ in
Proceedings of the 7th Workshop on Empirical Studies of
Programmers, Alexandria, VA, pp. 91-108, New York, NY: ACM
Press, 1997
[Mulholland 1998] P Mulholland, ‘A principled approach to the evaluation of SV: a
case-study in Prolog’ in J Stasko, J Domingue, M Brown, B Price
(eds.), Software Visualization: Programming as a Multimedia
Experience, Cambridge, MA: MIT Press, 1998
[Mulholland 1999] P Mulholland, ‘The ISM framework: understanding and evaluating
software visualization tools’ in P Brna, B du Boulay, H Pain (eds.),
Learning to Build and Comprehend Complex Information
Structures: Prolog as a Case Study, Norwood, NJ: Ablex, 1999
[Müller 1988] H A Müller, K Klashinsky, ‘Rigi – a system for programming-in-
the-large’ in Proceedings of the 10th International Conference on
Software Engineering (ICSE), Singapore, pp. 80-86, Los Alamitos,
CA: IEEE Computer Society Press, 1988
[Müller 1993] H A Müller, M A Orgun, S R Tilley, J S Uhl, ‘A reverse
engineering approach to subsystem structure identification’, Journal
of Software Maintenance: Research and Practice, 5(4):181-204,
1993
[Müller 2000] H A Müller, J H Jahnke, D B Smith, M-A D Storey, S R Tilley, K
Wong, ‘Reverse engineering: a roadmap’ in Proceedings of the
Conference on the Future of Software Engineering, 22nd
International Conference on Software Engineering (ICSE),
Limerick, pp. 47-60, New York, NY: ACM Press, 2000
[Müller 2001] H A Müller, Rigi Group Home Page, http://www.rigi.csc.uvic.ca/,
2001
[Murphy 1995] G C Murphy, D Notkin, K J Sullivan, ‘Software reflexion models: bridging the gap between source and high-level models’ in Proceedings of the 3rd ACM SIGSOFT Symposium on the Foundations of Software Engineering, Washington, DC, pp. 18-28, New York, NY: ACM Press, 1995
[Murphy 1996a] G C Murphy, D Notkin, E S-C Lan, ‘An empirical study of static
call graph extractors’ in Proceedings of the 18th International
Conference on Software Engineering (ICSE), Berlin, pp. 90-99, Los
Alamitos, CA: IEEE Computer Society Press, 1996
[Murphy 1996b] G C Murphy, D Notkin, ‘Lightweight lexical source model
extraction’, ACM Transactions on Software Engineering and
Methodology, 5(3):262-292, 1996
[Murphy 1997] G C Murphy, D Notkin, ‘Reengineering with reflexion models: a case study’, IEEE Computer, 30(8):29-36, 1997
[Murphy 1998] G C Murphy, D Notkin, W G Griswold, E S Lan, ‘An empirical study of static call graph extractors’, ACM Transactions on Software Engineering and Methodology, 7(2):158-191, 1998
[Murphy 2001] G C Murphy, D Notkin, K J Sullivan, ‘Software reflexion models:
bridging the gap between design and implementation’, IEEE
Transactions on Software Engineering, 27(4): 364-380, 2001
[Myers 1986] B A Myers, ‘Visual programming, programming by example, and
program visualization: a taxonomy’ in Proceedings of the 4th ACM
SIGCHI Conference on Human Factors in Computing Systems,
Boston, MA, pp. 59-66, New York, NY: ACM Press, 1986
[Nassi 1973] I Nassi, B Shneiderman, ‘Flowchart techniques for structured
programming’, ACM SIGPLAN Notices, 8(8):12-26, 1973
[NCSA 2003] National Center for Supercomputing Applications, NCSA Mosaic
Home Page, http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/
NCSAMosaicHome.html, 2003
[Nielson 1990] G M Nielson, B D Shriver, J Rosenblum, Visualization in Scientific
Computing, Los Alamitos, CA: IEEE Computer Society Press, 1990
[O’Brien 2002] L O’Brien, ‘Experiences in architecture reconstruction at Nokia’,
Technical Note CMU/SEI-2002-TN-004, Pittsburgh, PA: Software
Engineering Institute, Carnegie-Mellon University, 2002
[O’Madadhain 2005] J O’Madadhain, D Fisher, T Nelson, J Krefeldt, JUNG – Java
Universal Network/Graph Framework, http://jung.sourceforge.net/,
2005
[OMG 2001] Object Management Group, Unified Modeling Language (UML)
v1.4, http://www.omg.org/technology/documents/formal/uml.htm,
2001
[OMG 2003a] Object Management Group, UML 2.0 Superstructure Specification,
OMG Adopted Specification ptc/03-08-02, http://www.omg.org/cgi-
bin/doc?ptc/2003-08-02, 2003
[OMG 2003b] Object Management Group, UML 2.0 Infrastructure Specification,
OMG Adopted Specification ptc/03-09-15, http://www.omg.org/cgi-
bin/doc?ptc/2003-09-15, 2003
[OMG 2003c] Object Management Group, OMG Unified Modeling Language Specification Version 1.5, Formal/03-03-01, http://www.omg.org/technology/documents/formal/uml.htm, 2003
[Ören 1984] T I Ören, Foreword in B P Zeigler, Multifacetted Modeling and Discrete Event Simulation, London: Academic Press, 1984
[Park 1991] H-S Park, ‘Abstract object types = abstract knowledge types + abstract data types + abstract connector types’, Journal of Object-Oriented Programming, 4(3):37-52, 1991
[Pennington 1987] N Pennington, ‘Stimulus structures and mental representations in expert comprehension of computer programs’, Cognitive Psychology, 19:295-341, 1987
[Perry 2000] D E Perry, A A Porter, L G Votta, ‘Empirical studies of software engineering: a roadmap’ in Proceedings of the Conference on the Future of Software Engineering, 22nd International Conference on Software Engineering (ICSE), Limerick, pp. 345-355, New York, NY: ACM Press, 2000
[Peterson 1981] J L Peterson, Petri Net Theory and the Modeling of Systems, Englewood Cliffs, NJ: Prentice-Hall, 1981
[Petre 1997] M Petre, A F Blackwell, T R G Green, ‘Cognitive questions in software visualisation’ in J Stasko, J Domingue, M H Brown, B A Price (eds.), Software Visualization: Programming as a Multimedia Experience, Cambridge, MA: MIT Press, 1997
[Petri 1962] C Petri, Kommunikation mit Automaten, PhD dissertation, Bonn:
University of Bonn, 1962
[Pinzger 2005] M Pinzger, H Gall, M Fischer, M Lanza, ‘Visualizing multiple
evolution metrics’ in Proceedings of the 2005 ACM Symposium on
Software Visualization, St. Louis, MO, pp. 67-75, New York, NY:
ACM Press, 2005
[Pressman 2000] R S Pressman, Software Engineering: A Practitioner’s Approach,
European adaptation, 5th ed., London: McGraw-Hill, 2000
[Price 1992] B A Price, I S Small, R M Baecker, ‘A taxonomy of software
visualisation’ in Proceedings of the 25th Hawaii International
Conference on System Science (HICSS), Kauai, HI, Vol. II, pp. 597-
606, Los Alamitos, CA: IEEE Computer Society Press, 1992
[Price 1993] B A Price, R M Baecker, I S Small, ‘A principled taxonomy of
software visualization’, Journal of Visual Languages and
Computing, 4(3):211-266, 1993
[Rational 2003] Rational Software Corporation, Visual Modeling with Rational Rose
Home, http://www.rational.com/products/rose/index.jsp, 2003
[Reasoning 1994] Reasoning Systems Inc., Refine/C User’s Guide,
http://www.reasoning.com/, 1994
[Reeves 1983] W T Reeves, ‘Particle systems – a technique for modeling a class of
fuzzy objects’, ACM Transactions on Graphics, 2(2):91-108, 1983
[Reiser 1991] M Reiser, The Oberon System – User Guide and Programmer’s
Manual, Boston, MA: Addison-Wesley, 1991
[Reiss 1995] S P Reiss, The Field Programming Environment: A Friendly
Integrated Environment for Learning and Development, Dordrecht:
Kluwer Academic Publishers, 1995
[Reiss 2001] S P Reiss, ‘An overview of BLOOM’ in Proceedings of the 3rd ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE), Snowbird, UT, pp. 2-5, New York, NY: ACM Press, 2001
[Reiss 2002] S P Reiss, ‘A visual query language for software visualisation’ in Proceedings of the IEEE Symposium on Human Centric Computing Languages and Environments (HCC), Arlington, VA, pp. 80-82, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Reiss 2003a] S P Reiss, ‘JIVE: visualizing Java in action’, demonstration presented at the 25th International Conference on Software Engineering (ICSE), Portland, OR, pp. 820-821, Los Alamitos, CA: IEEE Computer Society Press, 2003
[Reiss 2003b] S P Reiss, ‘Visualizing Java in action’ in Proceedings of the 1st ACM Symposium on Software Visualization (SoftViz), San Diego, CA, pp. 57-65, New York, NY: ACM Press, 2003
[Richner 1999] T Richner, S Ducasse, ‘Recovering high-level views of object-oriented applications from static and dynamic information’ in Proceedings of the 15th International Conference on Software Maintenance (ICSM), Oxford, pp. 13-22, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Richner 2002a] T Richner, S Ducasse, ‘Using dynamic information for the iterative
recovery of collaborations and roles’ in Proceedings of the 18th
International Conference on Software Maintenance (ICSM),
Montréal, QC, pp. 34-43, Los Alamitos, CA: IEEE Computer
Society Press, 2002
[Richner 2002b] T Richner, Recovering Behavioural Design Views: A Query-Based
Approach, PhD thesis, Berne: University of Berne, 2002
[Rieger 2004] M Rieger, S Ducasse, M Lanza, ‘Insights into system-wide code
duplication’ in Proceedings of the 11th Working Conference on
Reverse Engineering (WCRE), Delft, pp. 100-109, Los Alamitos,
CA: IEEE Computer Society Press, 2004
[Riley 2003] G Riley, CLIPS: A Tool for Building Expert Systems,
http://www.ghg.net/clips/CLIPS.html, 2003
[Riva 2002] C Riva, J V Rodriguez, ‘Combining static and dynamic views for
architecture reconstruction’ in Proceedings of the 6th European
Conference on Software Maintenance and Reengineering (CSMR),
Budapest, pp. 47-56, Los Alamitos, CA: IEEE Computer Society
Press, 2002
[Rockel 2000] I Rockel, F Heimes, FUJABA – Homepage, http://www.uni-paderborn.de/fachbereich/AG/schaefer/ag_dt/PG/Fujaba/fujaba.html, 2000
[Roman 1993] G-C Roman, K C Cox, ‘A taxonomy of program visualization
systems’, IEEE Computer, 26(12):11-24, 1993
[Rumbaugh 1991] J Rumbaugh, M Blaha, W Premerlani, F Eddy, W Lorensen, Object-Oriented Modeling and Design, Englewood Cliffs, NJ: Prentice-Hall, 1991
[Rumbaugh 1999] J Rumbaugh, I Jacobson, G Booch, The Unified Modeling Language Reference Manual, Boston, MA: Addison-Wesley, 1999
[Schauer 1999] R Schauer, S Robitaille, F Martel, R K Keller, ‘Hot spot recovery in
object-oriented software with inheritance and composition template
methods’ in Proceedings of the 15th International Conference on
Software Maintenance (ICSM), Oxford, pp. 220-229, Los Alamitos,
CA: IEEE Computer Society Press, 1999
[Sefika 1996a] M Sefika, A Sane, R H Campbell, ‘Architecture-oriented
visualization’ in Proceedings of the 11th Conference on Object-
Oriented Programming, Systems, Languages, and Applications
(OOPSLA), San José, CA, pp. 389-405, New York, NY: ACM
Press, 1996
[Sefika 1996b] M Sefika, A Sane, R H Campbell, ‘Monitoring compliance of a
software system with its high-level design models’ in Proceedings
of the 18th International Conference on Software Engineering
(ICSE), Berlin, pp. 387-396, Los Alamitos, CA: IEEE Computer
Society Press, 1996
[Selic 1994] B Selic, G Gullekson, P T Ward, Real-Time Object-Oriented
Modeling, New York, NY: John Wiley and Sons, 1994
[Selic 1998] B Selic, J Rumbaugh, Using UML for Modeling Complex Real-Time
Systems,
http://www.ibm.com/developerworks/rational/library/content/03July
/1000/1155/1155_umlmodeling.pdf, 1998
[Selonen 2001] P Selonen, K Koskimies, M Sakkinen, ‘How to make apples from
oranges in UML’ in Proceedings of the 34th Hawaii International
Conference on System Sciences (HICSS), Maui, HI, pp. 3054-3063,
Los Alamitos, CA: IEEE Computer Society Press, 2001
[Sevitsky 2001] G Sevitsky, W De Pauw, R Konuru, ‘An information exploration
tool for performance analysis of Java programs’ in Proceedings of
the 38th Conference on Technology of Object-Oriented Languages
and Systems (TOOLS Europe), Zurich, pp. 85-101, Los Alamitos,
CA: IEEE Computer Society Press, 2001
[Shaw 1984] M Shaw, ‘Abstraction techniques in modern programming
languages’, IEEE Software, 1(4):10-26, 1984
[Shaw 1995] M Shaw, R DeLine, D Klein, T Ross, D Young, G Zelesnik,
‘Abstractions for software architecture and tools to support them’,
IEEE Transactions on Software Engineering, 21(4):314-335, 1995
[Shneiderman 1980] B Shneiderman, Software Psychology: Human Factors in Computer
and Information Systems, Boston, MA: Winthrop Publishers, 1980
[Siff 1997] M Siff, T Reps, ‘Identifying modules via concept analysis’ in
Proceedings of the 13th International Conference on Software
Maintenance (ICSM), Bari, pp. 170-178, Los Alamitos, CA: IEEE
Computer Society Press, 1997
[Sim 2000a] S E Sim, M-A D Storey, ‘A structured demonstration of program
comprehension tools’ in Proceedings of the 7th Working Conference
on Reverse Engineering (WCRE), Brisbane, QLD, pp. 184-193, Los
Alamitos, CA: IEEE Computer Society Press, 2000
[Sim 2000b] S E Sim, M-A D Storey, A Winter, ‘A structured demonstration of
five program comprehension tools: lessons learnt’ in Proceedings of
the 7th Working Conference on Reverse Engineering (WCRE),
Brisbane, QLD, pp. 210-212, Los Alamitos, CA: IEEE Computer
Society Press, 2000
[Singer 1997] J Singer, T Lethbridge, N Vinson, N Anquetil, ‘An examination of
software engineering work practices’ in Proceedings of the 1997
Conference of the Centre for Advanced Studies on Collaborative
Research (CASCON), Toronto, ON, p. 21, Armonk, NY: IBM Press,
1997
[Soloway 1984] E Soloway, K Ehrlich, ‘Empirical studies of programming
knowledge’, IEEE Transactions on Software Engineering,
SE-10(5):595-609, 1984
[Soloway 1988] E Soloway, J Pinto, S Letovsky, D Littman, R Lampert, ‘Designing
documentation to compensate for delocalized plans’,
Communications of the ACM, 31(11):1259-1267, 1988
[Stasko 1992] J T Stasko, C Patterson, ‘Understanding and characterising software
visualization systems’ in Proceedings of the 8th IEEE Workshop on
Visual Languages (VL), Seattle, WA, pp. 3-10, Los Alamitos, CA:
IEEE Computer Society Press, 1992
[Stonebraker 1990] M Stonebraker, L Rowe, M Hirohama, ‘The implementation of
POSTGRES’, IEEE Transactions on Knowledge and Data
Engineering, 2(1):125-141, 1990
[Storey 1995] M-A D Storey, H A Müller, ‘Manipulating and documenting
software structures using SHriMP views’ in Proceedings of the 11th
International Conference on Software Maintenance (ICSM), Nice,
pp. 275-284, Los Alamitos, CA: IEEE Computer Society Press,
1995
[Storey 1996a] M-A D Storey, K Wong, P Fong, D Hooper, K Hopkins, H A
Müller, ‘On designing an experiment to evaluate a reverse
engineering tool’ in Proceedings of the 3rd Working Conference on
Reverse Engineering (WCRE), Monterey, CA, pp. 31-40, Los
Alamitos, CA: IEEE Computer Society Press, 1996
[Storey 1996b] M-A D Storey, H Müller, K Wong, ‘Manipulating and documenting
software structures’ in P Eades, K Zhang (eds.), Software
Visualization, pp. 244-263, World Scientific Publishing, 1996
[Storey 1997] M-A D Storey, K Wong, H A Müller, ‘How do program
understanding tools affect how programmers understand programs?’
in Proceedings of the 4th Working Conference on Reverse
Engineering (WCRE), Amsterdam, pp. 12-21, Los Alamitos, CA:
IEEE Computer Society Press, 1997
[Storey 2000] M-A D Storey, K Wong, H A Müller, ‘How do program
understanding tools affect how programmers understand
programs?’, Science of Computer Programming, 36(2-3):183-207,
2000
[Storey 2001] M-A Storey, C Best, J Michaud, ‘SHriMP views: an interactive
environment for exploring Java programs’ in Proceedings of the 9th
International Workshop on Program Comprehension (IWPC),
Toronto, ON, pp. 111-112, Los Alamitos, CA: IEEE Computer
Society Press, 2001
[Stroulia 2002] E Stroulia, T Systä, ‘Dynamic analysis for reverse engineering and
program understanding’, ACM SIGAPP Applied Computing Review,
10(1):8-17, 2002
[Sun 2000] Sun Microsystems, Java 2 SDK, Standard Edition Version 1.2,
http://java.sun.com/products/jdk/1.2/, 2000
[Sun 2002] Sun Microsystems, jdb – the Java Debugger,
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/jdb.html,
2002
[Sun 2004a] Sun Microsystems, Java Platform Debugger Architecture,
http://java.sun.com/j2se/1.5.0/docs/guide/jpda/, 2004
[Sun 2004b] Sun Microsystems, Java Programming Language,
http://java.sun.com/j2se/1.5.0/docs/guide/language/index.html, 2004
[Sun 2005] Sun Microsystems, java.sun.com, http://java.sun.com, 2005
[Systä 1999a] T Systä, ‘On the relationships between static and dynamic models in
reverse engineering Java software’ in Proceedings of the 6th
Working Conference on Reverse Engineering (WCRE), Atlanta, GA,
pp. 304-313, Los Alamitos, CA: IEEE Computer Society Press,
1999
[Systä 1999b] T Systä, ‘Dynamic reverse engineering of Java software’ in
Proceedings of the 3rd Workshop on Object-Oriented Technology,
Lisbon, Lecture Notes in Computer Science 1743, pp. 174-175,
London: Springer-Verlag, 1999
[Systä 2000a] T Systä, P Yu, H Müller, ‘Analyzing Java software by combining
metrics and program visualization’ in Proceedings of the 4th
European Conference on Software Maintenance and Reengineering
(CSMR), Zurich, pp. 199-208, Los Alamitos, CA: IEEE Computer
Society Press, 2000
[Systä 2000b] T Systä, ‘Understanding the behaviour of Java programs’ in
Proceedings of the 7th Working Conference on Reverse Engineering
(WCRE), Brisbane, QLD, pp. 214-223, Los Alamitos, CA: IEEE
Computer Society Press, 2000
[Systä 2000c] T Systä, Static and Dynamic Reverse Engineering Techniques for
Java Software Systems, PhD Dissertation, Report A-2000-4,
Tampere: Department of Computer and Information Sciences,
University of Tampere, 2000
[Systä 2000d] T Systä, ‘Incremental construction of dynamic models for object-
oriented software systems’, Journal of Object-Oriented
Programming, 13(5):18-27, 2000
[Systä 2001] T Systä, K Koskimies, H Müller, ‘Shimba – an environment for
reverse engineering Java software systems’, Software – Practice and
Experience, 31(4):371-394, 2001
[Szyperski 1998] C Szyperski, Component Software: Beyond Object-Oriented
Programming, Harlow: Addison-Wesley, 1998
[Taligent 1994] Taligent Inc., Building Object-Oriented Frameworks, White Paper,
Cupertino, CA: Taligent Inc., 1994
[Templ 1994] J Templ, Oberon CD-ROM: Kepler – User Guide, Bonn: Addison-
Wesley, 1994
[Tichelaar 1998] S Tichelaar, S Demeyer, ‘An exchange model for reengineering
tools’ in Proceedings of the 12th European Conference on Object-
Oriented Programming (ECOOP) Workshop Reader, Lecture Notes
in Computer Science 1543, pp. 82-84, Berlin: Springer-Verlag, 1998
[Tilley 1994] S R Tilley, K Wong, M-A D Storey, H A Müller, ‘Programmable
reverse engineering’, International Journal of Software Engineering
and Knowledge Engineering, 4(4):501-520, 1994
[TogetherSoft 2001a] TogetherSoft Corporation, Together v5.5 Documentation,
http://www.togethercommunity.com/docs/5.5/together5.htm, 2001
[TogetherSoft 2001b] TogetherSoft Corporation, Together ControlCenter,
http://www.togethersoft.com/products/controlcenter/, 2001
[Tolke 2005] L Tolke, M Klink, Cookbook for Developers of ArgoUML: an
introduction to developing ArgoUML,
http://argouml.tigris.org/documentation/defaulthtml/cookbook/,
2005
[Tonella 1999] P Tonella, G Antoniol, ‘Object oriented design pattern inference’ in
Proceedings of the 15th International Conference on Software
Maintenance (ICSM), Oxford, pp. 230-238, Los Alamitos, CA:
IEEE Computer Society Press, 1999
[Tufte 1990] E R Tufte, Envisioning Information, Cheshire, CT: Graphics Press,
1990
[Turing 1948] A M Turing, ‘Intelligent machinery’ in B Meltzer, D Michie (eds.),
Machine Intelligence 5, Edinburgh: Edinburgh University Press,
1969
[von Mayrhauser 1995] A von Mayrhauser, A M Vans, ‘Program comprehension during
software maintenance and evolution’, IEEE Computer, 28(8):44-55,
1995
[von Mayrhauser 1999] A von Mayrhauser, S Lang, ‘On the role of static analysis during
software maintenance’ in Proceedings of the International
Workshop on Program Comprehension (IWPC), Pittsburgh, PA, pp.
170-177, Los Alamitos, CA: IEEE Computer Society Press, 1999
[W3C 1999] World Wide Web Consortium, XSL Transformations (XSLT),
http://www.w3.org/TR/xslt, 1999
[W3C 2004] World Wide Web Consortium, Extensible Markup Language (XML)
1.0 (Third Edition), http://www.w3.org/TR/2004/REC-xml-
20040204/, 2004
[Walker 1998] R J Walker, G C Murphy, B Freeman-Benson, D Wright, D
Swanson, J Isaak, ‘Visualizing dynamic software system
information through high-level models’ in Proceedings of the 13th
Conference on Object-Oriented Programming, Systems, Languages,
and Applications (OOPSLA), Vancouver, BC, pp. 271-283, New
York, NY: ACM Press, 1998
[Waters 1999] B Waters, S Rugaber, G Abowd, ‘Architectural synthesis:
integrating multiple architectural perspectives’ in Proceedings of the
6th Working Conference on Reverse Engineering (WCRE), Atlanta,
GA, pp. 2-11, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Wikman 1998] J Wikman, Evolution of a Distributed Repository-Based
Architecture, Research Report 1998:14, Karlskrona: Department of
Software Engineering and Computer Science, Blekinge Institute of
Technology, 1998
[Wind River 2003] Wind River Systems Inc., SNIFF+ Datasheet,
http://www.takefive.com/bundle/sniff.pdf, 2003
[Winter 2002] A Winter, ‘GXL – overview and current status’, presentation at The
International Workshop on Graph-Based Tools (GraBaTs),
Barcelona, 2002
[Wirth 1992] N Wirth, J Gutknecht, Project Oberon, The Design of an Operating
System and Compiler, Boston, MA: Addison-Wesley, 1992
[Wong 1995] K Wong, S R Tilley, H A Müller, M-A D Storey, ‘Structural
redocumentation: a case study’, IEEE Software, 12(1):46-54, 1995
[Xfig 2003] Xfig.org, XFIG Drawing Program for the X Window System,
http://www.xfig.org/, 2003
[Yan 2004] H Yan, D Garlan, B Schmerl, J Aldrich, R Kazman, ‘DiscoTect: a
system for discovering architectures from running systems’ in
Proceedings of the 26th International Conference on Software
Engineering (ICSE), Edinburgh, pp. 470-479, Los Alamitos, CA:
IEEE Computer Society Press, 2004
[Yeh 1997] A S Yeh, D R Harris, M P Chase, ‘Manipulating recovered software
architecture views’ in Proceedings of the 19th International
Conference on Software Engineering (ICSE), Boston, MA, pp. 184-
194, New York, NY: ACM Press, 1997
[Yin 2002] R Yin, R K Keller, ‘Program comprehension by visualization in
contexts’ in Proceedings of the 18th International Conference on
Software Maintenance (ICSM), Montréal, QC, pp. 332-341, Los
Alamitos, CA: IEEE Computer Society Press, 2002
[Zachman 1996] J A Zachman, ‘Concepts of the framework for enterprise
architecture: background, description, and utility’, Los Angeles, CA:
Zachman International, 1996
[Zeigler 1976] B P Zeigler, Theory of Modeling and Simulation, New York, NY:
Wiley, 1976
[Zeigler 1984] B P Zeigler, Multifacetted Modeling and Discrete Event Simulation,
London: Academic Press, 1984
[Zeigler 2000] B P Zeigler, H Praehofer, T G Kim, Theory of Modeling and
Simulation: Integrating Discrete Event and Continuous Complex
Dynamic Systems, 2nd ed., London: Academic Press, 2000
[Zimmer 1985] J A Zimmer, Abstraction for Programmers, New York, NY:
McGraw-Hill, 1985