A Novel Software Visualisation Model to
Support Object-Oriented Program
Comprehension
Michael John Pacione
Department of Computer and Information Sciences
PhD
November 2005
The copyright of this thesis belongs to the author under the terms of the United Kingdom
Copyright Acts as qualified by University of Strathclyde Regulation 3.51. Due
acknowledgement must always be made of the use of any material contained in, or derived
from, this thesis.
Abstract
Current software visualisation tools do not address the full range of software comprehension
requirements. This thesis presents a novel software visualisation model for supporting
object-oriented software comprehension that is intended to address the shortcomings of
existing tools. Related work in the fields of software visualisation, tool evaluation,
abstraction, diagrams, views, exploration and querying, metamodels, and software modelling
is discussed. An initial case study that prompted the development of this novel model is
described. The model is then introduced, based on multiple levels of abstraction, structural
and behavioural perspectives, and the integration of statically and dynamically extracted
information. The model is assessed theoretically against its original goals, and its support for
software comprehension strategies is examined. Abstraction operations between views in the
model and the combination of views are defined formally. A demonstration of the
application of the model to a real system is presented. A tool implementation of the model is
introduced. This tool is then used to evaluate the utility of the model in addressing typical
software comprehension tasks in real world software systems. It is concluded that the novel
software visualisation model proposed in this thesis provides effective support for the full
range of software comprehension tasks.
The contributions of this thesis are as follows: an abstraction scale and set of criteria for
classifying software comprehension tools; a thorough review and comparison of the extant
software visualisation tools; typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools; a schema for categorising view
arrangements in software tools; the findings of an initial study assessing the capabilities of
the extant software visualisation tools using typical software comprehension tasks; the novel
software visualisation model based on a range of abstraction levels and structural and
behavioural perspectives; a prototype implementation of the model as the VANESSA tool;
and the findings of the evaluation of this model using real software comprehension tasks and
real systems.
Publications
Refereed conference papers
M J Pacione, ‘VANESSA: Visualisation Abstraction NEtwork for Software Systems
Analysis’ in Industrial and Tool Proceedings of the 21st IEEE International Conference on
Software Maintenance (ICSM), Budapest, pp. 85-88, Vienna: Harry Sneed, 2005
Best Tool Paper Award
M J Pacione, M Roper, M Wood, ‘A novel software visualisation model to support software
comprehension’ in Proceedings of the 11th Working Conference on Reverse Engineering
(WCRE), Delft, pp. 70-79, Los Alamitos, CA: IEEE CS Press, 2004
M J Pacione, ‘Software visualisation for object-oriented program comprehension’ in
Doctoral Symposium, Proceedings of the 26th International Conference on Software
Engineering (ICSE), Edinburgh, pp. 63-65, Los Alamitos, CA: IEEE CS Press, 2004
M J Pacione, ‘Software visualisation for object-oriented program comprehension’ in Poster
Proceedings of the 5th Postgraduate Research Conference in Electronics, Photonics,
Communications & Networks, and Computing Science (PREP), Hatfield, pp. 158-159,
Swindon: EPSRC, 2004
M J Pacione, M Roper, M Wood, ‘A comparative evaluation of dynamic visualisation tools’
in Proceedings of the 10th Working Conference on Reverse Engineering (WCRE), Victoria,
BC, pp. 80-89, Los Alamitos, CA: IEEE CS Press, 2003
Technical reports
M J Pacione, A Fully Specified Abstraction Model for Software Visualisation, Technical
Report EFoCS-54-2004, Glasgow: Department of Computer and Information Sciences,
University of Strathclyde, 2004
M J Pacione, Evaluating a Model of Software Visualisation for Software Comprehension,
Technical Report EFoCS-53-2004, Glasgow: Department of Computer and Information
Sciences, University of Strathclyde, 2004
M J Pacione, Effective Visualisation for Comprehending Object-Oriented Software: A
Multifaceted, Three-Dimensional Abstraction Model for Software Visualisation, Technical
Report EFoCS-52-2004, Glasgow: Department of Computer and Information Sciences,
University of Strathclyde, 2004
M J Pacione, A Review and Evaluation of Dynamic Visualisation Tools, Technical Report
EFoCS-50-2003, Glasgow: Department of Computer and Information Sciences, University
of Strathclyde, 2003
Acknowledgements
I would like to thank my supervisors, Murray Wood and Marc Roper, for their guidance and
support during the course of my research.
I would also like to thank my fellow EFoCS PhD students – Neil Walkinshaw, Doug Kirk,
Al Dunsmore, and Matt Munro – for their solidarity, friendship, and innumerable games of
pool, foosball, and frisbee.
Doug Kirk at the University of Strathclyde, Jens Gulden at the Technical University of
Berlin, and Rob Lintern at the University of Victoria each made an invaluable contribution
by providing me with evaluation data.
The greatest debt I owe to my parents, Michael and Christine, for their encouragement and
support – both financial and moral – over the past three years (and the preceding twenty-
two). Lastly, I want to thank all my friends, especially Clare and Richard, for keeping me
sane and reminding me that there are more important things in life than research and the
pursuit of knowledge.
“All this worldly wisdom was once the unamiable heresy of some wise man.”
“All endeavor calls for the ability to tramp the last mile, shape the last plan, endure the last hour’s toil. The fight-to-the-finish spirit is the one... characteristic we must possess if we are to face the future as finishers.”
Henry David Thoreau, 1817-1862
“You qualify in your boiler suit and then put on your tuxedo.”
Jock Stein, 1922-1985
Contents
Abstract iii
Publications iv
Acknowledgements vi
Contents vii
List of figures xix
List of tables xxx
1 Introduction 1
1.1 Background 1
1.1.1 Software visualisation 1
1.1.2 Software comprehension 1
1.1.3 Challenges in object-oriented software comprehension 2
1.1.4 Reverse engineering tools support software comprehension 2
1.1.4.1 The relationship between forward and reverse engineering 3
1.1.5 Abstraction 3
1.2 Thesis overview 4
1.2.1 Motivation and aim 4
1.2.2 Research hypothesis 4
1.2.3 Approach and methodology 5
1.2.4 Contributions of this thesis 5
2 Related Work 6
2.1 Software visualisation techniques 6
2.1.1 Static software comprehension techniques 6
2.1.2 Dynamic software comprehension techniques 7
2.1.3 Advantages of dynamic analysis for object-oriented systems 7
2.1.4 Debuggers 8
2.1.5 Software visualisation tools 9
2.1.6 Data collection for software visualisation 9
2.1.7 Analysing the data produced 11
2.1.8 Presenting the results 12
2.1.8.1 Basic graph representations 12
2.1.8.2 UML diagrams 12
2.1.8.3 Message sequence charts 14
2.1.8.4 Other representations 15
2.2 Software visualisation tools 16
2.2.1 Characteristics of software visualisation tools 17
2.2.1.1 Three criteria for characterising software visualisation tools 17
2.2.1.2 A scale to indicate level of abstraction 17
2.2.1.3 Software visualisation tool taxonomies 18
2.2.2 Program Explorer (level 2) 19
2.2.2.1 Description 19
2.2.2.2 Evaluation 20
2.2.2.3 Comparison 20
2.2.2.4 Assessment 21
2.2.3 Scene (level 2) 22
2.2.3.1 Description 22
2.2.3.2 Evaluation 23
2.2.3.3 Comparison 23
2.2.3.4 Assessment 23
2.2.4 Architecture-oriented visualization (level 4) 23
2.2.4.1 Description 23
2.2.4.2 Evaluation 25
2.2.4.3 Comparison 26
2.2.4.4 Assessment 27
2.2.5 ISVis (level 4) 27
2.2.5.1 Description 27
2.2.5.2 Evaluation 28
2.2.5.3 Comparison 29
2.2.5.4 Assessment 30
2.2.6 Dali (level 4) 30
2.2.6.1 Description 30
2.2.6.2 Evaluation 32
2.2.6.3 Comparison 33
2.2.6.4 Assessment 33
2.2.7 Ovation (level 2) 33
2.2.7.1 Description 33
2.2.7.2 Evaluation 35
2.2.7.3 Comparison 35
2.2.7.4 Assessment 36
2.2.8 Reflexion models (level 4) 36
2.2.8.1 Description 36
2.2.8.1.1 AVID 36
2.2.8.1.2 RMTool 38
2.2.8.2 Evaluation 39
2.2.8.2.1 AVID 39
2.2.8.2.2 RMTool 40
2.2.8.3 Comparison 41
2.2.8.3.1 AVID 41
2.2.8.3.2 RMTool 41
2.2.8.4 Assessment 42
2.2.9 Gaudi (levels 3-4) 42
2.2.9.1 Description 42
2.2.9.2 Evaluation 43
2.2.9.3 Comparison 44
2.2.9.4 Assessment 44
2.2.10 Shimba (levels 2-4) 44
2.2.10.1 Description 44
2.2.10.2 Evaluation 45
2.2.10.3 Comparison 46
2.2.10.4 Assessment 46
2.2.11 Jinsight (levels 2-3) 47
2.2.11.1 Description 47
2.2.11.2 Evaluation 49
2.2.11.3 Comparison 49
2.2.11.4 Assessment 49
2.2.12 Collaboration Browser (levels 2-4) 50
2.2.12.1 Description 50
2.2.12.2 Evaluation 51
2.2.12.3 Comparison 52
2.2.12.4 Assessment 52
2.2.13 Together debugger (level 1) 53
2.2.13.1 Description 53
2.2.13.2 Evaluation 53
2.2.13.3 Comparison 53
2.2.13.4 Assessment 54
2.2.14 Together diagrams (levels 2-3) 54
2.2.14.1 Description 54
2.2.14.2 Evaluation 54
2.2.14.3 Comparison 55
2.2.14.4 Assessment 55
2.2.15 SHriMP (levels 0, 2-4) 55
2.2.15.1 Description 55
2.2.15.2 Evaluation 56
2.2.15.3 Comparison 56
2.2.15.4 Assessment 56
2.2.16 BLOOM and JIVE (levels 2-3) 57
2.2.16.1 Description 57
2.2.16.2 Evaluation 57
2.2.16.3 Comparison 57
2.2.16.4 Assessment 58
2.2.17 Polymetric Views, Class Blueprint, RelVis (levels 2-3) 58
2.2.17.1 Description 58
2.2.17.2 Evaluation 60
2.2.17.3 Comparison 61
2.2.17.4 Assessment 62
2.2.18 Seesoft, SeeSys, SeeSlice, HierNet, SeeNet, SeeNet3D (levels 0, 2-3) 62
2.2.18.1 Description 62
2.2.18.2 Evaluation 64
2.2.18.3 Comparison 65
2.2.18.4 Assessment 66
2.2.19 sv3D and Imsovision (levels 2-3) 66
2.2.19.1 Description 66
2.2.19.2 Evaluation 66
2.2.19.3 Comparison 67
2.2.19.4 Assessment 67
2.2.20 Tool summary 67
2.3 Abstraction 68
2.3.1 The concept of abstraction 68
2.3.2 The historical origins of abstraction 69
2.3.3 The application of abstraction 69
2.3.4 Abstraction in software engineering 70
2.3.5 Abstraction in software visualisation 71
2.4 Effective presentation techniques for software visualisation 71
2.4.1 Diagrams for describing software 71
2.4.1.1 Structured design diagrams 72
2.4.1.2 Object-oriented diagrams 73
2.4.1.3 Recent literature 74
2.4.2 Views for software comprehension 74
2.4.2.1 A single view illustrating a single facet 75
2.4.2.2 Multiple independent views illustrating a single facet 76
2.4.2.3 Multiple interdependent views illustrating a single facet 76
2.4.2.4 A single view illustrating multiple facets 77
2.4.2.5 Multiple independent views illustrating multiple facets 77
2.4.2.6 Multiple interdependent views illustrating multiple facets 77
2.5 Effective techniques for exploring and querying visualisations 78
2.5.1 Exploration 78
2.5.2 Querying 78
2.5.3 Guided navigation 79
2.6 Software modelling 79
2.6.1 The 4+1 view model 79
2.6.2 Hofmeister et al. 80
2.6.3 ManSART 81
2.6.4 Zachman Framework for Enterprise Architecture 82
2.6.5 IEEE Recommended Practice for Architectural Description 82
2.6.6 Other approaches 83
2.7 Evaluation 83
2.7.1 Globus and Uselton (1995) 84
2.7.2 Murphy et al. (1996) 85
2.7.3 Bellay and Gall (1997) 86
2.7.4 Armstrong and Trudeau (1998) 87
2.7.5 Storey et al. (1996) 88
2.7.6 Sim and Storey (2000) 88
2.7.7 Sim et al. (2000) 89
2.7.8 Storey et al. (2000) 90
2.7.9 Bassil and Keller (2001) 91
2.7.10 Hatch et al. (2001) 92
2.7.11 Knight (2001) 93
2.7.12 Kollmann et al. (2002) 93
2.7.13 Conclusions 93
3 Initial Study 96
3.1 Introduction 96
3.2 Generic questions 97
3.2.1 General software comprehension questions 97
3.2.2 Specific reverse engineering questions 98
3.3 Specific reverse engineering questions specified for JHotDraw 98
3.4 Together diagrams 101
3.5 Jinsight 102
3.6 Reflexion models 103
3.7 Together debugger 104
3.8 Case study summary 105
3.9 Conclusions 111
4 A Novel Software Visualisation Model 112
4.1 Background 112
4.2 Research hypothesis 112
4.3 A visualisation model for object-oriented software 113
4.4 Examples 118
4.5 Key research challenges 122
5 Refining the Initial Model 123
5.1 Evaluation based on representative tasks 123
5.2 The basis for typical software comprehension tasks 123
5.3 Task set analysis 127
5.4 New task sets 128
5.4.1 General software comprehension tasks 129
5.4.2 Specific reverse engineering tasks 129
5.5 Justification 130
5.6 Task set revision summary 131
5.7 Theoretical evaluation of the proposed model 131
5.7.1 Model information required to address typical software comprehension tasks 131
6 The Refined Model 134
6.1 Introduction 134
6.2 Abstraction levels 134
6.3 Inter-level abstraction relationships 138
6.3.1 Abstraction mechanisms 138
6.3.2 Detailed abstraction example 140
6.3.3 Generic abstraction mappings 142
6.3.3.1 Structure hierarchy 143
6.3.3.2 Behaviour hierarchy 146
6.3.4 Combining information from multiple views 147
6.3.4.1 From the same level of each hierarchy 148
6.3.4.2 From different levels of the same hierarchy 148
6.3.4.3 From different levels of each hierarchy 149
6.4 Metamodels 151
6.4.1 Dagstuhl Middle Metamodel 152
6.4.2 UML metamodel 152
6.5 Applying the model to a real system 154
6.6 VANESSA: Visualisation Abstraction NEtwork for Software Systems Analysis 155
6.6.1 Tool implementation 155
6.6.2 Example analyses 157
6.6.3 Comparison with other software visualisation tools 160
6.7 Summary 161
7 Evaluation 163
7.1 Experimental setup 163
7.2 Comprehension questions 164
7.3 Threats to validity 164
7.3.1 Internal validity 165
7.3.2 Construct validity 165
7.3.3 External validity 166
7.4 Subject systems 167
7.4.1 JHotDraw 167
7.4.2 BeautyJ 167
7.4.3 SHriMP 168
7.4.4 ArgoUML 170
7.5 Findings 171
7.5.1 Finding 1 171
7.5.1.1 Replication summary 174
7.5.2 Finding 2 175
7.5.3 Finding 3 178
7.5.4 Finding 4 183
7.5.5 Finding 5 185
7.5.6 Finding 6 186
7.5.7 Finding 7 188
7.5.8 Finding 8 189
7.5.9 Finding 9 192
7.5.10 Finding 10 193
7.5.11 Finding 11 195
7.5.12 Miscellaneous issues 197
7.5.13 Conclusions 199
8 Conclusions 200
8.1 Summary 200
8.2 Conclusions 201
8.3 Future work 201
References 203
Appendices A-1
Appendix A – Initial Study Lab Book A-2
A.1 Together diagrams A-2
A.1.1 General software comprehension questions A-2
A.1.2 Specific reverse engineering questions A-7
A.2 Jinsight A-14
A.2.1 General software comprehension questions A-14
A.2.2 Specific reverse engineering questions A-21
A.3 Reflexion models A-28
A.3.1 jRMTool A-28
A.3.2 AVID A-29
A.3.3 General software comprehension questions A-32
A.3.4 Specific reverse engineering questions A-33
A.4 Together debugger A-34
A.4.1 General software comprehension questions A-35
A.4.2 Specific reverse engineering questions A-36
Appendix B – Manual Model Verification Lab Book A-43
B.1 The JHotDraw framework A-43
B.2 Producing system-specific abstraction mappings A-44
B.2.1 The structure hierarchy A-44
B.2.1.1 From source code to level 0 A-44
B.2.1.2 From level 0 to level 1 A-56
B.2.1.3 From level 0 to level 2 A-59
B.2.1.4 From level 1 to level 2 A-61
B.2.1.5 From level 2 to level 3 A-63
B.2.1.6 From level 3 to level 4 A-87
B.2.1.7 From level 3 to level 5 A-91
B.2.2 The behaviour hierarchy A-97
B.2.2.1 From event trace to level 0 A-97
B.2.2.2 From level 0 to level 1 A-99
B.2.2.3 From level 0 to level 2 A-102
B.2.2.4 From level 1 to level 2 A-104
B.2.2.5 From level 2 to level 3 A-105
B.2.2.6 From level 3 to level 4 A-123
B.2.2.7 From level 3 to level 5 A-128
B.3 JavaDrawApp abstraction hierarchies A-131
B.4 Combining information from multiple views A-134
B.4.1 From the same level of each hierarchy A-134
B.4.2 From different levels of the same hierarchy A-137
B.4.3 From different levels of each hierarchy A-147
B.5 Validation and analysis A-154
B.5.1 Logical validation A-154
B.5.2 Comparison with other diagrams A-154
B.5.2.1 The JHotDraw class diagram A-155
B.5.2.2 The JHotDraw sequence diagram A-158
B.5.2.3 An expert’s component model A-160
B.5.2.4 An expert’s use case model A-164
Appendix C – Validation of Support for Software Comprehension Strategies A-167
C.1 Software comprehension strategies A-167
C.1.1 The bottom-up model A-167
C.1.2 The top-down model A-168
C.1.3 The knowledge-based model A-168
C.1.4 The systematic and as-needed models A-168
C.1.5 The integrated model A-169
C.2 Object-oriented software comprehension A-169
C.3 Comprehension in software visualisation A-171
C.4 Support for software comprehension strategies in the novel software visualisation model A-172
C.5 Summary A-173
Appendix D – Entity-Relationship Diagrams for the Novel Model A-175
Appendix E – Comparison with Simulation and Continuous System Abstraction Techniques A-181
E.1 Multimodelling A-181
E.2 Modelling and simulation abstraction techniques A-184
Appendix F – Comparison of Abstraction Relations in the Novel Software Visualisation Model and the UML Metamodel A-188
F.1 Abstraction relations in the novel software visualisation model A-188
F.2 UML metamodel A-190
F.3 Representing model information in the UML metamodel A-190
F.3.1 Activities metamodel (behaviour level 1) A-190
F.3.2 Interactions metamodel (behaviour level 2) A-190
F.3.3 Classes metamodel (structure level 2) A-191
F.3.4 Components metamodel (structure level 3) A-192
F.4 Representing abstraction relationships in the UML metamodel A-192
F.4.1 Behaviour levels 1-2 A-192
F.4.2 Structure levels 2-3 A-193
F.5 Summary A-195
Appendix G – Evaluation Lab Book A-196
G.1 System 1 - JHotDraw A-196
G.1.1 Replication - general software comprehension questions A-196
G.1.2 Replication - specific reverse engineering questions A-200
G.1.3 Together diagrams A-208
G.1.4 System designers’ diagrams A-214
G.1.5 System expert’s diagrams A-217
G.2 System 2 – BeautyJ A-221
G.2.1 Comprehension questions A-221
G.2.1.1 Question 1 A-221
G.2.1.2 Question 2 A-229
G.2.1.3 Question 3 A-236
G.2.1.4 Question 4 A-240
G.2.1.5 Question 5 A-241
G.2.2 Documentation A-246
G.2.3 Together diagrams A-253
G.3 System 3 - SHriMP A-260
G.3.1 Comprehension questions A-260
G.3.2 Together diagrams A-266
G.4 System 4 - ArgoUML A-277
G.4.1 Comprehension questions A-277
G.4.2 Together diagrams A-323
G.5 Conclusions A-336
G.5.1 Abstraction A-336
G.5.1.1 Abstraction levels A-336
G.5.1.2 Navigation between levels A-337
G.5.1.3 Combination of levels A-337
G.5.2 Facets A-337
G.5.2.1 Structural and behavioural A-337
G.5.2.2 Combination of facets A-337
G.5.3 Static/dynamic analysis A-338
G.6 Package structures A-338
G.6.1 JHotDraw A-338
G.6.2 BeautyJ A-338
G.6.3 SHriMP A-339
G.6.4 ArgoUML A-339
List of Figures
Figure 1.1 The relationship between forward and reverse engineering 3
Figure 2.1 The UML sequence diagram for the Singleton design pattern 13
Figure 2.2 The UML collaboration diagram for the Singleton design pattern 14
Figure 2.3 A scale to indicate level of abstraction 18
Figure 2.4 The positions of tools on the abstraction scale of Figure 2.3 68
Figure 2.5 The six arrangements of views onto a software model. The rectangles around the views in parts c and f represent the coordination inherent in such interdependent arrangements 75
Figure 3.1 The orrery application. The circles represent astronomical bodies, such as planets and moons, coloured according to their diameter. A blue border around a planet represents atmosphere. The satellite icons represent satellites. The directed arcs indicate gravitational attraction. The toolbar on the left is used to select diagram objects, and to create planets, satellites (orbiting and non-orbiting), atmosphere, and gravity 99
Figure 3.2 The sequence diagram drawn by Together for the CH.ifa.draw.standard.SelectionTool.mouseDown() method, illustrating the wide and shallow diagrams produced by static analysis 110
Figure 4.1 A multifaceted, three-dimensional abstraction model for software visualisation 117
Figure 4.2 An example of the structure abstraction hierarchy 119
Figure 4.3 An example of the behaviour abstraction hierarchy 120
Figure 4.4 An example of the data abstraction hierarchy 121
Figure 6.1 An example instantiation of the behaviour hierarchy for part of the JHotDraw framework 137
Figure 6.2 An abstraction network illustrating the abstraction relationships between the views of Table 6.1 139
Figure 6.3 Example combination of level 2 structure and level 5 behaviour information 151
Figure 6.4 The VANESSA analysis process 156
Figure 6.5 The level 3 Behaviour view of JHotDraw. Arcs denote usage 157
Figure 6.6 Combining views from the same level of each hierarchy. In the combined view, solid arcs denote usage and dashed arcs denote dependency 158
Figure 6.7 Combining views from different levels of the same hierarchy. In the combined view, arcs between components denote usage and arcs between business entities denote business rules 159
Figure 6.8 Combining views from different levels of each hierarchy. Between classes: solid arcs denote association; dashed arcs denote extension; dotted arcs denote inheritance. Between components: arcs denote usage 160
Figure 7.1 A screenshot of the BeautyJ options dialogue 168
Figure 7.2 A screenshot of the SHriMP application 169
Figure 7.3 A screenshot of ArgoUML 170
Figure 7.4 A part of the S2 view of JavaDrawApp 172
Figure 7.5 The B3 view for JavaDrawApp 173
Figure 7.6 The custom B1 view of RoundRectangleFigure 174
Figure 7.7 The BeautyJ documentation main classes diagram 176
Figure 7.8 A custom S2/S3 view of BeautyJ 177
Figure 7.9 The Together class diagram for the shrimp.DisplayBean package 177
Figure 7.10 The custom S2 view of the shrimp.DisplayBean package 177
Figure 7.11 The JHotDraw expert’s use case diagram 178
Figure 7.12 The S5 and B5 views of JavaDrawApp 179
Figure 7.13 The expert’s diagram of the static behavioural relationships between the BeautyJ components 180
Figure 7.14 The B3 view of BeautyJ 181
Figure 7.15 The custom S2 view of the cognitive.critics package 181
Figure 7.16 The custom S1 view of PluggableDiagram 182
Figure 7.17 A custom S2 view of BeautyJ 184
Figure 7.18 The custom S2 view of the Pluggable types from application.api 185
Figure 7.19 The custom S2 view of the multi editor pane classes 186
Figure 7.20 The custom B2 view of StandardSourclet’s interactions 187
Figure 7.21 The combined S3/B3 view of BeautyJ 188
Figure 7.22 The Together sequence diagram for GXLPersistentStorageBean.loadData() 190
Figure 7.23 A custom B2 view of SHriMP 191
Figure 7.24 A part of the custom B2 view 193
Figure 7.25 Illustration of abstraction level usage 194
Figure A.1 The class diagram generated by Together for the orrery application A-3
Figure A.2 The sequence diagram generated by Together for the Orbit.MainClass.main() method A-5
Figure A.3 The sequence diagram generated by Together for the AbstractFigure.moveBy() method A-8
Figure A.4 The sequence diagram generated by Together for the EllipseFigure.basicMoveBy() method A-9
Figure A.5 The class diagram generated by Together for the Figure interface A-10
Figure A.6 The class diagram generated by Together for the Rectangle class A-11
Figure A.7 The sequence diagram generated by Together for the AbstractFigure.displayBox(Point, Point) method A-12
Figure A.8 The sequence diagram generated by Together for the AbstractFigure.displayBox(Rectangle) method A-13
Figure A.9 The sequence diagram generated by Together for the EllipseFigure.basicDisplayBox() method A-14
Figure A.10 Part of the Jinsight execution view for the orrery application. The coloured horizontal lines represent method calls A-15
Figure A.11 A Jinsight reference pattern view for part of the orrery application A-16
Figure A.12 Part of the Jinsight execution view for the orrery application with repetition detection turned off A-17
Figure A.13 The Jinsight execution view from Figure A.12 with repetition detection turned on A-18
Figure A.14 Part of the Jinsight object histogram for the orrery application, showing the number of calls to methods of each object. The scale is shown at the top of the window, with black being the lowest and red the highest. Filled rectangles represent objects; outline rectangles represent garbage-collected objects. Diamonds represent the class object of a class A-19
Figure A.15 Part of the Jinsight object histogram for the orrery application, showing the active memory size of the objects A-19
Figure A.16 Part of the Jinsight method histogram for the orrery application, showing the methods called by the selected method A-20
Figure A.17 Part of the Jinsight invocation browser view for the orrery application, showing methods that call the method highlighted in Figure A.16 A-21
Figure A.18 Part of the Jinsight execution view for the second orrery event trace, showing the redraw methods A-22
Figure A.19 Part of the Jinsight call tree for the second orrery event trace, showing the call tree from the invalidate() method A-23
Figure A.20 Part of the Jinsight execution view for the second orrery event trace, showing the implicit control structure of the JHotDraw screen redraw mechanism A-24
Figure A.21 The Jinsight call tree showing the implicit control structure of the JHotDraw screen redraw mechanism A-24
Figure A.22 Part of the Jinsight execution view for the second orrery event trace, with the EllipseFigure.basicMoveBy() method selected A-25
Figure A.23 The Jinsight call tree for the EllipseFigure.basicMoveBy() method A-26
Figure A.24 Part of the Jinsight execution view for the second orrery event trace, which shows that AbstractFigure.displayBox(Point, Point) calls MyEllipseFigure.basicDisplayBox(Point, Point) A-27
Figure A.25 The initial high-level model input to jRMTool. Ovals represent high-level system components. Directed arcs represent communication A-28
Figure A.26 The reflexion model computed by jRMTool. Ovals represent high-level system components. Directed arcs represent communication; arc annotations indicate frequency. Solid arcs indicate agreement with the analyst’s model (convergences); dashed arcs indicate absences from the analyst’s model (divergences); dotted arcs indicate erroneous communications in the analyst’s model (absences) A-29
Figure A.27 The AVID cell view at the start of the execution A-30
Figure A.28 The AVID cell view partway through the execution A-31
Figure A.29 The AVID summary view of the execution A-31
Figure A.30 Together debugger output showing the method calls involved in a screen redraw in JHotDraw A-38
Figure A.31 Together debugger output showing the implicit control structure of the JHotDraw redraw mechanism A-41
Figure A.32 The Together debugger user interface. The top-left pane shows packages and classes. The top-right pane shows a class diagram. The middle-right pane shows the program code. The bottom pane is the debugger interface, which shows the watches set at the DEFAULT_MASS, MINIMUM_MASS, MAXIMUM_MASS, and mass attributes A-42
Figure B.1 The JavaDrawApp source code, marked up to illustrate level 0 structure entities and relationships. Key: ClassContainmentOperator0, MethodContainmentOperator0, MethodDeclarationOperand0, AttributeDeclarationOperand0, ClassOperand0, InheritanceOperator0, CompositionOperator0 A-54
Figure B.2 A graphical illustration of the extracted level 1 structure information. Key: ClassContainmentOperator0, MethodContainmentOperator0 A-58
Figure B.3 A graphical illustration of the extracted level 2 structure information. Key: InheritanceOperator0, CompositionOperator0 A-60
Figure B.4 A graphical illustration of the extracted level 2 structure information. Key: extension, implementation, composition A-74
Figure B.5 A graphical illustration of the generated level 3 structure information A-86
Figure B.6 A graphical illustration of the derived level 4 structure information A-90
Figure B.7 A graphical illustration of the derived level 5 structure information A-96
Figure B.8 Part of the extracted event trace. Square bracketed lines contain state information A-98
Figure B.9 A graphical illustration of the extracted AnimationDecorator level 1 behaviour information A-101
Figure B.10 A graphical illustration of the extracted level 2 behaviour information A-103
Figure B.11 A graphical illustration of the extracted level 2 behaviour information A-117
Figure B.12 A graphical illustration of the generated level 3 behaviour information A-122
Figure B.13 A graphical illustration of the derived level 4 behaviour information A-127
Figure B.14 A graphical illustration of the derived level 5 behaviour information A-130
Figure B.15 The JavaDrawApp structure hierarchy A-132
Figure B.16 The JavaDrawApp behaviour hierarchy A-133
Figure B.17 A graphical illustration of the combined level 3 structure and behaviour information A-137
Figure B.18 A graphical illustration of the combined level 2 and level 5 behaviour information A-146
Figure B.19 A graphical illustration of the combined level 2 structure and level 5 behaviour information A-153
Figure B.20 The main class diagram from the JHotDraw architecture overview A-155
Figure B.21 Combined and filtered level 2 structure and behaviour information from the novel model. Key: composition A-155
Figure B.22 Combined and filtered level 2 structure and behaviour information from the novel model. Key: extension, composition, implementation, invocation A-157
Figure B.23 Combined and filtered level 3 structure and behaviour information from the novel model. Key: structure, behaviour A-158
Figure B.24 The sequence diagram from the JHotDraw architecture overview A-159
Figure B.25 Filtered level 3 behaviour information from the novel model A-160
Figure B.26 A component diagram produced by an experienced JHotDraw reuser A-161
Figure B.27 Combined level 3 information from the novel model A-162
Figure B.28 The reflexion model resulting from comparing the combined level 3 model with the expert’s component model A-163
Figure B.29 A use case diagram produced by an experienced JHotDraw reuser A-164
Figure B.30 Combined level 5 information from the novel model A-165
Figure B.31 The reflexion model resulting from comparing the combined level 5 model with the expert’s use case model A-166
Figure D.1 An ERD for the program code (level 0). Operators can be unary (e.g. boolean NOT, ! in C, C++, and Java), binary (e.g. assignment, = in C, C++, and Java), or ternary (e.g. conditional, ?: in C, C++, and Java) A-175
Figure D.2 An ERD for the event trace (level 0) A-175
Figure D.3 An ERD for intra-class structure (structure level 1) A-176
Figure D.4 An ERD for inter-class structure (structure level 2). The (0, *) cardinality of the Inheritance relationship assumes that multiple inheritance is permitted A-176
Figure D.5 An ERD for system architecture (structure level 3) A-177
Figure D.6 An ERD for system structure deployment (structure level 4) A-177
Figure D.7 An ERD for business structure (structure level 5) A-178
Figure D.8 An ERD for intra-object interaction (behaviour level 1) A-178
Figure D.9 An ERD for inter-object interaction (behaviour level 2) A-179
Figure D.10 An ERD for component interaction (behaviour level 3) A-179
Figure D.11 An ERD for system behaviour distribution (behaviour level 4) A-180
Figure D.12 An ERD for business behaviour (behaviour level 5) A-180
Figure E.1 Frantz’s taxonomy of model abstraction techniques (from [Frantz 1995]) A-184
Figure G.1 A part of the S2 view of JavaDrawApp A-196
Figure G.2 A part of the B2 view for JavaDrawApp A-197
Figure G.3 The S3 view for JavaDrawApp A-198
Figure G.4 The B3 view for JavaDrawApp A-199
Figure G.5 A part of the custom B2 view A-200
Figure G.6 A part of the pruned B2 view A-202
Figure G.7 The custom S1/S2 view showing the methods of Figure and AbstractFigure A-206
Figure G.8 The custom B1 view of RoundRectangleFigure A-207
Figure G.9 The Together class diagram for the figures package A-209
Figure G.10 The custom S2 view of the figures package A-210
Figure G.11 The Together sequence diagram for DrawApplication.saveAsStorableOutput() A-212
Figure G.12 The custom B2 view A-213
Figure G.13 The main class diagram from the architecture overview A-214
Figure G.14 The custom S2 view corresponding to the main class diagram A-215
Figure G.15 The SelectionTool class diagram from the architecture overview A-215
Figure G.16 The custom S2 diagram corresponding to the SelectionTool class diagram A-215
Figure G.17 The sequence diagram from the architecture overview A-216
Figure G.18 The custom B2 view corresponding to the architecture overview sequence diagram A-217
Figure G.19 The JHotDraw expert’s component diagram A-218
Figure G.20 The S3 view of JavaDrawApp A-218
Figure G.21 The B3 view of JavaDrawApp A-219
Figure G.22 The JHotDraw expert’s use case diagram A-220
Figure G.23 The S3 and B3 views of JavaDrawApp A-221
Figure G.24 The expert’s diagram of the static behavioural relationships between the BeautyJ components A-222
Figure G.25 The B3 view of BeautyJ A-223
Figure G.26 The combined S3/B3 view of BeautyJ A-224
Figure G.27 The expert’s diagram of the static behavioural relationships between BeautyJ’s main classes A-225
Figure G.28 The custom B2 view of BeautyJ’s main classes A-226
Figure G.29 The expert’s diagram of the dynamic behavioural relationships between BeautyJ’s main classes A-228
Figure G.30 A part of the custom B2/B3 view of the system’s main classes A-229
Figure G.31 A part of the S2 view of BeautyJ A-230
Figure G.32 The custom S2 view of the javasource package A-231
Figure G.33 The expert’s solution for adding generics support to BeautyJ A-233
Figure G.34 The expert’s solution for adding typesafe enum support to BeautyJ A-234
Figure G.35 The expert’s solution for adding vararg support to BeautyJ A-235
Figure G.36 The expert’s solution for adding static import support to BeautyJ A-236
Figure G.37 The expert’s diagram of the internal behaviour of StandardSourclet.buildStartSource A-237
Figure G.38 The custom B1 view of StandardSourclet’s interactions A-238
Figure G.39 The custom B2 view of StandardSourclet A-239
Figure G.40 The expert’s diagram of Sourclet behaviour A-241
Figure G.41 The custom S1 view of ProgressTracker A-242
Figure G.42 The custom B2 view showing SourceParser interactions A-243
Figure G.43 The expert’s solution for adding ProgressTracker support to BeautyJ A-245
Figure G.44 The BeautyJ documentation component overview diagram A-246
Figure G.45 The S3 view of BeautyJ A-246
Figure G.46 The BeautyJ documentation main classes diagram A-247
Figure G.47 A custom S2/S3 view of BeautyJ A-248
Figure G.48 The BeautyJ documentation sourclet diagram A-248
Figure G.49 A custom S2/S3 view of BeautyJ A-249
Figure G.50 The BeautyJ documentation Java Source Parser diagram A-250
Figure G.51 A custom S2 view of BeautyJ A-251
Figure G.52 The application.beautyj class diagram A-254
Figure G.53 The custom S2 view showing the application.beautyj package A-254
Figure G.54 The util.javasource class diagram A-254
Figure G.55 The util.javasource.jit class diagram A-255
Figure G.56 The custom S2 view showing the util.javasource.jit package A-256
Figure G.57 The class diagram for the util.javasource.sourclet package A-257
Figure G.58 The custom S2 view of the util.javasource.sourclet package A-257
Figure G.59 The sequence diagram for SourceParser.buildSource() A-258
Figure G.60 A custom B2 view of BeautyJ A-259
Figure G.61 The S3 view of SHriMP A-261
Figure G.62 The B3 view of SHriMP A-261
Figure G.63 The expert’s component diagram of SHriMP A-261
Figure G.64 The custom S2 view of the DisplayBean.layout package A-263
Figure G.65 The pruned custom S1 view of ShrimpView A-265
Figure G.66 The Together class diagram for the shrimp package A-266
Figure G.67 The custom S2 diagram for the shrimp package A-267
Figure G.68 The Together class diagram for the shrimp.gui package A-268
Figure G.69 The custom S2 view of the shrimp.gui package A-269
Figure G.70 The Together class diagram for the shrimp.SearchBean package A-270
Figure G.71 The custom S2 view of the shrimp.SearchBean package A-271
Figure G.72 The Together class diagram for the shrimp.DisplayBean package A-272
Figure G.73 The custom S2 view of the shrimp.DisplayBean package A-273
Figure G.74 The Together sequence diagram for GXLPersistentStorageBean.loadData() A-275
Figure G.75 A custom B2 view of SHriMP A-276
Figure G.76 The cookbook diagram for the model subsystems A-277
Figure G.77 A custom S3 and B3 view of ArgoUML A-278
Figure G.78 The cookbook diagram for the view subsystem A-278
Figure G.79 A custom S3 view of ArgoUML A-278
Figure G.80 A custom B3 view of ArgoUML A-279
Figure G.81 The cookbook diagram for the control subsystem A-279
Figure G.82 A custom S3 view of ArgoUML A-280
Figure G.83 A custom B3 view of ArgoUML A-280
Figure G.84 The cookbook diagram for the loadable subsystems A-281
Figure G.85 A custom S3 view of ArgoUML A-281
Figure G.86 A custom B3 view of ArgoUML A-281
Figure G.87 A part of the S1 view of ModelEventPump A-283
Figure G.88 The custom S2 view of the model.uml package A-285
Figure G.89 The custom S1 view of CoreFactoryImpl A-288
Figure G.90 The custom S1 view of CoreHelperImpl A-289
Figure G.91 A custom S2 view of ArgoUML A-290
Figure G.92 The cookbook diagram of the main critics and cognitive classes A-291
Figure G.93 The custom S2 view of the cognitive.critics package A-292
Figure G.94 The custom S1 view of CrCircularInheritance A-293
Figure G.95 The custom S1 view of CrEmptyPackage A-293
Figure G.96 The custom S1 view of CrIllegalName A-294
Figure G.97 The cookbook diagram of the cognitive.critics package and related classes A-296
Figure G.98 The custom S3 combination of the cognitive.critics package and related classes A-297
Figure G.99 The cookbook diagram of the multi editor pane classes A-298
Figure G.100 The custom S2 view of the multi editor pane classes A-299
Figure G.101 A part of the custom S1 view of MultiEditorPane A-299
Figure G.102 The custom S2 view of the uml.diagram.static_structure.layout package A-301
Figure G.103 The custom S2 view of the uml.diagram.static_structure.ui package A-302
Figure G.104 A part of the custom S1 view of ClassDiagramGraphModel A-304
Figure G.105 The custom S1 view of ClassDiagramRenderer A-306
Figure G.106 The custom S2 view of the PropertyPanel classes A-308
Figure G.107 The cookbook diagram of the other languages components A-309
Figure G.108 The custom S3 view of the other languages components A-309
Figure G.109 The custom B3 view of the other languages components A-309
Figure G.110 A part of the custom S1 view of DetailsPane A-311
Figure G.111 The custom S2 view of the uml.ui.TabXXXX classes A-313
Figure G.112 A part of the custom S2 view of the ui.explorer.rules package A-315
Figure G.113 A part of the custom S1 view of PerspectiveManager A-316
Figure G.114 The custom S1 view of FigNodeModelElement A-316
Figure G.115 The custom S2 view of the moduleloader package A-317
Figure G.116 The custom S1 view of ModuleInterface A-318
Figure G.117 A part of the custom S1 view of ModuleLoader A-318
Figure G.118 The custom S1 view of ModuleStatus A-319
Figure G.119 A portion of the custom S1 view of ModuleTableModel A-319
Figure G.120 The custom S1 view of PluggableMenu A-321
Figure G.121 The custom S1 view of PluggableDiagram A-322
Figure G.122 The custom S2 view of the Pluggable types from application.api A-323
Figure G.123 The Together class diagram for the kernel package A-324
Figure G.124 The custom S2 view of the kernel package A-324
Figure G.125 The Together class diagram for the language.java.generator package A-325
Figure G.126 The custom S2 view of the language.java.generator package A-326
Figure G.127 The Together class diagram for the model.uml package A-327
Figure G.128 The Together class diagram for the uml.reveng.java package A-331
Figure G.129 The custom S2 view of the uml.reveng.java package A-332
Figure G.130 The Together sequence diagram for ClassDiagramGraphModel.addNode() A-334
Figure G.131 A custom S2 view of ArgoUML A-335
List of Tables
Table 2.1 A selection of diagrams for describing software 72
Table 3.1 Tools summary comparison 106
Table 3.2 Questions summary comparison 107
Table 4.1 The proposed visualisation model for object-oriented software 114
Table 5.1 The correspondence between typical software comprehension activities and the revised task sets 130
Table 5.2 Information required from each dimension of the proposed model to address the general software comprehension tasks 132
Table 5.3 Information required from each dimension of the proposed model to address the specific reverse engineering tasks 132
Table 6.1 The abstraction levels of the proposed model 135
Table 6.2 Abstraction mappings for the structure hierarchy 144
Table 6.3 Abstraction mappings for the behaviour hierarchy 146
Table 7.1 Instances of usage of each of the five abstraction levels of the model in addressing the comprehension tasks 194
Table 7.2 Categorisation of BeautyJ evaluation questions by typical software comprehension questions 195
Table 7.3 Categorisation of SHriMP evaluation questions by typical software comprehension questions 196
Table 7.4 Categorisation of ArgoUML evaluation questions by typical software comprehension questions 196
Table B.1a The level 0 structure entities and relationships extracted from the JavaDrawApp source code. Format: line_number:name A-55
Table B.1b The level 0 structure entities and relationships extracted from the JavaDrawApp source code A-56
Table B.2 The level 1 structure entities and relationships derived from Table B.1a A-57
Table B.3 The level 2 structure entities and relationships derived from Table B.1b A-60
Table B.4 The level 2 structure entities and relationships derived from Table B.2 A-62
Table B.5 The extracted level 2 structure entities and relationships A-64
Table B.6 The analyst mappings from level 2 – level 3 A-75
Table B.7 The generated level 3 structure entities and relationships A-83
Table B.8 The derived level 4 structure entities and relationships A-87
Table B.9 The analyst mappings from structure level 3 – level 5 A-91
Table B.10 The derived level 5 structure entities and relationships A-95
Table B.11 The extracted level 0 behaviour entities and relationships A-99
Table B.12 The AnimationDecorator level 1 behaviour entities and relationships derived from Table B.11 A-100
Table B.13 The level 2 behaviour entities and relationships derived from Table B.11 A-102
Table B.14 The level 2 behaviour entities and relationships derived from Table B.12 A-104
Table B.15 The extracted level 2 behaviour entities and relationships A-105
Table B.16 The derived level 3 behaviour information A-118
Table B.17 The derived level 4 behaviour entities and relationships A-123
Table B.18 The derived level 5 behaviour entities and relationships A-128
Table B.19 The entity and relationship sets of the combined view A-134
Table B.20 The entity and relationship sets of the combined view. Key: level 2 behaviour, level 5 behaviour A-138
Table B.21 The entity and relationship sets of the combined view. Key: level 2 structure, level 5 behaviour A-148
Table E.1 Abstraction processes listed by Zeigler (from [Zeigler 2000]) A-183
Table E.2 Correspondence between abstraction operations in the novel model and Frantz’s taxonomy A-186
Table F.1 Entities and relationships at behaviour levels 1 and 2 A-189
Table F.2 Entities and relationships at structure levels 2 and 3 A-189
Table F.3 Activities metamodel correspondence A-190
Table F.4 Interactions metamodel correspondence A-190
Table F.5 Classes metamodel correspondence A-191
Table F.6 Components metamodel correspondence A-192
Table G.1 Correspondence between interactions in the architecture overview sequence diagram and VANESSA diagram A-216
1 Introduction
“Software visualisation is nifty stuff”
M Petre, A F Blackwell, T R G Green [Petre 1997]
1.1 Background
1.1.1 Software visualisation
This thesis presents a novel software visualisation model for supporting object-oriented
software comprehension. Software visualisation is the process of modelling software systems
for comprehension [Price 1993]. The comprehension of software systems both during and
after development is a crucial component of the software process. The complex interactions
inherent in the object-oriented paradigm make visualisation a particularly appropriate
comprehension technique. Software visualisation is therefore a useful technique in object-
oriented software maintenance. The large volume of information typically generated during
visualisation necessitates tool support.
1.1.2 Software comprehension
Software comprehension involves gaining an understanding of the functionality, structure,
and behaviour of a software system [von Mayrhauser 1995]. Software comprehension has a
number of applications in the development and maintenance of software. During the
development phase, software comprehension techniques can be used to ensure that the
system being developed complies with the system design. During software maintenance,
software comprehension can be applied to assist software evolution (extension and
contraction of functionality), reverse engineering (extracting design information),
reengineering (changing existing functionality), and refactoring or restructuring (improving
code by making it more extensible or maintainable). Software comprehension also has
applications in the field of software reuse [Szyperski 1998, Fayad 1999], where source code
or accurate documentation are not always available. Other areas of application include
redocumentation (documenting existing software) and legacy system migration (making old
systems work in new environments, e.g. the World Wide Web).
1.1.3 Challenges in object-oriented software comprehension
The prevalent software engineering paradigm in use today is the object-oriented (OO)
approach. Object-oriented software systems present new challenges for software
comprehension compared to traditional procedural or functional systems. The principal
features of the OO paradigm that place additional requirements on comprehension and
visualisation techniques, and hence require new approaches, are complex control flow,
inheritance, polymorphism, and dynamic binding. In the OO paradigm there are typically
many asynchronous interactions between objects and many different points at which methods
can be called, making it much more difficult to follow a program’s execution than in the
more linear programs produced by traditional paradigms. Polymorphism, a concept closely
related to inheritance, allows an instance of a subtype to be substituted wherever a reference
to a supertype is expected; this makes it difficult to determine exactly which class is being
referred to at a given point in the program. Dynamic binding is a method of implementing polymorphism
referred to in the program. Dynamic binding is a method of implementing polymorphism
where the type to be used is not bound to the type reference until runtime; thus it is not
possible to determine statically (i.e. from the program code) which class, and hence which
methods, will actually be referred to at runtime, nor is there any guarantee that subsequent
executions of the same code will refer to the same classes/methods.
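The difficulty can be seen in a few lines of Java (all class and method names here are invented for illustration): a caller whose parameter has the static type of a supertype cannot be statically resolved to a single method implementation.

```java
// Illustrative sketch only: the names Figure, RectangleFigure, etc. are invented.
abstract class Figure {
    abstract String draw();
}

class RectangleFigure extends Figure {
    String draw() { return "rectangle"; }
}

class EllipseFigure extends Figure {
    String draw() { return "ellipse"; }
}

public class DynamicBindingDemo {
    // The static type of 'f' is Figure; which draw() body runs is bound at runtime.
    static String render(Figure f) {
        return f.draw();
    }

    public static void main(String[] args) {
        // The concrete class is chosen from a runtime value, so no static
        // inspection of render() can determine which method body will execute.
        Figure f = System.nanoTime() % 2 == 0
                ? new RectangleFigure()
                : new EllipseFigure();
        System.out.println(render(f));
    }
}
```

The same call site, `f.draw()`, dispatches to different method bodies on different executions, which is precisely the information a purely static analysis cannot recover.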
1.1.4 Reverse engineering tools support software comprehension
Reverse engineering [Cross 1992] describes the process of analysing a software system
(complete or incomplete) in order to extract information about its design. Reverse
engineering tools exist to support the software developer in his software comprehension
tasks. A variety of industrial and academic reverse engineering tools exist, employing either
static or dynamic techniques, or an integrated approach. These tools range from relatively
simple debuggers, which allow the developer to step through the code execution in a
controlled manner and examine variable assignments and method calls as they occur, to
interactive visualisation tools, which produce diagrams to the user’s specification, based on a
directed analysis of the software system.
1.1.4.1 The relationship between forward and reverse engineering
The reverse engineering process can, to an extent, be considered the inverse of the forward
engineering process. The traditional forward software engineering process (linear sequential
model, waterfall model, or classic life cycle) comprises: requirements elicitation and
analysis, high level design, detailed design, code generation, testing, and maintenance, in
that order [Pressman 2000 Sec. 2.4]. The reverse engineering process is equivalent to the
reverse of the second, third, and fourth stages of this process. The first stage of reverse
engineering is to extract information from the program code. This information can then be
analysed to produce low-level and high-level views of the software system. This relationship
is illustrated in Figure 1.1.
Figure 1.1 The relationship between forward and reverse engineering
1.1.5 Abstraction
Abstraction is the process of producing a simplified representation that emphasises the
important information while suppressing details that are (currently) uninteresting, with the
goal of reducing complexity and increasing comprehensibility [Berard 1993]. Thus, a more
abstract representation is produced from a less abstract (i.e. more detailed) base
representation by applying an abstraction operation to it. An abstraction operation may
perform aggregation on the entities and relationships in the representation, or it may apply
some mapping or other function to the information. Different levels of abstraction are
commonly employed in both the forward and reverse engineering of object-oriented software
systems. For example, a software system is typically specified at a high level of abstraction
during the initial design phase, which is then refined to a more detailed (less abstract)
representation later in the development prior to the system’s implementation in code.
Conversely, in reverse engineering, it is possible to generate representations of an existing
system at various levels of abstraction. For example, a debugger produces detailed
information about method calls and variable accesses, while an integrated development
environment (IDE) may provide functionality to extract a (more abstract) class diagram from
the system’s source code.
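As an informal sketch (and not the formal abstraction operations defined later in this thesis), an aggregation-style abstraction operation can be expressed as a function that collapses method-level call relations into class-level relationships; all names below are invented for illustration.

```java
import java.util.*;

// Informal sketch of abstraction by aggregation: method-level call relations
// are collapsed into class-level "uses" relationships, discarding method detail.
public class AbstractionDemo {
    // A call relation: Caller.callerMethod invokes Callee.calleeMethod.
    record Call(String callerClass, String callerMethod,
                String calleeClass, String calleeMethod) {}

    // The abstraction operation: keep only inter-class relationships.
    static Set<String> abstractToClassLevel(List<Call> calls) {
        Set<String> classRelations = new LinkedHashSet<>();
        for (Call c : calls) {
            if (!c.callerClass().equals(c.calleeClass())) {
                classRelations.add(c.callerClass() + " -> " + c.calleeClass());
            }
        }
        return classRelations;
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("DrawApplication", "open", "Drawing", "add"),
            new Call("DrawApplication", "save", "Drawing", "figures"),
            new Call("Drawing", "add", "Drawing", "notify"));
        // Three method-level calls abstract to one class-level relationship.
        System.out.println(abstractToClassLevel(calls)); // [DrawApplication -> Drawing]
    }
}
```

The detail suppressed here (which methods participate, and intra-class calls) could still be recovered from the base representation, reflecting the point that abstraction produces a simplified view of a more detailed underlying model.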
1.2 Thesis overview
1.2.1 Motivation and aim
It appears that software visualisation tools are seldom employed outwith the context of
research. This is because current tools are relatively tightly focussed in that they address only
a very small range of abstraction levels or a single aspect (i.e. structure or behaviour) of the
software. As a result, each of the extant software visualisation tools addresses only a small
subset of the range of software comprehension tasks. In order to address the limitations of
current visualisation techniques, an approach is proposed that integrates abstraction with
structural and behavioural perspectives. The aim of this research is to improve the
effectiveness of visualisation techniques for large-scale software understanding. The
motivation for this work was the lack of a unified model for software visualisation that
allows the analyst to move conveniently between abstraction levels. Such a model would
allow the analyst to visualise the information required for their analysis within the context of
the system as a whole, and hence to relate and reason about visualisations.
1.2.2 Research hypothesis
The research hypothesis explored in this thesis is that a model which supports visualisation
of software through a range of abstraction levels, incorporates structural and behavioural
views, and integrates statically and dynamically extracted information provides effective
support for the full range of software comprehension tasks.
1.2.3 Approach and methodology
The first stage in investigating this hypothesis was to examine related work in the fields of
software visualisation, tool evaluation, abstraction, diagrams, views, exploration and
querying, metamodels, and software modelling. An initial case study was then carried out to
assess the capabilities of the extant software visualisation tools. It is from the results of this
study that the research hypothesis was derived. A model was then proposed based on
multiple levels of abstraction, structural and behavioural perspectives, and the integration of
statically and dynamically extracted information. The model was assessed theoretically
against its original goals and then refined. Support for software comprehension strategies in
the proposed model was considered and abstraction operations between views in the model
and the combination of views were defined formally. The model was then applied manually
to a real system to demonstrate and verify its utility. A tool implementation of the model was
developed to facilitate its evaluation. The tool was used to perform a replication of the
original study with the novel model proposed, and to evaluate the performance of the model
in addressing typical software comprehension tasks in real world software systems. It is
concluded that the model proposed in this thesis provides support for the full range of
software comprehension tasks.
1.2.4 Contributions of this thesis
This thesis presents a novel software visualisation model based on a range of abstraction
levels and incorporating structural and behavioural perspectives of software, and introduces
a prototype implementation of the model. An abstraction scale and set of criteria for
classifying software comprehension tools are presented, and are used to review and compare
the extant software visualisation tools. A schema for categorising view arrangements in
software tools is presented. Typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools are proposed, and are used to assess the
extant tools and the novel model presented in this thesis.
2 Related Work
“Effectively presenting large amounts of information in any form is challenging.”
M-A D Storey, H Müller [Storey 1995]
This chapter discusses related work in order to provide a foundation for the work in
succeeding chapters and demonstrate the need for such work. Firstly, the basic techniques
involved in producing software visualisations are discussed. An overview and comparison of
the extant software visualisation tools is then presented. The fundamental concept of
abstraction is explored in detail. Various diagrams for presenting visualisations are then
described, along with the concept of views for organising them, and techniques for exploring
and querying visualisations. Related work from the field of software modelling is discussed.
Lastly, a variety of evaluation techniques in software comprehension and visualisation are
surveyed.
2.1 Software visualisation techniques
This section introduces the basic techniques involved in software visualisation – static and
dynamic extraction, data analysis, and presentation.
2.1.1 Static software comprehension techniques
Software comprehension techniques can be classified as either static or dynamic. Static
techniques analyse a system by examining its source or object code. Static techniques can
help in understanding the relationships between classes in a system, and in identifying the
system architecture [Müller 1993]. Although software systems written in procedural
languages are well suited to analysis with static techniques, aspects of the object-oriented
paradigm, such as polymorphism, overloading, and dynamic binding, make it more difficult
to gain an understanding of an object-oriented software system using static techniques alone.
Gamma et al. [Gamma 1995 pp.22-23] state, “An object-oriented program’s run-time
structure often bears little resemblance to its code structure. The code structure is frozen at
compile-time; it consists of classes in fixed inheritance relationships. A program’s run-time
structure consists of rapidly changing networks of communicating objects. In fact, the two
structures are largely independent. Trying to understand one from the other is like trying to
understand the dynamism of living ecosystems from the static taxonomy of plants and
animals, and vice versa. […] With such disparity between a program’s run-time and
compile-time structures, it’s clear that code won’t reveal everything about how a system will
work.”
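To make the idea of static extraction concrete, the sketch below recovers inheritance relationships from source text using a regular expression. This is deliberately naive (a real static analyser parses the code properly), and all class names are invented for illustration.

```java
import java.util.*;
import java.util.regex.*;

// Deliberately simplified static extraction: find "class X extends Y"
// declarations in source text. Real tools use a full parser instead.
public class StaticExtractorDemo {
    private static final Pattern CLASS_DECL = Pattern.compile(
        "class\\s+(\\w+)(?:\\s+extends\\s+(\\w+))?");

    // Returns "Sub extends Super" facts found in the given source text.
    static List<String> extractInheritance(String source) {
        List<String> facts = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(source);
        while (m.find()) {
            if (m.group(2) != null) {
                facts.add(m.group(1) + " extends " + m.group(2));
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        String source = """
            abstract class AbstractFigure implements Figure { }
            class RectangleFigure extends AbstractFigure { }
            class RoundRectangleFigure extends RectangleFigure { }
            """;
        System.out.println(extractInheritance(source));
    }
}
```

Note what this extraction cannot see: which subtype a `Figure` reference will actually denote at runtime. That gap is exactly what the dynamic techniques discussed next address.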
2.1.2 Dynamic software comprehension techniques
Dynamic software comprehension techniques analyse a software system by extracting
information from the system as it is executing. Dynamic techniques can help to illustrate the
interactions between objects in a target system, and the flow of control between the system’s
components. Dynamic software comprehension techniques address many of the
shortcomings of static techniques in the comprehension of object-oriented software systems.
A potential disadvantage of dynamic techniques is that they can consider only a subset of the
software system’s possible behaviour. While static techniques can analyse the entire system,
dynamic techniques analyse only the behaviour evident in the execution trace. It is the
responsibility of the analyst to ensure that a suitably representative trace is selected for
analysis.
2.1.3 Advantages of dynamic analysis for object-oriented systems
As described above, dynamic analysis is particularly useful in the context of object-oriented
software systems. Dynamic information describes the actions of a system at run time; it
includes information such as object instantiation and communication, method calls, and
branching decisions. In contrast to the collection of static information, dynamic analysis
takes place in the context of a running system, rather than by examination of static program
code or design documents. As described in Section 1.1.3, the object-oriented programming
model often has a complex control flow, with many asynchronous interactions between
objects and points where methods can be called. Information for the comprehension of
object-oriented systems is hence often difficult to collect and complex to analyse. The large
number of object interactions and often unpredictable control flow can result in a large and
complicated event trace.
It should be stressed that dynamic analysis does not supplant static analysis, even in the
context of object-oriented systems. However, much of the information traditionally collected
by static analysis techniques can also be collected dynamically, thus subsuming much of the
functionality of static techniques. For example, that one method calls another method can be
revealed through dynamic analysis, but may also be evident using static analysis
techniques, such as a call graph extractor (notwithstanding the difficulties posed to static
techniques by object-oriented concepts such as polymorphism, overloading, and dynamic
binding, as noted above). There are, however, a number of exceptions to the
subsumption of static analysis techniques by dynamic analysis, in the form of information
that cannot necessarily be extracted dynamically. Such information includes, for example,
line numbers, comments, and branch conditions. It depends upon the goals of the software
comprehension process whether it is more important to know, for example, the conditions
pertaining to a branch structure, or whether or not that branch is taken during the execution
of the software. Also, as noted above, dynamically extracted information pertains only to the
program execution from which it was extracted, and does not necessarily represent all of the
possible runtime behaviour of a system. Thus, as in all scientific analysis, the analyst must
select the technique appropriate to his task.
2.1.4 Debuggers
A debugger is a utility that enables the collection of dynamic information from a running
system, and has long been part of the software engineer’s tool set. Debuggers operate in an
online mode, producing output as the software executes. The software engineer can control
the execution of the software by means of the debugger interface, for example by stepping
through the code or by suspending and resuming threads of execution. Breakpoints can also
be set at points in the code in which the software engineer is interested. Upon encountering a
breakpoint during execution, the debugger will output an appropriate message, e.g. ‘Method
x called’. The debugger can also be used to examine the values of variables and expressions
during execution. A debugger provides a view of a software system at a low level of
abstraction (i.e. at a level relatively close to the level of detail provided by the source code
itself), and can be invaluable in locating code-level errors in a program. However, its low level
of abstraction is less useful in software comprehension activities, where a view at a higher
level of abstraction (i.e. at a level more distant from that of the code) of the system under
analysis is often required [Ball 1996].
2.1.5 Software visualisation tools
The foregoing discussion suggests that tools are required to assist the software developer in
the collection and analysis of information. As with any scientific analysis procedure, analysis
of a software system consists of three phases: collection of data about the software system;
analysis of the data collected; and presentation of the analysis results. In common with many
other scientific fields [Nielson 1990], visualisation has been found to be a particularly
effective method of presentation for the large and complex data sets produced by dynamic
analysis [Roman 1993].
Software visualisation tools typically operate in an offline mode, in which the collection
phase precedes the analysis and presentation phases. Walker et al. [Walker 1998] explain
that an offline system has two distinct advantages over an online system. Firstly,
preprocessing of the entire data set can be carried out prior to the presentation of the results,
allowing summary information to be produced for the execution. Such information can be
useful in helping the analyst to gain an overall view of the system. Secondly, (a part of) the
execution can be analysed repeatedly without requiring the execution to be repeated. This
allows the analyst to examine the same execution data in a number of different ways.
However, the disadvantage of the offline approach is that it is not possible to explore
alternative paths through the execution without rerunning the execution. This makes it
difficult to ask “What if…” questions of the system under analysis.
2.1.6 Data collection for software visualisation
During the data collection phase, static data is extracted from the program code and dynamic
data is extracted from the system during execution; this data is stored in a repository on disk.
The repository can be either a simple file or set of files, or a database. The usual advantages
and disadvantages of database systems also apply in this context: while it is quicker to write
a simple text file, a database can be queried more efficiently. The most appropriate
repository format depends on the functionality of the visualisation tool.
Extracting information statically from the program code typically involves analysis of the
program code (e.g. by means of call graph analysis [Grove 1997]). Dynamic data collection
occurs during the execution of the system. This necessitates some form of data collection
procedure running either within or alongside the system. One method of collecting this data
is by instrumentation of the source or object code of the system. This involves inserting
additional statements into the code that generate appropriate output when an ‘interesting’
event occurs during the execution of the system. In the context of object-oriented systems,
‘interesting’ events are usually defined as method calls and returns (when instrumenting the
caller) or method entries and exits (when instrumenting the callee). Koskimies and
Mössenböck [Koskimies 1996] explain that the advantages of inserting the instrumentation
in the caller’s code are that callee methods with multiple return points do not require
additional instrumentation, and that information about the caller method is conveniently
available. However, the instrumentation of method calls within expressions can appear
convoluted. Instrumentation of the source code can also reduce the readability of the code.
Code can be instrumented manually or automatically, e.g. using a preprocessor as in Scene
[Koskimies 1996]. One method of instrumenting either source or object code is the use of
wrappers. Brant et al. [Brant 1998] define wrappers as “mechanisms for introducing new
behaviour that is executed before and/or after, and perhaps even in lieu of, an existing
method”. Method wrappers were used in Gaudi [Richner 1999] to add instrumentation to the
code of the system under analysis.
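The wrapper idea can be sketched briefly in Python (for brevity only; the tools above target C++ and Smalltalk, and the Account class and event list here are invented for illustration):

```python
import functools

events = []  # the event trace produced by the instrumentation

def trace_wrapper(method):
    """Introduce behaviour before and after an existing method,
    recording call and return events in the trace."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        events.append(("call", type(self).__name__, method.__name__))
        result = method(self, *args, **kwargs)
        events.append(("return", type(self).__name__, method.__name__))
        return result
    return wrapper

class Account:
    def __init__(self):
        self.balance = 0

    @trace_wrapper
    def deposit(self, amount):
        self.balance += amount

acct = Account()
acct.deposit(10)
# events now holds one call and one return event for Account.deposit
```

The wrapped method behaves exactly as before; only the trace output is added, which is why wrappers are attractive as an instrumentation mechanism.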
Another method of collecting the dynamic data required for visualisation is the
instrumentation of the environment in which the software system is executing. This method
has the advantage that no changes to the source or object code of the system are required.
The environment is instrumented to produce appropriate output on the occurrence of relevant
events, as with code-level instrumentation. This method is employed in Ovation [De Pauw
1998], where the system under analysis is executed in an instrumented Smalltalk [Goldberg
1983] environment.
An alternative to instrumentation of the code or environment is to run the system under the
control of a debugger. Breakpoints set at appropriate points generate the output required.
Breakpoints can be set automatically, e.g. at every method entry and exit. This technique is
used in Shimba [Systä 2001] to generate trace information for selected methods and control
statements. Jinsight [De Pauw 2002] uses a profiling agent to control execution. As with an
instrumented environment, running under debugger or profiler control does not require
changes to be made to the code.
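In languages that expose a tracing hook, the same effect can be obtained without modifying the program at all. A minimal sketch in Python, whose sys.settrace hook plays the role of the debugger or profiling agent (the helper and main functions are invented for the example):

```python
import sys

trace = []
TRACED = {"main", "helper"}  # record events only for these functions

def collector(frame, event, arg):
    # The runtime invokes this hook on frame events; no source or
    # object code of the traced program is changed.
    if event in ("call", "return") and frame.f_code.co_name in TRACED:
        trace.append((event, frame.f_code.co_name))
    return collector  # continue tracing inside each new frame

def helper():
    return 42

def main():
    return helper()

previous = sys.gettrace()
sys.settrace(collector)
result = main()
sys.settrace(previous)
```

The collected trace interleaves calls and returns in execution order, which is exactly the raw material that the analysis phase described next must reduce.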
All of these methods are employed by the tools that are discussed later in this chapter and
evaluated in Chapter 3.
2.1.7 Analysing the data produced
The huge amount of data produced during the data collection phase must be analysed to
produce useful information about the software system. Three ways of reducing the event
trace to a manageable size are: selective instrumentation, pattern recognition, and
abstraction. These techniques may be used singly or in sequence.
Selective instrumentation involves instrumenting only those methods that are considered
‘important’. An analyst interested in gaining an overall understanding of a system may
choose to exclude all methods in library classes (such as java.lang.* and
javax.swing.* in Java [Arnold 2000, Gosling 2000, Sun 2005]). Alternatively, an
analyst pursuing a specific reverse engineering task, such as investigating how two classes
interact, may choose to instrument only the methods of those classes. Selective
instrumentation is employed in Shimba.
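The filtering step can be sketched in Python (the event tuples and package names are invented for illustration):

```python
# Hypothetical (caller, callee, method) events from an instrumented run.
raw_trace = [
    ("app.Main", "app.OrderService", "placeOrder"),
    ("app.OrderService", "java.util.ArrayList", "add"),
    ("app.OrderService", "app.Billing", "charge"),
    ("app.Billing", "javax.swing.JLabel", "setText"),
]

EXCLUDED = ("java.", "javax.")  # library packages considered 'unimportant'

def select(trace, excluded=EXCLUDED):
    """Discard events whose callee belongs to an excluded library package."""
    return [event for event in trace if not event[1].startswith(excluded)]

filtered = select(raw_trace)
```

In practice the selection would be applied before or during collection rather than afterwards, so that the excluded events are never generated at all.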
Pattern recognition is concerned with examining the event trace to detect repetition, in order
that this can be factored out to improve comprehensibility. This can be performed using string
matching algorithms, such as the Boyer-Moore algorithm [Boyer 1977] used in Shimba.
Alternatively, Ovation employs a hashing technique to detect and generalise patterns in the
event trace.
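Whatever matching algorithm is used, the essence is to replace n consecutive copies of a call sequence with the sequence and a repetition count. A naive greedy sketch in Python (not the Boyer-Moore or hashing schemes used by the tools themselves):

```python
def compress(trace, max_pattern=4):
    """Greedily factor out immediately repeated subsequences, replacing
    n consecutive copies of a pattern with (pattern, n)."""
    out, i = [], 0
    while i < len(trace):
        best_len, best_count = 1, 1
        for k in range(1, max_pattern + 1):
            pattern = trace[i:i + k]
            count = 1
            while trace[i + count * k : i + (count + 1) * k] == pattern:
                count += 1
            if count > 1 and count * k > best_count * best_len:
                best_len, best_count = k, count
        if best_count > 1:
            out.append((tuple(trace[i:i + best_len]), best_count))
        else:
            out.append(trace[i])
        i += best_len * best_count
    return out

trace = ["a.open", "b.read", "b.read", "b.read", "a.close"]
summary = compress(trace)
```

Here the three consecutive b.read events collapse to a single entry with a count of three, shortening the trace while preserving its structure.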
Abstraction according to specified criteria can be performed on the event trace to raise the
level of abstraction from that of individual method calls and returns to some higher level.
Gaudi allows trace elements to be clustered arbitrarily into user-defined components to aid
understanding of the system under investigation.
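Such clustering amounts to mapping class-level events onto user-defined components. A sketch in Python (the component mapping and class names are invented for illustration):

```python
# Hypothetical user-defined clustering of classes into components.
COMPONENTS = {
    "Parser": {"Lexer", "TokenStream"},
    "Backend": {"CodeGen", "Emitter"},
}

def to_component(cls):
    """Map a class name to its user-defined component (or itself)."""
    for name, members in COMPONENTS.items():
        if cls in members:
            return name
    return cls

def abstract(trace):
    """Lift a (caller, callee) trace from class level to component level,
    dropping events that are internal to a single component."""
    lifted = []
    for caller, callee in trace:
        a, b = to_component(caller), to_component(callee)
        if a != b:
            lifted.append((a, b))
    return lifted

class_trace = [("Lexer", "TokenStream"), ("TokenStream", "CodeGen"), ("CodeGen", "Emitter")]
component_trace = abstract(class_trace)
```

The lifted trace contains only inter-component events, raising the level of abstraction from individual method calls to interactions between components.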
Additionally, traces may be split manually or automatically into one or more smaller traces
to aid manageability, as in Shimba. It is also possible to start and stop the instrumentation
process, producing trace output only for defined periods of the system’s execution, as in
AVID [Walker 1998].
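Both mechanisms are straightforward to sketch: fixed-size splitting of a long trace, and retaining only the events between start/stop markers (the marker names here are invented):

```python
def split_trace(events, chunk=1000):
    """Split a long event trace into fixed-size sub-traces."""
    return [events[i:i + chunk] for i in range(0, len(events), chunk)]

def between(events, start="START", stop="STOP"):
    """Keep only events recorded while instrumentation is 'switched on',
    i.e. between start and stop markers in the trace."""
    recording, kept = False, []
    for event in events:
        if event == start:
            recording = True
        elif event == stop:
            recording = False
        elif recording:
            kept.append(event)
    return kept

parts = split_trace(list(range(10)), chunk=4)
window = between(["x", "START", "a.m1", "b.m2", "STOP", "y"])
```

In practice the start/stop decision is made at collection time, so that events outside the window are never recorded, rather than filtered out afterwards as here.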
2.1.8 Presenting the results
The goal of software visualisation is to present information about the software system under
investigation to the analyst in a format that is useful in helping them to achieve their
software comprehension tasks. A number of diagramming techniques have been employed in
visualisation tools in an effort to achieve this goal; these include basic graph representations,
UML diagrams, and message sequence chart-based representations.
2.1.8.1 Basic graph representations
Basic node/arc graphs are often used to illustrate the structure or behaviour of a software
system. For example, flow graphs are directed graphs that can be used to represent the flow
of control in a system; one application of these is in testing [Pressman 2000 Sec. 17.4.1,
17.6.1]. In an object-oriented context, directed or undirected graphs can be used to depict
object interactions by representing objects as nodes and messages as directed arcs between
nodes. The problem of scalability common to many representations is particularly evident
with graphs – attempting to draw numerous messages between objects can quickly reduce
the readability of the diagram. Directed graphs are used in Program Explorer [Lange 1995a].
The tool described by Sefika et al. [Sefika 1996a] and Gaudi employ directed graphs to
illustrate interaction between system components, while Dali [Kazman 1999] uses
undirected graphs to do so. The reflexion models used to show this in AVID are based on
directed graphs. Shimba uses undirected graphs to illustrate static method dependencies.
2.1.8.2 UML diagrams
Interaction diagrams, statechart diagrams, and activity diagrams are part of the UML
diagramming standard [Rumbaugh 1999, OMG 2001], which provides diagrams that
illustrate both the static structure and dynamic behaviour of a system. Interaction diagrams
illustrate interactions, which comprise objects, the relationships between them, and the
messages that are passed among them. There are two types of interaction diagrams:
collaboration diagrams and interaction sequence diagrams (sequence diagrams). The
emphasis of collaboration diagrams is on the structural organisation of the objects, while
sequence diagrams emphasise the temporal order of the messages passed between the
objects. Though semantically equivalent, the information shown by the two types of diagram
differs: collaboration diagrams show the connections between objects, while this is only
implied in sequence diagrams. While sequence diagrams show message return values,
collaboration diagrams do not. Figures 2.1 and 2.2 show a pair of corresponding sequence
and collaboration diagrams representing the Singleton.getInstance() method of
the Singleton design pattern [Gamma 1995 pp. 127-134]. This pattern ensures that a class
has only one instance in a system, and provides a global point of access to the instance. For
example, a system may be connected to a number of printers, but there should be only one
print queue. The getInstance() method returns this instance. Interaction diagrams solve
some of the scalability issues inherent in graph-based representations by representing time
explicitly along the vertical axis.
Figure 2.1 The UML sequence diagram for the Singleton design pattern
Figure 2.2 The UML collaboration diagram for the Singleton design pattern
Statechart diagrams (statecharts) [Harel 1990] model the behaviour of an individual object
as it changes state in response to events. Statecharts emphasise the states in which an object
can exist and the transitions between these states. Activity diagrams are flowcharts that
describe the flow of control between activities; the Together documentation [TogetherSoft
2001a] describes an activity as “an ongoing, non-atomic execution within a state machine”.
While interaction diagrams emphasise the flow of control between objects, activity diagrams
emphasise control flow between activities.
Together ControlCenter [TogetherSoft 2001b] synthesises interaction diagrams, statechart
diagrams, and activity diagrams from source code. An extended version of UML sequence
diagrams and statecharts are used in Shimba. Booch’s object interaction diagram [Booch
1994] (a precursor to the UML sequence diagram) is used in Sefika et al.’s approach [Sefika
1996a] to illustrate component interaction in a system.
2.1.8.3 Message sequence charts
Message sequence charts (MSCs) [ITU-T 1996] are similar to UML interaction diagrams.
Objects are listed along the top of the diagram, with vertical lines indicating the lifetime of
the object. Messages between objects are shown as directed arcs; time progresses
downwards. A variation of message sequence charts is used in Ovation [De Pauw 1998]. De
Pauw et al. [De Pauw 1998] explain that the tree-structured interaction diagrams – called
execution patterns – used in Ovation emphasise the progression of time, rather than the flow
of control as in sequence diagrams. They also give the disadvantages of sequence diagrams
as being that they do not scale conveniently, there are ambiguities in that the ordering of
objects on the horizontal axis is arbitrary, and the lifetimes of recursive calls are not easily
resolved. The execution patterns of De Pauw et al. also use colour to indicate the class of an
object, and label each object with a unique identifier. De Pauw et al. explain that execution
patterns address the perceived shortcomings of sequence diagrams because being
unidirectional in both axes makes them more convenient to read, they scale better, and more
efficient use is made of space in both axes.
An early form of MSC – an interaction chart – was used in Program Explorer. The OMT
[Rumbaugh 1991] event trace diagram (scenario diagram) used in Scene [Koskimies 1996]
is a variant of the MSC. ISVis [Jerding 1997] used a style of MSC called a Temporal
Message Flow Diagram (TMFD) [Citrin 1995].
2.1.8.4 Other representations
A number of less widely used representations also exist, including three-dimensional
[Marcus 2003a] and virtual reality [Maletic 2001] visualisations (discussed later in the
context of specific tools).
Chuah et al. present three novel interactive glyphs for visualising software: InfoBUG,
timeWheel, and 3D-wheel [Chuah 1997]. The InfoBUG glyph provides an overview of the
software’s components. It is shaped like an insect and consists of wings, head, tail, and body,
each of which encodes some metric about the software, such as LOC, number of errors, lines
of code added or deleted, contained objects, etc. The timeWheel glyph illustrates multiple
properties of a software system over time. Each property is represented by a time series;
these series are laid out in a circle. This glyph is useful for visualising trends in evolution. The
3D-wheel glyph represents the same data as timeWheel, but uses height to denote time. Each
variable is represented by an equal-sized portion of the circumference of the circle, with its
radius (i.e. thickness) denoting its value (as in a rose diagram). It is easier to identify trends
using the 3D-wheel than the timeWheel, but harder to identify divergences. The approach is
demonstrated by a description of its application to a large real-time software system
developed by thousands of developers over twenty years.
Eick et al. present several visualisations designed to help in understanding and managing the
software change process [Eick 2002]. These are matrix views, cityscapes, bar and pie charts,
data sheets, and networks. Multiple visualisations can be combined to form perspectives that
show high-level structure in change data while allowing access to details. Matrix views are
effective for displaying values that are a function of two indices. Advantages of matrix
displays are that many cells are visible and there is no overplotting. One drawback that also
applies to cityscape views is the arbitrary ordering of the columns and rows, which makes it
difficult to relate the representation to other data. Cityscapes are 3D bar charts that extend
matrix views: each cell is addressed by two indices and can encode one or more values. Although more
compelling than 2D matrices, cityscapes have decreased scalability and suffer from
occlusion. The bar and pie charts used here have been enhanced to improve scalability; as a
result, bar charts scale effectively, though pie charts still do not. These representations are most
effective when used as selectors linked to other views. Data sheets are basically
multicolumn scrollable textual displays, with the addition of zooming. They are useful in
providing direct access to details, and are especially effective when linked to other views. In
network views, nodes represent software units and visual attributes denote measures of
association between them. The strength of network views lies in revealing high-level
structure. Their weaknesses are lack of scalability, inability to display multiple link
characteristics, and overlap. Perspectives are used to show multiple views simultaneously,
with links between them so that manipulations in one view can be reflected in the others.
Usage is demonstrated in the context of understanding software change by exploration of
change data, and managing software development.
2.2 Software visualisation tools
This section discusses a representative selection of the extant software visualisation tools. A
scheme for characterising software visualisation tools and a scale for measuring abstraction
level are presented. The extant tools are then assessed and discussed in the context of this
framework. Examining and comparing the existing tools in this way emphasises the
capabilities of current software visualisation tools and highlights potential areas for
improvement.
2.2.1 Characteristics of software visualisation tools
2.2.1.1 Three criteria for characterising software visualisation tools
From the foregoing discussion, three distinguishing criteria regarding software visualisation
tools can be identified. The first of these is the method used to extract the dynamic
information from the software system. Techniques include instrumentation of the source or
object code (e.g. using wrappers) or environment, or running the system under the control of
a debugger or profiler.
The second criterion is the methods of analysis that are applied to the extracted data to
improve its comprehensibility and usefulness to the analyst. These include selective
instrumentation, pattern recognition, abstraction, trace splitting, and suspension/resumption
of tracing.
The third criterion is the method by which the results of the visualisation are presented to the
analyst. Diagramming techniques are typically based on graphs, UML diagrams, or message
sequence charts.
2.2.1.2 A scale to indicate level of abstraction
The combination of these three criteria determines the level of abstraction at which the
software visualisation tool operates. This thesis proposes a scale for the classification of
software analysis tools based on their level of abstraction; this is illustrated in Figure 2.3. An
ordinal scale is used to assign a value (or range of values) from one to five to a tool, based on
its position relative to the five indicated reference points. At the microscopic end of the
scale, debuggers (1) are representative of the lowest level of abstraction that an analysis tool
can produce. At the opposite, macroscopic, end are tools that provide a broad overview of an
entire software system at a high level of abstraction (5). The middle portion of the scale
ascends from tools that illustrate method calls and returns (2), through tools giving an object-
or class-level representation of the system (3), to tools that provide an architectural-level
view of the system (4). The program code itself can be considered to be at level 0. The
application of this scale is not restricted to the assessment of software visualisation tools. It
is equally applicable to diagrams, and indeed other forms of documentation, at any stage of
the software engineering life cycle.
Figure 2.3 A scale to indicate level of abstraction
2.2.1.3 Software visualisation tool taxonomies
The remainder of this section examines a representative sample of the software visualisation
tools that have been produced. Each tool is described according to the three criteria proposed
in Section 2.2.1.1, and placed on the abstraction scale described in Section 2.2.1.2.
The template used to categorise software visualisation tools in this section is based around
four headings – Description, Evaluation, Comparison, and Assessment. A number of
alternative taxonomies have been proposed in the literature, including work by Myers
[Myers 1986], Chikofsky and Cross [Chikofsky 1990], Stasko and Patterson [Stasko 1992],
Price et al. [Price 1992, Price 1993], and Roman and Cox [Roman 1993]. Price et al. [Price
1993] propose a detailed, multi-level taxonomy for classifying software visualisation tools.
Unlike earlier taxonomies that have derived categorisations based on observations of tools,
Price et al. justify their categories (Scope, Content, Form, Method, Interaction, and
Effectiveness) based on the theory of visualisation tools. They then attempt to classify a
selection of software visualisation tools according to this taxonomy. The software
visualisation tools in this thesis are classified according to four categories derived from
observation of the extant tools (extraction, analysis, and presentation
techniques, and abstraction level). There is some commonality between these categories and
those of the taxonomy of Price et al. While this categorisation may be less detailed than the
taxonomy of Price et al., it provides much of the cogent information that may be required
when selecting a software visualisation tool for a software comprehension or reverse
engineering task.
2.2.2 Program Explorer (level 2)
2.2.2.1 Description
Lange and Nakamura [Lange 1995a] discuss the investigation of object-oriented frameworks
by means of visualisation that focuses on identifying design patterns. They describe Program
Explorer, a tool that uses a combination of static and dynamic information to visualise C++
programs. The program has a GUI front end, and queries are formulated in Prolog. Static and
dynamic information are stored in a single “Program Database”. The static information for
this database is gleaned from files output by a compiler. The system consists of a C++
program database; an “instrumentation utility” that instruments the C++ source code; a
“Trace Recorder” that is linked with the program under analysis to capture the event trace
during execution; and “Program Explorer”, which is used to control the execution of the
program, and present the static and dynamic information using its GUI.
The tool presents the visualisations using a graph-based representation and interaction charts.
These visualisations can be navigated step-by-step in a hypertext-like manner (e.g. expand a
node in the graph, explore a relationship between two nodes). The dynamic information is
extracted by automatic instrumentation of the source code. Further information on the
instrumentation technique is given by Lange and Nakamura [Lange 1995b]. A version that
uses runtime trapping instead of source code instrumentation, thus eliminating the need for
an extra compilation stage at the expense of execution speed, is discussed by Lange and
Nakamura [Lange 1997].
The visualisation can be localised by allowing the user to set breakpoints at a variety of
points in the source code, including classes, objects, function calls, etc. This also allows the
user to switch the tracing on and off, limiting the size of the trace. It does not appear that any
automatic analyses (e.g. behavioural pattern matching as in Shimba, described in Section
2.2.10) are applied to limit the size of the event trace and, hence, the resultant diagrams.
Lange and Nakamura [Lange 1995a] explain how identifying design patterns can help in
framework understanding, using examples from the Interviews framework [Linton 1992]. It
is argued that patterns can help in two ways. Firstly, once identified in the software
comprehension process, they can help to “fill in the blanks” about the rest of the system.
Secondly, patterns can provide a starting point for the exploration of a system. Lange and
Nakamura comment that although some automation using heuristics may be possible, it is
unlikely that the identification of patterns in the visualisation could be fully automated. The
system relies on the user being “pattern-literate”, and being able to identify the semantics of
design patterns from the method names and interactions between objects.
2.2.2.2 Evaluation
Lange and Nakamura [Lange 1995a] cite user reports that the tool was useful for three types
of task, namely: in supporting the understanding of certain specific C++ frameworks; for
reviewing designs, allowing visualisation of the implemented design in comparison to the
original design; and for visually debugging the application logic of C++ systems. Lange and
Nakamura state that Program Explorer provides framework developers with “abstract”
design pattern views, and “microscopic” views that provide sufficient detail as to make
source code superfluous in the software comprehension process. They explain that Program
Explorer’s ability to handle complex frameworks such as Interviews and CommonPoint
[Taligent 1994] is attributed mainly to the integration of static and dynamic information,
ease of view navigation and interaction, the ability to trace selectively, and user control over
the execution.
2.2.2.3 Comparison
At the time of the original paper [Lange 1995a], there do not appear to be any tools with
functionality comparable to that of Program Explorer. Lange and Nakamura [Lange 1995a]
discuss briefly two static analysis tools – CIA++ [Grass 1992] and GraphLog [Consens
1993] – and two dynamic analysis tools – Object Visualizer [De Pauw 1993, De Pauw 1994]
and HotWire [Laffra 1994]. Both dynamic tools are based on the same instrumentation
mechanism, which is less accurate than that used in Program Explorer in that it lacks
information on implicit functions, variable usage, and variable values. Object Visualizer is an
object-oriented profiling tool, while HotWire is a visual C++ debugger. Lange and
Nakamura explain that it is HotWire that is most similar to Program Explorer. Both HotWire
and Program Explorer generate microscopic visualisations concerning the state and
behaviour of objects, while Object Visualizer provides a more general overview, similar to a
profiling tool.
Jerding and Rugaber [Jerding 1997] observe that Program Explorer is not intended to give
an overall understanding of a software system, but to focus on specific classes or objects.
The analyst must be aware of what he is looking for, or where in the execution it occurs,
before he begins his analysis.
Walker et al. [Walker 1998] note that the analyst must possess in-depth knowledge of
the software system under analysis in order to make useful queries of the fine-grained models
produced by Program Explorer.
De Pauw et al. [De Pauw 1998] argue that Program Explorer bridges the gap between
microscopic and macroscopic extremes by its use of interaction diagrams. However, they
note that such diagrams are inconvenient and suffer from the difficulties described
previously relating to scalability, ambiguity, and recursive calls.
Richner and Ducasse [Richner 1999] note that the highest level of abstraction supported by
Program Explorer is the class or object level.
Systä et al. [Systä 2001] observe that the analyst cannot specify how the event trace is split
into sequence diagrams, and that the level of abstraction of these diagrams is fixed.
2.2.2.4 Assessment
The comments made by Jerding and Rugaber [Jerding 1997] suggest that Program Explorer
is more suited to specific reverse engineering tasks than overall software comprehension, as
the analysis must be focussed clearly. Therefore, it would be expected that Program
Explorer might struggle with general software comprehension tasks, but could perform well
in specific reverse engineering tasks, depending on the level of detail required for useful
analysis.
2.2.3 Scene (level 2)
2.2.3.1 Description
Koskimies and Mössenböck [Koskimies 1996] describe a tool called Scene (Scenario
Environment) that produces scenario diagrams from a dynamic event trace. Calls can be
expanded and collapsed to simplify the scenario diagram. A hypertext approach enables the
analyst to click various areas of the diagram (e.g. a method call or an object) to jump to a
related document (e.g. a point in the source code or a class interface). An externally-
produced class diagram can also be linked to the scenario diagram. The scenario diagrams
are partitioned to display only as many objects as fill the screen horizontally, thus
eliminating horizontal scrolling. Calls to ‘uninteresting’ objects can be filtered out in the
diagram by selecting the object(s) to retain or discard. Calls can be expanded and viewed in a
‘single-step mode’ where subsequent events are displayed one by one in a separate window.
For any call event (or for the whole diagram), a call summary can be viewed in the form of a
call matrix.
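A call matrix is simply a table of call counts indexed by caller and callee. A sketch in Python (for brevity; Scene itself analyses Oberon-2 programs, and the object names here are invented):

```python
from collections import Counter

def call_matrix(events):
    """Summarise (caller, callee) call events as a matrix of call counts."""
    counts = Counter(events)
    objects = sorted({name for event in events for name in event})
    return {a: {b: counts.get((a, b), 0) for b in objects} for a in objects}

events = [("Editor", "Buffer"), ("Editor", "Buffer"), ("Buffer", "Undo")]
matrix = call_matrix(events)
```

Each row gives the number of calls an object made to every other object, summarising an arbitrarily long scenario in a fixed-size table.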
The system is implemented in the Oberon-2 language [Mössenböck 1991] and runtime
environment [Reiser 1991, Wirth 1992], and traces programs in this language. Oberon-2 is a
hybrid language, which, in addition to the object-oriented concepts of classes and methods,
also supports modules and procedures. The trace is obtained by automatically instrumenting
the source code using a preprocessor, which is then compiled and executed. The event trace
is then input to Scene, which produces a scenario diagram. ‘Uninteresting’ modules (e.g.
those related to mouse events in a GUI) can be excluded from the instrumentation, or
instrumented manually by the analyst at their discretion.
One problem identified was the lack of support for understanding of the relationships
between multiple scenario windows, which represent a hierarchy. Future work includes
automatic production of object state information, instrumentation of object code rather than
source code, and application of Scene to other languages such as C++. More detail on Scene is given
by Koskimies and Mössenböck [Koskimies 1995a].
2.2.3.2 Evaluation
Koskimies and Mössenböck [Koskimies 1996] report that Scene has been used to analyse a
number of “framework-like” systems, including a compiler construction framework
[Koskimies 1995b] and a graphics editor [Templ 1994]. They believe that Scene is most
beneficial in the analysis of frameworks, as understanding their complex dynamic behaviour
is vital for reuse.
2.2.3.3 Comparison
The scenario diagrams used in Scene are similar to Program Explorer’s interaction charts,
and represent the same level of abstraction. Both tools extract dynamic information through
automatic instrumentation of the program source code.
2.2.3.4 Assessment
The similarity of the representations in both Scene and Program Explorer would suggest that
Scene may also be better suited to targeted reverse engineering tasks than overall software
comprehension activities. However, the diagram manipulations and summary generation
supported by Scene would be expected to give it an advantage over Program Explorer in
overall comprehension tasks.
2.2.4 Architecture-oriented visualization (level 4)
2.2.4.1 Description
Sefika et al. [Sefika 1996a] discuss the concept of architecture-oriented visualization, which
is concerned with the visualisation of architecture-level components of software systems,
e.g. subsystems, frameworks, design patterns. Sefika et al. state that it is often architectural-
level questions that are most useful in understanding software systems, but that answering
such questions using traditional programming tools is difficult for a number of reasons.
Firstly, the volume of data generated by “flat” instrumentation of method calls and returns is
too great for architectural-level understanding, and its collection disrupts the system under
analysis. Secondly, the more abstract architectural structures are hidden from
instrumentation. Thirdly, prior systems have scant support for multiple perspectives or
hierarchical navigation, making it difficult to analyse the information from various
abstraction levels and design aspects that is required to discern the software architecture.
The (unnamed) tool described by Sefika et al. generates a variety of diagrams. Summary
statistics are displayed using bar charts and ternary diagrams [Haynes 1995]. Relationships
between run-time information and static system structure are represented using space-filling
diagrams [Baker 1994]. Interaction between system components is illustrated using affinity
diagrams [Sefika 1996b] and object interaction diagrams [Booch 1994].
The user has a choice of diagrams for different purposes. In the case studies, a bar chart is
used to display the number of processes blocked per subsystem; space-filling diagrams to
illustrate process blocking statistics at sub-framework, inheritance structure and class levels,
and for particular class instances; ternary diagrams to show communication between sub-
frameworks and subsystems; affinity diagrams to show communication between classes of a
sub-framework and classes of a subsystem; and object interaction diagrams to show object
interactions. The system supports multiple simultaneous diagrams of combined static and
dynamic information, with hyperlinks between diagrams. The tool described uses an online
approach.
The two principal constraints on the design of the system were that it must incur low spatial
and temporal overheads, and that it must be flexible enough to allow the analyst to change
the data extraction technique conveniently. The structure of the system is based around
events being received by an event sensor and passed to an event announcer, which informs
an instrument at the relevant level of abstraction (i.e. that selected by the user) that the event
has occurred. The key architectural design decisions were identified as: how the query
interpreter maps architectural units to instruments; how the instrument managers control
instruments; how data collectors visit instruments to obtain data; and how events should be
directed to interested instruments. The information is obtained from method-level
instrumentation contained in the Choices operating system. Queries are formed automatically
via the GUI, or can be entered manually (the grammar is given in Extended BNF notation
[Aho 1986]).
At the time of the original paper [Sefika 1996a], this appears to be the first work to consider
dynamic architecture-oriented visualization. Potential for future research is identified in
combining the system with a code refactory to automate design repair; in utilising the
instrumentation techniques in an optimising compiler; and integration of the system with
everyday programming tools such as debuggers and code browsers. More generally, Sefika
et al. expect the pervasive instrumentation technique to provide analysts with better user
interfaces and views, particularly 3D views exploiting virtual reality technology.
Unfortunately, this latter development does not appear to have materialised.
2.2.4.2 Evaluation
Sefika et al. [Sefika 1996a] present two case studies based on the Choices object-oriented
operating system [Campbell 1993], written in C++. One is related to identifying a system
bottleneck, and the other related to analysing subsystem cohesion and coupling.
The performance of the architecture-oriented tool is compared with that of traditional “flat”
method-level instrumentation tools, and the improvements of architecture-oriented
visualisation over traditional visualisation are identified as follows. Firstly,
architecture-oriented instrumentation utilises knowledge of the software structure, enabling it to reduce
the volume of data that must be collected, lowering the overhead of dynamic analysis. This
reduction also decreases the volume of data sent to the visualiser, and hence the amount of
analysis needed to make the data comprehensible. The volume of data
collected is further reduced in the architecture-oriented approach as event sensors and
instruments are enabled according to the requirements of the query. The hierarchical
organisation of the instruments allows information about system structure to be obtained
quickly.
In terms of data generation, a graph comparing architecture-oriented instrumentation with
traditional instrumentation reveals that architecture-oriented instrumentation dramatically
reduces the trace size as the level of abstraction employed increases. In terms of
instrumentation overhead, two graphs illustrate clearly that architecture-oriented
instrumentation reduces the analysis overhead, as the volume of data and time required for
collection are decreased, and instruments are enabled depending on the components of the
current query.
Sefika et al. note that architecture-oriented instrumentation is not entirely without cost.
While traditional instrumentation increased the size of the Choices executable by 7.8%,
architecture-oriented instrumentation caused an increase in size of 14.3% in the worst case.
This difference is due to the addition of code for query processing and instrument
management required by architecture-oriented instrumentation. Sefika et al. feel that this is
an acceptable trade-off, given the benefits of architecture-oriented instrumentation and
falling memory prices, and the fact that most of the architecture-oriented instruments in the
code will be unused until explicitly required.
2.2.4.3 Comparison
Jerding and Rugaber [Jerding 1997] note that the goals of Sefika et al.’s approach are similar
to those of ISVis in visualising a system from a variety of architectural levels. However, they
point out that some of the views described by Sefika et al. are tightly coupled to the subject
system domain, rather than being generally applicable to software architectures. They
speculate that this could be because the tool was applied only to an operating system.
Walker et al. [Walker 1998] state that the higher-level visualisations provided by Sefika et
al.’s tool are an improvement over earlier techniques in analysing component interactions in
large systems. However, Walker et al. also explain that the tool is not as flexible as it could
be. While an online approach provides a connection between system execution speed and the
speed shown in the visualisation, this places restrictions on the analyses that can be carried
out, as discussed in Section 2.1.5. The approach taken by Sefika et al. of using predefined
abstraction types built into the tool, while gathering dynamic information effectively,
reduces the flexibility of the technique by making it more difficult for the analyst to adapt it
to a different system. The reflexion model technique described by Walker et al. can
conveniently be applied to a variety of systems, partly due to the decoupling of the data
collection and visualisation components.
Richner and Ducasse [Richner 1999] note that while Sefika et al.’s tool is one of the few
tools to support architectural-level visualisations, the approach taken requires application-
specific instrumentation, unlike Gaudi.
Systä et al. [Systä 2001] observe that Sefika et al.’s tool requires the analyst to select the
abstraction level and views to be produced before running the software system to be
analysed. Shimba is more flexible: it does not have this requirement, and provides a variety
of techniques to allow the analyst to construct abstractions from the low-level views
produced.
2.2.4.4 Assessment
The architectural-level visualisations produced by Sefika et al.’s tool suggest that it would
perform well in general software comprehension tasks. If appropriate views and abstraction
level were selected, it could also be useful in specific reverse engineering tasks. However,
the evaluation presented by Sefika et al. was in the context of an operating system only, and
it remains to be seen whether the technique will perform well when visualising other types of
software.
2.2.5 ISVis (level 4)
2.2.5.1 Description
Jerding and Rugaber [Jerding 1997] describe a tool called ISVis (Interaction Scenario
Visualiser) for identifying software system architecture. Static information is extracted from
files generated by the Solaris C/C++ compiler. An instrumentor then combines this static
information, the source code, and information from the analyst about what to instrument, and
generates instrumented source code. This code is compiled, executed according to the
desired usage pattern, and event traces are produced. The ISVis trace analyser then converts
this information into a set of scenarios and involved actors that are stored in a program
model. The user then queries views of this program model. ISVis has a Main View and a
Scenario View. The Main View lists the actors in the program model, including user-defined
components, files, classes, and functions, and the scenarios and interactions in the program
model. A Scenario View can be opened for any scenario in the model, which takes the form
of a Temporal Message Flow Diagram (TMFD; also called an interaction diagram, message
sequence chart, or event-trace diagram). A global overview is shown using an Information
Mural [Jerding 1995]; this allows the analyst to identify repeated patterns in the execution
visually. An option allows actors to be grouped by containing file, class, or component
actors. Another option allows the user to select an interaction or class of interactions and
define them as a scenario, which can then be abstracted out and replaced in the diagram by a
reference to the scenario. Interaction patterns can also be identified by a technique similar to
regular expression matching. Jerding and Rugaber compare the interaction patterns of ISVis
to design patterns [Coplien 1995, Gamma 1995], stating that interaction patterns are a result
of the implementation of design patterns, and constitute low-level evidence of their
existence.
The relationship between the two views and the program model is an implementation of the
Observer design pattern [Gamma 1995 pp. 293-303], and an example of the Model-View-
Controller (MVC) architecture used in languages such as Smalltalk [Krasner 1988]. The
Observer design pattern defines a one-to-many relationship between objects, such that when
one object changes state all its dependent objects are notified and updated automatically. For
example, objects representing different views of the same data, e.g. a pie chart, a bar chart,
and a spreadsheet, could be registered to observe the data source and hence be updated
automatically when the data source changed. The Observer pattern allows consistency
between cooperating objects, without making them tightly coupled which would reduce their
reusability. ISVis allows the analyst to save the event traces and program model for future
analysis. The process of reading in the trace, creating the program model, creating scenarios
and architectural models, and viewing the results is iterative, with each analysis building on
the results from the previous analysis. Analysis sessions can be loaded and saved. ISVis can
simultaneously analyse a number of traces from one system.
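The Observer relationship between the program model and its views can be sketched as follows. This is a generic illustration of the pattern as described above, not ISVis's actual (C++) implementation, and the class names are simplified stand-ins.

```python
class ProgramModel:
    """Subject: holds scenarios and notifies registered views on change."""
    def __init__(self):
        self._observers = []
        self.scenarios = []

    def attach(self, observer):
        self._observers.append(observer)

    def add_scenario(self, scenario):
        self.scenarios.append(scenario)
        # Every dependent view is notified and updated automatically.
        for observer in self._observers:
            observer.update(self)

class MainView:
    """Observer: keeps its displayed list consistent with the model,
    without the model knowing anything about view internals."""
    def __init__(self):
        self.displayed = []

    def update(self, model):
        self.displayed = list(model.scenarios)

model = ProgramModel()
view = MainView()
model.attach(view)
model.add_scenario("startup sequence")
```

The model depends only on the `update` interface, which is what keeps the cooperating objects consistent without tight coupling.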
Jerding and Rugaber identify two future improvements to ISVis: suggesting patterns to the
analyst more effectively, and importing/exporting components from/to other tools. Future work
is to include interoperation of ISVis with the Balboa machine-learning finite state machine
generation tool [Cook 1995], and with the SAAMTool architectural analysis tool [Kazman
1994].
2.2.5.2 Evaluation
ISVis is applied to a case study involving adding functionality to the Mosaic web browser
[NCSA 2003]. Jerding and Rugaber term the problem of finding where in a system to insert
an enhancement “architectural localization”. The high-level process of architectural
localization during the case study consisted of producing scenarios, removing interactions
that do not pertain to the functionality being localised, using the information mural to browse
the scenarios and identify patterns, using pattern matching to find scenarios similar to those
already identified, then relating this behaviour to the source code.
The principal strength of ISVis was reported to be its support of the abstraction process by
means of interaction patterns. This frees the analyst from computationally intensive work
and allows them to identify semantically those patterns that are relevant to the task at hand.
This allows the analyst to perform inferences manually that would not be considered by a
wholly automated approach. The authors emphasise the importance of appropriate usage
scenarios being chosen, as these have a direct effect on the analyst’s ability to identify
patterns. The problem of selecting a suitably representative trace is a key concept in dynamic
analysis, as discussed above in Section 2.1.2 and Section 2.1.3.
A reported weakness is the complexity of the user interface, which is attributed to its rich
feature set. The importance of scalability is emphasised, as architectural visualisation is only
useful if the system is large enough to benefit from such analysis. It is reported that the
information mural was effective at compressing the large volume of data.
2.2.5.3 Comparison
Unlike Ovation [De Pauw 1998] and Jinsight [De Pauw 2002], ISVis does not automatically
identify patterns of repeated execution; ISVis requires the analyst to identify such patterns.
Richner and Ducasse [Richner 1999] believe that Gaudi complements ISVis in that while
both tools acknowledge that higher-level views are required for architectural understanding,
ISVis concentrates on pattern detection, while Gaudi allows the analyst to specify the type of
view used.
Systä et al. [Systä 2001] observe that source files are the lowest level of granularity that can
be excluded from the trace in ISVis, while Shimba allows individual classes and methods to
be excluded. Shimba also allows more flexible construction of abstractions. However,
Shimba only allows pattern searching using exact string matches, and patterns must be
contained within a single sequence diagram.
2.2.5.4 Assessment
As with Sefika’s Architecture-Oriented Visualization tool [Sefika 1996a], the architectural-
level visualisations used in ISVis would appear to lend themselves well to general software
comprehension. The tool may also be useful in specific reverse engineering tasks, depending
on the level of abstraction required for the task.
2.2.6 Dali (level 4)
2.2.6.1 Description
Kazman and Carrière [Kazman 1998, Kazman 1999] describe a tool called the Dali
Workbench, which is designed to help with the extraction of program architecture. It is
designed as a lightweight, flexible tool that integrates other tools, the argument being that no
single tool is adequate for architectural extraction. Kazman and Carrière [Kazman 1999]
argue that software architecture is a “shared hallucination” – it exists from the various points
of view of people involved with the software. It is thus argued further that a human element
is essential in the process of architectural extraction. The goal of Dali is to assist the analyst
in the analysis of software architecture. This implies a need for the reconstruction of
architectural representations of the system. Kazman and Carrière list the main contributions
of Dali as its use of a central data repository to integrate system information, its use of a
common language (SQL) to enable the combination of views and user-defined pattern
matching, and its assessment of such patterns as a metric for architectural conformance.
Four iteratively applied techniques are involved in the process of reconstructing software
architecture using Dali. Firstly, static information from source artefacts, such as the program
code, and dynamic information from the output of profilers or coverage tools is used to
create extracted views of the system. These views represent the implemented architecture of
the system. Secondly, the extracted views are combined to produce fused views giving a
more complete representation of the architecture. Thirdly, the analyst defines a number of
architectural patterns that represent his understanding of the implemented architecture, which
are used to create refined views. Fourthly, the refined views are visualised to allow the
analyst to compare the implemented architecture to the designed architecture.
The extraction component of Dali extracts information using tools such as lexical analysis,
parsing, and profiling tools, then combines this information. This information is stored in a
central repository (a relational database). The contents of the repository can be visualised
and manipulated, and analyses can be performed on them. The various tools that are used
with the Dali Workbench are not fixed in its specification, but examples include the
following tools: Lightweight Source Model Extraction (LSME) [Murphy 1996b] for
extraction of static information; gprof for extraction of dynamic information; PostgreSQL
(based on POSTGRES [Stonebraker 1990]) as the relational database; SQL for view fusion
and architectural pattern definition; and RMTool [Murphy 1995] for analysis.
Fusing views in Dali means defining connections between them. The fused views in Dali are
concerned with providing complementary information from multiple views, navigating
between views, and improving the accuracy of a view with information from another view.
When combining views the fusion process must reconcile the information extracted using
different, complementary techniques. For example, Kazman and Carrière point out that both
their static and dynamic extractors provide information on function calls, the former listing
potential calls and the latter actual calls. A naïve union of these two sets of
information would lead to inconsistencies, so it is necessary to reconcile the elements in
these views. Statically extracted class inheritance information can be added to disambiguate
calls to sub/super classes.
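The reconciliation of potential and actual calls might look like this in outline. It is a sketch of the idea only, not Dali's SQL-based fusion, and the call edges are hypothetical.

```python
def fuse_call_views(static_calls, dynamic_calls):
    """Fuse statically extracted (potential) and dynamically observed
    (actual) call edges into one annotated view. A plain union would
    lose the distinction between the two sources."""
    fused = {}
    for edge in static_calls | dynamic_calls:
        fused[edge] = "actual" if edge in dynamic_calls else "potential-only"
    return fused

# Hypothetical call edges as (caller, callee) pairs.
static_calls = {("A::f", "B::g"), ("A::f", "C::g")}   # what the parser sees
dynamic_calls = {("A::f", "B::g")}                    # what actually executed
view = fuse_call_views(static_calls, dynamic_calls)
```

In the real tool further information, such as statically extracted inheritance relationships, would be consulted before an edge could be classified, e.g. to disambiguate calls to sub- and superclasses.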
Kazman and Carrière state that the intention was not to provide an ultimate solution, but to
develop an extensible environment for tool integration. Future research includes extending
the scope of Dali to analyse other languages and larger systems (e.g. legacy COBOL
systems) – at the time of the original work [Kazman 1999], Dali had been used on systems
up to 200 KLOC (thousand lines of code) in C, C++, Objective C, and Fortran – there is
evidence of such extension in work by O’Brien [O’Brien 2002]. There is also the possibility
of integrating other tools, such as to enable the import and export of architecture
representations in ACME [Garlan 1997] or UniCon [Shaw 1995]. It is also hoped to improve
user interaction, with the addition of a history/undo feature in the short term, and the ability
for the user to manipulate the architecture directly and have the system infer appropriate
architectural rules. Finally, Dali could be used to guide architectural evolution, e.g. in
determining how difficult it would be to change the connection mechanisms of an
architecture; this could be useful in web-enabling legacy systems or distributing them via
CORBA.
2.2.6.2 Evaluation
Kazman and Carrière [Kazman 1999] describe the application of Dali to two C++ systems:
VANISH [Kazman 1996], which has a well-designed architecture, and UCMEdit [Buhr
1996], which has no designed architecture. The study describes the stages of extracting the
information, forming “fused views”, then applying patterns (expressed as SQL queries) to
simplify the resultant visualisation (application-independent patterns, common application
patterns, then application-specific patterns). The analyst carrying out the architectural
extraction would appear to need to be either a very skilled software engineer or intimately
familiar with the system under investigation. The sorts of manipulation that are
carried out involve, for example, the grouping of methods and variables into their associated
classes, and the grouping of functions and header files into their associated classes. The case
studies extracted the as-implemented architecture from both systems, but, as would be
expected, found the VANISH architecture much more useful. The analysts were also able to
identify some architectural exceptions and points for improvement in the VANISH
architecture using the extracted model. Kazman and Carrière note that a good architecture is
characterised by functional consistency.
O’Brien [O’Brien 2002] describes three case studies in which Dali was employed in an
industrial architecture reconstruction project at Nokia. The system involved in the first case
study was a network management system consisting of 500 KLOC of C; the goal was to
understand how the system could be improved. The second case study concerned another
network management system consisting of 100 KLOC of Java; the goal was to understand
the system and determine whether it could be reused. The third case study involved a mobile
phone system consisting of 1 MLOC (million lines of code) of C++; the goals were to
examine the way in which this application was integrated with the operating system, and to
determine whether a specific component could be extracted and reused. O’Brien reports that
the architecture reconstruction efforts were successful in each of these contexts with their
various goals, and that the architects found the Dali views to be useful. However, a difficulty
was identified concerning the static analysis of the C and C++ systems. It was found that
identifier names extracted from the source code were often not unique, and could not be
discriminated between without compiling and linking. O’Brien concludes that architecture
reconstruction requires tool support, and that such tools are available. However, research is
required to improve the reconstruction process and the tools that support it.
2.2.6.3 Comparison
Systä et al. [Systä 2001] observe that Dali uses a single merged view to represent both static
and dynamic information about the software system, whereas Shimba uses separate, linked
views to separate static and dynamic information.
2.2.6.4 Assessment
As with the other architecture-level visualisation tools in this report, Dali would appear to be
well-suited to general software comprehension tasks. The intended role of Dali as an
architectural extractor may make it less suitable for specific reverse engineering tasks.
However, performance in either type of task will depend on the ease with which appropriate
architectural patterns can be identified and useful architectural views built.
2.2.7 Ovation (level 2)
2.2.7.1 Description
De Pauw et al. [De Pauw 1998] describe a tool for visualising programs using an execution
pattern view, which is a variation of Jacobson’s interaction diagrams [Jacobson 1992]. The
technique is based on that used in Ovation [De Pauw 1993, De Pauw 1994], and has since
been implemented in Jinsight [De Pauw 2002]. De Pauw et al. [De Pauw 1998] recognise the
inherent information overload problem, noting that both statically complex and small,
repetitive programs can produce huge traces. They state that dynamic execution trace data
can be comprehended if it is summarised into distinct, abstract portions and detail is
provided to the analyst on demand, and if patterns in the trace can be detected and
generalised. The execution pattern view achieves these requirements by allowing the analyst
to examine program execution at various levels of detail, with information supplied only on
demand, and by extracting and visualising generalised patterns in the trace. Ovation can
visualise C++ or Java programs using traces generated from the VisualAge development
environment [IBM 2004a], and Smalltalk programs through instrumentation added to the
Little Smalltalk [Budd 1987] and VisualAge Smalltalk [IBM 2003] environments.
De Pauw et al. observe that, while interaction diagrams are an improvement on directed
graphs for illustrating program interactions, they do not scale up well to larger execution
traces. The execution pattern view instead uses a tree structure, emphasising the progression
of time, rather than control structure. Colour is used to indicate the class of an object, and a
unique object ID appears in each object box. In the execution pattern views, horizontal space
is mapped to the call sequence, not the object population, and vertical space is also used
more efficiently. The view can be explored by searching for execution patterns based on a
number of criteria, such as the involvement of a specific class, object, or method. Subtrees
can be collapsed and expanded, allowing the user to “drill down” to focus on interactions of
interest while excluding extraneous detail. The context of the view can also be changed by
moving up or down the call hierarchy. Filtered expansion is also possible, for example by
expanding only those nodes in the tree that lead to a certain type of object. The system can
detect repetition automatically, either in the form of iteration (shown vertically) or recursion
(shown horizontally). Zooming and panning the view is also supported. Flattening can be
used to limit the horizontal depth by collapsing only the receiver of the message.
Underlaying saves horizontal space by hiding all the messages sent by the underlaying class
and displaying call recipient objects on top of the object that initiated the call. These
techniques allow the analyst to navigate the execution one step at a time. A number of
alternative charts for representing subtrees are available, including class legends and class
communication graphs. Other possible charts could include a CPU time meter, a call matrix,
or an instance histogram [De Pauw 1993]. To aid comprehension, “flyovers” and zooming
(without scaling method names) are supported.
Ovation supports generalised (i.e. non-identical occurrences) pattern matching for detecting
patterns of similar execution. The generalisation criteria for pattern matching implemented
in Ovation are those that De Pauw et al. report that programmers found most useful: object
identity, class identity, message structure, depth-limiting, repetition, polymorphism,
associativity, and commutativity. To implement this generalisation, the tool assigns a hash
value to each subtree of the execution tree. The subtree hash code is formed from the hash
codes of the subtree’s children and values in the subtree’s root. The values used to form the
hash code depend on the matching criterion specified, e.g. method names (method and class
names would be used) or class names (class names would be used). The hash values are
stored in a pattern dictionary, which records summary statistics for each entry (e.g.
frequency of this pattern). De Pauw et al. argue that the execution pattern view bridges the
gap between microscopic and macroscopic visualisation representations by providing a view
of the entire trace, with more detail available on demand.
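The subtree-keying scheme might be sketched as follows. This is a simplified reading: only the method-name and class-name criteria are shown, and hashable Python tuples stand in for the hash values that Ovation computes.

```python
from collections import Counter

def subtree_key(node, criterion):
    """Build a hashable key for an execution subtree from the keys of
    its children and selected values in its root; which root values
    contribute depends on the matching criterion."""
    cls, method, children = node
    root = (cls, method) if criterion == "method" else (cls,)
    return (root, tuple(subtree_key(child, criterion) for child in children))

def pattern_dictionary(subtrees, criterion):
    """Record how often each generalised pattern occurs in the trace."""
    return Counter(subtree_key(tree, criterion) for tree in subtrees)

# Two call subtrees that differ only in the method invoked on List.
t1 = ("List", "add",    [("Node", "link", [])])
t2 = ("List", "remove", [("Node", "link", [])])
by_method = pattern_dictionary([t1, t2], criterion="method")  # two distinct patterns
by_class  = pattern_dictionary([t1, t2], criterion="class")   # one generalised pattern
```

Loosening the criterion from method names to class names merges the two subtrees into a single pattern with frequency two, which is the kind of summary statistic the pattern dictionary records.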
De Pauw et al. conclude that execution patterns have three key benefits for object-oriented
visualisation. Firstly, they provide a convenient representation of object-oriented
communication. Secondly, similar execution patterns can be generalised. Thirdly, execution
patterns can help in the assessment of system complexity (for example through metrics such
as pattern redundancy). Future work is to include improving the flexibility of the pattern
matching, visual grammars, and reporting of qualitative results.
2.2.7.2 Evaluation
De Pauw et al. [De Pauw 1998] report that the system proved helpful for discovering
unexpected behaviour, comprehension of unfamiliar code, and performance improvement in
both medium-sized systems (such as Ovation itself) and large systems (such as Taligent).
2.2.7.3 Comparison
Walker et al. [Walker 1998] believe that the analyst requires a detailed knowledge of the
system under investigation in order to compose appropriate queries for Ovation.
Koskimies and Mössenböck [Koskimies 1996] believe that the techniques employed in
Scene and Ovation are complementary. They observe that Ovation compresses the extracted
execution trace into statistical information, while Scene retains the trace. The variations in
the two approaches are due to their different intended applications. Scene aims to visualise
method calls and returns, whereas Ovation aims to characterise and illustrate programs using
dynamic statistics. The call summary in Scene is an example of such statistical information,
and was inspired by earlier work by De Pauw et al. [De Pauw 1994].
2.2.7.4 Assessment
As with the other method-level visualisation tools, it would be expected that Ovation would
perform better in a specific reverse engineering task, where the area of application would be
more focussed than in a general software comprehension task. However, the summary views
of Ovation may be useful in this latter context.
2.2.8 Reflexion models (level 4)
2.2.8.1 Description
2.2.8.1.1 AVID
Walker et al. [Walker 1998] describe an approach for producing architectural-level
visualisations of behaviour. The approach derives its abstractions from the number of objects
in the program trace, and the communications between these objects. The tool uses a
sequence of cels to represent the information collected during the system’s execution. Each
cel constitutes an abstraction of dynamic information about the system at that point, and
about the execution until that point. The approach is intended to complement and extend
existing techniques for analysing dynamic information. The benefits of the approach are that
it enables the analysis of a system without changing the source code, allows the user to
manipulate the abstraction, provides an offline visualisation that is independent of the
execution speed of the target system, and allows the analyst to navigate both forwards and
backwards through the visualisation. The tool was originally implemented in Smalltalk for
the analysis of Smalltalk programs, and has since been implemented in Java for the analysis
of Java programs and named AVID (Architectural Visualization of Dynamics in Java
Systems).
The tool has two main views. One view displays a series of cels showing the events that
occurred during the program execution. The other view is a summary view showing cels
representing an aggregate of the whole execution. The execution can be viewed in an
animated form in the first view, and the user can step both forwards and backwards through
the execution. Each cel consists of: a box that represents a set of objects in the high-level
model defined by the analyst; a directed hyperarc passing between and through a number of
boxes; a set of directed arcs between pairs of boxes, representing method calls; a histogram
representing the age and garbage collection status of the objects associated with the box;
annotations and bars within boxes; and annotations on each directed arc. The hyperarc
represents the call stack at the end of the interval displayed. The summary view is equivalent
to the final cel of the animated view. Additionally, it displays two histograms for each box:
one showing the pattern of object allocation for the entire execution, and the other the age of
garbage-collected objects. Although only one view can be displayed at a time, the offline
nature of the tool allows multiple instances to be run simultaneously on the same execution
trace. The animation controls allow the user to “play” the trace, step back and forward
through it, and set the step (number of cels between steps) and interval (number of events
represented per cel) size. Clicking an arc, hyperarc, or histogram in either view pops up a
text box giving more information on the selection. Walker et al. note that it would be
possible to link the tool with a textual code browser, and have the browser jump to the
relevant position in the source code when an item in the text box popup is selected.
Constructing a visualisation in AVID is a four-stage process. Firstly, execution data is
extracted from the system under analysis and stored to disk. Secondly, the analyst produces a
high-level model of the system using abstract entities designed to emphasise the architectural
properties that he is investigating. Thirdly, the analyst defines a mapping from the abstract
entities to the extracted dynamic information. The tool then applies this mapping to the
extracted information to produce the visualisation. Finally, the analyst examines the
visualisation to investigate the system’s dynamic behaviour. This offline, multi-stage process
increases the tool’s usability by allowing iterations over the latter stages of the process –
there is no need to re-run the program to collect the dynamic information again. This process
is based on the concept of reflexion models introduced by Murphy et al. [Murphy 1995].
The tool collects information for every method call, object creation, and object deletion,
which consists of the class of the calling (or creating) object, and either the method being
called and the class of the object containing it, or the class of the object being created or
deleted. The tool was originally implemented in Smalltalk and the dynamic information is
collected by instrumenting the Smalltalk VM. A map relates dynamic system entities (e.g.
objects or methods) to abstract ones (e.g. a box in the visualisation). The mapping process is
achieved by use of regular expressions. The map consists of a set of entries, each with three
parts: the name of the level of the Smalltalk structural hierarchy being mapped (i.e.
application, subapplication, category, class, or method); a regular expression defining the set
of names to be mapped for that level; and the name of the abstract entity to which the system
entities represented by these names should be mapped.
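A map of this three-part form might be sketched as follows. The hierarchy levels, name patterns, and abstract entity names here are invented for illustration, not taken from AVID.

```python
import re

# Each entry: (hierarchy level, name regular expression, abstract entity).
ENTITY_MAP = [
    ("category", r"^Networking-.*", "Network box"),
    ("class",    r"^Socket.*",      "Network box"),
    ("class",    r"^.*View$",       "UI box"),
]

def map_to_abstract(level, name, entries=ENTITY_MAP):
    """Return the abstract entity (box) for a concrete system entity,
    or None if it is unmapped and therefore absent from the view."""
    for entry_level, pattern, box in entries:
        if entry_level == level and re.match(pattern, name):
            return box
    return None
```

Applying such a map to every recorded event is what turns the raw trace into the boxes and arcs of the high-level visualisation, and refining the map requires no re-execution of the program.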
As discussed previously, the separation of visualisation from system execution by using an
off-line approach has two benefits. Firstly, pre-processing can be performed prior to
visualisation, e.g. to generate summary information for the entire execution. Secondly, it
allows the trace to be replayed from an arbitrary point without having to re-run the
execution. Concerning navigation, a further advantage of the off-line approach is that the
user can play, step back and forward through, and access randomly any part of the execution.
Although no information on execution time is built into the representations, Walker et al.
note that this could be desirable. An object is identified by a description of the call stack that
exists when the object is created.
An area for further research is the possibility of allowing objects’ mappings to change, to
allow them to “migrate” between abstraction units. Walker et al. recognise the difficulties
concerning the huge volume of data generated by tracing and believe that the flexibility and
usability of the tool are limited by the use of trace information and that the use of sampled
information could partially resolve such limitations.
2.2.8.1.2 RMTool
Murphy et al. [Murphy 2001] discuss a technique to extract a model of a system that is
“good enough” to be used for a specified task. The reflexion model technique involves
comparing a high-level model (produced by the analyst) of a system with the actual
implemented model. The analyst defines a mapping (using regular expressions) between the
source code constructs (e.g. file names, class names, function names, etc.) and his high-level
model. The RMTool (Reflexion Model Tool) system compares the two models and produces
a diagram containing the modules from the analyst’s model with three types of arc
connecting them: convergences (communications that agree with the analyst’s model),
divergences (communications that did not appear in the analyst’s model, but do appear in the
extracted model), and absences (communications that appear in the analyst’s model but not
in the extracted model). Not all source code constructs need be mapped to a high-level
equivalent – partial and approximate models are allowed. The process is designed to be
iterative – the mapping can be refined as the task proceeds. Murphy et al. give the key
characteristics of the technique as being that it is “lightweight”, requiring low effort and a
timeframe of hours not days; “approximate”, using a variety of source models and refining
the mapping as the analysis proceeds; and “scalable”, capable of analysing various languages
and systems from several to over 1000 KLOC. The procedure is as follows: the analyst
specifies his model; he then uses a third-party tool to extract structural information from the
system (via static or dynamic analysis); he then defines the mapping between this source
model and his high-level model; the analyst uses a tool to compute the reflexion model; and
finally he investigates the reflexion model via a GUI.
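The core reflexion model computation can be sketched as a set comparison. This is a minimal reconstruction of the idea described above, not RMTool's implementation; the entity names and mapping entries are invented for the example.

```python
import re

def compute_reflexion(hl_edges, src_edges, mapping):
    """Compare a high-level model against an extracted source model.

    hl_edges:  set of (module, module) edges the analyst expects.
    src_edges: set of (entity, entity) edges extracted from the system.
    mapping:   list of (regex, module) pairs lifting source entities to
               high-level modules; unmapped entities are ignored, mirroring
               the technique's support for partial models.
    """
    def lift(entity):
        for pattern, module in mapping:
            if re.fullmatch(pattern, entity):
                return module
        return None

    lifted = {(lift(a), lift(b)) for a, b in src_edges}
    lifted = {e for e in lifted if None not in e}

    convergences = hl_edges & lifted   # agree with the analyst's model
    divergences  = lifted - hl_edges   # in the system, not the model
    absences     = hl_edges - lifted   # in the model, not the system
    return convergences, divergences, absences
```

Refining the mapping and re-running this computation is the iterative loop the technique prescribes.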
A formal Z specification of the technique for producing the reflexion models is given by
Murphy et al. Optimisations were applied to reduce the computation time to acceptable
levels (55 seconds for the 1000 KLOC MS Excel application). Murphy et al. discuss the
similarities and differences between their tool and consistency checkers, reverse engineering
tools, knowledge-based approaches, and model comparison techniques. Future work is to
include use of the tool to produce documentation on demand for a specific task.
2.2.8.2 Evaluation
2.2.8.2.1 AVID
A qualitative evaluation was obtained through two case studies involving performance-
tuning tasks on Smalltalk programs, each involving an expert and a non-expert Smalltalk
developer. The expert participant found the summary view and animated hyperarc useful, but
felt that the tool lacked integration with a traditional code browser and the ability to view a
detailed stack dump as in a Smalltalk debugger. The tool was designed to allow the
integration of a code browser, but seeks to complement existing techniques, so does not seek
to replace a debugger by incorporating one. The non-expert found the garbage collection
histograms useful, along with the pop-up’s correlation between abstract information and
method/object names, but desired different displays of information, feeling that one screen
was “too cluttered”.
2.2.8.2.2 RMTool
Murphy et al. [Murphy 2001] discuss the tool in the context of NetBSD (written in C), and a
number of case studies are discussed, including Microsoft Excel (C), the SPIN OS (Modula-
3 and C), and a restructuring tool (C++); the tool appeared to help with all of them.
The MS Excel case study is described in more detail by Murphy and Notkin [Murphy 1997].
The Excel application consists of 1.2 MLOC of C. The goal of the reengineering task was to
identify and extract components from the application source code. To achieve this, an
understanding of the structure of the application was required. Specifically, the analyst
needed to gain an understanding of how the source code was divided into static modules, and
how the modules communicated at runtime. The analyst reported that the reflexion model
technique had assisted him in refining an architectural view of the application, and in
investigating the correspondence between that view and the source code. Additionally, the
reflexion model helped the analyst with his overall understanding of the application, and
highlighted aspects that were not apparent from the initial high-level model or the source
code. The analyst also reported that it was straightforward to focus the investigation on the
relevant parts of the system and exclude extraneous detail. Murphy and Notkin assert that
this case study demonstrates that the reflexion model technique has useful practical applications for
the following reasons. Firstly, the analyst elected to use the reflexion model technique even
with the constraints of an industrial setting. Secondly, the analyst continued to use the
technique for future revisions of the application outwith the case study period. Thirdly, the
analyst believed that the reengineering task could have been completed sooner had the
reflexion model technique been employed earlier. Murphy and Notkin attribute much of the
success of the technique to its support for approximation in the form of unrefined areas of
the model. They believe that the results of this case study can be generalised to similar
reengineering efforts, as the application was written in a commonly-used language (C), the
source code had evolved over time with multiple developers, and the task of identifying and
extracting components from an existing system is a common one.
2.2.8.3 Comparison
2.2.8.3.1 AVID
Richner and Ducasse [Richner 1999] believe that the Gaudi technique complements that of
Walker et al. [Walker 1998] in recognising that object-level tracing information is too low-
level to assist in architectural understanding of a system. While the approach of Walker et al.
appears to be targeted to performance evaluation, Gaudi aims to allow the analyst to specify
the view that most suits his analysis.
Systä et al. [Systä 2001] observe that the mapping between low-level system artefacts and
high-level components of the analyst’s model in Walker et al.’s approach is constructed
manually using a declarative mapping language. Shimba presents static and dynamic
information in separate views, and Rigi is used to build high-level static components. The
analyst can then construct high-level sequence diagrams by mapping low-level artefacts to
high-level components.
2.2.8.3.2 RMTool
Richner and Ducasse [Richner 1999] note the similarity of their process with that of Murphy
and Notkin [Murphy 1997], in that it allows the analyst to navigate their investigation
through an iterative process. Another similarity is that Richner and Ducasse also expect the
engineer to produce a high-level model of the system under analysis.
Murphy et al. [Murphy 2001] present the idea of combining models from different extractors
as a simple case of set union, which is in contrast to the production of fused views in Dali
described by Kazman and Carrière [Kazman 1997]. A possible disadvantage of the reflexion
model technique is that the analyst needs to start with a model – the system gives no help if
the model is very inaccurate. It must be considered whether or not it would always be
acceptably straightforward to produce a sufficiently accurate model. The technique appears
to require either an understanding of the system under investigation, or an experienced
analyst. The effort involved in producing the mapping for a large system would appear to be
considerable, even if it were produced iteratively (e.g. 1,425 map entries for Excel). The
system appears to be reliant on conventions (e.g. directory or class structure) in the source
code for producing its models; although Murphy et al. note that this was not a problem in
their case studies, if the source code is disorganised the model produced may be of little
value.
2.2.8.4 Assessment
The high-level architectural views produced by these tools would be expected to be useful in
general software comprehension tasks, provided appropriate high-level models of the target
system could be constructed. The reflexion model approach may be less successful with
specific reverse engineering tasks, depending on the level of abstraction required.
Specifically, tasks at a low level of abstraction, such as those concerned with intra-object
behaviour, would be too detailed for the information presented in a reflexion model to
address.
2.2.9 Gaudi (levels 3-4)
2.2.9.1 Description
Richner and Ducasse [Richner 1999] describe a technique for extracting application
visualisations from Smalltalk programs using a combination of static and dynamic
information. A set of Prolog facts defines the basic static (e.g. superclass-subclass) and
dynamic (e.g. message send) relations between elements. Derived relations can be produced
from these, such as overrides (static) and sendsCreate (dynamic). Views are defined by
describing a set of components and the connectors between them. Prolog rules are used to
define a clustering of components (C), and a relation (R). The diagrams contain ovals
representing components, and directed arcs representing communications between those
components. Methods can also be grouped by class.
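Gaudi expresses these facts and rules in Prolog; the same idea can be sketched in Python using sets of tuples. The facts, class names, and clustering below are invented for illustration and do not reproduce Gaudi's actual vocabulary.

```python
# Base facts extracted from the system (illustrative, not Gaudi's actual
# Prolog predicates): static superclass relations and dynamic message sends.
superclass = {("Figure", "CompositeFigure"), ("Tool", "SelectionTool")}
sends      = {("SelectionTool", "CompositeFigure", "add:")}

# A derived relation, analogous to a Prolog rule over the base facts:
# invokes(C1, C2) holds if some instance of C1 sends any message to C2.
invokes = {(sender, receiver) for (sender, receiver, _msg) in sends}

# A clustering C groups classes into components; the connectors of a view
# are the relation R lifted through the clustering.
cluster = {"Figure": "Figures", "CompositeFigure": "Figures",
           "Tool": "Tools", "SelectionTool": "Tools"}
connectors = {(cluster[a], cluster[b]) for (a, b) in invokes}
```

The resulting `connectors` set corresponds to the directed arcs between component ovals in the generated diagram.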
The static information is extracted by parsing the code using the MOOSE tool [Ducasse
2000] and representing it in the FAMIX model [Tichelaar 1998]. The dynamic information is
collected by instrumenting the application with Method Wrappers [Brant 1998], and stored
as Prolog facts. Prolog queries are used to build the abstractions. The Gaudi tool was used to
create the views, which were then displayed using the dot tool [Koutsofios 1996b]. Richner
and Ducasse note that the approach could be adapted easily to Java or C++, but that it does
not presently support concurrency.
Richner and Ducasse give the weaknesses of the approach as follows. Obtaining dynamic
information requires an executable, instrumentable system – Gaudi is therefore not suitable
for sections of partially constructed systems, or other unexecutable code. They also note the
problem of scalability, and give possible solutions as instrumenting only some
methods/classes, feedback from query results to instrumentation so that only relevant
methods are instrumented, appropriate scenario choice, and pre-analysis trace filtering.
Richner and Ducasse give the strengths of Gaudi as flexibility in the kinds of views that can
be recovered by allowing the analyst to define relations and clusterings, and in the questions
that can be answered through its use of both static and dynamic information.
Future work includes determining which views are most useful in reverse engineering, and
guidelines for the use of such tools in reverse engineering.
2.2.9.2 Evaluation
A case study of reverse engineering of Smalltalk HotDraw [Johnson 1992, Beck 1994]
demonstrates the technique. The case study proceeded as follows. A high level view was
created that shows all the relations between HotDraw classes, grouped by Smalltalk
category. Based on this information, a new clustering was then defined to give a different
view. A view was then created showing creation invocations, and one to show non-creation
invocations.
The clustering in Gaudi provided a number of views at different levels of granularity, while
the combination of static and dynamic information was reported to assist in focussing the
effort. The views produced helped the analyst to formulate questions about the interactions
in the system, and provided a comparison with his own mental model of the system.
2.2.9.3 Comparison
Systä et al. [Systä 2001] observe that the query-based approach of Gaudi allows the user to
tailor the views produced, which may contain either static or dynamic information, or a
combination of both, and exist at various levels of abstraction. The query-based approach
also allows the analyst to control the volume of information generated. However, unlike
Shimba, Gaudi does not support the direct exchange of information among views.
2.2.9.4 Assessment
In common with other architectural-level tools, Gaudi would be expected to perform best in
general software comprehension tasks. The varying levels of abstraction that can be
produced using its query-based approach may also allow it to perform well in specific
reverse engineering tasks.
2.2.10 Shimba (levels 2-4)
2.2.10.1 Description
Systä et al. [Systä 2001] describe the Shimba tool, which produces visualisations of Java
programs using both static and dynamic information. Shimba extracts static and dynamic
information from the Java bytecode of the system. It displays static information using
directed graphs (Rigi dependency graphs), and dynamic information using a variation of
UML sequence diagrams (SCED (Scenario Editor) sequence diagrams) from which
statecharts can be generated automatically. The principal contribution of this work is that
Shimba considers both static (structural) and dynamic (behavioural) information and
constructs separate diagrams for each, but maintains a relationship between the diagrams.
Most other tools consider either static structure or dynamic behaviour, or combine both into
a single diagram.
The dynamic information is extracted by running the target system under a customised Java
SDK debugger [Sun 2000], which automatically sets breakpoints in the code. Shimba
integrates the Rigi [Müller 1988, Müller 2001] (static) and SCED [Koskimies 1998]
(dynamic) tools to carry out both general program understanding and goal-driven reverse
engineering. Shimba (and, in a similar manner, Dali) demonstrates the possibility of
constructing software comprehension tools using pre-existing tools, rather than starting from
scratch. SCED sequence diagrams can be used to slice the static graphs produced by Rigi, to
enable visualisation of the part of the system that is responsible for a particular observed
behaviour. Rigi graphs can be used to guide the generation of SCED sequence diagrams to
observe the behaviour of a specific part of the system, and can also be used to raise the level
of abstraction of the SCED diagrams. Dynamic control flow information can also be added to
sequence diagrams, while the static graphs can be annotated with software metrics
[Chidamber 1994]. The event trace explosion problem is handled by applying behavioural
pattern matching algorithms [Boyer 1977] to the trace to extract repeated patterns. These
are then represented in the SCED sequence diagram using subscenario and repetition
constructs. The trace can be split (both automatically and by the user) into a number of
smaller traces to limit the size of the sequence diagrams produced.
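The effect of this pattern extraction can be illustrated with a much simpler stand-in. Shimba's matching detects arbitrary nested behavioural patterns; the sketch below only collapses immediately repeated events into repetition constructs, which is sufficient to show how a trace shrinks.

```python
def compress_repetitions(trace):
    """Collapse immediately repeated trace events into (event, count) pairs.

    A deliberately simplified stand-in for Shimba's behavioural pattern
    matching: real patterns may span several events and nest, which this
    sketch ignores.
    """
    out = []
    for event in trace:
        if out and out[-1][0] == event:
            out[-1] = (event, out[-1][1] + 1)
        else:
            out.append((event, 1))
    return out
```

A run of identical calls in the event trace would thus be rendered once with a repetition count, analogous to SCED's repetition construct.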
Systä et al. note that the techniques in Shimba are also applicable to forward engineering, to
check the implemented structure against design guidelines and the implemented behaviour
against use cases. Future work is planned to integrate the techniques of Shimba into the
Nokia TED UML modelling tool [Wikman 1998]. This will allow the usefulness of the
techniques in Shimba to be studied with real users, and will allow tighter integration than is
possible with current reverse engineering environments. Systä et al. comment that a reverse
engineering environment using various UML diagrams would be useful.
Further details on the use of Shimba in analysing metrics is given by Systä et al. [Systä
2000a]. Further information is available on the reverse engineering of Java software using
Shimba [Systä 2000b, Systä 2000c], using Rigi and SCED [Systä 1999a], and using SCED
[Systä 1999b, Systä 2000d].
2.2.10.2 Evaluation
A case study of the FUJABA system [Rockel 2000] illustrates the use of Shimba. The
combination of static and dynamic information was found to be particularly useful. Although
the string matching algorithms employed were able to detect numerous, nested patterns in
the trace, one of the most problematic aspects involved structuring the SCED sequence
diagrams using behavioural patterns. One problem related to the naming of subscenario
boxes, which is automatic and therefore not descriptive of the subscenario. Another problem
relating to subscenarios was that a pattern is defined based on its length and contains an
arbitrary sequence of SCED sequence diagram elements, which may not necessarily form a
logical unit within the context of the system under analysis.
Using static information to guide the generation of dynamic information was found to be
particularly useful for goal-driven reverse engineering tasks. This helps to prune
‘uninteresting’ information from the visualisation. The statechart synthesis functionality was
useful for analysing the dynamic behaviour and control flow of selected parts of the system.
The model slicing technique was used to determine the cause of certain behaviour, the
system structure that relates to this behaviour, and how elements of a SCED sequence
diagram relate to the rest of the system. Raising the level of abstraction of the SCED
sequence diagrams using static Rigi abstractions was also employed to understand
communication between high-level components, and to validate such static abstractions.
Further information on this case study is given by Systä [Systä 2000c].
2.2.10.3 Comparison
Unlike other tools that produce diagrams containing only static or dynamic information, or
combine both into one diagram, Shimba produces separate diagrams for static and dynamic
information and provides linkages between them. The pattern-matching functionality is
comparable to that employed in Ovation, allowing repeated behaviour to be factored out as a
subscenario in the visualisation. Shimba’s automatic statechart generation function is unique
– none of the other tools considered produce state-level representations of dynamic
behaviour.
2.2.10.4 Assessment
The sequence diagram, statechart, and dependency graph representations used in Shimba
should enable it to perform well in specific reverse engineering tasks. The ability to slice the
static dependency graphs using dynamic sequence diagrams, and to raise the level of
abstraction of a scenario diagram using high-level static abstractions should make Shimba
useful for general software comprehension tasks also.
2.2.11 Jinsight (levels 2-3)
2.2.11.1 Description
De Pauw et al. [De Pauw 2002] describe the Jinsight tool and its application to the visual
exploration of runtime information. Jinsight illustrates object population, thread activity, and
method calls in Java software. Jinsight includes a profiling agent that is used to produce an
execution trace from which visualisations are generated. Tracing can be enabled and disabled
during execution. Uninteresting classes and packages can be excluded from the visualisation.
Visualisations are presented in the form of interdependent views, each of which illustrates a
different facet of the software’s runtime behaviour.
One such view is the histogram view, which illustrates resource usage (CPU time and
memory space) for classes, objects, and methods. This view allows the analyst to identify
‘hot spots’1 of activity in the execution that could indicate a bottleneck. Each row in the
histogram corresponds to a class in the system. Symbols are coloured to represent activity on
that class, such as the time spent executing methods of the class, the number of calls made to
methods in the class, the amount of memory consumed by instances of the class, or the
number of threads in which instances of the class participate. Hollow rectangles represent
garbage-collected objects, which helps in identifying memory leaks. The lines in the
histogram view represent inter-object communication, and can be set to show method calls,
object creation, or references between objects. However, the combinatorial nature of
inter-object communications means that this aspect of the histogram view is not scalable
beyond very simple programs.
One way in which Jinsight simplifies the huge amount of data produced from a dynamic
trace is through pattern extraction. A pattern extractor analyses the event trace information
and identifies patterns of repeated behaviour. These patterns can be used to present an
aggregated view of the execution. The reference pattern view illustrates patterns of object
1 Hot spots in this context are distinct from hot spots in the context of framework reuse, where they are points at which a framework is designed to be extended.
references in the execution. Colours denote classes. Double rectangles represent a group of
objects of a certain type. Labels denote the number of instances of a class, and the class
name. The reference pattern view can be used to help in identifying memory leaks in the
form of objects that are no longer required but cannot be garbage-collected due to
outstanding references from other objects.
Jinsight’s execution view illustrates the sequence of method calls that make up the event
trace of the system’s dynamic behaviour. Time proceeds from top to bottom. Each horizontal
stripe represents the execution of a method, with deeper calls at the right hand side. Stripes
are coloured by class. A vertical lane constitutes all of the method stripes for a thread of the
execution. Lanes are added from left to right. Zooming in further to the execution view
reveals individual method calls, annotated with their names. Pattern recognition can also be
applied to the execution view. The execution pattern view illustrates patterns of method calls
in the execution.
The call tree view gives quantitative data on the sequence of method calls, including the
number of calls and their contribution to the total execution time.
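The kind of aggregation behind such a call tree view can be sketched as follows. The representation of calls as stack paths, and the method names, are assumptions made for this example, not Jinsight's internal format.

```python
from collections import defaultdict

def call_tree_stats(calls):
    """Aggregate call counts and total time per call path.

    calls: list of (path, seconds) tuples, where path is the stack of
    method names for one completed call, e.g. ("main", "parse").
    The names here are illustrative only.
    """
    counts = defaultdict(int)
    total = defaultdict(float)
    for path, secs in calls:
        counts[path] += 1
        total[path] += secs
    return {p: (counts[p], total[p]) for p in counts}
```

Sorting the result by total time would surface the paths contributing most to execution time, as the call tree view does.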
Jinsight allows the analyst to group related behaviour into execution slices, which can be
used as a basis for comparison between executions, or to filter out information not pertinent
to the visualisation objectives. Execution slices can be defined by selecting elements in a
view, or by querying the trace data directly.
De Pauw and Sevitsky [De Pauw 1999, De Pauw 2000] describe the use of Jinsight in
examining memory leaks, while Sevitsky et al. [Sevitsky 2001] discuss the use of Jinsight
for performance analysis. A brief summary of Jinsight’s functionality is given by De Pauw et
al. [De Pauw 2001].
Future work includes enabling the visualisation of systems running on multiple JVMs
simultaneously and across networks, and of heterogeneous systems containing middleware
such as databases in addition to Java components.
2.2.11.2 Evaluation
De Pauw et al. [De Pauw 2002] report that Jinsight has been used successfully to diagnose a
number of problems in industrial applications. They note that the system did not perform
well when analysing high-volume web-based applications as the tracing overhead caused
undesirable behaviour in the application, requiring more selective trace information
collection. They found that their aggregate statistics did not provide sufficient information to
support some analyses, and that broad filtering at the class or method level did not scale
well. To rectify this, Jinsight allows task-oriented tracing, where relevant details can be
extracted while retaining other important contextual information.
2.2.11.3 Comparison
De Pauw et al. [De Pauw 2002] note that it is important to select appropriate diagram
abstractions that are sufficiently scalable to large amounts of execution information. They
comment that Sefika et al. [Sefika 1996] use large architectural units, while Walker et al.
[Walker 1998] include additional structural units to organise the data.
Jinsight shares some ideas with Ovation [De Pauw 1998], notably the concept of execution
patterns.
2.2.11.4 Assessment
The call tree view and execution view would be expected to help with specific reverse
engineering tasks. The reference patterns may be useful for general software comprehension
tasks. Jinsight would appear to be particularly useful for examining performance issues, for
which the histogram view would be useful.
2.2.12 Collaboration Browser (levels 2-4)
2.2.12.1 Description
Richner and Ducasse [Richner 2002a] describe a process for recovering collaborations from
software systems using dynamic information. A tool called the Collaboration Browser
illustrates the technique. A collaboration represents a part of the software system that
performs some function and details how the classes that make up the collaboration interact
by playing certain roles.
The first stage in extracting collaborations from source code is to analyse the code
dynamically to extract interactions. Static analysis is inadequate for this purpose as it cannot
provide the object-oriented control flow information required. It is then necessary to identify
the important collaborations that help to answer the analyst’s questions. Collaboration
Browser records an event trace containing information for each method call, consisting of
sender class and identity, receiver class and identity, and the name of the called method.
Pattern matching is used to abstract similar sequences of execution from the trace. Querying
allows the analyst to identify the interesting collaborations.
A collaboration instance is the sequence of method calls between a method call and its
corresponding return. A collaboration pattern is a generalised class of collaboration
instances, and represents the collaboration design concept. The set of methods called on a
class during a collaboration pattern corresponds to the role design concept.
The pattern matching settings used to identify collaboration patterns from instances can be
adjusted in three ways. Firstly, any of the five items of information that represent an event in
the trace (caller class and identity, callee class and identity, and method) can be included or
excluded from the match. Secondly, events can be ignored when an object sends itself a
message, or if the depth of invocation exceeds some limit in the pattern or overall execution.
Thirdly, the analyst can choose to treat events as a tree-structure sequence, or simply as a set
of events with no implied ordering.
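The first of these adjustable settings, selecting which of the five event fields participate in the match, can be sketched as grouping events by an abstracted key. This is an illustrative reconstruction, not Collaboration Browser's implementation, and it flattens away the tree structure of a collaboration instance.

```python
def collaboration_patterns(events, keep=("sender_cls", "receiver_cls", "method")):
    """Group call events into patterns by abstracting away selected fields.

    events: list of dicts with keys sender_cls, sender_id, receiver_cls,
            receiver_id, method (a flattened view of collaboration
            instances; nesting is ignored in this sketch).
    keep:   which of the five fields participate in the match.
    """
    def key(event):
        return tuple(event[f] for f in keep)

    patterns = {}
    for event in events:
        patterns.setdefault(key(event), []).append(event)
    return patterns
```

With object identities excluded from `keep`, two interactions between different instances of the same classes fall into one collaboration pattern; including identities would keep them as distinct instances.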
Collaboration Browser uses a textual GUI to allow the analyst to query the entire execution
or a single collaboration. The analysis can be focussed by excluding selected senders,
receivers, or methods. A collaboration can also be illustrated as a sequence diagram.
Two limitations of the Collaboration Browser were identified as follows. Firstly, the pattern
matching was simplified by only considering all of the events between a method call and
return; it could be useful to consider a subset. Secondly, the role of a class is identified as the
set of all methods called on that class during the execution; considering individual class
instances separately could produce a more refined view of roles.
Collaboration Browser is implemented in Smalltalk and visualises Smalltalk programs. The
program to be analysed is instrumented using Method Wrappers [Brant 1998], which allows
selective instrumentation. The Interaction Diagram tool [Brant 1998] is used as the basis for
the sequence diagram representations.
Richner and Ducasse note that the recovery of collaborations is most effective when
combined with high-level views showing the interaction of components in a system [Richner
1999, Richner 2002b].
2.2.12.2 Evaluation
Collaboration Browser is evaluated in a HotDraw case study where the goal is to investigate
the implementation of tools. The scenario executed produced 53,735 method calls, from
which 183 collaboration patterns were extracted using the pattern matching functionality.
The results were then queried to discover the collaboration patterns containing an interaction
between the Tool class and another class in the trace; this produced twelve unique
collaboration patterns. The results were then focussed further to examine four collaboration
patterns resulting from a call to Tool.handleEvent. Further queries on these collaboration
patterns revealed the role played by each of the participant classes. The role of Tool in other
collaborations was also investigated. Further case study evaluation of Collaboration Browser
is given by Richner [Richner 2002b].
It is reported that the case studies showed that the queries helped in locating interesting
collaborations and in understanding the roles of classes in collaborations. They also
demonstrate that the process cannot be fully automated – a human analyst is required. It was
a challenge to identify suitable pattern matching criteria to obtain a balance between too
much and too little information. The iterative process employed in the case study was as
follows: collaboration patterns were created; queries were formulated regarding class
interfaces; collaboration patterns involving certain classes were identified; the collaboration
pattern participants were investigated; and the collaboration was investigated further using
the interaction diagram representation.
2.2.12.3 Comparison
Richner and Ducasse [Richner 2002a] consider their approach to be complementary to other
reverse engineering techniques that are more focussed towards visualisation, such as those of
De Pauw et al. (Ovation and Jinsight) [De Pauw 1998] and ISVis [Jerding 1997]. The
approach of Richner and Ducasse is focussed more on querying the trace data to extract
collaborations than on producing a visualisation. They feel that, whereas the techniques of
De Pauw et al. and Jerding and Rugaber consider the trace as a whole, their approach
complements these techniques by concentrating on smaller portions of the interaction. They
also note that no single tool can provide all of the functionality necessary for design
recovery.
The only other approach that attempts to reverse engineer collaborations is one based on
static analysis only [De Hondt 1998]. This approach relies on the analyst selecting
participants and roles for the collaboration and proposing appropriate links between them.
2.2.12.4 Assessment
The collaboration approach used in Collaboration Browser suggests that it would be useful
in general comprehension of specific parts of a software system. The ability to view
collaborations as sequence diagrams would be expected to be helpful in specific reverse
engineering tasks.
2.2.13 Together debugger (level 1)
2.2.13.1 Description
The Together debugger is part of the Together ControlCenter development environment
[TogetherSoft 2001a, TogetherSoft 2001b]. It provides all of the standard debugger features,
including breakpoints, expression evaluation and monitoring, variable modification, and
program flow control. Breakpoints can be set at classes, methods, lines, or exceptions.
Whenever a breakpoint is encountered during the execution of the program, the debugger
outputs a message and/or suspends the execution. The values of variables and expressions
can be monitored during execution, and variable values can be modified. Program execution
can be suspended and resumed by the user. Execution can proceed as normal, or in steps
where the debugger executes one line of code then suspends. The user can instruct the
debugger to step to the next line, or into, out of, or over a method. Integration with the
source code allows the user to set breakpoints and watches by selecting a position in the
code, and also to instruct the debugger to run the program up to the current cursor position.
Many IDEs provide a debugger as part of their standard tool set, such as Eclipse [Eclipse
2005].
2.2.13.2 Evaluation
There do not appear to have been any evaluations published regarding the performance or
functionality of the Together debugger.
2.2.13.3 Comparison
The graphical interface of the Together debugger makes it easier for non-experts to use.
Debuggers traditionally have a command line interface, for example jdb [Sun 2002]. The
integration with the source code also makes it more convenient to set and manage
breakpoints and watches.
2.2.13.4 Assessment
The low level information provided by the debugger is likely to be useful for some specific
reverse engineering tasks, which are often amenable to analysis at a low level of abstraction.
The debugger is less likely to be useful for general software comprehension tasks, where
information at a higher level of abstraction is typically required.
2.2.14 Together diagrams (levels 2-3)
2.2.14.1 Description
Together ControlCenter can produce UML class and interaction diagrams from program
source code. Unlike other tools considered in this section, Together produces behavioural
diagrams by parsing the program code, rather than by analysing an event trace. As discussed
in Section 2.1, this limits the accuracy of the interaction diagrams generated, while
maximising their generality by considering the entire system. When generating interaction
diagrams, Together addresses the potential information overload problem by allowing the
user to select the classes to be included in the diagram, limit the depth of method calls to be
included, and hide method internals. Interaction diagrams are generated for a method
specified by the user. Together supports ‘simultaneous round trip engineering’, meaning that
changes to the program code are reflected in the derived diagrams and vice versa.
2.2.14.2 Evaluation
Kollmann et al. [Kollmann 2002a] present a comparison of four static reverse engineering
tools. Together is compared with the commercial Rational Rose tool [Rational 2003], and the
IDEA [Kollmann 2001, Kollmann 2002b] and Fujaba [Fujaba 2002] research tools. The
tools were assessed by evaluating the class diagrams that they produced. While basic
diagram generation results were broadly similar across the tool set, Rational Rose detected
some associations that Together did not. The research tools were able to handle more
advanced diagram concepts than the industrial tools, such as multiplicities, inverse
associations, and container resolution.
2.2.14.3 Comparison
Together is unique among the tools in this section as it produces behavioural diagrams by
parsing the program code. All other tools considered extract behavioural information
dynamically. As discussed above, this has the effect of reducing the detail of the diagrams
while increasing their generality.
2.2.14.4 Assessment
It would be expected that the combination of the class and interaction diagrams for the entire
system produced by Together would be useful in general software comprehension tasks. The
lack of dynamically extracted information and resultant lack of detail may be a problem in
specific reverse engineering tasks.
2.2.15 SHriMP (levels 0, 2-4)
2.2.15.1 Description
SHriMP (Simple Hierarchical Multi-Perspective) views [Storey 1995] display software
modelled as nested graphs [Harel 1988] using fisheye views [Furnas 1986]. Nodes represent
software artefacts, such as functions or variables. Arcs represent dependencies, such as
function calls. Composite nodes represent subsystems, and composite arcs represent
collections of dependencies. This nesting encapsulates the hierarchical nature of the
software, and allows multiple levels of abstraction to be visualised concurrently. The fisheye
view approach allows the analyst to examine some area of the system in detail in the context
of the entire system. This is achieved by enlarging the nodes of interest while shrinking those
not immediately relevant. Graphs also include links to the source code. SHriMP is intended
to i) provide the user with a range of views of a system, from information about its
architecture down to the source code; and ii) enable the user to focus in on part of the system
while maintaining the big picture.
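The fisheye mechanism can be sketched as Furnas's degree-of-interest calculation, DOI(x) = API(x) - D(x, focus), where API is a node's a priori importance and D its distance from the current focus. The sizing rule and constants below are illustrative assumptions, not SHriMP's actual implementation:

```python
def degree_of_interest(api, distance):
    """Furnas-style DOI: a priori importance minus distance from the focus."""
    return api - distance

def fisheye_size(api, distance, base=10.0, scale=2.0, minimum=2.0):
    """Map DOI onto a display size: nodes near the focus are enlarged,
    while distant nodes shrink towards a minimum legible size.
    The base/scale/minimum constants are arbitrary assumptions."""
    return max(minimum, base + scale * degree_of_interest(api, distance))
```

Under this sketch, a subsystem node with high a priori importance remains visible even when far from the focus, while uninteresting distant nodes shrink to the minimum size, preserving the big-picture context that SHriMP aims for.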
2.2.15.2 Evaluation
The authors describe the application of SHriMP to comprehend the structures of two
systems. Ray Tracer is a C system consisting of approximately thirty modules, and SQL/DS
(Structured Query Language/Data System) is an RDBMS written in PL/AS (a proprietary
IBM systems language) consisting of around 1,300 compilation units. SHriMP views were
implemented as an extension to the Rigi program understanding tool, and their performance
was compared to that of Rigi without nested graphs and fisheye views. They found
showing detail in context, visualising software structures, visualising source code, and
navigating the hierarchy to be useful techniques in comprehending the subject systems. One
potential drawback noted was that the capability of Rigi to illustrate part of the software
system in a separate window without higher-level information is not present in SHriMP; this
may be useful for very large systems where the maintainer is only interested in a small part
of the system. Similarly, the Rigi overview window, which shows a tree or graph-based view
of the containment hierarchy, is not present in SHriMP; this may be a more familiar
visualisation of a hierarchy for some analysts. SHriMP has since been reimplemented using
Java Beans, and applied to itself as a case study [Storey 2001].
2.2.15.3 Comparison
Unlike most of the tools discussed thus far, SHriMP addresses a range of abstraction levels
from code to system architecture through its use of hierarchical views. SHriMP is also
unique in its use of fisheye views, which allow the analyst to display more detail for
interesting parts of the system while maintaining overall context.
2.2.15.4 Assessment
SHriMP would be expected to be useful for tasks relating to the static structure of software
systems at a range of abstraction levels.
2.2.16 BLOOM and JIVE (levels 2-3)
2.2.16.1 Description
BLOOM extracts static and dynamic information [Reiss 2001]. A visual query language
allows views to be combined. The system suggests appropriate visualisations based on the
data chosen by the user.
JIVE visualises dynamic information about Java programs [Reiss 2003a]. It uses a ‘box
layout’ which consists of a number of rectangles whose height, width, hue, saturation, and
brightness depict various properties, such as number of calls, number of instantiations, etc.
2.2.16.2 Evaluation
There does not appear to be any documented evaluation of BLOOM.
Anecdotal evidence regarding the use of JIVE on a variety of Java programs is presented by
Reiss [Reiss 2003b]. It is reported that JIVE illustrates the different phases that an
application goes through during its execution, provides rudimentary performance
information, and highlights unexpected behaviour. Weaknesses reported are bias in the
statistics presented, missed thread state transitions, and unwanted artefacts in the trace. Reiss
comments that it would be useful to allow the user to determine how information is grouped.
This would allow the user to examine a program at a high level, then zoom in on a particular
area of interest. Reiss also notes that it would be useful to be able to save and later reload the
trace data.
2.2.16.3 Comparison
Like Ovation and Jinsight, JIVE and BLOOM are concerned with analysing the dynamic
behaviour of software. Reiss comments that the tracing element of Jinsight, which makes use
of a modified JVM, is too inefficient for extensive use of the tool.
2.2.16.4 Assessment
The focus of JIVE and BLOOM on dynamically extracted behavioural information should
make them suitable for tasks involving the run-time behaviour of software systems, such as
analysing memory leaks, etc.
2.2.17 Polymetric Views, Class Blueprint, RelVis (levels 2-3)
2.2.17.1 Description
Bertuli et al. describe a lightweight dynamic visualisation technique [Bertuli 2003]. The
technique employs polymetric views, which consist of rectangular nodes connected by arcs,
annotated with metrics. Up to five metrics can be represented per node by the node’s x
position, y position, height, width, and colour. A minimal amount of information is collected
at run-time. The twelve measurements extracted include the number of called methods, rate
of called methods, number of method invocations, number of created instances (class-based),
and total number of method calls (method-based). The collected data requires much less
storage space, and incurs a much lower collection overhead, than traditional tracing
approaches. Static
information is extracted using the Moose reengineering environment (built on the FAMIX
metamodel). Wrappers are used to trace the program and output the metrics (by means of
counters). The views are specified and displayed using CodeCrawler [Lanza 2003b].
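This metric-to-attribute encoding can be sketched as a simple mapping of up to five measurements onto a node's visual properties; the greyscale colour ramp and the optional logarithmic normalisation below are assumptions for illustration, not CodeCrawler's actual rendering:

```python
import math

def polymetric_node(x_metric, y_metric, width_metric, height_metric,
                    colour_metric, log_scale=False):
    """Map up to five metrics onto a node's position, size, and colour,
    as in a polymetric view. A logarithmic scale compresses the large
    value ranges seen in views such as the Instance Usage Overview."""
    def norm(v):
        return math.log10(v + 1) if log_scale else v
    return {
        "x": norm(x_metric),
        "y": norm(y_metric),
        "width": norm(width_metric),
        "height": norm(height_metric),
        # Lower grey level = darker node = higher metric value (assumed ramp).
        "grey": max(0, 255 - int(norm(colour_metric)) * 25),
    }
```

For an Instance Usage Overview node, for example, width would carry the number of created instances, height the number of called methods, and colour the number of method invocations.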
Four types of view are produced using the technique. Each view is illustrated in the context
of a case study of the Moose system. A number of system characteristics were identified
using the views. The Instance Usage Overview view shows the instantiation and usage of
classes, and is intended as a starting point for analysis. It considers the entire system, and
uses a logarithmic scale. This view is displayed as an inheritance tree with nodes
representing classes and edges representing inheritance. The node width represents the
number of created instances, the node height represents the number of called methods, and
the node colour represents the number of method invocations on a class. This view combines
both static (inheritance hierarchy, number of classes) and dynamic (number of class
instances, number of method calls, number of invoked methods) information. It is useful as
an overview of the whole system’s behaviour, and shows the classes used in the system in
the context of the inheritance hierarchy.
The Communication Interaction View shows inter-class communication. It considers the
entire system and uses a linear scale. This view uses the embedded spring layout, with
springs being weighted so that classes between which there is a lot of communication will be
aggregated. Nodes represent classes, and edges represent invocations. The node width and
height represent the number of called methods, the node colour represents the number of
method invocations on a class, and the edge width represents the number of invocations
between two classes. This view identifies heavily used classes. It is less scalable than the
Instance Usage Overview view as the layout algorithm employed does not readily identify
well defined groups of classes.
The Creation Interaction View shows class creation between classes. It considers the entire
system and uses a logarithmic scale. This view also uses the embedded spring layout, with
springs being weighted so that classes between which there are a lot of creation invocations
will be grouped. Nodes represent classes, and edges represent invocations. The node width
represents the number of objects created by the class, the node height and colour represent
the number of created instances of the class, and the edge width represents the number of
creation invocations between the two classes. The lower number of arcs makes the Creation
Interaction View more scalable than the Communication Interaction View.
The Method Call Origin View shows the origin of method calls – i.e. internal or external to
the class. It can be used to consider the entire system, a subsystem, or a single class, and uses
a logarithmic scale. This view is displayed as a scatterplot, with nodes representing methods.
The x coordinate represents the number of calls from external methods, the y coordinate
represents the number of calls from internal methods, and the node colour represents the total
number of calls. The scatterplot layout illustrates the three metrics well, even with a large
number of nodes.
The class blueprint approach visualises the static structure of a class [Lanza 2001]. A class
blueprint is based on a template consisting of five rectangles representing Initialization,
Interface, Implementation, Accessor, and Attributes. Size, shape, and colour are used to
visualise these attributes – there is an obvious connection with the authors’ polymetric views
approach.
The RelVis approach provides graphical views of source code and release history
information [Pinzger 2005]. Kiviat diagrams are used to display metrics, which can then be
used to identify trends and hence potential refactoring targets [Kolence 1973].
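A Kiviat (radar) diagram places each metric on its own axis, with axes at equal angles and the point's radius proportional to the metric value. A minimal placement sketch, with max-normalisation as an assumed scaling:

```python
import math

def kiviat_points(metrics, radius=1.0):
    """Place each metric on its own axis of a Kiviat diagram: axes at
    equal angles, point radius proportional to the metric value
    normalised against the largest value (an assumed scaling)."""
    peak = max(metrics) or 1
    points = []
    for i, value in enumerate(metrics):
        angle = 2 * math.pi * i / len(metrics)
        r = radius * value / peak
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

Connecting successive points yields the characteristic polygon whose changing shape across releases reveals the trends RelVis exploits.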
2.2.17.2 Evaluation
The polymetric views approach has been used to analyse a number of applications of up to
1800 classes in size written in Smalltalk, COBOL, C, C++, and Java. It was found that the
approach was useful to give an overview of the system, assess the quality of inheritance
hierarchies, identify candidate classes for refactoring, and assess class coupling.
Disadvantages of the approach are that it considers only static information, that it lacks
detail in places, and that reengineering larger systems would require more information than
the current model provides [Demeyer 1999, Ducasse 2001, Lanza 2003a].
A version of the approach based on run-time information was useful in providing insights
into the runtime behaviour of the system, presenting various different kinds of information,
and providing overviews as well as more detailed information [Ducasse 2004]. Drawbacks
include the lack of very detailed information (e.g. sequence of interactions, as in a sequence
diagram), and that the user is required to interact with the view to gather the relevant
information.
The approach has also been applied to the problem of software evolution [Lanza 2002]. It
was found that the approach reduces complexity and provides system wide views of the
evolution, provides a finer-grained understanding of class evolution, builds a vocabulary to
describe evolution, and scales well. Limitations include fragility relating to class naming,
screen limitations that necessitate working at a new level of abstraction, and a lack of other
levels of granularity.
Another application of the approach was to the problem of code duplication [Rieger 2004]. It
is reported that the goal of data reduction on different levels was achieved and that the views
were useful for providing overview information. However, layout and readability could be
improved, and a link to the source code would be useful.
The approach has also been used to analyse class hierarchy evolution [Gîrba 2005]. Gîrba et
al. describe how they used the technique to answer a number of questions regarding the
evolution of the inheritance hierarchies in several systems.
Two case studies of the class blueprint approach are presented by Lanza and Ducasse [Lanza
2001] and four by Ducasse and Lanza [Ducasse 2005]. The benefits of the approach are
listed as the reduction of complexity and the definition of a common vocabulary. Limitations
are the lack of consideration for cognitive science, the layout, the lack of illustration of a
class’s functionality, the lack of illustration of collaboration between classes, and the lack of
dynamic information.
Pinzger et al. [Pinzger 2005] demonstrate the RelVis approach by applying the technique to
seven releases of the open source Mozilla project spanning three years. The graphs produced
highlighted positive and negative trends in the entities and relationships of the system.
Future work is planned to explore 3D Kiviat diagrams and different sets of metrics.
2.2.17.3 Comparison
Bertuli et al. note that AVID, Program Explorer, ISVis, and Jinsight all employ sophisticated
diagramming techniques to make an entire event trace comprehensible [Bertuli 2003]. In
contrast, the polymetric views approach condenses this information into a number of metrics
that are used to annotate visualisations.
Bertuli et al. note that Program Explorer focuses on classes and objects, such as method
invocation, object instantiation, and attribute access, but it is not intended as a global
understanding tool. They point out that the user must know what he is looking for before
commencing the analysis, whereas the polymetric views approaches are intended to cover
the whole system.
Bertuli et al. explain that the purpose of ISVis is to visualise method calls. While patterns can
be recognised and extracted, there is a lack of flexibility in the analysis. The approach scales
well for a large number of messages, but not for a large number of classes in which case the
visualisation becomes less useful.
Bertuli et al. comment that AVID is focussed on the lifetimes and numbers of objects in a
system. They explain that AVID is concerned more with static architectural models while the
polymetric views based approaches consider the various types of interactions between
classes during execution.
Bertuli et al. point out that Jinsight visualises messages between objects and extracts
execution patterns, but that class roles are difficult to understand during execution for large
traces. They comment further that the approach taken by Ovation involving class call
clusters and the class call matrix is closer to their approach. However, while such visualisations
are simple and have good scalability, they present only a small facet of an OO application.
2.2.17.4 Assessment
The advantages of the techniques based on the polymetric views approach are as follows.
The lightweight approach employed allows minimal disruption to the system under analysis.
It also reduces the amount of data produced compared to a full trace. The technique can be
attached to a running system. This allows it to be used for systems such as web servers that
are running constantly. The approach is incremental and data can be analysed cumulatively.
The views provide overviews as well as more fine-grained information. The disadvantages
are the lack of invocation sequence level information, as in Jinsight, and the shortcomings of
the spring layout in dealing with high levels of communication.
2.2.18 Seesoft, SeeSys, SeeSlice, HierNet, SeeNet, SeeNet3D (levels 0, 2-3)
2.2.18.1 Description
Seesoft visualises source code from systems up to 50KLOC [Eick 1992]. Each line of code is
mapped to a thin row of colour. The four key ideas are: reduced representation, colouring by
statistic, direct manipulation, and capability to read the actual code. Data can be taken from
version control systems, static analyses, or dynamic analyses.
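The reduced representation can be sketched as a mapping from a per-line statistic (e.g. age of last modification) to a thin coloured row; the linear colour ramp is an illustrative assumption:

```python
def seesoft_rows(line_stats, row_height=1, palette_size=256):
    """Map each source line to a thin row whose colour index is driven
    by a per-line statistic, as in Seesoft's reduced representation.
    The linear normalisation onto the palette is an assumption."""
    lo, hi = min(line_stats), max(line_stats)
    span = (hi - lo) or 1  # guard against a file where all lines are equal
    rows = []
    for i, stat in enumerate(line_stats):
        colour = int((stat - lo) / span * (palette_size - 1))
        rows.append({"y": i * row_height, "colour": colour})
    return rows
```

Because each line collapses to a row a few pixels high, tens of thousands of lines fit on one screen while hot spots in the chosen statistic remain visible.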
SeeSys visualises statistics associated with code organised hierarchically into systems,
subsystems, and files [Baker 1994]. The approach can display the relative sizes of
components, which components are stable and which are changing, where new functionality
is being added, and identify error-prone code that has many bug fixes. Animation can be
used to display code evolution. The visualisation is based on nested rectangles. Each
subsystem is denoted by a rectangle whose area is determined by some statistic. These
rectangles are then partitioned to show their internal directory structure, each sub-rectangle’s
area being proportional to the NCSL (non-commentary source lines) metric for that
directory. Rectangles can be filled to illustrate additional metrics. For example, a fill may be
used to show the proportion of a directory’s NCSL that corresponds to new code.
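The nested-rectangle scheme anticipates what later became known as a treemap. A one-level sketch, partitioning a subsystem rectangle so that each directory's area is proportional to its NCSL (the directory names and sizes are hypothetical):

```python
def slice_layout(ncsl_by_dir, x, y, width, height):
    """Partition a subsystem rectangle horizontally so that each
    directory's sub-rectangle area is proportional to its NCSL metric.
    Returns (x, y, width, height) per directory; one level only."""
    total = sum(ncsl_by_dir.values())
    rects, cursor = {}, x
    for name, ncsl in ncsl_by_dir.items():
        w = width * ncsl / total
        rects[name] = (cursor, y, w, height)
        cursor += w
    return rects
```

Recursing into each sub-rectangle, alternating the slicing direction, would reproduce the full nested SeeSys display; fills within each rectangle can then encode a second statistic such as the proportion of new code.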
SeeSlice is a tool that allows slicing at the statement, procedure, or file level and visualises
the structure of the slice produced [Ball 1994]. Files are displayed as columns containing
representations of procedures. Procedures can be displayed ‘open’ (code visible) or ‘closed’
(code hidden). Pointing to a statement immediately highlights the procedures and code
included in the slice.
HierNet visualises networks where each link has an associated weight, and exploits any
hierarchy present [Eick 1993]. The position, area, and colour of nodes are significant, as is the
colour of arcs. For example, in an email system, the area of a node could be proportional to
the number of messages sent or received by the user represented by that node, the node
colour could indicate job function (clerical, technical, management), links could show email
communication between individuals, and a heat colouring scale could be used to indicate
communication frequency.
SeeNet is a tool for visualising network data [Becker 1995]. It consists of three static
displays and direct manipulation techniques that allow these displays to be parameterised.
Link maps consist of nodes connected by lines to indicate data flow. This shows the
connectivity of the network. Line segments may be coloured or drawn with varying
thicknesses to illustrate values. Arrows can be used to indicate link directionality. Problems
with link maps are link overlap, long links, and difficulties in determining line terminations.
An alternative representation that avoids this clutter is the nodemap. Nodemaps use symbols
or glyphs to represent nodes and illustrate statistics through visual characteristics such as
size, shape, and colour. Complex glyphs can represent more than one statistic. Although a
nodemap solves the clutter problem experienced with link maps, it does so at the expense of
detailed information about individual links. Another possible approach to the clutter problem
is to omit geographical information. A matrix display presents a network in matrix form
with each matrix element allocated to a link. While overcoming the clutter problem, the
matrix display sacrifices information about the geography of the network – indeed, it may
introduce a false idea of geography due to the ambiguous ordering of rows and columns.
SeeNet allows direct manipulation of the various parameters involved in network
visualisations (statistics, levels, geography/topology, time, aggregation, size, colour).
SeeNet3D introduces five new three-dimensional views to address some of the fundamental
problems that limit the scalability of two-dimensional geographical network displays [Cox
1996]. A global network positions nodes geographically on a globe and draws arcs between
them. Restricting the 3D space to a globe captures many of the advantages of a general 3D
layout, while helping the user to maintain context. Users are also familiar with globes. Arc
crossings, and hence visual clutter, are reduced by the background of the globe surface and
the 3D embedding. An arc map positions nodes on a flat 2D map and draws arcs between
them in 3D space. Advantages of arc maps are that they are not restricted to whole world
displays, they can be positioned arbitrarily in space, the use of arcs greatly reduces the line
crossings typical in 2D displays, and the most important links are represented by the highest
arcs. To analyse a particular node or subnetwork, drill-down network views can be
employed. These linked views show data on demand, displaying the links emanating from a
central focal node. Spoke displays order nodes around the focal node in a circle. Spoke
displays become overwhelmed with more than 50-100 nodes. This problem can be circumvented by
means of a 3D layout that positions the nodes on a helix. An alternative 3D display,
motivated by the helix display, positions the nodes approximately uniformly round a sphere
(as with a pincushion), thus forming them into lines of latitude. Another alternative is to
tessellate the sphere surface and select points from the tessellation. To be effective, the
pincushion display (like the helix) must be viewed interactively, with motion.
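The spoke and helix layouts can be sketched as parametric placements around a focal node; the radius, step, and pitch constants below are arbitrary assumptions:

```python
import math

def spoke_layout(n, radius=1.0):
    """Place n nodes evenly on a circle around a central focal node
    (the 2D spoke display)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def helix_layout(n, radius=1.0, pitch=0.1, step=0.5):
    """Place n nodes on a 3D helix, trading the crowding a 2D spoke
    suffers beyond roughly 50-100 nodes for extra vertical space."""
    return [(radius * math.cos(i * step),
             radius * math.sin(i * step),
             pitch * i * step) for i in range(n)]
```

The helix keeps the angular ordering of the spoke display but spreads nodes along a third axis, which is why it degrades more gracefully as the node count grows.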
2.2.18.2 Evaluation
An example of using Seesoft to visualise change data is presented for a 9 KLOC system
[Eick 1992]. The analyst was able to learn which files were changed most often, the age of
the code, when each file was last changed, and how files can be grouped by modification
request. Anecdotal evidence of field experiences is also discussed.
SeeSys [Baker 1994] is applied to the source code for the AT&T 5ESS telephone switch. The
system consists of several MLOC, written by thousands of programmers over a decade.
They were able to show the sizes of the subsystems and directories that have changed
recently, zoom in on particularly active subsystems, discover how much of the development
activity involved bug fixes and new functionality, identify directories and subsystems with
high fix-on-fix rates, and identify the subsystems that have been historically active and also
those that have shrunk or been removed.
SeeSlice [Ball 1994] is applied to a 12KLOC profiling/tracing tool written in C. They were
able to determine that most of the program is dependent on five highly interdependent input
procedures, a set of interdependent procedures spanning four files is responsible for output,
and a single variable influences a large portion of the program.
HierNet is demonstrated by applying it to an intra-departmental email network over eight
months and to changes to a large section of a computer program [Eick 1993]. For the email
network, the visualisation showed that the amount of mail varies greatly, a community of
three users can be identified, and that it typically takes two months for communication
patterns to solidify after a new user joins the system. For the software system, the approach
reduced the size of the data set, found a large group of near-identical modules, located a
group of modules performing a function independently of the other modules, and identified
an anomalous module whose files are linked with most other modules.
The SeeNet approach is demonstrated by applying it to the CICNet packet-switched data
network and an email communications network [Becker 1995].
Anecdotal evidence of the application of the SeeNet3D approach to data such as the
NSFNET/ANSnet backbone (50 countries) and MBone Internet traffic is presented by Cox et al.
[Cox 1996].
2.2.18.3 Comparison
Unlike the other tools described in this section, these approaches focus on visualising a
particular type of data (i.e. source code, hierarchical data, network data, or slices), rather than
visualising the data for a particular purpose (e.g. to show object interactions, or to gain a
general understanding of the software) or at a predefined abstraction level (e.g. methods,
classes).
2.2.18.4 Assessment
These techniques would be particularly useful for tasks involving the visualisation of source
code, network data, or slices.
2.2.19 sv3D and Imsovision (levels 2-3)
2.2.19.1 Description
sv3D uses a three-dimensional representation to visualise software structure [Marcus 2003a].
Source files are represented as coloured cylinders, where height represents nesting and
colour represents control structure.
Imsovision uses a VR style representation to display classes and their relationships, along
with metric information [Maletic 2001]. Planes are used to represent classes, spheres
represent attributes, and columns represent functions.
2.2.19.2 Evaluation
Marcus et al. present anecdotal evidence of applying sv3D to a small (4KLOC) C++ system
to demonstrate the approach [Marcus 2003a]. Marcus et al. describe the application of sv3D
to a 56KLOC system (Doxygen) [Marcus 2003b]. They report on how sv3D was used to
identify execution hotspots from profiling information.
Imsovision is applied to a small mail system by Maletic et al. [Maletic 2001]. The purpose of
Imsovision seems to be to provide a general understanding of a system.
2.2.19.3 Comparison
sv3D and Imsovision are unique amongst the tools discussed here in their use of 3D and
virtual reality visualisations respectively. Marcus et al. [Marcus 2003a] comment that
SeeSoft’s use of 2D pixel bars limits the number of attributes that can be represented, and
makes it difficult to represent hierarchical relationships and multiple abstraction levels.
These are issues that sv3D seeks to address.
2.2.19.4 Assessment
sv3D would appear to be useful in analysing dynamic behaviour, while Imsovision’s strength
lies in gaining a general understanding of a software system.
2.2.20 Tool summary
This section has reviewed a selection of software visualisation tools, which illustrate the
concepts described in Section 2.1. Each tool was discussed in the context of the three
characteristic criteria introduced in Section 2.2.1. Early object-oriented software
visualisation tools were concerned primarily with illustrating method-level interactions; such
tools included Program Explorer and Scene. Later tools began to consider the problem of
architectural extraction, and architectural-level visualisations were produced by tools such as
Sefika’s, ISVis, Dali, AVID, and RMTool. The latest tools have attempted to bridge the gap
between microscopic and macroscopic visualisations and provide both low-level and
architectural visualisations, namely Gaudi, Shimba, Collaboration Browser, and SHriMP.
Figure 2.4 annotates the abstraction scale from Figure 2.3 to illustrate the relative levels of
abstraction of these tools. It is clear from this figure that the extant software visualisation
tools address only a single level of abstraction or a limited range of levels.
Tools have also been developed to address specific tasks, such as Jinsight for performance
analysis, Together to support software development, BLOOM, JIVE, and sv3D for dynamic
understanding, and Imsovision for VR exploration. Some tools focus on addressing the
requirements of displaying specific types of data, such as the tools developed by Eick et al.
for visualising source code, hierarchical data, network data, and program slices, and the
Polymetric Views approach for visualising metrics. There is an emerging trend of
retargetable software visualisation tools, which can be used to visualise programs in a variety
of languages, rather than being designed for use with one specific language. Such tools
include Dali, RMTool, Gaudi, and Together. A retargetable design makes the tool more
flexible and should encourage usage and interoperability. Section 3 assesses those tools that
were available in the context of a case study involving both general software comprehension
and specific reverse engineering activities.
Figure 2.4 The positions of tools on the abstraction scale of Figure 2.3
2.3 Abstraction
It is clear from the foregoing discussion that abstraction is a crucial concept in software
visualisation. This section discusses the concept of abstraction and its application in software
engineering and visualisation.
2.3.1 The concept of abstraction
As stated in Section 1.1.5, abstraction is the process of producing a simplified representation
that emphasises the important information while suppressing details that are (currently)
uninteresting, with the goal of reducing complexity and increasing comprehensibility [Berard
1993]. Lee and Fishwick define an abstraction as a “generalized, idealised model of a
system” [Lee 1996]. Abstraction is employed in a wide variety of scientific fields, including
statistics, simulation theory, management science, and software engineering. Two principal
features of abstract models identified by Fishwick are that they are usually less complex and
more comprehensible than the model from which they are derived [Fishwick 1988].
2.3.2 The historical origins of abstraction
Abstraction has provided the foundation that we use for performing mental tasks ever since
human thought began [Kirsanov 1998]. The modern use of abstraction began in the early
twentieth century in a variety of fields [Hooker 1996]. Hooker provides evidence for this
with the examples of abstract art, atonal music, Einstein’s Theory of Relativity [Einstein
1920], and Keynesian economics [Keynes 1936]. In this modern context, abstraction refers
to the view that separate aspects of human experience are independent of each other, and can
hence be reasoned about in isolation.
2.3.3 The application of abstraction
Fishwick [Fishwick 1988] presents abstraction in the context of simulation using the dining
philosophers (DP) problem [Dijkstra 1968]. The models used are a frequency distribution,
finite state automaton, observed data, Petri net [Petri 1962, Peterson 1981], flow graph, and
equations. These models are then presented as an abstraction network, consisting of the
models and abstraction techniques that relate them. For example, a more abstract flow graph
model of the DP system can be derived from the Petri net model using abstraction by
representation. Fishwick describes a number of abstraction techniques, namely: abstraction
by representation, abstraction by induction, abstraction by reduction, total systems
morphism, and partial systems morphism.
In abstraction by representation, an abstract model represents a base model in another form.
Such models are often purely structural and have no behaviour, except as defined by the
more detailed base model. Abstraction by induction involves combining elements from the
base model to form a smaller, more compact representation. Abstraction by reduction is
achieved by deriving a representative summary of the base model. A total systems morphism
(TSM) [Zeigler 1976] is a mapping between all of the elements in the base and abstract
models. A TSM preserves both structure and behaviour. TSMs are well-suited for abstracting
discrete representations (e.g. graphs), but less so for continuous systems. A partial systems
morphism (PSM) is a mapping between some subset of the elements in the base and abstract
models. In contrast to a TSM, not all structure and behaviour is necessarily preserved in a
PSM. Sensory (visual) and cerebral abstraction are also discussed; unlike the previous five
techniques, these do not define any mappings. Sensory abstraction aims to produce a model
that is convincing to an audience, but without the attendant complexity of a mapping
technique, for example, particle systems simulating fire or explosions [Reeves 1983].
Cerebral abstraction relates to the way in which humans reason about models. Other methods
of abstraction include geometric model abstraction, where complex geometric elements are
approximated by simpler ones [Clark 1976, Feiner 1985].
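The morphism-based techniques above can be made concrete with a short sketch. The following is illustrative only, and not taken from Fishwick: a base model and an abstract model are represented as directed graphs, and a candidate morphism is a mapping between their elements that is total (it covers every base element) and structure-preserving (every base relation has a corresponding abstract relation). Dropping the totality condition yields a partial systems morphism.

```python
# Illustrative sketch: a (simplified) dining philosophers base model and a
# coarser abstract model, each as a set of nodes and directed edges.
base_nodes = {"p1", "p2", "fork1", "fork2"}
base_edges = {("p1", "fork1"), ("p2", "fork2"), ("fork1", "p2"), ("fork2", "p1")}

abstract_nodes = {"philosophers", "forks"}
abstract_edges = {("philosophers", "forks"), ("forks", "philosophers")}

# Candidate morphism: philosophers collapse to one abstract node, forks to another.
mapping = {"p1": "philosophers", "p2": "philosophers",
           "fork1": "forks", "fork2": "forks"}

def is_total(mapping, base_nodes):
    """A total systems morphism maps *every* element of the base model."""
    return base_nodes <= set(mapping)

def preserves_structure(mapping, base_edges, abstract_edges):
    """Each mapped base-model edge must correspond to an abstract-model edge."""
    return all((mapping[a], mapping[b]) in abstract_edges
               for a, b in base_edges if a in mapping and b in mapping)

print(is_total(mapping, base_nodes))                             # True
print(preserves_structure(mapping, base_edges, abstract_edges))  # True
```

Removing, say, `fork2` from the mapping would leave `preserves_structure` satisfiable on the remaining elements while `is_total` fails, which is the situation a partial systems morphism describes.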
It is important that abstractions are evaluated in order to determine their utility. Fishman and
Kiviat [Fishman 1967] define three components of evaluation as verification (ensure the
model is consistent and behaves as intended), validation (test the model against the real
system to assess similarities and differences), and analysis (ensure the output data is
correctly interpreted). Fishwick [Fishwick 1988] defines an abstraction method as being
valid by dint of its definition (i.e. if the definition of the method is valid, then the method
itself is valid). An abstract model is considered valid if it can be either validated empirically
or produced from a valid base model using a valid abstraction technique. An example of an
empirical validation of an abstraction model could be the percentage of human observers
who found the model convincing. Fishwick argues that abstraction models should be
formalised whenever possible.
2.3.4 Abstraction in software engineering
Abstraction is employed in software engineering to help manage the complexity of software
systems. For example, a diagram may be used as an abstraction to illustrate the principal
components of a system. A number of different types of abstraction are used in software
engineering. Functional or procedural abstraction allows a package of program functionality
to be considered as a ‘black box’ with a clearly defined interface and its implementation
hidden [Alexandridis 1986, Liskov 1986]. Iteration or action abstraction is used to express
repeated patterns of program behaviour, such as loop constructs [Zimmer 1985, Liskov
1986]. Data abstraction is based on the idea of ‘abstract data types’, which allow data to be
stored and manipulated through a defined interface without concern for how the raw data is
represented [Guttag 1977, Ledgard 1977, Shaw 1984]. Process abstractions are similar to
data abstractions, but include a thread of control [Alexandridis 1986]. In the context of
knowledge-based OO logic programming, Park defines object abstraction as the
combination of knowledge abstraction (models of knowledge base representation and
control), data abstraction, and connection abstraction (models of object hierarchy and
communication) [Park 1991].
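Data abstraction in particular can be illustrated with a minimal sketch (the class and method names here are illustrative, not drawn from the cited works): clients manipulate the data solely through the defined interface, and the underlying representation could be replaced without affecting client code.

```python
class Stack:
    """An abstract data type: clients use only push/pop/peek/is_empty and
    never touch the underlying representation (here a Python list, but it
    could be swapped for a linked structure without changing client code)."""

    def __init__(self):
        self._items = []          # hidden representation

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
print(s.pop())    # 2
print(s.peek())   # 1
```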
2.3.5 Abstraction in software visualisation
Abstraction is crucial in software visualisation to allow the large quantities of information
involved to be comprehended usefully. The study of software visualisation tools described in
Section 2.2 found that the various tasks typically involved in software comprehension and
reverse engineering efforts are best addressed at different levels of abstraction. The work
also showed that most extant software visualisation tools operate at only one or two such
levels (as measured on the five-level abstraction scale proposed). Consequently, it is
currently necessary to utilise several tools in combination in order to address satisfactorily
the full range of software comprehension tasks.
2.4 Effective presentation techniques for software visualisation
It is clear from the foregoing discussion that the technique used to present the results of the
data analysis is a crucial component of the software visualisation process. This section
discusses diagram types for presenting software visualisations and view arrangements to
organise them.
2.4.1 Diagrams for describing software
The goal of software visualisation is to present information about the software system under
investigation to the analyst in a format that is useful in helping them to achieve their
software comprehension tasks. A variety of diagram types for describing software systems
have been proposed in the literature and implemented in CASE (computer-aided software
engineering) and visualisation tools. A selection of these are listed in Table 2.1 and
discussed below.
Table 2.1 A selection of diagrams for describing software
Structured design diagrams: Basic graph; Petri net; Nassi-Shneiderman diagram; Entity relationship diagram; Control flow diagram; Data flow diagram; Data structure diagram; Statechart

Pre-UML OO diagrams: Booch diagram; Message sequence chart

UML diagrams: Class diagram; Object diagram; Sequence diagram; Collaboration diagram; Component diagram; Deployment diagram; Activity diagram; Statechart diagram; Use case diagram

UML extension diagrams: Robustness analysis diagram; Business process diagram

Real time modelling: System context diagram; System architecture diagram; Event sheet diagram

XML modelling: XML structure diagram

Recent SE literature: InfoBUG, timeWheel, 3D-wheel [Chuah 1997]; Execution pattern [De Pauw 1998]; Reflexion model [Murphy 2001]; Story board diagram [Fischer 2000]; SoftArch diagrams [Grundy 2000]; Virtual reality [Knight 2000, Maletic 2001]; Matrix views, cityscapes [Eick 2002]; 3D [Martin 2002, Marcus 2003a]; Use case [Riva 2002]; Visualization in contexts [Yin 2002]; Polymetric views [Bertuli 2003]; DRT [Chan 2003]
2.4.1.1 Structured design diagrams
Before object-oriented techniques became popular in the early 1990s, a number of diagrams
for supporting the traditional structured design process had been proposed. These included
Petri nets [Petri 1962], Nassi-Shneiderman diagrams [Nassi 1973], entity relationship
diagrams [Chen 1977], control flow diagrams [Hatley 1987], data flow diagrams [Pressman
2000, Sec. 12.4.1], data structure diagrams [Pressman 2000, Sec. 13.4.7], and statecharts
[Harel 1990].
2.4.1.2 Object-oriented diagrams
The advent of the object-oriented paradigm produced a new set of diagrams. These included
Booch diagrams [Booch 1994] and message sequence charts (MSCs) [ITU-T 1996]. A
popular set of OO diagrams is that defined by the Unified Modeling Language (UML)
[Rumbaugh 1999, OMG 2003c]. UML version 1.5 defines a set of nine diagrams for
describing various aspects of the analysis, design, and implementation of software, which are
popular during the forward engineering process. These diagrams consist of boxes
representing entities (e.g. classes, objects, components), connected by arcs representing
relationships (e.g. inheritance, communication, dependency). Selonen et al. [Selonen 2001]
discuss transformations between UML diagram types. Burd et al. [Burd 2002] describe an
experiment demonstrating that animation aids understanding of UML sequence diagrams.
UML models are essentially graph-based, and basic graphs (with one type of node and one
type of edge), such as call graphs, can also be used to represent software (e.g. in the Program
Explorer tool [Lange 1995b]). MSCs were a precursor to UML sequence diagrams, while
UML statechart diagrams are derived from Harel’s statecharts.
The UML diagrams described above are implemented in many popular CASE tools. A
number of additional diagrams that are not part of the UML standard, such as robustness
analysis diagrams and business process diagrams, as well as diagrams intended specifically
for modelling real-time systems, such as system context diagrams, system architecture
diagrams, and event sheet diagrams, and for modelling XML (XML structure diagrams), are also available
in some tools.
2.4.1.3 Recent literature
The recent software engineering literature has also proposed a number of diagrams. De Pauw
et al. [De Pauw 1998] describe a variation of the MSC called an execution pattern that
incorporates colour and emphasises time rather than control flow. Murphy et al. [Murphy
2001] present reflexion models for modelling high-level system entities. Fischer et al.
[Fischer 2000] describe story board diagrams (SBDs), which combine aspects from three
UML diagrams. Grundy and Hosking [Grundy 2000] have implemented the SoftArch
environment for architectural visualisations. Knight and Munro [Knight 2000] discuss the
use of virtual reality environments for modelling software. Martin et al. [Martin 2002] use a
three-dimensional environment to illustrate component dynamics. Riva and Rodriguez [Riva
2002] incorporate a basic use case visualisation into their approach. Yin and Keller [Yin
2002] use a UML-based notation in their visualization in contexts technique. Bertuli et al.
[Bertuli 2003] describe polymetric views, which are annotated with measurements collected
from software. Chan et al. [Chan 2003] enhance their visualisations with application
screenshots.
The survey in Section 2.2 revealed that the diagrams used in the extant software visualisation
tools address only a single abstraction level, or a small range. Arranging diagrams in an
interrelated hierarchy encompassing the entire range of abstraction levels would increase
their utility and aid comprehension, as all levels of abstraction could be addressed
conveniently.
2.4.2 Views for software comprehension
Diagrams, such as those described in the previous section, are used to illustrate models of
software. Different views of a software model are possible - these views are implemented
using diagrams. It is proposed in this thesis that there are six possible arrangements of views
onto a software model, illustrated in Figure 2.5, namely: (a) a single view illustrating a single
facet2; (b) multiple independent3 views illustrating a single facet; (c) multiple interdependent
2 A facet in this context is taken to mean an (interesting) property of a software system, such as its structure or behaviour [Jahnke 2002].
3 The views are independent in the sense that there is no coordination between them. Two models of the same system may be implicitly dependent on each other unless they refer to disjoint parts of the system.
views illustrating a single facet; (d) a single view illustrating multiple facets; (e) multiple
independent views illustrating multiple facets; and (f) multiple interdependent views
illustrating multiple facets. This categorisation distinguishes view arrangements by the
number of views (one or multiple), the number of facets (one or multiple), and their
relationship (independent or interdependent). The remainder of this section describes these
arrangements in more detail.
Figure 2.5 The six arrangements of views onto a software model. The rectangles around the views in
parts c and f represent the coordination inherent in such interdependent arrangements
2.4.2.1 A single view illustrating a single facet
This arrangement illustrates a single facet of the software system in one view. A single facet
may not illustrate all of the information necessary for comprehension, but for a specific task
it may be sufficient. A single view provides the analyst with only one perspective of the facet
under investigation, which may restrict exploration. The reference implementation of the
Dali workbench [Kazman 1999] and jRMTool [Murphy 2001] implement this arrangement.
In the case of these tools, either structural or behavioural facets can be visualised.
2.4.2.2 Multiple independent views illustrating a single facet
This arrangement uses multiple views to illustrate a single facet of the software system.
Multiple views give the analyst a number of perspectives of the facet, and may improve the
navigability of the model (cf. Baldonado et al.’s ‘Rule of Diversity’ and ‘Rule of
Complementarity’ [Baldonado 2000]). However, the lack of relationships between the views
can cause the analyst cognitive difficulties in reconciling the multiple views and transferring
information between them (cf. Baldonado et al.’s ‘Rule of Parsimony’ [Baldonado 2000]).
An example of this arrangement would be the use of a number of single view, single facet
tools in combination to visualise a single facet from multiple views. For example, the Dali
and jRMTool tools could be used in combination to provide two independent views of
structural or behavioural information.
2.4.2.3 Multiple interdependent views illustrating a single facet
This arrangement also illustrates a single facet of the software system using multiple views.
In this case, the interdependency between views alleviates many of the cognitive difficulties
inherent in the previous arrangement (cf. Baldonado et al.’s ‘Rule of Self-Evidence’ and
‘Rule of Consistency’ [Baldonado 2000]). Such interdependent arrangements are typically
implemented using a Model-View-Controller architecture [Krasner 1988] to maintain
synchronisation between the views and with the model. Scene [Koskimies 1996],
Architecture-Oriented Visualization [Sefika 1996a], ISVis [Jerding 1997], Sced [Koskimies
1998], Ovation [De Pauw 1998], AVID [Walker 1998], Gaudi [Richner 1999], Jinsight [De
Pauw 2002], and Collaboration Browser [Richner 2002a] implement this arrangement. The
facet in these tools illustrates behavioural information.
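The synchronisation such tools require can be sketched with a minimal Model-View-Controller arrangement (the class and method names are hypothetical, and this is not the architecture of any of the tools cited): when the model changes, every registered view is notified, so interdependent views never drift apart.

```python
class Model:
    """Holds the software model; notifies all registered views on change."""
    def __init__(self):
        self._observers = []
        self.selected = None

    def attach(self, view):
        self._observers.append(view)

    def select(self, element):
        self.selected = element
        for view in self._observers:
            view.update(element)      # keep every view synchronised

class View:
    def __init__(self, name):
        self.name = name
        self.showing = None

    def update(self, element):
        self.showing = element        # redraw this view around the new selection

model = Model()
class_view = View("class diagram")
sequence_view = View("sequence diagram")
model.attach(class_view)
model.attach(sequence_view)

model.select("Order.process()")       # a selection made in one view...
print(class_view.showing)             # ...is reflected in the class view
print(sequence_view.showing)          # ...and in the sequence view
```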
2.4.2.4 A single view illustrating multiple facets
This arrangement presents multiple facets of the software system in a single view. Multiple
facets present more information to the analyst, which may help with comprehension of the
software. However, compressing all the information into a single view can lead to
information overload and reduced comprehensibility (cf. Baldonado et al.’s ‘Rule of
Decomposition’ [Baldonado 2000]). The implementation of story board diagrams described
by Jahnke et al. [Jahnke 2002] is an example of this arrangement. The tool they describe
illustrates structural, behavioural, and data facets.
2.4.2.5 Multiple independent views illustrating multiple facets
This arrangement uses multiple views to illustrate multiple facets of the software system,
with no interaction between the views. While this arrangement combines the benefits of
multiple facets and multiple views, the lack of relationships between the views can cause
difficulties in comprehension as described in Section 2.4.2.2. An example of this
arrangement would be the use of a number of single view, single facet tools in combination
to visualise multiple facets from multiple views. For example, the Dali and jRMTool tools
described above could be used in combination to provide two independent views of
structural and behavioural information.
2.4.2.6 Multiple interdependent views illustrating multiple facets
This arrangement also uses multiple views to illustrate multiple facets, with the addition of
interrelationships between the views. This arrangement has the same advantages as the
previous one, but the interdependency between views aids comprehension as described in
Section 2.4.2.3. This arrangement is employed in Kruchten’s 4+1 View Model [Kruchten
1995] and in work by Hofmeister et al. [Hofmeister 1999b] to illustrate structural,
behavioural, and (in Kruchten’s work) data facets. The Program Explorer [Lange 1995b]
and Shimba [Systä 2001] visualisation tools and the Together CASE tool [Borland 2004a]
implement this arrangement. These tools illustrate both structural and behavioural facets.
It appears from the foregoing discussion that an arrangement of multiple interdependent
views illustrating multiple facets of a software system is the most desirable arrangement of
views for software comprehension. Multiple views give a variety of different perspectives on
various facets of the software, while the interdependency between the views aids cognition.
Such an arrangement would allow software to be described conveniently using a set of
diagrams illustrating relevant information at appropriate levels of abstraction. The use of
multiple views in visualisation is discussed in more detail by Baldonado et al. [Baldonado
2000].
2.5 Effective techniques for exploring and querying visualisations
A crucial factor in the usefulness of a visualisation system is the ease with which the analyst
can interact with the visualisation to obtain the information they require. In this thesis the
two principal types of navigation technique observed in the extant visualisation tools are
classified as exploration and querying.
2.5.1 Exploration
A system employing the exploration technique presents the visualisation to the analyst and
allows them to explore it freely. Although giving the analyst complete freedom to explore
the visualisation, the large volume of information typically generated can make it difficult to
find the cogent information required for the analyst’s tasks. The complexities inherent in the
object-oriented paradigm compound this issue. Tools such as ISVis and Together utilise the
exploration technique.
2.5.2 Querying
A system employing the querying technique allows the analyst to specify queries to be
applied to the visualisation and then view the results. Queries can be specified in a textual or
visual query language, such as SQL [ANSI 1998] or MURAL [Reiss 2002] respectively, or
using a GUI. This approach can help the analyst to focus the visualisation on the information
pertaining to their specific tasks. However, the analyst must know enough about the system
to be able to form useful queries. Tools such as Gaudi, Collaboration Browser, and BLOOM
utilise the querying technique. Gaudi uses a textual query language, Collaboration Browser
uses a GUI, and BLOOM uses a visual query language.
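A textual query over extracted trace data might look like the following sketch (the table layout and data are invented for illustration; none of the cited tools necessarily stores events this way):

```python
import sqlite3

# Hypothetical table of dynamically extracted call events.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (caller TEXT, callee TEXT, count INTEGER)")
db.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("OrderForm", "Order", 12),
    ("Order", "Invoice", 3),
    ("Order", "Logger", 250),
])

# Focus the visualisation on heavily exercised interactions only.
rows = db.execute(
    "SELECT caller, callee FROM calls WHERE count > 100 ORDER BY count DESC"
).fetchall()
print(rows)   # [('Order', 'Logger')]
```

The query narrows the visualisation to the information relevant to the analyst's current task, which is precisely the focusing benefit of the querying technique, but it also demonstrates the prerequisite noted above: the analyst must already suspect that high-frequency calls are of interest.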
2.5.3 Guided navigation
There is a third possibility, not observed in the extant visualisation tools, that in
this thesis is termed guided navigation. A system employing guided navigation would assist
the analyst in achieving their goals by suggesting likely lines of enquiry. A wizard-based
approach may be suitable for this technique.
In practice, some systems employ a combination of the exploration and querying techniques.
Such an arrangement combines the flexibility and scope of exploration with the focussing
power of the querying technique. Guided navigation, possibly using wizards, is an interesting
and complementary alternative.
2.6 Software modelling
This section discusses related work from the field of software modelling. Software modelling
is closely related to software visualisation: the goal of both approaches is to produce a
representation of a software system. In the case of software visualisation, such
representations are visual, whereas in software modelling they may be purely conceptual.
2.6.1 The 4+1 view model
Kruchten describes an architectural model consisting of four views and a set of scenarios for
validating these views [Kruchten 1995]. The logical view describes the object structure of
the system. Representations include Rational/Booch class diagrams or (for data-driven
systems) entity-relationship diagrams (ERDs). The process view describes how the logical
entities of the system are delineated into processing units. Kruchten uses a version of
Booch’s Ada task notation for this view. The development view describes the organisation of
the system’s development into a hierarchy of layers of subsystems. Again, a variation on
Booch’s notation is employed. The physical view describes the deployment of the system
amongst processing nodes. It appears that some form of Booch notation is used for this view.
The views are illustrated by examples. Scenarios are detailed in a similar manner to the
logical view, and are accompanied by a script that describes the interaction. There are
interconnections amongst the views, for example between the logical and process views and
the logical and development views. An iterative, scenario-driven approach is used to develop
and refine architectural specifications using the technique.
2.6.2 Hofmeister et al.
Hofmeister et al. describe an approach to documenting software architecture using four views
consisting of UML diagrams [Hofmeister 1999b]. The views are based on the authors’
experiences with large systems. The conceptual view describes the functionality of the
system. The module view describes the decomposition of the software. The execution view
describes the correspondence between modules and run-time concepts, such as threads. The
code view describes the mapping of logical entities to program files.
The conceptual architecture view consists of components with ports, and connectors with
roles that define how they can connect to ports. A combination of components and ports is
termed a configuration. The conceptual view uses: class diagrams to depict the static
configuration; ROOM (Real-time Object-Oriented Modelling) protocol declarations [Selic
1994, Selic 1998] and sequence or state diagrams to show the correspondence between
protocols and ports; and sequence diagrams to illustrate sequences of interactions among
components.
The module architecture view decomposes subsystems into modules, and assigns them to
layers. There is no configuration in the module view as it describes inherent properties of the
system, rather than a particular instantiation. The module view uses: tables to map elements
between the conceptual and module views; package diagrams to illustrate subsystem
decomposition dependencies, use-dependencies among layers, and the mapping between
modules and layers; and class diagrams to illustrate inter-module use dependencies.
The execution architecture view describes the combination of modules to form a particular
product by assigning them to run-time images. Run-time images are bound to
communication paths to form a configuration. The execution view uses: class diagrams to
illustrate the static configuration; sequence diagrams to illustrate the dynamic behaviour of a
configuration, or transition between configurations; and state or sequence diagrams to
illustrate a communication path’s protocol.
The code architecture view consists of files and directories. As in the module view, there is
no configuration, as the relations described apply to all instantiations of the system. The code
architecture view uses: tables to map between elements in the module and execution views
and the code view; and component diagrams to illustrate the dependencies between source,
object, and executable files.
The views are demonstrated using an example. One concern the authors note is that UML
notation can be susceptible to a number of different interpretations due to the lack of a well-
defined semantics. They also note that the use of UML, which is traditionally used to design
implementation classes, to describe architecture risks blurring the distinction between
architecture and implementation. They found that UML was useful for describing static
structure, variability, and particular sequences of activities. They found it to be lacking in
describing correspondences, protocols, ports on components, dynamics, and general
sequences of activities. Although entities can be contained by other entities, the views are
only at one level of abstraction.
2.6.3 ManSART
ManSART (MITRE Software Architecture Recovery Tool) recovers architectural features
from source code by means of a library of ‘recognizers’ [Chase 1996]. ManSART makes use
of ‘operators’ to extract information from views. These operators manipulate views through
a combination of graph operations and ‘containment analysis’ (determining when an element
of a view overlaps or contains another element). Manipulations can be syntactic, derived, or
transformational. Syntactic manipulations provide an alternative representation of the
information. Derived manipulations create new views based on combinations of components
and connectors in existing views. Transformational manipulations produce a new
architecture from the information.
Yeh et al. describe the six types of view manipulation possible in ManSART [Yeh 1997]. A
cross-view relation combines the components of one view with the relationships from
another view. Merging combines the information from two views into a single view.
Bundling aggregates relations between components. Building a hierarchy allows the user to
create a hierarchy of views where each component in a view is described in terms of the
components it contains from a lower level view. Finding neighbours creates a subset of a
view by including only the selected component and its immediate neighbours. Finding
connected subsets decomposes a view into connected subgraphs, where each subgraph
becomes a component in the new view.
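The 'finding neighbours' manipulation, for example, can be sketched as a simple graph operation (an illustrative reconstruction, not ManSART's implementation; a view is modelled here as a pair of components and relations):

```python
def find_neighbours(view, selected):
    """Return the sub-view containing only the selected component, its
    immediate neighbours, and the relations among that retained subset."""
    components, relations = view
    kept = ({selected}
            | {b for a, b in relations if a == selected}
            | {a for a, b in relations if b == selected})
    kept_relations = {(a, b) for a, b in relations if a in kept and b in kept}
    return kept, kept_relations

# A small hypothetical architectural view.
view = ({"ui", "parser", "db", "logger"},
        {("ui", "parser"), ("parser", "db"), ("db", "logger")})

sub_components, sub_relations = find_neighbours(view, "parser")
print(sorted(sub_components))   # ['db', 'parser', 'ui']
print(sorted(sub_relations))    # [('parser', 'db'), ('ui', 'parser')]
```

Note that the relation ("db", "logger") is dropped because "logger" is not an immediate neighbour of the selected component.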
2.6.4 Zachman framework for enterprise architecture
Zachman’s framework for enterprise architecture consists of thirty views arranged in six
hierarchies [Zachman 1996]. The framework is intended to be a logical structure for
categorising representations of an enterprise that are significant to its management and the
development of its systems. The six hierarchies describe product abstractions for an
enterprise: Data (what it is made of), Function (how it works), Network (where the
components are located), People (who does what work), Time (when things happen), and
Motivation (why choices are made). Each hierarchy consists of five levels which define roles
in the design process: Planner (concerned with system scope, at a contextual level), Owner
(concerned with the business model, at a conceptual level), Designer (concerned with the
system model, at a logical level), Builder (concerned with the technology model, at a
physical level), and Subcontractor (concerned with detailed representation, out of context).
The intersection of a row and a column represents the interaction of a design role with a product
abstraction. For example, the intersection of the Function hierarchy with the Builder role
corresponds to the System Design, which illustrates data elements and sets (the I/O) between
computer functions (the processes).
2.6.5 IEEE Recommended Practice for Architectural Description
IEEE P1471 Draft Recommended Practice for Architectural Description defines a
conceptual framework for describing software architectures [Hilliard 1999]. The central
abstraction of this standard is the concept of an Architectural Description – this is a
collection of artefacts that document the architecture of a System. An Architectural
Description is organised into one or more architectural Views. Each View conforms to a
Viewpoint, which embodies the rules governing the view. A View consists of one or more
architectural Models – the representational scheme (e.g. UML) of the model is not defined
by the standard, and each model may use a different scheme. A System exists in an
Environment and has an Architecture and one or more Stakeholders. Each Stakeholder has
one or more Concerns – aspects of the system that are important to them (e.g. security,
reliability).
2.6.6 Other approaches
Issarny et al. [Issarny 1998] propose four views for software architecture: Functional
showing the operations provided by the system; Interaction showing the protocols used in
interacting with the system; Efficiency showing how efficiency could be improved; and
Dependability showing fault tolerance mechanisms. In combining these views, they
concentrate on the issues of consistency and structure. They list some considerations to be
taken into account to ensure the former. The latter is stated as being ongoing work – they are
developing a tool to produce CORBA architectures from specific views.
The approach of Waters et al. [Waters 1999] does not specify pre-defined views (unlike
Kruchten), but integrates architectural views generated by other tools or from
documentation. This approach allows non-disjoint views to be combined (‘fused’) to check
for commonality, consistency, and to create compositions (present structural information
from multiple views as a single view).
Bergey et al.’s horseshoe model describes the architecture recovery process [Bergey 1999].
Changes to the system can be made at the ‘code structure’, ‘function’, or ‘architectural’
levels. There is no explicit mapping or interaction defined between the levels of the model.
2.7 Evaluation
Evaluation is a crucial component of any project. In order to assess the results of a project,
evaluation criteria are required by which it can be determined what has been achieved.
Without evaluation, it is impossible to state whether the project was successful or not, or to
draw any conclusions from the results. Before embarking on a project the criteria by which it
will be evaluated should be established. This provides a predetermined basis for assessing
the project on completion.
Empirical evaluation is evaluation consisting of experimentation, rather than purely
theoretical analysis [Basili 1996, Perry 2000]. Empirical studies are useful for testing
hypotheses, as they provide experimental data that can be analysed. Empirical studies are
often used in software engineering to perform more realistic evaluations than are possible
with purely theoretical methods.
This section will discuss a number of methods for the evaluation of software visualisation
and software comprehension tools. The techniques discussed are also more widely applicable
in other areas of software engineering and beyond.
2.7.1 Globus and Uselton (1995)
Globus and Uselton discuss the evaluation of scientific visualisation software [Globus 1995].
Although their analysis is concerned with modeling physical systems, such as the field of
computational fluid dynamics, there are important points that are relevant to software
visualisation. They observe that visualisation is becoming widely used, and that for a
visualisation to be useful it is important to be able to evaluate it. However, they also note that
there was a lack of evaluation in the visualisation community at their time of writing.
Firstly, they discuss the need for standardised test suites consisting of test data and a set of
tests of particular visualisation techniques and functions. In the case of software
visualisation, the test data would be provided by a representative sample of systems to which
the visualisation would be applied. The wider the range of the systems, the more generally
applicable and useful the validated visualisation is likely to be.
Secondly, they highlight the importance of the effect of error in visualisation. Errors should
be minimised and it is important to characterise (recognise and, where possible, quantify)
any error in order to provide an accurate visualisation. Although the potential for error is less
in software visualisation than in the visualisation of continuous physical systems, it is
important to recognise the possibility of inaccuracies in the visualisation (e.g. caused by an
inappropriate or misleading abstraction technique).
Finally, Globus and Uselton discuss the evaluation of visualisations using human subjects.
They argue that as the purpose of visualisation is to improve human insight into data,
humans are best suited to evaluate the performance of a visualisation in achieving this.
Although insight cannot be measured directly, task performance can be used as an indicator of
it. They state that it is easier to perform experiments that compare two visualisation
systems than experiments intending to evaluate or characterise a single system. Given the
experimental results, it may be possible to predict performance in related tasks, and to
predict the effect of making changes to the visualisation system. As with all subject-based
empirical evaluation, the choice of subjects is crucial and must be representative of the
intended user base of the visualisation.
2.7.2 Murphy et al. (1996)
Murphy et al. describe an evaluation of five static call graph extractors [Murphy 1996a,
Murphy 1998]. The tools analysed were cflow (a standard Unix tool), CIA [Chen 1990],
Field [Reiss 1995], mkfunctmap [Hoagland 1995], and rigiparse [Müller 1988]. These tools
were chosen as they are readily available, extract calls from C code in textual form, and all
run on the same platform (SunOS on a Sun SPARC). The aim of the evaluation was to
compare both quantitatively and qualitatively the call graphs produced by the tools. Three C
systems were used for the case study: mapmaker (a molecular biology application) [Lincoln
1993], mosaic (a web browser) [NCSA 2003], and gcc (the GNU C compiler) [GNU 2004];
these were intended to represent a variety of application domains.
Call graphs were generated for each of the three applications by each tool. The results were
then compared quantitatively (pairwise) to determine the number of calls detected by both
tools, and by one tool but not the other. Of course, a higher number of calls detected does not
make one tool better than another. The differences in calls detected are attributable to the
analysis algorithms employed by the tools (which are often not elucidated). The results were
also sampled and qualitatively analysed to assess the numbers of false positives (a call
detected where one does not exist) and false negatives (a call not detected where one does
exist) in the results. It was determined that all of the tools generate both false positives and
false negatives. It appears that the study was conducted by the authors.
The use of a number of different application types makes both the experimental procedure
and results more generally applicable than if only a single system, or type of system, had been
considered. The use of objective evaluation criteria (number of calls detected, number of
false positives, etc.) increases the validity of the study.
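The pairwise, quantitative comparison described above amounts to simple set operations over the extracted call edges. The sketch below illustrates the idea; the edge encoding and the sample data are hypothetical, not taken from the Murphy et al. study.

```java
import java.util.HashSet;
import java.util.Set;

public class CallGraphComparison {

    // Calls detected by both tools (set intersection).
    static Set<String> detectedByBoth(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        return both;
    }

    // Calls detected by the first tool but not the second (set difference).
    static Set<String> detectedByFirstOnly(Set<String> a, Set<String> b) {
        Set<String> onlyA = new HashSet<>(a);
        onlyA.removeAll(b);
        return onlyA;
    }

    public static void main(String[] args) {
        // Hypothetical call edges ("caller->callee") extracted by two tools.
        Set<String> toolA = Set.of("main->parse", "parse->lex", "main->emit");
        Set<String> toolB = Set.of("main->parse", "main->emit", "emit->write");

        System.out.println("Both: " + detectedByBoth(toolA, toolB).size());        // 2
        System.out.println("A only: " + detectedByFirstOnly(toolA, toolB).size()); // 1
        System.out.println("B only: " + detectedByFirstOnly(toolB, toolA).size()); // 1
    }
}
```

Note that, as the study found, neither difference set can be labelled as false positives without qualitative inspection: an edge reported by only one tool may be real (the other tool having missed it) or spurious.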
2.7.3 Bellay and Gall (1997)
Bellay and Gall describe an experiment to evaluate the capabilities of four reverse
engineering tools [Bellay 1997, Bellay 1998]. The tools analysed were Refine/C [Reasoning
1994], Imagix 4D [Imagix 2004], Rigi [Müller 1988], and SNiFF+ [Wind River 2003]. The
aim of the case study was to investigate the capabilities of the tools, and identify their
advantages and disadvantages in terms of applicability to embedded software, usability, and
extensibility. An industrial embedded train control system was used for the case study,
containing approximately 150 KLOC.
The assessment criteria were expressed as a checklist, which was formulated based on the
authors’ experience in applying the tools during the case study. The consequence of this is
that the checklist is likely to be biased towards the system and tools analysed in the case
study. The checklist was delineated into four categories: Analysis, Representation,
Editing/Browsing, and General Capabilities. The Analysis category is concerned with the
functionality and performance of the parser. The Representation section is concerned with
the features of the representation used and how quickly it is generated; for textual reports,
sorting is examined, while for graphical reports the view type and editing facilities are
examined. The Editing/Browsing category is concerned with text editor integration and
speed, and other user interface facilities, such as search and history functions. The General
Capabilities section covers support for multiple platforms and users, extensibility, and
storage, output, history, search, and help facilities.
The study found that the different tools are best suited to different usage contexts. The
general result is that tool performance and capabilities depend on the case study and
application domain as well as the purpose of the analysis. It appears that the evaluation was
carried out by (one of) the authors. It is not clear exactly what was done in terms of
analysing the software system in order to exercise the tools’ functionality.
2.7.4 Armstrong and Trudeau (1998)
Armstrong and Trudeau perform an evaluation to compare the functionalities of five
architectural extraction tools [Armstrong 1998]. The tools analysed were Rigi, Dali [Kazman
1999], PBS [Finnigan 1997], CIA, and SNiFF+. The aim of the evaluation was to assess the
extraction, classification, and visualisation features of the tools. Two C systems were used
for the case study: the CLIPS expert system tool [Riley 2003], and a small test program that
was intended to be problematic for the tools to parse.
Similar to the study by Bellay and Gall described in Section 2.7.3, the evaluation criteria
were in the form of a checklist. The criteria were devised based on the authors’ experiences
of using the tools. As in the Bellay and Gall study, this makes it likely that the assessment
criteria will be biased towards the particular tools and systems used for the case study, hence
making both the results and the assessment criteria themselves difficult to generalise.
The Extraction checklist assessed the functionality of the tools’ parsers, such as the
exclusion of library calls, the contents of C structs, and recursion. The Classification
evaluation did not have an explicit checklist associated with it, but was based on using the
tools to generate meaningful abstractions from the extracted data. The Visualization
checklist was concerned with issues such as the types of nodes and edges available, and the
navigation functionality.
Similar to the Bellay and Gall study, it was found that various features from the different
tools are useful but no one tool integrates all of the features that would be desired. Again, it
appears that the evaluation was carried out by (one of) the authors. It appears that general
exploration of the software systems was performed in order to exercise the tools’
functionality, though there is little detail about this.
2.7.5 Storey et al. (1996)
Storey et al. [Storey 1996a] describe the preparation and execution of an empirical study to
assess the usability of two interfaces to the Rigi reverse engineering tool [Tilley 1994,
Wong 1995]. The aim of the study was to compare the interfaces to each other and to
standard Unix command line tools (vi and grep). Three C game programs of similar
complexity but varying size (300-1700 LOC) were used in the evaluation. Storey et al.
evaluated the usability of the user interfaces by observing users completing a set of software
maintenance tasks followed by a questionnaire and an interview. The small tasks involved in
the Storey et al. study were intended to be typical of those performed by software
maintainers working towards a larger goal; a trade-off was necessary between experiment
time and task complexity. The tasks were divided into two groups of four tasks, ‘abstract’
and ‘concrete’, which were concerned with high- and low-level understanding, respectively.
The importance of experimental setup is stressed; a ‘dry run’ was conducted in advance,
which helped to refine the experiment. The subjects were given training in each of the
interfaces beforehand. In addition to the questionnaire and interview, the participants were
observed performing the tasks. Appropriate statistical tests were applied to the results. In
addition to useful results in terms of the relative usabilities of the tool interfaces, Storey et al.
also identify a number of improvements to the experiment, namely the need for a larger user
group, more tasks, longer time, and greater experimental control.
2.7.6 Sim and Storey (2000)
Sim and Storey describe a structured tool demonstration in which several reverse
engineering tools were evaluated using a common software system and set of analysis tasks
[Sim 2000a]. The aim was to provide a fair comparative demonstration of the capabilities of
the tools. They argue that tool evaluations in the literature tend to be ad hoc, tools are rarely
evaluated formally by users, and when they are evaluated it is for only a short time by people
unfamiliar with the tool. Such potential users often assess tools on superficial observations,
such as appearance or feature set, rather than factors such as ease of use or scalability. While
useful, the results of case study evaluations are often difficult to generalise. The structured
demonstration was intended to address some of these shortcomings in evaluation. The three
main contributions were the establishment of an evaluation benchmark for reverse
engineering tools, the combination of usability assessment with benchmarking, and the
development of a package of materials to facilitate future tool evaluations.
The six tools evaluated in the study were Lemma [von Mayrhauser 1999], PBS, Rigi, TkSee
[Singer 1997], VisualAge C++ [IBM 2004a], and Unix command-line utilities. Each tool
was used by a team of expert users, who were monitored by industrial observers in an
attempt to evaluate the usefulness of the tool in industrial software maintenance. The teams
were presented with two reverse engineering tasks (‘Documentation’ and ‘Evaluate the
structure of the application’) and three maintenance tasks (‘Modify the existing command
panel’, ‘Add a new method for specifying arcs’, and ‘Bug fix: loading library objects’) to be
performed on the xfig 3.2.1 utility [Xfig 2003], which consists of 50 KLOC of ANSI C. The
teams were then asked to present the results of their investigation. The documentation
generated was less than expected, and there were some differences of opinion between the
teams. A number of issues regarding the tools themselves were also uncovered.
The evaluation found that different tools are best suited to different tasks, and it would be
useful to combine features from a number of tools. Additionally, it is important to
understand what the tool will be used for and select an appropriate tool accordingly. It is also
important that the cost of introducing the tool can be justified. Certain users may be biased
towards certain types of tool based on past experience. The organisers noted that a pilot
evaluation would have been helpful in refining the experimental design, and is an important
phase in any experimental evaluation. They also state that more inter-tool evaluations would
have been interesting, more explicit instructions may also have been helpful, and more time
(longer than one day) may have been desirable. The networking and teambuilding fostered
by the collaborative demonstration was also noted. A future demonstration is planned based
on parsing tools, an area with which many of the teams had problems.
2.7.7 Sim et al. (2000)
Sim et al. also present a number of observations regarding maintenance tools based on
structured demonstration [Sim 2000b]. The aim was the same as that of the structured
demonstration by Sim and Storey discussed in Section 2.7.6: to compare the capabilities of
the tools. Three of the tools from the Sim and Storey evaluation (Rigi, PBS, and Unix
utilities) were supplemented by two tools from the Workshop on Algebraic and Graph-
Theoretic Approaches in Software Reengineering 2000 (GUPRO [GUPRO 2004] and
Bauhaus [Koschke 2003]). Remarks on the tools and their application covered parsing,
flexibility, quantity of extracted data, experience, and reasons for participation, while
remarks on the demonstration scenario addressed the issues of tool selection, educational
value, fairness, replication, and results scalability. A collaborative reengineering exercise is
planned in which tools will be combined to address tasks.
2.7.8 Storey et al. (2000)
Storey et al. describe a subject-based evaluation of how program understanding tools affect
users’ comprehension strategies [Storey 1997, Storey 2000]. Thirty subjects were observed
carrying out a number of software comprehension tasks using the Rigi, ShriMP [Storey
1996b], and SNiFF+ tools. The goals of the study were to: examine the factors affecting the
subjects’ choice of comprehension strategy; observe whether the tools enhanced the subjects’
preferred comprehension strategy; devise a framework to characterise comprehension tools;
and provide feedback for tool developers. A number of comprehension strategies have been
proposed in the literature, such as: bottom-up [Shneiderman 1980, Pennington 1987], top-
down [Brooks 1983, Soloway 1984], knowledge-based [Letovsky 1986], systematic
[Littman 1986, Soloway 1988], as-needed [Littman 1986, Soloway 1988], and integrated
[von Mayrhauser 1995].
Participants were assigned randomly to one of the three tools, and were asked to complete a
number of comprehension tasks relating to an implementation of the Monopoly game [Brady
1974]. Each two-hour participant session consisted of orientation, training, practice tasks,
formal tasks, a post-study questionnaire, and a post-study interview and debriefing. During
the orientation phase, the outline of the experiment was explained to the subjects. The
training phase was used to familiarise the subjects with the basic functionality of the tool
they were to use. A number of practice tasks were used to allow the subjects to acquaint
themselves with using the tool. The formal tasks were observed and videotaped, with the
subjects encouraged to think aloud as they worked through the tasks. The questionnaire
consisted of questions regarding the tools’ usabilities. Finally, the interview and debriefing
was intended to stimulate further thoughts from the subject that may not have been expressed
during the experiment.
Statistically significant results were obtained regarding the usabilities of the tools and the
extents to which they supported each subject’s comprehension strategy. In general, it was
found that the tools did enhance the subjects’ preferred comprehension strategies while
carrying out the tasks, though there were instances where users were hindered by the tools.
In this study, participants frequently browsed hierarchies of abstraction. Future work is
intended to study fewer, more experienced subjects with a broader task set over a greater
time period.
2.7.9 Bassil and Keller (2001)
Bassil and Keller describe a questionnaire-based evaluation of visualisation tools [Bassil
2001a, Bassil 2001b]. The questionnaire was available on the web and its location was
publicised via mailing lists, newsgroups, and email. A total of 107 responses were received, concerning
more than 40 tools. This wide user base may make the results more generalisable. The aim of
the study was to assess the functional, practical and cognitive aspects of visualisation tools
that users desire, and how these compare to the functionality available in the various tools.
The questionnaire was designed around a list of properties of software visualisation tools,
extracted from existing taxonomies. This should result in an objective questionnaire. The
questionnaire consisted of two sections: the first for all software visualisation tool users, and
the second for expert users. In the first section, participants were asked about their work
context, the software systems they visualise, functional and practical aspects of software
visualisation tools, and the tool they use. The functional aspects were assessed in terms of a
list of 34 functional properties, such as source code browsing, graph visualisation, zooming,
and program slicing. The second section asked technical questions about the software
visualisation tool used by the participants. Practical aspects were investigated in terms of a
list of 13 aspects, such as tool cost, availability of technical support, ease of use, and
portability. Most questions were closed, though there were fields to allow expanded answers
for some questions.
Statistical analyses were performed on the survey results to reveal trends in the responses.
Although the small number of participants compared to the number of tools makes results for
individual tools insignificant, some statistically significant correlations were identified. The
most interesting correlations in terms of the work presented here were as follows. There was
a positive correlation between software system size and desire to visualise it graphically, and
a negative correlation between size and a desire to jump straight to the source code. There
was also a higher correlation between source code visualisation and procedural systems than
object-oriented systems. Lastly, there were high correlations between analysing object-
oriented software and the desire for hierarchical representations, and between OO software
and the ability to navigate across hierarchies.
Areas for improvement were identified as finer-grained choices, more open questions, and
different surveys for different visualisation tool types. Future work is to include additional
statistical analyses (e.g. factor and cluster analyses), more targeted surveys, and industrial
integration.
2.7.10 Hatch et al. (2001)
Hatch et al. describe the strengths and weaknesses of four strategies for software
visualisation evaluation [Hatch 2001]. Guidelines and frameworks can be useful during the
initial formulation of a visualisation, but can also be used during evaluation if appropriate.
Care must be taken to avoid ‘self-measurement’ when a visualisation is constructed
according to a set of guidelines then evaluated against those same guidelines. Feature-based
evaluation frameworks are useful for assessing the features of a visualisation against a set of
questions. Care must be taken to select appropriate questions and question types. Also,
current frameworks often omit potential negative features of the visualisation. Scenarios and
walkthroughs allow a visualisation to be evaluated according to specific tasks, though it is
easy to show the visualisation in its best light while hiding undesirable features, and the
evaluation is influenced by the user and their biases. User and empirical studies can be a
valuable source of evidence for evaluation. However, these entail overheads such as
selecting and training subjects and analysing the results, and are also subject to user bias. It
may be possible to perform statistical analyses on the results, though care must be taken
when attempting to generalise any findings.
2.7.11 Knight (2001)
Knight discusses briefly some considerations to be taken into account when deciding
whether or not a visualisation is effective [Knight 2001]. This is expressed in the form of an
equation: effectiveness = suitability for task(s) + suitability of representation, metaphor, and
mapping based on the underlying data. It is important to take into account influences from
the domain for which the visualisation was designed, and also the dataset that it was intended
to visualise.
2.7.12 Kollmann et al. (2002)
Kollmann et al. evaluate four static UML-based reverse engineering tools [Kollmann 2002a].
The tools evaluated were Together [Borland 2004b], Rational Rose [IBM 2004b], Idea
[Kollmann 2002b], and Fujaba [Wikman 1998]. The aim of the case study was to compare
the class diagram generation facilities of the tools. The Java-based Mathaino legacy user
interface migration tool was the subject of the case study [Kapoor 2001].
The tools were assessed quantitatively by examining various properties of the class diagrams
they produced from the program code, such as the number of classes, types of associations,
multiplicities, and role names. The results were compared by performing model operations
using the BMO Toolkit [Koskinen 2001]. While basic diagram generation results were
broadly similar across the tool set, the research tools were able to handle more advanced
diagram concepts than the industrial tools, such as multiplicities, inverse associations, and
container resolution. It appears that the investigation was carried out by the authors, though
there is little information on the experimental procedure. The use of the BMO Toolkit to
compare the quantitative results should result in an objective comparison of the tools’
capabilities, provided the chosen measures provide an accurate reflection of the tools’
capabilities.
2.7.13 Conclusions
This chapter has provided a foundation for the thesis by discussing software visualisation
techniques; comparing the extant software visualisation tools; exploring the concept of
abstraction; presenting diagrams for visualisations, views to organise them, and techniques
for exploring and querying visualisations; reviewing related work from the field of software
modelling; and surveying evaluation techniques from the fields of software comprehension
and visualisation.
The basic software visualisation techniques presented demonstrate the requirement for
appropriate extraction, analysis, and presentation mechanisms for software visualisations. An
abstraction scale and assessment criteria were presented in order to compare the extant
software visualisation tools. The principal conclusion from this comparison was that the
current tools address only a single level of abstraction, or a very small range of levels, and
thus several of the existing tools would have to be employed in order to address the full
range of software comprehension tasks. This conclusion led on to a discussion of the concept
of abstraction and its application in software engineering and visualisation. A range of
diagram types observed in the extant tools and recent literature were then discussed, and it
was concluded that arranging such diagrams in an interrelated abstraction hierarchy would
increase their utility and aid comprehension, in line with the conclusions of the tool
comparison. This conclusion led on to an analysis of view arrangements for organising
visualisations; it was concluded that the most effective arrangement was that of multiple
interdependent views illustrating multiple facets. Techniques for exploring and querying
visualisations were discussed briefly; again, some of these techniques were observed in the
comparison of the extant visualisation tools. A discussion of relevant work from the related
field of software modelling was then presented; this discussion provided a useful perspective
from the point of view of the underlying model, in contrast to the externally observed
visualisation.
A variety of evaluation techniques from software comprehension and visualisation studies
were discussed. This survey consisted of both qualitative and quantitative studies, ranging
from small-scale studies conducted by the authors, through medium-scale studies consisting
of around 10-30 participants, to large-scale studies of over 100 participants. The evaluation
techniques included standard tests, checklists, specific tasks, interviews, observation,
questionnaires, and scenario walkthroughs. Murphy et al.’s use of a broad base of application
types makes both their experimental procedure and results more generalisable [Murphy
1996a, Murphy 1998]. Their use of objective evaluation criteria increases the accuracy of the
study. The use of a specific checklist, for example by Bellay and Gall [Bellay 1997, Bellay
1998] and Armstrong and Trudeau [Armstrong 1998], reduces the generalisability of their
results. Storey et al. [Storey 1996a, Storey 1997, Storey 2000], Sim and Storey [Sim 2000a],
and Sim et al. [Sim 2000b] analyse performance in typical software comprehension tasks as
a basis for tool evaluation, and advocate the use of multiple subjects for studies. The use of a
questionnaire, for example by Bassil and Keller, allows a broader base of participants [Bassil
2001a, Bassil 2001b]. The use of an automated technique to compare quantitative study
results by Kollmann et al. should improve the accuracy and objectivity of the analysis
[Kollmann 2002a].
The survey by Bassil and Keller reveals that as the size of software grows, analysts are more
likely to employ graphical visualisations, and also that analysts are less likely to go straight
to the source code [Bassil 2001a, Bassil 2001b]. This survey also shows that analysts are less
likely to examine source code visualisations for object-oriented software, which implies that
visualisations at a higher level of abstraction than source code are most useful in the context
of OO systems. The studies by Storey et al. [Storey 1997, Storey 2000] and Bassil and Keller
both describe users navigating hierarchies of abstraction.
This chapter has presented related work and compared the extant software visualisation
tools. In order to assess the capabilities of these tools comprehensively an empirical study is
required that will allow us to compare the real-world performance of these tools objectively.
Such a study will also highlight areas of improvement and potential for future work.
3 Initial Study
“Descriptions of visualisation systems rarely specify any particular task that they are intended to support.”
M. Petre, A. F. Blackwell, T. R. G. Green [Petre 1997]
3.1 Introduction
This section describes a study to evaluate a selection of the extant software visualisation
tools. The motivation for this work was the lack of use of software visualisation tools in
industry despite their apparent potential. Therefore, the tools were evaluated by assessing
their performance in a variety of software visualisation tasks. This is the approach taken in
the studies by Storey et al. [Storey 1996a, Storey 1997, Storey 2000], Sim and Storey [Sim
2000a], and Sim et al. [Sim 2000b] described in Section 2.7. The tasks take the form of
questions that an analyst would find it useful to be able to ask about a software system. The
questions are divided into two sets: general software comprehension questions consider the
entire system, and are typical of those that would be asked in a general software
comprehension effort; specific reverse engineering questions address only a part of the
system, and are typical of those asked while carrying out a specific reverse engineering task.
The goal of this study is to assess and compare the capabilities of the extant software
visualisation tools to determine where there is scope for improvement and hence future
research. The study was carried out by a single analyst who attempted to use the tools to
address the questions. Further detail is presented in the lab book in Appendix A.
The JHotDraw semantic drawing editor framework [Gamma 1998] was chosen for this case
study as it is a reasonably complex, real-life application framework typical of the type of
system that would be subject to software comprehension and reverse engineering efforts.
JHotDraw is also widely used as a case study in the literature.
The tools evaluated were Together diagrams, Jinsight, jRMTool, AVID, and Together
debugger. These tools were chosen as implementations capable of analysing Java programs
were available.4 All tasks were carried out on a minimally loaded AMD Athlon XP 2100+
machine with 512MB RAM running Windows 2000 Professional.
3.2 Generic questions
These generic questions can be reused for the evaluation of any type of software
comprehension tool in the context of any specific system. The general software
comprehension questions are immediately reusable, while the specific reverse engineering
questions can be instantiated within the context of the system being used for the evaluation.
3.2.1 General software comprehension questions
The following questions are intended to be typical of those asked during the course of a
software comprehension effort. Questions G1-G6 are inspired by the six ‘overall
understanding’ questions of Systä et al. [Systä 2001, p.378]. Questions G7 and G8 address
issues that are particularly relevant to framework reuse, while G9 is an important software
comprehension issue.
G1 What is the static structure of the software system?
G2 What interactions occur between objects at runtime?
G3 What is the high-level structure/architecture of the software system?
G4 How do the high-level components of the software system interact?
G5 What patterns of repeated behaviour occur at runtime?
G6 What is the load on each component of the software system at runtime?
G7 What design patterns are present in the software system's implementation?
G8 Where in the software system are the hotspots where additional functionality can be
added?
G9 What impact will a change made to the software system have on the rest of the software
system?
4 Although Dali is retargetable, a tool to provide dynamic information from Java programs in the format required by Dali was not available.
3.2.2 Specific reverse engineering questions
The following questions are intended to be typical of those asked during the course of a
specific reverse engineering effort. Questions S1, S2, and S6 are inspired by the ‘goal-driven
reverse engineering’ and ‘object/method behaviour’ questions of Systä et al. [Systä 2001,
p.378]. Questions S3, S4, and S5 address issues typically encountered in framework
comprehension [Kirk 2001] and are typical maintenance activities.
S1 What are the collaborations between the objects involved in an interaction?
S2 What is the control structure in an interaction?
S3 How can a problem solution be mapped onto the functionality provided by the software
system?
S4 Where is the functionality required to implement a solution located in the software
system?
S5 What alternative functionalities are available in the software system to implement a
solution?
S6 How does the state of an object change during an interaction?
3.3 Specific reverse engineering questions specified for JHotDraw
The system used for this case study was an orrery simulation consisting of 133 classes
constructed as a sample solution to a final year undergraduate software architecture
assignment at the University of Strathclyde, Glasgow. A JHotDraw drawing editor consists
of a drawing containing figures and connections between them, and a set of tools for creating
and manipulating the drawing elements. The orrery application is shown in Figure 3.1. The
source code for the application was available in the Orbit and CH.ifa.draw.*
packages. Javadoc documentation was available for the JHotDraw classes, but not for the
orrery extension. The coursework assignment worksheets provided some background to the
application functionality. The following questions instantiate the specific reverse engineering
questions for the JHotDraw domain.
Figure 3.1 The orrery application. The circles represent astronomical bodies, such as planets and
moons, coloured according to their diameter. A blue border around a planet represents atmosphere.
The satellite icons represent satellites. The directed arcs indicate gravitational attraction. The toolbar
on the left is used to select diagram objects, and to create planets, satellites (orbiting and non-
orbiting), atmosphere, and gravity.
J1 A common problem in JHotDraw applications is the display not being updated as desired
when a change is made to the model. For example, attempting to move a planet (represented
by an object of type Figure) in an orrery application may not be reflected in the display.
To understand this problem, it is necessary to investigate the redraw mechanism of
JHotDraw. The redraw mechanism is an interaction consisting of a sequence of object
collaborations.
(Answer: The correct sequence of method calls is Figure.willChange(),
Figure.invalidate(), Figure.changed(), then Figure.invalidate().)
J2 When a Figure object is moved or has its dimensions changed, there may be erratic
changes both to this Figure and to other Figure objects to which it is connected (by
ConnectionFigure objects). For example, an orrery application may represent three
planets (as Figure objects) A, B, and C, and gravity between them (as
ConnectionFigure objects), such that A is connected to B, B to C, and C to A. If the
Figure objects are connected such that moving one planet also moves those planets
connected to it, then moving A would cause C to move, which would in turn cause B to
move, which would then cause A to move, resulting in an infinite loop of Figure
movements. To understand this problem, it is necessary to investigate the way in which
JHotDraw deals with cyclic constraints such as this. Interactions in JHotDraw can use an
implicit or explicit control structure; the control structure used is important in solving
problems such as this.
(Answer: It is JHotDraw's use of an implicit invocation mechanism to enforce constraints
that causes this problem. This can be circumvented by the use of explicit invocation, as in
the Pert Chart example application provided with JHotDraw.)
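The infinite loop described above, and the effect of breaking it with an explicit termination condition, can be sketched as follows. ConnectedFigure and its guard flag are hypothetical; JHotDraw's actual implicit invocation mechanism differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cyclic-constraint problem from the answer above.
// ConnectedFigure and its 'moving' guard are hypothetical; JHotDraw's real
// implicit invocation mechanism differs in detail.
public class CyclicConstraints {

    public static class ConnectedFigure {
        final List<ConnectedFigure> connected = new ArrayList<>();
        boolean moving;   // guard flag: breaks the A -> B -> C -> A cycle
        int moves;

        // Implicit invocation: moving a figure also moves its neighbours.
        void moveBy(int dx, int dy) {
            if (moving) {
                return;   // without this guard the cycle never terminates
            }
            moving = true;
            moves++;
            for (ConnectedFigure f : connected) {
                f.moveBy(dx, dy);
            }
            moving = false;
        }
    }

    // Wires up the cyclic orrery from the text (A -> B -> C -> A) and
    // returns the total number of moves after moving A once.
    public static int totalMoves() {
        ConnectedFigure a = new ConnectedFigure();
        ConnectedFigure b = new ConnectedFigure();
        ConnectedFigure c = new ConnectedFigure();
        a.connected.add(b);
        b.connected.add(c);
        c.connected.add(a);
        a.moveBy(1, 0);
        return a.moves + b.moves + c.moves;
    }
}
```

Explicit invocation, as in the Pert Chart example, makes this termination condition visible in the application code rather than hiding it in the framework.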
J3 JHotDraw applications often require collision detection, so that action can be taken when
two figures 'collide' (i.e. overlap on the diagram). For example, in an orrery application, it
may be desirable to detect when a Figure representing an asteroid crosses the connection
(i.e. overlaps the ConnectionFigure) between two Figures representing a planet and
a satellite orbiting that planet respectively. To understand this problem, it is necessary to
investigate the mechanism by which JHotDraw determines the locations of Figures in a
drawing. Collision detection is not provided natively in JHotDraw; therefore, the solution to
the collision detection problem must be mapped onto the functionality available in
JHotDraw.
(Answer: JHotDraw uses the concept of a ‘display box’ to define the location of a Figure.
This can cause problems by constraining all Figures to be rectangles.)
J4 Question J3 describes how collision detection can be implemented in JHotDraw by testing
when two Figures’ display boxes overlap. For example, if the display box of a Figure
representing a planet in an orrery application overlaps with that of a Figure representing
an asteroid, then a collision would have occurred. In order to implement this solution, it is
necessary to investigate how a Figure’s display box can be obtained, and how display
boxes can be tested to determine whether they overlap. In order to implement the solution to
the collision detection issue described in Question J3, it is necessary to identify the location
of the required functionality in JHotDraw.
(Answer: Figure.displayBox() returns a Figure’s display box as an object of type
java.awt.Rectangle. The Rectangle.intersects(Rectangle) method can
then be used to test if two rectangles intersect.)
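A minimal sketch of this solution, assuming only that each figure's bounds are available as the java.awt.Rectangle returned by Figure.displayBox() (the collides helper itself is hypothetical, not part of JHotDraw):

```java
import java.awt.Rectangle;

// Sketch of the collision test from the answer above. The 'collides' helper
// is hypothetical; the Rectangle arguments stand in for the values returned
// by Figure.displayBox().
public class CollisionCheck {

    // Two figures collide when their display boxes overlap.
    public static boolean collides(Rectangle a, Rectangle b) {
        return a.intersects(b);
    }
}
```

For the orrery example, collides would be called with the display boxes of the asteroid's Figure and the ConnectionFigure joining the planet and its satellite.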
J5 When Figures in a diagram are moved or resized, they may also be resized or moved
unexpectedly. For example, when moving a planet (represented by a Figure) in an orrery
application the planet may appear larger or smaller than expected, or when resizing a planet
its position may change. To understand this problem, it is necessary to investigate the way in
which Figures are moved and resized in JHotDraw. JHotDraw provides a number of ways
of altering the position and/or dimensions of a Figure, and it is necessary to select the
appropriate functionality.
(Answer: Figure.displayBox(java.awt.Point, java.awt.Point) and
Figure.displayBox(Rectangle) allow both the position and dimensions of a
Figure to be changed in one operation. Figure.moveBy(int, int) can be used to
move a Figure without changing its dimensions.)
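The distinction between the two operations can be sketched as follows; MiniFigure is a hypothetical stand-in for a JHotDraw Figure that stores only its display box, and only the method signatures mirror those named above.

```java
import java.awt.Point;
import java.awt.Rectangle;

// Sketch contrasting the two ways of altering a figure's geometry.
// MiniFigure is hypothetical; only the method signatures mirror JHotDraw's.
public class MoveResize {

    public static class MiniFigure {
        public Rectangle box = new Rectangle(0, 0, 20, 20);

        // displayBox(Point, Point): sets position and dimensions in one step.
        public void displayBox(Point origin, Point corner) {
            box = new Rectangle(origin.x, origin.y,
                                corner.x - origin.x, corner.y - origin.y);
        }

        // moveBy(int, int): translates the figure; dimensions are unchanged.
        public void moveBy(int dx, int dy) {
            box.translate(dx, dy);
        }
    }
}
```

Selecting displayBox where moveBy is intended (or vice versa) produces exactly the unexpected resizing or repositioning described above.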
J6 When debugging a JHotDraw application, it may be important to examine the internal
state of objects in the diagram. For example, in an orrery application, a Figure object
representing a planet would contain a reference to the mass of the planet it represents. In
order to extract such information, it is necessary to investigate the way in which an object’s
state changes during the course of an execution.
3.4 Together diagrams
Together was successful in producing a model of the static structure of the system in the
form of a class diagram. Its statically derived interaction diagrams could be used to give an
approximation of the runtime behaviour of a single method. There is no functionality for
identifying high-level structural components or interactions, save for what can be determined
by the analyst from the class and interaction diagrams. Behavioural and design patterns are
not automatically identified. The lack of runtime information makes it impossible to measure
the load on system components. There is no way to identify hotspots automatically. Some
idea of change impact analysis can be obtained using the ‘Search for Usages’ function,
which identifies all code locations where an attribute, method, class, interface, or package is
used.
Together coped well with the specific reverse engineering questions J1-J5: it was able to
answer the questions on object collaboration, control structure, mapping, and functionality
identification. However, Together’s lack of dynamically-extracted information prevents it
from observing changes to the state of an object at runtime.
The strengths of Together were seen as:
• the comprehensiveness of its diagrams due to their generation from source code; and
• its ‘Search for Usages’ functionality.
Together’s principal weaknesses are attributable to its lack of dynamically-extracted
information:
• while the diagrams are broad in scope they lack depth;
• it is impossible to focus the diagrams on a particular part of the system’s execution;
• it is difficult to know which are the ‘interesting’ methods for which the analyst
should create sequence diagrams;
• sequence diagram generation can be time-consuming;
• references to (methods of) interfaces and abstract classes cannot be resolved to
objects, as the implementing/extending class cannot be determined statically;
• references to subtypes cannot always be fully resolved, as it is not possible to
determine statically whether an object is an instance of the supertype or of one of its
subtypes. For example, a reference in a statically derived sequence diagram may be
to an object of type Figure, which could resolve to an object of any subtype of
Figure (e.g. EllipseFigure) at runtime; and
• the inability to examine internal object state.
3.5 Jinsight
Jinsight was not able to give information on the static structure or high-level architecture of
the system. It provides an array of diagrams for examining dynamic behaviour, but cannot
display behavioural information for high-level components. The execution pattern view was
used to identify patterns of repeated behaviour. The execution view and object histogram can
be used to identify high-activity classes and methods. Jinsight does not support the
identification of design patterns or hotspots for extension. The method histogram and
invocation browser can be used in conjunction with the execution view to identify where
methods are used, which would be useful for change impact analysis.
Jinsight was able to answer questions on object collaboration and control structure. The size
of the diagrams made it difficult to identify how a solution could be mapped onto the
framework. The lack of a static view hindered the identification of framework functionality.
Jinsight does not support analysis of objects’ internal state.
The strengths of Jinsight were considered to be:
• a variety of dynamic views;
• accuracy of its diagrams due to dynamically-extracted information; and
• automatic behavioural pattern identification.
The weaknesses of Jinsight were seen as:
• difficulty in focussing the visualisation due to the size of the diagrams;
• lack of a static representation of the software system;
• lack of generality in its diagrams resulting from a lack of statically-extracted
information; and
• the inability to examine internal object state.
3.6 Reflexion models
Reflexion models are at too high a level of abstraction to show basic static structure or object
interactions. The architecture and high-level interactions were clearly shown in the reflexion
model. Only a very general, aggregated impression of patterns of repeated behaviour and
runtime load were evident in the reflexion model. The identification of design patterns and
extension hotspots were both below the level of abstraction provided by the reflexion model.
Change impact can be investigated by altering the input high-level model or the mapping
from source to high-level entities.
Reflexion models are at too high a level of abstraction to illustrate object collaborations,
control flow, alternative functionalities, or object state. They would be useful for mapping
problems at a higher level of abstraction.
The strengths of the reflexion model technique are:
• it illustrates the software system architecture;
• it illustrates the high-level interactions in the system; and
• it enables the analyst to validate their model of the system.
The weaknesses of reflexion models were felt to be:
• the reflexion model technique relies on the analyst to provide an adequately accurate
high-level model as input; and
• reflexion models are at too high a level of abstraction for them to answer specific
reverse engineering questions, such as those relating to object interactions or internal
state.
3.7 Together debugger
Although static information is not shown, dynamic information can be output by setting
breakpoints at ‘interesting’ methods or classes. High-level structural and behavioural
information is above the low level of abstraction provided by the debugger. There is no
functionality to detect repeated patterns of execution, or to show runtime component load.
Questions relating to design patterns and extension hotspots are at too high a level of
abstraction to be answered using a debugger. Basic change impact analysis can be performed
by comparing the output from executions before and after the change.
If breakpoints can be accurately placed at ‘interesting’ methods, questions about object
collaborations and control structure can be answered straightforwardly. The lack of a view of
the whole system makes mapping problems and identifying functionality difficult. The
dynamically extracted nature of the information means that alternative functionalities are not
always apparent, and the lack of full method signatures makes method identification
confusing. The debugger was able to query internal object state conveniently.
The strengths of the Together debugger are as follows:
• the low level of abstraction would be useful for finding code-level errors;
• dynamically extracted information gives precise output;
• integration with source code makes setting and monitoring breakpoints and watches
more convenient;
• diagram animation during debugging assists comprehension; and
• the ability to examine internal object state.
The weaknesses of the debugger were found to be:
• the low level of abstraction makes it impossible to answer many higher-level
questions, such as those relating to the system architecture;
• lack of statically extracted information means only a subset of possible behaviour is
shown;
• unlike some other debuggers, such as jdb [Sun 2002], the Together debugger
requires source code, which may not always be available, particularly for legacy
systems;
• it can be very time-consuming to set each breakpoint manually. To obtain
information comparable to that provided by a tracing tool, method breakpoints
would be required at every method in the system; and
• it is often difficult to know where to set breakpoints. Setting breakpoints for every
method would result in information overload.
3.8 Case study summary
A summary comparison of the five software visualisation tools evaluated in the case study is
given in Table 3.1 and Table 3.2. Table 3.1 evaluates the performance of each tool on the
question set; Table 3.2 assesses the performance of the tools on each question. These
comparisons assess each tool’s performance in each task simply as yes/no: if a tool
performed a task sufficiently well, it received a ‘yes’, otherwise a ‘no’.
Table 3.1 Tools summary comparison

Tool: Together diagrams. Extraction technique (Section 2.1.6): Static. Analysis technique (Section 2.1.7): Abstraction. Presentation technique (Section 2.1.8): UML diagrams. Abstraction level (Section 2.2.1.2): 2-3. GSC performance: 3/9 {G1, G2, G9}. SRE performance: 5/6 {J1, J2, J3, J4, J5}. Overall performance: 8/15 (53%).

Tool: Jinsight. Extraction technique: Dynamic (profiler). Analysis technique: Pattern recognition, abstraction, suspension5. Presentation technique: MSC-based. Abstraction level: 2-3. GSC performance: 4/9 {G2, G5, G6, G9}. SRE performance: 4/6 {J1, J2, J3, J5}. Overall performance: 8/15 (53%).

Tool: jRMTool. Extraction technique: Static. Analysis technique: Abstraction. Presentation technique: Graph-based. Abstraction level: 4. GSC performance: 3/9 {G3, G4, G9}. SRE performance: 0/6 {}. Overall performance: 3/15 (20%).

Tool: AVID. Extraction technique: Dynamic (profiler). Analysis technique: Abstraction, suspension. Presentation technique: Graph-based. Abstraction level: 4. GSC performance: 3/9 {G3, G4, G9}. SRE performance: 0/6 {}. Overall performance: 3/15 (20%).

Tool: Together debugger. Extraction technique: Dynamic (debugger). Analysis technique: Selective instrumentation, suspension. Presentation technique: Textual. Abstraction level: 1. GSC performance: 1/9 {G2}. SRE performance: 3/6 {J1, J2, J6}. Overall performance: 4/15 (27%).

5 ‘Suspension’ refers to the ability to suspend and resume tracing.
Table 3.2 Questions summary comparison
General software comprehension Specific reverse engineering
Question Success (/5) Question Success (/5)
G1 1 J1 3
G2 3 J2 3
G3 2 J3 2
G4 2 J4 1
G5 1 J5 2
G6 1 J6 1
G7 0
G8 0
G9 4
TOTAL 14/45 (31%) TOTAL 12/30 (40%)
OVERALL 26/75 (35%)
It is clear from Table 3.1 that Together diagrams and Jinsight were able to answer the most
questions (53%), whereas jRMTool and AVID could answer the fewest (20%). Comparing
tools of similar abstraction levels that use different extraction techniques indicates that the
choice of statically or dynamically extracted information does not affect significantly the
number of questions the tool can answer. This was surprising, although a larger case study
involving more tools would be required before any strong conclusions could be drawn from
this result. Table 3.1 also shows that the reflexion model technique is unsuitable for specific
reverse engineering questions whether statically or dynamically extracted information is
used. It would be interesting to assess in this way the performance of a tool that combines
both types of information, such as Shimba (see Section 2.2.10) [Systä 2001]. With an
abstraction level of 2-4, Shimba addresses a wider range of abstraction levels than any of the
tools in this case study. This range of abstraction levels, combined with the inclusion of both
statically and dynamically extracted information, should allow Shimba to perform well in
both the general software comprehension and specific reverse engineering questions. Shimba
would be expected to be useful in answering a higher proportion of questions than the tools
considered in this case study. Unfortunately, Shimba was not available for evaluation.
Table 3.1 reveals that an abstraction level of around 2-3 is optimal in terms of answering the
most questions. Moving away from this point, for specific reverse engineering questions, the
tools become less effective as their abstraction levels move towards the higher (macroscopic)
end of the scale, while for general software comprehension questions the opposite is true. As
expected, tools that employ abstraction as an analysis technique were able to answer more
general software comprehension questions than the tool that did not (Together debugger).
However, increasing the level of abstraction still further resulted in worse performance in
specific reverse engineering questions than if no abstraction were used. A larger case study
involving more tools is required before further conclusions can be drawn regarding the
effectiveness of the presentation techniques, analysis techniques (other than abstraction), or
dynamic extraction techniques.
Table 3.1 shows that tools employing solely behavioural information (Jinsight and Together
debugger) were not capable of answering questions relating to the structure of the software
system. This implies that a combination of structural and behavioural information is required
to address all tasks.
Table 3.2 shows that the tools were more successful in answering the specific reverse
engineering questions: 40% compared to 31%. It also shows that, on average, a tool could
answer only 35% of the questions. This suggests that a single software comprehension tool
may not be adequate for all tasks. Kazman and Carrière [Kazman 1999] posit that this is the
case for architectural extraction, and Richner and Ducasse [Richner 2002a] say this with
regard to design recovery. However, it may also suggest that tools require a combination of
both statically and dynamically extracted information to perform well in all tasks.
No tools were able to answer either of the general software comprehension questions G7
(What design patterns are present in the software system's implementation?) and G8 (Where
in the software system are the hotspots where additional functionality can be added?). Keller
et al. [Keller 1999] describe the role of the SPOOL environment in assisting an analyst in
locating three design patterns (Template Method [Gamma 1995 pp.325-330], Factory
Method [Gamma 1995 pp.107-116], and Bridge [Gamma 1995, pp.151-161]) in C++ code.
Tonella and Antoniol [Tonella 1999] describe a technique based on concept analysis [Siff
1997] and illustrate its use in identifying instances of the Adapter pattern [Gamma 1995,
pp.139-150]. The work by both Keller et al. and Tonella and Antoniol stresses the role of the
human analyst in identifying design patterns. Demeyer [Demeyer 1998] discusses hotspot
identification in Smalltalk HotDraw; the technique employed identified a large number of
false positives. Schauer et al. [Schauer 1999] describe the use of the SPOOL environment in
identifying hotspots in C++ code, and emphasise the importance of the human analyst.
However, Codenie et al. [Codenie 1997] contend that building applications by extending
framework hotspots is too simplistic an approach for real-world problems. These papers
reveal that detecting design patterns and hotspots is a non-trivial task, and one that can
benefit from tool support.
As discussed in Section 2.1.3, the complex interactions typical of object-oriented software
systems mean that dynamic analysis is often more appropriate than static analysis for
software comprehension tasks. However, dynamic analysis captures only a subset of the
possible behaviour of the program. The diagrams produced by dynamic analysis are narrow
and deep, while those produced by static analysis are wide and shallow. Figure 3.2 illustrates
this comparison. The fchild object is an attribute of initial. In practice, fchild is an
instance of one of three Tool subclasses: HandleTracker, DragTracker, or
SelectAreaTracker. This statically extracted diagram shows the three possible
outcomes of the user clicking on some part of a diagram: on either a handle, a figure, or a
blank space, respectively. The diagram cannot show messages that occur after message 1.8
(the call to mouseDown (e, x, y):void), as these depend on the type of fchild,
which is known only at runtime and hence cannot be determined statically. The diagram is
wide but shallow: it shows all of the possibilities. A dynamically extracted diagram would
show one possibility in more detail: it would be narrow but deep.
There are three possibilities for combining the benefits of both static and dynamic
information to produce a suitably wide and deep visualisation. Firstly, the analyst can ensure
that a representative trace is extracted. This raises the questions of how the analyst can
ensure that the trace is representative, and how he knows that it is representative enough for
the task at hand. Most dynamic analysis tools implicitly require the analyst to perform this
function. Secondly, a tool can combine multiple event traces into a single visualisation; this
approach is used in Dali and RMTool. Thirdly, statically and dynamically extracted
information can be combined, as in Shimba. The key problem of ensuring a representative
trace is inherent in dynamic visualisation, even when one of the latter two techniques is
employed.
Figure 3.2 The sequence diagram drawn by Together for the
CH.ifa.draw.standard.SelectionTool.mouseDown() method, illustrating the wide and
shallow diagrams produced by static analysis
It is clear from the case study results in Table 3.2 that no one software visualisation tool
answers all questions that are typical of a software comprehension or reverse engineering
effort. Some tasks are less well supported than others, and some tasks are beyond the
capabilities of all the tools. This implies that current software visualisation tools are not
adequate in isolation for supporting software comprehension, and must be employed along
with other software comprehension tools if all typical issues are to be addressed. The above
results also reveal that the application of software visualisation tools in combination can
improve comprehension performance. Tools employing higher levels of abstraction were
more successful in addressing general software comprehension questions, while those using
a lower level of abstraction were more useful for specific reverse engineering questions;
tools employing an abstraction level of 2-3 were most generally effective. The results
suggest that a combination of structural and behavioural information may be required to
address all comprehension tasks effectively. The results also suggest that a combination of
statically and dynamically extracted information may improve performance. The
visualisations generated from statically extracted data are more general but less precise than
those obtained from dynamically extracted data: statically extracted visualisations are wide
but shallow, while dynamically extracted visualisations are narrow but deep. The lack of a
single software visualisation tool that performs well in all tasks is likely a large contributory
factor in the lack of use of software visualisation tools outwith the context of research.
Analysts are evidently using alternative types of tool to obtain the information they require
for software comprehension.
3.9 Conclusions
The principal conclusion from this initial study is that the abstraction level of a tool is crucial
in determining which questions it can answer. It is clear that a range of abstraction levels
would be required to address the full range of comprehension questions from the study. It is
also observed that the use of statically and dynamically extracted information, and structural
and behavioural perspectives, allows different questions to be addressed. If all of the tools
were used in combination it should be possible to address almost all of the tasks. Therefore,
as the questions in this initial study were typical software comprehension tasks, a tool that
combines the desirable properties of these individual tools would be expected to perform
well in real-world software comprehension tasks.
4 A Novel Software Visualisation Model
“Model-based simulation is like a gem: it is multifacetted”
T I Ören [Ören 1984]
4.1 Background
The initial study to assess the capabilities of software visualisation tools found that no single
tool examined was capable of satisfying more than slightly over half of the typical software
comprehension and reverse engineering tasks set. However, if all five of the tools in the
study were used in combination, it should be possible to address 13 out of the 15 tasks.6
However, such an arrangement of multiple independent views would cause the analyst
cognitive difficulties in reconciling the multiple views and transferring information between
them, as described in Section 2.4.2. It is clear from these results that a tool combining the
desirable properties of the individual tools in the previous study would perform well in these
representative software comprehension tasks. It would therefore be reasonable to expect that
such a tool would be useful in real world software comprehension.
4.2 Research hypothesis
It is proposed that a model that supports visualisation of software through a range of
abstraction levels that incorporate structural and behavioural views and integrates statically
and dynamically extracted information will provide effective support for the full range of
software comprehension tasks.
6 The remaining two tasks involved automatic framework hotspot and design pattern detection. Though they are amenable to visualisation, these are non-trivial tasks that require a high level of analyst interaction.
4.3 A visualisation model for object-oriented software
In order to combine the benefits of these alternative approaches to extraction, analysis,
presentation, and abstraction, this thesis proposes a multifaceted, three-dimensional
abstraction model for software visualisation. Similar to the abstraction scale proposed in
Section 2.2.1.2, the first dimension of the model consists of a number of abstraction levels
from microscopic to macroscopic. This arrangement allows the analyst to explore the
software system at the level(s) of abstraction appropriate to the comprehension task they are
undertaking. The second dimension of the model consists of a number of facets [Jahnke
2002], each representing some property of the system. The use of interrelated facets allows
the analyst to examine a property of the software system individually or in combination,
allowing them to focus the visualisation on the information appropriate to their query.
The model shown in Table 4.1 is proposed. The principal challenges associated with this
model are the way in which information extracted from the software system will be
represented, how view hierarchies will be generated from this information, and the definition
of inter- and intra-hierarchy relationships between views. It will also be important to identify
which views are useful for a variety of comprehension tasks.
Five levels of abstraction are chosen to represent OO systems; the program code can be
considered to be at level 0 as it is the least abstract representation of the software. Structure,
behaviour, and data have been selected as the three facets, as these are the principal elements
of typical OO systems. Classes, packages, and files provide structural abstractions;
procedures, functions, and – in OO systems – methods and interfaces provide behavioural
abstractions; and abstract data types provide data abstractions. (Jahnke et al. [Jahnke 2002]
also use these three facets.) Each abstraction level of each facet is a view and consists of a
name, a description, a set of entities and relationships, and example diagram types that can
be used to illustrate information from the facet at the specific level of abstraction. It is
intended that the analyst will be able to move conveniently between these views during the
course of their investigation in order to examine the information relevant to their task. The
views selected are intended to represent the information that an analyst would find useful
during software comprehension.
Diagrams (with the exception of storyboard diagrams7) that can illustrate information in
more than one facet appear at the same level of abstraction in each facet of the model,
though this is not a requirement. Each facet need not have the same number of abstraction
levels. There are no diagrams at level 5 or 1 of the structure facet. This is because the system
structure is not relevant at a business level (only behaviour and the data it operates on are
specified), and the internal structure of classes is not relevant or visible outside the class.
There are no diagrams at level 4 of the behaviour facet. This is because the behaviour
distribution is dictated by, and therefore encapsulated in, the structure distribution. There are
no diagrams at level 3 of the data facet. Abstract data types are typically described using
textual descriptions, algorithm pseudocode, and specific pictorial representations.
Robustness analysis diagrams bridge between levels 5 and 2: they relate business entities to
classes. System context diagrams bridge between levels 5 and 3: they relate business entities
to components.
Table 4.1 The proposed visualisation model for object-oriented software

Level 5 (macroscopic)
Structure: Business structure. The structure defined by the high-level business goals of the system. Entities: {}. Relationships: {}. Diagrams: {}.
Behaviour: Business behaviour. The behaviour defined by the high-level business goals of the system. Entities: {BusinessEntity}. Relationships: {BusinessRule}. Diagrams: {Use case diagram, business process diagram, robustness analysis diagram}.
Data: Business data. The data defined by the high-level business goals of the system. Entities: {BusinessEntity}. Relationships: {DataDependency}. Diagrams: {Entity relationship diagram, XML structure diagram, robustness analysis diagram}.

Level 4
Structure: System structure deployment. The structural deployment of the system. Entities: {Component, Machine}. Relationships: {Dependency, Containment}. Diagrams: {Deployment diagram}.
Behaviour: System behaviour distribution. The behavioural distribution of the system. Entities: {}. Relationships: {}. Diagrams: {}.
Data: Data distribution. The distribution of the system’s data. Entities: {DataObject, Machine}. Relationships: {Dependency, Containment}. Diagrams: {Deployment diagram}.

Level 3
Structure: System architecture. The structural relationships between the system’s high-level components. Entities: {Component}. Relationships: {Dependency}. Diagrams: {Component diagram, system context diagram, system architecture diagram, reflexion model, story board diagram}.
Behaviour: Component interaction. The behavioural relationships between the system’s high-level components. Entities: {Component}. Relationships: {Usage}. Diagrams: {System context diagram, system architecture diagram, reflexion model, [Martin 2002]}.
Data: Abstract data types. The abstract data types used to encapsulate the system’s data. Entities: {}. Relationships: {}. Diagrams: {}.

Level 2
Structure: Inter-class structure. The structural relationships between the system’s classes. Entities: {Class}. Relationships: {Inheritance, Implementation, Aggregation, Composition}. Diagrams: {Class diagram, object diagram, basic graph}.
Behaviour: Inter-object interaction. The behavioural relationships between the system’s objects. Entities: {Object}. Relationships: {Invocation}. Diagrams: {Class diagram, object diagram, sequence diagram, collaboration diagram, event sheet diagram, message sequence chart, execution pattern}.
Data: Physical implementation. The classes used in the physical implementation of the system’s data structures. Entities: {Class}. Relationships: {Inheritance, Implementation, Aggregation, Composition}. Diagrams: {Class diagram, object diagram}.

Level 1 (microscopic)
Structure: Intra-object structure. The internal structure of the system’s objects. Entities: {}. Relationships: {}. Diagrams: {}.
Behaviour: Intra-object interaction. The internal behaviour of the system’s objects. Entities: {State}. Relationships: {Action}. Diagrams: {Statechart diagram, activity diagram, story board diagram}.
Data: Primitives. The primitive data objects used in the system. Entities: {Primitive}. Relationships: {Operator}. Diagrams: {Statechart diagram, activity diagram, story board diagram}.

Level 0
Structure, Behaviour, Data: Program code.

7 This is because SBDs illustrate data and behaviour at a low level, but are contained within a package. The SBD implementation considered in this report is that described by Jahnke et al. [Jahnke 2002].
The third dimension of the abstraction model consists of static and/or dynamic analyses of
the software. As discussed in Section 2.1, static analyses have broad coverage but less detail,
while dynamic analyses are more focussed and more detailed. In this three-dimensional
model, the width of static analysis can be combined with the depth of dynamic analysis,
without their attendant disadvantages. This is achieved by the combination of analyses.
Combining a single static analysis with several dynamic analyses results in a visualisation
that is both detailed and broad in its coverage. A combination of multiple dynamic analyses
(without static analysis) could also be used to achieve this to an extent. The principal
challenge with respect to this aspect of the model is the way in which statically and
dynamically extracted information is combined and presented.
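As a concrete illustration, the three dimensions of the model can be represented as a simple data structure. This is a sketch only, under the assumption that a view is identified by its facet, abstraction level, and originating analysis; the class and method names are hypothetical and are not part of the tool described later in this thesis.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: VisualisationModel, View, and combined() are
// hypothetical names, not taken from the thesis's tool implementation.
public class VisualisationModel {

    public enum Facet { STRUCTURE, BEHAVIOUR, DATA }
    public enum Analysis { STATIC, DYNAMIC }

    // A view is one facet of the system at one abstraction level (1-5),
    // produced by one analysis.
    public static class View {
        public final Facet facet;
        public final int level;
        public final Analysis analysis;

        public View(Facet facet, int level, Analysis analysis) {
            this.facet = facet;
            this.level = level;
            this.analysis = analysis;
        }
    }

    private final List<View> views = new ArrayList<>();

    public void add(View v) { views.add(v); }

    // Selecting every view of one facet at one level gathers the broad
    // static view together with the deep dynamic views: the combination
    // of analyses described in the text.
    public List<View> combined(Facet f, int level) {
        List<View> result = new ArrayList<>();
        for (View v : views) {
            if (v.facet == f && v.level == level) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Combining one static analysis with several dynamic analyses of the same facet and level then yields a visualisation that is both broad and detailed, as argued above.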
The multifaceted, three-dimensional abstraction model is illustrated in Figure 4.1. This
model is an example of the ‘multiple interdependent views illustrating multiple facets’
arrangement described in Section 2.4.2.6. The model will be refined in the following chapter.
The interrelationships between facets and analyses are not shown explicitly; these will be
defined as part of this formalisation process.
[Figure: a three-dimensional grid of views indexed by facet (Structure, Behaviour, Data), abstraction level (1–5), and analysis (Static, Dynamic 1, Dynamic 2).]
Figure 4.1 A multifaceted, three-dimensional abstraction model for software visualisation
4.4 Examples
Examples of the structure, behaviour, and data abstraction hierarchies are given in Figures
4.2 – 4.4 respectively. It would be possible to synthesise a fourth facet that combines the
three existing facets and represents the integration of structure, behaviour and data in a single
hierarchy. However, as discussed in Section 2.4.2.4, this can lead to information overload
and reduced comprehensibility. It would instead be more useful to allow the analyst to define
their own views by combining information from the three existing facets. These figures serve
only as examples of the type of information represented by each level and each facet; there is
no abstraction relationship between the diagrams at the various levels of these example
figures. Each of the diagrams shown is only one example of the possible diagram types that
could be used to illustrate the information from each view. For example, activity diagrams
are given as an example diagram for level 2 structural information in Figure 4.2; these may
also be used at higher levels of abstraction, e.g. to refine use cases, to describe data
processing within an information system or organisation, or to specify an algorithm. The
empty levels in each figure correspond to the as yet undefined levels of the model.
[Figure: the structure abstraction hierarchy illustrated at levels 0–4. Level 0: a code sample for class MyFigure, which extends AbstractFigure and implements Cloneable. Level 2: a class diagram relating MyFigure, AbstractFigure, and Cloneable. Level 3: a component diagram of the packages CH.ifa.draw.standard, com.michael.myapp, and com.michael.tracer. Level 4: a deployment diagram of the nodes debugger:PC and debuggee:Sun communicating via <<JDWP>> com.sun.jdi.]
Figure 4.2 An example of the structure abstraction hierarchy
[Figure: the behaviour abstraction hierarchy illustrated at levels 0–5. Level 0: a code sample for myMethod(int a, int b). Level 1: an activity diagram of its control flow, branching on [a==b]. Level 2: a sequence diagram of objectA, objectB, and objectC exchanging doSomething(int), ObjectC(), and doMore(char). Level 3: a reflexion model relating DrawingView, Drawing, Figure, and Tool. Level 5: a use case diagram with use cases ‘Start new project’, ‘Find developer’, ‘Assign developer to project’, and ‘Assign task to developer’ linked by <<uses>> relationships.]
Figure 4.3 An example of the behaviour abstraction hierarchy
[Figure: the data abstraction hierarchy illustrated at levels 0–5. Level 0: a code sample for class ResearchStudent, which extends Person. Level 1: a statechart diagram of its Registered, Researching, and Writing Up states, with the submitThesis() transition. Level 2: a class diagram relating Person, Name, ResearchStudent, Staff, and java.util.Hashtable. Level 4: a deployment diagram of a PC client, uniServer:Sun, staffServer:Vax, and resStudentServer:Vax communicating via <<JDBC>> and <<TCP/IP>>. Level 5: an entity relationship diagram relating ResStudent, Staff, and Department via the SUPERVISION, WORKS_IN, and RESEARCHES_IN relationships.]
Figure 4.4 An example of the data abstraction hierarchy
4.5 Key research challenges
There are a number of key research challenges associated with this proposed solution. One
such challenge is the way in which the visualisation information will be stored as a model,
and how this will be used to generate view hierarchies. Another challenge is the definition of
the inter- and intra-hierarchy relationships between the views. Abstraction techniques
applicable to software visualisation will also be investigated. Identifying which views are
appropriate and useful for which comprehension tasks is a further challenge. The way in
which statically and dynamically extracted information is combined and presented will also
require investigation. The following chapters will address these research challenges.
5 Refining the Initial Model
“Modelling, in its computerized form, increasingly will take its place as the key knowledge
component in all forms of decision making in modern life.”
B P Zeigler [Zeigler 1984]
This chapter addresses the challenge identified in Section 4.5 of identifying which views are
appropriate and useful for which comprehension tasks. This is achieved by first validating
and refining the typical comprehension task sets used in the initial evaluation, then
theoretically evaluating and refining the proposed model based on the refined task sets.
5.1 Evaluation based on representative tasks
Chapter 3 evaluated visualisation tools by assessing their performance in typical software
comprehension tasks. The first part of this chapter: describes what the basis
for such a task set should be in terms of what information is useful for the comprehension of
OO systems; reviews the sets of tasks used in that evaluation and evaluates their
appropriateness and usefulness in evaluating a model for software comprehension; and
presents a revised evaluation task set, with accompanying justification. It is intended that this
revised task set will be used to evaluate the multifaceted, three-dimensional model proposed
in Chapter 4.
5.2 The basis for typical software comprehension tasks
A set of typical software comprehension tasks should seek to encapsulate the principal
activities typically performed during real world software comprehension. Software
comprehension activities can be divided into those performed during general software
comprehension, where the intention is to gain an overall understanding of (a subset of) a
system, and those performed during a specific reverse engineering effort, where the intention
is to carry out a specific task (e.g. fix a bug). Some activities may involve examining the
structure of the software system, its behaviour, or both. Analysis at various levels of
abstraction is often required. Depending on the activity, statically or dynamically extracted
information, or a combination of both, may be desirable.
A number of typical software comprehension tasks are suggested in the literature. Storey et
al. used two sets of tasks in their study of interfaces to the Rigi tool described in Section
2.7.5 [Storey 1996a]. The ‘abstract’ tasks, which were high-level comprehension activities
that involved understanding the overall structure or design of the software, were:
1. Show familiarity with the game [that the system simulates]
2. Summarise what subsystem x does
3. Describe the purpose of artefact x
4. On a scale of 1-5, how well was the program designed?
The ‘concrete’ tasks, which were low-level comprehension activities that involved
understanding only part of the software, were:
1. Find all artefacts on which artefact x directly or indirectly depends
2. Find all artefacts that directly or indirectly depend on artefact x
3. Find an artefact that is not used
4. Find an artefact that is heavily used
Sim and Storey used two sets of tasks in their structured tool demonstration described in
Section 2.7.6 [Sim 2000a]. The tasks were intended to be representative of those encountered
by a software developer in their everyday work. The reverse engineering tasks were:
1. Provide a textual and/or graphical summary of how the [system’s] source code is
organised
2. Was [the system] well designed initially?
3. Do you think the original design is still intact?
4. How difficult will [the system] be to maintain and modify?
5. Are there some modules that are unnecessarily complex?
6. Are there any GOTOs? If so, how many? What changes would need to be made to
remove them?
The maintenance tasks were:
1. Modify the existing command panel
2. Add a new method for specifying arcs
3. Bug fix: loading library objects
Storey et al. used a set of tasks in their evaluation of the comprehension strategies supported
by the Rigi, SHriMP, and SNiFF+ tools, described in Section 2.7.8, that were intended to be
typical of what a maintenance programmer would be asked to do [Storey 1997, Storey 2000].
These were:
1. Look at the real Monopoly game until you understand the general concept and rules
of the game. Have you played Monopoly before?
2. Spend a while browsing the program using the provided software maintenance tool
and try to gain a high level understanding of the structure of the program.
3. In the computer game, how many players can play at any one time?
4. Does the program support a ‘computer’ mode where the computer will play against
one opponent?
5. There should be a limited total number of hotels and houses; how is this limit
implemented and where is it used? If this functionality is not currently implemented,
would it be difficult to add? What changes would this enhancement require?
6. Where and what needs to be changed in the code to implement a new rule which
states that a player in jail (and not just visiting) cannot collect rent from anyone
landing on his/her properties?
7. Overall, what was your impression of the structure of the program? Do you think it
was well written?
In their description of the Shimba reverse engineering tool, Systä et al. suggest three sets of
tasks supported by the tool [Systä 2001]. The ‘overall understanding’ tasks were:
1. What are the static software artefacts and how are they related?
2. How are the software artefacts used at run-time?
3. What is the high-level structure of a subject system?
4. How do the high-level components interact with each other?
5. Does the run-time behaviour contain regular behavioural patterns that are repeated?
If so, what are the patterns and under which circumstances do they occur?
6. How heavily has each component of a subject system been used at run-time and
which components have not been used at all?
The ‘goal-driven reverse engineering tasks’ were:
1. How does a certain component behave and how is it related to the rest of the system?
2. When was an exception thrown or when did an error occur? What happened before
that and in which order?
3. How is the component that causes exceptional behaviour constructed?
The ‘object/method behaviour’ tasks were:
1. What is the dynamic control flow and the overall behaviour of an object or a
method?
2. How can a certain state of an object be reached (i.e. which execution paths lead from
the initial state to this state) and how does the execution continue (i.e. which
execution paths lead from this state to the final state)?
3. To which messages has an object responded at a certain state during its lifetime?
4. Which methods of the object have been called during execution?
Kirk et al. conducted a questionnaire survey of students reusing a framework [Kirk 2001].
The questions asked how difficult the students found understanding the following aspects of
the framework:
1. Understanding individual classes and their methods
2. Using abstract classes and interfaces
3. Mapping your solution to framework code
4. Understanding the structure of inheritance hierarchies and object compositions
5. Understanding design patterns
6. Understanding the dynamic structure of the framework
7. Choosing from alternative framework solution strategies
8. Understanding the [framework’s] problem domain
The study found that the key issues were:
3. Mapping your solution to framework code
6. Understanding the dynamic structure of the framework
7. Choosing from alternative framework solution strategies
From a review of these tasks, the principal software comprehension activities can be defined
as follows.
A1. Investigating the functionality of (a part of) the system
A2. Adding to or changing the system’s functionality
A3. Investigating the internal structure of an artefact
A4. Investigating dependencies between artefacts
A5. Investigating runtime interactions in the system
A6. Investigating how much an artefact is used
A7. Investigating patterns in the system’s execution
A8. Assessing the quality of the system’s design
A9. Understanding the domain of the system
A set of typical software comprehension tasks should address all of these activities.
5.3 Task set analysis
A definitive set of typical software comprehension tasks does not appear to exist in the
literature. Therefore, in Chapter 3, two sets of tasks that were intended to be representative
of those performed in a typical software comprehension effort were compiled. The tasks
were divided into those typical of general software comprehension tasks, usually carried out
when attempting to understand a large part of the system, and those typical of specific
reverse engineering tasks, usually carried out on smaller parts of the system for a specific
purpose.
The classification of these tasks into general software comprehension tasks and specific
reverse engineering tasks delineates the tasks into those that are likely to be most
conveniently solved using higher and lower levels of abstraction, respectively, which
constitutes the first dimension of the model proposed. The tasks can also be classified by
whether they are concerned with the system’s structure, behaviour, or both – the second
dimension of the proposed model.
• Structural
o G1 What is the static structure of the software system?
o G3 What is the high-level structure/architecture of the software system?
• Behavioural
o G2 What interactions occur between objects at runtime?
o G4 How do the high-level components of the software system interact?
o G5 What patterns of repeated behaviour occur at runtime?
o S1 What are the collaborations between the objects involved in an
interaction?
o S2 What is the control structure in an interaction?
o S3 How can a problem solution be mapped onto the functionality provided
by the software system?
o S5 What alternative functionalities are available in the software system to
implement a solution?
o S6 How does the state of an object change during an interaction?
• Both
o G6 What is the load on each component of the software system at runtime?
o G7 What design patterns are present in the software system's
implementation?
o G8 Where in the software system are the hotspots where additional
functionality can be added?
o G9 What impact will a change made to the software system have on the rest
of the software system?
o S4 Where is the functionality required to implement a solution located in the
software system?
All of the tasks can be analysed using either statically or dynamically extracted information,
except G5 ‘What patterns of repeated behaviour occur at runtime?’ and G6 ‘What is the load
on each component of the software system at runtime?’ (these tasks cannot be answered
using statically extracted information). The third dimension of the proposed model integrates
statically and dynamically extracted information.
5.4 New task sets
None of the tools in the original study were able to answer either question G7 ‘What design
patterns are present in the software system's implementation?’ or G8 ‘Where in the software
system are the hotspots where additional functionality can be added?’. These tasks are most
applicable to frameworks and may not have been anticipated by the tool developers. Work
by Keller et al. [Keller 1999] and others on identifying design patterns, and by Schauer et al.
[Schauer 1999] and others on identifying hotspots, stress the role of the human analyst and
reveal that detecting design patterns and hotspots is a non-trivial task that can benefit from
tool support. It is for these reasons that these tasks are excluded from the revised task set.
Tasks G3 ‘What is the high-level structure/architecture of the software system?’ and G4
‘How do the high-level components of the software system interact?’ are more abstract
versions of G1 ‘What is the static structure of the software system?’ and G2 ‘What
interactions occur between objects at runtime?’, respectively. However, this similarity is
desirable to allow higher levels of abstraction to be evaluated. To clarify this distinction, the
word “class” is added to G1.
The word “static” is removed from G1, as it is not meant to imply that we are concerned
solely with the structure as defined by statically extracted information. For the same reason,
the phrase “at runtime” is removed from G2. Information on both a system’s structure and
behaviour can be extracted both statically and dynamically.
These task sets address all of the issues relating to the questions from previous studies
identified in Section 5.2 and constitute typical software comprehension tasks that can be
used to realistically evaluate the usefulness and effectiveness of software comprehension
models and tools for real-world software comprehension.
5.4.1 General software comprehension tasks
G1. What is the class structure of the software system?
G2. What interactions occur between objects?
G3. What is the high-level structure/architecture of the software system?
G4. How do the high-level components of the software system interact?
G5. What patterns of repeated behaviour occur at runtime?
G6. What is the load on each component of the software system at runtime?
G7. What impact will a change made to the software system have on the rest of the
software system?
5.4.2 Specific reverse engineering tasks
S1. What are the collaborations between the objects involved in an interaction?
S2. What is the control structure in an interaction?
S3. How can a problem solution be mapped onto the functionality provided by the
software system?
S4. Where is the functionality required to implement a solution located in the software
system?
S5. What alternative functionalities are available in the software system to implement a
solution?
S6. How does the state of an object change during an interaction?
5.5 Justification
The above task set is intended to exercise all of the features of the proposed model. It has a
selection of tasks requiring structural, behavioural, data, and combined information, various
levels of abstraction, and statically and dynamically extracted information. The tasks are
intended to be representative of typical software comprehension tasks, and are based on
software comprehension activities as described in Section 5.2. Therefore, an evaluation of
the proposed model using this task set should provide an accurate assessment of its utility
and effectiveness in supporting software visualisation for program comprehension.
Table 5.1 illustrates the principal correspondences between the typical software
comprehension activities identified in Section 5.2 and the revised evaluation tasks from
Sections 5.4.1 and 5.4.2. This table illustrates that the revised evaluation tasks address all of
the typical software comprehension activities. The number of tasks that address each
activity varies as not all activities are at the same level of granularity. These tasks are
proposed as a complete set of typical comprehension tasks, representative of the full range of
comprehension activities, and encompassing all those found in the related literature.
Table 5.1 The correspondence between typical software comprehension activities and the revised task
sets
Activity Tasks
A1 G1, G2, S1
A2 G7, S3, S4, S5
A3 G1
A4 G1, G3
A5 G2, G4, S1, S2, S6
A6 G2, G6
A7 G5
A8 G3, G4, G7
A9 G3, G4
5.6 Task set revision summary
The first part of this chapter has discussed the basis for a set of typical OO software
comprehension tasks. It also reviewed the set of tasks used in the initial study described in
Chapter 3 and evaluated their appropriateness and usefulness in evaluating a model for
software comprehension. On the basis of this analysis, a new evaluation task set, with
accompanying justification, was presented. In the remainder of this chapter, this task set will
be used to evaluate the proposed software visualisation model.
5.7 Theoretical evaluation of the proposed model
Chapter 4 proposed a multi-faceted, three-dimensional model for software comprehension,
which is designed to address the comprehension shortcomings in current software
visualisation tools identified in the study described in Chapter 3. In the remainder of this
chapter, the evaluation technique described in this chapter will be applied to this model
theoretically, in order to determine which aspect(s) of the model are most useful in
improving the effectiveness of software visualisation for comprehension and hence most
promising for future research. The refined model will also be analysed to assess its support
for software comprehension strategies.
5.7.1 Model information required to address typical software comprehension tasks
The model will be evaluated theoretically by comparing the information required by each
task against the information provided by each aspect of the model. For example, in order to
answer question G6 ‘What is the load on each component of the software system at
runtime?’, both structural and behavioural information are required concerning classes,
components, and distribution (levels 2-4), and only dynamically extracted information would
be useful. As another example, to answer question S6 ‘How does the state of an object
change during an interaction?’, behavioural information at the intra-object level (level 1) is
required, and both statically and dynamically extracted information would be useful.
Tables 5.2 and 5.3 illustrate the information required from each dimension of the proposed
model to address each of the typical software comprehension tasks from Sections 5.4.1 and
5.4.2 respectively.
Table 5.2 Information required from each dimension of the proposed model to address the general
software comprehension tasks
Task Abstraction levels Facets Static/dynamic
G1 2 Structure Both
G2 2 Behaviour Both
G3 3-4 Structure Both
G4 3-5 Behaviour Both
G5 1-5 Behaviour Dynamic
G6 2-4 Structure, Behaviour Dynamic
G7 1-5 Structure, Behaviour, Data Both
Table 5.3 Information required from each dimension of the proposed model to address the specific
reverse engineering tasks
Task Abstraction levels Facets Static/dynamic
S1 2 Behaviour Both
S2 1-2 Behaviour Both
S3 1-3 Behaviour Both
S4 2-3 Structure, Behaviour Both
S5 2-3 Behaviour Both
S6 1 Behaviour Both
Firstly, these tables show that a variety of abstraction levels are required to address the
typical software comprehension tasks.
Secondly, it is clear from these tables that the Data facet is rarely used, and when it is used it
is in conjunction with the other facets. This is because in the object-oriented paradigm, a
system’s implemented data structures are encapsulated in the system structure. Hence, the
information that may have made a data facet appropriate in a procedural system is available
in the structure facet for object-oriented systems. Higher-level data structures may also be
present, but not apparent in the structure facet due to physical implementation details: a
logical data structure may be implemented as a number of smaller physical data structures.
For example, a hash map may be implemented as an array of Vectors in Java. The higher
levels (3-5) of the data facet would illustrate these higher-level data structures if the required
information on such structures were available.
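To make this point concrete, the sketch below (hypothetical code, not drawn from the thesis's case studies) implements a logical hash map as an array of Vector buckets; a structure-facet view of the physical classes would show only an array and some Vectors, not the logical map they realise.

```java
import java.util.Vector;

// A logical hash map realised physically as an array of Vector buckets.
// The physical structure (array + Vectors) hides the logical data structure.
public class BucketMap {
    private final Vector<Object[]>[] buckets; // each element is a {key, value} pair

    @SuppressWarnings("unchecked")
    public BucketMap(int capacity) {
        buckets = new Vector[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new Vector<>();
    }

    private int index(Object key) {
        return Math.abs(key.hashCode() % buckets.length);
    }

    public void put(Object key, Object value) {
        Vector<Object[]> bucket = buckets[index(key)];
        for (Object[] pair : bucket) {
            if (pair[0].equals(key)) { pair[1] = value; return; } // update in place
        }
        bucket.add(new Object[] { key, value });
    }

    public Object get(Object key) {
        for (Object[] pair : buckets[index(key)]) {
            if (pair[0].equals(key)) return pair[1];
        }
        return null;
    }

    public static void main(String[] args) {
        BucketMap m = new BucketMap(4);
        m.put("colour", "red");
        m.put("colour", "blue");
        System.out.println(m.get("colour")); // prints "blue"
    }
}
```

Recovering the logical map from the array and the Vectors is exactly the kind of higher-level data-structure information that the upper levels of a data facet would need to present.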
Thirdly, these tables show that dynamically extracted information is useful for all tasks, and
a combination of statically and dynamically extracted information is useful for addressing
most of the tasks. Only the two general software comprehension questions regarding runtime
behaviour do not require statically extracted information.
It would appear that the most useful features of the proposed model are the range of
abstraction levels and the structural and behavioural facets. Therefore, the data facet is
removed from the model. The following chapter specifies and refines the abstraction scales
for the structural and behavioural facets of the model. A preliminary assessment of support
for software comprehension strategies in the model is given in Appendix C.
6 The Refined Model
“Manipulating abstractions is a potent means of formulating and solving real problems.”
B P Zeigler [Zeigler 1984]
6.1 Introduction
This chapter specifies the proposed model in more detail, in preparation for its
implementation and evaluation, thus addressing the remainder of the research challenges
identified in Section 4.5. The fully specified model will consist of well-defined abstraction
levels, defined in terms of entities and relationships, along with abstraction mechanisms and
corresponding inter-level mappings that enable information to be transformed from one level
to another and combined across levels. An application to a real
system is presented, which validates the practicality of the model, and illustrates how
abstraction mappings are created in practice.
6.2 Abstraction levels
Table 6.1 illustrates the structure and behaviour abstraction hierarchies of the proposed
model. This table differs from the initial model presented in Section 4.3 in that all levels are
specified for both the Structure and Behaviour hierarchies. Each abstraction level of each
facet is a view and consists of a name, a description, a set of entities, a set of relationships
between those entities, and a set of diagrams that illustrate software at that level of that facet.
Each named view in Table 6.1 is accompanied by an example diagram type that can be used
to illustrate information from the facet at the specific level of abstraction. It is intended that
the analyst will be able to move conveniently between these diagrams during the course of
their investigation in order to examine the information relevant to their task. The views
selected are intended to represent the information that an analyst would find useful during
software comprehension. The directed arcs between levels 0 and 1 indicate that the program
code and event trace do not belong in either hierarchy: information for both hierarchies can
be obtained from either source. A clear benefit of the integrated approach proposed is the
ease of integration of and movement between views.
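The composition of a view described above can be captured in a small data structure. The sketch below is illustrative only; the type and field names are assumptions rather than the model's formal definition, and the example method populates level 2 of the structure facet from Table 6.1.

```java
import java.util.List;

// A sketch of a model view: each abstraction level of each facet pairs a name
// and description with the entities, relationships, and diagram types it uses.
public class ModelView {
    public enum Facet { STRUCTURE, BEHAVIOUR }

    public final Facet facet;
    public final int level;              // 0 (code/trace) .. 5 (business)
    public final String name;
    public final String description;
    public final List<String> entities;
    public final List<String> relationships;
    public final List<String> diagrams;

    public ModelView(Facet facet, int level, String name, String description,
                     List<String> entities, List<String> relationships,
                     List<String> diagrams) {
        this.facet = facet;
        this.level = level;
        this.name = name;
        this.description = description;
        this.entities = entities;
        this.relationships = relationships;
        this.diagrams = diagrams;
    }

    // Level 2 of the structure facet, as defined in Table 6.1.
    public static ModelView interClassStructure() {
        return new ModelView(Facet.STRUCTURE, 2, "Inter-class structure",
                "The structural relationships between the system's classes",
                List.of("Class"),
                List.of("Inheritance", "Implementation", "Aggregation", "Composition"),
                List.of("Class diagram"));
    }

    public static void main(String[] args) {
        ModelView v = interClassStructure();
        System.out.println(v.level + " " + v.name);
    }
}
```

Representing views uniformly in this way is what makes it straightforward to enumerate them, move between them, and attach abstraction operations to pairs of them.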
Table 6.1 The abstraction levels of the proposed model
Level | Structure | Behaviour

Level 5:
- Name: Business structure | Business behaviour
- Description: The structure defined by the high-level business goals of the system | The behaviour defined by the high-level business goals of the system
- Entities: BusinessEntity | BusinessEntity
- Relationships: BusinessRelationship | BusinessRule
- Example diagram: Use case diagram | Use case diagram

Level 4:
- Name: System structure deployment | System behaviour distribution
- Description: The structural deployment of the system | The behavioural distribution of the system
- Entities: Component, Machine | Component, Machine
- Relationships: Dependency, Containment, Communication | Dependency, Containment, Communication
- Example diagram: Deployment diagram | Deployment diagram

Level 3:
- Name: System architecture | Component interaction
- Description: The structural relationships between the system’s high-level components | The behavioural relationships between the system’s high-level components
- Entities: Component | Component
- Relationships: Dependency | Usage
- Example diagram: Component diagram | Reflexion model

Level 2:
- Name: Inter-class structure | Inter-object interaction
- Description: The structural relationships between the system’s classes | The behavioural relationships between the system’s objects
- Entities: Class | Object
- Relationships: Inheritance, Implementation, Aggregation, Composition | Invocation
- Example diagram: Class diagram | Sequence diagram

Level 1:
- Name: Intra-class structure | Intra-object interaction
- Description: The internal structure of the system’s objects | The internal behaviour of the system’s objects
- Entities: Method, Attribute | State
- Relationships: Containment | Action
- Example diagram: Source code representation | Activity diagram

Level 0:
- Name: Source code | Event trace
- Description: The system’s source code | A dynamic event trace of the system’s execution
- Entities: Operand | Operand
- Relationships: Operator | Operator
- Example diagram: Program listing | Output statements
As a further example, an instantiation of the behaviour hierarchy is illustrated graphically in
Figure 6.1 for part of the JHotDraw framework [Johnson 1992]. In contrast to the initial
examples presented in Section 4.4, there is clear continuity between the abstraction levels of
this figure. The application of the model to a JHotDraw system is described in Section 6.5.
The entity-relationship diagrams for each of the twelve views are given in Appendix D.
These diagrams specify formally the information present in each view, and provide the basis
for defining the inter-view abstraction relationships.
[Figure: the behaviour hierarchy instantiated for part of JHotDraw. Level 5: a use case diagram with the use case ‘Create new Figure’. Level 4: a deployment diagram of drawingClient:PC and drawingServer:Sun communicating via <<TCP>>. Level 3: a reflexion model relating DrawingView, Drawing, Tools, and Figures. Level 2: a sequence diagram of the mouseDown(MouseEvent,int,int) invocation creating a figure via createFigure(), displayBox(), view(), and add(). Level 1: an activity diagram of the control flow of createFigure(), branching on [fPrototype == null] and including the HJDError path. Level 0: code samples for mouseDown and createFigure.]
Figure 6.1 An example instantiation of the behaviour hierarchy for part of the JHotDraw framework
6.3 Inter-level abstraction relationships
6.3.1 Abstraction mechanisms
Fundamental to the model is the definition of abstraction relationships between the model
views. These abstractions allow software visualisations to be related and manipulated. Such
an integrated arrangement is preferable to an ad hoc approach as it allows software
visualisations to be reasoned about in a coordinated and correct manner. Yan et al. comment
that the most critical challenge in discovering high-level software artefacts from low-level
system events is “finding mechanisms to bridge the abstraction gap: in general, low-level
system observations do not map directly to architectural actions” [Yan 2004, p. 470].
Abstraction operations are defined between each of the views in the model. These operations
illustrate the relative abstraction level of each view (e.g. that one view is more abstract than
another), and define the transformations between views. Figure 6.2 presents the abstraction
hierarchies from Table 6.1 in the form of an abstraction network that illustrates these
operations [Fishwick 1988]. The more abstract representation (the arc target) is derived from
the less abstract base representation (the arc source) by applying the transformation indicated
by the arc label. The three abstraction mechanisms used in the network are:
• abstraction by reduction (RED);
• abstraction by induction (IND); and
• partial systems morphism (PSM).
These mechanisms are described in Section 2.3.3. Fishwick describes these mechanisms and
presents examples in the context of a simulation of the dining philosophers (DP) problem
[Dijkstra 1968]. For abstraction by reduction, Fishwick gives the example of abstracting
from a Petri net [Peterson 1981] to a frequency distribution. For abstraction by induction, the
example given by Fishwick is of abstracting from the observed data to a finite state
automaton. For partial systems morphism, Fishwick’s example makes use of a PSM to
produce a more abstract Petri net (with fewer arcs) from a less abstract one. None of these
abstraction mechanisms require a change in representation: the representation used is simply
one that is appropriate to the underlying data.
[Figure 6.2 content: an abstraction network with levels 0–5. Level 0: Program Code / Event Trace; level 1: Intra Class Structure and Intra Object Interaction; level 2: Inter Class Structure and Inter Object Interaction; level 3: System Architecture and Component Interaction; level 4: System Structure Deployment and System Behaviour Distribution; level 5: Business Structure and Business Behaviour. Arcs are labelled with the abstraction mechanism applied: PSM, IND, or RED]
Figure 6.2 An abstraction network illustrating the abstraction relationships between the views of Table
6.1
The interpretation of abstraction by reduction employed here is an abstraction function that
applies some summarisation function to the data to extract its essential higher-level
properties. For example, abstraction from level 3 – level 5 requires input from a human
analyst to define the summarisation function that recognises business-level information from
component-level information. Abstraction by reduction in this context typically requires
human or heuristic analysis.
Abstraction by induction is interpreted here as a function that amalgamates lower-level
entities and relationships to form higher-level ones. For example, abstraction from level 1 –
level 2, level 2 – level 3, and level 3 – level 4 involves grouping entities and relationships at
the lower level into their equivalents at the higher level. Abstraction by induction can often
be automated entirely, though it may benefit from analyst information to define a more
appropriate or precise amalgamation.
A partial systems morphism in this context is a direct mapping from a subset (proper or
improper) of the entities and relationships at the lower level to those at the higher level. An
entity or relationship at the lower level may map to zero, one, or more entities or
relationships respectively at the higher level, and each entity at the higher level must be
mapped to by at least one entity at the lower level. For example, abstraction from level 0 –
level 1 involves mapping operators and operands to higher-level entities and relationships.
Given the generic mappings to apply to the system, abstraction by partial systems morphism
can be automated conveniently, and indeed benefits from automation in terms of time and
correctness over a human implementation of the mappings.
In the model presented here, a PSM is used to extract basic information from the program
code (or from an event trace in the case of dynamic analysis). From this basic information,
information on intra- and inter-class and object interactions is available. This information
can then be transformed using abstraction by induction to generate information on the system
architecture and high-level component interaction. From this information, abstraction by
induction can be applied to produce information on the system’s structural and behavioural
distribution, or abstraction by reduction can be used to elicit the system’s business-level
structure and behaviour.
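The flow just described can be sketched in Java (the language VANESSA itself is implemented in). Everything below — the Element record, the kind strings, and the abstractToLevel2 method — is an illustrative assumption, not the model's actual implementation:

```java
import java.util.*;

// Sketch of abstraction by partial systems morphism (PSM): each low-level
// element either maps directly to a higher-level entity/relationship or is
// outside the domain of the mapping and is dropped (e.g. comments).
public class PsmSketch {
    // Hypothetical level 0 source/trace elements: a kind tag plus raw text.
    record Element(String kind, String text) {}

    static List<String> abstractToLevel2(List<Element> level0) {
        List<String> level2 = new ArrayList<>();
        for (Element e : level0) {
            switch (e.kind()) {
                case "objectReference" -> level2.add("Object:" + e.text());
                case "methodCall"      -> level2.add("Invocation:" + e.text());
                default                -> { /* comments, whitespace: not mapped */ }
            }
        }
        return level2;
    }

    public static void main(String[] args) {
        List<Element> code = List.of(
            new Element("comment", "// create the anchor point"),
            new Element("objectReference", "fAnchorPoint"),
            new Element("methodCall", "Point(x, y)"));
        System.out.println(abstractToLevel2(code));
        // -> [Object:fAnchorPoint, Invocation:Point(x, y)]
    }
}
```

Because the mapping is a direct table lookup, this kind of abstraction is straightforward to automate, as the text observes.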
6.3.2 Detailed abstraction example
The following example considers the behavioural hierarchy from Figure 6.1, and is intended
to illustrate the abstraction mechanisms of the network. Firstly, the program code is
abstracted using a PSM to generate a data set. This makes use of a mapping between some
elements in the source code and elements in the extracted dataset. For example, method calls
would have a mapping to an appropriate representation in the extracted data, but comments
would be ignored. Views of the system’s intra- and inter-object behaviour can be produced
directly from this extracted data (alternatively, the inter-object view could be generated from
the intra-object view by means of abstraction by induction). These views could be visualised
using, for example, activity diagrams and sequence diagrams respectively.
The PSM abstraction from level 0 (code) to level 1 (intra-object behaviour) entails a
mapping from the source code to the entities and relationships at level 1, which are states and
actions respectively (see Table 6.1). In this example, the call fAnchorPoint = new
Point(x, y); in the code becomes the first state in the activity diagram, while the
condition if (fPrototype == null) becomes an action.
The PSM abstraction from level 0 (code) to level 2 (inter-object behaviour) entails a
mapping from the source code to objects and invocations. Object references in the source
code become objects, while method calls become invocations. In this example, the object
reference fAnchorPoint in the code becomes the first object in the sequence diagram,
while the call to Point(x, y) becomes the second invocation.
The abstraction by induction from level 1 to level 2 involves grouping the level 1 entities and
relationships (states and actions respectively) into their level 2 counterparts (objects and
invocations). The activity diagram for each object maps onto the corresponding level 2
object, while the states that contain method calls become invocations. In this example, the
activity diagram becomes the object initial:CreationTool in the sequence diagram, while the
first state (fAnchorPoint = new Point(x,y)) becomes invocation 1.1.
From level 2, abstraction by induction can be applied to the inter-object information to
generate information on component interactions in the system. This view could be visualised
using a reflexion model [Murphy 2001]. The abstraction by induction from level 2 to level 3
involves grouping objects and invocations into components and usages respectively. Objects
are grouped together to form components. Invocations between objects in separate
components become usages. In this example, the CreationTool object becomes part of the
Tools component, and the invocations from this object to the Figure objects in the sequence
diagram become the usage from the Tools component to the Figures component in the
reflexion model.
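The level 2 to level 3 induction step can be sketched as follows; the grouping map and method names are illustrative assumptions, not VANESSA code:

```java
import java.util.*;

// Sketch of abstraction by induction from level 2 (objects, invocations) to
// level 3 (components, usages): objects are grouped into components, and an
// invocation between objects in different components becomes a usage.
public class InductionSketch {
    static Set<String> usages(Map<String, String> objectToComponent,
                              List<String[]> invocations) {
        Set<String> usages = new LinkedHashSet<>();
        for (String[] inv : invocations) {
            String from = objectToComponent.get(inv[0]);
            String to = objectToComponent.get(inv[1]);
            if (from != null && to != null && !from.equals(to)) {
                usages.add(from + " -> " + to);  // inter-component invocations only
            }
        }
        return usages;
    }

    public static void main(String[] args) {
        // Grouping taken from the JHotDraw example in the text.
        Map<String, String> grouping = Map.of(
            "initial:CreationTool", "Tools",
            "fCreatedFigure", "Figures",
            "drawingView", "DrawingView");
        List<String[]> invocations = List.of(
            new String[]{"initial:CreationTool", "fCreatedFigure"},
            new String[]{"initial:CreationTool", "drawingView"});
        System.out.println(usages(grouping, invocations));
        // -> [Tools -> Figures, Tools -> DrawingView]
    }
}
```

Note that invocations between objects within the same component are dropped, matching the definition that only invocations between objects in separate components become usages.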
From level 3, there are two more abstract representations that can be generated. Abstraction
by induction can be applied to generate information on the system’s behavioural distribution
(level 4), which could be visualised using a deployment diagram. Abstraction by reduction
can be applied to produce information on the system’s business behaviour (level 5), which
could be visualised by a use case diagram.
The abstraction by induction from level 3 to level 4 involves grouping components into
components and machines, and usages into usage, containment, and communication
relationships. The level 3 components map directly to the components of the same name at
level 4. Similarly, local inter-component usages at level 3 map directly to usages at level 4.
(Information on which components are located on which machine is available from the
distribution manager (e.g. CORBA).) Where a level 3 usage exists between two
components not executing on the same machine, that usage maps to a communication
between the machines at level 4. In this example, the Figures component from the reflexion
model becomes the Figures component in the deployment diagram. The usage between
Drawing and Figures in the reflexion model becomes the inter-machine communication in
the deployment diagram.
The abstraction by reduction from level 3 to level 5 involves the application of a function to
summarise component interaction, expressed in terms of components and usages, into
business behaviour in terms of business entities and business rules. The levels 3 components
map to business entities, while the level 3 usages map to business rules. In this example, the
component usages between Tools, Figures, Drawing, and DrawingView in the reflexion
model correspond to a single use case for creating a new figure in the diagram.
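A minimal sketch of abstraction by reduction, in which the summarisation function itself is supplied by the analyst; the class, method, and component names here are purely illustrative:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of abstraction by reduction from level 3 to level 5: an
// analyst-supplied summarisation function collapses a set of component
// usages into a single business-level artefact (here, a use case name).
public class ReductionSketch {
    static String reduce(Set<String> usages, Function<Set<String>, String> summarise) {
        return summarise.apply(usages);  // the model supplies data; the analyst supplies meaning
    }

    public static void main(String[] args) {
        Set<String> usages = Set.of("Tools -> Figures", "Tools -> Drawing",
                                    "Drawing -> DrawingView");
        // The analyst recognises these inter-component interactions as one use case.
        Function<Set<String>, String> summarise = u -> "Create new Figure";
        System.out.println(reduce(usages, summarise));
    }
}
```

The point of the sketch is that, unlike the PSM and induction steps, the summarisation function cannot in general be derived from the code; it encodes human or heuristic domain knowledge.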
6.3.3 Generic abstraction mappings
In order to perform these abstractions, the mappings mentioned above must be defined. The
mappings for the structure and behaviour hierarchies are listed in Tables 6.2 and 6.3
respectively. The mappings listed in Tables 6.2 and 6.3 are defined generically in this
section, and relate the entities and relationships at level n to their counterparts at level n+1.
The entities and relationships in Tables 6.2 and 6.3 are subtypes of those in Table 6.1. For
example, those at level 0 constitute the complete set of entities and relationships in the
source code or event trace that are relevant to the model. An example of abstraction from
level 0 – level 1 would be an entity class Figure {…} extracted from the source code,
which is of type ClassContainmentOperator0, which relates to the entity Figure of type
ClassContainment1 at level 1. Further detail and examples are given when the model is
applied manually to a real system in Section 6.5. The → operator denotes an abstraction
operation. The abstraction operations between each level are those illustrated in Figure 6.2.
For example, abstraction from intra-class information at level 1 to inter-class information at
level 2 involves the application of the appropriate mappings from Table 6.2, which
implement abstraction by induction between the two levels. The reflexion model technique
of Murphy et al. also makes use of mappings to relate low-level software artefacts to higher-level
architectural components [Murphy 2001].
When applying the mappings to a specific system, some mappings may be generated
automatically, while some may require or benefit from knowledge of the system and its
domain. In both hierarchies, the mappings from levels 0–1, 0–2, 1–2, and 3–4 are generated
entirely automatically. The mappings from levels 2–3 can be generated automatically for
some systems (e.g. based on source code naming conventions), but can also benefit from
analyst knowledge of the system and domain (i.e. RelatedClasses2 and RelatedObjects2 relate
to Component3 based on the analyst’s information). The mappings from levels 3–5 can be
generated automatically by means of selective tracing of use cases, or an analyst can relate
RelatedComponents3 to BusinessEntity5. The process of generating and applying the specific
mappings will be demonstrated in Section 6.5 when the model is applied in the context of a
real system.
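Reading analyst-supplied mappings can be sketched as follows. The line format "Component=ClassA,ClassB" is a hypothetical assumption; the thesis states only that expert mappings are read from text files, without specifying their format:

```java
import java.util.*;

// Sketch of parsing analyst-supplied level 2 -> level 3 mappings
// (RelatedClasses2 -> Component3) from lines of the hypothetical form
// "Component=ClassA,ClassB".
public class ExpertMappings {
    static Map<String, String> classToComponent(List<String> lines) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            if (parts.length != 2) continue;            // skip malformed lines
            String component = parts[0].trim();
            for (String cls : parts[1].split(",")) {
                mapping.put(cls.trim(), component);     // class name -> component
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "Tools=CreationTool,SelectionTool",
            "Figures=Figure,ConnectionFigure");
        System.out.println(classToComponent(file).get("CreationTool")); // Tools
    }
}
```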
6.3.3.1 Structure hierarchy
Table 6.2 gives the abstraction mappings for the structure hierarchy.
Table 6.2 Abstraction mappings for the structure hierarchy
(In each Entity/Relationship cell, entities are listed before the slash and relationships after it.)

From level | From Entity/Relationship | Abstraction technique | To Entity/Relationship | To level
0 | Operand / Operator | PSM | Method, Attribute / Containment | 1
0 | Operand / Operator | PSM | Class / Inheritance, Implementation, Aggregation, Composition | 2
1 | Method, Attribute / Containment | IND | Class / Inheritance, Implementation, Aggregation, Composition | 2
2 | Class / Inheritance, Implementation, Aggregation, Composition | IND | Component / Dependency | 3
3 | Component / Dependency | IND | Component, Machine / Dependency, Containment, Communication | 4
3 | Component / Dependency | RED | BusinessEntity / BusinessRelationship | 5
The generic abstraction mappings for the structure hierarchy are defined formally as follows.
Level 0 – Level 1
ClassContainmentOperator0 → ClassContainment1
MethodContainmentOperator0 → MethodContainment1
MethodDeclarationOperand0 → Method1
AttributeDeclarationOperand0 → Attribute1
Level 0 – Level 2
InterfaceOperand0 → Interface2
ClassOperand0 → Class2
InheritanceOperator0 → Inheritance2
ImplementationOperator0 → Implementation2
AggregationOperator0 → Aggregation2
CompositionOperator0 → Composition2
Level 1 – Level 2
Class1 → Class2
Interface1 → Interface2
Method1 → MethodOfClass2
Attribute1 → AttributeOfClass2
ClassContainmentWithInheritance1 → Inheritance2
ClassContainmentWithImplementation1 → Implementation2
ClassContainmentWithAggregation1 → Aggregation2
ClassContainmentWithComposition1 → Composition2
Level 2 – Level 3
RelatedClasses2 → Component3
InheritanceBetweenUnrelatedClasses2 → Dependency3
ImplementationBetweenUnrelatedClasses2 → Dependency3
AggregationBetweenUnrelatedClasses2 → Dependency3
CompositionBetweenUnrelatedClasses2 → Dependency3
Level 3 – Level 4
Component3 → Component4
LocalDependency3 → Dependency4
RemoteDependency3 → Communication4
ComponentsSharingLocalDependencies3 → Machine4, Containment4
Level 3 – Level 5
RelatedComponents3 → BusinessEntity5
DependencyBetweenUnrelatedComponents3 → BusinessRelationship5
6.3.3.2 Behaviour hierarchy
Table 6.3 gives the abstraction mappings for the behaviour hierarchy.
Table 6.3 Abstraction mappings for the behaviour hierarchy
(In each Entity/Relationship cell, entities are listed before the slash and relationships after it.)

From level | From Entity/Relationship | Abstraction technique | To Entity/Relationship | To level
0 | Operand / Operator | PSM | State / Action | 1
0 | Operand / Operator | PSM | Object / Invocation | 2
1 | State / Action | IND | Object / Invocation | 2
2 | Object / Invocation | IND | Component / Usage | 3
3 | Component / Usage | IND | Component, Machine / Usage, Containment, Communication | 4
3 | Component / Usage | RED | BusinessEntity / BusinessRule | 5
The generic abstraction mappings for the behaviour hierarchy are defined formally as
follows.
Level 0 – Level 1
OperandsDefiningCurrentState0 → State1
OperatorTriggeringStateChange0 → Action1
Level 0 – Level 2
ObjectOperand0 → Object2
InterObjectOperator0 → Invocation2
Level 1 – Level 2
AllStates1 → Object2
InterObjectAction1 → Invocation2
Level 2 – Level 3
RelatedObjects2 → Component3
InvocationBetweenUnrelatedObjects2 → Usage3
Level 3 – Level 4
Component3 → Component4
LocalUsage3 → Usage4
RemoteUsage3 → Communication4
ComponentsSharingLocalUsages3 → Machine4, Containment4
Level 3 – Level 5
RelatedComponents3 → BusinessEntity5
UsageBetweenUnrelatedComponents3 → BusinessRule5
6.3.4 Combining information from multiple views
The rigorous definition of the abstraction mappings enables information from multiple views
to be combined. Most CASE and software visualisation tools do not support this flexibility.
View combination is useful for determining the low-level artefacts that correspond to high
level system properties, and for focusing analyses. For example, an analyst may wish to
examine the class structure responsible for some behaviour observed at the business level.
The combination of level 2 structural and level 5 behavioural information would allow them
to investigate this. The three possible scenarios for combining information are:
1. from the same level of each hierarchy;
2. from different levels of the same hierarchy; and
3. from different levels of each hierarchy.
6.3.4.1 From the same level of each hierarchy
The first combination is achieved by forming the union of the sets of entities and
relationships of each view. This can be expressed algebraically by Equations 1 and 2.
(1) E^C_x = E^B_x ∪ E^S_x

(2) R^C_x = R^B_x ∪ R^S_x

Where: E^y_x denotes the set of entities at level x of hierarchy y; R^y_x denotes the set of
relationships at level x of hierarchy y; B denotes the behaviour hierarchy; S denotes the
structure hierarchy; and C denotes the combination.
An example application of combining information from the same level of each hierarchy
would be to produce a unified visualisation of the structural and behavioural characteristics
of the system, for example at the component level.
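Equations 1 and 2 reduce to plain set unions, as this small Java sketch illustrates (the entity names are illustrative):

```java
import java.util.*;

// Sketch of combining views from the same level x of each hierarchy
// (Equations 1 and 2): the combined entity set is the union of the
// behavioural and structural entity sets (and likewise for relationships).
public class SameLevelCombination {
    static Set<String> union(Set<String> behaviour, Set<String> structure) {
        Set<String> combined = new LinkedHashSet<>(behaviour);
        combined.addAll(structure);
        return combined;
    }

    public static void main(String[] args) {
        Set<String> eB = Set.of("Tools", "Figures");    // E^B_x
        Set<String> eS = Set.of("Figures", "Drawing");  // E^S_x
        System.out.println(union(eB, eS).size());       // 3 distinct entities
    }
}
```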
6.3.4.2 From different levels of the same hierarchy
The second combination is achieved by relating the less abstract entities and relationships to
their more abstract counterparts in the same hierarchy. The relationship between these levels
is defined by the composition of the mappings from the less abstract level to the more
abstract level. This is shown in Equation 3.
(3) f^y_{m..n} = f^y_{n-1..n} ∘ … ∘ f^y_{m..m+1}

Where: f^y_{m..n} denotes the abstraction operation from level m to level n of hierarchy y;
m < n; and ∘ is the composition operator.
The sets of entities and relationships in the combination consist of the union of the sets of
entities and relationships at levels m and n of hierarchy y, with three exceptions. Firstly, level
m entities that do not have an abstraction mapping to level n entities are not included.
Secondly, level m relationships whose entities are not present in the combination (due to
exception 1) are not included. Thirdly, it may be desirable to hide level m relationships that
are subsumed by level n relationships. The sets of entities and relationships remaining after
applying these exceptions are expressed algebraically in Equations 6 and 7. Thus, the sets of
entities and relationships in the combination can be expressed algebraically by Equations 4
and 5.
(4) E^C_{m,n} = E^y_n ∪ Ex^y_m

(5) R^C_{m,n} = R^y_n ∪ Rx^y_m

Exception 1:

(6) Ex^y_m = {e | ∀e ∈ E^y_m, e ∈ dom[f^y_{m..n}]}

Exception 2 ∪ Exception 3:

(7) Rx^y_m = {r_{e1,e2} | ∀r ∈ R^y_m, e1 ∈ E^C_{m,n}, e2 ∈ E^C_{m,n}}
        ∪ {r_{e1,e2} | ∀r ∈ R^y_m, r_{f^y_{m..n}(e1),f^y_{m..n}(e2)} ∉ R^y_n}

Where: e is an element of E; and r_{e1,e2} is an element of R relating entities e1 and e2.
An example application of combining information from different levels of the same
hierarchy would be to reveal the lower-level interactions responsible for the system’s high-
level behaviour.
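The exceptions can be sketched as a filtering step, with a Map standing in for the composed abstraction mapping f and all class, method, and entity names being illustrative assumptions:

```java
import java.util.*;

// Sketch of combining levels m and n of one hierarchy (Equations 4-7).
// f plays the role of the composed abstraction mapping f_{m..n}: level m
// entities outside dom(f) are dropped (exception 1); level m relationships
// are dropped when an endpoint is absent (exception 2) or when they are
// subsumed by a level n relationship (exception 3).
public class CrossLevelCombination {
    record Rel(String e1, String e2) {}

    static Set<String> entities(Set<String> eN, Set<String> eM, Map<String, String> f) {
        Set<String> combined = new LinkedHashSet<>(eN);
        for (String e : eM) if (f.containsKey(e)) combined.add(e);  // exception 1
        return combined;
    }

    static Set<Rel> relationships(Set<Rel> rN, Set<Rel> rM,
                                  Map<String, String> f, Set<String> combinedEntities) {
        Set<Rel> kept = new LinkedHashSet<>(rN);
        for (Rel r : rM) {
            boolean endpointsPresent = combinedEntities.contains(r.e1())
                                    && combinedEntities.contains(r.e2());          // exception 2
            boolean subsumed = rN.contains(new Rel(f.get(r.e1()), f.get(r.e2()))); // exception 3
            if (endpointsPresent && !subsumed) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> f = Map.of("CreationTool", "Tools", "Figure", "Figures");
        Set<String> eN = Set.of("Tools", "Figures");
        Set<String> eM = Set.of("CreationTool", "Figure", "Unmapped");
        Set<String> combined = entities(eN, eM, f);
        System.out.println(combined.contains("Unmapped")); // false: dropped by exception 1
    }
}
```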
6.3.4.3 From different levels of each hierarchy
The third combination is achieved by relating the less abstract entities and relationships in
one hierarchy to the more abstract entities and relationships in the other. This combination is
equivalent to applying both of the previous combinations. As in the previous combination,
the relationship between these levels is defined by the composition of the mappings from the
less abstract level to the more abstract level. This function is given by Equation 3, above.
The sets of entities and relationships in the combination consist of the union of the sets of
entities and relationships at level n of hierarchies y and y’, and level m of hierarchy y
(assuming hierarchy y contains the less abstract view), with the three exceptions defined in
Equations 6 and 7, above. Thus, the sets of entities and relationships in the combination can
be expressed algebraically by Equations 8 and 9.
(8) E^C_{m,n} = E^y_n ∪ E^{y'}_n ∪ Ex^y_m

(9) R^C_{m,n} = R^y_n ∪ R^{y'}_n ∪ Rx^y_m
Where: y’ denotes the inverse hierarchy to y (i.e. when y represents the behaviour hierarchy,
y’ represents the structure hierarchy, and vice versa).
An example application of combining information from different levels of each hierarchy
would be to investigate the structural elements responsible for some high-level behaviour.
An example of this is shown in Figure 6.3.
The general cases considered here discuss the combination of two views. The specifications
can also be used to combine more than two views and to focus analyses. View combination
in a real system is demonstrated in Section 6.5.
Create new Figure
Figure
ConnectionFigure
FigureChangeListener
Drawing
DrawingChangeListener
DrawingView
Tool
Create new Figure
Structure level 2 Behaviour level 5
Combination
Figure
ConnectionFigure
FigureChangeListener
Drawing
DrawingChangeListener
DrawingView
Tool
Figure 6.3 Example combination of level 2 structure and level 5 behaviour information
6.4 Metamodels
This section describes two metamodels that have been proposed for representing information
about software systems and compares them with the novel model proposed in this thesis.
6.4.1 Dagstuhl Middle Metamodel
The DMM is a metamodel for representing the static structure of source code [Lethbridge
2003]. The DMM has separate hierarchies for source elements and model elements. A third
hierarchy represents relationships: between model elements, between source elements, and
between model elements and source elements. The DMM can be used to model both OO and
non-OO systems. Some work has been done on integrating dynamic information. It has been
proposed that a formal semantics for the classes and relationships of the model should be
formulated. Proposed extensions include the modelling of dynamic information. Future work
may include developing mappings to relate to lower level schemas, and developing
architectural schemas that link to the DMM.
A small portion of the DMM could be used to represent the level 0 information extracted
from the source code or event trace. However, this would be overkill for the purposes of the
model proposed in this thesis. As an implementation detail, it would perhaps be desirable to
accept information from a DMM model as level 0 input to the model if interoperability were
a concern and it was observed that the DMM was achieving widespread acceptance in the
reverse engineering community (though this does not appear to be the case).
6.4.2 UML metamodel
The Unified Modelling Language (UML) is defined using a four-layer metamodel hierarchy.
The UML 2.0 superstructure metamodel [OMG 2003a] defines relationships between
concepts such as components and deployments.
The infrastructure [OMG 2003b] defines specific diagrams at a lower level of abstraction. It
is therefore possible, using the superstructure and infrastructure, to relate the different UML
diagram types and their elements. For example, UML can be used to relate statecharts
depicting behaviour to the class responsible for encapsulating that behaviour.
There does not appear to be any explicit acknowledgement of software abstraction levels. It
is therefore more cumbersome to relate software artefacts. In the novel model proposed in
this thesis, all level 1 information (intra-object interactions, which may be represented as
statecharts) relating to the same object entity can be abstracted to that object entity at level 2
(inter-object interactions, which may be represented as an object diagram). In UML, the
StateMachine entity is related to a BasicBehaviours:Behaviour entity, which is in turn related
to a Kernel:Class entity. In contrast, the novel model defines generic mappings to relate
software artefacts at different levels of abstraction; these mappings are intuitive and can be
conveniently created and read by a human analyst.
While UML describes and relates diagrams in terms of diagram concepts, the novel model
makes use of software entities and relationships, which are independent of a specific diagram
type. Thus, the novel model allows any type of diagram to be plugged in and used for
visualisation of the underlying model information by populating it with the required entities
and relationships. To achieve this with UML would require translation of this basic
information (which is extracted directly from a system) into the UML metamodel.
The novel model is also much more compact and focussed than the UML metamodel, as it is
targeted specifically to the reverse engineering process, rather than encompassing the
forward engineering steps of analysis and design.
There does not appear to be anything useful in the UML metamodel that the novel model
omits. This is because the novel model is based on accepted software artefacts (e.g. classes,
components, etc.) so any omission would be readily identified. A partial instantiation of the
UML schema could perhaps be used to implement the novel model, though this would lead
to an unnecessarily complicated implementation.
The fine-grained concepts embodied in UML (e.g. activities and triggers in state machines)
are not necessary in the novel model. The basic information required to draw the relevant
UML models at each level of abstraction is present, and could be augmented with this
additional information to produce a more precise UML diagram if desired. However, it
would be overly burdensome to include such information that is specific to a particular
diagramming notation by default; the intention of the novel model was to be independent of
any specific notation by representing the principal software constructs, thus allowing any
notation that represents the entities and relationships at a particular level to be plugged into
the model for display purposes as needed.
If all of the UML metamodel diagrams were amalgamated into a single diagram, diagrams
could be related to each other. Abstraction mappings are not defined explicitly, though one
model could be related to another through such a comprehensive metamodel. For example,
all the state machines for a class could be combined to abstract them into an entity
representing that class in a class diagram.
The principal differences between the novel model and the UML metamodel can be
summarised as follows:
1. the novel model deals explicitly with software artefacts at multiple levels of
abstraction;
2. generic mappings in the novel model make inter-abstraction level relationships
explicit;
3. the novel model supports multiple diagram types; and
4. the novel model is focussed on reverse engineering, and hence simpler.
An exploration of how the UML could be used to represent some of the information from the
novel model proposed in this thesis is given in Appendix F.
6.5 Applying the model to a real system
The applicability and feasibility of the model is demonstrated by applying it manually to a
real system (JHotDraw). The system-specific abstraction mappings were generated, which
allowed the abstraction hierarchies for the system to be produced. The diagrams produced
were then validated by comparison with those from reliable sources.
The class and sequence diagrams produced by the model were compared to those from the
system documentation, and were found to match closely. Discrepancies were due to the use
of static information in the documentation diagrams, the system designers’ ‘idealised’ view
of the system’s interactions, and the coverage of the event trace.
The component diagrams produced from the novel model were compared to those produced
by a system expert. There was a 30% agreement between the communications illustrated in
the diagram produced from the manual application of the model and the expert’s diagram.
Discrepancies were again due to the expert’s use of static information, his idealised view of
the system’s interactions, and the coverage of the event trace. This comparison illustrates
that although the novel model is an accurate representation of the system as implemented, it
may not entirely reflect an analyst’s idealised view of the system’s design. A potential
weakness that would affect the accuracy of the novel model in this comparison is the
accuracy of the level 2-3 abstraction mappings provided by the JHotDraw expert.
The use case diagrams produced from the novel model were compared to that produced by
the system expert. There was no agreement between the communications shown in these
diagrams. This shows that the business relationships generated by abstracting from low-level
structural and behavioural interactions in the novel model are not synonymous with the
expert’s perception of the system’s use cases. As in the previous comparison, this result
shows that the novel model may not entirely reflect an analyst’s idealised view of the
system’s design.
These initial results provide confidence in the validity of the model and the abstraction
relationships. The points of variability are the information provided by the expert analyst and
the coverage of any dynamic trace. Full details of the manual application of the model are
given in Appendix B.
6.6 VANESSA: Visualisation Abstraction NEtwork for Software Systems Analysis
The motivation in building VANESSA was to demonstrate the feasibility of the model, and
to allow a rigorous evaluation of the model using real software systems to be performed.
VANESSA fully implements all aspects of the model described in this thesis.
6.6.1 Tool implementation
VANESSA is implemented in Java (J2SE 5.0) and analyses Java systems. Structural
information is extracted statically from the program code. VANESSA first converts the
source code to XML using BeautyJ [Gulden 2004], then manipulates the XML
representation by means of an XSLT stylesheet using Xalan [Apache 2004] to generate basic
(‘level 0’) structure information. Behavioural information is extracted dynamically by
generating an event trace. VANESSA incorporates a custom-built JPDA-based tracing utility
implemented using JDI [Sun 2004a]. The trace generated contains the level 0 information for
the behaviour hierarchy.
The next stage is to parse the generated level 0 information to produce level 1 and level 2
information. Once this process is completed, the abstraction mappings are applied to
generate the higher-level views and the abstraction relationships between them. Expert
mappings are read from text files if available. The ten ‘basic’ views comprising the model
hierarchies (see Table 6.1) can now be output to files. The user can also specify any focussed
view or combination of views to be generated, as specified in Section 6.3.4. The dot format
[Gansner 2002] is currently used for output, though the generic nature of the model
implementation allows any output format to be conveniently plugged in, such as UML. The
output can now be viewed using a viewer such as dotty [Koutsofios 1996a]. The analysis
process is illustrated in Figure 6.4.
Figure 6.4 The VANESSA analysis process
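The final output stage can be sketched as a minimal dot emitter. The node and edge labelling here is illustrative; the text does not show VANESSA's actual dot output:

```java
import java.util.*;

// Minimal sketch of emitting a level 3 Behaviour view in the dot format
// consumed by viewers such as dotty: one node per component, one labelled
// edge per inter-component usage.
public class DotEmitter {
    static String toDot(String name, List<String[]> usages) {
        StringBuilder sb = new StringBuilder("digraph " + name + " {\n");
        for (String[] u : usages) {
            sb.append("  \"").append(u[0]).append("\" -> \"")
              .append(u[1]).append("\" [label=\"usage\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toDot("Behaviour3", List.of(
            new String[]{"Tools", "Figures"},
            new String[]{"Drawing", "Figures"})));
    }
}
```

Because dot is a plain-text graph language, any view in the model reduces to emitting its entities as nodes and its relationships as labelled edges, which is consistent with the claim that other output formats can be conveniently plugged in.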
6.6.2 Example analyses
This section demonstrates a number of example analyses of JHotDraw to illustrate the
capabilities of VANESSA and the underlying model. A simple example is presented first,
followed by examples of each type of combination described in Section 6.3.4. This serves to
illustrate both a selection of the basic views and the possible types of combination. In the
combination figures, less abstract entities are depicted nested within more abstract entities.
An example application of one of the basic views is investigating behavioural interactions
between components in a system. The level 3 view from the Behaviour hierarchy illustrates
this information. This view is shown in Figure 6.5.
Figure 6.5 The level 3 Behaviour view of JHotDraw. Arcs denote usage
An example application of combining information from the same level of each hierarchy is
to produce a unified visualisation of the structural and behavioural characteristics of the
system, for example at the component level. This is accomplished in VANESSA by
combining the level 3 views from the Structure and Behaviour hierarchies. Part of the result
is illustrated in Figure 6.6.
Figure 6.6 Combining views from the same level of each hierarchy. In the combined view, solid arcs
denote usage and dashed arcs denote dependency
An example application of combining information from different levels of the same
hierarchy is to reveal the lower-level interactions responsible for the system’s high-level
behaviour, such as the inter-component interactions responsible for some business level
behaviour. This is accomplished in VANESSA by combining the level 3 and level 5 views
from the Behaviour hierarchy. Part of the result is illustrated in Figure 6.7.
Figure 6.7 Combining views from different levels of the same hierarchy. In the combined view, arcs
between components denote usage and arcs between business entities denote business rules
An example application of combining information from different levels of each hierarchy is
to investigate the structural elements responsible for some high-level behaviour, such as the
inter-class relationships responsible for some component level behaviour. This is
accomplished in VANESSA by combining the level 2 view from the Structure hierarchy and
the level 3 view from the Behaviour hierarchy. Part of the result is illustrated in Figure 6.8.
Figure 6.8 Combining views from different levels of each hierarchy. Between classes: solid arcs
denote association; dashed arcs denote extension; dotted arcs denote inheritance. Between
components: arcs denote usage
6.6.3 Comparison with other software visualisation tools
Of the software visualisation tools discussed in Section 2.2, those with the most
commonality in functionality with VANESSA are Dali, Shimba, and SHriMP. The similarity
with Dali lies in its ‘view fusion’ functionality, in which links are established between views
from different sources in order to show complementary information. This typically involves
the combination of information from static and dynamic extractors. VANESSA supports the
combination of statically and dynamically extracted information, and allows the combination
of structural and behavioural information. VANESSA also differs from Dali in that
VANESSA provides a range of abstraction levels, while Dali visualises only architectural
level information.
Like VANESSA, Shimba extracts structural information statically and behavioural
information dynamically and allows the views generated from this information to be
combined. In contrast to VANESSA, Shimba addresses only a limited range of abstraction
levels by visualising inter-class to architectural level information, though this is a wider
range than the single level offered by Dali.
SHriMP achieves view combination by means of a fisheye view approach which allows the
analyst to show parts of a diagram at a lower level of abstraction while retaining context.
This differs from the current VANESSA implementation which makes use of basic graphs
where all entities and relationships are at one of a defined set of abstraction levels (i.e. the
levels of abstraction of the views from which the combined view was generated). SHriMP
visualises inter-class to architectural level information, providing a similar range of abstraction levels to Shimba, together with linkage to source code; even so, this range is narrower than that provided by VANESSA.
6.7 Summary
The first part of this chapter described and demonstrated a fully specified abstraction model
for software visualisation. The abstraction relationships in the model were defined formally,
and the model was applied to an application built using the JHotDraw object-oriented
framework to demonstrate its use. The combination of information from multiple views of
the model was defined formally and demonstrated. Lastly, the models of the system
produced were compared with models produced by other sources. It is concluded that the
model presented is a practical and valid approach to visualising a software system. The
relationships between the model presented in this thesis and simulation and continuous
system abstraction techniques are discussed in Appendix E.
The second part of this chapter presented a tool implementation of the fully-specified model.
VANESSA was created to demonstrate the feasibility of the visualisation model, and to
allow a rigorous evaluation of the model to be performed. Having demonstrated the
feasibility of the approach, the effectiveness of the model in supporting software
comprehension will now be evaluated. The evaluation will employ a range of system types,
and will make use of typical software comprehension tasks, as in the initial study described
in Chapter 3. This will allow the effectiveness of the proposed visualisation model in
supporting large-scale, real world software comprehension to be assessed.
7 Evaluation
“As Minsky [Minsky 1965] observed a model is not simply a model, it is a model which can
answer certain questions about a certain object for a certain questioner.”
B P Zeigler [Zeigler 1984]
This chapter describes the evaluation of the software visualisation model as implemented in
VANESSA using typical software comprehension tasks. The purpose of this evaluation is to
explore the research hypothesis stated in Section 4.2, namely:
A model that supports visualisation of software through a range of abstraction levels
that incorporate structural and behavioural views and integrates statically and
dynamically extracted information will provide effective support for the full range of
software comprehension tasks.
7.1 Experimental setup
The model was evaluated using four systems – two small (JHotDraw and BeautyJ) and two
medium-sized (SHriMP and ArgoUML). A replication of the original study described in
Chapter 3 was also performed using the model. The systems were selected to provide a
variety of system types and sizes. For each system a set of comprehension questions was
obtained either from an expert (3 systems) or documentation (1 system). Where an expert
was employed, they were also asked to provide mappings between system elements as
described in Section 6.3.3. The experts were given some basic information on the model, an
explanation of what was required, and a set of example comprehension questions based on
the set of typical questions to demonstrate the style expected. The experimenter then
attempted to answer the experts’ questions using VANESSA. This setup was designed to
examine the conclusion from the original study that a visualisation model combining
structural and behavioural information and a range of abstraction levels would be useful in
addressing the majority of the tasks in that study, and to validate the research hypothesis that
the visualisation model developed is useful in addressing typical software comprehension
tasks in real world systems.
7.2 Comprehension questions
The typical software comprehension questions to be used in the evaluation as defined in
Section 5.4 are as follows.
General software comprehension questions
G1. What is the class structure of the software system?
G2. What interactions occur between objects?
G3. What is the high-level structure/architecture of the software system?
G4. How do the high-level components of the software system interact?
G5. What patterns of repeated behaviour occur at runtime?
G6. What is the load on each component of the software system at runtime?
G7. What impact will a change made to the software system have on the rest of the
software system?
Specific reverse engineering questions
S1. What are the collaborations between the objects involved in an interaction?
S4. Where is the functionality required to implement a solution located in the software
system?
S6. How does the state of an object change during an interaction?
The specific reverse engineering questions have been reduced from six to three in order to
clarify them: S2 was removed and is represented by S1; S3 and S5 were removed and are
represented by S4. It was felt that these questions were too framework-oriented and the
differences between them too subtle to be helpful.
7.3 Threats to validity
There are three principal types of validity threat that must be considered in this evaluation.
These are internal validity, construct validity, and external validity.
7.3.1 Internal validity
Internal validity is concerned with mitigating sources of bias in the experiment that would
affect the cause-effect process being studied [Bryman 1988]. In the case of the replication of
the original study, there is the possibility that the experience of the experimenter in
performing that original study would result in improved performance in the replication. This
was considered unlikely to affect validity as there was a temporal separation of almost three
years between these studies.
In the evaluation proper, there is the possibility that the experimenter’s experience with the
systems used in the evaluation would affect his ability to act as a ‘typical’ software
maintainer. To mitigate this risk, systems were chosen with which he had a range of
experience, namely as a reuser (JHotDraw), as a user (BeautyJ), familiarity with the concept
only (SHriMP), and no exposure (ArgoUML).
7.3.2 Construct validity
Construct validity is the extent to which a test actually measures what it purports to be
measuring [Kirk 1986, Litwin 1995, de Vaus 1996]. This study claims to be measuring the
extent to which the model is useful in addressing typical software comprehension tasks.
There are two potential threats to construct validity.
Firstly, it must be ensured that the tasks used in the study are indeed typical of those in real
world software comprehension. It has been defined previously what is meant by these typical
tasks – they are tasks that would commonly be encountered during the software
comprehension process. In the case of the two systems for which an expert produced
comprehension questions – BeautyJ and SHriMP – we can be confident that these questions
are typical as they were provided by people who are actively involved in comprehending and
maintaining these systems on an everyday basis. In the case of ArgoUML, where questions
were taken from documentation, we can again be confident of their validity as they are
intended to help developers in comprehending the system. Moreover, the questions from all
three sources are conveniently categorised according to the typical software comprehension
activities and questions presented in Chapter 5.
Secondly, it must be ensured that the implementation of the model in VANESSA is an
accurate representation of the approach, and does not add any overhead to inhibit the user or
indeed give them some undisclosed advantage. The implementation of VANESSA was
careful to adhere to these guidelines, hence minimising this potential threat to construct
validity.
7.3.3 External validity
External validity is concerned with the extent to which the results from this study can be
generalised [Bryman 1988]. There were three principal threats to external validity in this
study.
Firstly, it is possible that the comprehension questions were not sufficiently ‘typical’ to be
usefully representative of those that would be encountered in other studies. As described in
the previous section on construct validity, this risk was minimised firstly by gathering tasks
from actual software maintainers, and secondly by checking that these tasks conformed to
the typical activities observed in the literature.
Secondly, it is possible that the four systems studied are not typical of real world systems.
This typicality would be threatened by the type and size of the systems studied. This risk was
minimised by selecting a variety of system types – a graphical editor framework, a source
code processor, a visualisation tool, and a software modelling application – and sizes ranging
from 53 to 1165 core types8. While the system selection was limited to systems for which
expert information and source code were available, this was not overly restrictive as a great
deal of open source software is available that fulfils these criteria. Indeed, it could be argued
that the genre of open source software is particularly appropriate as it is widely subjected to
comprehension by a variety of maintainers.
Thirdly, it is likely that as the developer of the visualisation model and VANESSA tool, the
experimenter would have a better understanding of them than other users would and hence
may be more proficient in their use. As a result, other users using this approach to address
software comprehension tasks may be less successful. It was thought that this threat was
8 i.e. static types (interfaces, abstract classes, and concrete classes) that are not part of ancillary libraries
acceptable, as being an expert with the approach allows the experimenter to exploit it to its fullest potential, thus providing the most accurate assessment of its utility, though at some
potential cost to reproducibility and comparison with the results of the initial study described
in Chapter 3.
7.4 Subject systems
7.4.1 JHotDraw
The core of the JHotDraw version 5.1 framework contains 125 types. The JavaDrawApp
extension consists of 124 of these classes (applet.DrawApplet is not included) plus 10 of its
own types, resulting in a total of 134 types. JHotDraw was created over 2-3 months by two
researchers; it has since become an open source project. The system expert has five years' software engineering experience, and has been working on JHotDraw for the same length of
time. The experimenter’s familiarity with this system was limited to a small reuse effort as
part of a coursework assignment four years ago, the original study three years ago, and more
recently during the manual application of the model described in Section 6.5. As in the
original study, mappings were provided by the system expert.
7.4.2 BeautyJ
BeautyJ is a Java source code transformation utility [Gulden 2004]. It performs auto-
formatting (beautification) of Java source code; input and output can be either standard
source code files or XML. The principal use cases of BeautyJ are therefore ‘Perform
beautification’ and ‘Read/write XML’. BeautyJ can be executed in batch mode from the
command line or via a GUI. The GUI was used in generating the trace to allow both use
cases to be exercised in a single trace. The trace execution consisted of:
1. Start BeautyJ
2. Configure BeautyJ to generate beautified source code from source code
3. Perform the beautification
4. Configure BeautyJ to output XML from source code
5. Perform the XML conversion
6. Exit BeautyJ
BeautyJ version 1.1 contains 53 types, excluding ancillary libraries which were omitted from
the analysis in order to focus on the core of the system. BeautyJ is three years old; during
that period, approximately four months were spent on its development. The expert was the
sole developer, who has eight years' software engineering experience. The experimenter's
experience with BeautyJ was limited to invoking it in batch mode from VANESSA to
generate XML representations of source code (as described in Section 6.6.1). A screenshot
of BeautyJ is shown in Figure 7.1. The system’s package structure was used to identify class-
component mappings due to its functional structure. Component to use case mappings were
not available, though there would have been little interest in level 5 diagrams as the system
has only two use cases. The BeautyJ comprehension questions and answers were provided
by the BeautyJ expert.
Figure 7.1 A screenshot of the BeautyJ options dialogue
7.4.3 SHriMP
SHriMP is a visualisation system for hierarchical information [Storey 1995]. A number of
applications implementing the system have been produced, such as the Creole Eclipse plugin
[CHISEL 2005] for visualising Java code and the standalone SHriMP application [Storey 2001] for displaying graph based data. In the trace, the standalone SHriMP application was
used to explore a simple example program. The trace execution consisted of opening the
standalone SHriMP application, exercising the following use cases identified by the system
expert, then exiting the application:
1. Load a GXL [Winter 2002] file into SHriMP
2. Layout a node’s children in a tree
3. Search for and zoom to a particular node
4. Change the colour of an arc type
5. Export an SVG/HTML snapshot of the current view to a file
Standalone SHriMP version 2.1.11 contains 455 types, excluding ancillary libraries which
were not analysed. SHriMP has been under development for five years. The development
team typically consists of two or three developers – a mix of students and programmers. The
system expert was a programmer who has been maintaining SHriMP for four years and is
currently the sole maintainer. He has four years' software engineering experience. Although the experimenter was broadly familiar with the ideas behind the approach, he had no prior
experience with SHriMP. A screenshot of SHriMP is shown in Figure 7.2. Mappings and
comprehension questions and answers were provided by the system expert.
Figure 7.2 A screenshot of the SHriMP application
7.4.4 ArgoUML
ArgoUML is a UML modelling application [ArgoUML 2005]. The trace consisted of starting
the application, creating a class, creating an association, and exiting the application.
ArgoUML version 0.18.1 contains 1165 types, excluding ancillary libraries which were not
analysed. ArgoUML has been under development for ten years; it has been an open source
project for the past six years. The experimenter had no prior experience with ArgoUML. A
screenshot of ArgoUML is shown in Figure 7.3. Class-component mappings were derived
from the package structure and documentation; component-use case mappings were not
available. The ArgoUML comprehension questions and answers were extracted from a
developers’ guide to the system edited by two of the system owners (the ArgoUML
Cookbook) [Tolke 2005].
Figure 7.3 A screenshot of ArgoUML
7.5 Findings
This section presents the findings from the evaluation. Comprehensive details are given in
the evaluation logbook in Appendix G.
7.5.1 Finding 1
Finding: VANESSA was able to address the majority of the tasks in the original study.
Justification: VANESSA’s performance of 79% in the replication
In the original study the performance of six visualisation tools was compared by assessing
their performance in addressing typical comprehension tasks in JHotDraw. This study
revealed that different tools were capable of answering different questions, and led to the
hypothesis that a tool that made use of structural and behavioural information in combination
with abstraction should be able to address the majority of these typical comprehension
questions. In order to investigate if this goal has been achieved, a replication of the original
study was performed using VANESSA9.
Eight use cases were identified by the system expert:
1. Add a new figure
2. Animate the drawing
3. Delete an existing figure
4. Edit an existing figure
5. Load or save a drawing
6. Print the drawing
7. Select an existing figure
8. Select a tool
The trace consisted of starting the JavaDrawApp JHotDraw application, executing these use
cases, then exiting the application. The mappings used were those provided by the system
expert in the original study.
9 As a replication was being performed, the original question sets from the original study described in Section 3.2 and 3.3 were used, not the revised question sets used in the remainder of this evaluation.
Here some examples of how VANESSA was used to answer the replication questions are
presented. The full replication is detailed in the evaluation logbook in Appendix G.
Large scale question 1: What is the static structure of the software system?
Level 2 of the Structure hierarchy illustrates inter-class information. To answer this question
the S2 view10 is generated, part of which is shown in Figure 7.4. As explained earlier, the
information in the diagrams presented in this evaluation could also be presented
conveniently in other diagram formats, such as UML.
Figure 7.4 A part of the S2 view of JavaDrawApp
Large scale question 4: How do the high-level components of the software system interact?
Level 3 of the Behaviour hierarchy illustrates component interaction information. To answer
this question the B3 view shown in Figure 7.5 is generated.
10 Individual views are referenced by the initial letter of their containing hierarchy, followed by their abstraction level. Thus, ‘S2’ refers to the view at level 2 of the Structure hierarchy (inter-class structure). Combinations are denoted by listing the views that they comprise, delimited by a slash. The ordering is not significant, but for sake of consistency the convention adopted here is to list the less abstract view first; where a combination consists of two views from different hierarchies, the Structure view is listed first. Thus, an S2/B3 combination refers to the combination of the level 2 Structure and level 3 Behaviour views. The word ‘custom’ is prepended to the name of a view or combination that has been focussed – i.e. does not contain all of the entities and relationships contained in the view(s) from which it is derived. Thus, a custom S1 view would be a subset of the level 1 Structure view.
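The naming convention in this footnote can be made concrete with a small illustrative parser. This is not part of VANESSA; the `ViewName` and `ViewRef` types are hypothetical.

```java
import java.util.*;

// Illustrative parser for the view-naming convention used in this chapter:
// 'S2' is level 2 of the Structure hierarchy, 'B3' is level 3 of the
// Behaviour hierarchy, 'S2/B3' is their combination, and a 'custom '
// prefix marks a focussed (subset) view.
class ViewName {
    record ViewRef(char hierarchy, int level) {}   // 'S' or 'B', levels 1-5

    final boolean custom;
    final List<ViewRef> views = new ArrayList<>();

    ViewName(String name) {
        custom = name.startsWith("custom ");
        String body = custom ? name.substring("custom ".length()) : name;
        for (String part : body.split("/")) {
            views.add(new ViewRef(part.charAt(0),
                                  Integer.parseInt(part.substring(1))));
        }
    }
}
```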
Figure 7.5 The B3 view for JavaDrawApp
Small scale question 6: When debugging a JHotDraw application, it may be important to
examine the internal state of objects in the diagram. For example, in a class diagram
application, a Figure object representing a class would contain references to the attributes,
operations, and associations of the class it represents. In order to extract such information,
it is necessary to investigate the way in which an object’s state changes during the course of
an execution.
A B1 diagram is generated showing how RoundRectangleFigure changes when it is resized,
shown in Figure 7.6. It can be seen from this figure that
RoundRectangleFigure(id=1918).fDisplayBox changes value from Rectangle(id=1951) to
Rectangle(id=1960) after a call to RoundRectangleFigure.basicDisplayBox(Point, Point)
changes the display box, thus setting the new dimensions of the figure.
Figure 7.6 The custom B1 view of RoundRectangleFigure
7.5.1.1 Replication summary
Using VANESSA it was possible to answer 11 of the 14 questions. Of the three questions that could not be answered, two (detecting design patterns and hotspots) could not be answered by any of the tools in the original study. The remaining question (pattern detection) was answered by only one of the tools in the original study.
VANESSA’s large-scale performance was 6/9; the best result from the original study was
4/9, while the average was less than 3/9. VANESSA’s small-scale performance was 5/5; the
best result from the original study was 5/5 (one tool achieved this), while the average was
less than 3/5. This results in an overall performance of 79% for VANESSA; the best result
from the original study was 53%, while the average was 37%.
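The overall percentage can be checked directly from the large-scale and small-scale scores:

```java
// Quick check of the overall replication score reported above:
// 6 of 9 large-scale plus 5 of 5 small-scale questions answered.
public class ReplicationScore {
    public static void main(String[] args) {
        int answered = 6 + 5;          // large-scale + small-scale successes
        int total = 9 + 5;             // questions attempted
        long pct = Math.round(100.0 * answered / total);
        System.out.println(pct + "%"); // prints 79%
    }
}
```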
It is concluded from these results that the original proposal that a tool combining structural
and behavioural information with abstraction would be able to answer almost all of the case
study questions was indeed correct. This result supports the hypothesis that such a tool
should be able to address the full range of software comprehension questions. The evaluation
in the remainder of this section investigates this hypothesis further.
7.5.2 Finding 2
Finding: The VANESSA diagrams are correct and complete
Justification: The VANESSA diagrams are compared with diagrams from other authoritative
sources, i.e. diagrams produced by Together and by the system experts and from the system
documentation. Where there are differences, they are explained. VANESSA often uncovers
information missed by the expert, or conversely highlights errors in the expert diagram.
This evidence is taken from the comparison of VANESSA’s diagrams with the expert’s
diagrams. The second BeautyJ documentation diagram is entitled Main Classes and
illustrates structural information about the main classes of the application (Figure 7.7).
Figure 7.7 The BeautyJ documentation main classes diagram
The VANESSA custom S2/S3 combination showing only these classes (Figure 7.8) matches the
documentation diagram (it is assumed that the interface SourcletOption in the expert’s
diagram is a typo for SourcletOptions), with one exception: the VANESSA diagram does not
include an association Sourclet → SourcletOptions. Inspecting the JavaDoc for these interfaces shows that there is indeed no structural dependency Sourclet → SourcletOptions;
the only structural interaction between these interfaces is that Sourclet.init() takes an
argument of type SourcletOptions. VANESSA does not consider method arguments when
forming relationships. It could be that the expert has chosen to create a dependency due to
this argument, though this is not normal practice in UML.
Figure 7.8 A custom S2/S3 view of BeautyJ
The Together class diagram for shrimp.DisplayBean (Figure 7.9) is compared to the
corresponding VANESSA custom S2 view (Figure 7.10). The diagrams correspond entirely.
Figure 7.9 The Together class diagram for the shrimp.DisplayBean package
Figure 7.10 The custom S2 view of the shrimp.DisplayBean package
7.5.3 Finding 3
Finding: The range of abstraction levels was comprehensive and useful for addressing the
tasks
Justification: It was useful to be able to examine the systems at a level of abstraction
appropriate to the task
When examining JHotDraw, level 5 information was used to compare VANESSA’s
interpretation of the system’s business level behaviour with that of the expert. The expert’s
use case diagram is shown in Figure 7.11. The expert’s diagram is compared with the
VANESSA level 5 structural and behavioural diagrams (Figure 7.12). It is clear from these
diagrams that the VANESSA model contains more relationships than the expert’s model,
probably due to the expert’s model being a more idealised view of the system as it was
designed, whereas the VANESSA model is more precise.
Figure 7.11 The JHotDraw expert’s use case diagram
Figure 7.12 The S5 and B5 views of JavaDrawApp
It is interesting to note that the S5 and B5 VANESSA diagrams are identical and that both
are totally connected – i.e. every business entity has both structural and behavioural
dependencies on every other business entity. This may indicate that the concept of business
entities defined here as being entities derived by abstraction from basic software entities does
not accord with the conventional notion of a use case as accepted by most analysts. Another
possibility is that the component-use case mappings supplied by the JHotDraw expert were
too liberal in that they mapped each class to many components. A further possibility is that
this is a quirk of frameworks in general or of JHotDraw in particular.
Level 4 information was not used as none of the four systems examined were distributed,
either logically (across JVMs) or physically (across processors).
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?” level 3 information was
used. The expert’s first diagram for this question illustrates static behavioural relationships
(Figure 7.13). The B3 diagram is generated to show behavioural relationships between
components (Figure 7.14).
Figure 7.13 The expert’s diagram of the static behavioural relationships between the BeautyJ
components
The VANESSA diagram matches the expert's diagram, except that the BeautyJ → AMODA relationship is reversed and there is an extra communication from SourceParser → Sourclet.
Some discrepancies are to be expected as the VANESSA diagram is based on dynamic
information. (One of the expert’s subsequent diagrams confirms that there should indeed be
a relationship from AMODA → BeautyJ as there is in the VANESSA diagram.)
Figure 7.14 The B3 view of BeautyJ
Level 2 information was used extensively in the analysis of ArgoUML, such as in addressing
the tasks in Cookbook section 5.2. For example, in addressing the task “How do I create a
new critique?”, the likely-sounding cognitive.critics package was investigated first by
generating a custom S2 view (Figure 7.15).
Figure 7.15 The custom S2 view of the cognitive.critics package
This package contains the Critic class and the CompoundCritic class which extends Critic.
However, it does not appear to contain any actual critics, which it is assumed would extend
one of these classes and have an appropriate name. As we are dealing with UML, the uml
package is examined and it is found that it also contains a cognitive.critics package. This
package is investigated by generating a custom S2 view. From this, we find that this package
contains 92 classes with names in the form CrXXXX, where XXXX is a potential problem
with a UML diagram, for example CrCircularInheritance, CrEmptyPackage, CrIllegalName.
All of these classes extend CrUML, which in turn extends cognitive.critics.Critic. To add a new critic for UML, a new class extending CrUML would be created in uml.cognitive.critics and called, say, CrMyCritic to comply with the naming convention.
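A skeleton of such a critic might look as follows. This is a sketch only: CrUML's actual interface is not visible in the views, so a minimal stub stands in for it here, and the predicate method shown is assumed rather than taken from ArgoUML.

```java
// Stub standing in for ArgoUML's uml.cognitive.critics.CrUML, whose
// real interface is not shown in the VANESSA views.
abstract class CrUML {
    abstract boolean critique(Object designMaterial);
}

// A new critic follows the CrXXXX naming convention and extends CrUML.
// The class body and the check performed here are hypothetical.
class CrMyCritic extends CrUML {
    @Override
    boolean critique(Object designMaterial) {
        // Hypothetical trigger condition; a real critic would inspect
        // the UML model element it is given.
        return designMaterial == null;
    }
}
```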
Level 1 Behaviour information was used to examine the state changes occurring in
JHotDraw when addressing the following task (see Figure 7.6 in Finding 1, above).
“When debugging a JHotDraw application, it may be important to examine the internal state
of objects in the diagram. For example, in a class diagram application, a Figure object
representing a class would contain references to the attributes, operations, and associations
of the class it represents. In order to extract such information, it is necessary to investigate
the way in which an object’s state changes during the course of an execution.”
Level 1 Structure information was used to investigate the functionality of individual classes
in ArgoUML when addressing the task “How do I create a pluggable diagram?”.
application.api contains an interface called PluggableDiagram, for which a custom S1 view
is generated (Figure 7.16).
Figure 7.16 The custom S1 view of PluggableDiagram
This diagram shows that PluggableDiagram contains one method: JMenuItem
getDiagramMenuItem(). The S2 view of ArgoUML shows that PluggableDiagram is
implemented by only one class: DiagramHelper. DiagramHelper extends ui.ArgoDiagram. It
appears that pluggable diagrams are diagram classes that implement PluggableDiagram.
Therefore, the new class would extend an appropriate parent class, such as
uml.diagram.ui.UMLDiagram for a new type of UML diagram, and also implement
PluggableDiagram.
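Based on the structure inferred above, a new pluggable diagram type might be sketched as follows. The getDiagramMenuItem() signature is taken from the S1 view; the UMLDiagram stub and the menu item label are assumptions made for illustration.

```java
import javax.swing.JMenuItem;

// Stub standing in for ArgoUML's uml.diagram.ui.UMLDiagram parent class.
abstract class UMLDiagram {}

// Interface signature taken from the S1 view of
// application.api.PluggableDiagram.
interface PluggableDiagram {
    JMenuItem getDiagramMenuItem();
}

// Hypothetical new diagram type: extends a diagram parent class and
// implements PluggableDiagram, as the views suggest.
class MyPluggableDiagram extends UMLDiagram implements PluggableDiagram {
    @Override
    public JMenuItem getDiagramMenuItem() {
        return new JMenuItem("My Diagram");   // label is an assumption
    }
}
```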
7.5.4 Finding 4
Finding: The ability to navigate between abstraction levels was useful in addressing the tasks
Justification: It was useful to be able to drill down to show more detail, and to move up the
abstraction hierarchy to examine context
In examining BeautyJ, drilling down was used from the level 3 diagram showing the
component behaviour generated while addressing the task “How are BeautyJ's main
components interfaced to each other statically, and how do they interact dynamically at
runtime?” (see Figure 7.14 in Finding 3, above) to show the class structure (level 2) of the
javasource package (Figure 7.17) in order to address the task “Which steps need to be taken
to make BeautyJ capable of handling new features of the Java 1.5 language?”. It is apparent
from the S2 view of BeautyJ that the classes of the util.javasource package are used to
represent parsed Java code in BeautyJ; hence, changes to the language would be
accommodated in this package.
Figure 7.17 A custom S2 view of BeautyJ
In the case of ArgoUML, the analyst moved up the abstraction hierarchy from the S1 view of
application.api.PluggableDiagram generated while addressing the task “How do I create a
pluggable diagram?” (see Figure 7.16 in Finding 3, above) to the S2 view of the
application.api package (Figure 7.18) in order to address the task “How do I create a new
pluggable type?”.
Figure 7.18 The custom S2 view of the Pluggable types from application.api
Figure 7.18 illustrates the Pluggable interface and eight classes that implement it. It appears
that to create a new pluggable type a new class that implements Pluggable must be created,
called say PluggableNew to comply with the naming conventions. The cookbook explains
that calls must also be added to the new pluggable type in the context in which it is to be
used – this involves editing method internals. This would only be known to someone with an
in-depth understanding of the system.
7.5.5 Finding 5
Finding: The ability to combine abstraction levels was useful in addressing the tasks
Justification: It was useful to be able to display more detailed or contextual information in a
single view
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?” combined S2/S3
diagrams were used to illustrate the class and component dependencies (see Figure 7.8 in
Finding 2, above).
7.5.6 Finding 6
Finding: The structural and behavioural facets were useful in addressing the tasks
Justification: It was useful to be able to focus the analysis on the relevant facet of the system.
In the analysis of ArgoUML, it was useful to be able to focus the analysis on the structural
facet as the tasks mainly involved investigating the existing functionality available, rather
than the system’s behaviour. For example, when comparing the VANESSA diagram to the
expert’s diagram to explore the concept of multi editor panes, a custom S2 view of the
relevant classes was used (Figure 7.19).
Figure 7.19 The custom S2 view of the multi editor pane classes
In addressing the BeautyJ task “How are textual fragments of Javadoc documentation
automatically generated by the StandardSourclet?”, it was useful to be able to focus the
analysis on the behavioural facet, as we were concerned solely with the existing behaviour of
the system as implemented. The VANESSA custom B2 view of StandardSourclet is shown
in Figure 7.20. This figure shows the methods executed by StandardSourclet during the
output generation process, and the order in which they occur. It would be trivial to present
this information in another form, such as a sequence diagram.
Figure 7.20 The custom B2 view of StandardSourclet’s interactions
7.5.7 Finding 7
Finding: The ability to combine facets was useful in addressing the tasks
Justification: It was useful to be able to show both structural and behavioural
information in the same diagram
In addressing the BeautyJ task “How are BeautyJ's main components interfaced to each
other statically, and how do they interact dynamically at runtime?”, level 3 information from
both facets was combined to illustrate the structural and behavioural interactions between the
components of BeautyJ (Figure 7.21).
Figure 7.21 The combined S3/B3 view of BeautyJ
7.5.8 Finding 8
Finding: The use of static and dynamic analyses was useful in addressing the tasks
Justification: It was useful to have information about the entire system from static analysis,
and more detailed information about specific parts of interest from dynamic analysis
The statically generated VANESSA S2 view, generated when addressing the BeautyJ task
“Which steps need to be taken to make BeautyJ capable of handling new features of the Java
1.5 language?” (see Figure 7.17 in Finding 4, above) shows the de.gulden.util.javasource
package from BeautyJ. This is useful as it allows the entire functionality available to be
examined. In contrast, when comparing the VANESSA and Together diagrams for SHriMP,
it can be seen that the VANESSA B2 view for GXLPersistentStorageBean.loadData()
(Figure 7.23) is more precise than the corresponding sequence diagram produced by
Together (Figure 7.22), in that it refers to the specific objects accessed during the execution,
rather than to their superclasses as the static Together diagram does. The VANESSA
diagram matches the Together diagram except for two missing calls: the calls from
GXLPersistentStorageBean to GXLPersistentStorageBean.getNextID() and to
GenericRigiArc.setCustomizedData() are absent, as they are contained within conditionals
that were evidently not executed in the trace. This is useful
as the dynamic diagram can be used to see which parts of the static diagram are executed in
the trace.
Figure 7.22 The Together sequence diagram for GXLPersistentStorageBean.loadData()
Figure 7.23 A custom B2 view of SHriMP
7.5.9 Finding 9
Finding: More diagram types may be useful
Justification: Some information is better displayed using diagrams other than basic graphs
Tasks that would benefit from a more explicit ordering of the answer, such as the following
JHotDraw task, would be better expressed using a representation ordered explicitly by time,
such as a sequence diagram:
“A common problem in JHotDraw applications is the display not being updated as desired
when a change is made to the model. For example, attempting to move a box (Figure) in an
organisation chart application may not be reflected in the display. To understand this
problem, it is necessary to investigate the redraw mechanism of JHotDraw. The redraw
mechanism is an interaction consisting of a sequence of object collaborations.”
Part of the VANESSA B2 view is shown in Figure 7.24. In this diagram, time is indicated by
the numbers after the method calls, which indicate their ordering. As the ordering of the
edges in the diagram is arbitrary (optimised to avoid node/edge overlap, point edges in the
same direction, minimise edge crossings, and reduce edge length), the order in which the
method invocations occurred is not immediately obvious. In a time-ordered diagram, the
edges would be drawn in order of their numbers, thus making the ordering obvious to the
viewer. This would, of course, be an alternative to the present layout and as such would not
feature the overlap-avoidance, monodirectionality, and minimal edge crossings and edge
lengths of the current layout. Hence, it is important to select a layout that is appropriate both
for the data being displayed and the purpose for which the visualisation is being produced.
The generic nature of the model and the modular implementation of VANESSA make
plugging in alternative visualisations, such as UML diagrams, a straightforward process.
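The time-ordering described above can be recovered from the numbered edges by a simple sort; the edge representation and method names below are illustrative simplifications, not VANESSA's internal model or the actual JHotDraw trace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A hypothetical call edge as it might appear in a B2 view: caller, callee,
// and the sequence number attached to each method call.
public class TimeOrderedView {
    record CallEdge(String caller, String callee, int seq) {}

    // Sorting on the sequence number recovers the temporal ordering that an
    // arbitrary graph layout obscures.
    static List<CallEdge> timeOrdered(List<CallEdge> edges) {
        List<CallEdge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingInt(CallEdge::seq));
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative redraw-style interactions, deliberately out of order.
        List<CallEdge> edges = List.of(
            new CallEdge("DrawingView", "Figure.moveBy", 2),
            new CallEdge("Figure", "FigureChangeListener.figureChanged", 3),
            new CallEdge("StandardDrawing", "DrawingView.repairDamage", 1));
        for (CallEdge e : timeOrdered(edges)) {
            System.out.println(e.seq() + ": " + e.caller() + " -> " + e.callee());
        }
    }
}
```

A sequence-diagram renderer plugged into the model would essentially perform this sort before laying out the lifelines.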
Figure 7.24 A part of the custom B2 view
7.5.10 Finding 10
Finding: Levels 1-3 were used most commonly. Level 5 was used less often. Level 4 was not
used.
Justification: The frequency of employing each level in addressing the tasks.
This is apparent from the detail in the evaluation logbook in Appendix G. Table 7.1
quantifies this.
Table 7.1 Instances of usage of each of the five abstraction levels of the model in addressing the
comprehension tasks
Instances of usage11

Abstraction level   Replication12   Comprehension tasks13   Diagram comparisons14   Total
5                   -               -                       1                       1
4                   -               -                       -                       -
3                   2               3                       8                       13
2                   7               26                      12                      45
1                   2               19                      -                       21
Figure 7.25 illustrates these results.
[Bar chart: number of tasks (0-50) by abstraction level (1-5), with series for Replication, Comprehension tasks, Diagram comparisons, and Total]
Figure 7.25 Illustration of abstraction level usage
11 Count of how many tasks each abstraction level was useful in addressing. Hence, where an abstraction level was used more than once in addressing a particular task, this contributes only one to its score. Where a combination was used, this contributes one to the score of each participating view.
12 The replication of the original study using JHotDraw.
13 The comprehension tasks provided by the system experts in the case of BeautyJ and SHriMP, and from the Cookbook for ArgoUML.
14 The diagram comparisons of the VANESSA diagrams with diagrams from the authoritative sources (documentation or system expert); the comparisons with Together diagrams are counted in the ‘Comprehension tasks’ column as these pertain specifically to tasks G1 and G2.
7.5.11 Finding 11
Finding: The model proposed is capable of addressing the full range of software
comprehension tasks.
Justification: VANESSA was used successfully to address tasks representative of the full
range of software comprehension tasks.
VANESSA was able to address all of the comprehension tasks posed by the system experts
(the Cookbook in the case of ArgoUML) and by the diagrams from the documentation and
Together, except for two ArgoUML tasks concerning a non-code artefact.
The categorisation of the evaluation questions according to the typical software
comprehension questions identified earlier is given in the following tables.
Table 7.2 Categorisation of BeautyJ evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Expert 1 G4
Expert 2 G7
Expert 3 S6
Expert 4 S1, S4
Expert 5 S4
Documentation 1 G3
Documentation 2 G1
Documentation 3 G1
Documentation 4 G1
Together 1 G1
Together 2 G2
This table shows that the set of tasks used in the BeautyJ evaluation exercises almost all of
the typical comprehension questions. Only G5 (identifying patterns of repeated behaviour,
which VANESSA does not support) and G6 (component runtime load, which VANESSA does
support) were not present.
Table 7.3 Categorisation of SHriMP evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Expert 1 G3/G4
Expert 2 S4
Together 1 G1
Together 2 G2
This table shows that half of the typical software comprehension questions were addressed
by the tasks performed in the SHriMP evaluation. This was due to the low number of
questions, which was due primarily to the lack of documentation similar to that provided for
the other three systems and the time constraints of the system expert.
Table 7.4 Categorisation of ArgoUML evaluation questions by typical software comprehension
questions
Evaluation question Typical question
Cookbook 4.4 G3/G4
Cookbook 4.5 G3/G4
Cookbook 4.6 G3/G4
Cookbook 5.1.3.2.1 S4
Cookbook 5.1.3.2.2 S4
Cookbook 5.1.3.2.3 S4
Cookbook 5.1.5 i S4
Cookbook 5.1.5 ii S4
Cookbook 5.1.5 iii S4
Cookbook 5.2.1 G1
Cookbook 5.2.2 i S4
Cookbook 5.2.2 ii S4
Cookbook 5.2.3 G1
Cookbook 5.3.1 G1
Cookbook 5.3.1.1 S4
Cookbook 5.3.2 S4
Cookbook 5.3.3 S4
Cookbook 5.4.1 S4
Cookbook 5.9 G3/G4
Cookbook 5.11.2.1 i S4
Cookbook 5.11.2.1 ii S4
Cookbook 5.17.4 S4
Cookbook 6.1.1 i S4
Cookbook 6.1.1 ii S4
Cookbook 6.2.2.6 S4
Cookbook 6.2.3.2 i S4
Cookbook 6.2.3.2 ii S4
Cookbook 6.2.3.2 iii S4
Together 1 G1
Together 2 G2
This table shows that the comprehension questions in the ArgoUML study exercise half of
the typical software comprehension questions. The focus on S4 (“Where is the functionality
required to implement a solution located in the software system?”) questions is to be
expected given that the source of the questions was a developers’ handbook intended to
explain how to reuse, extend, and interoperate with the application.
7.5.12 Miscellaneous issues
The VANESSA diagrams were verified by comparison with diagrams from a reliable source
(Together). VANESSA uncovered multiple errors in the expert-supplied information for the
two most thoroughly-analysed systems (BeautyJ and ArgoUML). In addition to these errors,
there were discrepancies between some diagrams that were not errors on the part of
VANESSA or the expert. In these cases, the expert had chosen to represent a simpler, purer,
or ‘as-designed’ version of the system, while VANESSA represents a more detailed view
with all information that describes the system as implemented. This raises the question: is it
more useful to be precise or to highlight the important relationships? Examples include the
representation of arguments as associations and the omission of minor relationships for
BeautyJ, discrepancies in the component diagrams for the other systems, and discrepancies in
the use case diagram for JHotDraw. As only a human can properly decide what is important,
it is better to be precise and then allow the human analyst to abstract further by generating
their own custom diagrams for their own particular purposes, rather than automatically
making potentially misleading assumptions.
In addition to the types of discrepancies described above, there were also some discrepancies
that were attributable to the greater precision of the VANESSA behavioural diagrams due to
their basis on dynamic rather than static information.
Some diagram discrepancies were due to the use of unparameterised collections – the system
expert would indicate a composition between the class holding the reference and the class
that he intends to be stored in that collection. However, in practice any type of object can be
stored in an unparameterised collection, and VANESSA quite correctly represents such
dependencies as a composition between the class holding the reference and the collection
class. If parameterised collections were used VANESSA would indicate a dependency to the
parameterised type.
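The distinction can be illustrated with a minimal example: with a raw collection the only type a static analysis can soundly report is the collection class itself, whereas a parameterised declaration exposes the element type. The class names here are illustrative, not taken from the evaluated systems.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionDependencies {
    static class SourceClass {}

    // With a raw List, any object may legally be stored, so a static
    // analysis can only report a dependency on the collection class.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawHolder() {
        List contents = new ArrayList();
        contents.add(new SourceClass());
        contents.add("not a SourceClass");   // legal: no element type declared
        return contents;
    }

    // With a parameterised List<SourceClass>, the element type is part of
    // the declaration, so the dependency on SourceClass is statically visible.
    static List<SourceClass> typedHolder() {
        List<SourceClass> contents = new ArrayList<>();
        contents.add(new SourceClass());
        // contents.add("not a SourceClass");  // would not compile
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(rawHolder().size());    // mixed element types
        System.out.println(typedHolder().size());
    }
}
```

This is why VANESSA's report of a composition with the collection class, rather than with the expert's intended element type, is correct for unparameterised code.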
As is the practice in UML, VANESSA does not represent method arguments as
dependencies between classes (only the BeautyJ expert chose to do so). This is also the
approach taken by Together.
There is occasional uncertainty in attributing the precise reason for some discrepancies
between the VANESSA and expert diagrams. This is inherent in the approach as one cannot
predict what the expert intended when he committed the error. The most likely explanation
for the discrepancy is offered.
Javadoc was used only to verify hypotheses, not to gather new information about the
systems. Explanatory comments in the Javadoc from the developers were not referred to – it
was used solely as an alternative to browsing the source code directly in order to verify
results (e.g. check the existence of a particular instance variable, ensure VANESSA has
covered all available methods).
A potential weakness in the VANESSA approach is the dependence on expert-supplied
mappings for some systems. This was particularly evident in the analysis of ArgoUML when
only partial class-component mappings were available – the information in the Cookbook
was incomplete and did not indicate which classes should be mapped to a number of
components. This issue can often be circumvented by determining the mappings from the
package structure, as was done for BeautyJ, or source code naming conventions.
Component-use case mappings can always be determined by tracing individual use cases.
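The package-based workaround can be sketched as a simple longest-prefix mapping. The package prefixes and component names below are illustrative only; they are not the actual BeautyJ class-component mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch of deriving class-to-component mappings from the package
// structure when no expert-supplied mapping is available.
public class PackageMapping {
    static final Map<String, String> COMPONENT_BY_PREFIX = new LinkedHashMap<>();
    static {
        // Hypothetical prefix-to-component assignments.
        COMPONENT_BY_PREFIX.put("de.gulden.util.javasource", "Parser");
        COMPONENT_BY_PREFIX.put("de.gulden.application", "Application");
    }

    // The longest matching package prefix wins, so subpackages inherit their
    // parent's component unless a more specific mapping is given.
    static String componentOf(String className) {
        String best = "Unmapped";
        int bestLen = -1;
        for (Map.Entry<String, String> e : COMPONENT_BY_PREFIX.entrySet()) {
            if (className.startsWith(e.getKey() + ".") && e.getKey().length() > bestLen) {
                best = e.getValue();
                bestLen = e.getKey().length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(componentOf("de.gulden.util.javasource.SourceParser"));
    }
}
```

Classes that fall under no known prefix remain unmapped, which mirrors the incomplete-mapping problem encountered with the ArgoUML Cookbook.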
7.5.13 Conclusions
The first conclusion that can be drawn from this evaluation is that it confirms the
conclusion from the original study described in Chapter 3: a model combining a range of
abstraction levels, structural and behavioural views, and statically and dynamically extracted
information is useful in addressing almost all of the typical software comprehension
tasks in that study. This was ratified through the replication of the original study.
The second conclusion that can be stated, indeed the principal goal of this evaluation, is that
the research hypothesis stated in Section 4.2 has been validated: the model provides
effective support for the full range of software comprehension tasks. This has been
demonstrated through the evaluation of the model using four real world systems of various
types and sizes and real comprehension questions supplied by real life software maintainers.
Furthermore, these questions exercise the full range of typical software comprehension tasks
identified in Chapter 5.
8 Conclusions
“We do not need to have an infinity of different machines doing different jobs. A single one
will suffice”
A M Turing [Turing 1948]
8.1 Summary
This thesis has presented a novel software visualisation model consisting of multiple levels
of abstraction, structural and behavioural perspectives, and the integration of statically and
dynamically extracted information that addresses the full range of software comprehension
tasks. Related work in the fields of software visualisation, tool evaluation, abstraction,
diagrams, views, exploration and querying, metamodels, and software modelling was
discussed. An initial case study that prompted the development of the novel model was
described. The model was introduced and assessed theoretically against its original goals,
and its support for software comprehension strategies was examined. Abstraction operations
between views in the model and the combination of views were defined formally. A
demonstration of the application of the model to a real system was presented. VANESSA, a
tool implementation of the model, was introduced. VANESSA was then used to evaluate the
utility of the model in addressing typical software comprehension tasks in real world
software systems. This section draws conclusions from the thesis and presents suggestions
for future work.
8.2 Conclusions
The foregoing evaluation has demonstrated the veracity of the hypothesis stated in this
thesis, namely that:
A model that supports visualisation of software through a range of abstraction levels
that incorporate structural and behavioural views and integrates statically and
dynamically extracted information provides effective support for the full range of
software comprehension tasks.
The contributions of this thesis are as follows: an abstraction scale and set of criteria for
classifying software comprehension tools; a thorough review and comparison of the extant
software visualisation tools; typical software comprehension activities and tasks to be used
in the evaluation of software comprehension tools; a schema for categorising view
arrangements in software engineering tools; the findings of an initial study assessing the
capabilities of the extant software visualisation tools using typical software comprehension
tasks; the novel software visualisation model based on a range of abstraction levels and
structural and behavioural perspectives; a prototype implementation of the model as the
VANESSA tool; and the findings of the evaluation of this model using real software
comprehension tasks and real systems.
8.3 Future work
The use of the typical comprehension questions defined in this thesis in future evaluations
would provide a common basis for real-world evaluation, and hence allow the results of such
studies to be compared objectively. Such studies would also validate the utility of the
question sets. Further empirical studies to compare software visualisation tools, like the
initial study described in Chapter 3 of this thesis, would provide useful information on the
relative merits of such tools.
Further evaluation of the model with additional systems would provide more evidence
regarding its support for software comprehension. Empirical studies with users would
provide a perspective on how easy analysts find it to use the model to achieve their
comprehension goals. An evaluation of the novel model involving industrial users would
provide valuable data, as the motivation for this work was the lack of use of software
visualisation in industry. It would also be interesting to investigate in more detail which
software comprehension strategies are best supported by the model.
A further evaluation possibility would be to evaluate VANESSA using even larger systems,
such as Eclipse (11,548 types), to assess how well the benefits to comprehension
demonstrated in the evaluation scale to the very largest systems. As with the model itself, an
evaluation employing real users would provide an insight into how usable the tool is in
practice for those who are unfamiliar with it.
A number of enhancements could be made to VANESSA to expand its functionality. Firstly,
the model could be stored on disk in a database, rather than entirely in memory, to improve
performance and aid scalability. The Metadata Repository format would be suitable for this
task [MDR 2005]. Use of this repository would also allow VANESSA to exchange models in
a standard XML format. Secondly, reading and writing traces in a standard (compressed)
format would allow further interoperability and aid scalability. Thirdly, the implementation
of new diagram types, such as UML and SHriMP views, would provide alternative views of
the model data. The model was designed from the start to allow alternative views to be
plugged in. The implementation of VANESSA makes it simple to plug in such alternative
views. Fourthly, integrating the graph rendering functionality into VANESSA, rather than
relying on external viewers such as dotty, would improve user interaction with the graphs
and simplify navigation of the model. A framework such as JUNG [O’Madadhain 2005] or
prefuse [Heer 2004] would be useful for this purpose. Fifthly, incorporating some standard
pattern detection algorithms into VANESSA would allow it to detect common patterns in the
model and exploit these with appropriate visual cues (cf. Jinsight). The addition of pattern
detection to the current implementation would simply be a matter of implementing standard
algorithms to operate on the model data, which is conveniently accessible.
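As a minimal illustration of the kind of standard algorithm meant here (not Jinsight's actual technique), the following sketch counts caller-callee pairs in an execution trace and reports those that repeat; the trace format is a hypothetical simplification of the model data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A minimal sketch of trace-based pattern detection: count how often each
// caller -> callee pair occurs in a trace and report the repeated ones,
// which a visualisation could then highlight with appropriate visual cues.
public class RepeatDetector {
    static Map<String, Integer> repeatedCalls(List<String> trace) {
        Map<String, Integer> counts = new HashMap<>();
        for (String call : trace) {
            counts.merge(call, 1, Integer::sum);
        }
        // Keep only the pairs that occur more than once.
        Map<String, Integer> repeats = new LinkedHashMap<>();
        counts.forEach((call, n) -> { if (n > 1) repeats.put(call, n); });
        return repeats;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
            "A->B.init", "A->C.load", "A->C.load", "B->C.load", "A->C.load");
        System.out.println(repeatedCalls(trace)); // {A->C.load=3}
    }
}
```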
One of the principal barriers to the uptake of state of the art software engineering tools in
industry is lack of integration with a common development environment. The motivation for
this research was the lack of use of software visualisation in industry despite its apparent
benefits, which have been demonstrated in this thesis. A potential enhancement to
VANESSA to encourage usage would be integration with a standard environment, such as
Eclipse [Eclipse 2005].
References
[Addanki 2001] S Addanki, R Cremonini, J Penberthy, ‘Graphs of models’, Artificial
Intelligence, 51:145-177, 1991
[Aho 1986] A V Aho, R Sethi, J D Ullman, Compilers: Principles, Techniques,
and Tools, Reading, MA: Addison-Wesley, 1986
[Alexandridis 1986] N A Alexandridis, ‘Adaptable hardware and software: problems and
solutions’, IEEE Computer, 19(2):29-39, 1986
[ANSI 1998] ANSI, Information Systems – Database Language – SQL,
Document # ANSI INCITS 135-1992 (R1998), Washington,
DC: ANSI, 1998
[Apache 2004] Apache Software Foundation, Xalan-Java,
http://xml.apache.org/xalan-j/, 2004
[ArgoUML 2005] ArgoUML team, ArgoUML project home, http://argouml.tigris.org/,
2005
[Armstrong 1998] M N Armstrong, C Trudeau, ‘Evaluating architectural extractors’ in
Proceedings of the 5th Working Conference on Reverse Engineering
(WCRE), Honolulu, HI, pp. 30-39, Los Alamitos, CA: IEEE
Computer Society Press, 1998
[Arnold 2000] K Arnold, J Gosling, D Holmes, The Java Programming Language,
3rd edition, Boston, MA: Addison Wesley, 2000
[Arnold 2003] M Arnold, W De Pauw, ‘Websight: visualizing the execution of web
services’, demonstration at 1st ACM Symposium on Software
Visualization, San Diego, CA, 2003
[Baker 1994] M J Baker, S G Eick, ‘Visualizing software systems’ in Proceedings
of the 16th International Conference on Software Engineering
(ICSE), Sorrento, pp. 59-67, Los Alamitos, CA: IEEE Computer
Society Press, 1994
[Baldonado 2000] M Q W Baldonado, A Woodruff, A Kuchinsky, ‘Guidelines for
using multiple views in information visualization’ in Proceedings of
the Working Conference on Advanced Visual Interfaces (AVI),
Palermo, pp. 110-119, New York, NY: ACM Press, 2000
[Ball 1994] T Ball, S G Eick, ‘Visualizing program slices’ in Proceedings of the
IEEE Computer Society Symposium on Visual Languages (VL), St.
Louis, MO, pp. 288-295, Los Alamitos, CA: IEEE Computer
Society Press, 1994
[Ball 1996] T Ball, S G Eick, ‘Software visualization in the large’, IEEE
Computer, 29(4):33-43, 1996
[Basili 1996] V Basili, ‘Editorial’, Empirical Software Engineering, 1(2):105-108,
1996
[Bassil 2001a] S Bassil, R K Keller, ‘Software visualization tools: survey and
analysis’ in Proceedings of the 9th International Workshop on
Program Comprehension (IWPC), Toronto, ON, pp. 7-17, Los
Alamitos, CA: IEEE Computer Society Press, 2001
[Bassil 2001b] S Bassil, R K Keller, ‘A qualitative and quantitative evaluation of
software visualization tools’ in Proceedings of the Workshop on
Software Visualization, 23rd International Conference on Software
Engineering (ICSE), Toronto, ON, pp. 33-37, Los Alamitos, CA:
IEEE Computer Society Press, 2001
[Beck 1994] K Beck, R Johnson, ‘Patterns generate architectures’, in
Proceedings of the 8th European Conference on Object-Oriented
Programming (ECOOP), Bologna, Lecture Notes in Computer
Science 821, pp. 139-149, Berlin: Springer-Verlag, 1994
[Becker 1995] R A Becker, S G Eick, A R Wilks, ‘Visualizing network data’, IEEE
Transactions on Visualization and Computer Graphics, 1(1):16-28,
1995
[Bellay 1997] B Bellay, H Gall, ‘A comparison of four reverse engineering tools’
in Proceedings of the 4th Working Conference on Reverse
Engineering (WCRE), Amsterdam, pp. 2-11, Los Alamitos, CA:
IEEE Computer Society Press, 1997
[Bellay 1998] B Bellay, H Gall, ‘An evaluation of reverse engineering tool
capabilities’, Journal of Software Maintenance: Research and
Practice, 10(5):305-331, 1998
[Berard 1993] E V Berard, ‘Abstraction, encapsulation, and information hiding’ in
E Berard, Essays on Object-Oriented Software Engineering, Vol. 1,
Englewood Cliffs, NJ: Prentice-Hall, 1993
[Bergey 1999] J Bergey, D Smith, N Weiderman, S Woods, Options Analysis for
Reengineering (OAR): Issues and Conceptual Approach, Technical
Note CMU/SEI-99-TN-014, Carnegie Mellon Software Engineering
Institute, 1999
[Bloch 2001] J Bloch, Effective Java: Programming Language Guide, Cambridge,
MA: Addison-Wesley, 2001, Item 17
[Brady 1974] M Brady, The Monopoly Book: Strategy and Tactics of the World’s
Most Popular Game, New York, NY: David McKay, 1974
[Bertuli 2003] R Bertuli, S Ducasse, M Lanza, ‘Run-time information visualization
for understanding object-oriented systems’, paper presented at 4th
International Workshop on Object-Oriented Reengineering,
Darmstadt, 2003
[Booch 1994] G Booch, Object-Oriented Design with Applications, 2nd ed.,
Redwood City, CA: Benjamin Cummings, 1994
[Borland 2004a] Borland, Borland Together, http://www.borland.com/together/, 2004
[Borland 2004b] Borland, Together Technologies: Simplify and Accelerate the
Success of your Applications,
http://www.borland.com/together/index.html, 2004
[Boyer 1977] R Boyer, J Moore, ‘A fast string searching algorithm’,
Communications of the ACM, 20(10):762-772, 1977
[Brant 1998] J Brant, B Foote, R E Johnson, D Roberts, ‘Wrappers to the rescue’
in Proceedings of the 12th European Conference on Object-
Oriented Programming (ECOOP), Brussels, Lecture Notes in
Computer Science 1445, pp. 396-417, Berlin: Springer-Verlag, 1998
[Brooks 1983] R Brooks, ‘Towards a theory of the comprehension of computer
programs’, International Journal of Man-Machine Studies, 18:543-
554, 1983
[Bryman 1988] A Bryman, Quantity and Quality in Social Research, London:
Unwin Hyman, 1988, pp. 30-31
[Budd 1987] T Budd, A Little Smalltalk, Reading, MA: Addison-Wesley, 1987
[Buhr 1996] R Buhr, R Casselman, Use Case Maps for Object-Oriented Systems,
Englewood Cliffs, NJ: Prentice-Hall, 1996
[Burd 2002] E Burd, D Overy, A Wheetman, ‘Evaluating using animation to
improve understanding of sequence diagrams’ in Proceedings of the
10th International Workshop on Program Comprehension (IWPC),
Paris, pp. 107-113, Los Alamitos, CA: IEEE Computer Society
Press, 2002
[Burkhardt 1997] J-M Burkhardt, F Détienne, S Wiedenbeck, ‘Mental representations
constructed by experts and novices in object-oriented program
comprehension’ in Proceedings of the 6th IFIP International
Conference on Human-Computer Interaction (INTERACT), Sydney,
NSW, pp. 339-346, Amsterdam: North Holland, 1997
[Burkhardt 1998] J-M Burkhardt, F Détienne, S Wiedenbeck, ‘The effect of object-
oriented programming expertise in several dimensions of
comprehension strategies’ in Proceedings of the 6th International
Workshop on Program Comprehension (IWPC), Ischia, pp. 82-89,
Los Alamitos, CA: IEEE Computer Society Press, 1998
[Campbell 1993] R H Campbell, N Islam, D Raila, P Madany, ‘Designing and
implementing Choices: an object-oriented system in C++’,
Communications of the ACM, 36(9):117-126, 1993
[Chan 2003] K Chan, Z C L Liang, A Michail, ‘Design recovery of interactive
graphical applications’ in Proceedings of the 25th International
Conference on Software Engineering (ICSE), Portland, OR, pp.
114-124, Los Alamitos, CA: IEEE Computer Society Press, 2003
[Chase 1996] M P Chase, D R Harris, S N Roberts, A S Yeh, ‘Analysis and
presentation of recovered software architectures’ in Proceedings of
the 3rd Working Conference on Reverse Engineering (WCRE),
Monterey, CA, pp. 153-162, Los Alamitos, CA: IEEE Computer
Society Press, 1996
[Chen 1977] P P Chen, The Entity-Relationship Approach to Logical Database
Design, Wellesley, MA: QED Information Sciences, 1977
[Chen 1990] Y-F Chen, M Y Nishimoto, C V Ramamoorthy, ‘The C information
abstraction system’, IEEE Transactions on Software Engineering,
16(3):325-334, 1990
[Chidamber 1994] S R Chidamber, C F Kemerer, ‘A metrics suite for object-oriented
design’, IEEE Transactions Software Engineering, 20(6):476-493,
1994
[Chikofsky 1990] E J Chikofsky, J H Cross II, ‘Reverse engineering and design
recovery: a taxonomy’, IEEE Software, 7(1):13-17, 1990
[CHISEL 2005] The CHISEL Group, SHriMP Suite,
http://sourceforge.net/projects/chiselgroup/, 2005
[Chuah 1997] M C Chuah, S G Eick, ‘Glyphs for software visualization’ in
Proceedings of the 5th International Workshop on Program
Comprehension (IWPC), Dearborn, MI, pp. 183-191, Los Alamitos,
CA: IEEE Computer Society Press, 1997
[Citrin 1995] W Citrin, A Cockburn, J von Kanel, R Hauser, ‘Using formalised
temporal message-flow diagrams’, Software – Practice and
Experience, 25(12):1367-1401, 1995
[Clark 1976] J H Clark, ‘Hierarchical geometric models for visible surface
algorithms’, Communications of the ACM, 19(10):547-554, 1976
[Codenie 1997] W Codenie, K De Hondt, P Steyaert, A Vercammen, ‘From custom
applications to domain-specific frameworks’, Communications of
the ACM, 40(10):70-77, 1997
[Consens 1993] M Consens, A Mendelzon, ‘Hy+: a hypergraph-based query and
visualization system’ in Proceedings of the ACM SIGMOD
International Conference on Management of Data, Washington,
DC, SIGMOD Record 22(2):511-516, New York, NY: ACM Press,
1993
[Cook 1995] J E Cook, A L Wolf, ‘Automated process discovery through event-
data analysis’ in Proceedings of the 17th International Conference
on Software Engineering (ICSE), Seattle, WA, pp. 73-82, Los
Alamitos, CA: IEEE Computer Society Press, 1995
[Coplien 1995] J O Coplien, D C Schmidt, Pattern Languages of Program Design,
Reading, MA: Addison-Wesley, 1995
[Corritore 1999] C L Corritore, S Wiedenbeck, ‘Mental representations of expert
procedural and object-oriented programmers in a software
maintenance task’, International Journal of Human-Computer
Studies, 50(1):61-83, 1999
[Corritore 2000] C L Corritore, S Wiedenbeck, ‘Direction and scope of
comprehension-related activities by procedural and object-oriented
programmers: an empirical study’ in Proceedings of the 8th
International Workshop on Program Comprehension (IWPC),
Limerick, pp. 139-148, Los Alamitos, CA: IEEE Computer Society
Press, 2000
[Cox 1996] K C Cox, S G Eick, T He, ‘3D geographic network displays’,
SIGMOD Record, 25(4):50-54, 1996
[Cross 1992] J H Cross II, E J Chikofsky, C H May Jr., ‘Reverse engineering’,
Advances in Computers, 35:199-254, 1992
[Cytron 1991] R Cytron, J Ferrante, B K Rosen, M N Wegman, F K Zadeck,
‘Efficiently computing static single assignment form and the control
dependence graph’, ACM Transactions on Programming Languages
and Systems, 13(4):451-490, 1991
[De Hondt 1998] K De Hondt, A Novel Approach to Architectural Recovery in
Evolving Object-Oriented Systems, PhD thesis, Brussels: Vrije
Universiteit Brussel, 1998
[De Pauw 1993] W De Pauw, R Helm, D Kimelman, J Vlissides, ‘Visualizing the
behaviour of object-oriented systems’ in Proceedings of the 8th
Conference on Object-Oriented Programming, Systems, Languages,
and Applications (OOPSLA), Washington, DC, pp. 326-337, New
York, NY: ACM Press, 1993
[De Pauw 1994] W De Pauw, D Kimelman, J Vlissides, ‘Modelling object-oriented
program execution’ in Proceedings of the 8th European Conference
on Object-Oriented Programming (ECOOP), Bologna, Lecture
Notes in Computer Science 821, pp. 163-182, Berlin: Springer-
Verlag, 1994
[De Pauw 1998] W De Pauw, D Lorenz, J Vlissides, M Wegman, ‘Execution patterns
in object-oriented visualization’ in Proceedings of the 4th USENIX
Conference on Object-Oriented Technologies and Systems
(COOTS), Santa Fe, NM, pp. 219-234, Berkeley, CA: USENIX
Association, 1998
[De Pauw 1999] W De Pauw, G Sevitsky, ‘Visualizing reference patterns for solving
memory leaks in Java’ in Proceedings of the 13th European
Conference on Object-Oriented Programming (ECOOP), Lisbon,
Lecture Notes in Computer Science 1628, pp. 116-134, Berlin:
Springer-Verlag, 1999
[De Pauw 2000] W De Pauw, G Sevitsky, ‘Visualizing reference patterns for solving
memory leaks in Java’, Concurrency: Practice and Experience,
12(14):1431-1454, 2000
[De Pauw 2001] W De Pauw, N Mitchell, M Robillard, G Sevitsky, H Srinivasan,
‘Drive-by analysis of running programs’, paper presented at
Workshop on Software Visualization, 23rd International Conference
on Software Engineering (ICSE), Toronto, ON, 2001
[De Pauw 2002] W De Pauw, E Jensen, N Mitchell, G Sevitsky, J Vlissides , J Yang,
‘Visualizing the execution of Java programs’ in Proceedings of the
International Seminar on Software Visualization, Dagstuhl Castle,
Wadern, pp. 151-162, Lecture Notes in Computer Science 2269,
Berlin: Springer-Verlag, 2002
[de Vaus 1996] D A de Vaus, Surveys in Social Research, 4th edition, London: UCL Press, 1996, pp. 56-57
[Demeyer 1998] S Demeyer, ‘Analysis of overridden methods to infer hot spots’ in Proceedings of the Workshop on Object-Oriented Technology, 12th European Conference on Object-Oriented Programming (ECOOP), Brussels, Lecture Notes in Computer Science 1543, pp. 66-67, Berlin: Springer-Verlag, 1998
[Demeyer 1999] S Demeyer, S Ducasse, M Lanza, ‘A hybrid reverse engineering approach combining metrics and program visualisation’ in Proceedings of the 6th Working Conference on Reverse Engineering (WCRE), Atlanta, GA, pp. 175-186, Washington, DC: IEEE Computer Society Press, 1999
[Dijkstra 1968] E W Dijkstra, ‘Cooperating sequential processes’ in F Genuys (ed.), Programming Languages, New York, NY: Academic Press, 1968
[Ducasse 2000] S Ducasse, M Lanza, S Tichelaar, ‘MOOSE: an extensible language-independent environment for reengineering object-oriented systems’ in Proceedings of the 2nd International Symposium on Constructing Software Engineering Tools (CoSET), International Conference on Software Engineering (ICSE), Limerick, pp. 24-30, Wollongong, NSW: School of Information Technology and Computer Science, University of Wollongong, 2000
[Ducasse 2001] S Ducasse, M Lanza, ‘Towards a methodology for the understanding of object-oriented systems’, Techniques et Sciences Informatiques, 20(4):539-566, 2001
[Ducasse 2004] S Ducasse, M Lanza, R Bertuli, ‘High-level polymetric views of
condensed run-time information’ in Proceedings of the 8th
Euromicro Working Conference on Software Maintenance and
Reengineering (CSMR), Tampere, pp. 309-318, Los Alamitos, CA:
IEEE Computer Society Press, 2004
[Ducasse 2005] S Ducasse, M Lanza, ‘The class blueprint: visually supporting the
understanding of classes’, IEEE Transactions on Software
Engineering, 31(1):1-16, 2005
[Eclipse 2005] Eclipse Foundation, Eclipse.org Main Page, http://eclipse.org/, 2005
[Eick 1992] S G Eick, J L Steffen, E E Sumner Jr., ‘Seesoft – a tool for
visualizing line oriented software statistics’, IEEE Transactions on
Software Engineering, 18(11):957-968, 1992
[Eick 1993] S G Eick, G J Wills, ‘Navigating large networks with hierarchies’ in Proceedings of the 4th Conference on Visualization, San Jose, CA, pp. 204-209, Los Alamitos, CA: IEEE Computer Society Press, 1993
[Eick 2002] S G Eick, T L Graves, A F Karr, A Mockus, P Schuster,
‘Visualizing software changes’, IEEE Transactions on Software
Engineering, 28(4):396-412, 2002
[Einstein 1920] A Einstein, Relativity: The Special and General Theory, New York, NY: Henry Holt and Company, 1920
[Falkenhainer 1991] B Falkenhainer, K Forbus, ‘Compositional modeling: finding the right model for the job’, Artificial Intelligence, 51:95-143, 1991
[Fayad 1999] M E Fayad, D C Schmidt, R E Johnson, Building Application Frameworks: Object-Oriented Foundations of Framework Design, New York, NY: Wiley Computer Publishing, 1999
[Feiner 1985] S Feiner, ‘Apex: an experiment in the automated creation of pictorial explanations’, IEEE Computer Graphics and Applications, 5(11):29-37, 1985
[Finnigan 1997] P J Finnigan, R C Holt, I Kalas, S Kerr, K Kontogiannis, H A Müller, J Mylopoulos, S G Perelgut, M Stanley, K Wong, ‘The software bookshelf’, IBM Systems Journal, 36(4):564-593, 1997
[Fischer 2000] T Fischer, J Niere, L Torunski, A Zündorf, ‘Story diagrams: a new graph rewrite language based on the Unified Modelling Language and Java’ in Proceedings of the 6th International Workshop on Theory and Application of Graph Transformations (TAGT), Paderborn, 1998, Lecture Notes in Computer Science 1764, pp. 296-309, Berlin: Springer-Verlag, 2000
[Fishman 1967] G S Fishman, P J Kiviat, ‘The analysis of simulation generated time series’, Management Science, 13(7):525-557, 1967
[Fishwick 1988] P A Fishwick, ‘The role of process abstraction in simulation’, IEEE Transactions on Systems, Man, and Cybernetics, 18(1):18-39, 1988
[Frantz 1995] F K Frantz, ‘A taxonomy of model abstraction techniques’ in Proceedings of the 27th Winter Simulation Conference, Arlington, VA, pp. 1413-1420, New York, NY: ACM Press, 1995
[Fujaba 2002] Fujaba Developer Team, Fujaba is a public domain Case Tool for UML, http://www.fujaba.de/, 2002
[Furnas 1986] G W Furnas, ‘Generalized fisheye views’ in Proceedings of the 4th ACM Conference on Human Factors in Computing Systems, Boston, MA, pp. 16-23, New York, NY: ACM Press, 1986
[Gamma 1995] E Gamma, R Helm, R Johnson, J Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Boston, MA: Addison-Wesley, 1995
[Gamma 1998] E Gamma, T Eggenschwiler, JHotDraw 5.1, http://members.pingnet.ch/gamma/JHD-5.1.zip, 1998
[Gansner 2002] E Gansner, E Koutsofios, S North, ‘Drawing graphs with dot’, http://www.graphviz.org/Documentation/dotguide.pdf, 2002
[Garlan 1997] D Garlan, B Monroe, D Wile, ‘ACME: an interchange language for software architecture’, 2nd ed., Technical Report, Pittsburgh, PA: Carnegie Mellon University, 1997
[Gîrba 2005] T Gîrba, M Lanza, S Ducasse, ‘Characterizing the evolution of class hierarchies’ in Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR), Manchester, pp. 2-11, Los Alamitos, CA: IEEE Computer Society Press, 2005
[Globus 1995] A Globus, S P Uselton, ‘Evaluation of visualization software’, ACM
SIGGRAPH Computer Graphics, 29(2):41-44, 1995
[GNU 2004] GNU, GCC Home Page – GNU Project – Free Software
Foundation (FSF), http://gcc.gnu.org/, 2004
[Goldberg 1983] A J Goldberg, D Robson, Smalltalk-80: The Language and its
Implementation, Reading, MA: Addison-Wesley, 1983
[Gosling 2000] J Gosling, B Joy, G Steele, G Bracha, The Java Language
Specification, 2nd edition, Boston, MA: Addison-Wesley, 2000
[Grass 1992] J E Grass, ‘Object-oriented design archaeology with CIA++’,
Computing Systems, 5(1):5-67, 1992
[Grove 1997] D Grove, G DeFouw, J Dean, C Chambers, ‘Call graph construction
in object-oriented languages’ in Proceedings of the 12th ACM
SIGPLAN Conference on Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA), Atlanta, GA, pp. 108-124,
New York, NY: ACM Press, 1997
[Grundy 2000] J Grundy, J Hosking, ‘High-level static and dynamic visualisation of software architectures’ in Proceedings of the 4th IEEE Symposium on Visual Languages (VL), Halifax, NS, pp. 5-12, Los Alamitos, CA: IEEE Computer Society Press, 2000
[Gulden 2004] J Gulden, BeautyJ – Java source code transformation tool, http://beautyj.berlios.de/, 2004
[GUPRO 2004] GUPRO, GUPRO – Homepage, http://www.uni-koblenz.de:8080/Uni/CampusKoblenz/Contrib/GUPRO/Site/Home, 2004
[Guttag 1977] J Guttag, ‘Abstract data types and the development of data structures’, Communications of the ACM, 20(6):396-404, 1977
[Harel 1988] D Harel, ‘On visual formalisms’, Communications of the ACM, 31(5), May 1988
[Harel 1990] D Harel, H Lachover, A Naamad, A Pnueli, M Politi, R Sherman, A Shtull-Trauring, M Trakhtenbrot, ‘STATEMATE: a working environment for the development of complex reactive systems’, IEEE Transactions on Software Engineering, 16(4):403-414, 1990
[Hatch 2001] A S Hatch, M P Smith, C M B Taylor, M Munro, ‘No silver bullet for software visualisation evaluation’ in Proceedings of the Workshop on Fundamental Issues of Visualization, International Conference on Imaging Science, Systems, and Technology (CISST), Las Vegas, NV, pp. 651-657, Athens, GA: CSREA Press, 2001
[Hatley 1987] D J Hatley, I A Pirbhai, Strategies for Real-Time System Specification, New York, NY: Dorset House, 1987
[Haynes 1995] P Haynes, T Menzies, R F Cohen, Visualisations of Large Object-
Oriented Systems, Technical Report TR 95-4, Melbourne, VIC:
Monash University, 1995
[Heer 2004] J Heer, prefuse: an interactive visualization toolkit, http://prefuse.sourceforge.net/, 2004
[Henry 1993] S Henry, M Humphrey, ‘Object-oriented vs. procedural programming languages: effectiveness in program maintenance’, Journal of Object-Oriented Programming, 6(3):41-49, 1993
[Hilliard 1999] R Hilliard, ‘Using the UML for architectural description’ in Proceedings of the 2nd International Conference on The Unified Modeling Language (<<UML>>), Fort Collins, CO, Lecture Notes in Computer Science 1723, Berlin: Springer, 1999
[Hooker 1996] R Hooker, Abstraction, http://www.wsu.edu:8080/~dee/GLOSSARY/ABSTRACT.HTM, 1996
[Imagix 2004] Imagix Corporation, Imagix 4D, http://www.imagix.com/products/products.html, 2004
[Hoagland 1995] J Hoagland, Mkfunctmap Home Page,
http://seclab.cs.ucdavis.edu/~hoagland/mkfunctmap.html, 1995
[Hofmeister 1999a] C Hofmeister, R Nord, D Soni, Applied Software Architecture,
Reading, MA: Addison-Wesley, 1999
[Hofmeister 1999b] C Hofmeister, R L Nord, D Soni, ‘Describing software architecture
with UML’ in Proceedings of the 1st Working IFIP Conference on
Software Architecture (WICSA), San Antonio, TX, pp. 145-160,
Dordrecht: Kluwer Academic Publishers, 1999
[IBM 2003] IBM, VisualAge Smalltalk – Product Overview,
http://www-3.ibm.com/software/ad/smalltalk/, 2003
[IBM 2004a] IBM, VisualAge C++ – Product Overview – IBM Software,
http://www-306.ibm.com/software/awdtools/vacpp/, 2004
[IBM 2004b] IBM, Rational Software from IBM,
http://www-306.ibm.com/software/rational/, 2004
[Issarny 1998] V Issarny, T Saridakis, A Zarras, ‘Multi-view description of
software architectures’ in Proceedings of the 3rd International
Workshop on Software Architecture, Orlando, FL, pp. 81-84, New
York, NY: ACM Press, 1998
[ITU-T 1996] International Telecommunications Union – Standardization (ITU-
T), ITU-T Recommendation Z.120: Message Sequence Chart
(MSC), Geneva: ITU-T, 1996
[Jacobson 1992] I Jacobson, M Christerson, P Johnson, G Overgaard, Object-Oriented Software Engineering: A Use Case Driven Approach, Reading, MA: Addison-Wesley, 1992
[Jahnke 2002] J H Jahnke, H A Müller, A Walenstein, N Mansurov, K Wong, ‘Fused data-centric visualizations for software evolution environments’ in Proceedings of the 10th International Workshop on Program Comprehension (IWPC), Paris, pp. 187-196, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Jerding 1995] D F Jerding, J T Stasko, ‘The information mural: a technique for displaying and navigating large information spaces’ in Proceedings of the 1st Symposium on Information Visualization, Atlanta, GA, pp. 43-50, Los Alamitos, CA: IEEE Computer Society Press, 1995
[Jerding 1997] D F Jerding, S Rugaber, ‘Using visualization for architectural localization and extraction’ in Proceedings of the 4th Working Conference on Reverse Engineering (WCRE), Amsterdam, pp. 56-65, Los Alamitos, CA: IEEE Computer Society Press, 1997
[Johnson 1992] R E Johnson, ‘Documenting frameworks using patterns’ in
Proceedings of the 7th Conference on Object-Oriented
Programming, Systems, Languages, and Applications (OOPSLA),
Vancouver, BC, pp. 63-76, New York, NY: ACM Press, 1992
[Kapoor 2001] R V Kapoor, E Stroulia, ‘Mathaino: simultaneous legacy interface
migration to multiple platforms’ in Proceedings of the 9th
International Conference on Human-Computer Interaction, New
Orleans, LA, Vol. 1, pp. 51-55, Mahwah, NJ: Lawrence Erlbaum
Associates, 2001
[Kazman 1994] R Kazman, L Bass, G Abowd, S M Webb, ‘SAAM: a method for
analysing the properties of software architectures’ in Proceedings of
the 16th International Conference on Software Engineering (ICSE),
Sorrento, pp. 81-90, Los Alamitos, CA: IEEE Computer Society
Press, 1994
[Kazman 1996] R Kazman, S J Carrière, ‘An adaptable software architecture for
rapidly creating information visualizations’, Proceedings of
Graphics Interface, Toronto, ON, pp. 17-27, San Francisco: Morgan
Kaufmann, 1996
[Kazman 1998] R Kazman, S J Carrière, ‘View extraction and view fusion in
architectural understanding’ in Proceedings of the 5th International
Conference on Software Reuse (ICSR), Victoria, BC, pp. 290-299,
Los Alamitos, CA: IEEE Computer Society Press, 1998
[Kazman 1999] R Kazman, S J Carrière, ‘Playing detective: reconstructing software
architecture from available evidence’, Journal of Automated
Software Engineering, 6(2):107-138, 1999
[Keller 1999] R K Keller, R Schauer, S Robitaille, P Pagé, ‘Pattern-based reverse-engineering of design components’ in Proceedings of the 21st International Conference on Software Engineering (ICSE), Los Angeles, CA, pp. 226-235, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Keynes 1936] J M Keynes, The General Theory of Employment, Interest, and
Money, Cambridge: Macmillan Cambridge University Press, 1936
[Kirk 1986] J Kirk, M L Miller, Reliability and Validity in Qualitative Research,
Beverley Hills, CA: Sage, 1986, pp. 22-23
[Kirk 2001] D Kirk, M Roper, M Wood, Understanding Object-Oriented
Frameworks – An Exploratory Case Study, Technical Report
EFoCS-42-2001, Glasgow: Department of Computer and
Information Sciences, University of Strathclyde, 2001
[Kirsanov 1998] D Kirsanov, The Flesh and Soul of Information: The Origins of
Abstraction, http://webreference.com/dlab/9804/origins.html, 1998
[Knight 2000] C Knight, M Munro, ‘Virtual but visible software’ in Proceedings of
the International Conference on Information Visualisation (IV),
London, pp. 198-205, Los Alamitos, CA: IEEE Computer Society
Press, 2000
[Knight 2001] C Knight, ‘Visualisation effectiveness’ in Workshop on
Fundamental Issues of Visualization, Proceedings of the
International Conference on Imaging Science, Systems, and
Technology (CISST), Las Vegas, NV, pp. 639-643, Athens, GA:
CSREA Press, 2001
[Koenemann 1991] J Koenemann, S P Robertson, ‘Expert problem solving strategies for software comprehension’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New Orleans, LA, pp. 125-130, New York, NY: ACM Press, 1991
[Kolence 1973] K W Kolence, ‘The software empiricist’, ACM SIGMETRICS
Performance Evaluation Review, 2(2):31-36, 1973
[Kollmann 2001] R Kollmann, M Gogolla, ‘Application of UML associations and
their adornments in design recovery’ in Proceedings of the 8th
Working Conference on Reverse Engineering (WCRE), Stuttgart,
pp. 81-90, Los Alamitos, CA: IEEE Computer Society Press, 2001
[Kollmann 2002a] R Kollmann, P Selonen, E Stroulia, T Systä, A Zündorf, ‘A study on the current state of the art in tool-supported UML-based static reverse engineering’ in Proceedings of the 9th Working Conference on Reverse Engineering (WCRE), Richmond, VA, pp. 22-33, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Kollmann 2002b] R Kollmann, M Gogolla, ‘Metric-based selective representation of UML diagrams’ in Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR), Budapest, pp. 89-98, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Korn 1999] J L Korn, Non-Exclusive Limited-Use Software – Chava, http://www.research.att.com/sw/tools/chava/, 1999
[Koschke 2003] R Koschke, Bauhaus Stuttgart, http://www.bauhaus-stuttgart.de/, 2003
[Koskimies 1995a] K Koskimies, H Mössenböck, ‘Scenario-based browsing of object-oriented systems with Scene’, Report 4, Linz: Institut für Informatik (Systemsoftware), Johannes Kepler Universität, 1995
[Koskimies 1995b] K Koskimies, H Mössenböck, ‘Designing a framework by stepwise generalization’ in Proceedings of the 5th European Software Engineering Conference (ESEC), Barcelona, Lecture Notes in Computer Science 989, pp. 479-497, Berlin: Springer-Verlag, 1995
[Koskimies 1996] K Koskimies, H Mössenböck, ‘Scene: using scenario diagrams and active text for illustrating object-oriented programs’ in Proceedings of the 18th International Conference on Software Engineering (ICSE), Berlin, pp. 366-375, Los Alamitos, CA: IEEE Computer Society Press, 1996
[Koskimies 1998] K Koskimies, T Männistö, T Systä, J Tuomi, ‘Automated support
for modeling OO software’, IEEE Software, 15(1):87-94, 1998
[Koskinen 2001] J Koskinen, J Peltonen, P Selonen, T Systä, K Koskimies, ‘Towards tool assisted UML development environments’ in Proceedings of the 7th Symposium on Programming Language and Software Tools (SPLST), Szeged, pp. 1-15, Szeged: University of Szeged, 2001
[Koutsofios 1996a] A Koutsofios, S C North, ‘Editing graphs with dotty’, http://www.graphviz.org/Documentation/dottyguide.pdf, 1996
[Koutsofios 1996b] E Koutsofios, S C North, Drawing Graphs with dot, Murray Hill, NJ: AT&T Bell Laboratories, 1996
[Krasner 1988] G E Krasner, S T Pope, ‘A cookbook for using the model-view-controller user interface paradigm in Smalltalk-80’, Journal of Object-Oriented Programming, 1(3):26-49, 1988
[Kruchten 1995] P B Kruchten, ‘The 4+1 View Model of Architecture’, IEEE Software, 12(6):42-50, 1995
[Laffra 1994] C Laffra, A Malhotra, ‘HotWire – a visual debugger for C++’ in Proceedings of the 6th USENIX C++ Technical Conference, Cambridge, MA, pp. 109-122, Berkeley, CA: USENIX Association, 1994
[Lange 1995a] D B Lange, Y Nakamura, ‘Interactive visualization of design patterns can help in framework understanding’ in Proceedings of the 10th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Austin, TX, pp. 342-357, New York, NY: ACM Press, 1995
[Lange 1995b] D B Lange, Y Nakamura, ‘Program Explorer: a program visualizer for C++’ in Proceedings of the 1st USENIX Conference on Object-Oriented Technologies (COOTS), Monterey, CA, pp. 39-54, Berkeley, CA: USENIX Association, 1995
[Lange 1997] D B Lange, Y Nakamura, ‘Object-oriented program tracing and visualization’, IEEE Computer, 30(5):63-70, 1997
[Lanza 2001] M Lanza, S Ducasse, ‘A categorization of classes based on the
visualization of their internal structure: the class blueprint’ in
Proceedings of the 16th International Conference on Object-
Oriented Programs, Systems, Languages, and Applications
(OOPSLA), Tampa Bay, FL, pp. 300-311, New York, NY: ACM
Press, 2001
[Lanza 2002] M Lanza, S Ducasse, ‘Understanding software evolution using a combination of software visualization and software metrics’ in Proceedings of Langages et Modèles à Objets (LMO), pp. 135-149, London: Hermes, 2002
[Lanza 2003a] M Lanza, S Ducasse, ‘Polymetric views – a lightweight visual
approach to reverse engineering’, IEEE Transactions on Software
Engineering, 29(9):782-795, 2003
[Lanza 2003b] M Lanza, CodeCrawler,
http://www.iam.unibe.ch/~scg/Research/CodeCrawler/, 2003
[Ledgard 1977] H F Ledgard, R W Taylor, ‘Two views of data abstraction’,
Communications of the ACM, 20(6):382-384, 1977
[Lee 1996] K Lee, P A Fishwick, ‘Dynamic model abstraction’ in Proceedings of the 28th Winter Simulation Conference, Coronado, CA, pp. 764-771, New York, NY: ACM Press, 1996
[Lethbridge 2004] T C Lethbridge, S Tichelaar, E Ploedereder, ‘The Dagstuhl Middle Metamodel: a schema for reverse engineering’ in Proceedings of the 1st International Workshop on Meta-Models and Schemas for Reverse Engineering (ateM), Victoria, BC, ENTCS 94, pp. 7-18, Amsterdam: Elsevier, 2004
[Letovsky 1986] S Letovsky, ‘Cognitive processes in program comprehension’ in Proceedings of the First Workshop on Empirical Studies of Programmers, Washington, DC, pp. 58-79, Norwood, NJ: Ablex, 1986
[Lincoln 1993] S E Lincoln, M J Daly, E S Lander, Constructing Genetic Linkage Maps with MapMaker/EXP Version 3.0: A Tutorial and Reference Manual, 3rd ed., Technical Report, Cambridge, MA: Whitehead Institute for Biomedical Research, 1993
[Linton 1992] M A Linton, P R Calder, J A Interrante, S Tank, J M Vlissides,
Interviews Reference Manual Version 3.1, Stanford, CA: Stanford
University, 1992
[Liskov 1986] B Liskov, J Guttag, Abstraction and Specification in Program
Development, Cambridge, MA: MIT Press, 1986
[Littman 1986] D Littman, J Pinto, S Letovsky, E Soloway, ‘Mental models and
software maintenance’ in Proceedings of the First Workshop on
Empirical Studies of Programmers, Washington, DC, pp. 80-98,
Norwood, NJ: Ablex, 1986
[Litwin 1995] M S Litwin, How to Measure Survey Reliability and Validity,
Thousand Oaks, CA: SAGE, 1995, pp. 43-44
[Maletic 2001] J I Maletic, J Leigh, A Marcus, G Dunlap, ‘Visualizing object-
oriented software in virtual reality’ in Proceedings of the 9th
International Workshop on Program Comprehension (IWPC),
Toronto, ON, pp. 26-35, Washington, DC: IEEE Computer Society
Press, 2001
[Marcus 2003a] A Marcus, L Feng, J I Maletic, ‘Source Viewer 3D (sv3D): a system
for visualizing multi dimensional software analysis data’, paper
presented at The 2nd Annual “Designfest” on Visualizing Software
for Understanding and Analysis (VISSOFT), Amsterdam, 2003
[Marcus 2003b] A Marcus, L Feng, J I Maletic, ‘Comprehension of software analysis
data using 3D visualization’ in Proceedings of the 11th International
Workshop on Program Comprehension (IWPC), Orlando, FL, pp.
105-114, Washington, DC: IEEE Computer Society Press, 2003
[Martin 2000] R C Martin, ‘Design principles and design patterns’, http://www.objectmentor.com/resources/articles/Principles_and_Patterns.PDF, 2000, p. 17
[Martin 2002] L Martin, A Giesl, J Martin, ‘Dynamic component program
visualisation’ in Proceedings of the 9th Working Conference on
Reverse Engineering (WCRE), Richmond, VA, pp. 289-298, Los
Alamitos, CA: IEEE Computer Society Press, 2002
[MDR 2005] MDR team, Metadata Repository (MDR) project home,
http://mdr.netbeans.org/, 2005
[Minsky 1965] M Minsky, ‘Models, minds, machines’ in Proceedings of the
IFIPS Congress, New York, NY, pp. 45-49, Montvale, NJ:
AFIPS Press, 1965
[Mössenböck 1991] H Mössenböck, N Wirth, ‘The programming language Oberon-2’,
Structured Programming, 12(4):179-195, 1991
[Mulholland 1997] P Mulholland, ‘Using a fine-grained comparative evaluation
technique to understand and design software visualization tools’ in
Proceedings of the 7th Workshop on Empirical Studies of
Programmers, Alexandria, VA, pp. 91-108, New York, NY: ACM
Press, 1997
[Mulholland 1998] P Mulholland, ‘A principled approach to the evaluation of SV: a
case-study in Prolog’ in J Stasko, J Domingue, M Brown, B Price
(eds.), Software Visualization: Programming as a Multimedia
Experience, Cambridge, MA: MIT Press, 1998
[Mulholland 1999] P Mulholland, ‘The ISM framework: understanding and evaluating
software visualization tools’ in P Brna, B du Boulay, H Pain (eds.),
Learning to Build and Comprehend Complex Information
Structures: Prolog as a Case Study, Norwood, NJ: Ablex, 1999
[Müller 1988] H A Müller, K Klashinsky, ‘Rigi – a system for programming-in-
the-large’ in Proceedings of the 10th International Conference on
Software Engineering (ICSE), Singapore, pp. 80-86, Los Alamitos,
CA: IEEE Computer Society Press, 1988
[Müller 1993] H A Müller, M A Orgun, S R Tilley, J S Uhl, ‘A reverse
engineering approach to subsystem structure identification’, Journal
of Software Maintenance: Research and Practice, 5(4):181-204,
1993
[Müller 2000] H A Müller, J H Jahnke, D B Smith, M-A D Storey, S R Tilley, K
Wong, ‘Reverse engineering: a roadmap’ in Proceedings of the
Conference on the Future of Software Engineering, 22nd
International Conference on Software Engineering (ICSE),
Limerick, pp. 47-60, New York, NY: ACM Press, 2000
[Müller 2001] H A Müller, Rigi Group Home Page, http://www.rigi.csc.uvic.ca/,
2001
[Murphy 1995] G C Murphy, D Notkin, K J Sullivan, ‘Software reflexion models: bridging the gap between source and high-level models’ in Proceedings of the 3rd ACM SIGSOFT Symposium on the Foundations of Software Engineering, Washington, DC, pp. 18-28, New York, NY: ACM Press, 1995
[Murphy 1996a] G C Murphy, D Notkin, E S-C Lan, ‘An empirical study of static
call graph extractors’ in Proceedings of the 18th International
Conference on Software Engineering (ICSE), Berlin, pp. 90-99, Los
Alamitos, CA: IEEE Computer Society Press, 1996
[Murphy 1996b] G C Murphy, D Notkin, ‘Lightweight lexical source model
extraction’, ACM Transactions on Software Engineering and
Methodology, 5(3):262-292, 1996
[Murphy 1997] G C Murphy, D Notkin, ‘Reengineering with reflexion models: a case study’, IEEE Computer, 30(8):29-36, 1997
[Murphy 1998] G C Murphy, D Notkin, W G Griswold, E S Lan, ‘An empirical study of static call graph extractors’, ACM Transactions on Software Engineering and Methodology, 7(2):158-191, 1998
[Murphy 2001] G C Murphy, D Notkin, K J Sullivan, ‘Software reflexion models:
bridging the gap between design and implementation’, IEEE
Transactions on Software Engineering, 27(4): 364-380, 2001
[Myers 1986] B A Myers, ‘Visual programming, programming by example, and
program visualization: a taxonomy’ in Proceedings of the 4th ACM
SIGCHI Conference on Human Factors in Computing Systems,
Boston, MA, pp. 59-66, New York, NY: ACM Press, 1986
[Nassi 1973] I Nassi, B Shneiderman, ‘Flowchart techniques for structured
programming’, ACM SIGPLAN Notices, 8(8):12-26, 1973
[NCSA 2003] National Center for Supercomputing Applications, NCSA Mosaic
Home Page, http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/
NCSAMosaicHome.html, 2003
[Nielson 1990] G M Nielson, B D Shriver, J Rosenblum, Visualization in Scientific
Computing, Los Alamitos, CA: IEEE Computer Society Press, 1990
[O’Brien 2002] L O’Brien, ‘Experiences in architecture reconstruction at Nokia’,
Technical Note CMU/SEI-2002-TN-004, Pittsburgh, PA: Software
Engineering Institute, Carnegie-Mellon University, 2002
[O’Madadhain 2005] J O’Madadhain, D Fisher, T Nelson, J Krefeldt, JUNG – Java
Universal Network/Graph Framework, http://jung.sourceforge.net/,
2005
[OMG 2001] Object Management Group, Unified Modeling Language (UML)
v1.4, http://www.omg.org/technology/documents/formal/uml.htm,
2001
[OMG 2003a] Object Management Group, UML 2.0 Superstructure Specification,
OMG Adopted Specification ptc/03-08-02, http://www.omg.org/cgi-
bin/doc?ptc/2003-08-02, 2003
[OMG 2003b] Object Management Group, UML 2.0 Infrastructure Specification,
OMG Adopted Specification ptc/03-09-15, http://www.omg.org/cgi-
bin/doc?ptc/2003-09-15, 2003
[OMG 2003c] Object Management Group, OMG Unified Modeling Language Specification Version 1.5, Formal/03-03-01, http://www.omg.org/technology/documents/formal/uml.htm, 2003
[Ören 1984] T I Ören, Foreword in B P Zeigler, Multifacetted Modeling and Discrete Event Simulation, London: Academic Press, 1984
[Park 1991] H-S Park, ‘Abstract object types = abstract knowledge types + abstract data types + abstract connector types’, Journal of Object-Oriented Programming, 4(3):37-52, 1991
[Pennington 1987] N Pennington, ‘Stimulus structures and mental representations in expert comprehension of computer programs’, Cognitive Psychology, 19:295-341, 1987
[Perry 2000] D E Perry, A A Porter, L G Votta, ‘Empirical studies of software engineering: a roadmap’ in Proceedings of the Conference on the Future of Software Engineering, 22nd International Conference on Software Engineering (ICSE), Limerick, pp. 345-355, New York, NY: ACM Press, 2000
[Peterson 1981] J L Peterson, Petri Net Theory and the Modeling of Systems, Englewood Cliffs, NJ: Prentice-Hall, 1981
[Petre 1997] M Petre, A F Blackwell, T R G Green, ‘Cognitive questions in software visualisation’ in J Stasko, J Domingue, M H Brown, B A Price (eds.), Software Visualization: Programming as a Multimedia Experience, Cambridge, MA: MIT Press, 1997
[Petri 1962] C Petri, Kommunikation mit Automaten, PhD dissertation, Bonn:
University of Bonn, 1962
[Pinzger 2005] M Pinzger, H Gall, M Fischer, M Lanza, ‘Visualizing multiple
evolution metrics’ in Proceedings of the 2005 ACM Symposium on
Software Visualization, St. Louis, MO, pp. 67-75, New York, NY:
ACM Press, 2005
[Pressman 2000] R S Pressman, Software Engineering: A Practitioner’s Approach,
European adaptation, 5th ed., London: McGraw-Hill, 2000
[Price 1992] B A Price, I S Small, R M Baecker, ‘A taxonomy of software
visualisation’ in Proceedings of the 25th Hawaii International
Conference on System Science (HICSS), Kauai, HI, Vol. II, pp. 597-
606, Los Alamitos, CA: IEEE Computer Society Press, 1992
[Price 1993] B A Price, R M Baecker, I S Small, ‘A principled taxonomy of
software visualization’, Journal of Visual Languages and
Computing, 4(3):211-266, 1993
[Rational 2003] Rational Software Corporation, Visual Modeling with Rational Rose
Home, http://www.rational.com/products/rose/index.jsp, 2003
[Reasoning 1994] Reasoning Systems Inc., Refine/C User’s Guide,
http://www.reasoning.com/, 1994
[Reeves 1983] W T Reeves, ‘Particle systems – a technique for modeling a class of
fuzzy objects’, ACM Transactions on Graphics, 2(2):91-108, 1983
[Reiser 1991] M Reiser, The Oberon System – User Guide and Programmer’s
Manual, Boston, MA: Addison-Wesley, 1991
[Reiss 1995] S P Reiss, The Field Programming Environment: A Friendly
Integrated Environment for Learning and Development, Dordrecht:
Kluwer Academic Publishers, 1995
[Reiss 2001] S P Reiss, ‘An overview of BLOOM’ in Proceedings of the 3rd ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE), Snowbird, UT, pp. 2-5, New York, NY: ACM Press, 2001
[Reiss 2002] S P Reiss, ‘A visual query language for software visualisation’ in Proceedings of the IEEE Symposium on Human Centric Computing Languages and Environments (HCC), Arlington, VA, pp. 80-82, Los Alamitos, CA: IEEE Computer Society Press, 2002
[Reiss 2003a] S P Reiss, ‘JIVE: visualizing Java in action’, demonstration presented at the 25th International Conference on Software Engineering (ICSE), Portland, OR, pp. 820-821, Los Alamitos, CA: IEEE Computer Society Press, 2003
[Reiss 2003b] S P Reiss, ‘Visualizing Java in action’ in Proceedings of the 1st ACM Symposium on Software Visualization (SoftViz), San Diego, CA, pp. 57-65, New York, NY: ACM Press, 2003
[Richner 1999] T Richner, S Ducasse, ‘Recovering high-level views of object-oriented applications from static and dynamic information’ in Proceedings of the 15th International Conference on Software Maintenance (ICSM), Oxford, pp. 13-22, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Richner 2002a] T Richner, S Ducasse, ‘Using dynamic information for the iterative
recovery of collaborations and roles’ in Proceedings of the 18th
International Conference on Software Maintenance (ICSM),
Montréal, QC, pp. 34-43, Los Alamitos, CA: IEEE Computer
Society Press, 2002
[Richner 2002b] T Richner, Recovering Behavioural Design Views: A Query-Based
Approach, PhD thesis, Berne: University of Berne, 2002
[Rieger 2004] M Rieger, S Ducasse, M Lanza, ‘Insights into system-wide code
duplication’ in Proceedings of the 11th Working Conference on
Reverse Engineering (WCRE), Delft, pp. 100-109, Los Alamitos,
CA: IEEE Computer Society Press, 2004
[Riley 2003] G Riley, CLIPS: A Tool for Building Expert Systems,
http://www.ghg.net/clips/CLIPS.html, 2003
[Riva 2002] C Riva, J V Rodriguez, ‘Combining static and dynamic views for
architecture reconstruction’ in Proceedings of the 6th European
Conference on Software Maintenance and Reengineering (CSMR),
Budapest, pp. 47-56, Los Alamitos, CA: IEEE Computer Society
Press, 2002
[Rockel 2000] I Rockel, F Heimes, FUJABA – Homepage, http://www.uni-paderborn.de/fachbereich/AG/schaefer/ag_dt/PG/Fujaba/fujaba.html, 2000
[Roman 1993] G-C Roman, K C Cox, ‘A taxonomy of program visualization
systems’, IEEE Computer, 26(12):11-24, 1993
[Rumbaugh 1991] J Rumbaugh, M Blaha, W Premerlani, F Eddy, W Lorensen, Object-Oriented Modeling and Design, Englewood Cliffs, NJ: Prentice-Hall, 1991
[Rumbaugh 1999] J Rumbaugh, I Jacobson, G Booch, The Unified Modeling Language Reference Manual, Boston, MA: Addison-Wesley, 1999
[Schauer 1999] R Schauer, S Robitaille, F Martel, R K Keller, ‘Hot spot recovery in
object-oriented software with inheritance and composition template
methods’ in Proceedings of the 15th International Conference on
Software Maintenance (ICSM), Oxford, pp. 220-229, Los Alamitos,
CA: IEEE Computer Society Press, 1999
[Sefika 1996a] M Sefika, A Sane, R H Campbell, ‘Architecture-oriented
visualization’ in Proceedings of the 11th Conference on Object-
Oriented Programming, Systems, Languages, and Applications
(OOPSLA), San José, CA, pp. 389-405, New York, NY: ACM
Press, 1996
[Sefika 1996b] M Sefika, A Sane, R H Campbell, ‘Monitoring compliance of a
software system with its high-level design models’ in Proceedings
of the 18th International Conference on Software Engineering
(ICSE), Berlin, pp. 387-396, Los Alamitos, CA: IEEE Computer
Society Press, 1996
[Selic 1994] B Selic, G Gullekson, P T Ward, Real-Time Object-Oriented
Modeling, New York, NY: John Wiley and Sons, 1994
[Selic 1998] B Selic, J Rumbaugh, Using UML for Modeling Complex Real-Time
Systems,
http://www.ibm.com/developerworks/rational/library/content/03July
/1000/1155/1155_umlmodeling.pdf, 1998
[Selonen 2001] P Selonen, K Koskimies, M Sakkinen, ‘How to make apples from
oranges in UML’ in Proceedings of the 34th Hawaii International
Conference on System Sciences (HICSS), Maui, HI, pp. 3054-3063,
Los Alamitos, CA: IEEE Computer Society Press, 2001
[Sevitsky 2001] G Sevitsky, W De Pauw, R Konuru, ‘An information exploration
tool for performance analysis of Java programs’ in Proceedings of
the 38th Conference on Technology of Object-Oriented Languages
and Systems (TOOLS Europe), Zurich, pp. 85-101, Los Alamitos,
CA: IEEE Computer Society Press, 2001
[Shaw 1984] M Shaw, ‘Abstraction techniques in modern programming
languages’, IEEE Software, 1(4):10-26, 1984
[Shaw 1995] M Shaw, R DeLine, D Klein, T Ross, D Young, G Zelesnik,
‘Abstractions for software architecture and tools to support them’,
IEEE Transactions on Software Engineering, 21(4):314-335, 1995
[Shneiderman 1980] B Shneiderman, Software Psychology: Human Factors in Computer
and Information Systems, Boston, MA: Winthrop Publishers, 1980
[Siff 1997] M Siff, T Reps, ‘Identifying modules via concept analysis’ in
Proceedings of the 13th International Conference on Software
Maintenance (ICSM), Bari, pp. 170-178, Los Alamitos, CA: IEEE
Computer Society Press, 1997
[Sim 2000a] S E Sim, M-A D Storey, ‘A structured demonstration of program
comprehension tools’ in Proceedings of the 7th Working Conference
on Reverse Engineering (WCRE), Brisbane, QLD, pp. 184-193, Los
Alamitos, CA: IEEE Computer Society Press, 2000
[Sim 2000b] S E Sim, M-A D Storey, A Winter, ‘A structured demonstration of
five program comprehension tools: lessons learnt’ in Proceedings of
the 7th Working Conference on Reverse Engineering (WCRE),
Brisbane, QLD, pp. 210-212, Los Alamitos, CA: IEEE Computer
Society Press, 2000
[Singer 1997] J Singer, T Lethbridge, N Vinson, N Anquetil, ‘An examination of
software engineering work practices’ in Proceedings of the 1997
Conference of the Centre for Advanced Studies on Collaborative
Research (CASCON), Toronto, ON, p. 21, Armonk, NY: IBM Press,
1997
[Soloway 1984] E Soloway, K Ehrlich, ‘Empirical studies of programming
knowledge’, IEEE Transactions on Software Engineering,
SE-10(5):595-609, 1984
[Soloway 1988] E Soloway, J Pinto, S Letovsky, D Littman, R Lampert, ‘Designing
documentation to compensate for delocalized plans’,
Communications of the ACM, 31(11):1259-1267, 1988
[Stasko 1992] J T Stasko, C Patterson, ‘Understanding and characterising software
visualization systems’ in Proceedings of the 8th IEEE Workshop on
Visual Languages (VL), Seattle, WA, pp. 3-10, Los Alamitos, CA:
IEEE Computer Society Press, 1992
[Stonebraker 1990] M Stonebraker, L Rowe, M Hirohama, ‘The implementation of
POSTGRES’, IEEE Transactions on Knowledge and Data
Engineering, 2(1):125-141, 1990
[Storey 1995] M-A D Storey, H A Müller, ‘Manipulating and documenting
software structures using SHriMP views’ in Proceedings of the 11th
International Conference on Software Maintenance (ICSM), Nice,
pp. 275-284, Los Alamitos, CA: IEEE Computer Society Press,
1995
[Storey 1996a] M-A D Storey, K Wong, P Fong, D Hooper, K Hopkins, H A
Müller, ‘On designing an experiment to evaluate a reverse
engineering tool’ in Proceedings of the 3rd Working Conference on
Reverse Engineering (WCRE), Monterey, CA, pp. 31-40, Los
Alamitos, CA: IEEE Computer Society Press, 1996
[Storey 1996b] M-A D Storey, H Müller, K Wong, ‘Manipulating and documenting
software structures’ in P Eades, K Zhang (eds.), Software
Visualization, pp. 244-263, World Scientific Publishing, 1996
[Storey 1997] M-A D Storey, K Wong, H A Müller, ‘How do program
understanding tools affect how programmers understand programs?’
in Proceedings of the 4th Working Conference on Reverse
Engineering (WCRE), Amsterdam, pp. 12-21, Los Alamitos, CA:
IEEE Computer Society Press, 1997
[Storey 2000] M-A D Storey, K Wong, H A Müller, ‘How do program
understanding tools affect how programmers understand
programs?’, Science of Computer Programming, 36(2-3):183-207,
2000
[Storey 2001] M-A Storey, C Best, J Michaud, ‘SHriMP views: an interactive
environment for exploring Java programs’ in Proceedings of the 9th
International Workshop on Program Comprehension (IWPC),
Toronto, ON, pp. 111-112, Los Alamitos, CA: IEEE Computer
Society Press, 2001
[Stroulia 2002] E Stroulia, T Systä, ‘Dynamic analysis for reverse engineering and
program understanding’, ACM SIGAPP Applied Computing Review,
10(1):8-17, 2002
[Sun 2000] Sun Microsystems, Java 2 SDK, Standard Edition Version 1.2,
http://java.sun.com/products/jdk/1.2/, 2000
[Sun 2002] Sun Microsystems, jdb – the Java Debugger,
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/jdb.html,
2002
[Sun 2004a] Sun Microsystems, Java Platform Debugger Architecture,
http://java.sun.com/j2se/1.5.0/docs/guide/jpda/, 2004
[Sun 2004b] Sun Microsystems, Java Programming Language,
http://java.sun.com/j2se/1.5.0/docs/guide/language/index.html, 2004
[Sun 2005] Sun Microsystems, java.sun.com, http://java.sun.com, 2005
[Systä 1999a] T Systä, ‘On the relationships between static and dynamic models in
reverse engineering Java software’ in Proceedings of the 6th
Working Conference on Reverse Engineering (WCRE), Atlanta, GA,
pp. 304-313, Los Alamitos, CA: IEEE Computer Society Press,
1999
[Systä 1999b] T Systä, ‘Dynamic reverse engineering of Java software’ in
Proceedings of the 3rd Workshop on Object-Oriented Technology,
Lisbon, Lecture Notes in Computer Science 1743, pp. 174-175,
London: Springer-Verlag, 1999
[Systä 2000a] T Systä, P Yu, H Müller, ‘Analyzing Java software by combining
metrics and program visualization’ in Proceedings of the 4th
European Conference on Software Maintenance and Reengineering
(CSMR), Zurich, pp. 199-208, Los Alamitos, CA: IEEE Computer
Society Press, 2000
[Systä 2000b] T Systä, ‘Understanding the behaviour of Java programs’ in
Proceedings of the 7th Working Conference on Reverse Engineering
(WCRE), Brisbane, QLD, pp. 214-223, Los Alamitos, CA: IEEE
Computer Society Press, 2000
[Systä 2000c] T Systä, Static and Dynamic Reverse Engineering Techniques for
Java Software Systems, PhD Dissertation, Report A-2000-4,
Tampere: Department of Computer and Information Sciences,
University of Tampere, 2000
[Systä 2000d] T Systä, ‘Incremental construction of dynamic models for object-
oriented software systems’, Journal of Object-Oriented
Programming, 13(5):18-27, 2000
[Systä 2001] T Systä, K Koskimies, H Müller, ‘Shimba – an environment for
reverse engineering Java software systems’, Software – Practice and
Experience, 31(4):371-394, 2001
[Szyperski 1998] C Szyperski, Component Software: Beyond Object-Oriented
Programming, Harlow: Addison-Wesley, 1998
[Taligent 1994] Taligent Inc., Building Object-Oriented Frameworks, White Paper,
Cupertino, CA: Taligent Inc., 1994
[Templ 1994] J Templ, Oberon CD-ROM: Kepler – User Guide, Bonn: Addison-
Wesley, 1994
[Tichelaar 1998] S Tichelaar, S Demeyer, ‘An exchange model for reengineering
tools’ in Proceedings of the 12th European Conference on Object-
Oriented Programming (ECOOP) Workshop Reader, Lecture Notes
in Computer Science 1543, pp. 82-84, Berlin: Springer-Verlag, 1998
[Tilley 1994] S R Tilley, K Wong, M-A D Storey, H A Müller, ‘Programmable
reverse engineering’, International Journal of Software Engineering
and Knowledge Engineering, 4(4):501-520, 1994
[TogetherSoft 2001a] TogetherSoft Corporation, Together v5.5 Documentation,
http://www.togethercommunity.com/docs/5.5/together5.htm, 2001
[TogetherSoft 2001b] TogetherSoft Corporation, Together ControlCenter,
http://www.togethersoft.com/products/controlcenter/, 2001
[Tolke 2005] L Tolke, M Klink, Cookbook for Developers of ArgoUML: an
introduction to developing ArgoUML,
http://argouml.tigris.org/documentation/defaulthtml/cookbook/,
2005
[Tonella 1999] P Tonella, G Antoniol, ‘Object oriented design pattern inference’ in
Proceedings of the 15th International Conference on Software
Maintenance (ICSM), Oxford, pp. 230-238, Los Alamitos, CA:
IEEE Computer Society Press, 1999
[Tufte 1990] E R Tufte, Envisioning Information, Cheshire, CT: Graphics Press,
1990
[Turing 1948] A M Turing, ‘Intelligent machinery’ in B Meltzer, D Michie (eds.),
Machine Intelligence 5, Edinburgh: Edinburgh University Press,
1969
[von Mayrhauser 1995] A von Mayrhauser, A M Vans, ‘Program comprehension during
software maintenance and evolution’, IEEE Computer, 28(8):44-55,
1995
[von Mayrhauser 1999] A von Mayrhauser, S Lang, ‘On the role of static analysis during
software maintenance’ in Proceedings of the International
Workshop on Program Comprehension (IWPC), Pittsburgh, PA, pp.
170-177, Los Alamitos, CA: IEEE Computer Society Press, 1999
[W3C 1999] World Wide Web Consortium, XSL Transformations (XSLT),
http://www.w3.org/TR/xslt, 1999
[W3C 2004] World Wide Web Consortium, Extensible Markup Language (XML)
1.0 (Third Edition), http://www.w3.org/TR/2004/REC-xml-
20040204/, 2004
[Walker 1998] R J Walker, G C Murphy, B Freeman-Benson, D Wright, D
Swanson, J Isaak, ‘Visualizing dynamic software system
information through high-level models’ in Proceedings of the 13th
Conference on Object-Oriented Programming, Systems, Languages,
and Applications (OOPSLA), Vancouver, BC, pp. 271-283, New
York, NY: ACM Press, 1998
[Waters 1999] B Waters, S Rugaber, G Abowd, ‘Architectural synthesis:
integrating multiple architectural perspectives’ in Proceedings of the
6th Working Conference on Reverse Engineering (WCRE), Atlanta,
GA, pp. 2-11, Los Alamitos, CA: IEEE Computer Society Press, 1999
[Wikman 1998] J Wikman, Evolution of a Distributed Repository-Based
Architecture, Research Report 1998:14, Karlskrona: Department of
Software Engineering and Computer Science, Blekinge Institute of
Technology, 1998
[Wind River 2003] Wind River Systems Inc., SNIFF+ Datasheet,
http://www.takefive.com/bundle/sniff.pdf, 2003
[Winter 2002] A Winter, ‘GXL – overview and current status’, presentation at The
International Workshop on Graph-Based Tools (GraBaTs),
Barcelona, 2002
[Wirth 1992] N Wirth, J Gutknecht, Project Oberon, The Design of an Operating
System and Compiler, Boston, MA: Addison-Wesley, 1992
[Wong 1995] K Wong, S R Tilley, H A Müller, M-A D Storey, ‘Structural
redocumentation: a case study’, IEEE Software, 12(1):46-54, 1995
[Xfig 2003] Xfig.org, XFIG Drawing Program for the X Window System,
http://www.xfig.org/, 2003
[Yan 2004] H Yan, D Garlan, B Schmerl, J Aldrich, R Kazman, ‘DiscoTect: a
system for discovering architectures from running systems’ in
Proceedings of the 26th International Conference on Software
Engineering (ICSE), Edinburgh, pp. 470-479, Los Alamitos, CA:
IEEE Computer Society Press, 2004
[Yeh 1997] A S Yeh, D R Harris, M P Chase, ‘Manipulating recovered software
architecture views’ in Proceedings of the 19th International
Conference on Software Engineering (ICSE), Boston, MA, pp. 184-
194, New York, NY: ACM Press, 1997
[Yin 2002] R Yin, R K Keller, ‘Program comprehension by visualization in
contexts’ in Proceedings of the 18th International Conference on
Software Maintenance (ICSM), Montréal, QC, pp. 332-341, Los
Alamitos, CA: IEEE Computer Society Press, 2002
[Zachman 1996] J A Zachman, ‘Concepts of the framework for enterprise
architecture: background, description, and utility’, Los Angeles, CA:
Zachman International, 1996
[Zeigler 1976] B P Zeigler, Theory of Modeling and Simulation, New York, NY:
Wiley, 1976
[Zeigler 1984] B P Zeigler, Multifacetted Modeling and Discrete Event Simulation,
London: Academic Press, 1984
[Zeigler 2000] B P Zeigler, H Praehofer, T G Kim, Theory of Modeling and
Simulation: Integrating Discrete Event and Continuous Complex
Dynamic Systems, 2nd ed., London: Academic Press, 2000
[Zimmer 1985] J A Zimmer, Abstraction for Programmers, New York, NY:
McGraw-Hill, 1985