A Reverse Engineering Portal Web Site
Congrong Liao and Spiros Mancoridis
Technical Report DU-CS-02-04
Department of Computer Science
Drexel University
Philadelphia, PA 19104
November 2002
© Copyright 2002 Congrong Liao. All Rights Reserved.
Acknowledgements
I would first like to thank my advisor, Dr. S. Mancoridis, for his constant support
and guidance. I would also like to thank the other members of my committee: T.
Hewett, D. Salvucci and B. Mitchell for their input and thoughtful direction. I also
want to acknowledge the people at the SERG lab for their help with the user
interface study, with special thanks to W. Mongan for his input to REportal and his
patient help in correcting the grammar errors in this thesis.
Finally, I would like to thank Xiaoping Hu, my wife, for supporting me while
I finished this thesis, and my father and mother, for always being there giving me
strength and love.
Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1. Introduction and Motivation
1.1 Our Research
1.2 Research Challenges Addressed in this Thesis
1.3 Thesis Outline
Chapter 2. Related Work
2.1 Web-based Software Engineering
2.2 Code Analysis
2.3 Design Extraction
2.3.1 Graph Drawing for Software Visualization
Chapter 3. REportal: A User Perspective
3.1 Introduction to REportal
3.1.1 Installation
3.1.2 Login and Sign Up
3.2 Workspace Management and Navigation
3.3 Code Querying and Browsing Service
3.4 Design Recovery
3.5 Chapter Summary
Chapter 4. REportal: A Developer Perspective
4.1 Development History
4.2 REportal Non-Functional Requirements
4.3 The REportal Process
4.4 Development and Deployment Environment
4.4.1 Apache
4.4.2 Java and Java Servlets
4.4.3 Configuration of REportal
4.5 REportal Architecture
4.6 REportal Subsystems
4.6.1 User Management & Configuration Subsystem
4.6.2 Workspace Management & Utility Subsystem
4.6.3 REportal Configuration
4.6.4 The Query Dispatcher Subsystem
4.6.5 The Clustering Subsystem
Chapter 5. Validation
5.1 Design of Tasks
5.2 Conducting the Evaluation Study
5.3 Results of the Evaluation Study
Chapter 6. Conclusions and Future Work
6.1 Summary & Research Contributions
6.2 Plans of Future Work
Appendix A. Tasks of Evaluation
A.1 Tomcat
A.2 Bunch
List of Tables
2.1 All Classes Defined in TAP System
3.1 Entity Types for Entity Search
3.2 Wild Cards Accepted by Entity Search
3.3 Examples of Relationship Search
4.1 Shell Scripts in REportal (Locations are relative to /usr/local/www/reportal)
4.2 BunchAPILinks in REportal (Locations are relative to /usr/local/www/reportal)
4.3 All Other Tools in REportal
List of Figures
1.1 Log into REportal
1.2 Create a folder
1.3 Upload the code
1.4 Open and analyze the project
3.1 Sign up for a REportal account
3.2 Log into REportal
3.3 Structure of user’s file system
3.4 Process to build up a working space
3.5 Workspace display and navigation
3.6 Query window
3.7 Entity search
3.8 Relationship search
3.9 Display results as a graph
3.10 Reachability query
3.11 Reachability query results
3.12 Clustering a relationship search graph
3.13 Advanced clustering a relationship search graph
3.14 Text search
3.15 Advanced search
3.16 Advanced search results
3.17 Code browsing
3.18 Integration of Bunch in REportal
3.19 Upload a MDG file
3.20 Create a MDG file
3.21 Hill-climbing configuration window
3.22 GA configuration window
3.23 Clustering options
3.24 Using libraries
3.25 Using omnipresent modules
3.26 User-directed clustering
3.27 MQ calculator
4.1 The process supported by REportal
4.2 Development and deployment environment of REportal
4.3 Queries against the repositories
4.4 REportal architecture
4.5 REportal subsystem-level sequence diagram
4.6 The user management and REportal configuration subsystem
4.7 The workspace management and REportal utilities subsystem
4.8 The sequence diagram of ProjectFile
4.9 The sequence diagram of the Clustering subsystem
Abstract
A Reverse Engineering Portal Web Site
Congrong Liao
Spiros Mancoridis, Ph.D.
The software engineering research community has developed a suite of useful tools,
which run on different platforms with different user interfaces and in some cases are
difficult to configure. Because they are research tools, they also change over time.
Thus, the users of these tools are responsible for integrating them and
dealing with constant updates. In this thesis, we present a technique of “software
as services”. We create a portal web site called REportal, which delivers reverse
engineering services that require only an Internet browser. REportal features a simple
and friendly user interface that hides the complexities of using those tools. The
design of REportal is of value to experienced users because REportal provides all of
the integration. It is also of value to inexperienced users since it provides a suite of
useful services that may not be obvious or known to them.
Chapter 1. Introduction and Motivation
Large volumes of code are typically difficult to understand. This problem is made
worse because most software systems are not documented. Studies show that about
half of the time spent making changes to software is spent on understanding the
software [10]. Automated reverse engineering techniques help developers by minimizing
the amount of manual source code analysis needed to understand a system. Reverse
engineering is the process of analyzing a subject system to identify its components
and their relationships, creating a representation of the system at a higher level of
abstraction than what exists in the source code. Specifically, reverse engineering
provides an aid to the comprehension of complex systems[10].
The Software Engineering Research Group (SERG)[28] at Drexel University along
with the AT&T Research Labs have developed several reverse engineering tools over
the years, among which are the Bunch software clustering tool[7, 11, 23, 24], the
Acacia source code analyzer for C/C++[13], the Chava source code analyzer for
Java[18], various tools for profiling C/C++/Java programs, and graph visualization
tools.
Even though most of the aforementioned tools are developed in Java, which makes
them portable across platforms, their installation is still cumbersome. For example, in
order to install Bunch, you must first download bunch.jar. The installation requires
the Java Development Kit (JDK) 1.2 or higher. If you don’t have the JDK on your
machine, you have to install it. Even after setting the classpath,¹ you have to install
GraphViz from AT&T [6] to view the clustered graphs. That means you must go to
the web site of the AT&T Research Lab, find the correct version of GraphViz for your
platform, download it, and install it. Even after everything starts to work correctly,
the user must periodically check for software updates of Bunch, GraphViz and the
JDK in order to ensure the latest version is being used.
The difficulty of installing and updating these reverse engineering tools might
prevent professional software engineers, educators, and other researchers from using
them. Also, these tools were developed over several years by many different people.
Hence, the usability, reliability and programming interfaces of these tools may differ.
For example, Bunch was developed in Java and features a user-friendly Graphical
User Interface (GUI). It takes Module Dependency Graph (MDG) files as input and
produces module cluster graphs as output. Acacia and Chava were developed in C
and run on Unix. They are invoked by a Unix shell command or a Unix shell script
with appropriate arguments. Typically, MDG files are created by either Acacia or
Chava.
In addition, the usability and interfaces of these tools may change over time. For
example, since Bunch is a research tool, its API and GUI have changed extensively over
the past few years as new features were added. The Bunch developers noticed that
many inquiries about the tool came from users who were running older versions
of the software.
Our experience shows that learning how to use these tools takes a long
time. To illustrate how cumbersome it is to use these tools, let’s do a simple analysis
of a small operating system simulator called cssimulator,² written by Liao, which
has 32 classes. Our goal is to determine the inheritance relationships between the
class command and all other classes. We assume that all necessary tools have been
installed correctly.
¹ To run Bunch you must alter the classpath environment variable so that the Java Virtual Machine can find and load Bunch’s class file.
² cssimulator is a small operating system simulator written by the author for a course taught by Dr. Brian Mitchell at Drexel University.
• First, we use Chava to create the entity and relationship repositories. To do
this, we type the following commands in the directory where the source code of
cssimulator resides:
– % chava -c *.java
– % chava -l *.java
The first command compiles Java source into a set of small repositories (one
per Java source file). The second command links the set of files to produce the
Chava repositories. Fortunately, all of the source code is in the same directory,
which eases the creation of the repository. In the case that source code resides
in more than one directory, a shell script might be necessary to compile and
link all the files. If the repository-creation process is successful, an entity.db file
and a relationship.db file are created in the directory. These files are required
by other AT&T source code analysis tools.
• To determine the parent class of class command, we use AT&T’s cref tool as
follows:
– % cref c command c -
The query results are displayed on screen in plain text.
• If you want to visualize the results, dagger, a program to generate program
graphs from the repositories, is a good choice. If you are interested in a partic-
ular class, you open the class using a text editor and view the source code.
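The first step above notes that a shell script might be needed when the source code spans several directories. The following is a minimal sketch of that idea, written in Python for brevity; it only builds the command lines and does not run them. The -c and -l options are the ones shown above. The function name chava_commands, and the assumption that the link step accepts files from several directories in a single invocation, are ours and are not confirmed by the text.

```python
import os


def chava_commands(root):
    """Build chava command lines for every directory under root with .java files.

    Assumption (not confirmed by the source): one compile command per
    directory, followed by a single link command over all source files.
    """
    compile_cmds = []
    all_sources = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        sources = sorted(
            os.path.join(dirpath, name)
            for name in filenames
            if name.endswith(".java")
        )
        if sources:
            compile_cmds.append(["chava", "-c", *sources])
            all_sources.extend(sources)
    link_cmd = ["chava", "-l", *all_sources] if all_sources else None
    return compile_cmds, link_cmd
```

Each returned list could be handed to a process launcher once the behavior of the link step has been checked against the Chava manual.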
In summary, to do the simple query, you have to install Chava, CIAO and dagger.
You have to read through the user manuals of these tools in addition to being familiar
with Unix.
1.1 Our Research
Practitioners and researchers can take advantage of the latest developments in reverse
engineering if the services provided by the tools are made available on the WWW.
Services provided via a browser interface typically require no client-side software, and
offer an intuitive user interface that is very easy to learn.
In this thesis, we present a technique of “software as services”. We create a portal
web site called REportal, which is a front-end to a repository of reverse engineering
tools. Instead of downloading these tools to each client’s local machine and running
them locally, we put these tools on a server and expose the features of these tools
as web-based services. It is called a portal web site because through REportal users
have access to a set of reverse engineering tools that essentially have different user
interfaces. REportal enables authorized users to log in to the site and upload their
code to the server through a secure channel. REportal features a simple and friendly
user interface that hides the complexities of using those tools natively. User requests
are sent from a web browser to the server, which then executes the
appropriate tools. The results are shipped back to each user’s browser.
By these means, users are able to analyze and browse source code, as well as to query
and extract design information for C, C++ and Java programs.
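The request flow described above (a request arrives, the server runs the matching tool, and the output is returned to the browser) can be sketched as follows. This is an illustration under stated assumptions, not REportal's implementation: the names run_tool, handle_request, and registry are hypothetical, and Python is used only to keep the sketch short.

```python
import html
import subprocess


def run_tool(argv, workdir=None, timeout=60):
    """Run a command-line tool on the server; return (exit code, text output)."""
    completed = subprocess.run(
        argv, cwd=workdir, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout


def handle_request(service, args, registry):
    """Map a requested service to a tool and wrap its output as an HTML fragment.

    registry is a hypothetical dict: service name -> base command (list of str).
    """
    base_command = registry.get(service)
    if base_command is None:
        return "<p>Unknown service: %s</p>" % html.escape(service)
    exit_code, output = run_tool(list(base_command) + list(args))
    if exit_code != 0:
        return "<p>The tool exited with status %d.</p>" % exit_code
    # Escape the plain-text output so it can be embedded safely in a page.
    return "<pre>%s</pre>" % html.escape(output)
```

Passing the arguments as a list, rather than as one shell string, keeps user-supplied values from being interpreted by a shell.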
To use REportal, all the user needs is a Java-enabled web browser. This allows
them to analyze and visualize source code without going through the troubles of
obtaining, installing, and learning how to use a set of reverse engineering tools. Users
all over the world are guaranteed access to the latest version of the portal service.
As a comparison, we go back to the previous example using REportal. This task
requires the following steps:
1. Launch a web browser from any platform, go to the REportal web site[8] and
log on, as shown in Figure 1.1. Users without an account may obtain one for
free by simply filling out a registration form.
2. As shown in Figure 1.2, the next step is to create a folder for the project and
upload the project,³ as seen in Figure 1.3.
3. Open the project, then in the relationship section, choose the class to class
relationship, specify the name of the first entity as command, and press the list
button, as seen in Figure 1.4.
4. The query results are displayed in a table. If the user follows links in the table,
source code is displayed in a web browser as hyper-linked source. The hyperlinks
automatically cross-reference all of the entities in the source code.
Figure 1.1: Log into REportal
Figure 1.2: Create a folder
Figure 1.3: Upload the code
Figure 1.4: Open and analyze the project
³ Upon uploading, the project is unzipped and the repositories entity.db and relationship.db are created automatically.
The above example illustrates how easy it is to use REportal services. Furthermore,
once a project has been uploaded to REportal, it is preserved in our repository,
enabling future analysis.
1.2 Research Challenges Addressed in this Thesis
This thesis addresses the following research challenges.
• Integration of complex interfaces. REportal is the first comprehensive
code browsing and analysis service available on the WWW. As a result, there
is no template that we can follow. Many challenges involve providing enough
features to users while keeping the user interface as simple as possible. REportal
consists of many integrated tools, which have different interfaces and platform
requirements. Some of these tools are written in Java and have comprehensive
APIs, while others are Unix command-line utilities. Providing a simple and
intuitive user interface by hiding the complexities of these tools is one of the
challenges that this thesis addresses.
• Limitation of web browsers. The only software needed to use REportal is a
web browser. User requests are sent to the server via the web browser; and the
results of the requests are sent back to the browser to be displayed. The idiosyn-
cracies of different web browsers limit the implementation of REportal in many
ways. For example, displaying an error message to users can be done easily in
a stand-alone program with a single line of code. However, to do the same thing
in a web browser, the content of the whole page has to be reloaded with the
error message. Other techniques, such as the use of Dynamic HTML (DHTML),
provide nice capabilities to enrich the user experience, but these implementation
choices also limit the number of browsers that we can support, as only the
most recent releases of web browsers correctly support DHTML.
There are many types of web browsers on the market, among which are the pop-
ular Internet Explorer, Netscape and Mozilla. Some features are supported by
one browser but not another. Other features (e.g., JavaScript) behave differently
on different browsers. Browser compatibility is an issue that this thesis addresses.
Our goal is that all features of REportal should be supported by all popular
web browsers such as IE 6.0, Netscape 4.76, Netscape 6.0, and Netscape 6.2, on all
popular platforms such as Windows 95, Windows 98, Windows 2000 and Linux.
This design tradeoff limits some of our implementation choices, but it enables
REportal to be used by a very large community of users.
• Security. Since REportal requires users to upload their source code to our
server, it must provide a secure environment so that users are confident enough
to use it. Security issues include: a) secure user authentication; b) secure transfer
of code to the server; c) secure display of results. Security is provided by
using standard SSL technology through a registered server-side certificate.
1.3 Thesis Outline
The rest of the thesis is organized as follows: Chapter 2 provides an overview of related
work. Chapter 3 describes the functionality of REportal from a user’s perspective.
The architecture of REportal is described in Chapter 4, thus providing a developer’s
perspective. The validation of REportal is covered in Chapter 5. The thesis concludes
in Chapter 6 by summarizing the work and outlining our plans for future work.
Chapter 2. Related Work
There are several areas that are relevant to this thesis: web-based software engineer-
ing, source code analysis, design extraction and software visualization.
2.1 Web-based Software Engineering
Many web-based software engineering tools fall into the following categories:
• A knowledge base of information about reverse engineering, program under-
standing, and software evolution.
• A repository of software packages that can be downloaded and run locally.
• A web-based service provider in an area other than reverse engineering, such as
graphing.
Reengineering Wiki[31] is a forum where all topics related to Reverse Engineering
and Reengineering can be discussed. It also contains a collection of papers, tutorials,
and surveys.
Software Bookshelf[15] catalogues software architectures and allows users to ex-
plore them. It has a primitive querying mechanism and, unlike REportal, the Software
Bookshelf does not support source code browsing.
The graph server at Brown University provides interactive graph drawing and
translation via the WWW[37]. It offers two kinds of services:
1) drawing graphs using a user-specified algorithm;
2) translating the description of a graph from one format to another[38].
Stephen North designed a graph service called drawdag [1]. The service accepts a
dot file and layout format from users via email. The server constructs a drawing with
dot[14] and then sends the output to users via e-mail.
Sourceforge.net[30] is one of the largest open source software development web sites.
It supports source code browsing, discussion, bug/defect tracking, documentation
repository, etc. The focus is specifically for software and reverse engineering.
The closest work to REportal is Cleanscape’s LintPlus Online[9]. It provides
online source code analysis for C and Fortran programs. It displays compilation
results, call trees, and include-file trees in text format. However, it does not support graph
visualization, interactive querying, or design extraction.
2.2 Code Analysis
The current trend is to build repositories from a system’s source code so that those
repositories can be used for a variety of reverse engineering analyses. The repositories
are useful because complex reverse engineering tools can be built by analyzing
information stored in the repository without parsing the system’s source code.
Two popular approaches exist to construct software repositories. One is to store
variants of abstract syntax trees in the repository; the other is to structure the repos-
itory as a relational database[13]. Once the repositories are constructed from the
source code, queries can be made to the repositories in order to extract structural
information about the source code.
Over the past several years, the AT&T Research Lab has developed a family of
source code analysis tools such as Acacia[13] for C and C++ and Chava[18] for Java.
In these tools, software systems are represented as collections of entities that refer to
each other. An entity represents a static syntactic construct such as a macro, a type,
or a function. The output of Acacia and Chava consists of two repositories: the entity
repository and the relationship repository. The entity repository stores entities
such as file names, variables, functions, and classes, along with attributes such as scopes,
line positions, and so on. Likewise, the relationship repository stores the relationships
between entities, such as function calls, inheritance, and variable references.
A variety of Unix command line tools are available to query against the repository,
answering questions such as:
• Is variable a defined in file b?
• Is function a referenced by function b?
• What are the child classes of class a?
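To make the entity and relationship repositories concrete, here is a minimal sketch that models them as two relational tables and answers two of the questions above. The schema, the table and column names, and the sample data are hypothetical; the actual layout of the Acacia and Chava repositories is not described in this section.

```python
import sqlite3


def build_sample_repository():
    """Create an in-memory repository with made-up entities and relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            kind TEXT NOT NULL,   -- e.g. class, function, variable
            file TEXT NOT NULL
        );
        CREATE TABLE relationship (
            src  INTEGER NOT NULL REFERENCES entity(id),
            dst  INTEGER NOT NULL REFERENCES entity(id),
            kind TEXT NOT NULL    -- e.g. inherits, calls, references
        );
        """
    )
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?, ?)",
        [
            (1, "Shape", "class", "Shape.java"),
            (2, "Circle", "class", "Circle.java"),
            (3, "Square", "class", "Square.java"),
            (4, "radius", "variable", "Circle.java"),
        ],
    )
    conn.executemany(
        "INSERT INTO relationship VALUES (?, ?, ?)",
        [(2, 1, "inherits"), (3, 1, "inherits")],
    )
    return conn


def is_defined_in(conn, name, kind, file):
    """Answer: is entity `name` of type `kind` defined in `file`?"""
    row = conn.execute(
        "SELECT 1 FROM entity WHERE name = ? AND kind = ? AND file = ? LIMIT 1",
        (name, kind, file),
    ).fetchone()
    return row is not None


def child_classes(conn, parent):
    """Answer: what are the child classes of class `parent`?"""
    rows = conn.execute(
        """
        SELECT child.name
        FROM relationship AS r
        JOIN entity AS child  ON child.id = r.src
        JOIN entity AS parent ON parent.id = r.dst
        WHERE r.kind = 'inherits' AND parent.kind = 'class' AND parent.name = ?
        ORDER BY child.name
        """,
        (parent,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because both questions reduce to lookups and joins over the two tables, new kinds of queries can be added without parsing the source code again.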
Advanced analysis such as dead code detection and reachability analysis can be
applied to the repositories as well. The query results are displayed in text format. Table
2.1 displays the query results of all classes defined in the TAP⁴ system.
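Reachability analysis and dead code detection can both be viewed as graph traversals over the relationship repository. The sketch below assumes the relationships have already been read into a list of (source, target) pairs; the function names and the sample call graph are hypothetical, and the AT&T tools may implement these analyses differently.

```python
from collections import deque


def reachable(edges, start):
    """Return every entity reachable from start by following edges (breadth-first)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def dead_code(entities, edges, roots):
    """Return the entities that no root (e.g. a program entry point) can reach."""
    live = set()
    for root in roots:
        live |= reachable(edges, root)
    return sorted(set(entities) - live)
```

For example, with the call edges main → parse, parse → lex and helper → lex, and main as the only root, helper is reported as dead code because nothing reachable from main refers to it.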
The work described in this thesis uses Acacia and Chava to create the repositories.
There are some commercial code analysis tools such as Red Hat’s Source-Navigator[33]
and Netcomputing’s AnyJ[32].
2.3 Design Extraction
Software systems often need to be modified to improve performance, add new fea-
tures, adapt to new platforms or hardware, and so on. To modify a software system,
developers have to understand the system. As the size and complexity of software
systems increase, the time spent understanding software systems increases as
well. In most cases the relevant design documentation is missing or inconsistent,
making the problem even worse. Therefore, tools that can provide a high-level
system decomposition become very helpful to facilitate the comprehension of software
systems.
Table 2.1: All Classes Defined in TAP System
Name            File
TAPClient       TAP/TAPClient.java
TAPServant      TAP/TAPServer.java
TAPImplBase     TAP/TAPApp/ TAPImplBase.java
TAPServer       TAP/TAPServer.java
account         TAP/account.java
bid             TAP/bid.java
command         TAP/command.java
bidder          TAP/bidder.java
commandReader   TAP/commandReader.java
createUser      TAP/createUser
notify          TAP/notify
⁴ TAP (Ticket Auction Program) is a small system that the author wrote in Java for a course taught by Dr. Spiros Mancoridis at Drexel University.
The principal artifact that must be examined is the system’s source code. Thus,
the major task in software reverse engineering is to build an abstract model of a
software system from its source code.
Mitchell and Mancoridis developed a software tool called Bunch[7, 23], which
automatically decomposes the structure of software systems into subsystems. Modules
with high cohesion are grouped in the same subsystems (clusters), and independent
modules are grouped into separate subsystems. The modules and dependencies of
a system are mapped to a Module Dependency Graph (MDG) using source code
analysis tools such as Acacia[13] for C and C++ and Chava[18] for Java. The goal of
Bunch is to find a good partition of an MDG. It is the first system to apply
generic search algorithms to the software clustering problem.
Mitchell and Mancoridis introduced an objective function called Modularization
Quality (MQ). MQ rewards the creation of highly cohesive clusters and penalizes
excessive coupling between clusters. Hence, Bunch reformulates the software
clustering activity as an optimization problem where the goal is to maximize the
value of MQ. The assumption behind this rationale is that most software systems are
designed in such a way that highly cohesive modules are organized into the same
subsystem while loosely coupled modules are organized into separate subsystems. The
process is conducted automatically. Users can also integrate their knowledge into the
clustering process by assigning some modules to subsystems manually. Extensive case
studies and experiments show that Bunch does a good job of producing a subsystem
decomposition with or without knowledge of the software design.
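The idea that MQ rewards cohesion and penalizes coupling can be illustrated with a small sketch. This section does not give the MQ formula, so the definition used here is an assumption: each cluster contributes a factor μ / (μ + ε/2), where μ counts the edges inside the cluster and ε counts the edges that cross its boundary, and MQ is the sum of these factors. The function name and the sample graph are hypothetical.

```python
def modularization_quality(edges, cluster_of):
    """Score a partition of a module dependency graph.

    edges      -- list of (source module, target module) pairs
    cluster_of -- dict mapping each module to its cluster label
    Assumed formula: sum over clusters of mu / (mu + 0.5 * eps).
    """
    intra = {cluster: 0 for cluster in set(cluster_of.values())}
    inter = {cluster: 0 for cluster in intra}
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1          # edge stays inside one cluster (cohesion)
        else:
            inter[a] += 1          # edge crosses a boundary (coupling),
            inter[b] += 1          # counted against both clusters
    total = 0.0
    for cluster, mu in intra.items():
        if mu > 0:                 # clusters with no internal edges add nothing
            total += mu / (mu + 0.5 * inter[cluster])
    return total
```

Under this definition, a search procedure only needs to propose partitions and keep the ones with a higher score, which is what turns clustering into an optimization problem.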
Bunch also includes a programmer’s Application Programming Interface (API) so
that the clustering tool can be integrated with other tools, which makes the integra-
tion of Bunch into REportal possible.
2.3.1 Graph Drawing for Software Visualization
Visual presentations can ease the understanding of complex systems. Not surprisingly,
extensive research has been conducted on how to store, lay out and display graphs.
Barghouti and Mocenigo developed an extensible graph drawing package written
in Java, called Grappa[39]. It consists of a set of classes that implement graphs, in
addition to representation and presentation services. It also provides an API so that it
can be integrated into applications that require graph drawing, editing, and browsing.
The second version of Grappa, in addition to supporting the feature of bird’s eye view,
is able to handle large graphs, which the first version of Grappa could not. REportal
integrates Grappa in the form of an applet to represent interactive graphs.
Grappa invokes dot[14], a graph layout tool. Dot, which runs fast enough
for interactive use, uses a four-pass algorithm for drawing directed graphs. Dot is a
command-line utility that takes a dot description file as input, and produces an output
file where the nodes are assigned a position in a 2D space based on layout properties.
By default, dot positions nodes to minimize edge lengths and edge crossings. Grappa
renders a graph in a Java applet based on the layout information produced by dot.
Dot can also transform a dot description file into a formatted graph using a number of
standard image file formats such as GIF, PS, JPEG, and PDF. The dot description
file is a text file where users specify the edges and nodes to appear in the output
graph. Users are able to control the font type, font size, colors of nodes and edges,
shapes of nodes, labels and so on[27]. Users may also provide information that dot
uses in the layout process.
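As an illustration of what a dot description file contains, the sketch below generates one for a small directed graph. The helper name to_dot and the sample nodes are hypothetical; the output uses the standard digraph syntax with quoted node names and bracketed attribute lists.

```python
def to_dot(edges, node_attrs=None, name="G"):
    """Return the text of a dot description file for a directed graph.

    edges      -- list of (source, target) node names
    node_attrs -- optional dict: node name -> dict of attributes (shape, color, ...)
    """
    lines = ["digraph %s {" % name]
    for node, attrs in sorted((node_attrs or {}).items()):
        attr_text = ", ".join(
            '%s="%s"' % (key, value) for key, value in sorted(attrs.items())
        )
        lines.append('    "%s" [%s];' % (node, attr_text))
    for src, dst in edges:
        lines.append('    "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file such as graph.dot and rendered with a command like dot -Tps graph.dot -o graph.ps.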
Chapter 3. REportal: A User Perspective
This chapter covers the capabilities of REportal from a user’s perspective. The main
services of REportal will be presented along with usage scenarios to illustrate how
REportal can be used for a variety of software engineering problems.
3.1 Introduction to REportal
As mentioned earlier, REportal is a web-based application that integrates many
stand-alone software engineering tools. The services provided by these tools are aggregated into
a common presentation that is rendered in a standard Internet browser. The current
version of REportal provides the following services:
• Registration & Account Maintenance. REportal enables any Internet user
to create an account. All services provided by REportal are executed under a
user context. Furthermore, users can only work with systems that they upload
to REportal.
• Code Analysis. REportal provides two analysis services, one for Java code
and the other for C/C++ code. These services analyze programs and generate
repositories with all the necessary information about the system being analyzed.
Using these repositories, advanced code analysis, such as query and clustering,
can be done. Once created, the repositories are associated with a particular
user’s project, enabling the user to perform additional analysis without having to
re-upload source code.
• Code querying and browsing. REportal allows users to perform entity or
relationship queries that explore structural information about the programs
being analyzed. In addition, REportal can display fully cross-referenced source
code in a web browser using standard HTML hyperlinks.
• Design recovery and visualization. REportal integrates a tool called Bunch[7,
23, 24], which automatically partitions source-level structures into high-level
subsystems. The results are displayed in a web browser.
• Supporting services. REportal provides supporting services that make the
above services easier to use. These services include user authentication,
context-sensitive help, and user workspace management (e.g., creating, renaming, deleting,
uploading and downloading files and folders).
3.1.1 Installation
Although REportal doesn’t require users to install the reverse engineering tools, the
client’s workstation configuration must support JavaScript, Applets, Cascading Style
Sheets (CSS), and HTML (see footnote 5). Users must verify the following before using REportal:
• JDK 1.3 or higher is installed locally.
• The web browser enables Java and JavaScript.
• The latest version of the web browser is highly recommended. We have tested
REportal with Internet Explorer 5.5 or higher and Netscape 6.0 or higher.
• The web browser supports Secure Sockets Layer (SSL).
Footnote 5: Most of these services are provided intrinsically by modern-day browsers.
Figure 3.1: Sign up for a REportal account
3.1.2 Login and Sign Up
REportal uses SSL[29] to encrypt transmission of source code and user authentication
information. When users go to the REportal website[8], they are notified that they
are about to view pages over a secure connection.
To create a REportal account, users must provide their first name, last name,
desired username, password, company name, and a valid email address, as seen in
Figure 3.1.
As soon as the request is submitted, a welcome email message is sent to the user’s
mail box, and an account is created.
Once a user has an account, they can log in to REportal by providing a
valid username and password, as seen in Figure 3.2.
Figure 3.2: Log into REportal
3.2 Workspace Management and Navigation
Each authenticated user has their own workspace with some management privileges.
Users are allowed to do the following:
• Create folders for projects.
• Upload source file packages (e.g., Zip or Jar) to their private workspace.
• Download files or folders to their local machines.
• Rename or delete a project.
• Upload graphs in MDG, DOT or SIL format to the graphs subfolder.
• Browse the entire workspace.
• Select a project and analyze it.
Workspace Management
Users need to create folders in their workspaces; each folder holds only one
project. Once a folder is created, the src and graphs subfolders are created automat-
ically. Although users are permitted to create additional folders for projects, they
cannot change the structure of the folders, meaning that they cannot delete, rename
or create subfolders (Figure 3.3). All operations including creating, opening, renam-
ing, deleting, uploading and downloading are performed by selecting an appropriate
item from a context menu, which is activated by right-clicking on a folder/file.
Figure 3.4 shows the process of building up a workspace. Once a folder is created,
users can upload zipped source code to the folder. Users are prompted to select a
zipped file from their local machines. All source code must be zipped into a single
file. For C/C++ programs, all source files including headers must be included in
the zipped file. For Java programs the user may upload either bytecode or the Java
source code; since Java bytecode executes on top of a virtual machine, code analysis
can be performed by introspecting the bytecode directly. However, Java source code
is required if the user also wants to take advantage of REportal’s source code browsing
feature.
If the source code is uploaded successfully, the file is unzipped into the src subfolder
automatically, keeping the file structure intact. Then, REportal automatically
generates the two repositories, entity.db and relationship.db, against which further
analysis can be conducted.
Figure 3.3: Structure of user’s file system (a tree: the user file system holds one folder per user, created after login; each user folder holds project folders; each project folder holds the src and graphs subfolders; annotations state which operations are allowed at each level)
Figure 3.4: Process to build up a working space (create a folder, upload a zipped project, open the project and analyze it, manage the working space)
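The folder layout and upload step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names create_project and upload_zip are hypothetical, the workspace is a temporary local directory rather than REportal’s server-side file system, and the repository generation step is not modeled.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def create_project(workspace, name):
    """Create a project folder with the fixed src/ and graphs/ subfolders."""
    project = Path(workspace) / name
    for sub in ("src", "graphs"):
        (project / sub).mkdir(parents=True)
    return project


def upload_zip(project, zip_bytes):
    """Unzip an uploaded archive into src/, keeping its file structure intact."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(project / "src")
        return sorted(n for n in archive.namelist() if not n.endswith("/"))


def make_demo_zip():
    """Build a small in-memory archive that stands in for a user's upload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("TAP/TAPClient.java", "class TAPClient {}\n")
        archive.writestr("TAP/TAPServer.java", "class TAPServer {}\n")
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "tap")
    uploaded = upload_zip(project, make_demo_zip())
    has_graphs = (project / "graphs").is_dir()
    unpacked = sorted(p.relative_to(project).as_posix()
                      for p in project.rglob("*.java"))
```

The archive is unpacked below src, so the package directory TAP survives the upload, which mirrors the statement that the file structure is kept intact.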
Workspace Navigation
As shown in Figure 3.5, users are able to navigate through their workspace. REportal
displays the tree structure of the workspace. If an icon of an entity appears as
a folder, users can expand it by left-clicking the mouse to see what the folder contains.
Once a folder is open, the contents of the folder are displayed in the tree structure,
and the icon of the folder changes indicating that the folder is open. Left-clicking the
icon again closes the folder. Users have limited privileges to manage certain folders;
they cannot, for example, rename or remove the src folder and the graphs folder.
3.3 Code Querying and Browsing Service
Once users open a project, the query window is displayed (Figure 3.6), which allows
users to perform customized analysis. All tools are listed at the top of the window
as tabs. The Entity Search tool is enabled by default, and all files in the selected
project are listed in a table. Hyperlinks to the source code browsing feature of REportal
are associated with the files that have source code available. For example,
clicking on the TAP/TAPClient.java link causes REportal to display this file’s
actual source code. As mentioned earlier, REportal displays source code fully
cross-referenced with HTML hyperlinks, allowing the user to perform additional
source code analysis.
Code Querying
Code Querying is a very important service provided by REportal with basic features
including entity search, relationship search, text search, and advanced search.
Figure 3.5: Workspace display and navigation
Entity Search
An entity represents a static and syntactic construct such as a macro, a type, a
function, a file, etc. For Java programs, entities represent files, classes, methods and
fields, while for C programs, entities represent types, functions, variables, macros,
and files (see Table 3.1). In an entity search, the user can select the entity type (file,
class, method or field) and specify the name of the entity (Figure 3.7). Wild cards
may be used for entity names as described in Table 3.2.
Query results are displayed in a table that has two fields: the name of the entity
and the file in which the entity resides. If the source code of the file is available, a
hyperlink, which leads to the source code browser, is associated with the file. More
about source code browsing is described in section 3.3. In addition, a hyperlink
associated with the entity leads to the definition of the entity in the source code
browser. Query results are restricted to 20 entries per page, which minimizes
download time and is a popular implementation approach for web-based systems that
need to display the results of large queries. By default, the first page is displayed.
However, users can jump to any page or display all query results on one page. Figure
3.7 illustrates a query of all methods in the Ticket Auction Program (TAP).
Figure 3.6: Query window
Table 3.1: Entity Types for Entity Search
C           Java
file        file
types       class
function    method
variable    field
macros

Table 3.2: Wild Cards Accepted by Entity Search
Wild card   Meaning
?           Matches any single character.
*           Matches any sequence of zero or more characters.
[x...y]     Matches any single character specified by the set x...y.
            A minus sign may be used to indicate a range of characters.

The entity search can answer questions like:
• What classes are defined in the project?
• What are the methods whose names start with “b”?
• Where is the field “fileName” defined?
• What files have source code available?
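A minimal Python sketch of an entity search is given here for illustration. It assumes that the wild cards of Table 3.2 behave like the shell-style patterns implemented by Python’s fnmatch module, and it uses a small hypothetical entity list in place of the entity.db repository; the function names entity_search and page are not part of REportal.

```python
from fnmatch import fnmatchcase

PAGE_SIZE = 20  # the text states that results are limited to 20 entries per page

# Hypothetical rows: (entity type, entity name, file in which the entity resides)
ENTITIES = [
    ("class", "TAPClient", "TAP/TAPClient.java"),
    ("method", "bid", "TAP/TAPClient.java"),
    ("method", "browse", "TAP/TAPClient.java"),
    ("field", "fileName", "TAP/TAPServer.java"),
    ("method", "run", "TAP/TAPServer.java"),
]


def entity_search(entities, entity_type, name_pattern):
    """Return (name, file) pairs of the given type whose name fits the wild card."""
    return [
        (name, path)
        for kind, name, path in entities
        if kind == entity_type and fnmatchcase(name, name_pattern)
    ]


def page(results, number, size=PAGE_SIZE):
    """Return one page of results; pages are numbered from 1."""
    start = (number - 1) * size
    return results[start:start + size]


methods_starting_with_b = entity_search(ENTITIES, "method", "b*")
where_is_file_name = entity_search(ENTITIES, "field", "fileName")
first_page = page(entity_search(ENTITIES, "method", "*"), 1)
```

The first two queries correspond to the questions about methods whose names start with “b” and about where the field “fileName” is defined.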
Relationship Search
A system-level relation is an association between two entities, such as inheritance
between two classes, a function call between two methods, or a reference to a variable
by a function. A relationship query takes two parameters: a source and a destination
entity. The query returns a collection of directed relations from the source to the destination.
For example, to list all entities that refer to a particular entity, users specify
the destination entity and leave the source entity as a “*”. On the other hand, to find all
entities that a specific entity refers to, users specify the source entity and leave the
destination entity as a “*”.
Similar to the entity search, users specify the source and/or destination by
giving an entity type and an entity name. Entity names may also be specified using the
wild cards listed in Table 3.2. Some examples of relationship queries are shown
in Table 3.3, whose “Target” columns describe the source entity of the query.
Figure 3.7: Entity search
Query results of a relationship search can be listed in a table. The table has four
columns for the names of source and destination entities, and the files in which they
reside. If the source code of a file exists, hyperlinks are associated with the entity and
the file name containing the entity. Figure 3.8 shows all the class to class relationships
in the TAP system.
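The source and destination parameters of a relationship query can be illustrated with a short Python sketch. The relation list below is hypothetical and stands in for the relationship.db repository, the names Entity, Relation and relationship_search are introduced only for this example, and the wild cards are again assumed to follow the shell-style rules of Python’s fnmatch module.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Entity(NamedTuple):
    kind: str
    name: str


class Relation(NamedTuple):
    source: Entity
    destination: Entity


RELATIONS = [
    Relation(Entity("method", "run"), Entity("method", "bid")),
    Relation(Entity("method", "bid"), Entity("field", "fileName")),
    Relation(Entity("class", "TAPClient"), Entity("class", "bidder")),
]


def matches(entity, kind, name_pattern):
    """An entity matches when its type is accepted and its name fits the wild card."""
    return kind in ("All", entity.kind) and fnmatchcase(entity.name, name_pattern)


def relationship_search(relations, src_kind, src_name, dst_kind, dst_name):
    """Return the directed relations whose two ends match the given criteria."""
    return [
        r for r in relations
        if matches(r.source, src_kind, src_name)
        and matches(r.destination, dst_kind, dst_name)
    ]


# All entities that refer to method bid: fix the destination, leave the source open.
refer_to_bid = relationship_search(RELATIONS, "All", "*", "method", "bid")
# All entities that method bid refers to: fix the source, leave the destination open.
bid_refers_to = relationship_search(RELATIONS, "method", "bid", "All", "*")
```

Each result keeps both ends of the relation, which is the information needed to fill the four-column result table described above once the containing files are looked up.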
Query results can also be displayed as a graph as seen in Figure 3.9. If users
choose this option, a new window appears with the graph displayed as an applet in
the left panel and a bird’s eye view of the graph in the right panel. In the graph, a
node represents an entity, while a link between two entities represents a relationship.
Table 3.3: Examples of Relationship Search
Query                                                      Target type   Target name   Destination type   Destination name
All entities that refer to method bid                      All           *             method             bid
All entities that method bid refers to                     method        bid           All                *
All classes that inherit from classes with initial b       class         *             class              b*
All fields ending with c that are referenced by class c    class         c             field              *c
Figure 3.8: Relationship search
Figure 3.9: Display results as a graph
The time needed to display a graph is based on the size of the graph. The size of
the applet window varies with the size of the web browser, and the monitor’s screen
resolution.
The bird’s eye view is useful for navigating through large graphs that
cannot be displayed on one screen. If users click in an area of the bird’s eye view
window, the graph in the main window centers on the place that was clicked.
Also, users can click the mouse and drag it so that a block of nodes and links is
highlighted in the bird’s eye view window, and the main graph zooms to that block.
The graph is displayed in the center of the applet with six buttons at the bottom:
-, =, +, Reachability Query, Cluster this Graph and Advanced
Cluster this Graph.
• - This button zooms out of the graph.
• = This button centers the graph.
• + This button zooms in on the graph.
• Reachability Query. If users want to determine which entities can be reached
from a particular entity, or which entities can reach it, they can click
on the corresponding node in the applet and click “Reachability Query”. A
new window appears as seen in Figure 3.10. Users may query what can be
reached from the entity by specifying the direction as forward, or they may query
what can reach the entity by specifying the direction as backward. Users can
also limit the types of entities that are reachable to/from an entity. For Java
programs, entity types are packages, files, methods, classes, fields, strings, and
interfaces. Users may also specify the depth and output format of the reachability
query. If users select database as the output format, a window similar to Figure
3.11 appears. In this window, all entities that match the query criteria are displayed.
• Cluster this Graph. The entities and relations in a system may also be
represented as a graph where the nodes are the entities and the edges in the
graph are the relations. Up to this point we have represented this information
using a tabular view. Once a system’s representation is modeled as a graph,
we can use the Bunch software clustering tool[7, 23] to partition the graph to
obtain high-level structural information about the system. As shown in Figure
3.12, the results of the clustering activity are displayed visually as an applet.
The contents of subsystems are hidden as blocks, but they may be expanded
by double-clicking on them.
• Advanced Cluster this Graph. Similar to “Cluster this Graph”, this fea-
ture also clusters the graph using Bunch and displays results as an applet.
Figure 3.10: Reachability query
Figure 3.11: Reachability query results
Figure 3.12: Clustering a relationship search graph
Unlike “Cluster this Graph”, before the clustering is conducted, a new window
is presented (Figure 3.13), where users have an opportunity to customize the
clustering engine. For example, they can exclude some modules from the clus-
tering process and can upload a file that may restrict the placement of particular
modules into certain clusters.
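The reachability query offered by the buttons above, with its direction, depth and entity-type options, can be read as a bounded breadth-first search over the relation graph. The Python sketch below illustrates that reading only; it is not REportal’s implementation, and the function name reachable, its parameters and the sample graph are assumptions made for the example.

```python
from collections import deque


def reachable(edges, kinds, start, direction="forward", depth=None, wanted=None):
    """Breadth-first reachability over (source, destination) edges.

    direction: "forward" follows edges away from start, "backward" follows
               them in reverse.
    depth:     maximum number of edges to follow (None means unlimited).
    wanted:    optional set of entity types to keep in the result.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    neighbours = {}
    for src, dst in edges:
        a, b = (src, dst) if direction == "forward" else (dst, src)
        neighbours.setdefault(a, []).append(b)

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, dist = queue.popleft()
        if depth is not None and dist == depth:
            continue  # do not expand nodes that already sit at the depth limit
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
                if wanted is None or kinds[nxt] in wanted:
                    found.append(nxt)
    return found


EDGES = [("main", "run"), ("run", "bid"), ("bid", "fileName"), ("notify", "run")]
KINDS = {"main": "method", "run": "method", "bid": "method",
         "fileName": "field", "notify": "method"}

forward_all = reachable(EDGES, KINDS, "run")
forward_depth_1 = reachable(EDGES, KINDS, "run", depth=1)
backward = reachable(EDGES, KINDS, "run", direction="backward")
fields_only = reachable(EDGES, KINDS, "main", wanted={"field"})
```

Filtering by type is applied to the reported entities only, so a field can still be reported when it is reached through methods.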
Text Search
Text search allows users to search the entire source code of the system for a
particular text pattern. It accepts wild cards as described in Table 3.2. A user can,
for example, search for all the source code lines that contain the text “main”, or a
text pattern that begins with “ma” and ends with “n”.
Figure 3.13: Advanced clustering a relationship search graph
The query results are displayed in a table as seen in Figure 3.14. The table
contains three columns: the file in which the text/pattern resides, the line number
of the text/pattern in the file, and the contents of the line. As usual, if the source
code of a particular file is available, hyperlinks are placed on the file name that can
be used to access the source code browsing view where the specified line of code is
displayed at the top of the window.
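As an illustration of the three-column result described above, the following Python sketch searches a set of source files line by line. It assumes shell-style wild cards as implemented by Python’s fnmatch module; the function name text_search and the two sample files are hypothetical, and the text search in REportal itself is performed by other tools.

```python
from fnmatch import fnmatchcase


def text_search(sources, pattern):
    """Return (file, line number, line) for every line that contains the pattern.

    sources maps a file name to its text. The pattern is wrapped in '*' so that
    it may match anywhere inside a line.
    """
    hits = []
    for path in sorted(sources):
        for number, line in enumerate(sources[path].splitlines(), start=1):
            if fnmatchcase(line, f"*{pattern}*"):
                hits.append((path, number, line))
    return hits


SOURCES = {
    "TAP/TAPClient.java": "class TAPClient {\n  public static void main(String[] a) {}\n}\n",
    "TAP/TAPServer.java": "class TAPServer {\n  void maintain() {}\n}\n",
}

main_hits = text_search(SOURCES, "main")      # lines that contain the text "main"
pattern_hits = text_search(SOURCES, "ma*n")   # begins with "ma" and ends with "n"
```

Line numbers start at 1 so that a hit can be linked directly to the corresponding line in a source listing.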
Advanced Search
The entity search and relationship search allow primitive queries that accept
entity types and names. More complicated queries can be performed by the Advanced
Search feature of REportal. The interface of the advanced search is shown in Figure
3.15.
Figure 3.14: Text search
Not only can users specify entity names and types, but they can also specify the file in
which the entity resides, the scope of the entity (e.g., the scope could be protected,
public or private), and the type of query (either an entity or a relationship search).
Query results are displayed in tables with more information. Figure 3.16 shows all
public classes in the TAP system. It displays information about entity names, scopes,
files in which they reside, and beginning and end line numbers of each entity.
With the additional capabilities of the advanced search feature, users can perform
more complicated queries such as:
• Which private classes implement class bidder?
• Which classes in file TAPServer reference the class bidder in file bidder?
• List all of the private methods in file search.
Figure 3.15: Advanced search
• How many files does the package bunch contain?
• List all interfaces that are defined in files beginning with the letter “C”.
• Is the run method in the class command referenced by the notify method in
class mail?
Code Browsing
If users provide source code when uploading their systems to REportal, they can
browse the source code in a web browser. Like many commercial Integrated Development
Environments, REportal displays different types of information in the source code
in different colors for easy reading. For example, line numbers in front of each line of
code are displayed in light gray, indicating that the line numbers are not part of the
source code; reserved words are displayed in green, comments in red,
and all other source code in black.
Figure 3.16: Advanced search results
When source code refers to a program entity, a link is placed in the browser
window. If a user follows the link, the window as seen in Figure 3.10 appears. From
this window users can perform a reachability query to find what can be reached by
the selected entity.
Figure 3.17: Code browsing
3.4 Design Recovery
REportal uses the Bunch clustering tool to infer high-level design information about
a system. REportal mimics the Graphical User Interface (GUI) of Bunch, so that the
user’s experience in Bunch can be carried to REportal, or vice versa.
Using “Bunch”
Figure 3.18 shows the user interface of the Bunch software clustering service.
It contains two sections: the “option” window on the top allows users to configure
clustering options, while the bottom section is used to perform a given action. At the
top of the “option” window, there is a set of tabbed panels: “Basic”,
“Options”, “Libraries”, “Omnipresent”, “User Directed” and “MQ Calculator”. By
clicking one of these tabs, users can switch to a different “option” window.
Figure 3.18: Integration of Bunch in REportal
When Bunch is launched, by default, the “Basic” option window is displayed,
where users can specify input graph files (MDG format) and clustering methods.
Clustering a graph requires users to select an MDG graph from the drop-down list
of the “Input Graph File” tab, which lists all graphs in the “graphs” folder of the
project. When a graph is selected, the “Run” button at the bottom becomes active,
indicating that clustering is allowed. MDG files can be created by two means:
1. Create an MDG file locally, either manually or automatically, then upload the file
to the “graphs” folder. The “upload” button in the “Basic” window launches
the window shown in Figure 3.19, where users can select an MDG file from
their local machines and upload it.
2. Alternatively, users can create custom MDG files on REportal directly by
launching the window shown in Figure 3.20. Users can customize the files by
excluding or including method-to-variable relationships, package names, weights
on relationships, method-to-method relationships, implementation relationships,
and/or relation types.
Figure 3.19: Upload an MDG file
Figure 3.20: Create an MDG file
Once the MDG file is created, it is selected as the input graph file automatically.
Users can then run clustering services after an MDG file is selected. Depending on the
configuration of the clustering engine and the size of the MDG, the time required to
cluster the system varies. After the clustering finishes, four output files are generated
for viewing: “dot”, “pdf”, “ps” and “gif”. If everything works correctly, the “Download”
and “View” buttons in the “Basic” window are activated. Users may then choose
one of the four formats and download an image of the clustered graph to their local
machine. Alternatively, they can view the graph online as an applet. If this option is
chosen, a window similar to Figure 3.9 appears with the graph displayed in an applet.
This option also supports the bird’s eye view and reachability queries.
Figure 3.21: Hill-climbing configuration window
Clustering method
“Bunch” supports two clustering methods: Hill-Climbing and Genetic Algorithm
(GA). Users can choose either from the “Clustering Method” list. Hill-Climbing is
the default clustering method. Hill-Climbing works best for most graphs, while the
GA is sometimes more efficient than Hill-Climbing for extremely large graphs.
Users can also configure the two clustering algorithms by clicking
the “Option” button. Depending on which clustering method they choose, one
of the windows shown in Figure 3.21 or Figure 3.22 appears. For the GA method,
users can configure GA selection methods, numbers of generations, population sizes,
crossover probabilities and mutation probabilities; for the Hill-Climbing algorithm,
users can configure generation sizes, percentages of search space, percentage of
randomization, and disable/enable simulated annealing. For more information about
the configuration options, refer to Mitchell’s Ph.D. thesis [11].
Figure 3.22: GA configuration window
Clustering Options
The clustering options window allows users to control the behavior of the clustering
algorithms.
• Objective Function. Users may choose one of the following measurements for
evaluating the clustering results: incremental MQ, incremental MQ weighted,
turbo MQ function and turbo MQ squared. The incremental MQ weighted is the
default.
Figure 3.23: Clustering options
• Limiting running time. Users may establish an upper bound on runtime by
setting this field to a specific value in milliseconds. Clustering stops when the
limit is reached, and the best result found so far by the clustering search engine
is returned.
• Agglomerative Output Options. These options are valid only when “Ag-
glomerative Clustering” is selected in “Action”. The agglomerative clustering
algorithm generates several levels of the module decomposition hierarchy from
the most detailed level to the topmost level. Users may choose the median level,
the most detailed level, or the topmost level as the output.
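The interplay between a hill-climbing search, an objective function and the running-time limit described above can be illustrated with a small Python sketch. It is not Bunch’s algorithm. It assumes an unweighted graph, a simplified objective that sums a cluster factor of 2*intra / (2*intra + inter) per cluster, a starting partition with one cluster per node, and a greedy rule that moves a node into a neighbour’s cluster whenever the objective improves. The names mq and hill_climb and the parameter limit_ms are hypothetical.

```python
import time


def mq(edges, owner):
    """Sum of cluster factors 2*intra / (2*intra + inter) over all clusters."""
    intra, inter = {}, {}
    for src, dst in edges:
        a, b = owner[src], owner[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    return sum(2 * i / (2 * i + inter.get(c, 0)) for c, i in intra.items())


def hill_climb(edges, limit_ms=1000):
    """Move one node at a time into a neighbour's cluster while the score improves.

    The search stops at a local optimum or when the time limit (checked between
    sweeps over the edges) expires, and returns the best partition found so far
    together with its score.
    """
    deadline = time.monotonic() + limit_ms / 1000
    nodes = sorted({n for edge in edges for n in edge})
    owner = {n: n for n in nodes}      # start with one cluster per node
    best = mq(edges, owner)
    improved = True
    while improved and time.monotonic() < deadline:
        improved = False
        for src, dst in edges:
            for node, other in ((src, dst), (dst, src)):
                target = owner[other]
                if owner[node] == target:
                    continue
                previous = owner[node]
                owner[node] = target
                score = mq(edges, owner)
                if score > best:
                    best, improved = score, True
                else:
                    owner[node] = previous   # undo a move that does not help
    return owner, best


# Two triangles joined by a single edge.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
partition, score = hill_climb(EDGES)
```

Because only improving moves are kept, the partition held when the limit expires is always the best one seen, which is what makes it possible to return an answer at any time.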
Figure 3.24: Using libraries
Using libraries
The “Library” tab is activated when an MDG file is selected (Figure 3.24). This
feature allows users to exclude those nodes that only have incoming edges (libraries)
from the clustering process. Library nodes tend to obscure the abstract view of the system’s
structure because all of their incident edges in the MDG are directed towards the library
module. Thus, the placement of library modules into clusters becomes somewhat arbitrary,
which means that library modules may affect the results dramatically.
In REportal’s library window, users can either select nodes manually from the
list on the left and move them to the right, or click the “FIND” button to move all
libraries from left to right automatically. All nodes in the right list are treated as
libraries and are excluded from clustering activity. In the resultant clustered graphs,
these nodes are placed in a special cluster and are displayed in gray.
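The rule that a library is a node with only incoming edges is simple enough to state as code. The Python sketch below illustrates what a “FIND” style suggestion could compute; the function names find_libraries and without_nodes and the sample graph are assumptions made for the example, not REportal or Bunch code.

```python
def find_libraries(edges):
    """Return nodes that have at least one incoming edge and no outgoing edge."""
    has_outgoing = {src for src, _ in edges}
    has_incoming = {dst for _, dst in edges}
    return sorted(has_incoming - has_outgoing)


def without_nodes(edges, excluded):
    """Drop every edge that touches an excluded node before clustering."""
    return [(s, d) for s, d in edges if s not in excluded and d not in excluded]


MDG = [("TAPClient", "bidder"), ("TAPServer", "bidder"),
       ("TAPClient", "java.util"), ("bidder", "java.util")]

libraries = find_libraries(MDG)
clustered_edges = without_nodes(MDG, set(libraries))
```

Here bidder is not a library because it also has an outgoing edge, so only java.util is set aside.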
Figure 3.25: Using omnipresent modules
Using omnipresent modules
Omnipresent modules are modules that have many incoming or outgoing
edges relative to the other nodes in the MDG. Modules that have many incoming
edges are called omnipresent suppliers, while those that have many outgoing edges are
called omnipresent clients. In the “Omnipresent” window, which is activated only
after an MDG file is selected, users can exclude omnipresent modules from the clustering
process (Figure 3.25).
The list to the left shows all modules that may be selected as omnipresent modules.
The list does not contain those modules that have already been selected as libraries.
Likewise, the list on the left in the library window does not contain those modules
that have already been selected as omnipresent (see footnote 6). Users may select certain modules
as omnipresent manually, or press the “FIND” button to have REportal suggest
omnipresent modules automatically.
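One plausible way to suggest omnipresent modules is to compare each node’s in-degree and out-degree with the average degree of the graph. The Python sketch below does exactly that; the threshold factor of 2.0, the use of the mean, and the function name suggest_omnipresent are illustrative assumptions, since the text does not state how the “FIND” button chooses its suggestions.

```python
from collections import Counter


def suggest_omnipresent(edges, factor=2.0):
    """Suggest nodes whose in-degree or out-degree is at least factor times the average.

    Returns two sorted lists: nodes with many incoming edges and nodes with
    many outgoing edges.
    """
    if not edges:
        return [], []
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    nodes = set(in_degree) | set(out_degree)
    average = len(edges) / len(nodes)   # mean in-degree equals mean out-degree
    many_incoming = sorted(n for n in nodes if in_degree[n] >= factor * average)
    many_outgoing = sorted(n for n in nodes if out_degree[n] >= factor * average)
    return many_incoming, many_outgoing


MDG = [("a", "log"), ("b", "log"), ("c", "log"), ("d", "log"),
       ("driver", "a"), ("driver", "b"), ("driver", "c"), ("driver", "d")]

many_incoming, many_outgoing = suggest_omnipresent(MDG)
```

A relative threshold of this kind scales with the graph, so the same factor can be used for small and large systems.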
Figure 3.26: User-directed clustering
User-directed clustering
Users sometimes have prior knowledge about which modules should be placed
within a subsystem. The “User Directed” window (Figure 3.26), which is activated
only after an MDG file is selected, allows users to upload a description file that
specifies which modules should be placed together into clusters. When the
“Lock Clusters” checkbox is selected, no modules are added to the subsystems that are
defined by the users. If the “Lock Clusters” option is not selected, Bunch may move
additional modules into the user-specified clusters.
Footnote 6: In other words, each module in the MDG will either participate in the clustering process, or will be tagged with a special type such as “library” or “omnipresent”.
Figure 3.27: MQ calculator
MQ calculator
This feature, which is accessed by pressing the MQ Calculator tab, can be used
to measure the quality of an MDG partition. Bunch uses the Modularization Quality
(MQ) function to evaluate the relative “quality” of a particular MDG partition. To
determine the MQ for a provided clustered input file, users select an MDG file and a
SIL file from their graphs folder, and then click the “Calculate” button. The numbers
of nodes, edges, and clusters in the graph, along with the MQ value, are displayed as results.
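The four values reported by the calculator can be reproduced on a toy example with the Python sketch below. It assumes one particular MQ formulation: each cluster with at least one internal edge contributes a cluster factor of 2*intra / (2*intra + inter), where intra counts the edges inside the cluster and inter counts the edges crossing its boundary in either direction, and MQ is the sum of these factors. The measurement actually used depends on the objective function selected in Bunch, and the function name mq_report and the in-memory inputs (in place of MDG and SIL files) are hypothetical.

```python
def mq_report(edges, clusters):
    """Return (number of nodes, number of edges, number of clusters, MQ).

    edges:    list of (source, destination) pairs.
    clusters: dict mapping a cluster name to the set of nodes it contains.
    """
    owner = {node: name for name, members in clusters.items() for node in members}
    intra = dict.fromkeys(clusters, 0)   # edges inside each cluster
    inter = dict.fromkeys(clusters, 0)   # edges crossing each cluster's boundary
    for src, dst in edges:
        if owner[src] == owner[dst]:
            intra[owner[src]] += 1
        else:
            inter[owner[src]] += 1
            inter[owner[dst]] += 1
    mq = sum(
        2 * intra[c] / (2 * intra[c] + inter[c])
        for c in clusters
        if intra[c] > 0
    )
    return len(owner), len(edges), len(clusters), mq


EDGES = [("a", "b"), ("c", "d"), ("b", "c")]
CLUSTERS = {"X": {"a", "b"}, "Y": {"c", "d"}}

report = mq_report(EDGES, CLUSTERS)
```

In this example each cluster has one internal edge and one boundary edge, so each contributes 2/3 and the MQ value is 4/3.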
3.5 Chapter Summary
This chapter describes the functionality of REportal from a user’s perspective. The
next chapter is dedicated to REportal’s architecture.
Chapter 4. REportal: A Developer Perspective
4.1 Development History
REportal integrates a repository of reverse engineering tools developed over the past
several years by SERG and the AT&T Research Labs. Many people have been in-
volved in the development of these tools. Yih-Farn Chen, Emden R. Gansner, Elefthe-
rios Koutsofios and Jeffrey Korn at the AT&T Research Labs developed the acacia[13]
and chava[18] source code analysis tools. Emden R. Gansner, Eleftherios Koutsofios,
Stephen C. North and Kiem-Phong Vo developed the graph drawing tool dot [14].
Naser S. Barghouti, John M. Mocenigo, and Wenke Lee developed the graph package
Grappa[39]. Brian Mitchell and Spiros Mancoridis developed the software clustering
tool Bunch[7, 11, 23, 24].
Since these tools were developed in different languages and for different platforms
with different user interfaces, installing and learning how to use these tools can be
a complex undertaking. In the year 2000, Spiros Mancoridis and Yih-Farn Chen,
inspired by the “software as services” model, came up with a proposal for a
web-based portal site that integrates these tools and hides their complexity behind
a simple and intuitive user interface.
The proposal was sponsored by grants from the National Science Foundation
(NSF) and the research laboratories of AT&T. Timothy S. Souder and Jeffrey L.
Korn started the implementation of REportal. Souder set up the deployment environment
for REportal and implemented a few straightforward administrative features
such as downloading, renaming and deleting files and folders. The initial release of
REportal also integrated a few of Bunch’s features through the use of Java Servlets (see footnote 7).
Korn implemented the graph visualization, reachability query and source code browsing
features via the Common Gateway Interface (CGI) programming model.
By the end of June 2001, when I took over the development of REportal, it had
a few basic features such as code analysis, code browsing, design extraction and
supporting services. The early work was a proof of concept for REportal. It was
clear that REportal needed to be overhauled and expanded before it could be useful
to the software engineering community.
William Mongan soon joined the REportal development team. Our goal was
to re-implement REportal so that it could provide a simple and intuitive interface
with many additional new features. Mongan maintains our development machines,
is involved in the design of the user interface, and works on the graph visualization
feature. Due to his contribution, REportal is able to visualize large graphs both
in IE and Netscape, and users can navigate through these large graphs using the
bird’s eye viewing feature. He also found and fixed many bugs. I worked on the
design and implementation of the new interface, the integration of Bunch’s features
into REportal, and the addition of the features for user workspace management and
navigation.
4.2 REportal Non-Functional Requirements
The functional requirements of REportal were outlined in Chapter 3, which describes
REportal from a user’s perspective. In addition to these functional requirements,
REportal was designed to satisfy the following non-functional requirements:
Footnote 7: The original architecture by Korn and Souder is still being used, although most of the classes have been rewritten, modified, or expanded to support the new interfaces and features.
• Extensibility. As a long-term research project, REportal is designed to be
a central repository of reverse engineering tools. Although it only supports
static system analysis today, eventually more tools will be integrated to support
security analysis and dynamic program analysis.
• Installation and Portability. Currently, REportal runs on a server in the
Software Engineering Research Group (SERG) at Drexel University. To use the
services of REportal, users must first upload their source code to the server. If
security is a key concern for users, they may hesitate to do so. Hence, one feature
of REportal is its portability to other servers, perhaps behind a corporate firewall.
Therefore, it is important that REportal be easy to install and adaptable
to a variety of operating systems and servers.
• Usability. As a portal web site that provides complicated software engineering
services, usability is a top design concern. Providing robust, reliable services
through intuitive and friendly interfaces is a very important requirement.
4.3 The REportal Process
Figure 4.1 shows the high-level process supported by REportal. Once the source
code is uploaded to the server, depending on whether the source code is written in
Java or C/C++, Acacia (C/C++) or Chava (Java) is used to create two repositories:
entity.db and relationship.db. These database files store the information about entities
in the source code and the relationships between them, respectively. Other than text
searching, all advanced queries such as the entity searching and relationship searching
are conducted by querying the two repositories.
Advanced queries are conducted through a set of CIAO query tools, such as cdef and
cref (see footnote 8). The results produced by these tools contain information about the entity
name, the ID of an entity in the repositories, the file in which the entity resides, the
ID of the file, the beginning and ending line numbers of the definition of the entity,
the relationship type, the scope of the entity, and so on. Some of this information,
such as entity names and the files in which they reside, is used when displaying results
in table listings.
Footnote 8: The CIAO tools, cdef and cref, work against the database files provided by Acacia and Chava.
Figure 4.1: The process supported by REportal (elements shown: source code; the source code analysis tools Acacia and Chava; the repositories relationship.db and entity.db; the CIAO querying tools and Bunch; relationship search, entity search and source code browsing; the visualization tools dotty and Grappa; text search through Unix utility tools)
Design extraction is achieved using the Bunch clustering tool. REportal spawns
a subprocess in which a set of Java programs invoke the Bunch API (see footnote 9) to perform
clustering. The Bunch clustering tool takes Module Dependency Graphs (MDGs) as
input. The MDGs are generated by a Unix shell script, which invokes a set of CIAO
query tools to produce a file in the correct MDG format.
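To make the last step concrete, the Python sketch below turns a list of module-level relations into MDG text. It assumes the plain-text MDG layout of one edge per line, written as a source module, a destination module and an optional weight separated by spaces. The function name to_mdg, the decision to drop self-references, and the use of the number of merged relations as the weight are assumptions made for this illustration; REportal itself performs this step with a shell script and the CIAO query tools.

```python
from collections import Counter


def to_mdg(relations, weighted=True):
    """Render (source module, destination module) relations as MDG text.

    Repeated relations between the same two modules are merged. When weighted
    is true, the number of merged relations is written as the edge weight.
    Relations from a module to itself are dropped (an assumption of this sketch).
    """
    counts = Counter((s, d) for s, d in relations if s != d)
    lines = []
    for (src, dst), weight in sorted(counts.items()):
        lines.append(f"{src} {dst} {weight}" if weighted else f"{src} {dst}")
    return "".join(line + "\n" for line in lines)


RELATIONS = [("TAPClient", "bidder"), ("TAPClient", "bidder"),
             ("TAPServer", "bidder"), ("bidder", "bidder")]

mdg_text = to_mdg(RELATIONS)
mdg_unweighted = to_mdg(RELATIONS, weighted=False)
```

The weighted and unweighted variants correspond to the option of including or excluding weights on relationships when an MDG file is created.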
Query results may be displayed online in tabular format or graphically as an applet
by Grappa [39].
Footnote 9: Bunch provides a graphical user interface to support stand-alone usage, and an API to support integration with additional tools.
Figure 4.2: Development and deployment environment of REportal (the development machine art checks code in to and out of the file server snoopy; the web server tweety loads code from snoopy)
4.4 Development and Deployment Environment
REportal runs on tweety, which is a Unix machine that runs the Apache web server.
tweety runs Red Hat Linux 7.2 on a 1 GHz Pentium III processor, with 256 MB of RAM
and an 18 GB SCSI hard drive. tweety has all of the necessary tools, shell scripts, and
software to run REportal.
snoopy runs Red Hat Linux 7.2 on dual 1.5 GHz Xeon processors, with 1 GB of
PC800 RAM, a 500 GB external RAID array and a 100 GB internal array. snoopy works
as a file server that stores the source code and bytecode of REportal among other
things. The code of REportal is periodically copied from snoopy to tweety, so that
tweety always reflects the latest development of REportal.
REportal is developed on another machine named art. art also runs Apache so
that it can be used for development and testing. Each developer has a development
folder on art, which contains the source code of REportal. Each developer works in
their respective folder, and version control is performed using the Concurrent Versions
System (CVS).
After a development change (and before checking the code into the repositories
on snoopy), the developers must update their local folders so that their work can be
merged using CVS.
The web server that runs REportal and the techniques used to develop it are
described in the following subsections.
4.4.1 Apache
The Apache server is a powerful, flexible, HTTP/1.1 compliant web server. It is
available from the Apache Software Foundation at no charge and comes with an un-
restrictive licence [3]. “Apache has been the most popular web server on the Internet
since April of 1996. The March 2002 Netcraft Web Server Survey found that 54%
of the web sites on the Internet are using Apache, thus making it more widely used
than all other web servers combined”[3].
Apache runs on Windows NT/9x, Netware 5.x and above, OS/2, and most versions
of Unix. It has been shown to be substantially faster, more stable, and more feature-
rich than many other web servers.
Apache is used as the web server for REportal. Because of Apache’s popularity,
REportal can be deployed easily on other servers.
4.4.2 Java and Java Servlets
Introduced by James Gosling at Sun Microsystems in 1995, Java has gained tremendous
popularity over the years. As one of the fastest growing programming technologies
of all time, Java has become an ideal language for server-side development of large
applications such as REportal.
The cross platform nature of Java facilitates the portability of REportal. Java’s
object-oriented, memory-protected design reduces development cycles and increases
reliability.
Java Servlets are special Java classes that can be loaded dynamically to provide
web developers with a simple, consistent mechanism for extending the functionality
of a web server. Java Servlets have many unique features for creating dynamic web
content. Many of these features overcome scalability and performance limitations
that are associated with other server side technologies such as CGI and server-side
JavaScript.
Portability
Java Servlets are supported on all major platforms, and work with all of the
major web servers [5]. This feature enables REportal to be developed on a high-end
Unix server running Apache, and to be deployed effortlessly on another platform
running a different web server, such as a Windows NT machine running the Java
Server.
Efficiency
Unlike CGI, which uses a separate process to handle each program and/or request,
Servlets are all handled by separate threads that run within the web server process
[5]. Once a servlet is loaded, it stays in the server’s memory as a single object
instance. The server invokes it to handle a request using a simple, lightweight method
invocation. As a result of this design, Servlets have historically proven to be very
efficient and scalable.
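The threading model described above can be sketched in plain Java. This is only an illustration and not servlet code: SharedHandler and ThreadPerRequestDemo are hypothetical names, and a thread pool stands in for the servlet engine. It shows one long-lived object instance receiving every request through an ordinary method call made on separate threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a loaded servlet: a single instance serves all requests.
final class SharedHandler {
    // Shared state must be thread-safe because many threads call handle() at once.
    private final AtomicInteger served = new AtomicInteger();

    String handle(String request) {
        served.incrementAndGet();
        return "handled:" + request;
    }

    int servedCount() {
        return served.get();
    }
}

// Hypothetical stand-in for the servlet engine: one thread per request, no new process.
final class ThreadPerRequestDemo {
    static List<String> serveAll(SharedHandler handler, List<String> requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                Callable<String> task = () -> handler.handle(request);
                pending.add(pool.submit(task));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> f : pending) {
                responses.add(f.get()); // collected in submission order
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the handler object is shared, any field it keeps has to be safe for concurrent access, which is why the counter in the sketch is an AtomicInteger.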
Security
Due to Java’s exception handling mechanism, Servlets can handle errors safely. If
a run-time error occurs, an exception is thrown which can be handled safely without
the danger of crashing the server. It should also be noted that Apache and the Java
Servlet engine provide a significant amount of logging services, which can be used to
support security audits.
[Figure 4.3 diagram: User’s requests, Reportal servlet, Child processes, Unix shell scripts,
CIAO querying tools, entity.db, relationship.db]
Figure 4.3: Queries against the repositories
Extensibility
Servlets may access the entire family of Java APIs, including the JDBC (Java
Database Connectivity) API, networking, multithreading, and object serialization,
among others. Thus, Servlets include all of the benefits of the Java environment,
including portability, reusability, and protection [4].
Java Servlet technologies are utilized by much of the work described in this thesis.
The servlet engine used by the REportal project is Apache JServ.
4.4.3 Configuration of REportal
Other than the text search, all other code analysis and design extraction activities
are performed by querying the entity.db and relationship.db database files. As
illustrated in Figure 4.3, when queries are sent to the REportal servlet via the HTTP
protocol, the servlet generates child processes which invoke the appropriate Unix shell
scripts. Essentially, the shell scripts invoke the CIAO tools to query the two
repositories. The query results are stored in temporary files that are subsequently read by
the servlet and displayed to the users. Table 4.1 summarizes all of the shell scripts
used by REportal.
Table 4.1: Shell Scripts in REportal (Locations are relative to /usr/local/www/reportal)
Shell script       Location  Description
fix-bunch-dot.sed  bin       Changes dot files created by Bunch so that the shapes and
                             colors of nodes are consistent with other graphs.
ciaodb             bin       Produces the repositories for C/C++ and Java programs.
mdg                bin       Produces custom MDG files from the repositories.
mdg-helper         bin       Works around a JDK 1.3 bug in which the Runtime.exec method
                             hangs when it invokes a Unix shell script. mdg-helper is a
                             C program written by Souder that fixes this problem. This
                             program must be invoked before the execution of scripts.
textSearch         bin       Invokes the Unix utility grep and searches for a text
                             pattern within source files in a certain directory.
cdef               bin/ciao  Invokes a set of CIAO tools to perform entity queries
                             against the repositories. Returns matched entities along
                             with information about the entities.
cref               bin/ciao  Invokes a set of CIAO tools to perform relationship queries
                             against the repositories. Returns matched relationships
                             along with information about the relationships.
Table 4.2: BunchAPILinks in REportal (Locations are relative to /usr/local/www/reportal)
Name          Location  Description
BunchOmni     src       Invokes Bunch to generate a list of omnipresent modules.
BunchNodes    src       Invokes Bunch to generate a list of modules in a system.
BunchLibrary  src       Produces a list of all libraries.
BunchMQCal    src       Invokes Bunch to calculate the MQ value of a given MDG partition.
BunchAPILink  src       Invokes Bunch to run clustering.
Likewise, Bunch is loosely integrated into REportal through a set of independent
Java programs, which we call BunchAPILinks. When users have clustering requests,
the REportal servlet creates a subprocess which invokes the BunchAPILinks. The
BunchAPILinks call the appropriate Bunch APIs to perform the requests and store the
results in temporary files. Table 4.2 summarizes all the BunchAPILinks.
Table 4.3 summarizes all other tools used by REportal.
Table 4.3: All Other Tools in REportal
Name            Location                    Description
CIAO            /usr/local/ciao/lib         Querying tools against the repositories.
                /usr/local/ciao/bin
dot             /usr/local/bin              A tool that does graph layout.
ps2pdf          /usr/local/bin              A tool that transforms a graph from PostScript
                                            (ps) format to Portable Document Format (pdf).
bunch.jar       /usr/local/www/reportal     Bunch clustering tool.
JDK             /usr/local/jdk1.3.1_01/bin  Java development kit.
                /usr/local/jdk1.3.1_01/lib
activation.jar  /usr/local/www/reportal     Java Mail API.
mail.jar
Chava           /cgi-bin/reportal           Java source code analysis tools.
Grappa          /cgi-bin/reportal           Java package that visualizes graphs in applets.
4.5 REportal Architecture
Figure 4.4 illustrates the architecture of REportal, which consists of 11 subsystems.
Before we go into the details of each subsystem, some important aspects of the REportal
architecture are described below:
• The REportal configuration subsystem and the REportal utilities subsystem
provide system-wide services to almost all other core subsystems. To keep the
figure uncluttered, these two subsystems are not connected to the others.
• The shell scripts and CIAO query tools, source code browsing, Bunch
API Links and Bunch subsystems are loosely coupled to the other subsystems
in the REportal architecture. These subsystems are integrated into REportal
to support services like code analysis, module dependency queries, and so on.
We chose a loosely coupled approach because these subsystems are likely to
evolve in the future. This design eases the maintenance of REportal.
• Depending on the type of queries from users, the query dispatcher subsystem
invokes the corresponding tools. Future integration of other tools should require
minor effort.
[Figure 4.4 diagram: subsystems User Management, Workspace Management, Reportal Utilities,
Reportal Configuration, Graph Server, Source Code Browsing, Bunch API Links, Clustering,
Bunch, Shell Scripts and CIAO Query Tools, and Query Dispatcher; a group of them is
labeled Core of REportal]
Figure 4.4: REportal architecture
[Figure 4.5 diagram: participants User Management, Workspace Management, REportal
Utilities, Query Dispatcher, CIAO Querying Tools, Clustering, Bunch API Links, Bunch,
Source Code Browsing, and Graph Server; messages Log on, Renaming/creating,
Uploading/deleting/downloading, Open a project, Entity searching, Relationship
searching, Clustering, View source code, View graph]
Figure 4.5: REportal subsystem-level sequence diagram
Figure 4.5 illustrates the subsystem-level sequence diagram of REportal. The User
Management subsystem deals with user authentication and registration. Once a user
logs into REportal successfully, they enter the Workspace Management subsystem. All
administrative services such as creating, renaming, deleting, uploading and downloading
a file/folder are completed by the REportal Utilities subsystem. If the user
selects a project and opens it, they advance to the Query Dispatcher subsystem,
where the user’s requests are dispatched to other subsystems. For example, entity
searching requests are directed to the CIAO Querying Tools subsystem; clustering
requests are directed to the Clustering subsystem.
4.6 REportal Subsystems
This section examines the pertinent design aspects for 7 of the 11 REportal subsystems.
The description of the User Management subsystem is combined with that
of the REportal Configuration subsystem for simplicity. The Shell Scripts and
BunchAPILinks subsystems have been covered in Section 4.4.3. The Source Code
Browsing subsystem was developed by the AT&T research lab in CGI and is not covered
here.
4.6.1 User Management & Configuration Subsystem
Figure 4.6 illustrates the design of the User Management and REportal Configuration
subsystems.
When the REportal service is requested for the first time, a REportal servlet is
created. When a client connects to the server and makes an HTTP request, the
servlet engine produces one thread for each connection and directs the request to the
Reportal object. The Reportal object serves three purposes:
• Creates a Ruser object, a ReportalUser object, and a session for the client.
The Ruser object is associated with the session. During the creation, the
configuration file of REportal is loaded in through the REportal Configuration
subsystem. The Ruser object contains information about the client such as user
name, password, first name, last name and so on.
• Depending on the status of the session, it invokes methods in the ReportalUser
object to display the login screen, to do user authentication, or to dispatch user
requests to other subsystems.
• Invalidates the session when the connection is closed or the session expires.10
[Figure 4.6 diagram: classes Reportal (Init(), destroy(), doGet()), ReportalUser (doAuth(),
displayLoginScreen(), doRequest), Ruser (userName, password, email, firstName, lastName),
and SignUp; a "uses" link to the Reportal Configuration subsystem; links to the Workspace
Management, Query Dispatcher, and Clustering subsystems]
Figure 4.6: The user management and REportal configuration subsystem
10The lifetime of a session is set to 30 minutes.
If the client wants to create a REportal account, the request is directed to the SignUp
class, which displays the sign up form. Once the client finishes and submits the form,
the client’s information is stored in a file on the server. The client’s password is
hashed with a one-way hash function called MD5. If the creation of the account is
successful, a welcome email message is sent to the user via the Java Mail API.
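As a rough illustration of the password handling mentioned above, the following sketch computes an MD5 digest with the standard java.security.MessageDigest class. The class name PasswordDigest, its method names, and the hex encoding of the stored value are assumptions made for this example; the text does not say how REportal encodes or compares the digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; only the use of MD5 comes from the text.
final class PasswordDigest {
    // Returns the MD5 digest of the password as 32 lowercase hex characters.
    static String md5Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm that Java platforms are required to provide.
            throw new IllegalStateException(e);
        }
    }

    // Checks a login attempt against the stored digest; the clear text is never stored.
    static boolean matches(String attempt, String storedHex) {
        return md5Hex(attempt).equals(storedHex);
    }
}
```

With this scheme only the digest is written to the user file, and authentication hashes the submitted password and compares the two digests.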
It is worth pointing out that every HTTP request goes through the same path:
the request is directed to the Reportal object, then forwarded to ReportalUser,
which defines several states. Depending on the state of the request, it invokes ob-
jects in appropriate subsystems (e.g., Workspace Management, Query Dispatcher,
or Clustering). Every HTTP request is processed in this manner.
4.6.2 Workspace Management & Utility Subsystem
The REportal Utility subsystem contains two primary classes: FileUtilities and
DisplayUtilities. The FileUtilities class is a layer between the Projects class
and the user’s file system. All manipulation of the file system, such as creating a
directory, renaming a file, deleting a directory, and uploading a file must be done by
the FileUtilities class. Currently, we assume authenticated users have all the
privileges described in Section 3.3 to manage their workspace. Once a user is
authenticated, they can start managing their workspace without further validation. In the
future we may require every manipulation of a user’s workspace to be validated. The
separation of this class allows these changes to be isolated from other subsystems.
The DisplayUtilities class provides a layer on top of HTML generation. All of
the HTML generated by REportal is handled by calls to this class. For example, the
openTable() method writes a <table> tag and the doHeader() method produces a
header that includes SERG’s logo and REportal’s banner. The abstraction provided
by this class essentially eliminates HTML-related code from the rest of REportal’s
subsystems. This design makes the code more readable and easier to change.
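The idea of such an HTML layer can be sketched as follows. HtmlWriter is a hypothetical, much simplified stand-in for DisplayUtilities: only the method name openTable() comes from the text, while row(), closeTable(), escape(), and html() are assumptions added to make the example complete.

```java
// Hypothetical, simplified stand-in for DisplayUtilities.
final class HtmlWriter {
    private final StringBuilder out = new StringBuilder();

    // Writes the opening <table> tag, as openTable() is described in the text.
    HtmlWriter openTable() {
        out.append("<table>\n");
        return this;
    }

    // Writes one table row, escaping each cell so callers never build markup by hand.
    HtmlWriter row(String... cells) {
        out.append("<tr>");
        for (String cell : cells) {
            out.append("<td>").append(escape(cell)).append("</td>");
        }
        out.append("</tr>\n");
        return this;
    }

    HtmlWriter closeTable() {
        out.append("</table>\n");
        return this;
    }

    // Minimal escaping of the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    String html() {
        return out.toString();
    }
}
```

A caller would write something like new HtmlWriter().openTable().row("src", "TAP").closeTable().html(), so the tags themselves appear in one class only.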
The Projects class provides an interactive interface for users to manage their
workspace with the support of the FileUtilities and DisplayUtilities classes
(Figure 4.7). This class displays the user’s workspace in a fashion that is similar to
Windows Explorer. For example, all directories that are at the same level are aligned
vertically; subdirectories are underneath the directories they belong to; directories
that have subdirectories have a + sign in front of them; the directories are expanded
if the + sign is clicked.
[Figure 4.7 diagram: classes Projects (createProject(), renameProject(), deleteProject(),
uploadProject(), downloadProject(), doProjectScreen(), ...), DisplayUtilities (doHeader(),
doFoot(), openTable(), ...), and FileUtilities (rename(), delete(), receiveFile(),
sendFile(), ...), the latter two grouped as Utilities; a "uses" link to the Reportal
Configuration subsystem]
Figure 4.7: The workspace management and REportal utilities subsystem
In a web browser, clients have three primary ways to send requests to a server to
get dynamic content:
• HTML forms are the oldest and most flexible method of allowing clients to
interact with servers. User requests, along with some information about the
requests, are sent to the server by submitting a form. This method is used
frequently in the work described in this thesis.
• Clients can use Uniform Resource Locators (URLs) to provide extra information
by creating a query string. For example, authenticated clients can use the URL
https://reportal.mcs.drexel.edu/cgi-bin/reportal/webchava/ciao
src.cgi?name=tapDispatch%2Ejava&pathname=src/TAP/command.java
&id=98zU8&JServSessionIdreportal=k33dy8tus1&key=&type=class
to view the source code of the command class in the TAP system.11
• An applet is a program written in the Java programming language that can
be included in an HTML page. REportal uses applets to perform graph visu-
alization. Since applets are downloaded Java programs that execute within a
browser’s context, clients are able to view and query graphs interactively.
Every form in the workspace management page has a hidden variable called “action”
associated with it. The value of this variable varies from form to form. When a
request is sent to the Projects class, the request is forwarded to the appropriate method
depending on the value of the variable. In this manner, adding a new operation takes
minimal effort. The same technique is used in REportal’s query page and clustering page.
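The dispatch on the hidden “action” variable can be sketched as a lookup table from action values to handlers. ActionDispatcher and the action values used below are hypothetical; the text only states that the request is forwarded to a method chosen by the value of the variable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher keyed on the hidden "action" form field.
final class ActionDispatcher {
    // Each handler receives the submitted form fields and returns the page to show.
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    // Supporting a new form only requires registering one more handler.
    void register(String action, Function<Map<String, String>, String> handler) {
        handlers.put(action, handler);
    }

    // Forwards the request to the handler selected by the "action" field.
    String dispatch(Map<String, String> formFields) {
        String action = formFields.get("action");
        Function<Map<String, String>, String> handler = handlers.get(action);
        if (handler == null) {
            return "unknown action: " + action;
        }
        return handler.apply(formFields);
    }
}
```

In this arrangement the dispatching code does not change when a form is added, which is the property the paragraph above attributes to the hidden variable.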
4.6.3 REportal Configuration
The REportal Configuration subsystem contains a single class named
ReportalConfiguration. As mentioned previously, REportal uses many shell scripts
and other tools to perform its tasks. This class defines the paths to these shell scripts
and tools.
Whenever REportal refers to a shell script or a tool, it finds the location of it
through this class. This class reads the paths from a configuration file, in which the
paths are specified. If the file is not found, the paths are set to default values. This
design simplifies the installation of REportal on other servers where the shell scripts
and tools may be located in different directories on the file system.
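A minimal sketch of this lookup with fallback to defaults, using java.util.Properties, is shown below. ToolPaths, the property keys, and the file format are assumptions; the default values merely echo locations listed in Tables 4.1 and 4.3, and the actual ReportalConfiguration class may work differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical stand-in for ReportalConfiguration.
final class ToolPaths {
    private final Properties props;

    private ToolPaths(Properties props) {
        this.props = props;
    }

    // Default locations used when the configuration file is missing (key names assumed).
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("script.dir", "/usr/local/www/reportal/bin");
        d.setProperty("dot", "/usr/local/bin/dot");
        return d;
    }

    // Loads overrides from the file if it is readable; otherwise the defaults apply.
    static ToolPaths load(String configFile) {
        Properties p = new Properties(defaults());
        Path path = Paths.get(configFile);
        if (Files.isReadable(path)) {
            try (InputStream in = Files.newInputStream(path)) {
                p.load(in);
            } catch (IOException e) {
                // The file could not be read completely: keep whatever was loaded plus defaults.
            }
        }
        return new ToolPaths(p);
    }

    // Returns the configured path for a tool, or its default, or null if unknown.
    String pathOf(String tool) {
        return props.getProperty(tool);
    }
}
```

Passing the defaults to the Properties constructor means any key absent from the file still resolves to its default value.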
11Apache JServ assigns a randomly generated string to each session as an ID. This URL is
valid only when the session ID is authenticated.
[Figure 4.8 diagram: ProjectFile invokes cdef (entity search), cref (relationship search
table listing), textSearch.sh (text search), and mdg (module dependency graph); requests
are redirected to the CGI GraphServer (relationship search graph) and to ClusterWizard
(clustering)]
Figure 4.8: The sequence diagram of ProjectFile
4.6.4 The Query Dispatcher Subsystem
The ProjectFile class provides an interactive interface to support code analysis,
design extraction, and source code browsing services. This class, which is called
when a project is opened, generates the “query” page. The interface of the “query”
page is designed so that all services that support source code queries and browsing,
and design extraction can be reached directly from this page.
The “query” page provides 7 services: a) entity search b) relationship search that
lists results in a table c) relationship search that displays results as a graph d) create
an MDG file e) cluster an MDG file f) advanced search, and g) text search. The
ProjectFile class either completes the service request by creating a subprocess in
which the appropriate shell script/tool is invoked, or redirects the service request
to other subsystems. As illustrated in Figure 4.8, entity search, relationship search
(listing results in a table), MDG file creation and text searching are completed within
the class. The request to perform relationship search (displayed as a graph), and
the request to cluster an MDG file, are redirected to the CGI and the Clustering
subsystem, respectively.
Whenever a task must be completed by calling a shell script or a Unix command-line
tool, the ProjectFile object creates a subprocess, in which the shell script or tool
is executed. The subprocess’ standard input, output, and error streams are connected to
the parent process through Process.getOutputStream(), Process.getInputStream(), and
Process.getErrorStream(), respectively.
Some native platforms only provide limited buffer size for standard output streams,
and as such, failure to read the output stream of the subprocess promptly may cause
the subprocess to deadlock. To overcome this problem, all query results are saved in
a temporary file. The parent process waits until the child processes complete. After
that, the parent process reads the results from the temporary file and displays them
to the users.
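The temporary-file approach can be sketched with the standard ProcessBuilder class. This is an illustration under stated assumptions: ProcessBuilder was added to Java after the JDK 1.3 release that REportal uses, so the original code must have used Runtime.exec and let the shell scripts write the file; ScriptRunner and the temporary file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical helper that runs a command and collects its output via a temporary file.
final class ScriptRunner {
    static List<String> runToTempFile(String... command) {
        File results = null;
        try {
            results = File.createTempFile("reportal-query", ".out");
            // The child writes straight to the file, so no pipe buffer can fill up
            // and block it while the parent is waiting.
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(results)
                    .start();
            int exit = child.waitFor(); // the parent waits until the child completes
            if (exit != 0) {
                throw new IllegalStateException("command exited with status " + exit);
            }
            // Only after completion does the parent read the results.
            return Files.readAllLines(results.toPath(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (results != null) {
                results.delete();
            }
        }
    }
}
```

On a Unix machine, ScriptRunner.runToTempFile("sh", "-c", "echo entity") would return a one-element list containing the line printed by the command.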
4.6.5 The Clustering Subsystem
Although the Clustering subsystem contains only one class, ClusterWizard, it
is one of the most complicated subsystems. This subsystem integrates the Bunch
clustering tool into REportal to support the design extraction service.
The ClusterWizard class not only implements the user interface, but also invokes
the Bunch API to perform clustering tasks. The Bunch clustering tool has been under
active development since 1998. REportal mimics the GUI of Bunch, so that the user’s
experience in Bunch can be transferred to REportal, or vice versa. However, due to
web browser constraints and the fact that each user’s workspace resides on the server,
the user interface for clustering in REportal is slightly different from that of Bunch.
For example, the Bunch clustering tool provides a dialog box for users to select an
MDG file from their local machines. Since each user’s MDG files reside on the server,
REportal provides a drop-down list from which users can select an MDG.
Figure 4.9 illustrates the process followed by the ClusterWizard class. Whenever
an MDG file is selected (either by creating, uploading or selecting an MDG file), all
buttons and tabs are activated, except for the Download and View buttons, which are
activated after clustering. The class then produces two subprocesses which call
the BunchLibrary and BunchNode classes. The two Java programs invoke the Bunch
API to generate a list of all libraries and nodes in the system and save them in
temporary files. When the subprocesses have completed, the ClusterWizard reads the files.
If the user chooses to configure omnipresent modules, a subprocess is generated in
which BunchOmni is invoked. When the user hits the Run button, the ClusterWizard
class produces a subprocess which calls BunchAPILinks1 and passes all parameters
to it. Parameters that the user did not configure are set to default values.
BunchAPILinks1 produces a dot file. The ClusterWizard class then generates two
additional subprocesses which invoke the dot utility to convert the dot file into PDF
and PS files. After that, the Download and View buttons are activated. If the user
chooses to view the graph, the request is redirected to Grappa for display. If the user
wants to download the results, they can obtain the PDF or PostScript file and view it
with a local viewer such as Acrobat Reader (for PDF) or Ghostscript (for PostScript).
[Figure 4.9 diagram: ClusterWizard; Create/Select/Upload an MDG (BunchLibrary & BunchNode);
Configure omnipresent modules (BunchOmni); Run clustering (BunchAPILinks1); Create graphs
(Dot); View graph (Grappa); MQ Calculator (BunchMQCal); Relationship searching]
Figure 4.9: The sequence diagram of the Clustering subsystem
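The conversion step can be sketched as the argument lists handed to the two subprocesses. The sketch assumes that dot renders the PostScript file with its -Tps and -o options and that the ps2pdf tool listed in Table 4.3 then derives the PDF; the text does not spell out the exact commands. GraphExport and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the command lines for the two conversion subprocesses.
final class GraphExport {
    // dot -Tps <dotFile> -o <psFile>: lay out the graph and write PostScript.
    static List<String> dotToPs(String dotPath, String dotFile, String psFile) {
        return Arrays.asList(dotPath, "-Tps", dotFile, "-o", psFile);
    }

    // ps2pdf <psFile> <pdfFile>: convert the PostScript output to PDF.
    static List<String> psToPdf(String ps2pdfPath, String psFile, String pdfFile) {
        return Arrays.asList(ps2pdfPath, psFile, pdfFile);
    }
}
```

Each list could be passed to a subprocess launcher one after the other, since the PDF step in this sketch depends on the PostScript file already existing.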
The user may choose to execute the MQ calculator utility. After an MDG and
a SIL file are selected, and the user hits the Calculate button, a subprocess calls
BunchMQCal to perform the calculation. The results are stored in a temporary file,
which is read by the ClusterWizard class. The results are then formatted and
displayed to the user.
This chapter described REportal’s overall design and architecture. The next chap-
ter focuses on how we validated REportal’s interface from an ease-of-use perspective.
Chapter 5. Validation
Computer software is designed to target a certain group of users; therefore, usability
is essential to its success. Even well-designed software packages built through the
efforts of many software engineers may be rendered useless simply because the end
user cannot use the program easily. This is especially true for REportal since it is
designed to provide a simple, consistent, and intuitive user interface that abstracts
the complexities of the underlying reverse engineering tools.
Although the current user interface has evolved through many iterations, there
may still exist flaws that are inconsistent with the expectations of software engineers
for a portal web site. Likewise, although thorough tests have been done on REportal,
it is very possible that there are some bugs that have gone undetected.
As I write this thesis, the first version of REportal is ready to be released. Before
the release of REportal, we wanted to extensively test the software to remove defects
and investigate ways to improve the user interface.
Evaluating the user interface of software is not easy. Dr. Hewett from the Psy-
chology department of Drexel University, who has rich experience in this area, gave
us several ideas12. According to Dr. Hewett, the easiest way to evaluate the user
interface of a software system is to conduct usability studies in which a few typical
users perform some tasks using the software. In the study, the users are observed in
order to reveal possible usability improvements.
12Dr. T. Hewett can be reached at [email protected]
5.1 Design of Tasks
The usability study was designed with the following requirements:
• It should be possible to finish the tasks within an hour or so, otherwise participants
may lose patience.
• The tasks must be based on real-life scenarios so that the results are meaningful
and convincing.
• Finishing the tasks should exercise most, if not all, of the features provided by
REportal.
Two software systems were chosen for this study: Apache Tomcat 4.0.4 and Bunch
3.3.5. Apache Tomcat is an implementation of the Java Servlet 2.3 and JavaServer
Pages 1.2 Specifications. It should be noted that Tomcat version 4.0.4 is a complete
re-implementation of earlier versions of this application. The open source code of
Apache Tomcat 4.0.4 is available at FRESHMEAT [2] and Jakarta [19]. Bunch is
a clustering tool intended to aid the software developer and maintainer with under-
standing, verifying and maintaining a source code base [7]. Bunch is integrated into
REportal to provide the design extraction feature. Apache Tomcat 4.0.4 has about
40 classes while Bunch 3.3.5 has 220.
The participants were asked to answer six to seven questions for each system.
These questions cover most of the features provided by REportal, such as entity
searching, relationship searching, reachability querying, design extraction, source code
browsing and administrative functions. These questions are practical and are similar
to questions that are likely to be asked when studying a software system. Please refer
to Appendix A, where the complete list of tasks for the evaluation is documented.
5.2 Conducting the Evaluation Study
The Software Engineering Research Group (SERG) at Drexel University consists of
several undergraduate students, graduate students, and faculty. Some of them have
full-time jobs in industry as software engineers. REportal contains a comprehensive
set of reverse engineering tools to profile and mine the source code of software systems.
The target users are researchers, students and software engineers, who can
be represented by the people of SERG. Therefore, volunteers who represent typical
users of REportal could be recruited from SERG.
Dr. Hewett offered a two-hour lecture for all of the participants involved in the
evaluation study. The lecture covered the method by which information about users
can be gained when evaluating the usability of a software system. To practice the
method described in the lecture, participants were asked to complete the tasks using
REportal during the information-gathering sessions following the lecture. Hence,
their participation became a learning experience in a methodology for evaluating the
usability of software, which might even benefit the participants later. Dr. Hewett’s
lecture gave participants educational benefits, thus increasing their enthusiasm toward
taking part in the experiment.
An email message was sent to the SERG mailing list calling for volunteers to
participate in the experiment. Realizing the benefit they might obtain from the
experiment, four people volunteered to participate. Among them were one undergraduate
student and three graduate students. One graduate student also had a full-time job
as a software engineer.
After the lecture, each participant attended a one-hour information-gathering session,
during which they finished the tasks using REportal. Participants were encouraged
to think aloud and to give feedback about the user interface while completing
the tasks. The author and William Mongan (a REportal developer) observed
and recorded problems that the participants encountered and the feedback that they
gave. All information-gathering sessions were videotaped so that we
could review the video and find out what we had missed.
5.3 Results of the Evaluation Study
Overall, the study went smoothly. All participants were able to answer most of the
questions correctly within an hour. Although no major problems were found, some
flaws and bugs were revealed:
Faults Found
• Sometimes the graph visualization windows crash. REportal uses Grappa [39]
to draw graphs. According to the author of Grappa, it can only handle small
graphs of up to 1 to 2 megabytes. REportal sometimes has to handle
graphs as large as 3 to 5 megabytes. REportal’s graph viewing feature may therefore
push Grappa beyond its limits, as Grappa was implemented to handle smaller graphs.
• The clustering task freezes occasionally and needs further investigation.
User Interface Problems
• There are many types of relationships between two entities, such as inheritance,
containment and method invocation. However, when relationship searching
results are listed in a table, no information about the types of relationships is
displayed.
• Clustering takes a long time for large systems. When clustering, the graph
window is blank. It is hard to tell if the clustering is in progress. A progress
bar would be helpful.
• When a project is opened, all important features are listed as tabs on the top
of the window for easy access. However, “reachability searching”, an important
feature, is not listed there. As a result, all participants had difficulties finding
and using this feature.
• Advanced Search needs significant improvement. Captions in this feature are
not descriptive. Many features in it do not work properly.
Some other interesting observations were:
• Although REportal provides context-sensitive on-line help documentation, and
the hyperlink to it is placed at the top right of every window, people only
view the documentation as a last resort. When they get stuck, they
would rather go through many iterations of trial-and-error than refer to the help
documentation. Even when they go to the help documentation, they glance at
screen shots in an attempt to find the answer as soon as they can, instead of
reading the documentation.
• When a project is opened, a list of tabs is displayed on the top of the window.
Each tab is associated with a feature and it is enabled only after it is clicked.
The only exception is that the first tab is enabled by default. The choice of
the default tab is important. Since the first tab, Entity Search, is enabled by
default, people attempt to answer all questions using this feature. They switch
to other features only when this one fails. This implies that the most frequently
used feature should be set as the default.
• Although the user interface of REportal is designed to be intuitive, we found
that participants still have trouble using it the first time. This is reasonable
and analogous to renting a car. Even though every driver knows how to drive,
they still need a few minutes to get familiar with the car they just rented. Once
they know what is where, they can drive it comfortably. The same logic applies
to REportal. Once the users know what features REportal provides and how
to use them, they can use REportal very comfortably. This explains why they
spent less time studying the second system than the first one.
User Wish List
Some valuable suggestions from participants are listed below:
• REportal should offer a feature that gives statistics about a project, such as
number of classes, relationships, and methods.
• When doing a relationship search, REportal should provide spell checking for
the names of entities, because a search with a misspelled entity name returns nothing,
just like a search for which no relationships exist.
Chapter 6. Conclusions and Future Work
This chapter summarizes the work described in this thesis and outlines some plans
for future work.
6.1 Summary & Research Contributions
Practitioners and researchers can take advantage of the latest developments in reverse
engineering if the services provided by the tools are made available on the WWW.
In this thesis, we present a portal web site called REportal, which is a web-based
application that integrates many stand-alone reverse engineering tools. These tools
reside on a server and can be accessed by authorized users through internet browsers.
REportal provides an intuitive and friendly user interface that eases the use of those
tools. Using REportal, users are able to analyze and browse source code, as well as
to query and extract design information for Java programs. REportal is valuable to
both experienced and inexperienced users.
To work with REportal, users only need Java-enabled web browsers. This allows
them to analyze and visualize source code without going through the troubles of
obtaining, installing, and learning how to use the set of reverse engineering tools.
Users all over the world may access the latest version of the portal service easily.
The author’s main contributions include:
• Designed and implemented a user interface that enhanced REportal’s usability.
• Developed many new features for REportal, such as file system navigation, the
Bunch clustering tool integration, entity searches, relationship searches, and
text searches.
• Maintained the running REportal service.
At the time of this writing, REportal has 54 users from institutions of higher
learning and industry. Until now, REportal has been in beta; we are
now ready to release our first production version.
6.2 Plans for Future Work
This section outlines our plans for future work, which include new REportal services
and improved security.
• C/C++ program analysis. Currently REportal only supports Java programs.
In the future, GAST-MP (a source code analysis tool developed at
SERG for GNU C) will be integrated with REportal so that C programs can
be supported as well.
• Portability. Professional software developers prefer to analyze and store their
code on their own sites because of security concerns. One way to facilitate
this is to develop a portable version of REportal that can be installed
locally (probably behind a corporate firewall).
• Enhanced security. Apart from user passwords, information such as a
project’s source code is not encrypted. Future work might encrypt the whole file
system so that users are more confident when uploading their source
code to the server.
• Personalization. In the future, users will be able to personalize REportal
by configuring preferences such as the layout of tools, background colors,
and the number of query entries per page.
• Data mining. With additional features and security, we expect that REportal
will have more active users in the near future. As a result, this will create a
large repository of source code available for data mining analysis to support
future software engineering research.
Bibliography
[1] The drawdag at AT&T. http://www.research.att.com/dist/drawdag/mail.
[2] Freshmeat. http://freshmeat.net/.
[3] The Apache Software Foundation. http://httpd.apache.org/.
[4] JAVA Servlet Technology. http://java.sun.com/products/servlet/index.html.
[5] Jason Hunter and William Crawford. JAVA Servlet Programming. Chapter 1, pages 1-7. O’Reilly, 1999.
[6] The AT&T Labs Research Internet Page. http://www.research.att.com.
[7] The Bunch Project. Drexel University Software Engineering Research Group (SERG). http://serg.mcs.drexel.edu/bunch.
[8] The REportal Project. Drexel University Software Engineering Research Group (SERG). http://reportal.mcs.drexel.edu/.
[9] LintPlus Online. http://www.cleanscape.net/products/lintonline.
[10] P.A.V. Hall. Software Reuse and Reverse Engineering in Practice. Chapman & Hall, 1992.
[11] Brian Mitchell. A Heuristic Search Approach to Solving the Software Clustering Problem. Drexel University Ph.D. Thesis, 2002.
[12] Y. Chen. Reverse engineering. In B. Krishnamurthy, editor, Practical Reusable UNIX Software, chapter 6, pages 177–208. John Wiley & Sons, New York, 1995.
[13] Y. Chen, E. Gansner, and E. Koutsofios. A C++ Data Model Supporting Reachability Analysis and Dead Code Detection. In Proc. 6th European Software Engineering Conference and 5th ACM SIGSOFT Symposium on the Foundations of Software Engineering, September 1997.
[14] E.R. Gansner, E. Koutsofios, S.C. North, and K.P. Vo. A technique for drawing directed graphs. IEEE Transactions on Software Engineering, 19(3):214–230, March 1993.
[15] P. Finnigan, R. C. Holt, I. Kalas, S. Kerr, et al. The Software Bookshelf. IBM Systems Journal, 36(4):564-593, 1997.
[16] R. C. Holt. Concurrent Euclid, The UNIX System and Tunis. Addison Wesley, Reading, Massachusetts, 1983.
[17] K. Hwang. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
[18] J. Korn, Y. Chen, and E. Koutsofios. Chava: Reverse engineering and tracking of java applets. In Proc. Working Conference on Reverse Engineering, October 1999.
[19] The Jakarta Project. http://jakarta.apache.org.
[20] R. Koschke. Evaluation of Automatic Re-Modularization Techniques and their Integration in a Semi-Automatic Method. PhD thesis, University of Stuttgart, Stuttgart, Germany, 2000.
[21] R. Koschke. Software visualization in software maintenance, reverse engineering,and reengineering - a research survey. http://www.informatik.uni-stuttgart.de/-ifi/ps/rainer/softviz/, 2000.
[22] S. Mancoridis. ISF: A Visual Formalism for Specifying Interconnection Styles for Software Design. International Journal of Software Engineering and Knowledge Engineering, 8(4):517–540, 1998.
[23] S. Mancoridis, B.S. Mitchell, Y. Chen, and E.R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of International Conference of Software Maintenance, pages 50–59, August 1999.
[24] S. Mancoridis, B.S. Mitchell, C. Rorres, Y. Chen, and E.R. Gansner. Using automatic clustering to produce high-level system organizations of source code. In Proc. 6th Intl. Workshop on Program Comprehension, June 1998.
[25] S. Mancoridis, T. Souder, Y. Chen, E. R. Gansner, and J. L. Korn. REportal: A web-based portal site for reverse engineering. In Proc. Working Conference on Reverse Engineering, October 2001.
[26] Microsoft .Net. Microsoft Corporation. http://www.microsoft.com/net.
[27] S. North and E. Koutsofios. Applications of graph visualization. In Proc. Graphics Interface, 1994.
[28] The Drexel University Software Engineering Research Group (SERG). http://serg.mcs.drexel.edu.
[29] Secure Socket Layer. http://wp.netscape.com/security/techbriefs/ssl.html.
[30] Sourceforge.net. http://sourceforge.net/.
[31] Reengineering Wiki. http://www.program-transformation.org/re/.
[32] The Netcomputing AnyJ Java IDE. http://www.netcomputing.de/html/main/html.
[33] The Red Hat Source-Navigator. http://sources.redhat.com/sourcenav/.
[34] M. Storey, K. Wong, F. Fracchia, and H. Muller. On integrating visualization techniques for effective software exploration. In Proc. of IEEE Symposium on Information Visualization, October 1997.
[35] Tom Sawyer. Graph Drawing and Visualization Tool. http://www.tomsawyer.com.
[36] T.A. Wiggerts. Using clustering algorithms in legacy systems remodularization. In Proc. Working Conference on Reverse Engineering, October 1997.
[37] Graph Drawing Server at Brown University. http://loki.cs.brown.edu:8081/graphserver/gds/gds-home.shtml
[38] S. Bridgeman, A. Garg, and R. Tamassia. A graph drawing and translation service on the WWW. In Lecture Notes Comput. Sci. Springer-Verlag, 1997.
[39] N. S. Barghouti, J. Mocenigo, and W. Lee. Grappa: A Graph Package in Java. In Fifth International Symposium on Graph Drawing, pages 336-343. Springer-Verlag, Sept. 1997.
Appendix A. Tasks of Evaluation
A.1 Tomcat
Please create a folder for the Tomcat system, upload the Jakarta-tomcat-4.0.4.zip from the disk and answer the following questions using REportal.
1. Where is the doEndTag method defined?
2. How many methods are there in Tomcat?
3. What class variables are defined in the class src/jakarta-tomcat-4.0.4/webapps/examples/WEB-INF/classes/filters/ExampleFilter?
4. How many subsystems are there in Tomcat? What are they?
5. What is the parent class of CompressionFilters.CompressionResponseStream?
6. If the CompressionResponseStream.flushToGZip method has bugs, what classes might be impacted by it?
A.2 Bunch
Please create a folder for the Bunch system, upload the bunch.jar from the disk and answer the following questions using REportal.
1. In which file is the main method defined?
2. How many classes are there in Bunch?
3. How many subsystems are there in Bunch?
4. What methods are defined in the class BunchServer?
5. Does the maximizeCluster method in the src/bunch/BunchServer/ClusterUsingVectorSAHC class call the getLocks method in the src/bunch/Cluster class?
6. Two clustering algorithms, namely Hill Climbing and GA, have been implemented. What classes possibly need to change if an additional clustering algorithm is added?
7. The class BunchUtilities is known to have bugs. What classes are probably affected by it?
Thank you for your participation!