A Reverse Engineering Portal Web Site
Congrong Liao and Spiros Mancoridis
Technical Report DU-CS-02-04
Department of Computer Science
Drexel University
Philadelphia, PA 19104
November 2002

© Copyright 2002 Congrong Liao. All Rights Reserved.

Acknowledgements

I would first like to thank my advisor, Dr. S. Mancoridis, for his constant support

and guidance. I would also like to thank the other members of my committee: T.

Hewett, D. Salvucci and B. Mitchell for their input and thoughtful direction. I also

want to acknowledge the people at the SERG lab for their help with the user

interface study, with special thanks to W. Mongan for his input to REportal and his

patient help in correcting the grammatical errors in this thesis.

Finally, I would like to thank Xiaoping Hu, my wife, for supporting me while

I finished this thesis, and my father and mother, for always being there giving me

strength and love.

Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1. Introduction and Motivation
1.1 Our Research
1.2 Research Challenges Addressed in this Thesis
1.3 Thesis Outline
Chapter 2. Related Work
2.1 Web-based Software Engineering
2.2 Code Analysis
2.3 Design Extraction
2.3.1 Graph Drawing for Software Visualization
Chapter 3. REportal: A User Perspective
3.1 Introduction to REportal
3.1.1 Installation
3.1.2 Login and Sign Up
3.2 Workspace Management and Navigation
3.3 Code Querying and Browsing Service
3.4 Design Recovery
3.5 Chapter Summary
Chapter 4. REportal: A Developer Perspective
4.1 Development History
4.2 REportal Non-Functional Requirements
4.3 The REportal Process
4.4 Development and Deployment Environment
4.4.1 Apache
4.4.2 Java and Java Servlets
4.4.3 Configuration of REportal
4.5 REportal Architecture
4.6 REportal Subsystems
4.6.1 User Management & Configuration Subsystem
4.6.2 Workspace Management & Utility Subsystem
4.6.3 REportal Configuration
4.6.4 The Query Dispatcher Subsystem
4.6.5 The Clustering Subsystem
Chapter 5. Validation
5.1 Design of Tasks
5.2 Conducting the Evaluation Study
5.3 Results of the Evaluation Study
Chapter 6. Conclusions and Future Work
6.1 Summary & Research Contributions
6.2 Plans of Future Work
Appendix A. Tasks of Evaluation
A.1 Tomcat
A.2 Bunch

List of Tables

2.1 All Classes Defined in TAP System
3.1 Entity Types for Entity Search
3.2 Wild Cards Accepted by Entity Search
3.3 Examples of Relationship Search
4.1 Shell Scripts in REportal (Locations are relative to /usr/local/www/reportal)
4.2 BunchAPILinks in REportal (Locations are relative to /usr/local/www/reportal)
4.3 All Other Tools in REportal

List of Figures

1.1 Log into REportal
1.2 Create a folder
1.3 Upload the code
1.4 Open and analyze the project
3.1 Sign up for a REportal account
3.2 Log into REportal
3.3 Structure of user’s file system
3.4 Process to build up a working space
3.5 Workspace display and navigation
3.6 Query window
3.7 Entity search
3.8 Relationship search
3.9 Display results as a graph
3.10 Reachability query
3.11 Reachability query results
3.12 Clustering a relationship search graph
3.13 Advanced clustering a relationship search graph
3.14 Text search
3.15 Advanced search
3.16 Advanced search results
3.17 Code browsing
3.18 Integration of Bunch in REportal
3.19 Upload a MDG file
3.20 Create a MDG file
3.21 Hill-climbing configuration window
3.22 GA configuration window
3.23 Clustering options
3.24 Using libraries
3.25 Using omnipresent modules
3.26 User-directed clustering
3.27 MQ calculator
4.1 The process supported by REportal
4.2 Development and deployment environment of REportal
4.3 Queries against the repositories
4.4 REportal architecture
4.5 REportal subsystem-level sequence diagram
4.6 The user management and REportal configuration subsystem
4.7 The workspace management and REportal utilities subsystem
4.8 The sequence diagram of ProjectFile
4.9 The sequence diagram of the Clustering subsystem

Abstract

A Reverse Engineering Portal Web Site

Congrong Liao

Spiros Mancoridis, Ph.D.

The software engineering research community has developed a suite of useful tools,

which run on different platforms with different user interfaces and in some cases are

difficult to configure. They also change over time because they are research tools.

Thus, the users of these tools are responsible for integrating these tools together and

dealing with constant updates. In this thesis, we present a technique of “software

as services”. We create a portal web site called REportal, which delivers reverse

engineering services that require only an Internet browser. REportal features a simple

and friendly user interface that hides the complexities of using those tools. The

design of REportal is of value to experienced users because REportal provides all of

the integration. It is also of value to inexperienced users since it provides a suite of

useful services that may not be obvious or known to them.

Chapter 1. Introduction and Motivation

Large volumes of code are typically difficult to understand. This problem is made

worse because most software systems are not documented. Studies show that about

half of the time spent making changes to software is spent on understanding the

software [10]. Automated reverse engineering techniques help developers by minimizing

the amount of manual source code analysis needed to understand a system. Reverse

engineering is the process of analyzing a subject system to identify its components

and their relationships, creating a representation of the system at a higher level of

abstraction than what exists in the source code. Specifically, reverse engineering

provides an aid to the comprehension of complex systems[10].

The Software Engineering Research Group (SERG)[28] at Drexel University along

with the AT&T Research Labs have developed several reverse engineering tools over

the years, among which are the Bunch software clustering tool[7, 11, 23, 24], the

Acacia source code analyzer for C/C++[13], the Chava source code analyzer for

Java[18], various tools for profiling C/C++/Java programs, and graph visualization

tools.

Even though most of the aforementioned tools are developed in Java, which makes

them portable across platforms, their installation still is cumbersome. For example, in

order to install Bunch, you must first download bunch.jar. The installation requires

the Java Development Kit (JDK) 1.2 or higher. If you don’t have the JDK on your

machine, you have to install it. Even after setting the classpath,¹ you have to install

GraphViz from AT&T [6] to view the clustered graphs. That means you must go to

the web site of the AT&T Research Lab, find the correct version of GraphViz for your

platform, download it, and install it. Even after everything starts to work correctly,

the user must periodically check for software updates of Bunch, GraphViz and the

JDK in order to ensure the latest version is being used.

The difficulty of installing and updating these reverse engineering tools might

prevent professional software engineers, educators, and other researchers from using

them. Also, these tools are developed over several years by many different people.

Hence, the usability, reliability and programming interfaces of these tools may differ.

For example, Bunch was developed in Java and features a user-friendly Graphical

User Interface (GUI). It takes Module Dependency Graph (MDG) files as input and

produces module cluster graphs as output. Acacia and Chava were developed in C

and run in Unix. They are invoked by a Unix shell command or a Unix shell script

with appropriate arguments. Typically, MDG files are created by either Acacia or

Chava.

In addition, the usability and interfaces of these tools may change over time. For

example, since Bunch is a research tool, its API and GUI have changed extensively over

the past few years as new features were added. The Bunch developers noticed that

many inquiries about the tool came from users who were running older versions

of the software.

Our experience shows that learning how to use these tools takes a long period of

time. To illustrate how cumbersome it is to use these tools, let’s do a simple analysis

of a small operating system simulator called cssimulator,² written by Liao, which

¹ To run Bunch you must alter the classpath environment variable so that the Java Virtual Machine can find and load Bunch’s class file.

² cssimulator is a small operating system simulator written by the author for a course taught by Dr. Brian Mitchell at Drexel University.

has 32 classes. Our goal is to determine the inheritance relationships between the

class command and all other classes. We assume that all necessary tools have been

installed correctly.

• First, we use Chava to create the entity and relationship repositories. To do

this, we type the following commands in the directory where the source code of

cssimulator resides:

– % chava -c *.java

– % chava -l *.java

The first command compiles Java source into a set of small repositories (one

per Java source file). The second command links the set of files to produce the

Chava repositories. Fortunately, all of the source code is in the same directory,

which eases the creation of the repository. In the case that source code resides

in more than one directory, a shell script might be necessary to compile and

link all the files. If the repository-creation process is successful, an entity.db file

and a relationship.db file are created in the directory. These files are required

by other AT&T source code analysis tools.

• To determine the parent class of class command, we use AT&T’s cref tool as

follows:

– % cref c command c -

The query results are displayed on screen in plain text.

• If you want to visualize the results, dagger, a program to generate program

graphs from the repositories, is a good choice. If you are interested in a particular

class, you open the class using a text editor and view the source code.
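The first step above notes that a script may be needed when the source code is spread over several directories. A minimal sketch of such a helper follows. It only builds the two command lines (it does not run them), it uses only the -c and -l flags shown above, and it assumes that chava accepts source files from subdirectories in a single invocation, which the text does not confirm. The function name chava_commands is hypothetical.

```python
from pathlib import Path


def chava_commands(root):
    """Build the compile and link invocations for every .java file under root.

    Assumption: chava can be given files from several directories at once.
    Only the "-c" (compile) and "-l" (link) flags shown in the text are used.
    """
    sources = sorted(str(path) for path in Path(root).rglob("*.java"))
    if not sources:
        return []  # nothing to analyze
    compile_cmd = ["chava", "-c", *sources]
    link_cmd = ["chava", "-l", *sources]
    return [compile_cmd, link_cmd]
```

Each returned list could be passed to subprocess.run in order, from the directory where the entity.db and relationship.db files should be created.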

In summary, to perform this simple query, you have to install Chava, CIAO and dagger.

You have to read through the user manuals of these tools in addition to being familiar

with Unix.

1.1 Our Research

Practitioners and researchers can take advantage of the latest developments in reverse

engineering if the services provided by the tools are made available on the WWW.

Services provided via a browser interface typically require no client-side software, and

offer an intuitive user interface that is very easy to learn.

In this thesis, we present a technique of “software as services”. We create a portal

web site called REportal, which is a front-end to a repository of reverse engineering

tools. Instead of downloading these tools to each client’s local machine and running

them locally, we put these tools on a server and expose the features of these tools

as web-based services. It is called a portal web site because, through REportal, users

have access to a set of reverse engineering tools that have essentially different user

interfaces. REportal enables authorized users to log in to the site and upload their

code to the server through a secure channel. REportal features a simple and friendly

user interface that hides the complexities of using those tools natively. User requests

are sent through a web browser to the server, which then executes the

appropriate tools. The results are shipped back to each user’s browser.

By these means, users are able to analyze and browse source code, as well as to query

and extract design information for C, C++ and Java programs.

To use REportal, all the user needs is a Java-enabled web browser. This allows

them to analyze and visualize source code without going through the troubles of

obtaining, installing, and learning how to use a set of reverse engineering tools. Users

all over the world are guaranteed access to the latest version of the portal service.

Figure 1.1: Log into REportal

As a comparison, we go back to the previous example using REportal. This task

requires the following steps:

1. Launch a web browser from any platform, go to the REportal web site[8] and

log on, as shown in Figure 1.1. Users without an account may obtain one for

free by simply filling out a registration form.

2. As shown in Figure 1.2, the next step is to create a folder for the project and

upload the project,³ as seen in Figure 1.3.

3. Open the project, then in the relationship section, choose the class to class

relationship, specify the name of the first entity as command, and press the list

button, as seen in Figure 1.4.

³ Upon uploading, the project is unzipped and the repositories entity.db and relationship.db are created automatically.

Figure 1.2: Create a folder

Figure 1.3: Upload the code

Figure 1.4: Open and analyze the project

4. The query results are displayed in a table. If the user follows links in the table,

source code is displayed in a web browser as hyper-linked source. The hyperlinks

automatically cross-reference all of the entities in the source code.

The above example illustrates how easy it is to use REportal services. Furthermore,

once a project has been uploaded to REportal, it is preserved in our repository,

enabling future analysis.

1.2 Research Challenges Addressed in this Thesis

This thesis addresses the following research challenges.

• Integration of complex interfaces. REportal is the first comprehensive

code browsing and analysis service available on the WWW. As a result, there

is no template that we can follow. Many challenges involve providing enough

features to users while keeping the user interface as simple as possible. REportal

consists of many integrated tools, which have different interfaces and platform

requirements. Some of these tools are written in Java and have comprehensive

APIs, while others are Unix command-line utilities. Providing a simple and

intuitive user interface by hiding the complexities of these tools is one of the

challenges that this thesis addresses.

• Limitation of web browsers. The only software needed to use REportal is a

web browser. User requests are sent to the server via the web browser; and the

results of the requests are sent back to the browser to be displayed. The idiosyncrasies

of different web browsers limit the implementation of REportal in many

ways. For example, displaying an error message to users can be done easily in

a standalone program with a single line of code. However, to do the same thing

in a web browser, the content of the whole page has to be reloaded with the

error message. Other techniques, such as Dynamic HTML (DHTML),

can enrich the user experience, but these implementation

choices also limit the number of browsers that we can support, as only the

most recent releases of web browsers correctly support DHTML.

There are many types of web browsers on the market, among which are the popular

Internet Explorer, Netscape and Mozilla. Some features are supported by

one browser but not another. Other features (e.g., JavaScript) behave differently

on different browsers. Browser compatibility is an issue that this thesis addresses.

Our goal is that all features of REportal should be supported by all popular

web browsers, such as IE 6.0, Netscape 4.76, Netscape 6.0, and Netscape 6.2, on all

popular platforms, such as Windows 95, Windows 98, Windows 2000 and Linux.

This design tradeoff limits some of our implementation choices, but it enables

REportal to be used by a very large community of users.

• Security. Since REportal requires users to upload their source code to our

server, it must provide a secure environment so that users are confident enough

to use it. Security issues include: a) secure user authentication; b) secure transfer

of code to the server; c) secure display of results. Security is provided by

using standard SSL technology with a registered server-side certificate.

1.3 Thesis Outline

The rest of the thesis is organized as follows: Chapter 2 provides an overview of related

work. Chapter 3 describes the functionality of REportal from a user’s perspective.

The architecture of REportal is described in Chapter 4, thus providing a developer’s

perspective. The validation of REportal is covered in Chapter 5. The thesis concludes

in Chapter 6 by summarizing the work and outlining our plans for future work.

Chapter 2. Related Work

There are several areas that are relevant to this thesis: web-based software engineer-

ing, source code analysis, design extraction and software visualization.

2.1 Web-based Software Engineering

Many web-based software engineering tools fall into the following categories:

• A knowledge base of information about reverse engineering, program under-

standing, and software evolution.

• A repository of software packages that can be downloaded and run locally.

• A web-based service provider in an area other than reverse engineering, such as

graphing.

Reengineering Wiki[31] is a forum where all topics related to Reverse Engineering

and Reengineering can be discussed. It also contains a collection of papers, tutorials,

and surveys.

Software Bookshelf[15] catalogues software architectures and allows users to ex-

plore them. It has a primitive querying mechanism and, unlike REportal, the Software

Bookshelf does not support source code browsing.

The graph server at Brown University provides interactive graph drawing and

translation via the WWW[37]. It offers two kinds of services:

1) drawing graphs using a user-specified algorithm;

2) translating the description of a graph from one format to another[38].

Stephen North designed a graph service called drawdag [1]. The service accepts a

dot file and layout format from users via email. The server constructs a drawing with

dot[14] and then sends the output to users via e-mail.

Sourceforge.net[30] is one of the largest open source software development web sites.

It supports source code browsing, discussion, bug/defect tracking, documentation

repository, etc. The focus is specifically for software and reverse engineering.

The closest work to REportal is Cleanscape’s LintPlus Online [9]. It provides

online source code analysis for C and Fortran programs. It displays compilation results,

call trees, and include-file trees in a text format. However, it does not support graph

visualization, interactive querying, or design extraction.

2.2 Code Analysis

The current trend is to build repositories from a system’s source code so that those

repositories can be used for a variety of reverse engineering analyses. The repositories

are useful because complex reverse engineering tools can be built by analyzing

information stored in the repository without parsing the system’s source code.

Two popular approaches exist to construct software repositories. One is to store

variants of abstract syntax trees in the repository; the other is to structure the repository

as a relational database[13]. Once the repositories are constructed from the

source code, queries can be made against the repositories in order to extract structural

information about the source code.

Over the past several years, the AT&T Research Lab has developed a family of

source code analysis tools such as Acacia[13] for C and C++ and Chava[18] for Java.

In these tools, software systems are represented as collections of entities that refer to

each other. An entity represents a static syntactic construct such as a macro, a type,

or a function. Acacia and Chava each produce two repositories: the entity

repository and the relationship repository. The entity repository stores entities

such as file names, variables, functions, classes along with attributes, such as scopes,

line positions, and so on. Likewise the relationship repository stores the relationships

between entities, such as function calls, inheritance, and variable references.

A variety of Unix command line tools are available to query against the repository,

answering questions such as:

• Is variable a defined in file b?

• Is function a referenced by function b?

• What are the child classes of class a?

Advanced analysis such as dead code detection and reachability analysis can be

applied to the repositories as well. The query results are displayed in text format. Table

2.1 shows the results of a query for all classes defined in the TAP⁴ system.
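To make the entity and relationship repositories more concrete, the sketch below models them as two relational tables and answers two of the questions listed above. It is only an illustration: the table layout, column names, and sample entities (Shape, Circle, Square) are assumptions made for this example and are not the actual schema or output of Acacia or Chava.

```python
import sqlite3


def build_repository():
    """Create a tiny in-memory entity/relationship repository (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT);
        CREATE TABLE relationship (src INTEGER, dst INTEGER, kind TEXT);
    """)
    # Made-up sample entities: three classes and one function.
    db.executemany("INSERT INTO entity VALUES (?, ?, ?, ?)", [
        (1, "Shape", "class", "Shape.java"),
        (2, "Circle", "class", "Circle.java"),
        (3, "Square", "class", "Square.java"),
        (4, "area", "function", "Shape.java"),
    ])
    # (src, dst, kind): entity src inherits from entity dst.
    db.executemany("INSERT INTO relationship VALUES (?, ?, ?)", [
        (2, 1, "inherits"),
        (3, 1, "inherits"),
    ])
    return db


def child_classes(db, parent):
    """Answer: what are the child classes of class `parent`?"""
    rows = db.execute("""
        SELECT child.name
        FROM relationship AS r
        JOIN entity AS child ON child.id = r.src
        JOIN entity AS par ON par.id = r.dst
        WHERE r.kind = 'inherits' AND par.kind = 'class' AND par.name = ?
        ORDER BY child.name
    """, (parent,)).fetchall()
    return [name for (name,) in rows]


def defined_in(db, name, file):
    """Answer: is entity `name` defined in `file`?"""
    row = db.execute(
        "SELECT 1 FROM entity WHERE name = ? AND file = ?", (name, file)
    ).fetchone()
    return row is not None
```

Because the structural facts are already stored as rows, each question becomes a simple join or lookup, with no need to parse the source code again.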

The work described in this thesis uses Acacia and Chava to create the repositories.

There are some commercial code analysis tools such as Red Hat’s Source-Navigator[33]

and Netcomputing’s AnyJ[32].

2.3 Design Extraction

Software systems often need to be modified to improve performance, add new fea-

tures, adapt to new platforms or hardware, and so on. To modify a software system,

developers have to understand the system. As the size and complexity of software

systems increase, the time spent on understanding software systems increases as

well. In most cases the relevant design documentation is missing or inconsistent,

⁴ TAP (Ticket Auction Program) is a small system that the author wrote in Java for a course taught by Dr. Spiros Mancoridis at Drexel University.

Table 2.1: All Classes Defined in TAP System

Name            File
TAPClient       TAP/TAPClient.java
TAPServant      TAP/TAPServer.java
TAPImplBase     TAP/TAPApp/TAPImplBase.java
TAPServer       TAP/TAPServer.java
account         TAP/account.java
bid             TAP/bid.java
command         TAP/command.java
bidder          TAP/bidder.java
commandReader   TAP/commandReader.java
createUser      TAP/createUser
notify          TAP/notify

making the problem even worse. Therefore, tools that can provide a high-level sys-

tem decomposition become very helpful to facilitate the comprehension of software

systems.

The principal artifact that must be examined is the system’s source code. Thus,

the major task in software reverse engineering is to build an abstract model of a

software system from its source code.

Mitchell and Mancoridis developed a software tool called Bunch[7, 23], which

automatically decomposes the structure of software systems into subsystems. Modules

with high cohesion are grouped in the same subsystems (clusters), and independent

modules are grouped into separate subsystems. The modules and dependencies of

a system are mapped to a Module Dependency Graph (MDG) using source code

analysis tools such as Acacia[13] for C and C++ and Chava[18] for Java. The goal of

Bunch is to find a good partition of an MDG. It is the first system to apply

generic search algorithms to the software clustering problem.

Mitchell and Mancoridis introduced an objective function called Modularization

Quality (MQ). MQ rewards the creation of highly cohesive clusters and penalizes

excessive coupling between clusters. Hence, Bunch reformulates the software

clustering activity as an optimization problem where the goal is to maximize the

value of MQ. The assumption behind this approach is that most software systems are

designed in such a way that highly cohesive modules are organized into the same subsystem,

while loosely coupled modules are organized into separate subsystems. The

process is conducted automatically. Users can also integrate their knowledge into the

clustering process by assigning some modules to subsystems manually. Extensive case

studies and experiments show that Bunch does a good job of producing a subsystem

decomposition with or without knowledge of the software design.
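The text above describes MQ only qualitatively. As an illustration, the sketch below assumes one formulation found in the Bunch literature, in which each cluster contributes a factor 2·intra / (2·intra + inter), where intra counts the dependencies inside the cluster and inter counts the dependencies that cross its boundary. Whether this is the exact measure used by the version of Bunch integrated into REportal is an assumption, and the function and parameter names are hypothetical.

```python
def modularization_quality(edges, cluster_of):
    """Sum of per-cluster factors 2*intra / (2*intra + inter) (assumed MQ form).

    edges: iterable of (source_module, target_module) dependencies (the MDG).
    cluster_of: dict mapping each module name to its cluster label.
    """
    clusters = set(cluster_of.values())
    intra = {c: 0 for c in clusters}  # edges with both ends in the cluster
    inter = {c: 0 for c in clusters}  # edges crossing the cluster boundary
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1
        else:
            # A crossing edge counts against both clusters it touches.
            inter[a] += 1
            inter[b] += 1
    mq = 0.0
    for c in clusters:
        if intra[c] > 0:  # a cluster with no internal edges contributes 0
            mq += 2 * intra[c] / (2 * intra[c] + inter[c])
    return mq
```

A search-based clustering tool can then compare candidate partitions of the same MDG by this score and keep the higher-scoring one, which is the optimization view described above.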

Bunch also includes a programmer’s Application Programming Interface (API) so

that the clustering tool can be integrated with other tools, which makes the integra-

tion of Bunch into REportal possible.

2.3.1 Graph Drawing for Software Visualization

Visual presentations can ease the understanding of complex systems. Not surprisingly,

extensive research has been conducted on how to store, layout and display graphs.

Barghouti and Mocenigo developed an extensible graph drawing package written

in Java, called Grappa[39]. It consists of a set of classes that implement graphs, in

addition to representation and presentation services. It also provides an API so that it

can be integrated into applications that require graph drawing, editing, and browsing.

The second version of Grappa, in addition to supporting the feature of bird’s eye view,

is able to handle large graphs, which the first version of Grappa could not. REportal

integrates Grappa in the form of an applet to represent interactive graphs.

Grappa invokes dot [14], a graph layout tool. Dot, which runs fast enough

for interactive use, uses a four-pass algorithm for drawing directed graphs. Dot is a

command-line utility that takes a dot description file as input, and produces an output

file where the nodes are assigned a position in a 2D space based on layout properties.

By default, dot positions nodes to minimize edge lengths and edge crossing. Grappa

renders a graph in a Java applet based on the layout information produced by dot.

dot can also transform a dot description file into a formatted graph using a number of

standard image file formats such as GIF, PS, JPEG, and PDF. The dot description

file is a text file where users specify the edges and nodes to appear in the output

graph. Users are able to control the font type, font size, colors of nodes and edges,

shapes of nodes, labels and so on[27]. Users may also provide information that dot

uses in the layout process.
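As an illustration of the dot description file discussed above, the following sketch turns a list of dependency edges into a small description in the DOT language. The helper name to_dot and its parameters are hypothetical; only the basic digraph, node-attribute, and edge syntax of the DOT language is used.

```python
def to_dot(edges, name="G", node_shape="box", font_size=10):
    """Return a DOT description of a directed graph given (source, target) edges."""
    lines = [f"digraph {name} {{"]
    # Default attributes applied to every node in the graph.
    lines.append(f"    node [shape={node_shape}, fontsize={font_size}];")
    nodes = sorted({n for edge in edges for n in edge})
    for n in nodes:
        lines.append(f'    "{n}";')
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Writing the returned text to a file such as graph.dot and running a command like `dot -Tps graph.dot -o graph.ps` would ask dot to compute the layout and produce a PostScript drawing.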

Chapter 3. REportal: A User Perspective

This chapter covers the capabilities of REportal from a user’s perspective. The main

services of REportal will be presented along with usage scenarios to illustrate how

REportal can be used for a variety of software engineering problems.

3.1 Introduction to REportal

As mentioned earlier, REportal is a web-based application that integrates many standalone

software engineering tools. The services these tools provide are aggregated into

a common presentation that is rendered in a standard Internet browser. The current

version of REportal provides the following services:

• Registration & Account Maintenance. REportal enables any Internet user

to create an account. All services provided by REportal are executed under a

user context. Furthermore, users can only work with systems that they upload

to REportal.

• Code Analysis. REportal provides two analysis services, one for Java code

and the other for C/C++ code. These services analyze programs and generate

repositories with all the necessary information about the system being analyzed.

Using these repositories, advanced code analysis, such as query and clustering,

can be done. Once created, the repositories are associated with a particular

user’s project, enabling the user to perform additional analysis without having to

re-upload the source code.

• Code querying and browsing. REportal allows users to perform entity or

relationship queries that explore structural information about the programs

being analyzed. In addition, REportal can display fully cross-referenced source

code in a web browser using standard HTML hyperlinks.

• Design recovery and visualization. REportal integrates a tool called Bunch[7,

23, 24], which automatically partitions source-level structures into high-level

subsystems. The results are displayed on a web browser.

• Supporting services. REportal provides supporting services that make the above services easier to use. These services include user authentication, context-sensitive help, and user workspace management (e.g., creating, renaming, deleting, uploading and downloading files and folders).

3.1.1 Installation

Although REportal doesn’t require users to install the reverse engineering tools, the client’s workstation configuration must support JavaScript, Applets, Cascading Style Sheets (CSS), and HTML (see footnote 5). Users must verify the following before using REportal:

• JDK 1.3 or higher is installed locally.

• The web browser enables Java and JavaScript.

• A recent web browser version is highly recommended. We have tested REportal using Internet Explorer 5.5 or higher and Netscape 6.0 or higher.

• A web browser with support for Secure Sockets Layer (SSL).

Footnote 5: Most of these services are provided intrinsically by modern-day browsers.

Figure 3.1: Sign up for a REportal account

3.1.2 Login and Sign Up

REportal uses SSL[29] to encrypt transmission of source code and user authentication

information. When users go to the REportal website[8], they are prompted that they

are about to view pages over a secure connection.

To create a REportal account, users must provide their first name, last name,

desired username, password, company name, and a valid email address, as seen in

Figure 3.1.

As soon as the request is submitted, a welcome email message is sent to the user’s

mail box, and an account is created.

Once a user has an account, they are eligible to log in to REportal by providing a valid username and password, as seen in Figure 3.2.

Figure 3.2: Log into REportal

3.2 Workspace Management and Navigation

Each authenticated user has their own workspace with some management privileges.

Users are allowed to do the following:

• Create folders for projects.

• Upload source file packages (e.g., Zip or Jar) to their private workspace.

• Download files or folders to their local machines.

• Rename or delete a project.

• Upload graphs in MDG, DOT or SIL format to the graphs subfolder.

• Browse the entire workspace.

• Select a project and analyze it.

Workspace Management

Users need to create folders in their working spaces; each folder holds only one

project. Once a folder is created, the src and graphs subfolders are created automat-

ically. Although users are permitted to create additional folders for projects, they

cannot change the structure of the folders, meaning that they cannot delete, rename

or create subfolders (Figure 3.3). All operations including creating, opening, renam-

ing, deleting, uploading and downloading are performed by selecting an appropriate

item from a context menu, which is activated by right-clicking on a folder/file.

Figure 3.4 shows the process of building up a workspace. Once a folder is created,

users can upload zipped source code to the folder. Users are prompted to select a

zipped file from their local machines. All source code must be zipped into a single

file. For C/C++ programs, all source files including headers must be included in

the zipped file. For Java programs the user may upload either bytecode or the Java

source code; since Java bytecode executes on top of a virtual machine, code analysis

can be performed by introspecting the bytecode directly. However, Java source code is required if the user also wants to take advantage of REportal’s source code browsing feature.

Figure 3.3: Structure of user’s file system
[Tree diagram: User File System contains User 1, User 2, ..., User n (created after login); each user folder contains Project 1, Project 2, ..., Project n; each project contains the src and graphs subfolders. The figure annotates parts of the tree with the permitted operations: “Creating, uploading, renaming, deleting and downloading are allowed.”; “Uploading and downloading are allowed.”; “Downloading and deleting are allowed.”; “Only downloading is allowed.”]

Figure 3.4: Process to build up a working space
[Process diagram. Steps shown: Create a folder; Upload a zipped project; Open the project and analyze it; Manage the working space.]

If the source code is uploaded successfully, the file is unzipped into the src subfolder automatically, keeping the file structure intact. Then, REportal automatically generates the two repositories, entity.db and relationship.db, against which further analysis can be conducted.
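The folder creation and upload steps described above can be sketched as follows. This is only an illustrative approximation: the function names create_project and upload_archive are hypothetical, and the generation of entity.db and relationship.db by the analysis tools is deliberately left out.

```python
import zipfile
from pathlib import Path


def create_project(workspace, name):
    """Create a project folder with its fixed src and graphs subfolders."""
    project = Path(workspace) / name
    (project / "src").mkdir(parents=True, exist_ok=True)
    (project / "graphs").mkdir(exist_ok=True)
    return project


def upload_archive(project, archive_path):
    """Unzip an uploaded archive into src, keeping its file structure intact."""
    src = (Path(project) / "src").resolve()
    with zipfile.ZipFile(archive_path) as archive:
        # Reject members that would be written outside the src subfolder.
        for member in archive.namelist():
            target = (src / member).resolve()
            if target != src and src not in target.parents:
                raise ValueError(f"unsafe path in archive: {member}")
        archive.extractall(src)
    # Report the extracted files relative to src.
    return sorted(p.relative_to(src).as_posix() for p in src.rglob("*") if p.is_file())
```

A real deployment would follow the extraction with the code analysis step that produces the two repositories.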

Workspace Navigation

As shown in Figure 3.5, users are able to navigate through their workspace. RE-

portal displays the tree structure of the workspace. If an icon of an entity appears as

a folder, users can expand it by left-clicking the mouse to see what the folder contains.

Once a folder is open, the contents of the folder are displayed in the tree structure,

and the icon of the folder changes indicating that the folder is open. Left-clicking the

icon again closes the folder. Users have limited privileges to manage certain folders;

they cannot, for example, rename or remove the src folder and the graphs folder.

3.3 Code Querying and Browsing Service

Once users open a project, the query window is displayed (Figure 3.6), which allows

users to perform customized analysis. All tools are listed on the top of the window

as tabs. The Entity Search tool is enabled by default, and all files in the selected

project are listed in a table. Hyperlinks to the source code browsing feature of RE-

portal are associated with the files that have source code available. For example,

clicking on the TAP/TAPClient.java link will redirect REportal to display this file’s

actual source code. As mentioned earlier, when REportal displays source code, it is cross-referenced using HTML hyperlinks, allowing the user to perform additional source code analysis.

Code Querying

Code Querying is a very important service provided by REportal with basic features

including entity search, relationship search, text search, and advanced search.

Figure 3.5: Workspace display and navigation

Entity Search

An entity represents a static and syntactic construct such as a macro, a type, a

function, a file, etc. For Java programs, entities represent files, classes, methods and

fields, while for C programs, entities represent types, functions, variables, macros,

and files (see Table 3.1). In an entity search, the user can select the entity type (file,

class, method or field) and specify the name of the entity (Figure 3.7). Wild cards

may be used for entity names as described in Table 3.2.

Figure 3.6: Query window

Query results are displayed in a table that has two fields: the name of the entity, and the file in which the entity resides. If the source code of the file is available, a hyperlink, which leads to the source code browser, is associated with the file. Source code browsing is described later in this section (Code Browsing). In addition, a hyperlink is associated with the entity, which links to the definition of the entity in the source code browser. Query results are restricted to 20 entries per page, which minimizes download time and is a popular implementation approach for web-based systems that need to display the results of large queries. By default, the first page is displayed. However, users can jump to any page or display all query results on one page. Figure 3.7 illustrates a query of all methods in the Ticket Auction Program (TAP).
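The paging behavior described above (20 entries per page, first page by default, jump to any page, or show everything on one page) can be sketched as below. The function paginate is a hypothetical helper written for this illustration, not REportal's implementation.

```python
import math

PAGE_SIZE = 20  # entries per page, as stated in the text


def paginate(results, page=1, page_size=PAGE_SIZE):
    """Return (rows on the requested page, total number of pages)."""
    total_pages = max(1, math.ceil(len(results) / page_size))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * page_size
    return results[start:start + page_size], total_pages
```

Displaying all results on one page corresponds to calling paginate(results, page_size=len(results)) for a non-empty result list.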

The entity search can answer questions like:

Table 3.1: Entity Types for Entity Search

C          Java
file       file
types      class
function   method
variable   field
macros

Table 3.2: Wild Cards Accepted by Entity Search

Name File
?         Matches any single character.
*         Matches any sequence of zero or more characters.
[x...y]   Matches any single character specified by the set x...y. A minus sign may be used to indicate a range of characters.

• What classes are defined in the project?

• What are the methods whose names start with “b”?

• Where is the field “fileName” defined?

• What files have source code available?
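Questions like the ones above amount to filtering entity records by type and by a wild-card name pattern. The sketch below uses Python's fnmatch module, whose ?, * and [...] patterns happen to match the wild cards of Table 3.2. The in-memory ENTITIES list is a hypothetical stand-in for entity.db, and the sample names are illustrative.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for entity.db: (entity type, entity name, file) rows.
ENTITIES = [
    ("class", "Bidder", "TAP/Bidder.java"),
    ("method", "bid", "TAP/Bidder.java"),
    ("method", "buy", "TAP/TAPClient.java"),
    ("field", "fileName", "TAP/TAPServer.java"),
]


def entity_search(entities, entity_type, name_pattern="*"):
    """Return (name, file) rows whose type matches and whose name fits the pattern."""
    return [
        (name, file)
        for etype, name, file in entities
        if etype == entity_type and fnmatchcase(name, name_pattern)
    ]
```

For example, entity_search(ENTITIES, "method", "b*") answers the question about methods whose names start with “b”.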

Relationship Search

A system-level relation is an association between two entities, such as inheritance

between two classes, a function call between two methods, or a reference to a variable

by a function. A relationship query takes two parameters: a source and a destination

entity. The query returns a collection of directed relations from the source to the destination. For example, to list all entities that refer to a particular entity, users specify the destination entity and leave the source entity as a “*”. On the other hand, to find all

entities that a specific entity refers to, users specify the source entity and leave the

destination entity as a “*”.
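A relationship query of this kind can be sketched as a filter over directed (source, destination) pairs, where either side may be left as “*”. The RELATIONS list is a hypothetical stand-in for relationship.db, and the type name "All" is assumed here to mean any entity type.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for relationship.db: ((type, name), (type, name)) pairs,
# each read as "source refers to destination".
RELATIONS = [
    (("method", "main"), ("method", "bid")),
    (("method", "bid"), ("field", "price")),
    (("class", "Buyer"), ("class", "bidder")),
    (("method", "bid"), ("method", "notify")),
]


def relationship_search(relations, src_type="All", src_name="*",
                        dst_type="All", dst_name="*"):
    """Return the directed relations whose source and destination both match."""
    def matches(entity, etype, pattern):
        return (etype == "All" or entity[0] == etype) and fnmatchcase(entity[1], pattern)

    return [
        (src, dst)
        for src, dst in relations
        if matches(src, src_type, src_name) and matches(dst, dst_type, dst_name)
    ]
```

Leaving the source at its defaults and setting dst_type="method", dst_name="bid" lists everything that refers to method bid; swapping the roles lists everything that bid refers to.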

Similar to the entity search, users can specify the source (labeled “target” in Table 3.3) and/or the destination by specifying the entity type and entity name. Entity names may also be specified using the wild cards listed in Table 3.2. Some examples of relationship queries are shown in Table 3.3.

Figure 3.7: Entity search

Query results of a relationship search can be listed in a table. The table has four

columns for the names of source and destination entities, and the files in which they

reside. If the source code of a file exists, hyperlinks are associated with the entity and

the file name containing the entity. Figure 3.8 shows all the class to class relationships

in the TAP system.

Query results can also be displayed as a graph as seen in Figure 3.9. If users

choose this option, a new window appears with the graph displayed as an applet in

the left panel and a bird’s eye view of the graph in the right panel. In the graph, a

node represents an entity, while a link between two entities represents a relationship.

Table 3.3: Examples of Relationship Search

Query                                                    Target type  Target name  Destination type  Destination name
All entities that refer to method bid                    All          *            method            bid
All entities that method bid refers to                   method       bid          All               *
All classes that inherit from classes with initial b     class        *            class             b*
All fields ending with c that are referenced by class c  class        c            field             *c

Figure 3.8: Relationship search

Figure 3.9: Display results as a graph

The time needed to display a graph is based on the size of the graph. The size of

the applet window varies with the size of the web browser, and the monitor’s screen

resolution.

The bird’s eye view is useful for navigating through large graphs that cannot be displayed on one screen. If users click in an area of the bird’s eye view window, the graph in the main window centers on the clicked location. Also, users can click and drag the mouse so that a block of nodes and links is highlighted in the bird’s eye view window, and the main graph zooms to that block.

The graph is displayed in the center of the applet with six buttons at the bot-

tom, which are -, =, +, Reachability Query, Cluster this Graph and Advanced

Cluster this Graph.

• - This button zooms out the graph.

• = This button centers the graph.

• + This button zooms in the graph.

• Reachability Query. If users want to determine which entity can be reached

by a particular entity, or which entity another entity can reach, they can click

on the corresponding node in the applet, and click “Reachability Query”. A

new window appears as seen in Figure 3.10. Users may query what can be

reached by the entity by specifying the direction as forward ; or they may query

what can reach the entity by specifying the direction as backward. Users can also limit the type of entities that are reachable to/from an entity. For Java programs, entity types are packages, files, methods, classes, fields, strings, and interfaces. Users may also specify the depth and output format of the reachability query. If users select database as the output format, a window similar to Figure 3.11 appears. In this window, all entities that match the query criteria are displayed.

• Cluster this Graph. The entities and relations in a system may also be

represented as a graph where the nodes are the entities and the edges in the

graph are the relations. Up to this point we have represented this information

using a tabular view. Once a system’s representation is modeled as a graph,

we can use the Bunch software clustering tool[7, 23] to partition the graph to

obtain high-level structural information about the system. As shown in Figure

3.12, the results of the clustering activity are displayed visually as an applet.

The contents of subsystems are hidden as blocks, but they may be expanded

by double-clicking on them.

• Advanced Cluster this Graph. Similar to “Cluster this Graph”, this feature also clusters the graph using Bunch and displays results as an applet. Unlike “Cluster this Graph”, before the clustering is conducted, a new window is presented (Figure 3.13), where users have an opportunity to customize the clustering engine. For example, they can exclude some modules from the clustering process and can upload a file that may restrict the placement of particular modules into certain clusters.

Figure 3.10: Reachability query

Figure 3.11: Reachability query results

Figure 3.12: Clustering a relationship search graph
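The reachability query offered by the buttons above can be understood as a bounded graph traversal. The following sketch uses a breadth-first search with a direction and an optional depth limit. It is an assumption about how such a query might work, not a description of REportal's code, and it omits the entity type filter for brevity.

```python
from collections import deque


def reachability_query(edges, start, direction="forward", max_depth=None):
    """List entities reachable from start (forward) or that can reach it (backward)."""
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    # Build adjacency lists, reversing every edge for a backward query.
    neighbours = {}
    for src, dst in edges:
        a, b = (src, dst) if direction == "forward" else (dst, src)
        neighbours.setdefault(a, []).append(b)
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found
```

With max_depth=1 the query returns only direct neighbors, which corresponds to a plain relationship search from or to the selected entity.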

Text Search

Text search allows users to search the entire source code of the system for a

particular text pattern. It accepts wild cards as described in Table 3.2. A user can,

for example, search for all the source code lines that contain the text “main”, or a

text pattern that begins with “ma” and ends with “n”.

Figure 3.13: Advanced clustering a relationship search graph

The query results are displayed in a table as seen in Figure 3.14. The table contains three columns: the file in which the text/pattern resides, the line number of the text/pattern in the file, and the contents of the line. As usual, if the source code of a particular file is available, a hyperlink is placed on the file name that can be used to access the source code browsing view, where the specified line of code is displayed at the top of the window.
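A text search that produces the three columns described above (file, line number, line contents) can be sketched as follows. The function text_search and its input format (a mapping from file name to source text) are assumptions for illustration. Note that with this simple approach a “*” in the pattern may also span several words on a line.

```python
from fnmatch import fnmatchcase


def text_search(files, pattern):
    """Return (file, line number, line) rows for lines containing the pattern."""
    rows = []
    for name in sorted(files):
        for number, line in enumerate(files[name].splitlines(), start=1):
            # Surround the pattern with * so it may occur anywhere in the line.
            if fnmatchcase(line, f"*{pattern}*"):
                rows.append((name, number, line))
    return rows
```

Searching for "main" finds lines containing that exact text, while "ma*n" also finds lines in which “ma” is later followed by “n”.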

Advanced Search

The entity search and relationship search allow primitive queries that accept entity types and names. More complicated queries can be performed by the Advanced Search feature of REportal. The interface of the advanced search is shown in Figure 3.15.

Figure 3.14: Text search

Not only can users specify entity names and types, but they can also specify the file in which the entity resides, the scope of the entity (e.g., protected, public or private), and the type of query (either an entity or a relationship search). Query results are displayed in tables with more information. Figure 3.16 shows all public classes in the TAP system. It displays information about entity names, scopes, files in which they reside, and beginning and end line numbers of each entity.

With the additional capabilities of the advanced search feature, users can perform

more complicated queries such as:

• Which private classes implement class bidder?

• Which classes in file TAPServer reference the class bidder in file bidder?

• What are all of the private methods in file search?

• How many files does the package bunch contain?

• Which interfaces are defined in files beginning with the letter “C”?

• Is the run method in the class command referenced by the notify method in class mail?

Figure 3.15: Advanced search

Code Browsing

If users provide source code when uploading their systems to REportal, they can browse the source code in a web browser. As in many commercial Integrated Development Environments, different types of information in the source code are displayed in different colors for easy reading. For example, line numbers in front of each line of code are displayed in light gray, indicating that the line numbers are not part of the source code; reserved words are displayed in green, comments are displayed in red, and the other source code in black.

Figure 3.16: Advanced search results

When source code refers to a program entity, a link is placed in the browser

window. If a user follows the link, the window shown in Figure 3.10 appears. From

this window users can perform a reachability query to find what can be reached by

the selected entity.

Figure 3.17: Code browsing

3.4 Design Recovery

REportal uses the Bunch clustering tool to infer high-level design information about

a system. REportal mimics the Graphical User Interface (GUI) of Bunch, so that the

user’s experience in Bunch can be carried to REportal, or vice versa.

Using “Bunch”

Figure 3.18 shows the user interface of the Bunch software clustering service. It contains two sections: the “option” window on the top allows users to configure clustering options, while the bottom section is used to perform a given action. On top of the “option” window, there is a set of tabbed panels: “Basic”, “Options”, “Libraries”, “Omnipresent”, “User Directed” and “MQ Calculator”. By clicking these tabs, users can switch to a different “option” window.

Figure 3.18: Integration of Bunch in REportal

When Bunch is launched, by default, the “Basic” option window is displayed,

where users can specify input graph files (MDG format) and clustering methods.

Clustering a graph requires users to select an MDG graph from the drop-down list

of the “Input Graph File” tab, which lists all graphs in the “graphs” folder of the

project. When a graph is selected, the “Run” button at the bottom becomes active, indicating that clustering is allowed. MDG files can be created in two ways:

1. Locally create an MDG file either manually or automatically, then upload the file to the “graphs” folder. The “upload” button in the “Basic” window launches the window shown in Figure 3.19, where users can select an MDG file from their local machines and upload it.

2. Alternatively, users can create custom MDG files on REportal directly by launching the window shown in Figure 3.20. Users can customize the files by excluding or including method-to-variable relationships, package names, weights on relationships, method-to-method relationships, implementation relationships, and/or relation types.

Figure 3.19: Upload a MDG file

Figure 3.20: Create a MDG file
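The second option, creating a custom MDG from selected relationship types, can be sketched as below. The text above does not spell out the MDG file layout, so this example assumes one edge per line in the form "source target [weight]", with the weight counting how many selected relations connect the two modules. The relation type names are also hypothetical.

```python
from collections import Counter


def build_mdg(relations, include_types, weighted=True):
    """Build MDG text from (source module, target module, relation type) triples.

    ASSUMPTION: each output line is "source target [weight]".
    """
    counts = Counter(
        (src, dst)
        for src, dst, rtype in relations
        if rtype in include_types and src != dst  # skip excluded types and self-references
    )
    lines = []
    for (src, dst), count in sorted(counts.items()):
        lines.append(f"{src} {dst} {count}" if weighted else f"{src} {dst}")
    return "\n".join(lines)
```

Including or excluding a relation type changes both which edges appear and, when weights are enabled, how strongly two modules are connected.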

Once the MDG file is created, it is selected as the input graph file automatically. Users can then run clustering services after an MDG file is selected. The time required to cluster the system varies depending on the configuration of the clustering engine and the size of the MDG. After the clustering finishes, four output files are generated for viewing: “dot”, “pdf”, “ps” and “gif”. If everything works correctly, the “Download” and “View” buttons in the “Basic” window are activated. Users may then choose one of the four formats and download an image of the clustered graph to their local machine. Alternatively, they can view the graph online as an applet. If this option is chosen, a window similar to Figure 3.9 appears with the graph displayed in an applet. This option also supports the bird’s eye view and reachability queries.

Figure 3.21: Hill-climbing configuration window

Clustering method

“Bunch” supports two clustering methods: Hill-Climbing and Genetic Algorithm

(GA). Users can choose either from the “Clustering Method” list. Hill-Climbing is

the default clustering method. Hill-Climbing works best for most graphs, while the

GA sometimes is more efficient than Hill Climbing for extremely large graphs.

Users can also configure the two clustering algorithms by clicking the “Option” button. Depending on which clustering method they choose, one of the windows shown in Figure 3.21 or Figure 3.22 appears. For the GA method, users can configure GA selection methods, numbers of generations, population sizes, crossover probabilities and mutation probabilities, while for the Hill-Climbing algorithm, users can configure generation sizes, percentages of search space, percentage of randomization, and disable/enable simulated annealing. For more information about the configuration options, refer to Mitchell’s Ph.D. thesis [11].

Figure 3.22: GA configuration window

Clustering Options

The clustering options window allows users to control the behavior of the clustering

algorithms.

• Objective Function. Users may choose one of the following measurements for

evaluating the clustering results: incremental MQ, incremental MQ weighted,

turbo MQ function and turbo MQ squared. The incremental MQ weighted function is the default.

Figure 3.23: Clustering options

• Limiting running time. Users may establish an upper-bound on runtime by

setting this field to a specific value in milliseconds. Clustering stops when the

limit is reached, and the best result found so far by the clustering search engine is returned.

• Agglomerative Output Options. These options are valid only when “Ag-

glomerative Clustering” is selected in “Action”. The agglomerative clustering

algorithm generates several levels of the module decomposition hierarchy from

the most detailed level to the topmost level. Users may choose the median level,

the most detailed level, or the topmost level as the output.

Figure 3.24: Using libraries

Using libraries

The “Library” tab is activated when an MDG file is selected (Figure 3.24). This

feature allows users to exclude those nodes that only have incoming edges (libraries)

from the clustering process. Library nodes tend to obfuscate the abstract view of the structure because all incident edges in the MDG are directed towards the library module. Thus, the library modules’ placement into a cluster becomes somewhat arbitrary, which means that library modules may affect the results dramatically.

In REportal’s library window, users can either select nodes manually from the

list on the left and move them to the right, or click the “FIND” button to move all

libraries from left to right automatically. All nodes in the right list are treated as

libraries and are excluded from clustering activity. In the resultant clustered graphs,

these nodes are placed in a special cluster and are displayed in gray.
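Under the definition given above (libraries are nodes that only have incoming edges), the behavior of the “FIND” button can be approximated by the short sketch below. The function find_libraries is a hypothetical helper, and the actual detection logic in REportal and Bunch may differ.

```python
def find_libraries(edges):
    """Return nodes that appear only as edge targets, never as edge sources."""
    sources = {src for src, _ in edges}
    targets = {dst for _, dst in edges}
    return sorted(targets - sources)
```

For the edges A→util, B→util and A→B, only util is reported, since B also has an outgoing edge.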

Figure 3.25: Using omnipresent modules

Using omnipresent modules

Omnipresent modules are modules that have either many incoming edges or many outgoing edges relative to the other nodes in the MDG. Modules that have many incoming edges are called omnipresent suppliers, while those that have many outgoing edges are called omnipresent clients. In the “Omnipresent” window, which is activated only after an MDG file is selected, users can exclude omnipresent modules from the clustering process (Figure 3.25).

The list on the left shows all modules that may be selected as omnipresent modules. The list does not contain those modules that have already been selected as libraries. Likewise, the list on the left in the library window does not contain those modules that have already been selected as omnipresent (see footnote 6). Users may select certain modules as omnipresent manually, or press the “FIND” button to have REportal suggest omnipresent modules automatically.
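One plausible way to suggest omnipresent modules is to flag nodes whose incoming or outgoing edge count is much larger than the average. The sketch below does this with a threshold of a multiple of the average degree. Both the rule and the default factor of 3.0 are assumptions made for illustration; the text does not state how the “FIND” button decides.

```python
from collections import Counter


def suggest_omnipresent(edges, factor=3.0):
    """Flag nodes whose in-degree or out-degree exceeds factor times the average."""
    if not edges:
        return {"incoming": [], "outgoing": []}
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    # Average in-degree equals average out-degree: edges divided by nodes.
    average = len(edges) / len(nodes)
    return {
        "incoming": sorted(n for n in nodes if in_deg[n] > factor * average),
        "outgoing": sorted(n for n in nodes if out_deg[n] > factor * average),
    }
```

Lowering the factor makes the suggestion more aggressive, so more modules are proposed for exclusion from clustering.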

Figure 3.26: User-directed clustering

User-directed clustering

Users sometimes have prior knowledge about which modules should be placed within a subsystem. The “User Directed” window (Figure 3.26), which is activated only after an MDG file is selected, allows users to upload a description file that specifies what modules should be placed together into clusters. When the “Lock Clusters” checkbox is selected, no modules will be added to the subsystems that are defined by the users. If the “Lock Clusters” option is not selected, Bunch may move additional modules into the user-specified clusters.

Footnote 6: In other words, each module in the MDG will either participate in the clustering process, or will be tagged with a special type such as “library” or “omnipresent”.

Figure 3.27: MQ calculator

MQ calculator

This feature, which is accessed by pressing the MQ Calculator tab, can be used to measure the quality of an MDG partition. Bunch uses the Modularization Quality (MQ) function to evaluate the relative “quality” of a particular MDG partition. To determine the MQ for a provided clustered input file, users select an MDG file and a SIL file from their graphs folder, and then click the “Calculate” button. The number of nodes, edges, and clusters in the graph, along with the MQ value, are displayed as results.
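To make the calculator's output concrete, the sketch below computes the four reported values for a small partitioned graph. The text does not define MQ, so this example assumes one formulation from the Bunch literature (often called TurboMQ): each cluster i contributes a cluster factor 2·mu_i / (2·mu_i + eps_i), where mu_i is the total weight of edges inside the cluster and eps_i is the total weight of edges crossing its boundary in either direction, and MQ is the sum of these factors. The function name and input format are hypothetical, and SIL file parsing is omitted.

```python
def mq_report(edges, clusters):
    """Summarize a partition.

    edges: (source, target, weight) triples.
    clusters: mapping of cluster name -> set of node names.
    ASSUMPTION: MQ is the TurboMQ-style sum of cluster factors.
    """
    owner = {node: name for name, members in clusters.items() for node in members}
    intra = dict.fromkeys(clusters, 0.0)  # weight of edges inside each cluster
    inter = dict.fromkeys(clusters, 0.0)  # weight of edges crossing each boundary
    for src, dst, weight in edges:
        if owner[src] == owner[dst]:
            intra[owner[src]] += weight
        else:
            inter[owner[src]] += weight
            inter[owner[dst]] += weight
    # Clusters without internal edges contribute a factor of zero.
    mq = sum(
        2 * intra[c] / (2 * intra[c] + inter[c])
        for c in clusters
        if intra[c] > 0
    )
    return {"nodes": len(owner), "edges": len(edges), "clusters": len(clusters), "mq": mq}
```

Higher values indicate partitions with more cohesion inside clusters and less coupling between them, which is why the value is useful for comparing two partitions of the same MDG.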

3.5 Chapter Summary

This chapter describes the functionality of REportal from a user’s perspective. The

next chapter is dedicated to REportal’s architecture.

Chapter 4. REportal: A Developer Perspective

4.1 Development History

REportal integrates a repository of reverse engineering tools developed over the past

several years by SERG and the AT&T Research Labs. Many people have been in-

volved in the development of these tools. Yih-Farn Chen, Emden R. Gansner, Elefthe-

rios Koutsofios and Jeffrey Korn at the AT&T Research Labs developed the acacia[13]

and chava[18] source code analysis tools. Emden R. Gansner, Eleftherios Koutsofios,

Stephen C. North and Kiem-Phong Vo developed the graph drawing tool dot [14].

Naser S. Barghouti, John M. Mocenigo, and Wenke Lee developed the graph package

Grappa[39]. Brian Mitchell and Spiros Mancoridis developed the software clustering

tool Bunch[7, 11, 23, 24].

Since these tools were developed in different languages and for different platforms with different user interfaces, installing and learning how to use these tools can be a complex undertaking. In the year 2000, Spiros Mancoridis and Yih-Farn Chen, inspired by the “software as services” model, came up with a proposal for a web-based portal site that integrates these tools and hides their complexity behind a simple and intuitive user interface.

The proposal was sponsored by grants from the National Science Foundation (NSF) and the research laboratories of AT&T. Timothy S. Souder and Jeffrey L. Korn started the implementation of REportal. Souder set up the deployment environment for REportal and implemented a few straightforward administrative features such as downloading, renaming and deleting files and folders. The initial release of REportal also integrated a few of Bunch’s features through the use of Java Servlets (see footnote 7). Korn implemented graph visualization, reachability query and source code browsing features via the Common Gateway Interface (CGI) programming model.

By the end of June 2001, when I took over the development of REportal, it had

a few basic features such as code analysis, code browsing, design extraction and

supporting services. The early work was a proof of concept for REportal. It was

clear that REportal needed to be overhauled and expanded before it could be useful

to the software engineering community.

William Mongan soon joined the REportal development team. Our goal was

to re-implement REportal so that it could provide a simple and intuitive interface

with many additional new features. Mongan maintains our development machines,

is involved in the design of the user interface, and works on the graph visualization feature. Due to his contribution, REportal is able to visualize large graphs both

in IE and Netscape, and users can navigate through these large graphs using the

bird’s eye viewing feature. He also found and fixed many bugs. I worked on the

design and implementation of the new interface, the integration of Bunch’s features

into REportal, and the addition of the features for user workspace management and

navigation.

4.2 REportal Non-Functional Requirements

The functional requirements of REportal were outlined in Chapter 3, which describes

REportal from a user’s perspective. In addition to these functional requirements,

REportal was designed to satisfy the following non-functional requirements:

Footnote 7: The original architecture by Korn and Souder is still being used, although most of the classes have been rewritten, modified, or expanded to support the new interfaces and features.

• Extensibility. As a long-term research project, REportal is designed to be

a central repository of reverse engineering tools. Although it only supports

static system analysis today, eventually more tools will be integrated to support

security analysis and dynamic program analysis.

• Installation and Portability. Currently, REportal runs on a server in the

Software Engineering Research Group (SERG) at Drexel University. To use the

services of REportal, users must first upload their source code to the server. If

security is a key concern for users, they may hesitate to do so. Hence, one feature of REportal is its portability to other servers, perhaps behind a corporate firewall. Therefore, it is important that REportal be easy to install, and adaptable to a variety of operating systems and servers.

• Usability. Because REportal is a portal web site that provides complicated software engineering services, usability is a top design concern. Providing robust, reliable services through intuitive and friendly interfaces is a very important requirement.

4.3 The REportal Process

Figure 4.1 shows the high-level process supported by REportal. Once the source

code is uploaded to the server, depending on whether the source code is written in

Java or C/C++, Acacia (C/C++) or Chava (Java) is used to create two repositories:

entity.db and relationship.db. These database files store the information about entities

in the source code and the relationships between them, respectively. Other than text

searching, all advanced queries such as the entity searching and relationship searching

are conducted by querying the two repositories.

Advanced queries are conducted through a set of CIAO query tools (see footnote 8), such as cdef and cref. The results produced by these tools contain information about the entity name, the ID of an entity in the repositories, the file in which the entity resides, the ID of the file, the beginning and ending line numbers of the definition of the entity, the relationship type, the scope of the entity, and so on. Some of this information, such as entity names and the files in which they reside, is used when displaying results in table listings.

Footnote 8: The CIAO tools, cdef and cref, work against the database files provided by Acacia and Chava.

Figure 4.1: The process supported by REportal
[Process diagram. Labeled elements: Source code; Source code analysis tools (Acacia, Chava); Repositories (relationship.db, entity.db); CIAO querying tools; Bunch; Relationship search; Entity search; Source code browsing; Visualization tools (dotty, Grappa); Text search; Unix utility tools.]

Design extraction is achieved using the Bunch clustering tool. REportal spawns a subprocess in which a set of Java programs invoke the Bunch API (see footnote 9) to perform clustering. The Bunch clustering tool takes Module Dependency Graphs (MDGs) as input. The MDGs are generated by a Unix shell script, which invokes a set of CIAO query tools to produce a file in the correct MDG format.

Query results may be displayed online in tabular format or graphically as an applet by Grappa [39].

Footnote 9: Bunch provides a graphical user interface to support stand-alone usage, and an API to support integration with additional tools.

Figure 4.2: Development and deployment environment of REportal
[Diagram. Labeled elements: File server (snoopy); Web server (tweety); Development machine (art). Labeled arrows: Check in; Check out; Load.]

4.4 Development and Deployment Environment

REportal runs on tweety, which is a Unix machine that runs the Apache web server.

tweety runs Red Hat Linux 7.2 on a 1 GHz Pentium III processor, with 256 MB of RAM and an 18 GB SCSI hard drive. tweety has all of the necessary tools, shell scripts, and software to run REportal.

snoopy runs Red Hat Linux 7.2 on dual 1.5 GHz Xeon processors, with 1 GB of PC800 RAM, a 500 GB external RAID array and a 100 GB internal array. snoopy works

as a file server that stores the source code and bytecode of REportal among other

things. The code of REportal is periodically copied from snoopy to tweety, so that

tweety always reflects the latest development of REportal.

REportal is developed on another machine named art. art also runs Apache so

that it can be used for development and testing. Each developer has a development

folder on art, which contains the source code of REportal. Each developer works in


their respective folder, and version control is performed using the Concurrent Versions

System (CVS).

After a development change (and before checking the code into the repositories

on snoopy), the developers must update their local folders so that their work can be

merged using CVS.

The web server that runs REportal and the techniques used to develop it are

described in the following subsections.

4.4.1 Apache

The Apache server is a powerful, flexible, HTTP/1.1 compliant web server. It is available from the Apache Software Foundation at no charge and comes with an unrestrictive license [3]. “Apache has been the most popular web server on the Internet since April of 1996. The March 2002 Netcraft Web Server Survey found that 54% of the web sites on the Internet are using Apache, thus making it more widely used than all other web servers combined” [3].

Apache runs on Windows NT/9x, Netware 5.x and above, OS/2, and most versions of Unix. It has been shown to be substantially faster, more stable, and more feature-rich than many other web servers.

Apache is used as the web server for REportal. Because of Apache’s popularity, REportal can be deployed easily on other servers.

4.4.2 Java and Java Servlets

Introduced by James Gosling at Sun Microsystems in 1995, Java has gained tremendous popularity over the years. As one of the fastest growing programming technologies of all time, Java has become an ideal language for server-side development of large applications such as REportal.

The cross-platform nature of Java facilitates the portability of REportal. Java’s object-oriented, memory-protected design reduces development cycles and increases reliability.

Java Servlets are special Java classes that can be loaded dynamically to provide

web developers with a simple, consistent mechanism for extending the functionality

of a web server. Java Servlets have many unique features for creating dynamic web

content. Many of these features overcome scalability and performance limitations

that are associated with other server side technologies such as CGI and server-side

JavaScript.

Portability

Java Servlets are supported on all major platforms, and work with all of the major web servers [5]. This feature enables REportal to be developed on a high-end Unix server running Apache, and to be deployed effortlessly on another platform running a different web server, such as a Windows NT machine running the Java Server.

Efficiency

Unlike CGI, which starts a separate process to handle each request, Servlets are handled by separate threads that run within the web server process [5]. Once a servlet is loaded, it stays in the server’s memory as a single object instance. The server invokes it to handle a request using a simple, lightweight method invocation. This design makes Servlets efficient and scalable.
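The threading model described above can be imitated in plain Java without a servlet container. The sketch below is only an analogy: Handler stands in for a servlet instance and a fixed thread pool stands in for the server's request threads. All names are hypothetical, and the real Servlet API is not used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Analogy for the servlet model: one shared handler object, many request threads. */
class SharedHandlerDemo {

    /** Stand-in for a servlet: created once, then invoked concurrently. */
    static class Handler {
        // Shared state must be thread-safe because all threads use the same instance.
        private final AtomicInteger served = new AtomicInteger();

        String handle(String request) {
            int n = served.incrementAndGet();
            return "response #" + n + " for " + request;
        }

        int servedCount() {
            return served.get();
        }
    }

    /** Sends the given number of requests through one Handler and returns how many it served. */
    static int serve(int requests, int threads) {
        Handler handler = new Handler(); // a single instance for every request
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            final String request = "request-" + i;
            pool.execute(() -> handler.handle(request)); // a method call, not a new process
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handler.servedCount();
    }

    public static void main(String[] args) {
        System.out.println(serve(100, 8) + " requests served by one handler instance");
    }
}
```

Because every thread shares the one instance, any mutable field of the handler has to be thread-safe, which is why the counter is an AtomicInteger.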

Security

Due to Java’s exception handling mechanism, Servlets can handle errors safely. If a run-time error occurs, an exception is thrown, which can be handled safely without the danger of crashing the server. It should also be noted that Apache and the Java Servlet engine provide a significant amount of logging services, which can be used to support security audits.

Figure 4.3: Queries against the repositories
[Figure content: users’ requests reach the REportal servlet, which creates child processes that run Unix shell scripts; the scripts invoke the CIAO querying tools against entity.db and relationship.db.]

Extensibility

Servlets may access the entire family of Java APIs, including the JDBC (Java Database Connectivity) API, networking, multithreading, and object serialization, among others. Thus, Servlets include all of the benefits of the Java environment, including portability, reusability, and protection [4].

Java Servlet technologies are utilized by much of the work described in this thesis.

The servlet engine used by the REportal project is Apache JServ.

4.4.3 Configuration of REportal

Other than the text search, all code analysis and design extraction activities are performed by querying the entity.db and relationship.db database files. As illustrated in Figure 4.3, when queries are sent to the REportal servlet via the HTTP protocol, the servlet generates child processes which invoke the appropriate Unix shell scripts. Essentially, the shell scripts invoke the CIAO tools to query the two repositories. The query results are stored in temporary files that are subsequently read by the servlet and displayed to the users. Table 4.1 summarizes all of the shell scripts used by REportal.

Table 4.1: Shell Scripts in REportal (Locations are relative to /usr/local/www/reportal)

Shell script        Location   Description
fix-bunch-dot.sed   bin        Changes dot files created by Bunch so that the shapes and colors of nodes are consistent with other graphs.
ciaodb              bin        Produces the repositories for C/C++ and Java programs.
mdg                 bin        Produces custom MDG files from the repositories.
mdg-helper          bin        Works around a JDK 1.3 bug in which the Runtime.exec method hangs when it invokes a Unix shell script. mdg-helper is a C program written by Souder; it must be invoked before the execution of scripts.
textSearch          bin        Invokes the Unix utility grep and searches for a text pattern within source files in a certain directory.
cdef                bin/ciao   Invokes a set of CIAO tools to perform entity queries against the repositories. Returns matched entities along with information about the entities.
cref                bin/ciao   Invokes a set of CIAO tools to perform relationship queries against the repositories. Returns matched relationships along with information about the relationships.

Table 4.2: BunchAPILinks in REportal (Locations are relative to /usr/local/www/reportal)

Name           Location   Description
BunchOmni      src        Invokes Bunch to generate a list of omnipresent modules.
BunchNodes     src        Invokes Bunch to generate a list of modules in a system.
BunchLibrary   src        Produces a list of all libraries.
BunchMQCal     src        Invokes Bunch to calculate the MQ value of a given MDG partition.
BunchAPILink   src        Invokes Bunch to run clustering.

Likewise, Bunch is loosely integrated into REportal through a set of independent

Java programs, which we call BunchAPILinks. When users have clustering requests,

the REportal servlet creates a subprocess which invokes the BunchAPILinks. The

BunchAPILinks call the appropriate Bunch APIs to perform the requests and store

results in temporary files. Table 4.2 summarizes all the BunchAPILinks.

Table 4.3 summarizes all other tools used in REportal.

Table 4.3: All Other Tools in REportal

Name                       Location                                             Description
CIAO                       /usr/local/ciao/lib, /usr/local/ciao/bin             Querying tools against the repositories.
dot                        /usr/local/bin                                       A tool that does graph layout.
ps2pdf                     /usr/local/bin                                       A tool that transforms a graph from PostScript (ps) format to Portable Document Format (pdf).
bunch.jar                  /usr/local/www/reportal                              Bunch clustering tool.
JDK                        /usr/local/jdk1.3.1_01/bin, /usr/local/jdk1.3.1_01/lib   Java development kit.
activation.jar, mail.jar   /usr/local/www/reportal                              Java Mail API.
Chava                      /cgi-bin/reportal                                    Java source code analysis tools.
Grappa                     /cgi-bin/reportal                                    Java package that visualizes graphs in applets.

4.5 REportal Architecture

Figure 4.4 illustrates the architecture of REportal, which consists of 11 subsystems. Before we go into the details of each subsystem, some important aspects of the REportal architecture are described below:

• The REportal configuration subsystem and the REportal utilities subsystem provide system-wide services to almost all other core subsystems. To keep the figure clear, the two subsystems are not connected to the others.

• The shell scripts and CIAO query tools, source code browsing, Bunch API Links and Bunch subsystems are loosely coupled to the other subsystems in the REportal architecture. These subsystems are integrated into REportal to support services like code analysis, module dependency queries, and so on. We chose a loosely coupled approach because these subsystems are likely to evolve in the future. This design eases the maintenance of REportal.

• Depending on the type of queries from users, the query dispatcher subsystem invokes the corresponding tools. Future integration of other tools should require minor effort.

Figure 4.4: REportal architecture
[Figure content: the subsystems User Management, Workspace Management, REportal Utilities, REportal Configuration, Graph Server, Source Code Browsing, Bunch API Links, Clustering, Bunch, Shell Scripts and CIAO Query Tools, and Query Dispatcher, with a group labeled "Core of REportal".]

Figure 4.5: REportal subsystem-level sequence diagram
[Figure content: the user logs on through User Management; Workspace Management calls REportal Utilities for renaming, creating, uploading, deleting and downloading; opening a project leads to the Query Dispatcher, which forwards entity and relationship searching to the CIAO Querying Tools, clustering to Clustering (Bunch API Links, Bunch), source code viewing to Source Code Browsing, and graph viewing to the Graph Server.]

Figure 4.5 illustrates the subsystem-level sequence diagram of REportal. The User Management subsystem deals with user authentication and registration. Once a user logs into REportal successfully, they enter the Workspace Management subsystem. All administrative services such as creating, renaming, deleting, uploading and downloading a file or folder are completed by the REportal Utilities subsystem. If the user selects a project and opens it, they advance to the Query Dispatcher subsystem, where the user’s requests are dispatched to other subsystems. For example, entity searching requests are directed to the CIAO Querying Tools subsystem; clustering requests are directed to the Clustering subsystem.

4.6 REportal Subsystems

This section examines the pertinent design aspects for 7 of the 11 REportal subsystems. The description of the User Management subsystem is combined with that of the REportal Configuration subsystem for simplicity. The Shell Scripts and BunchAPILinks subsystems have been covered in Section 4.4.3. The Source Code Browsing subsystem was developed in CGI by the AT&T research lab and is not covered here.

4.6.1 User Management & Configuration Subsystem

Figure 4.6 illustrates the design of the User Management and REportal Configuration

subsystems.

When the REportal service is requested for the first time, a REportal servlet is

created. When a client connects to the server and makes an HTTP request, the

servlet engine produces one thread for each connection and directs the request to the

Reportal object. The Reportal object serves three purposes:

• Creates a Ruser object, a ReportalUser object, and a session for the client. The Ruser object is associated with the session. During the creation, the configuration file of REportal is loaded through the REportal Configuration subsystem. The Ruser object contains information about the client such as user name, password, first name, last name and so on.

• Depending on the status of the session, invokes methods in the ReportalUser object to display the login screen, to perform user authentication, or to dispatch user requests to other subsystems.

• Invalidates the session when the connection is closed or the session expires. The lifetime of a session is set to 30 minutes.

Figure 4.6: The user management and REportal configuration subsystem
[Figure content: the classes Reportal (Init(), destroy(), doGet()), ReportalUser (doAuth(), displayLoginScreen(), doRequest), Ruser (userName, password, email, firstName, lastName) and SignUp; they use the REportal Configuration subsystem and connect to the Workspace Management, Query Dispatcher and Clustering subsystems.]

If the client wants to create a REportal account, the request is directed to the SignUp class, which displays the sign-up form. Once the client finishes and submits the form, the client’s information is stored in a file on the server. The client’s password is hashed with a one-way hash function called MD5. If the creation of the account is successful, a welcome email message is sent to the user via the Java Mail API.
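The password hashing step mentioned above can be sketched with the standard java.security.MessageDigest class. The class and method names below (PasswordDigest, md5Hex, matches) are hypothetical, and the sketch assumes the digest is stored as lowercase hexadecimal text. The thesis does not say how the digest is encoded or whether a salt is used, so none is applied here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper showing a one-way MD5 digest of a password. */
class PasswordDigest {

    /** Returns the MD5 digest of the password as 32 lowercase hex characters. */
    static String md5Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is unexpected.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Authentication compares digests; the stored value cannot be reversed to the password. */
    static boolean matches(String candidatePassword, String storedHex) {
        return md5Hex(candidatePassword).equals(storedHex);
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("password")); // prints 5f4dcc3b5aa765d61d8327deb882cf99
    }
}
```

At login, the submitted password is hashed the same way and compared with the stored digest, so the server never needs to keep the plain-text password.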

Every HTTP request follows the same path: the request is directed to the Reportal object, then forwarded to ReportalUser, which defines several states. Depending on the state of the request, ReportalUser invokes objects in the appropriate subsystem (e.g., Workspace Management, Query Dispatcher, or Clustering).
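The state-based routing described above might look roughly like the following sketch. The thesis does not name the states that ReportalUser defines, so the State values, the page names "query" and "cluster", and the class RequestRouter are all assumptions made for illustration.

```java
/** Hypothetical sketch of routing a request according to the session state. */
class RequestRouter {

    /** Assumed states; the actual states in ReportalUser are not listed in the text. */
    enum State { NOT_LOGGED_IN, CREDENTIALS_SUBMITTED, LOGGED_IN }

    /** Returns the name of the component that should handle the request. */
    static String route(State state, boolean credentialsValid, String requestedPage) {
        switch (state) {
            case NOT_LOGGED_IN:
                return "LoginScreen";
            case CREDENTIALS_SUBMITTED:
                // A successful login leads into the workspace; a failed one shows the form again.
                return credentialsValid ? "WorkspaceManagement" : "LoginScreen";
            case LOGGED_IN:
                if ("query".equals(requestedPage)) {
                    return "QueryDispatcher";
                }
                if ("cluster".equals(requestedPage)) {
                    return "Clustering";
                }
                return "WorkspaceManagement";
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(route(State.LOGGED_IN, true, "query"));
    }
}
```

Keeping the decision in one method means that every request passes the same authentication check before it reaches a subsystem.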

4.6.2 Workspace Management & Utility Subsystem

The REportal Utility subsystem contains two primary classes: FileUtilities and DisplayUtilities. The FileUtilities class is a layer between the Projects class and the user’s file system. All manipulation of the file system, such as creating a directory, renaming a file, deleting a directory, and uploading a file, must be done through the FileUtilities class. Currently, we assume authenticated users have all the privileges described in Section 3.3 to manage their workspace. Once a user is authenticated, they can start managing their workspace without further validation. In the future we may require every manipulation of a user’s workspace to be validated. Keeping this functionality in a separate class allows such changes to be isolated from other subsystems.

The DisplayUtilities class provides a layer on top of HTML generation. All of

the HTML generated by REportal is handled by calls to this class. For example, the

openTable() method writes a <table> tag and the doHeader() method produces a

header that includes SERG’s logo and REportal’s banner. The abstraction provided by this class essentially eliminates HTML-related code from the rest of the REportal subsystems. This design makes the code more readable and easier to change.
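A minimal sketch of such an HTML-generation layer is shown below. Only the method name openTable() comes from the text; the class name HtmlPage, the row() and closeTable() methods, and the escaping rule are assumptions added so that the example is complete.

```java
/** Hypothetical sketch of a thin layer that keeps HTML markup out of the calling code. */
class HtmlPage {
    private final StringBuilder out = new StringBuilder();

    /** Writes the opening <table> tag. */
    HtmlPage openTable() {
        out.append("<table>\n");
        return this;
    }

    /** Writes one table row, escaping each cell so that data cannot inject markup. */
    HtmlPage row(String... cells) {
        out.append("<tr>");
        for (String cell : cells) {
            out.append("<td>").append(escape(cell)).append("</td>");
        }
        out.append("</tr>\n");
        return this;
    }

    /** Writes the closing </table> tag. */
    HtmlPage closeTable() {
        out.append("</table>\n");
        return this;
    }

    /** Replaces the characters that have special meaning in HTML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(new HtmlPage().openTable().row("Parser", "src/Parser.java").closeTable());
    }
}
```

Callers build a page from method calls alone, so a change to the markup (for example, adding a CSS class to every table) touches only this class.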

The Projects class provides an interactive interface for users to manage their workspace with the support of the FileUtilities and DisplayUtilities classes (Figure 4.7). This class displays the user’s workspace in a fashion that is similar to Windows Explorer. For example, all directories that are at the same level are aligned vertically; subdirectories are underneath the directories they belong to; directories that have subdirectories have a + sign in front of them; a directory is expanded if its + sign is clicked.

Figure 4.7: The workspace management and REportal utilities subsystem
[Figure content: the classes Projects (createProject(), renameProject(), deleteProject(), uploadProject(), downloadProject(), doProjectScreen(), ...), DisplayUtilities (doHeader(), doFoot(), openTable(), ...) and FileUtilities (rename(), delete(), receiveFile(), sendFile(), ...), grouped under Utilities; they use the REportal Configuration subsystem.]

In a web browser, clients have three primary ways to send requests to a server to

get dynamic content:

• HTML forms are the oldest and most flexible method of allowing clients to

interact with servers. User requests, along with some information about the

requests, are sent to the server by submitting a form. This method is used

frequently in the work described in this thesis.

• Clients can use Uniform Resource Locators (URLs) to provide extra information by creating a query string. For example, authenticated clients can use the URL

https://reportal.mcs.drexel.edu/cgi-bin/reportal/webchava/ciao
src.cgi?name=tapDispatch%2Ejava&pathname=src/TAP/command.java
&id=98zU8&JServSessionIdreportal=k33dy8tus1&key=&type=class

to view the source code of the command class in the TAP system (see footnote 11).

• An applet is a program written in the Java programming language that can

be included in an HTML page. REportal uses applets to perform graph visualization. Since applets are downloaded Java programs that execute within a

browser’s context, clients are able to view and query graphs interactively.
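To make the query-string mechanism in the list above concrete, the sketch below splits a query string into decoded name/value pairs. A servlet container normally performs this step before the servlet sees the request; the class QueryString is a hypothetical stand-in, and the sample input reuses a few parameters from the URL shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that decodes the part of a URL after the '?'. */
class QueryString {

    /** Splits "a=1&b=2" into an ordered map; a parameter without a value maps to "". */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2" and the empty query string
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    /** Reverses percent-encoding, e.g. "%2E" becomes ".". */
    private static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("name=tapDispatch%2Ejava&key=&type=class"));
    }
}
```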

Every form in the workspace management page has a hidden variable called “action” associated with it. The value of this variable varies from form to form. When a request is sent to the Projects class, the request is forwarded to the appropriate method depending on the value of the variable. In this manner, adding a new operation takes minimal effort. These techniques are also used in REportal’s query page and clustering page.
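The dispatch on the hidden “action” variable can be sketched as a lookup table from action values to handlers. This is an illustration, not REportal's code: the class ActionDispatcher, the action names, and the use of Java 8 lambdas (which did not exist in the JDK 1.3 that REportal used) are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: routes a form submission by the value of its hidden "action" field. */
class ActionDispatcher {
    // Maps each action value to the code that handles it.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    /** Adding a new operation only requires one more registration. */
    void register(String action, Function<String, String> handler) {
        handlers.put(action, handler);
    }

    /** Looks up the handler for the submitted action and applies it to the target name. */
    String dispatch(String action, String target) {
        Function<String, String> handler = handlers.get(action);
        if (handler == null) {
            return "unknown action: " + action;
        }
        return handler.apply(target);
    }

    public static void main(String[] args) {
        ActionDispatcher dispatcher = new ActionDispatcher();
        dispatcher.register("create", name -> "created project " + name);
        dispatcher.register("delete", name -> "deleted project " + name);
        System.out.println(dispatcher.dispatch("create", "tomcat"));
    }
}
```

The form itself would carry the value in a hidden input field, so the same request URL can serve every operation on the page.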

4.6.3 REportal Configuration

The REportal Configuration subsystem contains a single class named ReportalConfiguration. As mentioned previously, REportal uses many shell scripts and other tools to perform its tasks. This class defines the paths to these shell scripts and tools.

Whenever REportal refers to a shell script or a tool, it finds the location of it

through this class. This class reads the paths from a configuration file, in which the

paths are specified. If the file is not found, the paths are set to default values. This

design simplifies the installation of REportal on other servers where the shell scripts

and tools may be located in different directories on the file system.
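The fallback behavior described above maps naturally onto java.util.Properties, which supports a table of defaults. The sketch below is an assumption about how such a class could be written: the class name ToolPaths and the property keys are hypothetical, while the default directories are taken from Table 4.3.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical sketch of loading tool locations with built-in defaults. */
class ToolPaths {

    /** Reads the given configuration file; any missing key, or a missing file, falls back to a default. */
    static Properties load(String configFile) {
        Properties defaults = new Properties();
        defaults.setProperty("ciao.bin.dir", "/usr/local/ciao/bin"); // hypothetical key names
        defaults.setProperty("dot.dir", "/usr/local/bin");

        // Lookups on 'paths' consult 'defaults' whenever a key is not set explicitly.
        Properties paths = new Properties(defaults);
        try (InputStream in = new FileInputStream(configFile)) {
            paths.load(in);
        } catch (IOException e) {
            // File not found or unreadable: keep the defaults, as the text describes.
        }
        return paths;
    }

    public static void main(String[] args) {
        Properties paths = load("reportal.properties");
        System.out.println("dot is expected in " + paths.getProperty("dot.dir"));
    }
}
```

On a server where the tools live elsewhere, only the configuration file changes; for example, a line such as dot.dir=/opt/graphviz/bin would override the default.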

Footnote 11: Apache JServ assigns a randomly generated string to each session as an ID. This URL is valid only when the session ID is authenticated.

Figure 4.8: The sequence diagram of ProjectFile
[Figure content: ProjectFile invokes cdef (entity search), cref (relationship search, table listing), textSearch.sh (text search) and mdg (module dependency graph); relationship search displayed as a graph goes through CGI to the GraphServer; clustering goes to ClusterWizard.]

4.6.4 The Query Dispatcher Subsystem

The ProjectFile class provides an interactive interface to support code analysis,

design extraction, and source code browsing services. This class, which is called

when a project is opened, generates the “query” page. The interface of the “query”

page is designed so that all services that support source code queries and browsing,

and design extraction can be reached directly from this page.

The “query” page provides 7 services: a) entity search b) relationship search that

lists results in a table c) relationship search that displays results as a graph d) create

an MDG file e) cluster an MDG file f) advanced search, and g) text search. The

ProjectFile class either completes the service request by creating a subprocess in

which the appropriate shell script/tool is invoked, or redirects the service request

to other subsystems. As illustrated in Figure 4.8, entity search, relationship search

(listing results in a table), MDG file creation and text searching are completed within

the class. The request to perform relationship search (displayed as a graph), and

the request to cluster an MDG file, are redirected to the CGI and the Clustering

subsystem, respectively.

Whenever a task must be completed by calling a shell script or a Unix command-line tool, the ProjectFile object creates a subprocess, in which the shell script or tool is executed. The subprocess’s standard output and standard error are made available to the parent process through two streams (Process.getInputStream() and Process.getErrorStream()). Some native platforms provide only a limited buffer size for these streams, and as such, failure to read the output of the subprocess promptly may cause the subprocess to block or even deadlock. To overcome this problem, all query results are saved in a temporary file. The parent process waits until the child processes complete. After that, the parent process reads the results from the temporary file and displays them to the users.
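The temporary-file approach described above can be sketched as follows. This is not REportal's code: it uses ProcessBuilder, which was added to Java after the JDK 1.3 that REportal ran on, and it lets Java perform the redirection, whereas REportal's shell scripts presumably write the files themselves. The class name ToolRunner and the file name prefix are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.List;

/** Hypothetical sketch: run a tool, send its output to a temporary file, then read the file. */
class ToolRunner {

    /** Runs the command and returns the lines it printed. */
    static List<String> runToFile(String... command) {
        try {
            File results = File.createTempFile("reportal-query", ".txt");
            try {
                Process child = new ProcessBuilder(command)
                        .redirectErrorStream(true)  // merge standard error into standard output
                        .redirectOutput(results)    // the child writes to the file, not to a pipe
                        .start();
                // Nothing fills a pipe buffer, so waiting here cannot deadlock on unread output.
                int status = child.waitFor();
                if (status != 0) {
                    throw new IllegalStateException("tool exited with status " + status);
                }
                return Files.readAllLines(results.toPath());
            } finally {
                results.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // On a Unix system, grep or a CIAO query script could be used in place of echo.
        System.out.println(runToFile("sh", "-c", "echo hello"));
    }
}
```

The alternative is to drain both streams from separate threads while the child runs; writing to a file trades that complexity for one extra read after the child exits.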

4.6.5 The Clustering Subsystem

Although the Clustering subsystem contains only one class, ClusterWizard, it

is one of the most complicated subsystems. This subsystem integrates the Bunch

clustering tool into REportal to support the design extraction service.

The ClusterWizard class not only implements the user interface, but also invokes

the Bunch API to perform clustering tasks. The Bunch clustering tool has been under

active development since 1998. REportal mimics the GUI of Bunch, so that the user’s

experience in Bunch can be transferred to REportal, or vice versa. However, due to

web browser constraints and the fact that each user’s workspace resides on the server,

the user interface for clustering in REportal is slightly different from that of Bunch.

For example, the Bunch clustering tool provides a dialog box for users to select an

MDG file from their local machines. Since each user’s MDG files reside on the server,

REportal provides a drop-down list from which users can select an MDG.

Figure 4.9: The sequence diagram of the Clustering subsystem
[Figure content: ClusterWizard creates, selects or uploads an MDG; calls BunchLibrary and BunchNode; calls BunchOmni to configure omnipresent modules; calls BunchAPILinks1 to run clustering; calls Dot to create graphs; sends graphs to Grappa for viewing; calls BunchMQCal for the MQ calculator; relationship searching is also shown.]

Figure 4.9 illustrates the process followed by the ClusterWizard class. Whenever an MDG file is selected (either by creating, uploading or selecting an MDG file), all buttons and tabs are activated, except for the Download and View buttons, which are activated after clustering. The class then produces two subprocesses, which call the BunchLibrary and BunchNode classes. The two Java programs invoke the Bunch API to generate lists of all libraries and nodes in the system and save them in temporary files. When the subprocesses have completed, the ClusterWizard reads the files. If the user chooses to configure omnipresent modules, a subprocess is generated in which BunchOmni is invoked. When the user hits the Run button, the ClusterWizard class produces a subprocess which calls BunchAPILinks1 and passes all parameters to it. Parameters that the user did not configure are set to default values. BunchAPILinks1 produces a dot file. The ClusterWizard class then generates two additional subprocesses, which invoke the dot utility to convert the dot file into PDF and PS files. After that, the Download and View buttons are activated. If the user chooses to view the graph, the request is redirected to Grappa for display. If the user wants to download the results, they can obtain the PDF or PostScript files and view them with a local viewer such as Acrobat Reader (for PDF) or Ghostscript (for PostScript).
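The conversion step in the paragraph above could be driven by two command lines like the ones built below. The sketch assumes that dot emits PostScript through its -Tps output option and that the PDF is then derived from the PostScript file with ps2pdf, which Table 4.3 lists among the tools; the thesis does not give the exact commands, and the class GraphConversion and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the command lines for the graph conversion step. */
class GraphConversion {

    /** Command that asks dot to lay out the graph and write PostScript. */
    static List<String> dotToPostScript(String dotFile, String psFile) {
        return Arrays.asList("dot", "-Tps", dotFile, "-o", psFile);
    }

    /** Command that converts the PostScript file to PDF with ps2pdf. */
    static List<String> postScriptToPdf(String psFile, String pdfFile) {
        return Arrays.asList("ps2pdf", psFile, pdfFile);
    }

    public static void main(String[] args) {
        // Printing only; a caller would run each command in its own subprocess.
        System.out.println(String.join(" ", dotToPostScript("clusters.dot", "clusters.ps")));
        System.out.println(String.join(" ", postScriptToPdf("clusters.ps", "clusters.pdf")));
    }
}
```

Building each command as a list of arguments, rather than one string, avoids quoting problems when file names contain spaces.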

The user may choose to execute the MQ calculator utility. After an MDG and a SIL file are selected, and the user hits the Calculate button, a subprocess calls BunchMQCal to perform the calculation. The results are stored in a temporary file, which is read by the ClusterWizard class. The results are then formatted and displayed to the user.

This chapter described REportal’s overall design and architecture. The next chapter focuses on how we validated REportal’s interface from an ease-of-use perspective.

Chapter 5. Validation

Computer software is designed to target a certain group of users; therefore, usability

is essential to its success. Even well-designed software packages built through the

efforts of many software engineers may be rendered useless simply because the end

user cannot use the program easily. This is especially true for REportal since it is

designed to provide a simple, consistent, and intuitive user interface that abstracts

the complexities of the underlying reverse engineering tools.

Although the current user interface has evolved through many iterations, there

may still exist flaws that are inconsistent with the expectations of software engineers

for a portal web site. Likewise, although thorough tests have been done on REportal,

it is very possible that there are some bugs that have gone undetected.

As I write this thesis, the first version of REportal is ready to be released. Before

the release of REportal, we wanted to extensively test the software to remove defects

and investigate ways to improve the user interface.

Evaluating the user interface of software is not easy. Dr. Hewett from the Psychology department of Drexel University, who has rich experience in this area, gave us several ideas. According to Dr. Hewett, the easiest way to evaluate the user interface of a software system is to conduct usability studies in which a few typical users perform some tasks using the software. In the study, the users are observed in order to reveal possible usability improvements.

5.1 Design of Tasks

The usability study was designed with the following requirements:

• It should be possible to finish the tasks within an hour or so; otherwise participants may lose patience.

• The tasks must be based on real-life scenarios so that the results are meaningful and convincing.

• Finishing the tasks should exercise most, if not all, of the features provided by REportal.

Two software systems were chosen for this study: Apache Tomcat 4.0.4 and Bunch 3.3.5. Apache Tomcat is an implementation of the Java Servlet 2.3 and JavaServer Pages 1.2 Specifications. It should be noted that Tomcat version 4.0.4 is a complete re-implementation of earlier versions of this application. The open source code of Apache Tomcat 4.0.4 is available at FRESHMEAT [2] and Jakarta [19]. Bunch is a clustering tool intended to aid the software developer and maintainer with understanding, verifying and maintaining a source code base [7]. Bunch is integrated into REportal to provide the design extraction feature. Apache Tomcat 4.0.4 has about 40 classes while Bunch 3.3.5 has 220.

The participants were asked to answer six to seven questions for each system.

These questions cover most of the features provided by REportal, such as entity

searching, relationship searching, reachability querying, design extraction, source code

browsing and administrative functions. These questions are practical and are similar

to questions that are likely to be asked when studying a software system. Please refer

to Appendix A where the complete list of tasks for the evaluation are documented.


5.2 Conducting the Evaluation Study

The Software Engineering Research Group (SERG) at Drexel University consists of several undergraduate students, graduate students, and faculty. Some of them have full-time jobs in industry as software engineers. REportal contains a comprehensive set of reverse engineering tools to profile and mine the source code of software systems. The target users are researchers, students and software engineers, who can be represented by the people of SERG. Therefore, volunteers who represent typical users of REportal could be recruited from SERG.

Dr. Hewett offered a two-hour lecture for all of the participants involved in the evaluation study. The lecture covered the method by which information about users can be gained when evaluating the usability of a software system. To practice the method described in the lecture, participants were asked to complete the tasks using REportal during the information-gathering sessions following the lecture. Hence, their participation became a learning experience in a methodology for evaluating the usability of software, which might benefit the participants later. This educational benefit increased their enthusiasm toward taking part in the experiment.

An email message was sent to the SERG mailing list calling for volunteers to participate in the experiment. Realizing the benefit they might obtain from the experiment, four people volunteered to participate: one undergraduate student and three graduate students. One graduate student also has a full-time job as a software engineer.

After the lecture, each participant attended a one-hour information-gathering session, during which they finished the tasks using REportal. Participants were encouraged to think aloud and to give feedback about the user interface while completing the tasks. The author and William Mongan (a REportal developer) observed and recorded problems that the participants encountered and the feedback that they gave. All information-gathering sessions were videotaped so that we could review the video and find out what we missed.

5.3 Results of the Evaluation Study

Overall, the study went smoothly. All participants were able to answer most of the

questions correctly within an hour. Although no major problems were found, some

flaws and bugs were revealed:

Faults Found

• Sometimes the graph visualization windows crash. REportal uses Grappa [39] to draw graphs. According to the author of Grappa, it can only handle small graphs, with sizes of up to 1 to 2 megabytes. REportal sometimes has to handle graphs as large as 3 to 5 megabytes. REportal’s graph viewing feature may therefore push Grappa beyond its limits, as Grappa was implemented to handle smaller graphs.

• The clustering task freezes occasionally and needs further investigation.

User Interface Problems

• There are many types of relationships between two entities, such as inheritance,

containment and method invocation. However, when relationship searching

results are listed in a table, no information about the types of relationships is

displayed.

• Clustering takes a long time for large systems. When clustering, the graph

window is blank. It is hard to tell if the clustering is in progress. A progress

bar would be helpful.


• When a project is opened, all important features are listed as tabs on the top

of the window for easy access. However, “reachability searching”, an important

feature, is not listed there. As a result, all participants had difficulties finding

and using this feature.

• Advanced Search needs significant improvement. Captions in this feature are

not descriptive. Many features in it do not work properly.

Some other interesting observations were:

• Although REportal provides context-sensitive on-line help documentation, and the hyperlink to it is placed at the top right of every window, people only view the documentation as a last resort. When they get stuck, they would rather go through many iterations of trial and error than refer to the help documentation. Even when they go to the help documentation, they glance at screen shots in an attempt to find the answer as soon as they can, instead of reading the documentation.

• When a project is opened, a list of tabs is displayed at the top of the window. Each tab is associated with a feature and is enabled only after it is clicked. The only exception is that the first tab is enabled by default. The choice of the default tab is important. Since the first tab, Entity Search, is enabled by default, people attempt to answer all questions using this feature. They switch to other features only when this one fails. This implies that the most frequently used feature should be set as the default.

• Although the user interface of REportal is designed to be intuitive, we found

that participants still have trouble using it the first time. This is reasonable

and analogous to renting a car. Even though every driver knows how to drive,

they still need a few minutes to get familiar with the car they just rented. Once

they know what is where, they can drive it comfortably. The same logic applies

to REportal. Once the users know what features REportal provides and how

to use them, they can use REportal very comfortably. This explains why they

spent less time studying the second system than the first one.

User Wish List

Some valuable suggestions from participants are listed below:

• REportal should offer a feature that gives statistics about a project, such as

number of classes, relationships, and methods.

• When doing a relationship search, REportal should provide spell checking for

the names of entities, because a search on a misspelled entity name returns

nothing, just like a search for which no relationships exist.
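
The spell-checking suggestion could be approached by comparing the entered name against the entity names already extracted for the project and offering close matches. The following is only a minimal sketch under stated assumptions, not REportal's implementation: the class EntityNameSuggester, its methods, and the threshold of two edits are hypothetical, and the sample entity names are borrowed from the Bunch tasks in Appendix A.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of REportal. */
public class EntityNameSuggester {

    /** Levenshtein edit distance between two strings, computed row by row. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the known names within maxDistance edits of the query
     * (case-insensitive). Returns an empty list when the query is itself a
     * known name, in which case an empty search result means that no
     * relationships exist rather than that the name was misspelled.
     */
    public static List<String> suggest(String query, List<String> knownNames,
                                       int maxDistance) {
        List<String> suggestions = new ArrayList<>();
        if (knownNames.contains(query)) {
            return suggestions;
        }
        for (String name : knownNames) {
            if (editDistance(query.toLowerCase(), name.toLowerCase()) <= maxDistance) {
                suggestions.add(name);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Assumed sample of entity names; a real list would come from the project database.
        List<String> names = Arrays.asList("BunchServer", "BunchUtilities", "Cluster");
        System.out.println(suggest("BunchSever", names, 2));
    }
}
```

With such a check, an empty result for a known entity can be reported as "no relationships found", while an unknown name can trigger a "did you mean" prompt instead.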

Chapter 6. Conclusions and Future Work

This chapter summarizes the work described in this thesis and outlines some plans

for future work.

6.1 Summary & Research Contributions

Practitioners and researchers can take advantage of the latest developments in reverse

engineering if the services provided by the tools are made available on the WWW.

In this thesis, we present a portal web site called REportal, which is a web-based

application that integrates many stand-alone reverse engineering tools. These tools

reside on a server and can be accessed by authorized users through web browsers.

REportal provides an intuitive and friendly user interface that eases the use of those

tools. Using REportal, users are able to analyze and browse source code, as well as

to query and extract design information for Java programs. REportal is valuable to

both experienced and inexperienced users.

To work with REportal, users only need Java-enabled web browsers. This allows

them to analyze and visualize source code without going through the trouble of

obtaining, installing, and learning how to use the set of reverse engineering tools.

Users all over the world may access the latest version of the portal service easily.

The author’s main contributions include:

• Designed and implemented a user interface that enhanced REportal’s usability.

• Developed many new features for REportal, such as file system navigation, the

Bunch clustering tool integration, entity searches, relationship searches, and

text searches.

• Kept the REportal service running.

At the time of this writing, REportal has 54 users from institutions of higher

learning and industry. Up to this point, REportal has been in beta; we are

now ready to release our first production version.

6.2 Plans of Future Work

This section outlines our plans for future work, which include new REportal services

and improved security.

• C/C++ program analysis. Currently, REportal supports only Java programs.

In the future, GAST-MP (a source code analysis tool developed at

SERG for GNU C) will be integrated with REportal so that C programs can

be supported as well.

• Portability. Professional software developers prefer to analyze and store their

code on their own sites because of security concerns. One way this can be

facilitated is to develop a portable version of REportal that can be installed

locally (probably behind a corporate firewall).

• Enhanced security. Apart from user passwords, information such as a

project’s source code is not encrypted. Future work might encrypt the whole file

system so that users are more confident when they are uploading their source

code to the server.

• Personalization. In the future, users will be able to personalize REportal by

configuring preferences such as the layout of tools, background colors,

and the number of query entries per page.

• Data mining. With additional features and security, we expect that REportal

will have more active users in the near future. As a result, this will create a

large repository of source code available for data mining analysis to support

future software engineering research.

Bibliography

[1] The drawdag at AT&T. http://www.research.att.com/dist/drawdag/mail.

[2] Freshmeat. http://freshmeat.net/.

[3] The Apache Software Foundation. http://httpd.apache.org/.

[4] JAVA Servlet Technology. http://java.sun.com/products/servlet/index.html.

[5] Jason Hunter and William Crawford. JAVA Servlet Programming. Chapter 1,pages 1-7. O’Reilly, 1999.

[6] The AT&T Labs Research Internet Page. http://www.research.att.com.

[7] The Bunch Project. Drexel University Software Engineering Research Group(SERG). http://serg.mcs.drexel.edu/bunch.

[8] The REportal Project. Drexel University Software Engineering Research Group(SERG). http://reportal.mcs.drexel.edu/.

[9] LintPlus Online. http://www.cleanscape.net/products/lintonline.

[10] P.A.V. Hall. Software Reuse and Reverse Engineering in Practice. Vhapman &Hall, 1992.

[11] Brian Mitchell. A Heuristic Search Approach to Solving the Software ClusteringProblem. Drexel University Ph.D. Thesis, 2002.

[12] Y. Chen. Reverse engineering. In B. Krishnamurthy, editor, Practical ReusableUNIX Software, chapter 6, pages 177–208. John Wiley & Sons, New York, 1995.

[13] Y. Chen, E. Gansner, and E. Koutsofios. A C++ Data Model Supporting Reach-ability Analysis and Dead Code Detection. In Proc. 6th European Software En-gineering Conference and 5th ACM SIGSOFT Symposium on the Foundationsof Software Engineering, September 1997.

[14] E.R. Gansner, E. Koutsofios, S.C. North, and K.P. Vo. A technique for drawingdirected graphs. IEEE Transactions on Software Engineering, 19(3):214–230,March 1993.

[15] P. Finnigan, R. C. Holt, I. Kalas, S. Kerr, et al. The Software Bookshelf. IBMSystems Journal, 36(4):564-593, 1997.

78

[16] R. C. Holt. Concurrent Euclid, The UNIX System and Tunis. Addison Wesley,Reading, Massachusetts, 1983.

[17] K. Hwang. Advanced Computer Architecture: Parallelism, Scalability, Pro-grammability. McGraw-Hill, 1993.

[18] J. Korn, Y. Chen, and E. Koutsofios. Chava: Reverse engineering and trackingof java applets. In Proc. Working Conference on Reverse Engineering, October1999.

[19] The Jakarta Project. http://jakarta.apache.org.

[20] R. Koschke. Evaluation of Automatic Re-Modularization Techniques and theirIntegration in a Semi-Automatic Method. PhD thesis, University of Stuttgart,Stuttgart, Germany, 2000.

[21] R. Koschke. Software visualization in software maintenance, reverse engineering,and reengineering - a research survey. http://www.informatik.uni-stuttgart.de/-ifi/ps/rainer/softviz/, 2000.

[22] S. Mancoridis. ISF: A Visual Formalism for Specifying Interconnection Styles forSoftware Design. International Journal of Software Engineering and KnowledgeEngineering, 8(4):517–540, 1998.

[23] S. Mancoridis, B.S. Mitchell, Y. Chen, and E.R. Gansner. Bunch: A clusteringtool for the recovery and maintenance of software system structures. In Proceed-ings of International Conference of Software Maintenance, pages 50–59, August1999.

[24] S. Mancoridis, B.S. Mitchell, C. Rorres, Y. Chen, and E.R. Gansner. Usingautomatic clustering to produce high-level system organizations of source code.In Proc. 6th Intl. Workshop on Program Comprehension, June 1998.

[25] S. Mancoridis, T. Souder, Y. Chen, E. R. Gansner, and J. L. Korn. REportal: Aweb-based portal site for reverse engineering. In Proc. Working Conference onReverse Engineering, October 2001.

[26] Microsoft .Net. Microsoft Corporation. http://www.microsoft.com/net.

[27] S. North and E. Koutsofios. Applications of graph visualization. In Proc. Graph-ics Interface, 1994.

[28] The Drexel University Software Engineering Research Group (SERG).http://serg.mcs.drexel.edu.

[29] Secure Socket Layer. http://wp.netscape.com/security/techbriefs/ssl.html.

[30] Sourceforge.net. http://sourceforge.net/.

[31] Reengineering Wiki. http://www.program-ransformation.org/re/.

79

[32] The Netcomputing AnyJ Java IDE. http://www.netcomputing.de/html/main/html.

[33] The Red Hat Source-Navigator. http://sources.redhat.com/sourcenav/.

[34] M. Storey, K. Wong, F. Fracchia, and H. Muller. On integrating visualizationtechniques for effective software exploration. In Proc. of IEEE Symposium onInformation Visualization, October 1997.

[35] Tom Sawyer. Graph Drawing and Visualization Tool.http://www.tomsawyer.com.

[36] T.A. Wiggerts. Using clustering algorithms in legacy systems remodularization.In Proc. Working Conference on Reverse Engineering, October 1997.

[37] Graph Drawing Server at Brown Universityhttp://loki.cs.brown.edu:8081/graphserver/gds/gds-home.shtml

[38] S. Bridgeman, A. Garg and R. Tamassia A graph drawing and translation serviceon the WWW. In Lecture Notes Comput. Sci. Springer-Verlag, 1997.

[39] N. S. Barghouti, J. Mocenigo, and W. Lee. Grappa: A Graph Package in Java. inFifth International Symposium on Graph Drawing, pages 336-343. SpringerVer-lag, Sept. 1997.

80

Appendix A. Tasks of Evaluation

A.1 Tomcat

Please create a folder for the Tomcat system, upload the Jakarta-tomcat-4.0.4.zip from the disk and answer the following questions using REportal.

1. Where is the doEndTag method defined?
2. How many methods are there in Tomcat?
3. What class variables are defined in the class src/jakarta-tomcat-4.0.4/webapps/examples/WEB-INF/classes/filters/ExampleFilter?
4. How many subsystems are there in Tomcat? What are they?
5. What is the parent class of CompressionFilters.CompressionResponseStream?
6. If the CompressionResponseStream.flushToGZip method has bugs, what classes might be impacted by it?

A.2 Bunch

Please create a folder for the Bunch system, upload the bunch.jar from the disk and answer the following questions using REportal.

1. In which file is the main method defined?
2. How many classes are there in Bunch?
3. How many subsystems are there in Bunch?
4. What methods are defined in the class BunchServer?
5. Does the maximizeCluster method in the src/bunch/BunchServer/ClusterUsingVectorSAHC class call the getLocks method in the src/bunch/Cluster class?
6. Two clustering algorithms, namely Hill Climbing and GA, have been implemented. What classes possibly need to change if an additional clustering algorithm is added?
7. The class BunchUtilities is known to have bugs. What classes are probably affected by it?

Thank you for your participation!
