
Y The SPOOL Approach to Pattern-Based Recovery of Design Components

Rudolf K. Keller, Reinhard Schauer, Sébastien Robitaille, Bruno Laguë

Y.1 Introduction

Automated tool support is crucial for the comprehension of large-scale, object-oriented software and involves compressing and clustering the vast amount of information that is contained in the source code. However, software comprehension demands more than the mere understanding of the static structure of the source code. Even a clear representation of the system’s physical and logical structure is insufficient for a developer to fully comprehend the purpose of a given piece of software (Beck and Johnson, 1994). Underlining this statement, Booch estimates that “it takes a professional programmer about 6-9 months to become really proficient with a larger framework”, and he adds that “this rate increases rather exponential to the complexity of software” (Booch, 1996). We agree with Beck and Johnson that one reason for this gigantic effort for software comprehension and evolution is that “existing design notations focus on communicating the what of designs, but almost completely ignore the why” (Beck and Johnson, 1994). They argue that comprehension of the rationale behind the design decisions is as important as a thorough understanding of the software’s structural and logical constituents. Yet, for the most part, current reverse engineering tools completely neglect recovery of the design rationale.

Design patterns capture the rationale behind recurring, proven design solutions and illuminate the trade-offs that are inherent in almost any solution to a non-trivial design problem. In forward engineering, the advantages of design patterns are widely accepted (Beck and Johnson, 1994), but in reverse engineering their usefulness encounters strong resistance throughout both the pattern and the reverse-engineering communities (Buschmann et al., 1996). The main arguments are that patterns can be implemented in many different ways without ever being the same twice, and that the same structure may recur with widely different intents. In addition, existing studies that were aimed at detecting design patterns in existing software systems (Antoniol et al., 1998; Kraemer and Prechelt, 1996) failed to convey the usefulness of this approach to reverse engineering, considering the minimal results in terms of recovered pattern instances. Nevertheless, it is these patterns of thought that comprise the rationale of many pieces of an existing software system, and to comprehend the software, we need to recover these patterns, be it automatically or manually.

In the SPOOL project (Spreading Desirable Properties into the Design of Object-Oriented, Large-Scale Software Systems), a joint industry/university collaboration between the software quality assessment team of Bell Canada and the GELO group at the University of Montreal, we are investigating methods and tools for software comprehension and for assessing software design quality, for instance with respect to changeability (see Chapter Y). As part of the project, we have developed the SPOOL environment for design pattern engineering, which comprises functionality for design composition (Keller and Schauer, 1998), change impact analysis (see Chapter Y), and most importantly, support for the recovery of design patterns.

Note that with “support” we underline the purpose of the environment as an aid for gaining a pattern-based overview of the software system under investigation. It would be pretentious to argue that the environment itself can comprehend the rationale behind a design, “which would go far beyond the current state-of-the-art in artificial intelligence” (Brown, 1997); however, by generating appropriate views, it may lead a human analyzer to the recovery of the rationale behind some of the system’s most critical parts. Using the environment, the analyzer can zoom into those design components¹ that resemble patterns, extract them as diagrams in their own right, contrast the pattern description with the implemented structures, or, in the case of a false positive, dismiss the existence of the automatically identified pattern instance.

In this chapter², we apply our environment to the reverse engineering of design components that are based on some of the design pattern descriptions defined by Gamma et al. (1995). The purpose is to introduce pattern-based reverse engineering as a valuable technique for software comprehension and thus counter the widely held belief that design patterns are only meaningful in forward engineering. Applying our approach to several case studies extracted from industrial, large-scale software, we show that pattern-based reverse engineering of design components is helpful for understanding software-in-the-large. In Section Y.2, we explain the architecture of the SPOOL environment. In Section Y.3, we describe the three C++ systems that we used for experimentation, present three case studies that show how we applied pattern-based reverse engineering of design components, and discuss the findings of our experiments. Section Y.4 compares our approach with related techniques. Section Y.5 concludes the chapter and provides an outlook on future work.

¹ Note that we introduced the term design component as the reification of design elements, such as patterns, idioms, or application-specific solutions, and their provision as software components (JavaBeans, COM objects, or the like), which are manipulated via specialization, adaptation, assembly, and revision. We refer to (Keller and Schauer, 1998) for further details on this approach to software composition. For the purpose of this chapter, we use the term design component as a package of structural model descriptions together with informal documentation, such as intent, applicability, or known-uses.

² This chapter is a revised and extended version of (Keller et al., 1999).

Y.2 The SPOOL Reverse Engineering Environment

The SPOOL reverse engineering environment (Figure Y.1) uses a three-tier architecture to achieve a clear separation of concerns between the end user tools, the schema and the objects of the reverse engineered models, and the persistent datastore. The lowest tier consists of an object-oriented database management system, which provides the physical, persistent datastore for the reverse engineered source code models and the design information. The middle tier consists of the repository schema, which is an object-oriented schema of the reverse engineered models, comprising structure (classes, attributes, and relationships), behavior (access and manipulation functionality of objects), and mechanisms (higher-level functionality, such as complex object traversal, change notification, and dependency accumulation). We call these two lower tiers the SPOOL design repository. For a detailed description of the design repository, refer to Chapter Z. The upper tier consists of end user tools implementing domain-specific functionality based on the repository schema, i.e., source code capturing, and visualization and analysis.

In this section, we first describe the environment’s techniques and tools for source code capturing. Then, we explain its functionalities for pattern-based design recovery and for design representation.

Y.2.1 Source Code Capturing

[Figure Y.1 components. Source Code Capturing: source code (C/C++, Java); source code parser (Datrix); intermediate format (Datrix/TA-XML); intermediate format importer (xml4j). Visualization and Analysis: source code visualization, dependency analysis, searching and browsing, design querying, design inspection and visualization, design component editing, metrics analysis. Pattern-Based Design Recovery: automatic, semi-automatic, manual. Design Repository: repository schema (reverse engineered source code models, abstract design components, implemented design components, recovered design models, re-organized design models); object-oriented database management system.]

Figure Y.1. Overview of SPOOL reverse engineering environment.

Source code capturing is the first step within the reverse engineering process. Its goal is to extract an initial model from the source code. At this time, SPOOL supports C++ and uses Datrix (Datrix, 2000) to parse C++ source code files. As the Datrix team already provides CSER members with a parser for Java, SPOOL will soon be able to extend its support to the reverse engineering of Java source code. Datrix provides complete information on the source code in the form of an ASCII-based representation, the Datrix/TA intermediate format. The purpose of this intermediate representation is to make the Datrix output independent of the programming language being parsed. Moreover, it provides a data export mechanism to analysis and visualization tools, ranging from Bell Canada’s suite of software comprehension tools to the SPOOL environment and to other CSER source code comprehension tools. SPOOL parses the Datrix/TA files, which are generated for each compile unit, and imports the data into the SPOOL repository. In order to leverage off-the-shelf parsing technology for this intermediate format, we convert Datrix/TA into XML (W3C, 1998) syntax (Datrix/TA-XML) and use IBM’s xml4j XML parser (IBM, 1999) to read and traverse the content of the Datrix/TA-XML files. xml4j complies with the Document Object Model (DOM; W3C, 1999) as the industry standard for XML-based file content traversal, and it supports the Simple API for XML (SAX, 2000). Since the generated Datrix/TA-XML files are typically quite large, we opted for using SAX, which views an XML document as a stream of elements that can be discarded quickly. An import utility as part of the SPOOL repository infrastructure assembles the nodes and arcs of the Datrix/TA-XML intermediate source code representations and constructs the objects of an initial physical model in the SPOOL repository. Note that we are currently working on substituting the Datrix/TA-XML representation by an XMI-based solution (XMI, 2000; Saint-Denis et al., 2000). At the current state of development, we capture and manage in the repository the source code information as listed in Table Y.1.

Table Y.1. Source code information managed in the SPOOL repository.

1. Files (name, directory)
2. Classifier – classes, structures, unions, anonymous unions, primitive types (char, int, float, etc.), enumerations [name, file, visibility]. Class declarations are resolved to point to their definitions.
3. Generalization relationships [superclass, subclass, visibility].
4. Attributes [name, type, owner, visibility]. Global and static variables are stored in utility classes (as suggested by the UML), one associated to each file. Variable declarations are resolved to point to their definitions.
5. Operations and methods [name, visibility, polymorphic, kind]. Methods are the implementations of operations. Free functions and operators are stored in utility classes (as suggested by the UML), one associated to each file. Kind stands for constructor, destructor, standard, or operator.
   5.1. Parameters [name, type]. The type is a classifier.
   5.2. Return types [name, type]. The type is a classifier.
   5.3. Call actions [operation, sender, receiver]. The receiver points to the class to which a request (operation) is sent. The sender is the classifier that owns the method of the call action.
   5.4. Create actions. These represent object instantiations.
   5.5. Variable use within a method. This set contains all member attributes, parameters, and local attributes used by the method.
6. Friendship relationships between classes and operations.
7. Class and function template instantiations. These are stored as normal classes and as operations and methods, respectively.
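The stream-oriented import described in this section can be illustrated with a minimal sketch. It uses Python's standard `xml.sax` module rather than xml4j, and the element and attribute names (`unit`, `node`, `arc`, `id`, `kind`, `name`, `from`, `to`) are assumptions made for illustration only; the actual Datrix/TA-XML vocabulary is not given in this chapter.

```python
import xml.sax


class ModelBuilder(xml.sax.ContentHandler):
    """Collects nodes and arcs from a stream of SAX events.

    The element and attribute names used here are hypothetical; they do
    not reproduce the real Datrix/TA-XML vocabulary.
    """

    def __init__(self):
        super().__init__()
        self.nodes = {}   # node id -> {"kind": ..., "name": ...}
        self.arcs = []    # (kind, source id, target id)

    def startElement(self, name, attrs):
        # Each element is handled as it streams past; no document tree is kept.
        if name == "node":
            self.nodes[attrs["id"]] = {"kind": attrs["kind"], "name": attrs["name"]}
        elif name == "arc":
            self.arcs.append((attrs["kind"], attrs["from"], attrs["to"]))


def import_model(xml_text):
    """Parse the XML of one compile unit and return the collected nodes and arcs."""
    builder = ModelBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.nodes, builder.arcs


SAMPLE = """<unit file="shape.C">
  <node id="c1" kind="class" name="Shape"/>
  <node id="c2" kind="class" name="Circle"/>
  <arc kind="generalization" from="c2" to="c1"/>
</unit>"""

nodes, arcs = import_model(SAMPLE)
```

Because the handler only appends to two flat collections, memory use grows with the size of the extracted model rather than with the size of the XML file, which is the motivation given above for preferring SAX over DOM.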


Y.2.2 Pattern-Based Design Recovery

The purpose of pattern-based design recovery is to help structure parts of class diagrams to resemble pattern diagrams (see Figure Y.2, window 4). We envision three techniques to support this task: automatic design recovery, manual design recovery, and semi-automatic design recovery. Automatic design recovery relates to the fully automated structuring of software designs according to pattern descriptions, which are stored in the repository as abstract design components. We have implemented query mechanisms that can recognize the structural descriptions in the source code models, extract these from the source code, and visualize them within the class hierarchies. This technique will be further detailed in Section Y.3. Manual design recovery relates to the structuring of software designs by manually grouping design elements, such as classes, methods, attributes, or relationships, to reflect a pattern. Our environment allows the developer to select model elements and associate them with the roles of the respective pattern elements. Manual design recovery gives human analyzers the possibility to look at a model from their own perspectives and cluster design elements into design components. It provides the flexibility that is necessary to group and communicate ad-hoc solutions as proto-patterns (Appleton, 1998), which may at some time even become patterns. Semi-automatic design recovery combines both strategies, automatic and manual recovery. It may be implemented as a multi-phase recovery process. The first phase consists of the automatic detection of low-level idioms or the general core of pattern descriptions. Subsequent phases match the identified instances with more specific implementation details, which may be provided interactively by the analyzer who is in control of the recovery process. He or she may interrupt recovery runs to confirm or decline the existence of a pattern occurrence, and to manually supply specifics that are not covered by the default recovery queries. At the current stage of development, we have implemented the techniques for automated and manual design recovery.
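Since semi-automatic recovery is only envisioned here, the following is a purely hypothetical sketch of how such a multi-phase process might be organized. The function name, the dictionary-based candidates, and the `refine` and `confirm` callbacks are all assumptions and do not describe SPOOL code.

```python
def semi_automatic_recovery(candidates, refine, confirm):
    """Refine automatically detected candidates, then ask the analyzer.

    candidates: pattern candidates produced by an automatic first phase
    refine:     function that adds implementation-specific detail to a candidate
    confirm:    analyzer callback returning True (confirm) or False (decline)
    """
    accepted = []
    for candidate in candidates:
        detailed = refine(candidate)
        if confirm(detailed):
            accepted.append(detailed)
    return accepted


def add_override_info(candidate):
    # Hypothetical second-phase detail: is the primitive operation overridden?
    return {**candidate, "overridden": candidate["cls"] == "View"}


def analyzer_decision(candidate):
    # Stands in for an interactive decision by the human analyzer.
    return candidate["overridden"]


phase_one = [
    {"pattern": "Template Method", "cls": "View"},
    {"pattern": "Template Method", "cls": "Logger"},
]
recovered = semi_automatic_recovery(phase_one, add_override_info, analyzer_decision)
```

In an interactive tool, `confirm` would typically open a dialog rather than apply a fixed rule; the callback form merely keeps the analyzer in control of each decision.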

Y.2.3 Design Representation

The purpose of design representation is to provide for the interactive visualization and analysis of source code models, abstract design components, and implemented components. It is our contention that only the interplay among human cognition, automatic information matching and filtering, visual representations, and flexible visual transformations can lead to the all-important why behind the key design decisions in large-scale software systems. To date, we have implemented and integrated tools (see Figure Y.1, Visualization and Analysis) for

• Source code visualization (visualizing classes, attributes, methods, operation calls, instantiations, friendship dependencies, type dependencies based on the types of parameters, operations and methods, and attributes),

• Interactive and incremental dependency analysis (the user may select a number of classes, files, or directories, and the system loads and visualizes the dependencies among these elements; see Section Z.4.3),

• Design investigation by searching and browsing, based on both structure and full-text retrieval, using the SPOOL Design Browser (Robitaille et al., 2000) (cf. Section Y.3.2),

• Design querying to identify classes that collaborate to solve a given problem,

• Design inspection and visualization within the context of the reverse engineered source code models,

• Design component editing, allowing for the interactive description of design components, and

• Metrics analysis to conduct quantitative analyses on desirable and undesirable source and design properties.

Figure Y.2 illustrates the graphic user interface of the SPOOL environment. Windows 1 and 2 show the inheritance hierarchy of ET++ (Weinand et al., 1989) (tree layout generated with Dot (Kontsofios and North, 2000) and spring layout generated with Neato (North, 2000)). Via the property sheet associated with such diagrams (window 3), all the other association relationships stored in the repository, such as instantiation or aggregation relationships, can be illustrated as well, in both separate and combined forms. Different colors distinguish the different kinds of association relationships. On the left hand side of each window, a tree view can be optionally displayed (windows 1, 4, 5, and 6) to convey in textual form the source code models, abstract design components, or implemented design components. Through a diagram’s pop-up menu, design queries on the information contents of the current diagram can be launched, with subsequent visualization of the query results (window 4). In our environment, each of the supported abstract design components (the pattern-like structures to be discovered) comprises a so-called reference class. This is the class in the component’s structure diagram that is considered most characteristic of the component’s nature. Upon design recovery, we incrementally draw bounding boxes around the reference classes of the implementations of an abstract design component (window 4). In this way, a class that is the reference class for several of these implemented design components (“multiple reference class”) will exhibit a taller bounding box than a class that is just part of a single component. Keeping the size of these bounding boxes constant during zooming leads to the effect that once their diagrams are sufficiently zoomed out (window 4), multiple reference classes will protrude from the diagram. The implemented design components can then be extracted into a separate diagram and related to the classes, methods, and attributes of their respective abstract design components (window 5), which in this study represent the descriptions of design patterns. The more informal constituents, such as intent, motivation, or applicability, can be viewed in the same or in separate diagrams (window 6). These informal descriptions are crucial for understanding the design, as they capture the rationale that may be at the root of the automatically identified design component.
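The bounding-box effect described above can be approximated by a small sketch. It assumes, for illustration only, that each implemented design component adds a fixed increment to the box of its reference class; the constant `BOX_UNIT` and both function names are hypothetical and not taken from SPOOL.

```python
from collections import Counter

BOX_UNIT = 10  # hypothetical height added per implemented design component


def bounding_box_heights(implemented_components):
    """Map each reference class to the height of its stacked bounding boxes.

    implemented_components: iterable of (component name, reference class) pairs.
    """
    counts = Counter(ref for _, ref in implemented_components)
    return {ref: n * BOX_UNIT for ref, n in counts.items()}


def on_screen_height(model_height, zoom, constant_size):
    """Constant-size boxes ignore the zoom factor; ordinary shapes scale with it."""
    return model_height if constant_size else model_height * zoom


components = [("TM1", "Window"), ("TM2", "Window"), ("TM3", "Button")]
heights = bounding_box_heights(components)
```

With these assumptions, a class that serves as the reference class of two components gets a box twice as tall as that of a single-component class, and at a zoom factor of 0.5 the boxes keep their height while the surrounding diagram shrinks.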


Design representation also encompasses interactive description of design components. The SPOOL environment provides a class diagram editor based on the UML 1.1 notation (UML, 1997) for structural descriptions and an HTML editor for specifying the informal constituents of design components. Using these editors, the environment allows for the modeling, documenting, and storing of new abstract design components in the design repository. The environment also supports the refinement and generalization of existing abstract components. This is essential as design components can be rendered in different forms. For example, a design component representing an Adapter pattern can be refined into a Class Adapter or an Object Adapter, and similarly, a Composite may be specialized into a Transparent Composite or a Safe Composite component (Keller and Schauer, 1998).

Figure Y.2. Graphic user interface of the SPOOL environment.

The user interface of the SPOOL environment is implemented based on Java 1.2, the JFC/Swing components (Sun, 2000), and the graphic editor application framework jKit/GO (Instantiations, 2000). For visualizing the HTML code, the ICEBrowser (ICESoft, 2000) JavaBeans component is being used. To generate initial layouts of the system under investigation, we developed an interface to external layout generators. We integrated Dot (Kontsofios and North, 2000) for tree layouts and Neato (North, 2000) for spring layouts.

Y.3 Applying Pattern-Based Reverse Engineering

The purpose of this section is to point out the importance of pattern-based reverse engineering of design components for the comprehension of large-scale software. We chose a case study approach to illustrate and discuss some of our findings when analyzing industrial systems. We have selected the following abstract design components, which we based on the corresponding descriptions in the pattern catalogue of Gamma et al. (1995): Template Method, Factory Method, and Bridge.

To assess the feasibility of pattern-based reverse engineering and the usefulness of the SPOOL environment, we analyzed the source code of three industrial C++ systems. Bell Canada provided us with two large-scale systems from the domain of telecommunications. For confidentiality reasons, we call these systems System-A and System-B. Our third test system is the well-known application framework ET++ 3.0 (Weinand et al., 1989), as included in the SNiFF+ development environment (TakeFive, 2000). Table Y.2 shows some size metrics for these systems. Note that header files from the compiler are included in these numbers.

In this section, we first show how we reverse-engineered the selected components in System-A, System-B, and ET++, respectively. Then, we discuss the three case studies.


Table Y.2. Size metrics of industrial systems.

                                      System-A   System-B   ET++
Lines of code                         472,824    291,619    70,796
Lines of pure comments                60,256     71,209     3,494
Blank lines                           80,463     90,426     12,892
# of files (.C/.h)                    1,900      1,153      485
# of classes (.C/.h)                  3,103      1,420      722
# of generalizations                  1,422      941        466
# of methods                          17,634     8,594      6,255
# of attributes                       1,928      13,624     4,460
Size of the system in the repository  63.1 MB    41.0 MB    19.3 MB

Y.3.1 Case Study #1: Template Method

“Template Methods define the skeleton of an algorithm in an operation, deferring some steps to subclasses.” (Gamma et al., 1995) Template methods are often referred to as the characterizing building blocks of white box frameworks, which let clients extend the framework by overriding pre-defined hook methods that are called by the framework (Fayad and Schmidt, 1997). The rationale behind a Template Method is to make the steps of an algorithm easily exchangeable. The trade-off is that if not used with care, Template Methods can contribute to overly complex software, especially when the hook methods themselves are Template Methods deferring functionality to other hook methods. In large, framework-based application software, such as System-A, knowledge about the existence and location of Template Methods is crucial for the judicious evolution of the applications.

Figure Y.3. Structure of Template Method (Gamma et al., 1995).
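As a reminder of the structure in question, here is a minimal sketch of the pattern itself, written in Python rather than C++ for brevity. The classes `Report` and `SalesReport` and their methods are invented for illustration; only the role names in the comments come from Gamma et al. (1995).

```python
from abc import ABC, abstractmethod


class Report(ABC):
    """AbstractClass role: fixes the order of the steps in render()."""

    def render(self):
        # TemplateMethod role: the skeleton of the algorithm.
        return "\n".join([self.header(), self.body(), self.footer()])

    def header(self):
        # Hook method with a default that subclasses may override.
        return "REPORT"

    @abstractmethod
    def body(self):
        """PrimitiveOperation role: deferred to subclasses."""

    def footer(self):
        return "END"


class SalesReport(Report):
    """ConcreteClass role: supplies only the deferred step."""

    def body(self):
        return "sales: 42"


text = SalesReport().render()
```

The subclass never restates the order of the steps; it only fills in `body`, which is what makes the individual steps exchangeable.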


We reified the Template Method pattern (Figure Y.3 shows its structure) as an abstract design component, stored it in our repository, and associated it with a query that searches the source code models for the component’s structure. The default implementation of the Template Method query traverses all classes (AbstractClass), goes into each method (TemplateMethod), looks up the operation call tree for local operation calls (PrimitiveOperation), and verifies whether PrimitiveOperation is polymorphic. If all conditions are met, all relevant information is passed to a Design Component Builder object, which creates an Implemented Design Component containing references to the identified elements in the source code model. Note that through query options, the human analyzer can specify deviations from the default behavior of the query, for instance, to recover only those TemplateMethods in which PrimitiveOperation in AbstractClass is pure virtual (in the case of a C++ system), or to check whether PrimitiveOperation is overridden by at least one class (ConcreteClass) in the subclass hierarchy of AbstractClass.
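The query logic described above can be sketched over a toy in-memory model. This is a simplified illustration, not the SPOOL query: the `Method` and `Class` records, the option names, and the restriction to operations declared in the same class (inherited operations are ignored) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    polymorphic: bool = False    # virtual, in C++ terms
    pure_virtual: bool = False
    local_calls: list = field(default_factory=list)  # operations called on the same object


@dataclass
class Class:
    name: str
    methods: dict = field(default_factory=dict)      # method name -> Method
    subclasses: list = field(default_factory=list)


def overridden_below(cls, op_name):
    """True if any class in the subclass hierarchy of cls redefines op_name."""
    return any(op_name in sub.methods or overridden_below(sub, op_name)
               for sub in cls.subclasses)


def find_template_methods(classes, require_pure_virtual=False, require_override=False):
    """Return (AbstractClass, TemplateMethod, PrimitiveOperation) name triples."""
    found = []
    for cls in classes:
        for method in cls.methods.values():
            for op_name in method.local_calls:
                primitive = cls.methods.get(op_name)
                if primitive is None or not primitive.polymorphic:
                    continue
                if require_pure_virtual and not primitive.pure_virtual:
                    continue
                if require_override and not overridden_below(cls, op_name):
                    continue
                found.append((cls.name, method.name, op_name))
    return found


document = Class("Document", {
    "open": Method("open", local_calls=["do_read", "log"]),
    "do_read": Method("do_read", polymorphic=True, pure_virtual=True),
    "log": Method("log"),
})
text_document = Class("TextDocument", {"do_read": Method("do_read", polymorphic=True)})
document.subclasses.append(text_document)

results = find_template_methods([document, text_document], require_override=True)
```

In this toy model, `open` is reported as a TemplateMethod because it calls the polymorphic `do_read` locally, while the call to the non-polymorphic `log` is ignored. The two keyword arguments mirror the query options mentioned in the text.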

Figure Y.4 illustrates the recovered Template Methods in one class tree of System-A (note that the reference class of Template Method is AbstractClass). This diagram clearly shows the key players within this part of the application, and conveys an impression of how many such mini-algorithms, which may be refined in subclasses, exist in the class tree. For instance, the main class, clearly visible on top of the diagram, contains 43 Template Methods. More detailed information can be recovered by zooming into the diagram, showing operations and attributes, or by spawning another diagram that shows the implementation of one particular Template Method only.

Figure Y.4. Template Methods in System-A.


It is our experience that knowledge of both the rationale and the existence of Template Methods is essential to develop an understanding of how to hook into the mechanisms that are enforced by a framework-like architecture. Such knowledge may be of great help in flattening the learning curve of a framework.

Y.3.2 Case Study #2: Factory Method

“Factory Methods define an interface for creating an object, but let subclasses decide which class to instantiate.” (Gamma et al., 1995). Factory Methods are specialized Template Methods in that the PrimitiveOperation in the ConcreteClass instantiates a concrete product (see Figure Y.5). Factory Methods are often used when different objects have the same construction process. The construction algorithm is coded in the Creator class, and the steps that instantiate the objects are deferred to the subclasses.

The query for the Factory Method is, obviously, similar to that of the Template Method, except for the condition that the FactoryMethod in ConcreteCreator is required to instantiate a ConcreteProduct. By default, the query does not enforce that ConcreteProduct be a subclass of another class (Product), but this additional constraint can be specified through query options.
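A corresponding sketch for the Factory Method query follows, again over a toy model with hypothetical record types and option names. It adds the instantiation condition to the Template Method structure and makes the Product superclass check optional, as described above; it is an illustration under these assumptions, not the SPOOL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    polymorphic: bool = False
    local_calls: list = field(default_factory=list)  # operations called on the same object
    creates: list = field(default_factory=list)      # names of classes instantiated


@dataclass
class Class:
    name: str
    superclass: str = ""                              # empty string: no superclass
    methods: dict = field(default_factory=dict)
    subclasses: list = field(default_factory=list)


def descendants(cls):
    """Yield every class in the subclass hierarchy of cls."""
    for sub in cls.subclasses:
        yield sub
        yield from descendants(sub)


def find_factory_methods(classes, require_product_superclass=False):
    """Return (Creator, FactoryMethod, ConcreteCreator, ConcreteProduct) name tuples."""
    by_name = {c.name: c for c in classes}
    found = []
    for creator in classes:
        called = {op for m in creator.methods.values() for op in m.local_calls}
        for op_name in sorted(called):
            operation = creator.methods.get(op_name)
            if operation is None or not operation.polymorphic:
                continue
            for concrete in descendants(creator):
                override = concrete.methods.get(op_name)
                if override is None:
                    continue
                for product in override.creates:
                    known = by_name.get(product)
                    if require_product_superclass and not (known and known.superclass):
                        continue
                    found.append((creator.name, op_name, concrete.name, product))
    return found


form = Class("Form", methods={
    "build": Method("build", local_calls=["make_field"]),
    "make_field": Method("make_field", polymorphic=True),
})
login = Class("LoginForm", superclass="Form", methods={
    "make_field": Method("make_field", polymorphic=True, creates=["PasswordField"]),
})
form.subclasses.append(login)
model = [form, login, Class("Field"), Class("PasswordField", superclass="Field")]

matches = find_factory_methods(model)
```

Passing `require_product_superclass=True` keeps only matches whose product class is itself part of a generalization, which corresponds to the optional Product constraint.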

Figure Y.6 illustrates the results of the Factory Method query as applied to System-B. The upper window shows the inheritance tree of all classes of System-B, which we laid out with Neato. Due to the high zooming ratio (the small points constitute large inheritance trees), the recovered design components protrude from the diagram. This is crucial information that can help find a basis for understanding a complex piece of software. We zoomed into the tallest bounding box and extracted the detailed information into a separate diagram (lower window of Figure Y.6). It illustrates a central Creator class, which defines 13 abstract FactoryMethod operations, and a total of 57 subclasses, which implement these operations.

Figure Y.5. Structure of Factory Method (Gamma et al., 1995).


Figure Y.6. Factory methods in System-B: overview diagram (upper window); extracted Factory Methods (lower window).

This automatically generated diagram provides essential information about the rationale behind the design of the system. The developers designed this part of System-B for easy extension with new classes. This was necessary because this part of the system deals with user interface forms and input tables, which by nature change very fast. The diagram also tells us that the designers decided to instantiate objects in the same classes that provide the functionality for their manipulation. In the example, a better solution would have been the use of an Abstract Factory, which “provides an interface for creating families of related or dependent objects” (Gamma et al., 1995). This would have provided more flexibility, as the manipulation functionality could have evolved independently of the objects created by the factory. Thus, a different family of objects, which may reflect changed user requirements or a different user interface platform, could have been plugged into the class hierarchy without the need to subclass existing classes. This would have reduced the number of classes from 57 to about 30, improving understandability and maintainability. This example illustrates pattern-based reverse engineering of design components as a technique that can help a human analyzer not only to comprehend a complex piece of software, but also to make substantial design improvements.
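The Abstract Factory alternative suggested above can be sketched in a few lines. The following Python fragment is an illustration only, not code from System-B: every name in it (WidgetFactory, PlainFactory, FancyFactory, Form, make_field, make_table) is hypothetical. It shows the manipulation logic (Form) receiving a factory object, so that a different family of products can be plugged in without subclassing Form.

```python
from abc import ABC, abstractmethod


class WidgetFactory(ABC):
    """Abstract Factory (hypothetical): creates a family of related products."""

    @abstractmethod
    def make_field(self, label):
        ...

    @abstractmethod
    def make_table(self, rows):
        ...


class PlainFactory(WidgetFactory):
    """One product family; products are plain strings to keep the sketch short."""

    def make_field(self, label):
        return f"[{label}]"

    def make_table(self, rows):
        return f"table({rows})"


class FancyFactory(WidgetFactory):
    """A second product family, added without touching Form."""

    def make_field(self, label):
        return f"<<{label}>>"

    def make_table(self, rows):
        return f"grid({rows})"


class Form:
    """Manipulation logic: holds a factory instead of overriding factory methods."""

    def __init__(self, factory):
        self._factory = factory

    def build(self):
        # The form never names a concrete product class.
        return [self._factory.make_field("name"), self._factory.make_table(3)]


print(Form(PlainFactory()).build())
print(Form(FancyFactory()).build())
```

With Factory Methods defined in the manipulating class itself, each new product family would instead require a further subclass of Form, which is the class growth the paragraph above describes.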

As a further example, window 1 of Figure Y.7 illustrates the occurrences of the Factory Method pattern in ET++. The user may want to inspect the recovered pattern instances by starting the design inspection tool (window 2). This diagram shows in its upper part the list of recovered Factory Method patterns, identified by the Creator class and the FactoryMethod. The middle part shows the selected Factory Method as a collaboration diagram. The example shows the class ET_Object with the method GetObserverIter, which calls the FactoryMethod MakeIterator (top row). This method is overridden in five subclasses of ET_Object (middle row). Each of these implementations of MakeIterator instantiates different Products (bottom row). The lower part of the window shows the recovered classes in the context of the overall class hierarchy. This example presents the case where the design patterns Factory Method and Iterator (Gamma et al., 1995) are combined to provide a flexible traversal mechanism for ET++ containers, such as lists, sets, dictionaries, collections, and arrays. Note that the SPOOL Design Browser with its structure and full-text retrieval functionality might have been used to hint at instances of the Iterator pattern (Robitaille et al., 2000).
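The collaboration just described can be mirrored loosely in a small Python sketch. Only the names ET_Object, GetObserverIter, and MakeIterator come from the text; the two container subclasses and the behavior of their iterators are assumptions made for illustration, and the sketch does not reproduce the actual ET++ code (which is C++).

```python
class ET_Object:
    """Creator: GetObserverIter relies on the factory method MakeIterator."""

    def __init__(self, items=()):
        self.items = list(items)

    def MakeIterator(self):
        # Default Product: an empty traversal. Subclasses override this.
        return iter(())

    def GetObserverIter(self):
        # The caller never names a concrete iterator class.
        return self.MakeIterator()


class OrderedList(ET_Object):  # hypothetical container subclass
    def MakeIterator(self):
        # Product: traversal in insertion order.
        return iter(self.items)


class SortedSet(ET_Object):  # hypothetical container subclass
    def MakeIterator(self):
        # Product: traversal over distinct elements in sorted order.
        return iter(sorted(set(self.items)))


for container in (OrderedList([3, 1, 3]), SortedSet([3, 1, 3])):
    print(type(container).__name__, list(container.GetObserverIter()))
```

The point of the combination is that code written against GetObserverIter works unchanged for every container, while each subclass decides which Product to instantiate.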

The content of the design inspection diagram was automatically generated by the query for the Factory Method pattern. The diagram provides valuable information for program comprehension, as it presents in a concise way all the classes that take on some role in a pattern-based collaboration. Note that in the physical file structure, these classes may be spread out over many directories and subsystems. Yet, the diagram falls short of conveying all the information a user might wish to obtain about the design fragment at hand. He or she might want to know, for instance, the classes and methods that invoke MakeIterator, or get information about the semantics of the method GetObserverIter, whose name alludes to its purpose of creating an Iterator of the Observers of a view element. A visual design inspection tool can never answer all of these questions.


The SPOOL Design Browser together with the SPOOL mechanism for integrating external tools provides the flexibility to obtain detailed knowledge as well as context information about the constituents of a recovered, pattern-based design. For example, the browser of window 3 shows all the methods from which MakeIterator is invoked, including the GetObserverIter method already identified in window 2. By invoking the SNiFF+ environment (TakeFive, 2000), the user can then investigate and edit the retrieved elements directly in the SNiFF+ source code editor (window 4). This provides invaluable context information about how, in our example, a Factory Method is used.

Figure Y.7. Inspection of Factory Methods in ET++, involving overview diagram, design inspection tool, the SPOOL Design Browser, and the SNiFF+ source code environment.

Y.3.3 Case Study #3: Bridge

The intent of the Bridge pattern is to “decouple an abstraction from its implementation so that the two can vary independently” (Gamma et al., 1995). The Bridge is a design technique that can avoid a combinatorial explosion of class hierarchies if a domain concept in different variations can be implemented in multiple ways. If realized using inheritance, each variation would have a subclass for each of the possible implementations. To avoid this, the Bridge suggests separate class hierarchies for the abstraction and the implementation (Figure Y.8).

Figure Y.8. Structure of Bridge (Gamma et al., 1995).

We include the Bridge as one of those patterns that demand human insight to be recovered from source code. The Bridge is a semantic concept that can take many physical forms in the source code. For instance, we have identified Bridges with Abstractions that are not subclassed, ConcreteImplementors that do not have a common superclass, or OperationImps that constitute Template Methods (see Case Study #1) in which not OperationImp, but its hook method is overridden. Our Bridge query captures these cases, and as an additional heuristic verifies that Abstraction and Implementor are not in the same path of the inheritance tree, which otherwise would be counter to the very intent of the Bridge. The final result was 46 Bridge-based design components in ET++, which not surprisingly included many false positives. It is our contention that the systematic discovery of the Bridge pattern within source code needs human insight into the problem domain of the software. However, as Figure Y.9 illustrates, a machine can generate appropriate diagrams that are of great value for the human analyzer to identify instances of the Bridge.

In the upper window of Figure Y.9, we illustrate all recovered Bridges in ET++. Abstraction serves as the reference class, which is decorated with a bounding box for each Operation that delegates functionality to a subclass of the abstract Implementor that is the target of the maximum number of delegations. More specifically, our default Bridge query looks for classes with an instance variable (imp) of a type Implementor. It then goes into the operation call tree of each method (Operation) in Abstraction, and checks whether the receiver of an operation call (OperationImp) is of type Implementor and whether the called operation is overridden by at least one subclass of Implementor (ConcreteImplementor). By default, we also allow OperationImp to be a Template Method, meaning that not OperationImp itself is overridden, but one of its polymorphic hook methods (see Case Study #1). We discovered many Bridge Implementors in industrial code that were based on Template Methods.
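The steps of the default Bridge query can be approximated as follows. This is a minimal sketch over a toy in-memory model, not the SPOOL repository schema or query language: the Class and Method records, all function names, and the member names port, Show, Hide, DevShow, and DevHide are assumptions. As a simplification, it inspects only the direct calls of each Operation rather than the full operation call tree, and it reports every candidate Implementor rather than only the one receiving the most delegations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Method:
    name: str
    # Calls made by this method, as (receiver type, operation name) pairs.
    calls: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Class:
    name: str
    parent: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)  # variable -> type
    methods: Dict[str, Method] = field(default_factory=dict)


def ancestors(model, name):
    """Return the names of all (transitive) superclasses of `name`."""
    seen = set()
    parent = model[name].parent
    while parent in model and parent not in seen:
        seen.add(parent)
        parent = model[parent].parent
    return seen


def is_overridden(model, owner, op):
    """True if at least one subclass of `owner` defines `op`."""
    return any(owner in ancestors(model, c.name) and op in c.methods
               for c in model.values())


def is_polymorphic(model, owner, op):
    """True if `op` is overridden, or is a Template Method with an overridden hook."""
    if is_overridden(model, owner, op):
        return True
    method = model[owner].methods.get(op)
    if method is None:
        return False
    return any(recv == owner and is_overridden(model, owner, hook)
               for recv, hook in method.calls)


def find_bridges(model):
    """Map (Abstraction, Implementor) to the sorted names of delegating Operations."""
    result = {}
    for cls in model.values():
        for impl in set(cls.fields.values()):
            if impl not in model or impl == cls.name:
                continue
            # Heuristic: Abstraction and Implementor must not lie on one inheritance path.
            if impl in ancestors(model, cls.name) or cls.name in ancestors(model, impl):
                continue
            ops = sorted(
                m.name for m in cls.methods.values()
                if any(recv == impl and is_polymorphic(model, impl, op)
                       for recv, op in m.calls)
            )
            if ops:
                result[(cls.name, impl)] = ops
    return result


# Class names are taken from the ET++ example; members are hypothetical.
model = {c.name: c for c in [
    Class("ET_WindowPort",
          methods={"DevShow": Method("DevShow"), "DevHide": Method("DevHide")}),
    Class("ET_XWindowPort", parent="ET_WindowPort",
          methods={"DevShow": Method("DevShow")}),
    Class("ET_SunWindowPort", parent="ET_WindowPort",
          methods={"DevShow": Method("DevShow")}),
    Class("ET_Window", fields={"port": "ET_WindowPort"}, methods={
        "Show": Method("Show", [("ET_WindowPort", "DevShow")]),
        "Hide": Method("Hide", [("ET_WindowPort", "DevHide")]),
    }),
]}
bridges = find_bridges(model)
print(bridges)
```

In this toy model only Show counts as a delegating Operation, because DevHide is neither overridden nor a Template Method with an overridden hook. The length of each list corresponds to the number of bounding boxes drawn around the reference class.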

Figure Y.9. Bridges in ET++: overview diagram (upper window); ET_TextView class (lower left window); ET_Window class (lower right window).

Our query reported 46 Bridge-based design components in ET++, yet most of the visualized Bridges had only up to three bounding boxes (i.e., operation calls to Implementor), meaning that most probably these automatically recovered implementations of Bridge reflect only its structure, but not its intent. Clearly visible in Figure Y.9 are a few reference classes with tall bounding boxes (right side of upper window). The lower windows of Figure Y.9 illustrate the three reference classes that exhibit the most bounding boxes. The lower left window shows ET_TextView with its superclass ET_StaticTextView, both delegating multiple methods to ET_Text (not displayed in Figure Y.9). The documentation of ET++ (Weinand et al., 1989) describes ET_TextView and ET_Text as the view and model of the MVC architectural design pattern, which is in this example applied to text handling. In other words, subclasses of ET_TextView provide different rendering strategies for instances of ET_Text, thus serving as the Abstractions for ET_Text Implementors, which is the very intent of the Bridge design pattern. The lower right window of Figure Y.9 shows the ET_Window class with 11 bounding boxes. Gamma et al. (1995) describe this case as one of the known uses of Bridge. In ET++, the ET_WindowPort class serves as the abstract Implementor for different kinds of windows, and ET_XWindowPort and ET_SunWindowPort as the ConcreteImplementors.

Y.3.4 Discussion of Case Studies

The purpose of our work is to provide a technique that can supplement current reverse-engineering tools with support for recovering the all-important rationale behind the design decisions. We based this technique on design patterns and presented three case studies, each illustrating a different pattern on a different industrial system. Related studies on pattern detection (Antoniol et al., 1998; Kraemer and Prechelt, 1996) provided tables indicating numbers for the detected patterns and the true pattern implementations in the investigated systems. We argue that these numbers are misleading, as they neither express the quality of the analyzed software or the detection tool, nor convey the rationale behind the pattern-based design (see Section Y.4 for further discussion). We believe in the strength of visualization and the integration of the human into the recovery process. Therefore, we selected a case study approach to convey the practicability of pattern-based reverse engineering. However, for comparison purposes, we summarize the results of our default recovery queries in Table Y.3.

Table Y.3. Implemented pattern-based design components.

                   System-A   System-B   ET++
Template Method    3,243      1,857      1,022
Factory Method     247        168        44
Bridge             108        95         46

As the structures of Template Method (Figure Y.3) and Factory Method (Figure Y.5) unambiguously reflect the intent of the respective pattern, and in light of our rich software repository, which includes information on both operation calls and polymorphic methods, we can rely on the recovered design components for both patterns being correct. The Bridge pattern, on the other hand, requires human judgment. It is one of those patterns that can be implemented in many different ways. We captured some of these implementations, and, as case study 3 illustrates, used the technique of growing bounding boxes to visually identify those Abstractions that delegate many operations to an Implementor. In System-A, for example, the reference classes of 13 out of 108 discovered Bridge design components exhibited more than 5 bounding boxes; 6 of these were surrounded by more than 50 bounding boxes, which was clearly visible in the diagram. Of these 6 design components, 4 were real Bridge pattern implementations; the other 2 delegated many operations to another class, which provided much functionality, but did not have the semantics of an Implementor for the Abstraction of the considered component.
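The visual triage by bounding-box height amounts to ranking candidates by their number of delegating Operations and keeping those above a cut-off. A minimal sketch follows, assuming the per-class counts are already available as a dictionary; the function name, the class names, and the counts are made up for illustration, and only the cut-off of "more than 5" is taken from the text.

```python
def tallest_candidates(box_counts, more_than=5):
    """Return (reference class, boxes) pairs exceeding the cut-off, tallest first."""
    hits = [(name, n) for name, n in box_counts.items() if n > more_than]
    # Sort by descending box count; break ties alphabetically for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical counts of delegating Operations per reference class.
box_counts = {
    "Abstraction_A": 2,
    "Abstraction_B": 57,
    "Abstraction_C": 3,
    "Abstraction_D": 11,
}
print(tallest_candidates(box_counts))
```

Such a ranking only narrows the search. As the System-A figures show, a tall bounding box still needs to be checked by a human analyzer before the component is accepted as a Bridge.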

Y.4 Related Work

Below, we briefly review a number of studies dealing with the detection and the identification of design patterns. We also discuss related work addressing fine-grained design recovery. Finally, we reflect on the added value of our approach in the realm of documentation with patterns.

Several studies reported in the literature aim at detecting design patterns in object-oriented software based on structural descriptions. Kraemer and Prechelt (1996) developed a Prolog-based front-end to the Paradigm Plus CASE tool. They observed a precision ranging from 14 to 50 percent. Similar results are reported by Antoniol et al. (1998). However, as the number of patterns found in the analyzed software was close to zero, the precision factor has little significance. Moreover, both studies report that only the header files of C++ programs were analyzed, meaning that their experiments were conducted in the absence of information on function calls and object instantiations. Furthermore, Kraemer and Prechelt (1996) do not report whether they considered polymorphism in their tool, and Antoniol et al. (1998) mention that they do not handle polymorphism, information that we consider indispensable for the identification of pattern-like structures in source code models. Note that we consider the information currently managed by our repository (Table Y.1) as the minimum for serious recovery of pattern-based design components. Finally, we believe that only through the direct involvement of the human analyzer in the recovery process can interesting pattern-based design components be found.

Other studies that have influenced our work report on identifying patterns in existing software. Brown (1997) reviews the Gamma et al. patterns and provides an overview of how to identify each pattern in Smalltalk software. He discusses the difficulties of recovering patterns in existing software, but also stresses the feasibility of detecting useful patterns in source code. Martin (1995) summarizes his experience when manually looking for patterns in existing software. Despite the fact that the application that his team investigated had been designed without any formal knowledge of patterns, they discovered that “in one or other form every pattern of Gamma et al. was used.” Both studies convey the message that it is the human analyzer who needs to be in control of the detection process.

The recovery of design components has been the subject of active research under varying terminology. Rich and Waters (1988) coin the term cliché for “commonly used combinations of elements with familiar name.” Similarly, Baniassad and Murphy (1998) define conceptual modules as “a set of lines of source code that are to be treated as a logical unit.” The difference between these techniques and pattern-based recovery of design components is in the level of abstraction. Whereas clichés and conceptual modules represent only small algorithms or data structures, patterns illustrate the complex relationships among the large pieces of software and, equally important, embody informal explications of the rationale behind the suggested designs. It is our contention that reverse engineering of large-scale software needs to place more emphasis on discovering these well-known patterns of thought. Revisiting the statement of Beck and Johnson in our introduction, it is the rationale behind the design decisions (the why) that needs to be recovered to gain insight into more complex pieces of software. Clichés, conceptual modules, and the like cannot convey the why, but certainly are much needed building blocks for achieving more elaborate recovery of pattern-based design components.

Many authors have discussed the advantages of documenting software, and in particular frameworks, with patterns (Butler et al., 2000; Johnson, 1992; Odenthal and Quibeldey-Cirkel, 1997). Johnson sums up their case: “Patterns can describe the purpose of a framework, can let application programmers use a framework without having to understand in detail how it works, and can teach many of the design details embodied in the framework” (Johnson, 1992). We claim that only the visualization of the implemented patterns in the context of the application under investigation will make documentation with patterns truly effective, elucidate the rationale behind the framework’s design, and make the applied patterns more tangible and understandable. In reverse engineering, pattern-based re-documentation of existing frameworks and large-scale software needs sophisticated tool support allowing the human analyzer to look at the software from different perspectives, and thus gain a more encompassing picture of the complex relationships among the system’s constituents.

Y.5 Conclusion

Design patterns capture the subtle design decisions that have proven successful in many software development projects. They document the rationale behind the design, which is so important to understand when evolving a software system to meet continuously changing requirements. Our experience in manually analyzing parts of two telecommunications software systems of Bell Canada confirms the findings of Martin that most of the design patterns of Gamma et al. (1995) are present in sizable software systems (Martin, 1995). However, we also learned that the manual recovery of a significant number of design patterns in large-scale systems is infeasible, even with the use of state-of-the-art software comprehension tools, such as SNiFF+. Our study shows that effective pattern-based reverse engineering of sizable software systems is indeed feasible, but that it requires both support from pattern analysis tools and techniques and the cognitive strength of the human analyzer.

In this chapter we have discussed the SPOOL environment for the pattern-based recovery of design components. We assessed our technology based on three case studies taken from industrial C++ software systems. The visualization technique of growing bounding boxes around the reference classes of pattern-based design components proved very helpful for gaining an immediate understanding of the nature of the patterns in the software under investigation. In most cases, the size of the bounding box indicated whether the recovered design component also carried the intent of the respective pattern. Advanced tool support comprising extraction of the design component into a separate diagram and design navigation helped verify the existence of the pattern.

Beyond extending the SPOOL environment with additional visual aids, we plan to work in five areas related to this study. First, we will continue conducting studies about specific design patterns and idioms, covering further patterns from the catalogue of Gamma et al. (1995) and beyond (Schauer et al., 1999; Keller and Schauer, 2000). Second, we wish to extend our repository to capture all major constructs of C++ and to cover additional programming languages. The schema of the repository will be based on multiple logical layers, each increasing the level of abstraction of the source code models. Third, we plan to supplement our current visualization technique, which is based on bounding boxes around the reference classes of pattern-based design components, with alternative techniques. This includes the UML-style pattern notation (UML, 1997) and component-specific rendering techniques. For example, to convey the essence of the Layers architectural pattern (Buschmann et al., 1996), its classes should be illustrated top-down according to their association with a layer, or, once jKit/GO (Instantiations, 2000) supports three-dimensional graphic objects, within three-dimensional space, each layer being a two-dimensional structure diagram and the connections among the layers being represented in the third dimension. Fourth, we aim to investigate the recovery of pattern-based design components with full-text, pattern-matching techniques. We believe that analyzing the names of identifiers and comments can retrieve much information about patterns. Fifth, we will integrate our environment with the suite of software comprehension tools of Bell Canada, including source code parsers for several programming languages, a tool for clone detection, and an environment for metric analysis. Such integration will provide the software quality assessment team of Bell Canada with an industrial-strength environment that can support them in the assessment of supplier software for maintenance and evolution.
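To give a flavor of the fourth area, the following sketch matches identifier names against naming cues for a few patterns. It is purely illustrative and not part of SPOOL: the cue table and the function hint_patterns are assumptions, and real naming conventions vary from system to system. The identifiers MakeIterator and GetObserverIter are those of the ET++ case study.

```python
import re

# Hypothetical naming cues; the regular expressions are illustrative guesses.
CUES = {
    "Factory Method": re.compile(r"^(Make|Create|New)[A-Z]\w*$"),
    "Iterator": re.compile(r"Iter(ator)?$"),
    "Observer": re.compile(r"Observer"),
}


def hint_patterns(identifier):
    """Return the sorted names of patterns whose cue matches the identifier."""
    return sorted(name for name, cue in CUES.items() if cue.search(identifier))


for identifier in ("MakeIterator", "GetObserverIter", "Draw"):
    print(identifier, hint_patterns(identifier))
```

Hints of this kind are no more than starting points for inspection; they would have to be combined with structural queries and human judgment.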

Acknowledgements

We would like to thank the following organizations for providing us with licenses of their tools: Bell Canada for the source code parser Datrix, Lucent Technologies for the layout generators Dot and Neato as well as for the C++ source code analyzer GEN++ used in the previous versions of SPOOL, and TakeFive Software for the SNiFF+ software development environment.

Y.6 References

Antoniol, G., Fiutem, R., and Cristoforetti, L. (1998). Design pattern recovery in object-oriented software. In Proceedings of the Sixth International Workshop on Program Comprehension, pages 153-160, Ischia, Italy, June 1998.

Appleton, B. (1998). Patterns and software: Essential concepts and terminology. On-line at <http://www.enteract.com/~bradapp/docs/>.

Baniassad, E. L. A. and Murphy, G. (1998). Conceptual module querying for software reengineering. In Proceedings of the 20th International Conference on Software Engineering, Kyoto, Japan, pages 64-73, April 1998.

Beck, K. and Johnson, R. (1994). Patterns generate architectures. In Proceedings of the 13th European Conference on Object-Oriented Programming, pages 139-149. Springer-Verlag, LNCS 821.

Biggerstaff, T. J. (1989). Design recovery for maintenance and reuse. IEEE Computer, 22(7):36-49, July 1989.

Booch, G. (1996). Object Solutions: Managing the Object-Oriented Project. Addison-Wesley.

Brown, K. (1997). Design reverse-engineering and automated design pattern detection in Smalltalk. On-line at <http://hillside.net/patterns/papers/>.

Buschmann, F. et al. (1996). Pattern-Oriented Software Architecture - A System of Patterns. John Wiley and Sons.

Butler, G., Keller, R. K., and Mili, H. (2000). A framework for framework documentation. ACM Computing Surveys, 32(1), March 2000. Symposium on Object-Oriented Application Frameworks. To appear.

Datrix (2000). Datrix homepage. Bell Canada. On-line at <http://www.iro.umontreal.ca/labs/gelo/datrix/>.

Fayad, M. and Schmidt, D. C. (1997). Object-oriented application frameworks. Communications of the ACM, 40(10):32-40, October 1997.

Gamma, E. et al. (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.

IBM (1999). IBM–alphaWorks. XML Parser for Java. On-line at <http://www.alphaworks.ibm.com/tech/xml/>.

ICESoft (2000). ICESoft A/S, Bergen, Norway. ICEBrowser. Online documentation. On-line at <http://www.icesoft.no/>.

Instantiations (2000). jKit/GO online documentation. Instantiations, Tualatin, OR. On-line at <http://www.instantiations.com/>.

Johnson, R. (1992). Documenting frameworks using patterns. In Proceedings of the Conference on Object-Oriented Programming: Systems, Languages and Applications, pages 63-76, Vancouver, B.C., October 1992.

Keller, R. K. and Schauer, R. (1998). Design components: towards software composition at the design level. In Proceedings of the 20th International Conference on Software Engineering, Kyoto, Japan, pages 302-310, April 1998.


Keller, R. K. et al. (1999). Pattern-based reverse engineering of design components. In Proceedings of the 21st International Conference on Software Engineering, Los Angeles, pages 226-235, May 1999.

Keller, R. K. and Schauer, R. (2000). Towards a quantitative assessment of method replacement. In Proceedings of the Fourth Euromicro Working Conference on Software Maintenance and Reengineering, pages 141-150, Zurich, Switzerland, February 2000. IEEE.

Koutsofios, E. and North, S. C. (2000). Drawing graphs with Dot. AT&T Bell Laboratories, Murray Hill, NJ. On-line at <http://www.reasearch.att.com/sw/tools/graphviz>.

Kraemer, C. and Prechelt, L. (1996). Design recovery by automated search for structural design patterns in object-oriented software. In Proceedings of the Working Conference on Reverse Engineering, pages 208-215, Monterey, CA, November 1996.

Martin, R. (1995). Discovering design patterns in existing applications. In J. Coplien and D. C. Schmidt, editors, Pattern Languages of Program Design, pages 365-393, Addison-Wesley.

North, S. C. (2000). NEATO User's Manual. AT&T Bell Laboratories, Murray Hill, NJ. On-line at <http://www.research.att.com/sw/tools/graphviz/>.

Odenthal, G. and Quibeldey-Cirkel, K. (1997). Using patterns for design and documentation. In Proceedings of the 11th European Conference on Object-Oriented Programming, Jyvaskyla, Finland, pages 511-529, June 1997.

OMG (1998). Object Management Group (OMG). XML Metadata Interchange (XMI). Document ad/98-10-05, October 1998. On-line at <ftp://ftp.omg.org/pub/docs/ad/98-10-05.pdf/>.

Rich, C. and Waters, R. (1988). The programmer's apprentice: A research overview. IEEE Computer, 21(11):11-24, November 1988.

Robitaille, S., Schauer, R., and Keller, R. K. (2000). Bridging program comprehension tools by design navigation. In Proceedings of the International Conference on Software Maintenance (ICSM'2000), San Jose, CA, October 2000. IEEE. To appear.

Saint-Denis, G., Schauer, R., and Keller, R. K. (2000). Selecting a model interchange format. The SPOOL case study. In Proceedings of the Thirty-Third Annual Hawaii International Conference on System Sciences, Maui, HI, January 2000. Provided on CD-ROM, 10 pages.

Sax (2000). Sax 1.0: The simple API for XML. On-line at <http://www.megginson.com/SAX/>.

Schauer, R. et al. (1999). Hot spot recovery in object-oriented software with inheritance and composition template methods. In Proceedings of the International Conference on Software Maintenance (ICSM'99), pages 220-229, Oxford, Britain, August 1999. IEEE.

Sun (2000). Sun Microsystems Inc. Java Foundation Classes (JFC) documentation. On-line at <http://www.javasoft.com/products/jfc/index.html>.

TakeFive (2000). TakeFive Software. SNiFF+ documentation set. On-line at <http://www.takefive.com/>.

UML (1997). Documentation set version 1.1. On-line at <http://www.rational.com>.

W3C (1998). World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0. W3C recommendation, February 1998. On-line at <http://www.w3.org/TR/1998/REC-xml-19980210>.


W3C (1999). World Wide Web Consortium (W3C). Document Object Model (DOM). W3C recommendation, June 1999. On-line at <http://www.w3.org/TR/PR-DOM-Level-1>.

Weinand, A., Gamma, E., and Marty, R. (1989). Design and implementation of ET++, a seamless object-oriented application framework. Structured Programming, 10(2):63-87, February 1989.