
ICSM'01 Most Influential Paper - Rainer Koschke


Page 1: ICSM'01 Most Influential Paper - Rainer Koschke

Aiding Program Comprehension by Static and Dynamic Feature Analysis

Thomas Eisenbarth¹, Rainer Koschke², Daniel Simon³

¹Axivion GmbH  ²Universität Bremen  ³SQS

ICSM 2011: Presentation of the Most-Influential Paper of ICSM 2001

Page 2: ICSM'01 Most Influential Paper - Rainer Koschke

This paper was joint work with my two colleagues. These are the three authors at the time of the publication, ten years ago. On the left you have Thomas Eisenbarth, and on the right you see Daniel Simon. Unfortunately, they cannot be here. They want me to send their best regards. They are, like me, very honored by this award.

Page 3: ICSM'01 Most Influential Paper - Rainer Koschke

Here are two more current photographs of them. They have not changed much. That is no surprise, since their main expertise is in maintenance.

Page 4: ICSM'01 Most Influential Paper - Rainer Koschke

I remember ICSM 2001 very well. It was in a great location: Florence. Florence has so many attractions.

Page 5: ICSM'01 Most Influential Paper - Rainer Koschke

Florence is full of so many attractions and so much beauty. It was a real surprise that anyone showed up at my talk in Florence.

Page 6: ICSM'01 Most Influential Paper - Rainer Koschke

Before I tell you more about the content of the paper, I would like to tell you a bit about the history of the paper itself, that is, its development process.

Page 7: ICSM'01 Most Influential Paper - Rainer Koschke

Background

Developing similar products as a product line (or product family) offers many advantages over the relatively expensive development of individual systems, mostly because all family members build on a common infrastructure, also called a platform or architecture. While other industries, such as the automotive or entertainment industry, have long been exploiting the advantages of product line development systematically, most software systems are still built as expensive one-off products.

Software development in particular can profit from product lines: for example, through time and cost savings when developing new, similar products, or through higher product quality owing to a high proportion of reuse of existing, already proven components. Adapting standard products to special customer wishes is also made easier by variability that is planned in advance. Product lines naturally cover the entire software life cycle and therefore integrate many other areas, such as requirements engineering, software architectures, and reengineering.

After about a decade of research, product lines for software systems are attracting more and more attention, which is reflected in the growing number of international events on this topic. In Germany, too, product lines and neighboring fields are meeting with ever greater interest, as shown, among other things, by the participation of various organizations in European projects such as PRAISE and ESAPS.

Goal of the Workshop

The goal of the workshop is to enable an exchange of experience between industry and research in the area of software product lines and adjacent fields.

Topics

Contributions are welcome, in particular, but not exclusively, on the following topics:
• Planning of product lines
• Requirements engineering for product lines
• Modeling of product lines
• Traceability of requirements
• Configuration management for product lines
• Definition of software architectures
• Recovery of software architectures
• Reference architectures for product lines
• Evolution of architectures
• Component technology for product lines
• Reengineering toward product lines
• Industrial experiences with product lines
• Product lines for SMEs
• Introduction of product line approaches

Contributions are to be submitted electronically (PDF or PostScript) to [email protected]; they should not exceed five pages. Further information is available at http://www.iese.fhg.de/dspl-workshop.

Dates:
Submission of contributions: 31.8.2000
Notification of acceptance: 1.10.2000
Submission of the final version: 20.10.2000
Distribution of the final program: 25.10.2000

Program Committee:
• Dr. P. Knauber (Fraunhofer IESE)
• Prof. Dr. K. Pohl (Universität Essen)
• Prof. Dr. C. Atkinson (Universität Kaiserslautern)
• Dr. G. Böckle (Siemens AG)
• Dr.-Ing. K. Czarnecki (DaimlerChrysler AG)
• Prof. Dr. U. Eisenecker (FH Kaiserslautern)
• Prof. Dr. E. Plödereder (Universität Stuttgart)
• Prof. Dr. W. Pree (Universität Konstanz)
• Prof. Dr. D. Rombach (Fraunhofer IESE)
• S. Thiel (Robert Bosch GmbH)
• R. Trauter (DaimlerChrysler AG)
• Dr. M. Verlage (Market Maker Software AG)

Fraunhofer Institut Experimentelles Software Engineering (IESE)

Call for Papers

1. Deutscher Software-Produktlinien Workshop (1st German Software Product Line Workshop)

Kaiserslautern, November 10, 2000

The initial trigger for the idea of our paper was the call for papers of a German software product line workshop.

Page 8: ICSM'01 Most Influential Paper - Rainer Koschke

In software product lines, they have these product-feature maps that describe the commonalities and differences of the products with respect to their features as a table.

Page 9: ICSM'01 Most Influential Paper - Rainer Koschke

At that time, there was a German professor, Gregor Snelting, who introduced formal concept analysis in software engineering. I taught formal concept analysis as part of my reengineering class.

Page 10: ICSM'01 Most Influential Paper - Rainer Koschke

Concept analysis allows you to analyze such tables. In mathematical terms, concept analysis is a technique to analyze the structure of arbitrary binary relations. We proposed in that German workshop to use concept analysis to analyze such product-feature maps in software product lines. I will describe it later in more detail.

Page 11: ICSM'01 Most Influential Paper - Rainer Koschke

However, we were more interested in program analysis than in requirements engineering. Another problem they have in product lines is to identify the components necessary to implement a feature, which is needed to identify reusable components to be used in product lines. So we decided to use formal concept analysis to find where features are implemented in the code.

Page 12: ICSM'01 Most Influential Paper - Rainer Koschke


Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon

University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product family based on existing components. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as product families promises several advantages over relatively expensive separate developments, like lower costs and shorter time for development, test, and maintenance. These advantages are based on the fact that all family members share a common infrastructure, also known as the platform architecture. There are many approaches to newly developing product families from scratch [2, 11]. However, according to Martinez [16], most successful examples of product families at Motorola originated in a single separate product. Only in the course of time did a shared architecture for a product family evolve. Moreover, large investments impose a reluctance against introducing a product family approach that ignores existing assets. Hence, an introduction of a product family approach generally has to cope with existing code.

Reverse engineering may help create a product family for existing systems by identifying and analyzing the components and also by deriving the individual architecture from each system. These individual architectures may then be unified into a platform architecture, and the derived components may be used to populate the unified architecture. To this end, code needs to be adjusted, reengineered, or wrapped. However, changing or wrapping the code is only done in very late phases of moving toward a product family. Reverse engineering can also assist in earlier phases and, thus, Bayer et al. rightly demand an early integration of reverse engineering into a product family approach [1]. Early reverse engineering is needed to derive first coarse information on existing system components (assets), needed in good time by a product family analyst to investigate feasibility and to estimate the costs of different alternative ways to get to a suitable product family architecture.

One important piece of information for a product family analysis that tries to integrate existing assets is the so-called feature component map that describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) requirement (the term feature is intentionally weakly defined because its exact meaning depends on the specific context). Components are computational units of a software architecture (see Section 3.1). Because the feature component map is needed very early to trade off alternatives in good time, complete and hence time-consuming reverse engineering of the system is out of the question. In particular, the decision for a certain alternative will in many cases lead to a consolidation on specific economically important core components and hence to an exclusion of less important components. Any investment in a deep and costly pre-analysis of less important components would be largely in vain. Instead, reverse engineering in early phases should give information on the feature component map quickly and with simple means. To this end, the product line analyst imparts all relevant features, for which the necessary components need to be detected, to the reverse engineer, who in turn delivers the feature component map. On the basis of the feature component map and additional economic reasons, a decision is made for particularly interesting and required components, and further expensive analyses regarding quality can be cost-effectively aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is automatic to a great extent. Concept analysis is a mathematical technique to investigate binary relations (see Section 2).

Integration into a Product Family Process. A simple process for feature-based reengineering toward product families can be described as follows:

1. The economically relevant features are ascertained by product family engineers and market analysts.

2. The feature component map is derived based on the identified relevant features.

3. The previously derived feature component map gives additional insights into dependencies among features and components and, hence, into the feasibility and costs of different alternative product family platforms. The knowledge gained from the feature component map and additional economic considerations may lead to a further selection of only a certain subset of all features and their corresponding components.

4. The selected components are more closely analyzed, for instance, with respect to maintainability, extractability, and integrability.

5. A product family platform is designed. Alternatives for components to populate the product family platform are weighed: component extraction and reengineering, new development, integration of COTS, or wrapping.

6. A migration plan is prepared.

The technique described in this article is used to derive the feature component map, which plays a central role early in this process.

Overview. The technique described here is based on the execution traces generated by a profiler for different usage scenarios (see Figure 1). One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components (or are themselves considered components) required for a certain feature. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

We want to point out that not all non-functional requirements, e.g., time constraints, can be easily mapped to components, i.e., our technique primarily aims at functional features. However, in some cases, it is possible to isolate non-functional aspects, like security, in code and map them to specific components. For instance, one could concentrate all network accesses in one single component to enable controlled secure connections.

The remainder of this article is organized as follows. Section 2 introduces concept analysis. Section 3 explains how concept analysis can be used to derive the feature component map, and Section 4 describes our experience with this technique in a case study. Section 5 discusses related research.

2. Concept Analysis

Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. It has already been successfully used in other fields of software engineering. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. This section describes concept analysis in more detail.

Concept analysis is based on a relation R between a set of objects 𝒪 and a set of attributes 𝒜, hence R ⊆ 𝒪 × 𝒜. The tuple C = (𝒪, 𝒜, R) is called a formal context. For a set of objects, O ⊆ 𝒪, the set of common attributes, σ, is defined as:

σ(O) = { a ∈ 𝒜 | ∀ o ∈ O: (o, a) ∈ R }

Analogously, the set of common objects, τ, for a set of attributes, A ⊆ 𝒜, is defined as:

τ(A) = { o ∈ 𝒪 | ∀ a ∈ A: (o, a) ∈ R }

In Section 3.1, the formal context for applying concept analysis to derive the feature component map will be laid down as follows:

• components will be considered objects,

• features will be considered attributes,

• a pair (component c, feature f) is in relation R if c is executed when f is invoked.

However, here, for the time being, we will use as an abstract example the binary relation between arbitrary objects and attributes shown in Table 1. An object oi has attribute aj if row i and column j is marked with an ✕ in Table 1 (the example stems from Lindig and Snelting [7]). For instance, the following equations hold for this table, also known as a relation table:

σ({o1}) = {a1, a2}  and  τ({a7, a8}) = {o3, o4}

A pair (O, A) is called a concept if A = σ(O) ∧ O = τ(A) holds, i.e., all objects share all attributes. For a concept c = (O, A), O is the extent of c, denoted by extent(c), and A is the intent of c, denoted by intent(c).

Figure 1. Overview: a usage scenario for feature F yields an execution trace containing the required components C1 … Cn; the pairs (F, C1), …, (F, Cn) ∈ R are the input to concept analysis, which produces the feature component map and the dependencies.

      a1   a2   a3   a4   a5   a6   a7   a8
o1    ✕    ✕
o2              ✕    ✕    ✕
o3              ✕    ✕         ✕    ✕    ✕
o4              ✕    ✕    ✕    ✕    ✕    ✕

Table 1: Example relation.


Informally, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. For example, Table 2 contains the concepts for the relation in Table 1.

Table 2: Concepts for Table 1.

C1 = ({o1, o2, o3, o4}, ∅)
C2 = ({o2, o3, o4}, {a3, a4})
C3 = ({o1}, {a1, a2})
C4 = ({o2, o4}, {a3, a4, a5})
C5 = ({o3, o4}, {a3, a4, a6, a7, a8})
C6 = ({o4}, {a3, a4, a5, a6, a7, a8})
C7 = (∅, {a1, a2, a3, a4, a5, a6, a7, a8})

The set of all concepts of a given formal context forms a partial order via

(O1, A1) ≤ (O2, A2) ⇔ O1 ⊆ O2

or, equivalently, via

(O1, A1) ≤ (O2, A2) ⇔ A1 ⊇ A2.

If c1 ≤ c2 holds, then c1 is called a subconcept of c2 and c2 is called a superconcept of c1. For instance, ({o2, o4}, {a3, a4, a5}) ≤ ({o2, o3, o4}, {a3, a4}) is true in Table 2.

The set of all concepts of a given formal context and the partial order ≤ form a complete lattice, called the concept lattice L:

L(C) = { (O, A) ∈ 2^𝒪 × 2^𝒜 | A = σ(O) ∧ O = τ(A) }

The infimum of two concepts in this lattice is computed by intersecting their extents as follows:

(O1, A1) ∧ (O2, A2) = (O1 ∩ O2, σ(O1 ∩ O2))

The infimum describes the set of common attributes of two sets of objects. Similarly, the supremum is determined by intersecting the intents:

(O1, A1) ∨ (O2, A2) = (τ(A1 ∩ A2), A1 ∩ A2)

The supremum ascertains the set of common objects, which share all attributes in the intersection of two sets of attributes.

Graphically, the concept lattice for the example relation in Table 1 can be represented as a directed acyclic graph whose nodes represent concepts and whose edges denote the superconcept/subconcept relation <, as shown in Figure 2. The most general concept is called the top element and is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥.

Figure 2. Concept lattice for Table 1 (the nodes are the concepts C1 through C7 of Table 2; C1 is the top element and C7 the bottom element).
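To make these definitions concrete, the following minimal sketch (my illustration, not part of the original paper) computes σ, τ, and all concepts of the example relation in Table 1 by brute force; the encoding of the relation and all function names are my own:

```python
# Minimal sketch (not from the paper): brute-force concept analysis
# of the example relation in Table 1.
from itertools import combinations

OBJECTS = ["o1", "o2", "o3", "o4"]
ATTRIBUTES = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"]
R = {  # Table 1: object -> set of attributes it is related to
    "o1": {"a1", "a2"},
    "o2": {"a3", "a4", "a5"},
    "o3": {"a3", "a4", "a6", "a7", "a8"},
    "o4": {"a3", "a4", "a5", "a6", "a7", "a8"},
}

def sigma(objs):
    """Common attributes of a set of objects."""
    return set(ATTRIBUTES) if not objs else set.intersection(*(R[o] for o in objs))

def tau(attrs):
    """Common objects of a set of attributes."""
    return {o for o in OBJECTS if attrs <= R[o]}

# Every concept (O, A) satisfies A = sigma(O) and O = tau(A); closing each
# object subset with tau(sigma(.)) therefore enumerates all concepts.
concepts = {(frozenset(tau(sigma(set(objs)))), frozenset(sigma(set(objs))))
            for k in range(len(OBJECTS) + 1)
            for objs in combinations(OBJECTS, k)}

for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))  # the seven concepts of Table 2
```

This exponential closure is only meant to illustrate the definitions; real tools such as concepts [8] use far more efficient algorithms.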

The combination of the graphical representation in Figure 2 and the contents of the concepts in Table 2 together form the concept lattice. The complete information can be visualized in a more readable, equivalent way by marking only the graph node with an attribute a ∈ 𝒜 whose represented concept is the most general concept that has a in its intent. Analogously, a node will be marked with an object o ∈ 𝒪 if it represents the most special concept that has o in its extent. The unique element µ in the concept lattice marked with a is therefore:

µ(a) = ∨ { c ∈ L(C) | a ∈ intent(c) }    (1)

The unique element γ marked with object o is:

γ(o) = ∧ { c ∈ L(C) | o ∈ extent(c) }    (2)

We will call a graph representing a concept lattice using this marking strategy a sparse representation. The equivalent sparse representation for Figure 2 is shown in Figure 3. The content of a node N in this representation can be derived as follows:

• the objects of N are all objects at and below N,

• the attributes of N are all attributes at and above N.

For instance, the node in Figure 3 marked with o2 and a5 is the concept ({o2, o4}, {a3, a4, a5}).

Figure 3. Sparse representation of Figure 2 (node markings: a1, a2 with o1; a3, a4; a5 with o2; a6, a7, a8 with o3; o4).

3. Feature Component Map

In order to derive the feature component map via concept analysis, one has to define the formal context (objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Features and Components

Components will be considered objects of the formal context, whereas features will be considered attributes. Note that in the reverse case, the concept lattice is simply inverted, but the derived information will be the same.

The set of relevant features will be determined by the product family experts. For components, we can consider the following alternatives, depending on how much knowledge of the system architecture is already available:


1. cohesive modules and subsystems as defined and documented by the system's architects or regained by reengineers; modules and subsystems will be considered composite components in the following;

2. physical modules, i.e., modules as defined by means of the underlying programming language or simply directly available as existing files (the distinction from cohesive modules is that one does not know a priori whether physical modules really group cohesive declarations; physical modules are the unscrutinized result of a programmer's way of grouping declarations, whether it makes sense or not);

3. subprograms, i.e., functions and procedures, and global variables of the system; subprograms and global variables will be called low-level components in the following.

Ideally, one will use alternative (1) when reliable and complete documentation exists. However, if cohesive modules and subsystems are not known in advance, one would hardly make the effort to analyze a large system to obtain these in order to apply concept analysis to get the feature component map, because it is not yet clear which components are relevant at all, and reverse engineering the complete system first will likely not be cost-effective. Only later, if the retrieved feature component map (using simpler definitions of components, like those in (2) or (3)) clearly shows which lower-level components should be investigated further to obtain composite components, may reverse engineering pay off (in order to detect cohesive modules, we have developed a semi-automatic method integrating many automatic state-of-the-art techniques [9]).

Alternative (2) can be chosen if suitable documentation is not available but there is reason to trust the programmers of the system to a great extent. In all other cases, one will fall back on alternative (3). However, for alternative (3), concept analysis may additionally yield hints on sets of related subprograms forming composite components.

The relation for the formal context necessary for concept analysis is defined as follows:

(C, F) ∈ R if and only if component C is required when feature F is invoked; a subprogram is required when it needs to be executed; a global variable is required when it is accessed (used or changed); a composite component is required when one of its parts is required.

In order to obtain the relation, a set of usage scenarios needs to be prepared where each scenario exploits preferably only one relevant feature. Then the system is used according to the set of usage scenarios, one at a time, and the execution traces are recorded. An execution trace contains all required low-level components for a usage scenario or an invoked feature, respectively. If composite components are used for concept analysis, the execution trace containing the required low-level components induces an execution trace for composite components by replacing each low-level component with the composite component to which it belongs. Hence, each system run yields all required components for a single scenario that exploits one feature. Thus, a single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.

An execution trace can be recorded by a profiler. However, most profilers only record subprogram calls but not accesses to variables. Instead of using a symbolic debugger, for example, which allows one to set watchpoints on variable accesses, or even instrumenting the code if no sophisticated profiler is available, one can also use a simple static dependency analysis: one considers all variables directly and statically accessed by each executed subprogram also to be dynamically accessed (all transitively accessed variables will automatically be considered because all executed subprograms are examined). In practice, this analysis may be a sufficient approximation. But one should be aware that it may overestimate references, because variable accesses may be included that are on paths not executed at runtime, and it will also ignore references to variables by means of aliases if the simple static dependency analysis does not take aliasing into account. For a first analysis to obtain a simplified feature component map, one can also ignore variables and come back to these in a later phase using more sophisticated dynamic or static analyses.
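As an illustration of how the relation table could be assembled from recorded traces, here is a sketch (mine, not the authors' tooling); it assumes the executed subprograms of each scenario have already been extracted into a hypothetical file traces/<feature>.txt, one name per line:

```python
# Sketch (not the paper's scripts): build the component/feature relation
# from per-scenario trace files. Assumed layout: traces/<feature>.txt
# lists the subprograms executed when the feature was invoked.
from pathlib import Path

def load_relation(trace_dir="traces"):
    """Return feature -> set of executed subprograms (components)."""
    relation = {}
    for trace in sorted(Path(trace_dir).glob("*.txt")):
        feature = trace.stem  # e.g. "draw-polyline"
        names = {line.strip() for line in trace.read_text().splitlines()}
        relation[feature] = names - {""}
    return relation

def print_relation_table(relation):
    """One row per component, one column per feature, 'x' marking pairs in R."""
    features = sorted(relation)
    components = sorted(set().union(*relation.values()))
    print(" " * 30 + " ".join(features))
    for c in components:
        row = " ".join(("x" if c in relation[f] else ".").center(len(f))
                       for f in features)
        print(f"{c:<30}{row}")

if __name__ == "__main__":
    print_relation_table(load_relation())
```

Each column of this table corresponds to one system run, exactly as described above.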

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the last section gives a lattice from which interesting relationships can be derived. These relationships can be fully automatically derived and presented to the analyst such that the more complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships. This section explains how interesting relationships can be automatically derived.

As already abstractly described in Section 2, the following base relationships can be derived from the sparse representation of the lattice (note the duality in the interpretation):

• A component, c, is required for all features at and above γ(c), as defined by (2), in the lattice.

• A feature, f, requires all components at and below µ(f), as defined by (1), in the lattice.

• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).

• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∨ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element, starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.

• Components jointly required for two features, f1 and f2, are described by µ(f1) ∧ µ(f2); graphically depicted, one ascertains in the lattice the closest common node toward the bottom element, starting at the nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.

• Components required for all features can be found at the bottom element.

• Features that require all components can be found at the top element.

• If the top element does not contain features, then all components in the top element are superfluous (such components will not exist when the set of objects for concept analysis contains only components executed at least once, which is the case if a filter ignores all subprograms for which the profiler reports an execution count of 0).

• If the bottom element does not contain any component, all features in the bottom element are not implemented by the system (this constellation will not exist if there is a usage scenario for each feature and every usage scenario is appropriate and relevant to the system; a system may indeed not have all features, i.e., a usage scenario may be meaningless for a given system).

Beyond these relationships between components and features, further useful aspects between features on the one hand and between components on the other hand may be derived:

• If γ(c1) < γ(c2) holds for two components c1 and c2, then component c2 requires component c1.

• If µ(f1) < µ(f2) holds for two features f1 and f2, then feature f1 is based on feature f2.

One has to note that the latter relationship between features safely holds for the analyzed system only, i.e., this relationship is not necessarily true for the features as such, because the relationship was derived only from a specific implementation.
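As a sketch (my illustration, reusing the concepts computed in the earlier example, i.e., pairs (extent, intent) of frozensets), γ, µ, and the two dependency rules above can be derived as follows:

```python
# Sketch (my illustration): derive feature/component dependencies from a
# set of concepts, each a pair (extent, intent) of frozensets.

def gamma(o, concepts):
    """Most special concept that has object (component) o in its extent."""
    return min((c for c in concepts if o in c[0]), key=lambda c: len(c[0]))

def mu(a, concepts):
    """Most general concept that has attribute (feature) a in its intent."""
    return max((c for c in concepts if a in c[1]), key=lambda c: len(c[0]))

def strictly_below(c1, c2):
    """Strict subconcept order: (O1, A1) < (O2, A2) iff O1 is a proper subset of O2."""
    return c1[0] < c2[0]

def component_dependencies(components, concepts):
    """Pairs (c2, c1) meaning: component c2 requires component c1."""
    return [(c2, c1) for c1 in components for c2 in components
            if strictly_below(gamma(c1, concepts), gamma(c2, concepts))]

def feature_dependencies(features, concepts):
    """Pairs (f1, f2) meaning: feature f1 is based on feature f2."""
    return [(f1, f2) for f1 in features for f2 in features
            if strictly_below(mu(f1, concepts), mu(f2, concepts))]
```

Since the most special concept containing o is unique and has the smallest extent among all candidates, picking the candidate with the minimal extent is a safe way to compute γ(o), and dually for µ(a).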

The information described above can be derived by a tool and fed back to the product family expert. As soon as a decision is made to re-use certain features, all components required for these features (easily derived from the concept lattice) form a starting point for further analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate the effort for subsequent steps (wrapping, reengineering, or re-development from scratch).

3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:

• GNU C compiler gcc to compile the system using a command line switch for generating profiling information,

• GNU object code viewer nm and a short Perl script in order to identify all functions of the system (as opposed to those included from standard libraries),

• GNU profiler gprof and a short Perl script to ascertain the executed functions in the execution trace,

• concept analysis tool concepts [8],

• graph editor Graphlet [3] to visualize the concept lattice,

• and two more short Perl scripts to convert the file formats of concepts and Graphlet (all Perl scripts together have just 147 LOC); a sketch of one pipeline step follows below.
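As an illustration of one step of this pipeline (my sketch, not the authors' Perl scripts): a system compiled with gcc's -pg switch writes a gmon.out file on every run, and gprof's flat profile then lists the executed functions. The parsing below is deliberately simplified and assumes plain C names:

```python
# Sketch (not the authors' scripts): extract the executed functions of one
# usage scenario from a gprof flat profile. Assumes the system was built
# with "gcc -pg" and was run once, producing gmon.out.
import subprocess

def executed_functions(binary, gmon="gmon.out"):
    # -b: brief output, -p: flat profile only
    out = subprocess.run(["gprof", "-b", "-p", binary, gmon],
                         capture_output=True, text=True, check=True).stdout
    functions = set()
    for line in out.splitlines():
        cols = line.split()
        # data rows of the flat profile start with a percentage;
        # the function name is the last column (C names contain no spaces)
        if cols and cols[0].replace(".", "", 1).isdigit():
            functions.add(cols[-1])
    return functions
```

One such call per usage scenario yields the execution traces that feed the relation table.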

The fact that the subprograms are extracted from the object code makes the implementation independent of the programming language to a great extent (as long as the language is compiled to object code) and has the advantage that no front end is necessary. On the other hand, because a compiler may replace source names by link names in the object code (for instance, C++ compilers use name mangling to resolve overloading), there is not always a direct mapping from the subprograms in the execution trace back to the original source. Because we dealt in our case study with C code, object code names were identical to source names. If this is not the case, one either tolerates divergences between names (mostly, names are similar enough) or has to reverse the name mangling.

4. Case Study

As a case study, we analyzed the Xfig system [18] (version 3.2.1), consisting of about 76 KLOC written in the programming language C. In this section, we will first present a general overview of the results and then go into further detail on particularly interesting observations.

Figure 4. Lattice for the first experiment. Concepts are drawn as nodes whose height grows with the number of components they contain (the taller a concept is, the more components it contains); the framed area containing the circle and ellipse concepts is detailed in Figure 6.

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our case study was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig. Hence, we considered as main features the following two capabilities:

1. the ability to draw different shapes (lines, curves, rectangles, etc.)

2. the ability to modify shapes in different editing modes (rotate, move, copy, scale, etc.)

We conducted two experiments. In the first one, we investigated the ability to draw different shapes only. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features composed of basic features. For the second experiment, a shape is drawn and then modified. Both draw and modify constitute a basic feature. Combined features add to the effort needed to derive the feature component map, as there are many possible combinations.

In both experiments, we considered subprograms as components. However, in our simple implementation, we do not handle variable accesses. Hence, not all required low-level components are detected.

The resulting concepts contain subprograms grouped together according to their usage for features. Note that the more general subprograms can be found at the lower concepts in the lattice, since they are used for many features, while specific components are in the upper region of the lattice. Hence, the concept lattice also reflects the level of abstraction of these subprograms within the given set of scenarios. To identify all subprograms required for a single feature or a set of features, one can then analyze the concept lattice as described in Section 3.2.

First experiment. In our first experiment, we prepared 15 scenarios. Each scenario invokes Xfig, performs the drawing of one of the objects Xfig provides, and then terminates Xfig, i.e., the aspects above were not combined and no other functionality of Xfig was used. We used all shapes of Xfig's drawing panel shown in Figure 5 except picture objects and library objects.

Figure 5. Xfig's object shapes: circle by diameter, circle by radius, ellipse by diameters, ellipse by radii, closed approximated spline, approximated spline, closed interpolated spline, interpolated spline, polygon, polyline, rectangular box, rectangular box with rounded corners, regular polygon, arc, text, picture object, library object.

The resulting lattice for this experiment is shown in Figure 4. The contents of the concepts in the lattice are omitted for readability reasons. However, their size in this picture is a linear function of their number of components (except for the bottom element, which contains 136 components, mostly initialization and GUI code and very basic functions, and was too large to be drawn accordingly; as a comparison point: the text drawing concept, marked as node #1, has 29 components). As Figure 4 shows, there are a few concepts containing most of the components (i.e., subprograms) of the system. The lattice contains 47 concepts. 26 of them introduce at least one new component, i.e., a component is attached to these nodes (more precisely, a concept C introduces a component if there exists a component c for which γ(c) = C holds). 21 of the concepts do not introduce any new component and merely merge functionality needed by several superconcepts.

The first interesting observation is that concepts with many components can be found in the upper region, while in the lower region, the number of components decreases and the number of interferences increases (an interference leads to an unstructured lattice; a lattice is said to be structured if it can be decomposed into independent sublattices that are connected via the top and bottom elements only). That is to say that there are many specific operations and few shared operations, and also that shared operations are really used for many features.

Concept #1 in Figure 4 is the largest concept (excluding the bottom element). It exploits a single feature, "draw text object". According to the lattice, the feature is largely independent of other features and shares only a few components with other features.

Concept #5 represents the two features "draw polyline" and "draw polygon". The only difference between these two features is that an additional line is drawn that closes a polygon. This difference is not visible in the concept lattice, since the two features are attached to the same concept. The distinction is made in the body of the function that is called to draw either a polygon or a polyline. Concept #3 denotes the feature "draw spline". Concept #4 has no feature attached and represents the components shared for drawing polygons, polylines, and splines. These components are no real drawing operations but operations to keep a log of the points set by the user and to draw lines between set points while the user is still setting points (a spline first appears as a polygon and is only re-shaped when the user has set all points).

Concept #2 stands for the feature "draw arc" and concept #7 is again a concept that represents shared components for drawing elastic lines while the user is setting points. The difference between concept #7 and concept #4 is that the former only contains the components to draw the elastic line, while the latter adds the capability to set an arbitrary number of points. Arcs do not need this capability because they are defined by exactly three points.

Concept #6 represents the feature "draw lines" and is used for drawing rectangles, polygons, and polylines, as one would expect. The generality of this feature becomes immediately obvious in the concept lattice, as it is located in the middle level of the lattice.

The framed area in Figure 4 has a simpler structure than the rest of the lattice. This part deals with circles and ellipses, and its details are shown in Figure 6. Each node, N, in Figure 6 contains two sets: the upper set contains all components attached to the node, i.e., those components, c, for which γ(c) = N; the lower set contains all features of N, including those inherited from other concepts. The names of the features correspond to the objects drawn via the panel in Figure 5; e.g., draw-ellipse-radius means that an ellipse was drawn where the radius was specified (as opposed to the diameter).

Nodes #41, #42, #43, and #44 represent the features to draw circles and ellipses using either diameter or radius. They all contain three specific components to draw the object, to plot an elastic bend while the user is drawing, and to resize the object. Note the similarity of the component names. The specific commonalities among circles and ellipses are represented by node #38, which introduces the shared components to draw circles and ellipses (both specified by diameter and radius).

Nodes #32 and #39 connect the circles and ellipses to the other objects. No components are attached to nodes #32 and #39; they only merge components from different concepts. The two nodes have a direct infimum (not shown in Figure 6) and add the same components to the circle and ellipse features. The components inherited via these two nodes are very basic components of the lowest regions of the lattice, which indicates that ellipses and circles are widely separate from all other objects.

Figure 6. Relevant parts for circles and ellipses.

Second experiment. In a second experiment, we analyzed the edit mode rotate, which comes in two variants: clockwise and counterclockwise. The first ten shapes in Figure 5 were drawn and rotated once clockwise and once counterclockwise, which resulted in 20 scenarios. The resulting lattice contained 55 concepts, most of which introduce no new component. We observed that the related shapes, i.e., the variants of splines, circles, ellipses, etc., were merged at the top of the lattice since they use almost the same components. In order to reduce the size of the lattice, we selected one representative among the related shapes and re-ran the experiment with three shapes (ellipse, polygon, and open approximated spline). The resulting lattice is shown in Figure 7.

Figure 7. Concept lattice for the second experiment (concepts #1 through #6).

This lattice consists of 22 concepts, three of which provide the specific functionality for the respective shapes. Concept #1 (21 functions) depicts the functionality for splines and concept #2 (17 functions) represents the one for lines (used for polygons). Both depend on concept #4 (29 functions), which groups functions related to points. Concept #3 (20 functions) denotes the ellipse feature, concept #5 (29 functions) the general drawing support functionality, and concept #6 (123 functions) the start-up and initialization code of the system.

Analyzing concepts #1, #2, and #3, we found that the shapes provide individual rotate functions. In other words, the rotate feature is implemented specifically for each shape, i.e., there is no generic component that draws all different shapes, which would have been an interesting finding in terms of reuse.

General observations. Our experience is that applying our method is easy in principle. However, running all scenarios by hand is time-consuming. It may be facilitated by the presence of test cases that allow an automated replay of various scenarios.

Because Xfig has a GUI, running a single scenario by hand is an easy task. However, one has to pay attention not to cause interferences by invoking irrelevant features. For instance, Xfig uses a balloon help facility that pops up a little window when the cursor stays some time on a sensitive area of the GUI (e.g., over the button selecting the circle drawing mode). Sometimes the balloon help mechanism triggers, introducing interferences between features. Such effects affect the analysis because they introduce spurious connections between features. Fortunately, this problem can be partly fixed by providing a specific scenario in which only the accidentally invoked irrelevant feature is invoked, which leads to a refactored concept lattice that contains a new concept that isolates the irrelevant feature and its components. In our example, interferences due to an accidentally invoked irrelevant feature appeared only at the two layers directly on top of the bottom element of the lattice, and could be more or less ignored.

5. Related Research

The mathematical foundation of concept analysis was laid by Birkhoff in 1940. Primarily Snelting has recently introduced concept analysis to software engineering. Since then, it has been used to evaluate class hierarchies [15], to explore configuration structures of preprocessor statements [10, 14], and to recover components [4, 7, 12, 13].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to quickly and cheaply derive the feature component map. Moreover, the method relies on the quality of the static dependency graph. If this graph, for example, does not contain information on potential values of function pointers, the human analyst may miss functions only called via function pointers. At the other extreme, if the too conservative assumption is made that every function whose address is taken is called at each function pointer call site, the search space increases extremely. Generally, it is statically undecidable which paths are taken at runtime, so that every static analysis will yield an overestimated search space, whereas dynamic analyses tell exactly which parts are really used at runtime (though for a particular run only). However, Chen and Rajlich's technique could be helpful in a later phase, in which the system needs to be more rigorously analyzed. The purpose of our technique is to derive the feature component map. It handles the system as a black box and, hence, does not give insights into internal aspects with respect to quality and effort.

Wilde and Scully [17] also use dynamic analysis to localize features as follows:

1. The invoking input set I (i.e., a set of test cases or, in our terminology, a set of usage scenarios) is identified that will invoke a feature.

2. The excluding input set E is identified that will not invoke a feature.

3. The program is executed twice, using I and E separately.

4. By comparison of the two resulting execution traces, the components can be identified that implement the feature.

Wilde and Scully focus on localizing rather than deriving required components: for deriving all required components, the execution trace for the invoking input set is sufficient. By subtracting all components in the execution trace for the excluding input set from those in the execution trace for the invoking input set, only those components remain that specifically deal with the feature.
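In code, this subtraction is a plain set difference over the two traces (a sketch of mine, with hypothetical trace contents):

```python
# Sketch: Wilde and Scully's software reconnaissance as a set difference
# between the trace of the invoking input set I and that of the excluding
# input set E.
def feature_specific_components(trace_I, trace_E):
    return set(trace_I) - set(trace_E)

# Hypothetical traces, for illustration only:
trace_I = {"main", "init_gui", "draw_text", "pw_text", "textsize"}
trace_E = {"main", "init_gui"}
print(feature_specific_components(trace_I, trace_E))
# -> {'draw_text', 'pw_text', 'textsize'}
```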

Note that our technique achieves the same effect by considering several execution traces for different features at a time. Components not specific to a feature will "sink" in the concept lattice, i.e., they will be closer to the bottom element. More precisely, recall from Section 3.2 that a component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

One may argue that components that are only required to get the system started, but are not, strictly speaking, directly necessary for any feature, will still appear in the concept lattice when we do not subtract execution traces for an excluding input set. It is true that these components cannot be distinguished from components that in fact contribute to all features, because both kinds of components jointly appear at the bottom element. However, the idea of an excluding input set can be taken over to our technique to distinguish these two kinds of components by providing a usage scenario in which no feature is invoked, like simply starting and immediately shutting down the system without invoking any relevant feature. That simple trick separates the two kinds of components into two distinct concepts, C1 and C2, in the lattice, where C1 < C2 and C1 = ⊥; C2 then contains only those components that are really required for all features in a narrower sense.

Furthermore, our technique goes beyond Wilde and Scully's technique in that it also allows one to derive relevant relationships between components and features by means of concept analysis, whereas Wilde and Scully's technique only localizes a feature. The derived relationships are important information to product family experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions

A feature component map describes which components are required to implement a particular feature and is needed at an early stage within a process toward a product family platform

• to weigh alternative platform architectures,

• to aim further tasks, like quality assessment, at only those existing components that are needed to populate the platform architecture,

• and to decide on further steps, like reengineering or wrapping.

The technique presented in this paper yields the feature component map automatically using the execution traces for different usage scenarios. The technique is based on concept analysis, a mathematically sound technique to analyze binary relations, which has the additional benefit of revealing not only correspondences between features and components but also dependencies between features and between components (feature-feature dependencies are derived from an existing system and, hence, may only exist for this particular system but not necessarily for these features in general).

The technique is primarily suited for functional features that may be mapped to components. Non-functional features in particular do not easily map to components. For example, for applications for which timing is critical (because it may result in diverging behavior), the features would also have to take time into account.

Note also that the technique is not suited for features that are only internally visible, like whether a compiler uses a certain intermediate representation. Strictly speaking, internal features may be viewed as implementation details. However, such implementation details may be of interest for defining a product family architecture. Internal features can only be detected by looking at the source, because it is not clear how to invoke them from outside and how to derive from an execution trace whether these features are present or not. However, we assume that externally visible features are generally more important.

The invocation of externally visible features is comparatively simple when a graphical user interface is available (as was the case in our case study). Then, usually only a menu selection or a similar interaction is necessary. In the case of a batch system, one may vary command line switches and may have to provide different sets of test data to invoke a feature. However, in order to find suitable test data, one might need some knowledge of internal details of a system.

The implementation of this technique was surprisingly simple. We opportunistically put together a set of publicly available tools and wrote a few Perl scripts (140 LOC in total) for interoperability, which took us just one day. A drawback of our simple implementation is that one has to run the system from the beginning for each usage scenario to get an execution trace for each feature. A more sophisticated environment would allow one to start and end recording traces at any time.

Our implementation only counts subprogram calls and ignores accesses to global variables and single statements or expressions. It might be useful to analyze at a finer granularity when subprograms are interleaved, i.e., when different strands of control with different functionality are united in a single subprogram, possibly for efficiency reasons. For instance, we found a subprogram in our case study that draws different kinds of objects. The function contained a large switch statement whose branches drew the specific kinds of objects. In the execution trace, this subprogram showed up for all objects, whereas in fact only specific parts of it were actually executed.

Furthermore, the success of the described approach heavily depends on the clever choice of usage scenarios and their combination. Scenarios that cover too much functionality in one step or a clumsy combination of scenarios will result in huge and complex lattices that are unreadable for humans. Moreover, the number of usage scenarios increases tremendously when features are combined.

In our case study, the method provided us with valuable insights. The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively; e.g., the abilities to draw text and circles/ellipses are widely independent of the other shapes. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation of Xfig. The lattice also classified components according to their abstraction level, which is useful information for reuse; general components can be found at the lower level, specific components at the upper level. Moreover, the lattice showed dependencies among components, which need to be known when components are to be extracted.

As future work, we want to explore how results obtained by the method described in this paper may be combined with results of additional static analyses. For example, we want to investigate the relation between the concept lattice based on dynamic information and static software architecture recovery techniques.

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., 'Transitioning Legacy Assets - a Product Line Approach', Proceedings of the SIGSOFT Foundations of Software Engineering, Toulouse, pp. 446-463, Association for Computing Machinery (ACM), 1999.

[2] Bosch, J., 'Product-Line Architectures in Industry: A Case Study', Proc. of the 21st International Conference on Software Engineering (ICSE'99), Los Angeles, CA, USA, pp. 544-554, May 1999.

[3] Brandenburg, F.J., 'Graphlet', Universität Passau, http://www.infosun.fmi.uni-passau.de/Graphlet/.

[4] Canfora, G., Cimitile, A., De Lucia, A., and Di Lucca, G.A., 'A Case Study of Applying an Eclectic Approach to Identify Objects in Code', Workshop on Program Comprehension, pp. 136-143, Pittsburgh, 1999, IEEE Computer Society Press.

[5] Chen, K. and Rajlich, V., 'Case Study of Feature Location Using Dependence Graph', Proc. of the 8th Int. Workshop on Program Comprehension, pp. 241-249, June 10-11, 2000, Limerick, Ireland, IEEE Computer Society Press.

[6] Graudejus, H., Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code, master's thesis, University of Kaiserslautern, Germany, 1998.

[7] Lindig, C. and Snelting, G., 'Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis', Proc. of the Int. Conference on Software Engineering, pp. 349-359, Boston, 1997.

[8] Lindig, C., Concepts, ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.

[9] Koschke, R., 'Atomic Architectural Component Recovery for Program Understanding and Evolution', Dissertation, Institut für Informatik, Universität Stuttgart, 2000, http://www.informatik.uni-stuttgart.de/ifi/ps/rainer/thesis.

[10] Krone, M. and Snelting, G., 'On the Inference of Configuration Structures From Source Code', Proc. of the Int. Conference on Software Engineering, pp. 49-57, May 1994, IEEE Computer Society Press.

[11] Perry, D., 'Generic Architecture Descriptions for Product Lines', Proc. of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.

[12] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F., 'Applying Concept Formation Methods to Object Identification in Procedural Code', Proc. of the Conference on Automated Software Engineering, Nevada, pp. 210-218, November 1997, IEEE Computer Society.

[13] Siff, M. and Reps, T., 'Identifying Modules via Concept Analysis', Proc. of the Int. Conference on Software Maintenance, Bari, pp. 170-179, October 1997, IEEE Computer Society.

[14] Snelting, G., 'Reengineering of Configurations Based on Mathematical Concept Analysis', ACM Transactions on Software Engineering and Methodology 5(2), pp. 146-189, April 1996.

[15] Snelting, G. and Tip, F., 'Reengineering Class Hierarchies Using Concept Analysis', Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 99-110, November 1998.

[16] Staudenmayer, N.S. and Perry, D.E., 'Session 5: Key Techniques and Process Aspects for Product Line Development', Proc. of the 10th International Software Process Workshop, June 1996, Ventron, France.

[17] Wilde, N. and Scully, M.C., 'Software Reconnaissance: Mapping Program Features to Code', Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.

[18] Xfig system, http://www.xfig.org.

Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon

University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product line based on existing assets. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as members of a product line promises advantages, like higher potential for reuse, lower costs, and shorter time to market. There are many approaches to newly developing product lines from scratch [2, 10]. However, according to Martinez [15], most successful examples of product lines at Motorola originated in a single separate product. Only in the course of time did a shared architecture for a product line evolve. Moreover, large investments impose a reluctance against introducing a product line approach that ignores existing assets. Hence, introducing a product line approach generally has to cope with existing code.

Reverse engineering helps create a product line from existing systems by identifying and analyzing the components and deriving the individual architectures. These can then be unified into a product line architecture, which is populated by the derived components.

As stated by Bayer et al. [1], early reverse engineering is needed to derive first coarse information on existing assets, needed by a product line analyst to set up a suitable product line architecture.

One important piece of information for a product line analysis that tries to integrate existing assets is the so-called feature component map, which describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) requirement (the term feature is intentionally weakly defined because its exact meaning depends on the specific context). Components are computational units of a software architecture.

On the basis of the feature component map and additional economic reasons, a decision is made for particularly interesting and required components, and further expensive analyses can be aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is automatic to a great extent.

The remainder of this article is organized as follows. Section 2 gives an overview, Section 3 explains how concept analysis can be used to derive the feature component map, and Section 4 describes our experience with this technique in an example. Section 5 references related research, and Section 6 concludes the paper.

2. Overview

The technique described here is based on the execution traces generated by a profiler for different usage scenarios. One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

Concept Analysis. Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. The detailed mathematical background of concept analysis can be found in [7, 13, 14].

3. Feature Component Map

In order to derive the feature component map via concept analysis, one has to define the formal context (objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Feature and Components

The set of relevant features F will be determined by the product line experts. We consider all the system's subprograms a set of components C. A component corresponds to an object of the formal context, whereas a feature will be considered an attribute.

The relation R for the formal context necessary for concept analysis is defined as follows (where c ∈ C, f ∈ F):

(c, f) ∈ R if and only if component c is required when feature f is invoked; a subprogram is required when it needs to be executed.

R can be visualized using a relation table as shown in Figure 1:

      f1  f2  f3  f4  f5  f6  f7  f8
c1    ✕   ✕
c2            ✕   ✕   ✕
c3            ✕   ✕       ✕   ✕   ✕
c4            ✕   ✕   ✕   ✕   ✕   ✕

Figure 1. Relation Table

The resulting concept lattice is shown in Figure 2. We use the sparse representation for visualization, showing an attribute/feature at the uppermost concept in the lattice where it is required (so the attributes spread from this node down to the bottom). For a feature f, this node is denoted by µ(f). Analogously, a node is marked with an object/component c ∈ C in the sparse representation if it represents the most special concept that has c in its extent. This unique node is denoted by γ(c). Hence, an object/component c spreads from the node γ(c), to which it is attached, up to the top.

[Figure 2. Concept Lattice (sparse representation); node labels: f1, f2 with c1; f3, f4; f5 with c2; f6, f7, f8 with c3; c4 below.]

In order to ascertain the relation table, a set of usage scenarios needs to be prepared where each scenario triggers exactly one relevant feature. (It is possible to combine multiple features into one scenario, making the interpretation of the resulting concept lattice more complicated; this is beyond the scope of this paper.) Then the system is used according to the set of usage scenarios. For each usage scenario, the execution trace is recorded.

An execution trace contains all called subprograms for a usage scenario or an invoked feature, respectively. Hence, each system run yields all required components for a single scenario that exploits one feature. A single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the previous section gives a lattice, from which interesting relationships can be derived. These relationships can be fully automatically derived and presented to the analyst such that the complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships.

The following base relationships can be derived from the sparse representation of the lattice:

• A component, c, is required for all features at and above γ(c) in the lattice.

• A feature, f, requires all components at and below µ(f) in the lattice.

• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).

• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∨ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.

• Components jointly required for two features, f1 and f2, are described by µ(f1) ∧ µ(f2); graphically depicted, one ascertains in the lattice the closest common node toward the bottom element starting at the nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.

• Components required for all features can be found at the bottom element.

• Features that require all components can be found at the top element.

The information described above can be derived by a tool and fed back to the product line expert. As soon as a decision is made to re-use certain features, all components required for these features (easily derived from the concept lattice) form a starting point for further static analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate effort for subsequent steps (wrapping, reengineering, or re-development from scratch).


3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:

• Gnu C compiler gcc to compile the system using a command line switch for generating profiling information,

• Gnu object code viewer nm,

• Gnu profiler gprof,

• concept analysis tool concepts [8],

• graph editor Graphlet [3] to visualize the concept lattice,

• and a short Perl script to ascertain the executed functions in the execution trace and to convert the file formats of concepts and Graphlet (the script has just 225 LOC).

The fact that the subprograms are extracted from the object code makes the implementation independent from the programming language to a great extent (as long as the language is compiled to object code) and has the advantage that no additional compiler front end is necessary.

4. Example

We analyzed the Xfig system [17] (version 3.2.1), consisting of about 76 KLOC written in the programming language C.

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our example was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig.

We conducted two experiments. In the first one, we investigated the ability to draw different shapes only. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features composed of basic features. For the second experiment, a shape was drawn and then modified. Both draw and modify constitute basic features. Combined features add to the effort needed to derive the feature component map as there are many possible combinations.

The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation. The lattice also classified components according to their abstraction level; general components can be found at the lower level, specific components at the upper level. Moreover, the lattice showed dependencies among components.

Figure 3 shows a partial view of the concept lattice generated for Xfig. Node #38 groups all components needed for drawing ellipses and circles (both by diameter and radius). Nodes #41, #44, #42, and #43 contain the components for the more specific shape types.

We found that applying our method is easy in principle. However, running all scenarios by hand is time consuming. It may be facilitated by the presence of test cases that allow an automated replay of various scenarios.

5. Related Research

Snelting has recently introduced concept analysis to software engineering. Since then it has been used to evaluate class hierarchies [14], explore configuration structures of preprocessor statements [9, 13], and to recover components [4, 6, 7, 11, 12].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to quickly and cheaply derive the feature component map. In contrast, our method treats the system as a black box.

[Figure 3. Partial View of Xfig's concept lattice]

Wilde and Scully [16] also use dynamic analysis to localize features. They focus on localizing rather than deriving required components.

Our technique goes beyond Wilde and Scully's technique in that it also allows relevant relationships between components and features to be derived by means of concept analysis, whereas Wilde and Scully's technique only localizes a feature. The derived relationships are important information to product line experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions and Future Work

A feature component map describes which components are required to implement a particular feature and is needed at an early stage within a process toward a product line platform

• to weigh alternative platform architectures,

• to aim further tasks – like quality assessment – at only those existing components that are needed to populate the platform architecture,

• and to decide on further steps, like reengineering or wrapping.

The technique presented in this paper yields the feature component map automatically using the execution traces for different usage scenarios. The technique is based on concept analysis, a mathematically sound technique to analyze binary relations, which has the additional benefit of revealing not only correspondences between features and components, but also commonalities and variabilities between features and components.

The success of the described approach heavily depends on the clever choice of usage scenarios and the combination of them. Scenarios that cover too much functionality in one step or the clumsy combination of scenarios will result in huge and complex lattices that are unreadable for humans.

As future work, we want to explore how results obtained by the method described in this paper may be combined with results of additional static analyses. For example, we want to investigate the relation between the concept lattice based on dynamic information and static software architecture recovery techniques.

Our experiments suggest that investigating automatic analyses of the lattice we described here is worth further effort. Dealing with scenarios covering multiple features should be investigated in more depth.

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., ‘Transitioning Legacy Assets – a Product Line Approach’, Proceedings of the SIGSOFT Foundations of Software Engineering, pp. 446-463, September 1999.

[2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Study’, Proc. of the 21st International Conference on Software Engineering (ICSE’99), pp. 544-554, May 1999.

[3] Brandenburg, F.J., ‘Graphlet’, Universität Passau, http://www.infosun.fmi.uni-passau.de/Graphlet/.

[4] Canfora, G., Cimitile, A., De Lucia, A., and Di Lucca, G.A., ‘A Case Study of Applying an Eclectic Approach to Identify Objects in Code’, Workshop on Program Comprehension, pp. 136-143, May 1999.

[5] Chen, K. and Rajlich, V., ‘Case Study of Feature Location Using Dependence Graph’, Proc. of the 8th Int. Workshop on Program Comprehension, pp. 241-249, June 2000.

[6] Graudejus, H., Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code, master’s thesis, University of Kaiserslautern, Germany, 1998.

[7] Lindig, C. and Snelting, G., ‘Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis’, Proceedings of the International Conference on Software Engineering, pp. 349-359, May 1997.

[8] Lindig, C., Concepts, ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.

[9] Krone, M. and Snelting, G., ‘On the Inference of Configuration Structures From Source Code’, Proceedings of the International Conference on Software Engineering, pp. 49-57, May 1994.

[10] Perry, D., ‘Generic Architecture Descriptions for Product Lines’, Proceedings of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.

[11] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F., ‘Applying Concept Formation Methods to Object Identification in Procedural Code’, Proceedings of the Conference on Automated Software Engineering, pp. 210-218, November 1997.

[12] Siff, M. and Reps, T., ‘Identifying Modules via Concept Analysis’, Proceedings of the International Conference on Software Maintenance, pp. 170-179, October 1997.

[13] Snelting, G., ‘Reengineering of Configurations Based on Mathematical Concept Analysis’, ACM Transactions on Software Engineering and Methodology 5, 2, pp. 146-189, April 1997.

[14] Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies Using Concept Analysis’, Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 99-110, November 1998.

[15] Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Techniques and Process Aspects for Product Line Development’, Proceedings of the 10th International Software Process Workshop, June 1996.

[16] Wilde, N. and Scully, M.C., ‘Software Reconnaissance: Mapping Program Features to Code’, Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.

[17] Xfig system, http://www.xfig.org.

We submitted a paper describing this idea to CSMR.

Page 13: ICSM'01 Most Influential Paper - Rainer Koschke


Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon

University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product family based on existing components. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as product families promises several advantages over relatively expensive separate developments, like lower costs and shorter time for development, test, and maintenance. These advantages are based on the fact that all family members share a common infrastructure – also known as platform architecture. There are many approaches to newly developing product families from scratch [2, 11]. However, according to Martinez [16], most successful examples of product families at Motorola originated in a single separate product. Only in the course of time did a shared architecture for a product family evolve. Moreover, large investments make organizations reluctant to introduce a product family approach that ignores existing assets. Hence, an introduction of a product family approach generally has to cope with existing code.

Reverse engineering may help create a product family from existing systems by identifying and analyzing the components and also by deriving the individual architecture from each system. These individual architectures may then be unified into a platform architecture, and the derived components may be used to populate the unified architecture. To this end, code needs to be adjusted, reengineered, or wrapped. However, changing or wrapping the code is only done in very late phases of moving toward a product family. Reverse engineering can also assist in earlier phases and, thus, Bayer et al. rightly demand an early integration of reverse engineering into a product family approach [1]. Early reverse engineering is needed to derive first coarse information on existing system components (assets), needed in good time by a product family analyst to investigate feasibility and to estimate costs of different alternative ways to get to a suitable product family architecture.

One important piece of information for a product family analysis that tries to integrate existing assets is the so-called feature component map, which describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) requirement (the term feature is intentionally weakly defined because its exact meaning depends on the specific context). Components are computational units of a software architecture (see Section 3.1). Because the feature component map is needed very early to trade off alternatives in good time, complete and hence time-consuming reverse engineering of the system is out of the question. In particular, the decision for a certain alternative will in many cases lead to a consolidation on specific economically important core components and hence to an exclusion of less important components. Any investment in a deep and costly pre-analysis of less important components would be in vain to a large degree. Instead, reverse engineering in early phases should give information on the feature component map quickly and with simple means. To this end, the product line analyst imparts all relevant features, for which the necessary components need to be detected, to the reverse engineer, who in turn delivers the feature component map. On the basis of the feature component map and additional economic reasons, a decision is made for particularly interesting and required components, and further expensive analyses regarding quality can be cost-effectively aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is automatic to a great extent. Concept analysis is a mathematical technique to investigate binary relations (see Section 2).

Integration into a Product Family Process. A simple process for feature-based reengineering toward product families can be described as follows:

1. The economically relevant features are ascertained by product family engineers and market analysts.

2. The feature component map is derived based on the identified relevant features.

3. The previously derived feature component map gives additional insights into dependencies among features and components and, hence, into feasibility and costs of different alternative product family platforms. The knowledge gained from the feature component map and additional economic considerations may lead to a further selection of only a certain subset of all features and their corresponding components.

4. The selected components are more closely analyzed, for instance, with respect to maintainability, extractability, and integrability.

5. A product family platform is designed. Alternatives for components to populate the product family platform are weighed: component extraction and reengineering, new development, integration of COTS, or wrapping.

6. A migration plan is prepared.

The technique described in this article is used to derive the feature component map, which plays a central role early in this process.

Overview. The technique described here is based on the execution traces generated by a profiler for different usage scenarios (see Figure 1). One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components (or are themselves considered components) required for a certain feature. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

[Figure 1. Overview: a usage scenario for feature F yields an execution trace with the required components C1 … Cn; the pairs (F, C1), …, (F, Cn) ∈ R are input to concept analysis, which produces the feature component map and dependencies.]

We want to point out that not all non-functional requirements, e.g., time constraints, can be easily mapped to components, i.e., our technique primarily aims at functional features. However, in some cases, it is possible to isolate non-functional aspects, like security, in code and map them to specific components. For instance, one could concentrate all network accesses in one single component to enable controlled secure connections.

The remainder of this article is organized as follows. Section 2 introduces concept analysis. Section 3 explains how concept analysis can be used to derive the feature component map and Section 4 describes our experience with this technique in a case study. Section 5 discusses related research.

2. Concept Analysis

Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. It has already been successfully used in other fields of software engineering. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. This section describes concept analysis in more detail.

Concept analysis is based on a relation R between a set of objects 𝒪 and a set of attributes 𝒜, hence R ⊆ 𝒪 × 𝒜.

The tuple C = (𝒪, 𝒜, R) is called formal context. For a set of objects, O ⊆ 𝒪, the set of common attributes, σ, is defined as:

σ(O) = { a ∈ 𝒜 | ∀o ∈ O: (o, a) ∈ R }

Analogously, the set of common objects, τ, for a set of attributes, A ⊆ 𝒜, is defined as:

τ(A) = { o ∈ 𝒪 | ∀a ∈ A: (o, a) ∈ R }

In Section 3.1, the formal context for applying concept analysis to derive the feature component map will be laid down as follows:

• components will be considered objects,

• features will be considered attributes,

• a pair (component c, feature f) is in relation R if c is executed when f is invoked.

However, here – for the time being – we will use as an abstract example the binary relation between arbitrary objects and attributes shown in Table 1. An object oi has attribute aj if row i and column j are marked with an ✕ in Table 1 (the example stems from Lindig and Snelting [7]):

      a1  a2  a3  a4  a5  a6  a7  a8
o1    ✕   ✕
o2            ✕   ✕   ✕
o3            ✕   ✕       ✕   ✕   ✕
o4            ✕   ✕   ✕   ✕   ✕   ✕

Table 1: Example relation.

For instance, the following equations hold for this table, also known as a relation table:

σ({o1}) = {a1, a2}   and   τ({a7, a8}) = {o3, o4}

A pair (O, A) is called a concept if

A = σ(O) ∧ O = τ(A)

holds, i.e., all objects share all attributes. For a concept c = (O, A), O is the extent of c, denoted by extent(c), and A is


the intent of c, denoted by intent(c).

Informally, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. For example, Table 2 contains the concepts for the relation in Table 1:

C1  ({o1, o2, o3, o4}, ∅)
C2  ({o2, o3, o4}, {a3, a4})
C3  ({o1}, {a1, a2})
C4  ({o2, o4}, {a3, a4, a5})
C5  ({o3, o4}, {a3, a4, a6, a7, a8})
C6  ({o4}, {a3, a4, a5, a6, a7, a8})
C7  (∅, {a1, a2, a3, a4, a5, a6, a7, a8})

Table 2: Concepts for Table 1.

The set of all concepts of a given formal context forms a partial order via:

(O1, A1) ≤ (O2, A2) ⇔ O1 ⊆ O2

or, equivalently, with

(O1, A1) ≤ (O2, A2) ⇔ A1 ⊇ A2.

If c1 ≤ c2 holds, then c1 is called a subconcept of c2 and c2 is called a superconcept of c1. For instance, ({o2, o4}, {a3, a4, a5}) ≤ ({o2, o3, o4}, {a3, a4}) holds in Table 2.

The set of all concepts of a given formal context and the partial order ≤ form a complete lattice, called the concept lattice L:

L(C) = { (O, A) ∈ 2^𝒪 × 2^𝒜 | A = σ(O) ∧ O = τ(A) }

The infimum of two concepts in this lattice is computed by intersecting their extents as follows:

(O1, A1) ∧ (O2, A2) = (O1 ∩ O2, σ(O1 ∩ O2))

The infimum describes a set of common attributes of two sets of objects. Similarly, the supremum is determined by intersecting the intents:

(O1, A1) ∨ (O2, A2) = (τ(A1 ∩ A2), A1 ∩ A2)

The supremum ascertains the set of common objects, which share all attributes in the intersection of two sets of attributes.

Graphically, the concept lattice for the example relation in Table 1 can be represented as a directed acyclic graph whose nodes represent concepts and whose edges denote the superconcept/subconcept relation < as shown in Figure 2. The most general concept is called the top element and is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥.

[Figure 2. Concept lattice for Table 1: nodes C1–C7 from Table 2, with C1 = ⊤ and C7 = ⊥.]

The combination of the graphical representation in Figure 2 and the contents of the concepts in Table 2 together form the concept lattice. The complete information can be visualized in a more readable, equivalent way by marking a graph node with an attribute a ∈ 𝒜 only if its represented concept is the most general concept that has a in its intent. Analogously, a node will be marked with an object o ∈ 𝒪 if it represents the most special concept that has o in its extent. The unique element µ in the concept lattice marked with a is therefore:

µ(a) = ∨ { c ∈ L(C) | a ∈ intent(c) }   (1)

The unique element γ marked with object o is:

γ(o) = ∧ { c ∈ L(C) | o ∈ extent(c) }   (2)

We will call a graph representing a concept lattice using this marking strategy a sparse representation. The equivalent sparse representation for Figure 2 is shown in Figure 3. The content of a node N in this representation can be derived as follows:

• the objects of N are all objects at and below N,

• the attributes of N are all attributes at and above N.

[Figure 3. Sparse representation of Figure 2; node labels from top to bottom: a1, a2 with o1; a3, a4; a5 with o2; a6, a7, a8 with o3; o4.]

For instance, the node in Figure 3 marked with o2 and a5 is the concept ({o2, o4}, {a3, a4, a5}).
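To make these definitions concrete, the following small Python sketch (our illustration, not part of the paper's tool chain; the names R, sigma, and tau are ours) enumerates the concepts of the example relation in Table 1 by closing subsets of objects under σ and τ:

    from itertools import combinations

    # Example relation from Table 1: object -> set of attributes.
    R = {
        "o1": {"a1", "a2"},
        "o2": {"a3", "a4", "a5"},
        "o3": {"a3", "a4", "a6", "a7", "a8"},
        "o4": {"a3", "a4", "a5", "a6", "a7", "a8"},
    }
    objects = set(R)
    attributes = set().union(*R.values())

    def sigma(objs):
        # Common attributes of a set of objects; sigma({}) is all attributes.
        return set.intersection(*(R[o] for o in objs)) if objs else set(attributes)

    def tau(attrs):
        # Common objects of a set of attributes.
        return {o for o in objects if attrs <= R[o]}

    # (O, A) is a concept iff A = sigma(O) and O = tau(A); closing every
    # subset of objects under tau(sigma(...)) yields all concepts.
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(sorted(objects), r):
            extent = frozenset(tau(sigma(set(objs))))
            concepts.add((extent, frozenset(sigma(extent))))

    for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))

Run on Table 1, this prints the seven concepts C1 to C7 of Table 2, with the top element ({o1, o2, o3, o4}, ∅) first.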

3. Feature Component Map

In order to derive the feature component map via concept analysis, one has to define the formal context (objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Feature and Components

Components will be considered objects of the formal context, whereas features will be considered attributes. Note that in the reverse case, the concept lattice is simply inverted but the derived information will be the same.

The set of relevant features will be determined by the product family experts. For components, we can consider the following alternatives depending on how much knowledge on the system architecture is already available:


1. cohesive modules and subsystems as defined and documented by the system's architects or re-gained by reengineers; modules and subsystems will be considered composite components in the following;

2. physical modules, i.e., modules as defined by means of the underlying programming language or simply directly available as existing files (the distinction to cohesive modules is that one does not know a priori whether physical modules really group cohesive declarations; physical modules are the unscrutinized result of a programmer's way of grouping declarations, whether it makes sense or not);

3. subprograms, i.e., functions and procedures, and global variables of the system; subprograms and global variables will be called low-level components in the following.

Ideally, one will use alternative (1) when reliable and complete documentation exists. However, if cohesive modules and subsystems are not known in advance, one would hardly make the effort to analyze a large system to obtain these in order to apply concept analysis to get the feature component map, because it is not yet clear which components are relevant at all, and reverse engineering of the complete system first will likely not be cost-effective. Only later, if the retrieved feature component map (using simpler definitions of components, like those in (2) or (3)) clearly shows which lower-level components should be investigated further to obtain composite components, may reverse engineering generally pay off (in order to detect cohesive modules, we have developed a semi-automatic method integrating many automatic state-of-the-art techniques [9]).

Alternative (2) can be chosen if suitable documentation is not available but there is reason to trust the programmers of the system to a great extent. In all other cases, one will fall back on alternative (3). However, for alternative (3), concept analysis may additionally yield hints on sets of related subprograms forming composite components.

The relation for the formal context necessary for concept analysis is defined as follows:

(C, F) ∈ R if and only if component C is required when feature F is invoked; a subprogram is required when it needs to be executed; a global variable is required when it is accessed (used or changed); a composite component is required when one of its parts is required.

In order to obtain the relation, a set of usage scenarios needs to be prepared where each scenario exploits preferably only one relevant feature. Then the system is used according to the set of usage scenarios, one at a time, and the execution traces are recorded. An execution trace contains all required low-level components for a usage scenario or an invoked feature, respectively. If composite components are used for concept analysis, the execution trace containing the required low-level components induces an execution trace for composite components by replacing each low-level component with the composite component to which it belongs. Hence, each system run yields all required components for a single scenario that exploits one feature. Thus, a single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.

An execution trace can be recorded by a profiler. However, most profilers only record subprogram calls but not accesses to variables. Instead of using a symbolic debugger, for example, that allows setting watchpoints on variable accesses, or even instrumenting the code if no sophisticated profiler is available, one can also use a simple static dependency analysis: one considers all variables directly and statically accessed by each executed subprogram also to be dynamically accessed (all transitively accessed variables will automatically be considered because all executed subprograms are examined). In practice, this analysis may be a sufficient approximation. But one should be aware that it may overestimate references, because variable accesses may be included that are on paths not executed at runtime, and it will also ignore references to variables by means of aliases if the simple static dependency analysis does not take aliasing into account. For a first analysis to obtain a simplified feature component map, one can also ignore variables and come back to these in a later phase using more sophisticated dynamic or static analyses.
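As an illustration of this step, the following Python sketch (ours, not the authors' Perl scripts) builds the relation table from one trace file per usage scenario; the assumed layout, a directory traces/ with one <feature>.trace file listing the executed subprograms one per line, is hypothetical:

    import glob
    import os

    def read_trace(path):
        # One executed subprogram per line; blank lines are ignored.
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    # Hypothetical layout: traces/<feature>.trace, one file per scenario.
    relation = {}  # feature -> set of required components
    for path in glob.glob("traces/*.trace"):
        feature = os.path.splitext(os.path.basename(path))[0]
        relation[feature] = read_trace(path)

    # Emit the relation table: one row per component, one mark per feature.
    features = sorted(relation)
    components = sorted(set().union(*relation.values()))
    print("features:", ", ".join(features))
    for c in components:
        marks = " ".join("x" if c in relation[f] else "." for f in features)
        print(c.ljust(32), marks)

Each trace file corresponds to one column of the relation table, mirroring the one-column-per-system-run observation above.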

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the last section gives a lattice, from which interesting relationships can be derived. These relationships can be fully automatically derived and presented to the analyst such that the more complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships. This section explains how interesting relationships can be automatically derived.

As already abstractly described in Section 2, the following base relationships can be derived from the sparse representation of the lattice (note the duality in the interpretation):

• A component, c, is required for all features at and above γ(c) – as defined by (2) – in the lattice.

• A feature, f, requires all components at and below µ(f) – as defined by (1) – in the lattice.

• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).

• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∨ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.

• Components jointly required for two features, f1 and f2, are described by µ(f1) ∧ µ(f2); graphically depicted, one ascertains in the lattice the closest common node toward the bottom element starting at the nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.

• Components required for all features can be found at the bottom element.

• Features that require all components can be found at the top element.

• If the top element does not contain features, then all components in the top element are superfluous (such components will not exist when the set of objects for concept analysis contains only components executed at least once, which is the case if a filter ignores all subprograms for which the profiler reports an execution count of 0).

• If the bottom element does not contain any component, all features in the bottom element are not implemented by the system (this constellation will not exist if there is a usage scenario for each feature and every usage scenario is appropriate and relevant to the system; a system may indeed not have all features, i.e., a usage scenario may be meaningless for a given system).

Beyond these relationships between components and features, further useful aspects between features on one hand and between components on the other hand may be derived:

• If γ(c1) < γ(c2) holds for two components c1 and c2, then component c2 requires component c1.

• If µ(f1) < µ(f2) holds for two features f1 and f2, then feature f1 is based on feature f2.

One has to note that the latter relationship between features safely holds for the analyzed system only, i.e., this relationship is not necessarily true for the features as such, because the relationship was derived only from a specific implementation.

The information described above can be derived by a tool and fed back to the product family expert. As soon as a decision is made to re-use certain features, all components required for these features (easily derived from the concept lattice) form a starting point for further analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate effort for subsequent steps (wrapping, reengineering, or re-development from scratch).
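Such a tool step can be sketched as follows (our illustration; concepts is assumed to be the output of a concept-analysis run, each concept a pair of extent and intent as in Table 2). It computes γ(c) and µ(f) and reads off the component-component and feature-feature dependencies stated above:

    # A concept is a pair (extent, intent) of frozensets, e.g. the output
    # of the enumeration sketch shown in Section 2.

    def leq(c1, c2):
        # Subconcept order: (O1, A1) <= (O2, A2) iff O1 is a subset of O2.
        return c1[0] <= c2[0]

    def gamma(component, concepts):
        # Most special concept with the component in its extent
        # (assumes the component occurs in at least one extent).
        return min((c for c in concepts if component in c[0]),
                   key=lambda c: len(c[0]))

    def mu(feature, concepts):
        # Most general concept with the feature in its intent.
        return max((c for c in concepts if feature in c[1]),
                   key=lambda c: len(c[0]))

    def component_requires(c1, c2, concepts):
        # c2 requires c1 iff gamma(c1) < gamma(c2).
        g1, g2 = gamma(c1, concepts), gamma(c2, concepts)
        return g1 != g2 and leq(g1, g2)

    def feature_based_on(f1, f2, concepts):
        # f1 is based on f2 iff mu(f1) < mu(f2).
        m1, m2 = mu(f1, concepts), mu(f2, concepts)
        return m1 != m2 and leq(m1, m2)

Because the concepts containing a given object in their extent are closed under the infimum, the candidate with the smallest extent is exactly γ(o); the dual argument justifies the computation of µ(a).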

3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:

• Gnu C compiler gcc to compile the system using a command line switch for generating profiling information,

• Gnu object code viewer nm and a short Perl script in order to identify all functions of the system (as opposed to those included from standard libraries),

• Gnu profiler gprof and a short Perl script to ascertain the executed functions in the execution trace,

• concept analysis tool concepts [8],

• graph editor Graphlet [3] to visualize the concept lattice,

• and two more short Perl scripts to convert the file formats of concepts and Graphlet (all Perl scripts together have just 147 LOC).

The fact that the subprograms are extracted from the object code makes the implementation independent from the programming language to a great extent (as long as the language is compiled to object code) and has the advantage that no front end is necessary. On the other hand, because a compiler may replace source names by link names in the object code (for instance, C++ compilers use name mangling to resolve overloading), there is not always a direct mapping from the subprograms in the execution trace back to the original source. Because we dealt with C code in our case study, object code names were identical to source names. If this is not the case, one either tolerates divergences between names (mostly, names are similar enough) or has to reverse name mangling.
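The symbol-extraction step can be approximated as in the following sketch (ours, in Python rather than the original Perl; the executable name xfig is an assumption). GNU nm flags symbols defined in the text section with 'T' or 't', while symbols imported from shared libraries show up as undefined ('U') and are skipped:

    import subprocess

    def defined_functions(binary="xfig"):
        # Lines from GNU nm look like "0804a1b0 T draw_ellipse";
        # undefined symbols ("U printf") have no address and are skipped.
        out = subprocess.run(["nm", binary], capture_output=True,
                             text=True, check=True).stdout
        functions = set()
        for line in out.splitlines():
            parts = line.split()
            if len(parts) == 3 and parts[1] in ("T", "t"):
                functions.add(parts[2])
        return functions

For C++ binaries, one would additionally demangle the names (e.g., with nm -C) to recover the source names discussed above.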

4. Case Study

As a case study, we analyzed the Xfig system [18] (version 3.2.1), consisting of about 76 KLOC written in the programming language C. In this section, we first present a general overview of the results and then go into further detail on particularly interesting observations.

[Figure 4. Lattice for the first experiment. The taller a concept is drawn, the more components it contains; the framed region covering circles and ellipses is detailed in Figure 6.]

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our case study was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig. Hence, we considered the following two capabilities as main features:

1. the ability to draw different shapes (lines, curves, rectangles, etc.)

2. the ability to modify shapes in different editing modes (rotate, move, copy, scale, etc.)

We conducted two experiments. In the first one, we investigated the ability to draw different shapes only. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features composed of basic features. For the second experiment, a shape is drawn and then modified. Both draw and modify constitute basic features. Combined features add to the effort needed to derive the feature component map as there are many possible combinations.

In both experiments, we considered subprograms as components. However, in our simple implementation, we do not handle variable accesses. Hence, not all required low-level components are detected.

The resulting concepts contain subprograms grouped together according to their usage for features. Note that the more general subprograms can be found at the lower concepts in the lattice since they are used for many features, while specific components are in the upper region of the lattice. Hence, the concept lattice also reflects the level of abstraction of these subprograms within the given set of scenarios. To identify all subprograms required for a single feature or a set of features, one can then analyze the concept lattice as described in Section 3.2.

First experiment. In our first experiment, we prepared 15 scenarios. Each scenario invokes Xfig, performs the drawing of one of the objects Xfig provides, and then terminates Xfig, i.e., the aspects above were not combined and no other functionality of Xfig was used. We used all shapes of Xfig's drawing panel shown in Figure 5 except picture objects and library objects.

The resulting lattice for this experiment is shown in Figure 4. The contents of the concepts in the lattice are omitted for readability reasons. However, their size in this picture is a linear function of their number of components (except for the bottom element, which contains 136 components, mostly initialization and GUI code and very basic functions, and was too large to be drawn accordingly; as a comparison point: the text drawing concept, marked as node #1, has 29 components). As Figure 4 shows, there are a few concepts containing most of the components (i.e., subprograms) of the system. The lattice contains 47 concepts. 26 of them introduce at least one new component, i.e., to these nodes, a component is attached (more precisely, a concept C introduces a component if there exists a component c for which γ(c) = C holds). 21 of the concepts do not introduce any new component and merely merge functionality needed by several superconcepts.

[Figure 5. Xfig's object shapes: circle by diameter, circle by radius, ellipse by diameters, ellipse by radii, closed approximated spline, approximated spline, closed interpolated spline, interpolated spline, polygon, polyline, rectangular box, rectangular box with rounded corners, arc, text, regular polygon, picture object, library object.]

The first interesting observation is that concepts with many components can be found in the upper region, while in the lower region, the number of components decreases and the number of interferences increases (an interference leads to an unstructured lattice; a lattice is said to be structured if it can be decomposed into independent sublattices that are connected via the top and bottom elements only). That is to say, there are many specific operations and few shared operations, and shared operations are really used for many features.

Concept #1 in Figure 4 is the largest concept (excluding the bottom element). It exploits a single feature, “draw text object”. According to the lattice, the feature is largely independent from other features and shares only a few components with other features.

Concept #5 represents the two features “draw polyline” and “draw polygon”. The only difference between these two features is that an additional line is drawn that closes a polygon. This difference is not visible in the concept lattice since the two features are attached to the same concept. The distinction is made in the body of the function that is called to draw either a polygon or a polyline. Concept #3 denotes the feature “draw spline”. Concept #4 has no feature attached and represents the components shared for drawing polygons, polylines, and splines. These components are no real drawing operations but operations to keep a log of the points set by the user and to draw lines between set points while the user is still setting points (a spline first appears as a polygon and is only re-shaped when the user has set all points).

Concept #2 stands for the feature “draw arc” and concept #7 is again a concept that represents shared components for drawing elastic lines while the user is setting points. The difference between concept #7 and concept #4 is that the former only contains the components to draw the elastic line, while the latter adds the capability to set an arbitrary number of points. Arcs do not need this capability because they are defined by exactly three points.

Concept #6 represents the feature “draw lines” and is used for drawing rectangles, polygons, and polylines, as one would expect. The generality of this feature becomes immediately obvious in the concept lattice as it is located in the middle level of the lattice.

The framed area in Figure 4 has a simpler structure than the rest of the lattice. This part deals with circles and ellipses and its details are shown in Figure 6. Each node, N, in Figure 6 contains two sets: the upper set contains all components attached to the node, i.e., those components, c, for which γ(c) = N; the lower set contains all features of N, including those inherited from other concepts. The names of the features correspond to the objects drawn via the panel in Figure 5; e.g., draw-ellipse-radius means that an ellipse was drawn where the radius was specified (as opposed to the diameter).

Nodes #41, #42, #43, and #44 represent the features to draw circles and ellipses using either diameter or radius. They all contain three specific components to draw the object, to plot an elastic bend while the user is drawing, and to resize the object. Note the similarity of the component names. The specific commonalities among circles and ellipses are represented by node #38, which introduces the shared components to draw circles and ellipses (both specified by diameter and radius).

Nodes #32 and #39 connect the circles and ellipses to the other objects. No components are attached to nodes #32 and #39; they only merge components from different concepts. The two nodes have a direct infimum (not shown in Figure 6) and add the same components to the circle and ellipse features. The components inherited via these two nodes are very basic components of the lowest regions of the lattice, which indicates that ellipses and circles are widely separate from all other objects.

Second experiment. In a second experiment, we analyzed the edit mode rotate, which comes in two variants: clockwise and counterclockwise. The first ten shapes in Figure 5 were drawn and rotated once clockwise and once counterclockwise, which resulted in 20 scenarios. The resulting lattice contained 55 concepts, most of which introduce no new component. We observed that the related shapes, i.e., the variants of splines, circles, ellipses, etc., were merged at the top of the lattice since they use almost the same components. In order to reduce the size of the lattice, we selected one representative among the related shapes and re-ran the experiment with three shapes (ellipse, polygon, and open approximated spline). The resulting lattice is shown in Figure 7.

[Figure 6. Relevant parts for circles and ellipses.]

This lattice consists of 22 concepts, three of which provide the specific functionality for the respective shapes. Concept #1 (21 functions) depicts the functionality for splines and concept #2 (17 functions) represents the one for lines (used for polygons). Both are dependent on concept #4 (29 functions), which groups functions related to points. Concept #3 (20 functions) denotes the ellipse feature, concept #5 (29 functions) the general drawing support functionality, and concept #6 (123 functions) the start-up and initialization code of the system.

Analyzing concepts #1, #2, and #3, we found that the shapes provide individual rotate functions. In other words, the rotate feature is implemented specifically for each shape, i.e., there is no generic component that draws all different shapes, which would have been an interesting finding in terms of reuse.

General observations. We found that applying our method is easy in principle. However, running all scenarios by hand is time consuming. It may be facilitated by the presence of test cases that allow an automated replay of various scenarios.

Because Xfig has a GUI, running a single scenario by hand is an easy task. However, one has to pay attention not to cause interferences by invoking irrelevant features. For instance, Xfig uses a balloon help facility that pops up a little window when the cursor stays some time on a sensitive area of the GUI (e.g., over the button selecting the circle drawing mode). Sometimes the balloon help mechanism triggers, introducing interferences between features. Such effects affect the analysis because they introduce spurious connections between features. Fortunately, this problem can be partly fixed by providing a specific scenario in which only the accidentally invoked irrelevant feature is invoked, which leads to a refactored concept lattice that contains a new concept that isolates the irrelevant feature and its components. In our example, interferences due to an accidentally invoked irrelevant feature appeared only at the two layers directly on top of the bottom element of the lattice, and could be more or less ignored.

5. Related Research

The mathematical foundation of concept analysis was laid by Birkhoff in 1940. It was primarily Snelting who recently introduced concept analysis to software engineering. Since then it has been used to evaluate class hierarchies [15], explore configuration structures of preprocessor statements [10, 14], and to recover components [4, 7, 12, 13].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to quickly and cheaply derive the feature component map. Moreover, the method relies on the quality of the static dependency graph. If this graph, for example, does not contain information on potential values of function pointers, the human analyst may miss functions only called via function pointers. At the other extreme, if the too conservative assumption is made that every function whose address is taken is called at each function pointer call site, the search space increases extremely. Generally, it is statically undecidable which paths are taken at runtime, so that every static analysis will yield an overestimated search space, whereas dynamic analyses tell exactly which parts are really used at runtime (though for a particular run only). However, Chen and Rajlich's technique could be helpful in a later phase, in which the system needs to be more rigorously analyzed. The purpose of our technique is to derive the feature component map. It handles the system as a black box and, hence, does not give insights into internal aspects with respect to quality and effort.

Wilde and Scully [17] also use dynamic analysis to localize features as follows:

1. The invoking input set I (i.e., a set of test cases or – in our terminology – a set of usage scenarios) is identified that will invoke a feature.

[Figure 7. Concept lattice for the second experiment (concepts #1–#6).]

2. The excluding input set E is identified that will not invoke the feature.

3. The program is executed twice, using I and E separately.

4. By comparison of the two resulting execution traces, the components can be identified that implement the feature.

Wilde and Scully focus on localizing rather than deriving required components: for deriving all required components, the execution trace for the invoking input set is sufficient. By subtracting all components in the execution trace for the excluding input set from those in the execution trace for the invoking input set, only those components remain that specifically deal with the feature.
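In set terms, this subtraction is a one-liner; a small illustration (ours, with hypothetical trace contents built from Xfig function names):

    # Components executed for the invoking (I) and excluding (E) input sets.
    invoking = {"main", "setup_ind_panel", "draw_ellipse", "elastic_ebd",
                "resizing_ebd"}
    excluding = {"main", "setup_ind_panel"}

    # Components that specifically deal with the feature.
    feature_specific = invoking - excluding
    print(sorted(feature_specific))
    # ['draw_ellipse', 'elastic_ebd', 'resizing_ebd']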

Note that our technique achieves the same effect by considering several execution traces for different features at a time. Components not specific to a feature will “sink” in the concept lattice, i.e., will be closer to the bottom element. More precisely, recall from Section 3.2 that a component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

One may argue that components that are only required to get the system started, but are not – strictly speaking – directly necessary for any feature, will still appear in the concept lattice when we do not subtract execution traces for an excluding input set. It is true that these components cannot be distinguished from components that in fact contribute to all features, because both kinds of components jointly appear at the bottom element. However, the idea of an excluding input set can be taken over to our technique to distinguish these two kinds of components by providing a usage scenario in which no feature is invoked, like simply starting and immediately shutting down the system without invoking any relevant feature. That simple trick separates the two kinds of components into two distinct concepts, C1 and C2, in the lattice, where C1 < C2 and C1 = ⊥, and C2 contains only those components that are really required for all features in a narrower sense.

Furthermore, our technique goes beyond Wilde and Scully's technique in that it also allows relevant relationships between components and features to be derived by means of concept analysis, whereas Wilde and Scully's technique only localizes a feature. The derived relationships are important information to product family experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions

A feature component map describes which components are required to implement a particular feature and is needed at an early stage within a process toward a product family platform

• to weigh alternative platform architectures,

• to aim further tasks – like quality assessment – at only those existing components that are needed to populate the platform architecture,

• and to decide on further steps, like reengineering or wrapping.

The technique presented in this paper yields the feature component map automatically using the execution traces for different usage scenarios. The technique is based on concept analysis, a mathematically sound technique to analyze binary relations, which has the additional benefit of revealing not only correspondences between features and components but also dependencies between features and between components (feature-feature dependencies are derived from an existing system and, hence, may only exist for this particular system but not necessarily for these features in general).

The technique is primarily suited for functional features that may be mapped to components. Non-functional features in particular do not easily map to components. For example, for applications for which timing is critical (because it may result in diverging behavior), the features would also have to take time into account.

Note also that the technique is not suited for features that are only internally visible, like whether a compiler uses a certain intermediate representation. Strictly speaking, internal features may be viewed as implementation details. However, such implementation details may be of interest for defining a product family architecture. Internal features can only be detected by looking at the source, because it is not clear how to invoke them from outside and how to derive from an execution trace whether these features are present or not. However, we assume that externally visible features are generally more important.

The invocation of externally visible features is comparatively simple when a graphical user interface is available (as was the case in our case study). Then, usually only a menu selection or a similar interaction is necessary. In the case of a batch system, one may vary command line switches and may have to provide different sets of test data to invoke a feature. However, in order to find suitable test data, one might need some knowledge of internal details of a system.

The implementation of this technique was surprisingly simple. We opportunistically put together a set of publicly available tools and wrote a few Perl scripts (140 LOC in total) for interoperability, which took us just one day. A drawback of our simple implementation is that one has to run the system for each usage scenario from the beginning to get an execution trace for each feature. A more sophisticated environment would allow starting and ending trace recording at any time.

Our implementation only counts subprogram calls and ignores accesses to global variables and single statements or expressions. It might be useful to analyze at a finer granularity when subprograms are interleaved, i.e., when different strands of control with different functionality are united in a single subprogram, possibly for efficiency reasons. For instance, we found a subprogram in our case study that draws different kinds of objects. The function contained a large switch statement whose branches drew the specific kinds of objects. In the execution trace, this subprogram showed up for all objects, whereas in fact only specific parts of it were actually executed.

Furthermore, the success of the described approach heavily depends on the clever choice of usage scenarios and their combination. Scenarios that cover too much functionality in one step, or a clumsy combination of scenarios, will result in huge and complex lattices that are unreadable for humans. Moreover, the number of usage scenarios increases tremendously when features are combined.

In our case study, the method provided us with valuable insights. The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively; e.g., the abilities to draw text and circles/ellipses are widely independent from other shapes. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation of Xfig. The lattice also classified components according to their abstraction level, which is useful information for re-use; general components can be found at the lower levels, specific components at the upper levels. Moreover, the lattice showed dependencies among components, which need to be known when components are to be extracted.

As future work, we want to explore how results obtained by the method described in this paper may be combined with results of additional static analyses. For example, we want to investigate the relation between the concept lattice based on dynamic information and static software architecture recovery techniques.

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., ‘Transitioning Legacy Assets – a Product Line Approach’, Proceedings of the SIGSOFT Foundations of Software Engineering, Toulouse, pp. 446-463, ACM, 1999.

[2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Study’, Proc. of the 21st International Conference on Software Engineering (ICSE’99), Los Angeles, CA, USA, pp. 544-554, May 1999.

[3] Brandenburg, F.J., ‘Graphlet’, Universität Passau, http://www.infosun.fmi.uni-passau.de/Graphlet/.

[4] Canfora, G., Cimitile, A., De Lucia, A., and Di Lucca, G.A., ‘A Case Study of Applying an Eclectic Approach to Identify Objects in Code’, Workshop on Program Comprehension, pp. 136-143, Pittsburgh, 1999, IEEE Computer Society Press.

[5] Chen, K. and Rajlich, V., ‘Case Study of Feature Location Using Dependence Graph’, Proc. of the 8th Int. Workshop on Program Comprehension, pp. 241-249, June 10-11, 2000, Limerick, Ireland, IEEE Computer Society Press.

[6] Graudejus, H., Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code, master’s thesis, University of Kaiserslautern, Germany, 1998.

[7] Lindig, C. and Snelting, G., ‘Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis’, Proc. of the Int. Conference on Software Engineering, pp. 349-359, Boston, 1997.

[8] Lindig, C., Concepts, ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.

[9] Koschke, R., ‘Atomic Architectural Component Recovery for Program Understanding and Evolution’, Dissertation, Institut für Informatik, Universität Stuttgart, 2000, http://www.informatik.uni-stuttgart.de/ifi/ps/rainer/thesis.

[10] Krone, M. and Snelting, G., ‘On the Inference of Configuration Structures From Source Code’, Proc. of the Int. Conference on Software Engineering, pp. 49-57, May 1994, IEEE Computer Society Press.

[11] Perry, D., ‘Generic Architecture Descriptions for Product Lines’, Proc. of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.

[12] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F., ‘Applying Concept Formation Methods to Object Identification in Procedural Code’, Proc. of the Conference on Automated Software Engineering, Nevada, pp. 210-218, November 1997, IEEE Computer Society.

[13] Siff, M. and Reps, T., ‘Identifying Modules via Concept Analysis’, Proc. of the Int. Conference on Software Maintenance, Bari, pp. 170-179, October 1997, IEEE Computer Society.

[14] Snelting, G., ‘Reengineering of Configurations Based on Mathematical Concept Analysis’, ACM Transactions on Software Engineering and Methodology 5, 2, pp. 146-189, April 1997.

[15] Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies Using Concept Analysis’, Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 99-110, November 1998.

[16] Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Techniques and Process Aspects for Product Line Development’, Proc. of the 10th International Software Process Workshop, June 1996, Ventron, France.

[17] Wilde, N. and Scully, M.C., ‘Software Reconnaissance: Mapping Program Features to Code’, Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.

[18] Xfig system, http://www.xfig.org.

Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon

University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product line based on existing assets. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as members of a product line promises advantages, like a higher potential for reuse, lower costs, and shorter time to market. There are many approaches to developing product lines from scratch [2, 10]. However, according to Martinez in [15], most successful examples of product lines at Motorola originated in a single separate product. Only in the course of time did a shared architecture for a product line evolve. Moreover, the large investments already made create a reluctance to introduce a product line approach that ignores existing assets. Hence, introducing a product line approach generally has to cope with existing code.

Reverse engineering helps create a product line from existing systems by identifying and analyzing the components and deriving the individual architectures. These can then be unified into a product line architecture, which is populated by the derived components.

As stated by Bayer et al. [1], early reverse engineering is needed to derive first coarse information on existing assets, which a product line analyst needs to set up a suitable product line architecture.

One important piece of information for a product line analysis that tries to integrate existing assets is the so-called feature component map that describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) requirement (the term feature is intentionally weakly defined because its exact meaning depends on the specific context). Components are computational units of a software architecture.

On the basis of the feature component map and additional economic reasons, a decision is made for particularly interesting and required components, and further expensive analyses can be aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is automatic to a great extent.

The remainder of this article is organized as follows. Section 2 gives an overview, Section 3 explains how concept analysis can be used to derive the feature component map, and Section 4 describes our experience with this technique in an example. Section 5 references related research, and Section 6 concludes the paper.

2. Overview

The technique described here is based on the execution traces generated by a profiler for different usage scenarios. One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

Concept Analysis. Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. The detailed mathematical background of concept analysis can be found in [7, 13, 14].

3. Feature Component Map

In order to derive the feature component map via concept analysis, one has to define the formal context (objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Features and Components

The set of relevant features F will be determined by the product line experts. We consider all the system’s subprograms a set of components C. A component corresponds to an object of the formal context, whereas a feature will be considered an attribute.

The relation R for the formal context necessary for concept analysis is defined as follows (where c ∈ C, f ∈ F):

(c, f) ∈ R if and only if component c is required when feature f is invoked; a subprogram is required when it needs to be executed.

R can be visualized using a relation table as shown in Figure 1.

The resulting concept lattice is shown in Figure 2. We use the sparse representation for visualization, showing an attribute/feature at the uppermost concept in the lattice where it is required (so the attributes spread from this node down to the bottom). For a feature f, this node is denoted by µ(f). Analogously, a node is marked with an object/component c ∈ C in the sparse representation if it represents the most special concept that has c in its extent. This unique node is denoted by γ(c). Hence, an object/component c spreads from the node γ(c), to which it is attached, up to the top.

In order to ascertain the relation table, a set of usage scenarios needs to be prepared where each scenario triggers exactly one relevant feature¹. Then the system is used according to the set of usage scenarios. For each usage scenario, the execution trace is recorded.

An execution trace contains all called subprograms for a usage scenario or an invoked feature, respectively. Hence, each system run yields all required components for a single scenario that exploits one feature. A single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.
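Since each run contributes one column, the table can be assembled mechanically. A minimal Python sketch, assuming the traces have already been reduced to sets of executed subprograms (the feature and component names below are invented for illustration):

# feature -> subprograms executed in the scenario that invokes it
profiles = {
    "f1": {"c1", "c3", "c4"},
    "f2": {"c1", "c4"},
    "f3": {"c2", "c4"},
}

# (c, f) is in R iff component c was executed when feature f was invoked;
# each system run fills exactly one column of the relation table.
components = sorted(set().union(*profiles.values()))
features = sorted(profiles)
print("   " + "  ".join(features))
for c in components:
    print(c + "  " + "   ".join("x" if c in profiles[f] else "." for f in features))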

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the previous section gives a lattice, from which interesting relationships can be derived. These relationships can be fully automatically derived and presented to the analyst such that the complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships.

The following base relationships can be derived from the sparse representation of the lattice:

• A component, c, is required for all features at and above γ(c) in the lattice.

• A feature, f, requires all components at and below µ(f) in the lattice.

• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).

• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∧ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.

• Components jointly required for two features, f1 and f2, are described by µ(f1) ∨ µ(f2); graphically depicted, one ascertains in the lattice the closest common node toward the bottom element starting at the nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.

• Components required for all features can be found at the bottom element.

• Features that require all components can be found at the top element.

The information described above can be derived by a tool and fed back to the product line expert. As soon as a decision is made to re-use certain features, all components required for these features (easily derived from the concept lattice) form a starting point for further static analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate the effort for subsequent steps (wrapping, reengineering, or re-development from scratch).

[Figure 1. Relation Table: components c1–c4 as rows and features f1–f8 as columns, with ✕ marking (c, f) ∈ R.]

1. It is possible to combine multiple features into one scenario, making the interpretation of the resulting concept lattice more complicated. This is beyond the scope of this paper.

[Figure 2. Concept Lattice: sparse representation of the context of Figure 1, with components c1–c4 and features f1–f8 attached to concepts; a feature such as f5 applies to all concepts below its node, and a component such as c3 applies to all concepts above its node.]

3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:

• Gnu C compiler gcc to compile the system using a command line switch for generating profiling information,

• Gnu object code viewer nm,

• Gnu profiler gprof,

• concept analysis tool concepts [8],

• graph editor Graphlet [3] to visualize the concept lattice,

• and a short Perl script to ascertain the executed functions in the execution trace and to convert the file formats of concepts and Graphlet (the script has just 225 LOC).

The fact that the subprograms are extracted from the object code makes the implementation independent of the programming language to a great extent (as long as the language is compiled to object code) and has the advantage that no additional compiler front end is necessary.
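To make the pipeline concrete, here is a hedged Python sketch of the glue code (a stand-in for the authors' Perl script, not the original; the scenario-runner callback and file locations are assumptions). It harvests one execution profile per scenario from a binary built with gcc's -pg switch by parsing gprof's flat profile:

import subprocess

def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False

def profile_scenario(binary, run_scenario):
    """Run one usage scenario of a '-pg'-compiled binary and return the
    set of functions that gprof lists in its flat profile."""
    run_scenario(binary)            # the instrumented run writes gmon.out
    flat = subprocess.run(["gprof", "-b", "-p", binary, "gmon.out"],
                          capture_output=True, text=True, check=True).stdout
    # Data rows of the flat profile start with a percentage;
    # the function name is in the last column.
    return {row.split()[-1] for row in flat.splitlines()
            if row.split() and is_number(row.split()[0])}

One such call per usage scenario yields the columns of the relation table described in Section 3.1.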

4. Example

We analyzed the Xfig system [17] (version 3.2.1), consisting of about 76 KLOC written in the programming language C.

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our example was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig.

We conducted two experiments. In the first one, we investigated the ability to draw different shapes only. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features composed of basic features. For the second experiment, a shape was drawn and then modified. Both draw and modify constitute basic features. Combined features add to the effort needed to derive the feature component map as there are many possible combinations.

The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation. The lattice also classified components according to their abstraction level; general components can be found at the lower levels, specific components at the upper levels. Moreover, the lattice showed dependencies among components.

Figure 3 shows a partial view of the concept lattice generated for Xfig. Node #38 groups all components needed for drawing ellipses and circles (both by diameter and radius). Nodes #41, #44, #42, and #43 contain the components for the more specific shape types.

We found that applying our method is easy in principle. However, running all scenarios by hand is time consuming. It may be facilitated by the presence of test cases that allow an automated replay of the various scenarios.

5. Related Research

Snelting has recently introduced concept analysis to software engineering. Since then, it has been used to evaluate class hierarchies [14], to explore configuration structures of preprocessor statements [9, 13], and to recover components [4, 6, 7, 11, 12].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less carries out the entire search, this method is less suited to derive the feature component map quickly and cheaply. In contrast, our method treats the system as a black box.

[Figure 3. Partial View of Xfig’s concept lattice]

Wilde and Scully [16] also use dynamic analysis to localize features. They focus on localizing rather than deriving required components.

Our technique goes beyond Wilde and Scully’s technique in that it also allows deriving relevant relationships between components and features by means of concept analysis, whereas Wilde and Scully’s technique only localizes a feature. The derived relationships are important information for product line experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions and Future Work

A feature component map describes which components are required to implement a particular feature and is needed at an early stage within a process toward a product line platform

• to weigh alternative platform architectures,

• to aim further tasks – like quality assessment – only at those existing components that are needed to populate the platform architecture,

• and to decide on further steps, like reengineering or wrapping.

The technique presented in this paper yields the feature component map automatically using the execution traces for different usage scenarios. The technique is based on concept analysis, a mathematically sound technique to analyze binary relations, which has the additional benefit of revealing not only correspondences between features and components, but also commonalities and variabilities between features and components.

The success of the described approach depends heavily on the clever choice of usage scenarios and their combination. Scenarios that cover too much functionality in one step, or a clumsy combination of scenarios, will result in huge and complex lattices that are unreadable for humans.

As future work, we want to explore how results obtained by the method described in this paper may be combined with results of additional static analyses. For example, we want to investigate the relation between the concept lattice based on dynamic information and static software architecture recovery techniques.

Our experiments suggest that investigating automatic analyses of the lattice we described here is worth further effort. Dealing with scenarios covering multiple features should be investigated in more depth.

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., ‘Transitioning Legacy Assets – a Product Line Approach’, Proceedings of the SIGSOFT Foundations of Software Engineering, pp. 446-463, September 1999.

[2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Study’, Proc. of the 21st International Conference on Software Engineering (ICSE’99), pp. 544-554, May 1999.

[3] Brandenburg, F.J., ‘Graphlet’, Universität Passau, http://www.infosun.fmi.uni-passau.de/Graphlet/.

[4] Canfora, G., Cimitile, A., De Lucia, A., and Di Lucca, G.A., ‘A Case Study of Applying an Eclectic Approach to Identify Objects in Code’, Workshop on Program Comprehension, pp. 136-143, May 1999.

[5] Chen, K. and Rajlich, V., ‘Case Study of Feature Location Using Dependence Graph’, Proc. of the 8th Int. Workshop on Program Comprehension, pp. 241-249, June 2000.

[6] Graudejus, H., Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code, master’s thesis, University of Kaiserslautern, Germany, 1998.

[7] Lindig, C. and Snelting, G., ‘Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis’, Proceedings of the International Conference on Software Engineering, pp. 349-359, May 1997.

[8] Lindig, C., Concepts, ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.

[9] Krone, M. and Snelting, G., ‘On the Inference of Configuration Structures From Source Code’, Proceedings of the International Conference on Software Engineering, pp. 49-57, May 1994.

[10] Perry, D., ‘Generic Architecture Descriptions for Product Lines’, Proceedings of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.

[11] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F., ‘Applying Concept Formation Methods to Object Identification in Procedural Code’, Proceedings of the Conference on Automated Software Engineering, pp. 210-218, November 1997.

[12] Siff, M. and Reps, T., ‘Identifying Modules via Concept Analysis’, Proceedings of the International Conference on Software Maintenance, pp. 170-179, October 1997.

[13] Snelting, G., ‘Reengineering of Configurations Based on Mathematical Concept Analysis’, ACM Transactions on Software Engineering and Methodology 5, 2, pp. 146-189, April 1997.

[14] Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies Using Concept Analysis’, Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 99-110, November 1998.

[15] Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Techniques and Process Aspects for Product Line Development’, Proceedings of the 10th International Software Process Workshop, June 1996.

[16] Wilde, N. and Scully, M.C., ‘Software Reconnaissance: Mapping Program Features to Code’, Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.

[17] Xfig system, http://www.xfig.org.

Unfortunately, the reviewers did not like the paper so much. It was accepted only as a short paper. We took the reviewers' comments seriously, improved the paper, and added more case studies. It was a complete re-write, but the essential idea survived. We submitted the new paper to ICSM, and it received the best paper award. So, young researchers: never give up!

Page 14: ICSM'01 Most Influential Paper - Rainer Koschke

The paper was selected for the ICSM special issue of TSE. Before we submitted the paper, we asked the editors about the page limit. They told us there was no limit.

Page 15: ICSM'01 Most Influential Paper - Rainer Koschke


Locating Features in Source Code

Thomas Eisenbarth, Rainer Koschke, and Daniel Simon

Abstract— Understanding the implementation of a certain feature of a system requires identifying the computational units of the system that contribute to this feature. In many cases, the mapping of features to the source code is poorly documented. In this paper, we present a semi-automatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior.

The mapping is in general not injective; that is, a computational unit may contribute to several features. Our technique allows distinguishing between general and specific computational units with respect to a given set of features. For a set of features, it also identifies jointly and distinctly required computational units.

The presented technique combines dynamic and static analyses to rapidly focus on the system’s parts that relate to a specific set of features. Dynamic information is gathered based on a set of scenarios invoking the features. Rather than assuming a one-to-one correspondence between features and scenarios as in earlier work, we can now handle scenarios that invoke many features.

Furthermore, we show how our method allows incremental exploration of features while preserving the “mental map” the analyst has gained through the analysis.

Keywords— program comprehension, formal concept analysis, feature location, program analysis, software architecture recovery

I. Introduction

UNDERSTANDING how a certain feature is implemented is a major problem of program understanding. Before real understanding starts, one has to locate the implementation of the feature in the code. Systems often appear as a large number of modules, each containing hundreds of lines of code. It is in general not obvious which parts of the source code implement a given feature. Typically, existing documentation is outdated (if it exists at all), the system’s original architects are no longer available, or their view is outdated due to changes made by others. So maintenance introduces incoherent changes, which cause the system’s overall structure to degrade [1]. Understanding the system in turn becomes harder every time a change is made to it.

One option, when trying to escape this vicious circle, is to completely reverse engineer the system in order to exhaustively identify its components and to assign features to components. We integrated published automatic techniques for component retrieval in an incremental semi-automatic process, in which the results of selected automatic techniques are validated by the user [2].

However, exhaustive methods are not cost-effective. Fortunately, knowledge of the components implementing a specific set of features suffices in many cases. Consequently, a feature-oriented search focusing on the components of interest is needed.

T. Eisenbarth, R. Koschke, and D. Simon are with the Institute of Computer Science at the University of Stuttgart, Breitwiesenstraße 20–22, D-70565 Stuttgart, Germany. E-mail: {eisenbarth,simon,koschke}@informatik.uni-stuttgart.de.

This article describes a process and its supporting techniques to identify those parts of the source code which implement a specific set of related features. The process is automated to a large extent. It combines static and dynamic analyses and uses concept analysis—a mathematical technique to investigate binary relations—to derive correspondences between features and computational units. Concept analysis additionally yields the computational units jointly and distinctly required for a set of features.

An advantage of starting with features is that domain knowledge from the user’s perspective may be exploited, which is especially useful for external change requests and error reports expressed in the terminology of a program’s problem domain.

The remainder of this article is organized as follows. Sect. II gives an overview of our technique and introduces the basic concepts. Sect. III introduces concept analysis. Sect. IV describes the process for locating and analyzing features in more detail. In Sect. V, we report on two case studies conducted to validate our approach. The related research in the area is summarized in Sect. VI.

II. Overview

The goal of our technique is to identify the computational units that specifically implement a feature as well as the set of jointly or distinctly required computational units for a set of features. To this end, the technique combines static and dynamic analyses.

This section gives an overview of our technique, describes the relationships among features, scenarios, and computational units (summarized in Fig. 1), and explains what kind of dynamic information is used as input to our technique. The section also introduces a simple example that we will use throughout the description of the method in the following sections. The example is inspired by a previous case study [4] in which we analyzed the drawing tool XFIG [5].

Computational unit. A computational unit is an executable part of a system. Examples of computational units are instructions (like accesses to global variables), basic blocks, routines, classes, compilation units, components, modules, or subsystems. The exact specification of a computational unit is a generic parameter of our method.

Feature. A feature is a realized functional requirement of a system (the term feature is intentionally defined weakly because its exact meaning depends on the specific context). Generally, the term feature also subsumes non-functional requirements. In the context of this paper, only functional features are relevant; that is, we consider a feature an observable behavior of the system that can be triggered by the user.

[Fig. 1. Conceptual model in UML notation: a scenario invokes features; a feature is implemented by computational units; routine, basic block, and module are kinds of computational unit.]

Example. Our fictitious drawing tool FIG (which resembles XFIG [5]) allows a user to draw, move, and color different objects, such as rectangles, circles, ellipses, and so forth. From the viewpoint of an analyst who is interested in the implementation of circle operations in FIG, the abilities to draw, to move, and to color a circle are three relevant features. □

Every computational unit (excluding dead code) contributes to the purpose of the system and thus corresponds to at least one feature—be it a very basic feature, such as the ability of the system to start or terminate. Yet, only a few features may actually be of interest to the analyst for her task at hand. In the following, we assume that only a subset of features is relevant. Consequently, only the computational units required for these features are of interest, too. The feature-unit map—as one result of our technique—describes which computational units implement a given set of relevant features.

Scenario. Features are abstract descriptions of a system’s expected behavior. If a user wants to invoke a feature of a system, he needs to provide the system with adequate input to trigger the feature. For instance, to draw a circle, the user of FIG needs to press a certain button on the control panel for selecting the circle drawing operation, then to position the cursor on the drawing area for specifying the center of the circle, to specify the diameter by moving the mouse, and eventually to press the left mouse button for finalizing the circle. Such sequences of user inputs that trigger actions of a system with observable result [6] are called scenarios.

Our technique requires a set of scenarios that invoke the features the analyst is interested in. A scenario s invokes a feature f if f’s result can be observed by the user when the system is used as described by scenario s. A scenario may invoke multiple features, and features may be invoked by multiple scenarios. For instance, a scenario for moving a circle requires drawing the circle first, so this scenario also invokes feature “circle drawing”. There may even be different scenarios all invoking the same set of features. Each scenario, then, represents an alternative way of invoking the features. For instance, FIG allows a user to push a button or to use a keyboard shortcut to begin a circle drawing operation. A set of scenarios each representing options and choices for the same feature resembles a use case.

Scenarios are used in our technique to gather the computational units for the relevant features through dynamic analysis, similarly to Wilde and Scully’s technique [7]. If the system is used as described by the scenario, the execution trace lists the sequence of all performed calls for this scenario. Since our technique aims only at identifying the computational units rather than at the order of the computational units’ execution, we need only the execution profile. The execution profile of a given program run is the set of computational units called during the run, without information about the order of execution. From the execution profile, we gather the fact that a computational unit has been executed at least once. We ignore the duration of the computational unit’s execution because computation time hardly gives hints for feature-specific computational units. Once the specific computational units have been identified through our technique, other techniques, such as static or dynamic slicing [8], [9], can be used to obtain the order of execution if required. These techniques can then be applied in a more goal-oriented way by focusing on the most feature-specific computational units yielded by our technique.

Feature-unit map. Our technique derives the feature-unit map through concept analysis, a mathematically sound technique. In our application, concept analysis—simply stated—mutually intersects the execution profiles of all scenarios (and all resulting intersections) to obtain the computational units specific to a feature and the jointly and distinctly required computational units for a set of features.

Example. FIG allows drawing a circle either by diameter or by radius. The analyst who is interested in the differences of these two circle operations and their differences to other circle operations, such as moving and coloring, will set up the scenarios listed in Fig. 2. Figure 3 lists the computational units executed for the scenarios in Fig. 2. Intersecting the execution profiles shows that setRadius is specific to feature Draw-circle-radius, move to Move-circle, and color to Color-circle. □

scenario name           actions performed
Draw-circle-diameter    draw a circle by diameter
Draw-circle-radius      draw a circle by radius
Move-circle             draw a circle by diameter and move it
Color-circle            draw a circle by diameter and color it

Fig. 2. Example scenarios for FIG.

scenario                executed computational units
Draw-circle-diameter    draw, setDiameter
Draw-circle-radius      draw, setRadius
Move-circle             draw, setDiameter, move
Color-circle            draw, setDiameter, color

Fig. 3. Execution profiles for Fig. 2.

Beyond simply identifying the computational units specifically required for a feature, concept analysis additionally allows deriving detailed relationships between features and computational units. These relationships identify computational units jointly required by any subset of features and classify computational units as low-level or high-level with respect to the given set of features.

Example. Intersecting the execution profiles in Fig. 3 additionally shows that the computational units jointly required for Draw-circle-diameter, Move-circle, and Color-circle are draw and setDiameter, where draw is required for all scenarios. □
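Since only set operations are involved, the two examples can be replayed in a few lines. A minimal Python sketch over the execution profiles of Fig. 3 (the printed results are exactly the units derived by hand above):

profiles = {
    "Draw-circle-diameter": {"draw", "setDiameter"},
    "Draw-circle-radius":   {"draw", "setRadius"},
    "Move-circle":          {"draw", "setDiameter", "move"},
    "Color-circle":         {"draw", "setDiameter", "color"},
}

# A unit is specific to a scenario if it occurs in no other profile.
for scenario, units in profiles.items():
    others = set().union(*(p for s, p in profiles.items() if s != scenario))
    specific = units - others
    if specific:
        print("specific to", scenario, "->", sorted(specific))

# Units jointly required by a set of scenarios: intersect their profiles.
joint = (profiles["Draw-circle-diameter"]
         & profiles["Move-circle"]
         & profiles["Color-circle"])
print("jointly required:", sorted(joint))                          # draw, setDiameter

# Units required for every scenario end up at the bottom element.
print("required for all:", set.intersection(*profiles.values()))  # {'draw'}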

The information gained by concept analysis is used to guide a subsequent static analysis along the static dependency graph in order to narrow the computational units down to those that form self-contained and understandable feature-specific computational units. Computational units that are only very basic building blocks for other computational units and do not contain any application-specific logic are sorted out. Additional static analyses, like strongly connected component identification, dominance analysis, and program slicing [8], support the search for the units of interest.

For large and complex systems, our approach can be applied incrementally as described in this paper.

Applicability

The retrieval of the feature-unit map is based on dynamic information, where all computational units that are executed for a scenario are collected. The scenario describes how to invoke a feature. This section describes the assumptions we make on features, scenarios, and computational units.

Features. Our technique is primarily suited for functional features that may be mapped onto computational units. Non-functional features in particular, such as robustness, reliability, or maintainability, do not easily map to computational units.

The technique is suited only for features that can be invoked from outside; internal implementation features, such as the use of a garbage collector, may not necessarily be deterministically and easily triggered from outside.

Scenarios. Scenarios are designed (or selected from existing test cases) to invoke a known set of relevant features; that is, we assume that the analyst knows in advance which features are invoked by a scenario.

Because suitable scenarios are essential to our technique, a domain expert is needed to set them up. In many cases, the domain expert can reuse existing test cases as scenarios to locate features. However, the purpose of test cases is to reveal errors, and hence test cases tend to be complex and to cover many features. In contrast, scenarios for our feature location technique should be simpler and invoke fewer features to differentiate the computational units more clearly.

In order to explore variations of a feature, the domain expert provides several scenarios, each triggering a feature variation with a different set of inputs. To obtain effective and efficient coverage, he builds equivalence classes of relevant input data. Identifying equivalence classes may require knowledge of internal details of a system.

Computational units. The exact notion of computational unit is a generic parameter to our technique and depends on the task and system at hand. In principle, there is no limit to the granularity of computational units: One could use basic blocks, routines, classes, modules, or subsystems. Subsystems as computational units are suitable to obtain an overview for very large systems. Considering routines, methods, subprograms, etc. as computational units gives an overview at the global declaration level, whereas classes and modules lie in between subsystem and global declaration level. Basic blocks as computational units are adequate only for smaller systems, or for parts of a system where more detail is needed, because of the likely information overload for the analyst.

For practical reasons, for this paper we decided to use routines as the computational unit of choice, where a routine is a function, procedure, subprogram, or method according to the programming language. For the case studies presented later on in this paper, routines were appropriate.

Static and dynamic dependencies. The results from concept analysis based on dynamic information are used to guide the analyst in her static analysis, that is, her inspection of the static dependency graph. We use dynamic information only as a guide and not as a definite answer because dynamic information depends upon suitable input data and the test environment in which the scenarios are executed.

The static dependency graph can be extracted from procedural, functional, as well as object-oriented programming languages. Because execution profiles can be recorded for these languages, too, our technique is applicable to all these languages. However, the precision of the static extraction influences the ease of the analyst’s inspection of the static dependencies, and static analysis is inherently more difficult for object-oriented languages (and for functional languages with higher-order functions) than for procedural languages.

Static analyses need to make conservative assumptions in the presence of pointers and dynamic binding, which weaken the precision of the dependency graph. Fortunately, research in pointer analysis has made considerable progress. There is a large body of work on pointer analysis for procedural languages [10], [11], [12], [13], [14], [15], [16], [17] and object-oriented languages [18], [19] that resolves general pointers, function pointers, and dynamic binding. These techniques vary in precision and costs. Interestingly enough, Milanova and others have recently presented empirical data indicating that less expensive and—theoretically—less precise techniques to resolve function pointers reach the precision of more expensive and—theoretically—more precise techniques [20], due to the common way of using function pointers (as opposed to pointers to stack and heap objects).

III. Formal Concept Analysis

This section presents the necessary background information on formal concept analysis. Readers already familiar with concept analysis can skip to the next section.

Formal concept analysis is a mathematical technique for analyzing binary relations. The mathematical foundation of concept analysis was laid by Birkhoff [21] in 1940. For more detailed information on formal concept analysis, we refer to [22], where the mathematical foundation is explored.

Concept analysis deals with a relation I ⊆ 𝒪 × 𝒜 between a set of objects 𝒪 and a set of attributes 𝒜. The tuple C = (𝒪, 𝒜, I) is called a formal context. For a set of objects O ⊆ 𝒪, the set of common attributes σ(O) is defined as:

σ(O) = {a ∈ 𝒜 | (o, a) ∈ I for all o ∈ O}   (1)

Analogously, the set of common objects τ(A) for a set of attributes A ⊆ 𝒜 is defined as:

τ(A) = {o ∈ 𝒪 | (o, a) ∈ I for all a ∈ A}   (2)

A formal context can be represented by a relation table, where the rows hold the objects and the columns hold the attributes. An object oi and an attribute aj are in the relation I iff the cell at row i and column j is marked by “×”. As an example, a binary relation between arbitrary objects and attributes is shown in Fig. 4(a). For that formal context, we have:

σ({o1}) = {a1, a4, a6, a7}
τ({a6, a7}) = {o1, o3}

A tuple c = (O, A) is called a concept iff A = σ(O) and O = τ(A); that is, all objects in c share all attributes in c. For a concept c = (O, A), O is called the extent of c, denoted by extent(c), and A is called the intent of c, denoted by intent(c). Informally speaking, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. In Fig. 4(b), all concepts for the relation in Fig. 4(a) are listed.

The set of all concepts of a given formal context forms a partial order via the superconcept-subconcept ordering ≤:

(O1, A1) ≤ (O2, A2) ⇔ O1 ⊆ O2   (3)

or, dually, with

(O1, A1) ≤ (O2, A2) ⇔ A1 ⊇ A2   (4)

Note that (3) and (4) imply each other by definition. If we have c1 ≤ c2, then c1 is called a subconcept of c2 and c2 is called a superconcept of c1. For instance, in Fig. 4(b) we have c4 ≤ c2.

The set L of all concepts of a given formal context and the partial order ≤ form a complete lattice, called the concept lattice:

L(C) = {(O, A) ∈ 2^𝒪 × 2^𝒜 | A = σ(O) and O = τ(A)}   (5)

The infimum (⊓) of two concepts in this lattice is computed by intersecting their extents as follows:

(O1, A1) ⊓ (O2, A2) = (O1 ∩ O2, σ(O1 ∩ O2))   (6)

The infimum describes the set of common attributes of two sets of objects. Similarly, the supremum (⊔) is determined by intersecting the intents:

(O1, A1) ⊔ (O2, A2) = (τ(A1 ∩ A2), A1 ∩ A2)   (7)

The supremum yields the set of common objects, which share all attributes in the intersection of two sets of attributes.

The concept lattice for the formal context in Fig. 4(a) can be depicted as a directed acyclic graph whose nodes represent the concepts and whose edges denote the superconcept-subconcept relation ≤, as shown in Fig. 5(a). The most general concept is called the top element and is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥.

The concept lattice can be visualized in a more readable, equivalent way by marking only the graph node with an attribute a ∈ 𝒜 whose represented concept is the most general concept that has a in its intent. Analogously, a node will be marked with an object o ∈ 𝒪 iff it represents the most special concept that has o in its extent. The unique element in the concept lattice marked with a is therefore:

µ(a) = ⊔{c ∈ L(C) | a ∈ intent(c)}   (8)

The unique element marked with object o is:

γ(o) = ⊓{c ∈ L(C) | o ∈ extent(c)}   (9)

We will call a graph representing a concept lattice using this marking strategy a sparse representation of the lattice. The equivalent sparse representation of the lattice in Fig. 5(a) is shown in Fig. 5(b). The content of a node N in this representation can be derived as follows:
• The objects of N are all objects at and below N.
• The attributes of N are all attributes at and above N.
For instance, the node in Fig. 5(b) marked with o1 and a1 is the concept c4 = ({o1}, {a1, a4, a6, a7}).

For practical reasons, it is sometimes useful to apply only one of (8) or (9). For example, if we have a large number of attributes but just a small number of objects, we eliminate the redundant appearance of attributes and keep the full list of objects in the concepts.
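For contexts of this size, the lattice can be computed directly from these definitions. Below is a minimal Python sketch, assuming the context of Fig. 4(a): every intent is an intersection of object intents (with σ(∅) = 𝒜 for the bottom element), so the intents are generated as the closure of the object rows under intersection, and each intent A is then paired with its extent τ(A):

# Formal context of Fig. 4(a): object -> set of its attributes
context = {
    "o1": {"a1", "a4", "a6", "a7"},
    "o2": {"a2", "a4", "a5", "a7"},
    "o3": {"a3", "a5", "a6", "a7"},
}
all_attrs = set().union(*context.values())

def tau(attrs):
    """Common objects of an attribute set, equation (2)."""
    return {o for o, row in context.items() if attrs <= row}

# Closure of the object intents under intersection; the seed all_attrs
# is the intent of the bottom element (extent = empty set).
intents = {frozenset(all_attrs)}
changed = True
while changed:
    changed = False
    for row in context.values():
        for intent in list(intents):
            new = frozenset(intent & row)
            if new not in intents:
                intents.add(new)
                changed = True

# Pair each intent with its extent and print, top element first.
concepts = sorted(((tau(i), set(i)) for i in intents), key=lambda c: -len(c[0]))
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))

Running this reproduces exactly the eight concepts listed in Fig. 4(b), from ⊤ = ({o1, o2, o3}, {a7}) down to ⊥.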

IV. Analysis Process

Our process to locate features is depicted in Fig. 6 using the IDEF0 notation [23]. It consists of five major activities:


     a1  a2  a3  a4  a5  a6  a7
o1   ×           ×       ×   ×
o2       ×       ×   ×       ×
o3           ×       ×   ×   ×

(a) A formal context.

⊤   ({o1, o2, o3}, {a7})
c1  ({o1, o2}, {a4, a7})
c2  ({o1, o3}, {a6, a7})
c3  ({o2, o3}, {a5, a7})
c4  ({o1}, {a1, a4, a6, a7})
c5  ({o2}, {a2, a4, a5, a7})
c6  ({o3}, {a3, a5, a6, a7})
⊥   (∅, {a1, a2, a3, a4, a5, a6, a7})

(b) Concepts for the formal context.

Fig. 4. An example relation between objects and attributes. The corresponding concepts that can be derived from the formal context are listed on the right.

[Fig. 5(a): the full concept lattice for the context of Fig. 4, a directed acyclic graph over the concepts ⊤, c1–c6, ⊥ of Fig. 4(b). Fig. 5(b): its sparse representation, where each node carries only the objects and attributes introduced at that node, e.g., the node for c4 is labeled ({o1}, {a1}).]

Fig. 5. The concept lattices for the example context in Fig. 4.

[Fig. 6: IDEF0 diagram relating the five activities – scenario creation (1), static dependency-graph extraction (2), dynamic analysis (3), interpretation of the concept lattice (4), and static dependency analysis (5) – via their inputs and outputs: the (initially) relevant features, scenarios, source code, execution profiles, concept lattice, static dependency graph, feature-unit map, and the statically validated feature-unit map. Compiler, profiler, graph extractor, and concept analysis tool support the activities; domain expert, user, and analyst mark human involvement (not part of IDEF0 notation); incremental analysis may raise the need for additional scenarios.]

Fig. 6. Process for feature location in IDEF0 notation.


1. Scenario creation: Based on features (either known initially or discovered during incremental analysis), the domain expert creates scenarios.
2. Static dependency-graph extraction: The static dependency graph of the system under analysis is extracted.
3. Dynamic analysis: The system is used according to selected scenarios.
4. Interpretation of concept lattice: The data yielded by the dynamic analysis is presented to and interpreted by the analyst. Relevant computational units are identified.
5. Static dependency analysis: The analyst searches the system for additional computational units that are relevant to selected features.

The different roles of human resources for these activities are (human resources are highlighted in the process diagrams by a UML actor icon):
• The analyst is the person interested in how features map onto source code. She interprets the concept lattice and performs the static analysis.
• The domain expert designs the scenarios and lists the invoked features for each scenario.
• The user is the person who uses the system according to the selected scenarios.

All activities except the static dependency graph extraction (which is done only once) benefit from the knowledge that is gained in previous iterations and can be applied repeatedly until sufficient knowledge about the system has been gained. The order of the activities is specified by the IDEF0 diagram in Fig. 6: An activity may start once its input is available. The activities are explained in the following sections.

A. Static Dependency Graph Extraction

The static dependency graph should subsume all types of entities and dependencies present in the dynamic dependency graph: it is unnecessary to extract dynamic information that is not used in the subsequent static analysis. Yet, the static dependency graph may provide additional types of entities and dependencies, and also more fine-grained information, if a static extraction tool is used that exceeds the capabilities of the available dynamic extraction tool. In this case, the static analysis can leverage less dynamic information but is still conservative. In our case studies, for instance, we extracted many detailed static dependencies among global declarations (routines, global variables, and user-defined types), but the profiler we used let us extract only the dynamic call relationship among routines. This way, we had to analyze static variable accesses that might have never been executed in any of our scenarios.

B. Scenario Creation

A domain expert is needed for creating the scenarios. Any available information on the system’s behavior (e.g., documentation, existing test cases, domain models, etc.) is useful as input to him. Existing test cases may be useful but not necessarily directly applicable, because the focus during testing is to cover the code completely and to combine features in many ways. Scenarios in our sense are very distinctive; that is, they should invoke all relevant features but as few other features as possible to ease the mappings from scenarios to features and from features to computational units (often it is unavoidable to invoke features that are not of interest for the task at hand).

The scenarios are documented for future use similarly to test cases. Additionally, the documentation includes the features invoked by the scenarios. If the domain expert also specifies the expected result of a scenario, the scenario may also be used as a simple test case.

C. Dynamic Analysis

The goal of the dynamic analysis is to find out which computational units contribute to a given set of features. Each feature is invoked by at least one of the prepared scenarios.

The process that deals with the dynamic analysis is shown in more detail in Fig. 8. The inputs to the process are source code and a set of scenarios created by process step 1 in Fig. 6. We proceed as follows:
3.1 Compile for recording: The source code is compiled with profiling options or is instrumented to obtain the execution profile.
3.2 Scenario execution: The system is executed by a user according to the scenarios, and execution profiles are recorded.

If suitable tool support is available, a scenario’s execution may be recorded at will to exclude parts of the execution that are not relevant, such as start-up and shutdown of the system [24], [25], [26]. Certain debuggers, for instance, allow starting and ending trace recording. Instrumenting the source code so that only relevant parts are recorded is generally not an option because this requires that the feature-unit map is at least partially known already.

An alternative solution is to specify a special “start-end” scenario containing the actions to be filtered out. For instance, in order to mask out initialization and finalization code, the domain expert may prepare a “start-end” scenario in which the system is started and immediately shut down.

Since each scenario is a precise description of the sequence of user inputs that trigger actions of the system, every execution of a scenario yields the same execution profile unless the system is nondeterministic. In case of nondeterminism, one could either unite the profiles of all executions of the same scenario or differentiate each scenario execution. The latter is useful to identify differences due to nondeterminism.
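Both provisions, masking out a start-end profile and reconciling repeated runs, reduce to set arithmetic on execution profiles. A small Python sketch with invented profile contents (the unit names are illustrative only):

# 1. Mask out initialization/finalization code: subtract the profile of
#    the "start-end" scenario from a feature scenario's profile.
start_end = {"main", "init_gui", "shutdown"}
raw_profile = {"main", "init_gui", "shutdown", "draw", "setRadius"}
relevant = raw_profile - start_end                  # {'draw', 'setRadius'}

# 2. Nondeterminism: either unite the profiles of repeated executions of
#    the same scenario, or compare them to spot the differing units.
runs = [{"draw", "poll_events"}, {"draw"}]
united = set().union(*runs)                         # tolerant view
unstable = set.union(*runs) - set.intersection(*runs)  # {'poll_events'}
print(relevant, united, unstable)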

D. Interpretation of Concept Lattice

In this process step, a concept lattice for the relation table created by process step 3 is built. The goals of interpreting the resulting concept lattices are:
1. Identification of the relationships between scenarios and computational units (process steps 4.1–4.3)
2. Identification of the relationships between scenarios and features and thus between features and computational units (process step 4.4)


Sect. III                 main part
object o                  computational unit u
set of objects O          set of computational units U
all objects 𝒪             all computational units 𝒰
attribute a               scenario s
set of attributes A       set of scenarios S
all attributes 𝒜          all scenarios 𝒮
incidence relation I      invocation table I

Fig. 7. Translation between the identifiers of Sect. III and the identifiers used from here on, which instantiate formal concept analysis.

The following subsections describe how to achieve these goals. The basic process of lattice interpretation is depicted in Fig. 9.

D.1 Scenario Selection

A number of execution profiles is selected in order to set up the context. Execution profiles may be recombined to analyze various aspects of a system, where execution profiles and scenarios can be reused.

Example. The analyst of FIG may first be interested in the two different ways to draw a circle. She would therefore select the two scenarios Draw-circle-diameter and Draw-circle-radius. When she understands the differences between these two features, she would investigate other circle operations and additionally select Move-circle and Color-circle. □

D.2 Concept Analysis

This process embodies a completely automated step that creates a concept lattice from the invocation table.

In order to derive the feature-unit map by means of concept analysis, we have to define the formal context (i.e., the objects, the attributes, and the relation) and to interpret the resulting concept lattice accordingly.

The formal context for applying concept analysis to derive the relationships between scenarios and computational units will be laid down as follows:

• Computational units will be considered objects.
• Scenarios will be considered attributes.
• A pair (computational unit u, scenario s) is in relation I if u is executed when s is performed.

Figure 7 shows how to map the identifiers used in the general description of concept analysis in Sect. III to the identifiers used in the specific instantiation of concept analysis within our method.

The system is used according to the set of scenarios, one at a time, and the execution profiles are recorded. Each system run yields all executed computational units for a single scenario; that is, one column of the relation table can be filled per system run. Applying all scenarios that have been selected during the process of scenario selection provides the relation table for formal concept analysis.

Example. Figure 10 shows the concept lattice for the invocation table in Fig. 3, where all scenarios have been selected. □

D.3 Basic Interpretation

Concept analysis applied to the formal context described in the last section yields a lattice from which interesting relationships can be derived. These relationships can be derived fully automatically and presented to the analyst. Thus, the analyst has to know how to interpret the derived relationships, but does not need to be familiar with the theoretical background of lattices.

The following base relationships can be derived from the sparse representation of the lattice (note the duality):
• A computational unit u is required for all scenarios at and above γ(u) in the lattice; for instance, setDiameter is required for Draw-circle-diameter, Move-circle, and Color-circle according to Fig. 10.
• A scenario s requires all computational units at and below µ(s) in the lattice; for instance, Color-circle requires color, setDiameter, and draw according to Fig. 10.
• A computational unit u is specific to exactly one scenario s if s is the only scenario on all paths from γ(u) to the top element; for instance, color is specific to Color-circle according to Fig. 10.
• Scenarios to which two computational units u1 and u2 jointly contribute can be identified by the supremum γ(u1) ⊔ γ(u2). In the lattice, the supremum is the closest common node toward the top element starting at the nodes to which u1 and u2 are attached. All scenarios at and above this common node are those jointly implemented by u1 and u2. For instance, setDiameter and color jointly contribute to Color-circle according to Fig. 10.
• Computational units jointly required for two scenarios s1 and s2 are described by the infimum µ(s1) ⊓ µ(s2). In the lattice, the infimum is the closest common node toward the bottom element starting at the nodes to which s1 and s2 are attached. All computational units at and below this common node are those jointly required for s1 and s2. For instance, setDiameter and draw are jointly required for Move-circle and Color-circle according to Fig. 10.
• Computational units required for all scenarios can be found at the bottom element; for instance, draw is required for all scenarios according to Fig. 10.
• Scenarios that require all computational units can be found at the top element. In Fig. 10, there is no such scenario.
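These relationships can be read off the relation table directly, since the intent of γ(u) consists of the scenarios executing u and the extent of µ(s) consists of the units executed for s. A minimal sketch for the FIG context above (representation assumed as before):

```python
profiles = {"Draw-circle-diameter": {"draw", "setDiameter"},
            "Draw-circle-radius":   {"draw", "setRadius"},
            "Move-circle":          {"draw", "setDiameter", "move"},
            "Color-circle":         {"draw", "setDiameter", "color"}}
units = set().union(*profiles.values())

def attrs(u):            # scenarios at and above gamma(u)
    return {s for s, executed in profiles.items() if u in executed}

def extent(scenarios):   # units at and below the infimum of the scenarios
    return {u for u in units if scenarios <= attrs(u)}

print(attrs("setDiameter"))                    # required for these scenarios
print(extent({"Color-circle"}))                # units required by Color-circle
print(extent({"Move-circle", "Color-circle"})) # infimum: jointly required
print(attrs("setDiameter") & attrs("color"))   # supremum: jointly implemented
```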

Beyond these relationships between computational units and scenarios, further useful aspects between scenarios on one hand and between computational units on the other hand may be derived:
• If γ(u1) < γ(u2) holds for two computational units u1 and u2, then computational unit u2 is more specific with respect to the given scenarios than computational unit u1 because u1 contributes not just to the features to which u2 contributes, but also to other features. For instance, color is more specific to Color-circle than setDiameter, and setDiameter is more specific than draw according to Fig. 10.
• If µ(s1) < µ(s2) holds for two scenarios s1 and s2, then scenario s2 is based on scenario s1 because if s2 is executed, all computational units in the extent of µ(s1) need also to be executed. For instance, Move-circle and Color-circle are based on Draw-circle-diameter according to Fig. 10.


[Figure: process of the dynamic analysis: the source code is compiled for recording (3.1); a user then executes the scenarios on the instrumented executable under a profiler, yielding the execution profiles (3.2).]

Fig. 8. The process for the dynamic analysis in Fig. 6.

[Figure: process of lattice interpretation: the analyst selects scenarios (4.1), whose execution profiles form the invocation table; a tool performs concept analysis (4.2) to obtain the concept lattice; the analyst performs the basic interpretation (4.3) and the scenario feature mapping (4.4), yielding the feature-unit map; an incremental analysis loop feeds back into scenario selection.]

Fig. 9. The process for interpretation of the concept lattice in Fig. 6.

[Figure: sparse lattice with the computational units draw, setDiameter, setRadius, move, and color attached to the concepts of the scenarios Draw-circle-diameter, Draw-circle-radius, Move-circle, and Color-circle.]

Fig. 10. Sparse concept lattice for Fig. 3.


Thus the lattice also reflects the level of application specificity of computational units. The information described above can be derived by a tool and fed back to the analyst. Inspecting the relationships derived from the concept lattice, a decision may be made to analyze only a subset of the original features in depth due to the additional dependencies that concept analysis reveals. All computational units required for these features (easily derived from the concept lattice) form a starting point for further static analyses to validate the identified computational units and to identify further computational units that were possibly not executed during dynamic analysis because of limitations in the design of the scenarios.

D.4 Scenario Feature Mapping

The interpretation of the concept lattice as described above gives insights into the relationship between scenarios S and computational units U. However, the analyst is primarily interested in the relationship between features F and computational units U. This section describes how to identify this relationship in the concept lattice if there is no one-to-one correspondence between scenarios and features.

Because one feature can be invoked by many scenarios and one scenario can invoke several features, there is not always a strict correspondence between features and scenarios. For instance, as discussed above, the scenarios Move-circle and Color-circle of FIG are based on Draw-circle-diameter according to Fig. 10 because in order to move or color a shape, one has to draw it first. The scenario for moving or coloring a shape will thus necessarily invoke the feature which draws a shape. Fortunately, there is still a simple way to identify computational units relevant to the actual features in the concept lattice, although an unambiguous identification may require additional discriminating scenarios. The basic idea is to isolate features in the concept lattice through combinations of overlapping scenarios.


     f1  f2  f3   u1  u2  u3  u4  u5  u6  u7
s1   ×       ×    ×           ×       ×   ×
s2   ×   ×            ×       ×   ×       ×
s3       ×   ×            ×       ×   ×   ×

(a) Invocation relation I.

[Figure: concept lattice with the concepts
({u1, u2, u3, u4, u5, u6, u7}, ∅) as top element,
({u1, u4, u6, u7}, {s1}), ({u2, u4, u5, u7}, {s2}), ({u3, u5, u6, u7}, {s3}),
({u4, u7}, {s1, s2}), ({u6, u7}, {s1, s3}), ({u5, u7}, {s2, s3}),
and ({u7}, {s1, s2, s3}) as bottom element.]

(b) Concept lattice for the context in Fig. 11(a).

[Figure: sparse concept lattice with top element (∅, ∅) and the labeled concepts
({u1}, {s1}) and ({u2}, {s2}): Cspc,
({u4}, {s1, s2}): Spec,
({u6}, {s1, s3}) and ({u5}, {s2, s3}): Shrd,
({u3}, {s3}): Irlvt,
({u7}, {s1, s2, s3}): Rlvt.]

(c) Sparse concept lattice of Fig. 11(b) categorized with respect to feature f1, which has been exposed in scenarios s1 and s2.

Fig. 11. Categorizing concept lattices.


If a scenario invokes several features, one can formally model a scenario as a set of features s = {f1, f2, . . . , fm}, where fn ∈ F for 1 ≤ n ≤ m (F is the set of all relevant features). This modeling is simplifying because it abstracts from the exact order and frequency of feature invocations in a scenario. On the other hand, if the order or frequency of feature invocations do count, the scenarios may indeed be considered complex features in their own right. If these scenarios yield different execution profiles, they will appear in different concepts in the lattice, and their commonalities and differences are revealed and may be analyzed.

With the domain expert’s additional knowledge of which features are invoked by a scenario, we can identify the computational units relevant to a certain feature. Let us consider the invocation relation I in Fig. 11(a) (for better legibility, scenarios are listed as rows and computational units as columns). The table contains the called computational units u1, . . . , u7 per scenario, and furthermore the invoked features per scenario: s1 = {f1, f3}, s2 = {f1, f2}, and s3 = {f2, f3}. The corresponding concept lattice for the invocation relation in Fig. 11(a) is shown in Fig. 11(b). The feature part of the table is ignored while constructing this lattice.

Computational units specific to feature f1 can be found in the intersection of the executed computational units of the two scenarios s1 and s2 because f1 is invoked for s1 and s2. The intersection of the computational units executed for s1 and s2 can be identified as the extent of the infimum of the concepts associated with s1 and s2: µ(s1) ⊓ µ(s2) = ({u4, u7}, {s1, s2}). Since s1 and s2 do not share any other feature, the computational units particularly relevant to f1 are u4 and u7.

We notice that u7 is also used in all other scenarios, so that one cannot consider u7 a specific computational unit for any of f1, f2, or f3. Computational unit u4, in contrast, is used only in scenarios executing f1. We therefore state the hypothesis that u4 is specific to f1 whereas u7 is not. Because there is no scenario containing f1 other than s1 and s2, computational unit u4 is the only computational unit specific to f1.
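This isolation step can be stated as a small computation over the context of Fig. 11 (set representation assumed): the candidates are the extent of the infimum of all scenarios invoking f1, minus every unit that is also executed when f1 is not invoked.

```python
executed = {"s1": {"u1", "u4", "u6", "u7"},
            "s2": {"u2", "u4", "u5", "u7"},
            "s3": {"u3", "u5", "u6", "u7"}}
features = {"s1": {"f1", "f3"}, "s2": {"f1", "f2"}, "s3": {"f2", "f3"}}

def specific_to(f):
    with_f    = [executed[s] for s in executed if f in features[s]]
    without_f = [executed[s] for s in executed if f not in features[s]]
    common = set.intersection(*with_f)       # extent of the infimum
    return common - set().union(set(), *without_f)

print(specific_to("f1"))  # -> {'u4'}; u7 drops out as merely relevant
```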

Note that this is just a hypothesis because other features might be involved to which u4 is truly specific and that are not explicitly listed in the scenarios. Another explanation could be that, by accident, u4 is executed both for f2 (in s2) and f3 (in s1); then it appears in both scenarios but nevertheless is not specific to f1. However, chances are high that u4 is specific to f1 because u4 is not executed when f2 and f3 are jointly invoked in s3, which suggests that u4 at least comes into play only when f1 interacts with f2 or f3. At any rate, the categorization is hypothetical and needs to be validated by the analyst.

Computational units that are somehow related to but not specific to f1 are those computational units that are executed for scenarios invoking f1 among other features. In our example, both s1 and s2 invoke f1. Computational units in extents of concepts which contain s1 or s2 are therefore potentially relevant to f1. In our example, u1, u2, u5, and u6 are potentially relevant in addition to u4 and u7. Computational unit u3 is executed only for scenario s3, which does not contain f1.

Altogether, we can identify five categories for computational units with regard to feature f1 (see Fig. 11(c)):
Spec: u4 is specific to f1 because it is used in all scenarios invoking f1 but not in other scenarios.
Rlvt: u7 is relevant to f1 because u7 is used in all scenarios invoking f1; but it is also more general than u4 because u7 is also used in scenarios not invoking f1 at all.
Cspc: u1 and u2 are executed only in scenarios invoking f1. They are less specific than u4 because they are not used in all scenarios that invoke f1; that is, these computational units are only conditionally specific. Whether u1 and u2 are more or less specific than u7 is not decidable based on the concept lattice. On one hand, they are used only in scenarios invoking f1, whereas u7 is also executed in scenarios that do not require f1. On the other hand, u7 is executed whenever f1 is required, whereas u1 and u2 are not executed in some scenarios that do require f1.
Shrd: u5 and u6 are executed in scenarios invoking f1, but they are also executed in scenarios not invoking f1; that is, they are shared with other features. These computational units are presumably less relevant than u1 and u2, which are executed only when f1 is invoked, and also less relevant than u7, which is executed in all scenarios invoking f1.
Irlvt: u3 is irrelevant to f1 because u3 is executed only in scenarios not containing f1.

These facts are more obvious in the sparse representation of the lattice. Using this representation, given a feature f, one identifies the concept cf for which the following condition holds:

cf = (U, S)  and  ⋂_{sj ∈ S} sj = {f}     (10)

Concept cf is called a feature-specific concept for f. Based on the feature-specific concept, one can categorize the computational units as follows:

Spec: all computational units u for which γ(u) = cf holds.
Rlvt: all computational units u for which γ(u) = c′ and c′ < cf holds.
Cspc: all computational units u for which γ(u) = c′ and cf < c′ holds.
Shrd: all computational units u for which u is in the extent of a concept c′ where cf < c′ holds and cf and γ(u) are incomparable.
Irlvt: all other computational units not covered by the other categories.
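A sketch of this categorization for the context of Fig. 11, comparing the intent of γ(u) (the scenarios executing u) with the intent S_f of the feature-specific concept c_f, which for this context is simply the set of scenarios invoking f:

```python
executed = {"s1": {"u1", "u4", "u6", "u7"},
            "s2": {"u2", "u4", "u5", "u7"},
            "s3": {"u3", "u5", "u6", "u7"}}
features = {"s1": {"f1", "f3"}, "s2": {"f1", "f2"}, "s3": {"f2", "f3"}}

def categorize(f):
    S_f = {s for s in executed if f in features[s]}     # intent of c_f
    result = {}
    for u in sorted(set().union(*executed.values())):
        a = {s for s in executed if u in executed[s]}   # intent of gamma(u)
        if a == S_f:   result[u] = "Spec"
        elif a > S_f:  result[u] = "Rlvt"   # gamma(u) below c_f
        elif a < S_f:  result[u] = "Cspc"   # gamma(u) above c_f
        elif a & S_f:  result[u] = "Shrd"   # incomparable but overlapping
        else:          result[u] = "Irlvt"
    return result

print(categorize("f1"))
# -> u4: Spec, u7: Rlvt, u1/u2: Cspc, u5/u6: Shrd, u3: Irlvt
```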

When the distance between cf and c′ is considered, additional nuances within the categories Rlvt, Cspc, and Shrd are possible. The distance measures the size of the set of features for which a computational unit is potentially relevant. The larger the set, the less specific the computational unit is.

Example. The scenario Move-circle in Fig. 2 invokes two features: the ability of FIG to draw a circle by diameter and the ability to move this circle. The scenario Color-circle also uses the ability to draw a circle; yet, it colors the circle instead of moving it. Hence, the computational units responsible for drawing a circle are attached to the concept in Fig. 10 that represents the intersection of the features invoked by Move-circle and Color-circle. The scenario Draw-circle-diameter would not necessarily have been required to identify the computational units for drawing a circle by diameter: the sparse lattice reveals these computational units as the direct infimum of Move-circle and Color-circle even if Draw-circle-diameter is not considered. However, Draw-circle-diameter is useful to separate draw from setDiameter. □

As a matter of fact, there could be several concepts for which condition (10) holds when different computational units are executed for the given feature, depending on the scenario contexts in which the feature is embedded. For instance, let us assume we are analyzing FIG’s undo capabilities. Three scenarios can be provided to explore this feature:
• Draw a circle: {draw-circle}
• Undo circle drawing: {draw-circle, undo}
• Undo without preceding drawing operation: {undo}

For the overlapping scenarios {draw-circle, undo} and {undo}, we may assume that different computational units will be executed beyond those that are specific to command draw-circle: quite likely, additional computational units will be executed to handle the erroneous attempt to call undo without a preceding operation. Consequently, the lattice will contain a concept of its own for {draw-circle, undo} and another one for {undo}, where the latter is not a subconcept of the former. The infimum of these two scenarios will contain the computational units of the undo operation executed for normal as well as exceptional execution, whereas the concept representing {undo} contains the computational units for error handling.

In case of multiple concepts for which condition (10) holds, we can unite the computational units that are in Spec with respect to these concepts. If the identified concepts are in a subconcept relation to each other, the superconcept represents a strict extension of the behavior of the feature. If the concepts are incomparable, these concepts represent varying context-dependent behavior of the feature.

If there is no concept for which condition (10) holds, one needs additional scenarios that factor out feature f. For instance, in order to isolate feature f1 in scenario s1 = {f1, f3}, one can simply add a new scenario s2 = {f1, f2}. The computational units specific to f1 will be in µ(s1) ⊓ µ(s2).

It is not necessary to consider all possible feature combinations in order to isolate features in the lattice. Intersecting all currently available scenarios tells exactly which features are not yet isolated (the intersection could be done by concept analysis applied to the formal context consisting of scenarios and features, where the incidence relation describes which feature is invoked by which scenario). Slightly modified variants of scenarios invoking the feature can then be added to isolate the feature specifically.

The addition of new scenarios in order to discriminate features in the lattice will lead us to an incremental construction of the concept lattice described in Sect. IV-F. Before we come to that, we describe the static dependency analysis.

E. Static Dependency Analysis

From the concept lattice, we can easily derive all computational units executed for any set of relevant features.


However, this gives us only a set of computational units, and it is not clear which of these computational units are truly feature-specific and which of them are rather general-purpose computational units used as building blocks for other computational units. Given a feature f of interest, this question can be answered as follows:

• As a first approximation, all computational units in the extents of all feature-specific concepts for f jointly contribute to f.
• The analyst refines this approximation by adding and removing computational units: by inspecting the static dependency graph and the source code of the computational units, she sorts out irrelevant computational units; she may also add feature-relevant computational units that were not executed due to an incomplete input coverage of the scenarios. The concept lattice is an important guidance for the analyst’s inspection of the dependency graph.

Example. For FIG’s ability to color a circle, the analyst will need to validate the set of computational units {color, setDiameter, draw} according to the concept lattice in Fig. 10. The lattice shows that the analyst should start with inspecting color because this appears as the most specific computational unit for coloring a circle. □

E.1 Building the Starting Set

All computational units in the extent of a concept jointly contribute to all features in the intent of the concept, which immediately follows from the definition of a concept. However, there may also be computational units in the extent that contribute to other features as well, so that they are not specific to the given feature. There may be computational units in the extent that do not contain any feature-specific code at all. Thus, computational units in the extent of the concept need to be inspected manually. Because there are no reliable criteria known that automatically distinguish feature-specific code from general-purpose code, this analysis cannot be automated, and human expertise is necessary. However, the concept lattice may narrow the candidates for manual inspection.

The concept lattice and the dependency graph can help to decide in which order the computational units are to be inspected such that the effort for manual inspection is reduced to a minimum. Since we are interested in the computational units most specific to a feature f, we start at those computational units ui that are attached to a feature-specific concept of f, that is, for which cf = γ(ui) holds, where cf is a feature-specific concept for f. If there are no such computational units, we collect all computational units below any of the feature-specific concepts cf of f with minimal distance to cf in the sparse representation. There can be more than one concept cf, so we unite all computational units that are attached to one of these concepts. The subset of computational units identified in this step that is accepted after manual inspection is called the starting set Sstart(f).
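A sketch of this selection, under the simplifying assumption that the distance in the sparse representation is approximated by the number of scenarios added to the intent; the map attached, from concept intents to the units attached there, is a hypothetical structure precomputed from the sparse lattice:

```python
def starting_candidates(cf_intent, attached):
    """Candidates for Sstart(f); manual inspection still decides."""
    if attached.get(cf_intent):
        return set(attached[cf_intent])
    # otherwise: concepts strictly below c_f, i.e., with larger intents
    below = [i for i in attached if i > cf_intent and attached[i]]
    if not below:
        return set()
    d = min(len(i - cf_intent) for i in below)  # approximated distance
    return {u for i in below if len(i - cf_intent) == d for u in attached[i]}

attached = {frozenset({"s1", "s2"}): {"u4"},          # cf. Fig. 11(c)
            frozenset({"s1", "s2", "s3"}): {"u7"}}
print(starting_candidates(frozenset({"s1", "s2"}), attached))  # {'u4'}
```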

Example. The starting set for FIG’s ability to color a circle, Sstart(color-circle), is {color}. □

E.2 Inspection of the Static Dependency Graph

Next, we inspect the executable static dependency graph (as one specific subset of the static dependency graph) that contains all transitive control-flow successors and predecessors of computational units in Sstart(f). We concentrate on computational units here because they are the active constituents and because they were subject to the dynamic analysis. The executable static dependency graph can be annotated with the features and scenarios for which the computational units were executed. If a computational unit is not annotated with any scenario, the computational unit was not executed. Non-executable parts of the system, namely, declarative parts, may be added once all relevant computational units have been identified. A static points-to analysis is needed to resolve dynamic binding and calls via routine pointers if present. The static points-to analysis may take advantage of the knowledge about actually executed computational units yielded by the dynamic analysis.

We primarily consider only those computational units ui for which ui ∈ extent(cf) holds because only those computational units are actually executed when f is invoked according to the dynamic analysis. Hence, we combine static and dynamic information to rule out computational units that are statically reachable but were never executed, in order to reduce the search space. Nevertheless, one should check for the reasons why certain computational units have not been executed.

Any kind of traversal of the executable static dependency graph is possible, but a depth-first search along the control flow is most suited because a computational unit can be understood only if the computational units it executes are understood. In a breadth-first search, a human would have to cope with continuous context switches. The goal of the inspection is to sort out computational units that do not belong to the feature in a narrow sense because they do not contain feature-specific code.

The executable static dependency graph rather than the concept lattice is traversed for inspection because the lattice does not really reflect the control-flow dependencies: γ(u1) > γ(u2) does not imply that u1 is a control-flow predecessor of u2. However, the concept lattice may still provide useful information for the inspection. In Section IV-D, we made the observation that the lower a concept γ(u) is in the lattice, the more general computational unit u is because it serves more features—and vice versa. Thus, the concept lattice gives us insight into the level of abstraction of a computational unit and, therefore, contributes to the degree of confidence that a specific computational unit contains feature-specific code.

Example. The analyst would first validate the starting set for FIG’s ability to color a circle, Sstart(color-circle) = {color}. Then she would inspect the control-flow predecessors and successors of color. Some of them might not be executed, yet a brief check is still necessary to make sure that they are indeed irrelevant. Then, she would continue with setDiameter and eventually inspect draw. □

Two additional analyses gather further information useful while navigating the dependency graph:

• Strongly connected component analysis is used to identify cycles in the dependency graph: if there is one computational unit in a cycle that contains feature-specific code, all computational units of the cycle are related to the feature because of the cyclic dependency.
• Dominance analysis is used to identify computational units that are local to other computational units. A computational unit u1 dominates another computational unit u2 if every path in the dependency graph from its root to u2 contains u1. In other words, u2 can be reached only by way of u1. If a computational unit u is found to be feature-specific, then all its dominators are also relevant to the feature, because they need to be executed in order for u to be executed. If none of a dominator’s dominatees contains feature-specific code and the dominator itself is not feature-specific, then the dominator is a clear cutting point, as all its dominatees are local to it. Consequently, the dominator and all its dominatees can be omitted while understanding the system.
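Both analyses are standard graph algorithms; the following sketch runs them on a small hypothetical call graph using the networkx library (our tooling choice, not one the paper prescribes):

```python
import networkx as nx

# Hypothetical call graph rooted at main, with a cycle a <-> b.
G = nx.DiGraph([("main", "a"), ("a", "b"), ("b", "a"),
                ("main", "c"), ("c", "d"), ("d", "e")])

# Cycles: if one unit of a strongly connected component contains
# feature-specific code, the whole component is related to the feature.
print([c for c in nx.strongly_connected_components(G) if len(c) > 1])
# -> [{'a', 'b'}]

# Dominance: e is reachable only via d and c; if c is not feature-specific
# and none of its dominatees is, c is a clear cutting point.
print(nx.immediate_dominators(G, "main"))
# -> {'main': 'main', 'a': 'main', 'b': 'a', 'c': 'main', 'd': 'c', 'e': 'd'}
```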

If more than one feature is relevant, one simply unites the starting sets for each feature and then follows the same approach. For more than one feature, the concept lattice identifies the computational units jointly and distinctly used by those features.

Once all relevant computational units have been identified, other static analyses (e.g., program slicing) as well as dynamic analyses (e.g., trace recording to obtain the order of execution) can be applied to obtain further information. These analyses can be performed in a more goal-oriented fashion by leveraging the retrieved feature-unit map.

F. Incremental Analysis

There are at least two reasons why an incremental consideration of scenarios is desirable. First, one might not get a sufficiently discriminating suite of scenarios the first time; new scenarios become necessary to further differentiate scenarios into features. Second, new scenarios are useful when trying to understand an unfamiliar system incrementally. One starts with a small set of relevant scenarios to locate and understand a fundamental set of features by providing a small and manageable overview lattice. Then, one successively increments the set of considered scenarios to widen the understanding.

Adding scenarios means adding attributes to the formal context; but there are also situations in which objects are added incrementally: in cases where computational units need to be refined. For instance, computational units with low cohesion—that is, computational units with multiple, yet different functions—will “sink” in the concept lattice if they contribute to many features. A routine containing a very large switch statement where only one branch is actually executed for each feature is a typical example. If the analyst encounters such a routine during static analysis, she could lower the level of granularity for computational units specifically for this routine to basic blocks. Basic blocks as computational units disentangle the interleaved code: for the example routine with the large switch statement, the individual switch branches would be more clearly assigned to the respective features in the concept lattice.

In this section, we describe the incremental consideration of attributes, namely, scenarios. The incremental consideration of objects—that is, the refinement of computational units—is analogous.

As soon as one understands the basics of a system, one adds new scenarios for further detailed investigation and exploration of the unknown portions of the system. If one tries to capture all features of a software system at once, the resulting lattice may become too large, too detailed, and thus unmanageable. If one starts with a smaller set of scenarios and then increases this set, all the accumulated knowledge an analyst gained while working with the smaller lattice has to be preserved. The lattice—the mental map for the analyst’s understanding—changes when new scenarios are added. Fortunately, the smaller lattice can be mapped to the larger one (the smaller lattice is the result of a so-called subcontext).

Definition. Let C = (O, A, I) be a context, O′ ⊆ O, and A′ ⊆ A. Then C′ = (O′, A′, I ∩ (O′ × A′)) is called a subcontext of C, and C is called a supercontext of C′. □

In our application of concept analysis, we add only new rows (one for each new scenario, assuming that scenarios occur in the rows of the relation table) but never new columns to the relation table (because we statically know all computational units in advance). Adding new rows leads to a new formal context (U, S′, I′) in which relation I′ extends relation I.

Proposition. Let C = (O, A, I) and C′ = (O, A′, I′), where A′ ⊆ A and I′ = I ∩ (O × A′). Then every extent of C′ is an extent of C. □

Proof. See [22]. □

According to this proposition, each extent within the subcontext will show up in the supercontext. This can be made plausible with the relation table: added rows will never change existing rows, so the maximal rectangles forming concepts will extend only in the vertical direction (if scenarios are listed in rows).

This proposition on the invariability of extents of subcontexts that differ only in the set of attributes results in a simple mapping of concepts from the subcontext to the supercontext (for a formal proof see [22]):

(U, S) ↦ (U, σ(U))

The mapping is a ⊓-preserving embedding, meaning that the partial order relationship is completely preserved. Consequently, the supercontext is basically a refinement of the subcontext. By this mapping, all concepts of the subcontext can be found in the supercontext.
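The mapping is easy to restate operationally: the extent U of a subcontext concept is kept, and its intent is recomputed in the enlarged relation. A minimal sketch using the FIG scenarios of Fig. 12 (representation assumed):

```python
def sigma(U, profiles):
    """All scenarios whose execution profile contains every unit of U."""
    return {s for s, executed in profiles.items() if U <= executed}

old = {"Move-circle":  {"draw", "setDiameter", "move"},
       "Color-circle": {"draw", "setDiameter", "color"}}
new = dict(old, **{"Move-circle-undo": {"draw", "setDiameter", "move", "undo"}})

U = {"draw", "setDiameter", "move"}   # extent of a subcontext concept
print(sigma(U, old))   # {'Move-circle'}
print(sigma(U, new))   # {'Move-circle', 'Move-circle-undo'}: same extent,
                       # refined intent in the supercontext
```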

The supercontext may include new concepts not found in the subcontext. The consequence for the visualization of the supercontext is that the newly introduced concepts can be highlighted easily in the visualized lattice of the supercontext and that concepts in the subcontext can be mapped onto concepts in the supercontext along with possible user annotations. Additionally, an incremental automatic graph layout can be chosen: only additional nodes and edges may be introduced in the supercontext; nodes and edges of the subcontext are kept. Thus, the position of concepts relative to each other will be preserved.

Example. Let us assume the analyst of FIG is now interested in whether invoking the feature “circle drawing” twice makes a difference and in the differences between drawing a circle and drawing a dot (“Draw-dot”) on one hand and between moving a circle and undoing a circle move operation (“Move-circle-undo”) on the other hand. The domain expert will design the appropriate scenarios. The resulting invocation table for these and all previous scenarios may be as in Fig. 12(a). The lattice for this new supercontext is shown in Fig. 12(b). The new scenario Draw-circle-diameter-twice is subsumed by the existing scenario Draw-circle-diameter, showing that using the feature twice does not lead to additional relevant computational units. The new scenario Draw-dot is subsumed by the bottom concept; thus, Draw-dot shares only the computational unit draw with the feature “circle drawing”. Both scenarios Draw-circle-diameter-twice and Draw-dot do not change the general structure of the lattice. Only the concept highlighted in Fig. 12(b) is new. This concept shows the difference between Move-circle and Move-circle-undo, which is the additionally executed computational unit undo. □

V. Case Studies

This section describes two case studies evaluating our method. The first case study on web browsers shows the benefit of combining static and dynamic information. The second case study focuses on dynamic information and exemplifies the incremental analysis for a very large commercial system.

In both case studies, the computational units of choice are routines. The Bauhaus [46] tools were used to extract the static dependency graph. The extracted static dependency graph contains all global declarations (routines, global variables, and user-defined types) and many dependencies such as calls between routines, references of global variables by routines, type information for variables, dependencies between user-defined types, occurrences of types in routine signatures, and so on [27].

For the dynamic analysis, we used a standard profiler to gather execution profiles. The profiler has the limitation that it does not record accesses to variables. We therefore analyzed variable accesses statically.

A. Web Browsers

In this section, we discuss the usefulness of static and dynamic information as introduced in Sect. IV-E.

We analyzed two web browsers (both written in C; see Fig. 13) using the same set of relevant related features. The concept lattice for each of these systems was derived as described in Sect. IV. The required routines as identified by dynamic analysis and the relationships derived by concept analysis formed a starting point for the static dependency analysis.

system    version   LOC (wc)   #subprograms
Mosaic    2.6       51,440     701
Chimera   2.0a19    38,208     928

Fig. 13. Analyzed web browsers.


A.1 Case Study Setup

In two experiments, we tried to understand how two specific sets of related features are implemented in both browsers using the process described above. The goal of this analysis was to recover the feature-specific computational units and the way they interact—that is, to reverse engineer a partial description of the software architecture. The partial software architecture, for instance, allows one to decide whether feature-specific computational units can be extracted from one system and integrated into another system with only minor changes. Chimera does not implement all features that Mosaic provides, and we wanted to find out whether the respective feature-specific computational units of Mosaic can be reused for Chimera.
• Experiment “History” (H): Chimera allows going back in the history of already visited URLs, but Chimera does not have a forward button that allows a user to move forward in the history again after the back button was used. Mosaic has both a back and a forward button. In this experiment, going back and going forward were considered related features.
• Experiment “Bookmark” (B): Both Mosaic and Chimera offer bookmarks for visited URLs. URLs may be bookmarked, and bookmarked URLs may be loaded and removed. We considered the following related features: addition of a new bookmark for a currently viewed URL, removal of a bookmark, and navigation to a bookmarked URL.

A.2 Objectives

The questions we wanted to answer in our case study are as follows:
• Identification and extraction: How are the history and the bookmark features implemented in Mosaic (Chimera)? What are the interfaces between the specific computational units that implement these features and the rest of Mosaic (Chimera)? In both cases, a partial description of the software architecture was recovered.
• Integration: How can the identified portion of the code of one browser be integrated into the other browser?

The whole experiment (from the initial setup of scenarios and compiling with profiler options up to the architectural sketches) took two people half a day of work altogether for Mosaic and Chimera.

A.3 Scenarios for Dynamic Analysis

For each experiment and each browser, we ran the browser in a start-end scenario in which the browser was started and immediately quit in order to separate start-up and shutdown code.


                             draw   setDiameter   setRadius   move   color   undo
Draw-circle-diameter          ×         ×
Draw-circle-radius            ×                       ×
Move-circle                   ×         ×                      ×
Color-circle                  ×         ×                             ×
Draw-circle-diameter-twice    ×         ×
Move-circle-undo              ×         ×                      ×              ×
Draw-dot                      ×

(a) Supercontext of Fig. 3.

[Figure: sparse lattice with the computational units draw, setDiameter, setRadius, move, color, and undo attached to the concepts of the scenarios Draw-circle-diameter, Draw-circle-diameter-twice, Draw-circle-radius, Move-circle, Color-circle, Move-circle-undo, and Draw-dot; the concept for Move-circle-undo is highlighted.]

(b) Lattice for the (super)context in Fig. 12(a).

Fig. 12. The lattice for the supercontext of Fig. 10.

The following additional scenarios were defined specifically for the two experiments. Experiment “History” was covered by the following three scenarios:
(H1) Basic scenario doing nothing but browsing
(H2) Scenario using the back button
(H3) Scenario using the back and forward buttons

For Chimera, the last scenario was not performed (because Chimera has no forward button).

Experiment “Bookmark” was covered by the following four scenarios:
(B1) Basic scenario: simply opening and closing the bookmark window
(B2) Scenario: adding a new bookmark for the currently displayed URL
(B3) Scenario: removing a bookmark
(B4) Scenario: selecting a bookmark and visiting the associated URL

Each scenario was immediately ended by quitting the respective system. We provided scenarios that invoke one feature only, except for one scenario: one cannot use the forward button without using the back button. Consequently, the concept containing routines executed for scenario (H2) is a subconcept of the concept related to (H3). Likewise, a bookmark can be deleted only when a URL has been added before. To circumvent this problem, we started the browser with a non-empty bookmark file in all scenarios. Thus, we did not consider the case of insertion into an empty bookmark list.

A.4 Static Dependency Analysis

In the dependency graph for the browsers, visualized using the Bauhaus extension to Rigi [28], we derived all statically transitively called routines (using Rigi’s basic selection facilities [28]) and intersected the static information with the actually executed routines manually.

              (1)   (2)   (3)   (2) ∩ (3)   relevant
Mosaic/(B)    701   359    99       74          16
Mosaic/(H)          348    74       65           6
Chimera/(B)   928   431    89       55           3
Chimera/(H)         419   123       55          24

Fig. 14. Subprogram counts for Mosaic and Chimera.

[Figure: dependency graph excerpt with very specific routines in the upper region and less specific routines and general-purpose functions in the lower region, separated by a cutting level; edges denote routine calls.]

Fig. 15. Relevant parts of Chimera for history.

We additionally filtered out all routines specific to HTML and the X-window-based graphical user interface, guided by the browser’s proper naming conventions. These routines were all in the bottom element of the concept lattice.

A.5 Results

Figure 14 provides a summary of the numbers of routines that needed to be further considered in each step and shows how the search space could be reduced in each step. The history experiment is denoted by (H) and the bookmark experiment by (B).


[Figure: (a) Mosaic’s history: a separate history component connected to the browser kernel and GUI by three routine calls (1)–(3). (b) Chimera’s history: the location of the history is part of the browser kernel’s inner state, accessed through a dispatch layer. Legend: component, data storage, routine call.]

Fig. 16. Mosaic’s and Chimera’s history architecture.

The total number of all routines of the kernels (not including libraries such as html, jpeg, and zlib) is in column (1); the number of routines actually executed for any of the scenarios is shown in column (2). All routines statically called by routines selected from the set of dynamically executed routines in upper concepts of the lattice (i.e., called from routines in the starting set) are in column (3). The intersection of columns (2) and (3) contains all routines dynamically called by routines selected from the set of dynamically executed routines in upper concepts of the lattice; their number is reported in column “(2) ∩ (3)”. Column relevant reports all routines in column (2) ∩ (3) that are specific to the selected features according to our manual inspection. All other routines are used for purposes other than bookmarks and histories.

Eventually, only a small number of routines needed to be inspected more thoroughly due to the top-down inspection process. As an example, Fig. 15 shows the remaining routines of Chimera (omitting their names) relevant to the history experiment. This picture clearly shows the possible cutting points in the dependency graph (consisting of routines, global variables, and user-defined types and their dependencies) between routines specific to the history features (upper region) and non-specific routines (lower region): only two entities need to be removed to isolate feature-specific from non-specific entities.

We recovered the parts of the architecture of Mosaic and Chimera relevant to the two experiments.

A.6 Results for History

The interface between Mosaic’s browser kernel and the history component (see Fig. 16(a)) is formed by three routines to (1) get the current URL, (2) set the current URL, and (3) communicate the action and event (changed URL).

The history component can be easily extracted from Mosaic’s source code because it is a separate component—whereas the history is an integral part of Chimera’s kernel (cf. Fig. 16(b)). There is no set of routines of Chimera that could be reasonably addressed as a “history manager component” as in Mosaic. Chimera uses a layer of wrappers calling a dispatching routine around a list of actions, where the displayed URLs are part of that list.

The recovered partial architecture shows that Chimera’s browser kernel is built around a list of visited URLs, whereas Mosaic’s browser kernel does not know the history of visited URLs at all. As the analysis of the partial architectures reveals, re-using Mosaic’s history components in Chimera would be very difficult due to the architectural mismatch [29].

A.7 Results for Bookmarks

The partial architectures of the two systems are similar to each other with respect to bookmarks. Both architectures include an encapsulated bookmark component, which communicates via a narrow interface with the basic browser kernel (see Fig. 17).

The basic actions that have to be performed are: (1) get the currently shown URL, (2) set the currently shown URL, (3) display the bookmarks, and (4) communicate the bookmark selection back.

Exchanging the two implementations between Mosaic and Chimera would be reasonably easy.

B. Case Study Agilent

This section reports on a case study conducted to investigate the usefulness of the approach in a realistic full-scale industrial setting. The case study stresses the importance of the incremental understanding of very large concept lattices as described in Section IV-F and of the modeling of scenarios as sets of features as explained in Section IV-D.4.

The system analyzed is part of the software of the Agilent 93000 SOC Series, a semi-conductor test equipment produced by Agilent Technologies.


[Figure: (a) Mosaic’s bookmarks and (b) Chimera’s bookmarks: in both browsers an encapsulated bookmark component communicates with the browser kernel and GUI through a four-routine interface (1)–(4); in Chimera, the communication goes through a dispatch layer and the kernel’s inner state.]

Fig. 17. Mosaic’s and Chimera’s bookmark architecture.


B.1 Agilent 93000 SOC Series

The Agilent 93000 SOC Series is a single scalable tester platform used in the manufacturing process of integrated circuits. It provides test capabilities for digital, analog, and radio frequency circuits as well as for embedded memories. The SmarTest software controls the complex tester hardware. It is an interactive environment for developing and running test programs.

SmarTest consists of numerous tools supporting test engineering tasks. At the center of the software lies the firmware, an interpreter for IEEE-488-like commands. The firmware is responsible for programming the hardware. The inputs to the firmware are the test cases, which are sequences of firmware commands. The firmware parses and interprets each command, drives the Agilent 93000 device, and returns the result. It is the firmware that was analyzed in our case study.

The software of the Agilent 93000 SOC series is maintained by several geographically distributed groups. Two of them are situated in the USA, one in Japan, and one in Germany. The group in which the case study was conducted is the SOC Test Platform Division at Böblingen, Germany.

The firmware of the Agilent 93000 has evolved over 15 years. Today, it consists of 1.2 million commented lines of C code—counted with the Unix program wc—or about 500,000 non-empty lines of declarative or executable C code, respectively. The static call graph of the part of the firmware that was analyzed for this case study had 9,988 routines and 17,353 call edges, excluding standard C routines and operating system routines.

Figure 18 depicts the software architecture of the firmware as described by one of the software architects at Agilent. The firmware is used simultaneously by different tools running as separate processes. Interaction between these tools and the firmware is through shared memory and message queues that are part of the firmware. A semaphore is used to synchronize the interaction between the firmware and other tools.

The firmware is basically an interpreter for test programs. When a test program is filed into the shared memory, the firmware parses and runs each command. In order to run a command, the firmware dispatches the corresponding C routine that acts as an entry point to the implementation of the command. There is one such C routine—also referred to as executor—for each command. When the executor has finished, its result is written back to the shared memory and the waiting process is informed through the message queue. As Fig. 18 suggests, the executors share a set of re-usable utility routines—routines offering more general services. Which utility routines are actually shared by which executors is, however, not shown in the architectural sketch. As a matter of fact, the software architect currently does not exactly know what the precise relation between executors and utility routines is, due to the size of the system and the lack of documentation.

Many commands interpreted by the firmware come in pairs: the actual command and an additional command to fetch the result of its execution. The latter is called the query command. The commands are named by four-letter acronyms. Query commands are additionally annotated with a question mark. For instance, CNTR? is the query command of CNTR.

The firmware understands about 250 different actual commands; most of them have a corresponding query command. Altogether, there are about 450 different commands.

For this case study, we focused on the digital part of the firmware, namely on Configuration Setup, Relay Control, Level Setup, Timing Setup, and Vector Setup commands (other classes of commands are Analog Setup, AC Test Function, DC Measurement, Test Result, Utility Line, and Calibration and Attributes commands):

Configuration Setup Commands: Configuring pins is the first step one must take when preparing a test. Commands of this class allow assigning pin names to a test or power supply channel, configuring pin type and operation modes, specifying the series resistor, and other things.
Routing Setup Commands: The Routing Setup commands specify the signal mode and connection for each pin, and the order of connections.


[Figure: applications interact with the firmware through shared memory, a message queue, and a semaphore; within the firmware, a YACC parser and a command constructor feed the executors, which share a layer of utility functions and drive the hardware; control-flow and data-flow edges connect the components, and responses are written back.]

Fig. 18. Software architecture of Agilent 93000 firmware.

Level Setup Commands: The Level Setup commands specify the required driver amplifier and receiver comparator voltage levels, as well as set termination via the active load or set the clamp voltage.
Timing Setup Commands: The Timing Setup commands define the length of the device cycle, the shape of the waveforms making up a device cycle, and the position of the timing edges in a tester cycle for all configured pins.
Vector Setup Commands: The Vector Setup commands are required to set up and sequence test vectors.
Relay Control Commands: The Relay Control commands are used to set relay positions and the tester state.

B.2 Objectives

This case study had three goals:

1. The architectural sketch in Fig. 18 had to be mapped to the source code so that the parts of the system that contribute to the blocks “executors” and “utility functions” are identified. It had to be clarified which routines are executors.
2. The utility routines were to be assigned to the executors they support. This mapping clarifies the fine structure of the “utility functions” block in Fig. 18.
3. Some commands of the Agilent 93000 firmware we investigated were assigned to none of the classes of Configuration Setup, Relay Control, Level Setup, Timing Setup, or Vector Setup commands, neither by the architect nor by the user manual. These were to be classified according to the resulting concept lattice to see whether the lattice provides useful information to classify features.

The overall goal of our case study was to map the architecture sketch in Fig. 18 to the source and to show which utility routines are really shared. Given the above-mentioned classes of commands, our hypothesis was that the executors for commands of the same class share many utility routines. On the other hand, for commands of different classes, we expected fewer commonalities; in other words, one would expect that only more general utility routines are shared.

B.3 Scenarios for the firmware of Agilent 93000

The software architect at Agilent selected the commands for digital tests that were to be investigated. Three students of the University of Stuttgart created the test cases—advised by the expert. For each relevant firmware command, a test case was provided that executes the command.

The execution of some commands is bound to certain preconditions that need to be fulfilled by calling other commands first, which requires adding these commands to the test cases. Hence, a test case is generally not a single command but a sequence of firmware commands, of which one is the relevant command and the others are required preparing steps. The order of preparing commands was the same for all test cases that had these commands as preconditions, and there were no two test cases executing the same set of routines. As already described in Section IV-D.4, we can thus model a test case (scenario) as a set of commands (features) s = {command1, command2, . . . , commandm}.

In order to identify the routines specific to the relevant command only, one can factor out preparing steps by additional test cases, which execute the preparing commands but not the relevant command. For instance, in order to call command UDPS, one needs to execute DFPS first. Thus, the test case for UDPS is {DFPS, UDPS}, where only UDPS is relevant. In order to identify the routines for UDPS specifically, one can simply add another test case executing DFPS only. The routines specific to UDPS can then be identified in the concept lattice as described in Section IV-D.4.


real         76  scenarios for relevant commands
              1  scenario for the NOP command
additional    2  additional parameter combinations
factoring     1  start-end scenario
             13  scenarios for preparing steps
total        93  scenarios

Fig. 19. Test cases / scenarios.


If a command has a query command, two test cases were created: one for the actual command and one for the query command. The former contains only the actual command but not the query command, and the latter only the query command but not the actual command (in all cases where the query command can be called without calling the actual command before).

If a command has different options, the test case executes the command with several different combinations of options. The combinations are aimed at covering equivalence classes of option settings.

For one pair of an actual and a query command, namely, the command SDSC, four scenarios were created: two for the actual and two for the query command. The difference between the two scenarios for both the actual and the query command is the setting of the specification parameter, which relates either to Timing or to Level Setup. The distinction was made to see whether the command requires routines from different parts of the system, that is, the timing setup and level setup parts.

Each test case represents a scenario. In total, 93 scenarios were provided (cf. Fig. 19). Among these, 76 scenarios correspond to one relevant firmware command for digital tests. One additional scenario contained just the no-operation (NOP) command, which has no effect on the tester. Two additional scenarios were added to call command SDSC and its query command with the alternative parameter setting. The remaining scenarios were used for factoring: the start-end scenario was used to remove start-up and shutdown code by simply starting the system, executing a reset command, and shutting down the system, and 13 factoring scenarios were provided to factor out preparing steps in real scenarios.

Agilent’s own large test suite for testing the firmware could not be used since we needed scenarios that preferably explore one command (or feature, respectively) at a time; Agilent’s test cases use combinations of commands. Moreover, the existing test driver of the test suite executes all tests in one run, so that the result would have been a single profile for all test cases instead of an individual profile for each test case.

B.4 Resulting Concept Lattice

The resulting concept lattice is shown in Fig. 20. It consists of 165 concepts and 326 non-transitive subconcept relations. Out of the 9,988 statically declared routines, only 1,463 were actually executed by at least one of the 92 considered scenarios (the start-end scenario is used to remove those routines from the profiles of the other scenarios that are executed only for initialization, reset, and shutdown of the system).

Although the worst-case execution time to compute a concept lattice is exponential in the number of objects and attributes, our computation of the concept lattice for the firmware took less than 2 minutes on an Intel Pentium III 800 MHz machine running Linux.

Another developer at Agilent (different from the software architect who sketched the firmware architecture) was asked to validate the resulting concept lattice. To make a clear distinction between this validating expert and the expert who sketched the firmware architecture, the former will be called developer and the latter software architect in the following.

The developer was familiar with the firmware but was not involved in the preparation of the test cases. We explained the test cases that were selected and the interpretation of the concept lattice as described in this paper. We did not show the architecture sketch from the software architect. We asked the developer to explain the general structure of the system with the concept lattice and whether there were any surprises in the lattice.

The developer immediately spotted in the 65 direct subconcepts of the top element—that is, concepts in the first row below the top element of the lattice—the individual executors for 65 commands (including the executor for NOP). (The top element itself does not contain any scenario.) Among these 65 concepts, 63 contain a single scenario and two contain two scenarios. The ones with two scenarios are the two different parameter settings for the SDSC command and the corresponding query command (cf. Sect. V-B.3). Consequently, the implementation of the SDSC command executes the same routines independently of the parameter that refers to timing or level setup, respectively. Thus, 65 executors could immediately be detected in the lattice. Based on these observations, we could easily map the concept lattice in Fig. 24 to the architecture sketch of Fig. 18.

The other 12 real scenarios can be found in subconcepts of the above-mentioned 65 concepts. The reason why these scenarios cannot be found directly below the top element is that they represent commands that are also needed as preparing steps for other commands. For instance, before the commands PSLV and UDPS can be called, one must call DFPS. The scenarios for PSLV and UDPS are consequently {DFPS, PSLV} and {DFPS, UDPS}, respectively. The scenario that contains DFPS only will therefore be part of the concept that is the common infimum of the scenarios for PSLV and UDPS since {DFPS} = {DFPS, PSLV} ∩ {DFPS, UDPS}. By representing test cases (scenarios) as sets of commands (features) and isolating commands through intersecting test cases as described in Section IV-D.4, we could easily identify the executors for the remaining 12 commands whose test cases are not directly located below the top element.



Fig. 20. The lattice for all commands. The boxes’ height corresponds to the number of routines in the concepts.

As described above, the firmware commands can be categorized into different classes (Configuration Setup, Relay Control, Level Setup, Timing Setup, and Vector Setup commands). In order to visualize the routines jointly used by executors for commands of the same class, we colored the concept lattice as follows:
1. Each concept representing an executor in the lattice gets the color of the executor’s class; the colored concept is the starting node for the traversal in the next step.
2. By top-down traversal starting at the colored concept, the color of the respective executor is propagated to all subconcepts of the executor’s concept (until a different executor is reached).
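A sketch of the two coloring steps; the concept identifiers and the subconcepts map are hypothetical stand-ins for structures precomputed from the lattice:

```python
def color_lattice(executor_class, subconcepts):
    """executor_class: executor concept -> command class (its color)."""
    colors = {c: {cls} for c, cls in executor_class.items()}    # step 1
    work = list(executor_class)
    while work:                                                 # step 2
        c = work.pop()
        for sub in subconcepts.get(c, ()):
            if sub in executor_class:   # stop at a different executor
                continue
            missing = colors[c] - colors.setdefault(sub, set())
            if missing:                 # propagate the color downward
                colors[sub] |= missing
                work.append(sub)
    return colors  # concepts with several colors serve several classes

demo = color_lattice({"c_CNTR": "Configuration", "c_RLYC": "Relay"},
                     {"c_CNTR": ["c_util"], "c_RLYC": ["c_util"]})
print(demo["c_util"])  # {'Configuration', 'Relay'}: shared utility code
```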

The colored concept lattice for Agilent’s firmware gives interesting insights. All concepts directly below the top element in Fig. 24 have just one color because these concepts actually represent just one executor of a given command. If a concept c has more than one color, the routines ui for which γ(ui) = c holds contribute to commands of different classes. As a matter of fact, there were only a few concepts above the bottom element with different colors, showing that there is substantial sharing of routines among executors of the same class of commands. The utility routines in concepts having only one color seem to be specific to just a single class of commands. In other words, either a routine is specific to a class of commands or it is used for all command classes in general.

The dynamic analysis in conjunction with concept analysis has thus given important insight into the internal structure of the black box labeled “utility routines” in Fig. 18: 534 routines (out of 1,463 routines executed for at least one test case and 9,988 statically declared routines, respectively) could be related to the executors, that is, are not specifically attached to the bottom element.

There are also executors for commands of the same class that share only the most general routines in the bottom element, that is, those routines executed for all executors. The most remarkable example is the executors for the configuration setup of single pins on the one hand and those for the configuration setup of whole pin groups on the other. While the executors for single pins share many routines specific to their class, the executors for pin groups (which also belong to the same class Configuration Setup) do not share any routine beyond those in the bottom element, neither with executors for single pins nor with other executors for pin groups. Our hypothesis was that there are many routines jointly used by configuration setup commands for pin groups, similarly to the commands for single pins. The developer reviewing the concept lattice explained that macros are heavily used for routine inlining in the subsystem implementing pin group configuration. According to the developer, this subsystem is an older part of the system. Apparently, at its initial development, no compiler with automatic routine inlining was available. The use of macros undermines our way of collecting dynamic information: the profiler we used records only routine calls and, hence, cannot reveal code sharing among these pin group commands.

Generally, the concepts just below the top element contain only one routine. Some contain more than one routine but fewer than five. In these cases, a programmer apparently has split a large executor into smaller pieces for better modularization. There is one concept just below the top element that contains a very large number of routines. This concept represents the test execution. The developer explained that the routines specifically attached to this concept are strongly related but could have been grouped further if more scenarios for test execution had been provided.

The developer also looked at another very large concept located in the middle of the concept lattice. By looking at the routines specifically attached to this concept, he told us that about 70% of these routines deal with memory management. Hence, this concept collected a large number of semantically related routines.

There are 929 routines specifically attached to the bottom element, that is, routines that are used for all scenarios. For these routines, either the selection of test cases failed to further structure this set of routines, or the routines are necessarily required for all possible usage scenarios, in which case other techniques are needed to group these routines semantically. Since our goal was to identify the executors and the routines shared by the executors, we did not further investigate the routines in the bottom element.

Configuration Setup: CNTR, CNTR?, CONF, CONF?, UDEF, UDPS, UDGP, DFPN, DFPN?, DFPS, DFPS?, DFGP, DFGP?, DFGE, DFGE?, PALS, PALS?, PSTE, PSTE?, PSFC, PSFC?, PQFC, PQFC?, PACT, PACT?
Relay Control (Test Execution): RLYC, RLYC?
Level Setup Commands: LSUS, LSUS?, DRLV, DRLV?, RCLV, RCLV?, TERM, TERM?
Timing Setup Commands: PCLK, PCLK?, DCDF, DCDF?, WFDF, WFDF?, WAVE, WAVE?, ETIM, ETIM?, BWDF, BWDF?
Vector Setup Commands: SQLA, SQLB, SQLB?, SQPG, SQPG?, SPRM, SPRM?, SQSL, SQSL?

Fig. 21. Categorization of commands as found by the software architect.

Uncategorized: FTST, VBMP, PSLV, CLMP, WSDM, DCDT, CLKR, VECC, SDSC, SREC, DMAS, STML

Fig. 22. Commands not categorized by the software architect.

B.5 Inferring Categorization from Concept Lattice

Prior to our analysis, the software architect selected firmware commands that were to be investigated. He also categorized the commands as described in Section V-B.3. As it turned out during our analysis of the concept lattice, the categorization was incomplete. The software architect categorized only the commands listed in Fig. 21. Additionally, he prepared scenarios that explored the commands listed in Fig. 22. The incomplete categorization gave us the opportunity to check whether it would be possible to categorize commands into the above classes just on the basis of the concept lattice, without any knowledge of the system and the application domain.

One of the authors of this article guessed the categories based on the concept lattice only—more precisely, based on the sharing of utility routines with other, already classified commands. The assumption was that a command belongs to the class of commands with which it shares most utility routines. Altogether, 7 out of 12 commands were actually assigned to one of these classes based on this assumption. For the remaining commands, the lattice did not provide unambiguous information.

[Table comparing the lattice-based guesses (column Guess) with the developer's classification (column Developer) and the user manual (column Manual) for the commands FTST, VBMP, PSLV, CLMP, WSDM, DCDT, CLKR, VECC, SDSC, SREC, DMAS, and STML, grouped into Relay Control (Test Execution), Level Setup, Timing Setup, Vector Setup, and Others/Multiple.]

Fig. 23. Comparison with oracles.

We used two oracles to validate these guesses. Firstly, we asked the developer to classify these commands, and secondly, we checked the user manual for the firmware. The comparison of the guesses with the two oracles is shown in Fig. 23.

Interestingly enough, the classification given in the manual is also incomplete. Two of the used commands, namely CLMP and STML, are not described in the manual. Moreover, the command FTST does not really belong to the targeted classes of commands according to the manual; it was added by the software architect because it is the starting command for the actual test execution. SDSC and WSDM are commands that cannot be assigned to one class of commands only but rather contain aspects of different classes.

As can be seen in Fig. 23, the classification of the developer is also incomplete since he did not know all firmware commands. There are more than 250 commands, not counting the corresponding query commands. The classification of the developer is in accordance with the user manual except for CLMP, which is not described in the manual.

If we compare the lattice-based guesses with the oracles, we find that the author was truly wrong only once, namely, for command FTST. In the case of command WSDM, he assigned the command to one of two equally possible classes.

It was interesting to see that many commands could be assigned correctly simply based on the lattice, without any knowledge of the application domain and implementation of the system.

B.6 Lessons Learnt

At the beginning of our case study, we explained the basic interpretation of the concept lattice to the developer without going into the formal mathematical details. The developer learnt how to read the concept lattice surprisingly quickly, in less than 10 minutes, which suggests that the technique can easily be adopted by practitioners.

The developer confirmed that the technique could be useful for maintenance programmers who are less familiar with the system in order to quickly identify the executors. Since there was a naming convention for executors in place, locating the executors could have been done more easily with textual search tools such as grep, he noted. The developer also confirmed the general approach for the static analysis once the executors have been located: if he is to modify a command, he also traverses the dependency graph. For lack of more sophisticated tools, he is using simple tools, such as the Unix tool ctags, to get the necessary cross-reference information. However, the developer agreed that it would have been very difficult for him—using such simple tools—to identify the firmware commands to which a given routine contributes. Such information would help him in the impact analysis of changes. Moreover, it would also have been very difficult for him to identify the sharing of utility routines among executors.

This case study also revealed some difficulties with the proposed technique. For instance, due to the inlining of routines by way of macros, the profiler could not identify the code sharing of commands for pin groups. For such inlining, a static analysis is necessary. In order to identify this kind of code sharing, one could try to identify joint uses of macros in the non-preprocessed code, or duplicated code in the preprocessed code by way of clone detection techniques.

Another difficulty that had to be tackled in this case study is the problem of handling parameterized scenarios, that is, scenarios that are alike except for the values of certain parameters. For instance, most commands of the firmware have options. The options, of course, influence the behavior of the system: the same command may execute different routines for different options. This problem is equivalent to the input coverage problem of testing software in general. Analogously, the test cases for the Agilent case study were defined so as to cover equivalence classes of possible parameter values. The firmware commands were then called with different combinations of representative values of equivalent parameter settings. However, full coverage of all possible combinations would exceed all available resources, and there is no guarantee that the software actually behaves equivalently for all apparently equivalent input values.

Due to the dynamic analysis, only about 15% of the almost 10,000 routines were present in the formal context for the concept lattice. Likewise, the number of scenarios was realistic, yet trimmed to only the digital part of the system. Nevertheless, the concept lattice for the firmware of the Agilent 93000 chip tester—containing 165 concepts—was relatively large and complex. Such large concept lattices are a challenge for visualization: not so much with regard to the time needed to produce a visualization, but with regard to reading and understanding such a large graph. We used GraphViz by AT&T [30] to lay out the graph automatically in virtually no time. Also, the resulting layout was acceptable—at any rate, much better than we could have drawn the graph. However, we would have liked to group the nodes of the graph semantically, in terms of the classes to which the associated commands belong, beyond the aesthetic criterion of minimizing edge crossings. Moreover, the lattice was too large to be presented on a 21" screen. For this reason, we used a print-out of the lattice on 19 pages (DIN A4 format) for the discussion with the developer, and even on this print-out, the names of routines and scenarios were hard to read.

The experiences with the size and complexity of the final lattice in the Agilent case study led us to develop support for incremental construction and understanding of the concept lattice, as described in Section IV-F. The visual difference when considering scenarios incrementally is illustrated by Fig. 24: Figure 24(a) contains the concept lattice for all Timing Setup commands; for the lattice in Fig. 24(b), all scenarios for Vector Setup have been added. When all scenarios for all classes of commands are added, the lattice in Fig. 20 is obtained.

VI. Related Research

This section discusses research related to our work. First, we take a look at the papers most closely related to our own approach. Next, we summarize work that visualizes dynamic and static information in different ways.

Feature Location

Wilde et al. [7], [27] pioneered feature location with a fully dynamic approach. The goal of their Software Reconnaissance is to support maintenance programmers when they modify or extend the functionality of a legacy system.

Based on the execution of test cases for a particular feature f, several sets of computational units are identified (a small set-based sketch follows the list):

• computational units commonly involved (code executed in all test cases, regardless of f),
• computational units potentially involved in f (code executed in at least one test case that invokes f),
• computational units indispensably involved in f (code executed in all test cases that invoke f), and
• computational units uniquely involved in f (code executed exactly in the cases where f is invoked).
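To make the four definitions concrete, here is a minimal Python sketch over hypothetical execution profiles (the naming is ours, not Wilde's tooling):

# Each run: (invokes_f, set of executed computational units) -- hypothetical data.
runs = [
    (True,  {"main", "draw", "draw_arc", "set_centc"}),
    (True,  {"main", "draw", "draw_arc"}),
    (False, {"main", "draw", "load"}),
]

commonly      = set.intersection(*(u for _, u in runs))           # in all test cases
potentially   = set.union(*(u for inv, u in runs if inv))         # in some f-test case
indispensably = set.intersection(*(u for inv, u in runs if inv))  # in all f-test cases
uniquely      = indispensably - set.union(*(u for inv, u in runs if not inv))

print(uniquely)  # {'draw_arc'}: executed only when f is invoked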

Since the primary goal is the location of starting points for further investigations, Wilde and Scully focus on locating specific computational units rather than all required computational units. The approach deals with one feature at a time and gives little insight into connections between sets of related features. If a set of related features is to be considered rather than a single feature, one could repeat the analysis, invoking each feature separately, and then unite the specifically required computational units. Even then, the relationships among groups of features cannot be recognized.

Fig. 24. Concept lattice for the digital part of the Agilent 93000 firmware: (a) Timing commands; (b) Timing and Vector commands.

Another approach based on dynamic information is taken by Wong and colleagues [28]. They analyze execution slices (which correspond to our execution profiles) of test cases implementing a particular functionality. The process is as follows:

1. The invoking input set I (i.e., a set of test cases or—in our terminology—a set of scenarios) is identified that will invoke a feature.
2. The excluding input set E is identified that will not invoke the feature.
3. The program is executed twice, using I and E separately.
4. By comparing the two resulting execution slices, the computational units can be identified that implement the feature.

For deriving all required computational units, the execution slice for the invoking input set is sufficient. By subtracting all computational units in the execution slice for the excluding input set from those in the execution slice for the invoking input set, only those computational units remain that specifically deal with the feature. This information alone is not sufficient to identify the interface and the constituents of a component in the source code, but those computational units are at least a starting point for a more detailed static analysis. Again, interdependencies between features are not revealed easily.
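A compact sketch of this subtraction, with hypothetical execution slices for the invoking set I and the excluding set E:

# Union of units executed over all test cases in I resp. E (hypothetical data).
slice_I = {"main", "parse", "render", "feature_impl"}
slice_E = {"main", "parse", "render"}

required = slice_I             # all units the feature needs
specific = slice_I - slice_E   # units that specifically deal with the feature
print(specific)                # {'feature_impl'}: starting point for static analysis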

In [29], Wong et al. present a way to quantify features. Metrics are provided to compute the dedication of computational units to features, the concentration of features in computational units, and the disparity between features. This work complements their earlier research and can be used as a refinement of Wilde's technique.

Chen and Rajlich [30] propose a semi-automatic method for feature location, in which the programmer browses the statically derived abstract system dependency graph (ASDG). The ASDG describes detailed dependencies among routines, types, and variables at the level of global declarations. The navigation on the ASDG is computer-aided, while the programmer carries out the actual search for a feature's implementation. The method takes advantage of the programmer's experience with the analyzed software. It is less suited to locating features if programmers without any pre-knowledge do not know where to start the search.

The ASDG's quality is essential for the method. If the ASDG includes overoptimistic assumptions on function pointers, the programmer may miss routines called via function pointers. If it reflects overly conservative assumptions, the search space increases drastically. It is statically undecidable which control flow paths are taken at runtime, so every conservative static analysis will yield an overestimated search space. In contrast, dynamic analyses reveal exactly which parts are actually used at runtime—although only for a particular run. Insights from dynamic analyses are valid only for the input data used and the environment in which the system was run.

Recently, Wilde and Rajlich compared their approaches [31]. In the presented case study, both techniques were effective in locating features. The Software Reconnaissance proved to be more suited to large, infrequently changed programs, whereas Rajlich's method is more effective if further changes are likely and require deep and more complete understanding.

Visualization of Object-Oriented Systems

De Pauw and colleagues [36], [37], [38] provide a general model for the visualization of the execution of object-oriented systems. Their language- and platform-independent approach visualizes dynamic information about the runtime behavior by means of message sequence charts and chart-like views for summary information.

Program Explorer [25], [26] by Lange and Nakamura is a tool for understanding C++ programs by means of visualization. Both static and dynamic information are extracted and combined for the presentation of an object-oriented system. The static information derived from the source (like the class hierarchy and structural data) is stored in a program database. The dynamic information comprises method invocations, object longevity, and variable accesses and is gained off-line from execution traces. Program Explorer offers selective instrumentation of the source, requiring the user to have a certain knowledge about the system. To cope with the amount of information, the user can further merge, prune, or slice the results of analyses to remove undesired information. The dynamic information is coupled with the static information, yielding class-to-object and object-to-class clarification. Program Explorer is not intended for global understanding; the user must have knowledge about the system and then focus on relevant parts. The approach is class- and object-centered and does not offer other levels of abstraction.

Koskimies and Mössenböck developed Scene [24], a tool for visualizing object-oriented systems written in the programming language Oberon. Scene uses scenario diagrams for visualizing the message flow between objects in terms of method invocations. The scenario diagrams are generated from event traces and linked to other sources of information.

Jerding and colleagues [39], [40] focus on the interactions between program components at runtime. They observed that recurring interaction patterns can be used in the abstraction process for program understanding. The authors developed a pattern identification algorithm and structure the dynamic information by using the identified patterns. The work primarily aims at object-oriented systems but also seems applicable to procedural programming paradigms. Jerding and Rugaber present the tool ISVis [40] to support architectural localization and extraction. They use both static and dynamic information to extract components and connectors. The components are specified by the analyst (using traditional static analyses), whereas the connectors are recognized from actual execution traces. These execution traces are then analyzed with the aforementioned methods. The dynamic information is visualized as a variant of message sequence charts; the user has the ability to restrict the instrumentation to specific files of the system.

Systä [41] focuses on reverse engineering Java legacy systems. She discusses the combination of static and dynamic information when reengineering a Java environment. Rigi [28] is used to extract the static information from class files and to connect the dynamic information (represented as state diagrams) gained through program runs.

Visualization and Abstraction

Another effort to combine dynamic and static information about object-oriented systems is undertaken by Richner and Ducasse [42]. They offer a query-based approach where the facts about the legacy system are modeled in terms of logical facts. The queries produce different views of the software (at different levels of abstraction) and help to restrict the amount of data generated. There is no information exchange between the views.

Sefika and colleagues [43] visualize the statics and dynamics of an object-oriented system in terms of its architectural abstractions. The code instrumentation is light-weight and architecture-aware. It provides efficient on-line instrumentation to support architecture-guided queries. The architectural abstractions are taken as a basis for the visualization. Similarly, Walker and colleagues [44] aim at the visualization of dynamic information on a higher level of abstraction. They use program animation techniques for program understanding.

Most recently, Robillard and Murphy [45] address the problem of crosscutting concerns in object-oriented systems. They propose the usage of Concern Graphs, which abstract the implementation details of concerns and explicitly show relationships between the parts of the concerns. The extraction of concern graphs from a given legacy system could benefit from dynamic feature-location techniques.

Concept Analysis

Concept analysis was introduced to software engineering primarily by Snelting. Since then, it has been used to evaluate class hierarchies [32], to explore configuration structures of preprocessor statements [33], [34], for re-documentation [35], and to recover components [36]–[42]. All of that research utilizes static information derived from source code.

A technique similar to ours is taken by Ball [43]. He describes how to use concept analysis for the dynamic analysis of test sets. The source code is instrumented, and profile information is gathered. The results of concept analysis on the data are used to provide an intermediate point between entity-based and path-based coverage criteria.

Summary

All researchers using program traces face the same problem: the huge amount of data that is produced by the execution. The problem is tackled by removing undesired information—either by instrumenting only parts of the system or by providing filtering mechanisms (patterns or static information) on the stored traces.

The amount of information gained by profiling rather than tracing is much smaller (and less precise) and can therefore be handled more efficiently. Even profiling on a more fine-grained level than routines or methods (e.g., basic blocks) leads to comprehensible results. For our primary goals, the sequence of operations was not crucial and can at least in part be regained from static information. The frequency of invocations does not play a major role for now, but we believe that such information could be exploited in future research.


VII. Conclusions

The technique presented in this paper identifies computational units specific to a set of related features using execution profiles for different usage scenarios. At first, concept analysis—a mathematically sound technique to analyze binary relations—allows locating the most feature-specific computational units among all executed computational units. Then, a static analysis uses these feature-specific computational units to identify additional feature-specific computational units along the dependency graph. The combination of dynamic and static information reduces the search space drastically.

The value of our technique has been demonstrated by several case studies. In one case study, analyzing two web browsers, we could recover a partial description of the software architecture with respect to a specific set of related features. Commonalities and variabilities between these partial architectures could be recovered quickly. Altogether, in two experiments per system, we found 16 and 6 feature-specific routines out of 701 routines for Mosaic, and 3 and 24 out of 928 routines for Chimera, respectively. Only very few routines needed to be inspected manually.

The second case study was performed on a 1.2 million LOC production system. The experiences we made during that case study revealed two problems of our approach: the growing complexity of concept lattices for large systems with many features, and the need for handling compositions of features.

In this paper, we extended our technique to solve these problems. We showed how the method allows incrementally exploring features while preserving the "mental map" the analyst has gained through the analysis.

The second improvement described in this paper is a detailed look at composing features into more complex scenarios. Rather than assuming a one-to-one correspondence between features and scenarios, as in earlier work, we can now handle scenarios that invoke many features.

Further, the implementation of our approach is simple. For concept analysis, we used the tool concepts [58]. For visualization, we used our graphical Bauhaus front end [46]. Layouts are generated by GraphViz [30]. The glue code is written in Perl; for compiling and profiling, we used gcc and gprof.

Acknowledgments

We would like to thank Gerd Bleher and Jens Elmenthaler (both at Agilent Technologies) for their support in the Agilent case study. We would also like to thank Tahir Karaca, Markus Knauss, and Stefan Opferkuch (all students at the University of Stuttgart) for preparing the test cases in the Agilent case study.

References

[1] Meir M. Lehman, "Programs, Life Cycles and the Laws of Software Evolution," Proceedings of the IEEE, Special Issue on Software Evolution, vol. 68, no. 9, pp. 1060–1076, Sept. 1980.

[2] Rainer Koschke, Atomic Architectural Component Recovery for Program Understanding and Evolution, Dissertation, Universität Stuttgart, Germany, 2000.

[3] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon, "Locating Features in Source Code—Case Studies," Available at http://www.bauhaus-stuttgart.de/bauhaus/papers/, Oct. 2002.

[4] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon, "Derivation of Feature-Component Maps by Means of Concept Analysis," in Proceedings of the 5th European Conference on Software Maintenance and Reengineering, Lisbon, Portugal, Mar. 2001, pp. 176–179, IEEE Computer Society Press.

[5] "The XFIG drawing tool, Version 3.2.3d," Available at http://www.xfig.org/, 2001.

[6] James Rumbaugh, Ivar Jacobson, and Grady Booch, The Unified Modeling Language Reference Manual, Addison-Wesley, 1999.

[7] Norman Wilde and Michael C. Scully, "Software Reconnaissance: Mapping Program Features to Code," Journal of Software Maintenance: Research and Practice, vol. 7, pp. 49–62, Jan. 1995.

[8] Susan Horwitz, Thomas Reps, and David Binkley, "Interprocedural Slicing Using Dependence Graphs," ACM Transactions on Programming Languages and Systems, vol. 12, no. 1, pp. 26–60, Jan. 1990.

[9] Árpád Beszédes, Tamás Gergely, Zsolt Mihály Szabó, János Csirik, and Tibor Gyimóthy, "Dynamic slicing method for maintenance of large C programs," in Proceedings of the 5th European Conference on Software Maintenance and Reengineering, Mar. 2001, pp. 105–113, IEEE Computer Society Press.

[10] Lars Ole Andersen, Program Analysis and Specialization for the C Programming Language, Ph.D. thesis, DIKU, University of Copenhagen, Denmark, 1994.

[11] Giuliano Antoniol, F. Calzolari, and Paolo Tonella, "Impact of Function Pointers on the Call Graph," in Proceedings of the European Conference on Software Maintenance and Reengineering, Amsterdam, Netherlands, Mar. 1999, pp. 51–59.

[12] Ben-Chung Cheng and Wen-Mei W. Hwu, "Modular interprocedural pointer analysis using access paths," in Proceedings of the Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, 2000, pp. 57–69.

[13] Manuvir Das, "Unification-based Pointer Analysis with Directional Assignments," in Proceedings of the Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, 2000, pp. 35–46.

[14] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren, "Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers," in Proceedings of the Conference on Programming Language Design and Implementation, Orlando, FL, USA, 1994, pp. 242–257.

[15] Robert P. Wilson and Monica S. Lam, "Efficient context-sensitive pointer analysis for C programs," in Proceedings of the Conference on Programming Language Design and Implementation, La Jolla, CA, USA, 1995, pp. 1–12.

[16] Sean Zhang, Barbara G. Ryder, and William Landi, "Program decomposition for pointer aliasing: A step towards practical analyses," in Symposium on the Foundations of Software Engineering, 1996, pp. 81–92.

[17] Bjarne Steensgaard, "Points-To Analysis in Almost Linear Time," in Symposium on Principles of Programming Languages, St. Petersburg Beach, FL, USA, Jan. 1996, pp. 32–41.

[18] Amer Diwan, Kathryn McKinley, and Eliot Moss, "Using types to analyze and optimize object-oriented programs," ACM Transactions on Programming Languages and Systems, vol. 23, no. 1, pp. 30–72, 2001.

[19] Atanas Rountev, Ana Milanova, and Barbara G. Ryder, "Points-To Analysis for Java using Annotated Constraints," in Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, Tampa, FL, USA, Oct. 2001, pp. 43–55.

[20] Ana Milanova, Atanas Rountev, and Barbara G. Ryder, "Precise Call Graph Construction in the Presence of Function Pointers," in Proceedings of the 2nd International Workshop on Source Code Analysis and Manipulation, Montreal, Canada, Oct. 2002, pp. 155–162, IEEE Computer Society Press.

[21] Garrett Birkhoff, Lattice Theory, American Mathematical Society Colloquium Publications 25, Providence, RI, USA, first edition, 1940.

[22] Bernhard Ganter and Rudolf Wille, Formal Concept Analysis—Mathematical Foundations, Springer, 1999.


[23] "IDEF0," Available at http://www.idef.com/idef0.html, Dec. 1993.

[24] Kai Koskimies and Hanspeter Mössenböck, "Scenario-Based Browsing of Object-Oriented Systems with Scene," Report 4, Johannes Kepler Universität Linz, Austria, Aug. 1995.

[25] Danny B. Lange and Yuichi Nakamura, "Program Explorer: A Program Visualizer for C++," in Proceedings of the USENIX Conference on Object-Oriented Technologies, Monterey, CA, USA, June 1995, USENIX Association.

[26] Danny B. Lange and Yuichi Nakamura, "Object-Oriented Program Tracing and Visualization," Computer, vol. 30, no. 5, pp. 63–70, May 1997.

[27] Norman Wilde, Juan A. Gomez, Thomas Gust, and Douglas Strasburg, "Locating User Functionality in Old Code," in Proceedings of the International Conference on Software Maintenance, Orlando, FL, USA, Nov. 1992, pp. 200–205, IEEE Computer Society Press.

[28] W. Eric Wong, Swapna S. Gokhale, Joseph R. Horgan, and Kishor S. Trivedi, "Locating Program Features using Execution Slices," in Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering & Technology, Richardson, TX, USA, Mar. 1999, pp. 194–203, IEEE Computer Society Press.

[29] W. Eric Wong, Swapna S. Gokhale, and Joseph R. Horgan, "Quantifying the Closeness between Program Components and Features," The Journal of Systems and Software, vol. 54, no. 2, pp. 87–98, Oct. 2000.

[30] Kunrong Chen and Václav Rajlich, "Case Study of Feature Location Using Dependence Graph," in Proceedings of the 8th International Workshop on Program Comprehension, Limerick, Ireland, June 2000, pp. 241–249, IEEE Computer Society Press.

[31] Norman Wilde, Michelle Buckellew, Henry Page, and Václav Rajlich, "A Case Study of Feature Location in Unstructured Legacy Fortran Code," in Proceedings of the 5th European Conference on Software Maintenance and Reengineering, Lisbon, Portugal, Mar. 2001, pp. 68–75, IEEE Computer Society Press.

[32] Gregor Snelting and Frank Tip, "Reengineering Class Hierarchies using Concept Analysis," in Proceedings of the 6th SIGSOFT Symposium on Foundations of Software Engineering, Orlando, FL, USA, Nov. 1998, pp. 99–110, ACM Press.

[33] Maren Krone and Gregor Snelting, "On the Inference of Configuration Structures from Source Code," in Proceedings of the 16th International Conference on Software Engineering, Sorrento, Italy, May 1994, pp. 49–58, IEEE Computer Society Press.

[34] Gregor Snelting, "Reengineering of Configurations Based on Mathematical Concept Analysis," ACM Transactions on Software Engineering and Methodology, vol. 5, no. 2, pp. 146–189, Apr. 1996.

[35] Tobias Kuipers and Leon Moonen, "Types and Concept Analysis for Legacy Systems," in Proceedings of the 8th International Workshop on Program Comprehension, June 2000, pp. 221–230, IEEE Computer Society Press.

[36] Gerardo Canfora, Aniello Cimitile, Andrea De Lucia, and Giuseppe A. Di Lucca, "A Case Study of Applying an Eclectic Approach to Identify Objects in Code," in Proceedings of the 7th International Workshop on Program Comprehension, Pittsburgh, PA, USA, May 1999, pp. 136–143, IEEE Computer Society Press.

[37] Holger Graudejus, "Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code," Diplomarbeit, Universität Kaiserslautern, Germany, 1998.

[38] Christian Lindig and Gregor Snelting, "Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis," in Proceedings of the 19th International Conference on Software Engineering, Boston, MA, USA, May 1997, pp. 349–359, IEEE Computer Society Press and ACM Press.

[39] Houari Sahraoui, Walcelio Melo, Hakim Lounis, and François Dumont, "Applying Concept Formation Methods to Object Identification in Procedural Code," in Proceedings of the International Conference on Automated Software Engineering, Lake Tahoe, CA, USA, Nov. 1997, pp. 210–218, IEEE Computer Society Press.

[40] Michael Siff and Thomas Reps, "Identifying Modules via Concept Analysis," in Proceedings of the International Conference on Software Maintenance, Bari, Italy, Oct. 1997, pp. 170–179, IEEE Computer Society Press.

[41] Arie van Deursen and Tobias Kuipers, "Identifying Objects using Cluster and Concept Analysis," in Proceedings of the 21st International Conference on Software Engineering, Los Angeles, CA, USA, 1999, pp. 246–255, IEEE Computer Society Press.

[42] Paolo Tonella, "Concept Analysis for Module Restructuring," IEEE Transactions on Software Engineering, vol. 27, no. 4, pp. 351–363, Apr. 2001.

[43] Thomas Ball, "The Concept of Dynamic Analysis," in Proceedings of the 7th European Software Engineering Conference held jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Toulouse, France, Sept. 1999, vol. LNCS 1687, pp. 216–234, Springer.

[44] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon, "Aiding Program Comprehension by Static and Dynamic Feature Analysis," in Proceedings of the International Conference on Software Maintenance, Florence, Italy, Nov. 2001, pp. 602–611, IEEE Computer Society Press.

[45] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon, "Incremental Location of Combined Features for Large-Scale Programs," in Proceedings of the International Conference on Software Maintenance, Montreal, Canada, Oct. 2002, pp. 273–282, IEEE Computer Society Press.

[46] "The New Bauhaus Stuttgart," Available at http://www.bauhaus-stuttgart.de/, 2002.

Thomas Eisenbarth received his Diploma in computer science from the University of Stuttgart, Germany, in 1998. Since then, he has been working on his dissertation at the University of Stuttgart in the field of reverse engineering as a member of the Bauhaus [46] project. His research interests are in reengineering, reverse engineering, program understanding, and software architecture. He focuses on methods for recovering connectors from source code.

Rainer Koschke is a post-doctoral researcher at the computer science department of the University of Stuttgart. His research interests are primarily in the fields of software engineering and program analyses. His current research includes architecture recovery, feature location, program analyses, and reverse engineering. He teaches reengineering, compilers, and programming language concepts. He holds a doctoral degree in computer science from the University of Stuttgart, Germany.

Daniel Simon received his Diploma in computer science from Saarland University at Saarbrücken, Germany, in 2000. Since then, he has been working on his dissertation at the University of Stuttgart in the field of reverse engineering as a member of the Bauhaus [46] project. His research interests are in the fields of reverse engineering, program analysis, and program understanding. He co-authored several papers on feature location and software product lines, which is his current research focus.

So we submitted a paper with 20 pages. The reviewers asked us to add more detail, and the version that was finally accepted had 25 pages. When we submitted the camera-ready version, the production people stepped in: they told us that we had only 12 pages. Meanwhile, the editor-in-chief had been replaced and the rules had changed. The paper consisted of mainly two parts: the theoretical part describing the method and the evaluation with case studies. We had to cut the paper and published only the theoretical part. Today, the day has come to present the case study we conducted for TSE. After 9 years, finally! Sometimes you get a second chance in life.

Page 16: ICSM'01 Most Influential Paper - Rainer Koschke

The problem we were trying to solve in the paper can be explained very simply with an example.

• This is a screenshot of the drawing tool XFig. It allows you to draw graphical objects such as circles, rectangles, and text.

• Suppose you were a developer assigned to extend XFig. For instance, your task is to add triangles.

• As you know, XFig has been developed by someone else, not you.

• Likely, you would first like to understand how it works for drawing the existing objects.

• The very first problem is to locate the code that implements these features.

Page 17: ICSM'01 Most Influential Paper - Rainer Koschke

Here is the call graph of XFig. Now, where would you start?

Page 18: ICSM'01 Most Influential Paper - Rainer Koschke

Where does this program do X? — Norman Wilde, 1994

This problem is known as feature location. Another term used is concept location. Feature location answers the question "Where does this program do X?", as Norman Wilde phrased it back in 1992. Norman Wilde is a pioneer in feature location. He received the most-influential paper award for ICSM 1992 for his work on feature location. This year's best-paper award went to a feature location paper, too. There seems to be a tradition of most-influential papers related to feature location at ICSM.

Page 19: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. Invoking the executable with the feature → trace → profiler → invoking input set I; invoking it without the feature → trace → profiler → excluding input set E. The difference I−E yields the starting set for static analysis. — Wilde et al. 1992]

His technique works as follows. Because the technique is based on dynamic information, you need to compile your program first.

Page 20: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. Invoking the executable with the feature → trace → profiler → invoking input set I; invoking it without the feature → trace → profiler → excluding input set E. The difference I−E yields the starting set for static analysis. — Wilde et al. 1992]

Then, you run the program invoking the relevant feature X and record every piece of code that was executed. All that code is relevant for the feature. But it may also be executed when other features are executed; thus it may contain code not really specific to the feature of interest, for instance, the main function.

Page 21: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. Invoking the executable with the feature → trace → profiler → invoking input set I; invoking it without the feature → trace → profiler → excluding input set E. The difference I−E yields the starting set for static analysis. — Wilde et al. 1992]

For this reason, the program is executed once more, this time without invoking the feature of interest. This gives you all code that is executed when the feature is not used.

Page 22: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. Invoking the executable with the feature → trace → profiler → invoking input set I; invoking it without the feature → trace → profiler → excluding input set E. The difference I−E yields the starting set for static analysis. — Wilde et al. 1992]

Now we have two sets: the code executed for the feature of interest, and the code that is executed even though the feature was not invoked. We can determine the difference between these two sets, which gives us the code that is more specific to the feature of interest.

Page 23: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: dynamic call graph of the run with the feature; routines main, draw, draw arc, set centc, set cente, load.]

Here is a simple example. Let this be the dynamic call graph of XFIG when the feature was executed.

Page 24: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: the dynamic call graphs with and without the feature, overlapping; routines main, draw, draw arc, set centc, set cente, load.]

Then we execute the program once more, this time without the feature. We obtain this other red call graph, which overlaps with the first one.

Page 25: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: difference of the two dynamic call graphs; only set centc remains as specific to the feature.]

We compute the difference between them and detect the routine set centc as the routine that was executed only for the feature of interest.

Problems of dynamic analysis:

• Results depend upon the input and are, thus, incomplete.

• Set difference is binary: an element is either in the set or not.

– Some of the code in the excluding input set may still be somewhat relevant to the feature.

Page 26: ICSM'01 Most Influential Paper - Rainer Koschke

An alternative approach was proposed by Václav Rajlich, another pioneer in concept location.

Page 27: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: extractor → call graph → computer-aided traversal. Static call graph with routines main, draw, draw arc, set centc, set cente, set ru ll, set text, load, save, move.]

Václav proposed a static technique. Here, the idea is to extract a static dependency graph. The user browses the call graph, and a tool supports the navigation, similar to a web browser.

Problems:

• Where to start?

• Where to continue?

• When to stop?

• Static analysis is difficult.

Page 28: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. For each feature f1, f2, ..., fn: invoke feature → trace → profiler → routines(fi). All profiles together form the invocation table → concept analysis → concept lattice. In parallel: extractor → call graph → traversal.]

Our technique combines these two ideas and generalizes from one feature of interest to multiple features. First, we run a dynamic analysis similar to Norman Wilde's idea.

Page 29: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. For each feature f1, f2, ..., fn: invoke feature → trace → profiler → routines(fi). All profiles together form the invocation table → concept analysis → concept lattice. In parallel: extractor → call graph → traversal.]

We are interested in many features, not only one. We want to understand, for instance, what the difference is between drawing circles, rectangles, and text. For this reason, we execute the program more than once, at least once for each feature of interest. This gives us an invocation table. Each column in that table contains the code that was executed. Since we have many such columns, a simple set difference no longer suffices. Instead, we use formal concept analysis, which I will describe shortly.
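As an illustration, here is a minimal Python sketch of assembling such an invocation table from per-feature routine sets (the profiles below anticipate the XFig example of the next slides; the encoding is our assumption):

# Hypothetical profiles: feature scenario -> set of routines it executed.
profiles = {
    "drawEllipsis":  {"main", "draw", "draw arc", "set cente"},
    "drawCircle":    {"main", "draw", "draw arc", "set centc"},
    "drawRectangle": {"main", "draw", "set ru ll"},
    "drawText":      {"main", "draw", "set text"},
}

# Invocation table as a relation R ⊆ O × A: (routine, feature) pairs.
R = {(o, a) for a, units in profiles.items() for o in units}

# Print one row per routine, one column per feature.
for o in sorted(set.union(*profiles.values())):
    row = ["x" if (o, a) in R else "." for a in profiles]
    print(f"{o:10s} " + " ".join(row))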

Page 30: ICSM'01 Most Influential Paper - Rainer Koschke

[Diagram: source code → compiler → executable. For each feature f1, f2, ..., fn: invoke feature → trace → profiler → routines(fi). All profiles together form the invocation table → concept analysis → concept lattice. In parallel: extractor → call graph → traversal.]

The information we obtain from formal concept analysis is then used to help navigate the static call graph. It tells us where to start, where to continue, and where to stop. I will describe all these steps with an example.

Page 31: ICSM'01 Most Influential Paper - Rainer Koschke

Scenarios: draw Ellipsis, draw Circle, draw Rectangle, draw Text

In our example of XFig, we are interested in its capabilities of drawing different graphical objects. For each such object, we prepare one usage scenario or test case. Each tries to execute only one feature of interest and as few other features as possible. For instance, we prepare four test cases or usage scenarios:

• draw an ellipsis

• draw a circle

• draw a rectangle

• draw a text

Page 32: ICSM'01 Most Influential Paper - Rainer Koschke

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

Here is the result of the dynamic analysis: the invocation table. Each column describes the set of routines executed for the respective feature. In Norman Wilde's approach, we would have two columns; here we have many. Consequently, a simple binary set difference is no longer possible. Instead, we use formal concept analysis.

Page 33: ICSM'01 Most Influential Paper - Rainer Koschke

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

Formal concept analysis is a mathematical technique to analyze binary relations. An invocation table is such a binary relation; of course, formal concept analysis can analyze arbitrary binary relations. It is based on:

• a set of objects → routines

• a set of attributes → feature scenarios or test cases

• a binary relation between these objects and attributes, describing which object possesses which attributes → invocation table


Page 36: ICSM'01 Most Influential Paper - Rainer Koschke

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

common attributes for O ⊆ O

σ(O) := {a ∈ A | (o, a) ∈ R ∀o ∈ O}

Given the relation, you can define a function that yields the set of common attributes for a given set of objects. For instance, the common attributes of main, draw, and draw arc are drawEllipsis and drawCircle. You can spot that in the table by the completely filled rectangle.


Page 38: ICSM'01 Most Influential Paper - Rainer Koschke

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

common objects for A ⊆ A

τ(A) := {o ∈ O | (o, a) ∈ R ∀a ∈ A}

Analogously, you can define a function that yields all objects that have a given set of attributes. In this example, the common objects of drawEllipsis and drawCircle are main, draw, and draw arc.
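A small, self-contained Python sketch of these two derivation functions over the XFig relation (the names sigma and tau mirror the slides; the encoding as a set of pairs is our assumption):

# The relation R ⊆ O × A from the invocation table, as (object, attribute) pairs.
R = {("main", "drawEllipsis"), ("main", "drawCircle"),
     ("main", "drawRectangle"), ("main", "drawText"),
     ("draw", "drawEllipsis"), ("draw", "drawCircle"),
     ("draw", "drawRectangle"), ("draw", "drawText"),
     ("draw arc", "drawEllipsis"), ("draw arc", "drawCircle"),
     ("set centc", "drawCircle"), ("set cente", "drawEllipsis"),
     ("set ru ll", "drawRectangle"), ("set text", "drawText")}

O = {o for o, _ in R}  # objects: routines
A = {a for _, a in R}  # attributes: feature scenarios

def sigma(objs):
    """Common attributes of a set of objects."""
    return {a for a in A if all((o, a) in R for o in objs)}

def tau(attrs):
    """Common objects of a set of attributes."""
    return {o for o in O if all((o, a) in R for a in attrs)}

assert sigma({"main", "draw", "draw arc"}) == {"drawEllipsis", "drawCircle"}
assert tau({"drawEllipsis", "drawCircle"}) == {"main", "draw", "draw arc"}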


Page 40: ICSM'01 Most Influential Paper - Rainer Koschke

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

formal concept c = (O,A)

A = σ(O) ∧ O = τ(A)

Given these two functions, you can define a formal concept. It is defined as a pair of objects and attributes where all objects have all these attributes and vice versa. For example, main, draw, and draw arc together with drawEllipsis and drawCircle form a formal concept.
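A self-contained sketch of this concept test (the dictionary encoding of the table is our assumption):

# Invocation table: object -> set of attributes.
attrs_of = {
    "main": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw arc": {"drawEllipsis", "drawCircle"},
    "set centc": {"drawCircle"}, "set cente": {"drawEllipsis"},
    "set ru ll": {"drawRectangle"}, "set text": {"drawText"},
}

def sigma(objs):  # common attributes of a set of objects
    return set.intersection(*(attrs_of[o] for o in objs))

def tau(attrs):   # common objects of a set of attributes
    return {o for o, oa in attrs_of.items() if attrs <= oa}

def is_concept(objs, attrs):
    """(objs, attrs) is a formal concept iff attrs = sigma(objs) and objs = tau(attrs)."""
    return sigma(objs) == attrs and tau(attrs) == objs

assert is_concept({"main", "draw", "draw arc"}, {"drawEllipsis", "drawCircle"})
assert not is_concept({"main", "draw"}, {"drawEllipsis", "drawCircle"})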


Page 42: ICSM'01 Most Influential Paper - Rainer Koschke

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

c1 = ({main, draw}, {drawEllipsis, drawCircle, drawText, drawRectangle})
c2 = ({draw arc, main, draw}, {drawEllipsis, drawCircle})
c3 = ({set cente, draw arc, main, draw}, {drawEllipsis})
c4 = ({set centc, draw arc, main, draw}, {drawCircle})
c5 = ({set text, main, draw}, {drawText})
c6 = ({set ru ll, main, draw}, {drawRectangle})
c7 = ({set ru ll, set text, set centc, set cente, draw arc, main, draw}, ∅)

Intuitively, you are searching for maximally large filled rectangles in this table, where you may permute rows and columns.

Page 43: ICSM'01 Most Influential Paper - Rainer Koschke

            drawEllipsis   drawCircle   drawRectangle   drawText
main             ×              ×             ×             ×
draw             ×              ×             ×             ×
draw arc         ×              ×
set centc                       ×
set cente        ×
set ru ll                                     ×
set text                                                    ×

c1 = ({main, draw}, {drawEllipsis, drawCircle, drawText, drawRectangle})
c2 = ({draw arc, main, draw}, {drawEllipsis, drawCircle})
c3 = ({set cente, draw arc, main, draw}, {drawEllipsis})
c4 = ({set centc, draw arc, main, draw}, {drawCircle})
c5 = ({set text, main, draw}, {drawText})
c6 = ({set ru ll, main, draw}, {drawRectangle})
c7 = ({set ru ll, set text, set centc, set cente, draw arc, main, draw}, ∅)

The set of all concepts in this table is listed here. Now, let us pick two of these concepts and look closer.
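For a context this small, all concepts can be enumerated naively; a self-contained sketch (same table encoding as above, helper names our own):

from itertools import combinations

attrs_of = {
    "main": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw arc": {"drawEllipsis", "drawCircle"},
    "set centc": {"drawCircle"}, "set cente": {"drawEllipsis"},
    "set ru ll": {"drawRectangle"}, "set text": {"drawText"},
}
A = {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"}

def sigma(objs):  # common attributes (all of A for the empty object set)
    return set.intersection(*(attrs_of[o] for o in objs)) if objs else set(A)

def tau(attrs):   # common objects
    return {o for o, oa in attrs_of.items() if attrs <= oa}

# Every concept has the form (tau(A'), sigma(tau(A'))) for some attribute set A'.
concepts = set()
for r in range(len(A) + 1):
    for attrs in combinations(sorted(A), r):
        objs = tau(set(attrs))
        concepts.add((frozenset(objs), frozenset(sigma(objs))))

print(len(concepts))  # 7, matching c1..c7 above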

Page 44: ICSM'01 Most Influential Paper - Rainer Koschke

({draw arc, main, draw}, {drawEllipsis, drawCircle})
({set centc, draw arc, main, draw}, {drawCircle})

Let c1 = (O1, A1) and c2 = (O2, A2) be concepts;

c1 ≤ c2 :⇔ O1 ⊆ O2
or dually
c1 ≤ c2 :⇔ A2 ⊆ A1

c2 is a superconcept of c1
c1 is a subconcept of c2

⇒ lattice

For instance, we pick these two concepts. We see that the objects of the first one are a subset of the objects of the second one. Likewise, the attributes of the second one are a subset of the attributes of the first one. The second one has fewer attributes; consequently, there are more objects having these attributes. If you think of a concept as a class in an object-oriented programming language, this observation would be expressed as a superclass/subclass relation. The first concept has all attributes of the second one plus additional attributes.

Page 45: ICSM'01 Most Influential Paper - Rainer Koschke

({draw arc, main, draw}, {drawEllipsis, drawCircle})
({set centc, draw arc, main, draw}, {drawCircle})

Let c1 = (O1, A1) and c2 = (O2, A2) be concepts;

c1 ≤ c2 :⇔ O1 ⊆ O2
or dually
c1 ≤ c2 :⇔ A2 ⊆ A1

c2 is a superconcept of c1
c1 is a subconcept of c2

⇒ lattice

This allows us to define an ordering between concepts. This ordering is analogous to subclassing. A concept c1 is smaller than a concept c2 if all objects of c1 are contained in c2 or, dually, if all attributes of c2 are in c1.

Page 46: ICSM'01 Most Influential Paper - Rainer Koschke

({draw arc, main, draw}, {drawEllipsis, drawCircle})
({set centc, draw arc, main, draw}, {drawCircle})

Let c1 = (O1, A1) and c2 = (O2, A2) be concepts;

c1 ≤ c2 :⇔ O1 ⊆ O2
or dually
c1 ≤ c2 :⇔ A2 ⊆ A1

c2 is a superconcept of c1
c1 is a subconcept of c2

⇒ lattice

In that case, c2 is called a superconcept of c1, and c1 is a subconcept of c2.

Page 47: ICSM'01 Most Influential Paper - Rainer Koschke

({draw arc, main, draw}, {drawEllipsis, drawCircle})
({set centc, draw arc, main, draw}, {drawCircle})

Let c1 = (O1, A1) and c2 = (O2, A2) be concepts;

c1 ≤ c2 :⇔ O1 ⊆ O2
or dually
c1 ≤ c2 :⇔ A2 ⊆ A1

c2 is a superconcept of c1
c1 is a subconcept of c2

⇒ lattice

This partial order forms a lattice, called the concept lattice. Lattices can be visualized with Hasse diagrams.

Page 48: ICSM'01 Most Influential Paper - Rainer Koschke

Hasse diagram

[Figure: Hasse diagram of the example lattice with concepts 0–6; attributes (features) drawEllipsis, drawCircle, drawRectangle, drawText; objects (routines) main, draw, draw arc, set centc, set cente, set ru ll, set text.]

Here, we see the Hasse diagram of our example. The nodes are the concepts; the edges represent the partial order and are, by convention, directed from bottom to top. That is, superconcepts are at the top, subconcepts below. We see the attributes in blue; in our case, these are our features of interest. We see the objects, which are the routines executed for these features. There are two special concepts, namely, the top and the bottom element. The top element consists of all objects and their common attributes. The bottom element consists of all attributes and the objects that possess all these attributes. By the definition of the ordering of concepts, every superconcept contains all objects of its subconcepts, and every subconcept has all attributes of its superconcepts. That is, there is a lot of redundancy in this Hasse diagram.

Page 49: ICSM'01 Most Influential Paper - Rainer Koschke

Sparse Hasse diagram

[Figure: sparse Hasse diagram of the same lattice; each routine and each feature is annotated only at the concept where it is introduced.]

The sparse Hasse diagram avoids this redundancy. Each object and each attribute is listed only once, at the concept where it first appears in the lattice. By the definition of the ordering of concepts, we can infer where else it belongs in the lattice.

The sparse representation is much more readable.
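A self-contained sketch of where an object or attribute is introduced, using the standard FCA notions of object concept (gamma) and attribute concept (mu); the single-letter attribute codes are our abbreviation:

attrs_of = {
    "main": {"E", "C", "R", "T"}, "draw": {"E", "C", "R", "T"},
    "draw arc": {"E", "C"}, "set centc": {"C"}, "set cente": {"E"},
    "set ru ll": {"R"}, "set text": {"T"},
}  # E/C/R/T abbreviate drawEllipsis/drawCircle/drawRectangle/drawText

def sigma(objs):
    return set.intersection(*(attrs_of[o] for o in objs))

def tau(attrs):
    return {o for o, oa in attrs_of.items() if attrs <= oa}

def gamma(o):  # the concept labeled with object o in the sparse diagram
    a = sigma({o})
    return (frozenset(tau(a)), frozenset(a))

def mu(a):     # the concept labeled with attribute a
    o = tau({a})
    return (frozenset(o), frozenset(sigma(o)))

# set centc and drawCircle are introduced at the same concept:
assert gamma("set centc") == mu("C")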

Page 50: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: the sparse Hasse diagram shown side by side with the static call graph of XFig.]

We can use the sparse Hasse diagram in combination with the static call graph as follows. If we want to know the specific routines for a given feature, we simply look for that feature in the lattice. Let us assume we are interested in the feature drawCircle. If the concept at which this feature occurs has no other feature, all routines listed at this concept are specific to this feature. In our example, we would start browsing the call graph at routine set centc. This information could as well have been obtained by simple set difference operations. But the lattice provides more information: if we look at the subconcept of concept 3, we find a concept annotated with routine draw arc. This routine also contributes to feature drawEllipsis because 3 is also a subconcept of concept 2. Thus, draw arc serves two features. It is also required for feature drawEllipsis, but it is less specific than set centc. Yet, it is more specific than main and draw, which are listed at transitive subconcepts of concept 3. While set difference is binary, the lattice gives a finer ranking of feature specificity. That is, you would continue your navigation of the static call graph at draw arc, and then look also at main and draw.

Page 51: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: the sparse Hasse diagram shown side by side with the static call graph of XFig.]

In later work, we extended this approach to handle cases in which there is no one-to-one mapping between features and scenarios. Furthermore, we used concept analysis incrementally, so that you can start with a small set of features and then extend it to a larger set without losing your previous knowledge.

Page 52: ICSM'01 Most Influential Paper - Rainer Koschke

Now we come to one case study that was not published in our TSE paper. We tried this technique in an industrial case study on this machine here. This machine is a chip tester. It is used by chip manufacturers to check whether a chip works correctly before it is shipped. A robot puts the chip into the machine, and the machine runs various tests that can be programmed by a test engineer.

Page 53: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: layered architecture of the chip tester — applications on top of the firmware; within the firmware, a YACC parser turns commands into calls to executors, which share a layer of utility functions; commands and responses are exchanged via semaphores, message queues, and shared memory; control and data flow connect the firmware to the hardware]

The software architecture of the firmware of this chip tester is sketched here. The firmware provides the basic operations used by various applications, that is, by tools to implement, configure, run, analyze, and visualize tests. Because these applications run in parallel, the first layer of the architecture provides synchronization and means to exchange input and output. There is a programming language for writing these tests; the input to the firmware consists of such programs, which are first parsed and then executed. The firmware is written in C, and there is exactly one C function that executes an operation of this programming language. These functions are called executors. The executors use shared utility functions to execute the operations. This architecture looks very tidy and structured. The truth is, however, that 90% of the code is hidden in the box labeled utility functions. Nobody had a clear picture of which executors shared which utility functions.

Page 54: ICSM'01 Most Influential Paper - Rainer Koschke

Here is the static call graph of the firmware. It consists of roughly 10,000 routines and is very complex.

Page 55: ICSM'01 Most Influential Paper - Rainer Koschke

Configuration Setup
  CNTR, CNTR?, CONF, CONF?, UDEF, UDPS, UDGP,
  DPFN, DFPN?, DFPS, DFPS?, DFGP, DFGP?, DFGE, DFGE?,
  PALS, PALS?, PSTE, PSTE?, PSFC, PSFC?, PQFC, PQFC?,
  PACT, PACT?

Relay Control (Test Execution)
  RLYC, RLYC?

Level Setup Commands
  LSUS, LSUS?, DRLV, DRLV?, RCLV, RCLV?, TERM, TERM?

Timing Setup Commands
  PCLK, PCLK?, DCDF, DCDF?, WFDF, WFDF?, WAVE, WAVE?,
  ETIM, ETIM?, BWDF, BWDF?

Vector Setup Commands
  SQLA, SQLB, SQLB?, SQPG, SQPG?, SPRM, SPRM?, SQSL, SQSL?

Misc.
  FTST, VBMP, PSLV, CLMP, WSDM, DCDT, CLKR, VECC,
  SDSC, SREC, DMAS, STML

We analyzed 76 different operations of this programming language. Related operations can be grouped into categories; for instance, there are operations for configuration setup, relay control, and many more. Since the operations of one category are semantically related, we would assume that they also share a lot of utility functions. That was one of the hypotheses we investigated.
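One way to check this hypothesis, sketched here with entirely hypothetical traces and utility-function names, is to compare how many utility functions commands share within a category versus across categories:

    from itertools import combinations

    # Hypothetical executed utility functions per command.
    traces = {
        "CNTR": {"u1", "u2", "u3"},   # Configuration Setup
        "CONF": {"u1", "u2", "u4"},   # Configuration Setup
        "PCLK": {"u5", "u6"},         # Timing Setup
        "WAVE": {"u5", "u7"},         # Timing Setup
    }
    category = {"CNTR": "config", "CONF": "config",
                "PCLK": "timing", "WAVE": "timing"}

    within, across = [], []
    for a, b in combinations(traces, 2):
        shared = len(traces[a] & traces[b])
        (within if category[a] == category[b] else across).append(shared)

    print("average shared within a category: ", sum(within) / len(within))
    print("average shared across categories:", sum(across) / len(across))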

Page 56: ICSM'01 Most Influential Paper - Rainer Koschke

real        76 scenarios for relevant commands
             1 scenario for the NOP command
additional   2 additional parameter combinations
factoring    1 start-end scenario
            13 scenarios for preparing steps
total       93 scenarios

To locate these 76 features, we provided one test case for each. To factor out all C functions that are executed for every command, we added one test case containing only the NOP command, which does nothing at all. Because some commands allow variant parameters, we added additional test cases to cover these, too. In order to factor out code for startup and shutdown, we added one test case in which the firmware was started and immediately shut down again. Because some of the commands had certain preconditions, we added additional test cases to fulfill these preconditions. In total, we had 93 test cases.
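The factoring amounts to simple set operations on the recorded traces. A minimal sketch with hypothetical routine names: whatever the NOP and start-end scenarios execute is infrastructure and can be subtracted from each command's trace.

    # Hypothetical traces: routines executed per scenario.
    traces = {
        "NOP":       {"main", "parse", "dispatch"},
        "start-end": {"main", "init", "shutdown"},
        "RLYC":      {"main", "parse", "dispatch", "relay_on", "relay_check"},
        "CONF":      {"main", "parse", "dispatch", "conf_read", "conf_apply"},
    }

    infrastructure = traces["NOP"] | traces["start-end"]
    for command in ("RLYC", "CONF"):
        candidates = traces[command] - infrastructure
        print(command, "->", sorted(candidates))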

Page 57: ICSM'01 Most Influential Paper - Rainer Koschke

Here is the resulting concept lattice for our study. The height of each concept is proportional to the number of routines it contains, except for the bottom element. In the first layer of the lattice, you find the code of the executors and all functions that were executed only by these executors. Below that layer, you can see which utility functions are shared by which executors. Furthermore, we could confirm our hypothesis that executors of the same category have more utility functions in common. To validate our findings, we asked one developer of this firmware whether this lattice made any sense to him. It did, and he learned new things he did not know before.

Page 58: ICSM'01 Most Influential Paper - Rainer Koschke

Study of SDCC / GCC (cc1)

Features of interest:

Loops: do-while, while, for, if-goto

Mathematical expressions: +, -, *, /, int literals

Optimization options

In another case study, published at ASE, we evaluated our technique on two C compilers, namely SDCC and cc1, the C compiler of GCC. The motivation of this study was to evaluate whether a finer-grained dynamic analysis is feasible and pays off, and whether the technique scales to very large feature sets. The features of interest were different loop constructs and mathematical expressions in C. In addition, we looked at different compiler optimization options.

Page 59: ICSM'01 Most Influential Paper - Rainer Koschke

Granularity: Routines vs. Statements

void handle( ... )
{
    switch ( ... )
    {
        case DO:    ...
        case WHILE: ...
        case FOR:   ...
        ...
    }
}

In the earlier study, we traced routines. In this compiler study, we wanted to try the statement level.

Page 60: ICSM'01 Most Influential Paper - Rainer Koschke

For instance, constructs such as this are to be expected in compilers written in a procedural language. The routine handle() would be called for all loop constructs, but not all of its code is executed for each loop construct. If we trace at the level of basic blocks, we are able to find the code within handle() that is specific to handling the DO loop in C, for instance.
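A small sketch of what the refinement buys, with hypothetical block identifiers (B0 for the switch header, B1 and B2 for the DO and WHILE cases): at the routine level nothing is specific to a single loop feature, while at the basic-block level each case maps to its own feature.

    routine_level = {"do-while": {"handle"}, "while": {"handle"}}
    block_level = {
        "do-while": {"handle.B0", "handle.B1"},
        "while":    {"handle.B0", "handle.B2"},
    }

    def specific(relation, feature):
        """Code executed for `feature` but for no other feature."""
        others = set().union(*(v for f, v in relation.items() if f != feature))
        return relation[feature] - others

    print(specific(routine_level, "do-while"))  # set() -- nothing is specific
    print(specific(block_level, "do-while"))    # {'handle.B1'}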

Page 61: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: at the routine level, routine R, consisting of basic blocks B0, B1, and B2, is executed in test cases T1 and T2 for features F1, F2, and F3]

In terms of the concept lattice, tracing basic blocks is a refinement of tracing at the routine level.

Page 62: ICSM'01 Most Influential Paper - Rainer Koschke

[Figure: at the basic-block level, the single routine concept splits into concepts 1-3: B0 executed in T1 and T2 for F1; B0 and B1 in T1 for F1 and F2; B0 and B2 in T2 for F1 and F3]

That is, a concept in the lattice for routines may be split into several concepts in the lattice for basic blocks. Thus, you gain more detail. The additional level of detail does not come for free, however: the dynamic analysis becomes more expensive. Even worse, the lattice becomes bigger. In the worst case, lattices grow exponentially with the number of attributes and objects. So we were wondering whether tracing at the basic-block level is feasible at all.

Page 63: ICSM'01 Most Influential Paper - Rainer Koschke

                          sdcc      cc1
#routines                1,325   15,986
#routines executed         650    2,657
#basic blocks           46,699  379,086
#basic blocks executed  10,113   34,602

Here are some size numbers for the input to concept analysis when tracing at the level of routines or basic blocks. The analysis at the basic-block level was slower by a factor of between 50 and 200, but we could in fact compute the lattice in all cases. Furthermore, with the analysis at the basic-block level, we could find details that could not have been found at the routine level.

Page 64: ICSM'01 Most Influential Paper - Rainer Koschke

Scalability for Large Feature Sets

features: 100 test cases for 100 C language constructs,
          one/multiple backends,
          no/two compiler switches

sdcc: 1. one backend, no compiler switches → 80,000 concepts
      2. multiple backends, two compiler switches → 4.5 mio concepts

cc1:  one backend, no compiler switches
      → invocation table has 1.3 mio entries
      → lattice cannot be computed

While we had 76 different features in the earlier study, we wanted to see whether the technique still scales to even larger feature sets. If features can be combined freely, there is easily a combinatorial explosion of possible feature combinations.

Page 65: ICSM'01 Most Influential Paper - Rainer Koschke

Therefore, we looked at 100 different C language constructs in combination with different compiler backends and additional command-line options.

Page 66: ICSM'01 Most Influential Paper - Rainer Koschke

If we use only one backend of SDCC and no compiler switches, the lattice has about 80,000 concepts.

Page 67: ICSM'01 Most Influential Paper - Rainer Koschke

If we use multiple backends of SDCC and two compiler switches, the lattice has about 4.5 million concepts.

Page 68: ICSM'01 Most Influential Paper - Rainer Koschke

If we use even the simple configuration for cc1, the input to concept analysis is a table with 1.3 million entries set. At this size, we were not able to compute the lattice. So the approach does not scale to large sets of feature combinations. Instead, the lattice should be computed only on demand, for subsets of features.
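One way to realize this on-demand computation is to restrict the context to the features currently under investigation before any lattice is built. A sketch under that assumption; restrict() is a hypothetical helper, and relation has the shape of the dictionary in the earlier drawing example.

    def restrict(relation, features):
        """Keep only the chosen features, dropping objects that lose all attributes."""
        return {o: attrs & features
                for o, attrs in relation.items() if attrs & features}

    # With k features of interest, the restricted context has at most 2**k
    # concepts (here at most 4), however large the full feature set is.
    # sub = restrict(relation, {"drawCircle", "drawEllipsis"})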

Page 69: ICSM'01 Most Influential Paper - Rainer Koschke

Feature Location in Source Code:

A Taxonomy and Survey

Bogdan Dit, Meghan Revelle, Malcom Gethers, Denys Poshyvanyk

The College of William and Mary

Journal of Software Maintenance and Evolution, to appear

Now let's turn to the question of what others have done. If you are interested in this question, I recommend reading this upcoming paper by our hosts. They have written a very nice survey of papers on feature location that will soon appear in the Journal of Software Maintenance and Evolution. I know they are constantly renaming this journal, but I stick to this name.

Page 70: ICSM'01 Most Influential Paper - Rainer Koschke

Denys and his colleagues have reviewed 89 articles from 25 venues and classified them within a taxonomy. Here is the distribution of the papers across these venues. ICSM is second. The premier conference for feature location seems to be ICPC. However, the chances of getting an award for a feature location paper are higher at ICSM.

Page 71: ICSM'01 Most Influential Paper - Rainer Koschke

Dynamic Analyses

Static Analyses

Textual approaches

There have been several improvements on the dynamic analysis. Some researchers, for instance, take the frequency of execution into account. The intuition is that the more often code is executed, the more relevant it should be. The recording of traces has been improved as well: you can start and stop the recording while the program is executing, so that you observe only what happens right after you have triggered the feature of interest. In addition, textual approaches based on methods from information retrieval have emerged. Andrian Marcus is one pioneer in this field, and Denys Poshyvanyk has continued this work.
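As a toy illustration of the frequency idea, and not the published technique: score each routine by its execution count in the feature scenario, discounted by its count in a baseline scenario. All names and numbers here are hypothetical.

    # Hypothetical execution counts per routine.
    feature_run  = {"draw_arc": 120, "set_centc": 80, "main": 1, "log": 50}
    baseline_run = {"main": 1, "log": 49}

    relevance = {r: count / (1 + baseline_run.get(r, 0))
                 for r, count in feature_run.items()}
    for routine in sorted(relevance, key=relevance.get, reverse=True):
        print(routine, round(relevance[routine], 1))
    # draw_arc and set_centc rank high; log barely moves despite a high raw count.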

Page 72: ICSM'01 Most Influential Paper - Rainer Koschke

Open Issues

accepted evaluation procedures and benchmarks

tool adoption in industry

user studies

The authors have summarized several open issues in feature location. I am listing here those that I find most important. We have many competing feature location techniques, but we have no clear picture yet of when to use which. There was one experiment by Vaclav Rajlich and Norman Wilde in which they compared their static and dynamic approaches. But there is no comprehensive evaluation, nor are there accepted benchmarks. Luckily, Denys and colleagues have started to create some. There is no Eclipse plugin for feature location, other than maybe prototypes. The techniques we developed are not really used in the field. We have not yet found the right way to integrate such tools smoothly into the developer's toolkit. In order to do so, we must better understand how programmers do feature location. There are some initial observational studies; we need more of these, and we also need tool evaluations with real programmers.

Page 73: ICSM'01 Most Influential Paper - Rainer Koschke

“Feature location is irrelevant in industry.”

Senior Researcher, CSMR 2009, Kaiserslautern

Finally, let me conclude with a quote from a senior researcher, stated on a panel at CSMR 2009 in Kaiserslautern. He said that feature location is irrelevant in industry. I have never had the chance to ask him what he meant by this statement. I personally do sometimes need to locate features in my code. Regrettably, I am still mostly using grep.