47
Towards generic representation of web applications: solutions and trade-offs Damith C. Rajapakse and Stan Jarzabek Department of Computer Science School of Computing, National University of Singapore {damith,stan}@comp.nus.edu.sg SUMMARY Server pages (also called dynamic pages) render a generic web page into many similar ones. The technique is commonly used for implementing web application user interfaces. Yet our previous study found a high rate of repetitions (also called “clones”) in web applications, particularly in user interfaces. The finding raised the question as to why such repetitions had not been averted with the use of server pages. For an answer, we conducted an experiment using PHP server pages to explore how far server pages can be pushed to achieve generic web applications. Our initial findings suggested that generic representation obtained using server pages sometimes compromises certain important system qualities such as run-time performance. It may also complicate the use of WYSIWYG editors. We have analyzed the nature of these trade-offs, and now propose a mixed-strategy approach to obtain optimum generic representation of web applications without unnecessary compromise to critical system qualities and user experience. The mixed-strategy approach 1

Proceedings Template - WORD

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Proceedings Template - WORD

Towards generic representation of web applications: solutions and trade-offs

Damith C. Rajapakse and Stan Jarzabek

Department of Computer ScienceSchool of Computing, National University of Singapore {damith,stan}@comp.nus.edu.sg

SUMMARY

Server pages (also called dynamic pages) render a generic web page into many similar ones. The technique is commonly used for implementing web application user interfaces. Yet our previous study found a high rate of repetitions (also called “clones”) in web applications, particularly in user interfaces. The finding raised the question as to why such repetitions had not been averted with the use of server pages. For an answer, we conducted an experiment using PHP server pages to explore how far server pages can be pushed to achieve generic web applications. Our initial findings suggested that generic representation obtained using server pages sometimes compromises certain important system qualities such as run-time performance. It may also complicate the use of WYSIWYG editors. We have analyzed the nature of these trade-offs, and now propose a mixed-strategy approach to obtain optimum generic representation of web applications without unnecessary compromise to critical system qualities and user experience. The mixed-strategy approach applies a generative technique of XVCL to achieve genericity at the meta-level representation of a web application, leaving repetitions to the actual web application. Our experiments show that the mixed-strategy approach can achieve a good level of genericity without conflicting with other system qualities. Our findings should open the way for others to better-informed decisions regarding generic design solutions, which should in turn lead to simpler, more maintainable and more reusable web applications.

KEYWORDS: code duplication, clone unification, meta-programming, design patterns, Web applications, genericity

1

Page 2: Proceedings Template - WORD

1. INTRODUCTION

Server pages (also called dynamic pages) are commonly used to implement web application user interfaces (UIs). ASP, JSP, and PHP are typical examples of web technologies that use some form of server pages. A server page is a combination of HTML and programming language scripts, which the web server uses to generate web pages dynamically at run-time. Therefore, a server page can represent many similar web pages in a generic form.

Our previous study [23] revealed a high level of repetitions in the domain of web applications. We analyzed 17 web applications of different sizes, developed using different technologies, and in different application domains. We found a higher incidence of clones in web applications as compared to traditional applications, particularly in the user interface (UI) area. It is tempting and often useful to represent such clones in a unique, generic form for a number of reasons: Such unification reduces the size and the cognitive complexity of a software solution. It explicates dependencies among program parts that contain clones, and reduces the chance of update anomalies. The C++ standard template library (STL) is a powerful example of simplifications and engineering benefits achievable from a skillfully designed generic representation. While techniques used to achieve genericity of STL may not be equally effective in other domains, generative techniques can provide more versatile solutions to the generic design problem [13][20][30][31].

This paper presents an experiment we have done to investigate how far, in practice, server pages can be pushed to achieve clone-free web applications. Our experiment was based on alternative designs of a web application called Project Collaboration Environment (PCE). We built PCE based on requirements from one of our industry projects [20] and using PHP server pages.

In the first phase of the experiment (also described in [24]), we analyzed how clone unification in PCE was affecting other engineering qualities of PCE. We managed to remove most of the clones, but we also observed that clone unification caused many trade-offs in other engineering qualities of PCE. While some of these trade-offs were well known in traditional software development, most were less obvious and applicable only in web application development. These trade-offs resulted from the interplay between clone unification, realities of web application development (such as fuzzy requirements, dramatically short development schedules, constant evolution, and shortened revision cycles [8]), and desirable engineering qualities of web applications (such as high performance, high information content, and good aesthetics). We believe that these trade-offs may be unacceptable for many web applications.

That observation led us to the second phase of the experiment, which is the main contribution of this paper and the extension of [24]. In the second phase, we tackled clones using an alternative approach that we call the mixed-strategy, which is based on the generative technique of XML-based Variant Configuration Language (XVCL) [13]. Instead of removing clones at code-level, we did it at the meta-level representation of a web application built with XVCL. The web application is derived from its meta-level counterpart. The mixed-strategy avoids trade-offs of clone removal as web applications derived from the meta-level representation can contain clones, avoiding conflicts with other design goals and required properties of a web application.

2

Page 3: Proceedings Template - WORD

At the same time, developers can work at the clone-free meta-level representation of a web application.

The rest of the paper is organized as follows. Section 2 describes the PCE Web application we used for this experiment. Section 3 details the first phase of the experiment, in which we used conventional techniques to unify clones. The trade-offs we observed during the first phase are detailed in Section 4. The contents in Sections 2, 3 and 4 are an extended version of the material presented in [24]. In Section 5, we describe the second phase of the experiment and how the mixed-strategy approach can avoid the trade-offs we previously observed. Section 6 is a survey of related work, and we conclude in Section 7.

2. PROJECT COLLABORATION PORTAL (PCE)

Figure 1. A screenshot from the Staff module

Our experiment was based on alternative implementations of a web application called Project Collaboration Environment (PCE), created based on requirements received from our industry partner. PCE supports project record keeping, task assignment to staff, task progress tracking, and a range of other activities related to project planning and execution. It has six modules corresponding to six entity types in the PCE domain, namely Staff, Project, Product, Task, Notes, and File. PCE maintains records of those entities. For example, the Staff module maintains records of staff members and the Product module tracks the status of project deliverables. Figure 1 shows a screenshot from the Staff module with a listing of staff members.

3

Page 4: Proceedings Template - WORD

Figure 2. Domain model of PCE

PCE also tracks relationships between various entities in the PCE domain. For example, the Staff and Project modules maintain data about which staff members belong to which project teams, and the Task and Staff modules maintain data about project tasks assigned to staff members. Figure 2 shows the main PCE entity types and the relationships among them.

Figure 3. Feature diagram of a PCE module

Figure 3 shows the similarities and variations between PCE modules as a feature diagram [15]. A typical module M in PCE has a name (e.g., “Staff”), and several attributes (e.g., the Staff module has the attributes Full Name, Job Title, … cf Figure 2). Some attributes are common to all modules while others, which are optional, are module-specific. Each module supports

4

Page 5: Proceedings Template - WORD

actions (create, edit, delete, …). Some actions can be further decomposed into lower-level actions (e.g., edit action contains lower-level actions modify, link/unlink and copy), some of which are optional. In addition, a module may optionally have one of three types (i.e., an OR-feature) of relationships with another module: a simple association (e.g., a staff member works on a project), an aggregation type association (e.g., a Project has files), or a composition type association (e.g., A project consists of sub-projects).

The functionality and structure of a given PCE module is a combination of features given in the feature diagram. The high proportion of mandatory features implies that modules are highly similar to each other. However, optional features and OR-features inject variations between modules. Our feature diagram only depicts high-level, inter-module variations. There are also lower level variations among modules. For example, the create action of the File module carries an extra functionality for the uploading of to upload a file. In addition, there are intra-module similarities. For example, the copy action and the edit action are very similar, as both involve retrieving an existing record, editing it, and storing it in the database (for edit – overwrite the previous record, for copy - save as a new record).

3. PHASE I: USING SERVER PAGES TO UNIFY CLONES

We selected PHP server pages to implement PCE. PHP is a popular (20 million web domains used PHP by the April 2007 [21]), free for use and versatile technology specifically geared for web application development. Although PHP started out as a simple scripting language, it has since evolved into an industrial strength web application technology, used in complex web applications such as Facebook and sourceforge.net.

Figure 4. Three implementations of PCE

In this phase, we carried out three different implementations of PCE (see Figure 4), each one functionally equivalent to the other two. The first implementation was based on a very simple design, with little attention to minimizing clones. We call this PCE simple. In the second implementation, named PCEpatterns, we tried to unify clones by applying suitable design patterns to PCEsimple. In PCEunified, we unified any remaining clones that, in our judgment, were worth the effort. We considered clones as worth unifying if they were of considerable length (at least 20 tokens long) and if their unification decreased the code involved by at least of 25% (i.e., unified code was at least 25% smaller than the size of total clone instances).

5

Page 6: Proceedings Template - WORD

Figure 5. High-level architecture of PCE

We designed the three PCEs based on the CPG-Nuke (www.cpgnuke.com) open source web portal. CPG-Nuke is an adaptation of PHP-Nuke (http://phpnuke.org), a popular open source web application, averaging ½ million downloads per year during the 2004-2006 period. By using CPG-Nuke as the basis of our implementations, we hoped not only to reduce implementation workload, but also to ensure that our implementations follow an industry-accepted architecture.

Figure 5 shows the high-level architecture of the three PCEs. The Foundation part of the PCE architecture consists of Admin Modules (used for the administration of PCE) and Service Modules (used to provide various infrastructure services such as database connectivity, logging, etc.). Foundation acts as the platform on which various User Modules are deployed. It provides a framework for implementing modules and administration facilities to manage those modules. General User Modules provide common facilities to users (e.g., polls, message boards, preference management, etc.). For Foundation and General User Modules, we reused CPG-Nuke code as is whenever possible, and with minimal changes where necessary.

We developed the six PCE modules as another set of User modules. The reuse of CPG-Nuke reduced the PCE implementation to just these six modules. We built them to conform to Foundation requirements so that they too could use Foundation services, and could be managed using the Foundation (e.g., we used Foundation services for implementing a common look and feel). As our focus was on the UI layer of PCE, we kept the business logic and database layers of these modules simple.

We categorized clones into intra-module clones and inter-module clones. In PCEpatterns, we focused on intra-module clones as they were more localized and easier to tackle. In PCEunified, we widened our focus to cover inter-module clones as well. After selecting clones for unification, we used a combination of the following strategies to unify them:

6

Page 7: Proceedings Template - WORD

Applying design patterns – We unified some clones using design patterns that reduce code duplication. We selected a suitable design pattern by comparing clone characteristics against design patterns drawn from industry best practices in J2EE [1] and .NET [27], and from platform-independent recommendations [12][10].

Applying refactoring techniques – We unified some clones by applying commonly known refactorings such as given in [11].

Context-specific restructurings – When a clone did not fit into a known design pattern or a refactoring, we applied PHP features and context-specific design/code restructuring to unify the clones.

We will provide concrete examples of each these strategies in the forthcoming sections.

3.1 PCEsimple: a first-cut solution

Figure 6. Design of PCEsimple

We followed the KISS principle (i.e., Keep It Simple, Straightforward) when implementing PCEsimple. This initial version of PCE exemplified a first-cut solution that was likely to emerge when a new web application was developed under time pressure. Our priority in developing this PCE version was to “get PCE done”, with maintainability concerns such as “clone avoidance” taking a low priority. Each action (or sub-action) of the module in PCEsimple was implemented as a single independent server page, as shown in Figure 6. For example, the createStaff.php page implemented the create action for the Staff module. Cloning was liberally used when dealing with intra/inter-module similarities. For example, we implemented one module and used it to implement other modules by simply copying and modifying it. Two considerations heavily influenced the design of PCEsimple:

1. Architectural guidelines implied by the Foundation part of PCE – although our design was simple, we still adhered to the guidelines implied by Foundation.

2. Conceptual design of a similar web application implemented by our industry partner [20] (source code was not available as it was a commercial application). The PCE conceptual

7

Page 8: Proceedings Template - WORD

model (Figure 2), direct mapping of modules to entities, and the page-per-action organization in PCEsimple were direct results of this.

With the above two considerations, we expected our PCE to closely match an industrial implementation.

Figure 7 shows some examples of intra/inter-module clones in PCEsimple. Note that this is only a visual representation of relative proportion/location of similarities and variations created based on actual clones; the text contained in clones is not meant to be read. The dotted line indicates the part of the code that belongs to the clone. Light regions represent duplicated text while dark regions indicate variations between the clone instances.

[a] createFile.php is an inter-module clone of createProject.php.

[b] An identical intra-module clone in the Project module (caused by similar preprocessing done for each action).

[c] Intra-module clone in the Project module (caused by similarity between the create action and the edit action).

Figure 7. Some clones in PCEsimple

8

Page 9: Proceedings Template - WORD

3.2 PCEpatterns: a pattern-based solution

Figure 8. Meta-model of a module in PCEpatterns

Our objective when implementing PCEpatterns was to reduce cloning in PCEsimple by applying design patterns. We first re-organized our design around the Model-View-Controller (MVC) pattern that is widely used for UI-intensive applications. Following this pattern, each PCE entity consisted of a Model, a number of Views, and a number of Controllers that updated the model and selected the appropriate View to visualize the Model. Figure 8 shows the meta-model of a PCE Entity.

9

Page 10: Proceedings Template - WORD

Figure 9. Design of Staff module in PCEpatterns

Figure 9 shows a module designed following the meta-model. Application of the MVC pattern did not unify clones directly. Rather, it was a necessary precursor to applying other design patterns that unify clones since those patterns were targeted for an MVC architecture.

After applying MVC, we used CCFinder [14] and Gemini[29] (which provided a UI to visualize CCFinder results) to identify intra-module clones, and applied further design patterns to unify them. We applied these patterns within the scope of a module, repeatedly applying the same patterns to each module. We list below some examples of clones we found and the matching patterns we used to unify them:

Similar preprocessing sequences were repeated for each page request (e.g., session validation, parameter decoding). We applied the Front Controller [1][10][27] pattern to unify such clones into a single location. Following this pattern, each module had one Front Controller that received all user requests and performed control tasks common to all requests (FrontController in Figure 8, StaffFrontController in Figure 9).

Some views exhibited much similarity among them. We applied the Template View [10] pattern to unify such clones. That is, we put similar parts of views into a template (ActionView in Figure 8, createStaffView in Figure 9), and used that template to generate various cloned views that it unified.

10

Page 11: Proceedings Template - WORD

Data retrieval code was cloned in multiple views. We used the View Helper [1] pattern to unify the cloned code into a common helper class. Accordingly, some PCE Views were aided by helper classes (ActionViewHelper in Figure 8, createStaffViewHelper in Figure 9).

Some fragments of UI recurred in multiple places (e.g., attribute display code was cloned in the Edit page as well as the Display page). Following the Composite View pattern [27], we unified such fragments into a smaller view that was then reused to compose larger views.

As mentioned earlier, we kept the non-UI parts of PCE as simple as possible: Each module was to have minimal domain logic and a single table was to be used to represent a module in the database. As recommended by [10] for such situations, we used the Table Module pattern and the Table Data Gateway patterns for this portion (i.e., TableModule and TableDataGateway in Figure 8, omitted in Figure 9). In the controller portion, we also used the Page Controller [10][27] pattern to control the complexity of controllers. Following this pattern, each Front Controller was to use a number of Page Controllers, one per each action supported by the module (ActionController in Figure 8, createStaffController in Figure 9), rather than having a single controller for all the actions.

3.3 PCEunified: further clone unification

PCEunified was an all-out effort to unify any remaining clones. First, we identified remaining intra-module clones in PCEpatterns and unified them using a combination of the following techniques:

We extracted duplicated code fragments into methods using “extract method” refactoring [11].

We unified largely similar functions using “add parameter” refactoring [11], conditional branches, and Template Method pattern [12].

We converted similar HTML fragments to PHP server pages, using PHP scripts to handle variations in HTML clones (an example of a “context-specific restructuring” using PHP).

We further applied more intensively the Composite View pattern to unify common parts of Views.

11

Page 12: Proceedings Template - WORD

Figure 10. Design of PCEunified

After the intra-module clones were dealt with, we shifted our focus to inter-module clones. Our clone detection indicated that there were enough inter-module clones to consider each whole module as a coarse-grain clone of the others. This was not surprising since the modules initially implemented were used as blueprints for later modules. To remedy this situation, we unified the six modules into one generic module in the following manner: We extracted the six Front Controllers out of the modules and unified clones among them by creating two layers of Controllers. The top layer consisted of a common Front Controller that unified common control tasks. The second layer consisted of six module-specific controllers (e.g., StaffFrontController, ProjectFrontController, … in Figure 10). The rest of the six modules were unified into one module (called unified module in Figure 10). Variations found were handled using the same techniques that we used to handle variations in intra-module clones.

3.4 Overall comparison of the three PCEs

We started by comparing size and cloning level of the three PCE implementations. To measure cloning level (C%), we used the percentage of non-unique (i.e., cloned) code, calculated based on clones detected by CCFinder tool [14]. C% for a system was the sum of clone-related tokens (i.e., clone-related tokens were those tokens that form a part of any of the clones in that system) expressed as a percentage of total number of tokens in the system. C% was directly related to the probability of update anomalies. For instance, if 35% of the system was non-unique, any change to that 35% of the system risked an update anomaly. To minimize distortions created by false-positives and trivially short clones, only exact duplicates that were longer than 20 tokens were counted as clones in the calculation of C%. A manual inspection confirmed that no false positives were included in the results.

12

Page 13: Proceedings Template - WORD

Table 1. Size and cloning level comparison

intra-module all modules

Inte

r-m

od C

%

C%

LO

C

#F C%

LO

C

#F

PCEsimple 55 1085 10 98 5244 55 80

PCEpatterns 32 931 21 86 5095 120 74

PCEunified 15 838 20 26 1128 32 -

Table 1 summarizes the cloning percentage (C%), LOC count, and number of files (#F), calculated for a typical module (we chose Project module as the typical module because it was used as the blueprint for other modules) and for all modules. The last column shows the inter-module cloning level (we chose Project module and Product module to calculate this metric). These data indicate a very high (98%) overall cloning level in PCEsimple, i.e., almost all code in PCEsimple was repeated in multiple locations. This is because we copied existing modules to create new modules, resulting in many inter-module clones. This number is also comparable with findings of our industry case study [20], which reported that up to 90% of a new module may be implemented by reusing code from existing modules.

We also see a noticeable drop in intra-module cloning from PCEsimple to PCEpatterns (from 55% to 32%). This shows that the application of patterns did indeed lower the cloning level. However, the repeated application of the same patterns for each module caused the inter-module cloning level (see last column of Table 1), and the overall cloning levels to remain high. Further unification of intra-module clones, followed by unification of the six modules into one module reduced both intra-module and overall cloning levels in PCEunified. A manual examination revealed that the remaining clones in PCEunified were either too small to warrant unification, or not practical to unify (Section 4.3 gives an example).

Table 2. Change propagation comparison

#MPCEsimple PCEpatterns PCEunified#F #L #F #L #F #L

Change 1one 7 49 5 35 2 8

all 37 251 29 195 2 8

Change 2one 3 6 1 2 1 2

all 18 36 6 12 1 2

Change 3one 9 9 1 1 1 1

all 55 55 6 1 1 1

Change 1. Link all attribute names to a Glossary page.

Change 2. Move “last edited time” to another location.

Change 3. Record each request to PCE in a log file.

13

Page 14: Proceedings Template - WORD

These figures indicate a significant drop in the size of code to be maintained. There was a 23% reduction in code size (in terms of LOC) within a module, from PCE simple to PCEunified. The overall system size decreased much more (by 78%), largely due to unification of six modules into one. Second, the chance of update anomalies reduced. Table 2 shows the distribution of the impact of three hypothetical evolutionary changes when carried out for one module, or for all modules. It illustrates how the number of modified files (#F) and modified locations (#L) decreased from PCEsimple to PCEunified, reducing the chance of an inconsistency during an update.

4. TRADE-OFF ANALYSIS

As we increased the clone unification level from PCEsimple to PCEunified during the first phase of our experiment, we observed trade-offs in other web application properties that often should not be compromised. For example, we noticed that run-time performance of PCE degraded as we unified more clones.

Figure 11. Cloning level in three conventional PCEs

In Figure 11, we graphically illustrate the relative clone unification levels of the three conventional PCEs. There are many other ways to design PCE using conventional technologies, resulting in different levels of cloning, and landing at different locations in this axis. However, we believe that the threat of the trade-offs we observed could pull us towards the left: When clone unification threatens to compromise some good quality of the application, the tendency will be to retain the clones instead, resulting in an application that appears more to the left on this axis.

This section describes these trade-off situations in detail. For each such situation,

we discuss the web application engineering realities that set the context for the trade-off, and

we use concrete examples from PCE to illustrate how clone unification forces us to compromise a desirable quality of the application.

4.1 Performance

Some web applications operate in the highly competitive environment of the Internet. As slower performance can drive users away, “criticality of performance” is one important characteristic of such web applications [8]. Unfortunately, clone unification can affect performance negatively by introducing additional function calls, function parameters, and “include” directives. As an example, a simple comparison of page generation time for five randomly selected pages of the Staff module is shown in Figure 12 (all other things being equal, averaged over 10 page requests, when PCE was hosted on a Pentium IV, 3GHz machine having 1 Gb memory). In all

14

Page 15: Proceedings Template - WORD

cases, the page generation times of the three PCEs followed the pattern: PCEsimple <<< PCEpatterns

< PCEunified. On average (shown in extreme right), PCEunified was more than three times slower than PCEsimple. This example shows how clone unification, although feasible, can incur performance trade-offs.

0

100

200

300

400

create Staff edit Staff delete Staff display Staff list Staff averagepagegen

erat

ion

time

(ms) PCEsimple PCEpatterns PCEunif ied

Figure 12. Page generation time comparison

4.2 WYSIWYG editor compatibility

Three important characteristics of a web application are “aesthetics”, “information content”, and “constant evolution” [8]. Therefore, the creation and maintenance of web application UIs require continuous involvement of multimedia authors (e.g., graphic designers), content authors (e.g., technical writers), and programmers. The first two categories typically prefer to work with WYSIWYG authoring tools. Overzealous clone unification, however, can interfere with such WYSIWYG editing. For example, the PCE UI was constructed as an HTML based template, and the program logic was placed in helper classes.

Figure 13. Parallel editing of dynamic pages

Typically, a graphic designer creates the UI template using a WYSIWYG editor (e.g., Macromedia Dream Weaver) while a programmer builds helper classes. “Hooks” (e.g., very short PHP scripts) in the template are used to extract the dynamic parts from the helper class. Except during the time programmer places hooks in the template, both experts work in parallel. Figure 13 illustrates this situation.

15

Page 16: Proceedings Template - WORD

Figure 14. Effect of clone unification on WYSIWYG editing

We observed that intensive clone unification in PCEunified had a negative impact on this setup. It brought more programming logic into the template (in the form of extra parameters, conditional branches, function calls), fragmented the template (e.g., when using Composite View pattern), and made rendering of the WYSIWYG editor increasingly different from the actual result. This shows that clone unification can force a trade-off in the ability of the web application UI to be edited using WYSIWYG editors. An example is shown in Figure 14 where (a) depicts how the page is displayed in the browser, (b) depicts how the PCEsimple version of the page is shown in a WYSIWYG editor1, and (c) shows how the PCEunified version of the page is shown in the same editor. Clearly, there is a increased distortion from (a) to (c).

We note, however, the existence of editors claiming to have high-end WYSIWYG editing capabilities for .NET (e.g., Visual Studio.NET) and JEE (e.g., NitroX2).

4.3 Platform/framework conformance

It is typical to build web applications using available platforms/frameworks rather than from scratch. Such platforms/frameworks have conformance requirements. For instance, some may require certain code/file to be physically present in a given location. We encountered two such examples in our experiment:

1. PCE Foundation required a certain security check to appear at the beginning of each file to prevent direct access to that file.

1 We used the Microsoft Frontpage editor to generate this view

2 NitroX is available from http://workshopstudio.bea.com/

16

Page 17: Proceedings Template - WORD

2. PCE Foundation required each module to be in a separate folder (bearing the same name as the module), and a file named index.php to be present in each such folder.

Clone unification can interfere with such requirements. In the first example, we did not unify the said clone because such unification prevented us from using the built-in security mechanisms of the Foundation. In the second example, we modified the Foundation (generally a risky, undesirable option) to remove that requirement. These examples show how clone unification can force a trade-off in our ability to utilize platforms/frameworks.

4.4 Ability to use multiple content types

While applications written in several languages are certainly nothing new, multilingualism is taken to a new level in web application development [26]. web applications are implemented using a mixture of content types (ASP, C#, CSS, DTD, HTML, Java, JavaScript, etc.). In our previous study [23], we found 59 content types in 17 web applications (we considered all text files that are likely to be maintained by hand); on average, one web application involved 10 different content types. Furthermore, some clones can involve multiple content types intertwined with each other. To give an example from PCE, two cloned files could include HTML, PHP, Java Script, and SQL. While each content type may have its own clone unification facilities, intermixing of multiple content types complicates clone unification. Therefore, a drive towards a high level of clone unification can compromise the ability to mix content types in a web application implementation.

4.5 Rapid development capability

Our experiment started with a clone-ridden implementation (i.e., PCEsimple), and progressively unified clones to arrive at a clone-free implementation (PCEunified). However, in a production environment we might prefer to achieve PCEunified as our first implementation rather than go through three iterations. Clone unification is implicit in such a scenario; Clones are unified before they are created at all (i.e., “clone avoidance”). We can extrapolate our observations in “unifying” clones to show that “avoiding” clones in an initial implementation too can incur a trade-off in another important property of a web application, as we shall explain next.

Being the “first-in-the-market” can be a significant advantage for a commercial web application. Consequently, web applications have “compressed development schedules” [8]. However, clone avoidance requires additional effort, which may not be affordable for a web application project done under a compressed schedule. A comparison of the three PCE designs supports this argument: There were additional concepts, more indirection, and more layers as we went from PCEsimple to PCEunified requiring more effort during initial planning, analysis, and modeling. Therefore, despite the drop in LOC, the upfront development effort and time-to-market increased as we went from PCEsimple to PCEunified. Although PCEunified was the smallest of the three, it is unlikely that we could have achieved the same high degree of clone unification in the first attempt and within the same time it took us to develop PCE simple. This shows that intense clone avoidance can compromise the ability to quickly release a working web application.

17

Page 18: Proceedings Template - WORD

4.6 Rapid evolution capability

Web application projects typically start with “insufficient requirement specification” [8]. They have to evolve continuously to match volatile requirements/technologies. Therefore, web applications are required to evolve rapidly. However, a high level of clone unification can force a trade-off in this ability. This line of argument may appear to contradict Section 3.4, where we illustrated how the number of modified files/locations decreased as we unified more clones (cf Table 2). This is not so, as we shall illustrate with the following example:

Table 3. Effort for adding composition

file

s in

volv

ed

file

s m

odif

ied

loca

tion

s m

odif

ied

LO

C

mod

ifie

d

mod

ule

s to

te

st

PCEsimple 1 1 3 n 1

PCEpatterns 7 3 4 ~ n 1

PCEunified 9 4 4 > n all

Let us consider the effort required to add a new feature to the three conventional PCEs. Table 3 shows what would be involved in adding composition relationships to only one of the modules (assuming it only supported the other two types of relationships before). It shows that the number of files that might be affected by this new feature, the number of files actually modified, the number of independent locations modified, and the number of LOC modified tended to increase as we moved from PCEsimple to PCEunified. We would need to test the functionality of all six modules in PCEunified although the change affected only one module. This could be a major burden, given the immaturity of web application testing techniques. In general, clone unification limits the degree of freedom with which individual clones can evolve independently of the others. Therefore, while clone unification may ease certain kind of modifications (typically, modifications that need to be repeated for multiple clones, such as given in Table 2) it can also render certain other kind of changes more difficult (typically, localized modifications applicable to a minority of the clones, such as given in Table 3).

4.7 Efficiency of source code packaging

Often, the server page portion of a web application is delivered in source form. In such cases, it is desirable to eliminate all unused code from the delivered code. This may be due to space/time efficiency concerns (e.g., severe space constrains on the server) or to avoid transfer of unused client-side scripts over the network. Alternatively, this may be to minimize the impact of modifications. Most web applications are accessed globally, and need to be available 24/7. Downtime caused by updates to an unused part of the code is unacceptable for such web applications. Unfortunately, clone unification sometimes injects unused code into the delivered code. For example, in PCEunified, the Staff module used only 77% of the unified module. If the unified module was reused in another web application to serve as a Staff module, it would result in carrying over 23% of the code that would not be used at all. Therefore, clone

18

Page 19: Proceedings Template - WORD

unification can sometimes inject unused code into the distribution package, compromising our ability to distribute a clean, minimal, source code package.

4.8 Ability to vary web application’s run-time structure

A clone can represent a complete stand-alone system, not just a part of a system. A common example of such a situation is a product line. In the PCE experiment, we too had three similar systems (i.e., PCEsimple, PCEpatterns, PCEunified) that we could consider clones of each other. For example, 55% of the code of PCEsimple was found to have a cloned counterpart in PCEpatterns (see Figure 15). Occasionally, it may be necessary for such cloned systems to have different run-time structures. Possible reasons for this include:

to fit a new API/framework/platform (e.g., to deploy PCE modules on a different Foundation)

when one variant requires better performance than the rest (e.g., PCEsimple Vs PCEunified )

for compatibility with other legacy systems at the deployment site (e.g., to integrate with a legacy system that uses an old version of PHP)

Although our reasons for having three PCEs were quite different from those given above, we too found ourselves in a similar scenario: We had to maintain three separate web applications having drastically different run-time structures, yet having much similarity among them. Unification of such clones (i.e., having one generic system, instead of multiple cloned systems) requires some re-alignment in the run-time structures, possibly compromising the motives behind varying the run-time structures in the first place.

Figure 15. Similarity across the first three PCEs

5. PHASE II: USING MIXED-STRATEGY TO UNIFY CLONES

In the second phase of the experiment, we used a complementary technique called the mixed-strategy to unify clones in PCE. We shall first briefly introduce the fundamentals of the mixed-strategy.

19

Page 20: Proceedings Template - WORD

5.1 Introduction to mixed-strategy

Mixed-strategy is a synergistic mixing of conventional technologies with meta-programming to tackle similarities in programs. The name mixed-strategy emphasizes that meta-programming complements rather than competes with conventional technologies.

Figure 16. Applying mixed-strategy to unify clones

Applying mixed-strategy in clone unification translates to applying meta-programming on application code to unify clones at meta-code level (illustrated in Figure 16). By “unify clones at meta-code level”, we mean that we create a clone-free meta-code that can be used to generate the original application code. For brevity, let us call the former the “meta-code” and the latter the “application code”. Mixed-strategy assumes that maintenance of the code will be primarily done through the meta-code. On the other hand, only the application code can be compiled and executed. During run-time, there is no difference between a traditional application and a mixed-strategy application. cpp (C++ preprocessor), type parameterization (such as Java generics or C++ templates) or AOP (Aspect Oriented Programming) are well known examples of meta-code in that sense. We have to pre-process cpp directives, instantiate type parameters or weave aspect code into the classes to obtain the concrete program that can be compiled and executed.

20

Page 21: Proceedings Template - WORD

function getContainer(){

global $db;

$row = CompositionTableGateway::getRow($mod, $id);

...

$data = TaskTableGateway::getData($row['leftModInstanceID']);

$container['SubMod'] = $row['leftMod'];

$container['ID'] = $row['leftModInstanceID'];

return $container;

}

Figure 17. Function getContainer() in TaskTableModule.php

function getContainer(){

global $db;

$row = CompositionTableGateway::getRow($mod, $id);

...

$data = StaffTableGateway::getData($row['leftModInstanceID']);

$container['SubMod'] = $row['leftMod'];

$container['ID'] = $row['leftModInstanceID'];

$container['Name'] = $data['Name'];

return $container;

}

Figure 18. Function getContainer() in StaffTableModule.php

XVCL [28] is the de facto meta-programming language used in the mixed-strategy approach. Next, we illustrate by example how we can use XVCL to unify multiple clone instances into generic, adaptable, reusable meta-components. An XVCL “meta-component” may correspond to any program unit (or part thereof) such as a subsystem, interface, program component, group of classes, class, method, attribute declarations, fragment of method implementation. In the following example, we show how to use XVCL to unify two cloned functions in PCEpatterns:

Consider the function getContainer() occurring in PHP files TaskTableModule.php and StaffTableModule.php (shown in Figure 17 and Figure 18, respectively), with the differences between the clone instances highlighted).

21

Page 22: Proceedings Template - WORD

x-frame getContainer

function getContainer(){

global $db;

$row = CompositionTableGateway::getRow($mod, $id);

...

$data = <@ModuleName>TableGateway::getData($row['leftModInstanceID']);

$container['SubMod'] = $row['leftMod'];

$container['ID'] = $row['leftModInstanceID'];

<break moreData>

return $container;

}

Figure 19. x-frame getContainer

An XVCL meta-component is called an x-frame (or an x-framework, if it consists of multiple x-frames). Figure 19 shows an x-frame that unifies the two clone instances3. It consists of the common code of the clones, with XVCL commands marking variation points, i.e., points where function getContainer() in StaffTableModule.php differs from function getContainer() in TaskTableModule.php. One variation points is marked with ‘@ModuleName’, which is a reference to an XVCL variable called ModuleName. The other variation point is <break moreData>.

x-frame TaksTableModule

<?php

class TaskTableModule{

...

<set ModuleName = "Task" />

<adapt getContainer />

...

}

?>

Figure 20. x-frame TaskTableModule

3 For readability, we use a simplified, pseudo-XML notation in these examples.

22

Page 23: Proceedings Template - WORD

x-frame StaffTableModule

<?php

class StaffTableModule{

...

<set ModuleName = "Staff" />

<adapt getContainer >

<insert moreData>$container['Name'] = $data['Name'];</insert>

</adapt>...

}

?>

Figure 21. x-frames StaffTableModule

An XVCL meta-component such as x-frame getContainer is generic and adaptable, in that it can be used to generate customized variants of the component to match various reuse contexts. For that, we first create two x-frames TaskTableModule and StaffTableModule (Figure 20), to represent the two reuse-contexts TaskTableModule.php and StaffTableModule.php. They contain the PHP code from the respective source files (shown in Figure 17 and Figure 18), with the cloned method replaced by the XVCL code required to generate the appropriate customized clone instance. The <adapt getContainer> command indicates that the x-frame getContainer will be used as the template for code generation. <adapt> is the XVCL command used to include contents of a specific x-frame (or an x-framework) into the generated code. It is somewhat similar to an include directive in other programming languages. Setting the XVCL variable moduleName to Task before the <adapt> command in the x-frame TaskTableModule generates the original PHP code for the getContainer() method in TaskTableModule.php. For the StaffTableModule, we additionally use the <insert> command to inject some additional code into the variation point marked by the break command (see Figure 19).

x-frame PCE

<set TableModuleName = "TaskTableModule, StaffTableModule" />

<while TableModuleName >

<adapt @TableModuleName>

</while>

Figure 22. x-frame PCE

The x-frame called PCE shown in Figure 22 contains the XVCL code required to generate the two PHP files, which include the two clone instances. It also illustrates the use of the XVCL while command.

23

Page 24: Proceedings Template - WORD

Figure 23. Generating application code from the x-framework

The four x-frames described so far connect together by <adapt> commands to form an x-framework, as shown on the left side of Figure 23. Arrows in the figure represents <adapt> commands. The top-level x-frame of an x-framework – x-frame PCE in this case – is also called the SPC (to mean “specification frame”). The SPC acts as the handle to the x-framework. Running this x-framework through a preprocessor (called XVCL processor) generates the original application code (shown in Figure 18).

Figure 24. Further evolved x-framework for PCE

As we continue to unify other clones this way, we add more x-frames to the x-framework, and the x-framework evolves. Figure 24 shows a more evolved state of our x-framework. According to this x-framework, f1 and f2 are x-frames that unify other cloned functions between TaskTableModule.php and StaffTableModule.php. The x-frame TableModule represents a generic TableModule that captures the clones occurring in various TableModules such as TaskTableModule, StaffTableModule, etc. X-frames Task and Staff contains the XVCL

24

Page 25: Proceedings Template - WORD

code that uses the generic TableModule to generate specific TableModules TaskTableModule and StaffTableModule, respectively.

XVCL can handle much more complex variations than those illustrated in the preceding example. We refer the reader to the XVCL web site [28] for more XVCL learning resources.

As illustrated by the preceding example, XVCL:

1. represents each group of similar program structures (i.e., clones) in a unique, generic but adaptable form (e.g., x-frame getContainer) while explicating variation points (e.g., use of @ModuleName and <break MoreData> in x-frame getContainer).

2. records the exact location of each such structure in a program (e.g., <adapt getContainer> in x-frames TaskTableModule and StaffTableModule marks the two locations of the clone).

3. delineates the differences among specific program structures in each group as deltas from their generic form (e.g., <insert> command in x-frame StaffTableModule).

4. automates derivation of specific structures from their generic forms to produce an executable program (i.e., using the XVCL processor, as shown in Figure 23).

Following from the above four attributes, the mixed-strategy is expected to positively affect the maintenance in the following ways:

Modification points are reduced. Since modifications can be applied to the XVCL version of the clone and propagated automatically to all instances, the risk of update anomalies is reduced.

Similarities and variations (i.e., commonalities and differences) are explicitly expressed using XVCL. This helps in deciding whether a modification should be applied to all clone instances identically.

Reduces the code size that needs to be maintained. Since the XVCL code does not contain duplicate code, the entire system can be understood by examining the much smaller XVCL code.

25

Page 26: Proceedings Template - WORD

5.2 PCEms: applying mixed-strategy to PCE

Figure 25. Adding PCEms to the experiment

When applying the mixed-strategy to PCE, we use PCEpatterns as the target application code (This is an arbitrary decision, used only for illustration. Other scenarios are possible, for instance, we can use PCEsimple as the target and applied XVCL to unify its clones). Then, we create PCEms by using XVCL to unify any remaining clones in PCEpatterns.

Figure 26. X-framework for PCEms

Figure 26 illustrates a simplified version of the PCEms x-framework. It is a further evolved version of the x-framework shown in Figure 24, and the x-frames we discussed in the previous section are highlighted. We do not go into detailed description of this PCEms x-framework here. In brief, the x-frame [M] and the x-framework under it represent a generic PCE module. X-frames at the level of Staff, Project and so on represent module-specific variations that need to be injected to the generic module when generating respective modules in PHP. The PCE is the specification frame that represents the whole PCEpatterns.

26

Page 27: Proceedings Template - WORD

Table 4. Size and cloning level comparison (of all four PCEs)

intra-module all modules

Inte

r-m

od C

%

C%

LO

C

#F C%

LO

C

#F

PCEsimple 55 1085 10 98 5244 55 80

PCEpatterns 32 931 21 86 5095 120 74

PCEunified 15 838 20 26 1128 32 -

PCEms 17 864 24 15 1063 34 -

We have found the code size and cloning level in PCEms to be comparable to those in PCEunified. In fact, the size and cloning level of the whole PCEms is marginally better than those of PCEunified. For easy comparison, Table 4 shows the cloning level of all four PCEs (figures for the first three PCEs were already reported in Table 1). This shows that mixed-strategy could unify clones to the same extent as achieved using PHP features and design patterns. However, the important difference is that mixed-strategy can do this without incurring the trade-offs we previously observed. The next section explains how mixed-strategy avoids each trade-off.

5.3 How mixed-strategy avoids the trade-offs of clone removal

Since mixed-strategy works at the meta-code level and not at the source code level, the code that is executed at run-time is not affected at all. For example, PCEms produces the same source code as PCEpatterns. Therefore, mixed-strategy does not affect the run-time properties of an application at all, avoiding the following trade-offs:

Performance (see Section 4.1) - The performance of PCEms is the same as PCEpatterns since the application code generated by PCEms is identical to PCEpatterns. If we want even better performance, we can apply mixed-strategy on PCEsimple. Thus, mixed-strategy can avoid the performance trade-offs involved in clone unification.

Platform/framework conformance (see Section 4.3) - In PCEms, we have unified the two clones that required to be in the application for the sake of framework compliance. Since the clones are unified at meta-program level, PCE’s framework compliance is not affected.

XVCL treats application code as plain text. Further, XVCL’s parameterization capabilities are not restricted by the syntax/semantics of the application code. Therefore, mixed-strategy does not hinder intermixing of content types in the application code, avoiding the trade-off mentioned in Section 4.4 (i.e., Ability to use multiple content types).

Mixed-strategy can generate multiple variant instances of a clone using the same meta-code. Therefore, it can generate customized code for each reuse context, containing the minimal code required, thus avoiding the overhead of library code that is never used. This avoids the trade-off mentioned in Section 4.7 (i.e., Efficiency of source code packaging ) because each PCE module in PCEms will contain only the minimum code required for that module.

27

Page 28: Proceedings Template - WORD

Figure 27. WYSIWYG editing when using mixed-strategy

Mixed-strategy’s ability to generate multiple variant instances of a clone from the same meta-code base can also minimize the trade-off observed in the use of WYSIWYG editors (see Section 4.2). Mixed-strategy can generate an extra WYSWYG-friendly version that has less program logic. Non-programmers will work on this WYSWYG-friendly version while programmers are in charge of “back-propagating” the changes to the XVCL version and then “forward-propagating” them to all instances of the clone in application code. Figure 27 illustrates this scenario.

Figure 28. Using XVCL to unify all three PCEs

Furthermore, if we consider similar products as clones of each other, we can use the mixed-strategy approach to generate members of a product line using the same generic meta-code base. As XVCL is agnostic to the semantics of the application code, mixed-strategy solutions are not

28

Page 29: Proceedings Template - WORD

sensitive to the run-time structure of the variant applications generated. Hence, we can easily use mixed-strategy to unify highly similar web applications that have different run-time structures. For example, we can have one generic PCEall that generates all three PCEs (see Figure 28).

Section 4.5 argued that clone unification could lengthen the time-to-market of a web application project. In contrast, we can use the mixed-strategy approach in such a way that we retain and even enhance the ability to release the web application quickly. We do so by applying the mixed-strategy only after the web application has been released. This means during initial periods where the code is volatile, developers can concentrate on building the web application quickly, without worrying about mixed-strategy or clones. In fact, we can use cloning during this stage to speed up the development process (as cloning is faster than using more complex reuse techniques). We can tackle the clones later using mixed-strategy. Since mixed-strategy does not affect the application code, the released product remains the same. This way, mixed-strategy can in fact improve the speed of initial development rather than hinder it.

Similarly, the evolution of a mixed-strategy solution need not be done using the meta-code alone. If there is a need to change something urgently in the web application, one can always work on the generated application code until the change is stable and released. We can back-propagate the changes to the meta-code later, when the time pressure has eased. This way, mixed-strategy can tackle clones without compromising the ability to evolve web applications rapidly (see Section 4.6). However, the success of this approach depends on tool support for back-propagating the changes from the application code to the meta-code.

6. DISCUSSION OF RESULTS

Our study shows that it is technically feasible to use server pages to unify most of the clones in PCE, but such unification could compromise many important properties of PCE. The mixed-strategy approach can unify the same clones at the meta-program level while avoiding most of those compromises.

One internal threat to the validity of our findings is the question of whether the trade-offs observed were really caused by the use of server pages to unify clones or by some other factors. However, in Sections 4.1 to 4.8, we explained how the unification of clones using server pages was really the cause of the trade-off in concern. Another internal threat to validity is whether we can credit mixed-strategy for avoiding the trade-offs. Section 5.3 explains how the mixed-strategy avoids each of the trade-off previously observed.

As ours is a lab case study, the main external threat to the validity of our findings is how far we can generalize them to industry practice. Below, we discuss various aspects of this threat.

Generalizability of PCE to industry web applications:

PCE represents a typical information retrieval system, a common type of web applications. In addition, we used functional requirements and the conceptual model of a real web application built by our industry partner. However, PCE is much smaller in comparison with most industry web applications.

Generalizability of our design techniques to those of industry:

29

Page 30: Proceedings Template - WORD

We followed design best practices from industry (captured by design patterns) in PCE design. We used industry accepted architectures and frameworks as the core of our PCE implementation. We maintained a tight feedback loop with our industry partner throughout the experiment to validate our approaches/observations. In the end, the cloning level of PCE was also comparable to a similar web application built by our industry partner [20]. While we acknowledge that there are other possible designs for each of the PCE versions, we believe that at least some of the trade-offs we observed are applicable for those designs as well.

Generalizability of our experiment setting to that of industry:

All implementations were done by the first author who is a trained software engineer having prior industry experience in web application building. However, we note that ours has been a one-man project team, which affects its generalizability to industry projects done by larger project teams. In particular, the effect of using XVCL in a team setting is yet to be studied.

Generalizability of our findings to other web application technologies:

PHP is only one of technologies used for dynamic page generation. Next, we discuss how we can relate our observations to alternative web application technologies.

JEE™ and .NET™ - two advanced platforms for implementing web applications. They provide rich sets of general middleware-level infrastructure services (e.g., for managing security, transactions, resources). In our PHP solution, the Foundation provides similar service, but in addition, it also provides more application-specific infrastructure services. Therefore, PCE implemented on the .NET or J2EE is likely to follow the same high-level architecture shown in Figure 5, possibly with a thinner Foundation (since some middleware-level services are provided by the platform itself). As the design patterns we applied in our experiment are also applicable to .NET and JEE, we expect to see similar cloning situations across the PHP, .NET and JEE platforms. In addition, the basic role of server pages remains the same whether we use PHP, ASP.NET or JSP on JEE. Therefore, we believe that the limits involved in using server pages, at least in the situations discussed in Section 5.3, apply independently of the platform on which these techniques are used. Further work is required to support or dismiss this hypothesis.

Another technology we can use to unify clones in web applications is Server Side Includes (SSI): we can put a clone in a separate file, and replace the clone instances with SSI “include” directives. The web server will replace the directive with the clone text, at the point of serving the web page. Since SSI avoids the overhead of generating the whole page dynamically (as done by PHP), it is considered a good way to add small amounts of dynamic content to an otherwise static website, without the hassle/overhead of using a full-fledged server page technique such as PHP. However, a web application such as PCE by default requires a full-fledged server page technique, to achieve the level of dynamic behavior expected from it. This reduces the attractiveness of using SSI to unify the clones in PCE. Furthermore, the clones we unified had much variation between clone instances. Both PHP and XVCL are equipped with sufficient mechanisms to tackle such variations while SSI is more suited for unifying exact duplicates or clones with very simple variations. In summary, the clones we encountered may not be a good fit for SSI. However, we note that the use of SSI for clone unification may have a lesser impact on performance than using PHP to unify the same clones.

30

Page 31: Proceedings Template - WORD

Other related technologies requiring further work include web application frameworks/generators (e.g., Struts, Ruby on Rails), incarnations of the server page technique in other languages (e.g., ColdFusion), template engines (e.g., Velocity), client-side technologies (e.g., AJAX) and transformation techniques (e.g., XSLT).

7. RELATED WORK

Cloning is a well-known problem in traditional software development, and it has been under research for more than a decade. The interest in the area appears to be growing, with recent work focusing on clone detection (e.g., [9]), analysis (e.g., [3]) and determining the actual harmfulness of clones (e.g., [18],[19]). Cloning in the web domain too has started to attract interest from the research community (e.g., [2][4] [6][7] [26]). Our work adds to this body of knowledge, by providing an in-depth treatment of clone unification trade-offs in web applications, together with a possible solution.

Coming from the non-Web domain, Cordy’s work [5] in critical financial systems reports that the risk of breaking an existing system is a great deterrent to clone unification. We can formulate this as a trade-off, i.e., clone unification forces us to compromise system reliability. He also mentions that certain clones speed up development/maintenance by introducing a “degree of freedom”. Although not the main focus of their work, Synytskyy et al. [26] point out that overzealous clone unification can result in hard-to-understand spaghetti code. We agree with both these views, as implied by Sections 4.5 and 4.6. Synytskyy et al. [26] also mention how multiple content types complicate clone detection in web domain. We have observed that the same is true for clone unification (cf Section 4.4). Boldyreff and Kewish [4] propose storing unified clones in a relational database and retrieving clones at run-time using scripts. A somewhat similar approach is proposed by Ricca and Tonella [25]. Clone unification by storing cloned web page fragments in a database in this manner is a powerful mechanism with its own merits. However, it should be used in moderation as it may aggravate trade-offs in several areas, such as performance and WYSIWYG editing. Clone unification as proposed by [4][25] and [26] is fully automated ([25] allows manual refinements to the generated result). Ping and Kontogiannis [22] propose an approach to automatically refactor web sites to remove “potential duplication”. Such automation is a step forward as it greatly reduces the effort required in clone unification. However, one needs to be careful not to offset the advantages of automation with the cost of trade-offs we have highlighted here. Other researchers who have observed that clones are not always appropriate to remove include Kim et al. [17], and Kapser and Godfrey [16].

We often find cloning in conventional software written in languages such as Java or C++. Many clones are “essential” in the sense that they either play some useful role in a software system, or occur because of limitations of programming languages. Such clones cannot be removed in any case. We refer to detailed analysis of the cloning problem in [13].

With the mixed-strategy approach to clone unification as described in Section 5, we unify clones at the meta-level program representation. An executable program, derived from its meta-level counterpart, contains clones, but as we maintain and reuse the meta-level program representation, we are not concerned with that. Therefore, we can enjoy the benefits of generic, non-redundant program representation, no matter why clones occur in a program, without

31

Page 32: Proceedings Template - WORD

creating conflicts with other goals, and without being constrained by programming language rules.

We have applied the mixed-strategy approach in lab studies and industrial projects in the web application domain, application programs, and class libraries, written in a range of programming languages. Our project experiences confirm that clone unification at the meta-level is practical, and a meta-program representation is conceptually simpler and easier to maintain than the actual program [13]. Clone unification at the meta-level is also an effective though unconventional reuse strategy, where the subjects of reuse are meta-level generic structures, rather than components as in conventional reuse.

8. CONCLUSION

Using an empirical study, we have showed that it is technically feasible to use server pages to unify most clones in web applications. Such unification greatly reduces code size and the chance of update anomalies. However, this approach can compromise many important web application properties. In a real-world web application project, these trade-offs limit how far we can practically push server pages towards clone unification, and they may explain why clones persist in software. Understanding these trade-offs provides us with a critical criterion against which solutions to cloning should be evaluated.

The second phase of our experiment has showed that the mixed-strategy approach could unify clones at the meta-program level while avoiding most of the compromises that clone unification otherwise entails. While there may be many other ways to tackle the clones we have observed in PCE, the mixed-strategy approach attempts to do so in a manner independent of domain, language and platform, with only modest extensions to today’s programming techniques.

9. ACKNOWLEDGEMENTS

The authors wish to thank the anonymous reviewers for their very useful suggestions, our industry collaborators Ulf Pettersson and his team (ST Engineering, Singapore) for their valuable feedback during the experiment, Katsuro Inoue and his team (Osaka University, Japan) for their help in providing clone detection tools, and former member of our research team Hamid Abdul Basit for his valuable feedback. This research was supported by NUS research grant R-252-000-239-112.

10. REFERENCES

1. Alur D, Crupi J, Malks D. Core J2EE Patterns: Best Practices and Design Strategies, Prentice Hall, 2003

2. Aversano L, Canfora G, De Lucia A, Gallucci P. Web site reuse: cloning and adapting, Proceedings of the 3rd International Workshop on Web Site Evolution, (WSE'01), 2001; 107 – 111

3. Bakota T, Ferenc R, Gyimothy T. Clone Smells in Software Evolution. Proceedings of the International Conference on Software Maintenance (ICSM07), 2007, pp. 24 – 33

32

Page 33: Proceedings Template - WORD

4. Boldyreff C, Kewish R. Reverse engineering to achieve maintainable WWW sites. Proceedings of the 8th Working Conference on Reverse Engineering (WCRE 01), 2001; 249 – 257

5. Cordy JR, Comprehending Reality: Practical Challenges to Software Maintenance Automation. Proceedings of the 11th IEEE International Workshop on Program Comprehension, (IWPC03), 2003; 196 – 206

6. De Lucia A, Francese R, Scanniello G, Tortora G. Reengineering Web Applications Based on Cloned Pattern Analysis. Proceedings of the12th IEEE International Workshop on Program Comprehension (IWPC04), 2004; 132 – 141

7. De Lucia A, Scanniello G, and Tortora G. Identifying Clones in Dynamic Web Sites Using Similarity Thresholds (ICEIS04), 2004; 391 – 396

8. Deshpande et at. Web Engineering. Journal of Web Engineering, 2002; 1(1): 3 – 17

9. Evans WS, Fraser CW, Ma F. Clone Detection via Structural Abstraction. Proceedings of the 14th Working Conference on Reverse Engineering (WCRE07), 2007; 150 – 159

10. Fowler M. Patterns of Enterprise Application Architecture, Addison-Wesley, 2003

11. Fowler M. Refactoring - improving the design of existing code, Addison-Wesley, 1999

12. Gamma E, Helm R, Johnson R, Vlissides J. Design patterns: Elements of reusable object-oriented software, Addison-Wesley, 1997

13. Jarzabek S. Effective Software Maintenance and Evolution: Reused-based Approach, CRC Press Taylor and Francis, 2007

14. Kamiya T, Kusumoto S, Inoue K. CCFinder: A multi-linguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering, July 2002; 28 (7): 654 – 670

15. Kang K et al. Feature-Oriented Domain Analysis (FODA) Feasibility Study (CMU/SEI-90-TR-21, ADA 235785). Software Engineering Institute, CMU, 1990

16. Kapser C, Godfrey M W. “Cloning Considered Harmful” Considered Harmful, Proceedings of the of the 2006 Working Conference on Reverse Engineering (WCRE'06), 2006; 9 – 28

17. Kim M, Sazawal V, Notkin D, Murphy GC. An Empirical Study of Code Clone Genealogies. Proceedings of the 10th European Software Engineering Conference & the 13th Foundations of Software Engineering (ESEC/FSE’05), 2005; 187–196

18. Krinke J. A Study of Consistent and Inconsistent Changes to Code Clones. Proceedings of the 14th Working Conference on Reverse Engineering (WCRE07), 2007; 170 – 178

19. Lozano A, Wermelinger M, Nuseibeh B. Evaluating the Harmfulness of Cloning: A Change Based Experiment. Proceedings of the 4th International Workshop on Mining Software Repositories (MSR07), 2007; 18 – 18

20. Pettersson U, Jarzabek S. Industrial Experience with Building a Web Portal Product Line using a Lightweight, Reactive Approach. European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC-FSE'05), 2005; 326 – 325

21. PHP usage statistics, http://www.php.net/usage.php [30 Apr 2007]

33

Page 34: Proceedings Template - WORD

22. Ping Y, Kontogiannis K. Refactoring Web sites to the Controller-Centric Architecture. 8th

Euromicro Working Conference on Software Maintenance and Reengineering (CSMR'04), 2004; 204 – 213

23. Rajapakse DC, Jarzabek S. An Investigation of Cloning in Web Applications. Proceedings of the 5th International Conference on Web Engineering (ICWE'05), 2005; 252 – 262

24. Rajapakse DC, Jarzabek S. Using Server Pages to Unify Clones in Web Applications: A Trade-off Analysis. Proceedings of the 29th International Conference on Software Engineering (ICSE’07), Minneapolis, USA, May 2007; 116-125

25. Ricca F, Tonella P. Using Clustering to Support the Migration from Static to Dynamic Web Pages. Proceedings of the 11th IEEE International Workshop on Program Comprehension (IWPC’03), 2003; 207 – 216

26. Synytskyy N, Cordy JR, Dean T. Resolution of static clones in dynamic Web pages, Proceedings of the 5th IEEE International Workshop on Web Site Evolution, (IWSE'03), 2003; 49 – 56

27. Trowbridge et al. Enterprise Solution Patterns Using Microsoft .NET (Version 2.0), Microsoft Press, 2003

28. XVCL (XML-based Variant Configuration Language) http://xvcl.comp.nus.edu.sg

29. Ueda Y, Kamiya T, Kusumoto S, Inoue K. Gemini: Maintenance Support Environment Based on Code Clone Analysis. Proceedings of the 8th IEEE Symposium on Software Metrics, 2002; 67 – 76

30. Yang J, Jarzabek S. Applying a Generative Technique for Enhanced Reuse on J2EE Platform. Proceedings of the 4th International Conference on Generative Programming and Component Engineering, (GPCE'05), 2005; 237 – 255

31. Zhang W, Jarzabek S. Reuse without Compromising Performance: Experience from RPG Software Product Line for Mobile Devices. Proceedings of the 9th International Software Product Line Conference (SPLC’05), 2005; 57 – 69

34