
A Quantitative Study of Design Pattern Adoption

Michael Hahsler
Department of Information Business
Vienna University of Economics and BA, Augasse 2-6, A-1090 Vienna, Austria
[email protected]

Abstract - Design patterns are an accepted method to improve software development. There are many benefits of the application of patterns described in the literature. However, there is a lack of quantitative research concerning the actual adoption of design patterns and about the realization of the benefits. In this paper we use the log files from CVS for over 1000 open-source software projects realized in Java to gain an insight into the usage of design patterns.

1 Introduction

The need for reliable software has made software engineering an important industry in the last decades. Steady progress has produced an enormous number of different approaches, concepts and techniques: the object-oriented paradigm, agile programming, component-based systems, frameworks and software patterns, to name just a few. All these approaches claim to be superior, more effective or more appropriate in some area than their predecessors. However, these claims are mostly supported by qualitative research such as isolated case studies and surveys. Quantitative approaches are rare. Some reasons are:

1. It is often hard to identify the application of the concepts in real world projects if it is not explicitly mentioned that it is used. E.g.: using a programming language that supports object oriented constructs like C++ does not automatically mean that the project is analyzed, designed and implemented in a purely object oriented way.

2. It is hard to collect the needed quantity of information about the actual application of the approaches. Case studies of some projects are used, but to analyze a large number of different projects is not very common. This is due to the unavailability of the information needed, e.g. the development documentation of commercial projects is not detailed enough for analysis or it is not publicly available and so on.

3. The results are often difficult to interpret and compare. E.g. two projects, one using components and the other some alternative technique, cannot be easily compared, since they probably differ considerably with respect to the team, the aims and even the requirements.

In this paper we present a quantitative analysis of the application of design patterns. Design patterns describe non-obvious solutions for recurring software design problems in a certain context. Since their introduction by Gamma et al. [GHJV95], design patterns have been rapidly accepted by the software engineering community. The number of publications about design patterns has soared, and several conference series on the topic were initiated. In the US the conference series is called Pattern Languages of Programs (PLoP), and in other parts of the world conference series like EuroPLoP, KoalaPLoP, ChiliPLoP and several others were started.


However, the research about the actual usage of design patterns and the benefits of adopting patterns is still underdeveloped.

We do ... usage, communication [.....]

2 Research Method

In this paper we analyze publicly available version control data on the development of open source software projects available via the Source Forge Web site (http://www.sourceforge.net). This approach has already been used successfully to analyze the development of the GNOME project [KoSc02] and the Apache Web Server project [MoFH00], both large-scale open source projects.

Source Forge hosts currently over 50000 open source projects and has over 500000 registered users. It provides the projects with a version control facility as well as a presentation platform and communication channels for developers and users. For Source Forge each developer has a unique pseudonym, the user name. Each project has a home page with general information about the project like the project name, the users in charge of the project (administrators), the development status of the project, the intended audience, the programming languages, and more general information.

For this paper we analyzed the projects using Java that employ the version control tool Concurrent Versions System (CVS) [Foge99], which enables parallel development by several programmers, comments for changes, control of releases, reversal of changes, generation of history logs for each file, and much more. A project is a collection of individual files which are stored with versioning information in a CVS repository. New files can be added to the project and existing files can be changed by developers of the project at will. To change a file a developer has to perform the following steps:

1. Obtain the latest version (or some other version) of the file from the repository (‘checkout’ the file).

2. Change the file locally.

3. Update the repository with the changes (execute a ‘checkin’ for the file).

During the checkin the programmer is encouraged to add a short log message which explains the changes and makes it later on easier to understand them. Also CVS records the changes in the file in lines of code (LOCs) added and LOCs deleted by the programmer. CVS uses as the definition of LOCs just the number of physical lines and does not distinguish between program statements, commentary or arbitrary text. We adopt this definition for this research. Furthermore, CVS does not account for changed lines but records a changed line as a line deleted and a new line added. Therefore, the growth of LOCs for a checkin is the difference between the LOCs added and the LOCs deleted. A fragment of a CVS log is shown in table 1 where the information given for each checkin (the date, the programmer called in CVS author and the LOCs added and deleted) is printed in bold font. In the log message you can see the comments the programmer provided with the checkin.
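The LOC bookkeeping described above can be sketched as follows. This is a minimal illustration, not the parser used in the study: the regular expression, the function name `parse_checkin` and the returned field names are assumptions, and a header line without a `lines:` field is treated here as zero lines added and deleted.

```python
import re

# Matches a per-revision header line of a CVS log, for example:
# "date: 2000/06/01 00:51:24; author: sylvain; state: Exp; lines: +16 -14"
# The "lines:" part is optional in this sketch (assumption).
REVISION_LINE = re.compile(
    r"date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);\s+state: [^;]+;"
    r"(?:\s+lines: \+(?P<added>\d+) -(?P<deleted>\d+))?"
)


def parse_checkin(line):
    """Return date, author and LOC figures for one checkin, or None."""
    match = REVISION_LINE.search(line)
    if match is None:
        return None
    added = int(match.group("added") or 0)
    deleted = int(match.group("deleted") or 0)
    return {
        "date": match.group("date"),
        "author": match.group("author"),
        "added": added,
        "deleted": deleted,
        # A changed line counts as one deletion plus one addition,
        # so growth is simply added minus deleted.
        "growth": added - deleted,
    }


if __name__ == "__main__":
    header = "date: 2000/06/01 00:51:24; author: sylvain; state: Exp; lines: +16 -14"
    print(parse_checkin(header))
```

For the header line of revision 1.23 in Table 1 this yields 16 lines added, 14 deleted and a growth of 2 LOCs.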

To analyze the application of design patterns we first have to identify the patterns in the projects. Design patterns are design artifacts that result in special constructions in the final code, e.g. several objects that interact in a certain way. It is very difficult to infer the application of design patterns automatically and directly from code. Therefore, we use the log messages to identify the application of design patterns by looking for their names and descriptions. Although design patterns can be applied without mentioning them, or a pattern can be referred to by a different name, this seems to be a reasonable approach for the following reason. One of the major contributions of patterns stressed throughout the literature is that the names of design patterns become part of a common design language which programmers use to communicate more efficiently [GHJV95, p.352], [BMRS96, p.6], [Vlis98, p.6]. Therefore, naming the pattern in the description, in our case the log file, is beneficial for other programmers and for documentation purposes. By analyzing the log files we can find the cases in which patterns are used for this reason.

Although more design patterns were introduced in the literature, e.g. in [CoSc95], [BMRS96], [VlCK96] and [MaRB98], we only use the original design patterns introduced by Gamma et al. in [GHJV95]. The patterns and the keywords used are listed in the appendix. Some pattern names are also very common words in software design. To reduce this problem, a pattern was only counted for a checkin if the log message also contained at least one of the keywords taken from the pattern intents (highlighted in the appendix) or if the name was accompanied by the word ‘pattern’.
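The counting rule above can be illustrated with a short sketch. The keyword stems in `PATTERN_KEYWORDS` are hypothetical placeholders, since the actual list is given in the appendix, and the function name `patterns_in_message` is likewise an assumption made for illustration.

```python
import re

# Hypothetical excerpt: pattern name -> stemmed keywords from its intent.
# The real keyword list is the one in the paper's appendix.
PATTERN_KEYWORDS = {
    "factory": ["creat", "instantiat"],
    "singleton": ["instance", "global"],
    "command": ["request", "undo"],
}


def patterns_in_message(message):
    """Return the set of pattern names counted for one log message."""
    words = re.findall(r"[a-z]+", message.lower())
    mentions_pattern = any(w.startswith("pattern") for w in words)
    found = set()
    for name, stems in PATTERN_KEYWORDS.items():
        if name not in words:
            continue  # the pattern name itself must appear
        has_keyword = any(w.startswith(s) for w in words for s in stems)
        # Count the name only with a supporting keyword or the word "pattern".
        if has_keyword or mentions_pattern:
            found.add(name)
    return found


if __name__ == "__main__":
    print(patterns_in_message('we use the "Factory" pattern in'))
    print(patterns_in_message("fixed command line parsing"))
```

Under these assumed keywords, the first message is counted as a use of Factory, while the second is rejected because "command" appears without a supporting keyword or the word "pattern".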

We extracted the CVS log files for all projects, parsed them using regular expressions and entered the information in a relational database. The structure of the database is depicted by the entity-relationship diagram in Figure 1. Each project contains several files. Each checkin is done by one programmer and can contain none, one or several design patterns. One or more of the programmers administrate a project.

All analyses in the subsequent sections are performed using standard SQL select statements on the database and a standard statistical package.
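A hedged sketch of how such a database and a typical select statement might look is given below, using Python's built-in `sqlite3` module. The table names, column names and sample rows are assumptions made for illustration; the actual schema is the one in Figure 1, and the database product used in the study is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per checkin, plus one row
# per pattern that was identified in the checkin's log message.
conn.executescript("""
CREATE TABLE checkin (
    id          INTEGER PRIMARY KEY,
    project     TEXT NOT NULL,
    file        TEXT NOT NULL,
    programmer  TEXT NOT NULL,
    loc_added   INTEGER NOT NULL,
    loc_deleted INTEGER NOT NULL
);
CREATE TABLE checkin_pattern (
    checkin_id INTEGER NOT NULL REFERENCES checkin(id),
    pattern    TEXT NOT NULL
);
""")
# Invented sample data, not taken from the study.
conn.executemany(
    "INSERT INTO checkin VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "p1", "A.java", "ann", 91, 34),
        (2, "p1", "A.java", "bob", 16, 14),
        (3, "p2", "B.java", "ann", 10, 0),
    ],
)
conn.executemany(
    "INSERT INTO checkin_pattern VALUES (?, ?)",
    [(1, "factory"), (2, "factory"), (2, "singleton")],
)


def project_summary(connection):
    """Team size and number of different patterns per project."""
    return connection.execute("""
        SELECT c.project,
               COUNT(DISTINCT c.programmer) AS team,
               COUNT(DISTINCT cp.pattern)   AS patterns
        FROM checkin c
        LEFT JOIN checkin_pattern cp ON cp.checkin_id = c.id
        GROUP BY c.project
        ORDER BY c.project
    """).fetchall()


if __name__ == "__main__":
    print(project_summary(conn))
```

The LEFT JOIN keeps projects without any identified pattern in the result, with a pattern count of zero, which matters here because most projects fall into that group.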


Table 1: Excerpt of a CVS log for a file in a project

RCS file: /cvsroot/jboss/ejboss/src/java/org/ejboss/ejb/EnterpriseBeanWrapper.java,v
Working file: ejboss/src/java/org/ejboss/ejb/EnterpriseBeanWrapper.java
head: 1.23
branch:
locks: strict
access list:
symbolic names:
    EJBoss-1-0-PR2-A: 1.21
    EJBoss-1-0-PR1-A: 1.20
    EJBoss-1-0-DR2-A: 1.18
    AUTOCONF: 1.8.0.2
keyword substitution: kv
total revisions: 23; selected revisions: 23
description:
----------------------------
revision 1.23
date: 2000/06/01 00:51:24; author: sylvain; state: Exp; lines: +16 -14
Bug Fix : if a transaction-scoped method of a bean creates a new entity bean, we mustn't terminate the transaction after ejbPostCreate. We have to wait until the end of the encapsulating transaction.
----------------------------
... (lines omitted)
----------------------------
revision 1.2
date: 1999/10/28 23:35:21; author: fleury; state: Exp; lines: +91 -34
Bulk of the work, pass the invoke on to the right wrapper.
Use the wrapper manager for that. Also we now have
an empty constructor as we use the "Factory" pattern in


Figure 1: Entity-relationship diagram of the used database

3 The Data Set

The data set includes 1319 open source projects from Source Forge using Java as the programming language. The projects were downloaded between August and September 2001 and were selected by the following criteria: only projects that enabled CVS and that are non-empty, i.e. have files and checkins in the CVS repository. In total the projects contain 57771583 LOCs, to which 2164 different programmers contributed. Figure 2 and Figure 3 depict the distribution of the size of the projects by LOCs and by number of files. Many of the projects are rather small, with a mean of 43800 LOCs or 301 files, but there is a considerable number of bigger projects.

[Histogram: number of projects (y-axis, 0 to 1400) by LOC (x-axis, bins of 250000 from 0 to 5000000). Std. Dev = 175256.3, Mean = 43800, N = 1319]

Figure 2: Number of projects by size in LOC


[Histogram: number of projects (y-axis, 0 to 1400) by FILES (x-axis, bins of 2500 from 0 to 45000). Std. Dev = 1349.78, Mean = 301, N = 1319]

Figure 3: Number of projects by size in number of files

The team size of the projects shows a similar distribution with many projects with only a single programmer (see Figure 4). The mean of the team size is 1.87 and the biggest team is 88 programmers.

[Bar chart: number of projects (Count, 0 to 1000) by team size category TEAM_R (1, 2, 3, 4, 5, 6-10, 11-15, 16+)]

Figure 4: Number of projects by team size

Figure 5 shows the distribution of the projects by the development status given by Source Forge. The status ranges from 1 to 6 (1-planning, 2-pre-alpha, 3-alpha, 4-beta, 5-production/stable and 6-mature), giving an idea of where the project is in its life cycle. The numbers of projects for status 1 through 4 are similar, with around 280 projects per status. For status 5 and 6 there are fewer projects, indicating that most of the analyzed projects are still considered to be at the beginning of their life cycle.


[Histogram: number of projects (y-axis, 0 to 400) by STATUS (x-axis, 1 to 6). Std. Dev = 1.39, Mean = 3, N = 1319]

Figure 5: Number of projects by status

Finally in Table 2 the main statistics are summarized.

Table 2: Descriptive statistics of the main project variables

Variable             N      Minimum   Maximum    Mean       Std. Deviation
STATUS               1319   1         6          2.86       1.39
LOC                  1319   1.00      4283433    43799.53   175256.3306
FILES                1319   1.00      42674.00   301.1888   1349.7822
TEAM                 1319   1         88         1.87       3.34
Valid N (listwise)   1319

4 Results

In this section the results of the analysis are presented. First, results at the project level are given, followed by an analysis of the pattern usage for individual files ...

4.1 Pattern Usage in Projects

On Source Forge the development efforts are organized in projects as the basic unit of coordination. A project has one or several administrators who coordinate the development of the project and organize the cooperation between the programmers. Figure 6 shows the distribution of the number of different design patterns used in the projects. Most projects do not apply patterns, and only a small fraction of 9.7% (128 out of 1319 projects) use one or more patterns; the remaining 1191 projects use none.


[Histogram: number of projects (y-axis, 0 to 1400) by PATTERNS, the number of different patterns used (x-axis, 0 to 7). Std. Dev = .55, Mean = 0, N = 1319]

Figure 6: Projects by number of different patterns used

Next we analyze the relationship between the main project variables (development status, size of the project in LOCs and number of files, the team size and the number of different patterns used in the project). Table 3 gives the Pearson correlation between all variables. A high correlation (0.887) exists between the two measures of the project size, the LOCs and the number of files. This was to be expected. What is more interesting is that the second highest correlation was found between the team size and the number of different patterns used in the project (a correlation of 0.415). This correlation is even higher than the more intuitive correlation between the team size and the size of the project, with 0.351 for the size in LOCs and 0.279 for the size in files. It is also notable that the development status of the project has only very little correlation (<0.2) with the other variables.

Table 3: Correlation between main project variables (Pearson correlation)

            STATUS   LOC      FILES    TEAM     PATTERNS
STATUS      1.000    .154**   .122**   .102**   .117**
LOC         .154**   1.000    .887**   .351**   .182**
FILES       .122**   .887**   1.000    .279**   .153**
TEAM        .102**   .351**   .279**   1.000    .415**
PATTERNS    .117**   .182**   .153**   .415**   1.000

Sig. (2-tailed) = .000 for every pair of distinct variables; N = 1319 for every cell.
** Correlation is significant at the 0.01 level (2-tailed).
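For readers who want to reproduce this kind of table, the Pearson correlation coefficient can be computed as in the following minimal sketch. The function name `pearson` and the sample values are assumptions for illustration; the study itself used a standard statistical package.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented sample: team size and number of different patterns
    # for four hypothetical projects.
    team = [1, 1, 2, 5]
    patterns = [0, 0, 1, 2]
    print(round(pearson(team, patterns), 3))
```

Each cell of Table 3 corresponds to one such call on a pair of project-level variables.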


To explore the relationship between the team size and the usage of patterns further, Figure 7 shows a scatter plot of the mean number of different patterns by team size. The figure also includes an estimated linear model for the data. The constant of 0.01865 is not significant with a t-value of 1.174, but the gradient of 0.06875 is highly significant with a t-value of 16.542. The plot in Figure 7 clearly shows that the probability of using patterns increases with the size of the team. An explanation for this finding is that design patterns are, as stressed by several authors [cite], used mainly for efficient communication between developers. Bigger teams need more coordination and therefore more communication between the team members, and part of this communication is greatly improved by using patterns and pattern names to describe changes to the design and the code. However, the linear model depicted in Figure 7 has an R-square of only 0.172. The high proportion of variation not explained by the model suggests that there is at least one other important factor influencing the usage of patterns. This factor could, for example, be the level of expertise of the programmers: a more experienced programmer may be more likely to use design patterns even when programming alone.

[Scatter plot: PATTERNS (y-axis, -2 to 8) by TEAM (x-axis, 0 to 100), with the series Observed and Linear]

Figure 7: Linear regression of the mean number of patterns used by project depending on the team size
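The following sketch shows how a straight line of this kind and its R-square can be estimated by ordinary least squares. The function names `fit_line` and `predict` and the sample data are illustrative assumptions, not the procedure of the statistical package used for the paper. Note that for a model with a single predictor, R-square equals the squared Pearson correlation, which is consistent with the reported values (0.415 squared is approximately 0.172).

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-square: share of the variation in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return intercept, slope, r2


def predict(intercept, slope, x):
    """Value of the fitted line at x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Invented sample data lying exactly on y = 1 + 2x.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
    # Coefficients reported in the text, evaluated for a team of 10.
    print(predict(0.01865, 0.06875, 10))
```

With the coefficients reported above, the model predicts a mean of about 0.71 different patterns for a team of 10 programmers (0.01865 + 0.06875 * 10 = 0.70615).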

4.2 Pattern Usage by File

In this section we analyze the application of design patterns at the level of individual files. The main variables for this analysis are the number of programmers working on a file (performing one or more checkins on the file), the number of checkins for the file, the number of different patterns used in the file, the average number of LOCs the file size increases per checkin and the average number of LOCs changed per checkin.

The most important information in Table 4 for this analysis is that most files are only worked on by a single programmer (the mean is 1.12) and that only a very small portion of files contain a design pattern at all. In Table 5 the correlations between the variables are given. The relationship between the increase in LOC per checkin and the change of LOC per checkin, with a correlation of 0.545, is trivial. The next strongest relationship is between the number of programmers working on a file and the number of checkins for the file. This is also no surprise. All other correlations are rather small. There is some relationship between the number of different patterns used in the file and the number of programmers as well as the number of checkins, but with correlations lower than 0.2 it is weak.


Table 4: Descriptive statistics of the main variables

Variable             N        Minimum     Maximum    Mean       Std. Deviation
NUM_PROG             393350   1           32         1.12       .48
CHECKINS             393350   1           389        2.84       3.73
DIS_PATT             393350   0           4          4.79E-03   7.17E-02
I_LOC_PC             393350   -10974.40   35303.25   1.6919     93.3578
A_LOC_PC             393350   .00         38328.00   6.3533     147.3698
Valid N (listwise)   393350

Table 5: Correlation between main variables (Pearson correlation)

            NUM_PROG   CHECKINS   DIS_PATT   I_LOC_PC   A_LOC_PC
NUM_PROG    1.000      .441**     .099**     .008**     .031**
CHECKINS    .441**     1.000      .158**     .012**     .043**
DIS_PATT    .099**     .158**     1.000      .004*      .012**
I_LOC_PC    .008**     .012**     .004*      1.000      .545**
A_LOC_PC    .031**     .043**     .012**     .545**     1.000

Sig. (2-tailed) = .000 for every pair of distinct variables except DIS_PATT and I_LOC_PC (.010); N = 393350 for every cell.
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

4.3 Usage of Individual Patterns by Project

In this section differences in the application of the individual patterns are analyzed. Table 6 summarizes the main variables used for the analysis. In the 128 projects that used patterns, between 1 and 7 different patterns were used. We added up the number of different patterns for each project and found that there are 194 pattern applications in the data set. The number of programmers working together on one pattern in a project (programmers mentioning the same pattern name) is between 1 and 10 with a mean of 1.54. We also analyzed the number of files changed per pattern, the LOCs added and deleted and the increase of LOCs due to the pattern.


Table 6: Descriptive statistics for the main variables used to analyze patterns by projects

Variable             N     Minimum     Maximum    Mean       Std. Deviation
NUM_PROG             194   1           10         1.54       1.27
FILES                194   1.00        210.00     9.7216     19.3590
ADD_LOC              194   .00         42426.00   244.8433   3044.8692
DEL_LOC              194   .00         57162.00   308.3147   4103.3164
INC_LOC              194   -14736.00   491.00     -63.4714   1059.8724
Valid N (listwise)   194

In the following we will analyze several aspects (usage, number of programmers co-operating, LOCs changed and number of files involved) of the individual patterns. However, since the data set is not very large, we will only point out the most striking results for the patterns with the biggest deviation from the norm.

Figure 8 shows the number of projects each pattern was applied to. The pattern names Command and State appear very often in the data. Since both words are very common in programming and design, this could indicate a problem in our way of identifying patterns by their names and some keywords from the intents. We have to keep this observation in mind for the remainder of this section. However, Singleton appears in third place, a very unusual word for programmers not familiar with design patterns. This suggests that, for Java, applying the Singleton pattern provides important design advantages.

[Bar chart: Count of projects (x-axis, 0 to 60) by PATTERN, with the axis labels Abstract, Adapter, Bridge, Builder, Chain of, Command, Composite, Decorator, Facade, Factory M, Flyweight, Interpret, Iterator, Memento, Observer, Prototype, Proxy, Singleton, State, Strategy, Template, Visitor]

Figure 8: Number of projects using individual patterns

Figure 9 depicts the mean of the number of programmers working on one pattern in the same project. It seems interesting that within the four patterns with the highest average of different programmers three (Adapter, Composite, Proxy) are so called structural patterns concerned with how to compose classes and objects to form larger structures in a flexible way (see [GHJV95, pp.137]). This structural aspect seems to explain or at least support this finding.


[Bar chart: Mean NUM_PROG (x-axis, 0.8 to 2.4) by PATTERN, one bar per pattern]

Figure 9: Number of programmers per pattern use

Figure 10 shows the median of the increase in LOCs for the files when a pattern was mentioned. The most striking increase was observed for pattern Chain of Responsibility which has the aim to decouple sender and receiver of a request by passing down a request along a chain of objects until one can handle it. This means that several objects have to be extended by the methods to pass on requests, which ends as observed here, in many lines of very simple code. However, we only have very few observations of this pattern in the data set, which makes this conclusion very tentative. Interesting is also the decrease of LOCs for the Abstract Factory and the Strategy patterns. Both patterns can be used to consolidate related functionality (referred to as ‘family of ...’ in [GHJV95]) into one place. This approach can result in more flexible and reduced code as observed here.

[Bar chart: Med INC_LOC (x-axis, -20 to 80) by PATTERN, one bar per pattern]

Figure 10: Median of LOC increase per checked in file over all projects

Figure 11 depicts the LOCs changed (and added) for the different patterns. Almost all patterns are similar in this respect. Only the pattern Chain of Responsibility needs significantly more changes than the other patterns, which can be explained with the same arguments used above.


[Bar chart: Med ADD_LOC (x-axis, 0 to 200) by PATTERN, one bar per pattern]

Figure 11: Mean of the LOC changed and added per checked in file over all projects

The last analysis in this section is the analysis of the median number of files changed for different patterns in Figure 12. The largest number of files are changed for the implementation of the Template Method pattern. This pattern defers certain steps of an algorithm to subclasses. Classes are in Java normally stored in separate files which can explain the higher number of files observed for this pattern.

[Bar chart: Med FILES (x-axis, 0 to 20) by PATTERN, one bar per pattern]

Figure 12: Median of the number of files changed per pattern use over all projects

4.4 Pattern Usage by Programmer

In this section we use the programmer as the main unit of analysis. We analyze how many patterns each programmer uses and compare this with the programmer's experience and productivity. As the measure of programming experience we use the number of projects the programmer contributes to, and as the measure of productivity we use the increase of LOCs the programmer produced. Figure 13 and Figure 14 show the distribution of these two variables in the population of programmers. Most programmers only work on one project and only very few programmers increase the size of the projects measured in LOCs.


[Histogram: number of programmers (y-axis, 0 to 3000) by PROJECTS (x-axis, 1.0 to 7.0). Std. Dev = .48, Mean = 1.1, N = 2164]

Figure 13: Number of projects a programmer contributes to

[Histogram: number of programmers (y-axis, 0 to 3000) by INC_LOC (x-axis, -110000 to 170000). Std. Dev = 8181.71, Mean = 1641.4, N = 2164]

Figure 14: Histogram of the total increase of lines of code by programmer.

We analyzed the relationship of the variables with simple correlations in Table 7. The correlations were rather small. For the relationship between the measure of productivity (increase in LOCs) and the usage of patterns a correlation of 0.245 was found. This relationship is visualized in Figure 15 by plotting the median of the increase in LOCs by programmer by the number of different patterns a programmer uses. There is an upward trend


Table 7: Pearson correlation between the main variables

            PROJECTS   INC_LOC   DIS_PATT
PROJECTS    1.000      .092**    .186**
INC_LOC     .092**     1.000     .245**
DIS_PATT    .186**     .245**    1.000

Sig. (2-tailed) = .000 for every pair of distinct variables; N = 2164 for every cell.
** Correlation is significant at the 0.01 level (2-tailed).

[Scatter plot: DIS_PATT (y-axis, -1 to 7) by INC_LOC (x-axis, logarithmic scale from 1 to 100000)]

Figure 15: Scatter plot of the different patterns used by a programmer and the productivity (increase in LOCs on a logarithmic scale).


[Bar chart: Mean DIS_PATT (y-axis, 0.0 to 1.0) by NTILES of INC_LOC (x-axis, 1 to 20)]

Figure 16: Mean of the different patterns used by productivity (recoded into 20 intervals with the same number of observations; 4 equals an increase of 0 LOCs)

5 Conclusion

6 References

[BMRS96] Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal. Pattern-Oriented Software Architecture, A System of Patterns. John Wiley & Sons Ltd, Chichester, England, 1996.

[CoSc95] James O. Coplien and Douglas C. Schmidt, editors. Pattern Languages of Program Design. Addison-Wesley, Reading, MA, 1995.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series. Addison-Wesley Publishing Company, New York, NY, 1995.

[Foge99] K. Fogel. Open Source Development with CVS. Coriolis Open Press, Scottsdale, AZ, 1999.

[KoSc02] Stefan Koch and Georg Schneider. Effort, co-operation and co-ordination in an open source software project: GNOME. Information Systems Journal, 12, pp. 27-42, 2002.

[MaRB98] Robert C. Martin, Dirk Riehle, and Frank Buschmann, editors. Pattern Languages of Program Design 3. Addison-Wesley, Reading, MA, 1998.

[MoFH00] A. Mockus, R. Fielding, and J. Herbsleb. A case study of open source software development: the Apache server. In: Proc. of the 22nd Intl. Conf. on Software Engineering, pp. 263-272, 2000.

[Vlis98] John Vlissides. Pattern Hatching: Design Patterns Applied. Software Patterns Series. Addison-Wesley, New York, 1998.

[VlCK96] John M. Vlissides, James O. Coplien, and Norman L. Kerth, editors. Pattern Languages of Program Design 2. Addison-Wesley, Reading, MA, 1996.

7 Appendix

The design patterns used for the analysis are taken from Gamma et al. [GHJV95] and are listed here with their intents.

Keywords extracted from the intents are printed in italics. These words were first stemmed (by removing the endings) and then used to identify the design patterns.
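The stemming and matching step can be illustrated with a minimal sketch. Everything specific in it is an assumption: the suffix list, the keyword lists, the sample message, and the function names `stem` and `patterns_in` are hypothetical and only show the general idea of comparing stemmed keywords with stemmed words of a text such as a CVS log message. The study's actual stemmer and keyword set are not reproduced here.

```python
import re

# Assumed word endings to strip; the paper does not list the endings it removed.
SUFFIXES = ("ing", "ion", "ed", "es", "s")


def stem(word):
    """Crude stemmer: lower-case the word and strip one known ending,
    keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical keyword lists; the real keywords come from the pattern intents.
KEYWORDS = {
    "Singleton": ["singleton"],
    "Observer": ["observer"],
    "Decorator": ["decorators"],
}


def patterns_in(message, keywords=KEYWORDS):
    """Return the sorted names of all patterns with a stemmed keyword in the message."""
    stems = {stem(token) for token in re.findall(r"[A-Za-z]+", message)}
    return sorted(
        name
        for name, words in keywords.items()
        if any(stem(w) in stems for w in words)
    )


# Invented log message, for illustration only.
message = "Added observers for the model; refactored Singleton access"
print(patterns_in(message))
```

Stemming both the keywords and the text lets inflected forms such as "observers" match the keyword "observer". A real analysis would likely use an established stemming algorithm rather than this simple suffix list.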

The design patterns are organized into the three groups used by Gamma et al.: creational, structural, and behavioral patterns.

7.1 Creational Patterns

Abstract Factory
Provide an interface for creating families of related or dependent objects without specifying their concrete classes.

Builder
Separate the construction of a complex object from its representation so that the same construction process can create different representations.

Factory Method
Define an interface for creating an object, but let subclasses decide which class to instantiate. Factory Method lets a class defer instantiation to subclasses.

Prototype
Specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype.

Singleton
Ensure a class only has one instance, and provide a global point of access to it.

7.2 Structural Patterns

Adapter
Convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn't otherwise because of incompatible interfaces.

Bridge
Decouple an abstraction from its implementation so that the two can vary independently.

Composite
Compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat individual objects and compositions of objects uniformly.

Decorator
Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality.

Facade
Provide a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that makes the subsystem easier to use.

Flyweight
Use sharing to support large numbers of fine-grained objects efficiently.

Proxy
Provide a surrogate or placeholder for another object to control access to it.

7.3 Behavioral Patterns

Chain of Responsibility
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it.

Command
Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

Interpreter
Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.

Iterator
Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

Mediator
Define an object that encapsulates how a set of objects interact. Mediator promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.

Memento
Without violating encapsulation, capture and externalize an object's internal state so that the object can be restored to this state later.

Observer
Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

State
Allow an object to alter its behavior when its internal state changes. The object will appear to change its class.

Strategy
Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it.

Template Method
Define the skeleton of an algorithm in an operation, deferring some steps to subclasses. Template Method lets subclasses redefine certain steps of an algorithm without changing the algorithm's structure.

Visitor
Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.
