10
Research Article Process Correlation Analysis Model for Process Improvement Identification Su-jin Choi, 1 Dae-Kyoo Kim, 2 and Sooyong Park 1 1 Sogang University, Seoul, Republic of Korea 2 Oakland University, Rochester’ MI 48309, USA Correspondence should be addressed to Dae-Kyoo Kim; [email protected] Received 28 December 2013; Accepted 25 February 2014; Published 24 March 2014 Academic Editors: S. K. Bhatia and P. Muller Copyright © 2014 Su-jin Choi et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Soſtware process improvement aims at improving the development process of soſtware systems. It is initiated by process assessment identifying strengths and weaknesses and based on the findings, improvement plans are developed. In general, a process reference model (e.g., CMMI) is used throughout the process of soſtware process improvement as the base. CMMI defines a set of process areas involved in soſtware development and what to be carried out in process areas in terms of goals and practices. Process areas and their elements (goals and practices) are oſten correlated due to the iterative nature of soſtware development process. However, in the current practice, correlations of process elements are oſten overlooked in the development of an improvement plan, which diminishes the efficiency of the plan. is is mainly attributed to significant efforts and the lack of required expertise. In this paper, we present a process correlation analysis model that helps identify correlations of process elements from the results of process assessment. is model is defined based on CMMI and empirical data of improvement practices. We evaluate the model using industrial data. 1. Introduction Soſtware process improvement (SPI) is carried out to improve the efficiency of soſtware development process by analyzing involved practices and their relations in consideration of available resources in an organization. SPI is initiated by assessing the current process to identify strengths and weak- nesses. Based on findings, improvement plans are developed to reinforce strengths and improve weaknesses. In general, a reference model is used throughout SPI as the base. A widely used model is Capability Maturity Model Integration (CMMI) [1] which has shown its impact on product quality, development cost, and time-to-market across the industry [24]. CMMI provides a structured process assessment framework defined upon a set of process areas (PAs). 22 PAs (e.g., project planning and requirement definition) are defined each describing specific goals to be achieved and specific practices to be carried out. PAs and their components (goals and practices) are highly correlated, which reflects the iterative nature of soſtware development process [5]. ese correlations capture dependencies among PAs and components which should be considered in improve- ment planning [6, 7]. However, the current practice focuses only on individual PAs and oſten overlooks correlations of PAs, which dimin- ishes the efficiency of improvement plans. is is mainly attributed to significant efforts and the lack of required exper- tise. ere exist a few studies on identifying correlations of improvement items (e.g., CMMI practices and improvement agendas) and improvement planning [710]. e existing work, however, largely relies on analyst’s expertise and man- ual work which make it difficult for the less experienced to practice. Some works (e.g., [7, 8, 10]) propose manual approaches for reviewing and relating process items and some others (e.g., [9]) present an expert-friendly template for describing improvement agendas. In this paper, we present a process correlation analysis model for identifying correlations of PAs, goals, and practices based on CMMI and empirical data collected from SPI projects where the analysis of findings’ correlations have been done. CMMI is used to infer correlations of PAs and their components and the inferred correlations are affirmed Hindawi Publishing Corporation e Scientific World Journal Volume 2014, Article ID 104072, 10 pages http://dx.doi.org/10.1155/2014/104072

Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

Research ArticleProcess Correlation Analysis Model forProcess Improvement Identification

Su-jin Choi1 Dae-Kyoo Kim2 and Sooyong Park1

1 Sogang University Seoul Republic of Korea2Oakland University Rochesterrsquo MI 48309 USA

Correspondence should be addressed to Dae-Kyoo Kim kim2oaklandedu

Received 28 December 2013 Accepted 25 February 2014 Published 24 March 2014

Academic Editors S K Bhatia and P Muller

Copyright copy 2014 Su-jin Choi et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Software process improvement aims at improving the development process of software systems It is initiated by process assessmentidentifying strengths and weaknesses and based on the findings improvement plans are developed In general a process referencemodel (eg CMMI) is used throughout the process of software process improvement as the base CMMI defines a set of processareas involved in software development and what to be carried out in process areas in terms of goals and practices Process areasand their elements (goals and practices) are often correlated due to the iterative nature of software development process Howeverin the current practice correlations of process elements are often overlooked in the development of an improvement plan whichdiminishes the efficiency of the planThis is mainly attributed to significant efforts and the lack of required expertise In this paperwe present a process correlation analysis model that helps identify correlations of process elements from the results of processassessment This model is defined based on CMMI and empirical data of improvement practices We evaluate the model usingindustrial data

1 Introduction

Software process improvement (SPI) is carried out to improvethe efficiency of software development process by analyzinginvolved practices and their relations in consideration ofavailable resources in an organization SPI is initiated byassessing the current process to identify strengths and weak-nesses Based on findings improvement plans are developedto reinforce strengths and improve weaknesses

In general a reference model is used throughout SPIas the base A widely used model is Capability MaturityModel Integration (CMMI) [1] which has shown its impacton product quality development cost and time-to-marketacross the industry [2ndash4] CMMI provides a structuredprocess assessment framework defined upon a set of processareas (PAs) 22 PAs (eg project planning and requirementdefinition) are defined each describing specific goals to beachieved and specific practices to be carried out PAs andtheir components (goals and practices) are highly correlatedwhich reflects the iterative nature of software developmentprocess [5] These correlations capture dependencies among

PAs and componentswhich should be considered in improve-ment planning [6 7]

However the current practice focuses only on individualPAs and often overlooks correlations of PAs which dimin-ishes the efficiency of improvement plans This is mainlyattributed to significant efforts and the lack of required exper-tise There exist a few studies on identifying correlations ofimprovement items (eg CMMI practices and improvementagendas) and improvement planning [7ndash10] The existingwork however largely relies on analystrsquos expertise and man-ual work which make it difficult for the less experiencedto practice Some works (eg [7 8 10]) propose manualapproaches for reviewing and relating process items andsome others (eg [9]) present an expert-friendly template fordescribing improvement agendas

In this paper we present a process correlation analysismodel for identifying correlations of PAs goals and practicesbased on CMMI and empirical data collected from SPIprojects where the analysis of findingsrsquo correlations havebeen done CMMI is used to infer correlations of PAs andtheir components and the inferred correlations are affirmed

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 104072 10 pageshttpdxdoiorg1011552014104072

2 The Scientific World Journal

iPCG

mPCGmPCG

ePCGePCG

iPCG

CMMI

Empiricaldata

Building

Building

Building

Figure 1 Process correlation model

and complemented using empirical data The model takesas input the findings of process assessment and identifiestheir correlations by analyzing the findings to correspondingpractices correlated in CMMI and empirical data The modelthen produces a graph representing correlations of findingsWe evaluate the model in terms of precision recall F-measure and accuracy using five different industrial SPIprojects We also demonstrate tool support for the presentedmodel In our previous work we presented ReMo a modelfor developing improvement recommendations which usemanual correlation analysis This work complements theprevious work by providing a systematic way of identifyingprocess correlations

The paper is structured as follows Section 2 gives anoverview of related work Section 3 details the proposedprocess correlation analysis model and its supporting toolSection 4 presents the evaluation results using industrial dataand Section 5 concludes the paper with future work

2 Related Work

There is some work on identifying improvements using a cer-tain type of relationships Gorscheck andWohlin [8] proposeDAIIPS a method for packaging improvement issues by pri-oritizing them based on analysis of their dependencies Thismethod is designed for small organizations where resourcesfor addressing improvement issues are limited The degreeof dependency between improvement issues is decided byvote in a workshop That is the decision on dependencyis greatly influenced by the level of participantsrsquo expertiseMoreover they do not provide criteria for the qualification ofworkshop attendees and guidelines for making dependencydecisions Such a manual process is time consuming andrequires significant efforts

Calvo-Manzano Villalon et al [9] present an improve-ment specification template calledAction Package for describ-ing organizational technical and management aspects ofprocess improvements The template consists of twelve itemsincluding policy training and metrics However the tem-plate is designed for experts providing little details as to howthe items should be filled out This makes it difficult for theless experienced to use the template

Sun and Liu [10] present a CMMI-based method forrelating improvement issues to CMMI practices using the

Quality Function Deployment technique [11] Similar toGorscheck and Wolinrsquos work they prioritize improvementissues by the number of relations to practices The higher thenumber of relations the higher the priority However themethod does not describe how identified relations may beused

Chen et al [7] present a practice dependency modelfor CMMI using six process areas at level 2 Dependenciesbetween practices are identified via the flow of work productsbetween practices based on a textual analysis of the CMMIspecification Their work is similar to our work in that ourwork also uses CMMI However the specification of CMMIinvolves many ambiguities which limits the extent to whichCMMI alone may provide information about correlationsTo address this we make use of empirical field data fromindustry in addition to CMMI

3 Correlation Analysis Model

In this section we describe a process correlation analysismodel that identifies correlations from a given assessmentfinding set The model is built upon CMMI and empiricalfield data CMMI is used as a base for identifying an initialset of common correlations of practices from its descriptionson PAs and their components (eg goals and practices)Identified correlations are represented as a graph The graphis referred to as mPCG which denotes a process correlationgraph (PCG) for the considered model (CMMI) Howeverthe description in CMMI involves ambiguities and incon-sistencies due to its general nature To address this we useempirical data collected from various SPI projects in additiontoCMMI Correlations are identified from empirical data andrepresented as a graph referred to as ePCGmPCG and ePCGare then combined to produce an integrated graph iPCGFigure 1 illustrates the process of the model Henceforth weuse terms correlation and relation interchangeably

31 BuildingmPCG WeuseCMMI as the base for identifyingpractice correlations CMMIdescribes a PA in terms of119866119900119886119897119904119875119903119886119888119905119894119888119890119904 and 119878119906119887119901119903119886119888119905119894119888119890119904 Goalsmay be generic or specificA specific goal is achieved by a set of specific practicesproducing certain work products A specific practice mayconsist of a set of subpractices The specific goals practicesand subpractices of a PA may reference the components of

The Scientific World Journal 3

Table1Processa

reae

xamples

Processa

rea

Goal

Practic

eDescriptio

nRe

ferenceinformation

Requ

irementm

anagem

ent

(REQ

M)

a SG1

Manager

equirements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmanagingcorrectiv

eactionto

closure

b SP12

Obtaincommitm

enttorequ

irements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmon

itorin

gcommitm

ents

SP14

Maintainbidirectionaltraceabilityof

requ

irements

Notd

efined

Projectp

lann

ing(PP)

SG1

Establish

estim

ates

Notd

efined

SP11

Estim

atethe

scop

eofthe

project

RefertotheS

upplier

Agreem

entM

anagem

ent

processa

rea

SG2

Develo

pap

rojectplan

Notd

efined

SP22

Identifyprojectrisk

sRe

fertotheM

onito

rProjec

tRisk

sspecific

practic

ein

theP

rojec

tPlann

ingprocessa

reafor

more

inform

ationabou

trisk

mon

itorin

gactiv

ities

Projectm

onito

ringandcontrol

(PMC)

SG1

Mon

itorthe

projectagainstthep

lan

Notd

efined

SP12

Mon

itorc

ommitm

ents

Notd

efined

SP13

Mon

itorp

rojectris

ksRe

fertotheP

rojec

tPlann

ingprocessa

reafor

moreinformationabou

tidentify

ingprojectrisk

s

SG2

Managec

orrectivea

ctionto

closure

Notd

efined

SP21

Analyze

issues

RefertotheE

stablish

theB

udgeta

ndSchedu

lespecificp

racticeintheP

rojec

tPlann

ingprocess

area

form

oreinformationabou

tcorrectivea

ction

criteria

a Specific

goalbspecificp

ractice

4 The Scientific World Journal

PracticeProcessarea

Work product

A A

BB

(a) IPC

Reference

D D

C C

(b) EPC

Figure 2 Identifying practice correlations in CMMI

Goals

Processarea ldquoArdquo

Processarea ldquoBrdquo

Goals

Practices Practices

Subpractices

Goal levelPractice level

Reference information

Figure 3 Structure of process areas in CMMI

another PA Generic goals are shared by PAs and thus thereis no reference for generic goals

Given the PAdescriptions inCMMIwe look for informa-tion about internal practice correlations (IPCs) and externalpractice correlations (EPCs) IPCs exist within the sameprocess area while EPCs cut across process areas To identifyIPCs we make use of the description of practices andsubpractices and their related work products Specifically wefocus on relationships of input and output work productsamong practices Two internal practices are considered asinternally correlated if one practice has output work productsthat are used as input work products of the other Figure 2(a)shows an IPC EPCs are identified based on the descriptionof related process areas and external references Two externalpractices are considered as externally correlated if one prac-tice refers to the other which is shown in Figure 2(b)

Goals practices and subpractices are defined at differentlevels as shown in Figure 3 The goal level is the highestfollowed by the practice level and then the subpractice levelWe consider two components (eg goals) as correlated ifthey reference each other A component referencing anothercomponent at a lower level is considered as related to thatspecific component only On the other hand a componentreferencing another component at a higher level is consideredas related to all the subcomponents of the component TwoPAs are considered as correlated if a component of one PA

refers to a component of the other or one PA refers to theother PA in its description

Table 1 shows examples of PAs and a subset of theircomponents defined in CMMI The table describes threePAsmdashRequirement Management (REQM) Project MonitoringandControl (PMC) andProject Planning (PP)REQM has onespecific goal (SG 1) with two specific practices (SP 11 and SP12) Other PAs can be explained similarly A component maybe accompanied with reference information For instance SG1 in REQM has the following reference description ldquoReferto the Project Monitoring and Control process area for moreinformation about managing corrective action to closurerdquoFrom this description one can obviously infer a relation ofREQM SG 1 to the SG 2 goal of PMC which is an example of agoal-level reference Similarly from the reference descriptionof REQM SP 12 it can be easily inferred that REQM SP 12 isrelated to the SP 12 practice in PMC

Based on elicited correlations of PAs and their compo-nents a mPCG is built A mPCG is a nondirected graphcapturing correlations of practices where a node representsa practice and an edge represents a correlation An edgebetween nodes 119901

119898and 119901

119899has a weight denoted by119882

119898(119901119898119901119899)

All edges in a mPCG have their weight 1 denoting theexistence of a correlation Correlations may be found withinthe same PA or across PAs Figure 4 shows an example of amPCG

Note that CMMI descriptions are not always explicitFor instance subpractices are described mostly about outputwork products while having little description on input prod-ucts Goal-level references across PAs are described abstractlyproviding little information on how they influence relatedpractices CMMI descriptions on process areas also involveambiguities

32 Building ePCG To address the above we make use ofempirical data collected from a set of industrial SPI projectsin addition to CMMI The data is postproject data includingfindings and their correlations are already analyzed and usedin completed projects Postproject findings are often taggedwith relevant CMMI practices Accordingly we assume thatthey are all tagged in this work

Each project is analyzed to identify correlated practiceswhich are captured in a PCG PCGs of all the consideredprojects are combined to produce an ePCG The resultingePCG is then merged with the mPCG of CMMI However

The Scientific World Journal 5

REQM

PP

PMC

10

1010 10

10

1010

11 22

12 12 13

21

14

11

Process area

Practice

Correlated

Figure 4 mPCG example

Findingof practice C

Finding level Practice level

C998400 C

DD998400

(a) Case 1

Practice levelFinding level

C998400 C

DD998400 D998400998400

(b) Case 2

Practice levelFinding level

C998400 C

DD998400 D998400998400998400

(c) Case 3

Figure 5 Correspondence between findings and practices

unlike CMMI where correlations are described for practicesempirical data are described for findings which are imple-mentations of practices This is because an SPI project is animplementation of CMMI Given that findings in empiricaldata should be abstracted to practices to align the level ofabstraction of ePCGs with the mPCG of CMMI Abstractionis carried out as follows

(R1) For each finding identify the corresponding practicein CMMI

(R2) Two practices are considered as correlated if a findingof the practice is related to a finding of the otherpractice This is illustrated in Figure 5(a) The sameholds regardless the number of instances (findings)of a practice Figures 5(b) and 5(c) show the caseswhere a practice has multiple instances with differentrelations Both cases result in the same abstraction asFigure 5(a) per R3

We demonstrate abstraction using the findings in Table 2The table shows four findings REQM-W-01 PP-W-01 PP-W-02 and PP-S-01 where REQM-W-01 is an instance ofthe practice REQM SP 14 in Table 1 and PP-W-01 PP-W-02 and PP-S-01 are instances of PP SP 11 From thedescription of the findings REQM-W-01 and PP-W-01 arefound related since requirement traceability needs to bemaintained for development tasks andwork products definedin work breakdown structure Similarly REQM-W-01 and PP-S-01 are found related since PP-S-01 is a strength practiceof defining development tasks and work products Giventhis relation the corresponding practices REQM SP 14 and

PP SP 11 to these findings are considered also as relatedOn the other hand REQM-W-01 and PP-W-02 are foundnot related because COTS products are already made Thusrequirements traceability to development tasks and workproducts is not necessary However their correspondingpractices are considered as related as they have already beenso for other pairs

PCGs of SPI projects are merged by combining nodesand edges to create an ePCG for the considered project setEach edge is weighted by the number of projects having theedge identified in their PCG Let |119875

11990111199012| be the number of the

projects whose PCG has practices 1199011 and 1199012 correlated and|11987511990111199012| the number of the projects whose PCG has 1199011 and 1199012

identified as correlatedThen the weight119882119890(119901119898119901119899)of the edge

between 119901119898and 119901

119899is defined as follows

119882119890(119901119898119901119899)=

100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816

(0 le 119882119890(119901119898119901119899)le 1) (1)

As an example consider Figure 6 In the figure there arethree projects ProjetX ProjectY and ProjectZ where the PCGof ProjetX has one related practice pair ((PP 11 REQM 14))the PCGof ProjectY has two related pairs ((PP 11 REQM 14)(PP 11 REQM 11)) and the PCG of ProjectZ has one relatedpair ((REQM 14 REQM 11)) The practices are the same asthose in Figure 4 PCGs of the projects are merged into anePCG by adding all the nodes (PP 11 REQM 14 REQM 11)and their relations involved in the PCGs The weight of theedge between PP 11 and REQM 11 is measured at 05 (12)as the number of the projects that involve PP 11 and REQM

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 2: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

2 The Scientific World Journal

iPCG

mPCGmPCG

ePCGePCG

iPCG

CMMI

Empiricaldata

Building

Building

Building

Figure 1 Process correlation model

and complemented using empirical data The model takesas input the findings of process assessment and identifiestheir correlations by analyzing the findings to correspondingpractices correlated in CMMI and empirical data The modelthen produces a graph representing correlations of findingsWe evaluate the model in terms of precision recall F-measure and accuracy using five different industrial SPIprojects We also demonstrate tool support for the presentedmodel In our previous work we presented ReMo a modelfor developing improvement recommendations which usemanual correlation analysis This work complements theprevious work by providing a systematic way of identifyingprocess correlations

The paper is structured as follows Section 2 gives anoverview of related work Section 3 details the proposedprocess correlation analysis model and its supporting toolSection 4 presents the evaluation results using industrial dataand Section 5 concludes the paper with future work

2 Related Work

There is some work on identifying improvements using a cer-tain type of relationships Gorscheck andWohlin [8] proposeDAIIPS a method for packaging improvement issues by pri-oritizing them based on analysis of their dependencies Thismethod is designed for small organizations where resourcesfor addressing improvement issues are limited The degreeof dependency between improvement issues is decided byvote in a workshop That is the decision on dependencyis greatly influenced by the level of participantsrsquo expertiseMoreover they do not provide criteria for the qualification ofworkshop attendees and guidelines for making dependencydecisions Such a manual process is time consuming andrequires significant efforts

Calvo-Manzano Villalon et al [9] present an improve-ment specification template calledAction Package for describ-ing organizational technical and management aspects ofprocess improvements The template consists of twelve itemsincluding policy training and metrics However the tem-plate is designed for experts providing little details as to howthe items should be filled out This makes it difficult for theless experienced to use the template

Sun and Liu [10] present a CMMI-based method forrelating improvement issues to CMMI practices using the

Quality Function Deployment technique [11] Similar toGorscheck and Wolinrsquos work they prioritize improvementissues by the number of relations to practices The higher thenumber of relations the higher the priority However themethod does not describe how identified relations may beused

Chen et al [7] present a practice dependency modelfor CMMI using six process areas at level 2 Dependenciesbetween practices are identified via the flow of work productsbetween practices based on a textual analysis of the CMMIspecification Their work is similar to our work in that ourwork also uses CMMI However the specification of CMMIinvolves many ambiguities which limits the extent to whichCMMI alone may provide information about correlationsTo address this we make use of empirical field data fromindustry in addition to CMMI

3 Correlation Analysis Model

In this section we describe a process correlation analysismodel that identifies correlations from a given assessmentfinding set The model is built upon CMMI and empiricalfield data CMMI is used as a base for identifying an initialset of common correlations of practices from its descriptionson PAs and their components (eg goals and practices)Identified correlations are represented as a graph The graphis referred to as mPCG which denotes a process correlationgraph (PCG) for the considered model (CMMI) Howeverthe description in CMMI involves ambiguities and incon-sistencies due to its general nature To address this we useempirical data collected from various SPI projects in additiontoCMMI Correlations are identified from empirical data andrepresented as a graph referred to as ePCGmPCG and ePCGare then combined to produce an integrated graph iPCGFigure 1 illustrates the process of the model Henceforth weuse terms correlation and relation interchangeably

31 BuildingmPCG WeuseCMMI as the base for identifyingpractice correlations CMMIdescribes a PA in terms of119866119900119886119897119904119875119903119886119888119905119894119888119890119904 and 119878119906119887119901119903119886119888119905119894119888119890119904 Goalsmay be generic or specificA specific goal is achieved by a set of specific practicesproducing certain work products A specific practice mayconsist of a set of subpractices The specific goals practicesand subpractices of a PA may reference the components of

The Scientific World Journal 3

Table1Processa

reae

xamples

Processa

rea

Goal

Practic

eDescriptio

nRe

ferenceinformation

Requ

irementm

anagem

ent

(REQ

M)

a SG1

Manager

equirements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmanagingcorrectiv

eactionto

closure

b SP12

Obtaincommitm

enttorequ

irements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmon

itorin

gcommitm

ents

SP14

Maintainbidirectionaltraceabilityof

requ

irements

Notd

efined

Projectp

lann

ing(PP)

SG1

Establish

estim

ates

Notd

efined

SP11

Estim

atethe

scop

eofthe

project

RefertotheS

upplier

Agreem

entM

anagem

ent

processa

rea

SG2

Develo

pap

rojectplan

Notd

efined

SP22

Identifyprojectrisk

sRe

fertotheM

onito

rProjec

tRisk

sspecific

practic

ein

theP

rojec

tPlann

ingprocessa

reafor

more

inform

ationabou

trisk

mon

itorin

gactiv

ities

Projectm

onito

ringandcontrol

(PMC)

SG1

Mon

itorthe

projectagainstthep

lan

Notd

efined

SP12

Mon

itorc

ommitm

ents

Notd

efined

SP13

Mon

itorp

rojectris

ksRe

fertotheP

rojec

tPlann

ingprocessa

reafor

moreinformationabou

tidentify

ingprojectrisk

s

SG2

Managec

orrectivea

ctionto

closure

Notd

efined

SP21

Analyze

issues

RefertotheE

stablish

theB

udgeta

ndSchedu

lespecificp

racticeintheP

rojec

tPlann

ingprocess

area

form

oreinformationabou

tcorrectivea

ction

criteria

a Specific

goalbspecificp

ractice

4 The Scientific World Journal

PracticeProcessarea

Work product

A A

BB

(a) IPC

Reference

D D

C C

(b) EPC

Figure 2 Identifying practice correlations in CMMI

Goals

Processarea ldquoArdquo

Processarea ldquoBrdquo

Goals

Practices Practices

Subpractices

Goal levelPractice level

Reference information

Figure 3 Structure of process areas in CMMI

another PA Generic goals are shared by PAs and thus thereis no reference for generic goals

Given the PAdescriptions inCMMIwe look for informa-tion about internal practice correlations (IPCs) and externalpractice correlations (EPCs) IPCs exist within the sameprocess area while EPCs cut across process areas To identifyIPCs we make use of the description of practices andsubpractices and their related work products Specifically wefocus on relationships of input and output work productsamong practices Two internal practices are considered asinternally correlated if one practice has output work productsthat are used as input work products of the other Figure 2(a)shows an IPC EPCs are identified based on the descriptionof related process areas and external references Two externalpractices are considered as externally correlated if one prac-tice refers to the other which is shown in Figure 2(b)

Goals practices and subpractices are defined at differentlevels as shown in Figure 3 The goal level is the highestfollowed by the practice level and then the subpractice levelWe consider two components (eg goals) as correlated ifthey reference each other A component referencing anothercomponent at a lower level is considered as related to thatspecific component only On the other hand a componentreferencing another component at a higher level is consideredas related to all the subcomponents of the component TwoPAs are considered as correlated if a component of one PA

refers to a component of the other or one PA refers to theother PA in its description

Table 1 shows examples of PAs and a subset of theircomponents defined in CMMI The table describes threePAsmdashRequirement Management (REQM) Project MonitoringandControl (PMC) andProject Planning (PP)REQM has onespecific goal (SG 1) with two specific practices (SP 11 and SP12) Other PAs can be explained similarly A component maybe accompanied with reference information For instance SG1 in REQM has the following reference description ldquoReferto the Project Monitoring and Control process area for moreinformation about managing corrective action to closurerdquoFrom this description one can obviously infer a relation ofREQM SG 1 to the SG 2 goal of PMC which is an example of agoal-level reference Similarly from the reference descriptionof REQM SP 12 it can be easily inferred that REQM SP 12 isrelated to the SP 12 practice in PMC

Based on elicited correlations of PAs and their compo-nents a mPCG is built A mPCG is a nondirected graphcapturing correlations of practices where a node representsa practice and an edge represents a correlation An edgebetween nodes 119901

119898and 119901

119899has a weight denoted by119882

119898(119901119898119901119899)

All edges in a mPCG have their weight 1 denoting theexistence of a correlation Correlations may be found withinthe same PA or across PAs Figure 4 shows an example of amPCG

Note that CMMI descriptions are not always explicitFor instance subpractices are described mostly about outputwork products while having little description on input prod-ucts Goal-level references across PAs are described abstractlyproviding little information on how they influence relatedpractices CMMI descriptions on process areas also involveambiguities

32 Building ePCG To address the above we make use ofempirical data collected from a set of industrial SPI projectsin addition to CMMI The data is postproject data includingfindings and their correlations are already analyzed and usedin completed projects Postproject findings are often taggedwith relevant CMMI practices Accordingly we assume thatthey are all tagged in this work

Each project is analyzed to identify correlated practiceswhich are captured in a PCG PCGs of all the consideredprojects are combined to produce an ePCG The resultingePCG is then merged with the mPCG of CMMI However

The Scientific World Journal 5

REQM

PP

PMC

10

1010 10

10

1010

11 22

12 12 13

21

14

11

Process area

Practice

Correlated

Figure 4 mPCG example

Findingof practice C

Finding level Practice level

C998400 C

DD998400

(a) Case 1

Practice levelFinding level

C998400 C

DD998400 D998400998400

(b) Case 2

Practice levelFinding level

C998400 C

DD998400 D998400998400998400

(c) Case 3

Figure 5 Correspondence between findings and practices

unlike CMMI where correlations are described for practicesempirical data are described for findings which are imple-mentations of practices This is because an SPI project is animplementation of CMMI Given that findings in empiricaldata should be abstracted to practices to align the level ofabstraction of ePCGs with the mPCG of CMMI Abstractionis carried out as follows

(R1) For each finding identify the corresponding practicein CMMI

(R2) Two practices are considered as correlated if a findingof the practice is related to a finding of the otherpractice This is illustrated in Figure 5(a) The sameholds regardless the number of instances (findings)of a practice Figures 5(b) and 5(c) show the caseswhere a practice has multiple instances with differentrelations Both cases result in the same abstraction asFigure 5(a) per R3

We demonstrate abstraction using the findings in Table 2The table shows four findings REQM-W-01 PP-W-01 PP-W-02 and PP-S-01 where REQM-W-01 is an instance ofthe practice REQM SP 14 in Table 1 and PP-W-01 PP-W-02 and PP-S-01 are instances of PP SP 11 From thedescription of the findings REQM-W-01 and PP-W-01 arefound related since requirement traceability needs to bemaintained for development tasks andwork products definedin work breakdown structure Similarly REQM-W-01 and PP-S-01 are found related since PP-S-01 is a strength practiceof defining development tasks and work products Giventhis relation the corresponding practices REQM SP 14 and

PP SP 11 to these findings are considered also as relatedOn the other hand REQM-W-01 and PP-W-02 are foundnot related because COTS products are already made Thusrequirements traceability to development tasks and workproducts is not necessary However their correspondingpractices are considered as related as they have already beenso for other pairs

PCGs of SPI projects are merged by combining nodesand edges to create an ePCG for the considered project setEach edge is weighted by the number of projects having theedge identified in their PCG Let |119875

11990111199012| be the number of the

projects whose PCG has practices 1199011 and 1199012 correlated and|11987511990111199012| the number of the projects whose PCG has 1199011 and 1199012

identified as correlatedThen the weight119882119890(119901119898119901119899)of the edge

between 119901119898and 119901

119899is defined as follows

119882119890(119901119898119901119899)=

100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816

(0 le 119882119890(119901119898119901119899)le 1) (1)

As an example consider Figure 6 In the figure there arethree projects ProjetX ProjectY and ProjectZ where the PCGof ProjetX has one related practice pair ((PP 11 REQM 14))the PCGof ProjectY has two related pairs ((PP 11 REQM 14)(PP 11 REQM 11)) and the PCG of ProjectZ has one relatedpair ((REQM 14 REQM 11)) The practices are the same asthose in Figure 4 PCGs of the projects are merged into anePCG by adding all the nodes (PP 11 REQM 14 REQM 11)and their relations involved in the PCGs The weight of theedge between PP 11 and REQM 11 is measured at 05 (12)as the number of the projects that involve PP 11 and REQM

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 3: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

The Scientific World Journal 3

Table1Processa

reae

xamples

Processa

rea

Goal

Practic

eDescriptio

nRe

ferenceinformation

Requ

irementm

anagem

ent

(REQ

M)

a SG1

Manager

equirements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmanagingcorrectiv

eactionto

closure

b SP12

Obtaincommitm

enttorequ

irements

RefertotheP

rojec

tMonito

ringa

ndCo

ntrol

processa

reafor

moreinformationabou

tmon

itorin

gcommitm

ents

SP14

Maintainbidirectionaltraceabilityof

requ

irements

Notd

efined

Projectp

lann

ing(PP)

SG1

Establish

estim

ates

Notd

efined

SP11

Estim

atethe

scop

eofthe

project

RefertotheS

upplier

Agreem

entM

anagem

ent

processa

rea

SG2

Develo

pap

rojectplan

Notd

efined

SP22

Identifyprojectrisk

sRe

fertotheM

onito

rProjec

tRisk

sspecific

practic

ein

theP

rojec

tPlann

ingprocessa

reafor

more

inform

ationabou

trisk

mon

itorin

gactiv

ities

Projectm

onito

ringandcontrol

(PMC)

SG1

Mon

itorthe

projectagainstthep

lan

Notd

efined

SP12

Mon

itorc

ommitm

ents

Notd

efined

SP13

Mon

itorp

rojectris

ksRe

fertotheP

rojec

tPlann

ingprocessa

reafor

moreinformationabou

tidentify

ingprojectrisk

s

SG2

Managec

orrectivea

ctionto

closure

Notd

efined

SP21

Analyze

issues

RefertotheE

stablish

theB

udgeta

ndSchedu

lespecificp

racticeintheP

rojec

tPlann

ingprocess

area

form

oreinformationabou

tcorrectivea

ction

criteria

a Specific

goalbspecificp

ractice

4 The Scientific World Journal

PracticeProcessarea

Work product

A A

BB

(a) IPC

Reference

D D

C C

(b) EPC

Figure 2 Identifying practice correlations in CMMI

Goals

Processarea ldquoArdquo

Processarea ldquoBrdquo

Goals

Practices Practices

Subpractices

Goal levelPractice level

Reference information

Figure 3 Structure of process areas in CMMI

another PA Generic goals are shared by PAs and thus thereis no reference for generic goals

Given the PAdescriptions inCMMIwe look for informa-tion about internal practice correlations (IPCs) and externalpractice correlations (EPCs) IPCs exist within the sameprocess area while EPCs cut across process areas To identifyIPCs we make use of the description of practices andsubpractices and their related work products Specifically wefocus on relationships of input and output work productsamong practices Two internal practices are considered asinternally correlated if one practice has output work productsthat are used as input work products of the other Figure 2(a)shows an IPC EPCs are identified based on the descriptionof related process areas and external references Two externalpractices are considered as externally correlated if one prac-tice refers to the other which is shown in Figure 2(b)

Goals practices and subpractices are defined at differentlevels as shown in Figure 3 The goal level is the highestfollowed by the practice level and then the subpractice levelWe consider two components (eg goals) as correlated ifthey reference each other A component referencing anothercomponent at a lower level is considered as related to thatspecific component only On the other hand a componentreferencing another component at a higher level is consideredas related to all the subcomponents of the component TwoPAs are considered as correlated if a component of one PA

refers to a component of the other or one PA refers to theother PA in its description

Table 1 shows examples of PAs and a subset of theircomponents defined in CMMI The table describes threePAsmdashRequirement Management (REQM) Project MonitoringandControl (PMC) andProject Planning (PP)REQM has onespecific goal (SG 1) with two specific practices (SP 11 and SP12) Other PAs can be explained similarly A component maybe accompanied with reference information For instance SG1 in REQM has the following reference description ldquoReferto the Project Monitoring and Control process area for moreinformation about managing corrective action to closurerdquoFrom this description one can obviously infer a relation ofREQM SG 1 to the SG 2 goal of PMC which is an example of agoal-level reference Similarly from the reference descriptionof REQM SP 12 it can be easily inferred that REQM SP 12 isrelated to the SP 12 practice in PMC

Based on elicited correlations of PAs and their compo-nents a mPCG is built A mPCG is a nondirected graphcapturing correlations of practices where a node representsa practice and an edge represents a correlation An edgebetween nodes 119901

119898and 119901

119899has a weight denoted by119882

119898(119901119898119901119899)

All edges in a mPCG have their weight 1 denoting theexistence of a correlation Correlations may be found withinthe same PA or across PAs Figure 4 shows an example of amPCG

Note that CMMI descriptions are not always explicitFor instance subpractices are described mostly about outputwork products while having little description on input prod-ucts Goal-level references across PAs are described abstractlyproviding little information on how they influence relatedpractices CMMI descriptions on process areas also involveambiguities

32 Building ePCG To address the above we make use ofempirical data collected from a set of industrial SPI projectsin addition to CMMI The data is postproject data includingfindings and their correlations are already analyzed and usedin completed projects Postproject findings are often taggedwith relevant CMMI practices Accordingly we assume thatthey are all tagged in this work

Each project is analyzed to identify correlated practiceswhich are captured in a PCG PCGs of all the consideredprojects are combined to produce an ePCG The resultingePCG is then merged with the mPCG of CMMI However

The Scientific World Journal 5

REQM

PP

PMC

10

1010 10

10

1010

11 22

12 12 13

21

14

11

Process area

Practice

Correlated

Figure 4 mPCG example

Findingof practice C

Finding level Practice level

C998400 C

DD998400

(a) Case 1

Practice levelFinding level

C998400 C

DD998400 D998400998400

(b) Case 2

Practice levelFinding level

C998400 C

DD998400 D998400998400998400

(c) Case 3

Figure 5 Correspondence between findings and practices

unlike CMMI where correlations are described for practicesempirical data are described for findings which are imple-mentations of practices This is because an SPI project is animplementation of CMMI Given that findings in empiricaldata should be abstracted to practices to align the level ofabstraction of ePCGs with the mPCG of CMMI Abstractionis carried out as follows

(R1) For each finding identify the corresponding practicein CMMI

(R2) Two practices are considered as correlated if a findingof the practice is related to a finding of the otherpractice This is illustrated in Figure 5(a) The sameholds regardless the number of instances (findings)of a practice Figures 5(b) and 5(c) show the caseswhere a practice has multiple instances with differentrelations Both cases result in the same abstraction asFigure 5(a) per R3

We demonstrate abstraction using the findings in Table 2The table shows four findings REQM-W-01 PP-W-01 PP-W-02 and PP-S-01 where REQM-W-01 is an instance ofthe practice REQM SP 14 in Table 1 and PP-W-01 PP-W-02 and PP-S-01 are instances of PP SP 11 From thedescription of the findings REQM-W-01 and PP-W-01 arefound related since requirement traceability needs to bemaintained for development tasks andwork products definedin work breakdown structure Similarly REQM-W-01 and PP-S-01 are found related since PP-S-01 is a strength practiceof defining development tasks and work products Giventhis relation the corresponding practices REQM SP 14 and

PP SP 11 to these findings are considered also as relatedOn the other hand REQM-W-01 and PP-W-02 are foundnot related because COTS products are already made Thusrequirements traceability to development tasks and workproducts is not necessary However their correspondingpractices are considered as related as they have already beenso for other pairs

PCGs of SPI projects are merged by combining nodesand edges to create an ePCG for the considered project setEach edge is weighted by the number of projects having theedge identified in their PCG Let |119875

11990111199012| be the number of the

projects whose PCG has practices 1199011 and 1199012 correlated and|11987511990111199012| the number of the projects whose PCG has 1199011 and 1199012

identified as correlatedThen the weight119882119890(119901119898119901119899)of the edge

between 119901119898and 119901

119899is defined as follows

119882119890(119901119898119901119899)=

100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816

(0 le 119882119890(119901119898119901119899)le 1) (1)

As an example consider Figure 6 In the figure there arethree projects ProjetX ProjectY and ProjectZ where the PCGof ProjetX has one related practice pair ((PP 11 REQM 14))the PCGof ProjectY has two related pairs ((PP 11 REQM 14)(PP 11 REQM 11)) and the PCG of ProjectZ has one relatedpair ((REQM 14 REQM 11)) The practices are the same asthose in Figure 4 PCGs of the projects are merged into anePCG by adding all the nodes (PP 11 REQM 14 REQM 11)and their relations involved in the PCGs The weight of theedge between PP 11 and REQM 11 is measured at 05 (12)as the number of the projects that involve PP 11 and REQM

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 4: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

4 The Scientific World Journal

PracticeProcessarea

Work product

A A

BB

(a) IPC

Reference

D D

C C

(b) EPC

Figure 2 Identifying practice correlations in CMMI

Goals

Processarea ldquoArdquo

Processarea ldquoBrdquo

Goals

Practices Practices

Subpractices

Goal levelPractice level

Reference information

Figure 3 Structure of process areas in CMMI

another PA Generic goals are shared by PAs and thus thereis no reference for generic goals

Given the PAdescriptions inCMMIwe look for informa-tion about internal practice correlations (IPCs) and externalpractice correlations (EPCs) IPCs exist within the sameprocess area while EPCs cut across process areas To identifyIPCs we make use of the description of practices andsubpractices and their related work products Specifically wefocus on relationships of input and output work productsamong practices Two internal practices are considered asinternally correlated if one practice has output work productsthat are used as input work products of the other Figure 2(a)shows an IPC EPCs are identified based on the descriptionof related process areas and external references Two externalpractices are considered as externally correlated if one prac-tice refers to the other which is shown in Figure 2(b)

Goals practices and subpractices are defined at differentlevels as shown in Figure 3 The goal level is the highestfollowed by the practice level and then the subpractice levelWe consider two components (eg goals) as correlated ifthey reference each other A component referencing anothercomponent at a lower level is considered as related to thatspecific component only On the other hand a componentreferencing another component at a higher level is consideredas related to all the subcomponents of the component TwoPAs are considered as correlated if a component of one PA

refers to a component of the other or one PA refers to theother PA in its description

Table 1 shows examples of PAs and a subset of theircomponents defined in CMMI The table describes threePAsmdashRequirement Management (REQM) Project MonitoringandControl (PMC) andProject Planning (PP)REQM has onespecific goal (SG 1) with two specific practices (SP 11 and SP12) Other PAs can be explained similarly A component maybe accompanied with reference information For instance SG1 in REQM has the following reference description ldquoReferto the Project Monitoring and Control process area for moreinformation about managing corrective action to closurerdquoFrom this description one can obviously infer a relation ofREQM SG 1 to the SG 2 goal of PMC which is an example of agoal-level reference Similarly from the reference descriptionof REQM SP 12 it can be easily inferred that REQM SP 12 isrelated to the SP 12 practice in PMC

Based on elicited correlations of PAs and their compo-nents a mPCG is built A mPCG is a nondirected graphcapturing correlations of practices where a node representsa practice and an edge represents a correlation An edgebetween nodes 119901

119898and 119901

119899has a weight denoted by119882

119898(119901119898119901119899)

All edges in a mPCG have their weight 1 denoting theexistence of a correlation Correlations may be found withinthe same PA or across PAs Figure 4 shows an example of amPCG

Note that CMMI descriptions are not always explicitFor instance subpractices are described mostly about outputwork products while having little description on input prod-ucts Goal-level references across PAs are described abstractlyproviding little information on how they influence relatedpractices CMMI descriptions on process areas also involveambiguities

32 Building ePCG To address the above we make use ofempirical data collected from a set of industrial SPI projectsin addition to CMMI The data is postproject data includingfindings and their correlations are already analyzed and usedin completed projects Postproject findings are often taggedwith relevant CMMI practices Accordingly we assume thatthey are all tagged in this work

Each project is analyzed to identify correlated practiceswhich are captured in a PCG PCGs of all the consideredprojects are combined to produce an ePCG The resultingePCG is then merged with the mPCG of CMMI However

The Scientific World Journal 5

REQM

PP

PMC

10

1010 10

10

1010

11 22

12 12 13

21

14

11

Process area

Practice

Correlated

Figure 4 mPCG example

Findingof practice C

Finding level Practice level

C998400 C

DD998400

(a) Case 1

Practice levelFinding level

C998400 C

DD998400 D998400998400

(b) Case 2

Practice levelFinding level

C998400 C

DD998400 D998400998400998400

(c) Case 3

Figure 5 Correspondence between findings and practices

unlike CMMI where correlations are described for practicesempirical data are described for findings which are imple-mentations of practices This is because an SPI project is animplementation of CMMI Given that findings in empiricaldata should be abstracted to practices to align the level ofabstraction of ePCGs with the mPCG of CMMI Abstractionis carried out as follows

(R1) For each finding identify the corresponding practicein CMMI

(R2) Two practices are considered as correlated if a findingof the practice is related to a finding of the otherpractice This is illustrated in Figure 5(a) The sameholds regardless the number of instances (findings)of a practice Figures 5(b) and 5(c) show the caseswhere a practice has multiple instances with differentrelations Both cases result in the same abstraction asFigure 5(a) per R3

We demonstrate abstraction using the findings in Table 2The table shows four findings REQM-W-01 PP-W-01 PP-W-02 and PP-S-01 where REQM-W-01 is an instance ofthe practice REQM SP 14 in Table 1 and PP-W-01 PP-W-02 and PP-S-01 are instances of PP SP 11 From thedescription of the findings REQM-W-01 and PP-W-01 arefound related since requirement traceability needs to bemaintained for development tasks andwork products definedin work breakdown structure Similarly REQM-W-01 and PP-S-01 are found related since PP-S-01 is a strength practiceof defining development tasks and work products Giventhis relation the corresponding practices REQM SP 14 and

PP SP 11 to these findings are considered also as relatedOn the other hand REQM-W-01 and PP-W-02 are foundnot related because COTS products are already made Thusrequirements traceability to development tasks and workproducts is not necessary However their correspondingpractices are considered as related as they have already beenso for other pairs

PCGs of SPI projects are merged by combining nodesand edges to create an ePCG for the considered project setEach edge is weighted by the number of projects having theedge identified in their PCG Let |119875

11990111199012| be the number of the

projects whose PCG has practices 1199011 and 1199012 correlated and|11987511990111199012| the number of the projects whose PCG has 1199011 and 1199012

identified as correlatedThen the weight119882119890(119901119898119901119899)of the edge

between 119901119898and 119901

119899is defined as follows

119882119890(119901119898119901119899)=

100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816

(0 le 119882119890(119901119898119901119899)le 1) (1)

As an example consider Figure 6 In the figure there arethree projects ProjetX ProjectY and ProjectZ where the PCGof ProjetX has one related practice pair ((PP 11 REQM 14))the PCGof ProjectY has two related pairs ((PP 11 REQM 14)(PP 11 REQM 11)) and the PCG of ProjectZ has one relatedpair ((REQM 14 REQM 11)) The practices are the same asthose in Figure 4 PCGs of the projects are merged into anePCG by adding all the nodes (PP 11 REQM 14 REQM 11)and their relations involved in the PCGs The weight of theedge between PP 11 and REQM 11 is measured at 05 (12)as the number of the projects that involve PP 11 and REQM

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 5: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

The Scientific World Journal 5

REQM

PP

PMC

10

1010 10

10

1010

11 22

12 12 13

21

14

11

Process area

Practice

Correlated

Figure 4 mPCG example

Findingof practice C

Finding level Practice level

C998400 C

DD998400

(a) Case 1

Practice levelFinding level

C998400 C

DD998400 D998400998400

(b) Case 2

Practice levelFinding level

C998400 C

DD998400 D998400998400998400

(c) Case 3

Figure 5 Correspondence between findings and practices

unlike CMMI where correlations are described for practicesempirical data are described for findings which are imple-mentations of practices This is because an SPI project is animplementation of CMMI Given that findings in empiricaldata should be abstracted to practices to align the level ofabstraction of ePCGs with the mPCG of CMMI Abstractionis carried out as follows

(R1) For each finding identify the corresponding practicein CMMI

(R2) Two practices are considered as correlated if a findingof the practice is related to a finding of the otherpractice This is illustrated in Figure 5(a) The sameholds regardless the number of instances (findings)of a practice Figures 5(b) and 5(c) show the caseswhere a practice has multiple instances with differentrelations Both cases result in the same abstraction asFigure 5(a) per R3

We demonstrate abstraction using the findings in Table 2The table shows four findings REQM-W-01 PP-W-01 PP-W-02 and PP-S-01 where REQM-W-01 is an instance ofthe practice REQM SP 14 in Table 1 and PP-W-01 PP-W-02 and PP-S-01 are instances of PP SP 11 From thedescription of the findings REQM-W-01 and PP-W-01 arefound related since requirement traceability needs to bemaintained for development tasks andwork products definedin work breakdown structure Similarly REQM-W-01 and PP-S-01 are found related since PP-S-01 is a strength practiceof defining development tasks and work products Giventhis relation the corresponding practices REQM SP 14 and

PP SP 11 to these findings are considered also as relatedOn the other hand REQM-W-01 and PP-W-02 are foundnot related because COTS products are already made Thusrequirements traceability to development tasks and workproducts is not necessary However their correspondingpractices are considered as related as they have already beenso for other pairs

PCGs of SPI projects are merged by combining nodesand edges to create an ePCG for the considered project setEach edge is weighted by the number of projects having theedge identified in their PCG Let |119875

11990111199012| be the number of the

projects whose PCG has practices 1199011 and 1199012 correlated and|11987511990111199012| the number of the projects whose PCG has 1199011 and 1199012

identified as correlatedThen the weight119882119890(119901119898119901119899)of the edge

between 119901119898and 119901

119899is defined as follows

119882119890(119901119898119901119899)=

100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816100381610038161003816100381610038161003816119875(119901119898119901119899)

10038161003816100381610038161003816

(0 le 119882119890(119901119898119901119899)le 1) (1)

As an example consider Figure 6 In the figure there arethree projects ProjetX ProjectY and ProjectZ where the PCGof ProjetX has one related practice pair ((PP 11 REQM 14))the PCGof ProjectY has two related pairs ((PP 11 REQM 14)(PP 11 REQM 11)) and the PCG of ProjectZ has one relatedpair ((REQM 14 REQM 11)) The practices are the same asthose in Figure 4 PCGs of the projects are merged into anePCG by adding all the nodes (PP 11 REQM 14 REQM 11)and their relations involved in the PCGs The weight of theedge between PP 11 and REQM 11 is measured at 05 (12)as the number of the projects that involve PP 11 and REQM

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 6: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

6 The Scientific World Journal

Table 2 Assessment findings and corresponding practices

Finding ID Finding description Practice ID

REQM-W-01 It is difficult to trace defined requirements to development tasks andwork products REQM SP 14

PP-W-01 Work break down structure are developed in high level (egmilestone) without detailed tasks and work product PP SP 11

PP-W-02 Product components to be purchased as COTS (commercial-off-the-shelf) are not documented PP SP 11

PP-S-01 In some projects WBS with detail description on tasks and workproducts is developed with the support of Quality Assurance team PP SP 11

ProjectX ProjectY ProjectZ ePCG

Practicepair

REQM REQM

REQM

REQM

PP PP PP11

11

11

11

11

11

11

14 14

14

14

+ +

= 22 = 10

= 12 = 05

= 13 = 033

We1

We1 We2We2

We3

We3

Figure 6 Building an ePCG

14 is two (ProjectX ProjectY) and the number of the projectsthat have PP 11 and REQM 11 identified as correlated is one(ProjectY)Weight for other edges can bemeasured similarly

33 Building iPCG by Combining mPCG and ePCG ThemPCG resulting from Section 31 and the ePCG from Sec-tion 31 are merged to produce an integrated graph iPCGFigure 7 shows an example which combines the mPCG inFigure 4 and the ePCG in Figure 6

The weight119882119894(119901119898119901119899)of an edge between 119901

119898and 119901

119899in the

iPCG is computed as follows

119882119894(119901119898119901119899)=119882119898(119901119898119901119899)+119882119890(119901119898119901119899)

2 (0 le 119882

119894(119901119898119901119899)le 1)

(2)

A threshold is set for weight and any edge having itsweight lower than the threshold is considered as not relatedWe use 025 for the threshold in this work This means thata practice pair in an iPCG is considered as correlated if it isidentified in CMMI (119882

119898= 1) or in the half or more of the

empirical projects (119882119890ge 05)

Figure 8 shows the application process of the presentedmodelThe findings identified in the process assessment of anSPI project are abstracted to practices using the abstractionrules in Section 32 The resulting practices are then input tothe iPCG to produce a PCG of the project described in termsof practicesThe practices in the PCG are concretized back tofindings using the same mapping used as in abstracting thefindings to practices at the beginning

Figure 9 shows an example of a resulting correlationmatrix The matrix is symmetric having the same set offindings in column and row and the values represent weightsIn the figure the box in bold line shows that the weight of

the (REQM-W-03 CM-W-01) pair is measured as 050 whichindicates that the practices in the pair are correlated in eitherCMMI (119882

119898= 1) or the empirical data (119882

119890= 1) used in the

model Those pairs that have zero weight are not related Theresultingmatrix is suggestiveThat is the process analyst maymodify the matrix at his discretion The matrix can be alsoused for identifying more significant findings by prioritizingthem by the number of correlations or accumulated weights(ie adding all the weights by column)

34 Supporting Tool We have developed a prototype toolsupporting the presented model Figure 10 shows the archi-tecture of the tool It consists of two componentsmdasha correla-tion analysis engine and a PCG viewer The correlation anal-ysis engine is Excel-based consisting of (1) a knowledge basecontaining CMMI-based practice correlations and project-based practice correlations (2) a PCG generator responsiblefor building PCGs and (3) a PCG executor applying an iPCGand generating the output correlations in an Excel file ThePCG viewer displays the resulting PCG using the yED GraphViewer an open source application for visualizing objectconnections [12] The viewer takes an input file in variousformats including the Excel files generated by the PCGexecutor Figure 11 shows a screenshot of the PCG viewerNodes are grouped to denote different PAs of relevance andcorrelation weights are displayed with mouse-over on edges

4 Evaluation

We evaluate the presentedmodel in terms of recall precisionF-measure and accuracy [13] by comparing the resulting cor-relations tomanually identified correlations by experts Recallis measured by the number of correlations produced by the

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 7: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

The Scientific World Journal 7

mPCG

ePCG

iPCG

11

14

PP

11REQM

10 05

033REQM

12

11

14 12PMC

PP

11 22

13

10

101010101010

REQM

12

11

14 12

21

PMC

PP

11 22

13

05

050505

0505067

05025

21

+

Figure 7 Building an iPCG

Transform _findingToPractice

Practicelist

ProcessInput Output

Apply_iPCG ProjectPCG

Transform _practiceToFinding

iPCG

Assessmentfindings

list

Findings correlation

matrix

A

B CD E

F

YZ

X

Figure 8 Application process of the correlation model

Finding

Practice

MA-W-01 REQM-W-02 REQM-W-03 VER-W-02

MA_SP11 REQM_SP13 REQM_SP14 VER_SP11

CM-W-01 CM_SP13 050

MA-W-01 MA_SP11 000

REQM-W-02 REQM_SP13 000

050

000

A set of correlated findings for a given finding ldquoREQM-W-03rdquo

REQM-W-03

VER-W-02

REQM_SP14

VER_SP11

CM_SP13

CM-W-01

000 050

050

050

050

025

000

033

075

000

025 050

033 075

000 088

088

000

000

050

Figure 9 Example of resulting finding correlation

model over the number of manually identified correlationsSimilarly precision is measured by the number of manuallyproduced correlations over the number of correlations pro-duced by themodelThe accuracy ismeasured by the numberof correctly identified correlations to the total number ofpractice pairs

We use five industry SPI projects to evaluate the modeleach from a different company Four projects are used forbuilding the iPCG in the model and the remaining project isused as the target project to which the model is applied The

target project is rotated in the five projects so that the modelcan be applied to all the five projects As the target projectis rotated the project used as the target project is excludedfrom the project set used to build the iPCG in the model Inthis way the iPCG is not biased to the target project The fiveprojects are all CMMI-based targeting thematurity level from2 to 4

Table 3 shows an overview of the five projects used inthe evaluation In the table 119862119900119898119901119860 and 119862119900119898119901119861 aim atlevel 3 119862119900119898119901119862 and 119862119900119898119901119863 aim at level 2 and 119862119900119898119901119864aims at level 4 Given that it can be observed in the tablethat the four PAs (requirement management project planningmeasurement and analysis and configuration managementprocess) in 119862119900119898119901119862 and 119862119900119898119901119863 are from level 2 and thetwo additional PAs (technical solution and verification process)in 119862119900119898119901119860 119862119900119898119901119861 and 119862119900119898119901119864 are from level 3 The tableshows that for 119862119900119898119901119860 20 findings are found in six PAs andthey are all related to 18 practices 49 practice correlationsare found out of total 153 practice pairs in the six PAs Otherprojects can be explained similarly The six PAs involve total50 practices in CMMI of which 40 practices (accounting for80) are addressed in the five projects

Table 4 shows measured data from the evaluation TruePositive denotes the number of correlations that are identifiedby the presented model as related and also evaluated as

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 8: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

8 The Scientific World Journal

xls

Correlationmatrix

PCGgenerator

Correlation analysis engine

(Excel-based)

PCGexecutor

Knowledge base

Correlations information

PCG viewer(yED graph editor based)

∙ CMMI-based∙ Project-based

Figure 10 Prototype tool components

Project planning PA Project mgt and control PA

Figure 11 PCG viewer

Table 3 Summary on collected data

Company Number of process areas1 Number of Findings Number of practices1 Number of practice pairsTotal Correlated

CompA 6 20 18 153 49CompB 6 21 19 171 32CompC 4 12 15 105 52CompD 4 11 10 45 33CompE 6 22 20 190 49Total 6 86 40 664 2151Total means number of distinct process areas or practices

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 9: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

The Scientific World Journal 9

Table 4 Measured data

Company Truepositive

Truenegative

Falsepositive

Falsenegative

CompA 25 76 28 24CompB 11 105 34 21CompC 20 44 9 32CompD 23 8 4 10CompE 30 121 22 17Sum 109 354 97 104

related by experts True Negative represents the number ofcorrelations that are identified by the model as not relatedand also evaluated by experts as not related False Positiveshows the number of the correlations that are identified bythe model but are evaluated as irrelevant by experts FalseNegative is the number of the correlations that are evaluatedas related by experts but not identified by the presentedmodel The table shows CompB having a low true positiveThis is due to the lack of thoroughness in the project whichwas conducted hastily with nonexperts involved CompDshows a relatively higher true positive and a low false positiveThis is because there are only 10 practices involved in theprojectThus it was easier for experts to identify correlations

Based on Table 4 precision recall f-measure and accu-racy of themodel aremeasured for the five projects Precisionis measured by True Positive(True Positive + False Positive)while recall ismeasured byTrue Positive(True Positive+ FalseNegative) Table 5 shows the results The table shows that theaverage precision is 057 which implies that about 60 ofthe correlations identified by the presented model are alsoidentified as correlated in the five projects The average ofrecall is measured as 051 which means that about the half ofthe manually identified correlations are also identified by themodel F-measure which is the harmonic mean of precisionand recall is measured at 054

A higher precision implies that experts have more corre-lations identified manually that match with the correlationsidentified by the model On the other hand a higher recallimplies that the model identified more correlations matchingwith manually identified correlations CompB shows thelowest precision and recall This is because there is onlya small number of manually identified correlations to thetotal number of practice pairs (which is only 19 while theaverage is 46) On the other hand CompD has 73 ofthe total number of practice pairs identified as correlatedwhich contributes to its highest precision and recall Thisobservation was expected as precision and recall depend onthe number of manually identified correlations

Accuracy is measured by (True Positive + True Nega-tive)(True Positive + True Negative + False Positive + FalseNegative) Accuracy is measured at 069 in average whichmeans that about 70 of the correlations identified by thepresentedmodel are considered as correct in the five projectsThe standard deviation of accuracy is measured at 007 whichis relatively low compared to precision and recallThis means

Table 5 Evaluation results

Company Precision Recall 119865-measure AccuracyCompA 047 051 049 066CompB 024 034 028 068CompC 069 038 049 061CompD 085 070 077 069CompE 058 064 061 079Average 057 051 054 069Standard Deviation 023 016 018 007

that the presented model is consistent to the consideredprojects

5 Conclusion

We have presented a process correlation analysis model foridentifying correlations of findings A major advantage ofthe model is the use of empirical data which complementsthe CMMI specification being ambiguous and abstract Theevaluation of the model shows 051 for recall and 069 foraccuracy which demonstrates the potential of the model Itis noteworthy to mention that the model is developed inresponse to the need of techniques for identifying findingcorrelations from the field We shall continue to improveand evaluate the model by applying it to more case studiesAs more case studies are conducted we shall extend theevaluation to look into the efficiency aspect of themodel overmanual analysis

We also plan to investigate an effective way of packagingfindings based on identified correlations to build improve-ment plans Using correlation information a finding thathas more correlations can be found and as such a findingcan lend itself as a seed for forming a package The packagemay include findings that are directly correlated by the seedThe resulting package then serves as an early version of animprovement plan and may evolve throughout a series ofrefinement activities

The model can be also used in the case where empiricaldata is not available In such a case one may start analyzingcorrelations using only mPCG and then use the empiricaldata from the current project as it is completedThe data thencan be used as input to build an ePCG for the next projectWe expect that the model is to be more accurate as moreempirical data is used

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Korean Institute of EnergyTechnology Evaluation and Planning (KETEP) under theinternational collaborative RampD program (20118530020020)and Next-Generation Information Computing Development

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999

Page 10: Process Correlation Analysis Model for Process Improvement ... · TheScientificWorldJournal 5 REQM PP PMC 1.0 1.0 1.0 1.0 1.0 1.0 1.1 2.2 1.2 1.2 1.3 2.1 1.4 1.1 Process area Practice

10 The Scientific World Journal

Program through the National Research Foundation ofKorea(NRF) funded by theMinistry of Science ICTamp FuturePlanning (2012M3C4A7033348)

References

[1] CMMI Product Team ldquoCMMI for Development Version 13rdquoTech Rep CMUSEI-2010-TR-033 Software Engineering Insti-tute Carnegie Mellon University 2010

[2] M Staples and M Niazi ldquoSystematic review of organizationalmotivations for adopting CMM-based SPIrdquo Information andSoftware Technology vol 50 no 7-8 pp 605ndash620 2008

[3] M Swinarski D H Parente and R Kishore ldquoDo small IT firmsbenefit from higher process capabilityrdquo Communications of theACM vol 55 no 7 pp 129ndash134 2012

[4] M Unterkalmsteiner T Gorschek A K M M Islam C KCheng R B Permadi and R Feldt ldquoEvaluation and measure-ment of software process improvementmdasha systematic literaturereviewrdquo IEEE Transactions on Software Engineering vol 38 no2 pp 398ndash424 2012

[5] D Damian and J Chisan ldquoAn empirical study of the complexrelationships between requirements engineering processes andother processes that lead to payoffs in productivity quality andrisk managementrdquo IEEE Transactions on Software Engineeringvol 32 no 7 pp 433ndash453 2006

[6] S Choi D K Kim and S Park ldquoReMo a recommendationmodel for software process improvementrdquo in Proceedings ofthe International Conference on Software and Systems Process(ICSSP rsquo12) pp 135ndash139 2012

[7] X Chen M Staples and P Bannerman ldquoAnalysis of depen-dencies between specific practices in CMMI maturity level 2rdquoin Software Process Improvement vol 16 of Communications inComputer and Information Science pp 94ndash105 2008

[8] T Gorschek and C Wohlin ldquoPackaging software processimprovement issues a method and a case studyrdquo SoftwarePractice and Experience vol 34 no 14 pp 1311ndash1344 2004

[9] J A Calvo-Manzano Villalon G C Agustın T S F GilabertA de Amescua Seco and L G Sanchez ldquoExperiences inthe application of software process improvement in SMESrdquoSoftware Quality Control vol 10 no 3 pp 261ndash273 2002

[10] Y Sun andX Liu ldquoBusiness-oriented software process improve-ment based on CMMI using QFDrdquo Information and SoftwareTechnology vol 52 no 1 pp 79ndash91 2010

[11] Y Akao Ed Quality Function Deployment Integrating Cus-tomer Requirements Into Product Design Productivity Press1990

[12] yED Graph Editor Homepage httpwwwyworkscomenindexhtml

[13] M Junker A Dengel and R Hoch ldquoOn the evaluation of doc-ument analysis components by recall precision and accuracyrdquoin Proceedings of the 5th International Conference on DocumentAnalysis and Recognition (ICDAR rsquo99) pp 713ndash716 1999