[IEEE 2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS) - Marrakech, Morocco (2014.5.28-2014.5.30)] 2014 IEEE Eighth International Conference

Overcoming challenges in Global SoftwareDevelopment: The role of Brokers

Christina Manteli, Bart van den Hooff, Hans van VlietVU University Amsterdam

{c.manteli, b.j.vanden.hooff, j.c.van.vliet}@vu.nl

Wilco van DuinkerkenOlery BV

[email protected]

Abstract—A common collaboration structure in global soft-ware development (GSD) is clustering, wherein people tend to becloser to others with whom they share common characteristics.Clusters often create barriers in communication, coordinationand expertise awareness between remote teams, restraining thedevelopment of transactive memory (TM). In order to overcomesuch barriers, the role of brokers has emerged. In this paper, weexamine the role of brokers as facilitators in the development oftransactive memory. We use social network theory to analyze thecollaboration of an EU-funded project, where development teamscome from different partners and different locations. Our resultssuggest that task-based clusters emerge and that project memberswho coordinate activities as well as those who contribute to thecode development act as brokers. Our empirical evaluation showsthat clustering has a negative effect on TM and that brokers canmoderate that effect.

Keywords—Global Software development; clusters; brokers;social network analysis; transactive memory systems.

I. INTRODUCTION

Many studies in global software development (GSD) focuson the collaboration structures of distributed teams. FollowingConway’s law [1], for instance, previous studies examine therelationship between the architectural designs of the system,the task allocation and the teams’ collaboration (e.g. [2], [3]).Other researches focus on the analysis of GSD collaborationstructures using social network analysis (SNA) techniques. Forexample, Damian et. al. [4] use social networks to examinethe dynamic nature of membership in the different stagesof a multi-site software development project. Social networkanlysis has also been applied in the collaboration patterns ofrequirements engineering among multiple development loca-tions [5].

To date, a common collaboration structure in global soft-ware development is clustering [6]. Clusters are formed withina network of interconnected members, where some membersare closer to each other, creating dense subgroups. Thesegroups are homogeneous, as their members share certain com-mon characteristics. In global software development, the mostprominent characteristic is geographic distance [7], meaningthe people from the same location communicate more andcollaborate closer with their co-located colleagues than withtheir remote ones. In other collaborative environments, such asin open source software development communities, a commoncharacteristic that brings people together into clusters can bethe project that the members are working on, or shared interestssuch as language technologies [8].

Clusters may lead to communication and collaborationchallenges, limiting expertise awareness ( [7], [9]), and con-sequently restraining the development of transactive memory(TM). Transactive memory is the memory that members of agroup develop for encoding, storing and retrieving expertiseknowledge within their group [10]. As Faraj & Sproull note[11] “teams must be able to manage their skill and knowledgeinterdependencies effectively through expertise coordination,which entails knowing where expertise is located, knowingwhere expertise is needed, and bringing needed expertise tobear.”

In order to overcome such communication and collabo-ration challenges caused by clusters, literature suggests thepositioning of a bridging role between the remote teams [12].These bridges or brokers bring different groups together, fa-cilitate communication, improve knowledge sharing and oftenpromote innovation [13].

In this paper, we examine the role of brokers as facilitatorsin the development of transactive memory (TM). We performa case study on a multi-site project in order to answer thefollowing questions:

1) What kind of clusters emerge in the project’s collab-oration structure?

2) Which members play the role of brokers and what aretheir characteristics?

3) To what extent do brokers facilitate the developmentof transactive memory?

In the next section, we present the theoretical backgroundof transactive memory (TM), and how we measure suchmemory in our case study. In section III, we discuss clustersas a collaboration structure in global software development. Insection IV, we describe the role of brokers, their characteristicsand contribution in coordinating clusters. In sections V andVI, we present the project overview, the data collection and weprovide answers to our research questions. Finally, sections VIIand VIII discuss potential threats to validity and summarize themain lessons learned as well as suggestions for future research.

II. TRANSACTIVE MEMORY

Transactive memory is developed for encoding, storing, andretrieving expertise knowledge between members of a group.Contrary to shared (or team) mental models, where the empha-sis is on the shared knowledge between team members (theoverlapping knowledge), transactive memory focuses on theexpertise knowledge, the unique knowledge that every member

978-1-4799-2393-9/14/$31.00 ©2014 IEEE

TABLE I. TM OPERATIONALISATION

Directory Updating Information Allocation Retrieval CoordinationExpertiseAwareness

Q1. I am aware of this per-son’s area of expertise.

Q2. I am aware of the knowl-edge that this person needs forhis/her job.

Accessibility Q4. It is easy for me to accessthis person.

Credibility Q5. The information I receivefrom this person is importantfor my job.

CommunicationFrequency

Q3. How often do you transferknowledge to this person?

Q6. How often do you requestknowledge from this person?

possesses [14]. Using transactive memory, the group membershave a collective awareness of each others’ specialized knowl-edge, which helps them identify and locate where expertiseresides within the group [15]. Another characterisation of TMis as a shared understanding of “who knows what” within thegroup.

Recent studies in global software development turned theirattention to the importance of transactive memory in thecollaboration and communication of the distributed teams.One of the main concerns in GSD that steered the attentiontowards TM is the lack of knowledge that people have aboutthe expertise domain of their remote colleagues. Faraj &Sproull [11], for instance, propose a model to test expertisecoordination and their empirical results showed that factorssuch as locating and sharing expertise knowledge significantlyimproves team performance. In another study [16], the authorsidentified a positive relationship between different dimensionsof TM (such as knowledge differentiation, knowledge location,knowledge credibility) and the communication quality, as wellas the performance of the project members. Coordination ofgeographically dispersed teams is also influenced by how wellpeople are aware of each others’ expertise [17].

A. TM as a latent variable model

There have been several proposals on how to measuretransactive memory [18]–[20]. Lewis & Herndon [21] suggestthat transactive memory can be either assessed directly orindirectly as a latent variable. Direct measures allow us to drawconclusions about TM as a whole, while indirect measures areuseful in settings where TM cannot be clearly assessed. Thechoice on how to measure transactive memory mainly dependson the case study at hand [21]. In the field of global softwaredevelopment, transactive memory is not clearly defined andhence, we adopt an indirect measure for our case study.

Furthermore, according to Lewis [19] a measure of TMmust meet two basic criteria. The first is to be theoreticallyconsistent with Wegner’s conceptualization of TM that reflectsthe cooperative processes and transactions of memory used in agroup [10]. The second criterion is to align the measurementswith the settings of the field we are interested in studying.Following these suggestions, we consider previous studies andconceptualizations of the theory around transactive memoryand we create a latent variable model to measure TM in globalsoftware development. We choose to create a TM model,tailored to the concepts of GSD, rather than use an existingone such as the model proposed by Lewis [19], which providesgeneral manifestations of transactive memory. The rest of this

section describes how we operationalized TM theory and howwe used well-established concepts to create a latent variablemodel of TM.

Table I presents our operationalisation model. As suggestedby Lewis [19], the main drivers of our proposed model areWegner’s three processes of transactive memory: directoryupdating, information allocation and retrieval coordination[10]. Encoding the information in a memory system impliesthat group members keep their memory directories updatedwith regard to what others in the group are likely to know,or otherwise what is the others’ expertise domain (DirectoryUpdating). Storing the information in transactive memoryentails the process of allocating knowledge to the person whoseexpertise will facilitate its storage (Information Allocation).Additionally, in a knowledge network information is retrievedbased on knowledge of the relative expertise of the individuals(Retrieval Coordination).

Next, we consider the different scales of transactive mem-ory proposed by other scholars. According to a recent literaturereview on transactive memory theory [22], there are three wellestablished measurements of TM proposed by Lewis [19],Faraj & Sproull [11], and Austin [18]. Analysing togetherthese three measurements, we have identified three mainconcepts that they have in common and which reflect thecognitive processes of transactive memory; Expertise Aware-ness, Accessibility and Credibility. Transactive memory existswhen members are aware of the differentiated and specialisedknowledge of each other (Expertise awareness), and they canrely on each other for this specialised knowledge (Credibility).Additionally, in order to share that specialised knowledge, teammembers need to develop coordination activities that will allowthe knowledge to be shared and transferred easily across thenetwork (Accessibility).

Finally, we incorporate into our proposed model, the ideaof communication frequency as an important aspect that needsto be assessed if we want to reflect the existence of TMwithin distributed teams [23]. Especially in global softwaredevelopment, communication frequency is often regarded as abarrier in the collaboration and knowledge transfer between re-mote team members [24]. In settings where people collaborateacross different geographical locations, it becomes easier fora collective memory system to be built and maintained whencommunication frequency is high. In a previous work [25], weprovide more details on how we measure transactive memorysystems.

III. CLUSTERS: A COLLABORATION STRUCTURE

Previous researches in global software development iden-tify two main collaboration structures; clusters and core-peripheries [6]. In a core-periphery structure, “a particulargroup of developers is at the centre of the coordinationactivities. The rest of the developers (in the periphery) relyon interactions with the centrally positioned developers, forcoordinating their tasks” [26]. Several studies also suggestthat core-periphery structures lead to effective collaborationsbecause they can improve the speed and flexibility with whichinformation diffuses (e.g. in [27], [28]).

Despite the positive effects of core-periphery structureson the communication and knowledge sharing among dis-tributed teams, such structures are more popular in opensource software (OSS) communities, than in commercial GSDcollaborations [28]. The main reason is that OSS communitiesare less hierarchically organised and members’ participation ismainly voluntarily. Up to date, the most common collaborationstructure in commercial GSD is clustering. Team membersthat share common characteristics, tend to “attract” each otherand create dense sub-groups. In global software development,proximity is the most common characteristic, bringing co-located members closer to each other. For instance, a previousstudy [25] presents the collaboration structure of two multi-sitesoftware development projects where location-based clustersare formed and some of them are tightly connected with manycommunication ties between the locations, while others areloosely connected with only few communication links.

Another characteristic that drives clustering is task allo-cation, i.e. people that work together on the same task orproject tend to be closer to each other. For example, Chang &Ehrlich [7] examine how members of global software teamscreate sub-groups based on the project they are working on,and that the communication between those sub-groups is oftencoordinated by a group leader or an architect. In the same vein,several other studies build up Conway’s law [1], suggestingthat the collaboration structures of software teams representthe software modules of the system, eg. [2], [29], [30]. Task-based clusters are also often found in OSS projects, wherecontributors create subgroups based on the projects they areparticipating in (eg. in [31]–[33]).

Clustering creates several challenges in the communicationand sharing of expertise knowledge, and particularly on thedevelopment of transactive memory. The tendency to createhomogeneous clusters poses a special challenge to knowledge-intensive collaborations as the aim of such collaborations is toproduce knowledge and to disseminate it quickly and effec-tively [34]. For instance, members of location-based clustersdo not always know their remote colleagues, and as Rogers& Lea [35] observe “the lack of social presence reduces thepressure to respond to an information request, so that a requestfrom a remote individual is less likely to get handled quickly”.Similarly, members of project-based clusters are often lessaware about the domain expertise of their colleagues in otherteams, so that they do not always know the right person tocontact, or where to find appropriate information sources [9].

In order to overcome such challenges, literature suggeststhe positioning of a bridging role between the clusters. Thisintermediate role is the focus of attention of several studies,

described in various terms such as “boundary spanners”,“brokers”, “liaisons”, and “gatekeepers”. The next sessionelaborates on the role of brokers.

IV. THE ROLE OF BROKERS

Several studies have analysed the characteristics of brokers.Williams [36], for instance, describes the broker as a personwho must be capable of building sustainable relationships,negotiating and managing interdependencies. Brokers mustalso be able to balance conflicting demands of multiple groups(clusters) with differing expectations [37].

An interesting classification of the role of brokers isthat brokers can either be formal or informal [38]. Formalbrokers are dedicated members that are particularly appointedto play the role of mediator between teams. For example,in [25], communication between remote teams is defined bythe hierarchical organization of the offices, and consequentlyteam leaders and project managers act as bridges betweenlocations. Informal brokers are members that emerge as me-diators between clusters, due for example to that person’sexpertise. Milewski et. al. note that [12] “most of the bridgeswe have found in our research and in the literature are not onesthat were developed intentionally as part of the organisationstrategy, but units that have grown out of organisational need,or of the team positioning themselves in this role”. Informalbrokers are mainly common in open source projects wherecommunities are created based on flat-hierarchies and self-organization [39], [40].

Brokerage is also perceived as a good coordination strategyin the management of distributed collaboration [41]. Brokerageroles are important within a network because they cover thestructural holes that might exist between people, or betweengroups of people. Global software collaboration is, by itsnature, a situation where structural holes may emerge betweengroups that are geographically, temporally and culturally dis-tant [12].

Finally, previous studies suggest that the role of brokers canfacilitate knowledge sharing and team awareness in distributedcollaborations. Having access to diverse sources of informationgives brokers the “power” to control the information flowwithin the network [42]. Hossain at el. [43], for instance,observe that “a person who lies between many others wouldbe able to coordinate, because the position dictates a role ofleadership.”. Bridging social capital, which results from tieswith people from outside the group, is vital for knowledgecreation [44]. Within social networks, roles responsible forlinking one knowledge area to another have been shown tosignificantly facilitate and accelerate knowledge transfer acrossboundaries, promoting innovation [45].

V. RESEARCH METHODOLOGY

Our research is based on data collected from membersparticipating in an EU-funded research project, under theseventh Framework Programme (FP7) 1. The project focuseson the enhancement of natural language processing techniques.It is a two-year project (July 2012-July 2014) and it involvesthe collaboration of six partners from three locations; Spain,

1http://www.opener-project.org

TABLE II. PROJECT TASKS AND RESPONSIBLE PARTNERS

Tasks Description Responsible partnersProject Management Project management provides the overall financial and administrative

management of the project.S1, N2

Design Design tasks involve the architectural and design decisions of thesystem.

I1

Data Harvesting The development of the system components for harvesting data fromonline sources. The data are to be used as an input for analysis.

I2

Language Technologies The development of the of core components of the project, includingsentiment analysis and natural language processing tools.

N1, S2

Integration Integration tasks provide support for integrating all system compo-nents, testing and deploying.

N2, S1

Italy and the Netherlands. Our case study was performed atthe end of the first year of the project (June-July 2013).

The work of the project is divided into five main tasks,which are distributed among the partners. Table II presents thetasks as well as the partners responsible for them. We refer tothe two partners from Spain as S1, S2 and to the two partnersfrom the Netherlands as N1, N2. Similarly, we indicate thetwo partners from Italy as I1, I2.

Our analysis is based on collecting social network datathrough a web-based survey. The questionnaire invited therespondents to select those with whom they mainly communi-cate, from a list including all active project members. Addition-ally, we asked the respondents to value the relationship (on a5-point Likert scale) with each one of their connections, basedon the questions Q1-Q6 presented in Table I. The end result is aset of data that represents who communicates with whom, anda set of six valued relationships for every connection. Overall,the project involves 37 members and we had a response rateof 92% (only 3 people did not respond to the survey).

A. Measuring Transactive Memory

Using the social network data, we measure transactivememory as a latent variable characterising the link betweentwo people in the network (respondent i - connection j), basedon six items, i.e. the six questions of the survey. We performeda reliability analysis, to estimate the degree to which theselected items measure the underlying construct of transactivememory. A proper analysis for assessing the reliability ofscales is Cronbach’s alpha coefficient [46]. Cronbach’s alphameasures the internal consistency, which is the extent to whichall items in a scale can reliably express a latent variable.Cronbach’s alpha coefficient for our data set is calculated at.84. This is an acceptable value of internal consistency andhence, we can conclude that our model can be reliably usedto measure transactive memory as a latent variable.

Collecting social network data allows us to express trans-active memory both as a dyadic relationship between twopeople, as well as an individual characteristic of each one ofthe members within a team. In section II-A, we explained thatour indirect way of measuring TM is guided by the three basictransactions of directory updating (DU), information allocation(IA) and retrieval coordination (RC). Directory updating isconstructed using one item (Q1), information allocation is con-structed using items Q2, Q3 and finally retrieval coordinationis constructed using items Q4, Q5, and Q6. Therefore, wecalculate TM at the dyadic level according to Eq.(1).

TMdyadic(i,j) =Q1 + µ(Q2 +Q3) + µ(Q4, Q5, Q6)

3(1)

Subsequently, we need a TM score that can be attributedto every person in the network, and which will give us a valuefor the transactive memory at the individual level. We calculatethe individual TM based on the total scores of TMdyadic thatan individual has with his direct connections, divided by thetotal amount of people in the network, N. Using N as thedenominator in Eq.(2), we take into consideration the unequalamount of direct connections that someone has, by weightinghigher those with more connections and rating lower those withfewer connections. For example, if someone has an averageTM score of four but only two direct connections, then hisoverall transactive memory should be lower than someone withthe same average score but 10 direct connections.

TMindividual(i) =

∑ji=1 TMdyadic(i,j)

N(2)

B. Measuring Clustering and Brokerage

In order to identify clusters within the network, we used agraphical representation (referred to as sociograph from nowon) to illustrate the collaboration structure of the project.We draw the sociograph using the spring embedded layoutalgorithm [47]. Based on this algorithm, nodes of the networkthat have small path lengths between them, attract each otherand hence they are closer in the graph [48]. The springembedded layout allows us to get a fair overview of thenetwork and the location of the nodes, by representing whichnodes are closer and which nodes are further apart from eachother.

We also calculated the clustering coefficient of every mem-ber and of the entire network using the methods proposedby Watts [49]. The clustering coefficient of a single memberis calculated based on the density of the member’s localneighbourhood, i.e the extent to which a member’s directconnections are also connected with each other. The clusteringcoefficient of the entire network is the mean of all member’sclustering coefficient.

Finally, to identify the brokers in the network, we used themeasure of betweenness centrality. Betweenness centrality is acommonly used measure in social network analysis to identifycentral members, and more specifically “the extent to whichthe agent can play the part of a broker or gatekeeper” [50].

To calculate betweenness we used the method proposed byNewman [51], which counts the times a person lies in betweenthe shortest communication path of two others.

VI. ANALYSIS & RESULTS

In this section, we answer our research questions andpresent the results of our analysis. First, we examine thecoordination structure of the project by identifying the kindof clusters that emerge. Particularly, we are interested in iden-tifying the common characteristics that drive project memberscloser to each other, creating subgroups. Next, we look at themembers of the network that play the role of brokers, bridgingthe different clusters. We examine the characteristics of thetop brokers such as their role within the project and theircontribution. Finally, we perform a regression analysis in orderto study how brokerage moderates the effect that clustering hason transactive memory.

A. What kind of clusters emerge in the project’s collaborationstructure?

In order to identify the clustering tendency of the network,we begin our analysis with the overall clustering coefficientmeasure. The clustering coefficient of the network is calculatedat 0.57, which indicates a moderate tendency of the projectmembers in subgrouping (clustering coefficient ranges from 1,indicating total clustering to 0, when there are no clusters).Additionally, we plot the sociograph of the project (Fig. 1),using the spring embedded algorithm. To help with the inter-pretation of the network, we applied a colour-based graphicalconvention where every colour represents a partner:

• Partners from Spain: S1 red, S2 blue.

• Partners from Italy: I1 yellow, I2 green.

• Partners from the Netherlands: N1 black, N2 orange.

Fig. 1. Project’s collaboration structure: Overview of clusters

Design

Design

Data Harvesting

Management & Integration

Language Technology

To understand what kind of subgroups emerge, we lookat the sociograph in Fig. 1. The most prominent clusteringtendency is based on the tasks the partners are working on.Particularly, we see that the two partners working together onmanagement and integration tasks (S1, N2) are closer to each

other in the network’s structure. Similarly, the two partnersresponsible for the development of language technologies (S2,N1) also “attract” each other. Finally, the partner responsiblefor data harvesting (I1) creates a separate cluster within thenetwork’s structure.

An interesting observation is the position of the secondpartner from Italy (I2). This partner is represented by onlytwo members and they are responsible for the design andsystem architecture. Architectural decisions affect all otherpartners working on the different parts of the system andconsequently, these two members are positioned between theintegration tasks, data harvesting and the development oflanguage technologies.

Summarising the results, we observe a moderate tendencyin task-based clustering. As described in section III, previousstudies suggest that this type of clusters can be found both incommercial GSD projects where Conway’s law is represented,as well as in open source communities. As Prause et. al. [52]note “it is the nature of EU projects that company, culture, andlanguage boundaries run through the midst of a project task.It is not one partner responsible for requirements engineering,but several. Similarly, several partners will do architecture de-sign, implementation work and testing.”. Therefore, analysingthe coordination structure of this research-driven, GSD project,we observe that there is a moderate tendency for task-basedclustering.

B. Which members play the role of brokers and what are theircharacteristics?

Next, we calculate betweenness centrality for all membersin the project. In order to identify the most important brokersin the network, we look at the top 10 members with the highestbetweenness scores. Fig. 2 illustrates who are those membersand where they are positioned within the network structure.Our first observation is that the brokers are located towardsthe middle of the sociograph, spanning the boundaries betweenthe task-based clusters.

Fig. 2. Project’s top Brokers

Management & Integration

Data Harvesting

Language Technology

L

L

TC

AC

CC

L

L

CC

L

CC

bur

ora

tes

Examining the characteristics of the most important bro-kers, we see that five out of ten are partner leaders (we marked

them in the sociograph with the letter L). Partner leaders areappointed as key members representing each partner, aligningand coordinating the activities of their teams with the restof the project participants. Also, a central member of thenetwork is a person appointed as the technical coordinator ofthe project (marked in the sociograph with the initials TC). Thetechnical coordinator is responsible for keeping an overview ofthe components’ development process and the implementationsolutions. Another interesting observation is the position of theperson appointed as the administrative coordinator (depictedin the sociograph with the letters AC). The administrativecoordinator is responsible for the project’s deadlines (i.e. makesure everybody delivers on time), communicating documentsto the EU commission as well as leading project meetings.

Additionally, we look at the code repository of the projectat GitHub to find which members contribute to the code devel-opment. As Fleming & Waguespack [40] point out “technicalcontribution increases a member’s likelihood of becoming aleader in an open innovation community”. Table III presentsthe contributors for the language technology and the dataharvesting components. We notice that two top contributors(ora and bur) to the development of the language componentsare also among the top brokers of their task-based cluster. Sim-ilarly, the top contributor to the development of the componentsfor data harvesting (tes) is also a top broker in his cluster.

TABLE III. CODE CONTRIBUTORS

Contributors Number of commits

Language Technologycomponents

ora 131bur 125ana 81iga 25

moc 6asi 2civ 1

Data Harvesting com-ponents

tes 50nag 38raf 5

As we described in section IV, brokers can either be formalor informal. Summarising our results, we find that both formaland informal brokers exist within the collaboration structure ofthis project. Particularly, we observe that formal brokers arethose members appointed to coordinate the project activitiesand they have a more “managerial” role, such as team leaders,technical and administrative coordinators. On the other hand,informal brokers emerge among those members that have amore “technical” role within the project. As Lee & Cole [53]also note, in a community-based project roles emerge in theprocess of performing tasks, as opposed to firm-based projectswhere roles are planned and members are assigned to carryout tasks. In the same vein, informal brokers emerge due tothe expertise knowledge that certain members develop throughtheir technical contribution.

C. To what extent do brokers facilitate the development oftransactive memory?

As we described in section III, previous studies suggestthat creating clusters in a GSD environment inhibits thedevelopment of transactive memory. A common approach toeliminate those challenges is the role of brokers. We thereforeexamine the hypothesis that brokers moderate the negativeeffect that clustering has on transactive memory (Fig. 3).

Fig. 3. Hypothesis: Brokers as moderators

Clustering Transactive Memory

Brokerage

To test the hypothesis and answer our last question, weperformed a regression analysis using the following variables:

• Dependent variable: The individual scores of transac-tive memory calculated using Eq. 2 (section V).

• Independent variable: The individual clustering coef-ficient scores that we described in Section V-B.

• Independent variable: The betweenness centralityscores that we use to represent brokerage.

• Control variable: The role of team leader. This is adichotomous variable indicating whether a member isa team leader or not.

• Control variable: The role as code contributor. Thisis also a dichotomous variable indicating whether amember contributed code commits, or not.

A common problem in multiple regression models, and es-pecially in moderation is multicollinearity [54]. Multicollinear-ity exists when there is high correlation between the inde-pendent variables. A popular remedy to multicollinearity is tocenter the independent variables at their mean [55], [56]. Weuse this approach to transform the independent variables of ourmoderation model (betweenness and clustering coefficient).Next, we tested the variance inflation factors (VIF), a measurefor estimating the significance of the correlation between theindependent variables. All VIFs have values lower than 10which is the threshold suggested by [54]. Therefore, we canconclude that there is no significant multicollinearity betweenthe independent variables of our model.

Table IV presents the results of the analysis. Model 1shows that the two control variables of leadership role (β =.135, p > .05) and code contribution (β = .255, p > .05),do not have any significant effect on transactive memory.In Model 2, betweenness has a significant positive effect onTM (β = .350, p < .05), and clustering has a significantnegative effect (β = −.606, p < .001). Model 3 includes theinteraction of betweenness and clustering (interaction BxC),representing the moderation effect and it has a significantpositive effect on TM (β = 1.118, p < .05). These resultsvalidate our hypothesis that brokerage moderates the effect ofclustering on transactive memory.

To further support our conclusions, we apply the PRO-CESS, a method proposed by Hayes [57]. PROCESS is afreely-available regression-based path analysis (for SPSS andSAS) that estimates the model coefficients in mediation andmoderation [57]. Using PROCESS, we can identify how bro-kerage (measured through betweenness) moderates the effectthat clustering has on transactive memory. The results arepresented in Table V. We notice that for low scores ofbetweenness (one standard deviation below the mean) there

TABLE IV. RESULTS OF REGRESSION ANALYSIS ON TRANSACTIVE MEMORY

Model 1 Model 2 Model 3 DescriptivesB SE β B SE β B SE β mean SD

Control VariablesRole .218 .286 .135 -.250 .184 -.155 -.292 .162 -.181

Code Contributor .313 .217 .255 -.134 .141 -.109 -.156 .124 -.127

Independent Variables

Betweenness .011 .004 .350* .058 .015 1.802** 13.08 18.69

Clustering -1.875 .423 -.606** .974 .969 .315* .57 .19

interaction BxC 0.168 .053 1.118* 5.16 6.10Adj-R2 .004 .647 .726∆R2 .060 .627** .077*F-statistics 1.08 17.49** 20.03**∗p < .05, ∗ ∗ p < .001

is a significant negative effect of clustering on transactivememory (β = −1.20, p < .001). At the mean levels ofbetweenness, the effect is positive, although not significant(β = .89, p > .05). At high scores of betweenness (onestandard deviation above the mean), the effect is significantlypositive (β = 3.88, p < .05). In other words, we find thatamong clustered members, those who also act as brokers (i.e.those with high betweenness scores) will be better aware ofwhere expertise resides within the network, how to access,share and store expertise knowledge, than those who do notact as brokers (i.e. those with lower betweenness scores).

TABLE V. MODERATION EFFECT ON TRANSACTIVE MEMORY

Betweenness Effect.00 -1.20**

(.42)

13.08 .89(.98)

31.77 3.88*(1.95)

Bringing everything together, we conclude that brokeragemoderates the negative effect that clustering has on the devel-opment of transactive memory. Our results suggest that mem-bers who are more often “in-between” many others, within aclustered network, will overcome the knowledge managementchallenges brought by the clusters and they will develop bettertransactive memory.

VII. THREATS TO VALIDITY

The credibility of a study refers to the degree of confidencewe have on its conclusions [58]. Although we have addressedthe validity of our study from the beginning of this research, itis important to point out several aspects that might be a threat.We do so for two reasons; first because we want to establishthe integrity and credibility of our observations and results,and second for suggesting future improvements.

To tackle internal validity in our study, we carefully col-lected our social network data and achieved a 92% responserate for an accurate representation of the collaboration network.Nevertheless, we do not take into account certain individualattributes of the participants, such as the background knowl-edge and experience. For instance, academic partners often

participate in EU-funded projects where their main focus ison research, rather than on the technical contribution. Culturalaspects might also influence people’s responses. In this paper,we do not take these aspects into account but we suggest theirconsideration for the future. Additionally, real networks arenot static, but rather dynamic. Relationships between peoplechange and as such, communication networks evolve. In thecurrent study, we examine only a snapshot of the networks.An interesting approach for a subsequent study is to comparethe development of transactive memory over time and observenetworks’ evolution.

While we report on the reliability analysis for the latentvariable model of TM and the multicollinearity tests of theregression model, we recognize that there might be furtherthreats to construct validity. For instance, identifying thedomain expertise of every member in the network (i.e. thetype of specialized knowledge that each possess) can improvethe operationalization model of TM. We propose thereforethat the theoretical concepts and the variables used in theoperationalization of the model be further re-evaluated, tested,validated and potentially improved.

Finally, there might be factors posing a threat to exter-nal validity for our case study. Although, we discuss thedifferences that an EU-funded project might have comparedto commercial or OSS projects, our observations might beinfluenced by particular and distinctive characteristics of thepartners. For instance, some partners might have experienceworking together in other projects, and consequently theircollaboration might differ from partners who work togetherfor the first time. Also, EU projects’ life time might vary fromone to approxiamtely four years, which also influences howwell team members get to know each other. Whereas “a theorymay never be generalized to a setting where it has not yet beenempirically tested and confirmed” [59], we suggest that usingadditional case studies to the research can eliminate the threatto external validity.

VIII. CONCLUSION

With the evergrowing body of knowledge managementchallenges in global software development, solutions are ofparamount importance. In this paper, we present the role ofbrokers as a solution to overcome challenges in GSD, and par-ticularly to facilitate the development of transactive memory.TM is of special interest in knowledge-driven projects due to

the specific focus on expertise knowledge that team memberspossess, rather than general knowledge that everybody shares.Particularly, we used social network data collected from amulti-site, EU-funded research project and we have answeredthree research questions:

1) What kind of clusters emerge in the project’s collab-oration structure? The results show that in our casestudy, there is a moderate tendency for task-basedclustering.

2) Which members play the role of brokers and whatare their characteristics? We find that brokers (des-ignated or emergent) are members responsible forthe overall coordination of the project, as well asmembers with high technical contributions.

3) To what extent do brokers facilitate the developmentof transactive memory? Our empirical evaluationshows that clustering has a negative effect on TMand that brokers can moderate that effect.

From a practical point of view, we provide an analyticalmethod to identify clusters created within multi-site collabora-tion networks, as well as the brokers that emerge between thoseclusters. Considering the importance of brokers in overcomingknowledge management challenges, practitioners may benefitfrom our results to foster this particular role. Likewise, teamscan analyse their structure and composition and decide whetherto assign formal brokers or encourage the emergence ofinformal brokers.

From a theoretical perspective, we add to the body ofknowledge management challenges in GSD by presenting therole of brokers as facilitators on the development of transactivememory. As GSD research evolves and the attention to knowl-edge management challenges increases, solutions such as usingbrokers can benefit future studies. Additionally, we analyse anEU-funded project, which is a multi-site software developmentproject with mainly research rather than commercial interests.Up to date, research in GSD focus either on commercial andpropretiaty software, or on OSS. As Prause et. al. note [52],these research projects can be considered a third type of GSDprojects besides industrial and open source and they exhibitdifferent characteristics. In this study for instance, we foundthat members of a research-driven project tend to create task-based clusters, rather than location-based clusters. We alsoobserved that there can be both formal brokers with leadershippositions (a characteristic commonly found in industrial caseswith appointed team leaders), as well as informal brokersdefined based on their technical contribution (a characteristiccommonly found in OSS projects).

REFERENCES

[1] M. E. Conway, “How do committees invent,” Datamation, vol. 14, no. 4,pp. 28–31, 1968.

[2] J. D. Herbsleb and R. E. Grinter, “Splitting the organization andintegrating the code: Conway’s law revisited,” in Proceedings of the21st international conference on Software engineering. ACM, 1999,pp. 85–95.

[3] ——, “Architectures, coordination, and distance: Conway’s law andbeyond,” Software, IEEE, vol. 16, no. 5, pp. 63–70, 1999.

[4] D. Damian, L. Izquierdo, J. Singer, and I. Kwan, “Awareness in thewild: Why communication breakdowns occur,” in Global Software En-gineering, 2007. ICGSE 2007. Second IEEE International Conferenceon. IEEE, 2007, pp. 81–90.

[5] S. Marczak and D. Damian, “How interaction between roles shapesthe communication structure in requirements-driven collaboration,” inRequirements Engineering Conference (RE), 2011 19th IEEE Interna-tional. IEEE, 2011, pp. 47–56.

[6] C. Manteli, H. v. Vliet, and B. v. d. Hooff, “Adopting a social networkperspective in global software development,” in Global Software En-gineering (ICGSE), 2012 IEEE Seventh International Conference on.IEEE, 2012, pp. 124–133.

[7] K. T. Chang and K. Ehrlich, “Out of sight but not out of mind?: Informalnetworks, communication and media use in global software teams,” inProceedings of the 2007 conference of the center for advanced studieson Collaborative research. IBM Corp., 2007, pp. 86–97.

[8] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Latentsocial structure in open source projects,” in Proceedings of the 16thACM SIGSOFT International Symposium on Foundations of softwareengineering. ACM, 2008, pp. 24–35.

[9] J. D. Herbsleb and A. Mockus, “An empirical study of speed andcommunication in globally distributed software development,” SoftwareEngineering, IEEE Transactions on, vol. 29, no. 6, pp. 481–494, 2003.

[10] D. M. Wegner, “Transactive memory: A contemporary analysis of thegroup mind,” in Theories of group behavior. Springer, 1987, pp. 185–208.

[11] S. Faraj and L. Sproull, “Coordinating expertise in software develop-ment teams,” Management science, vol. 46, no. 12, pp. 1554–1568,2000.

[12] A. E. Milewski, M. Tremaine, F. Kobler, R. Egan, S. Zhang, andP. O’Sullivan, “Guidelines for effective bridging in global softwareengineering,” Software Process: Improvement and Practice, vol. 13,no. 6, pp. 477–492, 2008.

[13] N. Levina and E. Vaast, “Innovating or doing as told? status differencesand overlapping boundaries in offshore collaboration,” Mis Quarterly,vol. 32, no. 2, pp. 307–332, 2008.

[14] S. Mohammed and B. C. Dumville, “Team mental models in a teamknowledge framework: Expanding theory and measurement acrossdisciplinary boundaries,” Journal of Organizational Behavior, vol. 22,no. 2, pp. 89–106, 2001.

[15] A. B. Hollingshead, “Communication, learning, and retrieval in trans-active memory systems,” Journal of experimental social psychology,vol. 34, no. 5, pp. 423–442, 1998.

[16] X. Chen, X. Li, J. G. Clark, and G. B. Dietrich, “Knowledge sharingin open source software project teams: A transactive memory systemperspective,” International Journal of Information Management, 2013.

[17] J. A. Espinosa, S. A. Slaughter, R. E. Kraut, and J. D. Herbsleb, “Teamknowledge and coordination in geographically distributed softwaredevelopment,” Journal of Management Information Systems, vol. 24,no. 1, pp. 135–169, 2007.

[18] J. R. Austin, “Transactive memory in organizational groups: the effectsof content, consensus, specialization, and accuracy on group perfor-mance.” Journal of Applied Psychology, vol. 88, no. 5, p. 866, 2003.

[19] K. Lewis, “Measuring transactive memory systems in the field: scaledevelopment and validation.” Journal of Applied Psychology, vol. 88,no. 4, p. 587, 2003.

[20] Y. Yoo and P. Kanawattanachai, “Developments of transactive memorysystems and collective mind in virtual teams,” International Journal ofOrganizational Analysis, vol. 9, no. 2, pp. 187–208, 2001.

[21] K. Lewis and B. Herndon, “Transactive memory systems: Current issuesand future research directions,” Organization Science, vol. 22, no. 5, pp.1254–1265, 2011.

[22] Y. Ren and L. Argote, “Transactive memory systems 1985–2010: An in-tegrative framework of key dimensions, antecedents, and consequences,”The Academy of Management Annals, vol. 5, no. 1, pp. 189–229, 2011.

[23] S. P. Borgatti and R. Cross, “A relational view of information seekingand learning in social networks,” Management science, vol. 49, no. 4,pp. 432–445, 2003.

[24] R. Noordeloos, C. Manteli, and H. van Vliet, “From rup to scrumin global software development: A case study,” in Global SoftwareEngineering (ICGSE), 2012 IEEE Seventh International Conference on.IEEE, 2012, pp. 31–40.

[25] C. Manteli, B. van den Hooff, and H. van Vliet, “The effect ofgovernance on global software development: An empirical research in

transactive memory systems,” Information and Software Technology,2014.

[26] M. Cataldo and J. D. Herbsleb, “Communication patterns in geograph-ically distributed software development and engineers’ contributionsto the development effort,” in Proceedings of the 2008 internationalworkshop on Cooperative and human aspects of software engineering.ACM, 2008, pp. 25–28.

[27] J. N. Cummings and R. Cross, “Structural properties of work groups andtheir consequences for performance,” Social Networks, vol. 25, no. 3,pp. 197–210, 2003.

[28] T. Wolf, T. Nguyen, and D. Damian, “Does distance still matter?”Software Process: Improvement and Practice, vol. 13, no. 6, pp. 493–510, 2008.

[29] C. Ebert and P. De Neve, “Surviving global software development,”Software, IEEE, vol. 18, no. 2, pp. 62–69, 2001.

[30] C. Del Rosso, “Comprehend and analyze knowledge networks toimprove software evolution,” Journal of Software Maintenance andEvolution: Research and Practice, vol. 21, no. 3, pp. 189–215, 2009.

[31] G. Madey, V. Freeh, and R. Tynan, “The open source software devel-opment phenomenon: An analysis based on social network theory,” inAmericas conf. on Information Systems (AMCIS2002), 2002, pp. 1806–1813.

[32] K. Crowston and J. Howison, “The social structure of free and opensource software development,” First Monday, vol. 10, no. 2, 2005.

[33] J. Xu, S. Christley, and G. Madey, “Application of social networkanalysis to the study of open source software,” pp. 205–224, 2006.

[34] S. S. Levine and R. Kurzban, “Explaining clustering in social networks:Towards an evolutionary theory of cascading benefits,” Managerial andDecision Economics, vol. 27, no. 2-3, pp. 173–187, 2006.

[35] P. Rogers and M. Lea, “Social presence in distributed group en-vironments: The role of social identity,” Behaviour & InformationTechnology, vol. 24, no. 2, pp. 151–158, 2005.

[36] P. Williams, “The competent boundary spanner,” Public administration,vol. 80, no. 1, pp. 103–124, 2002.

[37] J. M. Podolny and J. N. Baron, “Resources and relationships: Socialnetworks and mobility in the workplace,” American sociological review,pp. 673–693, 1997.

[38] A. Johri, “Boundary spanning knowledge broker: An emerging role inglobal engineering firms,” in Frontiers in Education Conference, 2008.FIE 2008. 38th Annual. IEEE, 2008, pp. S2E–7.

[39] S. Sowe, I. Stamelos, and L. Angelis, “Identifying knowledge brokersthat yield software engineering knowledge in oss projects,” Informationand Software Technology, vol. 48, no. 11, pp. 1025–1033, 2006.

[40] L. Fleming and D. M. Waguespack, “Brokerage, boundary spanning,and leadership in open innovation communities,” Organization Science,vol. 18, no. 2, pp. 165–180, 2007.

[41] A. Boden and G. Avram, “Bridging knowledge distribution-the roleof knowledge brokers in distributed software development teams,”in Cooperative and Human Aspects on Software Engineering, 2009.CHASE’09. ICSE Workshop on. IEEE, 2009, pp. 8–11.

[42] I. Carboni and K. Ehrlich, “The effect of relational and team character-istics on individual performance: A social network perspective,” HumanResource Management, vol. 52, no. 4, pp. 511–535, 2013.

[43] L. Hossain, A. Wu, and K. K. Chung, “Actor centrality correlates toproject based coordination,” in Proceedings of the 2006 20th anniver-sary conference on Computer supported cooperative work. ACM,2006, pp. 363–372.

[44] Y. C. Yuan and G. Gay, “Homophily of network ties and bonding andbridging social capital in computer-mediated distributed teams,” Journalof Computer-Mediated Communication, vol. 11, no. 4, pp. 1062–1084,2006.

[45] P. R. Carlile, “A pragmatic view of knowledge and boundaries: Bound-ary objects in new product development,” Organization science, vol. 13,no. 4, pp. 442–455, 2002.

[46] L. J. Cronbach, “Coefficient alpha and the internal structure of tests,”Psychometrika, vol. 16, no. 3, pp. 297–334, 1951.

[47] S. Borgatti, M. G. Everett, and L. Freeman, “Ucinet 6,” AnalyticTechnologies, 2006.

[48] R. A. Hanneman and M. Riddle, “Introduction to social networkmethods,” 2005.

[49] D. J. Watts, Small worlds: the dynamics of networks between order andrandomness. Princeton university press, 1999.

[50] J. Scott and P. J. Carrington, The SAGE handbook of social networkanalysis. SAGE publications, 2011.

[51] M. E. Newman, “A measure of betweenness centrality based on randomwalks,” Social networks, vol. 27, no. 1, pp. 39–54, 2005.

[52] C. R. Prause, R. Reiners, and S. Dencheva, “Empirical study of toolsupport in highly distributed research projects,” in Global SoftwareEngineering (ICGSE), 2010 5th IEEE International Conference on.IEEE, 2010, pp. 23–32.

[53] G. K. Lee and R. E. Cole, “From a firm-based to a community-based model of knowledge creation: The case of the linux kerneldevelopment,” Organization Science, vol. 14, no. 6, pp. 633–649, 2003.

[54] R. H. Myers, Classical and modern regression with applications.Duxbury Press Belmont, CA, 1990, vol. 2.

[55] L. J. Cronbach, “Statistical tests for moderator variables: Flaws inanalyses recently proposed.” 1987.

[56] J. Jaccard, C. K. Wan, and R. Turrisi, “The detection and interpretationof interaction effects between continuous variables in multiple regres-sion,” Multivariate Behavioral Research, vol. 25, no. 4, pp. 467–478,1990.

[57] A. F. Hayes, Introduction to mediation, moderation, and conditionalprocess analysis: A regression-based approach. Guilford Press, 2013.

[58] D. E. Perry, A. A. Porter, and L. G. Votta, “Empirical studies of softwareengineering: a roadmap,” in Proceedings of the conference on The futureof Software engineering. ACM, 2000, pp. 345–355.

[59] A. S. Lee and R. L. Baskerville, “Generalizing generalizability ininformation systems research,” Information systems research, vol. 14,no. 3, pp. 221–243, 2003.

Documents

[IEEE 2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS) - Marrakech, Morocco (2014.5.28-2014.5.30)] 2014 IEEE Eighth International Conference