Multilevel Collaboration between Software Developers and the Impact of Proximity: an Early, Preliminary Work

Dawn Foster, Guido Conaldi, Riccardo De Vita
University of Greenwich, Centre for Business Network Analysis
http://www.gre.ac.uk/business/research/centres/cbna/home



Goals for Today

Very early work, seeking feedback on:
•  Best approaches for incorporating multilevel concepts.
•  Fitting a suitable model for multilevel networks.
•  What we have done so far.

Research Overview

How do participants who are paid by firms collaborate within a fluid organization?

Proximity theory as a theoretical framework:
•  to understand intraorganizational collaboration
•  within fluid organizations
•  using an open source software project, the Linux kernel, as the empirical setting.

Contributions

Contribute to the literature on fluid organizations by:
•  Determining the impact of firm affiliation on intraorganizational collaboration between individuals in fluid organizations.
–  Existing studies on open source mostly focus on individual motivations.
–  Firms can influence the collaboration of their employees.
•  Demonstrating that proximity theory can be used to better understand collaboration within fluid organizations.
–  Boschma’s (2005) five dimensions should further our understanding.
–  Most proximity studies are interorganizational; fluid boundaries blur the distinction between inter- and intraorganizational collaboration.

As fluid organizations become more common, understanding collaboration within them is increasingly important.

Fluid Organizations

•  In fluid organizations, the boundaries and structures allow fluid movement within the organization as individuals collaborate to coordinate activities (Ashkenas et al., 2002; Glance & Huberman, 1994).
•  Some fluid organizations are based on global virtual work across many time zones by people from different backgrounds (Nurmi & Hinds, 2016) and may include individuals from different firms and different types of institutions (O’Mahony & Bechky, 2008).
•  Collaboration, especially within fluid organizations, crosses dimensions of proximity, including cognitive, organizational, social, institutional and geographical, which can be used to better understand collaboration (Balland, 2012; Boschma, 2005; Cantner & Graf, 2006; Crescenzi, Nathan, & Rodríguez-Pose, 2016; Knoben & Oerlemans, 2006).

Proximity Theory

•  Social proximity: relations between actors with trust coming from friendship and experience (Boschma, 2005).
•  Institutional proximity: whether individuals collaborate more with others in a similar institutional setting, such as a corporation, non-profit, university, or no affiliation (Balland, 2012; Crescenzi et al., 2013).
•  Organizational proximity: relationships within an organizational structure (Boschma, 2005), used to look at collaboration within and between organizations.
•  Cognitive proximity: similarity of frames of reference and knowledge (Knoben & Oerlemans, 2006).
•  Geographic proximity: physical, spatial distance between actors (Boschma, 2005). Online, geographical proximity is often irrelevant, but others have used a temporal measure (time zones) (O’Leary & Cummings, 2007).

Empirical Setting: Open Source

•  Open source is frequently studied as a fluid organization (e.g. Chen & O’Mahony, 2009; O’Mahony & Bechky, 2008; Puranam et al., 2014).
•  Contributions are made by individuals, not firms (O’Mahony, 2007), but firms are increasingly paying employees to contribute as a way to participate (Jensen & Scacchi, 2007; Roberts et al., 2006).
•  Linux kernel (Corbet & Kroah-Hartman, 2016):
–  < 8% of contributions by unpaid software developers
–  Neutral project, competing companies participate
–  22 million lines of code
–  14,000 developers
–  1,300 organizations

[Figure: software stack with the layers Computer Hardware (CPU, memory, disk), Linux Kernel, Linux Operating System (Red Hat, Ubuntu) and Applications (web browser, office); side labels "System only" and "User facing"]

Collaboration Network

•  Network ties: mailing lists, where ego replies to alter.
–  Collaboration for code review, patch feedback, bugs and discussions takes place on mailing lists before source code is accepted into the repository.
•  “The mailing lists are still the primary communications space.”
•  “All of our collaboration happens over discussing patches.”

[Figure: collaboration network, 10 mailing lists, 2015-01-27, 90 days, k-core >= 10]
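The tie definition above (ego replies to alter) and the k-core filter in the figure caption can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the message fields (`id`, `sender`, `in_reply_to`), the function names and the sample data are all hypothetical, and computing the k-core on the undirected version of the reply network is an assumption.

```python
from collections import defaultdict

def reply_ties(messages):
    """Count directed ties (ego, alter): ego replied to a message written by alter."""
    author_of = {m["id"]: m["sender"] for m in messages}
    ties = defaultdict(int)
    for m in messages:
        parent = m.get("in_reply_to")
        # Skip thread starters, unknown parents and self-replies.
        if parent in author_of and author_of[parent] != m["sender"]:
            ties[(m["sender"], author_of[parent])] += 1
    return dict(ties)

def k_core(ties, k):
    """Actors in the k-core of the undirected version of the reply network."""
    neighbors = defaultdict(set)
    for ego, alter in ties:
        neighbors[ego].add(alter)
        neighbors[alter].add(ego)
    nodes = set(neighbors)
    changed = True
    while changed:
        changed = False
        # Repeatedly drop actors with fewer than k remaining neighbors.
        for n in list(nodes):
            if len(neighbors[n] & nodes) < k:
                nodes.discard(n)
                changed = True
    return nodes

# Hypothetical toy thread data.
messages = [
    {"id": "m1", "sender": "alice", "in_reply_to": None},
    {"id": "m2", "sender": "bob", "in_reply_to": "m1"},
    {"id": "m3", "sender": "carol", "in_reply_to": "m1"},
    {"id": "m4", "sender": "alice", "in_reply_to": "m2"},
    {"id": "m5", "sender": "carol", "in_reply_to": "m2"},
    {"id": "m6", "sender": "dave", "in_reply_to": "m5"},
]
ties = reply_ties(messages)
core = k_core(ties, 2)
```

In the toy data, dave has a single neighbor and drops out of the 2-core, while alice, bob and carol remain.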

Multilevel Network

•  Individual / Organizational / Mailing List Levels
–  Employers pay developers to enable the firm’s products, gain influence and set direction, share information, and more.
–  Most consider affiliation with the Linux kernel community to be more important than their employer.
–  Almost all contributions come from paid software developers.
–  Collaboration occurs in 200+ mailing lists simultaneously.
•  How does firm affiliation with an organization shape collaboration of individuals?
•  How do mailing lists enable collaboration?

Operationalizing Proximity

Using Boschma’s (2005) 5 dimensions of proximity
•  Organizational:
–  Operationalized as firm affiliation (company) or unaffiliated (hobbyist, etc.)
•  Cognitive:
–  Usually measured based on shared knowledge / technologies
–  Operationalized as contributing to areas of the source code (subsystems)
•  Geographic:
–  Usually measured based on physical location, less relevant for online collaboration.
–  Operationalized using time zones (temporal geographic proximity)
•  Institutional:
–  Operationalized based on employment by firm, academia, or unaffiliated
•  Social:
–  Often measured using collaboration network (seems like double counting)
–  Operationalized by # of times dyad participated in same mailing list threads
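A rough sketch of how the five operationalizations above could be turned into dyad-level covariates. Everything here is illustrative: the record fields (`org`, `inst_type`, `subsystems`, `utc_offset`), the function names and the toy values are hypothetical, counting shared subsystems is only one possible cognitive measure, and the geographic value is a distance in hours (larger means less proximate).

```python
def tz_distance(offset_a, offset_b):
    """Hours between two UTC offsets, wrapping around the 24-hour clock."""
    diff = abs(offset_a - offset_b) % 24
    return min(diff, 24 - diff)

def dyad_proximity(a, b, threads):
    """Dyad-level proximity covariates for two developer records."""
    shared_threads = sum(
        1 for members in threads if a["name"] in members and b["name"] in members
    )
    return {
        # Same employer; unaffiliated developers (org is None) never match.
        "organizational": int(a["org"] is not None and a["org"] == b["org"]),
        # Number of source code subsystems both developers contribute to.
        "cognitive": len(a["subsystems"] & b["subsystems"]),
        # Time zone difference in hours (a distance, not a similarity).
        "geographic": tz_distance(a["utc_offset"], b["utc_offset"]),
        # Same type of institution: firm, academia or unaffiliated.
        "institutional": int(a["inst_type"] == b["inst_type"]),
        # Number of mailing list threads both took part in.
        "social": shared_threads,
    }

# Hypothetical developers and thread memberships.
alice = {"name": "alice", "org": "FirmA", "inst_type": "firm",
         "subsystems": {"net", "mm"}, "utc_offset": -8}
bob = {"name": "bob", "org": "FirmA", "inst_type": "firm",
       "subsystems": {"net", "fs"}, "utc_offset": 1}
threads = [{"alice", "bob"}, {"alice", "carol"}, {"alice", "bob", "carol"}]

proximity = dyad_proximity(alice, bob, threads)
```

Keeping the social measure (shared threads) separate from the reply ties themselves is one way to limit the double counting noted above, although the two remain closely related.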

Dataset

•  Subset for testing multilevel analysis: 2 years
•  Dates:
–  2013-11-01 (complete dataset: 2006-03-20 first LTS release)
–  2015-11-01, date of the 4.3 release
–  15, 30, 45, 60, 75, 90 day moving windows
•  Mailing Lists:
–  19 of the top 20 mailing lists (out of over 200); the top mailing list was excluded
–  226,919 messages (out of 2,818,774 for top 20, all dates)
•  Source Code:
–  Linux-stable tree
–  177,113 commits (out of 603,006 for all dates)
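The moving windows listed above could be built along these lines. The slides do not say how the windows are anchored, so this sketch assumes trailing windows that end on a given date and include it; the function names and the sample events are hypothetical.

```python
from datetime import date, timedelta

WINDOW_DAYS = (15, 30, 45, 60, 75, 90)

def trailing_window(end, days):
    """Return (start, end) for a window of `days` days ending on `end`."""
    return end - timedelta(days=days), end

def events_in_window(events, end, days):
    """Events with start < date <= end, i.e. inside the trailing window."""
    start, stop = trailing_window(end, days)
    return [e for e in events if start < e["date"] <= stop]

# Hypothetical commit or message dates.
events = [
    {"date": date(2015, 10, 31)},
    {"date": date(2015, 10, 10)},
    {"date": date(2015, 8, 15)},
    {"date": date(2015, 7, 1)},
]
end = date(2015, 11, 1)
counts = [len(events_in_window(events, end, d)) for d in WINDOW_DAYS]
```

The same helper can be applied to commits (for the cognitive measure) and to messages (for the social measure) so that both use identical window boundaries.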

Relational Event Models

•  Relational event models provide a “highly flexible framework for modeling actions within social settings, which permits likelihood-based inference for behavioral mechanisms with complex dependence.” (Butts, 2008, p. 155)
•  Based on relational events, or actions generated by a sender directed toward a receiver. Represented by sender, receiver, action type and time (Butts, 2008).
•  Mailing list data with a time stamp for each message provides useful data for relational event models.
•  Each reply to a mailing list post can be thought of as an event created by a sender targeted at a receiver.
•  Used to explain the likelihood of collaboration between 2 developers given the influence of dimensions of proximity and other effects.
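To make the likelihood idea concrete, the sketch below computes an ordinal (order-only) relational event log-likelihood in the spirit of Butts (2008): each directed dyad gets a log-rate that is a weighted sum of statistics, and each observed event contributes its rate divided by the summed rates of all dyads that could have occurred. This is a toy under stated assumptions, not the relevent implementation: the two statistics (same organization, direct reply to the previous event), the actor names and the coefficients are hypothetical, and every ordered pair of actors is assumed to be at risk at every step.

```python
import math
from itertools import permutations

def log_rate(theta, stats):
    """Log-rate of a dyad: weighted sum of its statistics."""
    return sum(t * s for t, s in zip(theta, stats))

def ordinal_loglik(events, actors, theta, statistics):
    """Ordinal log-likelihood of an ordered list of (sender, receiver) events."""
    loglik = 0.0
    history = []
    for sender, receiver in events:
        # Log-rates for every directed dyad that could have produced this event.
        rates = {
            (s, r): log_rate(theta, statistics(s, r, history))
            for s, r in permutations(actors, 2)
        }
        log_total = math.log(sum(math.exp(v) for v in rates.values()))
        loglik += rates[(sender, receiver)] - log_total
        history.append((sender, receiver))
    return loglik

# Hypothetical actors, affiliations and event sequence.
actors = ["alice", "bob", "carol"]
org = {"alice": "FirmA", "bob": "FirmA", "carol": "FirmB"}
events = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]

def example_stats(s, r, history):
    same_org = 1.0 if org[s] == org[r] else 0.0
    # 1 if this dyad would be a direct reply to the previous event.
    reply = 1.0 if history and history[-1] == (r, s) else 0.0
    return [same_org, reply]

ll_null = ordinal_loglik(events, actors, [0.0, 0.0], example_stats)
ll_reply = ordinal_loglik(events, actors, [0.0, 1.0], example_stats)
```

With all coefficients at zero every dyad is equally likely, so `ll_null` equals three times log(1/6). Because the denominator sums over all ordered pairs at every event, the cost grows with the square of the number of actors times the number of events.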

Effects: Dyadic P-Shifts, Recency

Illustrations by Carter Butts, Sunbelt 2015
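Since the illustrations did not survive, here is a small sketch of what these two kinds of effects capture. Dyadic participation shifts classify each event relative to the one before it (for example AB-BA: B replies to A right after A addressed B). Recency is sketched here as an inverse rank, which is an assumption about the form of the effect; the function names and toy events are hypothetical.

```python
def dyadic_pshift(previous, current):
    """Classify `current` relative to `previous`; both are (sender, receiver)."""
    a, b = previous
    s, r = current
    if (s, r) == (a, b):
        return "AB-AB"  # same dyad repeated, no shift
    if s == b:
        return "AB-BA" if r == a else "AB-BY"  # previous receiver takes the turn
    if s == a:
        return "AB-AY"  # previous sender continues with a new target
    if r == a:
        return "AB-XA"  # a new sender addresses the previous sender
    if r == b:
        return "AB-XB"  # a new sender addresses the previous receiver
    return "AB-XY"      # a new sender addresses a new target

def recency_effect(history, sender, receiver):
    """1/rank of `receiver` among those who most recently sent to `sender`."""
    recent = []
    for s, r in reversed(history):
        if r == sender and s not in recent:
            recent.append(s)
    return 1.0 / (recent.index(receiver) + 1) if receiver in recent else 0.0

# Hypothetical event sequence of (sender, receiver) pairs.
events = [("alice", "bob"), ("bob", "alice"), ("bob", "carol"),
          ("dave", "bob"), ("dave", "alice")]
shifts = [dyadic_pshift(p, c) for p, c in zip(events, events[1:])]

history = [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]
recency_to_carol = recency_effect(history, "alice", "carol")
recency_to_bob = recency_effect(history, "alice", "bob")
```

In the recency example carol wrote to alice most recently, so a reply from alice to carol gets the larger value.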

Results: A Series of Difficulties

•  The relational event model (REM) struggled with the number of events:
–  Reduced to the first 500 events (1.5 days) to get the model to run (used the first 200 events as control, ran the model with 300 events).
–  Takes 6+ hours to estimate 600 events (3 days) on a big server.
–  Might have to do with the way we are loading variables into the model.
–  Possible other limitations of the REM / relevent software.


Preliminary Results

•  Model not yet complete: testing the waters now.
–  The tiny number of events won’t represent the whole.
–  Missing variables are likely to change these results.
–  Need to analyze per mailing list (mailing list level).
•  Proximity looks promising as a theoretical framework:
–  Organizational proximity: less likely to reply to other employees of the same firm. Do they use internal corporate channels to collaborate?
–  Cognitive proximity: more likely to reply to people working in the same areas of code.
–  Geographic proximity: less likely to reply as the time zone difference increases.

Future Developments / Feedback

•  We know the model has issues:
–  Get feedback on what we have done so far and on fitting a suitable model for multilevel networks.
•  Multilevel: both aspects need to be developed:
–  Multilevel analysis of networks: multiple mailing lists at the same time (like classrooms within schools).
    •  Mailing lists as levels? How do we do this?
–  Analysis of multilevel networks: complex models for networks, modeling organizational affiliation as a level.
    •  Can we treat organizations as a level, instead of as an attribute of developers?
    •  Need to look at the org level to see interactions by organization.
•  Relational Event Models:
–  Options for modeling large event sequences in networks.
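As a first look at the organizational level, developer-level reply events can be projected onto their employers. This is only an aggregation sketch, not a multilevel model: the names, the affiliation table and the "unaffiliated" label for developers without a known employer are hypothetical.

```python
from collections import Counter

def org_level_ties(events, affiliation):
    """Aggregate (sender, receiver) events into (sender_org, receiver_org) counts."""
    ties = Counter()
    for sender, receiver in events:
        sender_org = affiliation.get(sender, "unaffiliated")
        receiver_org = affiliation.get(receiver, "unaffiliated")
        ties[(sender_org, receiver_org)] += 1
    return ties

# Hypothetical affiliations and reply events.
affiliation = {"alice": "FirmA", "bob": "FirmA", "carol": "FirmB"}
events = [("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("dave", "carol")]

org_ties = org_level_ties(events, affiliation)
within = sum(n for (src, dst), n in org_ties.items() if src == dst)
within_share = within / sum(org_ties.values())
```

A projection like this treats the organization as an attribute of developers; treating it as a level in its own right would need a model in which organization-level ties are modeled alongside the individual-level events.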

Thank You and Questions

Authors: Dawn M. Foster, Guido Conaldi, Riccardo De Vita
University of Greenwich, Centre for Business Network Analysis
http://www.gre.ac.uk/business/research/centres/cbna