The technical impact of social links in free software development

Rishab Aiyer Ghosh, rishab@dxm.org

MERIT/Infonomics, University of Maastricht

Oxford Workshop on Libre Software, Oxford Internet Institute, June 25, 2004

Oxford, June 25, 2004 © Rishab Aiyer Ghosh / MERIT

Some data on the Linux kernel

- number of developers, share of sub-modules

Some data on the Linux kernel

- number of sub-modules per author

Some data on the Linux kernel

- number of co-authors per author

Social choice of modules?

Most modules have just a few authors

Most authors have contributed to just one module

However, these lone contributions are not made to 1-author modules

New contributors choose modules with many other contributors

Mean developers per module

[Figure: histogram. x-axis: avcodv25 IF nz25=1 (else MISSING), ticks from 100 to 400; y-axis: Percent, 0% to 15%.]

Mean number of “co-developers” per module for 1-module contributors: over 45% have more than 200 co-developers per module.

Mean developers per module

Mean number of “co-developers” per module:

- for 1-module contributors: over 45% have more than 200 co-developers per module.

- for 2-module contributors (red overlay): only a slight change, i.e. the second module contributed to is not much smaller than the first.
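The statistic on this slide can be sketched in a few lines of Python. This is only an illustration: the authorship records are invented, the helper name `codeveloper_stats` is hypothetical, and it assumes that a "co-developer" means any other author of the same module.

```python
from statistics import mean

# Hypothetical authorship records, invented for illustration
# (NOT the kernel data behind the slides).
authors_by_module = {
    "net": {"alice", "bob", "carol"},
    "fs": {"bob", "dave"},
    "mm": {"erin"},
}

def codeveloper_stats(authors_by_module):
    """Map each author to (modules contributed to, mean co-developers per module).

    Assumption: a co-developer is any other author of the same module.
    """
    counts = {}
    for authors in authors_by_module.values():
        for author in authors:
            counts.setdefault(author, []).append(len(authors) - 1)
    return {author: (len(c), mean(c)) for author, c in counts.items()}

stats = codeveloper_stats(authors_by_module)

# Restrict to 1-module contributors, as on the slide.
one_module = {a: m for a, (n, m) in stats.items() if n == 1}
```

A histogram of the values in `one_module` would correspond to the first distribution described above; filtering on `n == 2` would give the overlay.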

Mean developers per module

Correlation: number of modules authored by average number of developers per module

Correlation (avdev)

v1.0 (n=158) -0.225

v2.0.30 (n=618) -0.222

v2.5.25 (n=2263) -0.145

(Pearson 2-tailed: correlations are significant at the 0.01 level)

Developer numbers / module size

Contributors don’t necessarily know how many co-authors they have / will have for a given module

Code size of module is easily available

Size is an explicit proxy for “importance”

Hypothesis: developers are attracted to “important” projects, partly because they have many other developers

Developer numbers / module size

Module size highly correlated to number of authors

Correlation

v1.0 (n=30) 0.890

v2.0.30 (n=60) 0.892

v2.5.25 (n=169) 0.894

(Pearson 2-tailed: correlations are significant at the 0.01 level)
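For readers who want to reproduce this kind of figure, a minimal sketch of the Pearson coefficient follows. The per-module numbers are invented for illustration, and `pearson_r` is a hypothetical helper, not the tooling used for the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-module figures (NOT the kernel data from the slides).
module_size_kb = [12, 40, 95, 300, 820]
author_count = [1, 3, 4, 11, 25]

r = pearson_r(module_size_kb, author_count)
```

With one observation per module, n on the slide would be the number of modules in that kernel version.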

Module association

Modules can be linked together to form a graph based on at least two criteria

Authorship: one or more contributors are common to modules

Code: one module depends on functions defined in another module

It turns out that these two kinds of link coincide to a high degree
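The two linking criteria can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the module names, authors and call relations are invented, code links are treated as undirected, and the helper names `author_links` and `code_links` are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs, invented for illustration.
authors_by_module = {
    "net": {"alice", "bob"},
    "fs": {"bob", "carol"},
    "mm": {"dave"},
    "drivers": {"alice", "carol", "erin"},
}
# module -> modules whose functions it depends on
depends_on = {
    "net": {"mm"},
    "fs": {"mm", "net"},
    "mm": set(),
    "drivers": {"net"},
}

def author_links(authors_by_module):
    """Map each unordered module pair to its number of common authors (if any)."""
    links = {}
    for a, b in combinations(sorted(authors_by_module), 2):
        shared = authors_by_module[a] & authors_by_module[b]
        if shared:
            links[(a, b)] = len(shared)
    return links

def code_links(depends_on):
    """Unordered module pairs where either module depends on the other."""
    links = set()
    for module, deps in depends_on.items():
        for dep in deps:
            if dep != module:
                links.add(tuple(sorted((module, dep))))
    return links

a_links = author_links(authors_by_module)
c_links = code_links(depends_on)

# Pairs that are linked by both authorship and code.
both = set(a_links) & c_links
```

Keeping the count of common authors in `a_links` gives a link strength as well as a binary presence, which is what a rank correlation against code links would need.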

Module association: authors, v1.0

Module association: code, v1.0

Module association: both, v1.0

Module association: both, v2.5.25

Module association: both, v2.5.25

Exploring the author-code link

Strong degree of coincidence between author link and code link

Significant correlation between strength of author link and presence of code link

Indications of dynamic impact of author link on future code links (and vice versa)

Exploring the author-code link

Phi 4-point similarity between binary variables for author link & code link:

v1.0 (n=435) 0.122

v2.0.30 (n=1770) 0.254

v2.5.25 (n=14196) 0.341
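The n values here equal the number of unordered pairs among the 30, 60 and 169 modules reported on the module-size slide, which suggests (an inference, not something the slides state) that each observation is one module pair. The sketch below checks that arithmetic and computes the phi coefficient from two binary sequences. The sample flags are invented and `phi_coefficient` is a hypothetical helper.

```python
from math import comb, sqrt

def phi_coefficient(xs, ys):
    """Phi (fourfold point) correlation between two equal-length 0/1 sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for x, y in zip(xs, ys):
        if x and y:
            a += 1      # author link and code link
        elif x:
            b += 1      # author link only
        elif y:
            c += 1      # code link only
        else:
            d += 1      # neither
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column is empty")
    return (a * d - b * c) / denom

# Number of unordered module pairs for the module counts on the earlier slide.
pair_counts = [comb(m, 2) for m in (30, 60, 169)]

# Hypothetical flags, one entry per module pair (NOT the kernel data).
author_link = [1, 1, 0, 0, 1, 0]
code_link = [1, 0, 0, 0, 1, 1]
phi = phi_coefficient(author_link, code_link)
```

For two binary variables, phi is the same quantity as the Pearson correlation of the 0/1 values.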

Exploring the author-code link

Spearman’s rho: scalar variables for strength of author link against binary variable for presence of code link:

v1.0 (n=435) 0.130

v2.0.30 (n=1770) 0.241

v2.5.25 (n=14196) 0.341

(strength measured by number of common authors; also tested with other strength measures)
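A rank correlation of this kind has to cope with heavy ties, since most module pairs share no authors and the code-link variable takes only two values. The sketch below is one common way to compute it: assign average ranks to ties, then take the Pearson correlation of the ranks. The data are invented and the helper names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / sqrt(vx * vy)

# Hypothetical values, one entry per module pair (NOT the kernel data).
common_authors = [0, 0, 1, 2, 5, 0]   # strength of author link
code_link = [0, 0, 0, 1, 1, 1]        # presence of code link
rho = spearman_rho(common_authors, code_link)
```

Swapping `common_authors` for another strength measure changes only the first argument, which matches the note that other measures were also tested.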

Exploring the author-code link

Analytical problems: no good fit model with regression, possibly due to highly skewed data

Hard to select strength (rather than binary presence) variable for code dependency link

Time lag between versions too big for dynamic analysis

Future steps

Similar exercise using finer dynamic granularity (and possibly CVS data, for activity rather than cumulative authorship) may allow better interpretation

Better data on code dependency (especially dependency strength) may help in identifying relationships

Tentative conclusions

Significant relationship between social and technical links between modules

Direction of causality is unclear

Impact on code development is, however, potentially very high

“Tip of the iceberg”: e.g. code reuse (including dependency on “trivial” rather than “complex” functions) may be much higher with greater social links

Future steps

Use of clustering algorithms may help identify predictor relationships

Common authorship may predict code dependency group-wise more than pair-wise

Further information

www.flossproject.org

CODD technical papers:

orbiten.org/codd/

codd.berlios.de