Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD
Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
Context
Source code is often reused across different systems
Unixes (FreeBSD, OpenBSD, Linux)
Office applications (NeoOffice, OpenOffice)
Desktop environment apps (KDE or GNOME apps)
Maintenance might require propagating bug fixes from one system to another
We call this “Cross System Bug Fixing” (CSBF)
Example:
FreeBSD, 1996/01/19, file ip_icmp.h:
–“Added definitions for ICMP router discovery. Reviewed by: wollman”
OpenBSD, 1996/08/02, file ip_icmp.h:
–“ICMP Router Discovery definitions; from FreeBSD”
What we propose
A method to track CSBFs
A study on the social characteristics and development activity of CSBF committers
– Social characteristics: degree, betweenness, brokerage
– Development activity: commits, lines changed
Detecting CSBF - I
Step 1: mine cross-referencing commits
– openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
Step 2: mine commits previously performed on files with the same name in the other system
– freebsd, atphy.c, 2008/05/19 01:12:10, yongari, Add Attansic/Atheros F1 PHY driver.
– openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
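Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the comma-separated record layout is taken from the example above, while the regular expressions and the helper names (`parse`, `referring_commits`, `candidates`) are assumptions introduced here.

```python
import re
from datetime import datetime

# Assumed heuristic: a commit note "cross-references" the other system
# when it mentions that system by name.
XREF = {
    "openbsd": re.compile(r"\bfreebsd\b", re.IGNORECASE),
    "freebsd": re.compile(r"\bopenbsd\b", re.IGNORECASE),
}

def parse(line):
    """Split 'system,file,timestamp,author,note' into a dict."""
    system, path, stamp, author, note = [f.strip() for f in line.split(",", 4)]
    return {
        "system": system,
        "file": path,
        "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
        "author": author,
        "note": note,
    }

def referring_commits(commits):
    """Step 1: commits whose note names the other system."""
    return [c for c in commits if XREF[c["system"]].search(c["note"])]

def candidates(ref, commits):
    """Step 2: earlier commits on a same-named file in the other system."""
    return [
        c for c in commits
        if c["system"] != ref["system"]
        and c["file"] == ref["file"]
        and c["time"] < ref["time"]
    ]

log = [
    "freebsd,atphy.c,2008/05/19 01:12:10,yongari,Add Attansic/Atheros F1 PHY driver.",
    "openbsd, atphy.c,2008/09/25 20:47:16,brad,Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@",
]
commits = [parse(entry) for entry in log]
refs = referring_commits(commits)
links = [(ref, candidates(ref, commits)) for ref in refs]
```

Here the OpenBSD commit by brad is the only referring commit, and the earlier FreeBSD commit by yongari is its only candidate source.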
Detecting CSBF - II
Step 3: compute file similarity with clone detection
– CCFinder
– Threshold: at least 10% of cloned lines
Step 4: take the previous change with the highest textual similarity in the commit note
– Use of Vector Space Models
– Cosine similarity; threshold (0.20) to filter out unrelated commits
Example: “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@” vs. “Add Attansic/Atheros F1 PHY driver.”: similarity 0.72
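The two thresholds in Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions: the cloned-line fraction is taken as an input (CCFinder itself is not invoked), the tokenizer and the raw term-count weighting are choices made here, and the helper names are hypothetical. Because the slides do not specify the term weighting or preprocessing, this sketch is not expected to reproduce the 0.72 reported above.

```python
import math
import re
from collections import Counter

CLONE_THRESHOLD = 0.10  # Step 3: at least 10% of cloned lines
NOTE_THRESHOLD = 0.20   # Step 4: minimum cosine similarity between notes

def tokens(text):
    """Lowercase alphanumeric tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity of two notes using raw term counts as weights."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * \
        math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def best_link(ref_note, candidates):
    """Pick the candidate note with the highest similarity.

    candidates: list of (note, cloned_fraction) pairs, where
    cloned_fraction would come from a clone detector.
    Returns (similarity, note), or None if nothing passes both thresholds.
    """
    scored = [
        (cosine(ref_note, note), note)
        for note, cloned in candidates
        if cloned >= CLONE_THRESHOLD
    ]
    scored = [s for s in scored if s[0] >= NOTE_THRESHOLD]
    return max(scored, default=None)

ref = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
cands = [
    ("Add Attansic/Atheros F1 PHY driver.", 0.60),  # cloned fraction is made up
    ("Fix typo in man page", 0.60),                 # unrelated note, made up
]
link = best_link(ref, cands)
```

Applying the clone threshold first keeps the textual comparison to files that actually share code, so a similar note on an unrelated file with the same name is not linked.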
Building Committers' Network
We extract communication from mailing lists
– Bug fixing mailing lists
Heuristic similar to that of Bird et al. [2006] to map inconsistent names / emails
– Also used to map committer IDs to mailing list names/emails
Nodes of the network labeled as:
– Committer / other mailing list contributor
– CSBF committer
Empirical Study
Goal: analyze the phenomenon of CSBFs
Purpose: understanding its relevance with respect to the social characteristics of the involved developers
Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD
– Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)
– Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
Research Questions
RQ1: How do the source code committers and contributors of the two systems overlap?
RQ2: How frequent is the phenomenon of CSBFs?
RQ3: Who are the contributors involved in CSBFs?
RQ4: Are mailing list contributors involved in CSBFs more active than others?
RQ1 – Team overlap
                                            FreeBSD   OpenBSD   Both
Committers                                      383       211     26
Mailing list contributors                      8035      3843    359
Committers and mailing list contributors        213       122     17
The two projects have fewer than 10% of their contributors in common → the development teams of FreeBSD and OpenBSD are largely distinct
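The "fewer than 10%" figure can be checked against the table with a short calculation. The slide does not say how the percentage was computed, so this sketch makes two assumptions: the per-system counts include the people in the "Both" column, and overlap is measured as shared people over the union of both teams. The function name `shared_fraction` is introduced here for illustration.

```python
def shared_fraction(a_total, b_total, both):
    """Shared contributors divided by the union of the two teams."""
    union = a_total + b_total - both
    return both / union

# (FreeBSD, OpenBSD, Both), copied from the RQ1 table
counts = {
    "committers": (383, 211, 26),
    "mailing list contributors": (8035, 3843, 359),
    "committers and mailing list contributors": (213, 122, 17),
}

overlap = {role: shared_fraction(*row) for role, row in counts.items()}
for role, value in overlap.items():
    print(f"{role}: {value:.1%}")
```

Under these assumptions every row stays below 10%, which is consistent with the statement above.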
RQ2 – Commit filtering
                     FreeBSD   OpenBSD
Referring commits        439       933
Cloned files             133       296
Linked commits            59       120
At the end of the filtering, not that many CSBFs remain, but...
RQ2 – Cloned lines in CSBF files
Percentage smaller for .h files
Use of preprocessor conditionals to make header files system-dependent
– #if defined(__FreeBSD__)
Figure: percentage of cloned lines in C source files vs. header files
RQ3 – CSBF Graph (excerpt)
– Blue/cyan: FreeBSD
– Red/orange: OpenBSD
– Yellow: common
RQ3 – social characteristics
Importance in terms of:
(In/out) degree: number of (incoming/outgoing) communication links
Betweenness: number of shortest communication paths on which the node lies
Brokerage metrics: useful to analyze the communication between two clusters
B is a coordinator
B is a gatekeeper
B is a representative
All differences statistically significant
High effect size (Cohen d>1)
Contributors involved in CSBF have a higher importance in the communication and in the flow of communication between systems
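The degree and brokerage metrics on this slide can be sketched in plain Python. The role definitions below follow the usual Gould and Fernandez formulation (a broker B sits on a path A → B → C with no direct A → C link), which is an assumption about what the slide's diagrams showed. The contributor names and the edges are made up, and betweenness is left out to keep the sketch short.

```python
from collections import defaultdict

def degrees(edges):
    """In-degree and out-degree of each node in a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

def brokerage(edges, group):
    """Count brokerage roles of each node B over paths A -> B -> C."""
    edge_set = set(edges)
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edge_set:
        succ[src].add(dst)
        pred[dst].add(src)
    roles = defaultdict(
        lambda: {"coordinator": 0, "gatekeeper": 0, "representative": 0}
    )
    for b in set(succ) & set(pred):
        for a in pred[b]:
            for c in succ[b]:
                # Skip degenerate paths and pairs that are directly linked.
                if a == c or a == b or c == b or (a, c) in edge_set:
                    continue
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    roles[b]["coordinator"] += 1     # all in one group
                elif ga != gb and gb == gc:
                    roles[b]["gatekeeper"] += 1      # outsider -> own group
                elif ga == gb and gb != gc:
                    roles[b]["representative"] += 1  # own group -> outsider
    return roles

# Made-up contributors and communication links, for illustration only.
group = {"alice": "freebsd", "bob": "freebsd", "eve": "freebsd",
         "carol": "openbsd", "dave": "openbsd"}
edges = [("alice", "bob"), ("bob", "eve"), ("bob", "carol"), ("carol", "dave")]

indeg, outdeg = degrees(edges)
roles = brokerage(edges, group)
```

In this toy network bob coordinates inside FreeBSD and represents it toward OpenBSD, while carol acts as a gatekeeper for OpenBSD.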
RQ3 – social characteristics
Figure: bar chart comparing CSBF contributors and others on degree, in-degree, out-degree, betweenness / 1000, coordinator / 10, gatekeeper, and representative
RQ3 – committers with highest social metrics
RQ4 – change activity of CSBF committers and others
Figure: LOC added/removed and number of commits for CSBF committers vs. others, in FreeBSD and OpenBSD
All differences statistically significant
High effect size (Cohen d∼1)
Contributors involved in CSBF are more active than others
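For readers unfamiliar with the effect size quoted above, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below shows the computation; the sample values are made up and are not data from the study.

```python
import math
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d with the pooled (sample) standard deviation."""
    nx, ny = len(x), len(y)
    pooled = math.sqrt(
        ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    )
    return (mean(x) - mean(y)) / pooled

# Made-up activity values for two groups, for illustration only.
csbf = [4, 6, 8]
others = [1, 3, 5]
d = cohens_d(csbf, others)
```

With these toy values the means differ by 3 and the pooled standard deviation is 2, so d is 1.5. Values around 0.8 or above are conventionally read as a large effect.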
Conclusions and Work-in-Progress
We proposed a method to mine CSBFs
We reported a study on FreeBSD and OpenBSD where:
– Development teams are almost disjoint
– There is a small, though not negligible, portion of CSBFs
– Committers involved in CSBFs have
  – Higher social importance
  – Higher brokerage level
  – Higher activity in source code commits
Work-in-progress:
– Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems
– More extensive study on less obvious cases