Reclassifying Success and Tragedy in FLOSS Projects
Andrea Wiggins, Kevin Crowston
1 June, 2010
Motivation
• Replication of prior research using data from repositories of repositories (RoRs)
– English & Schweik’s 2007 classification of project growth and success
– Relevant to both researchers and practitioners
Success and Tragedy
• English & Schweik generated classification criteria based on empirical research
– Stage of growth: Initiation (I) or Growth (G)
– Outcome: Success (S), Tragedy (T) or Indeterminate (I)
– Variables (project level): age, releases, release timing, downloads, other distribution channels
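The two-letter class codes in the results tables combine these two dimensions. A minimal sketch of decoding them, assuming the first letter is the outcome and the second is the stage (so TG reads as Tragedy in Growth); the function and dictionary names are illustrative and not part of the original study:

```python
# Letter codes as listed on this slide.
OUTCOMES = {"S": "Success", "T": "Tragedy", "I": "Indeterminate"}
STAGES = {"I": "Initiation", "G": "Growth"}


def decode_class(code: str) -> tuple[str, str]:
    """Split a two-letter class code into (outcome, stage).

    Assumes outcome comes first and stage second, e.g. "SG".
    """
    if len(code) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    outcome_letter, stage_letter = code[0], code[1]
    return OUTCOMES[outcome_letter], STAGES[stage_letter]


for code in ("II", "IG", "TI", "TG", "SG"):
    print(code, decode_class(code))
```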
Replication
• Data
– Used SRDA data
– Original used FLOSSmole data + spidered data
  • Some differences between data sets due to offsets in data collection dates for the original
• Analysis
– Identical criteria for 4 of 6 classes
– Slight variations for TG and SG were operationally equivalent to original
Extended Analysis: Suggested Future Work
• Release-based sustainability criteria
– Original operationalization does not account for diverse release management strategies
• Added two different release rate criteria
– Original: time between first and last releases
– V2: Threshold for time between most recent releases (suggested by English & Schweik)
– V3: Average time between each release (my idea)
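The three release rate measures can be sketched as follows. This is an illustration under stated assumptions, not the study's scripts: V2 is read here as the gap between the two most recent releases, the function names and example dates are made up, and no threshold values are applied because the slides do not state them.

```python
from datetime import date


def release_span_days(releases):
    """Original measure: days between the first and last release."""
    ordered = sorted(releases)
    return (ordered[-1] - ordered[0]).days


def latest_gap_days(releases):
    """V2 (as interpreted here): days between the two most recent releases."""
    ordered = sorted(releases)
    return (ordered[-1] - ordered[-2]).days


def mean_gap_days(releases):
    """V3: average days between consecutive releases."""
    ordered = sorted(releases)
    # The sum of consecutive gaps equals the overall span.
    return (ordered[-1] - ordered[0]).days / (len(ordered) - 1)


# Hypothetical project with four releases (needs at least two).
example = [date(2005, 1, 1), date(2005, 7, 1), date(2006, 1, 1), date(2006, 1, 31)]
print(release_span_days(example), latest_gap_days(example), mean_gap_days(example))
```

The example shows why the measures can disagree: a project with a long overall span may still have a very short gap between its latest releases.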
Extended Analysis: Over Time
• Additional dates
– Original used data from October 2006
– Added April 2006 for short-term comparison of stability of classification
– With the default values for several variables, project status can change in a period of 6 months, affecting classification status
Results
1. Comparison to original published results (from 2007 paper)
2. Comparison of results from varying the release rate classification criterion
3. Comparison of classifications over time, Markov model showing state changes
Comparison to Original Results
2006-10          Original        Replication
unclassifiable   3,186           3,296
II               13,342 (12%)    16,252 (14%)
IG               10,711 (10%)    12,991 (11%)
TI               37,320 (35%)    36,507 (31%)
TG               30,592 (28%)    32,642 (28%)
SG               15,782 (15%)    16,045 (14%)
other            8,422           n/a
Total            119,355         117,733 (+9.6%)
Comparison of Release Rate Criteria
2006-10-23   Method 1   Method 2   Method 3*
IG           11%        12%        16%
II           14%        14%        14%
SG           13%        13%        3%
TG           28%        27%        33%
TI           32%        32%        32%
Comparison Over Time
Class            2006-04-21        2006-10-23
IG               12,166 (10.8%)    12,991 (11.0%)
II               13,952 (12.4%)    16,252 (13.8%)
SG               14,244 (12.7%)    16,045 (13.6%)
TG               28,777 (25.6%)    32,642 (27.7%)
TI               39,948 (35.5%)    36,507 (31.0%)
unclassifiable   3,343 (3.0%)      3,296 (2.8%)
Total            112,430           117,733
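The percentages in this table are each class's share of the column total. A small sketch that recomputes them for the 2006-10-23 column, using the counts shown above; the variable and function names are illustrative only:

```python
# Counts copied from the 2006-10-23 column of the table.
counts_2006_10 = {
    "IG": 12991,
    "II": 16252,
    "SG": 16045,
    "TG": 32642,
    "TI": 36507,
    "unclassifiable": 3296,
}


def class_shares(counts):
    """Return each class's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(sum(counts_2006_10.values()))
print(class_shares(counts_2006_10))
```

Summing the counts first is a cheap consistency check against the Total row before comparing shares across dates.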
Changes to Project Classification
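State changes between two dates can be summarized as a transition table, which is the basis of a Markov model of classification changes. A minimal sketch, assuming each project has one class per date; the project ids and class assignments below are made up for illustration and are not results from the study:

```python
from collections import Counter, defaultdict


def transition_probabilities(before, after):
    """Estimate P(new class | old class) from two snapshots.

    `before` and `after` map project id -> class code. Only projects
    present in both snapshots are counted.
    """
    counts = defaultdict(Counter)
    for project, old_class in before.items():
        if project in after:
            counts[old_class][after[project]] += 1
    return {
        old: {new: n / sum(row.values()) for new, n in row.items()}
        for old, row in counts.items()
    }


# Toy data, not taken from the study.
april = {"p1": "II", "p2": "II", "p3": "IG", "p4": "TI"}
october = {"p1": "II", "p2": "TI", "p3": "SG", "p4": "TI"}

probs = transition_probabilities(april, october)
print(probs)
```

Each row of the result sums to 1, so it can be read directly as the probability of moving from one class to another over the interval.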
Discussion of Methods
• Challenges for large-scale analysis
– Data exceptions + automated processing
  • Allow extra time to refine data handling
  • Adapt processes for changes in data structures
– Managing data flow across tools
  • Advantages and disadvantages for each tool
  • Create test data sets to speed debugging
Limitations
• Same as the original work
– Generalizability beyond SourceForge (SF), imperfect data sources, simplistic measures
• Specific to this work
– Changes to data source
– Need for sensitivity analysis on parameters
• Inherent to topic and methods
– Hard (impossible?) to validate empirically on large scale
Future Work
• Additional replication and extension
• Exhaustive testing of threshold values
• Evaluate alternate measures & dynamic thresholds based on project statistics
• Incorporate CVS/email/forum data
• More closely examine changes in classification over time
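Testing threshold values could start with a simple sweep over one criterion. The sketch below is hypothetical: the gap values, the candidate thresholds, and the rule that a project passes when its value is at or below the threshold are all assumptions for illustration, not values from the study.

```python
def sweep_threshold(values, thresholds):
    """For each threshold, count how many values fall at or below it."""
    return {t: sum(v <= t for v in values) for t in thresholds}


# Made-up average days between releases for five projects.
gaps = [30, 90, 200, 400, 800]

result = sweep_threshold(gaps, [180, 365, 730])
print(result)
```

Plotting such counts against the threshold would show where small changes in a parameter move many projects across a class boundary.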
Conclusions
• Replicated classification of FLOSS project growth and development
• Extended analysis with variations on classification criteria
• Extended analysis with additional date
• Recommendations for large-scale FLOSS data analysis and future work
Questions?
• Data, workflows & scripts: http://floss.syr.edu/reclassifying-success-and-tragedy