Reclassifying Success and Tragedy in FLOSS Projects Andrea Wiggins Kevin Crowston 1 June, 2010


Page 1: Reclassifying Success and Tragedy in FLOSS Projects

Reclassifying Success and Tragedy in FLOSS Projects

Andrea Wiggins, Kevin Crowston

1 June, 2010


Motivation

• Replication of prior research using data from repositories of repositories (RoRs)

– English & Schweik’s 2007 classification of project growth and success

– Relevant to both researchers and practitioners


Success and Tragedy

• English & Schweik generated classification criteria based on empirical research

– Stage of growth: Initiation (I) or Growth (G)

– Outcome: Success (S), Tragedy (T) or Indeterminate (I)

– Variables (project level): age, releases, release timing, downloads, other distribution channels
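The two-letter class codes used in the result tables (II, IG, TI, TG, SG) appear to combine an outcome letter with a stage letter. A minimal sketch of that labeling follows, assuming the outcome letter comes first (so TI reads as Tragedy in Initiation). The enum and function names are hypothetical, and the thresholds that assign a project to a class are not given in these slides, so they are left out.

```python
from enum import Enum


class Stage(Enum):
    INITIATION = "I"
    GROWTH = "G"


class Outcome(Enum):
    SUCCESS = "S"
    TRAGEDY = "T"
    INDETERMINATE = "I"


def class_code(outcome: Outcome, stage: Stage) -> str:
    """Two-letter label; assumes outcome letter first, then stage letter."""
    return outcome.value + stage.value


# Every outcome/stage combination (six in total).
ALL_CODES = {class_code(o, s) for o in Outcome for s in Stage}

# The class codes that actually appear in the tables of this deck.
CODES_IN_TABLES = {"II", "IG", "TI", "TG", "SG"}
```

Under this reading, the six combinations match the "6 classes" mentioned on the Replication slide, five of which show up in the tables.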


Replication

• Data

– Used SRDA data

– Original used FLOSSmole data + spidered data

    • Some difference between data sets due to offsets in data collection dates for original

• Analysis

– Identical criteria for 4 of 6 classes

– Slight variations for TG and SG were operationally equivalent to original


Extended Analysis: Suggested Future Work

• Release-based sustainability criteria

– Original operationalization does not account for diverse release management strategies

• Added two different release rate criteria

– Original: time between first and last releases

– V2: Threshold for time between most recent releases (suggested by English & Schweik)

– V3: Average time between each release (my idea)
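The three release-timing measures listed above can be sketched as one small function. This is an illustration only, not the authors' script: the function name, the dictionary keys, and the choice of days as the unit are assumptions, and the thresholds applied to each measure are not stated in these slides.

```python
from datetime import date


def release_measures(release_dates):
    """Return the three release-timing measures in days, or None if fewer than two releases."""
    ds = sorted(release_dates)
    if len(ds) < 2:
        return None
    # Gap in days between each pair of consecutive releases.
    gaps = [(later - earlier).days for earlier, later in zip(ds, ds[1:])]
    return {
        "first_to_last_days": (ds[-1] - ds[0]).days,  # Original criterion
        "latest_gap_days": gaps[-1],                  # V2: most recent releases
        "mean_gap_days": sum(gaps) / len(gaps),       # V3: average between releases
    }


# Made-up release history for illustration.
example = release_measures([date(2005, 1, 1), date(2005, 1, 31), date(2005, 7, 1)])
```

Note that the mean gap always equals the first-to-last span divided by the number of gaps, so V3 differs from the original measure only in that it scales with the number of releases.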


Extended Analysis: Over Time

• Additional dates

– Original used data from October 2006

– Added April 2006 for short-term comparison of stability of classification

– With the default values for several variables, project status can change in a period of 6 months, affecting classification status


Results

1. Comparison to original published results (from 2007 paper)

2. Comparison of results from varying the release rate classification criterion

3. Comparison of classifications over time, Markov model showing state changes
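For the third comparison, state changes between two snapshot dates can be tabulated as transition probabilities, which is the basic ingredient of a Markov model. The sketch below is a hedged illustration: the function name is hypothetical, the project IDs and labels are made up, and it is not taken from the authors' workflow.

```python
from collections import Counter


def transition_probabilities(before, after):
    """Map (from_class, to_class) -> probability, using projects present in both snapshots."""
    shared = before.keys() & after.keys()
    pair_counts = Counter((before[p], after[p]) for p in shared)
    # Number of projects starting in each class, used to normalize each row.
    row_totals = Counter()
    for (src, _dst), n in pair_counts.items():
        row_totals[src] += n
    return {(src, dst): n / row_totals[src] for (src, dst), n in pair_counts.items()}


# Made-up classifications at two dates; project "e" is new and therefore ignored.
earlier = {"a": "II", "b": "II", "c": "IG", "d": "TI"}
later = {"a": "II", "b": "TI", "c": "SG", "d": "TI", "e": "II"}
probs = transition_probabilities(earlier, later)
```

Projects that appear in only one snapshot are dropped here; handling newly registered or removed projects is a design choice the slides do not specify.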


Comparison to Original Results

2006-10          Original        Replication

unclassifiable   3,186           3,296
II               13,342 (12%)    16,252 (14%)
IG               10,711 (10%)    12,991 (11%)
TI               37,320 (35%)    36,507 (31%)
TG               30,592 (28%)    32,642 (28%)
SG               15,782 (15%)    16,045 (14%)
other            8,422           n/a

Total            119,355         117,733 (+9.6%)


Comparison of Release Rate Criteria

2006-10-23   Method 1   Method 2   Method 3*

IG           11%        12%        16%
II           14%        14%        14%
SG           13%        13%        3%
TG           28%        27%        33%
TI           32%        32%        32%


Comparison Over Time

Class            2006-04-21        2006-10-23

IG               12,166 (10.8%)    12,991 (11.0%)
II               13,952 (12.4%)    16,252 (13.8%)
SG               14,244 (12.7%)    16,045 (13.6%)
TG               28,777 (25.6%)    32,642 (27.7%)
TI               39,948 (35.5%)    36,507 (31.0%)
unclassifiable   3,343 (3.0%)      3,296 (2.8%)

Total            112,430           117,733
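As a quick arithmetic check, the percentages in the 2006-10-23 column can be recomputed from the counts. The snippet below uses only the numbers reported in that column; the variable names are illustrative.

```python
# Class counts for 2006-10-23, copied from the table above.
counts_2006_10_23 = {
    "IG": 12_991,
    "II": 16_252,
    "SG": 16_045,
    "TG": 32_642,
    "TI": 36_507,
    "unclassifiable": 3_296,
}

total = sum(counts_2006_10_23.values())

# Share of each class as a percentage of all projects, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts_2006_10_23.items()}
```

The counts sum to the reported total of 117,733, and the recomputed shares agree with the percentages shown for that date.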


Changes to Project Classification



Discussion of Methods

• Challenges for large-scale analysis

– Data exceptions + automated processing

    • Allow extra time to refine data handling

    • Adapt processes for changes in data structures

– Managing data flow across tools

    • Advantages and disadvantages for each tool

    • Create test data sets to speed debugging
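One way to build a small test data set is to draw a reproducible random sample of project IDs, so the same subset is used on every debugging run. This is a sketch under assumptions: the function name, the seed, and the use of integer IDs are illustrative and not taken from the authors' workflow.

```python
import random


def make_test_sample(project_ids, k, seed=0):
    """Draw k project IDs reproducibly; the fixed seed keeps debugging runs comparable."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input's iteration order.
    return sorted(rng.sample(sorted(project_ids), k))


# Made-up ID range standing in for the full project list.
sample = make_test_sample(range(1000), 10)
```

Running the full pipeline on such a subset first makes data exceptions surface in minutes rather than after a complete pass over every project.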


Limitations

• Same as the original work

– Generalizability beyond SourceForge, imperfect data sources, simplistic measures

• Specific to this work

– Changes to data source

– Need for sensitivity analysis on parameters

• Inherent to topic and methods

– Hard (impossible?) to validate empirically on large scale


Future Work

• Additional replication and extension

• Exhaustive testing of threshold values

• Evaluate alternate measures & dynamic thresholds based on project statistics

• Incorporate CVS/email/forum data

• More closely examine changes in classification over time
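Testing a range of threshold values could start with a simple sweep that reports how many projects satisfy a criterion at each candidate value. The sketch below assumes a criterion of the form "value at or below threshold"; the function name, the sample values, and the candidate thresholds are all made up for illustration.

```python
def sweep_threshold(values, thresholds):
    """For each candidate threshold, count how many values are at or below it."""
    return {t: sum(1 for v in values if v <= t) for t in thresholds}


# Made-up average release gaps (in days) for five projects.
mean_gaps = [30, 90, 180, 200, 400]

# Made-up candidate thresholds (in days).
result = sweep_threshold(mean_gaps, [90, 180, 365])
```

Plotting such counts against the threshold would show where the classification is most sensitive to the chosen cut-off.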


Conclusions

• Replicated classification of FLOSS project growth and development

• Extended analysis with variations on classification criteria

• Extended analysis with additional date

• Recommendations for large-scale FLOSS data analysis and future work


Questions?

• Data, workflows & scripts: http://floss.syr.edu/reclassifying-success-and-tragedy