SOFTWARE COLLABORATION NETWORKS
By Chris Zachor
Overview
Introduction
Background
Changes
Methodology
  Data Collection
  Network Topologies
  Measures
  Tools
Conclusion
Questions
Introduction
Use network analysis to better understand the developers in the SourceForge and GitHub communities
Identify key differences (if any) within the two communities
Examine the diversity of collaborations within these two communities
Changes
The addition of GitHub to the study
  GitHub exposes some of the same attributes as SourceForge, which allows for a comparison
Other communities were looked at, but they either were not large enough or did not provide enough public data.
Data Collection
Crawling the websites using a simple Perl script and regular expressions
Collect a project list from SourceForge
  www.sourceforge.net/projects/projectTitle
  No specified request limit
  Check for duplicates
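
The author's crawler was a Perl script that is not shown in the slides. As a rough illustration only, the same idea (match the project URL pattern with a regular expression, then skip duplicates) could be sketched in Python as follows. The function name, the character class used for project names, and the sample HTML are assumptions made for this example.

```python
import re

# Matches the URL form shown on the slide. The set of characters allowed
# in a project name is an assumption made for this sketch.
PROJECT_URL = re.compile(r"sourceforge\.net/projects/([A-Za-z0-9_.-]+)", re.IGNORECASE)

def extract_projects(html, seen=None):
    """Return project names found in html, skipping any already in seen."""
    seen = set() if seen is None else seen
    new_projects = []
    for match in PROJECT_URL.finditer(html):
        name = match.group(1).lower()
        if name not in seen:  # duplicate check
            seen.add(name)
            new_projects.append(name)
    return new_projects

# Hypothetical page fragment, used only to exercise the function.
sample_page = """
<a href="http://www.sourceforge.net/projects/alpha">alpha</a>
<a href="http://www.sourceforge.net/projects/beta/">beta</a>
<a href="http://www.sourceforge.net/projects/alpha">alpha again</a>
"""
projects = extract_projects(sample_page)
print(projects)  # ['alpha', 'beta']
```

Passing the same `seen` set across pages keeps the duplicate check global to the whole crawl rather than local to one page.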
SourceForge Project Page
GitHub Crawling
The GitHub API provides our data
Limited to 60 API calls per minute
Use multiple computers to collect all 1.5 million projects
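
A crawler working under a limit of 60 calls per minute needs some form of client-side throttling. The sketch below is one possible approach, not the author's implementation: a sliding-window limiter that sleeps when the window is full. The class name and the injectable `clock` and `sleep` parameters are assumptions; the fake clock exists only so the example runs instantly.

```python
import time

class RateLimiter:
    """Allow at most max_calls calls within any window of period seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the sliding window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))

class FakeClock:
    """Simulated clock so the demonstration does not really sleep."""
    def __init__(self):
        self.now = 0.0
        self.slept = []
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds

fake = FakeClock()
limiter = RateLimiter(max_calls=2, period=10.0, clock=fake.time, sleep=fake.sleep)
for _ in range(3):
    limiter.wait()  # the third call has to wait for the window to clear
print(fake.slept)  # [10.0]
```

In a real crawl, `limiter.wait()` would be called immediately before each API request, with the defaults of 60 calls per 60 seconds.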
GitHub Project Page
GitHub API
Developer/Project Network
Project-Developer Network
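
The two network slides are figures only. As a hedged illustration of what such topologies typically look like in code, the sketch below builds a two-mode developer-to-project mapping and a one-mode developer network in which two developers are linked when they share a project. The sample data and function names are hypothetical, and the exact construction used in the study may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical crawl result: project name -> list of developers.
memberships = {
    "projA": ["alice", "bob", "carol"],
    "projB": ["bob", "dave"],
    "projC": ["erin"],
}

def developer_projects(memberships):
    """Two-mode view: map each developer to the set of projects they work on."""
    dev_to_projects = defaultdict(set)
    for project, devs in memberships.items():
        for dev in devs:
            dev_to_projects[dev].add(project)
    return dict(dev_to_projects)

def collaboration_network(memberships):
    """One-mode view: link two developers if they share at least one project."""
    adjacency = {dev: set() for devs in memberships.values() for dev in devs}
    for devs in memberships.values():
        for a, b in combinations(sorted(set(devs)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

dev_projects = developer_projects(memberships)
network = collaboration_network(memberships)
print(sorted(network["bob"]))  # ['alice', 'carol', 'dave']
```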
Measures and Metrics
Degree
Clustering Coefficient
Modularity
Power Law
Small World Phenomenon
Degree
Average number of projects worked on by a developer
Average number of collaborations
Average number of developers on a project
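
The three averages can be computed directly from a project membership list. The following is a minimal sketch on hypothetical data; it assumes that "collaborations" means the number of distinct developers someone shares at least one project with, which the slides do not spell out.

```python
# Hypothetical crawl result: project name -> list of developers.
memberships = {
    "projA": ["alice", "bob", "carol"],
    "projB": ["bob", "dave"],
    "projC": ["erin"],
}

def average_degrees(memberships):
    """Compute the three degree averages listed on the slide."""
    dev_projects = {}
    collaborators = {}
    for project, devs in memberships.items():
        for dev in devs:
            dev_projects.setdefault(dev, set()).add(project)
            # Assumption: a collaboration is a distinct co-developer.
            collaborators.setdefault(dev, set()).update(d for d in devs if d != dev)
    n_devs = len(dev_projects)
    return {
        "projects_per_developer": sum(len(p) for p in dev_projects.values()) / n_devs,
        "collaborators_per_developer": sum(len(c) for c in collaborators.values()) / n_devs,
        "developers_per_project": sum(len(set(d)) for d in memberships.values()) / len(memberships),
    }

averages = average_degrees(memberships)
print(averages)
```

For the sample data this gives 1.2 projects per developer, 1.6 collaborators per developer, and 2.0 developers per project.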
Clustering Coefficient
Examine how likely developers are to stick together in groups
Examine both average clustering coefficient for the entire network and the local clustering coefficient for nodes of interest
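
As a small worked example of both quantities, the sketch below computes the local clustering coefficient of a node (the fraction of its neighbor pairs that are themselves connected) and the network average. It assumes an undirected network stored as a dictionary of neighbor sets; the sample network and function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Fraction of pairs of a node's neighbors that are directly connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)

# Hypothetical developer network: a triangle plus one pendant developer.
adjacency = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"bob"},
}
print(local_clustering(adjacency, "bob"))  # 1 of bob's 3 neighbor pairs is linked
print(average_clustering(adjacency))       # (1 + 1/3 + 1 + 0) / 4
```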
Modularity
Provides a measure of how diverse developer collaborations are
Range: -1 < Q < 1
Values of Q closer to 1 show less diversity in collaboration choices
Values of Q closer to -1 show more diversity in collaboration choices
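
The slides do not give a formula for Q. Assuming the standard Newman-Girvan definition is meant, Q sums, over communities, the fraction of edges that fall inside the community minus the fraction expected if edges were placed at random with the same degrees. Under that definition Q never falls below -1/2, which sits inside the range quoted above. The sketch below evaluates Q for a given community assignment; the sample network and the assignment are hypothetical.

```python
def modularity(adjacency, community):
    """Newman-Girvan modularity Q of an undirected network.

    adjacency maps each node to its set of neighbors;
    community maps each node to a community label.
    """
    two_m = sum(len(nbrs) for nbrs in adjacency.values())  # 2 * number of edges
    if two_m == 0:
        return 0.0
    internal = {}    # internal edge endpoints per community (edges counted twice)
    degree_sum = {}  # total degree per community
    for node, nbrs in adjacency.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        internal[c] = internal.get(c, 0) + sum(1 for n in nbrs if community[n] == c)
    return sum(internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)

# Hypothetical network: two triangles joined by a single edge (c-d).
adjacency = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in adjacency}

q_split = modularity(adjacency, two_groups)
q_single = modularity(adjacency, one_group)
print(q_split)   # 5/14, about 0.357
print(q_single)  # 0.0
```

Splitting the network along the bridge yields a clearly positive Q, while placing everyone in a single community yields zero.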
Power Law
Previous studies have found that the SourceForge community follows a power law
No such study has been done on the GitHub community
Few developers should be a part of many projects, while many developers should be involved with only one project
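
One rough way to check for this pattern is to tabulate the degree distribution and fit a straight line to it on log-log axes; a power law P(k) proportional to k^(-gamma) appears as a line with slope -gamma. The sketch below does this on synthetic data constructed to follow k^(-2) exactly. The least-squares fit is chosen for brevity and is an assumption of this example, not the study's method; maximum-likelihood estimators are generally more reliable on real data.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Map each observed degree k to the fraction of nodes having it."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}

def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k."""
    points = [(math.log(k), math.log(p)) for k, p in distribution.items() if k > 0 and p > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Synthetic data: the number of developers on k projects is 1024 * k**-2.
degrees = [1] * 1024 + [2] * 256 + [4] * 64 + [8] * 16
distribution = degree_distribution(degrees)
slope = loglog_slope(distribution)
print(round(slope, 6))  # -2.0
```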
Small World Phenomenon
Previous studies have shown that the SourceForge community exhibits small world properties
Once again, no study has been done on the GitHub community
Using Pajek, I will create a random network of the same nodes and edges
Then, compare the clustering coefficient and the average shortest path
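
The study performs this comparison in Pajek. Purely as an illustration of the procedure, the sketch below generates a random network with the same number of nodes and edges, then computes the average clustering coefficient and the average shortest path for both networks. A small-world network would show much higher clustering than the random one with a similar path length. All function names and the ring-lattice sample are assumptions, and enumerating every node pair only suits small networks.

```python
import random
from collections import deque
from itertools import combinations

def random_graph_like(adjacency, seed=0):
    """Random network with the same node set and the same number of edges."""
    nodes = sorted(adjacency)
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(nodes, 2)), m)
    result = {n: set() for n in nodes}
    for a, b in edges:
        result[a].add(b)
        result[b].add(a)
    return result

def average_clustering(adjacency):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adjacency.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)

def average_shortest_path(adjacency):
    """Mean breadth-first distance over all connected pairs of distinct nodes."""
    total = pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# Hypothetical sample: ring of 20 nodes, each linked to neighbors 1 and 2 steps away.
n = 20
lattice = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
rand = random_graph_like(lattice, seed=42)

print("observed:", average_clustering(lattice), average_shortest_path(lattice))
print("random:  ", average_clustering(rand), average_shortest_path(rand))
```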
Tools
Perl
Pajek
cURL
wget
GUESS
Conclusion
Through the use of network analysis, we hope to gain a better understanding of the developers in the SourceForge and GitHub communities.
Questions?
Suggestions? Comments?