Quantitative data about libre software development: the FLOSSMetrics project

Jesus M. Gonzalez-Barahona, Teo Romera Otero (GSyC/LibreSoft, URJC)

FOSSa, Grenoble, November 17th 2009


The main objective of FLOSSMetrics is to construct, publish and analyse a large-scale database of information and metrics about libre software development, drawn from several thousand software projects, using existing methodologies and tools. The project will also provide a public platform for validation and industrial exploitation of results.


© 2006-2009 GSyC/LibreSoft

Some rights reserved. This document is distributed under the Creative Commons Attribution-ShareAlike 3.0 licence, available at http://creativecommons.org/licenses/by-sa/3.0/


FLOSSMetrics: base ideas

Libre software development:

Lots of opinions, few known facts

Researcher-friendly: public data, reproducibility, validation of results, large samples

Interest by volunteers and companies

Main questions:

Can libre software development be improved?

Can software engineering learn from libre software?

Can projects better understand their processes and products?

http://flossmetrics.org


FLOSSMETRICS goals

Retrieval of data from (thousands of) libre software projects

Analysis of actors, artefacts and processes involved in development

Higher level studies: software evolution, human resources, effort estimation, productivity, quality, etc.

Database available to other researchers, developers

Providing tools for development follow-up

Involvement with the libre software community

http://flossmetrics.org


Main results

Huge database with factual details about libre software development (accessible to everyone)

Higher level analysis and studies

Sustainable platform for benchmarking and analysis

Targeted reports (SMEs, industry, etc.)

Focus on providing data and information that others can use for research, evaluation, follow-up

http://flossmetrics.org


Partners

Universidad Rey Juan Carlos (ES)

University of Maastricht (NL)

Wirtschaftsuniversitaet Wien (AT)

Aristotle University of Thessaloniki (GR)

Conecta s.r.l (IT)

ZEA Partners (BE)

Philips Medical Systems (NL)

Project funded by the European Commission (FP6-IST programme)


Current status and work in progress

Data (of various kinds) for about 2,300 projects

Full MySQL dumps for

• CVS and Subversion commit records

• Metrics (size, complexity) for source code

• Main headers of mailing list messages

• Issue tracking systems (bug reports, etc.)

Focused report on SMEs (third release)

Working on focused studies

Web-based data repository: Melquiades

Direct database access for some researchers

http://melquiades.flossmetrics.org
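
As an illustration of the kind of question the commit-record dumps can answer, here is a minimal sketch that counts commits per author for one project. The table name `commits` and its columns are hypothetical (the slides do not describe the actual schema), the rows are made up, and SQLite stands in for MySQL only so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: the real FLOSSMetrics dumps are MySQL and their
# layout is not described in these slides.
SCHEMA = """
CREATE TABLE commits (
    project TEXT,
    author  TEXT,
    date    TEXT
)
"""

# Made-up sample data, for illustration only.
SAMPLE_ROWS = [
    ("projA", "alice", "2009-01-10"),
    ("projA", "alice", "2009-02-03"),
    ("projA", "bob", "2009-02-04"),
    ("projB", "carol", "2009-03-01"),
]


def build_sample_db():
    """Create an in-memory database loaded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def commits_per_author(conn, project):
    """Return (author, commit_count) pairs for one project, busiest first."""
    cur = conn.execute(
        "SELECT author, COUNT(*) AS n FROM commits "
        "WHERE project = ? GROUP BY author ORDER BY n DESC, author",
        (project,),
    )
    return cur.fetchall()
```

The same query shape should carry over to a MySQL dump once the real table and column names are substituted.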


Tools and current status

LibreSoft Tools suite

• CVSAnalY already works with CVS, SVN and git (Bazaar coming soon)

• CVSAnalY produces complexity metrics for each release of each file (C, C++, Java, Python, more to come)

• Bicho: bug reports from SourceForge (Bugzilla coming soon)

• MLStat: mailing lists, hiding real email addresses

About 2,300 projects and counting

All of this integrated in Melquiades

http://melquiades.flossmetrics.org

http://forge.morfeo-project.org/projects/libresoft-tools/
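
The slides do not say how MLStat hides real email addresses. One possible approach, sketched here purely as an assumption, is to replace each address with a keyed hash, so that the same person still maps to the same token while the address itself is not stored. The function name `pseudonymize` and the 16-character token length are illustrative choices.

```python
import hashlib
import hmac


def pseudonymize(address, secret):
    """Replace an email address with a stable, non-reversible token.

    The same address (ignoring case and surrounding whitespace) always
    maps to the same token for a given secret, so per-person activity
    can still be counted without exposing the address.
    """
    normalized = address.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return digest[:16]
```

Using a secret key rather than a plain hash makes it harder to recover addresses by hashing a list of known candidates.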


Retrieving information: general problems

Diversity:

Kinds of forges: difficult to automate

Kinds of projects: not all projects in SF are relevant

Sources for same project: forge(s), distributions...

Missing information:

Hidden information (eg: mail headers)

Lost information (eg: transition from CVS to SVN)

Bugs and errors (eg: old locks in SCM)

Stress on project infrastructure!


Retrieving information: SCM problems

Different systems (CVS, Subversion, git, Bazaar, Mercurial, etc.)

Different models (file-based, commit-based, distributed)

Bots performing commits

Large transitions don’t preserve information

Performance issues (systems poorly designed for massive retrieval)

But at least we have facilities for incremental retrieval
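
Incremental retrieval can be sketched as follows, assuming a commit-based system with increasing revision numbers (as in Subversion). The `log` list stands in for the server's history, and the function names are hypothetical. This is not how CVSAnalY is implemented, only an illustration of the idea.

```python
def fetch_new_revisions(log, last_seen):
    """Return revisions newer than last_seen, oldest first.

    `log` stands in for whatever the SCM returns: a list of
    (revision_number, author) tuples. A real client would ask the
    server only for revisions after `last_seen`.
    """
    return sorted(rev for rev in log if rev[0] > last_seen)


def update(store, log):
    """Append unseen revisions to `store` and return how many were added."""
    last_seen = store[-1][0] if store else 0
    new = fetch_new_revisions(log, last_seen)
    store.extend(new)
    return len(new)
```

Each run only has to process what arrived since the previous run, which keeps repeated retrieval cheap for both sides.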


Retrieving information: BTS problems

Different systems (Bugzilla, SourceForge, GForge, trac, Launchpad, etc.)

Different models (bug cycle, bug report parameters, etc.)

Different uses (issue tracker, only bugs, scheduler, etc.)

Bots acting on bug reports

Lack of facilities for incremental retrieval

Performance issues (systems not really designed for massive retrieval)
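
When a tracker offers no incremental retrieval, one workaround is to re-download the reports and compare fingerprints against the previous run, so that only new or modified reports are reprocessed. The sketch below assumes reports are available as plain dictionaries. The names are hypothetical and this is not a description of Bicho. It saves downstream processing only: the full download still loads the tracker.

```python
import hashlib
import json


def fingerprint(report):
    """Hash a bug report's fields so that any change alters the digest."""
    canonical = json.dumps(report, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def changed_reports(snapshot, seen):
    """Return ids of reports that are new or modified since the last run.

    `snapshot` maps report id -> dict of fields (a full re-download);
    `seen` maps report id -> fingerprint and is updated in place.
    """
    changed = []
    for report_id, report in snapshot.items():
        digest = fingerprint(report)
        if seen.get(report_id) != digest:
            changed.append(report_id)
            seen[report_id] = digest
    return sorted(changed)
```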


Retrieving information: Mailing lists problems

Different systems (usually accessible only through HTML)

Partial information (missing headers)

Bots sending email (eg: commit messages)

Spam (mixed with real messages)

But email messages are pretty uniform in format
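
Because message headers follow a standard format, the main headers can be extracted with a general-purpose parser. The sketch below uses Python's standard `email` package on a made-up message. The function name `main_headers` and the choice of fields are assumptions, not MLStat's actual behaviour.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Made-up message, for illustration only.
RAW = """\
From: Alice Example <alice@example.org>
To: dev@lists.example.org
Subject: [dev] Release plan
Date: Tue, 17 Nov 2009 10:30:00 +0100
Message-ID: <1234@example.org>

Body text.
"""


def main_headers(raw):
    """Extract sender, subject, date and message id from one raw message."""
    msg = message_from_string(raw)
    name, address = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        "sender_name": name,
        "sender_address": address.lower(),
        "subject": msg.get("Subject", ""),
        # Missing Date headers are reported as None rather than guessed.
        "date": parsedate_to_datetime(date_header) if date_header else None,
        "message_id": msg.get("Message-ID"),
    }
```

Messages with missing headers still parse. The corresponding fields simply come back empty or as `None`, which matches the "partial information" problem listed above.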


Retrieving information: All together

How to track actors and products:

• Different repositories of the same project

• Different projects

SourceForge helps a bit!

Massive information (when dealing with thousands of projects)

Exchange formats (for third parties and reproduction)

Tracking information (where did this commit record come from?):

• Repositories change

• Retrieval tools change

• Errors do occur
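
One way to keep track of where a record came from is to store provenance alongside it: which repository, which tool and version, and when it was retrieved. The sketch below is only an illustration. All class and field names are hypothetical and do not reflect the FLOSSMetrics database layout.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Where a stored record came from (all field names are hypothetical)."""
    repository_url: str
    tool: str
    tool_version: str
    retrieved_at: datetime


@dataclass(frozen=True)
class CommitRecord:
    """A commit record that carries its own provenance."""
    revision: str
    author: str
    provenance: Provenance


def tag(revision, author, repository_url, tool, tool_version, now=None):
    """Build a commit record stamped with its source and retrieval time."""
    stamp = now or datetime.now(timezone.utc)
    return CommitRecord(
        revision,
        author,
        Provenance(repository_url, tool, tool_version, stamp),
    )
```

With this information attached, records produced before a repository migration or by an older tool version can be identified and re-retrieved if an error is found later.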


Interested?

Detailed description of work available from the website

All the software used is libre software

Keep an eye on the website

Tell us about your pet project: we can analyze it

Interested in knowing how this is useful for you: provide feedback about your interests, needs

Willing to collaborate with projects!

http://flossmetrics.org

http://forge.morfeo-project.org/projects/libresoft-tools/
