On the Development and Distribution of R Packages
An Empirical Analysis of the R Ecosystem
Alexandre Decan, Tom Mens, Maëlick Claes & Philippe Grosjean
COMPLEXYS Research Institute
8th September 2015, IWSECO-WEA 2015
Statistical environment
Packages with code, doc, examples, tests, datasets:
http://www.r-project.org
install.packages("MyPackage")
R package repositories (in March 2015)

Repository name   Number of packages   Since   Role
CRAN              6411                 1997    Distribution
Bioconductor      997                  2001    Distribution
R-Forge           1883                 2006    SVN development; distribution
GitHub            5150                 2008    Git development; distribution using devtools
But there are more: RForge, Omegahat, Bitbucket, Sourceforge, Google code, ...
How to install packages
install.packages function:
- automatically installs a package and its dependencies if needed
- only uses CRAN by default
- can be configured to use other repositories like Bioconductor and R-Forge
Package devtools provides various functions to install packages from other sources:
- SVN
- Git
- GitHub
- Bitbucket
- Gitorious
devtools retrieves the package content and installs it using install.packages
Previous work
Preliminary empirical study using CRAN meta-data:
  On the maintainability of CRAN packages (CSMR-WCRE 2014)
Inter-project (Type 1) clone study of CRAN packages:
  An Empirical Study of Identical Function Clones in CRAN (IWSC 2015)
Web dashboard for CRAN maintainers:
  maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
Research Questions
- Where are R packages developed and/or distributed?
- How to resolve package dependencies?
Number of newly created packages on GitHub
More and more packages are developed on GitHub without being distributed anywhere else.
Evolution of the number of packages in CRAN and GitHub
The number of packages only on GitHub grows faster than the number of packages on CRAN!
But it does not seem to impact the growth of CRAN.
Dependencies
Defined in the DESCRIPTION file, using the fields Depends and Imports.
These fields do not specify from which repository the dependency must come!
Package: SciViews
Type: Package
Title: SciViews GUI API - Main package
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
Enhances: base
Description: Functions to install SciViews additions to R, and more (various) tools
Version: 0.9-5
Date: 2013-03-01
Author: Philippe Grosjean
Maintainer: Philippe Grosjean <phgrosjean@sciviews.org>
License: GPL-2
LazyLoad: yes
URL: http://www.sciviews.org/SciViews-R
BugReports: https://r-forge.r-project.org/tracker/?group_id=194
Packaged: 2014-03-01 20:34:11 UTC; phgrosjean
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-03-02 12:40:42
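As an illustration of how the Depends and Imports fields can be turned into a list of required packages, here is a minimal sketch in Python. It is not the tooling used in the study: the helper names parse_dependency_field and package_dependencies are hypothetical, and the sketch assumes each field fits on a single line (real DESCRIPTION files may use continuation lines).

```python
import re


def parse_dependency_field(value):
    """Split a Depends/Imports value into package names (hypothetical helper)."""
    names = []
    for item in value.split(","):
        # Drop an optional version constraint such as "(>= 2.6.0)".
        name = re.sub(r"\(.*?\)", "", item).strip()
        # "R" refers to the interpreter itself, not to a package.
        if name and name != "R":
            names.append(name)
    return names


def package_dependencies(description_text):
    """Collect package names from the Depends and Imports fields only."""
    deps = []
    for line in description_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip() in ("Depends", "Imports"):
            deps.extend(parse_dependency_field(value))
    return deps


# Excerpt of the SciViews DESCRIPTION file shown on the slide.
description = """Package: SciViews
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
Enhances: base
"""

print(package_dependencies(description))
# → ['ellipse', 'stats', 'grDevices', 'graphics', 'MASS']
```

Note that the result contains package names only: nothing in these fields says which repository each package should be fetched from.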
Package repository priority
For each defined dependency relationship, we consider the first package matching the dependency, privileging repositories in this order:
CRAN, then Bioconductor, then GitHub, then R-Forge
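The priority rule can be sketched as a simple ordered lookup. The Python snippet below is only an illustration under stated assumptions: the function name resolve and the repository contents are hypothetical (devpkg in particular is a made-up package name), and matching is done on the package name alone.

```python
# Priority order given on the slide.
PRIORITY = ["CRAN", "Bioconductor", "GitHub", "R-Forge"]


def resolve(dependency, repositories, priority=PRIORITY):
    """Return the first repository, in priority order, that hosts the package."""
    for repo in priority:
        if dependency in repositories.get(repo, set()):
            return repo
    return None  # the dependency is not found in any known repository


# Hypothetical repository contents, for illustration only.
repositories = {
    "CRAN": {"MASS", "ellipse"},
    "Bioconductor": {"limma"},
    "GitHub": {"MASS", "devpkg"},
    "R-Forge": {"ellipse", "devpkg"},
}

print(resolve("MASS", repositories))    # → CRAN
print(resolve("devpkg", repositories))  # → GitHub
print(resolve("absent", repositories))  # → None
```

With this rule, a package hosted both on CRAN and on GitHub is always counted as a CRAN dependency, which is one reason the ordering matters when measuring dependencies between repositories.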
Dependencies between repositories
[Figure: dependency graph between CRAN, Bioconductor, GitHub and R-Forge. Percentage labels as extracted: 58.8%, 48.9%, 37.2%, 5.2%, 2.3%, 77.1%, 61%, 5.8%, 5.7%.]
CRAN is the core of the ecosystem
Conclusion
- We looked at where R packages are developed and distributed, taking into account CRAN, Bioconductor, GitHub and R-Forge
- GitHub is growing at a faster pace than the other repositories
- More and more packages are developed on GitHub but not distributed anywhere else
- However, this does not impact the other repositories:
  - CRAN is (still) at the center of the ecosystem
  - Most Bioconductor, R-Forge and GitHub packages require CRAN in order to work
Current and future work
- Take into account more R package repositories (e.g. Bitbucket)
- Investigate why there are so many packages only on GitHub
- Survey developers about their usage of CRAN and GitHub
- Eventually provide support to R package users and developers by improving package dependency management
- Socio-technical analysis of R package developer communities
- Similar study of an ecosystem based on another programming
Thanks for your attention
Questions?
Slides: http://maelick.net/presentations/iwseco-wea2015/