R, Git, Github, and CI
Taiwan R User Group
Wush Wu
2014-09-20
DSC 2014
2014 is the first year of DSC (Data Science Conference) in Taiwan.
We (Taiwan R User Group) organized the R Tutorial Program at DSC.
More than 100 students joined us during DSC 2014.
The average rating was above 4.2 (on a scale of 1 to 5).
Goal of Tutorial
Systematically introduce the analysis steps with R:
Basic
Data Manipulation (Extract, Transform, and Load)
Analysis
Visualization
Based on the latest tools of R
Reproducibility of examples
Integration of materials
Well-designed exercises
About Me
PhD Candidate in NTU EE
Current research fields:
Online Advertisement
Large Scale Predictive Modeling
Organizer of Taiwan R User Group
Organizer of Tutorial Program in DSC 2014
Outline
Share the experience of organizing a tutorial program among 16 people with:
Git, my favorite version control tool
Github, a platform for cooperation
Jenkins, a system for automation
I will show how to combine these tools with an R package
Why R Package
The examples and exercises have many dependencies
An R package is the recommended way to share your code
Wrap all materials in one R package, DSC2014Tutorial, so the students only need to download once:
All slides are included
Customized R API
All data
Installation of the packages it depends on
It solves the portability issue (Windows, Mac, and Ubuntu)
The package is easily managed with git and released on github
The structure of R package
Dependencies
DESCRIPTION
Package: DSC2014Tutorial
Type: Package
Title: Materials of Tutorial Program on DSC 2014
Version: 1.2
Date: 2014-08-03
Author: Taiwan R User Group
Maintainer: Wush Wu
Description: This package contains the required materials of R Tutorial DSC2014
License: GPL (>= 3)
Depends: R (>= 3.1.0)
Imports: tools, ...
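DESCRIPTION uses the Debian control file format, so R can read the dependency fields back with read.dcf. A minimal sketch, assuming a shortened DESCRIPTION written to a temporary file; the Imports value "tools, utils" is illustrative only, because the real list is elided on the slide:

```r
# Write a shortened DESCRIPTION to a temporary file.
# The Imports value is an assumption for illustration; the slide elides it.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: DSC2014Tutorial",
  "Type: Package",
  "Version: 1.2",
  "Depends: R (>= 3.1.0)",
  "Imports: tools, utils"
), desc_file)

# read.dcf returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])

print(desc[1, "Version"])
print(imports)
```

install.packages reads these same fields to decide which other packages to install alongside the tutorial package.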
The structure of R package
Data
data
data(salary, package = 'DSC2014Tutorial')
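The package argument lets data() load a dataset without attaching the package first. A small sketch of the same call pattern, using the built-in datasets package and its iris dataset as stand-ins so that it runs without DSC2014Tutorial installed:

```r
# Load a dataset into a separate environment instead of the global workspace.
# 'iris' from 'datasets' is a stand-in for 'salary' from 'DSC2014Tutorial'.
env <- new.env()
data("iris", package = "datasets", envir = env)

# The dataset is now available as an ordinary data frame.
str(env$iris)
```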
The structure of R package
cross-platform
configure.ac / configure
The structure of R package
slides and external source
system.file('Basic', package = 'DSC2014Tutorial')
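system.file() resolves a path inside an installed package, which is how the slides can be found on any platform. A minimal sketch, shown against the base tools package so that it runs without DSC2014Tutorial; the "no-such-dir" name is made up to show the not-found case:

```r
# system.file() returns the full path of a file or directory shipped with a
# package, or "" when nothing matches.
desc_path <- system.file("DESCRIPTION", package = "tools")
print(file.exists(desc_path))

# A name that does not exist in the package yields an empty string.
missing_path <- system.file("no-such-dir", package = "tools")
print(identical(missing_path, ""))

# Hypothetical use with the tutorial package (assumes it is installed):
# list.files(system.file("Basic", package = "DSC2014Tutorial"))
```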
Git, Version Control
Some speakers are new to git
We used the following features:
Self version control: add, commit
Repository: remote, push, pull, and merge
Cooperation: submodule
Git plays a fundamental role in our workflow
Why Git?
Speed is king
Local commits rock
Github
My favorite
Github
The most popular platform for managing git repositories
Provides many convenient features:
Organization accounts
Designed for cooperation
Simple integration with many popular CI tools
Static website (Sufficient for R Repository)
Release R Package on Github
The R package is released as:
a git repository
an R repository
Github and R Repository
How to establish an R repository on github:
Create a new git repository named R
Add the content of the R repository to the git repository on the gh-pages branch
Push and wait
The R Repository is located at http://.github.io/R
The user can then install the binary of DSC2014Tutorial directly via
install.packages("DSC2014Tutorial", repos = "http://TaiwanRUserGroup.github.io/R")
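install.packages does not fetch from the repos URL itself but from a type-specific subdirectory below it, and that layout is what the gh-pages branch has to reproduce. A short sketch using utils::contrib.url to show where R looks; the exact binary path depends on the running R version, so it is printed rather than stated here:

```r
repos <- "http://TaiwanRUserGroup.github.io/R"

# Source packages are looked up under <repos>/src/contrib.
src_url <- utils::contrib.url(repos, type = "source")

# Windows binaries live under a path that includes the R minor version.
win_url <- utils::contrib.url(repos, type = "win.binary")

print(src_url)
print(win_url)
```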
Cooperation
I cannot build all the slides of the tutorial myself
There are 7 slide decks built by different groups of speakers
Each slide deck should be managed by its author
Each slide deck is a standalone git repository
No branching here because not all speakers are familiar with git
Use git submodule to embed these slides into the R package
We need a modern workflow to control the quality
Workflow 1
Each speaker creates the slides and initializes the git repository
Speakers commit their changes to the git repository
Open a pull request
Slide review and testing on different platforms
Merge changes to DSC2014Tutorial
Commits
Pull Requests
Review
Merge
Slide Review
Each speaker reviews the slides of the others
The comments are posted as issues on the github repository
The speaker should resolve the posted issues
Issues
Challenge
After the first rehearsal at Taiwan R User Group, we noticed a serious encoding issue
The default Chinese encoding differs between platforms
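The slide does not name the encodings involved. As an illustration only, the sketch below assumes the mismatch is between Big5 (a common default on Traditional Chinese Windows) and UTF-8 (common on Mac and Ubuntu), and assumes the platform's iconv supports "BIG5". It shows that the same two characters become different byte sequences, and that declaring the encoding explicitly restores the text:

```r
# Two Chinese characters written as Unicode escapes, so this source file
# does not itself depend on any encoding.
x <- "\u4e2d\u6587"

# The same text as raw bytes under each encoding.
utf8_bytes <- charToRaw(enc2utf8(x))
big5_bytes <- iconv(x, from = "UTF-8", to = "BIG5", toRaw = TRUE)[[1]]

print(utf8_bytes)
print(big5_bytes)

# Converting back with the encoding stated explicitly is lossless.
back <- iconv(rawToChar(big5_bytes), from = "BIG5", to = "UTF-8")
print(identical(back, x))
```

Reading a file with the wrong assumed encoding skips that explicit conversion step, which is how text written on one platform ends up garbled on another.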
Challenge
We could resolve the specific issue
The slides are evolving, so new bugs might occur
We need to test the slides, but there are 7 slide decks and we want to test them on Windows, Ubuntu, and Mac
Why CI
CI automates the following things:
Testing
Integration
Deployment
CI makes my life better
CI also introduces some problems. Let's discuss them later.
Test R Package
R CMD check --no-codoc --no-manual --no-vignettes --no-build-vignettes
Deploy R Package
git push
Commit to R Repository
tools::write_PACKAGES( type = c("source", "mac.binary", "win.binary") )
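write_PACKAGES scans a directory of built packages and writes the PACKAGES index that install.packages reads. A minimal local sketch, assuming a hypothetical package named demo whose source tarball is faked with utils::tar (a real workflow would use R CMD build); all paths are temporary:

```r
# Repository layout for source packages: <repo>/src/contrib
repo_dir <- file.path(tempfile("repo"), "src", "contrib")
dir.create(repo_dir, recursive = TRUE)

# Create a stand-in source tree for a hypothetical package 'demo'.
pkg_root <- tempfile("pkgsrc")
dir.create(file.path(pkg_root, "demo"), recursive = TRUE)
writeLines(c(
  "Package: demo",
  "Version: 0.1",
  "Title: Stand-in Package",
  "Description: Exists only to populate the example repository.",
  "License: GPL (>= 3)"
), file.path(pkg_root, "demo", "DESCRIPTION"))

# Pack it as demo_0.1.tar.gz so entries are named demo/DESCRIPTION.
old_wd <- setwd(pkg_root)
utils::tar(file.path(repo_dir, "demo_0.1.tar.gz"), files = "demo",
           compression = "gzip")
setwd(old_wd)

# Index the directory; the return value is the number of packages described.
n <- tools::write_PACKAGES(repo_dir, type = "source")
print(n)
print(list.files(repo_dir))
```

Committing the resulting directory tree to the gh-pages branch is what turns the git repository into an installable R repository.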
R and CI
travis-ci.org
Existing work for R and Travis CI
https://github.com/craigcitro/r-travis/wiki
.travis.yml
language: c
script: ./travis-tool.sh run_tests
after_failure:
  - ./travis-tool.sh dump_logs
before_install:
  - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
  - chmod 755 ./travis-tool.sh
  - ./travis-tool.sh bootstrap
  - ./travis-tool.sh r_binary_install XML Rcpp knitr brew RUnit inline highlight formatR highr markdown rgl
install:
  - ./travis-tool.sh install_deps
  - ./travis-tool.sh install_github hadley/testthat
notifications:
  email:
    on_success: change
    on_failure: change
env:
R and CI
jenkins
Setup Jenkins
Github Plugin http://sanketdangi.com/post/62740311628/integrate-jenkins-github-trigger-build-process
Github Pull Request Builder http://www.kabisa.nl/building-github-pull-requests-with-jenkins/
Firewall (open to 192.30.252.0/22)
Auto Testing
Result
Discussion
No Errors vs. No Warnings
Existing Problems:
Memory issues
Unknown Bugs
Unclear Message
Summary
Tutorial and R Package
Git and R Package
Github and R Package
CI and R Package
Q&A
Thanks for listening