Goals of reproducible programming?
- Make your code readable by you and others
- Group your code and functionalize it
- Embrace collaboration, version control and automation
Writing cleaner R code | Names
- Keep new filenames descriptive and meaningful
"helper-functions.R"
# or, for sequences of processing steps
"01_Download.R"
"02_Preprocessing.R"
# ...
- Use one consistent naming style for variables, e.g. snake_case, CamelCase or dot.separated
"spatial_data"
"ModelFit"
"regression.results"
- Avoid names that are already taken by existing functions, such as c or plot
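A quick way to check whether a name is already taken is base R's exists(), which also searches the attached packages. A minimal sketch; the candidate names are only illustrative:

```r
# Candidate variable names (illustrative only)
candidates <- c("c", "plot", "mean", "spatial_data", "ModelFit")

# TRUE if an object with that name is already visible from the current session
taken <- vapply(candidates, function(nm) exists(nm), logical(1))
taken

# Names that already exist (e.g. base functions) are better avoided
candidates[taken]
```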
Writing cleaner R code | Spacing
Use spacing just as in the English language
# Good
model.fit <- lm(age ~ circumference, data = Orange)
# Bad
f1=lm(Orange$age~Orange$circumference)
Don't be afraid to use new lines
model.results <- data.frame(Type = sample(letters, 10),
                            Data = NA,
                            SampleSize = 10)
# Same goes for loops
# And don't forget good documentation
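As an illustration of both points, here is a sketch of a spaced, indented loop with comments explaining each step. It uses the built-in Orange dataset; the task (mean circumference per tree) is just an example:

```r
# Mean trunk circumference (mm) for each tree in the built-in Orange dataset
trees <- levels(Orange$Tree)
mean.circ <- numeric(length(trees))
names(mean.circ) <- trees

for (i in seq_along(trees)) {
  # Select the rows belonging to the current tree
  tree.data <- Orange[Orange$Tree == trees[i], ]
  mean.circ[i] <- mean(tree.data$circumference)
}

mean.circ
```

In practice tapply(Orange$circumference, Orange$Tree, mean) does the same in one line; the loop is only here to show the layout.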
More on writing clean code
- Google R Style Guide
- Hadley Wickham's Style Guide
- rOpenSci Guide
There is even an R package that tidies up your code:
formatR
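A minimal sketch of formatR's tidy_source(), assuming the package is installed. Here it is applied to the "bad" line from the spacing slide; the arrow = TRUE option asks it to replace "=" assignments with "<-":

```r
# install.packages("formatR")  # if not yet installed
library(formatR)

messy <- "f1=lm(age~circumference,data=Orange)"

# output = FALSE returns the result instead of printing it
tidy <- tidy_source(text = messy, arrow = TRUE, output = FALSE)
cat(tidy$text.tidy, sep = "\n")
```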
Further ways to improve reproducibility
- Ideally, attach your code and data to publications
- Use open-access hosts (DataDryad, Figshare, Zenodo)
- Restructure your workflow with RMarkdown / LaTeX / HTML
Functionalize!
- Many R users are tempted to write highly specialized, non-reusable code
- Number 1 rule for clear coding:
DRY - Don't repeat yourself!
Simple example: in an orange orchard, we want to fit a linear model to test whether trunk circumference (mm) increases with tree age (days). If it does, we want to quantify and display the root-mean-square error (RMSE) of this fit for each individual tree in the dataset (N = 5).
The usual way:
# Linear model
model.fit <- lm(age ~ circumference, data = Orange)
model.resid <- residuals(model.fit)  # residuals = observed minus fitted values
rmse <- sqrt(mean(model.resid^2))
# RMSE per tree
tapply(model.resid, Orange$Tree, function(x) sqrt(mean(x^2)))
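Since residuals() already returns observed minus fitted values, the RMSE is simply the square root of the mean squared residual. The sketch below checks this and relates it to the residual standard error reported by summary(), which divides the residual sum of squares by n - p instead of n:

```r
fit <- lm(age ~ circumference, data = Orange)
res <- residuals(fit)

# Residuals equal observed minus fitted values
stopifnot(isTRUE(all.equal(res, Orange$age - fitted(fit), check.attributes = FALSE)))

overall.rmse <- sqrt(mean(res^2))  # RSS / n under the square root
n <- nrow(Orange)
p <- length(coef(fit))             # intercept and slope
sigma.hat <- summary(fit)$sigma    # RSS / (n - p) under the square root

# The two measures differ only in the denominator, so this should be TRUE
all.equal(overall.rmse * sqrt(n / (n - p)), sigma.hat)
```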
Defining your functions
Essentially, most R packages are just collections of useful functions that users have written.
# We want to get the RMSE of a linear model, optionally per group
rmse <- function(fit, groups = NULL, ...) {
  # Residuals are already observed minus fitted values
  f.resid <- residuals(fit)
  if (!is.null(groups)) {
    # One RMSE per group
    tapply(f.resid, groups, function(x) sqrt(mean(x^2, ...)))
  } else {
    sqrt(mean(f.resid^2, ...))
  }
}

model.fit <- lm(age ~ circumference, data = Orange)
# This function is more flexible, can be further customized and
# applied in other situations
rmse(model.fit)
# returns a single overall RMSE (in days)
rmse(model.fit, Orange$Tree)
# returns one RMSE per tree, in factor-level order (trees 3, 1, 5, 2, 4)
A (very) short intro to pipes
Pipes (|) are a common tool in the Linux/programming world for chaining functions together, so that the output of one becomes the input of the next. In R, the magrittr package provides the %>% pipe (re-exported by dplyr), which enables piping between arbitrary functions.
Goal:
"Solve complex problems by combining simple pieces" (Hadley Wickham)
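A minimal sketch of what the pipe does, assuming magrittr is installed: x %>% f() is evaluated as f(x), and a dot placeholder puts the left-hand side at a specific argument position instead of the first:

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- sum(sqrt(x))
# Piped call: read from left to right
piped <- x %>% sqrt() %>% sum()
identical(nested, piped)

# The dot places the left-hand side explicitly: 5 %>% seq(1, .) is seq(1, 5)
dotted <- 5 %>% seq(1, .)
dotted
```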
library(dplyr)
model.rmse <- Orange %>%
  lm(age ~ circumference, data = .) %>%
  rmse(., Orange$Tree)
barplot(model.rmse)
Or like this (correlation within the iris dataset):
iris %>%
  group_by(Species) %>%
  summarize(count = n(), pear_r = cor(Sepal.Length, Petal.Length)) %>%
  arrange(desc(pear_r))
## Source: local data frame [3 x 3]
##
##      Species count    pear_r
## 1  virginica    50 0.8642247
## 2 versicolor    50 0.7540490
## 3     setosa    50 0.2671758
Outsource your functions
# Put your functions into a separate file
# At the beginning of your main processing script
# you simply load them via source()
source("outsourced.rmse.R")
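A self-contained sketch of this workflow: it first writes a simplified RMSE function to a file in a temporary directory (standing in for your own function file), then loads it with source() and calls it:

```r
# Write a small function file (a temporary location, for illustration only)
fun.file <- file.path(tempdir(), "outsourced.rmse.R")
writeLines(c(
  "# Simplified RMSE of a fitted model",
  "rmse <- function(fit) {",
  "  sqrt(mean(residuals(fit)^2))",
  "}"
), fun.file)

# In the main script: load the definitions, then use them
source(fun.file)
model.fit <- lm(age ~ circumference, data = Orange)
rmse(model.fit)
```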
Easy package writing
- Open RStudio
- Install the devtools and roxygen2 packages
- Create a new package project and use the existing function as its basis
- Create the documentation for it
- Update the package metadata and build your package
library(roxygen2)
library(devtools)
# Build your package with two simple commands
# (has to be run within your package project)
document()  # Generate the documentation and update the NAMESPACE
install()   # Install the package
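The documentation that document() builds comes from roxygen2 comments (lines starting with #') placed directly above each function. A sketch of how the rmse() function from earlier might be documented in a package's R/ folder; the wording of the tags is only an example:

```r
#' Root-mean-square error of a fitted model
#'
#' @param fit A fitted model object with a residuals() method, e.g. from lm().
#' @param groups Optional grouping vector; if supplied, one RMSE per group is returned.
#' @param ... Further arguments passed to mean(), e.g. na.rm = TRUE.
#' @return A single numeric value, or one value per group.
#' @export
#' @examples
#' fit <- lm(age ~ circumference, data = Orange)
#' rmse(fit, Orange$Tree)
rmse <- function(fit, groups = NULL, ...) {
  f.resid <- residuals(fit)
  if (!is.null(groups)) {
    tapply(f.resid, groups, function(x) sqrt(mean(x^2, ...)))
  } else {
    sqrt(mean(f.resid^2, ...))
  }
}
```

The @export tag makes document() add the function to the package NAMESPACE.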
- However, package development has many facets and options.
- More detailed info on package development with RStudio.
- Higher acceptance for methods papers and analysis code. Make it citable with a DOI.
Software management and collaboration with GitHub
- Git is one of the most commonly used revision control systems
- Originally developed for the Linux kernel by Linus Torvalds
  - GitHub is a web-based software repository service offering distributed revision control
  - Californian start-up, now the largest code host in the world
  - Offers free public repositories, paid private ones and a nice snippet-sharing service called gists
How to use Git with RStudio (do it later)
1. Set up an account with a Git repository host such as GitHub
2. Install RStudio and Git for your platform (http://www.rstudio.com/ide/docs/version_control/overview)
3. Link to the Git executable in the RStudio options
4. Create a new repository on GitHub and a new project in RStudio (Version Control -> Git)
5. Clone your empty project (pull), add new files/changes to it (commit) and upload them (push)
Further developments
There are now packages that push gists and normal Git updates directly from within R. To use them you need a GitHub API key (see the instructions on the websites below): rgithub
Too detailed to show here, but have a look at the gistr package: gistr