Presentation given at UseR!2014, July 2nd 2014.
Docopt, beautiful command-line options for R
Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS)
July 2014, UseR!2014
What is docopt?
docopt is a utility R package for parsing command-line options. It is a port of docopt.py (Python).
How does it work?
You supply a properly formed help description, and docopt creates a fully functional command-line parser from it.
Why the command line?
R is used more and more:
- Ad hoc, interactive analysis, e.g. in the R REPL shell or RStudio (interactive data analysis)
- Creating R packages with vi, RStudio, etc. (no data analysis)
But also for repetitive batch jobs (reproducible data processing):
- Rscript my_script.R arg1 arg2 ...
- R -f my_script.R --args arg1 arg2 ...
So the command line is used more and more, too!
Rscript example
#!/usr/bin/Rscript
my_model <- glm(Sepal.Width ~ Sepal.Length, data = iris)
print(coef(my_model))
Hmm, that script only works for this specific data set.
I Need Arguments and Options!
Command-line parameters
Parsing command-line parameters seems easy, but what about:
- Switches, e.g. --debug, --help?
- Short and long names, e.g. -d, -h vs. --debug, --help?
- Options with a value, e.g. --output=garbage.csv?
- Arguments, e.g. input_file.csv?
- Optional arguments?
- Default values for options?
- Documenting all options and arguments?
That is a lot of work for just a batch script...
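To make that cost concrete, here is a minimal hand-rolled parser written with base R only. It is an illustrative sketch, not code from the talk: the function name parse_args, the switches --debug/-d, the --output option and its default "out.csv" are all hypothetical.

```r
# Hand-rolled argument parsing with base R only.
# parse_args, --debug/-d, --output and "out.csv" are hypothetical names.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  # Defaults for every option; the positional input file starts unset
  opts <- list(debug = FALSE, output = "out.csv", input = NULL)
  for (a in args) {
    if (a %in% c("-d", "--debug")) {
      opts$debug <- TRUE                        # switch with short and long name
    } else if (startsWith(a, "--output=")) {
      opts$output <- sub("^--output=", "", a)   # option with a value
    } else if (startsWith(a, "-")) {
      stop("unknown option: ", a)
    } else if (is.null(opts$input)) {
      opts$input <- a                           # first positional argument
    } else {
      stop("unexpected extra argument: ", a)
    }
  }
  if (is.null(opts$input)) stop("missing required argument: input file")
  # Still missing: --help, an -o short form, and "--output file" with a space
  opts
}
```

In a script you would call `opts <- parse_args()` near the top. Even this small version needs about twenty lines and still documents nothing for the user.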
Retrieving command-line options
Which packages are available?
- base::commandArgs (very primitive)
- library(getopt) (basic)
- library(argparse) (Python dependency)
- library(optparse) (very nice, Python inspired)
These are all fine, but they result in a lot of parsing or set-up code in your script (and that is not what your script is about...).
docopt is different.
What is Docopt?
Originally a Python lib: http://docopt.org
It is a command-line interface specification language:
- You specify your help text and the docopt parser takes care of everything.
- The documentation = the specification.
- Your script starts with the command-line help.
- docopt automatically provides a --help or -h switch that shows this help to users of your script.
- It stops when obligatory switches are not set or non-existent options are given.
Simple example
#!/usr/bin/Rscript
"This is my incredible script

Usage: my_inc_script.R [-v --output=<output>] FILE" -> doc

library(docopt)
my_opts <- docopt(doc)
That’s all you need to handle your command-line options.
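To see what docopt(doc) returns, you can pass a simulated command line through its args argument instead of letting it read the script's real arguments. This sketch assumes the docopt package is installed. The argument values (result.csv, data.csv) are made up. The result is a named list whose names are the option and argument names from the usage pattern.

```r
library(docopt)

"This is my incredible script

Usage: my_inc_script.R [-v --output=<output>] FILE" -> doc

# Simulates: my_inc_script.R -v --output=result.csv data.csv
my_opts <- docopt(doc, args = c("-v", "--output=result.csv", "data.csv"))

my_opts[["-v"]]        # logical: was the -v switch given?
my_opts[["--output"]]  # the value passed to --output
my_opts[["FILE"]]      # the positional FILE argument
```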
Options
Docopt lets you parse:
- Both short and long options
- Default values
- Descriptions of parameters
- Optional parameters: my_script.R [-a -b]
- Commands: my_script.R (lm | summary)
- Positional arguments
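Several of these features can be combined in one help text: a command, a switch with short and long names, an option with a default value and a positional argument. This is a minimal sketch that assumes the docopt package is installed. The names -a/--all, --output and <input> are hypothetical, chosen only for illustration.

```r
library(docopt)

# Hypothetical interface: -a/--all, --output and <input> are illustrative names
"Usage: my_script.R (lm | summary) [options] <input>

Options:
  -a --all             Report all coefficients
  -o --output=<file>   Output file [default: out.csv]" -> doc

# Simulates: my_script.R lm -a iris.csv
opts <- docopt(doc, args = c("lm", "-a", "iris.csv"))

opts[["lm"]]        # command given: TRUE
opts[["summary"]]   # command not given: FALSE
opts[["--all"]]     # long name is used as the key, even if -a was typed
opts[["--output"]]  # not given, so the [default: ...] value applies
opts[["<input>"]]   # positional argument
```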
Usage patterns
Syntax is defined at http://docopt.org
Start with Usage:
"Usage:
  script.R --option <argument>
  script.R [<optional-argument>]
  script.R --another-option=<with-argument>
  script.R (--either-that-option | <or-this-argument>)
  script.R <repeating-argument> <repeating-argument>...
" -> doc
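Alternatives inside parentheses are mutually exclusive: one of them must be given. This sketch reuses the pattern names from the usage list, assumes the docopt package is installed, and uses a made-up value (some-value) for the argument.

```r
library(docopt)

"Usage: script.R (--either-that-option | <or-this-argument>)" -> doc

# First alternative: only the switch is given
a <- docopt(doc, args = "--either-that-option")

# Second alternative: only the positional argument is given
b <- docopt(doc, args = "some-value")

a[["--either-that-option"]]  # TRUE
b[["<or-this-argument>"]]    # "some-value"
```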
Longer example
#!/usr/bin/Rscript
"This is my useful script I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus            This is a bogus switch
  -o --output=OUTPUT    output file [default: out.csv]

Arguments:
  FILE  the input file" -> doc

library(docopt)
my_opts <- docopt(doc)
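Because the Options section declares [default: out.csv], --output has a value even when the user leaves it out. This sketch shows that behaviour with a simulated command line. It assumes the docopt package is installed, and the file name input.csv is made up. Note that docopt expects at least two spaces between an option and its description.

```r
library(docopt)

"This is my useful script I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus            This is a bogus switch
  -o --output=OUTPUT    output file [default: out.csv]

Arguments:
  FILE  the input file" -> doc

# Simulates: my_uf_script.R -b input.csv   (no -o given)
my_opts <- docopt(doc, args = c("-b", "input.csv"))

my_opts[["--bogus"]]   # TRUE: -b is stored under its long name
my_opts[["--output"]]  # falls back to the documented default
my_opts[["FILE"]]
```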
Recall first example
Let's make a CLI for our script:
#!/usr/bin/Rscript
my_model <- glm(Sepal.Width ~ Sepal.Length, data = iris)
print(coef(my_model))
Preparing...
#!/usr/bin/Rscript
main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  print(coef(my_model))
}
Done!
"Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA  Input data (CSV file)" -> doc

main <- function(DATA, response, terms, family){...}
opt <- docopt::docopt(doc)
main(opt$DATA, opt[["--response"]], opt[["--terms"]], opt[["--family"]])
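To try the finished script without a terminal, the sketch below writes iris to a temporary CSV file and gives docopt a simulated command line. It assumes the docopt package is installed. main() is the function from the "Preparing..." slide, and the temporary file is a stand-in for a real DATA file.

```r
library(docopt)

"Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA  Input data (CSV file)" -> doc

# main() from the "Preparing..." slide
main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  print(coef(my_model))
}

# Stand-in for a real DATA file: iris written to a temporary CSV
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# Simulated command line; --family is omitted, so its default is used
opt <- docopt(doc, args = c("--response=Sepal.Width", "--terms=Sepal.Length", tmp))

# print() returns its argument invisibly, so cf holds the coefficients
cf <- main(opt$DATA, opt[["--response"]], opt[["--terms"]], opt[["--family"]])
```

From a shell, the equivalent call would be `Rscript my_script.R --response=Sepal.Width --terms=Sepal.Length iris.csv`. Quote the --terms value when it contains spaces, e.g. `--terms="Sepal.Length + Petal.Width"`.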
Implementation
Docopt is implemented:
- using Reference Classes (R5) in pure R
- as a port of the original Python project: http://docopt.org
Available from CRAN and https://github.com/edwindj/docopt.R
Very functional, except for:
- multiple identical arguments, e.g. -vvv
- repeating arguments
(both will be fixed soon)
Questions?
$ my_talk.R --help
Edwin's talk on docopt

Usage: my_talk.R (--questions | --fell-asleep)

Options:
  -q --questions    Anyone any questions?
  -f --fell-asleep  Wake up! Next UseR talk!

$ my_talk.R --questions
Questions?
Thanks for listening!