
Docopt, beautiful command-line options for R, user2014


Presentation given at UseR!2014, July 2nd 2014.


Docopt, beautiful command-line options for R

Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS)

July 2014, UseR!2014


What is docopt?

docopt is a utility R library for parsing command-line options. It is a port of docopt.py (Python).

How does it work?

- You supply a properly formed help description.
- From this, docopt creates a fully functional command-line parser.


Why Command-line?

R is used more and more:

- Ad hoc, interactive analysis, e.g. in the R REPL shell or RStudio (interactive data analysis)
- Creating R libraries with vi, RStudio etc. (no data analysis)
- But also for repetitive batch jobs (reproducible data processing):

    Rscript my_script.R arg1 arg2 ...
    R -f my_script.R --args arg1 arg2 ...

So also more and more Command-line!


Rscript example

    #!/usr/bin/Rscript
    my_model <- glm( data=iris
                   , Sepal.Width ~ Sepal.Length)

    print(coef(my_model))

Hmm, that script only works for this specific data set.

I Need Arguments and Options!


Command-line parameters

Parsing command-line parameters seems easy, but what about:

- Switches? e.g. --debug, --help
- Short names and long names? -d, -h vs --debug, --help?
- Options with a value? --output=garbage.csv
- Arguments, e.g. input_file.csv?
- Optional arguments?
- Default values for options?
- Documenting all options and arguments?

That is a lot of work for just a batch script...


Retrieving command-line options

Which libraries are available?

- base::commandArgs (very primitive)
- library(getopt) (basic)
- library(argparse) (Python dependency)
- library(optparse) (very nice, Python inspired)

These are all fine, but they result in a lot of parsing or setting-up code in your script (and that is not what your script is about...).

docopt is different.
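
To make the "lot of parsing code" concrete, here is a minimal sketch of hand-rolled parsing on top of base::commandArgs. The function name parse_args, the option names and the file names are hypothetical and chosen only for illustration; they do not come from the talk.

```r
# Hypothetical hand-rolled parser built on base R only.
# It handles one switch, one option with a value, and one positional argument.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- list(verbose = FALSE, output = "out.csv", file = NULL)
  for (a in args) {
    if (a %in% c("-v", "--verbose")) {
      opts$verbose <- TRUE
    } else if (startsWith(a, "--output=")) {
      opts$output <- sub("^--output=", "", a)
    } else if (startsWith(a, "-")) {
      stop("Unknown option: ", a)
    } else if (is.null(opts$file)) {
      opts$file <- a
    } else {
      stop("Unexpected extra argument: ", a)
    }
  }
  if (is.null(opts$file)) stop("Missing required argument FILE")
  opts
}

# Simulate a command line instead of reading the real one.
opts <- parse_args(c("-v", "--output=result.csv", "input.csv"))
str(opts)
```

Even this small sketch has no help text, no short form for the output option and no documentation of defaults; each of those would need more code.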


What is Docopt?

Originally a Python lib: http://docopt.org

It is a Command Line Interface Specification language:

- You specify your help text and the docopt parser takes care of everything.
- The documentation = the specification.
- Your script starts with the command-line help.
- docopt automatically provides a --help or -h switch to supply help to users of your script.
- It stops when obligatory switches are not set or when non-existing options are set.


Simple example

    #!/usr/bin/Rscript
    "This is my incredible script

    Usage: my_inc_script.R [-v --output=<output>] FILE
    " -> doc

    library(docopt)
    my_opts <- docopt(doc)

That’s all you need to handle your command-line options.
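
As a sketch of what the parsed result might look like, the same help text can be run against a simulated command line. This assumes the docopt package is installed and uses the args parameter of docopt() in place of the real command line; the file names are hypothetical.

```r
library(docopt)

"This is my incredible script

Usage: my_inc_script.R [-v --output=<output>] FILE
" -> doc

# Pass the arguments explicitly rather than reading them from the shell.
my_opts <- docopt(doc, args = c("-v", "--output=result.csv", "input.csv"))

my_opts$FILE            # positional argument
my_opts[["--output"]]   # option with a value
my_opts[["-v"]]         # switch
```

The result is a named list, so values are looked up by the names used in the usage pattern.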


Options

Docopt lets you parse:

- Both short and long options
- Default values
- Descriptions of parameters
- Optional parameters: my_script.R [-a -b]
- Commands: my_script.R (lm | summary)
- Positional arguments


Usage patterns

Syntax is defined at http://docopt.org

Start with Usage:

    "Usage:
      script.R --option <argument>
      script.R [<optional-argument>]
      script.R --another-option=<with-argument>
      script.R (--either-that-option | <or-this-argument>)
      script.R <repeating-argument> <repeating-argument>...
    " -> doc


Longer example

    #!/usr/bin/Rscript
    "This is my useful script
    I use on everything

    Usage: my_uf_script.R [options] FILE

    Options:
      -b --bogus          This is a bogus switch
      -o --output=OUTPUT  output file [default: out.csv]

    Arguments:
      FILE  the input file
    " -> doc

    library(docopt)
    my_opts <- docopt(doc)
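
The effect of the [default: out.csv] annotation can be sketched by parsing two simulated command lines. This assumes the docopt package is installed and again relies on the args parameter of docopt(); the file names are hypothetical.

```r
library(docopt)

"This is my useful script
I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus          This is a bogus switch
  -o --output=OUTPUT  output file [default: out.csv]

Arguments:
  FILE  the input file
" -> doc

# Only the required argument: the option falls back to its default.
defaults <- docopt(doc, args = "input.csv")
defaults[["--output"]]
defaults[["--bogus"]]

# Short forms on the command line are reported under the long names.
custom <- docopt(doc, args = c("-b", "-o", "other.csv", "input.csv"))
custom[["--output"]]
custom[["--bogus"]]
```

Note that the default is taken from the help text itself, so the documentation and the behaviour cannot drift apart.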


Recall first example

Let's make a CLI for our script:

    #!/usr/bin/Rscript
    my_model <- glm( data=iris
                   , Sepal.Width ~ Sepal.Length)

    print(coef(my_model))


Preparing...

    #!/usr/bin/Rscript
    main <- function( DATA, response, terms, family){
      data <- read.csv(DATA)
      f <- as.formula(paste0(response, " ~ ", terms))
      my_model <- glm(f, family=family, data=data)
      print(coef(my_model))
    }


Done!

    "Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

    Options:
      -r --response=<y>     Response for glm
      -t --terms=<x>        Terms for glm
      -f --family=<family>  Family [default: gaussian]

    Arguments:
      DATA  Input data frame
    " -> doc

    main <- function( DATA, response, terms, family){...}
    opt <- docopt::docopt(doc)

    main(opt$DATA, opt[["--response"]], opt[["--terms"]],
         opt[["--family"]])
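
The whole script can be tried without a shell, as a rough end-to-end sketch. It assumes the docopt package is installed, writes the built-in iris data to a temporary CSV file to stand in for DATA, and simulates the command line through the args parameter. Unlike the slide, main() here returns the coefficients rather than printing them, so the result can be inspected.

```r
library(docopt)

"Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA  Input data frame
" -> doc

# Same steps as the slide, but returning the coefficients.
main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  coef(my_model)
}

# Stand-in input file: the iris data written to a temporary CSV.
csv <- tempfile(fileext = ".csv")
write.csv(iris, csv, row.names = FALSE)

# Equivalent to:
#   my_script.R --response=Sepal.Width --terms=Sepal.Length <csv>
opt <- docopt(doc, args = c("--response=Sepal.Width",
                            "--terms=Sepal.Length", csv))

fit <- main(opt$DATA, opt[["--response"]], opt[["--terms"]],
            opt[["--family"]])
print(fit)
```

Because --family is omitted, glm() receives the default "gaussian" from the help text.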


Implementation

Docopt is implemented:

- using Reference classes (R5) in pure R.
- It is a port of the original Python project: http://docopt.org
- Available from: CRAN and https://github.com/edwindj/docopt.R
- Very functional, except for (both will be fixed soon):
  - multiple identical arguments, e.g. -vvv
  - repeating arguments


Questions?

    $ my_talk.R --help
    Edwin's talk on docopt

    Usage: my_talk.R (--questions | --fell-asleep)

    Options:
      -q --questions    Anyone any questions?
      -f --fell-asleep  Wake up! Next UseR talk!

    $ my_talk.R --questions


Questions?

Thanks for listening!