Tools for reproducible and accessible science VMs, KnitR and OMERO Rob Davidson Cardiac Physiome Workshop Auckland, April 8th 2015

Tools for reproducible and accessible science

VMs, KnitR and OMERO
Rob Davidson

Cardiac Physiome Workshop
Auckland, April 8th 2015

All Your Research Objects

• Project proposal
• Project experimental SOPs
• Images of equipment, subjects, conditions
• Raw data
• Metadata
• Analysis code, parameters, pipelines
• Analysis environment, VM or provisioning script
• Intermediate results
• Publication figures/images/tables: codify
• Publication text

Source: DOI: 10.6084/m9.figshare.1330219

GigaSolution: deconstructing the paper

Combines and integrates:

Open-access journal

Data Publishing Platform

Data Analysis Platform

Today’s message

• Tools that fit with GigaDB
– General purpose Research Object store
• Enhancing
– Accessibility
– Reproducibility
• Of some of your research objects
– Software
– Images

Problems with scientific software - reproducibility

Measuring software reproducibility

• Systematic study:
• 515 papers (429 conference, 86 journal)
• <30% reproducible

http://reproducibility.cs.arizona.edu

Measuring software reproducibility

http://reproducibility.cs.arizona.edu

Reasons for failure

“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”

http://reproducibility.cs.arizona.edu

Cost of failure

• Wasted time
• Wasted money
– Ioannidis 2014: 85% of research resources wasted
• Frustration
• Distrust

DOI: 10.1371/journal.pmed.1001747

Literate programming - KnitR

Literate programming

• “Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”
– Donald E. Knuth, Literate Programming, 1984

Literate programming options

• See listing: http://www.gigasciencejournal.com/content/3/1/19
– R: KnitR, Sweave, R Markdown
– JavaScript: Tangle, Active Markdown (CoffeeScript)
– Python: IPython Notebooks
– iReport links this functionality for Galaxy

KnitR is versatile

• R, Python, Ruby, Haskell, Perl, SAS, CoffeeScript, command line
• .txt, LaTeX, HTML, D3.js, R Markdown, HTML5 slides, WordPress
• Any text?

KnitR – how does it work?

• Code chunks
– Basic text (or LaTeX or Markdown), interrupted by ‘chunks’ of code
• For LaTeX, similar to Sweave

…some text \Sexpr{rfunc(var)} more text…
…some text
<<language, chunk_name, chunk_options>>=
Some code
@

• Process this combined text/code with knit() in R
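
The chunk idea can be shown with a toy sketch. This is not knitr and not how knit() is implemented; it is a small Python stand-in that assumes Sweave-style `<<...>>=` / `@` delimiters and replaces each chunk with whatever the chunk prints. The names `weave`, `CHUNK` and the sample `document` are hypothetical and exist only for this illustration.

```python
import contextlib
import io
import re

# A chunk is a "<<options>>=" header line, some code, then a line starting with "@".
CHUNK = re.compile(r"^<<(.*?)>>=\n(.*?)^@[ \t]*\n?", re.DOTALL | re.MULTILINE)


def weave(source: str) -> str:
    """Replace every code chunk in `source` with the text that chunk prints."""
    namespace = {}  # shared by all chunks, so later chunks can reuse variables

    def run_chunk(match: re.Match) -> str:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(2), namespace)  # toy only: runs trusted code
        return buffer.getvalue()

    return CHUNK.sub(run_chunk, source)


# Hypothetical report: prose interrupted by one chunk of Python code.
document = """Heart rates were recorded for five subjects.
<<summary>>=
rates = [62, 71, 68, 75, 64]
print(f"The mean heart rate was {sum(rates) / len(rates):.1f} bpm.")
@
Changing the data above changes this report automatically.
"""

print(weave(document))
```

The sentence reporting the mean is never typed by hand: it is regenerated from the data each time the document is processed, which is the property the following slides rely on.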

KnitR uses: easy to explain

http://reproducibility.cs.arizona.edu

KnitR uses: reproducible analysis

• Can string different tools/languages together
• Stores parameters
• Just like a pipeline/workflow system
– E.g. Galaxy, Taverna, KNIME

• But also: codifies your figures…

KnitR uses: codified figures

• Classic problems:
– No description of error bars
– No description of distributions

• Admittedly this could be fixed by ‘proper’ peer review

Source code: http://bit.ly/1NQZlHh

KnitR uses: codified figures

• Code can be found quickly
• Using text as markers

• Plot can be altered – 1 line of code

• New visualisation produced instantaneously

• Better evaluation of results

Source code: http://bit.ly/1NQZlHh
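
One way to picture a codified figure: the numbers that are plotted and the caption that describes them come from the same function, so the error bars cannot go undescribed. The sketch below is a Python illustration under stated assumptions, not the code behind the link above; the function name `describe_error_bars` and the choice of the standard error of the mean (SEM) are assumptions made for this example.

```python
import math
import statistics


def describe_error_bars(values, label):
    """Return the plotted numbers together with a caption that defines them."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)         # standard error of the mean
    caption = (
        f"{label}: mean = {mean:.2f}, "
        f"error bars = +/- 1 SEM ({sem:.2f}), n = {n}"
    )
    return {"mean": mean, "sem": sem, "n": n, "caption": caption}


# Hypothetical measurements, for illustration only.
summary = describe_error_bars([2, 4, 4, 4, 5, 5, 7, 9], "Group A")
print(summary["caption"])
```

Switching from SEM to standard deviation is then a one-line change, and the caption changes with it.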

GigaScience KnitR example

• “This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database”

DOI: 10.1186/2047-217X-3-3

Environment wrappers - VMs

Measuring software reproducibility

http://reproducibility.cs.arizona.edu

Your environment

• How hard would it be to start from scratch?
• What if you move from Ubuntu to CentOS? Or just upgrade?
• Dependencies / Versions
• System settings
• Hard for you, horrendous for others!
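
A first step toward answering these questions is to write the environment down. The sketch below is one possible approach in Python, using only the standard library; the function name `snapshot_environment` and the layout of the output are assumptions for illustration. It records the operating system, the interpreter version and the versions of installed Python packages, and does not capture system settings or non-Python dependencies.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect OS, interpreter and installed-package versions as plain data."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that report no name
            packages[name] = meta.get("Version")
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Write the snapshot as JSON so it can be archived next to the analysis.
    json.dump(snapshot_environment(), sys.stdout, indent=2)
```

Archiving this output alongside the analysis code gives others the dependency versions to aim for when they rebuild the environment.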

Share your environment

• Virtual machine
– Copy your exact environment
– If it works for you, it works for anyone
– Reproducibility, frozen in time

DOI: 10.1186/2047-217X-3-23

Share your environment

• Docker
– ‘Light’ VM
– Discrete unit of code+environment
– Can be called from command line
– Can be linked together
• New possibilities e.g. nucleotid.es
– Benchmarking -> “data-driven peer-review”?

http://nucleotid.es/

Share your environment

• Some concerns:
– http://ivory.idyll.org/blog/vms-considered-harmful.html
– VM = black box?
– Docker == black box!

Solution -> codify the environment

Codify your environment

• Provisioning scripts are ‘research objects’
• Improves adaptability (easier to recode for alternative OS etc.)
• Builds in extra documentation
• Easier to share – although GigaDB still wants a compiled snapshot (i.e. full machine)

Short list of provisioning systems

• Vagrant
• Chef
• Salt
• Puppet
• Ansible
• Many more – see link for info

Source: http://bit.ly/1wrYiuI

Images: release ALL the images with OMERO

“And now for something completely different”

NO

Phenotyping with microCT
doi:10.1186/2047-217X-2-14

NO

Phenotyping with microCT
doi:10.1186/2047-217X-3-6

Hosting Images

• Image LIMS
• Links to GigaDB
• Can handle most formats
• Web embedding
• View online, no need for software

• Open Source

www.openmicroscopy.org/site/products/omero

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

OMERO: Adding value http://jcb-dataviewer.rupress.org/

The alternative...

...look but don't touch

Thanks for listening!

Acknowledgements

• GigaTeam
– Scott Edmunds
– Peter Li
– Chris Hunter
– Jesse Xiao
– Nicole Edmunds
– Laurie Goodman

Where to get these slides

• FigShare DOI: