Tools for reproducible and accessible science KnitR, VMs and OMERO Rob Davidson Cardiac Physiome Workshop Auckland, April 8th 2015 DOI for this talk: 10.6084/m9.figshare.1368774

Tools for reproducible and accessible science

KnitR, VMs and OMERO

Rob Davidson

Cardiac Physiome Workshop

Auckland, April 8th 2015

DOI for this talk: 10.6084/m9.figshare.1368774

Today’s message

• Tools that fit with GigaDB
  – General-purpose research object store

• Enhancing
  – Accessibility
  – Reproducibility

• Of some of your research objects
  – Software
  – Images

Problems with scientific software - reproducibility

Measuring software reproducibility

• Systematic study:
  – 515 papers (429 conference, 86 journal)
  – <30% reproducible

http://reproducibility.cs.arizona.edu

Measuring software reproducibility

http://reproducibility.cs.arizona.edu

Reasons for failure

“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”

http://reproducibility.cs.arizona.edu

Cost of failure

• Waste time
• Waste money
  – Ioannidis 2014: 85% of research resources wasted
• Frustrating
• Distrust

DOI: 10.1371/journal.pmed.1001747

Literate programming - KnitR

Literate programming

• “Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”
  – Donald E. Knuth, Literate Programming, 1984

Literate programming options

• See listing: http://www.gigasciencejournal.com/content/3/1/19
  – R: KnitR, Sweave, R Markdown
  – JavaScript: Tangle, Active Markdown (CoffeeScript)
  – Python: IPython Notebooks
  – iReport brings this functionality to Galaxy

KnitR is versatile

R

Python

Ruby

Haskell

Perl

SAS

CoffeeScript

.txt

LaTeX

HTML

D3.js

R Markdown

HTML5 slides

Command line

Any text?

WordPress

KnitR – how does it work?

• Code chunks
  – Basic text (or LaTeX or Markdown), interrupted by ‘chunks’ of code
• For LaTeX, similar to Sweave:

…some text \Sexpr{rfunc(var)} more text…

…some text
<<chunk_name, engine='language', chunk_options>>=
Some code
@

• Process this combined text/code with knit() in R
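As a rough illustration of the first thing knit() has to do, the Python sketch below splits an Rnw-style document into prose and code chunks, reads each chunk's label and options, collects the inline \Sexpr{} expressions, and concatenates the chunk code into one script (Knuth's "tangle" step; knitr's own equivalent is purl()). It is a toy under stated assumptions, not knitr's parser: the function names are hypothetical, options are split naively on commas (so a value such as c(5, 3) would break), and nested braces inside \Sexpr{} are not handled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Rnw-style markers: a chunk opens with "<<...>>=" on its own line
# and closes with a line containing only "@".
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")
CHUNK_END = re.compile(r"^@\s*$")
# Inline expressions such as \Sexpr{mean(x)}; nested braces are not supported.
SEXPR = re.compile(r"\\Sexpr\{([^}]*)\}")


@dataclass
class Chunk:
    label: str | None
    options: dict[str, str] = field(default_factory=dict)
    code: list[str] = field(default_factory=list)


def parse_header(header: str) -> tuple[str | None, dict[str, str]]:
    """Split e.g. "fig1, engine='python', echo=FALSE" into a label and options."""
    label: str | None = None
    options: dict[str, str] = {}
    for part in (p.strip() for p in header.split(",")):
        if "=" in part:
            key, value = part.split("=", 1)
            options[key.strip()] = value.strip().strip("'\"")
        elif part and label is None:
            label = part  # the first bare item is taken as the chunk label
    return label, options


def split_document(text: str) -> tuple[list[str], list[Chunk]]:
    """Separate prose lines from code chunks, preserving chunk order."""
    prose: list[str] = []
    chunks: list[Chunk] = []
    current: Chunk | None = None
    for line in text.splitlines():
        if current is None:
            match = CHUNK_START.match(line)
            if match:
                current = Chunk(*parse_header(match.group(1)))
            else:
                prose.append(line)
        elif CHUNK_END.match(line):
            chunks.append(current)
            current = None
        else:
            current.code.append(line)
    if current is not None:
        raise ValueError(f"unterminated chunk: {current.label!r}")
    return prose, chunks


def inline_expressions(prose: list[str]) -> list[str]:
    """Return the expression inside every \\Sexpr{...} found in the prose."""
    return [expr for line in prose for expr in SEXPR.findall(line)]


def tangle(chunks: list[Chunk]) -> str:
    """Concatenate all chunk code into one script (the 'tangle' step)."""
    return "\n".join(line for chunk in chunks for line in chunk.code)
```

For example, a chunk opened with <<load-data, engine='python', echo=FALSE>>= yields the label load-data and the options engine=python and echo=FALSE; the prose around it is kept separately so it can be woven back in with the results.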

KnitR uses: easy to explain

http://reproducibility.cs.arizona.edu

KnitR uses: reproducible analysis

• Can string different tools/languages together
• Stores parameters
• Just like a pipeline/workflow system
  – E.g. Galaxy, Taverna, KNIME
• But also: codifies your figures…

KnitR uses: codified figures

• Classic problems:
  – No description of error bars
  – No description of distributions
• Admittedly, this could be fixed by ‘proper’ peer review

Source code: http://bit.ly/1NQZlHh

KnitR uses: codified figures

• Code can be found quickly
  – Using text as markers
• Plot can be altered with one line of code
• New visualisation produced instantly
• Better evaluation of results

Source code: http://bit.ly/1NQZlHh
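The classic problem of undescribed error bars can be designed out by making the bar type an explicit parameter that also generates the legend text, so the plot and its description cannot drift apart. The Python sketch below uses only the standard library; the function names and the three kind values ("sd", "sem", "ci95") are illustrative assumptions, and the 95% interval uses a normal approximation (a t-based interval would be wider for small samples).

```python
from __future__ import annotations

import math
import statistics


def error_bar(values: list[float], kind: str = "sem") -> tuple[float, float, str]:
    """Return (centre, half-width, legend text) for one group of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate spread")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    if kind == "sd":
        half, label = sd, "mean ± 1 SD"
    elif kind == "sem":
        half, label = sd / math.sqrt(n), "mean ± 1 SEM"
    elif kind == "ci95":
        z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)
        half, label = z * sd / math.sqrt(n), "mean ± 95% CI (normal approx.)"
    else:
        raise ValueError(f"unknown error-bar kind: {kind!r}")
    return mean, half, f"{label}, n = {n}"


def describe_distribution(values: list[float]) -> str:
    """Summarise the raw distribution so it can be reported next to the bars."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (f"n = {len(values)}, median = {median:g}, "
            f"quartiles = {q1:g} to {q3:g}, "
            f"range = {min(values):g} to {max(values):g}")


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative numbers only
    for kind in ("sd", "sem", "ci95"):
        centre, half, legend = error_bar(sample, kind)
        print(f"{legend}: {centre:.2f} ± {half:.2f}")
    print(describe_distribution(sample))
```

Switching kind from "sem" to "sd" is the one-line change that redraws the bars, and the legend text changes with it; describe_distribution() supplies the summary of the underlying distribution that such figures often omit.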

GigaScience KnitR example

• “This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database”

DOI: 10.1186/2047-217X-3-3

Environment wrappers - VMs

Measuring software reproducibility

http://reproducibility.cs.arizona.edu

Your environment

• How hard would it be to start from scratch?
• What if you move from Ubuntu to CentOS? Or just upgrade?
• Dependencies / versions
• System settings
• Hard for you, horrendous for others!
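A first, low-cost step towards answering "what exactly was installed?" is to write the environment down automatically. The Python sketch below records the OS, Python version and installed package versions using only the standard library (platform and importlib.metadata), and compares two package snapshots so that an upgrade, or a move from Ubuntu to CentOS, shows up as an explicit list of differences. The manifest layout and function names are illustrative assumptions, and it captures Python packages only, not system libraries or R packages.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest() -> dict:
    """Snapshot the interpreter, OS and installed Python package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items(), key=lambda item: item[0].lower())),
    }


def diff_packages(old: dict, new: dict) -> dict:
    """Map each added, removed or changed package to (old_version, new_version)."""
    names = sorted(set(old) | set(new))
    return {name: (old.get(name), new.get(name))
            for name in names if old.get(name) != new.get(name)}


if __name__ == "__main__":
    # Print the manifest as JSON; saving it next to the analysis documents the run.
    print(json.dumps(environment_manifest(), indent=2))
```

Storing one manifest per analysis run, and diffing the "packages" sections of two manifests, turns "it worked last month" into a concrete list of version changes to investigate.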

Share your environment

• Virtual machine
  – Copy your exact environment
  – If it works for you, it works for anyone
  – Reproducibility, frozen in time

DOI: 10.1186/2047-217X-3-23

Share your environment

• Docker
  – ‘Light’ VM
  – Discrete unit of code + environment
  – Can be called from the command line
  – Can be linked together

• New possibilities, e.g. nucleotid.es
  – Benchmarking -> “data-driven peer review”?

http://nucleotid.es/

Share your environment

• Some concerns:
  – http://ivory.idyll.org/blog/vms-considered-harmful.html
  – VM = black box?
  – Docker == black box!

Solution -> codify the environment

Codify your environment

• Provisioning scripts are ‘research objects’
• Improves adaptability (easier to recode for an alternative OS, etc.)
• Builds in extra documentation
• Easier to share, although GigaDB still wants a compiled snapshot (i.e. the full machine)
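Provisioning tools such as Ansible, Puppet, Chef and Salt share one core idea: the script declares the desired state, and each step changes the machine only if it is not already in that state, so re-running it is safe and reports what changed. The Python sketch below shows that idempotent pattern on a hypothetical analysis workspace (the directory names, file contents and function names are made up for illustration). It is not how any of those tools is implemented, and a real provisioning script would also install packages and set system configuration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical desired state for an analysis workspace (illustrative only).
DESIRED_STATE = {
    "directories": ["data/raw", "data/processed", "results"],
    "files": {
        "README.txt": "Raw data lives in data/raw and is never edited by hand.\n",
        "config/analysis.ini": "[run]\nseed = 42\n",
    },
}


def ensure_directory(path: Path) -> bool:
    """Create the directory if it is missing; return True only if something changed."""
    if path.is_dir():
        return False
    path.mkdir(parents=True)
    return True


def ensure_file(path: Path, content: str) -> bool:
    """Write the file only if it is missing or its content differs."""
    if path.is_file() and path.read_text(encoding="utf-8") == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True


def provision(root: Path, state: dict) -> list[str]:
    """Bring root into the desired state and list the steps that changed it."""
    changed = []
    for rel in state["directories"]:
        if ensure_directory(root / rel):
            changed.append(f"created {rel}/")
    for rel, content in state["files"].items():
        if ensure_file(root / rel, content):
            changed.append(f"wrote {rel}")
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("first run: ", provision(Path(tmp), DESIRED_STATE))
        print("second run:", provision(Path(tmp), DESIRED_STATE))  # nothing left to do
```

Running it twice shows the property that matters for reproducibility: the second run reports no changes. Kept alongside the paper, a script like this is itself a research object, readable documentation of how the environment was built.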

Short list of provisioning systems

• Vagrant
• Chef
• Salt
• Puppet
• Ansible

• Many more: see link for info

Source: http://bit.ly/1wrYiuI

Images: release ALL the images with OMERO

“And now for something completely different”

NO

Phenotyping with microCT

doi:10.1186/2047-217X-2-14

NO

Phenotyping with microCT

doi:10.1186/2047-217X-3-6

Hosting images

• Image LIMS
• Metadata!!!
• Can handle most formats
• Web embedding
• View online, no need to install software
• Open source

www.openmicroscopy.org/site/products/omero

OMERO: providing access to imaging data

View, filter and measure raw images via direct links from the journal article.

See all the image data, not just cherry-picked examples.

Download and reprocess.

OMERO: adding value

http://jcb-dataviewer.rupress.org/

The alternative...

...look but don't touch

Thanks for listening!

Acknowledgements

• GigaTeam
  – Scott Edmunds
  – Peter Li
  – Chris Hunter
  – Jesse Xiao
  – Nicole Edmunds
  – Laurie Goodman

Where to get these slides

• FigShare DOI:
  – 10.6084/m9.figshare.1368774
• http://bit.ly/1JmnRiU