38
Fernando Pérez (@fperez_org & [email protected]) LBL & UC Berkeley Changing our publication models Reproducible science with Jupyter

Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Fernando Pérez(@fperez_org & [email protected])

LBL & UC Berkeley

Changing our publication models

Reproducible science with Jupyter

Page 2: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Every research discipline is now awash in data

Physics:LHC

Sociology:TheWeb Biology:SequencingEconomics:POS

terminals

Neuroscience:EEG,fMRI

Astronomy: LSST Personalized,data-drivenmedicine

Page 3: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and
Page 4: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

IPython: Interactive Python, 2001

❖ Object Introspection (TAB!)

❖ OS Integration

❖ Rich terminal client

❖ GUI support (plots, …)

❖ %magic commands

❖ Embeddable

Page 5: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The IPython/Jupyter Notebook

❖ Rich web client❖ Text & math❖ Code❖ Results❖ Share, reproduce.

Page 6: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Funding and partnerships

Page 7: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Core ideas of the web: HTTP & HTML

HTML: format to represent contentHyperText Markup Language

HTTP: protocol to connect clients and serversHyperText Transport Protocol

Image credit: eviltester.com

Page 8: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Core ideas of JupyterDocument Format

https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Interactive Computing Protocol

SUB SUB DEAL

Client

SUBDEALDEALDEAL

ROUTPUB ROUTROUTKernel

ØMQ + JSON

Page 9: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Jupyter Protocolcapture the process of interactive computing

any mime-type output

❖ text

❖ svg, png, jpeg

❖ latex, pdf

❖ html, javascript

❖ interactive widgets

Page 10: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Jupyter Protocolis language agnostic

u alj i

~75 different kernels: https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Page 11: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Notebook: a data structure

Page 12: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Reproducible Research

An article about computational science in a scientific publication is not the scholarship itself, it is merely

advertising of the scholarship. The actual scholarship is the complete software development environment

and the complete set of instructions which generated the figures.

Buckheit and Donoho, WaveLab and Reproducible Research, 1995

Page 13: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Nature: “the advertising”

Gross, Andrew M., et al. Nature genetics 46.9 (2014): 939-943.

Page 14: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Notebooks on Github: the “actual scolarship”

Page 15: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Reproducible Research (2012): Paper, Notebooks and Virtual Machine

http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html http://qiime.org/home_static/nih-cloud-apr2012

Page 16: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

mybinder.org

github.com/freeman-lab

Andrew Osheroff’s SciPy’16 talk:https://www.youtube.com/watch?v=OK6M4w7LYIc

github.com/andrewosh

Page 17: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Gravitational waves detected on Jupyter!

From LIGO Open Science Center, binder-ified: github.com/minrk/ligo-binder

Page 18: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

LIGO: Open Science with Jupyter

Page 19: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The future of reproducible science?

Page 20: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Global scientific output doubles every nine years

Bornmann & Mutz, arxiv.org/1402.4578, Nature News Blog, May 2014

Page 21: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and
Page 22: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Who is reading the literature?

Larivière & Gingras, arxiv.org/0809.5250

Page 23: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The scientific literature, today

We are conflating two things:

1. Communication of ideas for others to build upon (hence, reproducibility)

2. Professional credit

Page 24: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and
Page 25: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The literature will be read by the machines

LIGO GW150914 analysis as Jupyter Notebook. 1,000,000+ of these on

Page 26: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Let’s “publish” less so we can read more!

Page 27: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

What if…

❖ All our daily work was captured in a way the machines could read…

❖ annotated with rich metadata…

❖ natural language, code, results and data all linked…

❖ easy for the machines to mine for discovery and credit…

❖ and less frequent highlights were written in long form, also backed by their “real scholarship” (à la Donoho)?

Page 28: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

What would that look like?❖ “Executable preprints/blog posts”

❖ Capture rapid progress, expose data and software

❖ Fully reproducible: build scientific community and knowledge

❖ With DOIs - citable as needed.

❖ Peer-reviewed papers:

❖ less frequent, high-quality narratives

❖ real synthesis of important ideas

Page 29: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

But in recent months, I received reviews of my own submitted papers that suggest reviewers simply did not read the manuscript properly.

[…]To protect quality reviewing, a hybrid model should be considered. I

suggest a two-tier system, in which some papers are not reviewed before publication at all and are instead subject to a post-publication peer review.

Page 30: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The “scientific paper of the future”

Victoria Stodden

Titus Brown

Yolanda Gil

The Geoscience Papers of the Future (GPF) is an initiative to encourage geoscientists to publish papers together with the associated digital

products of their research

Page 31: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Some new developments in Jupyter’s orbit…

Page 32: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

version control for notebooks?

Page 33: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

nbdime to the rescue!

(notebook diff and merge: https://github.com/jupyter/nbdime

Page 34: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

JupyterLab: the notebook, evolved…

Page 35: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

The “Notebook”?

Page 36: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

JupyterLab: unifying these ideas

Brian, Jason, Steven, Darian,Sylvain, Carol, Cameron, Farica, Paul, Reese, Kyle,Chris, Ian, Matthias, …

A Collaborative effort:

Page 37: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and
Page 38: Reproducible Changing our science with Jupyter...Core ideas of the web: HTTP & HTML HTML: format to represent content HyperText Markup Language HTTP: protocol to connect clients and

Live Demo!Demo credits/thank you:

Brian Granger (Cal Poly SLO)Jason Grout (Bloomberg)