WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson GigaScience @bobbledavidson #WCSJ2015 This presentation DOI:10.6084/m9.figshare.1439750

Page 1:

WHAT ARE WE GOING TO DO WITH DATA?

Rob L Davidson
GigaScience

@bobbledavidson
#WCSJ2015

This presentation DOI:10.6084/m9.figshare.1439750

Page 2:

Up ahead

• The need for Open Data in science
• GigaScience and GigaDB
• Everything is data
• Open is accessible
• Literate programming
• So, what are we going to do with data?

Page 3:

THE NEED FOR OPEN DATA IN SCIENCE

Page 4:

• Researcher bias
• Positive result bias
• 20 teams do studies, 1 publishes p<0.05
• Poorly explained analyses

DOI: 10.1371/journal.pmed.0020124
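The "20 teams, 1 publishes p<0.05" point can be made concrete with a little arithmetic. The sketch below assumes 20 independent studies of an effect that does not exist, each tested at the 0.05 threshold. The function names and the independence assumption are illustrative and are not taken from the talk.

```python
def chance_of_false_positive(n_studies: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent studies of a
    non-existent effect still reaches p < alpha (assumed setup)."""
    return 1.0 - (1.0 - alpha) ** n_studies


def expected_false_positives(n_studies: int, alpha: float = 0.05) -> float:
    """Average number of such studies that reach p < alpha by chance."""
    return n_studies * alpha


if __name__ == "__main__":
    n = 20  # number of teams, as on the slide
    print(f"P(at least one p<0.05) = {chance_of_false_positive(n):.2f}")
    print(f"Expected chance findings = {expected_false_positives(n):.1f}")
```

Under these assumptions there is roughly a 64% chance that at least one team sees a "significant" result by chance alone, and on average one team in twenty does. If only that team publishes, the literature records an effect that is not there.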

Page 5:

Problem: Reproducibility

Out of 18 microarray papers, results from 10 could not be reproduced

DOI: 10.1038/ng.295

Page 6:

Software?

http://reproducibility.cs.arizona.edu/

“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”

613 papers tested

123 successful reproductions
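For scale, the counts quoted on this slide and the previous one can be turned into rates. This is plain arithmetic on the numbers as they appear in the slides; the helper name `rate` is made up for illustration.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Software study: 123 successful reproductions out of 613 papers tested.
    print(f"Software reproduced: {rate(123, 613):.1f}%")
    # Microarray study: 10 of 18 papers could not be reproduced.
    print(f"Microarray not reproduced: {rate(10, 18):.1f}%")
```

On these figures, about one in five of the software papers was reproduced, and more than half of the microarray papers were not.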

Page 7:

DOI: 10.1371/journal.pmed.1001747

85% of research resources are wasted!

We must ... favor ... unbiased, transparent, collaborative research with greater standardization

Share data, protocols, materials, software, other tools

Page 8:

What, why Open Data?

• Knowledge is open if anyone is
  – free to access,
  – use,
  – modify,
  – and share it
  – subject, at most, to measures that preserve provenance and openness.

http://opendefinition.org/od/

Page 9:

FAIR Data

http://datafairport.org/

Page 10:

GIGASCIENCE AND GIGADB

Page 11:

The publishing tradition

1665, 1812, 1869

Page 12:

The publishing tradition

• Aimed at paper product
• Limited length
• Limited detail
• No supporting data
• No supporting code
• Poor images
• Limited figures

Page 13:

Anatomy of a traditional Publication

[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]

Page 14:

Anatomy of an Open Data Publication

[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]

Page 15:

Multi-faceted publication

Open-access journal

Data Publishing Platform

Data Analysis Platform

Data

Metadata

Methods

Analyses

Page 16:

“Deconstructed” Journal

“Regular” Journal

“Conscientious” Online Journal

Page 19:

Image Source: http://commons.wikimedia.org/wiki/File:System-Mechanic-California.jpg

“Deconstructed” Journal

“Regular” Journal

“Conscientious” Online Journal

Page 20:

EVERYTHING IS DATA

Page 21:

Data is data

DOI: 10.1186/2047-217X-3-7

Page 22:

Software is data

“For loading data from the provided datasets, a script that can load individual spectra or images is provided”

DOI:

Page 23:

Metadata is data

Findable, reusable…

• Bioontologies/ISA-Tab
  – Standard language
• ORCID
  – Unique, traceable authors
• FundRef
  – Track funding outputs
• APIs
  – Easy search

Page 24:

ACCESSIBLE, USABLE DATA

Page 25:

Curation

• Not all science data is pretty
• ISA-Tab, SRA help
• Peer-reviewed data is better data

http://bit.ly/1F47YZz

Page 26:

Software pipelines

Gigagalaxy.net

[Screenshot labels: Tool List, Tool Parameters, History/results]

Page 27:

Visualise pipelines

Gigagalaxy.net

Page 28:

Reproducing results?

SOAPdenovo2 S. aureus pipeline

DOI: 10.1186/2047-217X-1-18

Page 29:

Easy installation

• Virtual machine
  – Pre-installed
  – Peer-reviewed
  – Reproducibility, frozen in time

DOI: 10.1186/2047-217X-3-23

Page 30:

Literate programming

• Data journalism for all!
• knitr, IPython, Project Jupyter

DOI: 10.1186/2047-217X-3-3

Page 31:

WHAT ARE WE GOING TO DO WITH DATA?

Page 32:

Add value

http://bit.ly/1JyTfxO

Page 33:

Do science?

• Data
  – DOI: 10.5524/100034
• Subsequent analysis
  – DOI: 10.1126/scitranslmed.3006086
• Science journalism
  – http://bit.ly/1AXEkKJ
• Why not do part 2 as well?

Page 34:

Summary

• Science has problems – so how good can science journalism be?
• Things are changing – slowly
• The future is bright
• The future is data-driven
• Data journalists will be the new scientists?

Page 35:

THANKS!

GigaScience team:
Scott Edmunds
Peter Li
Chris Hunter
Jesse Xiao
Rob Davidson