WHAT ARE WE GOING TO DO WITH DATA?
Rob L Davidson, GigaScience
@bobbledavidson #WCSJ2015
This presentation DOI:10.6084/m9.figshare.1439750
Up ahead
• The need for Open Data in science
• GigaScience and GigaDB
• Everything is data
• Open is accessible
• Literate programming
• So, what are we going to do with data?
THE NEED FOR OPEN DATA IN SCIENCE
Researcher bias
Positive result bias
20 teams do studies, 1 publishes p<0.05
Poorly explained analyses
DOI: 10.1371/journal.pmed.0020124
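Why one published p<0.05 out of 20 attempts is weak evidence can be checked with a line of arithmetic. The sketch below assumes (this is not stated on the slide) that the 20 studies are independent and that there is no real effect, so each has a 5% chance of a false positive. The function name is illustrative only.

```python
def chance_of_false_positive(n_studies: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent studies of a
    non-existent effect still reports p < alpha (assumed model)."""
    # Each study avoids a false positive with probability (1 - alpha);
    # all n avoid it together with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    print(f"{chance_of_false_positive(20):.2f}")  # prints 0.64
```

Under these assumptions, roughly two times in three at least one team finds a "significant" result by chance alone, and that is the result most likely to reach print.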
Problem: Reproducibility
Out of 18 microarray papers, results from 10 could not be reproduced
DOI: 10.1038/ng.295
Software?
http://reproducibility.cs.arizona.edu/
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
613 papers tested
123 successful reproductions
DOI: 10.1371/journal.pmed.1001747
85% of research resources are wasted!
We must ... favor ... unbiased, transparent, collaborative research with greater standardization
Share data, protocols, materials, software, other tools
What, why Open Data?
• Knowledge is open if anyone is
– free to access,
– use,
– modify,
– and share it,
subject, at most, to measures that preserve provenance and openness.
http://opendefinition.org/od/
FAIR Data
http://datafairport.org/
GIGASCIENCE AND GIGADB
The publishing tradition
1812 1665 1869
The publishing tradition
• Aimed at paper product
• Limited length
• Limited detail
• No supporting data
• No supporting code
• Poor images
• Limited figures
Anatomy of a traditional Publication
Data
Idea
Study
Analysis
Answer
Metadata
Anatomy of an Open Data Publication
Data
Idea
Study
Analysis
Answer
Metadata
Multi-faceted publication
Open-access journal
Data Publishing Platform
Data Analysis Platform
Data
Metadata
Methods
Analyses
“Deconstructed” Journal
“Regular” Journal
“Conscientious” Online Journal
Image Source: http://commons.wikimedia.org/wiki/File:System-Mechanic-California.jpg
EVERYTHING IS DATA
Data is data
DOI: 10.1186/2047-217X-3-7
Software is data
“For loading data from the provided datasets, a script that can load individual spectra or images is provided”
Metadata is data
Findable, reusable…
• Bioontologies/ISA-Tab
– Standard language
• ORCID
– Unique, traceable authors
• FundRef
– Track funding outputs
• APIs
– Easy search
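Persistent identifiers are what make the items in this talk findable: any DOI cited on these slides can be turned into a link by prefixing it with the public resolver address https://doi.org/. A minimal sketch follows. The function name and the loose matching pattern are assumptions made for illustration, not part of any official DOI tooling.

```python
import re

# Loose pattern (an assumption, not an official grammar):
# "10." + registrant code + "/" + suffix with no whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(citation: str) -> str:
    """Pull the first DOI out of a citation string and return a resolver link."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        raise ValueError(f"no DOI found in {citation!r}")
    # Drop punctuation that often trails a DOI at the end of a sentence.
    doi = match.group(0).rstrip(".,;")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    print(doi_to_url("DOI: 10.1038/ng.295"))
    # prints https://doi.org/10.1038/ng.295
```

The same pattern of a stable prefix plus a unique identifier is what makes author and funder identifiers traceable as well.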
ACCESSIBLE, USABLE DATA
Curation
• Not all science data is pretty
• ISA-Tab, SRA help
• Peer reviewed data is better data
http://bit.ly/1F47YZz
Software pipelines
Gigagalaxy.net
Tool List, Tool Parameters, History/results
Visualise pipelines
Gigagalaxy.net
Reproducing results?
SOAPdenovo2 S. aureus pipeline
DOI: 10.1186/2047-217X-1-18
Easy installation
• Virtual machine
– Pre-installed
– Peer-reviewed
– Reproducibility, frozen in time
DOI: 10.1186/2047-217X-3-23
Literate programming
• Data journalism for all!
• knitr, IPython, Project Jupyter
DOI: 10.1186/2047-217X-3-3
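The idea behind these tools is that the text and the numbers come from the same source, so a reader can rerun the analysis and watch the sentence update. A minimal plain-Python sketch, reusing the counts from the software reproducibility slide earlier in this talk (613 papers tested, 123 reproduced). The function name is an assumption made for illustration, and a real notebook would load the data from a file rather than hard-code it.

```python
def reproducibility_sentence(tested: int, reproduced: int) -> str:
    """Build the reported sentence from the raw counts, so the prose
    cannot drift away from the data it describes."""
    rate = reproduced / tested
    return f"{reproduced} of {tested} papers ({rate:.1%}) were successfully reproduced."


if __name__ == "__main__":
    print(reproducibility_sentence(tested=613, reproduced=123))
    # prints: 123 of 613 papers (20.1%) were successfully reproduced.
```

Changing either count and rerunning regenerates the sentence, which is the workflow that knitr and Jupyter notebooks support at full scale.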
WHAT ARE WE GOING TO DO WITH DATA?
Add value
http://bit.ly/1JyTfxO
Do science?
• Data
– DOI: 10.5524/100034
• Subsequent analysis
– DOI: 10.1126/scitranslmed.3006086
• Science journalism
– http://bit.ly/1AXEkKJ
• Why not do part 2 as well?
Summary
• Science has problems – so how good can science journalism be?
• Things are changing – slowly
• The future is bright
• The future is data-driven
• Data journalists will be the new scientists?
THANKS!
GigaScience team:
Scott Edmunds
Peter Li
Chris Hunter
Jesse Xiao
Rob Davidson