How to share useful data Peter McQuilton Biosharing.org @drosophilic

Page 1: How to share useful data

How to share useful data
Peter McQuilton
Biosharing.org
@drosophilic

Page 2: How to share useful data

Outline

• Data sharing

• Reusability and reproducibility
• How the lack of these affects scientific accountability and progress

• Experimental context
• What to report – what level of granularity
• How to report it – what format, structure

• Content standards
• How to find them
• Complying with repositories, funders and publishers

Page 4: How to share useful data

Research data life cycle

Image credit to:

Page 5: How to share useful data

Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ 2014

Better data = better science

Page 6: How to share useful data

A community mobilization for “openness”

image by Greg Emmerich

http://discovery.urlibraries.org/ https://okfn.org

Open data is a means to do better science more efficiently

http://pantonprinciples.org

https://creativecommons.org

Page 7: How to share useful data

Growing movement for FAIR data and research outputs

Page 8: How to share useful data

But in all fairness, not much data is FAIR!

Page 11: How to share useful data

“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”

Unfairness in both experimental and computational areas

Page 12: How to share useful data

Not very FAIR: low findability and understandability

• Not always well cited, stored
  o Software, codes, workflows are hard(er) to get hold of
• Poorly described for third party reuse
  o Different level of detail and annotation
• Curation activities are perceived as time consuming
  o Collection and harmonization of detailed methods and experimental steps is rushed at the publication stage

Page 13: How to share useful data

What can I do to ensure my data are shareable/usable in the future?

• Effectively document your data so that it can be understood in the future
• Periodically move data to new storage media (drives degrade over time)
• Keep more than one copy of data (local and cloud)
• Migrate data to new software versions
• Use a well documented and supported format

Ideally this should be covered in a data management plan at the start of a project, so that you can factor any associated time and resources into your budget.
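Keeping more than one copy only helps if the copies stay identical. One possible way to check this, sketched here in Python under the assumption that both copies are reachable as ordinary files, is to compare checksums. The function names `sha256_of` and `copies_match` are illustrative, not part of any tool mentioned in this talk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True when both files have byte-identical content."""
    return sha256_of(original) == sha256_of(copy)
```

Running such a check after each migration to new storage media is one way to notice silent corruption before the last good copy is gone.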

Page 14: How to share useful data

Outline

• Data sharing

• Reusability and reproducibility
• How the lack of these affects scientific accountability and progress

• Experimental context - standards
• What to report – what level of granularity
• How to report it – what format, structure

• Content standards
• How to find them
• Complying with repositories, funders and publishers

Page 15: How to share useful data

Do you know what this is?

LS1_C2_LD_TP2_P1 file1-fastq.gz

Page 16: How to share useful data

…how NOT to report the experimental information!

LS1_C2_LD_TP2_P1 file1-fastq.gz

Page 17: How to share useful data

…how NOT to report the experimental information!

Sample name (?!) Data file

LS1_C2_LD_TP2_P1 file1-fastq.gz

Page 18: How to share useful data

We need to clearly describe the information

• LS1: liver sample 1
• C2: compound 2
• LD: low dose
• TP2: time point 2
• P1: protocol 1
• file1-fastq.gz: compressed data file for sequence information corresponding to this sample

Sample name (?!) Data file

LS1_C2_LD_TP2_P1 file1-fastq.gz
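The decoding on this slide can be made explicit in code. The following is a minimal Python sketch, assuming the underscore-separated layout of the example name. The lookup tables, the `SampleAnnotation` class and the `decode_sample_name` function are hypothetical and only cover the codes shown on the slide.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: the codes come from the slide, the tables are assumed.
TISSUES = {"L": "liver"}
DOSES = {"LD": "low dose"}


@dataclass(frozen=True)
class SampleAnnotation:
    tissue: str
    sample_number: int
    compound: int
    dose: str
    time_point: int
    protocol: int
    data_file: str


def decode_sample_name(sample_name: str, data_file: str) -> SampleAnnotation:
    """Turn a cryptic name such as LS1_C2_LD_TP2_P1 into named fields."""
    tissue_sample, compound, dose, time_point, protocol = sample_name.split("_")
    return SampleAnnotation(
        tissue=TISSUES[tissue_sample[0]],      # "L"   -> liver
        sample_number=int(tissue_sample[2:]),  # "LS1" -> 1
        compound=int(compound[1:]),            # "C2"  -> 2
        dose=DOSES[dose],                      # "LD"  -> low dose
        time_point=int(time_point[2:]),        # "TP2" -> 2
        protocol=int(protocol[1:]),            # "P1"  -> 1
        data_file=data_file,
    )


annotation = decode_sample_name("LS1_C2_LD_TP2_P1", "file1-fastq.gz")
```

The point of the sketch is that the decoding rules live only in the author's head unless they are written down. Reporting the named fields directly removes the need for a decoder.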

Page 19: How to share useful data

Without context data is meaningless

Page 23: How to share useful data

• We need to report sufficient information to reuse the dataset

• We must strike a balance between depth and breadth of information

Information intensive experiments

Page 24: How to share useful data

Information intensive experiments

• Not too much
• Not too little
• ….just right

Page 25: How to share useful data

Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared…

From natural language to ‘computable’ concepts

Page 31: How to share useful data

From natural language to ‘computable’ concepts

Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared …

• Age value
• Unit
• Strain name
• Subject of the experiment
• Type of diet and experimental condition
• Anatomy part
• Type of protocol – cell preparation
• Type of protocol – sample treatment
• Type of protocol – liver preparation
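As an illustration of what ‘computable’ could mean here, the example sentence can be written as a structured record. This is a minimal Python sketch. The class and field names are assumptions made for the example, and plain labels stand in for ontology terms, since the slides do not name specific term identifiers.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Age:
    value: int  # "Seven"
    unit: str   # "week"


@dataclass
class SampleDescription:
    age: Age
    strain: str           # strain name
    subject: str          # subject of the experiment
    diet: str             # type of diet and experimental condition
    anatomy_part: str     # anatomy part
    protocols: List[str]  # types of protocol applied


record = SampleDescription(
    age=Age(value=7, unit="week"),
    strain="C57BL/6N",
    subject="mouse",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=["sample treatment", "liver preparation", "cell preparation"],
)

# asdict() converts nested dataclasses too, so the record serialises cleanly.
print(json.dumps(asdict(record), indent=2))
```

Each field can now be searched, filtered and compared across datasets, which the free-text sentence does not allow.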

Page 32: How to share useful data

How do you know what to report, or how to structure it?

• Data/content standards:
  o Structure, enrich and report the description of the datasets and the experimental context under which they were produced
  o Facilitate the discovery, sharing, understanding and reuse of datasets

Page 33: How to share useful data

Outline

• Data sharing

• Reusability and reproducibility
• How the lack of these affects scientific accountability and progress

• Experimental context
• What to report – what level of granularity
• How to report it – what format, structure

• Content standards
• How to find them
• Complying with repositories, funders and publishers

Page 34: How to share useful data

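There are over 600 content standards in the life sciences

[Figure: word cloud of content standards in three groups; the counts shown on the slide are 193, 85 and 346]

• Reporting requirements and checklists, e.g. MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
• Formats, e.g. MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
• Terminologies, e.g. AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …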

Page 35: How to share useful data

de jure de facto

grass-roots groups

standard organizations

Nanotechnology Working Group

Community mobilisation to develop content standards

Page 36: How to share useful data

Databases have their own standards, e.g. at EBI:

Page 40: How to share useful data

Enablers: to better describe, share and query data

• Minimum information reporting requirements, or checklists
  o Report the same core, essential information
• Controlled vocabularies, taxonomies, thesauri, ontologies etc.
  o Use the same word and refer to the same ‘thing’
• Conceptual model, conceptual schema, or exchange formats
  o Allow data to flow from one system to another
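The first two enablers can be sketched as a small check. This is an illustrative Python example only. The required fields and the allowed terms below are invented for the sketch and are not taken from any real checklist or ontology.

```python
from typing import Dict, List, Set

# Hypothetical minimum-information checklist: every record must report these fields.
REQUIRED_FIELDS: Set[str] = {"organism", "strain", "organism_part", "age_value", "age_unit"}

# Hypothetical controlled vocabularies: the only values accepted for these fields.
CONTROLLED_TERMS: Dict[str, Set[str]] = {
    "age_unit": {"day", "week", "month", "year"},
    "organism_part": {"liver", "kidney", "heart"},
}


def check_record(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Checklist: report the same core, essential information.
    for name in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing required field: {name}")
    # Controlled vocabulary: use the same word for the same 'thing'.
    for name, allowed in CONTROLLED_TERMS.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: '{value}' is not in the controlled vocabulary")
    return problems
```

A record that says `"weeks"` or `"Liver"` where the vocabulary says `"week"` and `"liver"` is flagged, which is the kind of inconsistency that otherwise makes datasets hard to query together.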

Page 41: How to share useful data

A web-based, curated and searchable registry ensuring that biological standards and databases are registered, informative and discoverable; also monitoring the development and evolution of standards, their use in databases and the adoption of both in data policies.

Page 42: How to share useful data

Our mission: To help people make the right choice

Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them.

Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, or to fund or implement.

Page 43: How to share useful data

Three interlinked registries

Page 44: How to share useful data

Work out which format your data should be in for submission to a particular database

Page 45: How to share useful data

STANDARD DATABASE

Standards and databases (and policies) cross-linked

Page 46: How to share useful data

From simple and advanced searches

Page 47: How to share useful data

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

Search and filter to find what is relevant to your type of data

Page 48: How to share useful data

Tracking evolution, e.g. deprecations and substitutions

Page 50: How to share useful data

Create your own Collection

Page 51: How to share useful data
Page 52: How to share useful data
Page 53: How to share useful data

User profiles populated from ORCID...

Page 54: How to share useful data

... credit for creating, contributing to, maintaining standards, databases and policies

Ownership of open standards can be problematic in broad, grass-root collaborations

It requires improved models. To encourage maintenance of and contributions to these efforts, rewards and incentives need to be identified for all contributors, to support the continued development of standards.

Page 55: How to share useful data

What you can do with BioSharing…

“Which standard should I use for this data, considering I’d like to publish in journal X?”

“Are we using the most up-to-date version of this standard?”

“My data is in X format, which databases take that format?”

Page 57: How to share useful data

ISA powers data collection, curation resources and repositories, e.g.:

ISA

model and related formats

Page 58: How to share useful data
Page 59: How to share useful data

1

Create template(s) to fit the type of experiments to be described

Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. by configuring the value(s) allowed for each field to be:
• text (with/without regular expressions),
• ontology terms,
• numbers etc.

We have ‘ready to use’ community standards compliant configurations and can create more according to user needs
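The idea of a template that constrains each field can be sketched in a few lines of Python. This is not the ISA tools configuration format or API. The helper names, the `TEMPLATE` mapping and its fields are hypothetical, and a plain list of terms stands in for an ontology.

```python
import re
from typing import Callable, Dict, List, Optional

Validator = Callable[[str], bool]


def text_field(pattern: Optional[str] = None) -> Validator:
    """Free text, optionally constrained by a regular expression."""
    if pattern is None:
        return lambda value: bool(value.strip())
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def term_field(allowed_terms: List[str]) -> Validator:
    """A value restricted to a fixed list of terms (standing in for an ontology)."""
    allowed = set(allowed_terms)
    return lambda value: value in allowed


def number_field() -> Validator:
    """A value that must parse as a number."""
    def is_number(value: str) -> bool:
        try:
            float(value)
        except ValueError:
            return False
        return True
    return is_number


# Hypothetical template: one validator per field to be reported.
TEMPLATE: Dict[str, Validator] = {
    "sample_name": text_field(r"[A-Za-z0-9_]+"),
    "organism_part": term_field(["liver", "kidney", "heart"]),
    "age_value": number_field(),
}


def invalid_fields(row: Dict[str, str]) -> List[str]:
    """Return the template fields whose value is missing or not allowed."""
    return [name for name, is_valid in TEMPLATE.items()
            if name not in row or not is_valid(row[name])]
```

For example, `invalid_fields({"sample_name": "LS1 C2", "organism_part": "hepatic", "age_value": "seven"})` reports all three fields, while a row that follows the template returns an empty list.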

Page 60: How to share useful data

• The ISA model records the data’s provenance, how it was generated and where it is located.

• Published Data Descriptors are indexed in all major bibliographic indexing services (incl. PubMed)

• However, accompanying every Data Descriptor article there are metadata files, specifically created to aid discovery and understanding of the data itself.

• Using the ISA (Investigation, Study, Assay) model, these metadata files provide a machine readable overview of the study that generated the data.
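To give a feel for the Investigation, Study, Assay nesting, here is a deliberately simplified Python sketch. It is not the ISA-Tab specification or the ISA tools API. The classes, their fields and the example titles are assumptions made for illustration, reusing the data file name from the earlier slides.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    measurement: str                                      # how the data was generated
    data_files: List[str] = field(default_factory=list)   # where the data is located


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


# Hypothetical example content.
investigation = Investigation(
    title="Hypothetical liver diet investigation",
    studies=[
        Study(
            title="Hypothetical low-fat diet study in C57BL/6N mice",
            assays=[Assay(measurement="sequencing", data_files=["file1-fastq.gz"])],
        )
    ],
)

# A machine-readable overview of the whole hierarchy.
overview = json.dumps(asdict(investigation), indent=2)
```

Because the overview is structured, a program can walk from the investigation down to the individual data files without parsing the article text.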

Page 61: How to share useful data

• Filter datasets by data repository or metadata
• Boolean searches
• Future enhancements:
  - Statistics
  - Richer queries based on semantics of the data

ISA-explorer: A demo tool for discovering and exploring Scientific Data’s ISA-tab metadata

Page 62: How to share useful data

ISA-explorer: A demo tool for discovering and exploring Scientific Data’s ISA-tab metadata

Visualise the data associated with a paper

http://tinyurl.com/isaexplorer

Page 63: How to share useful data

Summary

• Reusability and reproducibility
  o Are pivotal to drive science and discoveries
  o Do your best to make your digital research outputs FAIR

• Experimental context
  o Report the experimental context of your findings
  o Do to your data what you wish that others would do to theirs

• Content standards
  o Continuously evolving
  o Make use of tools implementing standards, such as ISAtools
  o Use biosharing.org to explore repositories, standards and policies

Page 64: How to share useful data

Acknowledgements

Page 65: How to share useful data

Useful links

Find the right database for your data, and which data standard to use – https://www.biosharing.org

Checking your data conforms to a standard, or making your own templates – http://www.isa-tools.org

Where to keep research data: DCC checklist for evaluating data repositories (DCC) – http://tinyurl.com/DCCResearchData

How and why you should manage your research data (JISC) – http://tinyurl.com/JISCDMP

Page 66: How to share useful data