Data sharing - Data management - The SysMO-SEEK Story. Professor Carole Goble FREng FBCS CITP, University of Manchester, UK

Data sharing - Data management - The SysMO-SEEK Story


DESCRIPTION

Professor Carole Goble, University of Manchester, talks at the RIN "Research data: policies & behaviour" event as part of a series on Research Information in Transition.


Page 1: Data sharing - Data management - The SysMO-SEEK Story

Data sharing
Data management

The SysMO-SEEK Story

Professor Carole Goble FREng FBCS CITP
University of Manchester, UK

Page 2: Data sharing - Data management - The SysMO-SEEK Story

13 teams
91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each three year duration

Data generation
Data consumption
Data analysis

Data management: Local – Shared – Long term

Pan European Systems Biology

http://www.sysmo.net

Page 3: Data sharing - Data management - The SysMO-SEEK Story
Page 4: Data sharing - Data management - The SysMO-SEEK Story

Legacy
Own data solutions: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.

Suspicion
Extreme caution over sharing. Modellers vs experimentalist tribalism.

Dynamics
Many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.

Skills
Expert scientists, inexpert informaticians. Few resources.

Data
Patchy standards, incomparable data, afterthought.

Page 5: Data sharing - Data management - The SysMO-SEEK Story

[Figure labels: Scientist, Lab, Collaborators, Competitors, Programme; Pre-Publication, Published, Post-Publication]

Page 6: Data sharing - Data management - The SysMO-SEEK Story

Data mine-ing

“my impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.”

E-infrastructure - taking forward the strategy, RIN report, 2010

Page 7: Data sharing - Data management - The SysMO-SEEK Story

1. Sharing

Rewards
Competitive advantage. Adoption. Kudos & Credit. Help. Fame. Reputation.

Risks
Being scooped. Scrutiny. Misinterpretation. Cost. Blame. Reputation.

Nature 461, 145 (10 September 2009)

Page 8: Data sharing - Data management - The SysMO-SEEK Story

“It’s not ready yet”

“I need to get (another) publication first”

“We don’t have the resources or skills to prepare it for others, esp. now we finished that project”

“It’s faster/easier to do it myself, and will keep the credit/control too”

“It’s not described enough to be usable”

“I don’t trust the quality. It’s not reliable enough. It’s too noisy.”

“Others won’t use it properly.”

“It’s not worth my while”

“They are my competitors!!”

Page 9: Data sharing - Data management - The SysMO-SEEK Story

Pseudo Sharing

Page 10: Data sharing - Data management - The SysMO-SEEK Story

2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline
Silo busting

Page 11: Data sharing - Data management - The SysMO-SEEK Story

Metadata Minefield

Minimum Information for Biological and Biomedical Investigations: http://www.mibbi.org/index.php/MIBBI_portal

CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist

BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

Page 12: Data sharing - Data management - The SysMO-SEEK Story

http://usefulchem.wikispaces.com/page/code/EXPLAN001

http://www.mygrid.org.uk/tools/taverna/

Publishing Process

models

software

methods

scripts

http://openwetware.org

standard operating procedures

Page 13: Data sharing - Data management - The SysMO-SEEK Story

Community Curation Responsibility

Page 14: Data sharing - Data management - The SysMO-SEEK Story

Blue Collar Science
John Quackenbush

Difficult and time consuming

Poor Credit or Reward

Shabby Career Paths & Prospects

Page 15: Data sharing - Data management - The SysMO-SEEK Story

3. Credit Crisis
• Reward sharing, curation and reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and standards too.

• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.

Page 16: Data sharing - Data management - The SysMO-SEEK Story

4. Infrastructure, Capability & Capacity
• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.
• Spreadsheets, wikis, catalogues and yellow pages.
• Content and Tools

Page 17: Data sharing - Data management - The SysMO-SEEK Story

http://www.biosharing.org

Identity Management
Sharednames, DataCite, LSID, DOIs, ORCID

5. Data Ecosystem

Resources

Page 18: Data sharing - Data management - The SysMO-SEEK Story

6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded

• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres

Page 19: Data sharing - Data management - The SysMO-SEEK Story

Incentives.
Sensitivity to Behaviours

Infrastructure

Community building

Trusted service

Coordination
Governance

Policy

Capability

Community Integration

Page 20: Data sharing - Data management - The SysMO-SEEK Story

A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies

• But the community credit crisis continues….

Page 21: Data sharing - Data management - The SysMO-SEEK Story

Summary
• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.
• Infrastructure has to be there and fit for purpose but it's not the real problem.
• Need a cultural shift (on all sides) that truly honours data.