BioSharing overview - NIH bioCADDIE workshop on Common Data Elements, 8th May 2017

Susanna-Assunta Sansone, Associate Director, Oxford e-Research Centre, University of Oxford, UK

dx.doi.org/10.6084/m9.figshare.4055496.v1

@biosharing

bioCADDIE – DATS and CDEs Workshop, Bethesda, 8 May 2017

Formats Terminologies Guidelines

Common Data Elements

Types of content standards

Content standards: descriptors essential for interpretation, verification, reproducibility, sharing etc. of datasets

Minimum information reporting requirements, checklists

o Report the same core, essential information

o e.g. MIAME guidelines

Controlled vocabularies, taxonomies, thesauri, ontologies etc.

o Unambiguous identification and definition of concepts

o e.g. Gene Ontology

Conceptual models, schemas, exchange formats etc.

o Define the structure and interrelation of information, and the transmission format

o e.g. FASTA

de jure: standard organizations

de facto: grass-roots groups

Nanotechnology Working Group

Formats Terminologies Guidelines

Community-driven efforts, just a few examples

Formats Terminologies Guidelines

Content standards in numbers

224, 115, 500+ (each count shown with its source)

Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …

Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB, …

Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …

A web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community
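The three-way relationship the portal tracks (standards, the databases that use them, and the policies that adopt both) can be pictured as a small linked data model. The sketch below is only illustrative: the class names, field names and the placeholder entries such as `std-2` or `db-A` are assumptions, not the portal's actual schema or content.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    implements: set = field(default_factory=set)  # names of standards the database uses


@dataclass
class Policy:
    name: str
    recommends: set = field(default_factory=set)  # names of standards and/or databases


def usage_of(standard, databases, policies):
    """List the databases implementing a standard and the policies recommending it."""
    return {
        "databases": sorted(d.name for d in databases if standard in d.implements),
        "policies": sorted(p.name for p in policies if standard in p.recommends),
    }


# Placeholder names only; these are not real registry entries.
databases = [Database("db-A", {"std-1", "std-2"}), Database("db-B", {"std-2"})]
policies = [Policy("journal-X", {"std-2", "db-A"})]
print(usage_of("std-2", databases, policies))
```

Keeping the links on the database and policy records means a question such as "who uses this standard?" becomes a simple filter over both collections.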

Data policies by funders, journals and other organizations

Content standards

Formats Terminologies Guidelines

Map this complex and evolving landscape

Databases

Using indicators to describe ‘status’

Ready for use, implementation, or recommendation

In development

Status uncertain

Deprecated as subsumed or superseded

All records are manually curated in-house and verified by the community behind each resource
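As a rough illustration of how the four status indicators could be used by a consumer of the records, the sketch below models them as an enumeration and filters for records that are ready to recommend. The `StandardRecord` shape and the example records are hypothetical and their statuses are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The four status indicators listed on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


@dataclass
class StandardRecord:
    # Hypothetical record shape; the field names are assumptions.
    name: str
    status: Status


def recommendable(records):
    """Return the names of records whose status is READY."""
    return [r.name for r in records if r.status is Status.READY]


# Illustrative records only; the names and statuses are invented.
records = [
    StandardRecord("example-guideline", Status.READY),
    StandardRecord("example-format", Status.IN_DEVELOPMENT),
    StandardRecord("example-terminology", Status.DEPRECATED),
]
print(recommendable(records))
```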

Understanding how standards are used

Guideline

Formats

Terminology

Technologically-delineated views of the world

Biologically-delineated views of the world

Generic features (‘common core’):

o description of source biomaterial

o experimental design components

Technologies: arrays, scanning, columns, gels, MS, FTIR, NMR

Technologically-delineated domains: transcriptomics, proteomics, metabolomics

Biologically-delineated domains: plant biology, epidemiology, microbiology

Duplications & lack of interoperability among standards

Hard to use them in combinations, e.g. to represent:

Proteomics-based gut microbiota profiling

Proteomics and metabolomics based gut microbiota profiling

Enhancing modularization

bsg-000174

biosharing:ReportingGuideline

bsg-000161

MINSEQE

MIMARKS

sample information

sample identifier

taxonomy identifier

sequence read

geo location

High-level information about the metadata standards

Representations of the standards elements

Template elements for

el-000001 (provenance: MINSEQE)

el-000002 (provenance: MINSEQE and MIMARKS)

el-000003 (provenance: MIMARKS)

• Serve machine-readable content metadata standards, providing provenance for their elements

• Inform the creation of metadata templates, rendering standards invisible to the researchers
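One way to picture the idea of combining modular standards while keeping provenance is sketched below. It merges the element lists of two standards into a single template and records which standard each element came from. The assignment of elements to MINSEQE and MIMARKS, the `build_template` helper and the sequential identifier scheme are all assumptions made for illustration, not the actual content of those standards or of the portal.

```python
from collections import defaultdict

# Hypothetical element lists: which standard requires which element is an
# assumption made for this example, not taken from MINSEQE or MIMARKS.
standards = {
    "MINSEQE": ["sequence read", "sample identifier"],
    "MIMARKS": ["sample identifier", "geo location"],
}


def build_template(standards):
    """Merge element lists into one template, keeping provenance per element."""
    provenance = defaultdict(list)
    for standard, elements in standards.items():
        for element in elements:
            provenance[element].append(standard)
    # Assign sequential identifiers in first-seen order (an assumed scheme).
    return [
        {"id": f"el-{i:06d}", "label": label, "provenance": sources}
        for i, (label, sources) in enumerate(provenance.items(), start=1)
    ]


template = build_template(standards)
for element in template:
    print(element["id"], element["label"], "from", " and ".join(element["provenance"]))
```

A shared element such as the sample identifier appears once in the template but carries both standards as provenance, so a researcher fills in one field while compliance with each standard can still be traced.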

Modularize and combine

Standard developing groups:

Journals, publishers:

Cross-links, data exchange:

Societies and organisations:

Institutional RDM services:

Projects, programmes:

Recommended