EGU, 23 April 2012, Najla Rettberg, OpenAIRE, University of Göttingen, Linking Data to Open Access Publications


In 12 Minutes…

OpenAIRE – Publications and Data

Demonstrators for Enhanced Publications

Use Case Scenarios

Services for Users


OpenAIRE – Second Phase

Open Access, participatory infrastructure for scientific information linking publications, datasets, funding

Disseminates OA/RDM information in Europe

Opens its content (search, browse, statistics) to users and to third-party service providers

Capitalizes on the OpenAIRE infrastructure built for the FP7 Open Access pilot and its FP7-funded articles (measuring the impact of EC Special Clause 39, SC39)


Portal: Search, Access, Deposit


Past, present and OpenAIREplus


Publication repositories network: Institutional & Thematic

FP7 publications

EC Project metadata

National Project metadata

National funding publications

DRIVER Guidelines

OpenAIRE Guidelines v1.0

OpenAIRE Guidelines v2.0

Dataset repositories

Metadata on data sets

OpenAIRE+ Guidelines for Data Providers

OpenAIREplus


5,600,000 OA publications

311 validated repositories

OA Publication Infrastructure

Open Data Infrastructures


ESFRI, EU-wide infrastructures

Covering ‘European Knowledge’


A ‘Static’ publication

(Slide from Jens Klump)

Enhanced Publications (EPs)

Compound information objects: represent the aggregation of distinct information objects through meaningful relationships

Example of SURF-EPs: textual publications enhanced with links to datasets

OpenAIREplus provides EP services:

Management: creation and curation

Visualization, browsing, querying

Import: OAI-PMH/ORE harvesting of EPs from external providers

Export: OAI-PMH/ORE publishing of EPs, Linked Data representation
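The import service can be pictured as a standard OAI-PMH harvesting loop. The sketch below is an illustration only, not OpenAIREplus code: the base URL and the sample response are invented placeholders, and only the protocol elements (`verb=ListRecords`, `metadataPrefix`, `resumptionToken`) come from the OAI-PMH specification. ORE aggregations are not modelled here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response standing in for an external EP provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ep-1</identifier></header></record>
    <record><header><identifier>oai:example.org:ep-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://repo.example.org/oai", resumption_token=token)
```

A harvester would keep requesting `next_url` until the response carries no resumption token.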


‘Information in Context’


Attempt at a generic workflow

No one-size-fits-all for data
– Use different data types, PIs, policies, access levels, standards

Look at research driven disciplines, different communities

Incremental, based on prototypes

“...any roadmap for OA infrastructure must address this natural tension between diversity and infrastructure”

C. Meier zu Verl & W. Horstmann (Eds.), 2011. Studies on Subject-Specific Requirements for Open Access Infrastructure.

Cross-discipline approach


Subject-specific pilots

Learning lessons from interoperation of data infrastructures
– Interoperability pilots between OpenAIREplus and subject-specific infrastructures
   In the Life Sciences
   In the Social Sciences
– Exploitation in modelling and implementation for OpenAIRE data model
   Relationship entities: projects, publications, datasets
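As a rough picture of the three relationship entities, the following sketch stores projects, publications and datasets as typed nodes joined by named links. It is not the OpenAIRE data model: the class names, the relation labels (`isProducedBy`, `references`) and all identifiers are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str        # "project", "publication" or "dataset"
    identifier: str  # e.g. a grant number or a DOI (placeholders below)
    title: str


@dataclass
class InformationSpace:
    relations: set = field(default_factory=set)

    def link(self, source, relation, target):
        """Record a typed, directed relationship between two entities."""
        self.relations.add((source, relation, target))

    def related(self, entity, kind):
        """Entities of the given kind linked to `entity` in either direction."""
        found = set()
        for source, _, target in self.relations:
            if source == entity and target.kind == kind:
                found.add(target)
            elif target == entity and source.kind == kind:
                found.add(source)
        return found


# Hypothetical records, for illustration only.
project = Entity("project", "grant-000000", "Hypothetical FP7 project")
paper = Entity("publication", "10.1234/paper", "Example article")
data = Entity("dataset", "10.1234/data", "Example dataset")

space = InformationSpace()
space.link(paper, "isProducedBy", project)
space.link(paper, "references", data)
```

With these links in place, `space.related(paper, "dataset")` returns the dataset and `space.related(project, "publication")` returns the paper, which is the kind of navigation between funding, publications and data that the pilots aim at.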


The Challenges

Aggregation and Discovery of resources

Representation of diverse disciplines in a ‘generic’ infrastructure

Access restrictions/reuse policies

User-friendly way for researchers to link research results with project information

Machine-readable (Linked Open Data)
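One way to picture the machine-readable side is to emit each publication-to-dataset link as an RDF triple. The sketch below serialises links as N-Triples using the Dublin Core Terms vocabulary; the choice of `dcterms:references` and the DOIs are assumptions made for the example, not something the slides prescribe.

```python
DCTERMS = "http://purl.org/dc/terms/"


def doi_iri(doi):
    """Express a DOI as an HTTP IRI via the doi.org resolver."""
    return "https://doi.org/" + doi


def to_ntriples(links):
    """Serialise (subject DOI, predicate local name, object DOI) links as N-Triples."""
    lines = []
    for subject, predicate, obj in links:
        lines.append(f"<{doi_iri(subject)}> <{DCTERMS}{predicate}> <{doi_iri(obj)}> .")
    return "\n".join(lines)


# Placeholder DOIs: a publication that references a dataset.
links = [("10.1234/paper", "references", "10.1234/data")]
triples = to_ntriples(links)
print(triples)
```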


Two disciplines…

SSH – DANS/EASY
– Produce handmade EPs at file level
– Experienced data modelling and research work (Veteran tapes)

Life Sciences – EMBL-EBI
– Text mine abstracts/full texts
– Link bio-entities to database
– Enriched information could be transferred to generic infrastructure


Demonstrator

Data model
– Generalised

Extract citation info for datasets
– from e.g. UniProt and full text

Derive Persistent Identifiers
– from URLs (URNs and PMC-IDs)

Transfer of linked entities
– community services and OpenAIRE infrastructure
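Deriving persistent identifiers from URLs can be sketched with two regular expressions, one for URNs and one for PMC-IDs. This is a guess at the general idea rather than the demonstrator's actual logic: the patterns are simplified and the example URLs are illustrative.

```python
import re
from urllib.parse import unquote, urlparse

# Simplified patterns; real URN and PMC-ID validation is stricter.
URN_RE = re.compile(r"urn:[a-z0-9][a-z0-9-]{0,31}:\S+", re.IGNORECASE)
PMCID_RE = re.compile(r"PMC\d+", re.IGNORECASE)


def derive_identifier(url):
    """Return (scheme, identifier) if the URL path embeds a known identifier, else None."""
    path = unquote(urlparse(url).path)
    match = URN_RE.search(path)
    if match:
        return ("urn", match.group(0))
    match = PMCID_RE.search(path)
    if match:
        return ("pmcid", match.group(0).upper())
    return None
```

For instance, `derive_identifier("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/")` yields `("pmcid", "PMC1234567")`, while a URL with no recognisable identifier yields `None` and would need manual curation.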


Use Cases

1. Import EP created in DANS or SURF
– Proof of Services Interoperability

2. Manual composition of EP in OpenAIRE
– Proof of Tools: Editor, Discovery of Research data in OpenAIRE

3. Automatic generation of EP by extracting citation information (or mining), auto-linking
– Proof that rich metadata can be represented in a user-friendly way
– Possible Linked Open Data compliance


4. Reuse and enrichment: annotations added by users to datasets or publications
– An EP is used by researcher in publication
– Adequate documentation
– Test legal framework
– Study into Licensing of publications and data
   Analyse requirements of legal protection of research data
   Legal prototype of restraints


Research Scenario 1

1. You are an EC-project researcher
– OA publication
– Dataset with a DOI
– Generate the link in OpenAIRE

2. Researcher completes data output with paper
– No data repository
– Submit dataset to OpenAIRE ‘orphan’ repository


Research Scenario 2

You search for ‘mouse genome literature’ in OpenAIRE
– Find a citation for publication
– Funding details of project
– Related data, say a protein link to GenBank
– Create your own links to this


Service activities

For publication providers - OpenAIRE’s Guidelines for repository managers

– Metadata: (DC) and Protocols: (OAI etc.)

For data providers: accessing (metadata of) datasets from providers while minimizing effort to comply

– Metadata: indications on minimal metadata about datasets (e.g., identifiers, date of creation, title, URLs) and best practices for interlinking datasets and publications

– Access protocols: no requirements for adopting precise protocols (e.g., OAI, FTP) or ID/URL frameworks (e.g., OpenURL, DOI) to comply
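A check for the minimal dataset metadata listed above might look like the sketch below. The field names and the ISO date convention are assumptions made for this example; they are not taken from the OpenAIREplus Guidelines for Data Providers.

```python
from datetime import date

# Hypothetical field names for the minimal metadata named on the slide.
REQUIRED_FIELDS = ("identifier", "title", "date_created", "url")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def is_valid_date(value):
    """Check that a creation date is an ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


# Placeholder record with an empty URL, to show what the check reports.
record = {
    "identifier": "10.1234/data",
    "title": "Example dataset",
    "date_created": "2012-04-23",
    "url": "",
}
problems = missing_fields(record)
```

Keeping the required set this small reflects the stated aim of minimizing the effort data providers need to comply.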


Service activities: Users

Registered end-users (e.g., EC personnel, project coordinators, researchers, authors)

– Search, browse and access statistics

– Deposit files and metadata of publications and datasets into the Orphan Repository

– Ingest (claim) metadata into the information space

– Create EP by combining datasets from different communities

– Reuse of datasets as secondary data (with respect to IPR)


Service activities: Users

Content provider managers (e.g. datasets and publications repository managers)

– Registration and validation (OpenAIREplus guidelines) of publication and dataset repositories

Data curators (administrative tasks)

– Collect and aggregate publications, project data and dataset metadata

Third-party application developers

– Bulk-fetch content from the (curated) information space


The Future…

“Forget PDFs, imagine an ideal publication where you click on tables to get through to raw data, where you can contribute and discuss some aspects and later update or correct parts of a paper in subsequent versions. The latter is similar to Wikipedia, actually.”

– PhD Student, UGOE


Thank you…
– [email protected]
– @openaire_eu


Linking: Publication to Database


Author-supplied supplementary info: TIFF, MOV


PLoS: O’Toole, Greenan, Lange, Srayko, Müller-Reichert


Research Impact

OpenAIRE lays the foundations for measuring research impact per publication, researcher, project, institution, country, …


Data Management Issues

Good data practices

Data policies, standards

Drivers for deposit? What’s in it for researchers?

Work with publishers, DOIs

Where do researchers deposit data? Figshare?


• Potential issues: unstructured data with different kinds of media files

• Persistent IDs: resolvable and managed by the originator of resource

• Preservation: responsibility lies in the trusted repositories


Demonstrators

Demonstrators for Enhanced Publications
– Explore how links are managed between publications and research data in Life Sciences and SSH
– How data can be mutually complemented and exchanged in generic infrastructures
– Example: how a publication ‘reported’ in OpenAIRE is enriched via UKPMC with links to databases

Report: “Connection Data and Publications through e-Infrastructure”
