EGU, 23 April 2012, Najla Rettberg, OpenAIRE, University of Göttingen
Linking Data to Open Access Publications
In 12 Minutes….
OpenAIRE – Publications and Data
Demonstrators for Enhanced Publications
Use Case Scenarios
Services for Users
OpenAIRE – Second Phase
Open Access, participatory infrastructure for scientific information linking publications, datasets, funding
Disseminates OA/RDM information in Europe
Opens its content (search, browse, stats) to users and to 3rd-party service providers
Capitalizes on the OpenAIRE infrastructure, built for the Open Access pilot and FP7-funded articles (measuring the impact of EC SC39)
Past, present and OpenAIREplus
Publication repositories network: Institutional & Thematic
FP7 publications
EC Project metadata
National Project metadata
National funding publications
DRIVER Guidelines
OpenAIRE Guidelines v1.0
OpenAIRE Guidelines v2.0
Dataset repositories
Metadata on data sets
OpenAIRE+ Guidelines for Data Providers
OpenAIREplus
5,600,000 OA publications, 311 validated repositories
OA Publication Infrastructure
Open Data Infrastructures
ES, FR, EU-wide infrastructures
Covering ‘European Knowledge’
Enhanced Publications (EPs)
Compound information objects: represent the aggregation of distinct information objects through meaningful relationships
Example of SURF-EPs: textual publications enhanced with links to datasets
OpenAIREplus provides EP services:
Management: creation and curation
Visualization, browsing, querying
Import: OAI-PMH/ORE harvesting of EPs from external providers
Export: OAI-PMH/ORE publishing of EPs, Linked Data representation
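The import service above relies on OAI-PMH harvesting. As a rough illustration only, the sketch below shows the general shape of an OAI-PMH ListRecords exchange: building the request URL and reading record identifiers and the resumption token from a response. The endpoint URL, the sample response and the function names are assumptions made for this example; they do not describe how OpenAIREplus itself is implemented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol for response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument next to the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response used only to exercise the parser without network access.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:ep-1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:ep-2</identifier></header></record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

ids, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("https://repo.example.org/oai", resumption_token=next_token)
```

A harvester would repeat the request with each returned token until no resumption token comes back.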
Attempt at a generic workflow
No one-size-fits-all for data
– Use different data types, PIs, policies, access levels, standards
Look at research driven disciplines, different communities
Incremental, based on prototypes
“..any roadmap for OA infrastructure must address this natural tension between diversity and infrastructure”
C. Meier zu Verl, & W. Horstmann (Eds.) 2011. Studies on Subject-Specific Requirements for Open Access Infrastructure.
Cross-discipline approach
Subject-specific pilots
Learning lessons from interoperation of data infrastructures
– Interoperability pilots between OpenAIREplus and subject-specific infrastructures: in the Life Sciences, in the Social Sciences
– Exploitation in modelling and implementation for the OpenAIRE data model. Relationship entities: projects, publications, datasets
The Challenges
Aggregation and Discovery of resources
Representation of diverse disciplines in a ‘generic’ infrastructure
Access restrictions/reuse policies
User friendly way for Researchers to link research results with project information
Machine-readable (Linked Open Data)
Two disciplines…
SSH - DANS/EASY
– Produce handmade EPs at file level
– Experienced in data modelling and research work (Veteran tapes)
Life Sciences – EMBL-EBI
– Text-mine abstracts/full texts
– Link bio-entities to databases
– Enriched information could be transferred to generic infrastructure
Demonstrator
Data model – Generalised
Extract citation info for datasets
– from e.g. UniProt and full text
Derive Persistent Identifiers
– from URLs (URNs and PMC-Ids)
Transfer of linked entities
– community services and OpenAIRE infrastructure
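To make the "derive persistent identifiers from URLs" step more concrete, here is a minimal sketch, assuming that identifiers can be recognised by simple patterns in the URL. The regular expressions, the function name `derive_pid` and the example URLs are illustrative assumptions, not the demonstrator's actual rules.

```python
import re
from urllib.parse import urlparse, unquote

# Assumed patterns: a PMC id is "PMC" followed by digits; a URN is
# "urn:<namespace>:<namespace-specific string>".
PMC_RE = re.compile(r"\bPMC\d+\b", re.IGNORECASE)
URN_RE = re.compile(r"urn:[a-z0-9][a-z0-9-]{0,31}:[^\s?#]+", re.IGNORECASE)


def derive_pid(url):
    """Return (scheme, identifier) found in the URL, or None if nothing matches."""
    text = unquote(url)
    urn_match = URN_RE.search(text)
    if urn_match:
        return ("urn", urn_match.group(0))
    pmc_match = PMC_RE.search(urlparse(text).path)
    if pmc_match:
        return ("pmcid", pmc_match.group(0).upper())
    return None


# Example URLs with made-up identifiers.
pmc_pid = derive_pid("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/")
urn_pid = derive_pid("https://nbn-resolving.org/urn:nbn:de:gbv:7-webdoc-1234-5")
no_pid = derive_pid("https://example.org/some/page")
```

Pattern matching of this kind only recovers identifiers that are spelled out in the URL; anything else would need a lookup against the originating service.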
Use Cases
1. Import EP created in DANS or SURF
– Proof of Services Interoperability
2. Manual composition of EP in OpenAIRE
– Proof of Tools: Editor, Discovery of Research data in OpenAIRE
3. Automatic generation of EP by extracting citation information (or mining), auto-linking
– Proof that rich metadata can be represented in a user-friendly way
– Possible Linked Open Data compliance
4. Reuse and enrichment: annotations added by users to datasets or publications
– An EP is used by a researcher in a publication
– Adequate documentation
– Test legal framework
– Study into licensing of publications and data: analyse requirements of legal protection of research data; legal prototype of restraints
Research Scenario 1
1. You are an EC-project researcher
– OA publication
– Dataset with a DOI
– Generate the link in OpenAIRE
2. Researcher completes data output with paper
– No data repository
– Submit dataset to OpenAIRE ‘orphan’ repository
Research Scenario 2
You search for ‘mouse genome literature’ in OpenAIRE
– Find a citation for publication
– Funding details of project
– Related data, say a protein link to GenBank
– Create your own links to this
Service activities
For publication providers - OpenAIRE’s Guidelines for repository managers
– Metadata: (DC) and Protocols: (OAI etc.)
For data providers: accessing (metadata of) datasets from providers while minimizing effort to comply
– Metadata: indications on minimal metadata about datasets (e.g., identifiers, date of creation, title, URLs) and best practices for interlinking datasets and publications
– Access protocols: no requirements for adopting precise protocols (e.g., OAI, FTP) or ID/URL frameworks (e.g., OpenURL, DOI) to comply
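As an illustration of what "minimal metadata about datasets" could look like in practice, the sketch below models a dataset record with the fields named above (identifiers, date of creation, title, URLs) plus links to related publications, and checks which minimal fields are missing. The class name, field names and sample values are assumptions for this example, not part of the OpenAIRE+ guidelines.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a dataset."""
    identifiers: List[str]
    title: str
    date_created: Optional[date]
    urls: List[str]
    # Identifiers of publications this dataset is linked to (interlinking).
    related_publications: List[str] = field(default_factory=list)

    def missing_fields(self):
        """Return the names of minimal fields that are empty or absent."""
        missing = []
        if not self.identifiers:
            missing.append("identifiers")
        if not self.title.strip():
            missing.append("title")
        if self.date_created is None:
            missing.append("date_created")
        if not self.urls:
            missing.append("urls")
        return missing


# Made-up example values.
complete = DatasetRecord(
    identifiers=["doi:10.1234/example-dataset"],
    title="Example survey data",
    date_created=date(2012, 4, 23),
    urls=["https://data.example.org/datasets/1"],
    related_publications=["doi:10.1234/example-article"],
)
incomplete = DatasetRecord(identifiers=["local:42"], title=" ", date_created=None, urls=[])

complete_gaps = complete.missing_fields()
incomplete_gaps = incomplete.missing_fields()
```

A check like this keeps the barrier to entry low: a provider only has to supply the few fields listed, in whatever protocol or identifier scheme it already uses.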
Service activities: Users
Registered end-users (e.g., EC personnel, project coordinators, researchers, authors)
– Search, browse and access statistics
– Deposit files and metadata of publications and datasets into the Orphan Repository
– Ingest (claim) metadata into the information space
– Create EP by combining datasets from different communities
– Reuse of datasets as secondary data (with respect to IPR)
Service activities: Users
Content provider managers (e.g. datasets and publications repository managers)
– Registration and validation (OpenAIREplus guidelines) of publication and dataset repositories
Data curators (administrative tasks)
– Collect and aggregate publications, project data and dataset metadata
Third-party application developers
– Bulk-fetch content from the (curated) information space
The Future…
“Forget PDFs, imagine an ideal publication where you
click on tables to get through to raw data, where you can
contribute and discuss some aspects and later update or
correct parts of a paper in subsequent versions. The latter
is similar to Wikipedia, actually.”
– PhD Student, UGOE
Thank you
– [email protected]
– @openaire_eu
Author-supplied supplementary info: TIFF, MOV
PLoS: O’Toole, Greenan, Lange, Srayko, Müller-Reichert
Research Impact
OpenAIRE lays the foundations for measuring research impact per publication, researcher, project, institution, country, …
Data Management Issues
Good data practices
Data policies, standards
Drivers for deposit? What’s in it for researchers?
Work with publishers, DOIs
Where do researchers deposit data? Figshare?
• Potential issues: unstructured data with different kinds of media files
• Persistent IDs: resolvable and managed by the originator of resource
• Preservation: responsibility lies in the trusted repositories
Demonstrators
Demonstrators for Enhanced Publications
– Explore how links are managed between publications and research data in Life Sciences and SSH
– How data can be mutually complemented and exchanged in generic infrastructures
– Example: how a publication ‘reported’ in OpenAIRE is enriched via UKPMC with links to databases
Report: “Connection Data and Publications through e-Infrastructure”