The Dendro research data management platform: Applying ontologies to long-term preservation in a collaborative environment

João Rocha da Silva, Faculdade de Engenharia da Universidade do Porto / INESC TEC

João Aguiar Castro

Cristina Ribeiro, DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC

João Correia Lopes

iPRES 2014, October 6-10, 2014, Melbourne, Australia


DESCRIPTION

It has been shown that data management should start as early as possible in the research workflow to minimize the risks of data loss. Given the large numbers of datasets produced every day, curators may be unable to describe them all, so researchers should take an active part in the process. However, since they are not data management experts, they must be provided with user-friendly but powerful tools to capture the context information necessary for others to interpret and reuse their datasets. In this paper, we present Dendro, a fully ontology-based collaborative platform for research data management. Its graph data model innovates in the sense that it allows domain-specific lightweight ontologies to be used in resource description, acting as a staging area for later deposit in long-term preservation solutions.




Contents

• Research data management in the long tail

• Linked Open Data: why do we need it?

• Collaboration for easier metadata production

• The Dendro platform

• Conclusions


Research Data Management in the long tail of research

Why we need to start early


2011: Science magazine reviewers are asked about their data requirements

~1700 replied

The long tail of research


Source: Dealing with data. Challenges and opportunities. Introduction. (2011). Science (New York, N.Y.), 331(6018), 692–3. doi:10.1126/science.331.6018.692


[Diagram: research workflow stages: Gathering, Processing, Paper writing, Preservation and Sharing]


[Diagram: research workflow stages: Gathering, Processing, Paper writing; annotations: Researcher leaves, Metadata]


[Diagram: research workflow stages: Gathering, Processing, Paper writing; annotation: Project ends]


“Where is the data?”

“How / when / by whom was the data produced?”

[Diagram: research workflow stages: Gathering, Processing, Paper writing]


Researchers must participate in RDM from the start

They are the domain experts

Curators cannot cope with a posteriori description


Linked Open Data

What is it? Why do we need it?


Linked Open Data

• Simplicity!

- LOD is a very simple model for representing knowledge

• Meaning!

- Resources are interlinked by properties with established meaning

• Interoperability!

- Standard methods for querying data - SPARQL

- Representations use standard formats - RDF, OWL


[Diagram: RDF graph describing the resource http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls]

rdf:type: nie:File

nie:title: base data.xls

dc:title: “Base data of the DCB experiments”

dcb:initialCrackLength: 180mm

nie:isLogicalPartOf: http://dendro.fe.up.pt/project/datanotes/data
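The graph on this slide can be written down as plain subject / predicate / object triples. The sketch below is only an illustration, not Dendro code: it uses standard-library Python, serialises the five statements in an N-Triples-like form, and assumes the prefix-to-namespace mapping shown (the `dcb` namespace URI in particular is a placeholder, since the slide does not give it).

```python
# Prefix map. rdf and dc are the usual namespaces; the nie URI is assumed to be
# the NEPOMUK Information Element namespace; the dcb URI is a hypothetical placeholder.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
    "dcb": "http://example.org/dcb#",  # assumption: not stated on the slide
}

FILE = "http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls"
FOLDER = "http://dendro.fe.up.pt/project/datanotes/data"

# Each entry: (subject, predicate, object, object_is_uri)
TRIPLES = [
    (FILE, "rdf:type", "nie:File", True),
    (FILE, "nie:title", "base data.xls", False),
    (FILE, "dc:title", "Base data of the DCB experiments", False),
    (FILE, "dcb:initialCrackLength", "180mm", False),
    (FILE, "nie:isLogicalPartOf", FOLDER, True),
]


def expand(term):
    """Expand a prefixed name such as dc:title; full URIs pass through unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def to_ntriples(triples):
    """Serialise the triples one statement per line, N-Triples style."""
    lines = []
    for subject, predicate, obj, is_uri in triples:
        if is_uri:
            rendered = "<" + expand(obj) + ">"
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append("<%s> <%s> %s ." % (expand(subject), expand(predicate), rendered))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Because every property is itself a URI drawn from an ontology, a reader who has never seen Dendro can still look up what `dcb:initialCrackLength` means, provided the ontology is published alongside the data.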


Analytical Chemistry Dataset

- Generic: Author, Description, Creation date

- Domain Specific: Sample Count, Analysed Substance

Fracture Mechanics Dataset

- Generic: Author, Description, Creation date, …

- Domain Specific: Initial Crack Length, Specimen Type

…
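The split between generic and domain-specific descriptors can be sketched as a small lookup: every dataset gets the generic set, and each domain adds its own. This is a minimal illustration in Python using only the descriptor names from the slide; the function names `descriptors_for` and `missing_descriptors` are hypothetical and do not come from Dendro.

```python
# Descriptor names are taken from the slide; the structure is an illustrative assumption.
GENERIC = ["Author", "Description", "Creation date"]

DOMAIN_SPECIFIC = {
    "Analytical Chemistry": ["Sample Count", "Analysed Substance"],
    "Fracture Mechanics": ["Initial Crack Length", "Specimen Type"],
}


def descriptors_for(domain):
    """Generic descriptors first, followed by those of the given domain, if known."""
    return GENERIC + DOMAIN_SPECIFIC.get(domain, [])


def missing_descriptors(domain, record):
    """List the descriptors a metadata record has not filled in yet."""
    return [name for name in descriptors_for(domain) if name not in record]


if __name__ == "__main__":
    record = {"Author": "J. Doe", "Initial Crack Length": "180mm"}
    print(missing_descriptors("Fracture Mechanics", record))
```

Adding support for a new research domain then amounts to adding one more entry to the mapping, which mirrors the idea of loading one more lightweight ontology.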


Collaboration

For metadata useful now and in the future


[Diagram: research workflow stages: Gathering, Processing, Paper writing, Preservation and Sharing]


[Diagram: workflow labels: Gathering; Collaboration; Description; Deposit; “Freeze” in repository; Sharing]


Gathering

…


Demo

Dendroβ


The Dendro platform

An open-source platform for Linked Open Data in research environments


Metadata

Ontologies

• Data store fully built on Linked Data

• No relational database to preserve

• Model can grow by loading more ontologies

• External systems can retrieve resources via SPARQL

Description
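As an illustration of how an external system might retrieve a resource via SPARQL, the sketch below builds (but does not send) an HTTP GET request in the style of the SPARQL Protocol. The endpoint URL is a hypothetical placeholder, since the slides do not give one, and the query shape is just one simple way of asking for everything known about a resource.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder endpoint; the slides do not state the real address.
ENDPOINT = "http://example.org/sparql"


def describe_query(resource_uri):
    """SPARQL query listing every property and value attached to one resource."""
    return "SELECT ?property ?value WHERE { <%s> ?property ?value }" % resource_uri


def build_request(resource_uri, endpoint=ENDPOINT):
    """Prepare a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": describe_query(resource_uri)})
    # Ask for the standard JSON serialisation of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_request("http://dendro.fe.up.pt/project/datanotes/data")
    print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute it against a live endpoint; that step is left out here because the endpoint above is not real.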


Metadata

Ontologies

File Storage

• GridFS cluster for large or numerous files

• Can work in the cloud if needed

Deposit


Metadata

Ontologies

File Storage

Business Logic

• Flexible access control system

• Backup / Restore

• Version history

• File type previews

• Integration

- DSpace (SWORD)

- EPrints (SWORD)

- CKAN

- Figshare

- …

Collaboration


Metadata

Ontologies

File Storage

Business Logic

API

Sharing

• All operations available via RESTful API using JSON

• All resources are dereferenceable (HTTP content negotiation)

• Plugin architecture allows integration with external systems

Web UI
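HTTP content negotiation means that one resource URI can yield different representations depending on the `Accept` header sent by the client. The sketch below prepares such requests without sending them. The media types are standard ones matching the formats named in this deck (JSON, RDF/XML, HTML), but which of them a given Dendro instance honours is an assumption, and `negotiate` is a hypothetical helper name.

```python
from urllib.request import Request

RESOURCE = "http://dendro.fe.up.pt/project/datanotes/data"

# Assumption: the representations a server may offer for the same URI.
MEDIA_TYPES = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "html": "text/html",
}


def negotiate(resource_uri, representation):
    """Prepare a request for one representation of a resource via the Accept header."""
    if representation not in MEDIA_TYPES:
        raise ValueError("unknown representation: %r" % representation)
    return Request(resource_uri, headers={"Accept": MEDIA_TYPES[representation]})


if __name__ == "__main__":
    for name in MEDIA_TYPES:
        request = negotiate(RESOURCE, name)
        print(name, request.full_url, request.get_header("Accept"))
```

The URI stays the same in all three requests; only the header changes, which is what lets a browser and a program share the same identifiers.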


For curators

• Curators can work with researchers to build more ontologies using existing tools (e.g. Protégé)

• Established ontologies can be loaded (DC, FOAF…)

• Ontologies mature (reuse across Dendro instances)

• Data, metadata and their meaning go together

Creating lightweight ontologies for dataset description: Practical applications in a cross-domain research data management workflow. Castro, J., Rocha da Silva, J., Ribeiro, C. Digital Libraries 2014 (DL2014) (pre-print available at http://dendro.fe.up.pt/)

Beyond INSPIRE: An ontology for biodiversity metadata records. Rocha da Silva, J., Castro, J., Ribeiro, C., Honrado, J., Lomba, A., Gonçalves, J. 10th International Workshop on Ontology Content (OntoContent 2014) (pre-print available at http://dendro.fe.up.pt/)


For programmers

• 100% Open-source software

• Rich API allows Dendro to be connected to almost any system (e.g. mobile apps)

LabTablet: semantic metadata collection on a multi-domain laboratory notebook. Amorim, R., Castro, J., Rocha da Silva, J., Ribeiro, C. 8th Metadata and Semantics Research Conference (MTSR 2014) (pre-print available at http://dendro.fe.up.pt/)

Ontology-based multi-domain metadata for research data management using triple stores. Rocha da Silva, J., Ribeiro, C., Correia Lopes, J. 18th International Database Engineering & Applications Symposium (IDEAS 2014) (pre-print available at http://dendro.fe.up.pt/)


Dendro dies, data lives on

Triple Store: “Database”

Ontologies: “Documentation”


Conclusions

• Research data management should start early

• Linked Open Data: simple, interoperable, flexible

• Collaboration support helps researchers while gathering metadata for later deposit

• Dendro: a fully open-source platform for RDM, built on Linked Open Data

• Dendro integrates with major repository platforms


Conclusions (cont’d)

• Ontologies: source of metadata descriptors

• Data model grows as more ontologies are loaded

• Curators can model and share the ontologies

• Domain ontologies evolve with reuse


Visit us at

http://dendro.fe.up.pt


João Rocha da Silva

PhD Student, Senior Web Developer, Semantic Web at INESC TEC

João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets. He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.

João Correia Lopes

Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC

João Correia Lopes is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. He graduated in Electrical Engineering from the University of Porto in 1984 and received a PhD in Computing Science from Glasgow University in 1997. His teaching includes undergraduate and graduate courses in databases and web applications, software engineering and object-oriented programming, markup languages and semantic web. He has been involved in research projects in the areas of long-term preservation, service-oriented architectures and e-Science. Currently his main research interests are e-Science and the management of research data.

Cristina Ribeiro

Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC

Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering and holds a Master's degree in Electrical and Computer Engineering and a Ph.D. in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.

João Aguiar Castro

PhD Student, Research Data Management researcher at INESC TEC

João Aguiar Castro holds a Master's degree in Information Science and is currently a Digital Platforms PhD student at the Faculty of Engineering of the University of Porto. He is a research data management researcher, working in particular on the definition of application profiles that meet the metadata needs of different research domains.


Extras


[Diagram: Dendro architecture in three layers]

Data: Graph Database (LOD) on OpenLink Virtuoso 7; Distributed document index on ElasticSearch; File Storage Cluster on MongoDB (GridFS)

Logic: Business Logic in NodeJS (JavaScript), with a DB Adapter, an ES Endpoint and a GridFS Client, each exchanging JSON

Presentation: Web Interface in AngularJS (JavaScript), serving HTML to human users on the web; JSON API; RDF/XML, SPARQL Endpoint


[Diagram: metadata workflow around Dendro, in four numbered steps. Labels: Data producers; Working Files; Dendro; Curator; Metadata validation; Curated Dataset; Deposit; CKAN, Dryad; Data reuser; Web Portal; Free-Text Search; API; SPARQL Endpoint; Specification of new metadata ontologies; Domain-Specific Lightweight Ontologies (dcb); Ontology concept reuse (DC, FOAF; e.g. dc:title, nie:isPartOf, dcb:specimenLength); Sharing & evolution; “Mature” ontologies on the web]
