
Maximising (Re)Usability of Resources using Linked Data

A. Gómez-Pérez

Universidad Politécnica de Madrid

Acknowledgements: Daniel Vila, Jorge Gracia, Victor Rodríguez Doncel, Ontology Engineering Group and LIDER Consortium members

Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez

About us

Directors: A. Gómez-Pérez, O. Corcho

Position: 8th in the UPM ranking (200 groups)

Research Group (30 people)

- 2 Full Professors

- 5 Associate Professors

- 3 Assistant Professors

- 7 Senior Postdocs

- 12 PhD Students

- 5 MSc and BSc Students

- 3 software engineers

- 1 system administrator

- 2 project managers

170+ Past Collaborators

50+ Past Visitors

http://www.oeg-upm.net/

https://github.com/oeg-upm

@oeg-upm


Ontology Engineering Group at a glance

Created in 1995

Known worldwide in the research areas:
- Ontologies
- Semantic Web and Linked Data
- Multilingual Linked Data
- Open Data
- eScience

Projects (> 12M€)
- 27 EU projects (7 as coordinator)
- 54 national projects
- 27 contracts with companies

Publications
- 106 journal articles
- 362 international conference papers and book chapters
- 7 books

Impact of publications (h-index)
- Asunción Gómez-Pérez (h: 47, citations: 13583)
- Oscar Corcho García (h: 33, citations: 7230)

Services to the Spanish community
- Host esDBpedia
- Host linkeddata.es

Supervision of students
- 23 Ph.D. theses (9 awarded best thesis prize)
- >150 M.Sc. and B.Sc. theses

Events organization
- 11 editions of the International Summer School on Ontological Engineering and the Semantic Web
- > 50 workshops and tutorials

Standardization activities
- >25 @ W3C, ISO, OASIS, etc.

Mobility
- PhD students: 3-6 months abroad
- Postdocs: 1 month every 2 years

Visibility
- Program chairs of ESWC, ISWC, KCAP, EKAW, TKE, TIA
- Editorial boards of journals
- Invited talks at conferences and events
- Programme Committee presence
- Collaboration with COM (Center Open Middleware)


License

• This work is licensed under the Creative Commons

Attribution – Non Commercial – Share Alike License

• You are free:

- to Share — to copy, distribute and transmit the work

- to Remix — to adapt the work

• Under the following conditions

- Attribution — You must attribute the work by inserting

• “[source http://www.oeg-upm.net/]” at the footer of each

reused slide

• a credits slide stating: “Maximising (Re)Usability of Resources using Linked Data” by A. Gómez-Pérez

- Non-commercial

- Share-Alike


1. Motivation

2. Linked Data Foundations

3. Linked Data Process

Examples from: http://datos.bne.es

4. Linguistic Linked Data

5. Multilingual Linked Data

6. Uses of Linked Data


A world of digital data

Heterogeneous

Formats

Providers Domains Languages


The students case

“Cervantes"


Complementary,

Different

languages,

but not connected

Lack of interoperability


Multilingual Data Integration

Fotografía

El Quijote

Image

http://www.mancia.org/foro/articulos/107712-don-quijote-medicina.html

URL

El Quijote

Photo

M. Cervantes

El Quijote

Author of

BNE Located

El Quijote

Vídeo

El Quijote

Español

Video

Film Language

http://www.rtve.es/alacarta/videos/el-quijote/

URL Movie

M. Cervantes

Don Quixote

Polish

Written by

Translated in

1960 Year of

publication

VIAF

located


Energy Efficiency scenario


Turbine metadata

Forecast information

Wind Turbine

Energy output by

month

Limitations when exploiting different and disconnected data sources

Wind Speed per

day and city

Wind farm topology

Company Private data

Real time wind speed

[Figure legend: each data source is marked with M (Metadata) and/or D (Data)]

Complementary

but

not connected


Lack of interoperability:

Language, Syntax, Semantic and Technical

• Ecosystem of

- Open Resources in silos

- Complementary domains

- Heterogeneous formats

- Different languages

- Repositories with different

metadata

- Many APIs and services

for querying


The user problems…

Discovery and Use of Information in

third party applications is hard,

manual and time consuming

Metadata Metadata

Combination of Private and

Public Sector data in third

party applications requires

solutions to the license issues

Data

Data


2. Linked Data Foundations


Linked Data: why is it important?

• Facilitate data integration

- From heterogeneous sources

- In different formats

- Different granularity

- In different languages

- From different countries

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig


LD domains in August 2014

Media

Geographic

Life Sciences

Publications

Government

Social Networking

Cross-domain

User-Generated Content

Linguistics


Foundations

Unique identifiers: URI identify or name a resource

RDF(S) models

El Quijote Cervantes Is creator of

Work Person Is creator of

Is a Is a

http://datos.bne.es/resource/XX1718747 http://datos.bne.es/resource/XX3383563

http://iflastandards.info/ns/fr/frbr/frbrer/C1005 http://iflastandards.info/ns/fr/frbr/frbrer/C1001

Equivalence links to other datasets Same As

http://viaf.org/viaf/17220427

Cervantes

Same As Same As

http://dbpedia.org/resource/Miguel_de_Cervantes

Cervantes

Data navigation
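The three foundations on this slide (URIs as names, RDF statements, and equivalence links) can be sketched with plain Python tuples. This is only an illustration, not the BNE implementation: it assumes that the slide's "Same As" arrows correspond to owl:sameAs, that "Is creator of" is the FRBRer property P2010 used in the SPARQL example later in the deck, and that XX1718747 names Cervantes and XX3383563 names El Quijote, as other slides suggest.

```python
# Namespaces; the rdf: and owl: URIs are the standard W3C ones.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"

CERVANTES = BNE + "XX1718747"  # assumed: the person
QUIJOTE = BNE + "XX3383563"    # assumed: the work

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (CERVANTES, RDF_TYPE, FRBR + "C1005"),     # "is a" Person
    (QUIJOTE, RDF_TYPE, FRBR + "C1001"),       # "is a" Work
    (CERVANTES, FRBR + "P2010", QUIJOTE),      # "is creator of"
    (CERVANTES, OWL_SAME_AS, "http://viaf.org/viaf/17220427"),
    (CERVANTES, OWL_SAME_AS, "http://dbpedia.org/resource/Miguel_de_Cervantes"),
}


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Data navigation: follow links from one URI to the next.
works = objects(CERVANTES, FRBR + "P2010")
same_as = objects(CERVANTES, OWL_SAME_AS)
```

Following the `same_as` URIs is what lets a client move from the BNE record to the VIAF and DBpedia descriptions of the same person.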


Foundations: Linking

Model alignment using owl:equivalentClass

EquivalentClass

Same As

http://xmlns.com/foaf/0.1/Person Person

http://schema.org/Person Person

EquivalentClass

Municipality

Person

Place of birth

http://iflastandards.info/ns/fr/frbr/frbrer/C1005

http://dbpedia.org/resource/Municipalities_of_Spain

http://dbpedia.org/page/Alcal%C3%A1_de_Henares

Alcalá de Henares

Is a

http://geo.linkeddata.es/ontology/Municipio

Municipio

http://geo.linkeddata.es/resource/Alcalá de Henares

Alcalá de Henares

IS A
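A minimal sketch of why class equivalences matter for linking: once foaf:Person and schema.org Person are declared equivalent, a query for one class can also return resources typed with the other. The example.org resources and the helper names are hypothetical, and a real system would delegate this to an OWL reasoner or a SPARQL endpoint rather than hand-written code.

```python
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
SCHEMA_PERSON = "http://schema.org/Person"
MUNICIPIO = "http://geo.linkeddata.es/ontology/Municipio"

# owl:equivalentClass statements, stored as unordered pairs.
equivalent_classes = [(FOAF_PERSON, SCHEMA_PERSON)]

# (resource, class) pairs, i.e. rdf:type statements; resources are made up.
typed = [
    ("http://example.org/a", FOAF_PERSON),
    ("http://example.org/b", SCHEMA_PERSON),
    ("http://example.org/c", MUNICIPIO),
]


def equivalents(cls, pairs):
    """Return cls plus every class reachable through equivalence pairs."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for a, b in pairs:
            for x, y in ((a, b), (b, a)):  # equivalence is symmetric
                if x == current and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen


def instances_of(cls, typed, pairs):
    """Return resources typed with cls or with any equivalent class."""
    classes = equivalents(cls, pairs)
    return sorted(s for s, c in typed if c in classes)
```

Asking for instances of `FOAF_PERSON` then also yields the resource typed with `SCHEMA_PERSON`, while the municipality stays out of the result.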


The model (Ontology) and the data


Work

Idiom

translation

Year

Publication date

Library

Located at

Person

Is creator of

Has subject

El Quijote Cervantes

Is creator of

Catalán

translation

1960

Publication date

BNE

Located in

Has subject

Vida de Cervantes

birthPlace Place

birthPlace Alcalá de Henares

Ontology

Data


http://iflastandards.info/ns/fr/frbr/frbrer/C1001

http://iflastandards.info/ns/fr/frbr/frbrer/C1002

translation

Año

Publication date

http://xmlns.com/foaf/0.1/Organization

Located in

http://iflastandards.info/ns/fr/frbr/frbrer/C1005

Is creator of

Has subject

http://datos.bne.es/resource/XX3383563 http://datos.bne.es/resource/XX1718747

Es autor

http://datos.bne.es/resource/XX1924295

translation

1960

Publication date

BNE

Located in

Has subject

http://datos.bne.es/resource/bimo0002045496

Vida de Miguel de Cervantes Saavedra

Don Quijote de la Mancha Cervantes Saavedra, Miguel de

Catalán

Ontology

Data

http://datos.bne.es/#

Language

work

Biblioteca

Person

http://geo.linkeddata.es/ontology/Municipio

birthPlace

http://geo.linkeddata.es/resource/Alcalá de Henares

birthPlace

Linked data is full of URIs


Linked Data without ontologies

http://www.server1.org/resource/Cervantes

http://www.server2.es/resource/Cervantes

http://datos.bne.es/resource/XX1718747

http://d-nb.info/gnd/11851993X

http://geo.linkeddata.es/page/resource/Municipio/Cervantes

Same as

Same as

Same as

Same as

URI URI

URI URI

URI

914 296 093

276,4 km²

Phone

Size

1547

#People

1547

Date of Birth

Author

D. Quijote

Cervantes


Linked Data and ontologies

http://www.server1.org/resource/Cervantes

http://www.server2.es/resource/Cervantes

http://datos.bne.es/resource/XX1718747

http://d-nb.info/gnd/11851993X

http://geo.linkeddata.es/page/resource/Municipio/Cervantes

Same as

Person rdf:type

rdf:type

Restaurant rdf:type

Street rdf:type

Municipality rdf:type

URI URI

URI URI

URI

1547

Date of Birth

Author

D. Quijote

Cervantes

(Person)


The problem and challenges


Need to access heterogeneous relational

data sources (Geography, Energy, Medicine,

Environment)

Need to submit SPARQL queries into

distributed SPARQL endpoints

• Some of the databases are available in different DBMSs

• Some of the data sources are available as spreadsheets, Word documents, PDFs, etc.

• Furthermore, many of these datasets are already published as Linked Data or in SPARQL endpoints

• Data may be available from data streams (e.g., sensors)

We can use ontologies as

global schemas for our data sources

Oscar Corcho and the OEG-UPM Data Integration

team


Linked Data allows uniform access

1. Agree on vocabularies for

describing metadata and domain

data

2. Unified and standardized language

for describing resources ( RDF(S))

3. Unified and standardized query

language (SPARQL)

4. Standardized non-proprietary APIs

5. Links to other resources


Linked data Technologies @ OEG


Geometry2RDF

shp2RDF

geo REST service annotation

Sem4Tags Marimba NOR2O Morph SPARQL-Stream

Linked Library Data

Visualisation Map4RDF Sensor Data

Visualisation

Visualization

RDF Generation and Linking

LDP4j


Metadata and data Integration

Data Generation Metadata Generation

Public Resources

Producers

Private Resources

Geographical Information REST service

annotation Web 2.0

Library and Cultural Heritage

Diverse Information Sensor

Networks data

Data Integration

Users

Metadata Integration


3. Linked Data Process

Examples from datos.bne.es


Linked Data life cycle

1. Clear methodologies,

methods and tools for

monolingual LD

generation and

publication

Villazón-Terrazas, B.; Vilches, L.; Corcho, O.; Gómez-Pérez, A. Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer, pp. 27-49. 2011

Specification

Modelling

Generation Publication

Exploitation

Linking


Specification

Specification

Modelling

RDF Generation

Publication

Links Generation

Exploitation

Goal

Linked Data generation of the Spanish National Library Metadata

• Source data: MARC 21 records, not a relational database. Their very flat structure is difficult to map to richer models

• Domain experts (catalogers) need to be part of the mapping process.

- Highly specialized library models: FRBR, ISBD.

• Data quality good but still many errors: data curation during the LD generation process

- Iterative and incremental transformation process: measure coverage and progress.

• Multilinguality, collaboration with IFLA


• Identify and analyse the data sources

• Design the URIs

• License and Provenance definition


MARC21

• Different communication formats:

- MARC 21 format for Bibliographic Data

- MARC 21 format for Authority Data

- Others: Holdings, Classification, etc.

• 3.9 million bibliographical records

• 4.2 million authority records


MARC21 record structure

001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)


Subfield Field

Control Field

Content

Subfield Content

• Authority record: Camus, Albert*

HEADING

1XX

* http://datos.bne.es/resource/XX1721208
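The record structure above (control fields, data fields with indicators, and `$`-prefixed subfields) can be illustrated with a small parser for the display form used on this slide. This is a sketch under assumptions: it handles only this textual layout, not the binary MARC 21 exchange format, and the function name and returned dictionary keys are hypothetical.

```python
def parse_field(line):
    """Parse one display-form MARC 21 line into its parts.

    Returns a dict with the field tag, the indicators (if any), the raw
    value for control fields, and a list of (code, content) subfields.
    """
    tag, _, rest = line.strip().partition(" ")
    if tag < "010":
        # Control fields (001-009) carry a single value, no subfields.
        return {"tag": tag, "indicators": "", "value": rest.strip(), "subfields": []}
    # Everything before the first "$" is the indicator pair (may be empty).
    head, *parts = rest.split("$")
    subfields = [(p[0], p[1:].strip()) for p in parts if p]
    return {"tag": tag, "indicators": head.strip(), "value": None, "subfields": subfields}


heading = parse_field("100 10 $a Camus, Albert $d 1913-1960")
control = parse_field("001 XX1721208")
```

Here `heading["subfields"]` holds the name in subfield `a` and the dates in subfield `d`, which is the information the later mapping slides classify and annotate.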


Frequency of codes in records


Modelling: Ontologies and Terminology


Shared

Understanding


Ontology

• Lightweight ontologies:
  o Concepts
  o Organized in taxonomies
  o Properties between concepts
  o Properties for describing concepts

• Shared understanding of a domain of interest

• Ontologies expressed in OWL or RDF(S)

• The NeOn methodology helps to build ontologies


Model: FRBR at a glance

Works

Expressions

Manifestations

Work 1

Work 2

Work 3

Expression1

Expression 2

Manifestation1 Manifestation2


The Ontology: based on IFLA vocabularies


Who will be the mapping generator?

001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)


MARC 21 records

IFLA-based Ontologies


Similar to mapping ontologies


100at Work

property subfield

maps

100t title of work maps

is creator of

Person 100a maps

Content

(100a)

Content

(100at) contained in

maps


Librarians create mappings using Excel


Classification

mapping

Annotation

mapping

Relationships

mapping

MARC21 info   | Records count | Content sample           | Mapping
100 $a $d     | 888.880       | Camus, Albert 1913-1960  | foaf:Person
100 $a        | 999.999       | Cervantes, Miguel de     | foaf:name
100 $a $m     | 10.000        | Cervantes, iguel         | ERROR

Basic structure
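The classification mapping that librarians maintain in the spreadsheet can be thought of as a lookup from a field tag plus its combination of subfield codes to a target class. The sketch below is illustrative only: the table contents are taken from the slides (the `$a $d $t` row comes from the Marimba summary slide), the function name is hypothetical, and flagging unmapped combinations as "ERROR" mirrors the spreadsheet row for `$a $m`.

```python
# (MARC tag, sorted subfield codes) -> target class.
CLASSIFICATION = {
    ("100", "ad"): "foaf:Person",  # name and dates only: a person heading
    ("100", "adt"): "frbr:Work",   # name, dates and title: a work heading
}


def classify(tag, subfield_codes):
    """Return the class mapped to this tag/subfield combination, or 'ERROR'."""
    key = (tag, "".join(sorted(set(subfield_codes))))
    return CLASSIFICATION.get(key, "ERROR")


person = classify("100", ["a", "d"])
unexpected = classify("100", ["a", "m"])
```

Counting how many records fall into each combination, as the "Records count" column does, tells the cataloguers which mappings matter most and which combinations point to data errors worth curating.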

Classification

mapping


Annotation

mapping

Relationships

mapping

Librarians create mappings using Excel

place of publication

has dimensions

Is part of work


Marimba: Mapping process summary

Classify

Annotate

Relate


001 XX1721208
100 10 $a Camus, Albert $d 1913-1960

001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste

# Classify
bne:XX1721208 a frbr:Person .
bne:XX1910518 a frbr:Work .

# Annotate
bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" .
bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" .

# Relate
bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" ;
    frbr:isCreatorOf bne:XX1910518 .
bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" ;
    frbr:isCreatedBy bne:XX1721208 .

(MARC records)

BNE
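The three mapping steps (classify, annotate, relate) can be sketched end to end on the two Camus records. This is not Marimba's actual code: the record dictionaries, the function name and the rule that a work is related to the person sharing the same `$a`/`$d` heading are assumptions made for illustration, and the property names are the ones shown on the slide.

```python
records = [
    {"id": "XX1721208", "100": {"a": "Camus, Albert", "d": "1913-1960"}},
    {"id": "XX1910518", "100": {"a": "Camus, Albert", "d": "1913-1960", "t": "La peste"}},
]


def to_triples(records):
    """Turn parsed authority records into (subject, predicate, object) triples."""
    triples = []
    people = {}  # (name, dates) heading -> person identifier

    # Classify and annotate: a title subfield means the record is a work.
    for record in records:
        field = record["100"]
        subject = "bne:" + record["id"]
        if "t" in field:
            triples.append((subject, "a", "frbr:Work"))
            triples.append((subject, "frbr:title", field["t"]))
        else:
            triples.append((subject, "a", "frbr:Person"))
            triples.append((subject, "frbr:name", field["a"]))
            if "d" in field:
                triples.append((subject, "frbr:hasDates", field["d"]))
            people[(field["a"], field.get("d"))] = subject

    # Relate: link each work to the person with the same heading.
    for record in records:
        field = record["100"]
        person = people.get((field["a"], field.get("d")))
        if "t" in field and person:
            work = "bne:" + record["id"]
            triples.append((person, "frbr:isCreatorOf", work))
            triples.append((work, "frbr:isCreatedBy", person))
    return triples


result = to_triples(records)
```

The relate step is where most of the value appears: the flat MARC heading is repeated in both records, and matching on it is what produces an explicit link between the person and the work.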


Marimba uses the ontology to generate RDF

BNE


• http://marimba4lib.com


Marimba links with other resources:

VIAF, DNB, SUDOC, LIBRIS, DBpedia

BNE

http://datos.bne.es/resource/XX1718747

Same As

Same As

Same As

Same As

Same As

LIBRIS

http://libris.kb.se/resource/auth/45369

SUDOC

http://www.idref.fr/026774771/id

DNB

http://d-nb.info/gnd/11851993X

DBpedia

http://dbpedia.org/resource/Miguel_de_Cervantes

VIAF

http://viaf.org/viaf/17220427


Several IRI/URIs exist for Miguel de Cervantes



Publication

Data publication

Metadata publication using VoID

To facilitate discovery

• Register your dataset in CKAN

• Use sitemap4rdf to generate the sitemap

• Upload the sitemap to Google and Sindice


Exploitation: datos.bne.es

SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
    <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
    ?Obras
}

URI Cervantes

Is author

SPARQL queries

Web Interface
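A programmer would typically send the query on this slide to the endpoint over HTTP, following the SPARQL protocol's `query` parameter convention. The sketch below only builds the request URL and performs no network call; the endpoint address is an assumption, the query is written in standard SPARQL 1.1 form, and `?total` is an illustrative variable name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://datos.bne.es/sparql"  # assumed endpoint location

# Count the works that the Cervantes URI is creator of (property P2010).
QUERY = """SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
    <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
    ?Obras
}"""


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Fetching `url` with an `Accept` header such as `application/sparql-results+json` is the usual next step; it is left out here so that the example stays runnable offline.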


Aggregation of geographical information with library metadata

http://datos.bne.es/autor/XX869875


Locations related to “El Quijote”

Itinerary followed in the

trip

Locations

Route


4. Linguistic Linked Licensed Data

Linked data

Linguistic

Linked Data


Use cases for LR Discovery

• Language metadata content

- Give me bilingual dictionaries in Spanish and Polish that account for grammatical number and gender and have Creative Commons licenses

• Language Resources content

- Give me all occurrences in corpora of the token “bank” disambiguated as the WordNet synset http://wordnet-rdf.princeton.edu/wn31/108437235-n

• Language Services

- Give me all RESTful services that can extract terms from text in Spanish.


Lack of interoperability of Language resources

• Ecosystem of
- Open and closed resources
- Different languages
- Silos of LRs
- Complementary resources
  • Lexicon, Corpora, Dictionaries, Grammars, ...
- Heterogeneous formats
  • E.g., for lexicons: LexInfo, LMF, LIR, Lemon, ...
- Several repositories with different metadata and schemas
- Many APIs and services for querying


http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

http://es.wikipedia.org

http://www.wordreference.com/sinonimos/

An example

“Red”

(computer

network)

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

“Red”

Etymology: from Latin “rete”

Gender: “f”

Definition.: “Conjunto de

ordenadores o de equipos

informáticos conectados entre

sí….”

“Red”

Synonyms: “sistema”, “malla”, “distribución”

“Red”

Norm: UNE 21302-131

English: network

German: Netzwerk

“Red”

Pronunciation: [red]

Grammatical category: sustantivo femenino (feminine noun)

Singular: “red”

Plural: “redes”

“Red_de_computadores”

Category: redes informáticas

Image

“Red”

(computer

network) Complementary but

not connected


LD allows linguistic data integration

Red

Phonetic form

Form

number singular

[RED]

Form

plural

[REDES]

Phonetic form

number

Red

Sense

written form

“red”

Sense

written form

“malla”

equivalent

Red

image

Red

Sense Sense

translation

es - en

written form

“red” “network”

written form

Red

written form

Form

gender

feminine

“red”


Linguistic Linked Licensed Data

3LD Linguistic Linked Licensed Data

Language resources

such as:

- Lexica

- Corpora

- Dictionaries ..

NIF

NLP Interchange Format

Using RDF and standard data

models (vocabularies):

- Lexica

- Corpora

ODRL Open Digital Rights Language

Published along with

a machine-readable license.

www.lider-project.eu


Linguistic Linked Data Evolution

Jan. 2013

2014

Sept. 2014 Sept. 2013

April. 2015


Linguistic Linked Licensed Data @ Nov 2014

LLOD Cloud in November 2014
• 103 resources (+58%)
• 165 links (+101%)
• More balanced (14 corpora, +367%)
• Less centralized: BabelNet, LexVo and LexInfo are new hubs

Criteria for inclusion:
• Resolvable: URLs that resolve
• RDF: resolve to RDF
• 1000 triples: self-explaining
• Links: to one resource from the cloud or 50 other links
• Crawlable: get the whole resource by crawling
• Linguistic: data must be a language resource
• Registered: at CKAN

www.lider-project.eu
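The inclusion criteria for the LLOD cloud listed on this slide read like a checklist, so they can be sketched as a simple validation function. The field names below are hypothetical, and the reading of the "Links" criterion (at least one link to a resource already in the cloud, or at least 50 other links) is an assumption about the slide's wording.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    resolvable: bool     # its URLs resolve
    serves_rdf: bool     # they resolve to RDF
    triples: int         # size of the dataset
    links_to_cloud: int  # links to resources already in the cloud
    other_links: int     # links to any other resources
    crawlable: bool      # the whole resource can be obtained by crawling
    linguistic: bool     # the data is a language resource
    registered: bool     # registered at CKAN


def failed_criteria(d):
    """Return the names of the inclusion criteria that the dataset fails."""
    checks = {
        "resolvable": d.resolvable,
        "rdf": d.serves_rdf,
        "1000 triples": d.triples >= 1000,
        "links": d.links_to_cloud >= 1 or d.other_links >= 50,
        "crawlable": d.crawlable,
        "linguistic": d.linguistic,
        "registered": d.registered,
    }
    return [name for name, ok in checks.items() if not ok]


candidate = Dataset(True, True, 900, 0, 10, True, True, False)
problems = failed_criteria(candidate)
```

Returning the list of failed criteria, rather than a single yes or no, gives a data provider a concrete to-do list before resubmitting.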


Best practices and guidelines (BPMLOD @ W3C)

1. Best practices for Multilingual Linked Data Publication (BPMLOD @ W3C)

- Practices for Naming (URIs)

- Practices for Dereferencing

- Practices for Textual Information

- Practices for Linking

- Practices for Language Identification

2. Guidelines for Linguistic Linked Licensed Data

- Wordnets,

- Multilingual Lexicographic resources

- Bilingual Dictionaries

- Terminologies in TBX

- NIF-based NLP Web services

How many Linguistic

Resources are exposed in

RDF?

www.lider-project.eu


How data and Linguistic LD are related

How many Linguistic Resources are exposed in

RDF?

LOD

Is Linguistic LD just

another type of

dataset to be

exposed in RDF?

Is the role of Linguistic

LD to extend any

dataset with lexical

entries?

LLD



Linguistic Linked Licensed Data

How do we represent license information?


Linked Data and Linguistic Linked Data

1. Agree on vocabularies for

describing

• Domain vocabularies

• LR metadata and content (Lemon-

Ontolex, NIF, …)

2. Unified and standardized language

for describing resources ( RDF(S))

3. Unified and standardized query

language (SPARQL)

4. Standardized non-proprietary APIs

5. Links to other resources

Linguistic LD

www.lider-project.eu


5. Linked Data is multilingual

Linked data

Linguistic

Linked Data

Multilingual

Linked Data


Rationale: LOD is dominated by the English

Language

Some questions:

1. Distribution of natural languages across RDF

datasets?

2. Usage of language tags to indicate the natural language of RDF literals?

1. Distribution of usage of language tags

2. Distribution of literals tagged as English vs other languages

3. Distribution of literals tagged in languages other than

English


2007 2009 2014


The multilingual LOD: Current state*


[Pie charts]

9%
91%
RDF literals with lang tag
RDF literals without lang tag

7%
93%
RDF literals with lang tag
RDF literals without lang tag

67%
33%
RDF literals English
RDF literals other than English

71%
29%
RDF literals English
RDF literals other than English

JAN 2015

JAN 2014

* Corpus used: swse.deri.org/dyldo/


The multilingual LOD: Current state*

[Bar chart: language tags es, de, zh, fr, it, ru, pl, nl, pt, sv on the x-axis; counts from 0 to 70,000 on the y-axis; series jan2014 and jan2015]

Evolution of top 10 most used language tags in languages other than English

* See statistics for 2012 in the paper “Guidelines for Multilingual Linked Data”, Gómez-Pérez 2013


Messages to take home

1. Data providers should include language metadata in their datasets

• in the original data sources (e.g., MARC21 records)

• language tags in RDF literals (e.g., @es, @pl)

• language URIs in the VOID or DCAT descriptions

2. Guidelines and best practices needed to help language metadata generation,

linking and consumption

3. Benefits of adding language information to LD datasets

• Reduce the time and cost of identifying language in resources and

terminology

• Foster the aggregation and enrichment of data across complementary

resources

• Enhances data curation

• Improves precision and recall in information retrieval and search
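The language-tag statistics discussed in this section can be approximated with a short script over N-Triples data. This is a minimal sketch, not the method used for the figures in the talk: it assumes one triple per line, uses a simplified regular expression rather than a full N-Triples parser, and the sample labels are illustrative.

```python
import re
from collections import Counter

# A literal object at the end of a line: "..." optionally followed by a
# language tag (@es) or a datatype (^^<...>), then the closing dot.
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<[^>]*>)?\s*\.\s*$'
)


def language_tag_stats(ntriples_lines):
    """Count literals per language tag and literals without any tag."""
    tagged = Counter()
    untagged = 0
    for line in ntriples_lines:
        match = LITERAL.search(line)
        if not match:
            continue  # the object is a URI or blank node, not a literal
        if match.group(1):
            tagged[match.group(1).lower()] += 1
        else:
            untagged += 1
    return tagged, untagged


LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
sample = [
    f'<http://example.org/w> {LABEL} "El Quijote"@es .',
    f'<http://example.org/w> {LABEL} "Don Kichot"@pl .',
    f'<http://example.org/w> {LABEL} "Don Quixote" .',
    "<http://example.org/w> <http://example.org/p> <http://example.org/o> .",
]
tagged, untagged = language_tag_stats(sample)
```

A provider can run such a check before publication: a high `untagged` count is a sign that language metadata from the source records did not make it into the RDF.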

Publishing Linked Data on the Web: The Multilingual Dimension

Daniel Vila-Suero, Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Jorge Gracia, Guadalupe Aguado-de-Cea

http://link.springer.com/chapter/10.1007/978-3-662-43585-4_7


6. Uses


Linked Data Applications


Ontology Engineering Group

Culture (@BNE) Geographical (@IGN) Meteorological (@AEMET)

News and Media (@ Prisa, RTVE) Internet of Things ( @ CRTM, Bike sharing system)

Smart Cities and Open Data (@ Zaragoza, Gob Aragón, Jacathon, Catalogues)

Host of esDBpedia


Uses of Linked Data

1. Programmers build applications that make queries in SPARQL and get RDF

2. Citizens/users access LD through a user interface (they do not see RDF)

3. Machine-to-machine data exchange and semantic interoperability in RDF

Data sources: Culture (@BNE), Geographical (@IGN), Meteorological (@AEMET), Smart Cities


The new Linked Data Ecosystem

Culture

(@BNE)

Geographical

(@IGN)

Meteorological

(@AEMET)

Smart Cities


Thanks for your attention !
