datos.bne.es: Publishing and consuming

Daniel Vila Suero
Ontology Engineering Group, Universidad Politécnica de Madrid

Acknowledgements: OEG members, BNE staff (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)



DESCRIPTION

A presentation by Daniel Vila Suero of the Ontology Engineering Group at the Universidad Politécnica de Madrid, delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference, which took place on Friday 21 September 2012 at the Edinburgh Centre for Carbon Innovation.



Background

• An initiative of the Biblioteca Nacional de España (BNE) together with OEG-UPM (Madrid).

• Multidisciplinary effort: librarians, computer scientists, linguists, etc.

• Close collaboration between library experts and computer scientists.

• Initiated as a small-scale proof of concept, the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, DC, RDA, etc.).


Main goals

• Perform the transformation incrementally and iteratively

• Develop a system where library experts can define and assess the mappings to RDF independently of IT staff

• Be vocabulary-agnostic (BNE uses FRBR as its core model, but the system would also allow using, for example, RDA)

• Have a clear picture of the source data before starting the transformation (this helps detect possible deficiencies in the source data)
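Getting that picture can start with simple profiling of the MARC records: how often each field/subfield combination occurs, and how many records lack a field one would expect. Below is a minimal Python sketch. It assumes the records have already been parsed into a hypothetical in-memory structure, a list of `(tag, subfields)` pairs per record; a real pipeline would read MARC with a dedicated library. The sample headings are illustrative only.

```python
from collections import Counter

# Hypothetical parsed form: each record is a list of (tag, subfields) pairs,
# where subfields maps subfield codes to values.
Record = list[tuple[str, dict[str, str]]]


def profile(records: list[Record]) -> Counter:
    """Count how often each tag$subfield combination occurs across all records."""
    counts: Counter = Counter()
    for record in records:
        for tag, subfields in record:
            for code in subfields:
                counts[f"{tag}${code}"] += 1
    return counts


def records_missing(records: list[Record], tag: str) -> int:
    """Count records with no occurrence of the given tag (a possible deficiency)."""
    return sum(1 for record in records if all(t != tag for t, _ in record))


# Illustrative authority records, not actual BNE data.
records = [
    [("100", {"a": "Cervantes Saavedra, Miguel de", "d": "1547-1616"})],
    [("100", {"a": "Vega, Lope de", "t": "Fuenteovejuna"})],
    [("110", {"a": "Biblioteca Nacional (España)"})],
]
for combo, n in profile(records).most_common():
    print(combo, n)
print("records without 100:", records_missing(records, "100"))
```

A profile like this could also restrict the mapping spreadsheets to field combinations that actually occur in the database.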


Source MARC records

Authority records:

• Persons
• Corporate bodies
• Conferences
• Titles
• Subjects

Bibliographic records:

• Maps: 76,576
• Sound recordings: 320,727
• Engravings, drawings, pictures: 166,017
• Manuscripts: 35,770
• Ancient books: 143,959
• Modern books: 2,696,560
• Scores: 178,473
• Electronic resources: 3,021
• Serials: 156,634
• Videos: 96,672


Some figures

• Total number of authority records: 4,100,000
• Total number of bibliographic records: 2,390,140
• Total number of RDF triples: 58,053,215
• Number of links (15% of authorities): 587,520
• Linked sources:
  - VIAF
  - SUDOC (French Collective University Catalogue), France
  - GND (German National Library authorities), Germany
  - LIBRIS, Sweden
  - DBpedia
  - Coming soon: BnF, BNB, German Bibliography


Some statistics

Number of resources per class:

• Manifestation: 2,390,103
• Work: 1,969,526
• Person: 1,163,764
• Expression: 1,114,719
• Thema: 497,644
• Corporate Body: 282,879


Some statistics

Number of relationships by type:

• is creator (person) of: 2,129,222
• is created by (person): 2,129,222
• is embodiment of: 1,246,773
• is embodied in: 1,246,773
• is realized through: 1,054,736
• is realization of: 1,054,736
• is part (work) of: 85,347
• has part (work): 85,347
• is subordinate of: 78,561
• is created by (corporate body): 16,462
• is creator (corporate body) of: 16,462
• is part (expression) of: 755
• has part (expression): 755


Publishing


Our data model


Transformation process

• How can the mapping process be made easier for library experts?
  1. Use a familiar and intuitive interface: spreadsheets
  2. Work only on what's in the database: pre-process the records to build the spreadsheets

• A three-step process, using three different spreadsheets:
  1. Classification: is it a Person? A Work? A Manifestation?
  2. Annotation: name, birth date, title, language of expression
  3. Relation: find relationships between entities (e.g., a Person is creator of a certain Work)
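The three steps can be illustrated with a toy pipeline. This is a hedged sketch, not the BNE implementation. The classification rule (an authority heading in field 100 with a title subfield $t describes a Work, otherwise a Person; field 110 describes a corporate body) is a simplification of common MARC 21 authority practice. The property names, the relationship name, the rule table and the base URI are hypothetical stand-ins for what library experts would define in the spreadsheets.

```python
# Hypothetical base URI for the resources this sketch mints.
BASE = "http://example.org/resource/"

# Simplified record shape: tag -> {subfield code -> value}.
Fields = dict[str, dict[str, str]]

# Annotation rules as they might appear in spreadsheet rows:
# (class, tag, subfield) -> property name. All names are illustrative.
ANNOTATION_RULES = {
    ("Person", "100", "a"): "name",
    ("Person", "100", "d"): "dates",
    ("Work", "100", "t"): "title",
    ("CorporateBody", "110", "a"): "name",
}


def classify(fields: Fields) -> str:
    """Step 1 (classification): which entity does an authority record describe?"""
    if "100" in fields:
        # A personal-name heading with a title subfield ($t) is treated as a Work.
        return "Work" if "t" in fields["100"] else "Person"
    if "110" in fields:
        return "CorporateBody"
    return "Unclassified"


def annotate(record_id: str, fields: Fields) -> list[tuple[str, str, str]]:
    """Step 2 (annotation): emit (subject, property, value) triples from the rules."""
    entity = classify(fields)
    subject = BASE + record_id
    triples = [(subject, "type", entity)]
    for (cls, tag, code), prop in ANNOTATION_RULES.items():
        if cls == entity and code in fields.get(tag, {}):
            triples.append((subject, prop, fields[tag][code]))
    return triples


def relate(records: dict[str, Fields]) -> list[tuple[str, str, str]]:
    """Step 3 (relation): link a Person to each Work whose heading has the same $a."""
    persons = {
        f["100"]["a"]: rid
        for rid, f in records.items()
        if classify(f) == "Person" and "a" in f["100"]
    }
    links = []
    for rid, f in records.items():
        if classify(f) == "Work" and f["100"].get("a") in persons:
            links.append((BASE + persons[f["100"]["a"]], "isCreatorOf", BASE + rid))
    return links


# Illustrative records with hypothetical IDs.
records = {
    "rec1": {"100": {"a": "Cervantes Saavedra, Miguel de", "d": "1547-1616"}},
    "rec2": {"100": {"a": "Cervantes Saavedra, Miguel de", "t": "Don Quijote de la Mancha"}},
}
for rid, fields in records.items():
    print(annotate(rid, fields))
print(relate(records))
```

Keeping the rules in a table (`ANNOTATION_RULES`) rather than in code follows the stated goal of letting library experts edit mappings without IT involvement.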


Mapping process

Open mappings at: http://bne.linkeddata.es/mapping-marc21


Still a lot of work to do

• Only the core FRBR relationships are covered

• A significant number of manifestations are not linked to their expressions; we are currently looking at more sophisticated clustering techniques

• Manifestations are not yet linked to the corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links

• The classification step can be further automated
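A simple baseline for grouping unlinked manifestations (not the more sophisticated techniques the slide refers to) is to cluster them on a normalized key of creator, title and language. A cataloguer or a better method would then confirm each cluster as one expression. The field names, normalization rules and sample records below are assumptions.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks.lower())
    return " ".join(cleaned.split())


def cluster(manifestations: list[dict[str, str]]) -> dict[tuple[str, str, str], list[str]]:
    """Group manifestation IDs that share the same (creator, title, language) key."""
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for m in manifestations:
        key = (normalize(m["creator"]), normalize(m["title"]), m["language"])
        groups[key].append(m["id"])
    return dict(groups)


# Illustrative records; the field names are assumptions.
manifestations = [
    {"id": "m1", "creator": "Cervantes Saavedra, Miguel de",
     "title": "El ingenioso hidalgo Don Quijote de la Mancha", "language": "spa"},
    {"id": "m2", "creator": "Cervantes Saavedra, Miguel de",
     "title": "El ingenioso hidalgo don Quijote de la Mancha.", "language": "spa"},
    {"id": "m3", "creator": "Cervantes Saavedra, Miguel de",
     "title": "The ingenious gentleman Don Quixote of La Mancha", "language": "eng"},
]
for key, ids in cluster(manifestations).items():
    print(key, ids)
```

Including the language in the key keeps translations apart, since in FRBR a translation is a different expression of the same work. Exact-key matching will still miss variant titles, which is where fuzzier techniques would come in.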


Consuming


Perspectives

• Two different perspectives:
  - Systems and applications:
    • SPARQL endpoint
    • Linked Data API
  - End-user interfaces

• An interesting side effect:
  - By applying FRBR and RDF mappings, we can (and did) improve the catalogue

• Using standard web technologies and more intuitive models, we open the door to:
  - Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
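On the systems side, a SPARQL endpoint is queried over plain HTTP following the SPARQL 1.1 Protocol: a GET request carries the query in a `query` parameter, and an `Accept` header chooses the result format. In this sketch the endpoint URL is a placeholder, since the slides do not give one. The request is only built, not sent, so the sketch runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption); substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Count resources per class, similar in spirit to the per-class statistics.
QUERY = """SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# Sending it (requires network access):
#   import json, urllib.request
#   results = json.load(urllib.request.urlopen(req))
#   for row in results["results"]["bindings"]:
#       print(row["class"]["value"], row["n"]["value"])
```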


Graph analysis example

Using open-source tools, for example Gephi:

http://bne.linkeddata.es/graphvis
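Gephi can import a CSV edge list whose columns are named `Source` and `Target`. Below is a minimal sketch that writes such a file and keeps each predicate in a `Label` column. It assumes the graph has already been extracted as (subject, predicate, object) triples. The prefixed identifiers and relationship names are illustrative, not actual datos.bne.es URIs.

```python
import csv
import io

# Illustrative triples (hypothetical identifiers and relationship names).
triples = [
    ("ex:cervantes", "isCreatorOf", "ex:quijote"),
    ("ex:quijote", "isRealizedThrough", "ex:quijote-spa"),
    ("ex:quijote-spa", "isEmbodiedIn", "ex:manifestation-1605"),
]


def to_gephi_csv(triples, out) -> None:
    """Write an edge list (Source, Target, Label) for Gephi's spreadsheet import."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Label"])
    for subject, predicate, obj in triples:
        writer.writerow([subject, obj, predicate])


buffer = io.StringIO()
to_gephi_csv(triples, buffer)
print(buffer.getvalue())
```

To produce a file instead of a string, pass `open("edges.csv", "w", newline="")` as `out`.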


Enabling access to systems and apps

Linked Data API: http://datos.bne.es/frontend/persons


Flexible access to data

Out of the box:

• Search by every field
• Access clusters of resources
• Filtering
• Paging
• Multiple output formats: XML, Turtle, JSON
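Filtering, paging and format selection are usually expressed in the request URL. The sketch below assumes conventions from the Linked Data API specification: property filters as query parameters, `_page` and `_pageSize` for paging, and a format suffix such as `.json` or `.ttl`. Whether datos.bne.es uses exactly these names is an assumption, and so is the `name` filter.

```python
from urllib.parse import urlencode

# List endpoint from the slides; the parameter conventions below are assumptions.
PERSONS = "http://datos.bne.es/frontend/persons"


def list_url(base: str, fmt: str = "json", page: int = 0,
             page_size: int = 10, **filters: str) -> str:
    """Build a list URL: format suffix, property filters, then paging parameters."""
    params = dict(filters)              # e.g. name="Cervantes" (hypothetical filter)
    params["_page"] = str(page)         # page numbering convention is assumed
    params["_pageSize"] = str(page_size)
    return f"{base}.{fmt}?{urlencode(params)}"


print(list_url(PERSONS, fmt="ttl", page=2, page_size=20, name="Cervantes"))
```

If the service relies on content negotiation instead, the same choice could be made with an `Accept` header such as `text/turtle` rather than a suffix.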


Different views on the data

HTML

XML


End-user interfaces

Current linked data opens the door to:

• Re-ranking OPAC results
• Better clustering of results
• Recommendation
• Enriching the data with other sources
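As an illustration of clustering and re-ranking, OPAC hits (manifestations) can be grouped under the work they embody, with the groups ordered by how many manifestations each work has. That count serves here as a simple popularity signal. This is a hypothetical sketch: the `work_of` mapping stands in for following the "is embodiment of" and "is realization of" links, and the counts are toy values.

```python
from collections import defaultdict


def cluster_and_rank(hits, work_of, manifestation_count):
    """Group hit manifestations by work; works with more manifestations come first."""
    groups = defaultdict(list)
    for m in hits:
        groups[work_of[m]].append(m)
    # sorted() is stable, so works with equal counts keep their first-hit order.
    return sorted(groups.items(),
                  key=lambda item: manifestation_count[item[0]],
                  reverse=True)


hits = ["m1", "m2", "m3", "m4"]  # manifestation IDs in original OPAC order
work_of = {"m1": "w:novelas-ejemplares", "m2": "w:quijote",
           "m3": "w:quijote", "m4": "w:novelas-ejemplares"}
manifestation_count = {"w:quijote": 12, "w:novelas-ejemplares": 3}  # toy values

for work, manifestations in cluster_and_rank(hits, work_of, manifestation_count):
    print(work, manifestations)
```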