Multilingual Access to Online Content - the Europeana Experience Vivien Petras (Humboldt-Universität zu Berlin) With the help of many people involved in Europeana (referenced in the slides) Eurovoc Conference, 18-19 November 2010

www.europeanaconnect.eu

Multilingual Access to Online Content - the Europeana Experience

Vivien Petras (Humboldt-Universität zu Berlin)

With the help of many people involved in Europeana (referenced in the slides)

Eurovoc Conference, 18-19 November 2010

Outline

• Europeana – a brief introduction

• Multilingual access to Europeana – approaches

• Europeana Semantic Data Layer

• Multilingual Alignments of Vocabularies

• Semantic Search Engine Prototype

Europeana

“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”

European Parliament, 27 September 2007

Europeana Today

• 13 million objects

• 28 data aggregators

• 1500 participating institutions

• 200 partners

• 35 FTEs

• 21 projects

• 1 million visits in 2010

• 30,000 My Europeana signees

• 2008: Prototype

• 2010: Operational Service

  • Stable portal
  • Open Source Code
  • EuropeanaLabs
  • Public Domain Charter

From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam

Europeana Contributions by Country

From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam

Different languages!(?)

Europeana Content Types

Books, Articles, Postcards, Folklore objects, Photography, Art

Goethe, Johann Wolfgang von
  Title: Goethe, Johann Wolfgang von
  Date: unknown
  Creator: Goethe, Johann Wolfgang von
  Description: Goethe, Johann Wolfgang von
  Language: de-DE
  Format: image/jpeg
  Source: SLUB/Deutsche Fotothek
  Rights: Deutsche Fotothek
  Provider: Deutsche Fotothek; Germany
  Identifier: http://www.deutschefotothek.de/obj70226592.html
  Subject: Bildnis; Bildniskatalog; Foto; Fotos; Portrait
  Type: image

Multilingual Access to Europeana

• Interface
  • static pages

• Search
  • query translation
  • (document translation)

• Subject Browse (& Search)
  • Controlled vocabularies
  • Semantic Data Layer

French, English, Spanish, German, Italian, Polish, Dutch, Portuguese, Hungarian, Swedish

Europeana Semantic Data Layer

Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

Europeana Semantic Data Layer

Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

library

archive

museum

Bridging “isles of information” by connecting objects from different domains via cross-vocabulary links.

Semantic Data Layer Alignment Example

Irish vocabulary

From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam

Norwegian vocabulary

SKOS Mapping

skos:exactMatch
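
A minimal sketch of how such an alignment could be written down as data, assuming each vocabulary term has a URI. The two concept URIs are hypothetical placeholders, not real Irish or Norwegian vocabulary identifiers; only the SKOS property URI is real. Because skos:exactMatch is symmetric, the sketch emits the statement in both directions as N-Triples lines.

```python
# URI of the SKOS mapping property used on the slide.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def exact_match_triples(mappings):
    """Return sorted N-Triples lines for each mapping, in both directions.

    mappings: iterable of (concept_uri, concept_uri) pairs.
    skos:exactMatch is symmetric, so each pair yields two statements.
    """
    lines = set()
    for left, right in mappings:
        lines.add(f"<{left}> <{SKOS_EXACT_MATCH}> <{right}> .")
        lines.add(f"<{right}> <{SKOS_EXACT_MATCH}> <{left}> .")
    return sorted(lines)


# Hypothetical concept URIs standing in for one Irish and one Norwegian term.
mappings = [
    ("http://example.org/irish-vocab/concept/1",
     "http://example.org/norwegian-vocab/concept/7"),
]

for line in exact_match_triples(mappings):
    print(line)
```

A real conversion would more likely use an RDF library and the vocabularies' own URIs; plain strings are used here only to keep the example self-contained.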

Multilingual Alignment: Approach

• Identify and convert relevant semantic resources
  • Pivot vocabularies for relevant categories (subject, persons, places…) = multilingual and with wide coverage
  • E.g. UDC, DDC, VIAF, TGN, Geonames, Wordnets, DBpedia

From: Isaac, Antoine; Schreiber, Guus (2010). Vrije Universiteit Amsterdam Approach to Multilingual Mapping of Vocabularies.

Multilingual Alignment: Approach

• Align more specific vocabularies to the pivots = anchoring mappings

• Finding instances of skos:exactMatch mappings

• Vocabulary characteristics important for matching:
  • Lexical variance of labels (e.g. plural/singular, diacritics, multilinguality)
  • Preferred / alternative labels
  • Nature of hierarchy

From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.
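
To illustrate why lexical variance matters, here is a rough sketch of label-based matching that normalizes case, diacritics and a naive English plural before comparing preferred and alternative labels. It is an assumption-laden illustration, not the methodology of the cited milestone: the function names, the concept identifiers and the plural rule are all hypothetical, and the plural rule is deliberately crude and English-only.

```python
import unicodedata


def normalize_label(label):
    """Lowercase, strip diacritics, collapse spaces, drop a trailing plural 's'."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = " ".join(stripped.lower().split())
    # Naive English-only plural handling (assumption): "portraits" -> "portrait".
    if len(text) > 3 and text.endswith("s") and not text.endswith("ss"):
        text = text[:-1]
    return text


def lexical_matches(source, target):
    """Return candidate (source_concept, target_concept) pairs.

    source / target: dict mapping a concept id to its list of labels
    (preferred and alternative labels together).
    """
    index = {}
    for concept, labels in target.items():
        for label in labels:
            index.setdefault(normalize_label(label), set()).add(concept)
    candidates = set()
    for concept, labels in source.items():
        for label in labels:
            for match in index.get(normalize_label(label), ()):
                candidates.add((concept, match))
    return sorted(candidates)


# Hypothetical concept ids and labels.
source = {"a:1": ["Portraits"], "a:2": ["Châteaux"]}
target = {"b:9": ["portrait"], "b:4": ["chateaux", "castles"]}
print(lexical_matches(source, target))
```

Matching on normalized labels produces candidates only; labels that coincide across vocabularies can still denote different concepts, which is why a filtering step is needed afterwards.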

Multilingual Alignment: Approach

• Methodology:
  • Conversion to SKOS/RDF
  • Application of different alignment methods:
    • Lexical matching
    • Structure-based matching
    • Instance-based matching
  • Filtering / disambiguation of matching candidates:
    • Analyzing children / parent matches
  • Combining alignments

From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.
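
The filtering step can be pictured with a small sketch: when one source concept has several candidate targets, keep the candidate whose parent is also matched to the source concept's parent. This is only one possible reading of "analyzing children / parent matches"; the function name, data layout and concept ids below are hypothetical and not taken from the cited milestone.

```python
def filter_by_parent(candidates, source_parent, target_parent):
    """Disambiguate candidate mappings using matches between parent concepts.

    candidates: set of (source_concept, target_concept) pairs.
    source_parent / target_parent: dict mapping a concept to its parent
    (top-level concepts are simply absent from the dict).
    Unambiguous candidates are kept as they are. Ambiguous ones are kept only
    if their parents are themselves a candidate pair; otherwise they are dropped.
    """
    by_source = {}
    for src, tgt in candidates:
        by_source.setdefault(src, []).append(tgt)

    kept = set()
    for src, targets in by_source.items():
        if len(targets) == 1:
            kept.add((src, targets[0]))
            continue
        for tgt in targets:
            parent_pair = (source_parent.get(src), target_parent.get(tgt))
            if parent_pair in candidates:
                kept.add((src, tgt))
    return kept


# Hypothetical example: "bank" is ambiguous in the target vocabulary.
candidates = {
    ("s:bank", "t:bank_river"),
    ("s:bank", "t:bank_finance"),
    ("s:finance", "t:finance"),
}
source_parent = {"s:bank": "s:finance"}
target_parent = {"t:bank_finance": "t:finance", "t:bank_river": "t:landscape"}

print(sorted(filter_by_parent(candidates, source_parent, target_parent)))
```

Dropping unsupported ambiguous candidates favours precision over recall; a production pipeline might instead pass them on for manual evaluation.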

VUA Vocabulary Alignment Tool Amalgame

• AMsterdam ALignment GenerAtion MEtatool

• Uses EDOAL (Expressive and Declarative Ontology Alignment Language) or SKOS

• Also provides pre- / post-mapping statistics and an evaluation tool

From: EuropeanaConnect Milestone 1.2.2 (2010). Semantics of descriptions aligned (intermediary).

VUA Vocabulary Alignment Tool Amalgame

• Skosified: en, fr, de, nl, hu

• Mappings (>500,000): en, fr, nl

• Mostly label matches

http://semanticweb.cs.vu.nl/beta/amalgame/list_alignments

Europeana Semantic Search Engine

http://eculture.cs.vu.nl/europeana/session/search

Europeana Semantic Search Engine

Disambiguation of search terms

Europeana Semantic Search Engine

Multilingual query expansion
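
One way to picture multilingual query expansion over aligned vocabularies is sketched below: find the concepts whose label equals the query term, follow exact-match links one step, and collect the labels of all reached concepts per language. This is a simplified assumption about the general technique, not a description of the prototype's implementation; the data layout and concept ids are hypothetical.

```python
def expand_query(term, concepts, exact_match):
    """Expand a query term with labels from matching and aligned concepts.

    concepts: dict mapping concept id -> {language code: label}.
    exact_match: iterable of (concept id, concept id) alignment pairs.
    Returns a dict mapping language code -> set of labels.
    """
    needle = term.casefold()
    hits = {
        cid for cid, labels in concepts.items()
        if any(label.casefold() == needle for label in labels.values())
    }

    # Follow alignment links one step, in either direction.
    linked = set(hits)
    for left, right in exact_match:
        if left in hits:
            linked.add(right)
        if right in hits:
            linked.add(left)

    expanded = {}
    for cid in linked:
        for lang, label in concepts.get(cid, {}).items():
            expanded.setdefault(lang, set()).add(label)
    return expanded


# Hypothetical concepts from two vocabularies and one alignment between them.
concepts = {
    "v1:portrait": {"en": "portrait", "fr": "portrait"},
    "v2:bildnis": {"de": "Bildnis"},
}
exact_match = [("v1:portrait", "v2:bildnis")]

print(expand_query("Bildnis", concepts, exact_match))
```

A German query term thus also retrieves objects described with the English or French label, provided the two vocabularies have been aligned.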

Europeana Semantic Search Engine

• Works created by matching person

• Works related to matching person

• Works created by a teacher of matching person

• Works related to an artefact created by matching person

• Works created by an artist professionally related to matching person

• Works titled

• Works showing concept

• Works with matching Location

• ….

Clustering of search results
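
The grouping shown on this slide can be sketched as clustering results by the kind of relation that connects each work to the query match. The relation names below mirror the bullets above, while the result tuples and work ids are hypothetical; how the prototype actually derives these relations is not covered here.

```python
from collections import defaultdict


def cluster_results(results):
    """Group works by the relation linking them to the matched entity.

    results: iterable of (work_id, relation) pairs.
    Returns a dict mapping relation -> list of work ids, in input order.
    """
    clusters = defaultdict(list)
    for work_id, relation in results:
        clusters[relation].append(work_id)
    return dict(clusters)


# Hypothetical search results for a person query.
results = [
    ("work:1", "created by matching person"),
    ("work:2", "related to matching person"),
    ("work:3", "created by matching person"),
    ("work:4", "created by a teacher of matching person"),
]

for relation, works in cluster_results(results).items():
    print(f"Works {relation}: {works}")
```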

Next Steps

• Adding more vocabularies from the content providers:
  • VIAF
  • Spanish and Polish subject heading lists

• Switching metadata delivery to Europeana Data Model (EDM) format (2011)

• And: linking with the cloud…

Europeana & Linked Open Data

Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

Information Spaces
  • DBpedia
  • PND and SWD (prototype)
  • Geonames
  • LCSH
  • …

www.europeana.eu

Thank you.