Antoine Isaac Europeana – VU University Amsterdam

Antoine Isaac

Europeana – VU University Amsterdam

Dagstuhl Multilingual Semantic Web seminar

Europeana

24 M objects (images, text, sound and video)

From over 2,200 libraries, museums and archives

From 33 countries

For everyone

“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”

European Parliament

Multilingual Access in Europeana

Dimensions of multilingual access

Interface

Search (query translation or document translation)

Result presentation

Browsing

Europeana's efforts

Interface translated into 26 languages

Query translation: prototype only

Query result filtering by country/language

Document translation (user enabled)

Semantic contextualization of objects

• Multilingual enrichment/annotation of metadata

Making metadata work for multilingual access

Current metadata in Europeana

Simple object records

Flat (text values)

Without language tags!

Only language-related info on metadata is at collection level

• Can be "mul"
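A minimal sketch of what this means for anyone consuming such records. It assumes a hypothetical flat record and a hypothetical helper, `effective_language`; neither is part of any Europeana API. With no value-level tag, the only available hint is the collection-level one, and "mul" (the ISO 639-2 code for multiple languages) says nothing about an individual value.

```python
from typing import Optional

# ISO 639-2 code for "multiple languages"; it identifies no single language.
MULTIPLE = "mul"


def effective_language(value_lang: Optional[str],
                       collection_lang: Optional[str]) -> Optional[str]:
    """Return the best available language for one metadata value, or None if unknown."""
    if value_lang:
        # A value-level tag, when present, is the most specific information.
        return value_lang
    if collection_lang and collection_lang != MULTIPLE:
        # Fall back to the collection-level hint.
        return collection_lang
    # "mul" or no information at all: the language of this value is unknown.
    return None


# Hypothetical flat record: plain text values, no language tags.
record = {"title": "Tafelklavier", "collection_language": "mul"}
print(effective_language(None, record["collection_language"]))
```

In this sketch a search or translation component receiving `None` would have to guess the language or skip language-specific processing, which is the gap the slide points at.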

Need to change!

A new Europeana Data Model (EDM)

"Semantic layer" of contextual resources (concepts, persons, places, events...)

Networked objects

Cultural artefact

Painting, Sculpture, Building

Exploiting semantic relations, e.g. “broader concept”, “place of birth”, “involved person”…

Multilingual metadata

Fetching already available linked data

http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset/

E.g., from libraries

Interoperability

Encouraging the use of RDF + common and simple elements

<skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2308">
  <skos:prefLabel xml:lang="fr">Piano carré</skos:prefLabel>
  <skos:prefLabel xml:lang="it">Pianoforte a tavolino</skos:prefLabel>
  <skos:prefLabel xml:lang="en">Square pianoforte</skos:prefLabel>
  <skos:prefLabel xml:lang="de">Tafelklavier</skos:prefLabel>
  <skos:prefLabel xml:lang="nl">Tafelpiano</skos:prefLabel>
  <skos:prefLabel xml:lang="sv">Taffel</skos:prefLabel>
  <skos:broader>
    <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2273">
      <skos:prefLabel xml:lang="en">Pianofortes</skos:prefLabel>
    </skos:Concept>
  </skos:broader>
</skos:Concept>
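A minimal sketch of how a consumer might read the language-tagged labels from such a record, using only Python's standard library. The `rdf:RDF` wrapper with namespace declarations is an assumption added so the fragment parses on its own, and the helper names `pref_labels` and `label_for` are hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The fragment from the slide, wrapped in an rdf:RDF root (assumed wrapper).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2308">
    <skos:prefLabel xml:lang="fr">Piano carré</skos:prefLabel>
    <skos:prefLabel xml:lang="it">Pianoforte a tavolino</skos:prefLabel>
    <skos:prefLabel xml:lang="en">Square pianoforte</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Tafelklavier</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">Tafelpiano</skos:prefLabel>
    <skos:prefLabel xml:lang="sv">Taffel</skos:prefLabel>
    <skos:broader>
      <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2273">
        <skos:prefLabel xml:lang="en">Pianofortes</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(concept):
    """Map language tag -> preferred label, using the concept's direct children only."""
    return {el.get(XML_LANG): el.text
            for el in concept.findall(f"{{{SKOS}}}prefLabel")}


def label_for(concept, lang, fallback="en"):
    """Return the label in `lang`, or the fallback-language label if it is missing."""
    labels = pref_labels(concept)
    return labels.get(lang) or labels.get(fallback)


root = ET.fromstring(DOC)
concept = root.find(f"{{{SKOS}}}Concept")
broader = concept.find(f"{{{SKOS}}}broader/{{{SKOS}}}Concept")

print(label_for(concept, "de"))   # label shown to a German-speaking user
print(label_for(concept, "es"))   # no Spanish label, so the English one is used
print(label_for(broader, "en"))   # label of the broader concept
```

The English fallback is only one possible design choice; a portal could equally fall back to the language of the providing institution.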

Interoperability

mixed nature of eligible contextual resources: dictionaries, synonym/translation lists, thesauri, authority lists, gazetteers…

interplay: “semantic” data next to multilingual data

Simultaneous approaches

Getting richer semantic/multilingual metadata from providers

Fetching third-party contextual data and linking it to “un-contextualized” objects

Linking contextual data from an institution to another more general / more commonly used contextual dataset

• DBpedia.org, VIAF.org…

Status and challenges

Current status

All this is work in progress and will take time

R&D prototypes (EuropeanaConnect) showing the challenges of gathering appropriate multilingual tools and data

First tests of simple techniques in production portal: GeoNames (places) and GEMET (concepts)

Encouraging, but they illustrate the issues with overly naïve approaches (no NLP) and incomplete data

• Cheval

• Poison
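The slide does not spell out what goes wrong with these examples. The sketch below shows one plausible failure mode of matching without NLP, under stated assumptions: a toy gazetteer invented for illustration (not GeoNames data) and a hypothetical function `naive_place_matches` that tags any token equal to a place label, ignoring language and context.

```python
import re

# Toy gazetteer invented for illustration: lower-cased label -> made-up identifier.
TOY_PLACES = {"cheval": "place:0001", "paris": "place:0002"}


def naive_place_matches(text):
    """Tag every token that equals a place label, ignoring language and context."""
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, TOY_PLACES[tok]) for tok in tokens if tok in TOY_PLACES]


# A French title meaning "draft horse": the common noun is tagged as a place.
print(naive_place_matches("Cheval de trait"))
```

Since the records carry no language tags, a matcher like this one cannot tell a French common noun from a place name that happens to share its spelling.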

http://www.europeana.eu

Problems & requirements

For providers & Europeana

Continue work on metadata

Benchmarking (cf. CHiC lab @ CLEF)

Positioning as consumers and contributors of data (cf. Asun’s slides)

data.europeana.eu

For language-intensive tools and resources

Availability: open resources

Interoperability

Simplicity

• But not always! E.g., not only “first hit” translations

Scale: scalability of tools, number and scope of datasets

Many languages, some less well-resourced (compared with English)

Another illustration: VOICES project

Something entirely different but not completely unrelated

Voice-based community-centric mobile services for social development

Easing communication on agricultural trade

Listing of products/prices via phone/radio

Pilot in Mali

Challenges

Data-centric project, but language technology plays a crucial role

Objects should be provided with textual and audio labels (text-to-speech system) in different languages

Local languages: e.g., Bambara

Lack of resources: need for low-cost, easy-to-adapt solutions

Victor de Boer, VU Amsterdam (v.de.boer@cs.vu.nl)

Thank you

aisaac@few.vu.nl

http://www.few.vu.nl/~aisaac/

Some slides are based on those of Marlies Olensky and Juliane Stiller, Multilingual Web Workshop, June 11, 2012, Dublin
