This slide set presents the concept of Linguistic Linked Licensed Data (known as 3LD) and the LIDER project (http://www.lider-project.eu/). The project’s mission is to provide the basis for a Linguistic Linked Data cloud that can support content analytics tasks over unstructured, multilingual, cross-media content. By achieving this goal, LIDER aims to improve the ease and efficiency with which Linguistic Linked Data is exploited in content analytics processes.
20/03/2014 1 Presenter name
Linked Data and Language Technologies: The LIDER project
A. Gómez-Pérez (UPM)
Project Coordinator
CSA Budget: €1,482,000; Starting date: 1 Nov. 2013; Duration: 2 years
20/03/2014 2 Asun Gómez-Pérez
• Motivation
• Linked Data for Language Technologies
• What is LIDER about
20/03/2014 3 Asun Gómez-Pérez
Heterogeneity of Linguistic Resources
• Ecosystem of
– Open and closed resources
– Complementary resources
• Lexicons
• Corpora
• Dictionaries
• …
– Heterogeneous formats
• E.g., for lexicons: LexInfo, LMF, LIR, lemon, …
– Language resources available on the web
• META-SHARE, ELDA, ELRA, CLARIN, FLaReNet, MultiJEDI, …
20/03/2014 4 Asun Gómez-Pérez
Limitations when exploiting LRs
• Finding and integrating LRs in third-party applications is a manual and time-consuming process
• LR metadata
– cannot be queried using a common language (e.g. SPARQL)
• LR content
– is available in heterogeneous formats
– is not linked with other linguistic content
The language resources and technologies supported are still far from being Free, Open and Interoperable
20/03/2014 5 Asun Gómez-Pérez
An example: “Red” (computer network)
http://es.wiktionary.org
http://rae.es
http://www.wikilengua.org/index.php/Terminesp:red
http://es.wikipedia.org
http://www.wordreference.com/sinonimos/
20/03/2014 6 Asun Gómez-Pérez
http://rae.es
Complex queries using data from heterogeneous sources
20/03/2014 11 Asun Gómez-Pérez
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
“Red”
Etymology: from Latin “rete”
Gender: “f”
Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….” (“Set of computers or computing devices connected to each other…”)
“Red”
Synonyms: “sistema”, “malla”, “distribución”
“Red”
Norm: UNE 21302-131
English: network
German: Netzwerk
“Red”
Pronunciation: [red]
Grammatical category: sustantivo femenino (feminine noun)
Singular: “red”
Plural: “redes”
“Red_de_computadores”
Category: redes informáticas (computer networks)
Image
Complementary but not connected
20/03/2014 12 Asun Gómez-Pérez
LD allows linguistic data integration
[Figure: graph of linked data about the entry “Red”. Nodes and labels: Form (number: singular, written form “red”, phonetic form [RED]); Form (number: plural, phonetic form [REDES]); Form (gender: feminine, written form “red”); Sense (written form “red”) equivalent to Sense (written form “malla”); image; translation es - en between Sense (written form “red”) and Sense (written form “network”).]
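The integration idea on this slide can be sketched in a few lines of Python. This is an illustration only: the `ex:` identifiers, the property names and the source labels are hypothetical and are not taken from lemon or from any vocabulary used by LIDER. Records held by separate sources are flattened into subject-predicate-object triples that share one identifier for “red”, so that complementary facts become connected.

```python
# Illustrative sketch only: identifiers, property names and source labels are hypothetical.
from collections import defaultdict

# Records as they might be held by separate, unconnected sources.
SOURCES = {
    "dictionary": {"gender": "f", "etymology": "rete"},
    "thesaurus": {"synonym": ["sistema", "malla", "distribución"]},
    "terminology": {"translation_en": "network", "translation_de": "Netzwerk"},
    "wiki_dictionary": {"singular": "red", "plural": "redes"},
}

ENTRY = "ex:red"  # one shared identifier for the lexical entry


def to_triples(entry, sources):
    """Flatten per-source records into (subject, predicate, object) triples."""
    triples = set()
    for record in sources.values():
        for prop, value in record.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                triples.add((entry, "ex:" + prop, v))
    return triples


def describe(entry, triples):
    """Group every known value of an entry by property."""
    description = defaultdict(set)
    for s, p, o in triples:
        if s == entry:
            description[p].add(o)
    return dict(description)


GRAPH = to_triples(ENTRY, SOURCES)
RED = describe(ENTRY, GRAPH)
print(sorted(RED))  # property names contributed by all four sources
```

Because every triple uses the same subject identifier, a single lookup returns facts that originally lived in four different places.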
20/03/2014 13 Asun Gómez-Pérez
LD as a possible solution
• Agree on 21st-century vocabularies for describing resource metadata and content
• Unified and standardized language for describing resources (RDF(S))
• Unified and standardized query language (SPARQL)
• Standardized non-proprietary APIs
• Links to other resources
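To make the benefit of a unified query language concrete, the following sketch imitates in plain Python what a SPARQL query with three triple patterns does. It is not a SPARQL engine, and the `ex:` data is invented for illustration; a real deployment would send the query shown in the comment to a SPARQL endpoint instead.

```python
# Illustrative sketch: a tiny triple-pattern matcher, not a SPARQL engine.
# All ex: identifiers are hypothetical.
TRIPLES = {
    ("ex:red", "ex:language", "es"),
    ("ex:red", "ex:plural", "redes"),
    ("ex:red", "ex:sense", "ex:red_sense1"),
    ("ex:red_sense1", "ex:translation", "ex:network_sense1"),
    ("ex:network", "ex:sense", "ex:network_sense1"),
    ("ex:network", "ex:language", "en"),
}


def match(pattern, triple, binding):
    """Extend a variable binding so that the pattern equals the triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Join all patterns and return every consistent variable binding."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?word WHERE {
#   ex:red ex:sense ?s . ?s ex:translation ?t . ?word ex:sense ?t }
RESULT = query(
    [
        ("ex:red", "ex:sense", "?s"),
        ("?s", "ex:translation", "?t"),
        ("?word", "ex:sense", "?t"),
    ],
    TRIPLES,
)
WORDS = [b["?word"] for b in RESULT]
print(WORDS)
```

The query follows links from the Spanish entry through its sense to the translated sense and back to an entry, which is only possible once the resources share identifiers.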
20/03/2014 14 Presenter name
Linked Data
for
Language Technologies
20/03/2014 15 Asun Gómez-Pérez
Linked Open Data and Language
1. LOD is increasingly multilingual
2. LOD interconnects resources
– In many domains
– in many languages
How many Linguistic Resources are exposed in RDF?
20/03/2014 16 Asun Gómez-Pérez
Linked Data and Language Resources
Linguistic LOD (LLOD): subset of LOD
– Linguistic domain
– Open license
– Resources in RDF
– Interconnected with other LD resources
• Long-term experience
• Huge amount of resources
• Maturity
• Curation
• Legal liability
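The four LLOD membership criteria listed on this slide can be read as a simple checklist. The sketch below is one possible encoding; the `Dataset` fields, the set of license identifiers treated as open, and the example datasets are all assumptions made for illustration.

```python
# Illustrative sketch: field names, the license list and the examples are assumptions.
from dataclasses import dataclass, field

# Identifiers treated as "open" for this example only.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}


@dataclass
class Dataset:
    name: str
    domain: str
    license_id: str
    formats: set = field(default_factory=set)
    links_to: set = field(default_factory=set)


def llod_gaps(ds):
    """Return the LLOD criteria that a dataset description fails."""
    gaps = []
    if ds.domain != "linguistics":
        gaps.append("linguistic domain")
    if ds.license_id not in OPEN_LICENSES:
        gaps.append("open license")
    if "RDF" not in ds.formats:
        gaps.append("resources in RDF")
    if not ds.links_to:
        gaps.append("interconnected with other LD resources")
    return gaps


LEXICON = Dataset("example-lexicon", "linguistics", "CC-BY-4.0", {"RDF"}, {"ex:other-dataset"})
LEGACY = Dataset("example-corpus", "linguistics", "proprietary", {"XML"})

print(llod_gaps(LEXICON))
print(llod_gaps(LEGACY))
```

A dataset with an empty gap list meets all four criteria; any other result names what would have to change before it could join the LLOD subset.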
20/03/2014 17 Presenter name
The LIDER project
20/03/2014 18 Asun Gómez-Pérez
The LIDER consortium
Universidad Politécnica de Madrid (UPM, Spain) [COORDINATOR]
Trinity College Dublin (Ireland)
DFKI (Germany)
National University of Ireland, Galway (Ireland)
Institut für Angewandte Informatik e.V. (INFAI, Germany)
University of Bielefeld (Germany)
Università degli Studi di Roma La Sapienza (Italy)
GEIE ERCIM (France)
20/03/2014 19 Asun Gómez-Pérez
What is 3LD?
3LD: Linguistic Linked Licensed Data
Language resources such as:
- Lexica
- Corpora
- Dictionaries, …
NIF: NLP Interchange Format
Using RDF and standard data models (vocabularies):
- Lexica
- Corpora
ODRL: Open Digital Rights Language
Published along with a machine-readable license.
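What a machine-readable license buys a consumer can be sketched as follows. The structure is loosely modelled on ODRL's idea of permissions and prohibitions attached to a target, but the dictionary keys, the `ex:` identifier and the evaluation rule are simplified assumptions, not the ODRL schema.

```python
# Illustrative sketch: loosely inspired by ODRL permissions and prohibitions.
# Keys, identifiers and the evaluation rule are simplified assumptions.
POLICY = {
    "target": "ex:spanish-lexicon",
    "permission": [{"action": "read"}, {"action": "derive"}],
    "prohibition": [{"action": "commercialize"}],
}


def is_allowed(policy, target, action):
    """Allow an action only if it is permitted and not prohibited for the target."""
    if policy["target"] != target:
        return False
    prohibited = {rule["action"] for rule in policy.get("prohibition", [])}
    permitted = {rule["action"] for rule in policy.get("permission", [])}
    return action in permitted and action not in prohibited


CAN_DERIVE = is_allowed(POLICY, "ex:spanish-lexicon", "derive")
CAN_SELL = is_allowed(POLICY, "ex:spanish-lexicon", "commercialize")
print(CAN_DERIVE, CAN_SELL)
```

With the license expressed as data, an NLP service can run a check like this before using a resource instead of relying on a human to read the license text.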
20/03/2014 20 Asun Gómez-Pérez
Challenge
• Which extensions to the LOD are needed to support a new generation of large-scale content analytics applications that will overcome language barriers?
– Expose linguistic resources in LD format with license information
• Metadata
• Content
– Guidelines for Linguistic Linked Licensed Data (3LD)
– Specification of a new generation of 3LD-aware NLP services
• Requirements:
– Keep track of the license information
– Keep track of the provenance of the resource
– Keep track of the use of the resource
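The three tracking requirements above could be met by carrying license, provenance and usage information alongside each resource. The sketch below is one hypothetical shape for such a record; the class, its fields and the rule that a derived resource inherits its parent's license are assumptions, and a real license may impose different terms on derivatives.

```python
# Illustrative sketch: the class, field names and inheritance rule are assumptions.
from dataclasses import dataclass, field


@dataclass
class TrackedResource:
    uri: str
    license_uri: str   # requirement 1: license information
    derived_from: list  # requirement 2: provenance (chain of source URIs)
    usage_log: list = field(default_factory=list)  # requirement 3: use

    def use(self, service, purpose):
        """Record that a service used this resource for some purpose."""
        self.usage_log.append({"service": service, "purpose": purpose})
        return self

    def derive(self, new_uri):
        """Create a derived resource that keeps the license and records its source."""
        return TrackedResource(new_uri, self.license_uri, self.derived_from + [self.uri])


LEX = TrackedResource("ex:lexicon", "ex:license/by", [])
LEX.use("ex:tagger", "entity linking")
SUB = LEX.derive("ex:lexicon-es")
print(SUB.derived_from, SUB.license_uri, len(LEX.usage_log))
```

Keeping the three pieces of information in one record means that a 3LD-aware service can answer "under which terms, from where, and used by whom" without consulting external documentation.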
20/03/2014 21 Asun Gómez-Pérez
LOD as large background knowledge for NLP
Producers
Multimedia and Multilingual Content
Metadata Generation
Consumers
Content Analytics
Metadata as LD
... Language resources (lexicons, corpora, ...); some of them are FOI (Free, Open and Interoperable), others are private
Linguistic LOD generation (Metadata and Content)
Language resources as LD
LOD-aware NLP services
20/03/2014 22 Asun Gómez-Pérez
Industry use cases
1. Roadmap on 3LD for Content Analytics
2. Guidelines for 3LD
3. 3LD Reference Architecture
Community building, networking: LD4LT, BP-MLOD W3C-CG, OntoLex W3C-CG
- Surveys
- Requirements
20/03/2014 23 Asun Gómez-Pérez
Community Building
• Industrial Board
• Open community events tailored to the different audiences
– Roadmapping workshops 2014
• 21 March, EDF (Athens)
• 7-8 May, Multilingual Web WS (Madrid)
• 26-27 May, WS on Emotions (LREC – Reykjavik)
• 27 May, WS on LD and Linguistics (LREC – Reykjavik)
• 4-6 June, WS at Localization World (Dublin)
• 2 September, WS at the Semantics Conference (Leipzig)
– Publication of best practices material via W3C community groups
• LD4LT
• BP-MLOD W3C-CG
• OntoLex W3C-CG
– Hackathon in September at the Semantics Conference (Leipzig)
– Surveys of the localization industry and general Web companies
20/03/2014 24 Asun Gómez-Pérez
Expected Contributions from the Community
• Use case definition from industry will be input to the roadmap
• Linguistic resources LLOD
• Validation of guidelines and reference architecture
• Participation in surveys
• Participation in events:
– Roadmapping WS, hackathons, etc.
LIDER will provide travel grants to participants in the Roadmapping WS
20/03/2014 25 Asun Gómez-Pérez
Web channels
www.lider-project.eu
twitter.com/multilingweb
Hashtag: #LiderEU
Join the community
www.w3c.org/community/ld4lt