LIDER Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe A. Gómez-Pérez (UPM) asun@fi.upm.es Project Coordinator LIDER CSA Budget: 1.482.000€ Starting date: 1. Nov. 2013 Duration: 2 Years


Page 1: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER

Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe

A. Gómez-Pérez (UPM), asun@fi.upm.es

Project Coordinator

CSA
Budget: 1.482.000€
Starting date: 1. Nov. 2013
Duration: 2 Years

Page 2: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER The LIDER consortium


Universidad Politécnica de Madrid (UPM, Spain) [COORDINATOR]

Trinity College Dublin (Ireland)

DFKI (Germany)

National University of Ireland, Galway (Ireland)

Institut für Angewandte Informatik EV (INFAI, Germany)

University of Bielefeld (Germany)

Università degli Studi di Roma La Sapienza (Italy)

GEIE ERCIM (France)

Page 3: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER Evidence of industrial demand

Multilingual multimedia content annotation
  o Increased demand for NLP services that combine text processing with multimedia metadata and media processing components

LOD generation from linguistic resources
  o Data is already being published by companies, but not linguistic resources as LLOD

LOD-based NLP services for Content Analytics
  o CA-related companies actively use the English DBpedia (OpenCalais, Zemanta, Ontos, Yahoo!, NERD, etc.)
  o Multilingual LOD would be vital for reaching EU-wide and global markets

Page 4: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER The use of LOD for NLP in Content Analytics

Which extensions to the LOD are needed to support a new generation of large-scale content analytics applications that will overcome language barriers?
  o Identification of key NLP tasks that require background knowledge
  o Specification of a new generation of NLP services that are LOD-aware and can exploit LOD

Licensed linguistic linked data (LLD or LLOD)
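
As a rough illustration of what a "LOD-aware" NLP service could look like, the sketch below annotates text by looking up tokens in a small multilingual label index and returning the linked resource. Everything in it is an assumption made for illustration: the `LABEL_INDEX` dictionary, the `example.org` IRIs and the `annotate` function are hypothetical and do not come from the LIDER project or from any existing service.

```python
import re

# Hypothetical multilingual label index: lowercase label -> resource IRI.
# In a real service this would be built from labels published as LOD.
GERMANY = "http://example.org/resource/Germany"
MADRID = "http://example.org/resource/Madrid"
LABEL_INDEX = {
    "germany": GERMANY,      # English label
    "alemania": GERMANY,     # Spanish label
    "deutschland": GERMANY,  # German label
    "madrid": MADRID,
}


def annotate(text):
    """Return (surface form, start offset, IRI) for every token found in the index."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        iri = LABEL_INDEX.get(match.group().lower())
        if iri is not None:
            annotations.append((match.group(), match.start(), iri))
    return annotations


if __name__ == "__main__":
    for surface, offset, iri in annotate("Alemania y Deutschland"):
        print(surface, offset, iri)
```

Because labels in different languages point to the same IRI, mentions in Spanish and German text resolve to one resource, which is the kind of background knowledge the slide refers to.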

Page 5: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER Linked Open Data and Language

[Figure panels: 2007, 2009, 2012]

1. LOD is increasingly multilingual
2. LOD interconnects resources in many languages

Page 6: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER

1. Number of monolingual and multilingual datasets

                          January 2012    June 2012    December 2012
Monolingual datasets             1,906        2,201            1,984
Multilingual datasets              349          635              676

2. Current usage of language tagging capabilities in RDF

                                     January 2012    June 2012    December 2012
RDF literals without language tag      10,250,936   10,594,338       12,272,806
RDF literals with language tag          2,567,324    3,154,779        3,365,930

3. English tags versus other languages' tags

                                        January 2012    June 2012    December 2012
RDF literals with English tag              2,135,664    2,751,065        2,808,145
RDF literals with other language tag         431,660      403,714          557,785

LOD is dominated by the English language

4. Evolution of top-10 languages (non-English) [figure]
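
Statistics on language tagging in RDF literals, like those on this slide, could in principle be gathered by scanning a dataset dump. The following is a minimal sketch, assuming the data is available as N-Triples lines. The regular expression, the `count_language_tags` function and the sample triples are illustrative assumptions, not the method used to produce the figures in this deck, and the pattern is a simplification rather than a full N-Triples parser.

```python
import re
from collections import Counter

# Matches a literal object at the end of an N-Triples line:
#   "lexical form"            plain literal
#   "lexical form"@en-GB      language-tagged literal
#   "lexical form"^^<iri>     datatyped literal
LITERAL_RE = re.compile(
    r'"(?:[^"\\]|\\.)*"'                            # quoted lexical form, with escapes
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)'    # optional language tag
    r'|\^\^<[^>]*>)?'                               # or optional datatype IRI
    r'\s*\.\s*$'                                    # closing dot of the triple
)


def count_language_tags(lines):
    """Count literals with and without a language tag, plus a per-language tally."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LITERAL_RE.search(line)
        if match is None:
            continue  # the object is an IRI or blank node, not a literal
        lang = match.group("lang")
        if lang is None:
            counts["untagged"] += 1
        else:
            counts["tagged"] += 1
            # Group regional variants such as en-IE under their primary subtag.
            counts["lang:" + lang.lower().split("-")[0]] += 1
    return counts


# Made-up sample data for illustration only.
SAMPLE = '''\
<http://example.org/madrid> <http://www.w3.org/2000/01/rdf-schema#label> "Madrid"@es .
<http://example.org/madrid> <http://www.w3.org/2000/01/rdf-schema#label> "Madrid"@en .
<http://example.org/madrid> <http://www.w3.org/2000/01/rdf-schema#label> "Madrid" .
<http://example.org/madrid> <http://example.org/population> "3200000"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/madrid> <http://example.org/country> <http://example.org/spain> .
<http://example.org/dublin> <http://www.w3.org/2000/01/rdf-schema#label> "Dublin"@en-IE .
'''

if __name__ == "__main__":
    print(count_language_tags(SAMPLE.splitlines()))
```

Datatyped literals are counted as untagged here, since they carry no language tag. Whether the original statistics treated them the same way is not stated in the slides.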

Page 7: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER LOD as large background knowledge for NLP

[Diagram labels]

Multimedia and Multilingual Content
Producers
Metadata Generation
Multilingual content metadata
Consumers
Content Analytics
...
Language Resources (Lexicon, corpora, ...): some of them are FOI, others are private
Linguistic LOD generation
LLOD (language resources as LD)
LOD-aware NLP services

Page 8: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER Iterative approach

[Diagram labels]

Industry use cases
Roadmap, guidelines, target architecture
Community building, networking
LIDER

Page 9: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER Expected Contributions from the Community

Use case definitions from industry will be input to the roadmap

Linguistic resources as LLOD

Validation of guidelines and reference architecture

Participation in surveys

Participation in events:
  o Roadmapping workshops (WS), hackathons, etc.

LIDER will help with travel grants for participants in the Roadmapping WS

[email protected]

Page 10: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER

Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe

A. Gómez-Pérez (UPM), asun@fi.upm.es

Project Coordinator

Page 11: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER The use of (Linguistic) LOD for NLP

Linguistic LOD (LLOD)
  o Subset of LOD
  o Linguistic and open resources in RDF, interconnected with other linguistic and open resources
  o Not many linguistic resources are available as LOD yet

Linguistic LD (LLD)
  o Licensed linguistic linked data

LOD, LLOD and LLD as a source of large background knowledge for NLP

Page 12: Gómez-Pérez  (UPM)  asun@fi.upm.es Project  Coordinator

LIDER A lot of domain data in LOD…

Music
Geographic
Life Sciences
Publications
E-Gov
On-line activities
Cross-domains