ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Babelfying the Multilingual Web: state-of-the-art Disambiguation and Entity Linking of Web pages in 50 languages

Roberto Navigli
http://lcl.uniroma1.it
[email protected]

ERC Starting Grant MultiJEDI No. 259234
LIDER EU Project No. 610782

Joint work with Andrea Moro and Alessandro Raganato

A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, 2014.

2014.05.08 BabelNet & friends Roberto Navigli

Motivation

•  Domain-specific Web content is available in many languages

•  Information should be extracted and processed independently of the source/target language

•  This could be done automatically by means of high-performance multilingual text understanding

BabelNet (http://babelnet.org)

A wide-coverage multilingual semantic network and encyclopedic dictionary in 50 languages

•  Concepts from WordNet

•  Named Entities and specialized concepts from Wikipedia

•  Concepts integrated from both resources

BabelNet (http://babelnet.org)

•  50 languages covered (including Latin!)

•  21 million textual definitions

•  67M word senses and named entities

•  1.1B RDF triples available via SPARQL endpoint

•  Seamless integration of:

    •  WordNet 3.0

    •  Wikipedia

    •  Wiktionary

    •  OmegaWiki: a collaborative multilingual dictionary

    •  Open Multilingual WordNet [Bond and Foster, 2013]

•  Translations for all open-class parts of speech

New 2.5 version out!

BabelNet as a Multilingual Inventory for:

•  Concepts: "calcio" in Italian can denote different concepts.

•  Named Entities: the text "Mario" can refer to different things, such as the video game character, a soccer player (Gómez), or even a music album.

Calcio / Kick in BabelNet 2.5

Calcio / Calcium in BabelNet 2.5

Calcio / Soccer in BabelNet 2.5

Word Sense Disambiguation in a Nutshell

•  Input: striker (target word) in “Thomas and Mario are strikers playing in Munich” (context)

•  WSD system, supported by knowledge

•  Output: sense of target word

Entity Linking in a Nutshell

•  Input: Thomas (target mention) in “Thomas and Mario are strikers playing in Munich” (context)

•  Entity Linking system, supported by knowledge

•  Output: Named Entity

Disambiguation and Entity Linking together!

BabelNet is a huge multilingual inventory for both word senses and named entities!

Babelfy: A Joint approach to WSD and EL

Personalized PageRank is the state-of-the-art method for graph-based word sense disambiguation; however, it cannot be run for each new input on huge graphs.

Idea: Precompute semantic signatures for the nodes!

The semantic signature of a node is the set of nodes most relevant to it in the graph, computed by random walk with restart.

Andrea Moro, Alessandro Raganato and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. TACL. http://babelfy.org

Semantic Signatures: RWR

1.  Start from one target vertex of the semantic network;

2.  Randomly select a neighbor of the current vertex or restart at the target vertex;

3.  Keep counting the hitting frequencies;

4.  Take the most visited vertices.
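
The four steps above can be sketched in a few lines of Python. This is only an illustration on a toy graph: the uniform choice among neighbors, the restart probability of 0.15, the number of steps, and the `top_k` cut-off are all assumptions of this sketch, not values taken from the slides.

```python
import random
from collections import Counter

def semantic_signature(graph, target, steps=10_000, restart_prob=0.15, top_k=3, seed=0):
    """Approximate the vertices most relevant to `target` via random walk with restart."""
    rng = random.Random(seed)
    visits = Counter()
    current = target                      # 1. start from the target vertex
    for _ in range(steps):
        neighbors = graph.get(current, [])
        # 2. restart at the target (also when stuck), or move to a random neighbor
        if not neighbors or rng.random() < restart_prob:
            current = target
        else:
            current = rng.choice(neighbors)
            visits[current] += 1          # 3. keep counting the hitting frequencies
    visits.pop(target, None)              # the target itself is not part of its signature
    return [node for node, _ in visits.most_common(top_k)]  # 4. most visited vertices

# Hypothetical toy network (adjacency lists); "novel"/"book" are unreachable from "striker".
toy_graph = {
    "striker": ["soccer player", "offside"],
    "soccer player": ["striker", "athlete", "sport"],
    "offside": ["striker", "sport"],
    "athlete": ["soccer player", "sport"],
    "sport": ["soccer player", "offside", "athlete"],
    "novel": ["book"],
    "book": ["novel"],
}

signature = semantic_signature(toy_graph, "striker")
print(signature)
```

Because the walk keeps returning to the target, vertices that are unreachable or far away collect few or no visits and stay out of the signature.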

Step 1: Calculate Semantic Signatures

[Figure: fragment of the semantic network; node labels: striker, offside, athlete, sport, soccer player]

A Joint approach to WSD and EL

1.  Given an input text, select all the possible candidate meanings from BabelNet by matching mentions with BabelNet lexicalizations;

2.  Connect all the candidate meanings by using semantic signatures;

3.  Extract a dense subgraph containing semantically coherent candidates;

4.  Select the most connected candidate for each fragment of text.

Step 2: Find all possible meanings of words in context

“Thomas and Mario are strikers playing in Munich”

•  Thomas: Thomas (novel), Seth Thomas, Thomas Müller

•  Mario: Mario Gómez, Mario (Album), Mario (Character)

•  strikers: Striker (Movie), Striker (Video Game), striker (Sport)

•  Munich: Munich (City), FC Bayern Munich, Munich (Song)

Ambiguity!
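
A minimal sketch of this candidate lookup, assuming a tiny hand-made inventory in place of BabelNet. The `INVENTORY` dictionary, the `lemma` helper, and the single-token matching are hypothetical simplifications; a real system would also match multi-word fragments and use proper lemmatization.

```python
# Hypothetical toy inventory: lowercased lexicalization -> candidate meanings.
INVENTORY = {
    "thomas": ["Thomas (novel)", "Seth Thomas", "Thomas Müller"],
    "mario": ["Mario Gómez", "Mario (Album)", "Mario (Character)"],
    "striker": ["Striker (Movie)", "Striker (Video Game)", "striker (Sport)"],
    "munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

def lemma(token):
    """Crude normalization standing in for real lemmatization (assumption of this sketch)."""
    token = token.lower().strip('.,!?"“”')
    # Strip a plural "s" only when that yields a known lexicalization.
    if token not in INVENTORY and token.endswith("s") and token[:-1] in INVENTORY:
        return token[:-1]
    return token

def find_candidates(text):
    """Map each mention in `text` to every meaning whose lexicalization matches it."""
    found = {}
    for token in text.split():
        key = lemma(token)
        if key in INVENTORY:
            found[token] = list(INVENTORY[key])
    return found

candidates = find_candidates("Thomas and Mario are strikers playing in Munich")
for mention, meanings in candidates.items():
    print(mention, "->", meanings)
```

Every mention comes back with several meanings, which is exactly the ambiguity the later steps have to resolve.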

Step 3: Connect all the candidate meanings

Thomas and Mario are strikers playing in Munich

Step 4: Extract a dense subgraph

Thomas and Mario are strikers playing in Munich
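
The slide does not spell out how the dense subgraph is extracted. The sketch below uses a simplified greedy peeling: repeatedly take the most ambiguous mention and drop its least connected candidate. Treat it as one plausible heuristic under stated assumptions, not as the authors' exact procedure; the edge list and the `max_per_mention` parameter are hypothetical.

```python
CANDIDATES = {
    "Thomas": ["Thomas (novel)", "Seth Thomas", "Thomas Müller"],
    "Mario": ["Mario Gómez", "Mario (Album)", "Mario (Character)"],
    "strikers": ["Striker (Movie)", "Striker (Video Game)", "striker (Sport)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

# Hypothetical undirected edges between candidates of different mentions.
EDGES = {
    ("Thomas Müller", "Mario Gómez"),
    ("Thomas Müller", "striker (Sport)"),
    ("Thomas Müller", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("Mario (Character)", "Striker (Video Game)"),
}

def densify(candidates, edges, max_per_mention=2):
    """Greedily drop the least connected candidate of the most ambiguous mention."""
    remaining = {mention: list(options) for mention, options in candidates.items()}

    def degree(node):
        # Count only edges whose endpoints are both still in the graph.
        alive = {c for options in remaining.values() for c in options}
        return sum(1 for u, v in edges if node in (u, v) and u in alive and v in alive)

    while True:
        mention = max(remaining, key=lambda m: len(remaining[m]))
        if len(remaining[mention]) <= max_per_mention:
            return remaining
        weakest = min(remaining[mention], key=degree)
        remaining[mention].remove(weakest)

dense = densify(CANDIDATES, EDGES)
for mention, options in dense.items():
    print(mention, "->", options)
```

Isolated readings such as Thomas (novel) or Munich (City) are pruned first, while the mutually connected soccer-related candidates survive.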

Step 5: Select the most reliable meanings

“Thomas and Mario are strikers playing in Munich”

•  Thomas: Thomas (novel), Seth Thomas, Thomas Müller

•  Mario: Mario Gómez, Mario (Album), Mario (Character)

•  strikers: Striker (Movie), Striker (Video Game), striker (Sport)

•  Munich: Munich (City), FC Bayern Munich, Munich (Song)
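
Selecting the most connected candidate for each fragment can be sketched as a degree count over the remaining graph. The edges below are invented for illustration, and leaving a mention unlinked when none of its candidates has any connection is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical candidates and undirected edges left after the dense-subgraph step.
REMAINING = {
    "Thomas": ["Seth Thomas", "Thomas Müller"],
    "Mario": ["Mario Gómez", "Mario (Character)"],
    "strikers": ["Striker (Video Game)", "striker (Sport)"],
    "Munich": ["FC Bayern Munich", "Munich (Song)"],
}
EDGES = [
    ("Thomas Müller", "Mario Gómez"),
    ("Thomas Müller", "striker (Sport)"),
    ("Thomas Müller", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("Mario (Character)", "Striker (Video Game)"),
]

def select(candidates, edges):
    """Pick, for each mention, the candidate with the most connections."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    chosen = {}
    for mention, options in candidates.items():
        best = max(options, key=lambda c: degree[c])
        # Assumption: a mention whose candidates are all isolated stays unlinked.
        chosen[mention] = best if degree[best] > 0 else None
    return chosen

selection = select(REMAINING, EDGES)
for mention, meaning in selection.items():
    print(mention, "->", meaning)
```

On this toy input the soccer-related readings win for all four mentions because they support each other, whereas the video game readings only support one another.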

Experimental Results: State-of-the-art multilingual disambiguation

Experimental Results

Experimental Results: State-of-the-art Entity Linking

Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm Andrea Moro and Roberto Navigli

Thanks or… (grazie)

http://babelnet.org http://babelfy.org

Roberto Navigli