Jose Camacho-Collados

Cardiff University, 18 March 2019

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP




Outline

❖ Background
➢ Vector Space Models (word embeddings)
➢ Lexical resources

❖ Sense representations

➢ Knowledge-based: NASARI, SW2V

➢ Contextualized: ELMo, BERT

❖ Applications


Word vector space models

Words are represented as vectors: semantically similar words are close in the vector space.
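A minimal sketch of "similar words are close": cosine similarity over toy vectors (values hand-picked for illustration, not from any trained model):

```python
import numpy as np

# Toy 3-d word vectors (hand-picked for illustration, not from a trained model)
vectors = {
    "money": np.array([0.9, 0.1, 0.0]),
    "cash":  np.array([0.8, 0.2, 0.1]),
    "river": np.array([0.0, 0.9, 0.3]),
}

def cosine(u, v):
    # Cosine similarity: near 1.0 for similar directions, near 0 for unrelated ones
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["money"], vectors["cash"]))   # high: similar meanings
print(cosine(vectors["money"], vectors["river"]))  # low: unrelated meanings
```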


Neural networks are used to learn word vector representations from text corpora -> word embeddings
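For instance, a word2vec-style model can be trained in a few lines with gensim (a sketch; the two sentences stand in for a real corpus of millions of tokenized sentences):

```python
from gensim.models import Word2Vec

# Stand-in corpus: in practice this would be a large tokenized text collection
sentences = [
    ["he", "withdrew", "money", "from", "the", "bank"],
    ["the", "bank", "of", "the", "river", "was", "muddy"],
]

# Skip-gram (sg=1) word embeddings with 100 dimensions
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv.most_similar("bank", topn=5))  # nearest neighbours in the space
```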



Why word embeddings?

Embedded vector representations:

• are compact and fast to compute

• preserve important relational information between words (actually, meanings)

• are geared towards general use


Applications for word representations

• Syntactic parsing (Weiss et al. 2015)

• Named Entity Recognition (Guo et al. 2014)

• Question Answering (Bordes et al. 2014)

• Machine Translation (Zou et al. 2013)

• Sentiment Analysis (Socher et al. 2013)

… and many more!



AI goal: language understanding


Limitations of word embeddings

• Word representations cannot capture ambiguity. For instance, bank can denote a financial institution or the side of a river, yet it receives a single vector.

Problem 1: word representations cannot capture ambiguity

Word representations and the triangle inequality

Example from Neelakantan et al. (2014): the vector for plant must be close to both pollen (the living-organism sense) and refinery (the industrial sense). Since distance in the vector space is a metric, the triangle inequality then forces pollen and refinery to be close to each other as well, even though they are unrelated. Splitting the word into two sense vectors, plant1 and plant2, removes the contradiction.
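Spelled out, with d the distance in the space (a metric):

```latex
d(\mathrm{pollen}, \mathrm{refinery}) \;\le\; d(\mathrm{pollen}, \mathrm{plant}) + d(\mathrm{plant}, \mathrm{refinery})
```

Two small distances on the right force a small distance on the left. Once plant is split into plant1 (close to pollen) and plant2 (close to refinery), no single vector is close to both, so the inequality no longer forces pollen and refinery together.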

Limitations of word representations

• They cannot capture ambiguity (e.g. bank) -> they neglect rare senses and infrequent words.
• They do not exploit knowledge from existing lexical resources.

Motivation: Model senses instead of only words

He withdrew money from the bank.

bank -> bank#1, bank#2, ...

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

http://lcl.uniroma1.it/nasari/

Key goal: obtain sense representations

We want to create a separate representation for each entry of a given word.

Idea

Combine lexicographic knowledge (WordNet) with encyclopedic knowledge, plus information from text corpora.


WordNet

Main unit: the synset (concept). Each word inside a synset is a word sense.

Example synsets:
- {television, telly, television set, tv, tube, tv set, idiot box, boob tube, goggle box}: "electronic device"
- {noon, twelve noon, high noon, midday, noonday, noontide}: "the middle of the day"

WordNet semantic relations

Example around the synset {plant, flora, plant life}: "(botany) a living organism lacking the power of locomotion"

- Hypernymy (is-a): {organism, being}, "a living thing that has (or can develop) the ability to act or function independently"
- Hyponymy (has-kind): {houseplant}, "any of a variety of plants grown indoors for decorative purposes"
- Meronymy (part-of): {hood, cap}, "a protective covering that is part of a plant"
- Domain: botany, "the branch of biology that studies plants"


Knowledge-based Representations (WordNet)

X. Chen, Z. Liu, M. Sun: A Unified Model for Word Sense Representation and Disambiguation (EMNLP 2014)

S. Rothe and H. Schütze: AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes (ACL 2015)

M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, N. A. Smith: Retrofitting Word Vectors to Semantic Lexicons (NAACL 2015)

S. K. Jauhar, C. Dyer, E. Hovy: Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models (NAACL 2015)

M. T. Pilehvar and N. Collier: De-Conflated Semantic Representations (EMNLP 2016)

Wikipedia

High coverage of named entities and specialized concepts from different domains.

Wikipedia hyperlinks

BabelNet

Thanks to an automatic mapping algorithm, BabelNet integrates Wikipedia and WordNet, among other resources (Wiktionary, OmegaWiki, Wikidata...).

Key feature: Multilinguality (271 languages)

BabelNet covers both concepts and entities.

It follows the same structure as WordNet: synsets are the main units. In this case, synsets are multilingual.

NASARI (Camacho-Collados et al., AIJ 2016)

Goal: build vector representations for multilingual BabelNet synsets.

How? We exploit the Wikipedia semantic network and the WordNet taxonomy to construct a subcorpus (contextual information) for any given BabelNet synset.

Pipeline: obtain contextual information for a BabelNet synset by exploiting the BabelNet taxonomy and Wikipedia as a semantic network.

Three types of vector representations

- Lexical (dimensions are words)
- Unified (dimensions are multilingual BabelNet synsets)
- Embedded (latent dimensions)

Human-interpretable dimensions

[Figure: sense dimensions of the vector for plant (living organism), e.g. organism#1, tree#1, leaf#14, soil#2, garden#2, food#2, carpet#2, table#3, dictionary#3, refinery#1]

The embedded vectors are low-dimensional, exploiting word embeddings obtained from text corpora. Word and synset embeddings share the same vector space!

[Figure: embedded vector representation, closest senses]
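A sketch of what the shared space enables, assuming the embedded NASARI vectors have been downloaded in word2vec text format (the file name and the idea that plain words appear as keys alongside senses are assumptions; check the release for the exact conventions):

```python
from gensim.models import KeyedVectors

# Assumed file name and format; the NASARI release documents the real ones
vectors = KeyedVectors.load_word2vec_format("nasari_embedded.txt", binary=False)

# Because words and synsets live in the same space, a plain word can be
# queried against sense entries directly (the key format is an assumption)
for key, score in vectors.most_similar("bank", topn=10):
    print(f"{key}\t{score:.3f}")
```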

SW2V (Mancini and Camacho-Collados et al., CoNLL 2017)

A word is the surface form of a sense: we can exploit this intrinsic relationship for jointly training word and sense embeddings.

How? By updating the representations of a word and its associated senses interchangeably.

SW2V: Idea

Given as input a corpus and a semantic network:

1. Use the semantic network to link each word to its associated senses in context (e.g. in "He withdrew money from the bank").

2. Use a neural network where the updates of word and sense embeddings are linked, exploiting virtual connections: the prediction error is backpropagated both to the context words (He, withdrew, money, from, the, bank) and to their associated senses.

In this way word and sense/synset embeddings can be learned jointly in a single training run.

Full architecture of W2V (Mikolov et al., 2013):

E = -log p(w_t | W_t)

Full architecture of SW2V, where words and associated senses are used both as input and output:

E = -log p(w_t | W_t, S_t) - Σ_{s ∈ S_t} log p(s | W_t, S_t)

Here W_t and S_t are the context words and their associated senses around the target word w_t.
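A toy numpy sketch of this objective (my simplification, not the authors' code: real training uses word2vec-style optimizations such as negative sampling, and real indices come from a sense-linked corpus):

```python
import numpy as np

rng = np.random.default_rng(0)

V_WORDS, V_SENSES, DIM = 1000, 500, 50  # toy vocabulary sizes (assumptions)
W_in  = rng.normal(scale=0.1, size=(V_WORDS, DIM))   # input word embeddings
S_in  = rng.normal(scale=0.1, size=(V_SENSES, DIM))  # input sense embeddings
W_out = rng.normal(scale=0.1, size=(V_WORDS, DIM))   # output word embeddings
S_out = rng.normal(scale=0.1, size=(V_SENSES, DIM))  # output sense embeddings

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sw2v_loss(ctx_words, ctx_senses, target_word, target_senses):
    """E = -log p(w_t|W_t,S_t) - sum_{s in S_t} log p(s|W_t,S_t).

    The hidden layer averages BOTH context word and context sense vectors,
    so the error flows into word and sense embeddings jointly."""
    h = np.concatenate([W_in[ctx_words], S_in[ctx_senses]]).mean(axis=0)
    p_words = softmax(W_out @ h)   # distribution over output words
    p_senses = softmax(S_out @ h)  # distribution over output senses
    return -np.log(p_words[target_word]) - np.log(p_senses[target_senses]).sum()

# One toy training example: the indices are placeholders, not a real corpus
print(sw2v_loss(ctx_words=[3, 17, 42], ctx_senses=[7, 99],
                target_word=5, target_senses=[11]))
```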

Word and sense connectivity, example 1: ten closest word and sense embeddings to the sense company (military unit).

Word and sense connectivity, example 2: ten closest word and sense embeddings to the sense school (group of fish).

Applications of knowledge-based sense representations

- Taxonomy Learning (Espinosa-Anke et al., EMNLP 2016)
- Open Information Extraction (Delli Bovi et al., EMNLP 2015)
- Lexical entailment (Nickel & Kiela, NIPS 2017)
- Word Sense Disambiguation (Rothe & Schütze, ACL 2015)
- Sentiment analysis (Flekova & Gurevych, ACL 2016)
- Lexical substitution (Cocos et al., SENSE 2017)
- Computer vision (Young et al., ICRA 2017)
...

Applications

❖ Domain labeling/adaptation

❖ Word Sense Disambiguation

❖ Downstream NLP applications (e.g. text classification)


Domain labeling (Camacho-Collados and Navigli, EACL 2017)

Annotate each concept/entity with its corresponding domain of knowledge. To this end, we use the Wikipedia featured articles page, which includes 34 domains and a number of Wikipedia pages associated with each domain (Biology, Geography, Mathematics, Music, etc.).

[Figure: the Wikipedia featured articles page]

How to associate a concept with a domain?

1. Learn a NASARI vector for the concatenation of all Wikipedia pages associated with a given domain.

2. Exploit the semantic similarity between knowledge-based vectors and graph properties of the lexical resources.

A sketch of the similarity-based assignment follows.
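A minimal sketch of that assignment, assuming we already have the 34 domain vectors from step 1 (function names and the threshold are illustrative; the paper additionally uses graph properties of the lexical resource):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def label_domain(synset_vec, domain_vecs, threshold=0.35):
    """Assign the most similar domain vector to a synset vector.

    domain_vecs: dict mapping domain name -> NASARI vector learned from the
    concatenated Wikipedia pages of that domain. The threshold is a made-up
    stand-in for the paper's filtering of low-confidence assignments."""
    best, score = max(((d, cosine(synset_vec, v)) for d, v in domain_vecs.items()),
                      key=lambda x: x[1])
    return best if score >= threshold else None
```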

BabelDomains (Camacho-Collados and Navigli, EACL 2017)

As a result, we obtain a unified resource with information about domains of knowledge. BabelDomains is available for BabelNet, Wikipedia and WordNet at http://lcl.uniroma1.it/babeldomains, and is already integrated into BabelNet (online interface and API).

[Figure: BabelDomains examples, e.g. Physics and astronomy, Computing, Media]

Domain filtering for supervised distributional hypernym discovery
(Espinosa-Anke et al., EMNLP 2016; Camacho-Collados and Navigli, EACL 2017)

Task: given a term, predict its hypernym(s) (e.g. apple is-a fruit).

Model: distributional supervised system based on the transformation matrix of Mikolov et al. (2013).

Idea: filter the training data by domain of knowledge. A sketch of the transformation-matrix model follows.
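A minimal sketch of the transformation-matrix idea, under these assumptions: we already have word embeddings and (term, hypernym) training pairs, and we fit the linear map by least squares instead of the original gradient-based training. The toy random vectors make the prediction meaningless; with real, domain-filtered pairs the map becomes informative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50

# Toy embeddings and (term, hypernym) pairs; real systems use trained vectors
emb = {w: rng.normal(size=DIM) for w in
       ["apple", "pear", "fruit", "dog", "cat", "animal", "oak", "tree"]}
train_pairs = [("apple", "fruit"), ("pear", "fruit"),
               ("dog", "animal"), ("cat", "animal")]

X = np.stack([emb[t] for t, _ in train_pairs])  # term vectors
Y = np.stack([emb[h] for _, h in train_pairs])  # hypernym vectors

# Least-squares fit of the linear transformation: X @ M ≈ Y
M, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_hypernym(term, candidates):
    # Map the term into "hypernym space" and return the nearest candidate
    projected = emb[term] @ M
    return max(candidates,
               key=lambda c: projected @ emb[c] /
                             (np.linalg.norm(projected) * np.linalg.norm(emb[c])))

print(predict_hypernym("oak", ["fruit", "animal", "tree"]))
```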

Domain filtering for supervised distributional hypernym discovery

[Figure: results on the hypernym discovery task for five domains, domain-filtered vs. non-filtered training data]

Conclusion: filtering the training data by domain proves to be clearly beneficial.

Word Sense Disambiguation

Kobe, which is one of Japan's largest cities, [...]

Which sense of Kobe does this occurrence refer to?

Word Sense Disambiguation (Camacho-Collados et al., AIJ 2016)

Basic idea: select the sense which is semantically closest to the semantic representation of the whole document (global context). A sketch follows.
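A sketch of this global-context strategy, assuming word embeddings for the document and NASARI-style vectors for each candidate sense (all names are placeholders, not the paper's implementation):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(candidate_senses, doc_word_vectors):
    """Pick the candidate sense closest to the document centroid.

    candidate_senses: dict mapping sense id -> sense vector (e.g. NASARI)
    doc_word_vectors: list of word vectors for the whole document"""
    doc_vec = np.mean(doc_word_vectors, axis=0)  # global context representation
    return max(candidate_senses,
               key=lambda s: cosine(candidate_senses[s], doc_vec))
```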

Word Sense Disambiguation on textual definitions (Camacho-Collados et al., LREC 2016; LREV 2018)

Combination of a graph-based disambiguation system (Babelfy) with NASARI to disambiguate the concepts and named entities of over 35M definitions in 256 languages.

Sense-annotated corpus freely available at http://lcl.uniroma1.it/disambiguated-glosses/

Context-rich WSD

Target definition: "Interchanging the positions of the king and a rook." -> castling (chess)

Further definitions of the same concept, across resources and languages (English, French, German, Spanish, Czech, Turkish, Norwegian, Greek), enrich the context:

- Castling is a move in the game of chess involving a player's king and either of the player's original rooks.
- A move in which the king moves two squares towards a rook, and the rook moves to the other side of the king.
- Manœuvre du jeu d'échecs
- Spielzug im Schach, bei dem König und Turm einer Farbe bewegt werden
- El enroque es un movimiento especial en el juego de ajedrez que involucra al rey y a una de las torres del jugador.
- Rošáda je zvláštní tah v šachu, při kterém táhne zároveň král a věž.
- Rok İngilizce'de kaleye rook denmektedir.
- Rokade er et spesialtrekk i sjakk.
- Το ροκέ είναι μια ειδική κίνηση στο σκάκι που συμμετέχουν ο βασιλιάς και ένας από τους δυο πύργους.

Context-rich WSD exploiting parallel corpora (Delli Bovi et al., ACL 2017)

Applying the same method to provide high-quality sense annotations from parallel corpora (Europarl): 120M+ sense annotations for 21 languages.

http://lcl.uniroma1.it/eurosense/

Extrinsic evaluation: improved performance of a standard supervised WSD system using this automatically sense-annotated corpus.

Towards a seamless integration of senses in downstream NLP applications (Pilehvar et al., ACL 2017)

Question: what if we apply WSD and inject sense embeddings into a standard neural classifier?

Problems:

- WSD is not perfect -> Solution: high-confidence disambiguation
- WordNet lacks coverage -> Solution: use of Wikipedia

Tasks: topic categorization and sentiment analysis (polarity detection)

- Topic categorization: given a text, assign it a topic (e.g. politics, sports, etc.).
- Polarity detection: predict the sentiment of the sentence/review as either positive or negative.

Classification model: a standard CNN classifier inspired by Kim (2014). A sketch follows.
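A minimal PyTorch sketch of a Kim (2014)-style CNN over word (or sense) embeddings; the hyperparameters are illustrative, not those of the paper:

```python
import torch
import torch.nn as nn

class KimCNN(nn.Module):
    """Kim (2014)-style text classifier: parallel convolutions over the
    embedded sequence, max-over-time pooling, then a linear layer."""

    def __init__(self, vocab_size, num_classes, emb_dim=100,
                 num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb, seq)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, num_classes)

model = KimCNN(vocab_size=20000, num_classes=2)
logits = model(torch.randint(0, 20000, (8, 40)))  # batch of 8, length 40
```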

Sense-based vs. word-based: conclusions

Sense-based better than word-based... when the input text is large enough.


Why does the input text size matter?

- Word sense disambiguation works better in larger texts (Moro et al. 2014; Raganato et al. 2017)

- Disambiguation increases sparsity

Contextualized word embeddings: ELMo/BERT
(Peters et al., NAACL 2018; Devlin et al., NAACL 2019)

Like word embeddings, they are learned by leveraging language models on massive amounts of text corpora.

New: each word vector depends on its context. It is dynamic.

Important improvements in many NLP tasks.

Contextualized word embeddings: ELMo/BERT (examples)

He withdrew money from the bank.               -> 0.25, 0.32, -0.1, ...
The bank remained closed yesterday.            -> 0.22, 0.30, -0.08, ...
We found a nice spot by the bank of the river. -> -0.8, 0.01, 0.3, ...

The first two occurrences of bank (financial sense) receive similar vectors; the third (river sense) does not.
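This behaviour is easy to reproduce; a sketch with the HuggingFace transformers library (model and layer choice are illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Contextual vector of the token "bank" from the last hidden layer
    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    return hidden[tokens.index("bank")]

a = bank_vector("He withdrew money from the bank.")
b = bank_vector("The bank remained closed yesterday.")
c = bank_vector("We found a nice spot by the bank of the river.")

cos = torch.nn.functional.cosine_similarity
print(cos(a, b, dim=0))  # financial vs. financial: higher
print(cos(a, c, dim=0))  # financial vs. river: lower
```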

How well do these models capture "meaning"?

Good enough for many applications, but there is room for improvement. No noticeable improvements in:

➢ Winograd Schema Challenge (requires commonsense reasoning): BERT ~65% vs. humans ~95%
➢ Word-in-Context Challenge (requires abstracting the notion of sense): BERT ~65% vs. humans ~85%


For more information on meaning representations (embeddings):

❖ ACL 2016 Tutorial on “Semantic representations of word senses and concepts”: http://josecamachocollados.com/slides/Slides_ACL16Tutorial_SemanticRepresentation.pdf

❖ EACL 2017 workshop on “Sense, Concept and Entity Representations and their Applications”: https://sites.google.com/site/senseworkshop2017/

❖ NAACL 2018 Tutorial on “Interplay between lexical resources and NLP”: https://bitbucket.org/luisespinosa/lr-nlp/

❖ “From Word to Sense Embeddings: A Survey on Vector Representations of Meaning” (JAIR 2018): https://www.jair.org/index.php/jair/article/view/11259

Thank you!

Questions please!

[email protected]
@CamachoCollados

josecamachocollados.com