41
NLP Technologies for Cognitive Computing Lecture 3: Word Senses Devdatt Dubhashi LAB (Machine Learning. Algorithms, Computational Biology) Computer Science and Engineering Chalmers

NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

NLP Technologies for Cognitive Computing

Lecture 3: Word Senses

Devdatt DubhashiLAB

(Machine Learning. Algorithms, Computational Biology) Computer Science and Engineering

Chalmers

Page 2: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Why Language is difficult ..

He sat on the river bank and counted his dough.

She went to the bank and took out some money.

Lexical Layer

Concept Layer

synonymouspolysemous

Page 3: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised
Page 4: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WSI and WSD

• Word sense induction (WSI): the automatic identification of the senses of a word (i.e. meanings).

• Word sense disambiguation (WSD): identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings.

Page 5: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WSI and WSD: Two approaches

• Detailed• Interpretable’• High quality

• Greater coverage• Novel senses

(previously unknown)

Knowledge Based Corpus Based/Data Driven

Page 6: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WORD SENSE INDUCTION

Page 7: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Context and Target Vectors in word2vec

• Assign to each word w, a target vector 𝒖𝒖𝑤𝑤 and a context vector 𝒗𝒗𝑤𝑤 in 𝑹𝑹𝑑𝑑

• Target vector represents the meaning of the word

• Context vector represents the word when it is used in the context of another word.

Page 8: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Key Idea: Cluster contexts

• For each word, look at all the contexts in which it occurs in the corpus

• Clustering those context vectors should give information about its different senses – the Distributional Hypothesis!

Page 9: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

K-Means Clustering• Initialze k centers 𝑐𝑐1…𝑐𝑐𝑘𝑘• Repeat until

convergence: – Assign each point 𝑥𝑥𝑗𝑗 to

its closest center, call the set assigned to 𝑐𝑐𝑖𝑖 , 𝐶𝐶𝑖𝑖

– Recompute the centers: 𝑐𝑐𝑖𝑖 = 1

|𝐶𝐶𝑖𝑖|∑𝑗𝑗∈𝐶𝐶𝑖𝑖 𝑥𝑥𝑗𝑗

Page 10: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Centers: How many and which?

• Performance of k Means depends very strongly on :– Where the initial centers are chosen at the start– How many centers are chosen, the parameter k.

Page 11: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

How to Initialize I

• Pick k random points• Pick k points at random from input points• Assign points at random to k groups, and take

their centroids as initial centers• Pick first center at random, take next center as

far away from first, take next as far away from first two …

Page 12: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

K Means ++ • Let 𝑑𝑑 𝑥𝑥,𝐶𝐶 : = 𝑚𝑚𝑚𝑚𝑚𝑚𝑐𝑐∈𝐶𝐶 𝑑𝑑(𝑥𝑥, 𝑐𝑐)• Start with 𝐶𝐶1containing a single point picked from input uniformly at

random• For 𝑘𝑘 ≥ 2, let

– 𝐶𝐶𝑘𝑘 = 𝐶𝐶𝑘𝑘−1 ∪ {𝑐𝑐∗}

– D. Arthur, S. Vassilvitskii (2007): 𝑂𝑂(log𝑚𝑚) approximation to the optimal.– Disadvantage: needs k passes through input, running time 𝑂𝑂(𝑚𝑚𝑘𝑘𝑑𝑑)

Page 13: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

𝐴𝐴𝐴𝐴𝐴𝐴 −𝑀𝑀𝐶𝐶2 (NIPS 2016)MCMC approach to sampling from the target distribution.

Page 14: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised
Page 15: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Non-parametric clustering: k?

• Intra-cluster variance:

• 𝑊𝑊 = ∑𝑘𝑘𝑊𝑊𝑘𝑘

• Heuristic: Choose k to minimize W• Elbow Heuristic• Gap-Statistic: Choose minimum k

Page 16: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Non-parametric clustering via convex relaxations: SON

Goodness of fit regularization

Page 17: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Quiz

• If we ignore the second (regularization) term, what is the optimal solution to the problem?

Page 18: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

SON: Sparsity inducing norm

• The regularization term is a group norm penalty: it will force 𝜇𝜇𝑖𝑖= 𝜇𝜇𝑗𝑗many centroid pairs (𝜇𝜇𝑖𝑖,𝜇𝜇𝑗𝑗).

• Thus for appropriate 𝜆𝜆 the right number of clusters will be identified automatically tailored to the data without user intervention.

Page 19: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

SON versus k-means

• No need to specify k• Can be used incrementally: as data comes in,

the number of clusters adjusts automatically.• Convex problem, hence unique optimum and

no problems of initialization etc.• Solved efficiently for large data sets by

stochastic proximal gradient descent.

Page 20: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

SON properties

• If the input data has a clear cluster structure: – a mixture of well separated Gaussians – a stochastic block model

Then the SON optimization problem is guaranteed to recover the clusters perfectly.

Page 21: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

ICE Clustering of Context Vectors

• For each word occurrence w, form the weighted centroid of its context vectors: 𝑐𝑐𝑤𝑤 = ∑𝑤𝑤′∈𝑁𝑁(𝑤𝑤)𝛼𝛼𝑤𝑤,𝑤𝑤𝑤 𝑣𝑣𝑤𝑤𝑤

• 𝛼𝛼𝑤𝑤.𝑤𝑤′ = 𝜎𝜎(𝑢𝑢𝑤𝑤𝑣𝑣𝑤𝑤𝑤)• Also use a triangular context window to give

higher weight to words closer to target.• Now apply k-means to the centroid vectors 𝑐𝑐𝑤𝑤

Page 22: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Word sense inductionM. Kageback, F. Johansson et al, “Neural context embeddings for automatic discovery of word senses”, (NAACL 2015 workshop on Vector Space Modeling for NLP)

Page 23: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Semantic OntologiesWordNet SALDO

Page 24: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Fitting word vectors to ontologies

• Split each word vector into vectors corresponding to its different senses

• Assign the vector for a particular sense to the corresponding node of the semantic network.

Page 25: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Two Objectives

• For each word vector 𝑢𝑢𝑤𝑤, compute vectors 𝑢𝑢𝑤𝑤1,….,𝑢𝑢𝑤𝑤𝑤𝑤 corresonding to its r different senses in the network

• Minimize reconstruction error:

• Maximize fit to network: CBOW model

Page 26: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Overall Optimization Problem

min

-

- What kind of optimization problem is this?- What method would you use to solve it?

http://demo.spraakdata.gu.se/richard/scouse/

Page 27: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WORD SENSE DISAMBIGUATION

Page 28: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WSD: Use context

• Given an occurrence of a word in a text, to disambiguate which sense is being used …

• … use the surrounding context.• Use sense vectors and context vectors from

word2vec!

Page 29: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

CBOW approach

Page 30: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Using Order of Words: RNNs

• Can we use the order/sequence of words in the context?

• RNNs!

Page 31: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Use long range dependence

Page 32: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

LSTMs!

Page 33: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

GRUs

Page 34: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

WSD: A BLSTM Model

……

𝑥𝑥|𝐷𝐷|𝑥𝑥0

𝑃𝑃(𝑆𝑆𝑤𝑤0|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛−1|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛+1|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤|𝑉𝑉||𝐷𝐷)…

𝑥𝑥𝑛𝑛+2𝑥𝑥𝑛𝑛+1𝑥𝑥𝑛𝑛−1𝑥𝑥𝑛𝑛−2

𝑆𝑆𝑤𝑤𝑛𝑛= Senses of word type 𝑤𝑤𝑛𝑛𝑥𝑥𝑛𝑛= Word token n in document 𝐷𝐷

Kågebäck and Salomonsson, CogAlex, Coling 2016

https://bitbucket.org/salomons/wsd

Page 35: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Shared parameters

……

𝑥𝑥|𝐷𝐷|𝑥𝑥0

𝑃𝑃(𝑆𝑆𝑤𝑤0|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛−1|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤𝑛𝑛+1|𝐷𝐷) 𝑃𝑃(𝑆𝑆𝑤𝑤|𝑉𝑉||𝐷𝐷)…

𝑥𝑥𝑛𝑛+2𝑥𝑥𝑛𝑛+1𝑥𝑥𝑛𝑛−1𝑥𝑥𝑛𝑛−2

𝑆𝑆𝑤𝑤𝑛𝑛= Senses of word type 𝑤𝑤𝑛𝑛𝑥𝑥𝑛𝑛= Word token n in document 𝐷𝐷

BLSTM and reflection layer

Efficient use of labeled data

Scale to full vocabulary

No explicit window

Page 36: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Example - rock

……

𝑃𝑃(𝑆𝑆𝑤𝑤𝑟𝑟𝑐𝑐𝑘𝑘|𝐷𝐷)

𝑆𝑆𝑤𝑤𝑟𝑟𝑐𝑐𝑘𝑘= Senses of word type rock… I love rock n' roll …

I love n' 𝑥𝑥|𝐷𝐷|𝑥𝑥0 roll

No part-of-speech

𝑥𝑥𝑛𝑛=rock

No knowledge graphs

No parsers

End-to-end From Text to Sense

Easy to apply and generalize

No hand crafted features

Page 37: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Using Sense Vectors

• Sense vectors useful in many downstream NLP tasks, also in machine translation.

• Sense vectors also useful in document summarization: first disambiguate the sense of each word occurrence, then compose sense vectos to form vector for sentences.

Page 38: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised
Page 39: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Adaptive Natural Language Processing

• Adaptive to collection and to human user

AdaptiveMachine Learning

Use annotations as features

TextData

Find regularities

Annotate regularities in data

WWW

enhance backgroundby focused crawling

StructureDiscovery

Algorithms

StructureDiscovery

Algorithms

StructureDiscovery

Algorithms

adaptive annotation

rich representation

jaguar

ISAvehicle

carbrand

company

SIMmercedes

bmwcadillac

jeep

ISAanimalspecieswildlife

predator

SIMtiger

leopardlion

cougar

semantic model

Page 40: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

Takeaways: Lecture 3

• Word sense induction and disambiguation are fundamental tasks in NLP

• Clustering is a fundamental unsupervised ML technique

• Word, context and sense embeddings are useful tools in these tasks.

• RNN architectures such as LSTMS and GRUsare powerful tools for these tasks

Page 41: NLP Technologies for Cognitive Computing Lecture …Takeaways: Lecture 3 • Word sense induction and disambiguation are fundamental tasks in NLP • Clustering is a fundamental unsupervised

References• M. Kågebäck et al, Neural context embeddings for automatic

discovery of word senses (NAACL 2015 workshop on Vector Space Modeling for NLP)

• T. Hocking et al, “Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties”, ICML 2011.

• R. Johansson and L. Pena Neto, Embedding a Semantic Network in a Word Space, NAACL 2015.

• R. Johansson and L. Pena Neto, Embedding Senses for Efficient Graph Based Word Sense Disambiguation, TextGraphs@NAACL2016

• M. Kågebäck and H. Salomonsson,Word Sense Disambiguation using a Bidirectional LSTM (Coling 2016 Workshop on Cognitive Aspects of the Lexicon (CogALex-V))