
UBC-ALM: Combining k-NN with SVD for WSD

Michael Brutz

CU-Boulder Department of Applied Mathematics

21 March 2013


UBC-ALM: Combining k-NN and SVD for WSD

This 2007 paper by Agirre and Lopez de Lacalle looks at representing the senses of a word with feature vectors and combining two mathematical methods to do WSD using those vectors.


Outline

1. Feature Vectors
2. SVD
3. k-Nearest Neighbors
4. Paper's Results

Feature Vectors

The idea is to represent a word with a vector (a list of numbers).

One of the simplest ways to do this is what is known as the bag-of-words approach.


If you have the sentence:

"The cat ran."

Then the bag-of-words representation for the words in this sentence is to have a vector where the only non-zero entries are:

the = 1, cat = 1, ran = 1

Basically, just having a value of 1 for every position in the vector corresponding to a word that appears in the sentence.
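Here's a quick sketch of that in Python (my own toy example, with an assumed five-word vocabulary; the paper's actual features are richer than this):

```python
# Minimal bag-of-words sketch: put a 1 in the slot of each vocabulary
# word that appears in the sentence, and a 0 everywhere else.
VOCAB = ["the", "cat", "ran", "dog", "sat"]  # assumed toy vocabulary

def bag_of_words(sentence, vocab=VOCAB):
    tokens = sentence.lower().strip(".").split()
    return [1 if word in tokens else 0 for word in vocab]

print(bag_of_words("The cat ran."))  # [1, 1, 1, 0, 0]
```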

To make this into a vector, all we do is associate each entry in the vector with a word.

For the "The cat ran." example, this vector would look like

$$
A_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
$$

Again, where the 1's are in the slots for "the", "cat", and "ran", and all the others are 0.

When we collect a set of such vectors into a single object like so

$$
\begin{pmatrix} | & | \\ A_1 & A_2 \\ | & | \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 0 & 1 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix}
$$

we have what is called a matrix.

I'm only using two vectors for illustration; you can include any number and still have a matrix.

$$
\begin{pmatrix} | & | & & | \\ A_1 & A_2 & \cdots & A_n \\ | & | & & | \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 1 \end{pmatrix}
$$

This leads us into SVD.

SVD

SVD stands for Singular Value Decomposition.

In mathy terms, it's where you take a matrix, A, and break it up into the product of three special matrices.

SVD of A:

$$
A = U \Sigma V^T
$$

U contains the left singular vectors of the matrix A, Σ contains its singular values, and V contains its right singular vectors.
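In code, this is a one-liner with NumPy's numpy.linalg.svd (toy matrix, my own example):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Economy-size SVD: U holds the left singular vectors, s the singular
# values, and Vt is V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True
```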


This is what it looks like for a 3 by 3 matrix:

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
=
\begin{pmatrix} | & | & | \\ u_1 & u_2 & u_3 \\ | & | & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{pmatrix}
\begin{pmatrix} - & v_1 & - \\ - & v_2 & - \\ - & v_3 & - \end{pmatrix}
$$

The terms in the middle matrix, the σ's, are what are called the "singular values".

When we delete the singular values farthest along the diagonal, we still have a good approximation to our original matrix A:

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\approx
\begin{pmatrix} | & | & | \\ u_1 & u_2 & u_3 \\ | & | & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} - & v_1 & - \\ - & v_2 & - \\ - & v_3 & - \end{pmatrix}
$$

This implies we can approximate the three column vectors of the matrix A using only the first two u vectors.

You have to know how matrix multiplication works to see this, but the idea of using SVD to represent more vectors with fewer is the important thing.
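Here's what that truncation looks like in NumPy (my own sketch; we keep only the k largest singular values):

```python
import numpy as np

def rank_k_approximation(A, k):
    """Rebuild A from only its k largest singular values/vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(3, 3)
A2 = rank_k_approximation(A, 2)

# The Frobenius error of the rank-2 approximation is exactly the
# dropped singular value sigma_3, so the two numbers below match.
print(np.linalg.norm(A - A2), np.linalg.svd(A, compute_uv=False)[-1])
```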


So what do these approximations do?

Let's ask Big Bird!

Grayscale images are represented with numbers that determine how bright to make each pixel, meaning this 104 by 138 pixel image is actually represented by a matrix.

[Image: 104 by 138 grayscale photo of Big Bird]

[Image: reconstruction using 1 singular value]

[Image: reconstruction using 5 singular values]

[Image: reconstruction using 40 singular values]

[Image: reconstruction using all 104 singular values]

As you can see, we can get a pretty good approximation to what all the vectors comprising the image are doing using only a fraction of the singular vectors.
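The whole demo is a few lines of NumPy plus Pillow (my own sketch; the filename bigbird.png is hypothetical, and any grayscale image works):

```python
import numpy as np
from PIL import Image

# Load the image as a matrix of brightness values (hypothetical filename).
img = np.asarray(Image.open("bigbird.png").convert("L"), dtype=float)
U, s, Vt = np.linalg.svd(img, full_matrices=False)

for k in (1, 5, 40):
    # Rebuild the image from only the first k singular values/vectors.
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    out = np.clip(approx, 0, 255).astype(np.uint8)
    Image.fromarray(out).save(f"bigbird_rank{k}.png")
```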

Moreover, with regard to WSD it isn't that we want this:

[Image: reconstruction using 40 singular values]

We actually want this:

[Image: reconstruction using 5 singular values]

The reason is that while using fewer singular values blurs the image, this is tantamount to adding in unseen word counts with respect to our vector description of words.

It's a double whammy! Not only do we have fewer vectors to deal with, but we also get guesstimates for word counts we haven't seen in collecting data but suspect are relevant.

So instead of using simple bag-of-words vectors, we use these singular vectors to describe the senses of a word, as they help with data sparsity issues.
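As a rough sketch of how that might look in code (my own illustration, not the paper's exact pipeline): take the SVD of a context-by-word count matrix and keep the top-k coordinates as dense feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical context-by-word count matrix:
# 200 contexts, 1000 vocabulary words.
M = rng.integers(0, 2, size=(200, 1000)).astype(float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 50
# Each context's sparse 1000-dim bag-of-words row is replaced by
# k dense coordinates in the reduced singular-vector space.
dense_features = U[:, :k] * s[:k]
print(dense_features.shape)  # (200, 50)
```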

k-Nearest Neighbors

The last piece of the puzzle is to use k-NN (k-Nearest Neighbors) to disambiguate words.

Before understanding how it works, one needs to realize that vectors have a geometric interpretation as points in space.

For example, take the vector

$$
\begin{pmatrix} 0.4 \\ 0.6 \end{pmatrix}
$$

This can be represented graphically as a point in the plane:


[Image: the point (0.4, 0.6) plotted in the plane]

For the sake of illustration, say that this "X" represents the word "bass" and we are trying to find the sense for it.

Now add in the points that have been tagged in training our WSD classifier.

Let "∝" represent the "fish" sense and the eighth-notes represent the music sense for feature vectors from a hand-tagged corpus.

The way k-NN works is to assign to the target word the label shared by most of its k nearest hand-tagged neighbors.
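A minimal sketch of the whole procedure (made-up 2-D points and sense labels, arranged so the answers match the next two slides):

```python
import numpy as np
from collections import Counter

# Hand-tagged training points (hypothetical), labeled with their sense.
train_points = np.array([[0.35, 0.55], [0.45, 0.65],                 # fish
                         [0.50, 0.50], [0.60, 0.70], [0.30, 0.75]])  # music
train_labels = ["fish", "fish", "music", "music", "music"]

def knn_label(x, k):
    """Majority sense among the k nearest hand-tagged neighbors of x."""
    dists = np.linalg.norm(train_points - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

target = np.array([0.4, 0.6])  # the untagged "bass" from the slides
print(knn_label(target, k=3))  # fish
print(knn_label(target, k=5))  # music
```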

For example, if k = 3, then in this example we would say that the target word has the fish sense.

If k = 5, then the music sense.

Paper's Results

In the paper, all of these approaches had some extra bells and whistles thrown on them (e.g., the feature vectors weren't just bag-of-words, and they used a weighted k-NN), but nothing really worth getting into.

If you've understood what I've said so far, then you understand what they were fundamentally doing in the paper.

They compared their WSD classifier to the others entered in SemEval-2007 on the Lexical Sample and All-Words tasks, and overall it did pretty well:

[Table: SemEval-2007 results comparison]

Questions?

Thank you for your time.