Learning Effective and Interpretable Semantic Models using
Non-Negative Sparse Embedding (NNSE)
Brian Murphy, Partha Talukdar, Tom Mitchell
Machine Learning Department
Carnegie Mellon University
1
Distributional Semantic Modeling
• Words are represented in a high-dimensional vector space
• Long history: (Deerwester et al., 1990), (Lin, 1998), (Turney, 2006), ...
• Although effective, these models are often not interpretable
Examples of top 5 words from 5 randomly chosen dimensions from SVD300
2
Why interpretable dimensions?
Semantic Decoding (Mitchell et al., Science 2008)
Input stimulus word: “motorbike”
→ Semantic representation: (0.87, ride), (0.29, see), (0.00, rub), (0.00, taste), ...
→ Mapping learned from fMRI data
→ Predicted fMRI activity for “motorbike”
3
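At its core, the decoding pipeline above is a learned linear map from a word's semantic feature vector to its voxel activation pattern. A minimal sketch of such a mapping, using ridge-regularized least squares in NumPy (`learn_decoding_map`, `F`, and `Y` are illustrative names, not the authors' code):

```python
import numpy as np

def learn_decoding_map(F, Y, ridge=1.0):
    """Fit W so that F @ W approximates Y, where F is a (words x semantic
    features) matrix and Y a (words x voxels) matrix of fMRI activations.
    Ridge-regularized least squares; a toy stand-in for the actual training."""
    k = F.shape[1]
    W = np.linalg.solve(F.T @ F + ridge * np.eye(k), F.T @ Y)
    return W  # (features x voxels)

# Decoding a new stimulus word: look up its semantic vector f_new
# (e.g., weights for features like "ride", "see") and predict the
# activation pattern as f_new @ W.
```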
Why interpretable dimensions? (Mitchell et al., Science 2008)
• Interpretable dimensions reveal insightful brain activation patterns!
• But features in the semantic representation were based on 25 hand-selected verbs
‣ can’t represent arbitrary concepts
‣ need data-driven, broad-coverage semantic representations
4
What is an Interpretable, Cognitively-plausible Representation?
[words × features matrix; the row for word w: ... 0 0 0 ... 1.2 ...]
1. Compact representation: Sparse, many zeros
2. Uneconomical to store negative (or inferable) characteristics: Non-Negative
3. Meaningful dimensions: Coherent
5
Properties of Different Semantic Representations
[table comparing Hand Tuned, Corpus-derived (existing), and NNSE (our proposal) on: Broad Coverage, Effective, Sparse, Non-Negative, Coherent]
Prediction accuracy (on Neurosemantic Decoding) (Murphy, Talukdar, Mitchell, StarSem 2012): Hand Coded 83.1, Corpus-derived 83.5
6
Non-Negative Sparse Embedding (NNSE)
X ≈ A × D
• X: input matrix (corpus co-occurrence + SVD); row wi is the input representation for word wi
• A: NNSE representations (row for word wi); D: basis
• matrix A is non-negative
• sparsity penalty on the rows of A
• alternating minimization between A and D, using the SPAMS package
7
NNSE Optimization
X ≈ A × D
8
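The optimization is, roughly, min over A and D of ||X − AD||² + λ||A||₁, subject to A ≥ 0 and bounded-norm rows of D, solved by alternating between the two factors (the authors use the SPAMS package). A toy NumPy sketch of that alternation, not the actual solver: a projected-gradient/soft-threshold (ISTA-style) step on A, then exact least squares on D:

```python
import numpy as np

def nnse_sketch(X, k=10, lam=0.01, iters=300, seed=0):
    """Toy alternating minimization for X ~ A @ D with A >= 0 and an L1
    sparsity penalty on A (illustrative; the paper uses the SPAMS solver)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = rng.random((n, k))
    D = rng.standard_normal((k, d))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(iters):
        # A-step: one ISTA step (gradient, soft-threshold, clip at zero)
        G = (A @ D - X) @ D.T
        L = np.linalg.norm(D @ D.T, 2)       # Lipschitz constant of the gradient
        A = np.maximum(A - (G + lam) / L, 0.0)
        # D-step: exact least squares given A, then renormalize rows of D
        D = np.linalg.lstsq(A, X, rcond=None)[0]
        norms = np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
        D /= norms
        A *= norms.ravel()                   # rescale A so A @ D is unchanged
    return A, D
```

The non-negativity and the L1 term are what produce rows of A with many exact zeros, i.e., the sparse, interpretable word codes.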
Experiments
• Three main questions
1. Are NNSE representations effective in practice?
2. What is the degree of sparsity of NNSE?
3. Are NNSE dimensions coherent?
• Setup
• partial ClueWeb09: 16bn tokens, 540m sentences, 50m documents
• dependency parsed using the Malt parser
9
Baseline Representation: SVD
• For about 35k words (~adult vocabulary), extract
• document co-occurrence
• dependency features from the parsed corpus
• Reduce dimensionality using SVD; subsets of this reduced-dimensional space form the baseline
• This is also the input (X) to NNSE
• Other representations were also compared (e.g., LDA, Collobert and Weston); details in the paper
10
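The reduction step can be sketched as a truncated SVD of the word-by-feature matrix (a toy NumPy version; `svd_embed` is an illustrative name, and the real pipeline works on large sparse matrices of ~35k words):

```python
import numpy as np

def svd_embed(C, k):
    """Reduce a (words x features) co-occurrence matrix C to k dimensions.
    Rows of U[:, :k] * S[:k] are the reduced word representations -- the
    baseline here, and also the input X handed to NNSE."""
    U, S, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k] * S[:k]
```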
Is NNSE effective in Neurosemantic Decoding?
NNSE has similar peak performance to SVD
11
Does NNSE result in sparse semantic representations?
• NNSE is significantly sparser than SVD
• Words per dimension is significantly lower in NNSE
• Growth in active dimensions per word is sub-linear in NNSE
12
Are NNSE Dimensions Coherent?
• Word Intrusion Task (Boyd-Graber et al., NIPS 2009)
• for each dimension, select the top-ranked N (N=5) words
• intrude it with a low-ranking word from this dimension
• the intruding word should be high-ranking in another dimension
• ask a human to identify the intruding word
• repeat multiple times for each dimension, calculate precision
• An intruded set from an NNSE1000 dimension [example shown, with the intruder marked]
13
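Constructing a single intrusion item can be sketched as follows (a toy NumPy version; `make_intrusion_item` and the half-way cutoff for "low ranking" are illustrative choices, not the paper's exact protocol):

```python
import numpy as np

def make_intrusion_item(A, dim, other_dim, n_top=5, seed=0):
    """Build one word-intrusion item for dimension `dim`: its top-n words
    plus an 'intruder' that ranks low on `dim` but high on `other_dim`."""
    rng = np.random.default_rng(seed)
    order = np.argsort(-A[:, dim])            # words ranked for this dimension
    top = [int(w) for w in order[:n_top]]
    low = set(int(w) for w in order[len(order) // 2:])   # "low ranking" here
    # intruder: the highest-ranked word of `other_dim` that ranks low on `dim`
    intruder = next(int(w) for w in np.argsort(-A[:, other_dim]) if int(w) in low)
    items = top + [intruder]
    rng.shuffle(items)                        # annotators see a shuffled set
    return items, intruder
```

Precision for a dimension is then the fraction of such items on which human judges pick the actual intruder.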
Are NNSE Dimensions Coherent?
NNSE dimensions are significantly more coherent than SVD-based dimensions
14
SVD and NNSE Dimensions: Examples
Examples of top 5 words from 5 randomly chosen dimensions from SVD300 and NNSE1000
NNSE dimensions are significantly more coherent than SVD-based dimensions
15
Top 5 NNSE1000 Dimensions for “apple”
0.40  raspberry, peach, pear, mango, melon  →  Fruit
0.26  ripper, aac, converter, vcd, rm  →  Music
0.14  cpu, intel, mips, pentium, risc  →  Processor
0.13  motorola, lg, samsung, vodafone, alcatel  →  Company
0.11  peaches, apricots, pears, cherries, blueberries  →  Fruit
Different senses of the word are not mixed: each dimension corresponds to only one sense of “apple”!
16
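With the matrix A in hand, a word's dominant dimensions can be read off directly from its row, and each dimension labeled by its top-loading words (a sketch; `top_dimensions` is an illustrative helper, not from the paper):

```python
import numpy as np

def top_dimensions(A, word_idx, vocab, n_dims=5, n_words=5):
    """For a word's row of the (words x dimensions) NNSE matrix A, list its
    strongest dimensions, each described by that dimension's top words."""
    dims = np.argsort(-A[word_idx])[:n_dims]
    return [(float(A[word_idx, d]),
             [vocab[i] for i in np.argsort(-A[:, d])[:n_words]])
            for d in dims]
```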
Conclusion
• Non-Negative Sparse Embedding (NNSE)
• broad coverage, sparse, non-negative
• interpretable, effective in practice
• probably the first semantic model with all these desirable traits
• Exploited a large text corpus, including deep linguistic features (e.g., dependency parses)
• Future work
• multi-word extension; using NNSE representations in non-neurosemantic domains (e.g., NELL)
17
Acknowledgment
• Khaled El-Arini (CMU) and Min Xu (CMU)
• Sheshadri Sridharan (CMU), Marco Baroni (Trento)
• Justin Betteridge (CMU), Jamie Callan (CMU)
• DARPA, Google, NIH
18
Thank You!
NNSE embeddings available at: http://www.cs.cmu.edu/~bmurphy/NNSE/
19