Linguistic Regularities in Sparse and Explicit
Word Representations
Omer Levy Yoav Goldberg
Bar-Ilan University
Israel
Neural Embeddings
• Dense vectors
• Each dimension is a latent feature
• Common software package: word2vec
$\mathit{Italy}: (-7.35, 9.42, 0.88, \ldots) \in \mathbb{R}^{100}$
• “Magic”
king − man + woman = queen
(analogies)
Explicit Representations (Distributional)
• Sparse vectors
• Each dimension is an explicit context
• Common association metric: PMI, PPMI
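$\mathit{Italy}: \{\mathit{Rome}: 17,\ \mathit{pasta}: 5,\ \mathit{Fiat}: 2,\ \ldots\} \in \mathbb{R}^{|\mathit{Vocab}|}$, with $|\mathit{Vocab}| \approx 100{,}000$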
• Does the same “magic” work for explicit representations too?
• Baroni et al. (2014) showed that embeddings outperform explicit, but…
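As a rough illustration of the PPMI association metric named on this slide, here is a minimal sketch that turns a word-by-context count table into positive PMI weights. The function name `ppmi`, the list-of-rows layout, and the toy counts are assumptions made for this example; they are not taken from the talk.

```python
import math

def ppmi(counts):
    """Positive PMI for a word-by-context co-occurrence table (list of rows)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]        # word marginals
    col_sums = [sum(col) for col in zip(*counts)]  # context marginals
    table = []
    for i, row in enumerate(counts):
        out = []
        for j, n in enumerate(row):
            if n == 0:
                out.append(0.0)  # unseen pair: PMI is -inf, clipped to 0
            else:
                # PMI = log( p(w,c) / (p(w) p(c)) ) = log( n * total / (row * col) )
                pmi = math.log(n * total / (row_sums[i] * col_sums[j]))
                out.append(max(pmi, 0.0))  # keep only positive associations
        table.append(out)
    return table

# Toy counts, invented for illustration: rows are words, columns are contexts.
toy_counts = [[17, 5, 2],
              [1, 0, 9]]
toy_ppmi = ppmi(toy_counts)
```

Each row of the result is one explicit word vector: one dimension per context, with zeros wherever the association is absent or negative.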
Questions
• Are analogies unique to neural embeddings?
Compare neural embeddings with explicit representations
• Why does vector arithmetic reveal analogies?
Unravel the mystery behind neural embeddings and their “magic”
Mikolov et al. (2013a,b,c)
• Neural embeddings have interesting geometries
• These patterns capture “relational similarities”
• Can be used to solve analogies:
man is to woman as king is to queen
$a$ is to $a^*$ as $b$ is to $b^*$
• Can be recovered by “simple” vector arithmetic:
$a - a^* = b - b^*$
Are analogies unique to neural embeddings?
• Experiment: compare embeddings to explicit representations
• Learn different representations from the same corpus:
• Evaluate with the same recovery method:
$\arg\max_{b^* \in V} \cos(b^*,\ b - a + a^*)$
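The recovery method above can be sketched in a few lines. This is a minimal illustration, assuming every word is stored as a dense list of floats of the same length; the helper names `cosine` and `recover_additive` and the toy vectors are hypothetical, and the three question words are excluded from the candidates, as described for the analogy datasets.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def recover_additive(vectors, a, a_star, b):
    """Return the word closest (by cosine) to b - a + a_star."""
    target = [vb - va + vs
              for vb, va, vs in zip(vectors[b], vectors[a], vectors[a_star])]
    # Skip the question words themselves.
    candidates = (w for w in vectors if w not in (a, a_star, b))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy vectors, invented for illustration only.
toy = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king":  [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "apple": [0.9, 0.1, 0.0],
}
answer = recover_additive(toy, "man", "woman", "king")
```

The same function applies to either kind of representation, which is what makes the comparison fair: only the vectors change, not the recovery method.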
Analogy Datasets
• 4 words per analogy: $a$ is to $a^*$ as $b$ is to $b^*$
• Given 3 words: $a$ is to $a^*$ as $b$ is to ?
• Guess the best-fitting $b^*$ from the entire vocabulary $V$
  • Excluding the question words $a$, $a^*$, $b$
• MSR: ~8000 syntactic analogies
• Google: ~19,000 syntactic and semantic analogies
Embedding vs Explicit (Round 1)
[Bar chart: accuracy on each dataset]
            MSR    Google
Embedding   54%    63%
Explicit    29%    45%
Many analogies recovered by explicit, but many more by embedding.
Why does vector arithmetic reveal analogies?
• We wish to find the closest $b^*$ to $b - a + a^*$
• This is done with cosine similarity (for unit-length vectors):
$\arg\max_{b^* \in V} \cos(b^*,\ b - a + a^*) = \arg\max_{b^* \in V} \left( \cos(b^*, b) - \cos(b^*, a) + \cos(b^*, a^*) \right)$
Problem: one similarity might dominate the rest.
vector arithmetic = similarity arithmetic
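The step from vector arithmetic to similarity arithmetic can be written out as a short derivation. It is a sketch under one stated assumption: all word vectors are normalized to unit length, so a dot product between two of them equals their cosine.

```latex
\begin{align*}
\cos(b^*,\ b - a + a^*)
  &= \frac{b^* \cdot (b - a + a^*)}{\lVert b^* \rVert \, \lVert b - a + a^* \rVert} \\
  &= \frac{b^* \cdot b \;-\; b^* \cdot a \;+\; b^* \cdot a^*}{\lVert b - a + a^* \rVert}
     && \text{since } \lVert b^* \rVert = 1 \\
  &= \frac{\cos(b^*, b) - \cos(b^*, a) + \cos(b^*, a^*)}{\lVert b - a + a^* \rVert}
     && \text{since } \lVert a \rVert = \lVert a^* \rVert = \lVert b \rVert = 1
\end{align*}
```

The denominator is positive and does not depend on $b^*$, so both expressions are maximized by the same candidate.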
Why does vector arithmetic reveal analogies?
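• We wish to find the closest $x$ to $\mathit{king} - \mathit{man} + \mathit{woman}$
• This is done with cosine similarity:
$\arg\max_{x} \cos(x,\ \mathit{king} - \mathit{man} + \mathit{woman}) = \arg\max_{x} \left( \cos(x, \mathit{king}) - \cos(x, \mathit{man}) + \cos(x, \mathit{woman}) \right)$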
vector arithmetic = similarity arithmetic
• $\cos(x, \mathit{king})$: royal?   $\cos(x, \mathit{woman})$: female?
What does each similarity term mean?
• Observe the joint features with explicit representations!
queen ∩ king     queen ∩ woman
uncrowned        Elizabeth
majesty          Katherine
second           impregnate
…                …
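One way to read off such joint features is to intersect the non-zero contexts of two explicit vectors. The sketch below assumes each word is a dictionary from context to weight and ranks shared contexts by the product of the two weights; that ranking, the name `shared_contexts`, and all the weights are assumptions made for illustration, and only the context words echo the table above.

```python
def shared_contexts(u, v, top=3):
    """Contexts present in both sparse vectors, ranked by the product of weights."""
    common = set(u) & set(v)
    return sorted(common, key=lambda c: u[c] * v[c], reverse=True)[:top]

# Toy sparse vectors; the weights are invented for illustration.
queen = {"uncrowned": 2.0, "majesty": 1.5, "Elizabeth": 3.0, "Katherine": 1.0}
king = {"uncrowned": 2.5, "majesty": 2.0, "throne": 1.0}
woman = {"Elizabeth": 1.0, "Katherine": 2.0, "dress": 1.5}

royal_aspect = shared_contexts(queen, king)
female_aspect = shared_contexts(queen, woman)
```

Because every dimension is a readable context, each similarity term can be inspected directly, which is not possible with latent dense dimensions.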
The Additive Objective
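$\cos(\mathit{Iraq}, \mathit{England}) - \cos(\mathit{Iraq}, \mathit{London}) + \cos(\mathit{Iraq}, \mathit{Baghdad})$
$0.15 - 0.13 + 0.63 = 0.65$
$\cos(\mathit{Mosul}, \mathit{England}) - \cos(\mathit{Mosul}, \mathit{London}) + \cos(\mathit{Mosul}, \mathit{Baghdad})$
$0.13 - 0.14 + 0.75 = 0.74$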
• Problem: one similarity might dominate the rest
• Much more prevalent in explicit representations
• Might explain why the explicit representation underperformed
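The arithmetic on this slide can be replayed directly. This is a minimal sketch using the rounded cosine values shown above; the dictionary layout and the helper name `additive_score` are illustrative only.

```python
# Rounded cosines copied from the slide, in the order
# (England, London, Baghdad) for each candidate answer.
sims = {
    "Iraq":  (0.15, 0.13, 0.63),
    "Mosul": (0.13, 0.14, 0.75),
}

def additive_score(plus_1, minus, plus_2):
    """Additive objective: add the two wanted similarities, subtract the unwanted one."""
    return plus_1 - minus + plus_2

add_scores = {word: additive_score(*s) for word, s in sims.items()}
add_best = max(add_scores, key=add_scores.get)
```

Mosul comes out ahead (0.74 against 0.65) even though Iraq has the higher England term and the lower London term: the Baghdad similarity is several times larger than the other two, so it decides the sum.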
How can we do better?
• Instead of adding similarities, multiply them!
$\arg\max_{b^* \in V} \dfrac{\cos(b^*, b) \cdot \cos(b^*, a^*)}{\cos(b^*, a)}$
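To see what the change of objective does, the ratio can be applied to the rounded Iraq and Mosul cosines from the Additive Objective slide. This is only an illustration: the values are plugged in as they appear on that slide, with no rescaling and no smoothing term in the denominator, and the sketch assumes all three similarities are positive. The name `multiplicative_score` is hypothetical.

```python
# Rounded cosines from the Additive Objective slide, in the order
# (England, London, Baghdad) for each candidate answer.
sims = {
    "Iraq":  (0.15, 0.13, 0.63),
    "Mosul": (0.13, 0.14, 0.75),
}

def multiplicative_score(plus_1, minus, plus_2):
    """Multiply the two wanted similarities and divide by the unwanted one.

    Assumes minus > 0; real similarities may need shifting or a small
    constant in the denominator, which this sketch leaves out.
    """
    return plus_1 * plus_2 / minus

mul_scores = {word: multiplicative_score(*s) for word, s in sims.items()}
mul_best = max(mul_scores, key=mul_scores.get)
```

With these particular rounded numbers the ratio ranks Iraq (about 0.73) above Mosul (about 0.70), reversing the additive ranking: a product rewards candidates that do reasonably well on every term rather than very well on one.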
Multiplication > Addition
[Bar chart: accuracy by representation, dataset, and objective]
                    Add    Mul
Embedding, MSR      54%    59%
Embedding, Google   63%    67%
Explicit, MSR       29%    57%
Explicit, Google    45%    68%
Explicit is on par with Embedding
[Bar chart: accuracy on each dataset, multiplicative objective]
            MSR    Google
Embedding   59%    67%
Explicit    57%    68%
Explicit is on par with Embedding
• Embeddings are not “magical”
• Embedding-based similarities have a more uniform distribution
• The additive objective performs better on smoother distributions
• The multiplicative objective overcomes this issue
Conclusion
• Are analogies unique to neural embeddings?
No! They occur in sparse and explicit representations as well.
• Why does vector arithmetic reveal analogies?
Because vector arithmetic is equivalent to similarity arithmetic.
• Can we do better?
Yes! The multiplicative objective is significantly better.
More Results and Analyses (in the paper)
• Evaluation on closed-vocabulary analogy questions (SemEval 2012)
• Experiments with a third objective function (PairDirection)
• Do different representations reveal the same analogies?
• Error analysis
• A feature-level interpretation of how word similarity reveals analogies