+ Improving Vector Space Word Representations Using Multilingual Correlation
Manaal Faruqui and Chris Dyer
Language Technologies Institute, Carnegie Mellon University
Slide 2
+ Distributional Semantics
"You shall know a word by the company it keeps" (Harris, 1954; Firth, 1957)
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
flame is the visible portion of a fire
...take place whereby fires can sustain their own heat
Slide 3
+ Translational Semantics
What other information? (Bannard & Callison-Burch, 2005)
That plane can seat more than 300 people.
Russian airplanes are huge.
Multilingual information! plane / airplane
+ Word Vector Representations
How to encode such co-occurrences?
(rows: words; columns: context words)
            day   night   cold
  sleep       0      10      2
  winter      3       3     50
  the        10      12      9
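A minimal sketch of how such a word-by-context count table could be filled from raw text. It assumes whitespace tokenization and a symmetric window of ±2 tokens; the window size and the function name `cooccurrence_counts` are illustrative choices, not taken from the slides.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count how often each context word appears within `window` tokens of each word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts

sentences = [
    "the end battle would be between fire and ice",
    "my dragons are large and can breathe fire now",
]
counts = cooccurrence_counts(sentences)
print(counts["fire"]["breathe"])  # 1
```

Each `counts[word]` row is one row of the table above; stacking the rows over a fixed list of context words gives the matrix that LSA factorizes.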
Slide 6
+ Word Vector Representation
Latent Semantic Analysis (Deerwester et al., 1990)
Singular Value Decomposition of the words × context words matrix
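A sketch of the LSA step with NumPy, assuming raw counts are factorized directly; a real pipeline would usually reweight the counts (e.g. with PMI) first, which the slides do not specify. The name `lsa_vectors` and the toy counts are hypothetical.

```python
import numpy as np

def lsa_vectors(counts, dim):
    """Return dim-dimensional word vectors from a word x context count matrix via truncated SVD."""
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the top `dim` singular directions, each scaled by its singular value.
    return U[:, :dim] * S[:dim]

# Toy counts: rows are words, columns are context words.
counts = np.array([[0.0, 10.0, 2.0],
                   [3.0, 3.0, 50.0],
                   [10.0, 12.0, 9.0]])
vectors = lsa_vectors(counts, dim=2)
print(vectors.shape)  # (3, 2)
```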
Slide 7
+ Multilingual Information
English: dragon | German: Drache | French: dragon | Spanish: dragón
Problem? How to combine them: append (concatenate) the vectors?
Slide 8
+ Multilingual Information
Disadvantages of Vector Concatenation?
Vector size increases
Idiosyncratic information
What if a word is OOV?
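The drawbacks above can be made concrete with a small sketch of concatenation, assuming toy 80-dimensional vectors and a hypothetical bilingual dictionary `translation`: the result doubles in size, and a word with no foreign partner gets no vector at all.

```python
import numpy as np

def concatenate(word, english, foreign, translation):
    """Append the foreign vector of `word`'s translation to its English vector."""
    foreign_word = translation.get(word)
    if foreign_word is None or foreign_word not in foreign:
        return None  # OOV on the foreign side: no concatenated vector exists
    return np.concatenate([english[word], foreign[foreign_word]])

# Toy vectors and a hypothetical dictionary; "fire" has no entry.
english = {"dragon": np.ones(80), "fire": np.ones(80)}
german = {"Drache": np.zeros(80)}
translation = {"dragon": "Drache"}

print(concatenate("dragon", english, german, translation).shape)  # (160,)
print(concatenate("fire", english, german, translation))          # None
```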
Slide 9
+ Multilingual Information
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
So, what can we do?
...Das Ende der Schlacht würde zwischen Feuer und Eis... (...the end of the battle would be between fire and ice...)
...gesehen ist Feuer eine Oxidationsreaktion mit... (...seen, fire is an oxidation reaction with...)
...Das Licht des Feuers ist eine physikalische Erscheinung (...the light of the fire is a physical phenomenon)
Two Views: Canonical Correlation Analysis!
Slide 10
+ Canonical Correlation Analysis (CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated.
Optimization problem with an exact, closed-form solution!
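For reference, the standard textbook formulation of what CCA optimizes, in notation chosen here rather than taken from the slides: paired rows $a_i \in \mathbb{R}^{d_1}$ and $b_i \in \mathbb{R}^{d_2}$ ($i = 1, \ldots, n$) have empirical covariances $C_{aa}$, $C_{bb}$ and cross-covariance $C_{ab}$. The first pair of directions maximizes the correlation below, and all $k$ pairs follow from a single SVD, which is why the solution is exact.

```latex
\rho_1 = \max_{w,\,v} \frac{w^\top C_{ab}\, v}{\sqrt{w^\top C_{aa}\, w}\,\sqrt{v^\top C_{bb}\, v}},
\qquad w \in \mathbb{R}^{d_1},\; v \in \mathbb{R}^{d_2}

C_{aa}^{-1/2} C_{ab} C_{bb}^{-1/2} = P\, \mathrm{diag}(\rho_1, \rho_2, \ldots)\, Q^\top
\;\Rightarrow\;
W = C_{aa}^{-1/2} P_{:,1:k}, \quad V = C_{bb}^{-1/2} Q_{:,1:k}
```

Later pairs maximize the same ratio subject to being uncorrelated with the earlier ones, and $\rho_1 \ge \rho_2 \ge \cdots$ are the canonical correlations.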
Slide 11
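+ Canonical Correlation Analysis (CCA)
W, V = CCA( , ), with k = min(r( ), r( )): the projected dimension k is the smaller of the two input ranks
$W \in \mathbb{R}^{d_1 \times k}$, $V \in \mathbb{R}^{d_2 \times k}$
Projecting the $n_1 \times d_1$ and $n_2 \times d_2$ vector matrices gives $X$ ($n_1 \times k$) and $Y$ ($n_2 \times k$)
X and Y are now maximally correlated!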
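A minimal NumPy sketch of `W, V = CCA(…)`, assuming the classical closed-form solution (whiten each view, then take an SVD of the whitened cross-covariance), mean-centered inputs, and a small ridge term `reg` for numerical stability. The functions `cca` and `inv_sqrt` are hypothetical names; MATLAB users would reach for the built-in `canoncorr` instead. The toy data shares a 3-dimensional signal across the two views.

```python
import numpy as np

def inv_sqrt(C, eps=1e-8):
    """Inverse matrix square root of a symmetric positive (semi)definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def cca(A, B, k, reg=1e-6):
    """Return W (d1 x k), V (d2 x k) and the top-k canonical correlations.

    Row i of A (n x d1) and row i of B (n x d2) are two views of the same item.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    n = A.shape[0]
    Caa = A.T @ A / (n - 1) + reg * np.eye(A.shape[1])
    Cbb = B.T @ B / (n - 1) + reg * np.eye(B.shape[1])
    Cab = A.T @ B / (n - 1)
    Ka, Kb = inv_sqrt(Caa), inv_sqrt(Cbb)
    P, rho, Qt = np.linalg.svd(Ka @ Cab @ Kb)  # singular values = canonical correlations
    W = Ka @ P[:, :k]
    V = Kb @ Qt.T[:, :k]
    return W, V, rho[:k]

rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 3))  # latent signal common to both views
A = shared @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))
B = shared @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(500, 6))
k = min(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))
W, V, rho = cca(A, B, k)
print(k, np.round(rho, 2))
```

The first three correlations come out close to 1 (the shared signal) and the rest near 0, which is the "lets you choose" knob: keeping fewer than k columns of W and V discards the weakly correlated directions.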
Slide 12
+ Canonical Correlation Analysis (CCA)
Problems addressed?
Vector size increases → it doesn't increase (k ≤ min(d1, d2))
Idiosyncratic information → lets you choose how many dimensions k to keep
What if a word is OOV? → projection vectors for everyone!
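Why OOV words are covered: W maps any $d_1$-dimensional vector, so every word in the full English vocabulary can be projected, not only the words that had a foreign partner during CCA. In this sketch W is a random stand-in for a learned projection, and the centering on the training subset is an assumption carried over from the usual CCA setup.

```python
import numpy as np

def project_vocabulary(vectors, mean, W):
    """Project every word vector (n1 x d1) into the k-dim CCA space, aligned or not."""
    return (vectors - mean) @ W

rng = np.random.default_rng(1)
english = rng.normal(size=(1000, 80))                     # full vocabulary: n1 = 1000, d1 = 80
aligned_rows = rng.choice(1000, size=300, replace=False)  # words that had a foreign partner
W = rng.normal(size=(80, 48))                             # stand-in for a learned CCA projection
mean = english[aligned_rows].mean(axis=0)                 # centering from the CCA training subset
projected = project_vocabulary(english, mean, W)
print(projected.shape)  # (1000, 48)
```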
Slide 13
+ Canonical Correlation Analysis (CCA)
The vocabularies can't be of equal size!
OK, but then where do the equal-cardinality sets come from?
Get word alignments from a parallel corpus.
Preserve only words in the original vocabulary.
For every English word, select the best foreign word.
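The selection step can be sketched as follows, assuming "best" means the foreign word an English word is most frequently aligned to, and that alignments arrive as `i-j` links per sentence (English index i, foreign index j), the style of output word aligners such as fast_align produce. `best_translations` and the toy bitext are hypothetical.

```python
from collections import Counter, defaultdict

def best_translations(bitext, alignments, en_vocab, fr_vocab):
    """For each English word, pick the foreign word it is most often aligned to.

    bitext: list of (english_tokens, foreign_tokens) sentence pairs.
    alignments: per sentence, a string of "i-j" links (English index i, foreign index j).
    """
    counts = defaultdict(Counter)
    for (en, fr), links in zip(bitext, alignments):
        for link in links.split():
            i, j = map(int, link.split("-"))
            # Keep only words present in the original monolingual vocabularies.
            if en[i] in en_vocab and fr[j] in fr_vocab:
                counts[en[i]][fr[j]] += 1
    return {e: c.most_common(1)[0][0] for e, c in counts.items()}

bitext = [
    ("the dragon breathes fire".split(), "der Drache speit Feuer".split()),
    ("fire and ice".split(), "Feuer und Eis".split()),
]
alignments = ["0-0 1-1 2-2 3-3", "0-0 1-1 2-2"]
en_vocab = {"the", "dragon", "fire", "and", "ice"}
de_vocab = {"der", "Drache", "Feuer", "und", "Eis"}
pairs = best_translations(bitext, alignments, en_vocab, de_vocab)
print(pairs["fire"])  # Feuer
```

The resulting one-to-one word list gives the two equal-size matrices (one English row and one foreign row per pair) that CCA needs.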
Slide 14
+ Experimental Setup
LSA Word Vector Learning: Monolingual Data (News Corpus: WMT-2011, WMT 2011-12, WMT-2011)
            English       German        French        Spanish
  Tokens    360,000,000   290,000,000   263,000,000   164,000,000
  Types     180,000       294,000       137,000       145,000
Tokenizer and lowercasing: WMT scripts
Slide 15
+ Experimental Setup
LSA Word Vector Learning: Parallel Data (News Commentary + Europarl, WMT)
            De-En         Fr-En         Es-En
  Tokens    128,000,000   138,000,000   134,000,000
Word pairs: 37,000, 38,000
Word alignment tool: fast_align (Dyer et al., 2013)
Slide 16
+ Experimental Setup
LSA Word Vector Learning: Corpus Preprocessing
Context: ...hello hello hello hello hello...
23.45, 21st, 10-20-2014, 0.5e10 → NUM
anchfgugsjh, wekjfbg, bhguyq → UNK
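A sketch of the two substitutions, under assumed rules the slides do not spell out: a token that starts with a digit and otherwise contains only digits, separators, an ordinal suffix, or an exponent becomes NUM, and any token seen fewer than `min_count` times becomes UNK. The regex, `preprocess`, and `min_count` are hypothetical.

```python
import re
from collections import Counter

# Assumed number pattern: 23.45, 21st, 10-20-2014, 0.5e10, ...
NUMBER = re.compile(r"\d[\d.,/:-]*(?:st|nd|rd|th|e\d+)?")

def preprocess(tokens, min_count=2):
    """Map number-like tokens to NUM, then rare tokens to UNK."""
    tokens = ["NUM" if NUMBER.fullmatch(t) else t for t in tokens]
    freq = Counter(tokens)
    return [t if t == "NUM" or freq[t] >= min_count else "UNK" for t in tokens]

tokens = "hello hello 23.45 21st 10-20-2014 0.5e10 anchfgugsjh hello wekjfbg".split()
print(preprocess(tokens))
# ['hello', 'hello', 'NUM', 'NUM', 'NUM', 'NUM', 'UNK', 'hello', 'UNK']
```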
Slide 17
+ Experimental Setup
Evaluation Benchmarks
Word Similarity Evaluation
  WS-353 (Finkelstein et al., 2001)
  WS-353-SIM (Agirre et al., 2009)
  WS-353-REL (Agirre et al., 2009)
  RG-65 (Rubenstein and Goodenough, 1965)
  MC-30 (Miller and Charles, 1991)
  MTurk-287 (Radinsky et al., 2011)
Word Relation Evaluation
  Semantic Relations (Mikolov et al., 2013)
  Syntactic Relations (Mikolov et al., 2013)
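Word similarity benchmarks such as WS-353 are typically scored by Spearman's rank correlation between human ratings and the cosine similarity of the two word vectors. The sketch below assumes that protocol, skips pairs with an OOV word, and computes ranks by hand (with averaged ties) so it needs only NumPy; `scipy.stats.spearmanr` does the same job. The vectors and scores are toy values, not benchmark data.

```python
import numpy as np

def rankdata(x):
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho = Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(vectors, pairs):
    """Spearman correlation between human scores and cosine similarities; skips OOV pairs."""
    human, model = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            human.append(score)
            model.append(cosine(vectors[w1], vectors[w2]))
    return spearman(human, model)

vectors = {"tiger": np.array([1.0, 0.1]), "cat": np.array([0.9, 0.2]),
           "car": np.array([0.0, 1.0]), "train": np.array([0.3, 0.9])}
pairs = [("cat", "tiger", 9.0), ("car", "train", 6.0),
         ("tiger", "car", 1.0), ("dragon", "fire", 8.0)]  # last pair is OOV
print(round(evaluate(vectors, pairs), 3))  # 1.0
```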
Slide 18
+ Experimental Setup
Multilingual Vector Learning
Monolingual vector length: 80
Multilingual vector length: ?
The length k of the projected space can be chosen.
Choose the value of k that performs best on WS-353, with k ∈ {0.1, 0.2, …, 1.0} as a fraction of the maximum number of projected dimensions
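A sketch of the sweep, assuming each ratio in 0.1, …, 1.0 is turned into an integer dimension by rounding `ratio * max_dim`, and that a caller-supplied `score_fn` (hypothetical; in practice, train projections with that many dimensions and measure Spearman's correlation on WS-353) rates each candidate.

```python
def candidate_dims(max_dim, ratios=None):
    """Map each ratio in 0.1, 0.2, ..., 1.0 to a projected dimension k (at least 1)."""
    if ratios is None:
        ratios = [r / 10 for r in range(1, 11)]
    return {r: max(1, round(r * max_dim)) for r in ratios}

def select_ratio(max_dim, score_fn):
    """Return the ratio whose dimension gets the highest development score."""
    dims = candidate_dims(max_dim)
    return max(dims, key=lambda r: score_fn(dims[r]))

dims = candidate_dims(80)
print(dims[0.6])  # 48
# Toy score that peaks at 48 dimensions, standing in for WS-353 evaluation.
print(select_ratio(80, lambda k: -abs(k - 48)))  # 0.6
```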
Slide 19
+ Experimental Setup
Multilingual Vector Learning
[Figure: performance on WS-353 (Spearman's correlation vs. dimensions); k = 0.6]
+ Experimental Setup
Multilingual Vectors: Neural Networks
RNNLM (Mikolov et al., 2011): predicts the next word given the history; neural language model with recurrent hidden-layer connections
Skip-Gram, word2vec (Mikolov et al., 2013): predicts the context given the word; removes the hidden layer; vocabulary represented with Huffman coding
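"Predict context given the word" means skip-gram trains on (center, context) pairs. This sketch only generates those pairs with a fixed window; word2vec additionally samples the window size and trains with hierarchical softmax over a Huffman tree (or negative sampling), none of which is shown. `skipgram_pairs` is a hypothetical helper.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("dragons breathe fire".split(), window=1))
# [('dragons', 'breathe'), ('breathe', 'dragons'), ('breathe', 'fire'), ('fire', 'breathe')]
```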
+ Experimental Setup
Multilingual Vectors: Scaling
[Figure: Spearman's correlation on WS-353]
Slide 25
+ Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and synonyms of "beautiful": monolingual setting (visualized with the t-SNE tool; van der Maaten and Hinton, 2008)
Slide 26
+ Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and synonyms of "beautiful": multilingual setting (visualized with the t-SNE tool; van der Maaten and Hinton, 2008)
Slide 27
+ Conclusion
CCA: an easy-to-use tool in MATLAB. Take vectors from two languages and improve them.
Multilingual information is important, even if the problems are inherently monolingual.
More effective for distributional vectors.
Semantics generalizes better than syntax.
Vectors available at: http://cs.cmu.edu/~mfaruqui
Slide 28
+ Related Work
Document representation: Vinokourov et al., 2002; Platt et al., 2010
Synonymy and paraphrasing: Bannard and Callison-Burch, 2005; Ganitkevitch et al., 2013
Bilingual lexicon induction: Haghighi et al., 2008; Vulić and Moens, 2013
Bilingual word vectors: Klementiev et al., 2012; Zou et al., 2013
Translation models: Kalchbrenner & Blunsom, 2013
Compositional semantics: Hermann & Blunsom, 2014