Learning Word Subsumption Projections for the Russian Language
Dmitry Ustalov¹,² Alexander Panchenko³
¹Ural Federal University, Russia ²Krasovskii Institute of Mathematics and Mechanics, Russia
³Technische Universität Darmstadt, Germany
October 6, 2016
Introduction
Hyponymy is the asymmetric relationship between a generic term (hypernym) and an instance of this term (hyponym).
In biology, the same relationship between genus and species is called subsumption. Examples: cat is-a feline, laptop is-a computer.
Extremely useful in various NLP applications, but barely available for Russian.
Goals:
- Propose an approach for learning subsumptions for Russian.
- Develop the software for that.
- Empirically evaluate the approach.
Related Work
Traditionally, subsumptions were derived by expert lexicographers.
- Hearst (1992) proposed using lexico-syntactic patterns for extracting subsumptions automatically from a text corpus.
- Mikolov et al. (2013) developed word2vec, an efficient tool for inducing word embeddings.
- Fu et al. (2014) presented a projection learning setup for transforming hyponym embeddings into hypernym embeddings.
- Arefyev et al. (2015) trained a large word embedding model for Russian during the RUSSE competition.
- Shwartz et al. (2016) used RNNs for learning subsumptions for English.
Word Embeddings
Word embeddings are similar to an SVD of a document-term matrix: the vocabulary words are mapped into dense vectors.
Figure from Mikolov et al. (2013).
What about other linear transformations?
Approach: The Baseline
The baseline approach learns a projection matrix Φ* that transforms a hyponym vector x⃗ into its hypernym vector y⃗:

$$\Phi^* = \arg\min_{\Phi} \frac{1}{N} \sum_{(\vec{x}, \vec{y})} \operatorname{dist}(\vec{x}\Phi, \vec{y})$$
This is achieved by numerically minimizing the Euclidean (L2) distance using linear regression.
Also, partitioning the initial vector space with k-means and training a separate projection for each cluster substantially increases the model capacity.
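As a rough, hypothetical illustration of this baseline (not the authors' released code), Φ can be fitted in one shot with ordinary least squares; the random matrices below merely stand in for real embedding pairs:

```python
import numpy as np

# Toy stand-ins for N (hyponym, hypernym) embedding pairs; the slides use
# 500-dimensional vectors, mirrored here.
rng = np.random.default_rng(0)
N, d = 1000, 500
X = rng.normal(size=(N, d))  # hyponym vectors x
Y = rng.normal(size=(N, d))  # hypernym vectors y

# Linear regression: Phi minimizing the summed squared L2 distance
# between x @ Phi and y over all training pairs.
Phi, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Project an unseen hyponym vector towards its predicted hypernym.
x_new = rng.normal(size=(1, d))
y_hat = x_new @ Phi
print(Phi.shape, y_hat.shape)  # (500, 500) (1, 500)
```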
Example: Identity vs. Projection
Nearest neighbours of the transformed vector for кот (cat); the first row shows the query word's own similarity.

Identity                          Baseline Projection
кот (cat)              1.0000     кот (cat)                       0.7012
котище (big tomcat)    0.7766     животное (animal)               0.7643
кота (cat, genitive)   0.7688     зверь (beast)                   0.7299
котяра (tomcat)        0.7663     хищник (predator)               0.7201
котенок (kitten)       0.7462     намбат (numbat)                 0.7060
барсик (Barsik)        0.7272     cryptoprocta                    0.6994
кот…                   0.7124     сумчатость (marsupiality)       0.6978
кот, —                 0.7085     вуалехвостый (veiltail)         0.6940
котом (cat, instr.)    0.7070     виверровая (viverrid)           0.6888
мяукнул (meowed)       0.6980     гепардообразной (cheetah-like)  0.6885
Approach: Hyponymy Penalization
Applying the same transformation to the obtained hypernym vector x⃗Φ should not yield the initial hyponym vector x⃗ back, since hyponymy is asymmetric:

$$\Phi^* = \arg\min_{\Phi} \frac{1}{N} \left| (1 - \alpha) \sum_{(\vec{x}, \vec{y})} \operatorname{dist}(\vec{x}\Phi, \vec{y}) - \alpha \sum_{\vec{x}} \operatorname{dist}(\vec{x}\Phi\Phi, \vec{x}) \right|$$
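A minimal numpy sketch of this objective, assuming dist is the Euclidean norm and α = 0.01 as quoted on the Implementation slide (the function and variable names are invented for the example):

```python
import numpy as np

def penalized_loss(Phi, X, Y, alpha=0.01):
    """Hyponymy-penalized objective:
    (1/N) * |(1 - alpha) * sum dist(x Phi, y) - alpha * sum dist(x Phi Phi, x)|.
    X: hyponym vectors, Y: hypernym vectors, one row per training pair."""
    N = X.shape[0]
    attract = np.linalg.norm(X @ Phi - Y, axis=1).sum()      # pull x Phi towards y
    repel = np.linalg.norm(X @ Phi @ Phi - X, axis=1).sum()  # keep x Phi Phi off x
    return abs((1 - alpha) * attract - alpha * repel) / N
```

The synonymy variant on the next slide is obtained by replacing X in the second term with sampled synonym vectors Z.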
Approach: Synonymy Penalization
Exploit the negative sampling technique: explicitly provide examples of synonyms z⃗ and penalize the matrix for producing vectors similar to them:

$$\Phi^* = \arg\min_{\Phi} \frac{1}{N} \left| (1 - \alpha) \sum_{(\vec{x}, \vec{y})} \operatorname{dist}(\vec{x}\Phi, \vec{y}) - \alpha \sum_{(\vec{x}, \vec{z})} \operatorname{dist}(\vec{x}\Phi\Phi, \vec{z}) \right|$$
Approach: Hypernymy Promotion
Instead of negative sampling, promote the matrix to produce hypernyms not just for the initial hyponym, but also for its randomly sampled synonym z⃗. In lexical ontologies, words are grouped into synsets (sets of synonyms), and subsumptions are established between such synsets:

$$\Phi^* = \arg\min_{\Phi} \frac{1}{N} \left( (1 - \beta) \sum_{(\vec{x}, \vec{y})} \operatorname{dist}(\vec{x}\Phi, \vec{y}) + \beta \sum_{(\vec{y}, \vec{z})} \operatorname{dist}(\vec{z}\Phi, \vec{y}) \right)$$
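A matching sketch for the promotion objective (same assumptions as the penalization sketch above; β = 0.3 as on the Implementation slide):

```python
import numpy as np

def promotion_loss(Phi, X, Y, Z, beta=0.3):
    """Hypernymy-promotion objective:
    (1/N) * ((1 - beta) * sum dist(x Phi, y) + beta * sum dist(z Phi, y)).
    Z holds a randomly sampled synonym for each hyponym in X; both the
    hyponym x and its synonym z should project onto the same hypernym y."""
    N = X.shape[0]
    pair_term = np.linalg.norm(X @ Phi - Y, axis=1).sum()  # hyponym -> hypernym
    syn_term = np.linalg.norm(Z @ Phi - Y, axis=1).sum()   # synonym -> same hypernym
    return ((1 - beta) * pair_term + beta * syn_term) / N
```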
Implementation
- A single-layer perceptron instead of linear regression.
- TensorFlow for defining and executing the computation graph (a sketch follows below), scikit-learn for k-means clustering.
- The Adam stochastic optimization method for minimizing the loss function.
- Tried the cosine distance instead of L2, but without any luck (the details are in the paper).
- Parameters: α = 0.01, β = 0.3, 14 000 training epochs.
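A condensed, hypothetical TensorFlow sketch of this setup (1.x-style API, contemporary with the slides; not the released projlearn code). The bias column folded into the input gives the 501 × 500 matrix mentioned on the performance slide; all data here is random noise:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-style API

rng = np.random.RandomState(0)
N, d = 1000, 500
X = np.hstack([rng.randn(N, d), np.ones((N, 1))]).astype(np.float32)  # +bias -> 501
Y = rng.randn(N, d).astype(np.float32)

x = tf.placeholder(tf.float32, [None, d + 1])
y = tf.placeholder(tf.float32, [None, d])
Phi = tf.Variable(tf.random_normal([d + 1, d], stddev=0.01))  # 501 x 500

# Single linear layer; squared L2 distance between projection and hypernym.
loss = tf.reduce_mean(tf.reduce_sum(tf.square(tf.matmul(x, Phi) - y), axis=1))
train_op = tf.train.AdamOptimizer().minimize(loss)  # Adam, as above

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(14000):  # 14 000 training epochs, as above
        sess.run(train_op, feed_dict={x: X, y: Y})
```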
Experiments: The Setup
Language Resources:
- 500-dimensional word vectors for Russian trained using the skip-gram architecture (64 GB in RAM),
- 21 997 train items: Hearst patterns + Russian Wiktionary,
- 10 811 test items: Russian Wiktionary only.
To avoid lexical overfitting, each set contains a distinct vocabulary.
Computational Resources:
- Intel Xeon E5-2620 v2 @ 2.10GHz (32 GB of RAM),
- NVIDIA Tesla K20Xm, 2688 cores (6 GB of VRAM).
Some preliminary computations were done on another machine with a larger amount of available RAM.
Each experiment is run five times to evaluate statistical significance using a t-test.
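For illustration only (the scores below are made up, not results from the paper), such a five-run comparison could use scipy's unpaired t-test:

```python
from scipy.stats import ttest_ind

# Hypothetical per-run A@10 scores for two configurations.
baseline_runs  = [0.21, 0.22, 0.20, 0.21, 0.22]
penalized_runs = [0.25, 0.26, 0.24, 0.25, 0.26]

t, p = ttest_ind(penalized_runs, baseline_runs)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 suggests a significant difference
```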
Experiments: The Metric
No standard evaluation metric for such a task is available yet. We measure the quality by analyzing the ten nearest neighbours:

$$A@10 = \frac{1}{N} \sum_{(\vec{x}, \vec{y})} \mathbb{1}\left(\vec{y} \in \mathrm{NN}_{10}(\vec{x}\Phi^*)\right)$$

This is the probability of finding the correct hypernym among the ten nearest neighbours of the projected hyponym, which is previously unknown to the model.
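A sketch of how A@10 could be computed with cosine similarity over the whole vocabulary (all names here are illustrative, not from the paper's code):

```python
import numpy as np

def a_at_10(Phi, X, gold, vocab_emb, vocab_words):
    """Fraction of test pairs whose gold hypernym word appears among the
    ten nearest neighbours of the projected hyponym vector.
    X: test hyponym vectors; gold: their hypernym words;
    vocab_emb / vocab_words: embeddings and words of the whole vocabulary."""
    proj = X @ Phi
    proj /= np.linalg.norm(proj, axis=1, keepdims=True)
    emb = vocab_emb / np.linalg.norm(vocab_emb, axis=1, keepdims=True)
    sims = proj @ emb.T                        # cosine similarities
    top10 = np.argsort(-sims, axis=1)[:, :10]  # indices of the 10 NNs
    hits = sum(g in {vocab_words[j] for j in row}
               for row, g in zip(top10, gold))
    return hits / len(gold)
```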
Experiments: The Results
On (almost) every configuration, both hyponymy and synonymy penalizations significantly outperform the baseline.
[Plot: A@10 (0.2–0.3) against the number of k-means clusters (1–10) for Baseline, Pen. Hyponymy, Pen. Synonymy, and Prom. Hypernymy.]
Experiments: The Performance
Since the matrix is 501 × 500, using a GPU is infeasible due to the requirements for the batch size (which is 512 in our case).
[Plot: seconds per 1 000 training epochs against batch size (512–8192) for Baseline and Pen. Synonymy, on CPU and GPU.]
Example: Baseline vs. Penalization
Nearest neighbours of the projected vector for кот (cat); the first row shows the query word's own similarity.

Baseline Projection                        Pen. Synonymy Projection
кот (cat)                       0.7012     кот (cat)                  0.6889
животное (animal)               0.7643     животное (animal)          0.7757
зверь (beast)                   0.7299     зверь (beast)              0.7283
хищник (predator)               0.7201     хищник (predator)          0.7141
намбат (numbat)                 0.7060     намбат (numbat)            0.6983
cryptoprocta                    0.6994     сумчатость (marsupiality)  0.6946
сумчатость (marsupiality)       0.6978     cryptoprocta               0.6915
вуалехвостый (veiltail)         0.6940     ornithoryngue              0.6887
виверровая (viverrid)           0.6888     млекопитающее (mammal)     0.6876
гепардообразной (cheetah-like)  0.6885     кволл (quoll)              0.6837
Conclusion
- A negative sampling approach for synonymy penalization has been proposed and successfully evaluated.
- An open source projection learning toolkit has been developed using TensorFlow: https://github.com/dustalov/projlearn/.
- The released datasets, including the trained models, are also available to other researchers under a libre license.
- GPUs have a lot of potential in our task: multi-layer setups, CNNs, RNNs, etc.
- The primary obstacle right now is the availability of training subsumptions.
Thank You!
Dmitry Ustalov https://linkedin.com/in/ustalov dmitry.ustalov@urfu.ru
The reported study was funded by RFBR according to the research project No. 16-37-00354 мол_a. We are grateful to Nikolay Arefyev, Andrey Kutuzov, Andrey Krizhanovsky, Benjamin Milde, and Alexander Bersenev for the fruitful discussions on the present study. Dmitry Ustalov was partially supported by the Deutscher Akademischer Austauschdienst (DAAD) scholarship. Alexander Panchenko was supported by the Deutsche Forschungsgemeinschaft (DFG) foundation under the project “JOIN-T: Joining Ontologies and Semantics Induced from Text”.