
Classification of Entailment Relations in PPDB

1 Overview

This document outlines our protocol for labeling noun pairs according to the entailment relations proposed by Bill MacCartney in his 2009 thesis on Natural Language Inference. Our purpose in doing this is to build a labelled data set with which to train a classifier for differentiating between these relations. The classifier can be used to assign probabilities of each relation to the paraphrase rules in PPDB, making PPDB a more informative resource for downstream tasks such as recognizing textual entailment (RTE).

2 Entailment Relations

MacCartney's thesis proposes a method for partitioning all of the possible relationships that can exist between two phrases into 16 distinct relations. Of these, he defines the 7 non-degenerate relations as the "basic entailment relationships." MacCartney's table explaining these 7 relations is reproduced in figure 1.

symbol    name                  example             set-theoretic definition    in R
x ≡ y     equivalence           couch ≡ sofa        x = y                       R1001
x ⊏ y     forward entailment    crow ⊏ bird         x ⊂ y                       R1101
x ⊐ y     reverse entailment    Asian ⊐ Thai        x ⊃ y                       R1011
x ^ y     negation              able ^ unable       x ∩ y = ∅ ∧ x ∪ y = U       R0110
x | y     alternation           cat | dog           x ∩ y = ∅ ∧ x ∪ y ≠ U       R1110
x ⌣ y     cover                 animal ⌣ non-ape    x ∩ y ≠ ∅ ∧ x ∪ y = U       R0111
x # y     independence          hungry # hippo      (all other cases)           R1111

The relations in B can be characterized as follows. First, we preserve the semantic containment relations (⊑ and ⊒) of the monotonicity calculus, but factor them into three mutually exclusive relations: equivalence (≡), (strict) forward entailment (⊏), and (strict) reverse entailment (⊐). Second, we include two relations expressing semantic exclusion: negation (^), or exhaustive exclusion, which is analogous to set complement; and alternation (|), or non-exhaustive exclusion. The next relation is cover (⌣), or non-exclusive exhaustion. Though its utility is not immediately obvious, it is the dual under negation (in the sense defined in section 5.3.1) of the alternation relation. Finally, the independence relation (#) covers all other cases: it expresses non-equivalence, non-containment, non-exclusion, and non-exhaustion. As noted in section 5.3.2, # is the least informative relation, in that it places the fewest constraints on its arguments.


Figure 1: MacCartney's 7 basic entailment relations. In column 4, U refers to the space of all phrase pairs (x, y).

In summary, the relations are:

• equivalence (=): if X is true then Y is true and if Y is true then X is true.

Figure 2: Mappings of MacCartney's relations onto the Wordnet hierarchy.

• forward entailment (<): if X is true then Y is true but if Y is true then X may or may not be true.

• reverse entailment (>): if Y is true then X is true but if X is true then Y may or may not be true.

• negation (∧): if X is true then Y is not true and if Y is not true then X is true, and either X or Y must be true.

• alternation (|): if X is true then Y is not true but if Y is not true then X may or may not be true.

• cover (^): if X is true then Y may or may not be true and if Y is true then X may or may not be true, and either X or Y must be true.

• independent (#): if X is true then Y may or may not be true and if Y is true then X may or may not be true.

As MacCartney acknowledges, the utility of the cover relation for RTE is "not immediately obvious," and we do not use this relation in our work.

3 Defining relations within Wordnet

Below, we describe our rules for mapping a pair of nodes in the Wordnet noun hierarchy onto one of the six basic entailment relations. Our mappings are shown pictorially in figure 2.

We use Python NLTK's Wordnet interface with Wordnet version 3.1 to implement the described algorithms.

In the following definitions, lowercase letters represent strings and capital letters represent Wordnet synsets.

Synonym x and y are considered synonyms if any sense of x shares a synset with any sense of y. That is, if

synsets(x) ∩ synsets(y) ≠ ∅.
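As a rough illustration, this check could be implemented with NLTK's Wordnet interface as follows (a minimal sketch; the helper names are ours):

    from nltk.corpus import wordnet as wn

    def noun_synsets(word):
        # synsets(word): all noun synsets of the word, queried in its surface form
        return set(wn.synsets(word, pos=wn.NOUN))

    def is_synonym(x, y):
        # any sense of x shares a synset with any sense of y
        return bool(noun_synsets(x) & noun_synsets(y))

    # e.g. is_synonym("couch", "sofa") should return True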

Hypernym x is considered a hypernym of y if some synset of x is the root of a subtree containing some synset of y. I.e.

synsets(y) ∩ ⋃_{X ∈ synsets(x)} children(X) ≠ ∅

where children(X) is the transitive closure of the children of X (i.e. all the nodes in the subtree rooted at X).
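A sketch of this test in NLTK terms, where Synset.closure() walks the hyponym ("children") pointers transitively (helper names are ours):

    from nltk.corpus import wordnet as wn

    def is_hypernym(x, y):
        # True if some noun synset of y lies in the subtree rooted at some noun synset of x
        y_synsets = set(wn.synsets(y, pos=wn.NOUN))
        for X in wn.synsets(x, pos=wn.NOUN):
            children = set(X.closure(lambda s: s.hyponyms()))  # children(X)
            if y_synsets & children:
                return True
        return False

    # e.g. is_hypernym("bird", "crow") should hold, since "crow" lies below "bird"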

Hyponym x is considered a hyponym of y if y is a hypernym of x, using the definition of hypernym given above. Figure 3 gives a visual representation of how this process establishes that "drop" is a hyponym of "amount."

Antonym Wordnet defines the antonym relationship over lemmas, rather than synsets. We therefore define the set of antonyms of x as

antonyms(x) = ⋃_{X ∈ synsets(x)} ⋃_{w_x ∈ X} WNantonyms(w_x)

Figure 3: We establish the hypernym relation by taking the transitive closure of the children of the synsets of x ("amount") and intersecting with the union of the synsets of y ("drop").

where w_x is a lemma and WNantonyms(w_x) is the set of lemmas Wordnet defines as antonyms for w_x. We then call x and y antonyms if x ∈ antonyms(y) or y ∈ antonyms(x).
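A sketch of this lemma-level lookup (helper names are ours):

    from nltk.corpus import wordnet as wn

    def antonyms(x):
        # antonym lemma names collected over every lemma of every noun synset of x
        out = set()
        for X in wn.synsets(x, pos=wn.NOUN):
            for lemma in X.lemmas():
                out.update(a.name() for a in lemma.antonyms())
        return out

    def is_antonym(x, y):
        return x in antonyms(y) or y in antonyms(x)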

Alternation We say that x and y are in alternation if x and y are in disjoint subtrees rooted at a shared hypernym. I.e., x and y are in alternation if both

1. None of the conditions for synonym, hypernym, hyponym, or antonym are met, and

2. hypernyms(x) ∩ hypernyms(y) ≠ ∅.

Wordnet is structured so that the entire noun hierarchy is rooted at "entity." The hierarchy is fairly fat, with most nodes having a depth between 5 and 10 (figure 5). To avoid having too many false positives for the alternation class, we do not count two words as alternation if their deepest shared hypernym is in the first three levels of the hierarchy (i.e. has a depth less than three). This amounts to removing 26 synsets as potential shared hypernyms for alternation, which are shown in table 5. These nodes largely represent abstract categories such as "physical entity" and "causal agent." Removing these nodes prevents true independent pairs from being falsely labeled as alternation. We elaborate on further potential false positives for the alternation class in section 7.
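One possible realization of this test, reusing the is_synonym/is_hypernym/is_antonym sketches above. Requiring that some shared hypernym have depth of at least 3 is equivalent to requiring that the deepest shared hypernym not sit in the first three levels:

    from nltk.corpus import wordnet as wn

    MIN_SHARED_DEPTH = 3  # excludes the 26 near-root synsets listed in table 5

    def is_alternation(x, y):
        # condition 1: none of the stronger relations hold
        if (is_synonym(x, y) or is_antonym(x, y)
                or is_hypernym(x, y) or is_hypernym(y, x)):
            return False
        # condition 2: x and y have a shared hypernym deep enough in the hierarchy
        for X in wn.synsets(x, pos=wn.NOUN):
            for Y in wn.synsets(y, pos=wn.NOUN):
                for h in X.common_hypernyms(Y):
                    if h.min_depth() >= MIN_SHARED_DEPTH:
                        return True
        return False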

Other Relatedness In addition to MacCartney's relations, we define a sixth relation category, which serves as a catch-all for terms which are flagged as related by Wordnet but whose relation is not built into the hierarchical structure. WN refers to these relations as either semantic, meaning they exist as pointers between synsets, or as lexical, meaning they exist as pointers between lemmas.

While these noun pairs do not meet the criteria of the basic entailment relations defined above, we feel they carry more information about entailment than do truly independent terms. Currently, we combine all of these forms of relatedness into an "other" category, and we leave the exact entailment implications of this category to be learned in downstream applications.

Semantic relations We use two of WN's semantic noun-noun relations, holonymy and meronymy. WN breaks each of these into three subclasses: part, member, and substance. For example, arm/body is classified as a part holonym, musician/musical group is classified as a member holonym, and water/ice is classified as a substance holonym. We combine these three fine-grained classes and use only the coarser holonym and meronym labels. We define x and y as holonyms analogously to our definition of hypernymy:

synsets(y) ∩ ⋃_{X ∈ synsets(x)} WNholonyms(X) ≠ ∅

where WNholonyms(X) is the transitive closure of the holonyms of X according to WN. Meronymy is defined identically.
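A sketch of the holonym check, collapsing Wordnet's part/member/substance pointers and taking their transitive closure; the meronym check is identical with the corresponding meronym pointers (helper names are ours):

    from nltk.corpus import wordnet as wn

    def wn_holonyms(S):
        # immediate holonyms of a synset, over all three WN subclasses
        return S.part_holonyms() + S.member_holonyms() + S.substance_holonyms()

    def is_holonym(x, y):
        y_synsets = set(wn.synsets(y, pos=wn.NOUN))
        for X in wn.synsets(x, pos=wn.NOUN):
            # closure() yields the transitive closure of the holonym pointers
            if y_synsets & set(X.closure(wn_holonyms)):
                return True
        return False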

Lexical relations We use two of WN's lexical relations: derivational relatedness and pertainymy. Pertainym pointers exist between adjectives and nouns, not between nouns and other nouns. Because PPDB contains unreliable part-of-speech labels, we assign the pertainym label to a pair of nouns x/y if x is lexically identical to an adjective reachable from y through a pertainym pointer, or vice versa. We define the pertainyms of x analogously to antonyms:

pertain(x) = ⋃_{X ∈ synsets(x)} ⋃_{w_x ∈ X} WNpertain(w_x).

We define the lemmas of y as:

lemmas(y) = ⋃_{Y ∈ synsets(y)} ⋃_{w_y ∈ Y} w_y.

We then call y a pertainym of x if lemmas(y) ∩ pertain(x) ≠ ∅.

Derivational relatedness is defined identically, with a slight modification to enforce transitivity. I.e. x is derivationally related to y if lemmas(y) ∩ derive(x) ≠ ∅ or derive(x) ∩ derive(y) ≠ ∅. This is to deal with a small number of cases in which two nodes share a derivational stem but lack an explicit pointer between them; e.g. WN relates vaccine/vaccinate and vaccination/vaccinate, but not vaccine/vaccination.
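A sketch of both lexical checks (helper names are ours). Because the pointers live on lemmas, and because either member of a PPDB pair may really be an adjective, the sketch queries synsets of all parts of speech rather than nouns only:

    from nltk.corpus import wordnet as wn

    def lemma_names(word):
        # lemmas(word): all lemma names across all synsets of the word
        return {l.name().lower() for S in wn.synsets(word) for l in S.lemmas()}

    def pointed_lemmas(word, pointer):
        # lemma names reachable from word's lemmas through a lemma-level pointer
        return {t.name().lower()
                for S in wn.synsets(word) for l in S.lemmas() for t in pointer(l)}

    def is_pertainym(x, y):
        pertain = lambda l: l.pertainyms()
        return bool(lemma_names(y) & pointed_lemmas(x, pertain)
                    or lemma_names(x) & pointed_lemmas(y, pertain))

    def is_deriv_related(x, y):
        derive = lambda l: l.derivationally_related_forms()
        return bool(lemma_names(y) & pointed_lemmas(x, derive)
                    or pointed_lemmas(x, derive) & pointed_lemmas(y, derive))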

Semantic relations treated as lexical relations We also use WN's attribute pointer, which is given as a semantic relation (between synsets) but exists as a pointer from an adjective synset to a noun synset. We want to treat this relation as we did pertainymy; that is, we want to assign the attribute label to noun pairs if x is lexically identical to an adjective reachable from y through an attribute pointer, or vice versa. We therefore treat attribute as a lexical relation by taking the union over the lemmas of synsets, rather than over synsets:

attr(x) = ⋃_{X ∈ synsets(x)} ⋃_{A ∈ WNattr(X)} ⋃_{w_x ∈ A} w_x

where WNattr(X) is the set of synsets for which WN considers X to be an attribute. As before, we then call y an attribute of x if lemmas(y) ∩ attr(x) ≠ ∅.
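Following the same pattern, the attribute pointer (exposed in NLTK as Synset.attributes()) can be flattened to lemma names; lemma_names() is the helper from the sketch above:

    from nltk.corpus import wordnet as wn

    def attr_lemmas(x):
        # attr(x): lemma names of every synset WN lists as an attribute of a synset of x
        return {l.name().lower()
                for X in wn.synsets(x) for A in X.attributes() for l in A.lemmas()}

    def is_attribute(x, y):
        return bool(lemma_names(y) & attr_lemmas(x)
                    or lemma_names(x) & attr_lemmas(y))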

Independent Independence is simply defined as failing to meet any of the criteria for the above 5 basic entailment relations or for the "other" category.

4 Preprocessing steps

We remove noun pairs in which x and y are inflected forms of the same lemma. In querying for Wordnet synsets, however, we leave each noun in its inflected form. This is to address the fact that NLTK lemmatizes automatically when no synset exists for the inflected form (e.g. it will map "dogs" to the synset for "dog") but requires inflections when there is a differentiation in the synsets (e.g. querying for "marine" will return the synset for "a member of the United States Marine Corps" but only querying for the pluralized "marines" will return the synset for "a division of the United States Navy").

alternation: actress/suspect, contamination/desiccation, cup/dons, disguise/monsignor, dory/race, enterprise/nasa, governance/presentation, magazine/tricycle, phenomenon/tooth, region/science
antonym: antonym/simile, assembly/disassembly, benignity/malignancy, broadening/narrowing, end/outset, evil/goodness, foes/friend, hardness/softness, hypernatremia/hyponatremia, imprecision/precision
hypernym: amount/droplet, box/casket, care/medication, content/formalism, country/kabul, country/uganda, district/omaha, genus/populus, grounds/newfoundland, object/wedge
hyponym: aryan/being, battleground/region, blattaria/order, botswana/countries, carol/vocals, connecticut/district, family/parentage, fashion/practice, fisher/workers, male/someone
independent: appeal/v, armenian/consonant, cops/machination, farewell/string, freshmen/obligation, general/inspection, hitchcock/vocals, ko/state, southwestern/special, wave/weiss
other related: anatolia/bosporus, blur/smear, championship/mount, dissent/stand, fujiyama/japan, haliotidae/haliotis, mountain/pack, rating/site, shipment/transportation, stoppage/stopping
synonym: belongings/property, centers/plaza, dispatch/expedition, environment/surroundings, lasagna/lasagne, level/tier, pa/pop, quivering/trembling, sentence/synonym, yankees/yanks

Table 1: Some example noun pairs for each label defined according to Wordnet.

independent        11,512,210
alternation         8,826,439
hypernym              115,256
hyponym               112,339
synonym                13,653
antonym                   656
other relations        43,620
  deriv. rel.          25,094
  holonym               8,973
  meronym               8,595
  attribute               537
  pertainym               421

Table 2: Distribution of labels over 20,624,173 noun pairs found in Wordnet.

We check the criteria for each label in sequence, to enforce mutual exclusivity of labels. The order in which the criteria are checked is 1) antonym, 2) synonym, 3) hypernym, 4) hyponym, 5) other relatedness, 6) alternation, 7) independence. This means that if a noun pair fits the criteria for multiple relations (due to multiple senses, for example), the labels will be prioritized in this order. For example, the noun pair king/queen fits the criteria for antonym as well as for synonym (under the synset "a competitor who holds a preeminent position"), but it is assigned only to the antonym relation.
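A sketch of how this sequential check might be wired together, assuming the predicate helpers sketched in section 3 plus a hypothetical is_other_related() that combines the holonym, meronym, derivational, pertainym, and attribute tests:

    LABEL_CHECKS = [
        ("antonym",     is_antonym),
        ("synonym",     is_synonym),
        ("hypernym",    is_hypernym),
        ("hyponym",     lambda x, y: is_hypernym(y, x)),
        ("other",       is_other_related),   # hypothetical catch-all for the "other" tests
        ("alternation", is_alternation),
    ]

    def label_pair(x, y):
        # return the first matching label; the fixed order enforces mutual exclusivity
        for label, check in LABEL_CHECKS:
            if check(x, y):
                return label
        return "independent"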

5 Results

We ran the described labeling method on 20,624,173 noun pairs. The nouns in each pair occurred together in a sentence in the Wackypedia corpus and were joined by a path of no more than 5 nodes in the parse tree. Table 1 shows some sample noun pairs and table 2 gives the number of noun pairs matching each relation. Table 4 shows a sample of the labels assigned to noun pairs occurring both in Wackypedia and in PPDB.

6 Weaknesses and limitations

Deviations from MacCartney's definitions The structure of the Wordnet hierarchy and our definitions result in some differences between our classes and those proposed by MacCartney. Whereas MacCartney's alternation is a strictly contradictory relationship (if X is true, then Y cannot be true), our alternation class does not have this property.

Wordnet's inconsistent use of the parent/child pointers prevents us from consistently identifying when a shared parent node indicates true alternation versus when it indicates a hypernymy relationship. Figure 4 shows an example of how the relationship between a node and its immediate children may exemplify either relationship. Because we cannot systematically minimize these errors (we elaborate in section 7), we do nothing to prevent this form of false positive. We recognize that, compared to the strong exclusivity implication of MacCartney's alternation (table 3), our alternation will be weaker in RTE tasks.

Additionally, MacCartney's definition of antonym requires that antonym pairs have the "exhaustiveness" property; that is, it requires that if x and y are antonyms, then any noun must either be x or be y. This is not the definition of antonym used by Wordnet. For example, Wordnet provides the antonym pair man/woman, even though it is possible for some noun to be neither a man nor a woman. Because of this, many of our pairs labelled antonym more accurately would fall under MacCartney's alternation.

Entailment       MacCartney               Wordnet
Containment      equivalence              synonym
                 forward entailment       hyponym
                 reverse entailment       hypernym
Contradiction    negation, alternation    antonym
Compatibility    independent              alternation, independent

Table 3: Our definition of alternation provides substantially weaker signal for entailment recognition than does MacCartney's alternation.

Figure 4: Inconsistent levels of granularity in the Wordnet hierarchy can cause containment relations (hypernymy/hyponymy) to be falsely labeled as alternation. On the left is true alternation. On the right, the child nodes would be most appropriately labeled as hypernymy.

Other limitations We try to work only with nouns. We therefore draw all of our labels from WN's noun hierarchy and apply our labeling only to phrases in PPDB which are tagged as nouns. The tagging in PPDB is not perfect, and we receive some supposed noun/noun pairs which are more likely a different part of speech (e.g. basic/fundamental). Classifying these pairs under the assumption that they are nouns causes noisy labeling (e.g. basic/fundamental is labeled as independent).

Wordnet has peculiarities which sometimes cause incorrect labels. One issue is that of word sense. Despite Wordnet's effort to cover a large number of senses for each noun, it is not always complete. For example, Wordnet causes the pairs constraint/limitation and downturn/slowdown to be labeled as alternation, even though they are most often used synonymously. Another issue is that of over-formalization, which causes a disconnect between the information in Wordnet and the way terms are used colloquially. For example, Wordnet defines objectivity as an "attribute" and impartiality as a "psychological feature," thus causing the pair objectivity/impartiality to be labeled as independent, despite the fact that these two terms are used interchangeably. Similarly, Wordnet defines "dairy" as a hyponym of "farm" and "dairy product" as a hyponym of "food," causing the pair dairy/milk to be labeled as independent.

7 Handling alternations

As discussed above, the alternation class presents some particular difficulties when defined using only the information available in Wordnet. In this section, we explore in detail the shortcomings of the Wordnet approach and motivate our decision to collect manual annotations for our data.

7.1 Definitions

We refer to the depth of a node as the length of the path from that node to the root. Some nodes have multiple inheritance¹ or are reachable from the root node via more than one path. The NLTK Wordnet interface provides access to both the minimum depth (the length of the shortest path from root to node) and the maximum depth (the length of the longest path from root to node). In the following experiments, we mean the minimum depth when we refer to depth. All of the experiments were replicated using the maximum depth, and showed no notable change in results.

When we refer to the distance between two nodes, we are referring to the length of the shortest path between the two nodes.

When we refer to the shared parent of two nouns, we refer to the deepest common hypernym of both nouns. That is, in cases where the nouns share more than one hypernym, we choose the hypernym with the greatest depth. For example, the nouns "ape" and "monkey" share the hypernym "primate," which has a depth of 11, and also "animal," which has a depth of 6. We choose "primate" as the deepest shared hypernym.

¹For example, the synset person.n.01 has a depth of 3 via the path entity→physical entity→causal agent→person, as well as a depth of 6 via the path entity→physical entity→object→whole→living thing→organism→person.
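In NLTK terms these quantities can be sketched as follows: Synset.min_depth() and Synset.max_depth() give the two notions of depth, Synset.shortest_path_distance() gives distance, and lowest_common_hypernyms() gives candidates for the shared parent (the helper name is ours):

    from nltk.corpus import wordnet as wn

    def deepest_shared_parent(x, y):
        # deepest common hypernym over all noun-synset pairs of x and y (None if unrelated)
        best = None
        for X in wn.synsets(x, pos=wn.NOUN):
            for Y in wn.synsets(y, pos=wn.NOUN):
                for h in X.lowest_common_hypernyms(Y):
                    if best is None or h.min_depth() > best.min_depth():
                        best = h
        return best

    # For "ape"/"monkey" this is expected to be a "primate" synset (depth 11 in the
    # example above) rather than "animal" (depth 6).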


alternation: advisor/councillor, beach/range, cap/plug, dealer/distributor, flavor/perfume, homecoming/prom, hop/leap, kitten/pussycat, phenol/phenolic, text/version
hypernym: accuracy/precision, aircraft/plane, animal/livestock, combustion/incineration, enterprise/firm, eruption/rash, grandchild/granddaughter, homicide/manslaughter, machine/machinery, oxide/rust
hyponym: bobcat/lynx, boldness/courage, chaos/confusion, coast/shore, compassion/sympathy, diet/food, footnote/note, horseman/rider, jet/plane, spadefoot/toad
independent: business/commercial, cave/cellar, col/colonel, disability/invalidity, flipper/pinball, freezing/gel, klein/small, man/rights, stability/stabilization, tour/tower
other related: bicycle/cycling, green/greening, karelia/karelian, path/road, patriarch/patriarchate, programming/programs, prosecution/trial, retail/retailer, ski/skiing, work/working
synonym: auberge/inn, avant-garde/vanguard, bazaar/bazar, boundary/limit, cereal/grain, faith/trust, normandie/normandy, optimisation/optimization, usage/use, vapor/vapour

Table 4: Labeling applied to noun pairs in PPDB-large. Only one pair appearing in WN (proliferation/nonproliferation) was assigned the label "antonym."

7.2 Parameters

We want to optimize the accuracy of classifying a noun pair x/y as "alternation." Using only the information within Wordnet, we have three basic parameters to play with: the depth of x and y's shared parent in the hierarchy, the distance of the shared parent from either x or y, and the number of children the parent has.

Depth of parent Wordnet is organized hierarchically, so intuitively, nodes which are nearer to the root refer to more general topics, and nodes deeper in the hierarchy become more specific. Figure 5 shows how the top levels of the hierarchy contain very few nodes, and table 5 shows the kinds of general concepts that exist in these top levels.

It therefore seems reasonable to assume that, for a noun pair labeled as alternation, the depth of the shared parent may be indicative of the accuracy of the alternation: noun pairs for which the lowest shared parent is very high in the hierarchy (close to the root) may actually be independent (e.g. if both nouns are hyponyms of "thing" but otherwise are unconnected), and noun pairs for which the lowest shared parent is very deep in the hierarchy (close to the leaf nodes) may actually be in a hypernymy relationship (e.g. if both nodes are children of something specific like "happiness"). We test the hypothesis that the depth of the shared parent can be tuned to reduce the number of false positive labels for alternation.

Distance from parent Additionally, we consider the nouns' distance from their shared parent. While in some alternation pairs both nouns are immediate descendants of their shared hypernym (e.g. "love" and "fear" are both children of "emotion"), other pairs consist of nouns which are very far from the shared parent (e.g. "step father" and "oil painter" can be classified as alternation through the shared parent "person," but the nouns are 7 and 4 steps away, respectively). We explore the idea of limiting the maximum distance from the shared parent in order to increase the precision of our alternation classification.

Figure 5: Number of nodes at each depth in WN. (Counts by depth: 0: 1, 1: 3, 2: 22, 3: 228, 4: 2020, 5: 6249, 6: 12267, 7: 18936, 8: 14155, 9: 11042, 10: 7207, 11: 4267, 12: 2505, 13: 1383, 14: 846, 15: 449, 16: 341, 17: 164, 18: 30.)

depth 0: entity.n.01
depth 1: abstraction.n.06, physical entity.n.01, thing.n.08
depth 2: attribute.n.02, causal agent.n.01, change.n.06, communication.n.02, freshener.n.01, group.n.01, horror.n.02, jimdandy.n.02, matter.n.03, measure.n.02, object.n.01, otherworld.n.01, pacifier.n.02, process.n.06, psychological feature.n.01, relation.n.01, security blanket.n.01, set.n.02, stinker.n.02, substance.n.04, thing.n.12, whacker.n.01

Table 5: 26 synsets appearing with depth < 3. Most of these synsets refer to general concepts (e.g. process and object). Others are specific, but have few or no child nodes. The following have no children: change, freshener, horror, jimdandy, otherworld, pacifier, security blanket, stinker, substance, and whacker.

It is worth noting that Wordnet's "sister terms" are equivalent to our alternation if we restrict the maximum distance from the shared parent to be no more than 1.

Number of children Finally, we consider the number of immediate children of the shared parent. While it is less intuitive than the previous parameters how this may affect the likelihood of alternation, it may be that parents with many children are nodes which enumerate subtypes (e.g. the synset for "flower" has over 100 children, which enumerate different types of flowers) and thus are high-precision indicators of alternation. We therefore also investigate the relationship between the number of children of a node and the likelihood of the children being in alternation.

7.3 Labelled data for testing

In order to compare the effect of altering these parameters, we manually labeled 200 noun pairs as alternation/non-alternation. We chose these pairs by randomly sorting the noun pairs that appeared in PPDB (xl) and were labeled as alternation using our original configuration (minimum depth of 3 and no other restrictions). We labelled the first 100 unambiguous examples of alternation and the first 100 unambiguous examples of non-alternation. The full list used is included at the end of this paper (table 8). This data is used to compute precision/recall measures in the following experiments.

7.4 Experiments

We first performed a qualitative analysis. Manually inspecting the data to evaluate the minimum depth parameter, we see that there are examples of both good and bad alternation pairs alternating through parents at every depth. Table 6 illustrates this.

Manual inspection of the maximum distance parameter is similarly inconclusive. We ran our tagging algorithm on a random sample of 30 manually labeled pairs (15 alternation, 15 non-alternation), setting the max distance d to each value from 1 to 10, and enforcing the constraint that a noun pair is in alternation if the nouns in the pair share a hypernym and neither noun is more than d hops from the shared hypernym. Increasing d increases equally the number of true positives and the number of false positives. At d = 7, all 30 pairs were labeled as alternation. Table 7 shows these results.

We also look quantitatively at the effects of these parameters. Figure 6 shows the precision and recall curves generated by a) fixing the max distance parameter at infinity and varying the min depth between 1 and 10 and b) fixing the min depth at 0 and varying the max distance parameter between 1 and 10. In the former case (left of figure 6), we see that we are able to achieve acceptable precision when we set a high minimum depth, but only at the expense of unacceptably low recall. In the latter case (right of figure 6), precision stays near 50% (the equivalent of labeling everything as alternation) no matter what value we choose for max distance.

depth   parent                         x/y
3       whole.n.02                     ballplayer/baseball
        state.n.02                     bliss/happiness
4       happening.n.01                 avalanche/flood
        disorder.n.03                  instability/turmoil
5       linear unit.n.01               km/mile
        district.n.01                  town/village
6       meal.n.01                      dinner/lunch
        atmospheric phenomenon.n.01    snow/snowstorm
7       modulation.n.02                am/pm
        payment.n.01                   compensation/wage
8       garment.n.01                   sweater/t-shirt
        man.n.01                       boy/guy
9       movable feast.n.01             easter/passover
        clergyman.n.01                 preacher/priest
10      bird of prey.n.01              eagle/hawk
        placental.n.01                 cattle/livestock

Table 6: Pairs of words labeled as alternation, and the words' shared parent. Parents at every depth can produce both good (top) and bad (bottom) examples of alternation.

Figure 6: Precision and recall curves when varying max distance (left) and min depth (right). For max distance curves, min depth was fixed at 0. For min depth curves, max distance was fixed at 20. A precision of 0.5 and a recall of 1.0 are achieved by labeling all pairs as alternation.

We also look at the effect of using these two parameters (maximum distance from parent and minimum depth of parent) together. Figure 7 shows the F1 scores achieved for different combinations of parameter settings. No combination achieves higher than 0.67, the F1 score achievable by labeling everything as alternation (which results in 50% precision and 100% recall).

Table 7: Noun pairs labeled as alternation (out of a test set of 30 manually labeled pairs) with varying values of max distance. The numbers of true positives and of false positives increase equally as the max distance increases.

depth\dist    0      1      2      3      4      5      6      7      8      9      10
0            0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
1            0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
2            0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
3            0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
4            0.000  0.532  0.567  0.636  0.651  0.664  0.667  0.664  0.664  0.662  0.662
5            0.000  0.521  0.570  0.649  0.653  0.667  0.672  0.672  0.674  0.674  0.674
6            0.000  0.491  0.541  0.629  0.627  0.624  0.628  0.628  0.628  0.628  0.628
7            0.000  0.403  0.468  0.561  0.570  0.570  0.575  0.575  0.575  0.575  0.575
8            0.000  0.331  0.400  0.468  0.468  0.468  0.465  0.465  0.465  0.465  0.465
9            0.000  0.293  0.318  0.357  0.357  0.357  0.354  0.354  0.354  0.354  0.354
10           0.000  0.212  0.226  0.226  0.226  0.226  0.224  0.224  0.224  0.224  0.224

Figure 7: F1 scores for varying minimum depths (rows) and maximum distances (columns). Precision and recall were calculated on a set of 200 manually labeled noun pairs (100 positive examples of alternation, 100 negative). No combination achieves acceptable results: 0.67 is the F1 score that would be achieved by labeling every pair as alternation (0.5 precision and 1.0 recall).
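The grid in figure 7 can be reproduced with a sweep of roughly the following shape. This is only a sketch: gold_pairs (the 200 manually labeled pairs) and is_alternation_at(), a variant of the alternation test that also enforces a minimum shared-parent depth and a maximum distance to the shared parent, are hypothetical stand-ins.

    def f1_grid(gold_pairs, max_param=10):
        # gold_pairs: iterable of (x, y, gold) where gold is the manual alternation judgment
        grid = {}
        for depth in range(max_param + 1):
            for dist in range(max_param + 1):
                tp = fp = fn = 0
                for x, y, gold in gold_pairs:
                    pred = is_alternation_at(x, y, min_depth=depth, max_dist=dist)  # hypothetical
                    if pred and gold:
                        tp += 1
                    elif pred:
                        fp += 1
                    elif gold:
                        fn += 1
                p = tp / (tp + fp) if tp + fp else 0.0
                r = tp / (tp + fn) if tp + fn else 0.0
                grid[(depth, dist)] = 2 * p * r / (p + r) if p + r else 0.0
        return grid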

Finally, we look at the association between the number of children a parent has and the likelihood that those children are in alternation. Figure 8 shows a histogram of the number of immediate children per parent for each pair in our manually labeled data set. The distributions of the number of children are nearly identical for both alternation and non-alternation pairs. We also ran a precision-recall analysis varying both the maximum and the minimum number of children. That is, we tried labeling pairs as alternation if 1) both nouns shared a parent and 2) that parent had at least/at most k children. Although the graphs are not shown, the results were the same as in the graph for maximum distance: for all values of k and for both the "at least" and the "at most" setting, precision remained at 50%, equivalent to random guessing.

Figure 8: Number of immediate children of the deepest shared parent for non-alternation pairs (left) and for alternation pairs (right). The distributions are almost identical, suggesting that the number of children of the shared parent is not a useful way of reducing the number of false positive alternation labels.

8 Next steps

Due to the described limitations, and the goal application to RTE, we will gather manual annotations for a subset of our data in PPDB. By using more reliably labeled data than what we can collect from Wordnet, we should be able to train a classifier to automatically label the remainder of the pairs in PPDB. The resulting labels (or distributions over labels) will hopefully offer a valuable contribution to RTE systems, with a scale and level of annotation that is not currently available in existing resources.


alternation non-alternation10/11 clam/mussel grape/raisin adjustment/tuning conceptualization/design judgement/trial18/19 cm/inch hotel/restaurant ailment/disease configuration/setup lady/queen19/20 coat/gown i/j ambiguity/vagueness construction/design leave/vacation20/21 coffee/diner ii/iii analysis/assessment contaminant/pollutant looting/theftacre/hectare crab/prawn km/mile anarchy/chaos correctness/validity love/romanceafternoon/evening cricket/grasshopper latte/milk angle/perspective corruption/embezzlement mansion/villaam/pm crustacean/shellfish lavender/violet archives/file cutoff/threshold map/routeanimal/zoo cucumber/lettuce leopard/lion army/military demonstration/rally means/resourcesaustralasia/oceania cuttlefish/squid librarian/library artist/performer determination/will merit/valueavalanche/flood d/j mouth/throat atmosphere/climate diagram/graphic mil/thousandbadger/skunk daughter-in-law/son-in-law n/s award/compensation dish/platter monitoring/surveillanceballplayer/baseball dictionary/thesaurus ovum/zygote balloting/election drama/theater motion/petitionband/orchestra dinner/lunch p/s bigotry/fanaticism drama/theatre municipality/townshipbands/orchestras disc/drive pad/tampon bliss/happiness drum/keg opponent/rivalbatter/catcher discussion/review pan/pot block/chunk dust/powder outfit/suitbattleship/boat dizziness/numbness psychiatrist/psychologist blouse/shirt education/studies papa/popebeef/ox dollar/yuan rider/runner bomber/kamikaze election/poll pattern/trendbillionaire/millionaire dress/skirt roman/romanian boy/guy fellowship/scholarship persecution/repressionbinoculars/telescope dress/suit sailboat/yacht brawl/rumble figure/graphic personality/selfboat/ship drink/wood saturday/sunday briefing/presentation figure/silhouette polo/shirtbonnet/cap drive/walk sauce/spice bullying/harassment first/top preacher/priestbox/glove driveway/garage skating/skiing cannabis/hashish focus/spotlight psychologist/therapistboxer/wrestler duck/goose spite/sympathy capital/investment freeway/interstate reality/truthbrass/bronze dungeon/tower sweater/t-shirt case/file guard/warden room/suitebucks/bullets eagle/hawk sweater/vest cassette/tapes high-rise/skyscraper sailing/yachtingbutterflies/moths easter/passover tornado/typhoon cattle/livestock home/shelter seminars/symposiumscabin/cockpit ecg/eeg typewriter/typist charisma/charm humidity/moisture senior/superiorcanoe/motorboat elbow/shoulder vcr/video chevalier/knight idea/insight shawl/wrapcape/robe elf/goblin vii/viii chin/jaw inability/weakness shuttle/spaceshipcatcher/hitter epinephrine/norepinephrine volleyball/weightlifting chip/token indignation/resentment snow/snowstormchar/trout firefighter/firehouse windshield/wiper coil/spool instability/turmoil sore/woundchariot/wagon gallon/litre xvii/xviii companion/partner internet/website subway/traincheetah/leopard gdp/gnp xxvii/xxviii compensation/wage interval/span town/village

girls/guys itinerary/trip

Table 8: Manually annotated noun pairs used in precision/recall calculations.


9 References

MacCartney, Bill. Natural Language Inference. Diss. Stanford University, 2009.