Deep Text Understanding with WordNet Christiane Fellbaum Princeton University and Berlin-Brandenburg Academy of Sciences


Page 1

Deep Text Understanding with WordNet

Christiane Fellbaum

Princeton University and

Berlin-Brandenburg Academy of Sciences

Page 2

WordNet

• What is WordNet and why is it interesting/useful?

• A bit of history

• WordNet for natural language processing/word sense disambiguation

Page 3

What is WordNet?

• A large lexical database, or “electronic dictionary,” developed and maintained at Princeton University: http://wordnet.princeton.edu
• Includes most English nouns, verbs, adjectives, and adverbs
• The electronic format makes it amenable to automatic manipulation
• Used in many Natural Language Processing applications (information retrieval, text mining, question answering, machine translation, AI/reasoning, ...)

• Wordnets are built for many languages (including Danish!)

Page 4

What’s special about WordNet?

• Traditional paper dictionaries are organized alphabetically: words that are found together (on the same page) are not related by meaning
• WordNet is organized by meaning: words in close proximity are semantically similar
• Human users and computers can browse WordNet and find words that are meaningfully related to their queries (somewhat like in a hyperdimensional thesaurus)
• Meaning similarity can be measured and quantified to support Natural Language Understanding (see the sketch below)
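For illustration, a minimal sketch of such similarity measures using NLTK's WordNet interface (the toolkit is an assumption; the slides do not name one):

from nltk.corpus import wordnet as wn

# Similarity is computed over the hypernym graph; exact scores depend on the WordNet version.
dog, cat, car = wn.synset('dog.n.01'), wn.synset('cat.n.01'), wn.synset('car.n.01')

print(dog.path_similarity(cat))            # higher: dog and cat meet at a nearby hypernym
print(dog.path_similarity(car))            # lower: they only meet near the top of the hierarchy
print(dog.wup_similarity(cat))             # Wu-Palmer: weighs the depth of the common hypernym
print(dog.lowest_common_hypernyms(cat))    # e.g. [Synset('carnivore.n.01')]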

Page 5

A bit of history

Research in Artificial Intelligence (AI):

How do humans store and access knowledge about concepts?

Hypothesis: concepts are interconnected via meaningful relations

Knowledge about concepts is huge -- it must be stored in an efficient and economical fashion

Page 6

A bit of history

Knowledge about concepts is computed “on the fly” via access to general concepts

E.g., we know that “canaries fly” because “birds fly” and “canaries are a kind of bird”

Page 7

A simple picture

animal (animate, breathes, has heart,...)

|

bird (has feathers, flies,..)

|

canary (yellow, sings nicely,..)

Page 8

Knowledge is stored at the highest possible node and inherited by lower (more specific) concepts rather than being multiply stored

Collins & Quillian (1969) measured reaction times to statements involving knowledge distributed across different “levels”

Page 9

Do birds fly?

--short RT

Do canaries fly?

--longer RT

Do canaries have a heart?

--even longer RT

Page 10

Collins’ & Quillian’s results are subject to criticism (reaction time to statements like “do canaries move?” are influenced by prototypicality, word frequency, uneven semantic distance across levels)

But other evidence from psychological experiments confirms that humans organize knowledge about words and concept by means of meaningful relations

Access to one concepts activates related concepts in an outward spreading (radial) fashion

Page 11

A bit of history

But the idea inspired WordNet (1986), which asked: can most or all of the lexicon be represented as a network where words are interlinked by meaning?

If so, the result would be a semantic network (a graph)

Page 12

WordNet

If the (English) lexicon can be represented as a semantic network, which are the relations that connect the nodes?

Page 13

Whence the relations?

• Inspection of word association norms (stimulus: hand; responses: finger, arm)

• Classical ontology (Aristotle): IS-A (maple-tree), HAS-A (maple-leaves)

• Co-occurrence patterns in texts (meaningfully related words are used together)

Page 14

Relations: Synonymy

One concept is expressed by several different word forms:

{beat, hit, strike}

{car, motorcar, auto, automobile}

{big, large}

Synonymy = one:many mapping of meaning and form

Page 15

Synonymy in WordNet

WordNet groups (roughly) synonymous, i.e. denotationally equivalent, words into unordered sets of synonyms (“synsets”)

{hit, beat, strike}
{big, large}
{queue, line}

Each synset expresses a distinct meaning/concept

Page 16

Polysemy

One word form expresses multiple meanings

Polysemy = one:many mapping of form and meaning

{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}

Note: the most frequent word forms are the most polysemous!

Page 17

Polysemy in WordNet

A word form that appears in n synsets is n-fold polysemous

{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}

table is fourfold polysemous/has four senses
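As a hedged sketch, the senses of a word form can be enumerated with NLTK's WordNet interface (an assumed toolkit; current WordNet releases list more than four senses for "table", so the count is version-dependent):

from nltk.corpus import wordnet as wn

# Each synset containing the word form "table" is one of its senses.
senses = wn.synsets('table')
print(len(senses))                         # polysemy count of "table" in the installed WordNet
for s in senses:
    print(s.name(), s.lemma_names(), '-', s.definition())

# Restricted to one part of speech, e.g. the verb reading ("postpone"):
print(wn.synsets('table', pos=wn.VERB))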

Page 18

Some WordNet stats

Part of speech   Word forms   Synsets containing word form
noun             117,798      82,115
verb             11,529       13,767
adjective        21,479       18,156
adverb           4,481        3,621
total            155,287      117,659
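These counts can be recomputed (approximately; they drift across WordNet releases) by iterating over the database, e.g. with NLTK:

from nltk.corpus import wordnet as wn

# Numbers vary slightly with the WordNet version shipped with NLTK.
for pos, label in [('n', 'noun'), ('v', 'verb'), ('a', 'adjective'), ('r', 'adverb')]:
    word_forms = len(set(wn.all_lemma_names(pos)))
    synsets = sum(1 for _ in wn.all_synsets(pos))
    print(f'{label}: {word_forms} word forms, {synsets} synsets')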

Page 19

The “Net” part of WordNet

Synsets are the building blocks of the network

Synsets are interconnected via relations

Bi-directional arcs express semantic relations

Result: large semantic network (graph)

Page 20

Hypo-/hypernymy relates noun synsets

Relates more and less general concepts

Creates hierarchies, or “trees”:

                 {vehicle}
                /         \
   {car, automobile}   {bicycle, bike}
      /       \                  \
{convertible} {SUV}      {mountain bike}

“A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes”

Hierarchies can have up to 16 levels

Page 21

Hyponymy

Transitivity:

A car is a kind of vehicle

An SUV is a kind of car

=> An SUV is a kind of vehicle
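A minimal sketch of this transitivity with NLTK (the synset lookup is an assumption; check wn.synsets('SUV') in the installed version):

from nltk.corpus import wordnet as wn

# Follow IS-A (hypernym) links upward from a specific concept.
suv = wn.synsets('SUV')[0]                 # the sport-utility-vehicle sense, if present
car = wn.synset('car.n.01')
vehicle = wn.synset('vehicle.n.01')

ancestors = set(suv.closure(lambda s: s.hypernyms()))
print(car in ancestors)                    # True: an SUV is a kind of car
print(vehicle in ancestors)                # True: ... and hence a kind of vehicle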

Page 22

Meronymy/holonymy (part-whole relation)

{car, automobile}
        |
    {engine}
     /      \
{spark plug} {cylinder}

“An engine has spark plugs”
“Spark plugs and cylinders are parts of an engine”

Page 23

Meronymy/Holonymy

Inheritance:

A finger is part of a hand
A hand is part of an arm
An arm is part of a body
=> A finger is part of a body
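A hedged sketch of following part-whole links with NLTK (the exact meronym/holonym links depend on the WordNet release):

from nltk.corpus import wordnet as wn

finger = wn.synset('finger.n.01')
hand = wn.synset('hand.n.01')
arm = wn.synset('arm.n.01')

# Direct whole-of links:
print(hand in finger.part_holonyms())      # a finger is part of a hand
print(arm in hand.part_holonyms())         # a hand is part of an arm

# Transitive ("inherited") wholes, following part_holonyms upward:
print(arm in set(finger.closure(lambda s: s.part_holonyms())))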

Page 24

Structure of WordNet (Nouns)

(Diagram: a fragment of the noun network with hyperonym and meronym links)

Hyperonym hierarchy (general to specific):
{vehicle}
  > {conveyance; transport}
    > {motor vehicle; automotive vehicle}
      > {car; auto; automobile; machine; motorcar}
        > {cruiser; squad car; patrol car; police car; prowl car}
        > {cab; taxi; hack; taxicab}

Meronyms of {car; auto; automobile; machine; motorcar}: {bumper}, {car door}, {car window}, {car mirror}
Meronyms of {car door}: {hinge; flexible joint}, {doorlock}, {armrest}

Page 25

WordNet Data Model

(Diagram: the WordNet data model -- word forms from the vocabulary are mapped to concepts, and concepts are interlinked by relations)

Word forms (vocabulary of a language): bank, fiddle, violin, violist, fiddler, string

Concepts (database records):
  rec 12345 - financial institute
  rec 54321 - side of a river             (both expressed by “bank”)
  rec  9876 - small string instrument     (“fiddle”, “violin”)
  rec 65438 - musician playing the violin (“violist”, “fiddler”)
  rec 42654 - musician
  rec 25876 - string instrument
  rec 35576 - string of an instrument
  rec 29551 - subatomic particle          (the last two both expressed by “string”)

Relations between concepts: type-of, part-of

Page 26

Page 27

WordNet for Natural Language Processing

Challenge:

get a computer to “understand” language

• Information retrieval

• Text mining

• Document sorting

• Machine translation

Page 28

Natural Language Processing

• Stemming and parsing are currently at a >90% accuracy level
• Word sense discrimination (lexical disambiguation) is still a major hurdle for successful NLP
• Which sense is intended by the writer (relative to a dictionary)?
• Best systems: ~60% precision, ~60% recall (but human inter-annotator agreement isn’t perfect, either!) -- see the sketch below
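As a hedged illustration (this is not one of the systems behind the figures above), NLTK ships a simple gloss-overlap disambiguator (the Lesk algorithm) over WordNet senses:

from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

# Pick the sense of "bank" whose gloss overlaps most with the context words.
context = 'the boat was pulled up on the bank of the river'.split()
sense = lesk(context, 'bank', pos=wn.NOUN)
print(sense, '-', sense.definition() if sense else 'no sense found')
# Simple overlap baselines like this remain well below human performance.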

Page 29

Understanding text beyond the word level

(joint work with Peter Clark and Jerry Hobbs)

Page 30

Knowledge in text

Human language users routinely derive knowledge from text that is NOT expressed on the surface

Perhaps more knowledge is unexpressed than overtly expressed on the surface

Graesser (1981) estimates a ratio of explicit to implicit information of 1:8

Page 31

An example

Text: A soldier was killed in a gun battle

Inferences:
- Soldiers were fighting one another
- The soldiers had guns with live ammunition
- Multiple shots were fired
- One soldier shot another soldier
- The shot soldier died as a result of the injuries caused by the shot
- The time interval between the fatal shot and the death was short

Page 32

Humans use world knowledge to supplement word knowledge

(How) can such knowledge be encoded and harnessed by automatic systems?

Previous attempts (e.g., Cyc’s microtheories)

--too few theories

--uneven coverage of world knowledge

Page 33

Recognizing Textual Entailment

Task:

Evaluate truth of hypothesis H given a text T

(T) A soldier was killed in a gun battle

(H) A soldier died

Answer may be yes/no/probably/...

Page 34

RTE

Many automatic systems attempt RTE via lexical and syntactic matching algorithms (“Do the same words occur in T and H?” “Do T and H have the same subject/object?”)

Not “deep” language understanding

Page 35

Our RTE test suite

250 Text-Hypothesis pairs

for 50% of them, H is entailed by T

for the remaining 50%, H is not (necessarily) entailed

Focus on semantic interpretation

Page 36

RTE test suite

Core of T statements came from newspaper texts

H statements were hand-coded

focus on general world knowledge

Page 37

RTE test suite

Pairs were manually analyzed

19 types of knowledge were distinguished and classified among the T-H pairs

(with some partial overlap between types)

Page 38

Examples: Types of knowledge (in increasing order of difficulty)

Lexical: relations among irregular forms of a single lemma; Named Entities vs. proper nouns

Lexical-semantic (paradigmatic): synonyms, hypernyms, meronyms, antonyms, metonymy, derivations

Syntagmatic: selectional preferences, telic roles

Propositional: cause-effect, preconditions

World knowledge/core theories (e.g., ambush entails concealment)

Page 39

Overall approach (bag of tricks)

• Initial text interpretation with language processing tools (Peter Clark et al.)

• Compute subsumption among text fragments

• WordNet augmentations

Page 40

Text interpretation

First step: parsing (assign a structure to a sentence or phrase)

SAPIR parser (Harrison & Maxwell 1986)

SAPIR also produces a Logical Form (LF)

Page 41

LFs

LF structures are trees generated by rules parallel to grammar rules

contain logic elements

nouns, verbs, adjectives, and prepositions are represented as variables

LFs are parsed and have part-of-speech tags

LFs generate ground logical assertions

Page 42

Example

LF for "A soldier was killed in a gun battle."

(DECL

((VAR X1 "a" "soldier")

(VAR X2 "a" "battle" (NN "gun" "battle")))

(S (PAST) NIL "kill" ?X1 (PP "in" ?X2)))

Page 43

Logical assertions

logic for "A soldier was killed in a gun battle."

object(kill,soldier) & in(kill,battle) & modifier(battle,gun)

Page 44

Result: T, H in Logical Form

Page 45

Matching sentences/fragments with subsumption

A basic reasoning operation:

“A person loves a person”
subsumes
“A man loves a woman”

A set S1 of clauses subsumes another set S2 of clauses if each clause in S1 subsumes some member of S2.

Similarly, a clause subsumes another clause if the arguments of the first subsume or match the arguments of the second.

Argument (word) subsumption is as in WordNet (X is a Y); matching = synonymy
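A minimal sketch of the word-level subsumption step using WordNet hypernyms (the helper name is hypothetical; the actual system's code is not shown in the slides):

from nltk.corpus import wordnet as wn

def word_subsumes(general, specific):
    """True if some sense of `general` is identical to, or a hypernym of, some sense of `specific`."""
    for spec in wn.synsets(specific):
        ancestors = set(spec.closure(lambda s: s.hypernyms())) | {spec}
        if any(gen in ancestors for gen in wn.synsets(general)):
            return True
    return False

# "A person loves a person" subsumes "A man loves a woman" at the argument level:
print(word_subsumes('person', 'man'), word_subsumes('person', 'woman'))   # True True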

Page 46

Syntactic matching of predicates

--both are the same

--one is predicate “of” or modifier (my friend’s car, the car of my friend)

--predicates “subject” and “by” match (passives)

Page 47

Lexical (word) matching

Words related by derivational morphology (destroy, destruction) are considered matches in conjunction with syntactic matches
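A hedged sketch of looking up such derivational links with NLTK:

from nltk.corpus import wordnet as wn

# Derivational links connect e.g. the noun "destruction" with the verb "destroy".
for lemma in wn.lemmas('destruction'):
    for related in lemma.derivationally_related_forms():
        print(lemma.name(), '<->', related.name(), f'({related.synset().pos()})')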

Page 48

Recognize as equivalent:
  the bomb destroyed the shrine
  the destruction of the shrine by the bomb

But not:
  the destruction of the bomb by the shrine

Similarly equivalent:
  a person attacks with a bomb
  there is a bomb attack by a person

Page 49

Benefits for text understanding/RTE

(T) Moore is a prolific writer(H) Moore writes many books

Moore is the Agent of write

Page 50

Exploiting word and world knowledge encoded in WordNet

Page 51

Use of WordNet glosses

Glosses = definitions of the concept expressed by the synset members

{airplane, plane (an aircraft that has fixed wings and is powered by propellers or jets)}

syntagmatic information, world knowledge

Page 52

Translating glosses into First-Order Logic axioms

{bridge, span (any structure that allows people or vehicles to cross an obstacle such as a river or canal...)}

bridgeN1(x,y) <--> structureN1(x) & allowV1(x,e1) & crossV1(e1,z,y) & obstacleN2(y) & person/vehicle(z)

personN1(z) --> person/vehicle(z)
vehicleN1(z) --> person/vehicle(z)
riverN2(y) --> obstacleN2(y)
canalN3(y) --> obstacleN2(y)
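The glosses themselves are directly accessible, e.g. with NLTK (the LF translation above was produced offline, not by this call):

from nltk.corpus import wordnet as wn

bridge = wn.synset('bridge.n.01')
print(bridge.lemma_names())    # e.g. ['bridge', 'span']
print(bridge.definition())     # the gloss used as the source of the axiom above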

Page 53

The nouns, verbs, adjectives, and adverbs in the glosses were manually disambiguated

Thus, each variable in the LFs was identified not just with a word form, but with a form-meaning pair (a sense) in WordNet

LFs were generated for 110K glosses, with particular emphasis on Core WordNet

Page 54

How well do our tricks perform?

Page 55

An example that works

Exploiting formally related words in WN:

(T) …go through licensing procedures

(H) …go through licensing processes

Exploiting hyponymy (IS-A relation):

(T) Beverley served at WEDCOR

(H) Beverley worked at WEDCOR

Page 56

More complex example that works

(T) Britain puts curbs on immigrant labor from Bulgaria

(H) Britain restricted workers from Bulgaria

Page 57

Knowledge from WordNet

Synset with gloss: {restrict, restrain, place_limits_on, (place restrictions on)}

Synonymy: {put, place}, {curb, limit}

Morphosemantic link: {labor} - {laborer}

Hyponymy: {laborer} ISA {worker}
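A hedged sketch of checking some of those individual links with NLTK (lemma and synset names may differ slightly across WordNet versions):

from nltk.corpus import wordnet as wn

# Synonymy: do "put" and "place" share a verb synset?
print(bool(set(wn.synsets('put', wn.VERB)) & set(wn.synsets('place', wn.VERB))))

# Morphosemantic/derivational link: labor <-> laborer
related = {r.name() for lem in wn.lemmas('labor') for r in lem.derivationally_related_forms()}
print('laborer' in related)

# Hyponymy: laborer IS-A worker (via the hypernym chain)
laborer, worker = wn.synset('laborer.n.01'), wn.synset('worker.n.01')
print(worker in laborer.closure(lambda s: s.hypernyms()))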

Page 58

Example that doesn’t work

(T) The Philharmonic orchestra draws large crowds
(H) Large crowds were drawn to listen to the orchestra

WordNet tells us that:
orchestra = a collection of musicians
musician = someone who plays a musical instrument
music = sound produced by musical instruments
listen = hear = perceive sound

But WordNet doesn’t tell us that playing results in sound production and that there is a listener

Page 59

Examples that don’t work

The most fundamental knowledge that humans take for granted trips up automatic systems

Such knowledge is not explicitly taught to children

But it must be “taught” to machines!

Page 60

Core theories (Jerry Hobbs)

• Attempt to encode fundamental knowledge

• Space, time, causality,...

• Essential for reasoning

• Not encoded in WordNet glosses

Page 61

Core theories

• Manually encoded

• Axiomatized

Page 62

Core theories

• Composite entities (things made of other things, stuff)

• Scalar notions (time, space,...)

• Change of state

• Causality

Page 63

Core theories

Example of predications:

change(e1,e2)

changeFrom(e1)

changeTo(e2)

Page 64

Core theories and WordNet

Map core theories to Core WordNet synsets

Encode the meanings of synsets denoting events, and their event structure, in terms of core-theory predications

Page 65

Examples

let(x,e) <--> not(cause(x,not(e)))

{go, become, get (“he went wild”)}
go(x,e) <--> changeTo(e)

free(x,y) <--> cause(x,changeTo(free(y)))

(All words are linked to WN senses)

Page 66

Example

The captors freed the hostages
The hostages were free

free = let(x, go(y, free(y)))
     <--> not(cause(x, not(changeTo(free(y)))))
     <--> cause(x, changeTo(free(y)))
     <--> free(x,y)

Page 67

Preliminary evaluation

(What) does each component contribute to RTE?

For the 250 Text-Hypothesis pairs in our test suite:

Page 68

when H or ¬H is predicted by:              Correct   Incorrect
  syntactic transformations                   11          3
  WordNet relations                           14          1
  WordNet glosses in LF                        4          1
when H or ¬H is not predicted
  (assumed not to be entailed)                97         72

Page 69

Conclusion

• Way to go!

• Deliberately exclude statistical similarity measures (this hurts our results)

• Symbolic approach: aim at deep level understanding

Page 70

WordNet for Deeper Text Understanding

• Axioms in Logical Form are useful for many other NL Understanding applications

• E.g., automated question answering: translate Qs and As into logic representation

• Logic representations enable reasoning (axioms can be fed into a reasoner/logic prover)

Page 71

Thanks for your attention