WordNet for MT. Christiane Fellbaum, Dept. of Computer Science, fellbaum@princeton.edu


Page 1

WordNet for MT

Christiane Fellbaum

Dept. of Computer Science

fellbaum@princeton.edu

Page 2

The Challenge

Globalization requires more texts and speech to be translated faster across more languages

Page 3

The half-empty glass

• Manual translation is difficult, expensive, time-consuming

• Machine translation is of low quality, often unacceptable

Page 4

The half-full glass

• Human-aided machine translation can work for restricted domains (science, instruction manuals, etc.), but not for literature or poetry

• Restricted domains use limited vocabulary (terminology)

• Key words are less polysemous (often monosemous)

• “Prefabricated” phrases (“let x be y”) don’t need to be translated anew each time

Page 5

Hardest part of translation(?):

lexical disambiguation

Page 6

Focus on two challenges

• identify the intended sense of a polysemous word in the source

• find the context-appropriate word in the target language

Page 7

A local resource: WordNet

Page 8

What is WordNet?

• A large lexical database, or “electronic dictionary,” developed and maintained at Princeton

http://wordnet.princeton.edu

• Includes most English nouns, verbs, adjectives, and adverbs

• Can be used by humans and machines

• Princeton WordNet is for English only, but it is linked to wordnets in many other languages

Page 9

What’s special about WordNet?

• Traditional paper dictionaries are organized alphabetically: words that are found together (on the same page) are not related by meaning

• WordNet is organized by meaning: words in close proximity are semantically similar

• Human users and computers can browse WordNet and find words that are meaningfully related to their queries (somewhat like in a hyperdimensional thesaurus)

Page 10

What’s special about WordNet?

WordNet gives information about two fundamental, universal properties of human language:

polysemy and synonymy

Polysemy = one:many mapping of form and meaning

Synonymy = one:many mapping of meaning and form

Page 11

Polysemy

One word form expresses multiple meanings

{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}

Note: the most frequent word forms are the most polysemous!
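For illustration only: a word's sense inventory can be queried programmatically. A minimal sketch using NLTK's interface to the Princeton WordNet; the word "table" and the print format are just examples.

# List the WordNet senses of "table"; each synset returned is one sense.
# Setup (once): pip install nltk; then in Python: import nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

senses = wn.synsets('table')
for synset in senses:
    print(synset.name(), '-', synset.definition())
print(f'"table" has {len(senses)} senses in WordNet')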

Page 12

Synonymy

One concept is expressed by several different word forms:

{beat, hit, strike}

{car, motorcar, auto, automobile}

Page 13

Polysemy and synonymy

Understanding and generating language (as for translation) means matching a word form with the intended, context-appropriate meaning

People (fluent speakers of a language) do this very efficiently

Page 14

Synonymy in WordNet

WordNet groups (roughly) synonymous, denotationally equivalent words into unordered sets of synonyms ("synsets")

{hit, beat, strike}
{big, large}
{queue, line}

By definition, each synset expresses a distinct meaning/concept

Each word form-meaning pair is unique

Page 15

Polysemy in WordNet

A word form that appears in n synsets is n-fold polysemous

{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}

table is fourfold polysemous/has four senses
Four distinct concepts are associated with the word form table

Page 16

Some WordNet stats

Part of speech    Word forms    Synsets
noun              117,798       82,115
verb              11,529        13,767
adjective         21,479        18,156
adverb            4,481         3,621
total             155,287       117,659

Page 17

The “Net” part of WordNet

Synsets are interconnected

Bi-directional arcs express semantic relations

Result: large semantic network (graph)

Page 18

Hypo-/hypernymy relates noun synsets

Relates more/less general concepts
Creates hierarchies, or "trees":

              {vehicle}
             /         \
 {car, automobile}   {bicycle, bike}
     /       \               \
{convertible} {SUV}    {mountain bike}

"A car is a kind of vehicle" <=> "The class of vehicles includes cars, bikes"

Hierarchies can have up to 16 levels

Page 19

Hyponymy

Transitivity:

A car is a kind of vehicle

An SUV is a kind of car

=> An SUV is a kind of vehicle
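Because hyponymy is transitive, the full set of superordinates of a synset can be read off WordNet by following hypernym links upward. A minimal sketch with NLTK; the synset identifiers (car.n.01, vehicle.n.01) assume WordNet 3.0's sense numbering.

# Transitive closure over the hypernym (IS-A) relation, starting from car.n.01.
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
ancestors = list(car.closure(lambda s: s.hypernyms()))
for synset in ancestors:
    print(synset.name())

# vehicle.n.01 should appear among the ancestors even though it is not a direct hypernym of car.n.01
print(wn.synset('vehicle.n.01') in ancestors)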

Page 20

Meronymy/holonymy (part-whole relation)

   {car, automobile}
          |
      {engine}
      /       \
{spark plug}  {cylinder}

"An engine has spark plugs"
"Spark plugs and cylinders are parts of an engine"

Page 21

Meronymy/Holonymy

Inheritance:

A finger is part of a hand
A hand is part of an arm
An arm is part of a body
=> A finger is part of a body

Page 22

Structure of WordNet (Nouns)

{conveyance; transport}
  hyperonym of:
    {vehicle}
      hyperonym of:
        {motor vehicle; automotive vehicle}
          hyperonym of:
            {car; auto; automobile; machine; motorcar}
              hyperonym of:
                {cruiser; squad car; patrol car; police car; prowl car}
                {cab; taxi; hack; taxicab}
              meronyms:
                {bumper}
                {car door}
                  meronyms: {hinge; flexible joint}, {doorlock}, {armrest}
                {car window}
                {car mirror}

Page 23

WordNet Data Model

[Diagram. The data model has three layers: the vocabulary of a language (word forms), concepts (numbered records with glosses), and relations between concepts.

Word forms and the concept records they point to (sense numbers omitted):
  bank -> rec 12345 (financial institute), rec 54321 (side of a river)
  fiddle, violin -> rec 9876 (small string instrument)
  fiddler, violist -> rec 65438 (musician playing violin)
  string -> rec 35576 (string of instrument), rec 29551 (subatomic particle)

Relations between records:
  rec 9876 (small string instrument) type-of rec 25876 (string instrument)
  rec 65438 (musician playing violin) type-of rec 42654 (musician)
  rec 35576 (string of instrument) part-of rec 9876 (small string instrument)]

Page 24

A bit of history (or: Did we just make this stuff up?)

1980s Research in Artificial Intelligence (AI):

How do humans store and access knowledge about concepts?

Knowledge about concepts is huge--it must be stored in an efficient and economical fashion

Hypothesis: concepts are interconnected via meaningful relations

Page 25

A bit of history

One hypothesis:

Knowledge about concepts is computed “on the fly” via access to general concepts

E.g., we know that “canaries fly” because

“birds fly” and “canaries are a kind of bird”

Page 26

A bit of history

animal (alive, breathes, moves, ...)
   |
bird (has feathers, lays eggs, ...)
   |
canary (yellow, sings, ...)

Page 27

A bit of history

Knowledge is stored at the highest possible node (animals move, birds fly, canaries sing)

Collins & Quillian (1969) measured reaction times to statements involving knowledge distributed across different “levels”

Page 28

A bit of history

People confirmed statements like

(1) bird lays eggs

faster than

(2) canary lays eggs

Hypothesis: (2) requires “look-up” at higher level, (1) doesn’t

Page 29

A bit of history

Collins & Quillian's results are not compelling. Reaction times to statements like "do canaries move?" are influenced by:
--prototypicality (robins are more typical birds than emus)
--word frequency (robin occurs more often than emu, and people recognize frequent words faster)
--uneven semantic distance across levels

But the idea inspired WordNet (1986), which asked: Can most/all of the lexicon be represented as a semantic network? Or are there unconnectable words and concepts?

Page 30

Adjective relations

Strong association between members of antonymous adjective pairs:

hot-cold, old-new, high-low, big-small,...

WordNet connects members of such pairs ("direct antonyms") as well as similar but less salient adjectives (e.g., cool, lukewarm, ...)

Page 31

Experimental Evidence

Reaction time measurements for semantic judgments (it takes less time to confirm that hot and cold are opposites than that hot and chilly are)

Weakened by: frequency, prototypicality effects

Page 32

Relations among verbs

Manner relation connects verbs like

move-walk-run-jog

communicate-talk-whisper

Relations reflecting temporal or logical order:

divorce-marry, snore-sleep, buy-pay

Manner relation builds “trees”

Page 33

WN as a lexical resource

“Have concept, need words”

--depart from synset, travel in WordNet space

“Have word, need concept”

--query word form, find associated synsets

Page 34

Is WordNet a Thesaurus?

• Yes:
--it groups together meaningfully related words

• No:
--it labels the relations
--the relations are limited
--related words are linked to specific concepts (disambiguated); a thesaurus is a "bag of words"
--many words linked in WordNet do not co-occur in the same thesaurus entry
--WordNet allows one to measure and quantify the semantic similarity or distance among words and concepts

Page 35

Web interface for WN search

• Noun
  S: (n) bicycle, bike, wheel, cycle (a wheeled vehicle that has two wheels and is moved by foot pedals)
    o direct hyponym / full hyponym
      S: (n) bicycle-built-for-two, tandem bicycle, tandem (a bicycle with two sets of pedals and two seats)
      S: (n) mountain bike, all-terrain bike, off-roader (a bicycle with a sturdy frame and fat tires; originally designed for riding in mountainous country)
      S: (n) ordinary, ordinary bicycle (an early bicycle with a very large front wheel and small back wheel)
      S: (n) push-bike (a bicycle that must be pedaled)
      S: (n) safety bicycle, safety bike (bicycle that has two wheels of equal size; pedals are connected to the rear wheel by a multiplying gear)
      S: (n) velocipede (any of several early bicycles with pedals on the front wheel)
    o part meronym
      S: (n) bicycle seat, saddle (a seat for the rider of a bicycle)
      S: (n) bicycle wheel (the wheel of a bicycle)
      S: (n) chain (a series of (usually metal) rings or links fitted into one another to make a flexible ligament)
      S: (n) coaster brake (a brake on a bicycle that engages with reverse pressure on the pedals)
      S: (n) handlebar (the shaped bar used to steer a bicycle)
      S: (n) kickstand (a swiveling metal rod attached to a bicycle or motorcycle or other two-wheeled vehicle; the rod lies horizontally when not in use but can be kicked into a vertical position as a support to hold the vehicle upright when it is not being ridden)

• Verb
  S: (v) bicycle, cycle, bike, pedal, wheel (ride a bicycle)

Page 36

WordNet as a lexical resource

• WN has been incorporated into many dictionaries

• Google “define” usually brings up WN entry at the top of the list

• User-created visual interfaces (e.g., visualthesaurus.com)

Page 37
Page 38

Back to the problem

• How does a computer find the right word associated with a given concept, or

• the right concept associated with a given word?

(This is a crucial step in information retrieval, text mining, document sorting, machine translation...)

Page 39

• Example:

John needed cash so he walked over to the bank

Which bank?

Money institution? (building/institution?)

Sloping land by the water?

Page 40

Word Sense Discrimination

Less difficult: homonymy (unrelated senses of a word form): bank (river bank/financial institution), bat (mammal/racquet), club (social organization/stick)

Systems perform very well/near-perfect
E.g., can rely on the "one sense per discourse" rule (Yarowsky)

Very difficult: polysemy (related senses of a word form): bank (institution vs. building)

Systems perform much worse
Need local context

Page 41

Approaches to WSD

• Knowledge-based (using resources like WordNet)

• Statistical (corpus-based clustering of senses; many use WordNet in addition)

• Supervised (train on manually disambiguated corpus)

• Unsupervised (“discover” senses)

Page 42

Knowledge-Based Approaches

Determine which sense in a lexical resource (like WordNet) a given token (occurrence) of a word represents

“Dumb” method: assume that the most frequently occurring sense of a polysemous word is the context-appropriate one

Works amazingly well: 65-70% correct for nouns

Frequency is relative to a human-annotated “gold standard” corpus
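The most-frequent-sense baseline is easy to reproduce with NLTK's WordNet interface, where synsets come back in WordNet sense-number order (which roughly reflects tagged frequency in the gold-standard corpus). A minimal sketch; the example word and the helper name are illustrative only.

# Most-frequent-sense (MFS) baseline using NLTK's WordNet interface.
# Setup (once): nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def mfs_baseline(word, pos=wn.NOUN):
    # NLTK lists synsets in WordNet sense-number order;
    # sense 1 is (approximately) the most frequently tagged sense.
    senses = wn.synsets(word, pos=pos)
    return senses[0] if senses else None

sense = mfs_baseline('bank')
if sense is not None:
    print(sense.name(), '-', sense.definition())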

Page 43

Less “dumb” approaches

Page 44

Basic Assumptions

• Natural language context is coherent (*Colorless green ideas sleep furiously)

• Words co-occurring in context are semantically related to one another

• Given word w1 with sense 1, determine which sense 1, 2, 3,…of word w2 is the context-appropriate one

• WN allows one to determine and measure meaning similarity among co-occurring words

Page 45

WordNet-Based Method

• Look for words in the vicinity (context) of the target word

• Find that sense of the target word that is related to the context words in WordNet (shared superordinate, parts, definition, etc.)

• Shortest path among candidate words often shows the intended sense

Bank/money institution and cash are linked (they share words in their definitions), so if cash and bank co-occur in a context, bank likely has the "financial institution" sense
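One simple way to operationalize this over Princeton WordNet is to score each candidate sense of the target by its similarity to the senses of nearby context words and keep the best-scoring one. A minimal sketch using NLTK's path similarity; the scoring scheme and the example context are illustrative, not the specific algorithm of any one system.

# Pick the sense of a target word that is most similar (by WordNet path
# similarity) to the senses of the surrounding context words.
from nltk.corpus import wordnet as wn

def disambiguate(target, context_words, pos=wn.NOUN):
    best_sense, best_score = None, -1.0
    for sense in wn.synsets(target, pos=pos):
        score = 0.0
        for word in context_words:
            # best similarity over the context word's senses; None means "no path"
            sims = [sense.path_similarity(s) for s in wn.synsets(word, pos=pos)]
            sims = [s for s in sims if s is not None]
            if sims:
                score += max(sims)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

best = disambiguate('bank', ['cash', 'loan', 'money'])
print(best, '-', best.definition() if best else 'no sense found')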

Page 46

Deriving meaning similarity metrics from WN’s structure

Shortest path length between concepts in the noun hierarchies (Leacock & Chodorow)

Problem: edges are not of equal length in natural language (i.e., semantic distance is not uniform)

  possession            elephant
      |                     |
white elephant       Indian elephant
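For reference, the Leacock & Chodorow measure scales the shortest path by the overall depth of the taxonomy. In the usual formulation, with len(c1, c2) the length of the shortest path between the two concepts and D the maximum depth of the noun hierarchy:

sim_LCh(c1, c2) = -log( len(c1, c2) / (2 * D) )

Concepts connected by a path that is short relative to the depth of the hierarchy come out as more similar. (NLTK exposes this measure as Synset.lch_similarity.)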

Page 47

Corrections for differences in edge lengths

• Scale by depth of hierarchies of the synsets whose similarity is measured (path from target to its root)

• Density of sub-hierarchy: intuition that words in denser hierarchies (more sisters) are very closely related to one another

• Types of link (IS-A, HAS-A, other): too many "changes of direction" reduce the score

Page 48

Information-based similarity measure (Resnik)

Intuition: similarity of two concepts is the degree to which they share information in common

Can be determined by the concepts’ relative position wrt the lowest common superordinate

Define “class”: all concepts below the lowest common superordinate

Page 49

Information-based similarity measure (Resnik)

           medium of exchange
            /               \
         money               \
           |                  \
         cash                  \
           |                    \
         coin                 credit
         /   \                   |
    nickel   dime           credit card

Classes: coin, medium of exchange

Page 50

Information-based similarity measure (Resnik)

Information content of a concept c:

IC(c) = -log p(c)

p(c) is the probability of encountering a token of concept c in a corpus.
The class notion entails that an occurrence of any member of a class counts as an occurrence of all class members.
The probability of a class is the sum of the counts of all its members divided by the total number of words in the corpus.

p increases as you go up the hierarchy
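Spelled out (following Resnik's formulation; words(c) is the set of words subsumed by concept c, count(w) the corpus frequency of w, and N the total number of word tokens in the corpus):

p(c) = ( sum of count(w) for all w in words(c) ) / N

IC(c) = -log p(c)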

Page 51

Information-based similarity measure (Resnik)

Similarity of a pair of concepts is determined by the least probable (most informative) class they belong to

sim_R(c1, c2) = -log p(lso(c1, c2))

(lso = lowest common superordinate of c1 and c2)
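Resnik similarity is available directly in NLTK, which ships pre-computed information-content files for several corpora. A minimal sketch; the Brown-corpus IC file is one standard option, and the synset identifiers assume WordNet 3.0 naming.

# Resnik similarity over WordNet, using NLTK's pre-computed information-content counts.
# Setup (once): nltk.download('wordnet'); nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # IC estimated from the Brown corpus

coin = wn.synset('coin.n.01')
dime = wn.synset('dime.n.01')
credit_card = wn.synset('credit_card.n.01')

# Pairs whose lowest common superordinate is more informative (less probable) score higher.
print(dime.res_similarity(coin, brown_ic))
print(dime.res_similarity(credit_card, brown_ic))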

Page 52

“Lesk” method

overlap of words in definitions of synsets is a measure of similarity

(Lesk: Why is a pine cone not like an ice cream cone? Because there’s no lexical overlap in their definitions!)
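A bare-bones version of the Lesk idea (NLTK also ships its own nltk.wsd.lesk): score each candidate sense by the overlap between its gloss and the words of the context, and return the best one. A sketch only; the tokenization and the example sentence are deliberately crude.

# Simplified Lesk: choose the sense whose gloss shares the most words with the context.
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk('cone', 'a pine cone fell from the tree'))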

Page 53

Knowledge-Lean Approaches

• Exploit distributional (contextual) properties of words in a corpus

• Context-based clustering (with/without WN support)

• Induce senses from clusters
• Use WN similarity to evaluate clusters

Page 54

Knowledge-Lean Approaches(McCarthy et al.)

• For each token of a target word, find words that are distributionally similar (based on corpus analysis)

• Nearest neighbors characterize the domain of the target

• Use WN relations and WN gloss overlap to measure similarity between target and its neighbors

• Sense of target that is most similar to words in the domain is predominant in that domain

• Target word (token) is assigned that WN sense

Page 55

Supervised systems

• Perform better than unsupervised ones
• WSD is a learning task
• Train classifiers on data annotated by humans ("gold standard")
• Each sense-tagged occurrence of a particular word is a feature vector used in learning
• Problem: hand-annotated data are sparse!

Page 56

Supervised learning with sparse data

• Start with a hand-annotated seed
• Determine contextual classifiers
• Augment contextual classifiers with WN similarity
• Reasoning: if baseball is a good discriminator for a sense of play, then football, hockey, etc. should also be good discriminators for that sense of play
• Use monosemous (single-sense) relatives only

Page 57

Similarity measures

Good reference:

Ted Pedersen: WordNet::Similarity

http://wn-similarity.sourceforge.net

Page 58

Back to MT

Now that we (think we) can discriminate word senses within one language, how do we find the corresponding senses in another language?

Page 59

WordNet(s) for Translation

Needed:

Wordnet(s) in the target language(s)

Page 60

Crosslinguistic WordNets

• Starting in the late 1990s, wordnets were built for languages other than English
• Genetically and typologically unrelated languages: Turkish, Hindi, Chinese, Korean, Basque, Zulu, Arabic, … (currently >60)

• Mapped to Princeton WordNet

www.globalwordnet.org

• Great potential for crosslinguistic applications

Page 61

Mapping words and synsets across multilingual WordNets

• The first set of foreign-language WNs ("EuroWordNet") was built with reference to Princeton WordNet
• Princeton WN serves as the hub ("interlingual index")
• Each synset in each WN was linked to a "record" (PWN synset identifier) in the index
• Crosslingual mapping of words and synsets proceeds via the index

Page 62

[Diagram: the EuroWordNet Inter-Lingual-Index. English synsets for vehicle, car, and train are linked to index records, which are in turn linked to top-level Domains (Transport: Road, Air, Water) and ontology concepts (Object, Device, TransportDevice), and to the corresponding words in the other wordnets:

Czech: dopravní prostředník, auto, vlak
French: véhicule, voiture, train
Estonian: liiklusvahend, auto, killavoor
German: Fahrzeug, Auto, Zug
Spanish: vehículo, auto, tren
Italian: veicolo, auto, treno
Dutch: voertuig, auto, trein]

Page 63

EWN Interlingual Index

The index is a flat list of synsets (relations were removed)

Relations are represented in each language-specific wordnet (incl. English WN, which "resurfaced" as one of the language-specific wordnets)

Page 64

Mismatches in multilingual WordNets

Concepts not lexicalized in English required new entries in the Interlingual Index (without an English synset):

--Arabic lexically distinguishes 12 kinds of cousin
--The index may refer to "son of father's brother", "daughter of mother's sister", etc.

An automatic system will likely choose the underspecified concept, "cousin." A human translator can decide to use "cousin" or a more specific paraphrase.

Page 65

Mismatches in multilingual WordNets

Conversely, some languages lack equivalents of English words:

--Dutch lacks a word for container but has words for kinds (hyponyms) of container (box, bag, bucket, ...)

The respective hierarchies reflect this difference:
Dutch: bag, box, ... are kinds of artifact
English: bag, box, ... are kinds of container, which is a kind of artifact

A translator may specify the kind of container

Page 66

English-Dutch snippet

English wordnet:
  object
    natural object
      body
    artifact, artefact (a man-made object)
      container
        bag
        box

Dutch wordnet:
  voorwerp {object}
    tas {bag}
    bak {box}
    lichaam {body}

Page 67

From EuroWordNet to Global WordNet

• Currently, wordnets exist for more than 60 languages, including:

• Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...

• Many languages are genetically and typologically unrelated

• http://www.globalwordnet.org

Page 68

• The more languages, the more mismatches

• Not all languages have the same lexical categories (N, V, Adj)

Page 69

Problems with ILI model

The ILI model requires many index entries for concepts that are not lexicalized in the ILI language (English)

The ILI is English-centric and may bias the construction of other wordnets (esp. those built with the "translation" method)

Page 70

Better model

Replace the ILI, which is based on a natural language, with a formal, language-independent ontology

Concepts are represented by axioms in first-order logic

Page 71

Top-level ontologies have been worked out by philosophers (SUMO, DOLCE, ...)

They strictly categorize and distinguish entities (objects, endurants), events (perdurants), and properties

Page 72

Ontology

• Mapping of several competing ontologies to WordNet(s)

• Suggested Upper Merged Ontology (SUMO; Niles and Pease)

Page 73

SUMO

• Upper-level, formal ontology (abstract concepts)
• 1K terms, 4K axioms
• MILO (mid-level ontology): several K more terms
• SUMO+MILO: 20K terms, 70K axioms
• Axioms are written in the SUO-KIF language (Knowledge Interchange Format)
• All terms manually mapped to WN (and WNs in other languages)

Page 74

Axiom for “earlier” in SUMO

(<=>
  (earlier ?INTERVAL1 ?INTERVAL2)
  (before (EndFn ?INTERVAL1) (BeginFn ?INTERVAL2)))

“An interval that precedes another; the ending of the first interval is before the beginning of the second interval”

? = variable

Fn=function that takes a time interval and returns the time point at the end of the interval

Ontology also has axioms for “before,” etc.

Page 75

“Clean” ontologies refer to essential properties

distinguish among rigid and non-rigid entities

allow reasoning, inferencing:

if x is a y, and y is a z, then x is a z

Page 76

An ontology differs from a wordnet

Wordnets represent how we use language, e.g., the word "cat".

Ontologies represent what it is to be a cat, e.g., the meta-property "rigidity".

Page 77

Rigidity

"Cat" is a rigid concept. "Pet" is a non-rigid concept.

A concept is rigid if it is essential to all of its instances.

Permanence – Fluffy is always a cat, not always a pet.
Necessity – Fluffy cannot stop being a cat. Fluffy can stop being a pet.

Page 78

Reasoning with rigidity

Fluffy was a cat.
Fluffy was not a pet on Monday.
Fluffy was a pet on Tuesday.
#Fluffy was not a cat on Tuesday.

Pet-hood is sensitive to time and circumstances
Cat-hood is not
Need to distinguish rigid and non-rigid in the ontology!

Page 79

Making ontological distinctions

[Diagram, two ways of modeling Fluffy:
  (a) Fluffy (Instance) falls under Animal (Rigid) and under Pet (Non-Rigid)
  (b) Fluffy (Instance) falls under Cat (Rigid), a kind of Animal, and under Pet (Non-Rigid)]

Page 80

“Rudify”

An ontology annotation tool currently being developed

Page 81

“Rudify”

Rudify learns to classify words collected from the web by means of lexical patterns

Associate the words with the appropriate concept

Distinguish rigid from non-rigid concepts

Page 82
Page 83

Lexical patterns that distinguish rigid from non-rigid concepts

X (would be | make) (a good | a bad) Y
X is no longer a(n) Y

vs.

Xs (such as | like) Ys
Xs and other Ys
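A toy illustration of the pattern idea (not the actual Rudify implementation): count how often rigidity-suggesting vs. non-rigidity-suggesting patterns match a candidate word in a text collection, then classify by majority. The pattern set follows the slide; the corpus text and the decision rule are assumptions made for the sketch.

# Toy rigid/non-rigid classifier based on the lexical patterns above.
# Hypothetical sketch, not the Rudify tool itself.
import re

NON_RIGID_PATTERNS = [
    r'\b\w+ (would be|would make|makes?) (a good|a bad) {y}\b',
    r'\b\w+ is no longer an? {y}\b',
]
RIGID_PATTERNS = [
    r'\b{y}s? (such as|like) \w+',
    r'\b\w+s and other {y}s\b',
]

def classify_rigidity(concept, corpus_text):
    """Return 'rigid' or 'non-rigid' for `concept` based on pattern counts in `corpus_text`."""
    def count(patterns):
        return sum(len(re.findall(p.format(y=re.escape(concept)), corpus_text, re.IGNORECASE))
                   for p in patterns)
    return 'rigid' if count(RIGID_PATTERNS) >= count(NON_RIGID_PATTERNS) else 'non-rigid'

text = ("Fluffy would make a good pet. Rex is no longer a pet. "
        "Cats such as Fluffy purr. Lions and other cats live in Africa.")
print(classify_rigidity('pet', text))   # 'non-rigid' on this toy text
print(classify_rigidity('cat', text))   # 'rigid' on this toy text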

Page 84

Training Rudify

100 selected words/concepts

contexts from Google

50 rigid, 50 non-rigid

manually annotated as +/- rigid

Page 85

Testing Rudify

Two test suites:

--297 Base Concepts (identified by the Spanish wordnet team)

--287 terms referring to Regions and Species

(common regions and species, Latin species)

Page 86

How well does Rudify do?

Rudify’s classification is compared with that of OntoClean

Out of the 287 Regions and Species terms, Rudify misclassified only three; in each case a rigid concept was misclassified as non-rigid

Page 87

Error Analysis

(1) Misclassification due to lexical pattern:

Wolf was classified as non-rigid

“The dog is no longer a wolf but a separate species”

(2) Misclassification due to polysemy/metaphor

wildcat (animal, gun, mascot)

Apollo (butterfly, god, space mission)