
Page 1: CPSC 503 Computational Linguistics


CPSC 503 Computational Linguistics

Computational Lexical Semantics
Lecture 12

Giuseppe Carenini

Page 2: CPSC 503 Computational Linguistics


Today 22/10

Three well-defined semantic tasks:
• Word Sense Disambiguation
  – Corpus and Thesaurus
• Word Similarity
  – Thesaurus and Corpus
• Semantic Role Labeling

Page 3: CPSC 503 Computational Linguistics


WSD example: table + ?? -> [1-6]

The noun "table" has 6 senses in WordNet.1. table, tabular array -- (a set of data …)2. table -- (a piece of furniture …)3. table -- (a piece of furniture with tableware…)4. mesa, table -- (flat tableland …)5. table -- (a company of people …)6. board, table -- (food or meals …)

Page 4: CPSC 503 Computational Linguistics


WSD methods

• Machine Learning
  – Supervised
  – Unsupervised
• Dictionary / Thesaurus (Lesk)

Page 5: CPSC 503 Computational Linguistics


Supervised ML Approaches to WSD

Training Data:
  ((word + context)_1, sense_1) … ((word + context)_n, sense_n)

Training Data → Machine Learning → Classifier

Classifier: (word + context) → sense

Page 6: CPSC 503 Computational Linguistics


Training Data Example

"…after the soup she had bass with a big salad…"   (the words around "bass" form the context)

One labeled training instance: ((word + context), sense)_i

The sense label can be, for example:
• One of the 8 possible senses for "bass" in WordNet
• One of the 2 key distinct senses for "bass" in WordNet (music vs. fish)

Page 7: CPSC 503 Computational Linguistics


WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with …)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed ………)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Page 8: CPSC 503 Computational Linguistics


Representations for Context

• GOAL: Informative characterization of the window of text surrounding the target word

• Supervised ML requires a very simple representation for the training data:

vectors of feature/value pairs

• TASK: Select relevant linguistic information and encode it as a feature vector

Page 9: CPSC 503 Computational Linguistics


Relevant Linguistic Information (1)

• Collocational: info about the words that appear in specific positions to the right and left of the target word

• Example text (WSJ):
  – An electric guitar and bass player stand off to one side, not really part of the scene, …

Assume a window of +/- 2 from the target

[guitar, NN, and, CJC, player, NN, stand, VVB]

[word in position -n, part-of-speech in position -n, …, word in position +n, part-of-speech in position +n]

Typically words and their POS
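A minimal sketch (not from the slides) of how such a collocational feature vector could be extracted from an already POS-tagged sentence; the function name, the padding token, and the tags for words outside the slide's example window are my own assumptions.

def collocational_features(tagged_tokens, target_index, window=2):
    """Return [w-2, pos-2, w-1, pos-1, w+1, pos+1, w+2, pos+2] around the target."""
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tagged_tokens):
            word, pos = tagged_tokens[i]
        else:
            word, pos = "<pad>", "<pad>"   # pad when the window runs off the sentence
        features.extend([word, pos])
    return features

# Slide example: "... an electric guitar and bass player stand off ..."
tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"), ("and", "CJC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# -> ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']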

Page 10: CPSC 503 Computational Linguistics


Relevant Linguistic Information (2)

• Co-occurrence: info about the words that occur anywhere in the window, regardless of position

• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)

Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]

• Example text (WSJ):

– An electric guitar and bass player stand off to one side not really part of the scene, …

[0,0,0,1,0,0,0,0,0,0,1,0]
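A minimal sketch (not from the slides) of building that co-occurrence count vector. The content words after "fly" are hypothetical fillers for the words the slide elides with "…".

CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod", "pound",
                 "double", "runs", "playing", "guitar", "band"]   # words after "fly" are assumed

def cooccurrence_vector(context_tokens):
    """Count how often each of the k chosen content words appears in the window."""
    counts = [0] * len(CONTENT_WORDS)
    for token in context_tokens:
        if token in CONTENT_WORDS:
            counts[CONTENT_WORDS.index(token)] += 1
    return counts

window = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(window))   # -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]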

Page 11: CPSC 503 Computational Linguistics


Training Data Examples

Let's assume: bass-music encoded as 0, bass-fish encoded as 1

Labeled collocational vectors (last element = sense):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[……]

Labeled co-occurrence vectors (last element = sense):
[0,0,0,1,0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0,0,0,1,1]
[……]

• Inputs to the classifier (unlabeled):
[guitar, NN, and, CJC, could, VM0, be, VVI]   [1,1,0,0,0,1,0,0,0,0,0,0]

Page 12: CPSC 503 Computational Linguistics


ML for Classifiers

Training Data (co-occurrence and collocational vectors) → Machine Learning → Classifier

Possible learners:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest-neighbor methods
• …

Page 13: CPSC 503 Computational Linguistics


Naïve Bayes

\hat{s} = \mathrm{argmax}_{s \in S} P(s \mid V)
        = \mathrm{argmax}_{s \in S} \frac{P(V \mid s)\, P(s)}{P(V)}
        \approx \mathrm{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)

Independence assumption: P(V \mid s) \approx \prod_{j=1}^{n} P(v_j \mid s)
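A minimal sketch (not from the slides) of the resulting decision rule, computed with log probabilities. The smoothing floor and the toy probability tables are illustrative assumptions.

import math

def naive_bayes_sense(features, prior, likelihood):
    """prior[s] = P(s); likelihood[s][v] = P(v | s). Pick argmax of log P(s) + sum_j log P(v_j | s)."""
    best_sense, best_score = None, float("-inf")
    for s in prior:
        score = math.log(prior[s])
        for v in features:
            score += math.log(likelihood[s].get(v, 1e-6))   # tiny floor as crude smoothing
        if score > best_score:
            best_sense, best_score = s, score
    return best_sense

# Toy, hand-made probabilities (hypothetical numbers, for illustration only)
prior = {"bass-music": 0.6, "bass-fish": 0.4}
likelihood = {"bass-music": {"guitar": 0.1, "player": 0.08, "fishing": 0.001},
              "bass-fish":  {"guitar": 0.001, "player": 0.01, "fishing": 0.12}}
print(naive_bayes_sense(["guitar", "player"], prior, likelihood))   # -> 'bass-music'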

Page 14: CPSC 503 Computational Linguistics


Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]
• Naïve Bayes and Neural Network achieved the highest performance
• 73% in assigning one of six senses to "line"
• Is this good?
• Simplest baseline: "most frequent sense"
• Ceiling: human inter-annotator agreement
  – 75%-80% on refined sense distinctions (WordNet)
  – Closer to 90% for binary distinctions

Page 15: CPSC 503 Computational Linguistics


Bootstrapping
• What if you don't have enough data to train a system…
• Start from seeds: seeds → small training data → Machine Learning → classifier → classify more data → add the newly classified data to the training data, and repeat

Page 16: CPSC 503 Computational Linguistics


Bootstrapping: how to pick the seeds

• Hand-labeling (Hearst 1991):
  – Likely correct
  – Likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense

Page 17: CPSC 503 Computational Linguistics


Unsupervised Methods [Schutze ’98]

Training Data: (word + vector)_1 … (word + vector)_n
  → Machine Learning (clustering) → K clusters c_i

Hand-labeling of the clusters: (c_1, sense_1) ……

Disambiguation: a new (word + vector) is assigned the sense of its closest cluster (vector/cluster similarity)

Page 18: CPSC 503 Computational Linguistics


Agglomerative Clustering

• Assign each instance to its own cluster
• Repeat
  – Merge the two clusters that are most similar
• Until the specified number of clusters is reached

• If there are too many training instances -> random sampling
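A minimal sketch (not from the slides) of this agglomerative loop over context vectors, using cosine similarity between cluster centroids; the centroid-linkage merge criterion is my assumption, since the slide does not specify one.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def centroid(cluster):
    n = len(cluster)
    return [sum(vec[i] for vec in cluster) / n for i in range(len(cluster[0]))]

def agglomerative(vectors, k):
    clusters = [[v] for v in vectors]               # start: one cluster per instance
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):              # find the most similar pair of clusters
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # merge the most similar pair
    return clusters

print(agglomerative([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], k=2))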

Page 19: CPSC 503 Computational Linguistics


Problems

• Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  – One for each ambiguous word in the language
• How do you decide what set of tags/labels/senses to use for a given word?
  – Depends on the application

Page 20: CPSC 503 Computational Linguistics


WSD: Dictionary and Thesaurus Methods

Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop-words

Def: the words in the gloss for a sense are called its signature

Page 21: CPSC 503 Computational Linguistics


Lesk: Example
Two senses for channel

S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"

S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels" …..

“most streets closed to the TV station were flooded because the main channel was clogged by heavy rain.”
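A minimal sketch (not from the slides) of the simplified Lesk procedure applied to this example; the stop-word list and the dictionary layout are ad-hoc assumptions. Note that on this sentence the gloss overlap ("TV", "station", "channel") favours S2, even though the intended sense is arguably S1, which is one motivation for the Corpus Lesk variant on the next slide.

STOP_WORDS = {"a", "the", "to", "of", "for", "and", "or", "its", "was", "were",
              "by", "in", "through", "with", "more", "than", "one"}

def lesk(context_sentence, sense_glosses):
    """sense_glosses: dict mapping sense id -> gloss (plus examples) as a string."""
    context = {w for w in context_sentence.lower().split() if w not in STOP_WORDS}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        signature = {w for w in gloss.lower().split() if w not in STOP_WORDS}
        overlap = len(context & signature)           # count shared non-stop words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

glosses = {
    "channel-S1": "a passage for water or other fluids to flow through "
                  "the fields were crossed with irrigation channels",
    "channel-S2": "a television station and its programs a satellite TV channel",
}
sentence = "most streets closed to the TV station were flooded because the main channel was clogged by heavy rain"
print(lesk(sentence, glosses))   # -> 'channel-S2'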

Page 22: CPSC 503 Computational Linguistics


Corpus Lesk
Best performer
• If a corpus with annotated senses is available
• For each sense: add all the words in the sentences containing that sense to the signature for that sense

CORPUS: …… "most streets closed to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain." ……

Page 23: CPSC 503 Computational Linguistics


WSD: More Recent Trends

• Better ML techniques (e.g., Combining Classifiers)

• Combining ML and Lesk

• Other Languages

• Building better/larger corpora

Page 24: CPSC 503 Computational Linguistics


Today 22/10

• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling

Page 25: CPSC 503 Computational Linguistics


Word Similarity
Actually a relation between two senses
Similarity vs. Relatedness: sun vs. moon – mouth vs. food – hot vs. cold

Applications?

• Thesaurus methods: measure distance in online thesauri (e.g., Wordnet)

• Distributional methods: finding if the two words appear in similar contexts

Page 26: CPSC 503 Computational Linguistics


WS: Thesaurus Methods (1)
• Path-length based similarity on hyper/hypo hierarchies

  sim_path(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)

• Information-content word similarity (not all edges are equal)

  P(c) = \frac{\sum_{c_i \in \mathrm{subsenses}(c)} \mathrm{count}(c_i)}{N}        (probability of a concept)

  IC(c) = -\log P(c)        (information content)

  LCS(c_1, c_2) = the lowest common subsumer of c_1 and c_2

  sim_resnik(c_1, c_2) = -\log P(LCS(c_1, c_2))

Page 27: CPSC 503 Computational Linguistics


WS: Thesaurus Methods (2)
• One of the best performers: the Jiang-Conrath distance

  dist_JC(c_1, c_2) = 2 \log P(LCS(c_1, c_2)) - (\log P(c_1) + \log P(c_2))

• This is a measure of distance. Reciprocal for similarity!

• See also Extended Lesk
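A minimal sketch (not from the slides) of computing these thesaurus-based measures with NLTK's WordNet interface; the Brown-corpus information-content file and the particular choice of senses are assumptions, and both the wordnet and wordnet_ic corpora must be downloaded first.

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")          # IC counts estimated from the Brown corpus

c1 = wn.synsets("bass", pos=wn.NOUN)[0]           # first noun sense of "bass" (version-dependent)
c2 = wn.synset("guitar.n.01")

print(c1.path_similarity(c2))                     # path-length based similarity
print(c1.res_similarity(c2, brown_ic))            # Resnik: -log P(LCS(c1, c2))
print(c1.jcn_similarity(c2, brown_ic))            # Jiang-Conrath distance, returned as a similarity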

Page 28: CPSC 503 Computational Linguistics


WS: Distributional Methods
• We may not have any thesaurus for the target language
• Even if we have a thesaurus:
  – Missing domain-specific (e.g., technical) words
  – Poor hyponym knowledge (for V) and nothing for Adj and Adv
  – Difficult to compare senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts

Page 29: CPSC 503 Computational Linguistics


WS Distributional Methods (1)

• Simple context: feature vector

  \vec{w} = (f_1, f_2, …, f_N), where f_i = how many times w_i appeared in the neighborhood of w (stop-list words excluded)

• More complex context: feature matrix

  A = [a_ij], where a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j

Page 30: CPSC 503 Computational Linguistics


WS Distributional Methods (2)

• More informative values (referred to as weights or measures of association in the literature)

• Pointwise Mutual Information

  assoc_PMI(w, f) = \log_2 \frac{P(w, f)}{P(w)\, P(f)}

• t-test

  assoc_t-test(w, f) = \frac{P(w, f) - P(w)\, P(f)}{\sqrt{P(w)\, P(f)}}
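A minimal sketch (not from the slides) of both association weights computed from raw counts; the count values in the example calls are hypothetical.

import math

def pmi(count_wf, count_w, count_f, total):
    """assoc_PMI(w, f) = log2( P(w,f) / (P(w) P(f)) ), with simple MLE estimates."""
    p_wf, p_w, p_f = count_wf / total, count_w / total, count_f / total
    return math.log2(p_wf / (p_w * p_f))

def t_test(count_wf, count_w, count_f, total):
    """assoc_t-test(w, f) = (P(w,f) - P(w) P(f)) / sqrt(P(w) P(f))."""
    p_wf, p_w, p_f = count_wf / total, count_w / total, count_f / total
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Hypothetical counts, for illustration only
print(pmi(count_wf=80, count_w=1000, count_f=2000, total=1_000_000))      # ~5.32
print(t_test(count_wf=80, count_w=1000, count_f=2000, total=1_000_000))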

Page 31: CPSC 503 Computational Linguistics


WS Distributional Methods (3)
• Similarity between vectors (not sensitive to extreme values)

Cosine (normalized dot product):

  sim_cosine(v, w) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\, |\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\, \sqrt{\sum_{i=1}^{N} w_i^2}} = \cos(\theta)

Jaccard ((weighted) number of overlapping features):

  sim_Jaccard(v, w) = \frac{\sum_{i=1}^{N} \min(v_i, w_i)}{\sum_{i=1}^{N} \max(v_i, w_i)}
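A minimal sketch (not from the slides) of the two vector-similarity measures on weighted feature vectors; the example vectors are hypothetical.

import math

def cosine(v, w):
    """Normalized dot product of two weight vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

def jaccard(v, w):
    """Weighted Jaccard: sum of element-wise minima over sum of element-wise maxima."""
    return sum(min(a, b) for a, b in zip(v, w)) / sum(max(a, b) for a, b in zip(v, w))

v = [2.0, 0.0, 1.5, 0.3]
w = [1.0, 0.4, 1.5, 0.0]
print(cosine(v, w), jaccard(v, w))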

Page 32: CPSC 503 Computational Linguistics


WS Distributional Methods (4)

• Best combination overall:
  – t-test for the weights
  – Jaccard (or Dice) for the vector similarity

Page 33: CPSC 503 Computational Linguistics


Today 22/10

• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling

Page 34: CPSC 503 Computational Linguistics


Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each predicate: determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate

Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position … and many others

Page 35: CPSC 503 Computational Linguistics


Semantic Role Labeling: Example

[issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …]

Page 36: CPSC 503 Computational Linguistics


Next Time

• Discourse and Dialog
• Overview of Chapters 21 and 24