
Page 1: CPSC 503 Computational Linguistics


CPSC 503 Computational Linguistics

Computational Lexical Semantics
Lecture 12

Giuseppe Carenini

Page 2: CPSC 503 Computational Linguistics


Today 22/10

Three well-defined semantic tasks:
• Word Sense Disambiguation
  – Corpus and Thesaurus
• Word Similarity
  – Thesaurus and Corpus
• Semantic Role Labeling

Page 3: CPSC 503 Computational Linguistics


WSD example: table + ?? -> [1-6]

The noun "table" has 6 senses in WordNet.1. table, tabular array -- (a set of data …)2. table -- (a piece of furniture …)3. table -- (a piece of furniture with tableware…)4. mesa, table -- (flat tableland …)5. table -- (a company of people …)6. board, table -- (food or meals …)

Page 4: CPSC 503 Computational Linguistics


WSD methods

• Machine Learning
  – Supervised
  – Unsupervised
• Dictionary / Thesaurus (Lesk)

Page 5: CPSC 503 Computational Linguistics


Supervised ML Approaches to WSD

Training Data:
  ((word + context)_1, sense_1) … ((word + context)_n, sense_n)

Training Data → Machine Learning → Classifier

Classifier: (word + context) → sense

Page 6: CPSC 503 Computational Linguistics


Training Data Example

"…after the soup she had bass with a big salad…"   (the words around "bass" form the context)

One labeled training instance: ((word + context), sense)_i

The sense label can be, for example:
• One of the 8 possible senses for "bass" in WordNet
• One of the 2 key distinct senses for "bass" in WordNet (music vs. fish)

Page 7: CPSC 503 Computational Linguistics


WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with …)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed ………)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Page 8: CPSC 503 Computational Linguistics


Representations for Context

• GOAL: Informative characterization of the window of text surrounding the target word

• Supervised ML requires a very simple representation for the training data:

vectors of feature/value pairs

• TASK: Select relevant linguistic information and encode it as a feature vector

Page 9: CPSC 503 Computational Linguistics


Relevant Linguistic Information (1)

• Collocational: info about the words that appear in specific positions to the right and left of the target word

• Example text (WSJ):
  – An electric guitar and bass player stand off to one side, not really part of the scene, …

Assume a window of +/- 2 from the target

[guitar, NN, and, CJC, player, NN, stand, VVB]

[word in position -n, part-of-speech in position -n, …, word in position +n, part-of-speech in position +n]

Typically words and their POS
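A minimal sketch (not from the slides) of how such a collocational feature vector could be extracted from an already POS-tagged sentence; the function name, the padding token, and the tags for words outside the slide's example window are my own assumptions.

def collocational_features(tagged_tokens, target_index, window=2):
    """Return [w-2, pos-2, w-1, pos-1, w+1, pos+1, w+2, pos+2] around the target."""
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tagged_tokens):
            word, pos = tagged_tokens[i]
        else:
            word, pos = "<pad>", "<pad>"   # pad when the window runs off the sentence
        features.extend([word, pos])
    return features

# Slide example: "... an electric guitar and bass player stand off ..."
tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"), ("and", "CJC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# -> ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']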

Page 10: CPSC 503 Computational Linguistics


Relevant Linguistic Information (2)

• Co-occurrence: info about the words that occur anywhere in the window, regardless of position

• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)

Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]

• Example text (WSJ):

– An electric guitar and bass player stand off to one side not really part of the scene, …

[0,0,0,1,0,0,0,0,0,0,1,0]
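A minimal sketch (not from the slides) of building that co-occurrence count vector. The content words after "fly" are hypothetical fillers for the words the slide elides with "…".

CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod", "pound",
                 "double", "runs", "playing", "guitar", "band"]   # words after "fly" are assumed

def cooccurrence_vector(context_tokens):
    """Count how often each of the k chosen content words appears in the window."""
    counts = [0] * len(CONTENT_WORDS)
    for token in context_tokens:
        if token in CONTENT_WORDS:
            counts[CONTENT_WORDS.index(token)] += 1
    return counts

window = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(window))   # -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]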

Page 11: CPSC 503 Computational Linguistics


Training Data Examples

Let's assume: bass-music encoded as 0, bass-fish encoded as 1

Labeled collocational vectors (last element = sense):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[……]

Labeled co-occurrence vectors (last element = sense):
[0,0,0,1,0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0,0,0,1,1]
[……]

• Inputs to the classifier (unlabeled):
[guitar, NN, and, CJC, could, VM0, be, VVI]   [1,1,0,0,0,1,0,0,0,0,0,0]

Page 12: CPSC 503 Computational Linguistics


ML for Classifiers

Training Data (co-occurrence and collocational vectors) → Machine Learning → Classifier

Possible learners:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest-neighbor methods
• …

Page 13: CPSC 503 Computational Linguistics


Naïve Bayes

\hat{s} = \mathrm{argmax}_{s \in S} P(s \mid V)
        = \mathrm{argmax}_{s \in S} \frac{P(V \mid s)\, P(s)}{P(V)}
        \approx \mathrm{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)

Independence assumption: P(V \mid s) \approx \prod_{j=1}^{n} P(v_j \mid s)
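A minimal sketch (not from the slides) of the resulting decision rule, computed with log probabilities. The smoothing floor and the toy probability tables are illustrative assumptions.

import math

def naive_bayes_sense(features, prior, likelihood):
    """prior[s] = P(s); likelihood[s][v] = P(v | s). Pick argmax of log P(s) + sum_j log P(v_j | s)."""
    best_sense, best_score = None, float("-inf")
    for s in prior:
        score = math.log(prior[s])
        for v in features:
            score += math.log(likelihood[s].get(v, 1e-6))   # tiny floor as crude smoothing
        if score > best_score:
            best_sense, best_score = s, score
    return best_sense

# Toy, hand-made probabilities (hypothetical numbers, for illustration only)
prior = {"bass-music": 0.6, "bass-fish": 0.4}
likelihood = {"bass-music": {"guitar": 0.1, "player": 0.08, "fishing": 0.001},
              "bass-fish":  {"guitar": 0.001, "player": 0.01, "fishing": 0.12}}
print(naive_bayes_sense(["guitar", "player"], prior, likelihood))   # -> 'bass-music'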

Page 14: CPSC 503 Computational Linguistics


Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]
• Naïve Bayes and Neural Network achieved the highest performance
• 73% in assigning one of six senses to "line"
• Is this good?
• Simplest baseline: "most frequent sense"
• Ceiling: human inter-annotator agreement
  – 75%-80% on refined sense distinctions (WordNet)
  – Closer to 90% for binary distinctions

Page 15: CPSC 503 Computational Linguistics


Bootstrapping
• What if you don't have enough data to train a system…
• Start from seeds: seeds → small training data → Machine Learning → classifier → classify more data → add the newly classified data to the training data, and repeat

Page 16: CPSC 503 Computational Linguistics


Bootstrapping: how to pick the seeds

• Hand-labeling (Hearst 1991):
  – Likely correct
  – Likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense

Page 17: CPSC 503 Computational Linguistics


Unsupervised Methods [Schutze ’98]

Training Data: (word + vector)_1 … (word + vector)_n
  → Machine Learning (clustering) → K clusters c_i

Hand-labeling of the clusters: (c_1, sense_1) ……

Disambiguation: a new (word + vector) is assigned the sense of its closest cluster (vector/cluster similarity)

Page 18: CPSC 503 Computational Linguistics


Agglomerative Clustering

• Assign each instance to its own cluster
• Repeat
  – Merge the two clusters that are most similar
• Until the specified number of clusters is reached

• If there are too many training instances -> random sampling
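A minimal sketch (not from the slides) of this agglomerative loop over context vectors, using cosine similarity between cluster centroids; the centroid-linkage merge criterion is my assumption, since the slide does not specify one.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def centroid(cluster):
    n = len(cluster)
    return [sum(vec[i] for vec in cluster) / n for i in range(len(cluster[0]))]

def agglomerative(vectors, k):
    clusters = [[v] for v in vectors]               # start: one cluster per instance
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):              # find the most similar pair of clusters
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # merge the most similar pair
    return clusters

print(agglomerative([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], k=2))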

Page 19: CPSC 503 Computational Linguistics


Problems

• Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  – One for each ambiguous word in the language
• How do you decide what set of tags/labels/senses to use for a given word?
  – Depends on the application

Page 20: CPSC 503 Computational Linguistics


WSD: Dictionary and Thesaurus Methods

Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop-words

Def: the words in the gloss for a sense are called its signature

Page 21: CPSC 503 Computational Linguistics


Lesk: Example
Two senses for channel

S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"

S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels" …..

“most streets closed to the TV station were flooded because the main channel was clogged by heavy rain.”
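A minimal sketch (not from the slides) of the simplified Lesk procedure applied to this example; the stop-word list and the dictionary layout are ad-hoc assumptions. Note that on this sentence the gloss overlap ("TV", "station", "channel") favours S2, even though the intended sense is arguably S1, which is one motivation for the Corpus Lesk variant on the next slide.

STOP_WORDS = {"a", "the", "to", "of", "for", "and", "or", "its", "was", "were",
              "by", "in", "through", "with", "more", "than", "one"}

def lesk(context_sentence, sense_glosses):
    """sense_glosses: dict mapping sense id -> gloss (plus examples) as a string."""
    context = {w for w in context_sentence.lower().split() if w not in STOP_WORDS}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        signature = {w for w in gloss.lower().split() if w not in STOP_WORDS}
        overlap = len(context & signature)           # count shared non-stop words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

glosses = {
    "channel-S1": "a passage for water or other fluids to flow through "
                  "the fields were crossed with irrigation channels",
    "channel-S2": "a television station and its programs a satellite TV channel",
}
sentence = "most streets closed to the TV station were flooded because the main channel was clogged by heavy rain"
print(lesk(sentence, glosses))   # -> 'channel-S2'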

Page 22: CPSC 503 Computational Linguistics


Corpus Lesk
Best performer
• If a corpus with annotated senses is available
• For each sense: add all the words in the sentences containing that sense to the signature for that sense

CORPUS: …… "most streets closed to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain." ……

Page 23: CPSC 503 Computational Linguistics


WSD: More Recent Trends

• Better ML techniques (e.g., Combining Classifiers)

• Combining ML and Lesk

• Other Languages

• Building better/larger corpora

Page 24: CPSC 503 Computational Linguistics


Today 22/10

• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling

Page 25: CPSC 503 Computational Linguistics


Word Similarity
Actually a relation between two senses
Similarity vs. Relatedness: sun vs. moon – mouth vs. food – hot vs. cold

Applications?

• Thesaurus methods: measure distance in online thesauri (e.g., Wordnet)

• Distributional methods: finding if the two words appear in similar contexts

Page 26: CPSC 503 Computational Linguistics


WS: Thesaurus Methods (1)
• Path-length based similarity on hyper/hypo hierarchies

  sim_path(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)

• Information-content word similarity (not all edges are equal)

  P(c) = \frac{\sum_{c_i \in \mathrm{subsenses}(c)} \mathrm{count}(c_i)}{N}        (probability of a concept)

  IC(c) = -\log P(c)        (information content)

  LCS(c_1, c_2) = the lowest common subsumer of c_1 and c_2

  sim_resnik(c_1, c_2) = -\log P(LCS(c_1, c_2))

Page 27: CPSC 503 Computational Linguistics


WS: Thesaurus Methods (2)
• One of the best performers: the Jiang-Conrath distance

  dist_JC(c_1, c_2) = 2 \log P(LCS(c_1, c_2)) - (\log P(c_1) + \log P(c_2))

• This is a measure of distance. Reciprocal for similarity!

• See also Extended Lesk
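A minimal sketch (not from the slides) of computing these thesaurus-based measures with NLTK's WordNet interface; the Brown-corpus information-content file and the particular choice of senses are assumptions, and both the wordnet and wordnet_ic corpora must be downloaded first.

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")          # IC counts estimated from the Brown corpus

c1 = wn.synsets("bass", pos=wn.NOUN)[0]           # first noun sense of "bass" (version-dependent)
c2 = wn.synset("guitar.n.01")

print(c1.path_similarity(c2))                     # path-length based similarity
print(c1.res_similarity(c2, brown_ic))            # Resnik: -log P(LCS(c1, c2))
print(c1.jcn_similarity(c2, brown_ic))            # Jiang-Conrath distance, returned as a similarity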

Page 28: CPSC 503 Computational Linguistics


WS: Distributional Methods
• We may not have any thesaurus for the target language
• Even if we have a thesaurus:
  – Missing domain-specific (e.g., technical) words
  – Poor hyponym knowledge (for V) and nothing for Adj and Adv
  – Difficult to compare senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts

Page 29: CPSC 503 Computational Linguistics


WS Distributional Methods (1)

• Simple context: feature vector

  \vec{w} = (f_1, f_2, …, f_N), where f_i = how many times w_i appeared in the neighborhood of w (stop-list words excluded)

• More complex context: feature matrix

  A = [a_ij], where a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j

Page 30: CPSC 503 Computational Linguistics


WS Distributional Methods (2)

• More informative values (referred to as weights or measures of association in the literature)

• Pointwise Mutual Information

  assoc_PMI(w, f) = \log_2 \frac{P(w, f)}{P(w)\, P(f)}

• t-test

  assoc_t-test(w, f) = \frac{P(w, f) - P(w)\, P(f)}{\sqrt{P(w)\, P(f)}}
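A minimal sketch (not from the slides) of both association weights computed from raw counts; the count values in the example calls are hypothetical.

import math

def pmi(count_wf, count_w, count_f, total):
    """assoc_PMI(w, f) = log2( P(w,f) / (P(w) P(f)) ), with simple MLE estimates."""
    p_wf, p_w, p_f = count_wf / total, count_w / total, count_f / total
    return math.log2(p_wf / (p_w * p_f))

def t_test(count_wf, count_w, count_f, total):
    """assoc_t-test(w, f) = (P(w,f) - P(w) P(f)) / sqrt(P(w) P(f))."""
    p_wf, p_w, p_f = count_wf / total, count_w / total, count_f / total
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Hypothetical counts, for illustration only
print(pmi(count_wf=80, count_w=1000, count_f=2000, total=1_000_000))      # ~5.32
print(t_test(count_wf=80, count_w=1000, count_f=2000, total=1_000_000))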

Page 31: CPSC 503 Computational Linguistics


WS Distributional Methods (3)
• Similarity between vectors (not sensitive to extreme values)

Cosine (normalized dot product):

  sim_cosine(v, w) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\, |\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\, \sqrt{\sum_{i=1}^{N} w_i^2}} = \cos(\theta)

Jaccard ((weighted) number of overlapping features):

  sim_Jaccard(v, w) = \frac{\sum_{i=1}^{N} \min(v_i, w_i)}{\sum_{i=1}^{N} \max(v_i, w_i)}
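A minimal sketch (not from the slides) of the two vector-similarity measures on weighted feature vectors; the example vectors are hypothetical.

import math

def cosine(v, w):
    """Normalized dot product of two weight vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

def jaccard(v, w):
    """Weighted Jaccard: sum of element-wise minima over sum of element-wise maxima."""
    return sum(min(a, b) for a, b in zip(v, w)) / sum(max(a, b) for a, b in zip(v, w))

v = [2.0, 0.0, 1.5, 0.3]
w = [1.0, 0.4, 1.5, 0.0]
print(cosine(v, w), jaccard(v, w))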

Page 32: CPSC 503 Computational Linguistics


WS Distributional Methods (4)

• Best combination overall:
  – t-test for the weights
  – Jaccard (or Dice) for the vector similarity

Page 33: CPSC 503 Computational Linguistics


Today 22/10

• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling

Page 34: CPSC 503 Computational Linguistics


Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each predicate: determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate

Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position … and many others

Page 35: CPSC 503 Computational Linguistics


Semantic Role Labeling: Example

[issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …]

Page 36: CPSC 503 Computational Linguistics


Next Time

• Discourse and Dialog
• Overview of Chapters 21 and 24