CIS630 1
Penn
Different Sense Granularities
Martha Palmer, Olga Babko-Malaya
September 20, 2004

Statistical Machine Translation results
CHINESE TEXT
The japanese court before china photo trade huge & lawsuit.
A large amount of the proceedings before the court dismissed workers.
japan’s court, former chinese servant industrial huge disasters lawsuit.
Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

Outline
MT example
Sense tagging
Issues highlighted by Senseval1
Senseval2
Groupings, impact on ITA
Automatic WSD, impact on scores

WordNet - Princeton
On-line lexical reference (dictionary)
Words organized into synonym sets <=> concepts
Hypernyms (ISA), antonyms, meronyms (PART)
Useful for checking selectional restrictions (doesn’t tell you what they should be)
Typical top nodes - 5 out of 25
  (act, action, activity)
  (animal, fauna)
  (artifact)
  (attribute, property)
  (body, corpus)
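
The point that hypernym (ISA) links help check selectional restrictions, without saying what those restrictions should be, can be illustrated with a minimal sketch. The hierarchy below is hand-written toy data rather than real WordNet content, and `HYPERNYM`, `isa`, and the restriction on the subject of "bark" are hypothetical names and assumptions introduced only for illustration.

```python
# Toy ISA hierarchy: child -> parent. Hand-written for illustration only;
# it is NOT extracted from WordNet.
HYPERNYM = {
    "dog": "canine",
    "canine": "animal",
    "hammer": "tool",
    "tool": "artifact",
}

def isa(word, ancestor):
    """Return True if `ancestor` is reachable from `word` via hypernym links."""
    node = word
    while node is not None:
        if node == ancestor:
            return True
        node = HYPERNYM.get(node)
    return False

# Hypothetical restriction supplied from outside the hierarchy:
# the subject of "bark" must be an animal. The hierarchy can check
# the restriction but does not tell us what it should be.
print(isa("dog", "animal"))     # True
print(isa("hammer", "animal"))  # False
```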

WordNet – president, 6 senses
1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE … LEADER
2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE
3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE
4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER LEADER
5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR … LEADER
6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP POSITION

Limitations to WordNet
Poor inter-annotator agreement (73%)
Just sense tags - no representations
  Very little mapping to syntax
  No predicate argument structure
  No selectional restrictions
No generalizations about sense distinctions
No hierarchical entries

SIGLEX98/SENSEVAL
Workshop on Word Sense Disambiguation
54 attendees, 24 systems, 3 languages
34 words (nouns, verbs, adjectives)
Both supervised and unsupervised systems
Training data, test data
Hector senses - very corpus based (mapping to WordNet)
Lexical samples - instances, not running text
Replicability over 90%, ITA 85%
ACL-SIGLEX98, SIGLEX99, CHUM00
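
The slides report inter-annotator agreement (ITA) figures without saying how they were computed. One common and simple measure is average pairwise percent agreement; the sketch below assumes that measure, and the function name and the tag assignments are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Average fraction of instances on which each pair of annotators agrees.

    `annotations` maps annotator name -> list of sense tags, one per instance.
    """
    pairs = list(combinations(annotations.values(), 2))
    total = 0.0
    for a, b in pairs:
        matches = sum(1 for x, y in zip(a, b) if x == y)
        total += matches / len(a)
    return total / len(pairs)

# Hypothetical tag assignments for five instances of one word.
tags = {
    "annotator_1": ["1", "1.1", "2", "2.1", "1"],
    "annotator_2": ["1", "1.1", "2", "2", "1"],
}
print(pairwise_agreement(tags))  # 0.8
```

Chance-corrected measures such as kappa are an alternative when sense distributions are heavily skewed.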

Hector - bother, 10 senses
1. intransitive verb - (make an effort), after negation, usually with to infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. “About 70 percent of the shareholders did not bother to vote at all.”
1.1 (can't be bothered), idiomatic, be unwilling to make the effort needed (to do something). Ex. “The calculations needed are so tedious that theorists cannot be bothered to do them.”
2. vi; after neg; with “about” or “with”; rarely cont – (of a person) to concern oneself (about something or someone). “He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor.”
2.1 v-passive; with “about” or “with” - (of a person) to be concerned about or interested in (something). “The only thing I'm bothered about is the well-being of the club.”

Mismatches between lexicons: Hector - WordNet, shake

VERBNET

VerbNet/WordNet

Mapping WN-Hector via VerbNet
SIGLEX99, LREC00

SENSEVAL2 – ACL’01
Adam Kilgarriff, Phil Edmonds and Martha Palmer
All-words task: Czech, Dutch, English, Estonian
Lexical sample task: Basque, Chinese, English, Italian, Japanese, Korean, Spanish, Swedish

English Lexical Sample - Verbs
Preparation for Senseval 2
  manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
  WordNet (pre-release version 1.7)
To handle unclear sense distinctions
  detect and eliminate redundant senses
  detect and cluster closely related senses
NOT ALLOWED

WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") -> LABEL

WordNet – call, 28 senses (cont.)
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") -> UTTER
6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") -> MEET

Groupings Methodology
Double blind groupings, adjudication
Syntactic criteria (VerbNet was useful)
  Distinct subcategorization frames
    call him a bastard
    call him a taxi
  Recognizable alternations – regular sense extensions:
    play an instrument
    play a song
    play a melody on an instrument

Groupings Methodology (cont.)
Semantic criteria
  Differences in semantic classes of arguments
    Abstract/concrete, human/animal, animate/inanimate, different instrument types, …
  Differences in entailments
    Change of prior entity or creation of a new entity?
  Differences in types of events
    Abstract/concrete/mental/emotional/…
  Specialized subject domains

WordNet – call, 28 senses, groups
WN2, WN13, WN28    WN15, WN26
WN3, WN19    WN4, WN7, WN8, WN9
WN1, WN22
WN20, WN25
WN18, WN27
WN5, WN16    WN6, WN23
WN12
WN17, WN11    WN10, WN14, WN21, WN24
Group labels: Phone/radio, Label, Loud cry, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid

WordNet – call, 28 senses, Group 1
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL
19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE
22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

Sense Groups: verb ‘develop’
Group 1: WN1, WN2
Group 2: WN3, WN4
Group 3: WN5, WN9, WN10, WN14, WN20
Group 4: WN6, WN7, WN8, WN11, WN12, WN13, WN19

Groups 1 and 2 of Develop
Group | Sense No. | Gloss | Hypernym
1 – Abstract | WN1 | Products, or mental creations | create
1 – Abstract | WN2 | Mental creations – “new theory”; Gradually unfold – “the plot …” | create
2 – New (property) | WN3 | Personal attribute – “a passion for …” | change
2 – New (property) | WN4 | Physical characteristic – “a beard” | change

Group 3 of Develop
Group | Sense No. | Gloss | Hypernym
3 – New (self) | WN5 | Originate – “new religious movement” | become
3 – New (self) | WN9 | Gradually unfold – “the plot …” | occur
3 – New (self) | WN10 | Grow – “a flower developed …” | grow
3 – New (self) | WN14 | Mature – “The child developed …” | change
3 – New (self) | WN20 | Happen – “report the news as it …” | occur

Group 4 of Develop
Group | Sense No. | Gloss | Hypernym
4 – Improve item | WN6 | Resources – “natural resources” | improve
4 – Improve item | WN7 | Ideas – “ideas in your thesis” | theorize
4 – Improve item | WN8 | Train animate beings – “violinists” | teach
4 – Improve item | WN11 | Civilize – “developing countries” | change
4 – Improve item | WN12 | Make, grow – “develop the grain” | change
4 – Improve item | WN13 | Business – “develop the market” | generate
4 – Improve item | WN19 | Music – “develop the melody” | complicate

Maximum Entropy WSD
Hoa Dang (in progress)
Maximum entropy framework
  combines different features with no assumption of independence
  estimates conditional probability that W has sense X in context Y (where Y is a conjunction of linguistic features)
  feature weights are determined from training data
  weights produce a maximum entropy probability distribution
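
As a rough illustration of how such a model turns weighted features into p(sense | context), here is a minimal sketch of the standard log-linear form. The weights, feature names, and sense labels are hypothetical and hand-set; in the framework described above they would be estimated from training data, and this is not the system's actual code.

```python
import math

def maxent_probs(active_features, weights, senses):
    """p(sense | context) = exp(sum of weights of active (feature, sense) pairs) / Z."""
    scores = {
        s: math.exp(sum(weights.get((f, s), 0.0) for f in active_features))
        for s in senses
    }
    z = sum(scores.values())  # normalizer over all candidate senses
    return {s: v / z for s, v in scores.items()}

# Hypothetical weights; in practice they are estimated from training data.
weights = {
    ("object=taxi", "request"): 1.2,
    ("object=taxi", "label"): -0.3,
    ("has_second_object", "label"): 0.8,
    ("has_second_object", "request"): 0.5,
}
context = ["object=taxi", "has_second_object"]
probs = maxent_probs(context, weights, ["label", "request"])
print(max(probs, key=probs.get))  # request
```

Because the weights are simply summed per sense, overlapping features can be combined without assuming they are independent.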

Features used
Topical contextual linguistic feature for W:
  presence of automatically determined keywords in S
Local contextual linguistic features for W:
  presence of subject, complements
  words in subject, complement positions, particles, preps
  noun synonyms and hypernyms for subjects, complements
  named entity tag (PERSON, LOCATION, ..) for proper Ns
  words within +/- 2 word window
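
The last local feature, words within a +/- 2 word window, is easy to sketch. The function name and the `w[offset]=word` feature encoding below are assumptions made for illustration; the example sentence is one of the WordNet examples for "call" quoted earlier.

```python
def window_features(tokens, target_index, size=2):
    """Return position-tagged words within +/- `size` tokens of the target word."""
    feats = []
    for offset in range(-size, size + 1):
        i = target_index + offset
        if offset != 0 and 0 <= i < len(tokens):
            feats.append(f"w[{offset:+d}]={tokens[i].lower()}")
    return feats

sentence = "She called her children lazy and ungrateful".split()
print(window_features(sentence, sentence.index("called")))
# ['w[-1]=she', 'w[+1]=her', 'w[+2]=children']
```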

Maximum Entropy WSD
Hoa Dang, Senseval2 Verbs (best)
Maximum entropy framework, p(sense|context)
Contextual linguistic features
  Topical feature for W: +2.5%
    keywords (determined automatically)
  Local syntactic features for W: +1.5 to +5%
    presence of subject, complements, passive?
    words in subject, complement positions, particles, preps, etc.
  Local semantic features for W: +6%
    Semantic class info from WordNet (synsets, etc.)
    Named Entity tag (PERSON, LOCATION, ..) for proper Ns
    words within +/- 2 word window

Results - first 5 Senseval2 verbs
Verb | Begin | Call | Carry | Develop | Draw | Dress
WN/corpus | 10/9 | 28/14 | 39/22 | 21/16 | 35/21 | 15/8
Grp/corp | 10/9 | 11/7 | 16/11 | 9/6 | 15/9 | 7/4
Entropy | 1.76 | 3.68 | 3.97 | 3.17 | 4.60 | 2.89
ITA-fine | .812 | .693 | .607 | .678 | .767 | .865
ITA-coarse | .814 | .892 | .753 | .852 | .825 | 1.00
MX-fine | .832 | .470 | .379 | .493 | .366 | .610
MX-coarse | .832 | .636 | .485 | .681 | .512 | .898
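
The slides do not define the Entropy row. Assuming it is the Shannon entropy, in bits, of each verb's sense distribution in the tagged corpus, it could be computed as in the sketch below. The function name and the tagged instances are hypothetical.

```python
import math
from collections import Counter

def sense_entropy(tags):
    """Shannon entropy (bits) of the empirical sense distribution."""
    counts = Counter(tags)
    n = len(tags)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical tagged instances: four senses, equally frequent.
print(sense_entropy(["s1", "s2", "s3", "s4"]))  # 2.0
# A skewed distribution over the same four senses has lower entropy.
print(sense_entropy(["s1"] * 7 + ["s2", "s3", "s4"]) < 2.0)  # True
```

Under this reading, a verb with many roughly equally frequent senses scores high, while a verb dominated by one sense scores low even if its sense inventory is large.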

Results – averaged over 28 verbs
Measure | Total
WN/corpus | 16.28/10.83
Grp/corp | 8.07/5.90
Entropy | 2.81
ITA-fine | 71%
ITA-coarse | 82%
MX-fine | 59%
MX-coarse | 69%

Grouping improved sense identification for MxWSD
75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
Most commonly confused senses suggest grouping:
(1) name, call -- assign a specified proper name to; “They called their son David”
(2) call -- ascribe a quality to or give a name that reflects a quality; “He called me a bastard”
(3) call -- consider or regard as being; “I would not call her beautiful”
(4) address, call -- greet, as with a prescribed form, title, or name; “Call me Mister”; “She calls him by his first name”
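
To see why confusions among these four senses disappear under grouping, here is a minimal scoring sketch. The Label group membership (WN1, WN3, WN19, WN22) comes from the Group 1 slide; the gold tags, the system output, and the function names are hypothetical. Note that this only rescores fixed predictions, whereas the 75% figure above comes from training and testing on grouped senses, which this sketch does not reproduce.

```python
# Sense group taken from the slides: WN1, WN3, WN19 and WN22 of "call"
# form the "Label" group. Senses not listed are treated as their own group.
GROUP_OF = {"WN1": "Label", "WN3": "Label", "WN19": "Label", "WN22": "Label"}

def coarsen(sense):
    """Map a fine-grained sense tag to its group, or to itself if ungrouped."""
    return GROUP_OF.get(sense, sense)

def accuracy(gold, predicted, mapper=lambda s: s):
    """Fraction of instances whose (optionally mapped) tags match."""
    hits = sum(1 for g, p in zip(gold, predicted) if mapper(g) == mapper(p))
    return hits / len(gold)

# Hypothetical gold tags and system output for six instances of "call".
gold      = ["WN1", "WN3", "WN19", "WN2", "WN22", "WN3"]
predicted = ["WN3", "WN3", "WN1",  "WN2", "WN5",  "WN19"]

print(accuracy(gold, predicted))           # fine-grained: 2 of 6 correct
print(accuracy(gold, predicted, coarsen))  # coarse-grained: 5 of 6 correct
```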

Criteria to split Framesets
Semantic classes of arguments, such as animacy vs. inanimacy
Serve 01. Act, work
  Group 1: function (His freedom served him well)
  Group 2: work (He served in Congress)

Criteria to split Framesets
Semantic type of event (abstract vs. concrete)
See 01. View
  Group 1: Perceive by sight (Can you see the bird?)
  Group 5: determine, check (See whether it works)

Overlap with PropBank Framesets
WN5, WN16, WN12    WN15, WN26
WN3, WN19    WN4, WN7, WN8, WN9
WN1, WN22
WN20, WN25
WN18, WN27
WN2, WN13    WN6, WN23
WN28
WN17, WN11    WN10, WN14, WN21, WN24
Group labels: Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid

Overlap between Senseval2 Groups and Framesets – 95%
WN1 WN2 WN3 WN4
WN6 WN7 WN8 WN5 WN9 WN10
WN11 WN12 WN13 WN14
WN19 WN20
Frameset1
Frameset2
develop

Framesets → Groups → WordNet
WN1 WN2
WN9 WN8
WN3 WN4 WN12 WN5 WN16
WN18 WN14 WN7
WN15
WN10 WN6 WN13
Frameset1 Frameset2
drop
WN11
Frameset3

Translations of Develop groups
Group | Sense No. | Portuguese | German
G4 | WN13 markets | desenvolver | entwickeln
G1 | WN1 products | desenvolver | entwickeln
G1 | WN2 ways | desenvolver | entwickeln
G2 | WN2 theory | desenvolver | entwickeln
G2 | WN3 understanding | desenvolver | bilden
G4 | WN2 character | desenvolver | ausbilden
G3 | WN10 bacteria | desenvolver-se | bilden sich
G3 | WN5 movements | desenvolver-se | bilden

Translations of Develop groups
Group | Sense No. | Chinese | Korean
G4 | WN13 markets | kai1-fa1 | hyengsengha-ta
G1 | WN1 products | kai1-fa1 | kaypalha-ta
G1 | WN2 ways | fa1-zhan3 | palcensikhi-ta
G2 | WN2 theory | pei2-yang3-chu1 | palcensikhi-ta
G2 | WN3 understanding | pei2-yang3 | yangsengha-ta
G4 | WN2 character | pei2-yang3 | yangsengha-ta
G3 | WN10 bacteria | fa1-yu4 | paltalha-ta
G3 | WN5 movements | xing2-cheng2 | hyengsengtoy-ta

An Example of Mapping: verb ‘serve’
Assignment: Do you agree?
Frameset id = serve.01
serve 01: Act, work
Roles:
  Arg0: worker
  Arg1: job, project
  Arg2: employer
Sense Groups
  GROUP 1: WN1 (function), WN3 (contribute to), WN12 (answer)
  GROUP 2: WN2 (do duty), WN13 (do military service)
  GROUP 3: WN4 (be used by), WN8 (serve well), WN14 (service)
  GROUP 5: WN7 (devote one’s efforts), WN10 (attend to)
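
The three-level mapping on this slide (frameset, sense group, WordNet sense) can be held in a nested dictionary. The sketch below copies the group memberships listed above; the names `HIERARCHY` and `locate` are hypothetical.

```python
# Frameset -> sense groups -> WordNet senses, as listed on the slide for "serve".
HIERARCHY = {
    "serve.01": {
        "GROUP 1": ["WN1", "WN3", "WN12"],
        "GROUP 2": ["WN2", "WN13"],
        "GROUP 3": ["WN4", "WN8", "WN14"],
        "GROUP 5": ["WN7", "WN10"],
    }
}

def locate(sense):
    """Return (frameset, group) for a WordNet sense, or None if unlisted."""
    for frameset, groups in HIERARCHY.items():
        for group, senses in groups.items():
            if sense in senses:
                return frameset, group
    return None

print(locate("WN13"))  # ('serve.01', 'GROUP 2')
print(locate("WN5"))   # None
```

A lookup like this lets a fine-grained tag be backed off to a group or a frameset when the finer distinction is unreliable.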

Frameset Tagging Results: overall accuracy 90%* (baseline 73.5%)
Verb | Framesets | Instances | Accuracy
call | 11 | 522 | 0.835
carry | 4 | 195 | 0.933
develop | 2 | 240 | 0.938
draw | 3 | 94 | 0.926
leave | 3 | 147 | 0.762
pull | 6 | 88 | 0.784
serve | 2 | 150 | 0.967
use | 2 | 820 | 0.988
work | 7 | 398 | 0.955
* Gold Standard parses
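
Per-verb accuracies can be summarized in two ways: a macro average that weights every verb equally, and a micro average that weights by instance count. The sketch below applies both to the nine rows listed in the table. These rows need not reproduce the 90% headline figure, which may cover more verbs, and the slides do not say which kind of average it is.

```python
# (verb, instances, accuracy) rows copied from the table above.
ROWS = [
    ("call", 522, 0.835), ("carry", 195, 0.933), ("develop", 240, 0.938),
    ("draw", 94, 0.926), ("leave", 147, 0.762), ("pull", 88, 0.784),
    ("serve", 150, 0.967), ("use", 820, 0.988), ("work", 398, 0.955),
]

# Macro: every verb counts equally. Micro: every instance counts equally.
macro = sum(acc for _, _, acc in ROWS) / len(ROWS)
micro = sum(n * acc for _, n, acc in ROWS) / sum(n for _, n, _ in ROWS)

print(f"macro average: {macro:.3f}")  # 0.899
print(f"micro average: {micro:.3f}")  # 0.922
```

The micro average is pulled up by "use", which has the most instances and the highest accuracy.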

Sense Hierarchy
PropBank Framesets – ITA 94%
  coarse grained distinctions
  20 Senseval2 verbs w/ > 1 Frameset
  Maxent WSD system, 73.5% baseline, 90% accuracy
Sense Groups (Senseval-2) – ITA 82% (now 89%)
  Intermediate level (includes Levin classes) – 69%
WordNet – ITA 71%
  fine grained distinctions, 60.2%

Summary of WSD
Choice of features is more important than choice of machine learning algorithm
  Importance of syntactic structure (English WSD but not Chinese)
  Importance of dependencies
Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages.