Automatic Term Ambiguity Detection
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana R. Stanoi
IBM Research - Almaden
What is the buzz about Brave on Twitter?
Find tweets about the movie Brave:
Movie night watching brave with Cammie n Isla n loads munchies
This brave girl deserves endless retweets!
Watching brave with the kiddos!
watching Bregor playing Civ 5: Brave New World and thinking of getting it
Find tweets about the movie Skyfall 007:
Skyfall 007 in class with @MariaWiheelste
So I was dead set on seeing skyfall 007 for like a year
NowWatching #skyFall 007!
What movie amazed u — skyfall 007
Existing Disambiguation Methods
Word Sense Disambiguation (WSD) Which word sense does this instance refer to?
Named Entity Disambiguation (NED) Which entity type is this instance associated with?
Limitations:
− Assume the number of senses/entities is known, which is often not the case
− Inefficient on very large data sets, as they attempt to disambiguate each instance
Term Ambiguity Detection (TAD)
Perform disambiguation at the term level, not the instance level
Given a term T and its category C, do all mentions of the term reference a member of that category?
Determines the level of ambiguity of the term
Hybrid information extraction (IE) systems:
− Simpler model if the term is unambiguous
− More complex model otherwise
Potentially useful for other NLP tasks
Term Ambiguity Detection (TAD)
Input terms:

Term              Category
Brave             Movie
Skyfall 007       Movie
A New Beginning   Video Game
EOS 5D            Camera

TAD output, Ambiguous:

Term              Category
Brave             Movie
A New Beginning   Video Game

TAD output, Unambiguous:

Term              Category
Skyfall 007       Movie
EOS 5D            Camera
TAD Framework
Step 1: N-gram
Step 2: Ontology
Step 3: Clustering
Output: Ambiguous or Unambiguous
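The three modules run as a cascade: a term is flagged ambiguous as soon as one module fires, and unambiguous only if all three pass. A minimal sketch of that control flow, with stand-in detector functions (the check functions here are hypothetical placeholders, not the paper's implementation):

```python
def detect_ambiguity(term, checks):
    """Run ambiguity detectors in order; stop at the first that fires.

    `checks` is a list of functions term -> bool, e.g. the n-gram,
    ontology, and clustering modules described on the next slides.
    """
    for check in checks:
        if check(term):
            return "ambiguous"
    return "unambiguous"

# Toy usage with stand-in detectors
common_word = lambda term: term.lower() in {"brave", "up"}
always_no = lambda term: False
print(detect_ambiguity("Brave", [common_word, always_no]))   # ambiguous
print(detect_ambiguity("EOS 5D", [common_word, always_no]))  # unambiguous
```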
Step 1: N-gram. Does the term share a name with a common word/phrase?
1. Normalize the input term t (stopword removal + lowercasing)
2. Calculate its unigram probability
3. Ambiguous if the probability is above an empirically determined threshold
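The intuition: a term that coincides with common language (like "brave") is likely ambiguous. A rough sketch under simplifying assumptions: the product of per-word relative frequencies in a token corpus stands in for a proper n-gram language model, and the stopword list and threshold are illustrative, not the paper's values.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to"}  # illustrative stopword list

def ngram_ambiguous(term, corpus_tokens, threshold=1e-3):
    # 1. Normalize: lowercase and drop stopwords
    words = [w for w in term.lower().split() if w not in STOPWORDS]
    # 2. Unigram probability: product of per-word relative frequencies
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    prob = 1.0
    for w in words:
        prob *= counts[w] / total
    # 3. Ambiguous if the term is common enough as ordinary language
    return prob > threshold
```

With the toy threshold, a corpus in which "brave" is frequent flags "Brave" as ambiguous, while a term unseen in the corpus gets probability 0 and passes as unambiguous.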
Step 2: Ontology
• Wiktionary: ambiguous if the term has several senses in Wiktionary
• Wikipedia: ambiguous if the term has a Wikipedia disambiguation page
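In spirit, the ontology step is two lookups. The sketch below assumes the sense counts and disambiguation-page titles have already been extracted from Wiktionary/Wikipedia dumps into plain Python mappings; these data structures are hypothetical stand-ins, not an actual dump API.

```python
def ontology_ambiguous(term, wiktionary_sense_counts, wikipedia_disambig_titles):
    """wiktionary_sense_counts: dict of lowercased term -> number of senses.
    wikipedia_disambig_titles: set of lowercased titles that have a
    Wikipedia disambiguation page. Both assumed pre-extracted from dumps."""
    t = term.lower()
    if wiktionary_sense_counts.get(t, 0) > 1:  # several Wiktionary senses
        return True
    if t in wikipedia_disambig_titles:         # Wikipedia disambiguation page
        return True
    return False
```

A practical caveat the results section raises: many product terms appear in neither resource, so this module alone has low recall.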
Step 3: Clustering. Cluster the contexts in which the term appears
1. Remove stopwords and infrequent words from all documents containing the term
2. Cluster the documents using Latent Dirichlet Allocation (LDA)
3. Ambiguous if the category term or a WordNet synonym does not appear among the most heavily weighted terms of some cluster
Evaluation
Dataset: terms from 4 product domains: Movies, Video Games, Cameras, Books
− 100 terms per domain
− Extracted randomly from DBpedia and Flickr
Gold standard: ambiguity determined by examining usage in the TREC Tweets2011 corpus
− 10 tweets labeled per term
− Unambiguous only if all tweets reference the category
Questions to Answer
How effective is TAD?
How useful is TAD?
Results - Effectiveness
Each module produced above-baseline performance
Configuration Precision Recall F-measure
Majority Class 0.675 1.0 0.806
N-gram (NG) 0.979 0.848 0.909
Ontology (ON) 0.979 0.704 0.819
Clustering (CL) 0.946 0.848 0.895
NG + ON 0.980 0.919 0.948
NG + CL 0.942 0.963 0.952
ON + CL 0.945 0.956 0.950
All 0.943 0.978 0.960
Results - Effectiveness
The ontology method is of limited use, as most of the terms cannot be found in the ontology.
Results - Effectiveness
Each module produced above-baseline performance; the combined framework produced a high F-measure of 0.96
Results - Usefulness
− Integrated the TAD pipeline into a commercially available IE system
− Extracted mentions of terms from the Camera and Video Game domains on Twitter data
− Manually judged the relevance of the extracted tweets
Results - Usefulness
Using ambiguity detection hurt recall
− Only 57% of the relevant documents were returned with TAD
Ambiguity detection is necessary for high precision
− w/ ambiguity detection: precision 0.96
− w/o ambiguity detection: precision 0.16
Conclusion
Term ambiguity detection is helpful for large-scale information extraction
− Able to detect ambiguity when the number of senses is unknown
− Able to be applied to large datasets where instance-level interpretation is impractical
The 3-module TAD approach results in high performance
− Detects ambiguity with an F-measure of 0.96
− Allows the IE system to produce high precision
BACKUP
TAD Framework
− N-gram suggests non-referential instances
− Ontology suggests cross-domain instances
− Clustering suggests either case
Flow: each module answers Yes or No. Yes routes the term to Ambiguous Terms; No passes it to the next module (N-gram → Ontology → Clustering), and a final No from Clustering routes it to Unambiguous Terms.