An introduction to WSD and WSI, based on the talks given at ESSLLI 2010.
Introduction WSD WSI Evaluation and Issues Wikipedia Summary
Word Sense Disambiguation and Induction
Leon Derczynski
University of Sheffield
27 January 2011
Origin
Originally a course at ESSLLI 2010, Copenhagen, by Roberto Navigli and Simone Ponzetto
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
General Problem
Being able to disambiguate words in context is a crucial problem
Can potentially help improve many other NLP applications
Polysemy is everywhere – our job is to model this
Ambiguity is rampant.
I saw a man who is 98 years old and can still walk and tell jokes.
saw:26 man:11 years:4 old:8 can:5 still:4 walk:10 tell:8 jokes:3
43,929,600 possible sense combinations for this simple sentence.
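That count is just the product of the per-word sense counts listed above, which a one-liner confirms:

```python
from math import prod

# WordNet sense counts for each ambiguous word in the example sentence
sense_counts = [26, 11, 4, 8, 5, 4, 10, 8, 3]

combinations = prod(sense_counts)
print(combinations)  # 43929600
```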
Word Senses
Monosemous words – only one meaning; plant life, internet
Polysemous words – more than one meaning; bar, bass
A word sense is a commonly-accepted meaning of a word.
We are fond of fruit such as the kiwifruit and banana.
Enumerative Approach
A fixed sense inventory enumerates the range of possible meanings of a word
Context is used to select a particular sense
chop vegetables with a knife, was stabbed with a knife
However, we may want to add senses.
WSD Tasks
Different representations of senses change the way we think about WSD
Lexical sample – disambiguate a restricted set of words
All words – disambiguate all content words
Cross-lingual WSD – disambiguate a target word by labelling it with the appropriate translation in other languages; e.g. English coach → German Bus/Linienbus/Omnibus/Reisebus.
Representing the Context
Text is unstructured, and needs to be made machine-readable.
Flat representation (surface features) vs. structured representation (graphs, trees)
Local features: local context of a word usage, e.g. PoS tags and surrounding word forms
Topical features: general topic of a sentence or discourse, represented as a bag of words
Syntactic features: argument-head relations between the target and the rest of the sentence
Semantic features: previously established word senses
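A minimal sketch of how the flat features above might be collected; the feature names, window size, and example sentence are illustrative, not from the slides:

```python
from collections import Counter

def extract_features(tokens, pos_tags, target_idx, window=2):
    """Flat (surface) features for the word at target_idx: local features
    from a small window, plus a topical bag of words. A full system would
    add syntactic and semantic features on top."""
    feats = {}
    for offset in range(-window, window + 1):
        i = target_idx + offset
        if offset != 0 and 0 <= i < len(tokens):
            feats["word_%+d" % offset] = tokens[i]   # surrounding word forms
            feats["pos_%+d" % offset] = pos_tags[i]  # surrounding PoS tags
    feats["bow"] = Counter(t.lower() for t in tokens)  # topical bag of words
    return feats

# Hypothetical tagged sentence, with "bank" as the target word
tokens = ["He", "sat", "by", "the", "bank"]
pos_tags = ["PRP", "VBD", "IN", "DT", "NN"]
feats = extract_features(tokens, pos_tags, target_idx=4)
```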
Knowledge Resources
Structured and Unstructured
Thesauri, machine-readable dictionaries, semantic networks (WordNet)
BabelNet – Babel synsets, with semantic relations (is-a, part-of)
Raw corpora
Collocation (Web1T)
Applications
Information extraction – acronym expansion, disambiguating people's names, domain-specific IE
Information retrieval
Machine Translation
Semantic web
Question answering
Approaches
Supervised WSD: classification task, hand-labelled data
KB WSD: uses knowledge resources, no training
Unsupervised: performs WSI
Word sense dominance: find predominant sense of a word
Domain-driven WSD: use domain information as vectors to compare with the senses of w
Supervised WSD
Given a set of manually sense-annotated examples (a training set), learn a classifier
Features for WSD: bag of words, bigrams, collocations, VP and NP heads, PoS
Using WordNet as a sense inventory, SemCor is a readily available source of sense-labelled data
Current state-of-the-art (SotA) performance comes from SVMs
Knowledge-based WSD
Exploit knowledge resources (dictionaries, thesauri, collocations) to assign senses
Lower performance than supervised methods, but wider coverage
No need to train, or to tune to a task/domain
Gloss Overlap
Knowledge-based method proposed by Lesk (1986)
Retrieve all sense definitions of target word
Compare each sense definition with the definitions of other words in context
Choose the sense with the most overlap
To disambiguate pine cone:
pine: 1. a kind of evergreen tree; 2. to waste away through sorrow.
cone: 1. a solid body which narrows to a point; 2. something of this shape; 3. fruit of certain evergreen trees.
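The overlap count above can be sketched as follows, using the toy glosses from the slide; note that practical Lesk variants also filter out function words, which this sketch omits:

```python
def lesk(target_glosses, context_glosses):
    """Simplified Lesk (1986): choose the sense of the target word whose
    definition shares the most words with the definitions of the words
    in its context."""
    def words(text):
        return set(text.lower().replace(",", " ").replace(".", " ").split())

    context_words = set()
    for gloss in context_glosses:
        context_words |= words(gloss)

    best_sense, best_overlap = None, -1
    for sense, gloss in target_glosses.items():
        overlap = len(words(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Disambiguating "pine" in "pine cone"
pine = {1: "a kind of evergreen tree",
        2: "to waste away through sorrow"}
cone = ["a solid body which narrows to a point",
        "something of this shape",
        "fruit of certain evergreen trees"]
print(lesk(pine, cone))  # sense 1: its gloss shares "evergreen" with cone's
```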
Lexical Chains
Knowledge-based method proposed by Hirst and St-Onge (1998)
A lexical chain is a sequence of semantically related words in a text
Assign scores to senses based on the chain of related words each sense is in
PageRank
Knowledge-based method proposed by Agirre and Soroa (2009)
Build a graph including all synsets of words in the input text
Assign an initial low value to each node in the graph
Apply PageRank (Brin and Page, 1998) to the graph, and select the synsets with the highest PageRank score
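A minimal sketch of the idea using plain power iteration over a toy synset graph; the graph, node names, and parameter values are invented for illustration (Agirre and Soroa's system uses the full WordNet graph and a personalized variant of PageRank):

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to its list of
    neighbours (undirected: every edge appears in both adjacency lists)."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}  # initial low uniform value
    for _ in range(iterations):
        rank = {n: (1 - damping) / len(nodes)
                   + damping * sum(rank[m] / len(graph[m])
                                   for m in graph if n in graph[m])
                for n in nodes}
    return rank

# Toy graph: two synsets of "bank" linked to context-word synsets
graph = {
    "bank#river":   ["water", "shore", "fish"],
    "bank#finance": ["money"],
    "water":        ["bank#river", "fish"],
    "shore":        ["bank#river"],
    "fish":         ["bank#river", "water"],
    "money":        ["bank#finance"],
}
ranks = pagerank(graph)
# The river sense is better connected to the context, so it ranks higher
```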
Knowledge Acquisition Bottleneck
WSD needs knowledge! Corpora, dictionaries, semantic networks
More knowledge is required to improve the performance ofboth:
Supervised systems – more training data
Knowledge based systems – richer networks
Minimally Supervised WSD
Human supervision is expensive, but required for training examples or a knowledge base
Minimally supervised approaches aim to learn classifiers from annotated data with minimal human supervision
Bootstrapping
Given a set of labelled examples L, a set of unlabelled examples U, and a classifier c:
1. Choose N examples from U and add them to U′
2. Train c on L and label U′
3. Select the K most confidently labelled instances from U′ and assign them to L
Repeat until U is empty (or no confidently labelled instances remain)
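The loop above can be sketched with a toy classifier standing in for a real WSD model; the nearest-class-mean classifier over one-dimensional "feature values", and the numbers themselves, are illustrative only:

```python
def bootstrap(labelled, unlabelled, n=4, k=2):
    """Self-training sketch. labelled: list of (value, sense) pairs;
    unlabelled: list of values; confidence is closeness to a class mean."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    while unlabelled:
        batch, rest = unlabelled[:n], unlabelled[n:]          # step 1
        means = {}                                            # step 2: "train"
        for value, sense in labelled:
            means.setdefault(sense, []).append(value)
        means = {s: sum(vs) / len(vs) for s, vs in means.items()}
        scored = []                                           # label the batch
        for v in batch:
            sense = min(means, key=lambda s: abs(means[s] - v))
            scored.append((abs(means[sense] - v), v, sense))
        scored.sort()                                         # step 3
        labelled += [(v, s) for _, v, s in scored[:k]]
        unlabelled = rest + [v for _, v, _ in scored[k:]]     # rest return to U
    return labelled

seeds = [(0.0, "a"), (10.0, "b")]   # hand-labelled seed examples
result = bootstrap(seeds, [1.0, 2.0, 9.0, 8.5, 0.5, 9.9])
```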
Word Sense Induction
Based on the idea that one sense of a word will have similar neighbouring words
Follows the idea that the meaning of a word is given by its usage
We induce word senses from input text by clustering word occurrences
Clustering
Unsupervised machine learning for grouping similar objects together
No a priori input (sense labels)
Context clustering: each occurrence of a word is represented as a context vector; cluster the vectors into groups
Word clustering: cluster words which are semantically similar and thus share a specific meaning
Word Clustering
Aims to cluster words which are semantically similar
Lin (1998) proposes this method:
1. Extract dependency triples from a text corpus
John eats a yummy kiwi → (eat subj John), (kiwi obj-of eat), (kiwi det a) ...
2. Define a measure of similarity between two words
3. Use the similarity scores to create a similarity tree: start with a root node, and recursively add children in descending order of similarity.
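Steps 1 and 2 can be sketched as below; the Dice overlap used here is a simplification of Lin's measure, which weights each shared dependency feature by its information content, and the triples are invented for illustration:

```python
def dep_features(triples, word):
    """Dependency features of a word: the (relation, other-word) pairs it
    participates in, given triples of the form (head, relation, dependent)."""
    feats = set()
    for head, rel, dep in triples:
        if head == word:
            feats.add((rel, dep))
        if dep == word:
            feats.add((rel + "-of", head))
    return feats

def similarity(triples, w1, w2):
    """Dice overlap of dependency features (simplified stand-in for Lin's
    information-theoretic similarity)."""
    f1, f2 = dep_features(triples, w1), dep_features(triples, w2)
    if not f1 or not f2:
        return 0.0
    return 2 * len(f1 & f2) / (len(f1) + len(f2))

# Invented corpus triples: kiwis and apples occur in the same contexts
triples = [("eat", "obj", "kiwi"), ("eat", "obj", "apple"),
           ("peel", "obj", "kiwi"), ("peel", "obj", "apple"),
           ("drive", "obj", "car")]
print(similarity(triples, "kiwi", "apple"))  # 1.0
print(similarity(triples, "kiwi", "car"))    # 0.0
```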
Lin’s approach: example
WSI: pros and cons
+ Actually performs word sense disambiguation
+ Aims to divide the occurrences of a word into a number ofclasses
- Makes objective evaluation more difficult if not domain-specific
Disambiguation Evaluation
Disambiguation is easy to evaluate – we have discrete sense inventories
Evaluate with Coverage (proportion of answers given),
Precision and Recall, and their harmonic mean F1
Accuracy – correct answers / total answers
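These metrics can be computed as follows; the instance ids and sense labels are hypothetical, and the system is allowed to abstain by omitting an instance:

```python
def wsd_scores(answers, gold):
    """WSD evaluation: answers maps instance id -> predicted sense
    (instances the system abstains on are simply absent); gold maps
    instance id -> correct sense."""
    attempted = len(answers)
    correct = sum(1 for i, sense in answers.items() if gold.get(i) == sense)
    coverage = attempted / len(gold)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return coverage, precision, recall, f1

gold = {1: "bank#1", 2: "bank#2", 3: "bank#1", 4: "bank#2"}
answers = {1: "bank#1", 2: "bank#1", 3: "bank#1"}  # abstains on instance 4
coverage, precision, recall, f1 = wsd_scores(answers, gold)
```

Note that precision is measured over attempted instances only, while recall is measured over all gold instances, so abstaining lowers recall but not precision.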
Disambiguation Baselines
MFS – Most Frequent Sense
Strong baseline – 50-60% accuracy on the lexical sample task
Doesn't take genre into account (e.g. star in astrophysics / newswire)
Subject to the idiosyncrasies of the corpus
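The baseline itself is a one-liner over sense-annotated training counts (the counts below are invented):

```python
from collections import Counter

def mfs_baseline(training_senses):
    """Most Frequent Sense baseline: for each word, always predict the
    sense it takes most often in the sense-annotated training data."""
    return {word: counts.most_common(1)[0][0]
            for word, counts in training_senses.items()}

# Hypothetical training counts for one target word
training_senses = {"bank": Counter({"bank#finance": 7, "bank#river": 3})}
predictions = mfs_baseline(training_senses)
print(predictions["bank"])  # bank#finance
```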
Evaluation with gold-standard clustering
Given a gold-standard clustering, compare the gold standard and the output clustering
Can evaluate with cluster entropy and purity
Also the Rand Index (similar to the Jaccard coefficient) and F-Score.
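Purity, the simplest of these, can be sketched as follows; the item ids and gold senses are invented:

```python
from collections import Counter

def purity(clusters, gold_labels):
    """Cluster purity: each induced cluster is credited with its most
    frequent gold sense; purity is the fraction of items so accounted for."""
    total = sum(len(cluster) for cluster in clusters)
    majority = sum(Counter(gold_labels[i] for i in cluster).most_common(1)[0][1]
                   for cluster in clusters)
    return majority / total

# Two induced clusters over five occurrences with gold senses "a" and "b"
gold_labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
clusters = [[0, 1, 2], [3, 4]]
print(purity(clusters, gold_labels))  # (2 + 2) / 5 = 0.8
```

A degenerate one-item-per-cluster solution scores perfect purity, which is why purity is usually paired with entropy or an F-Score.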
Discrimination Baselines
All-in-one: group all words into one big cluster
Random: produce a random set of clusters
Pseudowords
Discrimination evaluation method
Generates new words with artificial ambiguity
Select two or more monosemous terms from gold standard data
Given all their occurrences in a corpus, replace them with a pseudoword formed by joining the terms
Compare automatic discrimination to gold standard
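The corpus transformation can be sketched as below; the terms and sentence are invented, and a real setup would work over tokenized corpora rather than a whitespace split:

```python
def make_pseudoword(corpus, term_a, term_b):
    """Replace every occurrence of two monosemous terms with one artificial
    ambiguous pseudoword, keeping the original term as the gold 'sense'."""
    pseudo = term_a + "_" + term_b
    gold, out_tokens = [], []
    for token in corpus.split():
        if token in (term_a, term_b):
            gold.append(token)        # hidden true sense of this occurrence
            out_tokens.append(pseudo)
        else:
            out_tokens.append(token)
    return " ".join(out_tokens), gold

text = "the banana was ripe but the door was locked"
new_text, gold = make_pseudoword(text, "banana", "door")
print(new_text)  # the banana_door was ripe but the banana_door was locked
```

A discrimination system is then asked to split the occurrences of `banana_door`, and its clusters are scored against the recorded gold senses.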
SemEval-2007
Lexical sample and all-words coarse-grained WSD
Preposition disambiguation
Evaluation of WSD on cross-language IR
WSI, lexical substitution
Top systems reach 88.7% accuracy (on the lexical sample) and 82.5% (on all-words)
SemEval-2010
Fifth event of its kind
Includes specific cross-lingual tasks
Combined WSI/WSD task
Domain-specific all-words task
Issues
Representation of word senses: enumerative vs. generative approach
Knowledge Acquisition Bottleneck: not enough data!
Benefits for AI/NLP applications
Alleviating the Knowledge Acquisition Bottleneck
Weakly-supervised algorithms, incorporating bootstrapping oractive learning
Continuing manual efforts – WordNet, Open Mind WordExpert, OntoNotes
Automatic enrichment of knowledge resources – collocationand relation triple extraction, BabelNet
Future Challenges
How can we mine even larger repositories of textual data – e.g. the whole web! – to create huge knowledge repositories?
How can we design high-performance, scalable algorithms to use this data?
We need to decide which kinds of word senses are needed for which application
We still need to develop a general representation of word senses
Wikipedia as sense inventory
Wikipedia articles provide an inventory of disambiguated word senses and entity references
Task: use their occurrences in texts, i.e. the internal Wikipedia hyperlinks, as named entity and sense annotations
The articles’ texts provide a sense annotated corpus
Mihalcea (2007)
Mihalcea proposes a method for automatically generating sense-tagged data using Wikipedia
Rhythm is the arrangement of sounds in time. Meter animates time in regular pulse groupings, called measures or [[bar (music)|bar]].
The nightlife is particularly active around the beachfront promenades because of its many nightclubs and [[bar (establishment)|bars]].
1. Extract all paragraphs in Wikipedia containing word w
2. Collect all possible labels l1..ln for w
3. Map each label li to its WordNet sense s
4. Annotate each occurrence of w labelled li with its sense s
A system trained on Wikipedia significantly outperforms the MFS and Lesk baselines
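The core of the data-collection step can be sketched as below: each hyperlinked occurrence of the target word yields a (context, sense label) example. The mapping from link targets to WordNet senses is a separate step not shown, and the paragraphs here are toy fragments modelled on the slide:

```python
import re

# Wiki links look like [[bar (music)|bars]]; the link target names the sense
LINK = re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]")

def sense_tagged_examples(paragraphs, word):
    """Collect (context, sense-label) pairs for `word` from wiki markup."""
    examples = []
    for paragraph in paragraphs:
        for match in LINK.finditer(paragraph):
            target, surface = match.group(1), match.group(2)
            if word in surface.lower():
                # strip link markup, keeping the surface text as context
                context = LINK.sub(r"\2", paragraph)
                examples.append((context, target.strip()))
    return examples

paragraphs = [
    "called measures or [[bar (music)|bars]].",
    "its many nightclubs and [[bar (establishment)|bars]].",
]
examples = sense_tagged_examples(paragraphs, "bar")
print(examples[0])  # ('called measures or bars.', 'bar (music)')
```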
Knowledge-rich WSD
The general aim is to relieve the knowledge acquisition bottleneck of NLP systems, with WSD as a case study
Main ideas:
- Extend WordNet with millions of semantic relations (using Wikipedia)
- Apply knowledge-based WSD to exploit the extended WordNet
Results: integrating large numbers of semantic relations into knowledge-based systems yields performance competitive with SotA supervised approaches
Wikification
The task of generating hyperlinks to disambiguated Wikipedia concepts
Two sub-tasks: automatic keyword extraction, WSD
Wikify!1 can perform keyword extraction by extracting candidates and then ranking them
The system does knowledge-based and data-driven WSD, filtering out annotations that contain disagreements
Disambiguate links using relatedness, commonness (prior probability of a sense), and context quality (context terms).
1 Csomai and Mihalcea (2008)
Questions
Thank you. Are there any questions?