
Word Sense Disambiguation and Induction



An introduction to WSD and WSI, based on the talks given at ESSLLI 2010.


Page 1: Word Sense Disambiguation and Induction

Introduction WSD WSI Evaluation and Issues Wikipedia Summary

Word Sense Disambiguation and Induction

Leon Derczynski

University of Sheffield

27 January 2011

Leon Derczynski University of Sheffield

Word Sense Disambiguation and Induction

Page 2: Word Sense Disambiguation and Induction


Origin

Originally a course at ESSLLI 2010, Copenhagen, by Roberto Navigli and Simone Ponzetto

Page 3: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 4: Word Sense Disambiguation and Induction

General Problem

Being able to disambiguate words in context is a crucial problem

Can potentially help improve many other NLP applications

Polysemy is everywhere – our job is to model this

Ambiguity is rampant.

I saw a man who is 98 years old and can still walk and tell jokes.

saw:26 man:11 years:4 old:8 can:5 still:4 walk:10 tell:8 jokes:3

43 929 600 possible senses for this simple sentence.
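The combinatorics can be checked directly: the number of possible readings is the product of the per-word sense counts. A quick sketch in Python, using the figures quoted on the slide:

```python
from math import prod

# Per-word sense counts for the sentence above (figures from the slide).
senses = {"saw": 26, "man": 11, "years": 4, "old": 8, "can": 5,
          "still": 4, "walk": 10, "tell": 8, "jokes": 3}
print(prod(senses.values()))  # 43929600
```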


Page 5: Word Sense Disambiguation and Induction

Word Senses

Monosemous words – only one meaning; plant life, internet

Polysemous words – more than one meaning; bar, bass

A word sense is a commonly-accepted meaning of a word.

We are fond of fruit such as the kiwifruit and banana.

Page 6: Word Sense Disambiguation and Induction

Enumerative Approach

Fixed sense inventory enumerates the range of possible meanings of a word

Context is used to select a particular sense

chop vegetables with a knife, was stabbed with a knife

However, we may want to add senses.

Page 7: Word Sense Disambiguation and Induction

WSD Tasks

Different representations of senses change the way we think about WSD

Lexical sample – disambiguate a restricted set of words

All words – disambiguate all content words

Cross-lingual WSD – disambiguate a target word by labelling it with the appropriate translation in other languages; e.g. English coach → German Bus/Linienbus/Omnibus/Reisebus.

Page 8: Word Sense Disambiguation and Induction

Representing the Context

Text is unstructured, and needs to be made machine-readable.

Flat representation (surface features) vs. structured representation (graphs, trees)

Local features: local context of a word usage, e.g. PoS tags and surrounding word forms

Topical features: general topic of a sentence or discourse, represented as a bag of words

Syntactic features: argument-head relations between target and rest of sentence

Semantic features: previously established word senses

Page 9: Word Sense Disambiguation and Induction

Knowledge Resources

Structured and Unstructured

Thesauri, machine-readable dictionaries, semantic networks (WordNet)

BabelNet – Babel synsets, with semantic relations (is-a, part-of)

Raw corpora

Collocation (Web1T)

Page 10: Word Sense Disambiguation and Induction

Applications

Information extraction – acronym expansion, disambiguate people names, domain-specific IE

Information retrieval

Machine Translation

Semantic web

Question answering

Page 11: Word Sense Disambiguation and Induction

Approaches

Supervised WSD: classification task, hand-labelled data

KB WSD: uses knowledge resources, no training

Unsupervised: performs WSI

Word sense dominance: find predominant sense of a word

Domain-driven WSD: use domain information as vectors to compare with senses of w

Page 12: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 13: Word Sense Disambiguation and Induction

Supervised WSD

Given a set of manually sense-annotated examples (training set), learn a classifier

Features for WSD: bag of words, bigrams, collocations, VP and NP heads, PoS

Using WordNet as a sense inventory, SemCor is a readily available source of sense-labelled data

Current state-of-the-art (SotA) performance comes from SVMs
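A minimal sketch of what such features might look like for one target occurrence. The sentence, window size, and feature names are illustrative; a real system would also add PoS tags from a tagger:

```python
# Illustrative feature extraction for supervised WSD: bag-of-words
# context features plus immediate collocations around the target word.
def extract_features(tokens, target_index, window=2):
    lo = max(0, target_index - window)
    context = (tokens[lo:target_index]
               + tokens[target_index + 1:target_index + 1 + window])
    feats = {f"bow={w}": 1 for w in context}      # bag-of-words window
    for offset in (-1, 1):                        # left/right collocations
        j = target_index + offset
        if 0 <= j < len(tokens):
            feats[f"colloc[{offset}]={tokens[j]}"] = 1
    return feats

tokens = "he sat by the bank of the river".split()
feats = extract_features(tokens, tokens.index("bank"))
```

The feature dictionary can then be fed to any classifier over sparse binary features, such as an SVM.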

Page 14: Word Sense Disambiguation and Induction

Knowledge-based WSD

Exploit knowledge resources (dictionaries, thesauri, collocations) to assign senses

Lower performance than supervised methods, but wider coverage

No need to train or be tuned to a task/domain

Page 15: Word Sense Disambiguation and Induction

Gloss Overlap

Knowledge-based method proposed by Lesk (1986)

Retrieve all sense definitions of target word

Compare each sense definition with the definitions of other words in context

Choose the sense with the most overlap

To disambiguate pine cone:

pine: 1. a kind of evergreen tree; 2. to waste away through sorrow.

cone: 1. a solid body which narrows to a point; 2. something of this shape; 3. fruit of certain evergreen trees.
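The gloss-overlap idea can be sketched directly with the slide's dictionary entries; tokenisation and glosses are simplified for illustration:

```python
# Simplified Lesk: pick the sense whose gloss shares the most
# words with the context (here, the gloss of the neighbouring word).
def lesk(target_glosses, context_words):
    best_sense, best_overlap = None, -1
    context = set(context_words)
    for sense, gloss in target_glosses.items():
        overlap = len(set(gloss.lower().split()) & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

pine_glosses = {
    "pine#1": "a kind of evergreen tree",
    "pine#2": "to waste away through sorrow",
}
# Context: the third gloss of the neighbouring word "cone"
cone_context = "fruit of certain evergreen trees".split()
print(lesk(pine_glosses, cone_context))  # pine#1, via "evergreen"/"of"
```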

Page 16: Word Sense Disambiguation and Induction

Lexical Chains

Knowledge-based method proposed by Hirst and St Onge (1998)

A lexical chain is a sequence of semantically related words in a text

Assign scores to senses based on the chains of related words they appear in

Page 17: Word Sense Disambiguation and Induction

PageRank

Knowledge-based method proposed by Agirre and Soroa (2009)

Build a graph including all synsets of words in the input text

Assign an initial low value to each node in the graph

Apply PageRank (Brin and Page) to the graph, and select synsets with the highest PR
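A toy sketch of the idea: run PageRank power iteration over a small synset graph and keep the best-ranked sense of bank. The graph and its edges are invented for illustration; Agirre and Soroa run a personalised variant over the full WordNet graph.

```python
# Plain power-iteration PageRank over an adjacency-list graph.
def pagerank(graph, damping=0.85, iters=50):
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, neighbours in graph.items():
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                new[m] += share
        rank = new
    return rank

# Hypothetical synset graph; edges stand for semantic relations.
graph = {
    "bank#river":   ["water#1"],
    "water#1":      ["bank#river", "river#1"],
    "river#1":      ["water#1"],
    "bank#finance": ["money#1"],
    "money#1":      ["bank#finance"],
}
rank = pagerank(graph)
print(max(["bank#river", "bank#finance"], key=rank.get))  # bank#finance
```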

Page 18: Word Sense Disambiguation and Induction

Knowledge Acquisition Bottleneck

WSD needs knowledge! Corpora, dictionaries, semantic networks

More knowledge is required to improve the performance of both:

Supervised systems – more training data

Knowledge based systems – richer networks

Page 19: Word Sense Disambiguation and Induction

Minimally Supervised WSD

Human supervision is expensive, but required for training examples or a knowledge base

Minimally supervised approaches aim to learn classifiers from annotated data with minimal human supervision

Page 20: Word Sense Disambiguation and Induction

Bootstrapping

Given a set of labelled examples L, a set of unlabelled examples U, and a classifier c:

1. Choose N examples from U and add them to U′

2. Train c on L and label U′

3. Select the K most confidently labelled instances from U′ and assign them to L

Repeat until U is empty
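The loop can be sketched as follows. The classifier interface (`fit`/`predict_with_confidence`) and the toy 1-NN stand-in are hypothetical, not a real WSD model:

```python
class NearestLabelClassifier:
    """Toy 1-NN over numbers, standing in for a real WSD classifier."""
    def fit(self, labelled):
        self.examples = list(labelled)   # (example, label) pairs
        return self
    def predict_with_confidence(self, x):
        label, dist = min(((lab, abs(x - ex)) for ex, lab in self.examples),
                          key=lambda t: t[1])
        return label, 1.0 / (1.0 + dist)

def bootstrap(classifier, labelled, unlabelled, n=10, k=5):
    while unlabelled:
        pool = unlabelled[:n]                      # 1. choose N examples (U')
        classifier.fit(labelled)                   # 2. train c on L, label U'
        scored = [(classifier.predict_with_confidence(x), x) for x in pool]
        scored.sort(key=lambda t: t[0][1], reverse=True)
        for (label, conf), x in scored[:k]:        # 3. move K most confident to L
            labelled.append((x, label))
            unlabelled.remove(x)
    return classifier.fit(labelled)                # repeat until U is empty

labelled = [(0, "A"), (10, "B")]
clf = bootstrap(NearestLabelClassifier(), labelled, [1, 2, 8, 9], n=2, k=1)
```

Numbers near 0 end up labelled "A" and numbers near 10 labelled "B", mimicking how confident self-labelled examples propagate a seed annotation.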

Page 21: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 22: Word Sense Disambiguation and Induction

Word Sense Induction

Based on the idea that one sense of a word will have similar neighbouring words

Follows the idea that the meaning of a word is given by its usage

We induce word senses from input text by clustering word occurrences

Page 23: Word Sense Disambiguation and Induction

Clustering

Unsupervised machine learning for grouping similar objects together

No a priori input (sense labels)

Context clustering: each occurrence of a word is represented as a context vector; cluster vectors into groups

Word clustering: cluster words which are semantically similar and thus have a specific meaning
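The context-clustering idea can be sketched with bag-of-words vectors and a greedy cosine-similarity grouping; the sentences and the 0.2 threshold are invented for illustration:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_contexts(context_vectors, threshold=0.2):
    """Greedy single-link clustering: join the most similar cluster."""
    clusters, labels = [], []
    for vec in context_vectors:
        best, best_sim = None, threshold
        for i, members in enumerate(clusters):
            sim = max(cosine(vec, m) for m in members)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None:
            clusters.append([vec])
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(vec)
            labels.append(best)
    return labels

occurrences = [
    "caught a huge bass in the lake",
    "the bass guitar line was loud",
    "fishing for bass in the lake",
    "played bass guitar on the record",
]
# Context vector = bag of words minus the target word itself.
vectors = [Counter(s.split()) - Counter(["bass"]) for s in occurrences]
print(cluster_contexts(vectors))  # [0, 1, 0, 1]
```

The two induced clusters correspond to the fish and music senses of bass.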

Page 24: Word Sense Disambiguation and Induction

Word Clustering

Aims to cluster words which are semantically similar

Lin (1998) proposes this method:

1. Extract dependency triples from a text corpus

John eats a yummy kiwi → (eat subj John), (kiwi obj-of eat), (kiwi det a) ...

2. Define a measure of similarity between two words

3. Use similarity scores to create a similarity tree; start with a root node, and recursively add children in descending order of similarity.
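Step 2 can be sketched as overlap between dependency-triple feature sets. For brevity this weights features uniformly (giving a Dice-style coefficient); Lin (1998) weights features by mutual information. The triples below are invented:

```python
# Dice-style similarity over dependency-triple feature sets, a
# uniform-weight stand-in for Lin's information-theoretic measure.
def lin_similarity(features_a, features_b):
    if not features_a or not features_b:
        return 0.0
    shared = features_a & features_b
    return 2 * len(shared) / (len(features_a) + len(features_b))

kiwi   = {("obj-of", "eat"), ("det", "a"), ("mod", "yummy")}
apple  = {("obj-of", "eat"), ("det", "a"), ("mod", "red")}
carrot = {("obj-of", "chop"), ("det", "a")}

print(lin_similarity(kiwi, apple))   # higher: shared eat/det contexts
print(lin_similarity(kiwi, carrot))  # lower: only the determiner shared
```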

Page 25: Word Sense Disambiguation and Induction

Lin’s approach: example

Page 26: Word Sense Disambiguation and Induction

WSI: pros and cons

+ Actually performs word sense discrimination

+ Aims to divide the occurrences of a word into a number of classes

- Makes objective evaluation more difficult if not domain-specific

Page 27: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 28: Word Sense Disambiguation and Induction

Disambiguation Evaluation

Disambiguation is easy to evaluate – we have discrete sense inventories

Evaluate with Coverage (answers given), Precision and Recall, and then F1

Accuracy – correct answers / total answers
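These measures can be computed in a few lines; the gold and system labels below are invented, with `None` marking an item the system declined to answer:

```python
def evaluate(gold, system):
    """Coverage, precision, recall and F1 for WSD output."""
    answered = [(g, s) for g, s in zip(gold, system) if s is not None]
    correct = sum(1 for g, s in answered if g == s)
    coverage = len(answered) / len(gold)             # answers given
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"coverage": coverage, "precision": precision,
            "recall": recall, "f1": f1}

gold   = ["bank#1", "bank#2", "bass#1", "bass#2"]
system = ["bank#1", "bank#1", "bass#1", None]
metrics = evaluate(gold, system)
print(metrics)  # coverage 0.75, precision 2/3, recall 0.5
```

Note that on the answered items, accuracy (correct / total answers) coincides with precision.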

Page 29: Word Sense Disambiguation and Induction

Disambiguation Baselines

MFS – Most Frequent Sense

Strong baseline – 50–60% accuracy on the lexical sample task

Doesn’t take into account genre (e.g. star in astrophysics / newswire)

Subject to idiosyncrasies of the corpus

Page 30: Word Sense Disambiguation and Induction

Evaluation with gold-standard clustering

Given a gold-standard clustering, compare the gold standard with the output clustering

Can evaluate with set entropy and purity

Also the Rand index (similar to Jaccard) and F-Score.
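Purity and the Rand index can both be computed directly from paired system/gold labels; the labels below are invented:

```python
from collections import Counter
from itertools import combinations

def purity(system, gold):
    """Fraction of items in the majority gold class of their cluster."""
    clusters = {}
    for c, g in zip(system, gold):
        clusters.setdefault(c, []).append(g)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(gold)

def rand_index(system, gold):
    """Fraction of item pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum(1 for i, j in pairs
                if (system[i] == system[j]) == (gold[i] == gold[j]))
    return agree / len(pairs)

gold   = ["fish", "fish", "music", "music"]
system = [0, 0, 1, 0]                 # one music occurrence misclustered
print(purity(system, gold))           # 0.75
print(rand_index(system, gold))       # 0.5
```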

Page 31: Word Sense Disambiguation and Induction

Discrimination Baselines

All-in-one: group all words into one big cluster

Random: produce a random set of clusters

Page 32: Word Sense Disambiguation and Induction

Pseudowords

Discrimination evaluation method

Generates new words with artificial ambiguity

Select two or more monosemous terms from gold standard data

Given all their occurrences in a corpus, replace them with a pseudoword formed by joining the two terms

Compare automatic discrimination to gold standard
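The construction step can be sketched as follows: occurrences of two monosemous words are rewritten as one artificial ambiguous token, keeping the original word as the gold label. The sentences are invented:

```python
import re

def pseudoword_corpus(corpus, word_a, word_b):
    """Replace word_a/word_b with a joined pseudoword; keep gold labels."""
    pseudo = f"{word_a}_{word_b}"
    out = []
    for sentence in corpus:
        for w in (word_a, word_b):
            if re.search(rf"\b{w}\b", sentence):
                out.append((re.sub(rf"\b{w}\b", pseudo, sentence), w))
    return out

corpus = ["the banana was ripe", "he opened the door slowly"]
examples = pseudoword_corpus(corpus, "banana", "door")
print(examples[0])  # ('the banana_door was ripe', 'banana')
```

A discrimination system then clusters the pseudoword's occurrences, and its output is scored against the hidden gold labels.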

Page 33: Word Sense Disambiguation and Induction

SemEval-2007

Lexical sample and all-words coarse-grained WSD

Preposition disambiguation

Evaluation of WSD on cross-language IR

WSI, lexical substitution

Top systems reach 88.7% accuracy (on lexical sample) and 82.5% (on all-words)

Page 34: Word Sense Disambiguation and Induction

SemEval-2010

Fifth event of its kind

Includes specific cross-lingual tasks

Combined WSI/WSD task

Domain-specific all-words task

Page 35: Word Sense Disambiguation and Induction

Issues

Representation of word senses: enumerative vs. generative approach

Knowledge Acquisition Bottleneck: not enough data!

Benefits for AI/NLP applications

Page 36: Word Sense Disambiguation and Induction

Alleviating the Knowledge Acquisition Bottleneck

Weakly-supervised algorithms, incorporating bootstrapping or active learning

Continuing manual efforts – WordNet, Open Mind Word Expert, OntoNotes

Automatic enrichment of knowledge resources – collocation and relation triple extraction, BabelNet

Page 37: Word Sense Disambiguation and Induction

Future Challenges

How can we mine even larger repositories of textual data – e.g. the whole web! – to create huge knowledge repositories?

How can we design high performance and scalable algorithms to use this data?

Need to decide which kinds of word senses are needed for which application

Still, we need to develop a general representation of word senses

Page 38: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 39: Word Sense Disambiguation and Induction

Wikipedia as sense inventory

Wikipedia articles provide an inventory of disambiguated word senses and entity references

Task: Use their occurrences in texts, i.e. the internal Wikipedia hyperlinks, as named entity and sense annotations

The articles’ texts provide a sense annotated corpus

Page 40: Word Sense Disambiguation and Induction

Mihalcea (2007)

Mihalcea proposes a method for automatically generating sense-tagged data using Wikipedia

Rhythm is the arrangement of sounds in time. Meter animates time in regular pulse groupings, called measures or [[bar (music)|bars]].

The nightlife is particularly active around the beachfront promenades because of its many nightclubs and [[bar (establishment)|bars]].

1. Extract all paragraphs in Wikipedia containing word w

2. Collect all possible labels l1..ln for w

3. Map each label li to its WordNet sense si

4. Annotate each occurrence of w bearing label li with its sense si

System trained on Wikipedia significantly outperforms MFS and Lesk baselines
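The link-mining step can be sketched with a regular expression over wiki markup, using the standard `[[target|surface]]` pipe syntax; the snippets paraphrase the slide's examples:

```python
import re

# Matches piped wiki links: [[link target|surface text]]
LINK = re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]")

def sense_labels(paragraphs, target):
    """Collect link targets used as sense labels for a target word."""
    labels = []
    for text in paragraphs:
        for link, surface in LINK.findall(text):
            if target in surface.lower():
                labels.append(link.strip())
    return labels

paragraphs = [
    "pulse groupings, called measures or [[bar (music)|bars]].",
    "its many nightclubs and [[bar (establishment)|bars]].",
]
print(sense_labels(paragraphs, "bar"))
# ['bar (music)', 'bar (establishment)']
```

Each collected label would then be mapped to a WordNet sense to yield sense-tagged training examples.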

Page 41: Word Sense Disambiguation and Induction

Knowledge-rich WSD

General aim is to relieve the knowledge acquisition bottleneck of NLP systems, with WSD as a case study

Main ideas:

- Extend WordNet with millions of semantic relations (using Wikipedia)

- Apply knowledge-based WSD to exploit extended WordNet

Results: integration of many, many semantic relations in knowledge-based systems yields performance competitive with SotA supervised approaches

Page 42: Word Sense Disambiguation and Induction

Wikification

The task of generating hyperlinks to disambiguated Wikipedia concepts

Two sub-tasks: automatic keyword extraction, WSD

Wikify!1 can perform keyword extraction by extracting candidates and then ranking them

The system does knowledge-based and data-driven WSD,filtering out annotations that contain disagreements

Disambiguate links using relatedness, commonness (prior probability of a sense), and context quality (context terms).

1 Csomai and Mihalcea (2008)

Page 43: Word Sense Disambiguation and Induction

Outline

1 Introduction

2 WSD

3 WSI

4 Evaluation and Issues

5 Wikipedia

6 Summary

Page 44: Word Sense Disambiguation and Induction

Questions

Thank you. Are there any questions?
