
Word Sense Disambiguation: A Survey


Page 1: Word Sense Disambiguation: A Survey


Word Sense Disambiguation: A Survey

Roberto Navigli, Department of Computer Science, University of Rome "La Sapienza"

ACM Computing Surveys, Vol. 41, No. 2, 2009.

Reporter: Yi-Ting Huang. Date: 2009/11/20

Page 2: Word Sense Disambiguation: A Survey


What is WSD?

(a) I can hear bass sounds.
(b) They like grilled bass.

(a) low-frequency tones
(b) a type of fish

• The computational identification of meaning for words in context is called word sense disambiguation (WSD).

Page 3: Word Sense Disambiguation: A Survey


Motivation

• WSD has been described as an AI-complete problem, that is, by analogy to NP-completeness in complexity theory, a problem whose difficulty is equivalent to that of the central problems of AI.
– The task lends itself to different formalizations due to fundamental questions, like the approach to the representation of a word sense, the granularity of sense inventories, etc.
– WSD heavily relies on knowledge.

Page 4: Word Sense Disambiguation: A Survey


Task description

• Word sense disambiguation is the ability to computationally determine which sense of a word is activated by its use in a particular context.

• Input: a text T = (w1, ..., wn).
• Senses_D(wi): the set of senses encoded in a dictionary D for word wi.
• WSD: a mapping A from words to senses, such that A(i) ⊆ Senses_D(wi).
• Output: A(i) for each target word wi.
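As a minimal sketch, the formalization above can be written directly in Python. The two-word sense inventory and the `first_sense` selection strategy below are invented placeholders for illustration, not part of the survey's formalization (here A(i) returns a single best sense rather than a subset).

```python
# Toy stand-in for a dictionary D with its encoded senses; all entries
# are illustrative, not from a real lexicon.
SENSES_D = {
    "bass": {"bass#1: low-frequency tones", "bass#2: a type of fish"},
    "bank": {"bank#1: financial institution", "bank#2: river side"},
}

def disambiguate(text, choose):
    """Return the mapping A: for each word wi in T = (w1, ..., wn),
    A(i) is a sense from Senses_D(wi) picked by `choose`.
    Words not in the dictionary are left unmapped (None)."""
    A = {}
    for i, w in enumerate(text):
        senses = SENSES_D.get(w)
        A[i] = choose(w, text, senses) if senses else None
    return A

# A trivial selection strategy: always take the alphabetically first sense.
first_sense = lambda w, text, senses: min(senses)

T = ["they", "like", "grilled", "bass"]
A = disambiguate(T, first_sense)
print(A[3])  # the sense assigned to "bass"
```

A real `choose` function would inspect the context (the other words of T); that is exactly what the methods in the later slides provide.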

Page 5: Word Sense Disambiguation: A Survey


The four main elements of WSD

• The selection of word senses
• The use of external knowledge sources
• The representation of context
• The selection of a classification method

Page 6: Word Sense Disambiguation: A Survey


Selection of Word Senses

(c) She chopped the vegetables with a chef's knife.
(d) A man was beaten and cut with a knife.

(c) a tool(d) a weapon

• Sense granularity? (splitting vs. lumping?)

Page 7: Word Sense Disambiguation: A Survey


The use of external knowledge sources

Structured resources
• Thesauri, e.g. the Macquarie Thesaurus
• Machine-readable dictionaries, e.g. WordNet
• Ontologies, e.g. the Omega Ontology, SUMO, UMLS

Unstructured resources
• Corpora
– Raw corpora: Brown Corpus, British National Corpus (BNC), Wall Street Journal (WSJ), American National Corpus, etc.
– Sense-annotated corpora: SemCor, Open Mind Word Expert data set.
– Collocation resources: Word Sketch Engine, JustTheWord, the British National Corpus collocations, etc.

Page 8: Word Sense Disambiguation: A Survey


WordNet

• WordNet is a computational lexicon of English based on psycholinguistic principles, created and maintained at Princeton University.

• Its latest version, WordNet 3.0, contains about 155,000 words organized in over 117,000 synsets.

Page 9: Word Sense Disambiguation: A Survey


For each synset, WordNet provides the following information:

• A gloss: a textual definition of the synset possibly with a set of usage examples.

• Lexical and semantic relations:
– Antonymy: X is an antonym of Y (good vs. bad).
– Pertainymy: X is an adjective which can be defined as "of" or "pertaining to" a noun (dental pertains to tooth).
– Nominalization: a noun X nominalizes a verb Y.
– Hypernymy: the kind-of or is-a relation (for nouns and verbs).

Page 10: Word Sense Disambiguation: A Survey


For each synset, WordNet provides the following information: (cont.)

Page 11: Word Sense Disambiguation: A Survey


Page 12: Word Sense Disambiguation: A Survey


WordNet Domain taxonomy

• Magnini and Cavaglià [2000] developed a data set of domain labels for WordNet synsets.

• WordNet synsets have been semiautomatically annotated with one or more domain labels from a predefined set of about 200 tags from the Dewey Decimal Classification (e.g. FOOD, ARCHITECTURE, SPORT, etc.) plus a generic label (FACTOTUM) when no domain information is available.

• Labels are organized in a hierarchical structure (e.g., PSYCHOANALYSIS is a kind of PSYCHOLOGY domain)

Page 13: Word Sense Disambiguation: A Survey


Representation of Context

• Preprocessing

Page 14: Word Sense Disambiguation: A Survey


Representation of Context (cont.)

Page 15: Word Sense Disambiguation: A Survey


Representation of Context (cont.)

• It must be noted that choosing the appropriate size of context (both in structured and unstructured representations) is an important factor in the development of a WSD algorithm, as it is known to affect the disambiguation performance.

Page 16: Word Sense Disambiguation: A Survey

Choice of a method

• Supervised disambiguation
– Naïve Bayes

• Unsupervised disambiguation
– Context clustering
– Cooccurrence graphs

• Knowledge-based disambiguation
– Overlap of sense definitions
– Structural approaches

• Determining Word Sense Dominance

Page 17: Word Sense Disambiguation: A Survey

Naïve Bayes

• Given: a target word w, a set of features extracted from its context, and sense-tagged training data.

• Train: from the tagged examples, estimate the prior P(sense) and the conditionals P(feature | sense).

• Test: choose ŝ = argmax_{s ∈ Senses(w)} P(s) ∏_j P(f_j | s), assuming the features are conditionally independent given the sense.

• Example: "The bank (Finance or River?) cashed my check."
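The train/test procedure can be sketched with toy data. The sense-tagged examples for "bank" below are invented for illustration (a real system would train on a corpus such as SemCor); add-one smoothing keeps unseen features from zeroing out a sense.

```python
import math
from collections import Counter, defaultdict

# Invented sense-tagged training data for "bank"; the features are
# simply the context words of each occurrence.
train = [
    ("finance", ["cashed", "check", "money", "account"]),
    ("finance", ["loan", "interest", "money", "deposit"]),
    ("river",   ["water", "shore", "fishing", "mud"]),
    ("river",   ["steep", "grass", "water", "erosion"]),
]

sense_count = Counter(s for s, _ in train)
feat_count = defaultdict(Counter)
vocab = set()
for sense, feats in train:
    feat_count[sense].update(feats)
    vocab.update(feats)

def classify(features):
    """argmax_s log P(s) + sum_j log P(f_j | s), with add-one smoothing."""
    best, best_score = None, -math.inf
    total = sum(sense_count.values())
    for s in sense_count:
        score = math.log(sense_count[s] / total)
        denom = sum(feat_count[s].values()) + len(vocab)
        for f in features:
            score += math.log((feat_count[s][f] + 1) / denom)
        if score > best_score:
            best, best_score = s, score
    return best

print(classify(["cashed", "check"]))  # expected: "finance"
```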

Page 18: Word Sense Disambiguation: A Survey

Context Clustering

• Matrix: a word w in a corpus can be represented as a vector whose jth component counts the number of times that word wj cooccurs with w within a fixed context (a sentence or a larger context).

e.g. stock → (deposit, money, account)
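A minimal sketch of the cooccurrence-vector idea, using an invented three-sentence corpus: words that share contexts get similar vectors, which a clustering step could then group by sense.

```python
from collections import Counter

# Toy corpus; each sentence serves as one fixed context window.
corpus = [
    "stock deposit money account".split(),
    "stock money bank account".split(),
    "stock market shares trading".split(),
]

def vector(w):
    """Cooccurrence vector of w: counts of each word wj appearing in the
    same context window as w."""
    v = Counter()
    for sent in corpus:
        if w in sent:
            v.update(x for x in sent if x != w)
    return v

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: sum(c * c for c in x.values()) ** 0.5
    return dot / (norm(u) * norm(v))

# "money" and "account" share contexts, so their vectors are close;
# "trading" occurs in a different context.
print(cosine(vector("money"), vector("account")))
print(cosine(vector("money"), vector("trading")))
```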

Page 19: Word Sense Disambiguation: A Survey

Cooccurrence Graphs

• A graph G = (V, E) whose vertices V correspond to words in a text and whose edges E connect pairs of words which cooccur in a syntactic relation, in the same paragraph, or in a larger context.

• Each edge is assigned a weight according to the relative cooccurrence frequency of the two words connected by the edge.

• Iterative algorithm: at each iteration, the node with the highest relative degree in the graph is selected as a hub (based on the experimental finding that a node's degree and its frequency in the original text are highly correlated). All of its neighbors then become ineligible as hub candidates. The algorithm stops when the relative frequency of the word corresponding to the selected hub falls below a fixed threshold.
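The hub-selection iteration can be sketched as follows; the graph, frequencies, and threshold are all invented toy values. Each selected hub is intended to stand for one sense of the target word ("bass": the fishing sense vs. the music sense).

```python
from collections import Counter

# Invented cooccurrence edges and word frequencies for illustration.
edges = [
    ("bass", "fishing"), ("bass", "guitar"), ("fishing", "lake"),
    ("fishing", "boat"), ("guitar", "amplifier"), ("guitar", "chord"),
]
freq = Counter({"bass": 12, "fishing": 10, "guitar": 8, "lake": 5,
                "boat": 4, "amplifier": 3, "chord": 3})

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def select_hubs(target, threshold):
    """Greedy hub selection: repeatedly pick the candidate with the
    highest degree (ties broken by frequency), retire its neighbors,
    and stop once the chosen hub's frequency drops below the threshold."""
    candidates = set(adj) - {target}
    hubs = []
    while candidates:
        hub = max(candidates,
                  key=lambda n: (len(adj[n] & candidates), freq[n]))
        if freq[hub] < threshold:
            break
        hubs.append(hub)
        candidates -= adj[hub] | {hub}
    return hubs

print(select_hubs("bass", threshold=4))  # one hub per induced sense
```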

Page 20: Word Sense Disambiguation: A Survey

Overlap of Sense Definitions

• Score each sense by the overlap between the current context and the sense definitions: score(S) = |context(w) ∩ gloss(S)|, where context(w) is the bag of all content words in a context window around the target word w, and gloss(S) is the bag of words in the textual definitions of the senses that are either S itself or related to S through a relation rel.
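A sketch of the gloss-overlap (simplified Lesk) scoring. The glosses and the tiny stopword list below are invented stand-ins for real dictionary definitions and proper content-word filtering.

```python
# Invented glosses for two senses of "bank"; not real WordNet glosses.
GLOSSES = {
    "bank#finance": "an institution that accepts deposits and lends money",
    "bank#river": "the sloping land beside a body of water",
}

STOPWORDS = {"the", "a", "an", "of", "and", "that", "to", "me", "my"}

def content_words(text):
    """Bag (here: set) of content words, after a crude stopword filter."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk(context, senses=GLOSSES):
    """Return the sense S maximizing |context(w) ∩ gloss(S)|."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(lesk("the bank refused to lend me money"))  # overlap: "money"
```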

Page 21: Word Sense Disambiguation: A Survey

Structural Approaches

• Similarity-based: we disambiguate a target word wi in a text T = (w1, ..., wn) by choosing the sense Ŝ of wi which maximizes the following sum:

Ŝ = argmax_{S ∈ Senses(wi)} Σ_{wj ∈ T, wj ≠ wi} max_{S' ∈ Senses(wj)} score(S, S')

• A basic choice scores a sense pair by d(Sw, Sw'), the shortest distance between Sw and Sw' (i.e., the number of edges of the shortest path over the lexicon network). The shortest path is calculated on the WordNet taxonomy.

• Leacock and Chodorow [1998] focused on hypernymy links and scaled the path length by the overall depth D of the taxonomy: score_LC(Sw, Sw') = -log( d(Sw, Sw') / 2D ).
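The path-distance and depth-scaled scores can be sketched over a toy taxonomy. The is-a hierarchy below is invented, and `lch` follows the common -log(d / 2D) formulation of Leacock and Chodorow's measure (assuming d ≥ 1, i.e., distinct senses).

```python
import math
from collections import deque

# Invented is-a taxonomy (child -> parent) for illustration.
parent = {
    "trout": "fish", "bass_fish": "fish", "fish": "animal",
    "animal": "entity", "guitar": "instrument",
    "instrument": "artifact", "artifact": "entity",
}

def path_len(a, b):
    """Shortest distance d(Sw, Sw') in edges over the taxonomy graph (BFS)."""
    adj = {}
    for c, p in parent.items():
        adj.setdefault(c, set()).add(p)
        adj.setdefault(p, set()).add(c)
    seen, queue = {a: 0}, deque([a])
    while queue:
        n = queue.popleft()
        if n == b:
            return seen[n]
        for m in adj.get(n, ()):
            if m not in seen:
                seen[m] = seen[n] + 1
                queue.append(m)
    return None  # unreachable

D = 3  # overall depth of this toy taxonomy (entity -> ... -> leaf)

def lch(a, b):
    """Leacock-Chodorow-style score: -log(d / 2D); larger = more similar."""
    return -math.log(path_len(a, b) / (2 * D))

print(path_len("trout", "bass_fish"))  # 2 edges, via "fish"
print(lch("trout", "bass_fish") > lch("trout", "guitar"))
```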

Page 22: Word Sense Disambiguation: A Survey

Structural Approaches (cont.)

• Resnik [1995]: the more specific the concept that subsumes two or more words, the more semantically related they are assumed to be:

score_Res(Sw, Sw') = -log p(lso(Sw, Sw'))

where lso(Sw, Sw') is the lowest superordinate (i.e., most specific common ancestor in the noun taxonomy) of Sw and Sw', and p(S) is the probability of encountering an instance of sense S in a reference corpus.

• Jiang and Conrath [1997] also take into account the information content of the two senses themselves: score_JC(Sw, Sw') = 2 log p(lso(Sw, Sw')) − (log p(Sw) + log p(Sw')).

• Lin [1998b] normalizes rather than subtracts: score_Lin(Sw, Sw') = 2 log p(lso(Sw, Sw')) / (log p(Sw) + log p(Sw')).
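The information-content measures can be sketched on a toy taxonomy with invented corpus counts; p(S) is estimated from the occurrences of a sense plus all of its descendants, normalized by the corpus total.

```python
import math

# Invented noun taxonomy (child -> parent) and corpus counts.
parent = {"trout": "fish", "salmon": "fish", "fish": "animal",
          "dog": "animal", "animal": "entity"}
count = {"trout": 2, "salmon": 2, "dog": 4,
         "fish": 0, "animal": 0, "entity": 0}

def ancestors(s):
    """s followed by its chain of superordinates, most specific first."""
    out = [s]
    while s in parent:
        s = parent[s]
        out.append(s)
    return out

total = sum(count.values())

def p(s):
    """p(S): counts of S and all of its descendants over the corpus total."""
    return sum(count[c] for c in count if s in ancestors(c)) / total

def lso(a, b):
    """Lowest superordinate: most specific common ancestor."""
    return next(x for x in ancestors(a) if x in ancestors(b))

def resnik(a, b):
    return -math.log(p(lso(a, b)))

def lin(a, b):
    return 2 * math.log(p(lso(a, b))) / (math.log(p(a)) + math.log(p(b)))

# Siblings under "fish" are more related than cousins under "animal".
print(resnik("trout", "salmon") > resnik("trout", "dog"))
```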

Page 23: Word Sense Disambiguation: A Survey


Structural Approaches (cont.)

Page 24: Word Sense Disambiguation: A Survey


Structural Approaches

• Graph-based: a lexical chain is a sequence of semantically related words w1, ..., wn in a text, such that wi is related to wi+1 by a lexicosemantic relation (e.g., is-a, has-part, etc.). E.g. Rome → city → inhabitant; eat → dish → vegetable → aubergine; etc.

Page 25: Word Sense Disambiguation: A Survey


Lexical chain

• Hirst and St-Onge [1998]:
1. Construct a lexical chain as the text is processed.

However, this approach is inaccurate and inefficient.

Page 26: Word Sense Disambiguation: A Survey


Lexical chain (cont.)

• Galley and McKeown [2003]
– Build a graph of all possible interpretations of the target words.
– Compare each word against all words previously read; if a relation exists, establish a connection.
– Calculate the strength of and distance between the connected words.
– In the disambiguation stage, all occurrences of a given word are collected together.
– For each sense of the target word, the strength of all connections is summed.
– The actual connections comprising the winning unified score are then used as a basis for computing the lexical chains.

They report 62.1% accuracy.

Page 27: Word Sense Disambiguation: A Survey


Structural Approaches

• Structural Semantic Interconnections (SSI): a development of lexical chains based on the encoding of a context-free grammar of valid semantic interconnection patterns.

• Given a word context W, SSI builds a subgraph of the WordNet lexicon.

• It then selects those senses for the words in context which maximize the degree of connectivity of the induced subgraph.

• SSI outperforms state-of-the-art unsupervised systems in the Senseval-3 all-words and the Semeval-2007 coarse-grained all-words competitions.

Page 28: Word Sense Disambiguation: A Survey


Determining Word Sense Dominance

• Idea: given a word, the frequency distribution of its senses in text is highly skewed. Key in this approach is the observation that distributionally similar neighbors often provide cues about the senses of a word.
– Given the target word w, collect its distributionally most similar neighbors N(w) = {n1, n2, …, nk}.
– Each sense of w is then scored by its semantic similarity to these neighbors, and senses are ranked by the resulting score.

Page 29: Word Sense Disambiguation: A Survey


Determining Word Sense Dominance

Page 30: Word Sense Disambiguation: A Survey

How to evaluate?

• Coverage: the percentage of items in the test set for which the system provided a sense assignment: C = # answers provided / # test set items.

• Precision: the percentage of correct answers among those provided: P = # correct answers / # answers provided.

• Recall: the percentage of correct answers over the whole test set: R = # correct answers / # test set items.
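The three measures can be computed in a few lines; the gold standard and system answers below are invented toy data (item 5 is deliberately left unanswered, so coverage and recall diverge from precision).

```python
# Invented gold standard and system output. `gold` maps each test item
# to its correct sense; items the system left unanswered are absent
# from `system`.
gold = {1: "a", 2: "b", 3: "a", 4: "c", 5: "b"}
system = {1: "a", 2: "a", 3: "a", 4: "c"}  # item 5 unanswered

coverage = len(system) / len(gold)                  # answered / total
correct = sum(1 for i in system if system[i] == gold[i])
precision = correct / len(system)                   # correct / answered
recall = correct / len(gold)                        # correct / total

print(coverage, precision, recall)  # 0.8 0.75 0.6
```

Note that recall = precision × coverage, so a system can trade coverage for precision by abstaining on hard items.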

Page 31: Word Sense Disambiguation: A Survey


Baseline

• The Random Baseline: under the uniform distribution, for each word wi the probability of success of a random choice is 1 / |Senses_D(wi)|.

• The First Sense Baseline is based on a ranking of word senses.
– In WordNet, senses of the same word are ranked based on the frequency of occurrence of each sense in the SemCor corpus.
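The random baseline's expected accuracy follows directly from the formula above: averaging 1 / |Senses_D(wi)| over the words of a text. The sense-inventory sizes below are invented toy values.

```python
# Invented numbers of dictionary senses |Senses_D(w)| per word.
sense_counts = {"bass": 2, "bank": 2, "knife": 3}

def random_baseline_accuracy(words):
    """Expected accuracy of picking a sense uniformly at random:
    the mean of 1 / |Senses_D(wi)| over the target words."""
    return sum(1 / sense_counts[w] for w in words) / len(words)

print(random_baseline_accuracy(["bass", "bank", "knife"]))
```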

Page 32: Word Sense Disambiguation: A Survey


Evaluation

• Senseval-1

• Senseval-3

• Semeval-2007

Page 33: Word Sense Disambiguation: A Survey


Conclusion

• In this article the author surveyed the field of word sense disambiguation.

• Methods– Supervised disambiguation– Unsupervised disambiguation– Knowledge-based disambiguation– Type-based disambiguation

Page 34: Word Sense Disambiguation: A Survey


Comments

• Helps me to define my research problem and task description:
– Given a paragraph P = {t1, t2, …, tm}, how to find the implicit sense s of the paragraph.

• Helps me to develop my methods:
– Knowledge-based methods