Unsupervised Word Sense Disambiguation REU, Summer, 2009

Page 1: Unsupervised Word Sense Disambiguation

Unsupervised Word Sense Disambiguation

REU, Summer, 2009

Page 2: Unsupervised Word Sense Disambiguation

Word Sense Disambiguation

E.g., “The soldiers drove the tank.”

Candidate senses of “tank”:

  • armored combat vehicle

  • large vessel for holding gases or liquids

Page 3: Unsupervised Word Sense Disambiguation

Context Knowledge Base

“Many companies hire computer programmers”

hire → company
hire → programmer
company → many
programmer → computer

+

“Computer programmers write software”

write → programmer
write → software
programmer → computer

Page 4: Unsupervised Word Sense Disambiguation

Context Knowledge Base

Result of merging dependency trees:

hire → company: 1
hire → programmer: 1
company → many: 1
write → programmer: 1
write → software: 1
programmer → computer: 2

Weights are the number of dependency relation instances found.
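The merge step can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `merge_dependency_trees` and the representation of a tree as a list of (head, dependent) pairs are assumptions, and the relations are typed in by hand rather than produced by a parser.

```python
from collections import Counter

def merge_dependency_trees(trees):
    """Merge dependency trees into one weighted relation table.

    Each tree is a list of (head, dependent) pairs. The weight of a pair
    is the number of times it occurs across all trees.
    """
    knowledge_base = Counter()
    for tree in trees:
        knowledge_base.update(tree)
    return knowledge_base

# Hand-written relations for the two example sentences (assumed lemmatized).
tree_hire = [("hire", "company"), ("hire", "programmer"),
             ("company", "many"), ("programmer", "computer")]
tree_write = [("write", "programmer"), ("write", "software"),
              ("programmer", "computer")]

kb = merge_dependency_trees([tree_hire, tree_write])
for (head, dependent), weight in sorted(kb.items()):
    print(f"{head} -> {dependent}: {weight}")
```

Only the relation programmer → computer occurs in both sentences, so it is the only one that ends up with weight 2.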

Page 5: Unsupervised Word Sense Disambiguation

WSD Algorithm

Parse the original sentence using Minipar to get a weighted dependency tree.

“A large software company hires computer programmers.”

To-be-disambiguated word: company

hire (1)
  company
    large (1)
    software (1)
  programmer (0.5)
    computer (0.33)

Weights are based on distance from the to-be-disambiguated word: 1 at distance 1, 0.5 at distance 2, 0.33 at distance 3.
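The weights 1, 0.5, and 0.33 on this slide are consistent with taking the reciprocal of each word's distance from the to-be-disambiguated word in the dependency tree. The sketch below assumes that reading. The function name `distance_weights` is hypothetical, and the edges are typed in by hand instead of coming from Minipar.

```python
from collections import deque

def distance_weights(edges, target):
    """Return {word: 1 / distance} for every word reachable from target.

    Distance is the number of dependency edges between the word and the
    target, ignoring edge direction (assumption).
    """
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, set()).add(dependent)
        neighbours.setdefault(dependent, set()).add(head)

    # Breadth-first search outward from the target word.
    distances = {target: 0}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        for other in neighbours.get(word, ()):
            if other not in distances:
                distances[other] = distances[word] + 1
                queue.append(other)

    return {w: round(1 / d, 2) for w, d in distances.items() if d > 0}

# "A large software company hires computer programmers."
edges = [("hire", "company"), ("hire", "programmer"),
         ("company", "large"), ("company", "software"),
         ("programmer", "computer")]
weights = distance_weights(edges, "company")
print(weights)
```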

Page 6: Unsupervised Word Sense Disambiguation

WSD Algorithm

Parse each gloss of the to-be-disambiguated word to get weighted dependency trees.

Gloss 1: an institution created to conduct business

Words in the gloss tree: institution, create, conduct, business

Gloss 2: a small military unit

unit
  small
  military

Page 7: Unsupervised Word Sense Disambiguation

WSD Algorithm

For each word in a gloss tree, find that word’s dependent words in the context knowledge base. We are looking for words in the knowledge base that match words in the original sentence. In other words, we are looking for context clues to disambiguate a word.

A score is generated based on the weights of those dependency relations in the knowledge base, and the dependent words of the to-be-disambiguated word in the original sentence. The more matches we find, the higher the generated score will be.

The gloss with the highest generated score will be selected as the correct sense of the word.
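The slides do not give the scoring formula, so the following is only a sketch under stated assumptions: each match between a gloss word and a sentence word contributes the knowledge-base weight of their relation multiplied by the sentence word's weight. The names `score_gloss` and `pick_sense` are hypothetical, and `toy_kb` holds made-up values for illustration, not real knowledge-base counts.

```python
def score_gloss(gloss_words, context_weights, knowledge_base):
    """Sum (knowledge-base weight * sentence weight) over all matches.

    Assumed formula: the slides only say that more matches give a
    higher score.
    """
    score = 0.0
    for gloss_word in gloss_words:
        for context_word, context_weight in context_weights.items():
            # Look the relation up in either direction.
            kb_weight = (knowledge_base.get((gloss_word, context_word), 0)
                         + knowledge_base.get((context_word, gloss_word), 0))
            score += kb_weight * context_weight
    return score

def pick_sense(glosses, context_weights, knowledge_base):
    """Return the name of the best-scoring gloss and all scores."""
    scores = {name: score_gloss(words, context_weights, knowledge_base)
              for name, words in glosses.items()}
    return max(scores, key=scores.get), scores

# Sentence weights for the word "company" (from the earlier slide).
context_weights = {"hire": 1.0, "large": 1.0, "software": 1.0,
                   "programmer": 0.5, "computer": 0.33}

# Content words of the two glosses.
glosses = {
    "gloss 1": ["institution", "create", "conduct", "business"],
    "gloss 2": ["unit", "small", "military"],
}

# Toy knowledge base: illustrative values only (assumption).
toy_kb = {("hire", "institution"): 3,
          ("business", "software"): 2,
          ("unit", "large"): 1}

best, scores = pick_sense(glosses, context_weights, toy_kb)
print(best, scores)
```

With these toy values gloss 1 wins because its words have more, and heavier, relations to words near "company" in the sentence.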

Page 8: Unsupervised Word Sense Disambiguation

Synonym Matching

If no direct matches are found between a gloss word and the dependency relations in the context knowledge base, we can replace the gloss word with one of its synonyms, since synonyms are semantically equivalent words.
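A minimal sketch of this fallback, assuming the knowledge base is a table of (head, dependent) pairs and the synonyms come from a simple lookup. The function name `match_with_synonyms` and the hand-written `synonyms` table are hypothetical; a real system would take synonyms from WordNet.

```python
def match_with_synonyms(gloss_word, knowledge_base, synonyms):
    """Find knowledge-base relations for a gloss word.

    Try the word itself first, then each of its synonyms in order.
    Returns (matched_word, relations), or (None, {}) if nothing matches.
    """
    candidates = [gloss_word] + synonyms.get(gloss_word, [])
    for candidate in candidates:
        relations = {pair: weight
                     for pair, weight in knowledge_base.items()
                     if candidate in pair}
        if relations:
            return candidate, relations
    return None, {}

# Part of the example knowledge base from the earlier slides.
kb = {("hire", "company"): 1, ("hire", "programmer"): 1,
      ("company", "many"): 1, ("programmer", "computer"): 2}

# Hand-written synonym table for illustration (assumption).
synonyms = {"firm": ["company"]}

matched, relations = match_with_synonyms("firm", kb, synonyms)
print(matched, relations)
```

Here "firm" has no relations of its own in the knowledge base, so the lookup falls back to its synonym "company".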

Page 9: Unsupervised Word Sense Disambiguation

Hypernym/hyponym Matching

E.g., animal → mammal → dog → poodle (most general to most specific)

• Extract hypernyms and hyponyms of words from WordNet database.

• Store these in a data structure.

• Strategies:
    use all “levels”
    use only levels close to the original word
    apply the above strategies to synonym matching, as well
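The bullets above can be sketched with a small parent table and a level limit. This is an illustration only: the `hypernym_of` table holds just the example chain, typed in by hand rather than extracted from WordNet, and the function names and the `max_levels` parameter are hypothetical.

```python
# Each word maps to its direct hypernym (example chain only).
hypernym_of = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

def hypernyms(word, max_levels=None):
    """Walk up the chain; max_levels=None means use all levels."""
    result = []
    current = word
    while current in hypernym_of and (
            max_levels is None or len(result) < max_levels):
        current = hypernym_of[current]
        result.append(current)
    return result

def hyponyms(word, max_levels=None):
    """Walk down the hierarchy one level at a time."""
    children = {}
    for child, parent in hypernym_of.items():
        children.setdefault(parent, []).append(child)

    result, frontier, level = [], [word], 0
    while frontier and (max_levels is None or level < max_levels):
        frontier = [c for w in frontier for c in children.get(w, [])]
        result.extend(frontier)
        level += 1
    return result

print(hypernyms("dog"))                  # all levels above "dog"
print(hypernyms("poodle", max_levels=1)) # only the closest level
print(hyponyms("animal"))                # all levels below "animal"
```

Passing `max_levels=1` corresponds to the "only levels close to the original word" strategy, and leaving it unset corresponds to using all levels.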

Page 10: Unsupervised Word Sense Disambiguation

Word Similarity

• Use the WordNet::Similarity Perl module to calculate a “similarity score” between the gloss word and the dependent words in the knowledge base.

• The most similar word found will be considered the closest to an actual match.

WordNet::Similarity similarity scores:

dog, animal: 0.780
dog, desk: 0.162
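Choosing the closest match can be sketched as follows. This Python sketch does not call WordNet::Similarity (a Perl module). Instead, a small lookup table holding the two scores from the slide stands in for it, and the function names `similarity` and `most_similar` are hypothetical.

```python
# Stand-in for WordNet::Similarity: the two scores shown on the slide.
similarity_scores = {("dog", "animal"): 0.780,
                     ("dog", "desk"): 0.162}

def similarity(word_a, word_b):
    """Look up a score in either order; unknown pairs score 0.0."""
    return similarity_scores.get(
        (word_a, word_b),
        similarity_scores.get((word_b, word_a), 0.0))

def most_similar(gloss_word, candidates):
    """Return the candidate with the highest similarity to gloss_word."""
    return max(candidates, key=lambda c: similarity(gloss_word, c))

closest = most_similar("dog", ["desk", "animal"])
print(closest)
```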