Towards a Universal Wordnet by Learning from Combined Evidence

DESCRIPTION

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

Introduction

Existing Lexical Knowledge Bases

Building a Multilingual Wordnet

Results and Experiments

Summary and Future Work

Towards a Universal Wordnet by Learning from Combined Evidence

Gerard de Melo and Gerhard Weikum

Max Planck Institute for Informatics, Saarbrücken, Germany

2009-11-03

Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29

Introduction

Lexical Knowledge

What meanings does a word have?

How do those meanings relate to the meanings of other words?

Many Applications

examples: NLP, AI, question answering, query expansion, human consultation

“speaker”: person who gives a talk; device that produces sounds

“board”: flat piece of wood; committee; panel for writing with chalk; to enter a transportation vehicle

“student”, “pupil”: someone who studies

faculty, professor (relation labels: member, part)

entity

institution

educational institution

university

...

Multilinguality

the world is multilingual

the Internet is also increasingly multilingual

[Figure: Top 10 languages by approximate number of speakers. Source: Ethnologue 2005]

[Figure: Internet users by region. Source: Internet World Stats]

person who gives a talk

eng: “speaker”

jpn: “話者”

rus: “докладчик”

ces: “řečník”

...

Vision

universal index of word meanings

large-scale semantic network with class hierarchy

look up any word in any language, get a list of its meanings

entity (por: “entidade”, heb: “ישות”)
institution (cmn: “制度”)
educational institution (deu: “Bildungseinrichtung”)
university (cym: “prifysgol”)
...

meanings should be connected via semantic relations

Outline

1 Existing Lexical Knowledge Bases

2 Building a Multilingual Wordnet

3 Results and Experiments

4 Summary and Future Work

Existing Lexical Knowledge Bases

WordNet

lexical database created at Princeton

enumerates meanings of English words

meaning-to-meaning links

Miller, Fellbaum et al. (1990) among most-cited papers in computer science (source: CiteseerX)

types of meaning-to-meaning links: hypernym hierarchy, meronymy (part of), etc.

Non-English Wordnets

EuroWordNet, BalkaNet, Global WordNet Association

problem: many are small, incomplete

problem: different identifiers, formats, etc.

problem: only ∼10 languages with freely available wordnets

not a single, coherent resource

Other Resources

PANGLOSS Ontology: Knight & Luk (1994)

TransGraph system: Etzioni et al. (2007)

DBPedia, YAGO, OpenCyc

PANGLOSS: 2 languages, around 70,000 entities

TransGraph: large translation graph, limited structure (e.g. no semantic hierarchy)

DBPedia, YAGO, OpenCyc: class hierarchy not multilingual

Building a Multilingual Wordnet

Strategy

use existing wordnets as backbone

add new terms, link to meaning nodes

[Figure: Existing Wordnets → Desired Output]
Meaning nodes: academic course; part of a meal; route of travel; series of events
Terms in existing wordnets: spa: “trayectoria”, eng: “course”, eng: “class”
Terms added in desired output: deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs”

Input Graph

use existing wordnets as backbone (mainly English, Spanish, Catalan)

add translations to graph

[Figure: Input Graph G0 containing the terms spa: “trayectoria”, eng: “course”, eng: “class” and the meaning nodes academic course, part of a meal, route of travel, series of events]

translation sources: dictionaries (e.g. Wiktionary), thesauri and ontologies, parallel corpora (word alignment)

also: predict new translations

[Figure: Input Graph G0 after adding translations; the new term nodes deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs” join spa: “trayectoria”, eng: “course”, eng: “class” and the meaning nodes academic course, part of a meal, route of travel, series of events]

Approach: Link new words to meanings of their translations

Huge Challenge: Disambiguation!

[Figure: ita: “piatto” connected by a translation edge to eng: “course”; candidate meanings: academic course, part of a meal, route of travel, series of events]

Approach

variety of features that analyse previous graph G_{i−1}, incorporate neighbourhood information into an edge’s feature vector

supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation

Example Feature:

Given term t (e.g. fra: “suite”) and meaning m (e.g. academic course)

Question: Should they be linked?

[Figure: t = fra: “suite”; neighbouring terms t′: spa: “trayectoria”, eng: “course”; their meanings m′: part of a meal, academic course, route of travel, series of events, ...]

Look at neighbours t′ ∈ Γ(t)

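\[ \sum_{t' \in \Gamma(t)} \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)} \]

\[ \mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \mathrm{sim}(m', m) \]

\[ \mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \bigl(1 - \mathrm{sim}(m', m)\bigr) \]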

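\[ \sum_{t' \in \Gamma(t)} \phi_1(t, t') \, \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)} \]

\[ \mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \phi_2(t', m') \, \mathrm{sim}(m', m) \]

\[ \mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \phi_2(t', m') \, \bigl(1 - \mathrm{sim}(m', m)\bigr) \]

weighting (φ1, φ2) based on: part-of-speech, corpus frequency, ...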
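
The neighbourhood score above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjacency maps `term_nbrs` and `meaning_nbrs`, the function name `neighbour_feature`, the toy edges, and the identity-based `sim` are hypothetical and not taken from the slides. The weights φ1 and φ2 default to 1, which gives the unweighted variant; skipping neighbours without meanings or with a zero denominator is also an assumption.

```python
def neighbour_feature(t, m, term_nbrs, meaning_nbrs, sim, phi1=None, phi2=None):
    """Sum over neighbours t' of t of phi1 * sim* / (sim* + dissim)."""
    phi1 = phi1 or (lambda t, t2: 1.0)   # assumed default: no term-term weighting
    phi2 = phi2 or (lambda t2, m2: 1.0)  # assumed default: no term-meaning weighting
    total = 0.0
    for t2 in term_nbrs.get(t, ()):
        candidates = meaning_nbrs.get(t2, ())
        if not candidates:
            continue  # neighbour has no known meanings, so it carries no evidence
        sim_star = max(phi2(t2, m2) * sim(m2, m) for m2 in candidates)
        dissim = sum(phi2(t2, m2) * (1.0 - sim(m2, m)) for m2 in candidates)
        denom = sim_star + dissim
        if denom > 0:
            total += phi1(t, t2) * sim_star / denom
    return total


# Toy graph (hypothetical edges, loosely modelled on the slide example).
term_nbrs = {"fra:suite": ["eng:course", "spa:trayectoria"]}
meaning_nbrs = {
    "eng:course": ["academic course", "part of a meal",
                   "route of travel", "series of events"],
    "spa:trayectoria": ["route of travel"],
}
identity_sim = lambda a, b: 1.0 if a == b else 0.0  # stand-in for sim(m', m)

score_events = neighbour_feature("fra:suite", "series of events",
                                 term_nbrs, meaning_nbrs, identity_sim)
score_route = neighbour_feature("fra:suite", "route of travel",
                                term_nbrs, meaning_nbrs, identity_sim)
```

In this toy setting a highly polysemous neighbour such as eng: “course” contributes little to any single meaning, while an unambiguous neighbour contributes its full weight.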

Other Features

cosine similarity of translations with gloss

scores assessing polysemy by looking at back-translations

many more (see paper for details)
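
As a rough illustration of the first feature in this list, the sketch below computes a cosine similarity between a bag of translations and the words of a gloss. The slides do not say how the vectors are built, so the raw-count bag-of-words representation, the whitespace tokenisation, and the names `cosine` and `gloss_overlap` are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gloss_overlap(translations, gloss):
    """Assumed feature: cosine between translation words and gloss words."""
    return cosine(Counter(translations), Counter(gloss.lower().split()))


overlap = gloss_overlap(["course", "meal"], "part of a meal")
no_overlap = gloss_overlap(["course"], "route of travel")
```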

Approach

use scores as features for RBF-kernel SVM

multiple iterations: each graph G_i is based on the previous graph G_{i−1}

stop when F1 score plateau is reached on a validation set
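
The iteration scheme can be sketched as a simple loop. This is a schematic under assumptions, not the authors' implementation: `refine` stands in for one round of feature extraction plus SVM re-weighting, `f1_on_validation` for the validation-set evaluation, and the plateau criterion (`min_gain`) and iteration cap are hypothetical parameters.

```python
def iterate_until_plateau(g0, refine, f1_on_validation, min_gain=0.005, max_iters=10):
    """Build G_1, G_2, ... from G_0 and stop once validation F1 stops improving."""
    graph = g0
    best_f1 = f1_on_validation(graph)
    history = [best_f1]
    for _ in range(max_iters):
        candidate = refine(graph)          # G_i computed from G_{i-1}
        f1 = f1_on_validation(candidate)
        if f1 - best_f1 < min_gain:        # plateau reached: keep the previous graph
            break
        graph, best_f1 = candidate, f1
        history.append(f1)
    return graph, history


# Stub usage: the "graph" is just an iteration counter with made-up F1 values.
fake_f1 = [0.5, 0.7, 0.8, 0.802, 0.9]
final_graph, f1_history = iterate_until_plateau(
    0, refine=lambda g: g + 1, f1_on_validation=lambda g: fake_f1[g])
```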

Results

Setup

input graph G0:
  448,069 pre-existing term-meaning links
  10,805,400 translation edges
  1.3 million term nodes with candidates
  7.7 candidate meanings per new term

2,445 term-meaning links for training (French/German)

2,901 term-meaning links as validation set

[Figure: Excerpt from final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6)]
Meaning nodes: school (group of fish); school (institution); school (building)
Terms shown: deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”

Evaluation

Relation                                Precision¹
Term-Meaning Links (French)             89.2% ± 3.4%
Term-Meaning Links (German)             85.9% ± 3.8%
Term-Meaning Links (Mandarin Chinese)   90.5% ± 3.3%

Generalization (Hypernymy)              87.1% ± 4.8%
Instance                                89.3% ± 4.4%
Similarity                              92.0% ± 3.8%
Category                                93.3% ± 4.5%
Part (Meronymy)                         94.4% ± 4.1%
Member (Meronymy)                       92.7% ± 4.0%
Substance (Meronymy)                    95.6% ± 3.5%
Opposite                                94.3% ± 3.9%

¹ Wilson score intervals for random samples
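
For readers unfamiliar with Wilson score intervals, the following sketch shows the standard formula, returning the interval as a centre and a half-width in the same "p ± h" style as the table. The confidence level (z = 1.96, roughly 95%) and the sample sizes in the usage lines are assumptions; the slides state neither.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Hypothetical samples: 50 correct out of 100, and 0 correct out of 10.
centre_mid, half_mid = wilson_interval(50, 100)
centre_zero, half_zero = wilson_interval(0, 10)
```

Unlike the naive normal approximation, the interval stays inside [0, 1] even for extreme proportions, which matters for precision values above 90%.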

Coverage

Language      Term-Meaning Links   Distinct Terms
Overall       1,595,763            822,212

German        132,523              67,087
French        75,544               33,423
Esperanto     71,247               33,664
Dutch         68,792               30,154
Spanish       68,445               32,143
Turkish       67,641               31,553
Czech         59,268               33,067
Russian       57,929               26,293
Portuguese    55,569               23,499
Italian       52,008               24,974
Hungarian     46,492               28,324
Thai          44,523               30,815

Application: Semantic Relatedness

Experimental Setup

Example: “curriculum” considered closely related to “school”, but not to “water”

compute term relatedness using UWN:

\[ \mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2) \]

sim(s1, s2): combined graph-/gloss-based method

compare with assessments of relatedness made by human judges
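
The term-level relatedness above is a maximum over all sense pairs, which can be sketched as follows. The sense inventory `senses`, the sense identifiers, and the toy `sense_sim` table are hypothetical; the actual combined graph-/gloss-based sense similarity is not reproduced here. Returning `None` for terms without senses is an assumption that mirrors the notion of coverage.

```python
def term_relatedness(t1, t2, senses, sense_sim):
    """sim(t1, t2) = max over s1 in sigma(t1), s2 in sigma(t2) of sim(s1, s2)."""
    pairs = [(s1, s2) for s1 in senses.get(t1, ()) for s2 in senses.get(t2, ())]
    if not pairs:
        return None  # at least one term is not covered by the wordnet
    return max(sense_sim(s1, s2) for s1, s2 in pairs)


# Toy sense inventory and similarity table (made-up values).
senses = {
    "school": ["school/institution", "school/fish"],
    "curriculum": ["curriculum/course-of-study"],
    "water": ["water/liquid"],
}
toy_scores = {
    ("school/institution", "curriculum/course-of-study"): 0.8,
    ("school/fish", "water/liquid"): 0.4,
}
toy_sim = lambda a, b: toy_scores.get((a, b), toy_scores.get((b, a), 0.1))

rel_curriculum = term_relatedness("school", "curriculum", senses, toy_sim)
rel_water = term_relatedness("curriculum", "water", senses, toy_sim)
rel_missing = term_relatedness("school", "unknown", senses, toy_sim)
```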

Results for 3 German Datasets

Dataset                  GUR65          GUR350         ZG222
                         r      Cov.    r      Cov.    r      Cov.
Inter-Annot. Agreement   0.81   (65)    0.69   (350)   0.49   (222)
Wikipedia (ESA*)         0.56   65      0.52   333     0.32   205
GermaNet (Lin*)          0.73   60      0.50   208     0.08   88
UWN                      0.80   60      0.68   242     0.51   106

r: Pearson product-moment correlation coefficient

Cov.: absolute coverage

*: scores by Gurevych et al. (2007)
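
The r values in the table are Pearson correlations between system scores and human judgements. A minimal sketch of that computation is given below; the function name `pearson_r` and the example numbers are illustrative only and are not data from the evaluation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores: perfectly aligned and perfectly reversed rankings.
r_aligned = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_reversed = pearson_r([1, 2, 3], [3, 2, 1])
```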

Application: Cross-Lingual Text Classification

cross-lingual TC: train using documents in one language, classify documents in another language

used bag-of-words/meanings TF-IDF vectors

Dataset: Reuters corpora (RCV1/2); for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents

SVMlight
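
To illustrate why meaning features help across languages, the sketch below builds combined term and meaning bags and weights them with a plain TF-IDF. Everything here is an assumption made for illustration: the lookup table `term_to_meanings`, the meaning identifier, the feature naming scheme, and the unsmoothed `tf * log(N / df)` weighting are not taken from the slides.

```python
import math
from collections import Counter


def bag_of_terms_and_meanings(tokens, lang, term_to_meanings):
    """Count language-specific term features plus language-independent meaning features."""
    bag = Counter()
    for tok in tokens:
        bag[f"term:{lang}:{tok}"] += 1
        for meaning in term_to_meanings.get((lang, tok), ()):
            bag[f"meaning:{meaning}"] += 1
    return bag


def tfidf_vectors(bags):
    """Weight each feature by tf * log(N / df) over the given collection."""
    n_docs = len(bags)
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    return [{f: tf * math.log(n_docs / df[f]) for f, tf in bag.items()} for bag in bags]


# Hypothetical wordnet lookup: both words point to the same meaning node.
lookup = {
    ("eng", "school"): ["m_school_institution"],
    ("ita", "scuola"): ["m_school_institution"],
}
doc_en = bag_of_terms_and_meanings(["school", "teacher"], "eng", lookup)
doc_it = bag_of_terms_and_meanings(["scuola"], "ita", lookup)
shared = set(doc_en) & set(doc_it)
vectors = tfidf_vectors([doc_en, doc_it])
```

With terms only, the English and Italian documents share no features; the meaning feature is what lets a classifier trained on one language transfer to the other.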

Language Pair      Terms only   Terms + Meanings
English-Italian    68.3%        76.3%
English-Russian    51.7%        71.2%
Italian-English    74.4%        78.1%
Italian-Russian    58.4%        73.2%
Russian-English    67.3%        76.8%
Russian-Italian    62.2%        71.8%

(all values are F1 scores)

Summary

large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings

built by learning edge weights using graph-based evidence

useful for monolingual and cross-lingual tasks

Future Work

ongoing work: user interface incl. user contributions

techniques to automatically discover new word meanings

word sense disambiguation, query expansion using UWN

Thanks!

expression of gratitude

eng: “thank you”

yue: “唔該”

cmn: “谢谢”

jpn: “ありがとう”

spa: “gracias”

ara: “شكرا”
