Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Introduction
Existing Lexical Knowledge Bases
Building a Multilingual Wordnet
Results and Experiments
Summary and Future Work
Towards a Universal Wordnet by Learning from Combined Evidence
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
2009-11-03
Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
Introduction
Lexical Knowledge
What meanings does a word have?
How do those meanings relate to the meanings of other words?
Many Applications
examples: NLP, AI, question answering, query expansion, human consultation
“speaker”: person who gives a talk; device that produces sounds
“board”: flat piece of wood; committee; panel for writing with chalk; to enter a transportation vehicle
someone who studies: “student”, “pupil”
[Figure: “professor” and “faculty” connected by member/part relations]
entity
institution
educational institution
university
...
Multilinguality
the world is multilingual
the Internet is also increasingly multilingual
[Figure: Top 10 Languages by Approx. No. of Speakers. Source: Ethnologue 2005]
[Figure: Internet users by Region. Source: Internet World Stats]
Vision
universal index of word meanings
large-scale semantic network with class hierarchy
look up any word in any language, get a list of its meanings
Example meaning: person who gives a talk
eng: “speaker”
jpn: “話者”
rus: “докладчик”
ces: “řečník”
...
meanings should be connected via semantic relations
[Figure: class hierarchy entity, institution, educational institution, university, ... with terms in many languages attached to the meanings: por: “entidade”, heb: “ישות”, cmn: “制度”, deu: “Bildungseinrichtung”, cym: “prifysgol”, ...]
Outline
1 Existing Lexical Knowledge Bases
2 Building a Multilingual Wordnet
3 Results and Experiments
4 Summary and Future Work
Existing Lexical Knowledge Bases
WordNet
lexical database created at Princeton
enumerates meanings of English words
meaning-to-meaning links
Miller, Fellbaum et al. (1990): among most-cited papers in computer science (source: CiteseerX)
hypernym hierarchy, meronymy (part of), etc.
Non-English Wordnets
EuroWordNet, BalkaNet, Global WordNet Association
problem: many are small, incomplete
problem: different identifiers, formats, etc.
problem: only ∼10 languages with freely available wordnets
not a single, coherent resource
Other Resources
PANGLOSS Ontology: Knight & Luk (1994): 2 languages, around 70,000 entities
TransGraph system: Etzioni et al. (2007): large translation graph, limited structure, e.g. no semantic hierarchy
DBPedia, YAGO, OpenCyc: class hierarchy not multilingual
Building a Multilingual Wordnet
Strategy
use existing wordnets as backbone
add new terms, link to meaning nodes
[Figure: Existing Wordnets (terms eng: “course”, eng: “class”, spa: “trayectoria” linked to the meanings academic course, part of a meal, route of travel, series of events) → Desired Output (same meanings, with the additional terms deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs” linked in)]
Input Graph
use existing wordnets as backbone (mainly English, Spanish, Catalan)
add translations to graph: dictionaries (e.g. Wiktionary), thesauri and ontologies, parallel corpora (word alignment)
also: predict new translations
[Figure: Input Graph G0, in which the terms deu: “Reihe”, spa: “trayectoria”, ita: “piatto”, fra: “suite”, eng: “course”, deu: “Kurs”, eng: “class” appear alongside the meanings academic course, part of a meal, route of travel, series of events]
Approach: Link new words to meanings of their translations
Huge Challenge: Disambiguation!
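[Figure: ita: “piatto” is connected by a translation edge to eng: “course”, which is linked to four meanings: academic course, part of a meal, route of travel, series of events. Each candidate link from “piatto” to one of these meanings is marked with a question mark.]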
Approach
variety of features that analyse previous graph G_{i-1}, incorporate neighbourhood information into an edge's feature vector
supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation
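As a rough illustration of the two ingredients named above, the following sketch shows an RBF-kernel decision function and a sigmoid mapping from decision values to probabilities. The slides do not say how the posterior probabilities are estimated, so the sigmoid (Platt-style) mapping, the parameter values `gamma`, `a`, `b`, the toy support vectors, and all function names are assumptions made for illustration only.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is a hypothetical setting."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Kernel SVM decision function: sum_i coef_i * K(sv_i, x) + bias."""
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    ) + bias

def sigmoid_probability(f, a=-2.0, b=0.0):
    """Map a decision value f into (0, 1) via 1 / (1 + exp(a*f + b)).

    a and b would normally be fitted on held-out data; the defaults are made up.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))

# Toy model with two hypothetical support vectors (one per class).
support_vectors = [(1.0, 1.0), (0.0, 0.0)]
dual_coefs = [1.0, -1.0]

# Feature vector of one candidate term-meaning edge (invented values).
edge_features = (0.9, 0.8)
f = decision_value(edge_features, support_vectors, dual_coefs, bias=0.0)
edge_weight = sigmoid_probability(f)
```

The resulting `edge_weight` lies between 0 and 1, so it can be thresholded or fed back into the next iteration as the new weight of the edge.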
Example Feature:
Given term t and meaning m, e.g. t = fra: “suite”, m = academic course
Question: Should they be linked?
Look at neighbours t′ ∈ Γ(t)
[Figure: term t = fra: “suite” with neighbouring terms t′ (spa: “trayectoria”, eng: “course”), whose meanings m′ include part of a meal, academic course, route of travel, series of events, ...]
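\[ \sum_{t' \in \Gamma(t)} \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)} \]
\[ \mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \mathrm{sim}(m', m) \]
\[ \mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \bigl(1 - \mathrm{sim}(m', m)\bigr) \]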
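With weights φ1 and φ2:
\[ \sum_{t' \in \Gamma(t)} \phi_1(t, t') \, \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)} \]
\[ \mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \phi_2(t', m') \, \mathrm{sim}(m', m) \]
\[ \mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \phi_2(t', m') \, \bigl(1 - \mathrm{sim}(m', m)\bigr) \]
weighting based on: part-of-speech, corpus frequency, ...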
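The example feature can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is stored in two hypothetical dictionaries (`term_neighbours` for the term neighbours of t, `term_meanings` for the meanings linked to each t′), the weights `phi1` and `phi2` default to 1, and the toy similarity `identity_sim` (1 for identical meanings, 0 otherwise) stands in for whatever meaning similarity is actually used. The toy edges are invented and not taken from the real input graph.

```python
def link_feature(t, m, term_neighbours, term_meanings, sim,
                 phi1=lambda t, t_prime: 1.0,
                 phi2=lambda t_prime, m_prime: 1.0):
    """Sum over neighbours t' of phi1 * sim* / (sim* + dissim)."""
    score = 0.0
    for t_prime in term_neighbours.get(t, ()):
        meanings = term_meanings.get(t_prime, ())
        if not meanings:
            continue  # neighbour has no known meanings, contributes nothing
        # sim*: best weighted similarity between any meaning of t' and m
        sim_star = max(phi2(t_prime, mp) * sim(mp, m) for mp in meanings)
        # dissim: accumulated weighted dissimilarity over all meanings of t'
        dissim = sum(phi2(t_prime, mp) * (1.0 - sim(mp, m)) for mp in meanings)
        if sim_star + dissim > 0:
            score += phi1(t, t_prime) * sim_star / (sim_star + dissim)
    return score

# Invented toy graph for illustration.
term_neighbours = {"fra:suite": ["eng:course", "spa:trayectoria"]}
term_meanings = {
    "eng:course": ["academic course", "part of a meal",
                   "route of travel", "series of events"],
    "spa:trayectoria": ["route of travel"],
}

def identity_sim(m1, m2):
    """Hypothetical stand-in similarity: 1 for identical meanings, else 0."""
    return 1.0 if m1 == m2 else 0.0

score_route = link_feature("fra:suite", "route of travel",
                           term_neighbours, term_meanings, identity_sim)
score_meal = link_feature("fra:suite", "part of a meal",
                          term_neighbours, term_meanings, identity_sim)
```

In this toy graph a meaning supported by an unambiguous neighbour scores higher than one supported only by a polysemous neighbour, because the dissimilarity term penalises neighbours with many unrelated meanings.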
Other Features
cosine similarity of translations with gloss
scores assessing polysemy by looking at back-translations
many more (see paper for details)
Approach
use scores as features for RBF-kernel SVM
multiple iterations: each graph G_i is based on the previous G_{i-1}
stop when F1 score plateau is reached on a validation set
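The iteration scheme can be sketched as a generic loop. This is a minimal sketch, assuming two hypothetical callables: `refine`, which builds G_i from G_{i-1}, and `validation_f1`, which scores a graph on the validation set. The plateau criterion (improvement of at most `tolerance`) and the iteration cap are also assumptions, since the slides do not specify them.

```python
def iterate_until_plateau(g0, refine, validation_f1,
                          tolerance=0.001, max_iterations=10):
    """Refine the graph repeatedly and stop once validation F1 stops improving."""
    graph = g0
    best_f1 = validation_f1(graph)
    history = [best_f1]
    for _ in range(max_iterations):
        candidate = refine(graph)        # build G_i from G_{i-1}
        f1 = validation_f1(candidate)
        history.append(f1)
        if f1 - best_f1 <= tolerance:    # plateau reached: keep previous graph
            break
        graph, best_f1 = candidate, f1
    return graph, history

# Toy stand-ins: a "graph" is just its iteration index, and the F1 values
# are invented numbers that flatten out after two refinements.
toy_scores = [0.50, 0.70, 0.80, 0.8005, 0.8006]
final_graph, f1_history = iterate_until_plateau(
    0,
    refine=lambda g: g + 1,
    validation_f1=lambda g: toy_scores[min(g, len(toy_scores) - 1)],
)
```

With these toy numbers the loop accepts two refinements and stops at the third, whose gain is below the tolerance.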
Results
Setup
input graph G0:
  448,069 pre-existing term-meaning links
  10,805,400 translation edges
  1.3 million term nodes with candidates
  7.7 candidate meanings per new term
2,445 term-meaning links for training (French/German)
2,901 term-meaning links as validation set
Excerpt from final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6)
[Figure: the meanings school (group of fish), school (institution), school (building) with linked terms deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”]
Evaluation
Relation                                 Precision (1)
Term-Meaning Links (French)              89.2% ± 3.4%
Term-Meaning Links (German)              85.9% ± 3.8%
Term-Meaning Links (Mandarin Chinese)    90.5% ± 3.3%
Generalization (Hypernymy)               87.1% ± 4.8%
Instance                                 89.3% ± 4.4%
Similarity                               92.0% ± 3.8%
Category                                 93.3% ± 4.5%
Part (Meronymy)                          94.4% ± 4.1%
Member (Meronymy)                        92.7% ± 4.0%
Substance (Meronymy)                     95.6% ± 3.5%
Opposite                                 94.3% ± 3.9%
(1) Wilson score intervals for random samples
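For readers unfamiliar with the Wilson score interval, the sketch below computes its centre and half-width from a sample. The sample sizes behind the table are not given in the slides, so the numbers in the example (50 correct out of 100) are invented, and the 95% level (`z = 1.96`) is an assumption.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval.

    successes: number of sampled items judged correct
    n:         sample size
    z:         normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre, half_width

# Invented example: 50 of 100 sampled links judged correct.
centre, half_width = wilson_interval(50, 100)
```

Unlike the plain normal approximation, the centre is pulled slightly towards 0.5, which keeps the interval inside [0, 1] even for precision values close to 100%.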
Coverage
Language      Term-Meaning Links    Distinct Terms
Overall       1,595,763             822,212
German        132,523               67,087
French        75,544                33,423
Esperanto     71,247                33,664
Dutch         68,792                30,154
Spanish       68,445                32,143
Turkish       67,641                31,553
Czech         59,268                33,067
Russian       57,929                26,293
Portuguese    55,569                23,499
Italian       52,008                24,974
Hungarian     46,492                28,324
Thai          44,523                30,815
Application: Semantic Relatedness
Experimental Setup
Example: “curriculum” considered closely related to “school”, but not to “water”
compute term relatedness using UWN:
\[ \mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2) \]
sim(s1, s2): combined graph-/gloss-based method
compare with assessments of relatedness made by human judges
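The maximisation over sense pairs can be sketched as follows. The sense inventory `senses` (playing the role of σ), the gloss word sets, and the Jaccard overlap used as `sense_sim` are all hypothetical stand-ins; the slides only state that the real sim(s1, s2) combines graph- and gloss-based evidence.

```python
def term_relatedness(t1, t2, senses, sense_sim):
    """sim(t1, t2) = max over all sense pairs (s1, s2) of sense_sim(s1, s2)."""
    return max(
        (sense_sim(s1, s2)
         for s1 in senses.get(t1, ())
         for s2 in senses.get(t2, ())),
        default=0.0,  # no known senses for one of the terms
    )

# Invented toy sense inventory and glosses.
senses = {
    "school": ["school.institution", "school.fish"],
    "curriculum": ["curriculum.course"],
    "water": ["water.liquid"],
}
glosses = {
    "school.institution": {"educational", "institution", "teaching", "students"},
    "school.fish": {"large", "group", "fish"},
    "curriculum.course": {"course", "study", "educational", "institution"},
    "water.liquid": {"clear", "liquid"},
}

def gloss_overlap(s1, s2):
    """Jaccard overlap of gloss words, a simple stand-in for sim(s1, s2)."""
    a, b = glosses[s1], glosses[s2]
    return len(a & b) / len(a | b)

rel_curriculum = term_relatedness("curriculum", "school", senses, gloss_overlap)
rel_water = term_relatedness("water", "school", senses, gloss_overlap)
```

Taking the maximum means that a single pair of closely related senses is enough to make two terms related, even when their other senses have nothing in common.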
Results for 3 German Datasets
Dataset                   GUR65           GUR350          ZG222
                          r      Cov.     r      Cov.     r      Cov.
Inter-Annot. Agreement    0.81   (65)     0.69   (350)    0.49   (222)
Wikipedia (ESA*)          0.56   65       0.52   333      0.32   205
GermaNet (Lin*)           0.73   60       0.50   208      0.08   88
UWN                       0.80   60       0.68   242      0.51   106
r: Pearson product-moment correlation coefficient
Cov.: absolute coverage
*: scores by Gurevych et al. (2007)
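The r values above are Pearson correlations between system scores and human ratings over the covered word pairs. A minimal sketch of that computation follows; the two score lists are invented and only illustrate the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Invented example: human relatedness ratings vs. system scores for 4 word pairs.
human_ratings = [4.0, 3.5, 0.5, 1.0]
system_scores = [0.9, 0.7, 0.1, 0.3]
r = pearson_r(human_ratings, system_scores)
```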
Application: Cross-Lingual Text Classification
cross-lingual TC: train using documents in one language, classify documents in another language
used bag-of-words/meanings TF-IDF vectors
Dataset: Reuters corpora (RCV1/2); for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents
SVMlight
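To illustrate why adding meanings helps across languages, the sketch below builds TF-IDF vectors from terms alone and from terms plus meanings, and compares an English and an Italian toy document. The lookup table `term_to_meanings`, the feature prefixes, the smoothed IDF formula, and the two documents are assumptions made for this example; the slides do not specify the exact weighting scheme.

```python
import math
from collections import Counter

def extract_features(tokens, term_to_meanings, use_meanings=True):
    """Count term features and, optionally, the meanings linked to each term."""
    feats = Counter("term:" + tok for tok in tokens)
    if use_meanings:
        for tok in tokens:
            for meaning in term_to_meanings.get(tok, ()):
                feats["meaning:" + meaning] += 1
    return feats

def tfidf_vectors(feature_counts):
    """Weight raw counts by a smoothed inverse document frequency."""
    n_docs = len(feature_counts)
    doc_freq = Counter()
    for counts in feature_counts:
        doc_freq.update(counts.keys())
    return [
        {f: tf * (math.log((1 + n_docs) / (1 + doc_freq[f])) + 1.0)
         for f, tf in counts.items()}
        for counts in feature_counts
    ]

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical term-to-meaning lookup and toy documents.
term_to_meanings = {
    "course": ["academic course", "part of a meal",
               "route of travel", "series of events"],
    "class": ["academic course"],
    "piatto": ["part of a meal"],
}
english_doc = ["course", "class"]
italian_doc = ["piatto"]

def cross_lingual_similarity(use_meanings):
    counts = [extract_features(doc, term_to_meanings, use_meanings)
              for doc in (english_doc, italian_doc)]
    vec_en, vec_it = tfidf_vectors(counts)
    return cosine(vec_en, vec_it)

terms_only = cross_lingual_similarity(use_meanings=False)
with_meanings = cross_lingual_similarity(use_meanings=True)
```

With terms only, the two documents share no features and their similarity is zero; with meanings added, they overlap on a shared meaning feature, which is what lets a classifier trained in one language transfer to another.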
Language Pair      Terms only    Terms + Meanings
English-Italian    68.3%         76.3%
English-Russian    51.7%         71.2%
Italian-English    74.4%         78.1%
Italian-Russian    58.4%         73.2%
Russian-English    67.3%         76.8%
Russian-Italian    62.2%         71.8%
(all values are F1 scores)
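As a reminder of how the reported measure is defined, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts for one binary task; the counts are invented, and how the slides aggregate F1 over the 105 tasks is not stated.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one binary task."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented counts for a single classification task.
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```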
Summary
large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings
built by learning edge weights using graph-based evidence
useful for monolingual and cross-lingual tasks
Future Work
ongoing work: user interface incl. user contributions
techniques to automatically discover new word meanings
word sense disambiguation, query expansion using UWN
Thanks!
expression of gratitude
eng: “thank you”
yue: “唔該”
cmn: “谢谢”
jpn: “ありがとう”
spa: “gracias”
ara: “شكرا”