We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
Step 1: Link Prediction
UWN's Multilingual Graph
• Goal: Richer, Less Sparse Features
• How: Model Synonymy, Polysemy, Semantic Relatedness, and Taxonomy (within and across languages)
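The link-prediction idea can be illustrated with a minimal Python sketch under toy assumptions: a foreign word is linked to the candidate senses of its English translations, and each candidate is scored by the fraction of translations that carry it. The sense identifiers in `ENGLISH_SENSES`, the translation list, and the function `rank_senses` are all hypothetical; UWN itself draws on richer evidence than this single vote.

```python
from collections import Counter

# Hypothetical toy data: English word -> candidate sense identifiers.
ENGLISH_SENSES = {
    "school": ["school/institution", "school/building", "school/fish"],
    "shoal": ["school/fish"],
    "swarm": ["school/fish", "swarm/insects"],
}


def rank_senses(translations):
    """Score each candidate sense by the fraction of English translations
    that carry it (a single-feature vote, not UWN's actual model)."""
    votes = Counter()
    for word in translations:
        for sense in ENGLISH_SENSES.get(word, []):
            votes[sense] += 1
    n = max(len(translations), 1)  # avoid division by zero for empty input
    scored = [(sense, count / n) for sense, count in votes.items()]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical English translations of deu "Fischschwarm".
for sense, score in rank_senses(["school", "shoal", "swarm"]):
    print(f"{sense}\t{score:.2f}")
```

Only the fish-school sense is shared by all three translations, so it ranks first; the other senses of the ambiguous word "school" receive a lower score.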
UWN: A Large Multilingual Lexical Knowledge Base
Gerard de Melo and Gerhard Weikum
ICSI Berkeley / Max Planck Institute for Informatics
Better NLP Features using Lexical Semantics
More information: www.lexvo.org/gdm/
• Downloadable API available
• Web User Interface
[Figure: UWN's multilingual graph. Nodes such as Entity; Geopolitical Entity; City; Institution; Educational institution; University; University of California, Berkeley; Berkeley, CA; George Berkeley; school (institution); school (building); and school (group of fish) are connected to words in many languages, e.g. eng “Berkeley”, “UC Berkeley”, “Cal”; cmn “柏克萊加州大學”, “制度”; por “entidade”; heb “ישות”; ara “وجود، كينونة”; tha “สถาบัน”; deu “Bildungseinrichtung”, “Schulgebäude”, “Schulhaus”, “Fischschwarm”; fin “oppilaitos”, “yliopisto”; srp “универзитете”; ces “hejno”; fra “banc”; chv “шкул”; jpn “学校”; kor “학교”; lao “ໂຮງຮຽນ”; kat “სკოლა”.]
• Over 16 million words and names in over 200 languages, semantically connected
• Ambiguity and synonymy captured
[Figure: Lexvo.org language descriptions (languages, scripts, characters, countries), e.g. Chuvash, Georgian, Cyrillic (script), Russia (country).]
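How such a graph captures ambiguity and synonymy can be sketched as a bipartite structure between (language, word) terms and sense nodes: one term mapping to several senses expresses polysemy, and one sense mapping to several terms expresses synonymy within and across languages. This minimal Python sketch assumes a hypothetical `LexicalGraph` class and uses sense labels from the figure; it is not UWN's storage format or its API.

```python
from collections import defaultdict


class LexicalGraph:
    """Hypothetical toy graph linking (language, word) terms to sense nodes."""

    def __init__(self):
        self.senses_of = defaultdict(set)  # term -> senses (polysemy)
        self.terms_of = defaultdict(set)   # sense -> terms (synonymy)

    def add(self, lang, word, sense):
        term = (lang, word)
        self.senses_of[term].add(sense)
        self.terms_of[sense].add(term)

    def senses(self, lang, word):
        """All senses of a word, sorted for readable output."""
        return sorted(self.senses_of.get((lang, word), ()))

    def translations(self, lang, word, target_lang):
        """Words in target_lang that share at least one sense with the word."""
        found = set()
        for sense in self.senses_of.get((lang, word), ()):
            for other_lang, other_word in self.terms_of.get(sense, ()):
                if other_lang == target_lang:
                    found.add(other_word)
        return sorted(found)


g = LexicalGraph()
for sense in ("school (institution)", "school (building)", "school (group of fish)"):
    g.add("eng", "school", sense)
g.add("deu", "Schulgebäude", "school (building)")
g.add("deu", "Schulhaus", "school (building)")
g.add("deu", "Fischschwarm", "school (group of fish)")
g.add("fra", "banc", "school (group of fish)")

print(g.senses("eng", "school"))
print(g.translations("eng", "school", "deu"))
```

The English word "school" maps to three senses, while a lookup from fra "banc" reaches only the German word that shares its fish-school sense.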
UWN: Meaning Distinctions
• Ontological Taxonomy
• Encyclopedic Knowledge, Pictures, Video, Sounds, Maps
• Etymological and other word relationships
• Millions of Named Entities (People, Places, Proteins, Asteroids, Companies, etc.)
• 200+ languages
Step 2: Entity Integration
Step 3: Taxonomy Induction
Extras
• Markov chain to rank taxonomic parents
• 270 Wikipedia taxonomies integrated with WordNet's hypernym hierarchy
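Ranking taxonomic parents with a Markov chain can be sketched as a random walk with restart (personalized PageRank) from the node being classified: candidates reachable through heavier or more links accumulate more stationary probability. This is a minimal sketch under assumptions; the node names, link weights, restart probability, and the function `rank_parents` are hypothetical, and the chain actually constructed for UWN may differ.

```python
def rank_parents(edges, start, restart=0.15, iters=100):
    """Random walk with restart from `start`; returns the other nodes ranked
    by stationary probability (higher = better-supported ancestor)."""
    nodes = {start} | set(edges) | {p for outs in edges.values() for p, _ in outs}
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] += restart  # restart mass (total mass stays 1)
        for node, mass in prob.items():
            outs = edges.get(node, [])
            total = sum(w for _, w in outs)
            if total == 0:
                # Node without candidate parents: the walk returns to the start.
                nxt[start] += (1 - restart) * mass
                continue
            for parent, w in outs:
                nxt[parent] += (1 - restart) * mass * w / total
        prob = nxt
    ranked = [(n, p) for n, p in prob.items() if n != start]
    return sorted(ranked, key=lambda pair: -pair[1])


# Hypothetical candidate-parent links with hypothetical weights.
edges = {
    "cat:Televisores": [("wn:receiver", 2.0), ("wn:broadcasting", 1.0)],
    "wn:receiver": [("wn:device", 1.0)],
    "wn:broadcasting": [("wn:communication", 1.0)],
}
for node, p in rank_parents(edges, "cat:Televisores"):
    print(f"{node}\t{p:.3f}")
```

The restart term keeps the chain aperiodic, so power iteration converges; a production system would likely use sparse matrices and a convergence threshold instead of a fixed iteration count.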
[Figure: Multilingual entries for television and television sets, e.g. es “Televisores”, “Televisor”, “Televisión”; ru “Телевизор”; hi “दूरदर्शन”; ja “テレビ”, “テレビ受像機”; zh “电视机”; en “Television”, “Television set”, “TV set”, “T.V.”.]
• Linear program (LP) for constraint-based computation of equivalence classes of entities
• Region-growing approximation algorithm
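The LP and region-growing approximation are not reproduced here. As a simpler stand-in, the following sketch uses a different technique, greedy union-find merging: cross-lingual links are merged heaviest first, and any merge that would place a pair of known-distinct entities in the same equivalence class is skipped. The link weights, the distinctness pair, and the names `UnionFind` and `equivalence_classes` are hypothetical.

```python
class UnionFind:
    """Disjoint sets with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def equivalence_classes(links, distinct):
    """Greedy stand-in for LP + region growing: merge heaviest links first,
    skipping merges that would join a pair listed in `distinct`."""
    uf = UnionFind()
    for a, b, _weight in sorted(links, key=lambda link: -link[2]):
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            continue
        conflict = any({uf.find(x), uf.find(y)} == {ra, rb} for x, y in distinct)
        if not conflict:
            uf.union(a, b)
    classes = {}
    for node in {n for a, b, _ in links for n in (a, b)}:
        classes.setdefault(uf.find(node), set()).add(node)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical cross-lingual links (source, target, weight).
links = [
    ("es:Televisor", "en:Television set", 1.0),
    ("zh:电视机", "en:Television set", 1.0),
    ("es:Televisión", "en:Television", 1.0),
    ("ru:Телевизор", "es:Televisor", 0.9),
    ("ja:テレビ", "en:Television", 0.6),
    ("ja:テレビ", "en:Television set", 0.5),  # conflicting link
]
# Assumption: two articles from the same edition denote distinct entities.
distinct = [("en:Television", "en:Television set")]

for cls in equivalence_classes(links, distinct):
    print(cls)
```

The weakest link, which would join "Television" and "Television set", is dropped. Unlike an LP-based approach, this greedy pass offers no approximation guarantee and depends on the link order.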
• Link multilingual words to WordNet
• Connect Wikipedia with WordNet (equivalence and taxonomic links)
• FrameNet Linking
• Common-Sense Knowledge Extraction
• Multilingual Roget's Thesaurus