Wordnet, EuroWordNet, Global Wordnet Piek Vossen Piek.Vossen@irion.nl http://www.globalwordnet.org

  • Slide 1

    Wordnet, EuroWordNet, Global Wordnet
    Piek Vossen, Piek.Vossen@irion.nl
    http://www.globalwordnet.org

  • Slide 2

    Overview
    Princeton WordNet (1980 - ongoing)
    EuroWordNet (1996 - 1999)
      The database design
      The general building strategy
      Towards a universal index of meaning
    Global WordNet Association (2001 - ongoing)
    Other wordnets
      BalkaNet (2001 - 2004)
      IndoWordnet (2002 - ongoing)
      Meaning (2002 - 2005)

  • Slide 3

    WordNet1.5
    Developed at Princeton by George Miller and his team as a model of the mental lexicon.
    Semantic network in which concepts are defined in terms of relations to other concepts.
    Structure:
      organized around the notion of synsets (sets of synonymous words)
      basic semantic relations between these synsets
    Initially no glosses.
    Main revision after tagging the Brown corpus with word meanings: SemCor.
    http://www.cogsci.princeton.edu/~wn/w3wn.html

  • Slide 4

    Structure of WordNet1.5

  • Slide 5

    EuroWordNet
    The development of a multilingual database with wordnets for several European languages.
    Funded by the European Commission, DG XIII, Luxembourg, as projects LE2-4003 and LE4-8328.
    March 1996 - September 1999
    2.5 million euro.
    URL: http://www.hum.uva.nl/~ewn

  • Slide 6

    Objectives of EuroWordNet
    Languages covered:
      EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
      EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
    Size of vocabulary:
      EuroWordNet-1: 30,000 concepts - 50,000 word meanings
      EuroWordNet-2: 15,000 concepts - 25,000 word meanings
    Type of vocabulary:
      the most frequent words of the languages
      all concepts needed to relate more specific concepts

  • Slide 7

    Consortium

  • Slide 8

    The basic principles of EuroWordNet
    the structure of the Princeton WordNet
    the design of the EuroWordNet database
    wordnets as language-specific structures
    the language-internal relations
    the multilingual relations

  • Slide 9

    Specific features of EuroWordNet
    it contains semantic lexicons for languages other than English.
    each wordnet reflects the relations as a language-internal system, maintaining cultural and linguistic differences in the wordnets.
    it contains multilingual relations from each wordnet to English meanings, which makes it possible to compare the wordnets, tracking down inconsistencies and cross-linguistic differences.
    each wordnet is linked to a language-independent top-ontology and to domain labels.

  • Slide 10

    Autonomous & Language-Specific
    [Figure: the hierarchy around "object" in the Dutch Wordnet and in WordNet1.5]
    Dutch Wordnet: voorwerp {object}, lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block}, lichaam {body}
    WordNet1.5: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; container; device; implement; tool; instrument; block; body; bag; spoon; box

  • Slide 11

    Differences in structure
    Artificial Classes versus Lexicalized Classes: instrumentality; natural object
    Lexicalization differences of classes: container and artifact (object) are not lexicalized in Dutch
    What is the purpose of different hierarchies?
    Should we include all lexicalized classes from all (8) languages?

  • Slide 12

    Linguistic versus Conceptual Ontologies
    Conceptual ontology: A particular level or structuring may be required to achieve better control or performance, or a more compact and coherent structure.
      introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool),
      neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise).
    What properties can we infer for spoons?
    spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

  • Slide 13

    Linguistic ontology: Exactly reflects the relations between all the lexicalized words and expressions in a language.
    It therefore captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.
    What words can be used to name spoons?
    spoon -> object, tableware, silverware, merchandise, cutlery,

  • Slide 14

    Separate Wordnets and Ontologies
    [Figure: language-specific wordnets linked to a language-neutral ontology]
    Language-Neutral Ontology: Reference Ontology Classes: BOX; ContainerProduct; SolidTangibleThing
    EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole
    Language-Specific Wordnets:
      WordNet1.5: object, container, box
      Dutch Wordnet: voorwerp, doos

  • Slide 15

    Wordnets versus ontologies
    Wordnets: autonomous language-specific lexicalization patterns in a relational network.
      Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense disambiguation.
    Ontologies: data structure with formally defined concepts.
      Usage: making semantic inferences.

  • Slide 16

    Wordnets as Linguistic Ontologies
    Classical Substitution Principle:
      Any word that is used to refer to something can be replaced by its synonyms, hyperonyms and hyponyms:
        horse -> stallion, mare, pony, mammal, animal, being.
      It cannot be referred to by co-hyponyms and co-hyponyms of its hyperonyms:
        horse -/-> cat, dog, camel, fish, plant, person, object.
    Conceptual Distance Measurement:
      Number of hierarchical nodes between words is a measurement of closeness, where the level and the local density of nodes are additional factors.

  • Slide 17

    Linguistic Principles for deriving relations
    1. Substitution tests (Cruse 1986):
      1a. It is a fiddle therefore it is a violin.
      1b. It is a violin therefore it is a fiddle.
      2a. It is a dog therefore it is an animal.
      2b. *It is an animal therefore it is a dog.
      3a. to kill (/a murder) causes to die (/death)
          to kill (/a murder) has to die (/death) as a consequence
      3b. *to die / death causes to kill
          *to die / death has to kill as a consequence

  • Slide 18

    Linguistic Principles for deriving relations
    2. Principle of Economy (Dik 1978):
      If a word W1 (animal) is the hyperonym of W2 (mammal) and W2 is the hyperonym of W3 (dog), then W3 (dog) should not be linked to W1 (animal) but to W2 (mammal).
    3. Principle of Compatibility:
      If a word W1 is related to W2 via relation R1, W1 and W2 cannot be related via relation Rn, where Rn is defined as a distinct relation from R1.

  • Slide 19

    Architecture of the EuroWordNet Data Base
    [Figure: four Lexical Items Tables linked through the Inter-Lingual-Index to the Ontology and the Domains]
    Link types:
      I = Language Independent link
      II = Link from Language Specific to Inter-Lingual-Index
      III = Language Dependent Link
    Inter-Lingual-Index: ILI-record {drive}
    Ontology: 2OrderEntity, Location, Dynamic
    Domains: Traffic, Air, Road
    Lexical Items Tables:
      Dutch: bewegen, gaan, rijden, berijden
      Italian: guidare, cavalcare, andare, muoversi
      English: drive, ride, move, go
      Spanish: cabalgar, jinetear, conducir, mover, transitar

  • Slide 20

    The mono-lingual design of EuroWordNet

  • Slide 21

    Language Internal Relations
    WN 1.5 starting point
    The synset as a weak notion of synonymy: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value. (Miller et al. 1993)
    Relations between synsets:

    Relation     POS-combination          Example
    ANTONYMY     adjective-to-adjective
                 verb-to-verb             open / close
    HYPONYMY     noun-to-noun             car / vehicle
                 verb-to-verb             walk / move
    MERONYMY     noun-to-noun             head / nose
    ENTAILMENT   verb-to-verb             buy / pay
    CAUSE        verb-to-verb             kill / die

  • Slide 22

    Differences EuroWordNet/WordNet1.5
    Added Features to relations
    Cross-Part-Of-Speech relations
    New relations to differentiate shallow hierarchies
    New interpretations of relations

  • Slide 23

    EWN Relationship Labels
    Disjunction/Conjunction of multiple relations of the same type
    WordNet1.5:
      door 1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left")
        PART OF: doorway, door, entree, entry, portal, room access
      door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car")
        PART OF: car, auto, automobile, machine, motorcar.
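
    The conjunction/disjunction labels can be illustrated with a minimal Python sketch. It is not the EuroWordNet database format: the class LabelledRelation and the function describe are hypothetical names, and the grouping rule is an assumption (targets that share a conj index are read as alternatives, and relations without a conj index are read as one group of alternatives, following the default non-exclusive disjunction). The relation data are the {airplane}, {door} and {dog} examples from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelledRelation:
    """One relation with optional conj/disj labels (hypothetical structure)."""
    source: str
    rel_type: str
    target: str
    conj: Optional[int] = None
    disj: Optional[int] = None


# Examples copied from the slides.
RELATIONS = [
    LabelledRelation("airplane", "HAS_MERO_PART", "door", conj=1),
    LabelledRelation("airplane", "HAS_MERO_PART", "jet engine", conj=2, disj=1),
    LabelledRelation("airplane", "HAS_MERO_PART", "propeller", conj=2, disj=2),
    LabelledRelation("door", "HAS_HOLO_PART", "car", disj=1),
    LabelledRelation("door", "HAS_HOLO_PART", "room", disj=2),
    LabelledRelation("door", "HAS_HOLO_PART", "entrance", disj=3),
    LabelledRelation("dog", "HAS_HYPERONYM", "mammal", conj=1),
    LabelledRelation("dog", "HAS_HYPERONYM", "pet", conj=2),
]


def describe(relations, source, rel_type):
    """Render the labelled targets of one synset as an AND/OR expression.

    Assumption: targets with the same conj index are alternatives (OR);
    different conj indices are combined with AND; relations without a
    conj index form a single group of alternatives.
    """
    groups = defaultdict(list)
    for rel in relations:
        if rel.source == source and rel.rel_type == rel_type:
            groups[rel.conj].append(rel.target)
    # Numbered conjuncts first, the unlabelled group (None) last.
    ordered = sorted(groups, key=lambda c: (c is None, c or 0))
    parts = []
    for conj in ordered:
        alternatives = groups[conj]
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(parts)


print(describe(RELATIONS, "airplane", "HAS_MERO_PART"))
print(describe(RELATIONS, "door", "HAS_HOLO_PART"))
```

    With one labelled {door} synset, the two WordNet1.5 senses door 1 and door 6 would not need to be kept apart merely because they are parts of different wholes.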

  • Slide 24

    EWN Relationship Labels
    {airplane}  HAS_MERO_PART: conj1        {door}
                HAS_MERO_PART: conj2 disj1  {jet engine}
                HAS_MERO_PART: conj2 disj2  {propeller}
    {door}      HAS_HOLO_PART: disj1        {car}
                HAS_HOLO_PART: disj2        {room}
                HAS_HOLO_PART: disj3        {entrance}
    {dog}       HAS_HYPERONYM: conj1        {mammal}
                HAS_HYPERONYM: conj2        {pet}
    {albino}    HAS_HYPERONYM: disj1        {plant}
                HAS_HYPERONYM: disj2        {animal}
    Default Interpretation: non-exclusive disjunction

  • Slide 25

    EWN Relationship Labels
    Disjunction/Conjunction of multiple relations of the same type
    {dog}       HAS_HYPONYM: disj1  {poodle}
                HAS_HYPONYM: disj1  {labrador}
                HAS_HYPONYM:        {sheep dog}  (Orthogonal)
                HAS_HYPONYM:        {watch dog}  (Orthogonal)
    Default Interpretation: non-exclusive disjunction

  • Slide 26

    EWN Relationship Labels
    Factive/Non-factive CAUSES (Lyons 1977)
    factive (default interpretation): to kill causes to die:
      {kill} CAUSES {die}
    non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2: to search may cause to find.
      {search} CAUSES {find}  non-factive

  • Slide 27

    Reversed
    In the database every relation must have a reverse counterpart, but there is a difference between relations which are explicitly coded as reverse and automatically reversed relations:
      {finger}      HAS_HOLONYM      {hand}
      {hand}        HAS_MERONYM      {finger}
      {paper-clip}  HAS_MER_MADE_OF  {metal}
      {metal}       HAS_HOL_MADE_OF  {paper-clip}  reversed
    Negation
      {monkey}      HAS_MERO_PART    {tail}
      {ape}         HAS_MERO_PART    {tail}  not

  • Slide 28

    Cross-Part-Of-Speech relations
    WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:
      adornment 2    change of state -- (the act of changing something)
      adorn 1        change, alter -- (cause to change; make different)
    EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
      {adorn V} XPOS_NEAR_SYNONYM {adornment N}

  • Slide 29

    The advantages of such explicit cross-part-of-speech relations are:
    similar words with different parts of speech are grouped together.
    the same information can be coded in an NP or in a sentence.
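
    How explicit cross-part-of-speech links and reverse counterparts might be stored can be sketched in Python as follows. This is only an illustration under stated assumptions: the names REVERSE, add_relation and xpos_group are hypothetical, the tuple layout is invented for the example, and treating XPOS_NEAR_SYNONYM as its own reverse is an assumption. The relation names and example synsets come from the slides.

```python
# Reverse counterparts for the relation names shown on the slides.
# Treating XPOS_NEAR_SYNONYM as symmetric is an assumption of this sketch.
REVERSE = {
    "HAS_HOLONYM": "HAS_MERONYM",
    "HAS_MERONYM": "HAS_HOLONYM",
    "HAS_MER_MADE_OF": "HAS_HOL_MADE_OF",
    "HAS_HOL_MADE_OF": "HAS_MER_MADE_OF",
    "XPOS_NEAR_SYNONYM": "XPOS_NEAR_SYNONYM",
}


def add_relation(store, source, rel_type, target):
    """Store a relation plus its automatically generated reverse.

    Each entry is (source, relation, target, reversed_flag); the flag
    marks the automatically reversed counterpart.
    """
    store.add((source, rel_type, target, False))
    store.add((target, REVERSE[rel_type], source, True))


def xpos_group(store, synset):
    """Return the synsets linked to `synset` across parts of speech."""
    return {
        target
        for source, rel_type, target, _ in store
        if source == synset and rel_type == "XPOS_NEAR_SYNONYM"
    }


store = set()
add_relation(store, "adorn V", "XPOS_NEAR_SYNONYM", "adornment N")
add_relation(store, "paper-clip", "HAS_MER_MADE_OF", "metal")

# The noun finds the verb even though only the verb's link was entered.
print(xpos_group(store, "adornment N"))
```

    Because the reverse link is generated when the relation is added, a lookup from either the noun or the verb reaches the other, which is what groups similar words with different parts of speech together.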