Wordnet, EuroWordNet, Global Wordnet Piek Vossen Piek.Vossen@irion.nl http://www.globalwordnet.org

  • Published on
    26-Mar-2015

Transcript

  • Slide 1

Wordnet, EuroWordNet, Global Wordnet
Piek Vossen
Piek.Vossen@irion.nl
http://www.globalwordnet.org

  • Slide 2

Overview
Princeton WordNet (1980 - ongoing)
EuroWordNet (1996 - 1999)
The database design
The general building strategy
Towards a universal index of meaning
Global WordNet Association (2001 - ongoing)
Other wordnets
BalkaNet (2001 - 2004)
IndoWordnet (2002 - ongoing)
Meaning (2002 - 2005)

  • Slide 3

WordNet1.5
Developed at Princeton by George Miller and his team as a model of the mental lexicon.
Semantic network in which concepts are defined in terms of relations to other concepts.
Structure:
    organized around the notion of synsets (sets of synonymous words)
    basic semantic relations between these synsets
Initially no glosses.
Main revision after tagging the Brown corpus with word meanings: SemCor.
http://www.cogsci.princeton.edu/~wn/w3wn.html

  • Slide 4

Structure of WordNet1.5

  • Slide 5

EuroWordNet
The development of a multilingual database with wordnets for several European languages.
Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328.
March 1996 - September 1999.
2.5 million euro.
URL: http://www.hum.uva.nl/~ewn

  • Slide 6

Objectives of EuroWordNet
Languages covered:
    EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
    EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
Size of vocabulary:
    EuroWordNet-1: 30,000 concepts - 50,000 word meanings
    EuroWordNet-2: 15,000 concepts - 25,000 word meanings
Type of vocabulary:
    the most frequent words of the languages
    all concepts needed to relate more specific concepts

  • Slide 7

Consortium

  • Slide 8

The basic principles of EuroWordNet
the structure of the Princeton WordNet
the design of the EuroWordNet database
wordnets as language-specific structures
the language-internal relations
the multilingual relations

  • Slide 9

Specific features of EuroWordNet
it contains semantic lexicons for languages other than English.
each wordnet reflects the relations as a language-internal system, maintaining cultural and linguistic differences in the wordnets.
it contains multilingual relations from each wordnet to English meanings, which makes it possible to compare the wordnets, tracking down inconsistencies and cross-linguistic differences.
each wordnet is linked to a language-independent top-ontology and to domain labels.

  • Slide 10

Autonomous & Language-Specific
[Figure: two hierarchies shown side by side]
Wordnet1.5: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box
Dutch Wordnet: voorwerp {object}; lepel {spoon}; werktuig {tool}; tas {bag}; bak {box}; blok {block}; lichaam {body}

  • Slide 11

Differences in structure
Artificial Classes versus Lexicalized Classes: instrumentality; natural object
Lexicalization differences of classes: container and artifact (object) are not lexicalized in Dutch
What is the purpose of different hierarchies?
Should we include all lexicalized classes from all (8) languages?

  • Slide 12

Linguistic versus Conceptual Ontologies
Conceptual ontology: A particular level or structuring may be required to achieve better control or performance, or a more compact and coherent structure.
    introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool),
    neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise).
What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

  • Slide 13

Linguistic ontology: Exactly reflects the relations between all the lexicalized words and expressions in a language. It therefore captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.
What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery,

  • Slide 14

Separate Wordnets and Ontologies
[Figure: a language-neutral ontology linked to language-specific wordnets]
Language-Neutral Ontology:
    Reference Ontology Classes: BOX ContainerProduct; SolidTangibleThing
    EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole
Language-Specific Wordnets:
    WordNet1.5: object, box, container
    Dutch Wordnet: voorwerp, doos

  • Slide 15

Wordnets versus ontologies
Wordnets: autonomous language-specific lexicalization patterns in a relational network.
    Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense disambiguation.
Ontologies: data structure with formally defined concepts.
    Usage: making semantic inferences.

  • Slide 16

Wordnets as Linguistic Ontologies
Classical Substitution Principle: Any word that is used to refer to something can be replaced by its synonyms, hyperonyms and hyponyms:
    horse -> stallion, mare, pony, mammal, animal, being.
It cannot be referred to by co-hyponyms and co-hyponyms of its hyperonyms:
    horse -/-> cat, dog, camel, fish, plant, person, object.
Conceptual Distance Measurement: Number of hierarchical nodes between words is a measurement of closeness, where the level and the local density of nodes are additional factors.

  • Slide 17

Linguistic Principles for deriving relations
1. Substitution tests (Cruse 1986):
    1a. It is a fiddle therefore it is a violin.
     b. It is a violin therefore it is a fiddle.
    2a. It is a dog therefore it is an animal.
     b. *It is an animal therefore it is a dog.
    3a. to kill (/a murder) causes to die (/death)
        to kill (/a murder) has to die (/death) as a consequence
     b. *to die / death causes to kill
        *to die / death has to kill as a consequence

  • Slide 18

Linguistic Principles for deriving relations
2. Principle of Economy (Dik 1978): If a word W1 (animal) is the hyperonym of W2 (mammal) and W2 is the hyperonym of W3 (dog) then W3 (dog) should not be linked to W1 (animal) but to W2 (mammal).
3. Principle of Compatibility: If a word W1 is related to W2 via relation R1, W1 and W2 cannot be related via relation Rn, where Rn is defined as a distinct relation from R1.

  • Slide 19

Architecture of the EuroWordNet Data Base
[Figure: four Lexical Items Tables linked through the Inter-Lingual-Index to the Ontology and the Domains]
I = Language Independent link
II = Link from Language Specific to Inter-Lingual Index
III = Language Dependent Link
Lexical Items Table: bewegen, gaan, rijden, berijden
Lexical Items Table: guidare, cavalcare, andare, muoversi
Lexical Items Table: drive, ride, move, go
Lexical Items Table: cabalgar, jinetear, conducir, mover, transitar
Inter-Lingual-Index: ILI-record {drive}
Ontology: 2OrderEntity; Location; Dynamic
Domains: Traffic; Air; Road

  • Slide 20

The mono-lingual design of EuroWordNet

  • Slide 21

Language Internal Relations
WN 1.5 starting point
The synset as a weak notion of synonymy: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value. (Miller et al. 1993)
Relations between synsets:
    Relation      POS-combination           Example
    ANTONYMY      adjective-to-adjective
                  verb-to-verb              open/close
    HYPONYMY      noun-to-noun              car/vehicle
                  verb-to-verb              walk/move
    MERONYMY      noun-to-noun              head/nose
    ENTAILMENT    verb-to-verb              buy/pay
    CAUSE         verb-to-verb              kill/die

  • Slide 22

Differences EuroWordNet/WordNet1.5
Added Features to relations
Cross-Part-Of-Speech relations
New relations to differentiate shallow hierarchies
New interpretations of relations

  • Slide 23

EWN Relationship Labels
Disjunction/Conjunction of multiple relations of the same type
WordNet1.5:
    door 1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left")
        PART OF: doorway, door, entree, entry, portal, room access
    door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car")
        PART OF: car, auto, automobile, machine, motorcar.
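
The door example above can be sketched as a small data structure: each sense is its own synset, and each synset carries its own typed relations to other synsets. This is only an illustrative sketch, not the WordNet or EuroWordNet database format; the class name `Synset`, the identifiers such as `door-1`, the relation name `PART_OF` and the helper `holonyms_of_word` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words plus typed links to other synsets."""
    synset_id: str                      # hypothetical identifier, e.g. "door-1"
    words: list[str]
    gloss: str = ""
    relations: dict[str, list[str]] = field(default_factory=dict)

    def add_relation(self, relation: str, target_id: str) -> None:
        # Several targets may share one relation type.
        self.relations.setdefault(relation, []).append(target_id)


# Two senses of "door" from the slide, each with its own PART OF link.
door_1 = Synset("door-1", ["door"], "barrier closing the entrance to a room or building")
door_6 = Synset("door-6", ["door"], "barrier closing off access into a car")
doorway = Synset("doorway-1", ["doorway", "door", "entree", "entry", "portal", "room access"])
car = Synset("car-1", ["car", "auto", "automobile", "machine", "motorcar"])

door_1.add_relation("PART_OF", "doorway-1")
door_6.add_relation("PART_OF", "car-1")

network = {s.synset_id: s for s in (door_1, door_6, doorway, car)}


def holonyms_of_word(word: str, head_only: bool = True) -> dict[str, list[str]]:
    """Map each sense of `word` to the synsets it is PART_OF.

    With head_only=True a synset counts as a sense of `word` only when
    `word` is its first member (an assumption made to keep the demo small).
    """
    result = {}
    for synset in network.values():
        is_sense = synset.words[0] == word if head_only else word in synset.words
        if is_sense and "PART_OF" in synset.relations:
            result[synset.synset_id] = synset.relations["PART_OF"]
    return result


print(holonyms_of_word("door"))
```

Keeping the relation on the synset rather than on the word is what forces the two separate door senses here.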

  • Slide 24

EWN Relationship Labels
    {airplane}  HAS_MERO_PART: conj1        {door}
                HAS_MERO_PART: conj2 disj1  {jet engine}
                HAS_MERO_PART: conj2 disj2  {propeller}
    {door}      HAS_HOLO_PART: disj1        {car}
                HAS_HOLO_PART: disj2        {room}
                HAS_HOLO_PART: disj3        {entrance}
    {dog}       HAS_HYPERONYM: conj1        {mammal}
                HAS_HYPERONYM: conj2        {pet}
    {albino}    HAS_HYPERONYM: disj1        {plant}
                HAS_HYPERONYM: disj2        {animal}
Default Interpretation: non-exclusive disjunction

  • Slide 25

EWN Relationship Labels
Disjunction/Conjunction of multiple relations of the same type
    {dog}       HAS_HYPONYM: disj1          {poodle}
                HAS_HYPONYM: disj1          {labrador}
                HAS_HYPONYM:                {sheep dog} (Orthogonal)
                HAS_HYPONYM:                {watch dog} (Orthogonal)
Default Interpretation: non-exclusive disjunction

  • Slide 26

EWN Relationship Labels
Factive/Non-factive CAUSES (Lyons 1977)
factive (default interpretation): to kill causes to die:
    {kill}    CAUSES  {die}
non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2: to search may cause to find.
    {search}  CAUSES  {find}  non-factive

  • Slide 27

Reversed
In the database every relation must have a reverse counterpart, but there is a difference between relations which are explicitly coded as reverse and automatically reversed relations:
    {finger}      HAS_HOLONYM      {hand}
    {hand}        HAS_MERONYM      {finger}
    {paper-clip}  HAS_MER_MADE_OF  {metal}
    {metal}       HAS_HOL_MADE_OF  {paper-clip}  reversed
Negation
    {monkey}      HAS_MERO_PART    {tail}
    {ape}         HAS_MERO_PART    {tail}  not

  • Slide 28

Cross-Part-Of-Speech relations
WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:
    adornment 2   change of state -- (the act of changing something)
    adorn 1       change, alter -- (cause to change; make different)
EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
    {adorn V}  XPOS_NEAR_SYNONYM  {adornment N}

  • Slide 29

Cross-Part-Of-Speech relations
The advantages of such explicit cross-part-of-speech relations are:
similar words with different parts of speech are grouped together.
the same information can be coded in an NP or in a sentence. By unifying higher-order nouns and verbs in the same ontology it will be possible to match expressions with very different syntactic structures but comparable content.
by merging verbs and abstract nouns we can more easily link mismatches across languages that involve a part-of-speech shift. Dutch nouns such as afsluiting, gehuil are translated with the English verbs close and cry, respectively.

  • Slide 30

Entailment in WordNet
WordNet1.5: Entailment indicates the direction of the implication or entailment:
    a. + Temporal Inclusion (the two situations partially or totally overlap)
        a.1 co-extensiveness (e.g., to limp/to walk): hyponymy/troponymy
        a.2 proper inclusion (e.g., to snore/to sleep): entailment
    b. - Temporal Exclusion (the two situations are temporally disjoint)
        b.1 backward presupposition (e.g., to succeed/to try): entailment
        b.2 cause (e.g., to give/to have)

  • Slide 31

Subevents in EuroWordNet
EuroWordNet: Direction of the entailment is expressed by the labels factive and reversed:
    {to succeed}  is_caused_by  {to try}      factive
    {to try}      causes        {to succeed}  non-factive
Proper inclusion is described by the has_subevent/is_subevent_of relation in combination with the label reversed:
    {to snore}  is_subevent_of  {to sleep}
    {to sleep}  has_subevent    {to snore}  reversed
    {to buy}    has_subevent    {to pay}
    {to pay}    is_subevent_of  {to buy}    reversed

  • Slide 32

The interpretation of the CAUSE relation
WordNet1.5: The causal relation only holds between verbs and it should only apply to temporally disjoint situations.
EuroWordNet: the causal relation will also be applied across different parts of speech:
    {to kill} V  causes        {death} N
    {death} N    is_caused_by  {to kill} V  reversed
    {to kill} V  causes        {dead} A
    {dead} A     is_caused_by  {to kill} V  reversed
    {murder} N   causes        {death} N
    {death} N    is_caused_by  {murder} N   reversed

  • Slide 33

The interpretation of the CAUSE relation
Various temporal relationships between the (dynamic/non-dynamic) situations may hold:
Temporally disjoint: there is no time point when dS1 takes place and also S2 (which is caused by dS1) (e.g. to shoot/to hit);
Temporally overlapping: there is at least one time point when both dS1 and S2 take place, and there is at least one time point when dS1 takes place and S2 (which is caused by dS1) does not yet take place (e.g. to teach/to learn);
Temporally co-extensive: whenever dS1 takes place also S2 (which is caused by dS1) takes place and there is no time point when dS1 takes place and S2 does not take place, and vice versa (e.g. to feed/to eat).

  • Slide 34

Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the involved participants. These relations are expressed as follows:
    {hammer}     ROLE_INSTRUMENT      {to hammer}
    {to hammer}  INVOLVED_INSTRUMENT  {hammer}  reversed
    {school}     ROLE_LOCATION        {to teach}
    {to teach}   INVOLVED_LOCATION    {school}  reversed
These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.
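
The role relations above, together with the requirement that every relation has a reverse counterpart, can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple layout, the `REVERSE_OF` table and the functions `add_relation` and `outgoing` are hypothetical and are not taken from the EuroWordNet database; only the relation names and the example pairs come from the slides.

```python
# Each relation name is paired with the name of its reverse counterpart.
REVERSE_OF = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
}
# Make the table symmetric so either direction can be coded explicitly.
REVERSE_OF.update({rev: rel for rel, rev in list(REVERSE_OF.items())})


def add_relation(store, source, relation, target):
    """Store an explicitly coded relation and its automatic reverse.

    Entries are (source, relation, target, reversed_flag); the flag is
    True only for the automatically generated counterpart.
    """
    store.append((source, relation, target, False))
    store.append((target, REVERSE_OF[relation], source, True))


def outgoing(store, source):
    """Return (relation, target, reversed_flag) for one source concept."""
    return [(rel, tgt, rev) for src, rel, tgt, rev in store if src == source]


relations = []
add_relation(relations, "hammer", "ROLE_INSTRUMENT", "to hammer")
add_relation(relations, "school", "ROLE_LOCATION", "to teach")

print(outgoing(relations, "to hammer"))
print(outgoing(relations, "school"))
```

Carrying the flag on each entry keeps the distinction the slides draw between relations that were coded explicitly and those that were only generated as reverses.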

  • Slide 35

Co_Role relations
    guitar player  HAS_HYPERONYM          player
                   CO_AGENT_INSTRUMENT    guitar
    player         HAS_HYPERONYM          person
                   ROLE_AGENT             to play music
                   CO_AGENT_INSTRUMENT    musical instrument
    to play music  HAS_HYPERONYM          to make
                   ROLE_INSTRUMENT        musical instrument
    guitar         HAS_HYPERONYM          musical instrument
                   CO_INSTRUMENT_AGENT    guitar player
    ice saw        HAS_HYPERONYM          saw
                   CO_INSTRUMENT_PATIENT  ice
    saw            HAS_HYPERONYM          saw
                   ROLE_INSTRUMENT        to saw
    ice            CO_PATIENT_INSTRUMENT  ice saw  REVERSED

  • Slide 36

Co_Role relations
Examples of the other relations are:
    criminal             CO_AGENT_PATIENT      victim
    novel writer/poet    CO_AGENT_RESULT       novel/poem
    dough                CO_PATIENT_RESULT     pastry/bread
    photographic camera  CO_INSTRUMENT_RESULT  photo

  • Slide 37

BE_IN_STATE and STATE_OF
Example: the poor are the ones to whom the state poor applies
Effect:
    poor N  HAS_HYPERONYM  person N
    poor N  BE_IN_STATE    poor A
    poor A  STATE_OF       poor N  reversed
IN_MANNER and MANNER_OF
Example: to slurp is to eat in a noisy manner
Effect:
    slurp V         HAS_HYPERONYM  eat V
    slurp V         IN_MANNER      noisily Adverb
    noisily Adverb  MANNER_OF      slurp V  reversed

  • Slide 38

Overview of the Language Internal relations in EuroWordnet
Same Part of Speech relations:
    NEAR_SYNONYMY             apparatus - machine
    HYPERONYMY/HYPONYMY       car - vehicle
    ANTONYMY                  open - close
    HOLONYMY/MERONYMY         head - nose
Cross-Part-of-Speech relations:
    XPOS_NEAR_SYNONYMY        dead - death; to adorn - adornment
    XPOS_HYPERONYMY/HYPONYMY  to love - emotion
    XPOS_ANTONYMY             to live - dead
    CAUSE                     die - death
    SUBEVENT                  buy - pay; sleep - snore
    ROLE/INVOLVED             write - pencil; hammer - hammer
    STATE                     the poor - poor
    MANNER                    to slurp - noisily
    BELONG_TO_CLASS           Rome - city

  • Slide 39

Thematic networks
[Figure: network of Dutch concepts connected by labelled links]
Concepts: behandelen (treat); zieke (sick person, patient); genezen (to get well); arts (doctor); scalpel; opereren (operate); persoon (person); wezen (being); organisme (organism); orgaan (organ); maag (stomach); maagaandoening (stomach disease); ziekte (disease)
Link labels: Agent; Patient; Causes; Patient; Involves; Instrument; Part of; Patient

  • Slide 40

The multi-lingual de...