Piek Vossen VU University Amsterdam

  • Published on
    11-Feb-2016

  • From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning

    Piek Vossen

    VU University Amsterdam

    Guest lecture, Language Engineering Applications, February, 26th 2009, Leuven

  • Overview

    • Wordnet, EuroWordNet
    • Global Wordnet Grid
    • Stevin project Cornetto
    • 7th Framework project KYOTO

  • WordNet

    http://wordnet.princeton.edu/

    • Lexical semantic database for English
    • Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
    • Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
    • Semantic relations between concepts (synsets) and not between words
    • Currently covers over 117,000 concepts (synsets) and over 150,000 English words

  • Relational model of meaning

    Figure labels: man, woman, boy, girl; cat, kitten, dog, puppy, animal

  • Wordnet: a network of semantically related words

    {car; auto; automobile; machine; motorcar}

    Figure labels: hyper(o)nym, hyponym, meronyms

    Hyponymy and meronymy relations are:
    • transitive
    • directed
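
The transitivity and direction of hyponymy can be illustrated with a minimal sketch. The link table below is toy data (a single hand-written chain ending in "car"), and the function names `hypernym_chain` and `is_a` are hypothetical; this is not the WordNet database or its API.

```python
from typing import Dict, List

# Toy hypernym links (child -> parent). Illustrative data only, not WordNet itself.
HYPERNYM: Dict[str, str] = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
}


def hypernym_chain(concept: str, links: Dict[str, str]) -> List[str]:
    """Follow the directed links upward and collect every ancestor."""
    chain: List[str] = []
    seen = {concept}
    while concept in links:
        concept = links[concept]
        if concept in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(concept: str, ancestor: str, links: Dict[str, str]) -> bool:
    """Transitive test: true if `ancestor` is reachable from `concept`."""
    return ancestor in hypernym_chain(concept, links)


print(hypernym_chain("car", HYPERNYM))
print(is_a("car", "object", HYPERNYM), is_a("object", "car", HYPERNYM))
```

Because the links are directed, a car counts as an object but not the other way round.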

  • Wordnet Semantic Relations

    WN 1.5 starting point

    The synset as a weak notion of synonymy: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value. (Miller et al. 1993)

    Relations between synsets:

    Relation     Part of speech            Example
    HYPONYMY     noun-to-noun              car / vehicle
                 verb-to-verb              walk / move
    MERONYMY     noun-to-noun              head / nose
    ANTONYMY     adjective-to-adjective    good / bad
                 verb-to-verb              open / close
    ENTAILMENT   verb-to-verb              buy / pay
    CAUSE        verb-to-verb              kill / die

  • Wordnet Data Model

    Vocabulary of a language: bank, fiddle, violin, violist, fiddler, string

    Concepts:
    • rec: 12345 - financial institute
    • rec: 54321 - side of a river
    • rec: 9876 - small string instrument
    • rec: 65438 - musician playing violin
    • rec: 42654 - musician
    • rec: 25876 - string instrument
    • rec: 35576 - string of instrument
    • rec: 29551 - underwear

    Relations: type-of, type-of, part-of

    Figure labels: sense numbers 1, 2, 2, 1, 1, 2; polysemy, polysemy & synonymy, polysemy
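
The data model separates words from concept records, so polysemy and synonymy fall out of the same mapping. The sketch below is a minimal illustration. The record numbers are the ones on the slide, but which word points to which record is an assumption read off the glosses, since the arrows of the figure are not preserved in this transcript. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Word -> concept records. The word/record pairing is an assumption
# (the figure's arrows are lost); record numbers come from the slide.
SENSES: Dict[str, List[int]] = {
    "bank": [12345, 54321],    # financial institute / side of a river
    "fiddle": [9876],          # small string instrument
    "violin": [9876],
    "violist": [65438],        # musician playing violin
    "fiddler": [65438],
    "string": [35576, 29551],  # string of instrument / underwear
}


def polysemous_words(senses: Dict[str, List[int]]) -> List[str]:
    """Polysemy: one word linked to more than one concept record."""
    return sorted(word for word, recs in senses.items() if len(recs) > 1)


def synonym_sets(senses: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Synonymy: one concept record linked to more than one word."""
    by_record = defaultdict(set)
    for word, recs in senses.items():
        for rec in recs:
            by_record[rec].add(word)
    return {rec: sorted(words) for rec, words in by_record.items() if len(words) > 1}


print(polysemous_words(SENSES))
print(synonym_sets(SENSES))
```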

  • Some observations on Wordnet

    • synsets are more compact representations for concepts than word meanings in traditional lexicons
    • synonyms and hypernyms are substitutional variants:
        • begin - commence
        • I once had a canary. The bird got sick. The poor animal died.
    • hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties:
        • object -> artifact -> vehicle -> 4-wheeled vehicle -> car
    • strict separation of part of speech although concepts are closely related (bed - sleep) and are similar (dead - death)
    • lexicalization patterns reveal important mental structures

  • Lexicalization patterns

    25 unique beginners

    Figure labels: garbage, tree, organism, animal, bird, canary, church, building, artifact, object, plant, flower, rose, waste, threat, entity, common canary, abbey, crocodile, dog

    Basic level concepts:
    • balance of two principles:
        • predict most features
        • apply to most subclasses
    • where most concepts are created
    • amalgamate most parts
    • most abstract level to draw a picture

  • Wordnet top level

  • Meronymy & pictures

    Figure label: beak

  • Meronymy & pictures

  • Wordnet 3.0 statistics

    POS        Unique Strings  Synsets  Total Word-Sense Pairs
    Noun              117,798   82,115                 146,312
    Verb               11,529   13,767                  25,047
    Adjective          21,479   18,156                  30,002
    Adverb              4,481    3,621                   5,580
    Totals            155,287  117,659                 206,941

  • Wordnet 3.0 statistics

    POS        Monosemous Words and Senses  Polysemous Words  Polysemous Senses
    Noun                           101,863            15,935             44,449
    Verb                             6,277             5,252             18,770
    Adjective                       16,503             4,976             14,399
    Adverb                           3,748               733              1,832
    Totals                         128,391            26,896             79,450

  • Wordnet 3.0 statistics

    POS          Average Polysemy               Average Polysemy
                 Including Monosemous Words     Excluding Monosemous Words
    Noun         1.24                           2.79
    Verb         2.17                           3.57
    Adjective    1.40                           2.71
    Adverb       1.25                           2.50
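
The averages above appear to follow from the counts on the two preceding statistics slides: word-sense pairs divided by unique strings when monosemous words are included, and polysemous senses divided by polysemous words when they are excluded. The sketch below checks that reading for nouns, verbs and adverbs. The adjective row is left out because its "excluding" figure (2.71) is not reproduced by this ratio, which gives about 2.89 from the counts shown. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, Tuple

# (unique strings, word-sense pairs, polysemous words, polysemous senses),
# copied from the Wordnet 3.0 statistics slides.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "Noun": (117_798, 146_312, 15_935, 44_449),
    "Verb": (11_529, 25_047, 5_252, 18_770),
    "Adverb": (4_481, 5_580, 733, 1_832),
}


def average_polysemy(senses: int, words: int) -> float:
    """Average number of senses per word, rounded to two decimals."""
    return round(senses / words, 2)


def polysemy_row(pos: str) -> Tuple[float, float]:
    """Return (including monosemous, excluding monosemous) for one part of speech."""
    strings, pairs, poly_words, poly_senses = COUNTS[pos]
    return average_polysemy(pairs, strings), average_polysemy(poly_senses, poly_words)


for pos in COUNTS:
    print(pos, polysemy_row(pos))
```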

  • http://www.visuwords.com

  • Usage of Wordnet

    • Most used database in language technology
    • Enormous impact on language technology development
    • Large
    • Free and downloadable
    • English

  • Usage of Wordnet

    • Improve recall of text-based analysis: Query -> Index
        • Synonyms: commence - begin
        • Hypernyms: taxi -> car
        • Hyponyms: car -> taxi
        • Meronyms: trunk -> elephant
        • Lexical entailments: gun -> shoot
    • Inferencing: what things can burn?
    • Expression in language generation and translation: alternative words and paraphrases
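
Improving recall by mapping a query onto more index terms can be sketched as simple query expansion. The tiny lexicon below only contains the slide's own examples, and `expand_query` is a hypothetical helper, not part of any wordnet toolkit.

```python
from typing import Dict, Iterable, List, Sequence

# Toy lexicon holding only the examples from the slide (illustrative data).
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "begin": {"synonyms": ["commence"]},
    "taxi": {"hypernyms": ["car"]},
    "car": {"hyponyms": ["taxi"]},
}


def expand_query(
    terms: Iterable[str],
    lexicon: Dict[str, Dict[str, List[str]]],
    relations: Sequence[str] = ("synonyms",),
) -> List[str]:
    """Add related words for each query term, keeping order and dropping duplicates."""
    expanded: List[str] = []
    for term in terms:
        candidates = [term]
        for relation in relations:
            candidates.extend(lexicon.get(term, {}).get(relation, []))
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["begin", "taxi"], LEXICON, ("synonyms", "hypernyms")))
```

Which relations to expand along is a design choice: synonyms are the safest, while hypernyms and hyponyms buy more recall at the risk of lower precision.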

  • Improve recall

    • Information retrieval: effective on small databases without redundancy, e.g. image captions, video text
    • Text classification:
        • expand small training sets
        • reduce training effort
    • Question & Answer systems:
        • question classification: who, where, what, when
        • match answers to question types

  • Improve recall

    • Anaphora resolution:
        • The girl fell off the table. She...
        • The glass fell off the table. It...
    • Coreference resolution:
        • When he moved the furniture, the antique table got damaged.
    • Information extraction (unstructured text to structured databases):
        • generic forms or patterns "vehicle" -> text with specific cases "car"

  • Improve recall

    • Summarizers:
        • Sentence selection based on word counts -> concept counts
        • Avoid repetition in summary -> language generation, pick out another synonym or hypernym
    • Limited inferencing: detect locations, people, organisations, etc.
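
The move from word counts to concept counts can be sketched by mapping each word to a synset identifier before counting. The mapping and the identifier `synset:car` are hypothetical, and real text would also need word-sense disambiguation, which is skipped here.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical word -> synset mapping for the synset {car; auto; automobile; motorcar}.
WORD_TO_CONCEPT: Dict[str, str] = {
    "car": "synset:car",
    "auto": "synset:car",
    "automobile": "synset:car",
    "motorcar": "synset:car",
}


def concept_counts(tokens: Iterable[str], word_to_concept: Dict[str, str]) -> Counter:
    """Count concepts instead of surface words; unknown words count as themselves."""
    return Counter(word_to_concept.get(token, token) for token in tokens)


tokens = ["car", "auto", "road", "automobile"]
print(Counter(tokens))                          # every word occurs once
print(concept_counts(tokens, WORD_TO_CONCEPT))  # the car concept occurs three times
```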

  • Enabling technologies

    • Semantic similarity: what sentences or expressions are semantically similar?
    • Semantic relatedness and textual entailment: smoke entails fire, fire entails damage
    • Word-Sense Disambiguation

    Erwin Marsi, University of Tilburg, http://daeso.uvt.nl/demos/index.html

  • Recall & Precision

    Figure labels: cellphone, mobile phones, nerve cell, police cell, jail, neuron

    recall = intersection / relevant
    precision = intersection / found
    (intersection = the documents that are both relevant and found)

    Recall < 20% for basic search engines! (Blair & Maron 1985)
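
The two ratios can be written down directly with sets. The document identifiers in the usage lines are made up for illustration, and `recall_precision` is a hypothetical helper name.

```python
from typing import Set, Tuple


def recall_precision(relevant: Set[str], found: Set[str]) -> Tuple[float, float]:
    """recall = |relevant & found| / |relevant|, precision = |relevant & found| / |found|."""
    intersection = relevant & found
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Made-up example: five relevant documents, two retrieved, one of them relevant.
relevant_docs = {"d1", "d2", "d3", "d4", "d5"}
found_docs = {"d1", "d6"}
print(recall_precision(relevant_docs, found_docs))
```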

  • Many others

    • Data sparseness for machine learning: hapaxes can be replaced by semantic classes that match classes from the training set
    • Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
    • Sentiment and opinion mining
    • Natural language learning

  • EuroWordNet

    • The development of a multilingual database with wordnets for several European languages
    • Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328
    • March 1996 - September 1999
    • 2.5 Million EURO
    • http://www.hum.uva.nl/~ewn
    • http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

  • EuroWordNet

    • Languages covered:
        • EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
        • EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
    • Size of vocabulary:
        • EuroWordNet-1: 30,000 concepts - 50,000 word meanings
        • EuroWordNet-2: 15,000 concepts - 25,000 word meanings
    • Type of vocabulary:
        • the most frequent words of the languages
        • all concepts needed to relate more specific concepts

  • EuroWordNet Model

    I = Language Independent link
    II = Link from Language Specific to Inter-Lingual-Index
    III = Language Dependent Link

    Figure label: Inter-Lingual-Index

  • Differences in relations between EuroWordNet and WordNet

    Added Features to relations

    Cross-Part-Of-Speech relations

    New relations to differentiate shallow hierarchies

    New interpretations of relations

  • EWN Relationship Labels

    {airplane}
        HAS_MERO_PART: conj1          {door}
        HAS_MERO_PART: conj2 disj1    {jet engine}
        HAS_MERO_PART: conj2 disj2    {propeller}

    {door}
        HAS_HOLO_PART: disj1    {car}
        HAS_HOLO_PART: disj2    {room}
        HAS_HOLO_PART: disj3    {entrance}

    Default Interpretation: non-exclusive disjunction
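
One way to read the conj/disj labels is that every conjunction group must be satisfied, while the disjuncts inside a group are alternatives. The sketch below encodes the {airplane} example under that reading. The tuple layout and the function `parts_satisfied` are assumptions made for illustration, not the EuroWordNet database format.

```python
from typing import Dict, List, Optional, Set, Tuple

# (part, conjunction index, disjunction index or None), following the {airplane} example.
AIRPLANE_PARTS: List[Tuple[str, int, Optional[int]]] = [
    ("door", 1, None),
    ("jet engine", 2, 1),
    ("propeller", 2, 2),
]


def parts_satisfied(present: Set[str], labelled: List[Tuple[str, int, Optional[int]]]) -> bool:
    """True if each conjunction group has at least one of its alternatives present."""
    groups: Dict[int, List[str]] = {}
    for part, conj, _disj in labelled:
        groups.setdefault(conj, []).append(part)
    return all(any(part in present for part in alternatives) for alternatives in groups.values())


print(parts_satisfied({"door", "propeller"}, AIRPLANE_PARTS))                # True
print(parts_satisfied({"door"}, AIRPLANE_PARTS))                             # False
print(parts_satisfied({"door", "jet engine", "propeller"}, AIRPLANE_PARTS))  # True
```

The last call succeeds because the disjunction is non-exclusive: having both a jet engine and a propeller is allowed.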

  • Overview of the Language Internal relations in EuroWordnet

    Same Part of Speech relations:

    HYPERONYMY/HYPONYMY         car - vehicle
    ANTONYMY                    open - close
    HOLONYMY/MERONYMY           head - nose
    NEAR_SYNONYMY               apparatus - machine

    Cross-Part-of-Speech relations:

    XPOS_NEAR_SYNONYMY          dead - death; to adorn - adornment
    XPOS_HYPERONYMY/HYPONYMY    to love - emotion
    XPOS_ANTONYMY               to live - dead
    CAUSE                       die - death
    SUBEVENT                    buy - pay; sleep - snore
    ROLE/INVOLVED               write - pencil; hammer - hammer
    STATE                       the poor - poor
    MANNER                      to slurp - noisily
    BELONG_TO_CLASS             Rome - city

  • Co_Role relations

    criminal               CO_AGENT_PATIENT       victim
    novel writer / poet    CO_AGENT_RESULT        novel / poem
    dough                  CO_PATIENT_RESULT      pastry / bread
    photographic camera    CO_INSTRUMENT_RESULT   photo

    guitar player
        HAS_HYPERONYM          player
        CO_AGENT_INSTRUMENT    guitar
    player
        HAS_HYPERONYM          person
        ROLE_AGENT             to play music
        CO_AGENT_INSTRUMENT    musical instrument
    to play music
        HAS_HYPERONYM          to make
        ROLE_INSTRUMENT        musical instrument
    guitar
        HAS_HYPERONYM          musical instrument
        CO_INSTRUMENT_AGENT    guitar player

  • Horizontal & vertical semantic relations

    Figure labels:
    • concepts: patient; chronic patient; mental patient; cure; treat; doctor; child doctor; child; disease; disorder; stomach disease, kidney disorder; physiotherapy, medicine, etc.; hospital, etc.
    • relations: HYPONYM, -PROCEDURE, -LOCATION, STATE, -CAUSE, -PATIENT, -AGENT, co--AGENT-PATIENT

  • The Multilingual Design

    • Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages;
    • Index-records are mainly based on WordNet synsets and consist of synonyms, glosses and source references;
    • Various types of complex equivalence relations are distinguished;
    • Equivalence relations from synsets to index records: not on a word-to-word basis;
    • Indirect matching of synsets linked to the same index items;
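
Indirect matching through the Inter-Lingual-Index can be sketched as a lookup of synsets in two languages that share at least one index record. The record identifiers (`ili:...`), the language codes and the link table are hypothetical; only the Dutch words and their English counterparts are taken from the lecture's examples.

```python
from typing import Dict, List

# language -> synset -> linked index records (hypothetical identifiers).
EQ_LINKS: Dict[str, Dict[str, List[str]]] = {
    "nl": {
        "versiering": ["ili:decoration"],
        "toestel": ["ili:machine", "ili:device", "ili:apparatus", "ili:tool"],
    },
    "en": {
        "decoration": ["ili:decoration"],
        "machine": ["ili:machine"],
        "device": ["ili:device"],
    },
}


def match_via_ili(synset: str, source: str, target: str,
                  links: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return target-language synsets that share an index record with `synset`."""
    records = set(links[source].get(synset, []))
    return sorted(
        candidate
        for candidate, candidate_records in links[target].items()
        if records & set(candidate_records)
    )


print(match_via_ili("versiering", "nl", "en", EQ_LINKS))
print(match_via_ili("toestel", "nl", "en", EQ_LINKS))
```

No language pair needs its own mapping table: each wordnet links only to the index, and cross-language matches are derived from the shared records.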

  • Equivalent Near Synonym

    1. Multiple Targets (1:many)
       Dutch wordnet: schoonmaken (to clean) matches with 4 senses of clean in WordNet1.5:
       • make clean by removing dirt, filth, or unwanted substances from
       • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
       • remove in making clean; "Clean the spots off the rug"
       • remove unwanted substances from - (as in chemistry)

    2. Multiple Sources (many:1)
       Dutch wordnet: versiersel near_synonym versiering
       ILI-Record: decoration

    3. Multiple Targets and Sources (many:many)
       Dutch wordnet: toestel near_synonym apparaat
       ILI-records: machine; device; apparatus; tool

  • Equivalent Hyperonymy

    Typically used for gaps in English WordNet:

    • genuine, cultural gaps for things not known in English culture:
      Dutch: klunen, to walk on skates over land from one frozen water to the other
    • pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English:
      Dutch: kunststof = artifact substance artifact object

  • EuroWordNet statistics

    Language   Synsets  No. of senses  Sens./syns.  Entries  Sens./entry  LIRels.  LIRels/syns  EQRels-ILI  EQRels/syn  Synsets without ILI
    Dutch        44015          70201         1.59    56283         1.25   111639         2.54       53448        1.21                 7203
    Spanish      23370          50526         2.16    27933         1.81    55163         2.36       21236        0.91                    0
    Italian      40428          48499         1.20    32978         1.47   117068         2.90       71789        1.78                 1561
    French       22745          32809         1.44    18777         1.75    49494         2.18       22730        1.00                   20
    German       15132          20453         1.35    17098         1.20    34818         2.30       16347        1.08                    0
    Czech        12824          19949         1.56    12283         1.62    26259         2.05       12824        1.00                    0
    Estonian      7678          13839         1.80    10961         1.26    16318         2.13        9004        1.17                    0
    English      16361          40588         2.48    17320         2.34    42140         2.58        n.a.        n.a.                 n.a.
    WN15         94515         187602         1.98   126617         1.48   211375         2.24        n.a.        n.a.                 n.a.

  • Wordnets as semantic structures

    Wordnets are unique language-specific structures:
    • same organizational principles: synset structure and same set of semantic relations
    • different lexicalizations
    • differences in synonymy and homonymy:
        • "decoration" in English versus "versiersel/versiering" in Dutch
        • "bank" in English (money/river) versus "bank" in Dutch (money/furniture)
    • BUT also different relations for similar synsets

  • Autonomous & Language-Specific

    Figure labels (Wordnet1.5 and Dutch Wordnet): voorwerp {object}, lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block}, lichaam {body}

  • Linguistic versus Artificial Ontologies

    Artificial ontology:
    • better control or performance, or a more compact and coherent structure
    • introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool)
    • neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise)

    What properties can we infer for spoons?
    spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

  • Linguistic versus Artificial Ontologies

    Linguistic ontology:
    • Exactly reflects the relations between all the lexicalized words and expressions in a language
    • Captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language

    What words can be used to name spoons?
    spoon -> object, tableware, silverware, merchandise, cutlery,

  • Wordnets versu...
