91
AI and NLP AI and NLP German Rigau i Claramunt [email protected] IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP German Rigau i Claramunt [email protected] IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

Embed Size (px)

Citation preview

Page 1: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

AI and NLP

German Rigau i [email protected]

IXA groupDepartamento de Lenguajes y Sistemas Informáticos

UPV/EHU

Page 2: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 3: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 4: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 5: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

AI and NLP

Current AI challanges

Natural Language Understanding Image/video Understanding Process/agents/services Understanding Database Understanding Web Understanding ...

Page 6: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

AI and NLP

Page 7: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 8: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 9: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 10: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

AI and NLP

A Machine Translation tale

The spirit is willing, but the flesh is weak

The vodka is strong, but the meat is rotten

Page 11: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 12: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 13: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 14: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

CIRCA Technology• Ontology with millions of

words, meanings and and

relationships

• Created by a team of

experienced

lexicographers and

computational linguists

• Automatic expansion

coupled with manual

quality supervision

• Includes slang, pop, and

nonstandard usages

• Language independent

• Ranks among the most

comprehensive lexicons in

the world

Page 15: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 16: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 17: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 18: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

EFE scenario

Page 19: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

News Article 10 TOPIC = TERRORISMOCONTEXT = Sigue la violencia en Colombia y especialmente en Medellín. GOAL = Un entierro en Medellín.

QUERY = entierro medellín TEXT = sepelio medellín RESULT = FH_1205173 20040524RESULT = FH_1205172 20040524

<entierro #35, sepelio #14, enterramiento #7> = <funeral>

MEANING: WP8 User validation

Page 20: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: WP8 User validation: CLIR

Page 21: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: WP8 User validation: CLIR

Page 22: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 23: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 23

AI and NLP Setting

From Cyc

Fred saw the plane flying over Zurich.

Fred saw the mountains flying over Zurich.

Page 24: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 24

From NLP to NLU Large-scale Semantic Processing dealing

with concepts (senses) rather than words Two complementary problems:

Acquisition bottleneck Autonomous large-scale knowledge acquisition

systems

Ambiguity Highly accurate and robust semantic systems

AI and NLPSetting

Page 25: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Which Knowledge is needed by a concrete NLP system?

Where is this Knowledge located?

Which automatic procedures can be applied?

AI and NLPIntroduction

Page 26: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Which Knowledge is needed by a concrete NLP system?

Phonology: phonemes, stress, etc. Morphology: POS, etc. Syntactic: category, subcat., etc. Semantic: class, SRs, etc. Pragmatic: usage, registers, TDs, etc. Translations: translation links

AI and NLPIntroduction

Page 27: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Where is this Knowledge located?

Human brain Structured Lexical Resources:

Monolingual and bilingual MRDs Thesauri

Unstructured Lexical Resources: Monolingual and bilingual Corpora

Mixing resources

AI and NLPIntroduction

Page 28: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Which automatic procedures can be applied?

Prescriptive approach Machine-aided manual construction

Descriptive approach Automatic acquisition from pre-existing

Lexical Resources

Mixed approach

AI and NLPIntroduction

Page 29: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

jardín_1_1 Terreno donde se cultivan plantas y flores ornamentales.

florero_1_4 Maceta con flores.ramo_1_3 Conjunto natural o artificial de flores, ramas o

hierbas.pétalo_1_1 Hoja que forma la corola de la flor. tálamo_1_3 Receptáculo de la flor. miel_1_1 Substancia viscosa y muy dulce que elaboran las abejas, en

una distensión del esófago, con el jugo de las flores y luego depositan en las celdillas de sus panales.

florería_1_1 Floristería; tienda o puesto donde se venden flores. florista_1_1 Persona que tiene por oficio hacer o vender flores.camelia_1_1 Arbusto cameliáceo de jardín, originario de Oriente,

de hojas perennes y lustrosas, y flores grandes, blancas, rojas o rosadas (Camellia japonica).

camelia_1_2 Flor de este arbusto. rosa_1_1 Flor del rosal.

AI and NLPMRDs and Semantic Knowledge

Page 30: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 30

Dealing with ambiguity

Morphosyntactic:

Yo bajo con el hombre bajo a tocar el bajo bajo la escalera. Syntactic:

Zapatos de piel de señora

Semantic:Baja al banco y espérame allí

Se dieron pasteles a los niños

...

Basic NLP levels Introduction

Page 31: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 31

Morphological Analysis

Setting Systems

Morpholexical relationships (Octavio Santana) Freeling (Lluís Padró) ...

Page 32: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 32

Morphological Analysis

Morphology deals with the orthographic form of the words

Morphological processes Inflection: prefixes + root + suffixes (root, lemma,

form) Derivation: change of category

Multi-word expressions: compounds, idioms, phrasal verbs, ...

Grammatical categories, parts-of-speech Open categories and closed (functional) categories Lexicon POS tags

Page 33: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 33

Morphological Analysis

Main Parts-of-Speech Open class words

Noun: common noun, proper noun (gender, number, ...) Adjective: atributive, comparative ... Verb: (number, person, mode, tense), auxiliary verbs Adverb: place, time, manner, degree, ...

Closed class words Pronoun: nominative, accusative, ... (anaphora) Determiner: articles, demonstratives, quantifiers ... Preposition: Conjunction:

Page 34: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 34

Page 35: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 36: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 36

Named Entity Recognition and Classification

Setting CONLL'02, CONLL'03 shared tasks Systems

Cognitive Computation Group (Dan Roth) Freeling ...

Page 37: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 37

Named Entity Recognition and Classification (NERC)Setting

Named Entity Recognition (NER) is a subtask of Information Extraction. Different NER systems were evaluated as a part of the Sixth Message Understanding Conference in 1995 (MUC6)

Named entities are phrases that contain the names of persons, organizations, locations, times and quantities.

[ORG U.N. ] official [PER Ekeus ] heads for [LOC Baghdad ] .

Page 38: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 38

Parsing (Syntactic Analysis)

Setting PARSEVAL Systems

RASP (John Carroll & Ted Briscoe) Minipar (Dekang Lin) VISL (Eckhard Bick) Freeling ...

Page 39: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 39

Parsing (Syntactic Analysis)

Syntax and grammar Phrase structure

Word order Syntagma, phrase, constituent

NP, VP, AP, head, relative clause

Grammars Syntax vs. lexicon Coverage: complete, partial ... Chunking, clausing, ... Context-free grammars

Terminals, no terminals, parse trees, recursivity Non-local dependencies

The woman who found the wallet were given a reward

Page 40: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 40

Word Sense Disambiguation

Setting WSD Tutorial (Ted Pedersen & Rada Mihalcea 05) SENSEVAL 1, 2 and 3 ... Systems

Knowledge-based WSD Conceptual Distance (Ted Pedersen) Structural Semantic Interconections (Roberto

Navigli) Corpus-based WSD

GAMBL (Walter Daelemans) ...

Page 41: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 41

Word Sense DisambiguationSetting

WSD is the problem of assigning the appropriate meaning (sense) to a given word in a text

“WSD is perhaps the great open problem at the lexical level of NLP” (Resnik & Yarowsky 97)

WSD resolution would allow: acquisition of knowledge: SCF, Selectional

Preferences, Predicate Models, etc. improve existing Parsing, IR, IE Machine Translation Natural Language Understanding ...

Page 42: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 42

Word Sense DisambiguationSetting

WSD is the problem of assigning the appropriate meaning (sense) to a given word in a text

“WSD is perhaps the great open problem at the lexical level of NLP” (Resnik & Yarowsky 97)

WSD resolution would allow: acquisition of knowledge: SCF, Selectional

Preferences, Predicate Models, etc. improve existing Parsing, IR, IE Machine Translation Natural Language Understanding ...

Page 43: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 43

From Financial Times

GM’s drive to make Saturn a star again

Word Sense DisambiguationSetting

Page 44: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 44

From Financial Times

GM’s drive to make Saturn a star again

car manufacturer, car maker, carmaker_1, auto manufacturer, auto maker, automaker -- a business engaged in the manufacture of automobiles

campaign, cause, crusade, drive_3, movement, effort -- a series of actions advancing a principle or tending toward a particular end

car_1, auto, automobile, machine, motorcar -- 4-wheeled motor vehicle; usually propelled by an internal combustion engine; "he needs a car to get to work"

star_5, principal, lead -- an actor who plays a principal role

star_1 -- ((astronomy) a celestial body of hot gases that radiates energy derived from thermonuclear reactions in the interior

figno person

Word Sense DisambiguationSetting

Page 45: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 45

Word Sense DisambiguationSetting

Knowledge-Driven WSD knowledge-based WSD No Training Process (~ unsupervised) Large scale lexical knowledge resources

WordNet, MRDs, Thesaurus, ... 100% coverage 55% accuracy (SensEval)

Page 46: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 46

Word Sense DisambiguationSetting

Corpus-Driven WSD statistical-based WSD Machine-Learning WSD Training Process (~ supervised)

learning from sense annotated corpora (Ng 97) effort of 16 man/year per year per

language no full coverage 70% accuracy (SensEval)

Page 47: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 47

Semantic Role Labeling

Setting SRL Tutorial (Lluís Màrquez 05) CONLL'05 shared task Systems

Knowledge-based WSD Corpus-based WSD

Memory-based Shallow Parser (Walter Daelemans)

Cognitive Computation Group (Dan Roth) Center for Spoken Language Research

(Colorado) ...

Page 48: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 48

Semantic Role LabelingSetting

SRL is the problem of recognizing and labeling semantic roles of a predicate

A semantic role in language is the relationship that a syntactic constituent has with a predicate.

Typical semantic arguments include: Agent, Patient, Instrument, etc.

and also adjunctive arguments: Locative, Temporal, Manner, Cause, etc.

Useful for answering "Who", "When", "What", "Where", "Why", etc. IE, QA, Summarization and Semantic

Interpretation

Page 49: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 49

Semantic Role LabelingSetting

From PropBank

[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ][A1 anything of value ] from [A2 those he was writing about ] .

Roleset V: verb

A0: acceptor

A1: thing accepted

A2: accepted-from

A3: attribute

AM-MOD: modal

AM-NEG: negation

Page 50: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

German was hungry.He opened the refrigerator.

hungry (feeling a need or desire to eat)

eat (take in solid food)

refrigerator (an appliance in which foods can be stored at low temperature)

WordNetText Inferences

Page 51: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

WordNet & EuroWordNetWordNet

Princeton University (Miller et al. 1990, Fellbaum 98)

Lexicalised concepts (words, compounds, multiwords)

Synset: synonym set (of words) Large semantic net conecting synsets

synonymy, antonymy, hyperonymy, hyponymy meronymy, implication, causation ...

Structure Noun hierarchy depth ~12 Verb hierarchy depth ~3 Adjective/adverb not in hierarchy, but in star structure

Freely available: http://wordnet.princeton.edu/ Extensively used in CompLing, but not very useful

yet Except: definitions converted to axioms and used for theorem

proving in automated QA (Moldovan et al.), CLIR (Meaning)

Page 52: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Synonymy

Bettween senses (or variants) in synsets Weak notion of synonimy: from context Synset: set of words expressing the same

concept in a particular context (SemCor)

Hyperonymy / Hyponymy Class/subclass relation

{lion} -> {feline}

WordNet & EuroWordNetWordNet Semantic Relations

Page 53: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Meronymy relations

Part /component {hand}{arm}

Element of a colectivity {person}{people}

Sustance {newspaper}{paper}

WordNet & EuroWordNet WordNet Semantic Relations

Page 54: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Antonymy {big}{small}

Causation {kill}{die}

Implication {divorce}{marry}

Derivation {presidential}{president}

Simililarity {good}{positive}

WordNet & EuroWordNet WordNet Semantic Relations

Page 55: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

WordNet & EuroWordNetWordNet example

<cruiser, squad car, patrol car, ...>

<cruiser, squad car, patrol car, ...>

<cab, taxi, hack, ...>

<motor vehicle, automovile,...>

<vehicle>

<conveyance>

<car door>

<doorlock>

Page 56: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 56

Version nouns verbs adj adv synsetsrelations

WN1.5 51,253 8,847 13,4603,145 76,705103,445 WN1.6 66,02512,127 17,9153,575 99,642138,741WN1.7 74,48812,75418,5233,612 109,377 151,546XWN 74,48812,75418,5233,612 109,377 551,551WN1.7.1 75,80413,21418,5763,629 111,223 153,781WN2.0 79,68913,50818,5633,664 115,424 204,074

WordNet & EuroWordNetWordNet volumes

Page 57: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Project LE-2 4003 Telematics Application Programme de la UE

Integrated local wordnets in several languages English Sheffield Dutch Amsterdam Italian Pisa Spanish UB, UPC, UNED.

Computers and the Humanities (Vossen 98)

http://www.hum.uva.nl/~ewn/

WordNet & EuroWordNetEuroWordNet

Page 58: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Page 59: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

(forall (?X) (or (not (instance ?X FloweringPlant)) (not (instance ?X BodyPart)))) YES.

(exists (?X) (and (instance ?X FloweringPlant) (instance ?X BodyPart))) NO.

(forall (?Y) (forall (?X) (=> (equal ?X ?Y) (equal ?X ?Y)))) YES.

(forall (?X) (=> (instance ?X Flower) (instance ?X Organ))) YES.

(instance Ear FloweringPlant) NO.(instance Ear Organ) NO.(subclass Ear Organ) YES.

(forall (?X) (=> (not (subclass ?X Organ)) (subclass ?X Flower) )) NO.

(forall (?X) (=> (subclass ?X Flower) (subclass ?X Organ) )) YES.

SUMOReasoning with Sigma (Vampire)

Page 60: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

A.1) First labelling:Conceptual Distance (Agirre et al. 94)

length of the shortest path specificity of the concepts

)c,path(cc kwcwc

21

i2i1k2i2

1i1 )depth(c

1min)w,dist(w

using WordNet Bilingual dictionary

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 61: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

<object, ...>

<artifact, artefact>

<entity><entity>

<structure, construction>

<building, edifice>

<place of worship, ...>

<church, church building>

<abbey>

<house, lodging>

abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)

<religious residence, cloiser>

<monastery>

<abbey>

<convent>

<abbey>

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 62: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

<object, ...>

<artifact, artefact>

<entity>

<structure, construction>

<building, edifice>

<place of worship, ...>

<church, church building>

<abbey> 06 ARTIFACT

<house, lodging>

abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)

<religious residence, cloiser>

<monastery>

<abbey>

<convent>

<abbey>

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 63: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

A.1) First labelling (Results)

29,205 labelled definitions (31% coverage)

61% accuracy at a sense level 64% accuracy at a file level

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 64: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

A.2) Second labelling:Salient Words (Yarowsky 92)

P r ( w )

S C )|P r ( wl o gS C )|P r ( wS C )A R ( w , 2

Importance local frequency appears more significantly more often in the

corpus of a semantic category than at other points in the whole corpus

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 65: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

A.2) Second labelling (Results):

86,759 labelled definitions (93%) 80% accuracy at a file level

biberón_1_1 ARTIFACT 4.8399 Frasco de cristal ... (glass flask ...)

biberón_1_2 FOOD 7.4443 Leche que contiene este frasco ... (milk contained in that flask ...)

Merge approach: Taxonomy Construction Step 1: Selection of the main top beginners

Page 66: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

C1C1

C2C2

C3C3

C5C5

C6C6

C4C4

Merge approach: Mapping Taxonomies Step 3: Creation of translation links

Page 67: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

C1C1

C2C2

C3C3

C5C5

C6C6

C4C4

Merge approach: Mapping Taxonomies Step 3: Creation of translation links

Page 68: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

SUMOIntroduction

The Suggested Upper Merged Ontology (SUMO) IEEE Standard Upper Ontology Working Group An upper ontology is limited to concepts that are

meta, generic, abstract, general enough to address a broad range of domain areas.

To promote: Interoperability Information Search and retrieval Automated inference NLP Developement of Domain ontologies

Page 69: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR0

vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUMGLOSS: a glass container for holding liquids while drinking

TO: 1stOrderEntity-Form-ObjectTO: 1stOrderEntity-Origin-ArtifactTO: 1stOrderEntity-Function-ContainerTO: 1stOrderEntity-Function-Instrument

EN: drinking_glass glassIT: bicchiereBA: edontzi baso edalontziCA: got vas

DOBJ SemCor00849393v 0.0074 polish shine smooth ...00201878v 0.0013 beautify embellish prettify00826635v 0.0010 get_hold_of take00140937v 0.0001 ameliorate amend ...00083947v 0.0000 alter change

Page 70: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR0

vaso_2 04195626n 08-NOUN.BODY ANATOMYGLOSS: a tube in which a body fluid circulates

TO: 1stOrderEntity-Form-Substance-SolidTO: 1stOrderEntity-Origin-Natural-LivingTO: 1stOrderEntity-Composition-PartTO: 1stOrderEntity-Function-Container

EN: vessel vasIT: vaso canaleBA: hodi basoCA: vas

DOBJ SemCor SUBJ SemCor01781222v 0.0334 be occur 01831830v 0.0133 stop

terminate00058757v 0.0072 inject shoot 01357963v 0.0127 flow

travel_along01357963v 0.0068 flow travel_along 01830886v 0.0043 discontinue00055849v 0.0045 administer dispense ... 01779664v 0.0008 cease end

finish ...

Page 71: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR0

vaso_3 09914390n 23-NOUN.QUANTITY NUMBER GLOSS: the quantity a glass will hold

TO: 1stOrderEntity-Composition-PartTO: 2ndOrderEntity-SituationType-StaticTO: 2ndOrderEntity-SituationComponent-Quantity

EN: glassful glassIT: bicchierata bicchiereBA: basokadaCA: got vas

DOBJ SemCor00795711v 0.0026 drink imbibe01530096v 0.0009 accept have take00786286v 0.0009 consume have ingest take take_in01513874v 0.0001 acquire get

Page 72: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR

Page 73: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR

Page 74: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR

Page 75: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR1vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUMSUMO:     &%Artifact+ LOGICAL FORMULA:

glass:NN(x1) -> glass:NN(x1) container:NN(x2) for:IN(x1, e1) hold:VB(e1, x1, x3) liquid:NN(x3) while:IN(e0, e2) drink:VB(e2, x1)

PARSING: (TOP (S (NP (NN glass) )        (VP (VBZ is)            (NP (NP (DT a) (NN glass) (NN container) )                (PP (IN for)                    (S (VP (VBG holding)                           (PP (NP (NNS liquids) )                               (IN while) )                           (VBG drinking) ) ) ) ) )        (. .) ) )

WSD: <wf pos="DT" >a</wf> <wf pos="NN" lemma="glass" quality="silver" wnsn="2" >glass</wf> <wf pos="NN" lemma="container" quality="silver" wnsn="1" >container</wf> <wf pos="IN" >for</wf> <wf pos="VBG" lemma="hold" quality="normal" wnsn="8" >holding</wf> <wf pos="NNS" lemma="liquid" quality="normal" wnsn="1" >liquids</wf> <wf pos="IN" >while</wf> <wf pos="VBG" lemma="drink" quality="normal" wnsn="1" >drinking</wf>

Page 76: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR1

vaso_2 04195626n 08-NOUN.BODY ANATOMY SUMO:    &%BodyVessel+ LOGICAL FORMULA:

vessel:NN(x1) -> tube:NN(x1) in:IN(x2, x3) body_fluid:NN(x2) circulate:VB(e1, x2)

PARSING: (TOP (S (NP (NN vessel) )        (VP (VBZ is)            (NP (NP (DT a) (NN tube) )                (SBAR (WHPP (IN in)                            (WHNP (WDT which) ) )                      (S (NP (DT a) (NN body) (NN fluid) )                         (VP (VBZ circulates) ) ) ) ) )        (. .) ) )

WSD: <wf pos="DT" >a</wf> <wf pos="NN" lemma="tube" quality="gold" wnsn="4" wnsn="4" >tube</wf> <wf pos="IN" >in</wf> <wf pos="WDT" >which</wf> <wf pos="DT" >a</wf> <wf pos="NN" lemma="body_fluid" quality="silver" wnsn="1" >body_fluid</wf> <wf pos="VBZ" lemma="circulate" quality="gold" wnsn="4" wnsn="4“ >circulates</wf>

Page 77: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR1

vaso_3 09914390n 23-NOUN.QUANTITY NUMBER SUMO:    &%ConstantQuantity+ LOGICAL FORMULA:

glass:NN(x1) -> quantity:NN(x1) glass:NN(x2) hold:VB(e1, x2) PARSING:

(TOP (S (NP (NN glass) )        (VP (VP (VBZ is)                (NP (DT the) (NN quantity) )                (NP (DT a) (NN glass) ) )            (VP (MD will)                (VP (VB hold) ) ) )        (. .) ) )

WSD: <wf pos="DT" >the</wf> <wf pos="NN" lemma="quantity" quality="silver" wnsn="1" >quantity</wf> <wf pos="DT" >a</wf> <wf pos="NN" lemma="glass" quality="normal" wnsn="2" >glass</wf> <wf pos="MD" >will</wf> <wf pos="VB" lemma="hold" quality="normal" wnsn="1" >hold</wf>

Page 78: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR and consistency checking

00536235n blow &%Breathing+ anatomy 00005052v blow &%Breathing+ medicine

00003430v exhale &%Breathing+ biology 00003142v exhale &%Breathing+ medicine 00899001a exhaled &%Breathing+ factotum 00263355a exhaling &%Breathing+ factotum

00536039n expiration &%Breathing+ anatomy 02849508a expiratory &%Breathing+ anatomy 00003142v expire &%Breathing+ medicine

02579534a inhalant &%Breathing+ anatomy 00536863n inhalation &%Breathing+ anatomy 00003763v inhale &%Breathing+ medicine 00898664a inhaled &%Breathing+ factotum 00263512a inhaling &%Breathing+ factotum

00537041n pant &%Breathing+ anatomy 00004002v pant &%Breathing+ medicine 00535106n panting &%Breathing+ anatomy 00264603a panting &%Breathing+ factotum 00411482r pantingly &%Breathing+ factotum

...

Page 79: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: MCR and consistency checking

Page 80: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Does an orchard apple tree have leaves? Does an orchad apple tree have fruits? Does a cactus have leaves?

MEANING: MCR and consistency checking

Page 81: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

Use and design of ontologies for NLP and the Semantic Web

Example SUMO: Boiling

(subclass Boiling StateChange) (documentation Boiling "The Class of Processes where an Object is

heated and converted from a Liquid to a Gas.") (=>

    (instance ?BOIL Boiling)     (exists         (?HEAT)         (and             (instance ?HEAT Heating)             (subProcess ?HEAT ?BOIL))))

"if instance BOIL Boiling, then there exists HEAT such that instance HEAT Heating and subProcess HEAT BOIL"

MEANING: MCR and consistency checking

Page 82: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING

hospital_1building_industrymedicinetown_planningartifact StationaryArtifact+Artifact+Building+Object+

health_falicility_1building_industrymedicinetown_planningartifact Building+Artifact+Building+Object+

ISA

hospital_1 a health facility where patients receive treatment

where

patient_1medicineperson patient+Function+Human+Living+Object+

receive_2

treatement_1

factotumchange Getting+Dynamic=Experience=

medicine act TherapeuticProcess+Agentive=Cause+Condition=Dynamic=Purpose=Social=UnboundedEvent+

Page 83: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING

Frame Elements Core Type

Affliction Core

Body_part Core

Degree Peripheral

Duration Extra-Thematic

Healer Core

Manner Peripheral

Medication Core

Motivation Extra-Thematic

Patient Core

Place Peripheral

Purpose Extra-Thematic

Time Peripheral

Treatment Core

FRAMENET: cure.n

Page 84: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING

hospital_1building_industrymedicinetown_planningartifact StationaryArtifact+Artifact+Building+Object+

health_falicility_1building_industrymedicinetown_planningartifact Building+Artifact+Building+Object+

ISA

hospital_1 a health facility where patients receive treatment

where

patient_1medicineperson patient+Function+Human+Living+Object+

receive_2

treatement_1

factotumchange Getting+Dynamic=Experience=

medicine act TherapeuticProcess+Agentive=Cause+Condition=Dynamic=Purpose=Social=UnboundedEvent+

PLACE PATIENT TREATEMENT

PATIENT TREATEMENT

PLACE

Page 85: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 85

#rel. offset synset2338 18 00017954-n group 1,grouping 1

0 19 05962976-n social group 1 729 37 05997592-n organisation 2,organization 1 30 10 06002286-n establishment 2,institution 1 15 12 06023733-n faith 3,religion 2 62 5 06024357-n Christianity 2,church 1,Christian church 1

11 14 00001740-n entity 1,something 1 51 29 00009457-n ob ject 1,physical ob ject 1 1 39 00011937-n artifact 1,artefact 1 68 63 03431817-n construction 3,structure 1 50 79 02347413-n building 1,edifice 1 0 11 03135441-n place of worship 1,house of prayer 1,house of God 1 59 19 02438778-n church 2,church building 1

25 20 00017487-n act 2,human action 1,human activity 1 611 69 00261466-n activity 1 2 5 00662816-n ceremony 3 0 11 00663517-n religious ceremony 1,religious ritual 1 243 7 00666638-n service 3,religious service 1,divine service 1 11 1 00666912-n church 3,church service 1

MEANINGAutomatic selection of Base Concepts

Page 86: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 86

Version nouns verbs adj adv synsetsrelations

WN(1.6, 2.0) 74,48812,75418,5233,612 109,377 180,303 XWN 550,922Semcor203,546BNC 707,618Total 1.642,389

MEANINGMCR volumes

Page 87: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP 87

Page 88: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

MEANING: Semantic Interpretation

(Aznar) (invita) (al Dalai_Lama) (a un almuerzo oficial)

invitar objetoorigen

acto

destino

Page 89: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Conclusion

NLU: semántica y pragmática

COLECCIÓN DE PREGUNTAS Y RESPUESTAS HECHAS EN JUICIOS CELEBRADOS EN ESPAÑA.

Publicado en la revista del Colegio de Abogados de Madrid.

"¿Estaba usted presente cuando le tomaron la foto?" "¿Estaba usted solo, o era el único?" "¿Fue usted, o su hermano menor, quien murió en la

guerra?" "¿Afirma que él le mató a usted?" "¿A qué distancia estaban los vehículos en el momento de la

colisión?“ "Usted permaneció allí hasta que se marchó, ¿no es cierto?"

Page 90: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Conclusion

Pregunta: "Doctor, ¿cuantas autopsias ha realizado usted a personas fallecidas?"

Respuesta: "Todas mis autopsias las practiqué a personas fallecidas“

Pregunta: "Cada una de sus respuestas debe ser verbal, ¿de acuerdo?" "¿A qué escuela fue usted?"

Respuesta: "Verbal" (risas y comentarios jocosos en la sala)

Pregunta: "¿Recuerda usted a qué hora examinó el cadáver?"

Respuesta: "La autopsia comenzó alrededor de las 8:30" Pregunta: "¿El Sr. Perez Tomadilla estaba muerto en ese

momento?" Respuesta: "No, estaba sentado en la mesa preguntándose

por que estaba yo haciéndole la autopsia."

Pregunta: "¿Le dispararon en medio de la refriega?" Respuesta: "No, me dispararon entre la refriega y el

ombligo."

Page 91: AI and NLP German Rigau i Claramunt german.rigau@ehu.es IXA group Departamento de Lenguajes y Sistemas Informáticos UPV/EHU

AI and NLP

Conclusion

Pregunta: "Doctor, ¿antes de realizar la autopsia, verificó si había pulso?"

Respuesta: "No" Pregunta: "¿Comprobó la presión sanguínea?" Respuesta: "No" Pregunta: "¿Verificó si respiraba?" Respuesta: "No" Pregunta: "¿Entonces, es posible que el paciente estuviera

vivo cuando usted comenzó la autopsia?" Respuesta: "No" Pregunta: "¿Cómo puede usted estar tan seguro, Doctor?" Respuesta: "Porque su cerebro estaba sobre mi mesa, en un

tarro" Pregunta: "¿Pero podría, no obstante, haber estado aún vivo

el paciente?“ Respuesta: "Es posible que hubiera estado vivo y ejerciendo

de abogado en alguna parte."